{"id":16148,"date":"2022-05-30T11:52:55","date_gmt":"2022-05-30T11:52:55","guid":{"rendered":"https:\/\/blog.datumo.com\/en\/?p=16148"},"modified":"2024-10-22T08:21:28","modified_gmt":"2024-10-22T08:21:28","slug":"breaking-the-data-boundaries-with-general-self-supervised-learning-approach","status":"publish","type":"post","link":"https:\/\/blog.datumo.com\/en\/tech\/16148","title":{"rendered":"Breaking the Data Boundaries With General Self-Supervised Learning Approach"},"content":{"rendered":"[vc_row pix_particles_check=&#8221;&#8221;][vc_column]<div class=\"pix-content-box card      vc_custom_1650362800073    rounded-lg bg- w-100  \"   ><div class=\"\" style=\"z-index:30;position:relative;\">[vc_column_text css=&#8221;.vc_custom_1653911604527{padding-top: 40px !important;padding-right: 40px !important;padding-bottom: 0px !important;padding-left: 40px !important;}&#8221;]Understand how Data2vec achieves general self-supervision on speech, vision, and text data[\/vc_column_text][vc_raw_html]JTNDbWV0YSUyMGh0dHAtZXF1aXYlM0QlMjJyZWZyZXNoJTIyJTIwY29udGVudCUzRCUyMjAlM0IlMjB1cmwlM0RodHRwcyUzQSUyRiUyRmRhdHVtby5jb20lMkZlbiUyRmJyZWFraW5nLXRoZS1kYXRhLWJvdW5kYXJpZXMtd2l0aC1nZW5lcmFsLXNlbGYtc3VwZXJ2aXNlZC1sZWFybmluZy1hcHByb2FjaCUyRiUyMiUzRQ==[\/vc_raw_html]<\/div><\/div><div id=\"el1650294698986-a1b962b5-ef42\" class=\"w-100 d-block \"><\/div>[vc_column_text css=&#8221;.vc_custom_1653911747614{padding-top: 40px !important;padding-bottom: 40px !important;}&#8221;]<img fetchpriority=\"high\" decoding=\"async\" class=\"aligncenter size-full wp-image-16154\" src=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/image.jpeg\" alt=\"\" width=\"1600\" height=\"900\" srcset=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/image.jpeg 1600w, https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/image-300x169.jpeg 300w, https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/image-1024x576.jpeg 1024w, 
https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/image-768x432.jpeg 768w, https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/image-1536x864.jpeg 1536w\" sizes=\"(max-width: 1600px) 100vw, 1600px\" \/>\r\n\r\n&nbsp;\r\n\r\nModern data is complex, diverse, and largely unlabeled. Data can have different modalities, such as text, image, and audio.\r\n\r\nIn the last two decades, artificial intelligence (AI) has demonstrated powerful predictive capabilities across many kinds of data. However, each type of data requires different training and processing techniques.\r\n\r\nExisting AI systems fail to provide a generic model capable of handling such diversified input simultaneously. The typical approach is to develop a separate algorithm for every input source.\r\n\r\nTo fill the gap, researchers at Meta devised a general self-supervised learning solution called <a href=\"https:\/\/ai.facebook.com\/blog\/the-first-high-performance-self-supervised-algorithm-that-works-for-speech-vision-and-text\/\">data2vec<\/a> that works on speech, vision, and text data at once.\r\n\r\nIn this post, we\u2019ll explore self-supervision and discuss the data2vec architecture. 
We\u2019ll also compare the performance of data2vec with existing state-of-the-art speech, language, and image models to understand how self-supervision can help develop truly intelligent AI systems in the future.[\/vc_column_text][\/vc_column][\/vc_row][vc_section full_width=&#8221;stretch_row&#8221; pix_over_visibility=&#8221;&#8221; css=&#8221;.vc_custom_1650444445523{padding-top: 80px !important;padding-bottom: 80px !important;background-color: #f8f9fa !important;}&#8221; el_id=&#8221;pix_section_program&#8221;][vc_row full_width=&#8221;stretch_row&#8221; pix_particles_check=&#8221;&#8221;][vc_column content_align=&#8221;text-center&#8221; offset=&#8221;vc_col-lg-offset-0 vc_col-lg-12 vc_col-md-offset-1 vc_col-md-10&#8243;]<div id=\"el1650442503491-f5da6b2f-fa35\" class=\"mb-3 text-left \"><h2 class=\"mb-32 pix-sliding-headline font-weight-bold secondary-font\" data-class=\"secondary-font text-heading-default\" data-style=\"\">What Is Self-Supervised Learning?<\/h2><\/div>[vc_column_text css=&#8221;.vc_custom_1653911738302{padding-top: 40px !important;padding-bottom: 40px !important;}&#8221;]\r\n<p style=\"text-align: left;\">Supervised learning depends on large volumes of human-labeled data, which is slow and expensive to produce. Self-supervised learning removes this bottleneck: the model generates its own training signal directly from unlabeled data, typically by hiding part of the input and learning to predict the hidden part from the visible context.<\/p>\r\n<p style=\"text-align: left;\">Because the supervision comes from the data itself, self-supervised models can pre-train on the vast amounts of unlabeled text, images, and audio available today, and are then fine-tuned on small labeled datasets for specific tasks. This is the approach behind models like BERT for language and wav2vec 2.0 for speech, and it is the foundation of data2vec.<\/p>\r\n[\/vc_column_text][\/vc_column][\/vc_row][\/vc_section][vc_row pix_particles_check=&#8221;&#8221;][vc_column]<div id=\"el1650442607008-a85a832d-43f0\" class=\"w-100 d-block \"><\/div><div  class=\"pix-heading-el text-left \"><div><div class=\"slide-in-container\"><h2 class=\"text-heading-default font-weight-bold heading-text el-title_custom_color mb-12\" style=\"\" data-anim-type=\"\" data-anim-delay=\"0\">Why Do We Need a Generic Data Handling Strategy?<\/h2><\/div><\/div><\/div>[vc_column_text css=&#8221;.vc_custom_1653911792040{padding-top: 40px !important;}&#8221;]In AI, each data type is processed differently. Self-supervised language models are trained by hiding or masking a portion of the input and predicting the hidden words in the sentence. During training, the models build a vocabulary of discrete words, which helps them predict the hidden words more accurately.\r\n\r\nThe training looks more complex for computer vision (CV) and speech models. Vision models predict the intensities of missing pixels in an image or video, while speech models learn sound waveforms to predict the missing sounds. 
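As a toy illustration of this masked-prediction setup for text (the `mask_tokens` helper below is our own simplified sketch, not data2vec's or BERT's actual code):

```python
import random

def mask_tokens(tokens, mask_ratio=0.15, mask_token="[MASK]", seed=0):
    """Hide a fraction of the tokens; a model would be trained to predict them."""
    rng = random.Random(seed)
    k = max(1, round(len(tokens) * mask_ratio))      # how many tokens to hide
    hidden = set(rng.sample(range(len(tokens)), k))  # which positions to hide
    masked = [mask_token if i in hidden else t for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in hidden}         # ground truth to recover
    return masked, targets

sentence = "self supervised models learn from unlabeled data".split()
masked, targets = mask_tokens(sentence)  # one of the seven words is hidden
```

During training, the loss is computed only at the hidden positions, so the model must infer the missing words from the surrounding context.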
However, no pre-existing vocabulary of speech units or visual tokens exists as they are continuous in nature.\r\n\r\nBecause each data source has varying informational units, i.e., characters or words for text, pixels for images, and sound waveforms for speech, a unified AI model cannot manage the diverse nature of training data.[\/vc_column_text]<div id=\"el1650294913061-211813f5-5f2d\" class=\"w-100 d-block \"><\/div>[\/vc_column][\/vc_row][vc_section full_width=&#8221;stretch_row&#8221; pix_over_visibility=&#8221;&#8221; css=&#8221;.vc_custom_1650444445523{padding-top: 80px !important;padding-bottom: 80px !important;background-color: #f8f9fa !important;}&#8221;][vc_row full_width=&#8221;stretch_row&#8221; pix_particles_check=&#8221;&#8221;][vc_column content_align=&#8221;text-center&#8221; offset=&#8221;vc_col-lg-offset-0 vc_col-lg-12 vc_col-md-offset-1 vc_col-md-10&#8243;]<div  class=\"pix-heading-el text-left \"><div><div class=\"slide-in-container\"><h2 class=\"text-heading-default font-weight-bold heading-text el-title_custom_color mb-12\" style=\"\" data-anim-type=\"\" data-anim-delay=\"0\">Self-Supervised Data2vec Architecture Explained<\/h2><\/div><\/div><\/div>[vc_column_text css=&#8221;.vc_custom_1653911905319{padding-top: 40px !important;}&#8221;]\r\n<p style=\"text-align: left;\">Data2vec provides a unified training mechanism for text, speech, and vision data. Data2vec simplifies the learning process by training a transformer network, masking the input data, and allowing models to predict their representations of the input data.<\/p>\r\n<p style=\"text-align: left;\">Data2vec is trained using two networks: <strong>teacher<\/strong> and <strong>student<\/strong>. First, the teacher network computes numerical representations from input text passages, images, or speech audio. Next, the input is masked and delivered to the student network, where the numerical representations of the hidden input are predicted by updating weights. 
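As a toy sketch of this student\u2013teacher setup (our own simplified illustration, not Meta\u2019s code): the teacher\u2019s weights slowly track the student\u2019s as a moving average, and the student is trained to regress the teacher\u2019s representations at the masked positions.

```python
def ema_update(teacher_w, student_w, decay=0.999):
    """Teacher weights slowly track the student's weights (moving average)."""
    return [decay * t + (1 - decay) * s for t, s in zip(teacher_w, student_w)]

def masked_regression_loss(student_repr, teacher_repr, mask):
    """Mean squared error between student and teacher representations,
    computed only at the masked positions."""
    terms = [(s - t) ** 2
             for s, t, m in zip(student_repr, teacher_repr, mask) if m]
    return sum(terms) / max(len(terms), 1)

teacher = ema_update([1.0, 1.0], [0.0, 2.0])        # nudged toward the student
loss = masked_regression_loss([0.5, 0.0, 0.2],      # student outputs
                              [1.0, 0.0, 0.2],      # teacher targets
                              [True, False, True])  # masked positions only
```

Because the targets are continuous representations rather than words, pixels, or waveforms, the same loss applies to all three modalities.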
The two networks share the same architecture, except that the teacher\u2019s weights are a slightly outdated, exponentially moving average of the student\u2019s weights, which provides stable targets for self-supervised learning in the student network.<\/p>\r\n<p style=\"text-align: left;\">Let\u2019s look at this training process in more detail.<\/p>\r\n[\/vc_column_text]<div id=\"el1653911883809-a9aac731-d5bb\" class=\"w-100 d-block \"><\/div><div  class=\"pix-heading-el text-left \"><div><div class=\"slide-in-container\"><h3 class=\"text-heading-default font-weight-bold heading-text el-title_custom_color mb-12\" style=\"\" data-anim-type=\"\" data-anim-delay=\"0\">1. Transformer Architecture and Encoding Schemes<\/h3><\/div><\/div><\/div>[vc_column_text css=&#8221;.vc_custom_1653911931245{padding-top: 40px !important;}&#8221;]\r\n<p style=\"text-align: left;\">The <a href=\"https:\/\/arxiv.org\/abs\/1706.03762\">transformer<\/a>, developed initially for language problems, is now widely adopted for many self-supervised learning tasks across different data domains. The data2vec algorithm uses the standard transformer architecture and encodes each input according to its data type.<\/p>\r\n<p style=\"text-align: left;\">Images are encoded using the <a href=\"https:\/\/viso.ai\/deep-learning\/vision-transformer-vit\/\">ViT<\/a> strategy as a sequence of pixel patches. 
Each patch spanning 16&#215;16 pixels is linearly transformed and fed to the standard transformer.<\/p>\r\n<p style=\"text-align: left;\">Audio-based data is encoded using a multi-layer 1-D convolutional neural network that maps a 16 kHz waveform to 50 Hz representations, similar to the encoding technique used in the <a href=\"https:\/\/proceedings.neurips.cc\/paper\/2020\/file\/92d1e1eb1cd6f9fba3227870bb6d7f07-Paper.pdf\">wav2vec 2.0<\/a> self-supervised speech recognition model.<\/p>\r\n<p style=\"text-align: left;\">For text data, word units are obtained by pre-processing, and the input is tokenized with <a href=\"http:\/\/publications.rwth-aachen.de\/record\/668744\/files\/Sennrich_P16-1162.pdf?subformat=pdfa\">byte-pair encoding<\/a>.<\/p>\r\n[\/vc_column_text]<div id=\"el1653911939354-ed4cd84c-69a6\" class=\"w-100 d-block \"><\/div><div  class=\"pix-heading-el text-left \"><div><div class=\"slide-in-container\"><h3 class=\"text-heading-default font-weight-bold heading-text el-title_custom_color mb-12\" style=\"\" data-anim-type=\"\" data-anim-delay=\"0\">2. Different Masking Strategies<\/h3><\/div><\/div><\/div>[vc_column_text css=&#8221;.vc_custom_1653912060348{padding-top: 40px !important;}&#8221;]\r\n<p style=\"text-align: left;\">Data2vec masks or hides some parts of the encoded input and feeds it to the transformer network.<\/p>\r\n<p style=\"text-align: left;\">The images are masked via the block-wise masking strategy applied in <a href=\"https:\/\/arxiv.org\/pdf\/2106.08254.pdf\">BEiT<\/a>, which hides multiple adjacent image patches. 
For audio data, data2vec uses the masking technique of the self-supervised <a href=\"https:\/\/proceedings.neurips.cc\/paper\/2020\/file\/92d1e1eb1cd6f9fba3227870bb6d7f07-Paper.pdf\">wav2vec 2.0<\/a> speech model, while for language data, <a href=\"https:\/\/arxiv.org\/abs\/1810.04805\">BERT<\/a> token masking is adapted.<\/p>\r\n<p style=\"text-align: left;\">With a unified data handling strategy, the data2vec model can learn the underlying structure from any kind of unlabeled input data and predict the missing information.<\/p>\r\n&nbsp;\r\n<p style=\"text-align: left;\"><span class=\"notion-enable-hover\" data-token-index=\"0\" data-reactroot=\"\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-16155\" src=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/Untitled-13-e1653912026640.png\" alt=\"\" width=\"1318\" height=\"420\" srcset=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/Untitled-13-e1653912026640.png 1318w, https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/Untitled-13-e1653912026640-300x96.png 300w, https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/Untitled-13-e1653912026640-1024x326.png 1024w, https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/Untitled-13-e1653912026640-768x245.png 768w\" sizes=\"(max-width: 1318px) 100vw, 1318px\" \/><\/span><\/p>\r\n&nbsp;\r\n<p style=\"text-align: left;\"><span class=\"notion-enable-hover\" data-token-index=\"0\" data-reactroot=\"\">The general self-supervised <\/span><a class=\"notion-link-token notion-enable-hover\" href=\"https:\/\/scontent.flhe7-2.fna.fbcdn.net\/v\/t39.8562-6\/271974914_483120576492438_4239522333319653600_n.pdf?_nc_cat=107&amp;ccb=1-5&amp;_nc_sid=ae5e01&amp;_nc_eui2=AeECpr-IGNDSS4C0WRHU5tDiik8Hz6Ia4hiKTwfPohriGEE5FlCC9tEWFZjCDhazz8z04nliGFbp5P9Q6gaYDQma&amp;_nc_ohc=efNwtKYvMFkAX-S_UF0&amp;_nc_ht=scontent.flhe7-2.fna&amp;oh=00_AT_ldDsPlsaaeq9p4tLP9jahJpgEEI_7EOFcRG5g-VJ4Uw&amp;oe=622D5111\" target=\"_blank\" rel=\"noopener noreferrer\" data-token-index=\"1\" data-reactroot=\"\"><span class=\"link-annotation-unknown-block-id-1484080171\">data2vec architecture<\/span><\/a><span class=\"notion-enable-hover\" data-token-index=\"2\" data-reactroot=\"\"> learns from different input sources.<\/span><\/p>\r\n[\/vc_column_text][\/vc_column][\/vc_row][\/vc_section][vc_row pix_particles_check=&#8221;&#8221;][vc_column]<div id=\"el1650362147064-486b7dc2-a9b3\" class=\"w-100 d-block \"><\/div><div  class=\"pix-heading-el text-left \"><div><div class=\"slide-in-container\"><h2 class=\"text-heading-default font-weight-bold heading-text el-title_custom_color mb-12\" style=\"\" data-anim-type=\"\" data-anim-delay=\"0\">Comparison of Data2vec Performance With Benchmarked Techniques<\/h2><\/div><\/div><\/div><div id=\"el1653912179018-39c123e0-00c8\" class=\"w-100 d-block \"><\/div><div  class=\"pix-heading-el text-left \"><div><div class=\"slide-in-container\"><h3 class=\"text-heading-default font-weight-bold heading-text el-title_custom_color mb-12\" style=\"\" data-anim-type=\"\" data-anim-delay=\"0\">Performance on Image Data<\/h3><\/div><\/div><\/div>[vc_column_text css=&#8221;.vc_custom_1653912159580{padding-top: 40px !important;padding-bottom: 0px !important;}&#8221;]To evaluate data2vec for visual data, researchers at Meta pre-trained the model on the benchmark image data of <a class=\"notion-link-token notion-enable-hover\" href=\"https:\/\/www.image-net.org\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-token-index=\"1\" data-reactroot=\"\"><span class=\"link-annotation-unknown-block-id-2091525620\">ImageNet-1K<\/span><\/a>. Data2vec was fine-tuned with the labeled data from the same benchmark for the image classification task. 
Results show data2vec outperforms previous state-of-the-art image models like MoCov3, DINO, BeiT, etc.\r\n\r\n&nbsp;\r\n\r\n<img decoding=\"async\" class=\"aligncenter size-full wp-image-16153\" src=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/2.jpeg\" alt=\"\" width=\"1600\" height=\"900\" srcset=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/2.jpeg 1600w, https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/2-300x169.jpeg 300w, https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/2-1024x576.jpeg 1024w, https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/2-768x432.jpeg 768w, https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/2-1536x864.jpeg 1536w\" sizes=\"(max-width: 1600px) 100vw, 1600px\" \/>\r\n<p style=\"text-align: left;\"><span class=\"notion-enable-hover\" data-token-index=\"0\" data-reactroot=\"\">Data2vec vs. previous CV models. Image by <\/span><a class=\"notion-link-token notion-enable-hover\" href=\"https:\/\/ai.facebook.com\/blog\/the-first-high-performance-self-supervised-algorithm-that-works-for-speech-vision-and-text\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-token-index=\"1\" data-reactroot=\"\"><span class=\"link-annotation-unknown-block-id-305469156\">Meta AI<\/span><\/a><\/p>\r\n[\/vc_column_text]<div id=\"el1650450433074-0be5e40e-928e\" class=\"w-100 d-block \"><\/div><div  class=\"pix-heading-el text-left \"><div><div class=\"slide-in-container\"><h3 class=\"text-heading-default font-weight-bold heading-text el-title_custom_color mb-12\" style=\"\" data-anim-type=\"\" data-anim-delay=\"0\">Performance on Speech Audio<\/h3><\/div><\/div><\/div>[vc_column_text css=&#8221;.vc_custom_1653912227234{padding-top: 40px !important;padding-bottom: 0px !important;}&#8221;]To assess speech processing capabilities, data2vec was pre-trained and fine-tuned on <a class=\"notion-link-token notion-enable-hover\" href=\"https:\/\/www.openslr.org\/12\" target=\"_blank\" 
rel=\"noopener noreferrer\" data-token-index=\"1\" data-reactroot=\"\"><span class=\"link-annotation-unknown-block-id--628604015\">Librispeech<\/span><\/a> audio data, which comprises between 10 and 960 hours of clean speech (a standard benchmark in the speech community). Data2vec was compared with previous state-of-the-art self-supervised speech recognition models like <a class=\"notion-link-token notion-enable-hover\" href=\"https:\/\/ai.facebook.com\/blog\/wav2vec-20-learning-the-structure-of-speech-from-raw-audio\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-token-index=\"3\" data-reactroot=\"\"><span class=\"link-annotation-unknown-block-id-149955913\">wav2vec 2.0<\/span><\/a> and <a class=\"notion-link-token notion-enable-hover\" href=\"https:\/\/ai.facebook.com\/blog\/hubert-self-supervised-representation-learning-for-speech-recognition-generation-and-compression\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-token-index=\"5\" data-reactroot=\"\"><span class=\"link-annotation-unknown-block-id--1573166501\">HuBERT<\/span><\/a>, and the results show that data2vec performs better.\r\n\r\n&nbsp;\r\n\r\n<img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-16152\" src=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/1.jpeg\" alt=\"\" width=\"1600\" height=\"900\" srcset=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/1.jpeg 1600w, https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/1-300x169.jpeg 300w, https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/1-1024x576.jpeg 1024w, https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/1-768x432.jpeg 768w, https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/1-1536x864.jpeg 1536w\" sizes=\"(max-width: 1600px) 100vw, 1600px\" \/>\r\n<p style=\"text-align: left;\"><span class=\"notion-enable-hover\" data-token-index=\"0\" data-reactroot=\"\">Lower word error rates recorded for data2vec against 
Librispeech benchmark models with 10h of labeled data. Image by <\/span><a class=\"notion-link-token notion-enable-hover\" href=\"https:\/\/ai.facebook.com\/blog\/the-first-high-performance-self-supervised-algorithm-that-works-for-speech-vision-and-text\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-token-index=\"1\" data-reactroot=\"\"><span class=\"link-annotation-unknown-block-id-305469156\">Meta AI<\/span><\/a><\/p>\r\n[\/vc_column_text]<div id=\"el1653912237388-078f355b-35ec\" class=\"w-100 d-block \"><\/div><div  class=\"pix-heading-el text-left \"><div><div class=\"slide-in-container\"><h3 class=\"text-heading-default font-weight-bold heading-text el-title_custom_color mb-12\" style=\"\" data-anim-type=\"\" data-anim-delay=\"0\">Performance on Text Data<\/h3><\/div><\/div><\/div>[vc_column_text css=&#8221;.vc_custom_1653912308576{padding-top: 40px !important;padding-bottom: 0px !important;}&#8221;]To compare the text-based performance of data2vec, a processing setup similar to <a class=\"notion-link-token notion-enable-hover\" href=\"https:\/\/arxiv.org\/abs\/1810.04805\" target=\"_blank\" rel=\"noopener noreferrer\" data-token-index=\"1\" data-reactroot=\"\"><span class=\"link-annotation-unknown-block-id--754910526\">BERT<\/span><\/a> was replicated by pre-training on the <a class=\"notion-link-token notion-enable-hover\" href=\"https:\/\/www.cv-foundation.org\/openaccess\/content_iccv_2015\/papers\/Zhu_Aligning_Books_and_ICCV_2015_paper.pdf\" target=\"_blank\" rel=\"noopener noreferrer\" data-token-index=\"3\" data-reactroot=\"\"><span class=\"link-annotation-unknown-block-id-836210646\">Books Corpus<\/span><\/a> and evaluating data2vec on the <a class=\"notion-link-token notion-enable-hover\" href=\"https:\/\/arxiv.org\/abs\/1804.07461\" target=\"_blank\" rel=\"noopener noreferrer\" data-token-index=\"5\" data-reactroot=\"\"><span class=\"link-annotation-unknown-block-id-1052379574\">General Language Understanding Evaluation<\/span><\/a> (GLUE) benchmark. Comparison with the <a class=\"notion-link-token notion-enable-hover\" href=\"https:\/\/ai.facebook.com\/blog\/roberta-an-optimized-method-for-pretraining-self-supervised-nlp-systems\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-token-index=\"7\" data-reactroot=\"\"><span class=\"link-annotation-unknown-block-id--1373996636\">RoBERTa<\/span><\/a> baseline language model shows that data2vec slightly outperforms it on text data as well.\r\n\r\n&nbsp;\r\n\r\n<img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-16151\" src=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/0.jpeg\" alt=\"\" width=\"1600\" height=\"900\" srcset=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/0.jpeg 1600w, https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/0-300x169.jpeg 300w, https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/0-1024x576.jpeg 1024w, https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/0-768x432.jpeg 768w, https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/0-1536x864.jpeg 1536w\" sizes=\"(max-width: 1600px) 100vw, 1600px\" \/>\r\n\r\n&nbsp;\r\n\r\n<em>Data2vec scores higher than the RoBERTa language model. 
Image by <a href=\"https:\/\/ai.facebook.com\/blog\/the-first-high-performance-self-supervised-algorithm-that-works-for-speech-vision-and-text\/\">Meta AI<\/a><\/em>\r\n\r\nData2vec has the potential to train on multimodal data (text, audio, and image) through self-supervision, enabling AI researchers to develop all-in-one models.[\/vc_column_text]<div id=\"el1650362652282-42ee7789-aa09\" class=\"w-100 d-block \"><\/div>[\/vc_column][\/vc_row][vc_section full_width=&#8221;stretch_row&#8221; pix_over_visibility=&#8221;&#8221; css=&#8221;.vc_custom_1650444445523{padding-top: 80px !important;padding-bottom: 80px !important;background-color: #f8f9fa !important;}&#8221;][vc_row full_width=&#8221;stretch_row&#8221; pix_particles_check=&#8221;&#8221;][vc_column content_align=&#8221;text-center&#8221; offset=&#8221;vc_col-lg-offset-0 vc_col-lg-12 vc_col-md-offset-1 vc_col-md-10&#8243;]<div  class=\"pix-heading-el text-left \"><div><div class=\"slide-in-container\"><h2 class=\"text-heading-default font-weight-bold heading-text el-title_custom_color mb-12\" style=\"\" data-anim-type=\"\" data-anim-delay=\"0\">Limitations of Self-Supervised Data2vec<\/h2><\/div><\/div><\/div>[vc_column_text css=&#8221;.vc_custom_1653912358482{padding-top: 40px !important;}&#8221;]\r\n<p style=\"text-align: left;\">Data2vec is a significant step towards building more generalized AI models, but it has a few limitations.<\/p>\r\n<p style=\"text-align: left;\">Data2vec requires data-specific input encoding schemes. 
It also requires different masking schemes for audio, image, and text data.<\/p>\r\n<p style=\"text-align: left;\">To build truly intelligent AI systems that learn by observing the real world, future models should be able to process any kind of data using a unified encoding and masking approach.<\/p>\r\n[\/vc_column_text]<div id=\"el1653912321972-76de81fb-6f91\" class=\"w-100 d-block \"><\/div><div  class=\"pix-heading-el text-left \"><div><div class=\"slide-in-container\"><h2 class=\"text-heading-default font-weight-bold heading-text el-title_custom_color mb-12\" style=\"\" data-anim-type=\"\" data-anim-delay=\"0\">Boost the Performance of Your AI Applications With High-Quality Data<\/h2><\/div><\/div><\/div>[vc_column_text css=&#8221;.vc_custom_1653912404407{padding-top: 40px !important;}&#8221;]\r\n<p style=\"text-align: left;\">Data is growing exponentially, so we need efficient AI solutions to manage it. With a general self-supervised learning approach, data2vec handles unlabeled and diverse image, text, and audio data effectively. However, self-supervised techniques require more research before they can be applied to real-world applications. Until then, AI systems must feed on high-quality labeled datasets.<\/p>\r\n<p style=\"text-align: left;\">DATUMO is a leading crowdsourcing platform that enables quick and accurate data collection and annotation for audio, video, image, and text data. Our highly trained <a href=\"https:\/\/selectstar-ai.medium.com\/what-is-crowdsourcing-where-do-we-need-it-401a38561bc4\">crowdsource<\/a> workers can diligently tag, edit, classify, segment, and transcribe data as per your needs. 
<a href=\"https:\/\/selectstar.ai\/contact\">Contact us<\/a> today and start curating high-quality datasets to fuel your AI applications.<\/p>\r\n[\/vc_column_text]<div id=\"el1653912321973-6ab51e7a-d357\" class=\"w-100 d-block \"><\/div>[\/vc_column][\/vc_row][\/vc_section][vc_row pix_particles_check=&#8221;&#8221;][vc_column width=&#8221;1\/2&#8243;]<div id=\"el1646794934167-c0c94dd3-ea74\" class=\"w-100 d-block \"><\/div><div class=\" mb-3 mb-md-0 \"  ><div class=\"card w-100 h-100 bg-white  vc_custom_1652982865548  pix-hover-item rounded-10 position-relative overflow-hidden2 text-white tilt fancy_card\" ><div class=\"card-img-overlay overflow-visible d-inline-block w-100 pix-img-overlay pix-p-30 d-flex align-items-end text-left\"><div class=\"w-100 \"><h3 class=\"card-title  text-black font-weight-bold mb-0 animate-in\" style=\"\">See what we can do for you.<\/h3><p class=\"card-text pix-pt-10 text-black \" style=\"\">Build smarter AI with us.<\/p><div class=\"card-btn-div mt-4 d-inline-block w-100\"><a  href=\"https:\/\/datumo.com\" class=\"btn mb-2     text-white btn-black d-inline-block      btn-md\" target=\"_blank\" rel=\"noopener\"    ><span class=\"font-weight-bold \" >Learn More<\/span><\/a><\/div><\/div><\/div><\/div><\/div>[\/vc_column][vc_column width=&#8221;1\/2&#8243;]<div id=\"el1646794982519-9a19190b-7fde\" class=\"w-100 d-block \"><\/div><div class=\" mb-3 mb-md-0 \"  ><div class=\"card w-100 h-100 bg-black  vc_custom_1653912437211  pix-hover-item rounded-10 position-relative overflow-hidden2 text-white tilt fancy_card\" ><div class=\"card-img-overlay overflow-visible d-inline-block w-100 pix-img-overlay pix-p-30 d-flex align-items-end text-left\"><div class=\"w-100 \"><h3 class=\"card-title  text-white font-weight-bold mb-0 animate-in\" style=\"\">We would like to support the AI industry by sharing.<\/h3><p class=\"card-text pix-pt-10 text-white \" style=\"\"><\/p><div class=\"card-btn-div mt-4 d-inline-block w-100\"><a  
href=\"https:\/\/open.datumo.com\/en\" class=\"btn mb-2    vc_custom_1653912437215  btn-primary d-inline-block      btn-md\" target=\"_blank\" rel=\"noopener\"    ><span class=\"font-weight-bold \" >Download Open Datasets<\/span><\/a><\/div><\/div><\/div><\/div><\/div>[\/vc_column][\/vc_row][vc_row pix_particles_check=&#8221;&#8221;][vc_column]<div id=\"el1646799961152-e3ee06c0-4e82\" class=\"w-100 d-block \"><\/div>[\/vc_column][\/vc_row]","protected":false},"excerpt":{"rendered":"[vc_row pix_particles_check=&#8221;&#8221;][vc_column][vc_column_text css=&#8221;.vc_custom_1653911747614{padding-top: 40px !important;padding-bottom: 40px !important;}&#8221;] &nbsp; Modern data is complex, diverse, and largely unlabeled. Data can have different modalities, such as text, image, and audio. In the last two decades, artificial intelligence (AI) has demonstrated powerful predictive capabilities to&#8230;","protected":false},"author":1,"featured_media":2224,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[131],"tags":[26,149,157,127,158,138,159,132],"class_list":["post-16148","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tech","tag-ai","tag-data","tag-data2vec","tag-datumo","tag-self-supervision","tag-speech","tag-text","tag-vision"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v23.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Breaking the Data Boundaries With General Self-Supervised Learning Approach - DATUMO<\/title>\n<meta name=\"description\" content=\"Understand how Data2vec achieves general self-supervision on speech, vision, and text data\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/blog.datumo.com\/en\/tech\/16148\" \/>\n<meta property=\"og:locale\" content=\"ko_KR\" \/>\n<meta 