{"id":16430,"date":"2022-06-23T01:46:18","date_gmt":"2022-06-23T01:46:18","guid":{"rendered":"https:\/\/blog.datumo.com\/en\/?p=16430"},"modified":"2024-10-22T09:01:29","modified_gmt":"2024-10-22T09:01:29","slug":"fairness-ethics-why-is-diverse-data-important-for-your-a-i-models","status":"publish","type":"post","link":"https:\/\/blog.datumo.com\/en\/tech\/16430","title":{"rendered":"Fairness? Ethics? Why is Diverse Data Important for Your A.I. Models?"},"content":{"rendered":"<p>[vc_row pix_particles_check=&#8221;&#8221;][vc_column]<div id=\"el1646799961152-e3ee06c0-4e82\" class=\"w-100 d-block \"><\/div><div class=\"pix-content-box card      vc_custom_1654577545529 custom-responsive-4207517   rounded-lg bg- w-100  \"   ><div class=\"\" style=\"z-index:30;position:relative;\">[vc_column_text]<\/p>\r\n<p style=\"text-align: left;\"><span style=\"font-size: 14pt;\"><strong>\ud83d\udd11<\/strong> <strong>In 6 minutes you will learn:<\/strong><\/span><\/p>\r\n<p>&nbsp;<\/p>\r\n<ul>\r\n<li>The importance of data diversity<\/li>\r\n<li>Ways to achieve data diversity<\/li>\r\n<li>Ethics in achieving diverse data<\/li>\r\n<li>How Datumo achieves diversity in data<\/li>\r\n<\/ul>\r\n<p>[\/vc_column_text]<\/div><\/div>[\/vc_column][\/vc_row][vc_row pix_particles_check=&#8221;&#8221;][vc_column]<div id=\"el1650294698986-a1b962b5-ef42\" class=\"w-100 d-block \"><\/div>[vc_column_text css=&#8221;.vc_custom_1655948827091{padding-top: 40px !important;padding-right: 20px !important;padding-bottom: 40px !important;padding-left: 20px !important;}&#8221;]We are all well aware that to effectively use an ML or AI model to solve a specific problem, it is crucial to have high-quality training data for the model itself. No matter how efficient or accurate the model is, if it is provided with and trained upon a poor quality dataset, it will never produce the desired or correct output. One important quality-related attribute of any dataset, regardless of the problem, is diversity. 
In this tutorial, we will talk about why diverse data is important for your models and what are the different ways in which you can introduce diversity and variability in your datasets. So, without wasting any time, let\u2019s begin![\/vc_column_text][\/vc_column][\/vc_row][vc_section full_width=&#8221;stretch_row&#8221; pix_over_visibility=&#8221;&#8221; css=&#8221;.vc_custom_1650444445523{padding-top: 80px !important;padding-bottom: 80px !important;background-color: #f8f9fa !important;}&#8221; el_id=&#8221;pix_section_program&#8221;][vc_row full_width=&#8221;stretch_row&#8221; pix_particles_check=&#8221;&#8221;][vc_column content_align=&#8221;text-center&#8221; offset=&#8221;vc_col-lg-offset-0 vc_col-lg-12 vc_col-md-offset-1 vc_col-md-10&#8243;][vc_raw_html]JTNDbWV0YSUyMGh0dHAtZXF1aXYlM0QlMjJyZWZyZXNoJTIyJTIwY29udGVudCUzRCUyMjAlM0IlMjB1cmwlM0RodHRwcyUzQSUyRiUyRmRhdHVtby5jb20lMkZlbiUyRmZhaXJuZXNzLWV0aGljcy13aHktaXMtZGl2ZXJzZS1kYXRhLWltcG9ydGFudC1mb3IteW91ci1hLWktbW9kZWxzJTJGJTIyJTNF[\/vc_raw_html]<div id=\"el1650442503491-f5da6b2f-fa35\" class=\"mb-3 text-left \"><h2 class=\"mb-32 pix-sliding-headline font-weight-bold secondary-font\" data-class=\"secondary-font text-heading-default\" data-style=\"\">Why is Diversity in a Dataset Important?<\/h2><\/div>[vc_column_text css=&#8221;.vc_custom_1655948933622{padding-top: 40px !important;}&#8221;]<\/p>\r\n<p style=\"text-align: left;\">We talked about having diverse data in one\u2019s dataset, but what exactly is diversity and why do we even need it? Diversity is basically the variety that you have in your data. At times, while trying to solve a problem through ML or AI models, the data that we collect can be too huge in quantity and that can have a severe impact on the performance of the model. What to do then? Well, you can definitely opt to cut down the data so that your model can process it faster, but what about all that valuable information that you will lose if you remove important data from your dataset? 
Surely that will reduce the accuracy of your model. The main question that now arises is: how can we find a middle ground, where the dataset is small enough for the model to process in a reasonable amount of time, yet varied enough to cover the full range of cases that the intended system will have to confront? The answer is simple:\u00a0<strong class=\"bn ml\">DIVERSITY<\/strong>.<\/p>\r\n<p>&nbsp;<\/p>\r\n<figure id=\"attachment_16432\" aria-describedby=\"caption-attachment-16432\" style=\"width: 400px\" class=\"wp-caption aligncenter\"><img fetchpriority=\"high\" decoding=\"async\" class=\"wp-image-16432 size-full\" src=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/0_q1pEm2a23h9YBwn9.jpeg\" alt=\"\" width=\"400\" height=\"225\" srcset=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/0_q1pEm2a23h9YBwn9.jpeg 400w, https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/0_q1pEm2a23h9YBwn9-300x169.jpeg 300w\" sizes=\"(max-width: 400px) 100vw, 400px\" \/><figcaption id=\"caption-attachment-16432\" class=\"wp-caption-text\">Diverse Ethnicity is Crucial for Face Recognition<\/figcaption><\/figure>\r\n<p>&nbsp;<\/p>\r\n<p>&nbsp;<\/p>\r\n<p style=\"text-align: left;\">When a dataset is too large, often the only practical way to do anything useful with it is to extract much smaller subsets and analyze those instead. The subsets, however, need to be diverse enough for the model to learn to handle all the different cases of the problem it is trying to solve. Using a diverse subset is far more practical than using a dataset with, say, a million data points, which can be impractical to process on a desktop computer. Take the example of a face recognition and classification model. 
If this model is trained on a dataset of images showing different faces of people, and for each person there are images taken from different angles, under varied lighting conditions, at varying distances from the camera lens, and against contrasting backgrounds, then the model is likely to classify faces more accurately than one trained on a dataset of thousands of near-identical images. In short, representative and diverse datasets are more likely to provide useful insights than those that do not cover all facets of the problem at hand.<\/p>\r\n<p>[\/vc_column_text][\/vc_column][\/vc_row][\/vc_section][vc_row pix_particles_check=&#8221;&#8221;][vc_column]<div id=\"el1650442607008-a85a832d-43f0\" class=\"w-100 d-block \"><\/div><div  class=\"pix-heading-el text-left \"><div><div class=\"slide-in-container\"><h2 class=\"text-heading-default font-weight-bold heading-text el-title_custom_color mb-12\" style=\"\" data-anim-type=\"\" data-anim-delay=\"0\">How to Introduce Diversity in a Dataset?<\/h2><\/div><\/div><\/div>[vc_column_text css=&#8221;.vc_custom_1655949443196{padding-top: 40px !important;}&#8221;]<\/p>\r\n<p id=\"0e32\" class=\"pw-post-body-paragraph zs zt yn bn b zu zv hk zw zx zy ho zz aba abb abc abd abe abf abg abh abi abj abk abl abm jn iz\" style=\"text-align: left;\" data-selectable-paragraph=\"\">Diversity in a dataset can be achieved in a number of ways. If you are collecting existing data, you can make it more diverse by gathering relevant data from several sources rather than a single one. Keeping in mind the context of the problem that the model needs to solve also helps you rule out unsuitable sources, leaving only a handful of genuinely relevant ones. 
There are many sources for open datasets that you can utilize.<\/p>\r\n<p data-selectable-paragraph=\"\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-16434\" src=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/0_0zV2KLnsfAte91Ou.jpeg\" alt=\"\" width=\"700\" height=\"350\" srcset=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/0_0zV2KLnsfAte91Ou.jpeg 700w, https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/0_0zV2KLnsfAte91Ou-300x150.jpeg 300w\" sizes=\"(max-width: 700px) 100vw, 700px\" \/><\/p>\r\n<p id=\"932b\" class=\"pw-post-body-paragraph xt xu wo bn b xv yz hk xx xy za ho ya yb zb yd ye yf zc yh yi yj zd yl ym yn jn iz\" data-selectable-paragraph=\"\">If you are gathering the data items on your own, e.g. you are taking pictures for an image classification problem, you can make sure that your dataset is as diverse and variable as possible by:<\/p>\r\n<ul class=\"\">\r\n<li id=\"c828\" class=\"zo zp wo bn b xv yz xy za yb zq yf zr yj zs yn aiu zu zv zw iz\" data-selectable-paragraph=\"\">Taking pictures at different angles.<\/li>\r\n<li id=\"be98\" class=\"zo zp wo bn b xv zx xy zy yb zz yf aba yj abb yn aiu zu zv zw iz\" data-selectable-paragraph=\"\">Taking pictures under different lighting conditions.<\/li>\r\n<li id=\"e7ca\" class=\"zo zp wo bn b xv zx xy zy yb zz yf aba yj abb yn aiu zu zv zw iz\" data-selectable-paragraph=\"\">Taking pictures at varying the distance of the camera lens from the object in question.<\/li>\r\n<li id=\"4fd1\" class=\"zo zp wo bn b xv zx xy zy yb zz yf aba yj abb yn aiu zu zv zw iz\" data-selectable-paragraph=\"\">Varying the object size and shape if possible and then taking pictures.<\/li>\r\n<li id=\"1cb6\" class=\"zo zp wo bn b xv zx xy zy yb zz yf aba yj abb yn aiu zu zv zw iz\" data-selectable-paragraph=\"\">Changing the background of the object in question and then taking pictures.<\/li>\r\n<li id=\"0089\" class=\"zo zp wo bn b xv zx xy zy yb zz yf aba yj abb yn 
aiu zu zv zw iz\" data-selectable-paragraph=\"\">In the case of a colored object, taking pictures of it in different colors.<\/li>\r\n<\/ul>\r\n<p>&nbsp;<\/p>\r\n<p id=\"239d\" class=\"pw-post-body-paragraph xt xu wo bn b xv yz hk xx xy za ho ya yb zb yd ye yf zc yh yi yj zd yl ym yn jn iz\" data-selectable-paragraph=\"\">The same concepts of diversity apply to datasets of other types and natures as well.<\/p>\r\n<p>&nbsp;<\/p>\r\n<p id=\"87cc\" class=\"pw-post-body-paragraph xt xu wo bn b xv yz hk xx xy za ho ya yb zb yd ye yf zc yh yi yj zd yl ym yn jn iz\" data-selectable-paragraph=\"\">If you want a diverse subset of a large dataset, one approach is to build a similarity matrix: a large table that maps every point in the dataset against every other point. The cell where the row for one data item meets the column for another holds the two points\u2019 similarity score on some standard measure. However, this approach can be quite time-consuming and resource-intensive, because a dataset of a million items produces a matrix with on the order of a trillion entries. You can instead use algorithms that build diverse subsets directly, e.g.\u00a0<a class=\"au mn\" href=\"https:\/\/news.mit.edu\/2020\/automating-search-entirely-new-curiosity-algorithms-0428\" target=\"_blank\" rel=\"noopener ugc nofollow\">MIT Researchers\u2019 Algorithm<\/a>. In this algorithm, a small subset of a much larger dataset is chosen at random, and the algorithm then randomly selects one point inside the subset and one point outside it. It then performs one of three simple operations, i.e. swapping the two points, adding the outside point to the subset, or deleting the inside point, choosing among them on the basis of a number of factors that include the size of the large set, the size of the subset itself, etc. 
This process continues until the subset is diverse enough to meet a certain measurable level.<\/p>\r\n<p>[\/vc_column_text]<div id=\"el1650294913061-211813f5-5f2d\" class=\"w-100 d-block \"><\/div>[\/vc_column][\/vc_row][vc_section full_width=&#8221;stretch_row&#8221; pix_over_visibility=&#8221;&#8221; css=&#8221;.vc_custom_1650444445523{padding-top: 80px !important;padding-bottom: 80px !important;background-color: #f8f9fa !important;}&#8221;][vc_row full_width=&#8221;stretch_row&#8221; pix_particles_check=&#8221;&#8221;][vc_column content_align=&#8221;text-center&#8221; offset=&#8221;vc_col-lg-offset-0 vc_col-lg-12 vc_col-md-offset-1 vc_col-md-10&#8243;]<div  class=\"pix-heading-el text-left \"><div><div class=\"slide-in-container\"><h2 class=\"text-heading-default font-weight-bold heading-text el-title_custom_color mb-12\" style=\"\" data-anim-type=\"\" data-anim-delay=\"0\">Fairness and Ethics<\/h2><\/div><\/div><\/div>[vc_column_text css=&#8221;.vc_custom_1655949516553{padding-top: 40px !important;}&#8221;]<\/p>\r\n<p style=\"text-align: left;\">As mentioned above, one way to obtain diverse data is to collect it from different sources. While doing so, however, it is important to keep fairness and ethics in mind. If you are collecting data from a website, for example, first ask the owner of the data for permission before using it for work or personal purposes. You can do so formally by emailing the owner or contacting them in some other way, rather than accessing the data without formal consent. 
You should also cite the sources from which you gathered the data, in your formal documentation or wherever else you can.<\/p>\r\n<p>&nbsp;<\/p>\r\n<figure id=\"attachment_16436\" aria-describedby=\"caption-attachment-16436\" style=\"width: 700px\" class=\"wp-caption aligncenter\"><img decoding=\"async\" class=\"wp-image-16436 size-full\" src=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/0_M5Oyxw6dV7BhbHwD.png\" alt=\"\" width=\"700\" height=\"476\" srcset=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/0_M5Oyxw6dV7BhbHwD.png 700w, https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/0_M5Oyxw6dV7BhbHwD-300x204.png 300w\" sizes=\"(max-width: 700px) 100vw, 700px\" \/><figcaption id=\"caption-attachment-16436\" class=\"wp-caption-text\">Make sure your algorithm is exposed to various colors and their overlaps so that it is fair and ethical<\/figcaption><\/figure>\r\n<p>[\/vc_column_text][\/vc_column][\/vc_row][\/vc_section][vc_row pix_particles_check=&#8221;&#8221;][vc_column]<div id=\"el1653971463480-ce74a014-4ae9\" class=\"w-100 d-block \"><\/div>[vc_column_text css=&#8221;.vc_custom_1655949611578{padding-top: 40px !important;padding-bottom: 0px !important;}&#8221;]By its nature, crowdsourcing is a very efficient way to bring diversity to your data. Here at <strong><a class=\"au mn\" href=\"https:\/\/www.datumo.com\" target=\"_blank\" rel=\"noopener ugc nofollow\"><em class=\"pn\">D<\/em><\/a><\/strong><a href=\"https:\/\/www.datumo.com\"><strong>ATUMO<\/strong><\/a>, we\u00a0<strong class=\"bn ml\">crowdsource<\/strong>\u00a0our tasks to a diverse pool of users around the world to ensure both quality and quantity. 
Moreover, our in-house managers double-check the quality of the collected or processed data!<\/p>\r\n<p>&nbsp;<\/p>\r\n<p id=\"1eca\" class=\"pw-post-body-paragraph xt xu wo bn b xv yz hk xx xy za ho ya yb zb yd ye yf zc yh yi yj zd yl ym yn jn iz\" data-selectable-paragraph=\"\">Creating and maintaining diversity in your dataset is not an easy task. Keeping track of everything mentioned above is quite a burden. For small- to medium-sized companies especially, managing the necessary people and technical expertise is very challenging. It is therefore often more efficient to find a service that does the laborious work (including both collection and preprocessing) for you. For that, we could be your perfect solution! Check us out at <a class=\"au mn\" href=\"https:\/\/datumo.com\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">datumo.com<\/a>\u00a0for more information! Let us be your HELP!<\/p>\r\n<p>[\/vc_column_text]<div id=\"el1653972293756-76a5ecd1-3d25\" class=\"w-100 d-block \"><\/div>[vc_column_text css=&#8221;.vc_custom_1655949554110{border-top-width: 1px !important;padding-top: 80px !important;padding-bottom: 0px !important;border-top-color: rgba(0,0,0,0.2) !important;border-top-style: solid !important;}&#8221;]To sum up, in this tutorial we started by discussing how important it is for a dataset to meet a certain standard of quality, and how diversity is one very important constituent of a good-quality dataset. Generally, a good dataset contains plentiful training data. Diversity in the training data provides the model with more discriminative information, which helps it predict results accurately. We then discussed ways to introduce diversity into a dataset, e.g. by collecting data from different sources or by using algorithms that derive a diverse subset from a large set of data. 
Lastly, we touched upon the ethics and code of conduct that should be adopted while introducing variability in one\u2019s dataset.[\/vc_column_text]<div id=\"el1653971463481-f4f34d7c-39ce\" class=\"w-100 d-block \"><\/div>[\/vc_column][\/vc_row][vc_row pix_particles_check=&#8221;&#8221;][vc_column width=&#8221;1\/2&#8243;]<div id=\"el1646794934167-c0c94dd3-ea74\" class=\"w-100 d-block \"><\/div><div class=\" mb-3 mb-md-0 \"  ><div class=\"card w-100 h-100 bg-white  vc_custom_1652982865548  pix-hover-item rounded-10 position-relative overflow-hidden2 text-white tilt fancy_card\" ><div class=\"card-img-overlay overflow-visible d-inline-block w-100 pix-img-overlay pix-p-30 d-flex align-items-end text-left\"><div class=\"w-100 \"><h3 class=\"card-title  text-black font-weight-bold mb-0 animate-in\" style=\"\">See what we can do for you.<\/h3><p class=\"card-text pix-pt-10 text-black \" style=\"\">Build smarter AI with us.<\/p><div class=\"card-btn-div mt-4 d-inline-block w-100\"><a  href=\"https:\/\/datumo.com\" class=\"btn mb-2     text-white btn-black d-inline-block      btn-md\" target=\"_blank\" rel=\"noopener\"    ><span class=\"font-weight-bold \" >Learn More<\/span><\/a><\/div><\/div><\/div><\/div><\/div>[\/vc_column][vc_column width=&#8221;1\/2&#8243;]<div id=\"el1646794982519-9a19190b-7fde\" class=\"w-100 d-block \"><\/div><div class=\" mb-3 mb-md-0 \"  ><div class=\"card w-100 h-100 bg-black  vc_custom_1653971438710  pix-hover-item rounded-10 position-relative overflow-hidden2 text-white tilt fancy_card\" ><div class=\"card-img-overlay overflow-visible d-inline-block w-100 pix-img-overlay pix-p-30 d-flex align-items-end text-left\"><div class=\"w-100 \"><h3 class=\"card-title  text-white font-weight-bold mb-0 animate-in\" style=\"\">We would like to support the AI industry by sharing.<\/h3><p class=\"card-text pix-pt-10 text-white \" style=\"\"><\/p><div class=\"card-btn-div mt-4 d-inline-block w-100\"><a  href=\"https:\/\/open.datumo.com\/en\" class=\"btn 
mb-2    vc_custom_1653971438714  btn-primary d-inline-block      btn-md\" target=\"_blank\" rel=\"noopener\"    ><span class=\"font-weight-bold \" >Download Open Datasets<\/span><\/a><\/div><\/div><\/div><\/div><\/div>[\/vc_column][\/vc_row][vc_row pix_particles_check=&#8221;&#8221;][vc_column]<div id=\"el1646799961152-e3ee06c0-4e82\" class=\"w-100 d-block \"><\/div>[\/vc_column][\/vc_row]<\/p>\r\n","protected":false},"excerpt":{"rendered":"[vc_row pix_particles_check=&#8221;&#8221;][vc_column][\/vc_column][\/vc_row][vc_row pix_particles_check=&#8221;&#8221;][vc_column][vc_column_text css=&#8221;.vc_custom_1655948827091{padding-top: 40px !important;padding-right: 20px !important;padding-bottom: 40px !important;padding-left: 20px !important;}&#8221;]We are all well aware that to effectively use an ML or AI model to solve a specific problem, it is crucial to have high-quality training data for the&#8230;","protected":false},"author":1,"featured_media":16500,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[131],"tags":[26,143,208,207,150],"class_list":["post-16430","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tech","tag-ai","tag-dataset","tag-daumo","tag-diverse-data","tag-machine-learning"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v23.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Fairness? Ethics? Why is Diverse Data Important for Your A.I. Models? 
- DATUMO<\/title>\n<meta name=\"description\" content=\"No matter how efficient or accurate the model is, if it is provided with and trained upon a poor quality dataset, it will never produce the desired or correct output.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/blog.datumo.com\/en\/tech\/16430\" \/>\n<meta property=\"og:locale\" content=\"ko_KR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Fairness? Ethics? Why is Diverse Data Important for Your A.I. Models?\" \/>\n<meta property=\"og:description\" content=\"No matter how efficient or accurate the model is, if it is provided with and trained upon a poor quality dataset, it will never produce the desired or correct output.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/blog.datumo.com\/en\/tech\/16430\" \/>\n<meta property=\"og:site_name\" content=\"DATUMO\" \/>\n<meta property=\"article:published_time\" content=\"2022-06-23T01:46:18+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-10-22T09:01:29+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/milad-fakurian-uklSGxYlp64-unsplash.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1920\" \/>\n\t<meta property=\"og:image:height\" content=\"1080\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"DATUMO\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:title\" content=\"Fairness? Ethics? Why is Diverse Data Important for Your A.I. 
Models?\" \/>\n<meta name=\"twitter:description\" content=\"No matter how efficient or accurate the model is, if it is provided with and trained upon a poor quality dataset, it will never produce the desired or correct output.\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/milad-fakurian-uklSGxYlp64-unsplash.jpg\" \/>\n<meta name=\"twitter:label1\" content=\"\uae00\uc4f4\uc774\" \/>\n\t<meta name=\"twitter:data1\" content=\"DATUMO\" \/>\n\t<meta name=\"twitter:label2\" content=\"\uc608\uc0c1 \ub418\ub294 \ud310\ub3c5 \uc2dc\uac04\" \/>\n\t<meta name=\"twitter:data2\" content=\"10\ubd84\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"TechArticle\",\"@id\":\"https:\/\/blog.datumo.com\/en\/tech\/16430#article\",\"isPartOf\":{\"@id\":\"https:\/\/blog.datumo.com\/en\/tech\/16430\"},\"author\":{\"name\":\"DATUMO\",\"@id\":\"https:\/\/blog.datumo.com\/#\/schema\/person\/02ec2d0ba953b146878dab089dc735b6\"},\"headline\":\"Fairness? Ethics? Why is Diverse Data Important for Your A.I. Models?\",\"datePublished\":\"2022-06-23T01:46:18+00:00\",\"dateModified\":\"2024-10-22T09:01:29+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/blog.datumo.com\/en\/tech\/16430\"},\"wordCount\":2185,\"publisher\":{\"@id\":\"https:\/\/blog.datumo.com\/#organization\"},\"image\":{\"@id\":\"https:\/\/blog.datumo.com\/en\/tech\/16430#primaryimage\"},\"thumbnailUrl\":\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/milad-fakurian-uklSGxYlp64-unsplash.jpg\",\"keywords\":[\"AI\",\"Dataset\",\"daumo\",\"diverse data\",\"machine learning\"],\"articleSection\":[\"tech\"],\"inLanguage\":\"ko-KR\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/blog.datumo.com\/en\/tech\/16430\",\"url\":\"https:\/\/blog.datumo.com\/en\/tech\/16430\",\"name\":\"Fairness? Ethics? Why is Diverse Data Important for Your A.I. Models? 
- DATUMO\",\"isPartOf\":{\"@id\":\"https:\/\/blog.datumo.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/blog.datumo.com\/en\/tech\/16430#primaryimage\"},\"image\":{\"@id\":\"https:\/\/blog.datumo.com\/en\/tech\/16430#primaryimage\"},\"thumbnailUrl\":\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/milad-fakurian-uklSGxYlp64-unsplash.jpg\",\"datePublished\":\"2022-06-23T01:46:18+00:00\",\"dateModified\":\"2024-10-22T09:01:29+00:00\",\"description\":\"No matter how efficient or accurate the model is, if it is provided with and trained upon a poor quality dataset, it will never produce the desired or correct output.\",\"breadcrumb\":{\"@id\":\"https:\/\/blog.datumo.com\/en\/tech\/16430#breadcrumb\"},\"inLanguage\":\"ko-KR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/blog.datumo.com\/en\/tech\/16430\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/blog.datumo.com\/en\/tech\/16430#primaryimage\",\"url\":\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/milad-fakurian-uklSGxYlp64-unsplash.jpg\",\"contentUrl\":\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/milad-fakurian-uklSGxYlp64-unsplash.jpg\",\"width\":1920,\"height\":1080},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/blog.datumo.com\/en\/tech\/16430#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/blog.datumo.com\/en\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Fairness? Ethics? Why is Diverse Data Important for Your A.I. 
Models?\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/blog.datumo.com\/#website\",\"url\":\"https:\/\/blog.datumo.com\/\",\"name\":\"DATUMO\",\"description\":\"The Data for Smarter AI\",\"publisher\":{\"@id\":\"https:\/\/blog.datumo.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/blog.datumo.com\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"ko-KR\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/blog.datumo.com\/#organization\",\"name\":\"DATUMO\",\"url\":\"https:\/\/blog.datumo.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/blog.datumo.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/2.1.webp\",\"contentUrl\":\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/2.1.webp\",\"width\":1080,\"height\":600,\"caption\":\"DATUMO\"},\"image\":{\"@id\":\"https:\/\/blog.datumo.com\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/blog.datumo.com\/#\/schema\/person\/02ec2d0ba953b146878dab089dc735b6\",\"name\":\"DATUMO\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/blog.datumo.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/1942a8a63e1c8fa0d9be56cda789edd6c0a866259cd5dca24952597ffa8bab3d?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/1942a8a63e1c8fa0d9be56cda789edd6c0a866259cd5dca24952597ffa8bab3d?s=96&d=mm&r=g\",\"caption\":\"DATUMO\"},\"description\":\"DATUMO, The Data for Smarter AI. We seek to drive impact in the world by providing diverse and high quality data to build smarter AI.\",\"sameAs\":[\"https:\/\/blog.datumo.com\/en\"],\"url\":\"https:\/\/blog.datumo.com\/en\/author\/selectstar\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Fairness? Ethics? 
Why is Diverse Data Important for Your A.I. Models? - DATUMO","description":"No matter how efficient or accurate the model is, if it is provided with and trained upon a poor quality dataset, it will never produce the desired or correct output.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/blog.datumo.com\/en\/tech\/16430","og_locale":"ko_KR","og_type":"article","og_title":"Fairness? Ethics? Why is Diverse Data Important for Your A.I. Models?","og_description":"No matter how efficient or accurate the model is, if it is provided with and trained upon a poor quality dataset, it will never produce the desired or correct output.","og_url":"https:\/\/blog.datumo.com\/en\/tech\/16430","og_site_name":"DATUMO","article_published_time":"2022-06-23T01:46:18+00:00","article_modified_time":"2024-10-22T09:01:29+00:00","og_image":[{"width":1920,"height":1080,"url":"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/milad-fakurian-uklSGxYlp64-unsplash.jpg","type":"image\/jpeg"}],"author":"DATUMO","twitter_card":"summary_large_image","twitter_title":"Fairness? Ethics? Why is Diverse Data Important for Your A.I. 
Models?","twitter_description":"No matter how efficient or accurate the model is, if it is provided with and trained upon a poor quality dataset, it will never produce the desired or correct output.","twitter_image":"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/milad-fakurian-uklSGxYlp64-unsplash.jpg","twitter_misc":{"\uae00\uc4f4\uc774":"DATUMO","\uc608\uc0c1 \ub418\ub294 \ud310\ub3c5 \uc2dc\uac04":"10\ubd84"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"TechArticle","@id":"https:\/\/blog.datumo.com\/en\/tech\/16430#article","isPartOf":{"@id":"https:\/\/blog.datumo.com\/en\/tech\/16430"},"author":{"name":"DATUMO","@id":"https:\/\/blog.datumo.com\/#\/schema\/person\/02ec2d0ba953b146878dab089dc735b6"},"headline":"Fairness? Ethics? Why is Diverse Data Important for Your A.I. Models?","datePublished":"2022-06-23T01:46:18+00:00","dateModified":"2024-10-22T09:01:29+00:00","mainEntityOfPage":{"@id":"https:\/\/blog.datumo.com\/en\/tech\/16430"},"wordCount":2185,"publisher":{"@id":"https:\/\/blog.datumo.com\/#organization"},"image":{"@id":"https:\/\/blog.datumo.com\/en\/tech\/16430#primaryimage"},"thumbnailUrl":"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/milad-fakurian-uklSGxYlp64-unsplash.jpg","keywords":["AI","Dataset","daumo","diverse data","machine learning"],"articleSection":["tech"],"inLanguage":"ko-KR"},{"@type":"WebPage","@id":"https:\/\/blog.datumo.com\/en\/tech\/16430","url":"https:\/\/blog.datumo.com\/en\/tech\/16430","name":"Fairness? Ethics? Why is Diverse Data Important for Your A.I. Models? 
- DATUMO","isPartOf":{"@id":"https:\/\/blog.datumo.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/blog.datumo.com\/en\/tech\/16430#primaryimage"},"image":{"@id":"https:\/\/blog.datumo.com\/en\/tech\/16430#primaryimage"},"thumbnailUrl":"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/milad-fakurian-uklSGxYlp64-unsplash.jpg","datePublished":"2022-06-23T01:46:18+00:00","dateModified":"2024-10-22T09:01:29+00:00","description":"No matter how efficient or accurate the model is, if it is provided with and trained upon a poor quality dataset, it will never produce the desired or correct output.","breadcrumb":{"@id":"https:\/\/blog.datumo.com\/en\/tech\/16430#breadcrumb"},"inLanguage":"ko-KR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/blog.datumo.com\/en\/tech\/16430"]}]},{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/blog.datumo.com\/en\/tech\/16430#primaryimage","url":"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/milad-fakurian-uklSGxYlp64-unsplash.jpg","contentUrl":"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/milad-fakurian-uklSGxYlp64-unsplash.jpg","width":1920,"height":1080},{"@type":"BreadcrumbList","@id":"https:\/\/blog.datumo.com\/en\/tech\/16430#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/blog.datumo.com\/en\/"},{"@type":"ListItem","position":2,"name":"Fairness? Ethics? Why is Diverse Data Important for Your A.I. 
Models?"}]},{"@type":"WebSite","@id":"https:\/\/blog.datumo.com\/#website","url":"https:\/\/blog.datumo.com\/","name":"DATUMO","description":"The Data for Smarter AI","publisher":{"@id":"https:\/\/blog.datumo.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/blog.datumo.com\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"ko-KR"},{"@type":"Organization","@id":"https:\/\/blog.datumo.com\/#organization","name":"DATUMO","url":"https:\/\/blog.datumo.com\/","logo":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/blog.datumo.com\/#\/schema\/logo\/image\/","url":"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/2.1.webp","contentUrl":"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/2.1.webp","width":1080,"height":600,"caption":"DATUMO"},"image":{"@id":"https:\/\/blog.datumo.com\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/blog.datumo.com\/#\/schema\/person\/02ec2d0ba953b146878dab089dc735b6","name":"DATUMO","image":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/blog.datumo.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/1942a8a63e1c8fa0d9be56cda789edd6c0a866259cd5dca24952597ffa8bab3d?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/1942a8a63e1c8fa0d9be56cda789edd6c0a866259cd5dca24952597ffa8bab3d?s=96&d=mm&r=g","caption":"DATUMO"},"description":"DATUMO, The Data for Smarter AI. 
We seek to drive impact in the world by providing diverse and high quality data to build smarter AI.","sameAs":["https:\/\/blog.datumo.com\/en"],"url":"https:\/\/blog.datumo.com\/en\/author\/selectstar"}]}},"_links":{"self":[{"href":"https:\/\/blog.datumo.com\/en\/wp-json\/wp\/v2\/posts\/16430","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.datumo.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.datumo.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.datumo.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.datumo.com\/en\/wp-json\/wp\/v2\/comments?post=16430"}],"version-history":[{"count":11,"href":"https:\/\/blog.datumo.com\/en\/wp-json\/wp\/v2\/posts\/16430\/revisions"}],"predecessor-version":[{"id":16929,"href":"https:\/\/blog.datumo.com\/en\/wp-json\/wp\/v2\/posts\/16430\/revisions\/16929"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.datumo.com\/en\/wp-json\/wp\/v2\/media\/16500"}],"wp:attachment":[{"href":"https:\/\/blog.datumo.com\/en\/wp-json\/wp\/v2\/media?parent=16430"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.datumo.com\/en\/wp-json\/wp\/v2\/categories?post=16430"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.datumo.com\/en\/wp-json\/wp\/v2\/tags?post=16430"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}