{"id":16282,"date":"2022-06-21T05:32:57","date_gmt":"2022-06-21T05:32:57","guid":{"rendered":"https:\/\/blog.datumo.com\/en\/?p=16282"},"modified":"2022-07-05T06:26:32","modified_gmt":"2022-07-05T06:26:32","slug":"creating-the-best-quality-image-dataset","status":"publish","type":"post","link":"https:\/\/blog.datumo.com\/en\/tech\/16282","title":{"rendered":"Creating the Best Quality Image Dataset"},"content":{"rendered":"<p>[vc_row pix_particles_check=&#8221;&#8221;][vc_column]<div id=\"el1646799961152-e3ee06c0-4e82\" class=\"w-100 d-block \"><\/div><div class=\"pix-content-box card      vc_custom_1654577545529 custom-responsive-73331278   rounded-lg bg- w-100  \"   ><div class=\"\" style=\"z-index:30;position:relative;\">[vc_column_text]<\/p>\n<p style=\"text-align: left;\"><span style=\"font-size: 14pt;\"><strong>\ud83d\udd11<\/strong> <strong>In 5 minutes you will learn:<\/strong><\/span><\/p>\n<p>&nbsp;<\/p>\n<ul>\n<li>How to create a quality image dataset<\/li>\n<li>Things to consider when creating a quality image dataset<\/li>\n<\/ul>\n<p>&nbsp;[\/vc_column_text]<\/div><\/div>[\/vc_column][\/vc_row][vc_row pix_particles_check=&#8221;&#8221;][vc_column]<div id=\"el1650294698986-a1b962b5-ef42\" class=\"w-100 d-block \"><\/div>[vc_column_text css=&#8221;.vc_custom_1655958874907{padding-top: 40px !important;padding-right: 20px !important;padding-bottom: 40px !important;padding-left: 20px !important;}&#8221;]<\/p>\n<p id=\"4f08\" class=\"pw-post-body-paragraph xt xu wo bn b xv xw hk xx xy xz ho ya yb yc yd ye yf yg yh yi yj yk yl ym yn jn iz\" data-selectable-paragraph=\"\">Building a model is only part of solving a specific problem. It is equally important to have a good-quality dataset for the problem at hand, because no matter how efficient or accurate your model is, it will never produce the desired output if it is given the wrong dataset.<\/p>\n<p data-selectable-paragraph=\"\">\n<p id=\"c539\" class=\"pw-post-body-paragraph xt xu wo bn b xv yo hk xx xy yp 
ho ya yb yq yd ye yf yr yh yi yj ys yl ym yn jn iz\" data-selectable-paragraph=\"\">A good dataset is crucial in achieving the highest possible accuracy of your model. It is also important that the dataset is processed in such a way that our model can make complete sense of the information. That way, the model can successfully learn from that dataset. Thus, the goal of our tutorial is to discuss ways to gather a dataset of raw images and then filter out the images to create the best possible dataset for image classification\/computer vision projects. So let&#8217;s begin!<\/p>\n<p>[\/vc_column_text][\/vc_column][\/vc_row][vc_section full_width=&#8221;stretch_row&#8221; pix_over_visibility=&#8221;&#8221; css=&#8221;.vc_custom_1650444445523{padding-top: 80px !important;padding-bottom: 80px !important;background-color: #f8f9fa !important;}&#8221; el_id=&#8221;pix_section_program&#8221;][vc_row full_width=&#8221;stretch_row&#8221; pix_particles_check=&#8221;&#8221;][vc_column content_align=&#8221;text-center&#8221; offset=&#8221;vc_col-lg-offset-0 vc_col-lg-12 vc_col-md-offset-1 vc_col-md-10&#8243;]<div id=\"el1650442503491-f5da6b2f-fa35\" class=\"mb-3 text-left \"><h2 class=\"mb-32 pix-sliding-headline font-weight-bold secondary-font\" data-class=\"secondary-font text-heading-default\" data-style=\"\">Step 1: Planning<\/h2><\/div>[vc_column_text css=&#8221;.vc_custom_1655792863085{padding-top: 40px !important;padding-bottom: 40px !important;}&#8221;]<\/p>\n<p style=\"text-align: left;\">In the planning phase, before actually collecting the images, you must assess the context of the problem that you need to solve and then choose the best possible way to build a dataset for that problem. For example, there are many sources for open datasets that you can utilize if you are doing a common image classification project. Similarly, you can also take the pictures on your own or download them from a source. 
We\u2019ll discuss both of these ways of gathering images for a dataset.<\/p>\n<p>[\/vc_column_text][\/vc_column][\/vc_row][\/vc_section][vc_row pix_particles_check=&#8221;&#8221;][vc_column]<div id=\"el1650442607008-a85a832d-43f0\" class=\"w-100 d-block \"><\/div><div  class=\"pix-heading-el text-left \"><div><div class=\"slide-in-container\"><h2 class=\"text-heading-default font-weight-bold heading-text el-title_custom_color mb-12\" style=\"\" data-anim-type=\"\" data-anim-delay=\"0\">Step 2: Gathering Images<\/h2><\/div><\/div><\/div><div id=\"el1655790636245-77fa25f7-1b1d\" class=\"w-100 d-block \"><\/div>[vc_column_text css=&#8221;.vc_custom_1655790693179{padding-top: 40px !important;}&#8221;]<\/p>\n<h4 id=\"f539\" class=\"ahe xl wo bn xm jt ahf ju hm jx ahg jy hq yb ahh ack hu yf ahi acm hy yj ahj aco ic ahk iz\"><strong>Taking Pictures on your own:<\/strong><\/h4>\n<p>&nbsp;<\/p>\n<p id=\"942e\" class=\"pw-post-body-paragraph xt xu wo bn b xv xw hk xx xy xz ho ya yb yc yd ye yf yg yh yi yj yk yl ym yn jn iz\" data-selectable-paragraph=\"\">While taking pictures, keep the following pointers in mind so that your training dataset is as varied and diverse as possible:<\/p>\n<ol class=\"\">\n<li id=\"e281\" class=\"ahl ahm wo bn b xv yo xy yp yb ahn yf aho yj ahp yn ahq ahr ahs aht iz\" data-selectable-paragraph=\"\">Take pictures of the object to be classified from different angles<\/li>\n<li id=\"756b\" class=\"ahl ahm wo bn b xv ahu xy ahv yb ahw yf ahx yj ahy yn ahq ahr ahs aht iz\" data-selectable-paragraph=\"\">Change your lighting conditions<\/li>\n<li id=\"70c5\" class=\"ahl ahm wo bn b xv ahu xy ahv yb ahw yf ahx yj ahy yn ahq ahr ahs aht iz\" data-selectable-paragraph=\"\">Change the object size<\/li>\n<li id=\"9cd6\" class=\"ahl ahm wo bn b xv ahu xy ahv yb ahw yf ahx yj ahy yn ahq ahr ahs aht iz\" data-selectable-paragraph=\"\">Vary the distance of your camera from the object<\/li>\n<li id=\"f88d\" class=\"ahl ahm wo bn b xv 
ahu xy ahv yb ahw yf ahx yj ahy yn ahq ahr ahs aht iz\" data-selectable-paragraph=\"\">Vary the background of the object<\/li>\n<li id=\"5bd1\" class=\"ahl ahm wo bn b xv ahu xy ahv yb ahw yf ahx yj ahy yn ahq ahr ahs aht iz\" data-selectable-paragraph=\"\">Take good-quality images that are in focus<\/li>\n<li id=\"2736\" class=\"ahl ahm wo bn b xv ahu xy ahv yb ahw yf ahx yj ahy yn ahq ahr ahs aht iz\" data-selectable-paragraph=\"\">If the object comes in different colors, take images of each color<\/li>\n<\/ol>\n<p id=\"afe1\" class=\"pw-post-body-paragraph xt xu wo bn b xv yo hk xx xy yp ho ya yb yq yd ye yf yr yh yi yj ys yl ym yn jn iz\" data-selectable-paragraph=\"\">Following these pointers will help to ensure that your image dataset is as realistic as possible. Training on such images tends to give good performance, because a more diverse dataset generally leads to higher accuracy on real-world inputs.<\/p>\n<p>[\/vc_column_text]<div id=\"el1650442651668-7359ff25-270a\" class=\"w-100 d-block \"><\/div>[vc_column_text css=&#8221;.vc_custom_1655958935440{padding-top: 40px !important;}&#8221;]<\/p>\n<h4 id=\"5270\" class=\"ahe xl wo bn xm jt ahf ju hm jx ahg jy hq yb ahh ack hu yf ahi acm hy yj ahj aco ic ahk iz\"><strong>Downloading images through the Fatkun Batch Download Image extension:<\/strong><\/h4>\n<h4><\/h4>\n<p>&nbsp;<\/p>\n<p class=\"ahe xl wo bn xm jt ahf ju hm jx ahg jy hq yb ahh ack hu yf ahi acm hy yj ahj aco ic ahk iz\">Pre-requisites:<\/p>\n<ol class=\"\">\n<li id=\"90d2\" class=\"ahl ahm wo bn b xv yo xy yp yb ahn yf aho yj ahp yn ahq ahr ahs aht iz\" data-selectable-paragraph=\"\"><strong class=\"bn ml\">Google Chrome Browser<\/strong>. 
If you do not have it installed yet, you can download it\u00a0<a class=\"au mn\" href=\"https:\/\/www.google.com\/chrome\/?brand=CHBD&amp;gclsrc=aw.ds&amp;&amp;gclid=EAIaIQobChMI_LejjK-Y6AIVw9DeCh0MowmOEAAYASAAEgJsZvD_BwE\" target=\"_blank\" rel=\"noopener ugc nofollow\">here<\/a>.<\/li>\n<li id=\"f705\" class=\"ahl ahm wo bn b xv ahu xy ahv yb ahw yf ahx yj ahy yn ahq ahr ahs aht iz\" data-selectable-paragraph=\"\"><strong class=\"bn ml\">Fatkun Batch Download Image<\/strong>. If you do not have it installed yet, you can download it\u00a0<a class=\"au mn\" href=\"https:\/\/chrome.google.com\/webstore\/detail\/fatkun-batch-download-ima\/nnjjahlikiabnchcpehcpkdeckfgnohf\/related?hl=en\" target=\"_blank\" rel=\"noopener ugc nofollow\">here<\/a>.<\/li>\n<\/ol>\n<p>&nbsp;<\/p>\n<p id=\"7037\" class=\"pw-post-body-paragraph xt xu wo bn b xv yo hk xx xy yp ho ya yb yq yd ye yf yr yh yi yj ys yl ym yn jn iz\" data-selectable-paragraph=\"\">Steps:<\/p>\n<ol class=\"\">\n<li id=\"7f7e\" class=\"ahl ahm wo bn b xv yo xy yp yb ahn yf aho yj ahp yn ahq ahr ahs aht iz\" data-selectable-paragraph=\"\">After you have finished the installation, open the website that contains the pictures you want to download.<\/li>\n<li id=\"dd30\" class=\"ahl ahm wo bn b xv ahu xy ahv yb ahw yf ahx yj ahy yn ahq ahr ahs aht iz\" data-selectable-paragraph=\"\">Click the extension\u2019s icon and choose whether to collect images from the current tab or from all open tabs.<\/li>\n<li id=\"b83b\" class=\"ahl ahm wo bn b xv ahu xy ahv yb ahw yf ahx yj ahy yn ahq ahr ahs aht iz\" data-selectable-paragraph=\"\">The extension then opens a new tab showing all the images it has detected. By default, every picture that appears on the extension\u2019s tab is selected for download. 
Once you have made your selection, click \u2018save image\u2019.<\/li>\n<li id=\"29ca\" class=\"ahl ahm wo bn b xv ahu xy ahv yb ahw yf ahx yj ahy yn ahq ahr ahs aht iz\" data-selectable-paragraph=\"\">The extension now displays a warning and asks where to save each file before it is downloaded, and you have to confirm each image.<\/li>\n<\/ol>\n<p>&nbsp;<\/p>\n<p>This way, you can download the images automatically. The extension creates a new folder named after the title of the website and saves all the selected images there. You can also click\u00a0<code class=\"hh ajr ajs ajt aig b\">more options<\/code>\u00a0to filter the images, rename them, and sort them by size.[\/vc_column_text]<div id=\"el1650294913061-211813f5-5f2d\" class=\"w-100 d-block \"><\/div>[\/vc_column][\/vc_row][vc_section full_width=&#8221;stretch_row&#8221; pix_over_visibility=&#8221;&#8221; css=&#8221;.vc_custom_1650444445523{padding-top: 80px !important;padding-bottom: 80px !important;background-color: #f8f9fa !important;}&#8221;][vc_row full_width=&#8221;stretch_row&#8221; pix_particles_check=&#8221;&#8221;][vc_column content_align=&#8221;text-center&#8221; offset=&#8221;vc_col-lg-offset-0 vc_col-lg-12 vc_col-md-offset-1 vc_col-md-10&#8243;]<div  class=\"pix-heading-el text-left \"><div><div class=\"slide-in-container\"><h2 class=\"text-heading-default font-weight-bold heading-text el-title_custom_color mb-12\" style=\"\" data-anim-type=\"\" data-anim-delay=\"0\">Step 3: Image Filtering<\/h2><\/div><\/div><\/div>[vc_column_text css=&#8221;.vc_custom_1655959082117{padding-top: 40px !important;padding-right: 20px !important;padding-bottom: px !important;padding-left: 20px !important;}&#8221;]<\/p>\n<p id=\"5840\" class=\"pw-post-body-paragraph xt xu wo bn b xv xw hk xx xy xz ho ya yb yc yd ye yf yg yh yi yj yk yl ym yn jn iz\" style=\"text-align: left;\" 
data-selectable-paragraph=\"\">After downloading images in bulk, you will likely notice at first glance that some of them are unclear, low in resolution, irrelevant, or duplicates of other images. It is therefore important to remove such images first so that you can construct the best possible image dataset for your model.<\/p>\n<p>&nbsp;<\/p>\n<h5 style=\"text-align: left;\"><strong>Deleting image duplicates<\/strong><\/h5>\n<p>&nbsp;<\/p>\n<p id=\"0de9\" class=\"pw-post-body-paragraph xt xu wo bn b xv yo hk xx xy yp ho ya yb yq yd ye yf yr yh yi yj ys yl ym yn jn iz\" style=\"text-align: left;\" data-selectable-paragraph=\"\">While constructing an image dataset, quality should take clear precedence over quantity. If there appear to be a lot of exact duplicates, you should filter them out using something like a ResNet-18: extract a feature vector for each image and remove the images whose feature vectors duplicate one another. This is not very practical for large datasets, but the idea is that duplicated images may allow models to cheat on performance metrics if they end up in both the train and test splits, so it is good to reduce them as much as possible. Perceptual hashing (pHash) is another potential duplicate-removal method. 
The drawback of this technique is that it sometimes falsely flags non-duplicate images as duplicates, because the images are resized down to a very small size and converted to grayscale before hashing, so you trade some precision for speed.<\/p>\n<p>&nbsp;<\/p>\n<h5 style=\"text-align: left;\"><strong>Deleting very small images<\/strong><\/h5>\n<p>&nbsp;<\/p>\n<p id=\"dd92\" class=\"pw-post-body-paragraph xt xu wo bn b xv yo hk xx xy yp ho ya yb yq yd ye yf yr yh yi yj ys yl ym yn jn iz\" style=\"text-align: left;\" data-selectable-paragraph=\"\">It is important to remove very small images from the image set, since they carry very little information and are mostly of poor quality. It is good practice to set a reasonable threshold for image size, so that an image whose size lies below the threshold, or far above it, is treated as a candidate for removal. For example, most image models take images between 224&#215;224 and 512&#215;512 pixels, so a threshold in that range helps to cut out the low-quality images that you may have downloaded. This process can also be automated with a Python script, which saves time and makes the filtering more consistent.<\/p>\n<p>&nbsp;<\/p>\n<h5 style=\"text-align: left;\"><strong>Manual Pruning<\/strong><\/h5>\n<p>&nbsp;<\/p>\n<p id=\"b447\" class=\"pw-post-body-paragraph xt xu wo bn b xv yo hk xx xy yp ho ya yb yq yd ye yf yr yh yi yj ys yl ym yn jn iz\" style=\"text-align: left;\" data-selectable-paragraph=\"\">Manual pruning is not the most practical step, but it is at times unavoidable if you want to improve the quality of your dataset: no matter how much of the image cleaning you automate, you cannot beat the human eye when it comes to judging image quality. The goal is to remove low-quality or irrelevant images from the different classes in your dataset. 
The quality of this step carries over to the quality of the final model and its classes, so it is recommended to be very aggressive in deleting images if you want well-defined classes.<\/p>\n<p>[\/vc_column_text][\/vc_column][\/vc_row][\/vc_section][vc_row pix_particles_check=&#8221;&#8221;][vc_column]<div id=\"el1650362147064-486b7dc2-a9b3\" class=\"w-100 d-block \"><\/div><div id=\"el1650450433074-0be5e40e-928e\" class=\"w-100 d-block \"><\/div><div  class=\"pix-heading-el text-left \"><div><div class=\"slide-in-container\"><h2 class=\"text-heading-default font-weight-bold heading-text el-title_custom_color mb-12\" style=\"\" data-anim-type=\"\" data-anim-delay=\"0\">Some Other Key Pointers<\/h2><\/div><\/div><\/div><div id=\"el1650362652282-42ee7789-aa09\" class=\"w-100 d-block \"><\/div>[vc_column_text css=&#8221;.vc_custom_1655791923245{padding-top: 40px !important;}&#8221;]<\/p>\n<h4 id=\"8f76\" class=\"ahe xl wo bn xm jt ahf ju hm jx ahg jy hq yb ahh ack hu yf ahi acm hy yj ahj aco ic ahk iz\"><strong>Amount of Data<\/strong><\/h4>\n<p>&nbsp;<\/p>\n<p class=\"ahe xl wo bn xm jt ahf ju hm jx ahg jy hq yb ahh ack hu yf ahi acm hy yj ahj aco ic ahk iz\">Choosing the right amount of data, i.e., the number of images that you use to train your model, is also an important factor to consider. As a rough rule of thumb, Machine Learning projects should have at least 10 times the number of features per class, and Deep Learning projects at least 100 times the number of features per class.<\/p>\n<p>&nbsp;<\/p>\n<h4 id=\"c91a\" class=\"ahe xl wo bn xm jt ahf ju hm jx ahg jy hq yb ahh ack hu yf ahi acm hy yj ahj aco ic ahk iz\"><strong>The sample quantities should be balanced among classes<\/strong><\/h4>\n<p>&nbsp;<\/p>\n<ol class=\"\">\n<li id=\"fc53\" class=\"ahl ahm wo bn b xv xw xy xz yb aia yf aib yj aic yn ahq ahr ahs aht iz\" data-selectable-paragraph=\"\">The samples should represent the real situations in which you are going to apply your model. 
For instance, if you are training a face classifier for use in situations where faces are smaller than 30&#215;30 pixels, your training set should include low-quality, low-resolution face images.<\/li>\n<li id=\"d51c\" class=\"ahl ahm wo bn b xv ahu xy ahv yb ahw yf ahx yj ahy yn ahq ahr ahs aht iz\" data-selectable-paragraph=\"\">Samples should be as varied as possible. For instance, in face classification, your dataset should include faces of people of different ages, ethnicities, and genders, under different illumination conditions and in different in-plane and out-of-plane orientations.<\/li>\n<\/ol>\n<p>[\/vc_column_text]<div id=\"el1655791931196-a000864c-6813\" class=\"w-100 d-block \"><\/div>[\/vc_column][\/vc_row][vc_row pix_particles_check=&#8221;&#8221;][vc_column]<div id=\"el1653971463480-ce74a014-4ae9\" class=\"w-100 d-block \"><\/div>[vc_column_text css=&#8221;.vc_custom_1655792312178{padding-top: 40px !important;padding-bottom: 0px !important;}&#8221;]<\/p>\n<p id=\"db30\" class=\"pw-post-body-paragraph xt xu wo bn b xv xw hk xx xy xz ho ya yb yc yd ye yf yg yh yi yj yk yl ym yn jn iz\" data-selectable-paragraph=\"\">Quality control within a company is often quite a burden, especially if your company is small or medium-sized; having enough human resources is always a great challenge for companies of that size. It is therefore often more efficient to find a service that does the laborious work for you. We could be your perfect solution!<\/p>\n<p id=\"9460\" class=\"pw-post-body-paragraph xt xu wo bn b xv yo hk xx xy yp ho ya yb yq yd ye yf yr yh yi yj ys yl ym yn jn iz\" data-selectable-paragraph=\"\">Here at <a class=\"au mn\" href=\"https:\/\/www.datumo.com\" target=\"_blank\" rel=\"noopener ugc nofollow\"><strong class=\"bn ml\"><em class=\"pn\">DATUMO<\/em><\/strong><\/a>, we crowdsource our tasks to diverse users around the world to deliver both quality and quantity on time. 
Moreover, our in-house managers double-check the quality of the collected or processed data.<\/p>\n<p>[\/vc_column_text]<div id=\"el1653972293756-76a5ecd1-3d25\" class=\"w-100 d-block \"><\/div>[vc_column_text css=&#8221;.vc_custom_1655791980560{border-top-width: 1px !important;padding-top: 80px !important;padding-bottom: 40px !important;background-color: rgba(255,255,255,0.2) !important;*background-color: rgb(255,255,255) !important;border-top-color: rgba(0,0,0,0.2) !important;border-top-style: solid !important;}&#8221;]<\/p>\n<p id=\"db30\" class=\"pw-post-body-paragraph xt xu wo bn b xv xw hk xx xy xz ho ya yb yc yd ye yf yg yh yi yj yk yl ym yn jn iz\" data-selectable-paragraph=\"\">In this post, we talked about the different practices that you can follow to make the best quality image dataset for your Machine Learning or Deep Learning projects. We first talked about why it is important to plan the best route for gathering images before jumping into the actual image collection process. It is important to remember the context of the problem that you are trying to solve through your project and then choose a collection technique. You can gather the images on your own or through a website or outside source, e.g., an open dataset. We discussed the best practices to follow when collecting the pictures by yourself and also talked about how quick and easy it is to use the Fatkun Batch Download Image extension for Chrome for the bulk download of images. Filtering the collected images by removing duplicates, deleting small pictures, and pruning manually is key to a good dataset. 
Lastly, deciding on the right quantity of data, balancing the quantities across the different classes, introducing diversity in your samples, and keeping in mind the context in which the model will be used are practices that, if followed thoroughly, will help you make the best quality image dataset.<\/p>\n<p>[\/vc_column_text]<div id=\"el1653971463480-31c1935f-3278\" class=\"w-100 d-block \"><\/div>[\/vc_column][\/vc_row][vc_row pix_particles_check=&#8221;&#8221;][vc_column width=&#8221;1\/2&#8243;]<div id=\"el1646794934167-c0c94dd3-ea74\" class=\"w-100 d-block \"><\/div><div class=\" mb-3 mb-md-0 \"  ><div class=\"card w-100 h-100 bg-white  vc_custom_1652982865548  pix-hover-item rounded-10 position-relative overflow-hidden2 text-white tilt fancy_card\" ><div class=\"card-img-overlay overflow-visible d-inline-block w-100 pix-img-overlay pix-p-30 d-flex align-items-end text-left\"><div class=\"w-100 \"><h3 class=\"card-title  text-black font-weight-bold mb-0 animate-in\" style=\"\">See what we can do for you.<\/h3><p class=\"card-text pix-pt-10 text-black \" style=\"\">Build smarter AI with us.<\/p><div class=\"card-btn-div mt-4 d-inline-block w-100\"><a  href=\"https:\/\/datumo.com\" class=\"btn mb-2     text-white btn-black d-inline-block      btn-md\" target=\"_blank\" rel=\"noopener\"    ><span class=\"font-weight-bold \" >Learn More<\/span><\/a><\/div><\/div><\/div><\/div><\/div>[\/vc_column][vc_column width=&#8221;1\/2&#8243;]<div id=\"el1646794982519-9a19190b-7fde\" class=\"w-100 d-block \"><\/div><div class=\" mb-3 mb-md-0 \"  ><div class=\"card w-100 h-100 bg-black  vc_custom_1653971438710  pix-hover-item rounded-10 position-relative overflow-hidden2 text-white tilt fancy_card\" ><div class=\"card-img-overlay overflow-visible d-inline-block w-100 pix-img-overlay pix-p-30 d-flex align-items-end text-left\"><div class=\"w-100 \"><h3 class=\"card-title  text-white font-weight-bold mb-0 animate-in\" style=\"\">We would like to support the AI industry 
by sharing.<\/h3><p class=\"card-text pix-pt-10 text-white \" style=\"\"><\/p><div class=\"card-btn-div mt-4 d-inline-block w-100\"><a  href=\"https:\/\/open.datumo.com\/en\" class=\"btn mb-2    vc_custom_1653971438714  btn-primary d-inline-block      btn-md\" target=\"_blank\" rel=\"noopener\"    ><span class=\"font-weight-bold \" >Download Open Datasets<\/span><\/a><\/div><\/div><\/div><\/div><\/div>[\/vc_column][\/vc_row][vc_row pix_particles_check=&#8221;&#8221;][vc_column]<div id=\"el1646799961152-e3ee06c0-4e82\" class=\"w-100 d-block \"><\/div>[\/vc_column][\/vc_row]<\/p>\n","protected":false},"excerpt":{"rendered":"[vc_row pix_particles_check=&#8221;&#8221;][vc_column][\/vc_column][\/vc_row][vc_row pix_particles_check=&#8221;&#8221;][vc_column][vc_column_text css=&#8221;.vc_custom_1655958874907{padding-top: 40px !important;padding-right: 20px !important;padding-bottom: 40px !important;padding-left: 20px !important;}&#8221;] Apart from building a model for solving a specific problem, it is of equal importance to have a good quality dataset for the problem at hand, because no&#8230;","protected":false},"author":1,"featured_media":16448,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[131],"tags":[26,143,127,181,180],"class_list":["post-16282","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tech","tag-ai","tag-dataset","tag-datumo","tag-image-classification","tag-image-dataset"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v23.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Creating the Best Quality Image Dataset - DATUMO<\/title>\n<meta name=\"description\" content=\"we are going to discuss the different practices which can help you to create the best quality image dataset.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" 
\/>\n<link rel=\"canonical\" href=\"https:\/\/blog.datumo.com\/en\/tech\/16282\" \/>\n<meta property=\"og:locale\" content=\"ko_KR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Creating the Best Quality Image Dataset - DATUMO\" \/>\n<meta property=\"og:description\" content=\"we are going to discuss the different practices which can help you to create the best quality image dataset.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/blog.datumo.com\/en\/tech\/16282\" \/>\n<meta property=\"og:site_name\" content=\"DATUMO\" \/>\n<meta property=\"article:published_time\" content=\"2022-06-21T05:32:57+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2022-07-05T06:26:32+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/milad-fakurian-bMSA5-tLFao-unsplash.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1920\" \/>\n\t<meta property=\"og:image:height\" content=\"1459\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"DATUMO\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:title\" content=\"Creating the Best Quality Image Dataset - DATUMO\" \/>\n<meta name=\"twitter:description\" content=\"we are going to discuss the different practices which can help you to create the best quality image dataset.\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/milad-fakurian-bMSA5-tLFao-unsplash.jpg\" \/>\n<meta name=\"twitter:label1\" content=\"\uae00\uc4f4\uc774\" \/>\n\t<meta name=\"twitter:data1\" content=\"DATUMO\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"TechArticle\",\"@id\":\"https:\/\/blog.datumo.com\/en\/tech\/16282#article\",\"isPartOf\":{\"@id\":\"https:\/\/blog.datumo.com\/en\/tech\/16282\"},\"author\":{\"name\":\"DATUMO\",\"@id\":\"https:\/\/blog.datumo.com\/#\/schema\/person\/02ec2d0ba953b146878dab089dc735b6\"},\"headline\":\"Creating the Best Quality Image Dataset\",\"datePublished\":\"2022-06-21T05:32:57+00:00\",\"dateModified\":\"2022-07-05T06:26:32+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/blog.datumo.com\/en\/tech\/16282\"},\"wordCount\":2385,\"publisher\":{\"@id\":\"https:\/\/blog.datumo.com\/#organization\"},\"image\":{\"@id\":\"https:\/\/blog.datumo.com\/en\/tech\/16282#primaryimage\"},\"thumbnailUrl\":\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/milad-fakurian-bMSA5-tLFao-unsplash.jpg\",\"keywords\":[\"AI\",\"Dataset\",\"datumo\",\"image classification\",\"image dataset\"],\"articleSection\":[\"tech\"],\"inLanguage\":\"ko-KR\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/blog.datumo.com\/en\/tech\/16282\",\"url\":\"https:\/\/blog.datumo.com\/en\/tech\/16282\",\"name\":\"Creating the Best Quality Image Dataset - DATUMO\",\"isPartOf\":{\"@id\":\"https:\/\/blog.datumo.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/blog.datumo.com\/en\/tech\/16282#primaryimage\"},\"image\":{\"@id\":\"https:\/\/blog.datumo.com\/en\/tech\/16282#primaryimage\"},\"thumbnailUrl\":\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/milad-fakurian-bMSA5-tLFao-unsplash.jpg\",\"datePublished\":\"2022-06-21T05:32:57+00:00\",\"dateModified\":\"2022-07-05T06:26:32+00:00\",\"description\":\"we are going to discuss the different practices which can help you to create the best quality image 
dataset.\",\"breadcrumb\":{\"@id\":\"https:\/\/blog.datumo.com\/en\/tech\/16282#breadcrumb\"},\"inLanguage\":\"ko-KR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/blog.datumo.com\/en\/tech\/16282\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/blog.datumo.com\/en\/tech\/16282#primaryimage\",\"url\":\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/milad-fakurian-bMSA5-tLFao-unsplash.jpg\",\"contentUrl\":\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/milad-fakurian-bMSA5-tLFao-unsplash.jpg\",\"width\":1920,\"height\":1459},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/blog.datumo.com\/en\/tech\/16282#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/blog.datumo.com\/en\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Creating the Best Quality Image Dataset\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/blog.datumo.com\/#website\",\"url\":\"https:\/\/blog.datumo.com\/\",\"name\":\"DATUMO\",\"description\":\"The Data for Smarter AI\",\"publisher\":{\"@id\":\"https:\/\/blog.datumo.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/blog.datumo.com\/?s={search_term_string}\"},\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"ko-KR\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/blog.datumo.com\/#organization\",\"name\":\"DATUMO\",\"url\":\"https:\/\/blog.datumo.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/blog.datumo.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/2.1.webp\",\"contentUrl\":\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/2.1.webp\",\"width\":1080,\"height\":600,\"caption\":\"DATUMO\"},\"image\":{\"@id\":\"https:\/\/blog.datumo.com\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/blog.datumo.com\/#\/schema\/person\/02ec2d0ba953b146878dab089dc735b6\",\"name\":\"DATUMO\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/blog.datumo.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/1942a8a63e1c8fa0d9be56cda789edd6c0a866259cd5dca24952597ffa8bab3d?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/1942a8a63e1c8fa0d9be56cda789edd6c0a866259cd5dca24952597ffa8bab3d?s=96&d=mm&r=g\",\"caption\":\"DATUMO\"},\"description\":\"DATUMO, The Data for Smarter AI. We seek to drive impact in the world by providing diverse and high quality data to build smarter AI.\",\"sameAs\":[\"https:\/\/blog.datumo.com\/en\"],\"url\":\"https:\/\/blog.datumo.com\/en\/author\/selectstar\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Creating the Best Quality Image Dataset - DATUMO","description":"we are going to discuss the different practices which can help you to create the best quality image dataset.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/blog.datumo.com\/en\/tech\/16282","og_locale":"ko_KR","og_type":"article","og_title":"Creating the Best Quality Image Dataset - DATUMO","og_description":"we are going to discuss the different practices which can help you to create the best quality image dataset.","og_url":"https:\/\/blog.datumo.com\/en\/tech\/16282","og_site_name":"DATUMO","article_published_time":"2022-06-21T05:32:57+00:00","article_modified_time":"2022-07-05T06:26:32+00:00","og_image":[{"width":1920,"height":1459,"url":"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/milad-fakurian-bMSA5-tLFao-unsplash.jpg","type":"image\/jpeg"}],"author":"DATUMO","twitter_card":"summary_large_image","twitter_title":"Creating the Best Quality Image Dataset - DATUMO","twitter_description":"we are going to discuss the different practices which can help you to create the best quality image dataset.","twitter_image":"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/milad-fakurian-bMSA5-tLFao-unsplash.jpg","twitter_misc":{"\uae00\uc4f4\uc774":"DATUMO"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"TechArticle","@id":"https:\/\/blog.datumo.com\/en\/tech\/16282#article","isPartOf":{"@id":"https:\/\/blog.datumo.com\/en\/tech\/16282"},"author":{"name":"DATUMO","@id":"https:\/\/blog.datumo.com\/#\/schema\/person\/02ec2d0ba953b146878dab089dc735b6"},"headline":"Creating the Best Quality Image 
Dataset","datePublished":"2022-06-21T05:32:57+00:00","dateModified":"2022-07-05T06:26:32+00:00","mainEntityOfPage":{"@id":"https:\/\/blog.datumo.com\/en\/tech\/16282"},"wordCount":2385,"publisher":{"@id":"https:\/\/blog.datumo.com\/#organization"},"image":{"@id":"https:\/\/blog.datumo.com\/en\/tech\/16282#primaryimage"},"thumbnailUrl":"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/milad-fakurian-bMSA5-tLFao-unsplash.jpg","keywords":["AI","Dataset","datumo","image classification","image dataset"],"articleSection":["tech"],"inLanguage":"ko-KR"},{"@type":"WebPage","@id":"https:\/\/blog.datumo.com\/en\/tech\/16282","url":"https:\/\/blog.datumo.com\/en\/tech\/16282","name":"Creating the Best Quality Image Dataset - DATUMO","isPartOf":{"@id":"https:\/\/blog.datumo.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/blog.datumo.com\/en\/tech\/16282#primaryimage"},"image":{"@id":"https:\/\/blog.datumo.com\/en\/tech\/16282#primaryimage"},"thumbnailUrl":"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/milad-fakurian-bMSA5-tLFao-unsplash.jpg","datePublished":"2022-06-21T05:32:57+00:00","dateModified":"2022-07-05T06:26:32+00:00","description":"we are going to discuss the different practices which can help you to create the best quality image 
dataset.","breadcrumb":{"@id":"https:\/\/blog.datumo.com\/en\/tech\/16282#breadcrumb"},"inLanguage":"ko-KR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/blog.datumo.com\/en\/tech\/16282"]}]},{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/blog.datumo.com\/en\/tech\/16282#primaryimage","url":"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/milad-fakurian-bMSA5-tLFao-unsplash.jpg","contentUrl":"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/milad-fakurian-bMSA5-tLFao-unsplash.jpg","width":1920,"height":1459},{"@type":"BreadcrumbList","@id":"https:\/\/blog.datumo.com\/en\/tech\/16282#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/blog.datumo.com\/en\/"},{"@type":"ListItem","position":2,"name":"Creating the Best Quality Image Dataset"}]},{"@type":"WebSite","@id":"https:\/\/blog.datumo.com\/#website","url":"https:\/\/blog.datumo.com\/","name":"DATUMO","description":"The Data for Smarter AI","publisher":{"@id":"https:\/\/blog.datumo.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/blog.datumo.com\/?s={search_term_string}"},"query-input":"required 
name=search_term_string"}],"inLanguage":"ko-KR"},{"@type":"Organization","@id":"https:\/\/blog.datumo.com\/#organization","name":"DATUMO","url":"https:\/\/blog.datumo.com\/","logo":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/blog.datumo.com\/#\/schema\/logo\/image\/","url":"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/2.1.webp","contentUrl":"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/2.1.webp","width":1080,"height":600,"caption":"DATUMO"},"image":{"@id":"https:\/\/blog.datumo.com\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/blog.datumo.com\/#\/schema\/person\/02ec2d0ba953b146878dab089dc735b6","name":"DATUMO","image":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/blog.datumo.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/1942a8a63e1c8fa0d9be56cda789edd6c0a866259cd5dca24952597ffa8bab3d?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/1942a8a63e1c8fa0d9be56cda789edd6c0a866259cd5dca24952597ffa8bab3d?s=96&d=mm&r=g","caption":"DATUMO"},"description":"DATUMO, The Data for Smarter AI. 
We seek to drive impact in the world by providing diverse and high quality data to build smarter AI.","sameAs":["https:\/\/blog.datumo.com\/en"],"url":"https:\/\/blog.datumo.com\/en\/author\/selectstar"}]}},"_links":{"self":[{"href":"https:\/\/blog.datumo.com\/en\/wp-json\/wp\/v2\/posts\/16282","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.datumo.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.datumo.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.datumo.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.datumo.com\/en\/wp-json\/wp\/v2\/comments?post=16282"}],"version-history":[{"count":14,"href":"https:\/\/blog.datumo.com\/en\/wp-json\/wp\/v2\/posts\/16282\/revisions"}],"predecessor-version":[{"id":16529,"href":"https:\/\/blog.datumo.com\/en\/wp-json\/wp\/v2\/posts\/16282\/revisions\/16529"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.datumo.com\/en\/wp-json\/wp\/v2\/media\/16448"}],"wp:attachment":[{"href":"https:\/\/blog.datumo.com\/en\/wp-json\/wp\/v2\/media?parent=16282"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.datumo.com\/en\/wp-json\/wp\/v2\/categories?post=16282"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.datumo.com\/en\/wp-json\/wp\/v2\/tags?post=16282"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}