{"id":16367,"date":"2022-06-22T07:59:08","date_gmt":"2022-06-22T07:59:08","guid":{"rendered":"https:\/\/blog.datumo.com\/en\/?p=16367"},"modified":"2024-10-22T08:53:53","modified_gmt":"2024-10-22T08:53:53","slug":"problems-of-online-image-crawling","status":"publish","type":"post","link":"https:\/\/blog.datumo.com\/en\/tech\/16367","title":{"rendered":"Problems of Online Image Crawling"},"content":{"rendered":"<p>[vc_row pix_particles_check=&#8221;&#8221;][vc_column][vc_raw_html]JTNDbWV0YSUyMGh0dHAtZXF1aXYlM0QlMjJyZWZyZXNoJTIyJTIwY29udGVudCUzRCUyMjAlM0IlMjB1cmwlM0RodHRwcyUzQSUyRiUyRmRhdHVtby5jb20lMkZlbiUyRnByb2JsZW1zLW9mLW9ubGluZS1pbWFnZS1jcmF3bGluZyUyRiUyMiUzRQ==[\/vc_raw_html]<div id=\"el1646799961152-e3ee06c0-4e82\" class=\"w-100 d-block \"><\/div><div class=\"pix-content-box card      vc_custom_1654577545529 custom-responsive-142846179   rounded-lg bg- w-100  \"   ><div class=\"\" style=\"z-index:30;position:relative;\">[vc_column_text]<\/p>\n<p style=\"text-align: left;\"><span style=\"font-size: 14pt;\"><strong>\ud83d\udd11<\/strong> <strong>In 9 minutes you will learn:<\/strong><\/span><\/p>\n<ul class=\"p-rich_text_list p-rich_text_list__bullet\" data-stringify-type=\"unordered-list\" data-indent=\"0\" data-border=\"0\">\n<li>In this tutorial, we are going to talk about online image crawling and some of the common problems that we come across while doing so.<\/li>\n<\/ul>\n<p>[\/vc_column_text]<\/div><\/div>[\/vc_column][\/vc_row][vc_row pix_particles_check=&#8221;&#8221;][vc_column]<div id=\"el1650294698986-a1b962b5-ef42\" class=\"w-100 d-block \"><\/div>[vc_column_text css=&#8221;.vc_custom_1655884798449{padding-top: 40px !important;padding-right: 20px !important;padding-bottom: 40px !important;padding-left: 20px !important;}&#8221;]<\/p>\n<p id=\"87a0\" class=\"pw-post-body-paragraph le lf jj bn b lg lh li lj lk ll lm ln lo lp lq lr ls lt lu lv lw lx ly lz ma jc hk\" data-selectable-paragraph=\"\">Online image crawling is basically a way of gathering 
large amounts of images from different websites. In short, we scrape images from different web pages, store them in a folder, drive or any other location, and use them for different purposes, such as building an image dataset for a machine learning model. In this tutorial, we will discuss two ways of crawling images online: one through a Google Chrome extension named\u00a0<a class=\"au mq\" href=\"https:\/\/chrome.google.com\/webstore\/detail\/fatkun-batch-download-ima\/nnjjahlikiabnchcpehcpkdeckfgnohf?hl=en\" target=\"_blank\" rel=\"noopener ugc nofollow\">Fatkun Batch Download Image<\/a>\u00a0and another by writing a Python script to scrape images from a web page. The main topic is the common problems encountered while scraping images from the web. So let\u2019s get started!<\/p>\n<p>[\/vc_column_text][\/vc_column][\/vc_row][vc_section full_width=&#8221;stretch_row&#8221; pix_over_visibility=&#8221;&#8221; css=&#8221;.vc_custom_1650444445523{padding-top: 80px !important;padding-bottom: 80px !important;background-color: #f8f9fa !important;}&#8221; el_id=&#8221;pix_section_program&#8221;][vc_row full_width=&#8221;stretch_row&#8221; pix_particles_check=&#8221;&#8221;][vc_column content_align=&#8221;text-center&#8221; offset=&#8221;vc_col-lg-offset-0 vc_col-lg-12 vc_col-md-offset-1 vc_col-md-10&#8243;]<div id=\"el1650442503491-f5da6b2f-fa35\" class=\"mb-3 text-left \"><h2 class=\"mb-32 pix-sliding-headline font-weight-bold secondary-font\" data-class=\"secondary-font text-heading-default\" data-style=\"\">Scraping Images from the Web<\/h2><\/div>[vc_column_text css=&#8221;.vc_custom_1655885058210{padding-top: 40px !important;padding-bottom: 40px !important;}&#8221;]<\/p>\n<h4 id=\"51c1\" class=\"ms kh jj bn ki mt mu mv km mw mx my kq lo mz na ku ls nb nc ky lw nd ne lc nf hk\" style=\"text-align: left;\"><strong>Google Chrome\u2019s Fatkun Batch Download Image Extension<\/strong><\/h4>\n<figure class=\"mc md me mf hb 
mg gp gq paragraph-image\">\n<div class=\"mh mi dq mj cf mk\" tabindex=\"0\" role=\"button\"><\/div>\n<\/figure>\n<p id=\"0340\" class=\"pw-post-body-paragraph zs zt yn bn b zu zv hk zw zx zy ho zz aba abb abc abd abe abf abg abh abi abj abk abl abm jn iz\" style=\"text-align: left;\" data-selectable-paragraph=\"\"><img fetchpriority=\"high\" decoding=\"async\" class=\"aligncenter size-full wp-image-16371\" src=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/0_XAFWD-F7_7qb7y9b.png\" alt=\"\" width=\"700\" height=\"404\" srcset=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/0_XAFWD-F7_7qb7y9b.png 700w, https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/0_XAFWD-F7_7qb7y9b-300x173.png 300w\" sizes=\"(max-width: 700px) 100vw, 700px\" \/><\/p>\n<p style=\"text-align: left;\" data-selectable-paragraph=\"\">Fatkun Batch Download Image is a simple and useful image download extension. As its name suggests, it lets you download images from a website in batches and use them for different purposes. You can apply search filters, select and deselect images, download from a particular tab or from all tabs, rename images in batches, and vary image sizes before downloading them. For more about this extension i.e. 
for installing it, setting it up, and using it, you can read the \u2018Downloading images through the Fatkun Batch Download Image extension\u2019 section in our <a class=\"au mq\" href=\"https:\/\/mc.ai\/creating-the-best-quality-image-dataset\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Creating the Best Quality Image Dataset<\/a>\u00a0article.<\/p>\n<p><img decoding=\"async\" class=\"aligncenter size-full wp-image-16372\" src=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/0_99rGiCnuXnIoXrX4.jpg\" alt=\"\" width=\"700\" height=\"761\" srcset=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/0_99rGiCnuXnIoXrX4.jpg 700w, https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/0_99rGiCnuXnIoXrX4-276x300.jpg 276w\" sizes=\"(max-width: 700px) 100vw, 700px\" \/><\/p>\n<p id=\"4a52\" class=\"pw-post-body-paragraph le lf jj bn b lg ng li lj lk nh lm ln lo ni lq lr ls nj lu lv lw nk ly lz ma jc hk\" style=\"text-align: left;\" data-selectable-paragraph=\"\">Drawbacks of this extension:<\/p>\n<ul class=\"\">\n<li id=\"3e24\" class=\"nn no jj bn b lg ng lk nh lo np ls nq lw nr ma ns nt nu nv hk\" style=\"text-align: left;\" data-selectable-paragraph=\"\">Tedious when it comes to collecting images for large datasets.<\/li>\n<li id=\"b868\" class=\"nn no jj bn b lg nw lk nx lo ny ls nz lw oa ma ns nt nu nv hk\" style=\"text-align: left;\" data-selectable-paragraph=\"\">Images collected are thumbnails and not original size images.<\/li>\n<\/ul>\n<p>[\/vc_column_text][vc_column_text css=&#8221;.vc_custom_1655885193515{border-top-width: 1px !important;padding-top: 60px !important;padding-bottom: 30px !important;border-top-color: rgba(0,0,0,0.2) !important;border-top-style: solid !important;}&#8221;]<\/p>\n<h4 id=\"35c2\" class=\"ms kh jj bn ki mt mu mv km mw mx my kq lo mz na ku ls nb nc ky lw nd ne lc nf hk\" style=\"text-align: left;\"><strong>Image Crawling in Python<\/strong><\/h4>\n<figure class=\"mc md me mf hb mg gp gq 
paragraph-image\">\n<div class=\"mh mi dq mj cf mk\" tabindex=\"0\" role=\"button\"><\/div>\n<\/figure>\n<p style=\"text-align: left;\">There are multiple Python packages and libraries that can help you to scrape images from a website. These include Beautiful Soup, Selenium, Scrapy, etc. We will be talking about\u00a0<a class=\"au mq\" href=\"https:\/\/scrapy.org\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Scrapy<\/a>\u00a0which is basically a framework written in Python and designed for web scraping. It can also be used to extract data using APIs or just as a general-purpose web crawler. For more details on this framework and for full-fledged implementation of it for large-scale image scraping for datasets, you can have a look at our\u00a0<a class=\"au mq\" href=\"https:\/\/mc.ai\/fuel-up-the-deep-learning-custom-dataset-creation-with-web-scraping\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Custom Dataset Creation with Web Scraping<\/a>\u00a0article.<\/p>\n<p>[\/vc_column_text][vc_column_text css=&#8221;.vc_custom_1655885526249{border-top-width: 1px !important;padding-top: 60px !important;padding-bottom: 30px !important;border-top-color: rgba(0,0,0,0.2) !important;border-top-style: solid !important;}&#8221;]<\/p>\n<h4 id=\"9e00\" class=\"ms kh jj bn ki mt mu mv km mw mx my kq lo mz na ku ls nb nc ky lw nd ne lc nf hk\" style=\"text-align: left;\"><strong>Problems of Online Image Crawling<\/strong><\/h4>\n<figure class=\"mc md me mf hb mg gp gq paragraph-image\">\n<div class=\"mh mi dq mj cf mk\" tabindex=\"0\" role=\"button\"><\/div>\n<\/figure>\n<p style=\"text-align: left;\">Online image crawling or scraping is an important aspect for gathering relevant images from different websites especially when it comes to building high-quality image datasets for different machine learning models to be trained on. However, scraping images from the web is not without its challenges. 
Websites change constantly, which makes it complicated, and at times almost impossible, to gather images from them reliably. Even if you succeed in doing so, the performance of the web scraper may be seriously compromised. So it\u2019s extremely important to consider certain factors before diving into the scraping process itself. Let\u2019s have a look at some of the common problems that one could come across while scraping images online:<\/p>\n<p>&nbsp;<\/p>\n<h5 style=\"text-align: left;\"><strong>1. Access to scraping:<\/strong><\/h5>\n<p>&nbsp;<\/p>\n<p style=\"text-align: left;\">Before you plan on scraping images or other content from a particular website, it is important to ensure that the target website permits scraping. Many websites restrict crawler access through their robots.txt file, which tells crawlers which pages or files they may or may not request from the site. If image crawling is disallowed, you can contact the owner of the website, formally or informally, explain your situation and request access. If that does not work out, you can look for other websites with similar content.<\/p>\n<p>&nbsp;<\/p>\n<h5 id=\"5e03\" class=\"pw-post-body-paragraph le lf jj bn b lg ng li lj lk nh lm ln lo ni lq lr ls nj lu lv lw nk ly lz ma jc hk\" style=\"text-align: left;\"><strong>2. 
Anti-scraping policies:<\/strong><\/h5>\n<p>&nbsp;<\/p>\n<p><img decoding=\"async\" class=\"aligncenter size-full wp-image-16375\" src=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/0_5zQ6jpbn8aNkgMRV.png\" alt=\"\" width=\"584\" height=\"328\" srcset=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/0_5zQ6jpbn8aNkgMRV.png 584w, https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/0_5zQ6jpbn8aNkgMRV-300x168.png 300w\" sizes=\"(max-width: 584px) 100vw, 584px\" \/><\/p>\n<p>&nbsp;<\/p>\n<p style=\"text-align: left;\">Another common problem you can come across while trying to scrape images from a website is its anti-scraping policy. For instance, many websites use IP blocking to prevent their content from being scraped: they ban the scraper\u2019s IP address, or throttle it to slow the scraping process down. This happens when the target website detects a high number of requests coming from the same IP address, which is usually the case in online image crawling, since a large number of images is typically scraped in one go. The website treats this as malicious activity and resorts to IP blocking. One possible solution to this problem is\u00a0<a class=\"au mq\" href=\"https:\/\/www.octoparse.com\/tutorial\/octoparse-cloud-service\" target=\"_blank\" rel=\"noopener ugc nofollow\">Octoparse Cloud Service<\/a>\u00a0which uses multiple IP addresses to scrape one website at the same time and thereby helps avoid IP blocking.<\/p>\n<p>&nbsp;<\/p>\n<h5 style=\"text-align: left;\"><strong>3. 
Diversity in website structures and layouts:<\/strong><\/h5>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-16376\" src=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/0_r6tpDGscXmqCPYRm.png\" alt=\"\" width=\"700\" height=\"553\" srcset=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/0_r6tpDGscXmqCPYRm.png 700w, https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/0_r6tpDGscXmqCPYRm-300x237.png 300w\" sizes=\"(max-width: 700px) 100vw, 700px\" \/><\/p>\n<p>&nbsp;<\/p>\n<p style=\"text-align: left;\">Web scrapers have some inherent limitations. Each scraper is typically tailored to one particular website and cannot simply be reused to scrape images from every other website, because websites differ in structure, characteristics, formats and layouts. There is no general-purpose scraper that fits all websites; each site tends to need its own. Even pages within the same website often have different structures and layouts, since web designers build pages according to their own tastes and standards. This makes online image crawling tedious, because you have to adapt your crawler to each website or web page you want to scrape images from.<\/p>\n<p>&nbsp;<\/p>\n<h5 style=\"text-align: left;\"><strong>4. 
Continuously changing website content:<\/strong><\/h5>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-16377\" src=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/0_DeVvAesqa-qd6abl.png\" alt=\"\" width=\"700\" height=\"417\" srcset=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/0_DeVvAesqa-qd6abl.png 700w, https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/0_DeVvAesqa-qd6abl-300x179.png 300w\" sizes=\"(max-width: 700px) 100vw, 700px\" \/><\/p>\n<p>&nbsp;<\/p>\n<p style=\"text-align: left;\">Many websites update their content continuously by adding new features, removing unnecessary features, making certain layout and design changes, etc. This is done to improve user experience but can greatly affect the performance of the web scraper. Since each web scraper is specific to a website, any change in that website in turn is a call for changes in the implementation of the web scraper as well. Even if the website change is extremely minor, it might require you to adjust the scraper accordingly and that can be challenging at times. Octoparse Cloud Service again helps in visualizing these changing structures so that the image crawler can be altered accordingly.<\/p>\n<h5 style=\"text-align: left;\"><strong>5. 
Bad Image Quality:<\/strong><\/h5>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-16378\" src=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/0_n2fDAl8DrkNSo2-n.jpeg\" alt=\"\" width=\"700\" height=\"395\" srcset=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/0_n2fDAl8DrkNSo2-n.jpeg 700w, https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/0_n2fDAl8DrkNSo2-n-300x169.jpeg 300w\" sizes=\"(max-width: 700px) 100vw, 700px\" \/><\/p>\n<p>&nbsp;<\/p>\n<p id=\"ea8a\" class=\"pw-post-body-paragraph le lf jj bn b lg ng li lj lk nh lm ln lo ni lq lr ls nj lu lv lw nk ly lz ma jc hk\" style=\"text-align: left;\" data-selectable-paragraph=\"\">Image quality is an extremely important factor especially when we are talking about building image datasets. The quality of your scraped images can be seriously compromised due to any technical weaknesses or shortcomings of your web scraper. Therefore, it is pivotal for you to choose a high-quality web scraper with a lot of good features to do the job.<\/p>\n<h5><\/h5>\n<h5 style=\"text-align: left;\"><strong>6. 
Slow loading of websites:<\/strong><\/h5>\n<p>&nbsp;<\/p>\n<p style=\"text-align: left;\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-16379\" src=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/0_8nKepMvYK7toFtBJ.jpeg\" alt=\"\" width=\"322\" height=\"157\" srcset=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/0_8nKepMvYK7toFtBJ.jpeg 322w, https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/0_8nKepMvYK7toFtBJ-300x146.jpeg 300w\" sizes=\"(max-width: 322px) 100vw, 322px\" \/><\/p>\n<p>&nbsp;<\/p>\n<p style=\"text-align: left;\">Websites may respond very slowly, or even fail to load, when they receive too many access requests. This can be a serious problem in online image crawling.<\/p>\n<p>&nbsp;<\/p>\n<h5 id=\"e0f2\" class=\"pw-post-body-paragraph le lf jj bn b lg ng li lj lk nh lm ln lo ni lq lr ls nj lu lv lw nk ly lz ma jc hk\" style=\"text-align: left;\"><strong>7. Geographical restrictions:<\/strong><\/h5>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-16380\" src=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/0_82IH33bKfTzRbQQb.jpeg\" alt=\"\" width=\"315\" height=\"316\" srcset=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/0_82IH33bKfTzRbQQb.jpeg 315w, https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/0_82IH33bKfTzRbQQb-300x300.jpeg 300w, https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/0_82IH33bKfTzRbQQb-150x150.jpeg 150w, https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/0_82IH33bKfTzRbQQb-75x75.jpeg 75w\" sizes=\"(max-width: 315px) 100vw, 315px\" \/><\/p>\n<p>&nbsp;<\/p>\n<p style=\"text-align: left;\">At times, the biggest hindrance to online image crawling can be your location, since some websites may not be accessible or may not permit scraping of their content in certain regions or 
countries.<\/p>\n<p>[\/vc_column_text][\/vc_column][\/vc_row][\/vc_section][vc_row pix_particles_check=&#8221;&#8221;][vc_column]<div id=\"el1653971463480-ce74a014-4ae9\" class=\"w-100 d-block \"><\/div>[vc_column_text css=&#8221;.vc_custom_1655885581631{padding-top: 40px !important;padding-bottom: 0px !important;}&#8221;]<\/p>\n<p id=\"87e4\" class=\"pw-post-body-paragraph le lf jj bn b lg lh li lj lk ll lm ln lo lp lq lr ls lt lu lv lw lx ly lz ma jc hk\" data-selectable-paragraph=\"\">Online image crawling is often not enough for companies to train their industry-level algorithms. Moreover, it is difficult to control data quality in-house, especially if your company is small or medium-sized. Therefore, it is often more efficient to find a service that does the laborious work for you. We could be your perfect solution!<\/p>\n<p id=\"71e3\" class=\"pw-post-body-paragraph le lf jj bn b lg ng li lj lk nh lm ln lo ni lq lr ls nj lu lv lw nk ly lz ma jc hk\" data-selectable-paragraph=\"\">Here at <strong><a class=\"au mn\" href=\"https:\/\/www.datumo.com\" target=\"_blank\" rel=\"noopener ugc nofollow\"><em class=\"pn\">D<\/em><\/a><\/strong><a href=\"https:\/\/www.datumo.com\"><strong>ATUMO<\/strong><\/a>, we crowdsource our tasks to diverse users located around the world to deliver both quality and quantity on time. Moreover, our in-house managers double-check the quality of the collected or processed data. Do you need data? Do you need preprocessed data? 
Let us know!<\/p>\n<p>[\/vc_column_text]<div id=\"el1653972293756-76a5ecd1-3d25\" class=\"w-100 d-block \"><\/div>[vc_column_text css=&#8221;.vc_custom_1655885558341{border-top-width: 1px !important;padding-top: 80px !important;padding-bottom: 0px !important;border-top-color: rgba(0,0,0,0.2) !important;border-top-style: solid !important;}&#8221;]<\/p>\n<p id=\"c7c9\" class=\"pw-post-body-paragraph le lf jj bn b lg lh li lj lk ll lm ln lo lp lq lr ls lt lu lv lw lx ly lz ma jc hk\" data-selectable-paragraph=\"\">To sum it all up, we started off by discussing what online image crawling is and then briefly touched upon how it can be done through Fatkun Batch Image Download extension and by using Python\u2019s Scrapy framework. We discussed the common problems of online image crawling in general like the bad quality of data, anti-scraping policies etc. We suggested solutions for some of them as well and learned that while some of the problems were easily solvable, others were not.<\/p>\n<p>[\/vc_column_text]<div id=\"el1653971463481-f4f34d7c-39ce\" class=\"w-100 d-block \"><\/div>[\/vc_column][\/vc_row][vc_row pix_particles_check=&#8221;&#8221;][vc_column width=&#8221;1\/2&#8243;]<div id=\"el1646794934167-c0c94dd3-ea74\" class=\"w-100 d-block \"><\/div><div class=\" mb-3 mb-md-0 \"  ><div class=\"card w-100 h-100 bg-white  vc_custom_1652982865548  pix-hover-item rounded-10 position-relative overflow-hidden2 text-white tilt fancy_card\" ><div class=\"card-img-overlay overflow-visible d-inline-block w-100 pix-img-overlay pix-p-30 d-flex align-items-end text-left\"><div class=\"w-100 \"><h3 class=\"card-title  text-black font-weight-bold mb-0 animate-in\" style=\"\">See what we can do for you.<\/h3><p class=\"card-text pix-pt-10 text-black \" style=\"\">Build smarter AI with us.<\/p><div class=\"card-btn-div mt-4 d-inline-block w-100\"><a  href=\"https:\/\/datumo.com\" class=\"btn mb-2     text-white btn-black d-inline-block      btn-md\" target=\"_blank\" 
rel=\"noopener\"    ><span class=\"font-weight-bold \" >Learn More<\/span><\/a><\/div><\/div><\/div><\/div><\/div>[\/vc_column][vc_column width=&#8221;1\/2&#8243;]<div id=\"el1646794982519-9a19190b-7fde\" class=\"w-100 d-block \"><\/div><div class=\" mb-3 mb-md-0 \"  ><div class=\"card w-100 h-100 bg-black  vc_custom_1653971438710  pix-hover-item rounded-10 position-relative overflow-hidden2 text-white tilt fancy_card\" ><div class=\"card-img-overlay overflow-visible d-inline-block w-100 pix-img-overlay pix-p-30 d-flex align-items-end text-left\"><div class=\"w-100 \"><h3 class=\"card-title  text-white font-weight-bold mb-0 animate-in\" style=\"\">We would like to support the AI industry by sharing.<\/h3><p class=\"card-text pix-pt-10 text-white \" style=\"\"><\/p><div class=\"card-btn-div mt-4 d-inline-block w-100\"><a  href=\"https:\/\/open.datumo.com\/en\" class=\"btn mb-2    vc_custom_1653971438714  btn-primary d-inline-block      btn-md\" target=\"_blank\" rel=\"noopener\"    ><span class=\"font-weight-bold \" >Download Open Datasets<\/span><\/a><\/div><\/div><\/div><\/div><\/div>[\/vc_column][\/vc_row][vc_row pix_particles_check=&#8221;&#8221;][vc_column]<div id=\"el1646799961152-e3ee06c0-4e82\" class=\"w-100 d-block \"><\/div>[\/vc_column][\/vc_row]<\/p>\n","protected":false},"excerpt":{"rendered":"[vc_row pix_particles_check=&#8221;&#8221;][vc_column][vc_raw_html]JTNDbWV0YSUyMGh0dHAtZXF1aXYlM0QlMjJyZWZyZXNoJTIyJTIwY29udGVudCUzRCUyMjAlM0IlMjB1cmwlM0RodHRwcyUzQSUyRiUyRmRhdHVtby5jb20lMkZlbiUyRnByb2JsZW1zLW9mLW9ubGluZS1pbWFnZS1jcmF3bGluZyUyRiUyMiUzRQ==[\/vc_raw_html][\/vc_column][\/vc_row][vc_row pix_particles_check=&#8221;&#8221;][vc_column][vc_column_text css=&#8221;.vc_custom_1655884798449{padding-top: 40px !important;padding-right: 20px !important;padding-bottom: 40px !important;padding-left: 20px !important;}&#8221;] Online image crawling is basically a way of gathering large amounts of images through different websites. 
In short, we scrape images from different web pages, store them in&#8230;","protected":false},"author":1,"featured_media":16486,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[131],"tags":[127,197,198],"class_list":["post-16367","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tech","tag-datumo","tag-image-crawling","tag-problems-of-image-crawling"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v23.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Problems of Online Image Crawling - DATUMO<\/title>\n<meta name=\"description\" content=\"we will be discussing two ways of online image crawling, one through a Google Chrome extension named Fatkun Batch Download Image and another by writing Python script to scrape images from a web page.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/blog.datumo.com\/en\/tech\/16367\" \/>\n<meta property=\"og:locale\" content=\"ko_KR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Problems of Online Image Crawling\" \/>\n<meta property=\"og:description\" content=\"we will be discussing two ways of online image crawling, one through a Google Chrome extension named Fatkun Batch Download Image and another by writing Python script to scrape images from a web page.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/blog.datumo.com\/en\/tech\/16367\" \/>\n<meta property=\"og:site_name\" content=\"DATUMO\" \/>\n<meta property=\"article:published_time\" content=\"2022-06-22T07:59:08+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-10-22T08:53:53+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/image-crawling.png\" \/>\n\t<meta 
property=\"og:image:width\" content=\"1600\" \/>\n\t<meta property=\"og:image:height\" content=\"943\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"DATUMO\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:title\" content=\"Problems of Online Image Crawling\" \/>\n<meta name=\"twitter:description\" content=\"we will be discussing two ways of online image crawling, one through a Google Chrome extension named Fatkun Batch Download Image and another by writing Python script to scrape images from a web page.\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/image-crawling.png\" \/>\n<meta name=\"twitter:label1\" content=\"\uae00\uc4f4\uc774\" \/>\n\t<meta name=\"twitter:data1\" content=\"DATUMO\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"TechArticle\",\"@id\":\"https:\/\/blog.datumo.com\/en\/tech\/16367#article\",\"isPartOf\":{\"@id\":\"https:\/\/blog.datumo.com\/en\/tech\/16367\"},\"author\":{\"name\":\"DATUMO\",\"@id\":\"https:\/\/blog.datumo.com\/#\/schema\/person\/02ec2d0ba953b146878dab089dc735b6\"},\"headline\":\"Problems of Online Image Crawling\",\"datePublished\":\"2022-06-22T07:59:08+00:00\",\"dateModified\":\"2024-10-22T08:53:53+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/blog.datumo.com\/en\/tech\/16367\"},\"wordCount\":2043,\"publisher\":{\"@id\":\"https:\/\/blog.datumo.com\/#organization\"},\"image\":{\"@id\":\"https:\/\/blog.datumo.com\/en\/tech\/16367#primaryimage\"},\"thumbnailUrl\":\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/image-crawling.png\",\"keywords\":[\"datumo\",\"image crawling\",\"problems of image 
crawling\"],\"articleSection\":[\"tech\"],\"inLanguage\":\"ko-KR\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/blog.datumo.com\/en\/tech\/16367\",\"url\":\"https:\/\/blog.datumo.com\/en\/tech\/16367\",\"name\":\"Problems of Online Image Crawling - DATUMO\",\"isPartOf\":{\"@id\":\"https:\/\/blog.datumo.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/blog.datumo.com\/en\/tech\/16367#primaryimage\"},\"image\":{\"@id\":\"https:\/\/blog.datumo.com\/en\/tech\/16367#primaryimage\"},\"thumbnailUrl\":\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/image-crawling.png\",\"datePublished\":\"2022-06-22T07:59:08+00:00\",\"dateModified\":\"2024-10-22T08:53:53+00:00\",\"description\":\"we will be discussing two ways of online image crawling, one through a Google Chrome extension named Fatkun Batch Download Image and another by writing Python script to scrape images from a web page.\",\"breadcrumb\":{\"@id\":\"https:\/\/blog.datumo.com\/en\/tech\/16367#breadcrumb\"},\"inLanguage\":\"ko-KR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/blog.datumo.com\/en\/tech\/16367\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/blog.datumo.com\/en\/tech\/16367#primaryimage\",\"url\":\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/image-crawling.png\",\"contentUrl\":\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/image-crawling.png\",\"width\":1600,\"height\":943},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/blog.datumo.com\/en\/tech\/16367#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/blog.datumo.com\/en\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Problems of Online Image Crawling\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/blog.datumo.com\/#website\",\"url\":\"https:\/\/blog.datumo.com\/\",\"name\":\"DATUMO\",\"description\":\"The Data for Smarter 
AI\",\"publisher\":{\"@id\":\"https:\/\/blog.datumo.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/blog.datumo.com\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"ko-KR\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/blog.datumo.com\/#organization\",\"name\":\"DATUMO\",\"url\":\"https:\/\/blog.datumo.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/blog.datumo.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/2.1.webp\",\"contentUrl\":\"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/2.1.webp\",\"width\":1080,\"height\":600,\"caption\":\"DATUMO\"},\"image\":{\"@id\":\"https:\/\/blog.datumo.com\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/blog.datumo.com\/#\/schema\/person\/02ec2d0ba953b146878dab089dc735b6\",\"name\":\"DATUMO\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/blog.datumo.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/1942a8a63e1c8fa0d9be56cda789edd6c0a866259cd5dca24952597ffa8bab3d?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/1942a8a63e1c8fa0d9be56cda789edd6c0a866259cd5dca24952597ffa8bab3d?s=96&d=mm&r=g\",\"caption\":\"DATUMO\"},\"description\":\"DATUMO, The Data for Smarter AI. We seek to drive impact in the world by providing diverse and high quality data to build smarter AI.\",\"sameAs\":[\"https:\/\/blog.datumo.com\/en\"],\"url\":\"https:\/\/blog.datumo.com\/en\/author\/selectstar\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Problems of Online Image Crawling - DATUMO","description":"we will be discussing two ways of online image crawling, one through a Google Chrome extension named Fatkun Batch Download Image and another by writing Python script to scrape images from a web page.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/blog.datumo.com\/en\/tech\/16367","og_locale":"ko_KR","og_type":"article","og_title":"Problems of Online Image Crawling","og_description":"we will be discussing two ways of online image crawling, one through a Google Chrome extension named Fatkun Batch Download Image and another by writing Python script to scrape images from a web page.","og_url":"https:\/\/blog.datumo.com\/en\/tech\/16367","og_site_name":"DATUMO","article_published_time":"2022-06-22T07:59:08+00:00","article_modified_time":"2024-10-22T08:53:53+00:00","og_image":[{"width":1600,"height":943,"url":"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/image-crawling.png","type":"image\/png"}],"author":"DATUMO","twitter_card":"summary_large_image","twitter_title":"Problems of Online Image Crawling","twitter_description":"we will be discussing two ways of online image crawling, one through a Google Chrome extension named Fatkun Batch Download Image and another by writing Python script to scrape images from a web page.","twitter_image":"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/image-crawling.png","twitter_misc":{"\uae00\uc4f4\uc774":"DATUMO"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"TechArticle","@id":"https:\/\/blog.datumo.com\/en\/tech\/16367#article","isPartOf":{"@id":"https:\/\/blog.datumo.com\/en\/tech\/16367"},"author":{"name":"DATUMO","@id":"https:\/\/blog.datumo.com\/#\/schema\/person\/02ec2d0ba953b146878dab089dc735b6"},"headline":"Problems of Online Image 
Crawling","datePublished":"2022-06-22T07:59:08+00:00","dateModified":"2024-10-22T08:53:53+00:00","mainEntityOfPage":{"@id":"https:\/\/blog.datumo.com\/en\/tech\/16367"},"wordCount":2043,"publisher":{"@id":"https:\/\/blog.datumo.com\/#organization"},"image":{"@id":"https:\/\/blog.datumo.com\/en\/tech\/16367#primaryimage"},"thumbnailUrl":"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/image-crawling.png","keywords":["datumo","image crawling","problems of image crawling"],"articleSection":["tech"],"inLanguage":"ko-KR"},{"@type":"WebPage","@id":"https:\/\/blog.datumo.com\/en\/tech\/16367","url":"https:\/\/blog.datumo.com\/en\/tech\/16367","name":"Problems of Online Image Crawling - DATUMO","isPartOf":{"@id":"https:\/\/blog.datumo.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/blog.datumo.com\/en\/tech\/16367#primaryimage"},"image":{"@id":"https:\/\/blog.datumo.com\/en\/tech\/16367#primaryimage"},"thumbnailUrl":"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/image-crawling.png","datePublished":"2022-06-22T07:59:08+00:00","dateModified":"2024-10-22T08:53:53+00:00","description":"we will be discussing two ways of online image crawling, one through a Google Chrome extension named Fatkun Batch Download Image and another by writing Python script to scrape images from a web 
page.","breadcrumb":{"@id":"https:\/\/blog.datumo.com\/en\/tech\/16367#breadcrumb"},"inLanguage":"ko-KR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/blog.datumo.com\/en\/tech\/16367"]}]},{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/blog.datumo.com\/en\/tech\/16367#primaryimage","url":"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/image-crawling.png","contentUrl":"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/06\/image-crawling.png","width":1600,"height":943},{"@type":"BreadcrumbList","@id":"https:\/\/blog.datumo.com\/en\/tech\/16367#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/blog.datumo.com\/en\/"},{"@type":"ListItem","position":2,"name":"Problems of Online Image Crawling"}]},{"@type":"WebSite","@id":"https:\/\/blog.datumo.com\/#website","url":"https:\/\/blog.datumo.com\/","name":"DATUMO","description":"The Data for Smarter AI","publisher":{"@id":"https:\/\/blog.datumo.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/blog.datumo.com\/?s={search_term_string}"},"query-input":"required 
name=search_term_string"}],"inLanguage":"ko-KR"},{"@type":"Organization","@id":"https:\/\/blog.datumo.com\/#organization","name":"DATUMO","url":"https:\/\/blog.datumo.com\/","logo":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/blog.datumo.com\/#\/schema\/logo\/image\/","url":"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/2.1.webp","contentUrl":"https:\/\/blog.datumo.com\/en\/wp-content\/uploads\/2022\/05\/2.1.webp","width":1080,"height":600,"caption":"DATUMO"},"image":{"@id":"https:\/\/blog.datumo.com\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/blog.datumo.com\/#\/schema\/person\/02ec2d0ba953b146878dab089dc735b6","name":"DATUMO","image":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/blog.datumo.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/1942a8a63e1c8fa0d9be56cda789edd6c0a866259cd5dca24952597ffa8bab3d?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/1942a8a63e1c8fa0d9be56cda789edd6c0a866259cd5dca24952597ffa8bab3d?s=96&d=mm&r=g","caption":"DATUMO"},"description":"DATUMO, The Data for Smarter AI. 
We seek to drive impact in the world by providing diverse and high quality data to build smarter AI.","sameAs":["https:\/\/blog.datumo.com\/en"],"url":"https:\/\/blog.datumo.com\/en\/author\/selectstar"}]}},"_links":{"self":[{"href":"https:\/\/blog.datumo.com\/en\/wp-json\/wp\/v2\/posts\/16367","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.datumo.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.datumo.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.datumo.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.datumo.com\/en\/wp-json\/wp\/v2\/comments?post=16367"}],"version-history":[{"count":10,"href":"https:\/\/blog.datumo.com\/en\/wp-json\/wp\/v2\/posts\/16367\/revisions"}],"predecessor-version":[{"id":16923,"href":"https:\/\/blog.datumo.com\/en\/wp-json\/wp\/v2\/posts\/16367\/revisions\/16923"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.datumo.com\/en\/wp-json\/wp\/v2\/media\/16486"}],"wp:attachment":[{"href":"https:\/\/blog.datumo.com\/en\/wp-json\/wp\/v2\/media?parent=16367"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.datumo.com\/en\/wp-json\/wp\/v2\/categories?post=16367"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.datumo.com\/en\/wp-json\/wp\/v2\/tags?post=16367"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}