Preface

Didn't the previous article already cover high-definition images? In "Python crawler bypasses restrictions on one-click search to download creative graphics!", we batch-downloaded small high-definition images from Tuchong Creative without watermarks and without logging in. A small image may be good enough on some mobile devices, but on a PC it is simply too small to look good. It is recommended to read the previous article before this one: this article does not go into implementation details and only outlines the analysis.

This article may not be particularly technical, but the result is usable as a tool for fetching images.

Environment: Python 3 + PyCharm + requests + re + BeautifulSoup + json

Download a larger version

For an image platform, high-quality image downloads are probably the core business, and as shown below, those high-quality large images are expensive to download. The author therefore did not pay for a download to inspect the address of the large image: it is a fair guess that the success rate would be low, and the cost is relatively high.

A preliminary analysis of the Tuchong Creative platform leads to the following conclusions:

Downloading the original high-quality images without watermarks is too expensive. Because the author did not pay for a download, the real address of the original HD unwatermarked images was never found, so there is no way (or ability) to download them. Since this is the site's core business, it is presumably protected layer upon layer and not easy to obtain, so the paid HD unwatermarked originals were not pursued further.
However, the high-quality selected images can be viewed in high definition during preview, with a Tuchong Creative watermark.
The site also offers some free high-definition images. They are not from the selected collection, but the quality is acceptable.

Download free HD images

Tuchong Creative has a section of images that is free and open: the shared images column. Images there can be searched and downloaded.

Tuchong Creative URL: stock.tuchong.com/topic?topic…

An image URL is made up of the image server domain name and the photo ID.

The image ID is hidden in the page's JavaScript. Extract the IDs and piece each one together with the URL pattern to get all the image addresses.

Search URL: https://stock.tuchong.com/free/search/?term= followed by the search keywords.
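The URL scheme described above can be sketched as follows. The helper names `build_search_url` and `build_image_url` are made up for illustration. The hosts, the `weili/l/` path, the `.webp` suffix and the `&use=0` parameter are the ones used in the script at the end of this article, and the site may have changed since then.

```python
from urllib import parse

# Hosts taken from the script in this article; they may no longer be valid.
IMAGE_HOSTS = ("weiliicimg9.pstatp.com", "icweiliimg9.pstatp.com")


def build_search_url(term, free=True):
    """Return the search page URL for a keyword (hypothetical helper)."""
    section = "free/" if free else ""
    return ("https://stock.tuchong.com/" + section
            + "search?term=" + parse.quote(term) + "&use=0")


def build_image_url(image_id, host=IMAGE_HOSTS[0]):
    """Piece an image ID together with an image server domain name."""
    return "https://" + host + "/weili/l/" + str(image_id) + ".webp"


print(build_search_url("cat"))
print(build_image_url(123456))
```

Keeping both hosts in one tuple makes it easy to try the second one when the first does not return the image.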

Download selected images with watermarks

The best pictures are in the selected images column, and watermarked versions of them can be had for free: after logging in to your account, click preview to see the image. Each image corresponds to a unique ID, which can be obtained this way, but doing so is cumbersome. Can we find a simple, universal URL instead?

As above, the image URL pattern is shared with the free images, so the selected images can also be downloaded in batches once their IDs are known.

# JS parsing rule
js = soup.select('script')
js = js[4]
pattern = re.compile(r'window\.hits = (\[)(.*)(\])')
va = pattern.search(str(js)).group(2)  # contents of the window.hits array
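As a minimal offline sketch of the same idea, the array assigned to `window.hits` can be captured with one regular expression and handed to `json.loads` in a single step. The sample script text and the `extract_hits` name are invented for illustration, and real pages carry many more fields per entry.

```python
import json
import re

# Made-up sample of what such a script tag might contain.
SAMPLE_SCRIPT = (
    '<script>window.hits = [{"imageId": "111", "title": "cat"}, '
    '{"imageId": "222", "title": "dog"}];</script>'
)

# Greedy match: assumes the array sits on one line with no later "]" on it.
HITS_PATTERN = re.compile(r"window\.hits = (\[.*\])")


def extract_hits(script_text):
    """Return the window.hits array as a list of dicts, or [] if absent."""
    match = HITS_PATTERN.search(script_text)
    if match is None:
        return []
    return json.loads(match.group(1))


hits = extract_hits(SAMPLE_SCRIPT)
print([hit["imageId"] for hit in hits])
```

Parsing the array as a whole avoids splitting the string on braces, which would break as soon as an entry contains a nested object.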

The selected images, illustrations included, are of much higher quality, so use them if a watermark is acceptable. The watermark is the only drawback.

Code and Summary

import requests
from urllib import parse
from bs4 import BeautifulSoup
import re
import json

# Request headers copied from a browser session
header = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36',
    'Cookie': 'wluuid=66; ',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
    'Accept-encoding': 'gzip, deflate',  # 'br' omitted: requests only decodes Brotli when a brotli package is installed
    'Accept-language': 'zh-CN,zh;q=0.9',
    'Cache-Control': 'max-age=0',
    'connection': 'keep-alive',
    'Host': 'stock.tuchong.com',
    'Upgrade-Insecure-Requests': '1',
}
def mkdir(path):
    import os  # imported here so the function is self-contained
    path = path.strip()  # remove leading and trailing whitespace
    path = path.rstrip("\\")  # remove a trailing backslash
    isExists = os.path.exists(path)  # True if the path exists, False otherwise
    if not isExists:
        os.makedirs(path)  # create the directory if it does not exist
        return True
    else:
        # the directory already exists, so nothing is created
        return False
def downloadimage(imageid, imgname):  # download a large image (watermarked for selected images)
    url = 'https://weiliicimg9.pstatp.com/weili/l/' + str(imageid) + '.webp'
    url2 = 'https://icweiliimg9.pstatp.com/weili/l/' + str(imageid) + '.webp'
    r = requests.get(url)
    print(r.status_code)
    if r.status_code != 200:
        r = requests.get(url2)  # fall back to the second image host
    if r.status_code != 200:
        print(imgname + " could not be downloaded")
        return
    # The data is WebP; the file is saved with a .jpg extension as in the original script
    with open(imgname + '.jpg', 'wb') as f:
        f.write(r.content)
        print(imgname + " downloaded successfully")
def getText(text, free):
    texturl = parse.quote(text)
    url = "https://stock.tuchong.com/" + free + "search?term=" + texturl + "&use=0"
    print(url)
    req = requests.get(url, headers=header)
    soup = BeautifulSoup(req.text, 'lxml')  # requires the lxml package
    js = soup.select('script')
    if free:
        js = js[1]  # shared (free) images page
        path = 'no_watermark/'
    else:
        js = js[4]  # selected images page
        path = 'tuchong_creative/'
    pattern = re.compile(r'window\.hits = (\[)(.*)(\])')
    va = pattern.search(str(js)).group(2)  # contents of the window.hits array
    # Parse the whole array at once instead of splitting the string on braces
    hits = json.loads('[' + va + ']')
    mkdir('img2/' + path + text)
    for index, data in enumerate(hits, start=1):
        try:
            print(data)
            imgname = 'img2/' + path + text + '/' + data['title'] + str(index)
            imgid = data['imageId']
            downloadimage(imgid, imgname)
        except Exception as e:
            print(e)


if __name__ == '__main__':
    num = input("Enter 1 for watermarked high-quality large images, 2 for ordinary images without watermarks: ")
    num = int(num)
    free = ''
    if num == 2:
        free = 'free/'
    text = input('Enter keywords: ')
    getText(text, free)

That completes the whole process. In the output directory, watermarked Tuchong Creative images and unwatermarked images are kept in separate folders for convenience. To use the script, first enter 1 or 2 (1 for watermarked high-quality images, 2 for shared images), then enter keywords to download in batches.
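One detail the script leaves open is that image titles are used directly in file paths, and a title may contain characters such as `/` that are not valid there. A small sketch of one way to build the per-keyword paths is given here. The names `safe_name` and `target_path` and the folder names are hypothetical and not part of the script above.

```python
import re
from pathlib import Path


def safe_name(title):
    """Replace characters that are awkward in file names with underscores."""
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", title).strip("_")
    return cleaned or "untitled"


def target_path(keyword, title, index, watermarked, root="img2"):
    """Build root/<kind>/<keyword>/<title><index>.webp (hypothetical layout)."""
    kind = "watermarked" if watermarked else "no_watermark"
    file_name = safe_name(title) + str(index) + ".webp"
    return Path(root) / kind / safe_name(keyword) / file_name


path = target_path("cat", "sleepy cat / sofa", 1, watermarked=False)
print(path.as_posix())
```

The sketch uses a `.webp` suffix because that is the format the image URLs point to.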

bigsai
Learn together and make progress together


