Crawling my Jike collection

I often save a few racy pictures to my Jike collection, but Jike's login mechanism gave me a lot of trouble. Articles about it were hard to find online, and the one I did find was hard to follow. Once I roughly understood how it works, I put together my own set of scripts.

Jike's mechanism works like this: after the user scans the QR code to log in to the web site, they get an access_token, and a few minutes later a refresh_token. On later visits, the site calls an endpoint that sends the refresh_token to the server, which returns a freshly refreshed access_token. The very first refresh_token has to be copied out of the browser. The code is as follows:

import requests

# Send the refresh_token to the server and get back a freshly refreshed access_token
def refresh_token(refresh_token):
    user_agent = getUserAgent()  # pick a random User-Agent
    url = "https://app.jike.ruguoapp.com/app_auth_tokens.refresh"
    headers = {"Origin": "https://web.okjike.com", "Referer": "https://web.okjike.com/collection", "User-Agent": user_agent}
    headers["x-jike-refresh-token"] = str(refresh_token)
    r = requests.get(url, headers=headers)
    # print(r.text)
    content = r.text  # JSON text that contains the new x-jike-access-token
    return content
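Because the refresh_token is the part that survives between sessions, it can help to keep it in one place and only hit the refresh endpoint when no access_token is cached yet. The sketch below is a hypothetical TokenStore class, not part of Jike's API or of my scripts: refresh is any callable that sends the refresh_token and returns the parsed JSON (for example, refresh_token() above followed by json.loads), and the only response key it relies on is x-jike-access-token, which the main block further down also reads.

```python
from typing import Callable, Dict, Optional


class TokenStore:
    """Hypothetical helper: keeps the refresh_token and a cached access_token."""

    def __init__(self, refresh_token: str, refresh: Callable[[str], Dict[str, str]]):
        self.refresh_token = refresh_token  # copied from the browser once
        self._refresh = refresh             # sends the refresh_token, returns parsed JSON
        self.access_token: Optional[str] = None

    def renew(self) -> str:
        # Send the refresh_token and keep the access_token from the response
        result = self._refresh(self.refresh_token)
        self.access_token = result["x-jike-access-token"]
        return self.access_token

    def get(self) -> str:
        # Only call the refresh endpoint when no access_token is cached yet
        if self.access_token is None:
            return self.renew()
        return self.access_token
```

Calling renew() again when requests start failing because the access_token has expired is the natural next step; how the server signals expiry is an assumption I have not verified.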

Picking a random User-Agent (tl and st come from tools, my own utility module):

import random

from tools import Tools as tl      # my own helper module
from tools import Settings as st   # holds user_agent_list

def getUserAgent():
    # Return one User-Agent string chosen at random from the list in Settings
    return random.choice(st.user_agent_list)

Once you have the latest access_token, you can do whatever you want with it:

import json  # used by both this block and startSpider

if __name__ == '__main__':
    # Keep this block at the end of the script so startSpider is already defined when it runs
    # The first refresh_token, copied from the browser
    token_json = refresh_token('eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJkYXRhIjoibjV0dVlqcVMrV0VVSDJKYTMwY0JYOTNcL1p4RTlqRExGTW1PZGRXcU9iaWZqOEZ3M3RrNjNNXC81enJsTUQ5ajNVMFVJRHZSNjlzYmhOWTBDejlQTXdXalwvSzBUcHRpRXJFMFZnXC9NSFVOYjFHaDVGajFzSEVKWm42TzR5aUk3XC9IaklrUENNeHNsSXRmNm1nVWdTUGZBbG1jZkNkdUdsblwvTGRvVGQ1UFJjQ3FNPSIsInYiOjMsIml2IjoiTWFQdTlpRUJqbUVcLzlIZURGdVVhZUE9PSIsImlhdCI6MTU1MzQwNzgwMS43NDl9.jWG-7-dUjZqSrgMJVnj1pIf52tqoSMHav_mop0_aABI')
    dic = json.loads(token_json)
    startSpider('https://app.jike.ruguoapp.com/1.0/users/collections/list', dic['x-jike-access-token'])
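The refresh_token is a JWT: its first segment decodes to {"alg":"HS256","typ":"JWT"}, and in the token used here the payload carries an iat (issued-at) timestamp next to an encrypted data field. A quick way to check when a copied token was issued is to decode the payload locally. This is a generic sketch (decode_jwt_payload and issued_at are my own illustrative names); it does not verify the signature, so it is only for inspection.

```python
import base64
import json
from datetime import datetime, timezone


def decode_jwt_payload(token: str) -> dict:
    """Decode the middle (payload) segment of a JWT without verifying the signature."""
    payload = token.split(".")[1]
    # JWT segments are base64url without "=" padding; restore it before decoding
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def issued_at(token: str) -> datetime:
    # "iat" is a UNIX timestamp in seconds (it may be fractional)
    return datetime.fromtimestamp(decode_jwt_payload(token)["iat"], tz=timezone.utc)
```

Usage: print(issued_at(my_refresh_token)) shows when the token you copied from the browser was created.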

Let the crawler run!

import json

import requests

loadMoreKey = None  # global paging cursor: Jike returns it with each page and expects it back for the next one

def startSpider(url, access_token):
    global loadMoreKey  # must be declared before loadMoreKey is used in this function
    user_agent = getUserAgent()
    headers = {
        "Accept": "application/json",
        "App-Version": "5.3.0",
        "Content-Type": "application/json",
        "Origin": "https://web.okjike.com",
        "platform": "web",
        "Referer": "https://web.okjike.com/collection",
        "User-Agent": user_agent,
    }
    headers["x-jike-access-token"] = access_token
    tl.UsingHeaders = headers  # save the request headers so downloads use the same ones
    data = {'limit': 20, 'loadMoreKey': loadMoreKey}

    response = requests.post(url, headers=headers, data=json.dumps(data))
    print(response.status_code)
    data = json.loads(response.content.decode("utf-8"))
    loadMoreKey = data.get('loadMoreKey')
    for dic in data['data']:
        # Each collected post may carry several pictures (or none)
        for picDic in dic.get('pictures', []):
            tl.downLoadFile(picDic['picUrl'])
    print('------ End 20 records ------')
    # Assumption: a missing or empty loadMoreKey marks the last page, so stop instead of looping forever
    if loadMoreKey:
        startSpider(url, access_token)
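startSpider pages through the collection by recursing once per page and keeping loadMoreKey in a global. The same cursor-style paging can be written as a plain loop, which avoids Python's recursion limit on very long collections and needs no global. In this sketch fetch_page is a hypothetical callable that does one POST like startSpider and returns the decoded JSON; treating a missing or empty loadMoreKey as the last page is an assumption.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional


def iter_collection(fetch_page: Callable[[Optional[Any]], Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield every collected item, following loadMoreKey from page to page."""
    load_more_key = None  # the first request sends no cursor
    while True:
        page = fetch_page(load_more_key)
        yield from page.get("data", [])
        load_more_key = page.get("loadMoreKey")
        # Assumption: no cursor in the response means there are no more pages
        if not load_more_key:
            break


def picture_urls(items) -> List[str]:
    # Flatten the picUrl of every picture in every item
    return [pic["picUrl"] for item in items for pic in item.get("pictures", [])]
```

Each URL from picture_urls(iter_collection(fetch_page)) can then be handed to the same download helper.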
