Crawler scripts come up all the time, and sooner or later they have to log in.

In Python this usually means using the Requests library and filling in the POST data and request headers yourself. Some sites also require hidden parameters, which you can only find with the browser's developer tools (F12) or by inspecting the page elements. These are a common pitfall for beginners.
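Hidden parameters are usually `<input type="hidden">` fields in the login form, such as a CSRF token. The following minimal sketch uses only the standard library's `html.parser` to collect those fields and merge them with the credentials before the POST. The default field names `login` and `password` come from the GitHub form used later in this article and are an assumption for any other site.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect the name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <input ... /> tags also arrive here, because
        # HTMLParser.handle_startendtag calls handle_starttag by default
        if tag != "input":
            return
        attrs = dict(attrs)
        if (attrs.get("type") or "").lower() == "hidden" and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def build_login_payload(login_page_html, username, password,
                        user_field="login", password_field="password"):
    """Merge the page's hidden inputs with the visible credential fields."""
    parser = HiddenInputParser()
    parser.feed(login_page_html)
    payload = dict(parser.fields)
    payload[user_field] = username
    payload[password_field] = password
    return payload


html = ('<form><input type="hidden" name="authenticity_token" value="abc123">'
        '<input name="login"></form>')
print(build_login_payload(html, "me", "secret"))
# {'authenticity_token': 'abc123', 'login': 'me', 'password': 'secret'}
```

The resulting dictionary is what you would pass as `data=` to `session.post(...)`.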

Then there is the captcha problem. One approach is to download the captcha image, recognize it, and then submit the answer in the POST. Another is to use a cloud captcha-solving platform. Of course, some captchas are fiendishly hard to solve, such as ones that ask you to click characters in a given order, slider captchas, or the 12306 image captchas, where even humans pick the wrong answer.
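The download, recognize, then submit approach can be sketched as follows. This is a minimal sketch, not the project's code. `recognize` is a hypothetical callback that could wrap an OCR library, a cloud solving platform, or a person typing. The form field the answer goes into (`captcha` in the comment) is also an assumption and differs from site to site. The image is fetched with the same Requests session that will later submit the login, because servers typically tie the expected answer to the session cookies.

```python
def solve_captcha(session, captcha_url, recognize):
    """Download a captcha image with `session` and return the recognized text.

    `recognize` is a hypothetical callback: image bytes in, answer string out.
    """
    # Same session as the login POST, so the cookies issued together with
    # the image are sent back together with the answer.
    resp = session.get(captcha_url, timeout=10)
    resp.raise_for_status()
    return recognize(resp.content).strip()


def ask_a_human(image_bytes, path="captcha.png"):
    """Simplest recognizer: save the image and let a person type the answer."""
    with open(path, "wb") as f:
        f.write(image_bytes)
    return input("Open " + path + " and type the captcha: ")


# Usage with a requests.Session (URL and field name are hypothetical):
# answer = solve_captcha(session, "https://example.com/captcha.png", ask_a_human)
# payload["captcha"] = answer
```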

awesome-python-login-model is a GitHub project that uses Python to log in to mainstream platforms. It covers 24 major platforms and currently has 11.8k stars on GitHub.

GitHub link: github.com/Kr1s77/awes…

Mainstream websites already supported

The author has already implemented logins for a number of major sites, using three approaches: logging in with Selenium, simulating the login directly by replaying requests found through packet capture, and the Scrapy framework.

This makes sense. Some websites are designed in such complex ways that reproducing the login from captured packets is hard, and for those sites, driving a real browser with Selenium and WebDriver is comparatively easy.
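With Selenium, the login is just the browser filling in the form. The following minimal sketch assumes Selenium 4, where `find_element(by, value)` accepts the string `"name"` (the value of `By.NAME`). It also assumes the form fields are named `login`, `password` and `commit`, which are the names the GitHub example in this article posts. Other sites will use different names. The fixed `time.sleep` is a crude placeholder for a proper wait.

```python
import time


def browser_login(driver, login_url, username, password, wait_seconds=3):
    """Log in through a real browser driven by Selenium and return its cookies."""
    driver.get(login_url)
    # "name" is the string behind selenium.webdriver.common.by.By.NAME
    driver.find_element("name", "login").send_keys(username)
    driver.find_element("name", "password").send_keys(password)
    driver.find_element("name", "commit").click()
    # Crude wait for the post-login redirect; a WebDriverWait on an element
    # that only appears after login would be more robust.
    time.sleep(wait_seconds)
    # A list of dicts with keys such as name, value, domain and path
    return driver.get_cookies()


# Usage (needs Selenium and a matching browser driver installed):
# from selenium import webdriver
# driver = webdriver.Chrome()
# cookies = browser_login(driver, "https://github.com/login", "me", "secret")
```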

Even when Selenium handles the login, you can keep the cookies it obtains and then use Requests or Scrapy for the actual data collection, which keeps crawling fast.
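Handing the browser's cookies over to Requests can be sketched like this. The sketch assumes the list-of-dicts format returned by Selenium's `driver.get_cookies()`. Only the keys that Requests' cookie jar accepts are copied, because passing extra keys such as `httpOnly` to `cookies.set()` raises a `TypeError`. Reusing the browser's User-Agent is optional and is included on the assumption that some sites check it.

```python
import requests


def session_from_browser_cookies(cookies, user_agent=None):
    """Build a requests.Session that reuses cookies from Selenium's get_cookies()."""
    session = requests.Session()
    if user_agent:
        # Optional: present the same User-Agent the browser used
        session.headers["User-Agent"] = user_agent
    for c in cookies:
        # Selenium also returns keys like httpOnly and sameSite, which the
        # Requests cookie jar rejects, so only compatible fields are copied
        session.cookies.set(
            c["name"],
            c["value"],
            domain=c.get("domain", ""),
            path=c.get("path", "/"),
            secure=c.get("secure", False),
            expires=c.get("expiry"),
        )
    return session


# Usage after a Selenium login:
# session = session_from_browser_cookies(driver.get_cookies())
# resp = session.get("https://github.com/")
```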

Simulated login to GitHub

Here is the code that simulates logging in to GitHub:

```python
"""Simulated GitHub login (second approach)

author: CriseLYJ
github: https://github.com/CriseLYJ/
update_time: 2019-3-7
"""

import getpass
import re

import requests
from lxml import etree


class GithubLogin(object):

    def __init__(self, email, password):
        # Browser-like headers so the requests resemble a normal browser session
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) '
                          'AppleWebKit/537.36 (KHTML, like Gecko) '
                          'Chrome/71.0.3578.98 Safari/537.36',
            'Referer': 'https://github.com/',
            'Host': 'github.com',
        }
        self.login_url = 'https://github.com/login'
        self.post_url = 'https://github.com/session'
        # One Session carries cookies from the token request to the login POST
        self.session = requests.Session()
        self.email = email
        self.password = password

    def get_token(self):
        """Fetch the login page and read the hidden authenticity_token field."""
        response = self.session.get(self.login_url, headers=self.headers)
        if response.status_code != 200:
            print('Get token fail')
            return None
        html = etree.HTML(response.content.decode())
        tokens = html.xpath('//input[@name="authenticity_token"]/@value')
        if not tokens:
            print('Get token fail')
            return None
        return tokens[0]

    def login_GitHub(self):
        token = self.get_token()
        if token is None:
            return False
        post_data = {
            'commit': 'Sign in',
            'utf8': '✓',
            'authenticity_token': token,
            'login': self.email,
            'password': self.password,
        }
        resp = self.session.post(self.post_url, data=post_data, headers=self.headers)
        print('StatusCode:', resp.status_code)
        if resp.status_code != 200:
            print('Login failed!')
            return False

        # The original check: a logged-in page carries
        # <meta name="user-login" content="<username>">; treat a missing or
        # empty value as a failed login
        match = re.search(r'"user-login" content="(.*?)"', resp.text)
        if not match or not match.group(1):
            print('Login failed!')
            return False
        print('UserName:', match.group(1))
        print('Login successful!')
        return True


if __name__ == '__main__':
    email = input('Account: ')
    password = getpass.getpass('Password: ')  # input is not echoed
    login = GithubLogin(email, password)
    login.login_GitHub()
```

I think this is a good tutorial for beginners learning to write crawlers.

A word of caution, though: simulated login code can stop working at any time, because a site's front-end HTML, CSS, JavaScript and other structures may change whenever the company adjusts its business.
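One practical defence against this kind of breakage is to check, before posting, that the form fields the script depends on still exist, so a page redesign produces a clear error instead of a confusing failure later on. The following is a minimal standard-library sketch. The default field names are the ones the GitHub example posts and are an assumption for any other site.

```python
from html.parser import HTMLParser


class InputNameCollector(HTMLParser):
    """Record the name attribute of every <input> element on a page."""

    def __init__(self):
        super().__init__()
        self.names = set()

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            name = dict(attrs).get("name")
            if name:
                self.names.add(name)


def check_login_form(html, expected=("authenticity_token", "login", "password")):
    """Raise RuntimeError if any field the login script relies on has disappeared."""
    collector = InputNameCollector()
    collector.feed(html)
    missing = [name for name in expected if name not in collector.names]
    if missing:
        raise RuntimeError("Login form changed; missing fields: " + ", ".join(missing))
```

Calling `check_login_form(response.text)` right after fetching the login page turns a silent layout change into an explicit error message.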

So the key is to master the underlying techniques. Once you have learned them well enough to debug a complete login flow yourself, you can become a contributor too!

