Author: xiaoyu

Python Data Science

Python data analyst


Interpreting HTTP secrets in crawlers (Basics)

Interpreting HTTP secrets in crawlers (Advanced)

Python crawler simulation login jingdong mall

The previous articles covered the basic concepts and advanced usage of HTTP, along with a practical walkthrough of simulating a JD login. This post continues with another interesting HTTP-related topic, OAuth, which may also come up when a crawler simulates a login.

Definition of OAuth


From the definition of BCO:

The OAuth protocol provides a secure, open, and simple standard for authorizing access to user resources. Unlike earlier authorization methods, OAuth never exposes the user's account credentials (such as user name and password) to the third party: the third party can request authorization for the user's resources without using the user name and password, which is what makes OAuth secure. OAuth is short for Open Authorization.

OAuth is currently at version 2.0, which is specified in RFC 6749. Reference link: tools.ietf.org/html/rfc674… .

Applications of OAuth


Here is a simple and familiar example.

When we visit a website or forum and try to do something that requires a personal account, the page usually prompts us to log in first. What if we don't have an account and don't want to register? We usually click the small icon of a third party (such as WeChat) to log in. Some websites don't even offer user registration and rely entirely on third-party sites for login and user information.

For example, suppose we use a Weibo account to log in to SegmentFault.

The page first redirects to the Weibo login screen. After we enter our account and password, SegmentFault creates a user based on information obtained from the Weibo account (such as avatar, nickname, and friends list). SegmentFault never learns the Weibo password, because the user's login information must be kept secure and cannot be handed over in plain text. This whole sequence of secure authorization steps comes from using the OAuth protocol.

In fact, OAuth addresses several drawbacks of traditional third-party login methods:

  • It avoids using the user name and password directly for third-party login, and uses a token instead, which makes the login process more secure and reliable.
  • It avoids the awkward situation where changing the password invalidates the authorization of every third-party program.
  • It avoids the weakness where a single compromised third-party program leaks the user's information.

These needs are what gave rise to OAuth. So how is third-party authorized login actually implemented, and how does the process work? Let's continue with these questions in mind.

The idea behind OAuth


From the example above, it is easy to see that three parties are involved:

  • Client (SegmentFault above)
  • Third party (Weibo above)
  • User (ourselves)

With this in mind, let's look at the general idea of OAuth authorization.

  • With the OAuth protocol, the client does not deal with the third-party login site directly. Instead, it goes through an authorization layer in between (some sites run the authorization server and the resource server separately, others combine them). Under this authorization layer, the user's password and other sensitive information are never disclosed to the client. Instead, the client is handed a temporary token that stands in for the user's credentials. The token works like a key, but unlike the user's password it is generated by an encryption algorithm and is generally hard to crack.
  • In addition, the user can specify the token's permission scope and validity period, so that resources are opened up only as far as needed.
  • Once authorized, the client carries the token and accesses resources within the permission scope and validity period set by the user.

This is only the general idea: an authorization layer isolates the client from the user's information, and on top of that layer a secure key (the token) completes authorization on the user's behalf.
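To make the idea concrete, here is a toy in-memory sketch of an authorization layer. It is not how Weibo or any real server is implemented: the class name AuthorizationLayer, its methods, and the use of a random string from Python's secrets module as the token are all assumptions made for illustration.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass
class Token:
    value: str          # opaque random string handed to the client
    scope: frozenset    # permissions the user agreed to
    expires_at: float   # Unix time after which the token is invalid


class AuthorizationLayer:
    """Hypothetical stand-in for an authorization server (illustration only)."""

    def __init__(self):
        self._passwords = {}   # user -> password, never shown to clients
        self._tokens = {}      # token value -> Token

    def register_user(self, user, password):
        self._passwords[user] = password

    def authorize(self, user, password, scope, lifetime_seconds):
        # The user authenticates with the authorization layer, not the client.
        if user not in self._passwords or self._passwords[user] != password:
            raise PermissionError("bad credentials")
        token = Token(
            value=secrets.token_urlsafe(32),
            scope=frozenset(scope),
            expires_at=time.time() + lifetime_seconds,
        )
        self._tokens[token.value] = token
        return token.value   # only the token ever reaches the client

    def is_allowed(self, token_value, permission):
        # A request succeeds only if the token exists, is unexpired,
        # and covers the requested permission.
        token = self._tokens.get(token_value)
        return (
            token is not None
            and time.time() < token.expires_at
            and permission in token.scope
        )
```

The client in this sketch only ever holds the returned token value, so revoking or expiring the token cuts off access without touching the user's password.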

OAuth authorization flows


Based on this idea, RFC 6749 defines four authorization grant types:

  • Authorization code
  • Implicit (simplified mode)
  • Resource owner password credentials (password mode)
  • Client credentials
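As a quick reference, the four grant types can be summarized by the response_type and grant_type values that RFC 6749 assigns to them. The dictionary layout and the helper name endpoints_used below are illustrative choices, not part of the standard.

```python
# response_type is sent to the authorization endpoint,
# grant_type is sent to the token endpoint (values from RFC 6749).
GRANT_FLOWS = {
    "authorization_code": {"response_type": "code", "grant_type": "authorization_code"},
    "implicit": {"response_type": "token", "grant_type": None},
    "password": {"response_type": None, "grant_type": "password"},
    "client_credentials": {"response_type": None, "grant_type": "client_credentials"},
}


def endpoints_used(flow):
    """Return which endpoints a flow touches, in the order they are called."""
    values = GRANT_FLOWS[flow]
    used = []
    if values["response_type"] is not None:
        used.append("authorization")
    if values["grant_type"] is not None:
        used.append("token")
    return used
```

Only the authorization code grant uses both endpoints, which is why the walkthrough that follows has a redirect step and a separate token request.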

The Weibo login described above uses the OAuth 2.0 authorization code grant. Before a client can offer third-party login, it has to register an application on Weibo's open platform and fill in its details. After registration, the open platform issues a client_id and an App Secret to the client (such as SegmentFault above) for use in authorization requests.

The following sections walk through Weibo's authorization flow, that is, the authorization code grant. For the other grant types, see the official document: https://tools.ietf.org/html/rfc6749.

The OAuth flow in detail


Here is a detailed flow chart of the OAuth 2.0 protocol:

Taking the SegmentFault login through Weibo as the example, the steps below explain the OAuth authorization process in detail.

<1> Step 1

First, the user clicks the Weibo icon to log in through the third party. The page then jumps to the Weibo login screen and waits for the user to enter the account and password to authorize.

The login URL is as follows:

https://api.weibo.com/oauth2/authorize?client_id=1742025894&redirect_uri=https%3A%2F%2Fsegmentfault.com%2Fuser%2Foauth%2Fweibo&scope=follow_app_official_microblog

The client builds the request URI by adding the following parameters to the query string, using the application/x-www-form-urlencoded format with UTF-8 encoding.

  • client_id: the application ID that SegmentFault obtained from the Weibo open platform (required)
  • redirect_uri: the URL to redirect the user to after authorization (optional)
  • scope: the permission scope the user is asked to authorize (optional)
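As a sketch of how a client might assemble this URL, the function below uses urllib.parse.urlencode from the Python standard library, which applies the application/x-www-form-urlencoded rules. The function name build_authorize_url is an assumption for illustration; the endpoint and parameter values are the ones shown above.

```python
from urllib.parse import urlencode

AUTHORIZE_ENDPOINT = "https://api.weibo.com/oauth2/authorize"


def build_authorize_url(client_id, redirect_uri=None, scope=None):
    """Build the login URL; optional parameters are omitted when not given."""
    params = {"client_id": client_id}
    if redirect_uri is not None:
        params["redirect_uri"] = redirect_uri
    if scope is not None:
        params["scope"] = scope
    # urlencode percent-encodes ":" and "/" inside the redirect URI.
    return AUTHORIZE_ENDPOINT + "?" + urlencode(params)


login_url = build_authorize_url(
    "1742025894",
    redirect_uri="https://segmentfault.com/user/oauth/weibo",
    scope="follow_app_official_microblog",
)
```

With these inputs, login_url should match the login URL shown above character for character.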

<2> Step 2

The page redirects to the redirect_uri address from the previous step, with an authorization code appended as the code parameter. This code is exchanged for the token in the following steps.

The redirect address looks like this:

https://segmentfault.com/user/oauth/weibo?code=e7ec7daeb7bbf8cb9d622152cd449ae0

Parameter Description:

  • code: the authorization code. The client can use it only once; if it is reused, the authorization server rejects the request. The code is bound to the application ID and redirect URI above.

This also shows that response_type here is code.
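On the client side, reading the code out of the redirect address is a matter of parsing the query string. The following is a minimal sketch using the standard library; the function name extract_code and its error handling are assumptions, not something the Weibo documentation prescribes.

```python
from urllib.parse import parse_qs, urlsplit


def extract_code(redirected_url):
    """Return the single authorization code carried in the redirect URL."""
    query = parse_qs(urlsplit(redirected_url).query)
    codes = query.get("code", [])
    if len(codes) != 1:
        # Missing or repeated code parameters are treated as a failed authorization.
        raise ValueError("expected exactly one code parameter")
    return codes[0]


code = extract_code(
    "https://segmentfault.com/user/oauth/weibo?code=e7ec7daeb7bbf8cb9d622152cd449ae0"
)
```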

<3> Step 3

The SegmentFault client uses the authorization code to obtain the token.

The access token is obtained by sending a POST request to Weibo's OAuth2 access_token endpoint: https://api.weibo.com/oauth2/access_token

The request must carry the following parameters.

  • client_id: the AppKey assigned when the application was registered
  • client_secret: the AppSecret assigned when the application was registered
  • grant_type: the request type; set to authorization_code
  • code: the code value obtained from the authorize call
  • redirect_uri: the callback address, which must match the callback address in the registered application
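The parameters above can be packed into a form-encoded POST body. The sketch below only constructs the request object with urllib.request.Request and does not send it; the function name build_token_request and the placeholder credentials in the usage line are assumptions for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

TOKEN_ENDPOINT = "https://api.weibo.com/oauth2/access_token"


def build_token_request(client_id, client_secret, code, redirect_uri):
    """Prepare (but do not send) the POST that exchanges a code for a token."""
    form = {
        "client_id": client_id,
        "client_secret": client_secret,
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
    }
    body = urlencode(form).encode("utf-8")
    return Request(TOKEN_ENDPOINT, data=body, method="POST")


# Placeholder credentials; a real client would use its own AppKey and AppSecret.
token_request = build_token_request(
    "APP_KEY", "APP_SECRET", "AUTHORIZATION_CODE",
    "https://segmentfault.com/user/oauth/weibo",
)
```

Passing token_request to urllib.request.urlopen would perform the exchange. Because the request carries client_secret, it belongs on the client's server and not in a browser.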

<4> Step 4

The previous request returns the token information. An example response:

{
       "access_token": "ACCESS_TOKEN",
       "expires_in": 1234,
       "remind_in": "798114",
       "uid": "12341234"
 }

Parameter Description:

  • access_token: the only ticket representing the user's authorization. It is used to call Weibo's open APIs and is also the only ticket by which a third-party application verifies a Weibo user's login. The third-party application should establish a unique mapping between this ticket and the user in its own system to identify login state; the uid field in this response must not be used for login identification.
  • expires_in: lifetime of the access_token, in seconds.
  • remind_in: lifetime of the access_token.
  • uid: UID of the authorizing user. This field is returned only as a convenience for developers, saving one call to the user/show interface. Third-party applications must not use it to identify login state; only the access_token is the user's authorization ticket.
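A client typically parses this response and converts expires_in into an absolute expiry time so it knows when the token stops working. The sketch below assumes the response body has the shape of the example above; the function name parse_token_response and the returned dictionary layout are illustrative.

```python
import json
import time


def parse_token_response(body, now=None):
    """Extract the token and compute when it expires (Unix time)."""
    data = json.loads(body)
    now = time.time() if now is None else now
    return {
        "access_token": data["access_token"],
        # expires_in is a lifetime in seconds, so add it to the current time.
        "expires_at": now + int(data["expires_in"]),
        # Convenience field only; login state should be keyed on access_token.
        "uid": data.get("uid"),
    }


sample_body = (
    '{"access_token": "ACCESS_TOKEN", "expires_in": 1234, '
    '"remind_in": "798114", "uid": "12341234"}'
)
parsed = parse_token_response(sample_body, now=1000)
```

Accepting now as a parameter keeps the function deterministic in tests; in normal use it can be omitted.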

<5> Step 5

Use the token obtained in the previous step to fetch the user name, profile picture, and other information by requesting the following link:

Api.weibo.com/2/users/sho…

The request must carry the token and UID parameters obtained above.
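A sketch of building that request follows. Because the link above is truncated, the endpoint is passed in as an argument and the usage line uses a placeholder host (api.example.com) that is not the real Weibo URL. Sending the token and UID as query string parameters named access_token and uid is also an assumption here.

```python
from urllib.parse import urlencode


def build_user_info_url(endpoint, access_token, uid):
    """Attach the token and UID to the user-info endpoint as query parameters."""
    return endpoint + "?" + urlencode({"access_token": access_token, "uid": uid})


# Placeholder endpoint for illustration only.
user_info_url = build_user_info_url(
    "https://api.example.com/users/show", "ACCESS_TOKEN", "12341234"
)
```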

<6> Step 6

The response contains the authorized information, such as the user name and profile picture.

That completes the detailed walkthrough of Weibo's OAuth authorization process.

OAuth summary


This article introduced the basic concepts of OAuth and walked through its authorization code flow in detail, using Weibo third-party login as the example.

Reference links:

Open.weibo.com/wiki/OAuth2…
Tools.ietf.org/html/rfc674…
www.ruanyifeng.com/blog/2014/0…

