Why write this article? Mainly because some people in the QQ group have been trying to simulate the Zhihu login without success. So I captured the packets and found that Zhihu's login page has been revised, and the difficulty has increased considerably.

Start capturing packets

First, open the Zhihu home page and log in with your account and password (remember to enter a wrong password).

Now we can see the request headers (below)

Several of the request headers differ from those of an ordinary request (red box)

  1. authorization: looks like it is generated by JS; we will come back to it later
  2. Content-Type: the value carries a boundary=XXX suffix
  3. cookie: note that the cookie is already non-empty before login, so there must be a Set-Cookie operation before login
  4. x-udid, x-sxrftoken: both are validation parameters, which can usually be found in the page source

Next, look at the request parameters

You can see that the parameters are sent as a request payload

This is the first time I have seen a form submitted like this

This has to be read together with the Content-Type request header:

multipart/form-data; boundary=----WebKitFormBoundary2KNsyxgtG28t93VF

multipart/form-data is a way of submitting a form, and boundary=XXX specifies how the form fields are separated. A simple example shows why.

------WebKitFormBoundary2KNsyxgtG28t93VF is the separator between the different parameters, so you can simply strip it out (it is determined by the boundary value in the Content-Type above and can be changed freely).

Once the separator lines are removed, the payload is equivalent to client_id=c3cef7c66a1843f8b3a9e6a1e3160e20, grant_type=password.

So this payload is pretty easy to understand.
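To make the boundary idea concrete, here is a minimal sketch that builds a multipart/form-data body by hand and parses it back. The helper names build_multipart and parse_multipart are made up for this illustration, and the parser only handles plain text fields like the ones in this payload.

```python
def build_multipart(fields, boundary):
    """Encode a dict of text fields as a multipart/form-data body."""
    lines = []
    for name, value in fields.items():
        # Each part starts with "--" followed by the boundary string.
        lines.append("--" + boundary)
        lines.append('Content-Disposition: form-data; name="{}"'.format(name))
        lines.append("")  # blank line separates part headers from the value
        lines.append(value)
    # The closing delimiter has an extra "--" at the end.
    lines.append("--" + boundary + "--")
    lines.append("")
    return "\r\n".join(lines)


def parse_multipart(body, boundary):
    """Recover the name/value pairs from a text-only multipart body."""
    fields = {}
    for part in body.split("--" + boundary):
        part = part.strip("\r\n")
        if not part or part == "--":
            continue  # skip the preamble and the closing delimiter
        headers, _, value = part.partition("\r\n\r\n")
        name = headers.split('name="', 1)[1].split('"', 1)[0]
        fields[name] = value
    return fields


boundary = "----WebKitFormBoundary2KNsyxgtG28t93VF"
body = build_multipart(
    {"client_id": "c3cef7c66a1843f8b3a9e6a1e3160e20", "grant_type": "password"},
    boundary,
)
print(body)
print(parse_multipart(body, boundary))
```

Note that the separator lines in the body have two more dashes than the boundary value in the header, because every delimiter is "--" plus the boundary.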

So let’s see what the parameters are

There are a lot of parameters. Many of them are straightforward, such as the account, password, and timestamp.

The two that are not obvious are client_id and signature.

Start looking for parameters

Authorization:

In Chrome, press Ctrl + Shift + F to search across all loaded resources (JS, CSS, and so on). The search finds the value, hard-coded in a JS file. If we then log in with a different account and capture the packets again, the authorization value is unchanged. So authorization is written directly into the JS rather than generated dynamically, and we now have its value.

Cookies:

Before login the cookie is already non-empty, which means some Set-Cookie operations must happen when the page is opened. To verify this, open an incognito window (so that no earlier cookies interfere) and then open zhihu.com. The server does send several Set-Cookie headers.

So if we want to emulate this, the easy way is to use requests.Session directly.
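A minimal sketch of that idea, assuming the requests library is installed. The helper names and the User-Agent string are placeholders chosen for this example; the point is only that a Session stores whatever cookies the server sets and sends them back on later requests.

```python
import requests


def make_session(user_agent):
    """Create a Session that keeps cookies across requests."""
    session = requests.Session()
    # A browser-like User-Agent; the exact value is up to you.
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch_initial_cookies(session, url="https://www.zhihu.com/"):
    """Open the page once so the Set-Cookie responses land in the session."""
    session.get(url, timeout=10)
    return session.cookies.get_dict()


# Usage (performs a real network request):
# session = make_session("Mozilla/5.0")
# print(fetch_initial_cookies(session))
```

Every later request made through the same session object carries those cookies automatically, so there is no need to copy them by hand.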

x-udid,x-sxrftoken:

Validation parameters like these are usually embedded in the page source, so look there directly.

They are indeed there. What remains is extracting them: you can use a regular expression, or locate them with XPath.
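As a sketch of the regular-expression route: the snippet below pulls a quoted value out of a page by its key. The sample markup and the key names "xsrf" and "xUDID" are invented for illustration; check the real page source for the actual names and layout before relying on this.

```python
import html
import re


def extract_token(page_source, key):
    """Return the string value stored under `key`, or None if absent."""
    # Pages often embed JSON inside attributes with &quot; entities,
    # so decode entities first.
    text = html.unescape(page_source)
    pattern = r'"{}"\s*:\s*"([^"]+)"'.format(re.escape(key))
    match = re.search(pattern, text)
    return match.group(1) if match else None


# Hypothetical page fragment, not copied from the real site.
sample = '<div data-state="{&quot;xsrf&quot;:&quot;abc123&quot;,&quot;xUDID&quot;:&quot;udid456&quot;}"></div>'
print(extract_token(sample, "xsrf"))
print(extract_token(sample, "xUDID"))
```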

client_id:

You will notice that client_id is exactly the same as the authorization value above.

signature:

Again, use the Ctrl + Shift + F global search.

The name is found, but this parameter is generated dynamically by JS…

The basic idea is to figure out how it is computed and then reproduce that in Python.

Step 1: Download the JS and format it (to make the code readable)

Step 2: Replace the site's JS with the copy you just formatted

Step 3: Debug slowly… until we figure out how the value is generated…

This is the general procedure.

But if your JS is as weak as mine, you can just locate the JS that does the encryption and have Python execute it…

At this point we have found every parameter we need, and all that is left is to simulate sending the request.
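A rough sketch of how the pieces could be assembled with requests, without actually sending anything. The URL, the header value, and the helper name build_login_request are placeholders; the real endpoint and the full field list have to come from your own packet capture. Passing each field as (None, value) through the files argument makes requests encode the body as multipart/form-data and generate the boundary itself.

```python
import requests


def build_login_request(session, url, fields, extra_headers):
    """Prepare (but do not send) a multipart/form-data POST."""
    # (None, value) means "plain form field, no filename".
    files = {name: (None, str(value)) for name, value in fields.items()}
    request = requests.Request("POST", url, headers=extra_headers, files=files)
    # prepare_request merges the session's cookies and default headers.
    return session.prepare_request(request)


session = requests.Session()
prepared = build_login_request(
    session,
    "https://example.com/sign_in",  # placeholder, not the real endpoint
    {"client_id": "c3cef7c66a1843f8b3a9e6a1e3160e20", "grant_type": "password"},
    {"authorization": "value-found-in-the-js"},  # placeholder value
)
print(prepared.headers["Content-Type"])
print(prepared.body.decode("utf-8"))

# To actually send it: response = session.send(prepared)
```

Do not set Content-Type by hand here: requests fills it in together with the boundary it generated, which is why the boundary value from the browser capture does not need to be reused.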

The full code is available from the WeChat public account [Python crawler share]: send “Zhihu login code” to get it.