In this post we continue learning about the encodings, message digest algorithms, and encryption algorithms that are commonly used in development. For developers, this knowledge offers more options when designing anti-crawler measures. For crawler engineers, it makes it faster to find a way in when faced with an unfamiliar string. Once we are comfortable with JS encryption and reverse engineering, we can handle crawler problems such as the following:

(1) encryption of the password and other request parameters in a simulated login
(2) capturing and decrypting dynamically loaded, encrypted data

1. Background

The WeChat Public Platform is known for short as official accounts. It was previously called the official account platform, the media platform, and WeChat public numbers, and was finally positioned as the Public Platform, which shows that WeChat has greater expectations for what comes next.

Running self-media activity through an official account is, put simply, a one-to-many media activity. For example, a business applies for a WeChat service account and, through secondary development, presents a micro website, micro membership, micro push, micro payment, micro events, micro registration, micro sharing, micro business cards, and so on. This has become a mainstream form of online WeChat interactive marketing. In this article, a crawler is used to simulate logging in to the WeChat Public Platform.

2. Analysis

Open the homepage of the WeChat Public Platform and choose to log in with an account, as shown in the figure below:

Then enter the account and password in the corresponding input boxes, as shown below:

Use a keyboard shortcut, or right-click and select Inspect, to open the browser developer tools (Google Chrome in this case), then select the Network tab in the top navigation bar. Since logins are usually made with Ajax requests, click XHR below it. Once this is done, click the login button on the page and watch for the requested URL, as shown in the figure below.

We then searched globally for the keyword pwd (the password parameter) and found it in one CSS file and one JS file. The logic that generates pwd is most likely in the JS file, as shown in the figure below:
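Outside the browser, the same kind of global search can be approximated on files saved from the Sources panel. The sketch below is only an illustration: the file names and contents in sample_sources are hypothetical stand-ins, not the real WeChat assets, and find_keyword is a helper invented for this example.

```python
from typing import Dict, List, Tuple


def find_keyword(sources: Dict[str, str], keyword: str) -> List[Tuple[str, int, str]]:
    """Return (file name, 1-based line number, stripped line) for each line containing keyword."""
    hits = []
    for name, text in sources.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if keyword in line:
                hits.append((name, lineno, line.strip()))
    return hits


# Hypothetical file contents, used only to demonstrate the search.
sample_sources = {
    "login.css": ".pwd-input { width: 200px; }",
    "login.js": "var a = 1;\nvar data = { pwd: getPwd(password) };",
}

for name, lineno, line in find_keyword(sample_sources, "pwd"):
    print(f"{name}:{lineno}: {line}")
```

A hit in a stylesheet is usually just a class name, while a hit in a script next to a function call is the more promising place to set a breakpoint.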

After opening the JS source file, click Pretty print ({}) in the lower left corner to format the code, then press Ctrl + F and search for the pwd keyword again, as shown below:

If left-clicking or Ctrl + left-clicking a function does not jump to its definition, we can set a breakpoint and step into it, as shown in the figure below:

Here we can locate the return value that holds the processed password. The return value calls several different functions, and clicking each one takes us to its definition. In general, the functions it uses sit in the same scope, that is, inside the {} one level above the function that produces the return value. After finding all the code, copy it into a JS debugging tool. The blogger uses Clockwork JS debugging tool V1.9 here, which readers can search for and download themselves, as shown in the figure:

After the code loads successfully, we call the function with the password entered on the web page to see whether it generates the corresponding encrypted string, as shown in the figure below:

Compared with the value on the web page, the two are exactly the same. At this point we have found all the functions related to the JS encryption.
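When the encrypted string is 32 hexadecimal characters long, one quick hypothesis worth testing is that it is an MD5-style digest. The sketch below shows one way to test that in Python. It is an assumption for illustration only: the article does not state which algorithm the site uses, and looks_like_md5 and is_plain_md5 are helper names invented here.

```python
import hashlib
import re

# 32 lowercase hex characters is the shape of an MD5 hex digest.
MD5_HEX = re.compile(r"[0-9a-f]{32}")


def looks_like_md5(value: str) -> bool:
    """Check only the shape of the string, not how it was produced."""
    return MD5_HEX.fullmatch(value.lower()) is not None


def is_plain_md5(plaintext: str, captured: str) -> bool:
    """Return True if captured equals the MD5 hex digest of plaintext (UTF-8)."""
    digest = hashlib.md5(plaintext.encode("utf-8")).hexdigest()
    return digest == captured.lower()


# Shape check on the string printed by the example later in this article.
print(looks_like_md5("d3786ec2413a8cd9413bfcb24be95a73"))

# Sanity check against a well-known MD5 test vector.
print(is_plain_md5("abc", "900150983cd24fb0d6963f7d28e17f72"))
```

If is_plain_md5 returns False for your own input and captured value, the site is applying something else (a salt, truncation, or a different algorithm), and the JS functions found above remain the reference.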

3. Code implementation

In the process above we found all the functions involved in the JS encryption. There are two ways to proceed:

(1) Rewrite all the JS encryption functions following Python's syntax rules. This requires a solid Python foundation, so it is harder and not recommended.
(2) Use a third-party Python module to execute the JS code we found. Install it first:

    pip install PyExecJS  # readers can add their own mirror source

Installing the Node.js environment before using this module is recommended. The documentation for this module is at:

https://pypi.org/project/PyExecJS/
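Before relying on PyExecJS, it can help to confirm that a Node.js executable is actually reachable. The following is a minimal sketch using only the standard library. The executable names node and nodejs are assumptions about how Node.js is installed on a given machine, and find_node is a helper invented for this example.

```python
import shutil
from typing import Optional, Sequence


def find_node(candidates: Sequence[str] = ("node", "nodejs")) -> Optional[str]:
    """Return the path of the first candidate executable found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


node_path = find_node()
if node_path:
    print("Node.js found at", node_path)
else:
    print("Node.js not found on PATH; PyExecJS would need another JavaScript runtime")
```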

Here the blogger first puts the JS code related to password generation into a separate JS file, reads that file in Python, and then calls the installed third-party module to execute it, as shown below:

Example code is as follows:

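    # -*- coding: utf-8 -*-
    """
    @author: AmoXiang
    @file: wechat.py
    @time: 2020/12/01
    """
    import execjs

    # node = execjs.get()  # instantiate a runtime object
    # Read the extracted JS source and compile it into an execution context.
    with open("./getPwd.js", "r", encoding="utf-8") as f:
        ctx = execjs.compile(f.read())

    # Build the JS call expression; the password is passed as a quoted string.
    fun_name = 'getPwd("{}")'.format("135790")
    pwd = ctx.eval(fun_name)
    print(pwd)  # d3786ec2413a8cd9413bfcb24be95a73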
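The call expression handed to ctx.eval is assembled with string formatting, where it is easy to lose the quotes around a string argument. One possible way to avoid that is to JSON-encode each argument, since JSON strings and numbers are also valid JavaScript literals. The build_call helper below is a hypothetical sketch, not part of PyExecJS.

```python
import json


def build_call(func_name: str, *args) -> str:
    """Build a JavaScript call expression with JSON-encoded arguments."""
    encoded = ", ".join(json.dumps(arg) for arg in args)
    return f"{func_name}({encoded})"


expr = build_call("getPwd", "135790")
print(expr)  # getPwd("135790")
```

The resulting string can be passed to ctx.eval as before. PyExecJS contexts also provide ctx.call("getPwd", "135790"), which passes arguments without manual string building.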

The running results of the program are as follows:

4. Summary

  1. JS debugging tool: Clockwork JS debugging tool

  2. PyExecJS: executes JavaScript code from Python. Installation:
     (1) Node.js development environment
     (2) pip install PyExecJS

  3. Exploring the rewrite of a JS algorithm: set breakpoints and debug the code; if a variable turns out to be missing, it can generally be defined as an empty dictionary

  4. Case study (1): reversing and rewriting the JS of the WeChat Public Platform
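The tip about missing variables can be sketched as a small preprocessing step: before compiling extracted JS, prepend empty object definitions for browser globals that do not exist outside the page. The names window and navigator and the sample JS below are assumptions for illustration, and with_stubs is a helper invented here.

```python
from typing import Iterable


def with_stubs(js_source: str, missing_names: Iterable[str]) -> str:
    """Prepend `var name = {};` for each missing global, then the original source."""
    stubs = "".join(f"var {name} = {{}};\n" for name in missing_names)
    return stubs + js_source


# Hypothetical extracted code that touches a browser-only global.
js = "function getUa() { return navigator.userAgent; }"
patched = with_stubs(js, ["window", "navigator"])
print(patched)
```

An empty object only prevents a reference error. If the algorithm actually reads a property of the stubbed object, that property has to be filled in with the value observed in the browser.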

This concludes today's case. I wrote this article solely for learning and exchange, and to save time for readers who are learning the basics of Python; it is not intended for any other purpose. Thank you for reading this blog post, and I hope it serves as a guide on your programming path.