This article is for academic exchange only. Do not use it for commercial purposes or any improper conduct.

If it violates your privacy or rights, please contact me and I will delete it immediately.

AST stands for abstract syntax tree. The name sounds advanced, but don't be intimidated: you can think of it simply as your JS code sorted into a JSON structure, together with a set of methods for adding to, deleting from, modifying, and querying that JSON.


When you learn something new, you first need to figure out what it is for. If you don't even know what it's for, why learn it?


For a crawler engineer, JS reverse engineering is routine, and it often means facing all kinds of obfuscated code that is extremely hard to read. In those cases we can use AST to restore the obfuscated code to a certain degree and get relatively readable JS, which makes the analysis easier.


Reading this article requires a little AST background. Don't worry: you don't need to master all of AST to follow along, but you should at least understand how to locate nodes in an AST. For learning AST, be sure to look for Boss Cai.


If you read through this article and understand the concepts, you’ll get a feel for how the AST restores obfuscated code, and realize that the AST isn’t that hard after all.


The file to deobfuscate is the following:

https://static.geetest.com/static/js/fullpage.8.9.5.js

Save it locally and fold the code to see the overall structure

Take a quick look at these functions

Well, I can’t read anything…


Don't panic. Calm down, analyze it bit by bit, and solve it bit by bit!


Looking at the code, AJgjJ.DAi is called in many places; a search finds 515 matches, as shown below

There are two types of calls in the match result, as shown in the red box above

The first, AJgjJ.DAi(79), has parentheses; it is not yet clear what it does;

The second, AJgjJ.DAi without parentheses, appears in code such as

mZtVWz = ['qhicV'].concat(AJgjJ.DAi)

concat() is used to merge arrays

So AJgjJ.DAi should behave something like an array

Print it in the console and see

Sure enough, it works like an array: a value can be retrieved by index, and the return value is a string.


The question is: how do we restore each AJgjJ.DAi() call in the code to its corresponding string?

One obvious way is plain string matching: find every occurrence of "AJgjJ.DAi()" in the JS code, take the index number inside the parentheses, pass that number into the function AJgjJ.DAi() to compute the result, and then substitute the result for the matched "AJgjJ.DAi()" text in the code.
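The string-matching idea above can be sketched roughly as follows. This is only an illustration: the lookup table and the tiny `AJgjJ.DAi` defined here are hypothetical stand-ins, since the real decoder has to be taken from the target file.

```javascript
// Hypothetical stand-in for the real decoder: the table contents are invented
// for illustration and do not come from fullpage.8.9.5.js.
const fakeTable = ['length', 'push', 'toString'];
const AJgjJ = { DAi: (index) => fakeTable[index] };

// Replace every `AJgjJ.DAi(<number>)` in the source text with the decoded
// value, emitted as a quoted string literal.
function replaceDecoderCalls(source) {
  return source.replace(/AJgjJ\.DAi\((\d+)\)/g, (match, index) =>
    JSON.stringify(AJgjJ.DAi(Number(index)))
  );
}

const obfuscated = 'var n = arr[AJgjJ.DAi(0)]; arr[AJgjJ.DAi(1)](n);';
const restored = replaceDecoderCalls(obfuscated);
console.log(restored); // var n = arr["length"]; arr["push"](n);
```

A regular expression like this only sees text, so it cannot tell a real call from the same characters inside a string or comment, which is one reason to work on the AST instead.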


Let’s copy the code to the online AST parsing site and see what it looks like in the AST

https://astexplorer.net/



As you can see, every AJgjJ.DAi call sits in a CallExpression node. So we first traverse the CallExpression nodes and read the name property of the callee to check whether it is DAi. If it is, we read the value property of the argument, pass that value into the function AJgjJ.DAi() for calculation, and replace the node with the result through AST operations.
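As a rough, dependency-free sketch of that traversal, the snippet below walks a hand-written ESTree-style tree instead of the output of a real parser. The node shapes follow what astexplorer.net shows for a call such as AJgjJ.DAi(1); the `decode` function and its table are hypothetical stand-ins for the real decoder.

```javascript
// Hypothetical stand-in decoder; the real one comes from the target file.
const fakeTable = ['length', 'push'];
const decode = (index) => fakeTable[index];

// Hand-written node for the expression: AJgjJ.DAi(1)
const callNode = {
  type: 'CallExpression',
  callee: {
    type: 'MemberExpression',
    object: { type: 'Identifier', name: 'AJgjJ' },
    property: { type: 'Identifier', name: 'DAi' },
  },
  arguments: [{ type: 'NumericLiteral', value: 1 }],
};
// Hand-written node for the statement: x = AJgjJ.DAi(1);
const ast = {
  type: 'ExpressionStatement',
  expression: {
    type: 'AssignmentExpression',
    operator: '=',
    left: { type: 'Identifier', name: 'x' },
    right: callNode,
  },
};

// True for calls of the form <something>.DAi(<number>).
function isDecoderCall(node) {
  return node.type === 'CallExpression' &&
    node.callee.type === 'MemberExpression' &&
    node.callee.property.name === 'DAi' &&
    node.arguments.length === 1 &&
    typeof node.arguments[0].value === 'number';
}

// Depth-first copy of the tree that swaps matching calls for string literals.
function restore(node) {
  if (Array.isArray(node)) return node.map(restore);
  if (node === null || typeof node !== 'object') return node;
  if (isDecoderCall(node)) {
    return { type: 'StringLiteral', value: decode(node.arguments[0].value) };
  }
  const copy = {};
  for (const key of Object.keys(node)) copy[key] = restore(node[key]);
  return copy;
}

const restoredAst = restore(ast);
console.log(JSON.stringify(restoredAst.expression.right));
```

With a real toolchain the same check would live in a CallExpression visitor and the swap would be a node replacement; the hand-rolled walker here just keeps the example self-contained.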


To pass a value into AJgjJ.DAi(), we first need to pull the AJgjJ.DAi code out of the JS file and load it into Node so it can be evaluated.
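One possible way to do that, sketched under assumptions: the `decoderSource` string below is a made-up placeholder for the decoder section cut out of the target file, and `new Function` is just one way to evaluate it.

```javascript
// Hypothetical source text standing in for the decoder section extracted from
// the target file; the real AJgjJ.DAi body is much longer and not shown here.
const decoderSource = `
  var AJgjJ = {};
  AJgjJ.DAi = function (index) {
    var table = ['length', 'push', 'toString'];
    return table[index];
  };
`;

// Evaluate the extracted text once and hand back the object it defines.
const loadDecoder = new Function(decoderSource + '\nreturn AJgjJ;');
const AJgjJ = loadDecoder();

console.log(AJgjJ.DAi(2)); // toString
```

Evaluating extracted code really executes it, so it is worth doing this only with the decoder section itself and in an environment you are comfortable running it in.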

Code implementation:

Effect after replacement:

As you can see, there were originally eight AJgjJ.DAi() calls with parentheses, and all of them have been restored after processing

There are 519 more occurrences of AJgjJ.DAi without parentheses

Next, let's deal with these AJgjJ.DAi occurrences without parentheses

Analyze it before dealing with it

Look at the image above:

nxWF = AJgjJ.DAi;

concat merges arrays, so

mZtVWz = ['qhicV'].concat(nxWF)

is in fact equivalent to

mZtVWz = ['qhicV',AJgjJ.DAi]

while

oeXg = mZtVWz[1];

is equivalent to

oeXg = AJgjJ.DAi;

All of the nxWF names in the red boxes and the oeXg names in the green boxes in the figure are in fact equal to AJgjJ.DAi
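The aliasing can be checked directly in plain JS. In this small sketch the `AJgjJ.DAi` function is a hypothetical stand-in for the real decoder; the variable names are the ones from the excerpt above.

```javascript
// Hypothetical stand-in for the real decoder function.
const AJgjJ = { DAi: (index) => ['length', 'push'][index] };

const nxWF = AJgjJ.DAi;
// concat appends a non-array argument (here a function) as a single element.
const mZtVWz = ['qhicV'].concat(nxWF);
const oeXg = mZtVWz[1];

console.log(mZtVWz.length);      // 2
console.log(oeXg === AJgjJ.DAi); // true
console.log(oeXg(1));            // push
```

So a call such as oeXg(12) is really a call to AJgjJ.DAi(12) under another name.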


Our approach here: traverse all the names at these two positions and put them into an array, then traverse every call of the form oeXg(12) and check whether the callee name is in that array; if so, replace the call
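A two-pass sketch of this idea is shown below. It is simplified on purpose: the "program" is a flat list of hand-built assignment nodes rather than a parsed file, the array name mZtVWz is fixed as in the excerpt, and `decode` is a hypothetical stand-in for the real decoder.

```javascript
// Small helpers for building ESTree-style nodes by hand.
const id = (name) => ({ type: 'Identifier', name });
const num = (value) => ({ type: 'NumericLiteral', value });
const assign = (left, right) => ({ type: 'AssignmentExpression', left: id(left), right });
const member = (object, property, computed) =>
  ({ type: 'MemberExpression', object: id(object), property, computed });
const call = (callee, arg) =>
  ({ type: 'CallExpression', callee: id(callee), arguments: [num(arg)] });

// nxWF = AJgjJ.DAi; oeXg = mZtVWz[1]; y = oeXg(0); z = other(0);
const program = [
  assign('nxWF', member('AJgjJ', id('DAi'), false)),
  assign('oeXg', member('mZtVWz', num(1), true)),
  assign('y', call('oeXg', 0)),
  assign('z', call('other', 0)),
];

const decode = (index) => ['length', 'push'][index]; // hypothetical stand-in decoder

// Pass 1: collect every name assigned the decoder, directly or via mZtVWz[1].
const nameArray = [];
for (const node of program) {
  const r = node.right;
  if (r.type !== 'MemberExpression') continue;
  const direct = !r.computed && r.property.name === 'DAi';
  const viaArray = r.computed && r.object.name === 'mZtVWz' && r.property.value === 1;
  if (direct || viaArray) nameArray.push(node.left.name);
}

// Pass 2: replace calls whose callee is one of the collected names.
const restored = program.map((node) => {
  const r = node.right;
  if (r.type === 'CallExpression' && nameArray.includes(r.callee.name)) {
    return { ...node, right: { type: 'StringLiteral', value: decode(r.arguments[0].value) } };
  }
  return node;
});

console.log(nameArray);
console.log(restored[2].right, restored[3].right.type);
```

Calls through names that were never collected, such as other(0) here, are left untouched.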



Code implementation:

Look at the effect

We can see that the calls made through the names in the array nameArray have been restored, but some redundant code remains. How can we remove it?


Continue to analyze AST structure

We can see that the three groups on the left correspond to the three nodes on the right. We just need to locate the three nodes on the right and delete them

The first node is located by checking whether the name under the VariableDeclaration's property is DAi

Code implementation:

Result after restoration:

It looks a little bit clearer


Further analysis reveals that the code contains many Unicode escape sequences

In the AST structure, you can see that the Unicode-escaped text lives in the node's extra property, so we just need to delete extra to restore the readable strings
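Why deleting extra helps can be shown with a toy printer. The `printString` function below is an assumption made for illustration, not the real generator: it mimics the common behaviour of preferring the original source text stored in extra.raw and falling back to the decoded value when that is missing.

```javascript
// Toy printer (illustrative stand-in): prefer the original source text in
// `extra.raw`, and only fall back to the decoded `value` when it is absent.
function printString(node) {
  if (node.extra && node.extra.raw !== undefined) return node.extra.raw;
  return JSON.stringify(node.value);
}

// Node for the literal '\u0068\u0069' roughly as a parser would record it:
// `value` already holds the decoded text, `extra.raw` the escaped source.
const node = {
  type: 'StringLiteral',
  value: 'hi',
  extra: { raw: "'\\u0068\\u0069'", rawValue: 'hi' },
};

const before = printString(node); // still the escaped source text
delete node.extra;
const after = printString(node);  // now printed from the decoded value

console.log(before);
console.log(after);
```

The decoded string was in the tree all along; removing extra only stops the printer from echoing the escaped original.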



Code implementation:

Effect:

After processing, you will find that a large number of Unicode-escaped strings still remain

These are Chinese characters in Unicode escape form, and the solution is to pass a second argument (the options) to the generator

opts = {jsescOption:{"minimal":true}}

After adding it, run the restoration again

You can see that the Chinese Unicode escapes are easily restored
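The effect of the minimal flag can be illustrated with a toy escaper. This `printString` is an assumption for illustration only, not the generator's real string printer, and it ignores quotes, backslashes, and characters outside the Basic Multilingual Plane.

```javascript
// Toy escaper (illustrative stand-in): with minimal=false every non-ASCII
// character is written as a \uXXXX escape; with minimal=true it is kept as-is.
function printString(value, { minimal }) {
  let out = '';
  for (const ch of value) {
    const code = ch.codePointAt(0);
    if (!minimal && code > 0x7e && code <= 0xffff) {
      out += '\\u' + code.toString(16).toUpperCase().padStart(4, '0');
    } else {
      out += ch;
    }
  }
  return "'" + out + "'";
}

const text = '中文';
console.log(printString(text, { minimal: false })); // escaped form
console.log(printString(text, { minimal: true }));  // readable form
```

Both outputs denote the same string at runtime; the option only changes how readable the generated source is.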

The remaining 37 matches are all inside regular expressions; they don't matter and don't need to be processed


The code also contains some eval calls

Once restored, the content passed to eval turns out to be JS code in string form

So before deobfuscating, you can execute the eval content in your browser, take the resulting values, format them, and substitute them in. That way the string-form JS code gets restored along with everything else. I won't go into detail here.


Continuing the analysis, careful readers will notice that the Geetest JS code contains many blocks with the structure below, which seriously hurts readability

This structure is commonly known as control flow flattening. Simply put, the relationships between code blocks are broken apart, and a dispatcher controls the jumps between blocks, as shown in the figure below


I'll start with a simple example to make this clear

The switch is the flow controller. Because each case ends with continue, the loop keeps iterating through the arr array: each time, the switch fetches a value from arr and executes the matching case, which scrambles the apparent order of the program and makes the code harder to analyze. Once that case has run, the next value is fetched from arr. The loop does not exit until arr has been fully traversed.


So the arr array is the real execution order of the program. We just need to traverse it and lay out the corresponding case blocks in that order to get back to straight-line code.
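A minimal sketch of such a flattened function and its restored form, with invented step names and order string (the structure follows the description above, not any specific code from the target file):

```javascript
// Flattened version: the order string decides which case runs next.
function flattened() {
  const log = [];
  const arr = '2|0|1'.split('|');
  let i = 0;
  while (true) {
    switch (arr[i++]) {
      case '0': log.push('step B'); continue;
      case '1': log.push('step C'); continue;
      case '2': log.push('step A'); continue;
    }
    break; // reached only when arr is exhausted and no case matches
  }
  return log;
}

// Restored version: the case bodies laid out in the order given by arr.
function restored() {
  const log = [];
  log.push('step A');
  log.push('step B');
  log.push('step C');
  return log;
}

console.log(flattened()); // [ 'step A', 'step B', 'step C' ]
console.log(restored());  // [ 'step A', 'step B', 'step C' ]
```

Both functions do the same work in the same order; the flattened one just hides that order inside the '2|0|1' string.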



Geetest differs a little from the example above. Each case reassigns the value used in the control flow's judgment condition, and the dispatcher uses that condition to control the flow, so the key is being able to tell which case will execute next.


The idea for restoring the control flow: first get the initial value of the flow-control variable and the condition value tested by the for loop, then extract all the cases. Traverse the cases, compute each case's test value, and compare it with the current value. If they match, delete the useless statements, store the case's remaining statements in an array that collects the case blocks, and update the current value to the one the case assigns. Finally, replace the ForStatement with the collected array of statements. See the code for details
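The core of that idea, following the state value from case to case, can be sketched without any AST library. Everything concrete here is invented for illustration: the state numbers, the statement strings, and the shape of the `cases` records, which stand in for what would be extracted from the SwitchCase nodes.

```javascript
// Each record: the state value the case tests for, its real statements (with
// the state update and break already stripped), and the state it sets next.
const cases = [
  { test: 7, body: ['c();'], next: 9 },
  { test: 3, body: ['a();'], next: 5 },
  { test: 5, body: ['b();'], next: 7 },
];
const initState = 3;  // value assigned just before the for loop
const breakState = 9; // value the for loop's test compares against

// Follow the state from case to case and collect the bodies in run order.
function unflatten(cases, initState, breakState) {
  const resultBody = [];
  const seen = new Set();
  let state = initState;
  while (state !== breakState) {
    if (seen.has(state)) throw new Error('cycle in control flow at state ' + state);
    seen.add(state);
    const current = cases.find((c) => c.test === state);
    if (!current) throw new Error('no case for state ' + state);
    resultBody.push(...current.body);
    state = current.next;
  }
  return resultBody;
}

console.log(unflatten(cases, initState, breakState)); // [ 'a();', 'b();', 'c();' ]
```

This sketch only handles straight-line flows; a case that picks its next state conditionally would need extra handling.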


For the code implementation, see the full article on the WeChat official account: JS reverse: restoring Geetest's obfuscated JS with AST in practice


Take a look at the corresponding JS and AST structure of the code implementation

// Get the VariableDeclaration that precedes the ForStatement
var PrevSibling = path.getPrevSibling();

The current node is the ForStatement, and its previous sibling is the VariableDeclaration, that is, the statement directly above the for statement, which holds the initial value of the control flow.

var argNode = PrevSibling.container[0].declarations[0].init;
var init_arg_f = argNode.object.property.value;
var init_arg_s = argNode.property.value;
var init_arg = AJgjJ.EMf()[init_arg_f][init_arg_s];

// Extract the value that the for node's test compares against; it serves as the exit condition
var break_arg_f = node.test.right.object.property.value;
var break_arg_s = node.test.right.property.value;
var break_arg = AJgjJ.EMf()[break_arg_f][break_arg_s];


// Extract and compute the value that the case tests against
var case_arg_f = case_list[i].test.object.property.value;
var case_arg_s = case_list[i].test.property.value;
var case_init = AJgjJ.EMf()[case_arg_f][case_arg_s];

// All nodes under the current case (a SwitchCase keeps its statements in consequent)
var targetBody = case_list[i].consequent;


// Extract the two index values that follow AJgjJ.EMf() in the node just before the break
var change_arg_f = targetBody[targetBody.length - 2].expression.right.object.property.value;
var change_arg_s = targetBody[targetBody.length - 2].expression.right.property.value;
// Update the current value of the control flow
init_arg = AJgjJ.EMf()[change_arg_f][change_arg_s];

targetBody.pop(); // remove the break
targetBody.pop(); // remove the node just before the break

resultBody = resultBody.concat(targetBody);


Compare the code above against the AST structure as you explore; the comments are fairly detailed


After the processing above, you have JS code that is relatively easy to read, which makes the reverse analysis somewhat easier


The result after restoration


That completes today's hands-on walkthrough of restoring Geetest's obfuscated code with AST. If you have a better idea, you are welcome to share it with me.


The above is only a summary of my personal study notes. The ideas mainly follow the WeChat official account: do some reverse 778. I am sharing it in the hope that it helps you; if there are mistakes or omissions, please forgive me


WeChat official account: life to the wind