A small way to optimize JS procedures

F(X) Team of Ali Tao Department – Zhen Zi

Just like you need to read before you learn, I want to introduce my understanding of programming languages before I introduce you to optimizing JavaScript code. The story begins with a mechanical rat called Theseus and its inventor, Claude Shannon.

In the biography A Mind at Play: In How Claude Shannon Invented the Information Age, authors Jimmy Soni and Rob Goodman have a strong desire to present Shannon’s work Theseus to a wide audience. Faced with the complex maze, Theseus only uses a bunch of relays, ROM storage and other simple and ancient electronic components to complete the exploration of the complex maze and the memory of the successful route, the second time along the correct path out of the maze, Theseus did not make any mistakes. Most people dismiss them as gimmicks and gimmicks.

In the eyes of a few smart people, the amazing wisdom contained in Theseus is comparable to that of Newton and Einstein. Shannon alone introduced Boolean algebra into electronic circuit design and inspired the invention of later digital circuits and even computers.

The relationship between digital circuits and programs

Figure 4-44 Intelligent dimming and toning circuit I made in 2011

As shown in Figure 4-44, a digital circuit inspired by Shannon, the ancient electronic components of Theseus have become integrated circuits with red boxes on the right. By combining two capacitors and a crystal oscillator to form an oscillating circuit in the red box on the left, the integrated circuit is provided with a clock. There is a digital-analog conversion circuit on the integrated circuit, which converts the results of Boolean algebra operation circuit represented by high and low levels into analog signal output. The LED module with adjustable light and color can produce different power output: color, frequency output: brightness. Inspired and helped by Shannon, the circuit in Figure 4-44 implements a complete program function: constantly changing the color and brightness of the LED at any time.

If the complete procedure shown in Figure 4-44 is directly referred to the IC manual, write the corresponding instructions and data to the address of the register operation according to the pin definition and manual in Figure 4-45.

Figure 4-45 IC manual information used in my circuit (excerpt from ATMEL website)

As you can see from this example, inspired and helped by Shannon, the greatest benefit of digital circuits is the abstraction of the circuit, so that the program logic and digital circuit at this level of abstraction. At this new level of unified abstraction, the control of program logic can be analogous to control in logic circuits, and the input and output of programs can be analogous to storage (ROM, RAM) in digital circuits. The clock circuit provides a scale for signal transmission in the digital circuit, and the CPU controls signal transmission and flow in the digital circuit according to these scales. This transmission and flow turns point-like control into control flow and point-like storage into data flow. Therefore, the most important things in a program are control flow and data flow.

In order to inject control instructions and data into the digital circuit without having to refer to the manual every time and burn ROM with pins, predecessors invented a set of burn system, which defined and described the control flow and data flow in assembly or C language, and then translated by the compiler into register addresses, instructions and data corresponding to Figure 4-45. Then through the burner into a digital signal through the integrated circuit pin transmission to the integrated circuit to complete the program write. This system allows us to get rid of manuals (not entirely, of course; we still look them up sometimes, but much less often) and describe programs directly in language, and debug and simulate them through simulators (similar to front-end MOCK data), making it surprisingly easy to program digital circuits.

Figure 4-46 Programming a digital circuit (from ATMEL website)

Figure 4-46 This way to control the digital circuit is much easier, you can buy an Arduino development board on Taobao, and then follow the following code to try to understand what the program is essentially? What is a digital circuit? What is the nature of computing?

unsigned long colorT[] = {  0xff3300.0xff3800.0xff4500.0xff4700.0xff5200.0xff5300.0xff5d00.0xff5d00.0xff6600.0xff6500.0xff6f00.0xff6d00.0xff7600.0xff7300.0xff7c00.0xff7900.0xff8200.0xff7e00.0xff8700.0xff8300, you can continue to add}int R_Pin = 11;
int G_Pin = 10;
int B_Pin = 9;
// Here are the pins of integrated circuit output signal corresponding to the connection mode of LED module in the manual
int red,green,blue = 0;
int i = 0;
int l = sizeof(colorT);
void setup(a){
  pinMode(12, OUTPUT);
  pinMode(R_Pin, OUTPUT);
  pinMode(G_Pin, OUTPUT);
  pinMode(B_Pin, OUTPUT);
  digitalWrite(12, LOW);
}
void setColor(int redValue, int greenValue, int blueValue){
  analogWrite(R_Pin, redValue);
  analogWrite(G_Pin, greenValue);
  analogWrite(B_Pin, blueValue);
}
void  loop(a){
  red = (colorT[i] >> 16) & 0xff;
  green = (colorT[i] >> 8) & 0xff;
  blue = (colorT[i] >> 0) & 0xff;
  setColor(red, green, blue);
  i++;
  if(i >= l){
    i = 0;
  }
  delay(200); // Control clock signal
}
Copy the code

With that in mind, let’s take a look at JavaScript control flow and data flow. As mentioned earlier, raw code text (strings) is processed by Parser. The most common purpose is to generate “abstract syntax trees” (AST). Of course, the goal with D2C Schema and DesignToken was to export complete HTML documents with correct, inline CSS.

Abstract syntax tree AST

We convert JavaScript code text to AST because the compiler can’t directly manipulate program text made up of strings, The compiler can only understand the program text by changing it from “1+2” to the form new BinaryExpression(ADD, new Number(1), new Number(2)). How’s that？ Is it very similar to ATMEL integrated circuit programming? Operation ADD and data new Number(1). Thus, the process from the program text to the AST was analogous to the parsing process, which was called parsing. With this in mind, we can play around with program text, which is often referred to as “code,” and turn it into an AST. Recommend esprima, a tool that iterates through AST nodes and does some tinkering. , and finally convert the modified AST into code text using EscodeGen.

// Generate the AST abstract syntax tree
const esprima = require('esprima');
const AST = esprima.parseScript(jsCode);
// Iterate over and modify the AST
const estraverse = require('estraverse');
const escodegen = require('escodegen');
function toEqual(node){
  if(node.operator === '= ='){
    node.operator = '= = ='; }}function walkIn(ast){
  estraverse.traverse(ast, {
    enter: (node) = >{ toEqual(node); }}); }// Code generation from the AST
const escodegen = require('escodegen');
const code = escodegen.generate(ast);
Copy the code

With that in mind, let’s practice with a piece of real code.

acc = 0;
i = 0;
len = loadArrayLength(arr);
loop {
  if (i >= tmp)
    break;

  acc += load(arr, i);
  i += 1;
}
Copy the code

After converting this code into AST using the Parser provided by Esprima, I used GraphViz to convert the AST from JSON format to digraph format. Gv file, and then generated Figure 4-47.

Figure 4-47 Visualizing the AST

Data flow diagram DFG

Figure 4-47 is a tree, so it can be easily traversed, generating the corresponding machine code when we visit the AST node. The problem with this approach is that information about variables is sparse and scattered across different tree nodes. To optimize and safely move length lookups out of the loop, we need to know that the array length does not change between iterations of the loop. Humans can easily do this by looking at the source code, but the compiler needs to do a lot of work to confidently extract these facts directly from the AST. As with many other compiler problems, this is usually solved by elevating the data to a more appropriate abstraction level, called intermediate representation (IR). In this particular case, the choice of IR is called a data Flow graph (DFG). Instead of talking about syntactic entities (for loop, expressions,…) , we should talk about the data itself (read, variable values) and how it changes in the program.

In our particular example, the data we are interested in is the value of the variable ARr. We want to be able to easily observe all uses of it to verify that there are no out-of-bounds access or any other changes that modify the length of the array, which is a prerequisite for our optimization. This is done by introducing “use” (definition and use) relationships between different data values. Specifically, this means that the value has been declared once (_ node _ in Figure 4-47), and it has been used to create a new value (_ edge _ in Figure 4-47). Obviously, concatenating different values produces the data flow diagram shown in Figure 4-48.

Figure 4-48 Data flow diagram

Note the red array box in Data Flow Figure 4-48, and the solid arrow away from it indicates the use of this value. By iterating over these edges, the compiler can export array values for:

loadArrayLength
checkIndex
load

Such graphs are constructed by explicitly “cloning” array nodes if the values of the array nodes are accessed in a destructive manner (that is, storage, length and size). Whenever we see an array node and observe its purpose, we always make sure that its value does not change. This may sound complicated but easy to implement, the data flow diagram follows the single static allocation (SSA) rule. In short, to convert any program to SSA, the compiler needs to rename all assignments and subsequent uses of variables to ensure that each variable is assigned only once.

For example, before SSA:

var a = 1;
console.log(a);
a = 2;
console.log(a);
Copy the code

After the SSA:

var a0 = 1;
console.log(a0);
var a1 = 2;
console.log(a1);
Copy the code

We know from SSA that when we talk about A0 we’re actually talking about its individual tasks.

Control flow diagram CFG

Using data flow analysis to extract information from the program allows us to make security assumptions about how to optimize it. This data-flow representation is useful in many cases, the only problem is that by converting code into data-flow diagrams, this intermediate representation is less suitable for generating machine code in the chain of representations (from source to machine code) than the AST. Since the program logic is a sequential list of instructions that the CPU executes one after another, the data flow diagram does not seem to convey this. Typically, this problem is solved by grouping graph nodes into blocks, a representation called a control flow diagram (CFG).

b0 {
  i0 = literal 0
  i1 = literal 0

  i3 = array
  i4 = jump ^b0
}
b0 -> b1

b1 {
  i5 = ssa:phi ^b1 i0, i12
  i6 = ssa:phi ^i5, i1, i14

  i7 = loadArrayLength i3
  i8 = cmp "<", i6, i7
  i9 = if ^i6, i8
}
b1 -> b2, b3
b2 {
  i10 = checkIndex ^b2, i3, i6
  i11 = load ^i10, i3, i6
  i12 = add i5, i11
  i13 = literal 1
  i14 = add i6, i13
  i15 = jump ^b2
}
b2 -> b1

b3 {
  i16 = exit ^b3
}
Copy the code

As shown in Figure 4-49, we can program it as a control flow diagram in the same way as before.

Figure 4-49 Control flow diagram

As you can see: the code precedes the loop in block B0, the loop header in b1, the loop test in B2, the loop body in B3, and the exit node in B4. It is easy to translate machine code from this example, replacing the iXX with the CPU register name (like the register address found earlier in the ATMEL manual) and generating machine code line by line for each instruction.

CFG has data flow relationships and sequencing, which allows us to use it for data flow analysis and machine code generation. However, trying to optimize a CFG by manipulating the blocks it contains and their contents becomes complex and error-prone. Instead, Clifford Click and Keith D Cooper propose an approach called “node sea” to eliminate the problems of CFGS and complex data flow diagrams.

Node Sea

Remember fancy data flow diagrams with dotted lines? These dotted lines are actually what makes the map a nodal chart. Rather than group and sort the nodes, we choose to declare the control dependencies as dotted edges in the graph. If we delete everything that doesn’t break the line and group things slightly, we get the nodal chart shown in Figure 4-50.

Figure 4-50 Node Sea

Figure 4-50 Node Sea is a very powerful way to view code, with all the information of a general data flow diagram that can be easily changed for optimization without constantly removing/replacing nodes in blocks. The node chart is usually modified by graph reduction, we simply queue all the nodes in the chart, call our function for each node in the queue, and everything involved in this function (changes, substitutions) is put into another queue, which is later passed to the optimization function. If you have many optimization points, for example: merge/reduce network requests, merge/reduce JSBridge calls, merge/reduce local storage API calls, etc., you can stack them on top of each other and apply them on each node in the queue, or one by one if they depend on each other’s final state.

Tao department front – F-X-team opened a weibo! (Visible after microblog recording)

In addition to the article there is more team content to unlock 🔓

A small way to optimize JS procedures

The relationship between digital circuits and programs

Abstract syntax tree AST

Data flow diagram DFG

Control flow diagram CFG

Node Sea

Related Posts

More than 3000 people pay attention to Zhihu! 27W + reading software testing advanced problems: what level of functional testing can be transferred to automation?

Read this NSDate

Wechat applet into the page does not pop up user authorization? What did the authorities say