How will V8’s TurboFan affect the way we optimize our code

2017-07-27

GET READY: A NEW V8 IS COMING, Node.js PERFORMANCE IS CHANGING

This article first appeared in the Zhihu column Big Front-end Engineer.

The article was reviewed by Franziska Hinkelmann and Benedikt Meurer from the V8 team.

Since day one, Node.js has relied on the V8 JavaScript engine to execute our JavaScript code. The V8 engine is a JavaScript virtual machine built by Google and used in its Chrome browser. From the beginning, V8’s main goal was to make JavaScript perform faster, or at least better than the competition. For JavaScript, a highly dynamic, weakly typed language, this is no easy task. Let’s start by reviewing the evolution of V8 and other JS engines in terms of performance.

The core element that allows the V8 engine to execute JavaScript quickly is the JIT (Just In Time) compiler, a dynamic compiler that optimizes code at run time. The original V8 JIT compiler was named FullCodegen, and the V8 team subsequently implemented Crankshaft, which included many performance optimizations not available in FullCodegen.

Thanks to Yang Guo for pointing out that FullCodegen was V8’s first optimizing compiler.

I’ve been watching and using JavaScript since the 1990s, and in my experience code execution speed is often counterintuitive in any JavaScript engine; it can be hard to know what is behind some apparently slow JavaScript code.

For years, Matteo Collina and I have been working on how to write high performance Node.js code. This means figuring out what code executes fast or slow on a V8 engine.

But now it’s time to challenge what we already know, because the V8 team has introduced a new JIT compiler: TurboFan.

There were some well-known coding patterns that resulted in inefficient execution (although this may no longer be the case with TurboFan), and Matteo and I made some other, lesser-known discoveries in our Crankshaft performance research. In this article, we’ll look at all of these points and see how they change across versions of V8 through a series of performance benchmarks.

Of course, before optimizing our code for V8, we should first focus on API design, algorithms, and data structures. The performance tests in this article are simply to compare the efficiency of JavaScript running in Node. We can certainly change the way we write code to make it run more efficiently, but before we do that, we should first use some more general optimization techniques.

Next we’ll test V8 5.1, 5.8, 5.9, 6.0, and 6.1.

V8 5.1 was the engine used by Node 6 and included the Crankshaft JIT compiler; V8 5.8 is used in Node 8.0 through 8.2, running a mix of Crankshaft and TurboFan.

V8 6.0 is included in Node 8.3 (and 8.4), and at the time of this writing the latest V8 version is 6.1, which is integrated into the node-v8 test repository (https://github.com/nodejs/node-v8). In other words, V8 6.1 will eventually appear in a future version of Node, possibly Node.js 9.

Along with the test results, we’ll discuss what these changes mean for the future. These benchmarks are run using benchmark.js, and the resulting value reflects the number of executions per second, so in each chart, a higher result value means better performance.
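For context, a minimal benchmark.js suite might look like the following. This is a sketch, not the authors’ actual harness; the test names here are hypothetical placeholders:

    const Benchmark = require('benchmark')
    const suite = new Benchmark.Suite()

    suite
      .add('variant A', function () { /* code under test */ })
      .add('variant B', function () { /* an alternative form */ })
      .on('cycle', function (event) {
        console.log(String(event.target)) // prints ops/sec for each case
      })
      .run()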

Try / Catch

A well-known anti-optimization pattern is to use try/catch.

In this test, we sum the numbers from 1 to a given natural number and compare the performance of four coding patterns:

  • The summing loop wrapped in a try/catch (sum with try catch)

  • The same summing loop with no try/catch (sum without try catch)

  • The sum written as a function and called inside a try block (sum wrapped)

  • The sum written as a function and simply called directly (sum function)

Code: https://github.com/davidmarkclements/v8-perf/blob/master/bench/try-catch.js
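A minimal sketch of the four cases, simplified from the benchmark linked above (the benchmark code is the reference):

    const n = 10000

    // sum with try catch: the summing loop runs inside a try/catch
    let total = 0
    try {
      for (let i = 1; i <= n; i++) total += i
    } catch (e) {}

    // sum without try catch: the same loop with no try/catch
    total = 0
    for (let i = 1; i <= n; i++) total += i

    // sum wrapped: the sum is a function, called inside a try block
    function sum (limit) {
      let t = 0
      for (let i = 1; i <= limit; i++) t += i
      return t
    }
    try { sum(n) } catch (e) {}

    // sum function: the same function, called directly
    sum(n)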

The results show that our original view of try/catch causing performance problems is still true for Node 6 (V8 5.1), but the impact of try/catch on performance has been significantly reduced for Node 8.0-8.2 (V8 5.8).

Also note that in Node 6 (V8 5.1) and Node 8.0-8.2 (V8 5.8), executing a function inside the try block is much slower than executing it outside the try block.

For Node 8.3+, however, the performance penalty of executing functions inside the try block is negligible.

Don’t celebrate too soon, though. Matteo and I were preparing material for some performance workshops when we discovered a V8 performance bug: a particular combination of code logic can cause TurboFan to fall into an endless cycle of deoptimization/reoptimization (a total performance killer).

Removing a property from an object

For years, anyone who wanted to write high-performance JavaScript avoided the delete operator (or at least we avoided delete when trying to optimize hot code paths).

The root of delete’s problems lies in how V8 handles the dynamic nature of JavaScript objects and prototype chains (which are also potentially dynamic), both of which make property lookup very complicated for the engine to implement.

In order to speed up property access, the V8 engine creates C++ classes for objects at the native level, based on each object’s “structure.” The “structure” of an object refers to the keys and values of the properties it contains (including keys and values on the prototype chain); these C++ classes are called “hidden classes.” However, this optimization happens at run time, and if the structure of an object is uncertain, V8 cannot create a hidden class for it and has to fall back to a much slower way of finding properties: a hash-table lookup. Historically, once we delete a property from an object, subsequent property access on that object becomes a hash-table lookup. This is why we avoid delete. Assigning undefined to the property achieves a similar effect, which is fine in most cases, except that it makes checking whether the property exists slightly trickier. Also, JSON.stringify does not include undefined values in its output (undefined is not a valid value according to the JSON specification), so this approach works fine for object serialization.
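A minimal sketch of the two approaches:

    const withDelete = { a: 1, b: 2 }
    delete withDelete.a   // pushes the object into slow, hash-table lookup

    const withUndefined = { a: 1, b: 2 }
    withUndefined.a = undefined   // the hidden class is preserved

    // undefined values are omitted from JSON output anyway:
    JSON.stringify(withUndefined) // '{"b":2}'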

Leaking and converting the arguments object

arguments is an implicit object available inside every normal JavaScript function (arrow functions do not have their own arguments object). It is often problematic because it looks like an array but is not one.

In order to use array methods on the arguments object, its indexed properties must first be copied into a real array. In the past, JavaScript developers tended to believe that less code meant faster code. For code that has to be delivered to a browser, this crude rule does pay off in transfer and load size, but on the server, where execution speed matters far more than code size, it can cause problems. A popular, seemingly smart and convenient way to convert the arguments object into an array is Array.prototype.slice.call(arguments): it calls the array slice method with the arguments object as its execution context (this), and slice treats the array-like object as an array, returning a new array containing all of its members.

However, exposing the implicit arguments object outside of its own function context (for example, returning it from the function, or passing it to another function, as Array.prototype.slice.call(arguments) does) causes performance degradation. Let’s see whether that is still the case.

The following test compares, across our four V8 versions, the cost of leaking the arguments object versus copying its contents into an array (which is then passed out of the function in the same way):

  • Exposing the arguments object to another function, with no array conversion (leaky arguments)

  • Converting arguments into an array with Array.prototype.slice.call(arguments) (Array prototype.slice arguments)

  • Copying the arguments properties into an array with a for loop (for-loop copy arguments)

  • Converting arguments into an array with the ES2015 spread operator (spread operator)

Code: https://github.com/davidmarkclements/v8-perf/blob/master/bench/arguments.js
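A minimal sketch of the four variants (simplified; the benchmark at the link above is the reference, and the helper function use here is hypothetical):

    function use (args) {
      return args.length
    }

    // leaky arguments: the arguments object escapes to another function
    function leaky () {
      return use(arguments)
    }

    // Array prototype.slice arguments
    function slice () {
      return use(Array.prototype.slice.call(arguments))
    }

    // for-loop copy arguments, into a pre-allocated array
    function copy () {
      const args = new Array(arguments.length)
      for (let i = 0; i < arguments.length; i++) args[i] = arguments[i]
      return use(args)
    }

    // spread operator
    function spread () {
      return use([...arguments])
    }

    leaky(1, 2, 3)
    slice(1, 2, 3)
    copy(1, 2, 3)
    spread(1, 2, 3)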

We can also use a line graph to show the same data as above to see the change in performance more clearly:

In summary, if we want to write high-performance code that handles function inputs as arrays (a common requirement in my experience), we should use the spread operator in Node 8.3 or later. On Node 8.2 and earlier, we should use a for loop to copy the arguments values into a new (pre-allocated) array (you can see how to do this in our test code).

Meanwhile, on Node 8.3+, exposing the arguments object to other functions no longer causes performance degradation. So if we don’t need a real array and can work with the array-like structure directly, we can simply pass the arguments object around; performance is actually better.

Currying and function binding

Currying is a way of storing state in nested closures.

For example:


      
    function add (a, b) {
      return a + b
    }

    const add10 = function (n) {
      return add(10, n)
    }

    console.log(add10(20))

In this example, the parameter a of function add is fixed to 10 in function add10.

Using the bind method provided in EcmaScript 5, the above example can be simplified to:

                                                                                    
      
    function add (a, b) {
      return a + b
    }

    const add10 = add.bind(null, 10)

    console.log(add10(20))

But because bind’s performance is significantly slower than that of closures, we generally don’t use bind.

The following test compares the performance of bind and closures on the V8 versions we tested.

We compared the following four situations:

  • A function that calls another function with the first argument pre-supplied, forming a curried function

  • The same curried function written as an arrow function (fat arrow curry)

  • A curried function created with bind

  • Calling the function directly, with no currying at all

Code: https://github.com/davidmarkclements/v8-perf/blob/master/bench/currying.js
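For reference, the closure and fat-arrow variants from the list above might look like this (a sketch, assuming the add function from earlier):

    function add (a, b) {
      return a + b
    }

    // curried via a normal function (closure)
    function add10 (n) {
      return add(10, n)
    }

    // fat arrow curry
    const add10Arrow = n => add(10, n)

    console.log(add10(20))      // 30
    console.log(add10Arrow(20)) // 30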

This line chart clearly shows how these different code forms converge in performance as the V8 versions evolve. Interestingly, the curried function formed with an arrow function is significantly faster than the normal-function version (at least in our test case), even approaching the performance of the direct call. On V8 5.1 (Node 6) and 5.8 (Node 8.0-8.2), bind is clearly the worst by comparison and the arrow function is the fastest. Starting with V8 5.9 (Node 8.3+), however, bind improves by an order of magnitude, and on V8 6.1 (a future Node) it becomes one of the fastest options (although the lead is small enough to be negligible).

Across the V8 versions we tested, the arrow function is the fastest option. In later versions its performance is similar to that of bind, and it is currently faster than the normal function form. That said, it is worth noting that we would probably need to study additional kinds of currying, with data structures of different sizes, to get a fuller picture.

Function length

The length of a function, including its signature, whitespace, and even internal comments, can affect whether V8 will inline the function. Believe it or not, adding a comment to a function can cause a performance drop of up to 10%. Will this change with TurboFan? Let’s take a look.

In this test, we look at three scenarios:

  • Calling a short function with no comments (sum small function)

  • Performing the short function’s work directly inline in the benchmark, preceded by a long comment (long all together)

  • Calling the short function after adding a long comment to it (sum long function)

Code: https://github.com/davidmarkclements/v8-perf/blob/master/bench/function-size.js
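A rough illustration of the first and third cases (a sketch only; the real benchmark is at the link above):

    // sum small function: short, no comments, a prime inlining candidate
    function sumSmall (a, b) {
      return a + b
    }

    // sum long function: identical logic, padded with a long comment
    function sumLong (a, b) {
      // Imagine many lines of comments here, enough that a
      // character-count-based length heuristic would refuse
      // to inline this function.
      return a + b
    }

    sumSmall(1, 2)
    sumLong(1, 2)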

In V8 5.1 (Node 6), “Sum Small Function” and “Long All Together” perform the same. This actually reflects the fact that short functions are inlined. When we call a small function, it behaves as if V8 writes the contents of the function directly to the place where it was called. So the second test case is essentially an artificial inlining of the function, so the performance is exactly the same. We can also see that in V8 5.1 (Node 6), when a function contains a large section of comments, its performance deteriorates significantly.

In Node 8.0-8.2 (V8 5.8), the situation is similar, except that small function calls carry extra overhead. This may be caused by the simultaneous presence of Crankshaft and TurboFan: one function might be handled by Crankshaft and another by TurboFan, preventing them from being inlined together (that is, execution has to jump between two clusters of inlined functions).

In 5.9 and later versions (Node 8.3+), function length changes caused by irrelevant content such as whitespace and comments no longer affect performance. This is because TurboFan, unlike Crankshaft, does not measure function size by character count; instead it counts the AST nodes of the function, reflecting the actual operations it contains. Thus, starting with V8 5.9 (Node 8.3), whitespace, variable name length, function signatures, and comments no longer affect whether a function can be inlined.

Also of note: the overall performance of function calls has gone down.

At this point, our conclusion is that we should continue to keep functions small. For now, it is still best to avoid excessive comments (and even whitespace) inside functions. If you are obsessed with speed, manually inlining functions (that is, removing the function call) remains the fastest approach. Of course, there is a balance to strike: pasting the bodies of other functions into your function will make your own function body too large and may prevent it from being inlined. So manual inlining can be self-defeating, and in most cases it is best left to the compiler.

32-bit integers vs. double-precision floating-point numbers

As we all know, JavaScript has only one numeric type: Number.

V8 is implemented in C++, so V8 needs to choose an underlying type to express values in JavaScript.

For integers (that is, numbers we declare in JS without a decimal point), V8 initially assumes that they can all be represented as 32-bit integers. This seems a reasonable choice, since in most cases a number falls in the range -2147483648 to 2147483647. If a JavaScript number exceeds 2147483647, the JIT compiler has to dynamically change the underlying type of the number to a double-precision floating-point number, which can also break other optimizations.
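A minimal illustration of the boundary (the variable names are ours):

    const maxInt32 = 2147483647  // largest value in the 32-bit signed range
    const big = maxInt32 + 1     // 2147483648: must be represented as a double

    function addOne (n) {
      return n + 1
    }

    addOne(123)  // stays on the 32-bit integer fast path
    addOne(big)  // forces the double-precision floating-point path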

The next test looks at three scenarios:

  • A function that handles only integers in the 32-bit range (sum small)

  • A function that handles both integers in the 32-bit range and integers beyond that range that need to be expressed in double-precision floating-point numbers (from small to big)

  • A function that only deals with integers that need to be expressed as double-precision floating-point numbers (all big)

Code: https://github.com/davidmarkclements/v8-perf/blob/master/bench/numbers.js

From this graph we can see that the same pattern holds for Node 6 (V8 5.1), Node 8 (V8 5.8), and even future versions of Node: operating on integers greater than 2147483647 causes functions to run a third to a half slower. So if you are using long integers to store IDs, store them as strings instead.

It is also worth noting that the speed of integer operations in the 32-bit range drops slightly from Node 6 (V8 5.1) to Node 8.1/8.2 (V8 5.8), and the drop becomes quite noticeable in Node 8.3+ (V8 5.9+), while the speed of large-integer operations improves significantly from Node 8.3+ (V8 5.9+). This suggests that integer arithmetic in the 32-bit range really has become slower, and that the difference is not an artifact of the function calls and for loops used in the test code.

Thanks to Jakob Kummerow and Yang Guo and the V8 team for their help in making this test result accurate.

Iterating over objects

Extracting all the attribute values from an object and processing them is a common task in development. There are many ways to do this. Let’s take a look at which was the fastest on the V8/Node versions tested.

We compared the following four methods:

  • Using a for-in loop with hasOwnProperty to get all the values of an object (for in)

  • Using Object.keys to get an array of the object’s keys, then iterating over it with the array’s reduce method, reading each value inside the iteration function (Object.keys functional)

  • Using Object.keys to get an array of the object’s keys, then iterating over it with the array’s reduce method, with the iteration function written as an arrow function (Object.keys functional with arrow)

  • Using Object.keys to get an array of the object’s keys, then iterating over it with a for loop, reading each value (Object.keys with for loop)

For V8 5.8, 5.9, 6.0, and 6.1, we also tested three additional scenarios:

  • Using Object.values to get an array of the object’s values and iterating over it with the array’s reduce method (Object.values functional)

  • Using Object.values to get an array of the object’s values and iterating over it with the array’s reduce method, with the iteration function written as an arrow function (Object.values functional with arrow)

  • Iterating over the array returned by Object.values with a for loop (Object.values with for loop)

Because V8 5.1 (Node 6) does not support the native Object.values method from EcmaScript 2017, we did not run these three additional tests on it.

Code: https://github.com/davidmarkclements/v8-perf/blob/master/bench/object-iteration.js
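A sketch of some of these patterns (simplified; the benchmark at the link above is the reference):

    const obj = { a: 1, b: 2, c: 3 }

    // for in, with hasOwnProperty
    let sum = 0
    for (const key in obj) {
      if (obj.hasOwnProperty(key)) sum += obj[key]
    }

    // Object.keys functional with arrow
    sum = Object.keys(obj).reduce((acc, key) => acc + obj[key], 0)

    // Object.keys with for loop
    sum = 0
    const keys = Object.keys(obj)
    for (let i = 0; i < keys.length; i++) sum += obj[keys[i]]

    // Object.values with for loop (not available on V8 5.1)
    sum = 0
    const values = Object.values(obj)
    for (let i = 0; i < values.length; i++) sum += values[i]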

In Node 6 (V8 5.1) and Node 8.0-8.2 (V8 5.8), the fastest approach is a for-in loop that reads the corresponding values, at roughly 40 million operations per second. The fastest of the Object.keys-based methods manages only 8 million per second, about a fifth of that.

On V8 6.0 (Node 8.3), for-in plummeted to a quarter of its speed, but even then it was still faster than the alternatives.

In V8 6.1 (a future version of Node), Object.keys speeds improve and surpass for-in, but they are still nowhere near the for-in speeds of V8 5.1 and 5.8 (Node 6 and Node 8.0-8.2).

The design principle behind TurboFan seems to be that it is tuned for intuitive coding: the forms of code that feel most natural to developers should be the ones the engine optimizes.

However, using Object.values directly is still slower than using Object.keys and looking up the values ourselves. This shows that the procedural loop pattern is still faster than the functional programming pattern. So when iterating over an object, I’m afraid we can’t simply reach for Object.values yet.

And for those who loved for-in for its speed advantage, there is now a sore point: for-in speed drops dramatically in the new V8, and for the moment nothing matches its old speed.

Object creation

The creation of new objects is everywhere in our code, so this is also a very interesting test point.

We’ll look at the following three scenarios:

  • Create objects using object literals

  • Create an object with an EcmaScript 2015 class

  • Use constructors to create objects.

Code: https://github.com/davidmarkclements/v8-perf/blob/master/bench/object-creation.js
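The three creation styles, in sketch form:

    // object literal
    const literal = { a: 1 }

    // EcmaScript 2015 class
    class Klass {
      constructor () {
        this.a = 1
      }
    }
    const fromClass = new Klass()

    // constructor function
    function Ctor () {
      this.a = 1
    }
    const fromCtor = new Ctor()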

On all V8 versions tested, object creation took about the same amount of time, except on Node 8.2 (V8 5.8), where using classes was significantly slower. This was due to the mix of Crankshaft and TurboFan in V8 5.8, and it is fixed in Node 8.3 (V8 6.0).

Eliminating object creation

In preparing this article, we found that TurboFan applies a rather impressive optimization to one particular pattern of object creation. We had mistakenly assumed this optimization applied to all object creation, but thanks to the V8 team’s help we came to understand the conditions under which it is triggered.

In the previous “Object creation” test, we assigned each newly created object to a variable, set the variable to null, and then reassigned it. This avoided triggering the particular optimization we are about to look at.

In the next test, we will look at the same three situations as in the previous test:

  • Create objects using object literals

  • Create an object with an EcmaScript 2015 class

  • Use constructors to create objects.

The difference is that the variable holding the object reference is not overwritten by the next newly created object; instead, the object is passed to another function that performs some other operation on it.

Let’s take a look at the results!

Code: https://github.com/davidmarkclements/v8-perf/blob/master/bench/object-creation-inlining.js

You’ll see that V8 6.0 (Node 8.3) and 6.1 (a future Node) take a huge leap here, exceeding 500 million operations per second. This is mainly because once code triggers this TurboFan optimization, the engine doesn’t actually create the objects at all; it effectively does nothing. TurboFan can determine that the subsequent code logic doesn’t require a real object to exist, so it skips object creation entirely.

The test code above does not fully represent the possible conditions that trigger this optimization, which can be quite complex.

But there’s one thing we can say for sure that doesn’t trigger this Turbofan optimization:

The object must not outlive the function that created it; that is, once the creating function has finished executing, no references to the object remain. The object can be passed to other functions, but the optimization cannot happen if we attach the object to this, assign it to a variable outside the function’s scope, or add it to another object whose lifetime is longer than that of the creating function.
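An illustrative sketch of the distinction, under the condition just described (the function and variable names are ours):

    function measure (p) {
      return p.x + p.y
    }

    // Eligible: the object never outlives createAndUse
    function createAndUse () {
      const point = { x: 1, y: 2 }
      return measure(point) // passed along, but no reference survives the call
    }

    // Not eligible: the object escapes into an outer scope
    let leaked
    function createAndLeak () {
      leaked = { x: 1, y: 2 } // outlives the creating function
    }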

While the results above look wonderful, it is difficult to predict all of the conditions that trigger the optimization. But when your code does trigger it, the speed boost is enormous.

Thanks to Jakob Kummerow and the rest of the V8 team for helping us understand the reason behind this behavior. In the course of our research we also discovered a performance degradation issue in V8’s new garbage collector, Orinoco. If you’re interested, see https://v8project.blogspot.it/2016/04/jank-busters-part-two-orinoco.html and https://bugs.chromium.org/p/v8/issues/detail?id=6663.

Polymorphism vs. monomorphism

When we always pass arguments of the same type to a function (for example, always a string), we are using the function monomorphically.

Some functions are written to be polymorphic: the same parameter position can accept inputs of different types. For example, a function might take either a string or an object as its first argument. Here, though, “type” doesn’t just mean string, number, or object; it also means the structure, or shape, of an object (in JavaScript, objects with different structures can be considered different types for this purpose).

An object defines its structure through its properties and values. In the example below, obj1 and obj2 have the same structure, while obj3 and obj4 each have a structure different from all the others:

                                                                                                                                            
      
    const obj1 = { a: 1 }
    const obj2 = { a: 5 }
    const obj3 = { a: 1, b: 2 }
    const obj4 = { b: 2 }

For good interface design purposes, we sometimes use the same code to handle objects with different structures, but this can have a negative impact on performance.

Let’s take a look at how monomorphic and polymorphic code performed in our tests.

We looked at two scenarios:

  • A function that handles objects of different structures (polymorphic)

  • A function that handles only objects of a single, consistent structure (monomorphic)

Code: https://github.com/davidmarkclements/v8-perf/blob/master/bench/polymorphic.js
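A minimal sketch of the difference (the function name is ours):

    function getA (obj) {
      return obj.a
    }

    // monomorphic: every call passes objects of the same structure
    getA({ a: 1 })
    getA({ a: 5 })

    // polymorphic: calls mix objects of different structures
    getA({ a: 1 })
    getA({ a: 1, b: 2 })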

This graph shows unequivocally that the monomorphic function outperforms the polymorphic function on every version of V8 tested. However, the performance of polymorphic functions begins to improve from V8 5.9 onward (that is, starting with V8 6.0, Node 8.3).

Polymorphic functions are very common in Node.js code, and they bring great flexibility to the API. Thanks to this performance boost on polymorphic functions, real Node.js applications that are more complex than our test code will benefit.

If we are writing code that needs special optimization, such as a function that will be called many times, we should make sure that the structure of the argument objects passed to the function is consistent. On the other hand, if a function can only be called once or twice, such as some initialization functions, it is acceptable to design it in polymorphic form.

Thanks to Jakob Kummerow for providing a more reliable version of the code for this test.

The debugger keyword

Finally, let’s talk about the debugger keyword.

Be sure to completely remove debugger statements from your code. Stray debugger statements left behind degrade performance.

Let’s look at two examples:

  • A function that contains the debugger keyword (with debugger)

  • An identical function without the debugger keyword (without debugger)

Code: https://github.com/davidmarkclements/v8-perf/blob/master/bench/debugger.js

Obviously, the mere presence of a debugger resulted in a significant performance degradation on all tested V8 versions.

In addition, even the without debugger test case loses some performance in later V8 versions, a point we return to in the final summary section.

A real-world test: performance comparison of logging tools

In addition to these small performance tests, we can also use a real-world example to see the overall performance impact of the V8 version change. Matteo and I collected and tested the performance of several of the most commonly used Node.js logging tools while developing Pino.

The following bar chart shows how long these popular logging tools took to log 10,000 lines in Node 6.11 (Crankshaft); unlike the earlier charts, lower is better here:

Here’s the result of the same test on V8 6.1 (TurboFan):

While all of the logging tools got faster on the new TurboFan JIT compiler (roughly twice as fast as before), Winston saw the biggest improvement. This seems to reflect the performance convergence we observed in many of the tests above: code forms that were slow under Crankshaft improve significantly under TurboFan, while code that was fast under Crankshaft slows down somewhat under TurboFan. Winston, originally the slowest in this test, probably used code patterns that were slow in Crankshaft but much faster in TurboFan, whereas Pino had been hand-optimized specifically for Crankshaft, so while it also got faster under TurboFan, the improvement was much smaller.

Conclusion

Some of the tests above show that, with TurboFan fully in use in V8 6.0 and 6.1, some of the slower patterns from V8 5.1, 5.8, and 5.9 become faster, while some previously faster code slows down, often by a similar magnitude.

Much of this stems from the cost of making function calls in TurboFan (V8 6.0 and above). TurboFan’s approach to performance is to optimize the most common usage and eliminate the most obvious pain points. This brings an overall performance improvement for applications running in the browser (Chrome) and on the server (Node), but it comes with a trade-off: highly tuned code that had been specifically optimized for the old engine can clearly get slower (and may improve again in the future). Our logging-tool test also showed that TurboFan delivers an overall performance improvement, even across radically different code bases such as Winston and Pino.

If you’ve been concerned about JavaScript performance for a long time and have been rewriting your code to accommodate the “quirks” of the underlying engine for high performance, now is the time to forget some of the tricks you’ve used in the past. If you’ve been working on writing generally “good” code in accordance with best practices, you’re in for a performance improvement package thanks to the V8 team’s hard work.

This article was co-written by David Mark Clements and Matteo Collina and reviewed by Franziska Hinkelmann and Benedikt Meurer from the V8 team.

All the source code and the original article: https://github.com/davidmarkclements/v8-perf.

Raw data from the tests: https://docs.google.com/spreadsheets/d/1mDt4jDpN_Am7uckBbnxltjROI9hSu6crf9tOa2YnSog/edit?usp=sharing

Most of the tests were conducted on a 2016 MacBook Pro with a 3.3 GHz Intel Core i7 CPU and 16 GB of 2133 MHz LPDDR3 memory. The remaining tests (numbers, property removal, polymorphism, and object creation) were run on a 2014 MacBook Pro. Every Node.js version under test was installed on each machine, and we did our best to ensure that no other programs interfered with the results.
