Important: how V8 represents a string depends largely on how that string was constructed.

Before we get into the wonders of strings, we need to understand V8’s representation of JavaScript strings.

There are five ways to express a string in V8:

  • SeqString
    • Uses contiguous (array-like) space in the V8 heap to store the string.
    • The actual data is stored as either OneByte or TwoByte (Unicode).
  • ConsString(first, second)
    • For string concatenation, a tree structure represents the result (first + second).
  • SliceString(parent, offset)
    • For string slicing, offset and [length] identify a part of the parent string.
  • ThinString(actual)
    • Directly reference another string object (actual).
    • In most cases, it can be considered equivalent to ConsString(actual, empty_string).
  • ExternalString
    • Represents string resources generated outside of the V8 heap.
    • The actual data is stored as either OneByte or TwoByte (Unicode).

Flag: --thin_strings

All representations have the [Length] attribute (defined by the base String class), which records the number of characters in the string. However, only SeqString actually stores character data. ConsString, SliceString, and ThinString are all references to other representations and store no character data themselves.
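The relationship between these representations can be sketched with a toy model. The classes below are hypothetical JavaScript stand-ins, not V8's real C++ types, and the flatten() helper is an assumption added for illustration. The point is only that every shape carries a length, while character data lives in SeqString alone.

```javascript
// Toy model only: illustrative classes, not V8's actual implementation.
class SeqString {
  constructor(chars) {
    this.chars = chars;            // the only place character data lives
    this.length = chars.length;
  }
  flatten() { return this.chars; }
}

class ConsString {
  constructor(first, second) {
    this.first = first;            // references only, no characters stored
    this.second = second;
    this.length = first.length + second.length;
  }
  flatten() { return this.first.flatten() + this.second.flatten(); }
}

class SliceString {
  constructor(parent, offset, length) {
    this.parent = parent;          // reference to the string being sliced
    this.offset = offset;
    this.length = length;
  }
  flatten() {
    return this.parent.flatten().slice(this.offset, this.offset + this.length);
  }
}

class ThinString {
  constructor(actual) {
    this.actual = actual;          // direct reference to another string
    this.length = actual.length;
  }
  flatten() { return this.actual.flatten(); }
}

const cons = new ConsString(new SeqString("hello, "), new SeqString("string world"));
const sliced = new SliceString(cons, 7, 12);
const thin = new ThinString(sliced);

console.log(cons.length, cons.flatten());
console.log(sliced.length, sliced.flatten());
console.log(thin.length, thin.flatten());
```

In this sketch, reading the characters of a ConsString, SliceString, or ThinString always ends up at a SeqString.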

In practice, many strings end up as ConsStrings, because string concatenation is so common.

To help you understand, let’s look at an example:

1. Create two English strings and concatenate them:

var hello = "hello";
var world = "world";
var hello_world = hello + world;

Since all characters in hello and world are single-byte characters, V8 uses SeqOneByteString to represent them.

To keep access fast after concatenating short strings, V8 does not create a ConsString for hello_world. Instead, it copies the characters of hello and world into a new SeqOneByteString.

V8 defines the minimum length of a ConsString as kMinLength = 13 characters.
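The copy-or-reference decision described above can be sketched as follows. This is a toy model under stated assumptions: the plain objects, the seq(), flatten(), and concat() helpers, and the CONS_MIN_LENGTH constant are hypothetical names that mirror the kMinLength value quoted above. They are not V8 APIs.

```javascript
// Toy model: plain objects stand in for V8's internal string shapes.
const CONS_MIN_LENGTH = 13; // mirrors the kMinLength value quoted above

// Build a flat string that owns its characters.
const seq = (chars) => ({ kind: "Seq", chars, length: chars.length });

// Collect the characters of a flat string or of a tree of halves.
function flatten(s) {
  return s.kind === "Seq" ? s.chars : flatten(s.first) + flatten(s.second);
}

function concat(first, second) {
  const length = first.length + second.length;
  if (length < CONS_MIN_LENGTH) {
    // Short result: copy both sides into one new flat string.
    return seq(flatten(first) + flatten(second));
  }
  // Long result: only record the two halves, no characters are copied.
  return { kind: "Cons", first, second, length };
}

const helloWorld = concat(seq("hello"), seq("world"));
const longer = concat(helloWorld, seq(" and more"));

console.log(helloWorld.kind, helloWorld.length);
console.log(longer.kind, longer.length);
```

Here the 10-character result is copied into a flat string, while the 19-character result only records its two halves.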

2. Create a Chinese string and concatenate it with the hello_world string from step 1:

var chinese = "你好，世界";
var mixed = hello_world + chinese;

Since chinese contains two-byte characters, V8 uses SeqTwoByteString to represent it.
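As a rough illustration of the one-byte versus two-byte distinction, the sketch below checks whether every UTF-16 code unit of a string fits in a single byte. The fitsInOneByte() helper is a hypothetical name, and treating "fits in one byte" as "code unit at most 0xFF" is an assumption made for illustration. V8 decides this internally and may not follow this rule in every case.

```javascript
// Returns true when every UTF-16 code unit of str is at most 0xFF,
// i.e. when the content could be stored using one byte per character.
function fitsInOneByte(str) {
  for (let i = 0; i < str.length; i++) {
    if (str.charCodeAt(i) > 0xff) return false;
  }
  return true;
}

console.log(fitsInOneByte("helloworld")); // true
console.log(fitsInOneByte("你好"));        // false
```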

Because mixed is longer than the minimum ConsString length, V8 builds a tree structure: mixed = ConsString(hello_world, chinese).

3. Slice the mixed string created in step 2:

var sliced = mixed.slice(1);

Because sliced is longer than the minimum SliceString length, V8 creates a reference structure: sliced = SliceString(mixed, 1).

V8 defines the minimum SliceString length as kMinLength = 13 characters.
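The slicing rule can be sketched in the same toy style. The sliceFrom() helper, the object shapes, and SLICE_MIN_LENGTH are hypothetical and only mirror the kMinLength value quoted above. For simplicity, the parent is modelled as a flat string that owns its characters.

```javascript
// Toy model: plain objects stand in for V8's internal string shapes.
const SLICE_MIN_LENGTH = 13; // mirrors the kMinLength value quoted above

function sliceFrom(parent, offset) {
  const length = parent.length - offset;
  if (length < SLICE_MIN_LENGTH) {
    // Short result: copy the remaining characters into a new flat string.
    return { kind: "Seq", chars: parent.chars.slice(offset), length };
  }
  // Long result: keep a reference to the parent plus an offset and a length.
  return { kind: "Slice", parent, offset, length };
}

const text = "hello_world_mixed";
const parent = { kind: "Seq", chars: text, length: text.length };

const longSlice = sliceFrom(parent, 1);   // 16 characters remain
const shortSlice = sliceFrom(parent, 6);  // 11 characters remain

console.log(longSlice.kind, longSlice.length);
console.log(shortSlice.kind, shortSlice.chars);
```

In this model, the long slice holds a reference to the whole parent object, so every character of the parent stays reachable for as long as the slice is.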

Here we can see that only the SeqOneByteString and SeqTwoByteString instances store the actual characters, while the SliceString and ConsString instances are actually internal references to other expression patterns (no character data is stored).

Note that the characters cut off in the SliceString scenario are not reclaimed by the V8 GC, because the slice still references the whole parent string.

That is, the "h" character stored in the SeqOneByteString instance in the example is not collected by the V8 GC.

To sum up:

  • The way a string is expressed in V8 depends on how the string is constructed (Seq or Cons, etc.).
  • When concatenating longer strings, creating a ConsString is more efficient than allocating new space and copying the characters into it.

Now that you know how strings are expressed in V8, it’s time to talk about their quirks:

  • V8 vii, Flattening strings (to be released tomorrow)