Project repository: github.com/bytedance/s…

Sonic is a Go JSON library based on just-in-time compilation (JIT) and Single Instruction, Multiple Data (SIMD). It greatly improves the JSON codec performance of Go programs. Combined with a lazy-load design, it also provides a comprehensive and efficient API for different business scenarios.

Since its release in July 2021, Sonic has been adopted by businesses such as TikTok and Toutiao, saving ByteDance hundreds of thousands of CPU cores.

Why develop a JSON library

JSON (JavaScript Object Notation) is widely used in Internet services thanks to its concise syntax and flexible self-describing nature. However, JSON is a text protocol and has no schema constraints like those of Protobuf, so encoding and decoding it is often inefficient. When business developers also choose or use JSON libraries poorly, service performance can deteriorate dramatically.

At ByteDance, we ran into the same problem. According to profiling data for the company’s top 50 services by CPU usage, JSON encoding and decoding accounts for close to 10% of overall CPU cost, and for more than 40% in some individual services. Improving the performance of the JSON library is therefore very important, so we evaluated the existing Go JSON libraries in the industry.

First, based on the main APIs of these JSON libraries, we split their usage into three modes:

  • Generic codec: the JSON has no corresponding schema. Values can only be interpreted as runtime objects of the target language based on JSON’s self-describing semantics. For example, a JSON object is converted to a Go map[string]interface{}.
  • Binding codec: the JSON has a corresponding schema. Values are bound to the matching model fields by combining the model definition (a Go struct) with the JSON syntax, so data parsing and validation are completed in one pass.
  • Get & set: given a lookup path that follows certain rules (usually a sequence of keys and indexes), fetch the required part of the JSON value and process it.
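The three modes can be illustrated with Go's standard library alone. The sketch below uses encoding/json rather than Sonic, and the User type, the sample document and the getPath helper are hypothetical names introduced only for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// User is a hypothetical model used only for this illustration.
type User struct {
	Name string   `json:"name"`
	Tags []string `json:"tags"`
}

const sample = `{"name":"alice","tags":["a","b"],"extra":{"depth":2}}`

// decodeGeneric parses without a schema: objects become map[string]interface{}.
func decodeGeneric(data string) (map[string]interface{}, error) {
	var v map[string]interface{}
	err := json.Unmarshal([]byte(data), &v)
	return v, err
}

// decodeBinding parses against a schema, here the User struct.
func decodeBinding(data string) (User, error) {
	var u User
	err := json.Unmarshal([]byte(data), &u)
	return u, err
}

// getPath emulates a "get" lookup by walking a generic value along a path of
// string keys and int indexes. Dedicated lookup libraries work on the JSON
// text directly instead of building the whole generic value first.
func getPath(v interface{}, path ...interface{}) (interface{}, bool) {
	for _, p := range path {
		switch key := p.(type) {
		case string:
			m, ok := v.(map[string]interface{})
			if !ok {
				return nil, false
			}
			if v, ok = m[key]; !ok {
				return nil, false
			}
		case int:
			a, ok := v.([]interface{})
			if !ok || key < 0 || key >= len(a) {
				return nil, false
			}
			v = a[key]
		default:
			return nil, false
		}
	}
	return v, true
}

func main() {
	g, _ := decodeGeneric(sample)
	u, _ := decodeBinding(sample)
	tag, _ := getPath(g, "tags", 1)
	fmt.Println(g["name"], u.Name, tag)
}
```

Note that the generic and get paths above both pay for a full parse into maps and slices, which is the cost the rest of this article tries to avoid.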

Second, we divide the sample JSON into three size classes based on the number of keys and the nesting depth:

  • Small: 400B, 11 keys, 3 layers deep;
  • Medium: 110KB, 300+ keys, 4 layers deep (actual business data, including a large number of nested JSON strings);
  • Large: 550KB, 10000+ keys, 6 layers deep.

The test results are as follows:

Performance of JSON libraries at different data sizes

The results show that none of these JSON libraries maintains optimal performance in all scenarios. Even the most widely used third-party library, json-iterator, cannot meet our needs for generic codec and large data.

The benchmark codec performance of a JSON library is important, but how well it fits different scenarios matters more, so we started to develop our own JSON library.

Technical principles of the open source library Sonic

Due to the complexity of JSON business scenarios, it is unrealistic to expect optimization through a single algorithm. So when designing Sonic, we borrowed optimization ideas from other domains/languages (not just JSON) and incorporated them into each processing step. Among them, there are three core technologies: JIT, lazy-load and SIMD.

JIT

For binding codec scenarios with a schema, many operations do not need to be performed “at run time”. By “run time”, we mean the period in which the program actually starts parsing JSON data.

For example, if the business model guarantees that the value of a JSON key is a Boolean, we can write the corresponding JSON text (“true” or “false”) directly during serialization without checking the concrete type of the object.

The core idea of Sonic-JIT is to separate model interpretation from data-processing logic, fixing the former at “compile time”.

This idea also exists in the standard library and in some third-party JSON libraries, such as json-iterator’s function-assembly pattern: the Go struct is broken down into a codec function for each field type, and these are assembled and cached as the codec for the whole object, which is loaded at run time to process JSON. However, this implementation inevitably turns into a large number of interface calls and nested function call stacks, and the function-call overhead multiplies as the JSON data grows. Only when the model interpretation logic is actually compiled into a stack-less implementation can the performance gains from the schema be maximized.

There are two main ways to do this: code generation (code-gen, or templates) and just-in-time compilation (JIT). The former is relatively easy for library developers to implement, but it increases the maintenance cost and constraints of business code and cannot deliver hot updates within seconds, which is one of the reasons why code-generation JSON libraries are not widely adopted. JIT moves compilation to the loading (or first parsing) stage of the program: given only the struct type information corresponding to the JSON schema, the matching codec can be compiled once and then executed efficiently.

The Sonic-JIT process is as follows:

Sonic-JIT system

  1. On first use, obtain the schema information to be compiled (AST) through Go reflection;
  2. Generate a set of custom intermediate op codes (SSA) that embody the JSON codec algorithm;
  3. Translate the op codes into Plan 9 assembly (LL);
  4. Convert the Plan 9 assembly to machine code (ASM) using the third-party library golang-asm;
  5. Inject the generated binary code into the memory cache and wrap it as a Go function (DL);
  6. On later use, load the corresponding codec from the cache by type hash (rtype.hash).

In the final implementation, the codec generated by Sonic-JIT not only performs better than json-iterator, but even outperforms the code-generated easyjson (see “Performance test” below). This is due to the optimization of the underlying text-processing operators (see “SIMD & asm2asm” below) and to Sonic-JIT’s control over the underlying CPU instructions, which lets it establish an independent and efficient ABI (Application Binary Interface) at run time:

  • Keep frequently used variables (such as the JSON buffer and struct pointers) in fixed registers to avoid memory loads and stores;
  • Maintain its own variable stack (memory pool) to avoid Go function stack growth;
  • Automatically generate jump tables to speed up branching in generic decoding;
  • Pass parameters in registers (currently not supported by Go assembly, see the “SIMD & asm2asm” section).

Lazy-load

Generic codec is one of the worst-performing scenarios for most Go JSON libraries, but it is often the most frequently used scenario due to business needs or improper selection by business developers.

Is generic codec performance poor simply because there is no schema? Not really. Compare C++ JSON libraries such as rapidjson and simdjson: their parsing is generic, yet they still perform well (simdjson can exceed 2 GB/s). The root cause of the standard library’s poor generic parsing performance is its use of Go’s native generic container (map[string]interface{}) as the JSON codec object.

This is a poor choice: first, map insertion is expensive during deserialization; second, map traversal is far less efficient than array traversal during serialization.

Looking back, JSON itself is fully self-describing. If we describe it with a data structure that is closer to the JSON AST, we can not only simplify the conversion process but also enable lazy loading. This is the core logic of Sonic-AST, a JSON codec object for Go: it uses node {type, length, pointer} to represent any JSON data node, and describes the hierarchical relationship between nodes by combining tree and array structures.

Schematic diagram of the Sonic-AST structure
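A node-based container of this kind might look roughly like the following sketch. It is an assumption-laden illustration, not Sonic's actual Node type: Kind, Pair, Node, Set and Get are hypothetical names, and a Go slice (which is itself a pointer plus a length) plays the role of the {length, pointer} pair. Children are kept in an array in document order, so insertion is an append and traversal is a linear scan.

```go
package main

import "fmt"

// Kind identifies what a node holds.
type Kind uint8

const (
	Null Kind = iota
	Bool
	Number
	String
	Array
	Object
)

// Pair is one array element (Key is empty) or one object member.
type Pair struct {
	Key   string
	Value Node
}

// Node represents any JSON value.
type Node struct {
	Kind     Kind
	Text     string // scalar payload as it appears in the JSON text
	Children []Pair // elements or members, in document order
}

// Len returns the number of children.
func (n *Node) Len() int { return len(n.Children) }

// Set overwrites the member named key, or appends it if it does not exist.
func (n *Node) Set(key string, v Node) {
	for i := range n.Children {
		if n.Children[i].Key == key {
			n.Children[i].Value = v
			return
		}
	}
	n.Children = append(n.Children, Pair{Key: key, Value: v})
}

// Get looks a member up with a linear scan.
func (n *Node) Get(key string) (Node, bool) {
	for _, p := range n.Children {
		if p.Key == key {
			return p.Value, true
		}
	}
	return Node{}, false
}

func main() {
	obj := Node{Kind: Object}
	obj.Set("id", Node{Kind: Number, Text: "1"})
	obj.Set("name", Node{Kind: String, Text: "sonic"})
	for _, p := range obj.Children { // array traversal keeps document order
		fmt.Println(p.Key, p.Value.Text)
	}
}
```

Unlike a map, this layout preserves key order and needs no hashing on insert, at the price of linear-time lookup by key.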

Sonic-AST implements a stateful, scalable JSON parsing process: when the user fetches a key, Sonic uses skip computation to pass over the JSON text before that key with only lightweight parsing, and does not parse the JSON nodes after the key at all. Only the keys that the consumer really needs are fully parsed (converted to a Go primitive type). Because converting nodes is much cheaper than parsing JSON, the benefits can be considerable in business scenarios that do not need the complete data.

Although skip is a lightweight form of text parsing (it only handles JSON control characters such as “[” and “{”), pure JSON lookup libraries such as gjson often pay repeated overhead when the same path is looked up more than once (see the benchmark).

To address this problem, Sonic adds a step to the skip processing of child nodes: it records the key and the start and end offsets of the skipped JSON and allocates a raw-JSON node to store them, so that a second skip can jump directly to the node’s offset. Sonic-AST also supports updating, inserting and serializing nodes, and any Go type can be converted to a node and stored.

In other words, Sonic-AST can serve as a generic data container in place of Go’s interface{}, which has great potential in protocol conversion, dynamic proxying and similar service scenarios.
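The skip-and-remember idea can be sketched as follows. This is a simplified illustration, not Sonic's implementation: LazyObject, GetRaw, Indexed and the skip helpers are hypothetical names, the skipped text is not validated, keys are assumed to contain no escape sequences, and a Go map is used to remember offsets purely for brevity.

```go
package main

import (
	"fmt"
	"strings"
)

func skipSpace(s string, i int) int {
	for i < len(s) && strings.IndexByte(" \t\n\r", s[i]) >= 0 {
		i++
	}
	return i
}

// skipString expects s[i] == '"' and returns the index just past the closing
// quote, or -1 if the string is unterminated.
func skipString(s string, i int) int {
	for i = i + 1; i < len(s); i++ {
		switch s[i] {
		case '\\':
			i++ // skip the escaped byte
		case '"':
			return i + 1
		}
	}
	return -1
}

// skipValue returns the index just past the value starting at s[i], or -1 on
// truncated input. It tracks only quotes and brackets.
func skipValue(s string, i int) int {
	if i >= len(s) {
		return -1
	}
	switch s[i] {
	case '"':
		return skipString(s, i)
	case '{', '[':
		depth := 0
		for i < len(s) {
			switch s[i] {
			case '"':
				if i = skipString(s, i); i < 0 {
					return -1
				}
				continue
			case '{', '[':
				depth++
			case '}', ']':
				if depth--; depth == 0 {
					return i + 1
				}
			}
			i++
		}
		return -1
	default: // number, true, false or null
		for i < len(s) && strings.IndexByte(",}] \t\n\r", s[i]) < 0 {
			i++
		}
		return i
	}
}

type span struct{ start, end int }

// LazyObject indexes the members of a JSON object only as far as lookups need.
type LazyObject struct {
	src  string
	pos  int             // everything before pos has been skipped and recorded
	seen map[string]span // raw offsets of values already skipped over
}

func NewLazyObject(src string) *LazyObject {
	o := &LazyObject{src: src, pos: len(src), seen: map[string]span{}}
	if i := skipSpace(src, 0); i < len(src) && src[i] == '{' {
		o.pos = i + 1
	}
	return o
}

// Indexed reports how many members have been recorded so far.
func (o *LazyObject) Indexed() int { return len(o.seen) }

// GetRaw returns the raw JSON text of the value stored under key.
func (o *LazyObject) GetRaw(key string) (string, bool) {
	if sp, ok := o.seen[key]; ok { // repeated lookup: jump straight to the offset
		return o.src[sp.start:sp.end], true
	}
	for {
		i := skipSpace(o.src, o.pos)
		if i < len(o.src) && o.src[i] == ',' {
			i = skipSpace(o.src, i+1)
		}
		if i >= len(o.src) || o.src[i] != '"' {
			return "", false // end of object or malformed input
		}
		kEnd := skipString(o.src, i)
		if kEnd < 0 {
			return "", false
		}
		k := o.src[i+1 : kEnd-1]
		if i = skipSpace(o.src, kEnd); i >= len(o.src) || o.src[i] != ':' {
			return "", false
		}
		vStart := skipSpace(o.src, i+1)
		vEnd := skipValue(o.src, vStart)
		if vEnd < 0 {
			return "", false
		}
		o.seen[k], o.pos = span{vStart, vEnd}, vEnd
		if k == key {
			return o.src[vStart:vEnd], true
		}
	}
}

func main() {
	o := NewLazyObject(`{"a": {"x": [1, "}"]}, "b": "q\"uote", "c": 42}`)
	b, _ := o.GetRaw("b") // skips over "a" and remembers where it was
	a, _ := o.GetRaw("a") // answered from the recorded offsets
	fmt.Println(b, a, o.Indexed())
}
```

Members after the requested key are never touched until a later lookup needs them, which is where the savings come from when only part of a document is used.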

SIMD & asm2asm

The core of both binding and generic codec scenarios is processing and computing over JSON text. Some of these problems already have mature and efficient solutions in the industry, such as the Ryu algorithm for floating-point-to-string conversion and the lookup-table method for integer-to-string conversion, and these are implemented in Sonic’s underlying text operators.

Other problems have relatively simple logic but may apply to much larger volumes of text, such as quoting and unquoting JSON strings or skipping whitespace. This is where we need a technique that raises processing throughput. SIMD is such a technique for processing large amounts of data in parallel. Most CPUs today have a SIMD instruction set (such as Intel AVX), and simdjson is a relatively successful example of putting it into practice.

Skip whitespace in Sonic

```c
while (likely(nb >= 32)) {
    // vmovd convert a single character to YMM
    __m256i x = _mm256_load_si256((const void *)sp);
    __m256i a = _mm256_cmpeq_epi8(x, _mm256_set1_epi8(' '));
    __m256i b = _mm256_cmpeq_epi8(x, _mm256_set1_epi8('\t'));
    __m256i c = _mm256_cmpeq_epi8(x, _mm256_set1_epi8('\n'));
    __m256i d = _mm256_cmpeq_epi8(x, _mm256_set1_epi8('\r'));
    __m256i u = _mm256_or_si256(a, b);
    __m256i v = _mm256_or_si256(c, d);
    __m256i w = _mm256_or_si256(u, v);

    /* bit i of ms is set when byte i is whitespace; -1 means all 32 bytes are */
    if ((ms = _mm256_movemask_epi8(w)) != -1) {
        _mm256_zeroupper();
        return sp - ss + __builtin_ctzll(~(uint64_t)ms);
    }

    /* move to next block */
    sp += 32;
    nb -= 32;
}

/* clear upper half to avoid AVX-SSE transition penalty */
_mm256_zeroupper();
#endif
```

strnchr() implementation in Sonic (SIMD part)

Developers will notice that the code above is actually written in C. Most of the text-processing functions in Sonic are implemented in C: on the one hand, SIMD instruction sets are better wrapped in C and easier to use; on the other hand, the C code gets the full benefit of compiler optimization when compiled with clang. To make this work, we developed asm2asm, a tool that translates x86 assembly into Plan 9 assembly, so that the assembly output by clang can be statically embedded into Sonic through the Go assembly mechanism. In the JIT-generated codec, we also use asm2asm to compute the PC value of each C function and jump to it directly with a CALL instruction, which gets around the fact that Go assembly cannot pass parameters in registers and squeezes out the last bit of CPU performance.
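For readers more comfortable with Go, the mask logic of the C loop can be mirrored in scalar code. The sketch below is only a model of what the AVX2 instructions compute: it builds the same 32-bit whitespace mask one byte at a time, whereas the SIMD version compares all 32 bytes at once. The function names and the tail loop for the last partial block are assumptions of this sketch, not part of the excerpt.

```go
package main

import (
	"fmt"
	"math/bits"
)

// isSpace reports whether c is one of the four JSON whitespace bytes.
func isSpace(c byte) bool {
	return c == ' ' || c == '\t' || c == '\n' || c == '\r'
}

// skipWhitespace returns the index of the first non-whitespace byte in s,
// or len(s) if s contains only whitespace.
func skipWhitespace(s []byte) int {
	i := 0
	for len(s)-i >= 32 {
		// Scalar stand-in for cmpeq + or + movemask: bit j is set when
		// byte j of the block is whitespace.
		var mask uint32
		for j, c := range s[i : i+32] {
			if isSpace(c) {
				mask |= uint32(1) << uint(j)
			}
		}
		// A mask with all bits set corresponds to -1 in the C code.
		if mask != 0xFFFFFFFF {
			// The lowest zero bit marks the first non-whitespace byte.
			return i + bits.TrailingZeros32(^mask)
		}
		i += 32 // move to next block
	}
	// Fewer than 32 bytes left: finish byte by byte.
	for i < len(s) && isSpace(s[i]) {
		i++
	}
	return i
}

func main() {
	doc := []byte("  \n\t                                        {\"k\": 1}")
	fmt.Println(skipWhitespace(doc))
}
```

Inverting the mask and counting trailing zeros is what turns "which bytes matched" into "where the first mismatch is" without a per-byte branch.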

Other

In addition to the techniques mentioned above, Sonic contains many detailed optimizations, such as using RCU instead of sync.Map to speed up loading from the codec cache, and using memory pools to reduce memory allocation for encode buffers. There is not enough space to go into details here; if you are interested, you can read the Sonic source code.
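Both optimizations have simple standard-library analogues. The sketch below shows a read-mostly cache updated in RCU style (readers load an immutable snapshot without locking; writers copy, modify and swap) and a buffer pool built on sync.Pool. It is an illustration of the general techniques only: Cache, bufPool and encodeWithPool are hypothetical names, the cache stores plain ints keyed by string instead of codecs keyed by type, and nothing here is taken from Sonic's source.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
	"sync/atomic"
)

// Cache is a read-mostly map with copy-on-write updates.
type Cache struct {
	mu   sync.Mutex   // serializes writers only
	snap atomic.Value // holds a map[string]int that is never mutated after Store
}

func NewCache() *Cache {
	c := &Cache{}
	c.snap.Store(map[string]int{})
	return c
}

// Get reads from the current snapshot without taking a lock.
func (c *Cache) Get(k string) (int, bool) {
	v, ok := c.snap.Load().(map[string]int)[k]
	return v, ok
}

// Set copies the current snapshot, adds the entry and publishes the copy.
// Readers that already loaded the old snapshot keep using it safely.
func (c *Cache) Set(k string, v int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	old := c.snap.Load().(map[string]int)
	next := make(map[string]int, len(old)+1)
	for key, val := range old {
		next[key] = val
	}
	next[k] = v
	c.snap.Store(next)
}

// bufPool recycles encode buffers instead of allocating one per call.
var bufPool = sync.Pool{New: func() interface{} { return new(bytes.Buffer) }}

// encodeWithPool writes s as a quoted string using a pooled buffer.
// It does not escape s; it only demonstrates the pooling pattern.
func encodeWithPool(s string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer bufPool.Put(buf)
	buf.WriteByte('"')
	buf.WriteString(s)
	buf.WriteByte('"')
	return buf.String() // String copies the bytes, so reuse of buf is safe
}

func main() {
	c := NewCache()
	c.Set("User", 1)
	v, ok := c.Get("User")
	fmt.Println(v, ok, encodeWithPool("hi"))
}
```

Copy-on-write makes writes more expensive, which suits a codec cache: each type is inserted once and then read on every encode or decode.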

Performance test

We ran the test scenarios described above (see the benchmark for the test code) and got the following results:

Small data (400B, 11 keys, 3 layers deep)

Medium data (110KB, 300+ keys, 4 layers deep)

Large data (550KB, 10000+ keys, 6 layers deep)

As the results show, Sonic is ahead in almost all scenarios (Sonic-AST suffers some performance degradation on small data sets because it directly calls C functions imported through Go assembly):

  • Average encoding performance is 240% higher than that of json-iterator, and average decoding performance is 110% higher;
  • Single-key modification performance is 75% higher than that of sjson.

In production, Sonic has also shown good returns, with the peak core usage of a service reduced by nearly a third:

CPU usage (cores) of a ByteDance service before and after Sonic went live

Conclusion

Because its low-level code is written in assembly, Sonic currently supports only Darwin/Linux on the AMD64 architecture; support for other operating systems and architectures will be added gradually. In addition, we are considering carrying the experience gained with Sonic in Go over to other languages and serialization protocols. A C++ version of Sonic is currently under development, aiming to provide a general-purpose, high-performance JSON codec interface based on Sonic’s core ideas and underlying operators.

With the release of its first major version, v1.0.0, Sonic is now not only flexible enough for enterprise production use but is also responding to community needs and embracing the open source ecosystem. We look forward to more use cases and performance breakthroughs for Sonic, and we welcome developers to contribute PRs and help build the best JSON library in the industry!

Links

Project address: github.com/bytedance/s…

Benchmark: github.com/bytedance/s…