Gophers may be puzzled by this topic: Go's standard library package, encoding/json, already provides a comfortable enough JSON tool and is well received by Go developers. What could possibly be wrong with it? In practice, though, during business development we ran into quite a few problems that native JSON handles poorly or cannot handle at all, so it does not fully meet our requirements.

So, what is wrong with it? When should third-party libraries be used? How do you choose one? How do they perform?

Before getting into specifics, let's take a brief look at some of the libraries currently used to process JSON in Go, along with benchmark data for them. If the text below is too long, you can skip to the conclusion.

Some commonly used Go JSON parsing libraries

Go native encoding/json

This library should be familiar to every Go programmer. With json.Unmarshal and json.Marshal, it is easy to deserialize JSON-formatted binary data into a given Go struct and to serialize a Go struct into a binary stream. Data whose structure is unknown or uncertain can be deserialized into a map[string]interface{} and then accessed as key-value pairs.
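As a minimal sketch of the two usage styles just described, the following program round-trips a struct and then decodes the same kind of data into a generic map. The `user` type and the helper names `roundTrip` and `decodeLoose` are hypothetical and exist only for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// user is a hypothetical struct used only for this illustration.
type user struct {
	Name string `json:"name"`
	Age  int    `json:"age"`
}

// roundTrip serializes u to JSON and parses the result back into a new struct.
func roundTrip(u user) (string, user, error) {
	b, err := json.Marshal(u)
	if err != nil {
		return "", user{}, err
	}
	var out user
	if err := json.Unmarshal(b, &out); err != nil {
		return "", user{}, err
	}
	return string(b), out, nil
}

// decodeLoose parses a JSON object of unknown shape into a generic map.
func decodeLoose(data []byte) (map[string]interface{}, error) {
	var m map[string]interface{}
	err := json.Unmarshal(data, &m)
	return m, err
}

func main() {
	text, back, err := roundTrip(user{Name: "root", Age: 30})
	fmt.Println(text, back, err)

	// Numbers decoded into interface{} arrive as float64.
	m, err := decodeLoose([]byte(`{"name":"root","age":30}`))
	fmt.Println(m["name"], m["age"], err)
}
```

Note that in the map form every number becomes a float64, which is one of the reasons the map approach needs extra care later on.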

Here are two additional features that you might not have noticed:

  • A piece of JSON data can be an object, an array, a string, a number, a boolean, or null, and the two functions above support all of these value types at the top level. For example, the following code also works:
var s string
err := json.Unmarshal([]byte(`"Hello, world!"`), &s)
// Note that the double quotes inside the string are required. Plain `Hello, world!` is not valid JSON and returns an error.
  • When parsing JSON into a struct, keys are matched case-insensitively where possible. Even if a key differs from the one defined in the struct, the field is still assigned as long as the two are equal ignoring case. Here's an example:
cert := struct {
    Username string `json:"username"`
    Password string `json:"password"`
}{}
err := json.Unmarshal([]byte(`{"UserName":"root","passWord":"123456"}`), &cert)
if err != nil {
    fmt.Println("err =", err)
} else {
    fmt.Println("username =", cert.Username)
    fmt.Println("password =", cert.Password)
}
// Output:
// username = root
// password = 123456

jsoniter

Open jsoniter's GitHub home page and two key words greet you: high performance and compatibility. These are the package's two biggest selling points.

First, compatibility: the biggest advantage of jsoniter is that it is 100% compatible with the standard library, so code can be migrated easily. If even that is inconvenient, you can use Go monkey patching to forcibly replace the entry points of the json functions.

Then there's performance: as with other open source libraries that boast about their performance, jsoniter's own test results have to be taken with a grain of salt. Here are a few simple conclusions based on my own testing:

  • In the single scenario of deserializing into a struct, jsoniter does improve on the standard library; my own measurements show roughly a 1.4-fold improvement
  • But in that same scenario, jsoniter is far behind easyjson
  • In other scenarios it has no such advantage, as I'll explain later

How does jsoniter manage to outperform an official library developed jointly by so many experts? First, it minimizes unnecessary memory copying. Second, it reduces the use of reflect: jsoniter calls reflect only once for objects of the same type and caches the result. But as Go versions iterate, the performance of the native JSON library keeps improving, and jsoniter's performance advantage keeps narrowing.

In addition, jsoniter provides a Get function that reads the required field directly from a []byte of binary data, as explained later.

easyjson

This is another JSON parsing package on GitHub. Compared with jsoniter's 9K stars, easyjson's 3K looks a little modest, but it is also a very popular open source project.

The package's main selling point, again, is speed. Why is easyjson faster than jsoniter? Because easyjson's development model is similar to Protobuf's: before the program runs, you use its code generation tool to generate dedicated serialization and deserialization code for each struct, so every struct has its own custom parsing function.

But this same development model makes easyjson more invasive to the business. On the one hand, you need to generate code before running go build; on the other hand, its JSON handling functions are not compatible with the native JSON library.
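To give a feel for why per-struct code is fast, here is a hand-written sketch of a reflection-free marshal function. It is only an illustration of the idea, not the code that easyjson actually generates; the `account` type and the `appendJSONString` helper are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// account is a hypothetical struct for the illustration.
type account struct {
	ID   int
	Name string
}

// appendJSONString appends s as a JSON string literal, escaping quotes,
// backslashes and control characters. Other bytes are copied unchanged.
func appendJSONString(buf []byte, s string) []byte {
	const hex = "0123456789abcdef"
	buf = append(buf, '"')
	for i := 0; i < len(s); i++ {
		c := s[i]
		switch {
		case c == '"' || c == '\\':
			buf = append(buf, '\\', c)
		case c < 0x20:
			buf = append(buf, '\\', 'u', '0', '0', hex[c>>4], hex[c&0xf])
		default:
			buf = append(buf, c)
		}
	}
	return append(buf, '"')
}

// MarshalJSON writes each field directly, without consulting reflect.
// A code generator would emit a function of roughly this shape per struct.
func (a account) MarshalJSON() ([]byte, error) {
	buf := make([]byte, 0, 32+len(a.Name))
	buf = append(buf, `{"id":`...)
	buf = strconv.AppendInt(buf, int64(a.ID), 10)
	buf = append(buf, `,"name":`...)
	buf = appendJSONString(buf, a.Name)
	buf = append(buf, '}')
	return buf, nil
}

func main() {
	// encoding/json picks up the custom method through the json.Marshaler interface.
	b, err := json.Marshal(account{ID: 7, Name: `Ann "A"`})
	fmt.Println(string(b), err)
}
```

The trade-off is the one described above: code like this has to be produced and kept in sync for every struct, which is exactly the job of the generator.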

jsonparser

This is my personal favorite JSON parsing library, and its 3.9K stars show that it is far from obscure. The title of its GitHub home page boasts up to 10x the performance of the official library.

Again, an open source project's own test results have to be taken with a grain of salt. I have personally measured the 10x figure, but it is not representative of all scenarios.

Why does jsonparser have such high performance? Because jsonparser itself is only responsible for locating a few key boundary characters in a binary byte sequence, such as:

  • find a ", then find the closing "; what lies between them is a string
  • find a [, then find the matching ]; what lies between them is an array
  • find a {, then find the matching }; what lies between them is an object

It then hands the []byte data between the boundaries to the caller for further processing. From that point on, parsing and validating the binary data is the caller's responsibility.
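The boundary search described above can be sketched in a few lines. This is a simplified illustration that assumes well-formed input, not jsonparser's real code; `findClosing` is a hypothetical helper.

```go
package main

import "fmt"

// findClosing returns the index of the bracket that closes the '{' or '['
// at data[start], or -1 if there is none. String literals are skipped so
// that brackets inside strings are ignored. No validation is performed.
func findClosing(data []byte, start int) int {
	if start < 0 || start >= len(data) {
		return -1
	}
	open := data[start]
	var closing byte
	switch open {
	case '{':
		closing = '}'
	case '[':
		closing = ']'
	default:
		return -1
	}
	depth := 0
	inString := false
	for i := start; i < len(data); i++ {
		c := data[i]
		if inString {
			switch c {
			case '\\':
				i++ // skip the escaped character
			case '"':
				inString = false
			}
			continue
		}
		switch c {
		case '"':
			inString = true
		case open:
			depth++
		case closing:
			depth--
			if depth == 0 {
				return i
			}
		}
	}
	return -1
}

func main() {
	data := []byte(`{"a":"}","b":[1,{"c":2}]}`)
	end := findClosing(data, 0)
	// The caller receives the raw bytes between the boundaries.
	fmt.Println(end, string(data[1:end]))
}
```

Because nothing is decoded or allocated while scanning, a search like this is cheap, which is consistent with the zero-allocation figures jsonparser shows in the benchmarks.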

Why do I like an open source library that seems so cumbersome to use? Because developers can build special logic on top of jsonparser, or even build their own JSON parsing library. My own open source project, jsonvalue, was implemented on top of jsonparser in its early days. Although I later dropped jsonparser from it for further performance optimization, that doesn't stop me from praising it.

jsonvalue

This project is my own JSON parsing library. It was originally designed to replace the use of map[string]interface{} with the native JSON library for processing unstructured JSON data. I have another article on this topic: “[Still using map[string]interface{} to process JSON? Here's a more efficient way: jsonvalue][2]”.

I've roughly finished tuning the library (see the master branch). Its performance is already much better than the native JSON library's, and slightly better than jsoniter's. Of course, this holds only for specific cases; the performance of the various libraries differs from scenario to scenario. That's one of the reasons I wrote this article.

JSON processing in conventional operations

Other than struct and map, what else is there? Here I list the scenarios I've encountered in real business development. All the test code is open source; readers are welcome to consult it and to send me feedback by issue, comment, or private message.

Conventional operation: struct parsing

Struct parsing is by far the most routine JSON operation in Go. Here I have defined a struct like this:

type object struct {
    Int    int       `json:"int"`
    Float  float64   `json:"float"`
    String string    `json:"string"`
    Object *object   `json:"object,omitempty"`
    Array  []*object `json:"array,omitempty"`
}

A little mischievous: this structure can nest itself endlessly.

Then, with the help of json.cn, I define a piece of binary data: a JSON object five layers deep.

{"int":123456,"float":123.456789,"string":"Hello, world!","object":{"int":123456,"float":123.456789,"string":"Hello, world!","object":{"int":123456,"float":123.456789,"string":"Hello, world!","object":{"int":123456,"float":123.456789,"string":"Hello, world!","object":{"int":123456,"float":123.456789,"string":"Hello, world!"},"array":[{"int":123456,"float":123.456789,"string":"Hello, world!"},{"int":123456,"float":123.456789,"string":"Hello, world!"}]}}},"array":[{"int":123456,"float":123.456789,"string":"Hello, world!"},{"int":123456,"float":123.456789,"string":"Hello, world!"}]}

Using this struct, I ran Marshal and Unmarshal tests against three packages: the official encoding/json, jsoniter, and easyjson. First, let's look at the results of the Unmarshal test:

| Package | Function | Time per iteration | Memory | Allocations |
|---|---|---|---|---|
| encoding/json | Unmarshal | 8775 ns/op | 1144 B/op | 25 allocs/op |
| jsoniter | Unmarshal | 6890 ns/op | 1720 B/op | 56 allocs/op |
| easyjson | UnmarshalJSON | 4017 ns/op | 784 B/op | 19 allocs/op |

Here are the test results for serialization:

| Package | Function | Time per iteration | Memory | Allocations |
|---|---|---|---|---|
| encoding/json | Marshal | 6859 ns/op | 1882 B/op | 6 allocs/op |
| jsoniter | Marshal | 6843 ns/op | 1882 B/op | 6 allocs/op |
| easyjson | MarshalJSON | 2463 ns/op | 1240 B/op | 5 allocs/op |

From a pure performance point of view, easyjson, with serialization and deserialization functions generated specifically for each struct, achieves the highest performance: roughly 2 to 3 times as fast as the other two libraries. jsoniter is slightly faster than the official JSON library, but not by much.
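For readers who want to reproduce figures of this kind (ns/op, B/op, allocs/op), here is a minimal sketch that uses testing.Benchmark from the standard library. It is not the author's test code; the shortened payload and the helper name `benchUnmarshal` are assumptions made for the illustration, and the numbers will differ from machine to machine.

```go
package main

import (
	"encoding/json"
	"fmt"
	"testing"
)

// object mirrors the self-nesting struct used in this article.
type object struct {
	Int    int       `json:"int"`
	Float  float64   `json:"float"`
	String string    `json:"string"`
	Object *object   `json:"object,omitempty"`
	Array  []*object `json:"array,omitempty"`
}

// benchUnmarshal measures json.Unmarshal on raw and reports allocations too.
func benchUnmarshal(raw []byte) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			var o object
			if err := json.Unmarshal(raw, &o); err != nil {
				b.Fatal(err)
			}
		}
	})
}

func main() {
	// A shortened, hypothetical payload; the article's tests use a deeper one.
	raw := []byte(`{"int":123456,"float":123.456789,"string":"Hello, world!","object":{"int":1}}`)
	r := benchUnmarshal(raw)
	fmt.Printf("%d ns/op  %d B/op  %d allocs/op\n",
		r.NsPerOp(), r.AllocedBytesPerOp(), r.AllocsPerOp())
}
```

The same harness can be pointed at any other library's decode function to compare it under identical conditions.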

Conventional “unconventional” operation: map[string]interface{}

I call this “unconventional” because in this case the program has to deal with unstructured JSON data, or handle many differently structured kinds of data in a single function, so a struct cannot be used. The official JSON library's solution is to store the data (for object types) in a map[string]interface{}. In this scenario, only the official JSON library and jsoniter are supported.

The test data is as follows. First, deserialize:

| Package | Function | Time per iteration | Memory | Allocations |
|---|---|---|---|---|
| encoding/json | Unmarshal | 13040 ns/op | 4512 B/op | 128 allocs/op |
| jsoniter | Unmarshal | 9442 ns/op | 4521 B/op | 136 allocs/op |

Serialization test data is as follows:

| Package | Function | Time per iteration | Memory | Allocations |
|---|---|---|---|---|
| encoding/json | Marshal | 17140 ns/op | 5865 B/op | 121 allocs/op |
| jsoniter | Marshal | 17132 ns/op | 5865 B/op | 121 allocs/op |

In this case the two are six of one and half a dozen of the other; jsoniter has no clear advantage. Even for large data volumes, which jsoniter advertises as a selling point, its advantage is slight.

For the same amount of data, both libraries take roughly 1.4 to 1.5 times as long to deserialize as in the struct case, and about 2.5 times as long to serialize.

Hmm... my advice is to avoid this approach if you can, not to mention that the program needs all kinds of type assertions when handling interface{}. You can read my other article to get a feel for the pain.
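To make the pain concrete, here is a small sketch of what reading a single nested value looks like with map[string]interface{}. The payload shape (response, userList, name) and the helper `firstUserName` are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// firstUserName extracts response.userList[0].name from loosely typed JSON.
// Every level needs its own type assertion and its own failure branch.
func firstUserName(data []byte) (string, error) {
	var root map[string]interface{}
	if err := json.Unmarshal(data, &root); err != nil {
		return "", err
	}
	resp, ok := root["response"].(map[string]interface{})
	if !ok {
		return "", errors.New("response is not an object")
	}
	list, ok := resp["userList"].([]interface{})
	if !ok || len(list) == 0 {
		return "", errors.New("userList is not a non-empty array")
	}
	first, ok := list[0].(map[string]interface{})
	if !ok {
		return "", errors.New("userList[0] is not an object")
	}
	name, ok := first["name"].(string)
	if !ok {
		return "", errors.New("name is not a string")
	}
	return name, nil
}

func main() {
	name, err := firstUserName([]byte(`{"response":{"userList":[{"name":"root"}]}}`))
	fmt.Println(name, err)
}
```

Four assertions for one string: this is the boilerplate that the Get-style APIs of the third-party libraries are designed to remove.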

Unconventional operations: deserialization

When a struct cannot be used, each open source project has its own approach. Every library has detailed and powerful extra features that a single article cannot cover. Here I list a few libraries and the ideas they represent, along with test data for each case.

jsoniter

For unstructured JSON, if you want to parse a piece of []byte data and get one value out of it, jsoniter offers the following approaches.

The first is to parse the original text directly and return the required data:

username := jsoniter.Get(data, "response", "userList", 0, "name")
fmt.Println("username:", username.ToString())

You can also obtain an object first and continue operating on it:

obj := jsoniter.Get(data)
if obj.ValueType() == jsoniter.InvalidValue {
    // err handling
}
username := obj.Get("response", "userList", 0, "name")
fmt.Println("username:", username.ToString())

One big feature of this function is that it parses on demand. In the call obj := jsoniter.Get(data), for example, jsoniter performs only minimal data checking, at least enough to establish that the value is currently an object.

Even in the second call, obj.Get("response", "userList", 0, "name"), jsoniter does its best to avoid unnecessary work and parses only the parts that need parsing.

For example, if the request asks for the value of response.userList, jsoniter tries to skip irrelevant fields such as response.gameList, which keeps the CPU time spent on irrelevant data to a minimum.

Note, however, that in terms of its interface the returned obj should be regarded as read-only; it cannot be serialized back into a binary sequence.
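A rough standard-library approximation of parsing on demand is to decode one level at a time into json.RawMessage values, so that sibling fields stay as raw bytes. This is only a sketch of the idea and not how jsoniter works internally: encoding/json still scans the whole input for validity. The helper `lookup` is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lookup walks keys through nested JSON objects, decoding only one level
// at a time. Values that are not on the path remain undecoded raw bytes.
func lookup(data []byte, keys ...string) (json.RawMessage, error) {
	cur := json.RawMessage(data)
	for _, k := range keys {
		var level map[string]json.RawMessage
		if err := json.Unmarshal(cur, &level); err != nil {
			return nil, err
		}
		next, ok := level[k]
		if !ok {
			return nil, fmt.Errorf("key %q not found", k)
		}
		cur = next
	}
	return cur, nil
}

func main() {
	data := []byte(`{"response":{"userList":[{"name":"root"}],"gameList":[1,2,3]}}`)
	raw, err := lookup(data, "response", "userList")
	fmt.Println(string(raw), err)
}
```

Here gameList is never turned into Go values, which mirrors the behaviour described above, although a dedicated library can skip far more work than this sketch does.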

jsonparser

jsonparser's support for parsing a []byte of data and getting a value from it is more limited than jsoniter's.

For example, if we know the type of a value in advance, such as the username field above, we can get it like this:

username, err := jsonparser.GetString(data, "response", "userList", "[0]", "name")
if err != nil {
    // err handling
}
fmt.Println("username:", username)

However, jsonparser's Get family of functions can only retrieve the basic types other than null: number, boolean, and string.

If you want to work with objects and arrays, you need to be familiar with the following two functions, which I think are at the heart of jsonparser:

func ArrayEach(data []byte, cb func(value []byte, dataType ValueType, offset int, err error), keys ...string) (offset int, err error)
func ObjectEach(data []byte, callback func(key []byte, value []byte, dataType ValueType, offset int) error, keys ...string) (err error)

These two functions walk through the binary data in sequence and hand the extracted data segments to the caller through callback functions, which then process the data. Callers can assemble a map or a slice, or even do things that are normally impossible (as I'll explain later).

jsonvalue

This is an open source Go JSON library that I developed. The API design of its Get-style operations is similar to the second style of jsoniter.

To get the username field mentioned above, we can write:

v, err := jsonvalue.Unmarshal(data)
if err != nil {
    // err handling
}
username := v.GetString("response", "userList", 0, "name")
fmt.Println("username:", username)

Performance test comparison

In the “unconventional operations” scenario described in this section, two of the three libraries, jsoniter and jsonparser, parse on demand, while jsonvalue parses the data completely. So the test plans still differ somewhat.

Here I present the test data first. The libraries are evaluated from two angles:

  • Performance evaluation: how the library performs in this scenario, considering only CPU execution efficiency and not ease of use
  • Function evaluation: how convenient the program's subsequent processing is once the data has been obtained in this scenario, regardless of how high the deserialization performance is

| Scenario | Package | Function description / main calls | Time per iteration | Memory | Allocations |
|---|---|---|---|---|---|
| Shallow parsing | jsoniter | any := jsoniter.Get(raw); keys := any.Keys() | 9118 ns/op | 3024 B/op | 139 allocs/op |
| Shallow parsing | jsonvalue | jsonvalue.Unmarshal() | 7684 ns/op | 9072 B/op | 61 allocs/op |
| Shallow parsing | jsonparser | jsonparser.ObjectEach(raw, objEach) | 853 ns/op | 0 B/op | 0 allocs/op |
| Read one value at a deeper level | jsoniter | any.Get("object", "object", "object", "array", 1) | 9118 ns/op | 3024 B/op | 139 allocs/op |
| Read one value at a deeper level | jsonvalue | jsonvalue.Unmarshal(); v.Get("object", "object", "object", "array", 1) | 7928 ns/op | 9072 B/op | 61 allocs/op |
| Read one value at a deeper level | jsonparser | jsonparser.Get(raw, "object", "object", "object", "array", "[1]") | 917 ns/op | 0 B/op | 0 allocs/op |
| Read only one deeper-level value from a large (100x) amount of data | jsoniter | jsoniter.Get(raw, "10", "object", "object", "object", "array", 1) | 29967 ns/op | 4913 B/op | 469 allocs/op |
| Read only one deeper-level value from a large (100x) amount of data | jsonvalue | jsonvalue.Unmarshal(); v.Get("10", "object", "object", "object", "array", 1) | 799450 ns/op | 917030 B/op | 6011 allocs/op |
| Read only one deeper-level value from a large (100x) amount of data | jsonparser | jsonparser.Get(raw, "10", "object", "object", "object", "array", "[1]") | 8826 ns/op | 0 B/op | 0 allocs/op |
| Complete traversal | jsoniter | jsoniter.Get(raw), then resolve each child recursively | 45237 ns/op | 12659 B/op | 671 allocs/op |
| Complete traversal | jsonvalue | jsonvalue.Unmarshal() | 7928 ns/op | 9072 B/op | 61 allocs/op |
| Complete traversal | jsonparser | jsonparser.ObjectEach(raw, objEach), then resolve each child recursively | 3705 ns/op | 0 B/op | 0 allocs/op |
| Complete traversal | encoding/json | Unmarshal | 13040 ns/op | 4512 B/op | 128 allocs/op |
| Complete traversal | jsoniter | Unmarshal | 9442 ns/op | 4521 B/op | 136 allocs/op |

The test data above falls into four deserialization scenarios. Below I explain the use cases behind each of the four, along with the corresponding selection advice.

Shallow parsing

In the test code, shallow parsing means parsing only the top-level key list of a deeply nested structure. This scenario serves mostly as a reference. As you can see, jsonparser far outperforms the other open source libraries at extracting the first-level key list as quickly as possible.

In terms of ease of use, however, both jsonparser and jsoniter require the developer to process the obtained data further, so both score slightly lower on ease of use in this scenario.

Getting specific data from the body

Here's the scenario: only a small part of the JSON body is useful to the current business and needs to be retrieved. I distinguish the following situations:

  • Useful data accounts for a relatively high proportion of all data (corresponding to “read one value at a deeper level”):

    • In terms of performance, jsonparser is as good as ever in this scenario
    • In terms of ease of use, jsonparser requires the caller to process the data again, so jsoniter and jsonvalue are superior
  • Useful data accounts for a low proportion of all data (corresponding to “read only one deeper-level value from a large (100x) amount of data”):

    • In terms of performance, jsonparser is still off the charts
    • In terms of ease of use, jsonparser is still weak
    • Weighing ease of use against performance, the lower the proportion of useful data in this scenario, the more valuable jsonparser becomes
  • The business needs to parse the data completely (corresponding to “complete traversal”); this scenario is the most comprehensive test of each solution's overall performance:

    • In terms of performance, jsonparser is still good, but here its ease of use really is a problem: in the middle of complex traversal operations you have to wrap your own logic to store the data
    • Second in performance is jsonvalue, and this is where I am very confident

      • jsonvalue completes a full parse in less time than the supposedly fast jsoniter
      • Compared with jsonparser, jsonvalue ostensibly takes a little over twice as long, but jsonparser only delivers half-processed data, while jsonvalue hands the caller a finished product
    • As for jsoniter, don't use it in this scenario: when the data has to be parsed completely, its numbers are painful to look at
    • Finally, the figures for the official JSON library and jsoniter parsing into a map are included for reference only; in this scenario they are not recommended either

Unconventional operations: serialization

This means serializing a piece of data without a struct. This scenario usually arises in the following situations:

  • The format of the data to be serialized is not fixed and may be generated from other parameters.
  • The data to be serialized is too numerous and too fragmented; defining structs one by one and marshaling them would make the code hard to read.

The first solution to this scenario is the “conventional unconventional operation” mentioned earlier: use a map.

As for the unconventional options, we first rule out jsoniter and jsonparser, because they have no direct way to build a custom JSON structure. Then we rule out easyjson, because it cannot operate on a map. All that's left is jsonvalue.

For example, suppose we return a user's nickname in the format {"code":0,"message":"success","data":{"nickname":"revitalize China"}}. The code using a map is as follows:

code := 0
nickname := "revitalize China"
res := map[string]interface{}{
    "code":    code,
    "message": "success",
    "data": map[string]string{
        "nickname": nickname,
    },
}
b, _ := json.Marshal(&res)

The JSONValue method is:

res := jsonvalue.NewObject()
res.SetInt(0).At("code")
res.SetString("success").At("message")
res.SetString(nickname).At("data", "nickname")
b := res.MustMarshal()

In terms of ease of use, I'd say this is very convenient. We serialized with the official JSON library, jsoniter and jsonvalue respectively, and the measured data are as follows:

| Package | Function | Time per iteration | Memory | Allocations |
|---|---|---|---|---|
| encoding/json | Marshal | 16273 ns/op | 5865 B/op | 121 allocs/op |
| jsoniter | Marshal | 16616 ns/op | 5865 B/op | 121 allocs/op |
| jsonvalue | Marshal | 4521 ns/op | 2224 B/op | 5 allocs/op |

The results speak for themselves. The reason is easy to see: processing map data requires the reflect mechanism, which greatly reduces performance.

Conclusions and selection advice

Struct serialization and deserialization

In this scenario, my first choice is the official JSON library. Readers may be surprised, so here is my reasoning:

  • Although easyjson outperforms all the other open source projects, its biggest drawback is that it needs an extra tool to generate code, and version control of that tool adds operation and maintenance cost. Of course, if your team already handles Protobuf well, you can manage easyjson in the same way
  • Before Go 1.8, the performance of the official JSON library drew a lot of criticism. But today (1.16.3) its performance is a different story. In addition, as the most widely used JSON library, the official library has the fewest bugs and the best compatibility
  • jsoniter still outperforms the official library, but not overwhelmingly. If you're after extreme performance, you should choose easyjson rather than jsoniter
  • jsoniter has been inactive in recent years. I raised an issue some time ago and nobody replied; when I later checked the issue list, I found issues still open from 2018

Serialization and deserialization of unstructured data

In this scenario we need to distinguish between high and low data utilization. Data utilization refers to the share of the JSON body that the business actually needs to look at and process; if that share exceeds a quarter, I consider utilization high.

  1. High data utilization: in this case, I recommend jsonvalue
  2. Low data utilization: there are two situations, depending on whether the JSON data needs to be serialized again

    • No need to reserialize: just choose jsonparser; its performance is spectacular
    • Need to reserialize: there are two options. If the performance requirements are relatively modest, you can use jsonvalue. If they are high and you only need to insert one piece of data into the binary sequence (this is important), you can use jsoniter's Set method; see its godoc for details

In practice, cases that involve a large amount of JSON data and also require reserialization are rare. The scenario mostly arises in proxy servers, gateways, overlay relay services and the like, and only when extra information has to be injected into the original data. In other words, jsoniter's range of application is limited.

[Figure: operation efficiency of the different libraries at data coverage from 1% to 60%; vertical axis in μs/op]

As you can see, jsoniter loses its advantage over jsonvalue once data utilization reaches 25%, and jsonparser does so at around 40%. As for jsonvalue, because the data is fully parsed in one pass, accessing the data after parsing takes very little time, so its total time stays stable across different data coverage.

Other offbeat operations

I've also encountered some odd JSON processing scenarios in the real world, and I'll take this opportunity to share my solutions.

Case-insensitive JSON

As mentioned earlier: “When parsing JSON into a struct, keys are matched case-insensitively where possible. Even if a key differs from the one defined in the struct, the field is still assigned as long as the two are equal ignoring case.”

However, if you're using a map, jsoniter, or jsonparser, this becomes a big problem. We have two services that operate on the same field in a MySQL database, but the structs defined by the two Go services differ in the case of one letter. The problem had existed for a long time without surfacing, thanks to the above behavior of the official JSON library when parsing into structs. Then one day we wrote a script to clean the data, read the field through a map, and the bug was exposed.
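The pitfall is easy to reproduce with the standard library alone. In this sketch the `profile` struct and the helper `readBoth` are hypothetical; the same payload is read once through a struct and once through a map.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// profile is a hypothetical struct whose tag uses an all-lowercase key.
type profile struct {
	NickName string `json:"nickname"`
}

// readBoth decodes the same data into a struct and into a map, then
// returns the nickname as seen through each of them.
func readBoth(data []byte) (fromStruct string, fromMap interface{}, err error) {
	var p profile
	if err = json.Unmarshal(data, &p); err != nil {
		return "", nil, err
	}
	var m map[string]interface{}
	if err = json.Unmarshal(data, &m); err != nil {
		return "", nil, err
	}
	// Struct fields match keys case-insensitively; map lookups do not.
	return p.NickName, m["nickname"], nil
}

func main() {
	s, m, err := readBoth([]byte(`{"nickName":"pony"}`)) // note the capital N
	fmt.Println(s, m, err)
}
```

The struct receives the value while the map lookup comes back empty, which is how a mismatch like ours can stay hidden for a long time.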

I solved this problem by adding a case support feature to JSONValue:

raw := `{"user":{"nickName":"pony"}}` // note the capital N
v, _ := jsonvalue.UnmarshalString(raw)
fmt.Println("nickname:", v.GetString("user", "nickname"))
fmt.Println("nickname:", v.Caseless().GetString("user", "nickname"))
// Output:
// nickname:
// nickname: pony

Ordered JSON objects

In the interface with a partner module, a data stream is pushed to our business module in the form of a JSON object. Later, a new requirement demanded that the pushed data carry an order. Changing the interface format to an array would have required major changes to the data structures on both sides. In addition, old and new modules would inevitably coexist during a rolling upgrade, so the interface would have had to support both formats at the same time.

In the end we adopted a rather devious approach: the data producer emits the key-value pairs in order, and we, as the consumer, use jsonparser's ObjectEach function to obtain the key-value byte sequences in that same order, which gives us ordered data.
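For comparison, the same effect can be approximated with the standard library's streaming decoder, which also yields object keys in the order they appear in the input. This is an alternative sketch, not the jsonparser-based code we actually used; `orderedKeys` is a hypothetical helper.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// orderedKeys returns the keys of a top-level JSON object in input order.
func orderedKeys(data []byte) ([]string, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	tok, err := dec.Token()
	if err != nil {
		return nil, err
	}
	if d, ok := tok.(json.Delim); !ok || d != '{' {
		return nil, errors.New("top-level value is not an object")
	}
	var keys []string
	for dec.More() {
		tok, err := dec.Token() // the next token inside an object is a key
		if err != nil {
			return nil, err
		}
		key, ok := tok.(string)
		if !ok {
			return nil, errors.New("expected an object key")
		}
		keys = append(keys, key)
		// Consume the value as raw bytes; a real consumer would process it here.
		var value json.RawMessage
		if err := dec.Decode(&value); err != nil {
			return nil, err
		}
	}
	return keys, nil
}

func main() {
	keys, err := orderedKeys([]byte(`{"b":1,"a":{"z":[1,2]},"c":"x"}`))
	fmt.Println(keys, err)
}
```

Decoding into a map would lose this order, since Go maps are unordered, so a streaming or callback-based reader is needed either way.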

Cross-language UTF-8 string interoperability

Go is a very young language; by the time it was born, the dominant character encoding on the Internet was already Unicode with UTF-8. Older languages, for a variety of reasons, may use different encodings.

As a result, when exchanging JSON across languages, different teams and different companies may encode Unicode wide characters differently. In that case, the solution is to use ASCII encoding uniformly, escaping wide characters. With the official JSON library, you can refer to this Q&A to escape wide characters.
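One possible way to do this with the official library is to post-process the output of json.Marshal and replace every non-ASCII rune with a \uXXXX escape. The sketch below is one such approach and not necessarily the one from the Q&A; `marshalASCII` is a hypothetical helper. It relies on the fact that in valid JSON, non-ASCII characters can only occur inside string literals.

```go
package main

import (
	"encoding/json"
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// marshalASCII marshals v and rewrites every non-ASCII rune as a \uXXXX
// escape, using a UTF-16 surrogate pair for runes above U+FFFF.
func marshalASCII(v interface{}) ([]byte, error) {
	raw, err := json.Marshal(v)
	if err != nil {
		return nil, err
	}
	out := make([]byte, 0, len(raw))
	for _, r := range string(raw) {
		switch {
		case r < utf8.RuneSelf:
			out = append(out, byte(r))
		case r > 0xFFFF:
			hi, lo := utf16.EncodeRune(r)
			out = append(out, fmt.Sprintf(`\u%04X\u%04X`, hi, lo)...)
		default:
			out = append(out, fmt.Sprintf(`\u%04X`, r)...)
		}
	}
	return out, nil
}

func main() {
	b, err := marshalASCII(map[string]string{"nation": "中国"})
	fmt.Println(string(b), err)
}
```

Any conforming JSON parser on the receiving side decodes these escapes back to the original characters, whatever its native string encoding is.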

If you use jsonvalue, ASCII escaping is the default, for example:

v := jsonvalue.NewObject()
v.SetString("中国").At("nation")
fmt.Println(v.MarshalString())
// Output:
// {"nation":"\u4E2D\u56FD"}

References

  • Open source libraries covered in this article:

    • jsoniter
    • RapidJSON: another library I came across while researching, implemented with cgo. Personally I find this a bit overkill; if such high performance is really needed, it would be better to use C++ directly. The library is also no longer being updated, so it is enough just to read about it
    • jsonparser
    • easyjson
    • jsonvalue
    • Go monkeypatching
  • The test data and test methods covered in this article can be found in:

    • jsonvalue-test
  • Escaping and Unicode encoding in JSON serialization
  • Claimed to be the fastest JSON parser in the world, 10x faster than others
  • Notes on using json-iterator/go
  • How do you evaluate jsoniter's claim to be the fastest JSON parser?
  • A major pitfall to watch out for when using json-iterator
  • Go learning, part 28: efficiently parsing JSON data with easyjson

This article is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.

Link to this article: https://segmentfault.com/a/1190000039957766

Author: AMC. Originally published in the Cloud+ community, which is also my blog. Reprints are welcome, but please credit the source.

What’s wrong with native JSON packages in Go and how to better handle JSON data?

Post Date: 2021-05-06

The original link: https://cloud.tencent.com/developer/article/1820473.