On August 31, 2019, the OpenResty community jointly held the Chengdu stop of the OpenResty × Open Talk national tour salon. Wang Yuansheng, the main author of APISIX, shared "APISIX High-Performance Practice" at the event.

OpenResty × Open Talk is a national touring salon initiated by the OpenResty community. It invites senior OpenResty technical experts from the industry to share their OpenResty experience, promote communication and learning among OpenResty users, and drive the OpenResty open source project forward.

Yuansheng Wang is the founder and lead author of the APISIX project, a founding member of the OpenResty community and the OpenResty Software Foundation, and the lead author of OpenResty Best Practices.

Here’s the full text:

First of all, I’d like to introduce myself. After graduating from university, I worked in the traditional financial industry for nine years. I joined Qihoo 360 in 2014, and it was during that time that I wrote OpenResty Best Practices. I enjoy studying technology and working on open source, and, perhaps influenced by Mr. Luo, I like to try idealistic things. In March this year, together with like-minded partners, I founded Shenzhen Branch Technology Co., Ltd., one of the few open source technology companies in China. APISIX is our main project at present.

APISIX is a microservice API gateway. In July this year, I gave a talk on "APISIX High-Performance Practice" in Shanghai. This time the content builds on that talk, and I will share what we have accumulated since then.

What is an API gateway

The API gateway plays an increasingly important role as the entry point for all traffic. As the figure shows, requests may come from browsers, IoT devices, mobile devices, and so on. As an intermediate control layer, the API gateway needs to handle security control, traffic management, logging, and more. More and more enterprises are adopting microservices to achieve internal decoupling, flexible deployment, and elastic scaling. As the number and complexity of microservices grow, unified traffic management and scheduling through an API gateway becomes essential, and this places higher demands on the gateway itself.

APISIX overview

The diagram above shows the basic architecture of APISIX. To support clustering and high availability, each node contains both the Admin API and the APISIX core, and either part can be enabled on its own or both together. The Admin API receives configuration submitted by administrators and validates the parameters with JSON Schema, preventing invalid parameters from reaching the configuration center storage. The APISIX core handles external requests: it matches each request to a specific routing rule based on the request's characteristics, executes the plugins, and then forwards the traffic to the specified upstream service.
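
As an illustration of the schema check on the Admin API side, here is a minimal sketch using the jsonschema Lua module's generate_validator. The module choice and the simplified route_schema are assumptions for this example, not APISIX's actual schema code:

```lua
-- Minimal sketch: validate an admin-submitted route object against a JSON
-- Schema before it is allowed into the configuration center.
local jsonschema = require("jsonschema")

-- an invented, simplified schema for illustration only
local route_schema = {
    type = "object",
    properties = {
        uri      = { type = "string", minLength = 1 },
        upstream = { type = "object" },
    },
    required = { "uri", "upstream" },
}

local validator = jsonschema.generate_validator(route_schema)

local route = { uri = "/hello", upstream = { nodes = { ["127.0.0.1:80"] = 1 } } }
local ok, err = validator(route)
if not ok then
    -- reject the request before anything is written to storage
    ngx.log(ngx.WARN, "invalid route config: ", err)
end
```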

APISIX releases a new version every month. Version 0.7 added pluggable routing, and we are proud to say this is currently the only API gateway implementation that allows custom routing. In addition to the existing R3 route, APISIX added a dedicated high-performance prefix-matching route based on a radix tree, built on the rax library open sourced by the author of Redis. The radix tree route matches 10 times faster or more than R3, and some production users have seen a significant drop in CPU usage after upgrading to it.

The image above shows the features APISIX had two months ago.

In the last two months, APISIX has added the new features shown here; each month brings roughly five or six major features. If I only walked you through APISIX's new features, the benefit would probably be limited, so today I am going to share some general OpenResty programming tips instead.

APISIX is all about high performance, so we benchmark it against bare OpenResty to show how little overhead it adds. The first test ran the full APISIX service against an empty OpenResty service with no functionality at all, and APISIX showed only a 15% performance loss with all of its functionality loaded. In other words, if you can accept a 15% performance drop, you get all the features shown above.

OpenResty optimization tips

Routing: RadixTree vs R3

Since we already had R3, why did we go on to implement a new route with lua-resty-radixtree?

First, the problems with R3: it has a steep learning curve (regular expressions are inherently hard), it does not support iterating over matching results, and it is much less efficient than a prefix-tree implementation. lua-resty-radixtree solves all of these problems, so performance and stability naturally improved a lot. The current lua-resty-radixtree implementation is based on antirez/rax, written by the author of Redis; standing on the shoulders of giants saves us a lot of detours.
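
As a rough illustration of how such a prefix-matching route is used, here is a minimal sketch assuming lua-resty-radixtree's new/match interface; the paths, methods, and metadata below are invented examples:

```lua
local radix = require("resty.radixtree")

-- build the matcher once, e.g. when routes are loaded from the config center
local rx = radix.new({
    {
        paths    = { "/hello", "/user/*" },  -- exact path plus prefix wildcard
        methods  = { "GET", "POST" },
        metadata = "route-1",                -- whatever you want back on a hit
    },
})

-- per request: returns the metadata of the matched route, or nil
local meta = rx:match("/user/42", { method = "GET" })
ngx.say(meta)   -- "route-1"
```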

From a data-structure point of view, the prefix tree should in theory be at least as fast as hashing. The real complexity of a hash lookup is O(K), where K is the length of the query key: the longer the key, the more work it takes to turn the string into an integer. The prefix tree matches progressively and its worst-case complexity is also O(K), so in the worst case the prefix tree is no slower than the hash.

Of course, that is only in principle. Dedicated testing showed that Lua table hash lookups crush the prefix tree. The reason is that LuaJIT computes string hashes with dedicated CPU instructions, which brings the lookup very close to O(1). So the LuaJIT table hash is the most efficient, followed by the prefix tree.

So in the LuaJIT world the most efficient way to match is to try a Lua table hash lookup first. We ended up using a radix tree rather than a plain trie, which is memory hungry; the radix tree has a much smaller footprint with similar performance.
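
To make the point concrete, here is a tiny plain-Lua sketch (not APISIX's actual matcher) of the "hash first, prefix match as fallback" idea; the routes and handlers are made up:

```lua
-- exact URIs live in a Lua table so LuaJIT's fast hash lookup handles them
local exact_routes = {
    ["/login"]  = "login-handler",
    ["/logout"] = "logout-handler",
}

-- only non-exact paths fall through to prefix matching
local prefix_routes = {
    { prefix = "/api/",    handler = "api-handler" },
    { prefix = "/static/", handler = "static-handler" },
}

local function match(uri)
    -- O(1)-ish hash lookup in the common case
    local handler = exact_routes[uri]
    if handler then
        return handler
    end
    -- fallback; a real gateway would use a radix tree here instead of a scan
    for _, r in ipairs(prefix_routes) do
        if uri:sub(1, #r.prefix) == r.prefix then
            return r.handler
        end
    end
    return nil
end

print(match("/login"))        -- login-handler (hash hit)
print(match("/api/v1/user"))  -- api-handler  (prefix fallback)
```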

OpenResty vs. Golang, HTTP vs. gRPC

Back in 2015, I chose OpenResty instead of Golang. The reason is that OpenResty lets me dig deeper into the lower layers, while with Golang I could only solve problems at the application level. That is why I chose OpenResty.

APISIX supports the scenario HTTP(S) -> APISIX -> gRPC server, turning REST API calls into gRPC requests. After finishing this feature we ran stress tests to verify the result. For comparison, we also wrote a protocol-conversion gateway in Golang. The tests showed that the APISIX version performed slightly better than the Golang version; on my machine both reached around 10,000 QPS per core. We had assumed Golang would have the best performance in the gRPC space, but APISIX managed to do a little better.

We had also assumed that HTTP could never perform as well as gRPC, but that now looks too sweeping. gRPC has advantages HTTP lacks, such as a smaller payload size and built-in schema checking. However, if your request body is small and you use JSON plus JSON Schema over HTTP, the performance of the two is almost the same, and on an intranet the difference is tiny. If the request body is large and the encoding is complex, then gRPC has a clear advantage.

Accelerating ngx.var

The easiest way to speed up Nginx variable access is to use the iresty/lua-var-nginx-module repository: compile it into OpenResty and use it as a Lua module. When we need an ngx.var value, we fetch it through the method this library provides, which improves APISIX's overall performance by about 5%; for a single variable the difference is at least 10x. You can also build the module as a dynamic module and load it dynamically, so there is no need to recompile OpenResty.
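
A minimal usage sketch follows; the require path and the fetch/request function names are taken on the assumption that the module's documented interface looks like this, so treat them as illustrative rather than authoritative:

```lua
-- inside an OpenResty handler, with lua-var-nginx-module compiled in
local var = require("resty.ngxvar")   -- module name as documented; an assumption here

local req  = var.request()            -- the raw request object, fetched once
local host = var.fetch("host", req)   -- much cheaper than reading ngx.var.host
local uri  = var.fetch("uri", req)

ngx.say(host, " ", uri)
```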

The APISIX gateway reads many variables from ngx.var, and variables such as the host address may be read repeatedly. Crossing into Nginx every time is inefficient, so we added a ctx cache layer in apisix/core: the first read fetches the variable from Nginx, and later reads hit the cache directly.
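
The idea can be sketched in a few lines of plain OpenResty Lua; this is a simplification of the concept, not apisix/core's actual ctx code:

```lua
-- lazily fetch ngx.var values and cache them for the rest of the request
local function new_var_cache()
    return setmetatable({}, {
        __index = function(t, name)
            local val = ngx.var[name]   -- crosses into Nginx only on first access
            t[name] = val               -- later reads are plain table lookups
            return val
        end,
    })
end

-- per request (e.g. stored in ngx.ctx):
-- local var  = new_var_cache()
-- local host = var.host   -- fetched from Nginx once
-- local same = var.host   -- served from the cache
```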

As an aside, I again recommend reading the code in apisix/core. It is generic and should be useful for most projects.

When JSON encode fails

When we JSON-encode a table, it may fail. There are several reasons: for example, the table contains cdata or userdata that cannot be encoded, or it contains functions, and so on. But in many cases we don't need a result that round-trips through serialization and deserialization perfectly; sometimes the encode is just for debugging.

So I added a boolean parameter to APISIX's core JSON encode to indicate whether values that cannot be encoded should be forced into strings. Another common situation is a table nested inside itself: a table A that is referenced from within A, forming a circular nesting. The fix is simple: when nesting, stop descending after a certain depth. Being able to force the encode in these two scenarios is very useful for development and debugging.
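
Here is a minimal sketch of the idea with hypothetical helper names; it is not APISIX's actual core JSON code:

```lua
local cjson = require("cjson.safe")

-- replace values cjson cannot serialize and cap the nesting depth
local function prepare(obj, depth)
    if depth > 10 then                     -- stop descending to break cycles
        return "{...}"
    end
    local t = type(obj)
    if t == "table" then
        local copy = {}
        for k, v in pairs(obj) do
            copy[k] = prepare(v, depth + 1)
        end
        return copy
    elseif t == "function" or t == "userdata" or t == "cdata" then
        return tostring(obj)               -- force unsupported types into strings
    end
    return obj
end

local function json_encode(obj, force)
    if force then
        obj = prepare(obj, 1)
    end
    return cjson.encode(obj)               -- cjson.safe returns nil, err on failure
end
```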

When debugging, if you need to print a table, you should not trigger a pointless JSON encode when the log level is not high enough. I recommend using delay_encode for debug logging: the JSON encode runs only when the log line actually needs to be written to disk, and encodes that are not needed are skipped. This works very well in APISIX; we can finally switch between log levels without commenting code in and out. It behaves a bit like a C macro, a nice balance between performance and ease of use.
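
Below is a minimal sketch of the delayed-encode trick (not APISIX's actual core code). It relies on the fact that ngx.log only stringifies its arguments, via the __tostring metamethod for tables, when the message passes the configured log level:

```lua
local cjson = require("cjson.safe")

-- one reusable holder table whose __tostring does the real work
local delay_tab = setmetatable({ data = nil }, {
    __tostring = function(t)
        return cjson.encode(t.data) or tostring(t.data)
    end,
})

local function delay_encode(data)
    delay_tab.data = data      -- no serialization happens here
    return delay_tab
end

-- usage in a hot path:
--   ngx.log(ngx.DEBUG, "matched route: ", delay_encode(route))
-- the JSON encode only runs if the DEBUG level is actually enabled,
-- because ngx.log skips argument stringification for filtered levels
```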

Static code checking tools

At present, APISIX runs two code-checking tools in CI regression: luacheck and lj-releng. They statically check the code in the current directory, for example whether new global variables have been introduced or whether a line of code is too long.
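
For example, a minimal .luacheckrc might look like this; luacheck reads its options from this Lua file, and the values here are illustrative rather than APISIX's exact settings:

```lua
-- .luacheckrc: picked up automatically when running `luacheck .`
std = "ngx_lua"          -- predefine ngx, ndk, etc. so they are not flagged as globals
max_line_length = 100    -- complain about overly long lines
unused_args = false      -- don't warn about unused function arguments
```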

The lifecycle of rapidjson objects

A particularly interesting bug about the lifecycle of rapidjson objects turned up during debugging. The cause can be seen in the last line of the figure above: we only use the validator, calling a single validation method on it, and the validator comes from the create-validator call above. The question worth noting here is: why use an array to also cache another object, the sd (schema document)?

The validator is a cdata object that internally holds a pointer into the sd object, so the two must have the same lifetime and the sd must not be released early. When two objects need the same lifetime, putting them in the same table is the easiest way to guarantee it.
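
Here is a self-contained LuaJIT FFI sketch of the lifetime issue, not the actual rapidjson binding: buf owns the memory, view is a cdata pointer into it that does not keep buf alive on its own, and storing both in one table ties their lifetimes together:

```lua
local ffi = require("ffi")

local function new_view(s)
    local buf = ffi.new("char[?]", #s + 1)     -- owns the bytes
    ffi.copy(buf, s)                           -- copies the string plus a NUL
    local view = ffi.cast("const char *", buf) -- points into buf

    -- If we returned only `view`, nothing would reference `buf` any more,
    -- the GC could collect it, and `view` would dangle, which is exactly
    -- the validator/sd situation.  Returning both in one table keeps them
    -- alive for the same span of time.
    return { buf = buf, view = view }
end

local obj = new_view("hello")
print(ffi.string(obj.view))  -- safe: obj.buf is still referenced via the table
```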

Third-party libraries that use PCRE

If you pick the most efficient C library and that library depends on PCRE, you have to consider one problem: objects that live across requests are at serious risk. You must create a separate, independent memory pool for the library and must not use the current request's memory pool, because the request's pool will soon be released when the request ends.

How do we solve this? It would be best if OpenResty simply exposed an API for allocating a memory pool, but unfortunately it does not. Looking at the Nginx source code, the function for creating a memory pool is declared as ngx_create_pool(size_t size, ngx_log_t *log), so all we need is a handle to the global log.

We chose to take the log object from the global ngx_cycle. I defined a fake_ngx_cycle structure that mirrors the first three fields of Nginx's ngx_cycle structure, with everything after them cut off, and then copy the memory over. That gives us the position of the log object pointer.
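
Here is a hedged FFI sketch of that idea, a simplification rather than APISIX's exact code: the truncated struct layout, the visibility of the ngx_cycle and ngx_create_pool symbols to ffi.C, and the use of a pointer cast instead of a memory copy are all assumptions of this sketch.

```lua
local ffi = require("ffi")
local C = ffi.C

ffi.cdef[[
struct ngx_pool_s;
struct ngx_log_s;

/* only the leading fields of ngx_cycle_t; enough to reach the log pointer */
typedef struct {
    void              ****conf_ctx;
    struct ngx_pool_s    *pool;
    struct ngx_log_s     *log;
} fake_ngx_cycle_t;

extern void *ngx_cycle;    /* the global cycle pointer */

struct ngx_pool_s *ngx_create_pool(size_t size, struct ngx_log_s *log);
void ngx_destroy_pool(struct ngx_pool_s *pool);
]]

-- reinterpret the global cycle with the truncated layout to grab the log
local cycle = ffi.cast("fake_ngx_cycle_t *", C.ngx_cycle)

-- an independent pool whose lifetime we control, not tied to any request
local pool = C.ngx_create_pool(4096, cycle.log)

-- ... hand `pool` to the C library for its PCRE-related allocations ...

-- free it explicitly once the long-lived object is no longer needed:
-- C.ngx_destroy_pool(pool)
```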

Enabling the Prometheus plugin

While researching Kong's Prometheus plugin I took a quick look at its code and found that its implementation logic was flawed and could seriously hurt performance, so we did not reuse it directly. Enabling the plugin in APISIX costs only about 5% in performance, nearly 10 times better than Kong's. I have also talked with the technical lead at Kong about this, and in the future I will contribute some of APISIX's practices back to Kong so that we can learn from each other and grow together.

That’s all I have to share today, thank you!

Slides and video of the talk:

Apache Apisix Microservice Gateway Extreme Performance Architecture Analysis
