This article is a transcript of an online talk by Wang Dong on the architecture practices behind JD.com's 618 promotion gateway, which carries billions of calls.


During the 618 promotion, our gateway carries billions of calls. The gateway system must keep the whole platform stable and highly available while delivering the performance and reliability the business needs. This is a very complex problem, and the focus of this article is how, starting from that complexity, we improve the gateway's performance and stability and combine a range of techniques to keep the overall gateway highly available.


1. Gateway technology coverage

1.1 Gateway System

There are two main types of gateway systems:

  • The first is called the client gateway, which receives requests from clients; in other words, it is the server side of the APP.

  • The second is called an open gateway, where a company (such as JD.com) provides interfaces to third-party partners.

The technology used by the two different gateways is very similar.

A gateway carrying heavy traffic faces the following difficulties:

First, the gateway system needs to carry billions of calls, so keeping the interfaces running smoothly and handling performance degradation in back-end services are both critical. For example, we use Redis clusters deployed in two data centers, one cluster in each, to ensure high availability. To absorb instantaneous traffic we rely on caching, or on the more advanced Nginx + Lua + Redis stack, so that this high-traffic path does not depend on the JVM. In addition, we sort through the interfaces and degrade the weakly dependent ones through a degradation strategy, so that the core applications stay available.

Second, a gateway system is essentially a pass-through of HTTP requests to back-end services. Our gateway fronts more than a thousand back-end service interfaces. How do we keep these services from affecting each other? How does the architecture prevent the butterfly effect and avalanches? In other words, the failure of one interface must not affect the health of the others. It sounds simple, but it is not.

With more than a thousand interfaces, each interface performs differently and depends on different external resources such as databases and caches, and problems of every kind appear almost daily. How do we use isolation, control, and similar techniques to ensure that when one of these interfaces has a problem, it does not affect the whole?

Third, we expose a thousand service interfaces to the outside world, and behind them are dozens or even hundreds of teams developing continuously, with new requirements possibly going live every day. Faced with such complexity, we cannot change the gateway every time something changes on a back-end server; otherwise the gateway would become very fragile and unstable.

We adopted a dynamic access technique that lets back-end services plug into the gateway seamlessly through an access protocol. Then, through dynamic proxying, a back-end interface can be published externally directly from the gateway's management platform, no matter how the back end is modified or released. This removes the gateway's dependence on the release cycle of back-end interface services.
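The following is a minimal, illustrative Java sketch of this idea: interface metadata published from a management platform is kept in a registry, and requests are dispatched generically, so a new back-end interface needs no gateway code change or redeploy. The class names, fields, and the GenericInvoker abstraction are assumptions for illustration only; the real gateway would invoke JSF or HTTP behind that abstraction.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of dynamic access: back-end interfaces are described by metadata
// published from a management platform, and the gateway dispatches to them
// generically. Names are illustrative, not JD.com's real protocol.
public class DynamicApiRegistry {

    // Metadata for one published back-end interface.
    public static class ApiDefinition {
        final String externalPath;   // path exposed by the gateway
        final String targetService;  // back-end service to call
        final String targetMethod;   // method on that service

        public ApiDefinition(String externalPath, String targetService, String targetMethod) {
            this.externalPath = externalPath;
            this.targetService = targetService;
            this.targetMethod = targetMethod;
        }
    }

    // Abstraction over the internal RPC/HTTP call made by the real gateway.
    public interface GenericInvoker {
        String invoke(String service, String method, Map<String, String> params);
    }

    private final Map<String, ApiDefinition> apis = new ConcurrentHashMap<>();
    private final GenericInvoker invoker;

    public DynamicApiRegistry(GenericInvoker invoker) {
        this.invoker = invoker;
    }

    // Called when the management platform publishes or updates an interface:
    // no gateway code change or redeploy is needed.
    public void publish(ApiDefinition def) {
        apis.put(def.externalPath, def);
    }

    // Called per request: look up the metadata and dispatch generically.
    public String dispatch(String path, Map<String, String> params) {
        ApiDefinition def = apis.get(path);
        if (def == null) {
            return "{\"code\":404,\"msg\":\"unknown api\"}";
        }
        return invoker.invoke(def.targetService, def.targetMethod, params);
    }
}
```

With this shape, publishing a new interface is just a publish() call driven from the management console, which matches the one-click release described later for the gateway registry.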

1.2 Gateway technology coverage

A gateway covers four technical directions:

First, unified access. Traffic from the front end (apps and other sources) enters through a unified network layer. The problems to solve at this layer are high-performance pass-through, high-concurrency access, high availability, and how to load-balance and forward front-end traffic to the back end once it arrives.

Second, flow control, which mainly means traffic management. Facing massive traffic, how do we use anti-abuse techniques to keep the gateway from being washed away by a flood of requests, and how do we protect it comprehensively with rate limiting, degradation, and circuit breaking?

Third, protocol adaptation. As mentioned above, the gateway passes traffic through to thousands of back-end services, and not every one of these services needs to be developed or configured specifically for the gateway. Through protocol adaptation, back-end services are exposed from the gateway over HTTP, and also over TCP, using the protocol we specify. JD.com's internal protocols are fairly unified: HTTP RESTful and the JSF interface. JSF is a framework developed inside JD.com; it is an RPC framework similar to Dubbo, based on service registration and discovery.

Fourth, security protection. This part is very important for the gateway, because the gateway is one of the company's exits to the outside world. At this layer we need anti-abuse measures such as cleaning malicious traffic and maintaining blacklists; when malicious traffic appears, we reject it at the gateway through IP restrictions and the like, to keep it from flooding through.


2. Self-developed gateway architecture

2.1 Self-developed gateway architecture

Our self-developed gateway architecture is mainly divided into three layers.

Layer 1: the access layer. It is mainly responsible for handling long and short connections, rate limiting, black and white lists, routing, load balancing, and disaster-recovery switchover. This layer is implemented with Nginx + Lua.

Layer 2: the distribution layer (or the business layer of the gateway). It mostly uses NIO + Servlet 3 asynchronous technology. This layer is divided into several parts.

  • The top part is data validation, where signature checks, time checks, version, method, and so on are verified.

  • The next part is the generic invocation layer, which mainly converts the RESTful requests exposed by the gateway into JD.com's internal protocol through a dynamic adaptation of the call. In this part we use caching, and thread isolation, circuit breaking, and similar techniques are implemented at this layer as well. Because there is a great deal of data and protocol conversion, this layer relies heavily on caching: none of the gateway layer's data reads penetrate the DB directly; instead, heterogeneous data is served straight from the cache (a minimal sketch follows this layer's description).

There are two more parts at the generic invocation layer: active notification and sandbox testing. Active notification is easy to understand: we notify the client promptly through a TCP downlink channel, for example to push coupons or JD account reminders. Sandbox testing means we run external tests on certain interfaces before they go live.

As shown in the figure, the far-right part covers service degradation, logging, and monitoring alarms, which form the support system of the entire gateway. Service degradation means that when a service has a problem, we degrade it immediately; logs are what we use to troubleshoot problems; monitoring and alarms will be emphasized in later sections, because a gateway's availability is largely improved through the monitoring system. Without monitoring and alarms we have no eyes and know nothing.
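Below is a minimal cache-aside sketch of the heterogeneous-data caching mentioned above, using the Jedis Redis client. The pool address, the 5-minute TTL, and the loadFromBackingStore fallback are illustrative assumptions; in the real design the data is pre-built into the cache, so the fallback path should rarely run on the request path.

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

// Sketch: gateway reads hit Redis first and only fall back to a loader when
// the key is missing, so reads never penetrate the DB directly.
public class HeterogeneousDataCache {
    private final JedisPool pool = new JedisPool("127.0.0.1", 6379); // assumed address

    public String get(String key) {
        try (Jedis jedis = pool.getResource()) {
            String cached = jedis.get(key);
            if (cached != null) {
                return cached;
            }
            // Hypothetical loader; the real system rebuilds the value
            // asynchronously rather than reading the DB on the request path.
            String value = loadFromBackingStore(key);
            jedis.setex(key, 300, value); // cache for 5 minutes
            return value;
        }
    }

    private String loadFromBackingStore(String key) {
        return "value-for-" + key; // placeholder
    }
}
```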

Layer 3: the various business APIs (business interfaces) on the back end. These interfaces are exposed externally through the gateway.

Overall, the gateway is divided into the three layers above: the access layer on top, the gateway's distribution layer in the middle, which performs validation and business logic, and finally the back-end services to which the gateway passes the requests.

Besides these three layers, look at the systems on both sides of the figure; they are core, important supports for the entire gateway.

  • Gateway registry. Back-end interfaces of all kinds can be published through the gateway registry. The system has a management console: as long as a back-end API service is prepared according to the fixed protocol and the format checks out, it can be uploaded to the management console and published online with one click. Of course, an interface is tested before it is released.

  • OA authentication center. This part is mainly used for authentication; security checks, such as the signature checks at the data-validation layer, are performed uniformly here.

2.2 Technology stack

The technology stack involved in our gateway system:

  • Nginx + Lua at the access layer;
  • NIO + Servlet 3 asynchronous processing;
  • isolation techniques;
  • degradation and rate limiting;
  • circuit breaking;
  • caching, and deciding where to add a cache and where to read the database directly;
  • heterogeneous data;
  • fast failure;
  • and finally monitoring and statistics, a very important part of the overall high-availability gateway system.

What follows is an in-depth discussion and analysis of the scenarios where these technologies apply, including which problems we use them to solve.


3. Basic ideas and practice improvements

Practice 1: Unified access at the Nginx layer

First, look at the online deployment architecture of the gateway. Traffic first passes through a software load balancer (LVS) into JD.com's gateway. The first layer is the core Nginx; behind it is the business Nginx, and the business Nginx then passes our requests on to the back-end servers.

The core Nginx mainly distributes front-end traffic; rate limiting and anti-abuse, for example, are done at this layer. Below it is the business Nginx, where the main Nginx + Lua logic lives. This layer also relieves the core Nginx of load and CPU pressure, and some Lua application logic, such as rate limiting, anti-abuse, authentication, and degradation, is implemented here.

Why Nginx + Lua? Compared with Tomcat and the like, Nginx is a server that can carry exceptionally large amounts of concurrent traffic. We ran into problems here before: when concurrency was particularly high and a single machine behind the gateway failed, even if that interface had been degraded, the real traffic still reached the Tomcat JVM layer, and at that volume the JVM could not digest it. The consequence: once a Tomcat instance got into trouble it was very hard to restart, because the traffic never stopped; the restarted instance released its resources, but the traffic came back like a virus, and the batch you had just restarted was immediately infected again. Nginx is natively asynchronous (NIO) and supports highly concurrent workloads very well, so we put core capabilities such as degradation and flow control at this layer and let it block the traffic at the front for us.

Practice 2: Introduce NIO and asynchronize with Servlet 3

The second practice was to introduce NIO at the Tomcat layer, using a JDK 7 + Tomcat 7 + Servlet 3 configuration to turn synchronous requests into asynchronous ones, and then use NIO's multiplexing to handle a larger number of concurrent requests at the same time.

With Servlet 3 asynchronization, throughput improves but the response time of an individual request grows slightly longer. That loss is tolerable, because it buys throughput and flexibility for the overall application, so it is still worth doing.

The specific strategy: the business method starts an asynchronous context (AsyncContext) and releases Tomcat's current processing thread; that Tomcat thread is then free to handle the next request, which raises throughput; the business logic runs inside the AsyncContext and calls its complete() method to write the result back to the response stream. This lets Tomcat handle far more requests with very few threads at this layer instead of being overwhelmed by heavy traffic.
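A minimal sketch of this pattern with the Servlet 3 API is shown below. The URL pattern, pool size, timeout, and the callBackendService placeholder are illustrative assumptions; the point is that the container thread returns immediately while a separate business pool completes the response.

```java
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.servlet.AsyncContext;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Sketch: the Tomcat thread returns right after startAsync(), and a separate
// business pool writes the response and calls complete().
@WebServlet(urlPatterns = "/gateway/*", asyncSupported = true)
public class AsyncGatewayServlet extends HttpServlet {
    // Business pool sized independently of Tomcat's connector threads.
    private final ExecutorService businessPool = Executors.newFixedThreadPool(200);

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) {
        AsyncContext ctx = req.startAsync();  // take over the request
        ctx.setTimeout(3000);                 // never wait forever
        businessPool.submit(() -> {
            try {
                String result = callBackendService(ctx); // hypothetical back-end call
                ctx.getResponse().getWriter().write(result);
            } catch (IOException e) {
                ((HttpServletResponse) ctx.getResponse())
                        .setStatus(HttpServletResponse.SC_INTERNAL_SERVER_ERROR);
            } finally {
                ctx.complete();               // flush and release the response
            }
        });
        // The Tomcat thread is already free to parse the next request here.
    }

    private String callBackendService(AsyncContext ctx) {
        return "{\"code\":0}"; // placeholder
    }
}
```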

Practice 3: The art of isolation

In this section I will pick the two most important isolation techniques to share.

Separating request parsing from business processing

The first is to separate the request-parsing threads from the subsequent business-processing threads via NIO.

Requests are parsed by Tomcat threads, and in NIO mode a very small number of threads can handle a very large number of connections. Business logic processing and response generation are handled by a separate Tomcat thread pool, isolated from the request-parsing threads. The business thread pool can be isolated further, with different pools for different businesses.

Business thread pool isolation

The second is business thread pool isolation, which uses thread isolation to separate different interfaces, or different types of interfaces, from one another. For example, order-related interfaces get 20 dedicated threads and commodity-related interfaces get 10, so that the interfaces do not affect each other: if the order side has a problem, it exhausts at most its own pool and does not affect the threads serving other interfaces.

Concretely, thread isolation means assigning a fixed group of threads to each business. Those threads serve only their fixed interfaces, so when one of those interfaces has a problem it burns only its own threads and does not occupy the threads of other interfaces; that is the isolation effect, and a single API's problem does not spread to the others.
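A minimal sketch of this kind of per-business pool isolation is shown below, with the 20/10 sizes taken from the example above; the group names, queue size, and rejection policy are illustrative assumptions.

```java
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch: each interface group gets its own bounded pool, so an order outage
// cannot exhaust the threads serving commodity interfaces.
public class IsolatedExecutors {
    private final Map<String, ExecutorService> pools = new ConcurrentHashMap<>();

    public IsolatedExecutors() {
        pools.put("order", newBoundedPool(20));
        pools.put("commodity", newBoundedPool(10));
    }

    private ExecutorService newBoundedPool(int threads) {
        // Bounded queue + AbortPolicy: a saturated group fails fast instead of
        // piling up work and dragging the whole gateway down.
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(100), new ThreadPoolExecutor.AbortPolicy());
    }

    public <T> Future<T> submit(String businessGroup, Callable<T> task) {
        ExecutorService pool = pools.getOrDefault(businessGroup, pools.get("commodity"));
        return pool.submit(task); // throws RejectedExecutionException when saturated
    }
}
```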

Practice 4: Degradation

Degradation basically means that when an interface has a problem, we can degrade it directly and have it return immediately without calling other applications. Similarly, if a weakly dependent piece of business logic has a problem, we degrade it directly so that it does not affect the golden (core) logic.

How do we degrade?

First of all, degradation switches should be centrally managed, for example pushed to each application server through ZooKeeper. That way, the moment a problem appears, we can find the corresponding switch and degrade immediately.

A unified, configuration-driven degradation facility must itself be highly available and support multi-level caching. In our ZooKeeper-based implementation, for example, ZooKeeper is backed by database storage, there is a local cache on top of it, and if ZooKeeper cannot be read, data can be loaded from a snapshot, so the switches respond the instant they are triggered. The switch mechanism itself must never become a problem for other systems; it is a very thin, very lightweight layer.
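A minimal sketch of such a switch is shown below: reads on the hot path come only from a local cache, the value is refreshed from ZooKeeper via a watch, and a ZooKeeper outage simply leaves the last known value in place. The znode path, connect string, and on/off encoding are illustrative assumptions, and a real implementation would also persist a snapshot to disk.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.zookeeper.ZooKeeper;

// Sketch of a centrally managed degradation switch with a local cache.
public class DegradeSwitch {
    private final Map<String, Boolean> localCache = new ConcurrentHashMap<>();
    private final ZooKeeper zk;

    public DegradeSwitch(String connectString) throws Exception {
        this.zk = new ZooKeeper(connectString, 5000, event -> { });
    }

    // Refresh one switch from ZooKeeper and re-register the watch.
    public void refresh(String apiName) {
        String path = "/gateway/degrade/" + apiName; // assumed layout
        try {
            byte[] data = zk.getData(path, event -> refresh(apiName), null);
            localCache.put(apiName, "on".equals(new String(data, StandardCharsets.UTF_8)));
        } catch (Exception e) {
            // ZooKeeper unavailable: keep serving the last cached value.
        }
    }

    // Hot path: never blocks on ZooKeeper, only reads the local cache.
    public boolean isDegraded(String apiName) {
        return localCache.getOrDefault(apiName, false);
    }
}
```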

Fine-grained flow control


Beyond switches, flow control, and degradation, there are also multi-dimensional flow control and degradation strategies, for example controlling by a single API, or by API plus region, carrier, and other dimensions. When problems occur we degrade these various combinations, and we can also fine-tune traffic management at different granularities, such as per second or per minute.
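A minimal sketch of a dimension-keyed, second-level limiter: the key combines API, region, and carrier, and one counter per key per second decides whether a request passes. The dimensions, the key format, and the lack of counter eviction are illustrative simplifications.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: multi-dimensional, per-second flow control keyed on API + region + carrier.
public class DimensionalRateLimiter {
    private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

    public boolean allow(String api, String region, String carrier, long limitPerSecond) {
        long currentSecond = System.currentTimeMillis() / 1000;
        String key = api + "|" + region + "|" + carrier + "|" + currentSecond;
        AtomicLong counter = counters.computeIfAbsent(key, k -> new AtomicLong());
        // A real implementation would also evict counters for past seconds.
        return counter.incrementAndGet() <= limitPerSecond;
    }
}
```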

Graceful degradation

When it comes to degradation, what we said above is mostly technical, but at the business level we also need graceful degradation. We cannot simply return a 502 to the front end once the degradation logic kicks in; that is definitely not friendly. We coordinate with the front end: after degrading, we return an appropriate error code, or return a hint and follow-up instructions to the user, so that the user experience stays acceptable.

Practice 5: Rate limiting

First, malicious requests and malicious attacks: malicious request traffic is only allowed to hit the cache, and malicious IP addresses can be rejected directly with deny rules at the Nginx layer.

Second, keep the load from exceeding the system's capacity: peaks can be forecast, but there are always surprises, and without rate limiting, once traffic exceeds the system's capacity peak the whole system collapses.
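A minimal sketch of that capacity cap: a semaphore sized to the measured capacity bounds in-flight requests and rejects the overflow instead of letting it push the system past its peak. The 500-permit value and the response format are illustrative assumptions.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Sketch: bound in-flight work at the measured capacity and reject the rest.
public class CapacityGuard {
    private final Semaphore permits = new Semaphore(500); // assumed measured capacity

    public String handle(Supplier<String> realCall) {
        if (!permits.tryAcquire()) {
            return "{\"code\":429,\"msg\":\"over capacity, please retry\"}";
        }
        try {
            return realCall.get();
        } finally {
            permits.release();
        }
    }
}
```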

Practice 6: Circuit breaking

When a back-end service has a problem and a certain threshold is reached, the system can automatically cut it off and degrade it; that is the general idea of circuit breaking. We make the configuration more flexible: for example, when an interface times out or returns an error three times in a row, it is broken automatically; or if a method call takes more than 50 ms three times in a row, the method is broken automatically. After the break it is effectively degraded: further calls return failure immediately, that is, they are simply rejected.

After the break there can also be a recovery setting: for example, after 5 seconds or a minute the breaker enters a half-open state, wakes up, and probes whether the service is healthy again; if there is no problem, it reconnects the API that was previously cut off and service resumes normally. There are open-source implementations you can use for circuit breaking, and following the ideas here you can also implement one yourself; it is not particularly complicated.
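A minimal home-grown sketch along those lines is shown below, using the thresholds quoted above: three consecutive failures, timeouts, or calls slower than 50 ms open the breaker, and after a cooldown the next call acts as the half-open probe. The exact thresholds, cooldown, and response bodies are illustrative, and a production version would track state per API and handle concurrent probes more carefully.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Sketch of a simple circuit breaker with open, half-open, and closed behavior.
public class SimpleCircuitBreaker {
    private static final int FAILURE_THRESHOLD = 3;
    private static final long SLOW_CALL_MS = 50;
    private static final long COOLDOWN_MS = 5_000;

    private final AtomicInteger consecutiveFailures = new AtomicInteger();
    private final AtomicLong openedAt = new AtomicLong(0);

    public String call(Supplier<String> backend) {
        long opened = openedAt.get();
        if (opened > 0 && System.currentTimeMillis() - opened < COOLDOWN_MS) {
            return "{\"code\":503,\"msg\":\"circuit open\"}"; // reject immediately
        }
        long start = System.currentTimeMillis();
        try {
            String result = backend.get();
            if (System.currentTimeMillis() - start > SLOW_CALL_MS) {
                recordFailure();            // too slow counts as a failure
            } else {
                consecutiveFailures.set(0); // healthy call (or probe) closes the breaker
                openedAt.set(0);
            }
            return result;
        } catch (RuntimeException e) {
            recordFailure();
            return "{\"code\":503,\"msg\":\"backend error\"}";
        }
    }

    private void recordFailure() {
        if (consecutiveFailures.incrementAndGet() >= FAILURE_THRESHOLD) {
            openedAt.set(System.currentTimeMillis()); // open (or re-open) the breaker
        }
    }
}
```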

Practice 7: Fast failure – timeouts across the call chain

Failing fast is a very important practice, not only for gateway systems but for others too, especially high-volume systems: pay attention to timeout settings across the whole call chain. This is something we review every year while preparing for Double 11 and 618, and something we watch closely during development and before every new module goes live. We go through all of the system's external dependencies, such as the gateway's dependence on our own caches and databases, and, more importantly, on the thousands of different services behind it.

Anything that goes over the network must have a timeout set. In a high-volume system such as the gateway, if no timeout is set, the default may be several minutes; with such a long wait, once one back end has a problem, the entire gateway system can be swept away in an instant and no interface can serve externally, and because the volume is so large, you may be overwhelmed before you even get the chance to degrade.
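A minimal sketch of setting explicit timeouts on an outbound HTTP hop with HttpURLConnection; the 200 ms connect and 500 ms read values are illustrative, not the gateway's real settings.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch: every outbound call carries explicit connect and read timeouts so a
// slow back end fails fast instead of pinning gateway threads.
public class TimedBackendCall {
    public String call(String urlString) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(urlString).openConnection();
        conn.setConnectTimeout(200); // fail fast if the back end cannot even accept
        conn.setReadTimeout(500);    // fail fast if the back end accepts but stalls
        try (InputStream in = conn.getInputStream();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toString("UTF-8");
        } finally {
            conn.disconnect();
        }
    }
}
```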

Practice 8: Monitoring and statistics – application layer

Monitoring and statistics is a core part of the gateway system. Only with monitoring and alarms can we understand, in real time, everything the gateway is doing and every API call.

Monitoring goals

First: guarantee a 7×24-hour watch over the system;

Second: the ability to monitor the system's health in real time, for example which APIs are taking too long to call, which APIs have been circuit-broken, and so on;

Third: statistics and metric analysis, for example whether each API's calls timed out over the past day, and whether access performance has degraded;

Fourth: real-time alarms. Monitoring is only half of it; being notified as soon as a problem is found, so that we can deal with it immediately, is another part of keeping the system healthy.

Monitoring scope

Dimensions of monitoring

  • Layer 1: hardware monitoring, for example the system's CPU, memory, and network interface cards.

  • Layer 2: custom monitoring, for example direct alarm reporting from the application.

  • Layer 3: performance monitoring, for example each interface's TP metrics: TP999, TP99, TP90, and TP50 serve as SLA reference standards, along with availability, which are critical for a gateway.

  • Layer 4: heartbeat monitoring. The gateway system has many machines online; what is the status of each machine, is it still alive, and so on.

  • Layer 5: service layer monitoring, for example JVM monitoring and monitoring the number of Nginx connections.

JD.com has a complete internal monitoring system called UMP, which helps us monitor at every level. It essentially gives us configuration files with which to monitor the system, and we use AOP proxies to monitor all the methods. Because we are a gateway that passes through a great many back-end interfaces, and those interfaces are generated dynamically, the gateway does not know in advance which interfaces exist; so when an interface is generated dynamically, AOP automatically injects a monitoring point into it, and every interface ends up with its own monitoring.
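UMP itself is JD-internal, so the following is only an illustrative sketch of the injection idea: a JDK dynamic proxy wraps every method of a dynamically exposed interface, records its latency, and a reporter can later compute TP50/TP90/TP99/TP999 from the recorded samples. The class names and the in-memory sample store are assumptions.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch: wrap a dynamically generated interface with a timing proxy so every
// call is measured without the back-end team writing any monitoring code.
public class MonitoringProxy implements InvocationHandler {
    private static final Map<String, Queue<Long>> LATENCIES = new ConcurrentHashMap<>();

    private final Object target;

    private MonitoringProxy(Object target) {
        this.target = target;
    }

    @SuppressWarnings("unchecked")
    public static <T> T wrap(T target, Class<T> iface) {
        return (T) Proxy.newProxyInstance(iface.getClassLoader(),
                new Class<?>[] { iface }, new MonitoringProxy(target));
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        long start = System.nanoTime();
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getTargetException(); // surface the real back-end exception
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            String key = method.getDeclaringClass().getSimpleName() + "." + method.getName();
            LATENCIES.computeIfAbsent(key, k -> new ConcurrentLinkedQueue<>()).add(elapsedMs);
        }
    }
}
```

A periodic reporter can drain these samples, sort them, and take the 50th/90th/99th/99.9th percentiles to produce the TP metrics and the daily and weekly reports mentioned below.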

Speaking of monitoring, it is worth mentioning that, as a pass-through gateway system, we have all kinds of interfaces and business logic behind us, and we need to monitor the performance of each interface and each piece of business logic and then notify its owner so it can be fixed. With monitoring in place, whenever a problem occurs, the corresponding owner, including ourselves, can be notified. So every day and every week we send out report emails so that the owners of every system know the state of their own services, for example whether there is a performance problem and whether it needs to be fixed.


This article is original content; reprinting without authorization is prohibited. Linkedkeeper.com (by Wang Dong)