Background

As for our technical architecture: we currently use Spring Cloud Alibaba, with Nacos as the registry and configuration center, OpenFeign for calls between services, and Gateway as the gateway.

The cause

Our project used to be one large monolith and was recently split into microservices. The modules that were split out are now called from the monolith through Feign. Then the business team complained in the group chat that one page was slow. This had happened before, and back then it turned out to be the company's network: the gateway showed response times within 100ms, but the trip from the server to the office was slow. So I opened the gateway log, ready to hand the blame to operations. This time was different, though. Looking carefully, the response time recorded at the gateway was 12 seconds! That's right, 12 seconds, 12000ms. That set off alarms.

Let's find out why. Opening the project and reading the source, the request loads a list for a back-office management page. There is indeed a lot of processing logic: each request returns 10 records, and each record requires calls to 3 microservices. Back in the monolith these were local method calls that returned quickly, and the whole interface responded within 300ms. A Feign call adds network time, but since these are intranet calls, even an HTTP request should be fast.

Finding the reason

I logged the elapsed time before and after the three Feign calls and compared it with the logs of the called microservices, each of which has a Filter that records request time. From the monolith, each Feign call took about 20-30ms, so about 90ms for each of the 10 records; with the other business processing added, the interface took more than 900ms. After several requests, I found that some Feign calls suddenly took 150ms+, which adds up quickly with 10 records, although fortunately such slow calls were rare. Then I checked on the microservice side and found that most requests completed within 1ms, and occasionally around 100ms. Splitting out a microservice should not make things this slow: with network overhead added, the fastest call took 20ms and the slowest more than 100ms. There had to be a problem.
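The kind of timing wrapper described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the class name CallTimer, the Timed holder, and the stand-in for a Feign call are all hypothetical.

```java
import java.util.function.Supplier;

/** Minimal timing helper. All names here are illustrative, not from the original project. */
final class CallTimer {

    /** Holds the value returned by a call together with its elapsed time. */
    static final class Timed<T> {
        final T value;
        final long elapsedMillis;

        Timed(T value, long elapsedMillis) {
            this.value = value;
            this.elapsedMillis = elapsedMillis;
        }
    }

    /** Runs the call, logs how long it took, and returns the result with the timing. */
    static <T> Timed<T> time(String label, Supplier<T> call) {
        long start = System.nanoTime(); // monotonic clock, suitable for measuring intervals
        T value = call.get();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(label + " took " + elapsedMillis + " ms");
        return new Timed<>(value, elapsedMillis);
    }

    public static void main(String[] args) {
        // Stand-in for a real Feign client call (hypothetical); sleeps to simulate latency.
        Timed<String> result = time("remote call", () -> {
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "record-1";
        });
        System.out.println("result: " + result.value);
    }
}
```

Comparing the caller-side figure from such a wrapper with the figure logged by the callee's Filter shows how much time is spent outside the business method itself.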

  • [The first culprit is the JVM, mainly because we didn't use it right]

Let's look at the microservice side first. The same method is executed every time, so why does it take 1ms on some calls and far longer on others? After connecting to the server, I ran jstat -gc <pid> 1000 to print GC statistics once per second, and the output was striking: after only 1 week online, FGC had reached 48, to say nothing of YGC. My first thought was a code problem: could there be a memory leak? But the method is extremely simple, so a leak seemed unlikely. Checking heap usage with jmap -heap <pid> revealed the truth: the heap was only 300M+. Our server has 4 cores and 8G of memory! It turned out we had never set the initial heap size in the startup parameters. By default, the JVM's initial heap size is 1/64 of physical memory and the maximum is 1/4, which means our JVM started with a heap of about 127M that could grow to at most 2G. Every time the heap filled up, it triggered an FGC and a heap expansion. After restarting the project in the test environment, I saw 3 FGCs right at startup. So I set the heap size in the startup parameters:

-Xms4096m -Xmx4096m -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=256m
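The default sizing rule mentioned above (initial heap 1/64 of physical memory, maximum 1/4) can be worked through with a small sketch. The class name HeapDefaults is hypothetical, and the 8G figure is taken from the server described in this article. Actual defaults can differ by JVM version, collector, and container limits, so treat this as arithmetic, not as a guarantee.

```java
/** Worked arithmetic for the default heap sizing rule. Names are illustrative. */
final class HeapDefaults {

    /** Default initial heap under the 1/64 rule. */
    static long defaultInitialHeapBytes(long physicalBytes) {
        return physicalBytes / 64;
    }

    /** Default maximum heap under the 1/4 rule. */
    static long defaultMaxHeapBytes(long physicalBytes) {
        return physicalBytes / 4;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long physical = 8L * 1024L * mb; // assumed 8G machine, as in the article

        System.out.println("default initial heap: " + defaultInitialHeapBytes(physical) / mb + " MB"); // 128 MB
        System.out.println("default max heap:     " + defaultMaxHeapBytes(physical) / mb + " MB");     // 2048 MB

        // What the running JVM actually reports; useful for confirming that -Xms/-Xmx took effect.
        Runtime rt = Runtime.getRuntime();
        System.out.println("this JVM: committed=" + rt.totalMemory() / mb
                + " MB, max=" + rt.maxMemory() / mb + " MB");
    }
}
```

Setting -Xms and -Xmx to the same value, as in the parameters above, means the heap never has to grow from the small default, so the FGC-and-expand cycle does not occur.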

I started the project again and called the interface: it held steady at about 900ms, each Feign call took 20-30ms, the 100+ms Feign calls basically disappeared, and the GC problem was gone.

  • Is the HTTP connection pool too small?

But even 20ms is not right. The microservice clearly finishes and returns within 1ms, and network overhead cannot account for that much, can it? Our Feign is configured to use HttpClient, which is based on a connection pool, so is the number of connections too small? HttpClient defaults to 200 connections, which is plenty, so that guess was wrong. No improvement: still 20-30ms.

  • Is there any extra configuration?

Had someone configured custom encoding and decoding for Feign? Normally you don't need it. I went through the configuration code to check.

Sure enough, there was a codec configuration. It did not look necessary, so I deleted it and released a new version to try!

  • [Perfect!]

Now the sky is clear: Feign calls from the monolith basically take 1-2ms, the microservices themselves also take 1-2ms, and the overall interface is down to under 100ms.
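The effect of per-call latency on this page can be checked with simple arithmetic. The sketch below assumes the 10 records and 3 calls per record described earlier, and also assumes the calls run one after another, which the article does not state explicitly. The class name LatencyBudget is hypothetical.

```java
/** Rough latency budget for sequential remote calls. Names are illustrative. */
final class LatencyBudget {

    /** Total time spent in remote calls, assuming they run sequentially. */
    static long remoteMillis(int records, int callsPerRecord, long perCallMillis) {
        return (long) records * callsPerRecord * perCallMillis;
    }

    public static void main(String[] args) {
        int records = 10;       // records per page, from the article
        int callsPerRecord = 3; // microservice calls per record, from the article

        System.out.println("at 20 ms/call: " + remoteMillis(records, callsPerRecord, 20) + " ms"); // 600 ms
        System.out.println("at 30 ms/call: " + remoteMillis(records, callsPerRecord, 30) + " ms"); // 900 ms
        System.out.println("at  1 ms/call: " + remoteMillis(records, callsPerRecord, 1) + " ms");  // 30 ms
        System.out.println("at  2 ms/call: " + remoteMillis(records, callsPerRecord, 2) + " ms");  // 60 ms
    }
}
```

With 30 calls per request, every millisecond saved per call is 30ms saved on the page, which is why the per-call overhead mattered so much here.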

Conclusion

  1. Set the JVM heap size explicitly. Otherwise the initial heap defaults to 1/64 of physical memory, which triggers GC easily, and the slow heap expansion hurts system performance. If the service is in the middle of a GC when a Feign call arrives, that call can stall for around 100ms.
  2. By default Feign uses the JDK's HTTP client, which creates a new connection for each request. Switch to a pool-based HTTP client such as HttpClient or OkHttp3.
  3. Do not configure a Feign codec unless the business requires it; it hurts performance.

One interface went from 12s to under 100ms, and Feign calls went from 20-30ms normally (occasionally 100ms) to 1-2ms, which is barely different from the local method execution time.

I am Lu Awkwardness. You are welcome to follow my official account, where I share technical articles from time to time. Let's grow together!