Using Eureka, Ribbon, Feign and Zuul in production

First, the Eureka registry

1. Timeliness of service registration

When a service starts, EurekaServiceRegistry registers it immediately by sending a request to the Eureka Server's ApplicationResource. Registration therefore takes effect within milliseconds of startup.

2. Timeliness of service discovery

When a service starts, it fetches the full registry from the Eureka Server with an HTTP request, so discovery at startup takes only milliseconds.

When a new instance registers later, the server updates the registry and invalidates its read/write cache. Eureka Server uses a multi-level cache, however. Only the read/write cache (readWriteCacheMap) is invalidated. The read-only cache (readOnlyCacheMap), which clients read from, is not updated.

Two scheduled tasks control when the change becomes visible:
- A task inside Eureka Server copies the read/write cache into the read-only cache every 30 seconds.
- Each Eureka client fetches the registry incrementally from the server every 30 seconds.

A new instance becomes visible to other services only after the server has synchronized the two caches and the client has run its next incremental fetch. This takes up to about one minute.

The client fetch interval defaults to 30 seconds and can be changed with eureka.client.registryFetchIntervalSeconds. EurekaClientConfigBean defines the defaults for all client parameters. It binds any property named with the eureka.client prefix followed by the field name.

The server's cache synchronization interval also defaults to 30 seconds. It is defined in EurekaServerConfigBean and can be changed with eureka.server.responseCacheUpdateIntervalMs.
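The two-layer cache explains why a new registration is not visible right away. The sketch below is a simplified model under stated assumptions. `TwoLevelRegistryCache` is a hypothetical class, not Eureka's ResponseCacheImpl, and the timer is replaced by an explicit `syncReadOnlyCache()` call. It shows that a fetch keeps returning stale data until the sync runs.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified, hypothetical model of Eureka Server's two cache layers.
// Writes invalidate only the read/write layer. The read-only layer that
// clients are served from changes only when the periodic sync runs
// (every responseCacheUpdateIntervalMs, 30 s by default).
class TwoLevelRegistryCache {
    // Authoritative registry: application name -> instance ids
    private final Map<String, List<String>> registry = new ConcurrentHashMap<>();
    // Read/write layer: rebuilt lazily from the registry after invalidation
    private final Map<String, List<String>> readWriteCache = new ConcurrentHashMap<>();
    // Read-only layer: what fetching clients actually see
    private final Map<String, List<String>> readOnlyCache = new ConcurrentHashMap<>();

    void register(String app, List<String> instances) {
        registry.put(app, List.copyOf(instances));
        readWriteCache.remove(app); // invalidate only the read/write layer
    }

    // Stands in for the scheduled sync task inside Eureka Server.
    void syncReadOnlyCache() {
        for (String app : registry.keySet()) {
            List<String> fresh = readWriteCache.computeIfAbsent(app, registry::get);
            readOnlyCache.put(app, fresh);
        }
    }

    // What an Eureka client receives when it fetches the registry.
    List<String> fetch(String app) {
        return readOnlyCache.getOrDefault(app, List.of());
    }
}
```

In the worst case, a client's fetch runs just before the next sync, so it waits up to 30 s for the sync and then up to 30 s for its own next fetch. That gives the one-minute figure above.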

3. Service heartbeat interval

After a service starts, a scheduled task sends a heartbeat to the Eureka Server every 30 seconds by default. The interval is controlled by eureka.instance.leaseRenewalIntervalInSeconds, which is defined in the EurekaInstanceConfigBean class.
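Below is a minimal sketch of such a client-side heartbeat loop built on a JDK ScheduledExecutorService. `HeartbeatScheduler` is a hypothetical class, not Eureka's implementation. The interval is a constructor argument so the sketch runs quickly. In Eureka it would come from the 30-second lease renewal setting.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical heartbeat loop. In Eureka the interval would be the lease
// renewal interval (30 s by default); here it is configurable.
class HeartbeatScheduler implements AutoCloseable {
    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    private final ScheduledFuture<?> task;

    HeartbeatScheduler(Runnable sendHeartbeat, long interval, TimeUnit unit) {
        Runnable safe = () -> {
            try {
                sendHeartbeat.run();
            } catch (RuntimeException e) {
                // A failed beat must not cancel all future beats, which is what
                // an uncaught exception would do to a scheduled task.
            }
        };
        // Fixed delay: the next beat is scheduled after the previous one finishes.
        this.task = executor.scheduleWithFixedDelay(safe, 0, interval, unit);
    }

    @Override
    public void close() {
        task.cancel(false);
        executor.shutdownNow();
    }
}
```

The try/catch matters because a ScheduledExecutorService silently stops rescheduling a task that throws. A single network error would otherwise end all heartbeats.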

4. Timeliness of automatic detection of service failures

Eureka Server runs a scheduled task every 60 seconds that checks whether any registered instance has failed, meaning it has stopped sending heartbeats. The interval is controlled by eureka.server.evictionIntervalTimerInMs. The check compares the time of an instance's last heartbeat with the current time. It also adds a compensation time that covers the task itself running late, for example because of a GC pause.

Because of a bug in Eureka, an instance is considered expired only about 180 seconds after its last heartbeat. That is twice the 90-second lease duration. When expired instances are found, at most 15% of the registered instances are evicted per run by default. If fewer than that have expired, all of them are evicted. After eviction, the server invalidates the readWriteCacheMap.

In the extreme case, the server can therefore take up to four minutes to decide that an instance has failed, and it takes at least two to three minutes. Add the two cache layers and the clients' incremental registry fetch, and other services may need up to five minutes to notice that an instance is down in the extreme case. It usually takes about three to four minutes. Plan for the extreme case.

This slow detection is also why service calls need timeouts and retries. Because failures are detected slowly, other services keep calling a dead instance for a while.
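The 180-second figure and the 15% cap can be checked with a small model. The sketch below is hypothetical (`LeaseModel` is not Eureka's Lease class) and uses an injected millisecond clock. It assumes that renew() stores "now + duration" and that the expiry check adds the duration again, which is what makes a 90 s lease expire after about 180 s. It also assumes the eviction cap is registry size minus (int)(registry size * 0.85), with victims chosen at random.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical model of lease expiry and the per-run eviction cap.
class LeaseModel {
    final long durationMs; // lease expiration duration, 90 s by default
    private long lastUpdateTimestamp;

    LeaseModel(long durationMs, long nowMs) {
        this.durationMs = durationMs;
        this.lastUpdateTimestamp = nowMs;
    }

    // Stores "now + duration" instead of "now" (the quirk described above).
    void renew(long nowMs) {
        lastUpdateTimestamp = nowMs + durationMs;
    }

    // Adds the duration a second time, so expiry happens ~2 * duration after
    // the last heartbeat. compensationMs covers a late-running eviction task.
    boolean isExpired(long nowMs, long compensationMs) {
        return nowMs > lastUpdateTimestamp + durationMs + compensationMs;
    }

    // At most (1 - renewalPercentThreshold) of the registry is evicted per run;
    // if fewer leases expired than that, all of them are evicted.
    static <T> List<T> selectForEviction(List<T> expired, int registrySize,
                                         double renewalPercentThreshold, Random random) {
        int registrySizeThreshold = (int) (registrySize * renewalPercentThreshold);
        int evictionLimit = registrySize - registrySizeThreshold;
        int toEvict = Math.min(expired.size(), evictionLimit);
        List<T> shuffled = new ArrayList<>(expired);
        Collections.shuffle(shuffled, random); // pick victims at random
        return new ArrayList<>(shuffled.subList(0, toEvict));
    }
}
```

Take a registry of 20 instances with the default 0.85 threshold. At most 3 instances are evicted per run. If 5 have expired, the remaining 2 wait for a later run, which adds to the detection delay.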

5. Timeliness of detecting a service going offline

When a service goes offline gracefully, it calls shutdown. The server then deletes the instance from the registry, adds the change to the recentlyChangedQueue, and invalidates the read/write cache. Eureka Server synchronizes the read/write and read-only caches every 30 seconds, and Eureka clients fetch incremental updates every 30 seconds. Other services therefore notice the shutdown within about one minute.

6. Stability of Eureka Server's self-preservation mode

Formula: number of Eureka clients * (60 s / heartbeat interval) * 0.85. This gives the expected number of heartbeats per minute.

A scheduled task runs once a minute and compares the heartbeats actually received in the last minute with this expected number. If fewer were received, the server enters self-preservation mode and stops evicting any instances.

Self-preservation mode is quite unstable and not really suitable for production. It can be turned off with eureka.server.enableSelfPreservation, which is on (true) by default.
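The formula and the once-a-minute comparison can be written out directly. `SelfPreservation` is a hypothetical helper. The 0.85 factor is the percentage from the text. The exact comparison assumes eviction continues only while the last minute's count stays strictly above the threshold.

```java
// Hypothetical helper implementing the self-preservation formula above.
final class SelfPreservation {
    private SelfPreservation() {}

    // clients * (60 s / heartbeat interval) * percentThreshold, truncated to int
    static int renewsPerMinThreshold(int clients, int renewalIntervalSeconds, double percentThreshold) {
        return (int) (clients * (60.0 / renewalIntervalSeconds) * percentThreshold);
    }

    // True when the server would stop evicting instances.
    static boolean inSelfPreservation(int renewsLastMinute, int threshold, boolean enabled) {
        if (!enabled) {
            return false; // self-preservation switched off
        }
        // Eviction continues only while heartbeats stay above the threshold.
        return threshold <= 0 || renewsLastMinute <= threshold;
    }
}
```

For example, take 10 clients sending a heartbeat every 30 s. The expected count is 20 per minute and the threshold is 17. If a network glitch drops the count to 16, the server stops evicting anything, including instances that really are dead.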

7. Load balancing of Eureka Server

When a service starts, it registers with the Eureka cluster and sends heartbeats to it. The Eureka Server addresses configured in application.yml are used in order. The service talks to the first address listed and keeps using it. If that server stops responding, the service retries and then switches to the next address, and it keeps using that address from then on.

For example, with 8761 listed before 8762, the service uses 8761. If 8761 goes down, it switches to 8762, and it stays on 8762 even after 8761 comes back, unless 8762 itself fails.

Conclusion: the first configured server is used until it fails, and the failover is sticky.
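The sticky failover just described can be modeled in a few lines. `StickyServerList` is a hypothetical simplification, not Eureka's HTTP client. A predicate stands in for whether a request to a server succeeds.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of sticky failover across configured Eureka Servers:
// keep using the current server until a call to it fails, then move to the
// next configured address and stay there.
class StickyServerList {
    private final List<String> servers; // in configuration order
    private int current = 0;

    StickyServerList(List<String> servers) {
        this.servers = List.copyOf(servers);
    }

    // isUp simulates whether a request to the given server succeeds.
    String pick(Predicate<String> isUp) {
        for (int attempt = 0; attempt < servers.size(); attempt++) {
            String candidate = servers.get(current);
            if (isUp.test(candidate)) {
                return candidate; // stay on this server for later calls
            }
            current = (current + 1) % servers.size(); // fail over to the next address
        }
        throw new IllegalStateException("no Eureka Server reachable");
    }
}
```

With the list 8761, 8762, the sketch reproduces the example: it returns 8761, switches to 8762 when 8761 is down, and keeps returning 8762 after 8761 recovers.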

8. Timeliness of Eureka Server cluster synchronization

When a service registers with the cluster or sends a heartbeat, the request goes to one Eureka Server. That server replicates the request to every other node in the cluster in a for loop.

Replication tasks are first put into a queue, which a background Runner thread processes every 10 ms. A separate thread batches the requests every 500 ms, with at most 250 tasks per batch. It sends each batch to the other nodes in a single request to reduce network overhead.

Cluster synchronization therefore typically completes within a few hundred milliseconds, and always in under a second.
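The batching step can be sketched as splitting the pending tasks into chunks of at most 250. `ReplicationBatcher` is hypothetical. It covers only the size cap and ignores the 500 ms window that a real dispatcher would also use to flush a partial batch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: split pending replication tasks into batches of at
// most 250, so each batch can go to a peer node in one request.
final class ReplicationBatcher {
    static final int MAX_BATCH_SIZE = 250;

    private ReplicationBatcher() {}

    static <T> List<List<T>> toBatches(List<T> pending, int maxBatchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < pending.size(); from += maxBatchSize) {
            int to = Math.min(from + maxBatchSize, pending.size());
            batches.add(new ArrayList<>(pending.subList(from, to)));
        }
        return batches;
    }
}
```

With 600 pending tasks, this produces batches of 250, 250 and 100. Each peer receives three requests instead of 600, which is the network saving the batching is meant to provide.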

Second, service invocation

1. Timeliness of service discovery and failure detection with Ribbon + Eureka

Ribbon refreshes its server list from the local Eureka client every 30 seconds. The Eureka client itself learns about a new instance in about one minute, so Ribbon sees it after about one to one and a half minutes. When an instance fails, the Eureka client may need about four minutes to get the updated data. With Ribbon's extra 30-second refresh, the total is about four and a half minutes.
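The following back-of-the-envelope calculation adds up the worst-case delays from this article, using the default intervals quoted earlier, in seconds. `PropagationDelay` is a hypothetical name, and the code is plain arithmetic.

```java
// Worst-case propagation delays (seconds) built from the default intervals
// quoted in this article. Names are illustrative.
final class PropagationDelay {
    static final int READ_ONLY_SYNC = 30;    // server read/write -> read-only cache sync
    static final int CLIENT_FETCH = 30;      // Eureka client incremental fetch
    static final int RIBBON_REFRESH = 30;    // Ribbon server-list refresh
    static final int LEASE_EXPIRY = 2 * 90;  // 90 s lease, effectively doubled
    static final int EVICTION_INTERVAL = 60; // eviction task interval

    private PropagationDelay() {}

    // New instance: cache sync, then client fetch, then optionally Ribbon refresh.
    static int newInstance(boolean viaRibbon) {
        return READ_ONLY_SYNC + CLIENT_FETCH + (viaRibbon ? RIBBON_REFRESH : 0);
    }

    // Crashed instance: expiry and eviction come first, then the same chain.
    static int crashedInstance(boolean viaRibbon) {
        return LEASE_EXPIRY + EVICTION_INTERVAL + newInstance(viaRibbon);
    }
}
```

The results are upper bounds: 60 s and 90 s for a new instance (Eureka client, then Ribbon), and 300 s and 330 s for a crash. The typical figures of about four and four and a half minutes are lower because the timers rarely all line up against you.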

2. Load balancing algorithm

Ribbon's default load-balancing strategy is round-robin. The default load balancer, ZoneAwareLoadBalancer, groups servers by zone (machine room) before choosing one. The round-robin index is advanced with a lock-free compare-and-set loop:

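// Ribbon's round-robin index selection;
// nextIndex is an AtomicInteger field of the enclosing class.
private int incrementAndGetModulo(int modulo) {
    for (;;) {
        int current = nextIndex.get();
        int next = (current + 1) % modulo;
        // CAS retries on contention; "current < modulo" skips an index
        // left over from a larger server list
        if (nextIndex.compareAndSet(current, next) && current < modulo)
            return current;
    }
}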
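To run the loop without Ribbon on the classpath, the sketch below wraps it in a hypothetical `RoundRobinChooser` class. The selection logic is the same CAS loop. Only the wrapper and the list handling are added.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper around the CAS-based round-robin loop shown above.
class RoundRobinChooser<T> {
    private final AtomicInteger nextIndex = new AtomicInteger();

    T choose(List<T> servers) {
        if (servers.isEmpty()) {
            return null;
        }
        return servers.get(incrementAndGetModulo(servers.size()));
    }

    private int incrementAndGetModulo(int modulo) {
        for (;;) {
            int current = nextIndex.get();
            int next = (current + 1) % modulo;
            // Lock-free update; a stale index (current >= modulo) is discarded
            // and the loop tries again with the wrapped-around value.
            if (nextIndex.compareAndSet(current, next) && current < modulo) {
                return current;
            }
        }
    }
}
```

With servers a, b and c, four calls return a, b, c, a. If the list shrinks while the counter points past its end, the guard makes the loop wrap around instead of throwing IndexOutOfBoundsException.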

3. Timeouts and retries for Feign + Ribbon calls

Configure timeouts and retries as follows:

ribbon:
  ConnectTimeout: 1000             # connection timeout (ms)
  ReadTimeout: 1000                # read timeout (ms) for a request to one server
  OkToRetryOnAllOperations: true   # retry every request, including non-idempotent ones such as POST
  MaxAutoRetries: 1                # retries on the same server, not counting the first attempt
  MaxAutoRetriesNextServer: 3      # how many other servers to try after the current one fails

However, consider this configuration with two machines, 8080 and 8088. If we stop 8088 and a request is routed to it, the attempts go: first 8088, second 8088, third 8088, fourth 8080.

In other words, in this test MaxAutoRetriesNextServer: 3 made the request hit the failed machine three times before reaching a healthy one. Set it to 1 so that only one retry is made.

Third, the gateway

1. Avoiding first-request timeouts with Ribbon eager loading

The first time Zuul routes to a service, it has to initialize Ribbon and pull the registry from the Eureka client. This is slow and may cause a timeout. Enabling eager loading makes Zuul initialize Ribbon at startup instead:

zuul:
  ribbon:
    eager-load:
      enabled: true

Even with eager loading enabled for Zuul, first calls can still time out. Each of the other services initializes its own Ribbon client lazily, on its first request.
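The difference between lazy and eager initialization can be sketched with a per-service client cache. `ClientRegistry` and its methods are hypothetical, not a Spring Cloud API. With lazy loading, the first request to a service pays the creation cost, which for Ribbon includes pulling the server list. Eager loading pays that cost at startup.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical per-service client cache illustrating lazy vs eager loading.
class ClientRegistry<C> {
    private final Map<String, C> clients = new ConcurrentHashMap<>();
    private final Function<String, C> factory; // expensive: builds client, loads server list

    ClientRegistry(Function<String, C> factory) {
        this.factory = factory;
    }

    // Called on every request; creates the client only on first use (lazy).
    C clientFor(String serviceId) {
        return clients.computeIfAbsent(serviceId, factory);
    }

    // Called once at startup when eager loading is enabled.
    void eagerLoad(List<String> serviceIds) {
        serviceIds.forEach(this::clientFor);
    }

    boolean isInitialized(String serviceId) {
        return clients.containsKey(serviceId);
    }
}
```

Only the services passed to eagerLoad are warmed up. Any other service is still created on its first request, which is why first calls elsewhere can still time out.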

2. Timeliness of service discovery and failure detection with Zuul + Ribbon + Eureka

Like Feign, Zuul discovers services through Ribbon + Eureka. The Eureka client notices a new instance in about one minute, and Ribbon, and therefore Zuul, in about one to one and a half minutes. Suppose the shopping cart service has two machines and one of them goes down. The Eureka client notices within about five minutes, Ribbon within about 5.5 minutes, and Zuul likewise.

Zuul load-balances with round-robin by default, so in the meantime it keeps sending some requests to the dead instance. Zuul integrates Hystrix by default, which can run fallback logic. However, no fallback is defined by default, so the exception is thrown, printed by Zuul's error filter and returned to the caller.

3. Timeouts

There are two kinds of timeout here: the Ribbon timeout and the Hystrix timeout. Zuul enables Hystrix by default, and the Hystrix command wraps the Ribbon call. The Hystrix timeout must therefore be longer than the total Ribbon timeout. Otherwise Hystrix gives up before Ribbon has finished its retries, and the retries are pointless.

The total Ribbon timeout is (ribbon.ConnectTimeout + ribbon.ReadTimeout) * (ribbon.MaxAutoRetries + 1) * (ribbon.MaxAutoRetriesNextServer + 1). If the Ribbon timeouts are not set, the default Hystrix timeout is 4000 ms.
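Below, the formula is turned into a small helper so the Hystrix timeout can be checked against it. `RibbonTimeoutBudget` is a hypothetical name. The defaults used in the example are an assumption: connect and read timeouts of 1000 ms, MaxAutoRetries 0 and MaxAutoRetriesNextServer 1. These values reproduce the 4000 ms quoted above.

```java
// Hypothetical helper for the Ribbon timeout formula above.
final class RibbonTimeoutBudget {
    private RibbonTimeoutBudget() {}

    // (ConnectTimeout + ReadTimeout) * (MaxAutoRetries + 1) * (MaxAutoRetriesNextServer + 1)
    static int ribbonTimeoutMs(int connectTimeoutMs, int readTimeoutMs,
                               int maxAutoRetries, int maxAutoRetriesNextServer) {
        return (connectTimeoutMs + readTimeoutMs)
                * (maxAutoRetries + 1)
                * (maxAutoRetriesNextServer + 1);
    }

    // Hystrix must not give up before Ribbon has exhausted its retries.
    static boolean hystrixTimeoutIsSafe(int hystrixTimeoutMs, int ribbonTimeoutMs) {
        return hystrixTimeoutMs > ribbonTimeoutMs;
    }
}
```

Apply it to the configuration that follows (ConnectTimeout 500, ReadTimeout 100, both retry counts 1). The budget is (500 + 100) * 2 * 2 = 2400 ms, so the Hystrix timeout should be set above 2400 ms.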

zuul:
  ribbon:
    eager-load:
      enabled: true
  retryable: true
ribbon:                            # timeouts (ms) and retries
  ReadTimeout: 100
  ConnectTimeout: 500
  MaxAutoRetries: 1
  MaxAutoRetriesNextServer: 1