Nacos source code analysis

Code directory structure

  • Address module: mainly queries the list of nodes and IP addresses in the Nacos cluster.
  • API module: an abstraction of the API interfaces that are exposed to clients.
  • Common module: a generic toolkit plus definitions of string constants.
  • Client module: depends mainly on the API module and the Common module, and implements the API interfaces for Nacos clients to use.
  • CMDB module: mainly keeps data in memory and provides an interface for querying data labels.
  • Config module: mainly manages service configuration. It exposes APIs for clients to pull configuration and delivers configuration updates. Clients pick up configuration changes through long polling. The data store is MySQL.
  • Naming module: the implementation of the service registry, providing service registration and service discovery.
  • Console module: implements the console, providing functions such as permission verification, service status, and health checks.
  • Core module: implements a Spring PropertySource post-processor that loads Nacos's default configuration.
  • Distribution module: packages nacos-server, using maven-assembly-plugin for custom packaging.

Framework

Client

Heartbeat keepalive

The client periodically sends a heartbeat to the Nacos registry (classes: BeatReactor, BeatTask). Heartbeats are sent every 5 s by default, and the client also takes the latest interval from the most recent heartbeat response.
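The rescheduling behaviour described above can be sketched with a plain ScheduledExecutorService. This is a minimal illustration, not the BeatReactor source: HeartbeatSketch, nextDelayMs, and the sendBeat callback are hypothetical names, and sendBeat is assumed to return the interval suggested by the server (zero or negative when there is none).

```java
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical sketch of a self-rescheduling heartbeat; not the Nacos implementation. */
final class HeartbeatSketch {
    static final long DEFAULT_PERIOD_MS = 5_000; // default interval stated in the text

    /** Uses the server-suggested interval when it is positive, otherwise keeps the current one. */
    static long nextDelayMs(long suggestedMs, long currentPeriodMs) {
        return suggestedMs > 0 ? suggestedMs : currentPeriodMs;
    }

    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "heartbeat-sketch");
                t.setDaemon(true); // never keeps the JVM alive on its own
                return t;
            });
    private final LongSupplier sendBeat; // assumed: sends one beat, returns the suggested interval
    private volatile boolean stopped;

    HeartbeatSketch(LongSupplier sendBeat) {
        this.sendBeat = sendBeat;
    }

    void start() {
        schedule(DEFAULT_PERIOD_MS);
    }

    private void schedule(long delayMs) {
        if (stopped) {
            return;
        }
        try {
            // One-shot task: each beat schedules the next one, so the delay can change.
            executor.schedule(() -> beat(delayMs), delayMs, TimeUnit.MILLISECONDS);
        } catch (RejectedExecutionException e) {
            // stop() was called concurrently; nothing left to do.
        }
    }

    private void beat(long currentPeriodMs) {
        if (stopped) {
            return;
        }
        long suggested = 0;
        try {
            suggested = sendBeat.getAsLong();
        } catch (RuntimeException e) {
            // A failed beat must not end the loop; retry after the current period.
        }
        schedule(nextDelayMs(suggested, currentPeriodMs));
    }

    /** Cancels the heartbeat, e.g. after the instance has gone offline. */
    void stop() {
        stopped = true;
        executor.shutdownNow();
    }
}
```

Scheduling one-shot tasks instead of a fixed-rate task is what lets the interval follow the server's response from one beat to the next.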

Heartbeat packets are as follows:

{"cluster":"DEFAULT","ip":"xxxx","metadata":{"preserved.register.source":"SPRING_CLOUD"},
 "period":5000,"port":8091,"scheduled":false,"serviceName":"DEFAULT_GROUP@@service-preposedata-testD","stopped":false,"weight":1.0}

The heartbeat is cancelled after the instance goes offline. NacosServiceRegistry, the class that implements going online and offline, is normally managed by the Spring container. However, Spring Boot takes the Nacos service offline only at the very end of shutdown, so for a short time service consumers keep sending requests to the provider, and those requests time out.

The solution is to listen for the Spring container's ContextClosedEvent and actively take the Nacos service offline at that point. The code is as follows:

@Override
public void onApplicationEvent(ContextClosedEvent event) {
    // Fetch the auto-registration bean and deregister from Nacos as soon as the context starts closing
    NacosAutoServiceRegistration nacosAutoServiceRegistration = ApplicationContextUtil
        .getApplicationContext()
        .getBean("nacosAutoServiceRegistration", NacosAutoServiceRegistration.class);
    nacosAutoServiceRegistration.destroy();
}

But that alone does not solve the problem, because the consumer caches the server list locally.

Service subscription

Fetch at initialization

When a service consumer starts, it obtains the set of instances for the target service through the getAllInstances/selectInstances interfaces of NacosNamingService.

The spring-cloud-starter-alibaba-nacos-discovery package contains the following code:

private List<NacosServer> getServers() {
    try {
        String group = discoveryProperties.getGroup();
        // The third argument (true) restricts the result to healthy instances
        List<Instance> instances = discoveryProperties.namingServiceInstance()
                .selectInstances(serviceId, group, true);
        return instancesToServerList(instances);
    }
    catch (Exception e) {
        throw new IllegalStateException(
                "Can not get service instances from nacos, serviceId=" + serviceId, e);
    }
}

The call chain is as follows:

This is how the server list is obtained at initialization. When NacosNamingService is instantiated, it also creates a HostReactor instance.

Update server information periodically

One job of the HostReactor class is to create a scheduled task called UpdateTask, which refreshes the service information. The update frequency is 10 seconds.
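A periodic refresh of this kind can be sketched as follows. This is a minimal illustration under assumptions, not the HostReactor source: ServiceCacheSketch and its fetcher function (which stands in for a query to the registry) are hypothetical, and instances are represented as plain "host:port" strings.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical sketch of a periodically refreshed local service cache. */
final class ServiceCacheSketch {
    static final long REFRESH_INTERVAL_MS = 10_000; // 10 s, the frequency stated in the text

    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();
    private final Function<String, List<String>> fetcher; // assumed: queries the registry
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "service-cache-sketch");
                t.setDaemon(true);
                return t;
            });

    ServiceCacheSketch(Function<String, List<String>> fetcher) {
        this.fetcher = fetcher;
    }

    /** Runs one refresh cycle. Returns true if the cached list changed. */
    boolean refreshOnce(String serviceName) {
        try {
            List<String> latest = List.copyOf(fetcher.apply(serviceName));
            List<String> previous = cache.put(serviceName, latest);
            return !latest.equals(previous);
        } catch (RuntimeException e) {
            return false; // registry unreachable: keep serving the last known list
        }
    }

    /** Fetches once immediately, then refreshes at the fixed interval. */
    ScheduledFuture<?> subscribe(String serviceName) {
        refreshOnce(serviceName);
        return executor.scheduleWithFixedDelay(
                () -> refreshOnce(serviceName),
                REFRESH_INTERVAL_MS, REFRESH_INTERVAL_MS, TimeUnit.MILLISECONDS);
    }

    /** Reads from the local cache only; never blocks on the registry. */
    List<String> instances(String serviceName) {
        return cache.getOrDefault(serviceName, List.of());
    }
}
```

Because callers read only from the local cache, an instance that has gone offline can stay visible for up to one refresh interval, which is the staleness discussed in this article.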

Update server information in real time

Another job of the HostReactor class is to create a task called PushReceiver, which receives changes to service information in real time over UDP.
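The idea of a UDP push listener can be sketched with java.net.DatagramSocket. This is only an illustration: PushReceiverSketch is a hypothetical class, and the "serviceName=host:port,host:port" payload format is invented here and is not the format Nacos sends.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of a UDP push listener; the payload format is invented. */
final class PushReceiverSketch implements Runnable {
    private final DatagramSocket socket;
    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();

    PushReceiverSketch(DatagramSocket socket) {
        this.socket = socket;
    }

    /**
     * Applies one pushed update to the local cache.
     * Assumed format: "serviceName=host1:port,host2:port". Returns false if malformed.
     */
    boolean handle(String payload) {
        int eq = payload.indexOf('=');
        if (eq <= 0) {
            return false;
        }
        String service = payload.substring(0, eq);
        String hosts = payload.substring(eq + 1).trim();
        List<String> instances = hosts.isEmpty() ? List.of() : List.of(hosts.split(","));
        cache.put(service, instances);
        return true;
    }

    /** Blocks on the socket and applies every datagram until the socket is closed. */
    @Override
    public void run() {
        byte[] buffer = new byte[64 * 1024];
        while (!socket.isClosed()) {
            try {
                DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
                socket.receive(packet);
                String payload = new String(
                        packet.getData(), 0, packet.getLength(), StandardCharsets.UTF_8);
                handle(payload);
            } catch (IOException e) {
                // Closing the socket ends the loop; other I/O errors are simply retried.
            }
        }
    }

    List<String> instances(String service) {
        return cache.getOrDefault(service, List.of());
    }
}
```

UDP delivery is not guaranteed, so a listener like this complements a periodic refresh rather than replacing it.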

Problem solving

The client therefore has both periodic updates and real-time notifications. In a high-concurrency system, however, the problem described above can still occur. It can be tackled from two directions.

  • Tomcat, passive wait

    There is a delay of a few seconds between the provider going offline and the consumer learning about it, and during those seconds Tomcat still receives requests from upstream systems. So when the local service stops, first listen for the Spring container's ContextClosedEvent and take the service offline in Nacos right away, rather than relying on Spring Cloud's own implementation. Second, watch the Tomcat thread pool and any self-built thread pools until every task in them has been consumed, and only then let the service shut down gracefully. Note: the Spring container closes the database connection pool at the end of shutdown. If scheduled jobs have not stopped before then, database connection exceptions will occur.

  • Ribbon, go offline

    If the project uses Ribbon + Feign, the server list obtained from Nacos is cached a second time by Ribbon. Ribbon holds its own server list, which the PollingServerListUpdater scheduled task (delay=1s) copies from Nacos into the load balancer. To sum up, the Nacos client receives server list changes in real time over UDP, but Ribbon refreshes its list only periodically through a scheduled task, so the delays add up. Here's the code. For Ribbon to receive Nacos updates in real time, its source code has to be modified to listen to the Nacos registry directly.

Class: PollingServerListUpdater

    @Override
    public synchronized void start(final UpdateAction updateAction) {
        if (isActive.compareAndSet(false, true)) {
            final Runnable wrapperRunnable = new Runnable() {
                @Override
                public void run() {
                    if (!isActive.get()) {
                        if (scheduledFuture != null) {
                            scheduledFuture.cancel(true);
                        }
                        return;
                    }
                    try {
                        updateAction.doUpdate();
                        lastUpdated = System.currentTimeMillis();
                    } catch (Exception e) {
                        logger.warn("Failed one update cycle", e);
                    }
                }
            };

            scheduledFuture = getRefreshExecutor().scheduleWithFixedDelay(
                    wrapperRunnable,
                    initialDelayMs,
                    refreshIntervalMs,
                    TimeUnit.MILLISECONDS
            );
        } else {
            logger.info("Already active, no-op");
        }
    }

Class: DynamicServerListLoadBalancer

    @VisibleForTesting
    public void updateListOfServers() {
        List<T> servers = new ArrayList<T>();
        if (serverListImpl != null) {
            servers = serverListImpl.getUpdatedListOfServers();
            LOGGER.debug("List of Servers for {} obtained from Discovery client: {}",
                    getIdentifier(), servers);

            if (filter != null) {
                servers = filter.getFilteredListOfServers(servers);
                LOGGER.debug("Filtered List of Servers for {} obtained from Discovery client: {}",
                        getIdentifier(), servers);
            }
        }
        updateAllServerList(servers);
    }
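Putting the two directions together, one possible shutdown order on the provider side is: go offline in the registry, wait long enough for consumer-side caches to refresh, then drain the thread pools. The sketch below is an illustration under assumptions, not code from Nacos, Spring, or Ribbon: GracefulShutdownSketch is a hypothetical class, the deregister callback stands in for the Nacos offline call, and propagationWaitMs is a value you would choose yourself based on the refresh intervals in play.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical shutdown sequence; the deregister hook stands in for the Nacos offline call. */
final class GracefulShutdownSketch {

    /**
     * 1. Deregister so that no new consumer picks this instance.
     * 2. Wait for consumer-side caches to catch up.
     * 3. Stop accepting new tasks and let queued work finish.
     * Returns true if every pool finished within drainTimeoutMs.
     */
    static boolean shutdown(Runnable deregister, long propagationWaitMs,
                            List<ExecutorService> pools, long drainTimeoutMs) {
        deregister.run();

        boolean interrupted = false;
        try {
            Thread.sleep(propagationWaitMs);
        } catch (InterruptedException e) {
            interrupted = true; // remember it, but still try to drain the pools
        }

        for (ExecutorService pool : pools) {
            pool.shutdown(); // already queued tasks still run
        }

        boolean drained = true;
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(drainTimeoutMs);
        for (ExecutorService pool : pools) {
            long remaining = Math.max(deadline - System.nanoTime(), 0);
            try {
                if (!pool.awaitTermination(remaining, TimeUnit.NANOSECONDS)) {
                    pool.shutdownNow(); // out of time: interrupt what is left
                    drained = false;
                }
            } catch (InterruptedException e) {
                pool.shutdownNow();
                drained = false;
                interrupted = true;
            }
        }

        if (interrupted) {
            Thread.currentThread().interrupt(); // restore the flag for the caller
        }
        return drained;
    }
}
```

Resources that the queued tasks depend on, such as the database connection pool mentioned above, should be closed only after this method returns.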

Service registry

Spring Cloud enters Nacos service registration from here:

Registration entry point provided by Nacos:

The heartbeat keepalive thread is started during registration.