Discovering the problem

Dubbo recently released version 2.7.1, so we tried upgrading from 2.7.0 to 2.7.1, expecting the change to be easy. After the version bump was completed and deployed to the test environment, the client's calls failed with an error.

The exception stack showed that the URL the client was calling was a local address (127.0.0.1). The registry configuration showed nothing abnormal. Tracing back to the provider, we found a clue in the server's startup log.

The URL published by the server was 127.0.0.1. The registration data in ZooKeeper confirmed that the server had published the wrong URL.

After rolling Dubbo back to 2.7.0 and restarting the service, the published URL went back to the intranet address (192.168.158.3).

Meanwhile, the client returned to normal.

The upgrade attempt could have ended there, but I was not satisfied and decided to trace the source code to find where the problem lay.

Tracking the problem

Start with version 2.7.0 and trace from the main entry point to find the place where Dubbo obtains the IP when publishing a service. Dubbo uses ServiceBean (which implements the ApplicationListener interface) to listen for ContextRefreshedEvent. Spring publishes this event once the container has started, and Dubbo publishes the service in response.

ServiceBean listens for ContextRefreshedEvent and calls the export method if the conditions are met.

Inside export, the parent class's method is called; if publication does not need to be delayed, doExport is called directly.

findConfigedHosts, the method that obtains the host, is called in doExportUrlsFor1Protocol.

First, Dubbo tries to obtain the bind IP address from the protocol configuration; if one is set, it is used directly. If the protocol has no IP address configured, the value is read from the provider configuration. If the provider has no bind IP address configured either, InetAddress.getLocalHost().getHostAddress() is called to obtain the local IP address.
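The fallback order described above can be sketched in a few lines. This is only an illustration of the order, not Dubbo's actual source: the class HostFallbackSketch, the method resolveHost, and its two string parameters are hypothetical names standing in for the protocol-level and provider-level host settings.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class HostFallbackSketch {

    /**
     * Returns the first usable value in this order:
     * protocol-level host, then provider-level host, then the JDK lookup.
     * Returns null if the JDK cannot resolve the local hostname.
     */
    static String resolveHost(String protocolHost, String providerHost) {
        if (protocolHost != null && !protocolHost.isEmpty()) {
            return protocolHost;
        }
        if (providerHost != null && !providerHost.isEmpty()) {
            return providerHost;
        }
        try {
            // Last resort: whatever the hostname of this machine resolves to.
            return InetAddress.getLocalHost().getHostAddress();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveHost("192.168.158.3", "10.0.0.5")); // protocol value wins
        System.out.println(resolveHost(null, "10.0.0.5"));            // provider value is used
        System.out.println(resolveHost(null, null));                  // machine-dependent result
    }
}
```

The third call is the interesting one for this investigation: on a machine whose hostname maps to the loopback address, it prints 127.0.0.1.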

InetAddress.getLocalHost().getHostAddress() returns 127.0.0.1, which isInvalidLocalHost considers invalid.

Next, Dubbo tries to connect to the registry through a Socket and, once the connection is established, reads the local IP address from it. If the local IP address still cannot be obtained, it falls back to NetUtils.getLocalHost(). The IP address obtained is 192.168.158.3. At this point, take a look at how NetUtils.getLocalHost() obtains the IP when the connection to registryURL fails.
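The socket technique works because the operating system picks the outgoing interface when the connection is made, so the socket's local address is the one the remote side can actually reach. A minimal sketch, assuming nothing about Dubbo's own code: SocketLocalAddressSketch and its methods are hypothetical, and a throwaway listener on the loopback interface stands in for the registry.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

class SocketLocalAddressSketch {

    /** Connects to host:port and returns the local address chosen for the connection, or null on failure. */
    static String localAddressTowards(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return socket.getLocalAddress().getHostAddress();
        } catch (IOException e) {
            return null; // the caller has to fall back to another strategy
        }
    }

    /** Self-contained demo: the "registry" is a temporary listener on 127.0.0.1. */
    static String demoAgainstLoopbackListener() {
        try (ServerSocket fakeRegistry = new ServerSocket(0, 1, InetAddress.getByName("127.0.0.1"))) {
            return localAddressTowards("127.0.0.1", fakeRegistry.getLocalPort(), 1000);
        } catch (IOException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(demoAgainstLoopbackListener());
    }
}
```

Against a real registry on another machine, the same call would return the address of the interface used to reach it, which is how the LAN address can be found even when the hostname maps to the loopback address.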

If the address is not a loopback address (127.xxx.xxx.xxx) and is neither 0.0.0.0 nor 127.0.0.1, it is regarded as a valid IP address. That is the whole process by which Dubbo 2.7.0 obtains the IP when publishing a service.

Continuing the trace

Let’s look at what differs in Dubbo 2.7.1. Again we trace the findConfigedHosts method in doExportUrlsFor1Protocol.

To reduce unnecessary overhead, the new version calls NetUtils.getLocalHost() directly to obtain the IP address. Let’s see whether the implementation of this method has changed. Following it down to getLocalAddress0, we can see that the new version factors out some duplicated code, but the main flow is largely unchanged: it still tries InetAddress.getLocalHost() first and checks whether that IP is valid, and otherwise iterates over the network interfaces to find one.
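Iterating over the network interfaces can be done with the JDK's NetworkInterface API. The sketch below is not Dubbo's getLocalAddress0; NicScanSketch, allInterfaceAddresses, and firstAcceptable are hypothetical names, and the acceptance rule is passed in as a predicate so different validity checks can be tried against the same list.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

class NicScanSketch {

    /** Collects every address bound to every network interface; empty list if none can be read. */
    static List<InetAddress> allInterfaceAddresses() {
        List<InetAddress> result = new ArrayList<>();
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            if (nics == null) {
                return result; // some JDKs return null when there are no interfaces
            }
            for (NetworkInterface nic : Collections.list(nics)) {
                result.addAll(Collections.list(nic.getInetAddresses()));
            }
        } catch (SocketException e) {
            // leave the list as it is; the caller sees fewer (or no) candidates
        }
        return result;
    }

    /** Returns the first candidate accepted by the given rule, if any. */
    static Optional<InetAddress> firstAcceptable(List<InetAddress> candidates, Predicate<InetAddress> rule) {
        return candidates.stream().filter(rule).findFirst();
    }

    public static void main(String[] args) {
        List<InetAddress> all = allInterfaceAddresses();
        for (InetAddress candidate : all) {
            System.out.println(candidate.getHostAddress()
                    + " loopback=" + candidate.isLoopbackAddress()
                    + " siteLocal=" + candidate.isSiteLocalAddress());
        }
        System.out.println("first non-loopback: " + firstAcceptable(all, a -> !a.isLoopbackAddress()));
    }
}
```

Running main on the affected server is a quick way to see which candidates exist and which flags each one carries.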

Looking at the toValidAddress method, we can see that an isValidPublicAddress check has been added.

Looking further into isValidPublicAddress, we can see what differs: compared with 2.7.0, the validation adds an address.isSiteLocalAddress() check.

address.isSiteLocalAddress() mainly determines whether the address is a private network address. (What was the reason for adding this check?)

The Inet4Address implementation checks whether the IP address falls under the prefixes 10/8, 172.16/12, or 192.168/16. The LAN IP address from earlier (192.168.158.3) is in this range and is therefore considered invalid. In the end, because no valid IP address is found either from InetAddress.getLocalHost() or among the network interface addresses, the 127.0.0.1 obtained first through InetAddress.getLocalHost() is chosen. As a result, the published address is a local address.
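The behaviour of isSiteLocalAddress() is easy to confirm with the JDK alone. In this small check, SiteLocalDemo and its helper isSiteLocal are hypothetical names; the sample addresses other than 192.168.158.3 and 127.0.0.1 are chosen only for illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class SiteLocalDemo {

    /** Parses an IP literal (no DNS lookup happens for literals) and asks the JDK whether it is site-local. */
    static boolean isSiteLocal(String ipLiteral) {
        try {
            return InetAddress.getByName(ipLiteral).isSiteLocalAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not an IP literal: " + ipLiteral, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isSiteLocal("10.1.2.3"));      // true  (10/8)
        System.out.println(isSiteLocal("172.16.0.1"));    // true  (172.16/12)
        System.out.println(isSiteLocal("172.32.0.1"));    // false (just outside 172.16/12)
        System.out.println(isSiteLocal("192.168.158.3")); // true  (192.168/16)
        System.out.println(isSiteLocal("127.0.0.1"));     // false (loopback, not site-local)
        System.out.println(isSiteLocal("8.8.8.8"));       // false (public address)
    }
}
```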

Comparison

Compare the NetUtils.getLocalHost() implementations of Dubbo 2.7.0 and 2.7.1 again, with the following demo:

The final conclusion: the NetUtils.getLocalHost() implementation added a private network address check, which causes the LAN IP to be considered invalid, so the new version of Dubbo publishes the service with a local address (127.0.0.1).
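The difference between the two validity rules can be reproduced without Dubbo on the classpath. The sketch below restates the rules as this article describes them; it is not copied from Dubbo. The class ValidityRuleComparison, both method names, the ip helper, and the simplified IPv4 pattern are all assumptions made for illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.regex.Pattern;

class ValidityRuleComparison {

    // Simplified dotted-quad pattern; an assumption of this sketch.
    private static final Pattern IPV4 = Pattern.compile("\\d{1,3}(\\.\\d{1,3}){3}");

    /** Rule as described for 2.7.0: not loopback, looks like IPv4, not 0.0.0.0 or 127.0.0.1. */
    static boolean validPerOldRule(InetAddress address) {
        if (address == null || address.isLoopbackAddress()) {
            return false;
        }
        String text = address.getHostAddress();
        return IPV4.matcher(text).matches() && !"0.0.0.0".equals(text) && !"127.0.0.1".equals(text);
    }

    /** Rule as described for 2.7.1: the old rule plus a rejection of private (site-local) addresses. */
    static boolean validPerNewRule(InetAddress address) {
        return address != null && !address.isSiteLocalAddress() && validPerOldRule(address);
    }

    /** Parses an IP literal without a checked exception. */
    static InetAddress ip(String literal) {
        try {
            return InetAddress.getByName(literal);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not an IP literal: " + literal, e);
        }
    }

    public static void main(String[] args) {
        for (String literal : new String[] {"192.168.158.3", "127.0.0.1", "8.8.8.8"}) {
            InetAddress address = ip(literal);
            System.out.println(literal + " old=" + validPerOldRule(address) + " new=" + validPerNewRule(address));
        }
    }
}
```

Under these restated rules, 192.168.158.3 is the only one of the three sample addresses whose result changes between the versions, which matches the behaviour observed after the upgrade.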

Attempting a fix

Since the new implementation ends up using the IP returned by InetAddress.getLocalHost() as the published address, why did InetAddress.getLocalHost() return 127.0.0.1 rather than 192.168.158.3?

InetAddress.getLocalHost() reads the hostname of the server and uses it to look up the corresponding IP in the hosts file.

The hostname is localhost.localdomain. Checking the /etc/sysconfig/network file confirms this, and that hostname corresponds to 127.0.0.1.
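To see what the JDK resolves on a particular machine, a one-method check is enough. LocalHostLookupDemo and describeLocalHost are hypothetical names; the output depends entirely on the machine's hostname and hosts file, so none is shown here.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class LocalHostLookupDemo {

    /** Reports the hostname the JDK sees and the address that hostname resolves to. */
    static String describeLocalHost() {
        try {
            InetAddress local = InetAddress.getLocalHost();
            return local.getHostName() + " -> " + local.getHostAddress()
                    + " (loopback=" + local.isLoopbackAddress() + ")";
        } catch (UnknownHostException e) {
            return "hostname could not be resolved: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeLocalHost());
    }
}
```

If the line printed ends with loopback=true, the hostname maps to a loopback address and the situation described in this article applies.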

Trying an idea

Adjust the mapping between the hostname and the hosts file so that Dubbo can obtain the LAN IP.

Change the hostname to lizp-test, add the entry 192.168.158.3 lizp-test to the hosts file, and restart the host so the configuration takes effect.

NetUtils.getLocalAddress() now returns 192.168.158.3, and after restarting the demo the service is published normally.

Reflections

This investigation leads to the following takeaways and reflections:

  1. When upgrading common components, it is important to fully understand the version differences and to test adequately; otherwise unexpected problems can occur.

  2. Reading the source code is the most effective way to understand the framework and troubleshoot problems.

  3. Another open question is why the new version of NetUtils.getLocalHost() added the isSiteLocalAddress check. (Not yet understood)

  4. Is there any other solution besides using hostname and hosts files? (To be studied)