A criticism

I was talking to a reader the other day and Dubbo came up.

He said he came across a Dubbo pit earlier.

I asked what happened.

Then he gave me the context, which boils down to one phrase: automatic retry after a timeout.

I have two points to make about this.

  • Readers who haven't used the framework much don't know that Dubbo retries automatically.
  • Dubbo's automatic retry is a feature that I think is well-intentioned but poorly designed.

There isn't much to say about the first point: if the craft isn't there yet, keep studying.

Let’s focus on the second one.

To be fair, as a Dubbo user for many years, I think retry itself is a good idea. What I take issue with is that it shouldn't retry automatically.

In most cases, I manually set it to retries=0.

Of course, a framework can require that users fully understand a feature before using it, including its automatic retry behavior.

However, it is up to the user to decide whether or not to retry, and frameworks or utility classes should not actively do this for the user.

Wait, that’s a little too definitive. Let me change that.

It is up to the user to decide whether to retry after analyzing the scenario. A framework or utility class should not step into the business layer to make that decision for the user.

This article takes two familiar examples to make a simple comparison.

The first example is Dubbo’s default Cluster fault-tolerant policy Failover Cluster, which automatically switches over when a failure occurs.

The second example is Apache’s HttpClient.

Both the framework and the utility class support retry, and both enable it by default.

But in my experience, Dubbo's automatic retry intrudes into the business layer and is visible to the user, while HttpClient's automatic retry stays at the network level and is invisible to the user.

However, one thing must be emphasized again:

Dubbo explicitly states on its website that it automatically retries by default, usually for read operations.

If you misuse it, you can't blame the maintainers; you can only say the design has its trade-offs.

How many times does Dubbo retry?

Dubbo will retry automatically, so how many times?

Let's just run an example and see.

Let’s start with the interface definition:

You can see that in the interface implementation, I slept for 5 seconds to simulate the interface timeout.
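The original screenshots of that code aren't reproduced here, so here is a minimal sketch of what it might look like (the interface and method names are illustrative, not the article's actual code):

```java
import java.util.concurrent.TimeUnit;

// Illustrative service definition; names are hypothetical, not from the article's project.
public class DemoServiceSketch {

    public interface DemoService {
        String sayHello(String name);
    }

    public static class DemoServiceImpl implements DemoService {
        @Override
        public String sayHello(String name) {
            try {
                // Sleep 5 seconds so the client's 1000ms timeout is guaranteed to fire.
                TimeUnit.SECONDS.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "hello, " + name;
        }
    }
}
```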

The timeout is set to 1000ms:

The client XML file is configured like this and the timeout is set to 1000ms:
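The configuration screenshot isn't reproduced either; a sketch of the relevant line might look like this (only the timeout value comes from the article, the id and interface name are illustrative):

```xml
<!-- hypothetical client-side reference; only timeout="1000" is from the article -->
<dubbo:reference id="demoService"
                 interface="org.example.DemoService"
                 timeout="1000"/>
```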

Then we simulate a remote call in the unit test:

This is a vanilla Dubbo demo project. Since our timeout is 1000ms (1s) but the interface takes 5s to process, the call is bound to time out.

With Dubbo's default Failover Cluster policy in effect, running the test case lets us see the retries in a moment:

Look at the elapsed time of this test case: 3s 226ms. Remember that number; I'll come back to it later.

Let’s focus on the number of retries.

It’s a little hard to see, so I’ll take out the key logs separately for you to see:

From the log, the client made three calls in total. The last one started at 2020-12-11 22:41:05.094.

Let’s look at the output on the server side:

I only called it once, yet the database got inserted three times. Cool.

And pay attention to the request times: one request every 1s.

I keep emphasizing time here why?

Because here's the point: the 1000ms timeout applies to a single invocation, not to the entire request with retries (three calls in total). That's also why the test case took about 3s 226ms: three sequential calls, each timing out at roughly 1000ms, plus overhead.

I was asked this question about time in an interview. So I’ll just write it out separately.

Then we change the client XML file to retries=0:
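As a sketch, the change might look like this (only timeout="1000" and retries="0" come from the article; the id and interface name are illustrative):

```xml
<!-- hypothetical; only the timeout and retries values are from the article -->
<dubbo:reference id="demoService"
                 interface="org.example.DemoService"
                 timeout="1000"
                 retries="0"/>
```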

Call again:

As you can see, only one call was made.

So far, we've treated Dubbo as a black box. Testing showed three calls in total, and the behavior can be controlled with the retries parameter.

Next, let’s dig the source code.

FailoverCluster source

The source is located in org.apache.dubbo.rpc.cluster.support.FailoverClusterInvoker:

The default number of retries is 2:

Wait a minute. Didn't we just say three times? Why does the code say two?

Don’t worry.

Do you see the “+1” at the end of line 61?

Think about it: we want to retry n times after the call fails. That n is DEFAULT_RETRIES, which defaults to 2, so the total number of calls is n + 1.

That's where the +1 comes from. A small detail, but easy to miss.

In addition, lines 62 to 64 are marked with a red star in the image. That's the key part. The official website describes the retries parameter like this:

If no retry is desired, set the parameter to 0. As we analyzed earlier, with retries=0 the method is called exactly once.

But I've also seen retries configured as -1. Since -1 + 1 = 0, calling zero times makes no sense, yet the program still works and the method is called once.

That’s where the red five-pointed star is.

Defensive programming: even if you set it to -10000, it will still be called once.
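The calculation described above can be sketched as a tiny standalone method (this paraphrases the logic, it is not the verbatim Dubbo source):

```java
// Paraphrase of FailoverClusterInvoker's retry-count logic, not the verbatim source.
public class RetryCount {

    static final int DEFAULT_RETRIES = 2;

    // Total calls = configured retries + 1 (the original call),
    // clamped to at least 1 — the defensive "+1 then floor at 1" described above.
    public static int totalCalls(int retries) {
        int len = retries + 1;
        if (len <= 0) {
            len = 1;
        }
        return len;
    }
}
```

With the default of 2 retries you get 3 calls; 0, -1, or even -10000 all collapse to a single call.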

The following image is a thorough walkthrough of the doInvoke method, with almost every major line of code annotated:

As shown above, the main workflow of FailoverClusterInvoker doInvoke method is as follows:

  • First, get the number of allowed calls, then loop up to that many calls, retrying within the loop on failure.
  • In the loop body, the select method of the parent class AbstractClusterInvoker picks an Invoker via the load-balancing component, and that Invoker's invoke method performs the remote call.
  • If the call fails, log the exception and retry.

Note one detail: before each retry, the latest invoker list is fetched again via the list method. The benefit is that if a service provider dies during the retries, copyInvokers is guaranteed to hold the latest available invokers.

The whole process looks like this. It’s not hard to understand.
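The whole flow can be simulated with a self-contained sketch (plain Java, no Dubbo; the invoker list and load-balanced selection are faked for illustration, this is not Dubbo's actual source):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Simulation of the failover flow described above: loop retries+1 times,
// refresh the provider list before each retry, pick a provider each round,
// and rethrow the last exception if every attempt fails.
public class FailoverSketch {

    public static String invoke(Supplier<List<Supplier<String>>> listInvokers, int retries) {
        List<Supplier<String>> copyInvokers = listInvokers.get();
        int len = Math.max(retries + 1, 1);
        RuntimeException le = null;
        List<Supplier<String>> invoked = new ArrayList<>();
        for (int i = 0; i < len; i++) {
            if (i > 0) {
                // Before retrying, fetch the latest invoker list again,
                // so providers that died mid-retry are excluded.
                copyInvokers = listInvokers.get();
            }
            // Stand-in for load-balanced select(): prefer an invoker not yet tried.
            Supplier<String> invoker = copyInvokers.stream()
                    .filter(inv -> !invoked.contains(inv))
                    .findFirst()
                    .orElse(copyInvokers.get(0));
            invoked.add(invoker);
            try {
                return invoker.get();
            } catch (RuntimeException e) {
                le = e; // record the failure and fall through to the next attempt
            }
        }
        throw le;
    }
}
```

If the first provider is down, the sketch fails over to the second; if every attempt fails, the last exception propagates.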

An HttpClient example

Next, let’s look at how retries work in Apache’s HttpClients.

It's this class: org.apache.http.impl.client.HttpClients.

First, let's cut the chatter and run a demo.

First look at the Controller logic:

```java
import java.util.Date;
import java.util.concurrent.TimeUnit;

import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class TestController {

    @PostMapping(value = "/testRetry")
    public void testRetry() {
        try {
            System.out.println("time: " + new Date() + ", database inserted successfully");
            TimeUnit.SECONDS.sleep(5);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
```

Again, it sleeps 5 seconds to simulate a timeout.

The HttpUtils package is as follows:

```java
import org.apache.http.Consts;
import org.apache.http.HttpStatus;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.client.protocol.HttpClientContext;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class HttpPostUtils {

    public static String retryPostJson(String uri) throws Exception {
        HttpPost post = new HttpPost(uri);
        RequestConfig config = RequestConfig.custom()
                .setConnectTimeout(1000)
                .setConnectionRequestTimeout(1000)
                .setSocketTimeout(1000)
                .build();
        post.setConfig(config);
        String responseContent = null;
        CloseableHttpResponse response = null;
        CloseableHttpClient client = null;
        try {
            client = HttpClients.custom().build();
            response = client.execute(post, HttpClientContext.create());
            if (response.getStatusLine().getStatusCode() == HttpStatus.SC_OK) {
                responseContent = EntityUtils.toString(response.getEntity(), Consts.UTF_8.name());
            }
        } finally {
            if (response != null) {
                response.close();
            }
            if (client != null) {
                client.close();
            }
        }
        return responseContent;
    }
}
```

Let's first explain the three parameters, each set to 1000ms:

  • connectTimeout: the timeout for establishing a connection between the client and the server
  • connectionRequestTimeout: the timeout for obtaining a connection from the connection pool
  • socketTimeout: the timeout for the client to read data from the server

As you know, an HTTP request can be abstracted into three phases:

  • One: Establish a connection
  • Two: data transmission
  • Three: Disconnect

When a connection is not completed within a specified time, the connection fails and a ConnectTimeoutException is thrown.

In that case, a subsequent SocketTimeoutException cannot occur, because data transfer never starts.

Once the connection is established, data transfer begins. If the transfer is not finished within socketTimeout, a SocketTimeoutException is thrown; if it completes, the connection is closed.

Test the Main method as follows:

```java
public class MainTest {

    public static void main(String[] args) {
        try {
            String returnStr = HttpPostUtils.retryPostJson("http://127.0.0.1:8080/testRetry/");
            System.out.println("returnStr = " + returnStr);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```

ConnectTimeoutException will be thrown if the connection between the client and the server times out.

Directly executing the main method results in the following:

In line with our expectations.

Now let’s start up the Controller interface.

Since socketTimeout is set to 1000ms but the interface sleeps for 5s, the client is bound to time out while reading data from the server and throw a SocketTimeoutException.

After the Controller interface is started, we run the main method and the output is as follows:

In this case, the interface is called successfully, but the client never gets the response.

In this case, as in the Dubbo case, the timeout is for the client.

Even if the client times out, the server logic continues to process the request.

The result: a SocketTimeoutException is indeed thrown, as expected.

But what about retry?

HttpClient retry

HttpClient also has a retry feature, and like Dubbo's, it is enabled by default.

But why didn't either of the exceptions above trigger a retry?

If it can retry, how many times by default?

Let's take these questions to the source code and find the answers.

The answer is hidden in org.apache.http.impl.client.DefaultHttpRequestRetryHandler.

DefaultHttpRequestRetryHandler is the default retry strategy of Apache HttpClient.

As you can see from its constructor, it defaults to three retries:

The this in that constructor calls this method:

As the constructor's comments and code show, these four exception types are never retried:

  • InterruptedIOException
  • UnknownHostException
  • ConnectException
  • SSLException

Both ConnectTimeoutException and SocketTimeoutException inherit from InterruptedIOException:
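The SocketTimeoutException side of that inheritance can be verified with plain JDK classes (ConnectTimeoutException lives in the HttpClient library, so it is omitted here):

```java
import java.io.InterruptedIOException;
import java.net.SocketTimeoutException;

// SocketTimeoutException extends InterruptedIOException, which is exactly
// why the default retry handler refuses to retry it.
public class InheritanceCheck {
    public static void main(String[] args) {
        System.out.println(InterruptedIOException.class.isAssignableFrom(SocketTimeoutException.class)); // prints true
    }
}
```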

Let’s close the Controller interface and set a breakpoint to see:

As you can see, the if branch returns false, so no retry is initiated.

To simulate a retry, we need to modify HttpPostUtils and supply a custom HttpRequestRetryHandler:

```java
import java.io.IOException;

import org.apache.http.Consts;
import org.apache.http.HttpEntityEnclosingRequest;
import org.apache.http.HttpRequest;
import org.apache.http.HttpStatus;
import org.apache.http.client.HttpRequestRetryHandler;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.client.protocol.HttpClientContext;
import org.apache.http.conn.ConnectTimeoutException;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.protocol.HttpContext;
import org.apache.http.util.EntityUtils;

public class HttpPostUtils {

    public static String retryPostJson(String uri) throws Exception {
        HttpRequestRetryHandler httpRequestRetryHandler = new HttpRequestRetryHandler() {
            @Override
            public boolean retryRequest(IOException exception, int executionCount, HttpContext context) {
                System.out.println("Starting retry number " + executionCount + "!");
                if (executionCount > 3) {
                    System.out.println("Retried more than 3 times, giving up");
                    return false;
                }
                if (exception instanceof ConnectTimeoutException) {
                    System.out.println("Connection timed out, preparing to re-request...");
                    return true;
                }
                HttpClientContext clientContext = HttpClientContext.adapt(context);
                HttpRequest request = clientContext.getRequest();
                boolean idempotent = !(request instanceof HttpEntityEnclosingRequest);
                if (idempotent) {
                    return true;
                }
                return false;
            }
        };

        HttpPost post = new HttpPost(uri);
        RequestConfig config = RequestConfig.custom()
                .setConnectTimeout(1000)
                .setConnectionRequestTimeout(1000)
                .setSocketTimeout(1000)
                .build();
        post.setConfig(config);
        String responseContent = null;
        CloseableHttpResponse response = null;
        CloseableHttpClient client = null;
        try {
            client = HttpClients.custom().setRetryHandler(httpRequestRetryHandler).build();
            response = client.execute(post, HttpClientContext.create());
            if (response.getStatusLine().getStatusCode() == HttpStatus.SC_OK) {
                responseContent = EntityUtils.toString(response.getEntity(), Consts.UTF_8.name());
            }
        } finally {
            if (response != null) {
                response.close();
            }
            if (client != null) {
                client.close();
            }
        }
        return responseContent;
    }
}
```

In our custom HttpRequestRetryHandler, I let ConnectTimeoutException through so the request can be retried.

When we do not start the Controller interface, the program automatically retries 3 times:

That demonstrates the default retry policy of Apache HttpClient. You can run the code above directly.

If you want to see the call flow, you can view the call link in Debug mode:

HttpClient's automatic retry, also enabled by default, is imperceptible in normal use.

Because its retry conditions are strict and the retries stay at the network level, they do not interfere with the business.

Be cautious, and cautious again

For features that require retries, we need to be careful and cautious in the development process.

Take Dubbo's default retry: I believe its intent is to make services highly available.

Normally our microservice has at least two nodes. When one node is not serviced, the cluster fault tolerance policy automatically retries the other one.

However, Dubbo also retries service calls that merely time out, which amounts to intruding into the business.

Earlier we said that service invocation timeouts are client-specific. Even if the client call times out, the server is still executing the request normally.

That's why the official documentation says it is "usually used for read operations":

http://dubbo.apache.org/zh/docs/v2.7/user/examples/fault-tolerent-strategy/

Read operations are implicitly idempotent. So remember to set retries=0 when your interface methods are not idempotent.

Let me give you a real-world scenario.

Suppose you call the WeChat Pay interface and the call times out.

What do you do at this time?

Retry directly? Please go home and wait for notice.

You must call the query interface first to determine whether the original request was received, and only then decide what to do next.

For HttpClients, its automatic retry does not intrude into the business, but at the network level.

So most of the time, our system is insensitive to its automatic retry.

Sometimes we even need to implement automatic retry ourselves in the program.

Whether you hook in at the HttpClient level, as above, or elsewhere, the crucial point is the same: you must distinguish whether a request can safely be retried after a given exception.

You cannot retry mindlessly.

Take a look at Guava-Retry and spring-retry frameworks.
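Before reaching for a framework, the principle can be sketched in plain Java (the helper's name and its idempotency flag are illustrative, not from any framework):

```java
import java.util.concurrent.Callable;

// A minimal hand-rolled retry helper illustrating the principle discussed above:
// only retry when the caller has declared the operation safe to retry (idempotent).
public class SimpleRetry {

    public static <T> T call(Callable<T> task, int maxAttempts, boolean idempotent) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                // A non-idempotent operation must not be retried blindly.
                if (!idempotent) {
                    throw e;
                }
                System.out.println("attempt " + attempt + " failed: " + e.getMessage());
            }
        }
        throw last;
    }
}
```

A non-idempotent task fails after one attempt; an idempotent one is attempted up to maxAttempts times. The frameworks above offer the same idea with richer stop and wait strategies.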

Anecdotes

I know this is your favorite part of the show.

Take a look at the FailoverClusterInvoker commit record:

There were two commits in 2020, quite close together.

The February 9 commit was a fix for issue 5686.

That issue in turn ties back to issues 5684 and 5654:

https://github.com/apache/dubbo/issues/5654

They all point to the same problem:

Load balancing across multiple registries does not take effect.

But as soon as the maintainers fixed that problem, another big one appeared:

The failfast cluster fault-tolerance policy doesn't take effect in 2.7.6.

Think about it: you know your interface can't be retried, so you deliberately switch to the failfast policy.

And yet the framework still uses failover and retries twice?

The reality was even worse: in version 2.7.6, only the failover cluster fault-tolerance policy actually took effect.

This thing is a bit of a mess.

This bug was not fixed until version 2.7.8.

So if you're using Dubbo 2.7.5 or 2.7.6 and have configured any cluster fault-tolerance strategy other than failover, be aware that it doesn't actually take effect.

This is a big bug.

But that's open source: projects are maintained together.

We know that Dubbo isn’t a perfect framework, but we also know that there are engineers behind it who know it’s not perfect, but still don’t give up.

They’re trying to make it perfect.

As users, we should complain less and offer more encouragement and substantive help.

Only then can we proudly say that we've contributed a little to the open-source world, and believe it will be better tomorrow.

Hats off to open source, hats off to open source engineers.

Anyway, awesome.

Well, that’s all for this article.

If you find anything wrong, please point it out and I'll correct it.

Thank you for reading. I insist on original content, and I warmly welcome and appreciate your attention.

I am why, a literary creator delayed by code, a warm and interesting nice guy from Sichuan.

And welcome to follow me.