Preface

Distributed architecture is the foundation of Internet applications. Many newcomers at Alibaba have been writing and calling all kinds of remote interfaces since the day they joined. But getting an interface right is like marrying the right person: it rarely goes smoothly, and almost everyone gets burned along the way.

Every year, when system calls are reviewed for Double 11, I hear remarks like these:

  • Your call to my interface failed, and you won’t retry it yourself?
  • My return value should be read from here.
  • Returning isSuccess() == true does not mean the business succeeded; you also need to check ERROR_CODE.
  • This ERROR_CODE does not mean that every failure should be retried.
  • This ERROR_CODE must be retried!

There are many more. This article aims to help you think through how to design a remote interface that is robust and easy to use, so that you spend less time stuck in this quagmire.

A LogService example

PS: See the code for this example: Excavatore-DEMO

Sora:
Class is in session! Hello everyone, I am your teacher. Today I’ll show you how to write a robust remote interface by designing a centralized log system.

The system itself may not make much sense, but it is the simplest example I could find, so don’t start a debate in class about whether it is reasonable, or the teacher will get angry.

System architecture

Applications use the log service provided by a centralized log server to write all of their logs, in one place, to a fixed file.

System Architecture Diagram

Xiao Ming:
That’s easy. Given the system requirements and the architecture, I can write the interface definition right away. Look: “If the method returns without an exception, the log has been successfully written to the log file.”

Interface v0.1

    import java.io.Serializable;

    /**
     * Log service
     * @author: [email protected]
     * @version: 0.1
     */
    public interface LogService {

        /**
         * Record INFO-level logs
         *
         * @param format log template (same as String.format())
         * @param args   log parameters
         */
        void info(String format, Serializable... args);
    }
Sora:
Great, but this interface only works inside a single-machine application, not across remote calls. To see why, you need to understand how remote calls are generally implemented.

RPC calls

What is an RPC call

Remote Procedure Call (RPC) is a technique for requesting services from a program on a remote computer over the network without having to understand the underlying network technology.

RPC uses the client/server (C/S) model. The requester is the client and the service provider is the server. First, the calling process on the client sends a call message carrying the procedure parameters to the server process and waits for the reply message. On the server side, the process sleeps until a call message arrives. When one arrives, the server extracts the procedure parameters, computes the result, sends the reply message, and waits for the next call message. Finally, the calling process on the client receives the reply message, obtains the procedure result, and execution continues.

The above is excerpted from Baidu Encyclopedia.

A complete RPC call

  • Request process
  1. The client function passes its parameters to the client handle.
  2. The client handle wraps the request serial number, remote method, parameters, and other information into a request object, serializes it into a request message, and sends the message through the network client.
  3. The request message travels between the network client and the network server over the agreed protocol (HTTP, RMI, or custom).
  4. After receiving the request message, the network server deserializes it, extracts the remote method, parameters, and other information from the request object, and uses them to locate the server handle.
  5. The server handle makes a local call to the server function.

    At this point, the request process is complete.

  • Response process
  1. The server function executes and returns its result to the server handle. The result may be a normal return value or a thrown exception.
  2. The server handle wraps the return value and the request serial number into a reply object, serializes it into a reply message, and sends the message through the network server.
  3. The reply message travels between the network server and the network client over the agreed protocol (HTTP, RMI, or custom).
  4. After receiving the reply message, the network client deserializes it, extracts the request serial number from the reply object, and uses it to find the matching client handle.
  5. The client handle passes the data back to the client function, either as a return value or as a thrown exception.

    At this point, the response process is complete.
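The ten steps above can be sketched in a single process. This is only an illustration under stated assumptions, not how any particular RPC framework works: the class names (RpcRequest, RpcReply, Codec, ServerHandle, ClientHandle), the "upper" method, and the Transport interface standing in for the network client/server pair are all hypothetical, and JDK object serialization is used only because it needs no extra library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical request/reply objects; the field names are illustrative only.
class RpcRequest implements Serializable {
    final long serial;
    final String method;
    final Serializable[] args;

    RpcRequest(long serial, String method, Serializable[] args) {
        this.serial = serial;
        this.method = method;
        this.args = args;
    }
}

class RpcReply implements Serializable {
    final long serial;
    final Serializable value; // normal result, or null on error
    final String error;       // error description, or null on success

    RpcReply(long serial, Serializable value, String error) {
        this.serial = serial;
        this.value = value;
        this.error = error;
    }
}

// Serialization and deserialization with the JDK's built-in object streams.
class Codec {
    static byte[] encode(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    static Object decode(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }
}

// Request steps 4-5 and response steps 1-2.
class ServerHandle {
    byte[] handle(byte[] requestBytes) throws IOException, ClassNotFoundException {
        RpcRequest request = (RpcRequest) Codec.decode(requestBytes);
        RpcReply reply;
        try {
            reply = new RpcReply(request.serial, invokeLocal(request), null);
        } catch (RuntimeException e) {
            reply = new RpcReply(request.serial, null, e.toString());
        }
        return Codec.encode(reply);
    }

    // The "server function": a toy method table with a single entry.
    private Serializable invokeLocal(RpcRequest request) {
        if ("upper".equals(request.method)) {
            return ((String) request.args[0]).toUpperCase();
        }
        throw new IllegalArgumentException("unknown method: " + request.method);
    }
}

// Request steps 1-2 and response steps 4-5.
class ClientHandle {
    // Stands in for the network client/server pair (step 3 in both directions).
    interface Transport {
        byte[] send(byte[] requestBytes) throws Exception;
    }

    private final Transport transport;
    private long nextSerial = 1;

    ClientHandle(Transport transport) {
        this.transport = transport;
    }

    Serializable call(String method, Serializable... args) throws Exception {
        long serial = nextSerial++;
        byte[] replyBytes = transport.send(Codec.encode(new RpcRequest(serial, method, args)));
        RpcReply reply = (RpcReply) Codec.decode(replyBytes);
        if (reply.serial != serial) {
            throw new IllegalStateException("reply does not match request " + serial);
        }
        if (reply.error != null) {
            throw new IllegalStateException(reply.error);
        }
        return reply.value;
    }
}

class RpcSketchDemo {
    public static void main(String[] args) throws Exception {
        ServerHandle server = new ServerHandle();
        ClientHandle client = new ClientHandle(server::handle);
        System.out.println(client.call("upper", "hello")); // prints HELLO
    }
}
```

Every line in call() and handle() is a place where a real deployment can fail: encoding, transport, decoding, or the server function itself.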

Sora:
A complete RPC call consists of 10 steps, and any of them can fail. So when designing a remote interface, you must consider every possible error and agree with the client on how to handle each one. Whichever step goes wrong, your business logic must still be guaranteed to stay correct!


Xiao Ming:
As expected of Teacher Cang, truly broad and profound. I see: remote access magnifies error probabilities that are negligible on a single machine, so the application is forced to detect and handle these communication errors.

So how should we categorize and handle these errors?

Ways a remote call can go wrong

Communication framework errors

Communication framework errors can be subdivided into:

  1. Marshal & Unmarshal

    The client and server use inconsistent serialization/deserialization algorithms, so the communication object cannot be reconstructed on one side, and errors occur during encoding or decoding.

    If your communication framework uses Hessian, chances are you’ve run into this. Serialization and deserialization deserve a topic of their own, so I won’t go into them here.

  2. Network communication errors

    Such errors can surface as unpredictable exceptions, depending on how the RPC framework is implemented. The only way to handle this kind of error is to retry at another time.

Business system errors

Business system errors fall into two categories:

  1. Business errors

    The client passes parameters that violate the business rules, so business logic processing fails. No matter how many times the call is repeated, the result is the same.

  2. System errors

    An uncontrollable error occurs while the server is executing its internal logic. Common examples include:

  • Database access failure
  • File write failure
  • Network communication failure

    This kind of error can be resolved by retrying.

Summary of error scenarios and solutions

Error scenario | Solution | Retry?
Communication framework error | Throw a framework exception | Retry
System error | Throw a system exception | Retry
Business error | Return an explicit error code | Do not retry
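
The table above can be written down as a small enum so that the retry rule lives in one place. The enum name RemoteErrorKind and its accessors are hypothetical names chosen for this sketch; only the three categories and their retry rules come from the table.

```java
// Hypothetical classification of remote-call failures, mirroring the table above.
enum RemoteErrorKind {
    FRAMEWORK_ERROR("throw a framework exception", true),
    SYSTEM_ERROR("throw a system exception", true),
    BUSINESS_ERROR("return an explicit error code", false);

    private final String signal;      // how the error is reported to the client
    private final boolean retryable;  // whether the client may retry

    RemoteErrorKind(String signal, boolean retryable) {
        this.signal = signal;
        this.retryable = retryable;
    }

    String signal() {
        return signal;
    }

    boolean isRetryable() {
        return retryable;
    }
}

class RemoteErrorKindDemo {
    public static void main(String[] args) {
        for (RemoteErrorKind kind : RemoteErrorKind.values()) {
            System.out.println(kind + ": " + kind.signal() + ", retry=" + kind.isRetryable());
        }
    }
}
```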


Xiao Ming:
I see. A good remote method definition must account for the error scenarios listed above and define clear error-handling conventions. So, Teacher Cang, how should this interface be written?


Sora:
Before you write a robust interface, there are still a few concepts to understand. First, look at my declaration of this interface. Compared with yours, it carries two more important pieces of information: ResultDO<Void> and LogException. I’ll explain what each of these two classes is for.

Code organization

If you get the chance to rebuild an application, I recommend organizing its modules by splitting them into packages.

  • Common: content shared by core and client

    • Business interface declaration: LogService
    • Domain objects (DO, TO, and DTO are all named DO for simplicity): ResultDO<T>
    • Business exceptions: LogException
  • Client: the rich client. This layer can host caching and business-independent general validation. It is optional.

    • Service client implementations: LogServiceClient, AsyncLogServiceClient
  • Core: the implementation of the business service. Code in this layer runs on the server side.

    • Business logic implementation, which can be layered further as usual (Service, Manager, Dao): LogServiceImpl

Handle the return value correctly

The idea behind this RPC interface declaration is to distinguish system exceptions from business exceptions by convention. The key to telling them apart is ResultDO and LogException.

  • ResultDO<T>

    The info method needs no return value, but the server must return an error code so that the client can show a friendly error message. So the result object has two methods:

    • public boolean isSuccess();

    When isSuccess is true, business processing succeeded: the server received the request correctly, processed it successfully, and the business operation is complete. This is the best case.

    When isSuccess is false, business processing failed: the server received the request correctly, but business processing failed. The reason is given in errorCode.

    • public String getErrorCode();

    When the server receives the request correctly but business processing fails, the failure reason is returned as an error code.

  • LogException

    This exception narrows and masks the specific error details of the service layer. When the server hits an error it cannot handle, it must rethrow it to the client as this exception so that the client can retry. The client can use LogException to quickly determine why the service is currently interrupted.
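
A minimal sketch of these two classes might look like the following. Only isSuccess() and getErrorCode() are taken from the text; the factory methods ok and fail, the getValue() accessor, and the choice of a checked exception are assumptions made for illustration.

```java
import java.io.Serializable;

// Result wrapper returned on every call that reached the server's business logic.
// Assumption: ok(), fail() and getValue() are not part of the original declaration.
class ResultDO<T> implements Serializable {
    private static final long serialVersionUID = 1L;

    private final boolean success;
    private final String errorCode;
    private final T value;

    private ResultDO(boolean success, String errorCode, T value) {
        this.success = success;
        this.errorCode = errorCode;
        this.value = value;
    }

    static <T> ResultDO<T> ok(T value) {
        return new ResultDO<>(true, null, value);
    }

    static <T> ResultDO<T> fail(String errorCode) {
        return new ResultDO<>(false, errorCode, null);
    }

    public boolean isSuccess() {
        return success;
    }

    public String getErrorCode() {
        return errorCode;
    }

    public T getValue() {
        return value;
    }
}

// Thrown for system errors the server cannot handle; the client should retry.
// It carries only a message, so server-side details stay masked.
class LogException extends Exception {
    private static final long serialVersionUID = 1L;

    LogException(String message) {
        super(message);
    }
}

class ResultDODemo {
    public static void main(String[] args) {
        ResultDO<Void> failed = ResultDO.fail("INVALID_FORMAT"); // hypothetical error code
        System.out.println(failed.isSuccess() + " " + failed.getErrorCode()); // prints false INVALID_FORMAT
    }
}
```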

Summary of the client’s handling of the returned value

  • Client processing logic table (/ = not applicable)

    Call situation | isSuccess | errorCode | throws LogException | throws Exception | Client processing
    Framework error | / | / | / | true | Retry
    System error | / | / | true | / | Retry
    Business error | false | true | / | / | Do not retry
    Success | true | true | / | / | Do not retry

    Nothing is absolute. For example, business errors normally return error codes, but sometimes, for performance reasons (throwing exceptions is costly for the JVM), the interface declaration may specify that some error codes must also be retried. The fewer such cases, the better. Once the convention is set, backward compatibility means that the set of error codes requiring a retry may only shrink after the declaration, never grow; otherwise compatibility problems will follow.
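
The client-side rules in the table, plus the optional set of retryable error codes, can be sketched as a small retry helper. RetryingCaller, its constructor parameters, and the minimal ResultDO and LogException stand-ins are hypothetical; a production client would also add a delay or backoff between attempts, which is left out here.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Callable;

// Minimal stand-ins so the sketch compiles on its own.
class ResultDO<T> {
    private final boolean success;
    private final String errorCode;

    private ResultDO(boolean success, String errorCode) {
        this.success = success;
        this.errorCode = errorCode;
    }

    static <T> ResultDO<T> ok() { return new ResultDO<>(true, null); }
    static <T> ResultDO<T> fail(String errorCode) { return new ResultDO<>(false, errorCode); }

    public boolean isSuccess() { return success; }
    public String getErrorCode() { return errorCode; }
}

class LogException extends Exception {
    LogException(String message) { super(message); }
}

class RetryingCaller {
    private final int maxAttempts;
    private final Set<String> retryableCodes; // error codes the interface declares as retryable

    RetryingCaller(int maxAttempts, Set<String> retryableCodes) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
        this.retryableCodes = new HashSet<>(retryableCodes);
    }

    <T> ResultDO<T> call(Callable<ResultDO<T>> remoteCall) throws Exception {
        ResultDO<T> lastResult = null;
        Exception lastError = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                lastResult = remoteCall.call();
                lastError = null;
                // Success, or a business error that is not declared retryable: stop here.
                if (lastResult.isSuccess() || !retryableCodes.contains(lastResult.getErrorCode())) {
                    return lastResult;
                }
            } catch (Exception e) {
                // LogException (system error) or any framework exception: try again.
                lastError = e;
            }
        }
        if (lastError != null) {
            throw lastError;
        }
        return lastResult;
    }
}

class RetryingCallerDemo {
    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        RetryingCaller caller = new RetryingCaller(3, Collections.<String>emptySet());
        ResultDO<Void> result = caller.call(() -> {
            if (++attempts[0] < 3) {
                throw new LogException("log server unavailable");
            }
            return ResultDO.<Void>ok();
        });
        System.out.println(result.isSuccess() + " after " + attempts[0] + " attempts"); // prints "true after 3 attempts"
    }
}
```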

Sora:
I have also seen systems that declare a public boolean isReTry(); method in ResultDO, so that when a business error occurs, the decision whether to retry is delegated to isReTry(). That is also a good choice.
  • Client processing logic table with isReTry added (/ = not applicable)

    Call situation | isSuccess | isReTry | errorCode | throws LogException | throws Exception | Client processing
    Framework error | / | / | / | / | true | Retry
    System error | / | / | / | true | / | Retry
    Business error | false | true | true | / | / | Retry
    Business error | false | false | true | / | / | Do not retry
    Success | true | / | true | / | / | Do not retry
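
With isReTry() in place, the client decision collapses to one small function. The sketch below assumes a hypothetical helper, RetryDecision.shouldRetry, that receives either the returned result or the thrown exception; the ResultDO shown is a minimal stand-in with a public constructor for brevity.

```java
// Minimal stand-in for ResultDO with the extra isReTry() flag.
class ResultDO<T> {
    private final boolean success;
    private final boolean reTry;
    private final String errorCode;

    ResultDO(boolean success, boolean reTry, String errorCode) {
        this.success = success;
        this.reTry = reTry;
        this.errorCode = errorCode;
    }

    public boolean isSuccess() { return success; }
    public boolean isReTry() { return reTry; }
    public String getErrorCode() { return errorCode; }
}

class RetryDecision {
    /**
     * @param result the returned result, or null if the call threw
     * @param thrown the exception thrown by the call, or null if it returned
     */
    static boolean shouldRetry(ResultDO<?> result, Throwable thrown) {
        if (thrown != null) {
            return true; // framework error or system error (LogException)
        }
        // Business error: the server decides through isReTry().
        return !result.isSuccess() && result.isReTry();
    }
}

class RetryDecisionDemo {
    public static void main(String[] args) {
        // The error codes below are hypothetical.
        System.out.println(RetryDecision.shouldRetry(new ResultDO<Void>(false, true, "SERVER_BUSY"), null));     // prints true
        System.out.println(RetryDecision.shouldRetry(new ResultDO<Void>(false, false, "INVALID_FORMAT"), null)); // prints false
    }
}
```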

Why have a Client layer

To be honest, this layer is optional; in many cases it is enough for the client to use the Service interface declared by the server directly. However, a ServiceClient shows its advantages when the client needs disaster recovery or enhancement.
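
As one possible illustration of client-side enhancement and disaster recovery, the sketch below validates the log template locally and keeps the line in a local buffer when the server reports a system error. The behaviour, the error codes NULL_FORMAT, INVALID_FORMAT and DEGRADED_TO_LOCAL, and the bufferedLines() accessor are assumptions, not the article's implementation of LogServiceClient; the class is also not thread-safe.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.IllegalFormatException;
import java.util.List;

// Minimal stand-ins so the sketch compiles on its own.
class ResultDO<T> {
    private final boolean success;
    private final String errorCode;

    private ResultDO(boolean success, String errorCode) {
        this.success = success;
        this.errorCode = errorCode;
    }

    static <T> ResultDO<T> ok() { return new ResultDO<>(true, null); }
    static <T> ResultDO<T> fail(String errorCode) { return new ResultDO<>(false, errorCode); }

    public boolean isSuccess() { return success; }
    public String getErrorCode() { return errorCode; }
}

class LogException extends Exception {
    LogException(String message) { super(message); }
}

interface LogService {
    ResultDO<Void> info(String format, Serializable... args) throws LogException;
}

// Hypothetical rich client wrapped around the remote LogService.
class LogServiceClient implements LogService {
    private final LogService remote;
    private final List<String> localBuffer = new ArrayList<>();

    LogServiceClient(LogService remote) {
        this.remote = remote;
    }

    @Override
    public ResultDO<Void> info(String format, Serializable... args) {
        // Enhancement: business-independent validation without a network round trip.
        if (format == null) {
            return ResultDO.fail("NULL_FORMAT");
        }
        String line;
        try {
            line = String.format(format, (Object[]) args);
        } catch (IllegalFormatException e) {
            return ResultDO.fail("INVALID_FORMAT");
        }
        try {
            return remote.info(format, args);
        } catch (LogException e) {
            // Disaster recovery: keep the line locally instead of losing it.
            localBuffer.add(line);
            return ResultDO.fail("DEGRADED_TO_LOCAL");
        }
    }

    List<String> bufferedLines() {
        return localBuffer;
    }
}
```

Business code that depends on LogServiceClient never has to catch LogException itself; whether that trade-off is acceptable depends on how important each log line is.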

Interface v0.2

    import java.io.Serializable;

    /**
     * Log service
     * @author: [email protected]
     * @version: 0.2
     */
    public interface LogService {

        /**
         * Record INFO-level logs
         *
         * @param format log template (same as String.format())
         * @param args   log parameters
         * @return the processing result; check isSuccess() and getErrorCode()
         * @throws LogException if the server hits a system error; the client should retry
         */
        ResultDO<Void> info(String format, Serializable... args) throws LogException;
    }
Xiao Ming:
A good system convention prevents many unnecessary errors, but not every system is new. Faced with all kinds of “wisdom of our ancestors”, how do we accommodate remote interfaces that don’t follow the convention?


Sora:
When you face the “wisdom of our ancestors”, you cannot change the declaration of an existing, heavily invoked interface. “What exists is reasonable”: you cannot change an interface even if you know that its declaration or implementation has problems. For details on interface maintenance principles, see “Remote Interface Maintenance Experience Sharing” in the next lecture.

When you encounter an interface that breaks the convention, use the decorator pattern to wrap the non-conforming interface into a conforming one.

Interface Wrapper

You’re almost certainly not the first person in the company to declare an interface. So once you have a remote interface design specification, dealing with the old interfaces becomes a headache.

The wisdom of our ancestors is boundless. Our predecessors already faced and solved every problem discussed here (if you are unlucky, you may also run into interfaces written by novices), but each solved them in a different way and no shared convention emerged. What can we do?

Consider using the decorator pattern to repackage non-standard interfaces into interfaces that conform to the design specification. This has two benefits:

  • Solve the problem that the old interface is not standard
  • Reduce the probability of old interfaces being exposed to business code

    This needs some explanation. You do not control the definition of external interfaces. If an external Service is upgraded, the changes, regression testing, and code review are confined to the Wrapper class. If all business code references the external Service/ServiceClient class directly, the regression surface of the upgrade grows much larger.

So regardless of whether the declared interface is compliant or not, I would recommend that clients use the Wrapper layer instead of Service/ServiceClient directly.
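
A wrapper along these lines might look like the sketch below. LegacyLogService and its conventions (0 means success, any other code is a business error, unchecked exceptions signal system failures) are invented for illustration; a real legacy interface will have its own quirks that the wrapper has to translate.

```java
import java.io.Serializable;
import java.util.IllegalFormatException;

// Minimal stand-ins so the sketch compiles on its own.
class ResultDO<T> {
    private final boolean success;
    private final String errorCode;

    private ResultDO(boolean success, String errorCode) {
        this.success = success;
        this.errorCode = errorCode;
    }

    static <T> ResultDO<T> ok() { return new ResultDO<>(true, null); }
    static <T> ResultDO<T> fail(String errorCode) { return new ResultDO<>(false, errorCode); }

    public boolean isSuccess() { return success; }
    public String getErrorCode() { return errorCode; }
}

class LogException extends Exception {
    LogException(String message, Throwable cause) { super(message, cause); }
}

interface LogService {
    ResultDO<Void> info(String format, Serializable... args) throws LogException;
}

// Hypothetical legacy interface that does not follow the convention:
// returns 0 on success, a non-zero code on a business error,
// and throws unchecked exceptions on system failures.
interface LegacyLogService {
    int write(String line);
}

// Decorator that exposes the legacy service through the conforming LogService contract.
class LegacyLogServiceWrapper implements LogService {
    private final LegacyLogService legacy;

    LegacyLogServiceWrapper(LegacyLogService legacy) {
        this.legacy = legacy;
    }

    @Override
    public ResultDO<Void> info(String format, Serializable... args) throws LogException {
        String line;
        try {
            line = String.format(format, (Object[]) args);
        } catch (IllegalFormatException e) {
            return ResultDO.fail("INVALID_FORMAT"); // business error: retrying cannot help
        }
        int code;
        try {
            code = legacy.write(line);
        } catch (RuntimeException e) {
            throw new LogException("legacy log service failed", e); // system error: retry
        }
        if (code == 0) {
            return ResultDO.ok();
        }
        return ResultDO.fail("LEGACY_" + code);
    }
}
```

Business code depends only on LogService, so a change in the legacy interface touches the wrapper and nothing else.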

Xiao Ming:
Great! Thanks to the teacher’s guidance, I finally wrote a robust remote interface and learned how to agree on retry conventions with the client.

Still, I want to ask: isn’t a remote log system like this rather unreasonable? Was it really appropriate to use it as the example?



This article is a selected article from the Alibaba Technology Association (ATA).