Why is Dubbo not good for transferring files?


Background

Our company once ran a Dubbo service that wrapped the Tencent Cloud object storage SDK. The goal was to manage this kind of third-party SDK in one place: other systems called the object storage Dubbo service instead of using the SDK directly. That way, an incompatible major-version update of the platform SDK would not force every system in the company to upgrade.

The idea was good, but the approach was not, because Dubbo is poorly suited to transferring files. Fortunately, the system was abandoned soon after it went live…

Although the system has been retired, the question of uploading files through Dubbo is worth analyzing in detail to see why Dubbo is a poor fit for it.

How does Dubbo send files?

Would you declare the upload with a File parameter, like this?

void sendPhoto(File photo);

Of course not! Dubbo simply serializes the argument object and sends it, and serializing a File object does not carry the file's data (a File is only a path). So the only option is to send the file's contents directly:

void sendPhoto(byte[] photo);

However, this requires the consumer to read the entire file into memory at once, which quickly becomes unaffordable for large files or many concurrent uploads. In addition, when the provider receives and decodes the message, it also has to read the whole byte[] into memory at once, which again causes high memory usage.
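To make the memory cost concrete, here is a minimal sketch of what the consumer ends up doing with the byte[] signature. `WholeFileUpload`, `PhotoService` and `tempFileOfSize` are hypothetical names for illustration, and the service is an in-process stub rather than a real Dubbo reference.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class WholeFileUpload {

    /** Hypothetical service interface with the byte[] signature shown above. */
    interface PhotoService {
        void sendPhoto(byte[] photo);
    }

    /** Creates a temporary file of the given size, for demonstration only. */
    static Path tempFileOfSize(int size) {
        try {
            Path file = Files.createTempFile("photo", ".bin");
            file.toFile().deleteOnExit();
            return Files.write(file, new byte[size]);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Reads the whole file into one byte[] before the call, because the
     * interface leaves no other choice. Returns the number of bytes that
     * had to be held on the heap at once.
     */
    static int upload(Path file, PhotoService service) {
        try {
            byte[] photo = Files.readAllBytes(file); // heap usage equals file size
            service.sendPhoto(photo);
            return photo.length;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = tempFileOfSize(1024 * 1024); // 1 MiB test file
        int held = upload(file, photo -> { /* stub provider: discard */ });
        System.out.println("bytes held in memory at once: " + held);
    }
}
```

The number returned always equals the file size, so a 1 GB upload needs at least 1 GB of heap on the consumer, and the provider pays the same cost again when it decodes the message.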

Single connection model problem

Besides the memory footprint, the single-connection model of Dubbo (meaning the Dubbo protocol here) also makes it unsuitable for file transfers.

By default, the Dubbo protocol uses a single-connection model: all requests from a consumer to a given provider go over a single TCP connection. Netty is the default transport, and Netty queues write events to keep the Channel thread-safe. With a single connection, concurrent requests share the same connection, that is, they write to the same Channel. If one of those messages is very large, the Channel stays busy sending it, and the write events of the other requests wait in the queue. Since their data has not been sent, the callers of those requests block waiting for responses that cannot arrive yet.

Therefore, when a message on the single connection is too large, Netty's write processing is held up and other data cannot be sent to the server in time, so unrelated requests are blocked through no fault of their own.

So why does Dubbo adopt a single-connection model if it has such a drawback?

To save resources. TCP connections are a valuable resource, and if a single connection can cover most scenarios, there is no need to prepare a connection for each request.
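The queuing effect can be reproduced with a small simulation. This is a sketch under stated assumptions, not Netty or Dubbo code: a single-threaded executor stands in for the one shared Channel, `Thread.sleep` stands in for the time spent writing to the socket, and `SingleChannelQueue` is a hypothetical name.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SingleChannelQueue {

    /**
     * Models one connection as a single-threaded executor: writes are queued
     * and flushed strictly in submission order. Returns the order in which
     * the writes finished.
     */
    static List<String> simulate(long largeWriteMillis) {
        ExecutorService channel = Executors.newSingleThreadExecutor();
        List<String> finished = new CopyOnWriteArrayList<>();
        try {
            // A large message occupies the connection first...
            channel.execute(() -> write("large-file", largeWriteMillis, finished));
            // ...so these small requests wait in the queue behind it.
            channel.execute(() -> write("small-request-1", 1, finished));
            channel.execute(() -> write("small-request-2", 1, finished));
            channel.shutdown();
            channel.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            channel.shutdownNow();
        }
        return finished;
    }

    private static void write(String name, long millis, List<String> finished) {
        try {
            Thread.sleep(millis); // stand-in for time spent writing to the socket
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        finished.add(name);
    }

    public static void main(String[] args) {
        // prints [large-file, small-request-1, small-request-2]
        System.out.println(simulate(200));
    }
}
```

The small requests need only a millisecond each, yet neither can finish before the large write does, which is the blocking described above.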

The Dubbo documentation also mentions the reason for the single-connection design:

Because in practice there are usually few service providers, often just a few machines, but many service consumers, and the whole website may be accessing the service. For example, Morgan has only 6 provider machines but hundreds of consumers, with 150 million calls every day. With a regular Hessian service, the providers would easily be overwhelmed. A single connection ensures that one consumer cannot overwhelm the provider; a long-lived connection reduces connection handshakes and validation; and asynchronous IO with a reused thread pool guards against the C10K problem.

Although the Dubbo protocol defaults to the single-connection model, it is possible to set multiple connections:

<dubbo:service connections="1"/>
<dubbo:reference connections="1"/>

With multiple connections, however, connections and requests are still not one-to-one; requests are distributed round-robin. As shown in the following figure, when N connections are configured, the consumer maintains N connections to each provider instance and assigns each request to one of them in rotation.
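Round-robin selection over N connections can be sketched as follows. This is an illustration of the technique only, not Dubbo's source; `RoundRobinConnections` and the connection names are hypothetical, and plain strings stand in for real connections.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class RoundRobinConnections {

    private final List<String> connections;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinConnections(List<String> connections) {
        if (connections.isEmpty()) {
            throw new IllegalArgumentException("at least one connection is required");
        }
        this.connections = List.copyOf(connections);
    }

    /** Picks the connection for the next request, rotating through all of them. */
    String pick() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), connections.size());
        return connections.get(index);
    }

    public static void main(String[] args) {
        RoundRobinConnections pool =
                new RoundRobinConnections(List.of("conn-0", "conn-1", "conn-2"));
        for (int request = 1; request <= 5; request++) {
            System.out.println("request " + request + " -> " + pool.pick());
        }
    }
}
```

Because the choice ignores how busy each connection is, a request can still land behind a large message on the same connection; more connections lower the odds of that but do not remove it.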

Why is HTTP “good” for transferring files?

It is not simply that HTTP is suitable for transferring files. Dubbo also supports an HTTP protocol (albeit a half-baked one), and that is not suitable for transferring files either.

RPC frameworks such as Dubbo have to serialize data into language-level objects so that you can “call remote methods as if they were local”, but that is exactly what makes it impossible to handle objects such as File.

Set aside the constraints of RPC frameworks such as Dubbo and look at plain HTTP, and it is very well suited to transferring files. The client only needs to send the bytes to the server. For example, if the file is on the local disk, the client can read one buffer's worth of the file at a time and write that buffer to the socket. Only one buffer of data is in memory at any moment, so the Dubbo problem of reading everything into memory does not arise.

As shown in the following figure, the client reads only a 4 KB buffer from the 1 GB file at a time and sends it through the socket, repeating until the whole file has been read and sent. For a single transfer, memory usage stays at one 4 KB buffer; the file is never read into a byte[] in one go and then sent, as it is with Dubbo.

The same is true on the server side. Instead of reading the whole request into memory at once, the server parses Content-Length from the header and then wraps the connection in an InputStream. Reading from that InputStream pulls data from the socket buffer as needed, so there is no memory problem. (See my article “How to handle file upload in Tomcat” for more details.)
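The buffer-at-a-time loop is short enough to show in full. A minimal sketch: `ChunkedCopy` is a hypothetical name, and in `main` a `ByteArrayOutputStream` stands in for the socket's output stream so the example runs without a network.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class ChunkedCopy {

    static final int BUFFER_SIZE = 4 * 1024; // 4 KB, as in the description above

    /**
     * Copies everything from in to out through one reusable 4 KB buffer,
     * so memory usage stays constant regardless of the total size.
     * Returns the number of bytes copied.
     */
    static long copy(InputStream in, OutputStream out) {
        byte[] buffer = new byte[BUFFER_SIZE];
        long total = 0;
        try {
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read); // send only the bytes actually read
                total += read;
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        InputStream file = new ByteArrayInputStream(new byte[10_000]); // stand-in for a file
        ByteArrayOutputStream socket = new ByteArrayOutputStream();    // stand-in for a socket
        System.out.println("bytes sent: " + copy(file, socket));
    }
}
```

With a real upload, `in` would be a `FileInputStream` and `out` the socket's output stream; the loop itself does not change.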
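The server-side idea, an InputStream that exposes exactly Content-Length bytes of the connection without buffering them, can be sketched like this. It illustrates the technique and is not Tomcat's implementation; `ContentLengthInputStream` and `countBytes` are hypothetical names, and a `ByteArrayInputStream` stands in for the socket.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ContentLengthInputStream extends InputStream {

    private final InputStream socketIn;
    private long remaining; // body bytes not yet handed to the caller

    ContentLengthInputStream(InputStream socketIn, long contentLength) {
        this.socketIn = socketIn;
        this.remaining = contentLength;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) {
            return -1; // end of this request body, even if the socket has more data
        }
        int b = socketIn.read();
        if (b != -1) {
            remaining--;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (remaining <= 0) {
            return -1;
        }
        // Never read past the declared body length.
        int n = socketIn.read(buf, off, (int) Math.min(len, remaining));
        if (n > 0) {
            remaining -= n;
        }
        return n;
    }

    /** Drains a stream through a small buffer and returns how many bytes it yielded. */
    static long countBytes(InputStream body) {
        byte[] buffer = new byte[4096];
        long total = 0;
        try {
            int n;
            while ((n = body.read(buffer)) != -1) {
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        InputStream socket = new ByteArrayInputStream(new byte[10]); // 10 bytes on the wire
        InputStream body = new ContentLengthInputStream(socket, 4);  // Content-Length: 4
        System.out.println("body bytes readable: " + countBytes(body));
    }
}
```

The application reads the body at its own pace, and only the bytes it asks for are pulled from the socket buffer.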

So if HTTP is “good” for transferring files, does Feign, the standard RPC client in Spring Cloud, have any problems transferring files?

Is Feign suitable for transferring files?

Feign isn't really an RPC framework; it's just an HTTP client. When you use Feign, the server can be any HTTP server: a Servlet container such as Tomcat, Jetty, or Undertow, or a server written in another language, such as Apache HTTP Server.

Feign is usually used within the Spring Cloud stack, where Tomcat is the default server. When Tomcat reads a multipart/form-data request, it saves the uploaded content to a temporary file on disk and then exposes it through FileItem. The server therefore never reads the complete request body into memory at once and does not use much memory.

There are several ways to declare a file upload in Feign:

interface SomeApi {

  // File parameter
  @RequestLine("POST /send_photo")
  @Headers("Content-Type: multipart/form-data")
  void sendPhoto (@Param("is_public") Boolean isPublic, @Param("photo") File photo);

  // byte[] parameter
  @RequestLine("POST /send_photo")
  @Headers("Content-Type: multipart/form-data")
  void sendPhoto (@Param("is_public") Boolean isPublic, @Param("photo") byte[] photo);

  // FormData parameter
  @RequestLine("POST /send_photo")
  @Headers("Content-Type: multipart/form-data")
  void sendPhoto (@Param("is_public") Boolean isPublic, @Param("photo") FormData photo);
    
  // MultipartFile parameter
  @RequestLine("POST /send_photo")
  @Headers("Content-Type: multipart/form-data")
  void sendPhoto(@RequestPart(value = "photo") MultipartFile photo);
    
  // Group all parameters within a POJO
  @RequestLine("POST /send_photo")
  @Headers("Content-Type: multipart/form-data")
  void sendPhoto (MyPojo pojo);

  class MyPojo {

    @FormProperty("is_public")
    Boolean isPublic;

    File photo;
  }
}

Feign abstracts the encoding/serialization of parameters into an Encoder. For HTTP file uploads, Feign also offers the feign-form module, which provides FormEncoder implementations. However, whichever FormEncoder is used, the Output object it writes to is not the socket's output stream but a data holder that stores the encoded data in a ByteArrayOutputStream.

So no matter how the FormEncoder is configured, the data is written into this Output's ByteArrayOutputStream. All of it still ends up in memory, so the client again has a high memory footprint.

@RequiredArgsConstructor
@FieldDefaults(level = PRIVATE, makeFinal = true)
public class Output implements Closeable {

  ByteArrayOutputStream outputStream = new ByteArrayOutputStream();

  // All data is still written to ByteArrayOutputStream after encoding
  public Output write (byte[] bytes) {
    outputStream.write(bytes);
    return this;
  }

 
  public Output write (byte[] bytes, int offset, int length) {
    outputStream.write(bytes, offset, length);
    return this;
  }

  public byte[] toByteArray () {
    return outputStream.toByteArray();
  }
}

Fortunately, Feign is only an HTTP Client and the Server reads “incrementally”, so there is no memory problem on the Server side.
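For contrast with the buffered Output, a client can hand the HTTP library a file-backed request body so that the file is read as the request is written. The following is a minimal sketch assuming JDK 11+ and its built-in `java.net.http` API rather than Feign; `StreamingUploadRequest`, `tempFileOfSize` and the localhost URL are hypothetical, and the body is sent as raw bytes, not multipart/form-data. The example only builds the request; sending it needs an `HttpClient` and a real server.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.file.Files;
import java.nio.file.Path;

class StreamingUploadRequest {

    /** Creates a temporary file of the given size, for demonstration only. */
    static Path tempFileOfSize(int size) {
        try {
            Path file = Files.createTempFile("photo", ".bin");
            file.toFile().deleteOnExit();
            return Files.write(file, new byte[size]);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Builds a POST whose body is backed by the file itself, so no byte[]
     * copy of the whole file is created by this code.
     */
    static HttpRequest build(URI target, Path file) {
        try {
            return HttpRequest.newBuilder(target)
                    .header("Content-Type", "application/octet-stream")
                    .POST(HttpRequest.BodyPublishers.ofFile(file))
                    .build();
        } catch (IOException e) { // ofFile fails if the file does not exist
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = tempFileOfSize(3);
        // Hypothetical endpoint; nothing is sent in this example.
        HttpRequest request = build(URI.create("http://localhost:8080/send_photo"), file);
        long length = request.bodyPublisher()
                .map(HttpRequest.BodyPublisher::contentLength)
                .orElse(-1L);
        System.out.println(request.method() + " " + request.uri() + " (" + length + " bytes)");
    }
}
```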

Conclusion

Dubbo is not suitable for transferring files or large messages. Its design is better suited to small business payloads (the default payload limit is only 8 MB).

So if you have a file upload scenario, have the client upload directly to the storage service whenever possible. It is simpler and saves resources!

