Preface

After studying Uncle Long's article "The interviewer asked me about TCP sticky packets many times, so why did I keep failing?" and seeing readers ask for details, I felt it was necessary to connect Uncle Long's theory with concrete code to form a complete solution. Before reading this article, please read Uncle Long's article as the theoretical basis.

Basic concepts

  1. TCP is essentially a byte stream. In principle it has no concept of a packet; the underlying TCP segments are transparent to application programmers.

  2. The "sticky packet" idea mixes up the lower-layer implementation (packets) with the stream abstraction that the upper layer sees.

  3. The sticky packet problem is essentially the problem of determining message boundaries within the data stream.
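To make the point concrete, here is a minimal sketch that is not part of the original article. It assumes a local stream socketpair as a stand-in for a TCP connection, and the function name demo_stream_has_no_boundaries is made up for illustration. Two separate send calls write "hello" and "world", and a single recv is free to return them glued together.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical demo: write two "messages", then read once.
 * Returns the number of bytes the first recv() delivered, or -1 on error.
 * Assumption: an AF_UNIX stream socketpair behaves like a TCP byte stream here.
 */
static ssize_t demo_stream_has_no_boundaries(char *out, size_t out_len)
{
    int sv[2];
    ssize_t n;

    assert(out != NULL && out_len >= 10);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* Two separate application-level messages... */
    if (send(sv[0], "hello", 5, 0) != 5 || send(sv[0], "world", 5, 0) != 5) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* ...but the receiver only sees bytes; nothing marks where "hello" ends. */
    n = recv(sv[1], out, out_len, 0);

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

The single recv often returns all 10 bytes as "helloworld", but how the bytes are split across calls is up to the kernel, so the receiver must never rely on it.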

Several typical methods for determining boundaries

1. Fixed-length method: generally used in simple private protocols; it simplifies the processing flow and is easy to implement.

Before communicating, the two sides agree in advance (for example through a third party) on the length of each packet to be sent.

  • Blocking send and receive:

    Send: send(fd, wr_data_buf, wr_data_len, 0); /* wr_data_buf: data buffer; wr_data_len: preset fixed length */
    Receive: recv(fd, wr_data_buf, wr_data_len, 0);

    As the code above shows, sending and receiving simply call the socket API (in practice a single recv may return fewer bytes than requested, so it has to be repeated until wr_data_len bytes have arrived). This is easy to write, but has the following problems:

    • It easily blocks the main process, causing extra process scheduling and timeouts the system cannot control;
    • This can be mitigated with separate processes or threads, but that brings complex synchronization logic;
    • It cannot cope with large-scale scenarios in which both sides send and receive at the same time.
  • Non-blocking send and receive: the coding is somewhat more complicated, but it solves the problems caused by blocking mode and is the mainstream approach today.

    • The flow chart of the sending side looks like this:

      [Figure: flow chart of Alice's sending end, fixed-length method]

      • Notes:

        • The flow chart assumes that the preset fixed length to send is 1024 bytes.
        • If epoll level-triggered mode is used, call alice_send_data in the EPOLLOUT callback; the repeated callbacks implicitly implement the 3->4->3 loop.
        • If epoll edge-triggered mode is used, the 3->4->3 loop must be implemented explicitly, by calling alice_send_epoll in the EPOLLOUT callback.
      • The pseudocode looks like this:

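         #include <errno.h>
         #include <sys/socket.h>
         #include <sys/types.h>

         #define ALICE_FIXED_LEN 1024 /* preset fixed length of one message */

         /*
          * Try to send the rest of the fixed-length message without blocking.
          * *off counts the bytes already sent and must persist between calls.
          * Returns 1 when the whole message is sent, 0 after partial progress,
          * 2 when the socket would block (wait for the next EPOLLOUT), -1 on error.
          */
         static int alice_send_data(int fd, const char *wr_data_buf, size_t *off)
         {
             ssize_t n;

             if (*off >= ALICE_FIXED_LEN)
                 return 1;
             n = send(fd, wr_data_buf + *off, ALICE_FIXED_LEN - *off, MSG_DONTWAIT);
             if (n < 0) {
                 if (errno == EINTR)
                     return 0; /* interrupted: just call again */
                 if (errno == EAGAIN || errno == EWOULDBLOCK)
                     return 2; /* send buffer full */
                 return -1; /* real error, e.g. the connection was reset */
             }
             *off += (size_t)n; /* remember how many bytes have been sent so far */
             return *off < ALICE_FIXED_LEN ? 0 : 1;
         }

         /* Edge mode: keep sending until done, an error occurs, or the socket would block. */
         static int alice_send_epoll(int fd, const char *wr_data_buf, size_t *off)
         {
             int rc;

             do {
                 rc = alice_send_data(fd, wr_data_buf, off);
             } while (rc == 0);
             return rc;
         }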
    • The flow chart at the receiving end looks like this:

      [Figure: flow chart of Bob's receiving end, fixed-length method]

      As you can see, the receiving part is very similar to the sending part, so this article does not repeat the code.
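For readers who want it anyway, the receiving side could be sketched as follows. This is not code from the original article: the name bob_recv_data, the BOB_FIXED_LEN constant and the return-code convention are assumptions made for illustration, chosen to mirror alice_send_data.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

#define BOB_FIXED_LEN 1024 /* preset fixed length, as assumed by the flow chart */

/*
 * Hypothetical receive-side counterpart of alice_send_data.
 * *off counts the bytes received so far and must persist between calls.
 * Returns 1 when the whole message has arrived, 0 after partial progress,
 * 2 when no data is available yet (wait for the next EPOLLIN),
 * -1 on error or when the peer closed the connection early.
 */
static int bob_recv_data(int fd, char *rd_data_buf, size_t *off)
{
    ssize_t n;

    assert(rd_data_buf != NULL && off != NULL);
    if (*off >= BOB_FIXED_LEN)
        return 1;
    n = recv(fd, rd_data_buf + *off, BOB_FIXED_LEN - *off, MSG_DONTWAIT);
    if (n < 0) {
        if (errno == EINTR)
            return 0; /* interrupted: just call again */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return 2; /* receive buffer is empty for now */
        return -1;
    }
    if (n == 0)
        return -1; /* peer closed before the full message arrived */
    *off += (size_t)n; /* remember how many bytes have been received so far */
    return *off < BOB_FIXED_LEN ? 0 : 1;
}
```

In level-triggered mode this would be called once per EPOLLIN callback; in edge-triggered mode it would be called in a loop for as long as it returns 0, in the same way as on the sending side.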

2. Variable-length method: used in both private and public protocols; slightly more complex than the fixed-length method, but more flexible.

Before communicating, the two sides agree on where the length field is located, so that the receiver can read it.

  • The sender knows the actual length of the data to be sent. It adds a four-byte length field, computes the total length, and sends everything in the fixed-length manner.

  • The receiver has to obtain the actual data length dynamically. Its flow chart looks like this:

    [Figure: flow chart of Bob's receiving end, variable-length method]

As we can see, at the receiving end the variable-length method is really the fixed-length method applied twice (first to the length field, then to the data), so it is more complicated than the fixed-length method. However, because the sender can choose the data length freely, so that every message may have a different size, it is more widely applicable.
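The two-step idea can be sketched as a pair of buffer-level helpers. This is a minimal illustration, not the article's own code: the names frame_encode and frame_try_decode are hypothetical, and sending the four-byte length in network byte order is an assumption, since the text does not specify the byte order.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LEN_FIELD_SIZE 4 /* four-byte length field, as in the text */

/* Sender side: write [length][payload] into out and return the total size.
 * out must have room for LEN_FIELD_SIZE + len bytes. */
static size_t frame_encode(const void *payload, uint32_t len, unsigned char *out)
{
    uint32_t be = htonl(len); /* assumption: length travels in network byte order */

    assert(out != NULL && (payload != NULL || len == 0));
    memcpy(out, &be, LEN_FIELD_SIZE);
    if (len > 0)
        memcpy(out + LEN_FIELD_SIZE, payload, len);
    return (size_t)LEN_FIELD_SIZE + len;
}

/*
 * Receiver side: inspect the bytes accumulated so far.
 * Step 1: wait until the 4-byte length field is complete.
 * Step 2: wait until that many payload bytes have arrived.
 * Returns the number of bytes occupied by one complete frame,
 * or 0 if more data is needed (payload and len are then left untouched).
 */
static size_t frame_try_decode(const unsigned char *buf, size_t have,
                               const unsigned char **payload, uint32_t *len)
{
    uint32_t be, body_len;

    assert(buf != NULL && payload != NULL && len != NULL);
    if (have < LEN_FIELD_SIZE)
        return 0;
    memcpy(&be, buf, LEN_FIELD_SIZE);
    body_len = ntohl(be);
    if (have - LEN_FIELD_SIZE < body_len)
        return 0;
    *payload = buf + LEN_FIELD_SIZE;
    *len = body_len;
    return (size_t)LEN_FIELD_SIZE + body_len;
}
```

The receiver would append every recv result to a buffer, call frame_try_decode repeatedly, and discard the consumed bytes after each complete frame. A real protocol would probably also reject lengths above some agreed maximum before waiting for the payload.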

3. Special-string method: used in both private and public protocols; more complex than the variable-length method, but it saves the length field in the header and is more flexible to process.

Before communicating, the two sides agree on a special string, such as '\r\n\r\n', so that the receiver can determine the boundary in the data stream.

  • The sender can send in the fixed-length manner.
  • The receiver has to keep searching all the data in the receive buffer for the special string. Its flow chart looks like this:
    [Figure: flow chart of Bob's receiving end, special-string method]
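The search at the receiving end might look like the sketch below. It is an illustration under stated assumptions rather than the article's code: the function name find_message_end and the scanned resume hint are hypothetical, and the delimiter is fixed to "\r\n\r\n" as in the example above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DELIM     "\r\n\r\n"
#define DELIM_LEN 4

/*
 * Search buf[0..len) for the delimiter.
 * Returns the length of the first complete message including the delimiter,
 * or 0 if the delimiter has not arrived yet.
 * *scanned is a resume hint: bytes before it were already searched earlier.
 */
static size_t find_message_end(const char *buf, size_t len, size_t *scanned)
{
    size_t i;

    assert(buf != NULL && scanned != NULL && *scanned <= len);
    /* Back up DELIM_LEN - 1 bytes: the delimiter may straddle two recv() calls. */
    i = *scanned >= DELIM_LEN - 1 ? *scanned - (DELIM_LEN - 1) : 0;
    for (; i + DELIM_LEN <= len; i++) {
        if (memcmp(buf + i, DELIM, DELIM_LEN) == 0) {
            *scanned = 0; /* caller removes the message and starts over */
            return i + DELIM_LEN;
        }
    }
    *scanned = len; /* nothing found; resume from here next time */
    return 0;
}
```

The caller appends each recv result to the buffer and calls this function again. Keeping the scanned hint avoids comparing the whole buffer from the beginning every time new data arrives.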

4. Hybrid method: used in public protocols to strike a better balance between flexibility and simplicity.

  • The special-string method has to compare all the data in the buffer against the string, which becomes inefficient when large amounts of data are sent. The hybrid method was developed for this reason. For example, the well-known HTTP protocol uses \r\n\r\n to determine the header boundary, and then gives the length of the data in the Content-Length field of the header.
  • It is generally implemented in two steps:
    • First send a short header, using \r\n\r\n to determine its boundary.
    • Then send the longer actual data, using the length obtained from the header to determine its boundary.

Readers can work out the flow chart and code by following the variable-length and special-string methods; they are not given here.
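As a starting point, the second step (reading the body length out of a header whose end has already been found with \r\n\r\n) could be sketched like this. The function name parse_content_length is hypothetical, and the sketch assumes a plain Content-Length field; it ignores other HTTP framing rules such as chunked transfer encoding.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/*
 * hdr[0..hdr_len) is a complete header, up to and including "\r\n\r\n".
 * Returns the Content-Length value, or -1 if the field is missing,
 * has no digits, or has more than 9 digits.
 */
static long parse_content_length(const char *hdr, size_t hdr_len)
{
    static const char name[] = "Content-Length:";
    const size_t name_len = sizeof name - 1;
    size_t i = 0;

    assert(hdr != NULL);
    while (i < hdr_len) {
        /* i is at the start of a header line; field names are case-insensitive */
        if (hdr_len - i >= name_len && strncasecmp(hdr + i, name, name_len) == 0) {
            long value = 0;
            size_t j = i + name_len;
            size_t digits = 0;

            while (j < hdr_len && (hdr[j] == ' ' || hdr[j] == '\t'))
                j++;
            while (j < hdr_len && isdigit((unsigned char)hdr[j])) {
                if (++digits > 9)
                    return -1; /* refuse huge values instead of overflowing */
                value = value * 10 + (hdr[j] - '0');
                j++;
            }
            return digits > 0 ? value : -1;
        }
        /* skip to the start of the next line */
        while (i < hdr_len && hdr[i] != '\n')
            i++;
        i++;
    }
    return -1;
}
```

Once the length is known, the receiver switches to the fixed-length method for the body, so no further string comparison is needed on the large part of the message.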

Conclusion

The conclusions of this article are as follows:

When we design a system, simplicity and flexibility pull against each other. The trade-off should follow the actual needs of the system: neither as flexible as possible nor as simple as possible.

TCP sticky packets are not a design flaw but a property of TCP's byte stream. The key question is how the receiver determines the boundaries within the data stream it receives. Because the sender always knows the length of the data it is about to send, the sending side can simply use the fixed-length method.