The TCP sticky-packet and half-packet problem


This question is also asked as: how do you deal with sticky packets, packet loss, or out-of-order packets in network communication?

First, we need to be clear about the characteristics of the TCP protocol:

Connection-oriented
Reliable
Byte-stream based
Full duplex

Reliable means that TCP numbers the data it sends and does everything it can not to lose any of it. Under normal conditions the application does not see packet loss or reordering: the TCP protocol stack ensures ordered, correct delivery through sequence numbers, acknowledgments, and retransmission.

So the problem that usually remains is the sticky-packet problem.

Concepts

What is a sticky packet?

A sticky packet occurs when the sender sends two or more packets in a row and the peer does not receive them one at a time. A single read may return more than one packet: one packet plus part of the next, or even several complete packets stuck together.

This is also where the half packet comes from. If the data received is only part of a packet, it is a half packet.

Whether it is a sticky packet or a half packet, the root cause is that TCP is a stream protocol. So the solution comes down to how to separate packets within the stream, or more precisely: how to determine packet boundaries in a byte stream that has no boundaries of its own.

The solution

There are three main methods:

Fixed-length packets

As the name suggests, every packet has the same fixed length. For example, if each protocol packet is 64 bytes, the receiver extracts and parses a packet each time it has collected 64 bytes. If fewer bytes have arrived, they are kept in a buffer until more of the TCP stream comes in.

This format is simple to implement, but it can waste space: when the payload is shorter than the fixed length, the remaining space has to be filled with padding characters, and the padding must be distinguishable from real data.

If the content is longer than the fixed length, it has to be split across several packets, which requires extra processing logic:

The sender must split the content into fixed-length packets
The receiver must reassemble the fragments
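
The receiving side of the fixed-length approach can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the class name FixedLengthFramer and its Feed/Pending methods are hypothetical, and a std::string stands in for the receive buffer. Padding and fragment reassembly are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from any library): buffers incoming bytes
// and cuts them into frames of a fixed size.
class FixedLengthFramer {
public:
    explicit FixedLengthFramer(std::size_t frame_size)
        : frame_size_(frame_size) {
        assert(frame_size > 0);
    }

    // Append newly received bytes and return every complete frame.
    // Bytes that do not yet fill a frame stay buffered for the next call.
    std::vector<std::string> Feed(const std::string& chunk) {
        buffer_ += chunk;
        std::vector<std::string> frames;
        std::size_t offset = 0;
        while (buffer_.size() - offset >= frame_size_) {
            frames.push_back(buffer_.substr(offset, frame_size_));
            offset += frame_size_;
        }
        buffer_.erase(0, offset);
        return frames;
    }

    // Number of buffered bytes that do not yet form a complete frame.
    std::size_t Pending() const { return buffer_.size(); }

private:
    std::size_t frame_size_;
    std::string buffer_;
};
```

Feed can be called with whatever each socket read returns; how the bytes were split on the wire does not affect the frames that come out.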

Packets with an end marker

This approach is common in protocol handling: when a particular marker value shows up in the byte stream, it is treated as the end of a packet.

For example, in the familiar FTP protocol and in SMTP for sending mail, a command or a piece of data is followed by "\r\n" (CRLF) to mark the end of a packet. The peer parses the incoming data stream, and when it reaches "\r\n" it assembles everything parsed so far into one packet and passes it to the processing function.

There is a catch, however: if the end-of-packet characters appear inside the packet content, they must be encoded or escaped so that the receiver does not mistake them for the end of the packet.
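
A receiver for the end-marker approach might look like the sketch below. The class name DelimiterFramer and its Feed method are hypothetical, chosen only for illustration. The sketch does no escaping, so it assumes the delimiter never appears inside the packet content.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from any library): splits a byte stream
// into packets that end with a given delimiter such as "\r\n".
class DelimiterFramer {
public:
    explicit DelimiterFramer(const std::string& delimiter)
        : delimiter_(delimiter) {
        assert(!delimiter.empty());
    }

    // Append newly received bytes and return every complete packet,
    // with the delimiter stripped. An unterminated tail stays buffered.
    std::vector<std::string> Feed(const std::string& chunk) {
        buffer_ += chunk;
        std::vector<std::string> packets;
        std::size_t start = 0;
        std::size_t end = 0;
        while ((end = buffer_.find(delimiter_, start)) != std::string::npos) {
            packets.push_back(buffer_.substr(start, end - start));
            start = end + delimiter_.size();
        }
        buffer_.erase(0, start);
        return packets;
    }

private:
    std::string delimiter_;
    std::string buffer_;
};
```

Because the search always runs over the whole buffer, a delimiter that is split across two reads ("\r" in one, "\n" in the next) is still found once both halves have arrived.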

Compare the Redis RESP protocol: a string that may span multiple lines is sent as a bulk string, which carries a length prefix so that the receiver can tell where the content ends.
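
As an illustration of the length prefix, a RESP bulk string has the form "$<length>\r\n<data>\r\n". The sketch below encodes and decodes that form. The function names are hypothetical, and it is deliberately simplified: it handles neither null bulk strings nor other RESP types, does not verify the trailing "\r\n", and does not distinguish malformed input from incomplete input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Encode a payload as a RESP bulk string: "$<length>\r\n<data>\r\n".
std::string EncodeBulkString(const std::string& data) {
    return "$" + std::to_string(data.size()) + "\r\n" + data + "\r\n";
}

// Try to decode one bulk string from the front of `buffer`.
// On success, stores the payload in *out, removes the consumed bytes
// from `buffer`, and returns true. Returns false if the data is
// incomplete or does not start with a bulk string header.
bool DecodeBulkString(std::string& buffer, std::string* out) {
    assert(out != nullptr);
    if (buffer.empty() || buffer[0] != '$') return false;
    std::size_t header_end = buffer.find("\r\n");
    if (header_end == std::string::npos || header_end == 1) return false;

    // Parse the decimal length between '$' and the first CRLF.
    std::size_t length = 0;
    for (std::size_t i = 1; i < header_end; ++i) {
        if (buffer[i] < '0' || buffer[i] > '9') return false;
        length = length * 10 + static_cast<std::size_t>(buffer[i] - '0');
    }

    std::size_t body_start = header_end + 2;
    std::size_t total = body_start + length + 2;  // body plus trailing CRLF
    if (buffer.size() < total) return false;      // half packet: wait for more

    *out = buffer.substr(body_start, length);
    buffer.erase(0, total);
    return true;
}
```

Since the receiver knows the length up front, the payload may itself contain "\r\n" without any escaping.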

Header + content

The header specifies the length of the content

The content stores the actual packet data

struct msg_header {
  int32_t bodySize;     // length of the content that follows, in bytes
  int32_t bodyContent;
};

The header has a fixed length (8 bytes here), so the receiver reads 8 bytes from the stream, parses the header, takes the content length from it, and then buffers that many bytes of content.

Summary

General-purpose network libraries usually do not provide this unpacking logic, because they have to support many different protocols and cannot know the packet format in advance. This is not always the case, though: some network libraries do provide it. Netty, for example:

DelimiterBasedFrameDecoder ⇒ splits packets on a special delimiter used as the terminator
ByteToMessageDecoder ⇒ base class for decoding the byte stream into messages, such as header + body packets
Developers can extend ByteToMessageDecoder to implement their own custom protocols

Protocol design

I will use the header + content design.

First, design the header:

// Protocol header
struct msg {
    // Size of the packet body in bytes (the header itself is not included)
    int32_t bodysize;
};

How to unpack:

// pBuffer is the connection's receive buffer, conn is the connection,
// and Process() is the business-logic handler; all are defined elsewhere.
while (true) {
    if (pBuffer.ReadBytes() < sizeof(msg)) {
        // Not enough data for a header yet
        return;
    }

    // Peek at the header without consuming it
    msg header;
    memcpy(&header, pBuffer.Peek(), sizeof(msg));

    // Invalid length: the client sent a bad packet, so the server
    // closes the connection immediately
    if (header.bodysize <= 0 || header.bodysize > MAX_PACKAGE_SIZE) {
        conn.Close();
        return;
    }

    // The data received so far is not enough for a complete packet
    if (pBuffer.ReadBytes() < header.bodysize + sizeof(msg))
        return;

    // Consume the header
    pBuffer.Retrieve(sizeof(msg));
    // inbuf holds the body of the packet currently being processed
    std::string inbuf;
    inbuf.append(pBuffer.Peek(), header.bodysize);
    pBuffer.Retrieve(header.bodysize);

    // Decode the packet and run the business logic
    if (!Process(conn, inbuf.c_str(), inbuf.size())) {
        // The client sent an invalid packet, so the server closes the connection
        conn.Close();
        return;
    }
}
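
For readers who want to try the idea without a network library, here is a self-contained sketch of both sides. It rests on several assumptions that are not from the article: the function names Pack and Unpack, a std::string as the receive buffer, the limit kMaxBodySize (the article does not give a value for MAX_PACKAGE_SIZE), and a 4-byte big-endian length written byte by byte instead of copying the struct with memcpy, so that both ends agree regardless of host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Assumed limit for illustration only.
const std::uint32_t kMaxBodySize = 1024 * 1024;
const std::size_t kHeaderSize = 4;

// Sender side: prepend a 4-byte big-endian body length to the body.
std::string Pack(const std::string& body) {
    assert(!body.empty() && body.size() <= kMaxBodySize);
    std::uint32_t n = static_cast<std::uint32_t>(body.size());
    std::string out;
    out.push_back(static_cast<char>((n >> 24) & 0xFF));
    out.push_back(static_cast<char>((n >> 16) & 0xFF));
    out.push_back(static_cast<char>((n >> 8) & 0xFF));
    out.push_back(static_cast<char>(n & 0xFF));
    out += body;
    return out;
}

// Receiver side: move every complete packet body from `buffer` into
// `bodies`. Incomplete data stays in `buffer` for the next call.
// Returns false if a header carries an invalid length, in which case
// the caller would typically close the connection.
bool Unpack(std::string& buffer, std::vector<std::string>& bodies) {
    while (buffer.size() >= kHeaderSize) {
        std::uint32_t n = 0;
        for (std::size_t i = 0; i < kHeaderSize; ++i) {
            n = (n << 8) | static_cast<unsigned char>(buffer[i]);
        }
        if (n == 0 || n > kMaxBodySize) return false;
        if (buffer.size() < kHeaderSize + n) break;  // half packet
        bodies.push_back(buffer.substr(kHeaderSize, n));
        buffer.erase(0, kHeaderSize + n);
    }
    return true;
}
```

The receiver appends each socket read to the buffer and then calls Unpack. Sticky packets come out as several bodies in one call, while a half packet simply waits in the buffer until the rest arrives.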
