I recently built my personal blog, Tobe's Delirium. New articles will be posted both there and on my official account, so feel free to bookmark it!

Last time I covered the UDP protocol. Starting with this article, I will cover TCP. TCP involves so much material that one article cannot cover it all, so I have split it into several parts and will tackle them one by one.

TCP segment structure

When you think of TCP, the words that come to mind are “connection-oriented” and “reliable.” Yes, TCP is designed to establish a reliable connection between a client and a server.

Before we talk about the connection process, let's look at the structure of a TCP segment, which shows what information TCP can carry:
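To make the layout concrete, here is a minimal Python sketch that packs and unpacks the fixed 20-byte part of the header using the standard `struct` module. The names `TcpHeader`, `pack_header`, and `unpack_header` are my own illustrations, not a library API. The sketch ignores options and leaves the checksum at 0, because computing the real checksum also needs a pseudo-header containing the IP addresses.

```python
import struct
from dataclasses import dataclass

# Fixed 20-byte TCP header in network byte order:
# src port, dst port, sequence number, acknowledgment number,
# data offset + reserved + flags, window, checksum, urgent pointer
TCP_HEADER = struct.Struct("!HHIIHHHH")

@dataclass
class TcpHeader:
    src_port: int
    dst_port: int
    seq: int
    ack: int
    data_offset: int  # header length in 32-bit words (5..15)
    flags: int        # low 8 bits: CWR ECE URG ACK PSH RST SYN FIN
    window: int
    checksum: int
    urgent_ptr: int

    @property
    def header_len_bytes(self) -> int:
        return self.data_offset * 4

def pack_header(h: TcpHeader) -> bytes:
    # Data offset occupies the top 4 bits; reserved bits are left as 0.
    offset_flags = (h.data_offset << 12) | (h.flags & 0xFF)
    return TCP_HEADER.pack(h.src_port, h.dst_port, h.seq, h.ack,
                           offset_flags, h.window, h.checksum, h.urgent_ptr)

def unpack_header(raw: bytes) -> TcpHeader:
    (src, dst, seq, ack, offset_flags,
     window, checksum, urg) = TCP_HEADER.unpack_from(raw)
    data_offset = offset_flags >> 12
    if data_offset < 5:
        raise ValueError("data offset must be at least 5 (20 bytes)")
    return TcpHeader(src, dst, seq, ack, data_offset,
                     offset_flags & 0xFF, window, checksum, urg)

# A hypothetical SYN segment header with no options
syn = TcpHeader(src_port=50000, dst_port=80, seq=1000, ack=0, data_offset=5,
                flags=0x02, window=65535, checksum=0, urgent_ptr=0)
raw = pack_header(syn)
print(len(raw))                 # 20
print(unpack_header(raw) == syn)  # True
```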

Here are a few things to note:

  • TCP identifies a connection by a four-tuple (source IP, source port, destination IP, and destination port), which distinguishes it from UDP. The IP addresses are carried in the IP header; the TCP header itself contains no IP address information.
  • The basic TCP header is 20 bytes long, but because the "options" field has a variable length, a "header length" field is needed to specify the actual header length. Note that this field counts in units of 32 bits (4 bytes), so its minimum value is 5; as a 4-bit field, its maximum of 15 caps the header at 60 bytes.
  • The orange fields (acknowledgment number, receive window size, ECE, ACK) are used to "reply" to the other side. For example, instead of sending a separate acknowledgment packet, the server may wait a little while and attach the acknowledgment to the next segment it sends to the client; this is known as piggybacking.
  • The window size is a 16-bit unsigned number, so the window is limited to 65535 bytes. This caps TCP throughput and is a poor fit for high-bandwidth, high-latency networks (think about why). Fortunately, TCP provides an additional Window Scale option that allows this value to be scaled up.
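On the "think about why" question: roughly speaking, a sender can have at most one window of unacknowledged data in flight per round trip, so throughput is capped at about window / RTT. The Python sketch below (the function names are hypothetical) computes that bound, plus the window after applying a Window Scale shift count, which RFC 7323 limits to 14.

```python
MAX_WINDOW_SCALE = 14  # largest shift count RFC 7323 allows

def effective_window(window_field: int, shift: int = 0) -> int:
    """Receive window in bytes after applying the Window Scale shift count."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("the window field is a 16-bit unsigned number")
    if not 0 <= shift <= MAX_WINDOW_SCALE:
        raise ValueError("shift count must be between 0 and 14")
    return window_field << shift

def max_throughput_bps(window_bytes: int, rtt_ms: float) -> float:
    """Rough upper bound: at most one full window delivered per round trip."""
    return window_bytes * 8 * 1000 / rtt_ms

# Without scaling, a path with 100 ms RTT tops out around 5.2 Mbit/s
print(max_throughput_bps(effective_window(65535), rtt_ms=100))  # 5242800.0
# With the maximum shift, the window grows to roughly 1 GiB
print(effective_window(65535, shift=14))  # 1073725440
```

For comparison, keeping a 1 Gbit/s path with a 100 ms RTT busy needs about 12.5 MB in flight, far more than 65535 bytes.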

Here are the meanings of the eight flag bits. Older versions of the protocol may not have the first two:

There are many flag bits, but their roles are easy to understand once you see them in a specific scenario.
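For reference, here is a small Python sketch that decodes the 8-bit flags field. The bit values follow the standard layout, with CWR and ECE (the two flags added for Explicit Congestion Notification in RFC 3168) in the two most significant positions; `TcpFlag` and `describe` are illustrative names.

```python
from enum import IntFlag

class TcpFlag(IntFlag):
    FIN = 0x01  # sender has finished sending data
    SYN = 0x02  # synchronize sequence numbers (connection setup)
    RST = 0x04  # reset the connection
    PSH = 0x08  # push buffered data to the application
    ACK = 0x10  # the acknowledgment number field is valid
    URG = 0x20  # the urgent pointer field is valid
    ECE = 0x40  # ECN-Echo (RFC 3168)
    CWR = 0x80  # Congestion Window Reduced (RFC 3168)

def describe(flags_byte: int) -> str:
    """List the names of the flags set in a TCP flags byte."""
    names = [f.name for f in TcpFlag if f & flags_byte]
    return "|".join(names) or "none"

print(describe(0x12))  # SYN|ACK  (second handshake)
print(describe(0x11))  # FIN|ACK
```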

TCP three-way handshake

The purpose of the three-way handshake is to establish a connection between the client and the server. This process is not complicated, but there are many details to pay attention to.

This is the handshake process. You can see that three messages are exchanged between the client and the server; through them, the two machines confirm each other's state. Let's go through them one at a time.

First handshake

The client initiates the connection by setting the SYN flag (SYN = 1) in its first segment, which marks it as a SYN segment (segment 1). This segment tells the server that the client's initial sequence number is client_ISN. It also implicitly tells the server which port the client wants to connect to, which is not shown in the diagram. In addition, the client sends a few options, but they are not relevant to the three-way handshake itself.

The most important thing to note about segment 1 is client_ISN, the initial sequence number. RFC 793[^1] states:

When new connections are created, an initial sequence number (ISN) generator is employed which selects a new 32 bit ISN. The generator is bound to a (possibly fictitious) 32 bit clock whose low order bit is incremented roughly every 4 microseconds. Thus, the ISN cycles approximately every 4.55 hours.

In other words, the initial sequence number comes from a 32-bit (possibly virtual) counter that increments by one roughly every 4 microseconds, so the ISN repeats about every 4.55 hours. This keeps the sequence numbers of a new connection from overlapping with those of an old one.
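A quick sanity check on the arithmetic, as a Python sketch of the clock-driven generator the RFC describes. `clock_isn` is a made-up name, and elapsed time is given in whole microseconds to avoid floating-point rounding. Done exactly, 2^32 ticks of 4 µs come to about 4.77 hours, so the RFC's 4.55 hours is a looser approximation.

```python
TICK_US = 4              # the low-order bit increments roughly every 4 microseconds
ISN_MODULUS = 2 ** 32    # the ISN is a 32-bit value

def clock_isn(elapsed_us: int) -> int:
    """ISN read from a 32-bit counter that ticks once every TICK_US microseconds."""
    return (elapsed_us // TICK_US) % ISN_MODULUS

# Time for the counter to wrap around once
cycle_seconds = ISN_MODULUS * TICK_US / 1_000_000
print(round(cycle_seconds / 3600, 2))  # 4.77
```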

Even so, this scheme is a security risk. Because the ISN is still predictable, a malicious program can analyze previously used ISNs, predict the ISNs of subsequent TCP connections, and then launch an attack. A well-known example is “The Mitnick Attack”[^2]. Here is an excerpt:

Mitnick sent SYN request to X-Terminal and received SYN/ACK response. Then he sent RESET response to keep the X-Terminal from being filled up. He repeated this for twenty times. He found there is a pattern between two successive TCP sequence numbers. It turned out that the numbers were not random at all. The latter number was greater than the previous one by 128000.
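The pattern Mitnick found is exactly what makes a non-random ISN dangerous: once consecutive ISNs differ by a constant step, the next one can be computed. Here is a toy Python sketch; `predict_next_isn` is hypothetical, and the observed values are invented except for the 128000 step quoted above.

```python
ISN_MODULUS = 2 ** 32

def predict_next_isn(observed: list[int]) -> int:
    """Guess the next ISN if consecutive observed ISNs differ by a constant step."""
    if len(observed) < 2:
        raise ValueError("need at least two observed ISNs")
    steps = {(b - a) % ISN_MODULUS for a, b in zip(observed, observed[1:])}
    if len(steps) != 1:
        raise ValueError("no constant step; this naive predictor gives up")
    step = steps.pop()
    return (observed[-1] + step) % ISN_MODULUS

# Hypothetical ISNs that grow by 128000 per connection, as in the excerpt
print(predict_next_isn([1_000, 129_000, 257_000]))  # 385000
```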

To make the initial sequence number harder to predict, modern systems usually choose it with a partly random method; I won't go into the details here.
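One well-known approach, described in RFC 6528, keeps the RFC 793 clock but adds a keyed hash of the connection's four-tuple: ISN = M + F(localip, localport, remoteip, remoteport, secretkey). Below is a Python sketch of that idea. It swaps in HMAC-SHA256 as the pseudorandom function F, and `isn_for` and the per-boot `SECRET_KEY` are illustrative assumptions, not any particular kernel's code. Without the secret, an attacker cannot compute the offset for someone else's four-tuple from the ISNs of their own connections.

```python
import hashlib
import hmac
import os
import time
from typing import Optional

ISN_MODULUS = 2 ** 32
SECRET_KEY = os.urandom(16)  # assumption: a random secret chosen once at boot

def isn_for(local_ip: str, local_port: int, remote_ip: str, remote_port: int,
            now_us: Optional[int] = None, key: bytes = SECRET_KEY) -> int:
    """ISN = M + F(four-tuple, key) mod 2**32, in the spirit of RFC 6528."""
    if now_us is None:
        now_us = time.monotonic_ns() // 1000
    m = now_us // 4  # the RFC 793 clock: one tick per 4 microseconds
    four_tuple = f"{local_ip}|{local_port}|{remote_ip}|{remote_port}".encode()
    digest = hmac.new(key, four_tuple, hashlib.sha256).digest()
    f = int.from_bytes(digest[:4], "big")  # 32-bit secret per-tuple offset
    return (m + f) % ISN_MODULUS
```

Within one four-tuple the ISN still advances with the clock, which is what keeps old and new sequence numbers apart; across different four-tuples the offsets look unrelated.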

Second handshake

When the server receives the client's connection request, it sends an ACK back to indicate that the request was received. The server must also tell the client its own initial sequence number. These are logically two steps, but they can be done in a single segment using the piggybacking technique described above. In the figure, ACK = client_ISN + 1 is the value of the acknowledgment number field, which should not be confused with the ACK flag bit. The + 1 is there because the SYN flag itself consumes one sequence number.
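Because sequence numbers are 32-bit, the "+ 1" wraps around. A minimal Python sketch of how the server could fill in the two numbers of its SYN-ACK (`syn_ack_numbers` is an illustrative name):

```python
SEQ_MODULUS = 2 ** 32

def syn_ack_numbers(client_isn: int, server_isn: int) -> tuple[int, int]:
    """Return (sequence number, acknowledgment number) for the server's SYN-ACK.

    The sequence number announces the server's own ISN; the acknowledgment
    number is client_ISN + 1 because the client's SYN consumed one sequence
    number. Both wrap modulo 2**32.
    """
    return server_isn % SEQ_MODULUS, (client_isn + 1) % SEQ_MODULUS

print(syn_ack_numbers(client_isn=1000, server_isn=5000))    # (5000, 1001)
print(syn_ack_numbers(client_isn=2**32 - 1, server_isn=7))  # (7, 0)
```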

The acknowledgment number has several more subtleties, but they are easier to explain together with sliding windows, so I won't cover them here.

Note that when a SYN segment arrives, the server checks whether the number of connections in the SYN_RCVD state exceeds the tcp_max_syn_backlog parameter, and if so, rejects the connection. Of course, attackers can exploit this too; SYN Flood is a classic example. After sending a SYN-ACK, the server waits for the client's ACK. If it does not receive that ACK within a certain time, it assumes the packet was lost and resends the SYN-ACK; after several retries, it gives up on the connection. An attacker sends a flood of SYNs (typically with spoofed source addresses) and never completes the handshake, so each half-open connection holds a slot for the whole retry period. The server's SYN queue is quickly exhausted, and normal connection requests go unanswered for some time.
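To see why the queue runs out, here is a toy Python model; it is not how any kernel actually stores half-open connections. `SynQueue` is hypothetical, `max_backlog` stands in for `tcp_max_syn_backlog`, and `timeout_s` models the point where the server gives up retransmitting its SYN-ACK (the 60-second value in the example is arbitrary).

```python
class SynQueue:
    """Toy model of a server's queue of half-open (SYN_RCVD) connections."""

    def __init__(self, max_backlog: int, timeout_s: float):
        self.max_backlog = max_backlog  # stands in for tcp_max_syn_backlog
        self.timeout_s = timeout_s      # when SYN-ACK retransmissions give up
        self.half_open = {}             # four-tuple -> time the SYN arrived

    def expire(self, now: float) -> None:
        """Forget half-open entries whose handshake never completed in time."""
        for key, arrived in list(self.half_open.items()):
            if now - arrived >= self.timeout_s:
                del self.half_open[key]

    def on_syn(self, four_tuple: tuple, now: float) -> bool:
        """Return True if a SYN-ACK is sent, False if the SYN is dropped."""
        self.expire(now)
        if len(self.half_open) >= self.max_backlog:
            return False
        self.half_open[four_tuple] = now
        return True

    def on_ack(self, four_tuple: tuple) -> bool:
        """The final ACK completes the handshake and frees the queue slot."""
        return self.half_open.pop(four_tuple, None) is not None

# Spoofed SYNs that are never acknowledged fill the queue...
q = SynQueue(max_backlog=4, timeout_s=60)
for i in range(4):
    q.on_syn((f"203.0.113.{i}", 1234, "192.0.2.1", 80), now=0)
legit = ("198.51.100.7", 40000, "192.0.2.1", 80)
print(q.on_syn(legit, now=1))   # False: the legitimate client is ignored
print(q.on_syn(legit, now=61))  # True: the stale entries have expired
```

A real attacker simply keeps sending SYNs faster than entries expire, so the queue never drains.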

While the queue is full, the server is effectively silenced for legitimate clients. To defend against SYN Flood attacks, the server can use SYN cookies. With SYN cookies, the server does not allocate memory when a SYN arrives; instead, it encodes the connection information into the sequence number field of its SYN-ACK. If the client replies, the server recovers that information from the acknowledgment number in the ACK and allocates memory for the connection only after the check succeeds. This way, the attacker's half-open requests no longer use up server resources, and normal connections are not affected.
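Here is a simplified Python sketch of the idea, loosely following the layout D. J. Bernstein described for SYN cookies: 5 bits of a slow time counter, 3 bits encoding the MSS, and 24 bits of keyed hash. Real implementations such as Linux's differ in the details, and `make_cookie`, `check_cookie`, `SECRET`, and the MSS table are assumptions for illustration. Note that the server stores nothing between the SYN and the ACK.

```python
import hashlib
import hmac
import os
from typing import Optional

SECRET = os.urandom(16)               # assumption: per-server secret key
MSS_TABLE = (536, 1220, 1440, 1460)   # assumption: MSS values encodable in 3 bits
SEQ_MODULUS = 2 ** 32

def _hash24(four_tuple: tuple, t: int) -> int:
    """24-bit keyed hash of the connection's four-tuple and time counter."""
    msg = repr((four_tuple, t)).encode()
    return int.from_bytes(hmac.new(SECRET, msg, hashlib.sha256).digest()[:3], "big")

def make_cookie(four_tuple: tuple, client_mss: int, now_s: int) -> int:
    """Build the server ISN for the SYN-ACK; nothing is stored for this SYN."""
    t = now_s // 64  # slow counter, one tick per 64 seconds
    # Largest table entry not exceeding the client's MSS (smallest as fallback)
    idx = max((i for i, m in enumerate(MSS_TABLE) if m <= client_mss), default=0)
    return ((t % 32) << 27) | (idx << 24) | _hash24(four_tuple, t)

def check_cookie(four_tuple: tuple, ack_number: int, now_s: int) -> Optional[int]:
    """Validate the client's ACK; return the MSS to use, or None if invalid."""
    cookie = (ack_number - 1) % SEQ_MODULUS  # the ACK acknowledges cookie + 1
    idx = (cookie >> 24) & 0x7
    t_now = now_s // 64
    for t in (t_now, t_now - 1):  # accept cookies from the current or previous tick
        if ((cookie >> 27) == t % 32 and idx < len(MSS_TABLE)
                and (cookie & 0xFFFFFF) == _hash24(four_tuple, t)):
            return MSS_TABLE[idx]
    return None

ft = ("198.51.100.7", 40000, "192.0.2.1", 80)
cookie = make_cookie(ft, client_mss=1460, now_s=1000)
print(check_cookie(ft, (cookie + 1) % SEQ_MODULUS, now_s=1010))  # 1460
```

The 3-bit MSS field also shows one limitation: a client offering an MSS of 1000 would be rounded down to 536 here.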

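However, SYN cookies have their own limitations (for example, the classic scheme can encode only a few MSS values in the cookie, and other options sent in the SYN are lost), so they are not suitable as the default for every connection. If you're interested, the details are worth looking up.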

Third handshake

This is the last step in establishing a TCP connection. After the first two handshakes, each side already knows the other's sliding window size, initial sequence number, and so on. So why is a third handshake needed?

Because although the server has sent its SYN-ACK, it does not know whether the client received it. The server therefore waits for the client to return an ACK confirming receipt, and only then is the connection established.
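The point of the third handshake shows up clearly in a toy state machine: the server stays in SYN_RCVD until an ACK that acknowledges its own ISN + 1 arrives. A minimal Python sketch (class names and fields are illustrative; there are no options, data, or retransmissions):

```python
from dataclasses import dataclass
from typing import Optional

SEQ_MODULUS = 2 ** 32

@dataclass
class Segment:
    syn: bool
    ack_flag: bool  # the ACK flag bit
    seq: int
    ack: int = 0    # the acknowledgment number field

class Client:
    def __init__(self, isn: int):
        self.isn, self.state = isn, "CLOSED"

    def connect(self) -> Segment:  # handshake 1: SYN
        self.state = "SYN_SENT"
        return Segment(syn=True, ack_flag=False, seq=self.isn)

    def on_syn_ack(self, seg: Segment) -> Optional[Segment]:  # handshake 3: ACK
        if (self.state == "SYN_SENT" and seg.syn and seg.ack_flag
                and seg.ack == (self.isn + 1) % SEQ_MODULUS):
            self.state = "ESTABLISHED"
            return Segment(syn=False, ack_flag=True,
                           seq=(self.isn + 1) % SEQ_MODULUS,
                           ack=(seg.seq + 1) % SEQ_MODULUS)
        return None

class Server:
    def __init__(self, isn: int):
        self.isn, self.state = isn, "LISTEN"

    def on_syn(self, seg: Segment) -> Segment:  # handshake 2: SYN-ACK
        self.state = "SYN_RCVD"
        return Segment(syn=True, ack_flag=True, seq=self.isn,
                       ack=(seg.seq + 1) % SEQ_MODULUS)

    def on_ack(self, seg: Segment) -> None:
        # Only an ACK of our own SYN proves the client got the SYN-ACK.
        if (self.state == "SYN_RCVD" and seg.ack_flag
                and seg.ack == (self.isn + 1) % SEQ_MODULUS):
            self.state = "ESTABLISHED"

client, server = Client(isn=1000), Server(isn=5000)
final_ack = client.on_syn_ack(server.on_syn(client.connect()))
print(client.state, server.state)  # ESTABLISHED SYN_RCVD
server.on_ack(final_ack)
print(server.state)                # ESTABLISHED
```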

Once the connection is established, data transfer involves many more techniques, which I will cover in another article.

TCP four-way wave

With the three-way handshake as a foundation, the four-way wave is easy to understand:

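The four-way wave is simple: the client and server exchange FIN and ACK segments to tell each other that they are disconnecting. Each side sends a FIN when it has finished sending and acknowledges the other side's FIN with an ACK, which makes four segments in total.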
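The two sides pass through different states during the four-way wave. Here is a compact Python sketch of the transitions, simplified from the RFC 793 state diagram: simultaneous close is left out, and the event names are made up for illustration.

```python
# (state, event) -> next state for each side; simplified, no simultaneous close
ACTIVE_CLOSER = {
    ("ESTABLISHED", "app_close/send_FIN"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "recv_ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv_FIN/send_ACK"): "TIME_WAIT",
    ("TIME_WAIT", "2MSL_timeout"): "CLOSED",
}
PASSIVE_CLOSER = {
    ("ESTABLISHED", "recv_FIN/send_ACK"): "CLOSE_WAIT",
    ("CLOSE_WAIT", "app_close/send_FIN"): "LAST_ACK",
    ("LAST_ACK", "recv_ACK"): "CLOSED",
}

def run(table: dict, events: list[str], state: str = "ESTABLISHED") -> list[str]:
    """Apply events in order; a KeyError means the event is invalid in that state."""
    history = [state]
    for event in events:
        state = table[(state, event)]
        history.append(state)
    return history

print(run(ACTIVE_CLOSER, ["app_close/send_FIN", "recv_ACK",
                          "recv_FIN/send_ACK", "2MSL_timeout"]))
# ['ESTABLISHED', 'FIN_WAIT_1', 'FIN_WAIT_2', 'TIME_WAIT', 'CLOSED']
print(run(PASSIVE_CLOSER, ["recv_FIN/send_ACK", "app_close/send_FIN", "recv_ACK"]))
# ['ESTABLISHED', 'CLOSE_WAIT', 'LAST_ACK', 'CLOSED']
```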

One notable aspect of the four-way wave is the TIME_WAIT state: the side that actively closes the connection must wait 2MSL after receiving the other side's FIN before the connection is fully closed. (MSL, the maximum segment lifetime, is the longest time a segment is allowed to exist in the network.) But why not just close the connection right away?

One reason is that the fourth (final) ACK may not reach the server. If it is lost, the server, still in the LAST_ACK state, keeps resending its FIN until it receives an ACK. The client therefore waits for a while in case it needs to acknowledge again. If the ACK is lost, the retransmitted FIN reaches the client within 2MSL, so the client can still resend the ACK before its side of the connection closes.
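This first reason is easy to model. Per RFC 793, a connection in TIME_WAIT that receives a retransmitted FIN acknowledges it again and restarts the 2MSL timeout. A toy Python sketch follows; `TimeWaitSocket` is hypothetical, and the 30-second MSL in the example is arbitrary (RFC 793 suggests 2 minutes, and real stacks choose their own values).

```python
class TimeWaitSocket:
    """Toy model of the active closer sitting in TIME_WAIT."""

    def __init__(self, msl_s: float, entered_at: float):
        self.msl_s = msl_s
        self.deadline = entered_at + 2 * msl_s  # close after 2MSL of quiet
        self.acks_sent = 0

    def is_closed(self, now: float) -> bool:
        return now >= self.deadline

    def on_fin(self, now: float) -> bool:
        """Handle a retransmitted FIN; return True if it is acknowledged again."""
        if self.is_closed(now):
            return False  # connection already gone; the FIN cannot be ACKed
        self.acks_sent += 1
        self.deadline = now + 2 * self.msl_s  # RFC 793: restart the 2MSL timeout
        return True

s = TimeWaitSocket(msl_s=30, entered_at=0)
print(s.on_fin(now=10))     # True: the lost ACK is re-sent, timer restarts
print(s.is_closed(now=65))  # False: the deadline moved to 70
print(s.is_closed(now=70))  # True
```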

Another reason is that after 2MSL, all segments belonging to the old connection have disappeared from the network and cannot interfere with a new one. For example, suppose the client opens a new connection to the server with the same four-tuple while some delayed data from the old connection is still in transit. If that data arrives after the new connection is established and its sequence number falls within the new connection's sliding window, the server will mistakenly accept it as data for the new connection, as shown in the following figure:

The 2MSL mechanism avoids this.
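The danger comes from the receiver's acceptance check: roughly, a segment is accepted if its sequence number falls in [RCV.NXT, RCV.NXT + RCV.WND), compared modulo 2^32. Here is a simplified Python sketch of that check (it looks only at the segment's first byte; RFC 793's full test also handles zero-length segments and zero windows). The sequence numbers in the example are invented.

```python
SEQ_MODULUS = 2 ** 32

def in_receive_window(seg_seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """Simplified RFC 793 test: RCV.NXT <= SEG.SEQ < RCV.NXT + RCV.WND (mod 2**32)."""
    return (seg_seq - rcv_nxt) % SEQ_MODULUS < rcv_wnd

# A delayed segment from the old connection (same four-tuple) with seq 5000:
print(in_receive_window(5000, rcv_nxt=4000, rcv_wnd=65535))       # True: wrongly accepted
print(in_receive_window(5000, rcv_nxt=3_000_000, rcv_wnd=65535))  # False: rejected
```

Nothing in this check can tell an old segment from a new one, which is why waiting until old segments have died out matters.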

There are many more interesting things about TIME_WAIT, enough for a separate article, so I won't cover them here.

This article may feel a little scattered, because there really is a lot to TCP; I hope you don't mind.

[^1]: https://tools.ietf.org/html/rfc793
[^2]: http://wiki.cas.mcmaster.ca/index.php/The_Mitnick_attack