Starting with this chapter, we introduce the transport layer, one of the most important layers in networking. The transport layer is the fourth layer (counting from the bottom) of the OSI seven-layer model. As the name implies, its primary role is to enable communication between applications. The network layer mainly ensures that data can reach its destination across different data links, while the transport layer is responsible for delivering that data to the right application.

Introduction to transport layer protocols

Common transport layer protocols include TCP and UDP. TCP is a connection-oriented protocol: a connection must be established between the sender and the receiver before any data is transmitted. Establishing a connection generally takes three steps, and closing it takes four.

After a TCP connection is established, TCP handles packet loss correctly through mechanisms such as retransmission and flow control. This ensures that the data reaches the receiver while making efficient use of network bandwidth. However, these complex mechanisms make TCP less efficient than UDP, and TCP is generally less suitable for real-time video and audio transmission.

UDP is a connectionless protocol. It simply sends data to the receiver and does not check whether the receiver actually gets it. This property makes UDP well suited to multicast and to real-time video and audio transmission, because losing an individual packet does not noticeably affect the overall quality of the video or audio.

The two key elements of the IP protocol are the source IP address and the destination IP address. As mentioned earlier, the main role of the transport layer is to enable communication between applications, so three more elements come into play: the source port number and the destination port number (carried in the TCP or UDP header), and the protocol number (carried in the IP header). Together, these five pieces of information uniquely identify a communication.

Port numbers distinguish different applications on the same host. Suppose you have two browsers open: data intended for browser A will not be delivered to browser B, because A and B use different port numbers.

The protocol number indicates whether TCP or UDP is used. Therefore, even when the same two hosts communicate using the same port numbers, a TCP communication and a UDP communication can still be told apart correctly.

In short, the five values are the source IP address, destination IP address, source port number, destination port number, and protocol number. If any one of them differs, the traffic is considered a different communication.
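The rule above makes the five-tuple a natural lookup key. Below is a minimal Python sketch that models a hypothetical connection table keyed by the five-tuple. The IP addresses and port numbers are made-up illustration values. The protocol numbers 6 (TCP) and 17 (UDP) are the standard IP protocol numbers.

```python
from typing import NamedTuple

TCP, UDP = 6, 17  # IP protocol numbers for TCP and UDP


class FiveTuple(NamedTuple):
    """The five values that together identify one communication."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int


# Hypothetical connection table: each distinct five-tuple gets its own entry.
connections = {}
connections[FiveTuple("192.0.2.10", "203.0.113.5", 50001, 443, TCP)] = "browser A"
connections[FiveTuple("192.0.2.10", "203.0.113.5", 50002, 443, TCP)] = "browser B"
# Same addresses and ports as browser A, but over UDP: still a different communication.
connections[FiveTuple("192.0.2.10", "203.0.113.5", 50001, 443, UDP)] = "UDP app"

print(len(connections))  # 3
```

Only the source port separates browser A from browser B here, and only the protocol number separates browser A from the UDP entry. This matches the rule that a difference in any single field means a different communication.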

UDP header

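The UDP protocol is characterized by its simplicity. Its header is only 8 bytes long and consists of four 16-bit fields: source port number, destination port number, packet length, and checksum. The header is shown below: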

The packet length field indicates the total length, in bytes, of the UDP header plus the UDP data.
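To see how small the UDP header is, here is a minimal Python sketch that packs and unpacks the four 16-bit fields with the standard `struct` module. The helper names `build_udp_datagram` and `parse_udp_datagram` are hypothetical. The checksum is left at 0 as a placeholder: for UDP over IPv4, 0 means "no checksum computed".

```python
import struct

# Four 16-bit fields in network byte order: source port, destination port,
# length (header + data), checksum.
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_datagram(src_port: int, dst_port: int, payload: bytes, checksum: int = 0) -> bytes:
    """Prepend an 8-byte UDP header; the length field covers header + data."""
    length = UDP_HEADER.size + len(payload)
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


def parse_udp_datagram(datagram: bytes):
    """Split a datagram into its header fields and payload."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    payload = datagram[UDP_HEADER.size:length]
    return src_port, dst_port, length, checksum, payload


datagram = build_udp_datagram(12345, 53, b"hello")
print(parse_udp_datagram(datagram))  # (12345, 53, 13, 0, b'hello')
```

The length is 13 because the 8-byte header is added to the 5-byte payload.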

The checksum is used to detect whether data was corrupted during transmission. The checksum covers the UDP header (including the source and destination port numbers) and the UDP data. It also covers the source IP address, destination IP address, and protocol number taken from the IP header. These IP fields, together with the UDP length, form the so-called UDP pseudo header. The IP fields are included because each of the five elements is needed to identify a communication. If the checksum covered only the port numbers, the application could not tell whether the other three elements had been corrupted. A packet could then be delivered to an application that should not receive it, while the application that should receive it never does.

The same idea also applies to the TCP header, which is introduced next.
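The following minimal Python sketch shows how the pseudo header takes part in the UDP checksum over IPv4. It uses the standard 16-bit one's complement sum. The function names are hypothetical, and the addresses in the test come from documentation ranges. The sketch covers IPv4 only; IPv6 uses a different pseudo header layout.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum, padding odd-length data with a zero byte."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """IPv4 pseudo header: source IP, destination IP, zero byte, protocol (17), UDP length."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, 17, udp_length))


def udp_checksum(src_ip: str, dst_ip: str, src_port: int, dst_port: int, payload: bytes) -> int:
    """Checksum over pseudo header + UDP header (checksum field zeroed) + data."""
    length = 8 + len(payload)
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    csum = ~ones_complement_sum(pseudo_header(src_ip, dst_ip, length) + header + payload) & 0xFFFF
    return csum or 0xFFFF  # a computed 0 is sent as 0xFFFF, since 0 means "no checksum"
```

On the receiving side, the receiver sums the pseudo header, the received header (with its checksum), and the data. A result of 0xFFFF means no corruption was detected. Because the IP addresses are part of the sum, a packet whose destination address was damaged in transit produces a different checksum and fails verification.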

The TCP header

Compared with the UDP header, the TCP header is much more complex, and parsing it takes correspondingly more time. This is one of the reasons TCP is less efficient than UDP.

Some key fields are described as follows:

  • Sequence number: indicates the position of the data being sent within the byte stream. If the current sequence number is S and the length of the data being sent is L, then the sequence number of the next data to be sent is S + L. When a connection is established, the host usually generates a random number as the initial sequence number.

  • Acknowledgement number: the sequence number of the data the receiver expects to receive next. If the sender's sequence number is S and the length of the data sent is L, the acknowledgement number returned by the receiver is S + L. When the sender receives this acknowledgement, it can assume that all data before this position has been received correctly.

  • Data offset: the length of the TCP header, in units of 4 bytes. If there are no options, the value is 5, which means the TCP header is 20 bytes long.

  • Control bits: an 8-bit field in which each bit is one control flag: CWR, ECE, URG, ACK, PSH, RST, SYN, and FIN, in that order. Some of these control bits will come up in later articles.

  • Window size: the number of bytes (octets) the receiver can accept, starting from the position given by the acknowledgement number. If the window size is 0, the sender can send window probes to find out when the window opens again.

  • Urgent pointer: valid only when the URG control bit is 1. It indicates where the urgent data ends within the TCP data section. It is typically used to interrupt processing, for example when Ctrl + C is pressed.
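To tie these fields together, here is a minimal Python sketch that decodes the fixed 20-byte part of a TCP header with `struct`. It ignores options. The helper names `parse_tcp_header` and `next_seq` are hypothetical. The bit order of the flags follows the list above, from the high bit to the low bit. `next_seq` also shows the S + L rule, with wrap-around because sequence numbers are 32 bits.

```python
import struct

TCP_HEADER = struct.Struct("!HHIIBBHHH")  # fixed 20-byte part of the TCP header
FLAG_NAMES = ("CWR", "ECE", "URG", "ACK", "PSH", "RST", "SYN", "FIN")  # high bit -> low bit


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed header fields; options (if any) are not parsed."""
    (src_port, dst_port, seq, ack, offset_byte, flags_byte,
     window, _checksum, urgent) = TCP_HEADER.unpack_from(segment)
    data_offset = offset_byte >> 4  # upper 4 bits, counted in 4-byte units
    flags = [name for i, name in enumerate(FLAG_NAMES) if flags_byte & (0x80 >> i)]
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": data_offset * 4,
        "flags": flags,
        "window": window,
        "urgent_ptr": urgent if "URG" in flags else None,  # meaningful only when URG is set
    }


def next_seq(seq: int, length: int) -> int:
    """Sequence number after sending `length` bytes: S + L, wrapping at 2**32."""
    return (seq + length) % 2**32


# A hypothetical SYN segment: data offset 5 (no options), only the SYN bit set.
syn = TCP_HEADER.pack(40000, 80, 1000, 0, 5 << 4, 0x02, 65535, 0, 0)
info = parse_tcp_header(syn)
print(info["header_len"], info["flags"])  # 20 ['SYN']
```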

TCP handshake

TCP is a connection-oriented protocol: a connection is established before each communication and closed afterwards. How connections are established and closed is a common topic in interviews and exams. Both processes can be represented by the following diagram:

Normally, we assume that the client initiates the connection first.

Three-way handshake to establish a connection

This process can be illustrated with the following three-step dialogue:

  1. (Client): I’m going to establish a connection. (SYN)
  2. (Server): I know you want to establish a connection, and that’s fine with me. (SYN + ACK)
  3. (Client): I know that you know I want to establish a connection. Now we can officially start communicating. (ACK)
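The dialogue maps onto three segments whose sequence and acknowledgement numbers follow the header rules above. The minimal Python sketch below assumes that a SYN consumes exactly one sequence number, which is standard TCP behavior. The `Segment` class and `three_way_handshake` function are hypothetical names used only for illustration.

```python
import random
from dataclasses import dataclass

MOD = 2**32  # sequence numbers are 32-bit and wrap around


@dataclass
class Segment:
    flags: str
    seq: int
    ack: int = 0


def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the three segments of the handshake. A SYN consumes one sequence number."""
    syn = Segment("SYN", seq=client_isn)                                         # 1. client -> server
    syn_ack = Segment("SYN+ACK", seq=server_isn, ack=(syn.seq + 1) % MOD)        # 2. server -> client
    ack = Segment("ACK", seq=(syn.seq + 1) % MOD, ack=(syn_ack.seq + 1) % MOD)   # 3. client -> server
    return [syn, syn_ack, ack]


# Initial sequence numbers are chosen randomly, as described for the sequence number field.
for seg in three_way_handshake(random.getrandbits(32), random.getrandbits(32)):
    print(seg)
```

Each side acknowledges the other's initial sequence number plus one. This is how both parties confirm that their own SYN was received.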

Why a three-way handshake

Intuitively, two handshakes might seem sufficient, and the third confirmation step might look redundant. So why does TCP add this extra handshake?

The reason is that in networking we must always keep in mind that “the network is unreliable and packets can be lost”. Suppose there were no third step. The client sends a SYN to the server to establish a connection, but because of network delay the packet does not reach the server in time, so the client resends the SYN. Recall from the introduction to the TCP header that these two packets carry the same sequence number.

Suppose the server receives the second SYN packet, and the connection is established. After some time the communication ends and the connection is closed. Then the delayed first SYN packet finally arrives at the server, and the server replies with an ACK. If a connection counted as established after only two handshakes, the server would now open a new connection. The client, however, would not send any data, because it does not consider itself to have requested this connection. As a result, the server would hold an empty connection and waste resources.

With a three-way handshake, the server does not consider the connection established until it receives the client's reply. In the scenario above, the client receives the server's reply to the stale SYN. It recognizes that this reply does not belong to any connection it is currently requesting, so it rejects the reply and does not send the third handshake. Real TCP implementations answer such a segment with an RST. The server therefore never establishes the empty connection.
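The key decision is the client's decision about whether to send the third-step ACK. The minimal Python sketch below models only that decision. The `Client` class and its simplified states are a hypothetical toy model, not a real TCP stack. The client completes a handshake only when it is actually waiting for one (SYN_SENT) and the acknowledgement number matches its own initial sequence number plus one.

```python
class Client:
    """Toy client that only completes handshakes it actually started (hypothetical model)."""

    def __init__(self):
        self.state = "CLOSED"
        self.isn = None

    def connect(self, isn: int) -> None:
        self.state, self.isn = "SYN_SENT", isn  # step 1: SYN sent with sequence number isn

    def on_syn_ack(self, ack_number: int) -> bool:
        """Return True if the client sends the third-step ACK, False if it refuses."""
        if self.state == "SYN_SENT" and ack_number == (self.isn + 1) % 2**32:
            self.state = "ESTABLISHED"
            return True
        return False  # stale SYN+ACK: no third step, so the server never completes the connection


client = Client()
client.connect(isn=100)
print(client.on_syn_ack(101))  # True: reply to the current SYN
client.state = "CLOSED"        # ... the connection is later closed ...
print(client.on_syn_ack(101))  # False: SYN+ACK triggered by the delayed duplicate SYN
```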

What if the ACK confirmation packet is lost

The third step of the handshake lets the server confirm that its reply in step 2 actually reached the client. But how does TCP handle the loss of that final ACK?

In TCP, if the server does not receive the ACK, it retransmits its SYN + ACK to the client until the ACK arrives. In practice, however, this opens the door to SYN flooding. In a SYN flood attack, the attacker forges many source IP addresses and sends SYN packets that start three-way handshakes. When the server replies with SYN + ACK, the attacker deliberately never acknowledges it, so the server keeps retransmitting SYN + ACK. The server then holds many half-open connections for a long time, consuming CPU and memory, which can eventually crash it.
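A common way to bound these retransmissions is exponential backoff with a retry limit. The minimal Python sketch below computes when the server would resend SYN + ACK and when it would give up. The defaults of a 1-second initial timeout and 5 retries are assumed illustration values, not taken from this article; real stacks make them configurable. The function name `synack_retransmissions` is hypothetical.

```python
def synack_retransmissions(initial_rto: float = 1.0, max_retries: int = 5):
    """Times (s, after the first SYN+ACK) at which the server resends SYN+ACK,
    doubling the timeout each time, plus the time it finally gives up."""
    times, t, rto = [], 0.0, initial_rto
    for _ in range(max_retries):
        t += rto
        times.append(t)
        rto *= 2  # exponential backoff
    return times, t + rto  # wait one more (doubled) timeout before giving up


print(synack_retransmissions())  # ([1.0, 3.0, 7.0, 15.0, 31.0], 63.0)
```

Under these assumed values, each forged handshake keeps a half-open connection alive for about a minute. This is why a flood of such handshakes can exhaust server resources.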

The solution is that the server does not retransmit indefinitely. After a limited number of retries, it sends an RST packet and enters the CLOSED state. In the TCP header of this RST packet, the RST bit among the control bits is set to 1. This indicates that the connection has been reset and the original TCP communication cannot continue. If the client wants to re-establish the TCP connection, it must start again from the first handshake.

Four-way handshake to close the connection

This process can be illustrated with the following four-step dialogue:

  1. (Client): I’m going to close the connection. (FIN)
  2. (Server): Got it, your side of the connection can be closed. (ACK)
  3. (Server): I’m going to close the connection on my side too. (FIN)
  4. (Client): Got it, your side can be closed. (ACK)

Because a TCP connection is two-way (full-duplex), each party must close its own direction of the connection separately. This is why closing a connection takes four steps.
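The four steps can also be read as state changes on each side. The minimal Python sketch below uses the state names from the TCP specification. The event labels ("send FIN", "recv ACK", and so on) and the function name `trace_close` are hypothetical. Simultaneous close, where both sides send FIN at once and pass through the CLOSING state, is deliberately left out.

```python
# (current state, event) -> next state, for a normal close started by the client.
CLOSE_TRANSITIONS = {
    ("ESTABLISHED", "send FIN"): "FIN_WAIT_1",  # client sends step 1
    ("ESTABLISHED", "recv FIN"): "CLOSE_WAIT",  # server gets step 1 and replies with ACK (step 2)
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",   # client gets step 2
    ("CLOSE_WAIT", "send FIN"): "LAST_ACK",     # server sends step 3
    ("FIN_WAIT_2", "recv FIN"): "TIME_WAIT",    # client gets step 3 and replies with ACK (step 4)
    ("LAST_ACK", "recv ACK"): "CLOSED",         # server gets step 4
    ("TIME_WAIT", "timeout"): "CLOSED",         # client's wait timer expires
}


def trace_close(start: str, events: list) -> list:
    """Apply events in order and return every state visited."""
    states = [start]
    for event in events:
        states.append(CLOSE_TRANSITIONS[(states[-1], event)])
    return states


print(trace_close("ESTABLISHED", ["send FIN", "recv ACK", "recv FIN", "timeout"]))
print(trace_close("ESTABLISHED", ["recv FIN", "send FIN", "recv ACK"]))
```

The client ends in TIME_WAIT before reaching CLOSED. That waiting period is what the next section relies on.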

What if the last ACK to close the connection is lost

In fact, after the client receives the FIN in step 3 and sends the final ACK in step 4, it does not close immediately. Instead, it sets a timer and waits for a considerable amount of time (typically twice the maximum segment lifetime). If that ACK is lost, the server retransmits its FIN. When the client receives the retransmitted FIN, it sends the ACK again and restarts its timer. If the client did not wait, or if the retransmitted FIN did not reach the client before the timer expired, the client would already be in the CLOSED state. The server would then never receive the ACK and could not close the connection normally.
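The waiting behavior can be simulated with a few lines of Python. This minimal sketch assumes a maximum segment lifetime (MSL) of 60 seconds, which is an illustration value; the actual value depends on the implementation. The client is assumed to enter its waiting state at t = 0, right after sending the final ACK. The function name `time_wait` is hypothetical.

```python
def time_wait(fin_arrivals, msl: float = 60.0):
    """Simulate a client that enters TIME_WAIT at t=0 right after sending the final ACK.

    fin_arrivals: times at which retransmitted FINs from the server reach the client.
    Returns (times at which the client re-sends the ACK, time at which it becomes CLOSED).
    """
    wait = 2 * msl
    deadline = wait
    re_acks = []
    for t in sorted(fin_arrivals):
        if t >= deadline:
            break            # timer already expired: the client is CLOSED and will not re-ACK
        re_acks.append(t)    # a retransmitted FIN means the last ACK was lost: ACK again
        deadline = t + wait  # and restart the 2 * MSL timer
    return re_acks, deadline


print(time_wait([3.0]))    # ([3.0], 123.0): the lost ACK is repaired
print(time_wait([150.0]))  # ([], 120.0): the FIN arrives too late
```

The second call shows the failure case described above. The retransmitted FIN arrives after the timer has expired, so the server never gets its ACK.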

The schematic diagram is as follows: