TCP is one of the most important network protocols, which ensures reliable transmission of data and fair use of network resources.

  • This article is a set of study notes and a summary. All conclusions come from authoritative books and have been checked in personal practice. If you find a mistake, please point it out. Thank you very much.
  • The article is still unfinished; please bear with me.

Series of articles:

  • Transport Layer – Overview – Juejin (juejin.cn)
  • 【Computer Network】 Transport Layer – TCP – Juejin (juejin.cn)

Overview

TCP provides several additional services above the basic requirements of the transport layer, namely, full-duplex service, reliable data transmission, and congestion control.

  • Full-duplex service allows two processes to send data to each other simultaneously over the same connection.
  • Reliable data transmission is process-oriented: TCP delivers data from the sending process to the receiving process correctly and in order.
  • Congestion control is network-oriented: each connection passing through a congested link gets a roughly equal share of the link bandwidth.

TCP packet

The TCP header consists of the source port, destination port, sequence number, acknowledgement number, data offset, reserved bits, flag bits, receive window, checksum, urgent pointer, options, and padding.

  • The 16-bit source and destination ports are a transport layer requirement.
  • The 32-bit sequence number and 32-bit acknowledgement number are used for reliable data transmission.
  • The 4-bit data offset records the length of the header in 32-bit words; the length is not fixed because of the options field.
  • The six reserved bits are set aside for future extensions.
  • The six flag bits indicate the type and status of the TCP segment.
  • The 16-bit receive window tells the other party how many bytes the sender of the segment is willing to receive.
  • The 16-bit checksum is used for error checking.
  • The 16-bit urgent pointer marks the end of data that the application has flagged as “urgent”, as an offset from the sequence number.

The options field carries optional extensions. Because the header length is counted in 32-bit words, padding is added after the options so that the header length is a multiple of 32 bits. The options are usually empty, so the TCP header is typically 20 bytes long.

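The six original flag bits are URG, ACK, PSH, RST, SYN, and FIN. ACK indicates that the value of the acknowledgement number field is valid. RST, SYN, and FIN are used to establish and terminate connections. PSH asks the receiver to pass the data to the application immediately, and URG marks the urgent pointer as valid. Two further flags, CWR and ECE, were later taken from the reserved bits and are used for explicit congestion notification.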
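
To make the layout concrete, here is a minimal Python sketch that unpacks the fixed 20-byte part of a header. The names TcpHeader, FLAG_BITS, and parse_tcp_header are made up for this illustration, and options are simply ignored; the bit positions are assumed to follow the standard header layout.

```python
import struct
from typing import FrozenSet, NamedTuple

# Flag bit positions inside the 16-bit "data offset + reserved + flags" word.
FLAG_BITS = {
    "FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08,
    "ACK": 0x10, "URG": 0x20, "ECE": 0x40, "CWR": 0x80,
}


class TcpHeader(NamedTuple):
    src_port: int
    dst_port: int
    seq: int
    ack: int
    header_len: int          # header length in bytes
    flags: FrozenSet[str]
    window: int
    checksum: int
    urgent: int


def parse_tcp_header(raw: bytes) -> TcpHeader:
    """Parse the fixed 20-byte part of a TCP header (options are ignored)."""
    if len(raw) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    # "!" = network byte order; H = 16-bit field, I = 32-bit field.
    src, dst, seq, ack, off_flags, window, checksum, urgent = struct.unpack(
        "!HHIIHHHH", raw[:20]
    )
    header_len = (off_flags >> 12) * 4   # data offset counts 32-bit words
    flags = frozenset(name for name, bit in FLAG_BITS.items() if off_flags & bit)
    return TcpHeader(src, dst, seq, ack, header_len, flags, window, checksum, urgent)
```

For a header with no options, the data offset is 5, so header_len comes out as 20 bytes.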

Sequence number and acknowledgement number

Sequence numbers count bytes in the transmitted byte stream, not segments. The sequence number of a segment is therefore the byte-stream number of the first byte of its data. If the current segment carries bytes 4000 to 4999 of the stream, its sequence number is the initial sequence number + 4000.

The acknowledgement number is the sequence number of the next byte that the receiver expects from the sender. If the received sequence number is 4000 and the data length is 1000 bytes, the acknowledgement number in the returned acknowledgement is 5000.

If the receiver has received bytes 0 to 533 and 900 to 1024, the acknowledgement number in its acknowledgements stays at 534, that is, it acknowledges only up to the first missing byte in the stream. This is why TCP is said to use cumulative acknowledgement.
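
The arithmetic can be sketched in a few lines of Python. The function names next_ack and cumulative_ack are invented for this example, byte ranges are given as inclusive (first, last) pairs, and wrap-around of the 32-bit sequence space is deliberately ignored.

```python
from typing import Iterable, Tuple


def next_ack(seq: int, payload_len: int) -> int:
    """Acknowledgement number for one in-order segment: the next byte expected."""
    return seq + payload_len


def cumulative_ack(received: Iterable[Tuple[int, int]], start: int = 0) -> int:
    """Return the first byte number that is still missing.

    `received` holds inclusive (first_byte, last_byte) ranges in any order.
    """
    expected = start
    for first, last in sorted(received):
        if first > expected:
            break                      # a gap: everything after it does not count
        expected = max(expected, last + 1)
    return expected
```

With the ranges 0 to 533 and 900 to 1024, cumulative_ack returns 534; once bytes 534 to 899 also arrive, it jumps to 1025.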

Establishing a TCP Connection

In the description of connection setup and teardown below, a TCP segment is called a packet.

Establishing a TCP connection requires at least three TCP packets, sent alternately by the two sides. This process is therefore called the “three-way handshake”, although the metaphor is far from elegant or apt. It stuck because so many people use it and it is easy to say.

Process

The client is about to establish a connection with the server. The client’s TCP is in the CLOSED state and the server’s TCP is in the LISTEN state.

  1. The client sends a SYN packet to the server. The SYN flag in its header is 1, which is why it is called a SYN packet. The initial sequence number client_ISN carried in the SYN packet is randomly generated and stored by the client. After sending the SYN packet, the client enters the SYN_SENT state.

  2. The server receives the SYN packet, allocates the TCP buffers and variables for the connection, and sends the client a packet granting the connection. In this packet, the SYN flag is 1, the ACK flag is 1, the acknowledgement number is client_ISN + 1, and the initial sequence number server_ISN is randomly generated by the server. This packet is called a SYNACK packet. After sending the SYNACK packet, the server enters the SYN_RCVD state.

  3. The client receives the SYNACK packet, allocates its own TCP buffers and variables, and sends the server a packet to acknowledge it. Because the connection is established from the client’s point of view, the SYN flag of this packet is 0, the ACK flag is 1, the acknowledgement number is server_ISN + 1, and the sequence number is client_ISN + 1. The client’s TCP enters the ESTABLISHED state.

  4. After receiving an ACK packet, the server does not respond and enters the ESTABLISHED state.
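
The numbers exchanged in these steps can be traced with a small Python sketch. The function three_way_handshake and the dictionary layout are assumptions made for illustration; nothing is sent over a network.

```python
import random

SEQ_SPACE = 2 ** 32   # sequence numbers are 32 bits and wrap around


def three_way_handshake(client_isn=None, server_isn=None):
    """Return the three handshake packets as plain dictionaries."""
    if client_isn is None:
        client_isn = random.getrandbits(32)
    if server_isn is None:
        server_isn = random.getrandbits(32)

    syn = {"from": "client", "SYN": 1, "ACK": 0,
           "seq": client_isn, "ack": None}
    synack = {"from": "server", "SYN": 1, "ACK": 1,
              "seq": server_isn, "ack": (client_isn + 1) % SEQ_SPACE}
    ack = {"from": "client", "SYN": 0, "ACK": 1,
           "seq": (client_isn + 1) % SEQ_SPACE,
           "ack": (server_isn + 1) % SEQ_SPACE}
    return [syn, synack, ack]
```

Calling three_way_handshake(100, 300) gives a SYNACK with acknowledgement number 101 and a final ACK with sequence number 101 and acknowledgement number 301.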

Design idea

TCP connection setup uses three packets because that is the cheapest way to set up a full-duplex channel.

How can a simplex channel be set up at the lowest cost? The answer is simple: host A sends a setup request to host B, and host B agrees. This takes two packets.

To set up a duplex channel, we need a simplex channel from host A to host B and another from host B to host A. The most straightforward implementation takes four packets: A to B, B to A; then B to A, A to B. Since the two middle packets both go from B to A, they can be merged into one. The result is the cheapest way to establish a full-duplex channel.

Therefore, two handshakes cannot establish a full-duplex channel; they can only establish a channel in one direction.

From another point of view, once the server has sent the SYNACK packet and the client has received it, the client knows that the server is working properly. At this point, only the client regards the connection as established. The server does not know whether the client is still online or whether it received the SYNACK packet, so it cannot yet treat the connection as established. In essence, this is again the fact that two packets cannot establish a full-duplex channel.

Reasoning in terms of the handshake is misleading, because the handshake is a false metaphor.

SYN flood attack

The three-way handshake design is vulnerable to DoS attacks, most notably SYN flooding. In a SYN flood, a large number of clients send SYN packets to the server without completing the handshake. When the server receives a SYN packet, it initializes the buffers and variables for the connection and then sends a SYNACK packet. If the client never responds to the SYNACK packet, the server releases this half-open connection only after a timeout. In this way, the server’s resources can be exhausted in a short time.

SYN Cookie

SYN flood attacks can be defended against effectively with SYN cookies. The idea is not to set up the TCP connection as soon as the SYN packet arrives, but to establish it directly when the client’s third handshake packet arrives. The SYN cookie is used to recognise the client at the third handshake.

When the server receives a SYN packet, it does not create a TCP connection for it and does not store any information about the client. Instead, it places that information in the SYNACK packet it sends. A cookie is generated from the source IP address and source port number of the SYN packet together with a local secret key, and is sent back to the client as the sequence number (seq) of the SYNACK packet. A legitimate client responds with an ACK packet, the third handshake, whose acknowledgement number is the cookie plus one. When the server receives an ACK packet for which it has no local TCP connection, it recomputes the cookie from the packet’s source IP address, source port number, and the secret key, and compares the result plus one with the acknowledgement number. If they match, the third handshake packet is valid and the server establishes the TCP connection.
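
The idea can be sketched in Python as follows. This is a simplified illustration, not how any particular kernel does it: the helper names make_cookie and ack_is_valid are invented, HMAC-SHA256 is just one convenient keyed hash, and production implementations typically mix further fields (such as a time counter) into the cookie.

```python
import hashlib
import hmac

SEQ_SPACE = 2 ** 32


def make_cookie(secret: bytes, src_ip: str, src_port: int) -> int:
    """Derive a 32-bit cookie from the client's address and a server secret."""
    msg = f"{src_ip}:{src_port}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")   # used as the SYNACK sequence number


def ack_is_valid(secret: bytes, src_ip: str, src_port: int, ack_number: int) -> bool:
    """Check the third handshake packet without any stored per-client state."""
    expected = (make_cookie(secret, src_ip, src_port) + 1) % SEQ_SPACE
    return ack_number == expected
```

The server keeps only the secret. Everything else needed for the check arrives in the ACK packet itself, so a flood of SYN packets does not consume connection state.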

The third packet can carry data

Once the client has received the server’s SYNACK packet, the client side of the connection is established. At this point the channel is simplex: only the client can send data to the server. From now on, every packet the client sends is an ordinary data packet rather than a connection request with the SYN flag set. The third handshake packet is therefore a data packet, and the client can carry data in it.

In fact, TCP allows the ACK packet of the third handshake to carry data, so the client can begin transmitting at that point.

Disconnecting TCP Connections

Closing a TCP connection usually takes four TCP packets. Since the process resembles waving goodbye, it is called the “four waves”, which is an apt metaphor.

However, many people hold the misunderstanding that all four waves are necessary. As few as three TCP packets are enough to close a connection, and this case is very common in practice.

The most important thing for a computer learner is the ability to think, not the ability to simply receive knowledge and recite. In fact, no matter what you learn, the most important thing is the ability to think.

Process

The client process is about to terminate its connection with the server process. Both sides are in the ESTABLISHED state before termination.

  1. The client’s TCP sends the server a FIN packet, with the FIN flag set to 1, to indicate that the client is ready to terminate the connection. The client then enters the FIN_WAIT_1 state. A client in FIN_WAIT_1 can still receive data from the server, but no longer sends data of its own.

  2. After receiving the FIN packet, the server sends an ACK packet to indicate that it has received the termination signal, and enters the CLOSE_WAIT state. The server may still have data for the client, and a server in CLOSE_WAIT can continue to send it. When this ACK arrives, the client moves from FIN_WAIT_1 to FIN_WAIT_2.

  3. After sending its remaining data, the server sends a FIN packet to the client to indicate that the server is also ready to terminate the connection. The server enters the LAST_ACK state.

  4. After receiving the FIN packet, the client sends an ACK packet and enters the TIME_WAIT state. After 2MSL, the client closes the connection and enters the CLOSED state. If the ACK packet is lost, the client can retransmit it during TIME_WAIT.

  5. After receiving the ACK packet, the server closes the connection immediately and enters the CLOSED state without sending any response.
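
The state changes on both sides can be written down as a small transition table. The Python sketch below is only an illustration: the event labels and the run helper are invented, and FIN_WAIT_2 is the standard intermediate state the client occupies between the server’s ACK and the server’s FIN.

```python
# (current state, event) -> next state, for a normal four-packet close.
TRANSITIONS = {
    # Active closer (the client in the steps above)
    ("ESTABLISHED", "send FIN"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv FIN, send ACK"): "TIME_WAIT",
    ("TIME_WAIT", "2MSL timeout"): "CLOSED",
    # Passive closer (the server in the steps above)
    ("ESTABLISHED", "recv FIN, send ACK"): "CLOSE_WAIT",
    ("CLOSE_WAIT", "send FIN"): "LAST_ACK",
    ("LAST_ACK", "recv ACK"): "CLOSED",
}


def run(events, state="ESTABLISHED"):
    """Apply a list of events in order and return the final state."""
    for event in events:
        state = TRANSITIONS[(state, event)]   # KeyError means an illegal step
    return state


CLIENT_EVENTS = ["send FIN", "recv ACK", "recv FIN, send ACK", "2MSL timeout"]
SERVER_EVENTS = ["recv FIN, send ACK", "send FIN", "recv ACK"]
```

Both run(CLIENT_EVENTS) and run(SERVER_EVENTS) end in CLOSED, but only the client passes through TIME_WAIT on the way.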

Design idea

Closing a TCP connection takes four TCP packets. To understand the design of these four packets, we need to think about what closing means. To close a full-duplex channel, the connection must be closed in both directions, that is, from A to B and from B to A.

To close the connection from A to B, A only needs to send a close notification to B, and B needs to send an acknowledgement. B has no right to refuse, because the data in this direction flows from A to B and A no longer wants to send any.

Therefore, closing a full-duplex connection requires this operation twice:

  1. A informs B.
  2. B confirms that A is disconnected.
  3. B informs A.
  4. A confirms the disconnection of B.

One point about TCP matters here: closing its own sending direction does not stop an endpoint from replying to incoming packets with an ACK. Therefore, A can still acknowledge B’s close notification after A has closed its own direction.

Since B sent both steps 2 and 3, why not merge them together?

They can be merged. TCP does not prohibit this. The default design has four steps because A’s close notification only means that A has no more data to send to B, while B may still need to send data to A. So between steps 2 and 3, B can keep sending data to A, but A cannot send data to B. During this time A can still acknowledge, because acknowledging does not require A’s sending direction to be open. Once B has no more data to send, it closes its direction to A.

Three waves

If B has no more data to send when A closes its direction, B can combine its ACK packet and FIN packet into a single packet and send it to A. In this case, the connection is closed with three waves.

Most operating system kernels’ TCP implementations support three waves, and in practice most TCP connections are closed this way. The server usually does not close the connection on its own initiative even when it has no data to send to the client, so by the time the client closes, the server generally has nothing left to send.

TIME_WAIT state

The TIME_WAIT state exists because the client’s ACK packet may fail to reach the server, and the client must make sure the server receives it. If the server does not receive the ACK packet, it retransmits its FIN packet. If the client then receives the FIN packet again, it retransmits the ACK packet.

The wait is set to 2MSL because that bounds the time until a retransmitted FIN packet can come back. MSL is the maximum lifetime of a packet on the network. The client’s ACK packet takes at most 1 MSL to reach the server. If it is lost, the FIN packet that the server retransmits takes at most another 1 MSL to reach the client.

MSL stands for Maximum Segment Lifetime. The header of an IP packet contains a TTL (Time to Live) field, which is the maximum number of routers the IP packet may pass through. When the TTL reaches 0, the packet is discarded and an ICMP message is sent to notify the source host. The MSL is chosen to be no shorter than the time a packet can survive under its TTL, which ensures that any packet older than the MSL has disappeared from the network.

Reliable data transmission

TCP creates a reliable data transfer service on top of IP’s best-effort service. It ensures that the data stream a process reads from its receive buffer is uncorrupted, gap-free, free of duplicates, and in order, exactly as it was written to the send buffer by the process at the other end.

TCP implements reliable data transmission service through sequential delivery, redundant acknowledgement and timeout retransmission.

  • Sequential delivery: TCP implements sequential delivery through sequence numbers and acknowledgement numbers. With cumulative acknowledgement, the receiver always acknowledges the next byte it expects in the ordered stream.

  • Redundant acknowledgement: When the receiver gets a segment whose sequence number is larger than expected, it sends again the acknowledgement for the last in-order data. This lets the sender learn which segment is missing, and the sender then retransmits it.

    TCP uses cumulative acknowledgement. If an earlier segment is lost after a sender sends several consecutive segments, the receiver keeps sending the acknowledgement that precedes the gap. Such a repeated acknowledgement is called a redundant ACK.

  • Timeout retransmission: The sender starts a timer when it sends a segment. If no acknowledgement for the segment has arrived when the timer expires, the sender retransmits the segment.
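
How redundant ACKs arise on the receiving side can be seen in a short Python sketch. The Receiver class is hypothetical and tracks only sequence numbers and lengths, not payload bytes; it assumes segments never overlap partially.

```python
class Receiver:
    """Toy receiver that buffers out-of-order segments and acknowledges cumulatively."""

    def __init__(self, start=0):
        self.expected = start   # next in-order byte we are waiting for
        self.buffered = {}      # seq -> length of segments that arrived early

    def on_segment(self, seq, length):
        """Process one segment and return the acknowledgement number to send."""
        if seq == self.expected:
            self.expected += length
            # The gap is filled: absorb any buffered segments that now fit.
            while self.expected in self.buffered:
                self.expected += self.buffered.pop(self.expected)
        elif seq > self.expected:
            self.buffered[seq] = length   # out of order: keep it, ACK stays put
        # seq < expected is a duplicate of old data: just re-acknowledge.
        return self.expected
```

If segments starting at bytes 0, 200, and 300 arrive (100 bytes each) while the one starting at 100 is lost, the receiver answers 100, 100, 100. The last two are redundant ACKs. When the missing segment finally arrives, the acknowledgement jumps straight to 400.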

A few details are worth noting, assuming that A, B, and C are segments of the message that the sender wants to send consecutively:

  • Acknowledgement loss: If the receiver’s acknowledgement for A is lost and the sender has not yet sent B, the sender retransmits A when A’s timer expires, and the receiver acknowledges A again. If the sender has gone on to send B and C, the cumulative acknowledgements for B and C also cover A, so A does not need to be retransmitted as long as one of them arrives before A’s timer expires.

  • Doubling the timeout interval: Each time TCP retransmits because of a timeout, the next timeout interval is set to twice the previous one. This in effect provides a limited form of congestion control, since timeouts are most likely caused by network congestion, which frequent retransmission would make worse.

  • Fast retransmission: When the sender receives three redundant ACKs for A, it retransmits B before B’s timer expires.

  • Selective acknowledgement: If the sender sends A, B, and C consecutively and B is lost, the receiver keeps A, buffers C, and sends redundant acknowledgements for A. The sender then retransmits B. After receiving B, the receiver returns an acknowledgement that covers C, so C does not have to be sent again.
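
Two of these details, fast retransmission and timeout doubling, fit into a small sender-side sketch in Python. The Sender class and its method names are assumptions for illustration; a real sender would also recompute its timeout from measured round-trip times once new acknowledgements arrive, which is left out here.

```python
class Sender:
    """Toy sender that counts redundant ACKs and backs off its timer."""

    DUP_ACK_THRESHOLD = 3

    def __init__(self, initial_timeout=1.0):
        self.timeout = initial_timeout   # seconds
        self.last_ack = None
        self.dup_acks = 0

    def on_ack(self, ack):
        """Return the sequence number to retransmit at once, or None."""
        if ack == self.last_ack:
            self.dup_acks += 1
            if self.dup_acks == self.DUP_ACK_THRESHOLD:
                return ack               # the segment starting at `ack` is missing
        else:
            self.last_ack = ack          # new data acknowledged
            self.dup_acks = 0
        return None

    def on_timeout(self):
        """Double the timeout interval after a timeout-triggered retransmission."""
        self.timeout *= 2
        return self.timeout
```

Note that three redundant ACKs means four identical acknowledgement numbers in total: the original one plus three repeats.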

Flow control

TCP provides a flow-control service to eliminate the possibility of the sender overflowing the receiver’s buffer. Flow control is a speed-matching service: it matches the sender’s sending rate to the rate at which the receiving application reads.

Receive window

TCP provides flow control by having the sender maintain a variable called the receive window.

The size of the receiver’s receive buffer minus the amount of buffer in use is the remaining receive window. Since only the receiver knows how much of its buffer is in use, the receiver calculates the window size. It notifies the sender by putting the value into the 16-bit window field of the TCP header. The amount of data the sender has sent but not yet had acknowledged must not exceed the receive window.

A problem arises when the receive window is 0. If nothing more were done, the receiver would have no way to tell the sender that the window has grown again after the application has read the receive buffer, and the sender would stay blocked indefinitely. Therefore, when the receive window is 0, the sender keeps sending segments that carry one byte of data. These segments are acknowledged once the receive window opens up, and the acknowledgements carry the new window size.
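
The bookkeeping on both sides can be sketched in Python. The function and parameter names below (receive_window, can_send, last_byte_rcvd, and so on) are chosen for this example only, and the zero-window case is reduced to allowing a one-byte probe.

```python
def receive_window(buffer_size, last_byte_rcvd, last_byte_read):
    """Receiver side: free space = buffer size minus bytes received but not yet read."""
    return buffer_size - (last_byte_rcvd - last_byte_read)


def can_send(last_byte_sent, last_byte_acked, rwnd, nbytes):
    """Sender side: may `nbytes` more bytes be sent without overflowing the receiver?"""
    if rwnd == 0:
        return nbytes <= 1               # only a one-byte window probe is allowed
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= rwnd
```

For example, a 4096-byte buffer holding 2000 unread bytes advertises a window of 2096 bytes. A sender with 1000 bytes in flight may then send 1000 more, but not 1200.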

Congestion control

The sender’s rate may also have to be limited because of congestion inside the network. This form of control is called congestion control.

Congestion control is different from flow control. Congestion control addresses the case where the sending rate exceeds what the links on the path can carry. Flow control balances the rates of the sender and the receiver.

Why

The throughput of each link in the network is limited. Therefore, there are four negative effects of network congestion:

  1. Queuing delay: When the packet arrival rate approaches the link capacity, packets experience large queuing delays.
  2. Packet loss: When a packet is discarded due to congestion, all transmission capacity used by each upstream router to forward the packet is wasted.
  3. Frequent retransmission: When a packet is discarded due to congestion, the sender must perform retransmission.
  4. Routing pressure: When a packet encounters a huge time delay, the sender will make unnecessary retransmission, causing the router to use its link bandwidth to forward unnecessary packet copies.

Congestion control methods

In the face of network congestion, the fundamental solution is to use some methods to make the sender aware of network congestion, so that the sender can reduce the transmission rate. There are usually two ways:

  1. End-to-end congestion control. The end systems infer congestion from observed network behaviour, such as packet loss and delay.
  2. Network-assisted congestion control. The network layer provides congestion status information to the sender.

TCP congestion control

TCP uses end-to-end congestion control, because the IP layer traditionally gives end systems no explicit feedback about congestion; providing it would cut across the layering of the protocol stack.

The TCP sender maintains a congestion window variable that limits the rate at which it can send traffic into the network. This is how the congestion control mechanism is implemented.

Congestion perception

When the TCP sender detects a packet loss event, it concludes that the path is congested. A loss event is either an acknowledgement timeout or the arrival of three duplicate ACKs.

TCP thus uses the arrival of acknowledgements to trigger changes to its congestion window, which is why TCP is said to be self-clocking.

Congestion control algorithm

The TCP congestion control algorithm has three parts: 1. slow start; 2. congestion avoidance; 3. fast recovery. Slow start and congestion avoidance are mandatory parts of TCP, while fast recovery is recommended but not required.

  1. Slow start: During slow start, the TCP sender wants to find the available bandwidth quickly. The congestion window starts at 1 MSS (Maximum Segment Size) and grows by 1 MSS each time an acknowledgement is received. As a result, the congestion window doubles every round-trip time (RTT), so it grows exponentially during slow start.

    When a loss event indicated by a timeout occurs, the TCP sender sets the slow start threshold to half of the congestion window at the time of the loss, resets the congestion window to 1 MSS, and restarts slow start. When the window reaches or exceeds this threshold, the sender enters congestion avoidance.

  2. Congestion avoidance: On entering congestion avoidance, the congestion window is about half of its size when congestion was last detected, so congestion may not be far away. The window is therefore no longer doubled; instead it grows by one MSS per round-trip time.

    This is usually done by growing the window by MSS/cwnd bytes each time a new acknowledgement arrives, where cwnd is the congestion window measured in segments. Over one RTT, cwnd acknowledgements arrive, so the window grows by exactly one MSS, because cwnd * (MSS/cwnd) = MSS.

    In the process of linear growth, the window will also encounter packet loss events, and different decisions will be made for different packet loss events.

    • For a timeout, the reaction is the same as in slow start: set the slow start threshold to half of the current window, reset the window to 1 MSS, and restart slow start.
    • For three redundant ACKs, set the slow start threshold to half of the current window, set the window to that threshold plus 3 MSS (one for each of the three redundant ACKs), and enter fast recovery.
  3. Fast recovery: In fast recovery, the congestion window grows by 1 MSS for each further redundant ACK received for the missing segment that caused TCP to enter this state. When the ACK for the missing segment finally arrives, the sender shrinks the congestion window to the slow start threshold and enters congestion avoidance. If a timeout occurs instead, the sender sets the slow start threshold to half of the current window, resets the window to 1 MSS, and enters slow start.
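
The three phases can be tied together in a compact Python sketch. It is a simplified model under stated assumptions: the class name Reno and its methods are invented for this example, the window is measured in units of MSS, a timeout or three redundant ACKs sets the threshold to half of the current window, and details such as a minimum threshold are omitted.

```python
class Reno:
    """Toy congestion window state machine; cwnd and ssthresh are in MSS."""

    def __init__(self, ssthresh=64.0):
        self.cwnd = 1.0
        self.ssthresh = ssthresh
        self.state = "slow_start"

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh          # deflate the window, resume linearly
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += 1                     # +1 MSS per ACK: doubles every RTT
            if self.cwnd >= self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += 1 / self.cwnd         # about +1 MSS per RTT

    def on_triple_dup_ack(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.ssthresh + 3          # one MSS per redundant ACK seen
        self.state = "fast_recovery"

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += 1                     # each further redundant ACK

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0
        self.state = "slow_start"
```

Feeding the object a stream of events and printing cwnd after each one shows the familiar sawtooth: fast growth up to the threshold, slow linear growth after it, and a sharp cut whenever a loss is detected.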

On the whole, TCP congestion control works as follows: the window grows by one MSS per round-trip time until a loss event occurs, at which point the window is halved. TCP congestion control is therefore often called additive-increase, multiplicative-decrease (AIMD). The advantage of this algorithm is that it avoids congestion while maintaining good throughput.

Other features

TCP Retry mechanism