TCP

1. TCP-related concepts

[!NOTE] Transmission Control Protocol (TCP) is a connection-oriented, reliable, byte-stream-based transport protocol. TCP communication requires a connection: establishing it takes a three-way handshake, and closing it takes a four-way wave.

1.1 The TCP header

The following fields are important for TCP headers:

  • Sequence number: identifies where a segment's data belongs in the byte stream, so the peer can reassemble the segments in order even if they arrive out of order.

  • Acknowledgement number: the number of the next byte the receiver expects, which also confirms that all data before that number has been received.

  • Window size: how many more bytes the receiver can accept; used for flow control.

  • Flags

    • ACK=1: The acknowledgement number field is valid. TCP also requires ACK to be set to 1 in every segment sent after the connection is established.
    • SYN=1: With SYN=1 and ACK=0, the segment is a connection request. With SYN=1 and ACK=1, the segment is a reply that accepts the connection.
    • FIN=1: The segment is a request to release the connection.
    • URG=1: The segment carries urgent, high-priority data, and the urgent pointer field is valid. The urgent data sits at the start of the segment's data section, and the urgent pointer marks where it ends.
    • PSH=1: The receiver should push the data to the application layer immediately rather than wait until its buffer is full.
    • RST=1: The connection has hit a serious error and may need to be re-established. RST is also used to reject invalid segments and connection requests.
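
To make the fields above concrete, here is a minimal Python sketch that decodes the fixed 20-byte part of a TCP header. The function name `parse_tcp_header` and the sample values are illustrative assumptions; the field order and flag bit positions follow the standard TCP header layout, and header options are ignored.

```python
import struct

# Flag bit masks in the 16-bit "data offset + flags" field of the TCP header.
FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20}

def parse_tcp_header(header: bytes) -> dict:
    """Decode the fixed 20-byte TCP header (hypothetical helper, options ignored)."""
    (src_port, dst_port, seq, ack, offset_flags,
     window, _checksum, urgent_ptr) = struct.unpack("!HHIIHHHH", header[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,                # sequence number
        "ack": ack,                # acknowledgement number (valid when ACK is set)
        "window": window,          # receive window advertised for flow control
        "urgent_ptr": urgent_ptr,  # meaningful only when URG is set
        "flags": sorted(name for name, bit in FLAGS.items() if offset_flags & bit),
    }

# Hand-built sample: a SYN+ACK segment from port 80 to port 12345.
sample = struct.pack("!HHIIHHHH", 80, 12345, 1000, 501, (5 << 12) | 0x12, 65535, 0, 0)
parsed = parse_tcp_header(sample)
print(parsed["flags"])  # ['ACK', 'SYN']
```

A segment with SYN and ACK both set, as in the sample, is the reply that accepts a connection request.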

1.2 Three-way handshake

  • First handshake

    • SYN = 1, seq (client) = x
    • The client sends a connection request segment to the server. The segment carries the client's initial sequence number. After sending it, the client enters the SYN-SENT state.
  • Second handshake

    • SYN = 1, ACK = 1, acknowledgement number = x + 1, seq (server) = y
    • If the server accepts the connection request, it replies with a segment that carries its own initial sequence number. After sending the reply, the server enters the SYN-RECEIVED state.
  • Third handshake

    • ACK = 1, acknowledgement number = y + 1, seq (client) = x + 1
    • After receiving the server's reply, the client sends an acknowledgement to the server and enters the ESTABLISHED state. When the server receives the acknowledgement, it also enters the ESTABLISHED state.
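
The sequence and acknowledgement numbers in the three steps above can be traced with a small Python sketch. The function `three_way_handshake` and the dictionary layout are assumptions made for illustration; only the arithmetic (each side acknowledges the peer's initial sequence number plus one, modulo 2^32) is taken from the protocol.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around

def three_way_handshake(x: int, y: int) -> list[dict]:
    """Return the three handshake segments for client ISN x and server ISN y."""
    syn = {"from": "client", "SYN": 1, "ACK": 0, "seq": x, "ack": None}
    syn_ack = {"from": "server", "SYN": 1, "ACK": 1,
               "seq": y, "ack": (x + 1) % SEQ_MOD}
    ack = {"from": "client", "SYN": 0, "ACK": 1,
           "seq": (x + 1) % SEQ_MOD, "ack": (y + 1) % SEQ_MOD}
    return [syn, syn_ack, ack]

for segment in three_way_handshake(x=100, y=300):
    print(segment)
```

With x = 100 and y = 300, the second segment acknowledges 101 and the third carries seq 101 and acknowledges 301.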

1.3 Why not a two-way handshake?

[!NOTE] The third handshake mainly prevents a stale connection request from suddenly reaching the server and wasting its resources.

Suppose only two handshakes were used. The client sends connection request A1, but a fault on the link delays it badly. Having received no acknowledgement of A1, the client retransmits the request as A2. The server receives A2 normally and replies with acknowledgement B2; with no third handshake, the client and server are now connected. Later, the delayed A1 finally reaches the server, which replies with acknowledgement B1 and considers a second connection established. The client has already discarded A1, so it drops B1, but the server keeps this "zombie" connection open, wasting its resources.

An analogy:

  • 1. The customer sends the waiter a flirtatious message.
  • 2. The waiter receives the message, reads it, is delighted, and writes back immediately (at this point the customer does not yet know the message was received).
  • 3. The customer is delighted to receive the waiter's reply confirming the relationship (but the waiter does not know the reply arrived; without a confirmation the waiter would have to keep resending it, in theory until the seas run dry and the rocks crumble).
  • 4. The waiter finally receives the customer's confirmation and can stop worrying.
  • 5. Now the customer and the waiter have truly established a reliable channel: both know that it works.

So at least three messages are needed. If the server does not receive the final confirmation, there are two possible reasons:

  • The client never received the server's reply,
  • or the client did reply, but the reply never reached the server.

If you’ve ever used a walkie-talkie, you know:

  • C -> S: Can you hear me?
  • S -> C: I hear you. Can you hear me?
  • C -> S: I hear you.

With a three-way handshake, a stale request can still reach the server, and the server still replies with an acknowledgement, but the client does not send the final acknowledgement. Because that acknowledgement never arrives, the server knows the client did not request a connection and does not waste resources on a new one.

1.3 Why a three-way handshake is used to establish a connection, and not four

First handshake:

The Client cannot confirm anything. The Server confirms that the Client can send and that it can receive.

Second handshake:

The Client confirms that it can send and receive, and that the Server can send and receive. The Server still only knows that the Client can send and that it can receive.

Third handshake:

The Client's knowledge is unchanged: both sides can send and receive. The Server now also confirms that it can send and that the Client can receive.

Therefore, three handshakes are enough to confirm that sending and receiving work on both sides. A fourth handshake would be acceptable but unnecessary.

1.4 Four-way wave

  • First wave

    • If client A considers the data transfer complete, it sends a connection release request (FIN) to server B.
  • Second wave

    • After receiving the connection release request, B notifies its application layer that the connection is being released, sends an ACK, and enters the CLOSE_WAIT state. The A-to-B direction is now closed, and B will no longer receive data from A. Because a TCP connection is bidirectional, however, B can still send data to A.
  • Third wave

    • If B still has data to send, it finishes sending it and then sends its own connection release request (FIN) to A. B then enters the LAST_ACK state.
    • PS: With delayed acknowledgement, B can hold back its ACK and combine the second and third waves into one segment. The delay usually has a time limit; otherwise the peer would mistakenly conclude that a retransmission is needed.
  • Fourth wave

    • After receiving B's release request, A sends an acknowledgement to B and enters the TIME_WAIT state. This state lasts 2MSL, where MSL (maximum segment lifetime) is the longest a segment can survive in the network before being discarded. If B does not retransmit its FIN during this period, A enters the CLOSED state. When B receives the acknowledgement, it also enters the CLOSED state.
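
The four waves can also be read as state transitions on each side. The Python sketch below is a simplified table covering only the normal close path; the event names are made up for illustration, and the FIN_WAIT_1 and FIN_WAIT_2 states (standard TCP states that A passes through, not named in the text above) are included so that A's path is complete.

```python
# (current state, event) -> next state, for an orderly close only.
CLOSE_TRANSITIONS = {
    ("ESTABLISHED", "send FIN"): "FIN_WAIT_1",            # A: first wave
    ("ESTABLISHED", "recv FIN, send ACK"): "CLOSE_WAIT",  # B: second wave
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",             # A: B's ACK arrives
    ("CLOSE_WAIT", "send FIN"): "LAST_ACK",               # B: third wave
    ("FIN_WAIT_2", "recv FIN, send ACK"): "TIME_WAIT",    # A: fourth wave
    ("LAST_ACK", "recv ACK"): "CLOSED",                   # B: done
    ("TIME_WAIT", "2MSL timeout"): "CLOSED",              # A: done after 2MSL
}

def run_close(state: str, events: list[str]) -> str:
    """Apply the events in order and return the final state."""
    for event in events:
        state = CLOSE_TRANSITIONS[(state, event)]
    return state

a_events = ["send FIN", "recv ACK", "recv FIN, send ACK", "2MSL timeout"]
b_events = ["recv FIN, send ACK", "send FIN", "recv ACK"]
print(run_close("ESTABLISHED", a_events))  # CLOSED
print(run_close("ESTABLISHED", b_events))  # CLOSED
```

Note that only A, the side that closes first, passes through TIME_WAIT.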

1.5 Why does A enter the TIME_WAIT state and wait 2MSL before entering the CLOSED state?

To ensure that B receives A's acknowledgement. If A entered the CLOSED state immediately after sending the acknowledgement and the acknowledgement were lost in the network, B could not close normally.

In that case B would time out waiting and retransmit its FIN, but A would already be closed and would not acknowledge it, so B could never reach the CLOSED state normally. By waiting 2MSL, A stays around long enough to receive a retransmitted FIN and acknowledge it again.

1.6 Why is the TIME_WAIT state required?

1.6.1 To achieve reliable release of TCP full-duplex connections

The wait lets TCP send the final ACK again if the first one is lost (the other end times out and retransmits its final FIN). A side effect of the 2MSL wait is that the socket pair defining the connection (client IP address and port, server IP address and port) cannot be reused during the wait. It becomes available again only after 2MSL has elapsed.

1.6.2 To let old segments expire in the network

Every TCP implementation must choose a maximum segment lifetime (MSL): the longest any segment may stay in the network before it is discarded. Waiting 2MSL ensures that segments from the old connection have expired before the same socket pair is used again.

1.7 Why is establishing a connection a three-way handshake, but closing a connection a four-way wave?

  1. During connection establishment, a server in the LISTEN state that receives a SYN can send its ACK and its own SYN to the client in a single segment.

  2. During connection release, a FIN from the peer only means that the peer has no more data to send; the peer can still receive data, and the server may not have finished sending its own. The server therefore acknowledges the FIN first and sends its own FIN later, once its data is sent, so the ACK and FIN usually travel separately and the release takes four steps.

2. ARQ (timeout retransmission) protocol

[!NOTE] ARQ ensures correct delivery of data through acknowledgements and timeouts. It comes in two forms: stop-and-wait ARQ and continuous ARQ.

2.1 Stop-and-wait ARQ

Normal transmission process

Whenever A sends a packet to B, it stops sending and starts a timer while it waits for B's reply. If the reply arrives before the timer expires, A cancels the timer and sends the next packet.

When a packet is lost or an error occurs:

Packet lost in transit: when the timer expires, A sends the lost packet again, and keeps doing so until the peer responds. A must therefore keep a copy of each packet it sends until it is acknowledged.

Packet corrupted in transit: the peer discards the packet and waits for the retransmission.

PS: The timer is generally set longer than the average RTT.

ACK timed out or lost:

The peer's reply may itself be lost or delayed. When the timer expires, A retransmits the packet. B then receives a packet with a sequence number it has already seen, discards it, and sends the reply again; this repeats until A sends the packet with the next sequence number.

The drawback of this protocol is low transmission efficiency: even on a good network, the sender must wait for the peer's ACK after every packet.
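
The cases above (lost packet, lost ACK, duplicate discarded by the receiver) can be replayed with a small deterministic simulation. This is only a sketch: the function `stop_and_wait` and its `lost_data` / `lost_acks` parameters are hypothetical, and a "timer expiry" is modelled simply as another pass through the loop.

```python
def stop_and_wait(packets, lost_data=frozenset(), lost_acks=frozenset()):
    """Simulate stop-and-wait ARQ.

    lost_data / lost_acks hold the 1-based transmission attempts on which the
    packet or its ACK is lost. Returns (payloads delivered, total transmissions).
    """
    delivered = []
    expected_seq = 0   # receiver side: next sequence number it will accept
    attempt = 0        # sender side: number of transmissions so far
    for seq, payload in enumerate(packets):
        while True:
            attempt += 1
            if attempt in lost_data:
                continue                 # packet lost: timer expires, retransmit
            if seq == expected_seq:
                delivered.append(payload)
                expected_seq += 1
            # otherwise it is a duplicate: the receiver discards it but still ACKs
            if attempt in lost_acks:
                continue                 # ACK lost: timer expires, retransmit
            break                        # ACK received: move on to the next packet
    return delivered, attempt

delivered, sent = stop_and_wait(["a", "b", "c"], lost_data={1}, lost_acks={3})
print(delivered, sent)  # ['a', 'b', 'c'] 5
```

Three packets cost five transmissions here, and the duplicate of "b" is not delivered twice.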

2.2 Continuous ARQ

In continuous ARQ, the sender has a send window and can send all the data in the window back to back without waiting for replies. This cuts waiting time and improves efficiency compared with stop-and-wait ARQ.

2.2.1 Cumulative acknowledgement

In continuous ARQ the receiver gets packets continuously. Replying to each packet individually, as stop-and-wait ARQ does, would waste resources, so the receiver can send one reply after receiving several packets. The ACK number in that reply tells the sender that all data before that sequence number has arrived and that it should send the data starting at that sequence number next.

Cumulative acknowledgement has a drawback, though. Suppose the receiver has packets up to sequence number 5, misses packet 6, but receives packet 7 and later ones. The ACK can only say 6, so the sender retransmits data that has already arrived. SACK (selective acknowledgement) solves this problem.
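
The example with packets 5, 6 and 7 can be checked with a short Python sketch. Both helper names are assumptions, packets are numbered from 1 as in the example, and `sack_list` is a deliberately simplified stand-in for SACK (real SACK reports byte ranges in a TCP option, not packet numbers).

```python
def cumulative_ack(received: set[int]) -> int:
    """Next packet number expected: one past the longest in-order run from 1."""
    expected = 1
    while expected in received:
        expected += 1
    return expected

def sack_list(received: set[int]) -> list[int]:
    """Packets received beyond the gap, which a cumulative ACK cannot report."""
    ack = cumulative_ack(received)
    return sorted(n for n in received if n > ack)

received = {1, 2, 3, 4, 5, 7, 8}
print(cumulative_ack(received))  # 6
print(sack_list(received))       # [7, 8]
```

Without the extra list, the sender only learns "6" and may resend 7 and 8 as well; with it, only packet 6 needs to be retransmitted.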

2.2.2 Sliding window

The send window was mentioned above. In TCP, both ends maintain a window: the send window and the receive window.

The send window covers data that has been sent but not yet acknowledged, plus data that may be sent but has not been sent yet.

After receiving an acknowledgement, the sender slides the window forward.

The sliding window implements flow control: in its segments, the receiver tells the sender how much data it may send, which ensures that the receiver can keep up with the incoming data.

Zero window

While sending, the sender may find that the peer advertises a zero window. The sender then stops sending data and starts a persist timer. Each time the timer fires, the sender sends a probe asking the peer to report its window size. If the number of retries exceeds a limit, the TCP connection may be terminated.
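
A minimal Python sketch of the sender's side of the sliding window, including the zero-window case, might look like this. The class `SendWindow` and its method names are hypothetical, byte offsets start at 0 for simplicity, and timers and retransmission are left out.

```python
class SendWindow:
    """Sender-side sliding window, counted in bytes (illustrative only)."""

    def __init__(self, window: int):
        self.base = 0          # oldest byte sent but not yet acknowledged
        self.next_seq = 0      # next byte to be sent
        self.window = window   # window size most recently advertised by the receiver

    def sendable(self) -> int:
        """Bytes that may still be sent without exceeding the window."""
        return max(0, self.base + self.window - self.next_seq)

    def send(self, nbytes: int) -> int:
        """Send up to nbytes and return how many bytes actually went out."""
        sent = min(nbytes, self.sendable())
        self.next_seq += sent
        return sent

    def on_ack(self, ack_no: int, advertised: int) -> None:
        """Slide the window forward and record the receiver's new window."""
        if self.base < ack_no <= self.next_seq:
            self.base = ack_no
        self.window = advertised

w = SendWindow(window=1000)
print(w.send(600), w.send(600))  # 600 400  (the second send is cut to fit)
w.on_ack(600, advertised=1000)
print(w.sendable())              # 600
w.on_ack(1000, advertised=0)
print(w.sendable())              # 0  (zero window: stop and wait for a probe reply)
```

When `sendable()` returns 0 because the advertised window is 0, a real sender would start the persist timer described above.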

3. Congestion handling

Congestion handling differs from flow control. Flow control acts on the receiver, ensuring that it can keep up with incoming data. Congestion handling acts on the network, preventing too much data from clogging it and overloading it.

Congestion handling includes four algorithms: slow start, congestion avoidance, fast retransmission, and fast recovery.

3.1 Slow start algorithm

[!NOTE] Slow start, as the name implies, starts the send window at 1 and grows it exponentially, so that a burst of data at the start of a transmission does not congest the network.

The steps of the slow start algorithm are as follows:

  1. When the connection starts, set the congestion window to 1 MSS (the maximum amount of data in one segment).
  2. Double the window every RTT.
  3. Exponential growth cannot go on forever, so there is a threshold: once the window size exceeds it, the congestion avoidance algorithm takes over.

3.2 Congestion avoidance algorithm

[!NOTE] Congestion avoidance is simpler: the window grows by 1 MSS per RTT. This avoids the congestion that exponential growth would cause and lets the window approach its optimal size gradually.

If the timer times out during transmission, TCP considers that the network is congested and performs the following operations immediately:

  • Set the threshold to half of the current congestion window
  • Set the congestion window to 1 MSS
  • Restart slow start; congestion avoidance takes over again once the window reaches the new threshold
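
The interplay of slow start, congestion avoidance and the timeout reaction can be sketched in Python as follows. The function `simulate_cwnd` is hypothetical and works in whole MSS units per RTT. It makes three labelled assumptions: after a timeout the window grows by slow start again until it reaches the new threshold, slow-start growth is capped at the threshold, and the threshold never drops below 2 MSS.

```python
def simulate_cwnd(rtts: int, ssthresh: int, timeout_at=frozenset()) -> list[int]:
    """Return the congestion window (in MSS) at the start of each RTT."""
    cwnd = 1                               # start at 1 MSS
    history = []
    for rtt in range(rtts):
        history.append(cwnd)
        if rtt in timeout_at:
            ssthresh = max(cwnd // 2, 2)   # threshold = half the current window (assumed floor: 2)
            cwnd = 1                       # window back to 1 MSS
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh) # slow start: double every RTT (assumed cap at threshold)
        else:
            cwnd += 1                      # congestion avoidance: +1 MSS every RTT
    return history

print(simulate_cwnd(rtts=10, ssthresh=8, timeout_at={6}))
# [1, 2, 4, 8, 9, 10, 11, 1, 2, 4]
```

The output shows exponential growth up to the threshold of 8, linear growth after it, and the drop back to 1 MSS when the timer expires in round 6.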

3.3 Fast Retransmission

Fast retransmission usually goes together with fast recovery. When segments arrive out of order, the receiver keeps acknowledging only the last in-order sequence number (when SACK is not in use). If the sender receives three duplicate ACKs, it retransmits immediately instead of waiting for the timer to expire. There are two specific algorithms:
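
Independently of which recovery algorithm follows, the trigger itself (three duplicate ACKs) can be sketched in Python. The function `fast_retransmit_points` is a hypothetical helper that only detects when fast retransmission would fire; it does not model what happens to the congestion window afterwards.

```python
def fast_retransmit_points(acks: list[int], threshold: int = 3) -> list[int]:
    """Return the ACK numbers for which fast retransmission would fire."""
    fired = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1               # same ACK again: one more duplicate
            if dup_count == threshold:
                fired.append(ack)        # third duplicate: retransmit right away
        else:
            last_ack = ack               # new ACK number: reset the counter
            dup_count = 0
    return fired

print(fast_retransmit_points([2, 3, 4, 4, 4, 4, 4, 7]))  # [4]
print(fast_retransmit_points([2, 3, 3, 3, 4]))           # []
```

In the first trace the segment starting at 4 is retransmitted once, on the third duplicate; in the second, two duplicates are not enough.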

4. TCP summary

4.1 Why is TCP so complex?

[!NOTE] Because it must guarantee reliability while also maximizing performance.

4.1.1 Mechanism for ensuring reliability

  • Checksum
  • Sequence numbers (in-order delivery)
  • Acknowledgements
  • Timeout retransmission
  • Connection management
  • Flow control
  • Congestion control

4.1.2 Mechanisms for improving performance

  • Sliding window
  • Fast retransmission
  • Delayed acknowledgement
  • Piggybacked acknowledgement

4.2 Timers

  • Retransmission timeout timer
  • Keepalive timer
  • TIME_WAIT timer

4.3 Application Layer Protocols based on TCP

  • HTTP
  • HTTPS
  • SSH
  • Telnet
  • FTP
  • SMTP