TCP Data Format

Source port / destination port

Processes on one host communicate with processes on another host through ports, and a port is generally bound by only one process at a time, so the source port and the destination port identify which two processes are communicating. Each port number takes 2 bytes, so there are 2^16 = 65536 possible ports (0 to 65535). The client's source port is usually an ephemeral port that the operating system assigns temporarily.

Sequence number

4 bytes. The number of the first data byte carried in this segment. Every byte of the byte stream sent over a TCP connection is numbered in order. Because the sequence number is 32 bits wide, it wraps around: after reaching 2^32 - 1, it starts again at 0.
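Because sequence numbers wrap, a plain `<` comparison breaks near the wraparound point, so implementations compare them with modular arithmetic. Below is a minimal Python sketch of that idea; seq_add and seq_before are hypothetical helper names, and the comparison assumes the two numbers are less than 2^31 apart (the same idea as the before() helper in the Linux kernel's TCP code).

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32 bits wide


def seq_add(seq: int, nbytes: int) -> int:
    """Advance a sequence number by nbytes, wrapping from 2**32 - 1 back to 0."""
    return (seq + nbytes) % SEQ_MOD


def seq_before(a: int, b: int) -> bool:
    """True if sequence number a comes before b (assumes they are < 2**31 apart)."""
    distance = (b - a) % SEQ_MOD
    return 0 < distance < 2 ** 31


print(seq_add(2 ** 32 - 1, 1))             # 0: the counter wraps around
print(seq_before(0xFFFFFFF0, 0x00000010))  # True: 0x10 is 32 bytes later
print(seq_before(5, 3))                    # False
```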

Acknowledgment number

4 bytes. The sequence number of the first data byte that the receiver expects to receive next from the other side; it is valid only when ACK=1.

Data offset

4 bits. The length of the TCP header, i.e. the offset from the start of the segment to the start of its data, counted in 4-byte units. The minimum is 5 × 4 = 20 bytes (no options) and the maximum is (2^4 - 1) × 4 = 60 bytes.

Reserved

6 bits, reserved for future use; they must currently be 0.

Urgent URG

When URG=1, the urgent pointer field is valid and the segment contains urgent data that should be processed with priority.

Acknowledgment ACK

The acknowledgment number field is valid only when ACK=1. TCP requires every segment sent after the connection is established to have ACK set to 1.

Push PSH

PSH=1 tells the receiver to deliver the data to the application layer as soon as possible instead of waiting for its buffer to fill up.

Reset RST

1 bit. RST=1 indicates a serious error in the connection, so the connection must be released and then re-established. RST is also used to reject invalid segments and to refuse connection requests.

Synchronize SYN

SYN=1 with ACK=0 marks a connection request segment, which is used to establish a connection and synchronize sequence numbers. If the other side agrees to the connection, its response sets SYN=1 and ACK=1.

Finish FIN

FIN=1 indicates that the sender has finished sending data and requests to release the connection.

Window

2 bytes. The number of bytes, counting from the acknowledgment number, that the sender of this segment is currently willing to receive, i.e. the size of its receive window. Used for flow control.

Checksum

2 bytes. Computed in 16-bit words over the entire TCP segment (header and data) plus a pseudo-header taken from the IP header. This field is mandatory.

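Urgent pointer

2 bytes, meaningful only when URG=1. It is an offset from this segment's sequence number to the end of the urgent data, i.e. it tells how many bytes of urgent data the segment carries; the data after that point is ordinary data.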
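To make the fixed-header layout concrete, here is a minimal Python sketch that decodes the first 20 bytes with the standard struct module. The function name parse_tcp_header and the hand-built SYN are illustrative assumptions; the sketch skips the reserved bits, ignores options, and does not verify the checksum.

```python
import struct

# Flag bits in the low 6 bits of the 16-bit "data offset / reserved / flags" word
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]


def parse_tcp_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (network byte order)."""
    if len(raw) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    (src_port, dst_port, seq, ack, offset_flags,
     window, checksum, urgent_ptr) = struct.unpack("!HHIIHHHH", raw[:20])
    data_offset = offset_flags >> 12  # top 4 bits, counted in 4-byte words
    flags = [name for bit, name in enumerate(FLAG_NAMES) if offset_flags & (1 << bit)]
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "header_len": data_offset * 4,  # 20 to 60 bytes
        "flags": flags,
        "window": window, "checksum": checksum, "urgent_ptr": urgent_ptr,
    }


# A hand-built SYN: data offset 5 (20 bytes), SYN bit set, checksum left at 0
syn = struct.pack("!HHIIHHHH", 50000, 80, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0)
info = parse_tcp_header(syn)
print(info["header_len"], info["flags"])  # 20 ['SYN']
```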

Options

Variable length, up to 40 bytes. Each option starts with a 1-byte Kind field giving the option type; except for Kind=0 and Kind=1, it is followed by a 1-byte Length field giving the total length of the option in bytes.

Kind=0: End of option list (1 byte).

Kind=1: No operation (NOP, 1 byte), used as padding to align option fields on word boundaries.

Kind=2: Maximum Segment Size (MSS, 4 bytes). Usually sent in the SYN segments when the connection is established, it states the maximum segment length (data only) that this end can receive. MSS is commonly set to MTU - 40 bytes (a 20-byte IP header plus a 20-byte TCP header), so that the IP datagram carrying a TCP segment does not exceed the MTU and no IP fragmentation occurs on the local host. For Ethernet the MTU is 1500 bytes, which gives an MSS of 1460 bytes. The option is only meaningful in SYN segments and is ignored otherwise.

Kind=3: Window scale (3 bytes, wscale), a shift count from 0 to 14. The 16-bit window value is shifted left by this many bits, i.e. multiplied by 2^wscale. It can only be used in SYN segments and is ignored otherwise. It exists because today's TCP receive buffers (receive windows) are often larger than 65535 bytes, the most a 16-bit Window field can express.

Kind=4: SACK permitted (sackOK, 2 bytes). The sender supports and agrees to use the SACK option.

Kind=5: SACK (selective acknowledgment) option.

Each block has a Left Edge (4 bytes) and a Right Edge (4 bytes), so one pair of edges takes 8 bytes. Because the options area of the TCP header is at most 40 bytes, the SACK option can carry at most four blocks: maximum size of the SACK option = 4 × 8 + 2 = 34 bytes.

Kind=8: Timestamp (10 bytes, TCP Timestamps Option, TSopt).

It carries a Timestamp Value field (TSval, 4 bytes) and a Timestamp Echo Reply field (TSecr, 4 bytes).
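The Kind/Length layout above can be walked with a short loop. This is a minimal Python sketch, not a hardened parser; parse_tcp_options and the sample option bytes are hypothetical, and unknown kinds are returned as raw bytes.

```python
import struct


def parse_tcp_options(opts: bytes) -> list:
    """Walk the options area: 1-byte Kind, then (except Kind 0 and 1) 1-byte Length."""
    result, i = [], 0
    while i < len(opts):
        kind = opts[i]
        if kind == 0:                      # End of option list: stop parsing
            result.append(("EOL", None))
            break
        if kind == 1:                      # NOP: a single padding byte, no Length
            result.append(("NOP", None))
            i += 1
            continue
        if i + 1 >= len(opts):
            raise ValueError("truncated option")
        length = opts[i + 1]               # total length, including Kind and Length
        if length < 2 or i + length > len(opts):
            raise ValueError(f"malformed option kind={kind}")
        body = opts[i + 2:i + length]
        if kind == 2:
            result.append(("MSS", struct.unpack("!H", body)[0]))
        elif kind == 3:
            result.append(("WSCALE", body[0]))  # window is shifted left by this much
        elif kind == 4:
            result.append(("SACK_PERMITTED", None))
        elif kind == 5:
            # (length - 2) / 8 blocks, each a 4-byte Left Edge and 4-byte Right Edge
            edges = struct.unpack("!" + "I" * ((length - 2) // 4), body)
            result.append(("SACK", list(zip(edges[0::2], edges[1::2]))))
        elif kind == 8:
            result.append(("TIMESTAMP", struct.unpack("!II", body)))  # (TSval, TSecr)
        else:
            result.append((f"KIND_{kind}", body))
        i += length
    return result


# MSS 1460, NOP, window scale 7, SACK permitted, end of list
syn_opts = b"\x02\x04\x05\xb4" + b"\x01" + b"\x03\x03\x07" + b"\x04\x02" + b"\x00"
print(parse_tcp_options(syn_opts))
```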

Reliable transport

  1. First, TCP establishes the connection with a three-way handshake and releases it with a four-way wave (four-way handshake), so that the transmission channel itself is reliable.

  2. Second, TCP uses the continuous ARQ protocol to guarantee that data is transmitted correctly, and the sliding window protocol so that the receiver can process incoming data in time (flow control).

  3. Finally, TCP uses slow start, congestion avoidance, fast retransmission, and fast recovery for congestion control to avoid network congestion.

The three-way handshake establishes a TCP connection

Assume that client A actively opens the connection, while server B passively opens the connection.

  1. The server process B first creates a transmission control block (TCB) and gets ready to accept connection requests from client process A. Server B then enters the LISTEN state.
  2. Client A creates its own TCB and sends a connection request (SYN) segment to server B. The SYN bit in the header is set to 1, and the segment carries a randomly chosen initial sequence number x (seq=x). Client A enters the SYN-SENT state. TCP specifies that a SYN segment (SYN=1) cannot carry data but still consumes one sequence number.
  3. If server B agrees to the connection after receiving the request, it replies with a SYN/ACK segment in which SYN=1, ACK=1, the acknowledgment number is ack=x+1, and its own random initial sequence number is y (seq=y). Server B enters the SYN-RCVD (synchronization received) state. This segment also carries no data but consumes one sequence number.
  4. After receiving the SYN/ACK, client A sends an acknowledgment (ACK) segment to server B with ACK=1, ack=y+1 and seq=x+1. The TCP connection is now established from A's side, and client A enters the ESTABLISHED state.
  5. Server B enters the ESTABLISHED state after receiving confirmation from client A.

The purpose of the three-way handshake is to set up a reliable connection over an unreliable channel, with each side informing the other of the initial value of its sequence numbers.
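The sequence and acknowledgment numbers in the steps above can be traced with a toy Python sketch. It is not a TCP stack: three_way_handshake is a hypothetical name, the initial sequence numbers are either passed in or drawn with random.randrange, and each SYN is counted as consuming one sequence number, as the text states.

```python
import random

MOD = 2 ** 32


def three_way_handshake(x=None, y=None):
    """Trace the three handshake segments and the resulting states (not a real TCP stack)."""
    x = random.randrange(MOD) if x is None else x  # client A's initial sequence number
    y = random.randrange(MOD) if y is None else y  # server B's initial sequence number
    state = {"A": "CLOSED", "B": "LISTEN"}        # B has created its TCB and is listening
    segments = []

    # 1. A -> B: SYN=1, seq=x (no data, but the SYN consumes one sequence number)
    segments.append({"from": "A", "SYN": 1, "ACK": 0, "seq": x})
    state["A"] = "SYN-SENT"

    # 2. B -> A: SYN=1, ACK=1, seq=y, ack=x+1
    segments.append({"from": "B", "SYN": 1, "ACK": 1, "seq": y, "ack": (x + 1) % MOD})
    state["B"] = "SYN-RCVD"

    # 3. A -> B: ACK=1, seq=x+1, ack=y+1 (B's SYN also consumed one number)
    segments.append({"from": "A", "SYN": 0, "ACK": 1,
                     "seq": (x + 1) % MOD, "ack": (y + 1) % MOD})
    state["A"] = "ESTABLISHED"
    state["B"] = "ESTABLISHED"  # once B receives the ACK
    return segments, state


segments, state = three_way_handshake(x=100, y=300)
for seg in segments:
    print(seg)
print(state)
```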

Releasing the TCP connection with a four-way wave

After data transmission is complete, either side can release the connection. Both client A and server B start in the ESTABLISHED state; client A closes actively, and server B closes passively.

  1. Client A sends a connection release segment and stops sending data. Its header has FIN=1 and ACK=1, and its sequence number is seq=u (the sequence number of the last byte of data already sent, plus 1). Client A enters the FIN-WAIT-1 state. TCP specifies that a FIN segment consumes one sequence number even if it carries no data.
  2. Server B receives the release segment and replies with an acknowledgment with ACK=1, ack=u+1 and its own sequence number seq=v, then enters the CLOSE-WAIT state. Server B's TCP notifies the application process above it. The connection in the direction from A to B is now released and the connection is half-closed: client A has no more data to send, but it must still accept any data that server B sends. This half-closed state lasts for the entire CLOSE-WAIT period.
  3. When client A receives server B's acknowledgment, it enters the FIN-WAIT-2 state and waits for server B's connection release segment, while still receiving any data that server B sends before then.
  4. After sending its last data, server B sends a connection release segment to client A with FIN=1, ACK=1 and ack=u+1. Because server B may have sent more data while half-closed, assume its sequence number is now seq=w. Server B enters the LAST-ACK state and waits for client A's acknowledgment.
  5. After receiving server B's release segment, client A must send an acknowledgment with ACK=1, ack=w+1 and seq=u+1, and then enters the TIME-WAIT state. Note that the TCP connection is not yet released at this point: client A enters the CLOSED state only after 2MSL (MSL: maximum segment lifetime) has elapsed and it has deleted its TCB.
  6. Server B enters the CLOSED state as soon as it receives client A's acknowledgment, and deleting its TCB ends the TCP connection on its side. Server B therefore closes the connection earlier than client A.
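The four segments and their numbers can be laid out side by side with a small Python sketch. four_way_wave is a hypothetical helper that only reproduces the u, v, w numbering from the steps above; it ignores sequence-number wraparound and models no timers.

```python
def four_way_wave(u: int, v: int, w: int) -> list:
    """
    Trace an active close by client A using the text's numbering:
    u = A's FIN sequence number, v = B's sequence number in its first ACK,
    w = B's sequence number when it sends FIN (w > v if B sent more data).
    """
    return [
        # 1. A: FIN=1, ACK=1, seq=u          -> A enters FIN-WAIT-1
        ("A->B", {"FIN": 1, "ACK": 1, "seq": u}, "A: FIN-WAIT-1"),
        # 2. B: ACK=1, seq=v, ack=u+1         -> B: CLOSE-WAIT, A (on receipt): FIN-WAIT-2
        ("B->A", {"ACK": 1, "seq": v, "ack": u + 1}, "B: CLOSE-WAIT, A: FIN-WAIT-2"),
        # 3. B: FIN=1, ACK=1, seq=w, ack=u+1  -> B enters LAST-ACK
        ("B->A", {"FIN": 1, "ACK": 1, "seq": w, "ack": u + 1}, "B: LAST-ACK"),
        # 4. A: ACK=1, seq=u+1, ack=w+1       -> A: TIME-WAIT for 2MSL, B: CLOSED
        ("A->B", {"ACK": 1, "seq": u + 1, "ack": w + 1}, "A: TIME-WAIT, B: CLOSED"),
    ]


for direction, fields, states in four_way_wave(u=1000, v=5000, w=5200):
    print(direction, fields, states)
```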

Stop-and-wait protocol

Assume that A is the sender and B is the receiver

Stop-and-wait means that after sending a packet, the sender stops and waits for the receiver's acknowledgment; it sends the next packet only after that acknowledgment arrives.

Error-free case

Each time A sends a packet, it receives an acknowledgment for that packet and then sends the next one.

Error cases

There are two ways things can go wrong:

  1. B receives packet M but detects an error in it
  2. Packet M was lost during transmission

In the first case, B detects the error in M, discards it, and does nothing else (it does not notify A that it received a corrupted packet). In the second case, B never receives M, so it has nothing to acknowledge and sends nothing. In both cases A receives no acknowledgment and retransmits M when its timeout timer expires, which is why A must keep a copy of M until M is acknowledged.

Lost acknowledgment

B receives M, but the acknowledgment it sends for M is lost. If A receives no acknowledgment within the retransmission timeout, it retransmits M. B then receives M a second time and does two things:

  1. Discard the duplicate packet
  2. Send an acknowledgment to A

Late acknowledgment

B's acknowledgment of M arrives late. A times out and retransmits M, and B discards the duplicate M and sends its acknowledgment again. When the late acknowledgment finally reaches A, A simply discards it.
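The cases above can be replayed with a small Python simulation over a scripted channel. This is a sketch under assumptions: stop_and_wait is a hypothetical name, the channel outcomes are scripted rather than random, and packets carry a 1-bit alternating sequence number (an assumption not stated in the text) so that B can recognize duplicates. A late acknowledgment behaves like a lost one from A's point of view, since A's timer expires first.

```python
def stop_and_wait(packets, outcomes):
    """
    Simulate stop-and-wait ARQ over a scripted channel.
    outcomes supplies one result per transmission attempt (default "ok"):
      "ok"       - the packet and its ACK both arrive
      "corrupt"  - B detects an error and silently discards the packet
      "lost"     - the packet never reaches B
      "ack_lost" - B receives the packet but its ACK is lost (or arrives too late)
    """
    outcomes = iter(outcomes)
    delivered, attempts = [], []
    expected = 0                      # 1-bit number B expects next (alternating bit)
    for i, data in enumerate(packets):
        seq = i % 2
        while True:                   # A keeps a copy of data until it is acknowledged
            result = next(outcomes, "ok")
            attempts.append((data, result))
            if result in ("corrupt", "lost"):
                continue              # B sends nothing; A times out and retransmits
            if seq == expected:       # a new packet: deliver it
                delivered.append(data)
                expected ^= 1
            # otherwise it is a duplicate: B discards it but still acknowledges it
            if result == "ok":
                break                 # ACK arrived; A moves on to the next packet
            # "ack_lost": A times out and retransmits the same packet
    return delivered, attempts


delivered, attempts = stop_and_wait(["M1", "M2"], ["lost", "ok", "ack_lost", "ok"])
print(delivered)      # ['M1', 'M2']
print(len(attempts))  # 4
```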

Continuous ARQ protocol

Because stop-and-wait ARQ uses the channel poorly, continuous ARQ improves on it: the sender sends several packets in a row and then waits for their acknowledgments. Under continuous ARQ, each time the sender receives an acknowledgment, it slides its send window forward by one packet.

Send window:

Packets within the send window can be sent back to back without waiting for the other side's acknowledgment. Any data that has been sent must be kept until it is acknowledged, so that it can be retransmitted on timeout.

Cumulative acknowledgment:

Instead of acknowledging every packet it receives, the receiver waits until several packets have arrived and then acknowledges the last packet that arrived in order, which acknowledges every packet up to and including that one.

Go-back-N:

Up to N packets that were already sent may have to be retransmitted. For example, if the sender sends five packets and the third is lost, the receiver can acknowledge only the first two. If the sender receives no acknowledgment for the last three packets within the timeout, it must retransmit all three of them (with SACK, only the missing packet needs to be retransmitted).
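The five-packet example can be reproduced with a round-based Python sketch of Go-back-N. go_back_n is a hypothetical name, and the model is simplified: the whole window is sent per round, the receiver accepts only in-order packets, and the cumulative ACK at the end of each round decides where the next round starts.

```python
def go_back_n(num_packets: int, window: int, lost_once: set) -> list:
    """
    Round-based Go-back-N sketch with cumulative ACKs.
    lost_once: packet numbers that are lost on their first transmission only.
    Returns the packet numbers in the order they were (re)transmitted.
    """
    lost = set(lost_once)
    sent_log = []
    base = 0                                  # oldest unacknowledged packet
    while base < num_packets:
        batch = range(base, min(base + window, num_packets))
        sent_log.extend(batch)                # send the whole window back to back
        next_expected = base                  # receiver accepts in-order packets only
        for p in batch:
            if p in lost:
                lost.discard(p)               # lost now; the retransmission will arrive
            elif p == next_expected:
                next_expected += 1            # in order: accepted
            # packets after a gap are discarded by the receiver
        base = next_expected                  # cumulative ACK; the rest time out and are resent
    return sent_log


# Five packets (numbered 0-4), the third one (packet 2) lost once
print(go_back_n(5, 5, {2}))  # [0, 1, 2, 3, 4, 2, 3, 4]
```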

SACK (selective acknowledgment)

The original TCP, which relies on cumulative acknowledgment, handles packet loss inefficiently. For example, suppose 10,000 bytes of data are sent in 10 packets. If the first packet is lost, pure cumulative acknowledgment gives the receiver no way to say that it received bytes 1,000 to 9,999 but is missing the first packet with bytes 0 to 999. As a result, the sender may have to retransmit all 10,000 bytes.

To address this, TCP added the Selective Acknowledgment (SACK) option. RFC 2018 defines it as a way for the receiver to acknowledge discontinuous blocks of data that it has received successfully. In the example above, the receiver can send a SACK covering bytes 1,000 through 9,999, so the sender knows it only needs to resend the first packet (bytes 0 through 999).

The SACK option is optional and is used only when both ends support it. Support is negotiated in the TCP header (the SACK-permitted option, Kind=4) when the connection is established.

The SACK information is placed in the options section of the TCP header

Kind: 1 byte. A value of 5 indicates the SACK option.

Length: 1 byte. The total number of bytes occupied by the SACK option.

Left Edge: 4 bytes, the left boundary of a received block.

Right Edge: 4 bytes, the right boundary of a received block (the sequence number just past its last byte).

A pair of edges takes 8 bytes, and the options section of the TCP header is at most 40 bytes, so the SACK option can carry at most four blocks.

Maximum size of the SACK option = 4 × 8 + 2 = 34 bytes
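How a receiver might turn the data it holds into a cumulative ACK plus SACK blocks can be sketched in Python. build_sack is a hypothetical helper; it reports blocks in ascending order, whereas RFC 2018 requires the first block to be the one containing the most recently received segment, so treat the ordering as a simplification. The returned option length is 2 + 8 × (number of blocks).

```python
def build_sack(rcv_nxt: int, received: list, max_blocks: int = 4):
    """
    Compute the cumulative ACK and the SACK blocks a receiver could report.
    received: (left, right) byte ranges it holds; right is the number just past
    the block's last byte, matching the Right Edge convention.
    """
    merged = []
    for left, right in sorted(received):
        if merged and left <= merged[-1][1]:   # overlapping or adjacent: merge
            merged[-1][1] = max(merged[-1][1], right)
        else:
            merged.append([left, right])
    ack, blocks = rcv_nxt, []
    for left, right in merged:
        if right <= ack:
            continue                           # already covered by the cumulative ACK
        if left <= ack:
            ack = right                        # fills the hole at ack: the ACK moves forward
        else:
            blocks.append((left, right))       # data beyond a hole: report it via SACK
    blocks = blocks[:max_blocks]               # at most 2 + 4 * 8 = 34 bytes
    option_len = 2 + 8 * len(blocks) if blocks else 0
    return ack, blocks, option_len


# The text's example: 10 packets of 1,000 bytes, the first one lost
ack, blocks, opt_len = build_sack(0, [(i * 1000, (i + 1) * 1000) for i in range(1, 10)])
print(ack, blocks, opt_len)  # 0 [(1000, 10000)] 10
```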

The sliding window

Assume that A is the sender and B is the receiver

The sliding window protocol maintains a window on each side of the connection: the sender has a send window and the receiver has a receive window, and both windows slide forward over time. The protocol allows the sender to send multiple packets without waiting for acknowledgments. TCP's sliding window is measured in bytes.

On the sender side: data that has been sent and acknowledged is no longer in the send window or the send buffer; data that has been sent but not yet acknowledged is inside the send window; data that is allowed to be sent but has not been sent yet is inside the send window; data in the send buffer outside the send window is not yet allowed to be sent.

On the receiver side: data that has been acknowledged and delivered to the application is no longer in the receive window or the receive buffer; data that has arrived out of order is inside the receive window; data that is allowed to be received is inside the receive window; data that is not allowed to be received lies outside the receive window.

  1. All data that has been sent must be kept until it is acknowledged, so that it can be retransmitted on timeout.
  2. Sender A's send window can slide forward by some number of sequence numbers only after A receives an acknowledgment segment from the receiver.
  3. If data sent by A is not acknowledged within a certain time (controlled by the retransmission timer), A uses go-back-N: it returns to the position of the last acknowledgment number it received and resends the data from there (with SACK, only the missing packets are resent).
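The sender-side rules above can be expressed with two pointers, the oldest unacknowledged byte and the next byte to send (named after the SND.UNA and SND.NXT variables in the TCP specification). SendWindow is a hypothetical class; it ignores sequence-number wraparound and models only the pointer bookkeeping, not real transmission.

```python
from typing import Optional


class SendWindow:
    """Byte-oriented send window with SND.UNA / SND.NXT-style pointers (wraparound ignored)."""

    def __init__(self, isn: int, window: int):
        self.snd_una = isn    # left edge: oldest byte sent but not yet acknowledged
        self.snd_nxt = isn    # next byte to send
        self.window = window  # window size in bytes (the receiver's advertised window)

    def usable(self) -> int:
        """Bytes inside the window that may be sent now without waiting for an ACK."""
        return self.snd_una + self.window - self.snd_nxt

    def send(self, nbytes: int) -> int:
        """Send up to nbytes; sent-but-unacknowledged bytes stay buffered for retransmission."""
        n = min(nbytes, self.usable())
        self.snd_nxt += n
        return n

    def on_ack(self, ack: int, window: Optional[int] = None) -> None:
        """An ACK for new data slides the left edge forward; it may also update the window."""
        if self.snd_una < ack <= self.snd_nxt:
            self.snd_una = ack
        if window is not None:
            self.window = window

    def on_timeout(self) -> None:
        """Go-back-N: resume sending from the last acknowledged position."""
        self.snd_nxt = self.snd_una


w = SendWindow(isn=1, window=20)
print(w.send(30))   # 20: only a window's worth can be sent
print(w.usable())   # 0: the window is full of unacknowledged bytes
w.on_ack(11)        # bytes 1-10 acknowledged
print(w.usable())   # 10: the window slid forward by 10 bytes
```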

Flow control

Flow control: keeps the sender from sending packets so fast that the receiver cannot take them all in. It is generally driven by the receiver, which tells the sender how much it may send.

If the receiver's buffer is full and the sender keeps sending at full speed, the receiver has no choice but to discard arriving packets. Heavy packet loss wastes network resources, so flow control is needed.

TCP implements flow control with the sliding window protocol: the sender's rate is limited by the window field in the receiver's acknowledgment segments, and the sender's send window may not exceed the window advertised by the receiver.

When the receive window is 0, the sender stops sending data and starts a persist timer. The timer avoids a deadlock if the receiver's later segment that reopens the window is lost, since the sender may not send data until it learns of a new window. When the persist timer expires, the sender sends a small zero window probe (ZWP) segment, and the receiver replies with an acknowledgment carrying its current window size. ZWP is typically attempted 3 times; if the window is still 0 after that, some TCP implementations send RST to abort the connection.
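A rough Python sketch of a sender bounded by the advertised window, including zero window probes. send_with_flow_control is hypothetical, it is a stop-and-go model in which each chunk is acknowledged before the next is sent, the persist timer's actual timing is not modelled, and the 3-probe limit followed by RST mirrors the "some implementations" behavior described above.

```python
def send_with_flow_control(total: int, advertised: list, max_probes: int = 3):
    """
    Sketch of a sender limited by the receiver's advertised window (rwnd).
    advertised: window sizes the receiver reports, one per ACK or probe reply;
    when the list runs out, the window is treated as 0.
    """
    windows = iter(advertised)
    rwnd = next(windows, 0)
    sent, probes, log = 0, 0, []
    while sent < total:
        if rwnd == 0:
            if probes == max_probes:
                log.append("window still 0: abort with RST")  # what some stacks do
                break
            probes += 1                          # persist timer expired
            log.append(f"zero window probe #{probes}")
            rwnd = next(windows, 0)              # the probe's ACK carries the window
            continue
        probes = 0
        chunk = min(rwnd, total - sent)          # never exceed the advertised window
        sent += chunk
        log.append(f"send {chunk} bytes")
        rwnd = next(windows, 0)                  # window from the ACK for this chunk
    return sent, log


sent, log = send_with_flow_control(3000, [1000, 1000, 0, 0, 1000])
print(sent)  # 3000
for line in log:
    print(line)
```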

Congestion control

Congestion control: the sender estimates the degree of network congestion from acknowledgments, packet loss, and timers, and adjusts how much data it sends accordingly.

Its goal is to keep too much data from being injected into the network, so that routers and links are not overloaded. It is a global process that involves all hosts, all routers, and every factor that degrades network transmission performance. Flow control, by contrast, concerns point-to-point traffic between one sender and one receiver.

Assuming the receiver's window is large enough, the sender sets its send window equal to cwnd (the congestion window); in general, the send window is the smaller of cwnd and the receiver's advertised window. The sender adjusts cwnd on this principle: as long as the network shows no congestion, cwnd can grow so that more packets are sent, matching the network's carrying capacity to get high throughput while avoiding congestion collapse. Once congestion occurs (an acknowledgment does not arrive in time) or looks likely, the sender must shrink cwnd.

Modern TCP implementations include four interacting congestion control algorithms: slow start, congestion avoidance, fast retransmit, and fast recovery.

Slow start

cwnd starts at a small value and grows by one segment for every segment acknowledged (every ACK received), which doubles it every round trip. Once cwnd reaches ssthresh (the slow start threshold), it grows linearly instead.

Congestion avoidance

Congestion avoidance (additive increase): cwnd grows slowly, by about one segment per round trip, to avoid congesting the network prematurely.

Fast retransmission

When the receiver gets a segment out of order, it immediately sends a duplicate acknowledgment for the last in-order segment it received. As soon as the sender receives three duplicate acknowledgments in a row, it concludes that the receiver is missing the next segment and retransmits it immediately, without waiting for the retransmission timer to expire.

Fast recovery

When the sender receives three duplicate acknowledgments in a row, which signals congestion in the network, it applies the "multiplicative decrease" rule and sets ssthresh to half of the current congestion window. Unlike slow start, it does not reset cwnd to its initial value. Instead, cwnd is set to the new (halved) ssthresh, and the sender continues with congestion avoidance ("additive increase"), growing cwnd slowly and linearly.
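The interplay of slow start, congestion avoidance, and fast recovery can be traced with a textbook-style Python model that counts cwnd in segments, one value per round trip. cwnd_trace is hypothetical and simplified: slow start is capped at ssthresh, retransmission timeouts are not modelled, and the floor of 2 segments on ssthresh follows the common max(flight/2, 2 segments) rule.

```python
def cwnd_trace(events, ssthresh=16, cwnd=1):
    """
    Textbook model of the congestion window, in segments, one value per round trip.
    events: one entry per round trip, "ack" (all segments acknowledged) or
    "3dup" (three duplicate ACKs seen in that round).
    """
    trace = []
    for event in events:
        trace.append(cwnd)
        if event == "3dup":
            # fast retransmit + fast recovery: multiplicative decrease,
            # then continue in congestion avoidance instead of slow start
            ssthresh = max(cwnd // 2, 2)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start: doubles every round trip
        else:
            cwnd += 1                       # congestion avoidance: +1 segment per round trip
    return trace


print(cwnd_trace(["ack"] * 6 + ["3dup"] + ["ack"] * 3))
# [1, 2, 4, 8, 16, 17, 18, 9, 10, 11]
```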

Common questions

Why a three-way handshake?

To prevent a stale connection request from reaching B and opening a connection by mistake. Suppose one of A's connection request segments was delayed in the network and reaches B only after the original connection has been released. Without the third handshake, B would acknowledge this stale request and consider the connection established, then wait for data that A will never send, wasting resources. With the third handshake, the connection is established only after A confirms it. Since A never sent this request, it ignores B's acknowledgment (B's SYN/ACK) and does not confirm it; because B receives no confirmation, it knows that A did not ask to establish a connection.

What happens if the server receives the client's SYN and replies with a SYN/ACK, but the client never answers?

Suppose the server receives a SYN from a client and replies with a SYN/ACK, but the client then goes offline and the server never receives the client's ACK. The connection is left in an intermediate state, neither established nor failed. If the server receives no ACK within a certain time, it resends the SYN/ACK. On Linux, the default number of retries is 5, and the timeout doubles each time starting from 1 second: the five retries are sent after waits of 1 s, 2 s, 4 s, 8 s and 16 s, 31 s in total, and after the fifth retry the server waits another 32 s before concluding that it has also timed out. In total, TCP takes 1 s + 2 s + 4 s + 8 s + 16 s + 32 s = 63 s to drop the connection.
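The 63-second figure is a geometric series, which a few lines of Python can check. synack_timeline is a hypothetical helper whose defaults (5 retries, 1 s initial timeout, doubling each time) simply mirror the numbers in the text; on Linux the retry count comes from the net.ipv4.tcp_synack_retries setting, and actual timeouts can vary with the measured RTT.

```python
def synack_timeline(retries: int = 5, initial_rto: int = 1):
    """Waits (seconds) before each SYN/ACK retransmission, plus the final wait."""
    waits = [initial_rto * 2 ** i for i in range(retries + 1)]  # timeout doubles each time
    return waits, sum(waits[:retries]), sum(waits)


waits, until_last_retry, give_up = synack_timeline()
print(waits)             # [1, 2, 4, 8, 16, 32]
print(until_last_retry)  # 31
print(give_up)           # 63
```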

Why does client A wait 2MSL at the end?

  1. To make sure the last ACK sent by client A reaches server B. If server B does not receive it, server B retransmits its FIN+ACK segment; client A receives the retransmission within the 2MSL period, sends the acknowledgment again, and restarts the 2MSL timer.
  2. To prevent stale segments, such as the invalid connection request segment described under the three-way handshake, from appearing in a later connection. Once A has sent the last ACK and waited 2MSL, every segment produced during this connection has disappeared from the network, so no segment from the old connection can show up in the next new connection.
  3. To prevent a new process that is assigned the same port from receiving the connection release segment sent by server B, for example when that new process also wants to connect to server B.

Why does closing a connection take four waves?

When a connection is being established, the server can answer the client's SYN with a single SYN+ACK segment, in which the ACK acknowledges the request and the SYN synchronizes sequence numbers. When closing, however, the server may not be ready to close its socket right after receiving a FIN. It must first reply with an ACK to tell the client "I received your FIN", and it can send its own FIN only after all of its remaining data has been sent. Because the ACK and the FIN cannot be sent together, four waves are needed.

What if the client suddenly fails after the connection is established?

TCP also has a keepalive timer, so that if a client fails, the connection is released within a bounded time instead of holding resources forever. The server resets the keepalive timer every time it receives data from the client; the timeout is usually 2 hours. If nothing is received from the client for two hours, the server sends a probe segment, and then another one every 75 seconds. If the client does not respond to 10 consecutive probes, the server concludes that the client has failed and closes the connection.
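Applications typically have to switch keepalive on per socket. A minimal Python sketch, assuming a Linux host: enable_keepalive is a hypothetical helper, its defaults copy the text's 2 hours / 75 seconds / 10 probes (Linux's own system-wide default probe count is 9), and the TCP_KEEP* options are guarded with hasattr because not every platform exposes them.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 7200,
                     interval: int = 75, count: int = 10) -> None:
    """Turn on TCP keepalive and, where the platform supports it, tune the timers."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket knobs below exist on Linux; other platforms may lack some of them.
    if hasattr(socket, "TCP_KEEPIDLE"):   # idle time before the first probe (s)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):  # gap between probes (s)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):    # unanswered probes before giving up
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    enable_keepalive(s)
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)  # True
```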

Why does TCP split data into segments at the transport layer instead of leaving the splitting to the network layer and the data link layer?

Because reliable transmission is controlled at the transport layer. Without segmentation at the transport layer, losing any piece of the data would force all of the transport-layer data to be retransmitted. With segmentation, only the lost segments need to be retransmitted, which makes retransmission much more efficient.
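A quick Python calculation makes the retransmission saving concrete. retransmitted_bytes is a hypothetical helper: it splits the data into MSS-sized segments, marks every segment containing a lost byte as lost, and compares that with resending the whole block.

```python
def retransmitted_bytes(total: int, mss: int, lost_offsets: list):
    """
    Compare how many bytes must be resent when some bytes are lost:
    with the data split into mss-sized segments vs. treated as one unit.
    """
    segments = [(start, min(start + mss, total)) for start in range(0, total, mss)]
    lost = [(s, e) for s, e in segments if any(s <= off < e for off in lost_offsets)]
    with_segmentation = sum(e - s for s, e in lost)
    without_segmentation = total if lost_offsets else 0
    return with_segmentation, without_segmentation


print(retransmitted_bytes(10_000, 1000, [500]))  # (1000, 10000)
```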