Network Protocol – Transport Layer (TCP&UDP)

January 31, 2024

by Brent Carroll

The two most important protocols at the transport layer are TCP and UDP. They are often asked about together in interviews.

The difference between TCP and UDP
UDP
- UDP data format
  
  When a complete UDP packet arrives at the target machine, the machine first checks whether the destination MAC address matches its own and, if so, strips the MAC-layer framing. It then checks whether the destination IP address matches its own and, if so, strips the IP encapsulation to obtain the UDP datagram (the protocol field in the IP header tells the target machine which transport protocol the payload uses). The data has now arrived, but the machine runs many applications, so which one should get it? This is what port numbers are for: each application listens on a port, which is why both the TCP and UDP headers carry port numbers. The UDP header is as follows (much simpler than TCP's). The source port is the sender's local port number; it is included so that the receiver knows where to send a reply. The destination port is the port number on the target machine.
  
  The UDP header is 8 bytes long (a TCP header is at least 20 bytes).
  - UDP checksum
    
    Its main function is error detection. If the check fails, the datagram is simply dropped; UDP has no retransmission. Likewise, if a datagram has to be fragmented at the IP layer and any fragment is lost, the whole datagram is discarded. For this reason it is recommended to keep each UDP datagram within a single IP packet, that is, below 1500 bytes (the usual Ethernet MTU), so that fragmentation is avoided as far as possible.
    
    The checksum is calculated over: pseudo-header + header + data
    
    Pseudo-header: a fixed virtual header that is used only when calculating the checksum and is never placed in the UDP datagram. It is 12 bytes long and contains the source IP address, the destination IP address, a reserved byte fixed at 0, the protocol type, and the UDP length. (Including the pseudo-header in the checksum is required by the TCP/IP specification; its purpose is to strengthen the check.)
    
    The diagram below:
  - Port
    
    The port field is 2 bytes, so port numbers range from 0 to 65535. As mentioned above, the destination port identifies which application the received data is handed to. Take our own server as an example: it can open several ports. If more than one project needs to share the server, we can run a separate server program (such as Tomcat) on each port and deploy each back end to the corresponding instance, so that requests to different port numbers return the correct data.
    
    There is also a source port, which is the client's port. The client port is a temporary port opened at random; unlike the server port, it does not need to stay open and listening. It is closed when the client has finished communicating.
    
    The following are the default ports of some common protocols (a server's port can also be changed, for example by modifying the Tomcat configuration file)
- UDP characteristics
  1. Needs few resources. The UDP header takes only 8 bytes, and UDP's approach is very simple: when asked to send data, it sends it. It does not care whether the network is congested, and it does not care about packet loss: a lost packet is simply lost and is not retransmitted. With so few constraints, it naturally needs fewer resources.
  2. No connection needs to be established. Before sending a message, the sender does not need to know the state of the target server. Even if the target server is shut down and cannot communicate, UDP still sends the message as soon as the application hands it over.
  3. Fast processing and low latency, at the cost of tolerating some packet loss. Even when the network is congested, UDP does not slow down.
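
The header layout and checksum rule described in this section can be sketched in Python. This is a minimal illustration for IPv4, not a production implementation. The function names (`ones_complement_sum`, `pseudo_header`, `build_udp`, `verify_udp`) and the sample addresses are invented for the example, and it assumes the standard 8-byte header made of four 16-bit fields (source port, destination port, length, checksum).

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """Add 16-bit big-endian words with end-around carry."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with one zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # wrap the carry around
    return total


def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """12-byte pseudo-header: used for the checksum only, never transmitted."""
    return struct.pack(
        "!4s4sBBH",
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
        0,                   # reserved byte, always 0
        socket.IPPROTO_UDP,  # protocol type (17 for UDP)
        udp_length,
    )


def build_udp(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> bytes:
    """Return header + payload with the checksum filled in."""
    length = 8 + len(payload)
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)  # checksum zeroed
    total = ones_complement_sum(pseudo_header(src_ip, dst_ip, length) + header + payload)
    checksum = (~total & 0xFFFF) or 0xFFFF  # a computed 0 is sent as 0xFFFF
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload


def verify_udp(src_ip, dst_ip, segment: bytes) -> bool:
    """A valid datagram sums to 0xFFFF when its checksum field is included."""
    total = ones_complement_sum(pseudo_header(src_ip, dst_ip, len(segment)) + segment)
    return total == 0xFFFF
```

A receiver that gets `False` from `verify_udp` would simply drop the datagram, which matches the no-retransmission behaviour described above. The sketch ignores the IPv4 special case in which a transmitted checksum of 0 means that no checksum was computed.
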
TCP
- TCP data structure
  
  As you can see from this figure, TCP headers are much more complex than UDP headers. UDP headers are only 8 bytes, while TCP headers are at least 20 bytes
  - Source port number and destination port number
    
    As with UDP, I won’t go into details here
  - Sequence number
    
    TCP is byte-stream oriented, so every byte in the stream is numbered. The sequence number of a segment is the number of the first byte of data carried in that segment. Note that this is not necessarily the number of the first byte of the whole message: data is sent according to the peer's window size (windows are covered later). The data may be large and the window small, so the data has to be sent in segments, and the sequence number is the number of the first byte of each segment.
  - Acknowledgment number
    
    Once a connection is established, the acknowledgment number is the sequence number of the first byte that the receiver expects in the next TCP segment from the peer. Take the example above, where the window is smaller than the data and the data is sent in segments: when the receiver has received the first segment and released its buffer, it tells the sender that the segment arrived, that it can accept another one, and at which sequence number the next segment should start. As shown below. Note: in practice the receiver does not normally acknowledge every segment but acknowledges after several; figure 2 is simplified for learning.
  - Data offset
    
    This value multiplied by 4 is the length of the header in bytes. You may notice that the TCP header does not store the total length of the TCP segment, while the UDP header does. In fact the total length can be derived from the network-layer information (the IP total length minus the IP header length), and the data offset then gives the length of the payload. The length field in the UDP header is therefore somewhat redundant.
  - Flags
    - URG: The urgent pointer field is valid only when URG=1. It indicates that the segment contains urgent data that should be delivered as soon as possible
    - ACK: The acknowledgment number field is valid only when ACK=1
    - PSH: Tells the receiver to hand the data to the application right away instead of waiting for its buffer to fill
    - RST: If RST=1, the TCP connection must be reset, that is, torn down immediately
    - SYN: SYN=1 with ACK=0 marks the first segment of the TCP three-way handshake (the connection request). If the other party agrees to establish the connection, it replies with SYN=1 and ACK=1. The acknowledgment number is meaningful only when ACK=1, and a side can send data only once the acknowledgment number has told it which sequence number the peer expects next. In all other cases the acknowledgment number is invalid, so no data is sent
    - FIN: If FIN=1, the sender has finished sending data and asks to release the connection
  - Window
    
    This field implements flow control: it tells the other party how many bytes it is allowed to send next. As the receiving machine reads data out of its buffer, it frees that space and can go on receiving
  - Checksum
    
    This works the same way as in UDP and is calculated over the same content (pseudo-header + header + data)
  - Urgent pointer
    
    Marks data in the current segment that needs priority processing. The URG flag determines whether the urgent pointer is valid
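
To make the field layout above concrete, here is a small sketch that unpacks the fixed 20-byte part of a TCP header with Python's `struct` module. The `TcpHeader` type, the `FLAG_BITS` table and the `parse_tcp_header` function are names invented for this example. Options are not decoded, and the checksum is read but not verified.

```python
import struct
from typing import NamedTuple

# Bit masks of the six classic flags in the low bits of the offset/flags word.
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


class TcpHeader(NamedTuple):
    src_port: int
    dst_port: int
    seq: int             # number of the first data byte in this segment
    ack: int             # next byte expected from the peer (valid if ACK is set)
    header_len: int      # data offset * 4, in bytes
    flags: frozenset
    window: int          # bytes the sender of this segment is willing to receive
    checksum: int
    urgent_pointer: int  # valid only if URG is set


def parse_tcp_header(segment: bytes) -> TcpHeader:
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes")
    src, dst, seq, ack, off_flags, window, checksum, urgent = struct.unpack(
        "!HHIIHHHH", segment[:20]
    )
    header_len = (off_flags >> 12) * 4  # top 4 bits count 32-bit words
    flags = frozenset(name for name, bit in FLAG_BITS.items() if off_flags & bit)
    return TcpHeader(src, dst, seq, ack, header_len, flags, window, checksum, urgent)
```

The payload of a segment is then `segment[header.header_len:]`, which is how a receiver finds the data without a total-length field in the TCP header itself.
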
- Reliable transmission – the stop-and-wait ARQ protocol
  
  The sender uses a timeout period. If no acknowledgment arrives from the receiver within the timeout period, the message is regarded as lost and is retransmitted. There are three situations, as shown in the following figure. The normal case is easy to understand: the next piece of data is sent only after the previous one has been acknowledged. The abnormal cases are shown in the figure above. The problem with this protocol is that the channel carries only one piece of data at a time, so channel utilization is low.
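
The timeout-and-retransmit loop can be simulated without a network. The sketch below is a deterministic toy: losses are scripted through the hypothetical `drop_data` and `drop_ack` parameters rather than detected by a real timer. It also adds a 1-bit alternating sequence number, an assumption that the text above does not spell out, so that the receiver can discard a duplicate when only the ACK was lost.

```python
def stop_and_wait(messages, drop_data=(), drop_ack=()):
    """Simulate stop-and-wait ARQ.

    drop_data / drop_ack hold transmission attempt numbers (0-based, counted
    over the whole run) at which the data frame or its ACK is lost.
    Returns (messages delivered to the receiver, total transmissions).
    """
    delivered = []
    expected_seq = 0  # receiver side: sequence bit it expects next
    attempt = 0
    for index, message in enumerate(messages):
        seq = index % 2  # alternating 1-bit sequence number
        while True:
            this_attempt = attempt
            attempt += 1
            if this_attempt in drop_data:
                continue  # data lost: sender times out and retransmits
            if seq == expected_seq:
                delivered.append(message)  # new frame: accept it
                expected_seq ^= 1
            # A duplicate frame is discarded, but it is still acknowledged.
            if this_attempt in drop_ack:
                continue  # ACK lost: sender times out and retransmits
            break  # ACK received: move on to the next message
    return delivered, attempt
```

For example, `stop_and_wait(["a", "b", "c"], drop_data={1}, drop_ack={3})` loses the first copy of `b` and the first ACK for `c`, so three messages cost five transmissions while the receiver still sees each message once.
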
- Reliable transmission – continuous ARQ + sliding window protocol
  
  The stop-and-wait protocol can send only one message at a time, whereas this protocol can send several messages in a row, depending on the window size. The sender and the receiver maintain a send window and a receive window respectively, and each time the sender receives an acknowledgment it slides the send window forward by one packet position. The receiver generally uses cumulative acknowledgment: it does not need to acknowledge packets one by one. After receiving several packets, it acknowledges the last packet that arrived in order, which indicates that all packets up to that position have been received correctly. As shown below:
  
  This solves the problem of channel utilization, as shown in the figure below. However, if a packet in the middle of the sequence is lost (for example, 1, 2, 3, 4 and 5 are sent and 3 is lost), the last packet that can be acknowledged is 2, so 3, 4 and 5 are all retransmitted. This also hurts TCP performance, which is why SACK was developed.
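
The effect of cumulative acknowledgment on retransmission can be shown with a few lines of Python. This is a simplified sketch of one window's worth of packets, assuming consecutive packet numbers and a receiver that discards anything out of order; the name `go_back_n_round` is made up for the example.

```python
def go_back_n_round(packets, lost):
    """Send `packets` (consecutive numbers) once; those in `lost` never arrive.

    Returns (last cumulatively acknowledged packet or None, packets to resend).
    """
    expected = packets[0]
    last_acked = None
    for number in packets:
        if number in lost:
            continue  # dropped in transit
        if number == expected:  # in order: accept and advance
            last_acked = number
            expected += 1
        # Out of order: discarded; the receiver keeps acknowledging last_acked.
    to_resend = [n for n in packets if last_acked is None or n > last_acked]
    return last_acked, to_resend
```

With packets 1 to 5 and packet 3 lost, the function reports that 2 is the last acknowledged packet and that 3, 4 and 5 must all be sent again, even though 4 and 5 did reach the receiver.
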
- Reliable transmission – SACK (selective acknowledgment)
  
  SACK information is stored in the options section of the TCP header:
  - Kind: 1 byte. A value of 5 indicates that this is the SACK option
  - Length: 1 byte. Indicates how many bytes the SACK option occupies
  - Left Edge: 4 bytes, the sequence number of the first byte of a received block
  - Right Edge: 4 bytes, the sequence number immediately after the last byte of that block
  As shown in the figure below. This lets the sender identify which segment is missing.
  
  Each pair of edges takes 8 bytes, and the SACK option carries at most 4 pairs because the TCP header has a maximum of 40 bytes of options (2 + 4 × 8 = 34 bytes fit; a fifth pair would need 42)
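
The option layout above maps directly onto `struct`. The following sketch encodes and decodes a SACK option; `encode_sack` and `decode_sack` are illustrative names, and the code only handles the option bytes themselves, not the rest of the TCP header.

```python
import struct

SACK_KIND = 5
MAX_BLOCKS = 4  # 2 + 4 * 8 = 34 bytes still fits in the 40-byte option space


def encode_sack(blocks):
    """Pack (left_edge, right_edge) pairs into a SACK option."""
    if not 1 <= len(blocks) <= MAX_BLOCKS:
        raise ValueError("a SACK option carries 1 to 4 blocks")
    option = struct.pack("!BB", SACK_KIND, 2 + 8 * len(blocks))
    for left, right in blocks:
        option += struct.pack("!II", left, right)
    return option


def decode_sack(option: bytes):
    """Return the list of (left_edge, right_edge) pairs in a SACK option."""
    kind, length = struct.unpack("!BB", option[:2])
    if kind != SACK_KIND or length != len(option) or (length - 2) % 8:
        raise ValueError("not a well-formed SACK option")
    return [struct.unpack("!II", option[i:i + 8]) for i in range(2, length, 8)]
```

A sender that has transmitted bytes up to 7000 and receives a cumulative ACK of 3000 together with the blocks `(4000, 5000)` and `(6000, 7000)` can conclude that only 3000 to 3999 and 5000 to 5999 need to be resent.
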
- Flow control
  
  If the receiver's buffer is full and the sender keeps sending, the receiver has to discard the incoming data. Flow control prevents this. During connection setup the receiver tells the sender how large its buffer is, that is, its window size, and the sender may not send more than a window's worth of data. Each time the receiver has processed data from its buffer, it reports its current window size back to the sender. When the window size is zero, the receiver cannot handle more data for now, so the sender temporarily stops sending. (When the receiver can handle data again, it sends the sender a message with a non-zero window, and the sender can then continue.)
  
  This scheme looks complete, but it has a small problem: if the message announcing a non-zero window is lost, how does the sender know that it may resume? The solution is as follows: when the sender receives a message with a window of 0, it starts a timer. When the timer expires, it sends a message asking for the window size. If the reply still advertises a window of 0, it restarts the timer, and so on
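
The interaction between the advertised window and the zero-window timer can be sketched as a small simulation. Everything here is a toy model under stated assumptions: the `Receiver` class, `send_with_flow_control` and the `read_per_timer` parameter are invented for the example, time is not modelled, and the application is assumed to read a fixed number of bytes each time the sender's timer expires.

```python
class Receiver:
    """Receiver with a fixed-size buffer; its free space is the advertised window."""

    def __init__(self, buffer_size: int):
        self.buffer_size = buffer_size
        self.buffered = 0

    def window(self) -> int:
        return self.buffer_size - self.buffered

    def receive(self, nbytes: int) -> int:
        """Store incoming bytes and return the new window to advertise."""
        if nbytes > self.window():
            raise OverflowError("sender exceeded the advertised window")
        self.buffered += nbytes
        return self.window()

    def application_reads(self, nbytes: int) -> None:
        """The application consumes data, which frees buffer space."""
        self.buffered -= min(nbytes, self.buffered)


def send_with_flow_control(total: int, receiver: Receiver, read_per_timer: int):
    """Send `total` bytes and return a log of ("send" | "probe", bytes) events."""
    if read_per_timer <= 0:
        raise ValueError("the application must read, or the sender probes forever")
    window = receiver.window()  # learned during connection setup
    sent = 0
    log = []
    while sent < total:
        if window == 0:
            # Zero window: the timer expires, then the sender asks for the window.
            receiver.application_reads(read_per_timer)
            window = receiver.window()
            log.append(("probe", window))
            continue
        chunk = min(window, total - sent)  # never exceed the window
        window = receiver.receive(chunk)
        sent += chunk
        log.append(("send", chunk))
    return log
```

Sending 10 bytes to a 4-byte buffer whose application reads 3 bytes per timer period produces alternating `send` and `probe` events, and no single `send` is ever larger than the window that was last advertised.
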
- Congestion control
  
  As shown above, each link has its own throughput, or bandwidth. Once traffic exceeds the bandwidth of the link, congestion occurs and packets are lost. Many people assume that to reach maximum throughput you should simply send at the maximum rate, but that holds only in the ideal case. As shown above, in practice the network starts to congest before the maximum is reached, and once it is congested, packet loss rises sharply. The lost packets are then retransmitted again and again, and eventually the network deadlocks. This is why congestion control is needed. In contrast to flow control, which is a point-to-point (end-to-end) constraint, congestion control concerns the entire network, like traffic rules that every driver has to obey. Congestion control is mainly achieved through the following methods:
  - Terminology
    - MSS (Maximum Segment Size): the largest amount of data one segment may carry, determined during connection establishment. For example, on Ethernet it is 1500 - 20 (IP header) - 20 (TCP header) = 1460 bytes
    - CWND (congestion window): the sender's own limit on data in flight, adjusted according to network congestion
    - RWND (receive window): the window advertised by the receiver
    - SWND (send window): SWND = min(CWND, RWND)
  - Slow start
    
    At first, the sender may send only one packet. If that packet gets through, the network is not congested, and CWND is increased (doubled each round), as shown in the following figure:
  - Congestion avoidance
    
    Once slow start reaches a certain threshold, ssthresh (the slow start threshold), CWND starts to grow slowly (by one each round) to avoid premature congestion. When congestion does occur, ssthresh is set to half of the CWND reached at that point and slow start begins again, as shown below
  - Fast Retransmit
    
    The fast retransmit algorithm first requires the receiver to send a duplicate acknowledgment immediately after receiving an out-of-order segment, instead of waiting until it has data of its own to send and piggybacking the acknowledgment. Suppose the receiver has received M1 and M2 from the sender and has acknowledged each. Now the receiver receives M4 instead of M3. Clearly it cannot acknowledge M4, because M4 is an out-of-order segment. Under plain reliable transmission the receiver could do nothing at this point, but under fast retransmit it repeats the ACK for M2 each time a later segment such as M4 or M5 arrives while it is still waiting for M3. Once the sender has received three duplicate ACKs, it does not wait for the retransmission timer to expire and retransmits the unacknowledged segment right away.
  - Fast Recovery
    
    When three duplicate acknowledgments are received, the network is judged to be congested and ssthresh is set to half of the current CWND. Unlike the case above, CWND does not go back to slow start: it is set directly to the new ssthresh and then grows additively
  - Fast retransmit + fast recovery
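
How CWND evolves under these four mechanisms can be traced with a short simulation. It is a textbook-style simplification, not the behaviour of any particular TCP stack: windows are counted in units of MSS, one loop iteration stands for one round trip, the `events` mapping that scripts a timeout or three duplicate ACKs is a hypothetical input, and the floor of 2 on ssthresh is an assumption of this sketch.

```python
def send_window(cwnd: int, rwnd: int) -> int:
    """SWND = min(CWND, RWND)."""
    return min(cwnd, rwnd)


def simulate_cwnd(rounds: int, ssthresh: int, events=None):
    """Return the CWND (in MSS units) used in each round.

    `events` maps a round number to "timeout" or "3dupack".
    """
    events = events or {}
    cwnd = 1
    history = []
    for round_no in range(rounds):
        history.append(cwnd)
        event = events.get(round_no)
        if event == "timeout":
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1  # back to slow start
        elif event == "3dupack":
            ssthresh = max(cwnd // 2, 2)
            cwnd = ssthresh  # fast recovery: skip slow start
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start: double each round
        else:
            cwnd += 1  # congestion avoidance: add one each round
    return history
```

Calling `simulate_cwnd(12, 8, {6: "3dupack", 9: "timeout"})` shows the window doubling up to the threshold of 8, growing by one per round after that, dropping to half after the duplicate ACKs, and falling back to 1 after the timeout.
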
- TCP Connection Management
  - Establishing a connection – the three-way handshake
    
    Before TCP can transfer data it must establish a connection. If the connection is not established successfully, no data is transferred. The general steps are as follows:
    1. A sends SYN=1, seq=x, len=0, that is, a segment that carries no data
    2. B replies with SYN=1, ACK=1, seq=y, ack=x+1, len=0
    3. A replies with ACK=1, seq=x+1, ack=y+1
    The three steps are easy to follow. In the first step, A asks to establish a connection and tells B its initial sequence number. When B receives the request, it replies that the connection can be established and tells A its own initial sequence number. B sets ACK=1: A's last segment had seq=x and carried no data, but the SYN itself consumes one sequence number, so B expects A's next segment to start at x+1, hence ack=x+1. This segment is only a reply to the connection request, so it carries no data either. In the third step, A has received B's answer and answers B in turn. This is no longer a connection request, so only ACK is set. B's last segment had seq=y and len=0, so A expects B's next segment to start at y+1, hence ack=y+1. B asked for data starting at x+1, so A uses seq=x+1. The three-way handshake is then complete.
    
    There are a couple of questions here. Why a three-way handshake? Suppose it were two-way: A would consider the connection established as soon as it received B's response and would start sending data. Now consider the case where A's connection request is held up in the network for a long time and does not reach B. A times out and retransmits the request; this time everything goes well, A transfers its data and releases the connection. Then A's first, delayed request finally reaches B. B cannot tell that the request is stale, so it agrees to it and, under a two-way handshake, considers a connection established. A never asked for this connection, so it ignores B's reply and sends nothing. B believes a connection exists while A does not, and this one-sided connection wastes B's resources. Of course, B's designer can have the server close a connection on which the client has sent nothing for a long time, but that takes a while. Some people also ask why there are not four handshakes. You could do four handshakes, or even 100, but no number of handshakes can guarantee that the last message arrived, so in general it is enough that each side's message has been answered once.
    
    Besides establishing the connection, the three-way handshake also tells each side the other's window size, MSS and initial sequence number. The initial sequence number is the most important of these. Suppose sequence numbers always started from the same value. A connects to B and sends packets 1, 2 and 3, but packet 3 is lost or takes a detour, so A sends it again later. Then A drops the connection and reconnects to B, this time intending to send only 1 and 2. Now the old packet 3 arrives from its detour, and B takes it for part of the new connection. Therefore different connections need different initial sequence numbers, and the initial sequence number has to change over time.
    
    After the connection is established, both parties maintain a state machine, and the sequence diagram of state changes is as follows:
    
    Initially, both the client and the server are CLOSED. The server listens on a port and enters the LISTEN state. The client then sends a SYN to initiate the connection and enters the SYN-SENT state. The server receives the SYN, replies with its own SYN together with an ACK of the client's SYN, and enters the SYN-RCVD state. After receiving the SYN and ACK from the server, the client sends an ACK in return and enters the ESTABLISHED state, because its own SYN has been acknowledged. After receiving that ACK, the server also enters the ESTABLISHED state, because its SYN has been acknowledged too.
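
The seq and ack arithmetic of the three handshake steps can be written out as a short function. This is only a model of the numbers involved: the dictionary layout and the name `three_way_handshake` are invented for the example, and the initial sequence numbers `x` and `y` are passed in rather than generated. Sequence numbers are 32 bits wide, so the sketch wraps them modulo 2**32.

```python
SEQ_SPACE = 2 ** 32  # sequence numbers are 32-bit and wrap around


def three_way_handshake(x: int, y: int):
    """Return the three handshake segments for initial sequence numbers x (A) and y (B)."""
    syn = {"from": "A", "SYN": 1, "ACK": 0, "seq": x, "ack": None, "len": 0}
    syn_ack = {
        "from": "B", "SYN": 1, "ACK": 1,
        "seq": y,
        "ack": (syn["seq"] + 1) % SEQ_SPACE,  # the SYN consumes one sequence number
        "len": 0,
    }
    ack = {
        "from": "A", "SYN": 0, "ACK": 1,
        "seq": syn_ack["ack"],                    # B asked for x+1
        "ack": (syn_ack["seq"] + 1) % SEQ_SPACE,  # A now expects y+1
        "len": 0,
    }
    return [syn, syn_ack, ack]
```

With `x = 100` and `y = 300`, the second segment carries `ack = 101` and the third carries `seq = 101, ack = 301`, matching steps 2 and 3 above.
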
  - Releasing a connection – the four-way wave
    
    Either side may initiate the release, but the connection is closed only after the four-way exchange. As with the three-way handshake, both sides track their state through the exchange, as shown in the figure below. A side must be in the ESTABLISHED state before it can send a disconnection request.
    - First wave: The initiator sends the first disconnection request (a FIN=1 segment), enters the FIN-WAIT-1 state and waits for the reply from the receiver
    - Second wave: When the receiver gets the disconnection request, it replies to the initiator with an ACK and enters the CLOSE-WAIT state. In this state it checks whether it still has data to send to the other side. If it does, it can keep sending data and does not send its FIN yet; if it does not, it sends a FIN. When the initiator receives the ACK, it enters the FIN-WAIT-2 state. (If FIN and ACK arrive together at this point, the third wave is not needed and the initiator goes straight to the fourth.)
    - Third wave: Once the receiver has acknowledged the initiator's FIN and has no more data to send, it sends a FIN=1, ACK=1 segment and enters the LAST-ACK state, waiting for the initiator's ACK
    - Fourth wave: After receiving the FIN=1, ACK=1 segment, the initiator immediately replies with an ACK and enters the TIME-WAIT state, where it waits for a certain period (2MSL) before moving to CLOSED. This prevents segments from this connection from being mistaken for segments of a later connection, because all segments of this connection will have disappeared from the network within 2MSL.
      
      MSL is the Maximum Segment Lifetime: the longest time a segment may exist in the network before it is discarded. TCP segments are carried by the IP protocol, and the IP header has a TTL field, which is the maximum number of routers that an IP packet may pass through. The value decreases by 1 at each router that processes the packet. When it reaches 0, the packet is discarded and an ICMP message is sent to the source host. The protocol specifies an MSL of 2 minutes, but in practice 30 seconds, 1 minute and 2 minutes are all commonly used.
      
      Common questions
      - Why four waves? Once a connection is established, both sides can send data, and the connection is closed only after both sides have nothing left to send. The receiver cannot close the connection the moment it receives the initiator's FIN: it first has to check whether it still has data to send back and wait until that is done. Meanwhile, to keep the initiator from retransmitting its FIN, it has to reply with an ACK immediately. The ACK and the FIN are therefore sent separately, which makes four segments.
      - Why wait in TIME-WAIT? If the initiator released the connection immediately after sending the final ACK, and that ACK were lost in the network, the receiver would resend its FIN. With nobody left to answer, the receiver would keep waiting or even resend the FIN several times, wasting resources. In another case, a new application on the initiator's host might happen to be assigned the same port number. The resent FIN would then reach the new application and tear down a connection that it may just have established with the receiver
      - What if more than 2MSL passes and B (the receiver) still has not received the ACK for its FIN? According to TCP, B will of course resend the FIN. When A (the initiator) receives it after the 2MSL wait is over, its position is: I have waited long enough and no longer recognize this connection. So A simply replies with RST, and B then knows that A is long gone.
  - TCP state machine
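
The states named in the handshake and release descriptions above can be collected into one transition table. The sketch below is deliberately partial: it covers only the paths discussed in this article (it omits simultaneous open and close, for example), and the event strings and the `run` function are labels invented for the example rather than anything defined by TCP itself.

```python
# (current state, event) -> next state
TRANSITIONS = {
    ("CLOSED", "listen"): "LISTEN",
    ("CLOSED", "send SYN"): "SYN-SENT",
    ("LISTEN", "recv SYN, send SYN+ACK"): "SYN-RCVD",
    ("SYN-SENT", "recv SYN+ACK, send ACK"): "ESTABLISHED",
    ("SYN-RCVD", "recv ACK"): "ESTABLISHED",
    ("ESTABLISHED", "send FIN"): "FIN-WAIT-1",               # first wave, initiator
    ("ESTABLISHED", "recv FIN, send ACK"): "CLOSE-WAIT",     # second wave, receiver
    ("FIN-WAIT-1", "recv ACK"): "FIN-WAIT-2",
    ("FIN-WAIT-1", "recv FIN+ACK, send ACK"): "TIME-WAIT",   # waves 2 and 3 merged
    ("CLOSE-WAIT", "send FIN"): "LAST-ACK",                  # third wave
    ("FIN-WAIT-2", "recv FIN, send ACK"): "TIME-WAIT",       # fourth wave
    ("LAST-ACK", "recv ACK"): "CLOSED",
    ("TIME-WAIT", "2MSL timeout"): "CLOSED",
}


def run(events, state="CLOSED"):
    """Apply events in order and return the list of states visited."""
    path = [state]
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} is not valid in state {state}")
        state = TRANSITIONS[key]
        path.append(state)
    return path
```

Feeding the client's events from connect to close through `run` walks CLOSED, SYN-SENT, ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, TIME-WAIT and back to CLOSED; the server's events walk through LISTEN, SYN-RCVD, ESTABLISHED, CLOSE-WAIT and LAST-ACK. An event that is not valid in the current state raises `ValueError`.
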
- Issue review
  - The section on continuous ARQ + sliding window said that data is sent according to the size of the receiver's window. Say the window can currently accept 4 packets, and the receiver sends the sender one acknowledgment after receiving 4 packets. But what if the sender sends only two packets? When does the receiver send its acknowledgment? The answer is that the receiver has a waiting time (TCP uses one by default): when that time is up and no further packets have arrived, the receiver sends the acknowledgment
  - The discussion of reliable transmission and flow control above mentioned that data that is too large is split into segments at the transport layer instead of being left for the network layer to fragment. Why is this? The answer is that the network layer has no reliable transmission mechanism. For example, if a 6000-byte packet is split into 4 IP fragments and one of them is lost, the whole packet has to be sent again. If the data is instead split at the transport layer into segments 1, 2, 3 and 4 and segment 3 is lost, the receiver tells the sender that 3 is missing and only 3 is sent again, which improves retransmission performance
- Comparison between TCP and UDP