
We all know that TCP is a reliable transport protocol, so how does it ensure reliability?

To achieve reliable transmission, many problems must be handled, such as data corruption, packet loss, duplication, and out-of-order arrival. If these problems cannot be solved, there is no reliable transmission.

TCP achieves reliable transmission through mechanisms such as sequence numbers, acknowledgements, retransmission control, connection management, and window control.

Today, we will focus on TCP’s retransmission mechanism, sliding window, flow control, and congestion control.

The retransmission mechanism

One of the ways TCP implements reliable transmission is through sequence numbers and acknowledgements.

In TCP, when data from the sender reaches the receiving host, the receiving host returns an acknowledgement (ACK) message indicating that the data has been received.

However, in a complex network, transmission does not always go as smoothly as above. What if the data is lost along the way?

Therefore, TCP uses the retransmission mechanism to solve packet loss.

Here are some common retransmissions:

  • Timeout retransmission

  • Fast retransmission

  • SACK

  • D-SACK

Timeout retransmission

One of the retransmission mechanisms is to set a timer when sending data. If no ACK packet is received within a specified period of time, the data will be retransmitted, which is often called timeout retransmission.

TCP timeout retransmission occurs in one of two situations:

  • Packet loss

  • Acknowledgement reply lost

How much should the timeout be set?


RTT is the time it takes for data to travel from one end of the network to the other, or the round trip time of a packet.

RTO (Retransmission Timeout) is how long the sender waits before retransmitting.

Suppose retransmission is needed: what happens if the timeout RTO is set too long or too short?

In the figure above, there are two cases where the timeout duration is different:

  • When the timeout RTO is large, the retransmission is slow. It takes a long time to retransmit, which is inefficient and has poor performance.

  • When the timeout RTO is small, data may be retransmitted even though it was not lost, which adds to network congestion and causes more timeouts; more timeouts lead to even more retransmissions.

It is important to accurately measure the timeout RTO to make our retransmission mechanism more efficient.

According to the above two cases, we can know that the value of the timeout retransmission time RTO should be slightly greater than the value of the packet round trip RTT.

At this point, you might think that calculating the timeout retransmission time RTO is not very complicated: record the time t0 when a packet is sent and the time t1 when its ACK returns, and let RTT = t1 − t0.

It's not that simple. That is just one sample; it doesn't represent the general situation.

In fact, the “RTT value for round-trip packets” is always changing, because our network is always changing. Because the RTT value of the round-trip message is constantly fluctuating, the RTO value of the timeout retransmission time should be a dynamic value.

Let’s see how Linux calculates RTO.

To estimate the round trip time, you usually need to sample the following two:

  • TCP samples the RTT and computes a weighted average to obtain a smoothed RTT value, which keeps changing because network conditions keep changing.

  • Besides sampling the RTT itself, TCP samples the fluctuation range of the RTT, so that a large RTT swing does not go undetected.

RFC 6298 recommends using the following formulas to calculate RTO:

SRTT = SRTT + α (RTT − SRTT)
DevRTT = (1 − β) × DevRTT + β × |RTT − SRTT|
RTO = μ × SRTT + ∂ × DevRTT

Here, SRTT is the smoothed RTT, and DevRTT is the deviation between the smoothed RTT and the latest RTT sample.

In Linux, α = 0.125, β = 0.25, μ = 1, and ∂ = 4. Don't ask why; these values come from a lot of experimentation.
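To make the formulas concrete, here is a minimal Python sketch of an RFC 6298-style estimator using the Linux constants above. The class name and structure are hypothetical, and details such as RFC 6298's minimum RTO clamp are omitted:

```python
class RtoEstimator:
    # Linux constants: alpha = 1/8, beta = 1/4, mu = 1, and the "∂" above = 4
    ALPHA, BETA, MU, DELTA = 0.125, 0.25, 1, 4

    def __init__(self):
        self.srtt = None    # smoothed RTT
        self.devrtt = None  # RTT deviation
        self.rto = 1.0      # conservative default before any sample

    def on_rtt_sample(self, rtt):
        """Update SRTT/DevRTT from a new RTT sample and recompute RTO."""
        if self.srtt is None:
            # the first sample initializes the estimator
            self.srtt = rtt
            self.devrtt = rtt / 2
        else:
            self.devrtt = (1 - self.BETA) * self.devrtt + self.BETA * abs(rtt - self.srtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        self.rto = self.MU * self.srtt + self.DELTA * self.devrtt
        return self.rto

    def on_timeout(self):
        """Exponential backoff: each timeout doubles the next interval."""
        self.rto *= 2
        return self.rto
```

The `on_timeout` method implements the doubling rule described next: every timeout retransmission doubles the following timeout interval.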

If the data needs to be retransmitted after a timeout, the TCP policy is to double the timeout interval.

That is, each time a timeout retransmission occurs, the next timeout interval is set to twice the previous value. Two timeouts in a row indicate a poor network environment, so repeatedly and frequently resending packets is unwise.

The problem with timeout triggering retransmission is that the timeout period may be relatively long. Is there a faster way?

The "fast retransmission" mechanism can avoid the waiting that timeout retransmission imposes.

Fast retransmission

TCP has another Fast Retransmit mechanism, which is not time-driven, but data-driven.

How does the fast retransmission mechanism work? It’s quite simple. A picture is worth a thousand words.

In the figure above, the sender sent five segments of data, Seq1 through Seq5:

  • Seq1 was sent first and arrived, so the receiver replied with Ack 2;

  • Seq2 did not arrive for some reason, while Seq3 did, so the receiver again replied with Ack 2;

  • Seq4 and Seq5 arrived, but the receiver still replied with Ack 2, because Seq2 had not been received;

  • The sender, having received three acknowledgements of Ack = 2, knows that Seq2 has not been received, and retransmits the lost Seq2 before the timer expires;

  • Finally Seq2 arrives. Since Seq3, Seq4, and Seq5 have all been received, the receiver replies with Ack 6.

Therefore, fast retransmission works like this: when three identical ACK packets are received, the lost segment is retransmitted before the timer expires.
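The triggering rule can be sketched in a few lines of Python. The function name and the ACK-history representation are assumptions for illustration:

```python
from collections import Counter

def should_fast_retransmit(ack_history, threshold=3):
    """Return the sequence number to retransmit early, or None.

    When the same ACK number appears `threshold` times, that number is
    the first segment the receiver is still missing (three Ack = 2
    packets mean Seq2 is lost, as in the example above)."""
    for ack, count in Counter(ack_history).items():
        if count >= threshold:
            return ack
    return None
```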

Fast retransmission solves the problem of waiting for a timeout, but it still faces another: when retransmitting, should the sender retransmit only the first missing segment, or everything after it as well?

For the example above: retransmit only Seq2? Or retransmit Seq2, Seq3, Seq4, and Seq5? The sender cannot tell, because it does not know which segments triggered the three consecutive Ack 2s.

Depending on the implementation of TCP, either of these scenarios is possible. So this is a double-edged sword.

To solve the problem of not knowing which TCP packets to retransmit, there is the SACK method.

The SACK method

Another way to implement the retransmission mechanism is called Selective Acknowledgment (SACK).

This approach adds a SACK option to the "options" field of the TCP header, carrying a map of the data the receiver has cached. With it, the sender knows which data has been received and which has not, and can retransmit only the lost data.

As shown in the figure below, after receiving the same ACK packet three times, the sender triggers fast retransmission. From the SACK information it can tell that only the segment 200~299 is missing, so only that TCP segment is retransmitted.

For SACK to work, both sides must support it. Under Linux, it can be turned on with the net.ipv4.tcp_sack parameter (enabled by default since Linux 2.4).
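As a sketch of how a sender might use SACK information, the hypothetical helper below (not a real kernel API) computes the missing ranges from the cumulative ACK and the receiver's SACK blocks, modeled as half-open byte ranges:

```python
def sack_missing_ranges(ack, sack_blocks, send_next):
    """Compute ranges to retransmit, as half-open (start, end) pairs.

    ack         -- cumulative ACK (everything below it was received)
    sack_blocks -- (start, end) ranges the receiver has cached beyond the ACK
    send_next   -- highest sequence number sent so far
    """
    missing = []
    expected = ack
    for start, end in sorted(sack_blocks):
        if start > expected:
            # a hole between what was expected and the next SACKed block
            missing.append((expected, start))
        expected = max(expected, end)
    if expected < send_next:
        # the tail beyond the last SACK block (in reality it may simply
        # still be in flight; a real sender would wait on its timer)
        missing.append((expected, send_next))
    return missing
```

With the figure's numbers (ACK = 200, SACK = 300~600, 600 bytes sent), only 200~300 comes back as missing, matching "only the segment 200 to 299 is lost".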

Duplicate SACK

Duplicate SACK, also known as D-SACK, uses SACK to tell the sender what data is being duplicated.

Here are two examples of D-SACK.

Example 1: ACK packets are lost

  • Both ACK acknowledgements sent by the "receiver" to the "sender" are lost, so the "sender" times out and retransmits the first packet (3000~3499).

  • The "receiver" then notices it has received this data before, so it replies with SACK = 3000~3500, telling the "sender" that the range 3000~3500 has already been received. Since the cumulative ACK has reached 4000, everything before 4000 was received; a SACK block below the ACK like this is a D-SACK.

  • In this way, the "sender" knows the data was not lost; rather, the "receiver"'s ACK packets were lost.

Example 2: Network delay

  • The packet (1000~1499) is delayed in the network, so the "sender" never receives an Ack 1500 acknowledgement.

  • The packets that follow each trigger a duplicate ACK, so fast retransmission is triggered; after the retransmission, the delayed packet (1000~1499) finally arrives at the "receiver".

  • The "receiver" therefore replies with SACK = 1000~1500. Since the cumulative ACK has reached 3000, this SACK is a D-SACK, indicating a duplicate packet was received.

  • This way the "sender" knows the fast retransmission was triggered not because an outgoing packet was lost, nor because an ACK was lost, but because of network delay.

D-SACK has several benefits:

  1. It lets the "sender" know whether a sent packet was lost or the receiver's responding ACK was lost;

  2. It lets the "sender" know whether its packet was delayed by the network;

  3. It lets the "sender" know whether the network duplicated its packet.

On Linux this function can be turned on/off with the net.ipv4.tcp_dsack parameter (enabled by default after Linux 2.4).
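The D-SACK rule (a SACK block that falls entirely at or below the cumulative ACK means duplicate data) can be sketched as follows; the helper is hypothetical, for illustration only:

```python
def is_dsack(ack, sack_start, sack_end):
    """A SACK block entirely at or below the cumulative ACK is a D-SACK:
    the receiver already had that data, so nothing was actually lost.
    Example 1 above: ACK = 4000 with SACK = 3000~3500 is a D-SACK."""
    return sack_end <= ack
```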


The sliding window

Why the window concept was introduced

We know that TCP requires an acknowledgement for each piece of data sent: only after the previous packet is acknowledged is the next packet sent.

It's a little like a face-to-face conversation: I say a sentence, you reply, and only then do I say the next one. The drawback of this approach is low efficiency.

If you finish a sentence while I'm busy with something else, you would have to wait until I'm done and have replied before you could say your next sentence. That is obviously not how real conversations work.

So, there is a downside to this transmission: the longer the round-trip time of the packet, the less efficient the communication.

To solve this problem, TCP introduced the concept of a window, so that network communication stays efficient even when round-trip times are long.

Now that you have a window, you can specify the window size, which is the maximum amount of data you can continue sending without waiting for an acknowledgement.

The implementation of the window is actually a buffer space created by the operating system, and the sender host must keep the sent data in the buffer until the acknowledgement reply is returned. If an acknowledgement is received on schedule, the data can then be cleared from the cache.

Assuming that the window size is three TCP segments, the sender can “continuously send” three TCP segments. In addition, if an ACK is lost in the process, the sender can “acknowledge with the next acknowledgement reply”. The diagram below:

The ACK 600 acknowledgement in the figure is lost, but that is fine, because the next acknowledgement covers it: as long as the sender receives ACK 700, it knows the "receiver" got all data up to 700. This pattern is called cumulative acknowledgement.
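Cumulative acknowledgement can be sketched like this: one ACK clears every outstanding segment below it, so a lost earlier ACK does no harm. The segment numbers follow the figure's scheme, and the function is a hypothetical illustration:

```python
def process_cumulative_ack(unacked, ack):
    """Drop every outstanding segment whose sequence number is below the
    ACK: an ACK of N acknowledges all data before N, so the loss of an
    earlier ACK (e.g. ACK 600) is repaired by a later one (ACK 700)."""
    return [seq for seq in unacked if seq >= ack]
```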

Which side decides the window size?

There is a field in the TCP header called Window, which is the Window size.

This field is how the receiver tells the sender how much buffer space it has left for incoming data. The sender can then send at a rate matched to the receiver's processing capacity, without overwhelming the receiver.

So, usually the size of the window is determined by the recipient.

The size of the data sent by the sender cannot exceed the size of the window of the receiver; otherwise, the receiver cannot receive the data normally.

The sliding window of the sender

Let’s take a look at the window of the sender first. The figure below is the data cached by the sender, which is divided into four parts according to the processing situation. The dark blue box is the sending window, and the purple box is the available window:

  • #1 is the data that has been sent and received as an ACK acknowledgement: 1 to 31 bytes

  • #2 is data that has been sent but has not received an ACK acknowledgement: 32 to 45 bytes

  • #3 is unsent but the total size is within the receiver’s processing range (the receiver still has space) : 46 to 51 bytes

  • #4 is unsent but the total size exceeds the receiver’s processing range (the receiver has no space) : 52 bytes later

In the figure below, when the sender sends “all” of the data at once, the size of the available window is 0, indicating that the available window is exhausted and the data cannot be sent until the ACK confirmation is received.

In the figure below, after receiving the ACK confirmation response of 32~36 bytes of previously sent data, if the size of the sending window does not change, the sliding window moves 5 bytes to the right, because 5 bytes of data are acknowledged by the response, and the next 52~56 bytes become available window again. Then you can send 52 to 56 bytes of data.

How does the program represent the four parts of the sender?

The TCP sliding window scheme uses three pointers to track the four categories of bytes. Two of the pointers are absolute (they point to a specific sequence number) and one is relative (an offset must be applied).

  • SND.WND: the size of the send window (the size is specified by the receiver);

  • SND.UNA: an absolute pointer to the sequence number of the first sent-but-unacknowledged byte, i.e. the first byte of #2;

  • SND.NXT: also an absolute pointer, to the sequence number of the first byte of the not-yet-sent but sendable range, i.e. the first byte of #3;

  • The pointer to the first byte of #4 is relative: offset SND.UNA by the size of SND.WND and you get the first byte of #4.

Then the available window size can be calculated as:

Usable window size = SND.WND − (SND.NXT − SND.UNA)
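Plugging the figure's numbers into the formula (SND.UNA = 32, SND.NXT = 46, SND.WND = 20, covering bytes 32 to 51) gives a usable window of 6 bytes, i.e. bytes 46 to 51:

```python
def usable_window(snd_wnd, snd_nxt, snd_una):
    """Usable window = SND.WND - (SND.NXT - SND.UNA):
    what remains of the send window after in-flight data."""
    return snd_wnd - (snd_nxt - snd_una)
```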

The sliding window of the receiver

Next, let’s look at the receiver’s window. The receiver window is relatively simple and divided into three parts according to the situation it handles:

  • #1 + #2 is data that has been successfully received and acknowledged (waiting to be read by the application process);

  • #3 is data that has not been received yet but can still be accepted;

  • #4 is data that has not been received and cannot be accepted (beyond the receive window).

The three sections on the receive side are divided using two pointers:

  • RCV.WND: the size of the receive window, which is advertised to the sender;

  • RCV.NXT: a pointer to the sequence number of the next data byte expected from the sender, i.e. the first byte of #3;

  • The pointer to the first byte of #4 is relative: offset RCV.NXT by the size of RCV.WND and you get the first byte of #4.

Are the receive window and the send window the same size?

Not exactly; the receive window is approximately equal in size to the send window.

That is because sliding windows are not fixed. For example, if the receiving application reads data very quickly, the receive window can empty quickly. The new receive window size is told to the sender through the Window field in the TCP header, and that notification takes time to arrive, so the relationship between the receive window and the send window is only approximate equality.


Flow control

The sender should not mindlessly send data to the receiver. The processing power of the receiver should be taken into account.

If you send data to the peer without thinking, but the peer cannot process the data, the retransmission mechanism is triggered, and network traffic is wasted.

To address this, TCP provides a mechanism for the “sender” to control the amount of data sent based on the “receiver”‘s actual ability to receive. This is known as flow control.

For the sake of simplicity, consider the following scenario:

  • The client is the receiver and the server is the sender

  • Let’s say the receiving window and the sending window are the same, and they’re both 200

  • It is assumed that both devices maintain the same window size throughout the transmission process and are not affected by the outside world

According to the flow control in the figure above, illustrate each of the following processes:

  1. The client sends a request packet to the server. Note that this example uses the server as the sender, so the server's receive window is not drawn.

  2. After receiving the request, the server sends back an acknowledgement and 80 bytes of data, so the Usable window shrinks to 120 bytes, and SND.NXT moves 80 bytes to the right to point to 321, meaning the next data sent will start at sequence number 321.

  3. After the client receives the 80 bytes, its receive window moves 80 bytes to the right and RCV.NXT points to 321, meaning the client expects the next packet's sequence number to be 321. It then sends an acknowledgement to the server.

  4. The server sends another 120 bytes of data, so the Usable window is exhausted (reduced to 0), and the server can send no more data.

  5. After the client receives the 120 bytes, its receive window moves 120 bytes to the right, RCV.NXT points to 441, and it sends an acknowledgement to the server.

  6. After the server receives the acknowledgement for the 80 bytes, SND.UNA moves right to point to 321, so the Usable window grows to 80.

  7. After the server receives the acknowledgement for the 120 bytes, SND.UNA moves right to point to 441, so the Usable window grows to 200.

  8. The server can continue sending, so after it sends 160 bytes of data, SND.NXT points to 601 and the Usable window shrinks to 40.

  9. After the client receives the 160 bytes, its receive window moves 160 bytes to the right, RCV.NXT points to 601, and it sends an acknowledgement to the server.

  10. After the server receives the acknowledgement for the 160 bytes, the send window moves 160 bytes to the right, SND.UNA shifts by 160 to point to 601, and the Usable window grows back to 200.

The relationship between operating system buffers and the sliding window

In the previous flow-control example, we assumed the send window and receive window stayed fixed. In reality, the bytes in the send and receive windows live in operating system memory buffers, which the operating system adjusts.

The buffers are also affected when the application process cannot read their contents in time.

So how does the operating system buffer affect the send window and the receive window?

Let’s look at the first example.

Changes in the send and receive windows when the application does not read the buffer in time

Consider the following scenario:

  • The client is the sender and the server is the receiver. The initial size of the sending window and receiving window is 360.

  • The server is very busy. When receiving data from the client, the application layer cannot read the data in time.

According to the flow control in the figure above, illustrate each of the following processes:

  1. After the client sends 140 bytes of data, the available window changes to 220 (360-140).

  2. The server received the 140 bytes, but it was very busy: the application process read only 40 bytes, leaving 100 bytes occupying the buffer, so the receive window shrank to 260 (360 − 100). When the acknowledgement was sent, the window size was passed to the client.

  3. After the client receives the acknowledgement and window notification packets, the sending window is reduced to 260.

  4. The client sends 180 bytes of data, at which point the available window is reduced to 80.

  5. The server receives the 180 bytes of data, but the application reads none of it; the 180 bytes stay in the buffer, so the receive window shrinks to 80 (260 − 180), and the new window size is advertised to the client in the acknowledgement.

  6. After the client receives the acknowledgement and window notification packets, the sending window is reduced to 80.

  7. After the client sends 80 bytes of data, the available window is exhausted.

  8. The server receives the 80 bytes of data, but the application still reads nothing; the 80 bytes remain in the buffer, so the receive window shrinks to 0, and the window size is advertised to the client in the acknowledgement.

  9. After the client receives the acknowledgement and window notification packets, the sending window is reduced to 0.

You can see that the window shrinks to zero at the end, which means the window closes. When the sender’s available window goes to 0, the sender actually periodically sends a window probe packet to see if the receiver’s window has changed, which I’ll talk about later, but I’ll mention briefly here.

Now let's look at the second example.

When server system resources are very tight, the operating system may directly reduce the size of the receive buffer. If the application also fails to read the buffered data in time, something serious can happen: packets get lost.

Describe each process below:

  1. The client sends 140 bytes of data, so the available window is reduced to 220.

  2. The server is now very busy, so the operating system shrinks the receive buffer by 120 bytes (from 360 to 240). It has received 140 bytes of data that the application has not read, so those 140 bytes stay in the buffer, and the receive window shrinks from 360 to 100 (240 − 140). When the acknowledgement is sent, the new window size is announced to the client.

  3. At this point the client has not yet received the server's window announcement, so it does not know the receive window has shrunk to 100; it still believes its usable window is 220. So the client sends 180 bytes of data, and its usable window drops to 40.

  4. When the server receives 180 bytes of data, it finds that the data size exceeds the size of the receiving window, so it loses the data packet.

  5. When the client receives the acknowledgement and window-announcement packet the server sent in step 2, it tries to shrink its send window to 100, pulling in both the right edge and the left edge of the window. At this point the usable window size becomes, oddly, negative.

So, if you reduce the cache and then shrink the window, you will lose packets.

To prevent this, TCP rules do not allow reducing the cache and shrinking the window at the same time. Instead, the window is narrowed first and the cache is reduced over time to avoid packet loss.

Window closing

As we have seen previously, TCP performs flow control by letting the receiver indicate the size of the data (window size) it wants to receive from the sender.

If the window size is zero, the sender is blocked from sending data to the receiver until the window becomes non-zero again; this is window closing.

Window closing is potentially dangerous

When the receiver notifies the sender of the window size, it notifies the sender through ACK packets.

Then, once the window has closed and the receiver has processed its data, it notifies the sender with an ACK packet advertising a non-zero window. If this window-announcement ACK is lost in the network, there is big trouble.

As a result, the sender is waiting for the non-0 window notification from the receiver, and the receiver is waiting for the data from the sender. If no action is taken, the mutual waiting process will cause deadlock.

How does TCP address the potential deadlock when the window closes?

To solve this problem, TCP has a persistence timer for each connection. It is started whenever one side of the connection receives a zero-window notification from the other.

If the timer fires, the device sends a Window Probe packet, and on acknowledging it, the other party reports the current size of its receive window.

  • If the receive window is still 0, the recipient restarts the persistence timer.

  • If the receive window is not zero, then the deadlock situation can be broken.

The number of window probes is typically 3, sent roughly every 30 to 60 seconds (depending on the implementation). If the receive window is still zero after 3 probes, some TCP implementations send an RST packet to break the connection.
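The probing loop can be sketched as a toy model in which the list of advertised windows stands in for successive probe replies; the RST-after-three-probes behavior is implementation-dependent, as noted above:

```python
def persist_probe(probe_replies, max_probes=3):
    """Model zero-window probing: each entry in `probe_replies` is the
    window size returned by one probe. Stop as soon as a reply opens
    the window; otherwise give up after `max_probes` probes (some
    implementations then send RST)."""
    for i, window in enumerate(probe_replies[:max_probes], start=1):
        if window > 0:
            return ("open", i)    # deadlock broken, sending resumes
    return ("reset", max_probes)  # window never opened
```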

Silly window syndrome

If the receiver is too busy to pick up the data in the receiving window, then the sender’s sending window will get smaller and smaller.

In the end, if the receiver frees up just a few bytes of buffer and advertises that tiny window to the sender, the sender will dutifully send those few bytes. This is the silly window syndrome.

Bear in mind that the TCP and IP headers together are 40 bytes; spending that much overhead to carry a few bytes of data is uneconomical.

It's like a bus that can carry 50 people departing every time just one or two passengers board. Unless the bus company has money to burn, it will go broke. The fix is not hard: the driver simply waits until more than 25 passengers are aboard before departing.

As an example of silly window syndrome, consider the following scenario:

The receiver's window size is 360 bytes, but for some reason the receiver is struggling. Assume the receiver's application layer has the following read pattern:

  • For every three bytes received by the receiver, the application can read only one byte from the buffer;

  • The application also reads 40 additional bytes from the buffer before the next sender’s TCP segment arrives;

The window-size changes at each step are shown clearly in the figure: the window keeps shrinking, and the amount of data sent keeps getting smaller.

So silly window syndrome can occur on both the sender side and the receiver side:

  • The receiver can advertise a small window

  • The sender can send small data

Thus, to solve silly window syndrome, both problems must be addressed:

  • stop the receiver from advertising small windows to the sender;

  • stop the sender from sending small amounts of data.

How do you stop the receiver from advertising a small window?

The usual policies of the receiver are as follows:

When the "window size" is less than min(MSS, buffer space / 2), that is, less than the minimum of the MSS and half the buffer size, the receiver advertises a window of 0 to the sender, which stops the sender from sending more data.

Once the receiver has processed some data and the window size is >= MSS, or at least half of the receive buffer is free, it reopens the window and lets the sender send data again.
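The receiver-side rule can be sketched as follows (a hypothetical helper; real stacks implement this inside the kernel):

```python
def advertised_window(free_space, mss, buf_size):
    """Receiver-side silly-window-syndrome avoidance: advertise 0 while
    free space is below min(MSS, buffer size / 2); otherwise advertise
    the real free space."""
    if free_space < min(mss, buf_size // 2):
        return 0
    return free_space
```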

How do you stop the sender from sending small amounts of data?

The sender’s usual policy:

The Nagle algorithm is used. Its idea is delayed processing: data is sent only when one of the following two conditions is met:

  • the window size >= MSS and the data size >= MSS;

  • the ACK for previously sent data has been received.

As long as neither condition is met, the sender keeps accumulating data until one of them is satisfied.
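The decision rule can be sketched as follows (a hypothetical function for illustration):

```python
def nagle_can_send(data_len, window, mss, has_unacked_data):
    """Nagle's rule as described above: send now if a full MSS can go
    out (both the pending data and the window reach the MSS), or if
    everything sent earlier has already been acknowledged."""
    if data_len >= mss and window >= mss:
        return True
    if not has_unacked_data:
        return True
    return False  # otherwise keep accumulating small data
```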

In addition, the Nagle algorithm is on by default. For highly interactive scenarios that exchange small packets, such as Telnet or SSH, it should be turned off.

This algorithm can be turned off per socket by setting the TCP_NODELAY option (there is no global parameter for turning Nagle off; each application must disable it itself):

int value = 1;
setsockopt(sock_fd, IPPROTO_TCP, TCP_NODELAY, (char *)&value, sizeof(int));



Congestion control

Why congestion control when there is flow control?

The flow control described earlier prevents the "sender" from overflowing the "receiver"'s buffer, but it knows nothing about what is happening inside the network.

Generally speaking, computer networks are in a shared environment. Therefore, communication between other hosts may cause network congestion.

When the network is congested, continuing to send large numbers of packets may cause packet delay and loss. TCP then retransmits the data, but retransmission puts a heavier burden on the network, which leads to greater delays and even more packet loss.

Therefore, TCP cannot ignore what is happening on the network. It is designed to be a selfless protocol. When the network is congested, TCP sacrifices itself to reduce the amount of data sent.

Then there is congestion control, which is designed to prevent the “sender” from filling the network with data.

To adjust the amount of data to be sent at the “sender”, a concept called a “congestion window” is defined.

What is a congestion window? What does it have to do with the send window?

The congestion window CWND is a state variable maintained by the sender that dynamically changes according to the congestion level of the network.

As mentioned earlier, the sending window SWND and the receiving window RWND are approximately equal to each other. Since the concept of congestion window is introduced, the value of the sending window is SWND = min(CWND, RWND), which is the minimum value in the congestion window and the receiving window.

Congestion window CWND change rule:

  • As long as there is no congestion in the network, CWND will increase.

  • But when there is congestion in the network, CWND decreases.

So how do you know if the network is congested?

In fact, as long as the "sender" does not receive an ACK reply within the specified time, that is, a timeout retransmission occurs, the network is considered congested.

What are the congestion control algorithms?

Congestion control mainly consists of four algorithms:

  • Slow start

  • Congestion avoidance

  • Congestion occurs

  • Fast recovery

Slow start

TCP has a slow start process after the establishment of a connection. The slow start process means to increase the number of packets sent bit by bit. If a large amount of data is sent at the beginning, it will jam the network.

The slow-start algorithm just remembers one rule: The size of the CWND congestion window increases by one for each ACK the sender receives.

It is assumed that the congestion window CWND is equal to the sending window SWND.

  • After the connection is established, CWND is initialized to 1, meaning one MSS-sized segment can be transmitted;

  • When its ACK arrives, CWND increases by 1, so two segments can be sent next;

  • When those two ACKs arrive, CWND increases by 2, so two more segments than before can be sent, i.e. four this time;

  • When those four ACKs arrive, each one increases CWND by 1, so CWND increases by 4, four more than before, and eight segments can be sent this time.

As you can see, under the slow start algorithm, the number of packets sent grows exponentially.
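The doubling per round trip can be sketched as a toy model that acknowledges a whole window of segments per round:

```python
def slow_start_rounds(initial_cwnd, rounds):
    """Each ACK adds 1 to CWND; acknowledging a whole window of
    segments therefore doubles CWND every round trip."""
    cwnd = initial_cwnd
    history = [cwnd]
    for _ in range(rounds):
        cwnd += cwnd  # one +1 per ACK, one ACK per segment in flight
        history.append(cwnd)
    return history
```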

So where does slow start end?

There is a state variable called the slow start threshold, ssthresh.

  • When CWND < ssthresh, the slow start algorithm is used.

  • When CWND >= ssthresh, the congestion avoidance algorithm is used.

Congestion avoidance algorithm

As stated earlier, the congestion window CWND enters the congestion avoidance algorithm when it “exceeds” the slow start threshold SSthresh.

Generally, ssthresh is 65535 bytes.

After entering the congestion avoidance algorithm, its rule is: every time an ACK is received, the CWND increases by 1/ CWND.

Continuing the earlier slow start example, now assume ssthresh is 8:

  • When 8 ACK acknowledgements arrive, each one adds 1/8, so the 8 ACKs together increase CWND by 1, and 9 MSS-sized segments can be sent this time. Growth has become linear.

So the congestion avoidance algorithm turns the slow start algorithm's exponential growth into linear growth. It is still a growth phase, just a slower one.

As it continues to grow, the network will gradually become congested, resulting in packet loss and the need to retransmit lost packets.

When the retransmission mechanism is triggered, the congestion generation algorithm is entered.

Congestion occurs

When network congestion occurs, packet retransmission will occur. There are two main retransmission mechanisms:

  • Timeout retransmission

  • The fast retransmission

The congestion-handling algorithms used in the two cases are different; let's discuss them separately.

Congestion generation algorithm with timeout retransmission

When a “timeout retransmission” occurs, the congestion generation algorithm is used.

At this point, the values of ssthresh and CWND change:

  • ssthresh is set to CWND / 2;

  • CWND is reset to 1.

Then slow start begins again from scratch, suddenly cutting the data flow. A single "timeout retransmission" sends the connection straight back to square one. This approach is too drastic, though; the violent reaction can cause the transmission to stall.

It's like drifting at high speed down Mount Akina and suddenly slamming on the brakes; the tires can't take it...

Congestion generation algorithm with fast retransmission

There are gentler ways. We covered the fast retransmission algorithm earlier: when the receiver finds a segment missing, it sends the ACK of the last in-order packet three times, so the sender can retransmit quickly without waiting for a timeout.

TCP considers this situation not serious, since most packets got through and only a few were lost. In this case, ssthresh and CWND change as follows:

  • CWND = CWND /2, which is set to half of the original;

  • ssthresh = cwnd;

  • Enter the fast recovery algorithm

Fast recovery

Fast retransmission and fast recovery are usually used together. The idea behind fast recovery is: if the sender can still receive three duplicate ACKs, the network is not in terrible shape, so there is no need to react as strongly as for an RTO timeout.

As stated earlier, CWND and ssthresh were updated before entering fast recovery:

  • CWND = CWND /2, which is set to half of the original;

  • ssthresh = cwnd;

Then, enter the fast recovery algorithm as follows:

  • Congestion window CWND = SSthresh + 3 (3 means confirmation that 3 packets were received)

  • The lost packets are retransmitted

  • If a duplicate ACK is received, the CWND increases by 1

  • If an ACK for new data arrives, set CWND to ssthresh and then enter the congestion avoidance algorithm

In other words, without a "timeout retransmission", the connection does not drop back to square one overnight; CWND stays at a relatively high value and then grows linearly.
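Putting the four algorithms together, here is a toy sketch of the state changes described above. The class and method names are hypothetical, and this is not a real TCP implementation:

```python
class RenoSketch:
    def __init__(self, ssthresh=8):
        self.cwnd = 1            # congestion window, in MSS units
        self.ssthresh = ssthresh

    def on_ack(self):
        if self.cwnd < self.ssthresh:
            self.cwnd += 1              # slow start: +1 per ACK (exponential)
        else:
            self.cwnd += 1 / self.cwnd  # congestion avoidance: linear

    def on_timeout(self):
        # timeout retransmission: drastic reaction, back to square one
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1

    def on_triple_dup_ack(self):
        # fast retransmit + fast recovery: milder reaction
        self.cwnd = self.cwnd / 2
        self.ssthresh = self.cwnd
        self.cwnd = self.ssthresh + 3  # the 3 duplicate ACKs each mean a delivered packet
```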

Source: Xiao Lin Coding