As we saw in the last class, TCP retransmits packets that are lost. How does retransmission work? TCP has two independent retransmission mechanisms: one is based on time, and the other is based on feedback from the receiver in the form of acknowledgements (including selective acknowledgements, SACK). The second is usually more efficient than the first.

4.1 A Brief Introduction

When TCP sends data, it sets a timer. If the data has not been acknowledged by the time the timer expires, a timer-based retransmission (a timeout) is triggered. The timeout interval is called the retransmission timeout (RTO). The other kind of retransmission, fast retransmission, usually happens without this delay. It infers that a packet has been lost when TCP's cumulative acknowledgement stops advancing (the receiver keeps returning duplicate ACKs instead of a new ACK), or when the ACKs carry selective acknowledgement (SACK) information indicating that packets have arrived out of order.

4.1.1 Simple Timeout and Retransmission

“TCP/IP Illustrated” gives a very good example: establish a Telnet connection, disconnect the network after the connection is up, and capture the packets. The capture shows the exponential backoff of retransmission: each retransmission interval is twice as long as the previous one. After several retransmissions, the client reports that the connection has failed. TCP has two thresholds that determine how long it keeps retransmitting the same segment. R1 is the number of retransmission attempts (or the amount of time) TCP is willing to spend before passing “negative advice” to the IP layer (for example, asking it to reevaluate the current IP path). R2 (greater than R1) determines when TCP should abandon the connection. For SYN segments, R2 should be at least 3 minutes.

On Linux, for ordinary data segments, R1 and R2 can be set either by an application or through the system configuration variables net.ipv4.tcp_retries1 and net.ipv4.tcp_retries2. The default value of tcp_retries1 is 3. The default value of tcp_retries2 is 15, which means the connection is dropped after roughly 13 to 30 minutes, depending on the RTO of the specific connection.
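To see where a figure like "13 to 30 minutes" comes from, here is a small sketch that adds up the waiting time under pure exponential backoff. Assumptions: the RTO starts at 200 ms (Linux's minimum RTO) and is capped at 120 s (Linux's maximum RTO), and the connection is dropped when the timer for the last retransmission expires. The function name is hypothetical; the real RTO depends on the connection's measured RTT.

```python
def time_until_giveup_ms(initial_rto_ms, retries, rto_max_ms):
    """Total time spent waiting on retransmission timers before giving up.

    Assumes pure exponential backoff: the RTO doubles after every timeout
    and is capped at rto_max_ms. The original transmission plus `retries`
    retransmissions each wait one full RTO, so there are retries + 1 waits.
    """
    total = 0
    rto = initial_rto_ms
    for _ in range(retries + 1):
        total += rto
        rto = min(rto * 2, rto_max_ms)  # double, but never exceed the cap
    return total


# Assumed values: 200 ms initial RTO, 120 s cap, tcp_retries2 = 15.
print(time_until_giveup_ms(200, 15, 120_000) / 1000)  # 924.6 (seconds)
```

Under these assumptions the total is 924.6 s, about 15.4 minutes. A connection with a larger RTO starts its backoff higher and reaches the upper end of the range.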

TCP has to work in many different environments whose characteristics can change over time, so the timeout must be set according to current conditions. For example, if a network connection fails and has to be re-established, the RTT may change, possibly dramatically. TCP therefore sets the RTO (retransmission timeout) dynamically.

4.1.2 A Simple Method for Calculating the RTO

The original TCP specification [RFC0793] computes a smoothed estimate of the RTT, called SRTT, with the following formula: $SRTT \leftarrow \alpha \cdot SRTT + (1 - \alpha) \cdot RTT_s$

Here, the new SRTT is computed from its existing value and a new RTT sample $RTT_s$. The constant α is a smoothing factor with a recommended value of 0.8 to 0.9. This method is called an exponentially weighted moving average (EWMA) or a low-pass filter.

$$RTO = \min(ubound, \max(lbound, \beta \cdot SRTT))$$

Here β is the delay variance factor, with a recommended value of 1.3 to 2.0; ubound is an upper bound on the RTO (for example, 1 minute), and lbound is a lower bound (for example, 1 second). The RTO is therefore β times the smoothed RTT (1.3 to 2 times SRTT), clamped to lie between 1 second and 1 minute.
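The classic estimator fits in a few lines. This is a sketch, assuming times in seconds, the example bounds above, and that SRTT is seeded with the first sample (the formula itself does not say how to start). The class name is hypothetical.

```python
class ClassicRtoEstimator:
    """RFC 793-style RTO estimator (sketch; default parameters are assumptions)."""

    def __init__(self, alpha=0.9, beta=2.0, lbound=1.0, ubound=60.0):
        self.alpha = alpha    # smoothing factor, recommended 0.8-0.9
        self.beta = beta      # delay variance factor, recommended 1.3-2.0
        self.lbound = lbound  # lower bound on the RTO, seconds
        self.ubound = ubound  # upper bound on the RTO, seconds
        self.srtt = None

    def update(self, rtt_sample):
        """Fold in one RTT sample (seconds) and return the new RTO."""
        if self.srtt is None:
            # Assumption: seed SRTT with the first sample.
            self.srtt = rtt_sample
        else:
            self.srtt = self.alpha * self.srtt + (1 - self.alpha) * rtt_sample
        return min(self.ubound, max(self.lbound, self.beta * self.srtt))


est = ClassicRtoEstimator(beta=1.3)
est.update(1.0)                 # steady state: RTT = 1 s
for _ in range(3):
    rto = est.update(2.0)       # RTT suddenly doubles to 2 s
    print(f"RTO = {rto:.3f} s, below the 2 s RTT: {rto < 2.0}")
```

With β = 1.3 the RTO stays below the new 2 s RTT for several samples, because SRTT only moves 10% of the way toward each new sample. This previews the problem discussed next.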

Is that enough to calculate the RTO?

When the RTT fluctuates widely, the formula above produces an RTO that is too small: the previous estimate carries most of the weight, so SRTT adapts slowly, and a fixed multiple of SRTT does not reflect how much the RTT varies. The result is unnecessary retransmissions, which can overload the network.

To solve this, the original method can be improved to cope with large RTT variations. A more accurate estimate can be obtained by tracking not only the mean of the RTT measurements but also how much they vary, that is, by estimating the mean and the variance (or standard deviation) at the same time. A good prediction of the range of likely RTT values lets TCP choose an RTO that works in most situations.

The mean deviation is a good approximation of the standard deviation, but it is easier and faster to compute. The next version of the algorithm therefore combines the mean with the mean deviation. For each RTT measurement M (previously called $RTT_s$), the following formulas are applied:

$$\begin{aligned} SRTT &\leftarrow (1 - g) \cdot SRTT + g \cdot M \\ rttvar &\leftarrow (1 - h) \cdot rttvar + h \cdot |M - SRTT| \\ RTO &= SRTT + 4 \cdot rttvar \end{aligned}$$

The SRTT update is the same as before. SRTT is the EWMA of the RTT samples, and rttvar is the EWMA of the absolute error |err|, where err = M - SRTT is the deviation between the measured value M and the current estimate SRTT. The gain g is the weight of the new RTT sample M in the SRTT estimate and equals 1/8; the gain h is the weight of the new mean deviation sample in rttvar and equals 1/4. The larger the mean deviation (that is, the more the RTT fluctuates), the larger the RTO.
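A sketch of this mean-deviation estimator, assuming times in milliseconds and the first-sample initialization given in [RFC6298] (SRTT = M, rttvar = M/2). Following RFC 6298, rttvar is updated first, using the SRTT value from before the current sample. The class name is hypothetical.

```python
class MeanDeviationRtoEstimator:
    """SRTT + 4*rttvar estimator with g = 1/8 and h = 1/4 (times in ms)."""

    G = 1 / 8  # weight of a new RTT sample in SRTT
    H = 1 / 4  # weight of a new deviation sample in rttvar

    def __init__(self):
        self.srtt = None
        self.rttvar = None

    def update(self, m):
        """Fold in RTT measurement m and return the new RTO."""
        if self.srtt is None:
            # First sample: SRTT = M, rttvar = M/2
            self.srtt = m
            self.rttvar = m / 2
        else:
            # rttvar uses the SRTT from before this sample, then SRTT is updated
            err = m - self.srtt
            self.rttvar = (1 - self.H) * self.rttvar + self.H * abs(err)
            self.srtt = (1 - self.G) * self.srtt + self.G * m
        return self.srtt + 4 * self.rttvar


est = MeanDeviationRtoEstimator()
for m in [100, 100, 300]:
    print(est.update(m))  # 300.0, then 250.0, then 437.5
```

A single 300 ms sample moves SRTT only from 100 to 125 ms, but the deviation term jumps, so the RTO rises to 437.5 ms right away. That quick reaction is what the classic formula lacked.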

4.1.3 Calculation of RTT

The formulas above show how the RTO is computed, but each computation needs a new RTT measurement.

So how do you measure RTT?

RTT is measured with TCP's clock. This clock does not start from zero when a connection is established, and it is not perfectly accurate; it simply advances along with the system clock. The length of one clock “tick” is called the granularity. Traditionally the granularity was fairly coarse (about 500 ms), but Linux uses a much finer granularity of 1 ms. Taking the granularity G into account, the RTO is computed as:

$$RTO = \max(SRTT + \max(G, 4 \cdot rttvar), 1000\ \text{ms})$$

Here G is the timer granularity, and 1000 ms is the lower bound of the whole RTO, so the RTO is always at least 1 s. An optional upper bound, for example 60 s, may also be applied.
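The formula translates directly into a helper. This is a sketch with a hypothetical name; all times are in ms, and the 60 s cap is the optional example bound from the text (pass `upper_ms=None` to disable it).

```python
def clamp_rto_ms(srtt_ms, rttvar_ms, granularity_ms, lower_ms=1000, upper_ms=60_000):
    """RTO = max(SRTT + max(G, 4*rttvar), 1000 ms), plus an optional upper bound."""
    rto = srtt_ms + max(granularity_ms, 4 * rttvar_ms)
    rto = max(rto, lower_ms)          # never below 1 s
    if upper_ms is not None:
        rto = min(rto, upper_ms)      # optional cap
    return rto


print(clamp_rto_ms(100, 50, 1))      # 1000: the 1 s floor dominates on a fast path
print(clamp_rto_ms(2000, 10, 500))   # 2500: a coarse 500 ms clock dominates a tiny rttvar
```

The max(G, ...) term matters mainly for coarse clocks: with a 500 ms tick, an rttvar of a few milliseconds cannot be measured meaningfully, so at least one tick of slack is added.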

4.1.4 Initial Values

So far we have looked at how these values change over time, but what are they at initialization?

Before the first SYN exchange, TCP has no RTT measurements, so it cannot derive the RTO from them. For the same reason, the estimators cannot be given initial values unless the system provides them.

According to [RFC6298], the initial RTO is 1 s, and the timeout used for the initial SYN segment is 3 s. When the first RTT measurement M is received, the estimators are initialized as follows: $SRTT = M$ and $rttvar = M/2$.

4.1.5 The Method Used in Linux

I once read an article on the Linux retransmission mechanism but didn't save it, and I couldn't find it again today. If I come across it again, I'll read it carefully. For now I'll describe the mechanism in my own words, so please point out any mistakes.

The figure takes some effort to draw, but drawing it yourself is a good way to make the process stick.

Initial phase (TSval = 0): the client's SYN carries TSval = 0, and the SYN timeout is set to 3 s; this is not shown in the figure because it happens before the connection is established. When the server receives the SYN, it replies with a SYN+ACK and copies the TSval into the TSecr (timestamp echo reply) field. When the client receives this ACK, it obtains its first RTT measurement, M = 16. With the standard method, SRTT = M = 16 and rttvar = M/2 = 16/2 = 8. Linux introduces a variable mdev, an estimate of the instantaneous mean deviation computed with the standard method, so mdev = M/2 = 16/2 = 8. It also introduces mdev_max, which records the maximum mdev seen while measuring RTT samples, and sets rttvar = mdev_max = max(mdev, 50 ms); the 50 ms floor keeps 4 × rttvar at or above 200 ms, Linux's minimum RTO. So rttvar = 50, and RTO = SRTT + 4 × rttvar = 16 + 200 = 216.

Normal calculation: after receiving the server's SYN+ACK, the client sends an ACK back to the server, which completes the three-way handshake, and the connection enters the data transfer phase. The client then sends two 1400-byte segments. Because they are sent very close together, both carry the same TSval (which is a problem in itself). After receiving both segments, the server replies with ack = 2801 (how this value is determined was covered in the discussion of the sliding window). When the client receives this ACK, it computes the RTO again:

m = 223 - 127 = 96
mdev = mdev*3/4 + |m - srtt|*1/4 = 8*3/4 + |96 - 16|*1/4 = 26
mdev_max = rttvar = max(mdev_max, mdev) = max(50, 26) = 50
srtt = srtt*7/8 + m*1/8 = 16*7/8 + 96*1/8 = 26
RTO = srtt + 4*rttvar = 26 + 4*50 = 226

Perfect. This is the result of the second calculation.
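To double-check the two calculations, here is a simplified sketch of the estimator as this section describes it; it is not the kernel's actual code. Assumptions: times are in ms; each RTT sample is the client's clock when the ACK arrives minus the echoed TSecr (I read 16 - 0 and 223 - 127 that way); mdev uses the plain 3/4 and 1/4 weights of the worked example; the 50 ms floor applies; and mdev_max is never reset. The class and method names are hypothetical.

```python
class LinuxStyleRtoEstimator:
    """Simplified sketch of the Linux-style estimator walked through above."""

    RTTVAR_FLOOR_MS = 50  # keeps 4 * rttvar >= 200 ms

    def __init__(self):
        self.srtt = None
        self.mdev = None
        self.mdev_max = None
        self.rttvar = None

    def on_ack(self, now_ms, tsecr_ms):
        """Take an RTT sample from the echoed timestamp and return the new RTO."""
        m = now_ms - tsecr_ms
        if self.srtt is None:
            # First sample: SRTT = M, mdev = M/2, then apply the floor
            self.srtt = m
            self.mdev = m / 2
            self.mdev_max = max(self.mdev, self.RTTVAR_FLOOR_MS)
        else:
            # Deviation uses the old srtt; srtt is updated afterwards
            self.mdev = self.mdev * 3 / 4 + abs(m - self.srtt) / 4
            self.mdev_max = max(self.mdev_max, self.mdev)
            self.srtt = self.srtt * 7 / 8 + m / 8
        self.rttvar = self.mdev_max
        return self.srtt + 4 * self.rttvar


est = LinuxStyleRtoEstimator()
print(est.on_ack(16, 0))     # handshake sample, M = 16
print(est.on_ack(223, 127))  # data ACK sample, M = 96
```

The two calls reproduce the RTO values 216 and 226 from above. Note how the floor, not the measured deviation, decides rttvar in both steps.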

Delayed or missing ACK: if the ACK for a segment does not come back in time and the ACK for a later segment arrives instead, the client computes the RTT from the TSecr timestamp echoed in that ACK, not from the timestamp of the most recent segment it sent.

The Linux mechanism has the following advantages:

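  • The time granularity is 1 ms.
  • It introduces the instantaneous mean deviation mdev and mdev_max, the maximum of mdev over a range of samples, so that rttvar does not become too small.
  • It reduces the weight of a new sample when the RTT drops sharply, so that a sudden decrease in RTT does not inflate mdev: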

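$$mdev \leftarrow \begin{cases} \frac{31}{32} \cdot mdev + \frac{1}{32} \cdot |SRTT - m| & \text{if } m < SRTT - mdev \\ \frac{3}{4} \cdot mdev + \frac{1}{4} \cdot |SRTT - m| & \text{otherwise} \end{cases}$$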
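The two-branch update is easy to check in isolation. A sketch with a hypothetical function name, times in ms:

```python
def update_mdev(mdev, srtt, m):
    """Linux-style mdev update with reduced weight when the RTT drops sharply."""
    err = abs(srtt - m)
    if m < srtt - mdev:
        # RTT fell well below SRTT: give the new sample only 1/32 weight
        return mdev * 31 / 32 + err / 32
    # Normal case: the standard 3/4 and 1/4 weights
    return mdev * 3 / 4 + err / 4


print(update_mdev(8, 16, 96))     # 26.0: RTT rose, normal branch
print(update_mdev(32, 200, 100))  # 34.125: RTT dropped, damped branch
```

In the second call the plain 3/4 and 1/4 rule would give 49, so without the damped branch a faster path would paradoxically raise the RTO.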

4.1.6 Timer-Based Retransmission

From the explanation above, we now know how the RTO value is set. When setting the timer, TCP records the sequence number of the segment being timed (otherwise it could not tell which segment an arriving ACK belongs to). If an ACK for that segment arrives in time, the timer is cancelled. When the sender later transmits new data, it sets a new timer and records the new sequence number. The sender of a TCP connection therefore continually sets and cancels a retransmission timer; if data is lost, the timer expires and triggers retransmission.
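The bookkeeping above can be sketched as a small class. This is a hypothetical helper, not a real kernel or library API. Assumptions: times are in ms; one segment is timed at a time; it counts as acknowledged once the cumulative ACK reaches the end of the segment; and each expiry doubles the RTO, as in the exponential backoff of 4.1.1.

```python
class RetransmissionTimer:
    """Hypothetical helper modelling per-segment timer bookkeeping."""

    def __init__(self):
        self.timed_seq = None  # first sequence number of the timed segment
        self.timed_end = None  # first sequence number after the timed segment
        self.deadline = None   # absolute expiry time (ms), None if not armed
        self.rto = None        # current timeout interval (ms)

    def start(self, seq, length, now, rto):
        """Arm the timer for segment [seq, seq + length) sent at time `now`."""
        self.timed_seq, self.timed_end = seq, seq + length
        self.rto = rto
        self.deadline = now + rto

    def on_ack(self, ack):
        """Cancel the timer if the cumulative ACK covers the timed segment."""
        if self.deadline is not None and ack >= self.timed_end:
            self.deadline = None
            return True
        return False

    def poll(self, now):
        """Return the sequence number to retransmit if the timer has expired."""
        if self.deadline is None or now < self.deadline:
            return None
        self.rto *= 2                   # exponential backoff
        self.deadline = now + self.rto  # re-arm for the retransmission
        return self.timed_seq


timer = RetransmissionTimer()
timer.start(seq=1, length=1400, now=0, rto=1000)
print(timer.poll(500))     # None: not expired yet
print(timer.poll(1000))    # 1: retransmit, next timeout 2000 ms later
print(timer.on_ack(1401))  # True: ACK covers bytes 1-1400, timer cancelled
```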

4.2 Fast Retransmission

Now for fast retransmission; let's wrap this up. Ha ha.

If the receiver already knows which packet has been lost, waiting for the retransmission timer to expire is much less efficient. TCP therefore also retransmits based on feedback from the receiver; this is called fast retransmission.

When packets are reordered in the network, the receiver may receive packets that come after the sequence number it currently expects. The expected packet may have been lost, or it may simply arrive late. Because TCP usually cannot tell which case applies, it waits for a certain number of duplicate ACKs (the duplicate ACK threshold, or dupthresh) before concluding that data has been lost and triggering fast retransmission. Traditionally dupthresh is a constant (3), but Linux can adjust it dynamically based on the observed degree of reordering.
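A minimal sketch of the sender-side duplicate ACK counting, with a hypothetical class name. It simplifies what counts as a duplicate ACK (it only compares ACK numbers and ignores data and window fields), uses a fixed dupthresh of 3, and ignores stale ACKs below the highest one seen.

```python
class FastRetransmitDetector:
    """Counts duplicate ACKs and signals fast retransmission at dupthresh."""

    def __init__(self, dupthresh=3):
        self.dupthresh = dupthresh
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack):
        """Return the sequence number to fast-retransmit, or None."""
        if self.last_ack is None or ack > self.last_ack:
            # New cumulative ACK: data was acknowledged, reset the counter
            self.last_ack = ack
            self.dup_count = 0
            return None
        if ack < self.last_ack:
            return None  # stale ACK, ignore
        self.dup_count += 1
        if self.dup_count == self.dupthresh:
            # The repeated ACK number is the first byte the receiver is missing
            return ack
        return None


det = FastRetransmitDetector()
print([det.on_ack(a) for a in [1401, 2801, 2801, 2801, 2801, 2801]])
```

The third duplicate of 2801 triggers a single retransmission starting at byte 2801; further duplicates do not trigger it again.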

Without SACK, at most one segment can be retransmitted until an ACK acknowledging new data arrives. With SACK, ACKs carry additional information that lets the sender fill more than one gap per RTT.

4.2.1 Retransmission with Selective Acknowledgement

Unfortunately, my packet capture did not contain any SACK blocks, although SACK was enabled in the SYN segments (the SACK-permitted option). I'll add a screenshot when I come across one.

The first SACK block contains the sequence number range of the most recently received segment. Because the space available for SACK blocks is limited, this ensures that the most recent information reaches the TCP sender whenever possible. The remaining SACK blocks list other received ranges, in order of how recently they were received.
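A receiver-side sketch of this bookkeeping. Assumptions: ranges are half-open [start, end) and lie above the cumulative ACK point; blocks are kept most recent first; an arriving segment that overlaps or touches existing blocks is merged with them; and at most 3 blocks are sent (RFC 2018 allows at most 4, or 3 when the timestamps option is also present). Removing blocks once the cumulative ACK passes them is omitted. Both function names are hypothetical.

```python
def record_segment(blocks, seg_start, seg_end):
    """Return the updated block list (most recent first) after receiving a segment."""
    new_start, new_end = seg_start, seg_end
    older = []
    for start, end in blocks:
        if start <= new_end and end >= new_start:
            # Overlapping or adjacent: merge into the newest block
            new_start, new_end = min(new_start, start), max(new_end, end)
        else:
            older.append((start, end))
    return [(new_start, new_end)] + older


def sack_option(blocks, max_blocks=3):
    """Blocks to place in the SACK option: the most recent ones first."""
    return blocks[:max_blocks]


blocks = []
blocks = record_segment(blocks, 25201, 26601)
blocks = record_segment(blocks, 28001, 29401)
print(sack_option(blocks))  # [(28001, 29401), (25201, 26601)]
blocks = record_segment(blocks, 26601, 28001)
print(sack_option(blocks))  # [(25201, 29401)]: the gap is filled, blocks merge
```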

An example really needs a picture (“no picture, no soul,” as the saying goes), but since I couldn't find a suitable packet capture, I'll just describe it in words.

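The sender has sent the data in the range 1-26601 (using the same end-exclusive notation as ACK numbers), and the receiver has received 1-23801 and 25201-26601. The receiver therefore replies with ACK = 23801 and SACK = [25201-26601]. When the sender receives three such duplicate ACKs, fast retransmission is triggered, and because the SACK block shows exactly what is missing, the sender retransmits only the range 23801-25201. (I hope I got that example right.)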
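On the sender side, the cumulative ACK and the SACK blocks together identify the holes to retransmit. A minimal sketch with a hypothetical function name, assuming half-open [start, end) ranges and treating only the gaps below the highest SACKed byte as missing (data beyond it may simply still be in flight).

```python
def sack_holes(cum_ack, sack_blocks):
    """Return the [start, end) ranges missing between cum_ack and the SACKed data."""
    holes = []
    next_expected = cum_ack
    for start, end in sorted(sack_blocks):
        if start > next_expected:
            holes.append((next_expected, start))  # gap before this block
        next_expected = max(next_expected, end)   # skip over received data
    return holes


print(sack_holes(23801, [(25201, 26601)]))  # [(23801, 25201)], the example above
```

Because each hole is known exactly, several holes can be filled within one RTT, which is the advantage over plain duplicate ACKs described earlier.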