Interview Hot Topics | An Introduction to the TCP/IP Transport Layer TCP BBR Algorithm


This article first appeared on the WeChat official account "Technical Compass".

0x00. Preface

This is the third article in the TCP/IP protocol stack series. The previous piece, "Interview Hot Topics | Understanding TCP/IP Transport Layer Congestion Control Algorithms", covered the basic principles of traditional congestion control; today we look together at a congestion control algorithm added to recent Linux kernels: TCP BBR.

Since TCP congestion control rests on a complex body of mathematical theory and control strategy, this article can only scratch the surface. By the end you will understand the following (a friendly reminder: it is a long read and takes some patience; feel free to bookmark it and come back):

  • Review of traditional congestion control algorithms
  • Overview of the TCP BBR algorithm
  • An introduction to the principles of the BBR algorithm

0x01. A Brief History of Congestion Control

TCP/IP had no congestion control until about 1988. As the scale of network access grew, end-to-end window control alone was no longer sufficient, and in 1986 this led to massive congestion collapses across the network. Enter a heavyweight: Van Jacobson.

The man who saved the day was later inducted into the Internet Hall of Fame. Van Jacobson proposed and implemented TCP/IP congestion control, solving one of the biggest problems of his time. Here is a brief Wikipedia profile of Van Jacobson (lightly edited by me):

Van Jacobson is one of the principal authors of the TCP/IP protocol stack that underpins today's Internet. He is known for his pioneering work on improving and optimizing network performance.


In August 2006, he joined PARC as a researcher and as chief scientist at Packet Design, a company located in the adjacent Xerox complex. Prior to that, he was chief scientist at Cisco Systems and head of the Network Research Group at Lawrence Berkeley National Laboratory.


Van Jacobson is best known for his work on improving and optimizing IP network performance. Between 1988 and 1989, he redesigned the flow control algorithms of TCP/IP (Jacobson's algorithm), and he is also known for designing the Van Jacobson TCP/IP header compression protocol described in RFC 1144. He co-designed some of the most widely used network diagnostic tools, such as traceroute, pathchar, and tcpdump.


Van Jacobson was inducted into the inaugural class of the Internet Hall of Fame in April 2012.
www.internethalloffame.org/inductees/v…

Here is the Internet Hall of Fame's introduction to Van Jacobson:

I also looked up Van Jacobson and Michael J. Karels' November 1988 paper "Congestion Avoidance and Control", a 25-page read for interested readers:

ee.lbl.gov/…

The traceroute and tcpdump tools we use so often are also Van Jacobson's work. As beneficiaries of the Internet age, we cannot help but feel admiration and respect for the pioneers, innovators, and change-makers who contributed so much to the Internet's early development.

0x02. Review of Traditional Congestion Control Algorithms

2.1 Algorithm Purpose

I once read an article arguing that TCP congestion control is not merely a computer networking concept but also belongs to the field of control theory, and I find that view reasonable.

The goals of TCP congestion control can be summarized as: compete fairly, make full use of network bandwidth, reduce network latency, and optimize the user experience. Achieving all of these at once, however, inevitably involves trade-offs.

That said, network infrastructure keeps improving rapidly, and I believe these goals can all be met at some point in the future. As the saying goes, only children make choices; adults want it all!

2.2 Algorithm Classification

Before digging into congestion control algorithms, we need to establish a core idea: every algorithm has its own area of strength. In my view this is an important point of consensus; it is bad practice to trample algorithm A into the dirt while praising algorithm B to the skies.

Real network environments are complex and change quickly, and no single congestion control algorithm can handle everything. Each algorithm has its own applicable scenarios and makes its own trade-offs among several key metrics: when they cannot all be satisfied, some algorithms favor bandwidth utilization, others favor communication latency, and so on.

With this consensus in mind, we should look at each congestion control algorithm with a calmer attitude instead of insisting that one is absolutely the best. The network conditions of a few decades ago were completely different from today's, and we always stand on the shoulders of giants; that, too, is what drives the progress of science and civilization.

Traditional congestion control was not achieved overnight; complex network environments and demanding users have driven its continuous optimization and iteration. Let's look at several iterations of traditional, loss-based congestion control algorithms, as shown in the figure:

There is also a class of algorithms that base their control decisions on RTT (delay). These algorithms tend not to be aggressive enough in their sending rate and compete less effectively than the others, so they receive an unfair share of network bandwidth, but their rate curves are very smooth. For the moment, let's think of this class of algorithms as the gentlemen of the family!

Among them, the well-known Vegas algorithm was proposed around 1995 by University of Arizona researchers Larry Peterson and Lawrence Brakmo. The new TCP congestion control algorithm was named after Las Vegas, the largest city in Nevada, and became known as TCP Vegas.

For details on the RTT-based TCP Vegas algorithm, refer to this document:

www.cs.cmu.edu/~srini/15-7…

The document compares the Vegas algorithm with New Reno. From the graph it is intuitively clear that Vegas is much smoother, while New Reno fluctuates widely in a jagged, sawtooth pattern, as shown in the figure:

There are in fact finer-grained classifications, but they are not today's focus, so we will not expand on them here. Loss-based algorithms remain the mainstream of congestion control today.

2.3 Algorithm Principles

The number of connections sharing network links is huge and constantly changing, and every connection faces the network as a black box. Unlike a road trip, where we can check a map to see where the traffic jam is, a connection cannot see congestion directly. To keep the network environment healthy, every connection must follow certain conventions.

If the endpoints sent packets without any restraint, the network links would quickly hit their bottleneck and data communication could no longer be guaranteed. Reaching a stable and efficient network environment therefore takes a great deal of care, and two important concepts are involved: fairness and convergence.

To my embarrassment, I searched through a lot of material online to understand the fairness and convergence of TCP congestion control but still could not find a good authoritative explanation, so I can only combine that material with my own understanding to explain what fairness and convergence mean here.

In my view, fairness is relative to all connections sharing the links along a network path. These connections start and end at different times, yet during their interaction each one has an equal opportunity to obtain bandwidth. Because the bandwidth is limited, the amount of data each side can send is dynamically adjusted and approximately converges to a certain value; in other words, the rate traces a sawtooth or a smoother wave-like curve. In loss-based congestion control, AIMD plays the key role in this regulation.

Let’s focus on the features of AIMD. First, let’s post a classic diagram to visualize the process of AIMD:

Here is Wikipedia's definition of AIMD:

The additive-increase/multiplicative-decrease(AIMD) algorithm is a feedback control algorithm best known for its use in TCP congestion control.


AIMD combines linear growth of the congestion window with an exponential reduction when congestion is detected.


Multiple flows using AIMD congestion control will eventually converge to use equal amounts of a shared link.


The related schemes of multiplicative-increase/multiplicative-decrease (MIMD) and additive-increase/additive-decrease (AIAD) do not reach stability.

A rough translation: the additive-increase/multiplicative-decrease algorithm is a feedback control algorithm best known for its use in TCP congestion control. AIMD grows the congestion window linearly and cuts it multiplicatively when congestion is detected. Ideally, multiple connections using AIMD eventually converge to equal shares of the network bandwidth. Neither the multiplicative-increase/multiplicative-decrease (MIMD) strategy nor the additive-increase/additive-decrease (AIAD) strategy can guarantee stability.

Compared with MIMD and AIAD: during the congestion avoidance phase, probing with an additive rather than a multiplicative increase is safer, while cutting the window multiplicatively to 1/2 when loss is detected relieves congestion much faster; decreasing only additively on loss would prolong the congestion. Overall, AIMD is a relatively simple and practical, engineering-grade form of feedback control with workable convergence properties, which is why it is so widely used.
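To make the AIMD rule concrete, here is a minimal sketch in Python (my own illustration, not any kernel code; the step size, decrease factor, and loss pattern are arbitrary) of how a loss-based sender might update its congestion window once per RTT:

```python
def aimd_update(cwnd, loss_detected, add_step=1.0, md_factor=0.5, min_cwnd=1.0):
    """One AIMD step per RTT: additive increase, multiplicative decrease.

    cwnd is measured in MSS-sized segments. On a loss event the window is
    halved (multiplicative decrease); otherwise it grows by one segment
    per RTT (additive increase, i.e. congestion avoidance).
    """
    if loss_detected:
        return max(min_cwnd, cwnd * md_factor)
    return cwnd + add_step


# Toy run: the window climbs linearly and is halved on loss,
# producing the classic sawtooth curve.
cwnd = 10.0
for rtt_round in range(1, 31):
    loss = (rtt_round % 10 == 0)   # pretend a loss is detected every 10 RTTs
    cwnd = aimd_update(cwnd, loss)
    print(f"round {rtt_round:2d}  cwnd = {cwnd:.1f}")
```

Running it prints the familiar sawtooth: the window climbs by one segment per round and is halved whenever a loss is (pretend-)detected.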

2.4 AIMD in Weak Network Environments

More than twenty years ago, in the early days of the Internet, almost all devices were connected over wired networks. This is an important reason why congestion control has worked so well since it was designed: wired networks are relatively stable, so treating packet loss as a sign of congestion was perfectly reasonable.

Since around 2010, however, the mobile Internet has boomed and the number of mobile terminals has become massive. Wireless access makes the network environment far more complex, so random, unstable packet loss is much more frequent, and such loss is not necessarily caused by congestion: a packet passes through many routers, switches, base stations, and other communication equipment, and an exception can occur at any hop.

In weak network environments, especially on the mobile Internet, the traditional AIMD-based strategy may drastically reduce throughput whenever a loss occurs, so bandwidth utilization drops sharply as well. In these situations a more aggressive control strategy can deliver better results and a better user experience.

When the loss is spurious rather than caused by congestion, AIMD-based congestion control is effectively being rate-limited for no good reason, simply because AIMD is a conservative strategy; seen that way, the problem is easy to understand.

As we all know, in a mobile network the terminal talks wirelessly to a nearby base station, the data then travels through the core network, and finally it lands on the wired network where the server sits. The wireless "last mile" is a high-latency segment, while the wired portion is a low-latency, high-bandwidth segment.

Experiments abroad have demonstrated how RTT changes affect throughput under traditional congestion control in weak network environments. The data and curves are shown in the figure:


What the experiment shows: a larger RTT affects the slow start and other phases of congestion control algorithms such as CUBIC. During slow start the congestion window cwnd doubles once per RTT, but a large RTT means the sender spends more time idle waiting for acknowledgments, so the window ramps up far more slowly and overall throughput drops quite dramatically.

Take a look at the experimenter’s original statement:

The delay before acknowledgment packets are received (= latency) will have an impact on how fast the TCP congestion window increases (hence the throughput).


When latency is high, it means that the sender spends more time idle (not sending any new packets), which reduces how fast throughput grows.
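As a back-of-the-envelope illustration of that quote (my own numbers, not from the experiment), the Python sketch below estimates how long exponential slow start needs to reach a given window for different RTTs, assuming the window simply doubles once per round trip:

```python
import math

def slow_start_time_ms(target_cwnd, init_cwnd=10, rtt_ms=40):
    """Rough model: cwnd doubles once per RTT during slow start, so reaching
    the target window takes about log2(target/initial) round trips."""
    rounds = math.ceil(math.log2(target_cwnd / init_cwnd))
    return rounds * rtt_ms

# Reaching a 10,000-segment window from 10 segments takes ~10 round trips:
# about 0.4 s at 40 ms RTT but about 3 s at 300 ms RTT, so high-RTT
# (weak/mobile) links spend far longer ramping up.
for rtt in (40, 100, 300):
    print(f"RTT {rtt:3d} ms -> ~{slow_start_time_ms(10_000, rtt_ms=rtt):5.0f} ms to ramp up")
```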

0x03. Introduction to the TCP BBR Algorithm

The BBR algorithm is an active, closed-loop feedback system: broadly speaking, it continuously and dynamically probes the bandwidth and the RTT to find an appropriate sending rate and amount of data in flight.

Here is Wikipedia's description of the BBR algorithm:

Related literature:
queue.acm.org/detail.cfm?…


TCP BBR (Bottleneck Bandwidth and Round-trip propagation time) is a congestion control algorithm designed at Google and released in 2016. While most congestion control algorithms rely on packet loss as the signal to lower the transmission rate, BBR is model-based and actively probes the network.


The algorithm creates an explicit model of the network using the maximum bandwidth and round trip time of the network’s most recent outbound data packet. Each cumulative or selective acknowledgment of the packet transmission is used to generate a sampling rate that records the amount of data transmitted during the time between the packet transmission and the acknowledgment return.


The algorithm argues that packet loss should not be considered as the primary determinant of congestion identification as the network interface controller moves into gigabit speed, so the model-based congestion control algorithm can have higher throughput and lower latency, and BBR can be used to replace other popular congestion algorithms such as CUBIC. When Google applied the algorithm to YouTube, it increased the throughput of the YouTube network by 4% on average worldwide, and more than 14% in some countries. BBR was later ported to version 4.9 of the Linux kernel and is available for QUIC.

3.1 Possible Problems with Loss-Based Feedback Strategies

Loss-based feedback is a passive mechanism: at root, these congestion control algorithms decide whether the network is congested, and shrink the window, based solely on whether a loss event has occurred. This can lead to several problems:

  • Treating loss as congestion. Real network environments are very complex and losses can also be caused by transmission errors; many algorithms cannot distinguish congestion loss from error loss.
  • Buffer bloat. Routers, switches, core network devices and so on all contain buffers to absorb bursts and smooth out traffic, much like a widened section of an infusion tube. Loss-based strategies, however, keep flooding data into the network until those buffers fill up, at which point RTT rises and packets are dropped. The buffer capacity should not really be counted as usable pipe capacity, yet the loss-based policy effectively bases its decisions on it, which is especially problematic on deep-buffered networks.
  • High network load without packet loss. Even when the network load is already high, as long as no loss occurs the algorithm never proactively shrinks its window to slow down. Although this makes full use of the bandwidth, the sender keeps growing its window, behaving aggressively and adding still more pressure to the network.
  • Sawtooth oscillation at high load. In the two high-load cases above the algorithm keeps growing its window, so a loss event is likely to appear soon; once it does, the window is cut multiplicatively, and the abrupt drop in sending rate under high load makes the whole network jitter, producing the large sawtooth fluctuations we see overall.
  • Low load, high latency, with loss. In some weak network environments the RTT rises and loss occurs even without congestion. A loss-based algorithm then shrinks its window anyway, so bandwidth utilization and throughput drop noticeably even though the network is not actually heavily loaded; in weak networks the result is therefore far from ideal.

3.2 Basic Principles of the TCP BBR Algorithm

We have just outlined the problems with loss-based algorithms. TCP BBR takes an active approach: it no longer treats packet loss as the congestion signal, and it no longer uses AIMD's additive-increase/multiplicative-decrease rule to maintain the congestion window. Instead it separately samples and estimates the maximum bandwidth and the minimum delay, and uses their product to size the send window. BBR also introduces a pacing rate that limits the sending rate and works together with cwnd to reduce bursts.

3.2.1 Terms

  • BDP. BDP is short for Bandwidth-Delay Product. Bandwidth is measured in bps (bits per second) and delay in seconds, so the BDP's unit is bits; it measures how much data a link holds over one round trip. Picture filling a pipe with water: bandwidth is the flow rate in m³/s, delay is the filling time in seconds, and their product is the amount of water currently inside the pipe. BDP is a key metric in the BBR algorithm; see the figure from Tao Hui's article, which also shows BDP calculations for several network scenarios (a small calculation of my own follows this list).
  • Long fat network. A network with a long round-trip time and high bandwidth is called a long fat network (or a long fat pipe). Its bandwidth-delay product is very large, so in theory it can carry very high throughput, which makes it a focus of research.
  • Pacing. The TCP pacing mechanism can be understood simply as spreading packet transmissions evenly over time during congestion control, so as to avoid bursts of data and reduce network jitter.
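To put numbers on the BDP and pacing terms above, here is a small Python sketch (my own illustration; the gain values and link profiles are only examples, not BBR's actual parameters) that computes the bandwidth-delay product for a couple of links and derives a BBR-style send window and pacing rate from it:

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product: bits held 'in the pipe', converted to bytes."""
    return bandwidth_bps * rtt_s / 8

def bbr_style_limits(bandwidth_bps, rtt_s, cwnd_gain=2.0, pacing_gain=1.0):
    """Sketch of BBR-style control variables: the send window is a gain times
    the estimated BDP, and the pacing rate is a gain times the estimated
    bottleneck bandwidth (the gain values here are illustrative only)."""
    bdp = bdp_bytes(bandwidth_bps, rtt_s)
    cwnd = cwnd_gain * bdp                      # bytes allowed in flight
    pacing_rate = pacing_gain * bandwidth_bps   # bits per second on the wire
    return bdp, cwnd, pacing_rate

# Example: a 100 Mbps / 40 ms link versus a "long fat" 1 Gbps / 100 ms path.
for bw_bps, rtt_s in ((100e6, 0.040), (1e9, 0.100)):
    bdp, cwnd, rate = bbr_style_limits(bw_bps, rtt_s)
    print(f"bw={bw_bps/1e6:6.0f} Mbps  rtt={rtt_s*1e3:5.0f} ms  "
          f"BDP={bdp/1e3:8.1f} KB  cwnd={cwnd/1e3:8.1f} KB  "
          f"pacing={rate/1e6:6.0f} Mbps")
```

A 100 Mbps / 40 ms link holds about 500 KB "in the pipe", while a long fat 1 Gbps / 100 ms path holds about 12.5 MB, which is why window sizing matters so much on such paths.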

3.2.2 Measurement of bandwidth and delay

Some of BBR's ideas had already appeared in earlier congestion control algorithms, the best known of which is TCP Westwood.

TCP Westwood is a refinement of New Reno. Unlike earlier algorithms that rely purely on loss measurements, Westwood estimates an appropriate sending rate by measuring the returning acknowledgment packets and uses that estimate to adjust the congestion window and the slow start threshold. It also improves the slow start phase with agile probing and adds a mechanism that keeps probing the congestion window, deciding when to re-enter agile probing so the link uses as much bandwidth as possible.

TCP Westwood is likewise designed around the product of bandwidth and delay, but maximum bandwidth and minimum delay cannot be measured at the same time because they are conflicting extremes: measuring the maximum bandwidth requires sending as much data as possible, which inflates the RTT, while measuring the minimum RTT requires keeping very little data in flight, under which condition the maximum bandwidth cannot be observed.

TCP BBR therefore measures the two metrics alternately, taking the maximum bandwidth and the minimum delay observed over a recent period as its estimates. The concrete implementation is not expanded here.
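The idea of "remember the maximum bandwidth and minimum RTT seen over a recent window of time" can be sketched roughly as below (a simplified Python illustration of the concept, not the kernel's actual windowed min/max filter; the window lengths are placeholders):

```python
from collections import deque
import time

class WindowedEstimator:
    """Keeps recent (timestamp, value) samples and reports the maximum
    bandwidth and minimum RTT observed within their respective windows."""

    def __init__(self, bw_window_s=10.0, rtt_window_s=10.0):
        self.bw_samples = deque()    # (timestamp, delivery_rate_bps)
        self.rtt_samples = deque()   # (timestamp, rtt_s)
        self.bw_window_s = bw_window_s
        self.rtt_window_s = rtt_window_s

    def on_ack(self, delivery_rate_bps, rtt_s, now=None):
        """Record one rate/RTT sample per acknowledgment and expire old ones."""
        now = time.monotonic() if now is None else now
        self.bw_samples.append((now, delivery_rate_bps))
        self.rtt_samples.append((now, rtt_s))
        while self.bw_samples and now - self.bw_samples[0][0] > self.bw_window_s:
            self.bw_samples.popleft()
        while self.rtt_samples and now - self.rtt_samples[0][0] > self.rtt_window_s:
            self.rtt_samples.popleft()

    def max_bw(self):
        return max(v for _, v in self.bw_samples) if self.bw_samples else 0.0

    def min_rtt(self):
        return min(v for _, v in self.rtt_samples) if self.rtt_samples else float("inf")

# Usage idea: feed each ACK's delivery-rate and RTT sample into the estimator,
# then size the window from est.max_bw() * est.min_rtt().
```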

3.2.3 Transmission rate and RTT curve

As mentioned above, the core of BBR is finding the optimal operating point around the BDP. The relevant paper gives a combined curve graph; let's take a look:


1. Basic elements of the figure: it combines two graphs, [RTT vs. amount of data in flight] and [delivery rate vs. amount of data in flight]. The horizontal axis is the amount of data in flight on the network.

The two vertical axes, from top to bottom, are the RTT and the delivery rate, and the whole range is divided into three regions: the application-limited region, the bandwidth-limited region, and the buffer-limited region.

2. Description of the curve:

  • App-limited region: the application has just begun to send data and the network is unobstructed, so the RTT stays essentially constant and small. The delivery rate equals the amount in flight divided by the RTT, so it grows linearly. In this region the bandwidth limit has not yet been reached and the RTT shows almost no growth.
  • Bandwidth-limited region: as the sending rate increases, more and more packets pile up in the link buffers; the RTT starts to rise while the delivery rate stops increasing, because the effective bandwidth has become the bottleneck.
  • Buffer-limited region: once the link buffers are full, packet loss begins. The point where the amount in flight equals BDP + BufferSize is the operating point that loss-based control strategies converge to (a toy model of the three regions is sketched right after this list).
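The three regions can be reproduced with a toy bottleneck model (my own simplification of the figure; the 10 Mbps bandwidth, 40 ms RTT, and 100 KB buffer are made-up parameters): below the BDP the delivery rate grows with the amount in flight while the RTT stays at the propagation delay; beyond the BDP the rate is pinned at the bottleneck bandwidth and extra in-flight data only sits in the buffer and inflates the RTT; once the buffer is full, loss begins.

```python
def bottleneck_model(inflight_bytes, btl_bw_bps=10e6, min_rtt_s=0.040,
                     buffer_bytes=100_000):
    """Toy model of the operating-point figure: returns (delivery_rate, rtt)
    for a given amount of data in flight over a single bottleneck link."""
    bdp = btl_bw_bps * min_rtt_s / 8                 # bytes the pipe itself holds
    if inflight_bytes <= bdp:                        # app-limited region
        rate = inflight_bytes * 8 / min_rtt_s
        rtt = min_rtt_s
    elif inflight_bytes <= bdp + buffer_bytes:       # bandwidth-limited region
        rate = btl_bw_bps
        queued = inflight_bytes - bdp
        rtt = min_rtt_s + queued * 8 / btl_bw_bps    # queueing delay grows
    else:                                            # buffer-limited: loss begins
        rate = btl_bw_bps
        rtt = min_rtt_s + buffer_bytes * 8 / btl_bw_bps
    return rate, rtt

# BBR aims to operate near inflight == BDP (max rate at min RTT), whereas a
# loss-based sender keeps pushing until inflight == BDP + buffer size.
for inflight in (10_000, 50_000, 100_000, 160_000):
    rate, rtt = bottleneck_model(inflight)
    print(f"inflight={inflight/1e3:6.0f} KB  rate={rate/1e6:5.1f} Mbps  rtt={rtt*1e3:6.1f} ms")
```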

3. Some ideas

The figure conveys several points, though some are not explained very clearly in the material. Combining the material with my own understanding: while the link buffers are still empty, the measured RTT is the minimum delay MinRTT; once the buffers have filled, the measured bandwidth MaxBW reaches its maximum (link bandwidth plus buffered data), but operating there is not optimal because the "water level" is too high. According to the material, the startup gain of 2/ln2 can leave roughly 3 BDP of data in flight. Over the whole process, MinRTT and MaxBW are probed separately, because they cannot be measured at the same time.

3.2.4 Main process of BBR algorithm

Like CUBIC, the BBR algorithm works through several phases: StartUp, Drain, Probe_BW, and Probe_RTT (a condensed sketch of the state machine follows the list below).

  • StartUp slow start phase. The difference from CUBIC is that BBR's slow start uses a gain of 2/ln2 to accelerate; even if loss occurs during this phase the rate is not reduced. Instead, BBR judges from the returning acknowledgments whether bandwidth is still growing, and when it stops growing, slow start ends and the next phase begins. Note that probing for the maximum bandwidth can leave up to an extra 2 BDP of data in flight.

To handle Internet link bandwidths spanning 12 orders of magnitude, Startup implements a binary search for BtlBw by using a gain of 2/ln2 to double the sending rate while delivery rate is increasing. This discovers BtlBw in log2BDP RTTs but creates up to 2BDP excess queue in the process.

  • Drain phase. The drain phase empties the extra (up to 2 BDP) data queued up at the end of slow start. The sending rate, i.e. the number of packets sent per unit time, is lowered until the amount of unacknowledged data falls below one BDP (equivalently, until the RTT stops falling), at which point the drain phase ends.
  • ProbeBW bandwidth probing phase. After slow start and drain, the sender enters a steady sending state. Since network bandwidth changes more often than RTT, ProbeBW is the phase in which BBR spends most of its time. It periodically raises the packet sending rate and keeps it if the acknowledgments show no adverse effect; when the measured bandwidth drops, it lowers the sending rate.
  • ProbeRTT delay probing phase. Any of the first three phases may fall into ProbeRTT while running. If the minimum delay estimate has not been refreshed within a set time, BBR sharply reduces the number of packets in flight to try to measure a smaller MinRTT; after probing it returns to slow start or ProbeBW, depending on the latest measurements.
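Putting the four phases together, here is a highly condensed Python sketch of the phase transitions (illustrative only: the gain constants follow the published BBR v1 description, but the transition conditions are simplified and the helper names are my own):

```python
import math

STARTUP_GAIN = 2 / math.log(2)                     # ~2.885, from the BBR paper
DRAIN_GAIN = math.log(2) / 2                       # inverse of the startup gain
PROBE_BW_GAINS = [1.25, 0.75, 1, 1, 1, 1, 1, 1]    # 8-step probing cycle

class BbrPhaseSketch:
    """Condensed sketch of BBR v1 phase transitions (not the kernel logic)."""

    def __init__(self):
        self.state = "STARTUP"
        self.cycle_index = 0
        self.full_bw = 0.0
        self.full_bw_rounds = 0

    def pacing_gain(self):
        if self.state == "STARTUP":
            return STARTUP_GAIN
        if self.state == "DRAIN":
            return DRAIN_GAIN
        if self.state == "PROBE_RTT":
            return 1.0
        return PROBE_BW_GAINS[self.cycle_index]

    def on_round(self, max_bw, inflight, bdp, min_rtt_expired):
        """Called once per round trip with the latest estimates."""
        if min_rtt_expired and self.state != "PROBE_RTT":
            # MinRTT estimate is stale: drop inflight data and re-probe it.
            self.state = "PROBE_RTT"
        elif self.state == "STARTUP":
            # Leave startup once bandwidth has stopped growing (~3 flat rounds).
            if max_bw >= self.full_bw * 1.25:
                self.full_bw, self.full_bw_rounds = max_bw, 0
            else:
                self.full_bw_rounds += 1
                if self.full_bw_rounds >= 3:
                    self.state = "DRAIN"
        elif self.state == "DRAIN" and inflight <= bdp:
            # The queue built during startup has drained; cruise and probe.
            self.state = "PROBE_BW"
        elif self.state == "PROBE_BW":
            self.cycle_index = (self.cycle_index + 1) % len(PROBE_BW_GAINS)
        elif self.state == "PROBE_RTT" and not min_rtt_expired:
            # After the RTT probe finishes, return to cruising in PROBE_BW.
            self.state = "PROBE_BW"
```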

Let’s look at a schematic diagram of these four processes:

Curve description: the two plots compare CUBIC and BBR on a 10 Mbps, 40 ms RTT network. In the upper plot, blue is the receiver, red is CUBIC, and green is BBR; the lower plot shows the RTT fluctuations over the same period, again with red for CUBIC and green for BBR.

0x04. Some Effects of the TCP BBR Algorithm

The BBR algorithm is no silver bullet, but we can first look at some of the results Google has reported from deploying it, covering throughput, RTT, and packet loss rate:

The figures show that after BBR was deployed on YouTube, throughput rose by about 4% overall and by as much as 14% in Japan; RTT fell by 33% on average, and by more than 50% in IN (presumably India). In the loss-rate test BBR is far less sensitive than CUBIC: its throughput only starts to drop significantly once the loss rate reaches about 5%.

0x05. Summary

This article first reviewed the traditional congestion control algorithms, represented by CUBIC, and then gave an introduction to the BBR algorithm.

There are many articles about BBR online, and I have tried to digest and summarize a number of Chinese and foreign sources. Given the limits of my experience and ability, the text above may well contain mistakes, for which I apologize; many details have also been left out, so this can only count as a casual overview.

0x06. Shoulders of giants

TCP congestion control and BBR principle analysis - Cloud+ Community - Tencent Cloud

www.cnblogs.com/codingMozar…

CSDN - professional IT technology community

my.oschina.net/piorcn/blog…

my.oschina.net/piorcn/blog…

my.oschina.net/piorcn/blog…

accedian.com/enterprises…

Why was TCP replaced by UDP

What advantages does the BBR algorithm in Linux Kernel 4.9 have over previous TCP congestion control?

cloud.google.com/blog/produc…

Pain points and solutions for long fat pipe transmission - Cloud+ Community - Tencent Cloud

queue.acm.org/detail.cfm?…

Introduction to TCP BBR - Nuggets

NetEase Yunxin: BBR applications in real-time audio and video

0x07. About Me