I. Foreword

Network optimization has three core problems to solve. The first is security, which we explained in detail in Series 1, DNS optimization. The second is speed, which we covered in detail in Series 2, connection optimization. The third is the weak network problem, the most complex part of network optimization, which requires repeated verification and analysis. Series 3, weak network optimization, explores this problem in depth.

II. Background

Weak network optimization needs to solve two core problems:

[1] Given how complex the mobile network environment is, how can we determine that the current environment is a weak network?

[2] In a weak network environment, how can we improve the request success rate and reduce latency, and thereby improve the user's network experience?

Baidu App carries traffic on the scale of hundreds of millions, and weak networks account for 0.95% of it, which is not a small share. How is this proportion obtained? To answer that, we first need to look at the indicators used to judge a weak network.

III. Indicators for judging a weak network

First, we discuss the indicators that affect network quality: HTTPRTT, TCPRTT, throughput, signal strength, and bandwidth-delay product.

1. HTTPRTT

HTTPRTT (HTTP Round-Trip Time), also known as TTFB (Time to First Byte), is the time between the client sending the first byte of a request and receiving the first byte of the HTTP response header. A long HTTPRTT indicates poor access network quality on the client side or high service latency.

2. TCPRTT

TCPRTT (TCP Round-Trip Time) is the time between the client sending the first byte and receiving the first byte on a TCP connection. Because HTTP runs on top of TCP, HTTPRTT includes TCPRTT when the same TCP connection is reused. In most cases HTTPRTT alone is enough to explain the cause of a problem.

3. Throughput

Throughput measures the amount of data successfully transmitted per unit of time and serves as an objective indicator of network quality. Throughput = (bits at end - bits at start) / (end time - start time). Note that the POSIX socket read function returns a byte count, so it must be multiplied by 8 to get bits. Typically, when HTTPRTT is small but the network is still slow, throughput can be used to judge network quality.
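The formula above can be sketched in a few lines. This is a minimal illustration rather than Baidu App's implementation; the function name and parameters are hypothetical, and the byte counters are assumed to come from whatever layer observes socket reads.

```python
def throughput_bps(start_bytes: int, end_bytes: int,
                   start_time_s: float, end_time_s: float) -> float:
    """Return throughput in bits per second over one observation window."""
    elapsed = end_time_s - start_time_s
    if elapsed <= 0:
        raise ValueError("observation window must have positive duration")
    # Socket reads report bytes, so convert to bits before dividing by time.
    return (end_bytes - start_bytes) * 8 / elapsed
```

For example, 125,000 bytes received in one second corresponds to 1,000,000 bits per second.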

4. Signal strength

Signal strength refers to the strength of the wireless signal. On Android it can be obtained through PhoneStateListener's onSignalStrengthsChanged method, but note that this only takes effect on Android M or higher. There is currently no reliable implementation on iOS.

5. Bandwidth-delay product

The bandwidth-delay product (BDP) is the throughput of a data link multiplied by its round-trip delay (RTT). The result is a quantity of data measured in bits, not a rate, and it reflects the maximum capacity of the current network pipe. TCP has the concept of a window size, which limits how much data can be sent and received, so TCP window tuning is directly driven by the bandwidth-delay product. Based on its value, the socket's setsockopt method is called with the options SO_RCVBUF (receive buffer size) and SO_SNDBUF (send buffer size).
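As a rough sketch of how a bandwidth-delay product might be turned into buffer sizes, the helper names below are hypothetical, and using the BDP directly as the buffer size is an assumption made for illustration. The operating system may clamp or adjust the values it is given.

```python
import socket


def bdp_bytes(throughput_bps: float, rtt_s: float) -> int:
    """Bandwidth-delay product: bits in flight, converted to whole bytes."""
    return int(throughput_bps * rtt_s / 8)


def apply_buffer_sizes(sock: socket.socket, size_bytes: int) -> None:
    """Request receive and send buffers of the given size (hypothetical helper)."""
    # The kernel is free to round, double, or cap the requested sizes.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size_bytes)
```

With a throughput of 8 Mbit/s and an RTT of 250 ms, the pipe holds 2,000,000 bits, which is 250,000 bytes.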

With the above, we have a basic understanding of the indicators that affect network quality. These indicators apply to any product, but the threshold for each one necessarily differs. One reason is the business scenario: Douyin mainly transmits video, WeChat mainly transmits data over long-lived connections, and Baidu mainly transmits text and images. Another is server configuration: different product lines have different service cluster capabilities, so the time the server takes to respond to the client differs. The weak network indicators are therefore basically the same across products, but their values are not.

IV. How to establish weak network standards

Establishing weak network standards is a gradual process. How should we establish them when starting from nothing? The answer has three stages.



Figure: Steps to establish weak network standards

1. Stage 1: offline testing. Using tools such as Apple's Network Link Conditioner and Facebook's ATC (Augmented Traffic Control), we obtain threshold values under different simulated networks offline. We generally test App cold-start scenarios, network switching scenarios, DNS failure scenarios, and weak network scenarios (usually by configuring bandwidth, packet loss rate, delay, and DNS delay, or more simply by using the tool's default weak network profiles).

2. Stage 2: online verification. Using the thresholds obtained from offline testing, we measure the proportion of weak networks online. Baidu App does this for specific scenarios, such as Feed refresh and opening a search landing page. Even WeChat, widely regarded as delivering a good experience in the mobile Internet era, has optimized only message transmission (sending and receiving messages) to perfection. Collecting weak network data per scenario is therefore important.

3. Stage 3: online trial and error. To achieve the desired weak network behavior, the thresholds must be adjusted repeatedly online. For each adjustment, we compare the success rate, time consumption, connection reuse rate, and other metrics of the scenario's network requests, and so arrive at a reasonable threshold for that scenario.

Having said all this, how do we actually detect a weak network?

V. Overall architecture and implementation of network detection

Network detection is the basis of weak network detection. Whether network quality can be detected promptly and correctly is the first problem we need to solve. We divide network detection into two parts: active network detection and passive network collection.

1. Active network detection

Active detection means that, once certain conditions are triggered, the client actively probes the network and checks against certain criteria whether it is in a weak network state. Baidu App has developed an active detection component, as shown in the figure below.



Figure: Active network detection

1.1 Strategy layer

The detection strategy layer combines multiple strategies to greatly improve the timeliness and accuracy of active detection. We explain the meaning of each detection dimension with reference to the strategy layer in the figure above.

Weak network detection logic is triggered both when a network request succeeds and when it fails. There are three cases.

1) On success, how do we judge a weak network state? We check the weak HTTPRTT threshold, whose value depends on the business (generally the 95th percentile or higher of HTTPRTT for requests in the specific scenario). If HTTPRTT is greater than this threshold, weak network detection is started. To prevent detection from triggering too often, a time interval condition is added. In offline simulation tests, whenever HTTPRTT exceeded this threshold, the detection result was always a weak network state.

2) On success, how do we judge that the weak network state has ended? We check the good HTTPRTT threshold, whose value also depends on the business (generally the 95th percentile or higher of HTTPRTT across the whole network). If HTTPRTT is less than this threshold, the state switches back to normal. In offline simulation tests, whenever HTTPRTT was below this threshold, the detection result was always normal. If HTTPRTT is greater than or equal to the threshold, that alone does not prove the network is abnormal, so a network detection is still needed. Because this happens in the success callback, the frequency would be very high, so we add a 30-second time interval limit and also require that the number of consecutive successes % the count threshold (4) equals 0. That still turned out to be somewhat frequent, so we introduced a stepped increase in which the interval grows in multiples of 60 seconds as the number of detections increases.

3) On failure, how do we judge a weak network state? We first check the number of consecutive failures: detection is triggered when consecutive failures / the count threshold (2) equals 1 and consecutive failures % the count threshold (2) equals 0. The failure condition is stricter than the success condition, mainly to limit the performance cost of triggering network detection repeatedly.

After the weak network state is entered, the ping and DNS query probes in the basic capability layer are triggered.
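The three trigger rules can be read as a small decision function. The sketch below is one interpretation of the text, not Baidu App's code: all names are hypothetical, the 30-second interval is assumed to apply to rule 1 as well (the text gives no value there), and the stepped growth of the interval in rule 2 is left out because its exact formula is not stated.

```python
def probe_on_failure(consecutive_failures: int, threshold: int = 2) -> bool:
    """Rule 3: probe when failures // threshold == 1 and failures % threshold == 0."""
    return (consecutive_failures // threshold == 1
            and consecutive_failures % threshold == 0)


def action_on_success(http_rtt_ms: float, in_weak_state: bool,
                      weak_rtt_ms: float, good_rtt_ms: float,
                      consecutive_successes: int, seconds_since_probe: float,
                      min_interval_s: float = 30.0,
                      count_threshold: int = 4) -> str:
    """Return "probe", "mark_normal", or "none" for a successful request."""
    if not in_weak_state:
        # Rule 1: a slow success starts detection, subject to the interval limit.
        if http_rtt_ms > weak_rtt_ms and seconds_since_probe >= min_interval_s:
            return "probe"
        return "none"
    # Rule 2: a fast success ends the weak state immediately.
    if http_rtt_ms < good_rtt_ms:
        return "mark_normal"
    # Otherwise re-probe, limited by time interval and success count.
    if (seconds_since_probe >= min_interval_s
            and consecutive_successes % count_threshold == 0):
        return "probe"
    return "none"
```

With a threshold of 2, the failure rule fires on exactly the second consecutive failure and not again on the fourth, which is what makes it stricter than the success path.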

1.2 Basic capability layer

The basic capability layer provides the means of weak network detection, namely DNS query and ping. Baidu App implements both in C++. Why these two? As described in Series 2, a network request is divided into four stages: DNS, TCP, TLS, and data transfer. DNS query and ping are used to check connectivity at the DNS and TCP stages. The DNS probe issues a DNS query for Baidu's core domain name, mbd.baidu.com, against the DNS servers configured by the system. (iOS uses the res_ninit function to construct a __res_state structure, from which the system-configured DNS servers are read; Android reads net.dns1 and net.dns2 from the system properties.) The DNS query timeout is 3 s. The ping target is the IP address of mbd.baidu.com, and the ping is performed twice, each with a default timeout of 1 s.
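To make the DNS probe concrete, here is a minimal sketch of a single A-record query over UDP with the 3-second timeout mentioned above. Baidu App implements this in C++; the Python version below only illustrates the wire format, the function names are hypothetical, and the DNS server address is assumed to be supplied by the caller.

```python
import socket
import struct


def encode_qname(hostname: str) -> bytes:
    """Encode a hostname as length-prefixed DNS labels ending in a zero byte."""
    labels = hostname.rstrip(".").split(".")
    return b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a standard query for one A record with recursion desired."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    question = encode_qname(hostname) + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def dns_probe(hostname: str, dns_server: str, timeout_s: float = 3.0,
              query_id: int = 0x1234) -> bool:
    """Return True if the server answers the query with RCODE 0 before the timeout."""
    query = build_dns_query(hostname, query_id)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        try:
            sock.sendto(query, (dns_server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # Includes timeouts and send/receive errors.
            return False
    if len(reply) < 12:
        return False
    reply_id, flags = struct.unpack("!HH", reply[:4])
    return reply_id == query_id and (flags & 0x000F) == 0
```

A ping probe needs a raw or datagram ICMP socket, which usually requires extra privileges, so it is not sketched here.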

After the weak network status is determined, the result is provided to the interface layer.

1.3 Interface layer

The interface layer mainly provides actively detected network status, which currently includes GOOD, BAD, UNKNOWN and OFFLINE.

1) GOOD: the DNS query succeeds and the ping succeeds.

2) BAD: the ping fails once.

3) UNKNOWN: the initial state, or a state that cannot be determined.

4) OFFLINE: set when any of the following occurs: DNS server error (failed to obtain the address of the DNS server to query), gateway error (failed to read the contents of /proc/net/route), DNS send error (failed to send the DNS data), ping read/write error (a read or write error during ping), DNS receive error (failed to receive the DNS data), ping address error (the ping address is empty), DNS unknown domain name error (the DNS query cannot resolve the domain name), ICMP initialization error (ICMP initialization failed), or DNS UDP error (failed to create the UDP socket).
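The four states can be expressed as a small classifier. This is a sketch under assumptions: the names are hypothetical, all the OFFLINE error conditions are collapsed into a single fatal_error flag, and a DNS query that fails without one of the listed errors is mapped to UNKNOWN because the text does not say how that case is handled.

```python
from enum import Enum


class NetState(Enum):
    GOOD = "good"
    BAD = "bad"
    UNKNOWN = "unknown"
    OFFLINE = "offline"


def classify_active(dns_ok, ping_ok, fatal_error=False):
    """Map probe results to a state.

    dns_ok: True/False, or None if no DNS probe has run yet.
    ping_ok: list of booleans, one per ping attempt (two in the text above).
    fatal_error: True if any of the listed OFFLINE errors occurred.
    """
    if fatal_error:
        return NetState.OFFLINE
    if dns_ok is None or not ping_ok:
        return NetState.UNKNOWN  # Initial state: nothing measured yet.
    if not all(ping_ok):
        return NetState.BAD      # At least one ping failed.
    if dns_ok:
        return NetState.GOOD
    return NetState.UNKNOWN      # Assumption: case not specified in the text.
```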

2. Passive network collection

Passive collection means recording the details of every network request and reporting the raw information under certain conditions. The upper layer then decides, based on its own conditions, whether the network is in a weak state. Baidu App has done secondary custom development on top of Cronet's NQE (Network Quality Estimator).

First, we explain the data to be collected, which includes TCPRTT, HTTPRTT, and throughput, as shown in the figure below.



Figure: Passive collection data

1) TCPRTT is obtained through the POSIX and Windows socket programming interfaces. It is sampled when a connection completes, when a read completes, and when a write completes.

2) HTTPRTT is obtained from the HTTP protocol stack by calculating the time difference between the start of sending the request and the start of receiving the response data. It is sampled when the read of the first packet completes.

3) Throughput requires bytes and time, per the formula given earlier. Bytes are obtained through the POSIX and Windows socket programming interfaces: bytes received are recorded when a read completes, and bytes sent are recorded when a write completes. Time is captured in the throughput management module, described below, when a request completes and when a request is destroyed.

The overall architecture diagram of passive network collection is shown below.



Figure: Passive network collection

1.1 Capability layer

As mentioned above, the capability layer collects data in three dimensions: TCPRTT, HTTPRTT, and throughput.

1.2 Strategy layer

The passive collection strategy layer combines multiple strategies to reduce how often the collected data is reported and so reduce the performance impact.

1) The socket management module is responsible for obtaining TCPRTT. How? It calls getsockopt to read the tcpi_rtt field of the tcp_info structure. Because TCPRTT would otherwise be reported very frequently, reporting is limited to once per second.

2) The throughput management module is responsible for calculating throughput using the formula described earlier. It obtains bytes from the network activity monitor module, and because throughput is measured in bits, multiplies bytes by 8. Only GET requests are counted, and calculation begins only after at least five requests have accumulated. Hosts that would distort the measurement are excluded, such as localhost, hosts on private subnets, and hosts on special-purpose subnets; see RFC 1918.

3) The network quality management module obtains TCPRTT from the socket management module, throughput from the throughput management module, and HTTPRTT when the HTTP protocol stack finishes reading the first packet. Once it has these three values, several policies limit how often they are reported: reports are at least 10 seconds apart; the network type must not be UNKNOWN (explained in item 3 of 1.3); the network must not be switching frequently; and the RTT and throughput sample sets are each capped at 300 entries.
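The three modules above can be sketched together. This is an illustration under assumptions, not Cronet's or Baidu App's code: every name is hypothetical, the tcpi_rtt offset of 68 bytes assumes the Linux struct tcp_info layout (eight one-byte fields followed by fifteen 32-bit fields), and the rule about frequent network switching is omitted because the text gives no criterion for it.

```python
import ipaddress
import socket
import struct
from collections import deque

TCPI_RTT_OFFSET = 68  # Assumed Linux layout; tcpi_rtt is in microseconds.


def parse_tcpi_rtt_us(raw: bytes) -> int:
    """Extract tcpi_rtt from a raw tcp_info buffer."""
    if len(raw) < TCPI_RTT_OFFSET + 4:
        raise ValueError("tcp_info buffer too short")
    (rtt_us,) = struct.unpack_from("=I", raw, TCPI_RTT_OFFSET)
    return rtt_us


def read_tcp_rtt_us(sock: socket.socket) -> int:
    """Read the smoothed RTT of a connected TCP socket (Linux only)."""
    raw = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 104)
    return parse_tcpi_rtt_us(raw)


def is_excluded_host(host: str) -> bool:
    """True for localhost and for private or special-purpose IP literals."""
    if host.lower() == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # A hostname; a real check would use the resolved address.
    return (addr.is_loopback or addr.is_private
            or addr.is_link_local or addr.is_reserved)


def counts_toward_throughput(method: str, host: str, requests_seen: int,
                             min_requests: int = 5) -> bool:
    """Only GET requests to public hosts count, once enough requests exist."""
    return (method.upper() == "GET" and requests_seen >= min_requests
            and not is_excluded_host(host))


class QualityReporter:
    """Caps sample history at 300 entries and spaces reports 10 s apart."""

    def __init__(self, min_interval_s: float = 10.0, max_samples: int = 300):
        self.rtt_samples = deque(maxlen=max_samples)
        self.throughput_samples = deque(maxlen=max_samples)
        self.min_interval_s = min_interval_s
        self._last_report_s = None

    def add_sample(self, rtt_ms: float, throughput_bps: float) -> None:
        self.rtt_samples.append(rtt_ms)
        self.throughput_samples.append(throughput_bps)

    def should_report(self, now_s: float, network_type: str) -> bool:
        if network_type == "UNKNOWN":
            return False
        if (self._last_report_s is not None
                and now_s - self._last_report_s < self.min_interval_s):
            return False
        self._last_report_s = now_s
        return True
```

Passing the clock value into should_report, rather than reading it inside, keeps the throttle easy to exercise in tests.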

1.3 Interface layer

The interface layer mainly provides passively collected network status, including GOOD, BAD, UNKNOWN, and OFFLINE.

1) GOOD: the network is classified as 3G or as 4G in the broad sense; meeting either condition marks the state as GOOD. The two are separated by a threshold: HTTPRTT greater than or equal to 273 ms and TCPRTT greater than or equal to 204 ms is marked as 3G. Anything below that is marked as broad 4G, which covers 4G, WiFi, and better-quality access networks.

2) BAD: the network is classified as slow 2G or 2G, or HTTPRTT is greater than 1.31 seconds; meeting any one of these conditions marks the state as BAD. HTTPRTT greater than or equal to 2.01 seconds and TCPRTT greater than or equal to 1.87 seconds is marked as slow 2G. HTTPRTT greater than or equal to 1.42 seconds and TCPRTT greater than or equal to 1.28 seconds is marked as 2G. The HTTPRTT > 1.31 seconds condition is the threshold for Baidu App Feed refresh.

Note: the time values above come from NQE's internal mechanism and are generally applicable.

3) UNKNOWN: HTTPRTT, TCPRTT, or throughput is invalid; any one of them being invalid marks the state as UNKNOWN. What counts as invalid? A value of -1. When is a value set to -1? First, values are set to -1 at initialization. Second, if no HTTPRTT, TCPRTT, or throughput value has been obtained, local default values are used as the basis for judgment, as a form of fault tolerance.

4) OFFLINE: on Android, NetworkInfo is obtained from ConnectivityManager and its isConnected method is called; if it returns false, the state is OFFLINE. If NetworkInfo is null, the state is also OFFLINE.
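The thresholds in this list translate directly into a classifier. The sketch below follows the wording of the text (both HTTPRTT and TCPRTT must reach a tier's thresholds); the function name and parameters are hypothetical, and the connected flag stands in for the platform connectivity check.

```python
def classify_passive(http_rtt_ms: float, tcp_rtt_ms: float,
                     throughput_kbps: float, connected: bool = True,
                     feed_bad_http_rtt_ms: float = 1310) -> str:
    """Return "GOOD", "BAD", "UNKNOWN", or "OFFLINE" from passive samples."""
    if not connected:
        return "OFFLINE"
    if -1 in (http_rtt_ms, tcp_rtt_ms, throughput_kbps):
        return "UNKNOWN"  # -1 marks a value that was never measured.
    # Check tiers from slowest to fastest.
    if http_rtt_ms >= 2010 and tcp_rtt_ms >= 1870:
        tier = "SLOW_2G"
    elif http_rtt_ms >= 1420 and tcp_rtt_ms >= 1280:
        tier = "2G"
    elif http_rtt_ms >= 273 and tcp_rtt_ms >= 204:
        tier = "3G"
    else:
        tier = "4G"
    if tier in ("SLOW_2G", "2G") or http_rtt_ms > feed_bad_http_rtt_ms:
        return "BAD"
    return "GOOD"
```

Keeping the Feed refresh threshold as a parameter reflects the point made earlier that each scenario needs its own value.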

VI. How Baidu App improves user experience in a weak network state



Figure: Baidu App's measures in a weak network


1. Best practices for QUIC in Baidu App under weak networks

QUIC (Quick UDP Internet Connections) is a new-generation Internet transport protocol that originated at Google. For details, see [3]; this chapter does not give an introductory overview of QUIC.

In a weak network state, ordinary network requests in Baidu App switch to QUIC. This chapter focuses on the problems Baidu App encountered after enabling QUIC in a weak network state. First, can QUIC be rolled back if problems occur? Second, how can as much traffic as possible go over QUIC in a weak network? Our answers to these two questions are the QUIC upgrade and downgrade mechanism and QUIC preconnection.

1.1 QUIC upgrade and downgrade mechanism

As shown in the QUIC part of the figure above, upgrading to and downgrading from QUIC relies on HTTP Alternative Services, an HTTP-related mechanism that is not specific to QUIC. It lets a server announce an alternative service that the client can use in place of the original one. For HTTP/1.1, the announcement is returned in an HTTP response header, so it can only take effect from the second request onward. The format is as follows.

Alt-Svc: quic="alt.example.com:443", quic=":443"; ma=2592000

This header tells the client to switch to the QUIC protocol, specifies the host and port of the alternative service, and gives the validity period (ma) in seconds.

Alt-Svc: clear

This header clears the alternative service configuration.

In the network library, the alternative connection and the original connection race each other. If alternative service information already exists, the alternative connection is started first and the original connection is delayed; the default delay is 300 ms. The number of failures of the alternative service is recorded, and an expiration time is calculated from it that grows exponentially with the number of failures, up to 2 days. The failure record is cleared when the expiration time passes or when the alternative connection completes a QUIC handshake successfully.
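The exponential expiration can be sketched as follows. The 2-day cap comes from the text; the 300-second initial delay and the doubling factor are assumptions chosen for illustration, and the function name is hypothetical.

```python
MAX_EXPIRATION_S = 2 * 24 * 60 * 60  # 2-day cap from the text above.


def broken_expiration_s(failure_count: int, initial_s: int = 300) -> int:
    """Seconds to keep an alternative service marked as failed.

    initial_s is an assumed starting delay; it doubles per failure up to the cap.
    """
    if failure_count < 1:
        raise ValueError("failure_count must be at least 1")
    # Bound the exponent so very large counts do not build huge integers.
    delay = initial_s * 2 ** min(failure_count - 1, 20)
    return min(delay, MAX_EXPIRATION_S)
```

Under these assumptions the delay goes 300 s, 600 s, 1200 s, and so on, reaching the 2-day ceiling after about ten failures.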

1.2 QUIC preconnection

QUIC preconnection means establishing the QUIC connection before the weak network state is entered. The client sends a Client Hello to the server, and the server responds with a Server Reject message that contains the Server Config. With the Server Config, the client can compute the key directly and complete 0-RTT. For details, see [4].

It follows that the probability of the client successfully fetching the Server Config directly affects how much traffic goes over QUIC in a weak network. We therefore make a QUIC preconnection during App startup to fetch the Server Config. After the weak network state is entered, the alternative connection then has a high probability of winning the race against the original connection, and the request goes over QUIC.

2. Best practices for composite connection in Baidu App under weak networks

The principle of composite connection is described in Series 2, connection optimization, of the Baidu App network in-depth optimization series. In a weak network, Baidu App currently enables composite connection only for image requests, because both the HTTPDNS result and the local DNS result for image requests contain multiple IP addresses, which is a prerequisite for composite connection. In a weak network, multiple IP addresses give better results than a single one. In addition, weak networks account for a relatively small share of traffic, so the extra load that composite connection places on the server is small.

VII. Overall architecture of Baidu App network



Figure: Overall architecture of Baidu App network

The overall architecture uses the network facade as a middle layer that isolates the best practices in the upper layer from the basic network library in the lower layer.

1.1 Best Practices

Part of the client network library's work is making these best practices work better. For audio and video, AVPlayer on iOS and ijkplayer on both platforms use the HTTPDNS component to take over only the DNS module, not the entire network module. ReactNative's network module RCTNetworking, the image libraries Fresco on Android and SDWebImage on iOS, the WebView components Chromium on Android and WKWebView on iOS, and Baidu App's own business code are all taken over directly through the interface layer of the network facade. Third-party services, given their coupling with the host App, directly use the system-standard interfaces, HttpURLConnection on Android and URLSession on iOS.

1.2 Network facade

The network facade mainly consists of the interceptor module (a mechanism for customizing the network facade), the concurrent queue module (network request priorities of low, medium, high, and very high), the network detection component (active weak network detection), the network diagnosis module (HTTPS, ping, and DNS checks), the HTTPDNS component (described in detail in Series 1, DNS optimization), the network monitoring module (client-side event logging plus routine and burst monitoring on the server side), the HttpURLConnection wrapper layer, and the URLSession wrapper layer.

1.3 Basic network library

The basic network library consists of two parts: the unified network library, a secondary customization of Cronet, and the WebSocket base libraries (JavaWebSocket on Android, SocketRocket on iOS). The unified network library contains the connection optimizations (explained in detail in Series 2, connection optimization) and the weak network optimizations (the passive collection described above). The underlying protocol stack is injected via AOP into HttpURLConnection (using URLStreamHandlerFactory) and URLSession (using the protocolClasses property of URLSessionConfiguration), both of which are URL loading mechanisms provided by the system.

VIII. Benefits

The benefits of weak network optimization come mainly from the measures taken after entering the weak network state: enabling QUIC, QUIC preconnection, and enabling composite connection.

1) After QUIC is enabled in weak networks, the network connection success rate increases by 0.01% and the average time consumption decreases by 23.5%.

2) After QUIC preconnection is enabled in weak networks, the PV of the QUIC protocol increases from 370,000 to 900,000.

3) After composite connection is enabled in weak networks, time consumption decreases by 2.5% in the BAD state and 7.7% in the OFFLINE state.

IX. Epilogue

With this article, Series 1 through Series 3 are complete. I hope they are helpful in your work and study. Thank you for your continued attention and encouragement. Life goes on and optimization never stops; when it comes to technology, we are serious.

X. References

1. https://chromium.googlesource.com/chromium/src/+/HEAD/docs/android_build_instructions.md

2. https://chromium.googlesource.com/chromium/src/+/HEAD/docs/ios/build_instructions.md

3. https://www.wolfcstech.com/2019/03/27/quic_2019_03_27/

4. https://www.wolfcstech.com/2017/03/09/QUIC%E5%8A%A0%E5%AF%86%E5%8D%8F%E8%AE%AE/

5. https://tools.ietf.org/html/rfc1918

6. https://github.com/Tencent/mars
