With the spread of fiber-to-the-home and the continuous improvement of computer performance, viewers' expectations for live streaming keep rising. Although HLS, a widely used streaming protocol, already serves audio and video on both PC and mobile, it still has shortcomings in practice. We invited Mr. Jiang Jun from the Live Streaming Technology Department of Bilibili to introduce HLS-based live P2P, the challenges encountered during its development, and the plans for the future.

Text / Jiang Jun

Organized by / LiveVideoStack

Hello everyone, I am Jiang Jun from the Live Streaming Technology Department of Bilibili. Today I will mainly talk about P2P based on HLS. HLS is a relatively old technology; its full name is HTTP Live Streaming, which literally means live streaming over HTTP. When building out a service or network, HLS can make full use of the existing CDN infrastructure for static files, so deployment is relatively convenient. Its disadvantages are large latency and slow first-frame loading, and we use various techniques to mitigate these problems when deploying the service.

The first part introduces HLS itself, and the second part introduces P2P built on top of HLS. Because HLS transfers content as a sequence of short files, slice by slice, it is relatively easy to develop P2P on top of it.

Today's sharing covers six aspects: introduction, HLS live streaming, optimization of HLS live streaming, P2P live streaming, challenges of P2P live streaming, and decentralized collaboration.

# 01 Introduction

First, the introduction.

At present, users have an increasingly strong demand for HD live streaming. Better computer performance means that videos which once stuttered now play smoothly, so content platforms can offer higher-definition video to make full use of the hardware and deliver a better viewing experience. Higher picture quality, in turn, places higher demands on codec performance and bandwidth cost: the clearer the picture and sound, the more data has to be transmitted. Network bandwidth has always been expensive, and serving high-quality content to every user makes that cost very high.

Bilibili live streaming now combines HLS transmission with P2P, which improves the utilization of the server's bandwidth: where doubling the bandwidth used to double the number of viewers we could serve, doubling the bandwidth now lets us serve two or even three times as many viewers, so the same bandwidth budget serves more people.

Bilibili now achieves a saving rate of about 70% with a share rate of only 100%, and this performance holds roughly regardless of the number of users.

# 02 HLS

First of all, I would like to introduce the first half – HLS.

2.1. What is HLS — HTTP Live Streaming

The full name of HLS is HTTP Live Streaming. There are many versions of HLS. The one you see most often is the older version: an M3U8 file that lists many small TS files. M3U8 will look familiar to anyone who used computers in an earlier era, when music players kept playlists in the ".m3u" format. M3U8 works the same way. As a playlist, it gains more and more entries over time, because the total length of the content keeps growing as the live stream continues. As new entries are added, old entries are removed, just like a queue. The HLS playlist always lists the files covering the most recent short period of time, so the client keeps polling the M3U8 file for changes and downloads each small segment to play.
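
As a rough sketch of this polling loop, assuming a hypothetical playlist URL, a one-second poll interval and a player-feeding function that is not shown here:

```typescript
// Minimal sketch of an HLS live client: poll the playlist, fetch new segments.
const PLAYLIST_URL = "https://example.com/live/stream.m3u8"; // hypothetical

const downloaded = new Set<string>();

async function pollPlaylist(): Promise<void> {
  const text = await (await fetch(PLAYLIST_URL)).text();
  // Keep only segment URIs (lines that are not tags or blanks).
  const segments = text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0 && !line.startsWith("#"));

  for (const uri of segments) {
    if (downloaded.has(uri)) continue; // already fetched on an earlier poll
    downloaded.add(uri);
    const data = await (await fetch(new URL(uri, PLAYLIST_URL))).arrayBuffer();
    feedToPlayer(data); // hand the segment to the player (not shown)
  }
}

function feedToPlayer(_segment: ArrayBuffer): void {
  // e.g. append to a Media Source Extensions SourceBuffer
}

// The client re-polls the playlist roughly once per segment duration.
setInterval(() => void pollPlaylist(), 1000);
```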

The traditional MP4 file structure is limiting: all the indexes are stored together and all the media data is stored together. Both are required for playback, but you cannot get the complete index before the complete data exists. So although a traditional MP4 file can be played back, it cannot be used for live streaming. To stream live, the MP4 has to be fragmented (into groups of, say, 0.5 s or 1 s), with one side constantly pushing fragments to the server and the server delivering them to users. HLS v7 splits these data blocks into small files, puts them in the playlist, and keeps updating the M3U8 playlist.

The image on the left shows a traditional MP4. Unlike a fragmented MP4, a traditional MP4 file has one large index and one large block of data. A fragmented MP4 file has an index and a corresponding data block for every small fragment. Cut such a fragmented MP4 into small files, add an M3U8 file, and you have an HLS stream.
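
As a minimal sketch of how a browser consumes such a fragmented MP4 through Media Source Extensions, with the codec string and segment names as illustrative assumptions:

```typescript
// Sketch: playing fragmented MP4 (CMAF) segments through Media Source Extensions.
const video = document.querySelector("video")!;
const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener("sourceopen", async () => {
  const sb = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.64001f, mp4a.40.2"');

  const append = (buf: ArrayBuffer) =>
    new Promise<void>((resolve) => {
      sb.addEventListener("updateend", () => resolve(), { once: true });
      sb.appendBuffer(buf);
    });

  // 1. The init segment carries the index (moov) but no media data.
  await append(await (await fetch("init.mp4")).arrayBuffer());
  // 2. Each media segment is a self-contained moof + mdat pair.
  for (const name of ["seg-001.m4s", "seg-002.m4s"]) {
    await append(await (await fetch(name)).arrayBuffer());
  }
});
```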

2.2. Why use HLS

Easier CDN deployment: CDN deployment for HLS is easier because it is based on file transfer rather than long-lived streams, so a traditional static-file CDN can support HLS live streaming with only a few changes to its cache configuration.

Dynamic switching of definition: because HLS consists of short files rather than one long stream, switching from one definition to another is easy. With a long stream it is hard to know where the switch point is, and that seek operation is difficult to implement on the server side; with small files, as long as the files of different definitions are aligned, switching quality is straightforward.

Easy support for real-time lookback: with HLS, if the expiration time of the segment files is set long enough, a user who enters the live room late and wants to watch earlier content can simply be served the older segment files.

Native support for mobile Safari: Safari differs from other browsers in that it has no Media Source Extensions on mobile, which means any stream the Safari video tag does not support directly simply will not play; HLS is a format it plays natively (a small detection sketch follows the last point below).

Native support for H.265, AV1 and other new codecs: HLS is built on the MP4 structure, so it can carry H.265 and AV1 natively rather than through private hack protocols. The most widely used format online today is FLV, and because every new codec has to be hacked in, tooling support for it is poor. For example, if I want FLV to carry H.265 and the existing software does not support it, every component has to be modified one by one; and if different vendors hack it in differently, their systems cannot interoperate.
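
Returning to the mobile Safari point above, a client can pick its playback path roughly like this; the playlist URL and the MSE player stub are illustrative assumptions:

```typescript
// Sketch: choose native HLS playback (mobile Safari) or an MSE-based player.
const video = document.createElement("video");

if (video.canPlayType("application/vnd.apple.mpegurl")) {
  // Mobile Safari: the <video> tag plays HLS natively, no MSE needed.
  video.src = "https://example.com/live/stream.m3u8";
} else if ("MediaSource" in window) {
  // Other browsers: fetch segments in JavaScript and append them via MSE.
  startMsePlayer(video); // hypothetical function, see the MSE sketch above
} else {
  console.warn("Neither native HLS nor MSE is available");
}

function startMsePlayer(_el: HTMLVideoElement): void {
  /* build a MediaSource, add a SourceBuffer, append segments */
}
```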

2.3. Optimization of HLS live broadcast

We mainly optimize from three aspects:

Use fMP4 instead of TS: most browsers cannot consume a TS stream directly, but they can consume CMAF. Therefore, following the newer HLS specification, we switched from TS to the CMAF format in HLS v7. The data can then be sent straight to the browser and played directly, without JavaScript doing heavy processing in the middle. This reduces the browser's CPU consumption, so the user's computer is less likely to stutter.

Non-keyframe cutting: a keyframe is only needed to start playback. Once playback has started, the remaining data can be sent to the browser without aligning to keyframe boundaries, so slices can be relatively short (for example 0.5 s or 1 s), which reduces transmission delay. In the old HLS, every segment had to start with a keyframe, otherwise the picture could not be rendered; with non-keyframe cutting, the latency problem can be addressed.

Multiple file merge: a segment file can be played while it is still being downloaded. In the earlier picture on the left (page 7 of the slides), the partitioning on the right shows an init segment, then 001, 002. In fact 001 can be subdivided into smaller fragments (001.1, 001.2, 001.3) that are transmitted as a single file. That is, one file may contain more than one fragment (each a moof plus mdat pair). Such a file can be played progressively: for example, if a 1 s file is divided into three 0.33 s fragments, the file can be handed to the browser as soon as the first third has arrived, and the browser starts rendering. This improves the user experience, because the picture appears before the whole file has downloaded, and a user entering the room no longer waits a long time for the first frame.
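
As a rough sketch of this play-while-downloading idea, assuming an already-created SourceBuffer and leaving out error handling:

```typescript
// Append each network chunk as it arrives; MSE buffers incomplete boxes
// internally, so every complete moof+mdat pair inside the file becomes
// playable before the whole file has been downloaded.
async function appendChunk(sb: SourceBuffer, chunk: Uint8Array): Promise<void> {
  await new Promise<void>((resolve) => {
    sb.addEventListener("updateend", () => resolve(), { once: true });
    sb.appendBuffer(chunk);
  });
}

async function playWhileDownloading(url: string, sb: SourceBuffer): Promise<void> {
  const resp = await fetch(url);
  const reader = resp.body!.getReader();
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    await appendChunk(sb, value); // fragments near the file start render early
  }
}
```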

# 03 P2P

One of our main reasons for adopting HLS was actually to build P2P on top of it. The small-file, segment-based transmission of HLS is well suited to P2P: the small files are natural sharding units, meaning users can exchange data with each other file by file.

This half is divided into three parts: live P2P, the challenges of live P2P, and the most critical part, decentralized collaboration.

3.1. Live P2P

Compared with traditional live streaming, P2P adds data exchange between users. P2P-related logic has existed for many years; data transfer in BitTorrent and eMule is based on P2P. For example, a Peer in a P2P network is any node of the network, and peers transmit data to each other. The roles that provide and receive data are called Seeder and Leecher respectively, and both count as peers. In the P2P network we designed, Seeder and Leecher are hybrid roles: a user acts as both Seeder and Leecher at the same time, and users exchange data with each other.

The share rate is the ratio of uploaded data to downloaded data, a concept that was common in BitTorrent software. BT clients have a feature where, after a download completes, the client keeps seeding for a while before stopping. The stop condition is that the share rate reaches a set value (for example, after downloading 1 GB of data with the share rate set to 1.5, the task can stop once 1.5 GB has been uploaded).

Since we now mix CDN and P2P, we also have the concept of a "saving rate": the proportion of the total download that comes from P2P. For example, if P2P accounts for 70% of the data volume and the CDN for 30%, the saving rate is 70%.
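
To make the two ratios concrete, here is a minimal sketch of how they could be computed; the function names are illustrative, not the SDK's API:

```typescript
// shareRate  = bytes uploaded to peers   / bytes downloaded (CDN + P2P)
// savingRate = bytes downloaded from P2P / bytes downloaded (CDN + P2P)
function shareRate(uploadedToPeers: number, cdnBytes: number, p2pBytes: number): number {
  return uploadedToPeers / (cdnBytes + p2pBytes);
}

function savingRate(cdnBytes: number, p2pBytes: number): number {
  return p2pBytes / (cdnBytes + p2pBytes);
}

// Example from the text: P2P carries 70% of the traffic, the CDN 30%.
console.log(savingRate(30, 70)); // 0.7 -> a saving rate of 70%
```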

The P2P download process is shown in the figure. It runs from downloading each segment to delivering the data to the player and includes three nodes, some of which can be skipped. First comes the seed data download; seed data is the data obtained directly from the CDN, which can then be offered to other users. Suppose there are four users forming a small network with each other, and each downloads a different quarter of a segment from the server. After all four have downloaded their quarters, the server has uploaded only one copy of the file in total, but in four pieces of one quarter each, so the QPS is higher. Once the users exchange their pieces, each of them has the complete data. There is also an unlucky case where some of the four users download the same quarter as the others, so some data is missing from the whole small network; after the P2P exchange finishes, the missing data is fetched from the server and the complete segment is finally delivered to the browser.

Because this P2P scheme is based on plain HLS, no special adaptation of the server is required. The whole process is "download the seed data, exchange, download the missing data, deliver", and standard HLS fully supports it. Downloading part of a single segment file is handled by HTTP Range requests, so the CDN side needs no special modification: any CDN that serves static files and supports Range can serve this P2P scheme.
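
As a small sketch of such a partial download, with the segment URL and sizes as illustrative assumptions:

```typescript
// Fetch one quarter of a segment with an HTTP Range request, so each of the
// four peers in the example can pull a different quarter from the CDN.
async function downloadRange(url: string, start: number, endInclusive: number): Promise<ArrayBuffer> {
  const resp = await fetch(url, { headers: { Range: `bytes=${start}-${endInclusive}` } });
  if (resp.status !== 206) throw new Error(`expected 206 Partial Content, got ${resp.status}`);
  return resp.arrayBuffer();
}

// e.g. peer #2 of 4 downloads the second quarter of a 400 000-byte segment:
// const quarter = await downloadRange("https://example.com/live/seg-042.m4s", 100_000, 199_999);
```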

Downloading seed data can be skipped. The earlier example used four users exchanging data because there were exactly four of them. With six users, a different situation can arise: some users download nothing from the CDN and get everything from the P2P network. Our protocol supports taking data from the P2P network and feeding it back into the P2P network: the user relays data to help others while keeping a copy for themselves, which is more efficient.

Elastic length of the P2P exchange window. If the P2P time is too short, the share rate cannot rise, because exchanging data between users takes time. If the player buffers more data, more time is left for the P2P exchange, so the window can be tuned: while keeping the share rate and saving rate up, the user experience stays good and the delay does not grow too long.

No back-fill is needed when the data is complete. The "download missing data" node in the diagram can be skipped when the complete data has already been obtained through P2P.

3.2. Live P2P challenges

The challenges of P2P are threefold:

The real-time requirement is high. We are doing live streaming, not download-then-watch. A BitTorrent user can leave files downloading slowly in the background for a day or two and watch them when they finish; a live stream loses its meaning if it is watched that way. The high real-time requirement means the download speed must stay above the rate at which the player consumes data.

Users enter and leave the live room frequently, so the P2P network has to be adjusted constantly. As I said, our process differs from the schemes I have seen online, where users are divided into groups, peers within a group connect to each other, and data is exchanged according to tasks scheduled by the server. If large numbers of users keep entering and leaving the room, the grouping changes dramatically, the connection success rate between users has to be considered, scheduling becomes a great challenge, a stable state is hard to reach, and the P2P saving rate suffers.

The network environment is complex. In the same example of four people forming a small network, the four may not have the same upload and download speeds. Perhaps A to B transfers well while C to D transfers badly and B to C is good, or perhaps A to B is bad but B to A is fast; none of this is guaranteed. If the server assigns the tasks, deciding who downloads what and who sends data to whom, it has to keep a large amount of per-user state on the server side, and scheduling over that much data while users change in real time is very difficult.

Our P2P protocol is transmitted over the WebRTC DataChannel. The DataChannel is very flexible and does not require a video channel to carry the full video data, which means every user can contribute as long as they have some spare uplink bandwidth within the configured cap. Because WebRTC is a standard protocol, this scheme runs in the browser as well as in native clients on phones and computers. That matters because phones and computers behave differently: if phones could only connect to other phones, and phone networks are usually less stable, the saving rate on mobile would be relatively poor. Letting phones and computers connect to each other in one network makes the whole network more efficient.
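
As a rough sketch of this transport, with the signalling stub and the message shapes as illustrative assumptions rather than the actual protocol:

```typescript
// Open a WebRTC DataChannel for exchanging segment data between peers.
function sendToSignalingServer(desc: RTCSessionDescriptionInit): void {
  // In a real client this would travel over the existing server connection.
  console.log("signal", desc.type);
}

async function connectToPeer(): Promise<RTCDataChannel> {
  const pc = new RTCPeerConnection({ iceServers: [{ urls: "stun:stun.example.com" }] });

  // A single reliable, ordered channel is enough to exchange file blocks.
  const channel = pc.createDataChannel("p2p-segments");
  channel.binaryType = "arraybuffer";

  channel.onopen = () => {
    // e.g. announce which blocks of which segment this peer already holds
    channel.send(JSON.stringify({ type: "have", segment: 42, blocks: [0, 1] }));
  };
  channel.onmessage = (ev) => {
    if (typeof ev.data === "string") {
      console.log("control message", JSON.parse(ev.data)); // query / have / request ...
    } else {
      console.log("received block of", (ev.data as ArrayBuffer).byteLength, "bytes");
    }
  };

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  sendToSignalingServer(offer); // the server only helps establish the link
  return channel;
}
```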

The P2P protocol is designed to be asynchronous and pipelined: much like HTTP/2, it sends the next request without waiting for the reply to the previous one.

Peers query and download spontaneously, with preemption and retry. "Spontaneous query" means a user does not need to know in advance who holds the data: they broadcast a query about download progress to everyone and start the download once a peer confirms it has the desired data. This is scheduled by the users themselves, not by the server. Preemption and retry work like this: suppose a file is split into ten blocks, and downloading the first block is a task that must be requested. User conditions change constantly; the connection may drop after the request is sent, the download of the first block fails, and someone else has to retry it. Preemption means that whoever answers the broadcast query first gets to serve the first block. Only this way can transmission efficiency stay high: once I have the first block, I can pass it on to someone else, then they pass it on again, increasing utilization.
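
As a minimal sketch of the query-preempt-retry idea, with the Peer interface and its methods standing in as assumptions for the real protocol:

```typescript
interface Peer {
  id: string;
  askHas(segment: number, block: number): Promise<boolean>;
  requestBlock(segment: number, block: number): Promise<ArrayBuffer>;
}

async function fetchBlock(peers: Peer[], segment: number, block: number): Promise<ArrayBuffer | null> {
  // Broadcast the query to everyone and keep the peers that claim to have the block.
  const answers = await Promise.all(
    peers.map(async (p) => ((await p.askHas(segment, block).catch(() => false)) ? p : null)),
  );
  const owners = answers.filter((p): p is Peer => p !== null);

  // Preempt: the first owner we try gets the task; retry with the next owner
  // if the transfer fails (the peer may have left the room in the meantime).
  for (const owner of owners) {
    try {
      return await owner.requestBlock(segment, block);
    } catch {
      /* try the next owner */
    }
  }
  return null; // caller falls back to downloading this block from the CDN
}
```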

Uploads are spread evenly. The uplink for each user is now capped at no more than the amount downloaded: for example, if 1 MB of data has been downloaded, no more than 1 MB may be uploaded. If the uplink is too high it interferes with the user's normal Internet access, causes lag, and affects other software. Besides, a user who notices an imbalance between upload and download, with little downloaded but a lot uploaded, will be unhappy and complain, which is also a bad outcome.
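
As a small sketch of such an upload cap, with illustrative names rather than the SDK's internals:

```typescript
// Enforce the "upload no more than you download" cap described above.
class UploadBudget {
  private downloadedBytes = 0;
  private uploadedBytes = 0;

  onDownloaded(bytes: number): void {
    this.downloadedBytes += bytes;
  }

  /** Returns true if we may still upload `bytes` without exceeding the cap. */
  tryUpload(bytes: number): boolean {
    if (this.uploadedBytes + bytes > this.downloadedBytes) return false;
    this.uploadedBytes += bytes;
    return true;
  }
}

const budget = new UploadBudget();
budget.onDownloaded(1_000_000);          // downloaded ~1 MB so far
console.log(budget.tryUpload(600_000));  // true  (still under the cap)
console.log(budget.tryUpload(600_000));  // false (would exceed downloads)
```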

3.3. Distributed scheduling

P2P task assignment no longer needs a central server to distribute tasks. The user still connects to a server, because the WebRTC connection process necessarily involves one, but the server only participates in connection establishment and issues no tasks. What happens after the connection is established is decided by the users, not the server.

Adjust while transmitting. Going back to the example of the small network of four users who download four different parts from the server and exchange them, that is the most efficient transmission. In the ideal extreme, the server would only need to send each piece of data once for every user in every live room to be served; that is of course almost impossible, so we use a tuning algorithm to make the server transmit as little as possible. With P2P participating, the fewer duplicates there are in the data the server sends, the less total data it sends; if duplicate data is transmitted, the server's bandwidth utilization is low. "Adjustment" means that although four users can each download a different part of the file, users keep coming and going: after someone disconnects, a new user may join and not know which part to take. We need an algorithm that helps new users decide which part of the file to download.

The algorithm we use is the market adjustment algorithm.

It mimics market supply and demand. Each file is divided into 4 blocks, and the saving rate and share rate are computed independently per block to estimate its supply-demand relationship. When suppliers of a block are scarce, users find it very hard to get that block from the P2P network; the block is then considered in short supply on the market and, according to the scheduling algorithm, the gap should be filled. For example, when a user finds that a certain range of data cannot be obtained from the P2P network, it downloads that data from the server instead and feeds it into the P2P network. Conversely, if many users download the first block directly from the server, the share rate for it is in practice very low: everyone is asking the server for the same data, and because so many people want it, that data would be better satisfied through the P2P network. When supply exceeds demand, the SDK's internal algorithm switches some users from downloading the block from the CDN to obtaining it from other peers.

Adjusting in real time according to supply and demand eventually converges to a balance: exactly enough users download data to relay it to others, the data they transmit is exactly what others need, and there is neither excess downloading nor missing data.
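
As a rough sketch of this per-block supply-and-demand decision, with the scarcity score and threshold as illustrative assumptions rather than the production tuning:

```typescript
// Each peer tracks, per block index, how hard it recently was to obtain that
// block from the P2P network, and decides whether to seed it from the CDN.
type BlockIndex = 0 | 1 | 2 | 3; // each segment is split into 4 blocks

const p2pFailureRatio = new Map<BlockIndex, number>(); // recent P2P misses / attempts

// Record how a recent attempt to get `block` from peers went (moving average).
function recordP2pAttempt(block: BlockIndex, failed: boolean): void {
  const prev = p2pFailureRatio.get(block) ?? 0.5;
  p2pFailureRatio.set(block, prev * 0.9 + (failed ? 0.1 : 0));
}

function decideSource(block: BlockIndex): "cdn" | "p2p" {
  const shortage = p2pFailureRatio.get(block) ?? 1; // unknown blocks count as scarce
  // In short supply: fetch from the CDN and feed the block back into the P2P network.
  if (shortage > 0.5) return "cdn";
  // In oversupply: stop hitting the CDN for this block and get it from other peers.
  return "p2p";
}
```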

The algorithm is now running in production. The chart shows the measured effect on real data: the curve is the saving rate, which stays close to 75%. As the number of users rises and falls, the fluctuation is small; the few dips along the horizontal axis occur around 4 a.m., when there are too few users for it to matter, since we mainly optimize the saving rate for periods with many users. The figure shows that even when the user count rises or falls sharply, the saving rate stays stable. The steep lines in the figure represent bandwidth (which also tracks the number of viewers); as bandwidth changes, the saving rate does not change much.

So the saving rate is not sensitive to changes in the number of users, because the market adjustment algorithm converges to an internal equilibrium. Second, it adapts to complex network environments. In this supply-demand model, a user holding data is in one of two situations: good uplink bandwidth, so the data can be sent out, or poor uplink bandwidth, so it cannot. If the server did the scheduling, it would also have to account for each user's bandwidth, and the adjustment algorithm would become very complex. With self-adaptive adjustment, when a user holds data but cannot send it, other peers simply see the block as being in short supply.

3.4. Other relevant results

Other relevant results are twofold.

The first is real-time parameter tuning. The market adjustment algorithm has many small control parameters. If the effect of a parameter change can be observed in real time, tuning becomes faster and more convenient, and we eventually arrive at a set of near-optimal parameters.

The second is QPS. As mentioned earlier, users request file downloads with Range. Before going live, our biggest worry was that Range requests would drive the server's QPS very high, but we found that with P2P enabled the QPS is almost the same as with P2P disabled, fluctuating between 90% and 110%: 90% means QPS is about 10% lower with P2P on, and 110% means it is about 10% higher. At the current peak bandwidth the live data runs at about 110%, usually fluctuating around 100%; in the early morning the saving rate is relatively low and so is QPS. In other words, in this system the QPS can be considered almost equivalent to running pure HLS.

3.5. Future research direction

Now that the algorithm is online, there are many aspects that need to be optimized:

Shorten the algorithm's convergence time: in internal tests the algorithm needs about 30-60 s to reach supply-demand balance. That sounds short, but with users constantly entering and leaving, the network rarely reaches the optimal state, so the share rate has hovered around 70%. I believe there is room for optimization: we once had a channel broadcasting the launch of a lunar exploration satellite, and unlike entertainment streams, the audience stayed in the room until the launch finished. With users staying that long, the algorithm converged, and that time the share rate reached 80%. But frequent churn is the normal case, so the convergence time still needs to be shortened to get better results.

Shorten the elastic duration of the P2P exchange: as mentioned earlier, the player's buffer length determines how much time is available for P2P. A longer buffer improves the P2P share rate and saving rate, but the downside is that the longer the data spends in the P2P exchange stage, the larger the delay the user sees, which hurts the experience. Ideally the delay with P2P enabled should be brought close to the delay with P2P disabled.

Optimize data-block routing: that is, the path a data block takes from the server to each client. Currently all data moves through "preemption", which leaves some users perpetually "starved". If the server intervened a little, using an algorithm to pull the users at the tail up to the front, the network's share rate would improve.

Improve allocation efficiency and lower the share-rate ceiling: although the share rate is already capped so that uploads never exceed downloads, users who notice the traffic in a task manager or network monitor may still be annoyed. Keeping the upload rate down also improves the user experience: the experience is not just whether the page is smooth and the service works, but also the resource usage the user sees when monitoring the service on their machine.

The above is what I share, thank you!
