One common question is: “Why do we need machine learning to improve streaming?”


This is a very important question, and in this article, Netflix describes some of the technical challenges facing video streaming and how they can be overcome through statistical modeling and machine learning techniques.

Netflix now has more than 117 million members worldwide. With more than half of those members living outside the US, providing a high-quality streaming experience for a global audience is a huge technical challenge. Much of that challenge lies in installing and maintaining servers around the world and in the algorithms used to stream content from those servers to members’ devices. A one-size-fits-all solution for streaming video has become less and less ideal as Netflix has rapidly expanded to viewers with different viewing behaviors, on networks and devices with widely varying capabilities.

Here are a few examples:

  • Viewing and browsing behavior on mobile devices differs from that on smart TVs
  • Cellular networks can be more volatile than fixed broadband networks
  • Networks in some markets may experience higher levels of congestion

Due to hardware differences, different classes of devices also vary in their Internet connectivity capabilities and fidelity.

Netflix needs to adapt to these varied and often fluctuating conditions in order to provide a high-quality experience for all members. Because Netflix can observe, in real time for every session, the health of the network and device as well as aspects of the user experience (such as video quality), it is well placed to apply statistical modeling and machine learning in this area. Here are some of the technical problems we face.

Network quality characteristics and prediction

Network quality is difficult to characterize and predict. While average bandwidth and round-trip time are well-known indicators of network quality, other characteristics, such as stability and predictability, vary widely and matter a great deal for video streaming. A richer characterization of network quality would help in analyzing networks (for targeting and analyzing product improvements), in determining the initial video quality, and in adjusting video quality throughout playback (more on this below).

Here are some examples of network throughput measured during real viewing sessions. You can see that throughput is quite noisy and fluctuates over a wide range. Given the last 15 minutes of data, how well can we predict what throughput will look like over the next 15 minutes? How should we incorporate longer-term historical information about the network and the device? What kind of data can the server provide so that the device can adapt optimally? Even if we cannot predict exactly when a network drop will occur (there will always be surprises), can we at least characterize the distribution of throughput we expect to see given the historical data?

Because Netflix observes this data at large scale, it is possible to build more sophisticated models that combine temporal pattern recognition with various contextual indicators to predict network quality more accurately.
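To make this concrete, here is a minimal sketch of what such a prediction might look like, using scikit-learn's gradient-boosted quantile regression to estimate a range (10th/50th/90th percentiles) of next-window throughput from recent measurements plus a couple of contextual signals. The feature names, the synthetic data, and the model choice are illustrative assumptions, not Netflix's actual pipeline.

    # Sketch: predict a distribution of next-window throughput (10th/50th/90th
    # percentiles) from recent throughput statistics and contextual features.
    import numpy as np
    import pandas as pd
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(0)
    n = 5000
    sessions = pd.DataFrame({
        "mean_tput_last_15m": rng.gamma(4.0, 2.0, n),   # Mbps
        "std_tput_last_15m": rng.gamma(2.0, 1.0, n),
        "is_cellular": rng.integers(0, 2, n),
        "hour_of_day": rng.integers(0, 24, n),
    })
    # Toy target: next-window mean throughput, noisier on cellular networks.
    target = (sessions["mean_tput_last_15m"] * 0.9
              + rng.normal(0.0, 1.0 + 2.0 * sessions["is_cellular"].to_numpy(), n))

    X, y = sessions.to_numpy(), target.to_numpy()
    quantile_models = {}
    for q in (0.1, 0.5, 0.9):
        model = GradientBoostingRegressor(loss="quantile", alpha=q, n_estimators=200)
        quantile_models[q] = model.fit(X, y)

    # Predicted throughput range for one new session:
    # [mean_tput_last_15m, std_tput_last_15m, is_cellular, hour_of_day]
    new_session = np.array([[8.0, 2.0, 1, 20]])
    print({q: round(float(m.predict(new_session)[0]), 2)
           for q, m in quantile_models.items()})

Predicting a range rather than a single number is what lets a client make conservative decisions when the network is volatile, which is exactly how such a prediction gets used for quality adaptation below.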

One of the most important tests of the usefulness of a network quality prediction is whether it can improve how video quality is adapted during playback, described next.

Adaptive video quality during playback

Movies and TV shows are encoded at multiple video qualities to support different network and device capabilities. Adaptive streaming algorithms adjust the quality of the video being streamed throughout playback according to current network and device conditions. The figure below illustrates the video quality adaptation setting. Can we use data to choose the video quality that optimizes the quality of experience?

The quality of experience can be measured in several ways, including the initial amount of time spent waiting for video to play, the overall video quality experienced by the user, the number of times playback pauses to load more video into the buffer (“rebuffers”), and the amount of perceptible quality fluctuation during playback.

Above is an illustration of the video quality adaptation problem. The video is encoded at different qualities (here three: green is high, yellow is medium, and red is low). Each quality version of the video is divided into chunks of fixed duration (gray boxes). The algorithm must decide which quality to choose for each chunk it downloads.

These metrics trade off against one another: we can be aggressive and deliver the highest video quality, but at an increased risk of rebuffering; or we can buffer more video up front to reduce rebuffer risk, at the expense of a longer initial wait. The feedback signal for a given decision is typically sparse and delayed. For example, an aggressive switch to a higher quality may not have an immediate penalty, but it can gradually deplete the buffer and, under some conditions, eventually lead to a rebuffer event. Such “credit assignment” problems are a well-known challenge when learning optimal control algorithms, and machine learning techniques have great potential to tackle them.
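As a concrete illustration of this trade-off, below is a minimal sketch of a chunk-by-chunk quality selector that combines the current buffer level with a conservative (low-percentile) throughput prediction like the one sketched above. The bitrates, chunk duration, and safety margin are made-up values, and the rule itself is a toy heuristic, not Netflix's adaptive streaming algorithm.

    # Sketch: pick the highest quality whose estimated download time still
    # leaves a safety margin in the playback buffer.
    CHUNK_SECONDS = 4.0
    BITRATES_MBPS = {"low": 1.0, "medium": 3.0, "high": 6.0}  # illustrative ladder

    def choose_quality(buffer_s: float, predicted_tput_p10_mbps: float) -> str:
        """Trade video quality against rebuffer risk for the next chunk."""
        for name in ("high", "medium", "low"):
            bitrate = BITRATES_MBPS[name]
            # Estimated seconds to download one chunk at this quality.
            download_s = CHUNK_SECONDS * bitrate / max(predicted_tput_p10_mbps, 0.1)
            # Require the buffer to cover the download plus a 2-second margin.
            if buffer_s - download_s > 2.0:
                return name
        return "low"

    print(choose_quality(buffer_s=12.0, predicted_tput_p10_mbps=4.0))  # -> "high"
    print(choose_quality(buffer_s=5.0, predicted_tput_p10_mbps=1.5))   # -> "low"

A learned policy would replace these hard-coded thresholds with decisions optimized against the quality-of-experience metrics described above, which is where the credit-assignment difficulty comes in.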

Predictive caching

Another way statistical models can improve the streaming experience is by predicting what the user will play, so that all or part of that content can be cached on the device before the user hits play, letting the video start faster and/or at a higher quality. For example, a member who has been watching a series episode by episode is very likely to play the next unwatched episode. Combining various aspects of viewing history with recent user interactions and other contextual variables, this can be formulated as a supervised learning problem: we want to maximize the likelihood that what is cached is what the member actually ends up watching, while respecting constraints on cache size and available bandwidth. Netflix has seen a significant reduction in the amount of time users spend waiting for video to start since deploying predictive caching models.
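A rough sketch of this formulation follows: a classifier scores each candidate title by the probability that the member will play it, and a greedy step fills a limited cache with the highest-probability titles. The features, sizes, model choice, and titles are illustrative assumptions, not Netflix's actual system.

    # Sketch: predictive caching as supervised learning plus a greedy fill
    # under a cache-size budget.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    n = 2000
    # One row per (member, candidate title) pair, e.g.:
    # [is_next_unwatched_episode, recent_browse_events, hours_since_last_play]
    X = np.column_stack([
        rng.integers(0, 2, n),
        rng.poisson(3, n),
        rng.exponential(10.0, n),
    ])
    # Toy label: 1 if the member actually played that title next.
    y = (0.7 * X[:, 0] + 0.05 * X[:, 1] + rng.normal(0, 0.3, n) > 0.6).astype(int)

    model = LogisticRegression().fit(X, y)

    def fill_cache(candidates, cache_budget_mb):
        """candidates: list of (title, feature_vector, size_mb).
        Greedily cache the most likely plays that fit within the budget."""
        scored = sorted(
            ((model.predict_proba([features])[0, 1], title, size_mb)
             for title, features, size_mb in candidates),
            reverse=True,
        )
        plan, used_mb = [], 0.0
        for prob, title, size_mb in scored:
            if used_mb + size_mb <= cache_budget_mb:
                plan.append((title, round(prob, 2)))
                used_mb += size_mb
        return plan

    candidates = [("s02e05", [1, 4, 2.0], 900), ("new_release", [0, 6, 1.0], 1200)]
    print(fill_cache(candidates, cache_budget_mb=1500))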

Device anomaly detection

Netflix runs on more than a thousand different types of devices, from laptops to tablets, smart TVs to phones. New devices constantly enter the ecosystem, and existing devices frequently receive firmware updates or interact with changes in the Netflix application. These changes are usually benign, but they can occasionally cause user experience problems, for example the application failing to start properly or the quality of played video degrading. In addition, device quality can degrade gradually over time; for example, successive UI revisions may progressively slow performance on a particular device.

Detecting these changes is a challenging and labor-intensive task. Alerting frameworks can help us catch potential problems, but an alert does not usually arrive as a well-defined problem to solve: a liberal trigger threshold produces many false positives, sending the team on unnecessary manual investigations, while a very strict threshold misses real problems. What we do have is a history of which alerts were triggered and which of them turned out to correspond to real issues. We can use that history to train a model that predicts the likelihood that a given set of measured conditions corresponds to a real problem.
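A minimal sketch of this idea, assuming historical alerts have been labeled by whether the subsequent investigation found a real device issue; the features, the synthetic data, and the paging threshold are illustrative assumptions.

    # Sketch: rank new alerts by the predicted probability that they are real,
    # learned from past investigation outcomes.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(2)
    n = 3000
    alerts = np.column_stack([
        rng.normal(0, 1, n),      # z-score of playback error rate vs. baseline
        rng.normal(0, 1, n),      # z-score of startup-failure rate
        rng.integers(0, 2, n),    # firmware_recently_updated
        rng.integers(0, 2, n),    # new_app_version_rollout
    ])
    # Toy label from past investigations: was this alert a real problem?
    was_real = (alerts[:, 0] + alerts[:, 1] + 0.5 * alerts[:, 2]
                + rng.normal(0, 0.5, n) > 1.5).astype(int)

    clf = RandomForestClassifier(n_estimators=100).fit(alerts, was_real)

    new_alert = np.array([[2.5, 1.0, 1, 0]])
    p_real = clf.predict_proba(new_alert)[0, 1]
    if p_real > 0.8:
        print(f"page the on-call team (p={p_real:.2f})")
    else:
        print(f"log for batch review (p={p_real:.2f})")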

Even when we are confident that what we are observing is a real issue, it is often difficult to determine the root cause. Is the incident due to network quality fluctuations at a specific ISP or in a specific region? Is it an internal A/B test? Is it a firmware update released by the device manufacturer? Statistical modeling also lets us control for various covariates to help pin down the root cause.
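As a sketch of what such covariate adjustment might look like, the following fits a logistic regression (using statsmodels) of playback errors on a hypothetical firmware flag while controlling for ISP and A/B test membership; all column names and data are illustrative assumptions.

    # Sketch: estimate the association between a firmware update and playback
    # errors, net of ISP and A/B test covariates.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(3)
    n = 4000
    df = pd.DataFrame({
        "firmware_v2": rng.integers(0, 2, n),
        "isp": rng.choice(["isp_a", "isp_b", "isp_c"], n),
        "ab_cell": rng.choice(["control", "treatment"], n),
    })
    # Toy outcome: playback errors driven mostly by the firmware update.
    logit = -3.0 + 1.2 * df["firmware_v2"] + 0.2 * (df["isp"] == "isp_c")
    df["playback_error"] = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

    # The coefficient on firmware_v2 measures its association with errors
    # after controlling for ISP and A/B test membership.
    model = smf.logit("playback_error ~ firmware_v2 + C(isp) + C(ab_cell)", data=df).fit()
    print(model.summary().tables[1])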

In practice, Netflix has seen a significant reduction in overall alert volume from using predictive modeling for device anomaly detection, while maintaining an acceptably low error rate, greatly improving the team’s efficiency.

Statistical modeling and machine learning methods can greatly improve the current state of the art, but there are still many difficulties to overcome:

  • Huge amount of data (over 117 million members worldwide)
  • The data is high-dimensional, and it is difficult to hand-craft a minimal set of informative variables for a particular problem
  • There is a great deal of structure in the data because of the complexity of the product itself (such as user preferences and the hardware capabilities of devices)

Addressing these issues will be central to Netflix’s strategy to deliver video on increasingly diverse networks and devices.


This article is translated from the Netflix Tech Blog. For more product and technical content, follow the NetEase Yunxin blog.