This article was originally published by AI Frontier. Original link: t.cn/RTGs81x


Netflix created quite a stir by rolling out interleaving, a technique for evaluating personalized recommendation algorithms that is roughly 100 times faster than A/B testing. Yet only a week later, the video service announced that it had personalized the artwork shown for each video using contextual bandit algorithms.


For years, the primary goal of Netflix’s personalized recommendation system has been to recommend the right videos at the right time. With a catalog of thousands of titles and over a hundred million member accounts, finding the right video for each user is a top priority. But recommendation systems can do more than that. How do you get users interested in the videos you recommend? How do you get an unfamiliar video to pique users’ interest? How do you convince users that a video is worth watching? Answering these questions is crucial to helping users discover good content, especially unfamiliar videos.

Artwork, the images used to represent a video, is one natural way to address this problem. If an image holds enough appeal for the user (a familiar actor, an adrenaline-pumping car chase, or the quintessential dramatic scene of a movie or show; a picture is worth a thousand words), it will entice the user to click on the video. This is where Netflix differs from traditional media products: we effectively have more than 100 million different products, one per member, with personalized recommendations and personalized visuals for each.

(Netflix’s home page shown without artwork)

Earlier, we discussed how to select the single best image for a video across all members. Using a multi-armed bandit algorithm, we can find the image that works best overall; in the case of Stranger Things, that was the image with the highest take rate among users. However, given the wide range of tastes and preferences among users, wouldn’t it be better if we could figure out what each user prefers and show each of them the image they are most likely to respond to?

(Different images created for Stranger Things, each covering a different theme from the show)

Let’s look at some scenarios in which personalization matters. For example, each user has a different viewing history. In the examples below, three videos a user has watched in the past are shown on the left, and the image we would recommend for a popular title is shown on the right.

We can personalize the artwork for the movie “Good Will Hunting” based on each user’s preferences for different genres and themes. Users who watch many romantic movies might be interested in “Good Will Hunting” if we show them an image containing Matt Damon and Minnie Driver, while users who watch many comedies have a better chance of being drawn in by an image of Robin Williams, the famous comedian.

Similarly, how does personalization play out for users who like different actors? In the case of ‘Pulp Fiction,’ a user who has watched many Uma Thurman movies might respond more positively to an image containing Uma, while fans of John Travolta are more likely to be drawn to the movie by an image containing John.

Of course, not every image personalization scenario is so straightforward. So rather than enumerating exhaustive rules, we rely on the data to tell us which images to use. Overall, personalizing images helps us improve each user’s experience.


Overcoming challenges

Netflix uses algorithms to personalize many aspects of the service to improve the member experience, including the selection of home page rows, row titles, the images displayed, the messages we send, and more. For us, each aspect of personalization brings its own unique challenges, and personalized artwork is no exception. One challenge of image personalization is that we can select only a single image to represent each video in each place. In contrast, a typical recommendation setting presents members with multiple choices, from which we can learn their preferences. This means that image selection is a chicken-and-egg problem operating in a closed loop: members can only choose to play videos from the images we show them. What we really want to understand is when a presented image influences a member to play (or not play) a video, and when the member would have played (or not played) the video regardless of which image we showed. Therefore, personalized image selection must sit on top of, and work together with, the traditional recommendation methods and algorithms. And in order to learn how to personalize images properly, we need to collect a lot of data that indicates which images are better for which users.

Another challenge is understanding the effect of changing the artwork shown for a video between sessions. Does changing the image make the video less recognizable and therefore visually harder to find, for example a video a member was previously interested in but had not yet played? Or does a change of image make the member reconsider the video? If we keep changing the image even after finding a better one for a member, it can confuse the member. Changing images also raises an attribution problem, since it becomes unclear which image sparked the member’s interest in the video.

The next challenge is understanding how an image relates to the other images selected for the same page or session. A bold close-up of the main character may work well for one video because it stands out against the surrounding titles. But if the entire page is filled with that style of image, it becomes less effective. Looking at each image in isolation is therefore not enough; we need to think about how to present a diverse set of images across the whole page. The effectiveness of an image may also depend on the other evidence shown alongside it (synopsis, trailer, etc.). We therefore need to diversify our image selection so that the images complement one another.

To achieve effective personalization, we also need a good pool of artwork for each video. This means multiple candidate images that are attractive, informative, and appropriate for the video, while avoiding “clickbait” images. The image set for a video also needs to be diverse enough to cover a broad range of potential viewers interested in different aspects of the content. After all, how informative and compelling an image is depends on the individual who sees it. Therefore, our artwork needs to highlight not only the different themes in a video but also different aesthetics.

Finally, there are the engineering challenges of serving personalized artwork at scale. Because the member experience is highly visual and contains a large number of images, the system handles a peak of over 20 million low-latency requests per second. The system must also be robust: failing to render artwork correctly degrades the user experience significantly. Our personalization algorithm also needs to respond quickly when a new video launches, which means learning to personalize on a cold start. And after launch, the algorithm must continually adapt, since the effectiveness of images may change as the video moves through its life cycle and as members’ tastes evolve.


The contextual bandit approach

Much of Netflix’s recommendation engine is powered by machine learning algorithms. Traditionally, we first collect a batch of data on how members use the service, and then train a new machine learning algorithm on that batch. Next, we A/B test the new algorithm against the existing production system. A/B testing on a random subset of members tells us whether the new algorithm is better than the current production experience: members in group A receive the current product experience, while members in group B receive the experience under the new algorithm. If the members in group B engage more with Netflix, we roll the new algorithm out to the entire member base. Unfortunately, this batch approach has a drawback: many members do not receive the better experience for a long time, as shown in the following figure:

To reduce this delay, we moved away from batch machine learning toward online machine learning. For image personalization, the specific online learning framework we use is the contextual bandit. Rather than waiting to collect a full batch of data, waiting to train a model, and then waiting for an A/B test to conclude, contextual bandits allow us to quickly find the most appropriate personalized image for each member. In short, contextual bandits are a class of online learning algorithms that balance the cost of gathering the training data required to learn an unbiased model against the benefit of applying the learned model to each member. Previously, we used non-contextual bandits to select non-personalized images, finding the single best image regardless of context. For personalized recommendation, each member is a different context, because we expect different members to respond differently to the same images.

An important property of contextual bandits is that they are designed to minimize regret. At a high level, we obtain the bandit’s training data by injecting controlled randomization into the learned model’s predictions. The randomization schemes range in complexity from simple epsilon-greedy exploration with uniform randomness to closed-loop schemes that adapt the degree of randomization to the model’s uncertainty. We call this process data exploration. We record the randomization applied to each image selection; this logging allows us to correct for biased selection tendencies and to perform unbiased offline model evaluation, as described later.
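To make the exploration scheme concrete, here is a minimal epsilon-greedy sketch in Python. It is an illustration under assumed interfaces (the function and field names are ours, not Netflix’s), but it shows the key point: the probability with which each image was shown is logged alongside the choice, so that selection bias can be corrected later.

```python
import random

def select_image(candidates, scores, epsilon=0.1):
    """Epsilon-greedy image selection with propensity logging.

    candidates: list of candidate image ids for a title
    scores: dict mapping image id -> model-predicted play probability
    Returns (chosen_image, propensity), where propensity is the
    probability this image had of being shown under the policy.
    """
    best = max(candidates, key=lambda img: scores[img])
    if random.random() < epsilon:
        chosen = random.choice(candidates)   # uniform exploration
    else:
        chosen = best                        # exploit the model's pick
    # Every image receives epsilon/n probability from the uniform
    # branch; the greedy image additionally receives (1 - epsilon).
    propensity = epsilon / len(candidates) + (1.0 - epsilon) * (chosen == best)
    return chosen, propensity
```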

Because data exploration means we do not always show the best image predicted by the model, it incurs a cost (regret). How does this randomization affect the member experience (and our metrics)? With more than 100 million members, the cost of exploration is typically very small: it is spread across a large member base, with each member implicitly contributing a little feedback to the logs. This makes the per-member cost of exploration negligible, which is an important factor when deciding whether a contextual bandit is a suitable way to improve the member experience. If exploration were very costly, randomized data exploration of this kind would not be appropriate. Under our online exploration scheme, we obtain a training data set that records, for each (member, title, image) tuple, whether the video was played or not. In addition, we can control the exploration so that image selection does not change too often, which makes it much clearer which image a member was actually engaging with.


Model training

With online learning, we train contextual models that select the most appropriate image for each member given their context. Each video typically has at most a few dozen candidate images. To train the selection model, we simplify the problem to a per-member ranking over a video’s candidate images. Even after this simplification, we can still learn members’ image preferences, because among the candidate images presented to users, some elicit engagement while others do not. We model and predict these preferences: given an image, the probability that the member will have a quality engagement with the video. These models can be trained with supervised learning, or with contextual bandit counterparts such as Thompson sampling, LinUCB, or Bayesian methods.
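As one concrete illustration of the bandit options named above, here is a minimal per-arm LinUCB sketch in Python/NumPy. This is the generic textbook algorithm, not Netflix’s production model; each arm corresponds to one candidate image, and the context vector x describes the member.

```python
import numpy as np

class LinUCBArm:
    """One arm (candidate image) of a LinUCB contextual bandit."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha      # exploration strength
        self.A = np.eye(dim)    # regularized Gram matrix of contexts
        self.b = np.zeros(dim)  # accumulated reward-weighted contexts

    def ucb(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b  # ridge-regression coefficient estimate
        # Predicted engagement plus an uncertainty bonus for exploration.
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def update(self, x, reward):
        # reward is 1 for a quality play after this impression, else 0.
        self.A += np.outer(x, x)
        self.b += reward * x

def choose_image(arms, x):
    """Pick the candidate image with the highest upper confidence bound."""
    return max(arms, key=lambda image_id: arms[image_id].ucb(x))
```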


Context signals

In a contextual bandit, the context is usually represented as a feature vector provided as input to the model. Many signals can serve as features, particularly attributes of the member: the videos they have played, the genres of those videos, the member’s past engagement with the specific video, country, language preference, the device being used, time of day, and so on.
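As a hypothetical example of how such signals could be encoded, the sketch below assembles a small context feature vector. The specific features, vocabularies, and scalings here are assumptions for illustration only.

```python
import numpy as np

GENRES = ["comedy", "drama", "romance", "action"]   # toy vocabulary
DEVICES = ["tv", "mobile", "web"]

def context_vector(member):
    """Encode a member-profile dict into a fixed-length feature vector."""
    genre_counts = member.get("plays_by_genre", {})
    total = max(sum(genre_counts.values()), 1)
    # Fraction of plays per genre captures taste, e.g. "mostly comedies".
    genre_feats = [genre_counts.get(g, 0) / total for g in GENRES]
    device_feats = [1.0 if member.get("device") == d else 0.0 for d in DEVICES]
    hour_feat = [member.get("hour_of_day", 0) / 23.0]  # time of day, scaled
    return np.array(genre_feats + device_feats + hour_feat)

# Example: a comedy-heavy viewer watching on a TV in the evening.
x = context_vector({"plays_by_genre": {"comedy": 8, "drama": 2},
                    "device": "tv", "hour_of_day": 20})
```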

Another important consideration is that some images in a candidate pool are simply better than others. We observe the overall take rate of every image in our data exploration, that is, the number of quality plays divided by the number of impressions. Previously, with non-personalized image selection, we chose the single best image for all users based on differences in overall take rate. In the new contextual, personalized model, overall take rates still matter, and the personalized ranking tends to correlate with the non-personalized ranking of images.
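Take rate itself is simple to compute from the exploration logs. A minimal sketch, with the log format assumed:

```python
from collections import defaultdict

def take_rates(logs):
    """Per-image take rate: quality plays divided by impressions.

    logs: iterable of (image_id, played) pairs from data exploration,
    where played is True when the impression led to a quality play.
    """
    plays = defaultdict(int)
    impressions = defaultdict(int)
    for image_id, played in logs:
        impressions[image_id] += 1
        plays[image_id] += int(played)
    return {img: plays[img] / impressions[img] for img in impressions}
```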


Image selection

Providing the right image for a member is ultimately a selection problem: finding the best candidate in the pool of available images for a video. Once the model above is trained, we use it to rank the images in each context, predicting the probability that showing a given image to the member will lead to a quality play. We sort the candidate image set by these probabilities and select the image with the highest one.
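In code, this final step is just an argmax over the model’s predictions. The sketch below assumes a trained model exposing a predict_play_probability method; that interface is illustrative, not a real API:

```python
def pick_artwork(model, member_context, candidate_images):
    """Return the candidate image with the highest predicted
    probability of leading to a quality play for this member."""
    return max(candidate_images,
               key=lambda img: model.predict_play_probability(member_context, img))
```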


Evaluation

Offline

Before deploying an algorithm online, we can evaluate contextual bandit algorithms using an offline technique called replay [1]. This approach lets us answer counterfactual questions from the logged exploration data (Figure 1). In other words, we can estimate offline what would have happened if we had used a different algorithm under the same conditions.

Figure 1: A simple example of computing replay from logged data. Each member is shown a randomly assigned image (first row), and the system logs the impression and whether the member played the video (green circle) or not (red circle). The replay metric for a new model is computed over the subset of impressions where the model’s assignment matches the logged random assignment (black squares).
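A minimal sketch of the replay computation just described, assuming uniformly random logging as in Figure 1 (the field names are ours): keep only the impressions where the new policy agrees with the logged random assignment, then measure the take rate over that matched subset.

```python
def replay_take_rate(logged, policy):
    """Estimate a new policy's take rate from logged exploration data.

    logged: list of dicts with keys "context", "candidates",
            "shown_image", and "played" (True for a quality play).
    policy: function (context, candidates) -> chosen image id.
    """
    matched, plays = 0, 0
    for rec in logged:
        if policy(rec["context"], rec["candidates"]) == rec["shown_image"]:
            matched += 1            # the black squares in Figure 1
            plays += int(rec["played"])
    return plays / matched if matched else float("nan")
```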

Replay tells us how members would have engaged with the videos if they had been shown the images selected by the new algorithm rather than the incumbent one. Figure 2 shows how, on the logged data, contextual bandits improve the average take rate compared with random selection or non-contextual bandits.

Figure 2: Average take rate of images selected by different algorithms, measured by replay on the image exploration logs (higher is better). Random (green) selects images uniformly at random, and the simple bandit algorithm (yellow) always selects the image with the highest overall take rate. The contextual bandits (blue and pink) select different images for different members depending on context.

Figure 3: Examples of contextual image selection based on user profile. Comedy refers to a profile that mostly watches comedies, while Romance refers to a profile that mostly watches romantic movies. The contextual bandit recommends an image of the famous comedian Robin Williams to members who prefer comedies, and an image of a couple kissing to members with more romantic tastes.

Online

After evaluating many models offline and finding ones that substantially improved the replay rate, we ran an A/B test comparing the contextual bandit’s personalized selections against non-personalized ones. As expected, personalization worked, producing a significant lift in our core metrics. We also saw a reasonable correlation between offline replay measurements and the models’ online performance. The online results surfaced interesting patterns as well: for example, the improvement from personalization was larger for videos the member had not previously engaged with. This makes sense, since we would expect personalization to help most with videos that are less familiar to the user.


Conclusion

We have now taken the first steps toward personalizing image recommendation across our service. This has improved how members discover new content, and it is the first time we have personalized not only what we recommend but how we recommend it. There remain many improvements and extensions to pursue, including developing algorithms that use computer vision to handle new videos and images as quickly as possible. Another opportunity is to extend this personalization approach to the other kinds of artwork and video evidence we use, such as synopses, metadata, and trailers.

Article Source:

Medium.com/netflix-tec…

Authors: Ashok Chandrashekar, Fernando Amat, Justin Basilico and Tony Jebara
