By Han Xinzi @ShowMeAI, August @iQiyi

Address: www.showmeai.tech/article-det…

Statement: All rights reserved. To reprint, please contact the platform and the author and credit the source.

Short video is currently one of the most popular businesses on the Internet and attracts a huge share of user traffic, so major companies compete to develop it. Because it is a main source of revenue, recommendation algorithms for short video evolve rapidly and drive business growth. In this article, we look at how iQiyi applies multi-task recommendation algorithms in its short video channel, covering the path it took in practice and the schemes it implemented.

Read the full text in one picture

The implementation code

To obtain the code implementing the multi-objective modeling methods, go to the GitHub project github.com/ShowMeAI-Hu…

Paper and dataset download

To obtain some of the papers and the “WeChat dataset” referred to in this article, reply with the keyword “iQiyi multi-task” in the public account (AI Algorithm Research Institute).

If you are interested in applications of “multi-objective learning”, follow our official account (AI Algorithm Research Institute) for more implementation schemes from major companies.

Multi-objective optimization and its application (including code): www.showmeai.tech/article-det…
Parallel twin-tower CTR structure for information feed recommendation ranking: www.showmeai.tech/article-det…

Let’s look at how a leading Internet company applies multi-task optimization to its short video service, using iQiyi’s short video recommendation service as the example.

1. Short video recommendation service

1.1 Service Introduction

In iQiyi’s short video recommendation business, traffic comes mainly from two parts: the video tab at the bottom of the iQiyi App together with the hot module in the top navigation, and the short video feed recommendation page on the home page of the iQiyi App.

1.2 User feedback

In the iQiyi App, user behaviors on the feed stream page fall into two types:

Explicit feedback: positive interactions such as clicking to play, clicking the uploader’s profile picture, following, clicking on or posting comments, adding to favorites, clicking into a circle and sharing; and negative behaviors such as clicking dislike or reporting.
Implicit feedback: playing time, completion rate, quickly swiping past a video, etc.

1.3 Service Optimization Objectives

At first, the ranking target was click + duration. As the business developed, ranking also had to account for the ecosystem benefits of strong interactive behaviors such as comments and likes, and to reduce recommendations of negative content such as videos that users leave after only a short stay.

1.4 Multi-objective optimization results

iQiyi’s attempts and iterations in multi-objective modeling for its recommendation system brought positive gains to the short video recommendation business: per capita playing time increased by 7%+ and interactions increased by 20%+.

2. [Implementation method 1] CTR prediction model with fused weights

2.1 YouTube’s weighting strategy

2.1.1 Solution Introduction

Processing method: YouTube’s weighting strategy is a common technique in CTR estimation for video recommendation. The playing time of each positive sample is used as its sample weight, and the classifier is trained with these weights.
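As a minimal sketch of this weighting idea, the snippet below computes a weighted binary cross-entropy in plain Python. The function name `weighted_log_loss`, the unit weight for negative samples and the normalization by total weight are illustrative assumptions, not details given in this article.

```python
import math

def weighted_log_loss(labels, probs, playtimes):
    """Binary cross-entropy where each positive sample is weighted by its playtime."""
    total, weight_sum = 0.0, 0.0
    for y, p, t in zip(labels, probs, playtimes):
        w = t if y == 1 else 1.0           # assumption: negatives keep unit weight
        p = min(max(p, 1e-7), 1.0 - 1e-7)  # clip to avoid log(0)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
        weight_sum += w
    return total / weight_sum              # normalization is an illustrative choice

# The same two predictions cost more when the badly predicted positive
# (probability 0.1) is the one that was watched for a long time.
loss_long_missed = weighted_log_loss([1, 1], [0.9, 0.1], [1.0, 100.0])
loss_short_missed = weighted_log_loss([1, 1], [0.9, 0.1], [100.0, 1.0])
print(loss_long_missed > loss_short_missed)
```

Because the weight grows with raw playtime, long videos naturally dominate the loss, which is the bias discussed in the next subsection.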

2.1.2 Disadvantages of this approach

This approach makes the model assign high weights to long videos and recommend them first. In practice, whichever business metric (playing time or completion rate) is used as the sample weight, the model becomes biased by video length, which is not what we want.

2.2 Fused duration weights

iQiyi proposed a modeling method with fused duration weights. Online, it increased per capita playing time by 3% and UCTR by 0.2%. The method works as follows:

2.2.1 Weight calculation

First, a hypothesis is made for the business scenario: “The quality of a video is independent of its length and should be approximately evenly distributed across every interval of video length.” Under this hypothesis, the mean of the sample weights should be roughly the same constant C in every duration interval D_i. That is:

\frac{1}{\operatorname{count}(D_{i})} \sum_{d \in D_{i}} w(\text{playtime}, \text{duration}) = C

Video duration and playtime are both divided into equal-frequency buckets, as shown in the figure below:

Specific measures are as follows:

The playback samples within a time window are sorted by duration and split into 100 buckets so that every bucket contains the same number of played samples.
Within each duration bucket, the samples are sorted by playtime and split evenly into 100 buckets, and the weights are normalized to integers in the range [0, 99].

After this processing, the bucket coordinates of any given sample can be found from its (duration, playtime) pair, and the bucket determines its weight.
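The two bucketing steps can be sketched as follows. This is only an illustration under stated assumptions: `bucket_weights` is a hypothetical helper, samples are plain `(duration, playtime)` tuples, and the weight is simply the index of the playtime bucket inside the sample’s duration bucket.

```python
def bucket_weights(samples, n_buckets=100):
    """samples: list of (duration, playtime). Returns integer weights in [0, n_buckets - 1]."""
    n = len(samples)
    weights = [0] * n
    # Step 1: sort sample indices by video duration and cut them into equal-frequency buckets.
    order = sorted(range(n), key=lambda i: samples[i][0])
    for b in range(n_buckets):
        lo, hi = b * n // n_buckets, (b + 1) * n // n_buckets
        # Step 2: inside one duration bucket, rank samples by playtime.
        members = sorted(order[lo:hi], key=lambda i: samples[i][1])
        m = len(members)
        for rank, i in enumerate(members):
            # Map the rank to an equal-frequency playtime bucket index.
            weights[i] = rank * n_buckets // m
    return weights

# Toy data with 2 buckets: a 9 s play of a 10 s video gets the same weight
# as a 50 s play of a 100 s video.
toy = [(10, 1), (10, 9), (100, 5), (100, 50)]
print(bucket_weights(toy, n_buckets=2))  # [0, 1, 0, 1]
```

Since every duration bucket receives the same spread of weights, the mean weight is the same in each bucket, which matches the constant-mean hypothesis above.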

2.2.2 Weighting of playback duration

Next, the weights of samples with higher playtime are boosted as a whole, with the aim of optimizing the playing-time metric while controlling the model’s tendency toward long videos. The boosting function is shown in the following formula:

boost\_sigmoid(playtime) = \frac{Am}{1 + e^{-\frac{playtime + offset}{slope}}} + shift

Am: “upper bound”
shift: “lower bound”
offset: “time offset”
slope: slope

As you can see, boost_sigmoid increases as playtime increases, and offset, slope and shift can be adjusted to tune the weighting.
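A direct transcription of the formula into Python looks like the sketch below. The parameter values in the usage lines are made up for illustration; the article does not state the values iQiyi used.

```python
import math

def boost_sigmoid(playtime, am, offset, slope, shift):
    """Sigmoid-shaped boost: am / (1 + exp(-(playtime + offset) / slope)) + shift."""
    return am / (1.0 + math.exp(-(playtime + offset) / slope)) + shift

# Hypothetical settings: the curve is centred at playtime = -offset = 30 seconds.
params = dict(am=2.0, offset=-30.0, slope=10.0, shift=1.0)
for t in (0, 30, 60, 300):
    print(t, round(boost_sigmoid(t, **params), 4))
```

With these settings the output stays between shift and Am + shift, so very long plays cannot receive an unbounded weight.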

2.2.3 Adjusting weights for video age and user habits

Video age (user behavior time – video release time) is used to reduce the sample weight: the older the video, the lower the weight.
While keeping the task output efficient, separate weight configurations are generated for users on different platforms and refreshed periodically, so that they fit the recent consumption habits of the overall user base in a timely way.

2.2.4 Advantages and disadvantages of the scheme

Advantages:

Statistics of the sample distribution are used to fit users’ recent consumption habits. The model is easy to tune and can be iterated quickly online.

Disadvantages:

Adjusting sample weights changes the training loss, so the optimization process leans toward different objectives to different degrees. This is not explicit multi-objective modeling: the information is not fully used and the gains are limited.

3. [Implementation method 2] Multi-model fusion

3.1 Solution Introduction

A more direct approach is multi-model fusion: train one model per target, then at serving time estimate a score with each model and, taking business metrics and target priorities into account, fuse the scores by addition or multiplication for ranking. In iQiyi’s scenario, a binary classification model for clicks and a regression model for viewing duration were trained separately. The fusion parameters were obtained offline by grid search.
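The offline grid search over fusion parameters might look like the following sketch. Everything specific here is an assumption for illustration: the multiplicative form `click_prob ** a * pred_duration ** b`, the use of AUC against a single binary label as the offline metric, and the helper names `auc` and `grid_search_fusion`.

```python
from itertools import product

def auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly, ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def grid_search_fusion(click_prob, pred_duration, labels, grid):
    """Try every (a, b) on the grid and keep the pair with the best offline AUC."""
    best = None
    for a, b in product(grid, grid):
        fused = [(c ** a) * (d ** b) for c, d in zip(click_prob, pred_duration)]
        metric = auc(labels, fused)
        if best is None or metric > best[0]:
            best = (metric, a, b)
    return best  # (best_auc, a, b)

# Toy outputs of the two single-target models and a hypothetical binary label.
click_prob = [0.2, 0.8, 0.5, 0.4]
pred_duration = [50.0, 5.0, 40.0, 10.0]
labels = [1, 0, 1, 0]
print(grid_search_fusion(click_prob, pred_duration, labels, grid=[0.0, 1.0]))
```

The cost of this search grows exponentially with the number of fused models, which is one reason the approach becomes harder to maintain as targets are added.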

3.2 Advantages and Disadvantages of the scheme

Advantages:

Each model is trained for a single target, so optimization is simple and each single model is easy to tune to a “better” state.

Disadvantages:

It is difficult to quantify the importance of different targets, and therefore difficult to combine them.
Training multiple models offline consumes a lot of time and computing resources. Online estimation must call multiple models, which may increase complexity and latency.
The data distribution changes over time, so the models and the combination parameters must be updated, and the update timing must be decided.
When the data for a target is sparse, training for that target cannot draw on other information to train and iterate effectively.

4. [Implementation method 3] Multi-task learning: Network design and tuning

Given iQiyi’s community building and business trends, feed stream recommendation needs to:

Maintain or improve basic metrics such as user viewing duration, number of videos watched and click-through rate
Guide users to comment, like and interact in other ways

This is a typical multi-task, multi-objective learning scenario, and deep learning offers several approaches to it (for more, see “Multi-objective optimization and its application (including code)”: www.showmeai.tech/article-det…). iQiyi tried two of them:

(1) ESMM modeling
(2) MMoE modeling

After the latter was combined with Pareto optimization and iterated, it achieved the business improvement of “interaction rate increased by 20% and per capita playback time increased by 1.4%”.

4.1 ESMM modeling

4.1.1 Solution Introduction

Alibaba proposed ESMM [1] to model CTR and CVR. User behaviors in a recommendation scenario follow a sequential dependency: in e-commerce, conversion happens only after the user clicks, so the two can be modeled on the basis of that dependency.

As shown in the figure above, in iQiyi’s feed stream scenario the user’s viewing time and interactions occur after the click, so “click & duration” or “click & interaction” can serve as the direction for ESMM iteration. In iQiyi’s actual setup, interaction is the main task and click is the auxiliary task, duration is used as the positive-sample weight for both, and the two losses are added directly during offline training. The online experiment was flat to slightly positive. The team then changed the two estimated targets and tried a label definition based on longer durations, but there was still no significant online improvement.
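For a single sample, the ESMM-style loss described here can be sketched as below. ESMM supervises the click probability and the product p(click) × p(interaction | click) over the entire exposure space. How exactly duration enters as the positive-sample weight is not spelled out in this article, so the `duration_weight` handling and the function names are assumptions.

```python
import math

def bce(label, prob, weight=1.0):
    """Weighted binary cross-entropy for one sample."""
    prob = min(max(prob, 1e-7), 1.0 - 1e-7)  # clip to avoid log(0)
    return -weight * (label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

def esmm_loss(p_click, p_interact_given_click, clicked, interacted, duration_weight=1.0):
    """Click loss plus click-and-interaction loss, added directly as in the text."""
    # ESMM never supervises p(interaction | click) alone; it supervises the product.
    p_click_and_interact = p_click * p_interact_given_click
    both = 1 if (clicked and interacted) else 0
    # Assumption: duration only re-weights positive samples of each task.
    w_click = duration_weight if clicked else 1.0
    w_both = duration_weight if both else 1.0
    return bce(clicked, p_click, w_click) + bce(both, p_click_and_interact, w_both)

# A clicked video with no interaction: both towers output 0.5.
print(esmm_loss(0.5, 0.5, clicked=1, interacted=0))
```

Because the two losses are summed with fixed weight 1, the sparse interaction task can easily be dominated by the click task, which is one of the problems summarized below.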

4.1.2 Scheme summary

In the information feed scenario, the link between clicks and interactions is not as strong as the link between clicks and conversions in e-commerce, so the relationships between clicks and duration or interactions do not suit ESMM well.

Interaction behavior is very sparse, so training on it works poorly.
The losses of multiple targets are added directly, which makes it hard to balance their influence on each task and disturbs model training.
Different targets may differ greatly, making it difficult for them to share the underlying representation directly.

4.2 MMoE + Pareto optimization

iQiyi’s developers compared the “Top 100 videos by watch time” with the “Top 100 videos by comments” and found that the overlap was low and the rankings differed greatly.

Therefore, from a business perspective, the correlation between “duration” and “comments” is not strong, so Google’s MMoE [2][3] was considered. A joint loss needs a lot of hyperparameter tuning and can produce a seesaw effect in which one objective goes up while another goes down, so “Pareto optimization” was used to improve the interaction effect while ensuring that the original objective does not degrade.

4.2.1 Scheme Introduction

At the bottom of the MMoE model, soft parameter sharing is adopted, which handles multi-task learning effectively even when the two tasks are weakly correlated.
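To make the soft-sharing idea concrete, here is a minimal forward pass of an MMoE layer in plain Python. It is a toy sketch, not iQiyi’s network: the experts are single linear layers with ReLU, each tower is one linear unit with a sigmoid, and all weights are supplied by hand.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def linear(weight_rows, x):
    """Matrix-vector product: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight_rows]

def mmoe_forward(x, expert_ws, gate_ws, tower_ws):
    """Shared experts, one softmax gate per task, one tower per task."""
    expert_outs = [[max(0.0, v) for v in linear(w, x)] for w in expert_ws]  # ReLU experts
    hidden = len(expert_outs[0])
    preds = []
    for gate_w, tower_w in zip(gate_ws, tower_ws):
        gate = softmax(linear(gate_w, x))  # one mixing weight per expert, specific to this task
        mixed = [sum(g * e[j] for g, e in zip(gate, expert_outs)) for j in range(hidden)]
        logit = sum(w * h for w, h in zip(tower_w, mixed))
        preds.append(1.0 / (1.0 + math.exp(-logit)))
    return preds

x = [1.0, 2.0]
expert_ws = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]  # 2 experts, 2 hidden units each
gate_ws = [[[0.0, 0.0], [0.0, 0.0]], [[10.0, 0.0], [0.0, 0.0]]]    # task 0: uniform gate; task 1: prefers expert 0
tower_ws = [[0.0, 0.0], [1.0, 1.0]]
print(mmoe_forward(x, expert_ws, gate_ws, tower_ws))
```

Each task mixes the same experts with its own gate, so weakly related tasks can lean on different experts instead of being forced through one shared bottom.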

For an explanation of the MMoE method and example code, see our article “Multi-objective optimization and its application (including code)”: www.showmeai.tech/article-det…

Alibaba proposed applying Pareto optimization to multi-objective optimization in a paper [4] published at RecSys 2019. Instead of manually tuning the weights of the combined loss, the paper uses the KKT conditions to generate the weight of each objective.

The dashed box shows the Pareto optimization process:

The “updatable objective weights” and the “weight boundary hyperparameters” are initialized uniformly, and the PE-LTR algorithm is used to train the model and update the weights.
Multiple training runs are carried out with different “weight boundary hyperparameters”, and the best model is selected according to the importance of each objective.
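The weight update can be illustrated for the special case of two objectives, where the problem has a closed form: choose w1 (with w2 = 1 − w1) to minimize the norm of the combined gradient w1·g1 + w2·g2, then clip w1 so that each weight respects its lower boundary. This is a simplified sketch of the idea behind Pareto-efficient weighting, not the full PE-LTR algorithm from [4], which solves a constrained quadratic problem for any number of objectives. The function name and the boundary arguments `c1`, `c2` are assumptions.

```python
def pareto_weights_two_tasks(g1, g2, c1=0.0, c2=0.0):
    """Min-norm weights for two gradient vectors with lower bounds w1 >= c1, w2 >= c2."""
    assert c1 >= 0.0 and c2 >= 0.0 and c1 + c2 <= 1.0
    diff = [a - b for a, b in zip(g1, g2)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        w1 = 0.5  # identical gradients: any split gives the same combined gradient
    else:
        # Stationary point of ||g2 + w1 * (g1 - g2)||^2 with respect to w1.
        w1 = -sum(d * b for d, b in zip(diff, g2)) / denom
    # The objective is a 1-D convex quadratic, so clipping gives the constrained optimum.
    w1 = min(max(w1, c1), 1.0 - c2)
    return w1, 1.0 - w1

# Orthogonal gradients of equal size are balanced evenly ...
print(pareto_weights_two_tasks([1.0, 0.0], [0.0, 1.0]))
# ... unless a boundary value forces more weight onto the first objective.
print(pareto_weights_two_tasks([1.0, 0.0], [0.0, 1.0], c1=0.7))
```

The boundary values play the role of the “weight boundary hyperparameters”: they keep an important objective from being assigned a weight that is too small.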

4.2.2 Scheme summary

iQiyi’s engineering experience shows that the “weight boundary value” strongly affects model performance and needs several rounds of tuning. The multi-objective weights converge early in training and fluctuate slightly in the middle and late stages. The Pareto optimization scheme applies mainly to offline training; online serving still needs other strategies.

5. [Implementation method 4] Multi-task learning: Fusion scheme

In addition to the network structure design and optimization described above, iQiyi also optimized how the multiple objective outputs are combined at the model inference stage, and added “completion rate” and “duration” targets.

During online serving, the prediction scores of the different objectives are fused to coordinate and trade off the objectives, so that the model performs well on each sub-objective. Therefore, in multi-objective modeling, the combined loss should be optimized first so that every target performs well offline. The sub-objective scores are then fused for ranking to balance the objectives and improve them overall.

5.1 Multi-objective score multiplication fusion

In online inference, iQiyi fuses the scores with a hyperparameterized combination formula. The final multiplicative fusion improved the business metrics: “CTR up 1.5%, per capita playing time up 1%”.

iQiyi initially used a “weighted sum”. Because the score scales of the sub-objectives may differ, two hyperparameters, α and β, were added to make the fusion more flexible. The formula is as follows:

\text{score} = \sum_{i=1}^{n} \text{factor} \cdot \left(\alpha_{i} + \text{score}_{i}\right)^{\beta_{i}}

In the formula:

\alpha_{i}: hyperparameter, sensitivity;
\text{score}_{i}: output of model i;
\beta_{i}: hyperparameter, lifting ratio, nonlinear processing;
\text{factor}: hyperparameter, combination weight;
n: number of models.

The additive mode suits scenarios with few business objectives where quick short-term gains are wanted. However, as the number of objectives grows, the ranking ability of additive fusion becomes increasingly limited, in the following ways:

For a newly added objective, additive fusion is sensitive to the scale of the new score and must be retuned. In contrast, multiplicative fusion keeps the objectives relatively independent of one another.
As the number of objectives increases, the influence of each sub-objective’s importance weakens under additive fusion, whereas multiplicative fusion is not affected in this way.

On this basis, iQiyi switched the multi-objective fusion to multiplication. The formula is as follows (the parameters have the same meanings as in the formula above):

\text{score} = \prod_{i=1}^{n} \text{factor} \cdot \left(\alpha_{i} + \text{score}_{i}\right)^{\beta_{i}}
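Both fusion formulas can be written in a few lines of Python, which also makes the scale argument easy to check. In this sketch a separate factor is passed per objective, which is an assumption (the formula writes a single `factor`), and scores are assumed to be non-negative so that fractional powers are well defined.

```python
import math

def additive_fusion(scores, alphas, betas, factors):
    return sum(f * (a + s) ** b for s, a, b, f in zip(scores, alphas, betas, factors))

def multiplicative_fusion(scores, alphas, betas, factors):
    return math.prod(f * (a + s) ** b for s, a, b, f in zip(scores, alphas, betas, factors))

# Neutral hyperparameters so that only the fusion mode differs.
alphas, betas, factors = [0.0, 0.0], [1.0, 1.0], [1.0, 1.0]
video_a, video_b = [0.9, 1.0], [0.1, 2.0]       # two objective scores per video

def rescale(video, k):
    """Change the scale of the second objective, e.g. a new model with a different output range."""
    return [video[0], video[1] * k]

for k in (1.0, 0.01):
    a, b = rescale(video_a, k), rescale(video_b, k)
    add_prefers_a = additive_fusion(a, alphas, betas, factors) > additive_fusion(b, alphas, betas, factors)
    mul_prefers_a = multiplicative_fusion(a, alphas, betas, factors) > multiplicative_fusion(b, alphas, betas, factors)
    print(k, add_prefers_a, mul_prefers_a)
```

In this toy case the additive ranking of the two videos flips when the second objective is rescaled, while the multiplicative ranking stays the same, because a constant scale factor multiplies every video’s product equally.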

5.2 Modeling more business-related objectives

To increase the penetration of short videos and encourage deeper consumption, iQiyi’s developers constructed and optimized three new objectives, achieving “UCTR up 1%, CTR up 3%, and per capita playing time up 0.6%”. The objectives are defined as follows:

A threshold on the “completion rate” is used to construct a “binary classification target”, which approximately satisfies the assumption of logistic regression.
The “playing time”, after fitting and smoothing, is used as a “regression target”.
A threshold on the “play duration” is used to construct a “binary classification target” for effective plays.

PNR (Positive-Negative Ratio) is used to evaluate the ranking after the multi-objective estimates are fused. In the end, playback volume and per capita playing time increased significantly in both the home page feed stream and the iQiyi immersive scenario.

5.3 PSO evolutionary optimization algorithm

The overall steps of the multi-objective score fusion method described above are:

Use grid search offline to obtain a set of hyperparameters that performs well offline
Run an online A/B test to verify the actual effect

This process iterates very slowly. In addition, as the model iterates and the sample distribution changes, the optimal parameter set also changes, so the result is not stable.

Therefore, iQiyi’s R&D engineers borrowed the idea of multi-objective evolutionary optimization and searched for the fusion parameters with the heuristic Particle Swarm Optimization (PSO) algorithm, to approach the multi-objective Pareto frontier.

The PSO algorithm initializes a group of random particles and iterates heuristically many times to find the optimal solution. At each iteration, every particle updates its position using its individual best (the best solution that particle has visited) and the population best (the best solution the whole population has found). In this way, all particles balance their own historical optimum against the global optimum shared by the group until convergence.
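The update rule described above can be sketched as a small, generic PSO maximizer. The inertia and acceleration coefficients are common textbook defaults rather than iQiyi’s settings, and the quadratic `toy_objective` merely stands in for an offline evaluation of the fusion parameters.

```python
import random

def pso_maximize(objective, bounds, n_particles=20, n_iters=50,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic particle swarm search for the maximum of objective inside box bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # individual best positions
    pbest_val = [objective(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # population best
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # pull toward individual best
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # pull toward population best
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay inside the search box
            val = objective(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def toy_objective(params):
    """Hypothetical stand-in for the offline metric; its maximum is at (1, -2)."""
    return -((params[0] - 1.0) ** 2 + (params[1] + 2.0) ** 2)

best_params, best_value = pso_maximize(toy_objective, [(-5.0, 5.0), (-5.0, 5.0)], n_iters=100)
print(best_params, best_value)
```

Each evaluation of the objective only needs offline scores and labels, so the search can be rerun automatically whenever the model or the sample distribution changes.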

With PSO, the fusion parameters are first initialized. The final ranking score of each video is obtained by multiplicative fusion, and the AUC of each sub-objective is computed. Weights are then assigned according to the importance of each metric, such as the AUC of the completion-rate classification and the PNR of playing time, and the overall optimization objective is defined as follows:

Obj = w_{1} \cdot AUC(\text{ctr}) + w_{2} \cdot AUC(\text{comment}) + w_{3} \cdot PNR(\text{playtime}) + \cdots

Finally, the overall evaluation objective Obj is maximized through continuous iteration, which yields the hyperparameters α and β of each sub-objective. The following figure shows the convergence curve of the total objective score Obj against the number of iteration steps.

The PSO parameter search lets the model and the fusion parameters be updated together and greatly reduces the cost of manual parameter tuning. For more on Particle Swarm Optimization, see www.showmeai.tech/article-det…

6. Reference code implementation

For implementations of some of the multi-objective optimization methods on sample data (the WeChat multi-objective optimization dataset), see the GitHub project github.com/ShowMeAI-Hu

To download the relevant datasets, reply “iQiyi multi-task” in the public account (AI Algorithm Research Institute).

7. References

[1] Ma X, Zhao L, Huang G, et al. Entire space multi-task model: An effective approach for estimating post-click conversion rate[C]//The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. 2018: 1137-1140.

[2] Ma J, Zhao Z, Yi X, et al. Modeling task relationships in multi-task learning with multi-gate mixture-of-experts[C]//Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2018: 1930-1939.

[3] Zhao Z, Hong L, Wei L, et al. Recommending what video to watch next: a multitask ranking system[C]//Proceedings of the 13th ACM Conference on Recommender Systems. 2019: 43-51.

[4] Lin X, Chen H, Pei C, et al. A pareto-efficient algorithm for multiple objective optimization in e-commerce recommendation[C]//Proceedings of the 13th ACM Conference on Recommender Systems. 2019: 20-28.

8. Resource downloads