Exploration and Application of Takeout Package Matching

This is the third article in the takeout food knowledge graph series. It introduces the technical solutions behind takeout package matching: the offline and real-time iterations of the matching algorithm, the package quality evaluation scheme, and the business applications of package matching.

1. Background

Meituan Takeout has long worked to help users choose satisfying takeout dishes more conveniently and quickly. This article introduces the package matching technology and its application practice for food merchants. When ordering takeout, users generally weigh factors such as single-dish preference and how dishes combine, so choosing a merchant and dishes takes a long time. Package matching technology automatically assembles high-quality packages from a merchant’s candidate dishes, which eases users’ “difficulty in choosing” and improves their decision-making efficiency.

2. Business goals and challenges

2.1 Business Objectives

At present, the Meituan Takeout app has many package matching applications, including “Today’s Package Recommendation”, “Full-Reduction Artifact”, and “Package Matching Recommendation”. Because merchants currently have little ability or willingness to assemble packages themselves, the underlying package supply covers few business scenarios and merchants, and cannot meet the needs of package-related recommendation and ordering applications. The business goal of package matching is therefore to generate candidate package combinations for food merchants and provide a richer package supply for package-related applications.

We analyzed the package-related applications from a business perspective. Applications such as “Today’s Package Recommendation” and “Full-Reduction Artifact” have relatively weak matching conditions that can be obtained offline; we classify them as recommendation-type businesses. These businesses need high package coverage of merchants so that merchants’ recommendations are exposed. Applications such as the dish details page and the full-reduction add-on purchase have strong, partly real-time matching conditions. For example, on the details page the user specifies a dish to match around, and in the full-reduction add-on scenario the dish the user selected and a specific price range serve as conditions. We classify these as matching-type businesses, which need high package coverage of real-time scenarios so that the package matching tab is exposed. The objectives of the package matching algorithm are therefore: (1) improve the coverage of package combinations, providing downstream package-related applications with combinations that cover many scenarios and are sufficiently diverse; (2) ensure package quality.

2.2 Business Challenges

Product matching also has many applications in e-commerce, such as Taobao’s shopping-cart matching, clothing matching, and cosmetics matching. Shopping-cart matching recommends products based on what is in the user’s cart or what the user has bought. For example, after the user adds a toothbrush, toothpaste can be recommended. Such methods mainly make related-item recommendations based on purchase behavior; their goal is not to form a complete combination. Takeout dish matching, however, must consider the reasonableness of the combination as a whole rather than only whether items are related. For example, many orders contain combinations such as “stir-fried pork + tomato and egg soup + rice” and “yuxiang shredded pork + tomato and egg soup + rice”, yet “tomato and egg soup + rice” on its own is not a good package.

Clothing matching and cosmetics matching are combination-oriented recommendations. Solutions to such problems fall roughly into two categories. In the first, matching patterns are used to prune the model’s item selection process; the patterns can be given manually or learned as a prior by a model. References 4 and 5 adopt this idea. In this approach, matching quality is guaranteed by the pruning strategy and a quality evaluation model. In the second, the matching patterns are learned implicitly through the parameters of an end-to-end network; reference 6 and our offline package matching adopt this idea. Here matching quality depends more on the end-to-end model, and the matching model is more complex.

Compared with product matching in e-commerce, food matching faces unique business challenges:

Business scenarios and matching conditions are diverse, so the package matching scheme must serve a variety of businesses and conditions.
Food items are non-standard products, and different merchants sell different items, so matching patterns differ between merchants. For example, kung pao chicken differs between merchants in portion size, taste, ingredients, and price, so it is matched in different ways.
Algorithmic matching inevitably produces some low-quality results, and the non-standard nature of the items makes it harder to measure matching quality. Low-quality results include: (a) packages containing non-food items that are not suitable for separate sale, such as gifts, pots, and tableware; (b) packages that do not follow conventional matching patterns, such as two drinks, or a drink plus steamed bread.

To this end, our solution is:

To handle the diversity of business scenarios and matching conditions, we built an algorithm framework that combines offline and real-time matching. For recommendation-type businesses, we pre-generate package candidates offline and then rank them per user within each business scenario. Offline matching iterated from rules to models. Rule-based matching relies on the knowledge graph’s representation of items and produces relatively high-quality packages through high-frequency aggregation plus rule-based generalization, which ensures coverage of head merchants. Model-based matching maintains matching quality while improving scenario coverage through model generalization. For real-time matching-type businesses, the algorithm assembles packages in real time according to the business’s matching conditions, further improving package coverage in real-time scenarios.
To handle the non-standard nature of food items, we introduced the takeout food knowledge graph to describe dishes along multiple dimensions. From it we extract rich representations of dishes, such as standard dish name, dish category, taste, ingredients, and cooking method, which reduces the impact of non-standard items.
To ensure package quality, we developed a package quality evaluation model.

Overall, we explored and iterated on the representation of non-standard items, merchant representation, the package matching model, and package quality evaluation, forming the package matching framework shown in Figure 2 below.

3. Package matching model

3.1 Package matching model based on knowledge graph label induction

One problem we face is that takeout items are non-standard: dish data is of poor quality and attributes are missing. Therefore, drawing on menus, recipes, product descriptions, and other sources, we built a knowledge graph centered on dishes through information extraction, relation identification, and knowledge fusion, and established a multi-dimensional representation of dishes covering category, taste, cooking method, efficacy, and more.

A merchant’s historically best-selling packages can generally be considered high-quality. However, merchants with medium or low sales have few best-selling packages, too few to support applications such as personalized package recommendation. Relying on the semantic representation in the food knowledge graph, we first tried a matching scheme based on direct induction and deduction over the graph. For example, from high-frequency orders we can induce that {hot dish} + {rice} + {soup} is a common matching pattern, and then deduce for a given merchant the package “tomato scrambled egg + tomato egg soup + rice”.

Induction and deduction over the knowledge graph amounts to high-frequency aggregation followed by generalization from matching templates. By aggregating orders and generalizing across the same brand, the same labels, and the same dish templates, we generated high-quality packages and significantly improved package coverage of merchants. The problem with matching templates is that it is hard to balance matching quality against generalization. Strongly constrained templates ensure quality but generalize poorly and yield low package coverage. If items are described by only one or a few labels, the templates over-generalize and accuracy cannot be guaranteed. We therefore introduced a model-based matching method.
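The induction and deduction steps can be illustrated with a minimal sketch. It assumes each dish carries a single category label and that a template is simply the sorted tuple of labels in an order; the function names `mine_templates` and `deduce_packages`, the `min_count` threshold, and the sample data are all hypothetical and not taken from the production system.

```python
from collections import Counter
from itertools import product

# Hypothetical historical orders: each order is a list of (dish name, category label).
orders = [
    [("stir-fried pork", "hot dish"), ("tomato egg soup", "soup"), ("rice", "rice")],
    [("yuxiang shredded pork", "hot dish"), ("tomato egg soup", "soup"), ("rice", "rice")],
    [("kung pao chicken", "hot dish"), ("rice", "rice")],
]

def mine_templates(orders, min_count=2):
    """Induction: aggregate orders by their sorted category labels, keep frequent ones."""
    counts = Counter(tuple(sorted(label for _, label in order)) for order in orders)
    return [template for template, n in counts.items() if n >= min_count]

def deduce_packages(menu, templates):
    """Deduction: fill every template slot with the merchant's dishes of that category."""
    by_label = {}
    for name, label in menu:
        by_label.setdefault(label, []).append(name)
    packages = []
    for template in templates:
        slots = [by_label.get(label, []) for label in template]
        packages.extend(product(*slots))  # an empty slot yields no package
    return packages

templates = mine_templates(orders)
menu = [("tomato scrambled egg", "hot dish"), ("tomato egg soup", "soup"), ("rice", "rice")]
packages = deduce_packages(menu, templates)
print(templates)  # frequent category-level templates
print(packages)   # packages deduced for this merchant
```

With a single label per slot, every hot dish on the menu can fill the hot-dish slot, which is exactly where over-generalization comes from; adding more labels to a slot narrows the candidates but lowers coverage.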

3.2 Package matching model based on Encoder-Decoder

When a user assembles a package, the process also runs from encoding information to producing output: the user browses the merchant’s menu (the encoding step), forms an overall picture of the merchant and its items, and then assembles a package based on that picture. A natural fit is to build the matching model with the Encoder-Decoder framework. The Encoder mimics browsing the menu and learns its semantic information, and the Decoder assembles the package. Encoder-Decoder is a deep learning framework widely used in text summarization, machine translation, dialogue generation, and other applications. It learns the mapping from the Encoder’s input to the Decoder’s output through encoding (feature extraction) and decoding (target fitting). Common encoder structures include CNN, RNN, and Transformer, and decoders use similar structures.

3.2.1 Package matching model based on LSTM

Package generation means selecting subsets of items from a merchant’s full candidate set to form packages that users can conveniently choose and order directly. The input data is mainly the merchant’s candidate item information (name, labels, price, sales volume, and so on), combined with constraints such as price range and number of diners, and with information such as user preferences. At first we used LSTM as the network for both the Encoder and the Decoder. We extract semantic representations of items based on knowledge graph semantics and feed them into the Encoder’s RNN. Encoding resembles a user looking through the merchant’s candidate items: the Encoder takes the dish name, dish labels, and business attributes (price, sales volume, and so on) as input and extracts features of the non-standard dishes through the LSTM. As shown in Figure 4 below, each item’s name passes through an Embedding layer and a CNN + Pooling layer to extract semantic features, which are concatenated with features such as dish labels, category, price, and sales volume; the result is the input to each step of the Encoder RNN.

A Decoder usually relies on a fixed vocabulary as its candidate set and at each step outputs a probability distribution over the characters or words in that set. For the package generation network, the Decoder’s candidate set is the list of items in the merchant’s menu fed to the Encoder, not an external dish vocabulary of fixed size. Pointer Network is an effective framework for modeling this. It is an extension of Seq2Seq that mainly addresses candidate sets of variable size. This architecture has been applied successfully to text summarization and to combinatorial optimization problems such as the traveling salesman problem and the convex hull problem.

In package decoding, the Decoder estimates at each step a probability distribution over the target dishes in the menu. At step n (n >= 1), this distribution gives the probability of each item, or the termination token, being selected given that n-1 items have already been selected. If the termination token has a high probability, the model tends to close the package with the n-1 items selected so far. During decoding we use beam search to generate the top-N results, which ensures diversity.
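The decoding loop can be sketched as follows. This is only an illustration under stated assumptions: `toy_score` is a hand-written stand-in for the trained Decoder, and `STOP`, `step_distribution`, `beam_search`, `beam_width`, and `max_items` are hypothetical names. What it keeps from the description above is that the candidate set at every step is the merchant’s own menu plus a termination token, and that beam search retains several partial packages in parallel.

```python
import math

STOP = "<stop>"  # termination token: selecting it closes the package

def step_distribution(menu, selected, score_fn):
    """Softmax over the merchant's unselected dishes plus STOP (pointer-style candidates)."""
    candidates = [d for d in menu if d not in selected] + [STOP]
    logits = [score_fn(selected, c) for c in candidates]
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {c: e / total for c, e in zip(candidates, exps)}

def beam_search(menu, score_fn, beam_width=3, max_items=4):
    """Return up to beam_width distinct packages ranked by log-probability."""
    beams = [((), 0.0)]  # (selected dishes, cumulative log-probability)
    finished = {}        # dish set -> best (selected, log-probability)
    while beams:
        expanded = []
        for selected, logp in beams:
            for cand, p in step_distribution(menu, selected, score_fn).items():
                new_logp = logp + math.log(p)
                if cand != STOP:
                    if len(selected) < max_items:
                        expanded.append((selected + (cand,), new_logp))
                elif selected:  # ignore the empty package
                    key = frozenset(selected)
                    if key not in finished or new_logp > finished[key][1]:
                        finished[key] = (selected, new_logp)
        beams = sorted(expanded, key=lambda b: b[1], reverse=True)[:beam_width]
    ranked = sorted(finished.values(), key=lambda b: b[1], reverse=True)
    return ranked[:beam_width]

# Toy stand-in for the trained decoder (an assumption for illustration only):
# it dislikes repeating a category and prefers to stop once a hot dish and a staple are chosen.
CATEGORY = {"kung pao chicken": "hot dish", "tomato egg soup": "soup",
            "rice": "staple", "cola": "drink"}

def toy_score(selected, candidate):
    have = {CATEGORY[d] for d in selected}
    if candidate == STOP:
        return 2.0 if {"hot dish", "staple"} <= have else -2.0
    return -1.0 if CATEGORY[candidate] in have else 1.0

results = beam_search(list(CATEGORY), toy_score)
for dishes, logp in results:
    print(dishes, round(logp, 3))
```

Packages that differ only in selection order are merged by their dish set, so the returned top-N results are distinct combinations rather than permutations of one combination.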

3.2.2 Optimization of the package matching model

Learning objectives of the package matching model

Because matching patterns vary from merchant to merchant, the model learns each merchant’s matching characteristics by fitting that merchant’s historical orders. The mainstream approach trains on the merchant’s real orders with Teacher Forcing, so that the dishes the model predicts match the dishes in the real order one by one. Teacher Forcing pushes the predicted dish probabilities toward a one-hot (0-1) distribution, whereas real dish matching is usually personalized and diverse. For example, after the Decoder outputs “Kung Pao chicken”, the next staple could reasonably be either “rice” or “fried rice”.

To address this, we compute statistics over the matching patterns in the merchant’s historical orders and derive a probability distribution over item selection. The Decoder takes this distribution as its training target, computes the MSE loss against its estimated distribution, and minimizes it to guide training. Another problem with Teacher Forcing is that it is hard to bring in external knowledge, such as matching quality and click and purchase behavior on packages, to guide training. We therefore tried improving training with reinforcement learning. At step t of decoding, complete package candidates are obtained through Monte Carlo sampling, and their matching quality scores are used as the reward. The model is trained with the MSE loss combined with the matching quality score.
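The difference between a one-hot target and a statistical target can be seen in a small sketch. It assumes that orders are treated as ordered sequences and that the target at each step is the empirical distribution of the next dish given the dishes already chosen; `next_dish_distribution`, `mse_loss`, and the sample orders are hypothetical.

```python
from collections import Counter

def next_dish_distribution(orders, prefix, menu):
    """Empirical distribution over the dish that follows `prefix` in historical orders."""
    k = len(prefix)
    counts = Counter(
        order[k] for order in orders
        if len(order) > k and list(order[:k]) == list(prefix)
    )
    total = sum(counts.values())
    return [counts[d] / total if total else 0.0 for d in menu]

def mse_loss(predicted, target):
    """Mean squared error between two distributions over the same menu."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)

menu = ["kung pao chicken", "rice", "fried rice"]
orders = [
    ["kung pao chicken", "rice"],
    ["kung pao chicken", "rice"],
    ["kung pao chicken", "rice"],
    ["kung pao chicken", "fried rice"],
]

soft_target = next_dish_distribution(orders, ["kung pao chicken"], menu)
one_hot_target = [0.0, 1.0, 0.0]   # what Teacher Forcing on a single order would demand
predicted = [0.0, 0.7, 0.3]        # a decoder output that keeps both staples plausible

print(soft_target)
print(mse_loss(predicted, soft_target), mse_loss(predicted, one_hot_target))
```

A prediction that spreads probability across both staples is penalized far less against the statistical target than against the one-hot target, which is the behavior the training objective is meant to encourage.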

Package matching constraints

Package matching is subject to various business constraints. For example, for the “Full-Reduction Artifact”, the package must fall within the given full-reduction price range. For the “Smart Assistant”, matching must respect the filter conditions the user selected, for example “staple food: rice” and “price under 30 yuan”. We use a pruning strategy to ensure that matching satisfies these constraints. Taking the price range constraint of the “Full-Reduction Artifact” as an example, when the Decoder produces candidate dishes at each step, it filters out dishes whose price exceeds the remaining budget. As shown in Figure 6 below, for a merchant’s dishes A, B, C, D, and E, the Decoder prunes the next-step candidates A, B, C, D, and E with the remaining price range “within 15 yuan” and removes dishes C and D, which exceed it.
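A minimal sketch of this pruning step, assuming it is implemented as a mask over the Decoder’s step distribution followed by renormalization. The prices and probabilities below are invented for illustration (the text only states that C and D exceed the remaining 15 yuan), and `mask_and_renormalize` is a hypothetical name.

```python
def mask_and_renormalize(probs, prices, remaining_budget):
    """Drop dishes priced above the remaining budget, then renormalize the rest."""
    kept = {dish: p for dish, p in probs.items() if prices[dish] <= remaining_budget}
    total = sum(kept.values())
    return {dish: p / total for dish, p in kept.items()} if total > 0 else {}

# Hypothetical prices (yuan) and decoder probabilities for dishes A-E.
prices = {"A": 8, "B": 12, "C": 18, "D": 22, "E": 15}
probs = {"A": 0.30, "B": 0.20, "C": 0.25, "D": 0.15, "E": 0.10}

pruned = mask_and_renormalize(probs, prices, remaining_budget=15)
print(pruned)  # only A, B and E remain as candidates
```

Other constraints, such as a required staple type, can be expressed the same way by swapping the price test for a different predicate.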

Package matching model based on Attention network

Extracting merchant dish features with an LSTM network has two problems. First, the dishes in a merchant’s menu are unordered, while RNNs rely on sequence order for modeling. Second, there may be long-distance semantic dependencies between dishes. For example, whether the menu contains “rice”, “steamed bread”, or similar dishes affects how “Kung Pao chicken” is matched.

To better capture the unordered menu and the dependencies between dishes, we tried an Encoder-Decoder model based on the Attention structure. The Encoder uses a hierarchical Attention structure to extract dish semantics: a lower level of Attention within each dish and a level of Attention between dishes. Within a dish, multi-head Attention over the word dimension yields the semantic vector of the dish name, and multi-head Attention over the dish labels yields the semantic vector of the labels. For a dish’s transaction attributes, we use a multi-layer fully connected network to extract the semantic vector of the transaction features.

Finally, the semantic vectors of the dish name, dish labels, and transaction features are concatenated and passed through a fully connected layer plus layer normalization to obtain the dish’s semantic vector. For the Attention layer between dishes, we apply multi-head Attention to the merchant’s dish semantic vectors to obtain menu-level semantic vectors. The Decoder also decodes with multi-head Attention. Its input includes user preference information, the decoder inputs from previous steps, and context such as price constraints. At each step the model outputs a probability distribution over the dishes in the merchant’s menu. In the Decoder we apply multi-head Attention between the user preference information and the menu-level semantic vectors, so that the user’s dining preferences are taken into account during matching.
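Why attention suits an unordered menu can be shown with a deliberately reduced sketch: a single attention head with identity projections, written in plain Python. The real model uses multi-head Attention with learned projections, so this is an assumption-laden simplification; the dish vectors and the name `self_attention` are hypothetical. The point it demonstrates is that each dish’s output depends on the set of dishes in the menu, not on the order in which they are listed.

```python
import math

def self_attention(vectors):
    """Single-head scaled dot-product self-attention with identity projections."""
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in vectors]
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Each output is a weighted average of all dish vectors in the menu.
        outputs.append([sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)])
    return outputs

# Hypothetical 3-d dish vectors.
dishes = {
    "kung pao chicken": [1.0, 0.2, 0.0],
    "rice": [0.1, 1.0, 0.3],
    "steamed bread": [0.0, 0.9, 0.4],
}

names = list(dishes)
out_original = dict(zip(names, self_attention([dishes[n] for n in names])))

shuffled = list(reversed(names))
out_shuffled = dict(zip(shuffled, self_attention([dishes[n] for n in shuffled])))

for name in names:
    print(name, [round(x, 3) for x in out_original[name]])
```

Reordering the menu changes only the order of the outputs, not their values, whereas an RNN encoder would generally produce different states for the two orderings.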

3.2.3 Analysis of the package matching model

We believe that a merchant’s high-quality combinations are reflected in order sales volume. One evaluation method is therefore to measure how well the model’s output packages cover the merchant’s real best-selling packages. Through offline and online evaluation, we found that the model can fit merchants’ best-selling packages. In manual evaluation, we mixed algorithm-generated packages with real orders and asked human evaluators to tell them apart; they could not distinguish model-generated combinations from real orders. The model also generalizes well and significantly improves package coverage of merchants and of specific business scenarios.

We analyzed the dish representation vectors output by the model to understand the matching patterns it has learned. We used t-SNE to reduce the vectors’ dimensionality and visualize their clustering, and observed that “staple food”, “main course”, and “snack” dishes each cluster together. The model thus recognizes semantic attributes such as “staple food”, “main course”, and “snack” and matches packages according to them.

Staple food: top-N dishes similar to “wonton”	Similarity	Dish category: top-N dishes similar to “braised pork in soy sauce”	Similarity
Chicken Soup Wonton	0.981	Cucumber and beef	0.975
Split hot and sour powder	0.979	Fresh mushroom beef	0.977
Pork wonton	0.975	Maojia braised pork	0.980
Beef noodle soup	0.975	Chinese cabbage sausage	0.973
Skin belly fat sausage noodles	0.974	Mix the small intestine	0.976
Fried seafood udon noodles	0.974	Bath chap	0.981
Green onion pot stickers	0.973	Braised small potatoes	0.975
Split rice flour	0.971	Mix beef	0.980
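A nearest-neighbor lookup of this kind can be sketched as below. The article does not state which similarity measure produced the scores in the table, so cosine similarity is an assumption here, and the embeddings and the name `top_n_similar` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def top_n_similar(query, vectors, n=2):
    """Rank all other dishes by similarity to the query dish."""
    scored = [(name, cosine(vectors[query], vec))
              for name, vec in vectors.items() if name != query]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:n]

# Hypothetical 3-d dish embeddings; real vectors come from the trained model.
vectors = {
    "wonton": [0.9, 0.1, 0.0],
    "chicken soup wonton": [0.85, 0.15, 0.05],
    "beef noodle soup": [0.8, 0.3, 0.1],
    "braised pork in soy sauce": [0.1, 0.9, 0.2],
}

similar = top_n_similar("wonton", vectors)
for name, score in similar:
    print(name, round(score, 3))
```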

3.3 Real-time package matching model

Generating package candidates through offline matching meets the needs of recommendation-type businesses, but coverage is still insufficient for some matching-type scenarios. For example, offline matching currently covers only a small share of dishes, so for applications such as the dish details page the matching module is exposed on only part of page views (PV).

One solution is to raise dish coverage through more offline matching, but its storage cost is high, so we adopted a real-time matching scheme. The difficulty of real-time generation lies in ensuring package quality, satisfying the various matching conditions, and above all meeting latency requirements. Initially we applied the offline matching model online and found a performance bottleneck, so we simplified the offline model. The idea is to reduce dish selection to dish-category selection and dish-to-dish matching relations to category-to-category matching relations, which shrinks the whole solution space. As shown in Figure 8 below, the process is as follows:

Matching template mining: from merchants’ historical orders, mine the best-selling matching relations at the category level, that is, matching templates such as “hot dish + staple food”.
Search pruning: select dishes category by category according to the template. In the example above, first choose a “hot dish” and then a “staple food”. During selection, the search is pruned according to the user’s real-time requirements, such as required dishes, a specified price, or a specified staple type.
Screening and evaluation: after matching, the quality of the candidate results is evaluated. For performance reasons, a tree model is used for quality evaluation to select the top-N results.
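The three steps above can be sketched as a small search routine. This is an illustration only: the template is given rather than mined, the scoring function is a placeholder standing in for the tree model (here it simply prefers cheaper packages), and `realtime_match`, `must_include`, and `max_price` are hypothetical names.

```python
def realtime_match(menu, template, must_include=None, max_price=None, score_fn=None, top_n=2):
    """Fill a category-level template slot by slot, pruning by real-time constraints."""
    by_category = {}
    for dish in menu:
        by_category.setdefault(dish["category"], []).append(dish)
    results = []

    def search(slot, chosen, price):
        if max_price is not None and price > max_price:
            return  # prune: the partial package already exceeds the price limit
        if slot == len(template):
            names = [d["name"] for d in chosen]
            if must_include is None or must_include in names:
                results.append((names, price))
            return
        for dish in by_category.get(template[slot], []):
            search(slot + 1, chosen + [dish], price + dish["price"])

    search(0, [], 0)
    # Placeholder quality score (assumption): cheaper packages rank higher.
    score = score_fn or (lambda names, price: -price)
    return sorted(results, key=lambda r: score(*r), reverse=True)[:top_n]

# Hypothetical merchant menu.
menu = [
    {"name": "kung pao chicken", "category": "hot dish", "price": 22},
    {"name": "mapo tofu", "category": "hot dish", "price": 16},
    {"name": "rice", "category": "staple food", "price": 2},
    {"name": "fried rice", "category": "staple food", "price": 12},
]

matches = realtime_match(menu, ["hot dish", "staple food"], must_include="rice", max_price=30)
for names, price in matches:
    print(names, price)
```

Because the search walks categories rather than arbitrary dish subsets, the number of candidates is bounded by the product of the slot sizes, which is what keeps the real-time path cheap.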

4. Package quality assessment

Even best-selling orders include packages of lower matching quality. Combined with the precision loss from model generalization, the matching model can easily generate low-quality combinations. On the right side of Figure 9 below, the last two packages generated by the model are not particularly reasonable. To further protect the user experience, we built a package quality model that evaluates all packages uniformly. The model treats package quality as a classification problem. Because a package consists of multiple items, the model first represents each dish from information such as its name and labels, then uses Global Attention to weigh the importance of the dishes, and adds global features such as the total number of items to represent the package as a whole. The model structure is shown in Figure 9:

We grade package quality at a fine granularity. The model has four output values, each representing the probability that the corresponding bit is 1. “Very bad” is encoded as “1,0,0,0”, “poor” as “1,1,0,0”, “medium” as “1,1,1,0”, and “good” as “1,1,1,1”. The model uses a Pair Hinge Loss to avoid cases where an earlier node outputs 0 while a later node outputs 1, which improves accuracy. The package’s matching quality score is the average of the four output nodes, which makes the predicted value more reliable. The model structure is otherwise essentially the same as that of a classification model, and the objective function is as follows:
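The encoding and scoring described above can be sketched in a few lines. The hinge term shown here is one assumed form of a pairwise ordering penalty and is not necessarily the objective used by the authors; `LEVELS`, `encode`, `quality_score`, `order_penalty`, and the `margin` parameter are hypothetical names.

```python
LEVELS = ["very bad", "poor", "medium", "good"]

def encode(level):
    """Cumulative (ordinal) target: 'medium' -> [1, 1, 1, 0]."""
    rank = LEVELS.index(level) + 1
    return [1 if i < rank else 0 for i in range(len(LEVELS))]

def quality_score(outputs):
    """Matching quality score: the average of the four output nodes."""
    return sum(outputs) / len(outputs)

def order_penalty(outputs, margin=0.0):
    """Assumed pairwise hinge term: penalize any later node that exceeds the node before it."""
    return sum(max(0.0, later - earlier + margin)
               for earlier, later in zip(outputs, outputs[1:]))

for level in LEVELS:
    print(level, encode(level), quality_score(encode(level)))

consistent = [0.9, 0.8, 0.6, 0.1]    # monotonically non-increasing outputs
inconsistent = [0.2, 0.9, 0.1, 0.0]  # second node is higher than the first
print(order_penalty(consistent), order_penalty(inconsistent))
```

With this cumulative encoding, averaging the four nodes maps the four grades onto evenly spaced scores, and the ordering penalty is zero exactly when the outputs never increase from one node to the next.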

When building the package quality model, negative samples came mainly from bad cases reported by users and from packages filtered out by manually constructed unreasonable matching patterns. The problems with this approach are that such negative samples are biased and lack diversity, and that the ratio of negative to positive samples is hard to adjust.

To this end, we introduced a pre-training task that learns the matching patterns of historical orders, bringing more prior matching knowledge into the quality model. The pre-training process is shown in Figure 11 below. We randomly mask one dish in a combination and train a Transformer model to restore the masked dish. To allow for reasonable suboptimal packages (for example, given “kung pao chicken + rice + cola”, if “kung pao chicken” is masked and the generator outputs “braised pork”, then “braised pork + rice + cola” can be regarded as a suboptimal package), we add to the loss function a discriminator that predicts whether the category of the generated dish is similar to that of the target dish. The pre-trained parameters are used to initialize the package quality classification model, which is then fine-tuned on a small amount of manually annotated data.
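How a pre-training example might be constructed is sketched below. This covers only the data side of the task, not the Transformer itself. It assumes category similarity is simplified to exact category equality, and `make_masked_example`, `same_category`, `MASK`, and the category table are hypothetical.

```python
import random

MASK = "[MASK]"

def make_masked_example(order, rng):
    """Mask one randomly chosen dish; return (masked order, position, target dish)."""
    position = rng.randrange(len(order))
    masked = list(order)
    target = masked[position]
    masked[position] = MASK
    return masked, position, target

def same_category(predicted, target, category_of):
    """Auxiliary label: 1 if the generated dish shares the target's category, else 0."""
    return int(category_of[predicted] == category_of[target])

# Hypothetical category table.
category_of = {
    "kung pao chicken": "hot dish",
    "braised pork": "hot dish",
    "rice": "staple food",
    "cola": "drink",
}

order = ["kung pao chicken", "rice", "cola"]
masked, position, target = make_masked_example(order, random.Random(0))
print(masked, position, target)

# "braised pork" in place of "kung pao chicken" is wrong as an exact match
# but still counts as a reasonable (suboptimal) fill under the auxiliary label.
print(same_category("braised pork", "kung pao chicken", category_of))
print(same_category("cola", "kung pao chicken", category_of))
```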

5. Application and future outlook of package matching

Meituan Takeout has built a number of products with packages as the core supply. “Today’s Package Recommendation” helps users who do not know what to eat or are slow to choose. The “Full-Reduction Artifact” and “single-item matching recommendation” on the store page help users who struggle to reach a discount threshold or to combine dishes. To address package matching across these business scenarios, the algorithm has been continuously optimized for coverage, quality, and diversity, providing important technical and data support for the business. Offline package matching is used for the “Full-Reduction Artifact”, “Today’s Package Recommendation”, and other businesses, and has significantly improved package coverage of merchants. Real-time package matching is used for “dish details page package matching” and other businesses, and has delivered good business results.

In follow-up work, on the one hand we will continue to improve the dish knowledge graph and the description of non-standard dishes, further raise data accuracy and coverage by introducing multi-modal data such as images, and better describe user demand and supply by building a scenario knowledge graph. On the other hand, we will explore scenario-based package matching. We have done little work in this area so far, yet users want different packages in different scenarios: they prefer hot pot packages in cold weather and porridge packages during the Laba Festival, and want packages with local specialties when they are in different places. We will explore matching packages for solar terms, festivals, user groups, and other scenarios to better meet users’ personalized, scenario-based dining needs.

6. References

[1] Vinyals, Oriol, Meire Fortunato, and Navdeep Jaitly. “Pointer networks.” Advances in neural information processing systems. 2015.
[2] See, Abigail, Peter J. Liu, and Christopher D. Manning. “Get to the point: Summarization with pointer-generator networks.” arXiv preprint arXiv:1704.04368 (2017).
[3] Gong, Jingjing, et al. “End-to-end neural sentence ordering using pointer network.” arXiv preprint arXiv:1611.04953 (2016).
[4] Han, Xintong, et al. “Learning fashion compatibility with bidirectional lstms.” Proceedings of the 25th ACM international conference on Multimedia. 2017.
[5] Alashkar, Taleb, et al. “Examples-Rules Guided Deep Neural Network for Makeup Recommendation.” AAAI. 2017.
[6] Chen, Wen, et al. “Pog: Personalized outfit generation for fashion recommendation at alibaba ifashion.” Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2019.
[7] Rush, Alexander M., Sumit Chopra, and Jason Weston. “A neural attention model for abstractive sentence summarization.” arXiv preprint arXiv:1509.00685 (2015).
[8] Paulus, Romain, Caiming Xiong, and Richard Socher. “A deep reinforced model for abstractive summarization.” arXiv preprint arXiv:1705.04304 (2017).

7. About the authors

Rui Yu, Wen Bin, Yang Lin, and Mao Di are from the Meituan takeout technology team.

This article was produced by the Meituan technical team, and its copyright belongs to Meituan. You are welcome to reprint or use its content for non-commercial purposes such as sharing and communication; please mark it “Content reprinted from Meituan Technical team”. This article may not be reproduced or used commercially without permission. For any commercial activity, please send an email to [email protected] for authorization.