Book Review: Meituan Machine Learning Practice

I recently read the new book Meituan Machine Learning Practice and found it very interesting, so I would like to share some of its content with you.

The book is divided into six parts. Part 1 covers the general workflow of machine learning in practice and includes some interesting lessons. Part 2 is about data mining, mainly user portraits, POI entity linking and comment mining. Part 3 is search and recommendation and, as the name implies, covers Meituan’s search and recommendation architecture and its applications. Part 4 is computational advertising. Part 5 is about deep learning. Part 6 deals with algorithm engineering.

Parts 1 and 5 cover general material. They contain some practical lessons, but dedicated books on those topics explain them better. Part 6 is, in my opinion, fairly ordinary; I recommend reading its section on parallel machine learning to get a feel for the topic. What really stands out are Parts 2, 3 and 4: the parts that deal with POI and O2O are well worth reading!

Meituan is arguably the largest POI and O2O application company in China. It has accumulated a lot of experience in this field, and its architecture is elegant. Even though I am not familiar with this field, the book gave me a general understanding of how machine learning is applied to POI and O2O. Next, I will introduce some of the parts that I found interesting.

POI entity linking

To be honest, the user portrait chapter of Part 2 impressed me the most architecturally, but the most useful part for reference is POI entity linking. Many companies do user portraits and comment mining, but Meituan’s POI algorithm is probably the best in the country.

A POI in this book represents an information entity, such as a hotel in the hotel business. Maintaining a good POI database is the foundation of all the algorithms. In this part, Meituan discusses the following problem. The POI database that Meituan already has is called the inventory POI library; the POI records waiting to be added are called the pending POI library. Most POIs in the pending library may already exist in the inventory library, but their name, description and other information may differ. An important practical problem is matching the POIs in the pending library to those in the inventory library.

Take the hotel business as an example: Golden Spring Holiday Hotel and Golden Spring Business Hotel may be the same entity, so how do we link them? Starting with name similarity is the solution everyone thinks of first, but it may not be accurate on its own, so other information is needed, such as address, phone number, and latitude and longitude. Another concern is that an exhaustive pairwise comparison is impossible: its cost would be at least the size of the pending POI library times the size of the inventory POI library, which is unacceptable in practice. So what is Meituan’s plan?

Step 1: Narrow down the candidate set

Cluster the POIs: aggregate them first along the city dimension
Build an inverted index to further narrow down the set of candidates to compare
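
The two narrowing steps above can be sketched roughly as follows. This is only an illustration and not the book's implementation: the whitespace tokenizer, the record fields (`id`, `city`, `name`) and the function names are all assumptions made for the example. The index is keyed by city and name token, so a pending POI is compared only against inventory POIs in the same city that share at least one token.

```python
from collections import defaultdict

def tokenize(name):
    """Lowercase and split a name on whitespace (a simplifying assumption)."""
    return set(name.lower().split())

def build_index(inventory):
    """Inverted index: (city, token) -> ids of inventory POIs containing it."""
    index = defaultdict(set)
    for poi in inventory:
        for token in tokenize(poi["name"]):
            index[(poi["city"], token)].add(poi["id"])
    return index

def candidates(index, poi):
    """Ids of inventory POIs in the same city sharing at least one name token."""
    found = set()
    for token in tokenize(poi["name"]):
        found |= index.get((poi["city"], token), set())
    return found

# Hypothetical records, for illustration only.
inventory = [
    {"id": 1, "city": "Beijing", "name": "Golden Spring Holiday Hotel"},
    {"id": 2, "city": "Beijing", "name": "Silver Lake Inn"},
    {"id": 3, "city": "Wuhan", "name": "Golden Spring Business Hotel"},
]
index = build_index(inventory)
incoming = {"city": "Beijing", "name": "golden Spring business hotel"}
print(candidates(index, incoming))  # only POI 1 is left to compare
```

In a real system, very common tokens such as "hotel" would match far too many entries, so they would probably need to be down-weighted or dropped before indexing.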

Step 2: Similarity comparison

There are two ways to do this:

If-else rules: use a series of if-else conditions to decide whether two POIs are the same entity
Similarity scoring: score each dimension of the two POIs and combine the scores with weights into an overall similarity
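
As a rough sketch of both approaches, the code below scores name, phone and location separately and combines them with weights, then shows a small rule cascade for comparison. The weights, thresholds, distance scale and field names are hypothetical values chosen for the example, not figures from the book. Name similarity uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for whatever measure is used in production.

```python
import math
from difflib import SequenceMatcher

# Hypothetical weights; in practice they would be tuned or learned.
WEIGHTS = {"name": 0.5, "phone": 0.3, "geo": 0.2}

def name_sim(a, b):
    """Character-level similarity of two names, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def distance_km(p, q):
    """Great-circle (haversine) distance between two POIs in kilometers."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p["lat"], p["lon"], q["lat"], q["lon"]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def similarity(p, q, scale_km=0.5):
    """Weighted sum of per-dimension similarity scores."""
    scores = {
        "name": name_sim(p["name"], q["name"]),
        "phone": 1.0 if p["phone"] == q["phone"] else 0.0,
        "geo": math.exp(-distance_km(p, q) / scale_km),  # decays with distance
    }
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

def same_entity_by_rules(p, q):
    """If-else variant: a short cascade of hand-written conditions."""
    if p["phone"] == q["phone"] and name_sim(p["name"], q["name"]) >= 0.6:
        return True
    if distance_km(p, q) < 0.05 and name_sim(p["name"], q["name"]) >= 0.8:
        return True
    return False

# Made-up records for illustration.
a = {"name": "Golden Spring Holiday Hotel", "phone": "010-5550001", "lat": 39.9000, "lon": 116.4000}
b = {"name": "golden Spring business hotel", "phone": "010-5550001", "lat": 39.9001, "lon": 116.4001}
c = {"name": "Silver Lake Inn", "phone": "010-5550002", "lat": 39.9500, "lon": 116.5000}

print(similarity(a, b), similarity(a, c))
print(same_entity_by_rules(a, b), same_entity_by_rules(a, c))
```

The rule cascade is easy to read and debug but hard to extend, while the weighted score has a single threshold to tune and extends naturally to learned weights.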

The specific process

The workflow is as follows: data cleaning, then feature generation (various modules), then model selection and effect evaluation.

The candidate models include GBDT, SVM and LR. None of them is complicated, but applying them involves many practical details.
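
To make the feature-to-model-to-evaluation flow concrete, here is a minimal sketch using LR, the simplest of the candidate models, written from scratch so that it runs without third-party libraries. The two pair features (name similarity and phone match) and all the data points are invented for illustration. A real pipeline would use many more features, far more labeled pairs and a proper library implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_lr(X, y, lr=0.5, epochs=5000):
    """Fit logistic regression with full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Predict 1 (same entity) when the linear score is non-negative."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else 0

# Toy, made-up pair features: [name similarity, phone match]. Label 1 = same entity.
X_train = [[0.90, 1], [0.80, 1], [0.85, 0], [0.30, 0], [0.20, 0], [0.40, 0]]
y_train = [1, 1, 1, 0, 0, 0]
X_test = [[0.95, 1], [0.10, 0]]  # held-out pairs for evaluation
y_test = [1, 0]

w, b = train_lr(X_train, y_train)
accuracy = sum(predict(w, b, x) == y for x, y in zip(X_test, y_test)) / len(y_test)
print(accuracy)
```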

Search in O2O scenarios

Search in the O2O scenario is more interesting than ordinary search because it depends heavily on the current context. Users searching for the same word in Japan, Beijing and Wuhan often mean different things. Users searching for food at noon, in the afternoon and in the evening often expect completely different recommendations. A user searching for a place name is not necessarily looking for that place, and may be looking for a nearby restaurant or hotel instead. These problems place very high demands on the search system.

In Meituan, the main problems to be solved by search engines are as follows:

How to define the user’s query intent?
How to identify the user’s query intent?
How to link a user’s query intent to a specific entity?
How to guide the user through the search?

The book explains the solutions to these problems in detail, so I will not repeat them here. Another problem is how to rank search results. Meituan believes that its search ranking scenarios have four main characteristics:

Mobility: users are constantly moving, so distance is an important ranking factor
Contextualization: the user may be at home, at work, outdoors or in a store, and these scenarios are important for understanding the user’s intent
Localization: the target of a search query is often local
Personalization: user preferences are pronounced

In view of these characteristics, Meituan has implemented its own search ranking framework.
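
To give a feel for how these four characteristics could enter a ranking score, here is a toy sketch. It is not Meituan's framework, which the book describes separately. The multiplicative form, the boost values, the 3 km distance scale and every field name are assumptions made purely for illustration.

```python
import math

def rank_score(item, user, context, distance_scale_km=3.0):
    """Combine text relevance with distance, context and preference signals."""
    # Mobility / localization: closer merchants score higher.
    distance_factor = math.exp(-item["distance_km"] / distance_scale_km)
    # Contextualization: boost merchants that fit the current meal time.
    context_boost = 1.2 if context["meal"] in item["serves"] else 1.0
    # Personalization: boost categories the user has shown affinity for.
    affinity = user["category_affinity"].get(item["category"], 0.0)
    preference_boost = 1.0 + 0.3 * affinity
    return item["relevance"] * distance_factor * context_boost * preference_boost

# Made-up search results for illustration.
items = [
    {"name": "Hotpot Palace", "relevance": 0.9, "distance_km": 6.0,
     "serves": {"dinner"}, "category": "hotpot"},
    {"name": "Noodle House", "relevance": 0.8, "distance_km": 0.5,
     "serves": {"lunch", "dinner"}, "category": "noodles"},
]
user = {"category_affinity": {"noodles": 0.5}}
context = {"meal": "lunch"}

ranked = sorted(items, key=lambda it: rank_score(it, user, context), reverse=True)
print([it["name"] for it in ranked])
```

With these inputs the nearby noodle shop outranks the more relevant but distant hotpot restaurant, which is the kind of trade-off the mobility and localization characteristics imply.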

Recommendation in O2O scenarios

According to Meituan, the main differences between O2O recommendation and other recommendations are as follows:

Geographical factor
User behavior history: unlike in other recommendation scenarios, users are highly likely to repurchase from the same store
Real-time recommendation: the user’s real-time location and real-time consumption must be taken into account, because in the O2O scenario the time between considering a purchase and placing an order is very short

Meituan still uses the classic recommendation framework, consisting of a recall stage and a ranking stage:

Recall stage: recall strategies include recall based on collaborative filtering, location-based recall, recall based on search queries, graph-based recall and recall based on real-time user behavior
Ranking stage: the ranking model is still a classic model with nothing unusual about it. The features include item features, user features, user-item cross features, distance features and scene features.
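
A two-stage pipeline of this kind can be sketched as below. Only two of the recall strategies are mocked up (location-based and a repurchase-style behavior recall), and the ranking stage is a hand-weighted linear score standing in for a trained model. The radius, weights, feature definitions and data are all hypothetical.

```python
# Made-up merchants; distance_km is assumed to be precomputed for this user.
ITEMS = {
    "noodle": {"distance_km": 0.4, "rating": 4.2},
    "hotpot": {"distance_km": 2.5, "rating": 4.8},
    "bbq": {"distance_km": 9.0, "rating": 4.9},
    "tea": {"distance_km": 7.0, "rating": 4.5},
}
user = {"purchased": {"tea"}}

def recall_nearby(user, items, radius_km=3.0):
    """Location-based recall: everything within a radius of the user."""
    return {i for i, it in items.items() if it["distance_km"] <= radius_km}

def recall_repurchase(user, items):
    """Behavior-based recall: stores the user has bought from before."""
    return {i for i in user["purchased"] if i in items}

def recall(user, items):
    """Merge the candidate sets produced by the individual strategies."""
    return recall_nearby(user, items) | recall_repurchase(user, items)

def features(user, item_id, items):
    """Item, distance and user-item cross features for one candidate."""
    it = items[item_id]
    return {
        "rating": it["rating"] / 5.0,
        "closeness": 1.0 / (1.0 + it["distance_km"]),
        "repurchase": 1.0 if item_id in user["purchased"] else 0.0,
    }

# Hypothetical weights standing in for a trained ranking model.
WEIGHTS = {"rating": 1.0, "closeness": 2.0, "repurchase": 0.5}

def rank(user, candidate_ids, items):
    def score(item_id):
        f = features(user, item_id, items)
        return sum(WEIGHTS[k] * v for k, v in f.items())
    return sorted(candidate_ids, key=score, reverse=True)

print(rank(user, recall(user, ITEMS), ITEMS))
```

The distant "bbq" merchant never reaches the ranking stage, while "tea" does despite its distance because the user bought there before, which reflects the repurchase behavior noted above.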

Judging from the writing, recommendation is not Meituan’s most important entry point.

Advertising and marketing in O2O scenarios

Advertising and marketing in the O2O scenario has the following main characteristics:

Mobility: reflected in three aspects, namely precision, immediacy and interactivity
Localization: Meituan found that in over 90% of transactions, the distance between the user and the merchant was less than 3 kilometers
Scenario-based: mobile scenarios are more precise than Web scenarios
Diversity: the O2O model serves a wide variety of merchants, whose demands vary greatly

Given these characteristics, Meituan’s ad ranking mechanism is very interesting, and it can meet some requirements that are impossible on the Web, such as telling which ads lost out and which competitors they lost to. I won’t bore you with the details here, but overall it is worth a look.

Conclusion

From my point of view, the book’s greatest value is that it broadens your horizons. Algorithms face different problems in different scenarios, and some of those problems come as a surprise to people who do not work in the area. In many cases, finding and pinning down a problem is more valuable than solving it, so it is very helpful to look at how algorithms are applied in different scenarios. If I get the chance, I will analyze Meituan’s user portrait architecture next time.
