Airbnb personalized recommendations

Airbnb personalization scenarios

Airbnb usage scenarios:

Two-sided short-term rental marketplace (guests, hosts)

Guests find listings through search or system recommendations => these two channels drive 99% of Airbnb’s bookings

It is rare for a guest to book the same listing more than once

A listing can be booked by only one guest for a given set of dates

There is severe sparsity in the data

Listing Embedding

Map each listing to a vector => listing embedding

The data set consists of N users’ click sessions, where each session is an uninterrupted sequence of M listing ids clicked by a user

A new session is started whenever two consecutive clicks are more than 30 minutes apart (a sketch of this splitting rule follows)
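As an illustration of the 30-minute rule, here is a minimal sketch of session splitting, assuming click logs arrive as (user_id, listing_id, unix_timestamp) tuples sorted by timestamp (a hypothetical input format, not Airbnb’s actual pipeline):

```python
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60  # a gap of more than 30 minutes starts a new session

def build_sessions(click_log):
    """Split per-user click logs into sessions of listing ids.

    click_log: iterable of (user_id, listing_id, unix_timestamp),
    assumed sorted by timestamp.
    """
    last_click = {}                    # user_id -> timestamp of previous click
    open_sessions = defaultdict(list)  # user_id -> listing ids in the open session
    sessions = []

    for user_id, listing_id, ts in click_log:
        prev = last_click.get(user_id)
        if prev is not None and ts - prev > SESSION_GAP_SECONDS:
            sessions.append(open_sessions[user_id])  # close the stale session
            open_sessions[user_id] = []
        open_sessions[user_id].append(listing_id)
        last_click[user_id] = ts

    sessions.extend(s for s in open_sessions.values() if s)  # flush remainders
    return sessions
```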

The goal is to learn a D-dimensional (D = 32) embedding for each listing from the set of sessions S, so that similar listings end up close to each other in the embedding space

It borrows the skip-gram algorithm from Word2vec

Listing embedding treats each user’s click session as a sentence and each listing as a word, then trains the listing embeddings the same way word embeddings are trained (see the sketch below)
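Since sessions play the role of sentences and listing ids the role of words, a baseline version of the training can be sketched with gensim’s skip-gram implementation (the paper’s actual model additionally uses the booked listing as a global context and market-aware negative sampling, which plain gensim does not provide):

```python
from gensim.models import Word2Vec  # gensim >= 4.0

# sessions: e.g., the output of build_sessions() above,
# with listing ids stringified so they can act as "words".
corpus = [[str(listing_id) for listing_id in s] for s in sessions]

model = Word2Vec(
    sentences=corpus,
    vector_size=32,  # D = 32, as in the notes
    window=5,
    sg=1,            # skip-gram
    negative=5,      # negative sampling
    min_count=5,
    workers=4,
)

vec = model.wv["12345"]                     # embedding of a (hypothetical) listing id
neighbors = model.wv.most_similar("12345")  # most similar listings in embedding space
```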

How Word2vec works

• Embedding maps words from their original space into a new vector space in which semantically similar words lie close to each other.

• Word Embedding => Learn the weight matrix of the hidden layer

• The input layer is a one-hot encoding, and the output layer is a vector of probabilities

• The input layer and output layer sizes are both equal to the vocabulary size

• The number of neurons in the hidden layer is hidden_size (the embedding size)

• The weight matrix W between the input layer and the hidden layer has shape [vocab_size, hidden_size]

• The output layer is a vector of size [vocab_size], where each entry is the probability of outputting the corresponding word

• Suppose there are several training samples: (juice, apple), (juice, pear), (juice, banana)

• The center word is juice, and apple, pear, and banana appear in its window. The same input juice maps to different output words. After training on these samples, the model assigns high probabilities to apple, pear, and banana given juice => their parameters in the hidden layer end up similar

• Compute the cosine similarity between the hidden-layer vectors of apple, pear, and banana to confirm this

• Word2vec embeddings support word similarity and analogy operations

• What we want is not the model itself but the parameters of the hidden layer, i.e., the matrix that transforms input vectors into their new embeddings
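To make the bullet points above concrete, here is a toy numpy sketch of a single skip-gram forward pass; the sizes and initialization are illustrative only:

```python
import numpy as np

vocab_size, hidden_size = 10_000, 32  # toy sizes

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.01, size=(vocab_size, hidden_size))   # input -> hidden
W_out = rng.normal(scale=0.01, size=(hidden_size, vocab_size))  # hidden -> output

def skipgram_forward(center_word_id):
    """P(context word | center word) for every word in the vocabulary."""
    h = W_in[center_word_id]    # a one-hot input just selects one row of W_in
    logits = h @ W_out          # one score per vocabulary word
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()      # softmax over the vocabulary

# After training, the embedding of word i is simply the hidden-layer row W_in[i];
# the trained weights, not the model's outputs, are the artifact we keep.
```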


Evaluation of Listing Embedding

Offline evaluation of Listing Embedding:

Before the embedding-based recommendations are tested in live search, several offline tests are run. The purpose is to compare embeddings trained with different parameters and to settle on the embedding dimension, modeling ideas, and so on.

The evaluation criterion tests how well the embeddings, given a user’s most recent click, rank the listing that the user eventually books

Step 1: collect the listings the user recently clicked, the candidate listings that need to be ranked, and the listing the user eventually booked

Step 2: compute the cosine similarity between the clicked listing and each candidate listing in the embedding space

Step 3: rank the candidates by similarity and observe where the eventually booked listing lands in the ranking

The search results are re-ranked by embedding similarity, and the average rank of the eventually booked listing is computed for each click leading up to the booking, looking back as far as 17 clicks before the booking (a sketch of the ranking metric follows)
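A minimal sketch of this offline metric, assuming a dict `emb` from listing id to embedding vector (a hypothetical data layout):

```python
import numpy as np

def booked_listing_rank(clicked_id, candidate_ids, booked_id, emb):
    """Rank candidates by cosine similarity to the last-clicked listing,
    then return the 1-based rank of the eventually booked listing."""
    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    query = emb[clicked_id]
    ranked = sorted(candidate_ids,
                    key=lambda c: cosine(query, emb[c]),
                    reverse=True)
    return ranked.index(booked_id) + 1  # lower rank = better embeddings

# Averaging this rank over the clicks leading up to each booking
# (up to 17 clicks back) gives the offline evaluation curve.
```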

Qualitative evaluation of Listing Embedding:

The validity of the embeddings is verified in several ways:

K-means clustering: cluster the embeddings and check that the clusters separate geographically

Cosine similarity between embeddings:

Cosine similarity between different types of listings

Cosine similarity between listings in different price ranges

Beyond the basic attributes that can be read off directly (price, geographic location), embeddings can also capture implicit attributes, such as a listing’s design style and vibe

Compute the k-nearest neighbors of each listing in embedding space and compare the listing with its neighbors => an embedding evaluation tool (see the sketch below)

There are demo videos on YouTube showing that the embeddings work well => similar listings do end up close to each other in the embedding space
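A sketch of the clustering and k-nearest-neighbor checks with scikit-learn, reusing the hypothetical `emb` dict from above:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

# emb: dict of listing id -> 32-dim embedding vector (hypothetical, as above)
listing_ids = list(emb)
X = np.vstack([emb[i] for i in listing_ids])

# Geographic sanity check: cluster the embeddings; plotted on a map,
# the clusters should roughly separate by region/neighborhood.
clusters = KMeans(n_clusters=100, n_init=10, random_state=0).fit_predict(X)

# k-nearest-neighbor inspection tool: for any listing, list the listings
# closest to it in embedding space and eyeball whether they look similar.
knn = NearestNeighbors(n_neighbors=9, metric="cosine").fit(X)
_, idx = knn.kneighbors(X[:1])               # neighbors of the first listing
print([listing_ids[j] for j in idx[0][1:]])  # drop the listing itself
```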

Cold start of Listing Embedding

A new listing has no click sessions yet; it is given an initial embedding by averaging the embeddings of a few geographically closest listings of similar type and price

Similar listing recommendation based on Listing Embedding

Each Airbnb listing detail page contains a “similar listings” carousel, recommending listings that are similar to the current one and bookable for the same dates

Once listing embeddings were in place, an A/B test was run: embedding-based recommendations increased the click-through rate of “similar listings” by 21% and the bookings generated by “similar listings” by 4.9%

In the embedding-based recommendation, similar listings are found as the K nearest neighbors in the listing embedding space

Given well-trained listing embeddings, cosine similarity is computed between the given listing and all listings from the same destination, restricted to listings that are available for the specified check-in and check-out dates => the K listings with the highest similarity form the similar-listings list (a sketch follows)
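A sketch of the similar-listings lookup, assuming `candidates` is pre-filtered to the same destination and `is_available` checks the requested dates (both hypothetical helpers):

```python
import numpy as np

def similar_listings(query_id, candidates, emb, is_available, k=12):
    """Top-k listings by cosine similarity, restricted to bookable candidates."""
    q = emb[query_id]
    q_norm = np.linalg.norm(q)
    scored = []
    for c in candidates:
        if c == query_id or not is_available(c):
            continue  # must be bookable for the requested check-in/check-out dates
        v = emb[c]
        scored.append((c, (q @ v) / (q_norm * np.linalg.norm(v))))
    scored.sort(key=lambda t: t[1], reverse=True)
    return [c for c, _ in scored[:k]]
```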

User Type Embedding and Listing Type Embedding

Long-term behavior matters: a user who booked a certain type of listing in another city some time ago is likely to prefer the same type of listing in the current city

Capture this long-term signal from booked listings

Construct the data set: switch from click sequences to booking sequences. The data set is a set of sessions consisting of listings booked by N users; each booking session can be represented as a time-ordered sequence of booked listing ids, s_b = (l_b1, ..., l_bM)

Existing problems:

The training data set is small, because booking data is an order of magnitude smaller than click data

Many users have only a single past booking; a booking session of length 1 cannot be used to train the model

Listings that have been booked only a few times in total on the platform (e.g., fewer than 5-10 times) also need to be removed

The time span between bookings is long, and users’ preferences may have changed in the meantime. These problems are why embeddings are learned at the level of user types and listing types rather than individual ids (a filtering sketch follows)
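A sketch of the filtering steps, assuming a pandas DataFrame `bookings` with `user_id` and `listing_id` columns (a hypothetical schema):

```python
import pandas as pd

MIN_LISTING_BOOKINGS = 5  # drop listings booked fewer than ~5-10 times
MIN_USER_BOOKINGS = 2     # a single booking yields no usable sequence

def filter_bookings(bookings: pd.DataFrame) -> pd.DataFrame:
    """One row per booking; keep only listings and users with enough data."""
    listing_counts = bookings["listing_id"].value_counts()
    keep = listing_counts[listing_counts >= MIN_LISTING_BOOKINGS].index
    out = bookings[bookings["listing_id"].isin(keep)]

    user_counts = out["user_id"].value_counts()
    keep = user_counts[user_counts >= MIN_USER_BOOKINGS].index
    return out[out["user_id"].isin(keep)]
```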

Real-time personalized search based on Embedding

Compute the cosine similarity between the user’s user-type embedding and the listing-type embedding of each candidate listing

Recommend the listings with the highest similarity to the user (a sketch follows)
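A sketch of the scoring step, assuming the user-type and listing-type vectors were trained in the same embedding space (hypothetical variable names):

```python
import numpy as np

def personalization_scores(user_type_vec, candidate_type_vecs):
    """Cosine similarity between the user's user-type embedding and the
    listing-type embedding of each candidate; used as a ranking feature."""
    u = user_type_vec / np.linalg.norm(user_type_vec)
    return {
        listing_id: float(u @ v / np.linalg.norm(v))
        for listing_id, v in candidate_type_vecs.items()
    }

# Candidates with the highest scores are shown more prominently in search.
```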

Fintech data analysis

Fintech application scenarios

Fintech:

Finance + Technology: using technology to make financial services more efficient

Financial services (insurance, banks, securities brokers, funds) all need technology support. In addition, Internet companies are also launching financial businesses, such as Ant Financial

Fintech Companies & Talent development

The fintech industry seeks interdisciplinary talent: people who combine the software engineering skills of technology companies with the digital and business-analysis skills of the financial industry

Typical companies: Ant Financial, JD Finance, Grab, SoFi, Oscar Health, Nubank, Robinhood, Atom Bank, Lufax, Bloomberg, FactSet, PayPal

How to use Python for quantitative trading

Quantitative trading platforms: vn.py (VNPY), JoinQuant, RiceQuant (a platform-agnostic sketch follows)
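Each of these platforms has its own API; as a platform-agnostic illustration, a classic dual moving-average crossover signal can be sketched in plain pandas (hypothetical daily price data, not a strategy recommendation):

```python
import pandas as pd

def ma_crossover_position(close: pd.Series, fast: int = 5, slow: int = 20) -> pd.Series:
    """Dual moving-average strategy: long (1) when the fast MA is above
    the slow MA, flat (0) otherwise. close: daily closing prices."""
    fast_ma = close.rolling(fast).mean()
    slow_ma = close.rolling(slow).mean()
    position = (fast_ma > slow_ma).astype(int)
    return position.shift(1).fillna(0)  # act on the signal the next bar

# Daily strategy return = position * the asset's daily return.
```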