How to become an expert in user profiling

The user portrait is an old topic. Almost every Internet company needs user portraits, whether its business is ToC or ToB. But it is not easy to truly understand what a user portrait is, or to produce portraits that are accurate and effective.

In this article, I would like to share my thoughts on user portraits and explore how to build them more professionally.

Alan Cooper, the father of interaction design, first proposed the concept of the user portrait (persona): “Personas are a concrete representation of target users.” In other words, a persona is a virtual representation of real users, a target-user model built on a series of attribute data.

Why do you need to do user portraits?

You’ve all had the experience of being recommended a certain product or a certain movie by your best friend, and the success rate is often very high. The reason is simply that your friend knows you well enough to know what you are interested in and what you need at the moment.

Internet companies are eager to know you as well as your friends do, because that is what makes fast growth and high customer satisfaction possible.

From the company’s perspective, user profiling serves two main business goals:

Expand new users
Get new orders

Expand new users

We are faced with a huge amount of information every day, but how much of it attracts your attention and finally converts you? Very little. The vast majority of that information is “wasted”.

Only by accurately understanding its existing users can a company use precise marketing to win the increasingly scarce attention of new users among the vast sea of people.

Get new orders

Today, the amount of content or products on any one platform far exceeds what a user can take in by simply browsing.

If the information users are interested in is not recommended to them right away, they are likely to lose patience while searching for it. Not only will no new order be closed, the platform may lose the user altogether.

The platform needs to capture the needs of users in order to quickly facilitate the closing of new orders.

What do we need to do about user portraits?

Many companies have a DMP (data management platform), a tool that helps turn user portraits into revenue. From a technical point of view, a DMP labels user data, uses algorithms to find similar groups, combines them with business scenarios to screen out highly matched user groups, tries to reach these users (pop-ups, SMS, advertising alliances, etc.), and tracks the results.

Before that, we need to define which dimensions a user portrait should cover. I have compiled a list of common dimensions:

Natural attributes, such as sex, height, etc
Social attributes, such as occupation, education, etc
Wealth status, such as income, expenditure, etc
Family information, such as marriage, children, etc
Shopping habits, such as price sensitivity, brand loyalty, etc
Location characteristics, such as city, place of work, etc
Other behaviors and interests, such as football fan, game fan, etc
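As a rough illustration, the dimensions above could be held in a small container like the one below. This is only a sketch: the class name `UserPortrait`, the dimension keys, and the idea of storing a weight per label are assumptions made for the example, not something a DMP prescribes.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical short names for the seven dimensions listed above.
DIMENSIONS = (
    "natural", "social", "wealth", "family",
    "shopping", "location", "interest",
)


@dataclass
class UserPortrait:
    user_id: str
    # dimension -> {label: weight}; every dimension starts out empty.
    labels: Dict[str, Dict[str, float]] = field(
        default_factory=lambda: {d: {} for d in DIMENSIONS}
    )

    def set_label(self, dimension: str, label: str, weight: float) -> None:
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension}")
        self.labels[dimension][label] = weight

    def top_label(self, dimension: str) -> Optional[str]:
        """Return the highest-weighted label, or None if nothing is known."""
        candidates = self.labels[dimension]
        if not candidates:
            return None
        return max(candidates, key=candidates.get)


portrait = UserPortrait("001")
portrait.set_label("natural", "male", 0.7)
portrait.set_label("natural", "female", 0.3)
```

Keeping a weight for every candidate label, rather than a single value, leaves room for uncertainty in each dimension.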

How do we build a user portrait?

It is impossible to fully understand your users and portray them with complete accuracy. Roughly 99% of a user’s intent exists only in the “mind”, where it cannot be observed. Only about 1% shows up “online”, for example as a keyword search or an online purchase of a bag of rice.

The user’s online data is only a small projection of the user’s inner world. In a mapping from high dimensions to low dimensions, from the infinite to the finite, information is inevitably lost.

It is theoretically impossible to recover the high-dimensional from the low-dimensional, or the infinite from the finite, so we can only build user portraits in a narrow sense.

We usually describe an event in this way: which user, at what time, where, did what action, to what object.

Each event in the log is recorded as follows:

Which user: how the user is identified, e.g. cookie, registration ID, email, mobile phone number, ID card number, etc.
What time: the timestamp at which the log entry was generated.
Where: the page type, e.g. launch page, search page, details page, etc.
What object: the content or item involved, described by fields such as title and description.
What action: the user’s behavior, e.g. browse, like, comment, share, favorite, buy, etc.

A typical record looks like this:

{
  'user_id': '001',        # user ID
  'opt_time': 1578905680,  # operation timestamp
  'opt_page': 'search',    # page where the operation happened
  'opt_type': 1,           # behavior type: 1 = like, 2 = comment, 3 = share, 4 = browse
  'opt_content_id': 1      # object ID
}
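One possible way to load such a record into a typed structure is sketched below. The names `Event`, `Behavior`, and `parse_event` are made up for this example. The numeric codes follow the comment in the record above, and the timestamp is assumed to be in seconds since the Unix epoch.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import IntEnum


class Behavior(IntEnum):
    # Codes as given in the sample record's comment.
    LIKE = 1
    COMMENT = 2
    SHARE = 3
    BROWSE = 4


@dataclass(frozen=True)
class Event:
    user_id: str
    time: datetime
    page: str
    behavior: Behavior
    content_id: int


def parse_event(raw: dict) -> Event:
    """Convert a raw log record into an Event; raises on unknown behavior codes."""
    return Event(
        user_id=str(raw["user_id"]),
        # Assumption: opt_time is a Unix timestamp in seconds.
        time=datetime.fromtimestamp(raw["opt_time"], tz=timezone.utc),
        page=raw["opt_page"],
        behavior=Behavior(raw["opt_type"]),
        content_id=int(raw["opt_content_id"]),
    )


raw = {
    "user_id": "001",
    "opt_time": 1578905680,
    "opt_page": "search",
    "opt_type": 1,
    "opt_content_id": 1,
}
event = parse_event(raw)
```

Validating the behavior code at parse time means a malformed log line fails early instead of silently skewing the portrait later.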

The user ID and timestamp are self-explanatory, while the page location, behavior type and object ID need some explanation.

Page position

Even when the object of the operation is the same, the page on which it happens reflects a different degree of user intent, and so deserves a different weight. By analogy, the same bottle of mineral water costs 1 yuan in a supermarket, 3 yuan at a railway station, and 5 yuan at a scenic spot.

We need to define different weights for different page positions in order to get a more accurate portrait of the user.

Types of behaviour

As with page position, different user behaviors also reflect different degrees of intent.

Typical behavior weights are as follows:

Behavior	Weight
browse	1
like	2
favorite	5
share	7
comment	10
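A minimal sketch of how the two kinds of weight might be combined into a single event weight follows. The behavior weights are taken from the table above, using short key names. The page weights, the default of 1.0, and the choice to multiply the two are assumptions made purely for illustration.

```python
# Behavior weights from the table above.
BEHAVIOR_WEIGHT = {
    "browse": 1,
    "like": 2,
    "favorite": 5,
    "share": 7,
    "comment": 10,
}

# Hypothetical page weights; real values would come from the business.
PAGE_WEIGHT = {
    "launch": 0.5,
    "search": 1.5,
    "details": 1.0,
}
DEFAULT_PAGE_WEIGHT = 1.0  # assumed fallback for pages not listed


def event_weight(page: str, behavior: str) -> float:
    """Weight of a single event: page weight times behavior weight."""
    page_w = PAGE_WEIGHT.get(page, DEFAULT_PAGE_WEIGHT)
    return page_w * BEHAVIOR_WEIGHT[behavior]
```

For instance, `event_weight("search", "like")` gives 3.0 under these made-up page weights, while the same like on a launch page counts for only 1.0.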

Object ID

Recording only the object ID is far from enough, because an ID alone does not reveal what the user is interested in. The object behind the ID needs to be labeled.

A label is a way of tagging data along one dimension, expressing a person’s basic attributes, behavioral tendencies, interests and preferences. It is a strongly relevant keyword that can briefly describe and classify people.

The definition of labels comes from business objectives. Across different industries and application scenarios, the same label name may carry different meanings, which in turn leads to different model designs and data processing methods.
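To make the labeling step concrete, here is a small sketch that maps object IDs to labels and accumulates a score per label. The catalog `CONTENT_LABELS` and its entries are invented for the example. In practice the labels would come from the business's own taxonomy.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical catalog: object ID -> labels attached to that object.
CONTENT_LABELS = {
    1: ["football", "sports"],
    2: ["games"],
    3: ["football"],
}


def accumulate_labels(events: Iterable[Tuple[int, float]]) -> Dict[str, float]:
    """Sum event weights per label; events are (content_id, weight) pairs."""
    scores: Dict[str, float] = defaultdict(float)
    for content_id, weight in events:
        # Objects without labels contribute nothing to the portrait.
        for label in CONTENT_LABELS.get(content_id, []):
            scores[label] += weight
    return dict(scores)


label_scores = accumulate_labels([(1, 2.0), (3, 1.0), (2, 5.0), (99, 7.0)])
```

Note that the event on object 99 is dropped entirely here, which shows why unlabeled objects tell us nothing about the user.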

Now that we have covered the user data, we need to compute on it.

Regarding the calculation method, we need to pay attention to the following two points:

Time decay
Heat attenuation

Time decay

The earlier a behavior occurred, the less it says about the user’s current interests. The weight of a user label should decay as time passes, so we need to define a time decay factor.
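The article does not fix a particular decay formula. One common choice is exponential decay with a half-life, sketched below. The function name, the 30-day default, and the use of timestamps in seconds are all assumptions.

```python
SECONDS_PER_DAY = 86400.0


def decayed_weight(
    weight: float,
    event_ts: float,
    now_ts: float,
    half_life_days: float = 30.0,  # assumed: weight halves every 30 days
) -> float:
    """Exponentially decay a weight according to the age of the event."""
    # Events stamped in the future are treated as happening right now.
    age_days = max(0.0, (now_ts - event_ts) / SECONDS_PER_DAY)
    return weight * 0.5 ** (age_days / half_life_days)
```

With the default half-life, a weight of 10 from an event 30 days ago counts as 5, and one from 60 days ago counts as 2.5. A shorter half-life suits fast-moving interests such as news, while a longer one suits stable attributes.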

Heat attenuation

If many users like a piece of content or a product, it is simply popular, and interacting with it says little about an individual user’s interests. We therefore need to penalize popular items and give more weight to unpopular, niche ones.
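One way to express such a penalty, borrowed from the inverse document frequency idea in text retrieval, is sketched below. This is a substitute technique chosen for illustration, not a formula the article specifies, and the function name and smoothing terms are assumptions.

```python
import math


def popularity_factor(total_users: int, item_users: int) -> float:
    """Smoothed IDF-style factor: larger for niche items, 1.0 for universal ones.

    total_users: number of users on the platform.
    item_users:  number of users who interacted with the item.
    """
    if item_users < 0 or item_users > total_users:
        raise ValueError("item_users must be between 0 and total_users")
    # The +1 terms avoid division by zero; the trailing +1.0 keeps the
    # factor from dropping to zero for items that everyone has touched.
    return math.log((1 + total_users) / (1 + item_users)) + 1.0
```

Multiplying an event's weight by this factor leaves interactions with universally popular items unchanged and boosts interactions with items that few users have touched.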

Eventually, we might end up with something like this:

Occupation of user A: programmer 0.8; user operations 0.3.
Gender of user A: male 0.7; female 0.3.
Age of user A: under 20, 0.6; 20-30, 0.3; over 30, 0.9.

Conclusion

The quality of a user portrait directly affects how well the business develops, and that quality often comes down to how the details are handled. This article has explained the principles and the process and pointed out the details to watch for along the way. I hope it is helpful.

Finally, we have a book entitled “Understanding NLP Chinese Word Segmentation: From Principle to Practice”, which will help you master Chinese word segmentation from scratch and step into the door of NLP.

If you found this article helpful, please like, share, and comment.