I. Introduction to labels

Label concept

A label was originally used to classify and mark objects, carrying brief information such as an item's name, weight, volume, and purpose. It later became popular in the data industry, where it is used to mark data so that data can be quickly classified, retrieved, and analyzed.

Label characteristics

Labels describe data precisely, which supports locating and searching it. They also have a lifecycle: they can be calculated, configured, and normalized. Tags can describe all kinds of structured and unstructured data (documents, images, videos, etc.), so that the content can be managed efficiently.

  • Describing features: tag [phone color], values [red, white];
  • Describing rules: tag [active users], rules [logs in daily, makes transactions];
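The two kinds of tag can be modeled as a minimal sketch. A feature tag is a name plus its allowed values, and a rule tag is a name plus a predicate over a user record. The class names and the record fields `logged_in_today` and `transactions_today` are hypothetical. The sketch also assumes that "active users" must satisfy both rules at once.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FeatureTag:
    """A tag described by the set of values it may take."""
    name: str
    values: frozenset


@dataclass(frozen=True)
class RuleTag:
    """A tag described by a rule: a predicate deciding whether a record carries it."""
    name: str
    rule: Callable[[dict], bool]


phone_color = FeatureTag("phone color", frozenset({"red", "white"}))

# Hypothetical record fields; the rule assumes both conditions must hold.
active_user = RuleTag(
    "active users",
    lambda u: bool(u["logged_in_today"]) and u["transactions_today"] > 0,
)
```

For example, `active_user.rule({"logged_in_today": True, "transactions_today": 2})` evaluates to `True`.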

Value of labels

  • Form the basis of fine-grained operations, effectively improving how accurately and efficiently traffic is used;
  • Help product teams quickly locate the data they need and analyze it accurately;
  • Help customers enter the market cycle faster;
  • Enable in-depth forecasting and analysis of data, and timely responses;
  • Support building intelligent recommendation systems based on labels;
  • Provide insight into industry characteristics through analysis of specific types of data;

The core value of tags, and their most common scenarios, are real-time intelligent recommendation and precise digital marketing.

II. Label definition

Attribute labels

Attribute tags describe basic features. They are not generated from behavior and are not derived from rule-engine analysis. For example, a user's gender, date of birth, and similar characteristics can be obtained from their real-name authentication information. Such labels change rarely and are highly accurate.
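The source does not say what the real-name authentication information contains. This minimal sketch assumes it includes an 18-digit mainland China resident ID number. In that format, digits 7 to 14 encode the birth date as YYYYMMDD, and the 17th digit is odd for men and even for women. The function name `attribute_labels` is hypothetical, and the final check digit is not validated here.

```python
from datetime import date


def attribute_labels(id_number: str, today: date) -> dict:
    """Derive attribute labels from an 18-digit resident ID (assumed format)."""
    if len(id_number) != 18:
        raise ValueError("expected an 18-digit ID number")
    # Characters 7-14 (0-based slice 6:14) hold the birth date as YYYYMMDD.
    birth = date(int(id_number[6:10]), int(id_number[10:12]), int(id_number[12:14]))
    # The 17th character (index 16) is odd for men and even for women.
    gender = "male" if int(id_number[16]) % 2 == 1 else "female"
    # Whole years of age: subtract one if this year's birthday has not happened yet.
    age = today.year - birth.year - ((today.month, today.day) < (birth.month, birth.day))
    # Note: the check digit (index 17) is NOT validated in this sketch.
    return {"gender": gender, "birth_date": birth.isoformat(), "age": age}
```

Because these values come straight from identity data, they rarely change, which matches the low change frequency described above.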

Behavior labels

Users' behavioral data is captured through different business channels, and labels describing the results are formed by analyzing this data. For example, analyzing which online shopping platforms a user uses may yield Pinduoduo, Taobao, JD.com, Tmall, and so on. All of these are labels that must be inferred from behavioral data.
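Here is one way to turn raw behavior into such a label. The minimal sketch counts visit events per platform and assigns a platform label once a threshold is reached. The event schema (`type`, `platform`) and the default `min_visits=3` are hypothetical choices, not something the source specifies.

```python
from collections import Counter


def platform_labels(events: list, min_visits: int = 3) -> set:
    """Label a user with each shopping platform visited at least min_visits times.

    Assumes each event is a dict with hypothetical keys "type" and "platform".
    """
    counts = Counter(e["platform"] for e in events if e["type"] == "visit")
    return {platform for platform, n in counts.items() if n >= min_visits}
```

The threshold keeps one stray visit from producing a label. In practice it would be tuned per business.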

Rule labels

Rule labels are driven mainly by product or operations needs. For example, an e-commerce platform may want to issue benefits to members whose membership level is above 5 and who have been active in the past 7 days. This involves two labels: 1. membership level above 5; 2. active in the last 7 days. How "active in the last 7 days" is judged, whether by logins or by transactions, should be dynamically configurable, and the result is then produced by a rules engine. Labels generated this way, by calculation and analysis under dynamically configured rules, are rule labels.
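The benefit example can be expressed as a minimal sketch in which the definition of "active" is configuration rather than code. `ActivityRule`, `is_active`, `eligible_for_benefit`, and the member record layout are hypothetical names. A production rules engine would load such configuration from storage instead of constructing it inline.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ActivityRule:
    """Configurable definition of "active" (hypothetical structure)."""
    event_type: str = "login"  # which behavior counts: "login" or "transaction"
    window_days: int = 7       # look-back window in days


def is_active(events: list, rule: ActivityRule, now: datetime) -> bool:
    """Label 2: did the member produce a qualifying event inside the window?"""
    since = now - timedelta(days=rule.window_days)
    return any(e["type"] == rule.event_type and since <= e["at"] <= now for e in events)


def eligible_for_benefit(member: dict, rule: ActivityRule, now: datetime,
                         min_level: int = 5) -> bool:
    """Combine label 1 (level above min_level) with label 2 (active in window)."""
    return member["level"] > min_level and is_active(member["events"], rule, now)
```

Switching `ActivityRule(event_type="transaction")` changes what "active" means without touching the evaluation code, which is the dynamic configurability the paragraph above calls for.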

Fitted labels

Fitted labels are the most complex. By intelligently combining and analyzing various labels, they can provide predictive descriptions or directly assign higher-level definitions. An analogy is so-called mind reading, which infers a person's mental activity from multiple features and eye movements. As a saying in machine learning goes, by observing and learning from a user's behavior over time, the machine may come to know the user better than the user knows themselves.
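The source gives no algorithm for fitted labels. As one swapped-in illustration, logistic regression can combine existing tags into a predicted probability, which is then thresholded into a new label. The features (`active_7d`, `level_above_5`), the toy training rows, and the label names `likely_buyer` and `unlikely_buyer` are all hypothetical. This is a minimal sketch, not a production model.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(xs[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(xs, ys):
            # Prediction error for this row: p - y.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(xs)
    return w, b


def predict(model, x) -> float:
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fitted_label(model, x, threshold=0.5) -> str:
    """Turn the predicted probability into a label (hypothetical names)."""
    return "likely_buyer" if predict(model, x) >= threshold else "unlikely_buyer"


# Toy rows of existing tags [active_7d, level_above_5] and a hypothetical target.
XS = [[1, 1], [1, 0], [0, 1], [0, 0]]
YS = [1, 1, 0, 0]
MODEL = fit_logistic(XS, YS)
```

In a real system the model would be trained on far more data and re-validated regularly, since fitted labels drift as behavior changes.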

III. Label management system

Hierarchical classification

Hierarchical classification is the basic means of label management. Labels are usually divided by industry, such as finance, education, and entertainment, and then managed in finer detail through multi-level classification.
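A multi-level classification can be held as a nested tree, with each leaf label identified by its full path. In this minimal sketch the taxonomy contents and the function name `label_paths` are hypothetical. Only the top-level industries come from the text above.

```python
# Hypothetical multi-level taxonomy: industry -> category -> list of labels.
TAXONOMY = {
    "finance": {"risk": ["credit score band"], "assets": ["wealth tier"]},
    "education": {"learning": ["course preference"]},
    "entertainment": {"video": ["favorite genre"]},
}


def label_paths(tree, prefix=()):
    """Yield the full classification path of every leaf label."""
    if isinstance(tree, dict):
        for key, child in tree.items():
            yield from label_paths(child, prefix + (key,))
    else:
        for label in tree:
            yield prefix + (label,)
```

Addressing labels by path keeps same-named labels in different branches distinct and makes it easy to list everything under one industry.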

Basic labels

A basic label is a key label of the data. It is precise, flat, and cannot be subdivided further, and it describes data accurately, much like metadata. When multiple basic labels are combined to describe the characteristics of data, they are managed as a structured table.

Label value type

Value types include number, dictionary, Boolean, date, text, custom, and so on; they govern the specific values a tag may take. For example, the tag "gender" has the values "male", "female", and "unknown". This kind of scenario is typically described with an enumerated dictionary.
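A tag definition can carry its value type so that incoming values are checked before they are stored. This is a minimal sketch: `ValueType`, `LabelDefinition`, and `validate` are hypothetical names, and the "custom" type is omitted because its rules would be business-specific.

```python
from datetime import date
from enum import Enum


class ValueType(Enum):
    NUMBER = "number"
    DICTIONARY = "dictionary"  # value must come from an enumerated list
    BOOLEAN = "boolean"
    DATE = "date"
    TEXT = "text"


class LabelDefinition:
    def __init__(self, name, value_type, allowed=None):
        self.name = name
        self.value_type = value_type
        self.allowed = set(allowed or ())  # only used by DICTIONARY labels

    def validate(self, value) -> bool:
        """Check that a value matches this label's declared value type."""
        if self.value_type is ValueType.NUMBER:
            # bool is a subclass of int in Python, so exclude it explicitly.
            return isinstance(value, (int, float)) and not isinstance(value, bool)
        if self.value_type is ValueType.DICTIONARY:
            return value in self.allowed
        if self.value_type is ValueType.BOOLEAN:
            return isinstance(value, bool)
        if self.value_type is ValueType.DATE:
            return isinstance(value, date)
        return isinstance(value, str)  # TEXT


gender = LabelDefinition("gender", ValueType.DICTIONARY, ["male", "female", "unknown"])
```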

IV. Label production process

1. Basic process

Data collection

Data can be collected through many channels, for example the various business lines within a single app: shopping, payments, wealth management, food delivery, news browsing, and so on. The data flows through these channels into a unified data aggregation platform. This massive volume of log data provides the basic conditions for data analysis. Data intelligence, deep learning, and algorithms all rely on large volumes of data to produce valuable analysis results.
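Before logs from different business lines can be aggregated, they need a common shape. This minimal sketch maps raw records onto one shared schema. The `UnifiedEvent` fields, the per-channel source field names, and the ISO-8601 timestamps are all hypothetical assumptions.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class UnifiedEvent:
    user_id: str
    channel: str  # business line, e.g. "shopping" or "payment"
    action: str
    at: datetime


# Hypothetical per-channel field names: (user id key, action key, timestamp key).
FIELD_MAP = {
    "shopping": ("buyer_id", "event", "ts"),
    "payment": ("payer_id", "status", "paid_at"),
}


def normalize(channel: str, raw: dict) -> UnifiedEvent:
    """Map one raw log record from a business line onto the shared schema."""
    uid_key, action_key, time_key = FIELD_MAP[channel]
    return UnifiedEvent(
        user_id=str(raw[uid_key]),
        channel=channel,
        action=raw[action_key],
        at=datetime.fromisoformat(raw[time_key]),  # assumes ISO-8601 strings
    )
```

Adding a new business line then only requires a new `FIELD_MAP` entry rather than new downstream logic.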

Data processing

Relatively accurate user labels are obtained by processing, analyzing, and extracting information from the massive data produced by the business lines above. A key step is to continually verify and repair existing user labels, especially rule labels and fitted labels.
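One way to "verify and repair" rule labels is to recompute them from current data and reconcile the result with what is stored. This minimal sketch follows that approach. The function name, the data layout, and the choice to touch only labels owned by the given rules are assumptions made for illustration.

```python
def verify_and_repair(stored: dict, users: dict, rules: dict) -> dict:
    """Recompute rule labels and fix stored labels that no longer match.

    stored: uid -> set of label names currently in the tag library (mutated)
    users:  uid -> current user record
    rules:  label name -> predicate over a user record
    Returns uid -> (labels added, labels removed) for every user that changed.
    Labels not governed by `rules` (e.g. attribute labels) are left untouched.
    """
    repairs = {}
    for uid, record in users.items():
        expected = {name for name, rule in rules.items() if rule(record)}
        current = stored.get(uid, set())
        added = expected - current
        removed = (current & set(rules)) - expected
        if added or removed:
            stored[uid] = (current - removed) | added
            repairs[uid] = (added, removed)
    return repairs
```

Returning the repairs, not just applying them, makes it possible to monitor how often a rule's stored labels drift.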

Tag library

The tag library manages the resulting labels, including complex labels and time-based labels. The label data here already has considerable value, and some paid services can be built around the tag library. A common example: after a user browses certain products in an e-commerce app, recommendations for those products appear on an information-feed platform. In the big data era, this level of intelligence can feel almost suffocating.
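Time-based labels need the library to remember past values, not just the latest one. This minimal sketch keeps a sorted history per (user, label) pair and answers "what was this label at time t". The class name `TagLibrary` and the numeric timestamps are hypothetical.

```python
import bisect
from collections import defaultdict


class TagLibrary:
    """Stores every version of a label value so it can be read as of any time."""

    def __init__(self):
        # (uid, label) -> (sorted timestamps, values in the same order)
        self._history = defaultdict(lambda: ([], []))

    def put(self, uid, label, value, ts):
        times, values = self._history[(uid, label)]
        i = bisect.bisect_right(times, ts)  # keep timestamps sorted
        times.insert(i, ts)
        values.insert(i, value)

    def get(self, uid, label, as_of):
        """Return the latest value recorded at or before as_of, or None."""
        times, values = self._history.get((uid, label), ([], []))
        i = bisect.bisect_right(times, as_of)
        return values[i - 1] if i else None
```

Keeping timestamps and values in parallel lists avoids comparing label values when two versions share a timestamp.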

Label business

After the long journey of turning data into labels, the results naturally return to the business level. By analyzing users' label data, businesses can carry out precision marketing and intelligent recommendation: e-commerce apps can increase transaction volume, and information-feed apps can attract users more effectively.
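For precision marketing, label data is typically queried to pick a target audience. In this minimal sketch, `select_audience` and the profile layout (UID mapped to a dict of label values) are hypothetical.

```python
def select_audience(profiles: dict, **required) -> list:
    """Return sorted UIDs whose labels match every required label=value pair."""
    return sorted(
        uid
        for uid, labels in profiles.items()
        if all(labels.get(name) == value for name, value in required.items())
    )
```

A call such as `select_audience(profiles, gender="female", active_7d=True)` then yields the UIDs to target with a campaign.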

Application layer

These business capabilities are developed into services and integrated into the application layer, continually improving service quality and attracting users. Meanwhile, users keep generating data at the application layer. That data flows back to the data collection service, forming a complete closed loop.

2. Data aggregation pool

  • Use ID-mapping technology to map each user's various identifiers to a single unique identifier [UID];
  • Associate labels with each UID and add them to the computing pool;
  • The labels carried by the same UID keep growing, like the snake in the classic game Snake;
  • Continually enrich the label content carried by each UID;

In this way, the scenarios in which labels can be used are enriched, generating greater data value.
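The source does not describe how ID mapping is implemented. One common approach, swapped in here, is union-find: each observed pair of identifiers that belong to the same person is linked, and every connected group becomes one UID. This is a minimal sketch. `IdMapping`, `build_pool`, and the `"device:…"` and `"phone:…"` identifier strings are hypothetical, and using the smallest identifier in a group as its UID is an arbitrary choice.

```python
from collections import defaultdict


class IdMapping:
    """Union-find over raw identifiers; each connected group is one UID."""

    def __init__(self):
        self._parent = {}

    def _find(self, x):
        self._parent.setdefault(x, x)
        while self._parent[x] != x:
            self._parent[x] = self._parent[self._parent[x]]  # path halving
            x = self._parent[x]
        return x

    def link(self, a, b):
        """Record that two identifiers (e.g. device ID, phone) are the same person."""
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            # Deterministic choice: the smaller identifier becomes the root (UID).
            self._parent[max(ra, rb)] = min(ra, rb)

    def uid(self, x):
        return self._find(x)


def build_pool(mapping: IdMapping, labelled) -> dict:
    """Attach (raw identifier, label) pairs to the UID each identifier maps to."""
    pool = defaultdict(set)
    for raw_id, label in labelled:
        pool[mapping.uid(raw_id)].add(label)
    return dict(pool)
```

As new identifier pairs are linked, labels previously recorded under separate identifiers end up under one UID, which is how the label set carried by each UID keeps growing.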