Abstract: How do your estranged EX, your forgotten junior high school classmate, your former colleague, and even the last person you want to meet — your BOSS — end up on your social networking app’s recommendation list? The key technology is link prediction over a knowledge base, also known as knowledge graph completion.

Having searched for that person a thousand times among the crowd, you suddenly look back — and there they are, in your recommendation list.

One of the most striking capabilities of social apps is their deep mining of user relationships. Even if you have blocked someone’s phone number, WeChat, and every social media account, they can still appear in the “People you may know” section of your page. These include your EX, your forgotten junior high classmate, a former colleague, and even the last person you want to meet — your BOSS.

▲ Douyin – Discover friends

So, how did these people end up on your list?

The key technology is link prediction over a knowledge base, also known as knowledge graph completion.

What is a knowledge graph?

A knowledge graph (KG) is a multi-relational graph that expresses knowledge as structured triples, composed of entities, concepts, and relationships.

Entities are things in the real world, such as people, places, and institutions. A concept refers to a collection of entities with the same characteristics, such as “athlete” or “Golden Ball”, as shown below. Relationships express some kind of connection between different entities.

A knowledge graph combines entities and relationships to model real-world scenarios intuitively. Constructing a knowledge graph is essentially a process of building cognition and understanding the world.

How to complete the knowledge graph

Take Xiao Ming, who works at Sina’s office in Wudaokou: the system can infer that he works in Beijing, and then recommend Xiao Wang, who also works at Sina in Beijing. In the figure below, the blue arrows represent existing relationships, and the red arrows represent relationships added by knowledge graph completion.
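The Xiao Ming example can be sketched as a tiny rule-based completion over a set of triples. This is an illustrative toy, not the production system; the relation names `works_at`, `located_in`, and `works_in` are made up for the sketch:

```python
# Toy knowledge graph completion via a hand-written composition rule:
# works_at(x, c) and located_in(c, city)  =>  works_in(x, city)
triples = {
    ("XiaoMing", "works_at", "Sina"),
    ("XiaoWang", "works_at", "Sina"),
    ("Sina", "located_in", "Beijing"),
}

def complete(triples):
    """Return the new triples inferred by the composition rule."""
    inferred = set()
    for (person, r1, company) in triples:
        if r1 != "works_at":
            continue
        for (c, r2, city) in triples:
            if r2 == "located_in" and c == company:
                inferred.add((person, "works_in", city))
    return inferred

inferred = complete(triples)  # both Xiao Ming and Xiao Wang work in Beijing
```

Real systems replace such hand-written rules with learned embeddings, as described below, but the input and output are the same: known triples in, missing triples out.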

The relationship between knowledge graph and knowledge representation learning

A knowledge graph is composed of entities and relations, usually represented as triples — head, relation, and tail, abbreviated as (h, r, t). The knowledge representation learning task is to learn distributed representations of h, r, and t (also known as the embedding representation of the knowledge graph). One could say that AI-style knowledge graph applications become possible only once the knowledge graph has been embedded.

How should we understand the meaning of “embedding”?

Simply put, an embedding describes an object (a character, word, sentence, article…) along multiple dimensions — it is equivalent to describing the object through data modeling.

For example, the RGB representation of colors that we often use in Photoshop is an atypical embedding. Here a color is broken down into three feature dimensions: R (red intensity, 0–255), G (green intensity, 0–255), and B (blue intensity, 0–255). RGB(0, 0, 0) is black; RGB(41, 36, 33) is ivory black. In this way, we can describe colors with numbers.
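The RGB analogy can be made concrete in a few lines: once colors are vectors, “how different are two colors” becomes an ordinary distance computation — exactly what embeddings buy us for entities. A minimal sketch:

```python
# An RGB color is a tiny 3-dimensional "embedding": each axis is an
# interpretable feature (red, green, blue intensity, each 0-255).
black = (0, 0, 0)
ivory_black = (41, 36, 33)

def color_distance(c1, c2):
    # Euclidean (L2) distance between two colors in RGB space.
    return sum((a - b) ** 2 for a, b in zip(c1, c2)) ** 0.5

d = color_distance(black, ivory_black)  # small: the two blacks are close
```

Learned embeddings work the same way, except the dimensions are discovered by the model rather than defined by hand.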

1. Methods of knowledge representation learning

The key to knowledge representation learning is designing a reasonable score function: we want the score to be maximized when a triple describes a true fact. In terms of implementation, these methods fall into two types:

  • Structure-based approach

The basic idea of this class of models is to learn entity and relation representations from the structure of the triples themselves; the most classic algorithm is the TransE model. Its core idea is that the head vector h plus the relation vector r should be as close as possible to the tail vector t, i.e. h + r ≈ t. The “closeness” here can be measured with the L1 or L2 norm. The schematic diagram is as follows:

Such knowledge representation learning models include: TransH, TransR, TransD, TransA, etc.

  • Semantically based approach

These models learn representations of the KG’s entities and relations from a semantic-matching perspective. Representative methods include LFM, DistMult, ComplEx, ANALOGY, ConvE, etc.

2. Applications of knowledge representation learning

Thanks to representation learning, the entities and relations of a knowledge graph can be vectorized, which facilitates computation in downstream tasks. Typical applications are as follows:

1) Similarity calculation: using distributed representations of entities, we can quickly compute the semantic similarity between entities, which is of great value for many natural language processing and information retrieval tasks.

How do you calculate the similarity? Let me give you an example.

Suppose the embedding of “Li Bai” has 5 dimensions with the value [0.3, 0.5, 0.7, 0.03, 0.02], where each dimension represents the degree of correlation with some concept; the five values respectively correspond to poet, writer, litterateur, freelancer, and knight-errant.

Meanwhile, “Wang Wei” = [0.3, 0.55, 0.7, 0.03, 0.02] and “Newton” = [0.01, 0.02, 0.06, 0.4, 0.01]. We can use cosine similarity (in geometry, the cosine of the angle between two vectors measures the difference in their directions; machine learning borrows this concept to measure differences between sample vectors) to calculate the distance between these embeddings. Clearly, Li Bai is close to Wang Wei and far from Newton, so we can conclude that “Li Bai” and “Wang Wei” are more similar.
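The comparison above can be computed directly with the vectors given in the text:

```python
import numpy as np

# The 5-d toy embeddings from the example:
# dimensions = (poet, writer, litterateur, freelancer, knight-errant)
li_bai   = np.array([0.3,  0.5,  0.7,  0.03, 0.02])
wang_wei = np.array([0.3,  0.55, 0.7,  0.03, 0.02])
newton   = np.array([0.01, 0.02, 0.06, 0.4,  0.01])

def cosine_sim(a, b):
    """Cosine of the angle between two vectors: 1 = same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Li Bai is far more similar to Wang Wei than to Newton.
assert cosine_sim(li_bai, wang_wei) > cosine_sim(li_bai, newton)
```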

2) Knowledge graph completion. To build a large-scale knowledge graph, relationships between entities must be continuously supplemented. A knowledge representation learning model can predict the relationship between two entities — a task generally called link prediction over a knowledge base, also known as knowledge graph completion. The earlier example of Xiao Ming in Wudaokou illustrates this well.
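Link prediction with embeddings reduces to ranking: given a head entity and a relation, score every candidate tail and pick the best. A minimal sketch using TransE-style distance, with made-up 2-d embeddings chosen for illustration:

```python
import numpy as np

# Toy embeddings (illustrative values, not trained).
entities = {
    "XiaoMing": np.array([0.1, 0.1]),
    "Beijing":  np.array([0.6, 0.4]),
    "Shanghai": np.array([0.9, 0.9]),
}
relations = {"works_in": np.array([0.5, 0.3])}

def predict_tail(head, rel):
    """Rank candidate tails by ||h + r - t|| and return the closest."""
    h, r = entities[head], relations[rel]
    return min(
        (e for e in entities if e != head),
        key=lambda e: np.linalg.norm(h + r - entities[e]),
    )

city = predict_tail("XiaoMing", "works_in")
```

Here h + r lands exactly on Beijing’s embedding, so the predicted missing triple is (XiaoMing, works_in, Beijing) — the same inference the social app makes before filling your recommendation list.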

3) Other applications. Knowledge representation learning has been widely used in relation extraction, automatic question answering, entity linking and other tasks, showing great application potential.

Question answering is an application deeply combined with knowledge representation learning. The backend of an intelligent question answering product is generally divided into three layers: the input layer, the representation layer, and the output layer. The input layer is essentially the question bank — the collection of all questions a user might ask. After knowledge extraction in the representation layer, the output layer returns the final result.

Typical intelligent question answering products include Apple Siri, Microsoft Xiaoice, Baidu, Ali Xiaomi, and so on. One feature of these Q&A products is that they make search results more precise, rather than returning a pile of similar pages for you to sift through yourself. A search for “how much is Wang Sicong worth,” for example, returns a specific number.

3. Summary

In short, with knowledge-graph-based completion technology, social products predict missing triples through representations of entities and relations — predicting the tail entity from a known head entity and relation. In other words, they build on your user profile; if you don’t want “old acquaintances” to appear in your recommendation list, it’s best to turn off geolocation in social products and reveal as little personal information as possible.

References

1. Liu Zhiyuan, Sun Maosong, Lin Yankai, Xie Ruobing. Research Progress in Knowledge Representation Learning.
