There are two little tigers at home, and they want to wish everyone a happy Year of the Tiger, ha ha ha. Here they are!

Before the holiday I came across a knowledge graph paper on arXiv about link prediction. I only skimmed it at the time; the authors are from Tencent, so the method has presumably been tested on real business data. I didn't have time to read it then and saved it as a TODO. Over the past couple of days I found some spare time to go through it roughly, so here is a simple share with everyone. Experts, please feel free to offer corrections ~

1. Core summary

Link prediction is a very important task in the knowledge graph field, and knowledge representation learning (knowledge graph embedding) plays a big role in it. Prior work on knowledge graph embedding can be roughly divided into two families: translation-based distance models and semantic matching models.

The translation-based distance models use distance-based scoring functions: different relations between nodes are expressed by designing a distance evaluation method. Typical models of this kind are the series of translation models starting from TransE and its subsequent variants, such as TransH, RotatE, HAKE, etc. There is a previous article devoted to these translation models, which you can check out. Although translation-distance representations of entities and relations can be quite expressive, these models have difficulty predicting entities that have not appeared before.
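
For intuition on what a distance-based scoring function looks like, TransE treats a relation as a translation vector: a triple (h, r, t) is plausible when h + r lands near t. Below is a minimal illustrative sketch with toy embeddings (not taken from the paper):

```python
import numpy as np

def transe_score(h: np.ndarray, r: np.ndarray, t: np.ndarray, p: int = 1) -> float:
    """TransE-style score: the smaller ||h + r - t||_p, the more plausible (h, r, t)."""
    return float(np.linalg.norm(h + r - t, ord=p))

# Toy 4-dimensional embeddings (illustrative values only).
h = np.array([0.1, 0.2, 0.0, 0.3])
r = np.array([0.0, 0.1, 0.1, 0.0])
t = np.array([0.1, 0.3, 0.1, 0.3])
print(transe_score(h, r, t))  # small distance -> plausible triple
```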

The second family is the semantic matching methods, which are less affected by cold start: a representation for an unseen entity can be derived from its textual context. Well-known representative models of this family include KG-BERT, MLMLM, StAR, etc. These methods also have corresponding disadvantages: in the pre-training stage only contextual knowledge is learned while relational information is ignored; in addition, the model structure is usually complex, and it is difficult to construct a high proportion of negative samples, resulting in insufficient learning from negative samples during training.
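
As an illustration of the semantic matching style (this is the general KG-BERT recipe, not LP-BERT's exact input format): the textual descriptions of the head, relation, and tail are concatenated into a single sequence, which a BERT classifier then scores as plausible or not.

```python
def kg_bert_style_input(head_text: str, relation_text: str, tail_text: str) -> str:
    """Build the single text sequence that a KG-BERT-style model scores as
    plausible/implausible via its [CLS] representation."""
    return f"[CLS] {head_text} [SEP] {relation_text} [SEP] {tail_text} [SEP]"

print(kg_bert_style_input(
    "harmonica: a small wind instrument",
    "hypernym",
    "instrument: a device for making music",
))
```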

2. Improving on the shortcomings

To address the problems above, namely the weak ability to predict unseen entities and the inadequate training of semantic matching models (with respect to relational information and negative-sample construction), the paper proposes a new knowledge graph BERT pre-training framework, LP-BERT, which is in essence a semantic matching method. It tackles the problems in two main parts:

First, a multi-task learning strategy is used during pre-training: besides MLM for learning contextual knowledge, two tasks, MEM and MRM, are introduced to learn the semantics of entities and the relations between them from the triples of the knowledge graph. This converts the structured information of the knowledge graph into unstructured information embedded in the training process. Second, inspired by the recently popular contrastive learning, negative sampling of triples is performed within a training batch, which greatly increases the proportion of negative samples while keeping the training time unchanged and alleviates the insufficient training caused by a low negative-sampling ratio. In addition, to improve the diversity of training samples, a data augmentation method based on the reverse relation of triples is proposed.

3. A rough look at the model

LP-BERT's model structure is mainly divided into two parts. The following figure shows the overall architecture of LP-BERT, which consists of a multi-task pre-training stage and a knowledge fine-tuning stage. The multi-task pre-training stage includes the MLM, MEM and MRM tasks.

1. Pre-training

The following diagram shows the structure of multi-task pre-training. Different colors represent different meanings, and different dashed boxes represent different pre-training tasks. In the figure, E_h and D_h denote the head entity and its corresponding description text, R denotes the relation between the entities in the triple, and E_t and D_t denote the tail entity and its description. X^{[MASK]} denotes a word masked during pre-training, and X^{[PAD]} denotes the padding used to complete a fixed-length slot. MEM_h denotes head-entity masking, MEM_t denotes tail-entity masking, MRM denotes relation masking, and MLM denotes the masked language model proposed in the original BERT.

Mask Entity Modeling (MEM): For the semantics-based entity prediction task, since each triple contains two entities, a head entity and a tail entity, two different sub-tasks are designed: head-entity prediction and tail-entity prediction. The first dashed box in the figure above shows head-entity prediction: the blue font marks the head-entity information, i.e. the masked tokens X^{[MASK]} and their ground-truth labels, while the red font marks the words randomly masked by MLM and their true words. The second dashed box is the same as the first, except that the tail entity is masked instead.
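
A minimal sketch of how a head-entity (MEM_h) sample might be constructed; the token layout and helper names are my own assumptions for illustration, not the paper's code:

```python
from typing import List

MASK, PAD, SEP, CLS = "[MASK]", "[PAD]", "[SEP]", "[CLS]"

def build_mem_h_sample(e_h: List[str], d_h: List[str], r: List[str],
                       e_t: List[str], d_t: List[str], max_entity_len: int = 8) -> List[str]:
    """Mask every token of the head entity E_h and pad the masked slot to a fixed
    length, so entities of different token lengths can be predicted."""
    masked_head = [MASK] * len(e_h) + [PAD] * (max_entity_len - len(e_h))
    # The head entity is hidden; its description D_h, the relation R, and the tail side stay visible.
    return [CLS] + masked_head + d_h + [SEP] + r + [SEP] + e_t + d_t + [SEP]

tokens = build_mem_h_sample(
    e_h=["harmonica"], d_h=["a", "small", "wind", "instrument"],
    r=["hypernym"], e_t=["instrument"], d_t=["a", "device", "that", "makes", "music"],
)
print(tokens)
```

The tail-entity (MEM_t) and relation (MRM) samples would be built the same way, just masking a different slot.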

Mask Relation Modeling (MRM): For the relation prediction task, the sample construction strategy is similar to that of MEM: the relation is masked and predicted while the head and tail entities and their descriptions in the triple are preserved.

Mask Language Modeling (MLM): In order to coexist with MEM and MRM, and unlike BERT's random masking over all words in the sequence, the MLM used in this paper only performs random masking locally, within a specific text range of the sample. For example, for the head-entity prediction task, random masking is only applied to the tail entity (E_t) and its description (D_t), so the information in the head-entity region is left untouched. The tail-entity prediction and relation prediction tasks use similar strategies.
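
A rough sketch of this "local" MLM masking: random masking is applied only to the span that is not reserved for the entity/relation prediction target. The helper name and the 15% rate are assumptions, not from the paper:

```python
import random
from typing import List

MASK = "[MASK]"

def local_mlm_mask(tokens: List[str], start: int, end: int,
                   mask_prob: float = 0.15, seed: int = 0) -> List[str]:
    """Randomly mask tokens only inside [start, end); everything outside the span
    (e.g. the head-entity region reserved for the MEM task) is left untouched."""
    rng = random.Random(seed)
    out = list(tokens)
    for i in range(start, end):
        if rng.random() < mask_prob:
            out[i] = MASK
    return out

# For head-entity prediction, random masking would only touch the tail side (E_t, D_t).
```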

Loss design: The sample-construction strategies of the MEM and MRM tasks are mutually exclusive, so the three kinds of training samples built from the same input each mask a different item and cannot predict the head entity, tail entity and relation at the same time. To ensure the generalization ability of the model, the MEM and MRM tasks are merged into a MIM (Mask Item Model) task, and a joint loss is defined over the MLM and MIM objectives (the exact formula is in the paper).
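
Since the loss figure is not reproduced here, the following is only a plausible sketch of what such a joint objective could look like, assuming standard cross-entropy over masked positions; the paper's own formula is authoritative:

```latex
% Assumed form, not copied from the paper: cross-entropy over masked positions,
% with the MEM and MRM masks merged into a single "item" mask (MIM).
\mathcal{L}_{\mathrm{MLM}} = -\sum_{i \in \mathcal{M}_{\mathrm{MLM}}} \log p_\theta\!\left(x_i \mid X^{\mathrm{masked}}\right), \qquad
\mathcal{L}_{\mathrm{MIM}} = -\sum_{j \in \mathcal{M}_{\mathrm{MIM}}} \log p_\theta\!\left(x_j \mid X^{\mathrm{masked}}\right), \qquad
\mathcal{L}_{\mathrm{pretrain}} = \mathcal{L}_{\mathrm{MLM}} + \mathcal{L}_{\mathrm{MIM}}.
```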

2. Knowledge fine-tuning

The fine-tuning stage mainly consists of two parts. One is an improved negative sampling based on the idea of contrastive learning: negative samples are constructed within a batch of training samples, which addresses the earlier methods' problems of insufficient negative-sample construction and inadequate training. The other is data augmentation during training by reversing the relation of the original triple; for example, a head-entity prediction sample (?, R, E_t) is converted into (E_t, R_{rev}, ?). In addition, two distance measures are designed and jointly used to compute the loss (see the paper for the formulas). A rough sketch of the in-batch negatives and the reverse-relation augmentation follows.
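
The sketch below illustrates the two ideas under my own naming and with an assumed cosine-similarity objective; it is not LP-BERT's actual loss, just the general in-batch-negative pattern plus reverse-triple augmentation:

```python
from typing import List, Tuple
import torch
import torch.nn.functional as F

Triple = Tuple[str, str, str]  # (head, relation, tail)

def add_reverse_triples(triples: List[Triple]) -> List[Triple]:
    """Data augmentation: for every (h, r, t), also train on (t, r_rev, h)."""
    return triples + [(t, f"{r}_rev", h) for h, r, t in triples]

def in_batch_negative_loss(query_emb: torch.Tensor, tail_emb: torch.Tensor,
                           temperature: float = 0.05) -> torch.Tensor:
    """Contrastive-style loss: for the i-th (h, r) query, the i-th tail is the
    positive and every other tail in the batch serves as a negative."""
    logits = F.cosine_similarity(query_emb.unsqueeze(1), tail_emb.unsqueeze(0), dim=-1) / temperature
    targets = torch.arange(query_emb.size(0))
    return F.cross_entropy(logits, targets)

# Example: a batch of 4 (head, relation) queries and 4 candidate tails, 128-dim embeddings.
q, t = torch.randn(4, 128), torch.randn(4, 128)
print(in_batch_negative_loss(q, t).item())
```

With a batch of size B, each query sees B-1 negatives "for free", which is how the negative-sample ratio grows without extra training time.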

4. Experimental results

1. Data sets

The datasets are WN18RR, FB15K-237 and UMLS. The relevant data distribution statistics are as follows.

2. Experimental results

The following are the experimental results, divided into translation-based distance models and semantic matching models. The results are all favorable, and on the WN18RR dataset the relevant evaluation metrics are improved significantly.

The following figure shows the effect of initializing three different semantic matching models with different pre-training models on WN18RR. From the experimental results, LP-BERT-Base gives a clear improvement over RoBERTa-Base and RoBERTa-Large.

Other relevant details can be found by downloading and reading the paper.

Thank you.

Share the paper

Paper: LP-BERT: Multi-task Pre-training Knowledge Graph BERT for Link Prediction

Arxiv: arxiv.org/abs/2201.04…

References

  1. LP-BERT: Multi-task Pre-training Knowledge Graph BERT for Link Prediction

PS: Welcome to follow the "AI Natural Language Processing and Knowledge Graph" public account.