
ERNIE 1.0

ERNIE stands for Enhanced Representation through kNowledge IntEgration. To let the model learn the knowledge contained in massive amounts of text, Baidu proposed the ERNIE model, which masks words, entities, and other semantic units, enabling the model to learn semantic representations of complete concepts.

 

The innovation points

Multi-stage knowledge masking strategy

To strengthen the pre-trained language model, ERNIE uses a multi-stage knowledge masking strategy that adds two masking strategies (phrase-level and entity-level) on top of the basic-level masking strategy used by BERT. With this improvement, ERNIE can potentially learn longer semantic dependencies than BERT and generalizes better; a toy sketch follows the three strategies below.

 

Basic-level strategy: the masking strategy used in BERT. During training, 15% of the tokens are randomly masked, and a Transformer model is trained to predict the masked tokens.

 

Phrase-level strategy: use lexical analysis and chunking tools to obtain phrase boundaries in a sentence, randomly mask phrases made up of multiple words, and train the model to predict the masked phrases.

 

Entity-level strategy: first identify the entities in a sentence (such as person names, locations, etc.), then mask a random subset of them and train the model to predict these entities.
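
To make the three levels concrete, here is a minimal Python sketch (not Baidu's implementation) of span masking: each level differs only in which spans are eligible, and a masked span is always covered as a whole. The example sentence, the hand-written phrase/entity boundaries, and the 15% budget are illustrative.

```python
import random

MASK = "[MASK]"

def mask_spans(tokens, spans, mask_prob=0.15):
    """Mask whole spans (each span is a (start, end) index pair) until
    roughly `mask_prob` of the tokens are covered; a masked span is
    replaced token-by-token with [MASK]."""
    tokens = list(tokens)
    budget = max(1, int(len(tokens) * mask_prob))
    covered = 0
    for start, end in random.sample(spans, len(spans)):
        if covered >= budget:
            break
        for i in range(start, end):
            tokens[i] = MASK
        covered += end - start
    return tokens

sentence = ["harry", "potter", "is", "a", "series", "of", "fantasy",
            "novels", "written", "by", "j", ".", "k", ".", "rowling"]

# Basic level: every token is its own span (BERT-style random masking).
token_spans  = [(i, i + 1) for i in range(len(sentence))]
# Phrase level: boundaries would come from a chunking tool
# (hand-written here for illustration).
phrase_spans = [(0, 2), (4, 8), (8, 10), (10, 15)]
# Entity level: spans produced by a named-entity recognizer
# (again hand-written here).
entity_spans = [(0, 2), (10, 15)]

print("basic :", mask_spans(sentence, token_spans))
print("phrase:", mask_spans(sentence, phrase_spans))
print("entity:", mask_spans(sentence, entity_spans))
```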

 

Pre-training on a multi-source corpus

Baidu Encyclopedia corpus: contains articles written in formal language.

Baidu News corpus: provides up-to-date movie names, actor names, team names, etc.

Baidu Tieba corpus: posts can be treated as dialogues, and replies in the same thread often have similar semantics. Based on this assumption, ERNIE uses a DLM (Dialogue Language Model) task for modeling, further improving the model's semantic representation ability.
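
As a rough illustration of the DLM idea, the sketch below packs a query-response pair from a thread into one input and marks each token's role with a parallel Q/R "dialogue" sequence, analogous to BERT's segment embedding. The token values and the Q/R scheme are assumptions for illustration, not Baidu's exact preprocessing.

```python
def build_dlm_example(query_tokens, response_tokens):
    # Concatenate the two turns into one sequence and record, per token,
    # whether it belongs to the query (Q) or the response (R).
    tokens = ["[CLS]"] + query_tokens + ["[SEP]"] + response_tokens + ["[SEP]"]
    roles  = ["Q"] * (len(query_tokens) + 2) + ["R"] * (len(response_tokens) + 1)
    return tokens, roles

tokens, roles = build_dlm_example(
    ["which", "city", "has", "the", "best", "food"],
    ["chengdu", ",", "no", "question"],
)
for role, token in zip(roles, tokens):
    print(f"{role}: {token}")
```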

ERNIE 2.0

ERNIE 2.0 adds more pre-training tasks on top of 1.0. The pre-training tasks fall into three categories: lexical, structural, and semantic. The model learns these tasks continually, acquiring knowledge at the lexical, structural, and semantic levels, which greatly improves its level of semantic understanding and noticeably improves performance on downstream tasks.

The innovation points

Task Embedding

A task embedding is added on top of the embeddings used in 1.0. It records the task ID and is used to distinguish the different pre-training tasks.
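
A minimal sketch of how such a task embedding can enter the input, assuming the usual token + segment + position embeddings: the same task vector is added to every token of a batch so the shared encoder knows which pre-training task it is serving. The dimensions and the number of tasks below are made up for illustration, not ERNIE's actual configuration.

```python
import numpy as np

HIDDEN = 8
VOCAB, SEGMENTS, MAX_LEN, NUM_TASKS = 100, 2, 32, 7

rng = np.random.default_rng(0)
token_emb    = rng.normal(size=(VOCAB, HIDDEN))
segment_emb  = rng.normal(size=(SEGMENTS, HIDDEN))
position_emb = rng.normal(size=(MAX_LEN, HIDDEN))
task_emb     = rng.normal(size=(NUM_TASKS, HIDDEN))   # one row per task ID

def embed(token_ids, segment_ids, task_id):
    positions = np.arange(len(token_ids))
    return (token_emb[token_ids]
            + segment_emb[segment_ids]
            + position_emb[positions]
            + task_emb[task_id])          # same task vector added to every token

x = embed(token_ids=[5, 17, 42], segment_ids=[0, 0, 1], task_id=3)
print(x.shape)  # (3, 8)
```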

The lexical layer

Knowledge masking task: predict the masked content using the multi-stage knowledge masking strategy described above.

The structure layer

Sentence reordering task: the input is three sentences whose order has been shuffled; the model predicts the correct permutation from among all possible permutations, thereby learning the logical and chronological order between sentences.
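
A toy sketch of how such a reordering example might be built: the segments are shuffled, and the label is the index of the applied permutation among all k! possibilities (k = 3 here, matching the description above). The helper and the example segments are illustrative, not ERNIE's data pipeline.

```python
import itertools
import random

def make_reordering_example(segments):
    k = len(segments)
    perms = list(itertools.permutations(range(k)))   # all k! possible orders
    perm = random.choice(perms)                      # the shuffle we apply
    shuffled = [segments[i] for i in perm]
    label = perms.index(perm)                        # the class the model must recover
    return shuffled, label, len(perms)

segments = ["ERNIE was proposed by Baidu.",
            "It masks phrases and entities.",
            "This helps it learn longer dependencies."]
shuffled, label, num_classes = make_reordering_example(segments)
print(shuffled, "-> class", label, "of", num_classes)
```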

Sentence distance task: given any two sentences, the model predicts whether they are adjacent and whether they belong to the same article, so as to judge whether their semantics are close and whether they share the same topic.
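
One plausible way to construct sentence distance examples, sketched below with three classes: adjacent sentences from one article, non-adjacent sentences from the same article, and sentences from different articles. The sampling code and class numbering are assumptions for illustration.

```python
import random

def make_distance_example(articles):
    label = random.randrange(3)
    if label == 0:                                     # adjacent pair, same article
        doc = random.choice([a for a in articles if len(a) >= 2])
        i = random.randrange(len(doc) - 1)
        pair = (doc[i], doc[i + 1])
    elif label == 1:                                   # same article, not adjacent
        doc = random.choice([a for a in articles if len(a) >= 3])
        while True:
            i, j = sorted(random.sample(range(len(doc)), 2))
            if j - i > 1:
                break
        pair = (doc[i], doc[j])
    else:                                              # different articles
        doc_a, doc_b = random.sample(articles, 2)
        pair = (random.choice(doc_a), random.choice(doc_b))
    return pair, label

articles = [
    ["A-1.", "A-2.", "A-3.", "A-4."],
    ["B-1.", "B-2.", "B-3."],
]
print(make_distance_example(articles))
```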

The semantic layer

Logical relation prediction task: conjunctions between sentences generally express the logical relationship between them. This task uses such conjunctions to classify logical relations without manual labeling, modeling fine-grained semantics.
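
A rough sketch of how conjunctions can supply labels for free: when the second sentence begins with a known connective, the connective's class becomes the label and the connective is stripped from the input. The connective list and relation classes here are illustrative, not the inventory ERNIE 2.0 actually uses.

```python
# Illustrative mapping from connectives to coarse relation classes.
CONNECTIVE_CLASSES = {
    "because": "causal",
    "so": "causal",
    "but": "contrast",
    "however": "contrast",
    "and": "expansion",
    "for example": "expansion",
}

def label_pair(sentence_a, sentence_b):
    lowered = sentence_b.lower()
    for connective, relation in CONNECTIVE_CLASSES.items():
        if lowered.startswith(connective + " "):
            # Remove the connective so the model cannot read the answer off the input.
            stripped = sentence_b[len(connective):].lstrip(" ,")
            return (sentence_a, stripped), relation
    return (sentence_a, sentence_b), None   # no connective -> unlabeled pair

pair, relation = label_pair("The roads were icy.", "So the game was cancelled.")
print(pair, "->", relation)
```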

Continual multi-task learning

If each stage of training uses only a single pre-training task, the model easily forgets the knowledge learned from earlier tasks. But if all tasks are learned together, every pre-training task has to be assembled before each training run, and whenever a new task is added, learning has to start from scratch. Version 2.0 instead trains its tasks continually: whenever a new pre-training task is added, the model is initialized with the previously learned parameters and the new task is trained together with the original tasks. New tasks can therefore be added at any time, and previously learned knowledge is not forgotten.
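
The sketch below mimics this schedule in a few lines (a schematic, not Baidu's training code): each time a task is added, training resumes from the current parameters, and batches are sampled from the new task and all earlier tasks together, so earlier knowledge keeps being rehearsed.

```python
import random

def train_step(params, task):
    # Placeholder for one optimizer step on the shared encoder plus the
    # task-specific head; here we just record which task touched the parameters.
    params.setdefault("steps", []).append(task)
    return params

def continual_multitask_training(task_stages, steps_per_stage=4):
    params = {}                                  # shared model parameters, reused across stages
    seen_tasks = []
    for new_task in task_stages:
        seen_tasks.append(new_task)              # keep every earlier task in the mix
        for _ in range(steps_per_stage):
            task = random.choice(seen_tasks)     # sample old and new tasks together
            params = train_step(params, task)
    return params

params = continual_multitask_training(
    ["knowledge_masking", "sentence_reordering", "sentence_distance", "discourse_relation"]
)
print(params["steps"])
```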