This paper introduces a new language representation model, BERT (Bidirectional Encoder Representations from Transformers). Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by conditioning on both left and right context in all layers. BERT is the first fine-tuning-based representation model to achieve state-of-the-art performance on a large number of sentence-level and token-level tasks, outperforming many systems with task-specific architectures and setting new state-of-the-art results on 11 NLP tasks.

BERT related Resources

How to apply BERT to Chinese and to small datasets: methods and experimental results

Title | Notes | Date
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | original paper | 20181011
Reddit discussion | discussion with the authors
BERT-pytorch | PyTorch implementation of Google AI's 2018 BERT
BERT Model and Fine-Tuning | paper interpretation by Xi Xiangyu
The strongest NLP pre-training model! Google BERT sweeps 11 NLP task records | paper analysis
【NLP】Google BERT | reading notes by Lee
How to evaluate the BERT model? | key takeaways from the paper
Detailed interpretation of BERT, NLP's breakthrough achievement | reading notes by Octopus Maruko
Interpretation of BERT, Google's strongest NLP model | AI Technology Review
To pre-train BERT: how it was done in TensorFlow before the official code was released | paper reproduction | 20181030
Google has finally open-sourced the BERT code: 300 million parameters, a full analysis | Machine Heart | 20181101
Why does BERT work so well? | 20181121
The ultimate BERT fine-tuning practice | a guide to fine-tuning BERT on Chinese datasets | 20181123
An open-source BERT implementation that delivers significant improvements with minimal data | 20181127

Key points of the BERT paper

Model structure

The main building block, the Transformer, comes from "Attention Is All You Need".
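The Transformer block is built around scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. As a quick reference, here is a minimal NumPy sketch of that single operation (illustrative only, not the BERT or Transformer source code):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V from "Attention Is All You Need"."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # pairwise query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ V                                   # attention-weighted sum of values

# Toy usage: 4 positions, head dimension 8
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)       # (4, 8)
```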

Model input

Pre-training method

Masked language modeling (a cloze task) and next-sentence prediction.
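To make the cloze objective concrete, here is a toy Python sketch of the corruption scheme the paper describes: about 15% of the WordPiece tokens are chosen as prediction targets, and of those 80% are replaced with [MASK], 10% with a random token, and 10% are left unchanged. This is illustrative only, not the authors' data pipeline:

```python
import random

def mask_tokens(tokens, vocab, select_prob=0.15):
    """Corrupt a token sequence for the masked-LM ("cloze") objective.
    Returns the corrupted sequence and a dict {position: original token}
    that the model is trained to predict."""
    corrupted, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if random.random() < select_prob:            # ~15% of tokens become prediction targets
            targets[i] = tok
            r = random.random()
            if r < 0.8:
                corrupted[i] = "[MASK]"              # 80%: replace with the [MASK] token
            elif r < 0.9:
                corrupted[i] = random.choice(vocab)  # 10%: replace with a random token
            # remaining 10%: keep the original token unchanged
    return corrupted, targets

# Toy usage with a hypothetical mini-vocabulary
print(mask_tokens(["my", "dog", "is", "hairy"], ["my", "dog", "is", "hairy", "cute"]))
```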

Experiments

Model analysis

Effect of Pre-training Tasks

Effect of Model Size

Effect of Number of Training Steps

Feature-based Approach with BERT

Conclusion

Recent empirical improvements due to transfer learning with language models have demonstrated that rich, unsupervised pre-training is an integral part of many language understanding systems. In particular, these results enable even low-resource tasks to benefit from very deep unidirectional architectures. Our major contribution is further generalizing these findings to deep bidirectional architectures, allowing the same pre-trained model to successfully tackle a broad set of NLP tasks. While the empirical results are strong, in some cases surpassing human performance, important future work is to investigate the linguistic phenomena that may or may not be captured by BERT.


BERT related Resources


BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova (Submitted on 11 Oct 2018)

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE benchmark to 80.4% (7.6% absolute improvement), MultiNLI accuracy to 86.7% (5.6% absolute improvement) and the SQuAD v1.1 question answering Test F1 to 93.2 (1.5% absolute improvement), outperforming human performance by 2.0%. Comments: 13 pages

Abstract: This paper introduces a new language representation model, BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018; Radford et al., 2018), BERT aims to pre-train deep bidirectional representations based on the left and right context in all layers. Thus, pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for many tasks, such as question answering and language inference, without much modification to the task-specific architecture.

BERT is conceptually simple, but empirically powerful. It sets new state-of-the-art results on 11 NLP tasks, including pushing the GLUE benchmark to 80.4% (an absolute improvement of 7.6%), MultiNLI accuracy to 86.7% (an absolute improvement of 5.6%), and SQuAD v1.1 question answering Test F1 to 93.2 (an improvement of 1.5 points), which is 2.0 points higher than human performance.

Subjects: Computation and Language (cs.CL). Cite as: arXiv:1810.04805 [cs.CL] (arXiv:1810.04805v1 for this version). Submitted by Jacob Devlin, Thu, 11 Oct 2018.
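The abstract's claim that fine-tuning needs "just one additional output layer" can be pictured as follows: for a sentence-level task, the pooled [CLS] representation is fed into a single newly initialized linear classifier and the whole network is fine-tuned end to end. Below is a hedged PyTorch sketch; the `bert_encoder` interface (returning a pooled vector) and the hidden size of 768 for BERT-base are assumptions for illustration, not the authors' code:

```python
import torch.nn as nn

class BertSentenceClassifier(nn.Module):
    """Pre-trained BERT encoder plus the single task-specific output layer."""
    def __init__(self, bert_encoder, hidden_size=768, num_labels=2):
        super().__init__()
        self.bert = bert_encoder                              # assumed to return a pooled [CLS] vector
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(hidden_size, num_labels)  # the only newly initialized weights

    def forward(self, input_ids, token_type_ids, attention_mask):
        pooled = self.bert(input_ids, token_type_ids, attention_mask)  # (batch, hidden_size)
        return self.classifier(self.dropout(pooled))                   # (batch, num_labels) logits
```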

Reddit discussion

google-research/bert: the official release

Google recently released BERT, a large-scale pre-trained language model based on the bidirectional Transformer, which efficiently extracts textual information and can be applied to a variety of NLP tasks. This work broke the state-of-the-art records on 11 NLP tasks. If this kind of pre-training holds up in practice, so that NLP tasks can be handled well with very little data, BERT will become a veritable backbone network.

Introduction

BERT, or Bidirectional Encoder Representations from Transformers, is a new method of pre-training language representations which obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks.

Our academic paper which describes BERT in detail and provides full results on a number of tasks can be found here: arxiv.org/abs/1810.04805.

To give a few numbers, here are the results on the SQuAD v1.1 question answering task:

SQuAD v1.1 Leaderboard (Oct 8th 2018) | Test EM | Test F1
1st Place Ensemble – BERT | 87.4 | 93.2
2nd Place Ensemble – nlnet | 86.0 | 91.7
1st Place Single Model – BERT | 85.1 | 91.8
2nd Place Single Model – nlnet | 83.5 | 90.1

And several natural language inference tasks:

System | MultiNLI | Question NLI | SWAG
BERT | 86.7 | 91.1 | 86.3
OpenAI GPT (Prev. SOTA) | 82.2 | 88.1 | 75.0

Plus many other tasks.

Moreover, these results were all obtained with almost no task-specific neural network architecture design.

If you already know what BERT is and you just want to get started, you can download the pre-trained models and run a state-of-the-art fine-tuning in only a few minutes.

Reproduction: bert_language_understanding

Pre-training of Deep Bidirectional Transformers for Language Understanding

Reproduction: BERT-keras

Keras implementation of BERT (Bidirectional Encoder Representations from Transformers)

Reproduction: pytorch-pretrained-BERT

PyTorch version of Google AI’s BERT model with script to load Google’s pre-trained models.
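For a quick feel of this repository, the sketch below loads the converted weights and extracts features for one sentence. It follows the usage documented in the pytorch-pretrained-BERT README at the time; exact class names and return values may differ across versions:

```python
import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel

# Load Google's pre-trained weights converted to PyTorch
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
model.eval()

text = "[CLS] BERT is a deep bidirectional Transformer encoder . [SEP]"
tokens = tokenizer.tokenize(text)
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

with torch.no_grad():
    # By default the model returns the hidden states of all layers plus the pooled [CLS] output
    encoded_layers, pooled_output = model(input_ids)

print(len(encoded_layers), encoded_layers[-1].shape, pooled_output.shape)
```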

BERT's evaluation benchmark: GLUE

GLUE: A Multi-task Benchmark and Analysis Platform for Natural Language Understanding

Abstract

For natural language understanding (NLU) technology to be maximally useful, both practically and as a scientific object of study, it must be general: it must be able to process language in a way that is not exclusively tailored to any one specific task or dataset. In pursuit of this objective, we introduce the General Language Understanding Evaluation benchmark (GLUE), a tool for evaluating and analyzing the performance of models across a diverse range of existing NLU tasks. GLUE is model-agnostic, but it incentivizes sharing knowledge across tasks because certain tasks have very limited training data. We further provide a hand-crafted diagnostic test suite that enables detailed linguistic analysis of NLU models. We evaluate baselines based on current methods for multi-task and transfer learning and find that they do not immediately give substantial improvements over the aggregate performance of training a separate model per task, indicating room for improvement in developing general and robust NLU systems.