Abstract: This post is a preliminary reading of the ACL 2021 paper on template-based named entity recognition (NER) with BART.

This article is shared from the Huawei Cloud community post "ACL2021 NER | Template-Based Named Entity Recognition with BART", by JuTzungKuei.

Paper: Leyang Cui, Yu Wu, Jian Liu, Sen Yang, Yue Zhang. Template-Based Named Entity Recognition Using BART. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 1835–1845. Online: Association for Computational Linguistics, 2021.

Link: aclanthology.org/2021.findin…

Code: github.com/Nealcly/tem…

1. Abstract

  • Few-shot NER: plenty of source-domain data, little target-domain data

  • Existing methods: similarity-based metrics

Drawback: the knowledge stored in model parameters cannot be transferred

  • A template-based approach is proposed

NER is treated as a language-model ranking problem under a seq2seq framework

The original sentence is the source sequence, and the template, filled with a candidate entity span, is the target sequence

Inference: each candidate span is classified according to the score of its corresponding template

  • Datasets

CoNLL03: rich-resource

MIT Movie, MIT Restaurant, ATIS: low-resource

2. Introduction

  • NER: a basic NLP task; identify mention spans and classify them

  • Neural NER models: require large amounts of annotated data, which is abundant in the news domain but scarce elsewhere

Ideal: knowledge learned on rich-resource domains transfers to low-resource domains

Reality: different domains have different categories of entities

Training and testing: the softmax and CRF output layers require a consistent label set

New domain: the output layer has to be rebuilt and retrained

  • Recent few-shot NER work uses distance metrics: a similarity function is trained

Advantage: reduces domain-specific fitting

Drawbacks: (1) the best hyperparameters are found by heuristic nearest-neighbor search and the network parameters are not updated, so the neural representations of cross-domain instances cannot be improved; (2) these methods depend on the source and target domains having similar text patterns

  • A template-based approach is proposed

Sequence labeling is performed by exploiting the few-shot learning potential of a generative pre-trained language model (PLM)

BART is fine-tuned with predefined templates filled with annotated entities

<candidate_span> is a <entity_type> entity

<candidate_span> is not a named entity

  • Advantages of the method:

Annotated examples from a new domain can be used directly for fine-tuning

More robust than distance-based approaches, even when the writing styles of the source and target domains differ greatly

Applicable to NER with any set of entity classes without changing the output layer, which enables continual learning

  • The first work to use a generative PLM for few-shot sequence labeling

  • Prompt Learning

3. Method

3.1 Creating a Template

  • Treat the NER task as a language-model ranking problem under the seq2seq framework

  • Entity-type set: \mathbf{L}=\{l_1,\ldots,l_{|L|}\}, e.g. {LOC, PER, ORG, …}

  • Natural-word label set: \mathbf{Y}=\{y_1,\ldots,y_{|L|}\}, e.g. {location, person, organization, …}

  • Entity template \mathbf{T}^{+}_{y_k}: "<candidate_span> is a y_k entity.", e.g. "<candidate_span> is a location entity."

  • Non-entity template \mathbf{T}^{-}: "<candidate_span> is not a named entity."

  • Template set: \mathbf{T}=[\mathbf{T}^{+}_{y_1},\ldots,\mathbf{T}^{+}_{y_{|L|}},\mathbf{T}^{-}] (a construction sketch follows this list)
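For concreteness, here is a minimal Python sketch of filling these templates for a candidate span; the entity-type wording and helper names are illustrative assumptions, not the authors' released code:

```python
# Minimal sketch of template construction; label wording and helper names
# are illustrative assumptions, not the paper's exact implementation.
ENTITY_TYPES = {"LOC": "location", "PER": "person", "ORG": "organization"}

def positive_template(span: str, label: str) -> str:
    """T+_{y_k}: '<candidate_span> is a <entity_type> entity.'"""
    return f"{span} is a {ENTITY_TYPES[label]} entity."

def negative_template(span: str) -> str:
    """T-: '<candidate_span> is not a named entity.'"""
    return f"{span} is not a named entity."

# Example: fill the templates with candidate spans.
print(positive_template("Bangkok", "LOC"))   # Bangkok is a location entity.
print(negative_template("will be held"))     # will be held is not a named entity.
```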

3.2 Inference

  • Enumerate all candidate spans, restricting their length (n-gram size) to 1 to 8 tokens; each sentence thus yields 8n templates

  • Filled template: \mathbf{T}_{y_k, x_{i:j}}=\{t_1,\ldots,t_m\}, the template for entity type y_k filled with candidate span x_{i:j}

  • The span x_{i:j} is assigned the entity type whose filled template receives the highest score under the decoder (Section 3.3)

  • If candidate spans are nested or overlap, keep the one with the highest score (see the sketch below)
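A hedged sketch of this inference loop, reusing the template helpers from the Section 3.1 sketch and assuming a `score_template(sentence, template)` function that returns the template's score under the fine-tuned model (one possible implementation is sketched in Section 3.3):

```python
# Sketch of inference: enumerate candidate spans of length 1-8, score every
# filled template, keep the best label per span, and resolve overlaps by score.
# Reuses positive_template / negative_template / ENTITY_TYPES from Section 3.1;
# `score_template` is an assumed scoring function (see Section 3.3).
from typing import Callable, List, Tuple

def predict_entities(tokens: List[str],
                     score_template: Callable[[str, str], float],
                     max_span_len: int = 8) -> List[Tuple[int, int, str]]:
    sentence = " ".join(tokens)
    candidates = []
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + max_span_len, len(tokens)) + 1):
            span = " ".join(tokens[i:j])
            scored = [(label, score_template(sentence, positive_template(span, label)))
                      for label in ENTITY_TYPES]
            scored.append(("O", score_template(sentence, negative_template(span))))
            label, score = max(scored, key=lambda x: x[1])
            if label != "O":
                candidates.append((score, i, j, label))
    # Nested/overlapping candidates: keep the higher-scoring span first.
    entities, used = [], set()
    for score, i, j, label in sorted(candidates, reverse=True):
        if not used & set(range(i, j)):
            entities.append((i, j, label))
            used |= set(range(i, j))
    return entities
```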

3.3 Training

  • Gold standard entities are used to create templates

If entity x_{i:j} has type y_k, its template is \mathbf{T}^{+}_{y_k, x_{i:j}}

If the span x_{i:j} is not an entity, its template is \mathbf{T}^{-}_{x_{i:j}}

  • Building the training set (a construction sketch follows):

Positive examples: (\mathbf{X}, \mathbf{T}^{+})

Negative examples: (\mathbf{X}, \mathbf{T}^{-}), sampled at random, 1.5 times as many as the positive examples
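A minimal sketch of assembling these training pairs, with negatives sampled at 1.5 times the number of positives; the gold-entity format and function name are assumptions for illustration:

```python
# Sketch of training-pair construction; reuses positive_template /
# negative_template from the Section 3.1 sketch. Illustrative only.
import random
from typing import Dict, List, Tuple

def build_training_pairs(
    tokens: List[str],
    gold_entities: Dict[Tuple[int, int], str],  # (start, end) -> type, e.g. {(0, 1): "LOC"}
    neg_ratio: float = 1.5,
    max_span_len: int = 8,
) -> List[Tuple[str, str]]:
    """Return (source sentence, target template) pairs for one sentence."""
    sentence = " ".join(tokens)
    positives, negatives = [], []
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + max_span_len, len(tokens)) + 1):
            span = " ".join(tokens[i:j])
            if (i, j) in gold_entities:
                positives.append((sentence, positive_template(span, gold_entities[(i, j)])))
            else:
                negatives.append((sentence, negative_template(span)))
    # Randomly sample negatives at 1.5x the number of positives.
    k = min(len(negatives), int(neg_ratio * max(len(positives), 1)))
    return positives + random.sample(negatives, k)
```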

  • Encoding: \mathbf{h}^{enc}=\text{ENCODER}(x_{1:n})

  • Decoding: \mathbf{h}^{dec}_{c}=\text{DECODER}(\mathbf{h}^{enc}, t_{1:c-1})

  • Conditional probability of word t_c: p(t_c \mid t_{1:c-1}, \mathbf{X})=\text{SOFTMAX}(\mathbf{h}^{dec}_{c}\mathbf{W}_{lm}+\mathbf{b}_{lm}), where \mathbf{W}_{lm}\in\mathbb{R}^{d_h\times|V|}

  • The model is trained with the cross-entropy loss (a scoring sketch based on these probabilities follows)
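As one plausible realization of the ENCODER/DECODER/softmax equations above, the following sketch scores a filled template by summing the word log-probabilities under a Hugging Face BART model; the checkpoint name is only a placeholder and this is not the authors' released implementation:

```python
# Sketch: score a filled template as the sum of log p(t_c | t_{1:c-1}, X)
# under a (fine-tuned) BART model. Checkpoint name is illustrative only.
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
model.eval()

def score_template(sentence: str, template: str) -> float:
    src = tokenizer(sentence, return_tensors="pt")            # X = x_{1:n}
    tgt = tokenizer(template, return_tensors="pt").input_ids  # t_{1:m}
    with torch.no_grad():
        # Decoder hidden states h_c^dec are projected to the vocabulary,
        # giving p(t_c | t_{1:c-1}, X) = softmax(h_c^dec W_lm + b_lm).
        logits = model(**src, labels=tgt).logits
    log_probs = torch.log_softmax(logits, dim=-1)
    token_log_probs = log_probs.gather(-1, tgt.unsqueeze(-1)).squeeze(-1)
    return token_log_probs.sum().item()
```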

4. Results

  • Results with different template wordings

The three best templates are selected and a separate model is trained with each

  • Main experimental results

The last row is the fusion of the three models via entity-level voting (see the sketch below)
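The fusion detail is not spelled out in the article; below is a hedged sketch of one simple form of entity-level voting (keep an entity predicted by a majority of the three models):

```python
# Illustrative sketch of entity-level majority voting across models.
from collections import Counter
from typing import List, Set, Tuple

Entity = Tuple[int, int, str]  # (start, end, type)

def entity_level_vote(predictions: List[Set[Entity]], min_votes: int = 2) -> Set[Entity]:
    """Keep an entity if at least `min_votes` of the models predicted it."""
    counts = Counter(e for model_preds in predictions for e in model_preds)
    return {e for e, c in counts.items() if c >= min_votes}

# Example with three models' outputs:
m1 = {(0, 1, "LOC"), (3, 5, "PER")}
m2 = {(0, 1, "LOC")}
m3 = {(0, 1, "LOC"), (3, 5, "ORG")}
print(entity_level_vote([m1, m2, m3]))  # {(0, 1, 'LOC')}
```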
