ACL is the most important international conference in the field of computational linguistics and natural language processing. It is organized by the Association for Computational Linguistics and held annually. According to Google Scholar's Computational Linguistics metrics, ACL ranks first in influence, and it is a CCF-A recommended conference. A total of 7 papers (6 long papers and 1 short paper) from the Meituan technical team were accepted by ACL 2021. These papers represent some of the team's frontier explorations and applications in natural language processing tasks such as event extraction, entity recognition, intent recognition, novel slot discovery, unsupervised sentence representation, semantic parsing, and document retrieval.

The Annual Meeting of the Association for Computational Linguistics (ACL 2021) will be held in Bangkok, Thailand, August 1-6, 2021, as a virtual online conference. The theme of this year's ACL is "NLP for Social Good." According to official statistics, the conference received 3,350 valid submissions and accepted 710 main conference papers (21.3% acceptance rate) and 493 Findings papers (14.9% acceptance rate).

A total of 7 papers (6 long papers and 1 short paper) from the Meituan technical team were accepted by ACL 2021. These papers distill and apply some of Meituan's technology in natural language processing tasks such as event extraction, entity recognition, intent recognition, novel slot discovery, unsupervised sentence representation, semantic parsing, and document retrieval.

For event extraction, we propose a bidirectional entity-level recurrent decoder (BERD) that generates an argument role sequence entity by entity, explicitly exploiting the semantic-level argument role information of surrounding entities. For cross-domain slot filling, we put forward the concept of slot transferability for the first time, together with a method for computing the transferability between slots: by comparing the transferability between target slots and source-task slots, we find the corresponding source-task slots for each target slot and build the target slot filling model only on the training data of those source slots. For intent recognition, we propose an intent feature learning method based on supervised contrastive learning, which maximizes inter-class distance and minimizes intra-class variance to make intents more distinguishable. For novel slot discovery, we define the Novel Slot Detection (NSD) task for the first time. Different from traditional slot filling, NSD tries to discover new slots in real conversation data based on existing in-domain slot annotation data, so as to continuously improve and enhance the capability of the dialogue system.

In addition, to address the "collapse" phenomenon of BERT's native sentence representations, we propose a contrastive-learning-based sentence representation transfer method called ConSERT. By fine-tuning on unsupervised corpora in the target domain, the sentence representations generated by the model better match the data distribution of the downstream task. We also propose a new unsupervised semantic parsing approach, Synchronous Semantic Decoding (SSD), which bridges the semantic gap and the structural gap simultaneously by combining paraphrasing with syntactically constrained decoding. Finally, we improve document encoding for dense retrieval, strengthening the semantic representation ability of document embeddings in a way that improves both retrieval effectiveness and efficiency.

Below, we give a more detailed introduction to these seven papers, hoping they help or inspire those engaged in related research. You are welcome to leave a message in the comment section at the end of the article to discuss with us.

01 Capturing Event Argument Interaction via A Bi-Directional Entity-Level Recurrent Decoder

Paper download | Authors: Xi Xiangyu, Ye Wei (Peking University), Zhang Tong (Peking University), Zhang Shikun (Peking University), Wang Quanxiu (RICHAI), Jiang Huixing, Wu Wei | Paper type: Main Conference Long Paper (Oral)

Event extraction is an important and challenging task in the field of information extraction, widely used in automatic summarization, automatic question answering, information retrieval, knowledge graph construction, and other applications; it aims to extract structured event information from unstructured text. Event argument extraction is an important and extremely difficult subtask of event extraction, which extracts the descriptive information of a specific event (called arguments), including event participants, event attributes, and other information. Most argument extraction methods model the task as argument role classification over entities and related events, training and predicting separately for each entity in a sentence's entity set and thus ignoring potential interactions between candidate arguments. Meanwhile, the methods that do use argument interaction information fail to make full use of the semantic-level argument role information of surrounding entities, and ignore the multi-argument distribution patterns within a specific event.

To address these problems in event argument extraction, this paper proposes to explicitly exploit the semantic-level argument role information of surrounding entities. Argument extraction is first modeled as an entity-level decoding problem: given a sentence and a known event, the model needs to generate a sequence of argument roles. Different from traditional word-level Seq2Seq models, a bidirectional entity-level recurrent decoder (BERD) is proposed to generate the argument role sequence entity by entity. Specifically, the paper designs an entity-level decoding recurrent unit that utilizes both the current entity's information and the surrounding argument information. A forward decoder and a backward decoder predict roles for each entity from left to right and from right to left respectively, each using the argument information on its own side during unidirectional decoding. Finally, after decoding in both directions, a classifier that combines the features of the two decoders makes the final prediction, so that the arguments on both sides can be used simultaneously.
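To make the decoding scheme concrete, here is a minimal runnable sketch of the bidirectional entity-level decoding loop. The `score_fn` classifier and the agreement rule for combining the two passes are hypothetical stand-ins for BERD's neural recurrent units and final classifier:

```python
# A minimal sketch of BERD-style bidirectional entity-level decoding.
# `score_fn` stands in for the neural role classifier; in the real model it
# consumes encoder features plus the roles already predicted on one side.

def decode_bidirectional(entities, score_fn):
    n = len(entities)
    # Forward pass: left-to-right, conditioning on roles predicted to the left.
    fwd = []
    for i in range(n):
        fwd.append(score_fn(entities[i], left_roles=fwd[:], right_roles=None))
    # Backward pass: right-to-left, conditioning on roles predicted to the right.
    bwd = [None] * n
    for i in reversed(range(n)):
        bwd[i] = score_fn(entities[i], left_roles=None, right_roles=bwd[i + 1:])
    # Final prediction combines both directions; on disagreement, re-score
    # with argument information from both sides available.
    return [f if f == b else score_fn(entities[i], left_roles=fwd[:i],
                                      right_roles=bwd[i + 1:])
            for i, (f, b) in enumerate(zip(fwd, bwd))]
```

The point of the structure is that each entity's prediction can condition on the roles already assigned to its neighbours, which a per-entity classifier cannot do.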

Experiments are carried out on the public ACE 2005 dataset, comparing against many existing models and the latest argument interaction methods. The results show that this method outperforms existing argument interaction methods, and the improvement is more significant when a sentence contains more entities.

02 Slot Transferability for Cross-domain Slot Filling

Paper download | Authors: Lu Hengtong (Beijing University of Posts and Telecommunications), Han Zhuoxin (BUPT), Yuan Caixia (BUPT), Wang Xiaojie (BUPT), Lei Shuyu, Jiang Huixing, Wu Wei | Paper type: Findings of ACL 2021, Long Paper

Slot filling aims to identify task-related slot information in user utterances and is a key component of task-oriented dialogue systems. When a task (or domain) has abundant training data, existing slot filling models achieve good recognition performance. For a new task, however, there is often little or no slot annotation corpus. How to use the annotated corpora of one or more existing tasks (source tasks) to train a slot filling model for a new task (target task) is of great significance for rapidly scaling out task-oriented dialogue system applications.

Existing research on this problem falls into two categories. The first implicitly aligns the semantic representations of source-task and target-task slot information, so that a model trained on source data can be applied directly to the target task: slot descriptions, slot value samples, and other slot-related content are fused into the word representations in some way to obtain slot-aware word representations, followed by "BIO"-style slot tagging. The second adopts a two-stage strategy that treats all slot values as entities: a general entity recognition model is first trained on the source-task data to identify all candidate slot values of the target task, and these candidates are then assigned to target-task slots by comparing their similarity with the representations of the target slots' information.

Most existing work focuses on building cross-task transfer models from the correlation information between source and target tasks, and generally uses the data of all source tasks during model construction. In practice, however, not all source-task data has transfer value for slot recognition in the target task, and the value of different source-task data for a particular target task can differ greatly. For example, a flight booking task is highly similar to a train ticket booking task, and the slot filling training data of the former helps the latter; a flight booking task and a weather query task, by contrast, are quite different, and the training data of the former has little or no reference value for the latter, and may even interfere.

Furthermore, even when the source and target tasks are similar, not every source-task slot's training data helps every target-task slot. For example, the departure-time slot data of a flight booking task may help fill the departure-time slot of a train ticket booking task, but it does not help the train-type slot and may instead interfere. Therefore, we hope to find, for each slot in the target task, one or more source-task slots that can provide effective transfer information, and to build the cross-task transfer model on the training data of those slots, so as to use source-task data more effectively.

To this end, we first propose the concept of slot transferability and a method to compute the transferability between slots. Based on this computation, we propose a method to select the source-task slots that provide effective transfer information for the target task: by comparing the transferability between target slots and source-task slots, the corresponding source-task slots are found for each target slot, and the slot filling model is built only on the training data of those source slots. Specifically, the transferability between two slots fuses the similarity of their slot value representation distributions with the similarity of their slot value context representation distributions. The source-task slots are then ranked by their transferability to the target slot. A slot filling model is first trained on the corpus of the most transferable source slot and evaluated on the target slot's validation set; the training corpora of the remaining source slots are then added one by one in descending order of transferability, each time recording validation performance. The source slots up to the point of highest validation performance are selected as the target slot's source slots, and the target slot filling model is built on them.

A slot filling model identifies slot values from the slot value information itself and from the context of slot values. Therefore, when computing slot transferability, we first measure the similarity between the two slots' value representation distributions and between their context representation distributions. We then fuse the two similarities, in the way the F value fuses precision and recall. Finally, the result is normalized to 0-1 with tanh and subtracted from 1, so that a larger value means higher transferability, in line with intuition. The following formula is our proposed computation of slot transferability:

sim(p_v(s_a), p_v(s_b)) and sim(p_c(s_a), p_c(s_b)) respectively denote the similarity between slot a and slot b in the slot value representation distribution and in the context representation distribution; we use Maximum Mean Discrepancy (MMD) to measure the similarity between distributions.
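As a rough illustration of that recipe, the sketch below computes a transferability score between two slots from toy representation samples. The linear-kernel MMD and the harmonic (F-style) fusion here are simplifications of the paper's actual estimator:

```python
import numpy as np

def mmd(X, Y):
    """Linear-kernel MMD sketch: distance between the empirical means
    of two sets of representation vectors."""
    return float(np.linalg.norm(X.mean(axis=0) - Y.mean(axis=0)))

def transferability(vals_a, vals_b, ctx_a, ctx_b):
    """Fuse the value-distribution and context-distribution distances
    harmonically (as F1 fuses precision and recall), squash with tanh,
    and flip so that larger = more transferable."""
    d_v = mmd(vals_a, vals_b)   # slot value representation distance
    d_c = mmd(ctx_a, ctx_b)     # slot context representation distance
    fused = 2 * d_v * d_c / (d_v + d_c + 1e-12)
    return 1.0 - np.tanh(fused)
```

Identical distributions give a score of 1.0, and very distant distributions a score near 0, matching the "larger is more transferable" convention above.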

We do not propose a new model; rather, our source slot selection method can be combined with any existing model. Experiments with several existing models and datasets show that our method brings consistent performance improvements to target-task slot filling models (the "ALL" columns give the performance of each existing model trained on all source data; the "STM1" columns give the performance of the model trained on the data selected by our method).

03 Modeling Discriminative Representations for Out-of-Domain Detection with Supervised Contrastive Learning

Paper download | Authors: Zeng Zhiyuan (Beijing University of Posts and Telecommunications), He Keqing, Yan Yuanmeng (BUPT), Liu Zijun (BUPT), Wu Yanan (BUPT), Xu Hong (BUPT), Jiang Huixing, Xu Weiran (BUPT) | Paper type: Main Conference Short Paper (Poster)

In practical task-oriented dialogue systems, out-of-domain (OOD) intent detection is a key component, responsible for identifying anomalous queries from users and giving rejection replies. Compared with traditional intent recognition, OOD intent detection faces a sparse semantic space and a lack of annotated data. Existing OOD intent detection methods fall into two categories. One is supervised OOD intent detection, where supervised OOD intent data is available during training; its advantage is better detection performance, but it relies on a large amount of annotated OOD data, which is infeasible in practice. The other is unsupervised OOD intent detection, which uses only in-domain intent data to identify out-of-domain samples; since it cannot exploit prior knowledge from annotated OOD samples, it faces greater challenges. This paper therefore focuses on unsupervised OOD intent detection.

A core problem of unsupervised OOD intent detection is how to learn discriminative semantic representations from in-domain intent data: we want the representations of samples within the same intent category to be close to each other, and samples of different intent categories to be far apart. This paper therefore proposes an intent feature learning method based on supervised contrastive learning, which maximizes inter-class distance and minimizes intra-class variance to improve feature discriminability.

Specifically, we use a BiLSTM/BERT context encoder to obtain in-domain intent representations, and then apply two objective functions to them: one is the traditional classification cross-entropy loss, the other is the supervised contrastive learning loss. Supervised contrastive learning improves on original contrastive learning, which has only one positive anchor: all samples of the same class are used as positives and samples of different classes as negatives, maximizing the correlation between positive samples. Meanwhile, to increase the diversity of sample representations, we use adversarial augmentation, adding noise in the hidden space to achieve the effect of traditional data augmentation such as character replacement, insertion, deletion, and back-translation. The model structure is as follows:
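The supervised contrastive term can be sketched as follows; this is a generic supervised contrastive loss over a batch (in the spirit of Khosla et al.'s formulation), not the paper's exact implementation:

```python
import numpy as np

def supervised_contrastive_loss(feats, labels, temp=0.1):
    """Supervised contrastive loss sketch: for each anchor, every other
    sample in the batch with the same label is a positive; all remaining
    samples act as negatives."""
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = feats @ feats.T / temp          # temperature-scaled cosine sims
    labels = np.array(labels)
    n = len(labels)
    loss = 0.0
    for i in range(n):
        mask = np.arange(n) != i          # exclude the anchor itself
        logits = sim[i][mask]
        log_prob = logits - np.log(np.exp(logits).sum())
        positives = labels[mask] == labels[i]
        if positives.any():
            loss += -log_prob[positives].mean()
    return loss / n
```

When same-class representations cluster tightly and different classes are orthogonal, this loss is low; it rises as classes mix, which is exactly the geometry the paper aims for.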

We verify the model on two public datasets, and the experimental results show that the proposed method effectively improves the performance of unsupervised OOD intent detection, as shown in the following table.

04 Novel Slot Detection: A Benchmark for Discovering Unknown Slot Types in the Task-Oriented Dialogue System

Paper download | Authors: Wu Yanan (Beijing University of Posts and Telecommunications), Zeng Zhiyuan (BUPT), He Keqing, Xu Hong (BUPT), Yan Yuanmeng (BUPT), Jiang Huixing, Xu Weiran (BUPT) | Paper type: Main Conference Long Paper (Oral)

Slot filling is an important module in dialogue systems, responsible for identifying key information in user input. Existing slot filling models can only recognize pre-defined slot types, but practical applications contain a large number of out-of-domain entity types, and these unrecognized entity types are critical to the optimization of the dialogue system.

In this paper, we define the Novel Slot Detection (NSD) task for the first time. Different from traditional slot filling, NSD tries to discover new slots in real conversation data based on the existing in-domain slot annotation data, so as to continuously improve and enhance the capability of the dialogue system, as shown in the figure below:

The NSD task proposed in this paper differs significantly from existing OOV recognition and out-of-domain intent detection tasks. On the one hand, compared with OOV recognition, the objects of OOV recognition are new slot values unseen in training whose entity types are nonetheless fixed, whereas NSD not only has to handle the OOV problem but also faces a more severe challenge: lacking any prior knowledge of the unknown entity types, it must rely solely on in-domain slot information to reason about out-of-domain entities. On the other hand, compared with out-of-domain intent detection, which only needs to identify sentence-level intent information, NSD must cope with the mutual influence of context between in-domain and out-of-domain entities, and with the interference of non-entity words on novel slots. Overall, the Novel Slot Detection task is very different from traditional slot filling, OOV recognition, and out-of-domain intent detection, and faces more challenges; at the same time, it offers a direction worth thinking about and researching for the future development of dialogue systems.

Based on the existing public slot filling datasets ATIS and Snips, we build two novel slot detection datasets, ATIS-NSD and Snips-NSD. Specifically, we randomly select a portion of the slot types in the training set as out-of-domain classes and keep the remaining types as in-domain. For samples in which an out-of-domain class and an in-domain class appear in the same sentence, we delete the entire sample, in order to avoid introducing bias through the O label and to ensure that out-of-domain entity information appears only in the test set, which is much closer to the practical scenario. Meanwhile, we propose a series of baseline models for the NSD task; the overall framework is shown in the figure below. The model consists of two phases:

  • Training phase: based on the in-domain slot annotation data, we train a BERT-based sequence labeling model (multi-class or binary) to obtain entity representations.
  • Test phase: first, the trained sequence labeling model predicts the in-domain entity types; at the same time, based on the obtained entity representations, the MSP or GDA algorithm predicts whether a word belongs to a novel slot, i.e., an out-of-domain type; finally, the two outputs are merged to obtain the final result.
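The test-phase combination can be sketched as follows, using MSP (maximum softmax probability) as the detector; the label set and threshold are illustrative only:

```python
import numpy as np

def detect_novel_slots(probs, in_domain_labels, threshold=0.5):
    """Test-phase sketch: take the sequence model's in-domain prediction
    per token, but relabel tokens whose maximum softmax probability falls
    below the threshold as the novel-slot class 'NS'."""
    preds = []
    for p in probs:                     # p: softmax distribution for one token
        i = int(np.argmax(p))
        preds.append("NS" if p[i] < threshold else in_domain_labels[i])
    return preds
```

The intuition is that tokens the in-domain model is confident about keep their predicted slot label, while low-confidence tokens are surfaced as candidate novel slots.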

We use entity recognition F1 as the evaluation metric, including Span-F1 and Token-F1; the difference between the two is whether entity boundaries are considered. The experimental results are as follows:

Through extensive experiments and analysis, we discuss the challenges of novel slot detection: 1. confusion between non-entity words and novel entities; 2. insufficient contextual information; 3. slot dependencies; 4. open vocabulary slots.

05 ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer

Paper download | Authors: Yan Yuanmeng, Li Rumei, Wang Sirui, Zhang Fuzheng, Wu Wei, Xu Weiran (Beijing University of Posts and Telecommunications) | Paper type: Main Conference Long Paper (Poster)

Sentence representation learning plays an important role in natural language processing (NLP), and the success of many NLP tasks depends on good sentence representation vectors. In tasks such as Semantic Textual Similarity and Dense Text Retrieval in particular, the model measures the semantic correlation of two sentences by computing the similarity of their embeddings in the representation space, and thereby determines their matching score. Although BERT-based models achieve good performance on many NLP tasks (through supervised fine-tuning), the sentence vectors derived from BERT itself (averaging all word vectors, without fine-tuning) are of low quality, even inferior to GloVe results, and thus cannot reflect the semantic similarity of two sentences.

To address this "collapse" of BERT's native sentence representations, this paper proposes a contrastive-learning-based sentence representation transfer method called ConSERT. By fine-tuning on unsupervised corpora in the target domain, the sentence representations generated by the model better match the data distribution of the downstream task. The paper also proposes four different data augmentation methods for NLP tasks: adversarial attack, token shuffling, cutoff, and dropout. Experimental results on sentence semantic matching (STS) tasks show that, under the same setting, ConSERT improves by 8% over the previous SOTA (BERT-flow), and it still shows strong performance gains in few-sample scenarios.
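Two of the four augmentations can be sketched at the token level as follows. In ConSERT itself these operations act on the embedding matrix inside BERT (shuffling position ids, zeroing embedding rows) rather than on the raw token list, so this is only an approximation:

```python
import random

def shuffle_tokens(ids, seed=0):
    """Token-shuffle augmentation sketch: permute the token order
    (the real model instead shuffles position ids)."""
    rng = random.Random(seed)
    out = ids[:]
    rng.shuffle(out)
    return out

def token_cutoff(ids, rate=0.2, seed=0):
    """Cutoff augmentation sketch: drop a fraction of tokens (the real
    model zeroes the corresponding embedding rows instead of removing them)."""
    rng = random.Random(seed)
    keep = [t for t in ids if rng.random() > rate]
    return keep or ids[:1]   # never return an empty sequence
```

Each training sentence is augmented into two views like these, and the contrastive objective pulls the two views' sentence vectors together while pushing other sentences away.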

In the unsupervised experiments, we fine-tuned directly on unlabeled STS data starting from pre-trained BERT. The results show that, with exactly the same settings, our method significantly outperforms the previous SOTA, BERT-flow, achieving a relative performance improvement of 8%.

06 From Paraphrasing to Semantic Parsing: Unsupervised Semantic Parsing via Synchronous Semantic Decoding

Paper download | Authors: Wu Shan (Institute of Software, Chinese Academy of Sciences), Chen Bo (ISCAS), Xin Chunlei (ISCAS), Han Xianpei (ISCAS), Sun Le (ISCAS), Zhang Weipeng, Chen Jiansong, Yang Fan, Cai Xunliang | Paper type: Main Conference Long Paper

Semantic parsing is one of the core tasks of natural language processing. Its goal is to transform natural language into a computer language so that the computer can truly understand it. One of the challenges facing semantic parsing today is the lack of annotated data: most neural network methods rely on supervised data, while annotating semantic parsing data is very time-consuming and laborious. How to learn semantic parsing models without supervision is therefore a very important and challenging problem. The difficulty is that, without annotated data, semantic parsing must bridge both the semantic gap and the structural gap between natural language and the semantic representation. Previous methods typically use paraphrasing to rerank or rewrite outputs so as to reduce the semantic gap. Unlike those approaches, we propose a new unsupervised semantic parsing method, Synchronous Semantic Decoding (SSD), which resolves the semantic gap and the structural gap simultaneously by combining paraphrasing with grammar-constrained decoding.

The core idea of synchronous semantic decoding is to reformulate semantic parsing as paraphrasing: we paraphrase the sentence into a canonical utterance and parse out the semantic representation, where canonical utterances and logical forms are in one-to-one correspondence. To ensure that valid canonical utterances and semantic representations are generated, both are decoded under the constraints of a synchronous grammar.

We decode with the paraphrase model under the restricted synchronous grammar, using the text generation model to score canonical utterances and find the one with the highest score (as mentioned above, the search space is also restricted by the grammar). The paper presents two different algorithms: rule-level inference, which uses grammar rules as the search unit, and word-level inference, which uses words as the search unit.
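A greedy, word-level version of grammar-constrained decoding can be sketched as follows; the `allowed_next` grammar callback and the scoring function are illustrative stand-ins for the synchronous grammar and the paraphrase model:

```python
def constrained_decode(lm_score, allowed_next, start="<s>", end="</s>", max_len=10):
    """Greedy sketch of decoding under grammar constraints: at each step
    the language model only scores continuations the grammar allows, so
    every output is a well-formed canonical utterance."""
    seq = [start]
    while seq[-1] != end and len(seq) < max_len:
        candidates = allowed_next(tuple(seq))      # grammar restricts the choices
        seq.append(max(candidates, key=lambda w: lm_score(seq, w)))
    return seq
```

Because every candidate comes from the grammar, the decoded canonical utterance maps deterministically back to a logical form, which is how SSD closes the structural gap without any annotated parses.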

We use GPT-2 and T5 to train sequence-to-sequence paraphrase models on paraphrase datasets; the semantic parsing task can then be completed with the synchronous semantic decoding algorithm alone. To reduce the influence of style bias on canonical utterance generation, adaptive pre-training and sentence reranking methods are also proposed.

We conducted experiments on three data sets: Overnight (λ-DCS), GEO (FunQL), and GEOGranno. The data covers different domains and semantic representations. Experimental results show that our model can achieve the best results on all data sets without using supervised semantic parsing data.

07 Improving Document Representations by Generating Pseudo Query Embeddings for Dense Retrieval

Paper download | Authors: Tang Hongyin, Sun Xingwu, Jin Beihong (Institute of Software, Chinese Academy of Sciences), Wang Jingang, Zhang Fuzheng, Wu Wei | Paper type: Main Conference Long Paper (Oral)

The goal of the document retrieval task is to retrieve texts semantically similar to a given query from a massive text corpus. In practical application scenarios, the document corpus is very large, so to improve retrieval efficiency the task is generally divided into two stages: recall and re-ranking. In the recall stage, the model selects a subset of candidate documents with an efficient retrieval method, which serve as input to the subsequent stage. In the re-ranking stage, the model sorts the candidate documents with a high-precision ranking method to produce the final retrieval results.

With the development and application of pre-trained models, much work began to feed the query and document jointly into a pre-trained model to produce a matching score. However, due to the high computational cost of pre-trained models, scoring every query-document pair takes a long time, so this mode can only be used in the re-ranking stage. To speed up retrieval, some work began to use pre-trained models to encode documents and queries separately: the documents in the corpus are encoded into vectors ahead of query time, and at query time only the query needs to be encoded before computing similarities against the document vectors, which reduces latency. Since this method encodes documents and queries as dense vectors, such retrieval is also called "Dense Retrieval".

A basic dense retrieval method encodes a document and a query each as a single vector. However, because a document contains a great deal of information, this easily causes information loss. To improve on this, some work has begun to improve the vector representations of queries and documents, which can be roughly divided into three approaches, as shown in the figure below:

Our work starts from improving document encoding to strengthen the semantic representation ability of document embeddings. First, we believe the main bottleneck of dense retrieval is that the encoder does not know, during encoding, which part of the document's information may be queried, so different pieces of information can interfere with each other and be altered or lost. Therefore, when encoding a document, we build several "Pseudo Query Embeddings" for each document, each corresponding to a piece of information the document might be asked about.

Specifically, we cluster the BERT-encoded token vectors with a clustering algorithm and retain the top-k cluster centers for each document, which capture the salient semantics among the document's token vectors. In addition, since multiple pseudo query vectors are retained per document, similarity computation could become less efficient, so we use the argmax operation instead of softmax to improve its efficiency. Experiments on several large-scale document retrieval datasets show that our method improves both the effectiveness and the efficiency of retrieval.
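The document-side procedure can be sketched as a small k-means over token vectors followed by argmax matching; the clustering details here are illustrative, not the paper's exact configuration:

```python
import numpy as np

def pseudo_query_embeddings(token_vecs, k=3, iters=10, seed=0):
    """Cluster a document's token vectors (tiny k-means sketch) and keep
    the k centroids as pseudo query embeddings."""
    rng = np.random.RandomState(seed)
    centers = token_vecs[rng.choice(len(token_vecs), k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each token vector to its nearest center.
        assign = np.argmin(((token_vecs[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = token_vecs[assign == j].mean(axis=0)
    return centers

def score_document(query_vec, centers):
    """Argmax matching: a document's score is the similarity of its
    best-matching pseudo query embedding to the real query (cheaper
    than softmax-weighted pooling)."""
    return float(np.max(centers @ query_vec))
```

At query time only `score_document` runs, so the per-document cost is k dot products and a max, keeping retrieval compatible with standard nearest-neighbour indexes.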

Afterword

These papers represent some of the scientific research work done by the Meituan technical team in cooperation with various universities and research institutions in event extraction, entity recognition, intent recognition, novel slot discovery, unsupervised sentence representation, semantic parsing, document retrieval, and other fields. They embody how we encounter and solve concrete problems in real work scenarios, and we hope they are helpful or inspiring to you.

Meituan's research collaboration team is committed to building bridges and platforms for cooperation between Meituan's departments and universities, research institutions, and think tanks. Relying on Meituan's rich business scenarios, data resources, and real industrial problems, it pursues open innovation and pooled strength in artificial intelligence, big data, the Internet of Things, autonomous driving, logistics optimization, the digital economy, public affairs, and other fields. We jointly explore cutting-edge science and technology and macro issues of industrial focus, promote cooperation and exchange between enterprises, universities, and research institutes as well as the commercialization of achievements, and advance the training of outstanding talent. Looking ahead, we look forward to cooperating with teachers and students from more universities and research institutes. Welcome to contact us at [email protected].

Read more technical articles from the Meituan technical team

Frontend | Algorithm | Backend | Data | Security | Operations | iOS | Android | Testing

| Reply with keywords such as [2020 goodies], [2019 goodies], [2018 goodies], or [2017 goodies] in the menu dialog of our public account to view collections of articles by the Meituan technical team from each year.

| This article was produced by the Meituan technical team, and the copyright belongs to Meituan. You are welcome to reprint or use the content of this article for non-commercial purposes such as sharing and communication; please credit "Reprinted from the Meituan technical team". This article may not be reproduced or used commercially without permission. For any commercial activity, please email [email protected] to apply for authorization.