

Sentence Infilling with Inter-Sentential Transformer (arxiv.org)

I read this paper mainly to gather material for my own paper. This note is neither a close reading nor a quick skim, so it may not be useful for readers who want an in-depth analysis.

Abstract

Missing-sentence generation has a wide range of applications in natural language processing, such as document auto-completion and meeting-note expansion. The task requires the generated sentence to be syntactically and semantically coherent with the surrounding context. Sentence infilling therefore draws on several NLP capabilities: natural language understanding, discourse-level planning, and natural language generation.

The paper decouples these three aspects and proposes a framework that addresses each of them with large-scale pre-trained models.

Experimental results show that the proposed model is effective at learning sentence representations and at generating missing sentences that link the surrounding context.

Introduction

Recently, generating missing tokens within a sentence or a longer piece of text has attracted much attention. This paper studies a related but slightly different task, sentence infilling, illustrated in Figure 1.

A sentence is removed from a long text, and the model must generate the missing fragment so that it is coherent with the surrounding context. Two settings are considered:

  • Generation based on the context alone
  • Generation based on the context plus auxiliary information (such as keywords, knowledge graphs, or text snippets)

Figure 1: Generating a sentence between two pieces of text so that the semantics and syntax transition smoothly. The example shows the paper's model on the TripAdvisor dataset; the colored text marks the auxiliary keywords. The second setting mentioned above corresponds to generation from both context and auxiliary information.

The paper thus moves from token-level infilling to a new task: sentence-level infilling.
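To make the two settings concrete, here is a purely hypothetical sketch of what one task instance could look like. The field names and the TripAdvisor-style sentences are my own illustration, not the paper's actual data format.

```python
# Hypothetical instance of the sentence infilling task (my own illustration,
# not the paper's data format). The model must generate `target` so that it
# bridges `context_before` and `context_after`; in the second setting the
# auxiliary keywords are also given as input.
example = {
    "context_before": "We stayed here for three nights last summer.",
    "context_after": "Overall we would definitely book this hotel again.",
    "keywords": ["staff", "friendly", "breakfast"],  # auxiliary information (setting 2)
    "target": "The staff were friendly and the breakfast was excellent.",
}

# Setting 1: generate `target` from context_before + context_after only.
# Setting 2: generate `target` from the contexts plus the keywords.
```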

Infilling long-form text poses several challenges. Text generation is typically a one-to-many problem, so many different outputs can be valid. The task also demands strong understanding, planning, and generation abilities, because the produced content must join the separate text fragments smoothly, both semantically and syntactically.

Large-scale pre-trained language models such as BERT (Devlin et al., 2019) and GPT-2 (Radford et al., 2019) have significantly improved language understanding and generation.

However, how to integrate them into a single system, and how to model long-range dependency structure through high-level semantic planning, remains an open challenge. Semantic appropriateness is usually subtler than syntactic appropriateness, and the latter is already captured well by autoregressive language models.

There is very little work in this area:

  • Generating the missing text sequentially at positions marked by mask/blank tokens; this can produce text of arbitrary length (arXiv:1901.00158)
    • Problem: focusing only on token-level correctness does not guarantee global semantic coherence
  • MASS predicts a missing span in the text and learns sentence representations (MASS: Masked Sequence to Sequence Pre-training for Language Generation)
    • Problem: the length of the predicted span must be specified
  • TIGS: An Inference Algorithm for Text Infilling with Gradient Search
  • SpanBERT: Improving Pre-training by Representing and Predicting Spans
    • Both share the same limitation as MASS: the prediction length must be specified in advance

My question: is it really appropriate to list these works here? To me they feel closer to cloze-style tasks.
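To make that cloze comparison and the fixed-length limitation concrete, here is a small sketch of my own (not from the paper) using an off-the-shelf BERT masked language model: each [MASK] token yields exactly one predicted token, so the length of the missing piece has to be decided in advance.

```python
# Masked-token infilling with a pretrained BERT MLM (illustration only).
# Each [MASK] position is filled by exactly one token, so a longer gap
# requires committing to its token length up front.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tok = BertTokenizer.from_pretrained("bert-base-uncased")
mlm = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Two [MASK] tokens -> the "infilled" text is forced to be two tokens long.
text = "The hotel was great. The [MASK] [MASK] was friendly. We will come back."
ids = tok(text, return_tensors="pt")
with torch.no_grad():
    logits = mlm(**ids).logits

mask_positions = (ids["input_ids"][0] == tok.mask_token_id).nonzero(as_tuple=True)[0]
predicted = logits[0, mask_positions].argmax(dim=-1)
print(tok.decode(predicted))  # one predicted token per [MASK] -- never more, never fewer
```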

INSET (the proposed model) has three components:

  • A BERT-based encoder that maps each sentence into a semantic space.
  • A sentence-level planner that infers the missing information in that semantic space so that the meaning stays coherent with the context.
  • A GPT-based generator that maps the semantic representation back into the text domain.

Together this is a form of hierarchical text generation; a rough sketch of the pipeline follows below.
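The following is a minimal sketch of how such an encode–plan–decode pipeline could be wired up with off-the-shelf BERT and GPT-2 from the transformers library. It is my own simplification, not the authors' implementation: the module names, the zero-vector placeholder the planner sees for the missing sentence, and the soft-prefix trick for feeding the planned vector to GPT-2 are all assumptions, and the planner and projection would of course need training before producing anything sensible.

```python
# Sketch of a BERT-encoder / sentence-level-planner / GPT-2-decoder pipeline
# (my simplification, NOT the authors' code).
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer, GPT2LMHeadModel, GPT2Tokenizer


class SentenceInfillerSketch(nn.Module):
    def __init__(self, hidden=768):
        super().__init__()
        # 1) Understanding: BERT maps each sentence to a vector (its [CLS] state).
        self.encoder = BertModel.from_pretrained("bert-base-uncased")
        # 2) Planning: a small Transformer over *sentence* vectors predicts the
        #    embedding of the missing middle sentence from its neighbours.
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True)
        self.planner = nn.TransformerEncoder(layer, num_layers=2)
        # 3) Generation: GPT-2 decodes text; the planned sentence vector is
        #    (naively) injected as a single soft prefix embedding.
        self.decoder = GPT2LMHeadModel.from_pretrained("gpt2")
        self.to_prefix = nn.Linear(hidden, self.decoder.config.n_embd)

    def encode_sentence(self, bert_tok, sentence):
        ids = bert_tok(sentence, return_tensors="pt")
        return self.encoder(**ids).last_hidden_state[:, 0]  # (1, hidden)

    @torch.no_grad()
    def infill(self, bert_tok, gpt_tok, before, after, max_new_tokens=30):
        v_before = self.encode_sentence(bert_tok, before)
        v_after = self.encode_sentence(bert_tok, after)
        placeholder = torch.zeros_like(v_before)           # slot for the missing sentence
        sent_seq = torch.stack([v_before, placeholder, v_after], dim=1)
        planned = self.planner(sent_seq)[:, 1:2]           # predicted middle-sentence vector
        embeds = self.to_prefix(planned)                   # soft prefix for GPT-2
        out_ids = []
        for _ in range(max_new_tokens):                    # simple greedy decoding
            logits = self.decoder(inputs_embeds=embeds).logits[:, -1]
            next_id = logits.argmax(dim=-1)
            if next_id.item() == gpt_tok.eos_token_id:
                break
            out_ids.append(next_id.item())
            next_emb = self.decoder.transformer.wte(next_id).unsqueeze(1)
            embeds = torch.cat([embeds, next_emb], dim=1)
        return gpt_tok.decode(out_ids)


# Untrained as-is; shown only to illustrate the wiring.
bert_tok = BertTokenizer.from_pretrained("bert-base-uncased")
gpt_tok = GPT2Tokenizer.from_pretrained("gpt2")
model = SentenceInfillerSketch()
print(model.infill(bert_tok, gpt_tok,
                   "We stayed here for three nights.",
                   "Overall we would book this hotel again."))
```

The point of the sketch is the decoupling: the planner only ever operates on sentence vectors, and the decoder only ever turns one sentence vector back into text, which is why the paper argues each component can be trained and inspected separately.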

Advantages:

  • Proposes sentence-level infilling, a task that goes beyond token-level text infilling
  • Decoupling narrows the scope of each individual model, and each component can be examined and improved separately, with additional unsupervised data if available
  • Focuses on semantic coherence
  • Allows content of arbitrary length to be generated
  • Modest computational requirements

My note: in the end the sentences are still built out of tokens, and I do not understand why the computational demand is supposed to be small.

Related work

Natural language generation

Tasks in neural text generation include machine translation, text summarization, and dialogue generation. Most previous approaches use an encoder-decoder architecture for sequence-to-sequence learning. To improve the quality of the generated text, reinforcement learning, adversarial learning, and inverse reinforcement learning have also been applied.

Recent work shows that:

  • Pretrained language models play an important role in natural language generation and understanding by providing contextualized word representations and model initializations.
  • Large Transformer architectures such as GPT-2, Megatron-LM, and T5 achieve strong results on many benchmarks even without task-specific training.
  • arXiv:1909.05858 proposes a conditional generation model trained with control codes that govern style, content, and task-specific behavior (CTRL: A Conditional Transformer Language Model for Controllable Generation). The paper discussed here, in contrast, directly uses BERT and fine-tunes it.

Context-aware Text Generation

Some previous work has taken context information into account for text generation:

  • Treating the preceding text as context
  • Treating the dialogue history as background information
  • Predicting the next sentence from the first few dialogue turns

What distinguishes this paper is that the task is constrained by context on both sides: the generated sentence must be coherent not only with the preceding content but also with the text that follows, which requires a deeper understanding of the context.

  • Generating long comments from sentiment scores conditioned on topic phrases
  • Text infilling: filling in the missing part of a sentence using the information around it
  • Iterative inference algorithms for infilling
  • Masking random contiguous spans and training the language model to fill in the masked span (a small sketch of this idea follows below)
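As a sketch of the general idea behind this kind of span masking (my own illustration, not any specific paper's preprocessing), training pairs can be built by replacing a random contiguous span with mask tokens and asking the model to recover the original tokens at those positions:

```python
# Build a span-masking training pair: `source` is the corrupted input with
# one mask token per removed position, `target` is the span to recover.
# Note that the number of mask tokens fixes the length of the prediction.
import random

def make_span_masking_example(tokens, max_span=8, mask="[MASK]"):
    span_len = random.randint(1, min(max_span, len(tokens)))
    start = random.randint(0, len(tokens) - span_len)
    target = tokens[start:start + span_len]                       # tokens to recover
    source = tokens[:start] + [mask] * span_len + tokens[start + span_len:]
    return source, target

tokens = "the staff was friendly and the rooms were spotless".split()
src, tgt = make_span_masking_example(tokens)
print("input :", " ".join(src))
print("target:", " ".join(tgt))
```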

In this paper, BERT and GPT-2 are used directly to perform sentence infilling.

To be fair, I find this related-work section quite generic. The first part says that prior work only considers the preceding text while this paper considers context on both sides, which is not much of an argument. The second part lumps several tasks together and says "we use pre-trained models", and the third part has the same problem. Without having read the cited work, my intuition is that it is not very relevant and does not make for a meaningful comparison.

Hierarchical text generation

Hierarchical text generation with high-level semantic planning has been explored before:

  • A hierarchical recurrent encoder-decoder framework for encoding dialogue context
  • A framework that infers the semantic features of the generated response using self-supervised learning
  • Learning hierarchical representations of long text with hierarchical LSTMs or hierarchical autoencoders
  • Encoding an entire paragraph into a representation variable with a hierarchical autoencoder, then generating text hierarchically from it

In contrast, the task here focuses more on generating an intermediate sentence from the surrounding context.

My note: the section ends rather abruptly with "we focus more on generating sentences from the surrounding context". Shouldn't it also be said that this paper likewise uses a hierarchical model? Since the comparison is raised, it deserves more discussion.