This is the 20th day of my participation in the First Challenge 2022

Prompt, as a new paradigm in the field of NLP, has become a hot research topic in recent months. As a contemporary young person, I want to follow the trend too. Starting from introductory articles, I will gradually go deeper, systematically introduce Prompt alongside readings of several classic papers, and compile what I learn into notes to accumulate and share.

This article is the beginning of the Prompt series. Since there are already many well-written introductory articles on Prompt, I have learned from and organized some of their content directly, combined with some thoughts of my own, in the hope that it will help you.

Article recommendation:

  • Follow up on Prompt progress! A summary plus 15 of the latest papers, one by one – Xi Xiaoyao’s Selling MOE House
  • Understand the basic knowledge and classic work of Prompt – Zhihu
  • A new pre-training paradigm! Why is Prompt more effective? – New business knowledge

Paper recommendation:

Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

Prompt summary

Evolution of NLP paradigm

  • Fully supervised learning (non-neural network): train a task-specific model only on input–output datasets of the target task, relying heavily on feature engineering.
  • Fully supervised learning (neural network): feature learning is combined with model training, so the research focus turns to architecture engineering, i.e., designing a network architecture (such as CNN, RNN, or Transformer) to learn data features.
  • Pre-train, fine-tune: pre-train on large datasets, then fine-tune the model for specific downstream tasks. Under this paradigm, the research focus shifts to objective engineering: designing the training objectives (loss functions) used in the pre-training and fine-tuning stages.
  • Pre-train, prompt, predict: no fine-tuning is needed; the pre-trained model adapts directly to downstream tasks. It is convenient, avoids a separate set of parameters for each task, and breaks through data constraints.

What’s Prompt

Prompt: transform the input and output of a downstream task into the form of a pre-training task, such as MLM (Masked Language Modeling).

The original task:

Input: I love this movie.

Output: ++ (very positive)

After transforming:

Prefix Prompt (Prompt slot at the end of the text, suitable for generation tasks or autoregressive LMs such as GPT-3): Input: I love this movie. Overall, the movie is [Z].

Cloze Prompt (Prompt slot in the middle or at the end of the text, suitable for MLM models like BERT): Input: I love this movie. Overall, it was a [Z] movie.

Output: [Z] = ‘good’

Whereas the previous pre-train + fine-tune paradigm adapts the pre-trained model to the downstream task, Prompt adapts the downstream task to the pre-trained model.
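As a minimal sketch in plain Python, the transformation above is just string templating plus a label mapping. The template strings and the verbalizer dictionary below are illustrative assumptions, not from any particular library:

```python
# Minimal sketch of turning a sentiment-classification input into a prompt.
# Templates and the label verbalizer are illustrative assumptions.

def to_cloze_prompt(text: str) -> str:
    """Wrap the input in a cloze template; [Z] marks the answer slot."""
    return f"{text} Overall, it was a [Z] movie."

def to_prefix_prompt(text: str) -> str:
    """Wrap the input in a prefix template; the slot sits at the very end."""
    return f"{text} Overall, the movie is [Z]."

# Verbalizer: maps words an LM might predict for [Z] back to task labels.
VERBALIZER = {"good": "++ (very positive)", "bad": "-- (very negative)"}

prompt = to_cloze_prompt("I love this movie.")
print(prompt)              # I love this movie. Overall, it was a [Z] movie.
print(VERBALIZER["good"])  # ++ (very positive)
```

In a real system, a masked LM such as BERT would predict the word at [Z], and the verbalizer would map that word back to the downstream label.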

Why Prompt

Why Prompt works

Instead of fine-tuning a classifier from scratch and establishing a correspondence between the pre-trained model’s outputs and the classification labels, a Prompt task takes the same form as pre-training and extracts semantic information directly from the input, so good results can be achieved with little data, or even zero-shot.

The advantage of Prompt

  • As mentioned above, Prompt lets the features extracted by the pre-trained model be used more naturally for downstream prediction, yielding higher-quality features.
  • No classifier needs to be added for the downstream task, because the task form already suits the pre-trained model itself, so there is no classifier to train from scratch. Only a simple mapping is required to transform the Prompt paradigm’s output into the output required by the downstream task.
  • Performs well in few-shot or even zero-shot scenarios.

How Prompt

How to build Prompt pipeline

  • Prompt Addition: add the Prompt template to the input;
  • Answer Search: search for the highest-scoring answer to fill the Prompt slot;
  • Answer Mapping: translate the predicted answer into the form required by the downstream task.
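The three steps can be sketched end to end in plain Python. The `score` function below is a stand-in for a real masked LM (it uses a crude keyword heuristic purely for demonstration); the template, candidate set, and answer map are likewise assumptions:

```python
# Toy prompting pipeline: Prompt Addition -> Answer Search -> Answer Mapping.
# `score` stands in for a masked LM's probability of a candidate filling [Z];
# a real system would query a model such as BERT here.

TEMPLATE = "{text} Overall, it was a [Z] movie."
CANDIDATES = ["good", "bad"]
ANSWER_MAP = {"good": "positive", "bad": "negative"}

def score(filled_prompt: str) -> float:
    """Hypothetical LM score: keyword heuristic for demonstration only."""
    positive_cue = any(w in filled_prompt.lower()
                       for w in ("love", "great", "wonderful"))
    return 1.0 if ("good" in filled_prompt) == positive_cue else 0.0

def predict(text: str) -> str:
    prompt = TEMPLATE.format(text=text)                        # 1. Prompt Addition
    best = max(CANDIDATES,
               key=lambda z: score(prompt.replace("[Z]", z)))  # 2. Answer Search
    return ANSWER_MAP[best]                                    # 3. Answer Mapping

print(predict("I love this movie."))  # positive
```

Swapping the heuristic `score` for a masked LM’s token probabilities turns this toy into the actual pipeline: the structure of the three steps stays the same.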

How to design a Prompt model

  • Selection of pre-training model;
  • Prompt Engineering: Choosing the right Prompt involves two aspects:
    • Prefix prompt or Cloze prompt?
    • Design manually or build automatically (search, optimize, build, etc.)?
  • Answer Engineering: Select an appropriate method to map the predicted results back to the output form required by downstream tasks;
  • Multi-Prompt: design multiple Prompts for better results (ensemble learning, data augmentation, etc.);
  • Training strategy: a Prompt model may contain Prompt parameters in addition to the LM’s parameters. The training strategy needs to consider:
    • Are there additional Prompt parameters?
    • Should these Prompt parameters be updated?
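Following the taxonomy in the survey recommended above, answering these two questions yields a small space of strategies. The sketch below just enumerates them as plain data; the strategy names follow the survey, while the code structure itself is my own illustrative assumption:

```python
# Prompt training strategies, described by whether extra prompt parameters
# exist and which parameter groups are updated during training.
# Strategy names follow the "Pre-train, Prompt, and Predict" survey;
# the dictionary layout is illustrative.
STRATEGIES = {
    "tuning-free prompting":   {"prompt_params": False, "update_lm": False, "update_prompt": False},
    "fixed-LM prompt tuning":  {"prompt_params": True,  "update_lm": False, "update_prompt": True},
    "fixed-prompt LM tuning":  {"prompt_params": False, "update_lm": True,  "update_prompt": False},
    "prompt + LM fine-tuning": {"prompt_params": True,  "update_lm": True,  "update_prompt": True},
}

for name, cfg in STRATEGIES.items():
    print(f"{name}: {cfg}")
```

For example, GPT-3-style in-context learning corresponds to tuning-free prompting, while methods that learn continuous prompt embeddings with the LM frozen correspond to fixed-LM prompt tuning.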