
Encoder-Decoder is a model framework in the NLP field. It is widely used in machine translation, speech recognition, and other tasks.

This article discusses Encoder-Decoder, Seq2Seq, and their Attention upgrade in detail.

What is Encoder-Decoder?

The Encoder-Decoder model is mainly a concept from the NLP field. It is not a specific algorithm but a class of algorithms: a general framework within which different algorithms can be used to solve different tasks.

The Encoder-Decoder framework is a good illustration of the core idea of machine learning:

Transform a practical problem into a mathematical problem, then solve the mathematical problem to solve the practical problem.

The Encoder's job is to “turn the real-world problem into a mathematical problem.”

The Decoder's job is to “solve the mathematical problem and translate the solution back into the real world.”

Connecting the two stages, the general diagram looks like this:

Two points about Encoder-Decoder are worth noting:

  1. The intermediate “vector C” has a fixed length regardless of the lengths of the input and output (this is also its weakness, explained later).
  2. Different encoders and decoders can be chosen for different tasks (each can be a plain RNN, but is usually one of its variants such as an LSTM or a GRU); see the code sketch after this list.
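
To make the two points above concrete, here is a minimal sketch of the framework in PyTorch (the choice of library, the vocabulary sizes, and the dimensions are illustrative assumptions, not from the article). The Encoder compresses the whole input sequence into the fixed-length “vector C”; the Decoder generates the output from C one token at a time. The GRU could just as well be a plain RNN or an LSTM.

```python
# Minimal Encoder-Decoder sketch. All sizes below are hypothetical, for illustration only.
import torch
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB, EMB, HIDDEN = 1000, 1000, 64, 128

class Encoder(nn.Module):
    """Turns the input sequence into a single fixed-length vector C."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(SRC_VOCAB, EMB)
        self.rnn = nn.GRU(EMB, HIDDEN, batch_first=True)   # could be RNN / LSTM / GRU

    def forward(self, src):                  # src: (batch, src_len)
        _, h = self.rnn(self.embed(src))     # h: (1, batch, HIDDEN)
        return h                             # the fixed-length "vector C"

class Decoder(nn.Module):
    """Generates the output sequence one token at a time, starting from C."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(TGT_VOCAB, EMB)
        self.rnn = nn.GRU(EMB, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, TGT_VOCAB)

    def forward(self, tgt_token, hidden):            # tgt_token: (batch, 1)
        out, hidden = self.rnn(self.embed(tgt_token), hidden)
        return self.out(out), hidden                 # logits over the target vocabulary
```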

Any model consistent with the framework above can be collectively referred to as an Encoder-Decoder model. When Encoder-Decoder models come up, another term is often mentioned: Seq2Seq.

What is Seq2Seq?

Seq2Seq (short for sequence-to-sequence) means, literally, that one sequence is input and another sequence is output. The most important aspect of this structure is that the lengths of the input and output sequences are variable. For example:

As shown above, 6 Chinese characters are input and 3 English words are output: the input and output have different lengths.
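
Continuing the sketch above (the BOS/EOS token ids and the 10-step cap are illustrative assumptions), the following loop shows how variable output length falls out of the framework: the input is encoded once, and decoding simply stops when an end-of-sequence token is produced, so the output length is independent of the input length.

```python
# Illustrative decoding loop, continuing the earlier sketch.
import torch

BOS, EOS = 1, 2                                  # hypothetical special-token ids
encoder, decoder = Encoder(), Decoder()

src = torch.randint(3, SRC_VOCAB, (1, 6))        # a length-6 input sequence
hidden = encoder(src)                            # fixed-length vector C
token = torch.tensor([[BOS]])
output = []
for _ in range(10):                              # cap on the output length
    logits, hidden = decoder(token, hidden)
    token = logits.argmax(-1)                    # greedy choice of the next word
    if token.item() == EOS:                      # stop when the model emits <eos>
        break
    output.append(token.item())
print(output)                                    # untrained model, so ids are arbitrary
```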

The origin of Seq2Seq

Before the Seq2Seq framework was proposed, deep neural networks had already achieved very good results on problems such as image classification. In the problems they handle well, the input and output can usually be expressed as fixed-length vectors; if the length varies slightly, zero-padding or similar operations are used.

However, many important problems, such as machine translation, speech recognition, and automatic dialogue, are expressed as sequences whose lengths are not known in advance. How to break through this limitation of deep neural networks and adapt them to such scenarios therefore became a research hotspot from 2013 onward, and the Seq2Seq framework came into being.

Relationship between “Seq2Seq” and “Encoder-Decoder”

Seq2Seq (which emphasizes the purpose) does not refer to a specific method; any model that satisfies the goal of “input a sequence, output a sequence” can be collectively called a Seq2Seq model.

The specific methods used by Seq2Seq basically fall within the scope of the Encoder-Decoder model (which emphasizes the method).

To sum up:

  • Seq2Seq falls within the broad category of Encoder-Decoder
  • Seq2Seq emphasizes the purpose; Encoder-Decoder emphasizes the method

What are the applications of Encoder-Decoder?

Machine translation, dialogue bots, poetry generation, code completion, article summarization (text to text)

“Text to text” is the most typical application; the lengths of the input sequence and the output sequence can be quite different.

Paper: “Sequence to Sequence Learning with Neural Networks”

Speech recognition (audio to text)

Speech recognition also has strong sequential characteristics, which makes it well suited to the Encoder-Decoder model.

Paper: “A Comparison of Sequence-to-Sequence Models for Speech Recognition”

Image description generation (image to text)

Colloquially, this is “letting a picture speak”: the machine extracts the features of an image and then expresses them in words. This application combines computer vision and NLP.

Paper: “Sequence to Sequence — Video to Text”

The defect of Encoder-Decoder

As mentioned above, the only channel for passing information between the Encoder and the Decoder is the “vector C”, and the length of C is fixed.

To make this easier to understand, consider the analogy of “compression and decompression”:

Compress an 800×800-pixel image to 100 KB and it still looks fairly sharp. Compress a 3000×3000-pixel image down to 100 KB and it looks blurry.

Encoder-Decoder has a similar problem: when the input is too long, some information is lost.

Attention solves the problem of information loss

The Attention mechanism is designed to solve this problem of “long input, lost information”.

The characteristic of the Attention model is that the Encoder no longer encodes the entire input sequence into a single fixed-length “intermediate vector C”, but into a sequence of vectors. The Encoder-Decoder model with Attention is illustrated below:

In this way, each output step can make full use of the information carried by the input sequence, and this approach has achieved very good results in translation tasks.
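
As a rough illustration, here is one common way to add attention to the earlier sketch (dot-product attention; an assumption, not necessarily the exact variant the article's figure shows). The encoder must now return its per-step outputs (the first return value of nn.GRU) rather than only its final state, and the decoder weights those outputs at every step instead of relying on a single vector C.

```python
# Minimal dot-product attention decoder; sizes are hypothetical, for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttnDecoder(nn.Module):
    """At each step, weights all encoder outputs instead of relying on one fixed C."""
    def __init__(self, vocab=1000, emb=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.rnn = nn.GRU(emb + hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, tgt_token, hidden, enc_outputs):
        # enc_outputs: (batch, src_len, hidden) - one vector per input position
        query = hidden[-1].unsqueeze(1)                          # (batch, 1, hidden)
        scores = torch.bmm(query, enc_outputs.transpose(1, 2))   # (batch, 1, src_len)
        weights = F.softmax(scores, dim=-1)                      # attention weights
        context = torch.bmm(weights, enc_outputs)                # weighted sum of inputs
        rnn_in = torch.cat([self.embed(tgt_token), context], dim=-1)
        output, hidden = self.rnn(rnn_in, hidden)
        return self.out(output), hidden, weights

# Usage with dummy tensors (shapes only; an untrained model gives arbitrary output):
dec = AttnDecoder()
enc_outputs = torch.randn(1, 6, 128)          # per-step encoder outputs for 6 inputs
hidden = torch.randn(1, 1, 128)               # decoder's initial hidden state
logits, hidden, weights = dec(torch.tensor([[1]]), hidden, enc_outputs)
```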