1. Introduction

Suppose we build a translation system with a Seq2Seq model that takes one French sentence as input and outputs one English sentence. The translation the system produces depends on which word the decoder selects at each output step.

Here is an example:

Here the Seq2Seq model can translate one French sentence into four different English sentences. Which one should we choose as the final result?

The figure above gives a formula in which X is the French sentence and Y is the English sentence built up word by word. Y is the word sequence y1, y2, ..., yn, and each different combination of words is a different translation.

The key to solving this problem is to find the word sequence Y that maximizes the value of this formula.
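The formula itself is in the figure. Assuming it is the usual objective from Andrew Ng's Sequence Models course, the selection problem can be written as follows. The second line is the chain rule of probability, and it is why a decoder can score a sentence one word at a time:

```latex
\begin{aligned}
\hat{Y} &= \arg\max_{y_1, \dots, y_n} P(y_1, \dots, y_n \mid X) \\
P(y_1, \dots, y_n \mid X) &= \prod_{t=1}^{n} P(y_t \mid X, y_1, \dots, y_{t-1})
\end{aligned}
```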

But how? Let’s introduce one method: Greedy Search.

2. Greedy Search

The first and simplest selection method is greedy search, which is a greedy algorithm. At every step it selects the word with the highest output probability and appends it to the word sequence, as shown in the figure below:

First, pick the word with the highest probability as the first output and feed it back into the decoder. Then pick the most probable second word, then the most probable third word, and so on.
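The step-by-step procedure above can be sketched in Python as follows. This is a minimal sketch under stated assumptions. The table `TOY_NEXT_WORD` is a hypothetical stand-in for a trained decoder, and its probabilities are invented. It conditions only on the previous word. The tokens `<s>` and `</s>` mark sentence start and end and are assumptions of this sketch.

```python
from typing import Dict, List

# Hypothetical toy "decoder": maps the previously emitted word to a probability
# distribution over the next word. A real Seq2Seq decoder would condition on the
# encoded French sentence X and on the whole output prefix; numbers are invented.
TOY_NEXT_WORD: Dict[str, Dict[str, float]] = {
    "<s>": {"Jane": 0.7, "In": 0.3},
    "Jane": {"is": 0.6, "visits": 0.4},
    "is": {"visiting": 0.55, "going": 0.45},
    "visiting": {"Africa": 0.9, "</s>": 0.1},
    "Africa": {"</s>": 0.6, "in": 0.4},
}


def greedy_decode(next_word: Dict[str, Dict[str, float]], max_len: int = 10) -> List[str]:
    """Greedy search: keep only the single most probable word at every step."""
    output: List[str] = []
    prev = "<s>"  # the start-of-sentence token is fed to the decoder first
    for _ in range(max_len):
        # Words missing from the toy table simply end the sentence.
        probs = next_word.get(prev, {"</s>": 1.0})
        best = max(probs, key=probs.get)  # argmax over the next-word distribution
        if best == "</s>":  # the end-of-sentence token stops decoding
            break
        output.append(best)
        prev = best  # feed the chosen word back into the decoder
    return output


print(" ".join(greedy_decode(TOY_NEXT_WORD)))  # prints: Jane is visiting Africa
```

Each step commits to one word and never revisits it. This makes the method cheap: one argmax per output word.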

This is clearly not a very good approach, because a greedy algorithm is not guaranteed to find the optimal solution. For example, consider these two translations:

  • A. Jane is visiting Africa in September.
  • B. Jane is going to be visiting Africa in September.

Both sentences are correct in content, but A is more concise and is the better translation. With a greedy algorithm, however, once the output so far is “Jane is”, the next word chosen may well be “going”, because “going” is a more common continuation. The algorithm then ends up producing B.
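This failure can be reproduced with a tiny made-up probability table. In this sketch the prefix “Jane is” is treated as already decoded, and the two-word continuation “going to” stands in for sentence B. Every number in the hypothetical `TOY_MODEL` is invented for illustration. Greedy search takes “going” because 0.6 > 0.4, yet the full continuation “visiting Africa” is more probable overall (0.4 × 0.9 = 0.36 versus 0.6 × 0.55 = 0.33).

```python
from typing import Dict, List, Tuple

Prefix = Tuple[str, ...]

# Hypothetical P(next word | words chosen so far). All entries are implicitly
# conditioned on the French sentence and the already-decoded prefix "Jane is".
TOY_MODEL: Dict[Prefix, Dict[str, float]] = {
    (): {"visiting": 0.4, "going": 0.6},
    ("visiting",): {"Africa": 0.9, "Europe": 0.1},
    ("going",): {"to": 0.55, "away": 0.45},
}


def sequence_prob(words: List[str]) -> float:
    """Chain rule: multiply P(y_t | y_1..y_{t-1}) over every step."""
    p = 1.0
    for t, word in enumerate(words):
        p *= TOY_MODEL[tuple(words[:t])][word]
    return p


def greedy(steps: int = 2) -> List[str]:
    """Pick the locally most probable word at each step."""
    words: List[str] = []
    for _ in range(steps):
        probs = TOY_MODEL[tuple(words)]
        words.append(max(probs, key=probs.get))
    return words


choice = greedy()
print(choice, f"{sequence_prob(choice):.2f}")  # ['going', 'to'] 0.33
print(f"{sequence_prob(['visiting', 'Africa']):.2f}")  # 0.36
```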

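Ideally, you would enumerate every possible output and pick the one that maximizes the formula from the previous section, which is guaranteed to find the optimal solution. However, the number of candidates grows exponentially with sentence length: with a vocabulary of V words there are V^n possible sentences of length n. The cost of enumerating them all is unacceptable.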
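A brute-force search can be sketched in a few lines, and the sketch also makes the cost visible. The vocabulary `VOCAB` and the bigram table `BIGRAM` standing in for the decoder are hypothetical. So is the `FLOOR` probability assumed for unseen word pairs. `itertools.product` generates every one of the `len(VOCAB) ** length` candidate sequences.

```python
import math
from itertools import product
from typing import Dict, Sequence, Tuple

# Hypothetical toy vocabulary and bigram probabilities P(w_t | w_{t-1}),
# standing in for a real decoder; all numbers are invented.
VOCAB = ["Jane", "is", "visiting", "going", "Africa"]
BIGRAM: Dict[Tuple[str, str], float] = {
    ("<s>", "Jane"): 0.9,
    ("Jane", "is"): 0.9,
    ("is", "visiting"): 0.4,
    ("is", "going"): 0.6,
    ("visiting", "Africa"): 0.9,
    ("going", "Africa"): 0.2,
}
FLOOR = 1e-3  # probability assumed for any word pair not listed above


def log_prob(seq: Sequence[str]) -> float:
    """Sum of log P(w_t | w_{t-1}); summing logs avoids numeric underflow."""
    total, prev = 0.0, "<s>"
    for word in seq:
        total += math.log(BIGRAM.get((prev, word), FLOOR))
        prev = word
    return total


def exhaustive_search(vocab: Sequence[str], length: int) -> Tuple[str, ...]:
    """Score all len(vocab) ** length sequences and return the best one."""
    return max(product(vocab, repeat=length), key=log_prob)


print(exhaustive_search(VOCAB, 4))  # ('Jane', 'is', 'visiting', 'Africa')
print(len(VOCAB) ** 4)              # 625 candidates even for this toy
print(f"{10_000 ** 10:.0e}")        # 1e+40 for 10,000 words and length 10
```

In this table a greedy decoder would take “going” after “is” (0.6 > 0.4), while the exhaustive search finds the higher-scoring “visiting Africa” path. The last line shows why enumeration does not scale to a realistic vocabulary.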

Originally published at: blog.csdn.net/ybdesire/ar…