Preface

Earlier natural language processing work involved some part-of-speech tagging. A hidden Markov model (HMM) is generally used to implement part-of-speech tagging, and the decoding algorithm for an HMM is usually the Viterbi algorithm.

About the exhaustive method

One of the most common applications of an HMM is to find the most likely sequence of hidden states given a sequence of observations. The simplest idea is to enumerate all possible hidden state sequences directly and compute the probability of each combination; the combination with the highest probability is the most likely hidden state sequence. Taking the algae-and-weather example, we enumerate the probabilities of all possible hidden state sequences: P(dry, damp, soggy | sunny, sunny, sunny), P(dry, damp, soggy | sunny, sunny, cloudy), P(dry, damp, soggy | sunny, sunny, rainy), ..., P(dry, damp, soggy | rainy, rainy, rainy). The sequence corresponding to the maximum probability is the most likely hidden state sequence. There are 3^T such paths, so as the sequence length and the number of states increase, the amount of computation grows explosively.

[Figure: enumeration of all possible hidden state paths]
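To make the exhaustive method concrete, here is a minimal brute-force decoder in Python for the algae-and-weather example. All of the initial, transition, and emission probabilities below are made-up numbers for illustration; only the enumeration itself reflects the method described above.

```python
from itertools import product

states = ["sunny", "cloudy", "rainy"]
observations = ["dry", "damp", "soggy"]

# Hypothetical model parameters, assumed purely for illustration.
start_p = {"sunny": 0.5, "cloudy": 0.3, "rainy": 0.2}
trans_p = {
    "sunny":  {"sunny": 0.6, "cloudy": 0.3, "rainy": 0.1},
    "cloudy": {"sunny": 0.3, "cloudy": 0.4, "rainy": 0.3},
    "rainy":  {"sunny": 0.2, "cloudy": 0.3, "rainy": 0.5},
}
emit_p = {
    "sunny":  {"dry": 0.6, "damp": 0.3, "soggy": 0.1},
    "cloudy": {"dry": 0.3, "damp": 0.4, "soggy": 0.3},
    "rainy":  {"dry": 0.1, "damp": 0.3, "soggy": 0.6},
}

best_prob, best_path = 0.0, None
# Enumerate all 3**T hidden sequences and score each one in full.
for path in product(states, repeat=len(observations)):
    prob = start_p[path[0]] * emit_p[path[0]][observations[0]]
    for t in range(1, len(observations)):
        prob *= trans_p[path[t - 1]][path[t]] * emit_p[path[t]][observations[t]]
    if prob > best_prob:
        best_prob, best_path = prob, path

print(best_path, best_prob)
```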

Viterbi algorithm

The exhaustive method above requires a huge amount of computation, so the Viterbi algorithm is introduced to reduce the complexity. The decoding problem the Viterbi algorithm solves is an optimal-selection problem with multiple steps and multiple choices at each step. The figure below shows the core idea of Viterbi clearly: as time advances, each node saves only the optimal subpath from the nodes of the previous moment to itself, marked by the red arrows in the figure. The possible paths into a node at the current moment run through every node at the previous moment, but only the single optimal one is kept. After all steps have been computed in turn, the optimal path for the whole process is finally recovered by backtracking.

[Figure: trellis diagram; the red arrows mark the single optimal subpath kept at each node]
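To see how quickly the exhaustive method blows up, compare its 3^T paths with the roughly T·N² transition evaluations Viterbi needs (N states; here N = 3):

```python
# Path count for exhaustive search (N**T) versus approximate transition
# evaluations for Viterbi (T * N**2), with N = 3 hidden states.
N = 3
for T in (3, 10, 20):
    print(f"T={T}: exhaustive {N**T} paths, Viterbi ~{T * N**2} evaluations")
```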

The following example illustrates the whole process. Suppose a sequence of T moments with 3 states at each moment, where p(A1) denotes the value of node A1, p(B1) the value of node B1, and so on for the other nodes. The transition probabilities between states are constant over time, so p(aa), the probability of transitioning from A to A, is the same whether we move from moment 1 to moment 2 or from moment 2 to moment 3; likewise for p(ab), p(ac), p(ba), and so on.

[Figure: trellis with states A, B, C over T moments]

The node value at moment t+1 is computed as

p(A(t+1)) = max( p(At) · p(aa), p(Bt) · p(ba), p(Ct) · p(ca) )

and likewise for p(B(t+1)) and p(C(t+1)).
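A minimal sketch of this recurrence, assuming the node values at moment t and the transition probabilities are stored in dictionaries (the names p_prev, trans, and viterbi_step are all illustrative):

```python
# One Viterbi step: compute each node's value at t+1 from the values at t,
# keeping only the best incoming path (and its predecessor) per node.
def viterbi_step(p_prev, trans):
    p_next, back = {}, {}
    for x in p_prev:                                   # node at moment t+1
        best_s = max(p_prev, key=lambda s: p_prev[s] * trans[s][x])
        p_next[x] = p_prev[best_s] * trans[best_s][x]  # max over predecessors
        back[x] = best_s                               # saved for backtracking
    return p_next, back
```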
We calculate the value of p(A) at t=2, which can come from A1 to A2, B1 to A2, or C1 to A2. If A1 to A2 is the path that yields the maximum p(A), we keep only that path. Similarly, we compute the maxima of p(B) and p(C), keeping, say, the paths B1 to B2 and B1 to C2. Then p(A), p(B), and p(C) at moment t=3 are computed in the same way, and so on until the maximum p(A), p(B), and p(C) at moment T are obtained. The node with the maximum value is selected as the end point of the optimal path. For example, if CT is the largest node, that is

max( p(AT), p(BT), p(CT) ) = p(CT)

then the optimal hidden state sequence is recovered by backtracking from CT through the predecessor saved at each node.
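Putting the recurrence and the final backtracking together, here is a self-contained sketch for the 3-state example; the initial node values p(A1), p(B1), p(C1) and the transition table are made-up numbers for illustration:

```python
T = 3
states = ["A", "B", "C"]
p = {"A": 0.5, "B": 0.3, "C": 0.2}    # p(A1), p(B1), p(C1), assumed values
trans = {                              # p(aa), p(ab), ..., assumed values
    "A": {"A": 0.6, "B": 0.3, "C": 0.1},
    "B": {"A": 0.3, "B": 0.4, "C": 0.3},
    "C": {"A": 0.2, "B": 0.3, "C": 0.5},
}

backpointers = []                      # one predecessor table per step
for _ in range(T - 1):
    p_next, back = {}, {}
    for x in states:                   # keep only the best path into each node
        best_s = max(states, key=lambda s: p[s] * trans[s][x])
        p_next[x] = p[best_s] * trans[best_s][x]
        back[x] = best_s
    backpointers.append(back)
    p = p_next

last = max(p, key=p.get)               # the largest node at moment T
path = [last]
for back in reversed(backpointers):    # walk the saved predecessors backwards
    path.append(back[path[-1]])
path.reverse()
print(path, p[last])                   # optimal state sequence and its value
```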
======== advertising time ========

My new book, “Analysis of Tomcat Kernel Design”, is now available on JD.com; friends who need it can go to item.jd.com/12185360.ht… to pre-order. Thank you all.

Why I wrote “Analysis of Tomcat Kernel Design”

=========================
