Author: CHEONG

AI Machine Learning and Knowledge Graph

Research interests: Natural language processing and knowledge mapping

Before reading this article, two things should be noted:

1. Articles in this machine learning series often contain a large number of formula derivations and proofs. To aid understanding, the important conclusions of this article are given at the beginning, so that the core ideas can be grasped as quickly as possible; readers who want the detailed derivations can then read further;

2. There are many formulas in this article. Readers who need the Word document containing the original formulas can follow the official account [AI Machine Learning and Knowledge Graph] and reply: probability graph model, part four. You can also add WeChat [17865190919] to join the official account's discussion group (please mention CSDN in the friend request). Original writing is not easy; please notify the author and indicate the source when reprinting!


This article mainly introduces two exact inference algorithms for probabilistic graphical models: variable elimination (VE) and the Belief Propagation (BP) algorithm.


Conclusions of this article


1. The Inference problem in probabilistic graphs mainly means solving for probability distributions, such as the joint probability distribution, the marginal probability distribution, and the conditional probability distribution. The variable elimination method and the Belief Propagation algorithm are both exact solutions to the Inference problem.

2. The idea of variable elimination is to eliminate the variables of the joint probability one by one through repeated summation (for continuous variables, the summation becomes integration), finally obtaining the marginal probability distribution. Its advantage is that it is simple and easy to implement, but it has two disadvantages: (1) intermediate results are not saved during the computation, so there is a great deal of repeated computation; (2) the order of variable elimination greatly affects computational efficiency, and finding the optimal elimination order is an NP-hard problem.

3. The Belief Propagation algorithm is an improvement on these shortcomings of variable elimination. As shown in the figure below, the unit of repeated computation in variable elimination is the message $m_{i \to j}$, represented by the six arrows in the figure. BP computes and saves all of the $m_{i \to j}$ directly, and then uses them to compute the marginal probabilities.
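(For reference while reading these conclusions: in a pairwise model with node potentials $\phi$ and edge potentials $\psi$, the message has the standard sum-product form, derived in the main text below:

$$m_{i \to j}(x_j) = \sum_{x_i} \psi_{ij}(x_i, x_j)\, \phi_i(x_i) \prod_{k \in \mathrm{NB}(i) \setminus \{j\}} m_{k \to i}(x_i)$$

where $\mathrm{NB}(i)$ denotes the neighbors of node $i$.)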

For more details, see the main text below, where the above results are derived in full.


Main text


The Inference problem of probability graphs mainly involves solving for the following four kinds of distributions (formalized right after this list):

1. Joint probability;

2. Marginal probability distribution;

3. Conditional probability distribution;

4. MAP Inference.
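In standard notation, writing the variables as $x = (x_1, \dots, x_n)$ with an observed subset $x_B$ and a query subset $x_A$, these four problems are:

$$\text{Joint probability:} \quad p(x_1, x_2, \dots, x_n)$$

$$\text{Marginal:} \quad p(x_i) = \sum_{x_1} \cdots \sum_{x_{i-1}} \sum_{x_{i+1}} \cdots \sum_{x_n} p(x_1, \dots, x_n)$$

$$\text{Conditional:} \quad p(x_A \mid x_B) = \frac{p(x_A, x_B)}{p(x_B)}$$

$$\text{MAP:} \quad \hat{x}_A = \arg\max_{x_A} p(x_A \mid x_B)$$

(For continuous variables the sums become integrals, as noted in the conclusions.)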

The figure below shows the general classification of solutions to the Inference problem

1. Exact inference: the Belief Propagation algorithm improves on the VE (variable elimination) algorithm and works on tree structures, while the Junction Tree algorithm handles general graph structures.

2. Approximate inference: Loopy Belief Propagation targets graphs with cycles; Monte Carlo inference is a sampling-based approximate method with several realizations such as Importance Sampling and MCMC; variational inference is a deterministic approximate method.


Variable elimination method

As shown below, consider the directed chain $a \to b \to c \to d$. Assume $a, b, c, d$ are discrete binary variables, $a, b, c, d \in \{0, 1\}$, and we want to find the probability $p(d)$.

To find $p(d)$, the crudest solution is to enumerate every possible assignment of the variables $a, b, c$, from $p(a=0, b=0, c=0, d)$ up to $p(a=1, b=1, c=1, d)$, and then add up the $2^3 = 8$ possibilities to obtain $p(d)$. This is obviously too expensive to compute. The idea of variable elimination is instead to sum the joint probability over one variable at a time, eliminating it, until only the marginal distribution is left. The following formula shows the process of solving $p(d)$ by variable elimination:

$$p(d) = \sum_{c} \sum_{b} \sum_{a} p(a, b, c, d) = \sum_{c} p(d \mid c) \sum_{b} p(c \mid b) \sum_{a} p(b \mid a)\, p(a)$$

The process of variable elimination shown above eliminates the variables in the order $a$, $b$, $c$.
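To make the elimination concrete, here is a minimal NumPy sketch (all CPT values are invented for illustration) that computes $p(d)$ on the chain $a \to b \to c \to d$ both by brute-force enumeration over the $2^3$ assignments and by eliminating $a$, $b$, $c$ in order; the two results agree:

```python
import numpy as np

# A minimal sketch of variable elimination on the chain a -> b -> c -> d.
# All CPT values are invented purely for illustration.
np.random.seed(0)

def random_cpt():
    """Return a 2x2 table t with t[x, y] = p(y | x), rows summing to 1."""
    t = np.random.rand(2, 2)
    return t / t.sum(axis=1, keepdims=True)

p_a   = np.array([0.6, 0.4])  # p(a)
p_b_a = random_cpt()          # p(b | a)
p_c_b = random_cpt()          # p(c | b)
p_d_c = random_cpt()          # p(d | c)

# Brute force: enumerate all 2^3 = 8 assignments of (a, b, c) for each d.
p_d_brute = np.zeros(2)
for a in range(2):
    for b in range(2):
        for c in range(2):
            for d in range(2):
                p_d_brute[d] += p_a[a] * p_b_a[a, b] * p_c_b[b, c] * p_d_c[c, d]

# Variable elimination: push each sum inward, eliminating a, then b, then c.
m_a_b  = p_a @ p_b_a     # m_{a->b}(b) = sum_a p(a) p(b|a)
m_b_c  = m_a_b @ p_c_b   # m_{b->c}(c) = sum_b m_{a->b}(b) p(c|b)
p_d_ve = m_b_c @ p_d_c   # p(d)        = sum_c m_{b->c}(c) p(d|c)

print(p_d_brute, p_d_ve)  # identical marginals
```

Brute force touches all $2^3$ joint configurations, while elimination only ever multiplies a length-2 vector into a 2x2 table, which is the whole point of pushing the sums inward.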

Disadvantages of the variable elimination method:

1. The algorithm does not save intermediate results, so there is a great deal of repeated computation;

2. The order of variable elimination greatly affects computational efficiency, and finding the optimal elimination order is an NP-hard problem (illustrated below).
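As a quick illustration of point 2, consider a hypothetical star graph (not one of this article's figures): a center variable $x_0$ connected by pairwise potentials to $n$ binary leaves $x_1, \dots, x_n$. The elimination order determines the size of the largest intermediate factor:

$$\sum_{x_0} \prod_{k=1}^{n} \psi_k(x_0, x_k) = \tau(x_1, \dots, x_n) \quad \text{(center first: a table over all $n$ leaves, $2^n$ entries)}$$

$$\sum_{x_n} \psi_n(x_0, x_n) = \tau_n(x_0) \quad \text{(a leaf first: a table over $x_0$ alone, 2 entries)}$$

Eliminating the leaves first keeps every intermediate factor tiny; eliminating the center first is exponentially expensive, even though both orders give the same marginal.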


Belief Propagation

The main idea of Belief Propagation follows the variable elimination method, but improves on its shortcomings.

1. First, an example to understand the repeated-computation problem of the variable elimination method:

If we now need to solve for the probability $p(e)$, variable elimination eliminates the variables in the order $a, b, c, d$, as shown in the figure below, and the result is $p(e) = m_{d \to e}(e)$.

If we then need to solve for the probability $p(c)$, clearly the first half of that computation has already been done while solving for $p(e)$, namely the message $m_{b \to c}(c)$.

However, because the variable elimination method does not save intermediate results, every new query repeats a large amount of computation. The simplest and crudest remedy is to save the message computed at each step, sweeping from left to right and from right to left, so that every marginal probability can then be obtained cheaply.
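Written out explicitly for the chain $a \to b \to c \to d \to e$ used above, the shared piece is visible:

$$p(e) = \sum_{d} p(e \mid d) \sum_{c} p(d \mid c) \underbrace{\sum_{b} p(c \mid b) \sum_{a} p(b \mid a)\, p(a)}_{m_{b \to c}(c)} = m_{d \to e}(e)$$

$$p(c) = \underbrace{\sum_{b} p(c \mid b) \sum_{a} p(b \mid a)\, p(a)}_{m_{b \to c}(c)} \cdot \underbrace{\sum_{d} p(d \mid c) \sum_{e} p(e \mid d)}_{=\,1} = m_{b \to c}(c)$$

If $m_{b \to c}(c)$ is saved while computing $p(e)$, then $p(c)$ comes for free.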

2. From improving the VE algorithm to the Belief Propagation algorithm

Finding marginal probabilities on an undirected graph: the figure below shows how to avoid the repeated computation of variable elimination. The conclusion first: whether the graph is directed or undirected, computing marginal probabilities such as $p(a)$ and $p(b)$ one by one involves repeated computation, and the repeated unit is the message $m_{i \to j}$ represented by the six arrows in the figure. Therefore, to avoid the repetition, we only need to compute all of the $m_{i \to j}$ (six of them in the figure below) to obtain all the marginal probabilities. The derivation is shown below.

For the above undirected graph, the joint probability factorizes into node potentials $\phi$ and edge potentials $\psi$:

$$p(x) = \frac{1}{Z} \prod_{i \in V} \phi_i(x_i) \prod_{(i,j) \in E} \psi_{ij}(x_i, x_j)$$

Now let us solve for a marginal probability, say $p(a)$. According to the undirected-graph factorization rules, pushing each summation as far inward as possible, the following formula can be obtained, where each message $m_{k \to a}(x_a)$ collects all the potentials on node $k$'s side of the edge $(k, a)$:

$$p(x_a) \propto \phi_a(x_a) \prod_{k \in \mathrm{NB}(a)} m_{k \to a}(x_a)$$

The above formula is generalized as follows: for any neighboring nodes $j$ and $i$, the message is defined recursively by

$$m_{j \to i}(x_i) = \sum_{x_j} \psi_{ij}(x_i, x_j)\, \phi_j(x_j) \prod_{k \in \mathrm{NB}(j) \setminus \{i\}} m_{k \to j}(x_j)$$

Therefore, when the marginal probabilities are required, there is no need to compute each of $p(a), p(b), \dots$ separately, which would introduce a large amount of repeated computation; computing all the messages $m_{j \to i}$ is enough. This is also where the name Belief Propagation comes from: the belief of node $b$ is all the information that node $b$ can provide,

$$\mathrm{belief}(b) = \phi_b(x_b) \prod_{k \in \mathrm{NB}(b)} m_{k \to b}(x_b),$$

and the marginal $p(x_b)$ is proportional to it.

The generalized formula above is the core of the Belief Propagation algorithm: directly compute all of the messages $m_{i \to j}$, and then compute each marginal $p(x_i)$ from them. The pseudo-code of the BP algorithm is given below.
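The original pseudo-code image is not reproduced here; instead, below is a minimal runnable Python sketch of the same two-pass sum-product schedule on a small tree. The node names, edge list, and potential values are all invented for illustration (a four-node tree with three edges, i.e. six directed messages, matching the six arrows described above):

```python
import numpy as np

# A minimal sum-product sketch on a small tree (edges: a-b, b-c, b-d).
# Node names, edges, and potential values are invented for illustration;
# all variables are binary.
np.random.seed(1)

nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("b", "d")]

neighbors = {n: [] for n in nodes}
for i, j in edges:
    neighbors[i].append(j)
    neighbors[j].append(i)

phi = {n: np.random.rand(2) for n in nodes}     # node potentials phi_i(x_i)
psi = {e: np.random.rand(2, 2) for e in edges}  # edge potentials psi_ij(x_i, x_j)

def edge_potential(i, j):
    """Return psi as a table indexed (x_i, x_j), whichever way the edge is stored."""
    return psi[(i, j)] if (i, j) in psi else psi[(j, i)].T

def message(j, i, messages):
    """m_{j->i}(x_i) = sum_{x_j} psi(x_i, x_j) phi_j(x_j) prod over k in NB(j), k != i, of m_{k->j}(x_j)."""
    prod = phi[j].copy()
    for k in neighbors[j]:
        if k != i:
            prod *= messages[(k, j)]
    return edge_potential(i, j) @ prod

# Two passes on a tree: collect messages toward the root a, then distribute back.
messages = {}
schedule = [("c", "b"), ("d", "b"), ("b", "a"),   # collect
            ("a", "b"), ("b", "c"), ("b", "d")]   # distribute
for j, i in schedule:
    messages[(j, i)] = message(j, i, messages)

# Belief of node i = phi_i times all incoming messages; normalize to get p(x_i).
for n in nodes:
    belief = phi[n].copy()
    for k in neighbors[n]:
        belief *= messages[(k, n)]
    print(n, belief / belief.sum())
```

On a tree, one collect pass toward any root plus one distribute pass back out computes each of the six messages exactly once, which is precisely how BP removes the repeated computation of variable elimination.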

References:

1. Machine Learning [Whiteboard Derivation Series], by Shuhuai008

2. Pattern Recognition and Machine Learning, Christopher M. Bishop