Author: CHEONG

AI Machine Learning and Knowledge Graph

Research interests: natural language processing and knowledge graphs

Introduction: To obtain all the original manuscript materials for this article, follow the public account “AI Machine Learning and Knowledge Graph” and reply “Gaussian distribution lecture 1”.

Original writing is not easy; please notify the author and credit the source when reprinting! Scan the code to follow the public account, which regularly publishes content on knowledge graphs, natural language processing, machine learning, and related topics. Add WeChat [17865190919] to join the discussion group, and mention that you came from Juejin when adding.


Without further ado, let’s start with a question: given a data set X that follows a Gaussian distribution, how can we derive the mean and variance of X?

To answer this, first break the question down:

1. What is the Gaussian distribution, and what is its probability density function?

2. What method do we use for the derivation? What is maximum likelihood estimation?

3. How do we derive the mean and variance of the Gaussian distribution with maximum likelihood estimation?

Let’s take a look at these three questions one by one.


1. Gaussian distribution

This section first explains the univariate Gaussian distribution, the standard normal distribution, and the multivariate Gaussian distribution, together with their probability density functions. The marginal Gaussian distribution, the conditional Gaussian distribution, and the Gaussian mixture distribution will be discussed separately.


1. Univariate Gaussian distribution and standard normal distribution

If data set X obeys a univariate Gaussian distribution with mean $\mu$ and variance $\sigma^2$, written $x \sim N(\mu, \sigma^2)$, the probability density function is

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

To obtain the standard univariate normal distribution, simply standardize the data set X:

$$z = \frac{x - \mu}{\sigma}$$

Then z obeys the standard normal distribution with mean 0 and variance 1, and its probability density function is

$$f(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right)$$
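As a quick numerical illustration, here is a minimal Python sketch (assuming NumPy and SciPy are available; the function name gaussian_pdf is just illustrative) that evaluates this density and checks the standardization relation $f(x) = f(z)/\sigma$ against scipy.stats.norm:

```python
import numpy as np
from scipy.stats import norm

def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density f(x) = exp(-(x-mu)^2/(2 sigma^2)) / (sqrt(2 pi) sigma)."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

mu, sigma = 2.0, 1.5
x = 3.7

# Density agrees with scipy's reference implementation.
print(gaussian_pdf(x, mu, sigma), norm.pdf(x, loc=mu, scale=sigma))

# Standardization: z = (x - mu) / sigma follows N(0, 1),
# and by change of variables f(x) = f_std(z) / sigma.
z = (x - mu) / sigma
print(gaussian_pdf(x, mu, sigma), norm.pdf(z) / sigma)
```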

Here are two common properties of the Gaussian distribution that will be used in later proofs:

(1) If $x \sim N(\mu, \sigma^2)$ and a and b are real numbers, then

$$ax + b \sim N(a\mu + b,\ a^2\sigma^2)$$

(2) If $x \sim N(\mu_x, \sigma_x^2)$ and $y \sim N(\mu_y, \sigma_y^2)$ are statistically independent normal random variables, then (see the numerical check after this list):

  • Their sum also satisfies a normal distribution: $x + y \sim N(\mu_x + \mu_y,\ \sigma_x^2 + \sigma_y^2)$
  • Their difference also satisfies a normal distribution: $x - y \sim N(\mu_x - \mu_y,\ \sigma_x^2 + \sigma_y^2)$
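These two properties are easy to check numerically. A minimal sketch, assuming NumPy, that draws a large number of samples and compares the empirical means and variances with the formulas above:

```python
import numpy as np

rng = np.random.default_rng(0)
mu_x, sigma_x = 1.0, 2.0
mu_y, sigma_y = -3.0, 0.5

x = rng.normal(mu_x, sigma_x, size=1_000_000)
y = rng.normal(mu_y, sigma_y, size=1_000_000)

# Sum: N(mu_x + mu_y, sigma_x^2 + sigma_y^2)
s = x + y
print(s.mean(), mu_x + mu_y)             # both ~ -2.0
print(s.var(), sigma_x**2 + sigma_y**2)  # both ~ 4.25

# Difference: N(mu_x - mu_y, sigma_x^2 + sigma_y^2)
d = x - y
print(d.mean(), mu_x - mu_y)             # both ~ 4.0
print(d.var(), sigma_x**2 + sigma_y**2)  # both ~ 4.25
```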


2. Multivariate Gaussian distribution

Here we consider a simple case: when the dimensions are mutually independent, the joint probability density function equals the product of the individual probability densities.

If $X = (x_1, x_2, \ldots, x_d)^T$ and each dimension is independent, then the probability density function of X is

$$p(X) = \prod_{i=1}^{d} p(x_i) = \frac{1}{(2\pi)^{d/2}\,\sigma_1\sigma_2\cdots\sigma_d} \exp\left(-\frac{1}{2}\sum_{i=1}^{d}\frac{(x_i-\mu_i)^2}{\sigma_i^2}\right)$$

To simplify the above formula, let’s write it as

$$p(X) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(X-\mu)^T \Sigma^{-1} (X-\mu)\right)$$

Among them:

$$\mu = (\mu_1, \mu_2, \ldots, \mu_d)^T, \qquad \Sigma = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_d^2)$$

$\Sigma$ in the above formula is the covariance matrix. Since the dimensions of the variable are uncorrelated, the covariance matrix has values only on the diagonal, so $|\Sigma| = \sigma_1^2\sigma_2^2\cdots\sigma_d^2$ and the matrix form above reduces exactly to the product form of the probability density function of the multivariate Gaussian distribution.
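A short sketch, assuming NumPy and SciPy, confirming that with a diagonal covariance matrix the matrix form gives the same density as the product of the univariate densities (the variable names mu, sigmas, Sigma are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.5, -1.0, 2.0])
sigmas = np.array([1.0, 0.7, 2.5])   # per-dimension standard deviations
Sigma = np.diag(sigmas ** 2)         # diagonal covariance matrix

x = np.array([0.0, -0.5, 1.0])

# Product of the univariate densities (independent dimensions).
per_dim = np.exp(-(x - mu) ** 2 / (2 * sigmas ** 2)) / (np.sqrt(2 * np.pi) * sigmas)
print(per_dim.prod())

# Matrix form with the diagonal covariance gives the same value.
print(multivariate_normal.pdf(x, mean=mu, cov=Sigma))
```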


2. Maximum likelihood estimation

To get a general feel for the idea of maximum likelihood estimation, consider the following example: suppose a coin is flipped 10 times and lands heads 7 times. Among all possible values of the heads probability p, the value p = 0.7 makes the observed outcome most probable, so 0.7 is the maximum likelihood estimate of p.

Generally speaking, maximum likelihood estimation uses the known results of a sample to work backwards to the parameter values most likely (with maximum probability) to have produced those results. That is, maximum likelihood estimation evaluates the model parameters given the observed data: the model is determined, but its parameters are unknown.

An important premise of maximum likelihood estimation is that the data samples are independent and identically distributed. Before solving for the parameters of the Gaussian distribution with maximum likelihood estimation, consider the general situation first. Suppose there is a data set D drawn from some probability distribution, and we use maximum likelihood estimation to infer the parameter vector $\Theta$ of that distribution. Write the known sample set as:

$$D = \{x_1, x_2, \ldots, x_N\}$$

The likelihood function, namely the joint probability density function, is:

$$\ell(\Theta) = p(D \mid \Theta) = p(x_1, x_2, \ldots, x_N \mid \Theta) = \prod_{i=1}^{N} p(x_i \mid \Theta)$$

The joint probability density function $p(D \mid \Theta)$, viewed as a function of the parameters $\Theta$ with the data set D fixed, is called the likelihood function. We seek the parameter value that maximizes the likelihood function, i.e., the value $\hat{\Theta}$ that makes this group of samples most probable:

$$\hat{\Theta} = \arg\max_{\Theta} \ell(\Theta)$$

In practice, for convenience of analysis, the logarithmic likelihood function is used instead:

$$H(\Theta) = \ln \ell(\Theta) = \sum_{i=1}^{N} \ln p(x_i \mid \Theta)$$

Since the logarithm is monotonically increasing, maximizing $H(\Theta)$ yields the same $\hat{\Theta}$ as maximizing $\ell(\Theta)$.
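As a small illustration of why logs are used, here is a sketch assuming NumPy and SciPy: the raw likelihood of even a few hundred samples underflows easily, while the log-likelihood does not, and both are maximized by the same parameter value.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
data = rng.normal(3.0, 2.0, size=200)   # samples with true mu = 3, sigma = 2

mus = np.linspace(0.0, 6.0, 601)        # candidate values for the mean
# Log-likelihood of each candidate (sigma held at its true value).
log_lik = np.array([norm.logpdf(data, loc=m, scale=2.0).sum() for m in mus])

print(mus[log_lik.argmax()])            # close to data.mean() ~ 3
print(np.exp(log_lik.max()))            # tiny raw likelihood: why we work with logs
```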

Now that we know how maximum likelihood estimation works, we can use it to solve for the parameters of the Gaussian distribution, namely the mean and variance.


3. Deriving the mean and variance of the Gaussian distribution by maximum likelihood estimation

First, suppose a batch of data, the data set X, obeys the Gaussian distribution and its samples are independent and identically distributed:

$$X = \{x_1, x_2, \ldots, x_N\}, \qquad x_i \sim N(\mu, \sigma^2)$$

Solving for $\Theta = (\mu, \sigma^2)$ by maximum likelihood estimation, the logarithmic likelihood function is:

$$H(\Theta) = \ln \ell(\Theta) = \ln \prod_{i=1}^{N} p(x_i \mid \Theta) = \sum_{i=1}^{N} \ln p(x_i \mid \Theta)$$

where $p(x_i \mid \Theta)$ is the probability density function of the Gaussian distribution, so the log-likelihood expands to:

$$H(\Theta) = \sum_{i=1}^{N} \ln\left[\frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right)\right] = -\frac{N}{2}\ln(2\pi) - N\ln\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{N}(x_i-\mu)^2$$
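The algebraic expansion above can be verified numerically with a short sketch (assuming NumPy and SciPy; the test parameters are arbitrary):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
x = rng.normal(1.0, 2.0, size=100)
mu, sigma = 0.5, 1.8   # arbitrary test parameters
N = len(x)

# Direct sum of log-densities.
h_direct = norm.logpdf(x, loc=mu, scale=sigma).sum()

# Expanded closed form: -N/2 ln(2 pi) - N ln(sigma) - sum((x-mu)^2) / (2 sigma^2)
h_expanded = (-N / 2 * np.log(2 * np.pi) - N * np.log(sigma)
              - ((x - mu) ** 2).sum() / (2 * sigma ** 2))

print(h_direct, h_expanded)   # identical up to floating-point error
```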

First solve for the mean $\mu$: take the partial derivative of $H(\Theta)$ with respect to $\mu$ and set it to zero, since the extremum of this function (which is concave in $\mu$) is exactly its maximum:

$$\frac{\partial H}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{N}(x_i - \mu) = 0$$

Then the mean is

$$\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

So far, we have obtained the mean $\hat{\mu}$ by setting the derivative of the log-likelihood to zero; the variance is solved by the same method. Taking the partial derivative with respect to $\sigma$ and setting it to zero:

$$\frac{\partial H}{\partial \sigma} = -\frac{N}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{N}(x_i - \mu)^2 = 0$$

So the variance is

$$\hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \hat{\mu})^2$$

Thus we have obtained the mean and variance of the Gaussian distribution by maximum likelihood estimation.
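As a sanity check of the closed-form results, here is a minimal sketch, assuming NumPy and SciPy, that compares $\hat{\mu}$ and $\hat{\sigma}^2$ with a direct numerical maximization of the log-likelihood:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
data = rng.normal(5.0, 1.5, size=10_000)

# Closed-form MLE from the derivation above.
mu_hat = data.mean()
var_hat = ((data - mu_hat) ** 2).mean()   # note: divides by N, not N - 1

# Numerically maximize the log-likelihood (minimize its negative).
def neg_log_likelihood(theta):
    mu, log_sigma = theta                 # optimize log(sigma) to keep sigma > 0
    sigma = np.exp(log_sigma)
    return (np.log(2 * np.pi * sigma**2) / 2
            + (data - mu) ** 2 / (2 * sigma**2)).sum()

res = minimize(neg_log_likelihood, x0=[0.0, 0.0])
print(mu_hat, res.x[0])                   # both ~ 5.0
print(var_hat, np.exp(res.x[1]) ** 2)     # both ~ 2.25
```

The optimizer recovers the same values as the closed-form formulas, which is exactly what the derivation predicts.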

References:
[1] Machine Learning [Whiteboard Derivation Series], by Shuhuai008
[2] Pattern Recognition and Machine Learning