
1. What is a Markov chain?

To talk about Markov chains we first have to mention random processes, which occupy a position in probability theory comparable to that of Newton's laws in mechanics. So what is a random process?

As we know, human understanding of the world begins with motion: from macroscopic objects to microscopic molecules, everything is a process of "things" changing over time. Newton gave a systematic explanation of the motion we are most familiar with, allowing some motions to be calculated and predicted precisely. Yet a great many "motion" processes in the world contain non-deterministic factors. The quotation marks around "motion" are there because the word is used loosely: every roll of a die, for example, can be regarded as a change of state and thus classified as "motion", that is, a change of things over time. We all know that over an infinitely large sample each number from 1 to 6 appears with probability 1/6, but if we want to know the outcome of a single roll, we can never calculate it exactly; the best we can do is describe that outcome in terms of probability. Just as Newton's laws provide the tools of mechanical analysis in mechanics, the random process is the tool probability theory uses to study the "motion" of things: an exact mathematical description of change under uncertainty. And just as mechanics gives precise mathematical definitions of velocity and acceleration, random processes rest on two central concepts, the probability space and the random variable, which we will not discuss here.

In Newtonian mechanics, a deterministic process describes the definite change of a quantity over time, while a random process describes its possible changes over time. The direction of change at each moment is uncertain: a random process is composed of a series of uncertain random variables. The state of the system at each moment is expressed by a random variable, and the whole sequence of states constitutes one realization of the random process.

Now that we know what a random process is, consider one of the simplest possible examples: a process of N steps, where each step has two possible outcomes (0 or 1). There are then 2^N possible paths, so describing the process requires an exponential number of probabilities. Exponential! Doesn't a dimension that large simply blow up? This is where Markov processes come in: in a Markov process, the outcome of each step depends only on the previous step and on nothing else.

The mathematical expression of a Markov process is the Markov chain. In a Markov chain, the evolution of the process depends only on the present state, not on the history, which instantly turns an enormous computation into a simple one.
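Written as a formula (the standard statement of the Markov property, added here for precision), the distribution of the next state conditioned on the entire history equals the distribution conditioned on the present state alone:

$P(X_{n+1} = x \mid X_n = x_n, X_{n-1} = x_{n-1}, \ldots, X_1 = x_1) = P(X_{n+1} = x \mid X_n = x_n)$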

2. A concrete example of a Markov chain

Let us describe a Markov chain with a concrete, everyday example. Suppose a person has three possible states each day: play, study and sleep (the state distribution). Given that he is playing today, what is the probability that he will study, play or sleep tomorrow? And what are those probabilities the day after tomorrow, or even N days later?

Of course, to know the probability of studying, playing, and sleeping in N days, we need two things:

A starting condition: we know the state distribution on day 1 (the state distribution matrix S).

An assumption: the states change in a regular way, i.e., given today's state, the probabilities of studying, playing or sleeping tomorrow are fixed. In short, we know the state transition probability matrix P in advance, and this matrix is deterministic. The transition probability matrix P is also homogeneous in time; in other words, P is constant: the transition matrix from day 1 to day 2 is the same as the transition matrix from day 2 to day 3.

Given this transition probability matrix P, plus the known state distribution matrix for day 1 (the probability of playing or studying or sleeping on day 1), we can calculate the state distribution for day N (the probability of playing or studying or sleeping on day N).
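To make this concrete, here is the transition matrix P that the code examples below use (I am reading rows as today's state and columns as tomorrow's state, both in the order play, study, sleep; each row sums to 1):

$P = \begin{bmatrix} 0.05 & 0.75 & 0.20 \\ 0.80 & 0.05 & 0.15 \\ 0.25 & 0.50 & 0.25 \end{bmatrix}$

For example, the first row says: if today is spent playing, then tomorrow the probability of playing is 0.05, of studying 0.75, and of sleeping 0.20.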

OK, so now we have everything we need to work out the probability of playing, studying or sleeping after n days: the initial state distribution matrix S and the transition probability matrix P. Suppose the state distribution matrix on day 1 is S1 = [0.3, 0.4, 0.3], where the numbers are the probabilities of playing, studying and sleeping on the first day, respectively.

Then:

On day 2, the state distribution matrix S2 = S1 * P (multiply the two matrices).

On day 3, the state distribution matrix S3 = S2 * P (depends only on S2).

On day 4, the state distribution matrix S4 = S3 * P (depends only on S3).

...

On day n, the state distribution matrix Sn = Sn-1 * P (depends only on Sn-1).
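Unrolling this recursion also shows that the day-n distribution is obtained from the initial distribution and repeated multiplication by P alone:

$S_n = S_{n-1} P = S_{n-2} P^2 = \cdots = S_1 P^{n-1}$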

You can see that the Markov chain is a "memoryless" process: its future state distribution depends only on the present and has nothing to do with the past! This is the essence of a Markov process, and the sequence of daily state distributions constitutes the Markov chain.

3. Properties of Markov chains

Continuing with the example above: the initial state distribution matrix is S1 = [0.3, 0.4, 0.3], i.e., on day 1 the probability of playing is 30%, of studying 40%, and of sleeping 30%. Using this, we compute the probability distribution over playing, studying and sleeping on day 100.

The code is as follows:

import numpy as np

# State transition probability matrix P (rows/columns: play, study, sleep)
matrix = np.matrix([[0.05, 0.75, 0.2],
                    [0.8, 0.05, 0.15],
                    [0.25, 0.5, 0.25]])
# Initial state distribution S1
vector1 = np.matrix([[0.3, 0.4, 0.3]])

# Apply the transition matrix 100 times: S(n) = S(n-1) * P
for i in range(100):
    vector1 = vector1 * matrix
    print('Round {}'.format(i+1))
    print(vector1)

The running results are as follows:

...
Round 96
[[0.39781591 0.41341654 0.18876755]]
Round 97
[[0.39781591 0.41341654 0.18876755]]
Round 98
[[0.39781591 0.41341654 0.18876755]]
Round 99
[[0.39781591 0.41341654 0.18876755]]
Round 100
[[0.39781591 0.41341654 0.18876755]]

From the results we can see that, given an initial state and a one-day transition matrix, from a certain day onward the state probability distribution no longer changes and stays at [0.39781591, 0.41341654, 0.18876755]. Could this be a coincidence? If the initial state distribution matrix S1 is not [0.3, 0.4, 0.3], will the result still be [0.39781591, 0.41341654, 0.18876755]? Let us run a second test, this time with the initial state distribution matrix S1 = [0.2, 0.6, 0.2].

The code is as follows:

import numpy as np

# Same transition matrix P as before
matrix = np.matrix([[0.05, 0.75, 0.2],
                    [0.8, 0.05, 0.15],
                    [0.25, 0.5, 0.25]])
# A different initial state distribution S1
vector1 = np.matrix([[0.2, 0.6, 0.2]])

for i in range(100):
    vector1 = vector1 * matrix
    print('Round {}'.format(i+1))
    print(vector1)

Running results:

...
Round 96
[[0.39781591 0.41341654 0.18876755]]
Round 97
[[0.39781591 0.41341654 0.18876755]]
Round 98
[[0.39781591 0.41341654 0.18876755]]
Round 99
[[0.39781591 0.41341654 0.18876755]]
Round 100
[[0.39781591 0.41341654 0.18876755]]

Something wonderful happened! Starting from different initial probability distributions, the final state probability distribution converges to the same stable distribution [0.39781591, 0.41341654, 0.18876755].

From this we obtain a very important property: the stable probability distribution to which a Markov chain converges is independent of the initial state distribution. In other words, in the example above we can start from any initial probability distribution; as long as the state transition matrix of the Markov chain is known, after enough transitions we obtain samples consistent with the corresponding stable probability distribution.

Now we can summarize the convergence property of Markov chains in mathematical language:

If an aperiodic Markov chain has state transition matrix $P$, and any two of its states are connected, then $\lim_{n \to \infty} P^{n}_{ij}$ is independent of $i$, and we have:

1. $\lim_{n \to \infty} P^{n}_{ij} = \pi(j)$

2. $\lim_{n \to \infty} P^{n} = \begin{bmatrix} \pi(1) & \pi(2) & \cdots & \pi(j) & \cdots \\ \pi(1) & \pi(2) & \cdots & \pi(j) & \cdots \\ \vdots & \vdots & & \vdots & \\ \pi(1) & \pi(2) & \cdots & \pi(j) & \cdots \\ \vdots & \vdots & & \vdots & \end{bmatrix}$

3. $\pi P = \pi$

where $\pi$ is the unique non-negative solution of the equation $\pi P = \pi$, with

$\pi = \left[ \pi(1), \pi(2), \ldots, \pi(j), \ldots \right], \qquad \sum^{\infty}_{i=0} \pi(i) = 1$

A few points in the statement above need explanation:

  1. Aperiodic Markov chain: this means that the state transitions of the chain are not cyclic; a periodic chain will never converge. Fortunately, the Markov chains we encounter in practice are usually aperiodic. In mathematical terms: for any state $i$, let $d$ be the greatest common divisor of the set $\{ n \mid n \geq 1, P^{n}_{ii} > 0 \}$; if $d = 1$, the state is aperiodic.

  2. Any two states are connected: it is possible to get from any state to any other state in a finite number of steps; there is no state whose transition probability is always zero and which is therefore unreachable.

  3. The number of states of a Markov chain can be finite or infinite. So it can be used for both continuous and discrete probability distributions.

  4. π is often called the stationary distribution of the Markov chain; a quick way to compute it directly is sketched below.
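As a cross-check on the iterative results above, the stationary distribution can also be computed directly from πP = π: π is a left eigenvector of P with eigenvalue 1, normalized so that its entries sum to 1. Below is a minimal sketch with NumPy (the eigenvector approach is my addition, not part of the original example):

import numpy as np

# Transition matrix from the examples above (order: play, study, sleep)
P = np.array([[0.05, 0.75, 0.20],
              [0.80, 0.05, 0.15],
              [0.25, 0.50, 0.25]])

# pi * P = pi  is equivalent to  P.T @ pi = pi,
# so pi is the eigenvector of P.T associated with eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
idx = np.argmin(np.abs(eigvals - 1.0))   # pick the eigenvalue closest to 1
pi = np.real(eigvecs[:, idx])
pi = pi / pi.sum()                       # normalize so the entries sum to 1

print(pi)  # should be close to [0.39781591 0.41341654 0.18876755]

Running this should reproduce the distribution that the 100-round iteration converged to.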
