Preface

The naive Bayes classifier is, in a sense, an upgrade of common sense: it makes classification judgments with a more precise, quantified criterion, the posterior probability. Starting from a comparison with decision trees, this article introduces the relationship between prior and posterior probabilities, and then walks through the naive Bayes algorithm in detail.

The naive Bayes algorithm itself is relatively simple, so this article also serves as a review before interviews. The key is sorting out the relationships between the concepts.

Comparison with decision trees

Having studied the classical decision tree algorithm, we can summarize it like this: a decision tree always splits the data along individual features, and as the tree grows deeper the partition becomes finer and finer. It looks something like this:

Classical Decision Tree Algorithm

On this basis, I will introduce a basic method for making classification decisions within a probabilistic framework: the Bayesian classifier. Again, this is very much in line with our everyday intuition. Compared with a decision tree, its way of classifying looks like this:

The blue and red here represent probabilities. The principle behind the fancy name “Bayes classifier” is very simple: we put an individual into whichever category is more probable.

We can think of a Bayesian classifier this way. Suppose the probability that a melon with a fresh vine is sweet is 0.7. Looking only at the vine, we would judge any melon with a fresh vine to be sweet. Now introduce a second feature, the texture of the rind, and assume the probability that a melon with a clear texture is sweet is 0.8. We then need the probability that a melon with both a fresh vine and a clear texture is sweet, say 0.9 (you might think about why this should be larger than either of the first two probabilities). Together, the two features, vine and texture, give us a probability with which to judge whether the melon is sweet.
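To see why the combined probability can end up larger than either single-feature probability, here is a small sketch of the calculation (my own, assuming the two features are conditionally independent given sweetness and that the overall prior probability of a sweet melon is 0.5):

```python
# Combine two single-feature posteriors under a conditional-independence
# assumption; the 0.5 prior for "sweet" is an assumption for illustration.
p_sweet = 0.5                  # assumed prior P(sweet)
p_sweet_given_vine = 0.7       # P(sweet | fresh vine)
p_sweet_given_texture = 0.8    # P(sweet | clear texture)

# Unnormalized scores for "sweet" and "not sweet"
score_sweet = p_sweet_given_vine * p_sweet_given_texture / p_sweet
score_not = (1 - p_sweet_given_vine) * (1 - p_sweet_given_texture) / (1 - p_sweet)

p_combined = score_sweet / (score_sweet + score_not)
print(round(p_combined, 3))    # 0.903 -- higher than either 0.7 or 0.8
```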

Here we can draw an analogy with the classification decision tree. If you are not familiar with decision trees, you can read my article “Classical Decision Tree Algorithm”. A decision tree turns “a melon with a fresh vine is sweet with probability 0.7” directly into the hard rule “a melon with a fresh vine is sweet”, while the Bayesian classifier keeps working with the probability itself. This gives it a kind of probabilistic fault tolerance and makes the result more accurate and reliable. However, the Bayesian classifier places higher demands on the data than a decision tree does: the model should be easy to interpret, and the different feature dimensions should be only weakly correlated. We will talk more about that later.

Prior probability and posterior probability

Let’s look at Bayes’ formula:

$$P(B \mid A) = \frac{P(B)\,P(A \mid B)}{P(A)}$$

  1. $P(B \mid A)$ is the posterior probability.
  2. $P(B)$ is the prior probability, which is generally given subjectively; when people speak of “the prior” in Bayes’ formula, they usually mean this term.
  3. $P(A \mid B)$ is the conditional probability, also called the likelihood; it is generally estimated from historical data. It is not usually called a prior probability, although strictly by definition it also fits the description of one.
  4. $P(A)$ is in fact also a prior probability, but in many applications of Bayes’ formula it does not matter (for a maximum a posteriori decision we only compare values, so the absolute posterior is not needed); when it is needed, it is computed with the law of total probability.

It can be seen that the prior probability, the posterior probability and the likelihood are closely related. It is worth noting that which one is the prior and which is the posterior depends on the roles of A and B: if A and B are swapped, the prior and the posterior must be swapped as well. For example: there is a piece of meat and a bottle of vinegar on the table. You eat a piece of the meat and it tastes sour. What is the probability that vinegar was added to the meat?

For this problem, the probability that vinegar is in the meat, given that it tastes sour, is the posterior probability. The probability that the meat tastes sour, given that vinegar was added, is the likelihood. And the probability that vinegar was added to the meat in the first place is the prior probability.

To sum up: event A is the result, and event B is one of its causes. Here, “the meat tastes sour” is the result A, which could arise from various causes, and “vinegar was added to the meat” is the cause B leading to that result. Why only one of the causes? Because besides vinegar, the meat could, for example, simply have gone bad.
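To make this concrete, here is a small sketch of the vinegar example worked through Bayes’ formula; all the numbers are made-up assumptions for illustration:

```python
# Hypothetical numbers for the vinegar example (assumptions, not from the article).
p_vinegar = 0.3             # prior P(B): vinegar was added to the meat
p_sour_given_vinegar = 0.9  # likelihood P(A|B): the meat tastes sour if vinegar was added
p_sour_given_no = 0.1       # P(A|not B): the meat tastes sour for other reasons (e.g. spoiled)

# Law of total probability for P(A): the meat tastes sour
p_sour = p_sour_given_vinegar * p_vinegar + p_sour_given_no * (1 - p_vinegar)

# Bayes' formula: posterior P(B|A) = P(B) * P(A|B) / P(A)
p_vinegar_given_sour = p_vinegar * p_sour_given_vinegar / p_sour
print(round(p_vinegar_given_sour, 3))  # 0.794
```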

Naive Bayes classification algorithm

A classical example (Example 4.1 in Li Hang’s “Statistical Learning Methods”) is used to explain the naive Bayes classification algorithm. Learn a naive Bayes classifier from the data in the following table and determine the class label $y$ of $x = (2, S)^T$. In the table, $X^{(1)}$ and $X^{(2)}$ are the features, with value sets $A_1 = \{1, 2, 3\}$ and $A_2 = \{S, M, L\}$ respectively, and $Y \in \{1, -1\}$ is the class label.

|           | 1  | 2  | 3 | 4 | 5  | 6  | 7  | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|-----------|----|----|---|---|----|----|----|---|---|----|----|----|----|----|----|
| $X^{(1)}$ | 1  | 1  | 1 | 1 | 1  | 2  | 2  | 2 | 2 | 2  | 3  | 3  | 3  | 3  | 3  |
| $X^{(2)}$ | S  | M  | M | S | S  | S  | M  | M | L | L  | L  | M  | M  | L  | L  |
| $Y$       | -1 | -1 | 1 | 1 | -1 | -1 | -1 | 1 | 1 | 1  | 1  | 1  | 1  | 1  | -1 |

For the given instance $x = (2, S)^T$, we can calculate:

$$P(Y = 1) = \frac{9}{15}, \qquad P(Y = -1) = \frac{6}{15}$$

$$P(X^{(1)} = 2 \mid Y = 1) = \frac{3}{9}, \qquad P(X^{(2)} = S \mid Y = 1) = \frac{1}{9}$$

$$P(X^{(1)} = 2 \mid Y = -1) = \frac{2}{6}, \qquad P(X^{(2)} = S \mid Y = -1) = \frac{3}{6}$$

$$P(Y = 1)\,P(X^{(1)} = 2 \mid Y = 1)\,P(X^{(2)} = S \mid Y = 1) = \frac{9}{15} \cdot \frac{3}{9} \cdot \frac{1}{9} = \frac{1}{45}$$

$$P(Y = -1)\,P(X^{(1)} = 2 \mid Y = -1)\,P(X^{(2)} = S \mid Y = -1) = \frac{6}{15} \cdot \frac{2}{6} \cdot \frac{3}{6} = \frac{1}{15}$$

It can be seen that the value for $Y = -1$ is larger, so $y = -1$.
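As a quick sanity check, here is a short Python sketch (my own, not from the original article) that reproduces these numbers directly from the table by counting:

```python
# Training data from the table above, as (x1, x2, y) triples.
data = [(1, 'S', -1), (1, 'M', -1), (1, 'M', 1), (1, 'S', 1), (1, 'S', -1),
        (2, 'S', -1), (2, 'M', -1), (2, 'M', 1), (2, 'L', 1), (2, 'L', 1),
        (3, 'L', 1), (3, 'M', 1), (3, 'M', 1), (3, 'L', 1), (3, 'L', -1)]
x = (2, 'S')  # test instance
N = len(data)

for c in (1, -1):
    n_c = sum(1 for row in data if row[2] == c)
    score = n_c / N  # prior P(Y = c)
    for j, value in enumerate(x):
        # conditional P(X^(j) = value | Y = c), estimated by counting
        score *= sum(1 for row in data if row[j] == value and row[2] == c) / n_c
    print(c, score)
# 1  0.0222...  (= 1/45)
# -1 0.0666...  (= 1/15), so the predicted class is y = -1
```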

As we can see from the example above, the naive Bayes approach is really just ordinary reasoning made systematic; Laplace once said that probability theory is common sense reduced to calculation. Let’s now look at the complete mathematical statement of the naive Bayes classification algorithm.

Naive Bayes algorithm

“Naive” refers to the assumption that the conditional probability distribution is conditionally independent. The naive Bayes algorithm actually learns the mechanism by which the data are generated, so it belongs to the family of generative models. The conditional independence assumption says that, once the class is given, the features used for classification are all mutually independent.
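Written out with the notation used in the algorithm below, the assumption is:

$$P(X = x \mid Y = c_k) = P\big(X^{(1)} = x^{(1)}, \ldots, X^{(n)} = x^{(n)} \mid Y = c_k\big) = \prod_{j=1}^{n} P\big(X^{(j)} = x^{(j)} \mid Y = c_k\big)$$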

Input: training data $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$, where $x_i = (x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(n)})^T$, $x_i^{(j)}$ is the $j$-th feature of the $i$-th sample, $x_i^{(j)} \in \{a_{j1}, a_{j2}, \ldots, a_{jS_j}\}$, $a_{jl}$ is the $l$-th possible value of the $j$-th feature, $j = 1, 2, \ldots, n$, $l = 1, 2, \ldots, S_j$, and $y_i \in \{c_1, c_2, \ldots, c_K\}$; a test instance $x$;

Output: the class label of the test instance $x$.

  1. Calculate the prior probabilities and the conditional probabilities:

$$P(Y = c_k) = \frac{\sum_{i=1}^{N} I(y_i = c_k)}{N}, \quad k = 1, 2, \ldots, K$$

$$P(X^{(j)} = a_{jl} \mid Y = c_k) = \frac{\sum_{i=1}^{N} I(x_i^{(j)} = a_{jl},\ y_i = c_k)}{\sum_{i=1}^{N} I(y_i = c_k)}, \quad j = 1, \ldots, n; \ l = 1, \ldots, S_j; \ k = 1, \ldots, K$$

  2. For the given instance $x = (x^{(1)}, x^{(2)}, \ldots, x^{(n)})^T$, calculate

$$P(Y = c_k) \prod_{j=1}^{n} P(X^{(j)} = x^{(j)} \mid Y = c_k), \quad k = 1, 2, \ldots, K$$

  3. Determine the class of the instance $x$:

$$y = \arg\max_{c_k} P(Y = c_k) \prod_{j=1}^{n} P(X^{(j)} = x^{(j)} \mid Y = c_k)$$
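The three steps above translate directly into a small fit/predict implementation. Here is a minimal Python sketch (the function names are my own, and unseen feature values simply get probability 0) that estimates the probabilities by counting and returns the class with the largest product; running it on the table from the worked example reproduces $y = -1$:

```python
from collections import Counter, defaultdict

def naive_bayes_fit(X, y):
    """Step 1: estimate P(Y=c_k) and P(X^(j)=a_jl | Y=c_k) by counting."""
    N = len(y)
    class_counts = Counter(y)
    priors = {c: n / N for c, n in class_counts.items()}
    cond_counts = defaultdict(int)          # key: (feature index j, value, class)
    for xi, yi in zip(X, y):
        for j, value in enumerate(xi):
            cond_counts[(j, value, yi)] += 1
    conditionals = {key: n / class_counts[key[2]] for key, n in cond_counts.items()}
    return priors, conditionals

def naive_bayes_predict(x, priors, conditionals):
    """Steps 2 and 3: score every class and return the arg max."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for j, value in enumerate(x):
            score *= conditionals.get((j, value, c), 0.0)   # unseen value -> probability 0
        scores[c] = score
    return max(scores, key=scores.get)

# Reusing the table from the worked example: features (X1, X2) and labels Y.
X = [(1, 'S'), (1, 'M'), (1, 'M'), (1, 'S'), (1, 'S'),
     (2, 'S'), (2, 'M'), (2, 'M'), (2, 'L'), (2, 'L'),
     (3, 'L'), (3, 'M'), (3, 'M'), (3, 'L'), (3, 'L')]
y = [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]

priors, conditionals = naive_bayes_fit(X, y)
print(naive_bayes_predict((2, 'S'), priors, conditionals))   # -1
```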