Red Stone’s personal website: Redstonewill.com

Naive Bayes confuses many people with its dense formulas and abstract concepts. In this post, I will explain the principle of the naive Bayes algorithm in plain language, keep complex formulas to a minimum, and work through a practical example of solving a machine learning problem with the naive Bayes idea. It should give you a quick, intuitive, visual understanding of naive Bayes.

1. Buying melons

First, let's introduce two mathematical concepts: prior probability and posterior probability. Dizzy already? It doesn't matter; an example will help you understand both.

The weather has been hot lately, so Red Stone went to the supermarket to buy a watermelon, but without much experience he didn't know how to pick a ripe one. As a student of science, Red Stone reasoned as follows:

If I know nothing about this watermelon, not its color, not its shape, not whether its stem has fallen off, then generally speaking it has about a 60% chance of being ripe. This probability, P(ripe), is called the prior probability.

In other words, the prior probability is obtained from previous experience and analysis. It requires no sample data and is not conditioned on anything. It is like Red Stone judging whether a watermelon is ripe from common sense alone, without looking at the melon's state; that is the prior probability.

Later, Red Stone picked up a piece of common sense for judging whether a watermelon is ripe: check whether the stem has fallen off. Generally speaking, when the stem has fallen off, the watermelon has a higher chance of being ripe, about 75%. The probability that the melon is ripe, inferred from the fact that the stem has fallen off, written P(ripe | stem off), is called the posterior probability. The posterior probability is analogous to a conditional probability.

Now that we know the prior and posterior probabilities, let's look at joint probability. In the case of Red Stone buying the watermelon, P(ripe, stem off) is called the joint probability: the probability that the melon is ripe and its stem has fallen off. The joint probability satisfies the following multiplication rule:

$$P(\text{ripe}, \text{stem off}) = P(\text{ripe} \mid \text{stem off}) \, P(\text{stem off}) = P(\text{stem off} \mid \text{ripe}) \, P(\text{ripe})$$

Here, P(ripe | stem off) is the posterior probability just introduced: the probability that the melon is ripe given that the stem has fallen off. P(stem off | ripe) is the probability that the stem has fallen off given that the melon is ripe.

Now, what if Red Stone wants to calculate the probability that the stem falls off? It can be split into two cases: the probability that the stem falls off when the melon is ripe, and the probability that the stem falls off when the melon is raw. The probability of the stem falling off is the sum of these two parts. From this we derive the law of total probability:

$$P(\text{stem off}) = P(\text{stem off} \mid \text{ripe}) \, P(\text{ripe}) + P(\text{stem off} \mid \text{raw}) \, P(\text{raw})$$

2. Judging melon ripeness from a single feature

OK, having introduced prior probability, posterior probability, joint probability, and total probability, let's look at the following problem. A watermelon is in one of two states, ripe or raw, with probabilities 0.6 and 0.4 respectively. The probability that the stem has fallen off given a ripe melon is 0.8, and the probability that the stem has fallen off given a raw melon is 0.4. Now, if I pick a melon whose stem has fallen off, what are the odds that it is a ripe one?

Clearly, this is a problem of computing a posterior probability. Using the joint probability and total probability formulas derived above, we obtain:

$$P(\text{ripe} \mid \text{stem off}) = \frac{P(\text{stem off} \mid \text{ripe}) \, P(\text{ripe})}{P(\text{stem off} \mid \text{ripe}) \, P(\text{ripe}) + P(\text{stem off} \mid \text{raw}) \, P(\text{raw})}$$

Item by item:

Conditional probability P(stem off | ripe) = 0.8

Prior probability P(ripe) = 0.6

Conditional probability P(stem off | raw) = 0.4

Prior probability P(raw) = 0.4

Substituting these values into the equation above:

$$P(\text{ripe} \mid \text{stem off}) = \frac{0.8 \times 0.6}{0.8 \times 0.6 + 0.4 \times 0.4} = \frac{0.48}{0.64} = 0.75$$

Thus we find that the probability that a melon whose stem has fallen off is ripe is 0.75. Note that the formula above for computing the posterior probability is exactly Bayes' theorem. A bit of a surprise? Before you knew it, you had already used the idea behind Bayes' theorem.
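
If you prefer code to arithmetic, here is a minimal Python sketch of the same calculation. The variable names (p_ripe, p_off_given_ripe, and so on) are simply my labels for the probabilities defined above.

```python
# Posterior P(ripe | stem off) via the law of total probability and Bayes' theorem.
p_ripe = 0.6            # prior P(ripe)
p_raw = 0.4             # prior P(raw)
p_off_given_ripe = 0.8  # conditional P(stem off | ripe)
p_off_given_raw = 0.4   # conditional P(stem off | raw)

# Law of total probability: P(stem off), summed over both melon states.
p_off = p_off_given_ripe * p_ripe + p_off_given_raw * p_raw

# Bayes' theorem: P(ripe | stem off).
p_ripe_given_off = p_off_given_ripe * p_ripe / p_off

print(p_off)             # ~0.64
print(p_ripe_given_off)  # ~0.75
```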

3. Judging melon ripeness from multiple features

To buy a ripe melon, Red Stone was ready to go the extra mile and did a special search online. It turns out you can judge whether a melon is ripe by its shape and color as well as by whether the stem has fallen off. Shapes are round or pointed, and colors are dark green, light green, or cyan. That's a lot to look at, huh? Red Stone panicked a little, but no matter: we can use the Bayes' theorem idea we just introduced to try to solve this problem.

Now, instead of one feature we have three. We use X for the features and Y for the melon's class (ripe or raw). By Bayes' theorem, the posterior probability P(Y = c_k | X = x) is:

$$P(Y = c_k \mid X = x) = \frac{P(X = x \mid Y = c_k) \, P(Y = c_k)}{\sum_{k} P(X = x \mid Y = c_k) \, P(Y = c_k)}$$

Here, c_k denotes a class and k indexes the classes. In this case k = 1, 2: c_1 means ripe and c_2 means raw. The formula above looks a bit complicated, but it has exactly the same form as the single-feature case (whether the stem has fallen off) in the previous section.

One thing to note here is that the feature X is no longer a single value; it contains three features. To make the conditional probability P(X = x | Y = c_k) tractable, naive Bayes assumes that the features are conditionally independent of each other given the class. Under this assumption, P(X = x | Y = c_k) can be written as:

$$P(X = x \mid Y = c_k) = \prod_{j=1}^{n} P(X^{(j)} = x^{(j)} \mid Y = c_k)$$

Here n is the number of features and j indexes the current feature. For this example, P(X = x | Y = c_k) can be written as:

$$P(X = x \mid Y = c_k) = P(\text{stem} \mid Y = c_k) \, P(\text{shape} \mid Y = c_k) \, P(\text{color} \mid Y = c_k)$$

This conditional independence assumption is where the word "naive" in the naive Bayes method comes from. The assumption keeps naive Bayes simple, but it sometimes sacrifices some classification accuracy.

In this way, using the naive Bayes idea, we can write the posterior probability as:

$$P(Y = c_k \mid X = x) = \frac{P(Y = c_k) \prod_{j=1}^{n} P(X^{(j)} = x^{(j)} \mid Y = c_k)}{\sum_{k} P(Y = c_k) \prod_{j=1}^{n} P(X^{(j)} = x^{(j)} \mid Y = c_k)}$$

Didn't I promise not to use so many formulas? Don't worry: although it looks more complex, the formula above merely adds more sample features; its form is the same as P(ripe | stem off) in the previous section.

Now Red Stone picks up a watermelon, looks at its stem, shape, and color, and uses the naive Bayes formula above to compute P(Y = ripe | X = x) and P(Y = raw | X = x). Then we compare the two values:

If P(Y = ripe | X = x) > P(Y = raw | X = x), the melon is judged to be ripe.

If P(Y = ripe | X = x) < P(Y = raw | X = x), the melon is judged to be raw.

Note that the denominator of the formula above is the same for every class c_k, so it can be omitted. Since only the numerators of P(Y = c_k | X = x) differ, it suffices to compare:

$$P(Y = c_k) \prod_{j=1}^{n} P(X^{(j)} = x^{(j)} \mid Y = c_k)$$

All right! Red Stone has finally figured out how to use naive Bayes to tell whether a melon is ripe.
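
As a sketch of this decision rule, here is a small Python function that scores each class by its numerator and picks the larger one. The data layout (dictionaries of priors and per-feature likelihood tables) is my own choice for illustration; the next section shows where such numbers come from.

```python
from math import prod

def naive_bayes_predict(x, priors, likelihoods):
    """Return the class with the largest naive Bayes numerator.

    x           -- observed feature values, e.g. ("stem off", "round", "cyan")
    priors      -- dict mapping class -> P(Y = c_k)
    likelihoods -- dict mapping class -> list (one per feature) of
                   dicts mapping feature value -> P(X_j = value | Y = c_k)
    """
    scores = {}
    for c, prior in priors.items():
        # Numerator only: the denominator P(X = x) is the same for every
        # class, so it is omitted, exactly as in the formula above.
        scores[c] = prior * prod(table[v] for table, v in zip(likelihoods[c], x))
    return max(scores, key=scores.get)
```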

4. Naive Bayes classification

Now that Red Stone knows Bayes' theorem and the naive Bayes method, he is ready to buy melons with confidence. One more thing to do before buying: collect sample data. Through online information and asking around, Red Stone obtained a data set of 10 samples, recording whether watermelons with different stems, shapes, and colors turned out to be raw or ripe. We take this data set as historical empirical data and use it as our reference.

In the data, the stem is either fallen off or not fallen off, the shape is round or pointed, and the color is dark green, light green, or cyan. Different combinations of features correspond to ripe or raw melons.

Now Red Stone picks a watermelon whose stem has fallen off, whose shape is round, and whose color is cyan. At this point, Red Stone is fully able to compute the posterior probabilities from the sample data using naive Bayes.

First, for ripe melons:

Prior probability: P(ripe) = 6/10 = 0.6.

Conditional probability: P(stem off | ripe) = 4/6 = 2/3.

Conditional probability: P(round | ripe) = 4/6 = 2/3.

Conditional probability: P(cyan | ripe) = 2/6 = 1/3.

Calculate the numerator of the posterior probability:

P(ripe) × P(stem off | ripe) × P(round | ripe) × P(cyan | ripe) = 0.6 × (2/3) × (2/3) × (1/3) = 4/45.

Then, in the case of raw melons:

Prior probability: P(raw) = 4/10 = 0.4.

Conditional probability: P(stem off | raw) = 1/4 = 0.25.

Conditional probability: P(round | raw) = 1/4 = 0.25.

Conditional probability: P(cyan | raw) = 1/4 = 0.25.

Calculate the numerator of the posterior probability:

P(raw) × P(stem off | raw) × P(round | raw) × P(cyan | raw) = 0.4 × 0.25 × 0.25 × 0.25 = 1/160.

Because 4/45 > 1/160, the melon is predicted to be ripe. The calculation done, Red Stone is confident that this melon, with its fallen-off stem, round shape, and cyan color, is a ripe one.
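
The same decision can be checked with a short, self-contained Python snippet that estimates each probability from the counts above and compares the two numerators. The counts are taken directly from the worked example; the variable names are just for illustration.

```python
from math import prod

# Counts from the 10-sample data set: 6 ripe, 4 raw. For each class,
# the number of samples matching the picked melon's features
# (stem off, round, cyan).
counts = {
    "ripe": {"total": 6, "stem off": 4, "round": 4, "cyan": 2},
    "raw":  {"total": 4, "stem off": 1, "round": 1, "cyan": 1},
}
n_samples = 10
features = ("stem off", "round", "cyan")

scores = {}
for label, c in counts.items():
    prior = c["total"] / n_samples                          # P(Y = c_k)
    likelihood = prod(c[f] / c["total"] for f in features)  # product over features
    scores[label] = prior * likelihood                      # naive Bayes numerator

print(scores)                       # ripe: 4/45 ~ 0.0889, raw: 1/160 = 0.00625
print(max(scores, key=scores.get))  # 'ripe'
```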

And sure enough, when he got home, the melon was ripe.

5. To summarize

In this post, Red Stone's melon-buying experience gave you a vivid explanation of prior probability, posterior probability, Bayes' theorem, and the naive Bayes method. Finally, the naive Bayes method was used to pick a watermelon and judge whether it was ripe. This post explained the basic idea and classification procedure of naive Bayes in the plainest possible language. Did Red Stone's melon-buying experience help you understand naive Bayes?