Bayes’ theorem is remarkably useful, whether in investing, in machine learning, or in everyday life.

For example, life scientists use Bayes’ theorem to study how genes are controlled; educators realize that a student’s learning process is essentially an application of Bayes’ rule; fund managers use Bayes’ rule to find investment strategies; Google uses Bayes’ theorem to improve search and to help users filter out spam; driverless cars use Bayes’ theorem to update map information with the road and traffic data collected by roof-mounted sensors. Bayes’ theorem is also widely used in artificial intelligence and machine translation.

I will look at Bayes’ theorem and the thinking behind it from the following four perspectives:

1. What does Bayes’ theorem do?

2. What is Bayes’ Theorem?

3. Application cases of Bayes’ theorem

4. Bayesian thinking in everyday life

1. What does Bayes’ theorem do? The English mathematician Thomas Bayes first proposed the theorem in a paper published in 1763; it was published posthumously by a friend of his.

(PS: Bayes’ theorem is the probability formula introduced in Part 2 below. I won’t go into it here; instead I’ll focus on its practical value, because once you see its applications you’ll be more interested in learning it.)

In this paper, he proposed Bayes’ theorem to solve an “inverse probability” problem.

Before Bayes wrote this paper, people already knew how to calculate “forward probability.” What is forward probability? Suppose Durex holds a raffle: there are 10 balls in a bucket, two white and eight black, and you win if you draw a white ball. You reach in and pick a ball at random; what is the probability that you win?

According to the frequency-based probability formula, you can easily see that the probability of winning = number of winning balls (2 white) / total number of balls (2 white + 8 black) = 2/10.
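
As a quick sanity check (not from the original article), here is a minimal Python simulation showing the winning frequency converging to 2/10:

```python
import random

# Bucket from the raffle example: 2 white (winning) balls, 8 black balls
bucket = ["white"] * 2 + ["black"] * 8

trials = 100_000
wins = sum(random.choice(bucket) == "white" for _ in range(trials))

print(wins / trials)  # ~0.2, matching 2 / (2 + 8)
```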


In his paper, Bayes was trying to solve an “inverse probability” problem. In the example above, suppose we don’t know in advance what’s in the bucket. Instead, we draw a ball and, from its color, try to infer the ratio of white to black balls in the bucket.

This kind of prediction can be made with Bayes’ theorem. At the time, Bayes’ paper was just an attempt at solving the inverse probability problem; he had no idea how far-reaching it would turn out to be.

Later, however, Bayes’ theorem took probability theory by storm, and its applications spread to virtually every field. You can find traces of Bayes’ theorem wherever probabilistic predictions are needed; in particular, Bayesian methods are among the core methods of machine learning.

Why is Bayes’ theorem so useful in real life?

That’s because most real-life problems are “inverse probability” problems like the one above. Most decisions in life are made with incomplete information: we only have limited information available. Since we cannot get complete information, we have to make the best prediction we can with what little we have.

For example, the weather forecast says there is a 30% chance of rain tomorrow. What does that mean?

We can’t repeat tomorrow 100 times and count that it rains on about 30 of them (number of rainy repetitions / total number of repetitions), the way frequency-based probability would require.

Instead, we can only use limited information (measurements of past weather) together with Bayes’ theorem to predict the probability of rain tomorrow.

Similarly, in the real world each of us needs to make predictions: to peer into the future, to decide whether to buy stocks, to judge what opportunities a policy will open up, to come up with new product ideas, or simply to plan meals for the week.

Bayes’ theorem is designed to solve these problems by predicting the probability of future events based on past data.

Bayes’ way of thinking gives us an effective method for making decisions and better predicting the future in business, finance, and everyday life.

To sum up part 1: What does Bayes’ theorem do?

With limited information, it helps us predict probabilities.

Bayes’ theorem can be seen wherever probabilistic predictions are required; in particular, Bayesian methods are among the core methods of machine learning, used in spam filtering, Chinese word segmentation, AIDS screening, liver cancer screening, and so on.

2. What is Bayes’ Theorem? Bayes’ theorem looks like this:

P(A|B) = P(A) × P(B|A) / P(B)

At this point you might say: Monkey, speak plainly; formulas make my head spin.

Actually, I don’t like formulas any more than you do. Let’s start with an example.

My friend Fawn told me that his goddess smiles at him every time she sees him, and now he wants to know whether she likes him.

Well, who told me to study statistics and probability? Let’s use Bayes’ theorem to predict the probability that the goddess likes him, so that Fawn can decide whether to confess his feelings based on that probability.

First, I analyzed the given known and unknown information:

1) Question to be solved: The goddess likes you, denoted as event A

2) Known condition: the goddess often smiles at you, denoted as event B

So P(A|B) means: given that the goddess often smiles at you (event B has occurred), the probability that she likes you (event A).

From the formula, we need to know three things:

1) Prior probability

We call P(A) the “prior probability”: our subjective judgment of the probability of A before knowing anything about event B.

In this case, it is the probability that the goddess likes someone, judged without knowing that she often smiles at you. Since we have nothing else to go on, let’s say it’s 50%: she may like you, or she may not.

2) Possibility function

P(B|A)/P(B) is called the “possibility function” (the likelihood). It is an adjustment factor: it uses the new information brought by B to adjust the prior probability (our earlier subjective judgment) closer to the true probability.

You can think of the possibility function as the adjustment that incoming new information makes to the prior probability. For example, when you first encounter the phrase “artificial intelligence,” you have your own understanding of it (prior probability, a subjective judgment). But after you learn some data analysis, or read a few books on the subject (new information), you use what you have learned to optimize your earlier understanding (possibility function, the adjustment factor), and finally you re-understand “artificial intelligence” (posterior probability).

If the “possibility function” P(B|A)/P(B) > 1, the “prior probability” is strengthened and event A becomes more likely;

if the “possibility function” equals 1, event B does not help in judging the probability of event A;

if the “possibility function” is less than 1, the “prior probability” is weakened and event A becomes less likely.

Back to our case. Given the new information that the goddess often smiles at you, I investigated, visiting her best friends, and found that she is usually aloof and rarely smiles at anyone. This means her smiling makes it more likely that she has a good impression of you (possibility function > 1). So I estimate the possibility function P(B|A)/P(B) = 1.5. (Skipping 10,000 words on how to estimate this; a more detailed example follows later.)

3) Posterior probability

P(A|B) is called the “posterior probability”: our re-evaluation of the probability of event A after event B has occurred. In this case, it is the re-estimated probability that the goddess likes you, given that she smiles at you.

Plugging into Bayes’ formula: P(A|B) = P(A) × P(B|A)/P(B) = 50% × 1.5 = 75%.

So, given that she smiles at you a lot, there is a 75% chance that she likes you. In other words, the new information that the goddess often smiles at you is very useful for inference: it raises the prior probability of 50% to a posterior probability of 75%.
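
To make the arithmetic concrete, here is a minimal Python sketch of the update; the 50% prior and the 1.5 possibility function are the subjective estimates from the story above:

```python
def bayes_update(prior, possibility):
    """Posterior = prior x possibility function, where possibility = P(B|A) / P(B)."""
    return prior * possibility

prior = 0.5         # P(A): subjective judgment before considering the smiles
possibility = 1.5   # P(B|A)/P(B): smiling judged 1.5x more likely if she likes him
print(bayes_update(prior, possibility))  # 0.75 -> a 75% chance she likes Fawn
```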

Armed with this probability, Fawn confidently posted the following confession on Weibo:

Sure enough, the goddess later replied. Prediction successful.

Now let’s look at Bayes’ formula again and you can see the key idea behind it:

We first estimate a “prior probability” P(A) based on past experience, and then add new information (the experimental result B); with the new information, we can predict event A more accurately.

Therefore, Bayes’ theorem can be understood as the following:

Posterior probability (the probability of A after the new information) = prior probability (the probability of A) × possibility function (the adjustment brought by the new information)

The underlying idea of Bayes is this:

If I have all the information about a thing, I can certainly calculate an objective probability (classical probability).

But most decisions in life are made with incomplete information; we only have limited information. Since comprehensive information is not available, we try to make the best prediction we can with what we have. That is, we first estimate a value based on subjective judgment (the prior probability) and then revise it according to the new information we observe (via the possibility function).

This is what it looks like graphically:

In fact, this is also how AlphaGo beats human players. Simply put, AlphaGo calculates the winning probability of every possible move: after each move, it updates its probability estimates objectively and calmly, completely unaffected by anything else going on.

3. Application cases of Bayes’ theorem So far we have introduced the formula of Bayes’ theorem and the idea behind it. Now let’s work through some application cases, and you’ll become more familiar with this formidable tool.

Before the case studies, we need one piece of supplementary knowledge.

1. Total probability formula

What this formula does is compute P(B) in Bayes’ theorem.

Suppose the sample space S is the union of two events, A and A’. In the figure below, the red part is event A and the green part is event A’; together they make up the sample space S.

Here comes an event B, as shown below:

Total probability formula:

P(B) = P(B|A) × P(A) + P(B|A’) × P(A’)

In words: if A and A’ together make up the entire sample space, then the probability of event B is the sum, over these two events, of each event’s probability times the conditional probability of B given that event.

It doesn’t matter if you can’t remember it; I can’t either. I just look it up whenever I need it.
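
If it helps, here is the formula as a tiny Python helper (a sketch assuming just two complementary events, A and A’):

```python
def total_probability(p_a, p_b_given_a, p_b_given_not_a):
    """P(B) = P(B|A) * P(A) + P(B|A') * P(A'), with P(A') = 1 - P(A)."""
    return p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Example: P(A) = 0.5, P(B|A) = 0.75, P(B|A') = 0.5  ->  P(B) = 0.625
print(total_probability(0.5, 0.75, 0.5))
```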

Case 1: Application of Bayes’ theorem to making judgments

There are two identical bowls. Bowl 1 contains 30 chocolates and 10 fruit drops, and bowl 2 contains 20 chocolates and 20 fruit drops.

Now cover the bowls, pick one at random, and draw a candy from it. It turns out to be a chocolate.

Question: What is the probability that this chocolate comes from bowl 1?

All right, let’s solve this problem with the formula step by step; at the end I’ll summarize the general template.

Step 1: Break down the problem

1) The problem to be solved: What is the probability that the chocolate comes from bowl 1?

Coming from bowl 1 is event A1, and coming from bowl 2 is event A2.

Drawing a chocolate is event B.

So the problem asks for P(A1|B): given that the candy drawn is a chocolate (B), the probability that it came from bowl 1 (A1).

2) Known information:

There are 30 chocolates and 10 fruit drops in bowl 1

There are 20 chocolates and 20 fruit drops in bowl 2

The candy drawn is a chocolate

Step 2: Apply Bayes’ theorem

1) Find the prior probability

Since the two bowls are identical, they have the same probability of being chosen before the new information arrives (before we see that the candy is a chocolate), so P(A1) = P(A2) = 0.5 (A1 means coming from bowl 1, A2 means coming from bowl 2).

This probability is the “prior probability”: before the experiment, the probability of coming from bowl 1 and from bowl 2 is 0.5 each.

2) Find the possibility function

P(B|A1)/P(B)

Here, P(B|A1) is the probability of drawing a chocolate (B) given that the candy comes from bowl 1 (A1).

Since bowl 1 holds 30 chocolates and 10 fruit drops, P(B|A1) = number of chocolates (30) / total candies (30 + 10) = 75%.

Now all we have left in Bayes’ formula is P(B), and we have to solve for P(B) to get the answer.

According to the total probability formula, P(B) = P(B|A1) × P(A1) + P(B|A2) × P(A2):

P(B|A1), the probability of drawing a chocolate from bowl 1, is easy to calculate from the known conditions: 75%.

Similarly, P(B|A2), the probability of drawing a chocolate from bowl 2, is just as easy: 20/(20 + 20) = 50%.

And P(A1) = P(A2) = 0.5.

Plugging these numbers into the formula is something even a schoolchild could do: P(B) = 75% × 0.5 + 50% × 0.5 = 62.5%.

So the possibility function P(B|A1)/P(B) = 75% / 62.5% = 1.2.

The possibility function is greater than 1, which means the new information B increases the likelihood of event A1.

3) Use Bayesian formula to calculate posterior probability

Plugging the results above into Bayes’ theorem, we can calculate P(A1|B) = P(A1) × P(B|A1)/P(B) = 50% × 1.2 = 60%.

What deserves attention in this example is the constraint: the candy drawn is a chocolate. Without that constraint, the probability that we picked bowl 1 is just the prior, 50%. Because chocolates are unevenly distributed between the bowls, the new information raises that probability from 50% to 60%.
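
The whole case fits in a few lines of Python; this sketch just replays the numbers worked out above:

```python
p_a1 = p_a2 = 0.5            # priors: the two bowls are identical
p_b_given_a1 = 30 / 40       # chance of chocolate from bowl 1 = 75%
p_b_given_a2 = 20 / 40       # chance of chocolate from bowl 2 = 50%

# Total probability formula for P(B)
p_b = p_b_given_a1 * p_a1 + p_b_given_a2 * p_a2   # 0.625

# Bayes' theorem: posterior = prior * possibility function
print(p_a1 * (p_b_given_a1 / p_b))                # 0.6 -> 60% from bowl 1
```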

Now let me summarize the template for using Bayes’ theorem; you’ll see it’s as simple as a primary school word problem:

Step 1: Break down the problem

In plain terms, treat it like a word problem: first list the conditions needed to solve it, then note what is known and what is unknown.

1) What is the problem to be solved?

Identify which is event A (usually the question you want answered) and which is event B (usually the new information, or the experimental result).

2) What are the known conditions?

Step 2: Apply Bayes’ theorem, finding each quantity in the formula:

1) Find the prior probability

2) Find the possibility function

3) Use Bayesian formula to calculate posterior probability

Case 2: Application of Bayes’ theorem in the medical industry

Every medical test has a false positive rate and a false negative rate. A false positive means you are not sick, but the test says you are. A false negative is the opposite: you have the disease, but the test result comes back normal.

Even if a test is 99 percent accurate, a doctor who relies entirely on its result can misdiagnose.

For a more concrete example: because AIDS has a long incubation period, an infected person may feel nothing for a long time, so a false positive HIV test puts enormous psychological pressure on the person tested.

You might think that with 99% accuracy, false results would be negligible, and that someone who tests positive almost certainly has AIDS. Right?

Let’s apply Bayes’ theorem and see why that intuition is wrong.

Suppose the incidence of a disease is 0.001: 1 in 1,000 people has it. There is a test with accuracy 0.99: if the patient really has the disease, it comes back positive 99 percent of the time. The test also has a 5 percent false positive rate: if the patient does not have the disease, there is still a 5 percent chance it comes back positive.

Now a patient’s test result comes back positive. How likely is it that he really has the disease?

Okay, I know you’re feeling overwhelmed by all this information, and so am I. But we have a Bayesian template, so let’s start.

Step 1: Break down the problem

1) Questions to be answered: What is the probability that a patient will actually have the disease if his test results are positive?

The patient’s positive test result (the new information) is event B; his having the disease is event A.

So the problem asks for P(A|B): given that the test result is positive (B), the probability that the patient really has the disease (A).

2) Known information

The incidence of the disease is 0.001, i.e. P(A)=0.001

The test can detect the disease with accuracy 0.99: given that the patient really has the disease (A), there is a 99% chance the test is positive (B), so P(B|A) = 0.99.

The test has a 5 percent false positive rate, meaning a 5 percent chance of being positive when the patient does not have the disease. We denoted having the disease as event A, so not having the disease is its complement, written A’. This condition can therefore be expressed as P(B|A’) = 5%.

Step 2: Apply Bayes’ theorem

1) Find the prior probability

The incidence of disease was 0.001, i.e., P(A)=0.001

2) Find the possibility function

P(B|A)/P(B)

Here P(B|A) is the probability that the test is positive (B) given that the patient really has the disease (A); from the known conditions, we already have P(B|A) = 0.99.

So now we just have to solve for P(B) to get the answer. According to the total probability formula, P(B) = P(B|A) × P(A) + P(B|A’) × P(A’) = 0.99 × 0.001 + 0.05 × 0.999 = 0.05094.

So the possibility function P(B|A)/P(B) = 0.99 / 0.05094 ≈ 19.43.

3) Use Bayesian formula to calculate posterior probability

We get a surprising result: P(A|B) = 0.001 × 19.43 ≈ 1.94%.

In other words, even though the screening test is 99 percent accurate, the probability that a person who tests positive actually has the disease is only 1.94 percent.
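
Here is the same calculation as a short Python sketch, replaying the numbers above:

```python
p_sick = 0.001              # P(A): incidence, 1 in 1,000
p_pos_given_sick = 0.99     # P(B|A): true positive rate
p_pos_given_healthy = 0.05  # P(B|A'): false positive rate

# Total probability of a positive result
p_pos = p_pos_given_sick * p_sick + p_pos_given_healthy * (1 - p_sick)  # 0.05094

# Posterior probability of disease given a positive test
print(p_sick * p_pos_given_sick / p_pos)  # ~0.0194, i.e. only 1.94%
```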

You might say: so much for the hype about accurate screening; if it turns out to be this unreliable for diagnosing the disease, why bother with medical testing at all?

Yes, that’s exactly what Bayesian analysis tells us. Because AIDS is a small-probability event, when we screen a large population for HIV, even with 99% accuracy there will still be a considerable number of people misdiagnosed as positive due to measurement error; this group can even outnumber the actual AIDS patients in the population.

How, you ask, can such a high misdiagnosis rate be corrected?

The misdiagnosis rate is this unreliable because we screened a large population indiscriminately. No matter how accurate the measurement, when healthy people vastly outnumber actual patients, the interference from misdiagnoses is enormous.

According to Bayes’ theorem, we know that increasing the prior probability can effectively improve the posterior probability.

So the solution is simple: target the suspected, high-risk population (say 10 out of 10,000) and repeat the test independently. Because the probability that a healthy person gets false positives on two consecutive examinations is very low, the accuracy of identifying real patients becomes very high. This is why, for many diseases, samples are often sent to independent institutions for multiple checks.

That’s why people who test positive the first time need a second test, and those who test positive the second time are sent to a national laboratory for a third.
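
To see how repeated testing shifts the posterior, here is a minimal sketch (assuming, for illustration, that each test is independent given the person’s true status and keeps the same 99%/5% rates as above):

```python
def update(prior, p_pos_given_sick=0.99, p_pos_given_healthy=0.05):
    """Posterior probability of disease after one more positive test result."""
    p_pos = p_pos_given_sick * prior + p_pos_given_healthy * (1 - prior)
    return prior * p_pos_given_sick / p_pos

p = 0.001  # start from the population incidence
for test in range(1, 4):
    p = update(p)  # each positive result becomes the prior for the next test
    print(test, round(p, 4))
# prints roughly 0.0194, 0.2818, 0.886: three positives make disease very likely
```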

As an example given in the book The Truth About Medicine: for a first positive HIV test, there is only a 50 percent chance that the patient actually carries the virus. But if doctors use prior knowledge to screen high-risk patients first and then test them, the accuracy of the test can rise to 95 percent.

Case 3: Bayesian spam filters

Spam is a headache that plagues all Internet users. Global spam peaked in 2006, when 90 percent of all emails were spam, and dropped below 50 percent for the first time in June 2015.

The earliest spam filters relied on static keywords and hand-written rules. They didn’t work well: plenty of spam slipped through the net, and plenty of legitimate mail was wrongly flagged.

In 2002, Paul Graham proposed using “Bayesian inference” to filter spam. It worked, he said, unbelievably well: out of 1,000 spam messages, 995 could be filtered out without a single misjudgment.

Because typical spam words appear far more frequently in spam, they are bound to stand out in the Bayesian calculation. The 15 words most indicative of spam are then used in a joint probability calculation; if the joint probability exceeds 90%, the message is judged to be spam.

Bayesian filters can identify heavily rewritten spam with a very low error rate. You don’t even need precise initial values; the accuracy gradually approaches the real situation as the filter keeps calculating on new mail.
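
For a flavor of the joint probability step, here is a hedged Python sketch of the combination formula Graham described; the 15 per-word spam probabilities are made up for illustration:

```python
from math import prod

# Hypothetical spam probabilities for the 15 most telling words in a message
word_probs = [0.99, 0.98, 0.95, 0.95, 0.90, 0.90, 0.85, 0.80,
              0.75, 0.70, 0.65, 0.60, 0.55, 0.50, 0.45]

spam = prod(word_probs)
ham = prod(1 - p for p in word_probs)

p_spam = spam / (spam + ham)  # naive Bayes combination of the word evidence
print(p_spam, p_spam > 0.9)   # flag as spam when the joint probability exceeds 90%
```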

(PS: If you want to know more about this knowledge, I will write articles to answer you.)

4. Bayesian thinking in everyday life Bayes’ theorem works much like the human brain does, which is why it can serve as a basis for machine learning.

If you watch closely how a child learns new things, you’ll find that many things are learned after a single exposure. For example, my 3-year-old nephew saw me do push-ups and imitated the movement; his form wasn’t standard, but the pattern was there.

Similarly, when I teach him a new word, he doesn’t know what it means at first, but he can make a guess from the situation (prior probability / subjective judgment). When given the chance, he’ll try the word on different occasions and watch the reaction. If I tell him he’s right, he further cements the word’s meaning; if I tell him he’s wrong, he adjusts accordingly (possibility function / adjustment factor). This repeated cycle of guessing, testing, and adjusting one’s subjective judgment is exactly the thinking process of Bayes’ theorem.

Similarly, we adults use Bayesian thinking to make decisions. For example, if you’re talking to the goddess and she says “although,” you might guess there’s a 90% chance she’ll follow with “but.” Our brains seem hardwired for Bayes’ theorem: make a subjective judgment from life experience (the prior probability), revise it with newly gathered information (the possibility function), and arrive at a high-probability prediction (the posterior probability).

In fact, this is the brain decision-making process shown below:

Therefore, when it comes to prediction in life, Bayesian thinking can improve your hit rate. You can predict in 3 steps:

1. Break the problem down

Simply put, approach it like a primary school word problem: what is the problem to be solved? What do we already know?

2. Give a subjective judgment

This is not a shot in the dark, but a subjective judgment based on your own experience and knowledge.

3. Collect new information and optimize subjective judgment

Keep gathering information about the problem you’re trying to solve, then use the new information to adjust the judgment from Step 2. If the new information supports that judgment, raise your confidence in it; if it contradicts it, lower your confidence.

For example, when we first see the claim “artificial intelligence will put humans out of work,” you have your own understanding of it (subjective judgment). But after you learn some data analysis, or read some of the latest progress in the field (new information), you use what you’ve learned to optimize your earlier understanding (the adjustment factor), and finally you re-understand the claim about “artificial intelligence” (posterior probability). This is what Hu Shi called “bold hypotheses, careful verification.”


References:

Thomas Bayes: Probability for Success

Everything You Ever Wanted to Know About Bayes’ Theorem But Were Afraid to Ask.

The Bayesian spam filter: www.paulgraham.com/spam.html

Bayesian spam filtering Wiki: en.wikipedia.org/wiki/Naive_…

Bayesian Inference and Its Internet Application (I)

The statistical ghost behind the Federalist Papers