In machine learning, naive Bayes is a simple probabilistic classifier based on Bayes' theorem. (Classification is a form of supervised learning: from the feature information of known, labeled samples we infer the most likely output for new samples. Clustering, by contrast, is an unsupervised learning problem.) Naive Bayes tends to give good results on text data, so it is widely used in text classification, spam filtering, natural language processing, and similar scenarios.

Naive Bayes assumes that the features of a sample are independent of each other. For example, if a fruit is red, round, and about 70 mm in diameter, it may be classified as an apple (the category with the highest probability is taken as the most likely one; this is called Maximum A Posteriori, or MAP). Naive Bayes assumes that each of these features contributes independently to the probability that the fruit is an apple, even though in reality there may be dependencies between them, or other relevant features. This is exactly what makes naive Bayes so "naive".

The full English name is Naive Bayes Classifier. The word "naive" is not a comment on its quality: it does not mean that the algorithm performs poorly, but rather that its independence assumption is simple-minded, and it really is "naive" compared with algorithms such as Bayesian networks.

Before moving on to naive Bayes, we need to understand Bayes' theorem and the concepts it builds on: conditional probability and the total probability formula.

This article was written by SylvanasSun ([email protected]) and first appeared on SylvanasSun's Blog. Original link: sylvanassun.github.io/2017/12/20/… (Please retain this statement and keep the hyperlink.)

Conditional probability


Conditional probability is the probability that event A occurs given that event B has occurred. It is written P(A|B), read as "the probability of A given B".

The Venn diagram above shows two events A and B and their intersection A ∩ B. Substituting into the definition of conditional probability, the probability of A given B is P(A|B) = P(A ∩ B) / P(B).

Rearranging the formula gives P(A ∩ B) = P(A|B)P(B), and by symmetry P(A ∩ B) = P(B|A)P(A), where P(B|A) is the probability of B given A.

Combining the two identities gives P(A|B)P(B) = P(B|A)P(A).

For example, suppose two six-sided dice D1 and D2 are rolled, and we want to work out the probabilities of the possible outcomes.

Table 1 shows the sample space of 36 outcomes; the 6 outcomes marked in red are those with D1 = 2, so P(D1 = 2) = 6/36 = 1/6.

Table 2 marks the outcomes satisfying D1 + D2 ≤ 5, giving P(D1 + D2 ≤ 5) = 10/36.

Table 3 shows the outcomes that satisfy the condition of Table 2 and also D1 = 2; it picks out three of the ten outcomes in Table 2. In terms of the conditional probability formula, P(D1 = 2 | D1 + D2 ≤ 5) = 3/10 = 0.3.
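This can be checked by brute-force enumeration in Python (a small sketch, not part of the original article):

from fractions import Fraction

outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]  # the 36-outcome sample space
b = [o for o in outcomes if o[0] + o[1] <= 5]                      # event B: D1 + D2 <= 5
a_and_b = [o for o in b if o[0] == 2]                              # A ∩ B: additionally D1 = 2
print(Fraction(len(a_and_b), len(b)))  # 3/10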

Total probability formula


The total probability formula (law of total probability) is the basic rule relating marginal probabilities to conditional probabilities. It expresses the total probability of an outcome that can be realized through several distinct events.

The total probability formula turns the problem of computing the probability of a complex event into a sum of the probabilities of simpler events occurring under different conditions. The formula is P(B) = Σ(i=1..n) P(Ai)P(B|Ai).

Assume a sample space S that is the union of two events A and C, and an event B that intersects both of them, as shown in the figure below:

The probability of event B can then be expressed as P(B) = P(B ∩ A) + P(B ∩ C).

From the conditional probability formula, P(B ∩ A) = P(B|A)P(A), so P(B) = P(B|A)P(A) + P(B|C)P(C).

This is the total probability formula: the probability of event B equals the sum, over A and C, of each event's probability multiplied by the conditional probability of B given that event.

To apply this formula, suppose two factories produce and supply light bulbs. A bulb from factory X works more than 5,000 hours 99% of the time, and a bulb from factory Y works more than 5,000 hours 95% of the time. Factory X has 60% market share and factory Y has 40%. What is the probability that a purchased bulb will work more than 5,000 hours?

Using the total probability formula, it can be concluded that:

  • Pr(Bx) = 6/10: the probability of purchasing a bulb made by factory X.

  • Pr(By) = 4/10: the probability of purchasing a bulb made by factory Y.

  • Pr(A|Bx) = 99/100: the probability that a bulb made by factory X works more than 5,000 hours.

  • Pr(A|By) = 95/100: the probability that a bulb made by factory Y works more than 5,000 hours.

Substituting into the total probability formula, Pr(A) = Pr(A|Bx)Pr(Bx) + Pr(A|By)Pr(By) = 0.99 × 0.6 + 0.95 × 0.4 = 0.974. Thus, the probability that a purchased bulb works more than 5,000 hours is 97.4 percent.
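The same calculation takes only a couple of lines of Python (just as a check):

p_a = 0.99 * (6 / 10) + 0.95 * (4 / 10)  # P(A) = P(A|Bx)P(Bx) + P(A|By)P(By)
print(p_a)  # ≈ 0.974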

Bayes’ theorem


Bayes' theorem was first proposed by the English mathematician (and theologian and philosopher) Thomas Bayes (1701-1761). Interestingly, he published no academic articles on mathematics during his lifetime; even his most famous work, Bayes' theorem, was found among his posthumous notes and published by his friend Richard Price.

Thomas Bayes became interested in probability in his later years, and the so-called Bayes' theorem was simply a paper he wrote to solve an inverse probability problem (which philosophers seem fond of using to argue for the existence of God). At that time people already knew how to solve forward probability problems: for example, if a bag contains X white balls and Y black balls, what is the probability of reaching in and drawing a black ball? The inverse probability problem goes the other way: we do not know the proportions of balls in the bag in advance, but by repeatedly drawing balls and observing their colors, we infer the ratio of black to white balls.

Bayes' theorem is a theorem about the conditional probabilities of random events A and B. In general, the probability of A given B is not the same as the probability of B given A, but there is a definite relationship between the two, and Bayes' theorem states that relationship.

One of the main applications of Bayes' theorem is Bayesian inference, an inference method based on subjective judgment: you first estimate a value and then revise it according to observed results, without requiring objective prior information. Because this kind of reasoning demands a great deal of computation, it was long criticized and saw little widespread use. Only with the rapid development of computers, and the realization that many things cannot be judged objectively in advance, did Bayesian reasoning make a comeback.

Having said that, let's look at the formula itself. It follows directly from the conditional probability identity derived above: dividing both sides of P(A|B)P(B) = P(B|A)P(A) by P(B) gives Bayes' formula, P(A|B) = P(B|A)P(A) / P(B).

  • P(A|B): the probability of A given B. In Bayes' theorem this is called the posterior probability: the probability we assign to event A after learning that event B has occurred.

  • P(B|A): the probability of event B given A.

  • P(A) and P(B) are called prior probabilities (also marginal probabilities): P(A) is the probability we assign to event A before event B is observed, without taking any B-related information into account, and similarly for P(B).

  • P(B|A) / P(B) is called the standardized likelihood. It is an adjustment factor whose role is to pull the estimated probability closer to the true probability.

  • In these terms, Bayes' theorem can be stated as: posterior probability = standardized likelihood × prior probability.

Take the famous false positive problem as an example. Suppose the incidence of a disease is 0.001 (1 person in 1,000 has it). A test reagent gives a positive result with 99% probability when the patient really is sick, and with 5% probability when the patient is not sick (a false positive). If a patient tests positive, what is the probability that they actually have the disease?

Plugging into Bayes' theorem: let event A be "the patient is sick", with P(A) = 0.001. This is our prior probability, the incidence we expect before the test is actually carried out and its result known. Let event B be "the test result is positive". What we need to calculate is the conditional probability P(A|B), the probability of A given B; this is the posterior probability, the patient's probability of being sick after the positive test result.

Since the patient may also not be sick, we also need the prior probability of event C, "not sick": P(C) = 1 - 0.001 = 0.999. P(B|C) is then the probability of a positive test result given that the patient is not sick. Substituting into the total probability formula gives P(B), and plugging everything into Bayes' theorem yields the final result.

Substituting the numbers: P(A|B) = 0.99 × 0.001 / (0.99 × 0.001 + 0.05 × 0.999) ≈ 0.019. The final result is about 2 percent: even if a patient tests positive, he or she has only about a 2 percent chance of actually having the disease.
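Written out in Python (a small sketch using the numbers from the example above):

p_sick = 0.001                 # P(A): prior probability of having the disease
p_healthy = 1 - p_sick         # P(C)
p_pos_given_sick = 0.99        # P(B|A)
p_pos_given_healthy = 0.05     # P(B|C), the false positive rate

p_pos = p_pos_given_sick * p_sick + p_pos_given_healthy * p_healthy  # total probability P(B)
p_sick_given_pos = p_pos_given_sick * p_sick / p_pos                 # Bayes' theorem: P(A|B)
print(p_sick_given_pos)  # ≈ 0.0194, i.e. about 2%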

Probabilistic models of naive Bayes


Suppose an item to be classified is X = {f1, f2, …, fn}, where each f is a feature attribute of X, and the set of possible categories is C1, C2, …, Cm.

We then need to calculate P(C1|X), P(C2|X), …, P(Cm|X). These can be estimated from a training sample set (a collection of samples whose categories are already known), from which we obtain, by counting, the conditional probability of each feature attribute under each category:

P(f1|C1), P(f2|C1), …, P(fn|C1); P(f1|C2), P(f2|C2), …, P(fn|C2); …; P(f1|Cm), P(f2|Cm), …, P(fn|Cm)

If P(Ck|X) = max(P(C1|X), P(C2|X), …, P(Cm|X)), then X ∈ Ck (Bayesian classification simply picks the category with the highest probability).

Naive Bayes assumes that the features are independent of each other. By Bayes' theorem, P(Ci|X) = P(X|Ci)P(Ci) / P(X). Since the denominator P(X) is the same constant for every category, we only need to maximize the numerator; and because all features are assumed mutually independent, the result is P(Ci|X) ∝ P(Ci)P(f1|Ci)P(f2|Ci)…P(fn|Ci).
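Written as a small Python sketch (assuming the priors and per-feature conditional probabilities have already been estimated; the names below are illustrative, not from the article):

def naive_bayes_classify(x, priors, cond_probs):
    # Pick the category Ci that maximizes P(Ci) * P(f1|Ci) * ... * P(fn|Ci).
    # priors:     {category: P(Ci)}
    # cond_probs: {category: [table_1, ..., table_n]}, where table_j maps a value
    #             of feature j to the estimate of P(fj = value | Ci)
    best_category, best_score = None, -1.0
    for category, prior in priors.items():
        score = prior
        for j, value in enumerate(x):
            score *= cond_probs[category][j].get(value, 0.0)
        if score > best_score:
            best_category, best_score = category, score
    return best_category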

According to the above formula derivation, the process of naive Bayes can be shown as the figure below:

Let’s walk through the process in the figure above with an example.

Suppose a website wants a program that automatically judges whether an account is genuine, sorting accounts into real accounts and fake accounts (throwaway accounts with false information or malicious registrations).

  • First, the feature attributes and categories need to be determined, and training samples obtained. Assume that an account has three features: number of log entries divided by number of registered days (F1), number of friends divided by number of registered days (F2), and whether it uses a real profile picture (F3; true: 1, false: 0).

  • Using 10,000 accounts that had already been manually reviewed as the training sample, P(C0) = 8900 ÷ 10000 = 0.89 and P(C1) = 1100 ÷ 10000 = 0.11, where C0 is the category of real accounts (89%) and C1 is the category of fake accounts (11%).

  • Then the conditional probability of each feature under each category needs to be calculated and substituted into the naive Bayes classifier, giving P(F1|C)P(F2|C)P(F3|C)P(C). One problem is that F1 and F2 are continuous variables, so it is not appropriate to compute the probability of a specific value. The solution is to convert the continuous values into discrete ones and compute the probability of each interval: for example, split F1 into the three intervals [0, 0.05], [0.05, 0.2], [0.2, +∞) and compute the probability of each interval (see the sketch after this list).

  • Suppose an account has F1 = 0.1, F2 = 0.2, F3 = 0; is it a real account or a fake one? F1 = 0.1 falls into the second interval, so the probability of that interval is what enters the calculation. From the training samples, the following conditional probabilities are obtained:

  • The trained classifier can then be used to compute the probability that this account is real and the probability that it is fake, and the category with the larger probability is chosen:
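As a rough sketch of this comparison (the interval boundaries come from the text above, but the conditional probability values below are hypothetical placeholders, since the real numbers come from the 10,000-account training sample):

def interval_of(value):
    # Map a continuous feature value into one of the three intervals from the text.
    if value <= 0.05:
        return 0
    elif value <= 0.2:
        return 1
    return 2

priors = {'real': 0.89, 'fake': 0.11}
# Placeholder conditional probabilities (NOT the real training statistics).
cond = {
    'real': {'F1': [0.1, 0.5, 0.4], 'F2': [0.2, 0.5, 0.3], 'F3': {0: 0.2, 1: 0.8}},
    'fake': {'F1': [0.8, 0.1, 0.1], 'F2': [0.7, 0.2, 0.1], 'F3': {0: 0.9, 1: 0.1}},
}

account = {'F1': 0.1, 'F2': 0.2, 'F3': 0}
scores = {}
for c in priors:
    scores[c] = (priors[c]
                 * cond[c]['F1'][interval_of(account['F1'])]
                 * cond[c]['F2'][interval_of(account['F2'])]  # the same intervals are assumed for F2
                 * cond[c]['F3'][account['F3']])
print(max(scores, key=scores.get))  # the category with the higher score wins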


The end result is that the account is a real one.

Naive Bayes algorithm model


There are three algorithm models in naive Bayes:

  • Gaussian Naive Bayes: suitable when the feature variables are continuous and assumed to follow a Gaussian (normal) distribution. For example, suppose we have a set of statistics on human characteristics such as height, weight, and foot length; these are all continuous variables, so we obviously cannot compute their probabilities the way we do for discrete variables. The solution is to assume each feature is normally distributed, estimate the mean and standard deviation from the sample to obtain the normal density function, and then evaluate that density function at the given feature value.

  • Multinomial Naive Bayes: in contrast to Gaussian naive Bayes, the multinomial model is better suited to discrete features. When computing the prior probability P(Cm) and the conditional probabilities P(Fn|Cm), it applies smoothing. The prior is P(Cm) = (T_Cm + a) / (T + m·a), where T is the total number of samples, m is the number of categories, T_Cm is the number of samples of category Cm, and a is the smoothing value. The conditional probability is P(Fn|Cm) = (T_Cm,Fn + a) / (T_Cm + n·a), where n is the number of features and T_Cm,Fn is the number of samples of category Cm whose feature takes the value Fn. When a = 1 the smoothing is called Laplace smoothing (values 0 < a < 1 are also used); when a = 0 no smoothing is done. The idea is simply to add a to every count: with a sufficiently large training set this barely affects the result, and it avoids the case where some P(F|C) has frequency 0 (a feature value that never appears under a category would otherwise zero out the whole product and seriously degrade the classifier). A small sketch of this smoothing follows this list.

  • Bernoulli Naive Bayes: suitable for scenarios where the feature attributes are binary; each feature is treated as a Boolean value. A typical example is judging whether a given word appears in a text.
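A minimal sketch of Laplace smoothing for one categorical feature (the helper name and toy numbers are illustrative, not from the article; it applies the same idea as the formulas above, smoothing the counts over a feature's possible values):

from collections import Counter

def smoothed_conditional(feature_values, possible_values, a=1.0):
    # P(F = v | C) with smoothing, estimated from the samples of a single class C.
    counts = Counter(feature_values)
    total = len(feature_values)          # number of samples of this class
    n = len(possible_values)             # number of possible values of this feature
    return {v: (counts[v] + a) / (total + n * a) for v in possible_values}

# Example: the feature "uses a real profile picture" among 5 samples of one class.
print(smoothed_conditional([0, 0, 0, 0, 0], possible_values=[0, 1]))
# {0: 0.857..., 1: 0.142...}: the value 1 was never seen for this class, yet it still
# gets a small non-zero probability instead of zeroing out the whole product.

(scikit-learn ships ready-made implementations of all three models as GaussianNB, MultinomialNB, and BernoulliNB.)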

Implementation of naive Bayes


With enough theory in hand, we are going to implement Gaussian naive Bayes in Python to tackle the Pima Indians diabetes problem. The sample data (available from this hyperlink) is a CSV file in which every value is a number. The data describes medical measurements for each patient, such as age, number of pregnancies, and blood test results, and each record carries a class value (a Boolean expressed as 0 or 1) indicating whether the patient developed diabetes within five years. This data set has been studied extensively in the machine learning literature; a good prediction accuracy is between 70% and 76%. The meaning of each column is as follows:

Column 1: number of times pregnant
Column 2: plasma glucose concentration at 2 hours in an oral glucose tolerance test
Column 3: diastolic blood pressure (mm Hg)
Column 4: triceps skin fold thickness (mm)
Column 5: 2-hour serum insulin (mu U/ml)
Column 6: body mass index (weight in kg / (height in m)^2)
Column 7: diabetes pedigree function (family history)
Column 8: age (years)
Column 9: class value (Boolean: 0 = no diabetes within 5 years, 1 = diabetes within 5 years)
------------------------------------------------------------------------
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
1,89,66,23,94,28.1,0.167,21,0
0,137,40,35,168,43.1,2.288,33,1
...

The first thing to do is to read this CSV file and parse it into a data structure that we can use directly. Since the sample data file has no empty lines or header row, each CSV row corresponds to one record, so each line can simply be wrapped in a list (the function returns a list whose elements are the per-row lists). Note that all values in the file are numbers, so a type conversion is needed first.

import csv
def load_csv_file(filename):
    with open(filename) as f:
        lines = csv.reader(f)
        data_set = list(lines)
    for i in range(len(data_set)):
        data_set[i] = [float(x) for x in data_set[i]]
    return data_set

After obtaining the sample data, it needs to be split into a training data set (which naive Bayes uses to build its model) and a test data set used to evaluate the model's accuracy. Rows are chosen at random during the split, but a ratio controls the relative sizes of the training and test sets; 67% : 33% is a fairly common choice.

import random
def split_data_set(data_set, split_ratio):
    train_size = int(len(data_set) * split_ratio)
    train_set = []
    data_set_copy = list(data_set)
    while len(train_set) < train_size:
        index = random.randrange(len(data_set_copy))
        train_set.append(data_set_copy.pop(index))
    return [train_set, data_set_copy]
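For instance (a toy illustration, not from the article's data set):

toy = [[float(i)] for i in range(10)]
train, test = split_data_set(toy, 0.67)
print(len(train), len(test))  # 6 4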

After the sample data is split, the training data set needs further processing. Gaussian naive Bayes assumes that every feature follows a normal distribution, so summaries have to be extracted from the training data, namely the mean and standard deviation of each feature. The number of summaries is determined by the number of combinations of categories and feature attributes: with three categories and seven feature attributes, a mean and standard deviation are needed for every feature-category pair, i.e. 21 summaries.

Before computing the summaries of the training data set, the first task is to separate its rows by category, that is, to build a hash table whose keys are the categories and whose values are the rows of data belonging to each category.

def separate_by_class(data_set, class_index):
    result = {}
    for i in range(len(data_set)):
        vector = data_set[i]
        class_val = vector[class_index]
        if (class_val not in result):
            result[class_val] = []
        result[class_val].append(vector)
    return result

Since we already know there is only one class column and that it sits at the end of each row, we can simply pass -1 as the class_index argument. Next, the summary of the training data set is computed (the mean and standard deviation of each feature attribute within each category). The mean serves as the center of the normal distribution, while the standard deviation describes how spread out the data is and characterizes, when probabilities are computed, the expected spread of each feature attribute in the normal distribution.

The standard deviation is the square root of the variance, so it can be obtained by first computing the variance (the average of the squared differences between each feature value and the mean).

import math
def mean(numbers):
    return sum(numbers) / float(len(numbers))
def stdev(numbers):
    avg = mean(numbers)
    variance = sum([pow(x - avg, 2) for x in numbers]) / float(len(numbers))
    return math.sqrt(variance)

With these helper functions in place, computing the summaries is easy: build the per-category hash table from the training data set, then compute the mean and standard deviation of each feature within each category.

def summarize(data_set):
    # zip(*data_set) groups the n-th attribute of every row into one tuple,
    # i.e. it packages each column into a tuple.
    summaries = [(mean(feature), stdev(feature)) for feature in zip(*data_set)]
    del summaries[-1]  # the last column is the class value, so drop its summary
    return summaries

def summarize_by_class(data_set):
    class_map = separate_by_class(data_set, -1)
    summaries = {}
    for class_val, data in class_map.items():
        summaries[class_val] = summarize(data)
    return summaries
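For example, on a tiny made-up data set (the values below are only meant to show the shape of the output):

rows = [[1.0, 20.0, 0.0],
        [2.0, 21.0, 0.0],
        [3.0, 22.0, 1.0]]
print(summarize_by_class(rows))
# {0.0: [(1.5, 0.5), (20.5, 0.5)], 1.0: [(3.0, 0.0), (22.0, 0.0)]}

Note that a category with a single row yields a standard deviation of 0, which the normal density function below cannot handle, so real data needs enough rows per category.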

The data processing stage is now complete; the next task is to make predictions based on the training data set. This stage computes the category probability and the conditional probability of each feature under each category, then selects the category with the highest probability as the classification result. The key step is computing the conditional probability using the density function of the normal distribution, which depends on the parameters we have already prepared (the feature value, the mean, and the standard deviation).

def calculate_probability(x, mean, stdev):
    # Density function of the normal distribution.
    exponent = math.exp(-(math.pow(x - mean, 2) / (2 * math.pow(stdev, 2))))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent

def calculate_conditional_probabilities(summaries, input_vector):
    probabilities = {}
    for class_val, class_summaries in summaries.items():
        probabilities[class_val] = 1
        for i in range(len(class_summaries)):
            mean, stdev = class_summaries[i]
            # input_vector is a row of data from the test set; x is one feature value in that row.
            x = input_vector[i]
            # Multiply the per-feature probabilities together.
            probabilities[class_val] *= calculate_probability(x, mean, stdev)
    return probabilities
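For instance, with a mean of 73 and a standard deviation of 6.2 (illustrative numbers, not taken from the data set), the density at x = 71.5 is roughly:

print(calculate_probability(71.5, 73, 6.2))  # ≈ 0.0625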

The calculate_conditional_probabilities() function returns a hash table whose keys are the categories and whose values are the corresponding probabilities, i.e. for each category the product of the conditional probabilities of its features. After that, we only need to select the category with the highest probability.

def predict(summaries, input_vector):
    probabilities = calculate_conditional_probabilities(summaries, input_vector)
    best_label, best_prob = None, -1
    for class_val, probability in probabilities.items():
        if best_label is None or probability > best_prob:
            best_label = class_val
            best_prob = probability
    return best_label

Finally, we define a function that makes a prediction for every data instance in the test data set and returns the list of predicted values. Based on this return value, the prediction accuracy of the model can be evaluated.

def get_predictions(summaries, test_set):
    predictions = []
    for i in range(len(test_set)):
        result = predict(summaries, test_set[i])
        predictions.append(result)
    return predictions

def get_accuracy(predictions, test_set):
    correct = 0
    for x in range(len(test_set)):
        # The prediction is correct if it matches the class value in the last column.
        if test_set[x][-1] == predictions[x]:
            correct += 1
    return (correct / float(len(test_set))) * 100.0

The complete code is as follows:

import csv, random, math

"""
A simple classifier based on Gaussian naive Bayes for the Pima Indians diabetes problem.
(https://archive.ics.uci.edu/ml/datasets/Pima+Indians+Diabetes)
"""

def load_csv_file(filename):
    with open(filename) as f:
        lines = csv.reader(f)
        data_set = list(lines)
    for i in range(len(data_set)):
        data_set[i] = [float(x) for x in data_set[i]]
    return data_set

def split_data_set(data_set, split_ratio):
    train_size = int(len(data_set) * split_ratio)
    train_set = []
    data_set_copy = list(data_set)
    while len(train_set) < train_size:
        index = random.randrange(len(data_set_copy))
        train_set.append(data_set_copy.pop(index))
    return [train_set, data_set_copy]

def separate_by_class(data_set, class_index):
    result = {}
    for i in range(len(data_set)):
        vector = data_set[i]
        class_val = vector[class_index]
        if (class_val not in result):
            result[class_val] = []
        result[class_val].append(vector)
    return result

def mean(numbers):
    return sum(numbers) / float(len(numbers))

def stdev(numbers):
    avg = mean(numbers)
    variance = sum([pow(x - avg, 2) for x in numbers]) / float(len(numbers))
    return math.sqrt(variance)

def summarize(data_set):
    summaries = [(mean(feature), stdev(feature)) for feature in zip(*data_set)]
    del summaries[-1]
    return summaries

def summarize_by_class(data_set):
    class_map = separate_by_class(data_set, -1)
    summaries = {}
    for class_val, data in class_map.items():
        summaries[class_val] = summarize(data)
    return summaries

def calculate_probability(x, mean, stdev):
    exponent = math.exp(-(math.pow(x - mean, 2) / (2 * math.pow(stdev, 2))))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent

def calculate_conditional_probabilities(summaries, input_vector):
    probabilities = {}
    for class_val, class_summaries in summaries.items():
        probabilities[class_val] = 1
        for i in range(len(class_summaries)):
            mean, stdev = class_summaries[i]
            x = input_vector[i]
            probabilities[class_val] *= calculate_probability(x, mean, stdev)
    return probabilities

def predict(summaries, input_vector):
    probabilities = calculate_conditional_probabilities(summaries, input_vector)
    best_label, best_prob = None, -1
    for class_val, probability in probabilities.items():
        if best_label is None or probability > best_prob:
            best_label = class_val
            best_prob = probability
    return best_label

def get_predictions(summaries, test_set):
    predictions = []
    for i in range(len(test_set)):
        result = predict(summaries, test_set[i])
        predictions.append(result)
    return predictions

def get_accuracy(predictions, test_set):
    correct = 0
    for x in range(len(test_set)):
        if test_set[x][-1] == predictions[x]:
            correct += 1
    return (correct / float(len(test_set))) * 100.0

def main():
    filename = 'pima-indians-diabetes.data.csv'
    split_ratio = 0.67
    data_set = load_csv_file(filename)
    train_set, test_set = split_data_set(data_set, split_ratio)
    print('Split %s rows into train set = %s and test set = %s rows'
          % (len(data_set), len(train_set), len(test_set)))
    # prepare model
    summaries = summarize_by_class(train_set)
    # predict and test
    predictions = get_predictions(summaries, test_set)
    accuracy = get_accuracy(predictions, test_set)
    print('Accuracy: %s' % accuracy)

main()
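Because split_data_set() picks rows at random, the reported accuracy will vary from run to run. If you want reproducible results, you can seed the random number generator near the top of the script (a small optional addition, not part of the original code):

random.seed(42)  # a fixed seed makes the train/test split, and hence the accuracy, repeatable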

Reference