Exercise 7.3 of Zhou Zhihua's Machine Learning--Pthon Implementing Naive Bayesian Classifier

1. Title

Laplacian modified naive Bayesian classifier is implemented in the trial program, and p151 "Test 1" sample is discriminated using watermelon dataset 3.0 as training set.

Watermelon dataset 3.0 is as follows:

Measure 1 is as follows:

Text-attached watermelon dataset:

Green, curled, muddy, clear, sunken, hard and slippery, 0.697,0.460, yes
Black, curled, dull, clear, sunken, hard and slippery, 0.774,0.376, yes
Black, curled, muddy, clear, sunken, hard and slippery, 0.634,0.264, yes
Green, curled, dull, clear, sunken, hard, slippery, 0.608, 0.318, yes
Light white, curled, turbid, clear, sunken, hard and slippery, 0.556,0.215, yes
Green, slightly curled, muddy, clear, slightly concave, soft and sticky, 0.403, 0.237, yes
Dark, slightly coiled, muddy, slightly pasty, slightly concave, soft and sticky, 0.481,0.149, yes
Dark, slightly curled, muddy, clear, slightly concave, hard and slippery, 0.437,0.211, yes
Dark, slightly coiled, dull, slightly blurry, slightly concave, hard and slippery, 0.666,0.091, no
Green, firm, crisp, clear, flat, soft and sticky, 0.243, 0.267, no
Light white, firm, crisp, blurry, flat, hard and slippery, 0.245, 0.057, no
Light white, curled, muddy, blurred, flat, soft and sticky, 0.343,0.099, no
Turquoise, slightly curled, muddy, slightly blurred, sunken, hard and slippery, 0.639, 0.161, no
Light white, slightly curled, dull, slightly blurry, sunken, hard and slippery, 0.657,0.198, no
Dark, slightly curled, muddy, clear, slightly concave, soft and sticky, 0.360,0.370, no
Light white, curled, muddy, blurred, flat, hard, 0.593,0.042, no
Turquoise, curled, dull, slightly blurry, slightly concave, hard and slippery, 0.719,0.103, no

2. Ideas + Code

The first step is to get the dataset, put the training data in x and the training label in y.

import numpy as np
import matplotlib.pyplot as plt
import math

def get_data():
    x = []
    y = []
    with open("./xigua.txt", 'r') as f:
        for line in f.readlines():
            words = line.split(',')
            if 'yes' in words[8]:
            elif 'no' in words[8]:
    return np.array(x), np.array(y)

Then you can start the training process to implement the learner. Before we can do this, we need to be clear that the prediction process of the Naive Bayesian classifier is written in one formula:
h ( x ) = a r g m a x c ∈ y P ( c ) ∏ d i = 1 P ( x i ∣ c ) h(\boldsymbol{x}) = \underset{c\in y}{argmax} P(c)\prod_{d}^{i=1}P(x_{i}|c) h(x)=c∈yargmax​P(c)d∏i=1​P(xi​∣c)
h(x) denotes the trained naive Bayesian classifier, and X denotes the sample or sample set.
The c below argmax represents the category, and y is the entire category.
P(c) denotes a "priori probability" for each category, where the Naive Bayesian classifier directly takes the proportion of each category in the training set as a priori probability (for example, for this topic, P(c = melon) is 8/17). Write it out in the form:
p ( c ) = ∣ D c ∣ ∣ D ∣ p(c) = \frac{\left | D_{c} \right |}{\left | D \right |} p(c)=∣D∣∣Dc​∣​
P ( x i ∣ c ) P(x_{i}|c) P(xi c) is the conditional probability, indicating that given a category, the probability that the first attribute in the corresponding data is equal to Xi (e.g., P(x1 = Turquoise | C = cucumber) indicates that the color of a cucumber is turquoise). Write it out in the form:
P ( x i ∣ c ) = ∣ D c , x i ∣ ∣ D c ∣ P(x_{i}|c) = \frac{\left | D_{c,x_{i}} \right |}{\left | D_{c} \right |} P(xi​∣c)=∣Dc​∣∣Dc,xi​​∣​
Above is the calculation method when the attribute is discrete. For continuous attributes, we can assume that the attribute obeys ortho-too distribution, and then Mu and sigma (mu is the mean, sigma^2 is the variance) are estimated from the training data. Then the distribution of the attribute score is calculated, and then the attribute value is brought into the ortho-too distribution formula to get the probability result as conditional probability.

In addition, there is a special case where one probability is zero, so that in the process of multiplying them, the whole is zero and the other probabilities that are not zero do not work. To avoid this situation, a Laplacian correction is needed. In fact, it is very simple to add one to each term in the process of calculating the probability, so that the conditional probability is not zero, and the modified P(c) and P ( x i ∣ c ) P(x_{i}|c) P(xi c) is:
p ^ ( c ) = ∣ D c ∣ + 1 ∣ D ∣ + N \widehat{p}(c) = \frac{\left | D_{c} \right | + 1}{\left | D \right | + N} p ​(c)=∣D∣+N∣Dc​∣+1​
p ^ ( c ) = ∣ D c , x i ∣ + 1 ∣ D c ∣ + N i \widehat{p}(c) = \frac{\left | D_{c,x_{i}} \right |+1}{\left | D_{c} \right |+N_{i}} p ​(c)=∣Dc​∣+Ni​∣Dc,xi​​∣+1​
Where N is the number of possible categories, and N i is the number of possible values for attribute i I.

So we find that the training process actually calculates and stores all the above probability values. When predicting, you only need to take out the corresponding probability and then apply it to the h(x) formula.

By understanding the above process, you can implement the code:

def train_model():
    # Store classifier probability values in model
    model = {}
    train_data, train_lable = get_data()
    # First calculate the prior probability of the positive and negative classes
    # model['pri_p_c1'] is a priori probability of positive class and model['pri_p_c0'] is a priori probability of negative class
    positive_cnt = np.sum(train_lable==1)
    negative_cnt = np.sum(train_lable==0)
    model['pri_p_c1'] = (positive_cnt + 1) / ((positive_cnt + negative_cnt) + 2)
    model['pri_p_c0'] = (negative_cnt + 1) / ((positive_cnt + negative_cnt) + 2)
    con_p_c1 = []
    con_p_c0 = []
    # Cyclically calculates the conditional probability,
    # The first level loops through each attribute (excluding continuous attributes), and i represents each attribute
    for i in range(len(train_data[0])-2):
        # cnt_in_c1 and cnt_in_c0 storage, in positive and negative classes, each value of attribute i corresponds to the number of samples
        # Example: cnt_in_c1['Turquoise'] means "there are several melons in a good melon with a turquoise color"
        cnt_in_c1 = dict()
        cnt_in_c0 = dict()
        # The second level loops through all the training data, and j represents each training data
        for j in range(len(train_data)):
            if not train_data[j][i] in cnt_in_c1.keys():
                cnt_in_c1[train_data[j][i]] = 0
            if not train_data[j][i] in cnt_in_c0.keys():
                cnt_in_c0[train_data[j][i]] = 0
            if train_lable[j] == 1:
                cnt_in_c1[train_data[j][i]] += 1
            elif train_lable[j] == 0:
                cnt_in_c0[train_data[j][i]] += 1
        # p_xi_given_c1 represents conditional probability
        # Example: p_xi_given_c1['Turquoise'] denotes the probability that a good melon will turn Turquoise
        p_xi_given_c1 = {}
        for key in cnt_in_c1.keys():
            # Laplace Amendment requires molecule+1, denominator+n
            p_xi_given_c1[key] = (cnt_in_c1[key] + 1) / (positive_cnt + len(cnt_in_c1))
        p_xi_given_c0 = {}
        for key in cnt_in_c0.keys():
            p_xi_given_c0[key] = (cnt_in_c0[key] + 1) / (negative_cnt + len(cnt_in_c0))
    # Traverse through each continuous attribute, i for each attribute
    for i in range(len(train_data[0])-2, len(train_data[0])):
        p_xi_given_c1 = dict()
        p_xi_given_c0 = dict()
        p_xi_given_c1['mu'] = 0
        p_xi_given_c1['sigma'] = 0
        p_xi_given_c0['mu'] = 0
        p_xi_given_c0['sigma'] = 0
        # Calculate mean
        for j in range(len(train_data)):
            if train_lable[j] == 1:
                p_xi_given_c1['mu'] += float(train_data[j][i])
            elif train_lable[j] == 0:
                p_xi_given_c0['mu'] += float(train_data[j][i])
        p_xi_given_c1['mu'] = p_xi_given_c1['mu'] / positive_cnt
        p_xi_given_c0['mu'] = p_xi_given_c0['mu'] / negative_cnt
        # Calculate variance
        for j in range(len(train_data)):
            if train_lable[j] == 1:
                p_xi_given_c1['sigma'] += (float(train_data[j][i]) - p_xi_given_c1['mu'])**2
            elif train_lable[j] == 0:
                p_xi_given_c0['sigma'] += (float(train_data[j][i]) - p_xi_given_c0['mu'])**2
        p_xi_given_c1['sigma'] =  np.sqrt(p_xi_given_c1['sigma'] / positive_cnt)
        p_xi_given_c0['sigma'] = np.sqrt(p_xi_given_c0['sigma'] / negative_cnt)
    model['con_p_c1'] = con_p_c1
    model['con_p_c0'] = con_p_c0
    return model

Normal distribution function:

# Normal Distribution Formula
def N(mu, sigma, x):
    return np.exp(-(x-mu)**2/(2*sigma**2)) / (np.sqrt(2*np.pi) * sigma)

Then you can make a prediction. The prediction process is very simple, that is, according to the h(x) formula, take the desired probability from the model, and then calculate the maximum corresponding category.

def predict(model, x):
    p_c1 = model['pri_p_c1']
    p_c0 = model['pri_p_c0']
    for i in range(len(x)-2):
        p_c1 *= model['con_p_c1'][i][x[i]]
        p_c0 *= model['con_p_c0'][i][x[i]]
    for i in range(len(x)-2, len(x)):
        p_c1 *= N(model['con_p_c1'][i]['mu'], model['con_p_c1'][i]['sigma'], float(x[i]))
        p_c0 *= N(model['con_p_c0'][i]['mu'], model['con_p_c0'][i]['sigma'], float(x[i]))
    print("p_c0:", p_c0, "\np_c1", p_c1)
    return np.argmax([p_c0, p_c1])

Then start the training and print out the probability values of the training:

model = train_model() 
for key in model.keys():
    print(key, ":\n", model.get(key))

pri_p_c1: a priori probability of a positive class
pri_p_c0: a priori probability of the inverse class
con_p_c1: conditional probability of positive class
con_p_c0: Conditional probability of inverse class
The book "Measure 1" into the model to predict:

text_x1 = ['Dark green','Coiled','Turbid Sound','clear','Sunken','Hard Slide','0.697','0.460']
y = predict(model, text_x1)
if y == 1:
    print("Bad Melon")

The results are as follows:

You can see that the predictions are correct.
In addition, because the result of p152 in the book is uncorrected, it deviates from that in the book and is smaller than that without correction.

Tags: Machine Learning

Posted by JimmyD on Tue, 19 Jul 2022 02:54:34 +0930