“This is the third day of my participation in the First Challenge 2022. For more details, see: First Challenge 2022.”

Preface

So-called machine learning is, most of the time, taking an existing model, making some simple modifications, and then starting the “alchemy”; the main work is tuning parameters, which is why practitioners jokingly call it “parameter tuning” or “alchemy”. I would therefore like to organize and summarize some commonly used machine learning models, partly as personal study notes and partly so that friends who come here to copy the code can start “refining” right away, aiming for “out of the box” usability.

A note before reading: my skill is limited, so this little rookie apologizes to all the big shots in advance 🙏.

The models are ordered roughly by time, which largely follows the development of machine learning algorithms. Every model comes with a PyTorch implementation and a brief introduction to its principle. This article covers the perceptron, the originator of neural networks. The main text begins below 👇

Perceptron preliminaries

The perceptron, also known as an “artificial neuron” or “naive perceptron”, is the basic unit of neural networks. This article first introduces the basic principle of the perceptron and then gives a PyTorch implementation of the perceptron model on concrete classification tasks.

1. Rosenblatt

Rosenblatt is the originator of neural networks. He proposed the perceptron in 1957 and built a hardware-based neural network around 1960. However, this work was challenged by Marvin Minsky and Seymour Papert, and the perceptron fell quiet for nearly 20 years; it was not until Hinton and colleagues popularized the backpropagation (BP) algorithm in the 1980s that neural networks took off again.

2. Fundamentals

Suppose the input space (feature space) is $x \in R^n$ and the output space is $y \in \{+1, -1\}$. The function from the input space to the output space

$$f(x) = \mathrm{sign}(wx + b)$$

is called the perceptron, where $w$ is the weight (or weight vector), $b$ is the bias, and $\mathrm{sign}$ is the sign function:


$$\mathrm{sign}(x)=\begin{cases}1, & x \geq 0\\ -1, & x < 0\end{cases}$$

Given a data set $T = \{(x_1, y_1), (x_2, y_2), \cdots, (x_n, y_n)\}$, the classification learning process with a perceptron is equivalent to solving the following minimization problem:


$$\min_{w,b} L(w,b) = -\sum_{x_i \in M} y_i (w x_i + b)$$

where $M$ is the set of misclassified points; in other words, the perceptron is driven by misclassified points. The updates of $w$ and $b$ use stochastic gradient descent (SGD):


$$w^{i+1} = w^i - \eta \frac{\partial L(w,b)}{\partial w}$$

$$b^{i+1} = b^i - \eta \frac{\partial L(w,b)}{\partial b}$$

where $\eta$ is the learning rate.
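For a single misclassified point $(x_i, y_i)$, the gradients are $\partial L/\partial w = -y_i x_i$ and $\partial L/\partial b = -y_i$, so the SGD step above reduces to the familiar per-sample perceptron update (spelled out here for clarity):

$$w \leftarrow w + \eta\, y_i x_i$$

$$b \leftarrow b + \eta\, y_i$$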

Classifying toy data with a single-layer perceptron

  • Import packages
import numpy as np
import matplotlib.pyplot as plt
import torch
%matplotlib inline
  • Load the data
data = np.genfromtxt('../data/perceptron_toydata.txt', delimiter='\t')
X, y = data[:, :2], data[:, 2]
y = y.astype(int)  # np.int is deprecated in recent NumPy versions

print('Class label counts:', np.bincount(y))
print('X.shape:', X.shape)
print('y.shape:', y.shape)

The output is 👇

Class label counts: [50 50]

X.shape: (100, 2)

y.shape: (100,)

Shuffle the data and randomly split it into training and test sets

shuffle_idx = np.arange(y.shape[0])
shuffle_rng = np.random.RandomState(123)
shuffle_rng.shuffle(shuffle_idx)
X, y = X[shuffle_idx], y[shuffle_idx]

X_train, X_test = X[shuffle_idx[:70]], X[shuffle_idx[70:]]
y_train, y_test = y[shuffle_idx[:70]], y[shuffle_idx[70:]]

After z-score standardization, the standardized data has mean 0 and variance 1, and the shape of each feature's distribution is unchanged.

Models based on distances, inner products, or gradient-driven linear decision boundaries generally require normalization/standardization, e.g. KNN (k-nearest neighbors), k-means clustering, the perceptron, and SVM.

Tree-based models and their Boosting/Bagging ensembles, such as random forest, XGBoost, and LightGBM, as well as naive Bayes, are not sensitive to the scale of feature values; these models generally do not need normalization/standardization.

# Normalize (mean zero, unit variance)
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train = (X_train - mu) / sigma
X_test = (X_test - mu) / sigma
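For reference, the same z-score standardization could also be done with scikit-learn; the sketch below is only an illustration (it assumes scikit-learn is installed and is applied to the raw train/test split, not the already standardized arrays above):

from sklearn.preprocessing import StandardScaler

# fit the scaler on the training split only, then reuse its mean/std on the test split
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)
X_test_std = scaler.transform(X_test)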

Scatter plot of the data 👇; it can clearly be divided into two classes.

plt.scatter(X_train[y_train==0, 0], X_train[y_train==0, 1], label='class 0', marker='o')
plt.scatter(X_train[y_train==1, 0], X_train[y_train==1, 1], label='class 1', marker='s')
plt.xlabel('feature 1')
plt.ylabel('feature 2')
plt.legend()
plt.show()

  • Model definition
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")


def custom_where(cond, x_1, x_2):
    return (cond * x_1) + ((~cond) * x_2)


class Perceptron():
    def __init__(self, num_features):
        self.num_features = num_features
        self.weights = torch.zeros(num_features, 1, 
                                   dtype=torch.float32, device=device)
        self.bias = torch.zeros(1, dtype=torch.float32, device=device)

    def forward(self, x):
        linear = torch.add(torch.mm(x, self.weights), self.bias)
        predictions = custom_where(linear > 0., 1, 0).float()
        return predictions
        
    def backward(self, x, y):  
        predictions = self.forward(x)
        errors = y - predictions
        return errors
        
    def train(self, x, y, epochs):
        for e in range(epochs):
            
            for i in range(y.size()[0]):
                # use view because backward expects a matrix (i.e., 2D tensor)
                errors = self.backward(x[i].view(1, self.num_features), y[i]).view(-1)
                self.weights += (errors * x[i]).view(self.num_features, 1)
                self.bias += errors
                
    def evaluate(self, x, y):
        predictions = self.forward(x).view(-1)
        accuracy = torch.sum(predictions == y).float() / y.size()[0]
        return accuracy
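As a side note, the custom_where helper could also be written with the built-in torch.where. A minimal sketch (the where_alternative name is hypothetical; the article's code above is what is actually used):

def where_alternative(cond, x_1, x_2):
    # torch.where keeps x_1 where cond is True and x_2 elsewhere
    x_1 = torch.as_tensor(x_1, dtype=torch.float32, device=cond.device)
    x_2 = torch.as_tensor(x_2, dtype=torch.float32, device=cond.device)
    return torch.where(cond, x_1, x_2)

Because the result is already float32, the trailing .float() cast in forward would no longer be needed with this variant.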
  • Model training
ppn = Perceptron(num_features=2)

X_train_tensor = torch.tensor(X_train, dtype=torch.float32, device=device)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32, device=device)

ppn.train(X_train_tensor, y_train_tensor, epochs=10)

print('Model parameters:')
print('Weights: %s' % ppn.weights)
print('Bias: %s' % ppn.bias)

The output is 👇

Model parameters:

Weights: tensor([[1.2734], [1.3464]])

Bias: tensor([-1.])

  • Model evaluation
X_test_tensor = torch.tensor(X_test, dtype=torch.float32, device=device)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32, device=device)

test_acc = ppn.evaluate(X_test_tensor, y_test_tensor)
print('Test set accuracy: %.2f%%' % (test_acc*100))

The output is 👇

Test set accuracy: 93.33%

Plot the decision boundary 👇. The boundary is the line $w_1 x_1 + w_2 x_2 + b = 0$, i.e. $x_2 = -(w_1 x_1 + b) / w_2$, which the code below draws over the training set (left) and the test set (right).

w, b = ppn.weights, ppn.bias

x_min = -2
y_min = ( (-(w[0] * x_min) - b[0]) 
          / w[1] )

x_max = 2
y_max = ( (-(w[0] * x_max) - b[0]) 
          / w[1] )


fig, ax = plt.subplots(1, 2, sharex=True, figsize=(7, 3))

ax[0].plot([x_min, x_max], [y_min, y_max])
ax[1].plot([x_min, x_max], [y_min, y_max])

ax[0].scatter(X_train[y_train==0, 0], X_train[y_train==0, 1], label='class 0', marker='o')
ax[0].scatter(X_train[y_train==1, 0], X_train[y_train==1, 1], label='class 1', marker='s')

ax[1].scatter(X_test[y_test==0, 0], X_test[y_test==0, 1], label='class 0', marker='o')
ax[1].scatter(X_test[y_test==1, 0], X_test[y_test==1, 1], label='class 1', marker='s')

ax[1].legend(loc='upper left')
plt.show()

Multilayer perceptron model & handwritten digit recognition

  • Import packages
import time
import numpy as np
import torch
import torch.nn.functional as F
from torchvision import datasets
from torchvision import transforms
from torch.utils.data import DataLoader

if torch.cuda.is_available():
    torch.backends.cudnn.deterministic = True
  • Parameter Settings
# Device
device = torch.device("cuda:3" if torch.cuda.is_available() else "cpu")

# Hyperparameters
random_seed = 1
learning_rate = 0.1
num_epochs = 10
batch_size = 64

# Architecture
num_features = 784
num_hidden_1 = 128
num_hidden_2 = 256
num_classes = 10
  • Load the data
train_dataset = datasets.MNIST(root='data', 
                               train=True, 
                               transform=transforms.ToTensor(),
                               download=True)

test_dataset = datasets.MNIST(root='data', 
                              train=False, 
                              transform=transforms.ToTensor())


train_loader = DataLoader(dataset=train_dataset, 
                          batch_size=batch_size, 
                          shuffle=True)

test_loader = DataLoader(dataset=test_dataset, 
                         batch_size=batch_size, 
                         shuffle=False)

# Checking the dataset
for images, labels in train_loader:  
    print('Image batch dimensions:', images.shape)
    print('Image label dimensions:', labels.shape)
    break

transforms.ToTensor() scales the input images to the [0, 1] range. The output is 👇

Image batch dimensions: torch.Size([64, 1, 28, 28])

Image label dimensions: torch.Size([64])
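As a quick sanity check of that claim (not part of the original output), the pixel range of the inspected batch can be printed:

# ToTensor() converts uint8 pixel values in [0, 255] to float32 values in [0, 1]
print('Pixel range: %.4f to %.4f' % (images.min().item(), images.max().item()))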

  • Model definition
class MultilayerPerceptron(torch.nn.Module):

    def __init__(self, num_features, num_classes):
        super(MultilayerPerceptron, self).__init__()

        ### 1st hidden layer
        self.linear_1 = torch.nn.Linear(num_features, num_hidden_1)
        # weight initialization (PyTorch would also initialize these by default)
        self.linear_1.weight.detach().normal_(0.0, 0.1)
        self.linear_1.bias.detach().zero_()
        #self.linear_1_bn = torch.nn.BatchNorm1d(num_hidden_1)

        ### 2nd hidden layer
        self.linear_2 = torch.nn.Linear(num_hidden_1, num_hidden_2)
        self.linear_2.weight.detach().normal_(0.0, 0.1)
        self.linear_2.bias.detach().zero_()

        ### Output layer
        self.linear_out = torch.nn.Linear(num_hidden_2, num_classes)
        self.linear_out.weight.detach().normal_(0.0, 0.1)
        self.linear_out.bias.detach().zero_()

    def forward(self, x):
        out = self.linear_1(x)
        out = F.relu(out)
        #out = self.linear_1_bn(out)
        out = self.linear_2(out)
        out = F.relu(out)
        #out = F.dropout(out, p=dropout_prob, training=self.training)
        logits = self.linear_out(out)
        probas = F.log_softmax(logits, dim=1)
        return logits, probas


torch.manual_seed(random_seed)
model = MultilayerPerceptron(num_features=num_features,
                             num_classes=num_classes)
model = model.to(device)

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

Regarding the commented-out (#) lines in the code above: BatchNorm speeds up deep network training by reducing internal covariate shift, and Dropout randomly zeroes elements of its input tensor with probability p (using samples from a Bernoulli distribution), which is a common way to deal with overfitting.
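For completeness, here is a minimal sketch of how those commented-out layers could be wired in, assuming a dropout probability of 0.5 (the class name MLPWithBNDropout and the dropout_prob value are illustrative and not part of the original training run):

dropout_prob = 0.5

class MLPWithBNDropout(torch.nn.Module):
    def __init__(self, num_features, num_classes):
        super().__init__()
        self.linear_1 = torch.nn.Linear(num_features, num_hidden_1)
        self.linear_1_bn = torch.nn.BatchNorm1d(num_hidden_1)  # normalizes the 1st hidden layer's activations
        self.linear_2 = torch.nn.Linear(num_hidden_1, num_hidden_2)
        self.linear_out = torch.nn.Linear(num_hidden_2, num_classes)

    def forward(self, x):
        out = F.relu(self.linear_1_bn(self.linear_1(x)))
        out = F.dropout(out, p=dropout_prob, training=self.training)  # active only in training mode
        out = F.relu(self.linear_2(out))
        logits = self.linear_out(out)
        probas = F.log_softmax(logits, dim=1)
        return logits, probas

model.train() and model.eval() then toggle the BatchNorm statistics and Dropout behavior automatically, which is why the training and evaluation code below calls them explicitly.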

  • Model training
def compute_accuracy(net, data_loader):
    net.eval()
    correct_pred, num_examples = 0, 0
    with torch.no_grad():
        for features, targets in data_loader:
            features = features.view(-1, 28*28).to(device)
            targets = targets.to(device)
            logits, probas = net(features)
            _, predicted_labels = torch.max(probas, 1)
            num_examples += targets.size(0)
            correct_pred += (predicted_labels == targets).sum()
        return correct_pred.float()/num_examples * 100
    

Accuracy computation ☝ (net.eval() and torch.no_grad() turn off gradient tracking and any training-only behavior such as dropout during evaluation).

start_time = time.time()
minibatch_cost = []
epoch_acc = []
for epoch in range(num_epochs):
    model.train()
    for batch_idx, (features, targets) in enumerate(train_loader):
        
        features = features.view(-1, 28*28).to(device)
        targets = targets.to(device)
            
        ### FORWARD AND BACK PROP
        logits, probas = model(features)
        cost = F.cross_entropy(logits, targets)
        optimizer.zero_grad()
        
        cost.backward()
        
        ### UPDATE MODEL PARAMETERS
        optimizer.step()
        
        ### LOGGING
        minibatch_cost.append(cost)
        if not batch_idx % 50:
            print ('Epoch: %03d/%03d | Batch %03d/%03d | Cost: %.4f' 
                   %(epoch+1, num_epochs, batch_idx, 
                     len(train_loader), cost))

    with torch.set_grad_enabled(False):
        acc = compute_accuracy(model, train_loader)
        epoch_acc.append(acc)
        print('Epoch: %03d/%03d training accuracy: %.2f%%' % (
              epoch+1, num_epochs, acc))
        
    print('Time elapsed: %.2f min' % ((time.time() - start_time)/60))
    
print('Total Training Time: %.2f min' % ((time.time() - start_time)/60))

Visualization of training process

import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

plt.plot(range(len(minibatch_cost)), minibatch_cost)
plt.ylabel('Train loss')
plt.xlabel('Minibatch')
plt.show()

plt.plot(range(len(epoch_acc)), epoch_acc)
plt.ylabel('Train Acc')
plt.xlabel('Epoch')
plt.show()

☝ This fails as written, because every element of minibatch_cost is a tensor that is still attached to the computation graph (and lives on the GPU when CUDA is used), so it cannot be converted to NumPy directly. Detach the values and move them to the CPU first (the same applies to epoch_acc if its entries are CUDA tensors):

minibatch_cost = [a.detach().cpu().numpy() for a in minibatch_cost]
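Alternatively, a small variation on the training loop (not the article's original code) is to store a plain Python float at logging time, which avoids holding on to graph references in the first place:

### LOGGING (alternative): .item() returns a detached Python float
minibatch_cost.append(cost.item())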

The loss and accuracy curves from running 50 epochs are shown below 👇

  • Model evaluation

Accuracy on the test set

print('Test accuracy: %.2f%%' % (compute_accuracy(model, test_loader)))

The results are as follows 👇

Test accuracy: 98.04%

for features, targets in test_loader:
    break

_, predictions = model.forward(features[:4].view(-1, 28*28).to(device))
predictions = torch.argmax(predictions, dim=1)
predictions = predictions.tolist()

fig, ax = plt.subplots(1, 4)
for i in range(4):
    ax[i].imshow(features[i].view(28, 28), cmap=matplotlib.cm.binary)
    ax[i].set_title("Predicted:" + str(predictions[i]))

plt.show()

❤️ Thank you

Thank you all for reading this, if you find it helpful:

  1. Please give this post a like so that more people can see it (it's rough being a rookie with no likes 🤡; big shots, please go easy on me -_-).
  2. Share your thoughts with me in the comments section, where you can also jot down your own notes as you learn.

Thank you again for your encouragement and support 🌹🌹🌹