
1. Introduction

Softmax is mainly used for multi-class classification. For classification problems, a general approach is to encode the labels as dummy variables and then fit a softmax regression; this works for both binary and multi-class problems. The softmax transformation is

$$\sigma_k = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}$$

which rescales each score into the interval (0, 1) so that the largest score corresponds to the largest probability. Compared with taking a hard max, softmax avoids the problem that the loss function is not differentiable at certain points (training relies on backpropagation, which requires derivatives); after the softmax transformation the loss is smooth and can be differentiated everywhere.
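As a quick numerical illustration (a minimal PyTorch sketch, not part of the modeling code that follows), applying softmax to the vector [1, 2, 3] turns it into probabilities that sum to 1 while keeping the transformation differentiable:

```python
import torch

z = torch.tensor([1.0, 2.0, 3.0])
p = torch.exp(z) / torch.exp(z).sum()    # softmax computed by hand
print(p)                                 # tensor([0.0900, 0.2447, 0.6652]); values in (0, 1), summing to 1
print(torch.softmax(z, dim=0))           # same result with the built-in function
```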

2. Manual implementation of softmax regression

2.1 Modeling Process

2.1.1 Model selection

Construct a neural network with only one linear layer for modeling. Each neuron in the output layer gives the softmax-transformed score of a sample for one of the three classes. The network thus has two layers (input and output) and is fully connected, and the mapping from features to outputs is no longer a plain linear equation but a matrix multiplication followed by a softmax transformation:

```python
import torch

def softmax(X, w):
    m = torch.exp(torch.mm(X, w))            # element-wise exponential of the linear scores
    sp = torch.sum(m, 1).reshape(-1, 1)      # row sums, reshaped to a column for broadcasting
    return m / sp                            # each row becomes a probability distribution
```

X is the feature tensor and w is the matrix of connection weights between the two layers: the number of rows of w equals the number of input features, and the number of columns equals the number of neurons in the output layer, i.e. the total number of classes.
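To make the shapes concrete, here is a small hedged check with random tensors (100 samples, 3 input columns, 3 classes, values chosen only for illustration), reusing the softmax function defined above; each output row is a probability distribution over the classes:

```python
import torch

X = torch.randn(100, 3)       # 100 samples, 3 input columns (e.g. 2 features plus a bias column)
w = torch.randn(3, 3)         # rows = input features, columns = number of classes
out = softmax(X, w)           # uses the softmax function defined above
print(out.shape)              # torch.Size([100, 3])
print(out.sum(1)[:5])         # each row sums to 1
```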

2.1.2 Determine the objective function

```python
def m_cross_entropy(soft_z, y):
    y = y.long()                                # class labels must be integer indices
    prob_real = torch.gather(soft_z, 1, y)      # each sample's predicted probability of its true class
    return -(1 / y.numel()) * torch.log(torch.prod(prob_real))
```

The cross-entropy loss is essentially a function of the parameter w. During backpropagation we treat w as a leaf node and gradually update its value from the computed gradients, as the small check below illustrates.
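A minimal check of this (a sketch with toy tensors, reusing the softmax and m_cross_entropy functions defined above): after calling backward() on the loss, the gradient accumulates in w.grad because w is a leaf tensor.

```python
import torch

torch.manual_seed(0)
X = torch.randn(5, 3)                        # 5 toy samples, 3 input columns
w = torch.randn(3, 3, requires_grad=True)    # leaf node: weights for 3 classes
y = torch.randint(0, 3, (5, 1))              # class labels as a column vector, as m_cross_entropy expects
l = m_cross_entropy(softmax(X, w), y)
l.backward()
print(w.grad)                                # gradient of the loss with respect to the leaf node w
```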

2.1.3 Define the optimization algorithm

```python
# accuracy: the proportion of samples whose predicted class matches the label
def m_accuracy(soft_z, y):
    acc_bool = torch.argmax(soft_z, 1).flatten() == y.flatten()
    acc = torch.mean(acc_bool.float())
    return acc

# mini-batch stochastic gradient descent: update the parameters in place, then clear the gradient
def sgd(params, lr):
    params.data -= lr * params.grad
    params.grad.zero_()
```
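The training code below also relies on two helper functions, tensorGenCla (which generates a synthetic classification dataset) and data_iter (which yields shuffled mini-batches), that are not defined in this post. The following is only a hypothetical stand-in, under assumed names, signatures, and behavior, so that the code above can be run end to end; the actual helpers may differ.

```python
import random
import torch

def tensorGenCla(num_examples=500, num_inputs=2, num_class=3, deg_dispersion=[4, 2], bias=False):
    """Hypothetical stand-in: Gaussian clusters; deg_dispersion = [center spacing, cluster std]."""
    mean_, std_ = deg_dispersion
    feats, labs = [], []
    for k in range(num_class):
        center = torch.full((num_inputs,), (k - 1) * mean_, dtype=torch.float)
        feats.append(torch.randn(num_examples, num_inputs) * std_ + center)
        labs.append(torch.full((num_examples, 1), k, dtype=torch.float))
    features, labels = torch.cat(feats), torch.cat(labs)
    if bias:  # append a column of ones so w can absorb the intercept
        features = torch.cat([features, torch.ones(len(features), 1)], 1)
    return features, labels

def data_iter(batch_size, features, labels):
    """Yield shuffled mini-batches of (features, labels)."""
    indices = list(range(len(features)))
    random.shuffle(indices)
    for i in range(0, len(features), batch_size):
        idx = torch.tensor(indices[i: i + batch_size])
        yield features[idx], labels[idx]
```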

2.1.4 Training model

```python
import matplotlib.pyplot as plt

# generate the dataset and take a quick look at it
torch.manual_seed(300)
features, labels = tensorGenCla(bias=True, deg_dispersion=[6, 2])
plt.scatter(features[:, 0], features[:, 1], c=labels, cmap='rainbow')

torch.manual_seed(300)                        # set the random seed
batch_size = 10                               # size of each mini-batch
lr = 0.03                                     # learning rate
num_epochs = 3                                # number of passes over the data
w = torch.randn(3, 3, requires_grad=True)     # random initial weights
net = softmax                                 # model: the softmax regression defined above
loss = m_cross_entropy                        # loss function

# train the model
for epoch in range(num_epochs):
    for X, y in data_iter(batch_size, features, labels):
        l = loss(net(X, w), y)
        l.backward()
        sgd(w, lr)
    train_acc = m_accuracy(net(features, w), labels)
    print('epoch %d, acc %f' % (epoch + 1, train_acc))
```

View the model's trained weights:

```python
w
```

2.1.5 Model debugging

Several iterations were performed to observe the convergence rate of the model

```python
torch.manual_seed(300)                        # set the random seed
w = torch.randn(3, 3, requires_grad=True)     # random initial weights
train_acc = []

for i in range(num_epochs):
    for epochs in range(i):
        for X, y in data_iter(batch_size, features, labels):
            l = loss(net(X, w), y)
            l.backward()
            sgd(w, lr)
    train_acc.append(m_accuracy(net(features, w), labels))

plt.plot(list(range(num_epochs)), train_acc)
```

As in the earlier logistic regression experiment, the model converges faster when the within-class dispersion of the data is low. A reasonable conjecture is that the random initial value of w in each run affects the convergence speed, which the next experiment checks by re-initializing w several times.

```python
train_acc = []

for i in range(10):
    w = torch.randn(3, 3, requires_grad=True)    # re-initialize w for each run
    for epoch in range(10):
        for X, y in data_iter(batch_size, features, labels):
            l = loss(net(X, w), y)
            l.backward()
            sgd(w, lr)
    train_acc.append(m_accuracy(net(features, w), labels))

plt.plot(list(range(10)), train_acc)
```

Although the random initial value of w affects the accuracy of the early-stage model, so that runs which converge quickly simply start from different points on the loss surface, gradient descent finds (or approaches) the minimum after enough iterations in every run. This both confirms that the gradient descent algorithm itself works and suggests that, for this dataset, finding (approaching) the minimum of the loss function is not difficult.

3. Softmax regression with library calls

3.1 Modeling workflow with library calls

3.1.1 Defining core parameters

Softmax regression can be implemented quickly by calling PyTorch's built-in modules. First set the core hyperparameters and build the dataset:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

batch_size = 10      # size of each mini-batch
lr = 0.03            # learning rate
num_epochs = 3       # number of passes over the data during training

torch.manual_seed(300)                                   # create the dataset
features, labels = tensorGenCla(deg_dispersion=[6, 2])
labels = labels.float()                                  # labels for the loss function must be floating point
data = TensorDataset(features, labels)
batchData = DataLoader(data, batch_size=batch_size, shuffle=True)
features
```

3.1.2 Defining the model

```python
import torch.nn as nn

class softmaxR(nn.Module):
    def __init__(self, in_features=2, out_features=3, bias=False):
        super(softmaxR, self).__init__()
        self.linear = nn.Linear(in_features, out_features, bias=bias)

    def forward(self, x):
        out = self.linear(x)          # raw scores (logits); no softmax applied here
        return out

softmax_model = softmaxR()
```
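As a quick sanity check (a small sketch with a random batch), the module maps a batch of 10 samples with 2 features to a 10×3 matrix of raw, un-normalized scores; the softmax itself is handled later by the loss function:

```python
import torch

x = torch.randn(10, 2)        # a batch of 10 samples with 2 features
zhat = softmax_model(x)       # raw scores (logits), not yet passed through softmax
print(zhat.shape)             # torch.Size([10, 3])
```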

3.1.3 Define the loss function

```python
criterion = nn.CrossEntropyLoss()
```
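This works because nn.CrossEntropyLoss combines log-softmax and negative log-likelihood internally, which is why softmaxR returns raw logits and why the labels passed to it must be integer class indices. A small check of the equivalence (a sketch with toy tensors):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 3)                          # raw model outputs for 4 samples, 3 classes
y = torch.tensor([0, 2, 1, 2])                      # integer class labels
loss_a = criterion(logits, y)                       # nn.CrossEntropyLoss on the logits
loss_b = F.nll_loss(F.log_softmax(logits, 1), y)    # log-softmax + NLL computed explicitly
print(torch.isclose(loss_a, loss_b))                # tensor(True)
```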

3.1.4 Define optimization methods

```python
from torch import optim

optimizer = optim.SGD(softmax_model.parameters(), lr=lr)
```

3.1.5 Model training

```python
# define the training function
def fit(net, criterion, optimizer, batchdata, epochs):
    for epoch in range(epochs):
        for X, y in batchdata:
            zhat = net.forward(X)
            y = y.flatten().long()        # CrossEntropyLoss expects integer class labels
            loss = criterion(zhat, y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

fit(net=softmax_model,
    criterion=criterion,
    optimizer=optimizer,
    batchdata=batchData,
    epochs=num_epochs)
```

3.1.6 Viewing Models and Parameters

```python
list(softmax_model.parameters())
```

3.1.7 Calculate cross entropy and accuracy

```python
import torch.nn.functional as F

# cross entropy on the full dataset
criterion(softmax_model(features), labels.flatten().long())
# accuracy: apply softmax to the logits, then compare predicted classes with the labels
m_accuracy(F.softmax(softmax_model(features), 1), labels)
```
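A side note on this accuracy computation: applying F.softmax first mirrors the manual implementation, but since softmax preserves the ordering of the scores it does not change the argmax, so computing accuracy directly on the logits would give the same result. A quick check (a sketch reusing the trained model and data):

```python
import torch
import torch.nn.functional as F

logits = softmax_model(features)
same = torch.equal(torch.argmax(logits, 1),
                   torch.argmax(F.softmax(logits, 1), 1))
print(same)    # True: softmax does not change which class scores highest
```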

3.2 Model debugging

Iterate the above model several times to see if the speed and accuracy are improved.

```python
features, labels = tensorGenCla(deg_dispersion=[6, 2])
labels = labels.float()
data = TensorDataset(features, labels)
batchData = DataLoader(data, batch_size=batch_size, shuffle=True)

torch.manual_seed(300)      # set the random seed
num_epochs = 20             # num_epochs = 20, per the discussion below
SF1 = softmaxR()
cr1 = nn.CrossEntropyLoss()
op1 = optim.SGD(SF1.parameters(), lr=lr)

train_acc = []
for epochs in range(num_epochs):
    fit(net=SF1,
        criterion=cr1,
        optimizer=op1,
        batchdata=batchData,
        epochs=epochs)
    epoch_acc = m_accuracy(F.softmax(SF1(features), 1), labels)
    train_acc.append(epoch_acc)

# plot the change in accuracy
plt.plot(list(range(num_epochs)), train_acc)
```

As with the manual implementation, the model converges very quickly here as well. Note that because each outer iteration calls fit with a larger epochs value without re-initializing the model, when num_epochs = 20 the parameters of SF1 have by the end been trained for (1 + 2 + ... + 19) = 190 epochs in total.

3.3 Increasing the difficulty of the classification task

```python
torch.manual_seed(300)
features, labels = tensorGenCla(deg_dispersion=[6, 6])   # larger dispersion: a harder classification task
labels = labels.float()
data = TensorDataset(features, labels)
batchData = DataLoader(data, batch_size=batch_size, shuffle=True)
plt.scatter(features[:, 0], features[:, 1], c=labels, cmap='rainbow')
```

```python
num_epochs = 30
SF1 = softmaxR()
cr1 = nn.CrossEntropyLoss()
op1 = optim.SGD(SF1.parameters(), lr=lr)

train_acc = []
for epochs in range(num_epochs):
    fit(net=SF1,
        criterion=cr1,
        optimizer=op1,
        batchdata=batchData,
        epochs=epochs)
    epoch_acc = m_accuracy(F.softmax(SF1(features), 1), labels)
    train_acc.append(epoch_acc)

plt.grid(alpha=0.3)
plt.plot(list(range(num_epochs)), train_acc)
```

The model still converges quickly and soon reaches a fairly stable state. However, as in the earlier logistic regression experiment, although the results are stable, accuracy stays at only about 65% because the dataset is now much harder to separate. In general this means the model has hit the ceiling of its discriminative power: it can no longer capture additional structure in the data. The loss has essentially reached (or approached) its minimum, yet the evaluation metric cannot improve further. As a first check, we can train from several different initial values of w to verify that the loss has indeed approached the global minimum rather than settling near a local one.

3.4 Re-initializing w

```python
cr1 = nn.CrossEntropyLoss()
train_acc = []

for i in range(num_epochs):
    SF1 = softmaxR()                     # re-initialize the model (and hence w) for each run
    op1 = optim.SGD(SF1.parameters(), lr=lr)
    fit(net=SF1,
        criterion=cr1,
        optimizer=op1,
        batchdata=batchData,
        epochs=epochs)
    epoch_acc = m_accuracy(F.softmax(SF1(features), 1), labels)
    train_acc.append(epoch_acc)

plt.grid(alpha=0.3)
plt.plot(list(range(num_epochs)), train_acc)
```

After training from several different initial values of w, the final accuracy still settles at around 65%, which further confirms that the iteration itself is fine and the model has reached (or approached) the minimum of the loss. In other words, the problem lies not in solving the loss function but in how the loss function is constructed: even at its minimum, this loss cannot push the model's performance any higher. Since the construction of the loss function is tied directly to the structure of the model, the model itself must be adjusted to improve the results further.