This is the 17th day of my participation in the August More Text Challenge. For details, see: August More Text Challenge.

1. Introduction

Compared with traditional machine learning, the datasets used in deep learning are more complex, and most of the time we cannot simply lay the data out in a table to inspect it. In practice, once the model structure is designed we usually train the model directly and judge it only by a few evaluation metrics; the inside of a neural network remains a "black box". We basically control the process, the input data, and the observed outputs, but during the learning stage, and especially during optimization, we would still like to observe the data and the modeling process from more angles. So here we create our own raw data for experiments, and use those experiments to understand how the model works, making the black box a little more transparent.

2. Creating a Regression Dataset

Since the features and labels of a regression model are continuous values, we can generate continuous data for it. We plan to create two features, with the linear relationship y = 2x₁ − x₂ + 1 (including an intercept) between the independent variables and the dependent variable y. The independent variables are drawn from the standard normal distribution, and some noise is added on top, because data collected in the real world always carries some error and never describes the underlying law perfectly. Generating data according to a known rule and then adding an artificial disturbance term is a common way to create experimental data in mathematical modeling.

```python
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np
import random
import torch
from torch import nn, optim
import torch.nn.functional as F
from torch.utils.data import Dataset, TensorDataset, DataLoader

# number of features and number of examples
num_inputs = 2
num_examples = 1000

# set the random seed for reproducibility
torch.manual_seed(300)

# coefficients of the linear equation y = 2*x1 - x2 + 1
w_true = torch.tensor([2., -1]).reshape(2, 1)
b_true = torch.tensor(1.)
features = torch.randn(num_examples, num_inputs)
labels_true = torch.mm(features, w_true) + b_true
labels = labels_true + torch.randn(size=labels_true.shape) * 0.01
```

2.1 Data Exploration

```python
# preview the data
features[:, 0]
labels[:5]

# plot each feature against the labels
plt.subplot(1, 2, 1)
plt.scatter(features[:, 0], labels, color='r')
plt.subplot(1, 2, 2)
plt.scatter(features[:, 1], labels, color='orange')
```

We use plt.scatter rather than plt.plot here because scatter is faster when drawing a large number of points. The figure shows that both features have a clear linear relationship with the labels, and the strength of that relationship is tied to the absolute value of each feature's coefficient. To make the linear model harder to fit, we can increase the scale of the disturbance term and thereby weaken the linear relationship.

2.2 Increasing the Disturbance Term for Comparison

```python
torch.manual_seed(420)

# regenerate the labels with a much larger disturbance term
labels1 = labels_true + torch.randn(size=labels_true.shape) * 2

# original labels (top row) vs. noisier labels (bottom row)
plt.subplot(221)
plt.scatter(features[:, 0], labels, color='r')
plt.subplot(222)
plt.scatter(features[:, 1], labels, color='orange')
plt.subplot(223)
plt.scatter(features[:, 0], labels1, color='blue')
plt.subplot(224)
plt.scatter(features[:, 1], labels1, color='green')
```

Increasing the perturbation term clearly weakens the linear relationship.
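We can also quantify this instead of just eyeballing the plots. The following quick check is my own addition (it is not in the original code); it uses torch.corrcoef, available in PyTorch 1.10 and later, on the tensors created above:

```python
# My own addition: measure how the larger disturbance term weakens the
# linear relationship between the first feature and the labels.
# Requires the features / labels / labels1 tensors created above.
for name, y in [('delta = 0.01', labels), ('delta = 2', labels1)]:
    r = torch.corrcoef(torch.stack([features[:, 0], y.flatten()]))[0, 1]
    print(name, round(r.item(), 3))  # the correlation shrinks as the noise grows
```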

2.3 Generating a Nonlinear Dataset: y = 2x² + 1

```python
torch.manual_seed(420)

num_inputs = 2
num_examples = 1000
w_true = torch.tensor(2.)
b_true = torch.tensor(1.)
features = torch.randn(num_examples, num_inputs)
labels_true = torch.pow(features, 2) * w_true + b_true
labels = labels_true + torch.randn(size=labels_true.shape) * 0.1
plt.scatter(features, labels, color='red')
```

2.4 Encapsulation into functions

```python
def tensorGenReg(num_examples=1000, w=[2, -1, 1], bias=True, delta=0.01, deg=1):
    """Create a regression dataset.

    :param num_examples: number of examples in the dataset
    :param w: vector of feature coefficients (including the intercept, if present)
    :param bias: whether to include an intercept
    :param delta: scale of the disturbance term
    :param deg: degree of the equation
    :return: the generated feature and label tensors
    """
    if bias == True:
        num_inputs = len(w) - 1                                # number of features
        features_true = torch.randn(num_examples, num_inputs)  # feature tensor
        w_true = torch.tensor(w[:-1]).reshape(-1, 1).float()   # coefficient vector
        b_true = torch.tensor(w[-1]).float()                   # intercept
        if num_inputs == 1:                                    # single feature: plain multiplication
            labels_true = torch.pow(features_true, deg) * w_true + b_true
        else:                                                  # multiple features: matrix multiplication
            labels_true = torch.mm(torch.pow(features_true, deg), w_true) + b_true
        # append a column of ones so the intercept can be treated as a weight
        features = torch.cat((features_true, torch.ones(len(features_true), 1)), 1)
        labels = labels_true + torch.randn(size=labels_true.shape) * delta
    else:
        num_inputs = len(w)
        features = torch.randn(num_examples, num_inputs)
        w_true = torch.tensor(w).reshape(-1, 1).float()
        if num_inputs == 1:
            labels_true = torch.pow(features, deg) * w_true
        else:
            labels_true = torch.mm(torch.pow(features, deg), w_true)
        labels = labels_true + torch.randn(size=labels_true.shape) * delta
    return features, labels
```
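Before the tests below, here is a quick sanity check of the bias=False, single-feature branch, which the tests in 2.4.1 and 2.4.2 do not exercise. This check is my own addition:

```python
# My own addition: exercise the bias=False, single-feature branch
# (y = 2x, with no intercept column appended to the features).
torch.manual_seed(300)
f, l = tensorGenReg(w=[2], bias=False, delta=0.01)
print(f.shape, l.shape)   # torch.Size([1000, 1]) torch.Size([1000, 1])
plt.scatter(f, l)
```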

2.4.1 Linear functions

```python
# test the function with its default (linear) settings
torch.manual_seed(300)

f, l = tensorGenReg(delta=0.01)
print(f)
plt.subplot(223)
plt.scatter(f[:, 0], l)
plt.subplot(224)
plt.scatter(f[:, 1], l)
```

2.4.2 Second order functions

```python
torch.manual_seed(300)

f, l = tensorGenReg(deg=2)
plt.subplot(2, 2, 3)
plt.scatter(f[:, 0], l)
plt.subplot(2, 2, 4)
plt.scatter(f[:, 1], l)
```

3. Creating a Classification Dataset

3.1 Creating the Dataset

Create a three-class dataset with two features, where each class contains 500 examples. Both features of the first class follow a normal distribution with mean 4 and standard deviation 2; both features of the second class have mean −2 and standard deviation 2; and both features of the third class have mean −6 and standard deviation 2.

```python
torch.manual_seed(300)

num_inputs = 2
num_examples = 500

# create the features of each class
data0 = torch.normal(4, 2, size=(num_examples, num_inputs))
data1 = torch.normal(-2, 2, size=(num_examples, num_inputs))
data2 = torch.normal(-6, 2, size=(num_examples, num_inputs))

# create the labels
label0 = torch.zeros(500)
label1 = torch.ones(500)
label2 = torch.full_like(label1, 2)

features = torch.cat((data0, data1, data2)).float()
labels = torch.cat((label0, label1, label2)).long().reshape(-1, 1)
```

```python
# preview the data and color the scatter plot by class
print(features[:5])
print(labels[:5])
plt.scatter(features[:, 0], features[:, 1], c=labels)
```

The classes barely overlap, so a classifier will perform well on this dataset. To make the classification task harder, we can compress the class means toward each other and increase the variance, which increases the overlap visible in the two-dimensional plot.

3.2 Increasing the Classification Difficulty

```python
torch.manual_seed(420)

num_inputs = 2
num_examples = 500

# means pulled closer together (-3, 0, 3) and a larger standard deviation (4)
data0 = torch.normal(-3, 4, size=(num_examples, num_inputs))
data1 = torch.normal(0, 4, size=(num_examples, num_inputs))
data2 = torch.normal(3, 4, size=(num_examples, num_inputs))

label0 = torch.zeros(500)
label1 = torch.ones(500)
label2 = torch.full_like(label1, 2)

features1 = torch.cat((data0, data1, data2)).float()
labels1 = torch.cat((label0, label1, label2)).long().reshape(-1, 1)

# easy dataset (left) vs. harder dataset (right)
plt.subplot(1, 2, 1)
plt.scatter(features[:, 0], features[:, 1], c=labels)
plt.subplot(1, 2, 2)
plt.scatter(features1[:, 0], features1[:, 1], c=labels1)
```

3.3 Encapsulation into functions

```python
def tensorGenCla(num_examples=500, num_inputs=2, num_class=3, deg_dispersion=[4, 2], bias=False):
    """Create a classification dataset.

    :param num_examples: number of examples per class
    :param num_inputs: number of features
    :param num_class: number of label classes
    :param deg_dispersion: dispersion of the data distribution; the first element
        is the reference for the class means, the second is the standard deviation
    :param bias: whether to append an intercept column to the features
    :return: the generated feature and label tensors
    """
    cluster_l = torch.empty(num_examples, 1)   # label template for one class
    mean_ = deg_dispersion[0]                  # reference for the class means
    std_ = deg_dispersion[1]                   # standard deviation
    lf = []                                    # list of per-class feature tensors
    ll = []                                    # list of per-class label tensors
    k = mean_ * (num_class - 1) / 2            # offset that centers the means around zero

    for i in range(num_class):
        data_temp = torch.normal(i * mean_ - k, std_, size=(num_examples, num_inputs))
        lf.append(data_temp)
        labels_temp = torch.full_like(cluster_l, i)
        ll.append(labels_temp)

    features = torch.cat(lf).float()
    labels = torch.cat(ll).long()
    if bias == True:
        features = torch.cat((features, torch.ones(len(features), 1)), 1)
    return features, labels
```
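As a quick check of the bias option, which the comparison below does not exercise, the following snippet is my own addition; it simply confirms that bias=True appends a column of ones to the features:

```python
# My own addition: with bias=True the features gain an all-ones intercept column.
torch.manual_seed(420)
f, l = tensorGenCla(bias=True)
print(f.shape)   # torch.Size([1500, 3]) -- 2 features + 1 intercept column
print(f[:2])     # the last column is all ones
```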

3.4 Comparing Degrees of Dispersion

```python
torch.manual_seed(420)

# same mean spacing, different standard deviations
f, l = tensorGenCla(deg_dispersion=[6, 2])
f1, l1 = tensorGenCla(deg_dispersion=[6, 6])

plt.subplot(1, 2, 1)
plt.scatter(f[:, 0], f[:, 1], c=l)
plt.subplot(1, 2, 2)
plt.scatter(f1[:, 0], f1[:, 1], c=l1)
```

4. Creating a Mini-Batch Splitting Function

In deep learning, gradient descent is the most common way to optimize the objective function, and different variants of it suit objective functions with different characteristics. Mini-batch gradient descent (MBGD) has become the "universal" optimizer: like stochastic gradient descent (SGD) it can jump across local minima, while retaining much of the fast convergence of batch gradient descent (BGD). To run mini-batch gradient descent, we first need a function that splits the dataset into batches; a minimal training-loop sketch using it appears at the end of this section.

```python
def data_iter(batch_size, features, labels):
    """Split a dataset into shuffled mini-batches.

    :param batch_size: number of examples in each batch
    :param features: feature tensor
    :param labels: label tensor
    :return: a list of [features, labels] pairs, one per batch
    """
    num_examples = len(features)
    indices = list(range(num_examples))
    random.shuffle(indices)                 # shuffle so each batch is a random draw
    l = []
    for i in range(0, num_examples, batch_size):
        # the last batch may contain fewer than batch_size examples
        j = torch.tensor(indices[i:min(i + batch_size, num_examples)])
        l.append([torch.index_select(features, 0, j),
                  torch.index_select(labels, 0, j)])
    return l
```
```python
torch.manual_seed(420)

features, labels = tensorGenCla()
l = data_iter(5, features, labels)
l[1]
```

As expected, the labels within each batch come out shuffled.
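To close, here is the training-loop sketch promised above. It is my own addition, not part of the original post: a minimal mini-batch gradient descent loop that plugs data_iter into the linear regression data from section 2, assuming tensorGenReg and data_iter are defined as above.

```python
# My own addition: a minimal MBGD loop built on data_iter.
torch.manual_seed(300)
features, labels = tensorGenReg(delta=0.01)   # features include the intercept column
w = torch.zeros(3, 1, requires_grad=True)     # 2 coefficients + 1 intercept
lr = 0.03

for epoch in range(3):
    for X, y in data_iter(10, features, labels):
        loss = ((torch.mm(X, w) - y) ** 2).mean()   # MSE on the current batch
        loss.backward()
        with torch.no_grad():                       # manual gradient-descent update
            w -= lr * w.grad
            w.grad.zero_()

print(w)   # should approach the true coefficients [2, -1, 1]
```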