This post implements a convolutional neural network for gender classification from face images, and visualizes the features extracted at each convolution stage.

GitHub address: github.com/chenlinzhon…

Convolutional neural network

  • Convolutional neural networks were originally designed for image recognition, but they are now also applied to time-series and text data. A convolutional network needs no separate feature-extraction step: during training, the network learns to extract the main features automatically.
  • A convolutional neural network takes all the pixels of the original image directly as input, but its internal structure is not fully connected. Because image data is spatially organized, each pixel is related to its neighbors and has essentially no connection to pixels far away. Each neuron therefore only needs to accept local pixels as input, and the local information is then aggregated to obtain global information. Weight sharing and pooling greatly reduce the number of model parameters and improve training efficiency (see the sketch below).
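To make the savings concrete, here is a rough back-of-the-envelope count, added for illustration, using this post's 112*92*3 images (the 512-unit hidden layer is just an example):

# Fully connected: every pixel of a 112x92x3 image wired to 512 hidden units
fc_weights = 112 * 92 * 3 * 512      # 15826944, about 15.8 million weights
# Convolution: 16 shared 3x3x3 kernels, independent of the image size
conv_weights = 3 * 3 * 3 * 16        # 432 weights
print(fc_weights, conv_weights)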

Main features of convolutional neural network

  • Weight sharing: a convolution layer can have multiple convolution kernels. Each convolution of a kernel over the original image maps to a new 2D image, and every pixel of that new image comes from the same convolution kernel. This is weight sharing.
  • Pooling: down-sampling. After convolution (filtering) and the activation function, pooling keeps the pixel with the highest value in each pixel block, retaining the most important feature. For example, 2x2 max pooling reduces a 2x2 pixel block to a single pixel (see the sketch below).
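A minimal numpy sketch of 2x2 max pooling, added for illustration (not from the original post):

import numpy as np

img = np.array([[1, 3, 2, 4],
                [5, 6, 1, 2],
                [7, 2, 9, 1],
                [3, 4, 5, 8]])

# Take the maximum of each non-overlapping 2x2 block: 4x4 -> 2x2
pooled = img.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[6 4]
               #  [7 9]]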

Training data of the convolutional network (112*92*3 images)

Read the data from the Data directory: the female folder stores female images and the male folder stores male images.

import os
import cv2
import numpy as np

images = []
labels = []

def read_img(img_list, flag):
    for path in img_list:
        if os.path.isfile(path):
            images.append(cv2.imread(path).flatten())
            labels.append(flag)

# One-hot labels: male -> [0, 1], female -> [1, 0]
read_img(get_img_list('male'), [0, 1])
read_img(get_img_list('female'), [1, 0])

images = np.array(images)
labels = np.array(labels)
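The helper get_img_list is not shown here; a minimal sketch, assuming the Data directory layout described above (the exact paths are an assumption):

def get_img_list(gender):
    # e.g. Data/male/*.jpg or Data/female/*.jpg
    img_dir = os.path.join('Data', gender)
    return [os.path.join(img_dir, name) for name in os.listdir(img_dir)]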

Shuffle the data

permutation = np.random.permutation(labels.shape[0])
all_images = images[permutation,:]
all_labels = labels[permutation,:]

The ratio of training set to test set is 8:2

train_total = all_images.shape[0]
train_nums = int(all_images.shape[0] * 0.8)
test_nums = all_images.shape[0] - train_nums

# training set
images = all_images[0:train_nums,:]
labels = all_labels[0:train_nums,:]

# test set
test_images = all_images[train_nums:train_total,:]
test_labels = all_labels[train_nums:train_total,:]

The training parameters

import random

train_epochs = 3000                  # number of training iterations
batch_size = random.randint(6, 17)   # batch size, drawn at random for each training step
drop_prob = 0.4                      # dropout parameter (see the note at the first fully connected layer)
learning_rate = 0.00001              # learning rate

The network structure

Input layer: the input image, size -1 x 112 x 92 x 3
First convolution layer: kernel size, depth and count (3, 3, 3, 16); feature tensor after pooling: -1 x 56 x 46 x 16
Second convolution layer: kernel (3, 3, 16, 32); feature tensor after pooling: -1 x 28 x 23 x 32
Third convolution layer: kernel (3, 3, 32, 64); feature tensor after pooling: -1 x 14 x 12 x 64
Fully connected layer 1 weight matrix: 10752 x 512
Fully connected layer 2 weight matrix: 512 x 128
Output layer (from the fully connected hidden layer): 128 x 2

Helper functions

import tensorflow as tf

# Weight initialization (convolution kernel initialization)
# Unlike tf.random_normal(), tf.truncated_normal() discards values more than
# two standard deviations away from the mean
# The shape argument is a list, e.g. [5, 5, 1, 32]:
# 5 is the convolution kernel size; the third number is the number of input
# channels (3 when convolving color images, 1 for grayscale images);
# the last number, 32, is the number of convolution kernels
# (i.e. the number of features the convolution layer extracts)
def weight_init(shape):
    weight = tf.truncated_normal(shape, stddev=0.1, dtype=tf.float32)
    return tf.Variable(weight)

# Bias initialization
def bias_init(shape):
    bias = tf.random_normal(shape,dtype=tf.float32)
    return tf.Variable(bias)

# Fully connected weight matrix initialization (uniform distribution)
def fch_init(layer1, layer2, const=1):
    minval = -const * (6.0 / (layer1 + layer2))
    maxval = -minval
    weight = tf.random_uniform([layer1, layer2], minval=minval, maxval=maxval, dtype=tf.float32)
    return tf.Variable(weight)
    
# The source code is in tensorflow/python/ops/nn_impl.py and nn_ops.py
# This function takes two arguments: images holds the image pixels and weight is the convolution kernel
# images has shape [batch, height, width, channels]
# weight has shape [height, width, channels, channels_multiplier]
# tf.nn.conv2d() is the two-dimensional convolution function
# strides is the step of the kernel movement; the four ones correspond to the
# four dimensions of the input tensor
# padding='SAME' pads the original input pixels so that the convolved 2D image
# is the SAME size as the original image
# Padding means surrounding the original pixel matrix with 0-valued pixels
# Without padding, a 32x32 image convolved with a 5x5 kernel maps to a 28x28 image
def conv2d(images, weight):
    return tf.nn.conv2d(images, weight, strides=[1, 1, 1, 1], padding='SAME')


    

Padding

The movement of the convolution kernel over the image is governed by the padding mode, of which there are two: SAME and VALID. The kernel's stride may not divide the width of the image evenly, so at the border of some images a few pixels cannot be convolved. Sampling that does not go past the edge is VALID padding, and the convolved area is smaller than the original image. To let the kernel cover every pixel, the border can be filled with 0-valued pixels before convolving; sampling that crosses the edge this way is SAME padding. With a stride of 1, the output has the same size as the original image; with a larger stride (e.g. larger than the kernel side), SAME padding still yields an output smaller than the original image.

# pooling: 2x2 max pooling with stride 2, SAME padding
def max_pool2x2(images, tname):
    return tf.nn.max_pool(images, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name=tname)
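The output sizes can be checked with a few lines of arithmetic. A small sketch added for illustration (the formulas are the standard TF 1.x SAME/VALID rules, not code from the post):

import math

#   SAME:  out = ceil(in / stride)
#   VALID: out = ceil((in - k + 1) / stride)
def out_size(n, k, stride, padding):
    if padding == 'SAME':
        return math.ceil(n / stride)
    return math.ceil((n - k + 1) / stride)

# Three rounds of 2x2, stride-2 SAME pooling on a 112 x 92 image:
h, w = 112, 92
for _ in range(3):
    h, w = out_size(h, 2, 2, 'SAME'), out_size(w, 2, 2, 'SAME')
print(h, w)  # 14 12 -> flattened with 64 channels: 14 * 12 * 64 = 10752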


# images_input is the input image, labels_input is the input label
images_input = tf.placeholder(tf.float32, [None, 112*92*3], name='input_images')
labels_input = tf.placeholder(tf.float32, [None, 2], name='input_labels')
# Reshape the flat input back to a 112 x 92 x 3 image tensor
x_input = tf.reshape(images_input, [-1, 112, 92, 3])

Training

First convolution layer + pooling

# 16 convolution kernels of size 3*3*3, first convolution layer
w1 = weight_init([3, 3, 3, 16])
b1 = bias_init([16])
conv_1 = conv2d(x_input, w1) + b1
relu_1 = tf.nn.relu(conv_1, name='relu_1')
max_pool_1 = max_pool2x2(relu_1,'max_pool_1')

Second convolution layer + pooling

# 32 convolution kernels of size 3*3*16, second convolution layer
w2 = weight_init([3, 3, 16, 32])
b2 = bias_init([32])
conv_2 = conv2d(max_pool_1, w2) + b2
relu_2 = tf.nn.relu(conv_2, name='relu_2')
max_pool_2 = max_pool2x2(relu_2,'max_pool_2')

Third convolution layer + pooling

# 64 convolution kernels of size 3*3*32, third convolution layer
w3 = weight_init([3, 3, 32, 64])
b3 = bias_init([64])
conv_3 = conv2d(max_pool_2, w3) + b3
relu_3 = tf.nn.relu(conv_3, name='relu_3')
max_pool_3 = max_pool2x2(relu_3,'max_pool_3')

Fully connected layer 1

# Flatten the output of the third convolution layer into a one-dimensional vector
f_input = tf.reshape(max_pool_3, [-1, 14 * 12 * 64])

# Fully connected layer 1: 14*12*64 -> 512
f_w1 = fch_init(14 * 12 * 64, 512)
f_b1 = bias_init([512])
f_r1 = tf.matmul(f_input, f_w1) + f_b1
# ReLU activation
f_relu_r1 = tf.nn.relu(f_r1)

# To prevent the network from overfitting, the fully connected hidden layer is
# regularized with Dropout: during training, some node activations are randomly
# zeroed out, which is equivalent to discarding those features. Dropout happens
# only during training; at prediction time the full set of features is used.
# Note that in TensorFlow 1.x the second argument of tf.nn.dropout is the KEEP
# probability, so drop_prob = 0.4 keeps 40% of the activations
f_dropout_r1 = tf.nn.dropout(f_relu_r1, drop_prob)
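To illustrate what tf.nn.dropout does during training, here is a minimal numpy sketch (an illustration of the idea, not the TF implementation):

# Keep each activation with probability keep_prob, and scale survivors by
# 1/keep_prob so the expected magnitude matches prediction time (no dropout)
def dropout_sketch(x, keep_prob):
    mask = np.random.rand(*x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)

x = np.ones((2, 4), dtype=np.float32)
print(dropout_sketch(x, keep_prob=0.4))  # ~60% of entries zeroed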

Fully connected layer 2

f_w2 = fch_init(512, 128)
f_b2 = bias_init([128])
f_r2 = tf.matmul(f_dropout_r1, f_w2) + f_b2
f_relu_r2 = tf.nn.relu(f_r2)
f_dropout_r2 = tf.nn.dropout(f_relu_r2, drop_prob)

Fully connected output layer

f_w3 = fch_init(128, 2)
f_b3 = bias_init([2])
f_r3 = tf.matmul(f_dropout_r2, f_w3) + f_b3
f_softmax = tf.nn.softmax(f_r3, name='f_softmax')

Loss function

# Cross-entropy cost function
cross_entry = tf.reduce_mean(tf.reduce_sum(-labels_input * tf.log(f_softmax)))
# Optimizer: Adam runs the gradient descent updates automatically
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cross_entry)
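For reference, the quantity being minimized is the cross entropy between the one-hot label y and the softmax output ŷ: H(y, ŷ) = -Σᵢ yᵢ log(ŷᵢ). One caveat worth noting: tf.reduce_sum is called here without an axis argument, so it sums over the entire batch and the outer tf.reduce_mean averages a scalar (a no-op). Summing per sample with axis=1 and then taking the mean would make the loss independent of batch size.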

Calculating accuracy & loss

arg1 = tf.argmax(labels_input,1)
arg2 = tf.argmax(f_softmax,1)
# The prediction for each sample is a (1, 2) vector
cos = tf.equal(arg1,arg2)
# tf.cast converts bool values to floating point numbers
acc = tf.reduce_mean(tf.cast(cos,dtype=tf.float32))


Start the session and begin training

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
Cost = []
Accuracy = []
# (train_data is assumed to wrap the training arrays built above)
for i in range(train_epochs):
    idx = random.randint(0, len(train_data.images) - 20)
    batch = random.randint(6, 18)
    train_input = train_data.images[idx:(idx + batch)]
    train_labels = train_data.labels[idx:(idx + batch)]
    result, acc1, cross_entry_r, cos1, f_softmax1, relu_1_r = sess.run(
        [optimizer, acc, cross_entry, cos, f_softmax, relu_1],
        feed_dict={images_input: train_input, labels_input: train_labels})
    print(acc1)
    Cost.append(cross_entry_r)
    Accuracy.append(acc1)

import matplotlib.pyplot as plt

# Cost curve
fig1, ax1 = plt.subplots(figsize=(10, 7))
plt.plot(Cost)
ax1.set_xlabel('Epochs')
ax1.set_ylabel('Cost')
plt.title('Cross Loss')
plt.grid()
plt.show()

# Accuracy curve
fig7, ax7 = plt.subplots(figsize=(10, 7))
plt.plot(Accuracy)
ax7.set_xlabel('Epochs')
ax7.set_ylabel('Accuracy Rate')
plt.title('Train Accuracy Rate')
plt.grid()
plt.show()


Test set validation

# Test
arg2_r = sess.run(arg2, feed_dict={images_input: train_data.test_images, labels_input: train_data.test_labels})
arg1_r = sess.run(arg1, feed_dict={images_input: train_data.test_images, labels_input: train_data.test_labels})
# Print the classification report (from sklearn.metrics import classification_report)
print(classification_report(arg1_r, arg2_r))

If the verification is successful, save the model

# Save the model
saver = tf.train.Saver()
saver.save(sess, './model/my-gender-v1.0')

To use the trained model, see gender_model_use.py.
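gender_model_use.py is not reproduced here; below is a minimal sketch of restoring the saved model, assuming the tensor names defined above ('input_images', 'f_softmax') and the checkpoint path used in saver.save:

# The .meta file stores the graph; restore() loads the trained weights
sess = tf.Session()
saver = tf.train.import_meta_graph('./model/my-gender-v1.0.meta')
saver.restore(sess, './model/my-gender-v1.0')

graph = tf.get_default_graph()
images_input = graph.get_tensor_by_name('input_images:0')
f_softmax = graph.get_tensor_by_name('f_softmax:0')

# Each row of the softmax output is [female, male], per the one-hot
# encoding used in read_img above:
# probs = sess.run(f_softmax, feed_dict={images_input: flattened_images})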

Results: The accuracy of the model reached 93% after 3000 iterations

Training cross-entropy cost

Training accuracy

A sample of training data

Features extracted by the first convolution layer

Features after 2x2 pooling

Features extracted by the second convolution layer

Features after 2x2 pooling

Features extracted by the third convolution layer

Features after 2x2 pooling

Reference

blog.csdn.net/u014281392/…