Author: Zhou Zongwei

Please cite this paper if you find it useful. Thanks!

Wang H, Zhou Z, Li Y, et al. Comparison of machine learning methods for classifying mediastinal lymph node metastasis of non-small cell lung cancer from 18F-FDG PET/CT images[J]. EJNMMI Research, 2017, 7.

Another round of paper supplementary material: the code

In general, Keras is an easy-to-use deep learning framework, largely because its documentation is detailed. Unlike Caffe, which mostly relies on technical blogs, Keras has its own official documentation (in English, admittedly), which gives beginners a lot of room to learn on their own. I have to recommend that documentation! If your English is fine, you can read it directly; this article goes over the same material.

keras.io/

(Figure: the Keras official documentation.)

First of all, full disclosure: I had never learned Python before writing this code, so some of it is fairly redundant, and things that could be done in a single line are sometimes written out at length.


Quoted from the paper, Section 3.2, Test platform

The project code runs on Windows 7, mainly using Matlab R2013a and Python. Matlab is used for segmenting and preprocessing the patches, and Keras is used to build the convolutional neural network. Keras is a deep learning framework based on Python and Theano, with a design that borrows from Torch. It is a highly modular neural network library that runs on both GPUs and CPUs, which makes it easy to use and well suited to rapid development.


[Github]

I'll post the code on GitHub as soon as the paper is published 🙂


Resources

  • Keras Learning Notes (a Deep Learning Framework Based on Theano), 12: Core Layers
  • Keras Learning Notes (a Deep Learning Framework Based on Theano), 13: Convolution Layers

1. The main function that builds the convolutional neural network

def create_model(data):
    model = Sequential()
    # block 1: convolution + ReLU + dropout
    model.add(Convolution2D(64, 5, 5, border_mode='valid', input_shape=data.shape[-3:]))
    model.add(Activation('relu'))
    model.add(Dropout(0.5))
    # block 2: convolution + ReLU + pooling + dropout
    model.add(Convolution2D(64, 5, 5, border_mode='valid'))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.5))
    # block 3: convolution + ReLU + pooling
    model.add(Convolution2D(32, 3, 3, border_mode='valid'))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # block 4: convolution + ReLU + dropout
    model.add(Convolution2D(32, 3, 3, border_mode='valid'))
    model.add(Activation('relu'))
    model.add(Dropout(0.5))
    # fully connected part
    model.add(Flatten())
    model.add(Dense(512, init='normal'))
    model.add(Activation('relu'))
    model.add(Dropout(0.5))
    model.add(Dense(LABELTYPE, init='normal'))
    model.add(Activation('softmax'))
    # optimizer and compilation
    sgd = SGD(l2=0.0, lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(loss='categorical_crossentropy', optimizer=sgd, class_mode="categorical")
    return model

This function is fairly neat and clear: it takes the training set as input and returns an untrained neural network, so it is essentially the initialization of the convolutional neural network. model = Sequential() starts the network, and model.add() keeps stacking layers on top, like building blocks, until you have what you need. A convolutional neural network has two basic kinds of layers: 1) convolution and 2) downsampling (pooling), which correspond to the code:

model.add(Convolution2D(64, 5, 5, border_mode='valid'))  # convolution layer
model.add(MaxPooling2D(pool_size=(2, 2)))                # downsampling (pooling) layer

1.1 Activation function

Note that every convolution layer is followed by an activation function. As the textbooks explain, the activation function keeps the result of each convolution within a fixed range, such as 0 to 1 or -1 to 1, so that the values do not drift apart wildly from one convolution to the next.

This corresponds to the code:

model.add(Activation('relu'))

Keras provides many activation functions to choose from. I used ReLU here; alternatives include:

  • tanh
  • sigmoid
  • hard_sigmoid
  • linear, etc.

The Keras library is constantly updated, so newer activation functions used in recent papers are also included, for example:

  • LeakyReLU
  • PReLU
  • ELU, etc.

All of them are interchangeable. Papers sometimes call this an "optimized network", but really it is just swapping one name for another. Note that the activation of the last layer of a convolutional neural network is usually softmax. A few more words on how to choose among these activation functions, in short:





The reference is the article linked in the original post. Its figures compare the common choices roughly as follows: sigmoid causes the gradient to vanish and its output is not zero-centered; tanh still causes the gradient to vanish; with ReLU the gradient only vanishes for x < 0. So ReLU does well.

I know Leaky ReLU is already available, but I haven't used it yet; I'm still on plain ReLU for now, don't ask me why 🙂
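For anyone who does want to try the newer activations, here is a minimal sketch, assuming the same old Keras/Theano API used throughout this post, where the advanced activations are added as layers of their own (the LeakyReLU import already appears in the import block later in this article):

from keras.models import Sequential
from keras.layers.convolutional import Convolution2D
from keras.layers.advanced_activations import LeakyReLU

model = Sequential()
# instead of model.add(Activation('relu')) after a convolution layer:
model.add(Convolution2D(64, 5, 5, border_mode='valid', input_shape=(6, 24, 24)))
model.add(LeakyReLU(alpha=0.3))  # alpha is the slope used for x < 0; 0.3 is a common choice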

1.2 The Dropout layer

Dropout: addressing the "overfitting" problem

(Figure: dropout turns certain neurons off.)

Not all neurons in the human brain fire while it processes a signal, because 1) the brain's energy supply could not keep up, 2) neurons are specialized, so a particular neuron handles a particular kind of signal, and 3) activating every neuron would slow the response down. So when we simulate this with a neural network we also make a selection, suppressing part of the neurons, which speeds up the network, makes it more robust, and reduces the chance of overfitting. Enough hand-waving, it just works! The code is:

model.add(Dropout(0.5))

The 0.5 can be changed; it is the dropout rate, meaning that 50% of the neurons in that layer are randomly dropped during each training update.
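Conceptually, dropout applies a random binary mask to a layer's activations during training. A rough NumPy illustration of the idea (this is not the actual Keras implementation):

import numpy as np

rng = np.random.RandomState(0)
activations = rng.rand(1, 8)                 # pretend this is the output of one layer
p = 0.5                                      # dropout rate, as in Dropout(0.5)
mask = rng.binomial(1, 1 - p, size=activations.shape)
print(activations * mask)                    # roughly half the units are zeroed at random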

1.3 More details

Only a few small details of this network-initialization function still need explaining:

model.add(Convolution2D(64, 5, 5, border_mode='valid', input_shape=data.shape[-3:]))

You'll notice that the code for the first convolution layer is longer than the others, because it also has to be told the shape of the training samples: that is what input_shape=data.shape[-3:] does, specifying how many channels each sample has and the size of each input image. In my case data.shape[-3:] says that I use six channels and that each patch is 24*24 pixels.

The idea of a channel: a black-and-white image has one channel, the gray value; a color image has three, namely R, G and B. You can also use other quantities as channels, like the six channels I used here. To be honest I am not entirely clear about how channels are handled internally; perhaps the mechanism was designed with RGB in mind. I'll leave a question mark here.
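To make the shapes concrete, here is a small sketch with a made-up sample count; the six channels and 24*24 patch size follow the text above, and the channels-first ordering matches the Theano backend used in this post:

import numpy as np

n_samples, channels, rows, cols = 100, 6, 24, 24
data = np.zeros((n_samples, channels, rows, cols), dtype='float32')

# data[i, 0] is the first channel of sample i, data[i, 5] is the sixth
print(data.shape[-3:])   # (6, 24, 24), exactly what input_shape=data.shape[-3:] passes to Keras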

model.add(Flatten())
model.add(Dense(512, init='normal'))

Here I add the fully connected part, which is what these two lines of code do; in a convolutional neural network it corresponds to the dense layers at the end (see the figure in the original post). The 512 means this layer has 512 neurons.

There is not much more to say about it: it is just another part of the model. There can be several such layers, and they usually sit at the back of the network.

sgd = SGD(l2=0.0, lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, class_mode="categorical")

This part is the famous "gradient descent". It is used in the feedback (backpropagation) stage of the neural network to keep learning and adjusting the parameters of every convolution layer, the so-called "learning" process. I use the most common SGD settings, including the learning rate (lr). The other parameters can of course be changed as well, but I left them alone, hehe. Tip: the learning rate is usually small. I used 0.01, and the right value depends on the training data: too small and training is slow, too large and training easily blows up, like this:

(Figure: the circle is the current position and the five-pointed star is the target. If the learning rate is too large, it is easy to jump straight past the target, and training fails.)
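A side note on the decay argument passed to SGD above: it slowly shrinks the learning rate as training proceeds. A rough sketch of the schedule, under my assumption that the old Keras SGD scaled the rate roughly as lr / (1 + decay * iterations):

# assumed decay schedule, for illustration only
lr0, dec = 0.01, 1e-6

def effective_lr(iterations, lr=lr0, decay=dec):
    return lr / (1.0 + decay * iterations)

for it in (0, 100000, 1000000):
    print(it, effective_lr(it))   # 0.01, then about 0.0091, then 0.005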

I haven't tried any of Keras's other optimizers, so I don't know their pros and cons. A few of the alternatives:

  • RMSprop
  • Adagrad
  • Adadelta
  • Adam
  • Adamax, etc.

My guess is that each of these optimizers corresponds to a deep learning paper. Keras already provides them all, so go back to the papers for the details; swapping one in is a one-line change, as shown in the sketch below.
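A minimal sketch of trying a different optimizer, assuming the same old Keras API as the rest of this post (the Adadelta and Adam imports already appear in the import block in section 2.1):

from keras.optimizers import Adadelta

# model is the Sequential model built by create_model() above
model.compile(loss='categorical_crossentropy',
              optimizer=Adadelta(),        # instead of the hand-tuned SGD above
              class_mode="categorical")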





While I'm at it, a word about the cost function: for the "slow learning" and "overfitting" problems there are known ways of modifying the cost function. I understand the reasoning (the figures in the original post sketch why the standard choice is reasonable and what a better cost function looks like), but I am still working out how to actually change that part of the code inside Keras. I'll look for a chance to do it.

Main code section, The End


2. Code to run before training

There are several steps you need to take before you start training

  • Import the required Python packages
  • Import data
  • Divide the training set and the test set

2.1 Importing Python packages

#coding:utf-8
'''
GPU run command:
    THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python cnn.py
CPU run command:
    python cnn.py
'''
######################################
# Import all the modules used below
######################################
# ConvNet modules
from __future__ import absolute_import
from __future__ import print_function
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.advanced_activations import PReLU, LeakyReLU
import keras.layers.advanced_activations as adact
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.optimizers import SGD, Adadelta, Adagrad, Adam, Adamax
from keras.utils import np_utils, generic_utils
from six.moves import range
from keras.callbacks import EarlyStopping
# statistics module
from collections import Counter
import random, cPickle
# my own data-loading module
from cutslice3d import load_data
from cutslice3d import ROW, COL, LABELTYPE, CHANNEL
# memory tuning module
import sys

It works like #include in C: you import whatever you need. To figure out what to import from where, go to the place Keras is installed. My installation path is

C:\Users\Administrator\Anaconda2\Lib\site-packages\keras

There you will see the .py files:

(Figure: the contents of the keras installation directory.)

For example, if you need to import Sequential(), you first need to know that it is defined in Keras's models.py; the code then follows naturally:

from keras.models import Sequential  # import Sequential from keras's models.py

You see, the code is so simple it can practically be read out loud. The hard part is knowing where functions like Sequential() are defined, and for that you really need to go through Keras's documentation; there are far too many functions to list here. I personally feel this part also needs some knowledge of Python, because besides Keras there are plenty of other packages that are very handy, and knowing their functions saves a lot of work. For example:

from collections import Counter

Counter counts how many times each distinct element appears in an array. Using it looks like this:

cnt = Counter(A)
for k, v in cnt.iteritems():     # Python 2; on Python 3 use cnt.items()
    print('\t', k, '-->', v)

2.2 Simple data processing module

######################################
# Description
######################################
print("\n\n\nHey you, this is a trial on malignance and benign tumors detection via ConvNets. I'm Zongwei Zhou. :)")
print("Each input patch is 51*51, cutted from 1383 3d CT & PT images. The MINIMUM is above 30 segment pixels.")

######################################
# Load data
######################################
print(">> Loading Data ...")
TrData, TrLabel, VaData, VaLabel = load_data()

######################################
# Shuffle the data
######################################
index = [i for i in range(len(TrLabel))]
random.shuffle(index)
TrData = TrData[index]
TrLabel = TrLabel[index]
print('\tTherefore, read in', TrData.shape[0], 'samples from the dataset totally.')

# convert the integer labels (0~1) to binary class matrices
TrLabel = np_utils.to_categorical(TrLabel, LABELTYPE)

Here I use load_data(), a function I wrote myself for importing the data; it reads the training set and the test set separately from .mat files. In other words, the translation and rotation transforms of the input patches and the split into training and test sets are all done in MATLAB, and the resulting data is huge: by April 7th my training set had grown to 31.4 GB. The Python side stays more intuitive, like this:

# (from cutslice3d.py; h5py and numpy are imported there, and the DATAPATH_* /
#  Training_* / Validation_* constants are defined elsewhere in that file)
def load_data():
    ######################################
    # Read the data from the .mat files
    ######################################
    mat_training = h5py.File(DATAPATH_Training)
    mat_training.keys()
    Training_CT_x = mat_training[Training_CT_1]
    Training_CT_y = mat_training[Training_CT_2]
    Training_CT_z = mat_training[Training_CT_3]
    Training_PT_x = mat_training[Training_PT_1]
    Training_PT_y = mat_training[Training_PT_2]
    Training_PT_z = mat_training[Training_PT_3]
    TrLabel = mat_training[Training_label]
    TrLabel = np.transpose(TrLabel)
    Training_Dataset = len(TrLabel)

    mat_validation = h5py.File(DATAPATH_Validation)
    mat_validation.keys()
    Validation_CT_x = mat_validation[Validation_CT_1]
    Validation_CT_y = mat_validation[Validation_CT_2]
    Validation_CT_z = mat_validation[Validation_CT_3]
    Validation_PT_x = mat_validation[Validation_PT_1]
    Validation_PT_y = mat_validation[Validation_PT_2]
    Validation_PT_z = mat_validation[Validation_PT_3]
    VaLabel = mat_validation[Validation_label]
    VaLabel = np.transpose(VaLabel)
    Validation_Dataset = len(VaLabel)

    ######################################
    # Initialize the arrays
    ######################################
    TrData = np.empty((Training_Dataset, CHANNEL, ROW, COL), dtype="float32")
    VaData = np.empty((Validation_Dataset, CHANNEL, ROW, COL), dtype="float32")

    ######################################
    # Fill in the patches, one channel at a time
    ######################################
    for i in range(Training_Dataset):
        TrData[i,0,:,:] = Training_CT_x[:,:,i]
        TrData[i,1,:,:] = Training_CT_y[:,:,i]
        TrData[i,2,:,:] = Training_CT_z[:,:,i]
        TrData[i,3,:,:] = Training_PT_x[:,:,i]
        TrData[i,4,:,:] = Training_PT_y[:,:,i]
        TrData[i,5,:,:] = Training_PT_z[:,:,i]
    for i in range(Validation_Dataset):
        VaData[i,0,:,:] = Validation_CT_x[:,:,i]
        VaData[i,1,:,:] = Validation_CT_y[:,:,i]
        VaData[i,2,:,:] = Validation_CT_z[:,:,i]
        VaData[i,3,:,:] = Validation_PT_x[:,:,i]
        VaData[i,4,:,:] = Validation_PT_y[:,:,i]
        VaData[i,5,:,:] = Validation_PT_z[:,:,i]

    print '\tThe dimension of each data and label, listed as following:'
    print '\tTrData  : ', TrData.shape
    print '\tTrLabel : ', TrLabel.shape
    print '\tRange   : ', np.amin(TrData[:,0,:,:]), '~', np.amax(TrData[:,0,:,:])
    print '\t\t  ', np.amin(TrData[:,1,:,:]), '~', np.amax(TrData[:,1,:,:])
    print '\t\t  ', np.amin(TrData[:,2,:,:]), '~', np.amax(TrData[:,2,:,:])
    print '\t\t  ', np.amin(TrData[:,3,:,:]), '~', np.amax(TrData[:,3,:,:])
    print '\t\t  ', np.amin(TrData[:,4,:,:]), '~', np.amax(TrData[:,4,:,:])
    print '\t\t  ', np.amin(TrData[:,5,:,:]), '~', np.amax(TrData[:,5,:,:])
    print '\tVaData  : ', VaData.shape
    print '\tVaLabel : ', VaLabel.shape
    print '\tRange   : ', np.amin(VaData[:,0,:,:]), '~', np.amax(VaData[:,0,:,:])
    print '\t\t  ', np.amin(VaData[:,1,:,:]), '~', np.amax(VaData[:,1,:,:])
    print '\t\t  ', np.amin(VaData[:,2,:,:]), '~', np.amax(VaData[:,2,:,:])
    print '\t\t  ', np.amin(VaData[:,3,:,:]), '~', np.amax(VaData[:,3,:,:])
    print '\t\t  ', np.amin(VaData[:,4,:,:]), '~', np.amax(VaData[:,4,:,:])
    print '\t\t  ', np.amin(VaData[:,5,:,:]), '~', np.amax(VaData[:,5,:,:])

    return TrData, TrLabel, VaData, VaLabel

It reads the data stored in the .mat files and directly returns the training set (TrData, TrLabel) and the test set (VaData, VaLabel). I'll cover the MATLAB-side data augmentation later; for now, the short explanation is that data augmentation is another way of fighting the overfitting problem.

Keras requires labels in the form of binary class matrices, so I simply call Keras's np_utils.to_categorical() function.
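For example, a tiny sketch with made-up labels (LABELTYPE is simply the number of classes, here 2):

from keras.utils import np_utils

labels = [0, 1, 1, 0]                        # integer class labels
onehot = np_utils.to_categorical(labels, 2)  # becomes [[1, 0], [0, 1], [0, 1], [1, 0]]
print(onehot)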


3. Training and later-stage code

The hard part is done. What is left, amusingly enough, is just a few lines of code.

print(">> Build Model ..." ) model = create_model (TrData) # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # training model of ConvNets lies ###################################### print(">> Training ConvNets Model ..." ) print("\\tHere, batch_size =", BATCH_SIZE, ", epoch =", EPOCH, ", lr =", LR, ", momentum =", MOMENTUM) early_stopping = EarlyStopping(monitor='val_loss', patience=2) hist = model.fit(TrData, TrLabel, \\ batch_size=BATCH_SIZE, \\ nb_epoch=EPOCH, \\ shuffle=True, \\ verbose=1, \\ show_accuracy=True, \\ validation_split=VALIDATION_SPLIT, \ \ callbacks = [early_stopping]) # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # test ConvNets lies model ###################################### print(">> Test the model ..." ) pre_temp=model.predict_classes(VaData)Copy the code

3.1 Training model

Call create_model(), the network-building function from section 1, to create an initialized model. After that, the main training code is a single statement:

hist = model.fit(TrData, TrLabel,            \
                 batch_size=100,             \
                 nb_epoch=10,                \
                 shuffle=True,               \
                 verbose=1,                  \
                 show_accuracy=True,         \
                 validation_split=0.2,       \
                 callbacks=[early_stopping])

🙂 Yes, just one statement, but there is quite a lot packed into it. Here is a quick list of the arguments I care about:

  • TrData: the training data
  • TrLabel: the training data labels
  • batch_size: the number of training samples used for each gradient-descent parameter update
  • nb_epoch: the number of training epochs
  • shuffle: when shuffle=True, the training data is reshuffled for every epoch (this is the default); the validation data is not shuffled
  • validation_split: the fraction held out for validation, 0.2 in my case. Note that this is not the same thing as the test set in the data-processing module of section 2.2. This split is the validation set for one training run, so it may become training data the next time; the set in section 2.2 is the global test set, the final check of the trained network
  • early_stopping: when one epoch gives almost the same result as the previous one, training stops automatically, so the network may not run all nb_epoch (10) epochs

early_stopping itself is defined here:

early_stopping = EarlyStopping(monitor='val_loss', patience=2)

The remaining arguments only affect what is shown while training runs; just use my values or the defaults.

By the way, if you want to see the result of every epoch, you can! The hist in hist = model.fit() stores information such as the result of each epoch and the accuracy on the held-out data.
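A small sketch of reading it back afterwards. The exact key names are my assumption about this old Keras version; with show_accuracy=True and a validation split they were roughly 'loss', 'acc', 'val_loss' and 'val_acc':

print(hist.history.keys())   # e.g. ['loss', 'acc', 'val_loss', 'val_acc']
for epoch, (l, a) in enumerate(zip(hist.history['loss'], hist.history['acc'])):
    print('epoch', epoch, ': loss =', l, ', acc =', a)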

Likewise, if you want to see the output of each layer, you can do that too! This is what makes it possible to combine the convolutional network with traditional classifiers (an experiment in replacing softmax with something else), which involves slightly more advanced tricks that I'll cover later. For now, here is just the code for inspecting a layer's output:

get_feature = theano.function([origin_model.layers[0].input],
                              origin_model.layers[12].get_output(train=False),
                              allow_input_downcast=False)
feature = get_feature(data)

These features can then be fed to an SVM (support vector machine) or to Random Forests:

from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

######################################
# SVM
######################################
def svc(traindata, trainlabel, testdata, testlabel):
    print("Start training SVM...")
    svcClf = SVC(C=1.0, kernel="rbf", cache_size=3000)
    svcClf.fit(traindata, trainlabel)
    pred_testlabel = svcClf.predict(testdata)
    num = len(pred_testlabel)
    accuracy = len([1 for i in range(num) if testlabel[i] == pred_testlabel[i]]) / float(num)
    print("\n>> cnn-svm Accuracy")
    prt(testlabel, pred_testlabel)   # prt() is a printing helper defined elsewhere in my code

######################################
# Random Forests
######################################
def rf(traindata, trainlabel, testdata, testlabel):
    print("Start training Random Forest...")
    rfClf = RandomForestClassifier(n_estimators=100, criterion='gini')
    rfClf.fit(traindata, trainlabel)
    pred_testlabel = rfClf.predict(testdata)
    print("\n>> cnn-rf Accuracy")
    prt(testlabel, pred_testlabel)

OK, I'll stop here.

3.2 Test Model

Whew, this is the easiest part of all; it is also just one line:

pre_temp = model.predict_classes(VaData)

Feed the test set VaData to the existing function predict_classes() and it returns pre_temp, the predictions of the trained network. Compare pre_temp with the correct test labels VaLabel and you can see how well the training worked. A screenshot:

(Screenshot of the test output. Everybody happy.)
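If you prefer a number over a screenshot, here is a tiny sketch for computing the accuracy, assuming VaLabel holds one integer class label per sample (if it is one-hot encoded, take the argmax along axis 1 first):

import numpy as np

pre_temp = model.predict_classes(VaData)      # predicted class indices
true_label = np.asarray(VaLabel).ravel()      # assumed: one integer label per sample
print('Test accuracy:', np.mean(pre_temp == true_label))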

3.3 Saving the Model

Training a model is not easy: you have to tune parameters and tweak the network structure, and the training itself takes a long time. So learn to save the trained network. The code looks like this:

######################################
# Save the ConvNets model
######################################
model.save_weights('MyConvNets.h5')
cPickle.dump(model, open('./MyConvNets.pkl', "wb"))
json_string = model.to_json()
open(W_MODEL, 'w').write(json_string)

That saves the model. These are the three resulting files:

(Figure: the three saved model files.)

When you later want to use the network again, just run:

model = cPickle.load(open('MyConvNets.pkl', "rb"))

which reads the stored .pkl file back into model.
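Alternatively, since the save step above also wrote out the architecture JSON and the weights, the model can be rebuilt from those two files. A sketch, assuming the model_from_json helper available in Keras of that era (W_MODEL is the JSON path used when saving):

from keras.models import model_from_json

model = model_from_json(open(W_MODEL).read())   # rebuild the architecture from JSON
model.load_weights('MyConvNets.h5')             # then load the trained weights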


This article is the first, beginner-level installment of my Keras notes. If I skipped over some background, have a look at "Keras Learning Notes (a Deep Learning Framework Based on Theano), 01: FAQ". Later I'll put together an intermediate installment and go through these topics in more detail 🙂


4. Conclusion

This isn't a paper, so there's no need for a conclusion!

All the best 🙂