This paper applies the standard genetic algorithm (GA) to neural network architecture search: the network is first encoded as a binary string, and genetic operations are then applied to it. The overall method is very concise, and the search space design is very simple, essentially equivalent to searching only the connections between nodes, yet the results are quite good and well worth learning from. Source: the WeChat public account "Vick's Algorithm Engineering Notes"

Paper: Genetic CNN

  • Paper address: arxiv.org/abs/1703.01…

Introduction


To make neural architecture search tractable, the paper restricts the network to a limited depth, with each layer drawn from a preset set of operations; even so, there are still a great many candidate networks. To search this huge space effectively, the paper uses a genetic algorithm for acceleration: an initial population is constructed first, and the individuals in the population then undergo the genetic operations of selection, crossover and mutation. Fitness is judged by recognition accuracy, and a strong population is eventually obtained.

Our Approach


Binary Network Representation

At present, most SOTA networks are composed of multiple stages, where the layers within a stage share the same dimensions and adjacent stages are connected by pooling. Drawing on this idea, the paper defines a network as composed of S stages; the s-th stage (s = 1, 2, ..., S) contains K_s nodes, denoted v_{s,k} (k = 1, 2, ..., K_s). The nodes are ordered, and only lower-numbered nodes are allowed to connect to higher-numbered nodes; all inputs of a node are combined by element-wise sum. Each node represents a convolution operation, with BN+ReLU applied after the convolution. No fully connected layers are added to the network.

Each stage uses 1 + 2 + ... + (K_s - 1) = K_s(K_s - 1)/2 bits to represent its internal connections: the first bit represents the connection (v_{s,1}, v_{s,2}); the next two bits represent the connections (v_{s,1}, v_{s,3}) and (v_{s,2}, v_{s,3}); and so on, with the final K_s - 1 bits representing the connections of v_{s,K_s} to all the other nodes. For i < j, if the corresponding bit is 1, then v_{s,i} and v_{s,j} have an edge, and the output of v_{s,i} is taken as part of the element-wise sum that forms the input of v_{s,j}. The encoding is shown in Figure 1. However, there seems to be something wrong with the Stage 2 encoding: according to the picture, it should be 0-10-000-0011.
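As a minimal sketch of this encoding, the connection bits of one stage can be decoded into an edge list as follows (node indices are 1-based; the bit grouping follows the description above):

```python
def decode_stage(bits, K):
    """Decode the K*(K-1)/2 connection bits of one stage into edges.

    The first bit encodes (v1, v2), the next two bits encode
    (v1, v3) and (v2, v3), and so on; bit == '1' means an edge
    from the lower-numbered node to the higher-numbered node.
    """
    assert len(bits) == K * (K - 1) // 2
    edges, pos = [], 0
    for j in range(2, K + 1):        # higher-numbered (target) node
        for i in range(1, j):        # all lower-numbered nodes
            if bits[pos] == '1':
                edges.append((i, j))
            pos += 1
    return edges

# For K = 4 the string has 1 + 2 + 3 = 6 bits:
decode_stage("101001", 4)  # -> [(1, 2), (2, 3), (3, 4)]
```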

  • Technical Details

Each stage has two default nodes: an input node and an output node. The input node applies a convolution to the features from the previous stage and passes the result to every node that has no input; the output node takes the element-wise sum of the outputs of all nodes whose outputs are not used by any other node, applies a convolution, and is followed by a pooling layer. There are two special cases:

  • If a node is isolated (it has no non-default input or output), it is ignored, as shown in Figure 1

  • If a stage has no connections at all (all of its bits are zero), then only a single convolution is performed (normally at least one convolution is performed at each of the input and output nodes).
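Given the decoded edges of a stage, the default input/output wiring and the isolated-node rule above can be sketched like this (a hypothetical helper, not the paper's code):

```python
def stage_wiring(edges, K):
    """Classify the K nodes of a stage by the rules above:
    nodes with no predecessor receive the default input node's
    convolution; nodes whose output is unused feed the element-wise
    sum at the default output node; isolated nodes are ignored.
    """
    has_in = {j for _, j in edges}
    has_out = {i for i, _ in edges}
    from_input = [k for k in range(1, K + 1) if k not in has_in and k in has_out]
    to_output = [k for k in range(1, K + 1) if k not in has_out and k in has_in]
    isolated = [k for k in range(1, K + 1) if k not in has_in and k not in has_out]
    return from_input, to_output, isolated
```

For example, with edges [(1, 2), (2, 3)] and K = 4, node 1 takes the default input, node 3 feeds the output node, and node 4 is isolated and ignored.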

  • Examples and Limitations

Such a coding scheme can encode the current mainstream classification structures, but it also has many limitations:

  • Convolution and pooling are currently the only available operations; fancier modules such as Maxout cannot be used
  • The convolution kernel is fixed within each stage, which hinders the fusion of multi-scale features

Genetic Operations

The genetic algorithm process is shown in Figure 1. The search runs for T generations of inheritance; each generation contains three operations (selection, mutation and crossover), and fitness values are obtained on the validation set through the trained models.

  • Initialization

Initialize a collection of N random models, where each model is a binary string of length L and every bit of the string obeys a Bernoulli distribution; then train each model and test its accuracy. The initialization strategy here has little impact on the final result.
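The initialization step could look like the following sketch, where N, L and the Bernoulli parameter are left as arguments since the concrete values are set per experiment:

```python
import random

def init_population(N, L, p=0.5, rng=random):
    """N random individuals, each an L-bit string with every bit
    drawn independently from a Bernoulli(p) distribution."""
    return [[1 if rng.random() < p else 0 for _ in range(L)]
            for _ in range(N)]
```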

  • Selection

Selection is performed at the start of each generation. Before the t-th generation, the fitness of an individual from the previous generation directly determines its probability of surviving the selection phase. A Russian roulette scheme is used: each individual's probability of being selected is proportional to its fitness minus the lowest fitness of the previous generation. The total population size remains unchanged after selection, so a strong individual may be selected multiple times.
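A minimal sketch of this roulette selection, assuming fitness values are validation accuracies; individuals at the previous generation's minimum get weight zero, and a uniform fallback handles the degenerate case where all fitnesses are equal:

```python
import random

def select(population, fitness, rng=random):
    """Russian roulette: selection probability proportional to
    (f_n - f_min). The population size is kept constant, so one
    individual may be drawn several times."""
    f_min = min(fitness)
    weights = [f - f_min for f in fitness]
    if sum(weights) == 0:            # all fitnesses equal
        weights = [1.0] * len(fitness)
    return rng.choices(population, weights=weights, k=len(population))
```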

  • Mutation and Crossover

The mutation operation flips each bit of the binary string independently with probability q_M, while the crossover operation changes two individuals at the same time, exchanging whole stages between them with probability q_C. An individual is mutated with probability p_M, and each pair of individuals undergoes crossover with probability p_C; see Algorithm 1 for the specific operation. Although this method is very simple, it is very effective.
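The two operators can be sketched as below; q_m is the per-bit flip probability and q_c the per-stage swap probability, both left as arguments since they are hyperparameters:

```python
import random

def mutate(bits, q_m, rng=random):
    """Flip each bit independently with probability q_m."""
    return [b ^ 1 if rng.random() < q_m else b for b in bits]

def crossover(a, b, stage_lens, q_c, rng=random):
    """Exchange whole stages between two individuals; each stage's
    bit segment is swapped with probability q_c, changing both
    individuals at the same time."""
    a, b = a[:], b[:]
    pos = 0
    for n in stage_lens:             # bits per stage, in order
        if rng.random() < q_c:
            a[pos:pos + n], b[pos:pos + n] = b[pos:pos + n], a[pos:pos + n]
        pos += n
    return a, b
```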

  • Evaluation

After the above operations, each individual is trained and tested to obtain its fitness. If an individual has been evaluated before, it is simply tested again and the results are averaged, which removes some of the uncertainty that comes from training.
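One way to implement this averaging is a small evaluation cache; evaluate_fn here is a hypothetical callable that trains the network for an encoding and returns its validation accuracy:

```python
class FitnessCache:
    """Re-evaluate previously seen encodings and average all runs,
    damping the randomness of individual training runs."""

    def __init__(self, evaluate_fn):
        self.evaluate_fn = evaluate_fn   # bits -> accuracy (assumed)
        self.history = {}                # encoding -> list of runs

    def fitness(self, bits):
        key = tuple(bits)
        self.history.setdefault(key, []).append(self.evaluate_fn(bits))
        runs = self.history[key]
        return sum(runs) / len(runs)
```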

Experiments


MNIST Experiments

The experimental configuration (number of stages, nodes per stage, initial population size and number of generations) is given in the paper; in total the search takes only 2 GPU-days.

CIFAR10 Experiments

The experimental configuration (number of stages, nodes per stage, initial population size and number of generations) is given in the paper; in total the search takes only 17 GPU-days.

CIFAR and SVHN Experiments

The networks learned on CIFAR-10 were tested directly on the other datasets.

ILSVRC2012 Experiments

The two networks in Figure 5 were trained on ILSVRC2012: the stem of VGGNet is first used for downsampling, then the searched network in Figure 5 is applied, and finally fully connected layers are used for classification.

Conclusion


The paper applies the standard genetic algorithm to neural network architecture search: the network is first encoded, and genetic operations are then applied to it. The whole method is very simple, and the search space design is very simple as well, basically equivalent to searching only the connection pattern between nodes, but the effect is quite good and worth learning from.




