In recent years, a wave of artificial intelligence has swept the tech world. The world’s top technology companies, including Google, Facebook, Microsoft and Baidu, have all made artificial intelligence the focus of their future strategies. With the emergence of applications such as face recognition, assisted driving and AlphaGo, learning-based vision is changing our lives more and more. This series of articles will introduce, step by step, the vision algorithms behind these seemingly magical systems.

This article, the first in the series, briefly introduces the development of computer vision and the basic principles of supervised learning and neural networks. In the final hands-on section, we use TensorFlow to give a simple implementation of the algorithms introduced earlier.

Development of computer vision

What is computer vision? First, let’s look at Wikipedia’s definition:

Computer vision is an interdisciplinary field that deals with how computers can be made to gain high-level understanding from digital images or videos.

Simply put, computer vision is the ability of a machine to automatically understand a picture or video.

The origins of computer vision can be traced back to 1966, when Marvin Minsky, a renowned artificial intelligence expert at MIT, gave his undergraduate students a summer assignment: “Link a camera to a computer and get the computer to describe what it saw.” While humans understand images effortlessly, it turns out that getting computers to understand them is far more complicated than first thought.

Early computer vision research, constrained by computational resources and data, focused mainly on geometry and reasoning. In the 1990s, thanks to steadily improving computer hardware and the growing popularity of digital cameras, computer vision entered a period of rapid development. A major breakthrough during this period was the emergence of hand-crafted features such as SIFT and HOG. Compared with raw pixels, these local features are robust to scale and rotation, so they were widely adopted and gave birth to applications such as image stitching, image retrieval and 3D reconstruction. Another big breakthrough was the rise of methods based on statistics and machine learning. With the spread of digital photos, large-scale datasets came into being, and learning-based vision, which automatically learns model parameters from large amounts of data, gradually became the mainstream.

With the continuous progress of computing power and the emergence of massive Internet data, traditional vision pipelines based on hand-crafted features and simple machine learning algorithms such as SVMs and boosting hit a bottleneck. Both industry and academia therefore explored how to avoid tedious manual feature design while enhancing the fitting capacity of models, so as to further exploit massive data. Deep learning meets this requirement well, and it has been widely adopted in vision. After 2010, computer vision gradually entered the deep learning era. The iconic event was the ImageNet 2012 competition, in which deep learning based algorithms greatly surpassed carefully designed traditional algorithms, shocking the academic community and spurring the application of deep learning in other fields. The competition is also seen as a symbolic event in the renaissance of deep learning across the entire field of artificial intelligence.

At present, except for low-level vision problems such as 3D reconstruction, deep learning based algorithms have far surpassed traditional algorithms on most vision problems. This series will therefore focus on computer vision algorithms based on deep learning.

Neural network

A neural network (NN) is, simply, a network of neurons; it is the earliest and simplest deep learning model. Many more complex algorithms, such as convolutional neural networks, and many concepts in deep reinforcement learning build on neural networks. We therefore introduce the principles of neural networks first. To understand neural networks, we need to understand what a neuron is.

Neurons & perceptrons

A neuron is the smallest unit of a neural network. Each neuron maps multiple inputs to one output. As shown in the figure, the output of a neuron is the weighted sum of its inputs, plus a bias, passed through an activation function. It can be expressed as:

$$y = \phi\Big(\sum_{i=1}^{n} w_i x_i + b\Big) = \phi(\mathbf{w}\cdot\mathbf{x} + b)$$

The activation function φ can take various forms. If the step function is used, the neuron is equivalent to a linear classifier:

$$y = \begin{cases} 1 & \text{if } \mathbf{w}\cdot\mathbf{x} + b > 0 \\ 0 & \text{otherwise} \end{cases}$$

This classifier is historically known as the perceptron.
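To make this concrete, here is a minimal NumPy sketch of a single neuron (the inputs, weights and bias below are made up for illustration):

```python
import numpy as np

def neuron(x, w, b, phi):
    """Single neuron: weighted sum of the inputs plus a bias, through activation phi."""
    return phi(np.dot(w, x) + b)

def step(z):
    """Step activation: turns the neuron into a perceptron (a linear classifier)."""
    return 1 if z > 0 else 0

x = np.array([0.5, -1.2, 3.0])   # illustrative inputs
w = np.array([0.4, 0.3, -0.1])   # illustrative weights
b = 0.1
print(neuron(x, w, b, step))     # 0 or 1, depending on which side of the hyperplane x lies
```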

Multilayer neural network

A single-layer perceptron can only solve linearly separable problems, yet most practical problems are nonlinear, which makes a single-layer perceptron of little use on its own. To overcome this, we can connect individual neurons into a network, making the output of each neuron in one layer the input of the neurons in the next layer, composing a neural network like the one shown in the figure below:

Because of the nonlinear activation functions, a multilayer neural network has the ability to fit nonlinear functions. For historical reasons, multilayer neural networks are also known as multilayer perceptrons (MLPs).

A neural network can fit nonlinear functions. But to fit different nonlinear functions, do we need to design different nonlinear activation functions and network structures? The answer is no. The universal approximation theorem proves that the feedforward neural network is a general approximation framework. Simply put, for activation functions such as sigmoid and ReLU, a network with even a single hidden layer can approximate any continuous function arbitrarily well, as long as it has enough neurons. In practice, however, a shallow network may need far too many neurons to approximate a complex nonlinear function, which increases the difficulty of learning and hurts generalization. Therefore, we often use deeper models to reduce the number of neurons needed and improve the generalization ability of the network.
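As a sketch of what “the output of one layer becomes the input of the next” means in code, here is the forward pass of a small two-layer network in NumPy (the layer sizes and random weights are arbitrary):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

def mlp_forward(x, W1, b1, W2, b2):
    """Two-layer MLP: the hidden layer's output becomes the next layer's input."""
    h = relu(np.dot(W1, x) + b1)   # hidden layer with a nonlinear activation
    return np.dot(W2, h) + b2      # linear output layer (logits)

rng = np.random.RandomState(0)
x = rng.randn(4)                        # 4-dimensional input
W1, b1 = rng.randn(8, 4), np.zeros(8)   # 8 hidden neurons
W2, b2 = rng.randn(3, 8), np.zeros(3)   # 3 output classes
print(mlp_forward(x, W1, b1, W2, b2))
```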

Basic concepts of machine learning

A deep neural network is one kind of deep learning algorithm, and deep learning is in turn a special case of machine learning. In this section, we therefore introduce the basic concepts of model training, and their implementation in TensorFlow, under the general framework of machine learning. These concepts apply to machine learning algorithms in general, including neural networks.

Common problems with machine learning

Common machine learning problems can be abstracted into four categories:

- supervised learning
- unsupervised learning
- semi-supervised learning
- reinforcement learning

According to whether the training data is labeled, problems can be divided into supervised learning (all data labeled), semi-supervised learning (some data labeled) and unsupervised learning (no data labeled). Unlike the first three, reinforcement learning focuses on how to take a sequence of actions in an environment, guided by rewards, so as to maximize the cumulative return. Supervised learning is currently the most widely applied and best studied machine learning problem, and the rest of this article focuses on it.

Supervised learning

In supervised learning, given N training samples {(x₁, y₁), …, (x_N, y_N)}, our goal is to learn a function from input to output, f: X → Y. In practice, we usually do not optimize the function f directly; instead, we choose a family of parameterized functions f_θ according to the problem at hand, converting the optimization of the function f into the optimization of the parameters θ.

The common classification and regression problems are special cases of supervised learning; models such as linear classifiers and deep neural networks are parameterized functions designed to solve them. For simplicity, we take the linear classifier y = f_θ(x) = wᵀx + b as an example; the parameters to be optimized are θ = (w, b).

Loss function

To measure how good a function is, we need an objective criterion. In supervised learning, this criterion is usually a loss function L: Y × Y → ℝ. For a training sample (xᵢ, yᵢ) with model prediction ŷᵢ, the corresponding loss is L(yᵢ, ŷᵢ); the smaller the loss, the more accurate the prediction. In practice, the loss function should be chosen according to the characteristics of the problem.

For binary classification, logistic regression commonly uses sigmoid + cross-entropy as the loss. TensorFlow provides built-in functions for common losses:

```python
loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=y)
```

For the multi-class problem shown in the figure above, we can extend the binary linear classifier to N linear equations yᵢ = Σⱼ Wᵢⱼ xⱼ + bᵢ, and then normalize the outputs with the softmax function

$$\mathrm{softmax}(x)_i = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$$

The normalized results are the per-class probabilities, so multi-class classification usually uses softmax + cross-entropy as its loss function.

```python
loss = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)
```

It can be shown that for binary problems, sigmoid cross-entropy and softmax cross-entropy are theoretically equivalent losses. Also, in practice the softmax cross-entropy is usually computed directly as one fused operation, rather than computing the softmax first and then the cross-entropy, mainly for numerical stability.
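To see the numerical-stability issue concretely, the toy sketch below (with deliberately extreme, made-up logits) compares the naive softmax-then-log computation with the fused log-sum-exp form:

```python
import numpy as np

logits = np.array([1000.0, 0.0, -1000.0])  # extreme logits chosen to trigger overflow
label = 0                                   # index of the true class

# Naive: softmax, then log -- np.exp(1000) overflows to inf, so the result is nan
probs = np.exp(logits) / np.sum(np.exp(logits))
print(-np.log(probs[label]))                # nan

# Fused: log-softmax via the log-sum-exp trick, shifting by max(logits)
shifted = logits - np.max(logits)
log_probs = shifted - np.log(np.sum(np.exp(shifted)))
print(-log_probs[label])                    # 0.0, the correct cross-entropy
```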

Loss minimization and regularization

After the loss function is defined, the supervised learning problem reduces to minimizing the empirical loss

$$\frac{1}{N}\sum_{i=1}^{N} L\big(y_i, f(x_i)\big)$$

In practice, to guarantee the model’s fitting capacity, the complexity of the function f is sometimes rather high. As shown on the far right of the figure, if the number of training samples is small or the labels contain errors, directly minimizing the empirical loss without constraining f makes the model prone to overfitting.

Therefore, when the number of samples is small or the annotation quality is low, an additional regularization term (regularizer) is needed to guarantee the model’s generalization ability, and different regularizers can be chosen for different requirements. In neural networks, the L2-norm regularizer is the most common:

$$R(w) = \frac{1}{2}\sum_i w_i^2$$

In TensorFlow, there are usually two ways to add regularization terms. One is to implement the regularization loss yourself and add it to the other losses for joint optimization. This approach can express more complex regularizers:

```python
weight_decay = tf.multiply(tf.nn.l2_loss(weights), wd, name='weight_loss')
```

For common regularizers, you can instead use TensorFlow’s built-in functions to regularize the corresponding variables, then collect all the regularization losses generated by the system and add them to the other losses for optimization:

```python
tf.contrib.layers.apply_regularization(tf.contrib.layers.l2_regularizer(wd), weights)
tf.losses.get_regularization_losses()
```

Gradient descent and back propagation

After the loss function and regularization term are defined, the final regularized loss is:

$$\mathrm{Loss}(\theta) = \frac{1}{N}\sum_{i=1}^{N} L\big(y_i, f(x_i)\big) + \alpha R(w)$$

Given the loss function, the parameters can be optimized with the standard gradient descent algorithm. For example, the w of the linear classifier can be updated iteratively by:

$$w \leftarrow w - \eta\left(\alpha\frac{\partial R(w)}{\partial w} + \frac{\partial L(w^T x_i + b,\ y_i)}{\partial w}\right)$$

With an appropriate learning rate, the parameters gradually converge to a local (or global) optimum.
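As a toy illustration of these updates, the following sketch (a one-dimensional least-squares fit on synthetic data, not the classifier above) runs the same kind of iteration with an L2 penalty:

```python
import numpy as np

# Synthetic data: y = 2x + 1 plus noise (illustrative only)
rng = np.random.RandomState(0)
x = rng.randn(100)
y = 2.0 * x + 1.0 + 0.1 * rng.randn(100)

w, b = 0.0, 0.0
eta, alpha = 0.1, 0.01           # learning rate and regularization weight

for _ in range(200):
    pred = w * x + b
    grad_w = np.mean(2 * (pred - y) * x) + alpha * w  # dL/dw plus d(alpha * R)/dw
    grad_b = np.mean(2 * (pred - y))
    w -= eta * grad_w
    b -= eta * grad_b

print(w, b)                      # converges near (2, 1)
```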

Backpropagation can be regarded as the generalization of gradient descent to neural networks: it likewise computes the gradients of the parameters with respect to the loss and updates the parameters iteratively to minimize the loss function. The exact derivation is skipped here for space reasons. In TensorFlow, you only need to specify the loss function and the learning rate; an Optimizer then carries out the gradient descent / backpropagation procedure automatically:

```python
tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
```

OCR in practice

Finally, this article shows how to implement a Softmax classifier and an MLP classifier with TensorFlow, using OCR as an example. The experiments use MNIST, the well-known digit recognition dataset, which contains 60,000 training images and 10,000 test images. Each 28×28 image represents one digit from 0 to 9.

Softmax classifier

Let’s start by implementing a simple Softmax classifier. Since constructing and running a network in TensorFlow involves a fair amount of boilerplate, we split the code into three modules, a Dataset module, a Net module and a Solver module, to improve development efficiency and code reuse.

Model structure

In each Net class, we need to implement three functions:

- inference: we define the main structure of the network in the inference function. In TensorFlow, variables are represented by tf.Variable. Since the Softmax classifier is a convex function, any initialization is guaranteed to reach the global optimum, so we can simply initialize W and b to 0. The Softmax classifier itself is just a matrix multiplication, y = tf.matmul(data, W) + b, followed by a tf.nn.softmax.

- loss: as introduced earlier, to ensure numerical stability we compute tf.nn.softmax_cross_entropy_with_logits directly.

- metric: after training the model, we need to evaluate its performance on the validation or test set. When the test set is large, we cannot obtain the model’s results on the whole set at once; instead, we split the test set into small batches, evaluate each batch, and then aggregate the per-batch results. For this, TensorFlow provides the tf.metrics module, which evaluates each batch and aggregates all the evaluations automatically. Since we are solving a classification problem here, we can use tf.metrics.accuracy to compute the classification accuracy; see the sketch below.
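The update/accuracy pair that tf.metrics returns can be confusing at first, so here is a standalone sketch (with made-up labels and predictions) of how the update op accumulates statistics across batches:

```python
import tensorflow as tf

labels = tf.placeholder(tf.int64, [None])
predictions = tf.placeholder(tf.int64, [None])
accuracy, update_op = tf.metrics.accuracy(labels=labels, predictions=predictions)

with tf.Session() as sess:
    # tf.metrics keeps its counters in local variables
    sess.run(tf.local_variables_initializer())
    sess.run(update_op, {labels: [0, 1], predictions: [0, 0]})  # batch 1: 1 of 2 correct
    sess.run(update_op, {labels: [1, 1], predictions: [1, 1]})  # batch 2: 2 of 2 correct
    print(sess.run(accuracy))  # 0.75, aggregated over both batches
```

With these three functions in mind, the complete Softmax classifier looks like this: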

```python
class Softmax(Net):

    def __init__(self, **kwargs):
        self.output_dim = kwargs.get('output_dim', 1)

    def inference(self, data):
        feature_dim = data.get_shape()[1].value

        with tf.name_scope('weights'):
            W = tf.Variable(tf.zeros([feature_dim, self.output_dim]))
        with tf.name_scope('biases'):
            b = tf.Variable(tf.zeros([self.output_dim]), name='bias')
        with tf.name_scope('y'):
            y = tf.matmul(data, W) + b
        with tf.name_scope('probs'):
            probs = tf.nn.softmax(y)

        return {'logits': y, 'probs': probs}

    def loss(self, layers, labels):
        logits = layers['logits']

        with tf.variable_scope('loss'):
            loss = tf.reduce_mean(
                tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))
        return loss

    def metric(self, layers, labels):
        probs = layers['probs']

        with tf.variable_scope('metric'):
            metric, update_op = tf.metrics.accuracy(
                labels=tf.argmax(labels, 1), predictions=tf.argmax(probs, 1))
        return {'update': update_op, 'accuracy': metric}
```

Dataset

In versions of TensorFlow before 1.2, we recommended using multi-threaded, queue-based input pipelines for performance. Beginning with TensorFlow 1.2, however, we recommend using the tf.contrib.data module instead.

Starting with TensorFlow 1.2, TensorFlow provides a new input API based on tf.contrib.data, whose code structure is much cleaner than the original QueueRunner- and Coordinator-based APIs. We therefore use the new API in the Dataset class to read data.

```python
class MNIST(Dataset):

    def __init__(self, **kwargs):
        self.data_dir = kwargs.get('data_dir', None)
        self.split = kwargs.get('split', 'train')
        self.count = kwargs.get('count', None)
        self.buffer_size = kwargs.get('buffer_size', 10000)
        self.batch_size = kwargs.get('batch_size', 50)

        if self.split not in ['train', 'validation', 'test']:
            raise ValueError('unsupported dataset mode!')

        # download mnist data
        images, labels = load_dataset(self.data_dir, self.split)

        # build dataset
        dataset = tf.contrib.data.Dataset.from_tensor_slices((images, labels))
        if self.buffer_size is not None:
            dataset = dataset.shuffle(buffer_size=self.buffer_size)
        dataset = dataset.repeat(self.count)
        dataset = dataset.batch(self.batch_size)

        with tf.name_scope('input'):
            self._iterator = dataset.make_one_shot_iterator()
            self._batch = self._iterator.get_next()

    def batch(self):
        return self._batch

    def shape(self):
        return self._iterator.output_shapes
```

We first read the MNIST images and labels as numpy arrays, then convert them with tf.contrib.data.Dataset.from_tensor_slices into a tf.contrib.data.Dataset. We can then set the number of passes over the dataset (None means repeat indefinitely), the batch size, and whether to shuffle. Finally, we use the simplest make_one_shot_iterator() and get_next() to obtain the network’s basic data unit, the batch. By default, each batch contains 50 images and their corresponding labels.
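One detail worth noting: the load_dataset helper called above is not shown in the article. A minimal sketch, assuming the standard TensorFlow 1.x MNIST tutorial reader, might look like this:

```python
from tensorflow.examples.tutorials.mnist import input_data

def load_dataset(data_dir, split):
    # Hypothetical helper: returns (images, labels) as numpy arrays for one split.
    mnist = input_data.read_data_sets(data_dir, one_hot=True)
    subset = getattr(mnist, split)  # one of 'train', 'validation', 'test'
    return subset.images, subset.labels
```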

Solver

Finally, we introduce the Solver class, which consists of five functions:

- build_optimizer: since the network is simple, we choose the most basic gradient descent algorithm, tf.train.GradientDescentOptimizer, with a fixed learning rate.

- build_train_net and build_test_net serve similar purposes: they connect the data from the Dataset with the network structure from the Net. At the end, we call tf.summary.scalar to add the losses to the summary. TensorFlow provides a powerful visualization module, TensorBoard, which makes it easy to visualize any variable in the summary.

- train: at the beginning of train, we initialize the Graph, Saver, summary and other modules, and write the network structure to the summary with summary_writer.add_graph(tf.get_default_graph()). A tf.Session() is then initialized and the corresponding operations are run through session.run. TensorFlow uses symbolic programming: creating the Graph only builds the computation graph without actually operating on data; the underlying data is only touched when the corresponding operation is run in a Session.

- test: similar to train, test first initializes the various modules, then reads the latest model recorded in the checkpoint file under the model directory and evaluates it on the test set.

```python
class BasicSolver(Solver):

    def __init__(self, dataset, net, **kwargs):
        self.learning_rate = float(kwargs.get('learning_rate', 0.5))
        self.max_steps = int(kwargs.get('max_steps', 2000))
        self.summary_iter = int(kwargs.get('summary_iter', 100))
        self.summary_dir = kwargs.get('summary_dir', 'summary')
        self.snapshot_iter = int(kwargs.get('snapshot_iter', 100000))
        self.snapshot_dir = kwargs.get('snapshot_dir', 'cache')

        self.dataset = dataset
        self.net = net

    def build_optimizer(self):
        with tf.variable_scope('optimizer'):
            train_op = tf.train.GradientDescentOptimizer(self.learning_rate).minimize(
                self.loss)
        return train_op

    def build_train_net(self):
        data, labels = self.dataset.batch()
        self.layers = self.net.inference(data)
        self.loss = self.net.loss(self.layers, labels)
        self.train_op = self.build_optimizer()
        for loss_layer in tf.get_collection('losses') + [self.loss]:
            tf.summary.scalar(loss_layer.op.name, loss_layer)

    def build_test_net(self):
        data, labels = self.dataset.batch()
        self.layers = self.net.inference(data)
        self.metrics = self.net.metric(self.layers, labels)
        self.update_op = self.metrics.pop('update')
        for key, value in self.metrics.iteritems():
            tf.summary.scalar(key, value)

    def train(self):
        self.build_train_net()

        saver = tf.train.Saver(tf.trainable_variables())
        init_op = tf.global_variables_initializer()

        summary_op = tf.summary.merge_all()
        summary_writer = tf.summary.FileWriter(
            os.path.join(self.summary_dir, 'train'))
        summary_writer.add_graph(tf.get_default_graph())

        with tf.Session() as sess:
            sess.run(init_op)
            for step in xrange(1, self.max_steps + 1):
                start_time = time.time()
                sess.run(self.train_op)
                duration = time.time() - start_time

                if step % self.summary_iter == 0:
                    summary, loss = sess.run([summary_op, self.loss])
                    summary_writer.add_summary(summary, step)

                    examples_per_sec = self.dataset.batch_size / duration
                    format_str = ('step %6d: loss = %.4f (%.1f examples/sec)')
                    print(format_str % (step, loss, examples_per_sec))
                    sys.stdout.flush()

                if (step % self.snapshot_iter == 0) or (step == self.max_steps):
                    saver.save(sess, self.snapshot_dir + '/model.ckpt', global_step=step)

    def test(self):
        self.build_test_net()

        saver = tf.train.Saver()
        init_op = [
            tf.global_variables_initializer(),
            tf.local_variables_initializer()
        ]

        summary_op = tf.summary.merge_all()
        summary_writer = tf.summary.FileWriter(
            os.path.join(self.summary_dir, 'test'))
        summary_writer.add_graph(tf.get_default_graph())

        with tf.Session() as sess:
            sess.run(init_op)

            checkpoint = tf.train.latest_checkpoint(self.snapshot_dir)
            if not os.path.isfile(checkpoint + '.index'):
                print("[error]: can't find checkpoint file: {}".format(checkpoint))
                sys.exit(0)
            else:
                print("load checkpoint file: {}".format(checkpoint))
                num_iter = int(checkpoint.split('-')[-1])
            saver.restore(sess, checkpoint)

            while True:
                try:
                    sess.run(self.update_op)
                except tf.errors.OutOfRangeError:
                    results = sess.run([summary_op] + self.metrics.values())
                    summary = results[0]
                    metrics = results[1:]
                    for key, metric in zip(self.metrics.keys(), metrics):
                        print("{}: {}".format(key, metric))
                    summary_writer.add_summary(summary, num_iter)
                    break
```
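To show how the pieces fit together, a driver script (hypothetical; the accompanying repository may organize this differently) could wire the three modules up like this:

```python
# Hypothetical driver wiring Dataset, Net and Solver together.
train_data = MNIST(data_dir='data/mnist', split='train')
solver = BasicSolver(train_data, Softmax(output_dim=10))
solver.train()

# Evaluation runs in a fresh graph; count=1 makes the iterator raise
# OutOfRangeError after one pass over the test split, ending the test loop.
tf.reset_default_graph()
test_data = MNIST(data_dir='data/mnist', split='test', count=1, buffer_size=None)
solver = BasicSolver(test_data, Softmax(output_dim=10))
solver.test()
```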

The following figure shows the network structure and the loss statistics as visualized in TensorBoard. As you can see, TensorBoard provides excellent visualization support for analysis.

The output of the final program is as follows:

```
step    100: loss = 0.3134 (116833.0 examples/sec)
step    200: loss = 0.4800 (113359.6 examples/sec)
step    300: loss = 0.3528 (114410.9 examples/sec)
step    400: loss = 0.2597 (105278.7 examples/sec)
step    500: loss = 0.3301 (106834.0 examples/sec)
step    600: loss = 0.4013 (115992.9 examples/sec)
step    700: loss = 0.3428 (112871.5 examples/sec)
step    800: loss = 0.3181 (113913.7 examples/sec)
step    900: loss = 0.1850 (123507.2 examples/sec)
step   1000: loss = 0.0863 (125653.2 examples/sec)
step   1100: loss = 0.2726 (105703.2 examples/sec)
step   1200: loss = 0.4849 (115736.9 examples/sec)
step   1300: loss = 0.2986 (100582.8 examples/sec)
step   1400: loss = 0.2994 (103973.8 examples/sec)
step   1500: loss = 0.2626 (102500.1 examples/sec)
step   1600: loss = 0.0996 (107712.0 examples/sec)
step   1700: loss = 0.2523 (114912.4 examples/sec)
step   1800: loss = 0.3264 (105703.2 examples/sec)
step   1900: loss = 0.2911 (114975.4 examples/sec)
step   2000: loss = 0.2648 (132312.4 examples/sec)
accuracy: 0.919499993324
```

As you can see, even a simple linear model achieves about 92% accuracy. Digit recognition is presumably not a linearly separable problem, so a more complex nonlinear classifier should yield better results.

MLP classifier

Thanks to the modular design, the modules are largely decoupled from one another, so replacing the Softmax classifier with an MLP is easy: we only need to re-implement the Net layer.

```python
class MLP(Net):

    def __init__(self, **kwargs):
        self.output_dim = kwargs.get('output_dim', 1)

    def inference(self, data):
        with tf.variable_scope('hidden1'):
            hidden1 = linear_relu(data, 128)
        with tf.variable_scope('hidden2'):
            hidden2 = linear_relu(hidden1, 32)
        with tf.variable_scope('softmax_linear'):
            y = linear(hidden2, self.output_dim)
        probs = tf.nn.softmax(y)
        return {'logits': y, 'probs': probs}

    def loss(self, layers, labels):
        logits = layers['logits']
        with tf.variable_scope('loss'):
            loss = tf.reduce_mean(
                tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))
        return loss

    def metric(self, layers, labels):
        probs = layers['probs']
        with tf.variable_scope('metric'):
            metric, update_op = tf.metrics.accuracy(
                labels=tf.argmax(labels, 1), predictions=tf.argmax(probs, 1))
        return {'update': update_op, 'accuracy': metric}


def linear_relu(x, size, wd=0):
    return tf.nn.relu(
        linear(x, size, wd), name=tf.get_default_graph().get_name_scope())


def linear(x, size, wd=0):
    weights = tf.get_variable(
        name='weights',
        shape=[x.get_shape()[1], size],
        initializer=tf.contrib.layers.xavier_initializer())
    biases = tf.get_variable(
        'biases', shape=[size], initializer=tf.constant_initializer(0.0))
    out = tf.matmul(x, weights) + biases
    if wd != 0:
        weight_decay = tf.multiply(tf.nn.l2_loss(weights), wd, name='weight_loss')
        tf.add_to_collection('losses', weight_decay)
    return out
```

To demonstrate the effect of the MLP, we construct a neural network with two hidden layers. The code structure is roughly the same as for the Softmax classifier, so we won’t explain it at length. Because the linear layer appears several times in the network, we abstract it into a reusable function. Also, to make the graph look better in TensorBoard, we place related variables and operations under the same variable scope with tf.variable_scope('hidden1'). This collapses all the related variables and operations into a single expandable node in TensorBoard, giving a better visualization.

The final network structure and running results are as follows:

```
step    100: loss = 0.4675 (49113.6 examples/sec)
step    200: loss = 0.2348 (53200.2 examples/sec)
step    300: loss = 0.1858 (51922.6 examples/sec)
step    400: loss = 0.1935 (49554.6 examples/sec)
step    500: loss = 0.2634 (51552.4 examples/sec)
step    600: loss = 0.1800 (51871.2 examples/sec)
step    700: loss = 0.0524 (51225.0 examples/sec)
step    800: loss = 0.1634 (50606.9 examples/sec)
step    900: loss = 0.1549 (56239.0 examples/sec)
step   1000: loss = 0.1026 (54755.9 examples/sec)
step   1100: loss = 0.0928 (51871.2 examples/sec)
step   1200: loss = 0.0293 (50864.7 examples/sec)
step   1300: loss = 0.1918 (54528.1 examples/sec)
step   1400: loss = 0.1001 (48725.7 examples/sec)
step   1500: loss = 0.1263 (50003.6 examples/sec)
step   1600: loss = 0.0956 (54176.0 examples/sec)
step   1700: loss = 0.1012 (52025.6 examples/sec)
step   1800: loss = 0.3386 (53471.5 examples/sec)
step   1900: loss = 0.1626 (54641.8 examples/sec)
step   2000: loss = 0.0215 (54528.1 examples/sec)
accuracy: 0.970499992371
```

As you can see, a simple neural network with two hidden layers raises the classification accuracy to 97%, much better than the simple linear classifier. This demonstrates how important model choice is to final performance.

Full code download: github.com/Dong–Jian/…

About the author

Jian Dong is a senior data scientist at 360 and a former research scientist at Amazon. He currently focuses on technological innovation in deep learning, reinforcement learning and computer vision, and has rich experience with big data and computer vision. He has repeatedly led teams in world-famous artificial intelligence competitions such as Pascal VOC and ImageNet, winning championships.

During his PhD, he published a number of papers at top international academic conferences and in journals. Since joining 360 at the end of 2015, Jian Dong has participated in and led several computer vision and big data projects as a core technical contributor.

Thanks to Chen Si for curating this article.