This section presents a comparative study of five open source deep learning frameworks: TensorFlow, Caffe, Keras, Torch, and DL4J. The comparison covers four dimensions: hardware support, speed and accuracy, community activity, and industrial capability.

2.3.1 Hardware support

The hardware support studied in this section refers to the support efficiency and overall performance that each open source deep learning framework achieves under different CPU/GPU configurations.





Table 2.1 shows how well each framework supports different hardware in general.

2.3.2 Speed and accuracy

This section measures the sum of gradient computation time, forward propagation time, and backpropagation time, without breaking down the individual components. All tests were run on the CPU.

Model.

In this section, a fully connected neural network (FCNN) is selected as the model for speed testing across the deep learning frameworks. An FCNN is a feedforward multilayer perceptron: connections between neurons are one-directional and contain no recurrent connections, which makes timing data easy to collect. FCNNs are mainly used for data classification, so they are also suitable for comparing accuracy across frameworks.

Data set.

In this section, the MNIST handwritten digit image set is selected as the FCNN dataset for testing the frameworks. The MNIST dataset consists of 60,000 training images and 10,000 test images, all 28×28-pixel handwritten digit images.
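As an illustration, the test setup can be reproduced in a few lines of code. The sketch below assumes Keras with the TensorFlow backend; the hidden-layer size and hyperparameters are illustrative, not the exact values used in the experiments.

    import time
    from keras.datasets import mnist
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.utils import to_categorical

    # Load MNIST: 60,000 training and 10,000 test images of 28x28 pixels.
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train = x_train.reshape(-1, 784).astype('float32') / 255
    x_test = x_test.reshape(-1, 784).astype('float32') / 255
    y_train, y_test = to_categorical(y_train, 10), to_categorical(y_test, 10)

    # A simple FCNN classifier with one hidden layer and Tanh activation.
    model = Sequential([
        Dense(128, activation='tanh', input_shape=(784,)),
        Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='sgd', loss='categorical_crossentropy',
                  metrics=['accuracy'])

    # Training time, prediction time, and accuracy are the measured quantities.
    start = time.time()
    model.fit(x_train, y_train, epochs=10, batch_size=128, verbose=0)
    print('training time: %.1fs' % (time.time() - start))
    print('test accuracy: %.4f' % model.evaluate(x_test, y_test, verbose=0)[1])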

Test method.

The objective of this section is to compare the convergence time of FCNN-type neural networks under different frameworks, as well as the accuracy with which the pre-trained networks predict classification results under different frameworks. The following four aspects are mainly investigated: 1. convergence speed; 2. prediction time; 3. classification accuracy; 4. source code size.

In order to evaluate the scalability of the model, aspects 1-3 above were measured under different scaling factors. The network structure was tested at two scales: 1. keeping the number of neurons per layer fixed while changing the “depth” of the network (see Figure 2.9); 2. keeping the number of layers fixed while changing the “width” of the network (see Figure 2.10). A sketch of how such variants can be generated programmatically follows the two figures.





Figure 2.9 Neural network with “depth” changed





Figure 2.10 Neural network with “width” changed
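The two families of network variants in Figures 2.9 and 2.10 can be generated programmatically. A minimal sketch, again assuming Keras (the specific depths and widths are illustrative):

    from keras.models import Sequential
    from keras.layers import Dense

    def build_fcnn(depth, width, activation='tanh'):
        """FCNN with `depth` hidden layers of `width` neurons each."""
        model = Sequential()
        model.add(Dense(width, activation=activation, input_shape=(784,)))
        for _ in range(depth - 1):
            model.add(Dense(width, activation=activation))
        model.add(Dense(10, activation='softmax'))
        model.compile(optimizer='sgd', loss='categorical_crossentropy',
                      metrics=['accuracy'])
        return model

    # Scale 1: fix the neurons per layer, vary the "depth" (Figure 2.9).
    depth_variants = [build_fcnn(d, 128) for d in (1, 2, 4, 8)]
    # Scale 2: fix the number of layers, vary the "width" (Figure 2.10).
    width_variants = [build_fcnn(2, w) for w in (64, 128, 256, 512)]

Training time, prediction time (model.predict on the test set), and classification accuracy are then recorded for each variant.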

Test results.

Figures 2.11-2.13 show the training time, prediction time, and classification accuracy of FCNN with the Tanh nonlinear activation function under each framework. The number of epochs was set to 10 for all trials.





Figure 2.11 Training time of Tanh-activated FCNN under the condition of changing “depth”





Figure 2.12 Prediction time of Tanh-activated FCNN under the condition of changing “depth”





Figure 2.13 Classification accuracy of Tanh-activated FCNN under the condition of changing “depth”

Similarly, Figures 2.14-2.16 show the training time, prediction time, and classification accuracy of FCNN with the ReLU nonlinear activation function under each framework.





Figure 2.14 Training time of ReLU-activated FCNN under the condition of changing “depth”





Figure 2.15 Prediction time of ReLU-activated FCNN under the condition of changing “depth”





Figure 2.16 Classification accuracy of ReLU-activated FCNN under the condition of changing “depth”

The following experiments investigate the speed and accuracy of FCNN under each framework when the size of the hidden layers (i.e., the number of neurons per layer) is changed, as shown in Figure 2.10. The test results are presented in Figures 2.17-2.19 in the same manner.





Figure 2.17 Training time of ReLU-activated FCNN under the condition of changing “width”





Figure 2.18 Prediction time of ReLU-activated FCNN under the condition of changing “width”





Figure 2.19 Classification accuracy of ReLU-activated FCNN under the condition of changing “width”

We measure the complexity of a deep learning framework by the amount of code required to implement the relevant algorithms, combined with the interface language. The complexity comparison of the frameworks is shown in Table 2.2 and Figure 2.20.





Table 2.2 Complexity of each framework





Figure 2.20 Lines of code representing complexity
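As a reference point, a lines-of-code count can be obtained with a small script such as the following rough sketch; a rigorous comparison would need consistent rules for comments, blank lines, and generated code.

    import os

    def count_lines(root, exts=('.py', '.cc', '.cpp', '.h', '.lua', '.java')):
        """Count non-blank source lines under `root` for the given extensions."""
        total = 0
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                if name.endswith(exts):
                    path = os.path.join(dirpath, name)
                    with open(path, encoding='utf-8', errors='ignore') as f:
                        total += sum(1 for line in f if line.strip())
        return total

    print(count_lines('tensorflow'))  # path to a local framework checkout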

2.3.3 Community activity

Speed is an important metric for evaluating open source deep learning frameworks, and so are the number of contributors to each framework and the activity of its open source community. Whether for academic research or for industrial development and deployment, community activity directly affects the cost of acquiring knowledge and developing solutions.

The Watch, Star, and Fork counts of a GitHub project reflect the activity level of each deep learning framework (see Figures 2.21-2.23). Watch is the number of users subscribed to notifications for the repository, Star is the number of community users who have endorsed the framework, and Fork is the number of copies made of the repository.
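These counts can be read directly from the GitHub REST API. A minimal sketch (the repository paths below are the well-known official ones, but some have since moved, and the counts change daily):

    import requests

    repos = ['tensorflow/tensorflow', 'BVLC/caffe', 'keras-team/keras',
             'torch/torch7', 'deeplearning4j/deeplearning4j']

    for repo in repos:
        r = requests.get('https://api.github.com/repos/' + repo).json()
        # subscribers_count is the "Watch" number shown on GitHub;
        # stargazers_count and forks_count are the Star and Fork numbers.
        print(repo, r['subscribers_count'], r['stargazers_count'],
              r['forks_count'])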





Figure 2.21 Watch count of each open source deep learning framework in the GitHub community





Figure 2.22 Star count of each open source deep learning framework in the GitHub community





Figure 2.23 Fork count of each open source deep learning framework in the GitHub community

Stepping beyond the deep learning frameworks themselves and searching GitHub for projects, notes, and discussions based on each framework, Figures 2.24-2.26 show the level of activity of projects built on each framework.





Figure 2.24 Number of GitHub projects based on each open source deep learning framework





Figure 2.25 Commits in GitHub projects based on each open source deep learning framework





Figure 2.26 Discussions in the GitHub community based on each open source deep learning framework

2.3.4 Industrial capability

Open source deep learning frameworks not only provide strong support for academic research but also offer numerous solutions for industrial tasks. This section measures the industrial performance of the open source frameworks in terms of model expressiveness, interfaces, deployment, performance, and architecture.





Figure 2.27 Languages supported by each framework





Table 2.3 Industrial capability score of each framework (GitHub)

Network and modeling capabilities

Caffe is the most popular toolkit in computer vision, with many extensions, but it has poor support for recurrent networks and language modeling. In addition, layers in Caffe must be defined in C++, while networks are defined using protobuf.

TensorFlow offers an ideal RNN API and implementation, and its graph of vector operations makes it quite easy to specify new networks. However, it does not support bidirectional RNNs or 3D convolutions, and its RNN implementation relies on Python loops, so it cannot benefit from graph-compilation optimization.
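For illustration, a minimal sketch of this graph-of-operations style, assuming the TensorFlow 1.x Python API (a softmax layer built from vector operations and executed in a session):

    import numpy as np
    import tensorflow as tf

    # The network is declared symbolically as a graph of vector operations...
    x = tf.placeholder(tf.float32, shape=(None, 784))
    W = tf.Variable(tf.zeros((784, 10)))
    b = tf.Variable(tf.zeros(10))
    y = tf.nn.softmax(tf.matmul(x, W) + b)

    # ...and nothing runs until the graph is executed in a session.
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        probs = sess.run(y, {x: np.random.rand(2, 784).astype('float32')})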

Theano supports most advanced networks, and many research ideas originated in Theano, which pioneered the trend of using symbolic graphs to program networks. Theano’s symbolic API supports loop control (scan), which makes RNN implementation easier and more efficient.
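For example, a simple RNN recurrence can be written with Theano’s scan; the sketch below keeps the dimensions and parameter handling deliberately simple.

    import numpy as np
    import theano
    import theano.tensor as T

    X = T.matrix('X')    # input sequence, shape (time_steps, input_dim)
    h0 = T.vector('h0')  # initial hidden state, shape (hidden_dim,)
    W = T.matrix('W')    # input-to-hidden weights
    U = T.matrix('U')    # hidden-to-hidden weights

    def step(x_t, h_prev, W, U):
        # One RNN step: h_t = tanh(x_t W + h_{t-1} U)
        return T.tanh(T.dot(x_t, W) + T.dot(h_prev, U))

    # scan compiles the loop into the symbolic graph itself,
    # which is what makes RNNs efficient in Theano.
    h_seq, _ = theano.scan(step, sequences=X, outputs_info=h0,
                           non_sequences=[W, U])
    rnn = theano.function([X, h0, W, U], h_seq)

    h = rnn(np.random.randn(5, 3), np.zeros(4),
            np.random.randn(3, 4), np.random.randn(4, 4))  # shape (5, 4)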

Torch’s support for convolutional networks is excellent, and its native interface for temporal convolution makes it intuitive to use. Torch supports a large number of RNNs through unofficial extensions, and there are many ways to define a network. However, Torch essentially defines networks in terms of layers, and this coarse-grained approach limits its support for new layer types. Compared with Caffe, though, defining a new layer in Torch is very easy: no C++ programming is required, and the difference between defining a layer and defining a network is minimal.

Interface

Caffe supports the pycaffe interface, but only as an aid to the command-line interface; even with pycaffe, models must be defined using protobuf.
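A sketch of this workflow using pycaffe’s NetSpec helper, which assembles the protobuf definition from Python (the layer sizes and LMDB path are illustrative, following the style of Caffe’s MNIST example):

    import caffe
    from caffe import layers as L, params as P

    n = caffe.NetSpec()
    # Even from Python, the network is assembled into a protobuf message.
    n.data, n.label = L.Data(batch_size=64, backend=P.Data.LMDB,
                             source='mnist_train_lmdb', ntop=2,
                             transform_param=dict(scale=1. / 255))
    n.ip1 = L.InnerProduct(n.data, num_output=500,
                           weight_filler=dict(type='xavier'))
    n.relu1 = L.ReLU(n.ip1, in_place=True)
    n.ip2 = L.InnerProduct(n.relu1, num_output=10,
                           weight_filler=dict(type='xavier'))
    n.loss = L.SoftmaxWithLoss(n.ip2, n.label)

    with open('fcnn.prototxt', 'w') as f:
        f.write(str(n.to_proto()))  # the prototxt that Caffe actually consumes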

TensorFlow supports both Python and C++ interfaces. Users can experiment in a relatively rich high-level environment and deploy models in environments that require native code or low latency.

Theano supports the Python interface.

Torch runs on LuaJIT, which is very fast (comparable to industrial languages such as C++, C#, and Java), so users can write almost any computation without worrying about performance; however, Lua is not a mainstream language.

Deployment model

Caffe is C++-based, compiles on multiple devices, and is cross-platform, making it an ideal choice for deploying projects.

TensorFlow supports a C++ interface and can be compiled and optimized for the ARM architecture. Users can deploy trained models on a variety of devices without implementing a separate model decoder or loading a Python/LuaJIT interpreter.

Theano lacks a low-level interface, and its Python interpreter is inefficient.

Running Torch models requires LuaJIT support, which is a major barrier to integration.

Performance

Caffe is simple and fast.

At the time of comparison, TensorFlow supported only cuDNN v2, yet even against Torch with the same cuDNN v2 its performance was about 1.5 times slower, and it ran out of memory when training GoogLeNet with a batch size of 128.

Theano performs on par with Torch7 on large networks, but its start-up time is too long because C/CUDA code must be compiled to binary. In addition, Theano’s import is time-consuming, and once imported it is locked to the pre-configured device.

Torch performs very well and suffers from none of the problems described above for TensorFlow and Theano.

Architecture

Caffe’s main disadvantage is that layers must be defined in C++ while models must be defined in protobuf. In addition, to support both CPUs and GPUs, users must implement extra functions, and for custom layer types an ID must be assigned and added to the proto file.

TensorFlow’s architecture is clear, modular and supports multiple front-end and execution platforms.

Theano’s entire code base is in Python, with even the C/CUDA code packaged as Python strings, which makes it difficult to navigate, debug, refactor, and maintain.

Torch7 and its nn class library have a clear design and modular interfaces.

2.3.5 Conclusion

1. Hardware utilization of each deep learning framework:

Torch achieves the highest utilization with multi-threaded CPUs;

TensorFlow is the most flexible and usable with multiple GPUs.

2. Speed of each deep learning framework:

Under the condition of changing the network “depth”, Keras has the fastest training speed and TensorFlow the fastest prediction response speed.

Under the condition of changing the network “width”, Caffe has the fastest training speed, while Keras has the fastest prediction response speed, followed by TensorFlow.

3. Accuracy of each deep learning framework:

When the network “depth” changes, the classification accuracy of TensorFlow and Torch decreases as the “depth” increases.

When the network “width” changes, TensorFlow’s classification accuracy is relatively stable, exceeding that of Caffe and Torch.

Regardless of whether the network “depth” or “width” changes, Keras’s classification accuracy is very stable and superior to that of the other frameworks; it has the best prediction accuracy.

4. Community activity of each deep learning framework:

TensorFlow can be called the “most popular” and “most recognized” open source deep learning framework: its numbers of Stars, Forks, and TensorFlow-based projects retrieved on GitHub far exceed those of the other frameworks, even combined.

5. Industrial capability of each deep learning framework:

Caffe has excellent model expressiveness and industrial deployment capability, especially in computer vision, but poor support for RNNs and language modeling. Caffe is suitable for vision tasks, especially industrial deep learning projects, and its production stability is undisputed; but its lack of flexibility makes changing a network structure more laborious than in other frameworks, and its poor documentation makes its code harder to read than that of other frameworks.

TensorFlow has good model expressiveness, excellent interfaces, and a clear internal architecture, making it suitable for industrial project deployment, although its speed is not an advantage. TensorFlow supports distributed computing to maximize hardware performance, and its code readability and community presence suit both academic research and industrial production;

Keras has good speed performance and model expressiveness, and it is simple and easy to use: a neural network can be built in just a few lines of code. Keras is fully documented, making it easy to learn and use even without deep familiarity with Python. It is better suited to academic research, experiments, and lightweight industrial tasks (such as feature extraction);

Torch has excellent speed, but it is based on Lua;

DL4J runs on the JVM and works with Java, Clojure, and Scala.