Author | Microsoft Cortana AI and ML team
Translator | Debra
Editor | Emily
AI Front introduction: There are many popular deep learning frameworks in the community: TensorFlow, Julia, CNTK, and more. Which is the most convenient and efficient to use? Microsoft's Cortana AI and ML teams compared several mainstream deep learning frameworks and made the results available on GitHub. So what were the results of the tests?







Repo 1.0 on GitHub

https://github.com/ilkarman/DeepLearningFrameworks

We think of deep learning frameworks like languages: many people speak English, of course, but each language serves its own purpose. We have created common code for several different network structures and executed it across many different frameworks. The idea is to create a Rosetta Stone of deep learning frameworks, allowing people to move freely between them. This also helps when a paper publishes code in an entirely new framework or language: rather than writing the model from scratch in your favorite framework, it may be easier to simply work in the "foreign" one.

We would like to thank the CNTK, PyTorch, Chainer, Caffe2, and Knet teams, and all our friends from the open source community, for their contributions to this repo over the past few months.

In summary, our goal in releasing this repo is to create:

  1. A Rosetta Stone of deep learning frameworks that allows data scientists to easily apply their expertise across frameworks.

  2. A set of GPU code optimized using the latest, highest-level APIs.

  3. A common setup for comparing different GPUs (and possibly CUDA versions and precisions).

  4. A common setup for comparing languages (Python, Julia, R).

  5. The ability to verify the expected performance of the framework you are using.

  6. Collaboration between different open source communities.

Deep learning framework test results

In the following sections, we review the training times for each CNN model, feature extraction with a pre-trained ResNet50 model, and the training times for each RNN model. Our experiments were conducted on Azure Deep Learning Virtual Machines using a K80 and the newer P100 GPU.

Training time (in seconds): CNN (VGG-style, 32-bit) trained on CIFAR-10 — image recognition

The model input is the standard CIFAR-10 dataset, containing 50,000 training images and 10,000 test images evenly divided across 10 classes. Each 32×32 image was treated as a (3, 32, 32) tensor, with pixel intensities rescaled from 0–255 to 0–1.
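For illustration, here is a minimal sketch of this preprocessing (our own, not the repo's exact code), using Keras's CIFAR-10 loader and NumPy:

```python
import numpy as np
from keras.datasets import cifar10

# Load CIFAR-10: x_train is (50000, 32, 32, 3) uint8, x_test is (10000, 32, 32, 3)
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Reorder to channels-first (3, 32, 32) tensors and rescale 0-255 -> 0-1
x_train = x_train.transpose(0, 3, 1, 2).astype(np.float32) / 255.0
x_test = x_test.transpose(0, 3, 1, 2).astype(np.float32) / 255.0

print(x_train.shape, x_train.min(), x_train.max())  # (50000, 3, 32, 32) 0.0 1.0
```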

Caffe2: https://github.com/ilkarman/DeepLearningFrameworks/blob/master/notebooks/Caffe2_Inference.ipynb

Chainer: https://github.com/ilkarman/DeepLearningFrameworks/blob/master/notebooks/Chainer_Inference.ipynb

CNTK: https://github.com/ilkarman/DeepLearningFrameworks/blob/master/notebooks/CNTK_Inference.ipynb

Keras (CNTK): https://github.com/ilkarman/DeepLearningFrameworks/blob/master/notebooks/Keras_CNTK_Inference.ipynb

Keras (TF): https://github.com/ilkarman/DeepLearningFrameworks/blob/master/notebooks/Keras_TF_Inference.ipynb

TensorFlow: https://github.com/ilkarman/DeepLearningFrameworks/blob/master/notebooks/Tensorflow_Inference.ipynb

MXNet: https://github.com/ilkarman/DeepLearningFrameworks/blob/master/notebooks/MXNet_Inference.ipynb

PyTorch: https://github.com/ilkarman/DeepLearningFrameworks/blob/master/notebooks/PyTorch_Inference.ipynb

Julia – Knet: https://github.com/ilkarman/DeepLearningFrameworks/blob/master/notebooks/Knet_Inference.ipynb

Average inference time (in seconds) for 1,000 images: ResNet-50 — feature extraction

A pre-trained ResNet50 model is loaded and truncated just after the final average pooling layer (which ends at (7, 7)), outputting a 2048-dimensional vector. A softmax layer or another classifier (such as a boosted tree) can be plugged in here for transfer learning. Allowing for a warm start, only this forward pass up to the avg_pool layer is timed.
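As a rough illustration of such a truncated feature extractor (a sketch of our own, not the repo's code), the Keras applications API can pool ResNet50's final (7, 7) feature map into a 2048-D vector:

```python
import numpy as np
from keras.applications.resnet50 import ResNet50, preprocess_input

# Pre-trained ResNet50 without the classification head; global average
# pooling collapses the final (7, 7, 2048) feature map to a 2048-D vector.
model = ResNet50(weights='imagenet', include_top=False, pooling='avg')

# Dummy batch of 224x224 RGB images (channels-last here, Keras's default)
images = preprocess_input(np.random.rand(8, 224, 224, 3) * 255.0)

features = model.predict(images)
print(features.shape)  # (8, 2048) -- ready for a softmax layer or boosted tree
```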

Note: the batch size is held constant, but filling the GPU's RAM would further improve performance (on GPUs with more memory).

Training time (in seconds): RNN (GRU) on IMDB — sentiment analysis

The model input is the standard IMDB movie review dataset, containing 25,000 training reviews and 25,000 test reviews, evenly divided into two classes (positive/negative). We follow Keras's approach, where the start character is set to 1, out-of-vocabulary words (with a 30k vocabulary size) are represented as 2, and word indices therefore start at 3. Each review is zero-padded or truncated to 150 words.
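A minimal sketch of this preprocessing using Keras's built-in IMDB loader (our illustration; the loader's defaults match the conventions described above):

```python
from keras.datasets import imdb
from keras.preprocessing.sequence import pad_sequences

# Keras convention: start char = 1, out-of-vocab = 2, word indices from 3
(x_train, y_train), (x_test, y_test) = imdb.load_data(
    num_words=30000, start_char=1, oov_char=2, index_from=3)

# Zero-pad / truncate every review to a fixed length of 150 words
x_train = pad_sequences(x_train, maxlen=150)
x_test = pad_sequences(x_test, maxlen=150)

print(x_train.shape, x_test.shape)  # (25000, 150) (25000, 150)
```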

* indicates the experiment was not run.

Lessons learned
  1. Use auto-tuning: most frameworks use cuDNN's cudnnFindConvolutionForwardAlgorithm() to run an exhaustive search and optimize the algorithm used for the forward pass of convolutions on fixed-size images. This is usually enabled by default, but some frameworks require a manual flag, e.g. torch.backends.cudnn.benchmark = True (see the sketch after this list).

  2. Use cuDNN whenever possible: for vanilla RNNs (e.g. GRU/LSTM), it is usually possible to call a cuDNN wrapper (https://devblogs.nvidia.com/optimizing-recurrent-neural-networks-cudnn-5/) to improve speed, e.g. cudnn_rnn.CudnnGRU() instead of rnn.GRUCell(). The downside of this approach, however, is that it can make later inference on a CPU harder.

  3. Match shapes: when running on cuDNN, matching the native channel ordering of NCHW for CNNs and TNC for RNNs avoids time wasted on reshapes and lets the framework go straight to the matrix multiplications.

  4. Native data generators: use the framework's native generators, which perform augmentation and even preprocessing (e.g. shuffling) asynchronously, for a speed boost.

  5. For inference, be sure to set flags that avoid unnecessary gradient computation where possible, and ensure that batch-norm and dropout layers are applied correctly; a sketch follows this list.
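To make lessons 1 and 5 concrete, here is a minimal sketch of our own in PyTorch (the flag and mode names are PyTorch-specific; other frameworks expose equivalents):

```python
import torch
import torch.nn as nn

# Lesson 1: let cuDNN auto-tune the best convolution algorithm for
# fixed-size inputs (exhaustive search on the first batch, fast afterwards).
torch.backends.cudnn.benchmark = True

model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.Dropout(0.25),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, 10),
)

# Lesson 5: eval() switches batch-norm to its running statistics and
# disables dropout; no_grad() skips gradient bookkeeping during inference.
model.eval()
with torch.no_grad():
    out = model(torch.rand(8, 3, 32, 32))
print(out.shape)  # torch.Size([8, 10])
```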

Initially, we had to use a number of tips and tricks to create this repo, to ensure we were using the same model across frameworks and that it ran optimally. However, these frameworks have evolved at an incredible rate over the past few months, and many have been updated, so many of the optimizations we developed in 2017 are now outdated.

For example, Keras with a TF backend used to hard-code channel ordering as channels-last (which is not optimal for cuDNN), so specifying channels-first meant reshaping back to the hard-coded value after every batch, slowing training considerably. Keras with the TF backend now supports native channels-first ordering. Similarly, TensorFlow could previously be sped up by setting a flag to use the Winograd algorithm for convolutions, but that approach no longer helps. If you are interested, check out the results in our earlier repo.
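For reference, the channel ordering in Keras can be selected through its backend API (a minimal sketch; nothing here is specific to the repo):

```python
from keras import backend as K

# Request channels-first (NCHW) tensors, the ordering cuDNN prefers,
# instead of Keras's default channels-last (NHWC).
K.set_image_data_format('channels_first')
print(K.image_data_format())  # 'channels_first'
```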

By implementing end-to-end solutions in different frameworks, we can compare the frameworks in a number of ways. Since each framework uses the same model architecture and data, the accuracy across frameworks is roughly the same (indeed, this is one way we test our code, to ensure the different frameworks really do use the same model!). In addition, the notebooks were developed for ease of comparison across frameworks, not necessarily for speed.

Of course, while it is easy to compare frameworks in terms of training speed, inference time, and so on, the results are not meant to suggest anything about the overall performance of any framework, since this approach omits comparisons along important dimensions such as: help and support, availability of pre-trained models, custom layers and architectures, data loaders, debugging, supported platforms, distributed training, and more! It only shows how to create the same network in different frameworks and how the frameworks perform on these examples.

Travel companions for deep learning frameworks

There are many popular deep learning frameworks in the community, and they help AI developers and data scientists solve problems in a variety of situations. Among them, the open-source Open Neural Network Exchange (ONNX, https://github.com/onnx/onnx) has become a standard for deep learning model interoperability across frameworks. For example, ONNX is useful when you develop a model in one framework but need to evaluate it in another. Similarly, MMdnn (https://github.com/Microsoft/MMdnn) is a set of tools that helps users convert models directly between frameworks and visualize model architectures.
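As a minimal sketch of the first scenario (our own illustration, using PyTorch's built-in ONNX exporter; the model is just a stand-in):

```python
import torch
import torch.nn as nn

# A stand-in model developed in PyTorch
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.Flatten(), nn.Linear(16 * 32 * 32, 10))
model.eval()

# Export to ONNX with a dummy input that fixes the expected shape;
# the .onnx file can then be loaded for evaluation in another framework.
dummy = torch.rand(1, 3, 32, 32)
torch.onnx.export(model, dummy, "model.onnx")
```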

Travel companions for deep learning frameworks such as ONNX and MMdnn are like automatic machine translators. By contrast, the full 1.0 version of the repo we are releasing today is like a Rosetta Stone for deep learning frameworks, showing the model-building process end to end in each framework. Many hands make light work, and the combined effort will help all deep learning developers "swim" better in a multilingual environment.

Original link:

https://blogs.technet.microsoft.com/machinelearning/2018/03/14/comparing-deep-learning-frameworks-a-rosetta-stone-approach/


