Yesterday, Facebook launched Caffe2, an open source deep learning framework built for performance, speed, and modularity. It follows many of Caffe's design choices while addressing bottlenecks identified over years of using and deploying Caffe, and it opens the door to algorithmic experimentation and new products. Forged by Facebook's in-house needs for scale and performance across deep learning and augmented reality tasks, Caffe2 also delivers impressive new capabilities for mobile apps, such as advanced cameras and instant messaging. NVIDIA, a Caffe2 development partner, is launching a series of blog posts on deep learning with Caffe2. This first post introduces Caffe2's deep learning basics, demonstrates its flexibility and speed, explains why you would want to use Caffe2 and what sets it apart from Caffe, and walks through a Caffe2 use case with a pre-trained object classification model.


Write once, run anywhere


While maintaining scalability and high performance, Caffe2 also emphasizes portability. Portability often comes with overhead: how does the framework behave across many different platforms, and how does that overhead affect its ability to scale? Caffe2 takes this into account; it was designed from the very beginning with performance, scalability, and mobile deployment as its main objectives. Caffe2's core C++ library provides speed and portability, while its Python and C++ APIs make it easy to prototype, train, and deploy on Linux, Windows, iOS, Android, and even Raspberry Pi and NVIDIA Tegra. You may ask: what about the Internet of Things? Caffe2 will be suitable for a large number of devices; although you wouldn't want to train your network on an IoT device, you can deploy trained models onto it.


Caffe2 also doesn't miss the opportunity when a GPU is available. Thanks to the partnership between Facebook and NVIDIA, Caffe2 takes full advantage of the NVIDIA GPU deep learning platform. Caffe2 uses the latest NVIDIA Deep Learning SDK libraries (cuDNN, cuBLAS, and NCCL) for high-performance, multi-GPU accelerated training and inference.


Most of the built-in functions switch seamlessly between CPU and GPU execution depending on the running state. This means you get GPU-accelerated deep learning without extra programming. This brings us to another exciting aspect of Caffe2: multi-GPU and multi-host processing. Caffe2 makes parallel network training easy, so experimenting and scaling up are now straightforward.
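As a concrete illustration, here is a minimal sketch of device placement in Caffe2's Python API. It assumes a CUDA-enabled build, and the ConstantFill operator is just an arbitrary example:

from caffe2.python import core, workspace
from caffe2.proto import caffe2_pb2

# True when Caffe2 was built with CUDA and can see a GPU
print workspace.has_gpu_support, workspace.NumCudaDevices()

# operators created inside a DeviceScope run on the chosen device;
# the same code without the scope runs on the CPU instead
with core.DeviceScope(core.DeviceOption(caffe2_pb2.CUDA, 0)):
    op = core.CreateOperator("ConstantFill", [], ["ones"], shape=[4], value=1.0)
workspace.RunOperatorOnce(op)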


The most recent ImageNet training benchmark uses 64 of the latest NVIDIA GPUs and the ResNet-50 neural network architecture. Facebook engineers used Caffe2's data_parallel_model (github.com/caffe2/caff…) to run distributed neural network training on eight of Facebook's Big Basin AI servers, each equipped with eight NVIDIA Tesla P100 GPU accelerators, for a total of 64 GPUs. Figure 1 shows the scaling results on these systems: near-linear deep learning training scaling with a 57x throughput speedup.
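Schematically, the multi-GPU setup with data_parallel_model looks something like the sketch below, heavily abridged from the resnet50 trainer in the Caffe2 repo. The three builder functions (add_image_input, create_resnet50, add_param_update) are hypothetical stand-ins for code elided here, and exact module and argument names can vary across Caffe2 versions:

from caffe2.python import model_helper, data_parallel_model

train_model = model_helper.ModelHelper(name="resnet50_train")

# add_image_input, create_resnet50, and add_param_update stand in for
# the builder functions defined in resnet50_trainer.py; each one adds
# its piece of the graph once per GPU
data_parallel_model.Parallelize_GPU(
    train_model,
    input_builder_fun=add_image_input,          # reads a shard of the data
    forward_pass_builder_fun=create_resnet50,   # builds the ResNet-50 ops
    param_update_builder_fun=add_param_update,  # parameter updates, synced via NCCL
    devices=range(8),                           # one model replica per GPU
)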


Figure 1: Caffe2 scaling factors for training with the ResNet-50 model on up to 64 NVIDIA Tesla P100 GPU accelerators




Caffe2’s new features


You may remember that in Caffe everything is represented as a "net" made up of "layers" that define computation in a neural-network-centric way. However, this creates a very rigid computing pattern and leads to many hard-coded routines, especially in training deep neural networks.


Caffe2 uses a more modern computation graph to represent neural networks, or other computations such as clustering and data compression. The graph is built from operators: each operator contains the logic necessary to compute its outputs, given the appropriate number and type of inputs and parameters. Whereas layers in Caffe always operate on tensors (matrices or multi-dimensional arrays), operators in Caffe2 can take and produce "blobs" containing arbitrary objects. This design makes many things possible that were not possible in Caffe:


  • Distributed training of CNNs can be represented by a single computation graph, whether training happens on one GPU, multiple GPUs, or multiple machines. This is critical for Facebook-scale deep learning applications.

  • Easy heterogeneous computation on specialized hardware. For example, on iOS, a Caffe2 graph can take an image from the CPU, convert it to a Metal GPU buffer object, and keep the computation entirely on the GPU for maximum throughput.

  • Better management of run-time resources, such as optimizing static memory with memonger, or pre-packing trained networks for optimal performance.

  • Mixed-precision computation with float, float16, int8, and other quantized models.


Caffe2 has more than 400 operators offering a wide range of functionality. You can browse the Operators Catalogue, look at Sparse Operations, and learn how to write Custom Operators.
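For a flavor of the operator abstraction, here is a minimal sketch that builds and runs a single Relu operator through the Python API (assuming a working Caffe2 install):

import numpy as np
from caffe2.python import core, workspace

# feed an input blob into the default workspace
workspace.FeedBlob("X", np.random.randn(2, 3).astype(np.float32))

# an operator is a declarative spec: type, input blobs, output blobs
op = core.CreateOperator("Relu", ["X"], ["Y"])
workspace.RunOperatorOnce(op)

print workspace.FetchBlob("Y")  # the rectified output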


Installation and Setup


The first thing you should do is visit Caffe2's GitHub home page and clone or fork the project's repo:

git clone https://github.com/caffe2/caffe2.git

If you can't install Caffe2 that way, check the Install guide, try the Docker image at hub.docker.com/r/caffe2ai/…, or run it on your cloud provider of choice; the documentation provides instructions for each option. However, we recommend validating GPU processing speed by spinning up a GPU-enabled cloud instance. Here is a quick way to test a GPU-enabled Caffe2 build with Docker:

nvidia-docker run -it caffe2ai/caffe2 python -m caffe2.python.operator_test.relu_op_test

Try a pre-trained model


Now let's give it a try! In this first tutorial I'll show you how easy it is to use models from Caffe2's Model Zoo, along with the model downloader that helps you experiment with other models yourself. Model Zoo link: Caffe2 Model Zoo


Use Caffe2’s model downloader


This is a download module (github.com/caffe2/caff…) that you can use to get pre-trained networks. You can incorporate the module into your scripts, or use it from the command line:

python -m caffe2.python.models.download <model name>

For example, this command downloads the pre-trained SqueezeNet model:

python -m caffe2.python.models.download squeezenet

Once you've downloaded SqueezeNet, you can load it. The downloader module has an install option that you can turn on with -i; otherwise, you'll need to move the files yourself after downloading. Once installed, you can import these models directly into your Python scripts:

python -m caffe2.python.models.download -i squeezenet
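Once installed, the import amounts to a couple of lines (a minimal sketch; these attribute names match the squeezenet module used later in this post):

from caffe2.python.models import squeezenet

init_net = squeezenet.init_net        # the model's weights
predict_net = squeezenet.predict_net  # the network definition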

Run a pre-trained model: object classification


Let's try object classification with Caffe2. This is easy to do if you've downloaded a pre-trained model. If you haven't downloaded SqueezeNet yet, you can get it with the method above, or fetch the init_net.pb and predict_net.pb files from S3:


init_net.pb: s3.amazonaws.com/caffe2/mode…

predict_net.pb: s3.amazonaws.com/caffe2/mode…

Put the downloaded files in the $PYTHONPATH/caffe2/python/models/squeezenet folder. Your Python code uses Caffe2's workspace to hold the model's protobufs and weights, loading them into the blobs init_net and predict_net. A workspace.Predictor takes the two protobufs, and Caffe2 handles the rest: its simple run function takes an image, analyzes it, and returns a tensor with the results.

# load up the caffe2 workspace
from caffe2.python import workspace

# choose your model here (use the downloader first)
from caffe2.python.models import squeezenet as mynet

# helper image processing functions
import caffe2.python.tutorials.helpers as helpers

# load the pre-trained model
init_net = mynet.init_net
predict_net = mynet.predict_net

# you must name it something
predict_net.name = "squeezenet_predict"
workspace.RunNetOnce(init_net)
workspace.CreateNet(predict_net)
p = workspace.Predictor(init_net.SerializeToString(), predict_net.SerializeToString())

# use whatever image you want (local files or urls)
img = "https://upload.wikimedia.org/wikipedia/commons/thumb/7/7b/Orange-Whole-%26-Split.jpg/1200px-Orange-Whole-%26-Split.jpg"
img = "https://upload.wikimedia.org/wikipedia/commons/a/ac/Pretzel.jpg"
img = "https://cdn.pixabay.com/photo/2015/02/10/21/28/flower-631765_1280.jpg"

# average mean to subtract from the image
mean = 128

# the size of images that the model was trained with
input_size = 227

# use the image helper to load the image and convert it to NCHW
img = helpers.loadToNCHW(img, mean, input_size)

# submit the image to net and get a tensor of results
results = p.run([img])
response = helpers.parseResults(results)

# and lookup our result from the list
print response

The result is a tensor (a multidimensional array) of probabilities. In essence, each row represents the probability that the object matches one of the categories the neural network recognizes.


Note that once workspace.Predictor has been called to load the pre-trained model, the next step is to call .run and pass it an array of images:

p = workspace.Predictor(init_net, predict_net)
results = p.run([img])

Image preprocessing


For processing speed and for legacy reasons, images need to go through two transformations before being fed into Caffe2:


1. Convert the color order from RGB to BGR.

2. Pack the image into a pixel array annotated with the Number of images in the batch (here, 1), the Channels (BGR-ordered), the Height, and the Width; this layout is called NCHW for Number, Channels, Height, and Width. A minimal sketch of both steps follows.
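As a rough sketch of what the helper does, the two transformations plus resizing and mean subtraction can be written with numpy and scikit-image as follows. Note that load_to_nchw is a hypothetical stand-in for the tutorial's helpers.loadToNCHW, which may differ in detail:

import numpy as np
import skimage.io
import skimage.transform

def load_to_nchw(path_or_url, mean, input_size):
    # load as RGB floats in [0, 1] and resize to the model's input size
    img = skimage.img_as_float(skimage.io.imread(path_or_url))
    img = skimage.transform.resize(img, (input_size, input_size))
    img = img * 255 - mean            # rescale to [0, 255], subtract the mean
    img = img[:, :, (2, 1, 0)]        # step 1: RGB -> BGR
    img = img.transpose((2, 0, 1))    # step 2: HWC -> CHW
    return img[np.newaxis, :, :, :].astype(np.float32)  # add batch dim: NCHW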


These image preprocessing functions are handled by a helper module, so you can focus on the Caffe2-specific interactions. For a more in-depth look at image preprocessing, see the related IPython notebook: Caffe2 / Caffe2


Get the results


When the model finishes processing the image array, you get a multidimensional array of shape (1, 1, 1000, 1, 1):

import numpy as np

results = np.asarray(results)
print "results shape: ", results.shape

results shape:  (1, 1, 1000, 1, 1)

See that 1000 in results.shape? If there were more than one image in the batch, the array would be larger, but it would still have 1000 entries in that middle dimension: one probability for each category in the pre-trained model. So when you query the results, it's like asking, "Computer, what's the probability that this is a daisy? A bee? Or one of the 998 other categories the model was trained to recognize?"
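To make that concrete, here is a minimal sketch of pulling the top three predictions out of the tensor by hand; this is essentially what the tutorial's helpers.parseResults does for you:

import numpy as np

probs = np.asarray(results).reshape(1000)  # one probability per category
top3 = probs.argsort()[::-1][:3]           # category indices, best match first
for idx in top3:
    print idx, probs[idx]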


Here are the top three results extracted from the tensor of 1,000 values, condensed and sorted. They are ranked by match probability, with 0.98222 (98%) the highest.


[array([985.0, 0.9822268486022949], dtype=object), array([309.0, 0.011943698860704899], dtype=object), array([946.0, 0.004810151644051075], dtype=object)]

These are the top three categories in order of probability, indicating how likely it is that the detected object falls into each category. You can use this gist to inspect the results: gist.githubusercontent.com/aaronmarkha… Each time you run this example, you may get slightly different results. Run a picture of some daisies and the model returns:


In addition to the correct answer having the highest probability, the second and third guesses were bee and thistle, which makes sense given that bees and flowers often appear in the same photos.


Feed in a photo of a sliced orange and you get orange (95.3%), lemon (4.6%), and strawberry (0.006%).


Caffe2 collaboration and sharing


Caffe2 is developed by a community of developers, researchers, and companies interested in deep learning, many of whom have used Caffe and other open source machine learning tools. By open-sourcing Caffe2 and collaborating on the Model Zoo, we hope to advance the science of artificial intelligence and spread its benefits across industries. Caffe2 project members can contribute directly to the Caffe2 GitHub wiki, which lists all the models: Caffe2 / Caffe2.


We also invite developers, researchers, and anyone interested in creating or fine-tuning models to share them on the Caffe2 GitHub issues page, Caffe2 / Caffe2, and to request additions to the Zoo. The GitHub issues section is not just for Caffe2 developers: if you build a Caffe2 model, improve a pre-trained model, or even just use one, you can offer input, suggestions, and contributions there and to the Model Zoo. Further information about Caffe2 and the Model Zoo collaboration can be found at Caffe2.


Learn about Caffe2 at the GTC conference


This concludes our first Parallel Forall blog post on Caffe2. The next post will take an in-depth look at training ImageNet with Caffe2, presenting some exciting new benchmarks for distributed training, tips for optimizing Caffe2 training, and details on how to use Caffe2's data parallel model.


In addition, GTC, the annual conference for AI and GPU developers, takes place in San Jose from May 8 to 11, and Machine Heart will be there as invited media to report the highlights.

Compiled from devblogs.nvidia.com by Machine Heart