Google Developer Days (GDD) is a global event that showcases Google's latest developer products and platforms. It is designed to help developers quickly build great apps, grow and retain an active user base, and make the most of the tools available to earn more. Google Developer Days China 2018 was held in Shanghai on September 20 and 21.

"TensorFlow Lite: A Lightweight Cross-platform Solution for TensorFlow on Mobile and Embedded Devices" was delivered by Ling Yucheng, a software engineer at Google Brain, on September 21, 2018.

Running machine learning on-device is increasingly important

Machine learning is developing rapidly today. It is no longer deployed only on servers or run on personal computers; it also lives in many of the small devices around us, such as mobile devices and smartphones. OK Google on a smartphone can set an alarm by voice, combining keyword detection with speech recognition. Another example is the Google Photos app, which uses machine learning to produce photos with blurred backgrounds and sharp portraits. These machine learning applications on mobile devices and smartphones are both useful and interesting.

There are two ways to run machine learning on mobile devices. One is to collect data on the device, send it to the cloud, have a server perform the machine learning task, and send the result back to the device. The other is to run everything, including the machine learning model, on the device itself.

Running machine learning on-device has many advantages:

  • No network latency
  • No network connection required
  • Data stays on the device
    • No bandwidth is spent uploading data
    • In some cases, this also saves power
  • Direct access to the device's sensors

However, running applications on-device is difficult because of the following limitations:

  1. Mobile devices have less memory
  2. Power consumption must be kept low
  3. Computing power is limited

Machine learning makes this even harder. The models we develop for servers are usually large and memory-hungry, and their complexity demands more power and computing resources.

What is TensorFlow Lite

TensorFlow Lite is TensorFlow's cross-platform solution for machine learning on mobile devices. It features low latency, a minimal runtime library, and a range of tools to convert, debug, and optimize models.

One advantage of building applications with TensorFlow Lite is responsiveness: a photo-processing app, for example, does not need to upload photos to the cloud, because they can be processed directly on the device. The second advantage is that it works offline, which is especially important in regions with poor Internet access.

TensorFlow Lite is highly portable and has been successfully ported to the following platforms:

  • Android and iOS
  • Raspberry Pi and other Linux-based SoCs
  • Microprocessors (including systems with no operating system and no POSIX environment)
  • Can also run on PCs and Macs, which makes debugging easy

The workflow of using TensorFlow Lite

Optimization of TensorFlow Lite

Compared to TensorFlow, TensorFlow Lite has the following optimizations:

  • Model compression: reduce the size of the model
  • Quantization: a TensorFlow model contains a large number of matrices whose values are usually 32-bit floats. Quantization represents these 32-bit floating-point numbers with 8-bit integers (a minimal sketch follows this list)
  • CPU op fusion: operations fused and optimized for specific instruction sets, for example ARM NEON
  • Optimized SIMD compute kernels
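
To make the quantization idea concrete, here is a minimal sketch (not TensorFlow Lite's actual implementation, which chooses scales and zero points per tensor during conversion) that maps 32-bit floats onto 8-bit integers and back:

import numpy as np

def quantize(weights):
    # Map float32 values onto the uint8 range [0, 255]
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = (w_max - w_min) / 255.0 or 1.0
    q = np.round((weights - w_min) / scale).astype(np.uint8)
    return q, scale, w_min

def dequantize(q, scale, w_min):
    # Recover approximate float32 values for computation
    return q.astype(np.float32) * scale + w_min

weights = np.random.randn(3, 3).astype(np.float32)
q, scale, w_min = quantize(weights)
print(np.abs(dequantize(q, scale, w_min) - weights).max())  # small error, 4x less storage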

TensorFlow Lite is also tightly integrated with hardware accelerators, supporting the following types of hardware accelerators:

  • GPUs
  • Edge-TPUs
  • NNAPI supported hardware accelerators

Status of GPU support:

  • Android: GPU acceleration based on OpenGL
  • A binary release is expected in the fourth quarter of 2018
  • MobileNet and other image models can be accelerated

About Edge TPUs built by Google:

TensorFlow Lite can use build parameters to control the size of the runtime library. The basic interpreter is about 80 KB, and with all the built-in op kernels it is about 750 KB, which is already quite small. It can be optimized further: since different models need different ops, you can register only the ops your model actually uses, so the remaining op kernels are not compiled into the runtime library, reducing its size even more.

Developer Feedback

TensorFlow Lite has been used by many developers and received a lot of positive feedback:

  • Cross-platform deployment
  • Faster inference
  • Smaller Runtime Library
  • Hardware acceleration

Some suggestions for improvement were also collected:

  • Make TensorFlow Lite easier to use
  • Increase the number of supported ops
  • Enhance the model optimization tools
  • Provide more documentation, sample source code…

The TensorFlow Lite team's improvements in response to these issues are covered below.

Who is using TensorFlow Lite


How to use TensorFlow Lite

TensorFlow Lite is very easy to use. The following steps are recommended:

Use the Demo App
  • Download: get the demo app (iOS/Android) from https://www.tensorflow.org/mobile/tflite
  • Build: build the demo app on your own machine
  • Run: run the demo app and try modifying it
Pretrained & Retrained model

Pretrained models: TensorFlow offers pretrained machine learning models for different tasks, such as image classification, object detection, image segmentation, and word prediction.

Retrained models: try the transfer learning Colab tutorial, TensorFlow for Poets. As the name suggests, the tutorial is designed to be simple enough for people without a technical background to run. Transfer learning retrains a small part of an existing model to apply it to a new problem.
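
As a hedged illustration of this idea (a minimal tf.keras sketch, not the TensorFlow for Poets code itself; the class count, input size, and data are placeholder assumptions), the following freezes a pretrained MobileNet feature extractor and retrains only a new classification head:

import tensorflow as tf

# Load MobileNet pretrained on ImageNet, without its classification head
base = tf.keras.applications.MobileNet(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet", pooling="avg")
base.trainable = False  # freeze the existing model; only the new head is trained

# Add a small head for a hypothetical 5-class classification problem
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5)  # supply your own training data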

Develop your own model

  1. Build & train the model

    Use TensorFlow (Estimator or Keras) to build and train the model and obtain a SavedModel (a hedged end-to-end Keras sketch appears after the conversion snippet below).

  2. Convert the model format

    Use the TensorFlow Lite converter to convert it into a model that TensorFlow Lite can use. The model conversion code is as follows:

import tensorflow.contrib.lite as lite

graph_def_file = "/path/to/Downloads/mobilenet_v1_1.0_224/frozen_graph.pb"
input_arrays = ["input"]
output_arrays = ["MobilenetV1/Predictions/Softmax"]

converter = lite.TocoConverter.from_frozen_graph(
    graph_def_file, input_arrays, output_arrays)
tflite_model = converter.convert()
open("converted_model.tflite", "wb").write(tflite_model)
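
For completeness, here is a hedged end-to-end sketch of steps 1 and 2 using Keras rather than a frozen graph; the tiny model, file names, and training data are placeholders, and it assumes a TensorFlow 1.x release whose converter provides from_keras_model_file:

import tensorflow as tf
import tensorflow.contrib.lite as lite

# Step 1: build and train a small Keras model (placeholder architecture)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(x_train, y_train, epochs=10)  # supply your own training data
model.save("awesome_model.h5")

# Step 2: convert the trained Keras model to a TensorFlow Lite model
converter = lite.TocoConverter.from_keras_model_file("awesome_model.h5")
tflite_model = converter.convert()
open("awesome_model.tflite", "wb").write(tflite_model)
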
Validation

Use the model visualization tool to inspect the converted model:

Because TensorFlow Lite supports a limited set of ops, some ops may turn out to be unsupported after model conversion. The TensorFlow Lite team plans to support more ops:

  • 75 built-in ops are already available
  • Coming soon: TensorFlow Lite Compat mode
  • Its release is scheduled for Q4
  • It will add hundreds of supported ops

Verify the model and analyze its performance:

  • Is the converted model correct?
  • How fast does the model run inference? (a minimal timing sketch follows this list)
  • How big is the runtime library?
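
As a rough, hedged way to answer the speed question with the Python API (the model file name is a placeholder and the input is a dummy tensor), you can time repeated interpreter invocations:

import time
import numpy as np
import tensorflow as tf

interpreter = tf.contrib.lite.Interpreter(model_path="converted_model.tflite")
interpreter.allocate_tensors()
input_detail = interpreter.get_input_details()[0]
dummy = np.zeros(input_detail["shape"], dtype=input_detail["dtype"])

start = time.time()
for _ in range(100):
    interpreter.set_tensor(input_detail["index"], dummy)
    interpreter.invoke()
print("Average inference time: %.2f ms" % ((time.time() - start) * 1000 / 100))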

For more details, please refer to: www.tensorflow.org/mobile/tfli…

Deployment

Python API example:

import numpy as np
import tensorflow as tf

interpreter = tf.contrib.lite.Interpreter(
    model_path="/tmp/awesome_model.tflite")
interpreter.allocate_tensors()
input_index = interpreter.get_input_details()[0]["index"]
output_index = interpreter.get_output_details()[0]["index"]
interpreter.set_tensor(input_index, np.array([1., 2., 3.], dtype=np.float32))
interpreter.invoke()
prediction = interpreter.get_tensor(output_index)

Java API example:

import org.tensorflow.lite.Interpreter;
import java.io.File;

Interpreter tflite = new Interpreter(new File("/tmp/awesome_model.tflite"));
try {
    // Fill the input and output buffers
    // ...
    // Invoke the interpreter
    tflite.run(inputs, outputs);
} finally {
    tflite.close();
}
Addendum: model optimization

In real projects, model optimization is usually also performed. Adding it to the previous workflow gives the following process:

TensorFlow Lite provides a number of tools to help developers optimize their models:

  • Post-training quantization

This is the most recently released of the tools. Its advantage is that it is simple to use: just add one line to the converter code shown earlier:

converter.post_training_quantize = True

Quantization causes some loss of model accuracy, but it has been observed to have little effect on the prediction accuracy of image and audio models. After quantization, CNN models gain 10-50% in performance, and RNN models can run about 3x faster.
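
Here is a hedged sketch of where this flag fits, reusing the hypothetical frozen-graph paths from the earlier conversion example:

import os
import tensorflow.contrib.lite as lite

converter = lite.TocoConverter.from_frozen_graph(
    "/path/to/Downloads/mobilenet_v1_1.0_224/frozen_graph.pb",
    ["input"], ["MobilenetV1/Predictions/Softmax"])
converter.post_training_quantize = True
quantized_model = converter.convert()
open("quantized_model.tflite", "wb").write(quantized_model)

# Weights are stored as 8-bit values, so the file should be roughly 4x smaller
# than the unquantized converted_model.tflite produced earlier
print(os.path.getsize("quantized_model.tflite"))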

  • Quantization during training

This tool was released a year ago. In theory it can achieve better accuracy than post-training quantization, but it is more complex to use, works well only for CNN models, and yields only modest gains for RNN models.

The team is working on both approaches and hopes to improve each of them.

Demo

An object detection model running on a Raspberry Pi that rotates the camera to keep following the detected target

A system that uses Google Edge TPU to demonstrate the real-time processing power of TensorFlow Lite

TensorFlow Lite running on very low-end hardware built around ARM microprocessors

A demo showing the real-time video processing capabilities of TensorFlow Lite

Looking to the future

Compat is short for compatibility mode. When TensorFlow Lite encounters unsupported ops, it prompts the developer to decide whether to enable Compat mode. With Compat mode enabled, many more ops (600+) are supported, but the runtime library also grows, so developers have to make a trade-off between functionality and size.

In addition, TensorFlow Lite has moved from its original location, tensorflow/contrib/lite/…, up to tensorflow/lite/…, which means that TensorFlow Lite is now officially supported as a first-class project.

Code: github.com/tensorflow/tensorflow | Documentation: tensorflow.org/mobile/tflite/ | Discussion: [email protected]

That is all the content of this talk; I hope you find it helpful. For more technical highlights from the 2018 Google Developer Days, see the other articles in this series.