I recently learned a great deal from Mr. Zhang Shaowen’s “Android Development Master Course”, especially its problem-solving strategies and breadth of knowledge. It inspired me and also gave me direction for my future study.

There are two major trends in technology right now. The first is the Internet of Things, which, with the development of 5G networks, makes the “Internet of everything” possible. The era of the Internet of Things is also a world in which cloud computing and edge computing coexist, and edge computing will play an important role. Edge computing is the counterpart of cloud computing: data is processed and analyzed at the edge nodes of the network, that is, on the terminal devices. Edge computing matters for several reasons. First, in the era of the Internet of Things, billions of terminal devices will continuously collect data, generating more computation than the cloud can bear. Second, the computing power of terminal devices keeps increasing, and not all computation needs to be done in the cloud. Third, sending data to the cloud for computation and then sending the results back to the terminal inevitably introduces latency, which not only hurts the user experience but is unacceptable in some scenarios. Finally, as users place more importance on data security and privacy, data will increasingly need to be processed and computed on the terminal.

The other trend is the rapid development of artificial intelligence. In recent years, artificial intelligence technology, represented by deep learning, has made breakthrough progress that has not only attracted wide public interest but also reshaped the business landscape. The great value of artificial intelligence is that, with the help of intelligent algorithms, work that previously had to be done by hand can be done by machines. Machines operating at scale free up manpower and break through the bottleneck of human labor as a production factor, greatly improving production efficiency. In the business world, technology that scales in this way will unleash tremendous energy, create tremendous value, and even reshape business competition and expand the boundaries of business as a whole.

Since edge devices have computing power, many application scenarios will call for intelligent computing on them. Combining the two gives edge intelligent computing.

Development techniques for mobile machine learning

At present, applications of machine learning on mobile devices focus mainly on image processing, natural language processing and speech processing. Specific applications include, but are not limited to, object detection in video, language translation, voice assistants and beauty filters, and there is huge demand in autonomous driving, education, healthcare, smart homes and the Internet of Things. The main deep learning frameworks in commercial use on mobile are as follows:

Framework	Developer	Supported platforms
TensorFlow Lite	Google	ARM
Caffe2	Facebook	ARM
MACE	Xiaomi	ARM, DSP, GPU
paddle-mobile	Baidu	ARM, GPU
FeatherCNN	Tencent	ARM
NCNN	Tencent	ARM

1. Computing framework

Because deep learning algorithms use many specialized operators, several vendors have developed computing frameworks for deep learning on mobile devices, such as Google’s TensorFlow Lite and Facebook’s Caffe2, to make mobile development and model deployment more efficient. Android developers should be familiar with these frameworks. To build interest in the area and get a hands-on feel for it, you can pick a relatively mature framework, such as TensorFlow Lite, and develop an object detection demo on your phone for practice.

In real projects, TensorFlow Lite and Caffe2 often run slowly; comparisons of the various computing frameworks are easy to find online. To really get started, I recommend the NCNN framework. NCNN is Tencent’s open source computing framework, and its advantages are obvious: a clear code structure, a small footprint and high runtime efficiency. I strongly recommend that interested Android developers read the source code several times, which helps not only with understanding the algorithms commonly used in deep learning but also with understanding how mobile machine learning frameworks work.

When reading the NCNN source code, focus on the three most basic data structures: Mat, Layer and Net. Mat stores the values of a matrix; every input, output and weight in the neural network is stored in a Mat. A Layer represents an operation, so every Layer must have a forward function. All operators, such as convolution and LSTM, derive from Layer. Net represents the entire network and ties together all the data nodes and operations.
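To make the relationship between the three structures concrete, here is a minimal sketch in Python. It is only an illustration of the idea described above: the classes `Mat`, `Layer`, `Scale`, `ReLU` and `Net` below are simplified, hypothetical stand-ins, not NCNN's real C++ classes or signatures, and the "network" is assumed to be a plain sequence of layers.

```python
# Simplified, hypothetical stand-ins for the Mat / Layer / Net idea.
# These are NOT NCNN's real classes or method signatures.

class Mat:
    """Holds values; inputs, outputs and weights are all stored this way."""
    def __init__(self, values):
        self.values = list(values)


class Layer:
    """One operation. Every concrete layer must implement forward()."""
    def forward(self, bottom):
        raise NotImplementedError


class Scale(Layer):
    """Multiplies every element by a stored weight (itself kept in a Mat)."""
    def __init__(self, weight):
        self.weight = Mat([weight])

    def forward(self, bottom):
        w = self.weight.values[0]
        return Mat([v * w for v in bottom.values])


class ReLU(Layer):
    """Clamps negative values to zero."""
    def forward(self, bottom):
        return Mat([max(0.0, v) for v in bottom.values])


class Net:
    """Ties data and operations together; here simply a chain of layers."""
    def __init__(self, layers):
        self.layers = layers

    def run(self, mat):
        for layer in self.layers:
            mat = layer.forward(mat)
        return mat


net = Net([Scale(2.0), ReLU()])
out = net.run(Mat([-1.0, 0.5, 3.0]))
print(out.values)  # [0.0, 1.0, 6.0]
```

The point of the design is that adding a new operator only means adding one more `Layer` subclass with its own `forward`; the `Net` code does not change.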

While reading the NCNN source code, I suggest also reading some introductory material on convolutional neural networks, which will help with understanding.

2. Performance optimization

In a real project, if a code path becomes a time bottleneck, it can usually be reimplemented with the NDK. Mobile machine learning frameworks, however, are generally already implemented with the NDK, so the way to improve further is to optimize with ARM NEON instructions in assembly.

NEON is a 128-bit SIMD (Single Instruction, Multiple Data) extension for ARM Cortex-A series processors. The key to optimizing performance with ARM NEON instructions is that a single instruction operates on multiple data elements.

Each of the two operands of a NEON instruction is 128 bits wide and holds four 32-bit values of the same type. With a single instruction such as the one below, four 32-bit values can be computed in parallel, which improves performance.

VADDQ.S32 Q0,Q1,Q2
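To see what "one instruction, four results" means, the following sketch emulates the lane-wise behavior of the instruction above in plain Python. It is only a model of the semantics, not real SIMD code: the helper names `to_int32` and `vadd_4x32` are made up for this illustration, and I assume the ordinary (non-saturating) add, so each 32-bit lane wraps around on overflow.

```python
def to_int32(x):
    """Wrap an integer to a signed 32-bit value, as a 32-bit lane would."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def vadd_4x32(q1, q2):
    """Emulate one lane-wise add over two registers of four 32-bit lanes."""
    assert len(q1) == 4 and len(q2) == 4
    # On real hardware these four additions happen in a single instruction.
    return [to_int32(a + b) for a, b in zip(q1, q2)]


q1 = [1, 2, 3, 2147483647]
q2 = [10, 20, 30, 1]
q0 = vadd_4x32(q1, q2)   # Q0 = Q1 + Q2, lane by lane
print(q0)  # [11, 22, 33, -2147483648]
```

A scalar loop would need four separate add instructions for the same work, which is where the speedup comes from.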

We once implemented the PixelShuffle operation from a deep learning algorithm with the NDK and then optimized it with ARM NEON and assembly, which made the computation 40 times faster, a very significant gain.

To improve computing performance further, you can use Int8 quantization. In deep learning, most operations are performed in Float32. A Float32 value is 32 bits wide, so a 128-bit register holds 4 Float32 values at a time, whereas it holds 16 Int8 values. Combined with the single-instruction, multiple-data approach described above, a single instruction can operate on 16 Int8 values at once, greatly improving parallelism.

However, quantizing Float32 to Int8 inevitably affects the precision of the data, and the quantization step itself takes time; both points need attention. If you are interested in quantization methods, you can read this paper.
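As a rough illustration of both the idea and the precision loss, here is a minimal sketch of one common scheme, symmetric per-tensor quantization with a single scale factor. This is my own simplified example, not the method used by any particular framework or by the paper mentioned above; the function names and the sample weights are hypothetical.

```python
def quantize_int8(values):
    """Symmetric quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map the Int8 values back to approximate floats."""
    return [x * scale for x in q]


weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
errors = [abs(a - b) for a, b in zip(weights, restored)]

print(q)            # [50, -127, 0, 100]
print(max(errors))  # the small weight 0.003 is rounded away to 0
```

Each value now fits in 8 bits instead of 32, but values much smaller than the scale (here 0.003) are lost entirely, which is the precision cost mentioned above.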

If the device has a GPU, you can also use OpenCL for GPU acceleration, as Xiaomi’s open source mobile machine learning framework MACE does.

Algorithmic techniques for mobile machine learning

For those just starting to learn algorithms, my advice is always not to imagine the algorithms as too complicated or too mathematical; otherwise it is easy to be intimidated. Mathematics is a way of expressing logical thinking, and we want to use it to help us understand algorithms. We need an intuitive understanding of an algorithm, an intuitive sense of why it works the way it does; only then have we really mastered it. Conversely, if you only memorize the math without an intuitive understanding of the algorithm, you will not be able to apply it flexibly.

1. Algorithm design

Image processing is one of the most widely applied areas of deep learning, and it is relatively intuitive and interesting, so I suggest starting there. The most basic knowledge in deep learning image processing is the convolutional neural network, so learn that first. There is plenty of material online about convolutional neural networks, so I won’t go into detail here.

The key to understanding a convolutional neural network is understanding how it learns and why it is able to learn. For that, you first need to be clear on two key points: forward propagation and back propagation. Once the structure and activation functions of the network are fixed, the so-called “training” or “learning” process is really a matter of repeatedly adjusting each weight of the network so that the output of the whole network converges to the expected value (the target value). Forward propagation computes from the input end to obtain the network’s output, which is then compared with the target value to obtain the output error. The output error is then propagated backward along the network structure and distributed to each node to obtain each node’s error, and each node’s weights are adjusted according to that error. The purpose of forward propagation is to obtain the output; the purpose of back propagation is to adjust the weights by propagating the error backward. By alternating between the two over many iterations, the output is brought into line with the target values.
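The alternation of forward and backward passes is easiest to see on the smallest possible "network". The sketch below is a toy of my own, not a convolutional network: a single linear neuron `y = w * x + b` trained with plain gradient descent on a squared error. The function name, learning rate, epoch count and sample data are all assumptions chosen for illustration.

```python
def train(samples, lr=0.1, epochs=500):
    """Fit y = w * x + b by alternating forward and backward passes."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, target in samples:
            # Forward propagation: compute the output from the input.
            y = w * x + b
            # Compare the output with the target to get the error.
            error = y - target
            # Back propagation: gradient of 0.5 * error**2 for each weight.
            grad_w = error * x
            grad_b = error
            # Adjust each weight against its gradient.
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Targets generated from y = 2x + 1, so training should recover w=2, b=1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
w, b = train(samples)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

A real network does exactly the same thing, except that the error has to be passed back through many layers using the chain rule before each weight can be adjusted.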

Once you understand convolutional neural networks, you can implement a handwriting recognition example. After mastering that, you can go on to learn deep learning object detection algorithms such as YOLO and Faster R-CNN. Finally, you can implement them all yourself in TensorFlow, run the training data and tune the parameters, and come to understand them through practice. Throughout the learning process, pay special attention to how each algorithm model is designed and how it solves its problem.

2. Effect optimization

Effect optimization refers to the improvement of the accuracy of the algorithm model and other indicators. The usual ways are as follows:

Optimizing the training data;
Optimizing the algorithm design;
Optimizing model training.

Optimizing the training data

Because the model learns from the training data, it cannot learn patterns that are absent from that data. The training data must therefore be selected carefully so that it covers the patterns that will occur in the real scenario. Carefully selected and annotated training data can effectively improve the model’s results. Data labeling matters so much to the results that there are startups specializing in data labeling that have received a lot of funding.

Optimizing the algorithm design

Choose a better algorithm model for the problem: for example, use a deep learning model instead of a traditional machine learning model, or use a model with stronger feature representation ability, such as a residual network or DenseNet.

Optimizing model training

Optimizing how the model is trained includes choosing which loss function to use, whether to use regularization terms, whether to use Dropout, which gradient descent algorithm to use, and so on.

3. Calculation optimization

Although a lot of work has been done on the framework side to improve computing performance, reducing the amount of computation on the algorithm side also improves the overall real-time performance. From the model’s perspective, there are two ways to reduce computation: design a lightweight network model, or compress the model. Both academia and industry have designed lightweight convolutional neural networks that greatly reduce the computational cost of the model while preserving its accuracy. The representative example of this approach is Google’s MobileNet, which, as its name suggests, is a network structure designed for mobile devices. MobileNet splits the standard convolution operation into a depthwise convolution and a pointwise convolution. The depthwise convolution first convolves each input channel separately, and the pointwise convolution then fuses information across channels.
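A quick count of multiplications shows why this split saves so much work. The sketch below compares a standard convolution with a depthwise convolution followed by a 1x1 pointwise convolution. The layer sizes are arbitrary values I picked for illustration, and I assume stride 1 with padding such that the output feature map has the same height and width as the input.

```python
def standard_conv_mults(k, c_in, c_out, h, w):
    """Each of the c_out filters covers a k x k window over all c_in channels."""
    return k * k * c_in * c_out * h * w


def depthwise_separable_mults(k, c_in, c_out, h, w):
    """Depthwise: one k x k filter per input channel.
    Pointwise: a 1 x 1 convolution that mixes the c_in channels into c_out."""
    depthwise = k * k * c_in * h * w
    pointwise = c_in * c_out * h * w
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 32 -> 64 channels, 56x56 feature map.
std = standard_conv_mults(3, 32, 64, 56, 56)
sep = depthwise_separable_mults(3, 32, 64, 56, 56)

print(std)        # 57802752
print(sep)        # 7325696
print(sep / std)  # about 0.127, i.e. 1/c_out + 1/k**2
```

For a 3x3 kernel the ratio works out to `1/c_out + 1/9`, so the separable form needs roughly one eighth of the multiplications once the number of output channels is reasonably large.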

Model compression

Model compression includes two methods: structured sparsity and distillation.

In logistic regression, we introduce regularization so that the coefficients of some features become approximately 0. In a convolutional neural network, we likewise want regularization to drive the coefficients of convolution kernels toward 0. Unlike ordinary regularization, structured sparsity aims for sparsity with structure: for example, the kernel coefficients of several whole channels all become approximately 0, so that the kernels of those channels can be pruned away to remove unnecessary computational overhead.
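The pruning step itself is simple once training has pushed whole channels toward zero. The sketch below is a minimal illustration under my own assumptions: each output channel's kernel is flattened into a plain list, a channel counts as "approximately 0" when its mean absolute weight falls below a threshold, and the function name, threshold and sample weights are all hypothetical.

```python
def prune_channels(kernels, threshold=1e-3):
    """Keep only channels whose mean absolute weight exceeds the threshold.

    kernels: one flattened weight list per output channel.
    Returns the indices of the kept channels and their kernels.
    """
    kept = [
        i for i, k in enumerate(kernels)
        if sum(abs(v) for v in k) / len(k) > threshold
    ]
    return kept, [kernels[i] for i in kept]


kernels = [
    [0.5, -0.2, 0.1],       # clearly used
    [1e-5, -2e-5, 0.0],     # driven to ~0 by regularization
    [0.0, 0.3, -0.4],       # clearly used
    [1e-4, 1e-4, -1e-4],    # driven to ~0 by regularization
]
kept, pruned = prune_channels(kernels)
print(kept)         # [0, 2]
print(len(pruned))  # 2
```

Because entire channels are removed, the remaining layer is simply a smaller dense convolution, which is why structured sparsity translates directly into less computation.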

Distillation is related to transfer learning: a simple network is designed and trained so that it approximates the representation ability of the target network, thereby achieving the effect of distillation.
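One widely used way to express "make the simple network approximate the target network" is to train the small (student) network against the softened outputs of the large (teacher) network. The sketch below shows only that loss, under my own assumptions: a softmax with a temperature and a cross-entropy between the two softened distributions. The function names, the temperature value and the example logits are hypothetical, and a real training setup would usually combine this with the ordinary loss on the true labels.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax with temperature; higher temperature gives softer outputs."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between softened teacher and student distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))


teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]   # behaves much like the teacher
far_student = [0.5, 4.0, 1.0]     # disagrees with the teacher

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

The loss is smaller for the student whose outputs resemble the teacher's, so minimizing it pulls the simple network toward the behavior of the target network.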

Opportunities for Android developers

Mobile machine learning consists of the computing framework and the algorithms. The former is responsible for the performance of model computation and reduces time cost; the latter is mainly responsible for the accuracy of the model, and can also reduce the amount of computation through algorithm design, again reducing time cost.

Note that in mobile machine learning, the model is usually trained on the server side; at present, the terminal device is usually not responsible for training. At run time, the terminal device loads the trained model and performs the forward computation to obtain the model’s result.

With so many industry trends and basic machine learning techniques covered above, how do mobile developers get into this “hot” field? Mobile machine learning is one area of edge intelligent computing, and mobile development is something Android developers know particularly well, so this is an opportunity for Android developers to move into edge intelligent computing. Android developers can first play to their professional strengths and secure a firm foothold in the future technical division of labor by developing programs for edge computing terminal devices. At the same time, they can gradually learn deep learning algorithms, so as to take a further step into edge intelligent computing and create greater technical value.

In most cases, Android developers have no competitive advantage in deep learning algorithms over professional algorithm engineers, so we must not abandon our expertise and experience in terminal device development. For most Android developers, “specializing in Android development while understanding deep learning algorithms” is the position that creates the most value in the future technical division of labor.

As for the learning path, I suggest that Android developers first learn the basics of convolutional neural networks (structure, training and forward computation), then read and study the NCNN open source framework and master the methods for optimizing computing performance, that is, the development techniques. In parallel, they can learn the algorithm techniques step by step, mainly the common deep learning models, with a focus on the lightweight neural networks that have emerged in recent years. In short, Android developers should focus on the development and algorithm techniques that improve real-time performance, while also learning deep learning models along the way.

Based on the above, I have drawn the big picture of mobile machine learning technology for your reference. The parts circled in red in the figure are what I recommend Android developers master.


“Android Development Master” study notes 1