This article was originally published by AI Frontier.
Next-wave technology: Fei-Fei Li built Cloud AutoML on it, and Andrew Ng endorses it


Editing & Planning | Natalie


Yao Jialing, Debra

“Transfer learning is the core technology behind Google Cloud AutoML, which aims to bring AI to everyone. It is also the technology Andrew Ng singled out at NIPS 2016 as the next driver of machine learning’s commercialization. Is Google’s ‘no-code transfer learning’ really that good? Did you know that Microsoft launched a similar service, Custom Vision, eight months ago? Today we look at this potential next wave of technology, and at Cloud AutoML, which is flooding everyone’s feeds.”


Today’s announcement of Google’s Cloud AutoML, which aims to bring AI to everyone, has made tech headlines and flooded WeChat Moments. In the early hours of the morning, Fei-Fei Li tweeted three times about Google’s new AI product, Cloud AutoML Vision: “everyone can customize machine learning models without being proficient in machine learning.”

AutoML Vision, the first service from Cloud AutoML, offers automated development of custom image recognition systems. According to Google, even those with no machine learning expertise can easily build custom image recognition models as long as they understand basic modeling concepts. Simply upload your own labeled data into the system and you get back a trained machine learning model. The entire process, from data import to labeling to model training, can be done through a drag-and-drop interface.

In addition to image recognition, Google plans to expand AutoML into translation, video and natural language processing in the future.

Isn’t that amazing? Doesn’t it sound like complete beginners can now do the work of machine learning engineers? Wait, don’t get too excited. This is great, but it’s not as simple as you might think.

AI Frontier took note of transfer learning, the core technology behind Cloud AutoML mentioned in Google’s official blog. By transferring an already-trained model (also known as a pre-trained model) into the training process for a new model, Google can train machine learning models with far less data. The pre-trained models Cloud AutoML Vision relies on are trained on the large, high-quality image datasets ImageNet and CIFAR. In addition, Google automatically selects suitable model architectures through its Learning2Learn function and automatically adjusts parameters with hyperparameter tuning technology.
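Google has not published Cloud AutoML’s internals, but the transfer learning it describes resembles the standard pretrained-model recipe. Below is a minimal, hypothetical sketch in TensorFlow/Keras of that recipe, not Google’s actual code: an ImageNet-pretrained InceptionV3 is frozen and reused as a feature extractor, and only a small new classification head is trained. NUM_CLASSES and the training dataset are placeholders.

```python
import tensorflow as tf

NUM_CLASSES = 5  # hypothetical: the number of labels in the user's dataset

# Load InceptionV3 with ImageNet weights, dropping its original classifier.
base = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, pooling="avg")
base.trainable = False  # freeze the pretrained feature extractor

# Only this new head is trained, so very little labeled data is needed.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, epochs=5)  # train_ds: the user's small labeled dataset
```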

Indeed, in his NIPS 2016 tutorial, Ng said that “after supervised learning, transfer learning will lead the next wave of machine learning technology commercialization.”

So, with transfer learning at its core, could Cloud AutoML be the next machine learning killer app?


Expert opinion

Is Cloud AutoML really that “Shocking! Fierce! Awesome!”?

AI Frontier posted a question on Zhihu: “How do you evaluate Google’s newly launched Cloud AutoML?” Most of the respondents felt sorry for Microsoft, which released almost the same service eight months ago (no coding, no tuning, just drag controls to train your deep learning model) to far less fanfare.

User Grapeot wrote: “I feel sorry for Microsoft’s PR department. As a Microsoft fan, I didn’t know about Custom Vision until today. Google didn’t even hold a press conference, yet two tweets caused a media sensation. You be the judge!” Another user mocked Google as a “top-notch advertising company.”

So AI Frontier put a series of questions, such as “How good is Google’s Cloud AutoML really?”, to several industry technical experts, and their replies were quite thought-provoking.

An expert from IBM told AI Frontier that the field is still so young that he doesn’t expect real-world deployment any time soon: using neural networks to train neural networks has not been around for long, so the effectiveness of Cloud AutoML still needs to be tested in practice.

Another technical expert, who asked not to be named, noted that the first service Cloud AutoML is launching is Vision. The ImageNet dataset is good and large enough, so in most cases transfer from it does work well, and vision is currently a relatively tractable field; in fields such as NLP or click-through-rate (CTR) prediction, it would be much harder. People now have a bit of a “whatever Google does must be good” mentality, and it must be said that Google’s PR is excellent. Of course, implementing AutoML through transfer learning does give practitioners plenty of room for imagination: it can break down data silos and solve more problems at lower cost, for example using e-commerce data to power recommendations in traditional industries, or letting a new company with no data of its own build on data from other companies or industries.

Google says AutoML Vision offers a clean graphical user interface that lets you build new models by importing data and dragging and dropping components, and press reports have emphasized that you don’t need to write a single line of code. “It’s easy to avoid writing code; it’s hard to avoid writing code and still get good results,” the expert told AI Frontier.

Fourth Paradigm is a company committed to using machine learning, transfer learning and other artificial intelligence technologies to extract value from big data. Its co-founder and chief scientist, Yang Qiang, is a founder and pioneer of the field of transfer learning; he has published more than 400 papers, which have been cited more than 30,000 times.

After the launch of Cloud AutoML, many readers also asked how Fourth Paradigm views it. So AI Frontier put the question to Chen Dihao, platform architect of Fourth Paradigm’s Wevin platform. He gave us a very detailed answer:

AI Frontier: What do you think are the biggest highlights of Google Cloud AutoML?

Chen Dihao: The biggest highlight of Cloud AutoML is that it packages the complete machine learning workflow into an easy-to-use cloud product. Users only need to drag and drop sample data in the interface to complete the whole process of data processing, feature extraction and model training. For image classification scenarios, it takes ease of use to the extreme.

AI Frontier: How difficult was Cloud AutoML to develop?

Chen Dihao: Judging from its introduction, a Cloud AutoML for image classification is not hard to develop. By finetuning an already-trained Inception model on a new dataset, you can obtain a new model with good accuracy. This is described in the official TensorFlow documentation, and developers could even build a “command-line version of Cloud AutoML Vision” locally. Of course, Google has also introduced algorithms such as Learning to Learn and automatic neural network construction in previous papers; these algorithms have much higher requirements for sample size and computing power and are still at the research stage in industry.

AI Frontier: Cloud AutoML uses technologies such as transfer learning so that users can generate their own models by uploading only a small amount of labeled data. But how good will those models be? Can you explain it technically?

Chen Dihao: As mentioned earlier, Cloud AutoML does not disclose the details of the algorithm used to generate the model. It may tune model parameters by finetuning, or it may reconstruct the neural network using the method in the AutoML paper; at present, finetuning is the most likely. For example, to finetune the Inception model with TensorFlow, the user only needs to provide a very small amount of labeled data. First, the model parameters obtained from the official training on the ImageNet dataset are loaded; then only the last layer of the neural network is retrained on the new dataset, updating the weights that map features to the new labels, and you can quickly get an image classification model with accuracy above 90%. Of course, it cannot be ruled out that Google has used, or will in the future use, its AutoML algorithms to retrain models on user-provided data together with labeled datasets such as ImageNet, using learned parameters to construct the neural network structure, with the goal of finding the architecture with the highest image classification accuracy. According to the paper’s results, with sufficient data and computing power, machine-designed models come close to the best human-designed models, so the results should not be bad when applied to the Cloud AutoML scenario.
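To make this concrete, here is a hedged sketch of the finetuning flow Chen describes: load ImageNet-trained Inception weights, retrain only a new last layer on the user’s small labeled dataset, then optionally unfreeze the base at a much lower learning rate. The directory path, image size and epoch counts are illustrative assumptions, not details of Cloud AutoML itself.

```python
import tensorflow as tf

# Assumed layout: user_images/<class_name>/*.jpg (hypothetical path).
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "user_images/", image_size=(299, 299), batch_size=32)
num_classes = len(train_ds.class_names)

base = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, pooling="avg")
base.trainable = False  # phase 1: reuse the ImageNet features as-is

inputs = tf.keras.Input(shape=(299, 299, 3))
x = tf.keras.applications.inception_v3.preprocess_input(inputs)
outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(base(x))
model = tf.keras.Model(inputs, outputs)

# Phase 1: train only the new last layer on the small labeled dataset.
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(train_ds, epochs=5)

# Phase 2 (optional): unfreeze the base and continue at a much lower
# learning rate, so the pretrained weights shift only slightly.
base.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(train_ds, epochs=2)
```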

AI Frontier: What impact do you think Cloud AutoML will have on the future development of artificial intelligence?

Chen Dihao: Google’s Cloud AutoML is just one application scenario of AutoML. Before it, companies such as Microsoft, Amazon, and Fourth Paradigm in China already had AutoML in practical use. Cloud AutoML Vision only lowers the barrier to modeling in image classification; it does not have the revolutionary impact on the state of the art in other machine learning fields that some expect. Of course, the launch of Google Cloud AutoML has quickly drawn worldwide attention to automated machine learning model building, providing a strong endorsement for AutoML research and deployment, and I believe it will push the field forward.

AI Frontier: In your opinion, will Cloud AutoML help Google stand out from the crowd of cloud-based machine learning providers (Microsoft Azure, AWS, IBM, etc.)?

Chen Dihao: In my opinion, Google Cloud AutoML is not a universal machine learning solution, and it will not directly displace the cloud machine learning platforms of Microsoft, Amazon and others. Of course, we look forward to further AutoML work from the Google Cloud and Google Brain teams. As AutoML algorithms mature and generalize, there will be more low-threshold, user-friendly machine learning modeling products, which is very good for the artificial intelligence industry.

AI Frontier: What’s the status of your company’s machine learning tools? Are there plans to launch services like Cloud AutoML in the future? Or are there other important directions?

Chen Dihao: I am currently an architect of the Wevin platform at Fourth Paradigm. Wevin 3.0, released at last year’s Wuzhen Internet Conference, already integrates AutoML features. Our self-developed FeatureGo automatic feature-combination algorithm, together with open-source automatic hyperparameter tuning algorithms, covers the whole machine learning workflow from feature extraction, feature combination and model training to hyperparameter tuning and model deployment. At present, all models our recommendation system provides to users are generated by AutoML algorithms. Implementing Learning to Learn for model training on TensorFlow is also a focus of ours. Beyond that, large-scale data joining, temporal feature extraction, model canary release, workflow visualization and self-learning closed loops are real business pain points. From both the algorithm and product perspectives, we are committed to building a machine learning platform with a lower threshold and more practical grounding than Google Cloud AutoML, and we welcome more exchanges with our peers.

Sebastian Ruder, a well-known AI blogger and NLP PhD student at the National University of Ireland, wrote a post titled “Transfer Learning: The Next Frontier in Machine Learning”. The rest of this article draws on that post to explain what transfer learning is and why it matters.


What exactly is transfer learning?

In the classic supervised learning scenario of machine learning, if we intend to train a model for some task and domain A, we assume we are provided with labeled data for that same task and domain. As Figure 1 shows, for our model A, the training and test data come from the same task and domain. We will give precise definitions of a task and a domain later. For now, let’s say a task is the goal our model is meant to perform, such as recognizing objects in a picture, and a domain is where our data comes from, for example, images taken in a San Francisco coffee shop.

Figure 1: The classic supervised learning setup in machine learning

We can now train model A on this dataset and expect it to perform well on unseen data from the same task and domain. When we are instead given data for a different task or domain B, we need labeled data for that task or domain to train a new model B that we can expect to perform well.

The classic supervised learning paradigm breaks down when we do not have enough labeled data for the task or domain we care about to train a reliable model.

If we want to train a model to detect pedestrians in nighttime images, we could apply a model that has been trained in a related domain, such as daytime images. In practice, however, we often see performance deteriorate or the model break down entirely, because the model has inherited biases from its training data and does not know how to generalize to the new domain.

If we want to train a model to perform a new task, such as detecting cyclists, we cannot even reuse an existing model, because the labels differ between the tasks.

Transfer learning enables us to handle these scenarios by leveraging labeled data that already exists for some related task or domain. We try to store the knowledge gained from solving the source task in the source domain and apply it to the problem we are interested in, as shown in Figure 2.

Figure 2: The transfer learning setup
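For readers who want the formal version of this setup, here it is, following the standard definition from Pan and Yang’s transfer learning survey (an assumption about which formalization is intended, since the post does not spell it out). A domain $\mathcal{D} = \{\mathcal{X}, P(X)\}$ consists of a feature space $\mathcal{X}$ and a marginal distribution $P(X)$; a task $\mathcal{T} = \{\mathcal{Y}, f(\cdot)\}$ consists of a label space $\mathcal{Y}$ and a predictive function $f(\cdot)$ learned from labeled pairs. Given a source $(\mathcal{D}_S, \mathcal{T}_S)$ and a target $(\mathcal{D}_T, \mathcal{T}_T)$ with

$$\mathcal{D}_S \neq \mathcal{D}_T \quad \text{or} \quad \mathcal{T}_S \neq \mathcal{T}_T,$$

transfer learning aims to improve the target predictive function $f_T(\cdot)$ using the knowledge contained in $\mathcal{D}_S$ and $\mathcal{T}_S$.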

In practice, we try to transfer as much knowledge as possible from the source setting to our target task or domain. The form of this knowledge depends on the data: it can pertain to how objects are composed, making it easier to identify new objects; or it can be the general words people use to express opinions, and so on.


Why is transfer learning so important?

Andrew Ng, former chief scientist at Baidu and a professor at Stanford University, said in his widely viewed NIPS 2016 tutorial that, after supervised learning, transfer learning will be the next driver of machine learning’s commercial success.

Figure 3: Andrew Ng at NIPS 2016 on transfer learning

He drew a diagram on the whiteboard, which I have reproduced as faithfully as possible in Figure 4 below (sorry about the unlabeled axes). According to Ng, transfer learning will be a key factor in machine learning’s success in industry.

Figure 4: Andrew Ng’s introduction to the drivers of success in machine learning

Needless to say, the use and success of machine learning in industry to date has been driven primarily by supervised learning. Fueled by advances in deep learning, more powerful computing resources and large labeled datasets, supervised learning has reignited interest in AI and set off a wave of funding and acquisitions; in recent years especially, machine learning applications have become part of our daily lives. If we ignore the naysayers and the omens of another AI winter, and trust Andrew Ng’s prediction, this success is likely to continue.

What is less clear is why transfer learning, despite having been around for decades, currently sees so little use in industry, and whether it will see the explosive growth Andrew Ng predicts. Indeed, transfer learning currently receives relatively little attention compared with other areas of machine learning, such as unsupervised learning and reinforcement learning, which are attracting ever more interest. Unsupervised learning, which, as Figure 5 shows, Yann LeCun considers a key ingredient in the quest for general AI, has seen a resurgence of interest, driven in particular by generative adversarial networks (GANs).

Reinforcement learning, in turn, led by Google DeepMind, has produced AlphaGo’s success as well as real-world wins, such as cutting cooling costs in Google’s data centers by 40 percent. Both areas, while promising, are likely to have relatively little commercial impact in the foreseeable future and largely remain within the confines of cutting-edge research papers, as they still face many challenges.

Figure 5: Transfer learning is conspicuously absent from the cake shown by Yann LeCun.


What’s so special about transfer learning?

Next, let’s look at what makes transfer learning different. In my view, these factors inspired Andrew Ng’s prediction and explain why now is the time to focus on transfer learning.

At present, the application of machine learning in industry has two contrasting sides:

  • On the one hand, over the past few years we have gained the ability to train ever more accurate models. We are now at the stage where, for many tasks, state-of-the-art models perform so well that performance is no longer a hindrance for users. How good? The latest residual networks achieve superhuman performance on ImageNet object recognition; Google’s Smart Reply automatically handles 10% of mobile replies; speech recognition error rates keep falling and are now lower than typing error rates; we can recognize skin cancer as well as dermatologists; Google’s NMT system is used in production for more than 10 language pairs; Baidu can generate realistic speech in real time; the list goes on. This level of maturity enables large-scale deployment of these models to millions of users, and they are already widely adopted.
  • On the other hand, these successful models are data-hungry and rely on large amounts of labeled data to achieve their performance. For some tasks and domains, such data has been painstakingly collected over many years. In a few cases it is public, as with ImageNet, but large labeled datasets are usually proprietary or expensive, as with many speech or machine translation datasets, because they are what confer a competitive advantage.

At the same time, when machine learning models are applied in the wild, they face countless situations they have never seen before and do not know how to handle. Every customer and every user has their own preferences, and owns or generates data that differs from the training data; a model is asked to perform many tasks that are related to, but not identical with, the task it was trained on. In all of these cases, our most advanced models, despite human-level or even superhuman performance on the tasks and domains they were trained on, suffer significant performance losses or even break down completely.

Transfer learning can help us handle these new scenarios, and it is essential if machine learning is to scale to tasks and domains where labeled data is scarce. So far we have applied our models to the tasks and domains with the highest impact, but these were mostly the low-hanging fruit where data is plentiful; for long-term progress, we must learn to transfer knowledge to new tasks and domains.


What are the other application scenarios of transfer learning?

Learning from simulation

One development I am very excited about is that I think transfer learning will increasingly mean learning from simulation. For many machine learning applications that rely on interacting with hardware, collecting data and training models in the real world is expensive, time-consuming, or outright dangerous, so it makes sense to collect data in less risky ways.

Simulation is the tool of choice here, and it is already used in practice in many advanced machine learning systems. Learning from simulation and applying the acquired knowledge in the real world is an instance of transfer learning. The feature spaces of the source and target domains are the same (usually both consist of pixels), but the marginal probability distributions differ between simulation and reality: objects in a simulation look different from their real counterparts, although this difference diminishes as the simulation gets closer to reality. Moreover, because it is hard to fully simulate every interaction in the real world, the conditional probability distributions differ as well; a physics engine, for example, cannot completely imitate the complex interactions of real objects.
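In the notation introduced earlier, this paragraph can be summarized compactly (a restatement, not an additional claim): the feature spaces coincide, while both the marginal and the conditional distributions shift:

$$\mathcal{X}_{\text{sim}} = \mathcal{X}_{\text{real}}, \qquad P(X_{\text{sim}}) \neq P(X_{\text{real}}), \qquad P(Y \mid X_{\text{sim}}) \neq P(Y \mid X_{\text{real}}).$$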

Figure 6: Google’s driverless car (source: Google Research blog)

However, learning from simulation has the advantage of making data collection easy: objects can easily be bounded and analyzed, and training is fast because learning can be parallelized across multiple instances. It is therefore a good choice for large machine learning projects that need to interact with the real world, such as self-driving cars (see Figure 6). According to Zhaoyin Jia, Google’s head of self-driving car technology, “if you really want to make a self-driving car, simulation is essential.” Udacity has open-sourced the simulator it uses for its self-driving car engineer nanodegree, shown in Figure 7, and OpenAI Universe may make it possible to train self-driving cars using GTA 5 or other video games.

Figure 7: Udacity’s self-driving car simulator (source: TechCrunch)

Another area where learning from simulation will play a key role is robotics: training models on a real robot is too slow, and real robots are expensive to train. Robots that learn from simulation and transfer the knowledge to the real world alleviate this problem, and this approach has recently received a lot of attention [8]. Figure 8 shows an example of a data manipulation task in real-world and simulated images.

Figure 8: Robot and simulation images (Source: Rusu et al., 2016)

Finally, learning from simulation is an integral step on the path to general AI. Training an agent directly in the real world to achieve general AI would be too costly, and the initial, unnecessary complexity would impede learning. Learning in a simulated environment such as CommAI-env, shown in Figure 9, is far more efficient.

Figure 9: CommAI-env from Facebook AI Research (source: Mikolov et al., 2015)


Adapting to a new domain

Although learning from simulation is one special case of domain adaptation, it is worth listing some other examples of domain adaptation.

In computer vision, domain adaptation is a common requirement, because the data where labels are easily obtained and the data we actually care about often differ, whether we are recognizing the bikes shown in Figure 10 or other objects in an unfamiliar environment. Even if the training and test data look the same, they may contain subtle biases that are imperceptible to humans yet cause the model to overfit.

Figure 10: Different visual domains (source: Sun et al., 2016)

Another common domain adaptation scenario is adapting to different text types: standard NLP tools, such as part-of-speech taggers or parsers, are typically trained on news data such as the Wall Street Journal, which has been used to evaluate models for decades. Models trained on news data, however, struggle to adapt to more novel forms of text, such as messages from social media.

Figure 11: Different text types

Even within a single domain such as product reviews, people use different words to express the same concepts. A model trained on one type of review should therefore be able to disentangle the domain-specific words from the general words people use, so that it is not confused by a shift of domain.

Figure 12: Different topics

Finally, while the problems above involve only general text or image types, they are magnified when we consider domains that relate to individual users or groups of users, such as automatic speech recognition (ASR). Voice is expected to be the next big platform, with 50% of searches predicted to be voice searches by 2020. Traditionally, most ASR systems have been evaluated on the Switchboard dataset, which consists of conversations among 500 speakers. Standard accents fare well, but the systems struggle to understand the voices of immigrants, people with non-standard accents, people with speech impairments, and children. Now more than ever, we need systems that can serve individual users and minority groups, to ensure that everyone’s voice is understood.

Figure 13: Different accents


Knowledge transfer across languages

Finally, another killer application of transfer learning, in my opinion, is applying knowledge gained from one language to another. I have written before about cross-lingual embedding models. Reliable cross-lingual adaptation would let us take the vast amount of labeled English data we already have and apply it to any language, especially underserved, low-resource languages. Given the current state of the art this still sounds utopian, but recent advances such as zero-shot translation suggest we may get there step by step.
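As an aside on what “cross-lingual embedding models” can look like in code: one classic approach learns a linear map between two monolingual embedding spaces from a small seed dictionary, solved in closed form as an orthogonal Procrustes problem (in the spirit of Mikolov et al.’s linear-mapping work). A minimal sketch follows; the matrices here are random stand-ins for real embeddings.

```python
import numpy as np

def align(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Return the orthogonal map W minimizing ||X @ W - Y||_F
    (orthogonal Procrustes: W = U @ Vt where X.T @ Y = U S Vt)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 300))  # e.g. English vectors for seed word pairs
Y = rng.normal(size=(1000, 300))  # their target-language translations
W = align(X, Y)

# Any source-language vector v can now be mapped into the target space:
# v_mapped = v @ W  -- enabling transfer of English labeled data.
```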

In summary, transfer learning offers many exciting research directions, especially for applications that need models capable of transferring knowledge to new tasks and adapting to new domains.

However much PR is involved in Google’s Cloud AutoML launch, it is a good thing if it pushes transfer learning forward.

For more content, follow AI Frontier (ID: AI-front) and reply “AI”, “TF”, or “big data” to get the AI Frontier series of PDF mini-books and skill maps.