Heart of the Machine original report. Author: Qiu Lulu.

From the advent of regression analysis to the boom in deep learning, the evolution of this field has been not so much "machines replacing people" as "machines helping people do things we are not good at." That list includes finding patterns in large amounts of data, optimizing many variables at once, and extracting features from high-dimensional data. Today, another group of researchers is asking whether humans are also "not good at model design and tuning," and how machines can help. Over the past two years, companies represented by Google have brought this class of problems to public attention under the name AutoML, exploring whether the technology can let more industry experts overcome engineering and algorithmic barriers and, armed only with domain knowledge and accumulated data, complete the development of deep learning algorithms with the help of machines.

In China, Intelligence Qubic is one of the companies pursuing this vision. Founded at the beginning of this year, the company is building DarwinML, a platform with "autonomous model design" capability that uses evolutionary algorithms to find model architectures without relying on human design, lowering the barrier to applying artificial intelligence. The goal is to let IT staff and domain experts across industries apply AI more easily to the scenarios that suit and need it, easing the widespread shortage of talent and technical capability.

In July, Heart of the Machine interviewed Intelligence Qubic's two founders, CTO Qian Guangrui and product director Song Yu, to learn more about AutoML as it heads toward practical application.

Heart of the Machine: What problems does Intelligence Qubic hope to solve with AutoML?

Song Yu: AutoML is not a new concept. It has gained wide attention in the past two years because we can see that the distribution of a dataset is closely tied to the model: apply a model that works very well in a paper to a particular scenario, and its performance degrades severely. Two or three years ago, model improvement mostly took the form of hyperparameter tuning. From rule-based methods to Bayesian methods, various attempts were made to find a region of the solution space that approximates the optimum, and a number of automated hyperparameter tuning tools emerged. Then it turned out that, beyond hyperparameters, sometimes you also have to change the network structure. Finally, people began to wonder whether machines could design models themselves.

When we were optimizing models before, we also felt that the scarcest resource was human time. So since last year we have tried to abstract model optimization as much as possible into a pure mathematical problem, then use the machine's computing power to search and fit our way toward a global optimum within a limited time. With AutoML as its main direction, Intelligence Qubic today aims to solve the automatic design and optimization of AI application models in real-world environments.

Heart of the Machine: What categories does AutoML come in? Which does Intelligence Qubic choose?

Qian: In the industry there are roughly three kinds of AutoML algorithms that learn models from scratch. Besides evolutionary algorithms, there are the earliest pure reinforcement learning approaches, represented by Google. At present, evolutionary algorithms are more efficient than reinforcement learning. In addition, meta-learning methods are also being explored.

Evolutionary algorithms themselves have many branches, such as "evolution strategies" and "evolutionary methods." Google uses an evolutionary method, while OpenAI uses an evolution-strategies algorithm. Intelligence Qubic's DarwinML platform is an AutoML system based on the evolutionary-method type.

Of course, for AutoML methods that do not learn from scratch, there are also preset model libraries from which models can be selected for optimization or transfer.

Heart of the Machine: What is Intelligence Qubic's focus within AutoML?

Song Yu: We differ from academic institutions in that we hope to provide model interpretability from the perspective of AutoML.

Interpretability research today mostly has researchers output the weights or activations of intermediate feature-extraction layers, study what each layer does, and then convey their observations to the machine. But human time is limited, and we want machines to do this themselves.

We want the machine to summarize for itself what "feature extraction" actually extracts: why a particular combination of extraction operations, on a specific data distribution, makes gradient descent converge faster and the loss smaller. That knowledge is then translated into a machine-readable numerical representation, fed back into the system, and attributed to specific structures, guiding the next round of design and improving design efficiency.

In other words, we are also trying to see whether our model-designing "brain" can get smarter: for similar problems or data types, can we cut the time to design a model reaching, say, 80 percent accuracy from four days to one day or even a few hours? Can the first generation of models already land in a region of the solution space very close to the optimum, without taking too many detours?

Qian: Most academic research on AutoML focuses on the methodology itself, while we pay more attention to how to integrate AutoML users' actual data into a project.

We developed the DarwinML platform not only to help large numbers of users learn from their own data and train their own models, but also to make the platform itself smarter and more efficient at helping users solve practical problems.


Heart of the Machine: Who is the target audience for DarwinML platform? What does a user need to do to complete a task? What does the platform do?

Song Yu: The DarwinML platform is a life-cycle management platform. Its ultimate goal is to become an automatic model development platform, so that business staff, or people not specialized in AI research, can also design a model that meets their business objectives.

All the user needs to do is prepare the data and make a few choices about computing resources and desired outcomes: an upper limit on compute, a model accuracy requirement, a maximum training time, and so on. Users experienced in algorithms and model design can also set parameters of the design process, such as the maximum number of generations the evolutionary algorithm runs or the expected maximum depth of the model; they can even dynamically weed out model "genes" during the design, or adjust the proportions of the different evolutionary operators.

The rest — data cleaning, model design, training, tuning, evaluation, and inference — is all evolved automatically on the platform.

Heart of the Machine: Can you describe the process by which the platform completes a task?

Song Yu: First, the DarwinML platform extracts statistical information from the data and sets the initial conditions of evolution accordingly, including the population size of models, the maximum number of generations of evolutionary iteration, the allocation of distributed computing resources, and the compute limit set by the user.

The platform then starts automated model design. At each generation, the models are evaluated to select the direction of further evolution, and at the same time to keep the whole population — that is, all the models — from converging prematurely (avoiding over-reuse of the same or similar models that scored well in earlier generations, and keeping selection spread across the solution space).

When the evolution reaches the customer's accuracy requirement or time limit, the DarwinML platform fixes the model structure, fine-tunes its parameters, and carries out local optimization of the hyperparameters.
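A minimal sketch of what such a local hyperparameter search could look like once the structure is frozen. The source does not say which search method DarwinML uses; this illustrates plain random search, and `toy_eval`, the parameter names (`lr`, `batch`), and the candidate values are all invented for the demo.

```python
import random

def random_search(train_eval, space, n_trials=30, seed=0):
    """Randomly sample hyperparameter configs and keep the best one.

    train_eval: callable mapping a config dict to a validation score.
    space: dict of name -> list of candidate values (hypothetical names).
    """
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        score = train_eval(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Toy stand-in for "train the fixed structure with this config":
# the score peaks at lr=0.01, batch=64 (a made-up optimum for the demo).
def toy_eval(cfg):
    return -abs(cfg["lr"] - 0.01) - abs(cfg["batch"] - 64) / 1000

space = {"lr": [0.1, 0.01, 0.001], "batch": [32, 64, 128]}
best_cfg, best_score = random_search(toy_eval, space)
```

In a real system the evaluation call would train the fixed network, which is why limiting `n_trials` (the compute budget) matters.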

Finally, at the end of the whole process, the characteristics, hardware configuration and performance indicators of the model are returned to the user in the form of a report.

Heart of the Machine: What are the basic building blocks of the model?

Qian: Our platform is called DarwinML because its core algorithm is an evolutionary algorithm, whose basic idea is to simulate the process of biological evolution in nature.

Just as life started from single-celled organisms and generated new individuals through DNA crossover and mutation during reproduction, our model design starts from an initialized first-generation population of models, keeps the population size roughly constant, and produces better models generation by generation.

The platform has the same evolutionary core algorithm for machine learning and deep learning, but the basic building blocks (we call them genes) for machine learning and deep learning are different. Therefore, genes are the most basic building blocks in the DarwinML platform design process.

The "genes" of machine learning models include data preprocessing operations such as clustering and imputation, as well as more than 40 basic machine learning operations related to regression and classification, with more than 120 variants. The deep learning "gene" library is larger and more complex, including different neural genes such as convolution modules, LSTM modules, pooling, and fully connected layers. When some "genes" combine into larger modules with excellent performance, they can be fixed and become new "genes". Starting from a gene bank of more than one hundred and twenty genes for initialization, and letting more complex "big" genes emerge through evolution, makes the structures of machine learning and deep learning models as variable and as finely modular as possible. This lets us explore regions beyond human imagination and find model structures different from those humanity knows by experience.
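The gene mechanism described above can be sketched in a few lines. This is an illustration only, assuming a toy pool of named genes standing in for real operations (convolution, pooling, LSTM, imputation, ...); the real DarwinML gene library and its promotion rules are not public.

```python
import random

# Hypothetical miniature "gene" library; names stand in for real operations.
GENE_POOL = ["conv", "pool", "lstm", "dense", "dropout"]

def random_model(rng, max_depth=20):
    """Assemble a model genome as a random sequence of genes."""
    depth = rng.randint(1, max_depth)
    return [rng.choice(GENE_POOL) for _ in range(depth)]

def freeze_module(genome, start, end, library):
    """Promote a well-performing sub-sequence to a new composite 'gene',
    so later generations can reuse it as a single building block."""
    module = tuple(genome[start:end])
    if module not in library:
        library.append(module)
    return module

rng = random.Random(42)
genome = random_model(rng)
library = list(GENE_POOL)            # working copy of the gene bank
composite = freeze_module(genome, 0, 2, library)
```

Promoting recurring sub-structures into composite genes is what lets the effective gene pool grow richer as evolution proceeds.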

Heart of the Machine: What is the initial generation process of the model?

Qian: Models in the initial population come from two sources. One is models DarwinML generates automatically by randomly assembling "genes" according to the data distribution; we also support evolution starting from "excellent" initial models supplied by users.

The initialization process includes a series of "dice-throwing" operations that randomly generate models subject to a predefined depth and total number of neurons. The DarwinML platform also learns from previously trained models and combines user-input parameters, turning initialization from a simple dice roll into a problem of generating distributions based on the characteristics of the data. For example, from prior experience the system may determine that a model with fewer than 20 layers is likely to achieve good results, or, to balance model performance against low-latency deployment, design a model with no more than 50 neurons.
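The shift from a uniform dice roll to an experience-biased distribution can be sketched as follows. The specific numbers (`bias=0.8`, the preferred depth range) are illustrative assumptions, not DarwinML's actual parameters.

```python
import random

def sample_depth(rng, max_depth=50, preferred_max=20, bias=0.8):
    """Sample a model depth. With probability `bias`, draw from the
    'experience says this works' range (<= preferred_max); otherwise
    still explore deeper models so the prior cannot lock out discovery."""
    if rng.random() < bias:
        return rng.randint(1, preferred_max)
    return rng.randint(preferred_max + 1, max_depth)

rng = random.Random(0)
depths = [sample_depth(rng) for _ in range(1000)]
shallow_frac = sum(d <= 20 for d in depths) / len(depths)
```

Roughly 80% of sampled models land in the experience-preferred range, while the rest keep the search from becoming too conservative.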

Heart of the Machine: Can you describe the evolution of the model?

Qian: DarwinML trains, evaluates, and ranks the first generation of randomly generated models. The probability that each gene is "passed on" is then set according to the "survival of the fittest" principle of evolutionary algorithms: the best individuals theoretically have the greatest chance of producing offspring, so the quality of the resulting individuals keeps rising.

There are several ways to generate a next-generation model from a first-generation model:

One is mutation: a few "genes" in a parent model are replaced, or a whole layer is deleted from or duplicated in the model. The second is crossover, or heredity: for example, two parent models are each divided into three parts, and the middle part of model A is removed and replaced by the middle part of model B. Third, to ensure diversity, a random operation continues to generate some brand-new models in the same way as the first generation.
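The selection and variation operators described above can be sketched as follows, on toy gene-list genomes. The gene names and parent models are invented for illustration; DarwinML's actual operators act on richer structures.

```python
import random

def roulette_select(population, scores, rng):
    """Fitness-proportional ('survival of the fittest') parent selection.
    Assumes non-negative scores; better individuals win more often."""
    pick = rng.uniform(0, sum(scores))
    acc = 0.0
    for individual, score in zip(population, scores):
        acc += score
        if acc >= pick:
            return individual
    return population[-1]

def mutate(genome, pool, rng):
    """Mutation: replace one randomly chosen gene with one from the pool."""
    child = list(genome)
    child[rng.randrange(len(child))] = rng.choice(pool)
    return child

def crossover(a, b, rng):
    """Crossover/heredity: split into three parts and replace the middle
    of parent `a` with the middle of parent `b` (equal-length genomes)."""
    cut1 = rng.randint(1, len(a) - 2)
    cut2 = rng.randint(cut1, len(a) - 1)
    return a[:cut1] + b[cut1:cut2] + a[cut2:]

pool = ["conv", "pool", "dense", "lstm"]
rng = random.Random(1)
parent_a = ["conv", "pool", "dense", "conv", "dense"]
parent_b = ["lstm", "lstm", "pool", "dense", "pool"]
child_m = mutate(parent_a, pool, rng)
child_c = crossover(parent_a, parent_b, rng)
selected = roulette_select([parent_a, parent_b], [0.9, 0.1], rng)
```

The third operator, random generation, is simply the first-generation initializer run again, which is why it needs no new code here.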

At the same time, the DarwinML platform introduces three methods — based on Bayesian inference, Monte Carlo tree search (MCTS), and reinforcement learning — to guide the genetic algorithm's search. The Bayesian method computes the probability distribution of the next generation's performance improvement. MCTS proposes plausible model designs based on the results of tree search. Reinforcement learning is not limited to the model changes themselves: it updates a Q-table from the evolutionary process, model scores, and other inputs, guiding the generation of more reasonable models and evolution directions.
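One plausible reading of the Q-table idea is a bandit over the evolutionary operators themselves: reward the operator that produced an improving child, so future generations lean on whatever variation works for this data. Everything below (the operator set, rewards, learning rate, epsilon) is a hypothetical sketch, not DarwinML's actual design.

```python
import random

OPERATORS = ["mutation", "crossover", "random"]

class OperatorBandit:
    """Epsilon-greedy Q-table over evolution operators."""

    def __init__(self, lr=0.3, epsilon=0.2, seed=0):
        self.q = {op: 0.0 for op in OPERATORS}
        self.lr, self.epsilon = lr, epsilon
        self.rng = random.Random(seed)

    def choose(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(OPERATORS)   # keep exploring
        return max(self.q, key=self.q.get)      # exploit the best so far

    def update(self, op, reward):
        # Q(op) <- Q(op) + lr * (reward - Q(op))
        self.q[op] += self.lr * (reward - self.q[op])

bandit = OperatorBandit()
# Pretend mutation tends to pay off more on this data (made-up rewards
# standing in for "score improvement of the child over its parent").
for _ in range(200):
    op = bandit.choose()
    reward = {"mutation": 1.0, "crossover": 0.4, "random": 0.1}[op]
    bandit.update(op, reward)
```

After a few hundred generations the Q-values separate, and the operator mix shifts toward what has historically helped.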

The time per generation depends on the amount of data and the computing power. A CIFAR-scale deep learning model takes about 10-20 minutes per generation on four GPUs; a machine learning model with 4 million records takes about 5-6 hours per generation on 100 CPU cores.

As for the number of generations, machine learning models usually have about 15-20 models per generation and run for no more than 20 generations; deep learning models typically run for 30-40 generations or more. In general, the more complex the model, the larger the population and the more generations required.

Heart of the Machine: Can you visually show the evolution of DarwinML platform?

Song Yu: The figure above is a relationship diagram of models from an evolution run on the CIFAR-10 dataset. Each circle represents a model, and its size represents the model's performance: the larger the circle, the better. A circle's distance from the center represents the generation in which the model was produced: the farther out, the later it was generated. Red marks models generated by the random operation, blue by heredity, and green by mutation. The graph shows that the algorithm is efficient and moves consistently toward better models. Meanwhile, in the lower-left corner, the performance of the later generations is relatively stable, and most of those models come from the same parent. On more complex data, a more elaborate model "family tree" clearly records the course of evolution. Combined with the models' characteristics, we can study which "excellent" modules (structures) are inherited from generation to generation, providing data support for model interpretability. (below)

Song Yu: Based on the same risk-analysis data, the two graphs below show models designed by automatic evolution on the DarwinML platform. The first model, POP3, is the best of the third generation; the second, POP8, is the best of the eighth.

POP3 achieved 98 percent accuracy, while POP8 achieved 99 percent. Yet the network structure of POP3 is much more complex. This is an example of an evolutionary process guided by a penalty term: even when the loss function or accuracy is similar, the final scores of a simple model and a complex one will differ noticeably. The penalty term pushes the system to design more efficient networks rather than more complex ones.
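The effect of such a penalty term is easy to see in a two-line fitness function. The node counts and the penalty weight below are illustrative only; the source does not disclose DarwinML's actual scoring formula.

```python
def penalized_score(accuracy, n_nodes, lam=0.001):
    """Hypothetical fitness: reward accuracy but penalize structural size,
    so a simpler model can outrank a slightly less accurate complex one."""
    return accuracy - lam * n_nodes

# Mirrors the POP3/POP8 comparison: similar accuracy, very different
# complexity (node counts are made up for the demo).
pop3 = penalized_score(0.98, 300)   # complex third-generation best
pop8 = penalized_score(0.99, 60)    # simpler eighth-generation best
```

Under this scoring, POP8's small structure widens its lead well beyond the one-point accuracy gap, which is exactly the pressure toward efficient networks described above.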

Also, although this is a classification task, the machine incorporated regression algorithms into its model design. Instead of limiting the gene pool because the final goal is classification, the machine searches for the best solution over a much wider range. Here, it decided it was better to extract features with regression methods and hand them to the classifier.

Another interesting phenomenon in POP8 is that Quantile Transformer, a data preprocessing method, is applied before the SVC classifier but not before the k-nearest-neighbour algorithm. This is not a hand-written rule, but one the machine learned through extensive training.

From a human engineer's perspective, we know that converting continuous variables into discrete ones benefits a classifier such as SVC, especially when the data are high-dimensional and scattered — it is a good way to avoid overfitting. The core of the k-nearest-neighbour algorithm, however, is distance computation, so quantile-transforming the data would introduce unnecessary noise and distort the data distribution.
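Why a quantile transform distorts distances can be shown with a toy rank transform (a stand-in for sklearn's QuantileTransformer; the data values here are invented). A distance-based method like k-nearest-neighbour relies on the raw gaps between points, and the transform erases them.

```python
def quantile_ranks(values):
    """Toy quantile transform: map each distinct value to its rank in [0, 1].
    (Illustrative stand-in for a real quantile transformer.)"""
    order = sorted(values)
    n = len(values)
    return [order.index(v) / (n - 1) for v in values]

raw = [0.0, 0.1, 0.2, 10.0]          # one far-away outlier
ranked = quantile_ranks(raw)          # gaps become uniform: 0, 1/3, 2/3, 1
gap_raw = raw[3] - raw[2]             # 9.8: the outlier is clearly isolated
gap_ranked = ranked[3] - ranked[2]    # 1/3: the isolation is erased
```

After the transform, the outlier sits as close to its neighbour as any other point, so a k-nearest-neighbour vote treats it as ordinary — the "distribution deformation" described above. A margin-based classifier such as SVC, by contrast, often benefits from the uniform spread.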

This is our after-the-fact analysis of the model; the model itself was designed entirely by the machine during evolution, with no human-preset information or structure.

Heart of the Machine: Does the DarwinML platform focus on particular industries?

Song Yu: We hope to serve all kinds of industries, so our application cases are cross-industry: risk-control models in finance, models in the insurance field, medical image recognition plus speech recognition for electronic medical records and professional terminology in healthcare, and yield and quality supervision, inspection, and analysis in manufacturing.

The DarwinML platform aims to lower the technical threshold for AI across all industries. In any industry with enough data, business people — not algorithm experts — can bring AI to the ground in their business. We hope to build a complete artificial intelligence ecosystem around DarwinML, including an enterprise customer-service team, a shared cloud platform, and a developer community, to meet enterprise customers' expectations and demands for artificial intelligence across every dimension.

Heart of the Machine: Why did you choose to start your own business from the AutoML perspective?

Qian Guangrui: My research career was closely tied to evolutionary algorithms and high-performance computing. The main work was using large-scale computing to explore the design of materials completely unknown to humans under high temperature and high pressure; combining evolutionary algorithms with first-principles calculations, we developed a materials-structure prediction software package that is still world-leading. At IBM, Song Yu and I designed an artificial intelligence platform product for enterprise users.

Song Yu: My early background was in high-performance computing and low-level databases; later I designed a high-level modeling platform supporting the major deep learning frameworks such as Torch, Caffe, and TensorFlow. In the process, we found that model design was the most troublesome part. We tried many hyperparameter-tuning methods to find a better model in a short time with limited compute, but model design remained very troublesome.

We believe two areas in AI deserve heavy human effort: the design of the loss function itself, and how to use an AI model in a particular application domain. Model design and tuning, by contrast, are time-consuming, tedious tasks. That is what data scientists and algorithm scientists spend their days on today, but over time we found that models are very similar, and once the task is reduced to a differentiable, optimizable mathematical problem, machines can "design" and "find the best" at least as fast as humans.

The final ImageNet competition was an interesting example. That year's winning model was not a new architecture; Hikvision used a great deal of computing power to complete a round of fine-tuning. This shows that, within a fixed time budget, machines already outperform humans at adjusting parameters. The same principle applies to hyperparameter tuning and even model structure design. We firmly believe these are all tasks machines can help people with, or even do better than people.