Pain points and solutions in industrial AI applications

With AI in the news every day, do you think artificial intelligence has already been applied in every corner of our lives?

The fact is that although artificial intelligence is widely applied in speech, image recognition, and NLP, it has far fewer real-world deployments than the number of apps on people’s phones would suggest. Why is that? Has AI already taken off, or is it still far from its breakout moment?

Chen Yuqiang, co-founder of The Fourth Paradigm and an expert in deep learning and transfer learning, gave a detailed explanation at the FMI2017 Artificial Intelligence Conference hosted by Peima.

▲ Chen Yuqiang

The rise of artificial intelligence is a result of the expansion of data, the improvement of machine performance and the development of parallel computing.

What kind of system does industrial AI need?

According to Chen, industrial AI needs scalable systems, and scalability here has two layers. The first is the traditional big-data sense of the word: as data volume grows, you scale out by adding machines to process it. The second, more important layer is that the system’s intelligence, quality of service, and customer experience should scale too, growing as business volume and the number of users grow. Why does this matter?

Because it gives businesses a new way to grow. In the past, enterprises competed on channels, operations, markets, and capital: whoever had more capital and better operations grabbed more market and more territory. But the land-grab phase is coming to an end, and once growth plateaus, land-grabbing is no longer sustainable; companies must shift to fine-grained operations. In that regime, whoever operates more efficiently and more effectively wins more users and better results. That efficiency is a very high barrier, and AI can provide it, because AI runs on data, and data cannot be replicated. For example, even if you had all of Baidu’s code today, you would not have Baidu’s search engine, because you do not have ten years of everyone’s search behavior.

For enterprises, this is both a new growth path and a new moat: by building AI capability, the advantages accumulated over time and over data become a barrier to competitors. That is why AI is so popular.

How do you achieve such a scalable system?

Chen Yuqiang argued in his speech that industrial big data needs models with a high VC dimension. VC theory describes a model’s capacity to fit complex functions: the higher the VC dimension, the smarter the model; the lower the VC dimension, the weaker the model.

As you can see from the figure above, with a high-VC-dimension model, as training continues the loss on the training data keeps decreasing, but the loss measured on held-out data first decreases and then increases.

A model cannot tell good data from bad, so when the data is not large enough, the safe choice is the so-called low-VC-dimension model: the “dumb” model cannot overfit much, and its test performance keeps improving as training continues. But data does not stay small. With the growth of the Internet there is more and more of it, and in that regime the high-capacity model beats the dumb one by a wide margin.
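
This training-versus-test divergence is easy to reproduce. Below is a minimal sketch, assuming scikit-learn and synthetic data, with polynomial degree standing in for VC dimension:

```python
# A sketch of the curve described above; polynomial degree stands in for
# VC dimension, and the data is invented.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X_train = rng.uniform(-3, 3, size=(50, 1))               # small training set
y_train = np.sin(X_train).ravel() + rng.normal(0, 0.3, 50)
X_test = rng.uniform(-3, 3, size=(500, 1))               # held-out data
y_test = np.sin(X_test).ravel() + rng.normal(0, 0.3, 500)

for degree in (1, 3, 15):                                # capacity: low -> high
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    # train error keeps shrinking with capacity; test error shrinks, then grows
    print(f"degree={degree:2d}  train={train_mse:.3f}  test={test_mse:.3f}")
```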

To build a scalable system, then, industry needs models with a high VC dimension: as data volume grows, so do intelligence and user experience, and the product’s barrier rises with them.

So how do you get a high VC dimension model?

Chen Yuqiang summarized the general recipe for us: machine learning = data + features + model.

Holding the amount of data fixed, we can look at the remaining two ingredients along two axes. Features come at two granularities: macroscopic (coarse) features and microscopic (fine-grained) features. Models likewise come in two kinds: simple models, in academic terms linear models, and complex models, that is, nonlinear models (of which there are many more varieties). This classification divides machine learning into four quadrants. The first quadrant is simple model plus macroscopic features. In such a system it is hard for AI to perform well: the VC dimension is low, and the results are generally mediocre.

▲ The first quadrant is where the field stood in the 1970s and 1980s, when there was a well-known collection of about a thousand datasets, each containing roughly 100 to 1,000 samples. Scientists of that era could not develop more complex models on that amount of data, so research largely stayed in the first quadrant.

The second quadrant’s most famous industrial representative is probably Google’s AdWords. Google was a pioneer here: it used hundreds of billions of features and hundreds of billions of training examples to push a linear model to an unmatched level. Even now, with deep learning all the rage, that model remains an excellent piece of machine learning. The second quadrant has very successful industrial applications: advertising at Google, at Baidu, and at many other companies, creating tens or even hundreds of billions of dollars of value every year.
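
As a toy illustration of this second-quadrant recipe, here is a sketch of a linear model over a very wide sparse feature space built with the hashing trick. The impressions and feature names are invented, and this is of course nothing like Google’s production system:

```python
# A toy version of "simple model + massive sparse features": logistic
# regression over hashed categorical features. All data here is invented.
from sklearn.feature_extraction import FeatureHasher
from sklearn.linear_model import SGDClassifier

# Each ad impression is a bag of fine-grained ("microscopic") categorical features.
impressions = [
    {"query": "cheap flights", "ad_id": "ad_42", "hour": "9"},
    {"query": "running shoes", "ad_id": "ad_17", "hour": "21"},
]
clicked = [1, 0]

# Hash every feature=value pair into a fixed, very wide sparse space.
hasher = FeatureHasher(n_features=2**20, input_type="dict")
X = hasher.transform(impressions)

# A linear model trained by SGD scales to billions of rows of such data.
clf = SGDClassifier(loss="log_loss").fit(X, clicked)
print(clf.predict(hasher.transform(
    [{"query": "cheap flights", "ad_id": "ad_42", "hour": "10"}])))
```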

▲ The third quadrant is the complex-model, macroscopic-feature case; familiar examples include Microsoft’s Bing and Yahoo’s search ranking. In this quadrant the features are fewer and coarser, but the model is more complex, so you can still reach a higher VC dimension.

▲ The fourth quadrant combines a complex model with microscopic features. Its VC dimension is very high, but so is the challenge: the model is enormous, and precisely because the VC dimension is so high it remains a very hot research area.

How do you go down the model path?

We can see that there are two paths to a higher VC dimension: the feature path and the model path. Start with the model path.

First of all, how do we build more complex models?

Academia leads the way here (ICML, NIPS, ICLR):

* Kernel methods, Boosting, Neural Networks;

* Most such models fit on a single machine;

* Solve data distribution problems and reduce overhead;

Industry customizes models for applications

* Hypotheses based on thought or observation;

By looking at the business and the data inside the enterprise, we make some assumptions, generally assumptions expressible in a mathematical model; we encode those assumptions into the model in some way, and finally verify on new data whether they hold (a toy version of this loop is sketched after this list).

* Add new model structures to introduce more parameters;

* Case study: Galileo;
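
Here is a toy version of that hypothesize-and-verify loop, under invented data and an invented hypothesis (that two signals only matter jointly), showing why raising model capacity is sometimes the right move:

```python
# A toy hypothesize-then-verify loop on invented data. The hypothesis: the
# two signals only matter jointly (an XOR-style interaction), so a linear
# model fails and a higher-capacity model should win.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = ((X[:, 0] > 0) ^ (X[:, 1] > 0)).astype(int)   # interaction-only target

linear = LogisticRegression()                     # simple model: fails here
nonlinear = GradientBoostingClassifier()          # more parameters, higher VC dim

# Cross-validation plays the role of "verifying the hypothesis on new data".
print("linear   :", cross_val_score(linear, X, y).mean())     # near chance (~0.5)
print("nonlinear:", cross_val_score(nonlinear, X, y).mean())  # near 1.0
```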

How do you go down the feature path?

The feature path is basically industry-led, because it depends on industry’s engineering strength: feature generation has to be efficient and parallel, and it has to be done fast. Venues with heavy industrial participation, such as KDD and WWW, carry more of this kind of work. The models involved are usually simple, even crude.

There is no universal model

All machine learning carries bias, Chen said; even deep learning is always biased. The more assumptions a model builds in, the less data it needs; a model that builds in fewer assumptions needs more data and richer features to compensate.

Of course, each choice has trade-offs: if the bias is too strong and the assumptions turn out wrong, the model fails. The alternative is to assume less and hand the problem to the data, letting the data learn the structure. The benefit is that the fewer assumptions you make, the less likely one of them is wrong; the cost is that you need much more data to fit the complex function those assumptions would otherwise have supplied.

So there is no free lunch in industrial machine learning; it comes down to choices appropriate to the business problem, matching the model to the task. Deep learning is not necessarily better than other machine learning methods, so the wise move is to choose accordingly.

Challenges of applying machine learning in industry

AI application platforms are needed

The intuitive first answer is that you need an AI platform. But even though there are plenty of open-source tools, in practice we find they are not enough.

Why hasn’t AI really reached every enterprise on a large scale?

The requirement today is that to build a successful AI system, you must be an AI expert: the architect has to understand not only the architecture but also the AI itself. That is a very high bar, and it is a big part of why AI systems are so hard to deliver.

Feature engineering: the process of finding the features most critical to your model is called feature engineering. It includes feature cleaning, feature transformation, feature combination, and secondary processing of derived features.

Feature engineering is very difficult: it has to be tailored to the model, and it demands a deep understanding of the business. That difficulty is what keeps many teams in industry from applying machine learning directly in AI applications.
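
To make those steps concrete, here is a minimal sketch, assuming scikit-learn and invented columns, that wires cleaning, transformation, and encoding into one pipeline:

```python
# A minimal sketch of the steps named above: cleaning (imputation),
# transformation (scaling), and encoding, in one reusable pipeline.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({"age": [23, None, 41], "city": ["SH", "BJ", "SH"]})

prep = ColumnTransformer([
    # numeric columns: clean missing values, then rescale
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), ["age"]),
    # categorical columns: one-hot encode, tolerating unseen values
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])
print(prep.fit_transform(df))
```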

Chen Yuqiang told us that The Fourth Paradigm wanted to automate feature combination. After investigation, they found three approaches:

1. Implicit feature combination: combining features in ways that are not explicit. This is naturally friendly to continuous-valued features; deep learning is the canonical example of implicit feature combination.

2. Semi-explicit feature combination: it looks explicit, but it is not truly explicit combination. Tree models are the typical case: each root-to-leaf path looks like a feature combination, but what the path actually combines is fixed value intervals of the feature variables, not the features themselves. The resulting features work well, yet this is not a true way of doing feature combination.

3. Explicit feature combination: a very, very hard problem, but its results compose and carry over. Because it is genuine feature engineering, the combined features can be reused anywhere feature engineering is needed (see the sketch below).
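
As a sketch of what explicit combination means, here is a toy second-order cross over invented columns; each cross is itself a brand-new feature that any downstream model can consume:

```python
# Explicit second-order combination: concatenate pairs of categorical values
# into new features. Column names and data are invented.
from itertools import combinations

import pandas as pd

df = pd.DataFrame({"city": ["SH", "BJ"],
                   "device": ["ios", "android"],
                   "hour": ["9", "21"]})

for a, b in combinations(df.columns, 2):      # every second-order pair
    df[f"{a}_x_{b}"] = df[a] + "&" + df[b]    # e.g. "SH&ios"

print(df.columns.tolist())
```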

In addition, Chen Yuqiang told us that The Fourth Paradigm recently proposed the FeatureGO algorithm, a method for generating high-order feature combinations: under this system they can reach combinations of order 10, and up to order 16. The algorithm is based on MCTS (Monte Carlo tree search), which lets it estimate the probability that a particular feature combination will improve results.
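
The talk does not spell out how FeatureGO works internally, so the sketch below substitutes a plain greedy search for the MCTS component: at each step it tries extending the current combination with every column and keeps the cross that most improves a cross-validated score, stopping when nothing helps. The data, column names, and scoring scheme are all illustrative assumptions:

```python
# NOT FeatureGO: a greedy stand-in for its MCTS search, kept only to make
# the idea of searching for high-order crosses concrete. Data is invented.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import OneHotEncoder

def score(col, y):
    """Cross-validated accuracy using only this (possibly crossed) feature."""
    X = OneHotEncoder(handle_unknown="ignore").fit_transform(col.to_frame())
    return cross_val_score(LogisticRegression(), X, y, cv=3).mean()

def greedy_combo(df, y, max_order=10):
    """Grow one feature cross column-by-column while the score improves."""
    start = max(df.columns, key=lambda c: score(df[c], y))
    combo, best, used = df[start].astype(str), score(df[start], y), [start]
    for _ in range(max_order - 1):
        trials = {c: score(combo + "&" + df[c].astype(str), y)
                  for c in df.columns}
        col, s = max(trials.items(), key=lambda kv: kv[1])
        if s <= best:                 # no column improves the combination: stop
            break
        combo, best, used = combo + "&" + df[col].astype(str), s, used + [col]
    return "&".join(used), best

rng = np.random.default_rng(0)
df = pd.DataFrame({c: rng.choice(["lo", "hi"], 300) for c in "abc"})
y = ((df["a"] == "lo") & (df["b"] == "hi")).astype(int)  # hidden a x b signal
print(greedy_combo(df, y))            # should discover the a&b combination
```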

Computing power is also a very important part of artificial intelligence. Traditionally people say the model is the most important part, but today computing power matters just as much as the model.

Finally, Chen Yuqiang believes that machines replacing people is the clear future trend of industrial AI. From that standpoint, there is still a great deal of work to do to keep lowering the bar for users to build models.