Today I'm going to talk about three topics: cloud computing, big data, and artificial intelligence. Why these three? Because they are extremely popular right now, and they seem to be related to each other: when people talk about cloud computing, big data comes up too, and when they talk about artificial intelligence, cloud computing comes up again. They feel mutually reinforcing and inseparable, and for non-technical people it can be hard to see how the three relate, so it's worth explaining.
1. Cloud computing was originally designed for flexible resource management
Let's start with cloud computing. The original goal of cloud computing was resource management, mainly of three kinds of resources: computing, network, and storage.
1.1 Managing a data center is like assembling a computer
What do we mean by computing, network, and storage resources? Say you want to buy a laptop. Do you care what CPU it has? How much memory? Those two are computing resources.
To get this computer onto the Internet, it needs an Ethernet port for a network cable, or a wireless card to connect to the router at home. You also need an operator such as China Unicom, China Mobile, or China Telecom to provide a line, say 100M of bandwidth. A technician then runs a cable to your home and helps configure your router to connect to the operator's network, and all your computers, phones, and tablets reach the Internet through that router. That is the network.
How big is the hard drive? Hard disks used to be tiny, say 10 GB; now 500 GB, 1 TB, even 2 TB disks are nothing special (1 TB is 1000 GB). That is storage.
That's true for one computer, and it's true for a data center too. Imagine a very, very large machine room full of servers. These servers also have CPUs, memory, and hard disks, and they reach the network through devices like routers. The question is: how do the people who run the data center manage all these devices in a unified way?
1.2 Flexibility means getting it when you want it, and as much as you want
The goal of management is flexibility in two senses. What are they? Suppose someone needs a very, very small computer: one CPU, 1 GB of memory, a 10 GB hard disk, 1 Mbps of bandwidth. Can you give it to him? Any laptop today is stronger than that configuration, and home broadband starts at 100M. But if he goes to a cloud platform, he takes only that sliver of resources the moment he wants it.
So it’s flexible in two ways.
- The first is time flexibility: you get it the moment you want it, the moment you need it.
- The second is space flexibility: you get as much as you want. Someone who needs a tiny computer can be satisfied; someone who needs enormous space, say a cloud disk, can be satisfied too. Cloud disks hand out huge allocations to everyone; you can upload at any time, there is always room, and it seems to never run out.
Space flexibility and time flexibility together are what we call the elasticity of cloud computing.
Solving this elasticity problem took a long time.
1.3 Physical Devices Are Inflexible
The first stage was the physical machine, or physical device, stage. In this period, when a customer needed a computer, we bought one and put it in the data center. Physical devices kept getting more powerful: servers with hundreds of gigabytes of memory, network devices whose single port carries tens or even hundreds of gigabits, and storage in the data center at the petabyte level at minimum (1 PB is 1000 TB, and 1 TB is 1000 GB).
However, physical devices are not very flexible. First, they fail time flexibility: you can't have one the moment you want it. Buying a server, like buying a computer, takes procurement time. If a user suddenly tells the cloud vendor they want a computer and the vendor uses physical servers, purchasing takes time: a month if the supplier relationship is average, a week if it's good. The user then waits a week for the machine to arrive, logs in, and slowly starts deploying the application. Second, space flexibility is poor: no such tiny model of computer exists for the user above. You can't buy a machine that small just to satisfy a user who wants only 1 GB of memory and an 80 GB hard disk. But if you buy a big one, the machine being big, you have to charge the user more, while the user only wants that little bit; making them pay more would be very unfair.
1.4 Virtualization is much more flexible
Then somebody figured it out. The first step was virtualization. Don't users just want a very small computer? The physical devices in the data center are powerful, so I can carve out a small virtual slice of CPU, memory, and disk for one customer, and at the same time carve out other small virtual slices for other customers. Each customer sees only their own slice, while in fact each slice is a small part of the shared physical devices. Virtualization technology keeps different customers' computers apparently isolated: this disk looks like it is all mine, that one looks like it is all yours, yet my 10 GB and your 10 GB may actually sit on the same huge physical storage.
And as long as the physical equipment is provisioned beforehand, virtualization software can create a virtual computer very quickly, basically within a few minutes. So on any cloud today, a computer comes into existence in minutes.
With that, space flexibility and time flexibility are basically solved.
1.5 Money and idealism in the virtualization world
In the virtualization stage, the best company was VMware. It was one of the earliest to implement virtualization, covering computing, network, and storage, and it was very impressive: performance was excellent, the software sold extremely well, it earned a great deal of money, and it was eventually acquired by EMC (the world's number one storage vendor).
But there are always idealists in this world, especially among programmers. What do idealists like to do? Open source. Software is either closed source or open source; "source" means source code. Closed source means: if my software does well and everyone likes to use it, I seal off the code so only my company knows it, and anyone who wants the software has to pay me. But there are always some great people who can't stand one company making all the money. They think: you can develop this, so can I; I'll develop it for free and share the code with everyone, so anyone in the world can use it and everyone can enjoy the benefits.
Tim Berners-Lee, for example, is such an idealist. In 2017 he received the 2016 Turing Award, the "Nobel Prize of computing", for "inventing the World Wide Web, the first browser, and the fundamental protocols and algorithms allowing the Web to scale". But what he is most admired for is giving the technology of the World Wide Web, the WWW, to the world for free. Everything we do online today owes something to him; had he chosen to charge for this technology, he would be as rich as Bill Gates.
In the closed source world you have Windows: everyone who uses Windows pays Microsoft, and Bill Gates became the richest man in the world on closed source software like Windows and Office. So a great hacker, Linus Torvalds, wrote another operating system, Linux, and opened its source. Many people may never have heard of Linux, but a great many back-end servers run on it. When we enjoy Double 11, the shopping systems behind it, whether Taobao, JD.com, or Kaola, all run on Linux.
Likewise, where there is Apple there is Android. Apple has a huge market cap, but the code of Apple's operating system is invisible to us. So someone wrote Android, an open mobile operating system. That's why almost every non-Apple phone maker ships Android: Apple doesn't open its source, and Android is available to everyone.
The same goes for virtualization software: VMware's is very, very expensive. So two open source virtualization projects emerged, one called Xen and one called KVM. If you're not in technology you can forget these two names, but I'll mention them again later.
1.6 Semi-automatic virtualization and fully automatic cloud computing
Virtualization software seems to solve the flexibility problem, but not entirely. With virtualization software, creating a virtual computer generally requires a human to specify which physical machine it should run on, and perhaps other fairly complex manual configuration. That's why using VMware's software required passing a very demanding certification, and holders of the certificate earned quite high salaries; you can imagine the complexity. So the cluster of physical machines that virtualization software alone can manage is not particularly large: a dozen, a few dozen, at most around a hundred machines. On one hand this hurts time flexibility: even though one virtual machine takes only minutes to create, manual configuration grows ever more complicated and time-consuming as the cluster grows. On the other hand it hurts space flexibility: with many users, a cluster this size is far from enough, resources run out quickly, and more must be purchased. So cluster sizes kept growing: thousands of servers at the low end, often tens of thousands or even hundreds of thousands. Look at BAT (Baidu, Alibaba, Tencent), NetEase, Google, or Amazon: their server counts are staggering. With that many machines, relying on people to pick a spot for each virtual computer and configure it by hand is essentially impossible; machines have to do this job themselves.
So people invented algorithms to do it. The algorithms are called schedulers. Put simply, there is a dispatch center, and thousands of machines form one pool. Whatever CPU, memory, and disk the user's virtual computer needs, the dispatch center automatically finds a spot in the big pool that satisfies the request, starts the virtual computer, completes the configuration, and hands it to the user ready to use. This stage is called pooling, or cloud. Only at this stage can we speak of cloud computing; before it, we could only speak of virtualization.
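To make the idea concrete, here is a toy sketch in Python of what a scheduler does at its core. It uses a deliberately naive first-fit strategy and made-up host names; real schedulers weigh many more factors:

```python
# Toy first-fit scheduler: find a host in the pool with enough free
# CPU, memory, and disk for the requested virtual machine.
hosts = [
    {"name": "host-001", "cpu": 16, "mem_gb": 64, "disk_gb": 500},
    {"name": "host-002", "cpu": 2,  "mem_gb": 4,  "disk_gb": 100},
    # ... thousands more machines in a real pool
]

def schedule(req):
    """Return the name of the first host that can fit the request."""
    for h in hosts:
        if (h["cpu"] >= req["cpu"] and h["mem_gb"] >= req["mem_gb"]
                and h["disk_gb"] >= req["disk_gb"]):
            # Reserve the resources so the next request sees updated capacity.
            for k in ("cpu", "mem_gb", "disk_gb"):
                h[k] -= req[k]
            return h["name"]
    return None  # pool exhausted: time to buy more servers

print(schedule({"cpu": 1, "mem_gb": 1, "disk_gb": 10}))  # -> host-001
```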
1.7 Private and public cloud computing
There are roughly two kinds of cloud: private cloud and public cloud (some people call the combination of the two hybrid cloud, but let's not get into that). A private cloud means the virtualization and pooling software is deployed in the user's own data center. Private cloud users tend to be rich: they buy the land, build the machine room, buy the servers themselves, and then have the cloud vendor deploy the software there. Besides virtualization, VMware later introduced cloud computing products and made a fortune in the private cloud market. A public cloud means the virtualization and pooling software is deployed in the cloud vendor's own data center. Users need no big investment: register an account and you can create a virtual computer with a few clicks on a web page. Examples are Amazon's AWS abroad, and Alibaba Cloud, Tencent Cloud, and NetEase Cloud at home.
Why would Amazon do a public cloud? As we all know, Amazon began as a big e-commerce company abroad, and e-commerce inevitably hits scenes like Double 11, when everyone rushes to buy at the same moment. That is exactly when the time and space flexibility of the cloud is needed most. Amazon can't keep all those resources standing by all year round, that would be a waste; but it can't keep nothing either, watching all those eager shoppers get locked out. So it creates a huge number of virtual computers to carry the e-commerce application during the rush, then releases all those resources for other uses afterward. That is why Amazon needed a cloud platform.
However, commercial virtualization software was too expensive: Amazon could hardly hand all the money it made in e-commerce to virtualization vendors. So Amazon developed its own cloud software, based on the open source virtualization technologies mentioned above, Xen and KVM. Unexpectedly, Amazon became more and more successful at both e-commerce and the cloud platform. Moreover, its cloud platform had to support its own e-commerce applications, while traditional cloud vendors were mostly IT vendors with almost no applications of their own; so Amazon's platform turned out friendlier to applications and rapidly grew into the number one brand in cloud computing, making piles of money. Before Amazon broke out its cloud financials, people speculated: Amazon makes money on e-commerce, but does the cloud make money too? When the report came out, it was not just profitable: in the last year alone, Amazon's AWS took in $12.2 billion in annual revenue with $3.1 billion of operating profit.
1.8 Money and idealism in cloud computing
Amazon, number one in public cloud, was having a great time; Rackspace, number two, was doing only so-so. No way around it: that is the cruelty of the Internet industry, winner takes all. If the number two in an industry isn't in cloud computing, many people have probably never even heard of it. So number two thought: what if I can't beat the leader? Open source. As mentioned above, although Amazon uses open source virtualization underneath, its pooling code is closed, and many companies that wanted a cloud platform but couldn't build one could only watch Amazon rake in the money. The moment Rackspace opened its source code, the whole industry could work together to make the platform better and better.
So Rackspace and NASA jointly created the open source software OpenStack. Look at OpenStack's architecture diagram: even if you are not in the cloud industry, you can pick out three keywords: Compute, Networking, and Storage. It is, once again, a cloud management platform for computing, network, and storage.
Of course, number two's technology was excellent too. With OpenStack, just as Rackspace imagined, all the big IT companies that wanted to do cloud went crazy for it: IBM, HP, Dell, Huawei, Lenovo, every big name you can think of. They had all wanted a cloud platform of their own, watching Amazon and VMware make so much money, but building one from scratch looked too hard. Now, with OpenStack as an open source cloud platform, all the IT vendors joined the community, contributed to the platform, packaged it into their own products, and sold it with their own hardware. Some did private clouds, some did public clouds, and OpenStack became the de facto standard for open source cloud platforms.
1.9 IaaS: resource-level flexibility
As OpenStack matured, it could be managed at ever larger scale and deployed as multiple OpenStack clusters: say one cluster in Beijing, two in Hangzhou, one in Guangzhou, all managed in a unified way, so the total scale grew bigger still. At this scale, as far as ordinary users can perceive, the cloud can basically deliver what you want when you want it, as much as you want. Take the cloud disk again: each user is allocated 5 TB or more. With 100 million users, how much space would that add up to? In fact the mechanism behind it is this: of the space allocated to you, you only occupy a little. You are assigned 5 TB, but that is only what you can see, not what you are really given; perhaps you have actually used 50 GB, and only that 50 GB is real. As you upload files, more real space is given to you. When everyone keeps uploading and the platform finds itself nearly full (say 70% used), it purchases more servers and expands the resources behind the scenes, all transparent and invisible to users. In terms of feel, this achieves the elasticity of cloud computing. It is a bit like a bank: customers believe they can withdraw their money any time, and as long as they don't all run on the bank at once, it doesn't collapse.
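This bank-like trick is commonly called thin provisioning. A minimal sketch of the bookkeeping, with made-up numbers:

```python
physical_gb = 100_000   # what the operator has really bought
quota_gb = 5_000        # what every user is promised ("what you can see")
used_gb = 0             # what all users together have really consumed

def upload(size_gb):
    """Real space is only consumed when a file actually arrives."""
    global used_gb, physical_gb
    used_gb += size_gb
    if used_gb > 0.70 * physical_gb:   # nearly full: expand behind the scenes
        physical_gb *= 2               # "purchase more servers", invisible to users

upload(50)   # a user promised 5 TB has really taken only 50 GB so far
print(f"promised per user: {quota_gb} GB, actually consumed in total: {used_gb} GB")
```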
Here is a short summary of this stage: cloud computing basically achieves time and space flexibility for computing, network, and storage resources. Computing, network, and storage are what we call infrastructure, so elasticity at this stage is resource-level elasticity, and a cloud platform that manages resources is called Infrastructure as a Service, IaaS.
2. Cloud computing is not only about resources, but also about applications
With IaaS and resource-level elasticity, is that enough? Clearly not. There is also elasticity at the application layer. Here is an example: an e-commerce application normally runs on 10 machines, but Double 11 needs 100. You might think that's easy: with IaaS, just create 90 new machines. However, the 90 machines come up empty, without the e-commerce application on them, and the company's operations staff can only install it one machine at a time, which takes a long time. Elasticity at the resource layer alone is not enough; without elasticity at the application layer, the system is still not flexible.
Is there a way around this? Yes: add a layer on top of IaaS to manage application elasticity on top of the resources. This layer is usually called PaaS (Platform as a Service). PaaS is often hard to pin down, but it roughly covers two parts: one I call "your own applications install automatically", the other "general-purpose applications need no installation at all".
Let's start with the first part, automatic installation of your own application. Say the e-commerce application was developed by you; nobody but you knows how to install it. When installing it, you need to configure your Alipay or WeChat account so that when others buy things on your site, the money goes to your account, and nobody knows those details except you. So the platform can't do the installation for you, but it can help you automate it: you do some work to feed your own configuration into the automated installation process. For example, when the 90 machines newly created on Double 11 come up empty, a tool that automatically installs the e-commerce application on all 90 delivers real elasticity at the application layer. Puppet, Chef, Ansible, and Cloud Foundry can all do this, and the latest container technology, Docker, does it even better.
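As an illustration only, here is roughly what "automatic installation on 90 empty machines" could look like if you scripted it yourself over SSH. The host addresses, download URL, and configure flag are all hypothetical; real tools such as Ansible add inventories, idempotency, and error handling:

```python
import subprocess

new_hosts = [f"10.0.0.{i}" for i in range(10, 100)]   # the 90 empty machines

# Hypothetical install steps for "your own" e-commerce application,
# including the private payment-account configuration only you know.
INSTALL = """set -e
curl -sSL https://example.com/shop.tar.gz | tar xz -C /opt
/opt/shop/configure --pay-account=YOUR_ALIPAY_ACCOUNT
/opt/shop/start
"""

for host in new_hosts:
    subprocess.run(["ssh", host, INSTALL], check=True)  # same steps, every machine
```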
The second part: general-purpose applications need no installation at all. A general-purpose application is typically something highly complex that everybody uses, such as a database. Almost every application uses a database; database software is standard, and although installation and maintenance are complex, it is the same no matter who installs it. Such applications can appear as standard PaaS offerings on the cloud platform's interface: when a user needs a database, one click and it's there, ready to use. Someone asks: since it's the same no matter who installs it, why pay the cloud platform instead of doing it myself? Databases are genuinely hard: Oracle alone makes enormous money from databases, and Oracle costs a lot. True, most cloud platforms provide open source databases such as MySQL, so you don't have to spend that much; but maintaining the database still takes a large team, and tuning one until it can survive Double 11 is not done in a year or two. If your business is, say, bike-sharing, there is no need to hire a huge database team for this; the cost is too high. Hand it to the cloud platform, let the hundreds of specialists the platform employs maintain the system, and focus on your bikes.
Either your application deploys itself automatically, or the general-purpose application needs no deployment at all; either way, you mostly stop worrying about the application layer. That is where the PaaS layer comes in.
Scripts can solve the deployment of your own application, but environments differ: a script that runs correctly in one environment may not run correctly in another.
Containers are a better solution to this problem.
"Container" here really does mean a shipping container: the idea is to ship software the way containers ship goods. A container has two key properties: packaging and standardization.
In the days before shipping containers, moving goods from A to B might mean three ports and three transfers. At every port the cargo had to be unloaded piece by piece, then loaded onto the next ship and stowed in order again. So when there were no containers, the crew had to stay ashore for several days each time they changed ships.
With containers, all the cargo is packed together, and every container is the same size, so at each transfer a whole box is lifted across in one move. It can be done within hours, and the crew no longer spends long stretches ashore.
That is the container's "packaging" and "standardization" applied in everyday life.
So how does a container package an application? Like the shipping container, it first needs a closed environment to seal the goods in, so that different loads don't interfere with each other and loading and unloading are easy. Fortunately, LXC technology on Linux has long been able to do this.
The closed environment mainly uses two technologies. One provides apparent isolation and is called namespaces: each application in its own namespace sees its own IP addresses, user space, process IDs, and so on. The other provides usage limits and is called cgroups: it ensures an application can use only its share of the whole machine's CPU and memory.
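For the curious, both features can be poked at directly on a Linux box. This sketch assumes root privileges, util-linux's unshare command, and cgroup v2 mounted at /sys/fs/cgroup; the group name "demo" and the 100 MB limit are arbitrary:

```python
import os
import subprocess

# 1. Namespaces ("seems isolated"): a shell with its own PID space and
#    hostname; ps inside it sees only its own process tree.
subprocess.run([
    "unshare", "--pid", "--fork", "--mount-proc", "--uts",
    "sh", "-c", "hostname container-demo; hostname; ps aux",
])

# 2. Cgroups ("is limited"): cap the memory a group of processes may use.
cg = "/sys/fs/cgroup/demo"
os.makedirs(cg, exist_ok=True)
with open(os.path.join(cg, "memory.max"), "w") as f:
    f.write("104857600")        # 100 MB ceiling for the whole group
with open(os.path.join(cg, "cgroup.procs"), "w") as f:
    f.write(str(os.getpid()))   # place this process in the group
```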
And the so-called image is the moment you weld the container shut: the state of the container at that instant is preserved. Like Sun Wukong shouting "Freeze!", the container is frozen at that moment, and the state of that moment is saved into a set of files. The format of these files is standard, so anyone who has them can reconstruct that frozen moment. Restoring an image to runtime (that is, reading the image files to recreate that moment) is the process of running a container.
With containers, the PaaS layer's automatic deployment of users' own applications becomes fast and elegant.
3. Big data embraces cloud computing
One complex general-purpose application in the PaaS layer is the big data platform. How did big data come to embrace cloud computing, step by step?
3.1 Small data also contains wisdom
Big data wasn't that big at first. How much data was there? Think back: for those of us born in the 1980s, the information volume of childhood was small. We read books and newspapers; how many words did a week's newspapers add up to? If you weren't in a big city, an ordinary school library filled only a few shelves. It was with the arrival of the information age that information grew and grew.
First, let's look at the data in big data, which comes in three kinds: structured, unstructured, and semi-structured. What is structured data? Data of fixed format and bounded length. A form is structured data: nationality: People's Republic of China, ethnicity: Han, gender: male. More and more data is unstructured: variable length with no fixed format, such as web pages (sometimes very long, sometimes a few sentences), voice, and video. Semi-structured data sits in between, in formats like XML or HTML; non-technical readers needn't worry about the details.
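A tiny illustration of the three kinds, with made-up values:

```python
# Structured: fixed fields, fixed format - fits neatly in a table.
structured = {"nationality": "People's Republic of China",
              "ethnicity": "Han",
              "gender": "male"}

# Unstructured: any length, no fixed format - pages, audio, video.
unstructured = "A web page can be three sentences or three thousand..."

# Semi-structured: self-describing markup such as XML or HTML.
semi_structured = "<person><name>Zhang San</name><gender>male</gender></person>"
```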
How does data become useful to people? Data by itself isn't useful; it has to be processed. The wristband you wear on a run collects data, and the countless pages on the Internet are data too: this is Data. Data itself is useless, but it contains something very important called Information; data is messy, and only after combing and cleaning can it be called information. Information contains many regularities; regularities summarized out of information are called Knowledge, and knowledge changes fate. Information was there for everyone, but some people saw it and might as well have seen nothing, while others saw in it the future of e-commerce or the future of live streaming, and prospered. If you extract no knowledge from information, scrolling your feed every day only makes you a spectator of the Internet tide. Applying knowledge in practice and doing it well, that is Wisdom. Knowledge is not automatically wisdom: many scholars know enough to analyze whatever has happened from every angle, yet it never turns into action. Many entrepreneurs are great precisely because they put knowledge into practice and built great businesses.
So the use of data climbs four steps: Data, Information, Knowledge, Wisdom. This is what so many businesses want: look, I have collected so much data; can it drive my decisions and improve my product? For example, the pop-up ad beside the video a user is watching is exactly what they want to buy; the music app recommends exactly the other songs they are dying to hear. On my application or website, the user's every mouse click and every typed character is data to me; I want to extract things from it, guide practice, form wisdom, and have users so hooked that they never want to leave my site, tapping and buying nonstop. Plenty of people say that on Double 11 they want to go offline, because their wives keep buying and buying: buy A and B gets recommended, and the wife says, "oh, B is just what I like; husband, I want it." How did this program get so smart that it knows my wife better than I do? How does that work?
3.2 How can data be sublimated into wisdom
Data processing happens in several steps; complete them, and at the end there is wisdom.
The first step is data collection. There are two ways to collect data. The first is to take, technically called "grabbing" or "crawling". Search engines do exactly this: they download all the pages of the Internet into their own data centers. When you search, the result is a list of links; the search engine's machines can produce that list because the data has already been taken, and only when you click a link do you leave for the site itself. For example, Sina has a news story, and Baidu can show it in results because that page sits in Baidu's data center; click it, and the page you reach is in Sina's data center. The other way is push: many terminals collect data for you. For example, a Mi band can upload your daily running data, heartbeat data, and sleep data to the data center.
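Here is the "grab" way at its very smallest, using only Python's standard library: fetch one page and collect the links a real crawler would visit next (the seed URL is just an example):

```python
from urllib.request import urlopen
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Pull every <a href=...> out of a page."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

html = urlopen("https://example.com").read().decode("utf-8", errors="ignore")
collector = LinkCollector()
collector.feed(html)
print(collector.links)   # the frontier a search engine would crawl next
```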
The second step is data transport, generally through a queue: the volume of data is so large that it must be processed to be useful, but the systems can't process it fast enough, so the data has to queue up and be handled slowly.
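The idea in miniature, on one machine with Python's standard library: a buffer absorbs a burst that arrives faster than it can be processed. Distributed queues such as Kafka do the same thing spread across many machines and disks:

```python
import queue
import threading
import time

buf = queue.Queue(maxsize=10_000)   # the buffer that absorbs the burst

def producer():
    for i in range(100):
        buf.put({"event": i})       # events arrive nearly instantly

def consumer():
    while True:
        event = buf.get()           # drained slowly, one at a time
        time.sleep(0.01)            # pretend this is expensive processing
        buf.task_done()

threading.Thread(target=consumer, daemon=True).start()
producer()
buf.join()                          # wait until the backlog is drained
print("all events processed")
```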
The third step is data storage. Data is money now; to hold data is to hold money. How else would a website know what you want to buy? Precisely because it holds the data of your historical transactions. That information can't be given to anyone else; it is very valuable, so it must be stored.
The fourth step is data processing and analysis. What is stored is raw data, mostly chaotic, with lots of garbage in it, so it must be cleaned and filtered into high-quality data. High-quality data can then be analyzed: classify it, or discover relationships between items, and you obtain knowledge. The famous Walmart beer-and-diapers story goes like this: by analyzing purchase data, they found that men who buy diapers usually buy beer at the same time. They discovered the beer-diaper relationship, gained knowledge, applied it in practice by putting the beer and diaper counters close together, and gained wisdom.
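The beer-and-diapers analysis, shrunk to a toy: count how often pairs of items land in the same basket. The baskets here are made up, and real market-basket mining uses algorithms such as Apriori on far more data:

```python
from collections import Counter
from itertools import combinations

baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "diapers", "milk"},
]

pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1       # co-occurrence in the same basket

for pair, n in pair_counts.most_common(3):
    print(pair, n)                   # ('beer', 'diapers') comes out on top
```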
The fifth step is retrieval and mining. Retrieval is search: as the saying goes, for affairs abroad ask Google, for affairs at home ask Baidu. Both load the analyzed data into search engines so that when people want information, one search finds it. Mining goes further: merely searching things out can't satisfy people; relationships must be dug out of the information. For example, when searching a company's stock, shouldn't the company's executives be surfaced too? If you only search the stock, see it rising nicely, and buy, while an executive has just made a statement that is terrible for the stock and it falls the next day, hasn't that hurt the ordinary shareholders? So mining the relationships in data and building knowledge bases through various algorithms is very important.
3.3 In the era of big data, the flames rise high when everyone adds firewood
When data volumes are small, a few machines suffice. As the volume grows and grows, eventually even the mightiest server can't solve the problem alone. What to do? Pool the power of many machines and finish the job together.
Take data collection: for IoT, tens of thousands of sensing devices deployed in the field collect huge volumes of temperature, humidity, monitoring, and power data; for an Internet search engine, all the web pages must be downloaded, which one machine obviously cannot do, so a crawler system of many machines is needed, each downloading its own part, working in parallel to fetch the enormous set of pages within a bounded time.
Take data transport: an in-memory queue on one machine would be overwhelmed by a burst of massive data, so distributed, disk-backed queues arose, letting the queue be carried by many machines at once. However large the data, as long as my queue is big enough and the pipe thick enough, it holds up.
Take data storage: one machine's file system cannot fit the data, so a large distributed file system is needed, uniting the hard disks of many machines into one big file system.
Take data analysis: a large data set may need to be decomposed, counted, and summarized, and one machine could never finish the processing by any deadline. So distributed computing arose: split the huge data set into small chunks, let each machine process one chunk, many machines in parallel, and the answer comes out quickly. The famous TeraSort benchmark sorts 1 TB of data, that is 1000 GB; a single machine takes hours, while parallel processing has finished it in 209 seconds.
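Here is the split-then-merge idea, compressed onto one machine's process pool: the "map" phase counts words in independent chunks in parallel, and the "reduce" phase merges the partial counts. Replace the four workers with thousands of servers and this is the shape of a MapReduce job:

```python
from collections import Counter
from multiprocessing import Pool

def map_count(chunk_of_lines):
    """Each worker counts words in its own slice of the data."""
    c = Counter()
    for line in chunk_of_lines:
        c.update(line.split())
    return c

if __name__ == "__main__":
    lines = ["big data needs cloud computing",
             "cloud computing needs big data"] * 50_000
    chunks = [lines[i::4] for i in range(4)]     # split across 4 workers
    with Pool(4) as pool:
        partials = pool.map(map_count, chunks)   # the "map" phase, in parallel
    total = sum(partials, Counter())             # the "reduce" phase: merge
    print(total.most_common(3))
```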
So that, bluntly, is what "big data" means: one machine can't finish the job, so many machines work together. As data kept growing, even many small companies found themselves needing to process quite large data. What is a small company without thousands of machines to do?
3.4 Big data needs cloud computing, and cloud computing needs big data
At this point we come back to cloud computing. When you want to do these jobs, you need many machines, and you truly want them the moment you want them, in whatever quantity you want. For example, a big data analysis of a company's finances might run once a week. Keeping 100 or 1,000 machines standing idle and using them once a week would be terribly wasteful. Could we take out a thousand machines when it's time to compute, and let them do something else the rest of the time? Only cloud computing can provide this resource-level flexibility for big data jobs. Cloud vendors also deploy big data platforms on their PaaS layers as a very, very important general-purpose application, because a platform that makes many machines cooperate on one task is not something ordinary teams can develop; like a database, it takes dozens or hundreds of specialists to get it working. So public clouds today basically all offer big data solutions: a small company that needs a big data platform doesn't buy a thousand machines; a few clicks on the public cloud, and the thousand machines appear with the big data platform already deployed on them. Just pour in the data and compute.
Cloud computing needs big data, and big data needs cloud computing.
4. AI embraces big data
4.1 When will machines understand the human mind
Even with big data, human desire is never satisfied. The big data platform has a search engine, and I can find what I want. But there are also cases where I can't find what I want, or can't even express it: what I search for is not what I actually want. For example, a music app recommends a song I have never heard; naturally I don't know its name and couldn't have searched for it, yet the app recommends it and I really like it. Search can't do that. When people use an app like this, they find the machine knows what they want before they ask, rather than waiting for them to come and search. A machine that understands me as well as a friend does: that is a touch of artificial intelligence.
People have thought about this for a long time. Early on, they imagined a wall with a machine behind it: I speak to it and it responds, and if I can't tell whether the thing on the other side is a person or a machine, then it really has artificial intelligence (this is essentially the famous Turing test).
4.2 Let machines learn to reason
How do you achieve that? People thought: first I'll teach the computer humanity's power of reasoning. What matters about humans, what separates humans from animals? The ability to reason. If only I could give the machine my reasoning ability, it could derive answers to whatever it was asked. Machines gradually did become able to prove mathematical theorems, which seemed amazing at the time. But the result turned out less surprising than hoped, because mathematical formulas are rigorous, the reasoning over them is rigorous, and both are easy to express in a program. Human language is not so simple. Tonight you have a date with your girlfriend, and she says: "If you arrive early and I'm not there yet, you wait; if I arrive early and you're not there, you wait." That is hard for the machine to understand, yet everyone understands it, which is also why you would never dare be late for a date.
4.3 Teach the machine knowledge
So it's not enough to give the machine rigorous reasoning; you also have to give it knowledge. But imparting knowledge is something ordinary people may not manage; perhaps experts can, such as language experts or financial experts. Can the knowledge of language and finance be expressed as somewhat rigorous rules, like mathematics? A language expert might summarize grammar: subject, predicate, object, attributive, adverbial, complement; the subject is followed by the predicate, the predicate by the object. Summarize these rules, express them strictly, and we're done? It turned out this doesn't work: there is too much to summarize. Take subject-predicate-object: in speech the predicate is often dropped. Someone asks, "Who are you?" and I answer, "I Liu Chao." You can't require people to speak standard written language just so the machine can recognize the semantics; that is still not smart enough. As Luo Yonghao put it in a speech, it is deeply awkward to look at your phone every time and announce in written language: "Please call so-and-so for me."
This stage of artificial intelligence is called the expert system era. Expert systems struggled to succeed: on one hand, knowledge is hard to summarize; on the other, the summarized knowledge is hard to teach to a computer. If you yourself only sense that there is a pattern but can't articulate it, how could you ever program it into a computer?
4.4 Forget teaching; can't machines learn by themselves?
So people thought: it seems machines are a species entirely different from humans; fine, let machines learn by themselves. How does a machine learn? Machines are powerful at statistics, so surely they can find patterns in large amounts of data through statistical learning.
There is actually a good example of this from the entertainment world that gives a glimpse of the idea.
A netizen counted the lyrics of 117 songs across nine albums released by a well-known mainland singer, counting each word at most once per song. The most frequent adjectives, nouns, and verbs are shown in the table below (the number after each word is how many songs it appears in):
| # | Adjective | Noun | Verb |
|---|-----------|------|------|
| 0 | Loneliness: 34 | Life: 50 | Love: 54 |
| 1 | Freedom: 17 | Road: 37 | Broken: 37 |
| 2 | Confusion: 16 | Night: 29 | Cry: 35 |
| 3 | Strength: 13 | Sky: 24 | Die: 27 |
| 4 | Despair: 8 | Child: 23 | Fly: 26 |
| 5 | Youth: 7 | Rain: 21 | Dream: 14 |
| 6 | Lost: 6 | Stone: 9 | Pray: 10 |
| 7 | Light: 6 | Bird: 9 | Leave: 10 |
What if we wrote down a string of digits and, following the digits in order, took one word in turn from the adjective, noun, and verb columns and joined them together?
For example, take pi, 3.1415926. The corresponding words come out as: strength, road, fly, freedom, rain, buried, lost. Connect them and touch them up a little:
Tough kids,
Still on the road,
Spread your wings and fly to freedom,
Let the rain bury his confusion.
Feel something? Of course, real statistics-based learning algorithms are far more complex than this simple count.
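The digit-to-word trick itself is a few lines of code. This sketch uses the eight words per column shown above; the `% 8` wraps digits beyond the list, so its output differs slightly from the hand-polished example:

```python
adjectives = ["Loneliness", "Freedom", "Confusion", "Strength",
              "Despair", "Youth", "Lost", "Light"]
nouns      = ["Life", "Road", "Night", "Sky", "Child", "Rain", "Stone", "Bird"]
verbs      = ["Love", "Broken", "Cry", "Die", "Fly", "Dream", "Pray", "Leave"]

columns = [adjectives, nouns, verbs]
digits = "31415926"                     # the digits of pi

# Cycle adjective -> noun -> verb; each digit picks a word from its column.
words = [columns[i % 3][int(d) % 8] for i, d in enumerate(digits)]
print(" / ".join(words))                # raw material; polish by hand
```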
Statistical learning copes well with simple correlations: if one word and another always appear together, the two words must be related. But to express complex correlations, the statistical formulas grow very complicated, and to simplify the calculation, all sorts of independence assumptions get made to reduce the difficulty; in real life, however, truly independent events are relatively rare.
4.5 Mimic the way the brain works
So people turned back from the machine's way of working to ask how the human brain actually works.
The human brain does not store masses of rules, nor does it keep records of masses of statistics; it works by the triggering of neurons. Each neuron takes input from other neurons, and when it receives input it produces output that stimulates still other neurons; vast numbers of neurons interacting eventually form all the various outputs. For example, when a man sees a beautiful woman and his pupils dilate, it is not the brain making a rule-based judgment on body proportions, nor counting up all the beauties seen in a lifetime; rather, neurons fire from the retina to the brain and back to the pupils. In this process it is hard to say what role each individual neuron played in the final result, but each of them did play one.
So people began to model the neuron with a mathematical unit.
This model neuron has inputs and an output, related by a formula: each input influences the output in proportion to its importance, that is, its weight.
Connect n such neurons together and you get a neural network; n can be very, very large, with the neurons arranged in many layers. Each neuron weights its inputs differently, so each neuron has a different formula. When people feed something into this network, they hope it outputs a result that is correct for humans. For example, feed in a picture of the digit "2" and hope the second number in the output list is the largest. From the machine's perspective, it neither knows the input picture shows a "2" nor knows what the output numbers mean; it doesn't matter, humans know. Likewise the neurons don't know the retina is seeing a beauty, nor that pupils dilate to see clearly; when a beauty appears the pupils dilate, and that's enough.
For a freshly built neural network, nobody can guarantee that when the input is 2, the second output is always the largest. Guaranteeing that result requires training and learning; after all, pupils dilating at beauty is the product of many years of evolution. The learning process means feeding in many, many pictures and, whenever the result isn't what is wanted, adjusting: every weight of every neuron is fine-tuned toward the target. Because there are so many neurons and weights, the network's output rarely flips in an all-or-nothing way; it makes slight progress toward the result and finally reaches the target. Of course, the adjustment strategies are still quite an art and need algorithm experts to tune carefully. It is like the man who sees a beauty but whose pupils at first don't dilate enough to see clearly, so the beauty runs off with someone else; the lesson learned for next time is to dilate the pupils a little more, not to flare the nostrils.
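A single neuron with this kind of training fits in a few lines. The sketch below learns the OR function by nudging weights a little at a time, the "slight progress toward the result" just described; a real network stacks millions of such units and trains them with backpropagation:

```python
import math
import random

def neuron(x, w, b):
    """Weighted sum of inputs, squashed into (0, 1)."""
    return 1 / (1 + math.exp(-(sum(xi * wi for xi, wi in zip(x, w)) + b)))

data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]   # OR truth table
w, b = [random.random(), random.random()], 0.0

for _ in range(10_000):                  # training: many tiny corrections
    x, target = random.choice(data)
    error = target - neuron(x, w, b)
    w = [wi + 0.1 * error * xi for wi, xi in zip(w, x)]
    b += 0.1 * error                     # fine-tune, never an all-or-nothing jump

print([round(neuron(x, w, b)) for x, _ in data])   # -> [0, 1, 1, 1]
```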
4.6 It doesn't make sense, but it works
It doesn't sound rigorous at all, but it works, just that willful.
The universality theorem of neural networks says: suppose someone hands you some complicated, wild function f(x):
Whatever this function looks like, there is always a neural network that, for every possible input x, outputs f(x) (or some arbitrarily good approximation of it).
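Stated a little more formally (this is the classic universal approximation theorem of Cybenko and Hornik, paraphrased): for a sigmoid-like activation σ, any continuous f on a compact set K, and any tolerance ε > 0, there is a one-hidden-layer network

```latex
F(x) = \sum_{i=1}^{N} v_i \, \sigma(w_i \cdot x + b_i)
\qquad \text{such that} \qquad
|F(x) - f(x)| < \epsilon \quad \text{for all } x \in K .
```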
If the function represents a law, this means that the law, however wondrous, however incomprehensible, can be expressed by a large number of neurons and a large number of weights.
4.7 An economic explanation of artificial intelligence
This reminds me of economics, which makes it easier to understand.
Think of each neuron as an economically active individual in society; the neural network is then the whole economy. Each neuron adjusts its weights for its inputs from society and makes its own output: when wages rise, vegetable prices rise, and stocks fall, what should I do, how should I spend my money? Is there a pattern? Surely. But what exactly is it? Hard to say.
An economy based on expert systems is a planned economy: the expression of economic law is not expected to emerge from each economic individual's independent decisions, but to be summarized through the strategic vision and far-seeing knowledge of experts. But experts can never know which street of which city lacks a sweet tofu vendor. So the expert's plan for how much steel and how many steamed buns to produce is often far from the real needs of people's lives; even if the whole plan runs to hundreds of pages, it cannot express the small laws hidden inside people's lives.
Macro-control based on statistics is much more reliable. Every year the statistics bureau publishes society-wide employment rates, inflation, GDP, and other indicators. These indicators carry many internal laws: they cannot be expressed precisely, but they are relatively dependable. Yet laws summarized from statistics are rather coarse. Economists looking at these figures can conclude whether housing prices and stock prices will rise or fall in the long run: if the economy is broadly improving, both should rise. But laws of the fine fluctuations of stocks and prices can never be summarized from statistical data.
Microeconomics in the style of a neural network is the most precise expression of economic law. Each individual adjusts to inputs from society, and the adjustment feeds back into society as new input. Picture the subtle curve of the stock market: it is the sum of countless independent trades, with no unified pattern to follow. Yet when certain factors are trained many times over, macro statistical regularities emerge, which is exactly what macroeconomics can observe. For example, every time money is massively over-issued, housing prices eventually rise; after much training, people learn this.
4.8 Artificial intelligence needs big data
However, a neural network contains so many nodes, and each node so many parameters, that the sheer volume of computation is enormous. No matter: we have the big data platform, so the computing power of many machines can be pooled to obtain the desired result within a bounded time.
AI can do many things, such as identifying spam and pornographic or violent text and images. This too went through three stages. The first stage relied on keyword blacklists and filtering techniques to decide which words count as obscene or violent; as language evolves, the words keep changing, and keeping the word list up to date becomes overwhelming. The second stage was based on newer algorithms such as Bayesian filtering; never mind what a Bayesian algorithm is, but you have heard the name, and it is a probabilistic algorithm. The third stage is based on big data and artificial intelligence: more precise user profiling, text understanding, and image understanding.
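The second stage is simple enough to sketch: score a message by how often its words appeared in known spam versus normal mail. The tiny word counts here stand in for a real training corpus; production filters add priors and far more features:

```python
import math
from collections import Counter

spam_words = Counter("win money now free money".split())
ham_words  = Counter("meeting notes for the project review".split())

def spam_score(message):
    """Positive score means 'looks like spam'; negative means 'looks normal'."""
    score = 0.0
    for word in message.lower().split():
        # Laplace smoothing: unseen words don't zero out the probability.
        p_spam = (spam_words[word] + 1) / (sum(spam_words.values()) + 2)
        p_ham  = (ham_words[word] + 1) / (sum(ham_words.values()) + 2)
        score += math.log(p_spam / p_ham)
    return score

print(spam_score("free money now"))         # clearly positive
print(spam_score("project meeting notes"))  # clearly negative
```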
Artificial intelligence algorithms mostly depend on large amounts of data, and that data usually has to be accumulated in a specific field (say e-commerce, or email) over a long time. Without data, even the best AI algorithm is useless. So AI programs are rarely sold the way IaaS and PaaS are, installed on-premises for a single customer: install one set for one customer alone, and the customer has no relevant data for training, so the results are usually poor. Cloud vendors, however, have accumulated vast amounts of data, so they install a set inside the cloud and expose a service interface. For example, if you want to know whether a text is pornographic or violent, you simply call the online service. A service of this kind is called Software as a Service, SaaS, in cloud computing.
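From the customer's side, SaaS is just one HTTPS call to the vendor's hosted model, with no machines or training data of your own. Everything in this sketch (the endpoint URL and the request and response fields) is hypothetical, not any real vendor's API:

```python
import json
from urllib.request import Request, urlopen

req = Request(
    "https://api.example-cloud.com/v1/moderation",   # hypothetical endpoint
    data=json.dumps({"text": "some user-generated text"}).encode(),
    headers={"Content-Type": "application/json",
             "Authorization": "Bearer YOUR_API_KEY"},
)
result = json.load(urlopen(req))   # the vendor's model does all the work
print(result)                      # e.g. {"pornographic": false, "violent": false}
```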
And so artificial intelligence programs, as SaaS, joined the cloud computing platform.
5. Cloud computing, big data, and artificial intelligence for a better life
And so the three brothers of cloud computing, IaaS, PaaS, and SaaS, were all assembled, and cloud, big data, and artificial intelligence can all be found on one cloud computing platform. A big data company that has accumulated masses of data will also use AI algorithms to provide services; an AI company cannot do without the support of a big data platform. So cloud computing, big data, and artificial intelligence merged in this way: they met, got acquainted, and came to know each other well.
The author of this article is Liu Chao, author of "The Secrets of Lucene Application Development". Personal official account: Liu Chao's Popular Cloud Computing (Popsuper1982).