There are four stages of intelligent speech: voice chat, voice control, emotionalization and personalization, and human-machine integration. The core of voice control is the seamless integration of the language system and the operating system, where voice commands carry a higher weight.

At the GeekPark annual conference over the weekend, Robin Li, founder of Baidu Inc., talked about artificial intelligence. Robin said AI has matured and technology has become more important in the era of the mobile Internet. Meanwhile, Ray Kurzweil, president of Singularity University, said the cost of simulating human intelligence will be very low by 2020, and that AI will surpass human intelligence by 2045 and lead to a whole new form of civilization.

Compared with the science-fiction future of artificial intelligence, the general public is more likely to encounter intelligent voice technology that changes everyday life. Nearly ten organizations, including ZTE, Nuance, Baidu and the Institute of Automation of the Chinese Academy of Sciences, have formed an intelligent voice alliance and are already laying out the large-scale application of intelligent voice technology and the construction of its ecosystem. As the interaction mode at the front end of artificial intelligence, intelligent speech is closer to the way humans communicate, and can be integrated into mobile phones, cars, home appliances and other devices in stages, hierarchically and in depth, leading people step by step into the era of artificial intelligence.

Technological revolution: four stages of intelligent speech

Intelligent speech technology and its applications can be divided into four development stages according to degree of intelligence, value level, interaction level and depth of thinking: voice chat, voice control, emotionalization and personalization, and human-machine integration.

Voice chat. This is the initial model-building period for speech technology. Voice chat is not only a process of machine learning, but also a process that endows machines with the ability to think. Through conversation and chat between people and robots, robots become more and more intelligent and gradually build up a database of human speech features. Chatbots such as Siri on the iPhone and Cortana all use big data and machine-learning techniques to conduct human-machine conversations. Most of these chats are for entertainment, and they were very active in the early stage. However, as the novelty wore off, the popularity of voice chat declined sharply.

Voice control. This is the application stage of intelligent speech. The deep integration of voice technology and system software endows the machine with the ability to act and to be controlled through language, much like the human language command system. At this stage, communication between machine and human goes beyond chat, acquires real application value, and puts the productive power of the technology to work. By embedding voice into the operating system, users can use voice to wake up applications, open the address book, make calls and play music. The most eye-catching feature is driving mode, in which the phone can be fully controlled by voice: it can be woken up and operated without touching the phone or the screen, and it can also intelligently read out text messages, convert voice to text and so on, giving full play to the advantages of voice interaction.
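The voice-control idea described above can be reduced to a very simple pattern: a transcribed utterance is matched against a table of known commands and routed to a system action. The sketch below is purely illustrative; the command phrases and handler names are assumptions, not any real phone API.

```python
# Minimal sketch: transcribed voice commands dispatched to system actions.
# All handlers are stand-ins for real operating-system calls.

def play_music(args):
    return "playing music"

def make_call(args):
    # In a real system this would go through the OS telephony API.
    return f"calling {args or 'last contact'}"

def read_sms(args):
    return "reading new text messages aloud"

# Command table: recognised phrase prefix -> handler
COMMANDS = {
    "play music": play_music,
    "call": make_call,
    "read messages": read_sms,
}

def dispatch(utterance: str) -> str:
    """Route a transcribed utterance to the matching system action."""
    text = utterance.lower().strip()
    for phrase, handler in COMMANDS.items():
        if text.startswith(phrase):
            return handler(text[len(phrase):].strip())
    return "sorry, command not recognised"

print(dispatch("Call Alice"))   # -> calling alice
print(dispatch("Play music"))   # -> playing music
```

In driving mode, the same dispatch loop would simply run continuously on the recogniser's output, which is why hands-free control falls out of the design almost for free.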




Emotionalization and personalization. Compared with the deep integration of speech and machine in voice control, the next step for intelligent speech may be to become fuller in emotion and to communicate emotionally the way humans do. Personalization is the advanced state in which intelligent speech approaches the human natural language system. At this stage, voice control is only the most basic capability; an intelligent voice assistant may be able to hear a person's mood, emotion and attitude, rather than interacting only through the literal meaning of words. Like a personal assistant, it will accompany you through joy and sadness. This requires the voice system to have sophisticated voiceprint recognition technology and an intelligent brain.

Man-machine integration. It sounds like science fiction, but judging from the current development of artificial intelligence it may be achievable in 10 to 20 years. At this stage, intelligent voice interaction will be infinitely close to the human language system: it will translate human natural language into machine instructions, proactively provide services with people as the priority, and control many devices through open APIs. Machines will think like humans and understand the meaning of human language and emotional systems, and machines will be able to work together like brothers.



What does intelligent speech bring us at this stage

Siri and Cortana bring the fun of anthropomorphic interaction to smartphones, and it is fine to relax with them occasionally. However, since they are standalone apps, they need to be woken up by touch, and the voice control they can achieve is very limited. Still, they were a good, interest-driven beginning for the era of intelligent speech: Siri's contribution was to reveal the power of speech technology and to cultivate initial user habits, opening the door to intelligent speech.

The intelligent voice of the Xingxing II sits at the second stage, focusing on the deep integration of the voice system with the mobile phone operating system: voice can control the phone's main functions, and its biggest feature is the driving-mode scenario. Voice interaction shows high practical value here; there is no need to touch the phone while driving, so voice control becomes a rigid requirement in driving mode.

The existing voice control is basically sufficient, and most driving-mode controls have been realized and are usable. Meanwhile, it is hoped that ZTE can keep increasing the granularity in subsequent upgrades, that is, expand the kinds of phone functions and applications that voice can control, including in-app actions. For example, frequently used applications might be reached through an open API in the future, such as having news and posts read aloud in driving mode, novels read in a reader app, or texts read from WeChat Moments, so that the whole mobile operating system can be controlled entirely by voice.



The personalization of intelligent speech will be interesting

The embedded intelligent voice can already wake up and open Amap, and Amap offers celebrity voice broadcasts. This feature fills an everyday map application with fun and intimacy; it would be interesting if ZTE learned from this idea. It would be cool to have a celebrity voice turn on music for me every day, run my searches, and read my text messages.

Accent recognition is generally good, but it still needs improvement; noise reduction is one direction. Intelligent speech can recognize accents, but can it interact with me in the Yantai dialect? At present, intelligent speech can only map dialects into Mandarin. Could it reverse this and output dialects, making it easier to communicate with people everywhere? For example, I named my Star phone "Small Mocofu". If I say to her "Small Mocofu, come to the bar" in dialect, intelligent speech will probably struggle to recognize it. Can machine learning solve this?

I can also think of an interesting scenario. Luo Yonghao has been called the best crosstalk performer in the mobile phone industry; in fact, speech skills need training. A robot with intelligent voice could act as the audience while the phone's owner practices speaking, giving encouragement or applause whenever there is a long pause. This could turn everyone into a crosstalk performer, draw shy and introverted people out of seclusion, and give lonely people a place to vent. I therefore think there is much to explore in speech scenario modes and details, which could support an app-store-like platform for intelligent speech, letting the community participate in external research on intelligent speech and its application scenarios.



What's the next level of voice control

The core of voice control is the seamless integration of the language system and the operating system, and the integration and unification of the voice command system with the operating system's APIs. In voice mode, voice commands carry a higher weight, and the operating system's APIs play a very important role. Thinking in reverse, the intelligent voice control system can also expose APIs of its own: if voice control APIs were opened to mobile app developers so that all kinds of excellent applications could plug into the voice control system, intelligent voice could achieve full voice control of the phone and its applications that much sooner.
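An "open voice control API" in the sense described here would let third-party apps register their own voice actions with the system's voice layer. The sketch below illustrates that registration pattern; the class name, method names and sample phrases are all assumptions for illustration, not an existing SDK.

```python
# Hedged sketch of an open voice-control API: third-party apps register
# voice actions with the system's voice layer, which then routes
# utterances to whichever app claimed the matching phrase.

class VoiceControl:
    def __init__(self):
        self._actions = {}  # phrase prefix -> app callback

    def register(self, phrase, callback):
        """A third-party app opens one of its actions to voice control."""
        self._actions[phrase.lower()] = callback

    def handle(self, utterance):
        """Route an utterance to a registered app action, if any."""
        text = utterance.lower().strip()
        for phrase, callback in self._actions.items():
            if text.startswith(phrase):
                return callback(text[len(phrase):].strip())
        return None  # fall back to built-in system commands

# A news app and a novel reader plugging themselves into the voice layer:
vc = VoiceControl()
vc.register("read the news", lambda topic: f"reading {topic or 'top'} headlines aloud")
vc.register("read novel", lambda title: f"reading the novel {title} aloud")

print(vc.handle("Read the news"))  # -> reading top headlines aloud
```

The design choice matters: the app declares what it can do and the voice system owns the matching, which is exactly the "integration and unification" of the command system and the OS API that the paragraph above argues for.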

Furthermore, once voice control opens APIs, complex API programming becomes possible: a single voice command can be expanded by algorithm into an ordered sequence of calls, and connected to industrial robots, military robots or gardening robots to achieve voice remote control. That way, we will be able to speak remotely to our robot housekeepers, somewhat like the smart-home connectivity being promoted today.

With sufficient computing power behind advanced voice-control programming, people will be able to hold real-time remote dialogue with machines. While our drones fight autonomously, they could also take direct voice commands or warnings from a command center or an early-warning aircraft; at that point, even an order already spoken could be recalled.