Voice is nothing new. When Siri was built into the iPhone 4S in 2011, it set off a wave of discussion about voice technology and a rush of voice-assistant startups. Five years later, the match between Google’s AlphaGo and a human champion turned AI (artificial intelligence) from a laboratory technology into a topic of street conversation. AI has become a battleground for technology giants in China and abroad, and voice has become the route the giants must take into AI.

The most important announcements at Google I/O were Google Assistant and Google Home, a smart home speaker similar to the Amazon Echo. Google Home is built around voice.

At Apple’s WWDC developer conference, the five-year-old Siri finally came to the Mac and was opened up to developers so that it can handle more tasks.

Amazon Echo has become the company’s most successful hardware product, with sales reaching 4 million units. Voice-controlled smart home speakers are seen as the next big thing after smartphones.

In this year’s Internet Trends report, Mary Meeker, a partner at Kleiner Perkins known as the “queen of the Internet”, devotes more than a tenth of the report to voice. She sees the touch screen and the microphone replacing the keyboard and mouse.

Chinese giants will not sit out such an important technological revolution. So far there is no Chinese counterpart to the Amazon Echo in the Chinese market, but Internet companies have been trying to get in on the software side. Sogou, which launched its voice assistant the year after Siri was released, has been one of China’s most aggressive investors in voice. Alibaba, Tencent, NetEase, JD.com and others have made sporadic moves, through third-party partnerships or small experiments, but have not made voice a priority. Sogou is a search engine company that champions technology, which is why it bet on voice.




Input methods are the voice highlight for Chinese players

Input is the most typical application scenario for voice. When people are outdoors, at home, walking or driving, typing is inconvenient, or they cannot be bothered to type, or they type too slowly, so voice becomes an option. As speech technology has developed, many problems with recognition accuracy, dialect support and noise interference have been solved, and recognition rates have reached a practical level.

In voice input, Sogou Input Method, iFlytek Input Method and Baidu Input Method are the three main players. iFlytek entered the input method field on the strength of its voice technology and claims more than 100 million users. Baidu’s input method also uses voice as a highlight, and Baidu claims to solve the noise problem with its DeepSpeech technology. The largest voice input tool is Sogou Input Method: 7% of its users use the voice recognition function, which handles more than 100 million speech conversions every day, a considerable share of the overall 140 million requests. This also shows that input is Sogou’s most important voice application.

Giiso Information, founded in 2013, is a leading technology provider in the field of “artificial intelligence + information” in China, with technologies in big data mining, intelligent semantics, knowledge graphs and other fields. Its products include editing robots, writing robots and other artificial intelligence products. The company received angel investment when it was founded and a pre-A round of $5 million from GSR Venture Capital in August 2015.



Sogou Input Method has a history of five years. Unlike iFlytek’s voice input method, which emphasizes recognition rate, Sogou Input Method emphasizes intelligent input. No matter how high the recognition rate, voice input will still contain errors, and correcting them by hand is a pain point because users do not want to use their hands. Sogou’s solution is intelligent interaction. For example, when a user says a name containing “one” (一, pronounced yi), such as “the First Aerospace Institute”, it is likely to be recognized with the homophone “medical” (医, also yi), giving “the Aerospace Hospital”. The user can then say “the ‘one’ in ‘one, two, three, four’”, and Sogou Input Method will change the “medical” character to “one”. The key to this interaction is not recognition but semantic understanding: without it, the spoken correction would simply be transcribed as more text. Good semantic understanding requires deep-learning-based artificial intelligence techniques, at which Internet companies, especially search companies, excel.

Voice input methods have become a highlight for Chinese Internet giants. American users do not need a separate “input method” because the operating system handles text entry itself, so the United States has no input method giant like Sogou. English speakers do of course need voice input, but it has not been a focus for Siri; dedicated English voice input may appear in the future.



The car has become the Chinese giants’ favorite scenario

In terms of voice usage scenarios, American tech giants prefer the home. Amazon Echo and Google Home both target the home. A home is an enclosed, relatively quiet space where interference from noise and from other people is less of a problem, and it can tie into the smart home, so it suits voice well. However, making hardware takes serious “hard power”, and the smart home requires integrating many industries, which cannot be achieved overnight, so Chinese Internet giants are wary of this scenario. Alibaba and JD.com have tried partnering with third parties on Echo-like products, but their market performance has been lackluster.

One voice scenario is just as important as the home: the car, where eyes and hands are both occupied and voice is the best form of input. In the past, voice input in cars was a pain point: the built-in voice function supported only a few English commands and was available only on the more expensive “deluxe” version. Voice interaction in cars is full of the kind of disruptive opportunity that Internet giants yearn for most.

Sogou’s intelligent voice navigation app for cars supports the usual in-car interactions beyond driving, including phone calls, text messaging, weather, music and so on. The product can run on a smartphone, can be shown on the car’s screen through a phone-to-head-unit connection protocol, and, if a car manufacturer pre-installs it in cooperation with Sogou, can run independently on the car’s OS. Not long ago, YunOS teamed up with SAIC to launch China’s first Internet car, where voice interaction was a selling point: Alibaba CEO Daniel Zhang showed off operations such as opening the sunroof by voice. In the future, in-car voice interaction will be standard.



Smart technology ensures that speech is intelligible

Siri is only five years old, but voice has a long history: iFlytek was founded in 1999 and is now a voice giant with a market value of 40 billion yuan on the A-share market. But it is only in the last few years that voice has reached the masses and revolutionized human-computer interaction. Judging from the moves of technology giants in China and abroad, intelligent technology and cloud services have become the two pillars of voice.

Speech technology used to be based on “rules” rather than “statistics”: working from a fixed set of rules, processing a massive corpus could steadily improve recognition accuracy. With the addition of deep-learning-based artificial intelligence, speech technology instead learns from massive corpus data on clusters of machines and discovers the patterns itself, so as to carry out accurate speech recognition and semantic understanding.

Whether it is intelligent error correction in voice input, accurate understanding of user requests in voice search, or Siri responding to complex commands such as “help me set a reminder to pick up the delivery at 9:00 tomorrow”, the underlying technology is artificial intelligence, and the AI awakening set off by AlphaGo will help drive the spread of voice.

Sogou and Tsinghua University have jointly set up the “Tiangong Intelligent Laboratory”, an investment in artificial intelligence as the “root” technology, aimed at winning from the starting line. With the help of artificial intelligence, Sogou claims a speech recognition accuracy above 97% and a recognition speed of up to 400 words per minute. For voice-based correction, it says it supports hundreds of error correction operations such as replacement, insertion and deletion, with a correction success rate above 90%, which it describes as industry-leading.



The cloud decides what voice can do

Intelligent technology ensures that people and machines can talk and interact smoothly. The ability to integrate services in the cloud determines what the machine can do when it understands what people say.


Siri’s disadvantage compared with Google Now is that Google Now can answer, through search, many of the questions Siri cannot. The Amazon Echo is valuable largely because it can place orders on the Amazon website. Both reflect the ability to integrate cloud services. Aware of this, Apple decided at this year’s WWDC to open Siri to developers and let them enrich Siri’s services.

In cloud services, Sogou has Sogou Map, Sogou Search, Sogou Number, Sogou Ask, Sogou Encyclopedia and other products. In the future, Sogou Voice can also integrate services from partners such as JD, Zhihu and Tencent, including QQ Music, Tencent Video and JD Shopping, and even let users like their WeChat friends’ posts by voice. Relatively speaking, Internet giants are stronger at cloud service integration, and search engines stand out for their ability to aggregate content and connect services.



Integrating “online services” alone is not enough. If Internet giants want to succeed in voice, the next step is to integrate with physical services. For example, users could order food, open an access-controlled door or open a window by voice. There are two routes to integration: one is to connect a mobile app to various services; the other is to build speech into all kinds of hardware, such as robots and cars. In short, by integrating services in the physical world, voice will not just answer users’ questions but help them complete tasks, becoming an all-purpose assistant.

After the artificial intelligence boom set off by AlphaGo, the voice market has gained a new development opportunity. Voice is overturning the way humans and machines interact. Chinese technology giants, represented by Baidu and Sogou, are seeking breakthroughs in input, cars and other scenarios, and are investing in both intelligent technology and cloud services. But the tech giants still have a long way to go before microphones replace keyboards.