With the rapid development of mobile smart devices and cloud computing, the tide of artificial intelligence is quietly transforming every part of our lives. The Voice User Interface (VUI), a new field, is also developing rapidly, and it places new demands on user experience design in areas such as linguistics, emotional design, and dialogue logic. Think about how VUI technology has changed things: I’m lying on the couch with both hands busy playing a game, and using only my voice I can control the air conditioner, order a takeaway, and be eating it within an hour or so. It’s a great experience!

01

The development of the VUI

So why add a new mode of interaction when the Graphical User Interface (GUI) already exists? The biggest difference between the two is the input method. The most notable feature of VUI is that it “frees up the hands”: we use natural language to get the information we care about, while our eyes and hands stay free for other things.

1. The first period of VUI

In the 1990s, the first viable speaker-independent speech recognition system (one that anyone could speak to) was born, and the emergence of Interactive Voice Response (IVR) systems marked the first significant period of VUI [1]. People interacted with these systems and completed tasks over the telephone line, such as booking airline tickets, making bank transfers, and making business inquiries. You have probably called 12306 to book train tickets: we enter numeric commands on the keypad and interact with the system by voice. The main features of IVR are as follows:

  • Advantages: good at recognizing and reading back long strings of characters.

  • Disadvantages: users rarely have a chance to interrupt the system; the system leads the conversation.

We enter our ID number and other details so that the system can verify our identity and recognize the command. The system then reads out a long list of options, such as “1 Beijing, 2 Tianjin, 3 Shandong”, for us to choose from. Recall that process: we have to interact with the system continuously, and if anything goes wrong along the way, we can only hang up and start again. As a result, the whole interaction tends to leave users feeling cautious and awkward.
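The system-led flow described above can be sketched as a small simulation. This is only an illustrative model, not how 12306 or any real IVR system is implemented: the function names, the two-step flow, and the rule that an ID is any string of digits are all assumptions made for the example.

```python
from typing import Iterable, Optional

# Hypothetical menu; the option numbers mirror the example in the text.
DESTINATIONS = {"1": "Beijing", "2": "Tianjin", "3": "Shandong"}


def menu_prompt(options: dict[str, str]) -> str:
    """Build the long prompt the system reads out before the caller may answer."""
    return ", ".join(f"press {key} for {name}" for key, name in options.items())


def run_call(keypresses: Iterable[str]) -> Optional[dict[str, str]]:
    """Simulate one call. Returns the booking, or None if the call failed."""
    keys = iter(keypresses)

    # Step 1: the system asks for an ID number (assumed: any string of digits).
    id_number = next(keys, None)
    if id_number is None or not id_number.isdigit():
        return None  # no way to go back: the caller must hang up and redial

    # Step 2: the system reads out the whole menu, then waits for one key.
    choice = next(keys, None)
    if choice not in DESTINATIONS:
        return None  # a single wrong key also ends the call

    return {"id": id_number, "destination": DESTINATIONS[choice]}
```

In this sketch, `run_call(["110101", "2"])` returns a booking for Tianjin, while `run_call(["110101", "9"])` returns `None`: there is no branch for correcting a mistake, which is the "hang up and start again" behavior that makes the interaction feel so unforgiving.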

2. The second period of VUI

We are now in the early stage of the second period. Apps that combine visual and voice information, such as Siri and Google, and voice-only products, such as Amazon Echo, have gradually developed and become mainstream [1]. Thanks to advances in speech recognition, AI, and Internet technology, we can already use voice to handle many tasks on mobile devices. Many things still cannot be completed by voice, however, so there is more to explore.

Photo: Google Voice APP

  

Photo: Echo voice assistant product

02

Advantages and disadvantages of VUI compared to GUI

Using the GUI design principles that the TXD team has distilled so far as the benchmark, we compare the advantages and disadvantages of VUI against GUI, principle by principle.

Figure: TXD design principle

The main advantages are:

The main disadvantages are:

From this comparison, we find that GUI has the edge in clarity, efficiency, and universality, which are exactly the keys to obtaining information. It can help users accurately, and it is highly extensible and widely applicable. Compared with obtaining information one point at a time through “ask and answer”, GUI is more efficient. VUI, on the other hand, is the most natural and friendly form of interaction that design can pursue. It is an “interactive experience with emotion and warmth” that truly starts from the user’s perspective. In my personal view, at the current stage of technological development VUI plays more of a supporting role, and it will not completely replace GUI anytime soon.

Figure: Each interaction has its own strengths

03

Main application scenarios of voice interaction products

Users have different needs in different scenarios, so voice interaction design has to tailor both functions and dialogue to the specific scenario. The main application scenarios are smart home, driving, enterprise applications, medical care, and education. This article briefly analyzes voice interaction products in these five scenarios, with examples.

1. Smart home

Photo: Product launch of smart home voice assistant

In the first quarter of this year, Amazon Echo accounted for 70.6% of the US voice assistant market and Google Home for 23.8%. Other vendors (including Apple, Lenovo, LG, Harman Kardon, and Mattel) shared the remaining 5.6%, which shows the competitive pressure they face.

2. Car driving

3. Enterprise applications

4. Medical care

5. Education

        

In each scenario there is a rich choice of mobile apps, and in daily use we often need to open several of them to compare results. For example, to find an affordable restaurant with good food, we may need to search across several word-of-mouth and review apps. Just imagine: if intelligent voice matching technology were mature, we would only need to state our request by voice once, and the system could automatically recommend results that match our personal preferences. That is a natural, fast, and smooth user experience.

Figure: Mobile phone APPS in various scenarios

“When voice interaction, intelligent matching and personalized recommendation become the mainstream interaction methods of the new generation of users, the services built as independent apps will face a huge impact.

While bringing convenience to people, composite experience is gradually disintegrating the sensory stimulation brought by a single medium (print, screen, sound). [1]”

Reference Books:

[1] Cathy Pearl, Voice User Interface Design, translated by Wang Hanxing.
