First, you need to understand two concepts: speech recognition 🦄 and speech synthesis 🐲.

Speech recognition: Speech recognition, also known as Automatic Speech Recognition (ASR), computer speech recognition, or Speech To Text (STT), aims to have a computer automatically convert the content of human speech into the corresponding text. It differs from speaker recognition and speaker verification, which try to identify or confirm who is speaking rather than the lexical content of the speech.

Speech synthesis: Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and it can be implemented in software or hardware. A text-to-speech (TTS) system converts ordinary language text into speech, while other systems render symbolic linguistic representations, such as phonetic transcriptions, into speech. Synthesized speech can be created by concatenating pieces of recorded speech stored in a database. Systems differ in the size of the stored speech units. A system that stores phones and diphones needs a large amount of storage space, and its output may still lack clarity. For specific domains, storing entire words or sentences allows high-quality speech output. Alternatively, a synthesizer can incorporate a model of the vocal tract and other characteristics of the human voice to create fully synthetic speech output.

(From Wikipedia)

Basically, speech recognition turns speech into text, and speech synthesis turns text into speech.

1. Speech recognition

Speech recognition uses the SpeechRecognition interface. Browser support is still incomplete, and the constructor has to be accessed with the webkit prefix (webkitSpeechRecognition).

The support data on Can I use has not been updated; please check this issue for details.

Usage:

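// The constructor is only available with the webkit prefix for now.
const newRecognition = new window.webkitSpeechRecognition();
// Keep listening after the first result instead of stopping.
newRecognition.continuous = true;
// Register the handler before starting so that no result is missed.
newRecognition.onresult = function (event) {
  // In continuous mode the results accumulate, so read the newest one.
  const latest = event.results[event.results.length - 1];
  console.log(latest[0].transcript, 'onresult');
};
newRecognition.start();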
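When recognition runs continuously, each result event carries a growing list of results, and only some of them are final. The sketch below is one possible way to pull out just the finalized text. The helper name `collectFinalTranscripts` is made up for this example, and the browser wiring at the bottom assumes the webkit-prefixed constructor is present.

```javascript
// Collect the text of every finalized result in a recognition result event.
// `event.resultIndex` is the index of the first result that changed, and
// `isFinal` marks results the recognizer will no longer revise.
// Note: `collectFinalTranscripts` is a hypothetical helper, not part of the API.
function collectFinalTranscripts(event) {
  const transcripts = [];
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    if (result.isFinal) {
      // result[0] is the first alternative for this result.
      transcripts.push(result[0].transcript.trim());
    }
  }
  return transcripts;
}

// Browser-only wiring; skipped where the prefixed constructor is unavailable.
if (typeof window !== 'undefined' && 'webkitSpeechRecognition' in window) {
  const recognition = new window.webkitSpeechRecognition();
  recognition.continuous = true;
  recognition.interimResults = true; // also deliver results that may still change
  recognition.onresult = (event) => {
    console.log(collectFinalTranscripts(event), 'final results');
  };
  recognition.onerror = (event) => console.error(event.error);
  recognition.start();
}
```

Keeping the extraction in a plain function makes it easy to try with a hand-built event object, without a microphone.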

2. Speech synthesis

Speech synthesis uses the SpeechSynthesisUtterance interface together with window.speechSynthesis.

Current levels of support are as follows

Note that because this feature was being abused, the browser now requires a prior user action (such as a click) before this API can be called; you can click here to see why.

Usage:

const utterThis = new window.SpeechSynthesisUtterance('ray monkey ah friends');
window.speechSynthesis.speak(utterThis);
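Since speaking has to follow a user action, one reasonable pattern is to call the API from a click handler. The following is a minimal sketch under that assumption. The `speak` helper, its `env` parameter, and the `#speak` button id are all invented for illustration; `lang` and `rate` are standard properties of an utterance.

```javascript
// Build an utterance and pass it to the synthesizer.
// `env` defaults to the global object (window in a browser) and can be
// replaced with a stand-in object. `speak` itself is a hypothetical helper.
function speak(text, { lang = 'en-US', rate = 1 } = {}, env = globalThis) {
  const utterance = new env.SpeechSynthesisUtterance(text);
  utterance.lang = lang; // language tag of the text being spoken
  utterance.rate = rate; // speaking speed, where 1 is the default
  env.speechSynthesis.speak(utterance);
  return utterance;
}

// Browser-only wiring: speak only after the user clicks.
if (typeof document !== 'undefined') {
  const button = document.querySelector('#speak'); // hypothetical button id
  if (button) {
    button.addEventListener('click', () => speak('Hello, friends'));
  }
}
```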

3. Voice assistant

Combine these two functions and you can make a voice assistant.
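As a rough sketch of how the two pieces might fit together: recognized text goes to some reply source, and the reply is spoken back. Everything named here (`createAssistant`, `getReply`, `say`, the `#start` button id) is an assumption made for this example, and the reply source is a placeholder that only echoes the question rather than calling a real chat service.

```javascript
// Connect recognition -> reply -> synthesis.
// All three collaborators are passed in, so each can be swapped out.
// `createAssistant`, `getReply`, and `say` are hypothetical names.
function createAssistant({ recognition, getReply, say }) {
  recognition.onresult = async (event) => {
    const latest = event.results[event.results.length - 1];
    if (!latest.isFinal) return; // wait until the recognizer settles
    const question = latest[0].transcript.trim();
    if (!question) return;
    try {
      say(await getReply(question));
    } catch (err) {
      console.error('Could not get a reply:', err);
    }
  };
  return {
    start: () => recognition.start(),
    stop: () => recognition.stop(),
  };
}

// Browser-only wiring, started from a click because of the user-action rule.
if (typeof window !== 'undefined' && 'webkitSpeechRecognition' in window) {
  const recognition = new window.webkitSpeechRecognition();
  recognition.continuous = true;
  const assistant = createAssistant({
    recognition,
    // Placeholder reply source: echoes the question back.
    getReply: async (question) => `You said: ${question}`,
    say: (text) =>
      window.speechSynthesis.speak(new window.SpeechSynthesisUtterance(text)),
  });
  const startButton = document.querySelector('#start'); // hypothetical button id
  if (startButton) startButton.addEventListener('click', assistant.start);
}
```

A real page would probably also pause recognition while the reply is being spoken, so the assistant does not pick up its own voice.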

I built an online demo on top of the qingyunke API; please click 👇 here to view it.

The GitHub address is github.com/suedar/talk

=- The End -=