Hello everyone, I'm Pizi Heng (痞子衡), a ruffian who is serious about technology. This article covers how speech recognition is implemented in pzh-py-speech.
Speech recognition is the core feature of pzh-py-speech, which builds on the SpeechRecognition system and the CMU Sphinx engine. Here's how speech recognition is implemented in pzh-py-speech.
1. Introduction to the SpeechRecognition system
SpeechRecognition is a Python speech recognition library designed by Anthony Zhang (Uberi). It has been continuously updated since 2014. pzh-py-speech uses SpeechRecognition 3.8.1. The official pages for SpeechRecognition are:
- SpeechRecognition official home: github.com/Uberi/speec…
- SpeechRecognition installation: pypi.org/project/Spe…
SpeechRecognition has no recognition capability of its own; it relies on third-party speech recognition engines to do the actual work. It supports the following 8 engines:
- CMU Sphinx (works offline)
- Google Speech Recognition
- Google Cloud Speech API
- Wit.ai
- Microsoft Bing Voice Recognition
- Houndify API
- IBM Speech to Text
- Snowboy Hotword Detection (works offline)
The SpeechRecognition interface is the same regardless of which engine you use. Let's take audio_transcribe.py as an example of how to convert an audio file to text. An excerpt from audio_transcribe.py follows:
import speech_recognition as sr

# Specify the audio source file to convert (english.wav)
from os import path
AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "english.wav")

# Create a Recognizer object and read the data from the audio source file (english.wav)
r = sr.Recognizer()
with sr.AudioFile(AUDIO_FILE) as source:
    audio = r.record(source)  # read the entire audio file

# Use the CMU Sphinx engine to recognize the audio
try:
    print("Sphinx thinks you said " + r.recognize_sphinx(audio))
except sr.UnknownValueError:
    print("Sphinx could not understand audio")
except sr.RequestError as e:
    print("Sphinx error; {0}".format(e))

# Use the Microsoft Bing Voice Recognition engine to recognize the audio
BING_KEY = "INSERT BING API KEY HERE"  # Microsoft Bing Voice Recognition API keys are 32-character lowercase hexadecimal strings
try:
    print("Microsoft Bing Voice Recognition thinks you said " + r.recognize_bing(audio, key=BING_KEY))
except sr.UnknownValueError:
    print("Microsoft Bing Voice Recognition could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Microsoft Bing Voice Recognition service; {0}".format(e))

# Use other engines to recognize the audio
# ...
Does SpeechRecognition look easy to use? It is, and that is the strength of the SpeechRecognition system. For more examples, see github.com/Uberi/speec…
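Because every engine is reached through a method of the same Recognizer object, picking an engine by name can be reduced to a lookup. The following is a minimal sketch of that idea, not code from the library's examples: the ENGINE_METHODS table and the transcribe helper are hypothetical names, while the method names in the table are the recognize_* methods that Recognizer provides in SpeechRecognition 3.8.1. Snowboy is left out because it detects hotwords rather than transcribing audio.

```python
# Hypothetical lookup table: engine name -> name of the Recognizer method.
ENGINE_METHODS = {
    "CMU Sphinx": "recognize_sphinx",
    "Google Speech Recognition": "recognize_google",
    "Google Cloud Speech API": "recognize_google_cloud",
    "Wit.ai": "recognize_wit",
    "Microsoft Bing Voice Recognition": "recognize_bing",
    "Houndify API": "recognize_houndify",
    "IBM Speech to Text": "recognize_ibm",
}


def transcribe(recognizer, audio, engine, **engine_kwargs):
    """Call the recognize_* method that matches `engine` on `recognizer`."""
    try:
        method_name = ENGINE_METHODS[engine]
    except KeyError:
        raise ValueError("unsupported engine: {0}".format(engine))
    # Engine-specific options (API keys, language, ...) are passed through unchanged.
    return getattr(recognizer, method_name)(audio, **engine_kwargs)
```

With the objects from the excerpt above, this would be called as transcribe(r, audio, "CMU Sphinx") or transcribe(r, audio, "Microsoft Bing Voice Recognition", key=BING_KEY).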
1.1 Choosing the CMU Sphinx engine
SpeechRecognition has no recognition capability of its own, so a speech recognition engine has to be installed alongside it. This project uses CMU Sphinx, an open source speech recognition engine developed by Carnegie Mellon University that works offline and supports multiple languages (English, Chinese, French, etc.). The official pages of the CMU Sphinx engine are:
- CMU Sphinx official homepage: cmusphinx.github
- CMU Sphinx official download: sourceforge.net/projects/cm…
Since JaysPySPEECH is developed in Python, we cannot use CMU Sphinx directly. What should we do then? Dmitry Prazdnichnov has written a Python wrapper for CMU Sphinx, PocketSphinx.
- PocketSphinx: github.com/bambocher/p…
- PocketSphinx Installation method: pypi.org/project/poc…
We already installed SpeechRecognition and PocketSphinx in the environment setup article, the first in the JaysPySPEECH series. After installation, the speech_recognition and pocketsphinx packages can be found under C:\tools_mcu\Python27\Lib\site-packages.
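To confirm where the two packages ended up, the install location can also be looked up from Python. This is a minimal sketch that assumes Python 3, unlike the Python 2.7 setup used in this article, because it relies on importlib.util; package_dir is a hypothetical helper name.

```python
import importlib.util
import os


def package_dir(name):
    """Return the folder that holds an installed top-level package or module, or None."""
    spec = importlib.util.find_spec(name)
    # Built-in, frozen and namespace packages have no regular file on disk.
    if spec is None or not spec.has_location or spec.origin is None:
        return None
    return os.path.dirname(os.path.abspath(spec.origin))
```

Calling package_dir("speech_recognition") and package_dir("pocketsphinx") should then point into the site-packages folder, or return None if a package is not installed.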
1.2 Adding a Chinese language pack to the PocketSphinx engine
By default, PocketSphinx supports only US English: only the en-US folder exists under C:\tools_mcu\Python27\Lib\site-packages\speech_recognition\pocketsphinx-data. Take a look at what's in this folder:
\pocketsphinx-data\en-US
    \acoustic-model                     -- acoustic model
        \feat.params                    -- HMM model feature parameters
        \mdef                           -- model definition file
        \means                          -- means of the Gaussian mixture model
        \mixture_weights                -- mixture weights
        \noisedict                      -- noise (non-speech) dictionary
        \sendump                        -- mixture weights obtained from the acoustic model
        \transition_matrices            -- HMM model state transition matrices
        \variances                      -- variances of the Gaussian mixture model
    \language-model.lm.bin              -- language model
    \pronounciation-dictionary.dict     -- pronunciation dictionary
Does this pile of files look confusing? It reflects how the CMU Sphinx engine performs speech recognition. We won't go deep into that here; for an application that only calls the API, it is enough to know how to add other language packs (such as Chinese) to CMU Sphinx. Adding a language starts with the language pack data. The CMU Sphinx homepage offers 12 mainstream language packs for download at sourceforge.net/projects/cm… Since JaysPySPEECH needs to recognize Chinese, we download the following three files from the Mandarin folder:
\Mandarin
    \zh_broadcastnews_16k_ptm256_8000.tar.bz2   -- acoustic model
    \zh_broadcastnews_64000_utf8.DMP            -- language model
    \zh_broadcastnews_utf8.dic                  -- pinyin dictionary
Now that we have the Chinese language pack data, we follow the method described in "Notes on using PocketSphinx" and do the following:
- Create a zh-CN folder under \speech_recognition\pocketsphinx-data
- Unpack zh_broadcastnews_16k_ptm256_8000.tar.bz2 and put all of its files in the \zh-CN\acoustic-model folder
- Rename zh_broadcastnews_utf8.dic to pronounciation-dictionary.dict and put it in the \zh-CN folder
- Use SphinxBase to convert zh_broadcastnews_64000_utf8.DMP to language-model.lm.bin and put it in the \zh-CN folder
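The first three steps can also be scripted. The following is a minimal sketch under stated assumptions: it targets Python 3, and it assumes that every file in the archive belongs directly in acoustic-model, so any folder nesting inside the archive is dropped. install_language_files and its parameters are hypothetical names, not part of SpeechRecognition. The language model conversion in the last step is not covered, so the function reports that file as still missing.

```python
import os
import shutil
import tarfile

# Entries a language folder needs, following the en-US layout.
REQUIRED_ENTRIES = (
    "acoustic-model",
    "language-model.lm.bin",
    "pronounciation-dictionary.dict",
)


def install_language_files(data_dir, language, acoustic_archive, dictionary_file):
    """Perform steps 1-3 and return the required entries that are still missing."""
    language_dir = os.path.join(data_dir, language)
    acoustic_dir = os.path.join(language_dir, "acoustic-model")
    os.makedirs(acoustic_dir, exist_ok=True)  # step 1: create the folders

    # Step 2: unpack the .tar.bz2 archive, flattening any folders inside it.
    with tarfile.open(acoustic_archive, "r:bz2") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue
            target = os.path.join(acoustic_dir, os.path.basename(member.name))
            with archive.extractfile(member) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)

    # Step 3: copy the dictionary under the name the data folder expects.
    shutil.copyfile(
        dictionary_file,
        os.path.join(language_dir, "pronounciation-dictionary.dict"),
    )

    return [
        entry
        for entry in REQUIRED_ENTRIES
        if not os.path.exists(os.path.join(language_dir, entry))
    ]
```

Right after running it, the returned list is expected to contain only language-model.lm.bin, which is produced by the conversion step.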
For the SphinxBase tool mentioned in step 4, first download the SphinxBase source from github.com/cmusphinx/s…, open its Visual Studio solution (.sln) and run Rebuild All. The following tools are then generated under \sphinxbase\bin\Release\x64:
\sphinxbase\bin\Release\x64
    \sphinx_cepview.exe
    \sphinx_fe.exe
    \sphinx_jsgf2fsg.exe
    \sphinx_lm_convert.exe
    \sphinx_pitch.exe
    \sphinx_seg.exe
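We mainly use sphinx_lm_convert.exe to do the conversion and generate language-model.lm.bin. It takes two passes: the DMP file is first converted to ARPA format, and the ARPA file is then converted to the binary format. The commands are as follows: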
PS C:\tools_mcu\sphinxbase\bin\Release\x64> .\sphinx_lm_convert.exe -i .\zh_broadcastnews_64000_utf8.DMP -o language-model.lm -ofmt arpa
Current configuration:
[NAME]      [DEFLT]     [VALUE]
-case
-help       no          no
-i                      .\zh_broadcastnews_64000_utf8.DMP
-ifmt
-logbase    1.0001      1.000100e+00
-mmap       no          no
-o                      language-model.lm
-ofmt                   arpa

INFO: ngram_model_trie.c(354): Trying to read LM in trie binary format
INFO: ngram_model_trie.c(365): Header doesn't match
INFO: ngram_model_trie.c(177): Trying to read LM in arpa format
INFO: ngram_model_trie.c(70): No \data\ mark in LM file
INFO: ngram_model_trie.c(445): Trying to read LM in dmp format
INFO: ngram_model_trie.c(527): ngrams 1=63944, 2=16600781, 3=20708460
INFO: lm_trie.c(474): Training quantizer
INFO: lm_trie.c(482): Building LM trie
PS C:\tools_mcu\sphinxbase\bin\Release\x64> .\sphinx_lm_convert.exe -i .\language-model.lm -o language-model.lm.bin
Current configuration:
[NAME]      [DEFLT]     [VALUE]
-case
-help       no          no
-i                      .\language-model.lm
-ifmt
-logbase    1.0001      1.000100e+00
-mmap       no          no
-o                      language-model.lm.bin
-ofmt

INFO: ngram_model_trie.c(354): Trying to read LM in trie binary format
INFO: ngram_model_trie.c(365): Header doesn't match
INFO: ngram_model_trie.c(177): Trying to read LM in arpa format
INFO: ngram_model_trie.c(193): LM of order 3
INFO: ngram_model_trie.c(195): #1-grams: 63944
INFO: ngram_model_trie.c(195): #2-grams: 16600781
INFO: ngram_model_trie.c(195): #3-grams: 20708460
INFO: lm_trie.c(474): Training quantizer
INFO: lm_trie.c(482): Building LM trie
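The two passes shown in the commands above can be written down as argument lists and run from a script. This is only a sketch: build_convert_commands is a hypothetical helper, and it uses nothing beyond the -i, -o and -ofmt options that appear in the commands above.

```python
import os


def build_convert_commands(converter, dmp_file, output_dir):
    """Return the two sphinx_lm_convert invocations as argument lists."""
    arpa_file = os.path.join(output_dir, "language-model.lm")
    bin_file = os.path.join(output_dir, "language-model.lm.bin")
    return [
        # Pass 1: DMP -> ARPA text format.
        [converter, "-i", dmp_file, "-o", arpa_file, "-ofmt", "arpa"],
        # Pass 2: ARPA -> binary format, with no -ofmt given, as in the second command.
        [converter, "-i", arpa_file, "-o", bin_file],
    ]
```

Each list can be passed to subprocess.check_call in order, for example with converter set to the full path of sphinx_lm_convert.exe.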
2. Speech recognition implementation in pzh-py-speech
The speech recognition code is simple: it calls the speech_recognition API directly. So far only the CMU Sphinx engine is implemented, and only Chinese and English are recognized. The implementation lives in the callback of the “ASR” button on the GUI, audioSpeechRecognition(). After the user selects the configuration parameters (language type, ASR engine type) and clicks the “ASR” button, audioSpeechRecognition() is executed. The code is as follows:
import os
import speech_recognition

class mainWin(win.speech_win):

    def getLanguageSelection(self):
        languageType = self.m_choice_lang.GetString(self.m_choice_lang.GetSelection())
        if languageType == 'Mandarin Chinese':
            languageType = 'zh-CN'
            languageName = 'Chinese'
        else: # languageType == 'US English':
            languageType = 'en-US'
            languageName = 'English'
        return languageType, languageName

    def audioSpeechRecognition(self, event):
        if os.path.isfile(self.wavPath):
            # Create the speech_recognition recognizer object asrObj
            asrObj = speech_recognition.Recognizer()
            # Read the speech content from the wav file
            with speech_recognition.AudioFile(self.wavPath) as source:
                speechAudio = asrObj.record(source)
            self.m_textCtrl_asrttsText.Clear()
            # Get the speech language type (English/Chinese)
            languageType, languageName = self.getLanguageSelection()
            engineType = self.m_choice_asrEngine.GetString(self.m_choice_asrEngine.GetSelection())
            if engineType == 'CMU Sphinx':
                try:
                    # Call recognize_sphinx to perform speech recognition
                    speechText = asrObj.recognize_sphinx(speechAudio, language=languageType)
                    # Show the recognition result in the asrttsText text box
                    self.m_textCtrl_asrttsText.write(speechText)
                    self.statusBar.SetStatusText("ASR Conversation Info: Successfully")
                    # Write the recognition result to the specified file
                    fileName = self.m_textCtrl_asrFileName.GetLineText(0)
                    if fileName == '':
                        fileName = 'asr_untitled1.txt'
                    asrFilePath = os.path.join(os.path.dirname(os.path.abspath(os.path.dirname(__file__))), 'conv', 'asr', fileName)
                    asrFileObj = open(asrFilePath, 'wb')
                    asrFileObj.write(speechText)
                    asrFileObj.close()
                except speech_recognition.UnknownValueError:
                    self.statusBar.SetStatusText("ASR Conversation Info: Sphinx could not understand audio")
                except speech_recognition.RequestError as e:
                    self.statusBar.SetStatusText("ASR Conversation Info: Sphinx error; {0}".format(e))
            else:
                self.statusBar.SetStatusText("ASR Conversation Info: Unavailable ASR Engine")
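Stripped of the GUI, the core of the callback is short. The following is a minimal sketch rather than the tool's actual code: LANGUAGE_CODES and recognize_to_file are hypothetical names, Python 3 is assumed, and the result is written as UTF-8 text so that Chinese output is stored correctly. The only library call used is recognize_sphinx with its language argument, as in the callback above.

```python
import io

# Hypothetical table mirroring getLanguageSelection(): GUI label -> language code.
LANGUAGE_CODES = {
    "Mandarin Chinese": "zh-CN",
    "US English": "en-US",
}


def recognize_to_file(recognizer, audio, language_label, out_path):
    """Recognize `audio` with Sphinx and save the text to `out_path` as UTF-8."""
    # Unknown labels fall back to US English, like the else branch of getLanguageSelection().
    language = LANGUAGE_CODES.get(language_label, "en-US")
    text = recognizer.recognize_sphinx(audio, language=language)
    with io.open(out_path, "w", encoding="utf-8") as out_file:
        out_file.write(text)
    return text
```

Exceptions such as UnknownValueError and RequestError are deliberately left to the caller here, which is where the callback reports them in the status bar.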
That wraps up the speech recognition implementation of the speech processing tool pzh-py-speech.
Welcome to subscribe
This article is published simultaneously on my Cnblogs homepage, my CSDN homepage and my WeChat official account.
Search WeChat for "痞子衡嵌入式" (Pizi Heng Embedded) to read new articles on your phone as soon as they come out.