Speech recognition in a WeChat mini program: how many pitfalls are there to fill?

Not long ago I wrote a WeChat utility mini program (Find around) that uses speech recognition. The implementation details are summarized below.

Interface preview

After reading the iFLYTEK interface documentation and the mini program development documentation, and learning the ThinkPHP back-end framework, I arrived at the following development steps:

1. Register with iFLYTEK (the pride of the Chinese people, with world-leading speech recognition technology).
2. Open the AIUI open platform, create an application, and note the APPID and ApiKey shown in application management.
3. Open the application configuration and choose the scenario mode, recognition mode, and skills that fit your use case.
4. In the mini program, record the audio to be recognized (more on this below).
5. On the back end, transcode the recorded audio (iFLYTEK supports PCM and WAV) and submit it to the recognition interface (detailed below).
6. In the mini program, receive the recognition result and carry on with your business logic.

Audio recording interface

wx.startRecord() and wx.stopRecord()

wx.startRecord() and wx.stopRecord() can also do the job, but the WeChat team stopped maintaining them as of base library version 1.6.0 and recommends the more capable wx.getRecorderManager() instead. The audio these interfaces produce is in Silk format. The file I got from the WeChat simulator, however, was base64-encoded WebM, so after decoding it the WebM still had to be converted to PCM or WAV.

wx.getRecorderManager()

Compared with wx.startRecord(), this interface is more capable (see the official documentation for details): it can pause and resume recording, and it lets you set the bit rate, number of channels, and sampling rate to suit your needs. Best of all, you can specify the audio format; the valid values are aac and mp3. Unfortunately, wx.getRecorderManager() is only supported from base library 1.6.0 onward, so to support users on older WeChat versions you still need wx.startRecord() as a fallback.
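The fallback described above can be sketched roughly as follows. This is a minimal sketch, not the code used in the mini program: createRecorder and onFile are hypothetical names, the wx object is passed in as wxApi so the function can be tried outside WeChat, and the sampleRate, numberOfChannels, and format values are assumptions chosen for illustration.

```javascript
// Sketch: use wx.getRecorderManager() when it exists, otherwise fall back
// to the legacy wx.startRecord()/wx.stopRecord() pair.
// `wxApi` is the global `wx` object in a real mini program (assumption:
// it is injected here only to make the function testable).
function createRecorder(wxApi, onFile) {
  if (typeof wxApi.getRecorderManager === 'function') {
    // Base library 1.6.0 and later
    const manager = wxApi.getRecorderManager()
    manager.onStop((res) => onFile(res.tempFilePath))
    return {
      api: 'getRecorderManager',
      // Option values below are illustrative assumptions
      start: () => manager.start({
        duration: 10000,
        sampleRate: 16000,
        numberOfChannels: 1,
        format: 'mp3'
      }),
      stop: () => manager.stop()
    }
  }
  // Older clients: the success callback fires once recording has stopped
  return {
    api: 'startRecord',
    start: () => wxApi.startRecord({
      success: (res) => onFile(res.tempFilePath)
    }),
    stop: () => wxApi.stopRecord()
  }
}
```

In a page you would call createRecorder(wx, uploadFn) once and wire start and stop to the touch handlers, so the rest of the page does not need to know which API is in use.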

Event Listening Details

// wxjs:
const app = getApp() // app instance, used below for the back-end host URL
const recorderManager = wx.getRecorderManager()

recorderManager.onStart(() => {
  // callback fired when recording starts
})

// callback fired when recording stops
recorderManager.onStop((res) => {
  const { tempFilePath } = res
  // upload the recorded file to the back end
  wx.uploadFile({
    url: app.d.osturl + '/Api/Index/wxupload',
    filePath: tempFilePath,
    name: 'viceo',
    success: function (res) {
      console.log(res)
    }
  })
})

Page({
  // press the button: start recording
  startHandel: function () {
    console.log("Start")
    recorderManager.start({ duration: 10000 })
  },
  // release the button: stop recording
  endHandle: function () {
    console.log("The end")
    recorderManager.stop()
  }
})

// wxml:
<view bindtouchstart='startHandel' bindtouchend='endHandle' class="tapview">
    <text>{{text}}</text>
</view>

Audio conversion

My back end uses ThinkPHP, an open source PHP framework. Node, Java, Python, or any other back-end language works just as well, depending on your preference and skills. To transcode audio properly we need an audio/video transcoding tool such as FFmpeg or avconv, both of which depend on GCC. You can look up the installation process on Baidu, or see my later articles.

<?php
namespace Api\Controller;
use Think\Controller;

class IndexController extends Controller {
    // Audio upload and transcoding
    public function wxupload() {
        $upload_res = $_FILES['viceo'];
        $tempfile = file_get_contents($upload_res['tmp_name']);
        // basename() keeps a crafted upload name from escaping the Aduio/ directory
        $name = basename($upload_res['name']);
        $wavname = pathinfo($name, PATHINFO_FILENAME) . ".wav";
        $arr = explode(",", $tempfile);
        $path = 'Aduio/' . $name;

        if (strpos($tempfile, 'base64') !== false && isset($arr[1])) {
            // Audio recorded in the WeChat simulator is base64 text: decode it and store it directly
            file_put_contents($path, base64_decode($arr[1]));
            $data['path'] = $path;
            $this->apiResponse("success", "Transcoding successful!", $data);
        } else {
            // Audio recorded on a phone: save it, then convert it to WAV
            $newpath = 'Aduio/' . $wavname;
            file_put_contents($path, $tempfile);
            chmod($path, 0777);
            $root = '/home/wwwroot/mapxcx.kanziqiang.top/';
            // escapeshellarg() keeps the file name from being interpreted by the shell
            $exec1 = "avconv -i " . escapeshellarg($root . $path) . " -vn -f wav " . escapeshellarg($root . $newpath);
            exec($exec1, $info, $status);
            if (!empty($tempfile) && $status == 0) {
                chmod($newpath, 0777);
                $data['path'] = $newpath;
                $this->apiResponse("success", "Transcoding successful!", $data);
            }
        }
        $this->apiResponse("error", "Unknown error occurred!");
    }

    // Helper that prints a JSON response and stops the request
    protected function apiResponse($flag = 'error', $message = '', $data = array()) {
        $result = array('flag' => $flag, 'message' => $message, 'data' => $data);
        print json_encode($result);
        exit;
    }
}
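Since Node is one of the back-end options mentioned earlier, the same upload handling can be sketched there too. This is a minimal sketch under stated assumptions: decodeUpload and transcodeArgs are hypothetical helper names, the base64 marker is only looked for in the first 64 bytes, and the optional -ar/-ac flags (resampling to mono at a given rate) are an addition that is not in the avconv command above.

```javascript
const { posix: path } = require('path')

// Decide whether an upload is base64 text (simulator) or raw audio (phone).
function decodeUpload(buffer) {
  // Only inspect the start of the file, where a "data:...;base64," prefix would be
  const header = buffer.subarray(0, 64).toString('latin1')
  const marker = header.indexOf('base64,')
  if (marker !== -1) {
    const payload = buffer.subarray(marker + 'base64,'.length).toString('latin1')
    return { source: 'simulator', audio: Buffer.from(payload, 'base64') }
  }
  return { source: 'phone', audio: buffer }
}

// Build the avconv/ffmpeg argument list for converting a file to WAV.
// Returning an array (instead of one command string) means the file name
// never passes through a shell.
function transcodeArgs(inputPath, sampleRate) {
  const parsed = path.parse(inputPath)
  const outputPath = path.join(parsed.dir, parsed.name + '.wav')
  const args = ['-i', inputPath, '-vn']
  if (sampleRate) {
    // Assumption: resample to mono at the given rate, e.g. 16000
    args.push('-ar', String(sampleRate), '-ac', '1')
  }
  args.push('-f', 'wav', outputPath)
  return { outputPath, args }
}
```

The returned args can be handed to child_process.execFile('avconv', args, callback); whether resampling is needed depends on how the audio was recorded and on the audio parameters you send to the recognition interface.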

Calling the recognition interface

Once the file is ready, we can send the Base64-encoded audio to the recognition API. Take care to follow the request format in the documentation exactly, otherwise the results are unpredictable.

<?php
namespace Api\Controller;
use Think\Controller;

class IndexController extends Controller {
    public function _initialize() {}

    // Wrapper around the HTTP request
    public function httpsRequest($url, $data, $xparam) {
        $curl = curl_init();
        curl_setopt($curl, CURLOPT_URL, $url);
        // Note: certificate verification is turned off here
        curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, FALSE);
        curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, FALSE);
        curl_setopt($curl, CURLOPT_HEADER, 0);
        $Appid = "";  // Open platform APPID
        $Appkey = ""; // Open platform ApiKey
        $curtime = time();
        $CheckSum = md5($Appkey . $curtime . $xparam . $data);
        // One array element per request header
        $headers = array(
            'X-Appid:' . $Appid,
            'X-CurTime:' . $curtime,
            'X-CheckSum:' . $CheckSum,
            'X-Param:' . $xparam,
            'Content-Type:' . 'application/x-www-form-urlencoded; charset=utf-8'
        );
        curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
        if (!empty($data)) {
            curl_setopt($curl, CURLOPT_POST, 1);
            curl_setopt($curl, CURLOPT_POSTFIELDS, $data);
        }
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
        $output = curl_exec($curl);
        curl_close($curl);
        return $output;
    }

    // Send the audio to the recognition interface and handle the result
    public function getVoice($path) {
        // Base64-encode the contents of the audio file, not the path string
        $d = base64_encode(file_get_contents($path));
        $url = "https://api.xfyun.cn/v1/aiui/v1/voice_semantic";
        $xparam = base64_encode(json_encode(array('scene' => 'main', 'userid' => 'user_0001', "auf" => "16k", "aue" => "raw", "spx_fsize" => "60")));
        $data = "data=" . $d;
        // The response body is a JSON string, so decode it before reading the code
        $res = json_decode($this->httpsRequest($url, $data, $xparam), true);
        if (!empty($res) && $res['code'] == '00000') {
            $this->apiResponse("success", "Identification successful!", $res);
        } else {
            $this->apiResponse("error", "Identification failed!");
        }
    }

    // Helper that prints a JSON response and stops the request
    protected function apiResponse($flag = 'error', $message = '', $data = array()) {
        $result = array('flag' => $flag, 'message' => $message, 'data' => $data);
        print json_encode($result);
        exit;
    }
}
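The header construction in the PHP above can be sketched in Node as follows. This is a minimal sketch that mirrors the PHP code (checksum = md5 of ApiKey + current time + Base64 parameter string + request body); buildHeaders is a hypothetical helper name, and the exact signing rule should be checked against the iFLYTEK documentation rather than taken from this example.

```javascript
const crypto = require('crypto')

// Build the X-* request headers the same way the PHP code does.
// `params` is the plain object that becomes X-Param; `body` is the exact
// POST body string, because the checksum covers it.
function buildHeaders(appId, apiKey, params, body, curTime) {
  // Default to the current Unix time in seconds, like PHP's time()
  const now = curTime === undefined ? Math.floor(Date.now() / 1000) : curTime
  const xParam = Buffer.from(JSON.stringify(params)).toString('base64')
  const checkSum = crypto
    .createHash('md5')
    .update(apiKey + now + xParam + body)
    .digest('hex')
  return {
    'X-Appid': appId,
    'X-CurTime': String(now),
    'X-CheckSum': checkSum,
    'X-Param': xParam,
    'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8'
  }
}
```

Because the checksum is computed over the body, the string passed as body must be byte-for-byte the one that is actually sent; building the headers first and modifying the body afterwards would invalidate the signature.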

That is more or less it. The code above has been tidied up for this article and may not meet your actual development needs as-is. If you spot a mistake, you are welcome to contact me on WeChat (xiaoqiang0672).

If you want to see the mini program in action, scan the QR code in WeChat.

About GCC Installation

About FFmpeg installation

About FFMPEG/AVCONV installation
