WeChat mini program - Baidu AI speech recognition - (I)
1, Baidu AI
With nothing to do one day, I saw on CSDN that an expert had built a small demo combining Baidu speech recognition with the Turing robot, so you can chat with it (and tease the not-so-intelligent AI). I thought it was great fun and wanted to build it myself.
Baidu AI
Open Baidu AI's official website and you will see that it offers many features.
The demo in the official mini program is also packed with features (my eyes lit up).
speech recognition
2, Start implementing (and start hitting pitfalls)
To be rigorous, I first exercised the API with Postman before writing any code.
1. Interface authentication
The routine is the usual one: send one ACCESS KEY and one ACCESS SECRET directly to
https://openapi.baidu.com/oauth/2.0/token
(I copied the official Postman example directly; I was too lazy to read the documentation)
The response to this request contains the token (it is valid for 2592000 seconds, i.e. 30 days).
To automate testing of the API, I added a small script in Postman that stores the returned token in an environment variable:
pm.test("token", function () {
  var jsonData = pm.response.json();
  // Store the access token so later requests can reference it
  pm.environment.set("TOKEN", jsonData.access_token);
});
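The same authentication step can be sketched in plain JavaScript. This is only a minimal sketch: `buildTokenUrl` and `parseTokenResponse` are hypothetical helper names, and it assumes the response carries the `access_token` and `expires_in` fields that OAuth 2.0 client-credentials responses normally have.

```javascript
// Token endpoint taken from the article.
const TOKEN_URL = "https://openapi.baidu.com/oauth/2.0/token";

// Hypothetical helper: build the token request URL from the two keys.
function buildTokenUrl(apiKey, secretKey) {
  const params = new URLSearchParams({
    grant_type: "client_credentials",
    client_id: apiKey,
    client_secret: secretKey,
  });
  return `${TOKEN_URL}?${params.toString()}`;
}

// Hypothetical helper: pick the token out of the parsed JSON response and
// compute when it expires (assumes access_token / expires_in fields).
function parseTokenResponse(json, nowMs = Date.now()) {
  if (!json.access_token) {
    throw new Error("no access_token in response");
  }
  return {
    token: json.access_token,
    expiresAt: new Date(nowMs + json.expires_in * 1000),
  };
}

// Placeholder keys, for illustration only.
console.log(buildTokenUrl("YOUR_API_KEY", "YOUR_SECRET_KEY"));
// 2592000 seconds is 30 days.
console.log(2592000 / (60 * 60 * 24));
```

Keeping the expiry time next to the token makes it easy to reuse the token for its 30-day lifetime instead of requesting a new one on every call.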
2. Speech recognition interface
Once you have the token, you can call the speech recognition API.
The Baidu AI speech recognition API supports two request modes:
- JSON: the audio data is base64-encoded and placed in the JSON request parameters
- RAW: the audio data is sent as-is in the request body
Personally, I find the first method very convenient, but for a long recording the base64 string becomes very long (about a third larger than the raw bytes), and if it ever ends up in a URL it will run into the URL length limits of different browsers
So I dropped the first method and went with RAW
(to be honest, I had never even heard the term "raw", but I had used the idea before: the data simply travels in the request body)
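To get a feel for how much the base64 route inflates the payload, here is a rough sketch. It assumes 16-bit mono PCM (so one second of 16 kHz audio is 32000 bytes); `base64Length` and `toBase64` are hypothetical helper names.

```javascript
// base64 encodes every 3 input bytes as 4 output characters (with padding).
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

// Hypothetical helper: base64-encode a Uint8Array using the built-in btoa.
function toBase64(bytes) {
  let binary = "";
  for (const b of bytes) {
    binary += String.fromCharCode(b);
  }
  return btoa(binary);
}

// One second of silence: 16000 samples/s * 2 bytes per sample (assumed 16-bit mono).
const oneSecond = new Uint8Array(16000 * 2);
const encoded = toBase64(oneSecond);

console.log("raw bytes:", oneSecond.length);
console.log("base64 characters:", encoded.length);
console.log("predicted:", base64Length(oneSecond.length));
```

The encoded form is roughly 4/3 the size of the raw bytes, which is one more reason to send the audio directly in the request body.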
Here I use audio with a 16 kHz sampling rate throughout. 8 kHz audio has not been tested yet
Set request header:
Content-Type: audio/pcm;rate=16000
Put the official PCM test file into the request body
The request returns the recognition data
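The RAW request set up in Postman above can be described in code roughly like this. It is a sketch only: `buildRawRequest` is a hypothetical helper, and the endpoint, `cuid` and `dev_pid` values are simply the ones used in this article's demo.

```javascript
// Recognition endpoint as used in this article.
const ASR_URL = "http://vop.baidu.com/server_api";

// Hypothetical helper: describe a RAW-mode request without sending it.
// pcmBytes is expected to be an ArrayBuffer or Uint8Array of 16 kHz PCM audio.
function buildRawRequest(token, pcmBytes, cuid = "155236postman", devPid = 1537) {
  const query = new URLSearchParams({
    cuid: cuid,
    dev_pid: String(devPid),
    token: token,
  });
  return {
    url: `${ASR_URL}?${query.toString()}`,
    options: {
      method: "POST",
      // The sampling rate travels in the Content-Type header.
      headers: { "Content-Type": "audio/pcm;rate=16000" },
      // The audio bytes go straight into the body, with no base64 step.
      body: pcmBytes,
    },
  };
}

// Example with placeholder data; pass url and options to fetch() to send it.
const demoRequest = buildRawRequest("YOUR_TOKEN", new Uint8Array([0, 1, 2, 3]));
console.log(demoRequest.url);
console.log(demoRequest.options.headers);
```

Separating "build the request" from "send the request" keeps the parameters easy to inspect, which is handy when comparing against what Postman sends.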
3, Implement the demo (pitfalls... 🕳)
I want to build a simple demo in the browser first
Enough talk, here is the code!
<body>
  <input type="file" name="audio" id="audio-file">
  <button onclick="getToken()">GET TOKEN</button>
</body>
The page is quite simple, with everything in one place
First select the audio file, then click the button to obtain the token and upload the file for recognition
Because I need the binary content of the file, my first thought was the FileReader object built into the browser, which has a readAsBinaryString method for reading the file's binary content so it can be put into the request body
const ACCESS_KEY = "NSuFZs*********lpvdLvKb"; // API Key
const ACCESS_SECRET = "iAa************************tG"; // Secret Key

let audio_file = document.getElementById("audio-file");
let file_data; // content of the selected audio file
let token; // token returned by the authentication request

audio_file.onchange = (file) => {
  let reader = new FileReader();
  // Get binary content of file
  reader.readAsBinaryString(file.target.files[0]);
  reader.onload = (res) => {
    console.log(res.target.result);
    file_data = res.target.result;
  };
};

// Request the token, then start recognition
function getToken() {
  let xhr = new XMLHttpRequest();
  xhr.open(
    "POST",
    "https://openapi.baidu.com/oauth/2.0/token?grant_type=client_credentials&client_id=" + ACCESS_KEY + "&client_secret=" + ACCESS_SECRET
  );
  xhr.addEventListener("readystatechange", (res) => {
    if (xhr.readyState === 4) {
      // The recognition API expects the access_token field
      token = JSON.parse(res.target.response).access_token;
      soundReco();
    }
  });
  xhr.send();
}

// Speech recognition
function soundReco() {
  let xhr = new XMLHttpRequest();
  xhr.open(
    "POST",
    "http://vop.baidu.com/server_api?cuid=155236postman&dev_pid=1537&token=" + token
  );
  // xhr.setRequestHeader("Content-Type","application/json");
  xhr.setRequestHeader("Content-Type", "audio/pcm;rate=16000");
  xhr.addEventListener("readystatechange", (res) => {
    if (xhr.readyState === 4) {
      console.log("***********************", JSON.parse(res.target.response));
    }
  });
  // Send the audio content as the request body
  xhr.send(file_data);
}
However, the request comes back with a speech quality error
But the parameters and the file content were clearly sent?
My guess was that sending the data as a plain-text string was the problem
So I switched to the readAsArrayBuffer API instead
reader.readAsArrayBuffer(file.target.files[0]);
Sure enough, this time the request returns the recognition result!!!
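A likely explanation for the difference, offered as a guess rather than a confirmed diagnosis: when a string is passed to XMLHttpRequest's send(), it is encoded as UTF-8, so every character with a code of 0x80 or above turns into two bytes and the audio is no longer the same. The sketch below imitates that with TextEncoder on four made-up sample bytes.

```javascript
// Four made-up sample bytes; two of them are >= 0x80, which is common in PCM audio.
const originalBytes = new Uint8Array([0x12, 0x80, 0xff, 0x7f]);

// What readAsBinaryString produces: a string with one character per byte.
const binaryString = String.fromCharCode(...originalBytes);

// A string body is UTF-8 encoded before sending; TextEncoder does the same encoding.
const sentFromString = new TextEncoder().encode(binaryString);

// An ArrayBuffer body (from readAsArrayBuffer) is sent byte for byte.
const sentFromBuffer = new Uint8Array(originalBytes.buffer);

console.log(originalBytes.length, sentFromString.length, sentFromBuffer.length);
// prints: 4 6 4
```

The string version grows from 4 to 6 bytes, so the server would receive corrupted audio, while the ArrayBuffer version arrives unchanged.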
Good 👌 Anyway, the data comes back. Displaying it from here is the easy part!
During the request I ran into the browser's cross-origin restriction. For now I have worked around it by configuring the browser to allow cross-origin requests
For the details, see this expert's write-up:
Browser settings cross domain
That is as far as this demo goes for now. Staying alive comes first~~~~
The next step is to move to a platform with a better user experience: I want to implement the speech recognition feature in a WeChat mini program...