WeChat mini program - Baidu AI speech recognition (I)

1, Baidu AI

With nothing to do one day, I saw on CSDN that an expert had built a small chat demo with Baidu speech recognition + the Turing robot (great for teasing a not-so-intelligent AI). I thought it was fun and wanted to build it myself.

Baidu AI
Baidu AI's official website lists a large number of features.
The demo in the official mini program is also feature-rich. (My eyes lit up.)
The feature I want here is speech recognition.

2, Getting started (and stepping into the pitfalls)

To be rigorous, I debug the interface with Postman before starting development.

1. Interface authentication

The routine is the usual one. With one ACCESS KEY (API Key) and one ACCESS SECRET (Secret Key), request directly:
https://openapi.baidu.com/oauth/2.0/token
(I copied the official Postman example directly; too lazy to read the documents.)
The response contains the token (it is valid for 2592000 seconds, i.e. 30 days).
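As a rough sketch of what that Postman request looks like in code, the helper below only builds the token URL from the two keys. The function name `buildTokenUrl` and the placeholder key values are made up for illustration; the endpoint and the query parameter names are the ones used in the demo code later in this post.

```javascript
// Hypothetical helper: builds the OAuth token URL from the two keys.
const TOKEN_ENDPOINT = "https://openapi.baidu.com/oauth/2.0/token";

function buildTokenUrl(apiKey, secretKey) {
  // URLSearchParams takes care of escaping the values
  const params = new URLSearchParams({
    grant_type: "client_credentials",
    client_id: apiKey,
    client_secret: secretKey,
  });
  return `${TOKEN_ENDPOINT}?${params.toString()}`;
}

// Placeholder keys, not real credentials
console.log(buildTokenUrl("my-api-key", "my-secret-key"));
```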

To automate the interface tests, I added a small script in Postman that stores the requested token in an environment variable:

pm.test("token",function(){
    var jsonData = pm.response.json();
    // Store access_token: that is the field the speech API's token parameter expects
    pm.environment.set("TOKEN",jsonData.access_token);
});
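Since the token stays valid for 2592000 seconds, there is no need to request a new one before every recognition call. A minimal sketch of caching it, assuming the caller passes in the token string and its lifetime in seconds; the names `cacheToken` and `isTokenFresh` and the one-minute safety margin are my own choices, not part of any Baidu SDK:

```javascript
// Hypothetical helpers for reusing a token until shortly before it expires.
const SAFETY_MARGIN_SECONDS = 60; // arbitrary margin, an assumption

function cacheToken(token, lifetimeSeconds, nowMs = Date.now()) {
  return {
    token,
    // Absolute expiry time in milliseconds, minus the safety margin
    expiresAtMs: nowMs + (lifetimeSeconds - SAFETY_MARGIN_SECONDS) * 1000,
  };
}

function isTokenFresh(cached, nowMs = Date.now()) {
  return Boolean(cached) && nowMs < cached.expiresAtMs;
}

const cached = cacheToken("example-token", 2592000);
console.log(isTokenFresh(cached)); // true
console.log(isTokenFresh(null));   // false
```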

2. Speech recognition interface

Once the token has been obtained, the speech recognition interface can be called.
The Baidu AI speech recognition interface has two request modes:

  • JSON: the audio data is base64 encoded and placed in the request parameters as JSON
  • RAW: the audio data is placed directly in the request body
    Personally, I find the first mode very convenient, but for a long recording the base64 string gets very long (about a third larger than the raw audio). Strictly speaking it travels in the JSON request body, so browser URL length limits do not apply, but it is still extra bulk.
    Therefore, I dropped the first mode and went with RAW.
    (To be honest, I had never even heard the term "raw", but I had used the principle before: the data is simply carried in the request body.)
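To put a number on the base64 overhead mentioned above, here is a small sketch. The helper name `base64Length` is made up, and the example assumes one minute of 16 kHz, 16-bit, mono PCM, which is only an illustrative figure:

```javascript
// Length of the base64 string for a given number of raw bytes:
// every 3 bytes become 4 characters, with padding on the last group.
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

// Assumed example: 60 s of 16 kHz, 16-bit (2 bytes per sample), mono PCM
const rawBytes = 60 * 16000 * 2;
console.log(rawBytes);               // 1920000
console.log(base64Length(rawBytes)); // 2560000
```

So the JSON mode would carry roughly 2.56 MB of text for 1.92 MB of audio, while the RAW mode sends the 1.92 MB as-is.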

    Here I use audio with a 16k sampling rate throughout. 8k audio has not been tested yet.
    Set the request header:
Content-Type: audio/pcm;rate=16000


Put the official test PCM file into the request body.
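The same Postman setup can be sketched as a plain description of the request. The helper name `buildRawRecognitionRequest` is hypothetical; the URL, the `cuid` and `dev_pid` values, and the header are the ones that appear in this post, and the token string is a placeholder:

```javascript
// Hypothetical helper: describes the RAW-mode request without sending it.
function buildRawRecognitionRequest(token, cuid, devPid = 1537, rate = 16000) {
  const params = new URLSearchParams({
    cuid,
    dev_pid: String(devPid),
    token,
  });
  return {
    method: "POST",
    url: `http://vop.baidu.com/server_api?${params.toString()}`,
    // In RAW mode the audio format and sampling rate go in Content-Type
    headers: { "Content-Type": `audio/pcm;rate=${rate}` },
  };
}

const request = buildRawRecognitionRequest("TOKEN", "155236postman");
console.log(request.url);
console.log(request.headers["Content-Type"]); // audio/pcm;rate=16000
```

The audio bytes themselves are then sent unchanged as the body of that request.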

The recognition result comes back.

3, Implementing the demo (pitfalls... 🕳)

I want to implement a simple demo in the browser first.
Enough talk, here is the code!

<body>
  <input type="file" name="audio" id="audio-file">
  <button onclick="getToken()">GET TOKEN</button>
</body>


The page is fairly simple, and everything happens in one place.
First choose the file, then click the button to obtain the token and upload the audio file for recognition.
Because I need the binary content of the file, the first thing that came to mind was the FileReader object built into JS, which has a readAsBinaryString method for reading the file's binary content so it can be put into the request body.

const ACCESS_KEY = "NSuFZs*********lpvdLvKb";  // API Key
const ACCESS_SECRET = "iAa************************tG";  // Secret Key
let audio_file = document.getElementById("audio-file");
let file_data;  // content of the selected audio file
let token;      // filled in by getToken()
audio_file.onchange = (file) => {
  let reader = new FileReader();
  // Register the handler before starting the read
  reader.onload = (res) => {
    console.log(res.target.result);
    file_data = res.target.result;
  };
  // Get the binary content of the file (as a binary string)
  reader.readAsBinaryString(file.target.files[0]);
};
 
// Request the token
function getToken(){
  let xhr = new XMLHttpRequest();
  xhr.open("POST",
    "https://openapi.baidu.com/oauth/2.0/token?grant_type=client_credentials&client_id="+ACCESS_KEY+"&client_secret="+ACCESS_SECRET
  );
  xhr.addEventListener("readystatechange",()=>{
    // Continue only once the request has finished successfully
    if(xhr.readyState === 4 && xhr.status === 200){
      // The speech API's token parameter takes the access_token field
      token = JSON.parse(xhr.responseText).access_token;
      soundReco();
    }
  });
  xhr.send();
}

// Recognize the speech
function soundReco(){
  let xhr = new XMLHttpRequest();
  xhr.open("POST",
    "http://vop.baidu.com/server_api?cuid=155236postman&dev_pid=1537&token="+token
  );
  // JSON mode would use this header instead:
  // xhr.setRequestHeader("Content-Type","application/json");
  // RAW mode: the audio format and sampling rate go in Content-Type
  xhr.setRequestHeader("Content-Type","audio/pcm;rate=16000");
  xhr.addEventListener("readystatechange",()=>{
    if(xhr.readyState === 4){
      console.log("***********************",JSON.parse(xhr.responseText));
    }
  });
  // The request body is the file content itself
  xhr.send(file_data);
}

However, the request returns a "speech quality error".


The parameters and the file content were clearly being sent, so why?
My guess was that sending the data as a plain-text string was the problem.
So I switched to the readAsArrayBuffer API instead:

reader.readAsArrayBuffer(file.target.files[0]);

Sure enough, the recognition result comes back!!!
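A likely explanation for the difference, sketched outside the browser: when a string is passed to `xhr.send()`, it is encoded as UTF-8 before going on the wire, so every byte of 0x80 or above in a binary string turns into two bytes and the PCM data is no longer what the server expects. The four sample bytes below are arbitrary, and `TextEncoder` is used here only to imitate that UTF-8 step:

```javascript
// Four arbitrary "audio" bytes, two of them >= 0x80
const originalBytes = new Uint8Array([0x00, 0x7f, 0x80, 0xff]);

// What readAsBinaryString produces: one character per byte
const binaryString = String.fromCharCode(...originalBytes);

// What sending that string does: the characters are UTF-8 encoded
const sentFromString = new TextEncoder().encode(binaryString);

console.log(originalBytes.length);       // 4
console.log(sentFromString.length);      // 6
console.log(Array.from(sentFromString)); // [ 0, 127, 194, 128, 195, 191 ]
```

An ArrayBuffer, by contrast, is sent byte for byte, which fits with the request succeeding after the switch to readAsArrayBuffer.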

Good 👌 Anyway, the data comes back. Displaying it afterwards is the easy part!

During these requests I ran into the browser's cross-origin restriction. For now I have worked around it by configuring the browser to allow cross-origin requests.
I followed an expert's approach:
Browser settings for cross-origin requests

That's it for this demo for now. Staying alive matters~~~~
The next step is to move to a platform with a better user experience. I want to use a WeChat mini program to implement the speech recognition feature...

Tags: api

Posted by bslevin on Tue, 19 Apr 2022 08:15:49 +0930