I have a story. Let the robot read it

Recently I have been busy with work, and at home I just want to close my eyes and rest without looking at a screen for even a minute. But I also wanted to catch up on a novel I had been reading, so a need arose: I needed a robot to read my story to me!

Browsers and reader apps do have a read-aloud function, but it is rather stiff and always turns a gripping plot into a laundry list, which makes people drop the book within minutes. So I considered using a crawler to download newly updated chapters regularly and then synthesizing the text into audio files. That way I can pick a speech synthesis tool I like to process the text, and the saved audio can be listened to repeatedly, killing two birds with one stone.

Collecting the text is easy, but how do you quickly convert it to audio? Do I have to train my own model? My knowledge of algorithms is basic, and my hardware is not up to it. Following the principle of “use whatever is available”, I decided to first try the open-platform products on the market. After comparing them, I found that the speech synthesis of Youdao Zhiyun (Youdao Wisdom Cloud) is quite good (you can experience it here), so I decided to build on its speech synthesis API.

Sneak peek at the results:

I took two paragraphs of Mr. Zhu Ziqing’s “Moonlight over the Lotus Pond” as test material and developed a simple demo that runs through the whole flow from loading text to generating audio files. Below I describe the development process in detail.

Text that requires speech synthesis:

Synthesis results (paragraph 1):

Synthesis results (paragraph 2):

Unfortunately, you can’t upload mp3 music files here

Preparations for calling the API

First, on your personal page in Youdao Wisdom Cloud, you need to create an instance, create an application, bind the application to the instance, and obtain the application's ID and key. For details on registering an account and creating an application, see the article that walks through developing a batch file translation tool.

Detailed introduction of the development process

The following describes the specific code development process.

First, I analyzed the input and output specifications of the Youdao Wisdom Cloud API based on its documentation. Calling the speech synthesis API is very simple. The API communicates over HTTPS, and the parameters are as follows:

| Field name | Type | Meaning | Required | Note |
| --- | --- | --- | --- | --- |
| q | text | The text string to be synthesized into audio | True | How do you do |
| langType | text | The language type of the text to synthesize | True | See [supported languages](https://ai.youdao.com/DOCSIRMA/html/%E8%AF%AD%E9%9F%B3%E5%90%88%E6%88%90TTS/API%E6%96%87%E6%A1%A3/%E8%AF%AD%E9%9F%B3%E5%90%88%E6%88%90%E6%9C%8D%E5%8A%A1/%E8%AF%AD%E9%9F%B3%E5%90%88%E6%88%90%E6%9C%8D%E5%8A%A1-API%E6%96%87%E6%A1%A3.html#section-9) |
| appKey | text | Application ID | True | Can be viewed in [application management](https://ai.youdao.com/appmgr.s) |
| salt | text | UUID | True | UUID |
| sign | text | Signature | True | MD5(application ID + q + salt + application key) |
| voice | text | Voice selection: 0 is a female voice, 1 is a male voice; the default is female | False | 0 |
| format | text | Target audio format; mp3 is supported | False | mp3 |
| speed | text | Speed of the synthesized audio | False | For example, “1” is normal speed |
| volume | text | Volume of the synthesized audio | False | Normal “1.00”, maximum “5.00”, minimum “0.50” |

Given your UTF-8 encoded text, the necessary parameters such as the signature, and the audio characteristics you want, the API returns the synthesized audio.
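As a minimal sketch of how the fields in the table fit together, the helper below assembles the form parameters and computes `sign` as the MD5 hex digest of application ID + q + salt + application key. The name `build_params` and the placeholder credentials are hypothetical and not taken from the demo; optional fields are simply passed through as strings.

```python
import hashlib
import uuid


def build_params(text, lang_type, app_key, app_secret, salt=None, **optional):
    """Assemble the form fields listed in the parameter table.

    build_params is a hypothetical helper; optional fields such as
    voice, format, speed, or volume are passed through unchanged.
    """
    salt = salt or str(uuid.uuid1())
    # sign = MD5(application ID + q + salt + application key)
    sign_str = app_key + text + salt + app_secret
    sign = hashlib.md5(sign_str.encode("utf-8")).hexdigest()
    params = {
        "q": text,
        "langType": lang_type,
        "appKey": app_key,
        "salt": salt,
        "sign": sign,
    }
    params.update(optional)
    return params


# Placeholder credentials, for illustration only.
params = build_params("你好", "zh-CHS", "my-app-id", "my-app-key",
                      voice="0", format="mp3", speed="1", volume="1.00")
print(sorted(params))
```

Generating a fresh salt per request keeps each signature unique, which is why the salt is only fixed when the caller passes one explicitly.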

On success, the interface returns the binary audio file, with the header Content-Type: audio/mp3. If synthesis fails, a JSON result is returned instead, with Content-Type: application/json, so the Content-Type header is enough to determine whether the call succeeded.
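The Content-Type check can be sketched as a small pure function. This is an illustration under assumptions: `interpret_response` is a hypothetical helper, and the JSON payload in the example is made up, since the fields of the real error body are not described here.

```python
import json


def interpret_response(content_type, body):
    """Classify a response by its Content-Type header.

    Returns ("audio", raw_bytes) on success, or ("error", parsed_json)
    when the service answered with JSON. Anything else is "unknown".
    """
    if content_type.startswith("audio/mp3"):
        return "audio", body
    if content_type.startswith("application/json"):
        return "error", json.loads(body.decode("utf-8"))
    return "unknown", body


# Simulated responses; the JSON payload is made up for illustration.
print(interpret_response("audio/mp3", b"\x00\x01")[0])
print(interpret_response("application/json", b'{"status": "failed"}'))
```

Using `startswith` rather than strict equality tolerates a header that carries extra parameters such as a charset.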

Demo development:

The demo was developed in Python 3 and consists of maindow.py, synthesis.py, and synthesistool.py.

  1. Interface part:

    Part of the interface code is shown below; it is fairly simple.

    import os
    import tkinter as tk
    import tkinter.messagebox

    root = tk.Tk()
    root.title("youdao speech synthesis test")
    frm = tk.Frame(root)
    frm.grid(padx='50')
    # Button that picks the text files to synthesize
    btn_get_file = tk.Button(frm, text='select file to synthesize', command=get_files)
    btn_get_file.grid(row=0, column=0)
    # Text box next to the file button
    text1 = tk.Text(frm, width='40', height='10')
    text1.grid(row=0, column=1)
    # Button that starts the synthesis
    btn_sure = tk.Button(frm, text="synthesize", command=synthesis_files)
    btn_sure.grid(row=1, column=1)

    The event bound to btn_sure, synthesis_files(), collects all the text files, starts the synthesis, and shows the result:

    def synthesis_files():
        if syn_m.file_paths:
            message = syn_m.get_synthesis_result()
            tk.messagebox.showinfo("prompt", message)
            # Open the result folder (Windows shell command)
            os.system('start .\\result')
        else:
            tk.messagebox.showinfo("prompt", "no file")
  2. synthesis.py

    This file works with the interface: it reads the text files, calls the API, and handles the returned values. First, define a Synthesis_model class:

    class Synthesis_model():
        def __init__(self, file_paths, result_root_path, syn_type):
            self.file_paths = file_paths              # paths of the files to synthesize
            self.result_root_path = result_root_path  # result path
            self.syn_type = syn_type                  # synthesis type

    The get_synthesis_result() method reads the files in a batch, calls the synthesis method for each one, and processes the returned information:

    def get_synthesis_result(self):
        syn_result = ""
        for file_path in self.file_paths:
            # File name without extension, used to name the audio file
            file_name = os.path.basename(file_path).split('.')[0]
            file_content = open(file_path, encoding='utf-8').read()
            result = self.synthesis_use_netease(file_name, file_content)
            if result == "1":
                syn_result = syn_result + file_path + " ok !\n"
            else:
                syn_result = syn_result + file_path + result
        return syn_result

    The method synthesis_use_netease() is defined separately to make the actual API call. This makes the demo easier to extend and keeps the synthesis module loosely coupled and pluggable:

    def synthesis_use_netease(self,file_name,text):
        result=connect(text,'zh-CHS')
        print(result)
        if result.headers['Content-Type']=="audio/mp3":
            millis = int(round(time.time() * 1000))
            filePath = "./result/" + file_name+"-"+str(millis) + ".mp3"
            fo = open(filePath, 'wb')
            fo.write(result.content)
            fo.close()
            return "1"
        else:
            # result.content is bytes, so decode it before concatenating with a str
            return "error:" + result.content.decode('utf-8')
  3. synthesistool.py
    synthesistool.py contains the methods directly tied to requests to the Youdao Wisdom Cloud API. The core one is connect(), which assembles all the parameters the API requires, calls do_request() to execute the request, and returns the API's response.

      def connect(text,lang_type):
          q = text
      
          data = {}
          data['langType'] = lang_type
          salt = str(uuid.uuid1())
          signStr = APP_KEY + q + salt + APP_SECRET
          sign = encrypt(signStr)
          data['appKey'] = APP_KEY
          data['q'] = q
          data['salt'] = salt
          data['sign'] = sign
      
          response = do_request(data)
          return response
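connect() relies on encrypt() and do_request(), which are not shown in the article. The following is only a sketch of what they might look like, not the author's code: encrypt() is assumed to be an MD5 hex digest, matching the sign rule in the parameter table; the endpoint URL is an assumption that should be checked against the official documentation; and the request uses the standard library's urllib, so do_request() returns a (content type, bytes) pair rather than the response object with .headers and .content that the demo code expects. build_request is a hypothetical helper.

```python
import hashlib
import urllib.parse
import urllib.request

# Assumed endpoint; confirm it against the official API documentation.
TTS_URL = "https://openapi.youdao.com/ttsapi"


def encrypt(sign_str):
    """MD5 hex digest of the signing string (assumed from the sign rule)."""
    return hashlib.md5(sign_str.encode("utf-8")).hexdigest()


def build_request(data, url=TTS_URL):
    """Encode the parameter dict as a form body for an HTTPS POST."""
    body = urllib.parse.urlencode(data).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def do_request(data):
    """Send the request and return (content_type, raw_bytes)."""
    with urllib.request.urlopen(build_request(data), timeout=30) as resp:
        return resp.headers.get("Content-Type", ""), resp.read()


# Building the request does not touch the network.
req = build_request({"q": "你好", "langType": "zh-CHS"})
print(req.get_method(), req.full_url)
```

Separating build_request() from do_request() keeps the encoding step checkable without credentials or network access.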

    If you would like to try it, download my code or try the service on the official website. Project address: github.com/LemonQH/Spe…

    Special note: replace APP_KEY and APP_SECRET in the synthesistool module with your own application ID and key. The code writes its output to ./result, so you need to create this directory manually under the project path, or change it to wherever you want.
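If you would rather not create the directory by hand, a small sketch like the following could run before synthesis starts. `ensure_result_dir` is a hypothetical helper and is not part of the demo.

```python
import os


def ensure_result_dir(path="./result"):
    """Create the output directory if it is missing and return its absolute path."""
    os.makedirs(path, exist_ok=True)
    return os.path.abspath(path)


print(ensure_result_dir())
```

With `exist_ok=True` the call is safe to repeat on every run.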

Conclusion

That is my development process. The speech synthesis API of Youdao Wisdom Cloud has clear documentation, and I hit no pitfalls anywhere in the call flow. Both the development experience and the synthesis quality were pleasant.

I have a story, and I hand it to the robot to tell. Listening with my eyes closed is never boring, which is a truly lovely thing!
