Surprise! Python can actually read stories aloud

Recently I was swamped with work, and when I got home I just wanted to close my eyes and rest; I didn't want to look at a screen for even a minute. But I also wanted to catch up on a novel I had been reading, so a need arose: I wanted a robot to read my story to me!

Browsers and reader apps do have read-aloud features, but they are rigid: they turn a gripping plot into a monotonous laundry list, and within minutes you want to abandon the book. So I decided to use a crawler to download updated chapters on a schedule, then synthesize the text and save it as audio files. That way I can pick a speech synthesis tool I like to handle the text, and the saved audio can be replayed as often as I want: two birds with one stone.

Assembling the text is easy, but how do you quickly convert it to audio? Do I have to train my own model? My knowledge of the algorithms is shallow, and my hardware wouldn't allow it anyway. Following the principle of "use whatever works", I decided to first try the open-platform products already on the market. After some comparison, I found that Youdao Zhiyun's speech synthesis is quite good (you can try it online), so I decided to build on Youdao Zhiyun's speech synthesis API.

Sneak peek at the results:

I took two paragraphs of Mr. Zhu Ziqing's essay "Moonlight over the Lotus Pond" as experimental material and developed a simple demo that walks through the whole flow, from loading text to generating audio files. Below I describe the development process in detail.

Text that requires speech synthesis:

Synthesis result (paragraph 1):

Synthesis result (paragraph 2):

(Unfortunately, mp3 audio files cannot be uploaded here.)

Preparations for calling the API

First, on your Youdao Zhiyun personal page, create an instance, create an application, bind the application to the instance, and obtain the application's ID and key. For the individual registration and application-creation process, see the batch file translation development walkthrough I shared in an earlier article.

Detailed introduction of the development process

The following describes the specific code development process.

First, analyze the API's input and output specification from the documentation. Calling the speech synthesis API is very simple: it communicates over HTTPS, and the parameters it requires are as follows:

| Field | Type | Meaning | Required | Notes / example |
| --- | --- | --- | --- | --- |
| q | text | Text to be synthesized | True | "How do you do" |
| langType | text | Language of the text | True | See the supported-language list |
| appKey | text | Application ID | True | Viewable in Application Management |
| salt | text | UUID | True | A random UUID |
| sign | text | Signature | True | MD5(app ID + q + salt + app key) |
| voice | text | Voice choice: 0 = female, 1 = male | False | Default is 0 (female) |
| format | text | Target audio format | False | Only mp3 is supported |
| speed | text | Speed of the synthesized audio | False | "1" is normal speed |
| volume | text | Volume of the synthesized audio | False | Normal "1.00"; minimum "0.50", maximum "5.00" |

In short: provide your text (UTF-8 encoded), attach the required parameters such as the signature, tell the API what audio characteristics you want, and you get back a satisfying piece of synthesized audio.

As for the output: if synthesis succeeds, the API returns the binary audio file with the header Content-Type: audio/mp3. If synthesis fails, it returns a JSON result with Content-Type: application/json, which you can inspect to determine what went wrong.
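Putting the table together, the signed parameter set can be sketched like this. The MD5 signing follows the sign field described above; the credentials here are placeholders to be replaced with your own appKey and app secret:

```python
import hashlib
import uuid

APP_KEY = 'your-app-id'         # placeholder -- from Application Management
APP_SECRET = 'your-app-secret'  # placeholder

def build_params(q, lang_type='zh-CHS', voice='0'):
    """Assemble the request parameters described in the table above."""
    salt = str(uuid.uuid1())
    # sign = MD5(app ID + q + salt + app key), hex-encoded
    sign = hashlib.md5((APP_KEY + q + salt + APP_SECRET).encode('utf-8')).hexdigest()
    return {
        'q': q,                 # UTF-8 text to synthesize
        'langType': lang_type,
        'appKey': APP_KEY,
        'salt': salt,
        'sign': sign,
        'voice': voice,         # 0 = female, 1 = male
        'format': 'mp3',        # only mp3 is supported
    }
```

With a real appKey and secret substituted in, the returned dict can be POSTed to the API as-is.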

Demo development:

The demo was developed in Python 3 and consists of three files: mainwindow.py, synthesis.py, and synthesistool.py.

  1. Interface part:

    Part of the interface code is shown below; it is fairly simple.

    root = tk.Tk()
    root.title("youdao speech synthesis test")
    frm = tk.Frame(root)
    frm.grid(padx='50', pady='50')
    btn_get_file = tk.Button(frm, text='select file to synthesize', command=get_files)
    btn_get_file.grid(row=0, column=0, ipadx='3', ipady='3', padx='10', pady='20')
    text1 = tk.Text(frm, width='40', height='10')
    text1.grid(row=0, column=1)
    btn_sure = tk.Button(frm, text="synthesize", command=synthesis_files)
    btn_sure.grid(row=1, column=1)

    btn_sure is bound to synthesis_files(), which collects all the text files, starts the synthesis, and displays the result:

    def synthesis_files():
        if syn_m.file_paths:
            message = syn_m.get_synthesis_result()
            tk.messagebox.showinfo("prompt", message)
            os.system('start ' + '.\\result')  # open the output folder (Windows)
        else:
            tk.messagebox.showinfo("prompt", "no file selected")
  2. synthesis.py

    This file implements the logic behind the interface: reading the text files, calling the API, and handling the returned values. First, define a Synthesis_model class:

    class Synthesis_model():
        def __init__(self, file_paths, result_root_path, syn_type):
            self.file_paths = file_paths              # text files to synthesize
            self.result_root_path = result_root_path  # output directory for audio
            self.syn_type = syn_type                  # which synthesis backend to use

    Its get_synthesis_result() method reads the files in batches, calls the synthesis method, and processes the returned information:

    def get_synthesis_result(self):
        syn_result = ""
        for file_path in self.file_paths:
            file_name = os.path.basename(file_path).split('.')[0]
            file_content = open(file_path, encoding='utf-8').read()
            result = self.synthesis_use_netease(file_name, file_content)
            if result == "1":
                syn_result = syn_result + file_path + " ok!\n"
            else:
                syn_result = syn_result + file_path + result
        return syn_result

    The API call itself lives in a separate method, synthesis_use_netease(). This improves the demo's extensibility: synthesis backends are loosely coupled and pluggable.

    def synthesis_use_netease(self,file_name,text):
        result=connect(text,'zh-CHS')
        print(result)
        if result.headers['Content-Type']=="audio/mp3":
            millis = int(round(time.time() * 1000))
            filePath = "./result/" + file_name+"-"+str(millis) + ".mp3"
            fo = open(filePath, 'wb')
            fo.write(result.content)
            fo.close()
            return "1"
        else:
            # result.content is bytes in Python 3; decode before concatenating
            return "error:" + result.content.decode('utf-8')
  3. synthesistool.py
    1. synthesistool.py contains the methods that interact directly with the Youdao Zhiyun API. The core one is connect(), which assembles all the parameters the API requires, calls do_request() to execute the HTTP request, and returns the API's response.

      def connect(text,lang_type):
          q = text
      
          data = {}
          data['langType'] = lang_type
          salt = str(uuid.uuid1())
          signStr = APP_KEY + q + salt + APP_SECRET
          sign = encrypt(signStr)
          data['appKey'] = APP_KEY
          data['q'] = q
          data['salt'] = salt
          data['sign'] = sign
      
          response = do_request(data)
          return response
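The helpers encrypt() and do_request() are referenced above but not shown. Here is a minimal sketch of what they might look like, assuming hex-encoded MD5 for the signature (as the parameter table specifies) and a plain form-encoded POST; the endpoint URL is an assumption, so confirm it against the official documentation:

```python
import hashlib

# Assumed endpoint -- verify against the official Youdao Zhiyun docs.
YOUDAO_URL = 'https://openapi.youdao.com/ttsapi'

def encrypt(sign_str):
    """Hex-encoded MD5 of the signature string (appKey + q + salt + appSecret)."""
    md5 = hashlib.md5()
    md5.update(sign_str.encode('utf-8'))
    return md5.hexdigest()

def do_request(data):
    """POST the form-encoded parameters and return the raw HTTP response."""
    import requests  # third-party; imported here so encrypt() works without it
    headers = {'Content-Type': 'application/x-www-form-urlencoded'}
    return requests.post(YOUDao_URL if False else YOUDAO_URL, data=data, headers=headers)
```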

    If you want to try it, you can download my code or experience it on the official website :P. Project address: github.com/LemonQH/Spe…

    Special note: replace APP_KEY and APP_SECRET in the synthesistool module with your own application ID and key. Also note that the ./result output directory is not created automatically; create it manually under the project path, or change the path to wherever you like.
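Since the ./result directory is not created automatically, one small safeguard is to create it at startup (standard library only):

```python
import os

# Create the output directory if it does not already exist.
os.makedirs('./result', exist_ok=True)
```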

Conclusion

That's my development process. Youdao Zhiyun's speech synthesis API has clear documentation, and the whole calling process was free of pitfalls. Both the development experience and the synthesis results are pleasant.

I have a story, and a robot to tell it; resting with my eyes closed is no longer boring. What a lovely thing!

Welcome to follow me as I keep the promise I made earlier: to finish several more articles within a month, or perhaps even sooner.

| No. | Estimated completion | Demo name, features & article content | Finished? | Links |
| --- | --- | --- | --- | --- |
| 1 | September 3 | Text translation demo: single-text translation and batch translation | Done | CSDN: click here; WeChat official account: click here |
| 2 | September 11 | OCR demo: batch upload and recognition; in one demo you can select different OCR types (handwriting/print/ID card/form/whole question/business card) and call the platform's capabilities, with concrete implementation steps | Done | CSDN: click here; WeChat official account |
| 3 | October 27 | Speech recognition demo: upload a video, extract short audio clips from it, and run short-speech recognition on them | | CSDN: click here; WeChat official account |
| 4 | September 17 | Intelligent voice evaluation demo | | CSDN; WeChat official account |
| 5 | September 24 | Essay correction demo | | CSDN; WeChat official account |
| 6 | September 30 | Speech synthesis demo | | CSDN; WeChat official account |
| 7 | October 15 | Single-question photo-search demo | | CSDN; WeChat official account |
| 8 | October 20 | Picture translation demo | | CSDN; WeChat official account |

Follow my WeChat official account and new posts will be pushed to you as soon as they are published:

Reply "menu" in the account for more goodies; a surprise awaits you.