Surprise! Python can actually read stories aloud

Recently I was swamped with work, and when I got home I just wanted to close my eyes and rest; I didn't want to look at a screen for even a minute. But I also wanted to catch up on a novel I had been reading, so a need arose: I wanted a robot to read my story to me!

Browsers and reader apps do have read-aloud features, but they are rigid: they turn a gripping plot into a monotonous laundry list, and within minutes you want to abandon the book. So I decided to use a crawler to download updated chapters on a schedule, then synthesize the text and save it as audio files. That way I can pick a speech synthesis tool I like to handle the text, and the saved audio can be replayed as often as I want: two birds with one stone.

Assembling the text is easy, but how do you quickly convert it to audio? Do I have to train my own model? My knowledge of the algorithms is shallow, and my hardware wouldn't allow it anyway. Following the principle of "use whatever works", I decided to first try the open-platform products already on the market. After some comparison, I found that Youdao Zhiyun's speech synthesis is quite good (you can try it online), so I decided to build on Youdao Zhiyun's speech synthesis API.

Sneak peek at the results:

I took two paragraphs of Mr. Zhu Ziqing's essay "Moonlight over the Lotus Pond" as experimental material and developed a simple demo that walks through the whole flow, from loading text to generating audio files. Below I describe the development process in detail.

Text that requires speech synthesis:

Synthesis result (paragraph 1):

Synthesis result (paragraph 2):

(Unfortunately, mp3 audio files cannot be uploaded here.)

Preparations for calling the API

First, on your Youdao Zhiyun personal page, create an instance, create an application, bind the application to the instance, and obtain the application's ID and key. For the individual registration and application-creation process, see the batch file translation development walkthrough I shared in an earlier article.

Detailed introduction of the development process

The following describes the specific code development process.

First, analyze the API's input and output specification from the documentation. Calling the speech synthesis API is very simple: it communicates over HTTPS, and the parameters it requires are as follows:

| Field | Type | Meaning | Required | Notes / example |
| --- | --- | --- | --- | --- |
| q | text | Text to be synthesized | True | "How do you do" |
| langType | text | Language of the text | True | See the supported-language list |
| appKey | text | Application ID | True | Viewable in Application Management |
| salt | text | UUID | True | A random UUID |
| sign | text | Signature | True | MD5(app ID + q + salt + app key) |
| voice | text | Voice choice: 0 = female, 1 = male | False | Default is 0 (female) |
| format | text | Target audio format | False | Only mp3 is supported |
| speed | text | Speed of the synthesized audio | False | "1" is normal speed |
| volume | text | Volume of the synthesized audio | False | Normal "1.00"; minimum "0.50", maximum "5.00" |

In short: provide your text (UTF-8 encoded), attach the required parameters such as the signature, tell the API what audio characteristics you want, and you get back a satisfying piece of synthesized audio.

As for the output: if synthesis succeeds, the API returns the binary audio file with the header Content-Type: audio/mp3. If synthesis fails, it returns a JSON result with Content-Type: application/json, which you can inspect to determine what went wrong.
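Putting the table together, the signed parameter set can be sketched like this. The MD5 signing follows the sign field described above; the credentials here are placeholders to be replaced with your own appKey and app secret:

```python
import hashlib
import uuid

APP_KEY = 'your-app-id'         # placeholder -- from Application Management
APP_SECRET = 'your-app-secret'  # placeholder

def build_params(q, lang_type='zh-CHS', voice='0'):
    """Assemble the request parameters described in the table above."""
    salt = str(uuid.uuid1())
    # sign = MD5(app ID + q + salt + app key), hex-encoded
    sign = hashlib.md5((APP_KEY + q + salt + APP_SECRET).encode('utf-8')).hexdigest()
    return {
        'q': q,                 # UTF-8 text to synthesize
        'langType': lang_type,
        'appKey': APP_KEY,
        'salt': salt,
        'sign': sign,
        'voice': voice,         # 0 = female, 1 = male
        'format': 'mp3',        # only mp3 is supported
    }
```

With a real appKey and secret substituted in, the returned dict can be POSTed to the API as-is.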

Demo development:

The demo was developed in Python 3 and consists of three files: mainwindow.py, synthesis.py, and synthesistool.py.

  1. Interface part:

    Part of the interface code is shown below; it is fairly simple.

    root = tk.Tk()
    root.title("youdao speech synthesis test")
    frm = tk.Frame(root)
    frm.grid(padx='50', pady='50')
    btn_get_file = tk.Button(frm, text='select file to synthesize', command=get_files)
    btn_get_file.grid(row=0, column=0, ipadx='3', ipady='3', padx='10', pady='20')
    text1 = tk.Text(frm, width='40', height='10')
    text1.grid(row=0, column=1)
    btn_sure = tk.Button(frm, text="synthesize", command=synthesis_files)
    btn_sure.grid(row=1, column=1)

    btn_sure is bound to synthesis_files(), which collects all the text files, starts the synthesis, and displays the result:

    def synthesis_files():
        if syn_m.file_paths:
            message = syn_m.get_synthesis_result()
            tk.messagebox.showinfo("prompt", message)
            os.system('start ' + '.\\result')  # open the output folder (Windows)
        else:
            tk.messagebox.showinfo("prompt", "no file selected")
  2. synthesis.py

    This file implements the logic behind the interface: reading the text files, calling the API, and handling the returned values. First, define a Synthesis_model class:

    class Synthesis_model():
        def __init__(self, file_paths, result_root_path, syn_type):
            self.file_paths = file_paths              # text files to synthesize
            self.result_root_path = result_root_path  # output directory for audio
            self.syn_type = syn_type                  # which synthesis backend to use

    Its get_synthesis_result() method reads the files in batches, calls the synthesis method, and processes the returned information:

    def get_synthesis_result(self):
        syn_result = ""
        for file_path in self.file_paths:
            file_name = os.path.basename(file_path).split('.')[0]
            file_content = open(file_path, encoding='utf-8').read()
            result = self.synthesis_use_netease(file_name, file_content)
            if result == "1":
                syn_result = syn_result + file_path + " ok!\n"
            else:
                syn_result = syn_result + file_path + result
        return syn_result

    The API call itself lives in a separate method, synthesis_use_netease(). This improves the demo's extensibility: synthesis backends are loosely coupled and pluggable.

    def synthesis_use_netease(self,file_name,text):
        result=connect(text,'zh-CHS')
        print(result)
        if result.headers['Content-Type']=="audio/mp3":
            millis = int(round(time.time() * 1000))
            filePath = "./result/" + file_name+"-"+str(millis) + ".mp3"
            fo = open(filePath, 'wb')
            fo.write(result.content)
            fo.close()
            return "1"
        else:
            # result.content is bytes in Python 3; decode before concatenating
            return "error:" + result.content.decode('utf-8')
  3. synthesistool.py
    1. synthesistool.py contains the methods that interact directly with the Youdao Zhiyun API. The core one is connect(), which assembles all the parameters the API requires, calls do_request() to execute the HTTP request, and returns the API's response.

      def connect(text,lang_type):
          q = text
      
          data = {}
          data['langType'] = lang_type
          salt = str(uuid.uuid1())
          signStr = APP_KEY + q + salt + APP_SECRET
          sign = encrypt(signStr)
          data['appKey'] = APP_KEY
          data['q'] = q
          data['salt'] = salt
          data['sign'] = sign
      
          response = do_request(data)
          return response
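The helpers encrypt() and do_request() are referenced above but not shown. Here is a minimal sketch of what they might look like, assuming hex-encoded MD5 for the signature (as the parameter table specifies) and a plain form-encoded POST; the endpoint URL is an assumption, so confirm it against the official documentation:

```python
import hashlib

# Assumed endpoint -- verify against the official Youdao Zhiyun docs.
YOUDAO_URL = 'https://openapi.youdao.com/ttsapi'

def encrypt(sign_str):
    """Hex-encoded MD5 of the signature string (appKey + q + salt + appSecret)."""
    md5 = hashlib.md5()
    md5.update(sign_str.encode('utf-8'))
    return md5.hexdigest()

def do_request(data):
    """POST the form-encoded parameters and return the raw HTTP response."""
    import requests  # third-party; imported here so encrypt() works without it
    headers = {'Content-Type': 'application/x-www-form-urlencoded'}
    return requests.post(YOUDao_URL if False else YOUDAO_URL, data=data, headers=headers)
```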

    If you want to try it, you can download my code or experience it on the official website :P. Project address: github.com/LemonQH/Spe…

    Special note: replace APP_KEY and APP_SECRET in the synthesistool module with your own application ID and key. Also note that the ./result output directory is not created automatically; create it manually under the project path, or change the path to wherever you like.
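Since the ./result directory is not created automatically, one small safeguard is to create it at startup (standard library only):

```python
import os

# Create the output directory if it does not already exist.
os.makedirs('./result', exist_ok=True)
```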

Conclusion

That's my development process. Youdao Zhiyun's speech synthesis API has clear documentation, and the whole calling process was free of pitfalls. Both the development experience and the synthesis results are pleasant.

I have a story, and a robot to tell it; resting with my eyes closed is no longer boring. What a lovely thing!

Welcome to follow me as I keep the promise I made earlier: to finish several more articles within a month, or perhaps even sooner.

| No. | Estimated completion | Demo name, features & article content | Finished? | Links |
| --- | --- | --- | --- | --- |
| 1 | September 3 | Text translation demo: single-text translation and batch translation | Done | CSDN: click here; WeChat official account: click here |
| 2 | September 11 | OCR demo: batch upload and recognition; in one demo you can select different OCR types (handwriting/print/ID card/form/whole question/business card) and call the platform's capabilities, with concrete implementation steps | Done | CSDN: click here; WeChat official account |
| 3 | October 27 | Speech recognition demo: upload a video, extract short audio clips from it, and run short-speech recognition on them | | CSDN: click here; WeChat official account |
| 4 | September 17 | Intelligent voice evaluation demo | | CSDN; WeChat official account |
| 5 | September 24 | Essay correction demo | | CSDN; WeChat official account |
| 6 | September 30 | Speech synthesis demo | | CSDN; WeChat official account |
| 7 | October 15 | Single-question photo-search demo | | CSDN; WeChat official account |
| 8 | October 20 | Picture translation demo | | CSDN; WeChat official account |

Follow my WeChat official account and new posts will be pushed to you as soon as they are published:

Reply "menu" in the account for more goodies; a surprise awaits you.