I have a story. Let the robot read it

Recently I have been so busy at work that once I get home I just want to close my eyes and rest, without looking at a screen for even a minute. But I also wanted to catch up on a novel I had been reading, and so the requirement was born: I needed a robot to read my story to me!

Browsers and reader apps do have a read-aloud function, but it is rather stiff and always turns a gripping plot into a recited laundry list, enough to make a reader drop the book within minutes. So I considered using a crawler to download newly updated chapters on a regular basis and then synthesizing the text into audio files. That way I can choose the speech synthesis tool that handles the text, and the saved audio can be replayed as often as I like, killing two birds with one stone.

Collecting the text is easy, but how can it be converted to audio quickly? Do I have to train a model myself? My knowledge of the algorithms is basic, and my hardware is not up to it. Following the principle of “use whatever is available”, I decided to start with the open platform products already on the market. After comparing them, I found the speech synthesis of Youdao Zhiyun quite good (it can be tried out on the official website), so I decided to develop against the Youdao Zhiyun speech synthesis API.

Sneak peek at the results:

I took two paragraphs of Zhu Ziqing’s “Moonlight over the Lotus Pond” as test material and developed a simple demo that runs through the whole flow from loading text to generating audio files. The development process is described in detail below.

Text that requires speech synthesis:

Synthesis results (paragraph 1):

Synthesis results (paragraph 2):

Unfortunately, MP3 audio files cannot be uploaded here.

Preparations for calling the API

First, on your personal page of Youdao Zhiyun, create an instance, create an application, and bind the application to the instance, then obtain the ID and key of the application. For details on personal registration and application creation, see the article that shares the development process of a batch file translation tool.

Detailed introduction of the development process

The following describes the specific code development process.

First, analyze the input and output specification of the Youdao Zhiyun API according to its documentation. Calling the speech synthesis API is simple: it communicates over HTTPS and takes the following parameters:

Field name	Type	Meaning	Required	Note / example
q	text	The text to be synthesized into audio	true	How do you do
langType	text	The language type of the text to be synthesized	true	Supported languages: https://ai.youdao.com/DOCSIRMA/html/%E8%AF%AD%E9%9F%B3%E5%90%88%E6%88%90TTS/API%E6%96%87%E6%A1%A3/%E8%AF%AD%E9%9F%B3%E5%90%88%E6%88%90%E6%9C%8D%E5%8A%A1/%E8%AF%AD%E9%9F%B3%E5%90%88%E6%88%90%E6%9C%8D%E5%8A%A1-API%E6%96%87%E6%A1%A3.html#section-9
appKey	text	Application ID	true	Can be viewed under application management (https://ai.youdao.com/appmgr.s)
salt	text	UUID	true	UUID
sign	text	Signature	true	MD5(application ID + q + salt + application key)
voice	text	Voice selection: 0 is a female voice, 1 is a male voice; the default is female	false	0
format	text	Target audio format; mp3 is supported	false	mp3
speed	text	Speed of the synthesized audio	false	For example, “1” is normal speed
volume	text	Volume of the synthesized audio	false	Normal “1.00”, maximum “5.00”, minimum “0.50”

Simply provide your text (UTF-8 encoded), supply the required parameters such as the signature, and tell the API which audio characteristics you want, and you get a satisfactory synthesized audio file.
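The signing rule from the parameter table can be sketched in a few lines of Python. This is only an illustration under assumptions: build_tts_params is a made-up helper name, the credentials are placeholders, and the article does not say whether the service expects the MD5 hex digest in upper or lower case.

```python
import hashlib
import uuid


def build_tts_params(text, app_key, app_secret, lang_type="zh-CHS",
                     salt=None, **optional):
    """Assemble the form fields from the parameter table (hypothetical helper).

    `optional` may carry voice, format, speed or volume. Anything not
    passed is left out, so the service default applies.
    """
    salt = salt or str(uuid.uuid4())
    # sign = MD5(application ID + q + salt + application key)
    sign_source = app_key + text + salt + app_secret
    sign = hashlib.md5(sign_source.encode("utf-8")).hexdigest()
    params = {
        "q": text,
        "langType": lang_type,
        "appKey": app_key,
        "salt": salt,
        "sign": sign,
    }
    params.update(optional)
    return params


# Placeholder credentials; real ones come from the application page.
params = build_tts_params("How do you do", "my-app-id", "my-app-key",
                          voice="1", speed="1")
print(sorted(params))
```

Passing the optional fields as keyword arguments keeps the required fields in one place and lets each call decide which audio characteristics to override.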

On the output side, if synthesis succeeds, the binary audio file is returned with the header Content-Type: audio/mp3. If synthesis fails, a JSON result is returned with Content-Type: application/json. The Content-Type header can therefore be used to determine whether the call succeeded.
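The Content-Type check described above might look like the following sketch. The function name save_or_report is an assumption made for illustration, and the layout of the JSON error body is not described in this article, so the parsed object is simply handed back to the caller.

```python
import json


def save_or_report(content_type, body, out_path):
    """Return (True, out_path) for audio, (False, error_info) otherwise.

    content_type: value of the response's Content-Type header
    body: raw response bytes
    """
    # Headers may carry parameters such as "; charset=utf-8", so compare
    # only the media type itself.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "audio/mp3":
        with open(out_path, "wb") as audio_file:
            audio_file.write(body)
        return True, out_path
    # Anything else is treated as the JSON error result.
    try:
        return False, json.loads(body.decode("utf-8"))
    except ValueError:
        # Not valid JSON after all: return the raw text for inspection.
        return False, body.decode("utf-8", errors="replace")
```

Keeping the decision in one small function means the caller never has to guess whether the bytes it received are audio or an error message.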

Demo development:

The demo was developed in Python 3 and consists of maindow.py, synthesis.py, and synthesistool.py.

Interface part:

Part of the interface code is shown below; it is relatively simple.

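import tkinter as tk
import tkinter.messagebox  # makes tk.messagebox available to the handlers

# get_files and synthesis_files are the button callbacks defined elsewhere in this file.
root=tk.Tk()
root.title("youdao speech synthesis test")
frm = tk.Frame(root)
frm.grid(padx='50')
# Button for choosing the text files to be synthesized
btn_get_file = tk.Button(frm, text='select file to be synthesized', command=get_files)
btn_get_file.grid(row=0, column=0)
# Text box next to the button
text1 = tk.Text(frm, width='40', height='10')
text1.grid(row=0, column=1)
# Button that starts the synthesis
btn_sure=tk.Button(frm,text="synthesize",command=synthesis_files)
btn_sure.grid(row=1,column=1)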

The synthesis_files() event handler bound to btn_sure collects all the text files, starts the synthesis, and shows the result:

def synthesis_files():
    # Requires "import os" at the top of the module.
    if syn_m.file_paths:
        message = syn_m.get_synthesis_result()
        tk.messagebox.showinfo("prompt", message)
        # Open the result folder in the file manager (Windows only)
        os.system('start .\\result')
    else:
        tk.messagebox.showinfo("prompt", "no file")

synthesis.py

This module works together with the interface: it implements the logic for reading text, calling the API, and processing the returned value. First, define a Synthesis_model class:

class Synthesis_model():
    def __init__(self,file_paths,result_root_path,syn_type):
        self.file_paths=file_paths              # Paths of the files to be synthesized
        self.result_root_path=result_root_path  # Result path
        self.syn_type=syn_type                  # Synthesis type

The get_synthesis_result() method reads the files in a batch, calls the synthesis method for each one, and processes the returned information:

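def get_synthesis_result(self):
    # Requires "import os" at the top of the module.
    syn_result=""
    for file_path in self.file_paths:
        # File name without extension, used to name the audio file
        file_name=os.path.basename(file_path).split('.')[0]
        with open(file_path,encoding='utf-8') as f:
            file_content=f.read()
        # Hand the text to the synthesis method
        result=self.synthesis_use_netease(file_name,file_content)
        if result=="1":
            syn_result=syn_result+file_path+" ok ! \n"
        else:
            syn_result=syn_result+file_path+result
    return syn_result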

The method synthesis_use_netease() is defined separately and implements the actual API call. This makes the demo easier to extend and gives a loosely coupled design with pluggable synthesis modules:

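def synthesis_use_netease(self,file_name,text):
    # Requires "import time" and connect() from synthesistool.
    result=connect(text,'zh-CHS')
    print(result)
    if result.headers['Content-Type']=="audio/mp3":
        # Success: save the audio under a timestamped file name.
        millis = int(round(time.time() * 1000))
        filePath = "./result/" + file_name+"-"+str(millis) + ".mp3"
        with open(filePath, 'wb') as fo:
            fo.write(result.content)
        return "1"
    else:
        # Failure: the body is JSON. It is bytes, so decode it before
        # concatenating it with a str.
        return "error:"+result.content.decode('utf-8')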
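One way to picture the pluggable design described above is a small dispatch table keyed by synthesis type. This is a sketch, not the demo's actual structure: BACKENDS, synthesize, and fake_backend are hypothetical names, and the stand-in backend only mimics the "1" / "error:..." return convention used by synthesis_use_netease().

```python
def fake_backend(file_name, text):
    """Stand-in for a real engine such as synthesis_use_netease()."""
    return "1" if text.strip() else "error: empty text"


# Map each synthesis type to the callable that implements it.
BACKENDS = {"demo": fake_backend}


def synthesize(syn_type, file_name, text):
    """Look up the backend for syn_type and delegate to it."""
    backend = BACKENDS.get(syn_type)
    if backend is None:
        return "error: unknown synthesis type " + repr(syn_type)
    return backend(file_name, text)


print(synthesize("demo", "chapter1", "some text"))
```

Adding another engine then means registering one more function in the table, without touching the code that reads files or reports results.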

synthesistool.py

synthesistool.py holds the methods directly related to requesting the Youdao Zhiyun API. The core one is connect(), which assembles all the parameters the API requires, calls do_request() to execute the request, and returns the API response.

def connect(text,lang_type):
    # Requires "import uuid"; APP_KEY, APP_SECRET, encrypt() and
    # do_request() are defined elsewhere in this module.
    q = text

    data = {}
    data['langType'] = lang_type
    salt = str(uuid.uuid1())
    # Signature source: application ID + text + salt + application key
    signStr = APP_KEY + q + salt + APP_SECRET
    sign = encrypt(signStr)
    data['appKey'] = APP_KEY
    data['q'] = q
    data['salt'] = salt
    data['sign'] = sign

    response = do_request(data)
    return response
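The article does not show do_request(), so the following is only a guess at its shape, written with the standard library rather than whatever HTTP client the demo uses. Assumptions: the parameters are sent as a form-encoded POST, the endpoint URL is passed in by the caller (it is not stated here), and the function returns a (content_type, body) pair instead of a response object. prepare_request is a hypothetical helper split out so the request can be inspected without sending it.

```python
import urllib.parse
import urllib.request


def prepare_request(url, data):
    """Build a form-encoded POST request without sending it."""
    body = urllib.parse.urlencode(data).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return urllib.request.Request(url, data=body, headers=headers,
                                  method="POST")


def do_request(url, data, timeout=10):
    """Send the request and return (content_type, body_bytes).

    urlopen raises urllib.error.HTTPError for HTTP error statuses,
    which the caller is expected to handle.
    """
    request = prepare_request(url, data)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type", ""), response.read()
```

The endpoint is deliberately left as a parameter; take it from the official API documentation.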

If you would like to try it, download my code or go to the official website :P Project address: github.com/LemonQH/Spe…

Special note: replace APP_KEY and APP_SECRET in the synthesistool module with the ID and key of your own application. You also need to create the ./result directory manually under the project path, or change the output path to wherever you want.
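Both manual steps in the note above can also be handled in code. A minimal sketch, assuming the credentials are supplied through environment variables; the variable names YOUDAO_APP_KEY and YOUDAO_APP_SECRET and both helper functions are made up for this example and are not part of the demo.

```python
import os


def load_credentials(env=os.environ):
    """Read the application ID and key from environment variables.

    YOUDAO_APP_KEY and YOUDAO_APP_SECRET are hypothetical names.
    """
    app_key = env.get("YOUDAO_APP_KEY", "")
    app_secret = env.get("YOUDAO_APP_SECRET", "")
    if not app_key or not app_secret:
        raise RuntimeError("set YOUDAO_APP_KEY and YOUDAO_APP_SECRET first")
    return app_key, app_secret


def ensure_result_dir(path="./result"):
    """Create the output directory if it does not exist yet."""
    os.makedirs(path, exist_ok=True)
    return path
```

Reading the key from the environment keeps it out of the source file, which matters if the project is pushed to a public repository.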

Conclusion

That is my development process. The speech synthesis API of Youdao Zhiyun has clear documentation, and I ran into no pitfalls anywhere in the calling process. Both the development experience and the synthesis quality are comfortable.

I have a story, and I hand it to the robot to tell. Resting with my eyes closed is no longer boring, which is really a beautiful thing!

