One night my girlfriend started a movie she liked, but it had no subtitles, which upset her quite a bit. Quick to look after her needs, I used Python to build software that can recognize speech and turn it into text.

I. The story

One night, my girlfriend put on a movie she liked, but it had no subtitles, which upset her quite a bit.

Quick to look after her needs, I came up with the idea of using Python to build software that could recognize speech and turn it into text.

The picture below shows the finished result. Ha, not bad, right?

If you find it interesting, please give it a thumbs-up, and I'll bring you more fun and interesting demos and implementation tutorials.

A short excerpt from the first episode of Legend of Zhen Huan:

And in action, it looks like this:

I've been short on dramas lately, so I dug out a TV series I downloaded long ago to savor again. A classic is a classic: the plot and the lines are both so charming. Wait... lines, lines... As an IT worker, I had an epiphany: with all the advances in speech recognition technology, could there be a way to save some of those great lines automatically? Maybe I could even be an amateur subtitler :P, and on that basis easily translate some of the harder lines too!

After a bit of thinking, I came up with an idea: a program that extracts the audio from a video and then sends it to an open speech recognition API to convert the speech into text. Given my pleasant experience calling Youdao Wisdom Cloud before, I decided to use it again and quickly put together this demo (please ignore the ugly interface layout, as long as it works...).

Welcome to follow me as I keep my earlier promise: finishing several articles within a month.

| No. | Estimated completion | Demo name, features & article content | Finished? | Article links |
| --- | --- | --- | --- | --- |
| 1 | September 3 | Text translation: single-text and batch translation demo | Done | CSDN · WeChat Official Account |
| 2 | September 11 | OCR demo with batch upload and recognition; in the demo you can choose different OCR types (handwriting / print / ID card / form / whole page / business card) and then call the platform's capabilities; concrete implementation steps, etc. | Done | CSDN · WeChat Official Account |
| 3 | October 27 | Speech recognition demo: upload a video, clip a segment, and run short-speech recognition on the extracted audio | | CSDN · WeChat Official Account |
| 4 | September 17 | Intelligent speech evaluation demo | | CSDN · WeChat Official Account |
| 5 | September 24 | Essay correction demo | | CSDN · WeChat Official Account |
| 6 | September 30 | Speech synthesis demo | | CSDN · WeChat Official Account |
| 7 | October 15 | Single-question photo search demo | | CSDN · WeChat Official Account |
| 8 | October 20 | Picture translation demo | | CSDN · WeChat Official Account |

II. Preparation before development

First of all, on your Youdao Wisdom Cloud personal page you need to create an instance, create an application, bind the application and the instance, and obtain the application's ID and key used to invoke the interface. The process of personal registration and application creation is described in detail in the article "Less than 100 lines of code to get Python to do OCR identification of ID cards, text and other fonts".
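For the code later on, it helps to keep the application ID and key in one shared place. A minimal sketch, assuming a config module; the module name and the placeholder values are mine, not from the article:

```python
# Hypothetical shared config module (e.g. app_config.py) holding the
# credentials obtained from the Youdao Wisdom Cloud console.
# Replace the placeholder values with your own application ID and key.
APP_KEY = 'your-application-id'
APP_SECRET = 'your-application-key'
```

Every module that talks to the API can then import these two names instead of repeating the credentials.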

III. The development process in detail

The following describes the specific code development process.

(I) Interface specification

First, let's analyze the input and output specification of the Youdao Wisdom Cloud API. According to the documentation, the interface is called as follows:

Youdao Speech Recognition API HTTPS address:

https://openapi.youdao.com/asrapi

Interface call parameters:

| Field | Type | Meaning | Required | Notes |
| --- | --- | --- | --- | --- |
| q | text | Base64-encoded string of the audio file to recognize | true | must be Base64-encoded |
| langType | text | Source language | true | see supported languages |
| appKey | text | Application ID | true | viewable in application management |
| salt | text | UUID | true | UUID |
| curtime | text | Timestamp (seconds) | true | number of seconds |
| sign | text | Signature, generated as MD5(application ID + q + salt + curtime + key) | true | MD5 value of appKey + q + salt + curtime + key |
| signType | text | Signature version | true | v2 |
| format | text | Audio file format | true | wav |
| rate | text | Sampling rate; 16000 recommended | true | 16000 |
| channel | text | Number of channels; only mono is supported | true | fixed value 1 |
| type | text | Upload type; only Base64 upload is supported | true | fixed value 1 |

q is the Base64-encoded audio file to be recognized. Note the constraint: "the uploaded audio must not exceed 120 s in length, and the file must not exceed 10 MB in size."
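As a small hedged sketch of that constraint (the function name and size check are mine, not the article's), checking the file size before Base64-encoding avoids sending a request that is doomed to fail:

```python
import base64
import os

def load_audio_base64(path, max_bytes=10 * 1024 * 1024):
    # The API caps uploads at 120 s of audio and 10 MB per file,
    # so reject oversized files before encoding.
    if os.path.getsize(path) > max_bytes:
        raise ValueError('audio file exceeds the 10 MB API limit')
    with open(path, 'rb') as f:
        return base64.b64encode(f.read()).decode('utf-8')
```

The 120 s duration limit still has to be enforced upstream (for example when clipping the video), since it cannot be read off the file size alone.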

The API's return value is simple:

| Field | Meaning |
| --- | --- |
| errorCode | Error code of the recognition result; always present. See the error-code list for details |
| result | Recognition result; present when recognition succeeds |
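On the caller's side, those two fields are enough to branch on. A minimal sketch (the helper name is mine, not from the article):

```python
def extract_text(response):
    # errorCode is always present; '0' signals success,
    # in which case 'result' holds the recognized text.
    if response.get('errorCode') == '0':
        return response.get('result')
    return None
```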

(II) Project development

This project is developed in Python 3 and consists of maindow.py, videoProcess.py and srbynetease.py:

maindow.py: the interface part, which uses Python's tkinter library to provide video file selection, time input boxes and a confirm button;

videoProcess.py: extracts the audio from the specified time interval of the video and processes the information returned by the API;

srbynetease.py: sends the processed audio to the short speech recognition API and returns the result.

1. Implementing the interface

Part of the interface code is shown below; it is relatively simple.

```python
root = tk.Tk()
root.title("netease youdao sr test")
frm = tk.Frame(root)
frm.grid(padx='50', pady='50')

btn_get_file = tk.Button(frm, text='Select video', command=get_file)
btn_get_file.grid(row=0, column=0, padx='10', pady='20')
path_text = tk.Entry(frm, width='40')
path_text.grid(row=0, column=1)

start_label = tk.Label(frm, text='Start time (s)')
start_label.grid(row=1, column=0)
start_input = tk.Entry(frm)
start_input.grid(row=1, column=1)

end_label = tk.Label(frm, text='End time (s)')
end_label.grid(row=2, column=0)
end_input = tk.Entry(frm)
end_input.grid(row=2, column=1)

sure_btn = tk.Button(frm, text='Start recognition', command=start_sr)
sure_btn.grid(row=3, column=0, columnspan=3)

root.mainloop()
```

The start_sr() event handler bound to sure_btn does some simple input checking and shows the final recognition result in a popup:

```python
def start_sr():
    print(video.video_full_path)
    if len(path_text.get()) == 0:
        sr_result = 'No file selected'
    else:
        video.start_time = int(start_input.get())
        video.end_time = int(end_input.get())
        sr_result = video.do_sr()
    tk.messagebox.showinfo("Result", sr_result)
```

2. Audio and video processing

(1) In videoProcess.py, I use Python's moviepy library to process the video: clip it to the specified start and end times, extract the audio, and convert it to Base64 as the API requires:

```python
def get_audio_base64(self):
    video_clip = VideoFileClip(self.video_full_path).subclip(self.start_time, self.end_time)
    audio = video_clip.audio
    result_path = self.video_full_path.split('.')[0] + '_clip.mp3'
    audio.write_audiofile(result_path)
    audio_base64 = base64.b64encode(open(result_path, 'rb').read()).decode('utf-8')
    return audio_base64
```

(2) The encoded audio is then passed to the wrapped Youdao Wisdom Cloud API calling method:

```python
def do_sr(self):
    audio_base64 = self.get_audio_base64()
    sr_result = srbynetease.connect(audio_base64)
    print(sr_result)
    if sr_result['errorCode'] == '0':
        return sr_result['result']
    else:
        return "Something wrong, errorCode:" + sr_result['errorCode']
```

3. Sending the data to the recognition API

The calling method wrapped in srbynetease.py is relatively simple: just "assemble" the data dict according to the API document and send it:

```python
def connect(audio_base64):
    data = {}
    curtime = str(int(time.time()))
    data['curtime'] = curtime
    salt = str(uuid.uuid1())
    signStr = APP_KEY + truncate(audio_base64) + salt + curtime + APP_SECRET
    sign = encrypt(signStr)
    data['appKey'] = APP_KEY
    data['q'] = audio_base64
    data['salt'] = salt
    data['sign'] = sign
    data['signType'] = "v2"
    data['langType'] = 'zh-CHS'
    data['rate'] = 16000
    data['format'] = 'mp3'
    data['channel'] = 1
    data['type'] = 1

    response = do_request(data)

    return json.loads(str(response.content, 'utf-8'))
```
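connect() relies on three helpers the article does not show: truncate(), encrypt(), and do_request(). A minimal sketch under stated assumptions: the truncation rule follows Youdao's published demo code, the MD5 digest follows the parameter table above, and requests is assumed as the HTTP client (since connect() reads response.content):

```python
import hashlib

API_URL = 'https://openapi.youdao.com/asrapi'

def truncate(q):
    # Shorten q before signing: the whole string if it has at most
    # 20 characters, otherwise first 10 chars + total length + last 10 chars.
    if q is None:
        return None
    size = len(q)
    return q if size <= 20 else q[0:10] + str(size) + q[size - 10:size]

def encrypt(sign_str):
    # The parameter table above describes the signature as an MD5 digest
    # of appKey + truncated q + salt + curtime + key.
    return hashlib.md5(sign_str.encode('utf-8')).hexdigest()

def do_request(data):
    # POST the assembled form data to the recognition endpoint.
    import requests  # assumed HTTP client; connect() reads response.content
    headers = {'Content-Type': 'application/x-www-form-urlencoded'}
    return requests.post(API_URL, data=data, headers=headers)
```

Note that because q can be megabytes of Base64 text, signing a truncated form keeps the signature string short while still binding it to the payload.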

IV. Results

Let's try it on a short excerpt from the first episode of Legend of Zhen Huan:

The result is decent; the small flaw in one sentence can be ignored. I didn't expect this short speech recognition API to handle both classical and modern speech: recognizing period-drama dialogue this smoothly is impressive!

V. Summary

This little experiment has opened the door to a new world. From today on, I can be an amateur subtitler who adds subtitles without typing. Next, I can try translating the recognized text into other languages.

Project address: github.com/LemonQH/SRF…