
You can convert PDF to Word in WPS or Office, but only the first 5 pages can be converted for free. Here’s a Python office hack: batch PDF-to-Word conversion, so you can convert as many pages as you want.

Python’s pdfminer3k library is used to extract the PDF content, and the python-docx library is used to save that content to Word.
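Since the goal is batch conversion, the overall flow can be sketched as a loop over a folder of PDF files. This is a minimal sketch, not the author’s code: batch_convert and its convert_one callback are hypothetical names, and the callback stands in for the pdfminer3k extraction and python-docx saving steps covered later in the article.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def batch_convert(
    folder: str,
    convert_one: Callable[[Path, Path], None],
) -> List[Tuple[Path, Path]]:
    """Call convert_one(pdf_path, docx_path) for every PDF file in folder.

    convert_one is a hypothetical callback that does the real work
    (extract with pdfminer3k, save with python-docx).
    Returns the (pdf, docx) pairs that were processed.
    """
    pairs: List[Tuple[Path, Path]] = []
    # sorted() gives a predictable processing order
    for pdf_path in sorted(Path(folder).iterdir()):
        # Accept .pdf and .PDF alike; skip sub-folders and other files
        if not pdf_path.is_file() or pdf_path.suffix.lower() != ".pdf":
            continue
        # Same base name, new extension: report.pdf -> report.docx
        docx_path = pdf_path.with_suffix(".docx")
        convert_one(pdf_path, docx_path)
        pairs.append((pdf_path, docx_path))
    return pairs
```

Passing a callback that only prints, such as lambda src, dst: print(src, "->", dst), gives a dry run that lists which files would be converted. Taking the converter as a parameter keeps the folder-walking logic independent of the PDF and Word libraries.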

Here’s what the result looks like:

01 Environment Preparation

Before we start writing the code, let’s install the Python libraries we need:

pip install pdfminer3k

Note:

If you install docx with pip install docx, importing it fails with:

ModuleNotFoundError: No module named ‘exceptions’

The correct command is:

pip install python-docx
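To confirm that the right packages ended up installed, a small script can try importing each module and report the error text instead of crashing. This sketch assumes the import names are pdfminer (provided by pdfminer3k) and docx (provided by python-docx); check_modules is a hypothetical helper name. Actually importing the module, rather than only locating it, is what surfaces the “No module named ‘exceptions’” error from the wrong docx package.

```python
import importlib
from typing import Dict, Iterable, Optional


def check_modules(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Try to import each module; map name -> None on success, else the error text."""
    results: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # e.g. ModuleNotFoundError raised inside the wrong "docx" package
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    # Import names assumed for pdfminer3k and python-docx
    for name, error in check_modules(["pdfminer", "docx"]).items():
        print(name, "OK" if error is None else error)
```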

02 Extracting the PDF content

1. Import the required libraries

from pdfminer.pdfparser import PDFParser, PDFDocument

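Explanation: PDFParser reads the raw content of the PDF file, and PDFDocument stores the parsed document; in pdfminer3k the two are linked to each other before any page can be read.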
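For reference, a typical import set for the page-by-page pdfminer3k workflow plus python-docx is sketched below. The module layout is assumed from pdfminer3k 1.3.x and may not match the author’s exact list; the LIBS_AVAILABLE flag is a hypothetical name, and the try/except guard lets the script report a missing package instead of stopping with a traceback.

```python
# Typical imports for extracting text page by page with pdfminer3k
# (module layout assumed from pdfminer3k 1.3.x) and writing with python-docx.
try:
    from pdfminer.pdfparser import PDFParser, PDFDocument  # parse the file / hold the document
    from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter  # shared resources / page interpreter
    from pdfminer.converter import PDFPageAggregator  # collects each page's layout objects
    from pdfminer.layout import LAParams, LTTextBoxHorizontal  # layout settings / text-box type
    from docx import Document  # python-docx entry point
    LIBS_AVAILABLE = True
except ImportError as exc:
    # ModuleNotFoundError is a subclass of ImportError, so both are caught here
    LIBS_AVAILABLE = False
    print(f"Missing dependency: {exc}")
```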

2. Read the PDF content

Before you start reading, take a look at the PDF:

Chen Ge created a two-page PDF file that lists his original articles, grouped by topic.

The reading code opens the PDF file, parses it, and exposes its pages through doc.get_pages(), which yields them one at a time.
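The reading step can be sketched as follows, assuming the pdfminer3k API in which PDFDocument lives in pdfminer.pdfparser and must be linked to its PDFParser before initialize() is called. open_pdf_document and is_pdf_file are hypothetical helper names; the header check is an extra safeguard that is not part of the original article.

```python
def is_pdf_file(path: str) -> bool:
    """Cheap sanity check: real PDF files start with the bytes %PDF-."""
    with open(path, "rb") as f:
        return f.read(5) == b"%PDF-"


def open_pdf_document(fp):
    """Parse an open binary file with pdfminer3k and return the PDFDocument.

    Assumes the pdfminer3k API. The caller must keep fp open while pages
    are read, because objects are loaded from the file on demand.
    """
    from pdfminer.pdfparser import PDFParser, PDFDocument  # imported lazily

    parser = PDFParser(fp)
    doc = PDFDocument()
    # The parser and the document must be linked in both directions
    parser.set_document(doc)
    doc.set_parser(parser)
    doc.initialize()  # uses an empty password by default
    if not doc.is_extractable:
        raise ValueError("this PDF does not allow text extraction")
    return doc
```

A typical call opens the file with open(path, "rb") in a with block and iterates doc.get_pages() inside that block, so the file stays open for the whole read.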

The loop then extracts the text of each page and prints it out page by page.
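The loop can be sketched like this, assuming doc is an already-initialized pdfminer3k PDFDocument and the pdfminer3k PDFResourceManager / PDFPageAggregator / PDFPageInterpreter API. The helper names extract_page_texts, tidy_lines, and print_pages are hypothetical; only horizontal text boxes are kept, so text inside images or figures is skipped.

```python
from typing import Iterable, List


def tidy_lines(texts: Iterable[str]) -> List[str]:
    """Strip surrounding whitespace from each text box and drop empty ones."""
    return [t.strip() for t in texts if t.strip()]


def extract_page_texts(doc) -> List[List[str]]:
    """Return the text boxes of every page in an initialized pdfminer3k PDFDocument."""
    from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
    from pdfminer.converter import PDFPageAggregator
    from pdfminer.layout import LAParams, LTTextBoxHorizontal

    rsrcmgr = PDFResourceManager()
    # The aggregator collects layout objects instead of writing text to a stream
    device = PDFPageAggregator(rsrcmgr, laparams=LAParams())
    interpreter = PDFPageInterpreter(rsrcmgr, device)

    pages: List[List[str]] = []
    for page in doc.get_pages():
        interpreter.process_page(page)
        layout = device.get_result()  # layout of the page just processed
        boxes = [obj.get_text() for obj in layout if isinstance(obj, LTTextBoxHorizontal)]
        pages.append(tidy_lines(boxes))
    return pages


def print_pages(pages: List[List[str]]) -> None:
    """Print each page's text under a simple page header."""
    for number, lines in enumerate(pages, start=1):
        print(f"--- page {number} ---")
        print("\n".join(lines))
```

Returning the text as a list of pages, rather than printing inside the loop, lets the same result be printed here and written to Word in the next step.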

03 Save the content to Word

Now that the PDF content has been extracted, the next step is to save it into Word.

While traversing the PDF content, write each piece of text into the Word document, then save the document under the name Python researcher-cheng.docx.
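The saving step can be sketched with python-docx’s Document, add_paragraph, add_page_break, and save. This is a minimal sketch, not the author’s code: save_to_word and xml_safe are hypothetical names, and pages is assumed to be a list of pages, each a list of text strings. Text extracted from PDFs sometimes contains control characters that lxml (which python-docx uses) rejects with a ValueError, so they are stripped first.

```python
import re
from typing import List

# Control characters that XML 1.0 does not allow; tab (\x09),
# newline (\x0a) and carriage return (\x0d) are permitted and kept.
_XML_INVALID = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def xml_safe(text: str) -> str:
    """Remove characters that would make python-docx raise ValueError."""
    return _XML_INVALID.sub("", text)


def save_to_word(pages: List[List[str]], out_path: str) -> None:
    """Write each text box as a paragraph, one PDF page per Word page, then save."""
    from docx import Document  # python-docx, imported lazily

    document = Document()
    for index, lines in enumerate(pages):
        if index:
            # Keep the original page boundaries in the Word file
            document.add_page_break()
        for line in lines:
            document.add_paragraph(xml_safe(line))
    document.save(out_path)
```

For the file name used in this article, the call would be save_to_word(pages, "Python researcher-cheng.docx"). Dropping the add_page_break() call produces one continuous document instead.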

04 Summary

To make learning easier, Chen Ge has uploaded the complete source code for this article. To get it, reply “PDF conversion” in the backend of his WeChat official account.

In this article, Chen Ge explained how to use Python to batch-convert PDF files to Word. If anything is unclear, leave a comment below and let’s discuss it together.