“Life is too short, learn Python.” Because of this slogan, I joined the great army of Python learners, but because Python can do so many things, I was confused for a while and did not know which direction to focus on.

After trying various directions, I settled on Web development, which is broad rather than deep. Python Web development is naturally inseparable from the famous Django. On a whim I once downloaded the PDF version of the Django documentation, hoping it would be easier to consult, but I was surprised to find that the PDF was available only in English and ran to nearly 1,900 pages. Free translation tools come with many limitations: for example, free users can translate at most 5 pages at a time, and the document size cannot exceed a limit of 5 MB to 10 MB.

What to do? Would a Python learner be scared of this?

Split it. Splitting is the best solution. Do it by hand? No: learn the PyPDF2 module. You will pick it up quickly, and you will understand what “life is short, I use Python” really means.

Install the PyPDF2 module

The module name is strictly case sensitive: the y is lowercase and the other letters are uppercase.

pip3 install PyPDF2

After the installation is complete, create a folder on the local hard disk to store the project. Mine is F:\Python\PyPDF2: I name the folder after the module to keep it separate and to distinguish it from other projects.

Create the file and prepare the PDF document

Download the Django documentation PDF from the official Django website. At more than 1,900 pages, it is large enough for practice. Then create a project file named pdfcF.py in the project folder.

Everything is ready to go

The first line tells the system which program should run the file, and the second line is a comment describing the file. Its purpose may not be obvious yet, but if you know how to launch programs quickly from batch files, you will see what it is for.

#! python
# pdfcF.py - PDF file splitter

How to split the document

Rather than fixing how many parts to split into, we fix how many pages each part contains and compute the number of parts dynamically. With the approach settled, the next step is to write down the formula.

Number of parts = total pages of the document / pages per split PDF

Here’s an example:

Suppose we want to split a PDF document of 35 pages into new documents of 10 pages each. The number of parts works out as follows:

35 / 10 = 3.5

Notice the fractional part, 0.5. What does it mean? In this example the document splits into three full parts with five pages left over. So whenever there is a remainder, however small, we round up by one to complete the split. As a result, the first three documents have 10 pages each, and the fourth document holds the last five pages. If the division is exact, the quotient itself is the number of parts.

Python split formula:

if 35 % 10:                # Determine if there is a remainder
    parts = 35 // 10 + 1   # Take the integer quotient and add 1
else:
    parts = 35 // 10       # If divisible, the quotient is the number of parts

# Write this conditional on one line
parts = 35 // 10 + 1 if 35 % 10 else 35 // 10   # parts is 4
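The round-up rule above can be wrapped in a small helper. This is only a minimal sketch: the function name count_parts is hypothetical and is not part of the original script. The last line compares the result with math.ceil, which rounds up in the same way.

```python
import math


def count_parts(total_pages, pages_per_part):
    """Return how many files are needed when each holds pages_per_part pages."""
    quotient, remainder = divmod(total_pages, pages_per_part)
    # Any leftover pages need one extra file
    return quotient + 1 if remainder else quotient


print(count_parts(35, 10))  # three full parts plus one for the last 5 pages
print(count_parts(30, 10))  # divides exactly, so the quotient is the answer
print(count_parts(35, 10) == math.ceil(35 / 10))
```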

How exactly do we split?

Again, take the 35-page document as the example:

Page numbers come from range(35), so they run from 0 to 34. Each split document takes the following range of pages:

  1. The first document is from 0 to 10, not including 10
  2. The second document is from 10 to 20, not including 20
  3. The third document is from 20 to 30, not including 30
  4. The fourth document is from 30 to 35, not including 35

A pattern emerges. The first number of each range is the number of pages per document multiplied by the position of the document counted from 0. The second number looks irregular at first, but on closer inspection it follows a rule too: if we number the split documents from 1 to 4, the second number is the current document number times the number of pages per document (fixed at 10 here).

Counting from 0 for the start and from 1 for the end means num cannot serve both, so let's number the documents from 1 throughout. There are 4 documents, and because range does not include its end value, range(1, 4) would stop at 3, so we add 1 to it, which becomes

  1. for num in range(1, 4+1)
  2. The first document runs from 10*(1-1) to 10*1, excluding 10
  3. The second document runs from 10*(2-1) to 10*2, excluding 20
  4. The third document runs from 10*(3-1) to 10*3, excluding 30
  5. The fourth document runs from 10*(4-1) to 35, excluding 35

The detailed traversal code is as follows:

for num in range(1, 4 + 1):
    for i in range(10 * (num - 1), 10 * num if num != 4 else 35):
        pass

Note: When num = 4 (the number of the last document), the end of the range is simply the total number of pages (35), and the traversal is complete. Why 35 here instead of 35+1? Because page numbers in the inner loop start at 0, the last page has index 34, so range(30, 35) already reaches it and there is no need to add 1.
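To check the ranges without opening a PDF, the same rule can be put in a generator. This is a sketch under assumptions: the name page_ranges is hypothetical, and the number of parts is rounded up with integer arithmetic instead of the conditional expression shown earlier.

```python
def page_ranges(total_pages, pages_per_part):
    """Yield (part_number, first_index, end_index); end_index is excluded."""
    parts = -(-total_pages // pages_per_part)  # round up without floats
    for num in range(1, parts + 1):
        start = pages_per_part * (num - 1)
        # The last part stops at the total page count, not at a full multiple
        stop = pages_per_part * num if num != parts else total_pages
        yield num, start, stop


for num, start, stop in page_ranges(35, 10):
    print(num, start, stop)
```

For the 35-page example this walks through the four documents listed above, with the fourth ending at 35.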

The complete splitting program:

import PyPDF2

# Open a readable PDF object
# (PdfFileReader, PdfFileWriter, numPages, getPage and addPage are the names in PyPDF2 releases before 3.0)
pdfReader = PyPDF2.PdfFileReader('django.pdf')
# Get the total number of PDF pages
pdfnums = pdfReader.numPages
# How many pages each split document contains
innumber = 100
# Calculate the number of split documents, rounding up when there is a remainder
outnums = pdfnums // innumber + 1 if pdfnums % innumber else pdfnums // innumber

for num in range(1, outnums + 1):
    # Create a blank PDF
    pdfWriter = PyPDF2.PdfFileWriter()
    # Extract the specified page range; the last document ends at the total page count
    for pageNum in range(innumber * (num - 1), innumber * num if num != outnums else pdfnums):
        # Get the content of each page
        pageObj = pdfReader.getPage(pageNum)
        # Add the page to the blank document object created in the outer loop
        pdfWriter.addPage(pageObj)
    # Save to a local file, naming each document by its number
    with open('PDFREAD %s' % num + '.pdf', 'wb') as pdfOutputFile:
        pdfWriter.write(pdfOutputFile)
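The PdfFileReader and PdfFileWriter names belong to PyPDF2 releases before 3.0. If a newer PyPDF2 is installed, the same steps can be sketched with PdfReader, PdfWriter, reader.pages and add_page. The function names split_points and split_pdf are assumptions for illustration, and the ranges are built with a stepped range and min() rather than the conditional expression used above.

```python
def split_points(total_pages, pages_per_part):
    """Return (start, stop) index pairs, stop excluded, covering every page."""
    return [(start, min(start + pages_per_part, total_pages))
            for start in range(0, total_pages, pages_per_part)]


def split_pdf(path, pages_per_part=100):
    """Split the PDF at path into files of at most pages_per_part pages."""
    # Imported here so split_points can be tried without PyPDF2 installed
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(path)
    ranges = split_points(len(reader.pages), pages_per_part)
    for num, (start, stop) in enumerate(ranges, 1):
        writer = PdfWriter()
        for index in range(start, stop):
            writer.add_page(reader.pages[index])
        # Same naming pattern as the script above
        with open('PDFREAD %s.pdf' % num, 'wb') as output_file:
            writer.write(output_file)


print(split_points(35, 10))
# split_pdf('django.pdf', 100)  # run this once django.pdf is in the folder
```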

Note: If you are comfortable with list slicing and step sizes in Python, none of this needs to be so complicated. Just build one big list of all the page numbers and slice it into smaller lists. The page range of each split PDF then runs from the first number of each small list to its last number + 1. I have also posted the code I wrote with the list method for your reference.

Splitting the PDF with list slicing
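The slicing idea in this note can be tried on a small scale first. The sketch below assumes the 35-page example with 10 pages per part; the variable names are chosen here for illustration.

```python
pages = list(range(35))  # page indexes 0..34
n = 10                   # pages per part

# Step through the list n items at a time and slice out each part
chunks = [pages[i:i + n] for i in range(0, len(pages), n)]

for chunk in chunks:
    # First index of the part, and last index + 1 as the excluded end
    print(chunk[0], chunk[-1] + 1)
```

A slice that runs past the end of the list is simply cut short, which is why the last part needs no special case.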

#! python
# pdfcf.py - PDF file splitter

import PyPDF2
# import LISTCF

# Open a readable PDF object (PyPDF2 API from releases before 3.0)
pdfReader = PyPDF2.PdfFileReader('django.pdf')
# Get the total number of PDF pages
pdfnums = pdfReader.numPages

# Put all the page numbers into one list
pagenum_list = list(range(pdfnums))

# Number of pages in each split document
n = 100

# Divide the full page list into small lists of n pages each
page_list = [pagenum_list[i:i + n] for i in range(0, len(pagenum_list), n)]

for i in range(len(page_list)):
  # Create a blank PDF
  pdfWriter = PyPDF2.PdfFileWriter()
  # Extract the pages from the first to the last number of this small list
  for pageNum in range(page_list[i][0], page_list[i][-1] + 1):
    pageObj = pdfReader.getPage(pageNum)
    pdfWriter.addPage(pageObj)

  with open('PDFREAD %s' % i + '.pdf', 'wb') as pdfOutputFile:
    pdfWriter.write(pdfOutputFile)

How do you run it?

In the project folder, hold down the Shift key, right-click, select "Open command window here", type pdfcf.py, and press Enter. Change n in the script to suit your needs.

Final words

To share my way of learning: when writing a program, try not to start coding right away. Think the approach through first, which avoids getting stuck halfway through writing. This program is not perfect. It could also let you specify the number of parts and compute automatically how many pages each one contains, or extract only a chosen range of pages. I leave these two requirements for you to think through and complete. I will post my own code and ideas so that we can learn from each other. Writing this was not easy, so you are welcome to follow, share, and comment.

Thank you nuggets for providing the platform!