Recently, the company changed the name of a project, so all the previous documents have to be modified.

Most documents need to be renamed, and most documents need to be changed by replacing the name of the document, and nothing else needs to be changed.

Operating file by file in this way is time-consuming and error-prone.

I happen to be learning Python right now, which is exactly what Python is good at. So I wrote a script file in Python to do all the work.

Depending on the document that needs to be modified, there are several things that need to be done:

  1. Lists all files in the current folder.
  2. Change the name of all files in the folder;
  3. Replace the old project name with the new project name in all doc files

Based on the above requirements, the script code uses the following libraries:

  1. OS: lists the file names in the folder
  2. Win32com: Save doc files as docx files
  3. Docx: Modifies the contents of a DOCx
  4. Delay time:

Here I explain each part of the code.

1. Introduce the library to use and define the two variables path and DOCx

Path is the folder where the file to be operated is located. Docx saves the doc file as the directory for saving the docx file

from docx import Document
import os
from win32com import client as wc
from time import sleep

path = 'D:/work' 
docx = 'D:/docx'

Copy the code

2. Define class Mydocx

class Mydocx: """ "Batch processingdocxThe file "" "def __init__(self.old_info.new_info) - >NoneDefine file paths, etc. ""self.old_info Self. new_info = new_info #docx self.path = STR (path) # replace the directory where the file is locatedCopy the code

3. Use OS to list all files in the directory

Since Python cannot read doc files and all files in the directory are in DOC format, you need to save all files as docX files first. The Mydocx class defines a method, get_files, that lists all doc files in the directory. This method uses the OS module’s listdir, which lists the names of all files in a directory.

    def get_files(self):
        """Get file list from folder path"""Files = [] # Get the list of files from the folder path and discard the files that are not in doc formatfor file in os.listdir(self.path):
            if file.endswith('.doc'):
                files.append(file)
            else:
                print(file)
                
        # self.doc2docx(files)
Copy the code

4, Win32com used to save doc format files as docX format files

Use Win32com to save all files as docX format files, and save another directory docx (D:/docx) under.

    def doc2docx(self, files):
        """Doc conversion docx"""Loop save the file to docX formatfor file_name in files:

            word = wc.Dispatch('Word.Application'Docx_name = file_name.split()'. ') [0] +'.docx'
            path = u'%s/' % self.path + file_name
            
            docx_path = u'%s/'% docx + docx_name # if a file with the same name exists, the loop will be brokenif os.path.exists(docx_path):
                print('exists')
                continueDoc = word.documents.Open(path) # Doc.SaveAs(docx_path,12)
            doc.Close()
            word.Quit()
            sleep(2)
Copy the code

5. Loop through the file and modify its contents

Read the target file and replace its contents.

    def read(self, files):
        """Read all files"""
        for file in files:

            doc = Document(self.path+'/'+file)
            self.update(doc)
            doc.save(self.path+'/'+file)
            print("%s replacement completed" % file)

    def update(self, doc):
        """Replace file contents"""Read all paragraphs and replace themfor p in doc.paragraphs:
            for run inP.runs: run.text = run.text. Replace (self.old_info, self.new_infofor table in doc.tables:
            for row in table.rows:
                for cell inRow.cells: cell.text = cell.text.replace(self.old_info, self.new_info) # Replace informationCopy the code

6. Change the file name

    def rename(self, files):
        """Batch change file name"""

        for file_name in files:
            oldname = '%s/' % self.path + file_name
            // Remove Spaces from files
            # newname = '%s/' % self.path + \
            #     file_name.replace(' '.' ')
            newname = '%s/'% self.path + \ file_name.replace(self.old_info, self.new_info) os.rename(oldname, newname) # rename the file using the rename method in OS moduleCopy the code

Here is the complete code:

from docx import Document
import os
from win32com import client as wc
from time import sleep

path = 'D:/work'
docx = 'D:/docx'

class Mydocx: """ "Batch processingdocxThe file "" "def __init__(self.old_info.new_info) - >NoneDefine file paths, etc. ""self.old_info = old_info
        self.new_info = new_info
        self.path = str(path)

    def doc2docx(self, files):
        """Doc conversion docx"""

        for file_name in files:

            word = wc.Dispatch('Word.Application')

            docx_name = file_name.split('. ') [0] +'.docx'
            path = u'%s/' % self.path + file_name
            docx_path = u'%s/' % docx + docx_name
            if os.path.exists(docx_path):
                print('exists')
                continuePrint (path) print(docx_path) doc = word.documents.Open(path)12)
            doc.Close()
            word.Quit()
            sleep(2)

    def rename(self, files):
        """Batch change file name"""

        for file_name in files:
            oldname = '%s/' % self.path + file_name
            # newname = '%s/' % self.path + \
            #     file_name.replace(' '.' ')
            newname = '%s/'% self.path + \ file_name.replace(self.old_info, self.new_info) os.rename(oldname, Def get_files(self) def get_files(self)"""Get file list from folder path"""Files = [] # Get the list of files from the folder path and discard files that are not in DOCX formatfor file in os.listdir(self.path):
            if file.endswith('.docx'):
                files.append(file)
            else:
                print(file)
        print(len(files))
        # self.rename(files)
        # self.doc2docx(files)
        self.read(files)

    def read(self, files):
        """Read all files"""
        for file in files:

            doc = Document(self.path+'/'+file)
            self.update(doc)
            doc.save(self.path+'/'+file)
            print("%s replacement completed" % file)

    def update(self, doc):
        """Replace file contents"""Read all paragraphs and replace themfor p in doc.paragraphs:
            for run inP.runs: run.text = run.text. Replace (self.old_info, self.new_infofor table in doc.tables:
            for row in table.rows:
                for cell inCells: cell.text = cell.text. Replace (self.old_info, self.new_info) # replace information mydoc = Mydocx('Original project name'.'Current project name')
mydoc.get_files()
Copy the code