preface

One day, when we were editing the background, we said that every time we uploaded PPT, PDF and Word, we should first put out pictures for each file and then upload them one by one (PNG for preview, PPT, PDF and Word source files can not be downloaded directly, you have to pay). We said that it was too inefficient and asked if there was any way to just upload the file. At that time, I thought it would be inefficient to rotate every upload, because some exports might have dozens of images.

Finally, through GitHub and blogs. Finally, the automatic picture transfer problem is solved. The first time you write a Python script, you are welcome to point out any errors that are not elegant

The Python version 3.9.5 of this document requires a Windows platform and Microsoft Office installed

The script idea

Operation personnel upload PPT, PDF and Word to the database and read the script files. Remote connection -> Download to the local -> Transfer pictures -> upload to the cloud storage -> Obtain remote picture connection -> store to the database.

Connect to the database to query the collection to be moved

def connectDatabase() :
    conn = pymysql.connect(host='127.0.0.1', user='root', password="",database ='pic',port=3306)  
# host=localhost # can also be written, if 127.0.0.1 is not available # login database
    cur = conn.cursor(pymysql.cursors.DictCursor) 
    return {
       "conn":conn,
       "cur":cur
    }
Copy the code
Get the set of files to transfer
def getUrlArr(cur) :
    sql = 'select * from pic' Write your own SQL statement
    arr = ' '
    try:
        cur.execute(sql)
        ex = cur.execute(sql)
        arr = cur.fetchmany(ex)
    except Exception as e:
        raise e
    finally:
        return arr
Copy the code

Download the file to the local PC

Download the file locally
def downLoad(url) :
    print('----url-----',url)
    filename=' '
    try:
        suffix = os.path.basename(url).split('. ') [1]
        filename = "miaohui."+suffix
        if os.path.exists(filename):  Delete file if file exists
            os.remove(filename)
        wget.download(url,filename)
    except IOError:
        print('Download failed',url)
    else:
        print('\n')
        print('Download successful',url)
        return filename
Copy the code

PPT turn images

# pip install pywin32

Initialize PPT
def init_powerpoint() :
    powerpoint = win32com.client.Dispatch('PowerPoint.Application') #comtypes.client.CreateObject("Powerpoint.Application")
    powerpoint.Visible = 1
    return powerpoint
Copy the code
Turn the PNG # PPT
def ppt2png(url,pptFileName,powerpoint) :
    try:
        ppt_path = os.path.abspath(pptFileName)
        ppt = powerpoint.Presentations.Open(ppt_path)
        # Save as a picture
        img_path = os.path.abspath(downLoad_path + '.png')
        ppt.SaveAs(img_path, 18) Save # 17 as JPG
        # Close the open PPT file
        ppt.Close()
    except IOError:
        print('FAILED to convert PPT to PNG',url)
    else:
        print("PPT to PNG successfully",url)
Copy the code

Turn the PDF images

# pip install PyMuPDF
# PDF to image
def pdf2png(_url,pptFileName) :
    imagePath = os.path.abspath(downLoad_path)
    try:
        pdfDoc = fitz.open(pptFileName)
        for pg in range(pdfDoc.pageCount):
            page = pdfDoc[pg]
            rotate = int(0)
            # Each size has a scaling factor of 1.3, which will give us an image with a resolution improvement of 2.6.
            The default image size is 792X612, dpi=96
            zoom_x = 1.33333333  # (1.33333333 - > 1056 x816) (2 - > 1584 x1224)
            zoom_y = 1.33333333
            mat = fitz.Matrix(zoom_x, zoom_y).prerotate(rotate)
            pix = page.get_pixmap(matrix=mat, alpha=False)

            if not os.path.exists(imagePath):  # Check whether the folder where the image is stored exists
                os.makedirs(imagePath)  Create an image folder if it doesn't exist
            pix.save(imagePath + '/' + 'Slide %s.png' % pg)  # Write the image to the specified folder

    except IOError:
        print('Failed to convert PDF to PNG',_url)
    else:
        print("PDF converted to PNG successfully",_url)
Copy the code

Turn the word pictures

To convert word to picture, first convert Word to PDF, and then convert PDF to picture.

# word to Pdf
def word2pdf(word_file) :
    Param Word_file: word file: return: ""
    Get word format processing object
    word = Dispatch('Word.Application')
    Open file as Doc object
    doc_ = word.Documents.Open(word_file)
    Save as a PDF file
    suffix = os.path.basename(word_file).split('. ') [1]
    doc_.SaveAs(word_file.replace(suffix, "pdf"), FileFormat=17)
    print(word_file,'---- transfer to PDF successful ')
    Close the doc object
    doc_.Close()
    Exit the word object
    word.Quit()
    return os.path.basename(word_file).split('. ') [0] +'.pdf'
Copy the code

Then call pdf2png above

Upload to the object storage

Here will not post out, we use Huawei cloud OBS. Ali Cloud, Tencent cloud and other object storage have their own Python version SDK, access is also very convenient.

The last group is called together

if __name__=='__main__':
    connect = connectDatabase()
    powerpoint = init_powerpoint()
    downArr = getUrlArr(connect['cur'])
    for i in downArr:
        if(os.path.exists('/'+downLoad_path)):
            removeFileInFirstDir('/'+downLoad_path)
        _url = unquote(i['url'])
        id = i['id']
        pptFileName = downLoad(_url)# Download file
        if(('.pdf' in _url) ==True):
            pdf2png(_url,pptFileName)
        elif (('.doc' in _url) ==True):
            _file = os.path.abspath(pptFileName)
            pdfNmae = word2pdf(_file)
            pdf2png(_url,pdfNmae)
        else:   
             ppt2png(_url,pptFileName,powerpoint) Turn the PNG #
        imgArr = uploadImg(_url) # Upload images to cloud storage to get remote links
        setData(_url,id,imgArr,connect) Save to database
        time.sleep(2)
        print('\n')
        print('\n')
    connect['cur'].close()    Close the cursor
    connect['conn'].close()   Disconnect the database to release resources
    powerpoint.Quit()
    input("Enter any key to end")
Copy the code

Because it is their own internal use, so you can use PyInstaller packaged into an EXE, provided for operation, data upload under the operation, you can batch automatic transfer pictures.

# p y exe
pyinstaller -c -F -i a.ico ppt_to_img.py   
Copy the code

The last

I hope this article is of some help to you. If you have any questions, please feel free to correct them

Looking for interview questions? To the front-end interview question bank WX search advanced large front-end small procedures