Word to PDF on Windows and Linux platforms

  • Preface
  • Convert Word to PDF for Windows platform
  • Linux Platform Word to PDF
  • References

Preface

After some investigation, I found many solutions online. On Windows, Word documents can be converted to PDF perfectly, but most services have to run on Linux, and the Windows-based methods cannot be ported to Linux. In my tests, most existing methods for converting Word to PDF on Linux produce a PDF whose layout differs from the Word document (for example, different pagination or more tightly packed text). The main goal of this article is therefore to solve the inconsistent layout after a Word document is converted to PDF on Linux. The conclusion first: for most of the test documents the layout is essentially the same after conversion, although a few inconsistencies remain. The Java project was also packaged as a JAR, and calling it from the command line on Linux passed testing.

Convert Word to PDF for Windows platform

import os
import sys

from win32com.client import Dispatch  # python -m pip install pypiwin32


def word2pdf(file_path):
    # file_path is appended to the script directory, so it should start with a path separator
    path = sys.path[0] + file_path
    word = Dispatch('Word.Application')
    word.Visible = 0  # run in the background, do not show the window
    word.DisplayAlerts = 0  # suppress warnings
    try:
        doc = word.Documents.Open(FileName=path)
        # FileFormat: TXT = 4, HTML = 10, docx = 16, PDF = 17
        doc.SaveAs(os.path.splitext(path)[0] + '_pdfed.pdf', 17)
        doc.Close()
    finally:
        word.Quit()
    print("Word2Pdf conversion completed; the PDF is in the same directory as the Word file.")


if __name__ == '__main__':
    # word2pdf('/Documents/1-Technical Service contract.docx')
    file_path = sys.argv[1]
    print(file_path)
    word2pdf(file_path)

win32com drives Microsoft Word itself, so it runs only on Windows; there is no comparable dependency on Linux.

Linux Platform Word to PDF

1. Install unoconv with: yum install unoconv. Usage: unoconv -f pdf xxx.docx

2. Install cups-pdf with: sudo apt-get install cups-pdf. Usage: oowriter -convert-to pdf:writer_pdf_Export xxx.docx

3. Test with LibreOffice: soffice --headless --invisible --convert-to pdf xxx.docx

4. Convert Word to PDF in Java with a cracked copy of aspose-words-15.8.0. (Recommended: with this method the Word layout changes the least after conversion to PDF.)
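The LibreOffice command in item 3 can also be driven from a script. The following is a minimal Python sketch, assuming LibreOffice is installed and soffice is on the PATH. The helper names build_soffice_command and convert_with_soffice are hypothetical, and --outdir is the LibreOffice option that selects the output directory. The sketch only wraps the command; it does nothing about the layout differences described in the preface.

```python
import shutil
import subprocess
from pathlib import Path


def build_soffice_command(docx_path, out_dir, soffice="soffice"):
    """Return the argument list for a headless LibreOffice PDF conversion."""
    return [
        soffice,
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(docx_path),
    ]


def convert_with_soffice(docx_path, out_dir, timeout=120):
    """Run the conversion and return the path where the PDF is expected.

    Hypothetical helper: it assumes LibreOffice names the output after the
    input file, with the extension replaced by .pdf.
    """
    if shutil.which("soffice") is None:
        raise FileNotFoundError("soffice not found on PATH; is LibreOffice installed?")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if soffice exits with an error
    subprocess.run(build_soffice_command(docx_path, out_dir), check=True, timeout=timeout)
    return out_dir / (Path(docx_path).stem + ".pdf")
```

A call such as convert_with_soffice("xxx.docx", "out") would write the PDF into the out directory. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.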

Eclipse project directory structure (the complete project source code and the packaged JAR are provided at the end of this article; the JAR can be called directly for testing):

The JAR must be placed in the same directory as the Word document to be converted, and the generated PDF can only be written to that same directory. This is related to how Java's getResourceAsStream is used.

getResourceAsStream can only read files located inside the project's source folder, that is, the project's src root or anywhere inside a class package. It does not work if the file is in a folder outside the source folder. Quoted from: www.iteye.com/blog/riddic…

With getResourceAsStream, the document is loaded as follows: file_path = "C:\\Users\\16616\\Desktop\\xxx.docx"; word = TestWord.class.getClassLoader().getResourceAsStream(file_path); Document doc = new Document(word); Loading the document directly from its path looks like this instead: Document doc = new Document(file_path);

Finally, the source code:

package com.demo;

import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;

import com.aspose.words.Document;
import com.aspose.words.License;
import com.aspose.words.SaveFormat;

/**
 * Because Aspose uses a lot of memory, large files can cause an overflow, so set the
 * Java VM parameters first: -Xms512m -Xmx512m (for reference).<br>
 * If you have any questions, please leave a message on the CSDN download page,
 * or contact QQ569925980.<br>
 *
 * @author Spark
 */
public class TestWord {

    /**
     * Load the Aspose license.
     *
     * @return true if the license was applied
     */
    public static boolean getLicense() {
        boolean result = false;
        // license.xml is read from the classpath
        try (InputStream license = TestWord.class.getClassLoader().getResourceAsStream("license.xml")) {
            License aposeLic = new License();
            aposeLic.setLicense(license);
            result = true;
        } catch (Exception e) {
            e.printStackTrace();
        }
        return result;
    }

    /**
     * Convert a Word document to PDF.
     *
     * @param file_path path of the Word document
     * @param save_path directory in which the PDF is saved
     */
    public static void word2pdf(String file_path, String save_path) {
        // Validate the license first
        if (!getLicense()) {
            return;
        }

        try {
            long old = System.currentTimeMillis();
            // Load the document directly from its path.
            // Alternative: new Document(stream), with the stream obtained from getResourceAsStream(file_path).
            Document doc = new Document(file_path);

            // The PDF is named after the Word file, so extract the file name from file_path
            String fileName = new File(file_path.trim()).getName();
            System.out.println("fileName = " + fileName);

            // Keep the part of the file name before the first dot
            String[] tmp = fileName.split("\\.");
            String pdf_name = tmp[0] + ".pdf";
            System.out.println("pdfName = " + pdf_name);

            File file = new File(save_path + "/" + pdf_name);
            // try-with-resources closes the output stream even if saving fails
            try (FileOutputStream fileOS = new FileOutputStream(file)) {
                doc.save(fileOS, SaveFormat.PDF);
            }

            long now = System.currentTimeMillis();
            System.out.println("Total time: " + ((now - old) / 1000.0) + " seconds\n\n"
                    + "File saved at: " + file.getPath());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /**
     * @param args args[0] is the Word file path, args[1] is the PDF output directory
     */
    public static void main(String[] args) {
        String file_path = args[0];
        String save_path = args[1];
        word2pdf(file_path, save_path);

        // word2pdf("C:\\Users\\16616\\Desktop\\XXX.docx", "C:/Users/16616/Desktop");
    }
}

Package the Java project together with its third-party JARs, then call the resulting JAR from the command line with arguments:

1. First, in Eclipse, package the project that already contains the third-party JARs:

Choose File > Export…, then Java > Runnable JAR file, select the class that contains main() as the class to run, and choose "Extract required libraries into generated JAR".

2. Locate the packaged JAR and place the Word file to be converted in the same directory as the JAR. The converted PDF can be placed in the same directory as the Word file.

Call the JAR from the command line as follows:

java -jar word2pdf.jar [file name] [PDF destination path]

Finally, the PDF file is generated in this folder.
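To convert several documents in one go, the JAR can be called once per file from a script. The following is a minimal Python sketch, assuming the JAR is named word2pdf.jar as in the command above, that java is on the PATH, and that the script is run from the directory containing the JAR and the Word files. The helper names build_jar_command and convert_directory are hypothetical.

```python
import subprocess
from pathlib import Path


def build_jar_command(docx_path, pdf_dir, jar="word2pdf.jar"):
    """Return the argument list: java -jar <jar> <file name> <PDF destination path>."""
    return ["java", "-jar", str(jar), str(docx_path), str(pdf_dir)]


def convert_directory(src_dir, pdf_dir, jar="word2pdf.jar"):
    """Call the JAR once for every .docx file in src_dir.

    Returns the names of the files that were passed to the JAR.
    """
    converted = []
    for docx in sorted(Path(src_dir).glob("*.docx")):
        # check=True raises CalledProcessError if the Java process exits with an error
        subprocess.run(build_jar_command(docx, pdf_dir, jar), check=True)
        converted.append(docx.name)
    return converted
```

The argument order (Word file first, destination directory second) follows the main method of the Java project, which reads args[0] and args[1]. Starting one JVM per file keeps the sketch simple, at the cost of JVM startup time for each document.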

How to call a Java project from the command line (or from a BAT file) with arguments

  1. Without arguments: open the directory that contains the JAR and run java -jar xxx.jar
  2. With arguments: open the directory that contains the JAR and run java -jar xxx.jar argument1 argument2 ... (arguments are separated by spaces). They correspond to args in the main function:

public static void main(String[] args) {
    String sourcePath = args[0]; // argument 1
    String targetPath = args[1]; // argument 2
}

The project's source code, JAR package, and usage instructions are available through the author's public account; reply with the keyword [word2PDF] in the account's chat to get them.

References

[1] How-to-convert-word-doc-to-pdf-in-Linux
[2] Export a Java project as a JAR package + export third-party JAR packages + call from the command line + pass parameters
[3] Aspose words-15.8.0
[4] How to export the referenced JAR packages and the project itself into a JAR file