PDF is a standard format widely used for exchanging documents between individuals and organizations. Popular as it is, it is not always the ideal choice for displaying content. On the web, for example, HTML usually gives a better user experience. If you want to display PDF content on your website, converting it to HTML can help. This article shows how to convert PDF documents to HTML using C++.

  • Convert PDF documents to HTML format using C++
  • Use C++ to convert PDF documents to HTML format using other options

Aspose.PDF for C++ is a C++ library that you can use to create, read, and update PDF documents. In addition, the API supports converting PDF files to HTML format.

Convert PDF documents to HTML format using C++

The Aspose.PDF for C++ API makes it easy to convert PDF documents to HTML format. The conversion itself takes only two lines of code. To convert a PDF document to HTML, follow these steps.

  • Load the PDF document using the Document class.
  • Save the HTML output using the Document->Save(System::String outputFileName, SaveFormat format) method.

The following sample code shows how to convert a PDF document to HTML using C++.

// Open the source PDF document
// (assumes using-directives for the System and Aspose::Pdf namespaces)
auto pdfDocument = MakeObject<Document>(u"SourceDirectory\\Sample 1.pdf");

// Save the HTML file
pdfDocument->Save(u"OutputDirectory\\output.html", SaveFormat::Html);

PDF source file

The output HTML file
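
The sample above hard-codes the output file name. When converting several files, it can be convenient to derive the HTML path from the PDF path instead. The following is a minimal sketch using only the C++ standard library; HtmlPathFor is a hypothetical helper name introduced here for illustration and is not part of the Aspose.PDF API.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not part of Aspose.PDF): builds an output path by
// replacing a trailing ".pdf" extension (matched case-insensitively) with
// ".html". If the path has no ".pdf" extension, ".html" is simply appended.
std::string HtmlPathFor(const std::string& pdfPath) {
    const std::string ext = ".pdf";
    if (pdfPath.size() >= ext.size()) {
        // Lower-case the last four characters before comparing.
        std::string tail = pdfPath.substr(pdfPath.size() - ext.size());
        for (char& c : tail) {
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        }
        if (tail == ext) {
            return pdfPath.substr(0, pdfPath.size() - ext.size()) + ".html";
        }
    }
    return pdfPath + ".html";
}
```

The returned path could then be supplied as the output file name when saving, after converting it to the string type the API expects.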

Use C++ to convert PDF documents to HTML format using other options

The Aspose.PDF for C++ API lets you customize the HTML generated by the conversion process. To do this, the API provides the HtmlSaveOptions class. Here are some of the options it offers.

  • FontSavingMode: Sets how fonts are saved during conversion. The FontSavingModes enumeration is used to set its value.
  • RasterImagesSavingMode: Sets how raster images are processed during conversion. The RasterImagesSavingModes enumeration is used to set its value.
  • LettersPositioningMethod: Sets how letters are positioned within words. The LettersPositioningMethods enumeration is used to set its value.
  • SpecialFolderForAllImages: Sets the path of the folder where images are saved.
  • SplitIntoPages: Sets whether each page of the PDF is converted to a separate HTML page or the entire document is saved as a single HTML file.
  • SplitCssIntoPages: When SplitIntoPages is set to true, sets whether CSS is saved as a single file or as a separate file per HTML page.

Here are the steps to convert a PDF document to HTML using other options.

  • Load the PDF document using the Document class.
  • Create an instance of the HtmlSaveOptions class.
  • Set the desired options.
  • Save the HTML output using the Document->Save(System::String outputFileName, System::SharedPtr<SaveOptions> options) method.

Here is C++ sample code that demonstrates how to customize HTML output using the HtmlSaveOptions class.

// Open the source PDF document
auto pdfDocument = MakeObject<Document>(u"SourceDirectory\\Sample 1.pdf");

// Create an instance of the HtmlSaveOptions class
SharedPtr<HtmlSaveOptions> options = MakeObject<HtmlSaveOptions>();

// Set the desired options
options->PartsEmbeddingMode = HtmlSaveOptions::PartsEmbeddingModes::EmbedAllIntoHtml;
options->LettersPositioningMethod = HtmlSaveOptions::LettersPositioningMethods::UseEmUnitsAndCompensationOfRoundingErrorsInCss;
options->RasterImagesSavingMode = HtmlSaveOptions::RasterImagesSavingModes::AsEmbeddedPartsOfPngPageBackground;
options->FontSavingMode = HtmlSaveOptions::FontSavingModes::SaveInAllFormats;

// Save the HTML file
pdfDocument->Save(u"OutputDirectory\\output.html", options);

If you have any questions or requirements, feel free to join the Aspose Technology Exchange Group (761297826). We are happy to answer inquiries and provide consultation.