Stream Excel files

When we want to export database data to Excel file, if the data volume is very large, it may need to consume more memory, resulting in OOM. Even without OOM, it is possible to time out requests because it takes too long to generate Excel files. This would require the POI SXSSF (. Org. Apache POI. XSSF. Streaming) function.

Excel two formats

Excel 97(-2007) file format
Excel 2007 OOXML (.xlsx) file format

HSSF is the POI Project’s pure Java implementation of the Excel ’97(-2007) file format. XSSF is the POI Project’s pure Java implementation of the Excel 2007 OOXML (.xlsx) file format.

HSSF and XSSF provides ways to read spreadsheets create, modify, read and write XLS spreadsheets. They provide:

low level structures for those with special needs
an eventmodel api for efficient read-only access
a full usermodel api for creating, reading and modifying XLS files

Since 3.8-beta3, POI provides a low-memory footprint SXSSF API built on top of XSSF.

SXSSF is an API-compatible streaming extension of XSSF to be used when very large spreadsheets have to be produced, and heap space is limited. SXSSF achieves its low memory footprint by limiting access to the rows that are within a sliding window, while XSSF gives access to all rows in the document. Older rows that are no longer in the window become inaccessible, as they are written to the disk.

In auto-flush mode the size of the access window can be specified, to hold a certain number of rows in memory. When that value is reached, the creation of an additional row causes the row with the lowest index to to be removed from the access window and written to disk. Or, the window size can be set to grow dynamically; it can be trimmed periodically by an explicit call to flushRows(int keepRows) as needed.

Due to the streaming nature of the implementation, there are the following limitations when compared to XSSF:

Only a limited number of rows are accessible at a point in time.
Sheet.clone() is not supported.
Formula evaluation is not supported

SXSSF

How does SXSSF reduce memory consumption? It reduces memory usage by writing data to temporary files, reducing the probability of OOM errors.

// turn off auto-flushing and accumulate all rows in memory
SXSSFWorkbook wb = new SXSSFWorkbook(-1); 
Copy the code

You can also specify -1 in the constructor to turn off automatic writing to files and keep all data contents in memory.

Although the OOM issue is addressed here, the request timeout issue is not resolved because all data must be written to a temporary file before the request can be answered.

Flow generated

The Excel 2007 OOXML (.xlsx) file format is essentially a zip file. You can change the suffix of the.xlsx file to.zip and unzip it:

├─ [Content_Types].xml ├─ _rels ├─ docProps │ ├─ $mv Output.xlsx output.zip $unzip Output.zip $├─ [Content_Types].xml ├─ _rels ├─ docProps │ ├ ─ ─ app. XML │ └ ─ ─ the core. The XML └ ─ ─ xl ├ ─ ─ _rels │ └ ─ ─ workbook. XML. The rels ├ ─ ─ sharedStrings. XML ├ ─ ─ styles. The XML ├ ─ ─ TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXT TXTCopy the code

We can see that the Excel file unzipped contains the above files, with styles being our defined style format (including font, text size, color, center, etc.) and our data content in the Worksheets directory.

By analyzing the data format, we can control the writing process of the XLSX file, and write the data directly to the response stream instead of temporary files to solve the request timeout problem perfectly.

Sample code:

XSSFWorkbook wb = new XSSFWorkbook()
XSSFCellStyle headerStyle = genHeaderStyle(wb)
sheets.each { sheet ->
    defxssfSheet = wb.createSheet(sheet.name) sheet.setXSSFSheet(xssfSheet) sheet.setHeaderStyle(headerStyle) } File template =  genTemplateFile(wb) ZipOutputStream zos =new ZipOutputStream(responseStream);
ZipFile templateZip = new ZipFile(template);
Enumeration<ZipEntry> templateEntries = templateZip.entries();
try {
  while (templateEntries.hasMoreElements()) {
    // copy all template content to the ZipOutputStream zos
    // except the sheet itself
  }
  zos.putNextEntry(new ZipEntry(sheetName)); // now the sheet
  OutputStreamWriter sheetOut = new OutputStreamWriter(zos, "UTF-8");
  try {
    sheetOut.write("
      ");
    sheetOut.write("<worksheet><sheetData>");
    // Write the content -- rows and cells
    sheetOut.write("</sheetData></worksheet>");
  } finally{ sheetOut.close(); }}finally { zos.close(); }
Copy the code

The template contains index information, such as what styles were created, how many sheets, and so on, which comes first in the ZIP file, followed by the sheet content data.

My blog address: blog.yu000hong.com/2018/07/24/…

Excel two formats

SXSSF

Flow generated

Related Posts

Redis cache avalanche, cache penetration, cache breakdown, cache degradation, cache warm-up, cache update

【 K8S series 】 K8S learning 27-7, K8S own principle of high availability

Nginx series (3) | show you read Nginx load balancing