There is a requirement for Excel export in today’s project. I looked at my colleagues using SXSSFWorkbook. This component has not been used much before. Now that we’ve used it this time, let’s just analyze it. POI provides three Excel operation modes: HSSF, XSSF, and SXSSF. Their differences are as follows:

HSSF: yes Excel97-2003, with the extension of. XLS. XSSF: yes to start operating Excel2007 with the extension.xlsx. SXSSF is a low-memory operation mode provided by POI3.8 based on XSSF. The extension name is. XLSX.Copy the code

The focus of this article is to analyze how SXSSF supports low memory footprint. Each sheet corresponds to a temporary file. When the number of rows is greater than rowAccessWindowSize, the temporary file is flushed to ensure low memory usage. When the rows are created, they are written directly from the temporary file to Excel. One thing to note: operations like cell merge are pure memory operations. If you want to merge more than one row at a time in a project, pay attention to the use of your machine’s content at any time to avoid OOM.

1 for a demo

SXSSFWorkbook wb = new SXSSFWorkbook(100); Sheet sh = wb.createSheet();for(int rownum = 0; rownum < 1000; rownum++){
            Row row = sh.createRow(rownum);
            for(int cellnum = 0; cellnum < 10; Cellnum++){// create a row, where the current number of rows is compared to rowAccessWindowSize to write from memory to the file. Cell cell = row.createCell(cellnum); String address = new CellReference(cell).formatAsString(); cell.setCellValue(address); }} // Data with rownum < 900 is flushed to disk and cannot be accessed randomlyfor(int rownum = 0; rownum < 900; rownum++){ Assert.assertNull(sh.getRow(rownum)); } // The last 100 pieces of data are still in memory and can be accessed randomlyfor(int rownum = 900; rownum < 1000; rownum++){ Assert.assertNotNull(sh.getRow(rownum)); FileOutputStream out = new FileOutputStream("d:\\sxssf.xlsx"); wb.write(out); out.close(); Wb.dispose (); // Dispose temporary files from disk.Copy the code

2 main analysis points

2.1 create SXSSFWorkbook

SXSSFWorkbook WB = new SXSSFWorkbook(100); The rowAccessWindowSize is specified to be 100, which means that rowAccessWindowSize row data is cached in memory. When the number of rows exceeds rowAccessWindowSize, input is made from memory to a temporary file.

Temporary file generation is covered in section 2.2 Creating a Sheet. Exceeding the threshold to brush temporary files is explained in section 2.3 Creating a Row.

2.2 create a Sheet

As shown in demo, Sheet sh = wb.createsheet (); Create Sheet. The main function of the creation process is to create a temporary file. Each sheet is a temporary file. Without further ado, let’s look at the implementation of createSheet.

public SXSSFSheet createSheet() {
        return this.createAndRegisterSXSSFSheet(this._wb.createSheet());
    }
Copy the code

The most core is createAndRegisterSXSSFSheet sxSheet = new SXSSFSheet (this, xSheet); . So let’s look at this function:

public SXSSFSheet(SXSSFWorkbook workbook, XSSFSheet xSheet) throws IOException { this._workbook = workbook; this._sh = xSheet; this._writer = workbook.createSheetDataWriter(); // Temporary files are created here. this.setRandomAccessWindowSize(this._workbook.getRandomAccessWindowSize()); this._autoSizeColumnTracker = new AutoSizeColumnTracker(this); }Copy the code

The core logic in createSheetDataWriter is SheetDataWriter. If you look at createTempFile, you’ve created a temporary file.

public SheetDataWriter() throws IOException {
        this._numberLastFlushedRow = -1;
        this._fd = this.createTempFile();
        this._out = this.createWriter(this._fd);
    }
Copy the code

About temporary documents:

Prefix: POi-SXSF-sheet Suffix:.xml Storage path: the code is as follows

private void createPOIFilesDirectory() throws IOException {
        if (this.dir == null) {
            String tmpDir = System.getProperty("java.io.tmpdir");
            if (tmpDir == null) {
                throw new IOException("Systems temporary directory not defined - set the -Djava.io.tmpdir jvm property!");
            }

            this.dir = new File(tmpDir, "poifiles");
        }

        this.createTempDirectory(this.dir);
    }
Copy the code

2.3 create a row

When is a file written from memory? That’s when you create a row. Let’s look at the code:

public SXSSFRow createRow(int rownum) {
        int maxrow = SpreadsheetVersion.EXCEL2007.getLastRowIndex();
        if (rownum >= 0 && rownum <= maxrow) {
            if (rownum <= this._writer.getLastFlushedRow()) {
                throw new IllegalArgumentException("Attempting to write a row[" + rownum + "] in the range [0," + this._writer.getLastFlushedRow() + "] that is already written to disk.");
            } else if (this._sh.getPhysicalNumberOfRows() > 0 && rownum <= this._sh.getLastRowNum()) {
                throw new IllegalArgumentException("Attempting to write a row[" + rownum + "] in the range [0," + this._sh.getLastRowNum() + "] that is already written to disk.");
            } else {
                SXSSFRow newRow = new SXSSFRow(this);
                this._rows.put(rownum, newRow);
                this.allFlushed = false; FlushRows if the current number of rows is greater than randomAccessWindowSizeif(this._randomAccessWindowSize >= 0 && this._rows.size() > this._randomAccessWindowSize) { try { this.flushRows(this._randomAccessWindowSize); } catch (IOException var5) { throw new RuntimeException(var5); }}returnnewRow; }}else {
            throw new IllegalArgumentException("Invalid row number (" + rownum + ") outside allowable range (0.." + maxrow + ")"); }}Copy the code

If (this._randomAccessWindowSize >= 0 && this._rows.size() > this._randomAccessWindowsize).

The following sections have nothing to do with low memory footprint, but just look at the steps that are actually used in a project.

2.4 Write to final Excel from temporary files

Writing to Excel is mainly done in workbook.write(out). Take a look at the code:

public void write(OutputStream stream) throws IOException { this.flushSheets(); // Write the last number of lines below randomAccessWindowSize to the sheet temporary file. File tmplFile = TempFile.createTempFile("poi-sxssf-template".".xlsx"); // create a tmplFile temporary file that is not the sheet temporary file. try { FileOutputStream os = new FileOutputStream(tmplFile); Throwable var5 = null; Try {// Here all the data in the workbook is written to the temporary tmplFile file that you just created. this._wb.write(os); }... ZipSecureFile zf = new ZipSecureFile(tmplFile); var5 = null; try { ZipFileZipEntrySourcesource= new ZipFileZipEntrySource(zf); Throwable var7 = null; Try {// writes the tmplFile temporary file to the target Excel. this.injectData(source, stream); }... } finally {// Delete tmplFile temporary file. Note that it is not a temporary file of sheet. deleted = tmplFile.delete(); }if(! deleted) { throw new IOException("Could not delete temporary file after processing: "+ tmplFile); }}Copy the code

The basic logic is simple:

(1) Write the remaining data in memory that is not enough randomAccessSize into the sheet temporary file first.

(2) Write all the data in the workbook (i.e., multiple sheet temporary files) to a TMPL temporary file

(3) Write the data of the TMPL temporary file into the target file.

2.5 Deleting Temporary Files

workbook.dispose(); The logic here.

public boolean dispose() {
        boolean success = true; Iterator var2 = this._sxFromXHash.keySet().iterator(); // Iterate through multiple sheets one by onewhile(var2.hasNext()) { SXSSFSheet sheet = (SXSSFSheet)var2.next(); Try {// Here is the core dispose. Success = sheet.dispose() && success; } catch (IOException var5) { logger.log(5, new Object[]{var5}); success =false; }}return success;
    }
Copy the code

The basic logic is to traverse multiple sheets and dispose on each sheet. The logic of dispose is to flush and then delete the sheet temporary file if no file has been output from the sheet.

2.6 Operations on merging cells

Cell merge usage:

  CellRangeAddress region0 = new CellRangeAddress(rowNum, rowNum+1, column, column);
  sheet.addMergedRegion(region0);
Copy the code

This is just merging rowNum and column.

private int addMergedRegion(CellRangeAddress region, boolean validate) {
        if (region.getNumberOfCells() < 2) {
            throw new IllegalArgumentException("Merged region " + region.formatAsString() + " must contain 2 or more cells");
        } else {
            region.validate(SpreadsheetVersion.EXCEL2007);
            if (validate) {
                this.validateArrayFormulas(region);
                this.validateMergedRegions(region);
            }

            CTMergeCells ctMergeCells = this.worksheet.isSetMergeCells() ? this.worksheet.getMergeCells() : this.worksheet.addNewMergeCells();
            CTMergeCell ctMergeCell = ctMergeCells.addNewMergeCell();
            ctMergeCell.setRef(region.formatAsString());
            returnctMergeCells.sizeOfMergeCellArray(); }}Copy the code

3 summary

This article combined with the Excel tool used in the project – SXSSFWorkbook, a simple explanation. The low memory usage of SXSSFWorkbook is analyzed. Hope to help you ~ this synchronous release in Jane books www.jianshu.com/p/18046332b…

4 References

HSSF, XSSF and SXSSF distinction and optimum Excel export www.cnblogs.com/pcheng/p/74…

Based on the EXCEL file export flow, SXSSFWorkbook source parse www.jianshu.com/p/b80a20b81…

# 5 other

(1) When reading excel numbers, there will be a “.0” by default. How to solve this problem? My.oschina.net/henglaixuex…