In fact, the Excel spreadsheet tool class used in the author’s previous project was a single-thread tool class with relatively slow execution efficiency.

When generating Sheet and 1W lines of data, it takes about 10 seconds, and the amount of data of multiple sheets is superimposed, and the time cost will also be linearly superimposed.


The framework used in the author’s project is Apache open source POI. The main body of this article focuses on optimization ideas, such as the use of POI, thread pool details, the use of concurrent tool classes, etc., which are beyond the scope of discussion.


In a production environment, Excel can be generated several times faster with the following optimizations (add chicken legs to the product).


Score Sheet to optimize


In Excel, there should be no correlation between different sheets, at least for the author’s project.

Therefore, the first step of optimization is to run different sheets in different thread pools. The pseudocode is as follows:


void excel(a){
  // Create an Excel file object
  XSSFWorkbook workbook = new XSSFWorkbook();
  
  // Table data, the first layer is the number of sheets, the second layer is the data in each sheet
  List<List<String>> data = new ArrayList();
  
  // CDL is used for synchronization
  CountDownLatch cdl = new CountDownLatch(data.size());
  
  for(List<String> d : data){
    // Executor is a thread pool. The details of the thread pool are not shown
    executor.submit(()->createSheet(work,d,cdl));
  }
  
  / / synchronize
  cdl.await();
}

/** * The logic to generate the sheet data */ 
Boolean createSheet(XSSFWorkbook workbook,List<String> data, CountDownLatch cdl){
  XSSFSheet sheet = workbook.createSheet();
  for(int i=0; i<data.size(); i++){ XSSFRow sheetRow = sheet.createRow(i);// do something
  }
  cdl.countDown();
}
Copy the code


By generating sheets separately, the time consumed for multiple sheets can be reduced by more than half. For a single sheet, it doesn’t work.


Below continue to look at the author’s single sheet optimization ideas.


Sheet is optimized


In the pseudocode above, you can see that

XSSFRow sheetRow = sheet.createRow(i);
Copy the code

In the createSheet method, it is slow to generate the specific row data line by line and then do the business processing.


So, my first thought is to change this to multiple lines, and then batch the business processing (batch can be multithreaded), is it possible to do this operation?

A List < XSSFRow > sheetRows = sheet. CreateRow (0100);Copy the code

Hey hey think more.. There is no such operation.


Can I create the row first, and then write data asynchronously and in parallel inside the row?

The pseudo-code is as follows:

Boolean createSheet(XSSFWorkbook workbook,List<String> data){
  XSSFSheet sheet = workbook.createSheet();
  
  // Implement synchronization
  CountDownLatch cdl = new CountDownLatch(data.size());
  
  for(int i=0; i<data.size(); i++){ XSSFRow sheetRow = sheet.createRow(i);// The order of the data should be the same as the order of the form
    String d = data.get(i);
    executor.submit(()->createSheetData(cdl,sheetRow,d);
  }

  cdl.await();
}

void createSheetData(CountDownLatch cdl, XSSFRow sheetRow,String data){
  //sheetRow do something
  cdl.countDown();
}
Copy the code





The results of


The business that the author needs to realize is to export two sheets, with a single sheet reaching 1W pieces of data, and the export time should not exceed 30 seconds.


In the absence of the above optimization, export time in about 20S, although meet the requirements, but the product is always teasing, too slow too slow.


I didn’t write the previous code, but after reviewing the logic, I decided to optimize it, and the end result was that the export time was reduced to about 6 seconds.


6 seconds or because the configuration of the online machine is only 2 cores 1G, so no matter how to call system resources, in fact, there is not much resource call.


I hope it helps. Give it a thumbs up.