Please credit the original source when reposting, thank you!

HappyFeet blog

Really stop using POI!!


1. A brief review

Applications in our production environment crashed frequently. After a lot of investigation, I found that a large number of POI objects were created when the POI framework was used to read Excel data, which caused the application's memory usage to spike. Moreover, these POI objects were not collected after the Excel processing had finished, resulting in a memory leak. Eventually the application crashed with an OOM.

Production environment configuration:

  • The memory of the application server is 8 GB
  • The Java application runs from a Docker image, with heap configuration: -Xms2G -Xmx6G

Solution: replace POI with Alibaba's open-source EasyExcel framework.

2. The Story

(1) The first encounter

On a Friday afternoon, users reported that the system seemed to be down…

That’s right! It was Friday afternoon, just as I was getting ready to leave work and start my happy weekend.

I logged in to the application server through the jump host to check the application's status.

Around the time the application crashed, there were only a few WARN logs about database connections, no exceptions were thrown, and the only ERRORs looked like this:

ERROR | 2019-11-29 14:26:18:152 | [XNIO-3 task-27] api.LoggingExceptionHandler (LoggingExceptionHandler.java:80) - UT005023: Exception handling request to /someUrl
java.lang.IllegalStateException: io.undertow.server.RequestTooBigException: UT000020: Connection terminated as request was larger than 20971520

This error is reported by the servlet container when handling an HTTP request: we limit the upload file size to 20 MB (20971520 bytes), so it is an expected error.

Scrolling further, I found several errors similar to the one above, so it was most likely a user retrying a failed upload several times; not the problem here. There were no other obvious errors or exceptions. Meanwhile, the ops staff found a kill -i record in the command history on the application server.

Based on this, two preliminary guesses were made:

  1. The application may have been killed manually with kill;
  2. A memory overflow may have caused the application to crash.

Here’s why:

  • The application log simply stops: no error, and then the application is gone. I started the application locally and killed it with kill, and the resulting log looked very similar to the production log;
  • The sudden crash might be an OOM, since OOM had happened before, but back then an explicit OutOfMemoryError was reported. I was actually quite confused at the time: an OOM normally comes with exception information such as java.lang.OutOfMemoryError: Java heap space, rather than the application simply crashing with nothing in the log. And there was no GC log to consult, which turned out to be a configuration mistake: we started the application with Docker and did configure GC logging at startup, but we did not map the GC log directory to the host, so the GC log was lost when the application was restarted.

Based on these preliminary findings, two things were done:

  1. Confirm whether the application really was killed manually; (it later turned out that it was not)
  2. Improve the GC logging configuration and add JVM options so that a heap dump is written when an OutOfMemoryError occurs.

The JVM parameters added are as follows:

-XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintGCCause -Xloggc:/var/log/gc_%p_%t.log -XX:+UseGCLogFileRotation -XX:GCLogFileSize=2M -XX:ErrorFile=/tmp/jvm/hs_err_pid_%p.log -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dumpFilePath
(2) The second encounter

Another Friday afternoon! Users came back and said the system was down again…

That’s right! You heard me right! It’s Friday again…

Sad as it was, it had to be dealt with. And with the JVM configuration added last time, I was confident I would find the problem this time!

Surprisingly, the application log behaved exactly the same as last time! Exactly the same! And the newly added -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dumpFilePath options did not seem to work either: there was no heap dump in the corresponding directory! (Think about why there is no heap dump here.)

Fortunately, this time there was a GC log.

From the GC log alone it was clear that something was wrong, given that the application heap is configured as -Xms2G -Xmx6G.

Checking the GC log showed the following:

Clearly, something is wrong with this GC behavior! In less than 6s, the heap grew from 2376.0M to 5358.0M, and it never came back down until the program crashed.

So I suspected a memory leak.

So I asked my operations colleague to write a script to monitor the application's memory usage periodically.

I also started looking at user behavior around the GC time points: for each GC, I went to the application log at the corresponding time to see what users had been doing.

It turned out that every GC happened shortly after a user uploaded an Excel file; after saving the file on the server, the application would read its data. So I started to wonder: could Excel be behind this?

Once I had this idea, I immediately asked my operations colleague to download one of the Excel files; it was more than 17 MB.

After I got the Excel file, I wrote a local test that read its data in a loop several times and used JProfiler to check memory usage. The result is as follows:

Just reading a 17 MB Excel file takes nearly 4 GB of memory! Since the maximum heap in my test configuration was 4 GB, you can also see continuous GC activity in the figure above. This was a locally started test that did nothing but read the Excel file.
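For reference, a minimal sketch of that kind of test is shown below. It simply loads the workbook with POI's user model (XSSFWorkbook) in a loop, which materializes every sheet and cell in memory; the file path is a placeholder and the loop count is arbitrary, not the exact test I ran.

    import java.io.File;
    import org.apache.poi.xssf.usermodel.XSSFSheet;
    import org.apache.poi.xssf.usermodel.XSSFWorkbook;

    public class PoiReadTest {
        public static void main(String[] args) throws Exception {
            // Placeholder path to the ~17 MB Excel file used in the test
            File excel = new File("/tmp/big-file.xlsx");
            // Read the same file several times in a loop
            for (int i = 0; i < 5; i++) {
                // XSSFWorkbook loads the entire workbook (all sheets, all cells) into memory
                try (XSSFWorkbook workbook = new XSSFWorkbook(excel)) {
                    XSSFSheet sheet = workbook.getSheetAt(0);
                    System.out.println("Pass " + i + ": last row index of first sheet = " + sheet.getLastRowNum());
                }
            }
        }
    }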

Here is a description of the Excel file:

  • Size: a little over 17 MB;
  • There are two sheets: the first has about 500 rows of data with roughly 30-40 columns; the second has a similar amount of data, but contains formulas.

In fact, the code only uses the data in the first sheet, so I tried deleting the unused data, and the results were strange:

  • After deleting the second sheet, the file shrank to just over 11 MB, a reduction of more than 5 MB (reading this truncated file used slightly less memory than the original, but not much less);
  • Then I deleted the data in the first sheet, and even when only 2-3 rows remained, the file size did not change at all. Very strange!

Okay, that’s a bit of a digression! Back to business.

Earlier I mentioned suspecting a memory leak and monitoring the application's memory usage on a regular basis. We took a simple approach: a scheduled script that runs jmap -histo <pid> every half hour and appends the result to a file. When checking that file, we found the memory leak in POI:

As shown in the figures, the first shows the application's memory usage at 13:30:26 and the second at 14:30:26. Checking the application log, a user had uploaded an Excel file at 13:03:22. Once the Excel data had been processed, these POI objects should have been collected by GC, but in fact they were still alive at 14:30:26. The service then crashed at 14:42:08.
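(As an aside, the half-hourly sampling mentioned above can be as simple as a cron entry like the one below; the pid and output path are placeholders.)

    # hypothetical cron entry: append a class histogram of the Java process every half hour
    */30 * * * * jmap -histo <pid> >> /tmp/app_histo.log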

These uncollected POI objects are what caused the frequent crashes. I then searched around and found plenty of similar reports:

What I don't get is: POI is used by so many people, so why does it run out of memory this easily?

(3) Finally solved

After finding the root cause, the problem was easy to solve. After surveying the available tools for reading Excel, we finally decided to switch to Alibaba's open-source EasyExcel framework:
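For reference, reading a sheet with EasyExcel typically follows its listener-style API, roughly as in the sketch below: instead of loading the whole workbook into memory, rows are pushed to a listener one by one. The row model DemoRow and the file path are illustrative, not the actual classes from our project.

    import com.alibaba.excel.EasyExcel;
    import com.alibaba.excel.annotation.ExcelProperty;
    import com.alibaba.excel.context.AnalysisContext;
    import com.alibaba.excel.event.AnalysisEventListener;

    public class EasyExcelReadTest {

        // Illustrative row model: fields are mapped to columns by index
        public static class DemoRow {
            @ExcelProperty(index = 0)
            private String firstColumn;
            @ExcelProperty(index = 1)
            private String secondColumn;
            // getters and setters omitted for brevity
        }

        public static void main(String[] args) {
            EasyExcel.read("/tmp/big-file.xlsx", DemoRow.class, new AnalysisEventListener<DemoRow>() {
                @Override
                public void invoke(DemoRow row, AnalysisContext context) {
                    // Called once per row: process or batch-save here instead of holding everything in memory
                }

                @Override
                public void doAfterAllAnalysed(AnalysisContext context) {
                    // Called once after the whole sheet has been read
                }
            }).sheet(0).doRead();
        }
    }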

After switching to EasyExcel, I used JProfiler to test memory usage again with the same 17 MB Excel file mentioned above. The results are shown in the figure below:

Clearly, this looks much better than POI: both the GC frequency and the memory footprint are significantly lower.

Finally, the problem was solved.

3. Summary

A few questions are left open in this article:

  • Why did the configured -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dumpFilePath not produce a heap dump?
  • Why did the program crash without any exception in the log?

You can think about these from the perspective of the application server's overall memory usage; I will not go into the details in this article.

Here are some lessons learned from the experience:

  1. When starting a Java application, enable GC logging and write it to a specified directory; if Docker is used, map the log directory to the host. For example:

    -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintGCCause -Xloggc:/var/log/gc_%p_%t.log -XX:+UseGCLogFileRotation -XX:GCLogFileSize=2M -XX:ErrorFile=/tmp/jvm/hs_err_pid_%p.log -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dumpFilePath
  2. Log the key steps in your code to make troubleshooting production problems easier;

  3. Don’t assume that the open source frameworks everyone uses are perfect. They can be buggy.

  4. As the saying goes, you only regret having read too few books when you need them, and you only know how hard something is once you have been through it. I read "Deep Understanding of Java Virtual Machine" back in mid-2018, but there were still many places I was not familiar with while troubleshooting this OOM problem, so I still need to go back to the books from time to time.

Finally, a note to myself, and to everyone: no matter how busy you are, leave yourself some time to grow!

That's the whole story. I hope it helps!