Looking at the data in front of me felt like receiving a death notice: a bolt from the blue that leaves the mind shaken. About a month ago, during a routine service inspection, I suddenly saw the memory usage data of the following service process. It was this data that led me down a difficult memory optimization journey.

Wishful thinking

Java memory optimization is a systematic discipline that demands keen insight and calm analysis. When a Java program has a memory problem, we cannot simply rely on past experience to locate the cause directly. We need complete calm, careful analysis, and a firm focus on the data. Most of the time, however, we are swayed by past experience and blindly believe our own assumptions. That seemingly hopeful idea is nothing but self-comforting deception.

When Xiao Bai first got this data, he was surprised by the 22.2 percent RSS memory footprint, but he immediately formed a strong guess: heap memory overflow. This was because the service process code-named cloud.hfData really had suffered a memory overflow at some point in the past, caused by an internal file object builder being invoked improperly. Because of this, Xiao Bai assumed it must be a heap overflow again, and so ignored several key features of the data.

Start with the well-trodden heap memory analysis combo: test environment + jmap + Memory Analyzer.

Memory analysis should not be performed directly in a production environment; that should be a basic tenet. So Xiao Bai took a sample in the test environment:

While this may not look like much, the test server actually has twice as much memory as the production server, so a 9.1% RSS is still quite high. Next, use jmap to capture a heap dump by running the following command from the bin directory of the JDK installation (dump:live dumps only live, reachable objects, format=b writes a binary hprof file, and 23856 is the process PID):

sudo ./jmap -dump:live,format=b,file=23856.hprof 23856

Next, Xiao Bai exported the memory dump to his local machine to view a detailed report with jhat or Memory Analyzer; he used the Memory Analyzer plug-in in Eclipse. The following report was obtained through Memory Analyzer:

Figure 1. Heap memory object data

There was no abnormal peak in the JVM heap when this data was captured, and the 23.5MB of heap objects was a far cry from the 719MB+ RSS. Although the heap looked normal, Xiao Bai was a little puzzled, yet he still considered the heap suspicious. After tossing away an entire afternoon on it, you can imagine his irritable mood. Another day struck by lightning. Ahhhh…

A ray of hope before despair

Sometimes luck is part of strength. When we cannot see the problem, we need a mirror. While Xiao Bai was wondering whether the heap was really to blame, an idea suddenly struck him: why not find another JVM service process to compare against? Yes, find a healthy JVM service process and take a look at its heap. No sooner said than done; Xiao Bai picked the service process code-named cloud.throat:

Then apply our heap analysis one-two punch again: test environment + jmap + Memory Analyzer. Once more, we get an analysis report:

Figure 2. Control sample heap memory object data

Comparing Figure 1 and Figure 2, there are slight differences between them, but nothing significant. It was this report that allowed Xiao Bai to rule out a heap problem, bringing his attention back to the data itself to look for the cause objectively.

A last resort

Heap overflow was ruled out, but the case had hit a dead end once again. With everything back to square one, the real work could begin. By this time Xiao Bai had slowly calmed down. To solve a problem, you first need to locate it; to locate it, you need clues; and to find clues, you need data.

Well, let’s gather some basic data first. Although the road so far has taken some detours, it was not without value: at least we have some data. Since this is a memory problem, the first step is to understand the program’s memory composition. Here Xiao Bai turned to the jstat tool. jstat is a performance statistics tool for the JVM; it lets us look at JVM memory usage across the young generation, the old generation, and the permanent generation (replaced by Metaspace in newer JVMs). Learning from the lessons above, Xiao Bai collected data from cloud.hfData and cloud.throat at the same time:
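For reference, a jstat invocation of the kind used here might look like the following (a sketch only; the PID reuses the one from the jmap step above, and the -gc option reports generation capacities and usage in KB, including the EC, OC and MC columns discussed below):

sudo ./jstat -gc 23856 5000 3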

Don't give up on the diagnosis

From the data above we can extract several key indicators: EC, OC and MC. The young generation capacity (EC) differs little. The old generation capacity (OC) differs quite a bit, but has not exceeded its limit. MC, however, is another story: there is a difference of nearly 100MB between cloud.hfData and cloud.throat. A difference of only 100MB, but is it normal? Xiao Bai thought it was abnormal. Metadata taking up 24% of memory is very unhealthy.

Of course, the ratio alone does not prove that metadata memory is the problem. So Xiao Bai redeployed cloud.hfData and collected its initial memory figures:

Comparing over time, EC, OC and MC all increased to different degrees: in proportional terms, by 45.3%, 820% and 148.9% respectively. At first glance OC looks like the big problem. But don't forget: quoting proportions without the dose is cheating. Look at the absolute increases instead: 82MB, 41MB and 103MB. Working backwards, the 820% jump in OC is only 41MB on a base of roughly 5MB, while the 148.9% jump in MC amounts to 103MB of additional Metaspace.

In addition, keep in mind that EC and OC fluctuate constantly while the program runs, whereas metadata does not normally change dramatically, because it mainly stores static resources such as program code (class and method metadata). A huge increase in MC is therefore extremely abnormal. Finally, Xiao Bai collected the memory changes of cloud.hfData and cloud.throat over several days:

As can be seen above, cloud.throat basically held steady at a memory ratio of around 6.4%, while cloud.hfData kept climbing. Another, more important point is that heap memory is bounded: if the young or old generation had really run out of memory, the program would have been forced to exit with a heap exception. So Xiao Bai was sure the metadata was sick.
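This also explains why RSS could keep climbing without the process crashing: the heap is capped by -Xmx, but by default Metaspace is limited only by available native memory. As a general safeguard (not part of the original fix, just standard JVM tuning), it can be capped so that runaway class loading fails fast with an OutOfMemoryError instead of silently inflating RSS; for example (the jar name is illustrative):

java -XX:MaxMetaspaceSize=256m -jar cloud-hfdata.jar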

The discovery of the culprit

It is normal for a Java program to use some metadata memory, and normal for that memory to grow somewhat. But when metadata memory keeps growing significantly, it becomes a big problem. So the new question is: what is causing the metadata to grow?

With this question in mind, Xiao Bai looked up some material on metadata. Metaspace, which stores this metadata, is the newer JVM's replacement for PermGen (the permanent generation). One common cause of Metaspace growth is excessive reflection; more precisely, too many classes being loaded through class loaders.
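A minimal sketch of how this symptom can be observed at runtime (my own illustration, not from the original investigation; it uses the standard java.lang.management API, and the "Metaspace" pool name assumes a HotSpot JVM):

import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class MetaspaceWatch {
    public static void main(String[] args) {
        // A steadily climbing total of loaded classes is the signature
        // of the class-loading problem described above.
        ClassLoadingMXBean cl = ManagementFactory.getClassLoadingMXBean();
        System.out.println("Currently loaded classes: " + cl.getLoadedClassCount());
        System.out.println("Total loaded classes:     " + cl.getTotalLoadedClassCount());

        // On HotSpot, class metadata lives in a memory pool named "Metaspace".
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                System.out.println("Metaspace used: "
                        + pool.getUsage().getUsed() / (1024 * 1024) + " MB");
            }
        }
    }
}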

This explanation made Xiao Bai feel that the answer was within reach. So he started analyzing whether any code in cloud.hfData used reflection. But after a lot of digging, he found nothing suspicious. Had he guessed wrong again?

While continuing to analyze the code, Xiao Bai kept collecting more information about the metadata. As time went by and the information piled up, one object finally caught his eye: com.googlecode.aviator.AviatorEvaluator. This is the main class of the Aviator expression engine, a tool used to evaluate text expressions; Apache's Exp4 is a similar computing engine.

Finally, a passage from the official documentation convinced Xiao Bai that the real killer was Aviator:

Figure 3. From the official AviatorScript documentation

Final redemption

So much evidence pointed to Aviator as the killer, and the official website also offered Xiao Bai a solution: caching. Xiao Bai therefore needed to rework how Aviator was used.
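A minimal sketch of what cached usage looks like (my own illustration of the pattern; the compile(expression, true) overload is the same one used in the test below, and it caches the compiled Expression by its expression text, so repeated calls reuse one generated class instead of loading a new one each time):

import com.googlecode.aviator.AviatorEvaluator;
import com.googlecode.aviator.Expression;

import java.util.HashMap;
import java.util.Map;

public class CachedAviatorExample {
    public static void main(String[] args) {
        // cached = true: the compiled result is cached by expression text,
        // so this call does not generate a new class every time.
        Expression expr = AviatorEvaluator.getInstance().compile("#`v_a`+#`v_b`", true);

        Map<String, Object> env = new HashMap<String, Object>(2);
        env.put("v_a", 1);
        env.put("v_b", 2);
        System.out.println(expr.execute(env)); // prints 3
    }
}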

First, Xiao Bai wrote a comparative test to measure performance before and after caching:

public void testCache() throws Exception {
    Date startTime = new Date();
    int count = 100000;
    for (int i = 0; i < count; i++) {
        // Same expression text every iteration, compiled with caching enabled.
        Expression script = AviatorEvaluator.getInstance().compile("#`v_a`+#`v_b`", true);
        Map<String, Object> env = new HashMap<String, Object>(2);
        env.put("v_a", i);
        env.put("v_b", i + 1);
        Object value = script.execute(env);
    }
    System.out.println(String.format("Cache time: %d ms", new Date().getTime() - startTime.getTime()));
    System.out.println(String.format("Cache memory usage: %d", Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory()));
}

public void testNoCache() throws Exception {
    Date startTime = new Date();
    int count = 100000;
    for (int i = 0; i < count; i++) {
        // A different expression text each iteration, compiled without caching:
        // every call generates and loads a new class.
        Expression script = AviatorEvaluator.getInstance().compile(String.format("%d+%d", i, i + 1));
        Object value = script.execute();
    }
    System.out.println(String.format("No cache time: %d ms", new Date().getTime() - startTime.getTime()));
    System.out.println(String.format("No cache memory usage: %d", Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory()));
}

Figure 4. Time and memory for Aviator without cache

Figure 5. Time and memory for Aviator with cache

The experiments above show that over 100,000 Aviator expression calls, caching makes a huge difference to both time and memory usage. This also indirectly confirms that uncached Aviator compilation was the source of the Metaspace memory problem.

Conclusion

In the end, Xiao Bai re-optimized the cloud.hfData service. It was a difficult stretch, but a fruitful one. Here are some thoughts on performance optimization for Java programs:

First, when locating a problem, don't be swayed by past experience or assumptions; stay objective and calm.

Second, locating a problem must be grounded in data, together with statistics, analysis, and comparison.

Third, be familiar with the JVM memory model and with analysis tools such as jmap, jstat, and MAT (Memory Analyzer Tool).

Fourth, summarize the process and build your own set of performance optimization methods.