Abstract: This article begins with an introduction to load testing and to application and server monitoring with APM tools, followed by some best practices for writing high-performance Java code. Finally, it explores JVM-specific tuning techniques, database-side optimization, and architectural tweaks. The following is a translation.

Introduction

In this article, we’ll discuss several ways to improve the performance of your Java applications. We’ll start by showing how to define measurable performance metrics, then look at the tools available to measure and monitor application performance and to identify performance bottlenecks.

We’ll also look at some common Java code optimizations and best coding practices. Finally, we’ll look at JVM tuning tips and architectural tweaks to improve the performance of Java applications.

Please note that performance tuning is a broad topic, and this article is just a starting point for exploring the JVM.

Performance indicators

Before we can start tuning the performance of our applications, we need to understand the non-functional requirements such as scalability, performance, availability, and so on.

Here are some performance metrics commonly used in a typical Web application:

  1. Average application response time

  2. Average number of concurrent users supported by the system

  3. Expected number of requests per second during peak load

These metrics can be monitored using a variety of monitoring tools and can be very useful for analyzing performance bottlenecks and tuning performance.
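As a rough illustration of the first and third metrics, the sketch below derives an average response time and a requests-per-second figure from recorded samples. The class and method names are hypothetical, and a real monitoring tool would collect these values for you.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: derives two basic metrics from recorded request data. */
final class ResponseMetrics {

    /** Average response time in milliseconds; 0 for an empty sample. */
    static double averageMillis(List<Long> latenciesMillis) {
        return latenciesMillis.stream()
                .mapToLong(Long::longValue)
                .average()
                .orElse(0.0);
    }

    /** Requests per second over a measurement window of the given length. */
    static double requestsPerSecond(int requestCount, long windowMillis) {
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        return requestCount * 1000.0 / windowMillis;
    }

    public static void main(String[] args) {
        List<Long> latencies = Arrays.asList(120L, 80L, 100L);
        System.out.println("avg ms: " + averageMillis(latencies));
        System.out.println("req/s:  " + requestsPerSecond(600, 60_000));
    }
}
```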

Sample application

We’ll use a simple Spring Boot Web application, described in this article, as an example. This application manages a list of employees and exposes REST APIs for adding and retrieving employees.

We will use this program as a reference to run load tests and monitor various application metrics in the following sections.

Identify performance bottlenecks

Load testing tools and application performance management (APM) solutions are commonly used to track and optimize the performance of Java applications. The key to finding performance bottlenecks is to load test various application scenarios and use APM tools to monitor CPU, IO, heap usage and so on.

Gatling is one of the best tools for load testing. It provides HTTP protocol support, which makes it an excellent choice for load testing any HTTP server.

Stackify’s Retrace is a mature APM solution. It is feature-rich and helpful in determining your application’s performance baseline. One of the key components of Retrace is its code analysis capability, which allows you to gather runtime information without slowing down your application.

Retrace also provides widgets to monitor memory, threads, and classes for JVM-based applications. In addition to the metrics of the application itself, it supports monitoring the CPU and IO usage of the server hosting the application.

Therefore, a full-featured monitoring tool like Retrace is the first step in unlocking your application’s performance potential. The second step is to recreate the actual usage scenario and load on your system.

That is easier said than done, and it also requires knowing your application’s current performance profile. That’s what we’re going to focus on next.

Gatling load test

Gatling simulation scripts are written in Scala, but the tool also comes with a very useful graphical interface for recording specific scenarios and generating the corresponding Scala scripts.

After running the simulation script, Gatling generates a very useful HTML report that can be analyzed.

Define the scenario

Before starting the recorder, we need to define a scenario that represents what happens when users browse the Web application.

In our example, the scenario would be “launch 200 users, each making 10,000 requests.”

Configure the recorder

As described in “Step 1 of Gatling”, create a Scala file named EmployeeSimulation with the following code:

class EmployeeSimulation extends Simulation {
    val scn = scenario("FetchEmployees").repeat(10000) {
        exec(
          http("GetEmployees-API")
            .get("http://localhost:8080/employees")
            .check(status.is(200))
        )
    }
    setUp(scn.users(200).ramp(100))
}

Run load tests

To perform a load test, run the following command:

$GATLING_HOME/bin/gatling.sh -s basic.EmployeeSimulation

Load testing your application’s APIs can help uncover subtle, hard-to-find errors, such as database connection exhaustion, request timeouts under high load, high heap utilization due to memory leaks, and so on.

Monitoring applications

To start using Retrace with a Java application, you first need to sign up for a free trial account on Stackify. Then we configure our Spring Boot application as a Linux service. We also need to install the Retrace agent on the server hosting the application, as described in this article.

Once the Retrace agent and the Java application we want to monitor are started, we can go to the Retrace dashboard and click the Add App button to add the application. Retrace will then start monitoring it.

Find the slowest point

Retrace automatically monitors applications and tracks the use of dozens of common frameworks and dependencies, including SQL, MongoDB, Redis, Elasticsearch, and more. It helps us quickly determine why an application is having performance problems, answering questions such as:

  • Does an SQL statement slow down the system?

  • Is Redis suddenly slow?

  • Is a particular HTTP Web service down or slowing down?

For example, the graph below shows the slowest component over a given period of time.

Code level optimization

Load testing and application monitoring are very useful for identifying the key performance bottlenecks in an application. At the same time, we need to follow good coding practices so that many performance problems are avoided before monitoring even begins.

In the next section, we’ll look at some best practices.

Use StringBuilder to concatenate strings

String concatenation is a very common operation, and also an inefficient one. Simply put, the problem with using += to append strings is that each operation allocates a new String.

The following example is a simplified but typical loop, written first with raw concatenation and then with a builder:

public String stringAppendLoop() {
    String s = "";
    for (int i = 0; i < 10000; i++) {
        if (s.length() > 0)
            s += ",";
        s += "bar";
    }
    return s;
}

public String stringAppendBuilderLoop() {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 10000; i++) {
        if (sb.length() > 0)
            sb.append(",");
        sb.append("bar");
    }
    return sb.toString();
}

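Using StringBuilder as in the second method is significantly more efficient, because it appends to a single buffer instead of allocating a new String on every iteration. Note that modern JVMs do optimize string operations at compile time or run time, but concatenation inside a loop generally still creates intermediate strings.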
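For the specific case of joining values with a delimiter, the JDK (Java 8 and later) also offers String.join, which avoids writing the loop by hand. The sketch below produces the same result as the two loops above; the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical example: delimiter-separated joining without a hand-written loop. */
final class JoinExample {

    /** Builds "bar,bar,...,bar" with the given number of elements. */
    static String joinBars(int count) {
        List<String> parts = Collections.nCopies(count, "bar");
        // String.join inserts the delimiter between elements for us.
        return String.join(",", parts);
    }

    public static void main(String[] args) {
        System.out.println(joinBars(3));
    }
}
```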

Avoid recursion

Recursive logic that leads to a StackOverflowError is another common problem in Java applications. If the recursion cannot be removed, a tail-recursive form is the better alternative.

Let’s look at an example of head recursion:

public int factorial(int n) {
    if (n == 0) {
        return 1;
    } else {
        return n * factorial(n - 1);
    }
}

Now let’s rewrite it as tail recursion:

private int factorial(int n, int accum) {
    if (n == 0) {
        return accum;
    } else {
        return factorial(n - 1, accum * n);
    }
}

public int factorial(int n) {
    return factorial(n, 1);
}

Other JVM languages, such as Scala, already support tail-recursion optimization at the compiler level. The Java compiler does not, so in Java a tail-recursive method still uses one stack frame per call. There is, of course, some controversy surrounding this optimization.
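Where the recursion can be removed entirely, a plain loop does the same work with constant stack depth. The following is a minimal sketch of an iterative factorial; the class name is hypothetical, and the negative-input check is an addition that the recursive versions do not have.

```java
/** Hypothetical example: factorial computed with a loop instead of recursion. */
final class IterativeFactorial {

    static int factorial(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        int accum = 1;
        for (int i = 2; i <= n; i++) {
            accum *= i; // int arithmetic overflows for n > 12
        }
        return accum;
    }

    public static void main(String[] args) {
        System.out.println(factorial(5));
    }
}
```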

Be careful with regular expressions

Regular expressions are useful in many scenarios, but they tend to have a very high performance cost. It’s important to know which JDK String methods use regular expressions internally, such as String.replaceAll() and String.split().

If you have to use regular expressions in computation-intensive code, cache a reference to the compiled Pattern to avoid compiling it repeatedly:

static final Pattern HEAVY_REGEX = Pattern.compile("(((X)*Y)*Z)*");
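As a usage sketch of the same idea, the hypothetical helper below compiles a whitespace pattern once and reuses it on every call, where String.replaceAll() would compile the expression each time it is invoked. The class and method names are illustrative only.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: reuses one compiled Pattern across calls. */
final class WhitespaceNormalizer {

    // Compiled once and shared; Pattern instances are immutable and thread-safe.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Collapses every run of whitespace into a single space. */
    static String normalize(String input) {
        // Same result as input.replaceAll("\\s+", " ") without recompiling.
        return WHITESPACE.matcher(input).replaceAll(" ");
    }

    /** Splits trimmed input into whitespace-separated words. */
    static String[] words(String input) {
        return WHITESPACE.split(input.trim());
    }

    public static void main(String[] args) {
        System.out.println(normalize("a  b\t\nc"));
    }
}
```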

Using popular libraries such as Apache Commons Lang is also a good choice, especially for string manipulation.

Avoid creating and destroying too many threads

Creating and disposing of threads is a common cause of performance problems on the JVM, because thread objects are relatively expensive to create and destroy.

If your application uses a large number of threads, a thread pool makes a lot of sense, because it allows these expensive objects to be reused.

To that end, Java’s ExecutorService, the foundation of thread pools, provides a high-level API to define and interact with the semantics of thread pools.
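A minimal sketch of this API in use is shown below: a fixed pool of worker threads processes a list of small tasks, so no thread is created per task. The class name, the squaring task, and the pool size are placeholders chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical example: runs many small tasks on a fixed pool of reusable threads. */
final class PooledSquares {

    static List<Integer> squareAll(List<Integer> inputs, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (Integer n : inputs) {
                futures.add(pool.submit(() -> n * n)); // submitted as a Callable<Integer>
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : futures) {
                results.add(future.get()); // blocks until that task has finished
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for results", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            pool.shutdown(); // lets the worker threads exit once the queue is empty
        }
    }

    public static void main(String[] args) {
        System.out.println(squareAll(Arrays.asList(1, 2, 3), 2));
    }
}
```

In a long-running application the pool would normally be created once and shared, rather than created and shut down per call as in this self-contained sketch.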

The Fork/Join framework in Java 7 is also worth mentioning, as it provides tools to try to use all available processor cores to help speed up parallel processing. To improve parallel execution, the framework uses a ForkJoinPool to manage worker threads.
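The divide-and-conquer style that Fork/Join encourages can be sketched with a RecursiveTask that sums an array. The class name and the threshold value are assumptions made for illustration, and ForkJoinPool.commonPool() requires Java 8 or later (on Java 7 you would create a ForkJoinPool explicitly).

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/** Hypothetical example: sums an array by splitting it into halves recursively. */
final class ParallelSum extends RecursiveTask<Long> {

    private static final int THRESHOLD = 1_000; // assumed cut-off for sequential work

    private final long[] values;
    private final int from; // inclusive
    private final int to;   // exclusive

    ParallelSum(long[] values, int from, int to) {
        this.values = values;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += values[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        ParallelSum left = new ParallelSum(values, from, mid);
        ParallelSum right = new ParallelSum(values, mid, to);
        left.fork();                          // schedule the left half asynchronously
        return right.compute() + left.join(); // compute the right half in this thread
    }

    static long sum(long[] values) {
        return ForkJoinPool.commonPool().invoke(new ParallelSum(values, 0, values.length));
    }
}
```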

JVM tuning

Tuning heap size

Determining the appropriate JVM heap size for a production system is not a simple matter. The first step is to answer the following questions to predict memory requirements:

  1. How many different applications (EAR files, WAR files, JAR files) are planned to be deployed into a single JVM process?

  2. How many Java classes might be loaded at run time, including classes from third-party APIs?

  3. Estimate the space required for in-memory caching, for example, internal cache data structures loaded by the application (and third-party APIs), such as data cached from databases, data read from files, and so on.

  4. Estimate the number of threads the application will create.

These numbers are hard to estimate without testing them in real-world scenarios.

The best and most reliable way to learn about your application’s requirements is to perform actual load tests on your application and track performance metrics at run time. The Gatling-based test we discussed earlier is a good way to do this.
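While such a test is running, the JVM itself can report how much heap is in use. The sketch below reads these values through the standard java.lang.management API; the class name and output format are hypothetical. The initial and maximum heap sizes themselves are set with the standard -Xms and -Xmx options.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Hypothetical helper: reports current heap usage of the running JVM. */
final class HeapReport {

    private static final long MB = 1024L * 1024L;

    static String describeHeap() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getMax() returns -1 when no maximum is defined; it then prints as 0 MB here.
        return String.format("heap used=%d MB, committed=%d MB, max=%d MB",
                heap.getUsed() / MB, heap.getCommitted() / MB, heap.getMax() / MB);
    }

    public static void main(String[] args) {
        System.out.println(describeHeap());
    }
}
```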

Select the appropriate garbage collector

Stop-the-world (STW) garbage collection pauses are a major issue that affects the responsiveness of most client-facing applications and overall Java performance. However, current garbage collectors largely solve this problem and, with appropriate tuning and sizing, can make collection pauses barely noticeable.

Profilers, heap dumps, and verbose GC logging can help here. Again, note that all of this needs to be observed under real-world load patterns.
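Alongside GC logs, collection counts and accumulated collection time can be read from inside the JVM. This is a minimal sketch using the standard GarbageCollectorMXBean API; the class and method names are hypothetical, and the collector names printed depend on which garbage collector the JVM is running.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: summarizes garbage collection activity so far. */
final class GcReport {

    /** Total number of collections reported by all collectors. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if the collector does not report it
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
        System.out.println("total collections: " + totalCollections());
    }
}
```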

For more information about the different garbage collectors, see this guide.

JDBC performance

Relational databases are another common source of performance problems in Java applications. To get good response times for a full request, we naturally have to look at each layer of the application and consider how the code interacts with the underlying SQL database.

Connection pooling

Let’s start with the well-known fact that database connections are expensive. Connection pooling is an important first step in solving this problem.

We recommend HikariCP, a very lightweight (about 130 KB) and extremely fast JDBC connection pooling framework.
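To show why pooling helps, here is a deliberately simplified, hypothetical pool built on a blocking queue. It is not how HikariCP is implemented, and it omits validation, timeouts, and leak detection; it only illustrates that the expensive objects are created once and then reused.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

/** Toy fixed-size pool that illustrates reuse; not a replacement for a real connection pool. */
final class SimplePool<T> {

    private final BlockingQueue<T> idle;

    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // the creation cost is paid once, up front
        }
    }

    /** Borrows a resource, blocking until one is free. */
    T borrow() {
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a resource", e);
        }
    }

    /** Returns a borrowed resource so the next caller can reuse it. */
    void giveBack(T resource) {
        idle.offer(resource); // silently ignored if the pool is already full
    }

    int available() {
        return idle.size();
    }
}
```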

JDBC batching

Persistence should perform batch operations as much as possible. JDBC batching allows us to send multiple SQL statements in a single database interaction.

This can result in significant performance improvements on both the driver side and the database side. PreparedStatement is an excellent candidate for batching, and some database systems (such as Oracle) support batching only for prepared statements.

Hibernate, on the other hand, is more flexible, allowing us to quickly switch to batch operations with just one configuration change.
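With plain JDBC, batching through a PreparedStatement can be sketched as follows. The table name, column name, and batch size are assumptions made for illustration, and the caller is expected to supply an open Connection and manage transactions.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

/** Hypothetical example: inserts rows in batches instead of one round trip per row. */
final class EmployeeBatchInsert {

    private static final int BATCH_SIZE = 50; // assumed flush interval

    /** Inserts all names and returns the number of batches sent to the database. */
    static int insertAll(Connection connection, List<String> names) throws SQLException {
        String sql = "INSERT INTO employee (name) VALUES (?)"; // hypothetical table
        int batches = 0;
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            int pending = 0;
            for (String name : names) {
                ps.setString(1, name);
                ps.addBatch(); // queue the row on the client side
                if (++pending == BATCH_SIZE) {
                    ps.executeBatch(); // send the queued rows together
                    batches++;
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch(); // flush the final partial batch
                batches++;
            }
        }
        return batches;
    }
}
```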

Statement caching

Statement caching is another way to improve the performance of the persistence layer. It is a little-known optimization, but one that is easy to take advantage of.

You can cache a PreparedStatement either on the client side (the driver) or on the database side (the syntax tree or even the execution plan) as long as the underlying JDBC driver supports it.

Scaling up and scaling out

Database replication and sharding are great ways to increase throughput, and we should take full advantage of these proven architectural patterns to extend the persistence layer of enterprise applications.

Architectural improvements

Caching

The price of memory is low and getting lower, and the performance cost of retrieving data from disk or over the network remains high. Caching naturally becomes critical in terms of application performance.

Of course, introducing a separate caching system into your application’s topology does add architectural complexity, so you should take advantage of the existing caching capabilities of the libraries and frameworks you currently use.

For example, most persistence frameworks support caching. Web frameworks such as Spring MVC can also use the caching support built into Spring, as well as the powerful HTTP-level caching based on ETags.
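The basic idea behind all of these caches can be sketched in a few lines: remember the result of an expensive lookup and serve later requests from memory. The class below is hypothetical and deliberately naive; it has no eviction or expiry, which a framework-provided cache would normally handle.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Naive in-memory cache: loads each key at most once and never evicts. */
final class MemoizingCache<K, V> {

    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    MemoizingCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, invoking the loader only on the first request for a key. */
    V get(K key) {
        return entries.computeIfAbsent(key, loader);
    }

    int size() {
        return entries.size();
    }
}
```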

Horizontal scaling

No matter how much hardware we throw at a single instance, at some point it will not be enough. In short, scaling up has inherent limits, and when the system reaches them, scaling out horizontally is the only way to handle more load. This step is certainly quite complex, but beyond a certain threshold it is the only way to keep scaling the application.

This is well supported by most modern frameworks and libraries, and will get better and better. The Spring ecosystem has a complete set of projects dedicated to addressing this specific application architecture area, and most other frameworks have similar support.

In addition to improving Java performance, scaling out through a cluster has other benefits, such as redundancy and better handling of failures by adding new nodes, which improves overall system availability.

Conclusion

In this article, we’ve explored a number of concepts around improving the performance of Java applications. We started with load testing, APM tool-based application and server monitoring, followed by some best practices for writing high-performance Java code. Finally, we looked at JVM-specific tuning techniques, database side tuning, and architectural tweaking.