This article introduces how to tune the performance of Serverless applications.

Spring Boot is a suite built on the Java Spring framework that comes preloaded with Spring components, allowing developers to create standalone applications with minimal configuration. In the cloud-native world, many platforms can run Spring Boot applications, such as virtual machines and containers, but one of the most attractive options is to run them on a Serverless platform. In this series of articles, I analyze the merits and demerits of running Spring Boot applications on a Serverless platform from five aspects: architecture, deployment, monitoring, performance, and security. To make the analysis more representative, I chose Mall, an e-commerce application with more than 50K stars on GitHub, as the example. This is the fourth article in the series, and it shows you how to tune the performance of a Serverless application.

1. Instance startup speed optimization

In the previous tutorials, you experienced the beauty of Serverless: by simply uploading a code package or an image, you can launch an elastic, highly available web application. However, the first request still suffers from cold start latency. The Mall application takes about 30 seconds to start, so a user hitting a cold start experiences a long delay, which is a real drawback in this "real-time era". (A "cold start" happens when no instance is available to serve an invocation request: after a period without requests, the Serverless platform reclaims the function instance, and when the next request arrives, the system pulls up a new instance on demand. This process is called a cold start.)

Before optimizing the cold start, we should first analyze how much time each stage of the cold start takes. First, enable the tracing feature on the service configuration page of the Function Compute (FC) console.

Make a request to the mall-admin service. After the request succeeds, open the FC console and find the corresponding request information. Note that "View function errors only" should be turned off so that all requests are displayed. Monitoring metrics and trace data are collected with some delay; if the metrics are not displayed yet, refresh after a while. Locate the request with the cold start flag and click Request Details under More.

The call trace shows the time spent in each stage of the cold start, which includes the following steps:

  • PrepareCode: downloads the code package or image. Since we enabled image acceleration, the full image does not have to be downloaded, so this step takes very little time.
  • Runtime initialization: from starting the function instance until the Function Compute (FC) system detects that the application port is ready. This includes the application startup time. Run `s mall-admin logs` on the command line to check the corresponding log timestamps; they also show that the Spring Boot application spends most of this time starting up.
  • Application initialization: FC provides an Initializer interface, and you can put initialization logic in the Initializer.
  • Invocation latency: the time to process the request itself, which is very short.

From the trace diagram above, instance startup time is the bottleneck, and it can be optimized in several ways.

1.1. Using reserved instances

Java applications generally start slowly; an application often needs to interact with many external services during initialization, which takes time. Such steps are required by the business logic, and their latency is hard to optimize away. Therefore, Function Compute provides reserved instances. Reserved instances are started and stopped under the user's own control; they stay alive even when there are no requests, so they have no cold start problem. Of course, the user pays for the entire time the instance runs, even when it handles no requests.

In the Function Compute console, we can configure reserved instances for a function on the "Elastic Scaling" page.

The user configures the minimum and maximum number of instances in the console. The platform keeps at least the minimum number of instances as reserved instances, and never scales the function beyond the maximum. You can also set scheduled reservation rules or metric-based reservation rules.

After a reservation rule is created, the system creates the reserved instances. Once the reserved instances are in place, there is no cold start when we access the function again.

1.2. Optimize instance startup speed

Lazy initialization

In Spring Boot 2.2 and later, you can turn on a global lazy initialization flag. This speeds up startup at the cost of potentially longer latency on the first request, which has to wait for components to be initialized on first use.

The following environment variable can be configured for the related applications in s.yaml:

SPRING_MAIN_LAZY_INITIALIZATION=true

Turn off the optimizing compiler

By default, the JVM uses multiple tiers of JIT compilation. While the higher tiers improve the application's efficiency over time, they also increase memory usage and startup time. For short-running Serverless applications, consider stopping compilation at the first tier, sacrificing long-term efficiency for a shorter startup time.

The following environment variable can be configured for the related applications in s.yaml:

JAVA_TOOL_OPTIONS="-XX:+TieredCompilation -XX:TieredStopAtLevel=1"

Example of setting environment variables in s.yaml:

As shown in the figure below, configure the environment variables for the mall-admin function. Then run `sudo -E s mall-admin deploy` to redeploy.
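For reference, the environment section might look like the sketch below. The exact nesting of fields depends on the Serverless Devs component version you use, so treat this structure as an assumption and check it against your own s.yaml:

```yaml
# Hypothetical excerpt from s.yaml for the mall-admin function.
# Field layout is an assumption; verify against your component's schema.
services:
  mall-admin:
    component: fc
    props:
      function:
        name: mall-admin
        environmentVariables:
          SPRING_MAIN_LAZY_INITIALIZATION: "true"
          JAVA_TOOL_OPTIONS: "-XX:+TieredCompilation -XX:TieredStopAtLevel=1"
```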

Log in to the instance to check whether environment variables are correctly configured

Locate the request in the request list on the console function details page, and click the Instance Details link under More.

On the Instance Details page, click Login Instance.

Run the echo command in the instance shell to check whether the environment variables are set correctly.
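For example, a check like the following (inside a real FC instance the platform injects the variables; the exports below only make the snippet self-contained so you can reproduce it locally):

```shell
# In a real FC instance these variables are injected by the platform;
# we export them here only to make the check reproducible locally.
export SPRING_MAIN_LAZY_INITIALIZATION=true
export JAVA_TOOL_OPTIONS='-XX:+TieredCompilation -XX:TieredStopAtLevel=1'

echo "$SPRING_MAIN_LAZY_INITIALIZATION"   # should print: true
echo "$JAVA_TOOL_OPTIONS"                 # should print the JIT flags
```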

Note: For non-reserved instances, the Function Compute system automatically reclaims an instance after it has received no requests for a period of time, and the instance can then no longer be logged in to (the Login Instance button on the Instance Details page is grayed out). So make the call and log in promptly, before the instance is reclaimed.

Set proper instance parameters

When we choose an instance specification, such as 2C4G or 4C8G, we then want to know how many requests an instance can handle, so that resources are fully used while performance stays smooth, and so that the system can scale out instances quickly when the load exceeds the threshold. There are many possible measures of instance load: QPS exceeding a threshold, instance CPU/memory/network/load exceeding a threshold, and so on. Function Compute uses instance concurrency as the measure of an instance's load and as the basis for instance scaling. Instance concurrency is the number of requests an instance is allowed to execute simultaneously. For example, setting the instance concurrency to 20 means an instance can execute at most 20 requests at any one time.

Note: Instance concurrency is not the same as QPS. Concurrency is the number of requests being processed at the same moment, while QPS is the number of requests completed per second.
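The two are related through average latency (Little's law: concurrency ≈ QPS × latency), which gives a back-of-envelope starting point before any load testing. A minimal sketch with assumed numbers:

```shell
# Rough sizing with Little's law: in-flight requests = QPS x latency.
# The qps and latency values below are assumptions for illustration only.
qps=100            # sustained requests per second (assumed)
latency_ms=200     # average request latency in milliseconds (assumed)
concurrency=$(( qps * latency_ms / 1000 ))
echo "$concurrency"   # 20 requests in flight across the whole service
```

If load tests show a single instance handles a concurrency of 20 comfortably, one instance covers this assumed load, and the platform scales out as traffic grows beyond it.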

Using instance concurrency to measure load has the following advantages:

  • The system can compute the instance concurrency metric instantly and scale on it. Instance-level metrics such as CPU, memory, network, and load are collected in the background and take tens of seconds to aggregate, which cannot meet the elastic scaling needs of online applications.
  • Under varied conditions, instance concurrency reflects the system's load level stably. If request latency were used as the indicator, the system could hardly tell whether latency rose because instances were overloaded or because a downstream service became the bottleneck. A typical web application, for example, accesses a MySQL database; if the database becomes the bottleneck and request latency rises, scaling out not only is pointless but makes things worse by overwhelming the database. QPS is coupled to request latency, so it has the same problem.

Despite these advantages, users often do not know what instance concurrency to set. I recommend the following process to determine a reasonable value:

  1. Set the maximum number of instances for the function to 1, so that the measurement reflects the performance of a single instance.
  2. Use a load testing tool to stress the application and watch metrics such as TPS and request latency.
  3. Increase the instance concurrency step by step; if performance is still good, keep increasing it. If performance falls below expectations, turn the concurrency back down.
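The ramp in step 3 can be sketched as a simple loop; the load-tool invocation is left as a comment because the tool (Apache Bench is assumed here) and the endpoint URL depend on your deployment:

```shell
# Ramp the tested concurrency step by step (step 3 above).
# At each level, run your load tool and compare TPS/latency against the
# previous level, e.g.: ab -c "$c" -n 2000 "$URL"  (URL is a placeholder).
for c in 5 10 20 40; do
  echo "measure at concurrency=$c"
done
```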


This article is the original content of Aliyun and shall not be reproduced without permission.