Summary: Although the Serverless architecture lets us focus more on our business code, we still need to pay attention to configuration and cost, and, when necessary, optimize the configuration and code of our Serverless application accordingly.


Resource assessment is still important

Although the Serverless architecture is pay-as-you-go, that does not necessarily mean it is cheaper than renting traditional servers. If we fail to evaluate our project accurately and configure certain settings improperly, the cost of the Serverless architecture can be enormous.

In general, FaaS fees are directly related to three metrics:

  • the configured memory specification;
  • the time the program takes to execute;
  • the traffic charges incurred.

Typically, the time a program takes depends on the memory size and on the business logic the program itself is processing, while the traffic charge depends on the size of the packets exchanged between the program and the client. Among these three indicators, the memory specification is therefore the one where improper configuration most easily causes a large billing deviation. Taking Alibaba Cloud Function Compute as an example, suppose a Hello World program is executed 10,000 times a day; we can calculate the cost of instances of different specifications (excluding network cost):

[Table: monthly cost of the Hello World function at different memory specifications on Alibaba Cloud Function Compute]

As the table shows, when the program runs properly in 128MB, mistakenly setting the memory to 3072MB can multiply the monthly cost by 25. So before launching a Serverless application, we should evaluate its resources in order to arrive at a more reasonable configuration and further reduce our costs.
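
To make the billing model concrete, here is a rough sketch in Python, assuming cost = memory (GB) × duration (s) × invocations × unit price; the unit price and execution duration are hypothetical figures for illustration, not Alibaba Cloud's actual price list:

# A rough sketch of pay-per-use FaaS compute billing.
# PRICE_PER_GB_SECOND and DURATION_S are hypothetical assumptions;
# check your vendor's current price list.
PRICE_PER_GB_SECOND = 0.00002   # hypothetical unit price
DURATION_S = 0.1                # assumed execution time per invocation
INVOCATIONS_PER_DAY = 10000     # from the example above

def monthly_compute_cost(memory_mb):
    memory_gb = memory_mb / 1024
    return memory_gb * DURATION_S * INVOCATIONS_PER_DAY * 30 * PRICE_PER_GB_SECOND

for memory in (128, 3072):
    print(memory, round(monthly_compute_cost(memory), 4))

With a fixed duration, the compute portion of the bill scales linearly with memory, so 3072MB costs 24 times as much as 128MB in this sketch; per-request and traffic fees can move the overall ratio, as in the 25x figure above.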

Reasonable code package specifications

Every cloud vendor's FaaS platform limits the size of the code package. Setting those hard limits aside, the impact of package size shows up in the function's cold start process:

Function startup includes a code-loading step. If the uploaded code package is too large, or decompression is slow because it contains too many files, code loading takes longer and the cold start time grows accordingly.

Imagine two compressed code packages, one of 100KB and one of 200MB. Under ideal gigabit intranet bandwidth (that is, ignoring disk storage speed and so on), the maximum transfer speed is 125MB/s: the former downloads in well under 0.01 seconds, while the latter takes 1.6 seconds. On top of the download time there is also the time to decompress the files, so the cold start times of the two can differ by some 2s.
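
The back-of-the-envelope arithmetic above, in code form (the 125MB/s figure is the ideal gigabit intranet bandwidth from the example):

def download_seconds(package_bytes, bandwidth_bytes_per_second=125 * 1024 * 1024):
    # Ideal transfer time only; disk speed and decompression are ignored.
    return package_bytes / bandwidth_bytes_per_second

print(download_seconds(100 * 1024))         # 100KB -> about 0.0008s
print(download_seconds(200 * 1024 * 1024))  # 200MB -> 1.6s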

For a traditional Web interface, a response time of more than 2s is unacceptable to many businesses, so when packaging our code we should keep the compressed size as small as possible. Taking a Node.js project as an example, tools such as Webpack can be used when building the code package to shrink the dependencies, further reducing the overall package size and improving the function's cold start efficiency.

Reasonable instance reuse

On the FaaS platforms of the various cloud vendors, instance reuse exists to mitigate cold starts and use resources more rationally. Instance reuse means that when an instance completes a request it is not released but enters a "silent" state; if a new request is allocated to it within a certain period, the corresponding method is called directly without re-initializing all kinds of resources, which greatly reduces cold starts. To verify this, we can create two functions:

Function 1:

# -*- coding: utf-8 -*-

def handler(event, context):
    print("Test")
    return 'hello world'

Function 2:

# -*- coding: utf-8 -*-

print("Test")

def handler(event, context):
    return 'hello world'

Clicking the console's "Test" button several times for each function and checking whether "Test" appears in the log, we can tally the results: "Function 1" prints "Test" on every invocation, while "Function 2" prints it only when a new instance is initialized.

From this we can see that instance reuse really does occur: the statements outside the entry function of "Function 2" are not executed on every request. Going a step further, imagine that the print("Test") statement were instead initializing a database connection or loading a deep learning model. Written like "Function 1", that work would run on every request; written like "Function 2", the existing object would be reused by every request the instance serves.

So in a real project, some initialization operations can be implemented in the style of "Function 2", for example (see the sketch after this list):

  • In machine learning scenarios, load the model during initialization, so that it is not reloaded every time the function is triggered and responses are faster when instances are reused;
  • For databases and similar connections, create the connection object during initialization, avoiding the creation of a new connection object on every request;
  • For other scenarios that need to download or load files on first use, performing this work during initialization makes instance reuse more efficient.
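
A minimal sketch of the "Function 2" pattern for the database case, assuming a MySQL database reached through the pymysql package; the host and credentials are placeholders:

# -*- coding: utf-8 -*-
import pymysql

# Created once per instance, outside the handler, and then reused by
# every request the instance serves. Connection parameters are placeholders.
connection = pymysql.connect(
    host='db-host',
    user='user',
    password='password',
    database='demo',
)

def handler(event, context):
    with connection.cursor() as cursor:
        cursor.execute('SELECT 1')
        cursor.fetchone()
    return 'hello world'

Note that an instance can still be released after its silent period, so production code should also be prepared to re-establish the connection if it has gone stale.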

Take advantage of platform features

The FaaS platforms of the various cloud vendors each have some "platform features": capabilities that may not be specified or described in the CNCF WG-Serverless Whitepaper v1.0, but that a cloud platform offers based on its own business development and user demands, so that a given feature may exist on only one or a few platforms. Used properly, such features can often improve our business performance.

1. PreFreeze & PreStop

Taking Alibaba Cloud Function Compute as an example, during the platform's development the following user pain points emerged (especially around smoothly migrating traditional applications to the Serverless architecture):

  • Delayed or lost asynchronous background metric data: if data is not sent successfully during a request, it may be delayed until the next request, or data points may be discarded.
  • Added latency from sending metrics synchronously: if a Flush-like interface is called at the end of every request, it not only adds latency to each request but also puts unnecessary pressure on back-end services.
  • Graceful function offlining: an application needs to clean up connections, close processes, and report status when its instance shuts down, but in Function Compute developers have no way to know when an instance goes offline, and there is no Webhook to notify them of instance-offline events.

Runtime extensions were released to address these pain points. This feature extends the existing HTTP server programming model by adding PreFreeze and PreStop Webhooks; the extension developer implements an HTTP handler that listens for function instance lifecycle events, as shown below:

  • PreFreeze: before the Function Compute service decides to freeze the current function instance, it calls the HTTP GET /pre-freeze path. The extension developer is responsible for implementing the logic that completes the necessary work before the instance is frozen, such as waiting for metrics to be sent successfully. The function's InvokeFunction time does not include the execution time of the PreFreeze hook.

  • PreStop: before the Function Compute service decides to stop the current function instance, it calls the HTTP GET /pre-stop path. The extension developer is responsible for implementing the logic that completes the necessary work before the instance is released, such as closing database connections, or reporting and updating state.
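
As a rough illustration, a lifecycle handler along these lines might look as follows in Python; the /pre-freeze and /pre-stop paths follow the description above, while the port and the cleanup helpers are assumptions made for this sketch:

# -*- coding: utf-8 -*-
from http.server import BaseHTTPRequestHandler, HTTPServer

def flush_metrics():
    pass  # e.g. wait for buffered metric data to be sent successfully

def close_connections():
    pass  # e.g. close database connections, report and update status

class LifecycleHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/pre-freeze':
            flush_metrics()        # runs before the instance is frozen
        elif self.path == '/pre-stop':
            close_connections()    # runs before the instance is released
        self.send_response(200)
        self.end_headers()

if __name__ == '__main__':
    # Port 9000 is illustrative; a real extension listens wherever the platform expects.
    HTTPServer(('0.0.0.0', 9000), LifecycleHandler).serve_forever()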

2. Single-instance multi-concurrency

As we know, the FaaS offerings of most vendors isolate requests from one another: when a client initiates three requests to the platform at the same time, in theory three instances are spawned to handle them, which can involve both cold starts and questions of state sharing across requests. Some cloud vendors, however, offer single-instance multi-concurrency (for example, Alibaba Cloud Function Compute), which lets users set an InstanceConcurrency for a function, i.e. how many requests a single function instance can handle at the same time.

As shown in the figure below, suppose three requests need to be handled at the same time. With instance concurrency set to 1, Function Compute needs to create three instances, each handling one request; with instance concurrency set to 10 (one instance can handle 10 requests at the same time), a single instance is enough to handle all three.

Figure: effect of single-instance multi-concurrency
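
As a rule of thumb, the number of instances needed is the number of concurrent requests divided by InstanceConcurrency, rounded up:

import math

def required_instances(concurrent_requests, instance_concurrency):
    # Rough estimate of how many instances the platform must create.
    return math.ceil(concurrent_requests / instance_concurrency)

print(required_instances(3, 1))    # 3 instances
print(required_instances(3, 10))   # 1 instance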

The advantages of single-instance multi-concurrency are as follows:

  • Reduced execution time and cost. I/O-bound functions, for example, can be processed concurrently within one instance, reducing the number of instances and thus the total execution time.
  • State can be shared between requests. Multiple requests can share the database connection pool within an instance, reducing the number of connections to the database.
  • Lower cold start probability. Since multiple requests are processed within one instance, fewer new instances need to be created, so cold starts become less likely.
  • Lower VPC IP address usage. Under the same load, fewer instances are needed, so fewer VPC IP addresses are used.

Single-instance multi-concurrency suits a wide range of scenarios; functions that spend much of their time waiting for responses from downstream services are a natural fit. It is not suitable for every scenario, however: when a function holds shared state that cannot be accessed concurrently, or when a single request consumes a large amount of CPU and memory, single-instance multi-concurrency should not be used.
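
To illustrate the shared-state caveat, here is a hypothetical sketch: once InstanceConcurrency is greater than 1, concurrent requests share the same process, so mutable module-level state needs explicit protection:

import threading

counter = 0                  # shared by all requests served by this instance
lock = threading.Lock()

def handler(event, context):
    global counter
    with lock:               # without the lock, concurrent requests could race
        counter += 1
        current = counter
    return 'request #{} in this instance'.format(current)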


This article is the original content of Aliyun and shall not be reproduced without permission.