Source | Alibaba Cloud Native public account

In 2020, the COVID-19 pandemic catalyzed the shift of digital lifestyles into the norm. Today, enterprises are actively pursuing digital transformation and comprehensively improving efficiency. Almost no one denies that Serverless, born with the mission of "reducing cost and increasing efficiency", will become a new computing paradigm in the cloud era.

Serverless is unleashing a new revolution in cloud computing productivity by freeing developers from the burden of manual resource management and performance optimization.

However, putting Serverless into practice is often very difficult: how to migrate traditional projects to Serverless while ensuring business continuity during migration, how to provide complete development tools and effective debugging and diagnostic tools under a Serverless framework, and how to use Serverless to actually save costs. Each is a hard problem.

Deploying Serverless at large scale in mainstream scenarios is especially hard, which makes the industry's call for best practices for scaling Serverless into core scenarios all the more urgent.

A total transaction volume of 498.2 billion yuan and a peak of 583,000 orders per second: in 2020, Tmall Double 11 set another record. For Alibaba Cloud, this year's Double 11 had additional significance: Alibaba Cloud achieved China's first large-scale Serverless deployment in core business scenarios and withstood the world's largest traffic peak, a milestone for Serverless in production.

The pain points of adopting Serverless

Challenge 1: Long cold-start times

Rapid elasticity is a natural attribute of Serverless, but rapid elasticity requires extremely fast cold starts to back it up. For non-core services, millisecond-level latency is barely noticeable. For core business scenarios, however, delays above 500 ms degrade the user experience.

Serverless platforms use lightweight virtualization to keep driving cold-start times down, to under 200 milliseconds in some scenarios. But that is the ideal, isolated case. On a core business link, a function does not just run its own business logic; it also depends on middleware, databases, storage, and other backend services, and the connections to those services are established when the instance starts. This silently inflates the cold start, stretching it to the second level.

Second-level cold starts are unacceptable for core online business scenarios.

Challenge 2: Disconnected from the R&D process

Serverless's headline promise is writing business code as simply as writing a function: quick to build, quick to go live, letting developers write code on the cloud and launch with ease.

In reality, however, the requirements of core business pull developers back down from the cloud to face several soul-searching questions: How do you test? How do you do gray (canary) releases? How do you implement disaster recovery? How do you control permissions?

Answering these questions is dispiriting. It turns out that "the function runs" is only a small part of launching a core business; the remaining distance is vast.

Challenge 3: Middleware connectivity

Core online services are not standalone functions. They must connect to storage, middleware, and backend services to fetch data, compute on it, and return results to users.

Traditional middleware clients require a series of operations, such as establishing network connectivity and initializing connections, which often slows function startup.

In the Serverless scenario, instance lifecycles are short and instance counts are large, leading to frequent connection establishment and a huge number of connections. Optimizing network connectivity for the middleware clients commonly used by core online applications, and exposing monitoring data on call links so that SREs can better evaluate a function's downstream middleware dependencies, are therefore critical to migrating core applications to Serverless.

Challenge 4: Poor observability

Most users' core business applications adopt a microservice architecture, so their problems have microservice characteristics. For example, users need detailed views of every kind of system metric: not just business metrics, but also the resource metrics of the systems the business runs on. But in a Serverless scenario with no concept of machine resources, how do those metrics translate? Can exposing only request error rates and concurrency satisfy the business side?

In fact, the business side needs more than that. Whether the business trusts your technology platform depends in part on how well you do observability; observability is an important prerequisite for winning users' trust.

Challenge 5: Remote debugging is difficult

When a core business hits an online problem, investigation must begin immediately, and the first step of investigation is preserving the scene, then logging in to debug. With no machine-level concept in the Serverless scenario, letting a user "log on to a machine" is hard to do on today's Serverless infrastructure. The reasons go beyond technology, too, such as concerns about vendor lock-in.

These broad categories of pain points all concern the developer experience: for real development scenarios, is Serverless genuinely an improvement, or just old wine in a new bottle? Most core-application developers remain on the fence about Serverless, and some are openly skeptical that "FaaS is only suitable for small and non-core business scenarios."

The Double 11 "big exam" for Serverless

In December 2019, O'Reilly released a Serverless usage survey finding that 40% of respondents' organizations had adopted Serverless. In October 2020, the China Academy of Information and Communications Technology's "China Cloud Native User Research Report" noted that Serverless technology is heating up significantly, with nearly 30% of users already running it in production. In 2020, more and more enterprises chose to join the Serverless camp, eagerly awaiting more cases of Serverless at scale in core scenarios.

In the face of the steady growth in Serverless developers, Alibaba set the strategy of "building a Serverless Double 11" at the beginning of the year. The purpose was not just to withstand peak traffic, but to reduce costs and improve resource utilization: to forge Alibaba Cloud Serverless, through the "alchemy furnace" of Double 11, into a safer, more stable, and friendlier cloud product that helps users achieve greater business value.

Unlike the past eleven Double 11s, and following last year's move of Tmall Double 11's core systems to the cloud, Alibaba achieved comprehensive cloud-native operation based on its digital-native business operating system, and the upgrade of underlying core technology brought surging power and extreme efficiency. The IT cost per 10,000 peak order-creation transactions, for example, is down 80 percent from four years ago. Serverless also saw its first large-scale deployment in the core scenarios of Double 11.

Scenario 1: Multiple front-end scenarios

On November 11, 2020, Alibaba Group's front-end teams fully embraced cloud-native Serverless: dozens of BUs, including Taobao, Fliggy (Flying Pig), AutoNavi, CBU, ICBU, Youku, and Kaola, jointly implemented an integrated cloud R&D model centered on a Node.js FaaS online service architecture.

This year, on the premise of ensuring stability and high resource utilization, the R&D model for the key marketing and shopping-guide scenarios of multiple BUs was upgraded. The cloud-integrated R&D model supported by front-end FaaS improved average delivery by 38.89%. Relying on the convenience and reliability of Serverless, the Double 11 venue pages of Taobao, Tmall, Fliggy, and others quickly adopted SSR (server-side rendering) to improve the page experience; beyond safeguarding the big promotion, daily elastic computing costs also fell by 30% compared with before.

Scenario 2: Personalized recommendation scenario

Serverless's natural elastic scalability is the main reason the personalized recommendation business scenario moved to Serverless. The operation and maintenance cost of thousands of heterogeneous applications has always been a pain point here; Serverless further offloads O&M, letting developers focus on business algorithm innovation.

Today, this scenario's coverage keeps widening, spanning almost every Alibaba app: Taobao, Tmall, Alipay, Youku, Fliggy, and more. That scale allows further optimization of machine resource utilization; through intelligent scheduling, utilization reaches 60% at peak.

Scenario 3: Middle- and back-office scenarios

In 2020, Century Lianhua's Double 11 scaled elastically on Alibaba Cloud Function Compute (FC), applied in scenarios including SSR, online commodity flash sales, fixed-coupon issuance, industry shopping guides, and data computing. Peak business QPS exceeded 230% of Double 11 2019, R&D delivery efficiency improved by more than 30%, and elastic resource costs fell by more than 40%.

Of course, many more Serverless scenarios await enrichment by developers in other industries. Overall, FaaS delivered a very impressive performance this year: during the Double 11 promotion, it not only carried some core businesses but also hit a new traffic high, helping the business withstand a peak of one million QPS.

How did Alibaba Cloud beat the Serverless pain points?

So, in the face of the industry's common pain points in adopting Serverless, how did Alibaba Cloud overcome them?

1. Reserved mode + pay-as-you-go mode eliminates cold starts

In a major Serverless 2.0 upgrade in 2019, Alibaba Cloud Function Compute was the first to support reserved mode, followed by AWS Lambda a few months later.

Why did Alibaba Cloud raise this issue first? The Alibaba Cloud Serverless team constantly explores real business needs. The pay-as-you-go model is very attractive, but its cold-start time is too long, shutting out core online business. The team then focused on the core demands of online business: low latency with guaranteed resource elasticity. So what is the solution?

See the figure below: a very typical traffic curve, with reserved mode covering the fixed baseline and elastic capacity absorbing burst demand.

For burst scale-out, two methods are combined: scaling by resource metrics and scaling by request metrics. For example, a user can set the CPU scale-out threshold to 60%; when instance CPU reaches that threshold, scale-out is triggered. New requests are not routed to the new instance immediately, but only after the instance is ready, thereby avoiding cold starts.

Similarly, if the concurrency threshold is set to 30 (per-instance concurrency), the same process is triggered once the condition is met. If both metrics are configured, whichever condition is met first triggers the scale-out.
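The dual-threshold rule just described can be sketched in a few lines. This is a minimal illustration of the decision logic, not Function Compute's actual implementation; the threshold values mirror the examples in the text:

```python
# Minimal sketch of the dual-threshold scale-out rule: scale out as soon as
# EITHER the CPU metric or the per-instance concurrency metric crosses its
# configured threshold (whichever condition is met first wins).

CPU_THRESHOLD = 0.60          # e.g. 60% instance CPU utilization
CONCURRENCY_THRESHOLD = 30    # e.g. 30 in-flight requests per instance

def should_scale_out(cpu_utilization, per_instance_concurrency):
    """Return True if either configured threshold has been reached."""
    return (cpu_utilization >= CPU_THRESHOLD
            or per_instance_concurrency >= CONCURRENCY_THRESHOLD)
```

As the text notes, a True result here only begins instance preparation; traffic is diverted to the new instance after it is ready, which is what keeps cold starts off the request path.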

Through this variety of scaling methods, Alibaba Cloud Function Compute solves the Serverless cold-start problem and supports latency-sensitive business well.

2. Core business R&D efficiency improved 38.89%

"Improving efficiency" should be a Serverless advantage, but for core applications, "fast" equals "high risk": users must go through CI testing, daily testing, pre-release testing, gray deployment, and other processes to assure function quality. These processes are a stumbling block to FaaS adoption in core applications.

To solve this problem, Alibaba Cloud Function Compute's strategy is to "be integrated": combine the strengths of existing R&D platforms with Function Compute, so that users keep their CI/CD processes while still enjoying the Serverless dividend, helping them cross the chasm of adopting FaaS.

Within Alibaba Group, Function Compute exposes standard OpenAPIs that are integrated with each core application's R&D platform; verified by Double 11 business development, R&D efficiency improved by 38.89%. On the public cloud, we integrate with the Yunxiao cloud DevOps platform, making the R&D process and FaaS fit together more closely and smoothly, helping businesses outside the Group improve staff efficiency.

3. Middleware connectivity

Core applications cannot be separated from upstream and downstream collaboration. Once a core application uses Function Compute, how does it work with middleware? Traditional application development integrates various middleware SDKs and packages them for release, but for Serverless functions, code-package size is a hard constraint that directly affects cold-start time.

Alibaba Cloud Function Compute went through two stages of development. In the first stage, we built a middleware Proxy and reached middleware through it; functions interact with the Proxy over a single protocol, offloading the burden of middleware SDKs.

In the second stage, as middleware capabilities sank into the platform, control-plane requirements also came onto the agenda, such as command delivery, traffic management, and configuration pull. During this period, Alibaba Cloud embraced the open-source project Dapr, offloading middleware interaction costs via the sidecar pattern.
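To make the sidecar idea concrete, here is a hedged sketch of how a function can talk to a local Dapr sidecar over plain HTTP instead of bundling a middleware SDK. The URL shape follows Dapr's documented service-invocation API; the app id "inventory" and method "check" are hypothetical names for illustration:

```python
import json
import urllib.request

DAPR_HTTP_PORT = 3500  # Dapr's default HTTP port for the local sidecar

def dapr_invoke_url(app_id, method, port=DAPR_HTTP_PORT):
    """Build Dapr's service-invocation URL for a target app and method."""
    return f"http://localhost:{port}/v1.0/invoke/{app_id}/method/{method}"

def call_inventory(item_id):
    # Would POST through the local sidecar, which handles the actual
    # middleware interaction. Requires a running Dapr sidecar to execute.
    req = urllib.request.Request(
        dapr_invoke_url("inventory", "check"),
        data=json.dumps({"item": item_id}).encode(),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)
```

The design benefit matches the article's point: the function speaks one simple protocol (HTTP to localhost), while SDK logic, connection management, and control-plane features live in the sidecar.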

The above scheme is built on Function Compute's Custom Runtime and Custom Container features.

4. The ultimate development experience

Remote debugging, log viewing, link tracing, resource utilization, and a polished surrounding toolchain are all capabilities developers require. Alibaba Cloud Function Compute spun up several work streams in parallel: it first combined with Tracing/ARMS to build clear link-tracing capability, and integrated with SLS (Log Service) for comprehensive business data monitoring.

On top of that, businesses can customize monitoring on demand: the platform embraces the open-source Prometheus to expose resource utilization, and supports remote debugging through a WebIDE.
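What "exposing resource utilization to Prometheus" means concretely is serving metrics in the Prometheus text exposition format, which any Prometheus server can scrape. The sketch below hand-rolls that format for two illustrative metric names (in practice a client library such as prometheus_client would render this for you; the names and values here are assumptions, not Function Compute's actual metrics):

```python
# Render a {name: (help_text, value)} snapshot in the Prometheus text
# exposition format. An HTTP endpoint would serve this string for scraping.

def render_metrics(metrics):
    lines = []
    for name, (help_text, value) in metrics.items():
        lines.append(f"# HELP {name} {help_text}")   # human-readable help
        lines.append(f"# TYPE {name} gauge")         # metric type declaration
        lines.append(f"{name} {value}")              # the sample itself
    return "\n".join(lines) + "\n"

# Illustrative resource metrics a Serverless platform might surface.
snapshot = {
    "function_cpu_utilization": ("Instance CPU utilization ratio", 0.42),
    "function_active_instances": ("Instances currently serving traffic", 7),
}
exposition = render_metrics(snapshot)
```

This is exactly the kind of machine-resource signal the "poor observability" challenge said was missing from a machine-less platform: the platform computes the values, and exposes them in a standard format the business side can already consume.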

In addition, Alibaba Cloud recently open-sourced Serverless Devs, a developer tool without vendor lock-in, to help developers double their development and O&M efficiency under a Serverless architecture. Developers can quickly create applications, develop, test, release, and deploy projects, achieving full lifecycle management.

Serverless had many initial pain points; why was Alibaba Cloud able to bring Serverless to all walks of life?

First, Alibaba Cloud provides the most complete Serverless product matrix among cloud vendors, including Function Compute (FC), Serverless App Engine (SAE), Serverless Kubernetes (ASK) for container orchestration, and Elastic Container Instance (ECI).

A rich product matrix covers different scenarios. For event-triggered scenarios, Function Compute provides rich event-source integrations and extreme elasticity with 100-millisecond scaling. For microservice applications, Serverless App Engine enables migration with zero code changes, so microservices can also enjoy the Serverless dividend.

Second, Serverless is a rapidly developing field, and Alibaba Cloud keeps expanding the product boundaries of Serverless. For example, Function Compute supports container images, a prepaid (subscription) mode, concurrent execution of multiple requests within an instance, and other industry-first features, largely eliminating problems such as the performance jitter caused by cold starts and greatly expanding Function Compute's application scenarios.

Finally, the Alibaba economy has very rich business scenarios that further polish Serverless practice. This year, the Double 11 core business scenarios of Taobao, Kaola, Fliggy, AutoNavi, and other BUs all used Alibaba Cloud Function Compute and successfully withstood the Double 11 peak.

Serverless leads the next decade

"The greatest improvement in the productive powers of labour, and the greater part of the skill, dexterity, and judgement with which it is anywhere directed, seem to have been the effects of the division of labour." This excerpt from The Wealth of Nations stresses the stakes of the division of labour: in any industry, the larger the market, the finer the division of labour becomes. This is the famous "Smith's theorem".

The same theorem applies to the industry application software market. As traditional industries entered the Internet phase, the market grew ever larger and the division of labour ever finer: the era of hosted physical machines became history, replaced by a mature IaaS layer, then by container services, which are now likewise an industry standard.

So what defines the next decade of technology? The answer is Serverless. It levels the playing field for developers with limited budgets or operations experience: even in business cases like fighting traffic flood peaks, the vast majority of developers can now cope with ease. It not only dramatically lowers the technical bar for R&D and greatly raises R&D efficiency, but also provides end-to-end online tooling such as alerting and traffic observation, freeing engineers from operations. In short, Serverless is a finer-grained division of labour: business developers no longer attend to low-level operations and maintenance, and focus solely on business innovation, greatly improving labour productivity. This is the effect of "Smith's theorem", and the underlying reason Serverless is an inevitable future trend.

At present, Alibaba Cloud's entire product system has gone Serverless: more than 70% of its products are available in Serverless form. Serverless products such as object storage, message middleware, API Gateway, and Table Store are already well known to developers. Over the next decade, Serverless will redefine the programming model of the cloud and reshape the way enterprises innovate.

Course recommendation

To let more developers enjoy the dividends brought by Serverless, we have gathered 10+ technical experts in the Serverless field from Alibaba to create open courses best suited for developers: learn and apply immediately, and easily embrace the new paradigm of cloud computing, Serverless.

Click for the free courses: developer.aliyun.com/learning/ro…