On January 18, 2019, the Alibaba MaxCompute developer community and the Alibaba Cloud community jointly organized the "Alibaba Cloud Computing Developer Salon: Big Data Technology Special Session" near Beijing Union University. At this technology salon, Alibaba senior technical expert Wu Yongming shared how MaxCompute builds highly available big data services on a Serverless architecture, and the secrets behind MaxCompute's low computing costs.

The following content is based on the video and slides of the talk.

What is MaxCompute

Big Data in Alibaba

First, some background on Alibaba's big data technology. As shown in the figure below, Alibaba began laying out its big data technology very early; it could even be said that Alibaba Cloud was founded to solve Alibaba's big-data-related technical problems. Today, almost every business unit (BU) in Alibaba uses big data technology, and that use is not only broad but also deep. Moreover, the big data systems of the whole group ultimately converge into one.







Overview of Computing Platform



The Alibaba Cloud Computing Platform business division is responsible for integrating Alibaba's big data systems and for the storage- and computing-related R&D of the entire group. The diagram below shows a simplified architecture of Alibaba's big data platform. At the bottom sits the unified storage platform, Pangu, which stores the big data. Storage is static, and computing power is needed to mine the value of the data, so the platform also provides various computing resources such as CPUs, GPUs, FPGAs, and ASICs. To make good use of these resources, they must be abstracted uniformly and managed effectively; at Alibaba Cloud, this unified resource management and scheduling system is called Fuxi. On top of Fuxi, Alibaba Cloud has developed various computing engines, such as the general-purpose engine MaxCompute, the stream-computing engine Blink, the machine-learning platform PAI, and the graph-computing platform Flash. Above these engines, the platform offers a variety of development environments on which the businesses themselves are built.





This article focuses on the general-purpose computing engine MaxCompute, a general-purpose distributed big data processing platform that stores massive amounts of data and performs general computation over it.



Let's start with a few numbers. First, 99% of Alibaba's internal data storage and 95% of its computing tasks currently run on MaxCompute. MaxCompute is in effect Alibaba's data hub: the data produced by every BU in the group is eventually aggregated into MaxCompute, so the group's data assets keep growing. Second, some scale and performance indicators. MaxCompute performs 2.5 times better than the average open-source system in BigBench tests. Within Alibaba Group, MaxCompute clusters total tens of thousands of machines and run millions of jobs per day. The data stored in MaxCompute clusters has long since reached the exabyte (EB) level, which is world-leading. MaxCompute is available not only within the group but also to external enterprises, with more than 50 industry solutions to date. Its storage and computing volumes grow at a very high rate every year, and through Alibaba Cloud, MaxCompute is deployed not only in China but also in many countries and regions overseas.



MaxCompute system architecture



The system architecture of MaxCompute resembles that of other big data computing engines in the industry. As shown in the figure below, users connect through a number of clients, and data enters MaxCompute through the access layer. In the middle sits a management layer that manages users' jobs and carries a variety of functions, including the most basic ones in big data: SQL compilation, optimization, and execution. MaxCompute also provides a distributed metadata service, because without metadata management there is no way to know what the stored data actually represents. At the bottom of the architecture is where the actual storage and computation take place.
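To make the layering concrete, here is a deliberately simplified, hypothetical model of the path a job takes through these layers; none of the names below are real MaxCompute interfaces.

```python
# A toy model of the architecture described above: client -> access layer ->
# management layer (SQL compile/optimize) -> metadata service -> storage/compute.
# All names here are illustrative, not real MaxCompute interfaces.

# Metadata service: records what each table's data is and where it lives.
METADATA = {"page_views": "pangu://cluster-1/tables/page_views"}

def access_layer(sql: str) -> dict:
    """Entry point for clients and data transfer; authenticates the request."""
    return {"sql": sql, "authenticated": True}

def management_layer(job: dict) -> str:
    """Manages user jobs; compiles and optimizes the SQL into an execution plan."""
    assert job["authenticated"]
    return f"optimized-plan[{job['sql']}]"

def storage_and_compute(plan: str, table: str) -> str:
    """Bottom layer: looks up metadata, then runs the plan where the data lives."""
    return f"executing {plan} at {METADATA[table]}"

job = access_layer("SELECT COUNT(*) FROM page_views")
print(storage_and_compute(management_layer(job), "page_views"))
```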







MaxCompute Serverless



The concept of Serverless is very popular these days, and from the perspective of MaxCompute's developers, that popularity is both gratifying and a little frustrating. The gratifying part: the MaxCompute team began building this kind of capability years before the term Serverless was coined, and its design philosophy is fully aligned with Serverless, which shows that MaxCompute laid out this direction early and holds fairly advanced technology. The frustrating part: although the team embraced the Serverless idea early and recognized the value of this technical route, it did not package the capability as a product early enough.







Before the Serverless concept was proposed, big data practice basically followed these steps: first, buy servers and set up a cluster; once the hardware is ready, install the "full bucket" of big data processing software; then import the data; then write the queries and computations the business needs; and finally obtain the results. Each of these steps involves operations work. If the big data system is built in-house, operations and maintenance run through the entire process, and the ops engineers must stay on call at all times in case the system has problems.



Looking back at these steps, however, the real business value lies in writing the queries and performing the actual computation in step 4; the other steps consume a great deal of resources and manpower, an extra burden for a business-oriented company. Serverless now offers an opportunity to eliminate that burden.

With MaxCompute Serverless, only four steps are needed to realize business value. The first step is to open an account and create a project, which serves as the carrier for both data storage and computation. The second step is to upload the data; the third is to write the queries and run the computation; the fourth is to retrieve the results. No operations work is required anywhere in the process, so the development team can focus on the core areas that generate business value, which is the strength of Serverless. It exemplifies how new technology changes daily work, letting people concentrate on the parts that create core value rather than on ancillary chores. The sketch below illustrates these four steps.
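As an illustration, here is a minimal sketch of those four steps using the PyODPS Python SDK. The credentials, project name, endpoint, table, and data below are placeholders, and the exact setup may differ by account and region.

```python
# A minimal sketch of the four steps using the PyODPS SDK (pip install pyodps).
# Credentials, project name, endpoint, and table are placeholders.
from odps import ODPS

# Step 1: connect to a project created in the MaxCompute console.
o = ODPS('<access-id>', '<access-key>',
         project='my_project',
         endpoint='https://service.odps.aliyun.com/api')

# Step 2: upload data into a table.
o.create_table('page_views', 'url string, hits bigint', if_not_exists=True)
o.write_table('page_views', [['/home', 100], ['/cart', 42]])

# Step 3: submit a query; MaxCompute provisions resources behind the scenes.
instance = o.execute_sql(
    'SELECT url, SUM(hits) AS total FROM page_views GROUP BY url')

# Step 4: read back the results.
with instance.open_reader() as reader:
    for record in reader:
        print(record['url'], record['total'])
```

Note that no cluster setup, software installation, or ops work appears anywhere in this flow; that is precisely the point of the Serverless model.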

Unlike open-source software or vendors that simply hand customers a big data software package, MaxCompute is delivered as a big data service: one that achieves 365-day x 24-hour high reliability, provides both computing and storage capabilities, and lets users truly use big data technology out of the box.

Turning a big data platform into a Serverless service raises many challenges. This article focuses on three of them:

  • How a big data service iterates and upgrades continuously
  • The trend toward automation and intelligence in big data services
  • Data security

Challenges and solutions for continuous improvement and release



For anyone implementing a big data system, the topic of continuous improvement and release is unavoidable. Users constantly raise new requirements, and to meet them the platform must keep upgrading and improving; at the same time, it faces millions of jobs running on it every day and must keep the service available 24/7. How can the platform be upgraded smoothly and safely, so that users never notice a new release? Beyond that, the new version must be stable, with no bugs or performance regressions, and when problems do occur, losses must be stopped quickly. The tension between testing and data security must also be managed throughout. In short, continuously improving and releasing a big data platform is like changing the engine on a plane in mid-flight.





Facing these challenges, the Alibaba Cloud MaxCompute team has done a great deal of work. In the big data field it is now an industry consensus that raw MapReduce is not user-friendly, and SQL-like big data engines have become mainstream. SQL-like processing has three key stages: compilation, optimization, and execution. MaxCompute therefore attacks the heart of the problem, developing the Playback tool for compilation and optimization and the Flighting tool for execution. Beyond these two tools for the testing and validation phase, MaxCompute also provides grayscale capabilities for going live.

The Playback tool



Playback was born out of MaxCompute's need to rapidly improve the expressiveness of its compiler and the capability of its optimizer. After major changes to the compiler and optimizer, ensuring they are problem-free becomes a challenge. The original approach was to run some existing SQL through the new compiler and optimizer and analyze the results manually to find problems introduced by the changes. Done by hand, however, this could take a hundred years to complete, which is obviously intolerable. The MaxCompute team therefore wanted to use the power of the big data computing platform itself to validate the new compiler and optimizer.



The principle is to use MaxCompute's own flexible and powerful computing capability: collect real user queries, feed them back into the MaxCompute system, run them at large scale in a distributed fashion, and compile and optimize all of these jobs with the new compiler and optimizer. Because MaxCompute follows standard industry patterns, it is easy to plug a verification plug-in into each task to check the compilation and optimization results and expose any problems. A simplified sketch of the idea follows.
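The sketch below is a hypothetical illustration of the Playback idea, not MaxCompute's actual tooling: replay collected production queries through the old and new compiler/optimizer side by side, and flag any query whose plan changes or whose compilation fails for human review.

```python
# Hypothetical Playback sketch: replay real queries through two compiler
# versions in parallel and flag divergences. The compile_* functions are
# stand-ins, not real MaxCompute APIs.
from concurrent.futures import ThreadPoolExecutor

def compile_v1(sql: str) -> str:
    """Stand-in for the current compiler/optimizer; returns a plan string."""
    return f"plan[{sql.lower()}]"

def compile_v2(sql: str) -> str:
    """Stand-in for the improved compiler/optimizer under test."""
    return f"plan[{sql.lower()}]"

def playback_one(sql: str) -> tuple[str, str]:
    try:
        old, new = compile_v1(sql), compile_v2(sql)
        # A changed plan is not necessarily wrong (it may be an intended
        # improvement), so it is flagged for review rather than failed outright.
        return sql, "ok" if old == new else "plan changed"
    except Exception as exc:           # a compile crash is itself a regression
        return sql, f"failed: {exc}"

queries = ["SELECT 1", "SELECT url FROM page_views"]  # real user queries in production

# In the real system this fans out across the cluster; threads stand in here.
with ThreadPoolExecutor(max_workers=8) as pool:
    for sql, verdict in pool.map(playback_one, queries):
        if verdict != "ok":
            print("review needed:", sql, verdict)
```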



In summary, Playback leverages MaxCompute's flexible and powerful computing capability to analyze huge numbers of user tasks, and relies on its UDF support and good isolation to run compilation, optimization, and validation. Validating a new version of the compiler and optimizer thus becomes easy, powered by the big data platform itself.

The Flighting tool



Flighting is a tool specifically for the runtime. The common way to test an improved executor is to set up a dedicated test cluster. This is the most natural approach, but it wastes enormous resources, because test clusters are expensive; and because the test environment differs from the real one, a test cluster cannot simulate real scenarios. Validation in a test environment also requires testers to fabricate data, which tends to be too simple. The alternative is to take data from production, but that requires both the user's consent and data desensitization, a very troublesome process: it carries a risk of data leakage, and the desensitized data may differ enough from the real data that problems in the new functionality still cannot be reproduced.



MaxCompute's runtime validation tool, Flighting, instead uses the real online clusters and environment. Through resource isolation, 99% of the computing resources in a MaxCompute cluster continue to serve user jobs, while the remaining 1% is handed to Flighting to test new executor functionality. This is very efficient: there is no need to move or fabricate data, tests run directly in the production environment, and problems in the real executor are exposed easily. The sketch below illustrates the resource split.
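The following is a hypothetical sketch of the Flighting-style resource split, not MaxCompute's internals: a simple admission check that hard-caps test jobs at 1% of cluster capacity so production jobs always keep the remaining 99%.

```python
# Hypothetical sketch: admit "flighting" test jobs only within a 1% budget,
# so production jobs always retain 99% of the cluster's slots.
TOTAL_SLOTS = 10_000
FLIGHTING_SHARE = 0.01          # 1% of the cluster reserved for executor tests

used = {"production": 0, "flighting": 0}

def try_admit(kind: str, slots: int) -> bool:
    """Admit a job if its pool still has room; flighting is hard-capped."""
    if kind == "flighting":
        cap = TOTAL_SLOTS * FLIGHTING_SHARE
    else:
        cap = TOTAL_SLOTS * (1 - FLIGHTING_SHARE)
    if used[kind] + slots <= cap:
        used[kind] += slots
        return True
    return False

print(try_admit("flighting", 50))     # True: within the 1% test budget (100 slots)
print(try_admit("flighting", 80))     # False: would exceed the flighting cap
print(try_admit("production", 9000))  # True: production keeps its 99%
```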



Grayscale release



The MaxCompute platform has grayscale release capabilities that minimize risk by classifying tasks by importance, rolling out new versions at a fine-grained level, and supporting instantaneous rollback, as the sketch below illustrates.
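Here is a hypothetical sketch of grayscale routing, not MaxCompute's internals: less-important jobs are moved to the new engine version first, a per-importance ramp percentage controls exposure, and a single flag rolls everything back instantly.

```python
# Hypothetical grayscale routing: importance-based ramp-up plus instant rollback.
import hashlib

RAMP_PERCENT = {"low": 50, "medium": 10, "high": 0}  # rollout share by importance
ROLLED_BACK = False                                   # flip to revert instantly

def engine_version(job_id: str, importance: str) -> str:
    if ROLLED_BACK:
        return "v1"
    # Stable hash so the same job always lands in the same rollout bucket.
    bucket = int(hashlib.md5(job_id.encode()).hexdigest(), 16) % 100
    return "v2" if bucket < RAMP_PERCENT[importance] else "v1"

print(engine_version("job-42", "low"))   # low-importance jobs may get v2 early
print(engine_version("job-42", "high"))  # critical jobs stay on v1 until last
```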



The continuous development and validation iteration process of the MaxCompute platform

Shown below is the continuous development and validation iteration process of the MaxCompute platform, which combines the Playback tool for the compiler and optimizer, the Flighting tool for the runtime, and grayscale release. In this process, a second cycle does not have to wait for the first to finish before starting, making the whole R&D process more efficient. MaxCompute's continuous development and delivery process has strong technical advantages in compilation optimization, execution, and grayscale release, and strong competitiveness across the industry. It allows MaxCompute to evolve faster and bring more and better features to customers, while ensuring that platform upgrades never disrupt users' normal use.



Automation and intelligence

As artificial intelligence and big data technologies develop, services now need to be not only highly available but also automated. This can be viewed from two perspectives: the service itself, and the user's operations and maintenance. For the service itself, the first step is automation; beyond that lies intelligence.



Any service has all kinds of problems and bugs, and the traditional way to handle them relies on human effort. The ultimate goal of service automation and intelligence is for the system to find and fix problems through self-healing, with no manual repair; this goal is reached step by step. From the user's operations perspective, moving big data work onto a cloud platform such as Alibaba Cloud MaxCompute effectively transfers the customer's operations burden to the MaxCompute team. The next step is for the MaxCompute team itself to shed the tedious operations work of the platform by making service operations automated and intelligent, freeing the team's energy for more valuable work. The purpose of automating big data services is thus to improve service reliability, liberate productivity, and give back to users by doing more valuable things. The basic idea is to collect and monitor service metrics continuously and act promptly on anomalies, as sketched below.
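The following is a hypothetical sketch of that basic loop (collect a metric, detect anomalies against a rolling baseline, trigger automated mitigation); it is not MaxCompute's actual operations system, and the metric and healing action are stand-ins.

```python
# Hypothetical self-healing loop: sample a metric, apply a 3-sigma anomaly
# rule over a rolling window, and trigger automated mitigation on anomalies.
import statistics
import time

history: list[float] = []

def collect_metric() -> float:
    """Stand-in for sampling a service indicator, e.g. job queue latency."""
    return 1.0

def self_heal():
    """Stand-in for an automated action, e.g. restarting an unhealthy worker."""
    print("mitigation triggered")

def check_once():
    value = collect_metric()
    history.append(value)
    window = history[-30:]
    if len(window) < 30:
        return                                    # no baseline yet
    mean, stdev = statistics.mean(window), statistics.stdev(window)
    if stdev and abs(value - mean) > 3 * stdev:   # simple 3-sigma rule
        self_heal()

for _ in range(3):        # in production this loop runs continuously
    check_once()
    time.sleep(1)
```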

Beyond automating the big data service itself, it is also critical to automatically identify user queries that can be optimized and offer optimization suggestions. The SQL a user writes is not necessarily the most efficient; there may be better techniques, so the platform should recognize the optimizable parts of a query and recommend improvements. The whole industry is moving in this direction, and the MaxCompute team is conducting research in this area. A toy illustration of rule-based query advice follows.
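This toy example, which is in no way the actual MaxCompute advisor, shows the flavor of such a feature: scan SQL text for common anti-patterns and emit suggestions.

```python
# A toy rule-based query advisor: pattern-match SQL anti-patterns and suggest
# fixes. Real advisors work on the optimized plan, not on raw text.
import re

RULES = [
    (re.compile(r"select\s+\*", re.I),
     "Avoid SELECT *: project only the needed columns to cut I/O."),
    (re.compile(r"\bjoin\b(?![^;]*\bon\b)", re.I),
     "JOIN without ON: likely an accidental Cartesian product."),
    (re.compile(r"\border\s+by\b(?![^;]*\blimit\b)", re.I),
     "Global ORDER BY without LIMIT forces a full sort on a single node."),
]

def advise(sql: str) -> list[str]:
    return [tip for pattern, tip in RULES if pattern.search(sql)]

for tip in advise("SELECT * FROM a JOIN b ORDER BY ts"):
    print("suggestion:", tip)
```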

MaxCompute's low cost makes it more competitive, but behind the low cost are genuine technology dividends. To lower the cost of big data workloads, MaxCompute works mainly on three fronts: computation, storage, and scheduling.

  • In computation, MaxCompute optimizes the computing engine itself to improve performance and reduce the resources each job occupies: a task either finishes sooner or needs fewer resources, and both outcomes free capacity for more computing tasks.
  • In storage, there are also many optimizations: better compression algorithms, more advanced columnar storage, tiered data placement, archiving of cold data, and prompt reclamation of useless data (a cold-data lifecycle rule is sketched after this list).
  • In scheduling, the cost-reduction methods fall mainly into intra-cluster scheduling and cross-cluster global scheduling.
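As promised above, here is a hypothetical sketch of one storage optimization, a cold-data lifecycle rule. The thresholds are illustrative, not MaxCompute's actual policy.

```python
# Hypothetical cold-data lifecycle: archive data untouched for 90 days to a
# cheaper, high-compression tier, and reclaim data past its lifecycle entirely.
from datetime import datetime, timedelta

ARCHIVE_AFTER = timedelta(days=90)
DELETE_AFTER = timedelta(days=365)

def storage_action(last_access: datetime, now: datetime) -> str:
    age = now - last_access
    if age > DELETE_AFTER:
        return "reclaim"          # lifecycle expired: free the space
    if age > ARCHIVE_AFTER:
        return "archive"          # cold data: move to the high-compression tier
    return "keep"                 # hot data stays on the fast tier

now = datetime(2019, 1, 18)
print(storage_action(datetime(2018, 12, 1), now))  # keep
print(storage_action(datetime(2018, 9, 1), now))   # archive
print(storage_action(datetime(2017, 1, 1), now))   # reclaim
```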



Intra-cluster scheduling has many problems to solve: at any given moment, many job instances are waiting to be scheduled, while on the other side many resources are waiting for jobs to run.



Intra-cluster scheduling



How to match the two is critical, and it is a real test of scheduling capability. The MaxCompute big data platform currently makes more than 4,000 scheduling decisions per second, each one placing a job of a certain size onto suitable resources, so that the job gets the resources it needs without waste.





For performance, scheduling must be incremental so that resources are used well. The platform also provides flexible quota capabilities to enable peak shaving and valley filling when many users run computing tasks: in real use, not everyone submits jobs at the same moment, so resource usage has peaks and troughs, and flexible quotas smooth them out to keep resources fully utilized. Optimal task scheduling must balance latency against throughput, and load balancing must avoid hotspots. In addition, MaxCompute's scheduling decision system supports task priorities and preemption, achieving both responsiveness and fairness, as the sketch below illustrates.
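The following is a hypothetical sketch of priority scheduling with preemption; it is not Fuxi's actual algorithm, only the shape of the idea: higher-priority jobs run first and may evict lower-priority ones, which are then re-queued.

```python
# Hypothetical priority scheduling with preemption over a pool of slots.
import heapq

free_slots = 100
running: list[tuple[int, str, int]] = []   # (priority, job, slots), min-heap

def schedule(job: str, priority: int, slots: int) -> bool:
    global free_slots
    # Preempt the lowest-priority running jobs while that frees enough room.
    while free_slots < slots and running and running[0][0] < priority:
        p, victim, s = heapq.heappop(running)
        free_slots += s
        print(f"preempted {victim} (prio {p}); it will be re-queued and retried")
    if free_slots >= slots:
        heapq.heappush(running, (priority, job, slots))
        free_slots -= slots
        return True
    return False

schedule("etl_batch", priority=1, slots=90)   # low-priority batch fills the pool
schedule("dashboard", priority=9, slots=30)   # high priority preempts the batch
```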

Scheduling also distinguishes services from jobs for complex task scheduling. A job is a one-shot piece of work, whereas a service keeps processing data continuously once started. The two differ greatly: service scheduling demands high reliability and must not be interrupted, because recovery after an interruption is very expensive, while jobs tolerate interruption better and can simply be restarted.

Level-2 quota groups

For scheduling within a single cluster, one notable concept is the level-2 quota group, which is rare among big data platforms. First, quota groups themselves: simply put, a quota group is a resource pool that groups together resources from the physical cluster. On a big data platform, jobs run inside quota groups, and jobs in the same quota group share all of its resources. A single-level quota group policy has problems, however. For example, if a level-1 quota group holds 1,000 CPU cores shared by 100 users, one user may at some point submit a huge job that occupies all 1,000 cores, forcing every other user's job to wait.



To solve this, level-2 quota groups are needed. They further subdivide a level-1 quota group and set an upper limit on the resources each user can use. Each level-1 quota group can be split into multiple level-2 quota groups, which still share resources within the level-1 group, but no user can exceed their level-2 cap, protecting everyone's experience. The sketch below illustrates the idea.
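Here is a hypothetical sketch of the two-level structure using the 1,000-core example from above; the class and numbers are illustrative, not MaxCompute's implementation.

```python
# Hypothetical two-level quota groups: level-2 groups share the level-1 pool
# but each is capped, so one user can never drain the whole pool.
class QuotaGroup:
    def __init__(self, name: str, capacity: int):
        self.name, self.capacity, self.used = name, capacity, 0

level1 = QuotaGroup("analytics", 1000)              # 1,000 CPU cores total
level2 = {u: QuotaGroup(u, 300) for u in ("alice", "bob", "carol")}

def allocate(user: str, cores: int) -> bool:
    sub = level2[user]
    # Must fit under both the user's level-2 cap and the shared level-1 pool.
    if sub.used + cores > sub.capacity or level1.used + cores > level1.capacity:
        return False
    sub.used += cores
    level1.used += cores
    return True

print(allocate("alice", 900))  # False: alice's level-2 cap is 300 cores
print(allocate("alice", 300))  # True: at her cap, 700 cores remain for others
print(allocate("bob", 300))    # True: sharing the same level-1 pool
```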

Unlike other big data systems, MaxCompute implements not only scheduling within a single cluster but also global scheduling across multiple clusters, another reflection of its capabilities. A single cluster cannot grow indefinitely for various reasons, so the platform must scale horizontally at the cluster level.

For businesses with large volumes, services must run across multiple clusters to guarantee high availability, which requires global cross-cluster scheduling. When a job arrives, the scheduler must determine which clusters can host it and which currently have idle capacity. Multiple data versions also come into play, because the same data is replicated across clusters and can become inconsistent; for different versions of the same data, specific computing and data-replication strategies must be weighed. Alibaba Cloud MaxCompute has refined all of this through long accumulated experience. A simple placement heuristic is sketched below.
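The following is a hypothetical placement heuristic, not MaxCompute's real policy: prefer an idle cluster that already holds the latest copy of the input data, trading replication cost against idle capacity.

```python
# Hypothetical version-aware cross-cluster placement. Clusters, versions, and
# costs are illustrative.
clusters = {
    # name: (idle_slots, data_version_held)
    "cn-beijing":  (500, 3),
    "cn-shanghai": (1000, 2),   # idler, but its replica is one version behind
}
LATEST_VERSION = 3
REPLICATION_COST = 800          # slot-equivalent cost of copying fresh data

def place(job_slots: int) -> str:
    best, best_score = None, float("-inf")
    for name, (idle, version) in clusters.items():
        if idle < job_slots:
            continue
        penalty = 0 if version == LATEST_VERSION else REPLICATION_COST
        score = idle - penalty  # prefer idle capacity, penalize stale replicas
        if score > best_score:
            best, best_score = name, score
    return best

print(place(400))   # cn-beijing: its fresh replica outweighs Shanghai's idle slots
```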

Data security



For an enterprise, the biggest worry when adopting a big data computing platform is data security. In Alibaba's experience, users have three main concerns. First, on a shared platform, can other users see my data? Second, if I put my data on a platform or service, can the people who operate the service see it? Third, if I host my data on the big data platform and the platform itself has a problem, what happens to my data?



MaxCompute addresses all of these concerns well, so users largely need not worry about data security. First, the platform has a comprehensive authentication and authorization mechanism. The data belongs to the user, not the platform: although the platform stores and computes on the data, it holds no ownership over it, and neither other users nor the platform side can see it without authorization. Authorization comes at multiple granularities, including table level, column level, and job level. Because MaxCompute is a cloud platform, it is inherently multi-tenant and must isolate tenants' data and computation; data is physically isolated, eliminating that class of security problems at the root, and strict permission controls prevent users from seeing one another's data. MaxCompute also provides end-to-end (E2E) full-link storage encryption, which is particularly important to financial institutions, and CRC checks to guarantee data correctness; these capabilities have been used inside Alibaba for many years and are very mature. Finally, the platform offers data desensitization features for users' convenience. A toy illustration of column-level authorization follows.
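This toy example, which is not MaxCompute's real ACL engine, illustrates the column-level granularity mentioned above: a query is allowed only if the user has been granted every column it touches.

```python
# Toy column-level authorization check. Grants, users, and tables are made up.
GRANTS = {
    ("analyst", "orders"): {"order_id", "amount"},   # no access to buyer_phone
}

def authorize(user: str, table: str, columns: set[str]) -> bool:
    allowed = GRANTS.get((user, table), set())
    return columns <= allowed           # every requested column must be granted

print(authorize("analyst", "orders", {"order_id", "amount"}))      # True
print(authorize("analyst", "orders", {"order_id", "buyer_phone"})) # False
```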


This article is original content of the Yunqi Community (Alibaba Cloud's developer community) and may not be reproduced without permission.