EMAS Peng Zhao (Zhou Mu)

Summary: DevOps is a proven approach to software delivery that has been widely practiced on the server side, but can it be applied to mobile delivery as well? Given the differences between mobile and server scenarios, how does mobile DevOps differ from server DevOps, and what challenges does it face? This article shares the thinking, challenges, and solutions of EMAS, Alibaba Cloud's cloud-native application development platform, in building cloud-native Mobile DevOps, and walks through its design architecture and technical details.

Introduction to Mobile DevOps

1. What is Mobile DevOps

1) DevOps as we know it

DevOps is no longer a new concept. Most readers have some idea of what it is, yet describing it precisely is hard. There is no universally agreed definition, largely because DevOps is an idea, or even a collection of ideas, rather than something concrete. Literally, the term covers the entire software lifecycle, from Dev (Development) to Ops (Operations). Among the many definitions, I find the Azure DevOps definition [1] the most precise and specific:

DevOps is a combination of development (Dev) and operations (Ops), combining people, processes, and technology to continuously deliver value to customers. What does DevOps mean to teams? DevOps enables previously isolated roles (development, IT operations, quality engineering, and security) to coordinate and collaborate to produce better, more reliable products.

By adopting DevOps culture, practices, and tools, teams are better able to respond to customer needs, have greater confidence in the applications they build, and achieve business goals faster.

This definition carries a few key messages: ① DevOps integrates people, processes, and technology; ② it enables previously isolated roles to coordinate and collaborate; ③ it is an idea that is both cultural and supported by automation tools; and ④ its goal is to produce better, more reliable products faster.

2) From DevOps to mobile DevOps

Since DevOps works so well for software delivery, why not apply it to mobile delivery too? That is today's topic: mobile DevOps. Because mobile and server scenarios differ, mobile DevOps can be quite different from server DevOps, mainly in the following aspects:

Automated building of mobile applications is more complex

• Build environment fragmentation

The Android and iOS platforms require build environments based on different operating systems and build toolchains. Version fragmentation exists even within the toolchain of a single platform. For example, Android builds depend on multiple versions of the Android SDK and Gradle at the same time, and iOS builds depend on multiple versions of Xcode and Ruby.

• Mobile builds involve data security issues such as certificate hosting
• The Mac devices that iOS builds rely on are non-standard devices in the equipment room

Macs are not standard servers and cannot be deployed in standard equipment rooms. You therefore need to build your own Mac equipment room, which poses challenges for O&M and stability.

Automated builds are an essential DevOps capability, so mobile DevOps must solve client-side automated builds and one-click packaging by technical means.

Mobile device fragmentation is severe, and application compatibility is a huge challenge

Unlike the consistent server deployment environment, the runtime environment of mobile applications is highly fragmented, and compatibility test coverage is much harder to achieve than on the server. Fragmentation is particularly severe on Android, mainly in the following aspects:

• Fragmentation of mobile phone models

The Android market has a large number of phone manufacturers and a vast number of models, and different manufacturers apply their own low-level "optimizations" to the system. In theory, any model not covered by testing may have compatibility problems. The figure below shows the distribution of top Android models in October 2020, according to the Baidu Statistics Traffic Research Institute [2]. The combined market share of the top 10 models is less than 15%, which shows how severe model fragmentation is.

• Operating system version fragmentation

Differences in operating systems have a direct impact on application running. It is common to see incompatibilities caused by major system upgrades. Each major operating system release is a test of application compatibility. While considering compatibility with the new system, users of the old system should not be abandoned.

The figure below shows the Android version distribution in October 2020, also from the Baidu Statistics Traffic Research Institute. Android 10.0 had been out for more than a year, yet its market share was still below 50%; operating system versions released two or more years earlier still dominated.

Because end devices are so fragmented, mobile DevOps needs mobile testing capability that automates large numbers of compatibility tests on real devices.

The release and update cycle of mobile applications is long

Within two weeks of release, a new version of an application may reach fewer than 50% of users, whereas server software can be rolled out to all servers in a very short time. Long release cycles mean higher error costs: a buggy release can take a long time to be flushed out through updates.

This requires mobile DevOps to have a sound grayscale publishing mechanism to avoid releasing problematic applications to users at once. On the other hand, once a buggy version has been released, mobile DevOps needs to have hot fix capability, which can be made lighter and faster by releasing incremental fix packs.
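As a rough illustration of the grayscale idea, a release service can map each device to a stable bucket and offer the new version only to buckets below the current rollout percentage. The sketch below is a minimal example under stated assumptions, not the EMAS implementation: the function names, the hashing scheme, and the 100-bucket granularity are all hypothetical.

```python
import hashlib


def rollout_bucket(device_id: str, release_id: str) -> int:
    """Map a device to a stable bucket in [0, 100) for a given release.

    Hypothetical scheme: hash the release and device IDs together so that
    each release gets a different, but repeatable, ordering of devices.
    """
    digest = hashlib.sha256(f"{release_id}:{device_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def should_receive_update(device_id: str, release_id: str, rollout_percent: int) -> bool:
    """Return True if the device falls inside the current rollout percentage."""
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    return rollout_bucket(device_id, release_id) < rollout_percent
```

Because the bucket is stable, raising the percentage from 5 to 20 only adds devices; a device that already received the update is never dropped from the rollout.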

Mobile applications run on a large number of mobile devices

Unlike server-side services, which run in a specific cluster under unified control, operation, and maintenance, mobile applications run on users' phones. For a super app such as Mobile Taobao, that means hundreds of millions of devices.

In this case, mobile monitoring products need big data technology to implement operation and maintenance monitoring for mobile devices, and may even need a remote log function that pulls error logs from specified devices to locate and troubleshoot errors.

Based on the above points and referring to DevOps’ definition of software delivery life cycle, the mobile DevOps application life cycle and capability requirements at each stage are summarized as follows:

2. What is EMAS Mobile DevOps

1) Mobile DevOps is a concrete implementation of the EMAS Mobile DevOps concept

First, let's introduce EMAS (Enterprise Mobile Application Studio). EMAS is a leading cloud-native application development platform (mobile apps, H5 applications, mini programs, web applications, and so on) from Alibaba Cloud in China. Built on Backend as a Service, Serverless, DevOps, and low-code technologies, it provides one-stop application development and management services for enterprises and developers, covering the entire application lifecycle: development, testing, O&M, and operations. For more information, see the EMAS details page on the Alibaba Cloud official website.

Mobile DevOps is the concrete product output of the EMAS mobile DevOps concept. It is the pivotal product of EMAS, connecting all EMAS products to realize the mobile DevOps concept described above. Mobile DevOps links EMAS products that were previously isolated in their own lifecycle stages into a complete closed loop, as shown in the figure above, upgrading EMAS from a mobile middleware platform to a mobile R&D platform. It combines the following EMAS products:

• R&D domain: Mobile DevOps
• Test domain: Mobile Testing
• Release domain: Mobile DevOps
• O&M domain: Mobile Monitoring, Mobile Hotfix
• Operations domain: Mobile Push, Mobile User Feedback

2) History of Mobile DevOps

Mobile DevOps is the commercial version of Alibaba Group's internal mobile R&D platform. The first proprietary cloud version was developed in 2017 by the Alibaba Cloud and Mopao teams, and the first public cloud version was launched in April 2020.

The chart below shows the development history of Mobile DevOps. That history is in effect the history of Alibaba Group's mobile R&D technology: it distills a decade of Alibaba's mobile technology and engineering practice.

3) The state of Mobile DevOps

The proprietary cloud version of Mobile DevOps mainly targets large customers, especially those undergoing digital transformation. These customers have strict security requirements, accept only private cloud deployment, and are willing to invest in improving R&D efficiency. Mobile DevOps has been officially offered in private cloud form since 2018. So far it has created value for dozens of large customers across multiple industries and helped enterprises digitally transform their R&D processes.

The public cloud version is in free public beta. Compared with private cloud, it is aimed more at small, medium, and micro enterprises, which want better R&D efficiency but are price sensitive; public cloud serves them well. In addition, some externally facing businesses of Alibaba Group (such as exclusive DingTalk) cannot use the group's internal R&D platform for mobile DevOps, so the public cloud version is a good choice for them too. The free public beta started in July 2020. It now serves a large number of small, medium, and micro customers, as well as Alibaba Group businesses such as exclusive DingTalk, government DingTalk, and Singing Duck.

Cloud native Mobile DevOps

Compared with private clouds, the construction of cloud native Mobile DevOps in public cloud scenarios faces more technical challenges. In this chapter, we will share with you our thoughts, challenges and solutions in the process of building cloud native Mobile DevOps.

1. Why do we need Mobile DevOps for public clouds

1) Provide inclusive Mobile DevOps service for small, medium and micro customers

Although proprietary cloud deployment offers advantages such as exclusive resources and intranet security isolation, its high delivery cost means that only the top players in an industry can afford it. The estimated cost of Mobile DevOps on private cloud is as follows (W denotes 10,000 RMB):

• One-time investment: a single procurement cost on the order of a million RMB
• Continuous investment: at least 30W/year in server costs plus 20W/year in maintenance staff costs

On this basis, the investment in the first, second, and third years of a private cloud deployment is about 150W, 50W, and 50W, which adds up to 250W over three years. That is unacceptable for small, medium, and micro customers.

As the infrastructure of the new era, comparable to its water, electricity, and gas utilities, Alibaba Cloud needs to provide inclusive cloud services to small, medium, and micro enterprises as well as to large customers. Mobile DevOps on public cloud fits this goal. With the elastic scaling and pay-as-you-go billing of cloud native, it greatly reduces cost for these customers. The public cloud version also provides DevOps R&D processes tailored to the characteristics of small, medium, and micro customers.

2) A linked EMAS product line provides developers with a one-stop mobile R&D platform

Launching Mobile DevOps on public cloud links EMAS's existing products, such as Mobile Testing, Mobile Monitoring, and Mobile Hotfix, so that EMAS covers the whole application lifecycle. This completes the upgrade of EMAS from mobile middleware to a mobile R&D platform and improves user experience and stickiness. Compared with traditional self-built CI/CD platforms based on open source solutions such as Jenkins and GitLab Runner, the EMAS one-stop mobile R&D platform has clear advantages in cost, high availability, and technical support, and it covers application build, testing, release, O&M, and operations in one place. Compared with a self-built CI/CD setup made of siloed ("chimney") open source systems, it also makes R&D collaboration noticeably more efficient.

2. Challenges faced by Public cloud Mobile DevOps

Mobile DevOps in public cloud mode faces more technical challenges than a version deployed on a private cloud network and used only by internal employees. These challenges are mainly reflected in the following aspects:

1) Security

• Tenant isolation: The first problem a public cloud faces is tenant isolation. Different customers use shared resources at the same time but must not see each other's data. In the build scenario, build tasks from different customers may affect each other, and the build environment also contains user code, certificates, and other private information, so a sound solution is needed to keep each user's build environment isolated.
• Security of private data such as code, certificates, and secret keys: Builds necessarily involve user code, certificates, and secret keys. This data is extremely private, and any problem in storing, transmitting, or using it on the public cloud may cause significant losses to users.
• External attacks: Public clouds are exposed to the public network and can be used by anyone. The build scenario in particular involves a large number of custom commands, so a sound mechanism is needed to prevent attackers from running malicious custom commands and leaving a back door in the build environment.

2) High availability

• Elastic capacity expansion: As the public cloud service grows, capacity must expand quickly to keep up; otherwise service exceptions may occur. This requires the product to follow a distributed architecture, and in particular requires the build cluster to be stateless so that it can scale out rapidly.
• Stability of the build environment: The build environment, including environment variables and build toolchains, must stay stable and must not be damaged by attacks or abnormal use.
• High-standard SLA: Always online and never down. A high-standard SLA is both a commitment to customers and a matter of respect for the Alibaba Cloud brand.

3) Scalability

• Diverse application architectures: Proprietary cloud has a limited number of customers and dedicated technical support for key accounts (KA), so applications differ only to a limited extent and dedicated staff support onboarding. On public cloud there are many customers, and the diversity of their application architectures places higher demands on the generality and extensibility of the system.
• Diverse R&D processes: Public cloud customers differ in R&D team size, culture, and process, which also places higher demands on the extensibility of the Mobile DevOps R&D process.

3. Our solution

In view of the above challenges faced by public cloud Mobile DevOps, we solve them through technical means from the following two aspects:

1) Generic build architecture based on pipelines

The pipeline architecture makes builds generic: build flows are defined by custom orchestration of pipelines, and pipeline capabilities are extended through task plug-ins. This addresses the scalability problems above. The architecture has the following features:

• A common build architecture that supports builds for all platforms
• Custom orchestration of build processes in YAML
• Visual orchestration of pipelines
• Unlimited extensibility of pipelines through task plug-ins
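To make the task plug-in idea concrete, the following sketch shows one way a pipeline could look up plug-ins by name and run them from a declarative step definition. It is an illustration only: the registry, the `task_plugin` decorator, and the `uses`/`with` keys (borrowed from GitHub Actions syntax) are assumptions, not the actual Mobile DevOps plug-in interface.

```python
from typing import Callable, Dict

# A plug-in takes the step's inputs and returns its outputs (hypothetical contract).
TaskFn = Callable[[dict], dict]

_REGISTRY: Dict[str, TaskFn] = {}


def task_plugin(name: str):
    """Register a function as a task plug-in under the given name."""
    def decorator(fn: TaskFn) -> TaskFn:
        if name in _REGISTRY:
            raise ValueError(f"task plug-in already registered: {name}")
        _REGISTRY[name] = fn
        return fn
    return decorator


@task_plugin("echo")
def echo(inputs: dict) -> dict:
    """Trivial example plug-in that returns its 'message' input."""
    return {"message": inputs.get("message", "")}


def run_step(step: dict) -> dict:
    """Run one step; the step references exactly one plug-in via 'uses'."""
    name = step["uses"]
    if name not in _REGISTRY:
        raise KeyError(f"unknown task plug-in: {name}")
    return _REGISTRY[name](step.get("with", {}))
```

With this shape, adding a new capability means registering one more function; the pipeline definition and the engine that walks it do not change.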

2) Build cluster based on containerization/virtualization

A container (Linux) or virtualization (macOS) scheme resolves the security and stability problems caused by resource sharing. Each build task runs in a new container or virtual machine, which is destroyed immediately after the task completes. This isolates the runtime environments of different tasks, and because every build environment is always fresh, it also prevents a damaged environment from affecting later builds. In addition, a stable, stateless containerized/virtualized build cluster ensures the high availability of the build service. In chapters 3 and 4, we expand on these two points and walk through their design architecture and technical details.

Generic build architecture based on pipelines

1. Technical pre-study

Many products in the industry are built around pipelines, especially outside China. Azure DevOps Pipeline and GitHub Actions are two excellent examples; they stand out from other products in feature richness, ease of use, documentation, and user base.

Azure DevOps, formerly Visual Studio Team Services (VSTS), is a software development collaboration platform with a history of more than ten years. Its Azure Pipeline product was released in April 2018 [3]. GitHub Actions was released in August 2019 [4] and was a heavyweight product launched after Microsoft acquired GitHub. Both are relatively new; Azure Pipeline is just over two years old.

The pre-study turned up an interesting fact. Because GitHub is a Microsoft subsidiary, the two pipeline products are not only similar in design but also share their Mac virtualization solution, and even the same Mac virtualization cluster room. Compared with Azure Pipeline, GitHub Actions is more streamlined and elegant. It also continues GitHub's open source tradition: its pipeline plug-ins are open source, and although it has been online for only a little over a year, more than 5,000 open source plug-ins already exist.

From a plug-in perspective this is a gold mine. If these plug-ins could be used directly in Mobile DevOps, its basic set of pipeline plug-ins would be aligned with the open source community. To leave room for supporting them in the future, the final Mobile DevOps design follows the GitHub Actions model favored by the open source community.

2. Core concepts of the pipeline

• Pipeline: A pipeline is the smallest unit that can be triggered to run. A pipeline contains one or more jobs.
• Job: A job is the smallest unit that can be scheduled. Depending on the execution environment it is scheduled to, a job is either an Agent job (build cluster) or an Agentless job (server). Jobs without dependencies can run concurrently, and jobs with dependencies run sequentially; the relationships among jobs can be represented as a directed acyclic graph (DAG). Each job contains one or more steps.
• Step: A step is the smallest unit that can be executed. Each job consists of multiple sequential steps.
• Task: A task is a task plug-in with a predefined specification and function that a step declares by reference. A step contains exactly one task.
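The relationship among these four concepts can be sketched as a small data model. The example below is a minimal illustration rather than the Mobile DevOps schema: the class and field names (such as `needs`) are assumptions. It uses Python's standard `graphlib` module (Python 3.9+) to group jobs into "waves", where every job in a wave has all of its dependencies satisfied and can therefore run concurrently.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Dict, List


@dataclass
class Step:
    task: str                                   # the single task plug-in this step references
    inputs: Dict[str, str] = field(default_factory=dict)


@dataclass
class Job:
    name: str
    steps: List[Step]                           # executed sequentially
    needs: List[str] = field(default_factory=list)  # names of jobs this job depends on


@dataclass
class Pipeline:
    name: str
    jobs: List[Job]


def execution_waves(pipeline: Pipeline) -> List[List[str]]:
    """Group job names into waves that respect the dependency DAG."""
    names = {job.name for job in pipeline.jobs}
    for job in pipeline.jobs:
        missing = set(job.needs) - names
        if missing:
            raise ValueError(f"job {job.name!r} needs unknown jobs: {sorted(missing)}")

    sorter = TopologicalSorter({job.name: set(job.needs) for job in pipeline.jobs})
    sorter.prepare()  # raises graphlib.CycleError if the jobs form a cycle
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())      # sorted only for deterministic output
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

For a pipeline with two independent build jobs, a test job that needs both, and a release job that needs the test job, `execution_waves` yields the two builds first, then the test job, then the release job.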

3. Technical architecture of the pipeline

The pipeline consists of the following core systems:

1) Pipeline process engine

Responsible for triggering, scheduling, and state transitions of pipelines, and for maintaining pipeline metadata.

• Pipeline trigger module: Triggers the execution of a pipeline and supports three trigger modes: manual, timer, and event (Git events, webhook callbacks, and so on). The trigger is the only entry point to pipeline execution. This layer can verify the caller and accept different trigger parameters that control how the pipeline is executed and scheduled.
• Pipeline orchestration module: Defines a DSL for describing pipelines. A pipeline that can be scheduled and executed is defined precisely in this DSL.
• Pipeline execution module: Ensures that all jobs in a pipeline run in parallel or in sequence according to their dependencies, and updates the pipeline status in real time.

2) Job scheduling engine

A job is the smallest unit that can be scheduled in a pipeline. The job scheduling engine dispatches every job produced by the pipeline process engine to the correct build cluster machine.

3) Integration engine

There are two types of task plug-ins in the pipeline. The first is Agent tasks, such as Android and iOS builds, which need a specific build environment and are therefore scheduled to build machines by the job scheduling engine. The second is Agentless tasks, such as approvals, notifications, and external system calls. These can run on ordinary servers without consuming valuable build resources, so the job scheduling engine schedules them to run on the integration engine. Most Agentless tasks involve integration with external services.
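The Agent/Agentless split amounts to a routing decision in the scheduler. The sketch below illustrates that decision under simplifying assumptions: the `ScheduledJob` type, the platform names, and the destination strings are all hypothetical and stand in for whatever the real scheduling engine uses.

```python
from dataclasses import dataclass
from enum import Enum


class JobKind(Enum):
    AGENT = "agent"          # needs a build environment (for example Android or iOS builds)
    AGENTLESS = "agentless"  # approvals, notifications, external system calls


@dataclass(frozen=True)
class ScheduledJob:
    name: str
    kind: JobKind
    platform: str = ""       # "linux" or "macos"; only meaningful for Agent jobs


def route_job(job: ScheduledJob) -> str:
    """Return the (hypothetical) destination that should execute the job."""
    if job.kind is JobKind.AGENTLESS:
        # Agentless jobs run server-side and never consume build machines.
        return "integration-engine"
    if job.platform not in ("linux", "macos"):
        raise ValueError(f"agent job {job.name!r} has unsupported platform {job.platform!r}")
    return f"build-cluster/{job.platform}"
```

Keeping the decision in one function makes it easy to see that an approval step can never occupy a scarce Mac build slot.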

4) Channel service

The Channel service is responsible for the communication link and protocol between the build cluster and the server. It provides the following functions:

• Unified authentication of build cluster requests: For security, the build cluster sits in a different VPC from the other microservices, and the networks are completely isolated so that the build cluster cannot directly access the server intranet. As shown in the pipeline technical architecture diagram above, the build cluster therefore reaches the server through HTTPS requests over the public network, and requests from build machines must be authenticated. The Channel service is the single point where this authentication happens.
• Unified entry point for build cluster requests: The build cluster needs to exchange heartbeats, status reports, task pulls, and task execution status with the server in real time. The Channel service is the single entry point for these requests and routes requests for different services to the corresponding microservices.

5) Build cluster
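The two Channel responsibilities, authenticating build machine requests and routing them to the right microservice, can be sketched together. The article does not say which authentication scheme is used, so the HMAC-SHA256 request signature below is an assumption chosen for illustration, as are the agent IDs, request kinds, and service names.

```python
import hashlib
import hmac

# Hypothetical per-agent secrets; a real system would keep these in a secret store.
AGENT_SECRETS = {"agent-1": b"example-secret"}

# Hypothetical mapping from request kind to the microservice that handles it.
ROUTES = {
    "heartbeat": "agent-registry",
    "status": "agent-registry",
    "pull_task": "job-scheduler",
    "task_result": "pipeline-engine",
}


def sign(secret: bytes, agent_id: str, kind: str, body: bytes) -> str:
    """Compute the signature a build machine would attach to its request."""
    message = agent_id.encode("utf-8") + b"\n" + kind.encode("utf-8") + b"\n" + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def handle_request(agent_id: str, kind: str, body: bytes, signature: str) -> str:
    """Authenticate a request, then return the name of the target microservice."""
    secret = AGENT_SECRETS.get(agent_id)
    if secret is None:
        raise PermissionError("unknown agent")
    expected = sign(secret, agent_id, kind, body)
    # compare_digest avoids leaking information through comparison timing.
    if not hmac.compare_digest(expected, signature):
        raise PermissionError("invalid signature")
    if kind not in ROUTES:
        raise ValueError(f"unsupported request kind: {kind}")
    return ROUTES[kind]
```

Because every request passes through `handle_request`, the backend microservices never need to be reachable from the build cluster's network directly.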

The build cluster pulls and executes Agent build tasks. The services running in the build cluster start an isolated build environment that matches the task type:

• Docker containers on Linux: Android builds run on Linux, where Docker containers are the obvious choice for environment isolation. Serverless Docker containers are started on ACK Serverless (Alibaba Cloud's public cloud Kubernetes product) and are destroyed and reclaimed automatically. Cloud-native ACK Serverless maximizes the elasticity of the build cluster: it occupies almost no computing resources when nothing is building, which keeps build costs under control.
• Virtual machines on macOS: Because of the restrictions of the Apple ecosystem, iOS and Mac apps can be built only on macOS, and macOS currently has no mature container scheme comparable to Docker. We therefore implement environment isolation with virtualization. We built a Mac virtualization cluster on a cloud architecture in which Mac physical resources are fully pooled, so the cluster can scale out and in quickly, fully in line with the cloud-native concept. Each build dynamically creates a virtual machine from the virtualized cluster and destroys it immediately after the build finishes.

The Mac virtualization cluster is one of our technical advantages. The next chapter details how Mobile DevOps puts it into practice.

4. Mac virtualization build cluster

Mobile DevOps's Mac virtualization build cluster is currently a leading solution in China. We are "perhaps" the first DevOps platform in China to implement iOS builds on Mac virtualization technology. Almost no vendors in China support iOS builds, and the underlying reason is the limitation of Mac virtualization technology: traditional builds on bare-metal Macs can be used only in an internal environment and do not meet the requirements of an open public cloud service. Mac virtualization is a technical advantage of Mobile DevOps.
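The "create a fresh environment, run the build, always destroy it" lifecycle described above maps naturally onto a context manager. The following sketch is illustrative only: `FakeProvider` is a hypothetical stand-in for a real container or VM backend and simply records which environments are alive.

```python
import itertools
from contextlib import contextmanager
from typing import Callable


class FakeProvider:
    """Stand-in for a container/VM backend; tracks live environments in memory."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.live = set()

    def create(self, platform: str) -> str:
        env_id = f"{platform}-{next(self._ids)}"
        self.live.add(env_id)
        return env_id

    def destroy(self, env_id: str) -> None:
        self.live.discard(env_id)


@contextmanager
def ephemeral_environment(provider: FakeProvider, platform: str):
    """Create a brand-new environment and destroy it even if the build fails."""
    env_id = provider.create(platform)
    try:
        yield env_id
    finally:
        provider.destroy(env_id)


def run_build(provider: FakeProvider, platform: str, build: Callable[[str], str]) -> str:
    """Run one build task inside its own throwaway environment."""
    with ephemeral_environment(provider, platform) as env_id:
        return build(env_id)
```

The `finally` clause is the important part: a crashed or malicious build cannot leave its environment behind for the next tenant.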

1. Virtualization solution selection

Due to the limitations of the macOS kernel, containerization on macOS is still extremely immature, so virtualization is the only way to isolate macOS environments.

Select the virtualization type

The following figure shows the two types of virtualization solutions, both of which are based on Hypervisor implementation. The comparison between the two solutions is as follows:

Virtualization solution 1:
• There is no host operating system (OS); the hypervisor runs directly on the hardware and virtualizes the VMs.
• Resource utilization is high, which makes this solution more suitable for cloud services.

Virtualization solution 2:
• The hypervisor runs on top of the host OS and virtualizes the VMs, which makes this solution more suitable for desktop users.
• Because a host OS is present, hardware compatibility is better.

Considering that Mobile DevOps provides public cloud services, solution 1 improves resource utilization more effectively, and its hardware compatibility issues can be avoided simply by choosing the right hardware.

The Apple ecosystem is closed and has many security and compliance restrictions. The Mac platform has the following legal compliance restrictions:

1.MacOS must run on Apple hardware. 2

Of the four virtualization schemes compared above, only scheme 4 is both compliant and compatible with the Apple ecosystem, and scheme 4 is in fact virtualization solution 1 chosen in the previous section. Based on the virtualization type and on Apple ecosystem compliance and compatibility, we finally chose scheme 4.

2. Virtual clusters in the cloud architecture

To provide general build services on the cloud, a virtualization solution alone is not enough. Mobile DevOps also needs a virtualization cluster solution that fits the cloud architecture and meets the following clustering requirements:

① Mac hardware resource pooling: All Mac resources in the cluster should be stateless. All Mac hardware forms a resource pool that the cluster allocates and schedules uniformly.

② Elastic capacity expansion: Public cloud services are elastic by nature, so the virtual cluster must adapt to business scenarios and scale out and in quickly to keep up with service growth.

③ High availability: When some Mac hardware fails, the cluster can quickly and automatically assign tasks to new virtual machines, which improves the task success rate.
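The three requirements can be illustrated with a toy resource pool: hosts are interchangeable (pooling), hosts can be added at any time (elastic expansion), and unhealthy hosts are skipped when placing new virtual machines (high availability). This is a simplified sketch, not the actual cluster scheduler; the class, its methods, and the least-loaded placement policy are assumptions.

```python
from typing import Dict


class MacHostPool:
    """Toy pool that places VMs on the least-loaded healthy Mac host."""

    def __init__(self, capacity_by_host: Dict[str, int]) -> None:
        self._capacity = dict(capacity_by_host)   # max concurrent VMs per host
        self._used = {host: 0 for host in capacity_by_host}
        self._healthy = set(capacity_by_host)

    def add_host(self, host: str, capacity: int) -> None:
        """Elastic expansion: a new host becomes schedulable immediately."""
        self._capacity[host] = capacity
        self._used.setdefault(host, 0)
        self._healthy.add(host)

    def mark_unhealthy(self, host: str) -> None:
        """High availability: stop placing new VMs on a failed host."""
        self._healthy.discard(host)

    def allocate(self) -> str:
        """Reserve one VM slot and return the chosen host name."""
        candidates = [
            h for h in sorted(self._healthy) if self._used[h] < self._capacity[h]
        ]
        if not candidates:
            raise RuntimeError("no healthy Mac host has free capacity")
        host = min(candidates, key=lambda h: self._used[h])
        self._used[host] += 1
        return host

    def release(self, host: str) -> None:
        """Free a VM slot after the virtual machine is destroyed."""
        if self._used[host] > 0:
            self._used[host] -= 1
```

Because no build state lives on a host between tasks, any healthy host with free capacity is an equally valid placement target, which is what makes pooling and failover this simple.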

Moving from a single virtual machine to a virtual machine cluster requires not only Mac hardware resource pooling but also solutions to the newly introduced distributed storage and distributed network problems. The following figure shows the move from a single virtual machine to a virtual cluster:

5. Future prospects


Public cloud Mobile DevOps is still in beta, and there are many directions left to work on:

• Add intelligent analysis and hints for build errors. With a large number of public cloud users, answering questions about build errors is a huge human cost. Techniques such as keyword matching, big data analysis, and even AI-based automatic error classification should be used to point directly to the cause of a build error and reduce the cost of manual support.
• Connect Mobile DevOps across the complete app development lifecycle.
• Connect better with the community by supporting migration from GitHub Actions, Azure Pipeline, and other platforms to Mobile DevOps.
• Make Mobile DevOps easier to integrate, so that the mobile R&D platform fits better into customers' existing development processes.
• Deeply optimize the efficiency of application compilation and builds to reduce build time. The ultimate goal is for cloud builds to be significantly faster than local builds, so that developers directly feel the advantage of building on the cloud.

If you are interested in mobile build and compilation technology, mobile R&D technology, or cloud native, and you enjoy technical challenges, you are welcome to join us. Our goal is to "build an internationally leading mobile enterprise brand".

Citation:

[1] Azure DevOps: What is DevOps?
[2] GitHub Build CI/CD in charge
[4] GitHub build CI/CD in charge