Against the background of social development driven by technological change, a large number of innovative technologies and entrepreneurial applications supporting large-scale distributed applications have emerged. Technical terms such as Cloud Native, Service Mesh, and Serverless have triggered extensive interpretation and discussion worldwide. This article is compiled from Alibaba senior technical expert Li Yun's talk at QCon Beijing 2019. It walks through the essence of these technologies and the driving forces behind them, to help you better grasp the technology trend and build your own framework for thinking about it.

The topic of solving the technical challenges of large-scale distributed applications always attracts a great deal of attention, and CNCF's concept of cloud native has taken it to an unprecedented level.

Judging by the current state of the industry, cloud native is the key path to the future of distributed applications. In addition, the CDF (Continuous Delivery Foundation) has emerged to help address these challenges from the CI/CD perspective. We may ask, "What is the future of distributed applications?" At this stage it is unrealistic to give a specific description that everyone can understand and agree on, but it is not hard to give a deliberately vague concept that captures the key point: the author expresses it as "Distributionless".

Technology that transforms the world is never just about technology; it is also about business interests. Understanding the driving forces allows us to capture the advantages of new technologies more accurately and to think about where their development may end up, which is critical for technology decision makers of all kinds, from CTOs to CIOs.

The ultimate paradigm for solving complex problems



Technology ultimately serves business and society, but when all roads lead to Rome, how can one road be judged better than another? Is there a fixed paradigm we can use to understand the advantages of new technologies as they evolve? In the author's opinion, that ultimate paradigm is "divide and conquer after abstraction".

The exploration of solutions to complex, large-scale problems is always dynamic and gradual, going through a continuously iterative process of understanding the problem and searching for better solutions, during which parts of the "old concepts" are broken and "new concepts" are shaped.

For example, in the early practice of microservice software architecture, people focused on "how to decompose", "how finely to decompose", and the match between technology and organizational structure (Conway's Law). The core idea was to break a monolithic application down into smaller units of software delivery, in order to solve the problem of slow iteration in monolithic applications (which slows the creation of business value).

However, once the microservice transformation was completed and the number of services reached a certain scale, problems such as inter-service connectivity, fault tolerance, security, and monitoring gradually surfaced. The industry then came to a deep understanding that microservice software architecture actually transfers complexity from inside the monolith to the space between services.

As the scale of distributed applications increases and the number of development and operations personnel involved grows, the efficiency issue once again becomes as significant as it was in the era of monolithic applications. This time, however, the scope and scale of the problem are much larger.

Solving the new problems brought by microservice software architecture requires exploring more systematic, standardized, and globally consistent solutions. It is therefore inevitable that new ways of carving up the problem space will be adopted to build new solutions, and inevitable that old concepts will be broken and new ones created.

The old and new concepts differ in the following ways:

  • The old concepts emphasize local optimization, and the joints between different old concepts are stiff. Since the exploration of complex problems is always progressive, understanding moves from the local to the global, and this situation is entirely normal. For a software system, the quality of its design shows in whether its concepts connect elegantly and smoothly. Rigid conceptual joints between subsystems usually mean the design lacks a global view, so a great deal of ugly glue code is buried at the seams, and the high maintenance cost hurts both the efficiency of the system's evolution and its effectiveness at solving problems.
  • The new concepts are more abstract, aiming at global optimization (systematization) and at satisfying the demands of a more diverse set of stakeholders. Because they are designed for a larger problem domain from a higher vantage point, the connections between sub-concepts are quite smooth, reflecting a sense of wholeness and consistency in the software design.

"Divide and conquer after abstraction" gives technologists a method for analyzing, from a purely technical perspective, whether one technology is superior to another, which to some extent guards against being fooled by new concepts that are merely "old wine in a new bottle". Looking through this paradigm at Kubernetes, Istio, Knative, and the other technologies emerging around cloud native, we believe these new technologies deserve recognition for their distinctive concepts, greater systematization, and higher technical vision.

The driving force of cloud native and its nature



There are countless examples of new technologies that were never adopted and eventually died. The concept of cloud native was put forward only four years ago, yet it is now in full swing worldwide; the driving force behind this is worth thinking about. Moreover, the concept of cloud native technology is still quite vague, and understanding the nature of the problems it solves will help technology teams face the trend calmly and help technology decision makers better plan their direction.

From a business perspective, AWS is the undisputed leader in the cloud computing market, but its technological clout is far less than Google's. Although Google is a global technology leader, its arrogance in developing the cloud computing market has left its market performance far behind AWS's. Facing this lagging market share, Google launched the CNCF, hoping for another chance to break through by positioning cloud native technology as a commitment to building vendor-neutral open source software that becomes the de facto standard. What does "vendor neutrality" mean?

Between a vendor exporting technology products and its customers there is a game of interests: technology lock-in versus anti-lock-in. When the vendor's technology cannot immediately create business value for the customer in the short term, the traces of this game are more obvious; otherwise the customer is happy to be locked in for a while. From the vendor's point of view, locking in customers through technology brings greater pricing power and a continuous source of profit. Conversely, if the customer achieves anti-lock-in, it gains stronger bargaining power and freedom of choice; failing that, it worries about becoming "a fish on the chopping block". In the long run, how the power in this game is balanced is the key to a thriving technology or business ecosystem. Such cases appeared long ago in the mature communications industry and are still playing out.

The communications industry has a standardization organization called 3GPP, whose members include telecom operators such as China Mobile, China Unicom, and China Telecom, as well as global communication equipment manufacturers such as Huawei and ZTE. By setting specifications and requiring all equipment manufacturers to follow them fully, 3GPP gives operators the freedom to purchase communications equipment from whomever they choose. Back in the cloud computing market, the counterpart of the communications industry's specifications is open source software that everyone builds and adopts together; in other words, open source software that becomes the de facto standard.

The key to a de facto standard is not being open source, but being adopted by all cloud vendors, which is especially critical for the technologies underlying cloud products. As a concrete example, Kubernetes is the infrastructure of cloud native and has been adopted by all cloud vendors as the basis of corresponding cloud products. A customer who purchases AWS's Kubernetes product can easily migrate to Aliyun at any time without worrying about lock-in. Undeniably, a cloud product operated by a cloud vendor and the same software self-built by the user offer completely different levels of usability assurance, which is exactly why many customers are willing to pay for cloud products.

Talk to customers and you will certainly encounter this game of guarding against technology lock-in. CNCF discovered and harnessed the same force to promote cloud native technology; it has become the core driving force of cloud native's development, winning strong support from both cloud vendors and cloud customers in a relatively short period of time. It is worth noting that AWS, as the industry leader in cloud computing, rarely talks about technology lock-in. That is not because AWS fails to see customer concerns about lock-in, but because publicizing the issue would only cost it customers.

Whether Google can win a larger share of the future cloud computing market through cloud native is unknown, but that is not our concern. The focus is on how we embrace and plan for cloud computing, a recognized industry trend, and work together to create a healthy and thriving cloud computing ecosystem.

The value of understanding the core driver of cloud native is this: when offering cloud native solutions, cloud vendors must take care to create stickiness for customers through their products rather than lock-in. Lock-in means "once you use me, you cannot easily leave me"; stickiness means "once you use me, I bring you distinctive value, yet you can easily leave at any time".

Understanding the core driver of cloud native is not enough; we must also understand the essential problems it solves. The author concludes that cloud native addresses a progressive triad: the resilience, ease of use, and portability of applications (here the terms "application" and "service" are used interchangeably).

Application resilience is the ability of a technology to keep creating business value for customers even in the most demanding business scenarios; in other words, after adopting a technical solution, the customer can continue to use it effectively to create business value. From a technical point of view, resilience includes microservice software architecture, full decoupling, high availability, multi-site active-active deployment, rate limiting, circuit breaking, graceful degradation, immutable infrastructure, and the ability to rapidly scale applications out and in.
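
To make mechanisms like circuit breaking concrete, here is a minimal sketch in Go. It is purely illustrative, not from the original talk: the `Breaker` type, its thresholds, and the failing call are all hypothetical, and a production system would reach for a mature library instead.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// Breaker is a minimal circuit breaker: after maxFails consecutive
// failures it "opens" and fails fast until the cooldown has elapsed.
type Breaker struct {
	mu       sync.Mutex
	fails    int
	maxFails int
	openedAt time.Time
	cooldown time.Duration
}

var ErrOpen = errors.New("circuit open: failing fast")

// Call runs fn unless the circuit is open, tracking consecutive failures.
func (b *Breaker) Call(fn func() error) error {
	b.mu.Lock()
	if b.fails >= b.maxFails && time.Since(b.openedAt) < b.cooldown {
		b.mu.Unlock()
		return ErrOpen // degrade immediately instead of piling up requests
	}
	b.mu.Unlock()

	err := fn()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.fails++
		if b.fails >= b.maxFails {
			b.openedAt = time.Now()
		}
		return err
	}
	b.fails = 0 // one success closes the circuit again
	return nil
}

func main() {
	b := &Breaker{maxFails: 3, cooldown: 2 * time.Second}
	flaky := func() error { return errors.New("downstream timeout") }
	for i := 0; i < 5; i++ {
		fmt.Println(b.Call(flaky)) // the last two calls fail fast
	}
}
```

Failing fast protects the caller's own resources and gives the downstream service room to recover, which is exactly the kind of resilience behavior a cloud native platform aims to provide without the application implementing it by hand.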

The second essential problem addressed is ease of use. If only application resilience is solved but not ease of use, then the labor and time required to support business value creation with the technology becomes the enterprise's next heavy burden, and in the end neither agility nor economy shows up in the application of the technology. Ease of use means good development and operations efficiency for the users and customers of cloud products.

Good development efficiency means that users (developers building on the cloud) need to care about only a minimal set of concepts and write the least possible code. This requires cloud products to be designed around the user's mental model, to connect technologies as seamlessly as possible, and to abstract well or even completely shield technical details in order to lower the barrier to entry. Good operations efficiency means that only a few people are needed to operate and maintain a large cluster, and that publishing, problem discovery, and troubleshooting across the entire distributed application can be done efficiently.

In the figure above, the author lists DevOps, GitOps, Cloud IDE, and CI/CD under ease of use. GitOps is considered the next generation of DevOps: it makes operations work like writing code, using a Git repository as the "single source of truth" for operations, which is very valuable for multi-cloud, hybrid-cloud, and multi-cluster deployments, and Git's version management makes operations traceable and controllable. Overall, ease of use addresses software development efficiency, engineering quality, and labor cost.
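
As a thought experiment on how GitOps works, here is a minimal reconcile loop in Go. It is a sketch under assumptions, not a real operator: the file `deploy/desired.txt` stands in for a checked-out Git repository, the in-memory map stands in for live cluster state, and a real controller would call the cluster API instead of printing.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
	"time"
)

// desiredState reads "service=replicas" lines from a file that stands in
// for the Git repository: the single source of truth for operations.
func desiredState(path string) (map[string]string, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	want := map[string]string{}
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if k, v, ok := strings.Cut(sc.Text(), "="); ok {
			want[k] = v
		}
	}
	return want, sc.Err()
}

func main() {
	actual := map[string]string{} // stands in for live cluster state
	for range time.Tick(5 * time.Second) {
		want, err := desiredState("deploy/desired.txt") // hypothetical path
		if err != nil {
			fmt.Println("skipping cycle:", err)
			continue
		}
		// Converge actual state toward what Git declares; every change is
		// therefore a commit, which is what makes operations traceable.
		for svc, replicas := range want {
			if actual[svc] != replicas {
				fmt.Printf("reconcile %s -> %s replicas\n", svc, replicas)
				actual[svc] = replicas
			}
		}
	}
}
```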

The third essential problem addressed is application portability. Multi-cloud (note: multiple public clouds) and hybrid cloud (note: public and private clouds used together) are regarded by Gartner as important strategies for future enterprise IT. Consistent with this strategy, the end state should be that the code of a distributed application can be deployed to different clouds without any changes (with different configurations, of course). Logically, for an application to be portable it must contain no code tied to any particular cloud platform; that code needs to sink fully into the cloud platform, completely decoupling application from platform. And for the sunken code to be available on every cloud platform, the underlying technology must be adopted by all cloud vendors, which is the key to whether a technology is a "de facto standard."
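
A minimal illustration of what "no cloud-specific code" looks like in practice, assuming a twelve-factor style of configuration: every environment-specific value comes from the deployment environment, so the same binary runs unchanged on any cloud. The variable names here are hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// getenv returns a configuration value with a local-development fallback.
// Keeping all cloud-specific values in the environment (or in injected
// configuration) keeps the application code free of per-cloud branches.
func getenv(key, fallback string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return fallback
}

func main() {
	// On AWS, Aliyun, or a private cloud only the deployment
	// configuration changes; this code never does.
	dbURL := getenv("DATABASE_URL", "postgres://localhost:5432/app")
	queue := getenv("QUEUE_ENDPOINT", "amqp://localhost:5672")
	fmt.Println("connecting to", dbURL, "and", queue)
}
```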

For customers, the value of application portability is that, besides guarding against vendor lock-in, it makes it possible to freely combine the technology and cost advantages of each cloud vendor, and it makes it easy for applications that operate nationwide or worldwide to meet policy and compliance requirements for multi-cloud deployment.

The whole focus of cloud native technology is to help customers create value better, faster, and more economically by reducing cost and increasing efficiency; this is the ultimate technical challenge of distributed applications.

On the left side of the figure above, the author lists CNCF's official Chinese definition of cloud native, with key parts highlighted in orange and red; those parts align with, though are worded differently from, the core driving force and essential problems described here.

Cloud native technology trends



Simply put, the trend of cloud native technology is to build consistency layer by layer around the portability of applications. The five layers are listed on the left side of the figure, with some of the corresponding open source software on the right. Cloud Portability at the top amounts to saying that no portability concerns remain inside the application itself.

Two layers in the diagram deserve attention: Service Portability and Network Portability. The former addresses portability at layers 4 through 7 of the OSI network model, for example via Istio, the open source Service Mesh software. The latter addresses layers 2 and 3, the focus of the open source project Network Service Mesh. From the Network Portability perspective, future network connectivity will no longer be described in terms of IP addresses and network masks; it will be built from the network requirements declared in the YAML files of the applications being deployed, much like service registration and discovery in RPC.
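
To illustrate the shift from addresses to declared intent, here is a purely hypothetical sketch in Go; it does not reflect Network Service Mesh's actual schema or API. The application declares the network service it needs, and a platform-side resolver decides how to wire it, so no IP address appears in application code.

```go
package main

import "fmt"

// NetworkRequirement is a hypothetical declarative request: the application
// names the network service it needs instead of configuring addresses.
type NetworkRequirement struct {
	App     string // requesting application
	Service string // logical network service, e.g. "secure-intranet"
}

// resolve stands in for the platform: it matches the declared requirement
// against the connectivity it can provide, much like service discovery.
func resolve(req NetworkRequirement, offered map[string]string) (string, bool) {
	endpoint, ok := offered[req.Service]
	return endpoint, ok
}

func main() {
	offered := map[string]string{
		"secure-intranet": "interface nsm0 (wired up by the platform)",
	}
	req := NetworkRequirement{App: "payments", Service: "secure-intranet"}
	if ep, ok := resolve(req, offered); ok {
		fmt.Printf("%s is connected via %s\n", req.App, ep)
	}
}
```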

Go with cloud native



The author offers some advice, from the perspectives of application developers and cloud platform developers, on riding the tide of cloud native technologies.

From an app developer’s perspective

  • First, try to deploy applications on the basis of Kubernetes. Kubernetes makes deploying and operating applications easier and less error-prone than the previous generation of technology, and we believe the cloud ecosystem built around Kubernetes will yield more technical dividends to enjoy.
  • Second, try to use open source software from the CNCF Landscape to build your own distributed application systems. Most projects in the CNCF Landscape have active communities and evolve around the big picture of cloud native; their richness and maturity can spare you many detours and eliminate unnecessary duplicated construction.
  • Finally, strive to develop applications that are stateless, lightweight, and loosely coupled. In the cloud native era it is not enough to build an application that merely works; one must also think about its portability, which is the only way to keep pace with the technology and keep one's knowledge current.

For cloud platform developers

  • The first suggestion is to build the cloud platform as much as possible on projects from the CNCF Landscape. The emergence of cloud native is a great disruption and blow to the infrastructure cloud vendors built for themselves in the past; many cloud native technologies are better designed because they take a higher-level view. In that case, when open source meets your needs, resolutely give up self-building and consider joining the open source co-construction. Of course, if a self-built product could enrich the CNCF Landscape, consider contributing it to CNCF and growing it into a de facto standard to enhance your technological influence.
  • The second suggestion is to focus on the "three qualities" (resilience, ease of use, and portability) when looking for a point of leverage. The concept of cloud native leaves many cloud platform developers confused, because it is too abstract: everyone understands it differently, discussions fail to converge, and the point of leverage becomes hard to find. Today's cloud native is still evolving dynamically around its core driver and the three qualities, and will become more concrete as it develops. In the meantime, cloud platform developers should check their technology path against these elements to see whether it is truly cloud native, and avoid drifting off course.
  • The third suggestion is to "borrow from open source, and give back to open source". The point is to avoid the cost of reinventing the wheel. If the open source community already has a product similar to a self-built one, the author suggests thinking carefully about the relationship between the two: based on the differences, decide whether to abandon the self-built product altogether or to contribute it to the open source community (assuming, of course, that CNCF would accept it). If the features or performance of the open source software fall short of your needs, consider enhancing it and feeding the enhancements back to the community. Participating in open source this way makes "cloud native" more tangible. If you truly have technical strength, you should have the confidence to give up your own build and create technological impact by devoting yourself to open source.
  • A final piece of advice: try not to lock customers into your technology. With platform technology, customers are very sensitive to lock-in; a product built with the intent to lock customers in will drive users away. Of course, with non-platform technologies, lock-in is not a concern.

Kubernetes, Service Mesh and Serverless



This diagram helps us understand the positional relationship between Kubernetes and Serverless. Kubernetes is still developing today, spanning CaaS and PaaS, and shows a tendency to grow thicker. The advantage of a thicker platform technology is that the platform gains more control over the evolution of the underlying technology while applications become lighter, letting them focus more on business logic and less on common problems such as the connectivity, security, control, and telemetry of distributed applications. The figure also shows the concept of Service flowing layer by layer from IaaS up to PaaS, which is elegant and consistent from a software design perspective. Service Mesh does not appear in this figure; the following figure shows it from a different perspective.



The figure shows the data plane and the control plane. The Sidecars of the Service Mesh form a data bus that connects the services at the PaaS and SaaS layers, enabling interconnection between the two (service registration and discovery). Combined with the control plane, the mesh realizes control over all services (traffic gray release, rate limiting, circuit breaking, degradation, and so on), observation (logs, metrics, and call-chain tracing), and inter-service security. The control plane runs through all layers and is the control center of the entire distributed ecosystem. Two other equally important planes, the development plane and the operations plane, are not shown in the figure.



Kubernetes, Service Mesh, and Serverless, taken together, can be read as successive levels of encapsulation, each shielding the layers above from the details below. Kubernetes introduces a set of design patterns and implements new, effective, and elegant abstractions for managing all kinds of cloud resources, making cluster management and application publishing fairly easy and error-free.

The wide adoption of microservice software architecture transfers the complexity of distributed applications into the space between services. How to govern that space in a globally consistent, systematic, standardized, and non-intrusive way has become a crucial concern of microservice architecture. The Service Mesh does this neatly by stripping the logic that all services share, and that is environment-specific, into a Sidecar process deployed alongside each service (a minimal sketch follows). This stripping fully decouples service from platform so each can evolve on its own, and it also makes services lighter, improving how quickly they start and stop.
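
The following is a minimal sketch of the Sidecar idea in Go, assuming the business service listens on a local port; the ports are hypothetical and a real mesh Sidecar such as Envoy does far more. The point is only that cross-cutting concerns live in a separate process, not in the service.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"time"
)

func main() {
	// The business service listens on 127.0.0.1:8080 (hypothetical); this
	// sidecar fronts it on :15001 and carries the cross-cutting concerns
	// that were stripped out of the service itself.
	target, err := url.Parse("http://127.0.0.1:8080")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(target)

	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		proxy.ServeHTTP(w, r)
		// In a real mesh this would feed metrics, logs, and traces to the
		// control plane rather than the local log.
		log.Printf("%s %s took %v", r.Method, r.URL.Path, time.Since(start))
	})

	log.Fatal(http.ListenAndServe(":15001", handler))
}
```

Because the proxy sits in its own process, the business service stays in any language and knows nothing about logging, tracing, or traffic policy.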

Cloud native is naturally polyglot: technical teams can create business value and explore new business in the programming languages they know best and use most efficiently. Because the Service Mesh separates service-governance logic into the Sidecar as an independent process, the capabilities the Sidecar implements are naturally available to every language, creating more favorable conditions for building services in multiple languages.

The Service Mesh closes over all service traffic on the network, so that traffic-scheduling systems engineering such as multi-site active-active can be done more elegantly, simply, and effectively, and gray release and rollback of service versions become more convenient, improving production safety. This closure opens new room for the governance and evolution of service traffic, for troubleshooting, and for economical log collection.
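
Gray release reduces to weighted routing once the mesh holds all traffic. Below is a minimal sketch of the idea in Go; the version names and the 5% weight are illustrative, and in a real mesh the weight would come from the control plane rather than being hard-coded.

```go
package main

import (
	"fmt"
	"math/rand"
)

// pick routes one request: canaryPercent of traffic goes to "v2", the
// rest stays on "v1". Raising the weight step by step is a gray release;
// setting it back to zero is an instant rollback.
func pick(canaryPercent int) string {
	if rand.Intn(100) < canaryPercent {
		return "v2"
	}
	return "v1"
}

func main() {
	counts := map[string]int{}
	for i := 0; i < 10000; i++ {
		counts[pick(5)]++ // send about 5% of traffic to the new version
	}
	fmt.Println(counts) // roughly map[v1:9500 v2:500]
}
```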

Serverless's greatest value to customers is twofold.

  • The first is turning capital expenditure (CAPEX) into operating expense (OPEX) and solving the business "misestimation problem". Serverless charges precisely for the resources the business volume actually consumes, eliminating the need to purchase computing resources against an estimate made in advance. Under the traditional approach of buying for an estimated peak, overestimating the business volume means redundant, wasted resources, while underestimating it caps the business value that can be created and narrows revenue. At the technical level, Serverless can scale computing resources out in milliseconds, coping well with fluctuations in business traffic.
  • Second, it saves heavy operations costs: Serverless requires no personnel to operate and maintain servers. A complete Serverless solution shields developers from a great deal of technical detail through encapsulation, letting them focus on business logic for higher development efficiency and a shorter path to going live (a minimal sketch of such a function follows). Predictably, Serverless is an important embodiment of resilience, ease of use, and portability.
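
As a sketch of what "focus on business logic" means for a developer, here is a minimal HTTP function in Go. The shape is generic rather than any particular vendor's FaaS API: on a real Serverless platform the server wrapper in main disappears, and only the handler is deployed.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// handle is the only code the function developer writes: pure business
// logic. Servers, scaling, routing, and metering belong to the platform.
func handle(w http.ResponseWriter, r *http.Request) {
	var in struct {
		Name string `json:"name"`
	}
	_ = json.NewDecoder(r.Body).Decode(&in)
	fmt.Fprintf(w, "hello, %s\n", in.Name)
}

func main() {
	// Needed only to run locally; a FaaS platform supplies the serving
	// layer and invokes handle directly.
	http.HandleFunc("/", handle)
	http.ListenAndServe(":8080", nil)
}
```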

Some people are confused about Service Mesh and Serverless: with Serverless, is a Service Mesh still needed? In the author's view, the two are not in conflict. Service Mesh solves the complexity between services under a microservice software architecture; as long as microservices are adopted, a Service Mesh should be used. Serverless solves the problem of freeing users from server operations. A Serverless solution for a microservice architecture should therefore contain the substance of a Service Mesh, even if the end developer never perceives it.

The meaning of Distributionless and its development trend



Cloud native is an important development path for distributed applications, and its end state should be Distributionless. The implication is twofold.

  • First, all distribution-related problems are solved by the cloud platform. In other words, developers building on the cloud can efficiently develop, deploy, and distribute applications without mastering complex underlying technologies and concepts, while the cloud platform provides tools and scaffolding that help them find problems, diagnose, troubleshoot, and orchestrate resources;
  • Second, developing a distributed application becomes just as convenient as developing a traditional one, if not more so. The efficiency and sufficiency of information flow is an important enabler of many innovations, and the Internet naturally has these advantages, which will ensure that the development efficiency of distributed applications keeps improving dramatically.

The trends of Distributionless development are as follows:

  • Platforms get thicker, heavier, and more standardized; applications get thinner and lighter. The complexity of distributed applications exists regardless of the technical solution; the key is how and where it is solved. The logical path is to sink the complexity into the platform, which makes platforms thicker and heavier and applications thinner and lighter. Along the cloud native path, only platform standardization can ultimately deliver application portability.
  • A meshed data plane. Moving distributed applications from monoliths to microservice architecture implies, at bottom, that the data plane should be meshed. As the key technology of cloud native, the service mesh will inevitably face customization demands from different applications, and the cost of one application's customization must be borne by that application's own machines rather than passed on to others. Only a mesh can achieve application-level resource isolation, with each workload running its own custom logic plugins. Technologies such as RSocket that rely on a centralized Broker simply cannot meet this requirement. The biggest challenges for the data plane are how to isolate each application's customizations and how to make the path lossless; the latter means that, even with a Sidecar added, response time and resource overhead should approach zero through technological innovation.
  • A centralized control plane. The data plane and the control plane are twins: a "scattered" data plane needs an "unscattered" control plane to achieve global governance. The technical challenge of the control plane is how to push control information for the whole cluster to the data plane in the shortest possible time.
  • A productized operations plane. The degree of productization determines how convenient it will be to operate future distributed applications and how timely emergency response can be. The conceptual abstractions and human-computer interaction design involved in productization represent a cloud vendor's insight and best practices in the professional field of distributed application development, and express the depth of its understanding of making technology humane. Productization must never be a graphical dump of technical details; it must build a mental model that users can easily understand and grasp.
  • A seamlessly integrated development plane. The development plane consists of the processes distributed application developers follow in their daily work, including but not limited to requirements and task decomposition, high-level design, coding, unit testing, integration testing, code management, and software packaging and release. The key challenge is to make the development plane developer-centric so that developers can do their jobs smoothly and efficiently. In addition, the design of the development plane should organize and express a methodology of efficient software development, and should support common industry practices; for example, traceability of requirements and software defects should be easy to implement.

For cloud platform vendors, sellable core competitiveness comes from the productization of the operations plane and the seamless integration of the development plane. These two are the keys to reaching customers and users directly and to delivering customer value faster and better through technology. I believe "experience is king" will apply to the future of distributed applications as well.


Author: Zhi Jian (Li Yun)


This article is original content from the Yunqi Community and may not be reproduced without permission.