About us

For more cloud native case studies and knowledge, follow the WeChat official account of the same name, [Tencent cloud native].

Benefits:

① Reply [Manual] to the official account to get the “Tencent Cloud Native Roadmap Manual” and “Tencent Cloud Native Best Practices”.

② Reply [series] to the official account to get the “15 series, 100+ highly practical cloud native articles” collection, covering Kubernetes cost reduction and efficiency, K8s performance optimization practices, best practices, and other series.

③ Reply [white paper] to the official account to get the “Tencent Cloud Container Security White Paper” and “The Source of Cost Reduction: Cloud Native Cost Management White Paper V1.0”.

④ Reply “Introduction to light speed” to the official account to get a 50,000-word tutorial by Tencent Cloud experts, a fast-track introduction to Prometheus and Grafana.

About the author

Xiaowei Wang is a FinOps Certified Practitioner, a technical product manager at Tencent Cloud, and the product lead for Crane.

The state of cloud resource management

Imagine you are an application developer. Writing business code is your main job, and the amount of resources your application gets is usually sized from stress-test results, which wastes a great deal of resources during off-peak hours. As it happens, communities and companies are actively promoting cloud native, claiming that its powerful scheduling and elasticity can solve resource waste. You embrace cloud native with enthusiasm, only to discover that resources for cloud native workloads are still allocated in the same traditional, manual way.

Or suppose you are an operations engineer on the platform side, carrying the KPI of improving platform resource utilization. Many applications in the cluster have regular load fluctuations. You are pleased to find that Kubernetes offers automatic scaling (HPA) and decide to try it. In practice, however, there can be a lag of minutes or even tens of minutes from the moment rising load crosses the threshold, through the scale-out performed by the elasticity controller, to the point where the new application instances finish starting. The application may be overwhelmed before elasticity takes effect. So you give up on autoscaling and go back to reserving excess resources.

Can developers escape the abyss of resource allocation, and can elasticity be made efficient and useful? You take these questions to the community. Serverless, which completely separates application code from infrastructure, looks like an option. But as you learn more, you find that Serverless is a concept rather than a standard, and that giving up servers entirely also means giving up autonomy, control, and performance tuning at the lower layers. Another option is the managed-resource cluster, with Google's Autopilot clusters as the leading example. It should meet your needs, but it ties you to one platform and costs money.

We decided to change the status quo. We have accumulated a lot of experience optimizing costs for Tencent's internal businesses. By combining resource prediction, intelligent elasticity, and full co-location of workloads, we raised peak cluster utilization to more than 50% without sacrificing stability. We want to work with the community on the common problems of application resource allocation and elasticity, so we chose to open source the project and spare everyone from reinventing the wheel.

Figure 1 Crane’s optimization effect in large-scale scenarios

Crane: The first open source tool for enterprise cost optimization

To help cloud native users reduce costs as far as possible while keeping their businesses stable, Tencent launched Crane (Cloud Resource Analytics and Economics), the first open source cost optimization project in China based on cloud native technology. Crane follows FinOps standards and aims to provide cloud native users with a one-stop solution for cloud cost optimization.

Contributors to the Crane project currently include industry experts from Tencent, Xiaohongshu, Google, eBay, Microsoft, Tesla, and other well-known companies. Crane open source project address: github.com/gocrane/cra…

Crane's FinOps-compliant cost optimization capability model

Crane is the systematic output of the processes, methods, and tools Tencent uses internally for cloud resource optimization. The construction and planning of Crane's core capabilities closely follow the capability model proposed by the FinOps Foundation.

Figure 2 Crane capability model

Crane architecture and features

Figure 3 Crane architecture

Crane focuses on resource recommendation and intelligent elasticity configuration, so that business teams no longer have to worry about how many resources a workload needs or how to configure automatic scaling. Crane proposes an optimized configuration based on the time series data of the workload.
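To make the idea of a recommendation derived from time series data concrete, here is a minimal sketch. It is not Crane's actual algorithm: the percentile-plus-margin rule, the function names, the default values, and the sample data are all assumptions chosen for illustration.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile of a non-empty list; q is in (0, 100]."""
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def recommend_request(samples, q=95, margin=0.15):
    """Hypothetical rule: q-th percentile of observed usage plus a safety margin."""
    return round(percentile(samples, q) * (1 + margin), 3)


# Hypothetical CPU usage samples for one workload, in cores.
cpu_usage = [0.20, 0.25, 0.22, 0.30, 0.80, 0.35, 0.28, 0.26, 0.24, 0.31]

if __name__ == "__main__":
    print("recommended CPU request (cores):", recommend_request(cpu_usage))
```

A high percentile keeps the request close to real peaks while ignoring how idle the workload is most of the time; the margin is the knob that trades cost against headroom.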

One-click deployment

Crane is platform independent: it can be installed through a Helm package into any Kubernetes cluster, on or off the cloud, to provide one-stop resource optimization. Crane is minimally intrusive. Its core components are a centralized controller, Craned, and a node agent, Crane Agent. At install time you can use featureGate to choose which capabilities to turn on.

Easy to use visual console

To lower the barrier to entry, Crane offers a built-in console where users can view cost allocation, cost trends, and cost optimization with a few clicks. All capabilities support gradual (grayscale) rollout control and a preview mode, and can be rolled back, which eases business-side concerns about resource changes.

Out-of-the-box inspection capability

Crane can scan the whole cluster for waste and visualize hidden waste, freeing operations personnel from repetitive work such as pulling monitoring data and writing query scripts.

The optimization plan presents cost changes, utilization changes, possible risk points, and even a ranking of optimization proposals. We believe that every business is unique and has its own best optimization solution, so no single plan fits all.

Fast, timely elasticity: EffectivePodAutoscaler (EPA)

Traditional event-driven elasticity tools have an inherent drawback: scaling is triggered only after business metrics deviate from normal, and this lag discourages cloud users from relying on elasticity. EPA supports extensible prediction algorithms and uses the predicted results to drive horizontal and vertical scaling, so that workloads scale out ahead of time instead of being overwhelmed before native elasticity can react. Crane also unifies the community's HPA and VPA capabilities under the single concept of EPA.
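The difference between reacting to the current load and scaling on a prediction can be sketched in a few lines. This is an illustration only, not EPA's implementation: the seasonal-naive forecast (reuse the value from one period earlier), the function names, and the load figures are assumptions.

```python
import math


def replicas_for(load, per_pod_capacity, min_replicas=1):
    """Replicas needed so that each pod stays at or below its target load."""
    return max(min_replicas, math.ceil(load / per_pod_capacity))


def seasonal_naive_forecast(history, period, horizon):
    """Forecast `horizon` steps ahead by reusing the value one period earlier.

    Assumes len(history) >= period and 1 <= horizon <= period.
    """
    return history[len(history) - 1 + horizon - period]


def reactive_replicas(history, per_pod_capacity):
    """Scale on the latest observed load only (event-driven behaviour)."""
    return replicas_for(history[-1], per_pod_capacity)


def predictive_replicas(history, period, lead_steps, per_pod_capacity):
    """Scale on the highest load expected within the next `lead_steps` steps."""
    forecast_peak = max(
        seasonal_naive_forecast(history, period, h)
        for h in range(1, lead_steps + 1)
    )
    return replicas_for(max(history[-1], forecast_peak), per_pod_capacity)


# Hypothetical periodic load (requests per second), one period = 6 samples.
pattern = [100, 100, 100, 400, 400, 100]
history = pattern + pattern[:3]  # the peak is about to start again

if __name__ == "__main__":
    print("reactive:", reactive_replicas(history, per_pod_capacity=100))
    print("predictive:", predictive_replicas(history, 6, 2, per_pod_capacity=100))
```

In this sample the reactive calculation still asks for a single replica one step before the peak, while the predictive one already asks for enough replicas to absorb it, which is the lead time that covers application startup.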

Figure 4 EPA scales out workloads ahead of time

Stability and resource optimization

Crane does not improve resource utilization at the expense of stability. Crane allows users to assign priority levels to services. Node agents periodically check node resource watermarks and system metrics, identify interference between applications, and protect the service level of sensitive workloads through measures such as disabling scheduling, adjusting cgroups, and evicting pods.
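The escalation from disabling scheduling to cgroup adjustment to eviction can be sketched as a simple watermark check. This is a hypothetical model, not the Crane Agent's code: the thresholds, the action names, and the Pod fields are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    priority: int  # higher value = more sensitive service (assumed convention)
    cpu: float     # cores currently in use


def plan_actions(node_cpu_usage, node_cpu_capacity, pods,
                 schedule_off_at=0.7, throttle_at=0.8, evict_at=0.9):
    """Return the actions a node agent might take at the current watermark."""
    utilization = node_cpu_usage / node_cpu_capacity
    actions = []
    if utilization >= schedule_off_at:
        # Stop placing new pods on a node that is already busy.
        actions.append(("disable_scheduling", None))
    # Only pods below the highest priority present are candidates for action;
    # among them, pick the lowest priority and then the largest consumer.
    top = max((p.priority for p in pods), default=0)
    victims = sorted((p for p in pods if p.priority < top),
                     key=lambda p: (p.priority, -p.cpu))
    if victims:
        if utilization >= evict_at:
            actions.append(("evict", victims[0].name))
        elif utilization >= throttle_at:
            actions.append(("throttle_cgroup", victims[0].name))
    return actions


# Hypothetical node with one sensitive service and one batch job.
pods = [Pod("web", priority=10, cpu=2.0), Pod("batch", priority=1, cpu=5.0)]

if __name__ == "__main__":
    for usage in (5.0, 7.5, 8.5, 9.5):
        print(usage, plan_actions(usage, 10.0, pods))
```

The point of the ordering is that the least disruptive measure is tried first, and the sensitive service is never selected as long as a lower-priority pod exists on the node.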

Crane: current status and future

Crane has currently released version 0.2.0, which provides core capabilities such as resource recommendation, elasticity recommendation, intelligent elasticity, and stability enhancement. See the project milestones for further development plans.

Further reading

FinOps (Financial Operations) defines a set of cloud financial management rules and best practices that enable organizations to maximize benefits by having engineering and finance teams, and technology and business teams, collaborate on data-driven cost decisions.

Guided by its core values of putting users first and being driven by technology, Tencent Cloud shares its internal experience, methods, and tools for cloud resource optimization with the community as open source, and regards helping cloud users optimize cloud costs as its mission and responsibility. In December 2021, Tencent became a top-tier member of the FinOps Foundation, committed to promoting cloud resource optimization concepts and contributing technology.

Join us

The Crane project is open source at github.com/gocrane/cra… Stars and bookmarks are welcome.

We are recruiting the first group of Crane open source enthusiasts, and places are limited. If you are interested in Crane and related technologies, you are welcome to join us. How to participate: add Tengxiaoyun on WeChat (TKEplatform) and reply "Crane", and Xiaoyun will add you to the group.