An instance, that is, a cloud virtual machine, is the most basic cloud computing resource. It is sold under three billing models, listed from most to least expensive: on-demand instances, reserved instances, and spot (bidding) instances. In most cloud systems today, instance costs make up a significant part of the total bill. So how can low-priced spot instances be used to reduce the cost of a cloud system? This article collects some practical money-saving tips.

The challenge of building large-scale systems in the cloud

Cloud computing platforms provide resources on demand: users can obtain the computing resources they need at any time and keep them matched to the scale of the business. In the age of cloud computing, computing resources are no longer the bottleneck for building large-scale systems. In most cases, cost has become the biggest challenge for enterprises building large-scale applications.

How to reduce the cost of the cloud system

The key to reducing the cost of a cloud system is to take full advantage of the on-demand and pay-as-you-go nature of cloud computing. For example, scaling computing resources up and down with business volume is a common way to cut costs. Architecturally, replacing a monolith with microservices gives the system finer scaling granularity, so that each service can run on more appropriate and cheaper computing resources.

One cost-saving point is easily neglected. Unlike with traditional systems, when implementing a system in the cloud we must take the cloud's billing model fully into account. For example, cloud messaging services are usually charged by the number of requests, so sending multiple messages in a single call can cut the cost to a fraction of the original.
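As a rough illustration of the batching point, the sketch below compares the cost of sending the same number of messages with and without batching. The price per request and the batch size are made-up numbers chosen for the example, not any vendor's actual tariff.

```python
import math


def request_cost(messages: int, batch_size: int, price_per_request: float) -> float:
    """Cost of sending `messages` when each billed call carries up to `batch_size` messages."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # The service bills per request, so only the number of calls matters.
    requests = math.ceil(messages / batch_size)
    return requests * price_per_request


# Hypothetical tariff (illustrative only): 0.0004 per request.
unbatched = request_cost(1_000_000, batch_size=1, price_per_request=0.0004)
batched = request_cost(1_000_000, batch_size=10, price_per_request=0.0004)
```

With ten messages per call, the billed request count, and therefore the cost, drops to one tenth.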

As noted above, instances (cloud virtual machines) are the most basic computing resource. At present, instance costs account for a high proportion of the total cost in most cloud systems, because most of them were migrated directly from on-premises systems and seldom adopt architectures such as serverless. Paying attention to the instance billing model is therefore essential to reducing system cost.

Here are the three basic instance billing models:

  • On-demand instances: capacity is billed by the hour or by the second for as long as the instance is running.
  • Reserved instances: in exchange for a usage commitment (for example, 1 year or 3 years), reserved instances are offered at a substantial discount compared with on-demand instances (usually about 60% of the on-demand price).
  • Spot (bidding) instances: highly elastic, cheap computing capacity offered by cloud vendors. The price varies with supply and demand and is significantly lower than the on-demand price (typically 10% to 20% of it). In exchange, spot instances are subject to interruption: the platform decides whether to interrupt a running instance based on the price and on the remaining capacity of the resource pool.
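To make the price gap concrete, here is a minimal sketch that applies the ratios quoted above to a hypothetical on-demand rate. The hourly rate, the instance count, and the 730-hour month are assumptions for illustration; real prices vary by vendor, region, and instance type.

```python
ON_DEMAND_RATE = 1.00             # hypothetical on-demand price per instance-hour
RESERVED_RATIO = 0.60             # "usually 60% of on-demand", as quoted in the text
SPOT_RATIO_RANGE = (0.10, 0.20)   # "typically 10% to 20% of on-demand", as quoted in the text


def monthly_cost(hourly_rate: float, instances: int, hours: int = 730) -> float:
    """Cost of running `instances` machines continuously for `hours` hours."""
    return hourly_rate * instances * hours


costs = {
    "on_demand": monthly_cost(ON_DEMAND_RATE, 10),
    "reserved": monthly_cost(ON_DEMAND_RATE * RESERVED_RATIO, 10),
    "spot_low": monthly_cost(ON_DEMAND_RATE * SPOT_RATIO_RANGE[0], 10),
    "spot_high": monthly_cost(ON_DEMAND_RATE * SPOT_RATIO_RANGE[1], 10),
}
```

Under these assumed numbers, ten instances cost 7300 per month on demand, 4380 reserved, and between 730 and 1460 on spot.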

As the list above shows, spot instances are the cheapest computing resource, and using them effectively can greatly reduce computing costs.



About spot instances

Pricing model

To obtain a spot instance, the user sets a maximum price. When that maximum price is higher than the current market price of the instance, the request is fulfilled. Note that the user is charged the market price, not the maximum price they set.
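The pricing rule can be sketched in a few lines. The function name and the treatment of the equal-price case are assumptions made for this example; the text only states that the maximum price must be higher than the market price.

```python
from typing import Optional


def spot_charge(max_price: float, market_price: float, hours: float) -> Optional[float]:
    """Return the amount billed for `hours` of use, or None if the request is not fulfilled."""
    if max_price <= market_price:
        # Assumption: an equal price is treated as not fulfilled, following the
        # wording "higher than" in the text. Real platforms may differ.
        return None
    # Billing uses the market price, never the user's maximum price.
    return market_price * hours


fulfilled = spot_charge(max_price=0.5, market_price=0.25, hours=4)
rejected = spot_charge(max_price=0.2, market_price=0.25, hours=4)
```

Here `fulfilled` is 1.0 (4 hours at the market price of 0.25), even though the user was willing to pay up to 0.5 per hour.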

Instance pool

The spot market is divided into many separate instance pools along the dimensions of region, instance type, and availability zone. Each pool evaluates its own supply and demand, so each pool has its own capacity and market price. As a result, when a shift in supply and demand changes the price in one pool, or when instances in one pool are interrupted and reclaimed, the other pools are not affected.

Interruption

A spot instance is interrupted in one of two cases:

1. The maximum price set for the instance is lower than the current market price of its pool.

2. Supply in the instance pool is tight and the pool runs short of resources.

A few minutes before the instance is terminated and reclaimed, the cloud platform issues an interruption notice.
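The two interruption cases can be expressed as a small decision function. This is only a model of the rule as described here: the actual decision is made inside the cloud platform, and the `free_capacity` parameter is a hypothetical stand-in for whatever the platform uses to measure a resource shortage.

```python
from enum import Enum
from typing import Optional


class InterruptReason(Enum):
    PRICE = "market price rose above the maximum price"
    CAPACITY = "instance pool ran short of resources"


def interruption_reason(
    max_price: float, market_price: float, free_capacity: int
) -> Optional[InterruptReason]:
    """Return why an instance would be interrupted, or None if it keeps running."""
    if max_price < market_price:
        return InterruptReason.PRICE
    if free_capacity <= 0:
        # Hypothetical shortage test; the real signal is internal to the platform.
        return InterruptReason.CAPACITY
    return None
```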

Using spot instances

Because spot instances can be interrupted, many users are scared off. In fact, the spot market is very large, and with sensible use the capacity of a cluster can be kept stable. Cloud vendors also provide historical price data and reclamation statistics for spot instances that can be queried.

Scenarios where spot instances fit:

1. Time flexibility

Spot instances are already widely used for background batch jobs such as big data processing and machine learning model training. Unlike online systems, such jobs usually do not need to respond to requests in real time; this is what time flexibility means. A retry mechanism is therefore enough to tolerate interruptions. If you use a managed big data or machine learning service from a cloud platform, it usually has built-in support for spot instances and retries interrupted tasks automatically.
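A retry mechanism of this kind can be sketched as follows. The `SpotInterrupted` exception and the `run_with_retries` helper are hypothetical names for this example; a real batch framework would typically also checkpoint progress so that a retry does not start from scratch.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


class SpotInterrupted(Exception):
    """Hypothetical signal that the instance running the task was reclaimed."""


def run_with_retries(task: Callable[[], T], max_attempts: int = 5) -> T:
    """Run `task`, retrying when it is cut short by an interruption."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except SpotInterrupted:
            if attempt == max_attempts:
                raise  # give up and surface the interruption to the caller
    raise AssertionError("unreachable")


# Demo: a job that is interrupted twice before it completes.
attempts: List[int] = []


def flaky_job() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise SpotInterrupted()
    return "done"


result = run_with_retries(flaky_job)
```

Because the job has no real-time deadline, the only cost of an interruption is the time spent on the repeated attempt.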

2. Instance flexibility

When a request to an online service does not need to be handled by one particular instance, the service is instance flexible. Spot instances can also be used in online service clusters, provided the service is instance flexible and the effects of interruptions are handled properly. To improve the resilience of online systems, it often helps to follow design-for-failure and recovery-oriented design principles.

Managing spot instance clusters

Next, we focus on practices for managing spot instance clusters that improve the stability of cluster capacity and so reduce the impact of spot instance reclamation on service availability.

Building “mixed” clusters

1. Mixing multiple pricing models

Reserved instances can cover the baseline needed to meet the service SLA, while auto scaling combined with spot instances handles changes in traffic.
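As a minimal sketch of this split, the function below assigns demand to a fixed reserved baseline first and sends only the overflow to spot. The baseline size and the demand trace are invented numbers, and the function is an illustration rather than any auto scaling product's actual policy.

```python
from typing import List, Tuple


def split_capacity(demand: int, reserved_baseline: int) -> Tuple[int, int]:
    """Return (reserved instances in use, spot instances needed) for a given demand."""
    reserved = min(demand, reserved_baseline)
    spot = max(demand - reserved_baseline, 0)
    return reserved, spot


# Hypothetical demand trace (instances needed per hour) with a baseline of 5.
demand_trace: List[int] = [4, 6, 12, 9]
plan = [split_capacity(d, reserved_baseline=5) for d in demand_trace]
```

In this trace the reserved baseline is never exceeded by its own share, and the spot portion grows and shrinks with traffic: 0, 1, 7, and 4 instances.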

2. Mixing spot instances from different pools

As mentioned above, price changes and interruptions in different pools are independent. Mixing hosts from several pools in one cluster therefore avoids a large-scale loss of capacity when one pool runs short of machines and its instances are interrupted.
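The benefit of spreading across pools can be shown with a small sketch. Pools are keyed by region, instance type, and availability zone as described earlier; the pool names are placeholders, and the round-robin placement is just one simple strategy chosen for the example.

```python
from collections import Counter
from itertools import cycle, islice
from typing import List, NamedTuple


class Pool(NamedTuple):
    region: str
    instance_type: str
    zone: str


def spread(instances: int, pools: List[Pool]) -> Counter:
    """Place `instances` across `pools` in round-robin order."""
    if not pools:
        raise ValueError("at least one pool is required")
    return Counter(islice(cycle(pools), instances))


def worst_case_loss(allocation: Counter) -> float:
    """Fraction of the cluster lost if the most heavily used pool is fully reclaimed."""
    total = sum(allocation.values())
    return max(allocation.values()) / total if total else 0.0


# Placeholder pool names, not real regions or instance types.
pools = [
    Pool("region-a", "type-large", "zone-1"),
    Pool("region-a", "type-large", "zone-2"),
    Pool("region-a", "type-xlarge", "zone-1"),
    Pool("region-a", "type-xlarge", "zone-2"),
]
mixed = worst_case_loss(spread(12, pools))
single = worst_case_loss(spread(12, pools[:1]))
```

With four pools, the loss of any single pool removes a quarter of the cluster; with one pool, it removes all of it.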

Proactive interruption management

The cloud platform issues an interruption notice several minutes before a spot instance is reclaimed. The notice can be obtained by querying the instance’s metadata or by listening for the corresponding platform event. If this window is used well, the interrupted instance can shut down gracefully, and services and state can be moved elsewhere, which greatly improves the stability of cluster capacity and preserves availability and serving capacity.
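One way to use the warning window is to check for a notice regularly and start draining as soon as one appears. The sketch below assumes the notice body is a JSON document with `action` and `time` fields, which is the shape AWS EC2 returns from its spot instance-action metadata path; other vendors use different formats. The `fetch` and `drain` callables are hypothetical hooks: `fetch` would query the metadata service and return None when no notice is pending, and `drain` would deregister the instance and migrate its work.

```python
import json
from datetime import datetime, timezone
from typing import Callable, Optional


def parse_notice(body: Optional[str]) -> Optional[datetime]:
    """Return the scheduled reclaim time, or None when no notice is pending."""
    if not body:
        return None
    doc = json.loads(body)
    # Assumed timestamp format: UTC with a trailing "Z", e.g. 2024-01-01T12:02:00Z.
    parsed = datetime.strptime(doc["time"], "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def handle_once(
    fetch: Callable[[], Optional[str]],
    drain: Callable[[float], None],
    now: datetime,
) -> bool:
    """Check for a notice once; if present, call `drain` with the seconds remaining."""
    deadline = parse_notice(fetch())
    if deadline is None:
        return False
    drain(max((deadline - now).total_seconds(), 0.0))
    return True
```

In a running service, `handle_once` would be called in a loop every few seconds with the current UTC time, so that draining begins soon after the notice is published.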

1. Simple scenarios usually include the following steps:

2. How do you handle more complex cases?

  • Stateful services: data and state must be migrated;
  • Other service discovery mechanisms: when the cluster relies on a separate service discovery mechanism, it must be called to register and deregister service instances. Consul is one widely used service discovery tool;
  • Container environments such as Kubernetes: the migration of pods off the interrupted node and the early launch of a replacement node must be coordinated with Kubernetes.

In practice, these more complex proactive interruption management tasks can be handled with MaxGroup, the cluster management product of the cloud cost optimization solution SpotMax. MaxGroup aims to balance cluster stability and savings: it plans capacity intelligently, keeps the cluster stable over time, works to reduce the probability of interruption, keeps the cluster consistent, and supports cloud-native environments. This lets developers use the cloud's cheap idle computing capacity without worrying about reliability, and reduces the need to buy more expensive on-demand and reserved instances.

Learn about SpotMax: https://spotmaxtech.com/

Learn about MaxGroup: https://new.spotmaxtech.com/p…