Building Microservices from Scratch: An Incomplete Guide to Alibaba Cloud (1)

This article is the first in a new series, “Building Microservices from Scratch,” which focuses on microservices with an emphasis on hands-on practice.

A typical Microservices architecture (paraphrased from Introduction to Microservices)

Microservices are a big topic, and opinions differ on the pros and cons of microservices versus monolithic services. There is no denying, however, that microservices play a major role in rapid iteration and fast delivery of products. In the mobile Internet era, we should embrace change. So, given our company’s actual situation, how to implement microservices with limited R&D manpower has been a question in front of us since the day the company was founded. To build a service you first need servers, preferably with all kinds of basic services available out of the box. With limited manpower, we decided without much hesitation to put everything in the cloud. As mentioned in Netty Memory Management Expedition: PoolArena Allocation Puzzle, we use Alibaba Cloud heavily. Yes, the cloud provider we chose is Alibaba Cloud (Aliyun). Its wide variety of products and purchase options dazzles many beginners and leaves them unsure where to start. This article shares some of our experience in this area.

The right way to get started with Alibaba Cloud

Open www.aliyun.com and first register an Alibaba Cloud account. Enterprise users should apply for enterprise real-name authentication as soon as possible. Only an account with enterprise real-name authentication can use enterprise-specific functions such as topping up by transfer from a corporate bank account and issuing VAT invoices. The real-name authentication entry is located in “Account Management”:

Aliyun enterprise real-name authentication

Next, choose a primary region for your products according to where the target users are. Almost all Alibaba Cloud products require the user to choose a region, which is understandable: cloud products ultimately run in physical data centers maintained by Alibaba Cloud, and those have geographic attributes. Communication between Alibaba Cloud products within the same region can be roughly understood as intranet access, while communication across regions has to go over the public Internet, which affects reliability and latency. In addition, for public network communication Alibaba Cloud generally charges outbound public traffic fees on top of the fee for the product itself. Alibaba Cloud currently does not support moving a product to another region after purchase. Therefore, unless there is a clear purpose (such as cross-region backup), it is recommended to purchase all Alibaba Cloud products belonging to the same system in the same region. At present, the Alibaba Cloud regions include:

Alibaba Cloud product regions

If the target users are mainly domestic users in China, the candidate regions are North China 1/2, East China 1/2, and South China 1. According to the author’s measurements, these five regions differ little in north-south connectivity or cross-carrier connectivity within China. East China 1 is physically located in Hangzhou, Alibaba Cloud’s home base, and many new Alibaba Cloud products are trialed in East China 1 first. It is therefore the suggested first choice.

Cloud server: ECS

Typically, cloud servers (ECS) are the first product to purchase. On the Alibaba Cloud ECS purchase page, users can freely combine CPU type, the number of CPU cores and memory ratio, public network bandwidth, and the size of the cloud disks attached to the ECS. Such flexibility, however, also brings bewildering pricing. Let’s go through the options one by one:

2.1 CPU type and Memory Ratio

At present, Alibaba Cloud ECS instance types fall into three categories: Series I, Series II, and Series III. The series differ mainly in CPU type (supported instruction sets, etc.), the matching memory type, and whether I/O optimization is used. Users can choose according to the characteristics of the application actually being run (I/O-intensive or compute-intensive). In my experience, even the lowest-configuration Series I instance is sufficient to run ordinary I/O-intensive JVM application services based on JDK 8 on CentOS 6.5/7. In a microservice architecture, once the business logic of a single service has been fully optimized around asynchronous I/O and asynchronous interfaces, a single group of back-end services can easily support a concurrency of 1K to 5K or even higher, depending on complexity. An excerpt describing Series II and Series III follows:

Series II is a hardware upgrade over Series I: it uses Haswell CPUs and DDR4 memory, its instances are I/O-optimized by default, and new instruction sets double integer and floating-point performance, giving greater overall computing power.

Series III uses Intel Broadwell CPUs and DDR4 memory, and its instances are I/O-optimized by default. Medium- and high-frequency CPUs and a variety of memory ratios give users better performance and more choices.

The difference between I/O-optimized and non-I/O-optimized instances shows in the IOPS achievable by the mounted cloud disk:

I/O-optimized instance: when an SSD cloud disk or high-efficiency cloud disk is mounted, the full storage performance of the cloud disk can be achieved. I/O optimization provides better network capability between the instance and the cloud disk, which guarantees the disk’s storage performance.

Non-I/O-optimized instance: when an SSD cloud disk is mounted, the maximum is about 1,000 IOPS; when a high-efficiency cloud disk is mounted, performance reaches only a few hundred IOPS.

On the instance selection page, each series offers fixed combinations of CPU core count and memory size. The ratio of CPU cores to memory can be 1:1 (1 GB of memory per core), 1:2 (2 GB per core), 1:4 (4 GB per core), or 1:8 (8 GB per core). Across the three series, the smallest instance is 1 core with 1 GB, and the current largest is 56 cores with 224 GB. If you have special requirements for an instance (more CPU cores or more memory), fill in a work order as shown in the screenshot below and submit it to Alibaba Cloud for manual processing.

Purchase of nonstandard instances requires a work order request

A sample work order for a nonstandard instance

As mentioned earlier, back-end services are mostly I/O-intensive. For JVM applications in particular, OutOfMemory errors are far more likely than CPU problems (most abnormal system loads are also caused, directly or indirectly, by OutOfMemory). In practice we therefore mainly choose instances with 2 cores and 16 GB of memory. Taking this configuration as an example, we fix the following factors and compare the purchase cost of the three series:

The billing method is “pay-as-you-go” (pay by volume) in all cases

The bandwidth billing mode is set to fixed bandwidth at 0 Mbps. In this case the ECS has only an internal IP address and no public IP address. The advantages, disadvantages, and application scenarios of such intranet-only ECS are described later

There is only the default 40 GB system disk and no additional data disk. The lowest-tier cloud disk type available in each region is used

The following table lists the hourly cost of a 2-core, 16 GB instance for each series and region:

Instance type	East China 1	East China 2	North China 1	North China 2	South China 1
Series I (2 cores, 16 GB)	¥1.707/hour	¥1.71/hour	N/A	¥1.71/hour	¥1.71/hour
Series II (2 cores, 16 GB)	¥1.75/hour	¥1.75/hour	¥1.75/hour	¥1.75/hour	¥1.75/hour
Series III (2 cores, 16 GB)	¥1.83/hour	¥1.83/hour	¥1.83/hour	¥1.83/hour	¥1.83/hour

Cost comparison of different ECS series

The data in the table reflect prices at the time of writing (February 2017) and are for reference only

As the table shows, for the same number of cores and memory size, Series III costs the most and Series I the least, as expected, and there is almost no regional difference within a series. The only exception is East China 1, where the hourly cost of Series I is 0.003 yuan lower than in the other three regions. This is because in East China 1 Series I can only use ordinary cloud disks, while in the other regions Series I can only use high-efficiency cloud disks or SSD cloud disks, which perform better and cost more.

Note 1:

Currently, the maximum configuration available in pay-as-you-go mode is 4 cores and 16 GB

With annual or monthly subscription billing, the maximum instance configuration is a staggering 56 cores and 224 GB; of course, the cost is just as staggering!

2.2 Billing Method: Pay-as-You-Go vs. Annual and Monthly Subscription

All Alibaba Cloud products support pay-as-you-go billing, which is one of the fundamental advantages of cloud services over a traditional IDC. Like turning on a tap, you pay for exactly as many resources as you use, and this is the billing method users find easiest to understand. For several core products, including ECS, Alibaba Cloud also offers annual and monthly subscriptions to lock in long-term users: the user prepays once for resource usage over a period of time, and for the same duration this prepayment is much cheaper than paying by the hour. Let’s work out exactly how good a deal it is, taking Series I with 2 cores and 16 GB in East China 2 as an example. The cost of one year of use under the different billing methods compares as follows:

Series I (2 cores, 16 GB)	Pay-as-you-go	Monthly	1-year package	2-year package	3-year package
Billing unit price	¥1.71/hour	¥500.00/month	¥5100/year	¥8400/2 years	¥9000/3 years
Number of billing cycles	× 24 × 365	× 12	× 1	÷ 2	÷ 3
Cost of one year of use (yuan)	14979.60	6000	5100	4200	3000
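The arithmetic behind the table can be reproduced with a few lines of Python. This is only a sketch: the prices are the February 2017 figures quoted above, and the function and key names are illustrative rather than anything defined by Alibaba Cloud.

```python
# Prices quoted in the table above (Series I, 2 cores / 16 GB, East China 2).
HOURS_PER_YEAR = 24 * 365


def annual_costs():
    """Return the cost of one year of use under each billing method, in yuan."""
    return {
        "pay-as-you-go": round(1.71 * HOURS_PER_YEAR, 2),  # hourly price x hours in a year
        "monthly": 500.00 * 12,                            # monthly price x 12 months
        "1-year": 5100 / 1,                                # package price / years covered
        "2-year": 8400 / 2,
        "3-year": 9000 / 3,
    }


if __name__ == "__main__":
    costs = annual_costs()
    baseline = costs["pay-as-you-go"]
    for method, cost in costs.items():
        share = cost / baseline
        print(f"{method:>14}: {cost:>9.2f} yuan/year ({share:.0%} of pay-as-you-go)")
```

Expressing every option as a share of the pay-as-you-go baseline makes the discount of each package directly comparable.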

What a startling comparison! One year on the monthly plan costs only about 40 percent of one year of pay-as-you-go, to say nothing of the 2-year and 3-year package discounts that Alibaba Cloud is heavily promoting. Of course, the price of the annual and monthly discounts is being locked in. Note that ECS offers a 5-day no-questions-asked refund: if, after purchasing an ECS, you find that the configuration or the region is wrong, or want out for any other reason, you can get a refund as long as no more than 5 days have passed since the purchase. Please note, however, that as of this writing the rule also states that each user gets only one 5-day no-questions-asked ECS refund in total; that is, once an Alibaba Cloud account has successfully refunded an ECS, it cannot use this refund again. So order carefully!

Alibaba Cloud 5-day no-questions-asked refund rule

In summary, the choice of billing method comes down to the following:

For ECS purchased for trial or temporary use, whether to test a configuration or a region, pay-as-you-go is the better choice. The minimum charge is one hour (usage of less than one hour is billed as one hour), so remember to release the instance through the console as soon as the trial is finished.

For long-term use, once the ECS configuration and region are fully settled, buy a package for as long as you need and enjoy the deep discount
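A rough break-even estimate makes the boundary between “temporary” and “long-term” more concrete. The sketch below assumes the Series I prices quoted earlier (¥1.71 per hour pay-as-you-go, ¥500 per month on the monthly plan) and the rule that a partial hour is billed as a full hour; the function names are hypothetical.

```python
import math

HOURLY_PRICE = 1.71     # yuan per hour, pay-as-you-go (price quoted in this article)
MONTHLY_PRICE = 500.00  # yuan per month, monthly subscription (price quoted in this article)


def pay_as_you_go_cost(hours_used: float) -> float:
    """Cost in yuan; usage is rounded up because a partial hour is billed as one hour."""
    return round(math.ceil(hours_used) * HOURLY_PRICE, 2)


def break_even_hours() -> int:
    """Smallest whole number of hours in a month at which pay-as-you-go
    costs at least as much as the monthly subscription."""
    return math.ceil(MONTHLY_PRICE / HOURLY_PRICE)


if __name__ == "__main__":
    hours = break_even_hours()
    print(f"Break-even after {hours} hours (about {hours / 24:.1f} days) of use per month")
    print(f"A 30-minute trial costs {pay_as_you_go_cost(0.5):.2f} yuan")
```

Under these assumed prices, an instance that has to stay up for more than roughly twelve days in a month is already cheaper on the monthly plan.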

2.3 Network and Security Groups

When purchasing ECS, we can choose which network type to use. Alibaba Cloud offers “classic network” and “private network” (VPC):

Classic network: intranet IP addresses are assigned uniformly by Alibaba Cloud and cannot be changed. It makes intranet access between the user’s instances and other Alibaba Cloud products in the same region convenient, and suits users with modest requirements for network customization and independent management

Private network (VPC): a logically isolated private network. Users can customize the network topology and IP addresses, dedicated-line connections are supported, and the network is highly scalable. It suits customers with advanced, individual network customization requirements

In short, for users who are able and need to plan and define private networks with virtual switches and routers, “private network” makes it possible to build a customizable, isolated private network inside Alibaba Cloud’s basic network and to define its topology and IP addresses. The functions of the two network types compare as follows:

Function	Classic network	Private network (VPC)
Layer-2 logical isolation	Not supported	Supported
User-defined private network segment	Not supported	Supported
Private IP address planning	Unique within the classic network	Unique within a private network; may repeat across private networks
Self-built VPN	Not supported	Supported
Private network communication	Instances of the same account in the same region can communicate	Instances within a private network can communicate; different private networks are isolated from each other
Self-built NAT gateway	Not supported	Supported

Classic network vs. private network

One application scenario for “private network” is building a hybrid cloud over a high-speed channel, as shown in the figure below:

Hybrid cloud diagram

The service system in the cloud is built with a private network (VPC), ECS, and other cloud products. For data confidentiality, core data is stored in the user’s self-built data center (IDC) outside the cloud. A high-speed channel dedicated line connects the two so that data can flow between the cloud and the on-premises data center, forming a hybrid cloud environment.

Private networks have other application scenarios as well; for details, see the Alibaba Cloud documentation on common application scenarios of VPC private networks. For most users, the “classic network” is good enough. The security group firewall provides layer 3 network access control, so simple access isolation on a classic network can be achieved by creating multiple security groups; see the references at the end of this article for details. For small-scale use, it is often enough to “slack off” a bit and use a single default security group globally.

2.4 ECS-bound Storage

The default size of the ECS system disk is 40 GB regardless of instance type, from 1 core with 1 GB up to 56 cores with 224 GB. Users can enlarge the system disk up to a maximum of 500 GB or purchase additional data disks. Enlarging the system disk or purchasing data disks increases the purchase cost of the ECS. In my experience, after the operating system is installed the default 40 GB system disk still has enough free space for back-end services and temporary run logs.

Problem 1: how should files (pictures, audio, and video uploaded by users) be stored on a cloud disk for the long term when the number of files is expected to be large? If files need to be kept for a long time, you are advised to purchase Alibaba Cloud OSS and build on its SDK to implement the functionality. In terms of security, data integrity, and convenient use of Alibaba Cloud CDN, OSS offers far more than a file system on a cloud disk. This will be covered in more detail in later articles in this series.
Problem 2: the run logs generated by the system contain a lot of useful information and may need to be kept for a long time for later analysis. What if the 40 GB of space is not enough? If log files generated by back-end services need to be kept for a long time, you are advised to use Alibaba Cloud Log Service together with storage services such as OSS, Table Store, and MaxCompute, and to use E-MapReduce or MaxCompute for offline log analysis.

In addition, if you do need to purchase additional cloud disks, it is recommended to purchase independent cloud disks. The purchase entry is located under “Cloud Server ECS” - “Disk”, as shown in the screenshot:

Independent cloud disk purchase entrance

Independent cloud disks can be mounted to different ECS instances in the same region at different times, which makes the use of cloud disks more flexible and economical.

2.5 Bandwidth Mode

ECS also provides two bandwidth billing options: “by fixed bandwidth” and “by traffic used”:

Fixed bandwidth: you specify the outbound bandwidth of the public network, for example 10 Mbit/s. This mode suits business scenarios with stable bandwidth requirements, where it costs less

By traffic: charges are based on the actual outbound public network traffic. This mode suits business scenarios where bandwidth demand varies widely, for example where bandwidth usage is normally low but access peaks occur intermittently. To keep a sudden burst of traffic from incurring high costs, you can specify a maximum allowed bandwidth

The choice of bandwidth mode can be approached from the access architecture. In our company’s practice, a reasonable access architecture for a production system puts Alibaba Cloud Server Load Balancer (SLB) in front of the business system. In this case, the inbound and outbound public network traffic generated by interaction between the front-end app or web pages and the back-end services is billed on the SLB. The SLB and the back-end ECS generally belong to the same region, so they communicate over the intranet, and that traffic incurs no cost.

SLB front diagram

This approach has major advantages. If the SLB fronts multiple ECS running the same service, it meets the requirements of high availability and horizontal scaling of throughput. In addition, the ECS is not directly exposed to the public network, which avoids security problems caused by vulnerabilities in the ECS operating system. The back-end ECS can then be configured in the “intranet ECS” mode mentioned above, with no public IP address: simply set the fixed bandwidth to 0 Mbps. The use of SLB will be covered in more detail in the next article in this series.

Can all ECS be configured as Intranet ECS without public IP addresses?

In practice, it is often impossible for the following reasons:

Besides using the ECS console provided by Alibaba Cloud, you generally need to log in to the ECS and operate it through a graphical interface or the command line. Even with a Linux operating system, at least one server reachable from the public network is needed as a jump server, from which you log in manually to any ECS in the group over SSH or a similar method
An ECS with a bandwidth of 0 Mbit/s can neither be reached directly from the public network (no direct inbound access) nor reach the public network directly (no direct outbound access). This limitation usually causes no problems, because Alibaba Cloud products, including RDS, OSS, cloud database Redis, message queue, etc., all have intranet addresses and can be accessed over the intranet, or support intranet access only. However, if the back-end service running on the ECS needs to access public network services outside Alibaba Cloud, such as a third-party SMS platform, the Baidu open platform, or the Tencent open platform, the ECS must be able to reach the public network.

To sum up: if the ECS serves only as an O&M jump server, billing by traffic is almost certainly the more cost-effective bandwidth mode; the maximum bandwidth can then be set by roughly estimating the largest download demand (outbound bandwidth from the ECS’s point of view) during O&M operations. If the back-end service needs to access the public network at a steady rate, fixed bandwidth is recommended. If the timing of access and the bandwidth required are random, billing by traffic is recommended. Finally, if you are building a content download business, considering bandwidth cost and user experience, give priority to Alibaba Cloud CDN rather than serving downloads directly from ECS or SLB.

2.6 Extension Problem: How Many Service Instances Should a Single ECS Run?

In a microservice architecture, it is a basic fact that there are many more independent service processes, or “service instances,” than in a monolithic architecture. So how should these service instances be allocated to servers: run one instance on each low-configuration ECS, or run many instances on a high-configuration ECS with more CPU cores and memory? For convenience, we call this the ECS/instance matching strategy; the former is defined as the 1:1 ratio and the latter as the 1:N ratio. Let’s roughly estimate the cost of the two strategies. Since the smallest Alibaba Cloud ECS instance has 1 core and 1 GB of memory, we assume that a single service instance uses 1 GB of memory and that 12 such instances need to run. Under this assumption, the costs compare as follows:

ECS/instance matching strategy	Low-configuration ECS running a single instance (1:1)	High-configuration ECS running multiple instances (1:N)
Memory consumed by a single instance	1 GB	1 GB
Number of instances required by the entire system	12	12
Configuration of a single ECS	1 core, 1 GB	2 cores, 16 GB
Number of ECS required	12	1
Cost of a single ECS [Note 2]	¥459.00/year	¥5079.60/year
Total cost of ECS purchase (yuan)	¥5508	¥5079.60

Comparison of ECS/ Instance Matching Strategies (1)

As the table shows, the 1:1 ratio costs 428.4 yuan more per year than the 1:N ratio. Furthermore, because microservice architecture encourages each service instance to focus on a single business function, memory consumption can be reduced by fully optimizing each instance around asynchronous I/O and asynchronous interfaces. The article Netty Memory Management Expedition: PoolArena Allocation Puzzle cites a real case from our production system: XHarbor (the API gateway) can be configured with a 128 MB heap plus 96 MB of direct memory, running in 224 MB of system memory in total. Our production system currently has more than 30 different types of service. Taking 30 types, with at least two instances of each running to ensure high availability, 60 instances run at the same time. Assuming 250 MB of memory per instance, we recalculate the ECS cost of running the back-end system for one year. The two matching strategies compare as follows:

ECS/instance matching strategy	Low-configuration ECS running a single instance (1:1)	High-configuration ECS running multiple instances (1:N)
Memory consumed by a single instance	250 MB	250 MB
Number of instances required by the entire system	60	60
Configuration of a single ECS	1 core, 1 GB	2 cores, 16 GB
Number of ECS required	60	2 [Note 3]
Cost of a single ECS [Note 2]	¥459.00/year	¥5079.60/year
Total cost of ECS purchase (yuan)	¥27540	¥10159.2

Comparison of ECS/ Instance Matching Strategy (2)
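Both comparison tables can be reproduced with a small calculation. The sketch below uses the yearly prices quoted in the tables; the function names, the `min_ecs` parameter (modelling the two-ECS minimum needed for high availability), and the 2 GB reserved for the operating system are illustrative assumptions, not part of any Alibaba Cloud API.

```python
import math

SMALL_ECS = {"memory_gb": 1, "yearly_price": 459.00}    # 1 core, 1 GB
LARGE_ECS = {"memory_gb": 16, "yearly_price": 5079.60}  # 2 cores, 16 GB


def cost_one_to_one(instances: int) -> float:
    """1:1 strategy: one small ECS per service instance."""
    return round(instances * SMALL_ECS["yearly_price"], 2)


def cost_one_to_n(instances, memory_per_instance_gb, min_ecs=1, reserved_gb=2.0):
    """1:N strategy: pack instances onto large ECS by memory.

    Returns (number of ECS, total yearly cost). reserved_gb is memory set
    aside for the operating system; min_ecs enforces a high-availability floor.
    """
    usable_gb = LARGE_ECS["memory_gb"] - reserved_gb
    instances_per_ecs = math.floor(usable_gb / memory_per_instance_gb)
    ecs_count = max(min_ecs, math.ceil(instances / instances_per_ecs))
    return ecs_count, round(ecs_count * LARGE_ECS["yearly_price"], 2)


if __name__ == "__main__":
    # Scenario 1: 12 instances of 1 GB each.
    _, packed = cost_one_to_n(12, 1.0)
    print("extra yearly cost of 1:1:", round(cost_one_to_one(12) - packed, 2))

    # Scenario 2: 60 instances of 250 MB each, spread over at least 2 ECS.
    ecs_count, packed = cost_one_to_n(60, 0.25, min_ecs=2)
    print(f"{ecs_count} large ECS cost {packed / cost_one_to_one(60):.0%} of the 1:1 total")
```

Packing by memory alone mirrors the simplification made in the tables; CPU and network contention are deliberately left out of this estimate.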

In this case, the 1:N ratio costs only 37% as much as the 1:1 ratio. Clearly, this is because a great deal of system memory is wasted under the 1:1 ratio. Careful readers will have noticed that both calculations ignore the difference in CPU resources. Under the 1:N ratio in the second calculation, each service instance gets only 1/15 of a CPU core on average (2 cores shared by 30 instances). In most cases, however, back-end services are I/O-intensive, consume little CPU, and the CPU does not become the system bottleneck. Moreover, Alibaba Cloud and its partners offer quite a number of standard compute-intensive services, including image cropping, image recognition, speech recognition, video conversion, and big data analysis, at a better return on investment than building your own. If your business uses such functions, purchasing the standard services is recommended. Compared with the explicit purchase cost, the implicit operation and maintenance cost of ECS is more easily overlooked by decision makers. More ECS means more O&M operations and more system-level monitoring, and therefore more labor cost. The more automated the O&M system, the lower the O&M labor cost, but the greater the investment in foundational R&D. Even with Alibaba Cloud’s powerful product console and OpenAPI as a basis, our company’s practice shows that this is a “formidable battlefield” that calls for skilled troops and strong generals.

Of course, the 1:N ratio strategy also has shortcomings that need to be handled carefully:

Because computing resources are not isolated, an abnormal load in a single service instance can slow down the whole ECS. This calls for a more real-time and sensitive business monitoring system that detects such problems promptly, buying time for a manual fix in the short term and providing clues for automated handling in the long term;
Because network resources are not isolated, the ports of the service instances on one ECS must not collide; ideally they are assigned automatically by the operating system (in socket programming, by binding to port 0). As a result, even the same service instance gets a different service port each time it runs, which leads to one of the core problems a microservice architecture must solve: service discovery.
Also because system and network resources are not isolated, excessive inbound or outbound network traffic from a single service instance affects the other instances on the same ECS, as do problems such as too many TCP connections or too many open file handles. Monitoring points therefore need to be built into the unified microservice development framework in advance and combined with the business metrics monitoring system, so that the abnormal business module can be investigated promptly and located accurately.
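The port assignment mentioned in the second point can be illustrated with the standard Python `socket` module. This is a minimal sketch, not our production code: the helper name `bind_to_free_port` is hypothetical, and a real service would go on to publish the chosen port to whatever service registry it uses.

```python
import socket
from typing import Tuple


def bind_to_free_port(host: str = "127.0.0.1") -> Tuple[socket.socket, int]:
    """Bind a listening TCP socket to a port chosen by the operating system."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))          # port 0 asks the OS to pick any free port
    sock.listen()
    port = sock.getsockname()[1]  # read back the port that was actually assigned
    return sock, port


if __name__ == "__main__":
    # Two "service instances" on the same machine never collide on a port.
    first, first_port = bind_to_free_port()
    second, second_port = bind_to_free_port()
    print("instance A listens on", first_port)
    print("instance B listens on", second_port)
    first.close()
    second.close()
```

Because the port is only known after `bind` returns, callers cannot be configured with it in advance, which is exactly why service discovery becomes necessary.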

To sum up, to borrow a famous phrase from The Mythical Man-Month: “No Silver Bullet.” In practice, we often mix the 1:1 and 1:N ratio strategies according to the characteristics of the different service instances. Service instances with light loads should share high-configuration ECS as much as possible, provided high availability is preserved (multiple instances of the same service are deployed on different ECS). Service instances with heavy or highly variable loads can each be deployed alone on an ECS sized to the load, to guarantee capacity and isolate the drastic load changes. As an aside, letting engineers adjust the deployment to the “personality” of the system, so as to achieve the greatest output with the least input, is surely the kind of technical accomplishment an R&D team should pursue.

Note 3: Two instances of the same type of service run on different ECS as hot standbys for each other to ensure high availability. Each high-configuration ECS therefore runs 30 instances, which need 0.25 GB x 30 = 7.5 GB of memory in total. After reserving 2 GB for the operating system, 16 - 7.5 - 2 = 6.5 GB of memory actually remains for running more service instances
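The budget in Note 3, together with the per-instance CPU share discussed earlier, can be checked with a short calculation. This is a sketch under the assumptions stated in the note (60 instances of 0.25 GB each, two ECS with 2 cores and 16 GB each, 2 GB reserved for the operating system); the function and field names are made up for illustration.

```python
def per_ecs_budget(total_instances=60, ecs_count=2, memory_per_instance_gb=0.25,
                   ecs_memory_gb=16.0, ecs_cores=2, os_reserved_gb=2.0):
    """Return the memory and CPU budget of one ECS under the 1:N strategy."""
    instances_per_ecs = total_instances // ecs_count
    used_gb = instances_per_ecs * memory_per_instance_gb
    return {
        "instances_per_ecs": instances_per_ecs,
        "memory_used_gb": used_gb,
        "memory_headroom_gb": ecs_memory_gb - os_reserved_gb - used_gb,
        "cores_per_instance": ecs_cores / instances_per_ecs,
    }


if __name__ == "__main__":
    for name, value in per_ecs_budget().items():
        print(f"{name}: {value:.3f}")
```

Changing `memory_per_instance_gb` or `ecs_count` shows how quickly the headroom disappears when instances are heavier than the 250 MB assumed here.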

2.7 Last issue: configuration changes after ECS purchase

To those patient enough to have read this far, here is some comfort: after an ECS is purchased, limited configuration changes can still be made. According to the relevant Alibaba Cloud documentation:

Pay-as-you-go ECS does not support configuration changes. This is understandable: if the configuration is unsuitable, simply release the instance and purchase a new one;

Annual and monthly subscription ECS supports limited configuration changes, as follows:

With the exception of exclusive instances, instance specifications can be upgraded at any time, including CPU, memory, and base bandwidth. After upgrading the instance specifications, restart the instance on the console.

The configuration cannot be downgraded at any time, but it can be downgraded on renewal. The new configuration takes effect in the new renewal period; the configuration does not change during the remainder of the current service period.

After a renew-and-downgrade operation has been submitted, upgrades and downgrades are no longer supported for the remainder of the current service period.

When upgrading or downgrading, you cannot switch to a different instance specification family; you can only choose among the specifications within the same family;

The public and internal IP addresses do not change before and after a configuration change.

Okay, one last little problem

Operating system image: of course, we strongly recommend a Unix-like system: CentOS, CoreOS, FreeBSD… Anything, as long as you’re familiar with it.

In this article, set against the architecture (mainly microservice architecture), we focused on the details of selecting and purchasing Alibaba Cloud ECS, and along the way dug a lot of holes still to be filled, involving SLB, OSS, CDN, and other related products. The next articles will fill them in. (TO BE CONTINUED… I WILL BE BACK)

No more explanation 🙂

References:
Alibaba Cloud product documentation: Security group FAQ
Alibaba Cloud product documentation: VPC private network common application scenarios
Alibaba Cloud product documentation: Load balancing
Alibaba Cloud product documentation: Object storage OSS
Alibaba Cloud product documentation: Content distribution network (CDN)