Author | Zhang Yeming

CTO of Doctor Almond. A middle-aged programmer who focuses on a wide range of technologies and on team management.

This article is based on my talk at the GDG event on October 22, “Why Should Startups Embrace Containers?”

1. Technical challenges for startups

Tolstoy wrote, “Happy families are all alike; every unhappy family is unhappy in its own way.” The same goes for Internet startups: most of them face a common set of technical challenges.

  1. How to build the system quickly and cheaply while ensuring security and stability?

  2. How to quickly build and publish applications to meet business requirements?

  3. How to improve team development efficiency and ensure development quality?

This list is by no means exhaustive, but these three problems should be common to the tech teams of most startups. Doctor Almond cannot claim to have solved them all, but we have made some progress. Let’s take a quick look at how Doctor Almond has responded to these challenges and what containers can bring.

This series is divided into three parts. The first article covers the history of Doctor Almond’s technical architecture before containerization. The second introduces containers and Doctor Almond’s containerization approach. The third concludes with why we think startups should use containers and how containers help us deal with these three challenges.

2. The early days of Doctor Almond

Before 2012, most Internet companies, startups included, bought their own servers and rented rack space in IDC server rooms. Applications ran directly on physical machines, and scaling up meant buying new servers. IDCs suffered all kinds of failures, and an IDC migration was even worse: you had to move the machines in the middle of the night and bring everything back online before dawn. In short, this drained a startup’s money, hurt service stability, and wasted a great deal of working time.

Doctor Almond was lucky, though: by the time we started, the public cloud had come of age, so we built on it from day one. Our earliest architecture looked like this:

The architecture was very simple. Load balancing and the database were provided by Tencent Cloud, which also gave us basic monitoring, alerting, and security services. On top of that sat two applications, a mobile backend API and a business platform, both Scala applications built on the Play framework.

Many people may wonder why we chose Scala/Play, since Scala is not widely used in China. Partly it is because Doctor Almond inherited the architecture of an earlier prescription product that was built on Scala/Play, so the team was already familiar with the stack. We needed to build Doctor Almond quickly, and naturally we picked the language and framework we knew best. Scala/Play is also genuinely productive for small to medium-sized applications, Scala itself is expressive and interesting, and engineers who are eager to learn tend to be drawn to new languages.
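To give a feel for the stack, here is a minimal sketch of what a JSON endpoint looks like in a recent version of Play (a hypothetical controller for illustration, not our actual code):

```scala
import javax.inject.Inject
import play.api.libs.json.Json
import play.api.mvc.{AbstractController, ControllerComponents}

// Hypothetical controller: a single JSON endpoint for looking up a doctor profile.
class DoctorController @Inject()(cc: ControllerComponents) extends AbstractController(cc) {

  // Mapped in conf/routes as: GET /doctors/:id  controllers.DoctorController.show(id: Long)
  def show(id: Long) = Action {
    Ok(Json.obj("id" -> id, "name" -> "Dr. Example", "department" -> "General"))
  }
}
```

With Play’s automatic reload in development mode and a routes file that doubles as API documentation, a small team can iterate quickly on this kind of code.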

3. Application splitting and CI/CD

After more than a year of rapid evolution, the application had grown more and more complex, so we split it up. As the business expanded, the number of applications kept growing (HIS, CRM, and so on), and our architecture came to look like this:

This is where Scala’s biggest problem, compilation speed, really bit us. Our deployment process back then was very primitive: you logged in to each server and ran a shell script that pulled the code, compiled it, packaged it, and started it, and the whole thing took 5 to 10 minutes. Our API application grew to 5 or 6 nodes, and even releasing two at a time, a full rollout took about 20 minutes. If something went wrong after a release it made you want to spit blood, because rolling back went through the same process and took another 20 minutes.

One time we ran an Apple Watch giveaway that started at midnight. A very low-level bug meant that from the moment the campaign began we were giving away several Apple Watches per minute. Our startup didn’t have much money, and every one of those watches was hard-earned cash; it hurt. The fix was easy, but both releasing and rolling back required a compile, which was far too slow, so we simply shut down the servers and deployed the new code a few minutes later.

That was when we realized we had to have an automated release system. A few years earlier, releases were handled by operations: developers handed a build to O&M, and O&M deployed it by hand. Naturally such releases could not be frequent, and they were a heavy burden on both development and operations. But as Agile and DevOps culture became mainstream, continuous integration and continuous delivery (CI/CD) became basic infrastructure.

Our first version of CI/CD was simple: it was based on Jenkins, with scripts doing the compiling and packaging and then copying the artifact to the servers for release. Since you only had to package once, the slow-deployment problem was alleviated. But several problems remained.

  1. First, there was no application repository. Packaging was a one-off process that backed up the current application directory at deployment time for rollback, so you could only roll back to the previous version.

  2. Second, the health check was rudimentary and only verified that the application had started. We had cases where the application started and the smoke test passed, but the service still had serious problems and was essentially unavailable.

  3. Finally, gray (canary) releases were not supported; when something went wrong, the only option was to roll everything back.

We had a major outage during that period, and because of these shortcomings it took a long time to recover. With that in mind, we built the Frigate release system, whose architecture is shown below.

  1. Frigate has an App Repository, which stores every released version of an application, so you can specify which version to roll back to.

  2. The Watcher implements much more powerful application checks. Besides ordinary HTTP probes, it can also pull data from logs and monitoring and make further health judgments based on exception counts, error rates, and so on.

  3. Frigate supports grouped, phased releases. For example, release to 2 machines first, run health checks (possibly with a manual check in between), and only then release the remaining machines; see the sketch after this list.
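The rollout logic of such a grouped, phased release looks roughly like the following sketch. This is not Frigate’s actual code (Frigate is built on Jenkins Pipeline); the host names, batch size, and /health endpoint are invented for illustration.

```scala
import scala.io.Source
import scala.util.Try

// Simplified sketch of a grouped, phased release with health checks between batches.
object PhasedRelease {

  // Placeholder for the real deploy step (copy the artifact, restart the app, etc.).
  def deploy(host: String, version: String): Unit =
    println(s"deploying $version to $host")

  // A basic HTTP probe; Frigate's Watcher additionally inspects logs and metrics.
  def healthy(host: String): Boolean =
    Try(Source.fromURL(s"http://$host/health").mkString).isSuccess

  def main(args: Array[String]): Unit = {
    val hosts   = Seq("api-1", "api-2", "api-3", "api-4", "api-5", "api-6")
    val version = "1.4.2"

    // Release in small batches; abort and leave the remaining hosts untouched if a batch is unhealthy.
    hosts.grouped(2).foreach { batch =>
      batch.foreach(deploy(_, version))
      Thread.sleep(5000) // give the new instances time to warm up
      if (!batch.forall(healthy)) {
        println(s"health check failed for batch $batch, aborting release")
        sys.exit(1)
      }
    }
    println("release complete")
  }
}
```

The key property is that a bad build only ever reaches the first batch of machines, which is exactly what the old all-at-once script could not guarantee.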

In retrospect, Frigate did not use containers, but it already implemented many of the functions of container orchestration. A screenshot of Frigate is shown below; it is built on Jenkins Pipeline.

4. Microservices

With more and more systems, complex dependencies, no data isolation, and duplicated logic, the next logical step was microservices. Our public account has two articles, Lego Microservice Transformation (Part 1) and Lego Microservice Transformation (Part 2), with a more detailed analysis, so here is only a brief introduction.

Our service registration and discovery is based on Consul, and load balancing is implemented with Nginx. The following figure shows the whole service registration and discovery process:
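To make the registration step concrete, here is a minimal sketch (not our production code; the service name, port, and health-check path are invented) of an instance registering itself with its local Consul agent over Consul’s HTTP API:

```scala
import java.io.OutputStreamWriter
import java.net.{HttpURLConnection, URL}

// Sketch: register this instance with the local Consul agent via PUT /v1/agent/service/register.
// Consul then polls the declared health check, and a tool such as consul-template can render
// the set of healthy instances into an Nginx upstream block for load balancing.
object ConsulRegister {
  def main(args: Array[String]): Unit = {
    val payload =
      """{
        |  "Name": "order-service",
        |  "ID": "order-service-1",
        |  "Port": 9000,
        |  "Check": {
        |    "HTTP": "http://127.0.0.1:9000/health",
        |    "Interval": "10s"
        |  }
        |}""".stripMargin

    val conn = new URL("http://127.0.0.1:8500/v1/agent/service/register")
      .openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("PUT")
    conn.setDoOutput(true)
    conn.setRequestProperty("Content-Type", "application/json")

    val writer = new OutputStreamWriter(conn.getOutputStream)
    writer.write(payload)
    writer.close()

    println(s"Consul responded with HTTP ${conn.getResponseCode}")
  }
}
```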

Several points are worth mentioning.

First of all, our microservices are based on HTTP and JSON; we do not use binary protocols such as Protobuf or Thrift. In practice, the performance gap between HTTP and binary protocols is not as big as many people think, usually only a factor of 2 or 3 (we have not benchmarked this ourselves). For most companies that difference is not the bottleneck at all, especially now that HTTP/2 exists. If you really need to, you can still run a binary protocol over HTTP/2 by adding a framework layer on both the server and the client.

Second, our microservices are non-invasive to applications. We did not use the common Dubbo or Spring Cloud frameworks. On the one hand, our service callers include both Java and Scala applications, and getting both onto such a framework would take extra work. On the other hand, we believe the future of microservice frameworks is a non-invasive, standalone microservice infrastructure layer. This is consistent with the idea of container orchestration, and the recently proposed Service Mesh concept is a further step in what we see as the future direction of microservices.

Finally, we generate an SDK for each microservice, which makes it easy for callers to consume the service. The SDK integrates circuit breaking, asynchronous calls, distributed tracing (under development), and more.
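As a rough sketch of the kind of wrapper such an SDK provides (simplified and with hypothetical names; our real SDKs are generated and far more complete), a client call protected by a circuit breaker might look like this:

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{ExecutionContext, Future}

// A toy circuit breaker illustrating what the generated SDKs wrap around each call.
// Thresholds, reset behaviour and the OrderClient below are invented for the example;
// a production breaker would also implement half-open probing and time-based resets.
class CircuitBreaker(maxFailures: Int)(implicit ec: ExecutionContext) {
  private val failures = new AtomicInteger(0)

  def protect[T](call: => Future[T]): Future[T] =
    if (failures.get() >= maxFailures)
      Future.failed(new RuntimeException("circuit open: failing fast"))
    else
      call.recoverWith { case e =>
        failures.incrementAndGet()
        Future.failed(e)
      }

  def reset(): Unit = failures.set(0)
}

// Hypothetical generated client: asynchronous, JSON over HTTP, every call goes through the breaker.
class OrderClient(breaker: CircuitBreaker)(implicit ec: ExecutionContext) {
  private def httpGet(path: String): Future[String] =
    Future(scala.io.Source.fromURL(s"http://order-service$path").mkString)

  def getOrder(id: Long): Future[String] =
    breaker.protect(httpGet(s"/orders/$id"))
}
```

The point of generating this per service is that every caller gets fault tolerance and asynchronous calls for free, without writing any boilerplate of its own.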

With the basic microservice framework in place, we built a number of microservices, covering business capabilities such as orders and appointments as well as infrastructure such as push notifications and SMS. Admittedly, some of them are not all that “micro”.

But we found that the whole system still had many problems:

  1. Building on cloud services keeps costs down and efficiency up, but operations are still resource-oriented and resource utilization is not high.

  2. We have continuous integration and deployment, but adding new nodes and new services still requires a lot of manual operations work, and scaling out is not convenient.

  3. We practice microservices and the application architecture has improved, but dependency management and monitoring are still imperfect and stability is still not good enough.

In the next article, we’ll look at how Doctor Almond uses containers and container orchestration to address these problems.



You may also be interested in the following articles:

  • Lego Microservice Transformation (Part 1)

  • Lego Microservice Transformation (Part 2)

We are looking for a Java engineer. Please send your resume to [email protected].

Almond Technology Station
