Welcome to follow our WeChat official account: Shishan100

My new course, the **“C2C E-commerce System Microservice Architecture 120-Day Practical Training Camp”**, is now online in the official account **ruxihu Technology Nest**. Interested readers can click the link below for details:

120-Day Training Camp of C2C E-commerce System Micro-Service Architecture

The “100-million-level traffic” architecture column:

  • How to support the storage and computation of billions of rows of data
  • How to design a highly fault-tolerant distributed computing system
  • How to design a high-performance architecture for ten-billion-level traffic
  • How to design a high-concurrency architecture for hundreds of thousands of queries per second
  • How to design a full-link 99.99% high-availability architecture

1. Previously on

The previous article (“How to Design a High-Performance Architecture for Ten-Billion-Level Traffic”) discussed how to support high-concurrency writes under ten-billion-level traffic, and how to guarantee ultra-high-performance computing in the context of those writes.

In this article, let’s continue the story: how do we evolve and design this architecture for ten-billion-level volumes of data under high-concurrency queries at the level of 100,000 per second?

First, let’s look at the architecture the system has evolved into so far, shown in the figure below:

Let’s start by reviewing how far the right side of the overall architecture has evolved. It is actually in good shape: the scenario of ten-billion-level traffic with 100,000-level concurrent writes per second is handled by MQ peak shaving and a distributed KV cluster.

On top of that, a compute/storage-separated architecture is used: each slave compute node extracts the data it needs into memory and completes its calculations with the self-developed in-memory SQL computing engine. The architecture also separates dynamic and static data: static data is fully cached, while dynamic data is extracted automatically, keeping the cost of network requests as low as possible.
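As a rough illustration of the dynamic/static separation just described, here is a minimal sketch of what a slave compute node’s read path might look like. Every name in it (SlaveComputeNode, fetchDynamicRows, and so on) is hypothetical; the actual self-developed in-memory engine is proprietary and far more involved:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of dynamic/static data separation on a slave compute node.
public class SlaveComputeNode {

    // Static data (e.g., dimension data that rarely changes) is fully cached in memory.
    private final Map<String, Object> staticDataCache = new ConcurrentHashMap<>();

    public Object executeTask(String shardKey) {
        // Static data: served from the local cache, so no network round trip.
        Object staticData = staticDataCache.computeIfAbsent(shardKey, this::loadStaticData);

        // Dynamic data: only the changed rows are pulled over the network for this task.
        List<Object> dynamicRows = fetchDynamicRows(shardKey);

        // Hand both off to the in-memory computing engine (not shown).
        return computeInMemory(staticData, dynamicRows);
    }

    private Object loadStaticData(String shardKey) { return new Object(); } // e.g., from the KV cluster
    private List<Object> fetchDynamicRows(String shardKey) { return List.of(); }
    private Object computeInMemory(Object staticData, List<Object> rows) { return new Object(); }
}
```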

In addition, the self-developed distributed system architecture, covering data sharding and distributed execution of compute tasks, elastic resource scheduling, a distributed fault-tolerance mechanism, and automatic master/standby failover, ensures that the whole system can scale out on demand with high performance and high availability.

Next, let’s look at the left side of the architecture.

2. The swelling of offline computing results

In fact, you’ll notice that there is also a MySQL on the left side; that MySQL is where the results of both the real-time and the offline calculations are stored together.

Terminal business users can freely query the data analysis results in that MySQL to support their own decisions, reading the analysis report for the current day or for any period in history.

That MySQL may have been fine in the early days, because the amount of data it stored was relatively small; after all, it only held computed results. But by the mid-to-late stage, MySQL was on the ropes.

To give an example: in the offline computing pipeline, if the daily increment is 10 million rows of raw data, the computed results come to only about 500,000 rows per day, and writing 500,000 new rows a day into MySQL is perfectly acceptable.

But if the daily increment is 1 billion rows, then at the same ratio the computed results come to roughly 50 million new rows per day. 50 million rows per day into that MySQL on the left: what do you think happens?
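A quick back-of-the-envelope calculation makes this concrete. The ~5% result ratio follows from the 10-million-to-500,000 example above; the yearly figure is our own extrapolation, for illustration only:

```java
// Illustrative arithmetic only, assuming the ~5% result ratio from the example above.
public class GrowthEstimate {
    public static void main(String[] args) {
        long dailyRaw = 1_000_000_000L;               // 1 billion raw increments per day
        long dailyResults = (long) (dailyRaw * 0.05); // ~50 million result rows per day
        long yearlyResults = dailyResults * 365L;     // ~18.25 billion rows per year
        System.out.printf("daily=%,d  yearly=%,d%n", dailyResults, yearlyResults);
    }
}
```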

We can tell you what the system looked like at the time: the disk of a single MySQL server quickly approached full, and single tables held hundreds of millions, even billions, of rows.

With that much data in a single table, do you think users had a good experience querying their data analysis reports? Basically, every query took several seconds. Very slow.

Worse, some users’ queries took over ten seconds, even tens of seconds, approaching minutes. That was broken: the user experience was poor, nowhere near the level of a paid product.

So after the storage and computation problems on the right side were solved, the query problem on the left side became imminent. A new round of refactoring was imperative!

3. Database and table sharding + read/write separation

The first step is the old standby: database and table sharding plus read/write separation. For an architecture built on MySQL this is basically the only way; after all, it is not especially difficult to implement, it is fast to roll out, and the effect is quite remarkable.

The overall idea is basically the same as described in the first article of this series (“How a Large-Scale System Architecture Evolves to Support the Storage and Computation of Billions of Rows of Data”).

To put it bluntly: after splitting the databases, each master database bears only part of the write pressure, so write concurrency on any single database drops. And since each master holds less data, its disk no longer fills up so quickly.

After splitting the tables, the row count of a single table can be kept in the millions, which is the best practice for supporting massive data while maintaining high performance; roughly 2-3 million rows per table is a reasonable target.
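As an illustration, here is a minimal hash-based routing sketch. The counts (8 databases, 32 tables each) and the table names are hypothetical, not the actual production layout:

```java
// Minimal sketch: route a merchant's rows to a fixed (database, table) pair by hashing the key.
public class ShardRouter {
    private static final int DB_COUNT = 8;       // hypothetical: 8 physical databases
    private static final int TABLES_PER_DB = 32; // hypothetical: 32 tables per database

    public static String route(long merchantId) {
        int slot = (int) (merchantId % (DB_COUNT * TABLES_PER_DB)); // 256 logical tables
        int dbIndex = slot / TABLES_PER_DB;
        int tableIndex = slot % TABLES_PER_DB;
        return String.format("db_%02d.report_result_%03d", dbIndex, tableIndex);
    }

    public static void main(String[] args) {
        System.out.println(route(123456789L)); // prints db_00.report_result_021
    }
}
```

Because the routing key is stable, every query for one merchant always lands on the same small table.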

Read/write separation then splits each database’s read and write load across machines: the master carries the write load, while multiple slaves carry the read load. This avoids a single database machine being overloaded by mixed reads and writes, where high CPU, I/O, and network load would eventually bring the machine down.
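Read/write splitting can be wired up in many ways; as one hedged example, here is a minimal sketch built on Spring’s AbstractRoutingDataSource (assuming spring-jdbc is on the classpath). The "master"/"slave" keys are placeholders for real DataSources registered via setTargetDataSources(...) at configuration time:

```java
import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;

// Minimal read/write-splitting sketch; routing key is held per thread.
public class ReadWriteRoutingDataSource extends AbstractRoutingDataSource {

    private static final ThreadLocal<String> ROUTE = ThreadLocal.withInitial(() -> "master");

    public static void markRead()  { ROUTE.set("slave");  } // route SELECTs to a read replica
    public static void markWrite() { ROUTE.set("master"); } // route writes to the master

    @Override
    protected Object determineCurrentLookupKey() {
        return ROUTE.get();
    }
}
```

A service method would call markRead() before running report queries and markWrite() around updates; with several replicas, the "slave" key could instead pick one replica round-robin.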

After first refactoring the database layer in this way, the results were much better: single-table data volumes dropped, user query performance improved greatly, and results came back in under 1 second.

4. The high-concurrency challenge of 100,000 queries per second

This preliminary sharding + read/write separation architecture held up for a while, but it slowly exposed its weaknesses. For business users, the data analysis page runs a JS script that sends a request to the backend every few seconds to load the latest analysis results.

This creates a problem: the pressure of queries against MySQL keeps growing, and it is almost predictably heading toward the level of 100,000 queries per second.

Our analysis showed that, in fact, 99% of the queries were issued automatically by the page’s JS script to refresh the current day’s data. Only 1% of the queries targeted historical data from earlier days, issued when users manually specified a query range.

But under this architecture, the real-time calculation results (the hot data) and the historical offline calculation results (the cold data) are stored together. So imagine: hot and cold data live side by side, and the high-concurrency queries against hot data account for 99% of the load. Is this architecture reasonable?

Of course it doesn’t make sense. We need to rebuild the system architecture again.

5. Hot/cold data separation architecture

An obvious architectural refactoring for the problem above is to separate hot and cold data: hot data computed in real time today is placed in one MySQL cluster, and cold data computed offline is placed in another MySQL cluster.

Then a data query platform is developed on top, encapsulating the multiple underlying MySQL clusters and dynamically routing each query to the hot or the cold data store according to its query conditions.
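The routing rule itself can be very simple. Here is a minimal sketch of the idea (the class and cluster names are hypothetical): any query whose date range reaches before today goes to the cold cluster, everything else stays hot:

```java
import java.time.LocalDate;

// Sketch of the query platform's hot/cold routing rule.
public class HotColdQueryRouter {

    enum Cluster { HOT_MYSQL, COLD_MYSQL }

    // Today's real-time results are hot; anything touching earlier days is cold.
    static Cluster route(LocalDate from) {
        return from.isBefore(LocalDate.now()) ? Cluster.COLD_MYSQL : Cluster.HOT_MYSQL;
    }

    public static void main(String[] args) {
        System.out.println(route(LocalDate.now()));               // HOT_MYSQL
        System.out.println(route(LocalDate.now().minusDays(30))); // COLD_MYSQL
    }
}
```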

Through this refactoring, the single-table data volumes in the hot data store can be reduced substantially; some tables may hold only a few hundred thousand rows, because the large offline result sets have been moved out into the other cluster. As you can imagine, the effect is much better.

Because the single-table volumes of hot data shrank so much, the most obvious effect at the time was on the 99% of user queries that hit the hot store: latency dropped from about 1 second to under 200 milliseconds. The user experience improved, and everyone was happier.

6. Elasticsearch + HBase + pure in-memory query engine

The architecture evolution up to here looks good, but many problems remained. At this stage the system hit another serious one: cold data storage. If cold data is hosted entirely by MySQL, it is simply not sustainable: the volume of cold data grows every day, and it grows fast, adding tens of millions of rows daily.

So the MySQL servers face a never-ending need for expansion. And if, just to support the 1% of queries that hit cold data, you keep adding highly configured MySQL servers, do you think that is reasonable?

Certainly not appropriate!

And constantly operating on MySQL at that scale, for example adding an index to a table with hundreds of millions of rows? It’s all a hassle.

Moreover, queries against cold data generally scan large amounts of data: for example, a user may choose to analyze the data of the past several months, or even a year. In such cases, relying purely on MySQL is still quite disastrous.

At the time it was obvious that, with massive data volumes, analytical queries spanning several months or a year performed extremely poorly; a single query could easily take seconds or even tens of seconds to return.

Therefore, for the storage and querying of cold data, we finally chose to build a self-developed SQL computing engine on top of NoSQL storage, that is, a NoSQL + in-memory architecture.

Specifically, all cold data is stored in ES + HBase: ES holds the conditional indexes used to filter cold data, such as the date and various dimension fields, while HBase stores the complete rows with all data fields.

Because the native SQL support of ES and HBase is poor, we developed another SQL engine purpose-built for this scenario: there are essentially no multi-table joins, only queries and analysis over a single data set, backed by NoSQL storage plus in-memory computation.

There is one prerequisite: for cold-data queries to work as single-table queries, the joins must happen when the data enters the cold store. Based on the characteristics of ES and HBase, all multi-table associations are performed at warehousing time, turning the stored data into one big wide table; all association work is pushed to the ingest stage, never done at query time.
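As a sketch of that ingest path (assuming the standard HBase client and the Elasticsearch high-level REST client; the "cold_data" index, the "d" column family, and the method names are illustrative, not the actual system):

```java
import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;

// Sketch of ingest-time wide-table loading: joins happen here, never at query time.
public class ColdDataIngest {

    private static final byte[] CF = Bytes.toBytes("d"); // hypothetical column family

    static void ingest(Table hbaseTable, RestHighLevelClient es,
                       String rowKey, Map<String, String> wideRow,
                       Map<String, Object> filterFields) throws IOException {
        // 1. The full wide row (all fields, already joined upstream) goes to HBase.
        Put put = new Put(Bytes.toBytes(rowKey));
        wideRow.forEach((col, val) ->
                put.addColumn(CF, Bytes.toBytes(col), Bytes.toBytes(val)));
        hbaseTable.put(put);

        // 2. Only the filterable dimensions (date, merchant, etc.) go to ES,
        //    with the HBase row key as the document id.
        es.index(new IndexRequest("cold_data").id(rowKey).source(filterFields),
                 RequestOptions.DEFAULT);
    }
}
```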

For a cold-data query, our self-developed SQL engine first uses ES’s distributed, high-performance indexing to resolve the various WHERE conditions. Retrieving the needed subset of a massive data set with high performance is exactly the step ES is best suited for.

Then the complete data fields corresponding to the retrieved rows are fetched from HBase and spliced into full rows.

This data set is then held in memory for complex function evaluation, grouping, aggregation, and sorting.
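Putting the three steps together, a hedged sketch of the query path might look like this (again assuming the standard HBase and Elasticsearch Java clients; index and field names are illustrative, and a real engine would page or scroll rather than issue a single 10,000-hit search):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;

// Sketch of the cold-data query path: ES filters, HBase fetches, memory computes.
public class ColdDataQuery {

    static List<Result> query(RestHighLevelClient es, Table hbaseTable,
                              String merchantId, String from, String to) throws IOException {
        // Step 1: ES resolves the WHERE conditions and returns the matching row keys.
        SearchRequest search = new SearchRequest("cold_data").source(
                new SearchSourceBuilder()
                        .query(QueryBuilders.boolQuery()
                                .filter(QueryBuilders.termQuery("merchant_id", merchantId))
                                .filter(QueryBuilders.rangeQuery("report_date").gte(from).lte(to)))
                        .size(10_000)); // a real engine would scroll/paginate instead
        SearchResponse resp = es.search(search, RequestOptions.DEFAULT);

        // Step 2: batch-fetch the full wide rows from HBase by row key.
        List<Get> gets = new ArrayList<>();
        for (SearchHit hit : resp.getHits().getHits()) {
            gets.add(new Get(Bytes.toBytes(hit.getId())));
        }
        List<Result> rows = new ArrayList<>(Arrays.asList(hbaseTable.get(gets)));

        // Step 3: grouping, aggregation, and sorting then run in memory over `rows`
        // (done by the self-developed SQL engine; not reproduced here).
        return rows;
    }
}
```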

All of the above is implemented on top of Elasticsearch, HBase, and pure in-memory computation.

7. Introducing a cache cluster for real-time data storage

OK: at this point, the problems of storing massive amounts of cold data and querying it with high performance are solved. Now back to querying the current day’s real-time data. The daily volume of real-time calculation results is actually not that large, and the write concurrency is not especially high either: on the order of tens of thousands of writes per second.

Therefore, in this context, using sharded MySQL to support the writing, storage, and querying of real-time data is no problem.

However, there is one small issue: each merchant’s real-time data does not actually change that often, and may not change at all for stretches of time. So is it really necessary for 100,000-level concurrent queries per second to all fall through to the database layer? If they did, you would have to hang many slave databases off each master just to support the high-concurrency reads.

So we introduced a cache cluster here: whenever real-time data is updated, the result is written both to the database cluster and to the cache cluster, in a dual-write pattern.

When reading, more than 90% of the high-concurrency queries are answered by the cache cluster, and only about 10% of the queries fall through to the database cluster.
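As a final sketch, here is what the dual-write and cache-first read path could look like with a Redis client such as Jedis; the key names, the 24-hour TTL, and the MySQL helper methods are all hypothetical:

```java
import redis.clients.jedis.Jedis;

// Sketch of the dual-write and cache-first read path for real-time results.
public class RealtimeResultStore {

    private final Jedis jedis; // client for the cache cluster (single-node Jedis here)

    public RealtimeResultStore(Jedis jedis) { this.jedis = jedis; }

    // On every recomputation: write the database cluster, then refresh the cache.
    public void save(String merchantId, String reportJson) {
        writeToMysql(merchantId, reportJson);                       // sharded MySQL (not shown)
        jedis.setex("rt_report:" + merchantId, 86_400, reportJson); // cache cluster, 24h TTL
    }

    // 90%+ of reads are answered straight from the cache.
    public String load(String merchantId) {
        String cached = jedis.get("rt_report:" + merchantId);
        return cached != null ? cached : readFromMysql(merchantId); // ~10% fall through
    }

    private void writeToMysql(String id, String json) { /* INSERT ... ON DUPLICATE KEY UPDATE */ }
    private String readFromMysql(String id) { return null; } // placeholder
}
```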

8. Interim summary

OK: so far, the left side of the architecture has also been refactored:

  • Hot data is served by a cache cluster + database cluster, carrying highly concurrent queries at the level of 100,000 per second.
  • Cold data is served by a self-developed query engine based on ES + HBase + in-memory computation, supporting massive data storage with high-performance querying.

In practice, the overall effect was very good: user queries against hot data usually return within tens of milliseconds, while queries against cold data usually return within 200 milliseconds.

9. Prospects for the next stage

In fact, it was not easy to evolve the architecture to this point: what looks like a single diagram involves countless details and the landing of many technical solutions, and it takes a team at least a year to reach this level.

Next, however, we have to tackle high availability: for a paid product we must offer extremely high availability guarantees, 99.99% availability, even 99.999%.

However, the more complex a system is, the more prone it is to problems, and the more complex the corresponding high-availability architecture becomes. So in the next article, we’ll discuss: “How to Design a Full-Link 99.99% High-Availability Architecture for a 100-Million-Level Traffic System.”

END

A large wave of original articles on microservices, distributed systems, high concurrency, and high availability is on its way. Please scan the QR code below to stay tuned:

Architecture Notes of Shishan (ID: Shishan100)

More than ten years of architecture experience at BAT

**Recommended reading:**

1. Please! Don’t Ask Me About Spring Cloud Internals in Interviews
2. [Behind the Double 11 Carnival] How Does a Microservice Registry Carry Access from Systems with Tens of Millions of Users?
3. [The Road to Performance Optimization] Tuning Spring Cloud Parameters for Tens of Thousands of Concurrent Requests per Second
4. How Does a Microservice Architecture Guarantee 99.99% High Availability During the Double 11 Carnival?
5. Brothers, Let Me Explain the Hadoop Architecture in Plain Language
6. How Can the Hadoop NameNode Support Thousands of Concurrent Accesses per Second in a Large Cluster?
7. [Secrets of Performance Optimization] How Hadoop Optimizes the Upload of Terabyte-Scale Files by a Hundred Times
8. Please, Don’t Ask Me About the Implementation Principles of TCC Distributed Transactions
9. [Eventual Consistency Is a Pit!] How Do Distributed Transactions Guarantee 99.99% High Availability in Production?
10. Please, Don’t Ask Me About the Principles of Redis Distributed Locks in Interviews!
11. [Hadoop’s Moment to Shine!] How Does an Elegant Underlying Algorithm Improve Large-Cluster Performance More Than Tenfold?
12. How to Support the Storage and Computation of Billions of Rows of Data, Based on a 100-Million-Level Traffic System Architecture
13. How to Design a Highly Fault-Tolerant Distributed Computing System, Based on a 100-Million-Level Traffic System Architecture
14. How to Design a High-Performance Architecture Bearing Ten-Billion-Level Traffic (https://juejin.cn/post/6844903726050705422)