Contents

  • Business background
  • The pain points of not introducing a multi-service data center
  • Data center architecture design ideas
  • Data storage architecture design for data center
  • Offline data backup and recovery mechanism for data centers
  • Conclusion

Business background

Today I'd like to share how our company designed a data center architecture that serves multiple business teams: how it evolved, step by step, from a situation where many business teams each needed to analyze data into a shared data center. I hope this helps you build a systematic understanding of the data center concept, which is very popular right now.

First, let's look at what things were like for the business teams before there was a data center. Put simply, each business team developed its own business system with its own independent data storage, and it was usually enough for each system to access its own data, as shown in Figure 1:

Figure 1

The pain points of not introducing a multi-service data center

But as the systems evolved, requirements grew in number and complexity, and each system gradually came to need data from other systems. At that point every system has to expose data interfaces so that other systems can call them to get its data, and at the same time it has to call other systems' interfaces to get theirs, as shown in Figure 2 below:

Figure 2

How do you feel when you look at the figure above? Confused? As the systems evolve, you may well end up with system A exposing interfaces called by systems B and C, system B exposing interfaces called by systems A and C, and system C exposing interfaces called by systems A and B. The result is an awkward, chaotic picture. I bet that even after staring at the figure for 10 seconds you are still baffled, without a clue.

And this is the biggest pain point: each business system is effectively a data island. Each system can directly access only its own data, and anyone else who wants that data must go through its interfaces. With n business systems, this produces an intricate web of call relationships between them, which makes the systems hard to maintain and hard to operate.

Data center architecture design ideas

To solve this problem, we designed a data center shared by the business teams. The design idea is that the data center monitors data changes in each business system's storage. For a MySQL database, for example, you can deploy Canal to listen for its data changes. The data of every business system is then pulled into the data center and stored there, as shown in Figure 3 below:

Figure 3
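
The change-capture flow described above can be sketched in a few lines. This is only an illustration under stated assumptions: `ChangeEvent` is a hypothetical, simplified event shape (it is not Canal's actual message format), and `DataCenterStore` is an in-memory stand-in for the data center's storage.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass
class ChangeEvent:
    """Hypothetical, simplified row-change event (not Canal's real format)."""
    source: str                            # owning business system, e.g. "orders"
    table: str
    op: str                                # "INSERT", "UPDATE" or "DELETE"
    primary_key: str
    row: Optional[Dict[str, Any]] = None   # full row after the change; None for DELETE


@dataclass
class DataCenterStore:
    """In-memory stand-in for the data center's storage."""
    rows: Dict[Tuple[str, str, str], Dict[str, Any]] = field(default_factory=dict)

    def apply(self, event: ChangeEvent) -> None:
        """Replay one change event so the store mirrors the source table."""
        key = (event.source, event.table, event.primary_key)
        if event.op in ("INSERT", "UPDATE"):
            self.rows[key] = dict(event.row or {})
        elif event.op == "DELETE":
            self.rows.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {event.op}")


store = DataCenterStore()
store.apply(ChangeEvent("orders", "t_order", "INSERT", "1001", {"id": "1001", "status": "NEW"}))
store.apply(ChangeEvent("orders", "t_order", "UPDATE", "1001", {"id": "1001", "status": "PAID"}))
```

Because every event carries the full row, replaying the same event twice leaves the store unchanged, which keeps redelivery harmless in this sketch.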

The data center then offers two data access patterns: actively calling a query interface, and passively listening for MQ notifications. That is, any business system can call the data center's interfaces to get the data it wants from other business systems directly. At the same time, the data center publishes change notifications for each kind of business data to MQ, so you can also subscribe to the change notifications that interest you, as shown in Figure 4 below:

Figure 4.
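
To make the two access patterns concrete, here is a minimal in-memory sketch. The `DataCenter` class and its `ingest`, `query`, and `subscribe` methods are hypothetical names chosen for illustration. A real deployment would expose `query` as a service interface and publish notifications through an actual MQ rather than calling listeners in-process.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Optional, Tuple

Row = Dict[str, Any]
Listener = Callable[[str, Row], None]


class DataCenter:
    """Toy data center: pull-style queries plus push-style change notifications."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], Row] = {}
        self._listeners: Dict[str, List[Listener]] = defaultdict(list)

    def ingest(self, source: str, primary_key: str, row: Row) -> None:
        """Store a synchronized row, then notify subscribers of that source."""
        self._rows[(source, primary_key)] = dict(row)
        for listener in self._listeners[source]:
            listener(primary_key, dict(row))

    def query(self, source: str, primary_key: str) -> Optional[Row]:
        """Active pattern: a business system asks for another system's data."""
        row = self._rows.get((source, primary_key))
        return dict(row) if row is not None else None

    def subscribe(self, source: str, listener: Listener) -> None:
        """Passive pattern: a business system registers interest in changes."""
        self._listeners[source].append(listener)


dc = DataCenter()
seen: List[Tuple[str, str]] = []
dc.subscribe("orders", lambda pk, row: seen.append((pk, row["status"])))
dc.ingest("orders", "1001", {"status": "PAID"})
paid_order = dc.query("orders", "1001")
```

Neither the caller of `query` nor the subscriber needs to know which system owns the data or how to reach it. Both depend only on the data center.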

Looking at the architecture diagram above, doesn't the world suddenly feel cleaner? In Internet companies, when multiple business systems call each other's interfaces in complicated ways to access each other's data, a common choice is a unified, shared data center through which the business systems share data. This greatly improves the tidiness of the overall architecture.

Data storage architecture design for data center

Now for another key point of the data center design: the data storage architecture. Although our business systems mostly store their data in MySQL, the data center's storage needs are different from theirs. Business systems generally rely on MySQL's transaction mechanism to implement complex business logic, whereas the data center essentially just synchronizes data and then focuses on serving queries to other systems.

That is the difference in functional positioning. The other difference is data scale. The data center stores the full data of all business systems, so while each individual business system may hold anywhere from millions to billions of rows, the data center, which aggregates all of them, operates at the scale of billions of rows. This is a major characteristic, as shown in Figure 5:

Figure 5

Therefore, we adopted HBase + Elasticsearch as the core of the data center's storage architecture. HBase stores data in key-value (KV) format, distributed across multiple servers. Data is written in KV format and read in KV format, and each value is one complete row of data.

In addition, for each query interface, the field values used in its query conditions are written into Elasticsearch (ES) to build a query index. The query interface first searches the ES index for the primary key IDs of the matching rows, and then fetches the complete rows from HBase by primary key ID, as shown in Figure 6 below:

Figure 6.
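
The two-step read path (index lookup, then fetch by primary key) can be illustrated with in-memory stand-ins. `KVStore` plays the role of HBase and `FieldIndex` the role of ES. Both classes and the helper functions are hypothetical and do not use either system's client API. The sketch also ignores removing stale index entries when a row is updated.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Optional, Set, Tuple

Row = Dict[str, Any]


class KVStore:
    """Stand-in for HBase: row key -> one complete row."""

    def __init__(self) -> None:
        self._data: Dict[str, Row] = {}

    def put(self, row_key: str, row: Row) -> None:
        self._data[row_key] = dict(row)

    def get(self, row_key: str) -> Optional[Row]:
        return self._data.get(row_key)


class FieldIndex:
    """Stand-in for ES: (field, value) -> set of row keys."""

    def __init__(self) -> None:
        self._postings: Dict[Tuple[str, Any], Set[str]] = defaultdict(set)

    def index(self, row_key: str, fields: Row) -> None:
        for name, value in fields.items():
            self._postings[(name, value)].add(row_key)

    def search(self, **conditions: Any) -> List[str]:
        """Return the row keys that match every field=value condition."""
        if not conditions:
            return []
        matches = [self._postings.get((n, v), set()) for n, v in conditions.items()]
        return sorted(set.intersection(*matches))


def write_row(kv: KVStore, idx: FieldIndex, row_key: str, row: Row,
              indexed_fields: Iterable[str]) -> None:
    """Store the full row, and index only the fields used in query conditions."""
    kv.put(row_key, row)
    idx.index(row_key, {f: row[f] for f in indexed_fields if f in row})


def query_rows(kv: KVStore, idx: FieldIndex, **conditions: Any) -> List[Optional[Row]]:
    """Step 1: find primary keys in the index. Step 2: fetch full rows by key."""
    return [kv.get(key) for key in idx.search(**conditions)]


kv, idx = KVStore(), FieldIndex()
write_row(kv, idx, "1001", {"user_id": "u1", "status": "PAID", "amount": 30}, ["user_id", "status"])
write_row(kv, idx, "1002", {"user_id": "u1", "status": "NEW", "amount": 12}, ["user_id", "status"])
paid = query_rows(kv, idx, user_id="u1", status="PAID")
```

Only the queried fields go into the index, and the full row lives in the KV store, so the index stays small even when rows are wide.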

Next, some technical difficulties in this architecture. The first is how to keep HBase and ES consistent: what should happen if a write to HBase succeeds but the write to ES fails? This calls for a compensation mechanism. If the HBase write succeeds but the ES write fails, a compensation message is sent to MQ, and the ES write is performed again later when that message is consumed, so that the two stores reach eventual consistency, as shown in Figure 7:

Figure 7.
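
A minimal sketch of that compensation flow follows. It assumes the index write is idempotent, so that repeating it is safe. `DualWriter` is a hypothetical name, the two `put_*` callables stand in for the HBase and ES writes, and a `deque` stands in for the MQ topic.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, Tuple

Row = Dict[str, Any]


class DualWriter:
    """Write to the KV store first, then to the index, compensating on failure."""

    def __init__(self, put_kv: Callable[[str, Row], None],
                 put_index: Callable[[str, Row], None]) -> None:
        self._put_kv = put_kv
        self._put_index = put_index
        # Stand-in for the MQ topic that carries compensation messages.
        self.compensation_queue: Deque[Tuple[str, Row]] = deque()

    def write(self, row_key: str, row: Row) -> bool:
        """Return True if both writes succeeded, False if compensation is pending."""
        self._put_kv(row_key, row)      # if this raises, nothing was written
        try:
            self._put_index(row_key, row)
            return True
        except Exception:
            self.compensation_queue.append((row_key, dict(row)))
            return False

    def run_compensation(self) -> int:
        """Retry each pending index write once; return how many were repaired."""
        repaired = 0
        for _ in range(len(self.compensation_queue)):
            row_key, row = self.compensation_queue.popleft()
            try:
                self._put_index(row_key, row)
                repaired += 1
            except Exception:
                self.compensation_queue.append((row_key, row))  # try again later
        return repaired
```

Between the failed index write and the successful retry, the row exists in the KV store but cannot be found through the index. That window is the price of eventual rather than strong consistency.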

Another, even more critical, lesson from production is resource isolation between business systems, that is, limiting the traffic each business party can send to the data center's interfaces. Otherwise, a traffic surge or a bug in one business system can cause instantaneous high-concurrency reads against the data center's interfaces. The data center's request-processing threads fill up all at once, and it can no longer handle query requests from other business systems, as shown in Figure 8 below:

Figure 8.

So the data center must include a resource isolation mechanism across business systems. Each business system is given a cap on how many of the data center's thread resources its interface calls may use. Once that threshold is exceeded, rate limiting kicks in and the excess requests from that business party are rejected, as shown in Figure 9:

Figure 9.
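
One way to express such a per-business cap is a semaphore per caller, as in the sketch below. `BusinessIsolation`, `QuotaExceeded`, and the limits are hypothetical. The article does not say how the actual limiter is implemented, so treat this as one possible approach rather than the system's own.

```python
import threading
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class QuotaExceeded(Exception):
    """Raised when a business system is already using all of its slots."""


class BusinessIsolation:
    """Cap the number of in-flight requests per business system."""

    def __init__(self, limits: Dict[str, int]) -> None:
        # One independent semaphore per business system, sized to its quota.
        self._slots = {name: threading.BoundedSemaphore(limit)
                       for name, limit in limits.items()}

    def call(self, business: str, handler: Callable[[], T]) -> T:
        slot = self._slots[business]
        # Non-blocking acquire: reject immediately instead of tying up a thread.
        if not slot.acquire(blocking=False):
            raise QuotaExceeded(f"{business} exceeded its concurrency quota")
        try:
            return handler()
        finally:
            slot.release()


isolation = BusinessIsolation({"orders": 2, "billing": 1})
result = isolation.call("orders", lambda: "query result")
```

Because each business system draws from its own semaphore, a surge from one caller exhausts only its own quota, and requests from the others are still served.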

Offline data backup and recovery mechanism for data centers

We have one more important architectural solution. The most important asset of the data center is its data storage, because the data of all business systems is aggregated and stored there, and every business system relies heavily on the data it provides. A storage failure, or worse, data loss, in the data center would cause serious trouble, so we designed an offline data backup and recovery mechanism.

Specifically, a copy of all data is periodically synchronized to a Hadoop cluster using big data technology. If the HBase or ES cluster crashes or loses data, the data can be restored to a given point in time from the offline backup in the Hadoop cluster, and the data center can continue to serve other systems, as shown in Figure 10 below:

Figure 10.
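
The idea of restoring to a point in time from periodic backups can be sketched as follows. `OfflineBackup` is a hypothetical in-memory stand-in for the copies kept in the Hadoop cluster, and timestamps are plain integers for simplicity.

```python
import copy
from bisect import bisect_right
from typing import Any, Dict, List


class OfflineBackup:
    """Keep periodic full snapshots and restore the latest one at or before a time."""

    def __init__(self) -> None:
        self._timestamps: List[int] = []
        self._snapshots: List[Dict[str, Any]] = []

    def snapshot(self, timestamp: int, data: Dict[str, Any]) -> None:
        if self._timestamps and timestamp <= self._timestamps[-1]:
            raise ValueError("snapshots must be taken in increasing time order")
        self._timestamps.append(timestamp)
        # Deep copy so later changes to live data do not alter the backup.
        self._snapshots.append(copy.deepcopy(data))

    def restore(self, as_of: int) -> Dict[str, Any]:
        """Return a copy of the most recent snapshot taken at or before as_of."""
        i = bisect_right(self._timestamps, as_of)
        if i == 0:
            raise LookupError(f"no snapshot at or before {as_of}")
        return copy.deepcopy(self._snapshots[i - 1])


live = {"1001": {"status": "NEW"}}
backup = OfflineBackup()
backup.snapshot(100, live)
live["1001"]["status"] = "PAID"
backup.snapshot(200, live)
recovered = backup.restore(as_of=150)
```

Changes made after the chosen snapshot are not in the restored copy, so the backup interval bounds how much recent data a recovery can lose.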

Conclusion

That wraps up today's walkthrough of a multi-business data center architecture at an Internet company. I hope that, having seen these design ideas, you will have an overall design and solution in mind when you run into similar problems at your own company.
