Take Taobao.com as an example to analyze the evolution of large-scale Java project architecture

My official account is MarkerHub and my website is Markerhub.com

Small Hub's note:

This one is a bit advanced, so read it carefully!

If the predator edge
www.jianshu.com/p/796f488fd…

Preface

Let's use Taobao.com as an example to get a brief picture of the server architecture of a large e-commerce company. As shown in the figure, the security system sits at the top, and the business operation system, which contains the various business services, sits in the middle. Below it are shared services and middleware: ECS is the cloud server, MQS the queue service, OCS the cache, and so on. On the right are the supporting system services.

Beyond what the figure shows, there are parts we cannot see, such as how high availability is achieved. Taobao currently runs multi-data-center disaster recovery (DR) and unitized deployment across remote data centers, which gives Taobao's services stable, efficient, and easy-to-maintain infrastructure support.

This is an extremely valuable architecture, and also a huge and complex one. Of course, it did not evolve in a day or two, nor was it designed and built this way from the start. For a start-up, it is hard to predict what the architecture will need to look like once traffic grows a thousand or ten thousand times. Conversely, designing for tens of millions of concurrent requests from day one would cost far more than a start-up could afford.

So a large service system is approached step by step. At each stage we identify the problems the site's architecture faces at that stage and keep solving them. In the process the whole architecture evolves, and so does the code: from the overall architecture down to individual lines of code, everything is continuously evolved and optimized. A sophisticated technical architecture and development design is therefore not achieved overnight; as the saying goes, great oaks grow from little acorns.

Single machine architecture

A small website can usually start with a single server, with the file server, database, and application all deployed on one machine. This is also known as the All in One architecture.

Multi-machine deployment

As users and traffic grow, the hard disk, CPU, memory, and other resources come under strain, and one server can no longer cope. At this stage of the evolution, we separate the data services from the application service: the application server gets a better CPU and more memory, while the data server gets larger, faster disks. As shown in the figure, three servers are deployed, which improves both performance and availability.

Distributed cache

As concurrency keeps rising, the architecture continues to evolve to reduce interface response times and improve service performance.

We found that much business data does not need to be fetched from the database on every request, so we introduced caching: about 80% of business accesses hit about 20% of the data (the 80/20 rule), so caching that hot data improves performance considerably. There are two kinds of cache: a local cache inside the application, and a remote cache, which can be either a standalone remote cache or a distributed cache (the figure shows a distributed cache cluster).
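To make the local-cache idea concrete, here is a minimal sketch of an in-process cache read in the cache-aside style (check the cache, fall back to the database on a miss, then populate the cache). It assumes LRU eviction is a reasonable way to keep the hot 20% of data in a fixed-size cache; `LocalLruCache` and the `loadFromDb` loader are hypothetical names, and production code would more likely use an established caching library.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical in-process (local) cache with LRU eviction, read in a cache-aside style.
class LocalLruCache<K, V> {
    private final int capacity;
    private final Map<K, V> map;

    LocalLruCache(int capacity) {
        this.capacity = capacity;
        // accessOrder = true: iteration order runs from least to most recently accessed
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict the least recently used entry once the cache exceeds its capacity
                return size() > LocalLruCache.this.capacity;
            }
        };
    }

    // Cache-aside read: return the cached value, or call the loader (e.g. a database query) and cache it
    synchronized V get(K key, Function<K, V> loader) {
        V value = map.get(key);
        if (value == null) {
            value = loader.apply(key);
            if (value != null) {
                map.put(key, value);
            }
        }
        return value;
    }

    // After the underlying data changes, drop the entry so the next read reloads it
    synchronized void invalidate(K key) {
        map.remove(key);
    }

    synchronized int size() {
        return map.size();
    }
}

class LocalCacheDemo {
    public static void main(String[] args) {
        int[] dbQueries = {0};
        // Hypothetical loader standing in for a real database lookup
        Function<Integer, String> loadFromDb = id -> {
            dbQueries[0]++;
            return "item-" + id;
        };
        LocalLruCache<Integer, String> cache = new LocalLruCache<>(2);
        cache.get(1, loadFromDb); // miss: loads from the "database"
        cache.get(1, loadFromDb); // hit
        cache.get(2, loadFromDb); // miss
        cache.get(3, loadFromDb); // miss: evicts key 1, the least recently used
        System.out.println("database queries: " + dbQueries[0]); // database queries: 3
    }
}
```

A remote or distributed cache follows the same read path; only the map is replaced by a network call, and invalidation then has to reach every application that caches the key.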

We need to decide which kinds of business data should be cached at all, which belong in the local cache, and which in the remote cache. We also need to know what problems arise when a distributed cache is scaled out and how to solve them, which distribution algorithms distributed caches use, and what the trade-offs of each are. These are the questions this architecture forces us to think about and answer.
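One widely used answer to the "which algorithm" question for distributed caches is consistent hashing, sketched below under stated assumptions: `ConsistentHashRing`, the node names, and the choice of 100 virtual nodes per server are illustrative, and MD5 is used only as a convenient well-distributed hash. Compared with plain `hash(key) % N`, adding or removing a cache node only remaps the keys that belonged to that node, instead of reshuffling almost all keys.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Sketch of a consistent-hash ring that picks a cache node for each key.
class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes; // copies of each node on the ring, to even out the distribution

    ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) ring.put(hash(node + "#" + i), node);
    }

    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) ring.remove(hash(node + "#" + i));
    }

    // Walk clockwise to the first virtual node at or after the key's hash, wrapping around at the end
    String nodeFor(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("no cache nodes");
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // First 8 bytes of MD5 as a long; any well-distributed hash would do
    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xFF);
            return h;
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
    }
}

class ConsistentHashDemo {
    public static void main(String[] args) {
        ConsistentHashRing ring = new ConsistentHashRing(100);
        ring.addNode("cache-a");
        ring.addNode("cache-b");
        ring.addNode("cache-c");
        System.out.println("user:1001 -> " + ring.nodeFor("user:1001"));
    }
}
```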

Server cluster

As QPS keeps climbing, and assuming the application server is Tomcat, Tomcat's processing capacity becomes the bottleneck. We could buy more powerful hardware, but there is always an upper limit, and in the later stages the cost of each further upgrade rises exponentially.

At this point we can build a server cluster and put a load balancer in front of it. Once the servers are clustered, we can scale out horizontally and remove the bottleneck in server processing capacity.

Now we need to consider several questions: what scheduling policies a load balancer can use, what the pros and cons of each are, and which scenarios each suits. Common policies include round robin, weighted round robin, address hashing (further divided into source IP hashing and destination IP hashing), least connections, weighted least connections, and so on.
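The following sketch shows four of the policies listed above (round robin, smooth weighted round robin, source IP hash, and least connections) in plain Java. `Backend` and `LoadBalancer` are hypothetical classes; real deployments configure these policies in a load balancer such as Nginx or LVS rather than writing them by hand.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical backend descriptor: name, static weight, and live connection count
class Backend {
    final String name;
    final int weight;
    final AtomicInteger activeConnections = new AtomicInteger();
    int currentWeight; // scratch state for smooth weighted round robin

    Backend(String name, int weight) {
        this.name = name;
        this.weight = weight;
    }
}

class LoadBalancer {
    private final List<Backend> backends;
    private final AtomicInteger rrCounter = new AtomicInteger();

    LoadBalancer(List<Backend> backends) {
        this.backends = backends;
    }

    // Round robin ("polling"): each request goes to the next server in turn
    Backend roundRobin() {
        return backends.get(Math.floorMod(rrCounter.getAndIncrement(), backends.size()));
    }

    // Smooth weighted round robin: servers are chosen in proportion to their weights, interleaved
    synchronized Backend weighted() {
        int total = 0;
        Backend best = null;
        for (Backend b : backends) {
            b.currentWeight += b.weight;
            total += b.weight;
            if (best == null || b.currentWeight > best.currentWeight) best = b;
        }
        best.currentWeight -= total;
        return best;
    }

    // Source IP hash: the same client IP maps to the same server while the server list is unchanged
    Backend sourceIpHash(String clientIp) {
        return backends.get(Math.floorMod(clientIp.hashCode(), backends.size()));
    }

    // Least connections: pick the server currently handling the fewest connections
    Backend leastConnections() {
        Backend best = backends.get(0);
        for (Backend b : backends) {
            if (b.activeConnections.get() < best.activeConnections.get()) best = b;
        }
        return best;
    }
}

class LoadBalancerDemo {
    public static void main(String[] args) {
        LoadBalancer lb = new LoadBalancer(
                List.of(new Backend("a", 5), new Backend("b", 1), new Backend("c", 1)));
        StringBuilder order = new StringBuilder();
        for (int i = 0; i < 7; i++) order.append(lb.weighted().name);
        System.out.println(order); // aabacaa
    }
}
```

With weights 5:1:1, the smooth variant interleaves the lighter servers (`a a b a c a a`) instead of sending five requests to `a` in a row. Weighted least connections would divide each server's connection count by its weight before comparing.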

After clustering, suppose a user logs in on server A and the session is stored on server A. If the load-balancing policy is round robin or least connections, the next request may go to server B, which cannot read the session stored on server A. So we have to solve the problem of session management.

Session sharing solutions

Session sticky

The first option is session sticky. With this approach, the load balancer performs NAT on the packets of a given connection and always forwards them to the same back-end server for processing, which sidesteps the session-sharing problem.

As shown in the figure, client 1 is always forwarded to server 1 by the load balancer. The drawbacks: first, if a server restarts, all the sessions on it are lost; second, the load balancer becomes a stateful server, which makes disaster recovery harder to implement.

Session replication

With session replication, when Browser1's session is stored on Application1 via the load balancer, Application1 also copies the session to Application2, so every server holds the same session information.

The drawback is the bandwidth between application servers, because session information must be synchronized constantly. And since every server holds every session, memory usage grows too large when many users are online.

Cookie-based sessions

With the cookie-based approach, every request to the application server carries a cookie that contains the session information. The drawbacks are that cookie size is limited, and that a cookie stored in the browser raises security concerns.
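A common way to mitigate the security concern is to sign the cookie so the server can detect tampering. The sketch below assumes HMAC-SHA256 over a Base64url-encoded payload; `SignedCookieCodec` and the demo key are hypothetical. Note that signing only prevents modification: the session data is still readable by the client, so anything secret would additionally need encryption, and the size limit still applies.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Sketch: keep session data in the cookie value and sign it with HMAC-SHA256 so tampering is detected.
class SignedCookieCodec {
    private final byte[] key;

    SignedCookieCodec(byte[] key) {
        this.key = key.clone();
    }

    // Cookie value format (an assumption of this sketch): base64url(payload) + "." + base64url(hmac)
    String encode(String sessionData) {
        String payload = Base64.getUrlEncoder().withoutPadding()
                .encodeToString(sessionData.getBytes(StandardCharsets.UTF_8));
        return payload + "." + sign(payload);
    }

    // Returns the session data, or null if the cookie is malformed, forged, or altered
    String decode(String cookieValue) {
        int dot = cookieValue.lastIndexOf('.');
        if (dot < 0) return null;
        String payload = cookieValue.substring(0, dot);
        String sig = cookieValue.substring(dot + 1);
        // Constant-time comparison avoids leaking how many signature bytes matched
        if (!MessageDigest.isEqual(sign(payload).getBytes(StandardCharsets.US_ASCII),
                sig.getBytes(StandardCharsets.US_ASCII))) {
            return null;
        }
        return new String(Base64.getUrlDecoder().decode(payload), StandardCharsets.UTF_8);
    }

    private String sign(String payload) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            byte[] tag = mac.doFinal(payload.getBytes(StandardCharsets.US_ASCII));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(tag);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}

class SignedCookieDemo {
    public static void main(String[] args) {
        // Hypothetical secret; a real key would come from configuration, not source code
        SignedCookieCodec codec = new SignedCookieCodec("demo-secret-key".getBytes(StandardCharsets.UTF_8));
        String cookie = codec.encode("userId=42");
        System.out.println(codec.decode(cookie)); // userId=42
    }
}
```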

Session server

Another option is to run a dedicated session server, for example using Redis. The session of every user who accesses an application server is saved on the session server, and the application servers fetch sessions from it.

In this architecture the session server is a single point of failure. How do we remove the single point and keep it available? The session server can itself be clustered, which suits deployments with many sessions and many web servers. After this change, the application code that stores sessions has to be adjusted accordingly.
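The shape of the session-server approach can be sketched as follows. `SessionStore`, `InMemorySessionStore`, and `AppServer` are hypothetical names; the in-memory store only stands in for a shared server such as Redis so the example runs on its own, and the 30-minute TTL is an arbitrary assumption. The point is that application servers keep no session state, so any server can handle any request.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical client-side view of the central session server
interface SessionStore {
    void put(String sessionId, Map<String, String> attributes, long ttlMillis);
    Optional<Map<String, String>> get(String sessionId);
}

// In-memory stand-in for a shared store such as Redis, so the sketch runs without external services
class InMemorySessionStore implements SessionStore {
    private static final class Entry {
        final Map<String, String> attributes;
        final long expiresAt;

        Entry(Map<String, String> attributes, long expiresAt) {
            this.attributes = attributes;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> data = new ConcurrentHashMap<>();
    private final LongSupplier clock; // injectable millisecond clock so expiry can be tested

    InMemorySessionStore(LongSupplier clock) {
        this.clock = clock;
    }

    @Override
    public void put(String sessionId, Map<String, String> attributes, long ttlMillis) {
        data.put(sessionId, new Entry(Map.copyOf(attributes), clock.getAsLong() + ttlMillis));
    }

    @Override
    public Optional<Map<String, String>> get(String sessionId) {
        Entry e = data.get(sessionId);
        if (e == null) return Optional.empty();
        if (clock.getAsLong() >= e.expiresAt) { // expired sessions behave as if missing
            data.remove(sessionId, e);
            return Optional.empty();
        }
        return Optional.of(e.attributes);
    }
}

// Stateless application server: it keeps no sessions locally and always asks the session store
class AppServer {
    private final String name;
    private final SessionStore store;

    AppServer(String name, SessionStore store) {
        this.name = name;
        this.store = store;
    }

    void login(String sessionId, String userId) {
        store.put(sessionId, Map.of("userId", userId, "loginServer", name), 30 * 60 * 1000L);
    }

    String currentUser(String sessionId) {
        return store.get(sessionId).map(attrs -> attrs.get("userId")).orElse(null);
    }
}

class SessionServerDemo {
    public static void main(String[] args) {
        SessionStore store = new InMemorySessionStore(System::currentTimeMillis);
        AppServer a = new AppServer("A", store);
        AppServer b = new AppServer("B", store);
        a.login("sid-1", "alice");                  // first request lands on server A
        System.out.println(b.currentUser("sid-1")); // next request lands on server B: prints alice
    }
}
```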

Database read/write separation

Having scaled the servers horizontally, we turn to the database. Every read and write goes through it, so once the user count reaches a certain level, database performance becomes the bottleneck and the architecture has to evolve again.

We can split database reads and writes, with the application accessing multiple data sources through a unified data access model. Read/write separation sends all write operations to the master database and all read operations to the slave databases. The application has to change accordingly: we implemented a data access module so that the upper layers of the code are unaware of the read/write split. Reading and writing across multiple data sources therefore does not intrude on the business code; this is evolution at the code level.
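A minimal sketch of such a data access module, assuming a single master, any number of slaves, and round-robin reads: `ReadWriteRouter` and `UserDao` are hypothetical, and the type parameter `T` stands in for whatever handle the ORM or driver actually uses (for example a `javax.sql.DataSource`). In Spring-based projects, `AbstractRoutingDataSource` is one commonly used hook for the same idea.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Sketch of a data access module that hides read/write splitting from business code
class ReadWriteRouter<T> {
    private final T master;
    private final List<T> slaves;
    private final AtomicInteger next = new AtomicInteger();

    ReadWriteRouter(T master, List<T> slaves) {
        this.master = master;
        this.slaves = List.copyOf(slaves);
    }

    // All writes go to the master
    <R> R write(Function<T, R> op) {
        return op.apply(master);
    }

    // Reads are spread over the slaves round-robin; fall back to the master if there are none
    <R> R read(Function<T, R> op) {
        if (slaves.isEmpty()) return op.apply(master);
        return op.apply(slaves.get(Math.floorMod(next.getAndIncrement(), slaves.size())));
    }
}

// Hypothetical business-facing DAO: callers see only save/find, not which database served them.
// Strings stand in for real data source handles so the sketch runs on its own.
class UserDao {
    private final ReadWriteRouter<String> router;

    UserDao(ReadWriteRouter<String> router) {
        this.router = router;
    }

    String save(String user) {
        return router.write(db -> "INSERT " + user + " on " + db);
    }

    String find(String user) {
        return router.read(db -> "SELECT " + user + " on " + db);
    }
}

class ReadWriteDemo {
    public static void main(String[] args) {
        UserDao dao = new UserDao(new ReadWriteRouter<>("master", List.of("slave1", "slave2")));
        System.out.println(dao.save("alice")); // INSERT alice on master
        System.out.println(dao.find("alice")); // SELECT alice on slave1
        System.out.println(dao.find("alice")); // SELECT alice on slave2
    }
}
```

Because business code only calls `save` and `find`, the routing rule can change (more slaves, a different balancing rule) without touching it. This sketch ignores replication lag: a read issued right after a write may not see that write on a slave.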

Questions to settle in this architecture include: how to support multiple data sources, how to encapsulate them without intruding on the business code, whether the ORM framework the business already uses can handle master/slave read/write splitting or needs to be replaced, what the trade-offs of each option are, and how to choose. When traffic becomes very heavy, meaning the database I/O is very large, read/write separation runs into the following problems:

For example, master/slave replication has lag, and if the master and slave servers are deployed in different data centers, synchronizing data across them is even more of a problem. There is also the question of how the application routes each request to the right data source. These are further points to think through and solve.

CDN acceleration with reverse proxy

Next we add a CDN and a reverse proxy server. The CDN addresses access speed for users in different regions, while the reverse proxy, running in our own data center, can cache the resources users request.

Distributed file server

This time the file server becomes the bottleneck, so we replace the single file server with a distributed file system cluster. Moving to a distributed file system raises several questions: how to deploy it without disrupting online access to the application, whether the business teams need to help clean up data, whether backup servers are needed, whether DNS changes are needed, and so on.

Database sharding (splitting databases and tables)

Next the database itself hits a bottleneck. We move to dedicated databases by splitting the data vertically, so that each group of related business uses its own database. This solves the problem of heavy concurrent writes.

Splitting tables into different databases creates new problems, such as transactions that span businesses and databases. Options include using distributed transactions, removing the transaction, or giving up on strong transactional guarantees.

As traffic and data volume keep growing, the data volume and update rate of a single business's database eventually hit the limit of one database. At that point we split the database horizontally, for example splitting the user database into user1 and user2, so that the rows of the same table are spread across two databases. This removes the single-database bottleneck.
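Routing a user to `user1` or `user2` can be as simple as taking the user ID modulo the number of shards. The sketch assumes numeric user IDs and modulo routing; `UserShardRouter` is hypothetical, and range-based or lookup-table routing are common alternatives. One known drawback of plain modulo is that changing the shard count moves most users to a different database.

```java
// Sketch of routing a user to one of the horizontally split databases (user1, user2, ...)
class UserShardRouter {
    private final int shardCount;

    UserShardRouter(int shardCount) {
        if (shardCount <= 0) throw new IllegalArgumentException("shardCount must be positive");
        this.shardCount = shardCount;
    }

    // Shard numbers start at 1 to match the user1 / user2 naming in the text;
    // floorMod keeps the result non-negative even for negative IDs
    String databaseFor(long userId) {
        return "user" + (Math.floorMod(userId, (long) shardCount) + 1);
    }
}

class ShardRouterDemo {
    public static void main(String[] args) {
        UserShardRouter router = new UserShardRouter(2);
        System.out.println(router.databaseFor(42)); // user1
        System.out.println(router.databaseFor(7));  // user2
    }
}
```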

What should we watch out for with horizontal splitting, and which splitting methods are available? Given a user, how do we know whether that user's information is in the user1 or the user2 database? Because of the split, the primary key strategy has to change, and pagination becomes a problem as well. Suppose we want to list the details of users who placed orders in a certain month: those users are spread across the user1 and user2 databases, yet the back-office management system must display them page by page. These are the issues this architecture requires us to address.
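For the pagination problem, one straightforward approach is to have each shard return its first `offset + limit` rows in the global sort order (in SQL terms, the same `ORDER BY` with `LIMIT offset + limit` on every shard), then merge, sort, and slice in the application. The sketch below assumes the per-shard results have already been fetched; `CrossShardPager` is a hypothetical helper. Its cost grows with the page number, so it works best for shallow pages.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of paging over rows spread across shards (e.g. users in user1 and user2).
// Each inner list must hold that shard's first (offset + limit) rows in the same sort order.
class CrossShardPager {
    static <T> List<T> page(List<List<T>> perShardTopRows, Comparator<T> order, int offset, int limit) {
        List<T> merged = new ArrayList<>();
        for (List<T> rows : perShardTopRows) merged.addAll(rows);
        merged.sort(order); // global order across all shards
        int from = Math.min(offset, merged.size());
        int to = Math.min(offset + limit, merged.size());
        return new ArrayList<>(merged.subList(from, to));
    }
}

class CrossShardPagerDemo {
    public static void main(String[] args) {
        // Hypothetical sorted IDs returned by each shard for offset 2, limit 3 (so up to 5 rows each)
        List<List<Integer>> shards = List.of(List.of(1, 4, 5, 8), List.of(2, 3, 6, 7));
        System.out.println(CrossShardPager.page(shards, Comparator.naturalOrder(), 2, 3)); // [3, 4, 5]
    }
}
```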

Search engine with NoSQL

After the website launched and was promoted widely, search traffic to our application servers soared again. We extracted the search function from the application servers into a dedicated search engine, and in some scenarios we used NoSQL to improve performance. We also built a unified data access module that connects to the database cluster, the search engine, and NoSQL at once, so that upper-layer application development no longer has to deal with multiple data sources.

Afterword

This is a simplified example, not based on an actual business scenario. In practice, each service's architecture is optimized and evolved according to its own business characteristics, so the path is never exactly the same. This architecture is not the end state either; there is still plenty of room for improvement.

For example, the load balancer is currently a single point: if it becomes unreachable, the server cluster behind it is unreachable too. We can therefore cluster the load balancers, add hot standby, and implement automatic failover.
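Automatic switchover between an active and a standby load balancer can be sketched as a periodic health check that promotes the standby when the active node fails. `LbFailover` and the `isHealthy` probe are hypothetical; in practice this is typically handled by tooling such as keepalived, which uses VRRP to move a virtual IP between machines.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

// Sketch of active/standby failover for the load-balancer tier
class LbFailover {
    private String active;
    private String standby;
    private final Predicate<String> isHealthy; // hypothetical health probe, e.g. a heartbeat check

    LbFailover(String active, String standby, Predicate<String> isHealthy) {
        this.active = active;
        this.standby = standby;
        this.isHealthy = isHealthy;
    }

    // Called periodically by a health-check loop; returns the node that should receive traffic
    synchronized String check() {
        if (!isHealthy.test(active) && isHealthy.test(standby)) {
            String failed = active;
            active = standby; // promote the hot standby
            standby = failed; // the failed node becomes the standby once it is repaired
        }
        return active;
    }
}

class FailoverDemo {
    public static void main(String[] args) {
        Set<String> down = new HashSet<>();
        LbFailover failover = new LbFailover("lb-1", "lb-2", node -> !down.contains(node));
        System.out.println(failover.check()); // lb-1
        down.add("lb-1");                     // simulate the active load balancer failing
        System.out.println(failover.check()); // lb-2
    }
}
```

The sketch does not switch back when `lb-1` recovers; avoiding unnecessary flapping between nodes is a deliberate choice here, not a requirement.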

Throughout the evolution of the architecture there are many more things to pay attention to, such as security, data analysis, monitoring, and anti-cheating, as well as message queues and task scheduling for specific scenarios such as transactions, top-ups, and stream computing. The architecture as a whole keeps evolving toward SOA, servitization (microservices), multiple data centers, and beyond.

Finally, I would like to say that the technical architecture and development design of a high-quality project are not achieved overnight.
