The lofty Taobao architecture

At the top are the security systems, such as the data security system, application security system, and front-end security system.

In the middle are the business services, such as the membership, commodity, shop, and transaction services.

There are also shared services, such as distributed data layer, data analysis service, configuration service, data search service, etc.

At the bottom are middleware services, such as message queue services (MQS) and cache services (OCS).

Some things cannot be seen in the figure. For example, high availability (HA) is implemented through dual machine-room disaster recovery (DR) and unitized deployment across remote machine rooms, providing stable, efficient, and easy-to-maintain infrastructure support for Taobao's services.

This is a very valuable architecture, and also a very complex and large one. Of course, it did not evolve in a day or two, nor was it designed and built as such a lofty architecture from the start.

So here’s the thing: what do small companies do? For many startups, it is hard to anticipate at the beginning what the site architecture will need to look like at ten, a hundred, or a thousand times the current scale. At the same time, if the system is designed from day one for millions of concurrent users, few companies can bear the cost.

Therefore, a large service system grows step by step: at each stage, identify the problems the site architecture faces at that stage, then keep solving them. In that process, the whole architecture evolves continuously.

So let’s take a look.

Single server – commonly known as All in One

Starting out as a small website, one server is enough. The file server, database, and application are all deployed on one machine, commonly known as “all in one”.

As users and traffic grow, hard disk, CPU, memory, and so on start to run short, and a single server can no longer keep up. Time for the next step of the evolution.

Data services are separated from application services

We separate the data services from the application services, giving the application server a better CPU and more memory, and the data server bigger, faster hard drives.

After the separation, availability improves somewhat. For example, when the file server is down, we can still operate the application and the database.

As QPS grew, reducing the number of interface accesses and improving service performance and concurrency became our next goal. We discovered that a lot of business data did not need to be fetched from the database on every request.

Use caches, including local cache, remote cache, and remote distributed cache

Roughly 80% of business accesses focus on 20% of the data, the familiar 80/20 rule. If we can cache that hot data, performance improves immediately. Caches come in two flavors: local cache, and remote cache (including remote distributed cache). The remote cache in the diagram is shown as a distributed cache cluster.
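To make the local-cache idea concrete, here is a minimal sketch of an in-process cache with LRU eviction, which keeps the hottest 20% of data in memory and evicts the rest. The class name and capacity are illustrative; a real service would add TTLs, metrics, and use a mature cache library.

```python
from collections import OrderedDict

class LocalLRUCache:
    """A minimal in-process ("local") cache with LRU eviction.

    This is only a sketch of the idea; real deployments would also
    add expiry (TTL) and hit/miss metrics.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None  # cache miss: caller falls back to the database
        self._data.move_to_end(key)  # mark as recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used
```

With a capacity sized for the hot 20% of data, most reads never touch the database at all.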

Things to think about:

  • What kinds of data, by business characteristics, are suited to caching?

  • What kinds suit a local cache?

  • What kinds suit a remote cache?

  • What problems arise when a distributed cache is expanded, and how are they solved? What algorithms exist for distributed caching, and what are the advantages and disadvantages of each?
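One common answer to the expansion question above is consistent hashing: with a naive `hash(key) % N` scheme, adding a cache node remaps almost every key, while a hash ring with virtual nodes only moves the keys between a new node and its neighbor. The sketch below illustrates the idea; node names and the vnode count are arbitrary.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Consistent hashing with virtual nodes, a common strategy for
    expanding a distributed cache without remapping most keys."""

    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted list of (hash, node) pairs
        self._vnodes = vnodes
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node: str):
        # Place `vnodes` virtual points for this node on the ring
        for i in range(self._vnodes):
            self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()

    def get_node(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's hash
        h = self._hash(key)
        idx = bisect.bisect(self._ring, (h,)) % len(self._ring)
        return self._ring[idx][1]
```

The trade-off: consistent hashing minimizes data movement on expansion, while plain modulo hashing is simpler but forces a near-total cache flush whenever the cluster size changes.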

At this point, as QPS keeps rising, the server’s processing power becomes the bottleneck. You can buy more powerful hardware, but there is always a ceiling, and at the high end the cost grows steeply. What you need is a cluster of servers. To let our servers scale horizontally, we add something new: a load-balancing scheduling server.

Cluster servers using load balancing

After adding load balancing and server cluster, we can horizontally expand the server and solve the bottleneck of server processing capacity.

Things to think about:

  • What are the scheduling policies for load balancing?

  • What are the advantages and disadvantages of each?

  • Which scenarios suit each?

For example, there are round-robin, weighted round-robin, address hashing (subdivided into source-IP hash and destination-IP hash), least connections, weighted least connections, and many more strategies. Let’s analyze them.

Typical load balancing policy analysis

  • Round-robin: Advantage: simple to implement. Disadvantage: ignores the processing capacity of each server.

  • Weighted round-robin: Advantage: takes each server’s processing capacity into account.

  • Address hash: Advantage: the same user always reaches the same server.

  • Least connections: Advantage: spreads load across the cluster more evenly.

  • Weighted least connections: assigns a weight to each server on top of least connections. The formula is (active connections × 256 + inactive connections) / weight, and the server with the smallest value is chosen first.
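The weighted-least-connections formula above can be sketched in a few lines. The server fields (`active`, `inactive`, `weight`) are illustrative names, not any particular load balancer’s API.

```python
def pick_weighted_least_connections(servers):
    """Pick a back-end server using the weighted-least-connections
    formula quoted above: (active * 256 + inactive) / weight,
    where the server with the smallest score wins."""
    def score(s):
        return (s["active"] * 256 + s["inactive"]) / s["weight"]
    return min(servers, key=score)
```

A server with a higher weight tolerates proportionally more connections before it stops being preferred, which is exactly how heterogeneous hardware is accounted for.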

Now, a scenario that leads to the next problem:

Suppose we log in on server A, so the session information is stored on server A. If our load balancing policy is IP hash, subsequent requests also reach server A and can read the login information. However, IP hash may overload some servers while leaving others idle; the overloaded machines (including their NIC bandwidth) become bottlenecks because requests are not spread out enough.

So instead we use the round-robin or least-connections policy. As a result, the first visit may go to server A and the second to server B, and the session stored on server A cannot be read on server B.

Session management – Sticky Session:

An analogy: it is like keeping your own chopsticks at one particular restaurant, and always eating at that restaurant so your chopsticks are there.

The load balancer forwards all packets of the same connection (or session) to the same fixed back-end server.

It solves our session sharing problem, but what are the drawbacks?

When the service on a server dies or restarts, its sessions are lost; worse, the load balancer becomes a stateful machine, which hampers disaster recovery later.

Session management – Session replication

An analogy: it is like keeping your own bowl and chopsticks at every restaurant, so you can eat anywhere. This approach does not suit large clusters; it fits setups with only a few machines.

It solves our session sharing problem, but what are the drawbacks?

Bandwidth between application servers suffers because session data must be constantly synchronized, and with a large number of online users each server holds too much session data in memory.

Session management – Cookie based

For example, every time we go to a restaurant, we bring our own dishes and chopsticks.

It solves our session sharing problem, but what are the drawbacks?

Cookies have a length limit. Cookies are also stored in the browser, so security is a concern.

Session management – Session server

For example, our bowls and chopsticks are stored in a huge cupboard, and we can go to any restaurant and get our own bowls and chopsticks from the cupboard.

This solves our session sharing problem, but what issues does this solution itself need to address?

How do we ensure the availability of the session server itself? We also have to adjust the session-storage logic in our application code. For example, we can cluster the session servers to improve their availability.
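The session-server approach above can be sketched as follows: every application server reads and writes sessions through one shared store instead of its own memory. Here a plain dict stands in for the shared store; a real deployment would typically use something like a Redis cluster and add session expiry.

```python
import json
import uuid

class SessionStore:
    """Sketch of the 'session server' pattern: sessions live in a
    shared backend, so any app server behind the load balancer can
    read a session created by any other. The dict backend is a
    stand-in for a real shared store (e.g. a Redis cluster)."""

    def __init__(self, backend=None):
        self._backend = backend if backend is not None else {}

    def create(self, user_data: dict) -> str:
        session_id = uuid.uuid4().hex
        # Serialize so the store only holds plain strings, as a
        # networked store would
        self._backend[session_id] = json.dumps(user_data)
        return session_id

    def get(self, session_id: str):
        raw = self._backend.get(session_id)
        return json.loads(raw) if raw is not None else None
```

The application then carries only the session ID (e.g. in a cookie), so the load balancer is free to send each request to any server.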

Interim summary

So: when the website architecture hits a bottleneck on some metric, what solutions exist at that stage of the evolution, and what are their pros and cons? How do we choose given the business requirements? It is the process of reasoning that matters.

With horizontal scaling of application servers covered, let’s move on.


Continue back to the current architecture diagram

All database reads and writes still go through a single database. Once the number of users reaches a certain scale, the database becomes the bottleneck. So how do we solve this?

Database read/write separation

Using the hot-standby (master/slave replication) capability of the database, we direct all reads to the slave server. Since reads and writes are now separated, the application also needs to change: we implement a data access module (the Data Access Module in the figure) so that the upper layers are unaware of the read/write separation. This way, read/write separation across multiple data sources is non-intrusive to the business code. This is also where the evolution of the code’s layering comes in.

Things to think about:

  • How to support multiple data sources?

  • How to encapsulate the business without intrusion?

  • How do we implement master/slave read/write separation with our current ORM framework? Do we need to replace the ORM? What are the strengths and weaknesses of each ORM, and how do we choose?

Database read/write separation may encounter the following problems:

  • Consider replication latency, database engine support, and replication prerequisites when setting up master/slave replication.

  • Synchronizing data across machine rooms, when the database is extended across rooms for availability, makes this even harder.

  • The problem of routing the application to the correct data source.
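The routing problem in the list above is usually solved inside the data access module: writes go to the master, reads go to a replica, and business code never names a physical database. Below is a deliberately naive sketch; the class and the SQL-prefix classification are illustrative, not a real framework’s API.

```python
class RoutingDataSource:
    """Sketch of read/write-separation routing: SELECT statements go
    to a replica (round-robin), everything else goes to the master.
    Connection objects here are simple stand-ins."""

    def __init__(self, master, replicas):
        self._master = master
        self._replicas = replicas
        self._i = 0  # round-robin cursor over replicas

    def route(self, sql: str):
        # Naive classification by SQL verb; real modules typically
        # route by annotated method or transaction context instead,
        # and must send "read your own write" queries to the master.
        if sql.lstrip().upper().startswith("SELECT"):
            conn = self._replicas[self._i % len(self._replicas)]
            self._i += 1
            return conn
        return self._master
```

One known pitfall, matching the latency bullet above: a read issued right after a write may hit a replica that has not replayed the write yet, so such reads often have to be pinned to the master.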

Use reverse proxy and CDN to speed web site response

CDN can be used to solve the problem of access speed in different regions, and reverse proxy caches user resources in the server room.

As traffic grew, we had bottlenecks in our file server.

Distributed file system

Things to think about:

  • How do we migrate to a distributed file system without affecting online access? An image must not suddenly become inaccessible.

  • Does the business side need to clean and migrate existing data?

  • Do I need to perform domain name resolution again?

At this point, the database bottleneck appears again.

Vertical data split

Dedicate a database per business domain, as with the Products, Users, and Deal databases shown in the figure.

This relieves the pressure of high concurrency and large data volumes when writing data.

Things to think about:

  • Cross-business transactions? How do we handle them: distributed transactions, removing the transaction, or not insisting on strong consistency?

  • There are too many configuration items in the application

  • How do we join data across databases?

At this point, the data volume or update rate of a single business table reaches the limit of a single database.

Horizontal split of data

As shown in the figure, we split User into User1 and User2, splitting the data of the same table into two databases, eliminating the bottleneck of single database.

Things to think about:

  • What are the strategies for horizontal splitting? What are the advantages and disadvantages of each?

  • How to clean data when splitting horizontally?

  • The SQL routing problem: we need to know which database a given User lives on.

  • Primary key generation needs a new strategy, since a single database’s auto-increment no longer works across shards.

  • Suppose we need to query the details of users who placed orders in April 2017, and those users are spread across user1 and user2. How does our back-office system paginate the results for display?
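Two of the questions above can be sketched concretely: a toy modulo sharding rule for routing, and the common merge-then-slice answer to cross-shard pagination. The shard names, the modulo rule, and the row fields are all illustrative; real systems often prefer range or consistent-hash sharding so that adding shards does not remap every row.

```python
def shard_for(user_id: int, shard_count: int = 2) -> str:
    """Toy modulo sharding rule: with two shards, even ids land on
    user1 and odd ids on user2 (names match the figure)."""
    return f"user{user_id % shard_count + 1}"

def paginate_across_shards(shard_results, page: int, page_size: int, key):
    """One answer to the cross-shard pagination question: each shard
    returns its own candidate rows, and the aggregator merges and
    re-sorts them before slicing out the requested page. Simple, but
    each shard must return up to (page + 1) * page_size rows, which
    gets expensive for deep pages."""
    merged = sorted((row for rows in shard_results for row in rows), key=key)
    start = page * page_size
    return merged[start:start + page_size]
```

For deep pagination, production systems often switch to cursor-based paging (remember the last seen sort key) instead of offset-based paging.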

At this point, the company ran an external traffic campaign, search volume in our app soared, and the evolution continued.

Split search engine

Use a search engine to solve the data query problem. In some scenarios NoSQL can be used to improve performance. We also develop a unified data access module to shield upper-layer applications from the multiple data sources: the Data Access Module can access the database, the search engine, and NoSQL stores.
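The unified data access module described above is essentially a facade that dispatches each query to the right backend. The sketch below illustrates the shape of that dispatch; the class name, the `kind` labels, and the callable backends are all hypothetical stand-ins for real database, search-engine, and NoSQL clients.

```python
class DataAccessModule:
    """Sketch of a unified data access facade: upper-layer code asks
    for data, and the module decides whether to hit the relational
    database, the search engine, or a NoSQL store. Backends here are
    plain callables standing in for real clients."""

    def __init__(self, database, search_engine, nosql):
        self._db = database
        self._search = search_engine
        self._nosql = nosql

    def query(self, kind: str, arg):
        if kind == "fulltext":   # keyword search -> search engine
            return self._search(arg)
        if kind == "kv":         # hot key/value lookups -> NoSQL store
            return self._nosql(arg)
        return self._db(arg)     # everything else -> relational database
```

The payoff is the same as with read/write separation: business code depends on one stable interface, and backends can be added or swapped behind it without touching the callers.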


The final summary

This is just an example. The technical architecture of each service needs to be optimized and evolved according to its own business characteristics, so everyone’s process is not exactly the same.

The final architecture is still not perfect. For example, the load balancer is a single point and needs clustering too. Our architecture is just the tip of the iceberg: during the evolution, system security, data analysis, monitoring, anti-cheating, and so on must also be considered, along with SOA, servitization, message queues, task scheduling, multiple machine rooms……

From this walkthrough of architecture evolution, it is clear that the architecture and code of all large projects evolve step by step according to the actual business scenarios and growth. At different stages, different technologies and architectures are used to solve the practical problems. A lofty project architecture, and its development and design, is never accomplished overnight.

Great oaks from little acorns grow. During architecture evolution, everything from core module code to the core architecture keeps evolving, and that process is well worth in-depth study and thought. If this helped you, please follow!