Preface

The architecture of a mature large website (such as Taobao, Tmall, or Tencent) is not designed from the start with high performance, high availability, and high scalability built in. It evolves gradually as the number of users grows and the business functions expand. Along the way, the development model, the technical architecture, and the design ideas all change a great deal, and even the technical staff grows from a few people to a department or an entire product line. A mature system architecture is therefore improved step by step as the business expands; it is not built overnight. Systems with different business characteristics also have different priorities: Taobao has to handle search, ordering, and payment for a massive catalog of product information; Tencent has to deliver real-time messages among hundreds of millions of users; Baidu has to process a huge volume of search requests. Each has its own business characteristics, so their system architectures differ as well. Even so, common optimization techniques can be found across these different sites, and they are widely used in the architecture of large websites. Let's look at these performance optimization techniques.

Major ways to improve website performance

The initial site architecture

At the beginning, both business volume and traffic are small. At this stage, the application, the database, and the files are all deployed on a single server, and some sites simply rent hosting space.

1. Separate applications, data, and files

Deploy applications, databases, and files on independent servers, and configure different hardware based on the purpose of the server to achieve the best performance.

2. Use caching to improve site performance

Most website traffic follows the 80/20 rule: 80% of access requests land on 20% of the data. We can therefore cache this hot data to shorten its access path and improve the user experience. The common cache implementations are local caches and distributed caches; CDNs and reverse proxies also serve as caches.

2.1 Local Cache

Local caching, as the name implies, caches data locally on the application server, either in memory or in files. A local cache is fast, but the amount of data it can hold is limited by the local space available. OSCache is a commonly used local cache component.
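The trade-off described above (fast access, limited space) can be sketched with a small in-memory cache that evicts the least recently used entry once it is full. This is only an illustration: the `LocalCache` class, its `capacity` parameter, and the `item:*` keys are hypothetical names, and the LRU eviction policy is an assumption, not something the article prescribes.

```python
from collections import OrderedDict


class LocalCache:
    """Tiny in-memory LRU cache; capacity models the limited local space."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


cache = LocalCache(capacity=2)
cache.put("item:1", "phone")
cache.put("item:2", "laptop")
cache.get("item:1")             # touch item:1, so item:2 is now the oldest
cache.put("item:3", "camera")   # cache is full: item:2 is evicted
```

Because the cache lives inside the application process, a lookup costs no network round trip, which is where the speed advantage comes from.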

2.2 Distributed Cache

A distributed cache can hold massive amounts of data and is very easy to scale out. It is often used by portal-type websites, although it is slower than a local cache. The commonly used distributed caches are Memcached and Redis.
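One way a client can spread keys over several cache nodes, and keep most keys in place when a node is added, is consistent hashing. The article does not name this technique, so treat the sketch below as an assumption about how such a cache might be scaled out; the `HashRing` class, the node names, and the `replicas` parameter are hypothetical.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring with virtual nodes (a sketch, not a client library)."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._ring = []  # sorted list of (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each physical node gets several positions to even out the load.
        for i in range(self.replicas):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # Walk clockwise to the first position at or after the key's hash.
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        if idx == len(self._ring):
            idx = 0  # wrap around to the start of the ring
        return self._ring[idx][1]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"product:{i}" for i in range(1000)]
before = {key: ring.node_for(key) for key in keys}

ring.add_node("cache-d")  # scale out by one node
after = {key: ring.node_for(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
```

In this sketch, every key in `moved` ends up on the new node, and the remaining keys stay where they were, so adding capacity invalidates only a fraction of the cached data.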

2.3 Reverse Proxy

User requests reach the reverse proxy server first. If the reverse proxy has the requested data cached, it returns the cached data to the user directly. If no cached data is available, it forwards the request to the application server to obtain the data, then returns it. This reduces the cost of fetching data. Common reverse proxies are Squid and Nginx.

2.4 CDN

Suppose our servers are deployed in a data center in Hangzhou. Access is fast for users in Zhejiang but slower for users in Beijing. This is because Zhejiang and Beijing are served mainly by different carriers (China Telecom and China Unicom, respectively): a request from a Beijing user has to travel a long path through Internet routers to reach the server in Hangzhou, and the response travels the same long path back, so data transmission takes a long time. In this case, a CDN is often used to cache content in the carriers' data centers. When users access the data, they first fetch it from the nearest carrier data center, which greatly shortens the network path. Professional CDN providers include ChinaCache and Wangsu (ChinaNetCenter).

3. Use cluster and load balancing to improve the performance of application servers

Application servers are the entry point of a website and take on a large number of requests, so we usually spread the requests across an application server cluster.

A load balancing server is deployed in front of the application server to schedule user requests and distribute the requests to multiple application server nodes based on the distribution policy.
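As a minimal sketch of a distribution policy, the snippet below implements round robin, one of the simplest policies a load balancer can apply. The article does not say which policy to use, so round robin is an assumption here, and the `RoundRobinBalancer` class and the `app-*` node names are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hands out application server nodes in a fixed rotation."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def pick(self):
        # Each call returns the next node, wrapping around at the end.
        return next(self._cycle)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
assigned = [balancer.pick() for _ in range(6)]
# assigned == ["app-1", "app-2", "app-3", "app-1", "app-2", "app-3"]
```

Real load balancers usually offer further policies, such as weighting nodes by capacity or preferring the node with the fewest active connections.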

On the hardware side, the commonly used load balancer is F5. It is relatively expensive, generally more than 150,000 yuan.

On the software side, the common choices are LVS, Nginx, and HAProxy. LVS is a layer 4 (transport layer) load balancer that selects a backend server according to the destination address and port. Nginx and HAProxy are typically used as layer 7 (application layer) load balancers, which can select a backend server according to the content of the request. LVS therefore has a shorter forwarding path than Nginx and HAProxy and delivers higher performance. Nginx and HAProxy, however, are more configurable. For example, they can separate dynamic and static content by sending each request to either a static resource server or an application server according to the characteristics of the request.
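The difference between the two layers can be sketched as two routing functions: a layer 4 decision sees only the destination address and port, while a layer 7 decision can inspect the request itself, here the URL path, to separate static from dynamic content. The function names, pool names, and extension list are hypothetical and do not reflect how LVS, Nginx, or HAProxy are actually configured.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical set of file extensions treated as static resources.
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".gif", ".ico"}


def route_l4(dest_ip, dest_port, table):
    """Layer 4: choose a pool using only the destination address and port."""
    return table[(dest_ip, dest_port)]


def route_l7(url, static_pool="static-server", app_pool="app-server"):
    """Layer 7: choose a pool by inspecting the request content (the URL path)."""
    path = urlsplit(url).path
    extension = posixpath.splitext(path)[1].lower()
    return static_pool if extension in STATIC_EXTENSIONS else app_pool


l4_table = {("10.0.0.1", 80): "web-pool"}
print(route_l4("10.0.0.1", 80, l4_table))
print(route_l7("http://example.com/img/logo.PNG?v=2"))
print(route_l7("http://example.com/order/submit"))
```

The layer 7 function has to parse the request before it can decide, which hints at why a layer 4 balancer has the shorter forwarding path.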

4. Database optimization

4.1 Read/write separation and database/table sharding

As the number of users grows, the database becomes the biggest bottleneck. The common ways to improve database performance are read/write separation and splitting databases and tables (sharding). Read/write separation, as the name implies, divides the database into a write database and read databases, and keeps the data synchronized through master/standby replication. Splitting databases and tables can be done horizontally or vertically. Horizontal sharding splits a single large table, such as the user table, into several smaller ones. Vertical sharding splits by business: for example, tables related to the user business and tables related to the commodity business are placed in different databases.
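A rough sketch of how an application might route queries under these schemes is shown below. All names (`ReadWriteRouter`, `db-primary`, `user_db`, the shard naming) are hypothetical, the verb check is deliberately naive, and modulo on the user ID is just one possible horizontal sharding rule.

```python
import itertools


class ReadWriteRouter:
    """Sends SELECT statements to read replicas and everything else to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(list(replicas))

    def route(self, sql):
        # Naive check on the first keyword; assumes a non-empty statement.
        verb = sql.split(None, 1)[0].upper()
        if verb == "SELECT":
            return next(self._replicas)
        return self.primary


def shard_for_user(user_id, shard_count=4):
    """Horizontal sharding: pick one of several user tables by user ID."""
    return f"user_{user_id % shard_count}"


# Vertical sharding: tables grouped into databases by business area.
DATABASE_FOR_TABLE = {
    "user": "user_db",
    "user_address": "user_db",
    "product": "product_db",
    "product_category": "product_db",
}

router = ReadWriteRouter("db-primary", ["db-replica-1", "db-replica-2"])
print(router.route("SELECT name FROM user WHERE id = 7"))
print(router.route("UPDATE user SET name = 'x' WHERE id = 7"))
print(shard_for_user(7))
print(DATABASE_FOR_TABLE["product"])
```

Because replicas are synchronized asynchronously in many setups, a read issued right after a write may return stale data; reads that must see the latest write are usually sent to the primary.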

4.2 Using NoSQL Databases and Search Engines

For querying and analyzing massive amounts of data, NoSQL databases and search engines can deliver better performance. Not all data needs to live in a relational database. Common NoSQL databases include MongoDB, HBase, and Redis. Common search engines include Lucene, Solr, and Elasticsearch.

5. Split services on the application server

As the business expands, the application becomes very bloated, and we need to split it by business. Baidu, for example, is divided into news, web search, images, and other businesses. Each business application is responsible for a relatively independent set of operations. The applications communicate with one another through messages or by sharing a database.

6. Use distributed systems

6.1 Distributed File System

As the number of users increases and the volume of services increases, more and more files are generated. A single file server can no longer meet the requirements. In this case, a distributed file system is required. Common distributed file systems include GFS, HDFS, and TFS.

Google File System (GFS) provides high performance services to a large number of users

• Suitable for deployment on inexpensive generic hardware

• Provides fault tolerance

The Hadoop Distributed File System (HDFS) provides high-throughput data access and is suitable for applications on large-scale data sets

• Runs on commodity hardware

• Highly fault tolerant

• Suitable for deployment on inexpensive machines

Taobao File System (TFS) mainly provides highly reliable and highly concurrent storage access for massive unstructured data

• High scalability, high availability and high performance

• Internet oriented services

• Suitable for mass small file storage

6.2 Distributed Service

Each business application uses some basic business services, such as the user service, order service, payment service, and security service. These services are the basic building blocks that support every business application. We extract them and use a distributed service framework to build distributed services. Alibaba's Dubbo is a good choice.

Summary

The complete system architecture diagram is as follows:

The architecture of a large website is improved continuously according to business needs, and specific design decisions are made according to the characteristics of each business. This article covers only some of the optimization techniques commonly used in large websites.
