This article summarizes the architecture techniques used by large distributed websites. Part of it is book notes and part is a summary of practical experience, so it should serve as a useful reference.

1 Features of large websites

Many users, widely distributed

Heavy traffic and high concurrency

Massive data volumes and high service-availability requirements

A hostile security environment, vulnerable to network attacks

Many features, rapid change, frequent releases

**Growth from small to large through progressive development**

User-centric

Free basic services, paid premium experience

2 Large-scale website architecture patterns

**Layering:** a system can generally be divided into an application layer, a service layer, a data layer, a management layer, and an analysis layer.

**Segmentation:** the system is split by business, module, or function; for example, the application layer can be split into the home page and the user center.

**Distribution:** applications are deployed separately (for example, on different physical machines) and cooperate through remote calls.

**Clustering:** the same application, module, or function is deployed on multiple machines, and load balancing presents them to clients as a single service.
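As a rough illustration of how a load balancer spreads requests over a cluster, here is a minimal round-robin sketch in Java. The server addresses are hypothetical, and real deployments usually put a dedicated balancer (hardware, LVS, Nginx, and so on) in front of the cluster rather than doing this in application code.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin balancer over identical application instances.
class RoundRobinBalancer {
    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<String> servers) {
        if (servers.isEmpty()) throw new IllegalArgumentException("no servers");
        this.servers = List.copyOf(servers);
    }

    // Thread-safe: each call claims the next counter value; floorMod keeps
    // the index non-negative even after the counter overflows.
    String pick() {
        int i = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(i);
    }

    public static void main(String[] args) {
        RoundRobinBalancer lb = new RoundRobinBalancer(
                List.of("app-1:8080", "app-2:8080", "app-3:8080"));
        for (int k = 0; k < 4; k++) {
            System.out.println(lb.pick()); // app-1, app-2, app-3, app-1
        }
    }
}
```

Round robin assumes the instances are equivalent and stateless; weighted or least-connections strategies are common alternatives when they are not.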

**Caching:** data is placed as close as possible to the application or user to speed up access.
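A minimal sketch of the idea, assuming an in-process cache with least-recently-used (LRU) eviction built on `java.util.LinkedHashMap`. Large sites more often use a distributed cache such as Memcached or Redis; this only shows how hot data stays close to the application while the cache size stays bounded.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Bounded in-process LRU cache (illustrative, not thread-safe).
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        // accessOrder = true: iteration runs from least to most recently used.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least recently used entry
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("user:1", "Alice");
        cache.put("user:2", "Bob");
        cache.get("user:1");          // touch user:1, so user:2 becomes the eldest
        cache.put("user:3", "Carol"); // exceeds capacity and evicts user:2
        System.out.println(cache.keySet()); // [user:1, user:3]
    }
}
```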

**Asynchrony:** synchronous operations are made asynchronous. The client sends a request and does not wait for the server to respond; after the server has processed the request, the requester learns the result through a notification or by polling. This generally follows a request-response-notification pattern.
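The request-response-notification flow can be sketched in a single JVM with `CompletableFuture`: the caller submits work, continues immediately, and a callback acts as the notification. `processOrder` is a hypothetical stand-in for slow server-side work; across machines the same pattern is usually built on a message queue or a callback URL.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class AsyncDemo {
    // Hypothetical slow operation on the "server" side.
    static String processOrder(String orderId) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "order " + orderId + " processed";
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // Request: submitted without blocking the caller.
        CompletableFuture<String> result =
                CompletableFuture.supplyAsync(() -> processOrder("42"), pool);
        System.out.println("request accepted"); // immediate response
        // Notification: this callback runs once processing finishes.
        CompletableFuture<Void> notified =
                result.thenAccept(msg -> System.out.println("notify: " + msg));
        notified.join(); // only so the demo does not exit before the callback
        pool.shutdown();
    }
}
```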

**Redundancy:** additional copies are kept to improve availability, security, and performance.

**Security:** known problems have effective solutions, and discovery and defense mechanisms are established for unknown or potential problems.

**Automation:** repetitive tasks that do not need human judgment are handed over to tools and machines.

**Agility:** embrace changing requirements and respond quickly to business needs.

Java Senior Architect distributed technology sharing

3 Large-site architecture goals

**High performance:** provide a fast access experience.

**High availability:** the site can always be accessed normally, meeting a 24/7 availability target.

**Scalability:** processing capacity can be increased or decreased by adding or removing hardware.

**Security:** provide secure site access, data encryption, secure storage, and related policies.

**Extensibility:** new features or modules can be added or removed easily.

**Agility:** respond quickly to demand.


4 High availability architecture

Large sites should be accessible at all times and keep serving users normally. However, large sites are complex and distributed, and they run on inexpensive servers, open-source databases, and open-source operating systems. This makes high availability hard to guarantee, and failures are inevitable.

Improving availability is therefore a pressing problem. First, availability must be considered at the architecture level during planning. The industry usually expresses availability as a number of nines; for example, four nines (99.99%) allows about 53 minutes of downtime per year.
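The arithmetic behind the nines is simple: allowed downtime equals the minutes in a year multiplied by (1 minus availability). The sketch below assumes a 365-day year (525,600 minutes), which gives 52.56 minutes for 99.99%.

```java
// Allowed downtime per year for a given availability ratio.
// Assumes a 365-day year (leap years ignored).
class Availability {
    static final double MINUTES_PER_YEAR = 365 * 24 * 60; // 525,600

    static double downtimeMinutesPerYear(double availability) {
        return MINUTES_PER_YEAR * (1.0 - availability);
    }

    public static void main(String[] args) {
        double[] levels = {0.99, 0.999, 0.9999, 0.99999};
        for (double a : levels) {
            System.out.printf("%.3f%% -> %.2f minutes/year%n", a * 100, downtimeMinutesPerYear(a));
        }
    }
}
```

Each additional nine cuts the downtime budget by a factor of ten, which is why every extra nine is markedly more expensive to achieve.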

Different layers use different strategies, but high availability is generally achieved through redundancy and failover.

**Application layer:** usually designed to be stateless, so it does not matter which server handles a given request. High availability is typically achieved with load balancing, although session sharing or synchronization across servers must then be solved.

**Service layer:** load balancing, tiered management, fast failure (timeouts), asynchronous invocation, service degradation, idempotent design, and so on.
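Fast failure and service degradation can be combined in a few lines with `CompletableFuture.completeOnTimeout` (Java 9+). In this sketch the recommendation service, its artificial delay, and the fallback text are all hypothetical; the point is that a slow or failing dependency yields a degraded answer instead of a blocked caller.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

class DegradationDemo {
    static final String FALLBACK = "default best-seller list";

    // Hypothetical remote call, simulated with a delay.
    static CompletableFuture<String> callRecommendationService(long delayMs) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(delayMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "personalized recommendations";
        });
    }

    static String recommendations(long delayMs, long timeoutMs) {
        return callRecommendationService(delayMs)
                // Fast failure: stop waiting after the timeout and use the fallback.
                .completeOnTimeout(FALLBACK, timeoutMs, TimeUnit.MILLISECONDS)
                // Degradation on errors as well as on timeouts.
                .exceptionally(ex -> FALLBACK)
                .join();
    }

    public static void main(String[] args) {
        System.out.println(recommendations(10, 500));   // answers in time: real result
        System.out.println(recommendations(1000, 100)); // too slow: degraded result
    }
}
```

Timeouts are what keep one slow service from tying up the threads of every caller; the fallback value decides how gracefully the page degrades.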

**Data layer:** redundant backups (cold backup, hot backup [synchronous or asynchronous], and warm backup) and failover (failure confirmation, switchover, and recovery). The well-known theoretical basis for data high availability is the CAP theorem (consistency, availability, partition tolerance); consistency itself can be strong, user-level, or eventual.

5 High-performance architecture

Performance is user-centric: the goal is a fast access experience. The main metrics are short response times, high concurrent processing capacity, high throughput, and stable performance. Optimization can be divided into front-end, application-layer, code-level, and storage-level optimization.

Front-end optimization: the part of the site in front of the business logic.

Browser optimization: reduce the number of HTTP requests, use browser caching, enable compression, place CSS at the top and JavaScript at the bottom, load JavaScript asynchronously, and reduce cookie transmission.

CDN acceleration and reverse proxies.
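To see why enabling compression helps, the sketch below gzips a repetitive HTML fragment with `java.util.zip.GZIPOutputStream` and reports the size before and after. In practice compression is switched on in the web server or reverse proxy (sent as `Content-Encoding: gzip`); the HTML snippet here is made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class GzipDemo {
    static byte[] gzip(byte[] input) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(input);
        } // closing the stream flushes the data and writes the gzip trailer
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Markup, CSS, and JavaScript are highly repetitive, so they compress well.
        String html = "<div class=\"item\">product</div>\n".repeat(500);
        byte[] raw = html.getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(raw);
        System.out.println(raw.length + " bytes -> " + packed.length + " bytes");
    }
}
```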

Application-layer optimization: the servers that handle the site's business logic. Use caching, asynchronous processing, and clustering.

Code optimization: sound architecture, multithreading, resource reuse (object pools, thread pools, etc.), good data structures, JVM tuning, singletons, caching, etc.
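Resource reuse with a thread pool can be sketched as follows: a fixed set of worker threads is created once and reused for many small tasks, instead of creating a new thread per request. The pool size of 4 and the sum-of-squares workload are arbitrary examples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ThreadPoolDemo {
    static long sumOfSquares(int n, int poolSize) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final long x = i;
                parts.add(pool.submit(() -> x * x)); // n tasks share poolSize threads
            }
            long total = 0;
            for (Future<Long> f : parts) total += f.get();
            return total;
        } finally {
            pool.shutdown(); // release the worker threads
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumOfSquares(100, 4)); // 338350
    }
}
```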

Storage optimization: caching, SSDs, fiber-optic transmission, optimized reads and writes, disk redundancy, distributed storage (HDFS), NoSQL, etc.

6 Scalable architecture

Scalability differs fundamentally from pure performance tuning. Scalability weighs high performance, low cost, maintainability, and many other factors together. It aims for smooth, roughly linear performance gains, focuses on scaling the system horizontally, and achieves distributed computing with inexpensive servers, whereas ordinary performance optimization only improves the metrics of a single machine. What the two have in common is that both trade throughput against latency according to the characteristics of the application, although partitioning data horizontally brings in the constraints of the CAP theorem.
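Horizontal scaling needs a rule for which server owns which data. One common technique (not one the article names) is consistent hashing: when a node is added, only the keys that now fall to the new node move, instead of almost all keys as with `hash % n`. The node names and the count of 100 virtual nodes below are illustrative assumptions; MD5 is used only to spread keys evenly, not for security.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.SortedMap;
import java.util.TreeMap;

class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes; // replicas per node smooth out the distribution

    ConsistentHashRing(int virtualNodes) { this.virtualNodes = virtualNodes; }

    private static long hash(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xFF); // first 8 bytes as a long
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // every Java platform must support MD5
        }
    }

    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) ring.put(hash(node + "#" + i), node);
    }

    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) ring.remove(hash(node + "#" + i));
    }

    // Walk clockwise to the first virtual node at or after the key's hash.
    String nodeFor(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("empty ring");
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    public static void main(String[] args) {
        ConsistentHashRing ring = new ConsistentHashRing(100);
        for (String n : new String[] {"cache-a", "cache-b", "cache-c"}) ring.addNode(n);
        System.out.println("user:42 -> " + ring.nodeFor("user:42"));
    }
}
```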

7 Extensible architecture

The architecture should make it easy to add or remove functional modules, providing good extensibility at the code and module level.

**Modularization and componentization:** high cohesion and low coupling improve reusability and extensibility.

**Stable interfaces:** define stable interfaces so that the internal implementation can change freely while the interface stays the same.
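A small sketch of a stable interface, with all names hypothetical: callers depend only on `UserRepository`, so the in-memory implementation could later be swapped for a database-backed or remote one without touching `GreetingService`.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Caller written only against the interface.
class GreetingService {
    private final UserRepository users;

    GreetingService(UserRepository users) { this.users = users; }

    String greet(long userId) {
        return users.findName(userId).map(n -> "Hello, " + n).orElse("Hello, guest");
    }

    public static void main(String[] args) {
        UserRepository repo = new InMemoryUserRepository();
        repo.save(1L, "Alice");
        GreetingService svc = new GreetingService(repo);
        System.out.println(svc.greet(1L)); // Hello, Alice
        System.out.println(svc.greet(2L)); // Hello, guest
    }
}

// The stable contract that callers rely on.
interface UserRepository {
    Optional<String> findName(long userId);
    void save(long userId, String name);
}

// One interchangeable implementation behind the contract.
class InMemoryUserRepository implements UserRepository {
    private final Map<Long, String> data = new ConcurrentHashMap<>();

    public Optional<String> findName(long userId) { return Optional.ofNullable(data.get(userId)); }

    public void save(long userId, String name) { data.put(userId, name); }
}
```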

**Design patterns:** apply object-oriented thinking and principles, and use design patterns in code-level design.

**Message queues:** modules interact through message queues, which decouples their dependencies on each other.
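The decoupling can be illustrated in one process with a `BlockingQueue` standing in for a real broker such as Kafka or RabbitMQ. In this sketch a hypothetical order module publishes events and a hypothetical points module consumes them; neither calls the other, and both only know the queue and the message format. The end-of-stream marker exists only so the demo can finish.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class QueueDemo {
    static final String POISON = "__END__"; // demo-only end-of-stream marker

    static List<String> run(List<String> orderIds) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> handled = new ArrayList<>();

        // Consumer (points module): reacts to events, knows nothing about the producer.
        Thread consumer = new Thread(() -> {
            try {
                for (String msg = queue.take(); !msg.equals(POISON); msg = queue.take()) {
                    handled.add("points awarded for " + msg);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        // Producer (order module): publishes and moves on without waiting.
        for (String id : orderIds) queue.put("order-created:" + id);
        queue.put(POISON);

        consumer.join(); // join() makes the consumer's writes to 'handled' visible here
        return handled;
    }

    public static void main(String[] args) throws InterruptedException {
        run(List.of("1001", "1002")).forEach(System.out::println);
    }
}
```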

**Distributed services:** common modules are turned into services that other systems can use, improving reusability and extensibility.

8 Security architecture

Known problems should have effective solutions, and discovery and defense mechanisms should exist for unknown or potential problems. The first step is to raise security awareness and establish effective security mechanisms, backed at the policy and organizational level. For example, server passwords must never be leaked, passwords are changed every month and may not repeat any of the last three, and security scans run weekly. Security is strengthened by building it into institutional processes. At the same time, every aspect of security deserves attention and none can be neglected, including infrastructure security, application security, and data security.

**Infrastructure security:** the security of hardware procurement, operating systems, and the network environment. In general, buy quality products through legitimate channels, choose a secure operating system, patch vulnerabilities promptly, and install antivirus software and firewalls to protect against viruses and backdoors. Configure firewall policies, build DDoS defenses, use intrusion detection systems, and isolate subnets.

**Application security:** during development, handle known common problems correctly at the code level, including cross-site scripting (XSS), injection attacks, cross-site request forgery (CSRF), revealing error messages, HTML comments, file uploads, and path traversal. A web application firewall (such as ModSecurity) and regular vulnerability scanning further strengthen application-level security.
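For XSS, the core code-level defense is output encoding: user-supplied text is escaped before it is written into HTML, so the browser displays it instead of executing it. The escaper below is a minimal sketch of that principle for HTML body and quoted-attribute contexts; real projects should prefer a vetted encoder library or a template engine with auto-escaping. For injection attacks the usual counterpart is parameterized queries (for example JDBC `PreparedStatement`) rather than string concatenation.

```java
class HtmlEscaper {
    // Escapes the characters that can break out of HTML text or quoted attributes.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#x27;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String comment = "<script>alert('x')</script>"; // hostile user input
        System.out.println("<p>" + escape(comment) + "</p>");
    }
}
```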

**Data confidentiality and security:** storage security (data is kept on reliable devices with real-time and scheduled backups), safekeeping security (important information is stored encrypted, and suitable personnel are chosen for custody and auditing), and transmission security (preventing data theft and tampering).

Common encryption algorithms include one-way hashes (MD5, SHA), symmetric encryption (DES, 3DES, RC), and asymmetric encryption (RSA).
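The difference between a one-way hash and symmetric encryption can be shown with the JDK's standard `MessageDigest` and `Cipher` APIs. This sketch deliberately uses SHA-256 and AES-GCM rather than MD5 or DES, which are now considered too weak for new systems; for storing passwords, a salted slow hash such as bcrypt or PBKDF2 is the usual choice instead of a plain digest. The sample plaintext is made up.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

class CryptoDemo {
    // One-way hash: the same input always yields the same digest, and it cannot be decrypted.
    static String sha256Hex(String text) throws GeneralSecurityException {
        byte[] d = MessageDigest.getInstance("SHA-256").digest(text.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : d) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    // Symmetric encryption: the same secret key both encrypts and decrypts.
    static byte[] aesGcm(int mode, SecretKey key, byte[] iv, byte[] data) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(mode, key, new GCMParameterSpec(128, iv)); // 128-bit authentication tag
        return cipher.doFinal(data);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        System.out.println(sha256Hex("abc"));

        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128);
        SecretKey key = gen.generateKey();
        byte[] iv = new byte[12]; // 96-bit nonce; never reuse one with the same key
        new SecureRandom().nextBytes(iv);

        byte[] secret = aesGcm(Cipher.ENCRYPT_MODE, key, iv, "card=4111".getBytes(StandardCharsets.UTF_8));
        byte[] plain = aesGcm(Cipher.DECRYPT_MODE, key, iv, secret);
        System.out.println(new String(plain, StandardCharsets.UTF_8)); // card=4111
    }
}
```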

9 Agility

The site's architecture design and operations management must adapt to change and provide high scalability and extensibility, so that the site can respond easily to rapid business growth and sudden surges in traffic.

Beyond the architecture elements described above, agile management and agile development practices should be adopted, so that business, product, technology, and operations work as one and respond quickly to demand.

10 Large-scale architecture example

[Figure: seven-layer logical architecture of a large website]

The architecture above uses seven logical layers: (1) the client layer, (2) the front-end optimization layer, (3) the application layer, (4) the service layer, (5) the data storage layer, (6) the big data storage layer, and (7) the big data processing layer.

**Client layer:** supports PC browsers and mobile apps. The difference is that a mobile app can reach the reverse proxy server directly by IP address.

**Front-end layer:** uses DNS load balancing, CDN edge acceleration, and reverse proxy services.

**Application layer:** the website application cluster, split vertically by business, such as the product application and the member center.

**Service layer:** provides common services such as user, order, and payment services.

**Data layer:** relational database clusters (with read/write separation), NoSQL clusters, distributed file system clusters, and a distributed cache.
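Read/write separation can be sketched as a router that sends writes to the primary and spreads reads over replicas. The data source names are placeholders, and classifying statements by a leading `SELECT` is a deliberate simplification: a real router would hand out JDBC `DataSource`s, send `SELECT ... FOR UPDATE` to the primary, and account for replication lag (for example, reading a user's own recent writes from the primary).

```java
import java.util.List;
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

class ReadWriteRouter {
    private final String primary;
    private final List<String> replicas;
    private final AtomicInteger next = new AtomicInteger();

    ReadWriteRouter(String primary, List<String> replicas) {
        this.primary = primary;
        this.replicas = List.copyOf(replicas);
    }

    // Writes (and everything when no replica exists) go to the primary;
    // reads rotate over the replicas.
    String route(String sql) {
        boolean isRead = sql.trim().toLowerCase(Locale.ROOT).startsWith("select");
        if (!isRead || replicas.isEmpty()) return primary;
        return replicas.get(Math.floorMod(next.getAndIncrement(), replicas.size()));
    }

    public static void main(String[] args) {
        ReadWriteRouter router = new ReadWriteRouter("db-primary", List.of("db-replica-1", "db-replica-2"));
        System.out.println(router.route("INSERT INTO orders VALUES (1)")); // db-primary
        System.out.println(router.route("SELECT * FROM orders"));          // db-replica-1
        System.out.println(router.route("SELECT * FROM users"));           // db-replica-2
    }
}
```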

**Big data storage layer:** collects log data from the application and service layers, as well as structured and semi-structured data from relational and NoSQL databases.

**Big data processing layer:** performs offline analysis with MapReduce or real-time analysis with Storm, and stores the processed results in relational databases. (In practice, offline and real-time results are categorized according to business requirements and stored in different databases for use by the application or service layer.)
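The map/reduce idea behind offline log analysis can be shown on a single machine with Java streams: map each access-log line to its URL, then reduce by key to a count. This is not the Hadoop MapReduce API, and the space-separated `METHOD URL STATUS` log format is an assumption made for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class PageViewCount {
    static Map<String, Long> countByUrl(List<String> logLines) {
        return logLines.stream()
                .map(line -> line.split(" "))        // "map": parse each line into fields
                .filter(parts -> parts.length >= 2)  // skip malformed lines
                .collect(Collectors.groupingBy(      // "reduce": group by URL and count
                        parts -> parts[1], TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> log = List.of("GET /home 200", "GET /item/7 200", "GET /home 304", "garbage");
        System.out.println(countByUrl(log)); // {/home=2, /item/7=1}
    }
}
```

On a cluster, the same two steps run in parallel over partitions of the log, and the aggregated results are what the article describes being written back to relational databases.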