We live in a computer era full of distributed system solutions. Whether it is the top traffic products such as Alipay and wechat, or hot concepts such as blockchain and IOT, or the booming container ecological technology such as Kubernetes, the core of the technical architecture behind it is inseparable from distributed system.

Why understand distributed architecture

Systematic learning of distributed architecture design is crucial for the growth of technical people. For cloud native developers, how to design applications in line with cloud native design philosophy is often inseparable from the application of distributed system knowledge and methodology. How to design highly resilient, configurable, distributed, high performance, high fault tolerance, more secure, more resilient, fast delivery of native applications is often an important reference to measure the level of developers.

However, distributed system is a very large concept, from the architecture design, research and development process, operation and maintenance deployment, engineering efficiency and other aspects of the deep knowledge can be mined, learning costs and is relatively large. Recently, I have sorted out some books and articles related to distributed development that I have read in the past, plus some of my experience in distributed development to share with you. This article, as the beginning, gives an overview of the knowledge in general, and the subsequent chapters will be elaborated in combination with code practice. The draft is hasty and the level is limited. Welcome to study and correct together.

Large view of distributed system

A, design,

Gateway mode, Gateway

function

  • To request a route, the client invokes the Gateway directly, which forwards the route to the registration service
  • The service registers, the back-end service registers the API, and the Gateway takes care of routing
  • Load balancing: supports multiple load policies
    • round robin
    • Stochastic equalization algorithm
    • Multi-weight load
    • The session adhesion
    • other
  • Security features, HTTPS, account authentication, and other security features are supported
  • Grayscale publishing can be performed for features such as service versions or tenants
  • API aggregation: aggregates multiple back-end interfaces to reduce the number of client calls
  • API choreography, which concatenates multiple apis to accomplish specific services

Design points

  • Availability, high availability must be guaranteed
  • Scalability, which can be flexibly extended to support specific services such as specific business flow control
  • High performance, usually implemented using asynchronous IO model frameworks such as Java Netty and Go Channel
  • Security, such as encrypted communication, authentication, DDOS defense, etc
  • operations
    • Application monitoring, including capacity, performance, anomaly detection, etc
    • Flexible expansion with high resilience to cope with peak value at low cost
  • architecture
    • Decoupled from business, providing extension extension mechanisms such as Plugin and Serverless to support back-end business
    • Service isolation: Gateways can be divided according to back-end services, so that different services use different gateways
    • The gateway is deployed near the back end to ensure minimum network loss and optimal performance

Sidecar mode

The value of

  • Separate control and logic, business logic and routing, flow control, fuses, idempotent, service discovery, authentication, and other control components
  • Applicable scenario
    • The Sidebar process and service process are deployed on the same node and communicate through network protocol
    • Multilingual hybrid distributed system extension
    • Applications are provided by multiple parties

Design points

  • The standard Service protocol, Sidebar to Service, Sidebar to Sidebar protocol is as language decoupled as possible
  • Aggregate control logic such as flow control, fuses, idempotent, retry, and reduced business logic
  • Do not use service intrusion for interprocess communication such as semaphore, shared memory, preferentially use local network communication such as TPCP or HTTP

Service Mesh

The new generation of microservices architecture is essentially an infrastructure layer for communication between services.



The characteristics of

  • Interapplication communication intermediate layer
  • Lightweight Web proxy
  • Decouple the application
  • The application is not aware

Mainstream framework

A distributed lock

The solution

  • Redis distributed lock, SETNX key Value PX expireTime
    • Value generation, preferably globally unique such as TraceID, can be generated using /dev/urandom
    • The unit of expireTime is milliseconds. An expired lock is automatically released, and the lock holder is guaranteed to compete for resources to complete calculation within the expiration time
  • Pessimistic lock, lock first, then operation, throughput bottom
  • Optimistic lock: the version number is used to achieve high throughput and lock exceptions may occur. It is suitable for multi-read situations
  • CAS, scenarios that modify shared data sources can replace distributed locks

Design points

  • Exclusivity. Only one client can acquire the lock for any condition
  • Locks have automatic release modes, such as timeout release
  • Locks must be highly available and persistent
  • The lock must be non-blocking and reentrant
  • Deadlocks are avoided. The client must obtain the lock eventually, and the lock cannot be released when exceptions occur
  • Cluster fault tolerance, part of the cluster machine failure, lock operation is still available

Configuration center

  • Static configuration, environment and software startup configuration
  • Dynamic configuration, such as flow control switch, fuse switch, etc

Asynchronous communication

  • Request-responsive, in which the sender sends a request directly to the receiver
    • The sender actively polls
    • The sender registers a callback function that calls back the sender when the processing is complete
  • Event-driven Design (EDA)
    • Message subscription, the sender publishes the message, the receiver subscribes and consumes the message
    • Broker middlemen, sending messages to the Broker and receiving messages to the Broker subscription, decoupled from each other, such as the middleware RocketMQ
    • Things drive design advantages
      • The dependency between services is removed
      • Service isolation is high

idempotence

  • The essence of an operation is that no matter how many times it is performed, the result is always the same
  • The idempotent core is a globally unique ID, and links are idempotent according to the global ID. Multiple implementation modes can be selected according to the service complexity
    • The ID of the database grows automatically
    • The UUID is generated locally
    • Redis production id
    • Twitter’s open-source algorithm Snowflake
  • HTTP idempotent, except POST, HEAD, GET, OPTIONS, DELETE, PUT all meet the idempotent

Second, the performance

Distributed cache

  • Cache update mode
    • Cache Aside, used to maintain Cache invalidation, hits, updates, etc
    • Read/Write Through: The cache agent updates the database. The application view has only one copy of the database
    • Write Behind Cache, an I/O acceleration method, is used to update the database asynchronously

Asynchronous processing

  • Push model, central scheduling, high complexity
  • Pull model, no central scheduling, low complexity
  • Push + Pull model

Database extension

  • Database sharding
    • Vertical fragmentation
      • Field splitting: Split the fields with different change frequencies into different tables
    • Level of fragmentation
      • Hash algorithm to divide, high data dispersion, reduce the possibility of hot spots
      • Time range sharding ensures data continuity
      • Division by service type, such as data classification and tenant separation
    • Sharding Design Essentials
      • Reserve enough space for sharding to avoid re-sharding
      • Shard aggregation is done in parallel
      • Businesses do as little as possible to transact across shards

Three, fault tolerance,

System availability

  • MTTF, Mean Time To Failure, the average length of Time it takes for a system To fail, the longer the better
  • MTTR,Mean Time To Recover, indicates the average fault recovery Time. The shorter the Time, the better
  • Availability calculation formula, Availability= MTTF/(MTTF+MTTR)

Service degradation

  • Reduced consistency
    • Strong consistency: All synchronization consistency is switched to final consistency to improve throughput
    • Weak consistency, sacrificing consistency for overall service reliability when necessary
  • Disabling secondary Services
    • Disable secondary applications to release physical resources for different applications
    • For the same application, turn off secondary functions and allocate more resources to core functions
  • Simplified service functionality
    • Such as simplifying business processes and reducing communication data

Service current limiting

  • Current limiting purpose
    • One of the SLA guarantee methods
    • It can save capacity planning cost to some extent by dealing with sudden peak spike flow
    • Tenant isolation policy to prevent some users from occupying other users’ resources and causing widespread service unavailability
  • Current limiting way
    • Service degradation
    • Denial of service
  • The solution
    • Service weight division. In a multi-tenant environment, resources are allocated according to weight to ensure resources for important customers
    • Service delay processing. Join the service buffer queue to delay service pressure for peak load cutting
    • Service elastic scaling, dependent service monitoring, elastic scaling capacity
  • Flow control algorithm
    • counter
      • The number of requests made by a user in a specified period is saved in a single machine or cluster. When the number reaches the threshold, flow control is triggered
    • Queue algorithm
      • FIFO queue
        • The request speed fluctuates, the consumption speed is uniform, and the queue is full
      • Weight of the queue
        • Priority queues are divided by service. Different queues have different weights
      • The key of queue algorithm design: the presetting of queue length is very important
        • The queue is too long, the flow control is not in effect, the service has been killed
        • The queue is too short, the flow control is triggered frequently, and the experience is poor
    • Funnel algorithm
      • Essentially a queue + limiter implementation, limiter guarantees uniform consumption speed class TCP Sync Backlog
      • Uniform forwarding speed
    • The token bucket
      • The middleman has issued a token to the bucket at a constant rate, and the service request starts to service if it gets the token, otherwise it is not processed
      • The forwarding speed is uneven, and the traffic is accumulated when the traffic is small and consumed when the traffic is large
    • Dynamic flow control
      • Real-time computing service capability such as QPS, if the service RT is too large, the QPS will be reduced
  • Design points
    • Manual switch, active operation and emergency use
    • Monitoring notification, when limiting the flow of stakeholders to be clear
    • User awareness, such as return of a specific error message (error code/ error message)
    • Link identifier. Adding the traffic limiting identifier for RPC links facilitates different processing in traffic limiting scenarios

Fuse design

  • scenario
    • Overload protection a measure taken to protect the system from failure when the system is overloaded
    • Prevents the application from constantly trying to do something that might fail
  • Three state
    • The system requires an error counter based on the timeline, and switches to the Open state if the number of errors reaches the threshold
    • Open: indicates the disconnected state. All pairs of services immediately return an error to the request without calling the back-end service for calculation
    • Half-open: Allows some traffic to enter and process the request. If the request is successful, it switches to the Closed state according to a policy
  • Design points
    • Defines the type of error that triggers a fuse
    • All error requests that trigger fuses must have a uniform log output
    • Circuit breakers must have service diagnosis and automatic recovery capabilities
    • It is desirable to have a manual switch for the fuse mechanism for switching between the three states
    • Circuit breaker should be divided into business, business isolation circuit breaker

Compensating transactions

  • CAP
    • Consistence, Availability, Partition Tolerance
  • BASE
    • Basic Availabillity
    • Soft State indicates the Soft State
    • Eventual Consistency
  • Design For Failure
  • A: Exponential Blackoff

Fourth, the conversation

The deployment of

  • infrastructure
    • cloud
      • Public clouds
      • Private clouds
      • A hybrid cloud
    • Container technology
      • Docker
      • Kubernetes
  • Deployment strategy
    • Stop the deployment
    • Scroll to deploy
    • Blue green deployment
    • Gray scale deployment
    • A/B testing

Configuration management

  • Ansible
  • Puppet
  • Shippable

monitoring

CI and CD

5. Engineering efficiency

Agile management

  • Scrum

Continuous integration

Continuous delivery

Write in the last

Distributed system has been widely used in alibaba economy, in the author’s elastic technical team, for example, when business enough scale, the final technical problems is through practice of distributed system architecture design concept and method to solve, it can be said that the knowledge and methodology of distributed system architecture is the current Internet application scale after the general solution.

Learning distributed system design is not accomplished overnight. It is necessary to constantly acquire theoretical knowledge, and then put the theory into practice, and finally maximize the value of knowledge through time and time again tuning. The author of the final proposal is the first theory, practice, practice, do not compromise, the so-called paper come zhongjue shallow, must know this to practice, and your mutual aspiration.

“Alibaba cloudnative wechat public account (ID: Alicloudnative) focuses on micro Service, Serverless, container, Service Mesh and other technical fields, focuses on cloudnative popular technology trends, large-scale implementation of cloudnative practice, and becomes the technical public account that most understands cloudnative developers.”