• Theory of CAP
  • The BASE theory
  • Distributed cache
  • Consistent hashing
  • Cache consistency
  • A distributed lock
  • Use Zookeeper to implement distributed locks
  • Distributed transaction
  • Distributed transaction implementation
  • Distributed authentication & Distributed authorization
  • Introduction

    What technology is hot right now?

    Big data, artificial intelligence, blockchain, edge computing, microservices... underneath all of these cutting-edge technologies lies distribution.

    The core of distribution: splitting things apart.

    The difference between microservices and distributed systems:

    • Distributed: any split counts, whether horizontal or vertical
    • Microservices: vertical splits only (split along business lines; e-commerce: users, payments, shopping…), carried to the finest granularity

    Horizontal split: separating layers such as JSP/servlet, Service, and DAO. Vertical split: splitting into independent small projects along business lines.

    Theory of CAP

    Principles that must be considered in any distributed system.

    • C: Consistency (strong consistency): the data on all nodes is identical at every moment
    • A: Availability: the system as a whole remains available
    • P: Partition tolerance: the system keeps working even when parts of it fail

    CAP theorem: in any distributed system, C, A, and P cannot all hold at once; at most two can be satisfied.

    Basics: in practice P must always be satisfied, because the networks in distributed systems are unreliable and partitions do happen. The real choice is therefore between C and A.

    Here’s an example:

    Suppose computer A fails while data is being written. The system keeps running, so partition tolerance is satisfied. If consistency is to be satisfied as well, the write must be rolled back, sacrificing availability; otherwise computer B ends up holding data that computer A does not, and consistency is violated.



    The BASE theory

    The purpose of the BASE theory: to make up for the limitations of CAP.

    Concepts to understand:

    • Strong consistency: data is consistent at every moment (or within a very short window)
    • Eventual consistency: it is enough that the data becomes consistent eventually
    • Soft state: with multiple nodes deployed, temporary data inconsistency is allowed

    BASE tries to get as close to all three of CAP as possible by substituting eventual consistency for strong consistency (C).

    The BASE theory: prefer to satisfy A and P, and therefore give up strong consistency C, replacing it with eventual consistency.

    BASE: Basically Available, Soft state, Eventually consistent

    Distributed cache





    Common cache problems





    Cache breakdown:

    • A hotspot key expires, and the flood of requests for it goes straight to the DB.

    Solution:

    1. A monitoring thread that refreshes hotspot data before it can expire (see the sketch below)
    2. Set expiration times in advance so that hotspot data does not expire during peak hours
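
    A minimal sketch of solution 1, assuming a plain in-memory map as the cache; the key hot:product:1, the 50-second refresh interval, and loadFromDb are hypothetical placeholders:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// A background thread re-warms a hotspot key before it can expire, so user
// requests never miss the hot entry and fall through to the DB.
public class HotKeyRefresher {
    static final Map<String, String> cache = new ConcurrentHashMap<>();

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Refresh every 50s, comfortably inside a hypothetical 60s expiration window.
        scheduler.scheduleAtFixedRate(
                () -> cache.put("hot:product:1", loadFromDb("hot:product:1")),
                0, 50, TimeUnit.SECONDS);
    }

    static String loadFromDb(String key) {
        return "fresh value for " + key; // placeholder for the real DB query
    }
}
```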

    Cache avalanche:

    • A large number of cache entries become invalid at once (many keys were given the same expiration time, or the cache server goes down)

    Solution:

    1. Stagger cache expiration times so entries do not all expire together (see the sketch below)
    2. Deploy the cache as a cluster
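
    A minimal sketch of solution 1: add random jitter to each key's expiration time so entries written together do not all expire at the same instant. The 600s base TTL and 300s jitter window are arbitrary example values:

```java
import java.util.concurrent.ThreadLocalRandom;

// Staggered TTLs: entries cached at the same moment expire at different times,
// so they cannot all fail at once and stampede the DB.
public class StaggeredTtl {
    static final int BASE_TTL_SECONDS = 600;

    static int ttlWithJitter() {
        // 600s base plus 0-299s of jitter spreads expirations over five minutes
        return BASE_TTL_SECONDS + ThreadLocalRandom.current().nextInt(300);
    }

    public static void main(String[] args) {
        System.out.println("TTL for this entry: " + ttlWithJitter() + "s");
    }
}
```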

    Cache penetration:

    • A malicious attack. Normally, meaningless keys are never cached, so an attacker can issue repeated requests for keys that do not exist and every one of them hits the DB.

    Solution:

    Cache the meaningless results as well, with a relatively short expiration time, as sketched below.
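
    A minimal sketch of this idea, using a ConcurrentHashMap as a stand-in cache; a real cache such as Redis would give the sentinel entry the short TTL that a plain map cannot express:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Cache "no such record" results too, so repeated requests for meaningless
// keys are answered from the cache instead of reaching the DB.
public class NullCaching {
    static final String EMPTY = "<empty>"; // sentinel meaning "DB has no such row"
    static final Map<String, String> cache = new ConcurrentHashMap<>();

    static String get(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return EMPTY.equals(cached) ? null : cached; // hit, possibly a cached miss
        }
        String fromDb = queryDb(key);
        // In a real cache the EMPTY entry would get a short TTL (e.g. 60s).
        cache.put(key, fromDb == null ? EMPTY : fromDb);
        return fromDb;
    }

    static String queryDb(String key) {
        return null; // placeholder: pretend the DB has no such record
    }

    public static void main(String[] args) {
        System.out.println(get("no-such-id")); // first request reaches the "DB"
        System.out.println(get("no-such-id")); // second request is absorbed by the cache
    }
}
```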

    Generally speaking, local cache is level 1 cache, and distributed cache is level 2 cache.

    Consistent hashing

    Hash algorithm: a mapping

    Strings, images, and objects are converted to numbers

    Example: hash(a.png) → 12312313

    Consistent hashing was originally devised to solve distributed caching problems





    Mapping data directly to cache servers has one problem: when the number of servers changes, the existing placement becomes invalid and the cache is effectively lost.
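
    A minimal sketch of the problem, assuming the common hash(key) % N placement scheme; the file names are arbitrary:

```java
// With modulo placement, growing the cluster from 3 to 4 servers sends most
// keys to a different server, so almost the whole cache misses at once.
public class ModuloPlacement {
    public static void main(String[] args) {
        String[] keys = {"a.png", "b.png", "c.png", "d.png", "e.png"};
        for (String key : keys) {
            int before = Math.floorMod(key.hashCode(), 3); // 3 cache servers
            int after = Math.floorMod(key.hashCode(), 4);  // one server added
            System.out.printf("%s: server %d -> server %d%n", key, before, after);
        }
    }
}
```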





    The fix for the server-count change problem: consistent hashing




    Hash skew problem




    Resolve hash skew: virtual nodes

    Each real node is given multiple virtual nodes on the ring; a request first locates a virtual node, which then maps to its corresponding real node.
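
    A minimal sketch of a consistent hash ring with virtual nodes, using a TreeMap as the ring; the server names, the virtual-node count of 100, and the toy hash function are all illustrative choices:

```java
import java.util.Map;
import java.util.TreeMap;

// Each real server appears on the ring many times ("cache-1#0", "cache-1#1", ...).
// A key is served by the first virtual node clockwise from its hash, and the
// virtual node's value names the real server behind it.
public class ConsistentHashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private static final int VIRTUAL_NODES = 100; // per real server

    void addServer(String server) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.put(hash(server + "#" + i), server);
        }
    }

    void removeServer(String server) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.remove(hash(server + "#" + i));
        }
    }

    String serverFor(String key) {
        // First ring position at or after the key's hash; wrap to the start if none.
        Map.Entry<Integer, String> entry = ring.ceilingEntry(hash(key));
        return entry != null ? entry.getValue() : ring.firstEntry().getValue();
    }

    private static int hash(String s) {
        return s.hashCode() & 0x7fffffff; // toy non-negative hash, not production grade
    }

    public static void main(String[] args) {
        ConsistentHashRing ring = new ConsistentHashRing();
        ring.addServer("cache-1");
        ring.addServer("cache-2");
        ring.addServer("cache-3");
        System.out.println("a.png -> " + ring.serverFor("a.png"));
        ring.removeServer("cache-2"); // only keys on cache-2's arcs move
        System.out.println("a.png -> " + ring.serverFor("a.png"));
    }
}
```

    Because each server owns many small arcs scattered around the ring, removing one server spreads its keys across all the remaining servers instead of dumping them on a single neighbor, which is what counters hash skew.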





    Cache consistency

    The approaches that come to mind first, but are not recommended: update the cache as soon as the DB is updated; or delete the cache first and then update the DB.

    Recommendation: Delete the cache immediately after DB is updated

    Analysis (counterexample):

    Updating the cache as soon as the DB is updated (not recommended)

    Problem:

    If thread A updates first and thread B updates later, the final cache value may still be thread A's. Cause: thread execution speeds differ, so A's cache write can land after B's even though A's DB write came first, leaving the cache holding A's stale value while the DB holds B's.

    Recommendation: delete the cache immediately after the DB is updated. Reason: deletes have no ordering problem; whichever delete lands last, the entry is simply gone, and the next read reloads the fresh value from the DB.

    (Even with this recommended approach, there is still a very small probability of error.)
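
    A minimal sketch of the recommended order, with two in-memory maps standing in for the DB and the cache:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Update the DB first, then delete (never overwrite) the cached copy.
// The next reader misses and reloads the fresh value from the DB.
public class WriteThenEvict {
    static final Map<String, String> db = new ConcurrentHashMap<>();
    static final Map<String, String> cache = new ConcurrentHashMap<>();

    static void update(String key, String value) {
        db.put(key, value); // 1. commit the change to the database
        cache.remove(key);  // 2. evict the cached copy
    }

    static String read(String key) {
        // Cache-aside read: on a miss, reload from the DB and repopulate the cache.
        return cache.computeIfAbsent(key, db::get);
    }
}
```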



    The error case for this recommendation, which occurs with extremely small probability: 1. thread B gets a cache miss and reads the old value from the DB; 2. thread A updates the DB and deletes the cache; 3. thread B writes the old value it read back into the cache. The premise: the writer thread outruns the reader thread, which is almost impossible, since a DB write is slower than a read.

    Q: Can this extremely rare error be avoided?

    A: Yes! The error can only occur when a read and a write interleave, so locking or read/write separation eliminates it. The general advice, though, is not to bother.

    The order can also be reversed: delete the cache first, then update the DB.

    Example: thread A deletes the cache; thread B misses the cache and reads the old value from the DB; thread A then updates the DB; finally thread B writes the old value back into the cache, leaving it stale.

    The premise of this error: the write is slow and the read is fast, which is the common case, so this failure is far more likely to occur.

    A distributed lock




    Distributed locks can be implemented with database unique constraints, Redis, or Zookeeper.

    In architecture there is no best choice, only the right one.

    Use Zookeeper to implement distributed locks

    Zookeeper: a distributed coordination framework whose data model is a tree (similar to a DOM tree)

    The tree contains many nodes; the node type used for distributed locks is the ephemeral (temporary) sequential node.

    Zookeeper provides the following two kinds of support:

    • Watchers: a client can watch the state of a node, and the corresponding event fires when that state changes
    • Ephemeral node deletion: an ephemeral node is deleted automatically when the client that created it disconnects

    Lock: only one thread/process can hold it at a time.
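
    A minimal sketch of the scheme described above, using the raw org.apache.zookeeper client. It assumes a ZooKeeper ensemble is reachable and that the /locks parent node already exists; error handling, retries, and session management are elided:

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class ZkLock {
    private final ZooKeeper zk;
    private String myNode; // e.g. /locks/lock-0000000003

    public ZkLock(ZooKeeper zk) { this.zk = zk; }

    public void lock() throws Exception {
        // 1. Create an ephemeral sequential node; ZooKeeper appends an increasing suffix.
        myNode = zk.create("/locks/lock-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        while (true) {
            List<String> children = zk.getChildren("/locks", false);
            Collections.sort(children);
            // 2. The node with the lowest sequence number holds the lock.
            if (myNode.equals("/locks/" + children.get(0))) return;
            // 3. Otherwise watch the node just before ours and block until it is deleted.
            int myIndex = children.indexOf(myNode.substring("/locks/".length()));
            String prev = "/locks/" + children.get(myIndex - 1);
            CountDownLatch latch = new CountDownLatch(1);
            Stat stat = zk.exists(prev, event -> {
                if (event.getType() == Watcher.Event.EventType.NodeDeleted) latch.countDown();
            });
            if (stat != null) latch.await(); // predecessor gone => re-check from the top
        }
    }

    public void unlock() throws Exception {
        // Deleting our node (or disconnecting) releases the lock and wakes the next waiter.
        zk.delete(myNode, -1);
    }
}
```

    Because each waiter watches only its immediate predecessor, releasing the lock wakes exactly one client instead of stampeding all of them; and because the node is ephemeral, a crashed client releases the lock automatically.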



    Distributed transaction

    Local transactions: one transaction per database connection (transaction : connection = 1 : 1)
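
    A minimal JDBC sketch of this 1:1 relationship; the connection URL, credentials, and table names are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// One Connection, one transaction: both statements commit or roll back together,
// but only because they run on the same connection to the same database.
public class LocalTx {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/order_db", "user", "password")) {
            conn.setAutoCommit(false); // begin the transaction on THIS connection
            try {
                conn.prepareStatement("INSERT INTO orders(id) VALUES (1)").executeUpdate();
                conn.prepareStatement("UPDATE stock SET count = count - 1 WHERE id = 1").executeUpdate();
                conn.commit();
            } catch (SQLException e) {
                conn.rollback(); // undo both statements on failure
                throw e;
            }
        }
    }
}
```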

    Local transactions cannot be used when this 1:1 relationship does not hold. For example:

    1. One node uses multiple database instances. Two databases have been set up locally: an order database and a payment database.

    Order operation = order database + payment database

    2. In a distributed system, multiple services deployed on different nodes access the same database.

    The usefulness of local transactions is very limited.

    Distributed transaction implementation

    Example

    • Order operation = Order database + payment database
    • Server A: Order operation, order database
    • Server B: Payment database

    Simulating the order operation as a single transaction:

    begin transaction:

    1. write to the order database;

    2. write to the payment database;

    commit/rollback;

    The above runs into problems in a distributed environment:

    begin transaction:

    1. write to the order database locally (server A);

    2. remotely invoke the payment database on server B;

    commit/rollback;

    If the local order write succeeds and the remote payment also succeeds, but the response is lost to a network problem, server A cannot tell whether server B committed: A may roll back while B has already committed, and the user is told the order failed.



    As shown above, distributed transactions are used where local transactions cannot operate.

    Use 2PC to implement distributed transactions

    2PC: Two-Phase Commit, consisting of one coordinator and multiple participants (similar to a master-slave structure)

    Coordinator: the transaction manager (TM)

    Participant: the resource manager (RM)

    The two phases are:

    Prepare phase: when the transaction begins, the coordinator sends a prepare message to all participants, asking them to execute the transaction. Each participant receives the message and either agrees or refuses; if it agrees, it executes the transaction locally and writes the log, but does not commit.

    Commit phase: if all participants agreed, the coordinator sends a commit request to all of them; otherwise a rollback is performed.



    The general idea: perform a task in two steps. 1. Ask every node whether it can perform the task. 2. If everyone agrees, ask them all to commit together.
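
    A toy, in-process sketch of that idea; Participant and Coordinator are hypothetical types with no networking, timeouts, or recovery logging:

```java
import java.util.List;

// Phase 1 asks every participant to execute and vote; phase 2 commits
// everywhere only if every vote was "yes", otherwise rolls everything back.
interface Participant {
    boolean prepare();  // execute locally, write the log, but do NOT commit
    void commit();
    void rollback();
}

class Coordinator {
    static void runTransaction(List<Participant> participants) {
        boolean allAgreed = true;
        for (Participant p : participants) {
            allAgreed &= p.prepare(); // a real coordinator would also time out here
        }
        for (Participant p : participants) {
            if (allAgreed) p.commit(); else p.rollback();
        }
    }
}
```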

    Disadvantages of 2PC: 1. execution is blocking, which introduces latency; 2. if the network is weak while the coordinator is sending commit requests, some participants may commit while others do not; 3. it is a centralized architecture with a single point of failure.

    Other distributed transaction solutions: three-phase commit (3PC), TCC, and message-queue-based distributed transactions.

    Note: if you need strict transaction consistency, look at the Paxos algorithm (used by Google's Chubby).

    Distributed authentication & Distributed authorization

    Authentication scenarios: in-house development, or a third-party platform

    In-house development: distributed authentication

    Third-party platform: distributed authorization (single sign-on, SSO)

    Distributed authorization uses the OAuth2.0 authorization protocol, whose flow is as follows:
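
    In its most common variant, the authorization code grant, the flow runs roughly like this:

    1. The client redirects the user to the authorization server's authorization endpoint.
    2. The user logs in and grants consent.
    3. The authorization server redirects back to the client with an authorization code.
    4. The client exchanges the code for an access token at the token endpoint.
    5. The client presents the access token to the resource server to access the user's data.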




    Tip: OAuth2.0 is not backward compatible with OAuth1.0