• Theory of CAP
  • The BASE theory
  • Distributed cache
  • Consistent hashing
  • Cache consistency
  • A distributed lock
  • Use Zookeeper to implement distributed locks
  • Distributed transaction
  • Distributed transaction implementation
  • Distributed authentication & Distributed authorization
  • Introduction

    What technology is hot right now?

    Big data, artificial intelligence, blockchain, edge computing, microservices... underneath all of these cutting-edge technologies lies distribution.

    The core of distribution: splitting things apart.

    The difference between microservices and distributed systems:

    • Distributed: any split counts, whether horizontal or vertical
    • Microservices: vertical splits only (split along business lines; e-commerce: users, payments, shopping…), carried to the finest granularity

    Horizontal split: separating layers such as JSP/servlet, Service, and DAO. Vertical split: splitting into independent small projects along business lines.

    Theory of CAP

    Principles that must be considered in any distributed system.

    • C: Consistency (strong consistency): the data on all nodes is identical at every moment
    • A: Availability: the system as a whole remains available
    • P: Partition tolerance: the system keeps working even when parts of it fail

    CAP theorem: in any distributed system, C, A, and P cannot all hold at once; at most two can be satisfied.

    Basics: in practice P must always be satisfied, because the networks in distributed systems are unreliable and partitions do happen. The real choice is therefore between C and A.

    Here’s an example:

    Suppose computer A fails while data is being written. The system keeps running, so partition tolerance is satisfied. If consistency is to be satisfied as well, the write must be rolled back, sacrificing availability; otherwise computer B ends up holding data that computer A does not, and consistency is violated.



    The BASE theory

    The purpose of the BASE theory: to make up for the limitations of CAP.

    Concepts to understand:

    • Strong consistency: data is consistent at every moment (or within a very short window)
    • Eventual consistency: it is enough that the data becomes consistent eventually
    • Soft state: with multiple nodes deployed, temporary data inconsistency is allowed

    BASE tries to get as close to all three of CAP as possible by substituting eventual consistency for strong consistency (C).

    The BASE theory: prefer to satisfy A and P, and therefore give up strong consistency C, replacing it with eventual consistency.

    BASE: Basically Available, Soft state, Eventually consistent

    Distributed cache





    Common cache problems





    Cache breakdown:

    • A hotspot key expires, and the flood of requests for it goes straight to the DB.

    Solution:

    1. A monitoring thread that refreshes hotspot data before it can expire (see the sketch below)
    2. Set expiration times in advance so that hotspot data does not expire during peak hours
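
    A minimal sketch of solution 1, assuming a plain in-memory map as the cache; the key hot:product:1, the 50-second refresh interval, and loadFromDb are hypothetical placeholders:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// A background thread re-warms a hotspot key before it can expire, so user
// requests never miss the hot entry and fall through to the DB.
public class HotKeyRefresher {
    static final Map<String, String> cache = new ConcurrentHashMap<>();

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Refresh every 50s, comfortably inside a hypothetical 60s expiration window.
        scheduler.scheduleAtFixedRate(
                () -> cache.put("hot:product:1", loadFromDb("hot:product:1")),
                0, 50, TimeUnit.SECONDS);
    }

    static String loadFromDb(String key) {
        return "fresh value for " + key; // placeholder for the real DB query
    }
}
```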

    Cache avalanche:

    • A large number of cache entries become invalid at once (many keys were given the same expiration time, or the cache server goes down)

    Solution:

    1. Stagger cache expiration times so entries do not all expire together (see the sketch below)
    2. Deploy the cache as a cluster
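
    A minimal sketch of solution 1: add random jitter to each key's expiration time so entries written together do not all expire at the same instant. The 600s base TTL and 300s jitter window are arbitrary example values:

```java
import java.util.concurrent.ThreadLocalRandom;

// Staggered TTLs: entries cached at the same moment expire at different times,
// so they cannot all fail at once and stampede the DB.
public class StaggeredTtl {
    static final int BASE_TTL_SECONDS = 600;

    static int ttlWithJitter() {
        // 600s base plus 0-299s of jitter spreads expirations over five minutes
        return BASE_TTL_SECONDS + ThreadLocalRandom.current().nextInt(300);
    }

    public static void main(String[] args) {
        System.out.println("TTL for this entry: " + ttlWithJitter() + "s");
    }
}
```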

    Cache penetration:

    • A malicious attack. Normally, meaningless keys are never cached, so an attacker can issue repeated requests for keys that do not exist and every one of them hits the DB.

    Solution:

    Cache the meaningless results as well, with a relatively short expiration time, as sketched below.
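
    A minimal sketch of this idea, using a ConcurrentHashMap as a stand-in cache; a real cache such as Redis would give the sentinel entry the short TTL that a plain map cannot express:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Cache "no such record" results too, so repeated requests for meaningless
// keys are answered from the cache instead of reaching the DB.
public class NullCaching {
    static final String EMPTY = "<empty>"; // sentinel meaning "DB has no such row"
    static final Map<String, String> cache = new ConcurrentHashMap<>();

    static String get(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return EMPTY.equals(cached) ? null : cached; // hit, possibly a cached miss
        }
        String fromDb = queryDb(key);
        // In a real cache the EMPTY entry would get a short TTL (e.g. 60s).
        cache.put(key, fromDb == null ? EMPTY : fromDb);
        return fromDb;
    }

    static String queryDb(String key) {
        return null; // placeholder: pretend the DB has no such record
    }

    public static void main(String[] args) {
        System.out.println(get("no-such-id")); // first request reaches the "DB"
        System.out.println(get("no-such-id")); // second request is absorbed by the cache
    }
}
```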

    Generally speaking, local cache is level 1 cache, and distributed cache is level 2 cache.

    Consistent hashing

    Hash algorithm: a mapping

    Strings, images, and objects are converted to numbers

    Example: hash(a.png) → 12312313

    Consistent hashing was originally devised to solve distributed caching problems





    Mapping data directly to cache servers has one problem: when the number of servers changes, the existing placement becomes invalid and the cache is effectively lost.
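
    A minimal sketch of the problem, assuming the common hash(key) % N placement scheme; the file names are arbitrary:

```java
// With modulo placement, growing the cluster from 3 to 4 servers sends most
// keys to a different server, so almost the whole cache misses at once.
public class ModuloPlacement {
    public static void main(String[] args) {
        String[] keys = {"a.png", "b.png", "c.png", "d.png", "e.png"};
        for (String key : keys) {
            int before = Math.floorMod(key.hashCode(), 3); // 3 cache servers
            int after = Math.floorMod(key.hashCode(), 4);  // one server added
            System.out.printf("%s: server %d -> server %d%n", key, before, after);
        }
    }
}
```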





    The fix for the server-count change problem: consistent hashing




    Hash skew problem




    Resolve hash skew: virtual nodes

    Each real node is given multiple virtual nodes on the ring; a request first locates a virtual node, which then maps to its corresponding real node.
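
    A minimal sketch of a consistent hash ring with virtual nodes, using a TreeMap as the ring; the server names, the virtual-node count of 100, and the toy hash function are all illustrative choices:

```java
import java.util.Map;
import java.util.TreeMap;

// Each real server appears on the ring many times ("cache-1#0", "cache-1#1", ...).
// A key is served by the first virtual node clockwise from its hash, and the
// virtual node's value names the real server behind it.
public class ConsistentHashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private static final int VIRTUAL_NODES = 100; // per real server

    void addServer(String server) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.put(hash(server + "#" + i), server);
        }
    }

    void removeServer(String server) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.remove(hash(server + "#" + i));
        }
    }

    String serverFor(String key) {
        // First ring position at or after the key's hash; wrap to the start if none.
        Map.Entry<Integer, String> entry = ring.ceilingEntry(hash(key));
        return entry != null ? entry.getValue() : ring.firstEntry().getValue();
    }

    private static int hash(String s) {
        return s.hashCode() & 0x7fffffff; // toy non-negative hash, not production grade
    }

    public static void main(String[] args) {
        ConsistentHashRing ring = new ConsistentHashRing();
        ring.addServer("cache-1");
        ring.addServer("cache-2");
        ring.addServer("cache-3");
        System.out.println("a.png -> " + ring.serverFor("a.png"));
        ring.removeServer("cache-2"); // only keys on cache-2's arcs move
        System.out.println("a.png -> " + ring.serverFor("a.png"));
    }
}
```

    Because each server owns many small arcs scattered around the ring, removing one server spreads its keys across all the remaining servers instead of dumping them on a single neighbor, which is what counters hash skew.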





    Cache consistency

    The approaches that come to mind first, but are not recommended: update the cache as soon as the DB is updated; or delete the cache first and then update the DB.

    Recommendation: Delete the cache immediately after DB is updated

    Analysis (counterexample):

    Updating the cache as soon as the DB is updated (not recommended)

    Problem:

    If thread A updates first and thread B updates later, the final cache value may still be thread A's. Cause: thread execution speeds differ, so A's cache write can land after B's even though A's DB write came first, leaving the cache holding A's stale value while the DB holds B's.

    Recommendation: delete the cache immediately after the DB is updated. Reason: deletes have no ordering problem; whichever delete lands last, the entry is simply gone, and the next read reloads the fresh value from the DB.

    (Even with this recommended approach, there is still a very small probability of error.)
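
    A minimal sketch of the recommended order, with two in-memory maps standing in for the DB and the cache:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Update the DB first, then delete (never overwrite) the cached copy.
// The next reader misses and reloads the fresh value from the DB.
public class WriteThenEvict {
    static final Map<String, String> db = new ConcurrentHashMap<>();
    static final Map<String, String> cache = new ConcurrentHashMap<>();

    static void update(String key, String value) {
        db.put(key, value); // 1. commit the change to the database
        cache.remove(key);  // 2. evict the cached copy
    }

    static String read(String key) {
        // Cache-aside read: on a miss, reload from the DB and repopulate the cache.
        return cache.computeIfAbsent(key, db::get);
    }
}
```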



    The error case for this recommendation, which occurs with extremely small probability: 1. thread B gets a cache miss and reads the old value from the DB; 2. thread A updates the DB and deletes the cache; 3. thread B writes the old value it read back into the cache. The premise: the writer thread outruns the reader thread, which is almost impossible, since a DB write is slower than a read.

    Q: Can this extremely rare error be avoided?

    A: Yes! The error can only occur when a read and a write interleave, so locking or read/write separation eliminates it. The general advice, though, is not to bother.

    The order can also be reversed: delete the cache first, then update the DB.

    Example: thread A deletes the cache; thread B misses the cache and reads the old value from the DB; thread A then updates the DB; finally thread B writes the old value back into the cache, leaving it stale.

    The premise of this error: the write is slow and the read is fast, which is the common case, so this failure is far more likely to occur.

    A distributed lock




    Distributed locks can be implemented with database unique constraints, Redis, or Zookeeper.

    In architecture there is no best choice, only the right one.

    Use Zookeeper to implement distributed locks

    Zookeeper: a distributed coordination framework whose data model is a tree (similar to a DOM tree)

    The tree contains many nodes; the node type used for distributed locks is the ephemeral (temporary) sequential node.

    Zookeeper provides the following two kinds of support:

    • Watchers: a client can watch the state of a node, and the corresponding event fires when that state changes
    • Ephemeral node deletion: an ephemeral node is deleted automatically when the client that created it disconnects

    Lock: only one thread/process can hold it at a time.
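
    A minimal sketch of the scheme described above, using the raw org.apache.zookeeper client. It assumes a ZooKeeper ensemble is reachable and that the /locks parent node already exists; error handling, retries, and session management are elided:

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class ZkLock {
    private final ZooKeeper zk;
    private String myNode; // e.g. /locks/lock-0000000003

    public ZkLock(ZooKeeper zk) { this.zk = zk; }

    public void lock() throws Exception {
        // 1. Create an ephemeral sequential node; ZooKeeper appends an increasing suffix.
        myNode = zk.create("/locks/lock-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        while (true) {
            List<String> children = zk.getChildren("/locks", false);
            Collections.sort(children);
            // 2. The node with the lowest sequence number holds the lock.
            if (myNode.equals("/locks/" + children.get(0))) return;
            // 3. Otherwise watch the node just before ours and block until it is deleted.
            int myIndex = children.indexOf(myNode.substring("/locks/".length()));
            String prev = "/locks/" + children.get(myIndex - 1);
            CountDownLatch latch = new CountDownLatch(1);
            Stat stat = zk.exists(prev, event -> {
                if (event.getType() == Watcher.Event.EventType.NodeDeleted) latch.countDown();
            });
            if (stat != null) latch.await(); // predecessor gone => re-check from the top
        }
    }

    public void unlock() throws Exception {
        // Deleting our node (or disconnecting) releases the lock and wakes the next waiter.
        zk.delete(myNode, -1);
    }
}
```

    Because each waiter watches only its immediate predecessor, releasing the lock wakes exactly one client instead of stampeding all of them; and because the node is ephemeral, a crashed client releases the lock automatically.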



    Distributed transaction

    Local transactions: one transaction per database connection (transaction : connection = 1 : 1)
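
    A minimal JDBC sketch of this 1:1 relationship; the connection URL, credentials, and table names are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// One Connection, one transaction: both statements commit or roll back together,
// but only because they run on the same connection to the same database.
public class LocalTx {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/order_db", "user", "password")) {
            conn.setAutoCommit(false); // begin the transaction on THIS connection
            try {
                conn.prepareStatement("INSERT INTO orders(id) VALUES (1)").executeUpdate();
                conn.prepareStatement("UPDATE stock SET count = count - 1 WHERE id = 1").executeUpdate();
                conn.commit();
            } catch (SQLException e) {
                conn.rollback(); // undo both statements on failure
                throw e;
            }
        }
    }
}
```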

    Local transactions cannot be used when this 1:1 relationship does not hold. For example:

    1. One node uses multiple database instances. Two databases have been set up locally: an order database and a payment database.

    Order operation = order database + payment database

    2. In a distributed system, multiple services deployed on different nodes access the same database.

    The usefulness of local transactions is very limited.

    Distributed transaction implementation

    Example

    • Order operation = Order database + payment database
    • Server A: Order operation, order database
    • Server B: Payment database

    Simulating the order operation as a single transaction:

    begin transaction:

    1. write to the order database;

    2. write to the payment database;

    commit/rollback;

    The above runs into problems in a distributed environment:

    begin transaction:

    1. write to the order database locally (server A);

    2. remotely invoke the payment database on server B;

    commit/rollback;

    If the local order write succeeds and the remote payment also succeeds, but the response is lost to a network problem, server A cannot tell whether server B committed: A may roll back while B has already committed, and the user is told the order failed.



    As shown above, distributed transactions are used where local transactions cannot operate.

    Use 2PC to implement distributed transactions

    2PC: Two-Phase Commit, consisting of one coordinator and multiple participants (similar to a master-slave structure)

    Coordinator: the transaction manager (TM)

    Participant: the resource manager (RM)

    The two phases are:

    Prepare phase: when the transaction begins, the coordinator sends a prepare message to all participants, asking them to execute the transaction. Each participant receives the message and either agrees or refuses; if it agrees, it executes the transaction locally and writes the log, but does not commit.

    Commit phase: if all participants agreed, the coordinator sends a commit request to all of them; otherwise a rollback is performed.



    The general idea: perform a task in two steps. 1. Ask every node whether it can perform the task. 2. If everyone agrees, ask them all to commit together.
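
    A toy, in-process sketch of that idea; Participant and Coordinator are hypothetical types with no networking, timeouts, or recovery logging:

```java
import java.util.List;

// Phase 1 asks every participant to execute and vote; phase 2 commits
// everywhere only if every vote was "yes", otherwise rolls everything back.
interface Participant {
    boolean prepare();  // execute locally, write the log, but do NOT commit
    void commit();
    void rollback();
}

class Coordinator {
    static void runTransaction(List<Participant> participants) {
        boolean allAgreed = true;
        for (Participant p : participants) {
            allAgreed &= p.prepare(); // a real coordinator would also time out here
        }
        for (Participant p : participants) {
            if (allAgreed) p.commit(); else p.rollback();
        }
    }
}
```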

    Disadvantages of 2PC: 1. execution is blocking, which introduces latency; 2. if the network is weak while the coordinator is sending commit requests, some participants may commit while others do not; 3. it is a centralized architecture with a single point of failure.

    Other distributed transaction solutions: three-phase commit (3PC), TCC, and message-queue-based distributed transactions.

    Note: if you need strict transaction consistency, look at the Paxos algorithm (used by Google's Chubby).

    Distributed authentication & Distributed authorization

    Authentication scenarios: in-house development, or a third-party platform

    In-house development: distributed authentication

    Third-party platform: distributed authorization (single sign-on, SSO)

    Distributed authorization uses the OAuth2.0 authorization protocol, whose flow is as follows:
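
    In its most common variant, the authorization code grant, the flow runs roughly like this:

    1. The client redirects the user to the authorization server's authorization endpoint.
    2. The user logs in and grants consent.
    3. The authorization server redirects back to the client with an authorization code.
    4. The client exchanges the code for an access token at the token endpoint.
    5. The client presents the access token to the resource server to access the user's data.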




    Tip: OAuth2.0 is not backward compatible with OAuth1.0