preface

When it comes to the properties of mysql transactions, everyone knows the four properties: ACID, atomicity, consistency, isolation, persistence. Among them, atomicity, isolation and persistence are relatively intuitive to understand, and their definitions are relatively unified in various places. “Consistency” is less intuitive. You seem to know what it means, but you can’t. This article explores what “consistency” in databases really means.


First, why there are transactions?

Many people take the concept of “business” as a natural thing, as a matter of course as air.

In fact, they are not. Transactions occur because they simplify the programming process when an application accesses the database without having to worry about potential errors and concurrency issues.

When we use transactions, we either roll back or commit. Deal only with business problems, not network jitter or physical machine problems. Don’t have to try… Catch and exception judgment. Transactions were born to serve the application layer, not the database system. If there are no transactions, code must be used to ensure data integrity and accuracy. This would be a disaster for programmers.

Why is “consistency” hard to understand?

I find consistency difficult to understand for two reasons.

The meaning of “consistency” is generalized

The concept of “consistency” is used in all kinds of systems and designs. There are at least four concepts of consistency.

  • Consistency in ACID
  • Consistency hashing
  • Consistency of CAP theorem
  • Distributed multi-copy consistency

Among them, the consistency of CAP refers to the linear consistency, which mainly describes the coordination problem before the copy under failure and delay. Consistent hashing is designed to solve distributed caching problems. Distributed multi-copy consistency ensures data consistency among nodes in a distributed system.

None of this is consistent in the sense of ACID, but you can’t say nothing. Many people confuse database consistency with other concepts of consistency, so they often get confused and confused.

Database consistency has two meanings

Database consistency actually has two layers. At the database level, “consistency” is actually a set of constraints, rules. In business terms, “consistency” actually refers to transactions that move the database from one valid state to another.

Consistency is a constraint

From the database’s point of view, it only cares if transcation complies with defined rules, and if it complies with rules, it complies with conformance.

So what are those rules? It can be a constraint, CASCADE (when updating/DELETE records on the parent table, synchronize update/ DELETE to remove matching records on the child table), or trigger. Or a combination of them.

Take the case of inventory of goods. When the inventory >=1, it indicates that there are commodities and they can be sold. Inventory =0 means that no goods can be sold. Based on our plain living experience, inventory cannot be less than 0.

Therefore, our database constrains the number of goods to be greater than or equal to zero. If the update good_table set good_number=good_number-1 statement is executed when inventory =0, the database will reject it because it breaks “consistency”.

From a database perspective, it considers consistency to be just that. However, for business logic, the database does not know, nor does it care. For example, two users simultaneously reduce their inventory by one at a time, causing one user’s inventory reduction operation to be overwritten by another user (this is often the case with oversold problems, such as the article titled “Kill Feitian Maotai Oversold” that was seen everywhere two days ago). The database is not protected against such errors.

That’s what I’m going to talk about in a second, consistency is an end.

Consistency is an end

At a higher level, consistency is the goal of programming. Ensure that transactions can only move from one correct state to another correct state.

What is the right state?

A database stores data in a certain schema, which is essentially a modeling of the real world and a mapping of the physical world. A database is the correct state when the state in the database is consistent with the data state in the real world.

That might be a little abstract, but let me give you an example. Let’s also do the transfer example, so let’s say that PERSON A’s bank account has $500, and person B’s bank account has $0. A transfers 100 yuan to B over the network. The transfer process must be divided into two stages, the first is to deduct 100 yuan from A’s account, the second is to add 100 yuan to B’s account. If the process goes well, A’s account will change to 400 yuan and B’s account to 100 yuan.

However, if there is A “consistency” problem, it may appear that the account of A changes to 400 yuan, but the account of B is still 0 yuan. This would never happen in the real world, where either the $100 bill has already been handed to B, or it’s still in A’s hand, and either way, there’s no way that A’s $100 has somehow gone missing, and B doesn’t have A $100 bill. Unless it is a ghost…… (Ps: Forget about consistency in this case and get glasses to control the degree of myopia.)

Congruence with the real world is the right state.

At this point, “consistency” is a purpose, relying on the database is not complete, but the database is the performance of “consistency” carrier. So how to achieve this consistency?

Lock!!!!! Select XXX from XXX lock in share mode (MVCC, select XXX from XXX lock in share mode)

conclusion

The concept of “consistency” is rampant. It is necessary to clearly distinguish the “consistency” of the database from other “consistency”. The “consistency” of a database is, at bottom, a set of constraints. This set of constraints can be constraints, triggers, etc., or a combination of them. At a higher level, “consistency” is a purpose to maintain a correct mapping between the database and the real world. At this point, various locks are needed to achieve “consistency”.