In a project, to improve data access speed and reduce the load on the database (such as MySQL), we usually keep a copy of frequently used data in a cache such as Redis.

When accessing data, the usual flow is as follows:

First query the cache. If the data is found, return it directly; if not, query the MySQL database, write the query result to the cache, and return it.
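The read path above can be sketched roughly like this in Python. The dicts standing in for Redis and MySQL, and the `user:1` key, are illustrative stand-ins, not a real client:

```python
# Stand-ins for Redis and MySQL: plain dicts, for illustration only.
cache = {}                 # pretend Redis
db = {"user:1": "Alice"}   # pretend MySQL table

def read(key):
    """Cache-aside read: try the cache first, fall back to the database."""
    if key in cache:
        return cache[key]
    value = db.get(key)    # cache miss: query the database
    if value is not None:
        cache[key] = value # populate the cache for subsequent reads
    return value
```

The first call to `read("user:1")` hits the database and fills the cache; later calls are served from the cache.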

So a common question arises: with a Redis cache in front of a MySQL database, how do we keep the data consistent?

The most rudimentary approach usually looks like this:

  1. First delete the data in the cache.
  2. Write to the MySQL database.
  3. Commit and return.
  4. The next time the updated data is queried, the cache misses, so the MySQL database is queried; the result is written to the cache and returned.

A flow like this has a problem: after step 1 but before the update in step 2 is committed, any query that arrives will miss the cache and go to the database. At that point the update has not yet been committed, so the query reads the old, pre-update data and writes it into the cache. After the database update is committed, users reading the data still get the pre-update value from the cache. When this happens, things can go very wrong.
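The stale-cache race can be reproduced in a few lines by interleaving a read between the cache delete and the database write. As before, the dicts are illustrative stand-ins for Redis and MySQL:

```python
cache = {}
db = {"user:1": "old"}   # value about to be updated to "new"

def read(key):
    """Cache-aside read used by concurrent queries."""
    if key not in cache:
        cache[key] = db[key]   # miss: read the database and repopulate
    return cache[key]

# Naive update, with a concurrent read interleaved between steps 1 and 2:
cache.pop("user:1", None)   # step 1: delete the cache
stale = read("user:1")      # concurrent read sees the not-yet-committed old value...
db["user:1"] = "new"        # step 2: the database update commits
# ...and the stale value is now stuck in the cache:
# cache["user:1"] == "old" even though db["user:1"] == "new"
```

Every subsequent cache hit now returns `"old"` until the entry is evicted or invalidated again.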

How can we avoid the problem above? We can do it with a lock:

  1. Before updating the data, delete the cache and acquire a lock on the corresponding key, so that reads arriving during the update cannot repopulate the cache with stale data. (For strict requirements, etcd is recommended, since its lock lease can be renewed.)
  2. If a thread reads the data before the lock is released, it reads the database directly without writing the cache.
  3. Update the MySQL database.
  4. Release the lock and return.
  5. On reads, read the cache first; on a miss, read the database, put the data into the cache, and return the response.

This avoids reading stale data during a modification. However, if the read volume is large, all of those reads fall through to the database while the update is in progress, which puts great pressure on it. So we improved the scheme further:

  1. Before updating the data, copy the current cache value to a backup key `${key}_bak`, then delete the cache and lock the corresponding key. Heavy read traffic arriving during the update is served from the backup key instead of going straight to the MySQL database. (For strict requirements, etcd is recommended, since its lock lease can be renewed.)
  2. If a thread reads the data before the lock is released, it sees the lock and reads the cache copy `${key}_bak` directly. Set the copy's validity period to about 10s; if you are not sure, set it longer.
  3. Update the MySQL database. For data under heavy concurrency, also write the new value to the cache here.
  4. Release the lock and delete the cache copy.
  5. On reads, read directly from the cache. If concurrency is not high, you can skip writing the cache during the update; the first read after the update will miss, read the database, put the data into the cache, and return the response.
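The backup-key variant might look roughly like this. The `_bak` suffix follows the `${key}_bak` naming above, the dicts again stand in for Redis and MySQL, and the 10-second TTL is left as a comment since plain dicts do not expire:

```python
cache = {}
db = {"user:1": "old"}
locks = set()   # stand-in for a distributed lock service

def update(key, new_value):
    if key in cache:
        cache[key + "_bak"] = cache[key]  # 1. copy the cache to ${key}_bak
                                          #    (in Redis: set with a ~10s TTL)
    locks.add(key)                        #    lock the key...
    cache.pop(key, None)                  #    ...and delete the live entry
    db[key] = new_value                   # 3. update the database
    cache[key] = new_value                #    (optional: write the cache here
                                          #     for heavily concurrent data)
    locks.discard(key)                    # 4. release the lock...
    cache.pop(key + "_bak", None)         #    ...and delete the cache copy

def read(key):
    if key in locks:
        return cache.get(key + "_bak")    # 2. locked: serve the backup copy
    if key not in cache:
        cache[key] = db[key]              # 5. normal cache-aside read
    return cache[key]
```

During the update window, readers hit the backup copy rather than the database, so the database only sees the single write.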

I have heard that some people switch reads over to a backup copy while writing the database. That scheme should be fine if updates are relatively rare; if updates are frequent, switching back and forth that often is itself a big problem.