Preface

In practice, a very common concurrency scenario is "read more, write less." Caching is often used in this scenario to improve application performance, because caching works well when reads far outnumber writes. For concurrent access, the Java SDK provides ReadWriteLock precisely for this read-mostly case. This article shows how to implement a general-purpose cache with ReadWriteLock.

This article has been included in:

  • Github.com/sunshinelyz…
  • Gitee.com/binghe001/t…

Read-write lock

Read-write locks should be familiar to most readers. In general, a read-write lock follows these principles:

  • A shared variable can be read by multiple readers at the same time.
  • A shared variable can only be written by one writer thread at a time.
  • A shared variable cannot be read by the reader thread when it is being written to.

One important difference between a read-write lock and a mutex is that a read-write lock allows multiple threads to read a shared variable at the same time, while a mutex does not. Read-write locks therefore outperform mutexes in read-heavy, high-concurrency scenarios. Write operations, however, remain mutually exclusive: while a writer thread is writing a shared variable, no reader thread can read it.

Read-write locks support fair and non-fair modes, which are controlled by passing a boolean argument to the ReentrantReadWriteLock constructor.

public ReentrantReadWriteLock(boolean fair) {
    sync = fair ? new FairSync() : new NonfairSync();
    readerLock = new ReadLock(this);
    writerLock = new WriteLock(this);
}
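For example, passing true creates a fair lock; the no-argument constructor defaults to non-fair mode:

// Fair mode: waiting threads acquire the lock roughly in arrival order
ReentrantReadWriteLock fairLock = new ReentrantReadWriteLock(true);
// Non-fair mode (the default): higher throughput, but no ordering guarantee
ReentrantReadWriteLock nonFairLock = new ReentrantReadWriteLock();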

In addition, note that calling newCondition() on the read lock throws an UnsupportedOperationException. In other words, the read lock does not support condition variables; only the write lock does.
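A minimal sketch illustrating this behavior:

import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ConditionDemo {
    public static void main(String[] args) {
        ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
        // The write lock supports condition variables
        Condition writeCondition = rwl.writeLock().newCondition();
        System.out.println("write lock condition: " + writeCondition);
        // The read lock does not: this call throws UnsupportedOperationException
        Condition readCondition = rwl.readLock().newCondition();
    }
}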

Cache implementation

Here, we use ReadWriteLock to quickly implement a generic cache utility class; the complete code is shown below.

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadWriteLockCache<K, V> {
    private final Map<K, V> m = new HashMap<>();
    private final ReadWriteLock rwl = new ReentrantReadWriteLock();
    // Read lock
    private final Lock r = rwl.readLock();
    // Write lock
    private final Lock w = rwl.writeLock();

    // Read from the cache
    public V get(K key) {
        r.lock();
        try { return m.get(key); }
        finally { r.unlock(); }
    }

    // Write to the cache
    public V put(K key, V value) {
        w.lock();
        try { return m.put(key, value); }
        finally { w.unlock(); }
    }
}

As you can see, ReadWriteLockCache declares two type parameters: K for the cache key and V for the cache value. Inside the class, a Map holds the cached data. HashMap is not thread-safe, so we use a read-write lock to keep access thread-safe. The get() method uses the read lock internally, so it can be executed by multiple threads at the same time; the put() method uses the write lock internally, so only one thread can write to the cache at a time.

It is important to note that for both the read lock and the write lock, the unlock operation must be placed in a finally{} block.
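As a quick usage sketch of the class above (the key and value here are illustrative):

public class CacheUsageDemo {
    public static void main(String[] args) {
        ReadWriteLockCache<String, String> cache = new ReadWriteLockCache<>();
        cache.put("name", "binghe");     // Takes the write lock
        String name = cache.get("name"); // Takes the read lock; concurrent reads do not block each other
        System.out.println(name);        // Prints: binghe
    }
}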

In general, there are two ways to load data into the cache: load the full data set when the application starts, or load data on demand while the application is running.

Next, let’s take a look at the full load cache and the on-demand load cache.

Full load cache

Loading the full cache is relatively simple: when the application starts, all the data is loaded into the cache in one pass. This suits scenarios where the amount of cached data is small and the data changes infrequently, for example, data dictionaries and similar system information.

Once the data is fully loaded, subsequent reads can be served directly from the cache.

The implementation of full loading is straightforward; the following code demonstrates it.

public class ReadWriteLockCache<K, V> {
    private final Map<K, V> m = new HashMap<>();
    private final ReadWriteLock rwl = new ReentrantReadWriteLock();
    // Read lock
    private final Lock r = rwl.readLock();
    // Write lock
    private final Lock w = rwl.writeLock();

    public ReadWriteLockCache() {
        // Query the database at startup
        List<Field<K, V>> list = ...;
        if (!CollectionUtils.isEmpty(list)) {
            // Note: the constructor runs in a single thread, so a plain forEach is used;
            // parallelStream().forEach() on a non-thread-safe HashMap would be a data race
            list.forEach(f -> m.put(f.getK(), f.getV()));
        }
    }

    // Read from the cache
    public V get(K key) {
        r.lock();
        try { return m.get(key); }
        finally { r.unlock(); }
    }

    // Write to the cache
    public V put(K key, V value) {
        w.lock();
        try { return m.put(key, value); }
        finally { w.unlock(); }
    }
}

Load the cache on demand

On-demand loading, also called lazy loading, means data is loaded into the cache only when it is actually needed. Specifically: no data is loaded when the program starts. When the running program needs to query some data, it first checks whether the cache already holds it; if so, the cached data is read directly, and if not, the data is queried from the database and written into the cache. Subsequent reads of that data are then served straight from the cache.

This query-then-cache approach suits most caching scenarios.

The following code illustrates on-demand cache loading.

class ReadWriteLockCache<K, V> {
    private final Map<K, V> m = new HashMap<>();
    private final ReadWriteLock rwl = new ReentrantReadWriteLock();
    private final Lock r = rwl.readLock();
    private final Lock w = rwl.writeLock();

    V get(K key) {
        V v = null;
        // Read from the cache
        r.lock();
        try {
            v = m.get(key);
        } finally {
            r.unlock();
        }
        // Cache hit: return the value directly
        if (v != null) {
            return v;
        }
        // Cache miss: query the database under the write lock
        w.lock();
        try {
            // Verify the presence of the data in the cache again
            v = m.get(key);
            if (v == null) {
                // Query the database
                v = ...; // data queried from the database
                m.put(key, v);
            }
        } finally {
            w.unlock();
        }
        return v;
    }
}

In the get() method, we first read from the cache under the read lock, releasing it as soon as the lookup returns. If the value read from the cache is not null, it is returned directly. If it is null, the thread acquires the write lock and reads the cache again; if the data is still missing, it queries the database, writes the result into the cache, releases the write lock, and finally returns the result.

Why query the cache again inside the write lock, when the thread has only just checked it?

This is because in high-concurrency scenarios, multiple threads may compete for the write lock. For example, suppose the cache is empty when get() is first executed, and three threads call get() at the same time and reach w.lock() simultaneously. Because the write lock is exclusive, only one thread acquires it; the other two block at w.lock(). The thread holding the write lock queries the database, writes the data into the cache, and releases the write lock.

At that point, the remaining two threads compete for the write lock, and one of them acquires it and continues. If there were no second v = m.get(key) check after w.lock(), that thread would query the database again, write the data into the cache, and release the write lock, and the last thread would then go through the same process.

But the first thread has already queried the database and written the data into the cache, so the other two threads do not need to touch the database at all; they can read the value directly from the cache. Re-checking the cache with v = m.get(key) after w.lock() therefore avoids redundant database queries under high concurrency and improves system performance.
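To make this concrete, here is a self-contained sketch (the class and helper names are mine, not from the original) in which three threads request the same missing key; thanks to the second cache check inside the write lock, the simulated database is queried only once:

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class DoubleCheckDemo {
    private static final Map<String, String> cache = new HashMap<>();
    private static final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
    private static final AtomicInteger dbQueries = new AtomicInteger();

    // Simulated database lookup; the counter records how many times it runs
    private static String queryDb(String key) {
        dbQueries.incrementAndGet();
        return "value-of-" + key;
    }

    private static String get(String key) {
        rwl.readLock().lock();
        try {
            String v = cache.get(key);
            if (v != null) {
                return v;
            }
        } finally {
            rwl.readLock().unlock();
        }
        rwl.writeLock().lock();
        try {
            // The double check: another thread may have filled the cache already
            String v = cache.get(key);
            if (v == null) {
                v = queryDb(key);
                cache.put(key, v);
            }
            return v;
        } finally {
            rwl.writeLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(3);
        for (int i = 0; i < 3; i++) {
            new Thread(() -> { get("user:1"); latch.countDown(); }).start();
        }
        latch.await();
        System.out.println("database queries: " + dbQueries.get()); // prints 1
    }
}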

Upgrade or downgrade of read/write locks

ReadWriteLock does not support lock upgrading. If a thread tries to acquire the write lock while still holding the read lock, the write lock can never be granted; the thread blocks forever and cannot be woken up.
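A minimal sketch of the problem (the thread below blocks forever on the last lock() call):

import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockUpgradeDemo {
    public static void main(String[] args) {
        ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
        rwl.readLock().lock();
        System.out.println("read lock acquired, now trying to upgrade...");
        // Deadlock: the write lock cannot be granted while this same
        // thread still holds the read lock, so lock() never returns
        rwl.writeLock().lock();
        System.out.println("never reached");
    }
}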

While lock upgrading is not supported, ReadWriteLock does support lock downgrading. Take a look at the official example from the ReentrantReadWriteLock documentation, shown below.

class CachedData {
    Object data;
    volatile boolean cacheValid;
    final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();

    void processCachedData() {
        rwl.readLock().lock();
        if (!cacheValid) {
            // Must release read lock before acquiring write lock
            rwl.readLock().unlock();
            rwl.writeLock().lock();
            try {
                // Recheck state because another thread might have
                // acquired write lock and changed state before we did.
                if (!cacheValid) {
                    data = ...
                    cacheValid = true;
                }
                // Downgrade by acquiring read lock before releasing write lock
                rwl.readLock().lock();
            } finally {
                rwl.writeLock().unlock(); // Unlock write, still hold read
            }
        }

        try {
            use(data);
        } finally {
            rwl.readLock().unlock();
        }
    }
}

Data Synchronization Problems

Data synchronization here refers to keeping the data source consistent with the cached data, or more directly, keeping the database and the cache in sync.

Here, we can take three approaches to keep the database and the cache in sync: a timeout mechanism, periodically refreshing the cache, and updating the cache in real time. Each approach is described below.

Timeout mechanism

With the timeout mechanism, cached data is automatically removed from the cache once it expires. The next time the program accesses that data, the cache misses, so the database is queried and the result is written back into the cache.
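As a hedged sketch of this idea (the TTL field and expiry policy are my own illustration, not from the original), each entry can record an expiry deadline, and get() can treat expired entries as cache misses:

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class TimedCache<K, V> {
    // A cached value together with its expiry deadline (milliseconds since the epoch)
    private static class Entry<V> {
        final V value;
        final long expireAt;
        Entry(V value, long ttlMillis) {
            this.value = value;
            this.expireAt = System.currentTimeMillis() + ttlMillis;
        }
    }

    private final Map<K, Entry<V>> m = new HashMap<>();
    private final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();

    public V get(K key) {
        rwl.readLock().lock();
        try {
            Entry<V> e = m.get(key);
            // An expired entry is treated as a cache miss; the caller then
            // reloads from the database and put()s the fresh value back
            if (e == null || e.expireAt < System.currentTimeMillis()) {
                return null;
            }
            return e.value;
        } finally {
            rwl.readLock().unlock();
        }
    }

    public void put(K key, V value, long ttlMillis) {
        rwl.writeLock().lock();
        try {
            m.put(key, new Entry<>(value, ttlMillis));
        } finally {
            rwl.writeLock().unlock();
        }
    }
}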

This solution needs to guard against the cache penetration problem. For more about cache penetration, breakdown, and avalanche, see the article "[High Concurrency] Interviewer: What is cache penetration?".

Periodically update cache

This scheme is an enhanced version of the timeout mechanism: data written to the cache is still given a timeout, but in addition a dedicated background thread periodically queries the database and writes the fresh data into the cache. This avoids the cache penetration problem to a certain extent.
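A minimal sketch of the background refresh thread, assuming a hypothetical loadAllFromDb() helper that returns the rows to cache and reusing the ReadWriteLockCache class from earlier (the 5-minute interval is illustrative):

import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PeriodicRefresher<K, V> {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void start(ReadWriteLockCache<K, V> cache) {
        // Refresh the cache from the database every 5 minutes
        scheduler.scheduleAtFixedRate(() -> {
            Map<K, V> fresh = loadAllFromDb(); // hypothetical database query
            fresh.forEach(cache::put);         // each put() takes the write lock
        }, 0, 5, TimeUnit.MINUTES);
    }

    // Placeholder for the real database query
    private Map<K, V> loadAllFromDb() {
        return Map.of();
    }
}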

Real-time update cache

In this scheme, data in the database is synchronized to the cache in real time. For example, Alibaba's open-source Canal framework can be used to synchronize MySQL data with the cache in real time. You can also use my own open-source MyKit-Data framework (recommended).

Recommended reading

  • Interviewer: Tell me about cache penetration, breakdown, and avalanche. How do you solve them?
  • Two lines of code to fix a MySQL 8.x binlog parsing error!

Mykit-data open source address:

  • Github.com/sunshinelyz…
  • Gitee.com/binghe001/m…

If you have any questions, leave a comment below or add me on WeChat: SUN_shine_LYz, and I will add you to the group. We can discuss technology together and advance together.