Guava source code analysis (Cache principle)

Preface

Guava is Google's widely used set of core libraries for Java.

I use it frequently myself, so this time I will take the Cache component I use every day as a way into how the Google engineers designed it.

Caching

This discussion focuses on caching.

Caching is an important part of everyday development. If your application reads certain data very frequently and that data rarely changes, caching is a great way to improve performance.

Caching improves performance because reads from it are very fast. It is just like a CPU's L1, L2, and L3 caches: the closer the cache is to the core, the faster the read, with L1 being the fastest.

It is not all upside. Faster storage is also smaller and more expensive, so we should cache only the data we really need.

In fact, it is a classic trade of space for time.

Let’s talk about caching in Java.

JVM caching

First is the JVM cache, which you can also think of as the heap cache.

You create global variables, such as Maps and Lists, and store the data in those containers.

This approach is simple to use, but it has the following problems:

Data is only removed when you clear it explicitly.
Data cannot be evicted according to rules such as LRU, LFU, or FIFO.
There is no callback notification when data is cleared.
Other custom features are missing as well.
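To make the limitation concrete, here is a minimal sketch of such a "global variable" cache. The class name GlobalMapCache and its methods are made up for illustration. Notice that nothing ever leaves the map unless remove() is called.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical example class: a "JVM cache" that is nothing more than a global map.
class GlobalMapCache {
    // Entries stay here until someone removes them explicitly.
    private static final Map<String, String> CACHE = new ConcurrentHashMap<>();

    static void put(String key, String value) {
        CACHE.put(key, value);
    }

    static String get(String key) {
        return CACHE.get(key);
    }

    // The only way to free memory: an explicit call. No expiry, no eviction rule, no callback.
    static void remove(String key) {
        CACHE.remove(key);
    }

    static int size() {
        return CACHE.size();
    }

    public static void main(String[] args) {
        put("user:1", "Alice");
        System.out.println(get("user:1"));
        remove("user:1");
        System.out.println(size());
    }
}
```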

Ehcache, Guava Cache

As a result, several open source tools have emerged specifically for JVM caches, such as the Guava Cache mentioned in this article.

It has features that the plain JVM cache above does not provide, such as automatic data cleanup, multiple cleanup algorithms, and cleanup callbacks.

But because of these features, such caches inevitably have more bookkeeping to maintain, which naturally adds overhead to the system.

Distributed cache

Both types of cache mentioned above are in-heap caches that can only be used on a single node, so they fall short in distributed scenarios.

For that there is caching middleware, such as Redis and Memcached, which lets the nodes of a distributed environment share cached data.

The details are beyond the scope of this discussion.

Guava Cache example

I thought of Guava's Cache recently because of the following requirement:

Read application system logs from Kafka in real time. The log information contains the application's health status. If abnormal information occurs X times within time window N, I need to respond accordingly (alarm, log, etc.).

Guava's Cache is a good fit for this. I take advantage of its ability to expire an entry a fixed time N after it was written, and each time I read data I check whether the number of exceptions has reached X.

The pseudocode is as follows:


    @Value("${alert.in.time:2}")
    private int time;

    // counter, KEY, limit and LOGGER are fields of the enclosing class (not shown here).
    @Bean
    public LoadingCache<Long, AtomicLong> buildCache() {
        return CacheBuilder.newBuilder()
                // Each entry expires `time` minutes after it is written
                .expireAfterWrite(time, TimeUnit.MINUTES)
                .build(new CacheLoader<Long, AtomicLong>() {
                    @Override
                    public AtomicLong load(Long key) throws Exception {
                        // A missing key starts with a counter of 0
                        return new AtomicLong(0);
                    }
                });
    }

    /**
     * Determine whether the alarm needs to be raised
     */
    public void checkAlert() {
        try {
            if (counter.get(KEY).incrementAndGet() >= limit) {
                LOGGER.info("* * * * * * * * * * * warning * * * * * * * * * * *");

                // Reset the counter
                counter.get(KEY).getAndSet(0L);
            }
        } catch (ExecutionException e) {
            LOGGER.error("Exception", e);
        }
    }

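The first step is to build a LoadingCache whose entries expire N minutes after they are written (when a key is absent, the loader creates a counter that starts at 0). Because the counter is incremented in place and the entry itself is never rewritten, the entry is dropped N minutes after it was first loaded, which gives the time window.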

The checkAlert() method is then called on each consumption to verify the above requirement.

Let's think about how Guava can automatically remove expired data, and how it can evict data in an LRU-like way.

Let's make a bold assumption:

The order of the cache is maintained internally by a queue. Each time an entry is accessed it is moved to the head of the queue, and an additional thread is started to check whether data has expired and delete it. This is somewhat similar to the hand-written LRU cache I wrote about earlier.

Hu Shi once said: be bold in hypothesis, careful in verification.

Here’s how Guava does it.

Analysis of the principle

The best way to study the principle is to step through the code:

Sample code is here:

Github.com/crossoverJi…

To see how Guava deletes expired data, the sample sleeps for 5 seconds before reading from the cache, so the timeout condition is met.

Stepping through, you will eventually find that line 2187 of the com.google.common.cache.LocalCache class is the key.

Following on from line 2182, the code first checks whether count is greater than 0. count holds the number of live entries in the current segment, and it is declared volatile to guarantee visibility.

For more information about volatile, see "The volatile keyword you should know".

Then follow along:

Line 2761, judging by the method name, checks whether the current Entry has expired. The Entry is looked up by key.

Whether the current key has expired is decided by the expiration policy specified at build time.

If it has expired, execution continues and tries to delete the expired entry (a lock is required, more on that later).

Here the steps are clear:

Get the current number of cached entries.
Decrement it by one (the lock was acquired earlier, so this is thread-safe).
Delete the entry and assign the updated total to count.
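The flow above can be sketched in a few lines of plain Java. This is a simplified illustration, not Guava's source: the class LazyExpiringCache, its fields, and the explicit nowNanos parameter (passed in so the behaviour is easy to test) are all my own assumptions. Unlike Guava, the sketch also takes the lock on ordinary reads, simply because it uses a plain HashMap.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Simplified sketch, not Guava's code: entries expire a fixed time after being written,
// and an expired entry is removed only when a read happens to touch it.
class LazyExpiringCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long writeTimeNanos;

        Entry(V value, long writeTimeNanos) {
            this.value = value;
            this.writeTimeNanos = writeTimeNanos;
        }
    }

    private final Map<K, Entry<V>> table = new HashMap<>();
    private final ReentrantLock lock = new ReentrantLock();
    private final long ttlNanos;
    // volatile so that the lock-free check in get() sees the latest value
    private volatile int count;

    LazyExpiringCache(long ttlNanos) {
        this.ttlNanos = ttlNanos;
    }

    void put(K key, V value, long nowNanos) {
        lock.lock();
        try {
            if (table.put(key, new Entry<>(value, nowNanos)) == null) {
                count = count + 1; // new key: one more live entry
            }
        } finally {
            lock.unlock();
        }
    }

    V get(K key, long nowNanos) {
        if (count == 0) {
            return null; // nothing cached, no need to lock
        }
        lock.lock();
        try {
            Entry<V> entry = table.get(key);
            if (entry == null) {
                return null;
            }
            if (nowNanos - entry.writeTimeNanos >= ttlNanos) {
                // Expired: delete under the lock and publish the decremented total
                table.remove(key);
                count = count - 1;
                return null;
            }
            return entry.value;
        } finally {
            lock.unlock();
        }
    }

    int size() {
        return count;
    }
}
```

No background thread appears anywhere in the sketch. An entry that is never read again simply stays in the map, which is the trade-off discussed next.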

That is basically the whole process. Guava does not use another thread to clean up expired data, as I assumed earlier.

This is probably for the following reasons:

New threads consume resources.
Maintaining expired data would also require acquiring additional locks, which adds overhead.

Instead, cleanup is done as part of the query. If the cache is not accessed for a long time the expired data will not be reclaimed, but that is not a problem for a high-throughput application.

Conclusion

Finally, a summary of Guava’s Cache.

The following code is used to locate data by a key:

If you have seen how ConcurrentHashMap works, you will notice that this is very similar.

In fact, to meet the needs of concurrent scenarios, the core data structure of Guava Cache follows the design of ConcurrentHashMap, so looking up a key is likewise a process of locating it at a specific position.

Finding the Segment first and then the position inside it is like hashing twice.
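The "hash twice" idea can be sketched roughly as follows. This is not Guava's source. The class SegmentLocator and its method names are made up, and the sketch assumes that both the number of segments and the table size are powers of two, so that a mask can replace a modulo.

```java
// Hypothetical sketch of a two-level lookup: the hash first picks a segment,
// then picks a slot inside that segment's table.
class SegmentLocator {
    private final int segmentShift;
    private final int segmentMask;
    private final int tableMask;

    SegmentLocator(int segmentCount, int tableSize) {
        if (Integer.bitCount(segmentCount) != 1 || Integer.bitCount(tableSize) != 1) {
            throw new IllegalArgumentException("sizes must be powers of two");
        }
        this.segmentMask = segmentCount - 1;
        // Use the high bits of the hash for the segment
        this.segmentShift = 32 - Integer.numberOfTrailingZeros(segmentCount);
        this.tableMask = tableSize - 1;
    }

    // Mix the high bits into the low bits so that poor hashCode() values spread better
    static int spread(int hashCode) {
        return hashCode ^ (hashCode >>> 16);
    }

    // First step: which segment owns this hash?
    int segmentIndex(int hash) {
        return (hash >>> segmentShift) & segmentMask;
    }

    // Second step: which slot inside the segment's table?
    int slotIndex(int hash) {
        return hash & tableMask;
    }
}
```

Using the high bits for the segment and the low bits for the slot means the two steps look at different parts of the same hash, so keys that land in one segment are still spread across its table.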

One of the assumptions above turned out to be right: internally it maintains two queues, accessQueue and writeQueue, which record the order of the cached entries so that data can be evicted in order (similar to building an LRU cache on LinkedHashMap).
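For comparison, this is the LinkedHashMap approach in its smallest form. It is a sketch of the general technique, not of Guava's queues, and the class name LruCache is my own.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache built on LinkedHashMap (illustrative class name).
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently accessed
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true evicts the eldest entry
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

With a capacity of 2, putting "a" and "b", reading "a", and then putting "c" evicts "b", because "b" is the least recently accessed entry at that point. Note that LinkedHashMap is not thread-safe, so this version would need external synchronization in a concurrent setting.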

The way the cache is constructed above also shows that it uses the builder pattern to create objects.

As a tool for developers it needs a lot of customizable properties, so the builder pattern is a perfect fit.
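As a reminder of what the pattern looks like, here is a tiny builder of my own. SimpleCacheConfig is a hypothetical class that only borrows two option names from CacheBuilder. It is not how Guava implements its builder.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical immutable configuration object created through a builder.
final class SimpleCacheConfig {
    final long maximumSize;
    final long expireAfterWriteNanos;

    private SimpleCacheConfig(Builder builder) {
        this.maximumSize = builder.maximumSize;
        this.expireAfterWriteNanos = builder.expireAfterWriteNanos;
    }

    static Builder newBuilder() {
        return new Builder();
    }

    static final class Builder {
        // Defaults mean "no limit" and "never expire"
        private long maximumSize = Long.MAX_VALUE;
        private long expireAfterWriteNanos = -1L;

        Builder maximumSize(long size) {
            this.maximumSize = size;
            return this; // returning this is what allows chaining
        }

        Builder expireAfterWrite(long duration, TimeUnit unit) {
            this.expireAfterWriteNanos = unit.toNanos(duration);
            return this;
        }

        SimpleCacheConfig build() {
            return new SimpleCacheConfig(this);
        }
    }
}
```

A caller sets only the options it cares about, for example SimpleCacheConfig.newBuilder().expireAfterWrite(2, TimeUnit.MINUTES).build(), and every other option keeps its default.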

There is a lot about Guava's Cache that this article has not covered, such as its use of GC to reclaim memory and the callback notifications when data is removed. We will talk about those later.
