Anyone who writes code for a living knows what a "cache" is, and even non-programmers around the IT industry will suggest "just add a cache" from time to time whenever the system is slow: the girl on the test team, the girl in operations, the guy on the product team. But is caching really so simple that anyone can use it?

It is well known that caching can make a slow page load almost instantly. Nearly every app and website you visit makes use of caching.

So what does caching do, besides speed up data access?

Also, as everything has two sides, how can we maximize the benefits of caching while avoiding its drawbacks?

This article discusses caching from the following angles:

  1. What does caching do?

  2. Where can I cache?

  3. Is caching a silver bullet?

1. What does caching do?

As mentioned earlier, the most common understanding is that when we encounter a slow page, we want to introduce caching so that the page opens quickly.

In fact, fast and slow are relative. From a technical point of view, a cache is fast because it is built on memory, and memory read/write speeds are orders of magnitude faster than those of a hard disk. Therefore, using memory instead of the hard disk as the read/write medium can greatly improve data access speed.

The process goes roughly like this: data that has been queried once is stored in memory, so that subsequent accesses can be served from there.
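To make that read path concrete, here is a minimal sketch in Python; the `cache` dictionary and the `db.query_user` call are illustrative stand-ins, not any particular framework's API.

```python
# A minimal cache-aside read: check memory first, fall back to the
# database on a miss, then remember the result for next time.
cache = {}  # in-memory store; a real system might use Redis instead

def get_user(user_id, db):
    user = cache.get(user_id)
    if user is not None:           # cache hit: served from memory
        return user
    user = db.query_user(user_id)  # cache miss: hit the slow, disk-backed DB
    cache[user_id] = user          # store for subsequent accesses
    return user
```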

In addition, there are two other important uses of cache: prefetch and deferred write.

Prefetch

Prefetch means preloading data, also known as cache warming: before the system starts serving external requests, it loads some data from the hard disk into memory.

The reason for doing so is that some systems face thousands of requests the moment they start. If those requests were allowed to hit the database directly, it is very likely the database would be crushed by the pressure and become unable to respond normally.

To alleviate this problem, prefetch is required.
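Here is a minimal sketch of cache warming, assuming a hypothetical `db.query_hot_items` query for whatever counts as "hot" data in your system:

```python
# Cache warming: load hot data into memory before serving traffic,
# so the first wave of requests does not crash into the database.
cache = {}

def warm_cache(db):
    # Preload the data most likely to be requested right after startup.
    for item in db.query_hot_items(limit=1000):  # hypothetical DB call
        cache[item.id] = item

# warm_cache(db) would run during startup, before the server
# begins listening for requests.
```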

You might ask: what if it still can't handle the load, even with caching? Then you need horizontal scaling plus load balancing.

If prefetch is a buffer placed in front of the data "exit", then deferred write, as the name implies, is a buffer placed behind the data "entry".

Deferred write

As you all know, writing to a database is slower than reading from it, because a number of mechanisms must run during a write to ensure the data is accurate.

So, if you want to improve write speed, you can either shard the database into separate tables, or use a cache as a buffer and then write to disk in batches.

Because sharding has huge side effects on cross-table operations and multi-condition queries, introducing database and table sharding is far more complex than introducing a cache, so the cache-based scheme should be preferred.

Using a cache to speed up "writes" in this way is called deferred write: instead of writing data to the disk or database right away, you write it to memory first and return success, then periodically write the accumulated data to disk in batches.

You may be wondering: if we report success after writing only to memory, won't the data be lost if the program terminates abnormally due to an accident such as a power failure or a crash?

Yes. Therefore, deferred write is generally used only in scenarios where data integrity is not critical, such as like counts or participant counts. There it can greatly relieve the pressure that frequent updates put on the database.
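Here is a minimal sketch of deferred write for the like-count example; the in-memory buffer, the five-second flush interval, and the `db.bulk_increment_likes` call are all illustrative assumptions:

```python
import threading
from collections import Counter

# Deferred write: absorb frequent increments in memory, then flush
# them to the database in one batch. Anything buffered between
# flushes is lost if the process dies, which is acceptable for
# like counts but not for, say, payments.
pending_likes = Counter()
lock = threading.Lock()

def like(post_id):
    with lock:
        pending_likes[post_id] += 1  # "success" is reported immediately

def flush(db, interval=5.0):
    with lock:
        batch = dict(pending_likes)
        pending_likes.clear()
    if batch:
        db.bulk_increment_likes(batch)  # hypothetical batched write
    # Re-arm the timer so memory is flushed to the database periodically.
    threading.Timer(interval, flush, args=(db, interval)).start()
```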

In fact, RDB, the default persistence mechanism of the well-known distributed cache Redis, works in exactly this deferred-write fashion: data lives in memory and is periodically snapshotted to disk.

In a mature system, caching can be applied in many different places. Below, Brother Z helps you sort out where we can cache.

2. Where can I cache?

Let's get one thing straight first: what do we want to cache? In other words, what characteristics must data have to be worth caching? After all, caching comes at an extra cost, and that cost has to pay off.

Generally speaking, you can judge by these two criteria: hot data, which is accessed frequently (say, dozens of times per second), and static data, which rarely changes and is read far more often than it is written (say, modified only once every few days).

Then you can find the right place to cache them.

Caching is by nature a "defensive" mechanism, and data flows between systems in an orderly way. So choosing where to place a cache is like choosing where to put a roadblock on a road: the stretch of road behind the roadblock is shielded from traffic.

So, along the road from the end user to the database behind the system, there are roughly the following locations where cache points can be set up.

Each cache point blocks some of the traffic, forming a funnel that protects the systems underneath and, ultimately, the database.

Browser cache

This is the closest place to the user where caching can happen, and because it borrows the user's own "resources" (the cached data lives on the user's device), it offers the best cost-performance: you let the users help you share the load.

When you open the developer tools in your browser and see "from cache", "from memory cache", or "from disk cache", it means the data has been cached on the user's device (which is also why some pages remain accessible even when the device is offline).

The browser does this for us automatically, and it is generally used to cache images, JS, CSS, and other static assets. We can control it through the Cache-Control HTTP response header, but the details are beyond the scope of this article.
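As a small, self-contained sketch of setting that header on the server side, using only Python's standard library (the one-day max-age is just an example value):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        # Tell the browser it may reuse this response for one day
        # without re-contacting the server.
        self.send_header("Cache-Control", "public, max-age=86400")
        self.send_header("Content-Type", "text/css")
        self.end_headers()
        self.wfile.write(b"body { margin: 0; }")

HTTPServer(("", 8080), Handler).serve_forever()
```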

The use of global variables and cookies in JS also falls into this category.

Browser caches are cache points on the user's side, so we have much less control over them: you can't proactively update the data unless the user makes a new request.

CDN cache

CDN service providers deploy a large number of server nodes (called edge servers) across a country or even the globe.

Distributing data to these widely scattered servers as a cache, so that users fetch cached data from a nearby node, spreads the load and speeds up access. This works especially well for consumer-facing (to-C) systems.

However, note that because there are so many nodes, updating cached data is slow, usually at least minute-level. So CDN caching is generally suitable only for static data that changes infrequently.

A workaround is to append an incrementing number or unique identifier to the URL, such as ?v=1000. Because a different URL is treated as a "new" resource, the file is fetched and cached afresh.
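A minimal sketch of such URL versioning; deriving the version from a hash of the file's contents is one common variant of this trick, and the helper below is illustrative:

```python
import hashlib

def versioned_url(path: str) -> str:
    # Derive the version from the file's contents, so the URL changes
    # only when the file actually changes, and the CDN/browser cache
    # is busted exactly when needed.
    with open(path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()[:8]
    return f"/static/{path}?v={digest}"

# e.g. versioned_url("app.css") might return "/static/app.css?v=3f2a9c1b"
```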

Gateway (proxy) cache

Caching here is on your home turf. We often place a layer of gateway (or reverse proxy, or forward proxy) in front of the origin servers as a unified entry point for security mechanisms or traffic policies.

It's also a good place to do caching. After all, the gateway is "business-independent", and every request it blocks spares the origin servers behind it a lot of CPU work.

Common gateway (proxy) caches are Varnish, Squid, and Nginx. In general, for simple caching scenarios, use Nginx: most of the time we already use it for load balancing, and introducing one less technology keeps complexity down. For large numbers of small files, you can use Varnish, while Squid is heavier and more comprehensive, and costs more to operate.

In-process cache

The fact that a request reaches this point means it is "business-relevant" and needs to be processed by business logic.

Precisely because of this, the cost of introducing a cache from this point on is much higher than at the previous three locations, because the requirement for data consistency between the cache and the database is stricter.

This is probably where most of us programmers deliberately use caching for the first time. There are many details to in-process and out-of-process caching, which deserve a future article of their own.
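As a minimal sketch of an in-process cache, Python's standard library offers functools.lru_cache; the `get_product` function and its `db_query_product` call are illustrative:

```python
from functools import lru_cache

@lru_cache(maxsize=1024)  # keep at most 1024 results in process memory
def get_product(product_id: int):
    # On a cache miss this hits the database; on a hit the result
    # is returned straight from this process's memory.
    return db_query_product(product_id)  # hypothetical DB call
```

Note that the cached data lives and dies with the process, and each process keeps its own copy, which is exactly why consistency with the database becomes the hard part.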

Out-of-process cache

Here the cached data lives in a separate process that other programs call remotely; you can even write such a program yourself.

Again, the details will come later; for now, here are a few tips on choosing between Redis and Memcached.

If you are demanding about resource utilization (CPU, memory, etc.), you can use Memcached, but your program must tolerate data loss, since it is a purely in-memory mechanism. If you can't tolerate that, use Redis. Besides, Redis offers richer data structures, while Memcached offers only key-value pairs and is more like a plain NoSQL store.
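A minimal sketch with the redis-py client, assuming a Redis server running on localhost; the key name and values are illustrative:

```python
import redis

r = redis.Redis(host="localhost", port=6379)

# Out-of-process cache: the data lives in the Redis server's memory,
# shared by every process that connects to it.
r.set("likes:post:42", 128, ex=60)  # cache the value for 60 seconds
value = r.get("likes:post:42")      # b"128", or None after expiry
```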

Database cache

The database itself has its own caching modules; otherwise it would not be called a "memory killer" that basically eats as much memory as you give it.

The database cache is an internal mechanism of the database, which we won't go into here. Usually the only knob exposed for you to intervene with is the configuration that sets the size of the cache space (for example, innodb_buffer_pool_size in MySQL).

Finally, the disk itself also has a cache. So you will find that getting data written safely all the way down to the physical disk is genuinely hard, and we never know when a disk "fast" enough to free us programmers from worrying about caching will come to the rescue.

3. Is caching a silver bullet?

You might think that since caching is so good, more must be better: whenever something is slow, just cache it?

No matter how good something looks, it has a negative side. Caching also comes with a number of side effects to consider. Besides the cache-update and data-consistency issues mentioned above, there are problems such as:

1. Cache avalanche

2. Cache penetration

3. Cache concurrency

4. Cache bottomless pit

5. Cache obsolescence

6. …

Brother Z will analyze these in depth with you in the next article.

Conclusion

All right, let's wrap things up. This article first introduced three uses of caching: speeding up data reads, prefetch, and deferred write.

After that, it laid out the locations in a complete system where caches can be set up, and shared some experience on using the browser cache, CDN cache, and gateway (proxy) cache.