Building a read-through cache using a CDN

Aritra Das

Caching is almost inevitable when you build APIs for systems that require high throughput. Every developer working on a distributed system uses some caching mechanism at some point. In this article, we'll look at a design that builds a read-through cache using a CDN. This approach not only optimizes the API but also reduces infrastructure costs.

Some knowledge of caches and CDNs will help you understand this article. If you haven't read about them at all, it's worth reading up a little before coming back to this article.

A bit of background

As of this writing, I'm working at Glance. Glance is the world's largest lock-screen-based content discovery platform, with about 140 million active users. So you can imagine the scale at which we need to run our services, and at that scale even the simplest things can get super complicated. As backend developers, we strive to build highly optimized APIs to provide a good user experience. The story that follows is about how we ran into one such problem and how we solved it. I hope that after reading this article you'll have learned something about large-scale system design.

The problem

We needed to develop some APIs with the following characteristics:

  1. The data doesn't change very often.
  2. The response is the same for all users: no unexpected query parameters, just simple, direct access to the API.
  3. The maximum response size is 600 kB.
  4. We expect the API to sustain very high throughput (approximately 50,000-60,000 requests per second).

So when you first saw this problem, what did you think? For me, it was: just add an in-memory cache (probably Google Guava) on each API node (since the data volume is small), send invalidation messages over Kafka (because I like Kafka 😆 and it's reliable), and set up autoscaling for the service instances (because traffic isn't uniform throughout the day). As shown in the figure:
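To make that first idea concrete, here's a rough sketch (not our actual code; the topic name and types are made up for illustration) of a node-local Guava cache whose entries get evicted when an invalidation message arrives on a Kafka topic:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.TimeUnit;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class NodeLocalCache {

    // Small node-local cache; entries also expire on their own as a safety net.
    private final Cache<String, String> cache = CacheBuilder.newBuilder()
            .maximumSize(10_000)
            .expireAfterWrite(10, TimeUnit.MINUTES)
            .build();

    public String get(String key) {
        return cache.getIfPresent(key); // null on a miss; the caller then goes to the DB
    }

    public void put(String key, String value) {
        cache.put(key, value);
    }

    // Runs on a background thread: each message on the (hypothetical)
    // "cache-invalidation" topic carries the key that should be evicted.
    public void listenForInvalidations(Properties kafkaProps) {
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(kafkaProps)) {
            consumer.subscribe(Collections.singletonList("cache-invalidation"));
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                    cache.invalidate(record.value());
                }
            }
        }
    }
}
```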

Bam! Problem solved! Easy, right? Well, not really. Like any other design, this one has some flaws. For example, the design is a little too complex for a simple use case, and the infrastructure cost goes up: we would now need to run a Kafka + ZooKeeper cluster, and to serve 50,000 requests per second we would need to scale our service instances horizontally (Kubernetes pods, in our case). That translates into more bare-metal nodes or virtual machines.

So we looked for something simpler and more cost-effective, which is why we ended up with the “read-through cache on a CDN” design. I'll discuss the architectural details and trade-offs later.


But before going any further, let's look at the building blocks of the design.

Read-through caching

The standard cache update strategies look like this:

  1. Cache-aside
  2. Read-through
  3. Write-through
  4. Write-back
  5. Refresh-ahead

I won't go into the details of the other strategies, but will instead focus on read-through, which is, after all, what this article is about. Let's dig a little deeper to see how it works.

(Figure: user1 is just an imaginary key being fetched.)

The figure above is self-explanatory and boils down to the following:

  1. The application never interacts directly with the database; it always goes through the cache.
  2. On a cache miss, the data is read from the database and the cache is populated with it.
  3. On a cache hit, the data is served straight from the cache.

As you can see, the database is accessed infrequently, and since caches are usually in-memory (Redis/Memcached), responses are very fast. Now quite a few of our problems are solved 😅
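To make the pattern concrete, here's a minimal read-through sketch using Guava's LoadingCache; the Database interface below is just a stand-in for whatever your real data access layer looks like:

```java
import java.util.concurrent.TimeUnit;

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

public class UserProfileCache {

    /** Stand-in for the real data access layer. */
    public interface Database {
        String fetchUserJson(String userId);
    }

    private final LoadingCache<String, String> cache;

    public UserProfileCache(Database db) {
        this.cache = CacheBuilder.newBuilder()
                .expireAfterWrite(3, TimeUnit.MINUTES)
                .build(new CacheLoader<String, String>() {
                    @Override
                    public String load(String userId) {
                        // Called only on a cache miss; the result is stored in the
                        // cache before being returned to the caller.
                        return db.fetchUserJson(userId);
                    }
                });
    }

    public String getUser(String userId) {
        // Read-through: ask the cache first, hit the database only when needed.
        return cache.getUnchecked(userId);
    }
}
```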

CDN

The textbook definition of a CDN goes something like this: “A content delivery network (CDN) is a globally distributed network of proxy servers that serves content from locations close to the user, typically static files such as images, videos, HTML, CSS, and so on.” But we'll use the CDN the other way around and serve dynamic content to the user (JSON responses rather than static JSON files).

There are also generally two kinds of CDN:

  1. Push CDN: you upload the data to the CDN servers yourself.
  2. Pull CDN: the CDN pulls the data from your server (the origin server).

We'll be using a pull CDN, because with a push CDN I would have to handle retries and so on myself, which is extra pain for me and doesn't really add value for this use case.

Using the CDN as a read-through cache

The idea is simple: we use the CDN as a caching layer between the user and the actual back-end service.

As you can see, the CDN sits between the client and the back-end service and acts as the cache. The data flow sequence looks like this:

Let's dig deeper into it, because this is the crux of the design.

Abbreviations used below:

T1 -> time instant 1, plus a few milliseconds

T2 -> time instant 1 + one minute, plus a few milliseconds

TTL -> time to live

Origin server -> your actual back-end service, in this case

  1. T1: The client requests user1.
  2. T1: The request lands on the CDN.
  3. T1: The CDN finds no key for user1 in its cache.
  4. T1: The CDN makes a request upstream, that is, to your actual back-end server, to fetch user1.
  5. T1: The back-end service returns the user1 response as standard JSON.
  6. T1: The CDN receives the JSON and stores it.
  7. Now it needs to decide what the TTL for this data should be. How does it do that?
  8. There are usually two ways: either the origin server specifies how long the data may be cached, or a constant value is set in the CDN configuration, and that value is used as the TTL.
  9. It is better to let the origin server set the TTL, so that we can control it, or even set it conditionally, however we like.
  10. Now the question is how the origin server specifies the TTL. The Cache-Control header does the work here. The response from the origin server can carry a Cache-Control header such as Cache-Control: public, max-age=180, indicating that the CDN may cache this data publicly and that it is valid for 180 seconds (see the sketch after this list).
  11. T1: The CDN now has this information and caches the data with a TTL of 180 seconds.
  12. T1: The CDN responds to the client with the user1 JSON.
  13. T2: Another client requests user1.
  14. T2: The request lands on the CDN.
  15. T2: The CDN sees that it has the user1 key in its store, so it does not ask the origin server for the JSON; it serves it from the cache.
  16. T3: The cached entry on the CDN expires after 180 seconds.
  17. T4: Some other client requests user1, but since the cache is now empty, the CDN repeats the flow from step 3 onwards, and so on.
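To illustrate step 10, here's how an origin server could set that header in a Spring-style controller (the endpoint and payload below are made up; any framework that lets you set response headers works the same way):

```java
import java.util.concurrent.TimeUnit;

import org.springframework.http.CacheControl;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class UserController {

    @GetMapping("/users/{id}")
    public ResponseEntity<String> getUser(@PathVariable String id) {
        String json = loadUserJson(id); // hypothetical lookup against your data store

        // "Cache-Control: public, max-age=180" tells the CDN it may cache this
        // response for any user and keep serving it for 180 seconds.
        return ResponseEntity.ok()
                .cacheControl(CacheControl.maxAge(180, TimeUnit.SECONDS).cachePublic())
                .body(json);
    }

    private String loadUserJson(String id) {
        return "{\"id\":\"" + id + "\"}"; // placeholder payload
    }
}
```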

You don't have to keep the TTL at 180 seconds; choose the TTL based on how long you can afford to serve stale data. If that raises the question “why not just invalidate the cache when the data changes?”, I'll answer it later, in the disadvantages section.

Implementation

So far we've been talking about the design without really getting into the actual implementation. That's deliberate: the design is simple and can be implemented on pretty much any setup. For us, the CDN is on Google Cloud and the Kubernetes cluster running the back-end services is on Azure, so we set things up to fit our needs. You might instead do this on, say, Cloudflare CDN, so I'll skip the implementation details and keep things abstract. But just for the curious, this is how we set up our production system.

If you don't follow this part, that's fine; once you understand the concepts, the actual setup is a piece of cake.

Here’s an excellent document from Google Cloud to get you started.


Request coalescing

[This section was added after Abhishek Singla raised this point in the comments.]

But there was still a problem: the CDN handles all the load for us, so we haven't provisioned the origin to scale. Our API is expected to serve 60K QPS, which means that on a cache miss all 60K requests could go straight to our origin server (considering that it takes about a second to populate the CDN cache), and that might overwhelm the service, right?

This is where request coalescing comes in.

As the name suggests, it essentially merges multiple requests for the same resource (same path and query parameters) so that only a small number of requests actually reach the origin server.

The beauty of this design is that we don't have to do the coalescing ourselves; the CDN does it for us. As I already mentioned, we use Google Cloud CDN, which has a concept of request coalescing (just another name for request collapsing). When a large number of cache-fill requests are issued at the same time, the CDN recognizes the situation, each CDN node sends only one request to the origin server, and it answers all the pending requests with the resulting response. That is how it protects our origin servers from high traffic.
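We let the CDN do this for us, but just to show the idea, here's a tiny application-side sketch of the same technique (hypothetical, not how Google Cloud CDN implements it): concurrent callers asking for the same key share a single in-flight call to the origin:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public class RequestCoalescer<V> {

    // One in-flight future per key; everyone asking for the same key awaits the same call.
    private final ConcurrentHashMap<String, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();

    public CompletableFuture<V> get(String key, Supplier<V> originCall) {
        CompletableFuture<V> future = inFlight.computeIfAbsent(key,
                k -> CompletableFuture.supplyAsync(originCall));
        // Once the shared call completes, drop it so the next miss triggers a fresh fetch.
        future.whenComplete((value, error) -> inFlight.remove(key, future));
        return future;
    }
}
```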


Well, we're almost at the end of this article. No design is complete without an analysis of its pros and cons, so let's take a quick look at the design and see where it helps us and where it falls short.

Advantages of the design

  1. **Simplicity:** This design is super simple, and easy to implement and maintain.
  2. **Response time:** CDN servers are geographically placed to optimize data transfer, and as a result our response times became super fast. For example, isn't ~60 ms (ignoring TCP connection setup time) great?
  3. **Reduced load:** Since the actual back-end servers now see a request only about once every 180 seconds, the load on them is extremely low.
  4. **Infrastructure cost:** If we hadn't done this, we would have had to scale our infrastructure to handle this load, which carries significant costs. But Glance has already invested heavily in its CDN; since we are a content platform, why not make use of it? The added cost of supporting these APIs is now negligible.

Disadvantages of the design

  1. **Cache invalidation:** Cache invalidation is one of the hardest problems in computer science, and it gets even harder when the CDN is the cache. Any burst cache invalidation on a CDN is a costly operation and generally does not happen in real time. If your data changes, your clients may see stale data for a while, depending on your TTL, because we can't instantly invalidate the cache on the CDN. If your TTL is set to a few hours, explicitly invalidating the CDN cache can still work, but if the TTL is a few seconds or minutes, that's a big problem! Also keep in mind that not all CDN providers expose APIs for invalidating the CDN cache.
  2. **Lack of control:** Since requests no longer reach our servers directly, we feel that, as developers, we have less control over the system (or maybe I'm just a control freak 😈 who wants to control everything). Observability may also suffer slightly; we can always set up logging and monitoring on the CDN, but that usually comes at an extra cost.

A few closing thoughts

Any design in the distributed world is somewhat subjective, and there are always trade-offs. It's our job as developers or architects to weigh them and pick the design that works for us. That said, no design is set in stone forever: we choose a certain design given today's constraints, and depending on how well it keeps serving us, we may evolve it further.


Thanks for reading!
