Memcache is one of the most widely used KV caches in layered Internet architectures, and memcache questions come up in almost every interview. How well can you answer them?

Voiceover: It may well determine the level of offer you get.

The first type of question: do you know it? (what)

This kind of question checks whether the candidate has actually used the tool and knows its basics; it is relatively easy to answer.

Some basic features of memcache are as follows:

(1) The core function of MC is KV memory management; the maximum value size is 1MB, and it does not support complex data structures (hash, list, set, sorted set, etc.);

(2) MC does not support persistence;

(3) MC supports key expiration;

(4) Memory fragmentation rarely occurs when MC runs continuously, and the speed does not decrease with the service running time;

(5) MC uses a non-blocking I/O multiplexing network model, with a listener-thread/worker-thread multi-threading model;

When faced with these closed questions, be sure to answer them firmly and without hesitation.

The second type of question: why is it like that? (why)

This kind of question checks whether, for a tool they use, the candidate stays at the usage level or has thought about the design rationale behind it.

Why doesn’t Memcache support complex data structures? Why not support persistence?

Business needs determine the technical solution. MC was born with the design goal of “managing KV memory as a service rather than as a library”; what it set out to replace was the KV memory-management component library. Complex data structures and persistence were simply never part of its original intent.

Of course, “replace” may not be exactly the right word; libraries and services each have their own use cases, although in a distributed environment services are more widely applicable. The design goal and the background a tool was born into matter a great deal, since they largely determine the implementation. Redis, for example, emerged to provide a better, more feature-rich cache service.

Voiceover: I love asking this kind of question; most candidates are at a loss when faced with a question that has no standard answer.

What technology does memcache use to implement key expiration?

Lazy expiration.

How does memcache guarantee performance while producing almost no memory fragmentation?

By pre-allocating memory.

Why does memcache use a non-blocking I/O multiplexing network model? What are the advantages and disadvantages of the listener-thread/worker-thread model?

The goal is to improve throughput.

Multi-threading makes full use of multiple cores, but it introduces some lock contention.
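To make the threading structure concrete, here is a simplified sketch of a listener thread that accepts connections and hands them to worker threads, each running its own epoll loop. memcached itself is built on libevent with a more elaborate handoff, so treat this as an illustration of the model only; the port, worker count, and echo handling are arbitrary assumptions.

```c
/* Simplified sketch of a listener-thread / worker-thread model over epoll.
 * Illustrative only; error handling is omitted for brevity. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#define N_WORKERS 4

static int notify_pipe[N_WORKERS][2];      /* listener -> worker connection handoff */

/* Each worker owns its connections and runs its own epoll loop,
 * so request handling scales across cores. */
static void *worker_loop(void *arg) {
    long id = (long)arg;
    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = notify_pipe[id][0] };
    epoll_ctl(ep, EPOLL_CTL_ADD, notify_pipe[id][0], &ev);

    struct epoll_event events[64];
    for (;;) {
        int n = epoll_wait(ep, events, 64, -1);
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            if (fd == notify_pipe[id][0]) {            /* new connection from listener */
                int conn;
                if (read(fd, &conn, sizeof(conn)) != sizeof(conn)) continue;
                struct epoll_event cev = { .events = EPOLLIN, .data.fd = conn };
                epoll_ctl(ep, EPOLL_CTL_ADD, conn, &cev);
            } else {                                   /* request on an owned connection */
                char buf[1024];
                ssize_t r = read(fd, buf, sizeof(buf));
                if (r <= 0) { epoll_ctl(ep, EPOLL_CTL_DEL, fd, NULL); close(fd); }
                else        write(fd, buf, (size_t)r); /* echo stands in for a real reply */
            }
        }
    }
    return NULL;
}

int main(void) {
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { 0 };
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(11311);
    bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(listen_fd, 128);

    pthread_t tid;
    for (long i = 0; i < N_WORKERS; i++) {
        pipe(notify_pipe[i]);
        pthread_create(&tid, NULL, worker_loop, (void *)i);
    }

    /* The listener (main thread here) only accepts connections and hands
     * them out to workers round-robin; the workers do all the I/O. */
    for (long next = 0;; next = (next + 1) % N_WORKERS) {
        int conn = accept(listen_fd, NULL, NULL);
        if (conn >= 0) write(notify_pipe[next][1], &conn, sizeof(conn));
    }
}
```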

When facing this kind of semi-open question, some of which have no standard answer, you must present your own thinking and point of view.

The third type of question: how does it work? (how)

This kind of question tests how deeply the candidate understands the technology, and how curious they are about it.

Voiceover: Curiosity really matters here, and it is hard for engineers who just want “a job” to have that kind of curiosity.

How does memcache implement memory management to reduce memory fragmentation, and how does it allocate memory?

Before we get started, let’s explain a few very important concepts:

Chunk: the smallest unit of memory allocated to store user data.

Item: the data the user wants to store, including key and value; an item is ultimately stored in a chunk.

Slab: manages a group of chunks of one fixed size; MC’s memory management is made up of several slabs.

Voiceover: To avoid complexity, the concept of page is not introduced in this article.

As shown in the figure above, a series of slabs manage chunks of 128B, 256B, 512B, and so on.

Zooming in on slab0, which manages the 128B chunks in the figure above:

The core data structures inside a slab are:

  • chunk_size: the size of the chunks this slab manages (128B here)

  • free_chunk_list: used to quickly find a free chunk

  • chunk[]: the pre-allocated chunk space that actually stores user item data

Voiceover: There is actually an lru_list as well.


If a user wants to store a 100B item, how is an available chunk found?

The slab whose chunk size is closest to (and no smaller than) the item size is chosen, and a free chunk is quickly taken from its chunk[] via the free_chunk_list. As shown in the figure above, the chunk size closest to a 100B item is 128B.

Why is there no memory fragmentation?

A 128B chunk is taken to store the 100B item, and the remaining 28B is never used by any other item. In other words, some storage space is deliberately wasted in exchange for low memory fragmentation and guaranteed access speed.

Voiceover: In theory, memory fragmentation is almost non-existent.
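To make the idea concrete, here is a minimal sketch in C of slab classes, chunks, and allocation from a free_chunk_list. The structures and field names follow the description above and are purely illustrative; this is not memcached’s actual source.

```c
/* Minimal sketch of slab/chunk allocation as described above. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct chunk {
    struct chunk *next;       /* links free chunks into the free_chunk_list */
    char data[];              /* item (key + value + metadata) is stored here */
} chunk_t;

typedef struct slab {
    size_t chunk_size;        /* e.g. 128B: every chunk in this slab has this size */
    chunk_t *free_chunk_list; /* quick access to a free chunk */
} slab_t;

#define N_SLABS 3
static slab_t slabs[N_SLABS]; /* 128B, 256B, 512B classes for the example */

/* Pre-allocate all chunks of one slab up front: memory is carved into
 * fixed-size chunks ahead of time, which is why fragmentation stays low. */
static void slab_init(slab_t *s, size_t chunk_size, int n_chunks) {
    s->chunk_size = chunk_size;
    s->free_chunk_list = NULL;
    for (int i = 0; i < n_chunks; i++) {
        chunk_t *c = malloc(sizeof(chunk_t) + chunk_size);
        c->next = s->free_chunk_list;
        s->free_chunk_list = c;
    }
}

/* Pick the slab whose chunk_size is the smallest one >= item_size,
 * then pop a free chunk from its free_chunk_list. */
static chunk_t *alloc_chunk(size_t item_size) {
    for (int i = 0; i < N_SLABS; i++) {
        if (slabs[i].chunk_size >= item_size && slabs[i].free_chunk_list) {
            chunk_t *c = slabs[i].free_chunk_list;
            slabs[i].free_chunk_list = c->next;
            return c;
        }
    }
    return NULL;   /* no free chunk: this is where LRU eviction would kick in */
}

int main(void) {
    slab_init(&slabs[0], 128, 1024);
    slab_init(&slabs[1], 256, 1024);
    slab_init(&slabs[2], 512, 1024);

    chunk_t *c = alloc_chunk(100);        /* a 100B item lands in a 128B chunk */
    memcpy(c->data, "key=value", 10);     /* the remaining 28B simply go unused */
    printf("100B item stored in a 128B chunk\n");
    return 0;
}
```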

Memcache uses slab, chunk, and free_chunk_list to allocate memory quickly and store user items. How does it implement fast key lookup?

There is no special algorithm:

  • Fast search through the Hash table

  • Conflict resolution through linked lists

In the simplest possible way, it achieves fast key lookup (a minimal sketch follows).
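Here is that combination in miniature: a hash table with linked-list chaining. The names (hashtable, h_next) and the djb2 hash are illustrative stand-ins, not memcache’s actual implementation.

```c
/* Minimal sketch of key lookup: hash table + linked-list chaining. */
#include <stdio.h>
#include <string.h>

#define HASHSIZE 1024

typedef struct item {
    char key[64];
    char value[64];
    struct item *h_next;    /* next item in the same bucket (conflict chain) */
} item_t;

static item_t *hashtable[HASHSIZE];

/* Any decent string hash works; djb2 is used here just for illustration. */
static unsigned long hash(const char *key) {
    unsigned long h = 5381;
    while (*key) h = h * 33 + (unsigned char)*key++;
    return h;
}

static void ht_insert(item_t *it) {
    unsigned long b = hash(it->key) % HASHSIZE;
    it->h_next = hashtable[b];        /* push onto the bucket's chain */
    hashtable[b] = it;
}

static item_t *ht_get(const char *key) {
    unsigned long b = hash(key) % HASHSIZE;
    for (item_t *it = hashtable[b]; it; it = it->h_next)
        if (strcmp(it->key, key) == 0)
            return it;                /* hit */
    return NULL;                      /* miss */
}

int main(void) {
    static item_t it = { "user:123", "Alice", NULL };
    ht_insert(&it);
    item_t *found = ht_get("user:123");
    printf("%s\n", found ? found->value : "cache miss");
    return 0;
}
```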

As the number of items increases, the number of hash conflicts increases. How can hash tables ensure query efficiency?

When the number of items reaches 1.5 times the hash table length, the hash table is dynamically expanded and the data is rehashed, so that lookup efficiency does not degrade.

After expansion, a key may live in either the old or the new hash table while data is being migrated. How are data consistency and service availability guaranteed during the migration?

Hash table expansion and data migration are time-consuming operations, so they are carried out by a dedicated thread; to avoid one big lock, a “segmented migration” strategy is used.

When the number of items reaches the threshold, the migration thread migrates data segment by segment: it locks a few buckets of the hash table, migrates their data, and unlocks them (see the sketch after this list). This:

  • ensures there is no prolonged blocking that would hurt service availability

  • ensures an item never ends up inconsistent between the old and new hash tables
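Below is a minimal sketch of segmented migration: a dedicated thread moves a small batch of buckets at a time, holding the lock only for that batch. The segment size, table sizes, and names (migrated_upto, etc.) are illustrative assumptions, not memcached’s actual code.

```c
/* Minimal sketch of segmented hash-table migration. Compile with -pthread. */
#include <pthread.h>

#define OLD_SIZE 1024
#define NEW_SIZE 2048            /* expanded table */
#define SEGMENT  64              /* buckets migrated per locked step */

typedef struct item { char key[64]; struct item *h_next; } item_t;

static item_t *old_table[OLD_SIZE];
static item_t *new_table[NEW_SIZE];
static int migrated_upto = 0;    /* old-table buckets [0, migrated_upto) are done */
static pthread_mutex_t bucket_lock = PTHREAD_MUTEX_INITIALIZER;

static unsigned long hash(const char *key) {
    unsigned long h = 5381;
    while (*key) h = h * 33 + (unsigned char)*key++;
    return h;
}

/* Run by a dedicated migration thread: move SEGMENT buckets at a time,
 * holding the lock only while that small segment is being moved. */
static void *migrate_thread(void *arg) {
    (void)arg;
    while (migrated_upto < OLD_SIZE) {
        pthread_mutex_lock(&bucket_lock);
        int end = migrated_upto + SEGMENT;
        if (end > OLD_SIZE) end = OLD_SIZE;
        for (int b = migrated_upto; b < end; b++) {
            item_t *it = old_table[b];
            while (it) {                         /* rehash every item in bucket b */
                item_t *next = it->h_next;
                unsigned long nb = hash(it->key) % NEW_SIZE;
                it->h_next = new_table[nb];
                new_table[nb] = it;
                it = next;
            }
            old_table[b] = NULL;
        }
        migrated_upto = end;
        pthread_mutex_unlock(&bucket_lock);      /* readers/writers get in between */
    }
    return NULL;
}

int main(void) {
    static item_t a = { .key = "user:123" };
    old_table[hash(a.key) % OLD_SIZE] = &a;      /* pretend the old table has data */

    pthread_t t;
    pthread_create(&t, NULL, migrate_thread, NULL);
    pthread_join(t, NULL);                       /* now "user:123" lives in new_table */
    return 0;
}
```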

If a new item is inserted while migration is in progress, should it go into the old hash table or the new one?

Memcache checks whether the old-table bucket the item hashes into has already been migrated to the new table:

  • If it has been migrated, the item is inserted directly into the new hash table

  • If it has not been migrated, the item is inserted into the old hash table, and the migration thread will move it to the new table later

Why do it this way? Can’t we just always insert into the new hash table?

Because this guarantees that the data of any given bucket lives in exactly one hash table, either the new one or the old one; in no scenario can data from the same bucket exist in both tables at once.
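A minimal sketch of that insert-routing rule: the insert path only has to compare the item’s old-table bucket against the migration progress marker. Again, the names and sizes are illustrative assumptions.

```c
/* Minimal sketch: routing an insert while migration is in progress. */
#define OLD_SIZE 1024
#define NEW_SIZE 2048

typedef struct item { char key[64]; struct item *h_next; } item_t;

static item_t *old_table[OLD_SIZE];
static item_t *new_table[NEW_SIZE];
static int migrated_upto;     /* old-table buckets [0, migrated_upto) already moved */

static unsigned long hash(const char *key) {
    unsigned long h = 5381;
    while (*key) h = h * 33 + (unsigned char)*key++;
    return h;
}

/* A bucket's data only ever lives in one table: if the item's old bucket has
 * already been migrated it goes to the new table, otherwise it stays in the
 * old one and the migration thread will move it later. */
static void ht_insert(item_t *it) {
    unsigned long old_b = hash(it->key) % OLD_SIZE;
    if ((int)old_b < migrated_upto) {
        unsigned long nb = hash(it->key) % NEW_SIZE;
        it->h_next = new_table[nb];
        new_table[nb] = it;
    } else {
        it->h_next = old_table[old_b];
        old_table[old_b] = it;
    }
}

int main(void) {
    static item_t it = { .key = "user:123" };
    migrated_upto = 0;        /* nothing migrated yet: this insert goes to old_table */
    ht_insert(&it);
    return 0;
}
```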

Memcache expires keys through lazy expiration.

The two most common ways to implement timeouts and expiration are:

  • Start a timer thread that scans all items and, when it finds an expired one, runs the timeout handling

  • Attach a timeout signal/notification to each item that triggers the timeout handling when it fires

Both methods require additional resource consumption.

MC’s query semantics are very simple: a query returns either a cache hit or a cache miss. In this scenario, lazy expiration is a very good fit.

The core of lazy expiration is:

  • Items are never actively evicted, i.e. there is no timer thread and no signal notification doing active checks

  • Every time an item is queried (get), its timestamp is checked; if it has expired, it is passively evicted and a cache miss is returned

For example, if we set a key with a validity period of 100s:

  • At 50s, a user gets the key; it is judged to still be valid, and the corresponding value is returned

  • At 200s, another user gets the key; it is judged to have expired, the chunk holding the item is released, and a cache miss is returned

This approach is extremely cheap to implement and consumes very few resources (a minimal sketch follows the list):

  • An expiration-time attribute is added to each item

  • A time check is added to the get path
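Here is a minimal sketch of lazy expiration in C: no timer thread, no per-item signal, just an expiration timestamp checked on the get path. The one-slot store and all names are illustrative assumptions, not memcached’s actual code.

```c
/* Minimal sketch of lazy expiration: the expiry check lives on the get path. */
#include <stdio.h>
#include <string.h>
#include <time.h>

typedef struct item {
    char   key[64];
    char   value[64];
    time_t exp_time;          /* absolute expiration timestamp; 0 = never expires */
    int    in_use;            /* stands in for "chunk is occupied" */
} item_t;

static item_t store[1];       /* a one-slot "cache" is enough for the sketch */

static void cache_set(const char *key, const char *value, int ttl_seconds) {
    item_t *it = &store[0];
    strncpy(it->key, key, sizeof(it->key) - 1);
    strncpy(it->value, value, sizeof(it->value) - 1);
    it->exp_time = ttl_seconds ? time(NULL) + ttl_seconds : 0;
    it->in_use = 1;
}

/* get: if the item's timestamp says it has expired, free its chunk
 * (here just mark it unused) and report a cache miss. */
static const char *cache_get(const char *key) {
    item_t *it = &store[0];
    if (!it->in_use || strcmp(it->key, key) != 0)
        return NULL;                                 /* miss */
    if (it->exp_time != 0 && time(NULL) >= it->exp_time) {
        it->in_use = 0;                              /* passive eviction */
        return NULL;                                 /* miss */
    }
    return it->value;                                /* hit */
}

int main(void) {
    cache_set("session:42", "alice", 100);           /* valid for 100s */
    const char *v = cache_get("session:42");         /* within 100s: hit */
    printf("%s\n", v ? v : "cache miss");
    return 0;
}
```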

Memory is always finite. The number of chunks is limited, so the number of items that can be stored is limited. What happens when the chunks run out?

If all 128B chunks are in use and a user sets another 100B item, should an existing item be evicted?

Yes, it should.

The implication here is:

(1) An item may be evicted even if its expiration time is set to “never expire”;

(2) If you want to cache a full data set, you must carefully size the cache memory; it must be larger than the total size of the full data set, otherwise you can easily get burned;

Which item gets evicted? And how?

This is where the LRU eviction mechanism comes in.

As in operating-system memory management, the most common eviction algorithms are FIFO and LRU:

  • FIFO (first in, first out): the item that was set earliest is evicted first

  • LRU (least recently used): the item that has been used least recently is evicted first

To evict items with the LRU algorithm, two attributes need to be added to each item:

  • A count of recent accesses

  • The time of the last access

Together with an LRU linked list, eviction can be implemented quickly (see the sketch below).

Voiceover: So each slab manages its chunks not only with a free_chunk_list, but also with an lru_list.
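Below is a minimal sketch of per-slab LRU eviction: a doubly linked lru_list plus a last-access timestamp, with the tail of the list evicted when no free chunk is left. All names are illustrative, not memcached’s actual source.

```c
/* Minimal sketch of LRU eviction: evict the tail when no free chunk remains. */
#include <stdio.h>
#include <string.h>
#include <time.h>

typedef struct item {
    char key[64];
    char value[64];
    time_t last_access;           /* updated on every get */
    struct item *lru_prev, *lru_next;
} item_t;

/* lru_head = most recently used, lru_tail = least recently used. */
static item_t *lru_head, *lru_tail;

static void lru_unlink(item_t *it) {
    if (it->lru_prev) it->lru_prev->lru_next = it->lru_next; else lru_head = it->lru_next;
    if (it->lru_next) it->lru_next->lru_prev = it->lru_prev; else lru_tail = it->lru_prev;
    it->lru_prev = it->lru_next = NULL;
}

static void lru_push_front(item_t *it) {
    it->lru_prev = NULL;
    it->lru_next = lru_head;
    if (lru_head) lru_head->lru_prev = it;
    lru_head = it;
    if (!lru_tail) lru_tail = it;
}

/* Called on every get: bump the item to the front of the LRU list. */
static void touch(item_t *it) {
    it->last_access = time(NULL);
    lru_unlink(it);
    lru_push_front(it);
}

/* Called when the slab has no free chunk: evict the least recently used item,
 * even if that item was stored with a "never expire" flag. */
static item_t *evict_one(void) {
    item_t *victim = lru_tail;
    if (victim) lru_unlink(victim);
    return victim;               /* its chunk can now hold the new item */
}

int main(void) {
    static item_t a, b;
    strcpy(a.key, "a"); strcpy(b.key, "b");
    lru_push_front(&a);
    lru_push_front(&b);          /* order: b (MRU), a (LRU) */
    touch(&a);                   /* a becomes most recently used */
    printf("evicted: %s\n", evict_one()->key);   /* prints "b" */
    return 0;
}
```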

Thinking is more important than conclusions.


The Architect’s Path – sharing technical thinking

This article is on the long side; if you found it useful, please help by forwarding it and giving it a like.