The biggest advantage of using mature third-party solutions is that, besides saving our own research and development costs, we can also find plenty of documentation on the Internet, which is very helpful when solving day-to-day problems.

At present, the popular third-party Cache solutions mainly include Memcached, an object-based distributed memory Cache, and Berkeley DB, an embeddable database programming library. Below I analyze each of these solutions and discuss the architectures built around them.

For many readers, Memcached is nothing new; its popularity is not far behind that of MySQL itself. Memcached is so popular for several reasons:

- The communication protocol is simple and the API is clear.
- It uses an efficient Cache algorithm and a libevent-based event-handling mechanism, giving it excellent performance.
- Its object-oriented style of data access is very friendly to application developers.
- All data is stored in memory, so data access is fast.
- It is open source software, released under the BSD license.

I won't go too deep into the details of Memcached itself, since that is not the focus of this article. Let's focus instead on how Memcached can help scale your data service layer in ways that would be hard to achieve with the database alone.
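To give a concrete feel for the "simple protocol, clear API" point above, here is a minimal sketch using the third-party pymemcache Python client; the server address and key are purely illustrative.

```python
# Minimal illustration of Memcached's key/value API via the third-party
# pymemcache client. The server address and key are illustrative.
from pymemcache.client.base import Client

mc = Client(("127.0.0.1", 11211))

mc.set("greeting", "hello from memcached", expire=60)  # store with a 60s expiry
print(mc.get("greeting"))                              # -> b'hello from memcached'
mc.delete("greeting")                                  # drop the entry explicitly
```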

To integrate Memcached into your system architecture, you first need to decide what role it will play in your application. Is it merely a Cache tool to improve the performance of the data service, or should it be integrated more tightly with the MySQL database to form a more efficient and ideal data service layer?

If we use Memcached purely as Cache software to improve system performance, the application itself has to keep the data in Memcached synchronized with the database. In this case Memcached can be understood as a Cache layer sitting in front of the MySQL database.

If you use Memcached simply as a data Cache for your application, nothing needs to change on the MySQL side; the application alone maintains and updates the Cache. The greatest advantage of this approach is that the database architecture is completely untouched. The disadvantage is that as more types of data objects need to be cached, the application needs correspondingly more code, and system complexity and maintenance cost rise sharply.

Here is a schematic of the architecture of Memcached as a simple Cache service layer.

As the figure shows, all writes go to the MySQL Master: the INSERT of new data as well as the UPDATE and DELETE of existing data. For existing data, however, you must DELETE the corresponding Memcached entry at the same time as you UPDATE or DELETE the MySQL data, to keep the overall data consistent. All read requests go to Memcached first; on a hit the data is returned directly, and on a miss the data is read from a MySQL Slave and then written into Memcached as a new Cache entry.
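As a rough sketch of that read/write flow (read Memcached first, fall back to a Slave on a miss, write the result back into the Cache, and delete the cached entry whenever the Master is updated), assuming the pymemcache and PyMySQL client libraries and an illustrative users table:

```python
import json
from pymemcache.client.base import Client
import pymysql

mc = Client(("127.0.0.1", 11211))
# Illustrative connections: reads go to a Slave, writes to the Master.
slave  = pymysql.connect(host="slave-host",  user="app", password="secret", database="appdb")
master = pymysql.connect(host="master-host", user="app", password="secret", database="appdb")

def get_user(user_id):
    key = "user:%d" % user_id
    cached = mc.get(key)
    if cached is not None:                      # Cache hit: return directly
        return json.loads(cached)
    with slave.cursor(pymysql.cursors.DictCursor) as cur:   # miss: read from a Slave
        cur.execute("SELECT id, name, email FROM users WHERE id = %s", (user_id,))
        row = cur.fetchone()
    if row is not None:                         # write the result back into Memcached
        mc.set(key, json.dumps(row), expire=300)
    return row

def update_user_email(user_id, email):
    with master.cursor() as cur:                # all writes go to the Master
        cur.execute("UPDATE users SET email = %s WHERE id = %s", (email, user_id))
    master.commit()
    mc.delete("user:%d" % user_id)              # invalidate the stale Cache entry
```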

Generally speaking, this usage mode suits environments that cache only a few types of objects but large volumes of data, and it is a quick and effective way to solve performance problems. Since this architecture has little to do with the MySQL database itself, I won't go into further technical detail here.

In addition to using Memcached simply as a tool for quick wins, we can also use it to increase the scalability of the data service layer, either by integrating it with the database or by letting it serve as a buffer in front of the database.

Let's start by looking at how Memcached and the MySQL database can be integrated into a single data service. In general, there are two ways to do this. One is to use the memory of Memcached directly as a secondary cache for the MySQL database, thereby enlarging the cache available to MySQL Server. The other is to communicate with Memcached through MySQL UDFs, so that MySQL maintains and updates the data in Memcached while the application reads that data directly from Memcached.

The first approach mainly suits scenarios where the business requirements are quite particular, the data is genuinely hard to shard, and it is very difficult to adapt the application to use a Cache outside the database.

Of course, MySQL cannot do this out of the box, and very few solutions attempt it at the moment; the open source project Waffle Grid is exactly what we need here.

Waffle Grid is the spare-time brainchild of several DBAs abroad. Their reasoning went like this: since low-cost PC Servers are so attractive, yet their Scale Up capability is unlikely to make a big breakthrough, why not use the hugely popular Memcached to break through the memory limit of a single PC Server? With that in mind, a couple of engineers started the Waffle Grid open source project. Taking advantage of the fact that both MySQL and Memcached are open source, and of the simplicity of the Memcached communication protocol, they successfully implemented Memcached as an external “second level cache” for a MySQL host. Currently it only supports caching the InnoDB Buffer Pool.

The implementation of Waffle Grid is not very complicated. When InnoDB misses in its local Buffer Pool (let's call it the Local Buffer Pool), before reading from the disk data file it first tries to fetch the cached page from Memcached via the Memcached communication API (let's call that the Remote Buffer). Only if the desired data is not in the Remote Buffer either does InnoDB access the disk file. Furthermore, only data on the LRU List of the InnoDB Buffer Pool is sent to the Remote Buffer; as soon as a page is modified, InnoDB moves it to the Flush List, and Waffle Grid purges any page entering the Flush List from the Remote Buffer. Therefore there are never Dirty Pages in the Remote Buffer, which also ensures that no data can be lost if the Remote Buffer fails. The following diagram is a schematic of the architecture when using the Waffle Grid project:
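Waffle Grid itself lives inside InnoDB's C code as a patch, but the read path it adds can be sketched conceptually like this (plain Python, with dicts standing in for the buffer pools; all names are illustrative, not the actual patch):

```python
# Conceptual sketch only: plain dicts stand in for the InnoDB Local Buffer Pool
# and the Memcached-backed Remote Buffer; the real thing is a patch to InnoDB.
local_buffer_pool = {}   # the normal in-process InnoDB buffer pool
remote_buffer = {}       # stands in for Memcached (the "Remote Buffer")

def read_page(page_id, read_from_disk):
    page = local_buffer_pool.get(page_id)
    if page is not None:
        return page                          # 1. hit in the Local Buffer Pool
    page = remote_buffer.get(page_id)
    if page is not None:                     # 2. hit in the Remote Buffer (Memcached)
        local_buffer_pool[page_id] = page
        return page
    page = read_from_disk(page_id)           # 3. physical read from the data file
    local_buffer_pool[page_id] = page
    return page

# Only clean pages evicted from the local LRU List are sent to the Remote
# Buffer, and a page is purged from it as soon as it is modified (moves to the
# Flush List), so the Remote Buffer never contains Dirty Pages.
```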

As shown in the architecture diagram, the Waffle Grid patch is applied on the MySQL database side, and MySQL communicates with the Memcached servers over a separate connection. To guarantee the performance of this network communication, use a private network with as much bandwidth as possible between MySQL and Memcached.

In addition, the architecture diagram no longer distinguishes between database Master and Slave. That does not mean you cannot make the distinction; the diagram is merely schematic. In practice, most of the time you only need to apply Waffle Grid on the Slaves; the Master itself usually does not need such a large amount of memory.

Looking at the implementation principle of Waffle Grid, some readers may have a question: wouldn't this directly hurt the performance of every query that has to perform a physical read, since every access to the Remote Buffer goes over the network? Is that fast enough? To address this doubt, I quote the Waffle Grid authors' own benchmark data:

Judging from the DBT2 benchmark data, I don't think we need to worry too much about performance. As for whether Waffle Grid suits your own application scenario, that is for you, the reader, to assess.

Now let's look at the other way of integrating Memcached with MySQL: writing your own MySQL UDFs (user-defined functions) that communicate with Memcached.

What makes this approach different from Waffle Grid is that the data in Memcached is no longer maintained entirely by MySQL; it is maintained jointly by the application and MySQL. Each time the application reads data, it tries Memcached first; if the data it needs is not there, it reads from the database and writes what it has read into Memcached. MySQL, for its part, controls the invalidation of Memcached entries: whenever data is updated or deleted in the database, MySQL uses the user-written UDFs to call the Memcached API, notifying Memcached that the data is stale and deleting it.

Based on the above implementation principles, we can design a data service layer architecture as follows:

As the figure shows, the biggest difference from the plain Cache architecture above, in which Memcached is completely decoupled from MySQL, is that here the Memcached data is maintained and updated through the MySQL database rather than by the application. The application first writes data into MySQL; this fires the relevant user-written UDF on MySQL, which calls the Memcached communication interface and writes the data into Memcached. When data in MySQL is updated or deleted, the corresponding UDFs likewise update or delete the data in Memcached. Of course, we could also have MySQL do a little less: just delete the Memcached entry via a UDF when data is updated or deleted, and leave the writing of Cache entries to the application, as in the previous architecture.
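As a sketch of this "MySQL maintains Memcached" variant, the snippet below installs an AFTER UPDATE trigger whose body calls a memcached UDF to drop the stale key. The UDF names (memc_servers_set, memc_delete) are assumed to come from the Memcached Functions for MySQL UDF package and should be checked against whatever UDF library you actually install; the table and key scheme are illustrative.

```python
# Install a trigger that invalidates the Memcached entry for a row whenever
# that row is updated. memc_servers_set / memc_delete are assumed UDFs from
# the Memcached Functions for MySQL project; verify against your UDF package.
import pymysql

conn = pymysql.connect(host="master-host", user="app", password="secret", database="appdb")

with conn.cursor() as cur:
    # Tell the UDFs which Memcached servers to talk to (assumed UDF).
    cur.execute("SELECT memc_servers_set('cache1:11211,cache2:11211')")

    # Whenever a row in `users` changes, delete its cached copy so the next
    # read repopulates Memcached with fresh data.
    cur.execute("""
        CREATE TRIGGER users_au AFTER UPDATE ON users
        FOR EACH ROW
            SET @ignore = memc_delete(CONCAT('user:', OLD.id))
    """)
conn.commit()
```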

Because Memcached accesses data as objects and retrieves them through a hash, every piece of data stored in Memcached must be identified by a Key, and all access operations go through that Key. In other words, you cannot fetch a result set of multiple rows from one or more conditions the way a MySQL Query can; Memcached only supports reading a single data item through its unique Key.
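In practice this means the application has to design its own Key scheme, typically one Key per row; a small illustrative sketch with pymemcache:

```python
# Illustrative Key scheme: one Memcached entry per row, keyed by primary key.
from pymemcache.client.base import Client

mc = Client(("127.0.0.1", 11211))

row = mc.get("user:42")          # a single-row, primary-key lookup fits one Key

# A set-oriented query such as  SELECT * FROM users WHERE age > 30  has no
# natural Key, so it must still go to MySQL, unless the application caches the
# whole serialized result set under its own made-up Key:
result_set = mc.get("users:age_gt_30")
```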

To be honest, the name “database programming library” is a bit awkward, but I cannot find a better term for Berkeley DB, so let's just use the name most commonly seen on the Internet.

Memcached is a purely in-memory Cache. If the performance requirement is not quite so extreme and the budget is limited, we can instead choose Cache software backed by a database engine, such as Berkeley DB. Many readers may wonder why we would use a “database” like Berkeley DB when we are already using a MySQL database. In fact, Berkeley DB used to be one of MySQL's storage engines, but it was later dropped from the supported storage engines (largely owing to commercial competition, after Berkeley DB was acquired). The reason for using a database-style Cache like Berkeley DB alongside MySQL is to give full play to their respective strengths: while keeping the traditional general-purpose database, we can use Berkeley DB's efficient key-value storage as a performance supplement for fast data retrieval, and so achieve better data service layer scalability and higher overall performance.

The architecture of Berkeley DB can be divided into five functional modules. They are relatively independent of one another, and each can be enabled or disabled individually, so it might be more accurate to call them five subsystems. The five subsystems, briefly, are as follows:

- Data access (storage) subsystem: responsible for the primary and most basic work of storing and retrieving data. Berkeley DB supports four storage structures, Hash, B-Tree, fixed-length records (Queue) and variable-length records (Recno), which correspond to four actual data file formats. This subsystem can be used entirely on its own and is the one subsystem that must always be enabled.
- Transaction management subsystem: provides complete ACID transaction properties for data processing that needs transactions. When it is enabled, at least the lock management and logging subsystems must be enabled as well, in addition to the basic data access subsystem, to help guarantee transaction consistency and integrity.
- Lock management subsystem: provides shared data access control to guarantee data consistency. It supports row- and page-level locking and also serves the transaction management subsystem.
- Shared memory subsystem: manages and maintains the shared cache and buffers, providing data caching that improves system performance.
- Logging subsystem: mainly serves the transaction management subsystem. To guarantee transaction consistency, Berkeley DB adopts a write-ahead strategy (write the log first, then the data); this subsystem is generally enabled and disabled together with the transaction management subsystem.

Given the nature of Berkeley DB, it is hard to integrate it with a MySQL database as tightly as Memcached can be integrated; the maintenance and updating of the data is mainly done by the application. In general, the main reason to use Berkeley DB alongside MySQL is to improve the performance and scalability of the system. Therefore, most of the time only the Hash and B-Tree storage formats are used, with the Hash format being the most widely used because it offers the highest access efficiency.
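As a minimal sketch of enabling these subsystems and opening a Hash-format database, here is how it might look with the bsddb3 Python bindings; the paths are illustrative, and the flags may need adjusting for your Berkeley DB version.

```python
# Open a Berkeley DB environment with the shared memory, locking, logging and
# transaction subsystems enabled, then a Hash-format database inside it.
# Uses the bsddb3 Python bindings; paths are illustrative.
from bsddb3 import db

env = db.DBEnv()
env.open(
    "/var/lib/bdb-cache",                       # environment home directory
    db.DB_CREATE
    | db.DB_INIT_MPOOL                          # shared memory (cache/buffer) subsystem
    | db.DB_INIT_LOCK                           # lock management subsystem
    | db.DB_INIT_LOG                            # logging subsystem
    | db.DB_INIT_TXN,                           # transaction management subsystem
)

d = db.DB(env)
d.open("cache.db", None, db.DB_HASH, db.DB_CREATE | db.DB_AUTO_COMMIT)
d.put(b"user:42", b'{"name": "alice"}')         # key/value access, Hash format
print(d.get(b"user:42"))
d.close()
env.close()
```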

In the application, every data request first goes to Berkeley DB with a pre-set Key. If data is found, the result is returned directly; if not, the data is read from the database, stored in Berkeley DB under that pre-set Key, and then returned to the client. When data changes, the application must also delete the corresponding entry from Berkeley DB after modifying the data in MySQL. Of course, you could modify the data in Berkeley DB directly instead, but that introduces more data consistency risk and increases the complexity of the system.
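A minimal sketch of that flow, using the simpler bsddb3.hashopen interface together with PyMySQL (the table, file path and Key scheme are illustrative):

```python
import json
import bsddb3
import pymysql

bdb = bsddb3.hashopen("/var/lib/bdb-cache/users.db", "c")   # Hash-format file
conn = pymysql.connect(host="db-host", user="app", password="secret", database="appdb")

def get_user(user_id):
    key = ("user:%d" % user_id).encode()
    try:
        return json.loads(bdb[key])          # found in Berkeley DB: return directly
    except KeyError:
        pass
    with conn.cursor(pymysql.cursors.DictCursor) as cur:    # otherwise read MySQL
        cur.execute("SELECT id, name, email FROM users WHERE id = %s", (user_id,))
        row = cur.fetchone()
    if row is not None:
        bdb[key] = json.dumps(row).encode()  # store the result under the preset Key
    return row

def update_user_email(user_id, email):
    with conn.cursor() as cur:
        cur.execute("UPDATE users SET email = %s WHERE id = %s", (email, user_id))
    conn.commit()
    try:
        del bdb[("user:%d" % user_id).encode()]   # delete, not modify, the cached copy
    except KeyError:
        pass
```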

In principle, using Berkeley DB this way is not very different from using Memcached as a pure Cache, so why not just use Memcached? There are two main reasons. First, Memcached keeps data purely in memory, while Berkeley DB keeps it on physical disk, and the cost difference between the two is substantial. Second, in addition to the Hash storage format that Memcached effectively offers, Berkeley DB can also use other storage formats, such as B-Tree.
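For instance, with the bsddb3 bindings a B-Tree file keeps its keys sorted, something neither a Hash file nor Memcached can offer (the file path and keys are illustrative):

```python
import bsddb3

# A B-Tree format file keeps keys in sorted order, so ordered scans are
# possible; a Hash file (or Memcached) only supports exact-Key lookups.
bt = bsddb3.btopen("/tmp/scores.db", "c")
bt[b"score:000300"] = b"carol"
bt[b"score:000100"] = b"alice"
bt[b"score:000250"] = b"bob"

for key in bt.keys():        # keys come back sorted: 000100, 000250, 000300
    print(key, bt[key])
bt.close()
```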

Since it’s not too different from the basic principles of Memcached, I won’t draw any diagrams here.