Advanced RocketMQ Internals: Querying Messages by Key

All who strive to improve themselves will eventually succeed — Goethe

indexFile

Besides consuming messages by subscribing to a Topic, RocketMQ can also query messages by key or by time range. These queries are fast because they go through an index: the indexFile files stored in the index subdirectory of the store directory. An index entry is written when a message that carries a key arrives at the Broker; a message without a key is not indexed. Each indexFile has a fixed size of about 400 MB and can hold 20 million index entries (2000W, where W means 10,000). The underlying storage of an indexFile implements a HashMap-like structure on top of the file system, so RocketMQ's index file is a hash index.

Storage location

$HOME/store/index/${fileName}, where fileName is the timestamp at which the file was created

Index entry structure

Each Broker holds a set of indexFiles, and each indexFile is named after the timestamp at which it was created. An indexFile consists of three parts: the indexHeader, the slots, and the indexes (index units). Each indexFile contains 5 million (500W) slots of 4 bytes each. Any number of index units can be chained to a single slot, and each index unit takes 20 bytes.

So the total size is: indexHeader size + slot size * 5,000,000 + index unit size * 20,000,000 = 40 + 4 * 5,000,000 + 20 * 20,000,000 = 420,000,040 bytes, or roughly 400 MB
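The arithmetic above can be checked with a short sketch. The constant and function names below are illustrative only and are not taken from the RocketMQ source.

```python
# Sizes as stated in the text; the names are hypothetical.
HEADER_SIZE = 40          # bytes in the indexHeader
SLOT_COUNT = 5_000_000    # 500W slots
SLOT_SIZE = 4             # bytes per slot
INDEX_COUNT = 20_000_000  # 2000W index units
INDEX_SIZE = 20           # bytes per index unit


def index_file_size(slot_count=SLOT_COUNT, index_count=INDEX_COUNT):
    """Return the fixed size of one indexFile in bytes."""
    return HEADER_SIZE + slot_count * SLOT_SIZE + index_count * INDEX_SIZE


total = index_file_size()
print(total)                  # 420000040
print(total / (1024 * 1024))  # a little over 400 when expressed in MiB
```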

indexHeader

The indexHeader is a fixed 40 bytes and contains the following fields:

  1. BeginTimestamp: The store time of the first message in the indexFile
  2. EndTimestamp: The store time of the last message in the indexFile
  3. BeginPhyoffset: The CommitLog offset of the first message in the indexFile
  4. EndPhyoffset: The CommitLog offset of the last message in the indexFile
  5. HashSlotCount: The number of slots in use, that is, slots that have at least one index unit attached (not every slot does)
  6. IndexCount: The number of index units in the indexFile (the total across all slots in the current indexFile)
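As a rough illustration, the six fields can be packed into exactly 40 bytes. The text gives only the total size, so the individual widths used here (four 8-byte values followed by two 4-byte values) and the big-endian byte order are assumptions of this sketch, as are all the names.

```python
import struct
from collections import namedtuple

# Assumed layout: 4 x 8-byte signed integers, then 2 x 4-byte signed integers,
# big-endian. Only the 40-byte total comes from the text.
HEADER_FORMAT = ">qqqqii"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)

IndexHeader = namedtuple(
    "IndexHeader",
    [
        "begin_timestamp",
        "end_timestamp",
        "begin_phy_offset",
        "end_phy_offset",
        "hash_slot_count",
        "index_count",
    ],
)


def pack_header(header):
    """Serialize an IndexHeader tuple into 40 bytes."""
    return struct.pack(HEADER_FORMAT, *header)


def unpack_header(data):
    """Parse the first 40 bytes of a buffer into an IndexHeader tuple."""
    return IndexHeader(*struct.unpack(HEADER_FORMAT, data[:HEADER_SIZE]))


# Example values are made up for illustration.
header = IndexHeader(1_650_000_000_000, 1_650_000_060_000, 0, 4096, 2, 3)
raw = pack_header(header)
print(len(raw))            # 40
print(unpack_header(raw))  # round-trips to the same field values
```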

Slots and indexes relationship

The logical storage

The most complex aspect of an indexFile is the relationship between slots and indexes. In the physical file the indexes come after the slots, but for ease of understanding it helps to picture each slot as the head of a chain of index units.

The slot for a key is its hash value % 500W (5,000,000). The value stored in the slot is the indexNo of an index unit, and from that indexNo the position of the index unit in the indexFile can be calculated. Different keys often produce the same slot value. To handle these collisions, each index unit carries a preIndexNo that identifies the index unit that preceded it in the same slot. A slot always stores the indexNo of the latest index unit, so once the slot is found, the latest index unit can be found, and every earlier index unit can be reached through it by following preIndexNo.
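The slot and preIndexNo relationship can be modelled in memory with a minimal sketch. It is not RocketMQ's implementation: the class name, the sparse dictionary used for slots, and the -1 sentinel for "no index unit" are all assumptions made for illustration.

```python
SLOT_COUNT = 5_000_000  # 500W slots, as in the text
EMPTY = -1              # hypothetical sentinel meaning "no index unit"


class IndexModel:
    """In-memory model of slots pointing at chains of index units."""

    def __init__(self, slot_count=SLOT_COUNT):
        self.slot_count = slot_count
        self.slots = {}   # slot number -> indexNo of the latest unit in that slot
        self.units = []   # list position is the indexNo

    def put(self, key_hash, phy_offset):
        slot = key_hash % self.slot_count
        pre_index_no = self.slots.get(slot, EMPTY)   # previous head of the chain
        self.units.append((key_hash, phy_offset, pre_index_no))
        index_no = len(self.units) - 1
        self.slots[slot] = index_no                  # slot now points at the newest unit
        return index_no

    def lookup(self, key_hash):
        """Return matching CommitLog offsets, newest first."""
        offsets = []
        index_no = self.slots.get(key_hash % self.slot_count, EMPTY)
        while index_no != EMPTY:
            unit_hash, phy_offset, pre_index_no = self.units[index_no]
            if unit_hash == key_hash:                # skip colliding keys in the same slot
                offsets.append(phy_offset)
            index_no = pre_index_no
        return offsets


model = IndexModel(slot_count=10)  # tiny slot count to force a collision
model.put(3, 100)
model.put(13, 200)                 # 13 % 10 == 3, same slot as key hash 3
model.put(3, 300)
print(model.lookup(3))             # [300, 100]
```

Walking the chain from newest to oldest is what lets a single 4-byte slot serve many keys: the slot only ever needs to remember the most recent indexNo.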

IndexNo is a sequence number within an indexFile, increasing from 0. It is not stored in the index unit itself; it is derived from the unit's position within the indexes.

Physical storage
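Because the header, the slots, and the index units all have fixed sizes, the byte position of any slot or index unit follows directly from its number. The sketch below assumes the layout described in this article (header, then all slots, then index units numbered from 0); the function names are hypothetical.

```python
HEADER_SIZE = 40        # bytes in the indexHeader
SLOT_COUNT = 5_000_000  # 500W slots
SLOT_SIZE = 4           # bytes per slot
INDEX_SIZE = 20         # bytes per index unit


def slot_position(key_hash):
    """Byte offset in the indexFile of the slot that a key hash maps to."""
    return HEADER_SIZE + (key_hash % SLOT_COUNT) * SLOT_SIZE


def index_position(index_no):
    """Byte offset in the indexFile of the index unit with this indexNo."""
    return HEADER_SIZE + SLOT_COUNT * SLOT_SIZE + index_no * INDEX_SIZE


print(slot_position(0))   # 40, the first slot sits right after the header
print(index_position(0))  # 20000040, the first index unit sits after all slots
```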

Index unit

Each index unit is 20 bytes and contains the following four attributes:

  1. KeyHash: The hash value of the business key specified in the message
  2. PhyOffset: The CommitLog offset of the message corresponding to the current key
  3. TimeDiff: The difference between the time the message corresponding to the current key was stored and the time the current indexFile was created
  4. PreIndexNo: The indexNo of the index unit that preceded the current index unit in the same slot
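The four attributes can be sketched as a 20-byte record. The text gives only the total size, so the widths used here (4, 8, 4 and 4 bytes) and the big-endian byte order are assumptions of this sketch, and the names are illustrative.

```python
import struct
from collections import namedtuple

# Assumed layout: 4-byte keyHash, 8-byte phyOffset, 4-byte timeDiff,
# 4-byte preIndexNo, big-endian. Only the 20-byte total comes from the text.
INDEX_FORMAT = ">iqii"
INDEX_SIZE = struct.calcsize(INDEX_FORMAT)

IndexUnit = namedtuple(
    "IndexUnit", ["key_hash", "phy_offset", "time_diff", "pre_index_no"]
)


def pack_index_unit(unit):
    """Serialize an IndexUnit tuple into 20 bytes."""
    return struct.pack(INDEX_FORMAT, *unit)


def unpack_index_unit(data):
    """Parse the first 20 bytes of a buffer into an IndexUnit tuple."""
    return IndexUnit(*struct.unpack(INDEX_FORMAT, data[:INDEX_SIZE]))


# Example values are made up for illustration.
unit = IndexUnit(key_hash=123456, phy_offset=8_589_934_592, time_diff=15, pre_index_no=7)
raw = pack_index_unit(unit)
print(len(raw))                # 20
print(unpack_index_unit(raw))  # round-trips to the same field values
```

Only the hash of the key is stored, not the key itself, so a reader that follows a chain would still need to load the message at PhyOffset to confirm that its key really matches.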

Query messages by Message Key

Querying messages by Message Key relies on RocketMQ's IndexFile, whose logical structure resembles the HashMap implementation in the JDK. IndexFile provides message index lookups by Message Key. An IndexFile is stored at $HOME/store/index/${fileName}, where fileName is the timestamp at which the file was created. The file size is fixed, equal to 40 + 5,000,000 * 4 + 20,000,000 * 20 = 420,000,040 bytes. If the UNIQ_KEY property is set in the message properties, an index entry is written with topic + “#” + UNIQ_KEY as the key. If the message also has KEYS set (multiple keys separated by spaces), an index entry is written for each topic + “#” + KEY as well.
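The key construction rule can be sketched as follows. The function name and parameters are hypothetical; only the topic + “#” + key pattern and the space-separated KEYS format come from the text.

```python
def build_index_keys(topic, uniq_key=None, keys=None):
    """Return the index keys that would be written for one message.

    uniq_key stands for the UNIQ_KEY property and keys for the KEYS
    property (a space-separated string). Either may be absent.
    """
    index_keys = []
    if uniq_key:
        index_keys.append(topic + "#" + uniq_key)
    if keys:
        # split() with no argument splits on whitespace and drops empty pieces.
        for key in keys.split():
            index_keys.append(topic + "#" + key)
    return index_keys


print(build_index_keys("OrderTopic", "UNIQ123", "order-1 order-2"))
# ['OrderTopic#UNIQ123', 'OrderTopic#order-1', 'OrderTopic#order-2']
```

Prefixing the topic keeps identical business keys in different topics from resolving to the same index key.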