Advanced RocketMQ: Principles of Message Storage

To protect the coral from the pounding waves? That would ruin their beauty


File directory structure

RocketMQ stores its messages in local files under the ~/store directory, which contains the following entries:

1. abort

This file is created automatically when the Broker starts and is deleted when the Broker shuts down normally. If it exists while the Broker is not running, the previous shutdown was abnormal.

2. checkpoint

Stores the last flush timestamps of the commitlog, consumequeue, and index files

3. commitlog

Holds the commitlog files, to which messages are written

4. config

Holds some configuration data for the running of the Broker

5. consumequeue

Holds the consumequeue files; every queue is stored under this directory

6. index

Holds the message index files (indexFile)

7. lock

Global resource locks used during runtime
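
The abort-file rule described above can be checked with a few lines of Java. This is only a sketch: the class and method names are hypothetical (they are not part of RocketMQ), and the store root is assumed to be ~/store as in this article.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of RocketMQ.
class AbortFileCheck {
    // Returns true if a file named "abort" exists directly under the store root.
    static boolean abortFileExists(Path storeRoot) {
        return Files.exists(storeRoot.resolve("abort"));
    }

    public static void main(String[] args) {
        // Assumption: the store root is ~/store, as described in the article.
        Path storeRoot = Paths.get(System.getProperty("user.home"), "store");
        // The result is only meaningful while the Broker is NOT running.
        if (abortFileExists(storeRoot)) {
            System.out.println("abort file found: the previous Broker shutdown was abnormal");
        } else {
            System.out.println("no abort file found");
        }
    }
}
```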

Commitlog file

Many materials simply call the files in the commitlog directory commitlog files. In the source code, however, such a file is named mappedFile

The commitlog directory holds many mappedFile files, and all messages in the current Broker are written to them. Each mappedFile is at most 1 GB in size. The file name is a 20-digit decimal number giving the commitlog offset of the first message in that file.





The first file's name is always twenty zeros, because the first message in the first file has a commitlog offset of 0. When the first file is full, a second file is created automatically and writing continues there. Assuming the first file is 1073741824 bytes (1 GB = 1073741824 bytes), the second file is named 00000000001073741824, its start offset being 1073741824, and so on. The name of the Nth file is the sum of the sizes of the first N-1 files. The commitlog offsets of all mappedFiles in a Broker are contiguous. Messages are written sequentially to the current file and, when it is full, to the next one.
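
The naming rule can be illustrated with a small Java sketch. It assumes every file is exactly 1 GB (1073741824 bytes) when full, as in the example above; the class and method names are made up for illustration and are not taken from the RocketMQ source.

```java
import java.util.Locale;

// Hypothetical helper illustrating the naming rule; not RocketMQ source code.
class CommitLogNaming {
    // Assumed fixed size of a full file: 1 GB = 1073741824 bytes.
    static final long FILE_SIZE = 1024L * 1024 * 1024;

    // Name of the file with the given 0-based index: its start offset,
    // zero-padded to 20 decimal digits.
    static String fileNameForIndex(long index) {
        return String.format(Locale.ROOT, "%020d", index * FILE_SIZE);
    }

    // Name of the file that contains the given commitlog offset.
    static String fileNameForOffset(long commitLogOffset) {
        return fileNameForIndex(commitLogOffset / FILE_SIZE);
    }

    // Position of the given commitlog offset inside its file.
    static long positionInFile(long commitLogOffset) {
        return commitLogOffset % FILE_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(fileNameForIndex(0)); // 00000000000000000000
        System.out.println(fileNameForIndex(1)); // 00000000001073741824
        // An offset 100 bytes into the second file maps back to that file.
        System.out.println(fileNameForOffset(1073741824L + 100));
        System.out.println(positionInFile(1073741824L + 100)); // 100
    }
}
```

Because the offsets are contiguous, a plain division and remainder are enough to go from a commitlog offset to a file and a position inside it.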

Note

A Broker has only one commitlog directory, which holds all of its mappedFiles. No matter how many Topics the Broker stores messages for, all of them are written sequentially to the same mappedFile files; the messages are not separated by Topic. Because the files are read and written sequentially, access is efficient and avoids seek overhead

Message unit entity





A mappedFile consists of message units. Each unit contains more than 20 message-related attributes, including the total message length MsgLen, the physical position of the message PhysicalOffset, the message content Body, the body length BodyLength, the message topic Topic, the topic length TopicLength, the producer address BornHost, the sending timestamp BornTimestamp, the queue QueueId, and the message's offset within that queue QueueOffset

Note that the commitlog stores Queue information (QueueId and QueueOffset) in every message unit; we'll look at the relationship between the two later

consumequeue

Purpose of existence

Message consumption queues are introduced to improve consumption performance. RocketMQ uses a topic-based subscription model, so consumption happens per topic, and scanning the commitlog files to retrieve the messages of one topic would be very inefficient. Instead, the Consumer looks up the messages to consume through the ConsumeQueue. The ConsumeQueue (logical consumption queue) serves as an index for consumption: for each message in a queue of a given Topic, it holds the message's starting physical offset in the CommitLog, the message size, and the hash code of the message Tag.

Consumequeue files can be regarded as topic-based views of the commitlog. The consumequeue folder is therefore organized in three levels, topic/queue/file, and the storage path is $HOME/store/consumequeue/{topic}/{queueId}/{fileName}.
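
As a small illustration of the three-level layout, the directory for one queue can be built like this. The helper is hypothetical and only joins path segments; it does not touch the file system.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; it only mirrors the path layout described in the article.
class ConsumeQueuePaths {
    // Builds {storeRoot}/consumequeue/{topic}/{queueId}.
    static Path queueDir(Path storeRoot, String topic, int queueId) {
        return storeRoot.resolve("consumequeue")
                        .resolve(topic)
                        .resolve(Integer.toString(queueId));
    }

    public static void main(String[] args) {
        // Assumption: the store root is ~/store.
        Path storeRoot = Paths.get(System.getProperty("user.home"), "store");
        // Directory of queue 0 of the TopicTest topic used in this article.
        System.out.println(queueDir(storeRoot, "TopicTest", 0));
    }
}
```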



At the moment the directory contains only the TopicTest topic created earlier. Looking inside it shows which queues the topic has; four queues are created by default



The four default queues are 0, 1, 2, and 3. What does a queue directory contain?



The consumequeue file also uses a fixed-length design. Each entry is 20 bytes: an 8-byte commitlog physical offset, a 4-byte message length, and an 8-byte tag hash code. A single file holds 300,000 entries, and each entry can be accessed randomly, like an array element. Each ConsumeQueue file is therefore about 5.72 MB in size (300,000 * 20 = 6,000,000 bytes).
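
The 20-byte entry can be sketched in Java with a ByteBuffer. The field order follows the description above; the class name, field names, and the use of ByteBuffer's default big-endian byte order are assumptions made for this example, not details confirmed by the article.

```java
import java.nio.ByteBuffer;

// Hypothetical model of one fixed-length ConsumeQueue entry.
class ConsumeQueueEntry {
    static final int ENTRY_SIZE = 8 + 4 + 8;      // 20 bytes
    static final int ENTRIES_PER_FILE = 300_000;  // entries in one full file

    final long commitLogOffset; // 8 bytes: physical offset in the commitlog
    final int messageSize;      // 4 bytes: message length
    final long tagHashCode;     // 8 bytes: hash code of the message tag

    ConsumeQueueEntry(long commitLogOffset, int messageSize, long tagHashCode) {
        this.commitLogOffset = commitLogOffset;
        this.messageSize = messageSize;
        this.tagHashCode = tagHashCode;
    }

    // Serializes the entry into exactly ENTRY_SIZE bytes.
    byte[] encode() {
        ByteBuffer buf = ByteBuffer.allocate(ENTRY_SIZE);
        buf.putLong(commitLogOffset);
        buf.putInt(messageSize);
        buf.putLong(tagHashCode);
        return buf.array();
    }

    // Reads the three fields back in the same order they were written.
    static ConsumeQueueEntry decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long offset = buf.getLong();
        int size = buf.getInt();
        long hash = buf.getLong();
        return new ConsumeQueueEntry(offset, size, hash);
    }

    public static void main(String[] args) {
        byte[] raw = new ConsumeQueueEntry(1073741824L, 256, "TagA".hashCode()).encode();
        System.out.println(raw.length);                    // 20
        System.out.println(ENTRY_SIZE * ENTRIES_PER_FILE); // 6000000
    }
}
```

Because every entry has the same size, entry i of a file always starts at byte i * 20, which is what makes array-like random access possible.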



The process of writing a message

  1. The Broker obtains the QueueOffset for the message's index entry from the corresponding queue under the consumequeue directory
  2. It packs data such as queueId and queueOffset together with the message into a message unit
  3. It writes the message unit to the commitlog
  4. At the same time, it builds a message index entry and dispatches it to the corresponding ConsumeQueue
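
The four steps can be mimicked with a toy in-memory model. This is a sketch under heavy assumptions, not the Broker's implementation: the commitlog is reduced to a running byte counter, a message unit is assumed to be just a 4-byte queueId, an 8-byte queueOffset, and the body, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model of the write path; not RocketMQ code.
class WritePathSketch {
    // Key "topic/queueId" -> index entries {commitLogOffset, messageSize, tagHashCode}.
    private final Map<String, List<long[]>> consumeQueues = new HashMap<>();
    // Next physical write position in the (simulated) commitlog.
    private long commitLogSize = 0;

    // Stores one message and returns the queueOffset assigned to it.
    long putMessage(String topic, int queueId, String tag, byte[] body) {
        List<long[]> queue =
            consumeQueues.computeIfAbsent(topic + "/" + queueId, k -> new ArrayList<>());
        // Step 1: the next queueOffset is the number of entries already in the queue.
        long queueOffset = queue.size();
        // Step 2: the message unit carries queueId and queueOffset with the body.
        // Assumed simplified unit: 4-byte queueId + 8-byte queueOffset + body.
        int unitSize = 4 + 8 + body.length;
        // Step 3: append the unit to the commitlog (only its size is tracked here).
        long physicalOffset = commitLogSize;
        commitLogSize += unitSize;
        // Step 4: build the index entry and dispatch it to the queue.
        queue.add(new long[] {physicalOffset, unitSize, tag.hashCode()});
        return queueOffset;
    }

    // Returns the index entry stored at the given queueOffset.
    long[] entry(String topic, int queueId, long queueOffset) {
        return consumeQueues.get(topic + "/" + queueId).get((int) queueOffset);
    }

    public static void main(String[] args) {
        WritePathSketch store = new WritePathSketch();
        store.putMessage("TopicA", 0, "TagA", new byte[10]);
        store.putMessage("TopicB", 1, "TagB", new byte[5]);
        long queueOffset = store.putMessage("TopicA", 0, "TagA", new byte[1]);
        // Messages of different topics share one commitlog, so the second
        // TopicA message starts after the TopicB message.
        System.out.println(queueOffset + " -> " + store.entry("TopicA", 0, queueOffset)[0]);
    }
}
```

The model shows the point made earlier: physical offsets grow across all topics, while queueOffset counts messages per queue.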

The process of pulling messages

  1. The Consumer obtains its consumption offset for the Queue it wants to consume from and calculates the message offset of the next message to consume. (The consumption offset is the consumption progress, that is, how far the Consumer has consumed in that Queue; message offset = consumption offset + 1.)
  2. The Consumer sends a pull request to the Broker, which contains the Queue, the message offset, and the message Tag of the messages it wants to pull
  3. The Broker calculates the queueOffset in that consumequeue (queueOffset = message offset * 20 bytes)
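
The offset arithmetic in steps 1 and 3 can be written out as follows. The sketch assumes the 20-byte entry size and 300,000 entries per file given earlier; the class and method names are hypothetical, and the split into file index and in-file position is an illustration rather than something the article describes.

```java
// Hypothetical helper for the pull-side offset arithmetic; not RocketMQ code.
class ConsumeQueueLookup {
    static final int ENTRY_SIZE = 20;                     // bytes per index entry
    static final long FILE_SIZE = 300_000L * ENTRY_SIZE;  // 6,000,000 bytes per file

    // Step 1: message offset = consumption offset + 1.
    static long nextMessageOffset(long consumptionOffset) {
        return consumptionOffset + 1;
    }

    // Step 3: byte offset of the entry inside the logical consume queue.
    static long queueByteOffset(long messageOffset) {
        return messageOffset * ENTRY_SIZE;
    }

    // 0-based index of the consumequeue file that holds the entry.
    static long fileIndex(long messageOffset) {
        return queueByteOffset(messageOffset) / FILE_SIZE;
    }

    // Byte position of the entry inside that file.
    static long positionInFile(long messageOffset) {
        return queueByteOffset(messageOffset) % FILE_SIZE;
    }

    public static void main(String[] args) {
        long messageOffset = nextMessageOffset(41);
        System.out.println(messageOffset);                  // 42
        System.out.println(queueByteOffset(messageOffset)); // 840
        System.out.println(fileIndex(300_000));             // 1
    }
}
```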