Copyright notice: This technical column is the author Qin Kaixin's distillation of his day-to-day work, summarized and shared through cases drawn from real business environments, together with business tuning suggestions and cluster capacity-planning advice. Please continue to follow this series. Copyright: no reprinting; you are welcome to learn from it. QQ email: [email protected]; feel free to get in touch for business exchanges.

1 Kafka message format changes

The Kafka message format has gone through three versions, from the 0.8.x releases to the current 1.1.x releases, and each change brought a new optimization. The broker, as the carrier of the Kafka service, receives and responds to the message protocol, with two responsibilities:

  • Persisting messages.
  • Transferring messages from producers to consumers.

2 JVM object layout overhead (Java objects are heavyweight)

  • The Java memory model makes object storage expensive: it may take as much as twice the size of the raw message data to store it as objects. To reduce this overhead, the JVM reorders the fields of user-defined classes.
  • As data on the heap grows, garbage collection becomes an overall drag on application throughput.
  • The JVM requires objects to be aligned to 8 bytes; the unaligned remainder is filled with padding bytes.

  • Alignment padding rule: HotSpot aligns objects to 8 bytes, adding padding when the size falls short of a multiple of 8, according to:

    (object header + instance data + padding) % 8 == 0, where 0 <= padding < 8
  • A Java object needs at least a 16-byte object header (on a 64-bit JVM it typically consists of two 8-byte words).
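The alignment rule above can be sketched in a few lines. This is a simplified model, not a real layout probe: the 16-byte header is the assumption stated above for a 64-bit JVM without compressed oops, and actual layouts vary by JVM version and flags.

```java
// Minimal sketch of the HotSpot 8-byte alignment rule described above.
public class AlignmentSketch {
    static final int OBJECT_HEADER = 16; // assumed: two 8-byte words on a 64-bit JVM

    /** Rounds header + instance data up to the next multiple of 8. */
    static int alignedSize(int instanceDataBytes) {
        int raw = OBJECT_HEADER + instanceDataBytes;
        return (raw + 7) & ~7; // padding makes (header + data + padding) % 8 == 0
    }

    public static void main(String[] args) {
        // An object with one int field (4 bytes): 16 + 4 = 20, padded to 24.
        System.out.println(alignedSize(4)); // 24
        // An object with one long field (8 bytes): 16 + 8 = 24, already aligned.
        System.out.println(alignedSize(8)); // 24
    }
}
```

So even a tiny 4-byte payload costs 24 bytes on the heap under these assumptions, which is exactly the overhead Kafka's design avoids.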

3 Kafka lightweight object storage

  • Kafka uses Java NIO's ByteBuffer to store messages and relies on the page cache provided by the file system rather than on a Java heap cache. This avoids a double-caching paradox: when writing to the file system, a heap cache would hold one copy of the data as objects while the OS page cache holds another.
  • ByteBuffer is a compact binary byte structure that does not require padding, thus saving a lot of unnecessary memory overhead.
  • On a machine with 64GB of RAM, Kafka can leave roughly 58-62GB of memory to the page cache without worrying about Java GC.
  • ByteBuffer can save a lot of space compared to Java’s heap-caching scheme.
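The compactness claim is easy to see in code. The sketch below packs a key/value pair into a ByteBuffer with no object headers and no padding; the field layout here is illustrative only and is not Kafka's actual on-disk format.

```java
import java.nio.ByteBuffer;

// Sketch: storing a key/value record compactly in a ByteBuffer,
// as the text describes. Layout is illustrative, not Kafka's format.
public class ByteBufferSketch {
    static ByteBuffer encode(byte[] key, byte[] value) {
        ByteBuffer buf = ByteBuffer.allocate(8 + key.length + value.length);
        buf.putInt(key.length);   // 4-byte key length
        buf.put(key);
        buf.putInt(value.length); // 4-byte value length
        buf.put(value);
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer buf = encode("key".getBytes(), "value".getBytes());
        // 4 + 3 + 4 + 5 = 16 bytes total, versus 24+ bytes per heap object.
        System.out.println(buf.remaining()); // 16
    }
}
```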

4 V0 (the original format) => 14-byte header + 12-byte LOG_OVERHEAD

  • magic (version): v0 has magic=0, v1 has magic=1, v2 has magic=2.

  • attributes: the message compression type. Currently only three compression codecs are supported:

    0x00 no compression, 0x01 GZIP, 0x02 Snappy, 0x03 LZ4
  • Note that the key length and value length fields are fixed 4-byte fields; a null key or value is stored as length -1.

  • All fields other than the key and value are collectively referred to as the header, which occupies 14 bytes in total (CRC 4 + magic 1 + attributes 1 + key length 4 + value length 4).

    If key = "key" and value = "value" (one character is one byte, eight bytes in total), the message takes header 14 bytes + key/value 8 bytes = 22 bytes. If the key is null, it takes 14 + 5 = 19 bytes.
  • LOG_OVERHEAD: every record (v0 and v1) is preceded by an offset and a message size. The offset (8 bytes) indicates the record's position in the partition; it is a logical value, not a physical file offset. The message size (4 bytes) gives the record's length. Together these 12 bytes are called LOG_OVERHEAD.
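The v0 arithmetic above can be captured in a tiny calculator. This is a sketch of the size formula only, not a serializer; the 14-byte header breakdown follows the field list in this section.

```java
// Calculator for the v0 sizes quoted above: LOG_OVERHEAD (offset 8B +
// message size 4B = 12B) plus the 14-byte header plus key/value payloads.
public class V0Size {
    static final int LOG_OVERHEAD = 12;
    static final int V0_HEADER = 14; // crc 4 + magic 1 + attributes 1 + key len 4 + value len 4

    /** Pass -1 as the length of a null key or value (stored only in the length field). */
    static int messageSize(int keyLen, int valueLen) {
        return LOG_OVERHEAD + V0_HEADER + Math.max(keyLen, 0) + Math.max(valueLen, 0);
    }

    public static void main(String[] args) {
        System.out.println(messageSize(3, 5));  // key="key", value="value" -> 34
        System.out.println(messageSize(-1, 5)); // null key, value="value" -> 31
    }
}
```

Note that 34 + 31 = 65 bytes, matching the log file sizes shown in section 4.1 below.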

4.1 V0 message set (superseded by the v2 RecordBatch)

Summary: every message carries both LOG_OVERHEAD and the message body, so the minimum footprint, excluding key and value payloads, is 12B + 14B = 26B.

With key = "key" and value = "value", a message takes 26 (pure format) + 8 (key and value) = 34B.

    Create topic msg_format_v0 and send a message with key="key", value="value":
    -rw-r--r-- 1 root root 34 Apr 26 02:52 00000000000000000000.log
    Then send another message with key=null, value="value" (31 more bytes):
    -rw-r--r-- 1 root root 65 Apr 26 02:56 00000000000000000000.log

Bottom line: every message must carry its own 12-byte LOG_OVERHEAD. This is a scattered, per-message format design with no real notion of batching.

5 V1 => 22-byte header + 12-byte LOG_OVERHEAD

  • Kafka used the v1 message format from version 0.10.0 up to version 0.11.0. It adds an 8-byte timestamp field recording the message's timestamp, which grows the header from 14 to 22 bytes.

  • Therefore, the same key = "key", value = "value" message from the v0 example occupies 42B in v1.

For example, sending a first message with key = "key" and value = "value" uses 22 + 12 + 8 = 42B; sending a second message with key = null and value = "value" uses 22 + 12 + 5 = 39B.

Together, 42+39=81B
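The same calculator, adjusted for v1's larger header, reproduces these numbers. As before, this sketches only the size formula, not the wire format.

```java
// Same arithmetic for v1: the header grows to 22 bytes because of the
// 8-byte timestamp, while LOG_OVERHEAD stays at 12 bytes.
public class V1Size {
    static final int LOG_OVERHEAD = 12;
    static final int V1_HEADER = 22; // v0's 14-byte header + 8-byte timestamp

    /** Pass -1 as the length of a null key or value. */
    static int messageSize(int keyLen, int valueLen) {
        return LOG_OVERHEAD + V1_HEADER + Math.max(keyLen, 0) + Math.max(valueLen, 0);
    }

    public static void main(String[] args) {
        System.out.println(messageSize(3, 5));  // key="key", value="value" -> 42
        System.out.println(messageSize(-1, 5)); // null key, value="value" -> 39
        System.out.println(messageSize(3, 5) + messageSize(-1, 5)); // 81
    }
}
```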

5.1 V1 message set (superseded by the v2 RecordBatch)

6 V2 (varints and ZigZag) => 7-byte record overhead + key + value = 15 bytes

  • Kafka has used the v2 message format since version 0.11.0. This version differs significantly from v0 and v1, borrowing Varints and ZigZag encoding from Protocol Buffers.

  • Varints serialize an integer with one or more bytes; the smaller the value, the fewer bytes it takes. ZigZag encoding bounces back and forth between non-negative and negative integers in a zig-zag pattern, mapping signed integers to unsigned ones so that negative numbers with small absolute values still get small varint encodings: -1 is encoded as 1, 1 as 2, -2 as 3, and so on.

  • Varints reserve the most significant bit of each byte as a continuation flag: if that bit is 1, the encoding is not finished and more bytes follow. Only the remaining 7 bits carry payload, so one byte encodes the unsigned values 0-127. Continuing the mapping above, -1, 1, -2, 2 correspond to 1, 2, 3, 4. As a result, values from 0 to 63 take 1 byte after ZigZag encoding, values from 64 to 8191 take 2 bytes, and values from 8192 to 1048575 take 3 bytes. The default value of the Kafka broker setting message.max.bytes is 1000012, whose varint encoding is 3 bytes.

  • Note that varints do not always save space: an int32 can take up to 5 bytes (more than the plain 4 bytes) and an int64 up to 10 bytes (more than the plain 8 bytes).
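The encoding rules above can be sketched directly; this follows the same scheme Protocol Buffers documents, and the byte counts match the ranges quoted in the bullet points.

```java
// Sketch of ZigZag + varint sizing as described above. zigZag maps
// -1, 1, -2, 2, ... to 1, 2, 3, 4, ...; varintSize counts how many
// 7-bit groups (bytes) the resulting unsigned value needs.
public class VarintSketch {
    static int zigZag(int n) {
        return (n << 1) ^ (n >> 31);
    }

    static int varintSize(int unsigned) {
        int size = 1;
        // While more than 7 significant bits remain, another byte is needed.
        while ((unsigned & 0xFFFFFF80) != 0) {
            unsigned >>>= 7;
            size++;
        }
        return size;
    }

    public static void main(String[] args) {
        System.out.println(zigZag(-1)); // 1
        System.out.println(zigZag(1));  // 2
        System.out.println(zigZag(-2)); // 3
        System.out.println(varintSize(zigZag(63)));      // 1 byte
        System.out.println(varintSize(zigZag(64)));      // 2 bytes
        System.out.println(varintSize(zigZag(1000012))); // 3 bytes (message.max.bytes default)
    }
}
```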

Summary: the v2 message format drops the per-record CRC field, adds length, timestamp delta, offset delta, and headers, and deprecates the per-record attributes field.

6.1 V2 RecordBatch => 61-byte batch header + 7-byte record overhead (pure format) + key + value = 76 bytes

The v2 version completely reworks the batch as a RecordBatch, whose header takes 61 bytes in total. For example, the CRC moves up to the batch layer, idempotence is introduced via a producer ID (PID), and a producer epoch identifies the current producer generation. This seems to enlarge the message, but for large batches it is a qualitative leap, because a pure record format takes only 7 bytes, versus 22 bytes in v1 and 14 bytes in v0.

  • first offset: the starting offset of the RecordBatch.
  • length: the byte count from the partition leader epoch through the headers.
  • partition leader epoch: used to ensure data reliability; see KIP-101 for details.
  • magic: the message format version; for v2, magic equals 2.
  • attributes: message attributes, now occupying two bytes. The low three bits hold the compression type, as in v0 and v1; the fourth bit is the timestamp type; the fifth bit marks whether the RecordBatch is transactional (0 = non-transactional, 1 = transactional); the sixth bit marks whether it is a control batch (0 = normal, 1 = control), used to support transactions.
  • last offset delta: the difference between the offset of the last record in the RecordBatch and the first offset; used mainly by the broker to verify that the records in the batch were assembled correctly.
  • first timestamp: the timestamp of the first record in the RecordBatch.
  • max timestamp: the largest timestamp in the RecordBatch, usually that of the last record; used, like last offset delta, to verify correct assembly.
  • producer ID: used to support idempotence; see KIP-98 for details.
  • producer epoch: like the producer ID, used to support idempotence.
  • first sequence: used to support idempotence, together with producer ID and producer epoch.
  • records count: the number of records in the RecordBatch.
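Summing the widths of the fields listed above (with the batch-level CRC mentioned earlier taking 4 bytes) reproduces the 61-byte figure; the per-field widths are assumptions based on this section's description.

```java
// Adds up the RecordBatch header fields listed above to check the
// 61-byte figure quoted for the v2 batch header.
public class BatchOverhead {
    static int batchHeaderBytes() {
        return 8   // first offset (base offset)
             + 4   // length
             + 4   // partition leader epoch
             + 1   // magic
             + 4   // CRC (moved up to the batch layer in v2)
             + 2   // attributes
             + 4   // last offset delta
             + 8   // first timestamp
             + 8   // max timestamp
             + 8   // producer ID
             + 2   // producer epoch
             + 4   // first sequence
             + 4;  // records count
    }

    public static void main(String[] args) {
        System.out.println(batchHeaderBytes()); // 61
    }
}
```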

7 Takeaways

Disadvantages of the v1 format:

  • Low space utilization: the key length and value length fields each always occupy 4 bytes, which is wasteful.
  • For a compressed message set, only the offset of the last inner message is kept, so retrieving the first message requires decompressing and traversing the whole set.
  • Every message carries its own CRC check.
  • There is no explicit total message length field.

Advantages of the new v2 architecture:

  • Adds an explicit total message length field.
  • Keeps timestamps but stores them as deltas, often needing only 1 byte each; for example, for 10 messages v1 needs 80 bytes of timestamps (8 bytes each) while v2 may need only about 10.
  • Stores offsets as deltas as well.
  • Removes the per-record CRC check (the CRC moves to the batch level).
  • Deprecates the per-record attributes field and moves it to the outer batch layer.

Conclusion

This technical column was very difficult to write and had to be verified with a large amount of data and experiments. It took great effort; please treasure it.

Qin Kaixin in Shenzhen