Original address https://gyl-coder.top/kafka/kafka-optimization/

Performance problems typically occur in three places:

  • network
  • disk
  • complexity

Performance optimization in Kafka is mainly reflected in three aspects:

  • Producer
  • Consumer
  • Broker

Since Kafka is a distributed queue, network and disk optimization matter the most. The optimization methods used in Kafka are as follows:

  • compression
  • caching
  • batching
  • concurrency
  • algorithms

Sequential writes

Generally, a disk I/O is completed through three steps: seek, rotation, and data transfer.

The factors affecting disk I/O performance correspond to these three steps, so the total time breaks down as follows:

  • Seek time: Tseek is the time required to move the read/write head to the correct track. The shorter the seek time, the faster the I/O operation. Currently, the average seek time of a disk ranges from 3 to 15 ms.
  • Rotational delay: Trotation is the time it takes for the disk to rotate until the sector holding the requested data passes under the read/write head. The rotational delay depends on the disk speed and is usually taken as half the time of one full rotation. For example, the average rotational delay of a 7200 rpm disk is about 4.17 ms (60 × 1000 / 7200 / 2), and that of a 15000 rpm disk is 2 ms.
  • Data transfer time: Ttransfer is the time required to transfer the requested data. It equals the data size divided by the data transfer rate. Currently IDE/ATA interfaces reach 133 MB/s and SATA II reaches 300 MB/s, so the data transfer time is usually much smaller than the first two components and can be ignored in rough calculations.

Therefore, the disk read/write performance can be greatly improved by eliminating seek and rotation when writing to the disk.
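To make these numbers concrete, here is a small back-of-the-envelope comparison (the disk parameters below are illustrative assumptions, not measurements):

```java
// Rough disk I/O time model: T = Tseek + Trotation + Ttransfer.
// All parameter values below are illustrative assumptions.
public class DiskIoEstimate {
    public static void main(String[] args) {
        double seekMs = 9.0;                             // assumed average seek time (3-15 ms range)
        double rotationMs = 60_000.0 / 7200 / 2;         // avg rotational delay of a 7200 rpm disk ~ 4.17 ms
        double transferMs = (4.0 / 300.0) * 1000 / 1024; // 4 KB at 300 MB/s, in milliseconds

        double randomIoMs = seekMs + rotationMs + transferMs;
        double sequentialIoMs = transferMs;              // sequential append: no seek, no rotation wait

        System.out.printf("random 4KB I/O    : %.3f ms%n", randomIoMs);
        System.out.printf("sequential 4KB I/O: %.3f ms%n", sequentialIoMs);
    }
}
```

The seek and rotation terms dominate random I/O, and those are exactly the terms that sequential writes remove.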

Kafka uses sequential file writes to improve disk write performance. Writing files sequentially largely eliminates disk seeks and rotational delays: the heads no longer dance around the platter, they zip along the track.

Each Partition in Kafka is an ordered, immutable sequence of messages, and new messages are constantly appended to its end. The Partition itself is only a logical concept in Kafka: it is physically divided into Segments, each of which corresponds to a physical file, and Kafka writes to the active Segment sequentially.

Why can Kafka use appending?

In simple terms, Kafka stores its data as queues. A queue is a first-in, first-out (FIFO) model, which guarantees that data is ordered (within the same partition). It is this immutability and ordering that allow Kafka to simply append to its files.
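As a minimal sketch of the idea (not Kafka's actual log code; the segment file name is made up), an append-only log simply opens a file in append mode and keeps writing to its end:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Minimal append-only log sketch: every write goes to the end of the file,
// so writes never seek backwards.
public class AppendOnlyLog {
    public static void main(String[] args) throws IOException {
        Path segment = Path.of("00000000000000000000.log"); // hypothetical segment file
        try (FileChannel channel = FileChannel.open(segment,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            for (int i = 0; i < 3; i++) {
                ByteBuffer record = ByteBuffer.wrap(("message-" + i + "\n").getBytes(StandardCharsets.UTF_8));
                channel.write(record); // always appends: sequential I/O
            }
        }
    }
}
```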

Zero copy

The core idea of zero copy is to minimize the number of data copies, reducing both the CPU cost of copying and the number of context switches between user mode and kernel mode, in order to optimize data transfer performance.

For details, see the separate article on zero copy.

PageCache

When a producer sends a message to the Broker, the Broker writes the data at an offset using the write() system call (corresponding to the Java NIO FileChannel.write() API); the data is written to the page cache first. When a consumer consumes the message, the Broker uses the sendfile() system call (corresponding to the FileChannel.transferTo() API) to zero-copy the data from the page cache into the Broker's socket buffer and then out over the network.
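A minimal sketch of that send path using FileChannel.transferTo() (the file name, host, and port are placeholders; this is the generic NIO pattern, not the Broker's actual code):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Zero-copy send: bytes flow page cache -> socket buffer without passing through user space.
public class ZeroCopySend {
    public static void main(String[] args) throws IOException {
        Path log = Path.of("00000000000000000000.log");  // hypothetical segment file
        try (FileChannel file = FileChannel.open(log, StandardOpenOption.READ);
             SocketChannel socket = SocketChannel.open(new InetSocketAddress("localhost", 9999))) {
            long position = 0;
            long remaining = file.size();
            while (remaining > 0) {
                // transferTo may send fewer bytes than requested, so loop until done
                long sent = file.transferTo(position, remaining, socket);
                position += sent;
                remaining -= sent;
            }
        }
    }
}
```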

Synchronization between the leader and its followers works the same way as the consumer read path described above.

The data in the page cache is written back to disk by the kernel's flusher threads and by calls to sync()/fsync(), so the data is not lost even if the process crashes. In addition, if a consumer wants to consume a message that is not in the page cache, it is read from disk, and some adjacent blocks are prefetched into the page cache for subsequent reads.

Therefore, if Kafka Producer’s production rate is similar to the consumer’s consumption rate, the entire production-consumption process can be completed almost exclusively by reading and writing to the broker Page cache, with very few disk accesses.

Network model

Kafka implements its own network model to do RPC. The base layer is built on Java NIO and uses the same Reactor thread model as Netty.

This part hasn’t been explored yet…
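Still, the general shape of a single-threaded Reactor on top of Java NIO looks roughly like this (a generic sketch, not Kafka's actual code; the port number is arbitrary):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

// Generic single-threaded Reactor: one Selector multiplexes accept and read events.
public class MiniReactor {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(9092)); // arbitrary port for the sketch
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {
            selector.select(); // block until at least one channel is ready
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove();
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    if (client != null) {
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    }
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    ByteBuffer buffer = ByteBuffer.allocate(1024);
                    if (client.read(buffer) < 0) {
                        client.close(); // peer closed the connection
                    }
                    // a real server would hand the decoded request to worker threads here
                }
            }
        }
    }
}
```

Kafka's real network layer spreads this pattern across an Acceptor thread and a pool of Processor threads, handing decoded requests to a separate request-handler thread pool.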

Batch and compression

Kafka's Producer does not send messages to Brokers one by one. The Producer has two important parameters, batch.size and linger.ms, which control its batch sending.

On the producer side, messages pass through the interceptors, the serializer, and the partitioner, and are then cached in the RecordAccumulator. This message accumulator buffers messages so that the Sender thread can send them in batches, reducing the resource consumption of network transmission and improving performance.

Kafka supports multiple compression algorithms: LZ4, Snappy, and GZIP. Since version 2.1.0, Kafka also officially supports Zstandard (zstd), Facebook's open-source compression algorithm designed to provide a very high compression ratio.

The Producer, Broker, and Consumer all use the same compression algorithm. Data compressed by the Producer is stored by the Broker and served to the Consumer without being decompressed; decompression happens only when the Consumer finally polls the messages. This saves a lot of network and disk overhead.
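A minimal producer configuration that puts these knobs together (the broker address and topic name are placeholders, and the values are illustrative rather than recommendations):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class BatchingProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("batch.size", 16384);       // per-partition batch buffer, 16 KB (the default)
        props.put("linger.ms", 10);           // wait up to 10 ms for a batch to fill
        props.put("compression.type", "lz4"); // compress whole batches on the producer

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("demo-topic", "key", "value")); // placeholder topic
        }
    }
}
```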

Concurrency through partitions

Kafka's topics can be divided into multiple partitions, with each partition acting like a queue that keeps its data ordered. The partition is the smallest unit of parallelism in Kafka, so it can be said that each additional partition increases the consumption concurrency.

Kafka has an excellent partition assignment algorithm, StickyAssignor, which ensures that the assignment is as balanced as possible and that each reassignment stays as close as possible to the previous one. This keeps the cluster's partitions balanced, so that the load on brokers and consumers is not too skewed.
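Choosing it is a single consumer configuration entry; a minimal sketch (broker address, group id, and topic are placeholders):

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class StickyConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("group.id", "demo-group");              // placeholder group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // select the sticky strategy instead of the default assignor
        props.put("partition.assignment.strategy",
                "org.apache.kafka.clients.consumer.StickyAssignor");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("demo-topic")); // placeholder topic
        }
    }
}
```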

Are more partitions always better? No.

More partitions require more file handles to be opened

On a Kafka Broker, each partition corresponds to a directory of the file system. In the partition's log directory, each log segment is allocated its own data file and index files. Therefore, as the number of partitions increases, the number of required file handles increases sharply, and you may need to raise the operating system's limit on open file handles.

More partitions require more memory on the client and server side

The producer client has a parameter, batch.size, which is 16 KB by default. The producer caches messages for each partition and, once a batch is full, packs it up and sends it out in bulk. This looks like a performance-enhancing design, but because the parameter is partition-level, the more partitions there are, the more memory this cache requires.

Reduced availability

The more partitions there are, the more partitions are allocated to each Broker, and when a Broker goes down, recovering the leadership of all its partitions takes a long time.

Efficient file data structures

Kafka messages are grouped by Topic, which are independent of each other. Each Topic can be divided into one or more partitions. Each partition has a log file for recording message data.

Each Kafka partition log is physically divided into multiple segments by size.

  • Segment file composition: each LogSegment corresponds to one log file and two index files on disk. The .log file is the segment's data file, while the .index and .timeindex files are its offset index and timestamp index.
  • Each segment file is named after the offset of the first message it contains (its base offset). The value is a 64-bit long rendered as a fixed-width 20-digit string, left-padded with zeros (for example, 00000000000000368769.index).

Index files in Kafka index messages sparsely: not every message is guaranteed an entry in the index file. Each time a certain amount of data is written (specified by the broker parameter log.index.interval.bytes, default 4096, i.e. 4 KB), an offset index entry is added to the offset index file and a timestamp index entry is added to the timestamp index file. Increasing or decreasing log.index.interval.bytes correspondingly decreases or increases the density of index entries.

Kafka maps the sparse index files into memory with mmap, so index operations do not need to go through disk I/O system calls, which speeds up index queries. The Java counterpart of mmap is MappedByteBuffer.

mmap is a method of memory-mapping files: a file (or other object) is mapped into the address space of a process, establishing a correspondence between the file's disk address and a range of virtual addresses in the process's address space. Once the mapping is established, the process can read and write that memory through pointers, and the system automatically writes dirty pages back to the corresponding file on disk, so file operations complete without calling system calls such as read and write. Conversely, changes made by kernel space to this region are directly reflected in user space, which also allows file sharing between processes. For details about mmap, see the separate article on the mmap system call.
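A minimal sketch of reading an offset index through a memory mapping (the file name is a placeholder; the 8-byte entry layout of a 4-byte relative offset plus a 4-byte file position matches Kafka's offset index format):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Map an index file into memory; subsequent reads hit the mapping, not read() syscalls.
public class MmapIndexRead {
    public static void main(String[] args) throws IOException {
        Path indexFile = Path.of("00000000000000000000.index"); // hypothetical index file
        try (FileChannel channel = FileChannel.open(indexFile, StandardOpenOption.READ)) {
            MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // each entry: 4-byte relative offset + 4-byte position in the .log file
            int entries = mapped.remaining() / 8;
            for (int i = 0; i < entries; i++) {
                int relativeOffset = mapped.getInt(i * 8);
                int filePosition = mapped.getInt(i * 8 + 4);
                System.out.printf("entry %d: relativeOffset=%d position=%d%n", i, relativeOffset, filePosition);
            }
        }
    }
}
```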

Offsets in an index file are monotonically increasing. When a given offset is queried, binary search is used to locate it quickly; if the exact offset is not present in the index file, the largest offset smaller than it is returned.

The timestamps in the timestamp index file are also strictly monotonically increasing. When a given timestamp is queried, binary search finds the largest offset whose timestamp is not greater than the target; to locate the corresponding physical file position, a second lookup through the offset index file is then required.

Sparse indexing is a compromise between disk space, memory space, search time, and so on.

  • Use binary search to find the segment (its .log and .index files) whose base offset is the largest one not greater than the target offset.
  • Subtract the base offset in the file name from the target offset to get the message's relative offset within the segment.
  • Use binary search again in the index file to find the largest index entry not greater than the relative offset.
  • Starting from the file position recorded in that entry, scan the log file sequentially until the message with the target offset is found (see the sketch after this list).
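A simplified sketch of the last two steps, with the sparse index held as in-memory arrays (hypothetical data, not Kafka's internal classes):

```java
// Simplified offset lookup over a sparse index (hypothetical in-memory layout).
public class SparseIndexLookup {
    // Parallel arrays: relativeOffsets[i] is an indexed offset, positions[i] its byte position in the .log file.
    static int largestEntryNotGreaterThan(int[] relativeOffsets, int target) {
        int lo = 0, hi = relativeOffsets.length - 1, answer = -1;
        while (lo <= hi) {                       // standard binary search
            int mid = (lo + hi) >>> 1;
            if (relativeOffsets[mid] <= target) {
                answer = mid;                    // candidate: entry not greater than target
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return answer;
    }

    public static void main(String[] args) {
        int[] relativeOffsets = {0, 57, 131, 204}; // sparse: one entry per ~4 KB of messages
        int[] positions = {0, 4096, 8192, 12288};
        int target = 150;                          // relative offset we are looking for
        int entry = largestEntryNotGreaterThan(relativeOffsets, target);
        System.out.printf("start sequential scan at byte %d (indexed offset %d)%n",
                positions[entry], relativeOffsets[entry]);
        // From that byte position, scan the .log file record by record until offset 150 is found.
    }
}
```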
