[Kafka Core Principles] Tuning Strategy Analysis
Kafka itself does not ship with a good graphical monitoring system, but several third-party Kafka monitoring tools do the job well:
- Kafka Monitor
- Kafka Offset Monitor
- Kafka Eagle
In day-to-day development, most developers are already comfortable sending data with Kafka. In practice, however, many never dig into Kafka's parameter configuration. As a result they fail to exploit Kafka's full advantages and cannot meet the demands of their business scenarios.
Producer configuration and specification
```java
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("buffer.memory", 67108864);
props.put("batch.size", 131072);
props.put("linger.ms", 100);
props.put("max.request.size", 10485760);
props.put("acks", "1");
props.put("retries", 10);
props.put("retry.backoff.ms", 500);
KafkaProducer<String, String> producer = new KafkaProducer<>(props);
```
A Kafka client does not send each message to the server immediately; it buffers them. When you send messages with KafkaProducer, they first land in a local buffer on the client, where they are collected into batches before being sent to the broker. The "buffer.memory" parameter constrains the size of this buffer, and its default value is 32 MB. With that in mind, consider how this parameter should be set for a production project.
What can happen if the memory buffer is set too small? To be clear about the mechanics first: messages accumulate in the memory buffer and form batches, each batch containing multiple messages. The KafkaProducer Sender thread then packs batches into a single request and sends it to the Kafka broker.
If the buffer is too small, messages may be written into it faster than the Sender thread can ship requests to the broker, so the buffer fills up quickly. Once it is full, user threads writing to Kafka block. Therefore, size "buffer.memory" from measurements of your own workload: work out how many messages per second your user threads will write in production. If it is, say, 300 messages per second, then load-test whether a 32 MB buffer ever fills up at that write rate. From this kind of test you can tune your way to a reasonable buffer size.
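The sizing reasoning above can be sketched as a back-of-the-envelope calculation. The message rate, average message size, drain time, and 2x safety factor below are illustrative assumptions, not measurements from a real system:

```java
public class BufferSizingSketch {
    // Rough estimate of buffer.memory needed so the buffer does not fill
    // while the Sender thread drains it. All inputs are assumptions to be
    // replaced with numbers from your own load test.
    static long requiredBufferBytes(long messagesPerSecond,
                                    long avgMessageBytes,
                                    double worstCaseDrainSeconds) {
        // Bytes produced during the longest expected Sender pause,
        // padded by 2x for batch overhead and bursts.
        double bytes = messagesPerSecond * avgMessageBytes * worstCaseDrainSeconds;
        return (long) (bytes * 2);
    }

    public static void main(String[] args) {
        // 300 messages/s of ~1 KB each, assuming the Sender may lag up to 5 s.
        long needed = requiredBufferBytes(300, 1024, 5.0);
        System.out.println("Estimated buffer.memory: " + needed + " bytes");
        // Compare against the 32 MB default to decide whether to raise it.
        System.out.println("Fits in 32 MB default: " + (needed <= 32L * 1024 * 1024));
    }
}
```

At these assumed rates the default 32 MB is comfortably sufficient; the point of the sketch is the shape of the calculation, not the specific numbers.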
batch.size specifies the amount of data per batch; the default is 16 KB. Continuing the example of 300 messages per second: would raising batch.size to 32 KB or 64 KB improve overall send throughput? In theory, a larger batch caches more data, so more data goes out per request, which may improve throughput. But it cannot grow without limit: if the batch is too large, data sits in the buffer waiting for the batch to fill, and send latency climbs.
For example, a message enters a batch but then waits 5 seconds for the batch to fill up to 64 KB before it is sent; that message's latency is 5 seconds. You therefore need to try different batch sizes against your production send rate, measure the resulting throughput and message latency, and settle on the most reasonable value.
What if a batch never fills up? This is where "linger.ms" comes in: once a batch has existed for this long, it is sent whether it is full or not.
For example, with batch.size at 16 KB, messages trickle in slowly during off-peak hours. After a batch is created, messages arrive one by one, but accumulating 16 KB takes a long time. Should the batch just keep waiting? If you set "linger.ms" to 50 ms, any batch that has existed for 50 ms is sent even if it has not reached 16 KB. In other words, "linger.ms" caps how long a message waits after being written into a batch before that batch is sent, and prevents batches from being delayed indefinitely.
linger.ms must be set in conjunction with batch.size. Suppose a batch is 32 KB and you estimate that it normally takes about 20 ms to fill. Then linger.ms can be set to 25 ms: most batches fill within 20 ms, while linger.ms guarantees that even during off-peak hours, a batch that has not filled within 20 ms is forced out after 25 ms.
If you set linger.ms too small, say the default of 0 ms or a value like 5 ms, the batch is forced out after 5 ms even though it rarely gathers anywhere near the configured 32 KB, which defeats the purpose of batching.
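The batch.size/linger.ms pairing described above might look like this in producer properties. The 32 KB and 25 ms values are the illustrative numbers from the text, not universal recommendations, and only java.util.Properties is used so the sketch compiles without the Kafka client on the classpath:

```java
import java.util.Properties;

public class BatchLingerConfig {
    // Builds producer properties where linger.ms is set slightly above the
    // time a batch typically needs to fill (assumed ~20 ms in the text).
    static Properties producerProps() {
        Properties props = new Properties();
        props.put("batch.size", "32768"); // 32 KB batches
        props.put("linger.ms", "25");     // force a send after 25 ms even if not full
        return props;
    }

    public static void main(String[] args) {
        Properties props = producerProps();
        System.out.println("batch.size = " + props.getProperty("batch.size"));
        System.out.println("linger.ms = " + props.getProperty("linger.ms"));
    }
}
```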
Maximum request size: max.request.size. This parameter caps the size of each request sent to the Kafka broker, and it also limits the size of the largest message you can send. Suppose your workload consists of large messages, each carrying a lot of data, perhaps 20 KB per message. Should batch.size then be raised, say to 512 KB? And should buffer.memory be raised too, say to 128 MB? Only with larger buffers can batching still pack multiple messages together in a large-message scenario. In that case, max.request.size also needs to be raised, for example to 5 MB.
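Scaling all three settings together for the large-message scenario above might look like this; the concrete sizes are the illustrative values from the text, not tuned production numbers:

```java
import java.util.Properties;

public class LargeMessageProducerConfig {
    // Scales the producer buffers for ~20 KB messages, as discussed above.
    // All three sizes are illustrative assumptions.
    static Properties producerProps() {
        Properties props = new Properties();
        props.put("buffer.memory", String.valueOf(128L * 1024 * 1024)); // 128 MB total buffer
        props.put("batch.size", String.valueOf(512 * 1024));            // 512 KB batches
        props.put("max.request.size", String.valueOf(5 * 1024 * 1024)); // 5 MB request cap
        return props;
    }

    public static void main(String[] args) {
        Properties p = producerProps();
        System.out.println("max.request.size = " + p.getProperty("max.request.size"));
    }
}
```

Note that the broker side has its own limit (message.max.bytes); raising only the producer cap is not enough if the broker rejects oversized messages.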
Retry mechanism: retries and retry.backoff.ms
retries and retry.backoff.ms determine the retry behavior: how many times a failed request is retried, and how many milliseconds to wait between retries.
Confirmation mechanism: acks
This configuration determines when a produce request is considered complete: specifically, how many brokers must have committed the data to their logs and acknowledged this to the leader. Typical values include:
- 0: the producer never waits for an acknowledgment from the broker. This option gives the lowest latency but the highest risk, since data is lost if the broker goes down.
- 1: the producer waits for the leader replica to acknowledge the data. Latency stays low and the server confirms receipt, but a message can still be lost if the leader fails before followers replicate it.
- -1 (all): the producer waits until all in-sync replicas have acknowledged the data. Latency is at its highest, yet even this does not completely eliminate the risk of message loss, because the number of in-sync replicas can drop to 1. To guarantee that a minimum number of replicas receive the data, also set min.insync.replicas at the topic level.
min.insync.replicas specifies the minimum number of replicas that must acknowledge a write for it to succeed when the producer sets acks to "all" (or "-1"). If this minimum cannot be met, the producer throws an exception (NotEnoughReplicas or NotEnoughReplicasAfterAppend). Used together, min.insync.replicas and acks enforce stronger durability guarantees. A typical scenario: create a topic with a replication factor of 3, set min.insync.replicas to 2, and set acks to "all". This ensures the producer throws an exception whenever a majority of replicas do not receive a write.
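The producer side of that typical "replication factor 3, min.insync.replicas=2, acks=all" setup might be configured like this (the topic name in the comment is a placeholder; min.insync.replicas itself is set on the topic, not on the producer):

```java
import java.util.Properties;

public class DurableProducerConfig {
    // Producer side of the durability setup described above.
    // min.insync.replicas is a topic-level setting, applied when creating
    // the topic, e.g. (illustrative command):
    //   kafka-topics.sh --create --topic orders --replication-factor 3 \
    //       --config min.insync.replicas=2 --bootstrap-server localhost:9092
    static Properties producerProps() {
        Properties props = new Properties();
        props.put("acks", "all");             // wait for all in-sync replicas
        props.put("retries", "10");           // retry transient failures
        props.put("retry.backoff.ms", "500"); // wait 500 ms between retries
        return props;
    }

    public static void main(String[] args) {
        System.out.println("acks = " + producerProps().getProperty("acks"));
    }
}
```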
Consumer side configuration and description
fetch.min.bytes: the minimum number of bytes the server should return for each fetch request. If not enough data is available, the request waits until enough data accumulates.
enable.auto.commit: if true, the offsets of messages fetched by the consumer are automatically committed to the broker. If the process dies, the committed offsets are used by the consumer that takes over.
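A consumer properties sketch covering both settings above; the group id, server address, and 1 KB / 5 s values are placeholder assumptions:

```java
import java.util.Properties;

public class ConsumerConfigSketch {
    // Consumer properties for the two settings described above.
    static Properties consumerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "demo-group");          // placeholder group
        props.put("fetch.min.bytes", "1024");         // wait until >= 1 KB is available
        props.put("enable.auto.commit", "true");      // auto-commit fetched offsets
        props.put("auto.commit.interval.ms", "5000"); // commit every 5 s
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        Properties p = consumerProps();
        System.out.println("fetch.min.bytes = " + p.getProperty("fetch.min.bytes"));
        System.out.println("enable.auto.commit = " + p.getProperty("enable.auto.commit"));
    }
}
```

Auto-commit is convenient but can re-deliver or skip messages around a crash; for stronger guarantees, commit offsets manually after processing.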
To learn more
Cloud Wisdom's Operation Management Platform (OMP) is an open-source, lightweight, converged, and intelligent operations platform. It provides management, deployment, monitoring, inspection, self-healing, backup, and recovery functions, giving users convenient operations capabilities and service management. Beyond improving the efficiency of operations staff, it greatly improves business continuity and security. Click the links below to star OMP and learn more.
GitHub address: github.com/CloudWise-O...
Gitee address: gitee.com/CloudWise/O...