
Each topic in Kafka can be split into multiple partitions. The number of partitions is not capped by the number of brokers, so in principle you can set it as high as you like. In practice, however, there are trade-offs to consider when choosing the number of partitions.

More partitions provide higher throughput

  • “A single partition is the smallest unit of parallelism in Kafka.” Each partition receives messages from producers and is consumed by consumers independently; it is a sub-channel of its topic. A partition is to a topic roughly what a lane is to a freeway, except that in Kafka there are no lane changes: a message has to pick its lane at the entrance and stays in it.

  • The effect on throughput is straightforward: given enough resources, more partitions mean higher throughput.

  • To illustrate: suppose a single partition has a maximum transfer speed of P, the cluster has three brokers, and each broker has enough resources to drive three partitions at full speed. The cluster’s maximum transfer speed is then 3 × 3 × P = 9P.

If you increase the number of partitions to 18 without adding resources, each partition can only transfer at P/2, so the ceiling stays at 9P. Throughput planning therefore has to take the brokers’ resource limits into account.

Like any other cluster, Kafka scales horizontally: adding three more brokers with the same resources raises the ceiling to 18P.
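
A minimal sketch of this arithmetic in plain Java (the per-partition speed P, the broker count, and the per-broker capacity are the assumptions from the example above, not Kafka settings):

```java
// Back-of-the-envelope model of the throughput ceiling discussed above.
// All numbers are illustrative assumptions, not Kafka configuration values.
public class ThroughputEstimate {
    public static void main(String[] args) {
        double p = 1.0;                      // max transfer speed of one partition (unit: "P")
        int brokers = 3;                     // brokers in the cluster
        int partitionsPerBrokerAtFullSpeed = 3; // each broker can drive 3 partitions at full speed

        double clusterCeiling = brokers * partitionsPerBrokerAtFullSpeed * p; // 3 * 3 * P = 9P

        int partitions = 18;                 // partitions created without adding resources
        double perPartitionSpeed = Math.min(p, clusterCeiling / partitions);  // 9P / 18 = P/2

        System.out.printf("ceiling=%.1fP, per-partition=%.2fP, total=%.1fP%n",
                clusterCeiling, perPartitionSpeed, perPartitionSpeed * partitions);
    }
}
```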

More partitions require more file handles to be opened

On a Kafka broker, each partition corresponds to a directory on the file system.

Inside each partition’s log directory, every log segment is allocated two files: an index file and a data file. As the number of partitions grows, the number of required file handles therefore grows sharply, and you may need to raise the operating system’s limit on open file handles.
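
As a rough sketch of why the handle count grows, assuming the two-files-per-segment layout described above (the segment count per partition is an illustrative assumption, and real brokers also keep additional files such as time indexes):

```java
// Rough estimate of open file handles on one broker, following the
// "index file + data file per segment" description above. Treat it as
// a lower bound rather than an exact figure.
public class FileHandleEstimate {
    public static void main(String[] args) {
        int partitionsOnBroker = 2000;   // partition replicas hosted by this broker (assumption)
        int segmentsPerPartition = 5;    // active + retained segments per partition (assumption)
        int filesPerSegment = 2;         // one data (.log) file plus one index file

        long handles = (long) partitionsOnBroker * segmentsPerPartition * filesPerSegment;
        System.out.println("Approximate open file handles: " + handles); // 20000
    }
}
```

On Linux the per-process limit is typically raised with ulimit -n or in /etc/security/limits.conf.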

More partitions increase end-to-end latency

  • End-to-end latency in Kafka is the time from when a producer publishes a message to when a consumer receives it, i.e. the consumer’s receive time minus the producer’s publish time.

  • The bottleneck comes from the fact that Kafka does not expose a message to consumers until it has been fully committed, that is, until it has been replicated to all in-sync replicas.

  • If a leader broker hosts n partition replicas and synchronizing each one takes 1 ms, the in-sync operation takes n × 1 ms in total. With 10,000 replicas on the broker, it takes about 10 seconds for the in-sync state to be reached and the data to become visible to consumers, which is a large end-to-end delay.
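
Because a message only becomes visible once the in-sync replicas have it, the producer’s acknowledgement setting is where this trade-off surfaces. A minimal producer sketch (the broker address and topic name are placeholders):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class AckedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        // acks=all: the send is only acknowledged once all in-sync replicas
        // have the record, which is also when consumers can see it.
        props.put("acks", "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("demo-topic", "key", "value"));
        }
    }
}
```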

More partitions mean more memory is needed

  • Newer versions of Kafka support batched production and batched consumption; once batching is configured, each partition requires a certain amount of memory.
    • For example, with 100 partitions a producer or consumer may need on the order of 10 MB of memory for batching; with 100,000 partitions that grows to roughly 10 GB on each side.
    • An unbounded number of partitions can therefore quickly consume a great deal of memory and become a performance bottleneck.
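
The per-partition memory cost shows up directly in the batching settings on both sides. A hedged sketch of the relevant configuration keys (the values shown are the Kafka defaults, used here only for illustration):

```java
import java.util.Properties;

public class BatchingMemoryConfig {
    public static void main(String[] args) {
        // Producer side: the producer keeps an in-flight batch per partition,
        // so memory use grows roughly with batch.size * number of partitions,
        // bounded overall by buffer.memory.
        Properties producerProps = new Properties();
        producerProps.put("batch.size", 16384);        // bytes buffered per partition batch
        producerProps.put("buffer.memory", 33554432L); // total producer buffer (32 MB)

        // Consumer side: each fetched partition can hold up to
        // max.partition.fetch.bytes in memory at once.
        Properties consumerProps = new Properties();
        consumerProps.put("max.partition.fetch.bytes", 1048576); // 1 MB per partition
        consumerProps.put("fetch.max.bytes", 52428800);          // 50 MB per fetch response

        System.out.println("producer: " + producerProps);
        System.out.println("consumer: " + consumerProps);
    }
}
```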

More partitions result in a longer recovery period

  • “Kafka achieves high availability and stability through multi-replica replication.” Each partition has multiple replicas spread across brokers; one replica is the leader and the rest are followers.

  • “When a broker in the Kafka cluster fails, every partition whose leader was on that broker must elect a new leader from its replicas on the other brokers.” This failover is handled by the Kafka Controller, which reads and updates the metadata of the affected partitions in ZooKeeper.

  • Normally, when a broker is taken down deliberately, its partition leaders are moved off one by one before the broker shuts down. If moving one leader takes 1 ms, moving 10 leaders takes 10 ms, and the impact is small: while one leader is being moved, the other nine remain available, so each partition leader is actually unavailable for only about 1 ms.

  • In an unplanned outage, however, all 10 leaders become unavailable at the same time and have to be re-elected in sequence. The last leader is unavailable for a window of 10 ms, and the average window is 5.5 ms. If the failed broker hosted 10,000 leaders, the average unavailability window is roughly 5 seconds (see the sketch after this list).

  • In the more extreme case, the failed broker is the one running the Kafka Controller. The cluster must first elect and start a new controller, and that new controller then has to read the metadata of every partition from ZooKeeper before it can act; until this initialization finishes, the partition leader migrations are stuck waiting.
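
A small sketch of the unavailability arithmetic above (the 1 ms per-leader election time and the leader counts are the same assumptions used in the text):

```java
// Average leader-unavailability window in the two scenarios described above.
// Controlled shutdown: leaders move one at a time while the rest stay online,
// so each leader is offline for roughly one move. Hard failure: all leaders go
// offline at once and wait their turn, so the i-th leader waits i moves.
public class FailoverWindow {
    public static void main(String[] args) {
        double moveMillis = 1.0;   // assumed time to elect/move one partition leader
        int leaders = 10_000;      // partition leaders hosted on the failed broker

        double controlledShutdownAvg = moveMillis;                // ~1 ms each
        double hardFailureAvg = moveMillis * (leaders + 1) / 2.0; // ~5000.5 ms on average
        double hardFailureWorst = moveMillis * leaders;           // ~10 s for the last leader

        System.out.printf("controlled: %.1f ms, crash avg: %.1f ms, crash worst: %.1f ms%n",
                controlledShutdownAvg, hardFailureAvg, hardFailureWorst);
    }
}
```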

Conclusion

In general, more partitions bring higher throughput, but they also carry a non-negligible performance cost and potential risk for the broker nodes. The number of partitions and replicas therefore needs to be chosen according to the actual resources of the broker nodes.

My cluster, for example, is deployed on virtual machines with 12-core CPUs, so I set the default partition count to 12 (num.partitions=12) in kafka/config/server.properties; every topic created afterwards gets 12 partitions.
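
Topics can also be given an explicit partition count at creation time through the AdminClient. A minimal sketch (the broker address, topic name, and replication factor are placeholders; the replication factor must not exceed the number of brokers):

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicWithPartitions {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // 12 partitions to match the 12-core example above; replication
            // factor 2 requires at least two brokers in the cluster.
            NewTopic topic = new NewTopic("demo-topic", 12, (short) 2);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```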