This article is from: Le Byte

The article mainly explains: Kafka architecture design



At the request of many readers, here is a short Kafka interlude before we get to Yarn; treat it as a lighthearted read.

1. Kafka Foundation

The role of the messaging system

Most readers should already be clear on this; the usual example is an oil-bottling line, with a warehouse sitting between the step that produces the goods and the step that consumes them.

So the message system is that warehouse: it acts as a buffer in the middle of the process and provides decoupling between the two sides.

Let's set up a scenario. We know that the log processing of China Mobile, China Unicom, and China Telecom is outsourced for big data analysis. Now suppose their logs are all handed to your system for user-profile analysis.

From the role of the message system described above, we know that a message system is really a simulated cache: it behaves like a cache, but it is not a true cache, because the data is stored on disk rather than in memory.

1. Topic

Kafka borrowed from database design when it introduced topics: a topic is similar to a table in a relational database.

Now, if I need the data from China Mobile, I can simply subscribe to TopicA, the topic holding China Mobile's logs.

2. Partition

Kafka also has the concept of a Partition. On the server, a partition takes the concrete form of a directory: a topic has several partitions, and these partitions are stored on different servers; in other words, different directories are created on different hosts. A partition's main data is stored in .log files. Like partitioning in a database, this is done to improve performance.

As for why it improves performance, that is simple: multiple partitions allow multiple threads, and multiple threads processing in parallel are certainly better than a single thread.

Topic and Partition are like the concepts of table and region in HBase: a table is only a logical concept, and what actually stores the data are the regions, which are distributed across the servers. The same is true of Kafka: Topic is a logical concept, while Partition is the distributed storage unit. This design is the basis of handling massive data. For comparison, if HDFS had no blocks, a single 100 TB file could only sit on one server and would occupy the whole machine; with blocks, a large file can be spread across many servers.
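To make the idea of spreading a topic's data over partitions concrete, here is a minimal Java sketch of how a keyed message could be mapped to a partition. The class and method names are made up for illustration; Kafka's real default partitioner uses a murmur2 hash rather than String.hashCode:

```java
// Illustrative only: maps a message key to one of a topic's partitions.
public class PartitionerSketch {

    // Hash the key, mask off the sign bit, and take it modulo the
    // partition count, so every key lands in exactly one partition.
    public static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // The same key always maps to the same partition, which is how
        // per-key ordering is preserved within a partition.
        System.out.println("user-135 -> partition "
                + partitionFor("user-135", 3));
    }
}
```

Because the mapping is deterministic, all messages with the same key stay in one partition and therefore stay ordered relative to each other.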

Note: 1. A partition on its own has a single-point-of-failure problem, so we set a replica count for each partition

2. Partitions are numbered starting from 0


3. Producer

It is the producer that sends data into the messaging system

4. Consumer

It’s the consumer that gets the data from Kafka

5. Message

The data that we’re dealing with in Kafka is called a message

2. The Cluster Architecture of Kafka

Suppose we create a topic TopicA with three partitions stored on different servers, that is, under different brokers. Topic is a logical concept, so the units belonging to a topic cannot be drawn directly in the diagram

Note: Kafka had no replicas before version 0.8, so a server outage could lose data; try to avoid using releases older than that

Replica — a copy of a partition

Each partition in Kafka can have multiple replicas, in order to keep the data safe.

Here we give partitions 0, 1, and 2 three replicas each (in practice, two replicas would be more appropriate).

In fact, the replicas are divided into roles: one replica is elected as the leader, and the others are followers. When a producer sends data, it sends it directly to the leader partition, and the follower partitions then synchronize the data from the leader themselves. When consumers consume data, they also read from the leader.
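As a toy illustration of this division of labor, here is a hypothetical sketch: producers write to the leader's log, followers pull whatever they are missing, and consumers read from the leader. This is not Kafka's actual replication code (which pulls over the network and tracks an in-sync replica set); all names are made up:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReplicaSketch {
    // The leader's log; producers write here and consumers read from here.
    private final List<String> leaderLog = new ArrayList<>();
    // Each follower keeps its own copy of the log.
    private final Map<String, List<String>> followerLogs = new LinkedHashMap<>();

    public ReplicaSketch(String... followers) {
        for (String f : followers) followerLogs.put(f, new ArrayList<>());
    }

    // Producers send data directly to the leader partition.
    public void produce(String msg) { leaderLog.add(msg); }

    // Followers synchronize by pulling what they are missing from the leader.
    public void syncFollowers() {
        for (List<String> log : followerLogs.values())
            for (int i = log.size(); i < leaderLog.size(); i++)
                log.add(leaderLog.get(i));
    }

    // Consumers also read from the leader.
    public List<String> consumeAll() { return leaderLog; }

    public List<String> followerLog(String name) { return followerLogs.get(name); }

    public static void main(String[] args) {
        ReplicaSketch partition0 = new ReplicaSketch("follower-1", "follower-2");
        partition0.produce("m1");
        partition0.produce("m2");
        partition0.syncFollowers();
        System.out.println("leader:   " + partition0.consumeAll());
        System.out.println("follower: " + partition0.followerLog("follower-1"));
    }
}
```

After a sync, each follower's log matches the leader's, which is what makes it safe to promote a follower if the leader's server goes down.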

Consumer Group — a Group of consumers

When we consume data, we specify a group.id in the code. This id is the name of the consumer group; the system sets a default even if we don't set one.

Some message systems we are familiar with are designed so that once one consumer has consumed a message, no other consumer can consume it. Kafka is not like that. For example, suppose consumerA has consumed the data in TopicA.

If consumerB, in the same group, also tries to consume TopicA's data, it will not be able to. But if we give consumerC a different group.id, consumerC can consume TopicA's data, while consumerD, in consumerC's group, again cannot. So in Kafka, different groups can each have one consumer consuming the same topic's data.

So a consumer group exists so that multiple consumers can consume messages in parallel, and they will never consume the same message. As shown in the figure, consumerA, B, and C do not interfere with one another

As the figure above suggests, consumers connect directly to leaders, so here they consume the three leaders respectively. A partition will therefore not be consumed by more than one consumer within the same consumer group; however, when there are fewer consumers than partitions, a single consumer can consume data from multiple partitions.
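The rule above, one consumer at most per partition within a group, can be sketched as a simple round-robin assignment. This is only an illustrative stand-in for Kafka's real assignors (range, round-robin, sticky); the names are made up:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupAssignmentSketch {

    // Round-robin assignment: partition p goes to consumer p % consumerCount,
    // so no partition is ever given to two consumers in the same group.
    public static Map<String, List<Integer>> assign(List<String> consumers,
                                                    int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String c : consumers) assignment.put(c, new ArrayList<>());
        for (int p = 0; p < numPartitions; p++)
            assignment.get(consumers.get(p % consumers.size())).add(p);
        return assignment;
    }

    public static void main(String[] args) {
        // 3 consumers, 3 partitions: one partition each.
        System.out.println(assign(
                Arrays.asList("consumerA", "consumerB", "consumerC"), 3));
        // 2 consumers, 3 partitions: one consumer takes two partitions.
        System.out.println(assign(Arrays.asList("consumerA", "consumerB"), 3));
    }
}
```

The second case in main shows the "fewer consumers than partitions" situation from the text: consumerA ends up responsible for partitions 0 and 2.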


I know of a rule of thumb: 95% of big-data distributed systems use a master-slave architecture, and a few use a peer-to-peer architecture, such as Elasticsearch.

Kafka is also master-slave: the master node is called the controller, and the rest are slave nodes. The controller works together with ZooKeeper to manage the entire Kafka cluster.

How do Kafka and ZooKeeper work together?

Kafka relies heavily on its ZooKeeper cluster (so the earlier ZooKeeper article comes in handy). All brokers register with ZooKeeper when they start, and the purpose is to elect a controller. The election is simple and crude: whichever broker registers first becomes the controller; there is no fancy algorithm involved.
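The "whoever registers first wins" election can be mimicked with a single atomic compare-and-set standing in for creating the ephemeral controller node in ZooKeeper. This is a toy model, not the real ZooKeeper API; all names are made up:

```java
import java.util.concurrent.atomic.AtomicReference;

public class ControllerElectionSketch {
    // Stand-in for the ephemeral controller znode: the first successful
    // compareAndSet "creates the node" and that broker wins the election.
    private final AtomicReference<String> controllerNode =
            new AtomicReference<>(null);

    // Returns true only for the first broker to attempt registration.
    public boolean tryBecomeController(String brokerId) {
        return controllerNode.compareAndSet(null, brokerId);
    }

    public String controller() { return controllerNode.get(); }

    public static void main(String[] args) {
        ControllerElectionSketch zk = new ControllerElectionSketch();
        System.out.println("broker-0 won: " + zk.tryBecomeController("broker-0"));
        System.out.println("broker-1 won: " + zk.tryBecomeController("broker-1"));
        System.out.println("controller:   " + zk.controller());
    }
}
```

In real ZooKeeper the node is ephemeral, so if the controller's session dies the node disappears and the remaining brokers race again in the same first-wins fashion.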

Being the controller means listening to multiple directories in ZooKeeper. For example, there is a directory such as /brokers/, and the other brokers register themselves under it when they start. The nodes are usually named after the broker's id, such as /brokers/0.

When a node registers, it is bound to expose information such as its own hostname and port. The controller reads the newly registered node's data (through the watch mechanism), generates the cluster's metadata, and then distributes that information to the other servers, so every server can be aware of the other members of the cluster.

In this scenario, when we create a topic (which corresponds to a ZooKeeper directory such as /topics/TopicA), Kafka generates the partition scheme in this directory. The controller listens for this change, synchronizes the directory's metadata, and distributes it to the slave nodes so that the whole cluster becomes aware of the partition scheme. The slave nodes then create their own directories and wait for the partition replicas to be created. This is the management mechanism for the entire cluster.

Bonus content

1. What makes Kafka fast?

1) Sequential writes

Every time an operating system reads or writes data on disk, it must first address, that is, locate the physical position of the data on the disk, before it can read or write. On a mechanical hard disk, addressing takes considerable time. In Kafka's design, the data really is stored on disk. Generally, performance is best when data is kept in memory, but Kafka uses sequential writes: new data is always appended to the end of the file. Sequential writes to disk have extremely high performance; given a certain number of disks at a certain rotation speed, they are basically as fast as writing to memory.

Random writes, by contrast, modify data at arbitrary locations in the file, and their performance is much lower.
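A tiny sketch of this append-only style of writing, using plain Java file APIs. This captures only the idea of sequential appends, not Kafka's actual log implementation; the class and method names are made up:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class AppendOnlyLogSketch {

    public static Path newTempLog() {
        try { return Files.createTempFile("segment", ".log"); }
        catch (IOException e) { throw new UncheckedIOException(e); }
    }

    // Always append to the end of the file; never seek backwards.
    public static void append(Path log, String msg) {
        try {
            Files.write(log, (msg + System.lineSeparator()).getBytes(),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    public static List<String> readAll(Path log) {
        try { return Files.readAllLines(log); }
        catch (IOException e) { throw new UncheckedIOException(e); }
    }

    public static void main(String[] args) {
        Path log = newTempLog();
        append(log, "offset-0:hello");
        append(log, "offset-1:world");
        // Messages come back in exactly the order they were appended.
        System.out.println(readAll(log));
    }
}
```

Because every write lands at the end of the file, the disk head never has to seek back, which is the whole point of the sequential-write design.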

2) Zero copy

Let's look at the non-zero-copy case first.

As you can see, the data is copied from memory into the Kafka server process, and then from the process into the socket cache; the whole process costs a lot of time. Kafka uses Linux's sendfile technology (exposed through NIO) to skip the extra process switches and one round of data copying, which gives much better performance.
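In Java, sendfile is typically reached through FileChannel.transferTo, which hands the copy to the operating system so the bytes never pass through a user-space buffer in this process. A small self-contained demo (the file names and the demo helper are arbitrary, for illustration only):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopySketch {

    // transferTo delegates the copy to the OS (sendfile on Linux), so the
    // data does not get pulled into this process's user-space memory.
    public static long transfer(Path src, Path dst) {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            return in.transferTo(0, in.size(), out);
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    // Create a source file with known content and transfer it; returns
    // the number of bytes moved.
    public static long demo() {
        try {
            Path src = Files.createTempFile("src", ".log");
            Path dst = Files.createTempFile("dst", ".log");
            Files.write(src, "kafka zero copy".getBytes());
            return transfer(src, dst);
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    public static void main(String[] args) {
        System.out.println("transferred " + demo() + " bytes");
    }
}
```

When the target is a socket rather than a file, the same call is what lets a broker push log data to a consumer without the extra copy described above.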

2. Log segmented storage

Kafka caps each .log file in a partition at a maximum of 1 GB. The purpose of this limit is to make it convenient to load .log files into memory for operations

Numbers such as 9936472 represent the starting offset of a log segment file, which tells us that this partition has written at least roughly 10 million messages. The Kafka broker has a parameter, log.segment.bytes, which limits the size of each log segment to 1 GB. When a log segment is full, Kafka automatically rolls over to a new one to write to. This process is called log rolling, and the segment currently being written is called the active log segment.
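The rolling behavior can be sketched as follows. The segment naming mirrors the base-offset naming just described (each segment is identified by the offset of its first message), but the class itself is a made-up simulation, not Kafka's log code:

```java
import java.util.ArrayList;
import java.util.List;

public class LogRollSketch {
    // Limit on the active segment's size, playing the role of
    // log.segment.bytes (1 GB in Kafka; tiny here for the demo).
    private final long maxSegmentBytes;
    // Each segment is identified by the offset of its first message,
    // mirroring file names like 00000000000009936472.log.
    private final List<Long> segmentBaseOffsets = new ArrayList<>();
    private long activeSegmentBytes = 0;
    private long nextOffset = 0;

    public LogRollSketch(long maxSegmentBytes) {
        this.maxSegmentBytes = maxSegmentBytes;
        segmentBaseOffsets.add(0L); // the first active segment starts at offset 0
    }

    public void append(int messageBytes) {
        if (activeSegmentBytes + messageBytes > maxSegmentBytes) {
            // Roll: the full segment is closed and a new active segment
            // named after the next offset is opened.
            segmentBaseOffsets.add(nextOffset);
            activeSegmentBytes = 0;
        }
        activeSegmentBytes += messageBytes;
        nextOffset++;
    }

    public List<Long> segments() { return segmentBaseOffsets; }

    public static void main(String[] args) {
        LogRollSketch log = new LogRollSketch(100);
        for (int i = 0; i < 25; i++) log.append(10); // 25 messages of 10 bytes
        // Rolls after every 10 messages: segments start at offsets 0, 10, 20.
        System.out.println("segment base offsets: " + log.segments());
    }
}
```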

If you’ve looked at the previous two HDFS articles, you’ll see that NameNode’s edits log also has limitations, so these frameworks take these into account.

3. Kafka's network design

Kafka's network design is tied to how Kafka is tuned, and it is the reason Kafka supports high concurrency

First, the client's request reaches the Acceptor. The broker has a group of processor threads, three by default. The Acceptor does not process the request itself; instead it wraps each connection into a SocketChannel and hands it to the processors by polling: the first connection goes to the first processor, the next to the second, then to the third, and then back around to the first. When a processor reads from these SocketChannels, it obtains the individual requests together with their data and places them into a request queue.
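The Acceptor's polling dispatch can be sketched like this. It is a made-up simulation of the round-robin handoff described above, not Kafka's SocketServer code:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class AcceptorSketch {
    // One queue of pending connections per processor thread.
    private final List<Queue<String>> processorQueues = new ArrayList<>();
    private int next = 0;

    public AcceptorSketch(int numProcessors) {
        for (int i = 0; i < numProcessors; i++)
            processorQueues.add(new ArrayDeque<>());
    }

    // Hand the connection to the next processor in round-robin order;
    // returns the index of the processor chosen.
    public int accept(String connection) {
        int chosen = next;
        processorQueues.get(chosen).add(connection);
        next = (next + 1) % processorQueues.size();
        return chosen;
    }

    public static void main(String[] args) {
        AcceptorSketch acceptor = new AcceptorSketch(3); // 3 processors, as in Kafka's default
        for (int i = 0; i < 4; i++)
            System.out.println("conn-" + i + " -> processor "
                    + acceptor.accept("conn-" + i));
    }
}
```

With three processors, the fourth connection wraps back to processor 0, which is exactly the polling pattern described in the text.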

The request queue is consumed by a thread pool with 8 threads by default. These threads process the requests: they parse each one, write the data to disk if it is a write request, and fetch the result if it is a read, then place the outcome into a response queue. The processors take the response data from the response queue and return it to the client. This is Kafka's three-tier network architecture.

So if we need to beef up Kafka, we can add more processors and more processing threads in the thread pool. The request and response queues actually act as buffers, covering the case where the processors produce requests faster than the threads can handle them.

So this is an enhanced Reactor network threading model.
