Original article by Taste of Little Sister (WeChat official account: XjjDog). Sharing is welcome; please credit the source.

By the end of this article, you will understand why something as simple as a message queue involves so much know-how, what Kafka’s main functions and application scenarios are, and what Kafka’s main technical terms mean. You will also learn what a sense of duty is!

As a distributed messaging system, Kafka has to take a position. It has to figure out what it is, whom it creates value for, whom it depends on, and what its responsibilities are.

Few systems stay calm in the face of such an oppressive barrage of questions, but Kafka stood firm like a true warrior.

At its core, Kafka acts as a message queue. So what is a message queue? If it cannot answer this question, Kafka’s level of awareness is not high enough, and it needs to keep thinking and studying.

To find out, we spoke to a milkman.

1. The Milkman’s Story

Milk is tasty and nutritious, whether it’s fresh milk or synthetic milk, so there are a lot of people in the community who order it.

Every morning, the milkman set out with a cart full of milk. At first, he knocked on each door according to the house numbers in his book and handed the milk directly to the customer. When a customer was not at home, he had to look up their phone number and call. But as the business grew bigger and bigger, the milkman had only one word for the job: thankless.

Some customers opened the door half asleep and complained that he disturbed their lives; some female customers came to the door in pajamas and complained that he leered at them; some customers left for work early but sat at the end of the milkman’s route, so they complained that his delivery was late.

Fortunately, the milkman was a former programmer. After a moment’s thought, he persuaded the boss to give each customer a milk crate. His job now is simply to put fresh milk into the crate on schedule. He no longer cares when the customer picks it up, or whether they are washing their face or rubbing their hands at the time.

From then on, he never saw the body looming under his pajamas again.

Notice that the scenario above has two main participants: the milkman and the customer. Before the milk crates were introduced, their interaction was blocking, information was processed inefficiently, and the two were tightly coupled, which is what made the milkman see things he shouldn’t.

Of course, once the milk crates are added, the interaction logic changes and both sides need to adapt. Milk crates also cost money, and if you don’t have much business, adding them only adds cost.

Let’s map this out: the set of milk crates is the messaging system, and each milk crate is a message queue. The milkman is the producer, the customer is the consumer, and the milk is the message. Milk the customer has not yet picked up is the message backlog. When the customer sends a message confirming that the milk has been received, that is an ACK.
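To make the mapping concrete, here is a minimal Python sketch of a single milk crate. The class name MilkCrate and its methods are made up for illustration and are not part of any real messaging library; the sketch only shows the producer, consumer, backlog, and ACK roles described above.

```python
from collections import deque


class MilkCrate:
    """A toy message queue: one crate per customer (illustrative names only)."""

    def __init__(self):
        self._pending = deque()   # delivered but not yet picked up (the backlog)
        self._in_flight = {}      # picked up but not yet acknowledged
        self._next_id = 0

    def put(self, milk):
        """Producer side: the milkman drops a bottle and leaves immediately."""
        msg_id = self._next_id
        self._next_id += 1
        self._pending.append((msg_id, milk))
        return msg_id

    def take(self):
        """Consumer side: the customer picks up the oldest bottle, if any."""
        if not self._pending:
            return None
        msg_id, milk = self._pending.popleft()
        self._in_flight[msg_id] = milk
        return msg_id, milk

    def ack(self, msg_id):
        """The customer confirms receipt; the message can now be forgotten."""
        self._in_flight.pop(msg_id, None)

    def backlog(self):
        """Number of bottles still waiting in the crate."""
        return len(self._pending)


crate = MilkCrate()
crate.put("fresh milk, Monday")
crate.put("fresh milk, Tuesday")
first = crate.take()     # the customer collects whenever convenient
crate.ack(first[0])      # ... and confirms receipt
```

The milkman never waits for the customer: put returns at once, and the customer calls take on their own schedule.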

2. The simplest generalized message system

A messaging system provides an intermediate layer: producers only need to submit messages to that layer, and consumers only need to fetch messages from it.

So, in its simplest form, it’s a database.

The diagram above shows the typical architecture of some small systems. Consider an order-processing scenario in which a large number of requests hit our business system. If every request went straight through the complex business logic into the business tables, many of them would time out and fail. So we add an intermediate buffer table to accept user requests, and a scheduled task continuously fetches data from the buffer table to run the genuinely complex business logic.

Make no mistake, this is the most rudimentary of messaging systems, but it has a lot of problems.

  1. The polling interval of the scheduled task is hard to tune, so processing is prone to delays.
  2. Processing capacity cannot be scaled horizontally without introducing problems such as distributed locking and ordering guarantees.
  3. When other businesses need the order data, their logic must also be added to the scheduled task.

As traffic increases and business logic becomes more complex, a more advanced messaging model is needed.
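The buffer-table approach described above can be sketched with Python’s built-in sqlite3 module. The table name order_buffer, its columns, and the functions accept_request and poll_once are assumptions chosen for this illustration, not taken from any real system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_buffer ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " payload TEXT NOT NULL,"
    " processed INTEGER NOT NULL DEFAULT 0)"
)


def accept_request(payload):
    """Fast path: just append the request to the buffer table."""
    with conn:
        conn.execute("INSERT INTO order_buffer (payload) VALUES (?)", (payload,))


def poll_once(handle, batch_size=10):
    """One run of the scheduled task: fetch unprocessed rows in order and handle them."""
    rows = conn.execute(
        "SELECT id, payload FROM order_buffer"
        " WHERE processed = 0 ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    for row_id, payload in rows:
        handle(payload)  # stands in for the slow, complex business logic
        with conn:
            conn.execute(
                "UPDATE order_buffer SET processed = 1 WHERE id = ?", (row_id,)
            )
    return len(rows)


handled = []
for n in range(3):
    accept_request(f"order-{n}")
processed_count = poll_once(handled.append)
```

Running poll_once from two workers at the same time would hand the same rows to both, which is exactly where the distributed-locking problem from the list comes in.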

3. Basic requirements of the messaging system

Our requirements for a messaging system are as follows:

  1. High performance: both message delivery and message consumption must be fast. Parallel processing capacity is generally gained by increasing the number of shards. A database is obviously a bottleneck here.
  2. Reliability: in certain scenarios messages must not be lost, whether on the producer side, the consumer side, or in the MQ itself. This is generally solved by adding replicas and forcing flushes to disk. A database, by comparison, typically relies on master/slave backup.
  3. Scalability: the system should grow with your project, all the way to eternity. Adding nodes to enlarge the cluster must not reduce its performance. The scalability of a database is certainly questionable, and you may end up introducing complex database-sharding and table-sharding components.
  4. Ecosystem: mature monitoring and operations tooling, multi-language support, and an active community. This determines whether you can trust the message queue you use.

For more detail, XjjDog has another article on the subject: “Distributed Messaging Systems: Design Essentials. You Can Paint a Dragon or a Tiger, but the Bones Are Hard to Paint.”

So much is required, yet the model is so simple. Where is the difficulty? Why do some people get a headache as soon as they see Kafka?

4. How hard is it to do your job

Since the messaging model is a simple producer-consumer model, why are today’s messaging systems so complex? The complexity comes mainly from the word “distributed” and has little to do with message queues as such: a messaging system has to deal with the same problems that all distributed systems face.

4.1 Replicas

Data stored on a single machine is never fully reliable: hard drives fail, the power goes out, and cables get dug up. Data safety is therefore generally ensured through redundancy, by keeping multiple replicas. Replicas also provide additional computing capacity, for example by serving some requests themselves. The more replicas, the higher the availability.


Once replicas are added, data has to be synchronized. Even on the fastest LAN there is latency, not to mention synchronization delays caused by differences in machine performance. The consequence is that a request served by a replica may not see the latest data; in other words, data consistency is affected. There are ways to guarantee consistency, but the more replicas there are, the greater the latency.

Adding replicas also introduces master-slave issues. When the master node dies, a replica has to be promoted to take its place. This takes time to coordinate, and part of the system is unavailable in the meantime.

All messaging systems require a lot of code to handle these exceptions.

4.2 Partitions

When a set of data (such as a table) grows so large that operating on it becomes too time-consuming, the data needs to be split and distributed across multiple machines. This splitting is called sharding: by partitioning according to certain rules, it reduces the amount of data a single query touches and increases the capacity of the cluster.

Data in a shard can be written in only one place, the master. The other replicas copy data from the master.

Replicas can serve reads in parallel, but they may return stale data. If you need reads to be consistent, you can write to the replicas synchronously. In Kafka, for example, with acks=-1 a write is considered successful only after all in-sync replicas have received it.

But with too many replicas this process can become very slow. You can balance write and read efficiency by choosing how many replicas must take part in each write (W) and each read (R); the quorum rule R+W>N, where N is the total number of replicas, is one such tradeoff.
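As a rough illustration of sharding by a rule, the sketch below maps keys to shards with a stable hash. The helper names shard_for and distribute are hypothetical, and CRC32 is used only because it is in the standard library; this is not the hash function Kafka itself uses.

```python
import zlib


def shard_for(key, num_shards):
    """Map a key to a shard with a stable hash.

    Python's built-in hash() is randomized per process for strings,
    so CRC32 is used to keep the mapping the same across runs.
    """
    return zlib.crc32(key.encode("utf-8")) % num_shards


def distribute(keys, num_shards):
    """Group keys by the shard that owns them."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


placement = distribute([f"order-{n}" for n in range(8)], num_shards=3)
```

Because the same key always maps to the same shard, all writes for that key go to a single master, which matches the rule that a shard is written in only one place.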
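The quorum rule can be checked with a few lines of Python. The function names are made up for this sketch; N is the total number of replicas, W the number that must acknowledge a write, and R the number consulted on a read.

```python
def quorum_overlaps(n, w, r):
    """True when every read set must intersect every write set (R + W > N)."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return r + w > n


def valid_configs(n):
    """All (w, r) pairs for n replicas that guarantee overlap."""
    return [
        (w, r)
        for w in range(1, n + 1)
        for r in range(1, n + 1)
        if quorum_overlaps(n, w, r)
    ]


three_replicas = valid_configs(3)
```

With three replicas, W=2 and R=2 satisfies the rule, while W=1 and R=1 does not: a read may then land on a replica that the latest write never reached. Writing to all replicas (W=N) lets a read consult just one, at the cost of the slowest write path.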

5. Kafka

With this background, Kafka’s terminology becomes much simpler.

Kafka is a distributed messaging (storage) system. Distributed systems increase parallelism through sharding and increase reliability through replicas, and Kafka is no exception. Its structure does not escape the basic distributed-systems theory introduced above. If you draw replicas, partitions, topics, producers, and consumers all in one picture, the diagram can become very large.

A machine with Kafka installed is called a Broker, and a Kafka cluster contains one or more such instances. It’s just a name; there is nothing more to it.

The components that write data to Kafka are called producers. Producer code generally lives in the business system. The producer plays the same role as our milkman.

Many kinds of messages may be sent to Kafka. How do you classify them? That’s what the concept of a Topic is for. Because a topic is distributed, it may live on multiple brokers.

To increase parallelism, a Topic is split into multiple pieces, each of which is called a Partition. Partitions are generally distributed evenly across all machines.

The applications that consume data from Kafka are called consumers. The name given to one consuming business on a topic is called a Consumer Group.
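To show how partitions and a consumer group fit together, here is a small sketch that spreads the partitions of one topic over the consumers of one group, so that each partition is read by exactly one consumer in that group. The function assign_partitions and its round-robin rule are assumptions for illustration only; Kafka ships several assignment strategies and this does not reproduce any of them exactly.

```python
def assign_partitions(num_partitions, consumers):
    """Round-robin the partitions of one topic over the consumers of one group."""
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


group = assign_partitions(4, ["consumer-a", "consumer-b", "consumer-c"])
```

With four partitions and three consumers, one consumer ends up with two partitions. Adding more consumers than partitions would leave some of them idle, which is why the partition count caps the parallelism of a group.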

Take a look at the Kafka server configuration file: the two most important parameters, the partition count and the replication factor (num.partitions and default.replication.factor), are now fairly easy to understand.

Now for the most important concept. Kafka handles synchronization between replicas with the ISR, which is a common Kafka interview topic.

ISR, which stands for In-Sync Replicas, is an important mechanism for ensuring HA and consistency. More replicas reduce Kafka’s throughput but greatly improve availability. Generally, 2 to 3 replicas are advisable.

Replication has two requirements: there must be enough replicas, and they must not land on the same instance. The ISR is maintained per partition, so each Partition has its own in-sync list. Of the N replicas, one is the leader and the others are followers. The leader handles all read and write requests for the partition, while the followers only replicate, periodically copying data from the leader.

If a follower falls too far behind the leader, or has not issued a replication request for a certain amount of time, the leader removes it from the ISR.

The leader commits a message only after all replicas in the ISR have sent it an ACK.
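The ISR rules above can be mimicked with a toy model. Everything here is a simplification invented for illustration: the Partition class, the max_lag threshold counted in messages, and the method names are not Kafka’s API, the time-based removal rule is left out, and followers never rejoin the ISR.

```python
class Partition:
    """Toy model of one partition: a leader log plus follower positions."""

    def __init__(self, followers, max_lag):
        self.log = []                              # the leader's messages
        self.fetched = {f: 0 for f in followers}   # messages each follower has copied
        self.isr = set(followers)                  # followers currently in sync
        self.max_lag = max_lag                     # assumed lag threshold, in messages

    def append(self, message):
        """The leader accepts a write."""
        self.log.append(message)

    def follower_fetch(self, follower, count):
        """A follower copies up to `count` more messages from the leader."""
        self.fetched[follower] = min(len(self.log), self.fetched[follower] + count)

    def shrink_isr(self):
        """Drop followers that have fallen more than max_lag messages behind."""
        for f in list(self.isr):
            if len(self.log) - self.fetched[f] > self.max_lag:
                self.isr.discard(f)

    def committed(self):
        """Number of messages acknowledged by every replica still in the ISR."""
        return min([len(self.log)] + [self.fetched[f] for f in self.isr])


partition = Partition(["f1", "f2"], max_lag=2)
for i in range(5):
    partition.append(f"m{i}")
partition.follower_fetch("f1", 5)   # f1 keeps up with the leader
partition.follower_fetch("f2", 1)   # f2 falls behind
before = partition.committed()      # held back by the slow follower
partition.shrink_isr()              # the leader removes f2 from the ISR
after = partition.committed()       # the commit point can now advance
```

The slow follower holds back the commit point only while it is still in the ISR; once removed, it no longer blocks writes, which is how the ISR trades a little redundancy for availability.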

6. The role of messaging systems

Having said that, it’s time to explain what message queues do in computer terms:

  1. Peak shaving: the queue absorbs requests that exceed the processing capacity of the business system so that the service keeps running smoothly. This can save significant cost, since flash-sale events, for example, no longer require the system to be provisioned for peak capacity.
  2. Buffering: the queue sits as a buffer between the service layer and the slow persistence layer. This is similar to peak shaving but is mainly used for data flowing between services, such as sending text messages in bulk.
  3. Decoupling: specific requirements are often not settled at the start of a project. A message queue can act as an interface layer that decouples important business processes; as long as you follow the conventions and program against the data, you gain extensibility.
  4. Redundancy: message data can be consumed one-to-many by multiple unrelated businesses.
  5. Robustness: a message queue can accumulate requests, so a consumer service can be down for a short time without affecting the main business.

However, because Kafka is so capable, it gets asked to do a lot more. Its responsibilities are broader, including but not limited to:

  1. Delivering business messages
  2. User activity logs
  3. Monitoring metrics
  4. Logs
  5. Stream processing, such as certain aggregations
  6. Commit logs, as redundancy for some important services
  7. Event sourcing (event traceability), a concept from DDD

The following is a typical usage scenario for logging.

7. Why is Kafka fast

Kafka is known above all for its speed, which at one point led people to assume it could only handle log-type messages. Its speed overshadows the fact that Kafka also supports complex transactional messages.

So why is it so fast? There are the following reasons:

  1. Cache: Kafka relies on the filesystem cache (PageCache).
  2. Sequential writes: because modern operating systems provide read-ahead and write-behind techniques, sequential disk writes can in many cases be faster than random memory writes.
  3. Zero-copy: data is transferred without extra copies between kernel space and user space, saving a memory copy.
  4. Batching of messages: small requests are merged and streamed as batches, pushing throughput up to the network ceiling.
  5. Pull mode: consumers pull messages at their own pace, so consumption matches the consumer’s processing capacity.
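Of the reasons above, batching is the easiest to show in a few lines. The Batcher class below is a hypothetical sketch, not Kafka’s producer: it only demonstrates how merging small messages reduces the number of requests sent, and it flushes on size alone, with no timer.

```python
class Batcher:
    """Collect small messages and hand them to `send` in batches."""

    def __init__(self, send, max_batch):
        self.send = send            # callable that ships one batch over the network
        self.max_batch = max_batch  # flush as soon as this many messages are buffered
        self.buffer = []

    def add(self, message):
        """Buffer one message, flushing automatically when the batch is full."""
        self.buffer.append(message)
        if len(self.buffer) >= self.max_batch:
            self.flush()

    def flush(self):
        """Send whatever is buffered as a single request."""
        if self.buffer:
            self.send(list(self.buffer))
            self.buffer.clear()


requests = []                        # stands in for the network: one entry per request
batcher = Batcher(requests.append, max_batch=3)
for n in range(7):
    batcher.add(f"msg-{n}")
batcher.flush()                      # push out the final partial batch
```

Seven messages go out as three requests instead of seven. A real producer would also flush after a short wait so that a half-full batch does not sit in the buffer indefinitely.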

As for compression, JVM performance optimization, and so on, it’s just child’s play.

End

As you can see, Kafka is a versatile player, capable of both message processing and data storage. It works without complaint: extremely efficient, yet never allowed to stop, it embodies the most tragic fate of the working class.

Its distributed design is also excellent, and it is a design its creators apply to Kafka itself: when one Kafka node falls, more Kafka nodes step up, and after a dozen seconds of pain its sacrifice is forgotten.

Kafka is a dutiful distributed messaging system, but don’t squeeze it endlessly. Giving it only 1 CPU core and 512 MB of memory is surely meant to temper its indomitable sense of duty in a harsh environment!

Doesn’t your conscience hurt when you lose your data and then call Kafka unstable?

XjjDog is a WeChat official account that keeps programmers from getting sidetracked, with a focus on infrastructure and Linux. With ten years of architecture experience and ten billion requests of daily traffic behind it, the author explores the world of high concurrency with you and offers a different flavor. Personal WeChat: xjjdog0; feel free to add me for further discussion.