Why use message queues

1. Asynchronous processing: an order flow needs to deduct points, redeem coupons, send SMS notifications, and so on. Steps that are time-consuming and do not need to run immediately can be pushed onto a queue and handled asynchronously.

2. Peak shaving: under normal traffic the servers handle the load comfortably, but when a special promotion launches, the number of requests skyrockets. Because the server, Redis, and MySQL each have limited capacity, accepting every request at once would overwhelm them and cause downtime. Adding machines means adjusting configuration, and the extra capacity goes unused once the promotion ends, which is both troublesome and wasteful. Instead, requests can be queued and consumed at whatever rate the servers can sustain.

3. Decoupling: an order flow calls multiple interfaces, such as point deduction, coupon redemption, and SMS, which makes problems hard to troubleshoot. For example, SMS is used in many places; if the SMS interface parameters change one day, every caller has to be modified. Instead, callers can send the message content to a queue, and a single service can consume it and send the SMS uniformly.

Comparing high-throughput, high-availability message queues

RabbitMQ, RocketMQ, Kafka, and Redis message queues and publish-subscribe come up on several recruitment sites.

Redis queues are modeled on the List data structure: push on one end (LPUSH) and pop on the other (RPOP), and each message can be consumed by only one program. For one-to-many consumption you can use Redis's publish-subscribe model instead. Redis pub/sub is consumed in real time: the server does not persist published messages and does not record what each client has consumed, so messages are lost if a client is down while they are published. This is where full-featured message queues such as RocketMQ and Kafka come in.
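The difference between the two Redis models can be sketched without Redis itself: a list-backed queue hands each message to exactly one consumer, while publish-subscribe delivers every message to every current subscriber. This is a minimal in-memory sketch; the class and method names are illustrative, not Redis APIs.

```python
from collections import deque

class ListQueue:
    """Point-to-point: each message is popped by exactly one consumer."""
    def __init__(self):
        self.items = deque()
    def push(self, msg):
        self.items.appendleft(msg)   # like LPUSH
    def pop(self):
        return self.items.pop() if self.items else None  # like RPOP

class PubSub:
    """Publish-subscribe: every current subscriber receives every message.
    Like Redis pub/sub, nothing is stored for absent subscribers."""
    def __init__(self):
        self.subscribers = []
    def subscribe(self):
        inbox = []
        self.subscribers.append(inbox)
        return inbox
    def publish(self, msg):
        for inbox in self.subscribers:
            inbox.append(msg)

q = ListQueue()
q.push("order-1")
a, b = q.pop(), q.pop()   # only the first pop gets the message

ps = PubSub()
x, y = ps.subscribe(), ps.subscribe()
ps.publish("order-1")     # both subscribers receive it
```

The sketch also shows the pub/sub weakness the text describes: a subscriber that joins after `publish` has run receives nothing.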

ZeroMQ's point-to-point mode is comparable to Redis's publish-subscribe mode in capability. Unless the performance requirements were extreme, I would use another queue instead, since merely resolving the dependency libraries ZeroMQ needs in a development environment is hard enough.

RabbitMQ supports many languages and features, but its throughput is relatively low, and it is written in Erlang, which makes secondary development and maintenance difficult.

RocketMQ has performance similar to Kafka and likewise uses a topic-based subscription model. RocketMQ supports distributed transactions, but a cluster has no automatic master/slave switchover, which causes some minor problems. RocketMQ uses a master-slave cluster: while the Master is alive, the Slave serves only as a disaster-recovery machine and sits idle. Kafka's cluster has no fixed master: each broker acts as leader for some partitions and follower for others.

Kafka-related concepts

In a high-availability environment, multiple Kafka servers must be deployed so that the service does not become inaccessible when one Kafka node goes down. Each Kafka instance in a Kafka cluster is a Broker. Operations such as storing topic metadata and electing the Leader depend on ZooKeeper.

Similarly, multiple ZooKeeper servers must be deployed so that the service does not fail when ZooKeeper goes down. Producers write data to the Kafka Leader node, and Follower nodes pull data from the Leader to synchronize. When writing data you must specify a Topic, which is the category of the message.

A topic can have multiple partitions, across which its data is stored. A topic can also have multiple replicas, each a complete backup of the topic's data. Producers produce messages and consumers consume them. If a consumer does not specify a consumer group, a temporary consumer group is created. A message produced by a producer can be consumed by only one consumer within the same consumer group.

  • Broker: each Kafka instance in a Kafka cluster
  • ZooKeeper: elects the Leader node and stores related metadata
  • Leader: producers and consumers interact only with the Leader
  • Follower: synchronizes data from the Leader
  • Topic: the category of published messages
  • Producer: the producer of messages
  • Consumer: the consumer of messages
  • Partition: a partition holding part of a topic's data
  • Replica: a complete copy of a topic's data
  • Consumer Group: a group of consumers that share consumption offsets

Partitions, replicas, consumer groups

  • Partitions

The data of a topic is scattered across its partitions according to the partition count; together, the partitions hold the topic's complete data. The number of partitions is best set to an integer multiple of the number of consumers, so that partitions can be divided evenly among them. Writes within a partition are sequential; to guarantee global ordering, use a single partition.

If there are fewer partitions than consumers, the first consumers each get a partition and the surplus consumers have nothing to consume, unless one of the earlier consumers goes down. If there are more partitions than consumers, every consumer gets at least one partition and some get two; when a new consumer joins, one of the doubled-up partitions is reassigned to it.

The number of partitions can be set to a highly divisible value such as 6 or 12. With 6 partitions: one consumer owns all six; add a second and each handles three; add a third and each handles two. Each consumer is assigned the same number of partitions, so consumption is evenly distributed.
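The even-assignment behaviour described above can be simulated with simple round-robin arithmetic. This is a sketch of the assignment outcome only, not Kafka's actual rebalance protocol:

```python
def assign(partitions, consumers):
    """Distribute partition ids round-robin across consumers,
    mirroring how 6 partitions split over 1, 2 or 3 consumers."""
    result = {c: [] for c in range(consumers)}
    for p in range(partitions):
        result[p % consumers].append(p)
    return result

# 6 partitions: 1 consumer owns all, 2 own three each, 3 own two each
print(assign(6, 1))  # {0: [0, 1, 2, 3, 4, 5]}
print(assign(6, 2))  # {0: [0, 2, 4], 1: [1, 3, 5]}
print(assign(6, 3))  # {0: [0, 3], 1: [1, 4], 2: [2, 5]}

# more consumers than partitions: the surplus consumer sits idle
print(assign(2, 3))  # {0: [0], 1: [1], 2: []}
```

With 6 partitions, 4 consumers would get 2, 2, 1, 1 partitions, which is why divisible counts keep the load even.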

  • Replicas

The number of replicas of a topic is the number of data backups. If the replica count is 1, then even in a multi-node cluster, the corresponding data becomes inaccessible when the machine holding that replica goes down.

When a topic is created in a cluster with both the partition count and the replica count greater than 1, partition leaders are spread evenly across the brokers that hold replicas. This lets clients consume the topic's message data from multiple Kafka machines.

Follower nodes must pull data from the leader to synchronize, and the replica count is usually set equal to the number of Kafka machines. If you only need high availability, you can use an N+1 strategy: set the replica count to 2 and use a dedicated Kafka node to back up data, so topics are distributed across "N" brokers while the "+1" broker holds complete topic data as a backup service.

Replicas indicates which Kafka machines hold replicas of the partition, and Isr (in-sync replicas) indicates which of those replicas are currently in sync. When the Leader of a topic partition goes down, leadership of the affected partitions is reassigned to other available Kafka nodes.

  • Consumer groups

Each consumer group records its consumption offset for every topic partition. If no consumer group is specified, a temporary one is created by default. A message produced by a producer can be consumed by only one consumer within the same consumer group; if you want a message to be consumed by multiple consumers, put them in different consumer groups.

Maximum offset, message storage policy

  • The maximum offset value

The offset is of type long. A long is 64 bits: one sign bit and 63 value bits. In two's complement the range is -2^63 to (2^63)-1, with a single representation of zero. Offsets are never negative, so the usable range is 0 to (2^63)-1.
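The arithmetic can be checked directly (Python integers are unbounded, so the 64-bit bounds are simply computed):

```python
# a signed 64-bit long: 1 sign bit + 63 value bits, two's complement
LONG_MAX = 2 ** 63 - 1
LONG_MIN = -(2 ** 63)

print(LONG_MAX)  # 9223372036854775807

# offsets are non-negative, so a partition can address
# 2^63 distinct offsets: 0 .. 2^63 - 1
offset_count = LONG_MAX + 1
print(offset_count == 2 ** 63)  # True
```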

  • Message storage policy

Kafka's configuration provides two policies. One is time-based: `log.retention.hours=168`; the other is size-based: `log.retention.bytes=1073741824`. Data that meets either criterion is marked for deletion, and Kafka actually deletes it at an appropriate time.
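In server.properties the two retention settings look like this (the values are the ones quoted above; tune them per cluster):

```properties
# delete log segments older than 7 days (168 hours)
log.retention.hours=168
# or: delete the oldest segments once a partition's log exceeds 1 GiB
log.retention.bytes=1073741824
```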

Kafka data stored in ZooKeeper

How to ensure a message is consumed only once

As mentioned earlier, a partition's data can be consumed by only one consumer within the same consumer group. When multiple consumers consume the same topic at the same time, add them all to the same consumer group; each message the producer sends is then consumed by only one of them.

Repeated consumption and data loss

  • Producers

By default, after sending a message the producer does not wait for confirmation that Kafka has finished synchronizing before sending the next one. If the Leader goes down mid-send and the producer is unaware, the message never reaches Kafka and data is lost. The solution is to set `request.required.acks=-1`, meaning the producer waits until all replicas acknowledge receipt before sending the next message.

`request.required.acks=0` means the message is sent without waiting for any acknowledgement (lowest reliability, lowest latency, most likely to lose messages).

`request.required.acks=1` means the producer sends the next message once the Leader alone has confirmed the write; if the Leader fails before the Followers synchronize, the message can still be lost.
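Collected as producer properties (the text uses the legacy name `request.required.acks`; modern Kafka clients call this setting `acks`, where `acks=all` corresponds to `-1`):

```properties
# fire-and-forget: no acknowledgement (fastest, may lose messages)
request.required.acks=0
# leader acknowledges its own write (lost if leader dies before followers sync)
request.required.acks=1
# wait until all in-sync replicas have the message (safest, slowest)
request.required.acks=-1
```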

  • Consumers

Consumers face two situations. One is data loss caused by automatically committing the offset: the offset advances as soon as the message is received, so if the business processing then fails, the next consumption starts after the failed message and its data is lost.

The other is repeated consumption caused by manually committing the offset: if you commit only after the business processing succeeds, processing may succeed while the commit fails, and the next consumption delivers the same message again.

How do you solve this? It is an either/or problem: offsets are committed either automatically or manually, and the corresponding risk is either data loss or repeated consumption. If the messages need high real-time performance and losing one or two does not matter, choose automatic offset commits. If no message may be lost, commit offsets manually and design the business to be idempotent, so that no matter how many times a message is consumed, the result is the same as consuming it once.

Linux Kafka operation commands

  • List Kafka topics

  • Describe a Kafka topic

  • Consume a topic

  • List all consumer groups

  • View a consumer group's consumption status
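The operations listed above map to the scripts shipped in Kafka's bin/ directory. A sketch, assuming a broker at localhost:9092; `my-topic` and `my-group` are placeholder names, and very old Kafka versions take `--zookeeper` where newer ones take `--bootstrap-server`:

```shell
# list topics
kafka-topics.sh --bootstrap-server localhost:9092 --list

# describe a topic (partitions, leader, Replicas, Isr)
kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic my-topic

# consume a topic from the beginning
kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic my-topic --from-beginning

# list all consumer groups
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list

# show a group's consumption status (current offset, log-end offset, lag)
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group my-group
```

The last command's LAG column is the difference between the log-end offset and the group's committed offset, which is the quickest way to see whether consumers are keeping up.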

Windows visualization tool: Kafka Tool

  • Configuring the Hosts file
    123.207.79.96 zookeeper-kafka-01
  • Configure Kafka Tool connection information

  • View Kafka topic data

Producer and consumer example code

  • For details, see github.com/wong-winnie/library

Subscription number: Wei Hong Winnie