
Kafka overview

Characteristics

  • Publish/subscribe messaging built on message queues
  • Real-time processing of event streams as they occur
  • Distributed, fault-tolerant storage of message streams

Message queue-based programming

  • Traditional calls: applications communicate by calling each other directly over an interface protocol (HTTP/RPC).
    • Disadvantage: the caller and callee are strongly coupled.
  • Decoupling: when systems and services communicate through a message queue (MQ), they no longer need to call each other directly.
  • Traffic buffering: when the request volume spikes, Kafka buffers the messages so downstream services can consume them at their own pace. This shaves traffic peaks, protects the services, and temporarily stores requests until they are processed.
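The buffering (peak-shaving) idea can be illustrated with a small simulation. This is a toy model in plain Python, not Kafka code: the function name `shave_peak`, the tick-based timing, and the fixed per-tick `capacity` of the consumer are all illustrative assumptions.

```python
from collections import deque


def shave_peak(arrivals, capacity):
    """Simulate a buffer between a bursty producer and a fixed-rate consumer.

    arrivals[t] is the number of requests produced at tick t; the consumer
    handles at most `capacity` requests per tick. Returns how many requests
    were handled at each tick until the buffer is empty.
    """
    buffer = deque()
    handled_per_tick = []
    next_id = 0
    t = 0
    while t < len(arrivals) or buffer:
        # Producer side: enqueue this tick's requests without waiting for the consumer.
        if t < len(arrivals):
            for _ in range(arrivals[t]):
                buffer.append(next_id)
                next_id += 1
        # Consumer side: drain at most `capacity` requests this tick.
        handled = 0
        while buffer and handled < capacity:
            buffer.popleft()
            handled += 1
        handled_per_tick.append(handled)
        t += 1
    return handled_per_tick


if __name__ == "__main__":
    print(shave_peak([10, 0, 2, 0], capacity=4))  # [4, 4, 4, 0]
```

The burst of 10 requests at tick 0 is spread over several ticks: the producer never waits, and the consumer never handles more than its capacity per tick.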

Kafka API design and concepts

  • The Producer API lets an application publish a stream of messages to one or more topics.
  • The Consumer API lets an application subscribe to one or more topics and process the messages delivered to it.
  • The Streams API lets an application act as a stream processor: it consumes input streams from one or more topics and produces output streams to one or more output topics, effectively transforming input streams into output streams.
  • The Connector API builds and runs reusable producers or consumers that connect Kafka topics to existing applications or data systems.
  • Clients and servers communicate over a simple, high-performance, language-agnostic TCP protocol, and Kafka clients are available for many languages.

Topics and partitions

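  • A topic can be subscribed to by multiple consumers.
  • For each topic, Kafka maintains a partitioned log: every partition is an ordered, append-only sequence of messages.

  • Messages written to a topic are spread across its partitions.
  • Within a partition, messages are kept in strict order.
  • Each message in a partition is assigned a sequential ID called an offset, which uniquely identifies the message within that partition.
  • Messages are kept according to a configurable retention policy, whether or not they have been consumed. For example, with a two-day policy a message can be consumed for two days after it is published and is then discarded to free space. (The broker default, log.retention.hours, is 168 hours, i.e. 7 days.)
  • Kafka's performance is effectively constant with respect to the amount of stored data, so keeping data for a long time is not a problem.
  • Because each consumer controls its own offset, it can reset the offset to re-consume old messages or to start from a specific message. Each consumer's offset is independent and does not affect other consumers.

  • A topic can have multiple partitions. Each partition must fit on the servers that host it, but spreading a topic over many partitions lets it scale horizontally as servers are added, and partitions also act as the unit of parallelism.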
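To make offsets and independent consumer positions concrete, here is a minimal in-memory sketch. The classes `PartitionedLog` and `ToyConsumer` are hypothetical illustrations, not Kafka classes; retention, disk storage, and networking are deliberately left out.

```python
class PartitionedLog:
    """Toy topic: a fixed number of append-only partitions held in memory."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, message):
        """Append to one partition and return the new message's offset."""
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1  # offsets are positions within this partition only

    def read(self, partition, offset, max_messages=10):
        """Return up to max_messages messages starting at offset."""
        return self.partitions[partition][offset:offset + max_messages]


class ToyConsumer:
    """Tracks its own read position (offset) for each partition."""

    def __init__(self, log):
        self.log = log
        self.positions = {}

    def poll(self, partition):
        start = self.positions.get(partition, 0)
        batch = self.log.read(partition, start)
        self.positions[partition] = start + len(batch)
        return batch

    def seek(self, partition, offset):
        """Move this consumer's position; other consumers are unaffected."""
        self.positions[partition] = offset


if __name__ == "__main__":
    topic = PartitionedLog(num_partitions=2)
    for text in ["a", "b", "c"]:
        topic.append(0, text)
    first, second = ToyConsumer(topic), ToyConsumer(topic)
    print(first.poll(0))   # ['a', 'b', 'c']
    print(second.poll(0))  # ['a', 'b', 'c'] (independent position)
    first.seek(0, 1)
    print(first.poll(0))   # ['b', 'c'] (re-consumed from offset 1)
```

Reading never removes a message from the log; only the consumer's own position changes, which is what makes replaying from an earlier offset possible.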

Replicas

  • Each partition is replicated across several servers: one server is the leader for the partition and the others are followers that copy its data. If the leader fails, one of the followers is elected as the new leader. Each server is the leader for some partitions and a follower for others, so load is balanced across the cluster and the service stays highly available.
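Leader failover can be sketched as a toy model. `ReplicatedPartition` is a hypothetical class, and its rules are simplifying assumptions rather than Kafka's protocol (real Kafka tracks an in-sync replica set and elects leaders through its controller): every write reaches all live replicas, failed brokers never return, and the new leader is simply the lowest-named surviving broker.

```python
class ReplicatedPartition:
    """Toy model of one partition copied to several brokers (not Kafka's protocol)."""

    def __init__(self, brokers):
        self.replicas = {broker: [] for broker in brokers}
        self.alive = set(brokers)
        self.leader = brokers[0]

    def write(self, message):
        if self.leader is None:
            raise RuntimeError("no replica is available")
        # Assumption: the leader's write is copied to every live follower.
        for broker in self.alive:
            self.replicas[broker].append(message)

    def fail(self, broker):
        self.alive.discard(broker)
        if not self.alive:
            self.leader = None
        elif broker == self.leader:
            # Leader lost: promote one of the surviving followers.
            self.leader = min(self.alive)

    def read_all(self):
        if self.leader is None:
            raise RuntimeError("no replica is available")
        return list(self.replicas[self.leader])


if __name__ == "__main__":
    partition = ReplicatedPartition(["broker-1", "broker-2", "broker-3"])
    partition.write("m1")
    partition.write("m2")
    partition.fail("broker-1")      # the leader fails
    print(partition.leader)         # broker-2
    partition.fail("broker-2")      # N - 1 = 2 failures with N = 3 replicas
    print(partition.read_all())     # ['m1', 'm2']
```

With three replicas the data survives two broker failures; only when the last replica is gone does the partition become unavailable.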

Producers

  • Producers choose which partition each message is assigned to. This can be done round-robin simply to balance load, or with a partitioning function, for example one based on a key in the message.
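The two partitioning strategies can be sketched as follows. `ToyPartitioner` is a hypothetical class; it hashes keys with `zlib.crc32` as a stand-in, whereas Kafka's default partitioner hashes keys with murmur2, so the partition numbers will not match a real cluster.

```python
import itertools
import zlib


class ToyPartitioner:
    """Picks a partition for each message (illustrative, not Kafka's partitioner)."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._next = itertools.cycle(range(num_partitions))

    def partition(self, key=None):
        if key is None:
            # No key: rotate through the partitions simply to balance load.
            return next(self._next)
        # With a key: hash it so the same key always lands on the same partition.
        # crc32 is a stand-in here; Kafka's default partitioner uses murmur2.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions


if __name__ == "__main__":
    p = ToyPartitioner(num_partitions=3)
    print([p.partition() for _ in range(4)])                 # [0, 1, 2, 0]
    print(p.partition("user-42") == p.partition("user-42"))  # True
```

Key-based partitioning matters because all messages with the same key end up in one partition, where their order is preserved.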

Consumers

  • Consumers label themselves with a consumer group name. Each message published to a topic is delivered to exactly one consumer instance within each subscribing consumer group; the instances can run in separate processes or on separate machines.
  • (Unicast) If all consumer instances are in the same consumer group, messages are load-balanced across those instances.
  • (Broadcast) If every consumer instance is in a different consumer group, each message is broadcast to all of them.

  • Fault tolerance: Kafka implements consumption by dividing the partitions of the log among the consumer instances, so that each instance is the exclusive consumer of a fair share of the partitions at any time. Group membership is maintained dynamically by the Kafka protocol. If a new instance joins the group, it takes over some partitions from the other members; if an instance goes offline, its partitions are reassigned to the remaining instances.

  • Because each partition is read by only one instance in the group, the order of messages within a partition is preserved, but there is no ordering across different partitions.
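Group membership and rebalancing can be illustrated with a simplified round-robin assignment. In a real cluster, Kafka's group coordinator and its configurable assignment strategies do this work; the functions `assign` and `owner` below are hypothetical helpers that simply recompute the assignment from scratch whenever membership changes.

```python
def assign(partitions, members):
    """Round-robin assignment of a topic's partitions within ONE consumer group.

    Every partition gets exactly one owner; if there are more members than
    partitions, the extra members receive nothing.
    """
    if not members:
        return {}
    members = sorted(members)
    assignment = {member: [] for member in members}
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    return assignment


def owner(assignment, partition):
    """Return the group member that reads the given partition."""
    for member, owned in assignment.items():
        if partition in owned:
            return member
    return None


if __name__ == "__main__":
    partitions = [0, 1, 2, 3]
    print(assign(partitions, ["c1", "c2"]))        # {'c1': [0, 2], 'c2': [1, 3]}
    # c3 joins: the group rebalances and c3 takes over a partition.
    print(assign(partitions, ["c1", "c2", "c3"]))  # {'c1': [0, 3], 'c2': [1], 'c3': [2]}
    # c2 goes offline: its partitions move to the remaining members.
    print(assign(partitions, ["c1", "c3"]))        # {'c1': [0, 2], 'c3': [1, 3]}
    # Two groups reading the same topic each get their own copy (broadcast).
    groups = {"billing": ["c1", "c2"], "audit": ["a1"]}
    print({g: owner(assign(partitions, m), 2) for g, m in groups.items()})
    # {'billing': 'c1', 'audit': 'a1'}
```

Within one group each partition has exactly one reader (unicast), while separate groups each receive every message (broadcast).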

Guarantees

  • Messages sent by a producer to a partition are appended in the order they are sent: if the same producer sends M1 and then M2 to the same partition, M1 has a smaller offset than M2 and appears earlier in the log. The log is therefore strictly ordered.
  • A consumer sees messages in the order in which they are stored in the log.
  • For a topic with replication factor N, up to N-1 servers can fail without losing any messages that have been committed to the log.

Kafka as a messaging system

  • Traditional messaging systems have two models: queuing and publish-subscribe. In a queue, a pool of consumers reads from a server and each record goes to one of them; in publish-subscribe, each record is broadcast to all consumers. Both models have advantages and disadvantages:

  • Queuing lets you divide processing over multiple consumer instances, which makes it scalable, but a queue is not multi-subscriber: once one consumer reads a record, it is gone.
  • Publish-subscribe can broadcast data to multiple subscribers, but it cannot scale processing because every message goes to every subscriber.
  • Kafka's consumer group concept generalizes both models. Like a queue, a consumer group lets you divide processing over a collection of processes (the members of the group); like publish-subscribe, Kafka lets you broadcast messages to multiple consumer groups.

The advantage of Kafka's model is that every topic has both properties: it can scale processing and it is also multi-subscriber.

  • Kafka also has stronger ordering guarantees than a traditional queue.

  • A traditional queue retains messages in order on the server, and when several consumers read from it, the server hands out messages in the order they are stored. However, the messages are delivered asynchronously to the consumers, so they may arrive out of order on different consumers. In effect, the order of records is lost in the parallel consumption scenario.

  • Messaging systems often work around this with an "exclusive consumer" that allows only one process to read from a queue, which means there is no parallelism in processing. Kafka does better: by using partitions as the unit of parallelism within a topic, it can provide both ordering guarantees and load balancing over a pool of consumer processes.

  • This is achieved by assigning the partitions of a topic to the consumers in a consumer group so that each partition is consumed by exactly one consumer in the group. That consumer is then the only reader of the partition and consumes its data in order.

  • Because there are many partitions, the load is still balanced over many consumer instances. Note, however, that a consumer group cannot have more active instances than there are partitions; any extra instances stay idle.
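The ordering trade-off can be checked with a small randomized simulation. It is a sketch under stated assumptions: keys are hashed with `zlib.crc32` (Kafka's default partitioner uses murmur2), and "parallel consumers running at different speeds" is modeled by letting a seeded random choice decide which partition delivers its next message. The helper names are hypothetical.

```python
import random
import zlib


def partition_for(key, num_partitions):
    # crc32 is a stand-in; Kafka's default partitioner uses murmur2.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(events, num_partitions):
    """Append (key, value) events to partitions; each partition keeps append order."""
    partitions = {p: [] for p in range(num_partitions)}
    for key, value in events:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


def consume_in_parallel(partitions, seed):
    """One reader per partition, reading in order; at each step a random
    partition with remaining data delivers its next message."""
    rng = random.Random(seed)
    cursors = {p: 0 for p in partitions}
    delivered = []
    while True:
        ready = [p for p in partitions if cursors[p] < len(partitions[p])]
        if not ready:
            return delivered
        p = rng.choice(ready)
        delivered.append(partitions[p][cursors[p]])
        cursors[p] += 1


def values_for(key, delivered):
    return [value for k, value in delivered if k == key]


if __name__ == "__main__":
    events = [("alice", 1), ("bob", 1), ("alice", 2), ("bob", 2), ("alice", 3)]
    delivered = consume_in_parallel(produce(events, num_partitions=2), seed=7)
    print(values_for("alice", delivered))  # [1, 2, 3]
```

The overall delivery order can differ from the production order from run to run, but each key's events stay in order because they all live in one partition that has a single reader.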

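Kafka as a storage system

  • Kafka lets the producer wait for an acknowledgement (ACK), so a write is not considered complete until the message has been written successfully and persisted.
  • Because messages are persisted to disk and replicated, Kafka can also serve as a data storage system.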

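Kafka for stream processing

  • A stream processor continuously consumes data streams from input topics, processes them, and produces data streams to output topics.
  • For complex transformations, Kafka provides the Streams API, which can aggregate streams and join them together and handles tasks such as transforming, filtering, and stateful processing of data.
  • The Streams API builds on Kafka's core primitives: it uses the producer and consumer APIs for input, Kafka itself for stateful storage, and the consumer group mechanism for fault tolerance among stream processor instances.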
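The shape of a stream processor (read input records in order, transform, filter, keep local state, emit output records) can be sketched in plain Python. This is an analogue only, not the Kafka Streams API, which is a Java library: the input list stands in for an input topic, the returned list for an output topic, and the `Counter` for a state store.

```python
from collections import Counter


def process_stream(input_topic):
    """Consume (key, text) records in order and emit (word, running_count) records."""
    counts = Counter()   # local state, playing the role of a state store
    output_topic = []    # stands in for an output topic
    for _key, text in input_topic:
        for word in text.lower().split():    # transformation
            if not word.isalpha():           # filtering
                continue
            counts[word] += 1                # stateful aggregation
            output_topic.append((word, counts[word]))
    return output_topic


if __name__ == "__main__":
    records = [(None, "Kafka streams"), (None, "kafka 2")]
    print(process_stream(records))
    # [('kafka', 1), ('streams', 1), ('kafka', 2)]
```

Emitting a running count per input record, rather than one final total, mirrors how a stream processor keeps producing updated results as new data arrives.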

Putting the pieces together

  • Kafka combines messaging, storage, and stream processing in a single platform.
  • Because it stores data durably, Kafka can treat historical data and future data in the same way. This has made it very successful as a streaming data platform and data pipeline.
  • A single application can process stored historical data and then, instead of stopping at the last record, keep processing new data as it arrives. This covers both periodic batch data loading and real-time subscriptions of the kind a message queue provides.
  • Stream processing transforms data in real time as it arrives.

Installing Kafka and creating a producer and consumer

```shell
# Unpack the distribution (Kafka 2.1.0 built for Scala 2.12)
tar -zxvf kafka_2.12-2.1.0.tgz
cd kafka_2.12-2.1.0

# Start ZooKeeper (runs in the foreground; use a separate terminal for each service)
bin/zookeeper-server-start.sh config/zookeeper.properties

# Start the Kafka broker
bin/kafka-server-start.sh config/server.properties

# Create a topic with one partition and one replica
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic myTopic

# Send messages to the topic (each line you type is sent as one message)
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic myTopic

# Consume messages from the topic
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic myTopic
```

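ZooKeeper

  • ZooKeeper provides distributed registration/discovery and distributed coordination. In this Kafka version, brokers register themselves in ZooKeeper and cluster metadata such as topic configuration is stored there.
  • ZooKeeper abstracts this shared state as a hierarchical namespace of data nodes (znodes) that distributed processes can read, write, and watch for changes.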
