While learning Kafka, I found that many tutorials have become outdated as Kafka has evolved, so this post is a walkthrough of how Kafka's versions have changed.

0.8.x (the original release line)

  1. 0.8.0 was the first release of Kafka as an Apache top-level project, and it introduced replication for topics

  2. 0.8.2.0 introduced a new version of the producer API, but it was still unstable and buggy

  3. The message data structure was updated to change offsets from physical to logical values, which costs a little CPU when resolving offsets

0.9.x (released in late 2015)

  1. Added support for security authentication, authorization management, and data encryption

  2. The consumer API was rewritten in Java, and consumers now connect to the brokers instead of ZooKeeper, though the new consumer was still unstable and buggy (see the sketch after this list)

  3. Added the new Kafka Connect component for high-performance data import and export

  4. The new producer API is stabilizing

  5. The kafka-topics.sh script is deprecated; kafka-configs.sh is recommended instead

  6. The kafka-consumer-offset-checker.sh script is deprecated; kafka-consumer-groups.sh is recommended instead

  7. Java 1.6 and Scala 2.9 are no longer supported

  8. Added quotas, i.e. multi-tenant rate control
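
To illustrate item 2, here is a minimal sketch of the new Java consumer, which talks to the brokers via bootstrap.servers instead of ZooKeeper. The topic name, group id, and broker address are placeholders, and poll(Duration) is the modern signature (older clients used poll(long)):

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class NewConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Brokers, not ZooKeeper: the new consumer talks to the cluster directly.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "demo-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
                }
            }
        }
    }
}
```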

0.10.0.x

  1. Introduced Kafka Streams, officially upgrading Kafka into a distributed stream processing platform (a minimal Streams sketch appears at the end of this section)

  2. The message format changed, adding a timestamp field. Upgrading can force message format conversion between client and broker, which defeats Kafka's zero-copy mechanism and can degrade production performance, so perform this upgrade with caution and keep client and server versions aligned.

(The original post included diagrams of the record format before the change and of the new record with the added timestamp field; they are omitted here.)

  3. A new Kafka Streams client was added; it only works with clusters running broker version 0.10.x or later
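
As a taste of Kafka Streams (item 1 above), here is a minimal sketch that upper-cases values from one topic into another. It is written against the modern StreamsBuilder API (1.0+); the original 0.10.0 API used KStreamBuilder. Topic names, the application id, and the broker address are placeholders:

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Read from one topic, transform each value, write to another.
        builder.<String, String>stream("input-topic")
               .mapValues(v -> v.toUpperCase())
               .to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```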

0.10.1.x

  1. Log retention time is now determined by the maximum message timestamp in a log segment instead of the segment's last modification time

  2. Log rolling is based on the timestamps of new messages instead of the log segment's creation time

  3. Kafka Streams was upgraded to 0.10.1.x and is not backward compatible

  4. The new consumer API sends heartbeats from a background thread, so a consumer that blocks on message processing is no longer kicked out of the group and no longer triggers unnecessary rebalances

  5. Added support for looking up a partition's offset by timestamp (see the sketch below)
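
A minimal sketch of item 5's timestamp lookup using offsetsForTimes, assuming an already-configured consumer; the topic name and partition number are placeholders:

```java
import java.util.Collections;
import java.util.Map;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

public class SeekByTimestamp {
    // Rewind a consumer on one partition to the first message whose
    // timestamp is >= targetTimeMs.
    static void seekTo(KafkaConsumer<String, String> consumer, long targetTimeMs) {
        TopicPartition tp = new TopicPartition("demo-topic", 0);
        consumer.assign(Collections.singletonList(tp));
        Map<TopicPartition, OffsetAndTimestamp> offsets =
                consumer.offsetsForTimes(Collections.singletonMap(tp, targetTimeMs));
        OffsetAndTimestamp found = offsets.get(tp);
        if (found != null) {                // null if no message is at/after the timestamp
            consumer.seek(tp, found.offset());
        }
    }
}
```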

0.11.x (June 2017)

  1. Provided an idempotent producer API; idempotence enables exactly-once semantics per partition within a single producer session (see the producer sketch at the end of this section)

  2. Added transaction support, though the first implementation still had a few bugs

  3. Refactored the Kafka message format. The new format saves more space when messages are sent in batches, and the larger the batch, the greater the savings. The main ideas are new encodings for the timestamp and offset fields: a record's offset is stored as the batch's starting logical offset plus the record's relative offset within the batch, which effectively reduces the message size

(The original post included a diagram of the on-the-wire format of a message, omitted here.)

The compressed message format:

  • Inner messages: the original batch of messages from the producer
  • Wrapper message: the compressed message as stored on the broker

The offset field of the wrapper message holds the offset of the last (maximum) inner message, expressed as an absolute partition offset (a logical offset, not a physical disk position), while the inner messages store relative offsets starting from 0

  4. Unclean leader election (electing a leader from an ordinary replica outside the ISR when the leader crashes) is now disabled by default, so the new leader must be elected from an ISR replica. Set unclean.leader.election.enable=true to allow unclean elections again

  5. The producer configurations block.on.buffer.full, metadata.fetch.timeout.ms, and timeout.ms were removed

  6. When message compression is used, the default compression block size was increased
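
To make items 1 and 2 concrete, here is a sketch of a producer that turns on idempotence and wraps sends to two topics in one transaction. The topic names, transactional id, and broker address are placeholders:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;

public class TransactionalProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        // Idempotence alone gives exactly-once per partition within one session;
        // a transactional.id additionally enables atomic writes across partitions.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "demo-txn-id");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.initTransactions();
        try {
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("topic-a", "k", "v1"));
            producer.send(new ProducerRecord<>("topic-b", "k", "v2"));
            producer.commitTransaction(); // both records become visible atomically
        } catch (KafkaException e) {
            // Fatal errors (e.g. ProducerFencedException) require closing instead.
            producer.abortTransaction();  // hides both records from read_committed consumers
        } finally {
            producer.close();
        }
    }
}
```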

1.0.x

  1. Topic deletion is enabled by default; set delete.topic.enable=false if you do not want it (see the AdminClient sketch at the end of this section)

  2. Implemented disk failover: when one of a broker's disks fails, data is automatically moved to a healthy disk, whereas earlier versions would take the whole broker down, so this improves Kafka's availability and reliability. Cross-path replica migration is also supported, so partition replicas can be moved between different disk directories on the same broker, which is useful for disk load balancing

  3. The kafka-consumer-offset-checker.sh command was removed; use kafka-consumer-groups.sh to get consumer group details

  4. Improvements to Kafka Streams
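
Since topic deletion is now on by default (item 1), deleting a topic programmatically is a one-liner with the AdminClient (available since 0.11); a sketch where the topic name and broker address are placeholders:

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class DeleteTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // Succeeds only while the broker allows deletion
            // (delete.topic.enable=true, the default since 1.0).
            admin.deleteTopics(Collections.singleton("obsolete-topic")).all().get();
        }
    }
}
```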

1.1.x

  1. The default Kafka logging dependency was changed, so users can choose an SLF4J backend such as Logback

  2. Improvements to Kafka Streams

  3. Enhanced Kafka Connect with message header support

2.0.0

  1. The default offset retention period was extended from 1 day to 7 days

  2. Java 7 is no longer supported; Java 8 is the minimum

  3. Security-related changes

  4. Some Scala-based clients are no longer supported or have been removed in favor of the Java clients

  5. The Connect component uses the JSON converter by default

2.1.x

Added support for the Zstandard (zstd) compression codec, which improves the message compression ratio and significantly reduces disk space and network I/O consumption
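
Enabling it is a single producer setting; a minimal sketch (the topic name and broker address are placeholders, and both brokers and consumers must be on 2.1+ to handle zstd):

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ZstdProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        // Batches are compressed with Zstandard before leaving the producer.
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "zstd");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("demo-topic", "key", "value"));
        }
    }
}
```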

2.2.x

  1. Kafka Streams 2.2.1 requires message formats of 0.11 or higher and does not work with older message formats

  2. From this version, the consumer group ID must be set explicitly; otherwise you cannot subscribe to topics or commit offsets. Using an empty string as the group ID is also discouraged (see the sketch after this list)

  3. kafka-topics.sh can now connect directly to the brokers with --bootstrap-server; the older --zookeeper option is still available.
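
A sketch of item 2's requirement: subscribing and committing without a group.id now fails, so set one explicitly. The group id, topic name, and broker address are placeholders:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ExplicitGroupIdSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        // From 2.2 there is no usable default group id: subscribing and
        // committing offsets without one fails, so always set it explicitly.
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "billing-service");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic"));
            consumer.poll(Duration.ofSeconds(1));
            consumer.commitSync();
        }
    }
}
```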

2.3.x

  1. Kafka Connect introduced a new rebalancing protocol based on incremental cooperative rebalancing

  2. The consumer introduced static membership to reduce rebalances during rolling upgrades (see the sketch after this list)

  3. Kafka Streams 2.3.0 requires message formats of 0.11 or higher and is not suitable for older message formats
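
Item 2's static membership is enabled purely through configuration; a sketch, where the group id and broker address are placeholders and the per-instance id must be unique within the group and stable across restarts:

```java
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;

public class StaticMembershipSketch {
    static Properties consumerProps(String instanceId) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
        // Static membership: a restarted consumer that rejoins with the same
        // group.instance.id within session.timeout.ms keeps its partitions,
        // so a rolling restart does not trigger rebalances.
        props.put(ConsumerConfig.GROUP_INSTANCE_ID_CONFIG, instanceId);
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "30000");
        return props;
    }
}
```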

2.4.x

  1. Upgraded the ZooKeeper dependency to 3.5.7

  2. The bin/kafka-preferred-replica-election.sh command line tool is deprecated and has been replaced by bin/kafka-leader-election.sh

  3. The producer's default partitioning strategy changed to a sticky partitioner: records with a null key and no assigned partition are sent to the same partition until the current batch is ready to send, and a new partition is chosen when a new batch is created. This reduces latency, but in extreme cases it can distribute records unevenly across partitions. Typical users are unaffected, though the difference can be noticeable in tests and other situations that generate many records in a very short time (see the sketch after this list)

  4. Optimizations related to consumer rebalancing
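
To see item 3's sticky partitioner in action, send records with a null key and no explicit partition: on 2.4+ they fill one partition's batch at a time instead of being spread round-robin record by record. A sketch with placeholder topic name and broker address:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class StickyPartitionerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 1000; i++) {
                // Null key and no explicit partition: before 2.4 these records
                // were spread round-robin one by one; from 2.4 they stick to a
                // single partition until the batch is sent, then a new one is picked.
                producer.send(new ProducerRecord<>("demo-topic", null, "event-" + i));
            }
        }
    }
}
```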

2.5.x to 2.7.x

Simplified and more scalable implementation of Kafka transactions, and improved exactly-once semantics
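
One user-visible piece of this work is Kafka Streams' cheaper exactly-once mode from KIP-447; a configuration sketch (exactly_once_beta arrived in 2.6 and requires brokers on 2.5 or newer; the application id and broker address are placeholders):

```java
import java.util.Properties;

import org.apache.kafka.streams.StreamsConfig;

public class EosConfigSketch {
    static Properties streamsProps() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "eos-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // exactly_once_beta (KIP-447) uses one producer per Streams instance
        // rather than one per task, making exactly-once far cheaper to scale.
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_BETA);
        return props;
    }
}
```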