Kafka is an application written in Java and runs on any operating system with a Java environment installed. In addition, Kafka uses ZooKeeper to store broker metadata.

With the Java environment and ZooKeeper configured, you can download and install Kafka.

Install the Kafka Broker

The latest version of Kafka can be downloaded from the official Kafka website, for example kafka_2.11-1.1.0.tgz. Decompress it with the following command:

$ tar -zxf kafka_2.11-1.1.0.tgz

Go to the Kafka home directory and start ZooKeeper; the distribution includes a script for running a single-node ZooKeeper:

$ bin/zookeeper-server-start.sh config/zookeeper.properties

With ZooKeeper running, run the following command to start the Kafka broker:

$ bin/kafka-server-start.sh config/server.properties
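To verify that the broker is running, you can create a test topic, list it, and send a few messages through the console producer and consumer (the topic name test is only an example; these scripts ship with the 1.1.0 distribution):

$ bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic test --replication-factor 1 --partitions 1
$ bin/kafka-topics.sh --zookeeper localhost:2181 --list
$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning

Messages typed into the producer console should appear in the consumer console.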

Basic configuration of the broker

The default configuration is sufficient for standalone testing and debugging, but it needs to be adjusted before deploying a broker cluster in a production environment.

broker.id

The broker identifier. It defaults to 0 and can be set to any integer, but it must be unique within the Kafka cluster.

It is recommended to set it to an integer associated with the machine name, which makes it easier to map brokers to hosts during maintenance.

port

The default port number Kafka listens on is 9092. It can be set to any available port number.

zookeeper.connect

The ZooKeeper address used to store broker metadata. The default is localhost:2181, and it can be specified in the format hostname:port/path. Multiple addresses are separated by commas.

It is recommended to specify the path component as a chroot environment for the Kafka cluster; if no path is given, the ZooKeeper root directory is used by default.

log.dirs

Kafka persists messages to disk. log.dirs specifies a comma-separated list of paths under which messages are stored; the default is /tmp/kafka-logs.

All data belonging to one partition is stored under a single path. When multiple paths are configured, the broker places each new partition on the path that currently holds the fewest partitions, not on the path with the most free disk space.
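Putting the parameters above together, a minimal sketch of a server.properties for one broker in a small cluster might look like this (the hostnames and paths are hypothetical):

# unique per broker in the cluster
broker.id=1
# default listening port
port=9092
# shared by all brokers; /kafka is a chroot path
zookeeper.connect=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/kafka
# several paths may be listed; partitions are spread by partition count
log.dirs=/data1/kafka-logs,/data2/kafka-logs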

num.recovery.threads.per.data.dir

The broker uses a configurable pool of threads to handle log segments during startup, clean shutdown, and recovery after a crash. For servers with a large number of partitions, performing these operations in parallel can save a lot of time. Note that this setting applies to a single log directory: the total number of threads used is this value multiplied by the number of directories in log.dirs.
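For example, with the following setting and three paths configured in log.dirs, the broker would start 24 threads in total (the value 8 is illustrative):

num.recovery.threads.per.data.dir=8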

auto.create.topics.enable

By default, the broker automatically creates a topic whenever a producer writes messages to, a consumer reads messages from, or a client requests metadata for a topic that does not yet exist. Set this parameter to false if you want to control topic creation explicitly.
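To disable automatic topic creation, set the following in server.properties:

auto.create.topics.enable=false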

Basic configuration of the topic

Each topic can be configured individually using management tools, including its number of partitions, data retention policy, and so on. The defaults provided by the broker suit typical scenarios and serve as a baseline.

num.partitions

This parameter specifies the number of partitions a newly created topic contains; the default is 1. Kafka scales topics horizontally through partitions, which can be used to balance the load of a cluster as new brokers join.

For partitions to be distributed across all brokers, the number of partitions must be at least as large as the number of brokers. Topics that carry a large volume of messages need a correspondingly large number of partitions to balance the load.

The number of partitions can usually be estimated by dividing the topic throughput by the throughput of a single consumer: if 1 GB of data is written to and read from the topic per second and each consumer can process 50 MB per second, at least 20 partitions are needed so that 20 consumers can read in parallel. If the throughput is uncertain, a rule of thumb is to limit the amount of data per partition to 25 GB.
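Continuing the estimate above, the partition count can be set explicitly when creating the topic; the topic name events and the replication factor here are hypothetical:

$ bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic events --partitions 20 --replication-factor 2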

log.retention.ms

The number of milliseconds to retain data. By default Kafka uses log.retention.hours to set the retention time, and its default value is 168 hours (one week). Using log.retention.ms to configure the retention time is recommended.

When several retention-time parameters are configured at the same time, the one with the smallest unit takes precedence, so log.retention.ms always wins.

log.retention.bytes

This setting applies to each partition and specifies the maximum amount of data retained. For example, if it is set to 1 GB, a topic with eight partitions can retain at most 8 GB of data.

This setting works together with time-based retention such as log.retention.ms: old data is deleted as soon as either limit is reached.
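For example, to retain data for at most one week or 1 GB per partition, whichever limit is reached first (the values are illustrative):

# 7 days in milliseconds
log.retention.ms=604800000
# 1 GB per partition
log.retention.bytes=1073741824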

log.segment.bytes

Kafka stores data as log segments, and the log.retention.ms and log.retention.bytes settings described above operate on log segments rather than on individual messages. log.segment.bytes specifies the maximum size of a single log segment; the default is 1 GB. Messages are appended to the current segment, and when the segment reaches this limit it is closed and a new one is opened.

Messages in a log segment do not begin to expire until the segment is closed. If the segment limit is 1 GB, 100 MB of messages arrive per day, and the retention time is one week, then a segment takes 10 days to fill and must then be retained for another 7 days, so it can take up to 17 days before a segment is deleted.

log.segment.ms

This setting specifies how long a log segment may stay open before it is closed; it has no default value. When used together with log.segment.bytes, the current segment is closed and a new one opened as soon as either limit is reached.
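For example, using the parameter names as described above, to close a segment after 1 GB or one day, whichever comes first (the values are illustrative):

log.segment.bytes=1073741824
# 1 day in milliseconds
log.segment.ms=86400000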

message.max.bytes

Limits the size of a single message; the default is 1000000 bytes, roughly 1 MB. When a producer sends a message larger than this, the message is rejected and an error is returned. The limit applies to the compressed message size, so the actual (uncompressed) message can be larger than the configured value.
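For example, to allow compressed messages of up to 10 MB (an illustrative value; consumer fetch sizes must also be large enough to read such messages):

# 10 MB
message.max.bytes=10485760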

The broker cluster

Clusters can be used to balance load across servers and provide high availability by avoiding data loss caused by single points of failure through replication.

The number of the broker

To determine the number of brokers required, first consider the amount of data: if the whole cluster must retain 10 TB of data and a single broker can hold 2 TB, at least 5 brokers are needed. If data replication is enabled, at least twice the space is required, that is, at least 10 brokers.

Consider also the ability of the cluster to handle requests. By increasing brokers, you can increase the maximum number of requests per second and address performance issues such as low disk throughput or low memory.

Broker cluster configuration

To add a broker to a cluster, only two configuration parameters need to be modified.

First, modify zookeeper.connect so that all brokers use the same ZooKeeper address to store metadata.

Second, set broker.id. Each broker's id must be unique within the cluster; otherwise the broker cannot start.
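As a minimal sketch, two brokers joining the same cluster would share zookeeper.connect but differ in broker.id (the hostnames are hypothetical):

# server.properties on broker 1
broker.id=1
zookeeper.connect=zk1.example.com:2181/kafka

# server.properties on broker 2
broker.id=2
zookeeper.connect=zk1.example.com:2181/kafka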

References

Apache Kafka official documentation

Kafka: The Definitive Guide