1. What are the benefits of combining SparkStreaming with Kafka?

1) decoupling

2) buffering

2. Common scenarios of message queues

1) Decoupling between systems

2) Peak pressure buffer

3) Asynchronous communication

3. The architecture of Kafka

4. Kafka’s message storage and producer-consumer model

2) Messages within a partition are strictly ordered, and each message has a sequence number called an offset

3) Each partition is managed by exactly one broker, and a broker can manage multiple partitions

4) Messages are written directly to files on disk rather than held in memory

5) Messages are deleted according to a time-based retention policy (one week by default), not after they are consumed

6) The producer decides which partition each message is written to, using either round-robin load balancing or a hash-based partitioning strategy

7) Each consumer keeps track of the offset it has consumed up to. Every consumer belongs to a group, and within a group consumption follows the queue model

8) Consumers in the same group consume different partitions, so a message is consumed only once per group. Groups consume independently and do not affect each other
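
The two partition-selection strategies from point 6 can be sketched in a few lines of shell. This is a simplified model, not Kafka's actual code: the function names `pick_round_robin` and `pick_by_hash` are made up for illustration, and `cksum` stands in for the hash function that the real client uses (the Java client hashes the key bytes with murmur2).

```shell
#!/usr/bin/env bash
# Simplified model of producer-side partition selection (illustration only).

NUM_PARTITIONS=3   # same partition count as topic t0426 in these notes
rr_counter=0       # state for the round-robin strategy

# Round-robin: each call moves on to the next partition.
# The result is returned in the global PARTITION so that rr_counter
# survives between calls (a command substitution would run in a subshell).
pick_round_robin() {
  PARTITION=$(( rr_counter % NUM_PARTITIONS ))
  rr_counter=$(( rr_counter + 1 ))
}

# Hash-based: the same key always maps to the same partition.
# cksum (a CRC) is only a stand-in for Kafka's own hash function.
pick_by_hash() {
  local key="$1" hash
  hash=$(printf '%s' "$key" | cksum | cut -d' ' -f1)
  PARTITION=$(( hash % NUM_PARTITIONS ))
}

# Four keyless messages go to partitions 0, 1, 2, 0.
for i in 1 2 3 4; do
  pick_round_robin
  echo "message $i -> partition $PARTITION"
done

# A keyed message lands on the same partition every time.
pick_by_hash "user-42"
echo "key user-42 -> partition $PARTITION"
```

Because a given key always lands on the same partition, and each partition is ordered, all messages for one key are consumed in the order they were produced.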

5. The characteristics of Kafka

  • Messaging system features: producer-consumer model, FIFO within a partition
  • High performance: a single node supports thousands of clients with throughput of 100 MB/s
  • Persistence: messages are persisted directly to ordinary disks with good performance
  • Distributed: replicated data for redundancy, traffic load balancing, and scalability
  • Very flexible: message persistence plus client-maintained consumption state

6. Compare Kafka with other message queues

  • RabbitMQ: Distributed, supports multiple MQ protocols, heavyweight
  • ActiveMQ: similar to RabbitMQ
  • ZeroMQ: Provided as a library, complex to use, no persistence
  • Redis: Single machine, purely in-memory with good performance, poor persistence
  • Kafka: Distributed, long time persistence, high performance, lightweight and flexible

7. Commands

1. Create a topic

cd /usr/local/kafka/bin
./kafka-topics.sh --zookeeper ht-1:2181,ht-2:2181,ht-3:2181 --create --topic t0426 --partitions 3 --replication-factor 3

2. List all topics

./kafka-topics.sh --zookeeper ht-1:2181,ht-2:2181,ht-3:2181 --list

3. Console as producer

./kafka-console-producer.sh --topic t0426 --broker-list ht-1:9092,ht-2:9092,ht-3:9092

4. Console as consumer

./kafka-console-consumer.sh --bootstrap-server ht-1:9092 --topic t0426

5. View the description of the topic

./kafka-topics.sh --zookeeper ht-1:2181,ht-2:2181,ht-3:2181 --describe --topic t0426

6. View the latest offset of each partition of a topic

./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list ht-1:9092 --topic t0427 --time -1
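
GetOffsetShell prints one line per partition in the form topic:partition:offset, and with --time -1 the offset is the latest one in that partition. A small helper can add these up. The function name `sum_offsets` and the sample numbers below are made up for illustration. The sum equals the number of messages currently stored only if no old messages have been removed by the retention policy yet.

```shell
#!/usr/bin/env bash
# Sum the third colon-separated field (the offset) over all partitions.
sum_offsets() {
  awk -F: '{ total += $3 } END { print total + 0 }'
}

# Hypothetical sample output in the topic:partition:offset format.
printf '%s\n' 't0427:0:5' 't0427:1:7' 't0427:2:4' | sum_offsets   # prints 16

# Against a real cluster the command output would be piped in instead:
# ./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list ht-1:9092 \
#     --topic t0427 --time -1 | sum_offsets
```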

7. Delete a topic

./kafka-topics.sh --zookeeper ht-1:2181,ht-2:2181,ht-3:2181 --delete --topic t0426

8. Kafka’s leader balancing mechanism

When the leader of a partition fails, a new leader is chosen from the remaining replicas according to the preferred-replica principle, and that broker takes over the partition. After the original leader restarts, leadership of the partition returns to the original broker.
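
The behavior described above can be modeled with a short shell function. This is a deliberately simplified sketch: the function name `elect_leader` is made up for illustration, and the model ignores the in-sync replica set. In a real cluster, handing leadership back to the original broker also depends on the `auto.leader.rebalance.enable` broker setting and happens only after that replica has caught up again.

```shell
#!/usr/bin/env bash
# elect_leader REPLICAS ALIVE
#   REPLICAS: space-separated broker ids in assignment order;
#             the first id is the preferred replica.
#   ALIVE:    space-separated ids of brokers that are currently running.
# Prints the first replica that is alive, or "none" if all are down.
elect_leader() {
  local replicas="$1" alive="$2" r a
  for r in $replicas; do
    for a in $alive; do
      if [ "$r" = "$a" ]; then
        echo "$r"
        return 0
      fi
    done
  done
  echo "none"
  return 1
}

elect_leader "1 2 3" "1 2 3"   # all brokers up: prints 1 (preferred replica)
elect_leader "1 2 3" "2 3"     # broker 1 down: prints 2
elect_leader "1 2 3" "2 3 1"   # broker 1 back: prints 1 again
```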

9. Delete the topic

1. Run the delete-topic command. The topic is marked for deletion and is deleted after one week by default.

3. Go to ZooKeeper and delete the topic's metadata.

4. Delete the topic's marked-for-deletion entry from ZooKeeper.
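
For the ZooKeeper cleanup steps above, it helps to know which nodes hold a topic's state. The sketch below only prints the paths and deletes nothing. It assumes the usual node layout of a ZooKeeper-based Kafka cluster with no chroot prefix, and the function name `topic_zk_paths` is made up for illustration. Whether the broker removes a marked topic on its own depends on the `delete.topic.enable` broker setting.

```shell
#!/usr/bin/env bash
# Print the ZooKeeper nodes that hold state for a topic (dry run only).
topic_zk_paths() {
  local topic="$1"
  echo "/brokers/topics/${topic}"        # partition and replica assignment
  echo "/config/topics/${topic}"         # per-topic configuration overrides
  echo "/admin/delete_topics/${topic}"   # marker written by --delete
}

topic_zk_paths t0426
```

Each printed node can then be inspected or removed with the ZooKeeper command-line client, using its recursive delete command (`rmr` in older releases, `deleteall` in newer ones).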

10. SparkStreaming + Kafka Receiver mode

11. SparkStreaming + Kafka Direct mode