1. Introducing MQ

Reference: Introduction to MQ message queues and their advantages and disadvantages

Advantages:

  • Decoupling: reduces coupling between systems
  • Asynchrony: speeds up system response, because the caller does not wait for downstream work
  • Peak shaving: absorbs traffic bursts, reducing the instantaneous load on downstream systems
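The asynchrony and peak-shaving benefits can be illustrated with an in-process queue standing in for MQ. This is a minimal sketch, not a real broker: a burst of requests is enqueued and the producer returns almost immediately, while one consumer drains the queue at its own steady pace. The names `handle_request` and `consumer` are hypothetical.

```python
import queue
import threading
import time

# An in-process queue stands in for the MQ broker (illustration only).
mq: "queue.Queue[int]" = queue.Queue()
processed: list[int] = []

def handle_request(order_id: int) -> None:
    """Producer side: enqueue and return at once instead of doing the slow work inline."""
    mq.put(order_id)

def consumer(stop: threading.Event) -> None:
    """Consumer side: drain the queue at a rate the downstream system can sustain."""
    while not (stop.is_set() and mq.empty()):
        try:
            order_id = mq.get(timeout=0.05)
        except queue.Empty:
            continue
        time.sleep(0.001)  # simulated slow downstream work, e.g. a DB write
        processed.append(order_id)
        mq.task_done()

stop = threading.Event()
worker = threading.Thread(target=consumer, args=(stop,))
worker.start()

start = time.perf_counter()
for i in range(100):  # a burst of 100 requests arrives at once
    handle_request(i)
enqueue_ms = (time.perf_counter() - start) * 1000  # producer is not slowed by the work

mq.join()   # wait until the consumer has worked through the burst
stop.set()
worker.join()
print(f"enqueued 100 requests in {enqueue_ms:.2f} ms; processed {len(processed)}")
```

The queue absorbs the burst, so the downstream system only ever handles one request at a time, regardless of how fast requests arrive.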

1.1 Points to consider when introducing MQ

  • MQ itself must provide the following (section 2 explains how Kafka's design achieves them):
    • High performance
    • High availability
    • Message reliability
  • The producing and consuming systems must handle the following problems (see section 3):
    • Message loss
    • Repeated consumption
    • Message ordering

2. How Kafka's design addresses these MQ requirements

2.1 The AKF scale cube

The AKF scale cube improves a system's availability and performance along three dimensions, eliminating single points of failure and performance bottlenecks:

  • X axis: Eliminate single points of failure and ensure high availability by running multiple copies
    • MySQL primary/replica replication
    • Stateless microservice clusters scaled horizontally
  • Y axis: Split by service to reduce module complexity
    • The system is split by business into a user system, an order system and a messaging system.
    • The user database stores user information, the order database stores users' orders, and the message database stores user messages.
  • Z axis: Shard the data
    • The system is split by region, like the regional servers of an online game.
    • MySQL is split by user ID, so different users are stored in different user databases. Sharding is needed when splitting by service is not enough: for example, when there are so many users that a single user database runs out of performance or capacity, the users must be sharded.
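Z-axis splitting by user ID comes down to a routing function that maps each user to exactly one shard. A minimal sketch, assuming four shards and plain modulo routing; the shard names are hypothetical. Modulo routing moves most users when the shard count changes, so real systems often use consistent hashing or a lookup table instead.

```python
# Hypothetical shard list: one user database per shard.
USER_SHARDS = ["user_db_0", "user_db_1", "user_db_2", "user_db_3"]

def shard_for_user(user_id: int) -> str:
    """Route a user to a shard by user ID (simple modulo routing)."""
    return USER_SHARDS[user_id % len(USER_SHARDS)]

print(shard_for_user(1001))  # user_db_1
print(shard_for_user(1004))  # user_db_0
```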

2.2 Kafka's design

2.2.1 High-performance design

  • Kafka supports cluster deployment, spreading load across multiple brokers.
  • Kafka supports topics (Y-axis split): different business data is written to different topics, so data is split by business.
  • Each topic can have multiple partitions (Z-axis split), so a topic's data is sharded across partitions.
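A toy in-memory model of a topic makes the partition layout concrete: each partition is an append-only log with its own offsets, and a message key decides which partition a message lands in. This is a sketch under stated assumptions, not Kafka's implementation; the `Topic` class is hypothetical, and `zlib.crc32` is a stand-in for Kafka's partitioner (the Java client hashes keys with murmur2).

```python
import zlib

class Topic:
    """Toy in-memory topic: each partition is an append-only list of messages."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: list[list[str]] = [[] for _ in range(num_partitions)]

    def append(self, key: str, value: str) -> tuple[int, int]:
        """Append a message and return the (partition, offset) it landed at."""
        # crc32 is a stand-in hash chosen for illustration only.
        partition = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[partition].append(value)
        offset = len(self.partitions[partition]) - 1  # offsets are per partition, starting at 0
        return partition, offset

# Each business gets its own topic (Y axis); each topic is sharded into partitions (Z axis).
orders = Topic("orders", num_partitions=3)
for user in ["alice", "bob", "carol", "dave"]:
    print(user, orders.append(user, f"order from {user}"))
print([len(p) for p in orders.partitions])
```

Because partitions are independent logs, different brokers can serve different partitions of the same topic in parallel.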

2.2.2 High-availability design

  • Kafka supports cluster deployment.
  • Each partition has a leader replica and follower replicas on different brokers (X-axis split). When a leader fails, a new leader is elected from the in-sync followers; the election is coordinated through ZooKeeper (newer Kafka versions can use KRaft instead).
  • Each topic partition can have multiple replicas (X-axis split), and Kafka keeps the leader and follower replicas consistent. By default, all reads and writes go through the leader replica.
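The leader failover described above can be sketched as a selection rule. This is a simplified model, not Kafka's code: when the leader dies, pick the first replica, in assignment order, that is both alive and still in the in-sync replica set (ISR). `elect_leader` is a hypothetical helper. If no in-sync replica is alive, the partition stays unavailable unless unclean leader election (`unclean.leader.election.enable`) is turned on, which trades possible data loss for availability.

```python
from typing import Optional

def elect_leader(assigned_replicas: list[int], isr: set[int],
                 live_brokers: set[int]) -> Optional[int]:
    """Pick the first replica, in assignment order, that is alive and in sync."""
    for broker_id in assigned_replicas:
        if broker_id in isr and broker_id in live_brokers:
            return broker_id
    return None  # no in-sync replica is alive: the partition stays offline

replicas = [1, 2, 3]  # broker 1 is the current leader
isr = {1, 2, 3}
print(elect_leader(replicas, isr, live_brokers={2, 3}))  # 2
```

Restricting the choice to the ISR is what keeps failover from losing messages that were already acknowledged as replicated.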

2.2.3 Message reliability design

  • Kafka persists partition data to disk.
  • Partitions are replicated: each partition can have multiple replicas, so data is not lost when one broker fails, as long as another in-sync replica holds it.
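A simplified model of when a replicated message counts as committed: the leader tracks each in-sync replica's log end offset, and the high watermark is the smallest of them. Consumers only see messages below the high watermark, so a message becomes readable once every in-sync replica has it. This sketch omits Kafka's fetch protocol; `high_watermark` is a hypothetical helper.

```python
def high_watermark(log_end_offsets: dict[int, int], isr: set[int]) -> int:
    """Exclusive offset up to which every in-sync replica has the data."""
    return min(log_end_offsets[broker] for broker in isr)

# Leader (broker 1) holds 10 messages; the followers have replicated 8 and 9 of them.
leo = {1: 10, 2: 8, 3: 9}
hw = high_watermark(leo, isr={1, 2, 3})
print(hw)  # 8: offsets 0..7 are committed; offsets 8 and 9 are not yet visible to consumers
```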

3. Problems the producing and consuming systems must solve

  • Message loss
  • Repeated consumption
  • Message ordering

3.1 The producer side

3.2 The consumer side

  • The consumer can process messages in order only if the producer writes them to MQ in order and all messages of the same kind (for example, for the same user) go to the same partition. Otherwise the messages are already out of order before the consumer reads them.
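Using the user ID as the message key is the usual way to send one user's messages to one partition. A minimal sketch, assuming three partitions and a stand-in hash (`zlib.crc32`, not Kafka's partitioner); `partition_for` is hypothetical. Because each partition is read in order, each user's events come out in the order they were produced, even though events of different users may interleave differently.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3

def partition_for(key: str) -> int:
    """Map a message key to a partition (stand-in hash for illustration)."""
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

# Events produced in order, keyed by user ID.
events = [("u1", "create"), ("u2", "create"), ("u1", "pay"), ("u2", "pay"), ("u1", "ship")]

partitions: dict[int, list[tuple[str, str]]] = defaultdict(list)
for user, action in events:
    partitions[partition_for(user)].append((user, action))

# Reading each partition front to back preserves every user's own event order.
for p, msgs in sorted(partitions.items()):
    print(p, msgs)
```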

Suppose that after reading a message from MQ, the service performs three steps:

  • Query Redis: 20 ms
  • Write to the DB: 20 ms
  • Commit the offset: 20 ms

3.2.1 Single-threaded synchronous commit

  • Each message takes 60 ms (20 + 20 + 20), because the three steps run synchronously on one thread, so one thread handles at most about 16 messages per second.
  • If the offset is committed first and a later step fails, the message is lost.
  • If the offset is committed last, after the other steps have completed, and the commit fails, the message is consumed again.
  • Messages within a partition are consumed strictly in order.
  • One thread processes one message at a time and the committed offset only increases. The drawback is cost: every message triggers its own DB write and offset commit, and a single thread leaves CPU and network bandwidth underused.
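The two commit orders above can be compared with a small simulation. A minimal sketch under stated assumptions: the partition is a Python list, a restart resumes from the last committed offset, and a crash (or failed step) is modeled as returning early at offset `crash_at`. `run_consumer` is a hypothetical helper, not a Kafka API.

```python
from typing import Optional

def run_consumer(log: list[str], committed: int, commit_first: bool,
                 crash_at: Optional[int], handled: list[str]) -> int:
    """Process messages one at a time from `committed`; return the committed offset."""
    offset = committed
    while offset < len(log):
        msg = log[offset]
        if commit_first:
            committed = offset + 1      # commit the offset before doing the work
            if offset == crash_at:
                return committed        # crash before the work: message lost
            handled.append(msg)         # query Redis + write DB
        else:
            handled.append(msg)         # query Redis + write DB
            if offset == crash_at:
                return committed        # crash before the commit: message redelivered
            committed = offset + 1      # commit the offset after the work
        offset += 1
    return committed

log = ["m0", "m1", "m2"]

# Commit last: crash after processing m1 but before committing it, then restart.
handled: list[str] = []
c = run_consumer(log, 0, commit_first=False, crash_at=1, handled=handled)
run_consumer(log, c, commit_first=False, crash_at=None, handled=handled)
print(handled)  # ['m0', 'm1', 'm1', 'm2'] -> m1 consumed twice

# Commit first: crash after committing m1 but before processing it, then restart.
handled = []
c = run_consumer(log, 0, commit_first=True, crash_at=1, handled=handled)
run_consumer(log, c, commit_first=True, crash_at=None, handled=handled)
print(handled)  # ['m0', 'm2'] -> m1 lost
```

Committing last turns failures into duplicates rather than losses, which is why it is usually paired with idempotent processing.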

3.2.2 Multi-threaded synchronous commit

3.2.3 Asynchronous commit

  • With asynchronous commits, the commit thread runs independently of the message-processing thread and commits the offset at fixed intervals. This can cause both repeated consumption and message loss, so the approach is not recommended and is not analyzed in detail here.
    • Repeated consumption: a message has been processed but its offset has not yet been committed when the program crashes, so the message is consumed again after a restart.
    • Message loss: the offset has already been committed but the message has not yet been processed when the program crashes.
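Both failure modes follow from comparing two independent positions at the moment of a crash: how far processing got, and what the commit thread last committed. A minimal sketch, assuming a restart resumes from the last committed offset; `crash_outcome` is a hypothetical helper.

```python
def crash_outcome(processed_upto: int, committed: int) -> tuple[range, range]:
    """Return (reprocessed, lost) offset ranges after a crash.

    processed_upto: offsets [0, processed_upto) were fully processed.
    committed: the last offset the commit thread committed; a restart resumes here.
    """
    reprocessed = range(committed, processed_upto)  # done but not committed: consumed again
    lost = range(processed_upto, committed)         # committed but never done: skipped
    return reprocessed, lost

# The commit thread last committed 5, but processing had reached 8: offsets 5, 6, 7 repeat.
print(list(crash_outcome(processed_upto=8, committed=5)[0]))   # [5, 6, 7]
# The commit thread committed 10, but processing only reached 8: offsets 8, 9 are lost.
print(list(crash_outcome(processed_upto=8, committed=10)[1]))  # [8, 9]
```

Since the commit thread's timing is independent of processing, either case can occur depending on when the crash happens.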