The service consumes log data, on the order of billions of records per month, validates each record, and writes the valid ones into the database. The main problems encountered were:

  • Database I/O becomes a performance bottleneck
  • Outbound traffic from the cache is too high
  • Data consumption falls behind

Solution steps

1. Write logs to the database one by one

Write each correctly processed log into the database individually, at the rate it is consumed.

  • Advantages: No delay; as long as consumption is fast enough, data reaches the database quickly.
  • Disadvantages: I/O consumption is relatively high, and I/O becomes the bottleneck.

2. Call the official bulk-import SDK (failed)

  • Advantages: The import logic is written by the vendor, so we only need to call its interface; the import rate is guaranteed and I/O consumption is low.
  • Disadvantages: The vendor's online documentation does not match the database version we use, and no such API is available for the newer version we run.

3. Accumulate a batch in memory, then write it to the database

Accumulate records in memory, for example a thousand of them, before writing them to the database in one batch.

  • Advantages: I/O is reduced, and the write rate is still guaranteed.
  • Disadvantages: If the system restarts, the data held in memory is lost, up to 999 records per table. If the database write rate cannot keep up with the consumption rate, data keeps accumulating in memory until memory fills up.
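The in-memory batching described in this step can be sketched roughly as follows. This is a minimal illustration, not the original implementation: it uses Python's built-in sqlite3 module as a stand-in for the real database, and the `BatchWriter` class, the `logs` table, and its columns are hypothetical names chosen for the example.

```python
import sqlite3


class BatchWriter:
    """Buffer rows in memory and write them to the database in batches."""

    def __init__(self, conn, batch_size=1000):
        self.conn = conn
        self.batch_size = batch_size
        self.pending = []  # rows that exist only in memory until flushed

    def add(self, row):
        self.pending.append(row)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        # One transaction and one executemany call per batch,
        # instead of one round trip per row.
        with self.conn:
            self.conn.executemany(
                "INSERT INTO logs (level, message) VALUES (?, ?)", self.pending
            )
        self.pending.clear()


# sqlite3 in-memory database used purely as a stand-in (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (level TEXT, message TEXT)")

writer = BatchWriter(conn, batch_size=1000)
for i in range(2500):
    writer.add(("INFO", f"event {i}"))

stored = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
print(stored, len(writer.pending))  # 2000 rows stored, 500 still only in memory
```

The 500 rows left in `pending` at the end are exactly the data that would be lost if the process restarted at that moment.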

4. Use Redis as the cache before writing to the database

Move the in-memory buffer into Redis: use a Redis List, accumulate 1000 items, then take all of them out and write them to the database.

  • Advantages: Data is not lost on restart, and both the write rate and I/O are moderate.
  • Disadvantages: Redis outbound bandwidth consumption is too high, because too much data is fetched at once. There is also the risk of multiple service instances fetching the same thousand records simultaneously.
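A rough sketch of this buffering scheme, assuming `client` is a redis-py `redis.Redis` instance (it is passed in rather than created here, so running the sketch needs a reachable Redis server). The key name `logs:buffer` and both function names are hypothetical. The sketch deliberately keeps the non-atomic read-then-trim sequence, since that is where the duplicate-fetch risk comes from.

```python
BUFFER = "logs:buffer"  # hypothetical Redis key
BATCH_SIZE = 1000


def buffer_record(client, record):
    """Append one validated record to the tail of the Redis List."""
    client.rpush(BUFFER, record)


def drain_batch(client):
    """Return one full batch, or an empty list if fewer than BATCH_SIZE are buffered."""
    if client.llen(BUFFER) < BATCH_SIZE:
        return []
    # A single LRANGE pulls the whole batch over the network at once,
    # which is what drives up outbound bandwidth.
    batch = client.lrange(BUFFER, 0, BATCH_SIZE - 1)
    # Not atomic: another service instance that runs between LRANGE and
    # LTRIM reads the same records.
    client.ltrim(BUFFER, BATCH_SIZE, -1)
    return batch
```

With a default redis-py client the returned items are `bytes`; passing `decode_responses=True` when constructing the client yields `str` instead.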

5. Optimize the Redis outbound strategy

The way data enters Redis is unchanged, but records are now popped out one at a time and written to the database. The list acts as a message queue: push on the left, pop on the right.

  • Advantages: The same data is no longer consumed by multiple services at once, and Redis outbound bandwidth usage drops sharply. Because consumption proceeds at a uniform rate, peaks are smoothed out, and data does not keep piling up in memory until it overflows.
  • Disadvantages:
  1. Redis pops records one at a time, so a batch still has to be accumulated in memory before it is written to the database.
  2. That in-memory batch is still lost on restart, so a backup queue in Redis is needed: the records held in memory are saved to the backup queue at the same time. After a restart, the service first consumes the backup queue into the database, so that no data is lost.
  3. The pop rate cannot keep up, so the import rate never reaches its maximum.
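One way to sketch the "left in, right out" queue with a backup queue is Redis's RPOPLPUSH command, which the Redis documentation describes as a reliable-queue pattern. This is a substitute technique for illustration and may differ from what was actually built. `client` is assumed to be a redis-py `redis.Redis` instance, `store` is a hypothetical callable that writes one record to the database, and the key names are made up.

```python
QUEUE = "logs:queue"        # hypothetical key: main queue
BACKUP = "logs:processing"  # hypothetical key: backup queue


def produce(client, record):
    # Left in: push new records onto the head of the list.
    client.lpush(QUEUE, record)


def consume_one(client, store):
    """Pop one record, store it, then drop it from the backup queue.

    Returns False when the main queue is empty.
    """
    # Right out: atomically move the oldest record to the backup queue, so it
    # still exists in Redis if the process dies before the database write.
    record = client.rpoplpush(QUEUE, BACKUP)
    if record is None:
        return False
    store(record)  # if this raises, the record stays in BACKUP
    client.lrem(BACKUP, 1, record)
    return True


def recover(client, store):
    """After a restart, store whatever was left in the backup queue first."""
    leftovers = client.lrange(BACKUP, 0, -1)
    # RPOPLPUSH pushes onto the head, so the oldest leftover is at the tail.
    for record in reversed(leftovers):
        store(record)
        client.lrem(BACKUP, 1, record)
    return len(leftovers)
```

Because each pop is atomic, two service instances never receive the same record. RPOPLPUSH is marked deprecated since Redis 6.2 in favour of LMOVE, which behaves the same way for this purpose.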

6. Write the structured data back into the log service and consume it from there

The structured data is put into SLS; a task then consumes it and writes it to the database in batches.

  • Advantages: May perform better than the Redis push/pop approach. The validated logs can also be used directly by other service lines without re-validation.
  • Disadvantages: The log system ends up holding multiple copies of the data.

7. Final plan

  • The service is split into a producer and a consumer, continuing with approach 6.
  • While consuming, the consumer accumulates a batch in memory and also stores it in the Redis cache, then deletes it from Redis once the batch has been written to the database. If the service restarts or the write fails, it first writes the unstored data held in the Redis cache to the database, then resumes consuming normal data.
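The consumer side of this plan might look roughly like the sketch below. It is an assumption-laden illustration rather than the actual code: `client` is assumed to be a redis-py `redis.Redis` instance, `write_batch` is a hypothetical callable that writes a list of records to the database, the key `logs:staging` is made up, and a single consumer per staging key is assumed. Reading from SLS is left out.

```python
STAGING = "logs:staging"  # hypothetical Redis key, one per consumer


class StagedBatchConsumer:
    """Batch records in memory while mirroring them in Redis until stored."""

    def __init__(self, client, write_batch, batch_size=1000):
        self.client = client
        self.write_batch = write_batch  # hypothetical: writes a list to the DB
        self.batch_size = batch_size
        self.batch = []

    def recover(self):
        """On startup, store whatever a previous run staged but never wrote."""
        leftovers = self.client.lrange(STAGING, 0, -1)
        if leftovers:
            self.write_batch(leftovers)
            self.client.delete(STAGING)
        return len(leftovers)

    def handle(self, record):
        # Keep the record both in memory and in Redis until it is stored.
        self.batch.append(record)
        self.client.rpush(STAGING, record)
        if len(self.batch) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.batch:
            return
        # If this raises, the staged copy stays in Redis for a later retry.
        self.write_batch(self.batch)
        self.client.delete(STAGING)
        self.batch = []
```

A typical run would call `recover()` once at startup and then `handle()` for each consumed record. If the process dies after the database write but before the staged copy is deleted, recovery writes that batch a second time, so the database write would need to tolerate duplicates.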

8. Other options not implemented

  • Use RabbitMQ instead of the Redis message queue to increase throughput.
  • Have the log service deliver the validated data directly to the database; this only works for delivering fixed fields.

It took two weeks and five or six revisions, and I was running out of brain cells. This is just a simple record of the work, to refer back to when the same business logic comes up again.