Continuing to answer questions from readers in the community: how do you cache and push data that changes frequently, and how do you reduce the pressure on the database? I have not worked on this exact kind of business, so I will share my thinking based on my general architecture experience; I hope it gives you some useful ideas.

First, business abstraction

(1) There are many clients watching the market, assumed to be on the order of millions;

(2) The amount of data is not necessarily large; the number of listed stocks is assumed to be on the order of ten thousand;

(3) The write volume is relatively large; many transactions occur every second, assumed to be on the order of hundreds per second;

(4) The calculations are complex, including summation, grouping, sorting, and other operations.

Second, potential technical tradeoffs

How to choose the connection between client and server?

First, the order-book (market-data) client can greatly improve performance and reduce server pressure by maintaining a persistent (long) TCP connection to the server, rather than establishing and tearing down a short connection for every request.
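For concreteness, here is a minimal Python sketch of the client side of such a long connection; the host, port, and newline-delimited framing are assumptions for illustration, not part of the original design.

```python
import socket

# Hypothetical push endpoint; the address and framing are assumptions.
PUSH_HOST, PUSH_PORT = "push.example.com", 9000

def handle_push(message: str) -> None:
    print("received push:", message)

def run_long_connection() -> None:
    # Connect once, then keep reading pushed messages on the same socket,
    # instead of a connect/request/close cycle for every update.
    with socket.create_connection((PUSH_HOST, PUSH_PORT)) as conn:
        buffer = b""
        while True:
            chunk = conn.recv(4096)
            if not chunk:              # server closed the connection
                break
            buffer += chunk
            while b"\n" in buffer:     # assume newline-delimited messages
                line, buffer = buffer.split(b"\n", 1)
                handle_push(line.decode("utf-8"))
```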

How to meet the real-time requirements of the business?

The order-book service has strict real-time requirements. The server can push messages over the persistent TCP connection to keep information up to date.

Because the push volume is huge, pushing can be split out into an independent push cluster dedicated to this job. Once the push cluster is independent, push capacity can be scaled linearly by adding push servers.

As shown in the figure above, assume 1 million users receive real-time pushes:

  • Set up a dedicated push cluster that maintains the persistent TCP connections with clients and pushes in real time

  • Each push server maintains about 100,000 long connections, so 10 push servers can serve 1 million users

  • An MQ decouples the push cluster from the service cluster: the push cluster only pushes messages and performs no business-logic calculation, while the content of the pushed messages is computed by the service cluster (a minimal sketch follows this list)
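Here is a minimal sketch of that decoupling, using an in-process `queue.Queue` as a stand-in for the MQ (the post does not name a specific broker), and a dictionary of send callbacks as a stand-in for the long connections.

```python
import queue
import threading

# Stand-in for the MQ between the service cluster and the push cluster.
mq: "queue.Queue[str]" = queue.Queue()

# connection_id -> send callback; each push server would hold ~100k of these.
connections: dict = {}

def register(conn_id, send_fn) -> None:
    connections[conn_id] = send_fn

def push_worker() -> None:
    # The push server only forwards what it receives; all business
    # calculation already happened in the service cluster before enqueueing.
    while True:
        message = mq.get()
        for conn_id, send in list(connections.items()):
            try:
                send(message)
            except OSError:
                connections.pop(conn_id, None)   # drop broken connections

threading.Thread(target=push_worker, daemon=True).start()
```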

The biggest bottleneck of a push server is: how do you push a message to the 100,000 clients connected to it as fast as possible?

  • If the message volume is small, say one message every few seconds, you can push concurrently with multiple threads, for example 100 threads

_Voice-over:_ As the reader mentioned, if the volume is small, you can push each transaction as it happens.

  • If the message volume is large, for example hundreds of messages per second, you can buffer messages for one second and push them in batches

_Voice-over:_ As the reader mentioned, when the message volume is huge, batch pushing is a good approach; a minimal sketch follows.
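The sketch below illustrates the "buffer for one second, then push in batch" idea; the 1s window, the 100 worker threads, and the newline-joined payload are illustrative choices, not requirements from the post.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

pending: list = []                               # messages in the current window
lock = threading.Lock()
executor = ThreadPoolExecutor(max_workers=100)   # ~100 concurrent push workers

def enqueue(message: str) -> None:
    with lock:
        pending.append(message)

def flush_every_second(send_fns) -> None:
    # Hundreds of messages per second collapse into one payload per client
    # per second, instead of hundreds of separate pushes per client.
    while True:
        time.sleep(1)
        with lock:
            batch, pending[:] = list(pending), []
        if not batch:
            continue
        payload = "\n".join(batch)
        for send in send_fns:
            executor.submit(send, payload)
```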

How to meet the requirements for data volume, write volume, and scalability?

The number of stocks is small, and the volume of data is not the bottleneck.

The write volume of the transaction stream is on the order of hundreds per second, or even thousands per second; database write performance is not a bottleneck, and in theory a single database can handle it.

If writes per second reach the tens of thousands, horizontal sharding can be applied at the database level: transactions for different stocks are routed to different shards, which linearly increases the database's write capacity.

After horizontal sharding, data for the same stock stays in the same shard while different stocks may live in different shards, so in theory no cross-shard queries are needed (a routing sketch follows).
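A minimal routing sketch under these assumptions; the shard count, database naming, and use of CRC32 are illustrative choices rather than anything specified in the post.

```python
import zlib

NUM_SHARDS = 4  # illustrative shard count

def shard_for(stock_code: str) -> str:
    # crc32 is stable across processes (unlike Python's built-in hash()),
    # so every trade of the same stock always lands in the same shard.
    return f"trade_db_{zlib.crc32(stock_code.encode('utf-8')) % NUM_SHARDS}"

def write_trade(stock_code: str, trade: dict) -> None:
    db = shard_for(stock_code)
    # db_insert(db, trade)  # placeholder for the actual database write
    print(f"insert into {db}: {trade}")
```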

If writes per second reach hundreds of thousands or millions, you can additionally add an MQ to buffer requests, shaving peaks and filling valleys to protect the database; a sketch follows.
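A minimal peak-shaving sketch, again with an in-process queue standing in for the MQ; the per-second write ceiling is an illustrative number, not a measured limit.

```python
import queue
import threading
import time

write_buffer: "queue.Queue[dict]" = queue.Queue()  # stand-in for the MQ
MAX_DB_WRITES_PER_SECOND = 5000                    # illustrative DB write ceiling

def accept_trade(trade: dict) -> None:
    write_buffer.put(trade)      # bursts land in the queue, not on the database

def drain_to_db() -> None:
    # The database sees at most MAX_DB_WRITES_PER_SECOND writes per second,
    # regardless of how spiky the incoming traffic is.
    while True:
        start = time.time()
        for _ in range(MAX_DB_WRITES_PER_SECOND):
            try:
                trade = write_buffer.get_nowait()
            except queue.Empty:
                break
            # db_insert(trade)   # placeholder for the real write
        time.sleep(max(0.0, 1.0 - (time.time() - start)))

threading.Thread(target=drain_to_db, daemon=True).start()
```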

In any case, given the data volume and write volume of this business, a single database should be fine.

How to handle the complex business-logic calculations?

Running sum/group by/order by calculations on every read request is not feasible. The reader has already thought of caching as a way to reduce database load, but worries that "over time, this deviation will be amplified."

To address the concern that cache inconsistency will be amplified, you can do this:

  • Run an asynchronous thread that queries the database every second, performs the complex business-logic calculation, and writes the result into a high-availability cache

  • All read requests are decoupled from business-logic calculation and read results directly from the high-availability cache

As a result, the complex business logic is evaluated only once per second (a minimal sketch follows).
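Here is a minimal sketch of this "compute once per second, serve reads from cache" pattern; the dictionary stands in for a high-availability cache such as Redis, and the aggregation itself is a placeholder for the real sum/group by/order by query.

```python
import threading
import time

cache: dict = {}   # stand-in for a high-availability cache (e.g. Redis)

def recompute_board() -> dict:
    # Placeholder for the heavy aggregation against the trade database.
    return {"top_movers": [], "computed_at": time.time()}

def refresh_loop() -> None:
    # The expensive calculation runs once per second, independent of how
    # many read requests arrive.
    while True:
        cache["board"] = recompute_board()
        time.sleep(1)

def read_board() -> dict:
    # The read path does no business calculation; at worst the snapshot
    # is one second stale.
    return cache.get("board", {})

threading.Thread(target=refresh_loop, daemon=True).start()
```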

The cost is that data written to the database within that second is not reflected in the cache in real time; at worst, a user reads data that is one second old.

However, this is a design tradeoff between performance and consistency.

All of the schemes above assume a huge number of online users and a huge push volume, which is why a push-based scheme is adopted. Very often, engineers make assumptions that make the problem, and therefore the solution, far more complicated than it needs to be.

If the number of online users is small and the quote latency users can tolerate is relatively long (for example, 5s), a polling (pull) solution can be adopted:

(1) Drop the entire push cluster and MQ cluster;

(2) An asynchronous thread writes the order-book data to the high-availability cache once every second;

(3) Clients simply poll every 5s and pull the latest order-book data from the cache (a client-side sketch follows below).

Done!
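For completeness, here is a minimal sketch of the client side of the polling scheme; the endpoint URL, the 5s interval, and the timeout are hypothetical.

```python
import time
import urllib.request

# Hypothetical HTTP endpoint fronting the cache.
BOARD_URL = "http://quote.example.com/board"

def poll_forever(interval_seconds: int = 5) -> None:
    # No long connections, no push cluster, no MQ: each client just asks
    # the cache-backed endpoint for the latest snapshot every few seconds.
    while True:
        try:
            with urllib.request.urlopen(BOARD_URL, timeout=3) as resp:
                print("latest board:", resp.read().decode("utf-8"))
        except OSError as exc:
            print("poll failed:", exc)
        time.sleep(interval_seconds)
```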


Third, summary

  • Long connections perform many times better than short ones

  • If the push volume is large, the push cluster must be decoupled from the service cluster

  • Concurrent push and batch push are common optimization methods when push volume is large

  • When write volume is large, horizontal sharding provides scalability and MQ buffering protects the database

  • When the business logic is complex and the read volume is huge, adding a cache with periodic recalculation can greatly reduce database pressure

The thought process matters more than the conclusions; I hope you found this useful.

Questions are always welcome.

Discussion: if you have built order-book push systems, share your approach so we can all learn from each other.