background

Instant Messaging, also known as IM, can be applied more or less in many business scenarios, some of which are core and some of which are auxiliary.

Since it is chat, it is inevitable to produce chat records, and chat records with the increase of the number of people and the passage of time, it is easy to appear explosive growth, this is actually a great pressure on storage.

For example, we are all familiar with the example of a group chat, a few minutes without reading, then open 99+ unread messages.

The same applies to instant messaging in a medical setting.

When patients go to offline hospitals to see a doctor, they must have questions and answers with doctors. These questions and answers are actually chat records for the system.

If this scene is put online, it is normal to chat with us in wechat.

Said so many not, but also the background account, then the next look at the selection of the chat record storage.

Technology selection

If you’re going to store it, there are definitely a lot of options, relational databases, non-relational databases, etc.

Of course, this is largely tied to the specific business scenario, without the business scenario, it is basically empty talk.

The chat record in the doctor-patient relationship is a very core content, and must be kept for a long time, can not be lost, can be queried.

Moreover, these chat records are strongly associated with a certain consultation, so it is meaningless to take out several chat records separately, because they are not related and cannot be connected in series.

According to past experience, the majority of chats between doctors and patients are within 50, and the minority are more than 50.

This may not be the same as other IM scenarios.

MySQL

At the beginning of the business, the probability is to choose MySQL to store these data, single library single table, but in this case, it is easy to reach the level of single table tens of millions and hundreds of millions.

Behind the face of the basic is the operation of the table.

Subdatabase subtable, basically according to the consultation number to hash, and then put into different tables in different libraries.

There’s an element of uncertainty here, how many libraries and how many tables?

More points, less money; Points of less, once again to reach the weight of time to re-points, big fight, this time is the most afraid of moving the rules of the hash.

So choose MySQL, to the late will indeed have a little powerless. Linear extension is very important for this piece.

If you want to make small changes to avoid separate database and table, you may try TiDB, but the configuration it requires is not acceptable to small and medium-sized enterprises, and more than 80% of the probability will be passed.

Docs.pingcap.com/zh/tidb/sta…

Cassandra

Cassandra is a distributed, centrless, elastically scalable NoSQL database based on the distributed design of Amazon Dynamo and the Data model of Google Bigtable.

Cassandra.apache.org/doc/latest/…

Why Cassandra? As for the doctor-patient scenario mentioned above, strictly speaking, it is more written and less read. The query is based on the consultation number, which is relatively clear.

Discord had a blog post back in 2017 about how they were storing billions of message logs. It went into a bit more detail.

Blog.discord.com/how-discord…

In fact, they choose the database appeal, is also in line with most of the IM involved in this piece.

Lao Huang was also inspired to know this database from here.

A common comparison would be HBase, which is popular at home and popular abroad.

Can take a look at this contrast, know about the similarities and differences of the two: www.scnsoft.com/blog/cassan…

If Cassandra is chosen, the design of the data model must be the most important step in all the links. If this step is not well done, it will be a disaster level and basically cannot be fun.

So this conversation model in the doctor-patient relationship is actually quite simple.

CREATE TABLE IF NOT EXISTS messages(
    inq_id text,
    send_time bigint,
    sender_id text,
    sender_role tinyint,
    msg_type tinyint,
    msg_body text,
    PRIMARY KEY (inq_id, send_time)
) WITH CLUSTERING ORDER BY (send_time ASC)
Copy the code

Compared to a normal IM group chat, this inq_id can be thought of as a group chat, a channel.

Why isn’t there a field like message Id? Multi – source, non – research, no practical significance.

Let’s see how this works in C# using the CassandraCSharpDriver client provided by DataStax.

Writing:

var cluster = Cassandra.Cluster.Builder()
                        .AddContactPoints("127.0.0.1")
                        .WithDefaultKeyspace("messaging")
                        .Build();

var inqId = "xxxxxx";
var sendTime = DateTimeOffset.UtcNow.ToUnixTimeMilliseconds();
var senderId = "xxxx";
var senderRole = 1;
var msgType = 1;
var msgBody = "zzzz";

string INSERT_SQL = @" INSERT INTO messages(inq_id, send_time, sender_id, sender_role, msg_type, msg_body)
VALUES (?, ?, ?, ?, ?, ?) ";

var session = cluster.Connect();

var stmt = session.Prepare(INSERT_SQL).Bind(inqId, sendTime, senderId, senderRole, msgType, msgBody));

session.Execute(stmt);
Copy the code

Read:

var cluster = Cassandra.Cluster.Builder()
                .AddContactPoints("127.0.0.1")
                .WithDefaultKeyspace("messaging")
                .Build();

var inqId = "xxxxxx";

string GET_MSG_SQL = @" SELECT * FROM messages WHERE inq_id = ? ";

var session = cluster.Connect();

var stmt = session.Prepare(GET_MSG_SQL).Bind(inqId);

var rowset = session.Execute(stmt);

Console.WriteLine("Patient \t\t\t\ doctor \t");

foreach (var row in rowset)
{
    Parse the rows returned from Cassandra
    var msg_body = row.GetValue<string> ("msg_body");
    var sender_role = row.GetValue<sbyte> ("sender_role");

    if (sender_role == 0)
    {
        Console.WriteLine($"{msg_body}\t\t\t\t\t");
    }
    else
    {
        Console.WriteLine($"\t\t\t\t\t{msg_body}"); }}Copy the code

Write in the last

The choice of storage is actually a little tricky, according to different application scenarios, find out more suitable for the current scenario of several solutions, and then choose a cost is not so high.

Cassandra has some advantages for storing chat records, which can handle rapid data growth without too much adaptation in the business code layer. Relatively simple deployment, no special dependency, low operation and maintenance costs.

Follow my public account “Bucai Old Huang” and share with you what huang sees and hears in the first time.