Design notes on a feed service project

Project background

Originally, to improve user retention, the product team designed a Weibo-like feed feature for the app. Functionally, our feed service is a blend of Weibo and WeChat Moments. It has Weibo's trending scenarios, and it also echoes WeChat Moments' circle of friends.

Feature list

Feed profile page

Similar to the album view in WeChat Moments, this page shows the posts a given user has published.

Feed news page

Similar to WeChat Moments itself, this page shows posts published by you and by the users you follow.

Feed square page

Similar to Weibo's recommended or trending feeds, this page provides personalized recommendations for each user.

The basic idea

Design considerations

1. Data storage

A feed post stores two main types of information: the post content itself, and indexes over different dimensions.

Feed data structure

Besides its basic content, a feed must carry additional information, and that additional information may need to be extended later. The feed data structure is therefore defined as follows. The serialized FeedInfo is stored in the database as a single value, so later extensions do not require changing database columns.

```protobuf
syntax = "proto3";

// User, AuditType and VisibleStatus are defined elsewhere.
message FeedItem {
  string feed_id = 1;     // post ID
  int64 create_time = 2;  // publish time
  User from_user = 3;     // publisher
  int32 feed_type = 4;    // corresponds to FeedType
  bytes attachment = 5;   // attachment information
  // ...
}

message FeedInfo {
  FeedItem feed_item = 1;
  bool deleted = 2;
  AuditType audit_type = 3;
  VisibleStatus visible_status = 4;  // post visibility
  // ...
}
```

Content store

FeedId => Feed content

Post content can be modeled as key-value storage. Almost every scenario fetches post content by post ID and then processes it further. We therefore chose HBase as the primary content store, with MySQL and Redis as the fallback, because this was our first time running HBase for an online service. The final performance data showed that HBase performed well.
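The read path above uses a primary key-value store with fallbacks. It can be sketched as a lookup that degrades to the next store when one is unavailable. This is a minimal sketch under stated assumptions. `KVStore`, `DictStore` and `get_feed_content` are hypothetical names. In-memory dictionaries stand in for the real HBase, MySQL and Redis clients. The fallback order is also an assumption.

```python
from typing import Optional, Protocol, Sequence


class KVStore(Protocol):
    """Hypothetical minimal interface shared by the primary store and fallbacks."""

    def get(self, feed_id: str) -> Optional[bytes]: ...


class DictStore:
    """In-memory stand-in for HBase, MySQL or Redis."""

    def __init__(self, data=None, healthy: bool = True):
        self.data = dict(data or {})
        self.healthy = healthy

    def get(self, feed_id: str) -> Optional[bytes]:
        if not self.healthy:
            raise ConnectionError("store unavailable")
        return self.data.get(feed_id)


def get_feed_content(feed_id: str, stores: Sequence[KVStore]) -> Optional[bytes]:
    """Try each store in order (primary first); fall back only on errors."""
    last_error = None
    for store in stores:
        try:
            return store.get(feed_id)  # a miss (None) is a valid answer, not an outage
        except ConnectionError as err:
            last_error = err  # store is down: degrade to the next one
    raise RuntimeError("all content stores unavailable") from last_error
```

A cache miss returns `None` and does not trigger the fallback. Only an unavailable store does, so the fallback stores are not hit on ordinary misses.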

Indexed data store

Index relationships for the other dimensions are stored in MySQL, since they may require complex queries.

Profile page feed

Similar to the album view in WeChat Moments, this table stores every post a user has published and is sharded by user. Its basic columns are:

Column	Description
user_id	Publisher ID
feed_id	Post ID
feed_create_time	Post creation time
visible_status	Post visibility
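The article does not state how rows are routed to user shards. The following is a minimal sketch that assumes modulo routing on a numeric `user_id`. It also assumes a hypothetical shard count of 16 and a hypothetical table prefix `feed_profile`. Routing by user keeps all of one user's posts in a single table, so a profile page is read from one shard.

```python
SHARD_COUNT = 16  # hypothetical; the article does not give the number of shards


def profile_table_for(user_id: int, prefix: str = "feed_profile") -> str:
    """Route all of a user's rows to the same shard table."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    return f"{prefix}_{user_id % SHARD_COUNT:02d}"


def insert_profile_row_sql(user_id: int) -> str:
    """Parameterized INSERT using the columns listed in the table above."""
    table = profile_table_for(user_id)
    return (
        f"INSERT INTO {table} (user_id, feed_id, feed_create_time, visible_status) "
        "VALUES (%s, %s, %s, %s)"
    )
```

The news feed table is also sharded by user, so the same routing could apply to it with a different prefix.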

News feed

Similar to WeChat Moments itself, this table stores the posts of everyone a user follows plus the user's own posts, and is sharded by user. Its basic columns are:

Column	Description
user_id	ID of the user who owns this news feed
feed_id	Post ID
from_user_id	Publisher ID
feed_create_time	Post creation time
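Reading the news page then has two steps. First, page through the user's index rows, newest first. Second, batch-fetch the post content by ID. The sketch below is minimal and hedged. An in-memory SQLite table named `news_feed` (a hypothetical name) stands in for the MySQL shard, and a dict stands in for the content store. The cursor-on-`feed_create_time` pagination is an assumption, not the article's stated method.

```python
import sqlite3
from typing import Dict, List, Optional, Tuple


def fetch_news_page(
    conn: sqlite3.Connection,
    user_id: int,
    content: Dict[str, str],
    before: Optional[int] = None,
    limit: int = 10,
) -> Tuple[List[str], Optional[int]]:
    """Return one page of a user's news feed, newest first, plus the next cursor."""
    cursor_time = before if before is not None else 2**62
    rows = conn.execute(
        "SELECT feed_id, feed_create_time FROM news_feed "
        "WHERE user_id = ? AND feed_create_time < ? "
        "ORDER BY feed_create_time DESC LIMIT ?",
        (user_id, cursor_time, limit),
    ).fetchall()
    # Second step: batch-fetch content by post ID (a multi-get against the KV store).
    posts = [content[feed_id] for feed_id, _ in rows if feed_id in content]
    # A full page means there may be more; the last timestamp becomes the cursor.
    next_cursor = rows[-1][1] if len(rows) == limit else None
    return posts, next_cursor
```

If two posts can share a `feed_create_time`, a production version would also need `feed_id` as a tiebreaker in both the ordering and the cursor.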

Square recommendation feed

The trending recommendation feature, like Weibo's, needs to store all users' posts along a timeline, so the square feed is sharded by date. A month has at most 31 days, so there are 31 tables, and each day's posts go into that day's table. The columns are basically the same as the news feed's, but the data stored and its dimensions differ.

Column	Description
user_id	User ID
feed_id	Post ID
from_user_id	Publisher ID
feed_create_time	Post creation time

Sharding by day of month was chosen for two reasons. First, based on the currently estimated posting volume, 31 tables are sufficient and no finer split is needed. Second, sharding by day keeps data largely contiguous, which matches typical query patterns.
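The day-of-month routing can be sketched as below. The table prefix `feed_square` is hypothetical. The sketch shows how a query walking backwards along the timeline crosses tables, including month boundaries. Tables are reused every month, so the sketch assumes that rows older than a month are either purged or filtered out by `feed_create_time`. The article does not say which.

```python
from datetime import date, timedelta
from typing import List


def square_table_for(day: date, prefix: str = "feed_square") -> str:
    """Map a date to one of 31 day-of-month tables (feed_square_01 .. feed_square_31)."""
    return f"{prefix}_{day.day:02d}"


def square_tables_to_scan(start: date, days: int) -> List[str]:
    """Tables to read, newest first, when paging backwards from `start`."""
    return [square_table_for(start - timedelta(days=i)) for i in range(days)]
```

For example, scanning three days back from 2024-03-02 visits the tables for the 2nd, the 1st, and then February 29.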

2. Data diffusion

Diffusion modes

There are two ways to distribute data: write diffusion (fan-out on write) and read diffusion (fan-out on read). With read diffusion, the fan-out happens at read time, so a single read request may trigger many reads and its latency is unpredictable. Write diffusion stores multiple copies of the data. That redundancy costs storage, but it greatly improves read efficiency. Write diffusion is therefore the better choice when disk and other hardware resources are not scarce.
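To make the trade-off concrete, here is a minimal in-memory comparison of the two modes. The class names and the `(create_time, feed_id)` tuple layout are illustrative assumptions. Write diffusion does one write per recipient and one read per page. Read diffusion does one write per post but one read per followee on every page load.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Post = Tuple[int, str]  # (create_time, feed_id)


class FanOutOnWrite:
    """Write diffusion: copy each post into every follower's inbox at publish time."""

    def __init__(self, followers: Dict[int, Set[int]]):
        self.followers = followers  # author -> users who follow the author
        self.inbox: Dict[int, List[Post]] = defaultdict(list)

    def publish(self, author: int, post: Post) -> None:
        for uid in {author} | self.followers.get(author, set()):
            self.inbox[uid].append(post)  # one write per recipient

    def read(self, user: int, n: int) -> List[Post]:
        return sorted(self.inbox[user], reverse=True)[:n]  # a single inbox read


class FanOutOnRead:
    """Read diffusion: store each post once and merge followees' posts at read time."""

    def __init__(self, followees: Dict[int, Set[int]]):
        self.followees = followees  # user -> users they follow
        self.outbox: Dict[int, List[Post]] = defaultdict(list)

    def publish(self, author: int, post: Post) -> None:
        self.outbox[author].append(post)  # a single write

    def read(self, user: int, n: int) -> List[Post]:
        merged: List[Post] = []
        for uid in {user} | self.followees.get(user, set()):
            merged.extend(self.outbox[uid])  # one read per followee
        return sorted(merged, reverse=True)[:n]
```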

The data flow

In the feed service, a newly published post first appears on the publisher's own profile page and news page, so the publisher's experience is guaranteed first. The post is then diffused to the publisher's followers and to the square page for exposure. Users barely notice any delay in these later steps, so that write diffusion is handled asynchronously through a message queue. The basic processing flow is as follows:
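The flow can be sketched in code. The publisher's own pages are written synchronously, and diffusion to followers and the square is left to a queue consumer. In this minimal sketch, Python's `queue.Queue` stands in for the real message queue. `FeedService`, `PublishEvent` and `consume_pending` are hypothetical names. The consumer is drained inline so the behavior is easy to follow; a real deployment would run it as a separate consumer process.

```python
import queue
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class PublishEvent:
    author: int
    feed_id: str


class FeedService:
    def __init__(self, followers: Dict[int, Set[int]]):
        self.followers = followers  # author -> followers
        self.profile: Dict[int, List[str]] = defaultdict(list)
        self.news: Dict[int, List[str]] = defaultdict(list)
        self.square: List[str] = []
        self.mq: "queue.Queue[PublishEvent]" = queue.Queue()  # stands in for the MQ

    def publish(self, author: int, feed_id: str) -> None:
        # Synchronous part: the publisher sees the post immediately.
        self.profile[author].append(feed_id)
        self.news[author].append(feed_id)
        # Asynchronous part: hand write diffusion off to the queue.
        self.mq.put(PublishEvent(author, feed_id))

    def consume_pending(self) -> int:
        """Drain queued events, diffusing each post to followers and the square."""
        handled = 0
        while True:
            try:
                event = self.mq.get_nowait()
            except queue.Empty:
                return handled
            for fan in self.followers.get(event.author, set()):
                self.news[fan].append(event.feed_id)
            self.square.append(event.feed_id)
            handled += 1
```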

3. Audit mechanism

Published posts must be audited so that violating content is not pushed to users. There are generally two audit mechanisms: review first (review, then publish) and publish first (publish, then review).

Review first

A post must be approved before it is distributed to other users. The post can therefore be intercepted after it is written to the publisher's profile page and news page, but before it is written to the message queue for diffusion. Only after approval is it written to the message queue for write diffusion, so neither the publisher nor other users notice the interception. For the publisher, however, other users' interactions with the post may be delayed until it is approved.
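Review-first gating can be sketched as a small state machine. The post lands on the publisher's own pages immediately, but it enters the diffusion queue only when an approval arrives. The `AuditStatus` values and the `ReviewFirstPublisher` class are hypothetical. The article's `AuditType` enum values are not given.

```python
import queue
from enum import Enum
from typing import Dict, List


class AuditStatus(Enum):  # hypothetical values; the article's AuditType is not shown
    PENDING = 0
    APPROVED = 1
    REJECTED = 2


class ReviewFirstPublisher:
    def __init__(self):
        self.own_pages: List[str] = []  # publisher's profile and news pages
        self.status: Dict[str, AuditStatus] = {}
        self.diffusion_queue: "queue.Queue[str]" = queue.Queue()

    def publish(self, feed_id: str) -> None:
        self.own_pages.append(feed_id)  # the publisher sees it immediately
        self.status[feed_id] = AuditStatus.PENDING  # held back from diffusion

    def on_audit_result(self, feed_id: str, approved: bool) -> None:
        if self.status.get(feed_id) is not AuditStatus.PENDING:
            return  # ignore unknown or already-decided posts
        if approved:
            self.status[feed_id] = AuditStatus.APPROVED
            self.diffusion_queue.put(feed_id)  # only now does write diffusion start
        else:
            self.status[feed_id] = AuditStatus.REJECTED
```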

Publish first

With publish first, a post is diffused and exposed immediately, and it is deleted if it later fails review. The user experience is better, but the platform takes on more risk.
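For publish first, one way to delete a post that fails review is a soft delete. The flag is flipped once, and readers filter the post out instead of every inbox copy being purged. This sketch assumes the `deleted` field from `FeedInfo` serves as that flag. The article only says that the post is deleted. `PublishFirstFeed` and `FeedInfoRecord` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FeedInfoRecord:
    """Mirrors the two FeedInfo fields used here: the post ID and the deleted flag."""
    feed_id: str
    deleted: bool = False


class PublishFirstFeed:
    def __init__(self):
        self.records: Dict[str, FeedInfoRecord] = {}
        self.inboxes: Dict[int, List[str]] = {}

    def publish(self, feed_id: str, recipients: List[int]) -> None:
        self.records[feed_id] = FeedInfoRecord(feed_id)
        for uid in recipients:  # diffused before any review
            self.inboxes.setdefault(uid, []).append(feed_id)

    def on_audit_rejected(self, feed_id: str) -> None:
        # Soft delete: one flag flip instead of purging every inbox copy.
        self.records[feed_id].deleted = True

    def read(self, uid: int) -> List[str]:
        return [f for f in self.inboxes.get(uid, []) if not self.records[f].deleted]
```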

4. Like design

Per-user likes list

A user's likes are stored in the database and cached in Redis. Liking behavior is unpredictable, and some users like posts very frequently, so a Redis zset caches only part of each user's likes. The member is the feedId, the score is also the feedId, and entries are ordered by feedId in descending order. Only the likes on the N newest feeds (the N largest feedIds) are retained.

Why isn't the user's likes list sorted by like time? A user may like a feed at any time, including a feed published long ago. If the list were sorted by like time, a like missing from the cache would be ambiguous: there would be no way to tell whether the cache simply lacks that entry or the user never liked the feed.

When the list is sorted by feedId, the check becomes decidable. If a feedId is absent from the cached list and is greater than the smallest feedId in it, the user definitely did not like that feed, and no database query is needed. Only when the feedId is smaller than every cached feedId does the database have to be consulted.
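The membership check can be sketched with a sorted Python list standing in for the per-user zset. `LikeCache` is a hypothetical name. The sketch assumes that feedIds are numeric and increase with creation time, which the feedId ordering argument implies even though `FeedItem.feed_id` is declared as a string. It also assumes that the cache was initially filled from the database with the user's N newest liked feeds. `None` means "the cache cannot decide, query the database".

```python
import bisect
from typing import List, Optional


class LikeCache:
    """Simulates a per-user zset where member = score = feedId, keeping the N largest."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.ids: List[int] = []  # ascending; ids[0] is the smallest cached feedId

    def add(self, feed_id: int) -> None:
        i = bisect.bisect_left(self.ids, feed_id)
        if i < len(self.ids) and self.ids[i] == feed_id:
            return  # already cached
        self.ids.insert(i, feed_id)
        if len(self.ids) > self.capacity:
            self.ids.pop(0)  # evict the smallest feedId, keeping the N largest

    def has_liked(self, feed_id: int) -> Optional[bool]:
        """True/False when the cache can decide; None means fall back to the database."""
        i = bisect.bisect_left(self.ids, feed_id)
        if i < len(self.ids) and self.ids[i] == feed_id:
            return True
        if self.ids and feed_id > self.ids[0]:
            return False  # inside the cached range but absent: definitely not liked
        return None  # older than everything cached (or empty cache): ask the database
```

In Redis itself, the same shape maps onto real commands. `ZADD key feedId feedId` adds a like, `ZREMRANGEBYRANK key 0 -(N+1)` trims to the N largest, `ZSCORE` checks membership, and `ZRANGE key 0 0` reads the smallest cached feedId.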

Per-feed like count

In our scenario, the per-feed like count is important data. Initially we maintained the count with a Redis counter, but counts could be lost or double-counted, so accuracy could not be guaranteed. Because likes are not very frequent, we switched to reading the count from the database and then syncing it to the Redis cache.
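The following sketch shows the "database is the source of truth" approach. Each like is persisted idempotently, the feed's likes are recounted, and the result overwrites the cached value. In-memory SQLite stands in for MySQL, and a dict stands in for Redis. The `feed_like` table and its unique key are assumptions. `INSERT OR IGNORE` is SQLite syntax; MySQL's equivalent is `INSERT IGNORE`.

```python
import sqlite3
from typing import Dict


def init_schema(conn: sqlite3.Connection) -> None:
    # The UNIQUE constraint turns a repeated like into a no-op instead of a double count.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feed_like ("
        " feed_id TEXT NOT NULL, user_id INTEGER NOT NULL,"
        " UNIQUE (feed_id, user_id))"
    )


def record_like(conn: sqlite3.Connection, cache: Dict[str, int],
                feed_id: str, user_id: int) -> int:
    """Persist the like, recount from the database, then sync the cache."""
    with conn:  # commit the insert as one transaction
        conn.execute(
            "INSERT OR IGNORE INTO feed_like (feed_id, user_id) VALUES (?, ?)",
            (feed_id, user_id),
        )
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM feed_like WHERE feed_id = ?", (feed_id,)
    ).fetchone()
    cache[feed_id] = count  # overwrite, never increment: the cache mirrors the database
    return count
```

Overwriting the cache with a recounted value means a lost or duplicated cache update corrects itself on the next like. This is the property the incremental counter lacked.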

Splitting the like flow

The design prioritizes the user experience. The user-dimension data is stored and the request returns immediately, while the like event is written to the message queue. Consumers then maintain the feed-dimension data asynchronously, including updating the feed's like count and notifying the feed's publisher. Splitting the flow by importance lets the user experience come first.

5. Trade-offs

Fallback strategy for the square recommendation page

The square recommendation feed is backed by a recommendation service that returns feeds likely to interest each user, based on that user's features. When the recommendation service is unavailable, a fallback strategy is needed so users can still pull feeds without a degraded experience. The fallback keeps the latest N feeds posted by all users in a Redis zset and returns that list directly whenever the recommendation service is unavailable.
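The fallback can be sketched with a bounded `deque` standing in for the Redis zset of the latest N feeds. `SquareFeed` is a hypothetical name, and the `recommend(user_id, n)` callable is an assumed interface for the recommendation service. The sketch treats only connection errors and timeouts as "unavailable". Which failures should trigger the fallback is a design choice the article does not specify.

```python
from collections import deque
from typing import Callable, Deque, List


class SquareFeed:
    def __init__(self, recommend: Callable[[int, int], List[str]], keep: int = 500):
        self.recommend = recommend  # hypothetical recommendation-service client
        self.latest: Deque[str] = deque(maxlen=keep)  # stands in for the Redis zset

    def on_new_post(self, feed_id: str) -> None:
        # Newest first; once full, the deque silently drops the oldest entry.
        self.latest.appendleft(feed_id)

    def pull(self, user_id: int, n: int) -> List[str]:
        try:
            return self.recommend(user_id, n)
        except (ConnectionError, TimeoutError):
            # Recommendation service unavailable: serve the global latest-N list.
            return list(self.latest)[:n]
```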

Latest N feeds of a newly followed user

When a user follows someone, that person's feeds should appear on the follower's news page. For timeliness, we insert only the followed user's latest N feeds into the follower's news feed, rather than that user's entire history. This does not hurt the user experience, and it keeps the processing simple.
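The follow-time backfill can be sketched as picking the followed user's N newest posts and merging them into the follower's inbox. The `(create_time, feed_id)` tuple layout and the `on_follow` name are assumptions for illustration. The merge deduplicates in case some posts are already present.

```python
import heapq
from typing import List, Tuple

Post = Tuple[int, str]  # (create_time, feed_id)


def on_follow(inbox: List[Post], followee_posts: List[Post], n: int) -> List[Post]:
    """Merge only the followed user's latest n posts into the follower's inbox."""
    latest = heapq.nlargest(n, followee_posts)  # newest n by create_time
    merged = set(inbox) | set(latest)  # dedupe posts already in the inbox
    return sorted(merged, reverse=True)  # keep the inbox newest first
```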

Conclusion

This was our first time designing and implementing a feed service, and the design covers the main scenarios. We made many small optimizations along the way and learned a lot, and there is still room for further optimization.