This article was originally published by Xu Ning on Tencent Lecture Hall under the title "How do programmers push the content you care about to your eyes? Uncovering the system design behind feed recommendation", and has been revised for this version.

1. Introduction

Feed flow (hereinafter referred to as the "Feed stream") is almost everywhere in our mobile apps, especially in social/community products; the most familiar examples are WeChat Moments, Sina Weibo, and so on.

A simple definition of a Feed stream: as long as you keep swiping your thumb down the screen of your phone, new information keeps appearing. It is like feeding an animal: as soon as the feed is gone, more is added. Hence the name "Feed".

Most products with Feed streams feature two types of Feed streams:

  • 1) One is based on algorithms: personalized algorithmic recommendation, such as Jinri Toutiao and Douyin short videos;
  • 2) One is based on follow relationships: social/friend relationships, such as WeChat and Zhihu.

For example, Weibo and Zhihu in the picture below both have "Follow" and "Recommend" tabs in the top bar:

The technologies behind these two kinds of Feed streams are quite different. Unlike the "Recommend" tab, where a personalized algorithm means every user sees a different feed, the content on the "Follow" tab is usually ordered by fixed rules. The most common rule is to sort along the timeline: the posts, updates, and moods of the people I follow are shown by publish time from newest to oldest.

This article focuses on the technical implementation of the "Follow" feed: it first summarizes the common back-end implementation schemes for timeline-based Feed streams, and then, in a concrete business scenario, shows how to flexibly apply these basic design ideas to actual requirements.

2. The author of this article

**Xu Ning:** application development engineer at Tencent, lecturer at Tencent College, graduated from Shanghai Jiao Tong University. Currently, I am in charge of back-end development for Tencent's smart retail business, and I have rich experience building video live-streaming and marketing automation systems.

3. Feed stream implementation scheme 1: read diffusion

Read diffusion, also known as "pull mode," is probably the most intuitive implementation.

The principle is as follows:

See the picture above: every publisher has their own outbox ("my posts"), and each new post is simply placed in it. When a reader comes to read, the system first gets the list of everyone the reader follows, then goes through each of those publishers' outboxes, takes out their posts, and sorts them by publish time before showing them to the reader.

With this design, one Feed read by a reader diffuses in the back end into N reads (N being the number of people the reader follows) plus one aggregation, hence "read diffusion". Each read of the Feed is equivalent to actively pulling posts from the outboxes of the people being followed, hence the name "pull mode".
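To make the pull mode concrete, here is a minimal Python sketch, assuming hypothetical in-memory stores `outbox` and `following` (a real system would keep these in a database or cache):

```python
import heapq
import itertools
from typing import Dict, List, Tuple

# Hypothetical in-memory stores; a real system would use a database or cache.
outbox: Dict[str, List[Tuple[int, str]]] = {}   # publisher -> [(publish_ts, post)], newest first
following: Dict[str, List[str]] = {}            # reader -> [publisher, ...]

def publish(publisher: str, ts: int, post: str) -> None:
    # Pull mode: publishing only touches the publisher's own outbox.
    outbox.setdefault(publisher, []).insert(0, (ts, post))

def read_feed(reader: str, limit: int = 10) -> List[Tuple[int, str]]:
    # Reading is heavy: visit every followed publisher's outbox (N reads)
    # and merge the posts by publish time (one aggregation).
    merged = heapq.merge(
        *(outbox.get(p, []) for p in following.get(reader, [])),
        key=lambda item: item[0],
        reverse=True,                            # newest first
    )
    return list(itertools.islice(merged, limit))
```

For example, `following["alice"] = ["bob", "carol"]` means Alice's feed is assembled on the fly from Bob's and Carol's outboxes at read time.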

This pattern:

  • 1) Benefits: the underlying storage is simple and there is no wasted space;
  • 2) Disadvantages: each read operation is very heavy and involves a large number of operations.

Imagine that I follow a large number of people: the overhead of traversing everyone I follow and then aggregating their posts would be so large that the latency would be unbearable.

Thus, read diffusion is mainly suitable for systems where readers follow relatively few people and do not refresh the Feed stream frequently.

Pull mode has another big disadvantage: it is inconvenient for paging. When we scroll through Weibo or Moments, our thumb keeps pulling page after page from the back end. Without other optimizations, pure real-time aggregation makes scrolling toward the bottom of the list very cumbersome.

4. Feed stream implementation scheme 2: write diffusion

According to statistics, most Feed streams have a read/write ratio of around 100:1: most of the time you are scrolling through the Feed stream reading other people's moments and posts, and only rarely posting one yourself.

Therefore: The heavy read logic of read diffusion is not suitable for most scenarios.

We would rather make the posting process more complicated than hurt the user's experience of reading the Feed stream, so a slight tweak of the previous scheme gives us write diffusion. Write diffusion, also known as "push mode," fixes some of the shortcomings of pull mode.

The principle is as follows:

As the picture above shows, each user in the system has their own inbox as well as an outbox. When a publisher publishes a post, in addition to recording it in their own outbox, the system also traverses all of the publisher's fans and puts a copy of the same content into each fan's inbox. This way, when readers read the Feed stream, they can read directly from their own inbox.

With this design, every published post diffuses into M writes (M being the number of fans), hence "write diffusion". Each post is actively pushed into the inboxes of all fans, hence the name "push mode".

As you can imagine, a single post now triggers many write operations behind the scenes. Usually, for the poster's experience, success is returned as soon as the post is written to the outbox; a separate asynchronous task in the background then delivers the post to the fans' inboxes at its leisure.
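A matching sketch of the push mode, again with hypothetical in-memory stores (`outbox`, `inbox`, `followers`); here the fan-out is a plain function call standing in for the asynchronous background task described above:

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stores, as in the pull-mode sketch.
outbox: Dict[str, List[Tuple[int, str]]] = {}     # publisher -> posts, newest first
inbox: Dict[str, List[Tuple[int, str]]] = {}      # reader    -> posts, newest first
followers: Dict[str, List[str]] = {}              # publisher -> [fan, ...]

def publish(publisher: str, ts: int, post: str) -> None:
    # Write to the publisher's own outbox and report success immediately;
    # the fan-out below would normally run as an asynchronous background task.
    outbox.setdefault(publisher, []).insert(0, (ts, post))
    fan_out(publisher, ts, post)

def fan_out(publisher: str, ts: int, post: str) -> None:
    # M writes, where M is the number of fans.
    for fan in followers.get(publisher, []):
        inbox.setdefault(fan, []).insert(0, (ts, post))

def read_feed(reader: str, limit: int = 10) -> List[Tuple[int, str]]:
    # Reading is now a single cheap lookup in the reader's own inbox.
    return inbox.get(reader, [])[:limit]
```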

The benefit of write diffusion is that it trades data redundancy (M stored copies of each post) for a better read experience. A proper amount of redundancy is normally fine, but for celebrity Weibo accounts it does not work at all. For example, Xie Na and He Jiong, the top two accounts on Weibo by follower count, each have over 100 million fans.

Imagine this: if we used pure push mode, every time Xie Na or He Jiong posted a Weibo, the Weibo back end would shake. A single post would trigger hundreds of millions of write operations in the background, which is obviously not feasible.

In addition, because write diffusion is asynchronous, if the writes are too slow, a post may have been published long ago yet some fans still cannot see it, which is a poor experience.

Write diffusion usually applies when the friend count is small; WeChat Moments, for example, uses write diffusion. Each WeChat user can have at most 5,000 friends, so one post fans out into at most 5,000 writes. As long as the asynchronous task performs reasonably well, this is not a problem.

Technical articles about WeChat Moments:

  • 1) "WeChat Moments Massive-Scale Technology PPT [attachment download]";
  • 2) "Technical Challenges and Practice Summary behind the 100 Billion Page Views of WeChat Moments";
  • 3) "The Way of Architecture: 3 Programmers Support 1 Billion Posts per Day in WeChat Moments [with video]".

5. Feed stream implementation scheme 3: read-write mixed mode

The read-write mix, also known as a “push-pull combination,” has the advantages of both read diffusion and write diffusion.

Let's first summarize the advantages and disadvantages of read diffusion and write diffusion:

See above: carefully comparing the advantages and disadvantages of read diffusion and write diffusion, it is not hard to see that their application scenarios are complementary.

Therefore, when designing the back-end storage, if we can distinguish the scenarios, choose the most suitable scheme for each one, and adjust the strategy dynamically, we arrive at the read-write mixed mode.

The principle is as follows:

Take Weibo for example: when someone with a huge number of followers, such as He Jiong, posts, the post is written to He Jiong's own outbox, and only his more active fans (a filter that screens out most of them) are selected to have the post written into their inboxes. When an ordinary user with few fans posts, plain write diffusion is used: all of their fans are traversed and the post is written into each fan's inbox.
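A sketch of the publish path in this hybrid mode, with hypothetical names: a `BIG_V_FANS` follower threshold decides between pushing to all fans and pushing only to active fans, and `is_active` stands in for whatever activity criterion the business chooses:

```python
import time
from typing import Dict, List, Tuple

outbox: Dict[str, List[Tuple[int, str]]] = {}
inbox: Dict[str, List[Tuple[int, str]]] = {}
followers: Dict[str, List[str]] = {}
last_login: Dict[str, int] = {}                 # fan -> last login timestamp

BIG_V_FANS = 1_000_000                          # hypothetical "big V" threshold
ACTIVE_WINDOW = 7 * 24 * 3600                   # hypothetical activity window: 7 days

def is_active(user: str) -> bool:
    # Placeholder criterion; the article suggests judging by last login time.
    return time.time() - last_login.get(user, 0) < ACTIVE_WINDOW

def publish(publisher: str, ts: int, post: str) -> None:
    outbox.setdefault(publisher, []).insert(0, (ts, post))
    fans = followers.get(publisher, [])
    if len(fans) >= BIG_V_FANS:
        # Big V: push only to active fans; everyone else pulls from the outbox later.
        targets = [f for f in fans if is_active(f)]
    else:
        # Ordinary user: plain write diffusion to all fans.
        targets = fans
    for fan in targets:
        inbox.setdefault(fan, []).insert(0, (ts, post))
```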

When an active user opens the Feed stream, they read posts directly from their inbox, which keeps the experience fast for active users.

When an inactive user suddenly logs in to the Feed stream:

  • 1) On the one hand, the system needs to read their inbox;
  • 2) On the other hand, it needs to traverse the outboxes of the big V users they follow, extract the posts, and aggregate them for display.

After this display, the system still needs a task to decide whether this user should be upgraded back to an active user. A sketch of this read path follows.
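The read-path sketch below reuses the hypothetical stores and threshold from the publish-path sketch above: active users read only their inbox, while an inactive user also pulls and merges the outboxes of the big Vs they follow:

```python
import heapq
import itertools
from typing import Dict, List, Tuple

following: Dict[str, List[str]] = {}            # reader -> [publisher, ...]

def read_feed(reader: str, limit: int = 10) -> List[Tuple[int, str]]:
    streams = [inbox.get(reader, [])]
    if not is_active(reader):
        # Inactive user: big-V posts were never pushed into this inbox,
        # so pull them from the big Vs' outboxes and aggregate.
        streams += [
            outbox.get(p, [])
            for p in following.get(reader, [])
            if len(followers.get(p, [])) >= BIG_V_FANS
        ]
    merged = heapq.merge(*streams, key=lambda item: item[0], reverse=True)
    return list(itertools.islice(merged, limit))
```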

Because read diffusion is still involved, even in mixed mode the number of people each reader can follow should be capped. For example, Sina Weibo limits each account to following at most 2,000 people.

Without an upper limit, imagine a user who follows every account on Weibo and then opens their follow feed, effectively reading all of Weibo's posts. With read diffusion the system would inevitably collapse (and even with write diffusion, their inbox could not hold that many posts).

In mixed read-write mode, the system needs to make two judgments:

  • 1) Which users count as big Vs: the number of fans can be used as the criterion;
  • 2) Which fans count as active: the criterion can be the most recent login time, etc.

These two criteria need to be identified and adjusted dynamically as the system evolves; there is no fixed formula.

As you can see, the read-write mixed mode combines the advantages of both modes and is, in that sense, the best scheme.

However, its disadvantage is that the mechanism is very complex and gives the programmers a lot of trouble. At the beginning of a project, when there are only one or two developers and the user base is small, it is usually unwise to jump straight to this hybrid mode, because it is prone to bugs. Only when the project grows to the scale of Sina Weibo, with a dedicated team working on the Feed stream, does the read-write hybrid mode become necessary.

6. Paging problems in Feed streams

The previous sections described several common design schemes for timeline-based Feed streams, but practice tends to be much messier than theory.

Next, I'll focus on one pain point in Feed stream design: the paging of Feed streams.

Whether read diffusion or write diffusion is used, a Feed stream is essentially a dynamic list whose contents change over time. Traditional front-end pagination uses the parameters page_size and page_num, indicating the number of entries per page and the current page number respectively.

The problem with a dynamic list is the following:

As shown in the figure above: the first page is read at time T1, and at time T2 someone publishes "Content 11". If the second page is pulled at time T3, the offset is off by one and "Content 6" appears on both the first and second pages. In fact, whenever content is added or removed between page pulls, this kind of misalignment occurs.

To solve this problem, Feed streams typically do not use page_size and page_num as pagination parameters; instead, a last_id records the ID of the last piece of content on the previous page. When the front end requests the next page, it passes last_id as input; the back end locates the entry corresponding to last_id, takes the next page_size entries, and returns them, thereby avoiding the misalignment problem.
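A minimal sketch of last_id cursor pagination, assuming the feed is an ordered list of (content_id, content) pairs already sorted newest first:

```python
from typing import List, Optional, Tuple

FeedEntry = Tuple[int, str]                     # (content_id, content), list sorted newest first

def fetch_page(feed: List[FeedEntry], last_id: Optional[int],
               page_size: int) -> List[FeedEntry]:
    # First page: last_id is None, so start from the top of the list.
    start = 0
    if last_id is not None:
        # Locate the anchor entry from the previous page, then start right after it.
        for i, (cid, _) in enumerate(feed):
            if cid == last_id:
                start = i + 1
                break
    return feed[start:start + page_size]
```

The front end simply passes back the id of the last entry it received; even if new content has been inserted at the top in the meantime, the anchor entry stays in place, so no duplicates or gaps appear.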

The diagram below:

The last_id scheme has one important condition: the last_id entry itself cannot be hard-deleted.

Consider this:

  • 1) In the figure above, 5 entries are returned at time T1, and last_id = Content 6;
  • 2) At time T2, Content 6 is deleted by its publisher;
  • 3) When the second page is requested at T3, the data for last_id can no longer be found, so the page offset cannot be determined.

For deletion scenarios, we usually use soft deletes: a flag bit is set on the content to indicate that it has been deleted.

Since deleted content should not be returned to the front end, under soft deletion the back end still locates last_id and takes the next page_size entries, but if some of them are flagged as deleted, there will not be enough data to return to the front end.

One solution is to keep scanning until enough entries are found. Another is to agree with the front end that fewer than page_size entries may be returned, treating page_size as only a suggested value; once both sides agree on this, the page_size parameter can even be dropped.
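Extending the earlier pagination sketch to soft deletes: the anchor last_id remains findable even when flagged as deleted, and a hypothetical `keep_searching` switch chooses between the two options above (scan until the page is full, or accept a shorter page):

```python
from typing import Dict, List, Optional

def fetch_page_soft_delete(feed: List[Dict], last_id: Optional[int],
                           page_size: int, keep_searching: bool = True) -> List[Dict]:
    # Feed entries look like {"id": 6, "content": "...", "deleted": False}, newest first.
    start = 0
    if last_id is not None:
        for i, entry in enumerate(feed):
            if entry["id"] == last_id:          # still findable because deletes are soft
                start = i + 1
                break
    # Option 1: keep scanning until page_size visible entries are collected.
    # Option 2: look at only the next page_size raw entries and accept a shorter page.
    window = feed[start:] if keep_searching else feed[start:start + page_size]
    visible = [e for e in window if not e.get("deleted")]
    return visible[:page_size]
```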

7. Practice of the Feed stream schemes in a live-streaming application

7.1 Requirement Background

In this section, I'll share a very specific Feed stream design that I encountered in a real business scenario.

XX Live is a livestream e-commerce tool. A streamer can schedule a live broadcast in advance and start selling goods once it goes on air; after the broadcast ends, the streamer's fans can watch the replay.

Thus each broadcast has three states: preview (the broadcast has been created but not yet started), live, and replay.

As a viewer, I can follow multiple streamers, and from a fan’s point of view, there will also be a Feed page of live events.

What makes this Feed stream special is its sorting rules:

To explain the sorting rules:

  • 1) Among all the streamers I follow: broadcasts that are currently live rank first; those in preview come in the middle; replays come last;
  • 2) When multiple broadcasts are live: sort by actual start time from latest to earliest;
  • 3) When multiple broadcasts are in preview: sort by scheduled start time from earliest to latest;
  • 4) When multiple broadcasts are in replay: sort by end time from latest to earliest.

7.2 Problem Analysis

The most complex aspect of this requirement is the "state" factor in the Feed stream content: a state transition directly changes the Feed stream order.

To clarify the impact on sorting, let’s look at the following figure:

The picture above shows 5 broadcasts from 4 streamers. As a viewer, when I open the page at time T1, broadcast No. 3, which is live, is at the top, and the rest are in the preview state, shown from earliest to latest by scheduled start time. When I open the page at time T2, broadcast No. 5 is at the top, three others are in the middle in the preview state, and broadcast No. 3 has ended, so it is ranked last. And so on, until all broadcasts are over and everything ends up in the replay state.

One thing to note: if I open the first page at time T1, stare at it until time T4, and then scroll down to the second page, the last_id of the previous page, which serves as the paging offset, may by then have moved somewhere else because of state changes, causing serious misalignment (the first page reflects the broadcast states at T1, while the second page reflects them at T4).

7.3 Solutions

The live-broadcast system has a one-way relationship chain, similar to Weibo: each viewer follows a small number of streamers, while a streamer may have a huge number of followers.

Pure write diffusion is almost impossible here because of the state changes.

This is because, with write diffusion, every state change caused by a streamer creating, starting, or ending a broadcast would diffuse into a huge number of write operations, which is not only operationally complicated but also unacceptable in latency.

Weibo can use write diffusion because once a post is published, it undergoes no further state changes that would affect its sort order.

In our scenario, "preview" and "live" are intermediate states, while "replay" is the final destination of every broadcast; once a broadcast enters the replay state, it never changes again. Therefore, the "live" and "preview" states use read diffusion, while the "replay" state uses write diffusion.

The final scheme is shown in the figure below:

As shown above, the three events that affect a broadcast's state (creating a broadcast, starting a broadcast, and ending a broadcast) are all handled asynchronously by listening to a message queue.

We maintain, for each streamer, a priority queue holding their broadcasts in the live and preview states:

  • 1) Whenever a streamer is observed creating a broadcast, the broadcast is added to the queue, with the score set to the negative of its scheduled start timestamp;
  • 2) Whenever a streamer goes live, the broadcast's score in the queue is changed to its actual start timestamp (positive);
  • 3) Whenever a streamer ends a broadcast, the broadcast leaves the queue and its information is asynchronously delivered to the replay queue of each fan.

Here's a trick: as mentioned above, live broadcasts are sorted by start time from latest to earliest, while previews are sorted by scheduled start time from earliest to latest. If preview scores are set to the negative of the scheduled start time, then sorting scores in descending order also yields earliest-to-latest for previews. This transformation lets live and preview broadcasts share one queue: preview scores are all negative and live scores are all positive, so after aggregation all live broadcasts naturally rank ahead of all previews.
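A sketch of this scoring trick, using a plain Python list per streamer where a production system would more likely use a Redis sorted set; the event handlers mirror the three listed events, and all names are illustrative:

```python
from typing import Dict, List, Tuple

# One "priority queue" per streamer, holding only live + preview broadcasts.
# Entries are (score, broadcast_id); a higher score ranks earlier in the feed.
queues: Dict[str, List[Tuple[float, str]]] = {}

def on_create(streamer: str, broadcast: str, scheduled_ts: float) -> None:
    # Preview: score is the negative scheduled start time, so an earlier
    # scheduled time gives a larger (less negative) score.
    queues.setdefault(streamer, []).append((-scheduled_ts, broadcast))

def on_start(streamer: str, broadcast: str, start_ts: float) -> None:
    # Live: replace the score with the positive actual start time, so later
    # starts rank higher and every live score beats every preview score.
    q = queues.setdefault(streamer, [])
    q[:] = [(s, b) for s, b in q if b != broadcast] + [(start_ts, broadcast)]

def on_end(streamer: str, broadcast: str) -> None:
    # Replay is final: drop it from the live/preview queue; a background task
    # would then write-diffuse it into every fan's replay inbox (not shown).
    q = queues.setdefault(streamer, [])
    q[:] = [(s, b) for s, b in q if b != broadcast]

def live_preview_order(followed: List[str]) -> List[str]:
    # Aggregate the queues of all followed streamers and sort by score descending:
    # live broadcasts (latest start first), then previews (earliest scheduled first).
    entries = [e for s in followed for e in queues.get(s, [])]
    return [b for _, b in sorted(entries, reverse=True)]
```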

In addition, there is the other problem mentioned above: the first page is pulled at time T1 and the second page at time T4, making the two pages inconsistent.

The solution is to take a snapshot: when a viewer pulls the first page of the Feed stream, we snapshot all broadcasts currently in the live and preview states, identified by a session_id. Every subsequent page pull simply reads from this snapshot. Once the snapshot has been read through, the viewer has seen all live and preview broadcasts, and the remaining content is filled in from the replay queue.

As a result, the front end pulls our Feed stream with four parameters:

Whenever session_id and last_id are both left blank, the user wants the first page and the snapshot needs to be rebuilt.
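A sketch of the pull interface under these rules, with hypothetical parameter names (`viewer`, `session_id`, `last_id`, `page_size`) and a placeholder `build_snapshot` helper: blank session_id and last_id trigger a new snapshot, later pages walk the snapshot by last_id, and once the snapshot runs out the replay inbox fills the rest:

```python
import uuid
from typing import Dict, List, Optional

snapshots: Dict[str, List[str]] = {}            # session_id -> broadcast ids in feed order
replay_inbox: Dict[str, List[str]] = {}         # viewer     -> replay broadcast ids, newest first

def build_snapshot(viewer: str) -> List[str]:
    # Placeholder: in the real system this aggregates the live/preview queues of
    # all streamers the viewer follows (see the scoring sketch above).
    return []

def pull_feed(viewer: str, session_id: Optional[str],
              last_id: Optional[str], page_size: int = 10) -> Dict:
    if not session_id and not last_id:
        # First page: freeze the live + preview ordering as of "now".
        session_id = str(uuid.uuid4())
        snapshots[session_id] = build_snapshot(viewer)

    snapshot = snapshots.get(session_id, [])
    replays = replay_inbox.get(viewer, [])

    if last_id in replays:
        # Already past the snapshot: keep paging through the replay inbox.
        r_start = replays.index(last_id) + 1
        page = replays[r_start:r_start + page_size]
    else:
        start = snapshot.index(last_id) + 1 if last_id in snapshot else 0
        page = snapshot[start:start + page_size]
        if len(page) < page_size:
            # Snapshot exhausted: top up with replays that were write-diffused earlier.
            page += replays[:page_size - len(page)]

    return {"session_id": session_id, "items": page}
```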

This raises a follow-up question: what should the session_id be?

The answer is:

  • 1) In the simplest case, the user id itself can be used as the session_id;
  • 2) If multi-device login is allowed, the session_id must also contain per-device information so that snapshots from different devices do not interfere with each other;
  • 3) If memory is not a concern, a random string can also be used as the session_id, with a long enough expiration time so the snapshot expires naturally.

In this design, the most computationally expensive moment is pulling the first page, when the snapshot is built.

According to current online data, for viewers who follow fewer than 10 streamers (the vast majority of cases), first-page QPS can reach 15,000. Including requests from the second page onward, the overall QPS the Feed stream can sustain is even higher, which is more than enough for the current user base. If, when pulling the first page, we returned just the first 10 items immediately and built the snapshot asynchronously, the QPS could be even higher; this may be a future optimization point.

8. Summary of this article

Almost all Feed streams based on timelines and follow relationships stick to three basic design patterns:

  • 1) Read diffusion;
  • 2) Write diffusion;
  • 3) Read-write mix.

In actual business, more complex scenarios may arise, such as the ones in this article:

  • 1) State transitions affecting the ordering;
  • 2) In Weibo and WeChat Moments, factors such as ads, special follows, and hot topics may also affect the ordering of the Feed stream.

These scenarios can only be handled case by case according to business requirements. (This article was published simultaneously at www.52im.net/thread-3675…)