Abstract: Feed streams are a very common feature and appear in many products. A Feed stream dynamically delivers content to subscribers in real time and is an effective way for users to consume information. In the age of big data, building a Feed stream system at the scale of tens of millions of users remains a challenge.
In the Internet field, and especially in the mobile Internet era, Feed stream products are everywhere. WeChat Moments and Weibo, which many of us use every day, are typical Feed stream products. Pinterest, Huaban, and other picture-sharing websites are another form of Feed stream product. In addition, many apps have a module called "Moments" or "Message Square"; these are Feed stream products too. It is fair to say that Feed stream features appear in apps all over the world.
Before we discuss how to design a Feed stream system, let's define a few concepts:
Feed: a single status or message in a Feed stream. For example, one status in Moments is a Feed, and one Weibo post is a Feed.
Feed stream: a continuously updated stream of content presented to the user. Everyone's Moments feed, Weibo follow page, and so on is a Feed stream.
Timeline: a type of Feed stream. Both Weibo and Moments are Timeline-type Feed streams. Because Timeline is the earliest, most widely used, and best-known type, the term is sometimes used loosely to mean the Feed stream itself.
Follow page Timeline: the page that displays Feed messages from other people, such as Moments or the Weibo home page.
Personal page Timeline: the page that displays the Feed messages the user has posted, such as the WeChat Moments album or a user's Weibo profile page.
Characteristics of Feed stream systems
Feed streaming systems have some very typical features, such as:
Multi-account content flow: a Feed stream system has many accounts, from thousands to hundreds of millions, and accounts can follow, unfollow, friend, and block one another. Any system with these properties can be designed as a Feed stream system.
Unstable account relationships: relationships between users change constantly through follows and unfollows, so the relationship graph is never static.
Read-heavy workload: reads and writes are severely imbalanced; the read/write ratio is usually at least 10:1 and can exceed 100:1.
High delivery-reliability requirements: messages must reach their audience. If you post to Moments and some friends see it while others do not, say your girlfriend misses it, serious interpersonal consequences may follow.
With these characteristics of Feed stream products in mind, let's look at how Feed stream systems are classified. There are many ways to categorize Feed streams, but the two most common types are:
Timeline: sorted by publication time, with the most recently published feeds at the top, as in WeChat Moments and Weibo. This is the most common form. By choosing the Timeline type, a product assumes that there are not too many feeds in each user's stream, but that every Feed matters and should be seen by the user.
Rank: sorted by a non-time factor, usually the user's predicted preference, with the feeds the user is most likely to enjoy at the top. The underlying assumption is that there are far more candidate feeds than the user has time to read, so the system selects the top N for the user. Typical scenarios include photo sharing, news recommendation, and product recommendation.
These two are the most typical and common categories. Under other classification criteria, two more types appear:
Aggregate: for example, if several friends watched the same movie, the events can be aggregated into one Feed: "A, B, and C watched Your Name". This is usually done on the client, so the Aggregate type is generally the Timeline type plus client-side aggregation.
Notice: the notification type. This is really a feature type rather than a sort order; it is used for the various notifications inside an app, such as private messages. Underneath, it is a Timeline or Aggregate stream.
Having covered the concepts, characteristics, and classification of Feed stream systems, let's move on to the key part: how to implement a Feed stream system at the scale of tens of millions. Since it is impossible for all users to be online at once, let alone refreshing and publishing simultaneously, a system that supports tens of millions of concurrent Feed stream users can in practice support a product with hundreds of millions of users.
If you want to design a Feed stream system, the two most critical parts are storage and push.
Let's start with storage. A Feed stream system needs to store two kinds of content: account relationships (such as follow lists) and Feed message content. For both, several questions must be considered:
How can you support 100TB or even petabytes of data?
When you have a lot of data, the cost is critical. How can you make the cost cheaper?
How to ensure that account relationships and feeds are not lost?
We will answer these three questions later. First, let's look at the push system.
The push system provides two functions: publishing feeds and reading Feed streams. Before choosing one, a few more questions must be considered:
How can you provide tens of millions of TPS and QPS?
How to ensure that the read/write latency is less than 10ms or even less than 2ms?
How to guarantee that feeds are reliably delivered?
Before answering these questions, let's first introduce Alibaba Cloud's TableStore.
TableStore is a distributed NoSQL database developed in-house at Alibaba Cloud. It is a fully managed, semi-structured data storage platform built on shared storage, with high performance, low cost, and easy scaling, and it supports efficient computation and analysis of Internet and IoT data.
It is currently used by thousands of systems, both inside Alibaba Group and by public cloud customers, covering throughput-heavy offline applications as well as stability- and latency-sensitive online applications. Some production systems write more than 35 million rows per second with more than 5 GB/s of traffic, and single tables exceed 10 trillion rows and 10 PB of data.
The specific features of TableStore are shown in the following picture.
This article will not introduce TableStore's features in detail; if you are interested, see the official website and the Yunqi blog:
TableStore official website: www.aliyun.com/product/ots...
TableStore Yunqi blog: yq.aliyun.com/teams/4/typ...
TableStore DingTalk group: 11789671
Storage System Selection
Let's return to the questions raised earlier.
A Feed stream system needs to store two kinds of data: account relationships (such as follow lists) and Feed messages.
Storage Account Relationship
Let's first look at storing account relationships (such as follow lists). Account relationships have several characteristics:
They form a set of variable-length lists whose length can reach the hundred-million level.
The total data volume is therefore large, but the relationships themselves are extremely simple.
They are performance-sensitive: latency here directly determines how responsive follow and unfollow operations feel.
The most suitable system for storing account relationships (follow lists) is a distributed NoSQL database: the data volume is large, the relationships are simple and need no complex joins, and the performance requirements are high.
The internal design stays simple, and the external user experience stays good.
In addition to the features above, there is one more:
Ordered: the system does not need general-purpose sorting, only the ability to sort by primary key. As long as rows sort by primary key, the order of follow lists and fan lists is fixed and predictable.
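To make the "ordered by primary key" idea concrete, here is a minimal in-memory sketch. The `OrderedTable` class and all names are invented for illustration; a real ordered NoSQL table behaves analogously through its own API.

```python
# Illustrative in-memory model of an ordered NoSQL table keyed by a
# (user_id, target_id) tuple: rows sort by primary key, so one range
# scan returns a user's whole follow list in a fixed, predictable order.

class OrderedTable:
    def __init__(self):
        self.rows = {}                   # primary key tuple -> attributes

    def put_row(self, pk, attrs=None):
        self.rows[pk] = attrs or {}

    def delete_row(self, pk):
        self.rows.pop(pk, None)

    def get_range(self, start, end):
        # Return rows whose primary key lies in [start, end), key-ordered.
        return [(pk, a) for pk, a in sorted(self.rows.items())
                if start <= pk < end]

follow_list = OrderedTable()             # who a user follows
fan_list = OrderedTable()                # who follows a user

def follow(user, target):
    # Two writes keep both directions of the relationship queryable.
    follow_list.put_row((user, target))
    fan_list.put_row((target, user))

def unfollow(user, target):
    follow_list.delete_row((user, target))
    fan_list.delete_row((target, user))

follow("alice", "bob")
follow("alice", "carol")
following = [pk[1] for pk, _ in
             follow_list.get_range(("alice", ""), ("alice", "\xff"))]
print(following)   # ['bob', 'carol']
```

Because the key is (user, target), one range scan over a single user's key prefix retrieves the whole follow list already in order, with no extra sorting step.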
Using open-source HBase to store account relationships
Open-source HBase is an ordered distributed NoSQL database that meets these requirements, so many companies choose HBase to store account relationships and follow lists.
HBase satisfies the four characteristics above, and a working system can be built on it, but some troublesome problems remain:
You must operate and maintain it yourself, investigate problems, and fix bugs, which brings considerable complexity and cost.
GC pauses cause large latency spikes, which hurt the user experience.
Using TableStore to store account relationships
Alibaba Cloud's TableStore is also an ordered distributed NoSQL database, and many well-known systems have chosen it before. It benefits such systems in the following ways:
A single table can support 10 trillion rows + and 10 Pb + data volume. No matter how fast the data growth rate is, there is no need to worry.
Data is sorted by primary key column, so ordering is guaranteed and predictable.
Single-row read/write latency is at the millisecond level, keeping follow and unfollow operations responsive.
It is a fully managed distributed NoSQL database service that requires no operations work.
It is implemented entirely in C++, so there are no GC pauses and no GC-induced latency spikes.
Using a TableStore to store account relationships is a good choice.
Next, look at the storage of Feed messages.
Storing Feed messages
Feed messages have one big feature:
The data volume is large, and because Feed stream systems often use write diffusion (push mode), the stored volume can expand by several orders of magnitude, easily reaching 100 TB or even petabytes.
In addition, there are some other features:
Simple data format
Data cannot be lost and reliability is highly required
An auto-increment primary key ensures that the Feed IDs in an individual's outbox are strictly increasing, so reading new feeds needs only a single range query. Because an individual publishes feeds with low concurrency, a timestamp can meet the basic requirement here, but it cannot guarantee strict monotonicity when application-layer queues back up, network latency grows, or clocks are rolled back. A true auto-increment column is better.
The lower the cost, the better
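To illustrate why a strictly increasing per-user ID makes "read everything since last time" a single range query, here is a small Python sketch. All names are invented; in a real system the outbox would be a table with an auto-increment primary key column.

```python
import itertools
from collections import defaultdict

# Sketch: each user's outbox assigns a strictly increasing message ID,
# mimicking a primary-key auto-increment column. Reading "what's new"
# is then a single range scan from the last ID seen.

_counters = defaultdict(itertools.count)       # user -> ID generator
outbox = defaultdict(dict)                     # user -> {feed_id: content}

def add_feed(user, content):
    feed_id = next(_counters[user])            # strictly increasing per user
    outbox[user][feed_id] = content
    return feed_id

def read_since(user, last_id):
    # Equivalent of GetRange(last_id, MAX) on a key-ordered table.
    return [(i, c) for i, c in sorted(outbox[user].items()) if i > last_id]

add_feed("bob", "hello")
add_feed("bob", "world")
print(read_since("bob", -1))   # [(0, 'hello'), (1, 'world')]
```

A timestamp could stand in for `feed_id` most of the time, but two feeds in the same millisecond, or a clock rollback, would break the "strictly greater than last_id" contract that the range read relies on.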
Potential storage systems
Given these characteristics, the ideal system is a distributed NoSQL database with primary key auto-increment. No open-source system offers this combination, so there are two common workarounds:
Relational database + sub-database sub-table
Relational database + distributed NoSQL database: The relational database provides the primary key auto-increment function.
Use a relational database to store Feed messages
Many users in the industry choose a relational database plus sharding, including some well-known Feed stream products, and while this architecture works, it has some problems:
Sharding brings operational complexity.
Sharding tightly couples the logic layer to the data layer.
Primary key auto-increment performs poorly in relational databases such as MySQL: with either the MyISAM or the InnoDB engine, a lock must be held to keep auto-increment IDs strictly increasing. That lock granularity is large, which severely limits concurrency and hurts performance.
Some users assume relational databases are more reliable, but a relational database generally offers at most six nines of data reliability, four to five orders of magnitude below what distributed storage systems provide.
Using TableStore to store Feed messages
For these reasons, some technology companies have started using TableStore, a distributed NoSQL database with auto-increment primary keys, so only one system is needed. There are also these considerations:
A single table can reach 10 Pb, 10 trillion rows.
A data-reliability SLA of ten nines ensures that Feed content is not lost.
Natural distributed database, no need to divide the database table
Two instance types: High Performance Instances use all SSD storage media to provide excellent read/write performance. Hybrid storage instances use SSD+SATA storage media to provide extremely low storage cost.
The primary key auto-increment implementation is excellent: other systems need a lock to do this, but TableStore needs no lock at all when writing to auto-increment columns, neither a table lock nor a row lock.
From the above, TableStore is the better fit in functionality, performance, scalability, and cost.
Having settled the storage system, let's look at the choice of push scheme.
Let's review some of the biggest features of the Feed streaming system mentioned earlier:
Reads and writes are severely imbalanced, with far more reads than writes; the read/write ratio is generally 10:1 and can reach 100:1.
Another aspect is directly shaped by the push scheme:
The latency of publishing and refreshing feeds is essentially determined by the push scheme; everything else is mere optimization, a quantitative improvement rather than a qualitative one.
Comparison of push mode and pull mode
In the push scheme, there are two schemes, which are:
Pull scheme: also known as read diffusion.
Push scheme: also known as write diffusion.
The pull and push schemes are opposites in many respects. One point to emphasize before the comparison:
For users of Feed stream products, refresh (read) latency is far more noticeable than publish (write) latency.
A side effect of the push mode
The comparison makes clear that push mode is much better than pull mode for reads, but it has one side effect:
The stored data expands enormously.
In view of this shortcoming, we can consider it from two aspects:
Storage is now very cheap. Taking TableStore as an example, a capacity instance storing 10 TB currently costs about 16,000 yuan per year (as of October 2017), and the unit price falls as data grows.
To save further, continue optimizing:
Use pull mode for big Vs and push mode for ordinary users. This has a drawback, analyzed later.
Use push mode for active fans and pull mode for inactive fans; this also shields the platform better from traffic spikes.
Comparing the two schemes, the typical application scenarios of each are:
Pull mode (read diffusion):
Many Feed stream products adopt it in their first version, then quickly abandon it.
Pull mode plus graph computation is a different world entirely, but there the center of gravity is the graph computation itself.
Push mode (write diffusion):
The most common and effective mode in Feed stream systems.
Suited to products where users' relationship counts are fairly uniform or capped, as in Moments.
Also suited to recommendation products, where the same Feed has different value to different users and a score must be computed per user, as at Pinterest.
Push-pull combination:
Suited to products where most users have a few hundred relationships but some accounts have more than ten million fans, as on Weibo.
Now that we understand the push schemes, let's look at push system selection.
To implement a Feed stream product of the order of ten million, the push system needs to have a few features:
With tens of millions of TPS/QPS capability.
Read/write latency is critical: it directly determines how fast users can publish and refresh, and refresh latency is especially sensitive.
Delivery reliability of Feed messages must be high.
An auto-increment primary key again ensures that the Feed IDs in a user's inbox are strictly increasing, so the latest unread messages can be fetched with a single scan over (maximum ID read last time, MAX].
Ideally, the system can store all the feeds in every user's Timeline.
Given these features, the ideal push system is a high-performance NoSQL system with a reliable auto-increment function. No open-source system offers both, so the common open-source choice is Redis layered on top of a relational database (which supplies the auto-increment). This covers the features above and keeps the Feed stream system working, but it creates a few other problems:
Redis is a pure in-memory system, and memory is expensive, so the overall cost is high.
Redis is fundamentally a single-machine system; to support tens of millions of TPS and guarantee delivery, clusters and replicas are needed, which adds operational complexity and more machines, raising the cost further.
As costs rise, architects look for savings, which here means reducing the amount of data stored in Redis. There are two common ways to do this:
Store only Feed IDs in Redis, not Feed content. The total data volume drops sharply, but clients must first read the IDs from Redis and then read the content from the storage system; network overhead doubles, the two reads are serial, and refresh latency suffers noticeably.
Use push mode only for ordinary or active users, and use pull mode directly for big Vs and inactive users.
Both schemes save cost at the expense of user experience; in the end it is a trade-off between the two.
Use TableStore as the push system
Besides open-source systems, Alibaba Cloud's TableStore can also serve as the push system. Many users choose TableStore for the following reasons:
Natural distribution, single table can support tens of millions of TPS/QPS.
The LSM storage engine is greatly optimized for write, and high-performance instances are greatly optimized for read.
Data is durably written to disk, and data reliability is guaranteed by a ten-nines SLA.
As a disk-based database, its cost is orders of magnitude lower than that of an in-memory database.
A single table can store more than ten trillion rows of data at a low price, easily holding all the Feed data in a user's Feed stream.
Those are the similarities and differences between open-source Redis and Alibaba Cloud TableStore as push systems: in the open-source world, use Redis; on Alibaba Cloud, use TableStore.
Now let's look at an architecture diagram built on TableStore. For generality it combines push and pull; a pure push architecture would be even simpler.
Look first at the black box in the middle, the data stored in TableStore; from left to right:
Personal page Timeline: each user's outbox, backing their personal page.
Follow page Timeline: each user's inbox, backing their follow page; its content is the messages posted by everyone they follow.
Follow list: stores account relationships, such as friends in Moments or follow lists on Weibo.
Virtual follow list: used mainly for personalization and targeted advertising.
Publish the Feed process
When you post a Feed, the process looks like this:
1. Feed messages first enter a queue service.
2. Read your own fan list from the follow list and determine whether you are a big V.
3. Write the Feed message to your personal page Timeline (outbox). If you are a big V, the write process ends here.
4. If you are an ordinary user, also write the Feed (both content and Feed ID) to every fan's inbox; with 100 fans, that means 100 writes.
5. Steps 3 and 4 can be combined using the BatchWriteRow interface to write multiple rows to TableStore at once.
6. This is the end of the Feed publishing process.
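The six steps above can be sketched roughly as follows. The threshold, table layouts, and helper names are all invented, and plain dicts stand in for TableStore tables and the queue:

```python
# Hedged sketch of the publish path: check the fan count, always write
# the outbox, and fan out to fans' inboxes only for non-big-V authors.
# The threshold and data layout are illustrative, not TableStore's API.

BIG_V_THRESHOLD = 10_000

fans = {"v_user": [f"fan{i}" for i in range(10_001)],
        "normal": ["a", "b", "c"]}
outbox = {}                     # user -> list of feeds (personal Timeline)
inbox = {}                      # user -> list of feeds (follow Timeline)

def publish(user, content):
    feed = {"author": user, "content": content}
    outbox.setdefault(user, []).append(feed)          # step 3: outbox
    fan_list = fans.get(user, [])
    if len(fan_list) >= BIG_V_THRESHOLD:              # step 2: big V?
        return "big_v: fans will pull from the outbox"
    for fan in fan_list:                              # step 4: push
        inbox.setdefault(fan, []).append(feed)
    return f"pushed to {len(fan_list)} inboxes"

print(publish("normal", "hi"))     # pushed to 3 inboxes
print(publish("v_user", "news"))   # big_v: fans will pull from the outbox
```

In a real system, the outbox write and the fan-out writes would be batched together (the BatchWriteRow step), and the whole publish would run behind the queue from step 1.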
Read the Feed flow
When refreshing your Feed stream, the process looks like this:
1. First read the list of big Vs you follow.
2. Read your inbox with a single GetRange over a range starting from the latest Feed ID read last time and ending at the current time or MAX. Because an auto-increment primary key was used earlier, one range read suffices here.
3. For each big V you follow, concurrently read their outbox; following 10 big Vs means 10 reads.
4. Merge the results of steps 2 and 3 and return them to the user in chronological order.
That completes the read path for a Feed stream published with a combination of push and pull.
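The combined read path can be sketched like this, with `heapq.merge` standing in for the chronological merge of the already-sorted inbox and big V outboxes. The data layout and names are illustrative:

```python
import heapq

# Sketch of the combined read path: one range read on the inbox plus one
# outbox read per followed big V, merged by timestamp into one stream.

inbox = {"me": [(1, "friend: lunch"), (4, "friend: coffee")]}
big_v_outbox = {"v1": [(2, "v1: launch"), (5, "v1: update")],
                "v2": [(3, "v2: talk")]}
followed_big_vs = {"me": ["v1", "v2"]}

def refresh(user, since=0):
    streams = [[f for f in inbox.get(user, []) if f[0] > since]]   # step 2
    for v in followed_big_vs.get(user, []):                        # step 3
        streams.append([f for f in big_v_outbox[v] if f[0] > since])
    # step 4: merge the already-sorted streams chronologically
    return list(heapq.merge(*streams))

print(refresh("me"))
# [(1, 'friend: lunch'), (2, 'v1: launch'), (3, 'v2: talk'),
#  (4, 'friend: coffee'), (5, 'v1: update')]
```

The `since` argument plays the role of "latest Feed ID read last time", so a refresh only transfers what is new.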
Simpler push mode
If you just use push mode, it's even easier:
Publish the Feed:
Regardless of whether the author is a big V, the publish process is the same for every user: enqueue the Feed, write the outbox, and fan out to every fan's inbox.
Read the Feed stream:
Step 1 and step 3 are no longer needed; step 2 alone suffices, reducing the previous 2 + N network round trips (N being the number of big Vs followed) to a single one. Read latency drops significantly.
Personalization and targeted advertising
Personalization and targeted advertising are two strong product needs. Personalization serves users well and increases product competitiveness and engagement, while targeted advertising adds a revenue channel without necessarily annoying users. The two work similarly in Feed streams, so let's take targeted advertising as the example:
1. Classify users by analyzing their characteristics. For example, one category is freshmen: students who entered university this year. (User characteristics can be analyzed with TableStore + MaxCompute, not covered here.)
2. Create an ad account: the freshman ad account.
3. Have every user with the freshman characteristic virtually follow the freshman ad account. This virtual follow is invisible to the user.
4. Ads can then be delivered through the freshman ad account, for example starting in July.
5. A user may end up with several characteristics, and thus virtually follow several ad accounts.
The above is a relatively simple way to implement targeted advertising; other approaches are not covered here.
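A minimal sketch of the virtual-follow mechanism might look like this; every name is invented and in-memory dicts stand in for the follow list and virtual follow list tables. The key point is that the ad account is merged into the user's feed sources but never appears in the visible follow list:

```python
# Illustrative sketch of "virtual follows" for targeted ads: users whose
# profile matches a tag are attached to an ad account behind the scenes,
# and the read path merges those feeds in, while the visible follow list
# never shows the ad account.

real_follows = {"student1": ["friend1"]}
virtual_follows = {}                    # user -> set of ad accounts
ad_feeds = {"freshman_ad": ["Welcome week deals"]}

def tag_users(users, ad_account):
    for u in users:
        virtual_follows.setdefault(u, set()).add(ad_account)

def visible_follow_list(user):
    return real_follows.get(user, [])   # virtual follows stay hidden

def feed_sources(user):
    return set(real_follows.get(user, [])) | virtual_follows.get(user, set())

tag_users(["student1"], "freshman_ad")
print(visible_follow_list("student1"))   # ['friend1']
print(sorted(feed_sources("student1")))  # ['freshman_ad', 'friend1']
```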
We've covered in detail how to use TableStore as both the storage and push system; now let's see what this architecture buys us:
Only one system is needed, so the architecture is simple to implement. Not having to integrate multiple systems saves a great deal of architecture, development, testing, and O&M time.
TableStore's primary key auto-increment performs excellently: because of its design, neither table locks nor row locks are needed, so it far outperforms relational databases.
All feeds can be kept: the system can store them all, and doing so is cheap.
Feed IDs and content need not be stored separately; storage is cheap enough to keep them together.
It is a fully managed service: no operations work and no sharding.
Disk-based (SSD, Hybrid) instances keep costs low.
Ten nines of data reliability: data is safer and far harder to lose.
The big V / ordinary user split threshold can be set higher, so fewer reads go through the big V path and overall latency is lower.
A design flaw
If you split users into big Vs (pull mode) and ordinary users (push mode), the architecture carries a major risk.
For example, when a big V posts a hot Feed, it is possible that no user of the entire product can read new content, because:
The big V posts a Feed message.
The big V is on pull mode.
The big V's active fans (user group A) start reading the new Feed via pull mode (read path 3 in the architecture diagram).
The Feed is so hot that it spreads quickly.
Fans of the big V who were not logged in (user group B) start logging into the product; on login the client auto-refreshes and reads the big V's Feed, again through read path 3.
Non-fans (user group C) visit the big V's personal page Timeline to watch, and they too read the big V's personal Timeline through read path 3.
As a result, traffic that would normally come only from user group A is now A + B + C, several or even tens of times larger. The service module on read path 3 returns Server Busy or saturates its machines, so read path 3 can no longer serve reads of the big V. If nearly every user of the product follows this big V, then nearly every user is stuck on read path 3 and cannot refresh.
So when designing here, we need to focus on the following two points:
The unavailability of a single module must not block the entire critical read path of the Feed stream. If big V content cannot be read, ordinary users' content should still be returned, and the big V content back-filled once the service recovers.
When a module cannot handle the traffic, it should not become completely unavailable; it should keep serving at its maximum capacity and reject only the excess.
So how do you optimize?
Instead of splitting by big V / ordinary user, split by active user / inactive user. This diverts user group A and part of user group B onto other, more dispersed paths, and even if read path 3 becomes unavailable, active users are unaffected.
Using pure push mode solves the problem completely, but it increases storage and lengthens a big V's total fan-out time: delivering from the first fan to the last may take minutes (100 million fans at 1 million rows per second takes 100 seconds). Resources must also be reserved for the maximum concurrency, although with TableStore, being a cloud service, there is no need to reserve for the peak.
Finally, let's implement a message square feature. Many apps have a "Moments" or "Message Square" module, typically with two tabs: one for people you follow and one for the square. We focus here on the people-you-follow tab.
The functions to be implemented are as follows:
Users can follow each other
Users can post new messages
Users can view a list of messages they have posted
Users can see the messages of people they follow
Take the previous approach:
Use TableStore as the storage and push system
Use the Timeline display mode, so that users can read every Feed carefully.
Adopt push mode
Next, let's look at the roles and the functions each role needs:
Post a status: add_activity()
Follow someone: follow()
Read the Feed stream: get_activity()
The Feed message should include at least the following:
Type: Verb, such as picture, video, text
Publishing a new message
Interface: add_activity ()
Call the get_range interface on the follow list to retrieve the fan list.
Call the batch_write_row interface to write the Feed content and ID to the personal page table (outbox) and to every fan's follow page table (inbox), in several batches if the volume is large. Alternatively, call the asynchronous batch_write_row interface, currently provided by the C++ and Java SDKs.
Interface: follow ()
Call the put_row interface to write one row (follower, followee) to the follow list and one row (followee, follower) to the fan list.
Getting Feed stream messages
Interface: get_activity ()
Get the ID of the last message read from the client: last_id
Use the get_range interface to read the latest messages, starting from last_id and ending at MAX.
To read personal-page content, access the personal page table; to read follow-page content, access the follow page table.
The above shows how to implement this with the TableStore API. Although only a few interfaces are needed, you still have to learn the TableStore API and features, which takes some time.
For ease of use, we will also provide a complete Feed stream solution: a LIB exposing add_activity(), follow(), and get_activity() interfaces, which is much simpler and faster to use.
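As a toy illustration of what such a LIB's three interfaces could look like in pure push mode (this is not the real LIB; every name is invented and dicts stand in for TableStore tables):

```python
from collections import defaultdict
import itertools

# In-memory stand-in for add_activity(), follow(), get_activity() in pure
# push mode. A real implementation would back each dict with a table;
# this only shows the interface shapes and the write-diffusion flow.

_seq = defaultdict(itertools.count)        # per-Timeline auto-increment
fans = defaultdict(set)                    # user -> fans
personal = defaultdict(list)               # outbox (personal page table)
follow_page = defaultdict(list)            # inbox (follow page table)

def follow(follower, followee):
    fans[followee].add(follower)

def add_activity(user, verb, content):
    feed = {"actor": user, "verb": verb, "content": content}
    personal[user].append((next(_seq[user]), feed))
    for fan in fans[user]:                 # write diffusion to every fan
        follow_page[fan].append((next(_seq[fan]), feed))

def get_activity(user, last_id=-1):
    # Analogue of get_range(last_id, MAX) on the follow page table.
    return [(i, f) for i, f in follow_page[user] if i > last_id]

follow("reader", "writer")
add_activity("writer", "text", "first post")
print(get_activity("reader"))
# [(0, {'actor': 'writer', 'verb': 'text', 'content': 'first post'})]
```

The per-Timeline counter mirrors the auto-increment primary key, so `get_activity(user, last_id)` returns exactly the unread messages.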
Everything so far has been the Timeline type, but there is another common type of Feed stream: the Rank type used by news recommendation and photo-sharing sites.
Let's review what the Rank type is good at:
There are far more candidate feeds than a user can, or needs to, read, so the system must select the content the user most wants to see, as on photo-sharing and news-recommendation sites.
Let's start with an architecture diagram:
This Rank approach is relatively lightweight and suits the combined push-pull scenario.
The write process is basically the same as before.
In the read process, all candidate Feed content is read first, just as in the Timeline type. But instead of returning it directly, the Rank type passes it through a sorting module, sorts by some attribute value, and stores the full ranked result in a Timeline cache. The N highest-scoring results are returned, and the next read returns results [N+1, 2N].
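A rough sketch of this lightweight Rank read path, with an invented `likes`-based score standing in for a real ranking model:

```python
# Sketch of the lightweight Rank read: fetch the candidate feeds (as in
# Timeline), score and sort them once, cache the ranked list, and serve
# page-sized top-N slices on successive reads.

feeds = [{"id": i, "likes": likes}
         for i, likes in enumerate([5, 42, 7, 19, 3])]

rank_cache = {}                     # user -> ranked list (Timeline cache)

def read_ranked(user, page, page_size=2):
    if user not in rank_cache:      # rank once, then page through cache
        rank_cache[user] = sorted(feeds, key=lambda f: f["likes"],
                                  reverse=True)
    start = page * page_size
    return rank_cache[user][start:start + page_size]

print([f["id"] for f in read_ranked("u", 0)])   # [1, 3]
print([f["id"] for f in read_ranked("u", 1)])   # [2, 0]
```

The cache is what makes the [N+1, 2N] follow-up read cheap: ranking happens once per refresh, paging is just a slice.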
Here's another one:
This heavyweight variant suits pure push mode.
The write process is the same as the Timeline.
Each user has two inboxes:
One is the follow page Timeline, which stores the raw Feed content; users do not read this inbox directly.
The other is the Rank Timeline, which stores the feeds selected for the user; this is the inbox users actually read.
After the write process comes a data processing stage: the personalized ranking system pulls new Feed content from the raw inbox and computes a score based on user and Feed characteristics, so the same Feed may score differently in different users' Timelines. The feeds are then sorted by score and written into the final Rank Timeline.
This approach can truly deliver "a thousand faces for a thousand users".
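The two-inbox pipeline can be sketched as follows; the tag-overlap scoring is a placeholder for a real per-user model, and all names are invented:

```python
# Sketch of the heavyweight Rank pipeline: raw feeds land in a hidden
# follow-page inbox; an offline scorer moves them into a per-user Rank
# Timeline that the client actually reads.

raw_inbox = {"u": [{"feed": "cats", "tags": {"pets"}},
                   {"feed": "rust 1.0", "tags": {"code"}}]}
rank_timeline = {}                   # what the user actually reads
user_interests = {"u": {"code": 2.0, "pets": 0.5}}

def score(user, feed):
    # The same feed can score differently for different users.
    return sum(user_interests[user].get(t, 0.0) for t in feed["tags"])

def rerank(user):
    ranked = sorted(raw_inbox.get(user, []),
                    key=lambda f: score(user, f), reverse=True)
    rank_timeline[user] = [f["feed"] for f in ranked]

rerank("u")
print(rank_timeline["u"])   # ['rust 1.0', 'cats']
```

Because scoring runs per user, two users with the same raw inbox can end up with differently ordered Rank Timelines, which is exactly the "thousand faces" effect.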
The above two ways are simple and common ways to implement Rank.
In summary, TableStore can provide 10 PB-scale storage and tens of millions of TPS/QPS for push, which makes it extremely valuable in Feed stream systems.
A number of well-known companies already use TableStore to build their own Feed stream systems, with significant gains for their systems, products, and companies.