Forum system architecture design and business model analysis

A summary of what I learned while taking part in the development of a forum system.

Technology stack

Use Git for code versioning
Manage server asset permissions using JumpServer
Use Kafka as a message queue system
Use Elasticsearch to store large volumes of associated data
Deploy the audit system using in-house developed code
Use the Yii 1 framework (legacy system)
Use QConf as a distributed configuration center
Use Logrotate for log rotation
Use Sphinx for word segmentation and full-text search
Use Confluence for document management
Use ZenTao for project bug management

Content master table design

The content master table stores the fields common to all content, such as the post title, the post type, and the view count.

Some fields, such as the post content, do not belong in the master table because they take up a lot of space (usually the text data type). They can be stored in a separate table.

CREATE TABLE `tb_subject` (
  `subject_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `title` varchar(255) NOT NULL DEFAULT '',
  `subject_type` tinyint(3) NOT NULL DEFAULT 0,
  `uid` bigint(20) unsigned NOT NULL DEFAULT 0,
  `view_cnt` int(11) NOT NULL DEFAULT 0,
  `display_yn` tinyint(1) NOT NULL DEFAULT 1,
  `create_date` datetime NOT NULL DEFAULT current_timestamp() COMMENT 'Creation time',
  `update_date` datetime NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp() COMMENT 'Update time',
  PRIMARY KEY (`subject_id`),
  KEY `idx_uid` (`uid`, `create_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Post master table';

Large text fields such as the post content are usually stored separately in their own table, keyed by the same post ID:

CREATE TABLE `tb_subject_content` (
  `subject_id` int(11) unsigned NOT NULL DEFAULT 0,
  `content` text DEFAULT NULL COMMENT 'Post content',
  PRIMARY KEY (`subject_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Post content';

Extended attribute table design

This table gives structured storage to non-standard content data such as tags and keywords, which only some posts have and whose number is not fixed.

CREATE TABLE `tb_column_info` (
  `column_id` int(11) unsigned NOT NULL DEFAULT 0,
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `extend_type` mediumint(6) NOT NULL DEFAULT 0,
  `extend_id` int(11) unsigned NOT NULL DEFAULT 0,
  `extend_status` tinyint(1) NOT NULL DEFAULT 0,
  `extend_content` varchar(500) NOT NULL DEFAULT '',
  `create_time` datetime NOT NULL DEFAULT current_timestamp(),
  PRIMARY KEY (`column_id`, `id`),
  KEY `idx_id` (`id`),
  KEY `idx_column_id` (`extend_id`, `extend_type`, `extend_status`, `column_id`),
  KEY `idx_type_time` (`extend_type`, `extend_status`, `create_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Property Extension Table';

Category/tag table design

There are two design options for the category table. One is a category table dedicated to the post master table and tightly coupled to the post ID. The other is a general-purpose category table that can be associated with different kinds of content, for example both posts and news. The general-purpose design is usually preferred because it minimizes duplicated category maintenance. A reference design:

CREATE TABLE `tb_tag` (
  `tag_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `tag_name` varchar(255) NOT NULL DEFAULT '',
  `tag_type` tinyint(3) NOT NULL DEFAULT 0,
  PRIMARY KEY (`tag_id`),
  KEY `idx_tag_name` (`tag_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Tag table';

Statistical data processing

Content systems usually track many statistics, such as comment counts and view counts.

Redundant field design

Number of comments: when a user comments on a post, the redundant comment counter in the post master table is incremented by 1.

Number of views: when a user opens the post detail page, the redundant view counter in the post master table is incremented by 1.

Data update mechanism

Real-time update inside the API call
Asynchronous update through a message queue
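
The queue-based option can be sketched roughly as follows. This is a minimal illustration, not the system's actual code: the standard-library queue stands in for a Kafka topic, the dictionary stands in for the post master table, and the names publish, consume_all, and comment_cnt are hypothetical.

```python
import queue

# Stand-in for a Kafka topic; in production the API would publish to Kafka instead.
events = queue.Queue()

# Stand-in for the redundant counters in the post master table.
# comment_cnt is a hypothetical column used only for illustration.
counters = {101: {"view_cnt": 0, "comment_cnt": 0}}


def publish(subject_id: int, field: str) -> None:
    """Called by the API: enqueue the event and return without touching the table."""
    events.put({"subject_id": subject_id, "field": field})


def consume_all() -> int:
    """Called by a background worker: apply every queued increment."""
    applied = 0
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return applied
        counters[event["subject_id"]][event["field"]] += 1
        applied += 1


publish(101, "view_cnt")
publish(101, "view_cnt")
publish(101, "comment_cnt")
print(consume_all(), counters[101])
```

The point of the design is that the request path only enqueues an event, so a slow or busy database does not delay the user-facing API.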

Big data technology

The problem with the update mechanisms above is that the redundant fields are updated very frequently when traffic is high. In that case, a big data technology such as Spark Streaming can aggregate the events and apply the updates in batches.
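
The aggregation idea can be illustrated in plain Python. This is not Spark code, only a sketch of what a streaming job would do for each time window: collapse many view events into one increment per post. The function names and sample data are hypothetical, and the %s placeholder style assumes a typical MySQL driver.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def aggregate_views(events: Iterable[int]) -> Counter:
    """Collapse a batch of view events (one subject_id per view) into per-post totals."""
    return Counter(events)


def build_batch_updates(totals: Counter) -> List[Tuple[str, Tuple[int, int]]]:
    """Turn aggregated totals into one parameterized UPDATE per post."""
    sql = "UPDATE tb_subject SET view_cnt = view_cnt + %s WHERE subject_id = %s"
    return [(sql, (count, subject_id)) for subject_id, count in sorted(totals.items())]


# Six view events collected during one window (hypothetical data).
window = [101, 102, 101, 101, 103, 102]
for statement, params in build_batch_updates(aggregate_views(window)):
    print(statement, params)
```

Here six events become three UPDATE statements, so the write load grows with the number of distinct posts viewed in a window rather than with the number of views.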

Early content source

Early content can be scraped from the following sources and adapted to the content tables:

Zhihu
NetEase
Toutiao (Today's Headlines)

Third-party services

Video storage service: Qiniu Cloud, see: www.qiniu.com/products/ko…

Table sharding tips

There are many hashing schemes for splitting a table into sub-tables; the modulo algorithm is recommended. With it, the sub-table that holds a row can be determined from the trailing digits of the ID, which makes it faster to locate the right table when querying data.
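
A minimal sketch of modulo-based routing, assuming sub-tables are named with a numeric suffix. The shard_table helper, the suffix naming, and the shard count of 10 are illustrative assumptions, not part of the original design.

```python
def shard_table(base_name: str, record_id: int, shard_count: int = 10) -> str:
    """Return the sub-table name for a record using the modulo algorithm."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    if record_id < 0:
        raise ValueError("record_id must be non-negative")
    return f"{base_name}_{record_id % shard_count}"


print(shard_table("tb_subject_content", 1234567))  # tb_subject_content_7
```

When the shard count is a power of 10, the suffix is simply the last digit or digits of the ID, so the target table can be read off the ID directly.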

Content moderation

Machine review

Sensitive word filtering can use NetEase Yidun, see: dun.163.com/

Cheating detection can use Shumei, see: www.ishumei.com/solution/so…

Manual review

Build a moderation back office where review specialists check the content

Business model

The content is the traffic entry point for the company's business. Its job is to retain existing users and attract new ones.

Traffic sources:

Weibo
WeChat Official Accounts
WeChat Moments

Points to note

Comments and replies can be stored in separate tables; they do not need to share one table

Continuously updated…