Tagging resources is a common requirement in the process of building a website. Sometimes we need to tag articles and sometimes we need to tag users. It’s not that hard to implement a tagging system. It’s essentially a many-to-many relationship – I can tag the same blog with multiple tags, and I can tag different blogs with one tag. This article mainly analyzes the principle of tag system, and uses PostgreSQL to implement a tag system that can tag various resources.

1. Single resource label system

We’ll start with a single resource, where we label only one data resource. Assuming we need to tag blog posts, we need to build the following tables:

  1. The article tablepostsFor storing basic information about articles.
  2. The label tabletagsIs used to store basic label information.
  3. Tag-article tabletags_posts, stores the ids of both parties and forms a many-to-many relationship.

The diagram is about

Go to the database engine and create the corresponding database

postgres=# create database blog;
CREATE DATABASE

postgres=# \c blog;
blog=#
Copy the code

Create the data tables mentioned above through SQL statements

CREATE TABLE posts (
    id              SERIAL.body            text,
    title           varchar(80));CREATE TABLE tags (
    id              SERIAL.name            varchar(80));CREATE TABLE tags_posts (
    id              SERIAL,
    tag_id          integer,
    post_id         integer
);

Copy the code

Each table contains only the most basic fields of the resource, and at this point you have built a very simple labeling system. The next step is to fill in the data. My strategy is to add two articles with five tags, tag the article titled Ruby with Language and the article titled Docker with Container, and tag both articles with tech tags

Populate the article data
INSERT INTO posts (body, title) VALUES ('Hello Ruby'.'Ruby');
INSERT INTO posts (body, title) VALUES ('Hello Docker'.'Docker');

-- Populates label data
INSERT INTO tags (name) VALUES ('language');
INSERT INTO tags (name) VALUES ('container');
INSERT INTO tags (name) VALUES ('tech');

-- Tag related resources
INSERT INTO tags_posts (tag_id, post_id) VALUES ((SELECT id FROM tags WHERE name = 'container'), (SELECT id FROM posts WHERE title = 'Docker'));
INSERT INTO tags_posts (tag_id, post_id) VALUES ((SELECT id FROM tags WHERE name = 'tech'), (SELECT id FROM posts WHERE title = 'Docker'));
INSERT INTO tags_posts (tag_id, post_id) VALUES ((SELECT id FROM tags WHERE name = 'tech'), (SELECT id FROM posts WHERE title = 'Ruby'));
INSERT INTO tags_posts (tag_id, post_id) VALUES ((SELECT id FROM tags WHERE name = 'language'), (SELECT id FROM posts WHERE title = 'Ruby'));
Copy the code

Then look up the labels on each article.

blog=# SELECT tags.name FROM tags, posts, tags_posts WHERE tags.id = tags_posts.tag_id AND posts.id = tags_posts.post_id AND posts.title = 'Ruby';
   name
----------
 language
 tech
(2 rows)

blog=# SELECT tags.name FROM tags, posts, tags_posts WHERE tags.id = tags_posts.tag_id AND posts.id = tags_posts.post_id AND posts.title = 'Docker';
   name
-----------
 container
 tech
(2 rows)
Copy the code

The two articles have been labeled as expected. The related statements are a little long. Generally, the production line will not operate the database directly like this. The community of programming languages generally encapsulates this kind of database operation, which makes writing business code a lot easier.

2. Label multiple resources

If you only need to label one table, it is sufficient to design the table following the logic above. But the real world is not always that simple. What if you needed to tag a user table as well as a blog post? We need to make our watches more flexible. If you continue to use the tags table to store tag data, you will have to create a separate table named TAGs_Users to store the relationship between tags and user data in order to tag users.

But a better approach would be to use a design called polymorphism. Create associative table taggings, the associative table in addition to the two id associated storage, also will be tagged storage resource type, we according to the type to distinguish between the tagging exactly is what kind of resources, it will save more on each record type data, but the advantage is that I can get a table, all tags relationship through a table to store.

Ruby’s popular tag system, ActsAsTaggableOn, uses this design, but its type field stores the class name of the corresponding resource directly, perhaps for easier programming. The data is roughly as follows:

naive_development=# select id, tag_id, taggable_type, taggable_id from taggings;
 id | tag_id |    taggable_type     | taggable_id
----+--------+----------------------+-------------
  1 |      1 | Refinery::Blog::Post |           1
  2 |      2 | Refinery::Blog::Post |           1
  3 |      3 | Refinery::Blog::Post |           1
Copy the code

The taggable_type is used to get the class name, and then the taggable_id is used to get the exact resource.

A. Modify the original table

The table design diagram is roughly as follows

Instead of rebuilding the table, I will modify the existing table and migrate the data

  1. increasetypeField for storage resource type.
  2. Rename the original data table to a more generic nametaggings.
  3. The originalpost_idChange the field to a more generic nametaggable_id.
  4. Populate the existing resource with data,typeFields uniformly fill datapost.
ALTER TABLE tags_posts ADD COLUMN type varchar(80);
ALTER TABLE tags_posts RENAME TO taggings;
ALTER TABLE taggings RENAME COLUMN post_id TO taggable_id;
UPDATE taggings SET type='post';
Copy the code

B. Add a user

Create a user table and populate the data before you label the user

Create a simple user table
CREATE TABLE users (
    id              SERIAL,
    username        varchar(80),
    age             integer
);


Add a user named LAN and two related labels

INSERT INTO users (username, age) values ('lan'.26);

INSERT INTO tags (name) VALUES ('student');
INSERT INTO tags (name) VALUES ('programmer');
Copy the code

C. Label the user

The next step is to label the user’s LAN, make some adjustments to the original SQL statement, and fill the type field with user when labeling.

INSERT INTO taggings (tag_id, taggable_id, type) VALUES ((SELECT id FROM tags WHERE name = 'student'), (SELECT id FROM users WHERE username = 'lan'), 'user');

INSERT INTO taggings (tag_id, taggable_id, type) VALUES ((SELECT id FROM tags WHERE name = 'programmer'), (SELECT id FROM users WHERE username = 'lan'), 'user');
Copy the code

The above SQL statement labels the user student and Programmer.

D. View labels

To accomplish this task, we still need to join the three tables and constrain the type of type

  • The user is calledlanAll the tags that users are tagged with
blog=# SELECT tags.name FROM tags, users, taggings WHERE tags.id = taggings.tag_id AND users.id = taggings.taggable_id AND taggings.type = 'user' AND users.username = 'lan';

    name
------------
 student
 programmer
(2 rows)
Copy the code
  • The headlineRubyAll the tags on the articles
blog=# SELECT tags.name FROM tags, posts, taggings WHERE tags.id = taggings.tag_id AND posts.id = taggings.taggable_id AND taggings.type = 'post' AND posts.title = 'Ruby';

   name
----------
 language
 tech
Copy the code

OK, as expected, now the tagging system is more general.

conclusion

This article constructs a tag system based on PostgreSQL basic statements. It is not difficult to implement a tagging system, and communities in each language should have relevant integrations. I also want to put aside programming language, from the database level to analyze the basic principle of a label system.

PS: Another good Model Design tool is recommendeddbdiagram, you can use the text of the data table design, design side preview. Finally, it can be exported as PNG, PDF or even SQL source files. The data table diagrams in this paper are made by this software.