A simple tree

Application examples:

Comment list, unlimited replies and comments.

Target: Hierarchical storage and query

Recursive data is common and often organized like a tree or hierarchy. In a tree structure, instances are called nodes, and each node has multiple children and a parent node. The uppermost node is called the root node and has no parent node. The lowest node with no children is called a leaf, while the middle node is simply called a nonleaf.

Application Scenarios:

Organization chart, topic discussion.

Anti-pattern: Always rely on the parent node

The most common simple solution is to add a parent_ID field that references other replies in the same table. A foreign key can be created to maintain this relationship.

CREATE TABLE Comments(
	comment_id SERIAL PRIMARY KEY,
  parent_id BIGINT UNSIGNED,
  bug_id  BIGINT UNSIGNED NOT NULL
	...
)
Copy the code

Such a design, called an adjacency list, is probably the most common scheme programmers use to store hierarchical data.

Query the tree using the adjacency table

Use a relational query to get a comment and its immediate descendants

SELECT c1.*, c2*.
FROM Comments c1 LEFT OUTER JOIN Comments c2
	ON c2.parent_id = c1.comment_id
Copy the code

When you use adjacency lists, such queries become inelegant because each additional level of the query requires an additional join to be extended, and SQL queries have a maximum number of joins.

Use antipatterns wisely

Don’t overdesign. If adjacency lists work, they have the advantage of being quick to get a given parent node and easy to insert new nodes.

Solution: Use other tree models

Path enumeration

One of the disadvantages of adjacency tables is the overhead of getting all the ancestor nodes of a given node from the tree.

A path enumeration is a complete path consisting of successive direct hierarchies, such as

The UNIX path to usr/local/lib is an enumeration of paths to the file system.

In the Comments table, we use the path field of type VARCHAR instead of the parent_id field,

CREATE TABLE Comments(
	comment_id SERIAL  PRIMARY KEY,
	path       VARCHAR(100)
	bug_id     BIGINT UNSIGNED NOT NULL,
	author     BIGINT UNSIGNED NOT NULL,
	comment_date DATETIME NOT NULL,
	comment    TEXT NOT NULL,
	FOREIGN KEY (bug_id) REFRENCES Bugs(bug_id)
	FOREIGN KEY (author) REFRENCES Accounts(account_id)
)
Copy the code

You can query the ancestor of a node by comparing the paths of each node, for example, to find the ancestor of comment #7 — the path is 1/4/6/7

SELECT *
FROM Comments AS c
WHERE '1/4/6/7' LIKE c.path || The '%'
Copy the code

This query will match nodes with paths 1/4/6/%, 1/4/%, and 1/% that are the ancestors of comment #7.

Insert a node

You can insert a leaf node without modifying any other rows. All you need to do is copy the path of the logical parent node you want to insert and append the ID of the new node to the end of the path. If the ID is automatically generated at insert time, you might want to insert the record first, then get the record’s ID, and update its path. For example, if you are using MySQL, its built-in function LAST_INSERT_ID() returns the ID of the last inserted record in the current session. Call this function to get the ID you need, and then get the full path from the parent node of the new node.

INSERT INFO Comments (author, comment) VALUES ('Ollie'.'Good job! ');

UPDATE Comments
	SET path = (SELECT path FROM Comments WHERE comment_id = 7)
		|| LAST_INSERT_ID() || '/'
WHERE comment_id = LAST_INSERT_ID()
Copy the code

Disadvantages:

  • As with the shortcomings described in Chapter 1 jaywalking, the database cannot ensure that the path is always properly formed or that the nodes in the path are definitely present. Depending on the application’s logical code to maintain the path’s string and validate the correctness of the string is expensive, no matter how large the length of the VARCHAR is set, there is still a length limit.

Nested set

The nested set solution stores information about descendant nodes rather than their direct ancestors. We represent this information by encoding each node with two arrays.

Closure table

Closure tables are a simple and elegant solution to hierarchical storage that records the relationships between all nodes in a tree, not just the direct parent-child relationships.

# create-table.sql
CREATE TABLE Comments(
	comment_id SERIAL PRIMARY KEY,
	bug_id     BIGINT UNSIGNED NOT NULL,
	author     BIGINT UNSIGNED NOT NULL,
	comment_date DATETIME NOT NULL,
	comment    TEXT NOT NULL,
	FOREIGN KEY (bug_id) REFERENCES Bugs(bug_id),
	FOREIGN KEY (author) REFERENCES Accounts(account_id)
);

CREATE TABLE TreePaths(
	ancestor     BIGINT UNSIGNED NOT NULL,
	descendant   BIGINT UNSIGNED NOT NULL,
	PRIMARY KEY(ancestor, descendant)
	FOREIGN KEY(ancestor) REFERENCES Comments(comment_id)
	FOREIGN KEY(descendant) REFERENCES Comments(comment_id) 
)
Copy the code

Any node pairs in the tree that have ancestor-descendant relationships are stored in a row in the TreePaths, even if the two nodes are not directly parent-child, and we also add a row pointing to the nodes themselves.

Retrieving ancestors and descendants through the TreePaths table is more straightforward than using nested sets. For example, to get a descendant of comment #4:

# descendants.sql
SELECT c.*
FROM Comments AS c
	JOIN TreePaths AS t ON c.comment_id = t.descendant
WHERE t.ancestor = 4;
Copy the code

To get the top of the #6 review list, just search the TreePaths for rows with descendants that don’t have a #6 review list.

SELECT c.*
FROM Comments AS c
	JOIN TreePaths AS t ON c.comment_id = t.ancestor
WHERE t.descendant = 6;
Copy the code

To insert a new leaf node, such as a child of comment #5, first insert a self to your own relationship, then search the TreePaths for the node whose descendants are comments #5 and add the “” ancestor/descendant relationship between the node and the newly inserted” “

INSERT INTO TreePaths(ancestor, descendant)
	SELECT t.ancestor, 8
	FRON TreePaths AS t
	WHERE t.descendant = 5,
UINON ALL
	SELECT 8, 8;
Copy the code

To delete a leaf node such as comment #7, delete all rows in the TreePaths whose descendants are Comment #7:

DELETE FRON TreePaths WHERE descendant = 7;
Copy the code

To delete an entire subtree, such as comment #4 and all its descendants, delete all the rows in the TreePaths with descendants #4 and the rows with comments #4 as descendants:

DELETE FROM TreePaths
WHERE descendant IN (
	FROM Treepaths
	WHERE ancestor = 4
)
Copy the code

Please note that if you delete a record in the TreePaths, you are not actually deleting the comment, which might be strange for the comment system example, but it would make more sense in other types of tree structures.

To move a subtree from one place to another, first disconnect the subtree from its ancestors. All you need to do is find the vertices of the subtree and delete the relationships between all of its children and all of its ancestors.

For example, if you move comment #6 from its current position (child of comment #4) to comment #3, do the following first (make sure you don’t delete self-references from comment #6)

DELETE FROM TreePaths WHERE descendant IN (SELECT descendant FROM TreePaths WHERE ancestor = 6) AND ancestor IN (SELECT ancestor FROM TreePaths WHERE descendant = 6 AND ancestor ! = descendant);Copy the code

Query the ancestors of comment #6 (excluding comment #6 itself), and the descendants of comment #6 (including comment #6 itself), and then delete the relationship between them, which will correctly remove all paths between comment #6’s ancestors to Comment #6 and its descendants.

The isolated tree is then related to the new node and its ancestors. You can use the CROSS JOIN statement to create a Cartesian product between a new node and its ancestors and all the nodes in the isolated tree to establish all the required relationships.

INSERT INTO TreePaths (ancestor, descendant)
	SELECT supertree.ancestor, subtree.descendant
	FROM TreePaths AS supertree
		CROSS JOIN TreePaths AS subtree
	WHERE supertree.descendant = 3
		AND subtree.ancestor = 6;
Copy the code

You can optimize the closure table to make it easier to query immediate parent nodes or children: Adding a path_length field to the TreePaths makes it straightforward to query comment #4’s child nodes by having a self-referent path_length of 0 for a node, 1 for its immediate child nodes, 2 for the next tier, and so on.

SELECT *
FORM TreePaths
WHERE ancestor = 4 AND path_length = 1;
Copy the code