Front-end data structure and Algorithm (8) : Implementation and application of Trie tree, a magic tool for word prefix matching

Data Structures and Algorithms for front-end science (7) : Implementing priority queue from zero – heap and its applications

preface

After binary tree and heap, we will introduce another data structure of tree -Trie tree, which can also be called prefix tree or dictionary tree. For example, after we enter a few keywords in the search engine, the subsequent content will be automatically continued. In this case, the keyword is the prefix, followed by the matching content, and the underlying data structure of such a function is the Trie tree. So what exactly is a Trie tree? Again, there are three steps to get familiar with it: first understand, then implement, and finally apply.

What is a Trie tree?

This is a multi-fork tree, which mainly solves the problem of being able to quickly match a certain string in a set of strings. And its efficiency is based on the idea of a space for time algorithm, because every character in a string becomes a node in a tree, For example, if we Trie a set of words [‘bag’, ‘and’, ‘banana’, ‘ban’, ‘am’, ‘board’, ‘ball’], it looks like this:

The root node is empty because the children store the beginning of the word. The nature of the Trie tree is to combine common prefixes between words. This causes the words ban and banana to share the same path, so you need to give an identifier at the end of the word to indicate that the character is the end of the word. Therefore, when the keyword ba is entered, the words bag, banana and ball can be presented to the user only by traversing the following nodes. Isn’t that cool

Implement a Trie tree from zero

We’ve been talking about binary trees, so it’s convenient to use left and right children, but a Trie tree is a multi-fork tree, and if you store only lowercase letters, each parent node has at most 26 children. So child nodes are stored with a single character as their key, so that no matter how many children there are, it doesn’t matter. Trie has two main operations: one is to add words to the tree and the other is to check if there is a word in the tree.

Class Node {// Constructor (isWord = false) {this.isword = isWord // This.next = new Map() // Class Trie {// Constructor () {this.root = new Node()}... }Copy the code

Add the word to Trie

Break the word into individual characters, each of which is an instance of the Node class. Finally, when the word reaches the end, set the isWord property of the Node Node to true.

class Trie { ... Const _helper = (node, word) => {if (word === "") {// Recurse to the end, Return} const c = word[0] // Start with the first letter of the word if (! Node.next-has (c)) {node.next-set (c, new node ()) // set as a new child node} _helper(node.next-get (c), Word. Slice (1))} _helper(this.root, word) // add to the root node}}Copy the code

With the add method, we can build a Trie tree, but the main purpose of building it is to quickly query, so we also need a search method that can quickly query whether the word is in the Trie tree.

Search for words in Trie (search)

Because there is already a Trie tree, it is also very simple to query, just need to decompose the word to be queried into characters and match with the Trie tree node layer by layer. As long as there is no node in the Trie tree, it can be judged that the Trie tree does not exist this word. After word decomposition, Return the isWord property of the last remaining node.

class Trie { search(word) { const _helper = (node, Const c = word[0] if (!) const c = word[0] if (! Node.next-get (c)) {return _helper(node.next-get (c),} return _helper(node.next-get (c), Slice (1)) // Layer by layer} return _helper(this.root, word) // Start from the root node}}Copy the code

Output each word in the Trie tree (log)

This method simply adds a method to the Trie tree when the person is familiar with it. Each call prints out all the words in the tree for debugging purposes.

class Trie { ... log() { const ret = [] const _helper = (node, Path) => {if (node.isword) {ret.push(path.join('')))} for (const [key, Value] of node.next) {// iterate over each child node path.push(key) // add the word path _helper(value, Path) path.pop() // Traceback}} _helper(this.root, []) console.log(ret)}}Copy the code

Return the word whose prefix matches (match)

This method is a purely personal addition, and many sources that introduce Trie trees do not write about this method. Personally, I feel that this method is a very good combination of Trie tree features, because only for accurate queries, it is really not much better than hash table, red black tree. But if you just return words that match the prefix, this is a big advantage. Such as input method of automatic association, IDE auto – complete functions can be achieved with this method.

class Trie { ... match(prefix) { if (prefix === '') { return [] } let cur = this.root for (let i = 0; i < prefix.length; Const c = prefix[I] if (! Cur.next. Get (c)) {// The prefix does not match, Return []} cur = cur.next. Get (c) const ret = [] const _helper = (node,) const ret = [] const _helper = (node,) Path) => {if (node.isword) {// If it is a word ret.push(prefix + path) // add it to return result} for (const [key, Value] of node.next) {path += key _helper(value, path) // Path = path.slice(0, -1) // backtrace}} _helper(cur, ") // Match down from cur return ret // return result}}Copy the code

Application of Trie tree

First try applying the Trie class we implemented above:

const trie = new Trie(); const words = ['bag', 'and', 'banana', 'an', 'am', 'board', 'ball']; words.forEach(word => { trie.add(word); // Build Trie tree}); Trie. The log () / / print all the words the console log (trie. Match (' ba ')) / / [' ' ', 'banana', 'ball']Copy the code

The most important thing to learn about Trie tree is to learn how it handles problems. Next, let’s take a few dictionary tree related problems and see how to solve them skillfully using Trie tree ideas.

720 – Longest word in the dictionary ↓

Gives an English dictionary consisting of an array of strings words. Find the longest word that is made up of other words in the words dictionary by adding one letter step by step. If there are more than one possible answer, return the word with the smallest lexicographic order in the answer. If there is no answer, an empty string is returned. Example input: Words = [" A ", "banana", "app", "appl", "AP ", "apply", "apple"] Output: "apple" Explanation: both "apply" and "apple" can be composed of words from the dictionary. But "apple" has less lexicographical order than "apply".Copy the code

The simple answer is to find the longest word, but that word must be the sum of all the other words, so there is no jumping. Idea is that we get this dictionary into a Trie tree, the tree to do a good job in the end of the each word in the tag, can only be down is the word matching, so the depth-first traversal, but as long as there is a character is not a word, on the road for the following traversal, finally return to match the length of the longest word can. The dictionary is converted to the Trie tree as shown below:

Obviously banana is eliminated because the first character of the string is not a word, and the final race is for apply and apple, because these two words come from other words along the way. The implementation code is as follows:

class Node { constructor(isWord) { this.isWord = isWord this.next = new Map() } } class Trie { constructor() { this.root = new Node()} add(word) {const _helper = (Node, word) => { if (word === '') { node.isWord = true return } const c = word[0] if (! node.next.get(c)) { node.next.set(c, new Node()) } _helper(node.next.get(c), word.slice(1)) } _helper(this.root, Word)}} var longestWord = function (words) {const trie = new trie () words. ForEach (word => {// build trie tree Trie.add (word)}) let res = "const _helper = (node, Path) = > {the if (path length > res. Length | | (path) length = = = res) length && res > path)) {res = the path / / as long as the match to the word length is greater than the word length certificates } for (const [key, value] of node.next) {// traverses the multi-tree if (! Value.isword) {// As long as this node is not the end of the word, _helper(value, path) // Continue matching path = path.slice(0, -1) // After traversing a branch, Subtracting the branch character}} _helper(trie.root, ") return res // returns the longest word};Copy the code

677 – Key value mapping ↓

Implement two methods in a MapSum class, insert and sum. For the insert method, you get a pair of (string, integer) key-value pairs. Strings represent keys and integers represent values. If the key already exists, the original key-value pair is replaced with the new one. For the method sum, you get a string representing the prefix, and you need to return the sum of all the values of the keys that start with that prefix. Example: Input: INSERT ("apple", 3), output: Null input: sum("ap"), output: 3 Input: INSERT ("app", 2), output: Null input: sum("ap"), output: 5Copy the code

To put it simply, first input some words and their weight, then input the prefix, and then add up the weight value of each matched word. We put the insert word into a Trie tree, and the end of the word is the weight of the word. Therefore, first locate the last node that the prefix stays on, and then traverse all subsequent nodes. Since the weight value at the end of a word is not 0, the weight value of all nodes can be accumulated. The code is as follows:

Constructor (val = 0) {this.val = val; This.next = new Map()}} var MapSum = function () {this.root = new Node()}; MapSum.prototype.insert = function (key, val) { let cur = this.root for (let i = 0; i < key.length; i++) { const c = key[i] if (! cur.next.get(c)) { cur.next.set(c, new Node()) } cur = cur.next.get(c) } cur.val = val }; MapSum.prototype.sum = function (prefix) { let cur = this.root for (let i = 0; i < prefix.length; i++) { const c = prefix[i] if (! Cur = cur.next-get (c)) {return 0} cur = cur.next-get (c); Const _helper = node => {res += node.val;} let res = 0 const _helper = node => {res += node. For (const item of node.next) {_helper(item[1])}} _helper(cur) return res};Copy the code

648 – Word replacement ↓

In English we have a concept called root, which can be followed by other words to form another longer word -- a word we call successors. For example, the root an, followed by the word other, can form the new word another. Now, given a dictionary consisting of many roots and a sentence. You need to replace all inherited words in the sentence with roots. If the inherited word has many roots that can form it, it is replaced with the shortest root. You need to print the sentence after the substitution. Example 1: Input: dictionary = ["cat","bat","rat"], sentence = "the cattle was rattled by the battery" output: Example 2: Input: dictionary = ["a","b","c"], sentence = "aadsfasf absbs bbab cadsfafs" output: "A ab c"Copy the code

Let’s use the same ideaTrieTree, will all the prefixes(root)Build as oneTrieTree, and then iterate to match each word with the prefix tree. When the prefix tree reaches the end, replace the original string with the root. As shown in the figure:

The code is as follows:

class Node { constructor(idWord = false) { this.isWord = idWord this.next = new Map() } } class Trie { constructor() { this.root = new Node() } add(word) { const _helper = (node, word) => { if (word === '') { node.isWord = true return } const c = word[0] if (! node.next.get(c)) { node.next.set(c, new Node()) } _helper(node.next.get(c), word.slice(1)) } _helper(this.root, word) } change(words) { for (let i = 0; i < words.length; i++) { const word = words[i] let cur = this.root let dict = '' for (let j = 0; j < word.length; J ++) {const c = word[j] // iterate over every character of every word if (cur.next-get (c)) {dict += c if the word has a matching root cur = cur.next-get (c) If (cur.isWord) {// Words [I] = dict if (cur.isWord) {// Words [I] = dict if (cur.isWord) {// Words [I] = dict if (cur.isWord) {// Words [I] = dict if (cur.isWord) {// Words [I] = dict if (cur.isWord) {// Words [I] = dict if (cur.isWord) {// }} return words. Join (' ')}} var replaceWords = function (dictionary, Sentence) {const trie = new trie () dictionary.foreach (dict => {trie.add(dict) // build tree}) return Trie.change (sentence.split(' ')) // split words};Copy the code

The Trie tree is three separate branches. If the Trie tree looks like this, there is no need to use the Trie tree at all, so this is also the scenario limitation of using the Trie tree.

The last

Through the above implementation and application, I believe that you have enough understanding of Trie, this is a very good idea to solve problems, scenarios used at the right time, can play a huge advantage. If the scenario doesn’t fit, try not to use this data structure. Because… Let’s summarize the pros and cons of this data structure:

advantages

Performance efficient, the time complexity of matching a word from any number of strings is only the length of the word.
Prefix matching, like searching andIDEFor auto-completion scenarios, useTrieTrees are perfect for that.

disadvantages

Strict data requirements, if the character set does not have many common prefixes(This is the case in question 3)Not doing well. Because each node can store not only lowercase letters, but also uppercase letters, numbers, and so on, oneTrieThe tree is incredibly large and very memory intensive.
JavaScriptThere is no existing class to use, write your own and make sure there is nobugThe trouble.

Data structure section now come to an end, the original planning and line segment tree, sets, hash tables and check, the AVL tree, red and black tree, but the basic it is difficult to use as a front end, the above supplement after the series of chapters, temporarily not die upon, the next chapter we start the algorithm more interesting ~.

Data structures and algorithms (nine) : the implementation of five common sorting algorithms and their advantages and disadvantages

This chapter github source code