Recently, while polishing the full-text search feature in an Electron project, we ran into quite a few pitfalls, most of them related to regular expressions. Regex has always been a weak spot for me, so here is a summary.

SQL wildcards

A full-text search will inevitably use the LIKE keyword in the SQL query, so it is easy to write the following query without thinking:

'select * FROM \'tbl_msg\' WHERE content like \'%'+key+'%\''

Testing then revealed a problem: keywords containing % or _ returned a lot of irrelevant results.

This happens because % and _ are wildcards in a LIKE pattern (% matches any sequence of characters, _ matches any single character), so in the statement above they are not treated as literal text to match. An escape character is needed.

Because the keyword is a variable, every wildcard inside it has to be prefixed with an escape character before the query is built. The escape character itself is declared with the ESCAPE keyword.

// Escape the wildcards in the keyword before executing the SQL query
// (assumes Lodash is available as _)
const escapeTexts = ['%', '_'];
_.each(escapeTexts, item => {
  searchKey = String(searchKey).replace(new RegExp(item, 'g'), `\\${item}`);
});

const sql = 'select * FROM \'tbl_msg\' WHERE content like \'%' + searchKey + '%\' ESCAPE \'\\\'';

Here \ is declared as the escape character, and every % and _ in the keyword is prefixed with it so that it is matched literally.
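The same idea can be sketched as a small helper that also escapes the escape character itself and passes the pattern as a bound parameter instead of concatenating it into the SQL string. This is only a minimal sketch, not the project's actual code: `escapeLikeKeyword` and `buildLikeQuery` are hypothetical names, and the `?` placeholder assumes a database driver that supports positional parameters.

```javascript
// Hypothetical helper: prefix every LIKE wildcard (and the escape
// character itself) with a backslash so that it is matched literally.
function escapeLikeKeyword(keyword) {
  return String(keyword).replace(/[\\%_]/g, (ch) => `\\${ch}`);
}

// Hypothetical helper: return the SQL text and its bound parameters.
// The table and column names come from the query shown earlier.
function buildLikeQuery(keyword) {
  return {
    sql: "SELECT * FROM tbl_msg WHERE content LIKE ? ESCAPE '\\'",
    params: [`%${escapeLikeKeyword(keyword)}%`],
  };
}

const query = buildLikeQuery('100%_done');
console.log(query.sql);       // SELECT * FROM tbl_msg WHERE content LIKE ? ESCAPE '\'
console.log(query.params[0]); // %100\%\_done%
```

Binding the pattern as a parameter also means that quotes in the keyword cannot break the statement, which string concatenation does not guarantee.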

Address book search

Full-text search also covers searching user lists, both in the local database and on the server. In this scenario, punctuation marks and special characters in the keyword carry no meaning and should be filtered out before searching.

// Remove special characters and punctuation (including spaces) from the text
escapePunctuation(text) {
  return text.replace(/[ ~`!@#$%^&*()\-_+=|\\\[\]{};:"',<.>\/?]/g, '');
},
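The character class above only covers ASCII punctuation. If names or keywords may also contain full-width or other non-ASCII punctuation, one possible alternative is a Unicode property escape. This is an assumption on my part rather than what the project used: it needs the `u` flag and an ES2018-capable engine, and `stripPunctuation` is a hypothetical name.

```javascript
// Hypothetical helper: remove every character that Unicode classifies as
// punctuation (\p{P}) or symbol (\p{S}), plus any whitespace.
function stripPunctuation(text) {
  return String(text).replace(/[\p{P}\p{S}\s]/gu, '');
}

console.log(stripPunctuation('Zhang, San (dev)!')); // ZhangSandev
console.log(stripPunctuation('张三，你好！')); // 张三你好
```

The trade-off is that this removes more than the explicit list does (currency signs and emoji, for example), so it only fits if none of those characters are meaningful in the address book.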

Keyword highlighting

When displaying search results, the matching content needs to be highlighted. With JSX, this can be done by replacing the matching content with an HTML string, but that approach has several pitfalls.

  • Escaping special characters

The keyword may contain regex metacharacters, so it is escaped with the escapeRegExp method provided by Lodash before the RegExp is built.

  • Preserving case when replacing

Matching should be case-insensitive, but the replacement must not simply insert the search keyword, because that would overwrite the casing of the original text. Pass a callback to replace and wrap the text that was actually matched.

// keyReg = new RegExp(escapeRegExp(searchKey), 'gi')
text = text.replace(keyReg, (keyText) => `<span class="highlight">${keyText}</span>`)

  • Converting HTML to JSX

Because keyword highlighting turns the text into a string containing HTML, rendering it in JSX requires the dangerouslySetInnerHTML attribute, which is not good for encapsulation or component reuse. A better option is to use the htmr library to convert the HTML string directly into React nodes.

  • Escaping special characters in the text

Before passing the string to htmr, remember that the text being highlighted may itself contain special characters, especially < and >. For example,

123

after being converted by htmr, the text is treated as an HTML tag, which results in an error. Therefore, the first step in keyword highlighting should be to escape special characters.

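  import { escapeRegExp } from 'lodash';
  import convert from 'htmr';

  // Escape the characters that would otherwise be parsed as markup
  const escapeHtml = (html) => {
    return String(html)
      .replace(/&(?!\w+;)/g, '&amp;')
      .replace(/</g, '&lt;')
      .replace(/>/g, '&gt;')
      .replace(/"/g, '&quot;')
      .replace(/'/g, '&#039;');
  };

  const convertHtmlWithSearchKey = (text, searchKey, noWrap) => {
    text = escapeHtml(text);
    // Escape the keyword the same way so it can still match the escaped text
    const keyReg = new RegExp(escapeRegExp(escapeHtml(searchKey)), 'gi');
    // The callback receives the matched text, so the original case is kept
    text = text.replace(keyReg, (keyText) => `<span class="highlight">${keyText}</span>`);
    if (!noWrap) {
      // Turn line breaks into <br> unless wrapping is disabled
      text = text.replace(/(\r\n)|\n|\r/g, '<br>');
    }
    // Convert the HTML string into React nodes
    return convert(text);
  };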
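To see why each step matters, here is a dependency-free sketch of the same pipeline that stops at the HTML string. It is only an illustration under stated assumptions: `escapeRegExpLocal` is a hand-written stand-in for Lodash's `escapeRegExp`, `escapeHtmlBasic` is a reduced version of the escaping step, `highlight` is a hypothetical name, and the final htmr conversion is left out so that the snippet runs in plain Node.js.

```javascript
// Stand-in for Lodash's escapeRegExp: escape regex metacharacters.
const escapeRegExpLocal = (s) => String(s).replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Reduced HTML escaping: only the characters that matter for this demo.
const escapeHtmlBasic = (s) =>
  String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');

// Hypothetical helper: escape first, then wrap case-insensitive matches.
function highlight(text, searchKey) {
  const safeText = escapeHtmlBasic(text);
  // Escape the keyword the same way so it can still match the escaped text.
  const keyReg = new RegExp(escapeRegExpLocal(escapeHtmlBasic(searchKey)), 'gi');
  // The callback receives the matched text, so the original case is kept.
  return safeText.replace(keyReg, (m) => `<span class="highlight">${m}</span>`);
}

console.log(highlight('Learn C++ and c++ tips', 'c++'));
// Learn <span class="highlight">C++</span> and <span class="highlight">c++</span> tips

console.log(highlight('if a<b then', 'A<B'));
// if <span class="highlight">a&lt;b</span> then
```

Without the regex escaping, `new RegExp('c++')` throws a SyntaxError. One limitation remains in this sketch: a keyword such as `lt` or `amp` can match inside an entity produced by the escaping step, so a production version would need to match against the raw text and escape each segment separately.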