Say you have a piece of text and a huge keyword dictionary (often hundreds of thousands of entries). You need to find every keyword that occurs in the text, and perhaps replace each one based on the value associated with it. What do you do?

For such a simple and common requirement, strtr, str_replace, preg_replace and the rest all fall short, because the dictionary is too large.
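For comparison, here is a minimal sketch of the obvious approach in PHP: put the whole keyword => replacement map into an array and pass it to strtr(). The dictionary entries are made-up examples. This works for a handful of keywords. With hundreds of thousands of entries, though, the entire map has to be held in memory as a PHP array, and each replacement must be a fixed string rather than something computed from the keyword's value.

```php
<?php
// Baseline for comparison: the whole keyword => replacement map sits in a
// PHP array and strtr() does the matching. The entries are made-up examples.
$dict = [
    'apple'     => '<b>apple</b>',
    'apple pie' => '<b>apple pie</b>',
    'banana'    => '<i>banana</i>',
];

$text = 'I like apple pie and banana.';

// With an array argument, strtr() prefers the longest key at each position
// and does not re-scan text it has already replaced.
$result = strtr($text, $dict);
echo $result, PHP_EOL; // I like <b>apple pie</b> and <i>banana</i>.
```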

SCWS? That is what I tried first, but it can only tell you which dictionary words appear in the text; it cannot replace them based on each keyword's value. You can work around this by storing the value in the dictionary's attribute field, but that field only holds two bytes, which is obviously not enough.

I searched Google for a long time without finding an existing wheel that fit, so I put together a dictionary tool myself: SimplBenedict.

  1. Simple. Pure PHP: no extensions to install, and no dependency on caches such as XCache, Memcache or Redis.
  2. Practical. Supports large dictionaries; my own 400,000-entry dictionary runs without trouble. It also supports callback-based replacement!
  3. Fast. Matching uses a trie, so lookup time grows with the length of the text (and of the longest keyword), not with the number of entries in the dictionary. The dictionary is stored as a binary file and only a single file handle is kept open at run time, which avoids the huge memory overhead of a traditional in-memory trie.
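To make the trie idea concrete, here is a minimal in-memory sketch in PHP. It is not SimplBenedict's code or API: the class KeywordTrie and its add() and replace() methods are hypothetical names, and the trie lives in nested PHP arrays rather than in a binary file, so it has exactly the memory overhead that the binary-file design is meant to avoid. What it does show is why dictionary size does not affect speed: at each position in the text, matching walks down the trie at most as many steps as the longest keyword, however many entries there are. Replacement takes a callback that receives the matched word and its value, and the longest match wins.

```php
<?php
// Hypothetical sketch: an in-memory trie (nested arrays, one level per byte)
// used for longest-match keyword lookup with callback replacement.
// Not SimplBenedict's implementation, API, or on-disk format.

final class KeywordTrie
{
    // Marks "a keyword ends at this node" and holds its value. Byte keys are
    // single characters, so this two-character key cannot collide with them.
    private const END = '$$';

    private array $root = [];

    public function add(string $keyword, $value): void
    {
        $node = &$this->root;
        $len = strlen($keyword);
        for ($i = 0; $i < $len; $i++) {
            $byte = $keyword[$i];
            if (!isset($node[$byte])) {
                $node[$byte] = [];
            }
            $node = &$node[$byte]; // descend one byte
        }
        $node[self::END] = $value;
        unset($node); // drop the reference into the trie
    }

    // Scans $text left to right. At each position the longest keyword that
    // starts there is replaced by $callback($word, $value); otherwise one byte
    // is copied through. Work per position is bounded by the longest keyword,
    // not by the number of keywords.
    public function replace(string $text, callable $callback): string
    {
        $out = '';
        $len = strlen($text);
        $i = 0;
        while ($i < $len) {
            $node = $this->root;
            $matchEnd = -1;
            $matchValue = null;
            for ($j = $i; $j < $len && isset($node[$text[$j]]); $j++) {
                $node = $node[$text[$j]];
                if (array_key_exists(self::END, $node)) {
                    $matchEnd = $j + 1; // remember the longest match so far
                    $matchValue = $node[self::END];
                }
            }
            if ($matchEnd !== -1) {
                $word = substr($text, $i, $matchEnd - $i);
                $out .= $callback($word, $matchValue);
                $i = $matchEnd;
            } else {
                $out .= $text[$i];
                $i++;
            }
        }
        return $out;
    }
}

$trie = new KeywordTrie();
$trie->add('apple', 'fruit');
$trie->add('apple pie', 'dessert');
$trie->add('banana', 'fruit');

$result = $trie->replace('I like apple pie and banana.', function (string $word, $value): string {
    return $word . '[' . $value . ']';
});
echo $result, PHP_EOL; // I like apple pie[dessert] and banana[fruit].
```

Matching works on raw bytes, which is safe for UTF-8 text such as Chinese: a keyword always begins with a lead byte, so it can never match starting in the middle of a multi-byte character.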

If you need something like this, feel free to give it a try 😛