Problem description

Given the following table of address information, we need to implement a search that returns possible matches for whatever the user has typed.

CREATE TABLE Address (
    id BIGINT NOT NULL AUTO_INCREMENT,
    address VARCHAR(100) NOT NULL DEFAULT '',
    city VARCHAR(50) NOT NULL DEFAULT '',
    state VARCHAR(50) NOT NULL DEFAULT '',
    country VARCHAR(50) NOT NULL DEFAULT '',
    zip_code VARCHAR(10) NOT NULL DEFAULT '',
    PRIMARY KEY (id),
    FULLTEXT ftidx_location(address, city, state, country, zip_code)
) ENGINE=INNODB DEFAULT CHARSET=utf8;

insert into Address(city, state) values ('Irving', 'TX');

The obvious approach is a query like the following, where ${input} stands for the user's input:

select * from Address where match(address, city, state, country, zip_code) against ('${input}');

However, when a word in the input is too short, such as "TX", this query returns nothing even if rows with state = 'TX' exist. Likewise, typing "Irvin" finds nothing. The analysis and solutions below use "Irvin,TX" (without the quotes) as the user input.

Cause analysis

The MySQL variables that control FULLTEXT index behavior can be listed as follows:

mysql> SHOW VARIABLES LIKE '%ft%';
+---------------------------------+----------------+
| Variable_name                   | Value          |
+---------------------------------+----------------+
| ft_boolean_syntax               | + -><()~*:""&| |
| ft_max_word_len                 | 84             |
| ft_min_word_len                 | 4              |
| ft_query_expansion_limit        | 20             |
| ft_stopword_file                | (built-in)     |
| innodb_ft_aux_table             |                |
| innodb_ft_cache_size            | 8000000        |
| innodb_ft_enable_diag_print     | OFF            |
| innodb_ft_enable_stopword       | ON             |
| innodb_ft_max_token_size        | 84             |
| innodb_ft_min_token_size        | 3              |
| innodb_ft_num_word_optimize     | 2000           |
| innodb_ft_result_cache_limit    | 2000000000     |
| innodb_ft_server_stopword_table |                |
| innodb_ft_sort_pll_degree       | 2              |
| innodb_ft_total_cache_size      | 640000000      |
| innodb_ft_user_stopword_table   |                |
+---------------------------------+----------------+

A FULLTEXT index is built from words. MySQL's default parser treats every character that is not a letter or digit as a word delimiter (Chinese text does not separate words this way; for Chinese, you can use MySQL's built-in ngram full-text parser plugin). Under this rule, the input "Irvin,TX" is split into two words: "Irvin" and "TX".
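A minimal Python sketch of the splitting rule just described, assuming the simplified rule that every character other than a letter or digit is a delimiter. The function name split_words is hypothetical, and MySQL's real built-in parser has extra rules (for example, around apostrophes) that this sketch ignores.

```python
import re
from typing import List


def split_words(text: str) -> List[str]:
    """Split text into words using the simplified rule from the article:
    any character that is not a letter or digit acts as a delimiter.
    (Hypothetical helper; MySQL's real parser has additional rules.)"""
    # [^\W_] matches word characters except underscore, i.e. letters and digits.
    return re.findall(r"[^\W_]+", text)


print(split_words("Irvin,TX"))          # ['Irvin', 'TX']
print(split_words("Irving, TX 75038"))  # ['Irving', 'TX', '75038']
```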

innodb_ft_min_token_size is the minimum length of a word that gets indexed; with the value 3 shown above, InnoDB indexes only words of three or more characters (the MyISAM equivalent, ft_min_word_len, is 4). MySQL therefore never creates an index entry for "TX", which is why a search for "TX" finds nothing. "Irvin" fails for a different reason: a FULLTEXT index stores whole words, so the index contains an entry for "Irving" but none for "Irvin".
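To make the two failure modes concrete, here is a toy Python model of a whole-word index that drops words shorter than a minimum token size. It is not MySQL's actual implementation: it assumes case-insensitive matching (as with the default case-insensitive collations) and the simplified splitting rule, and build_index and search are hypothetical names.

```python
import re
from typing import Set


def build_index(row_text: str, min_token_size: int) -> Set[str]:
    """Toy FULLTEXT index: store whole words only, skipping short ones."""
    words = re.findall(r"[^\W_]+", row_text.lower())
    return {w for w in words if len(w) >= min_token_size}


def search(word: str, index: Set[str], min_token_size: int) -> bool:
    """A query word matches only if it is long enough and stored verbatim."""
    word = word.lower()
    return len(word) >= min_token_size and word in index


# The sample row: city = 'Irving', state = 'TX'.
row = "Irving TX"

idx3 = build_index(row, min_token_size=3)  # innodb_ft_min_token_size = 3
print(sorted(idx3))                        # ['irving']  ("tx" is never indexed)
print(search("TX", idx3, 3))               # False: too short
print(search("Irvin", idx3, 3))            # False: only the whole word "irving" is stored
print(search("Irving", idx3, 3))           # True

idx2 = build_index(row, min_token_size=2)  # after lowering the limit to 2
print(search("TX", idx2, 2))               # True
```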

The solution

Method 1: Modify the FULLTEXT INDEX configuration

Lower the minimum word length to 2 so that two-letter words are indexed, and use IN BOOLEAN MODE to match incomplete words.

Set the minimum word length to 2 so that two-letter words are indexed

A value of 1 seems too small: few meaningful words are a single letter, and a very small value makes the index file much larger, which slows down index updates. So we set it to 2. Note that MyISAM uses the ft_min_word_len variable, whereas InnoDB uses innodb_ft_min_token_size.

Before the change, the following query returns an empty result even if the table contains rows with state = 'TX':

select * from Address where match(address, city, state, country, zip_code) against ('TX'); 
  1. Edit my.cnf and add the settings under the [mysqld] section.

    sudo vim /etc/mysql/my.cnf

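    Settings to add:

    [mysqld]
    # minimum indexed word length for InnoDB FULLTEXT indexes
    innodb_ft_min_token_size=2
    # minimum indexed word length for MyISAM FULLTEXT indexes
    ft_min_word_len=2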
  2. Restart the MySQL service.

    sudo service mysql restart
  3. Rebuild the FULLTEXT index, since the new minimum length only applies to indexes built after the change.

    MyISAM tables must be repaired manually to rebuild the index (see "What to do when 'repair table' query won't work in MySQL?" and "Fine-Tuning MySQL Full-Text Search", or its Chinese translation, in the references):

    REPAIR TABLE Address QUICK;

    For InnoDB tables, the following statement rebuilds the table together with its indexes. It acquires a read lock on the table while it runs.

    ALTER TABLE Address ENGINE=INNODB;

    OPTIMIZE TABLE does the same job and performs additional optimization. MySQL locks the table while OPTIMIZE TABLE is running.

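    OPTIMIZE TABLE Address;

    For an InnoDB table, MySQL responds with the note "Table does not support optimize, doing recreate + analyze instead". This is expected: the table is rebuilt and then analyzed.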
  4. Verify that the new settings are in effect.

    show variables like 'innodb_ft_min_token_size'; 
    show variables like 'ft_min_word_len'; 

    Now, if the table contains rows with state = 'TX', the following query returns them:

    select * from Address where match(address, city, state, country, zip_code) against ('TX'); 

Use IN BOOLEAN MODE to match incomplete words

Users type from left to right, so if the input contains several words, every word except the last one is complete; only the rightmost word may be incomplete. Appending the * wildcard to the end of the user's input therefore lets that last word match as a prefix. For example, if the user types "TX,Irvin", the query becomes:

select * from Address where match(address, city, state, country, zip_code) against ('TX,Irvin*' IN BOOLEAN MODE);
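A small Python sketch of the program-side step, assuming the input has already been cleaned of ft_boolean_syntax operators (see the note on ft_boolean_syntax). add_prefix_wildcard and SEARCH_SQL are hypothetical names; the query uses the %s placeholder style of DB-API drivers such as PyMySQL or mysql-connector-python, so the input is passed as a bound parameter instead of being pasted into the SQL text.

```python
def add_prefix_wildcard(user_input: str) -> str:
    """Append the boolean-mode * operator so that the last, possibly
    incomplete, word is matched as a prefix. Assumes operator characters
    have already been removed from user_input."""
    text = user_input.strip()
    return text + "*" if text else ""


# %s is the DB-API placeholder used by PyMySQL / mysql-connector-python;
# the driver quotes the bound value, so user input never becomes raw SQL.
SEARCH_SQL = (
    "select * from Address "
    "where match(address, city, state, country, zip_code) "
    "against (%s IN BOOLEAN MODE)"
)

print(add_prefix_wildcard("TX,Irvin"))  # TX,Irvin*
# With an open cursor (not shown):
# cursor.execute(SEARCH_SQL, (add_prefix_wildcard("TX,Irvin"),))
```

Returning an empty string for blank input lets the caller skip the query entirely rather than send a lone "*".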

This method depends on the MySQL server configuration. If you decide to use a FULLTEXT INDEX, set these options when the database is first created: changing them later requires a restart and an index rebuild, which can disrupt a system that is already in production.

Note: If the user's input itself contains operator characters listed in ft_boolean_syntax, the program must clean it up first to avoid invalid queries such as 'Irvin**'. Alternatively, you can restrict the characters in ft_boolean_syntax.
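One possible way to do that clean-up, sketched in Python: treat every character of the default ft_boolean_syntax value shown earlier (except the space) as an operator, replace it with a space, and collapse the whitespace. FT_BOOLEAN_SYNTAX and sanitize_boolean_input are hypothetical names, and if your server uses a non-default ft_boolean_syntax the character set must be updated to match.

```python
# Default value of ft_boolean_syntax from the SHOW VARIABLES output above.
FT_BOOLEAN_SYNTAX = '+ -><()~*:""&|'
# Every character except the space is treated as an operator to strip.
OPERATOR_CHARS = set(FT_BOOLEAN_SYNTAX) - {" "}


def sanitize_boolean_input(user_input: str) -> str:
    """Replace boolean-mode operator characters with spaces and normalize
    whitespace, so a single trailing * can be appended to a non-empty result."""
    cleaned = "".join(" " if ch in OPERATOR_CHARS else ch for ch in user_input)
    return " ".join(cleaned.split())


print(sanitize_boolean_input("Irvin**"))       # Irvin
print(sanitize_boolean_input('+TX -"Irvin"'))  # TX Irvin
```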

Method 2: Use ‘LIKE’ for matching

Replace the FULLTEXT INDEX search with LIKE. The program splits the input into words following MySQL's rule and puts the % wildcard between the words (and at both ends):

select * from Address where concat(address, ',', city, ',', state, ' ', zip_code, ' ', country) like '%Irvin%TX%';
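The program-side part of this method, sketched in Python under the same simplified splitting rule: split the input into words and join them with %. Because the splitter keeps only letters and digits, the LIKE metacharacters % and _ cannot come from user input. to_like_pattern is a hypothetical name, and the resulting pattern should still be passed to the query as a bound parameter.

```python
import re
from typing import Optional


def to_like_pattern(user_input: str) -> Optional[str]:
    """Build a LIKE pattern such as '%Irvin%TX%' from raw user input.

    Words are runs of letters/digits (underscore excluded), so the LIKE
    wildcards % and _ never come from the user. Returns None for input
    with no words, since '%' alone would match every row."""
    words = re.findall(r"[^\W_]+", user_input)
    if not words:
        return None
    return "%" + "%".join(words) + "%"


print(to_like_pattern("Irvin,TX"))  # %Irvin%TX%
print(to_like_pattern(" ,, "))      # None
```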

This method cannot use an index on the address data: because the pattern starts with a wildcard and is matched against a concatenated expression, every query scans the whole table. In addition, LIKE computes no relevance score, so the order of the results is unrelated to how well each row matches.

Note: LIKE can use an index only when the pattern does not begin with a wildcard. For example, 'Irvin%' can use an index, but '%Irvin' cannot.

Method 3: Add a full_address field and match it with LIKE

Add a full_address column that stores address, city, state, zip_code, and country concatenated together.

select * from Address where full_address like '%Irvin%TX%';
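Because full_address is derived data, whatever code writes a row must also fill it in. Below is a hypothetical Python helper that produces the same string as the concat() expression in Method 2, assuming the application (rather than the database) maintains the column.

```python
def build_full_address(address: str = "", city: str = "", state: str = "",
                       zip_code: str = "", country: str = "") -> str:
    """Mirror concat(address, ',', city, ',', state, ' ', zip_code, ' ', country)
    so rows written by the application match the Method 2 patterns.
    Empty-string defaults match the NOT NULL DEFAULT '' columns of the table."""
    return f"{address},{city},{state} {zip_code} {country}"


print(repr(build_full_address(city="Irving", state="TX")))  # ',Irving,TX  '
```

The returned value would be written to full_address in the same INSERT or UPDATE that sets the individual columns, so the two never drift apart.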

This method trades space for time: it avoids the cost in Method 2 of concatenating the strings on every query. It still scans the whole table, because the pattern begins with a wildcard.

Method 4: Use a professional search engine

Use a dedicated search engine such as Elasticsearch or Solr.

References

  • Why set ft_min_word_len=1 for MySQL @CSDN
  • Using FULLTEXT indexes in MySQL @cnblogs
  • What to do when 'repair table' query won't work in MySQL? @Stack Overflow
  • Fine-Tuning MySQL Full-Text Search @MySQL
  • Fine-tuning MySQL full-text search (Chinese translation) @docs4dev
  • FullText search InnoDB fails, MyISAM returns results @Stack Overflow
  • The importance of OPTIMIZE TABLE in MySQL @51yip
  • Chinese full-text indexing with the InnoDB/MyISAM storage engines @51CTO