InnoDB full text index

FULLTEXT indexes are created on text-based columns (CHAR, VARCHAR, or TEXT columns) to help speed up queries and DML operations on the data contained in those columns, ignoring any words defined as stop words.

A FULLTEXT INDEX is defined as a part of a CREATE TABLE description or used to add ALTER TABLE or CREATE INDEX to an existing TABLE.

Use the MATCH ()… AGAINST syntax performs full-text search. For usage information, see full-text search capabilities.

InnoDB full-text index design

InnoDB FULLTEXT index has inverted index design. The inverted index stores a list of words, and for each word, stores a list of documents in which the word appears. To support proximity search, the location information for each word is also stored as a byte offset.

InnoDB full-text index table

When InnoDB FULLTEXT index is created, a set of index tables are created, as shown in the following example:

mysql> CREATE TABLE opening_lines (
       id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
       opening_line TEXT(500),
       author VARCHAR(200),
       title VARCHAR(200),
       FULLTEXT idx (opening_line)
       ) ENGINE=InnoDB;

mysql> SELECT table_id, name, space from INFORMATION_SCHEMA.INNODB_SYS_TABLES
       WHERE name LIKE 'test/%';
+----------+----------------------------------------------------+-------+
| table_id | name                                               | space |
+----------+----------------------------------------------------+-------+
|      333 | test/FTS_0000000000000147_00000000000001c9_INDEX_1 |   289 |
|      334 | test/FTS_0000000000000147_00000000000001c9_INDEX_2 |   290 |
|      335 | test/FTS_0000000000000147_00000000000001c9_INDEX_3 |   291 |
|      336 | test/FTS_0000000000000147_00000000000001c9_INDEX_4 |   292 |
|      337 | test/FTS_0000000000000147_00000000000001c9_INDEX_5 |   293 |
|      338 | test/FTS_0000000000000147_00000000000001c9_INDEX_6 |   294 |
|      330 | test/FTS_0000000000000147_BEING_DELETED            |   286 |
|      331 | test/FTS_0000000000000147_BEING_DELETED_CACHE      |   287 |
|      332 | test/FTS_0000000000000147_CONFIG                   |   288 |
|      328 | test/FTS_0000000000000147_DELETED                  |   284 |
|      329 | test/FTS_0000000000000147_DELETED_CACHE            |   285 |
|      327 | test/opening_lines                                 |   283 |
+----------+----------------------------------------------------+-------+
Copy the code

The first six tables represent inverted indexes and are called secondary index tables. When an incoming document is tagged, individual words (also known as “tags”) are inserted into the index table along with location information and the associated document ID (DOC_ID). Words are fully sorted and partitioned in six index tables based on the character set sorting weight of the first character of the word.

Inverted indexes are divided into six secondary index tables to support parallel index creation. By default, two threads mark, sort, and insert words and related data in the index table. The number of threads can be configured using the innodb_ft_sort_pll_degree option. FULLTEXT When creating indexes on large tables, consider increasing the number of threads.

Secondary index table names are prefixed with FTS_ and suffixed with INDEX_*. Each index table is associated with the index table by a hexadecimal value matching the table_ID index table in the index table name. For example, the test/ Opening_lines table described in table_id is 327, for which the hexadecimal value is 0x147. As shown in the previous example, the hex value “147” appears in the name of the index table associated with the test/ OpenING_lines table.

The FULLTEXT index representing the hexadecimal value index_id also appears in the secondary index table name. For example, in the secondary table name test/FTS_0000000000000147_00000000000001c9_INDEX_1, the hex value 1C9 has the decimal value 457. The index information_schema.innoDB_SYS_INDEXES defined on the Opening_lines table (IDX) can be identified by querying this value (457) in the table.

mysql> SELECT index_id, name, table_id, space from INFORMATION_SCHEMA.INNODB_SYS_INDEXES
       WHERE index_id=457;
+----------+------+----------+-------+
| index_id | name | table_id | space |
+----------+------+----------+-------+
|      457 | idx  |      327 |   283 |
+----------+------+----------+-------+
Copy the code

If the primary table is created in each table file table space, the index table is stored in its own table space.

The other index tables shown in the previous example are called generic index tables and are used for deletion processing and storing the internal state of the FULLTEXT index. Unlike the inverted index tables created for each full-text index, this set of tables is common to all full-text indexes created on a particular table.

Even if full-text indexes are removed, the common secondary tables are retained. When deleting a full-text index, leave FTS_DOC_ID as the FTS_DOC_ID column created by the index, because deleting this column will require rebuilding the table. A generic axillary table is required to manage the FTS_DOC_ID column.

  • FTS_ * _DELETED and FTS_ * _DELETED_CACHE

    The document ID (DOC_ID) that contains documents that have been deleted but whose data has not been removed from the full-text index. The FTS_*_DELETED_CACHE is the FTS_*_DELETED table of the memory version.

  • FTS_ * _BEING_DELETED and FTS_ * _BEING_DELETED_CACHE

    The document ID (DOC_ID) containing deleted documents whose data is currently being deleted from the full-text index. The FTS_*_BEING_DELETED_CACHE table is the memory version of the table FTS_*_BEING_DELETED.

  • FTS_*_CONFIG

    Stores information about the internal state of the FULLTEXT index. Most importantly, it stores FTS_SYNCED_DOC_ID, which is used to identify documents that have been parsed and flushed to disk. In the case of crash recovery, FTS_SYNCED_DOC_ID will use a value to identify documents that have not been flushed to disk so that they can be reparsed and added back to the FULLTEXT index cache. To view the data in this table, query the information_schema.innodb_ft_config table.

InnoDB full-text index cache

Once the document is inserted, it is tokenized and the individual words and associated data are inserted into the FULLTEXT index. Even for small documents, this process can result in a large number of small insert secondary index tables, making simultaneous access to them a point of contention. To avoid this problem, InnoDB uses the FULLTEXT index cache to temporarily cache index table inserts for the most recently inserted rows. The cache structure in this memory will remain inserted until the cache is full, and then it will be flushed to disk in batches (to secondary index tables). You can query the information_schema.innodb_fT_index_cache table to see the tokenized data for the most recently inserted rows.

The cache and batch refresh behavior avoids frequent updates to secondary index tables, which can cause concurrent access problems during busy insert and update times. Batch processing also avoids multiple insertions of the same word and minimizes repeated typing. Instead of refreshing each word individually, inserts of the same word are combined and flushed to disk as a single entry, thus increasing insert efficiency while keeping the secondary index table as small as possible.

The innodb_fT_cache_size variable is used to configure the full-text index cache size (per table), which affects how often the full-text index cache is refreshed. You can also use this innodb_ft_total_cache_size option to define a global full-text index cache size limit for all tables in a given instance.

Full-text index caches store the same information as secondary index tables. However, the full-text index cache caches only the tokenized data of the most recently inserted row. Data flushed to disk (full-text auxiliary table) is not brought back to the full-text index cache when queried. The data in the secondary index table is queried directly, the results in the secondary index table are merged with the results in the full-text index cache, and then returned.

InnoDB full-text index document ID and FTS_DOC_ID columns

InnoDB uses a unique document identifier called document ID (DOC_ID) to map words in the full-text index to the document record where the word appears. The mapping requires FTS_DOC_ID to have a column on the index table. If FTS_DOC_ID does not define a column, InnoDB automatically adds a hidden FTS_DOC_ID column when the full-text index is created. The following example demonstrates this behavior.

The following table definition does not include the FTS_DOC_ID column:

mysql> CREATE TABLE opening_lines (
       id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
       opening_line TEXT(500),
       author VARCHAR(200),
       title VARCHAR(200)
       ) ENGINE=InnoDB;
Copy the code

When you CREATE a full-text INDEX on a table using the CREATE FULLTEXT INDEX syntax, a warning is returned reporting that InnoDB is rebuilding the table to add FTS_DOC_ID columns.

mysql> CREATE FULLTEXT INDEX idx ON opening_lines(opening_line);
Query OK, 0 rows affected, 1 warning (0.19 sec)
Records: 0  Duplicates: 0  Warnings: 1

mysql> SHOW WARNINGS;
+---------+------+--------------------------------------------------+
| Level   | Code | Message                                          |
+---------+------+--------------------------------------------------+
| Warning |  124 | InnoDB rebuilding table to add column FTS_DOC_ID |
+---------+------+--------------------------------------------------+
Copy the code

The same warning is returned when used for ALTER TABLE to add a full-text index to a TABLE that does not have a FTS_DOC_ID column. If you CREATE the full-text index CREATE TABLE once and do not specify the FTS_DOC_ID column, InnoDB adds hidden FTS_DOC_ID column without warning.

Defining columns is much cheaper in time than FTS_DOC_ID creating a full-text index on a TABLE where CREATE TABLE has already loaded data. If FTS_DOC_ID defines columns on the table before loading the data, new columns can be added without rebuilding the table and its indexes. If you don’t care about CREATE FULLTEXT INDEX performance, ignore the FTS_DOC_ID column and InnoDB will CREATE performance for you. InnoDB creates a hidden FTS_DOC_ID column and a unique index () on the FTS_DOC_ID_INDEX column. If you want to create your own FTS_DOC_ID column, you must define the column as BIGINT UNSIGNED NOT NULL and name it FTS_DOC_ID (all caps), as shown in the following example:

Pay attention to

The FTS_DOC_ID column does not need to be defined as an AUTO_INCREMENT column, but AUTO_INCREMENT makes loading data much easier.

mysql> CREATE TABLE opening_lines (
       FTS_DOC_ID BIGINT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
       opening_line TEXT(500),
       author VARCHAR(200),
       title VARCHAR(200)
       ) ENGINE=InnoDB;
Copy the code

If you choose to define your own FTS_DOC_ID column, it is your responsibility to manage that column to avoid null or duplicate values. The FTS_DOC_ID value cannot be reused, which means that the FTS_DOC_ID value must be constantly increased.

(Optional) You can FTS_DOC_ID_INDEX create the required unique (all caps) on the FTS_DOC_ID column.

mysql> CREATE UNIQUE INDEX FTS_DOC_ID_INDEX on opening_lines(FTS_DOC_ID);
Copy the code

If you do not create FTS_DOC_ID_INDEX, InnoDB will do so automatically.

Prior to MySQL 5.7.13, the maximum allowable gap between a used FTS_DOC_ID value and a new FTS_DOC_ID value was 10000. In MySQL 5.7.13 and later, the allowed gap is 65535.

To avoid rebuilding the table, FTS_DOC_ID deletes the full-text index and preserves the column.

InnoDB full-text index deletion processing

Deleting records with full-text index columns can result in many small deletions in secondary index tables, making concurrent access to those tables a point of contention. To avoid this problem, whenever a DOC_ID record is deleted from the index table, the document ID () of the deleted document is recorded in the special FTS_*_DELETED table, and the index record remains in the full-text index. Before the query result is returned, the FTS_*_DELETED table is used to filter the ids of deleted documents. The advantage of this design is that deletion is quick and cheap. The disadvantage is that deleting a record does not immediately reduce the size of the index. To delete the full-text index entry of the deleted record, the OPTIMIZE TABLE runs Innodb_Optimize_fullTEXT_only =ON ON the TABLE with the index to rebuild the full-text index. For more information, see Optimizing InnoDB full-text Index.

InnoDB full-text index transactions

The InnoDB FULLTEXT index has special transactional properties due to its caching and batch behavior. Specifically, FULLTEXT index updates and inserts are handled at transaction commit time, which means FULLTEXT search can only see committed data. The following example demonstrates this behavior. The FULLTEXT search returns only the result after the inserted row has been committed.

mysql> CREATE TABLE opening_lines (
       id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
       opening_line TEXT(500),
       author VARCHAR(200),
       title VARCHAR(200),
       FULLTEXT idx (opening_line)
       ) ENGINE=InnoDB;

mysql> BEGIN;

mysql> INSERT INTO opening_lines(opening_line,author,title) VALUES
       ('Call me Ishmael.'.'Herman Melville'.'Moby-Dick'),
       ('A screaming comes across the sky.'.'Thomas Pynchon'.'Gravity\'s Rainbow'), ('I am an invisible man.', 'Ralph Ellison', 'Invisible Man'), ('Where now? Who now? When now?', 'Samuel Beckett', 'The Unnamable'), ('It was love at first sight.', 'Joseph Heller', 'Catch- 22'), ('All this happened, more or less.', 'Kurt Vonnegut', 'Slaughterhouse-Five'), ('Mrs. Dalloway said she would buy the flowers herself.', 'Virginia Woolf', 'Mrs. Dalloway'), ('It was a pleasure to burn.', 'Ray Bradbury', 'Fahrenheit 451'); mysql> SELECT COUNT(*) FROM opening_lines WHERE MATCH(opening_line) AGAINST('Ishmael'); +----------+ | COUNT(*) | +----------+ | 0 | +----------+ mysql> COMMIT; mysql> SELECT COUNT(*) FROM opening_lines WHERE MATCH(opening_line) AGAINST('Ishmael'); +----------+ | COUNT(*) | +----------+ | 1 | +----------+Copy the code
Monitor InnoDB full-text index

You can monitor and check specific text processing aspects of the index by querying the INFORMATION_SCHEMA table under InnoDB FULLTEXT:

  • INNODB_FT_CONFIG
  • INNODB_FT_INDEX_TABLE
  • INNODB_FT_INDEX_CACHE
  • INNODB_FT_DEFAULT_STOPWORD
  • INNODB_FT_DELETED
  • INNODB_FT_BEING_DELETED

You can also FULLTEXT by querying INNODB_SYS_INDEXES and viewing basic information about indexes and tables INNODB_SYS_TABLES.

For more information, see “InnoDB INFORMATION_SCHEMA FULLTEXT Index Table.”

More content welcome to pay attention to my personal public number “Han Elder brother has words”, 100G artificial intelligence learning materials, a large number of back-end learning materials waiting for you to take.