One: Introduction to Sphinx

Sphinx is a SQL-based full-text search engine that can be combined with MySQL or PostgreSQL to do full-text search. It provides more specialized search features than the database itself, which makes it easier for an application to implement professional full-text search. Sphinx provides search APIs designed for scripting languages such as PHP, Python, Perl, and Ruby, and it also provides a storage engine plug-in for MySQL.

Two: Sphinx features

1. High-speed indexing (peak performance up to 10 MB/s on modern CPUs).

2. High-performance search (average response time per query under 0.1 seconds on 2-4 GB of text data).

3. Handles massive data sets (known to process more than 100 GB of text, or 100 million documents on a single-CPU system).

4. Provides good relevance ranking, using a combined ranking method based on phrase proximity and statistics (BM25).

5. Supports distributed search.

6. Supports phrase search.

7. Generates document excerpts (summaries).

8. Can be used as a MySQL storage engine to provide search services.

9. Supports Boolean, phrase, word proximity, and other retrieval modes.

10. Each document supports multiple full-text search fields (no more than 32).

11. Each document supports multiple additional attributes (e.g. grouping information, timestamps).

12. Supports word segmentation.
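To make the BM25 part of feature 4 a little more concrete, here is a minimal sketch of the textbook Okapi BM25 formula in Python. It is not Sphinx's actual ranker (which also takes phrase proximity into account); the function name `bm25_scores` and the parameter values `k1=1.2` and `b=0.75` are assumptions chosen for illustration.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with classic Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative even for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer documents need more hits to score the same.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy corpus (illustrative data only).
docs = [
    "sphinx full text search engine".split(),
    "mysql storage engine".split(),
    "full text search with sphinx and mysql".split(),
]
scores = bm25_scores(["sphinx", "search"], docs)
```

The first and third documents both contain each query term once, but the shorter one scores higher because of the length normalization; the second document matches nothing and scores zero.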

Three: The Sphinx execution process

Step 1: Use Sphinx to read data from MySQL and build the index files

Step 2: Use PHP to query Sphinx and get back the matching IDs

Step 3: Use those IDs to fetch the actual data from MySQL
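The three steps can be sketched end to end in plain Python. This is only a toy model under stated assumptions: an in-memory SQLite database stands in for MySQL, a simple word-to-ID dictionary stands in for the Sphinx index, and the table name `articles` and the helper functions are hypothetical.

```python
import sqlite3
from collections import defaultdict

# In-memory SQLite stands in for the MySQL data source (assumption for portability).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
db.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [
        (1, "Intro to Sphinx", "sphinx is a full text search engine"),
        (2, "MySQL tips", "tuning mysql queries and indexes"),
        (3, "Search at scale", "distributed full text search with sphinx"),
    ],
)

def build_index(conn):
    """Step 1: read every row and build an inverted index (word -> set of IDs)."""
    index = defaultdict(set)
    for doc_id, title, body in conn.execute("SELECT id, title, body FROM articles"):
        for word in f"{title} {body}".lower().split():
            index[word].add(doc_id)
    return index

def search_ids(index, query):
    """Step 2: return the IDs of documents that contain all query words."""
    words = query.lower().split()
    if not words:
        return []
    ids = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(ids)

def fetch_rows(conn, ids):
    """Step 3: fetch the actual rows from the database by ID."""
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    sql = f"SELECT id, title FROM articles WHERE id IN ({placeholders}) ORDER BY id"
    return conn.execute(sql, ids).fetchall()

index = build_index(db)
ids = search_ids(index, "sphinx search")
rows = fetch_rows(db, ids)
```

The point of the split is that the search step only ever returns IDs; the database is touched again only for the handful of rows that will actually be displayed.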

Database: The data source that Sphinx indexes.

Indexer: The indexing program. It retrieves data from the data source and builds a full-text index of it. Run Indexer periodically, as required, to keep the index up to date.

Sphinx reads the data from the database as described in its configuration file and passes it to the Indexer program, which reads the records one by one and indexes each of them using a word segmentation algorithm, either unigram (one character per token) or MMSEG.
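As a rough illustration of the simpler of the two segmentation modes, the sketch below treats every CJK character as its own token and keeps runs of Latin letters and digits together. The function name `unigram_tokens` is hypothetical, only the basic CJK Unified Ideographs block is handled, and MMSEG (a dictionary-based algorithm) is not shown.

```python
def unigram_tokens(text):
    """Split text so each CJK character is a token and Latin/digit runs stay whole."""
    tokens, buffer = [], []
    for ch in text:
        if "\u4e00" <= ch <= "\u9fff":   # CJK Unified Ideographs (basic block only)
            if buffer:                    # flush any pending Latin/digit run
                tokens.append("".join(buffer))
                buffer = []
            tokens.append(ch)
        elif ch.isalnum():
            buffer.append(ch.lower())
        else:                             # whitespace or punctuation ends a run
            if buffer:
                tokens.append("".join(buffer))
                buffer = []
    if buffer:
        tokens.append("".join(buffer))
    return tokens

tokens = unigram_tokens("Sphinx 全文检索, v2")
```

Unigram segmentation needs no dictionary, at the cost of matching character sequences that are not real words; a dictionary-based segmenter trades that simplicity for precision.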

Searchd: Searchd talks directly to the client program and uses the indexes built by the Indexer program to process search queries quickly.

App client: Receives a search string from user input, sends the query to the searchd program, and displays the returned results.

Four: How Sphinx works

The overall workflow of Sphinx is as follows: the Indexer program extracts data from the database, segments it into tokens, builds one or more indexes from those tokens, and hands the indexes to the searchd program. The client can then search through API calls.

Five: Use scenarios

1. Fast, efficient, scalable core full-text retrieval

It is faster than MyISAM and InnoDB when the data set is large.

The ability to create indexes on mixed data from multiple source tables, not limited to fields on a single table.

The ability to consolidate search results from multiple indexes.

Full-text search can be optimized according to additional conditions on attributes.

2. Use WHERE clauses and LIMIT clauses effectively

When a SELECT query has multiple WHERE conditions, the available index may be poorly selective, or the fields may not be indexed at all, resulting in poor performance. Sphinx can index the keywords. The difference is that in MySQL the internal engine decides whether to use an index or a full scan, whereas Sphinx lets you choose which access method to use. Because Sphinx keeps its data in RAM, it does not do much I/O. MySQL, by contrast, performs what is called semi-random disk I/O: it reads records row by row into a sort buffer, sorts them, and then discards most of the rows. So Sphinx uses less memory and disk I/O.

3. Optimize GROUP BY queries

Sphinx uses a fixed amount of memory for sorting and grouping, which is slightly more efficient than comparable MySQL queries even when the whole data set fits in RAM.
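One common way to sort with a fixed memory budget is to keep only the best k matches in a bounded heap while streaming over the candidates. The sketch below shows that general technique; it is an assumption that this resembles what Sphinx does internally, and the function name `top_k_matches` is hypothetical.

```python
import heapq

def top_k_matches(matches, k):
    """Keep only the k best (weight, doc_id) pairs while streaming over matches."""
    if k <= 0:
        return []
    heap = []  # min-heap of (weight, doc_id); never holds more than k entries
    for doc_id, weight in matches:
        if len(heap) < k:
            heapq.heappush(heap, (weight, doc_id))
        elif (weight, doc_id) > heap[0]:
            # The new match beats the weakest one kept so far: swap it in.
            heapq.heapreplace(heap, (weight, doc_id))
    # Return document IDs, best match first.
    return [doc_id for weight, doc_id in sorted(heap, reverse=True)]

# Illustrative (doc_id, weight) pairs; in practice this would be a large stream.
matches = [(1, 10), (2, 50), (3, 30), (4, 70), (5, 20)]
best = top_k_matches(matches, 2)
```

However many matches flow through, memory use is bounded by k, which is the property the paragraph above describes.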

4. Generate result sets in parallel

Sphinx lets you produce several result sets from the same data at the same time, again using a fixed amount of memory. By contrast, traditional SQL approaches either run two queries or create a temporary table for each search result set. Sphinx does this with a multi-query mechanism: instead of launching queries one after another, you put several queries into a batch and submit them in a single request.
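The idea behind a multi-query batch can be sketched as several queries sharing one pass over the data. This is a conceptual toy, not the Sphinx API (in the native client libraries the batch is, as far as I know, built with AddQuery and sent with RunQueries); the function `run_batch` and the sample documents are hypothetical.

```python
def run_batch(documents, queries):
    """Evaluate several queries in one pass over the data, one result set per query."""
    results = {name: [] for name in queries}
    for doc in documents:                        # single scan shared by all queries
        for name, predicate in queries.items():
            if predicate(doc):
                results[name].append(doc["id"])
    return results

# Illustrative documents with one full-text field and one attribute.
documents = [
    {"id": 1, "text": "sphinx search", "year": 2019},
    {"id": 2, "text": "mysql search", "year": 2020},
    {"id": 3, "text": "sphinx cluster", "year": 2020},
]
# Two queries over the same data: one plain, one with an extra attribute filter.
batch = {
    "all_sphinx": lambda d: "sphinx" in d["text"].split(),
    "sphinx_2020": lambda d: "sphinx" in d["text"].split() and d["year"] == 2020,
}
result_sets = run_batch(documents, batch)
```

Both result sets come back from a single request, with no temporary table in between.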

5. Scale up and out

Scaling up: Add CPUs/cores and expand disk I/O.

Scaling out: Use multiple machines, known as distributed Sphinx.

6. Aggregate sharded data

Ideal for situations where data is distributed among different physical MySQL servers.

Example: There is a 1 terabyte table with 1 billion articles, sharded across 10 MySQL servers by user ID, which is of course fast for a single-user query. But suppose you want to implement an archive paging feature that shows all the posts by a user's friends. That would require accessing multiple MySQL servers at the same time, which is going to be slow. Sphinx, by contrast, only needs a few instances that map the frequently accessed article attributes from each table, and then the paging query can be done with three lines of configuration.
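To see why paging over sharded data is awkward without an aggregating layer, here is a minimal sketch of merging per-shard results and cutting out one page. It only illustrates the merge step under stated assumptions: each shard returns its posts already sorted newest first, and the data and the function name `merged_page` are hypothetical.

```python
import heapq
from itertools import islice

def merged_page(shard_results, page, page_size):
    """Merge per-shard lists (each sorted newest first) and return one page."""
    # heapq.merge streams the sorted inputs without concatenating them first.
    merged = heapq.merge(*shard_results, key=lambda post: post[0], reverse=True)
    start = (page - 1) * page_size
    return list(islice(merged, start, start + page_size))

# Hypothetical (timestamp, post_id) results from three shards, newest first.
shard_a = [(1700000300, "a3"), (1700000100, "a1")]
shard_b = [(1700000250, "b2"), (1700000050, "b1")]
shard_c = [(1700000200, "c1")]

page_1 = merged_page([shard_a, shard_b, shard_c], page=1, page_size=3)
page_2 = merged_page([shard_a, shard_b, shard_c], page=2, page_size=3)
```

With distributed Sphinx this merge happens inside the search layer, so the application asks for a page once instead of querying every MySQL shard itself.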

If you have any good suggestions, please enter your comments below.

Visit my blog at https://guanchao.site