Sphinx installation and use

Sphinx can be used in two ways:

1. Call Sphinx through its API; for PHP, the API is compiled into PHP as an extension.

2. Use the Sphinx storage engine for MySQL (SphinxSE).

Sphinx is a full-text search engine designed for English text; Coreseek is a Sphinx-based full-text search engine with Chinese word segmentation; Lucene is a full-text search engine implemented in Java.

Sphinx indexes the data once and keeps the index in memory, so the application only needs to query the Sphinx server to perform a search. The indexer program extracts data from the database, segments it into words, builds one or more indexes from the resulting tokens, and hands them to the searchd program, which clients can then query through API calls.

Flow chart interpretation:

Database: the data source, i.e. the data that Sphinx builds its index from.

Indexer: retrieves data from the data source and builds a full-text index of it. Run indexer periodically as required to keep the index up to date.

Sphinx reads the data from the database according to the configuration file and passes it to the indexer program, which processes the records one by one and indexes each of them using a word-segmentation algorithm, either unigram or mmseg.

searchd: talks directly to the client program and uses the indexes built by indexer to answer search queries quickly.

Application client: receives the search string entered by the user, sends the query to the searchd program, and displays the returned results.

[Installation process]

# Download the source archive from the Sphinx website: http://sphinxsearch.com/files/sphinx-2.2.10-release.tar.gz

[root@localhost ~]`# cd /usr/local/src`

[root@localhost src]`# tar -zxvf sphinx-2.2.10-release.tar.gz`

[root@localhost src]`# cd sphinx-2.2.10-release`

[root@localhost sphinx-2.2.10-release]`# ./configure --prefix=/usr/local/sphinx --with-mysql`

[root@localhost sphinx-2.2.10-release]`# make && make install`

# Install libsphinxclient (required by the PHP sphinx extension)

[root@localhost sphinx-2.2.10-release]`# cd api/libsphinxclient`

[root@localhost libsphinxclient]`# ./configure --prefix=/usr/local/sphinx`

[root@localhost libsphinxclient]`# make && make install`

Install the PHP Sphinx module

# Download the sphinx extension package: http://pecl.php.net/package/sphinx

[root@localhost src]`# tar -zxvf sphinx-1.3.3.tgz`

[root@localhost src]`# cd sphinx-1.3.3`

[root@localhost sphinx-1.3.3]`# phpize`

[root@localhost sphinx-1.3.3]`# ./configure --with-php-config=/usr/local/php/bin/php-config --with-sphinx=/usr/local/sphinx/`

[root@localhost sphinx-1.3.3]`# make && make install`

# On success, the shared extension is installed to:

Installing shared extensions:    `/usr/local/php/lib/php/extensions/no-debug-non-zts-20131226/`

# Edit php.ini

[root@localhost sphinx-1.3.3]`# vim /usr/local/php/etc/php.ini`

Add: extension = sphinx.so

# Restart the php-fpm/nginx service so the extension is loaded

# If indexer/searchd cannot find the MySQL client library, add its path for the dynamic linker:

[root@localhost ~]`# vim /etc/ld.so.conf`

# add the following:

/usr/local/mysql/lib

[root@localhost ~]`# ldconfig`  ## make the new library path take effect
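To confirm the extension is actually loaded after the restart, a quick command-line check can be used (a minimal sketch; check.php is a throwaway script and the PHP binary path is the one assumed by this installation):

<?php
// check.php: verify that the sphinx extension built above is loaded
var_dump(extension_loaded('sphinx'));   // expected: bool(true)
var_dump(class_exists('SphinxClient')); // the class exposed by the extension

[root@localhost ~]`# /usr/local/php/bin/php check.php`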

[Sphinx configuration file]

# Sphinx can define multiple indexes and data sources, which can be applied to different tables or to different applications for full-text retrieval.

## Data source src1

source src1

{

The data source type can be: mysql, mssql, odbc, etc.

type = mysql

The following are the host, user name, password, database name and port of the SQL database.

sql_host = localhost

sql_user = root

sql_pass = root

sql_db   = test

sql_port = 3306

## Query executed before the data is fetched; used here to set the connection encoding

sql_query_pre = SET NAMES UTF8

## Query that pulls the data to be full-text indexed (try not to use WHERE or GROUP BY; leave that work to Sphinx)

Sphinx uses this statement to pull data from the database. The SELECT list must contain a unique primary key and the fields to be searched in full text.

sql_query = SELECT id, name FROM tablename

The following are the attributes used for filtering or conditional queries.

## Ranged queries, used when the data source is too large to fetch in one pass

sql_query_range     = SELECT MIN(id), MAX(id) FROM documents  ## range of document IDs

sql_range_step      = 1000  ## step size of each ranged query

sql_ranged_throttle = 0     ## interval between ranged queries, in milliseconds

## The following are attribute fields of various types. Attributes are stored in the index but are not full-text indexed; they can only be used for filtering and sorting.

## Fields that appear in WHERE, ORDER BY or GROUP BY are declared as attributes (directives starting with sql_attr_); different field types use different attribute directives.

sql_attr_uint = cat_id  ## unsigned integer attribute

sql_attr_uint = member_id

sql_attr_timestamp = add_time  ## UNIX timestamp attribute

## Used by the command-line interface for test queries

sql_query_info = SELECT * FROM tablename WHERE id=$id

}

## Index

index test1

{

source = src1  ## the data source this index is built from

path = /usr/local/sphinx/var/data/test1  ## index file storage path and filename prefix

## mmseg word segmentation ##

## charset_dictpath = /usr/local/mmseg3/etc  ## directory from which the segmentation dictionary is read; it must contain the uni.lib dictionary. Required when mmseg segmentation is enabled

## Set the data encoding to UTF-8 / GBK

## Unigram (single-character) segmentation ##

## Newer versions of Sphinx no longer support the charset_type setting

charset_table =  ## character table and case-folding rules

ngram_chars =    ## the set of characters to be recognized in unigram splitting mode

ngram_len = 1    ## length of each token in n-gram splitting

}

## indexer configuration

indexer

{

mem_limit = 256M  ## memory limit

}

## Sphinx server process

searchd

{

listen = 9312  ## listening port for the native API

listen = 9306:mysql41  ## SphinxQL port (MySQL wire protocol)

log = /usr/local/sphinx/var/log/searchd.log  ## service process log

query_log = /usr/local/sphinx/var/log/query.log  ## client query log

read_timeout = 5  ## request timeout, in seconds

max_children = 30  ## maximum number of searchd child processes running concurrently

pid_file = /usr/local/sphinx/var/log/searchd.pid  ## process ID file

max_matches = 1000  ## maximum number of query results returned

seamless_rotate = 1  ## rotate indexes seamlessly, without interrupting service

}

[Generate index]

Invoke the indexer program to generate all indexes:

[root@localhost ~]`# /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf --all`

Generate a single index by specifying its name (as defined in the configuration file):

[root@localhost ~]`# /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf index_name`

If the searchd daemon is already running at this point, add the --rotate argument:

[root@localhost ~]`# /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf --all --rotate`

[Start Sphinx]

[root@localhost ~]`# /usr/local/sphinx/bin/searchd --config /usr/local/sphinx/etc/sphinx.conf`
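Because the searchd section above also listens on port 9306 with the MySQL wire protocol (listen = 9306:mysql41), a quick way to sanity-check the daemon is to query it over SphinxQL with an ordinary mysql client (a sketch; 'keyword' and the index name test1 come from the example configuration):

[root@localhost ~]`# mysql -h127.0.0.1 -P9306`

mysql> SELECT * FROM test1 WHERE MATCH('keyword') LIMIT 10;

mysql> SHOW META;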

[Using Sphinx]

1. new SphinxClient(); ## create a Sphinx client object

2. SetServer(host, port); ## set the Sphinx host and port to connect to

3. SetMatchMode(mode); ## set the matching mode of the full-text query

4. SetFilter(string $attribute, array $values [, bool $exclude = false]); ## filter results on attribute values

string $attribute: the attribute name

array $values: the array of values to filter on

bool $exclude: whether documents matching this filter are excluded from the result

5. SetSortMode(int $mode [, string $sortby]); ## set the sort mode for matches

6. SetLimits(int $offset, int $limit); ## set the result offset and the maximum number of matches to return

7. Query(string $query [, string $index = '*']); ## execute the search query

string $query: the query string

string $index: the index name; multiple names can be given separated by commas, or '*' for all indexes

Data structure returned:

"matches": a hash table keyed by document ID, each entry containing the document weight and attribute values

"total": the number of matches retrieved into the server-side result set for this query (i.e. the size of the server-side result set, which depends on the relevant settings)

"total_found": the total number of matching documents in the index

"words": maps each query keyword (after case folding, stemming and other processing) to its statistics: "docs" is the number of documents the keyword occurs in, and "hits" is the total number of occurrences

"error": the error message reported by searchd

"warning": the warning message reported by searchd

BuildExcerpts(array $docs, string $index, string $words [, array $opts]); ## build excerpts (snippets) with the keywords highlighted, e.g. for abstracts

array $docs: an array of document content strings

string $index: the index name

string $words: the keywords to highlight

array $opts: an associative array of additional highlighting options
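Putting these calls together, a minimal PHP sketch could look as follows (the host, port, index name test1, attribute cat_id and the search keyword are assumptions taken from the example configuration; error handling is kept to the basics):

<?php
// search.php: minimal search using the sphinx PECL extension installed above
$client = new SphinxClient();
$client->setServer('127.0.0.1', 9312);        // searchd host and native API port
$client->setMatchMode(SPH_MATCH_ANY);         // match any of the keywords
$client->setLimits(0, 20);                    // offset 0, return at most 20 matches
// $client->setFilter('cat_id', array(3, 5)); // optional: filter on an attribute declared with sql_attr_uint

$result = $client->query('keyword', 'test1'); // search the index named test1

if ($result === false) {
    echo 'Query failed: ' . $client->getLastError() . "\n";
} else {
    echo 'Found ' . $result['total_found'] . " matching documents\n";
    if (!empty($result['matches'])) {
        foreach ($result['matches'] as $docId => $info) {
            // each match carries its weight and the attribute values declared in the data source
            echo "doc id: $docId, weight: {$info['weight']}\n";
        }
    }
    // highlight the keyword in matched text (document bodies would normally be fetched from MySQL by ID)
    $excerpts = $client->buildExcerpts(array('some matched text'), 'test1', 'keyword');
    print_r($excerpts);
}

The document IDs in $result['matches'] are then typically used to fetch the full rows back from MySQL.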

[Sphinx incremental index update]

The index setup consists of three parts: 1. a fixed main index; 2. an incremental (delta) index that is rebuilt regularly; 3. merging the delta index into the main index.

In practice, an auxiliary table is needed to remember the record ID up to which the index was last built, so that the incremental index only has to cover the records added since then.

1) Create the auxiliary table: CREATE TABLE `sph_counter` (`counter_id` INT(11) NOT NULL, `max_doc_id` INT(11) NOT NULL, PRIMARY KEY (`counter_id`)) ENGINE=MyISAM DEFAULT CHARSET=utf8;

2) In the data source of the main index, have sql_query_pre record the current maximum document ID into sph_counter (for example with REPLACE INTO sph_counter SELECT 1, MAX(id) FROM tablename), and restrict sql_query to those records: WHERE id <= (SELECT max_doc_id FROM sph_counter WHERE counter_id = 1).

3) In the data source of the incremental index, inherit from the main index's data source and add a WHERE condition to its sql_query: WHERE id > (SELECT max_doc_id FROM sph_counter WHERE counter_id = 1).

4) Define the main index and the incremental index separately in the configuration file; a sketch is shown below.
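A possible configuration sketch for steps 2) to 4) (the table, field and index names follow the examples above; the REPLACE INTO statement is one common way to maintain sph_counter):

source src1
{
    # connection settings as in the src1 example earlier
    sql_query_pre = SET NAMES UTF8
    sql_query_pre = REPLACE INTO sph_counter SELECT 1, MAX(id) FROM tablename
    sql_query     = SELECT id, name FROM tablename \
                    WHERE id <= (SELECT max_doc_id FROM sph_counter WHERE counter_id = 1)
}

source delta : src1
{
    # re-declare sql_query_pre so the child source does not rerun the REPLACE INTO above
    sql_query_pre = SET NAMES UTF8
    sql_query     = SELECT id, name FROM tablename \
                    WHERE id > (SELECT max_doc_id FROM sph_counter WHERE counter_id = 1)
}

index test1
{
    source = src1
    path   = /usr/local/sphinx/var/data/test1
}

index delta : test1
{
    source = delta
    path   = /usr/local/sphinx/var/data/delta
}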

Build the main index, and add a crontab entry to rebuild it regularly:

`/usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf --rotate test1`

Build the incremental index and merge it into the main index; both commands can be run from crontab at suitable intervals:

`/usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf --rotate delta`

`/usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf --merge test1 delta --rotate`
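For reference, a possible crontab schedule (the times below are only an assumption and should be adapted to the actual data volume):

# rebuild the delta index every 5 minutes, merge it nightly, rebuild the main index weekly
*/5 * * * * /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf --rotate delta
30 3 * * *  /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf --merge test1 delta --rotate
0 4 * * 0   /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf --rotate test1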