Solr

Solr is a high-performance, Lucene-based full-text search server. It extends Lucene with a richer query language, is configurable and extensible, optimizes query performance, and provides a complete administration interface, making it an excellent full-text search engine.

Lucene

Lucene is a subproject of the Apache Jakarta project: an open-source full-text search toolkit. It is not a complete full-text search engine but rather a full-text search engine architecture, providing a complete query engine, an indexing engine, and part of a text-analysis engine. Its purpose is to give software developers an easy-to-use toolkit for adding full-text search to a target system, or a foundation on which to build a complete full-text search engine.

Inverted index

Normally we first find a document and then see which words it contains.

An inverted index turns this around: starting from a word, it finds the documents in which that word occurs.

A worked example



Word segmentation results



Inverted index
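The illustrations from the original are not reproduced here. As a substitute, the following is a minimal, self-contained Java sketch of the idea; the sample documents and the whitespace "segmentation" are hypothetical stand-ins for a real word-segmentation step:

import java.util.*;

public class InvertedIndexDemo {
    public static void main(String[] args) {
        // document id -> document text (hypothetical sample titles)
        Map<Integer, String> docs = new LinkedHashMap<>();
        docs.put(3, "huawei pc hot seller");
        docs.put(4, "huawei phone flagship");
        docs.put(5, "lenovo thinkpad business laptop");

        // inverted index: word -> ids of the documents containing that word
        Map<String, Set<Integer>> index = new TreeMap<>();
        for (Map.Entry<Integer, String> e : docs.entrySet()) {
            // a real engine uses an analyzer here; splitting on spaces stands in for it
            for (String word : e.getValue().split(" ")) {
                index.computeIfAbsent(word, k -> new TreeSet<>()).add(e.getKey());
            }
        }

        // prints e.g. huawei=[3, 4]: from a word straight to its documents
        System.out.println(index);
    }
}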

Introduction to the Lucene API

Create indexes

Create a new Maven project and add dependencies

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>cn.tedu</groupId>
    <artifactId>lucene-demo</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-core</artifactId>
            <version>8.1.1</version>
        </dependency>
        <!-- provides the SmartChineseAnalyzer used below -->
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-analyzers-smartcn</artifactId>
            <version>8.1.1</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.8.0</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

Create the test class and add the following code

package test;

import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.junit.Test;

import java.io.File;
import java.io.IOException;

public class Test1 {
    // sample data: id, title, sell point (English stand-ins for the original Chinese text)
    String[] a = {
            "3,huawei pc,hot seller",
            "4,huawei phone,flagship",
            "5,lenovo thinkpad,business laptop",
            "6,lenovo phone,selfie camera"
    };

    @Test
    public void test1() throws Exception {
        // open the directory where the index files will be stored
        FSDirectory directory = FSDirectory.open(new File("d:/abc").toPath());
        // use the SmartChineseAnalyzer for word segmentation
        IndexWriterConfig config = new IndexWriterConfig(new SmartChineseAnalyzer());
        IndexWriter writer = new IndexWriter(directory, config);

        for (String s : a) {
            String[] arr = s.split(",");
            Document doc = new Document();
            // LongPoint is indexed for queries; StoredField keeps the raw value
            doc.add(new LongPoint("id", Long.parseLong(arr[0])));
            doc.add(new StoredField("id", Long.parseLong(arr[0])));
            doc.add(new TextField("title", arr[1], Field.Store.YES));
            doc.add(new TextField("sellPoint", arr[2], Field.Store.YES));
            writer.addDocument(doc);
        }
        writer.flush();
        writer.close();
    }
}

View index

Run Luke

Run the Luke application bundled with Lucene 8.1.1 and point it at the directory where the index is stored

View the documents

Specify an analyzer and test word segmentation
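Luke lets you test an analyzer interactively; the same check can also be done in code. Below is a minimal sketch using the SmartChineseAnalyzer from the Maven setup above; the class name AnalyzerDemo and the sample sentence are hypothetical:

package test;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class AnalyzerDemo {
    public static void main(String[] args) throws Exception {
        // feed a sample sentence through the analyzer and print each token
        try (Analyzer analyzer = new SmartChineseAnalyzer()) {
            TokenStream ts = analyzer.tokenStream("title", "huawei phone flagship");
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                System.out.println(term.toString());
            }
            ts.end();
            ts.close();
        }
    }
}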

Query test

  • Query by the id field

Query from index

Add the test2() test method to the test class

@Test
public void test2() throws IOException {
    // open the index created by test1()
    FSDirectory directory = FSDirectory.open(new File("d:/abc").toPath());
    DirectoryReader reader = DirectoryReader.open(directory);
    // results come back as doc/score pairs, e.g. [{doc:3,score:0.679},{doc:1,score:0.5151}]
    IndexSearcher searcher = new IndexSearcher(reader);
    // look for the word "huawei" in the title field
    TermQuery query = new TermQuery(new Term("title", "huawei"));
    TopDocs topDocs = searcher.search(query, 20);
    for (ScoreDoc sd : topDocs.scoreDocs) {
        int id = sd.doc;        // internal Lucene document id
        float score = sd.score; // relevance score
        Document doc = searcher.doc(id);
        System.out.println(id);
        System.out.println(score);
        System.out.println(doc.get("id"));
        System.out.println(doc.get("title"));
        System.out.println(doc.get("sellPoint"));
        System.out.println("--------------------------------");
    }
    reader.close();
}

Solr installation

Now let’s install the Solr server

Upload solr-8.1.1.tgz to the server

Let’s go to /usr/local

cd /usr/local

Upload the file to /usr/local

Unpack Solr

cd /usr/local
# upload solr-8.1.1.tgz to /usr/local
# then unpack it
tar -xzf solr-8.1.1.tgz

Start Solr

cd /usr/local/solr-8.1.1

# start Solr (-force is required when running as root)
bin/solr start -force

# open port 8983 in the firewall
firewall-cmd --zone=public --add-port=8983/tcp --permanent
firewall-cmd --reload

Access the Solr console from a browser

http://192.168.64.170:8983

  • Note: change the address to your server's IP

Create a core

In Solr, index data is stored in a core; we create a core to hold the index data.

To create a core named pd, first prepare the following directory structure:

server/solr/
    pd/
        conf/
        data/

cd /usr/local/solr-8.1.1
mkdir -p server/solr/pd/conf
mkdir -p server/solr/pd/data

The conf directory is the configuration directory for core. It stores a set of configuration files. We’ll start with the default configuration and change it step by step

Copy the default configuration

cd /usr/local/solr-8.1.1
cp -r server/solr/configsets/_default/conf server/solr/pd

Create a core named pd

Chinese word segmentation test

Fill in the following text and observe the word segmentation results:

Solr is a high-performance full-text search server, developed in Java 5 and based on Lucene. It extends Lucene with a richer query language, is configurable and extensible, optimizes query performance, and provides a complete administration interface. It is an excellent full-text search engine.

Chinese word segmentation tool – IK-Analyzer

https://github.com/magese/ik-…

  • Download the IK-Analyzer word-segmentation JAR file and upload it to the Solr directory /server/solr-webapp/webapp/WEB-INF/lib

    • To simplify later steps, upload the JAR files needed later at the same time; four files in total:

      • ik-analyzer-8.1.0.jar
      • mysql-connector-java-5.1.46.jar
      • solr-dataimporthandler-8.1.1.jar
      • solr-dataimporthandler-extras-8.1.1.jar
  • Copy the IK-Analyzer configuration files to the Solr directory /server/solr-webapp/webapp/WEB-INF/classes

mkdir /usr/local/solr-8.1.1/server/solr-webapp/webapp/WEB-INF/classes

Copy the following files to the classes directory: resources/IKAnalyzer.cfg.xml, ext.dic, stopwords.txt, ik.conf, dynamicdic.txt
  • Configure managed-schema

Modify the Solr file /server/solr/pd/conf/managed-schema and add the ik-analyzer word segmenter:

<!-- ik word segmenter -->
<fieldType name="text_ik" class="solr.TextField">
    <analyzer type="index">
        <tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="false" conf="ik.conf"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="true" conf="ik.conf"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
  • Restart the Solr service

cd /usr/local/solr-8.1.1
bin/solr restart -force

Test Chinese word segmentation with IK-Analyzer

Fill in the following text, select the text_ik word segmenter, and observe the results:

Solr is a high-performance full-text search server, developed in Java 5 and based on Lucene. It extends Lucene with a richer query language, is configurable and extensible, optimizes query performance, and provides a complete administration interface. It is an excellent full-text search engine.

Set stop words

Upload the stop-word configuration files to the Solr directory /server/solr-webapp/webapp/WEB-INF/classes:

stopword.dic
stopwords.txt
  • Restart the service and observe that stop words are ignored in the results
bin/solr restart -force

Prepare MySQL database data

  • Execute pd.sql with sqlyog
  • Note: The password of the root user for remote login is set here. The password of the root user for local login remains unchanged
DROP USER 'root'@'%';
CREATE USER 'root'@'%' IDENTIFIED BY 'root';
GRANT ALL ON *.* TO 'root'@'%';

Randomly take about 30% of the products off the shelf, to be used in later query tests:

UPDATE pd_item SET status=0 WHERE RAND()<0.3;

Import product data from MySQL

Set the field

  • title text_ik
  • sellPoint text_ik
  • price plong
  • barcode string
  • image string
  • cid plong
  • status pint
  • created pdate
  • updated pdate

Copy Field

By default, queries must name a field, such as title:computer. A copy field combines the values of several fields into one field that can be queried together; the default query field is _text_.

Copy title and sellPoint to the _text_ field, as sketched below
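In managed-schema these settings correspond to entries like the following. This is a sketch: in the _default configset the _text_ catch-all field usually already exists, so in practice you change its type to text_ik and add the copyField rules:

<field name="_text_" type="text_ik" multiValued="true" indexed="true" stored="false"/>
<copyField source="title" dest="_text_"/>
<copyField source="sellPoint" dest="_text_"/>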

Data Import Handler configuration

  • Add the JAR files

The JAR files for the Data Import Handler are in the Solr dist directory:

solr-dataimporthandler-8.1.1.jar
solr-dataimporthandler-extras-8.1.1.jar

Copy these files, together with the MySQL JAR file, to the Solr directory /server/solr-webapp/webapp/WEB-INF/lib

  • dih-config.xml

    Modify the MySQL IP address in dih-config.xml, then upload it to the Solr directory /server/solr/pd/conf; a sketch of the file follows
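The original does not reproduce dih-config.xml, so the following is a minimal sketch of what such a file looks like for this setup. The MySQL host, credentials, and the column list are assumptions; the real columns come from pd.sql, and the database name pd_store matches the connection settings used later:

<dataConfig>
    <dataSource type="JdbcDataSource"
                driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://192.168.64.1:3306/pd_store"
                user="root"
                password="root"/>
    <document>
        <!-- hypothetical query: aliases map snake_case columns to the schema fields -->
        <entity name="item"
                query="SELECT id, title, sell_point sellPoint, price, barcode, image, cid, status, created, updated FROM pd_item">
        </entity>
    </document>
</dataConfig>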
  • Add the DIH configuration to solrconfig.xml
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">  
        <str name="config">dih-config.xml</str>  
    </lst>  
</requestHandler>
  • Restart Solr

cd /usr/local/solr-8.1.1
bin/solr restart -force

Import data

After restarting Solr, import the data and confirm that 3160 documents were imported
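The import is normally triggered from the Dataimport tab of the Solr console, but it can also be requested directly over HTTP. A sketch, assuming the /dataimport handler registered above (adjust the IP to your server):

http://192.168.64.170:8983/solr/pd/dataimport?command=full-import&clean=true&commit=true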

Query test

Search the _text_ copy field for: computer

Search the title field: title:computer

Use double quotation marks to match the complete phrase: "notebook"

Search: +lenovo computer (lenovo is required; computer is optional)

Search: +lenovo +computer (both words are required)

Statistics by cid

Price range statistics

In the Raw Query Parameters input box, enter the following:

facet.range=price&facet.range.start=0&facet.range.end=10000&facet.range.gap=2000
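For reference, the same request expressed as a full URL against the select handler; a sketch, assuming the server address used earlier (facet=on enables faceting, rows=0 suppresses the document list):

http://192.168.64.170:8983/solr/pd/select?q=*:*&facet=on&facet.range=price&facet.range.start=0&facet.range.end=10000&facet.range.gap=2000&rows=0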

Multi-field statistics

Enter the following in the Raw Query Parameters input box:

facet.pivot=cid,status
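As a full URL, under the same assumptions as above:

http://192.168.64.170:8983/solr/pd/select?q=*:*&facet=on&facet.pivot=cid,status&rows=0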

Implementing full-text product search for the Pinduo mall

Modify the hosts file and add the www.pd.com mapping

127.0.0.1      www.pd.com

Import the PD-Web project into Eclipse

Modify the database connection configuration

In the application.yml configuration file, modify the connection configuration

spring:
  datasource:
    type: com.alibaba.druid.pool.DruidDataSource
    driver-class-name: com.mysql.jdbc.Driver
    url: jdbc:mysql://127.0.0.1:3306/pd_store?useUnicode=true&characterEncoding=UTF-8
    username: root
    password: root

To start the project, visit www.pd.com

Product retrieval call analysis

Add the Solr and Lombok dependencies to pom.xml

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-solr</artifactId>
</dependency>

<dependency>
    <groupId>org.projectlombok</groupId>
    <artifactId>lombok</artifactId>
</dependency>

Add the Solr connection information to application.yml

spring:
  data:
    solr:
      # note: change the host IP to your server's address
      host: http://192.168.64.170:8983/solr/pd

The Item entity class

package com.pd.pojo;

import java.io.Serializable;

import org.apache.solr.client.solrj.beans.Field;

import lombok.Data;

@Data
public class Item implements Serializable {
        private static final long serialVersionUID = 1L;
        
        @Field("id")
        private String id;
        @Field("title")
        private String title;
        @Field("sellPoint")
        private String sellPoint;
        @Field("price")
        private Long price;
        @Field("image")
        private String image;

}

SearchService business interface

package com.pd.service;

import java.util.List;

import com.pd.pojo.Item;

public interface SearchService {
    List<Item> findItemByKey(String key) throws Exception;
}

SearchServiceImpl business implementation class

package com.pd.service.impl;

import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import com.pd.pojo.Item;
import com.pd.service.SearchService;

@Service
public class SearchServiceImpl implements SearchService {
    /*
     * The SolrClient instance is created in the SolrAutoConfiguration class,
     * which carries the @Configuration annotation, so it can be autowired here.
     */
    @Autowired
    private SolrClient solrClient;

    @Override
    public List<Item> findItemByKey(String key) throws Exception {
        // use the search keyword as the query string
        SolrQuery query = new SolrQuery(key);
        // paging: start at the first result, return at most 20
        query.setStart(0);
        query.setRows(20);

        QueryResponse qr = solrClient.query(query);
        // map the result documents to Item beans via the @Field annotations
        List<Item> beans = qr.getBeans(Item.class);
        return beans;
    }
}

SearchController controller

package com.pd.controller;

import java.util.List;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.GetMapping;

import com.pd.pojo.Item;
import com.pd.service.SearchService;

@Controller
public class SearchController {
    @Autowired
    private SearchService searchService;

    @GetMapping("/search/toSearch.html")
    public String search(String key, Model model) throws Exception {
        // query Solr for items matching the keyword
        List<Item> itemList = searchService.findItemByKey(key);
        model.addAttribute("list", itemList);
        return "/search.jsp";
    }
}
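With this in place, a request such as http://www.pd.com/search/toSearch.html?key=computer (the keyword here is an arbitrary example) goes through SearchController to findItemByKey, and search.jsp renders the returned item list.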