Notes on an Elasticsearch fuzzy match

When I first started with Elasticsearch, I wanted to put it behind every interface, though some of those ideas turned out to be unrealistic. Search was so fast that it was a pleasure to use, so I built the CRUD for every requirement on it. Without further ado, here is a small problem I ran into along the way. The requirements are as follows:

Behave like LIKE '%XXXXXX%' in MySQL
Implement the same full match as %XXXXXX% on top of Elasticsearch
For example, if a field holds "张三" (Zhang San), then each of the three terms "张" (Zhang), "三" (San), and "张三" (Zhang San) must find that record
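
The requirement boils down to substring containment. A minimal sketch in plain Java, with no Elasticsearch involved, of what the search has to return for the Zhang San (张三) example; the class and method names are made up for illustration, and the name is written with Unicode escapes so the source stays ASCII-only:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of any library: emulates SQL LIKE '%term%'
// over an in-memory list so the expected matching behaviour is explicit.
final class LikeDemo {

    // Returns every value that contains term as a contiguous substring.
    static List<String> like(List<String> values, String term) {
        List<String> hits = new ArrayList<>();
        for (String value : values) {
            if (value.contains(term)) {
                hits.add(value);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // "\u5f20\u4e09" is Zhang San; "\u674e\u56db" is an unrelated name (Li Si).
        List<String> names = Arrays.asList("\u5f20\u4e09", "\u674e\u56db");
        System.out.println(like(names, "\u5f20"));        // first character only
        System.out.println(like(names, "\u4e09"));        // second character only
        System.out.println(like(names, "\u5f20\u4e09"));  // the whole name
    }
}
```

All three calls should return only the Zhang San entry; that is the behaviour the two Elasticsearch approaches in this post try to reproduce.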

Two online tokenizer testers that may be useful:

IK analyzer online test: https://www.itgongju.com/ikAn…
Ansj analyzer online test: https://www.itgongju.com/ansj…

The first approach uses a WildcardQuery, which requires that the queried field in ES is not analyzed. The following code creates an index whose field is not analyzed (run directly as a JUnit test).

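// Imports needed by the enclosing JUnit test class.
import org.elasticsearch.action.admin.indices.create.CreateIndexResponse;
import org.elasticsearch.action.admin.indices.mapping.put.PutMappingRequestBuilder;
import org.elasticsearch.action.admin.indices.mapping.put.PutMappingResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;
import org.junit.Test;

/**
 * Creates an index whose field is not analyzed.
 */
@Test
public void createIndex2() throws Exception {
    // EsUtils is the author's own helper class (not shown).
    Client client = EsUtils.getEsClient();
    // Create the index
    CreateIndexResponse createIndexResponse = client.admin().indices()
            .prepareCreate("testindex2").execute().actionGet();
    System.out.println("create not-analyzed index 2 = " + createIndexResponse.isAcknowledged());
    // Put the mapping for the type
    PutMappingRequestBuilder mappingRequest = client.admin().indices()
            .preparePutMapping("testindex2")
            .setType("indextype2")
            .setSource(createTestModelMapping2());
    PutMappingResponse putMappingResponse = mappingRequest.execute().actionGet();
    System.out.println("create not-analyzed index 2 putMappingResponse = " + putMappingResponse.isAcknowledged());
    EsUtils.closeClient();
}

// Mapping: a single stored string field that is indexed as one un-analyzed term.
private XContentBuilder createTestModelMapping2() throws Exception {
    XContentBuilder builder = XContentFactory.jsonBuilder()
            .startObject()
                .startObject("indextype2")
                    .startObject("properties")
                        .startObject("wildcardStr")
                            .field("type", "string")
                            .field("index", "not_analyzed")
                            .field("store", true)
                        .endObject()
                    .endObject()
                .endObject()
            .endObject();
    return builder;
}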
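
Why the field has to be not_analyzed: a wildcard pattern is compared against each indexed term as a whole, so the pattern can only span the full field value if that value was indexed as a single term. The following is a rough simulation in plain Java, not ES internals; the class name is hypothetical, and the analyzed case is approximated here by splitting on whitespace:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical simulation of term-level wildcard matching.
final class WildcardTermDemo {

    // Converts a wildcard pattern ('*' = any run of characters, '?' = one character) to a regex.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString(), Pattern.DOTALL);
    }

    // A document matches when at least one of its indexed terms matches the whole pattern.
    static boolean matches(List<String> indexedTerms, String wildcard) {
        Pattern pattern = toRegex(wildcard);
        for (String term : indexedTerms) {
            if (pattern.matcher(term).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> notAnalyzed = Arrays.asList("hello world");  // whole value is one term
        List<String> analyzed = Arrays.asList("hello", "world");  // value split into tokens
        System.out.println(matches(notAnalyzed, "*lo wo*"));
        System.out.println(matches(analyzed, "*lo wo*"));
    }
}
```

In this simulation the pattern that crosses a token boundary only finds the un-analyzed value, which is the reason for the not_analyzed mapping above.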

With the mapping created, we can insert data. For this test I inserted 3,000,000 records to query against, as shown in the figure.

Now we are ready to query. I will use the Head plugin directly to run the queries described in the requirements.

Start with the term "张" (Zhang). (Remember: when using wildcard queries, the field must not be analyzed.)
Next, query the term "三" (San).
Finally, query the term "张三" (Zhang San).
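
The request bodies for these three queries are not reproduced in the text, so here is a hedged sketch of what they could look like. It assembles the standard wildcard query JSON for the wildcardStr field with plain string concatenation; the class and helper names are hypothetical. In the Java client, QueryBuilders.wildcardQuery(field, pattern) expresses the same query.

```java
// Hypothetical helper that builds a wildcard search body as a JSON string.
final class WildcardBodyDemo {

    // Escapes the two characters that would break a JSON string literal.
    static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds {"query":{"wildcard":{"<field>":"*<term>*"}}}.
    // Note: '*' and '?' inside term are not escaped and would act as wildcards.
    static String wildcardBody(String field, String term) {
        return "{\"query\":{\"wildcard\":{\"" + jsonEscape(field) + "\":\"*"
                + jsonEscape(term) + "*\"}}}";
    }

    public static void main(String[] args) {
        // Zhang, San, and Zhang San, written as Unicode escapes.
        String[] terms = {"\u5f20", "\u4e09", "\u5f20\u4e09"};
        for (String term : terms) {
            System.out.println(wildcardBody("wildcardStr", term));
        }
    }
}
```

Wrapping the term in a leading and trailing * is what gives the %XXXXXX% behaviour; a leading wildcard also forces ES to scan many terms, which is worth keeping in mind when reading the timings.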

Now let's test the speed of the wildcard query; the other way of getting the full match comes after that. Here is a WildcardQuery run 10 times against the 300,000-record index.
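
For anyone who wants to repeat such a measurement from code, here is a minimal timing sketch. It is not the harness used for the numbers in this post; QueryTimer and its methods are hypothetical, and the Runnable stands in for whatever executes the search request. The took value in the ES search response is another source for the server-side time.

```java
import java.util.Arrays;

// Hypothetical timing helper: runs an action several times and records each duration.
final class QueryTimer {

    // Runs the action `runs` times and returns the elapsed milliseconds of each run.
    static long[] time(Runnable action, int runs) {
        long[] millis = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            action.run();
            millis[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return millis;
    }

    // Arithmetic mean of the recorded durations (0.0 for an empty array).
    static double average(long[] values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return values.length == 0 ? 0.0 : (double) sum / values.length;
    }

    public static void main(String[] args) {
        // Placeholder workload; in a real test this would execute the search request.
        Runnable fakeQuery = () -> {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        long[] runs = time(fakeQuery, 10);
        System.out.println(Arrays.toString(runs) + " avg=" + average(runs));
    }
}
```

The first run is usually slower because of cache warm-up, so looking at all ten values tells more than the average alone.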

Now for the key point: implementing the same match with unigram (single-character) tokenization. First, create another index:

/**
 * Creates index 1, whose field is analyzed with a unigram tokenizer.
 */
@Test
public void creatIndex1() throws Exception {
    // EsUtils is the author's own helper class (not shown).
    Client client = EsUtils.getEsClient();
    // Create the index with the custom analysis settings
    CreateIndexResponse createIndexResponse = client.admin().indices()
            .prepareCreate("testindex1")
            .setSettings(createTestIndex1Settings())
            .execute().actionGet();
    System.out.println("create analyzed index 1 = " + createIndexResponse.isAcknowledged());
    // Put the mapping for the type
    PutMappingRequestBuilder mappingRequest = client.admin().indices()
            .preparePutMapping("testindex1")
            .setType("indextype1")
            .setSource(createTestModelMapping1());
    PutMappingResponse putMappingResponse = mappingRequest.execute().actionGet();
    System.out.println("create analyzed index 1 putMappingResponse = " + putMappingResponse.isAcknowledged());
    EsUtils.closeClient();
}

// Settings: an analyzer "ngramAnalyzer" backed by an ngram tokenizer with min_gram = max_gram = 1.
private XContentBuilder createTestIndex1Settings() throws Exception {
    XContentBuilder settings = XContentFactory.jsonBuilder()
            .startObject()
                .startObject("analysis")
                    .startObject("analyzer")
                        .startObject("ngramAnalyzer")
                            .field("tokenizer", "my_ngramAnalyzer")
                        .endObject()
                    .endObject()
                    .startObject("tokenizer")
                        .startObject("my_ngramAnalyzer")
                            .field("type", "ngram")
                            .field("min_gram", 1)
                            .field("max_gram", 1)
                            .array("token_chars", "letter", "digit")
                        .endObject()
                    .endObject()
                .endObject()
            .endObject();
    return settings;
}

// Mapping: a stored string field analyzed with the unigram analyzer defined above.
private XContentBuilder createTestModelMapping1() throws Exception {
    XContentBuilder builder = XContentFactory.jsonBuilder()
            .startObject()
                .startObject("indextype1")
                    .startObject("properties")
                        .startObject("anlyzerStr")
                            .field("type", "string")
                            .field("analyzer", "ngramAnalyzer")
                            .field("store", true)
                        .endObject()
                    .endObject()
                .endObject()
            .endObject();
    return builder;
}

With that created, let's look at the structure of the resulting mapping.

As the figure above shows, the index settings specify the analyzer and its tokenizer. ngram is a tokenizer built into ES; with min_gram and max_gram both set to 1, it emits one token per character. Now we insert the data.
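
To make the tokenizer settings concrete, here is a small plain-Java approximation of what an ngram tokenizer with min_gram = max_gram = 1 and token_chars of letter and digit produces. It is an illustration only, not the ES implementation, and the class name is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical approximation of a unigram tokenizer restricted to letters and digits.
final class UnigramDemo {

    // Emits one token per letter or digit; every other character is dropped.
    static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints()
            .filter(cp -> Character.isLetterOrDigit(cp))
            .forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(unigrams("\u5f20\u4e09"));  // Zhang San: two single-character tokens
        System.out.println(unigrams("a1-b"));          // the hyphen is neither letter nor digit
    }
}
```

Chinese characters count as letters here, so a two-character name becomes two tokens, each searchable on its own.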

The same 3,000,000 records are inserted here as well. Let's query.

From the three figures above, we can see that unigram tokenization also achieves the full match. Next is a comparison of query speed, as shown in the figure.
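
One way to see why unigrams can reproduce the %XXXXXX% behaviour: if both the stored value and the search term are cut into single characters, and the term's characters must appear at consecutive positions, the result is the same as a substring test. The sketch below simulates that in plain Java. It assumes phrase-style matching (consecutive positions), which is an assumption on my part since the exact query is only shown in the figures, and all names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical simulation of phrase matching over unigram tokens.
final class UnigramPhraseDemo {

    // One token per letter or digit, as with min_gram = max_gram = 1.
    static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints()
            .filter(cp -> Character.isLetterOrDigit(cp))
            .forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    // True when the query's tokens occur in the document at consecutive positions, in order.
    static boolean phraseMatch(String document, String query) {
        List<String> docTokens = unigrams(document);
        List<String> queryTokens = unigrams(query);
        if (queryTokens.isEmpty()) {
            return false;
        }
        return Collections.indexOfSubList(docTokens, queryTokens) >= 0;
    }

    public static void main(String[] args) {
        String doc = "\u5f20\u4e09";  // Zhang San
        System.out.println(phraseMatch(doc, "\u5f20"));        // Zhang
        System.out.println(phraseMatch(doc, "\u4e09"));        // San
        System.out.println(phraseMatch(doc, "\u5f20\u4e09"));  // Zhang San
        System.out.println(phraseMatch(doc, "\u4e09\u5f20"));  // reversed order
    }
}
```

Without the consecutive-position requirement, a two-character term would also match records that contain the two characters far apart or in reverse order, so the query type matters as much as the tokenizer.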

The result: QueryString was the fastest of the Elasticsearch queries compared here.
