1. Online installation

Go directly into the container and work from inside it.
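
One way in, assuming the container is named elasticsearch (the same name the restart command uses later in this post):

docker exec -it elasticsearch /bin/bash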

# install directly with elasticsearch-plugin
elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/v6.7.0/elasticsearch-analysis-pinyin-6.7.0.zip

# or install through a GitHub mirror if the direct download is too slow
elasticsearch-plugin install https://github.91chifun.workers.dev//https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/v6.7.0/elasticsearch-analysis-pinyin-6.7.0.zip

Wait for the download to complete, then cd into the plugins directory and check that the pinyin plugin is there:

cd plugins/
ls
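
If the install succeeded, the listing should include the plugin's directory; with elasticsearch-plugin install it is typically named analysis-pinyin (exact name may vary by install method):

# example listing
analysis-pinyin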

If the pinyin plugin shows up in the listing, the installation is complete; restart ES and it is ready to use.
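
A minimal restart-and-verify sequence, assuming the container name elasticsearch and the host/port used in the test section below:

docker restart elasticsearch
# list the plugins loaded on each node; pinyin should appear
curl http://139.9.70.155:10092/_cat/plugins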

2. Offline installation

Download and install

# download the plugin package
wget https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/v6.7.0/elasticsearch-analysis-pinyin-6.7.0.zip
# unzip it into a pinyin/ directory
unzip elasticsearch-analysis-pinyin-6.7.0.zip -d ./pinyin/
# place the pinyin/ directory under the ES plugins path: /usr/share/elasticsearch/plugins/
# then restart the ES node
docker restart elasticsearch
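
If ES itself runs in a container, the copy step can be done from the host with docker cp before the restart (a sketch, assuming the container is named elasticsearch):

docker cp ./pinyin elasticsearch:/usr/share/elasticsearch/plugins/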

3. Testing

You can test with either curl or the Kibana Dev Tools console:

# curl way
curl -X POST -H "Content-Type: application/json" -d "{\"analyzer\": \"pinyin\", \"text\": \"刘德华\"}" http://139.9.70.155:10092/_analyze

# Kibana way
GET _analyze
{
  "text": "刘德华",
  "analyzer": "pinyin"
}

If pinyin tokens come back, the installation is working:

{
  "tokens" : [
    {
      "token" : "liu",
      "start_offset" : 0,
      "end_offset" : 0,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "de",
      "start_offset" : 0,
      "end_offset" : 0,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "hua",
      "start_offset" : 0,
      "end_offset" : 0,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "ldh",
      "start_offset" : 0,
      "end_offset" : 0,
      "type" : "word",
      "position" : 2
    }
  ]
}

Next, we combine pinyin with word segmentation for searching. First we create an index whose settings define custom analyzers: the ik_smart and ik_max_word tokenizers (from the IK plugin, which is assumed to be installed) are each combined with a pinyin token filter. The index gets 3 primary shards, each with one replica.

PUT /test_pinyin
{
  "settings": {
        "analysis": {
            "analyzer": {
                "ik_smart_pinyin": {
                    "type": "custom",
                    "tokenizer": "ik_smart",
                    "filter": ["my_pinyin", "word_delimiter"]
                },
                "ik_max_word_pinyin": {
                    "type": "custom",
                    "tokenizer": "ik_max_word",
                    "filter": ["my_pinyin", "word_delimiter"]
                }
            },
            "filter": {
                "my_pinyin": {
                    "type" : "pinyin",
                    "keep_separate_first_letter" : true,
                    "keep_full_pinyin" : true,
                    "keep_original" : true,
                    "first_letter": "prefix",
                    "limit_first_letter_length" : 16,
                    "lowercase" : true,
                    "remove_duplicated_term" : true 
                }
            }
        },
        "number_of_shards": 3,
        "number_of_replicas": 1
  }
}
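
Before wiring the new analyzers into a mapping, it is worth confirming they behave as expected. A quick check against the index (this relies on the IK plugin being installed, since ik_smart comes from it):

POST /test_pinyin/_analyze
{
  "analyzer": "ik_smart_pinyin",
  "text": "刘德华"
}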

Then we manually create the mapping for the test type, setting which analyzer each field uses:

PUT /test_pinyin/test/_mapping
{
    "properties": {
        "content": {
            "type": "text",
                        "analyzer": "ik_smart_pinyin",
                        "search_analyzer": "ik_smart_pinyin",
            "fields": {
                "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                }
            }
        },
        "age": {
            "type": "long"
        }
    }
}
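
To double-check the result, the mapping can be read back with the standard API:

GET /test_pinyin/_mapping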

Once the mapping is created, we index a couple of documents:

POST /test_pinyin/test/
{"content": "小米手机有点东西", "age": 18}

POST /test_pinyin/test/
{"content": "中华人民共和国有个刘德华", "age": 18}
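
ES refreshes indices roughly once per second, so a search issued immediately after indexing can come back empty; forcing a refresh avoids that (standard API):

POST /test_pinyin/_refresh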

Then we can start querying happily. First, search directly with the Chinese text, then with the pinyin forms:

POST /test_pinyin/test/_search {"query":{"match":{"content":" Andy Lau "}}} # POST /test_pinyin/test/_search {"query":{"match":{"content":" Andy Lau "}} # POST /test_pinyin/test/_search {"query":{"match":{"content":"liudehua"}} "Query ":{"match":{"content":" millet"}} # "query":{"match":{"content":" millet "}} "content":"xiaomi" } } }