"Small knowledge, big challenge!" This article is participating in the "Essentials for Programmers" creative activity.

To make Elasticsearch easier to work with, Kibana is installed here as a companion tool; Kibana is also what we use throughout the hands-on examples below.

Install Sense

Sense is a Kibana app with an interactive console that lets you submit requests to Elasticsearch directly from your browser. Many of the code examples in the online edition of the book include a View in Sense link; clicking it runs the code in the Sense console. You don't have to install Sense, but skipping it takes away much of the fun of following along with the book and experimenting with code directly against your local cluster.

Run the following command in the Kibana directory to download and install the Sense app (from version 5.0 on, the built-in Dev Tools console provides this functionality instead).

Writing requests from the command line

The GET method

curl -XGET 'http://localhost:9200/_count?pretty' -d '
{
    "query": {
        "match_all": {}
    }
}'
  1. The appropriate HTTP method or verb: GET, POST, PUT, HEAD, or DELETE.
  2. The protocol, host name, and port of any node in the cluster.
  3. The request path.
  4. Appending ?pretty to any request produces more readable JSON output.
  5. A JSON-encoded request body, if one is required.
Response content:

{
    "count" : 0,
    "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
    }
}
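The five parts listed above can be illustrated by assembling the same request by hand. This is only a sketch of how the pieces fit together (the host, port, and path are the same placeholders used in the example), not an Elasticsearch client:

```python
import json

# The five parts of an Elasticsearch curl request, assembled by hand.
method = "GET"                      # 1. HTTP verb
endpoint = "http://localhost:9200"  # 2. protocol, host name, and port of any node
path = "/_count"                    # 3. the request path
query = "?pretty"                   # 4. ?pretty for readable JSON output
body = json.dumps({"query": {"match_all": {}}})  # 5. JSON-encoded request body

url = endpoint + path + query
print(method, url)
print(body)
```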

Creating an employee directory

Imagine we are building a new employee directory system for the HR department of a company called Megacorp. The directory should support real-time collaboration, so it must meet the following requirements:

  • Data can contain multi-valued labels, numbers, and plain text.
  • All data for any employee can be retrieved.
  • Structured search is supported, for example finding employees over 30.
  • Simple full-text searches as well as more complex phrase searches are supported.
  • Keywords are highlighted in the matching documents returned.
  • There is a statistics and management backend.
Relational DB  ⇒ Databases ⇒ Tables ⇒ Rows      ⇒ Columns
Elasticsearch  ⇒ Indices   ⇒ Types  ⇒ Documents ⇒ Fields

So to create a list of employees, we need to do the following:

  • Index a document for each employee, each containing all of that employee's information.
  • Each document will be of type employee.
  • That type lives in the megacorp index.
  • That index is stored in the Elasticsearch cluster.

Creating the index

curl -XPUT 'http://localhost:9200/megacorp/employee/1?pretty' -d '
{
    "first_name" : "John",
    "last_name" : "Smith",
    "age" : 25,
    "about" : "I love to go rock climbing",
    "interests" : [ "sports", "music" ]
}'

curl -XPUT 'http://localhost:9200/megacorp/employee/2?pretty' -d '
{
    "first_name" : "Jane",
    "last_name" : "Smith",
    "age" : 32,
    "about" : "I like to collect rock albums",
    "interests" : [ "music" ]
}'

curl -XPUT 'http://localhost:9200/megacorp/employee/3?pretty' -d '
{
    "first_name" : "Douglas",
    "last_name" : "Fir",
    "age" : 35,
    "about" : "I like to build cabinets",
    "interests" : [ "forestry" ]
}'
{
  "_index" : "megacorp",
  "_type" : "employee",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : true
}

Here megacorp is the index name, employee is the type name, and 1 is the ID of this employee's document.

Retrieve the document

curl -XGET 'localhost:9200/megacorp/employee/1?pretty'

The returned content contains metadata information for the document, and John Smith’s original JSON document appears in the _source field:

{
    "_index" : "megacorp",
    "_type" : "employee",
    "_id" : "1",
    "_version" : 1,
    "found" : true,
    "_source" : {
        "first_name" : "John",
        "last_name" : "Smith",
        "age" : 25,
        "about" : "I love to go rock climbing",
        "interests" : [ "sports", "music" ]
    }
}

We can GET the document simply by changing the HTTP method from PUT to GET. Likewise, DELETE removes the document, and HEAD checks whether it exists. To replace an existing document, just issue the PUT request again.

Simple search

Search all employees:

curl -XGET 'localhost:9200/megacorp/employee/_search?pretty'

The response data

{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "first_name" : "Jane",
          "last_name" : "Smith",
          "age" : 32,
          "about" : "I like to collect rock albums",
          "interests" : [ "music" ]
        }
      },
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "first_name" : "John",
          "last_name" : "Smith",
          "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [ "sports", "music" ]
        }
      },
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "first_name" : "Douglas",
          "last_name" : "Fir",
          "age" : 35,
          "about" : "I like to build cabinets",
          "interests" : [ "forestry" ]
        }
      }
    ]
  }
}

Query string search

Shorthand:

GET /megacorp/employee/_search?q=last_name:Smith

The full form:

curl -XGET 'localhost:9200/megacorp/employee/_search?q=last_name:Fir&pretty'

(this one searches for the last_name Fir)
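Because the q= parameter lives in the URL itself, reserved characters such as the colon in last_name:Smith should be percent-encoded in real client code. A small illustration using Python's standard library (my own aside, not part of the original text):

```python
from urllib.parse import quote

# The q= parameter of a query-string search is part of the URL, so
# reserved characters must be percent-encoded before sending.
q = "last_name:Smith"
encoded = quote(q)
print(encoded)  # the colon becomes %3A
```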

Search using Query DSL

A query string is convenient for ad hoc searches from the command line, but it has its limitations (see the Search Limitations section). Elasticsearch provides a richer and more flexible query language called the Query DSL, which lets you build far more complex and powerful searches.

The query

curl -XGET '172.18.118.222:9200/megacorp/employee/_search?pretty' -d '
{
    "query" : {
        "match" : {
            "last_name" : "Smith"
        }
    }
}'

The query results

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "2",
        "_score" : 0.2876821,
        "_source" : {
          "first_name" : "Jane",
          "last_name" : "Smith",
          "age" : 32,
          "about" : "I like to collect rock albums",
          "interests" : [ "music" ]
        }
      },
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "first_name" : "John",
          "last_name" : "Smith",
          "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [ "sports", "music" ]
        }
      }
    ]
  }
}

More sophisticated searches

Next, let’s make the search a little harder. We’re still looking for someone with the last name Smith, but we’ll add a qualification that they’re older than 30. Our query statement will have some minor tweaks to recognize the filter qualifier for structured search:

curl -XGET 'localhost:9200/megacorp/employee/_search?pretty' -d '
{
    "query" : {
        "filtered" : {
            "filter" : {
                "range" : {
                    "age" : { "gt" : 30 }
                }
            },
            "query" : {
                "match" : { "last_name" : "Smith" }
            }
        }
    }
}'

The new part here is the range filter, which keeps only documents whose age is greater than 30 (gt stands for "greater than").

On ES 5.0 and later, however, this request fails with: no [query] registered for [filtered]

Workaround: the filtered query was deprecated and then removed in ES 5.0. Use a bool query with must and filter clauses instead.

curl -XPOST '172.18.118.222:9200/megacorp/employee/_search?pretty' -d '
{
    "query" : {
        "bool" : {
            "filter" : {
                "range" : {
                    "age" : { "gt" : 20 }
                }
            },
            "must" : {
                "match" : { "last_name" : "Smith" }
            }
        }
    }
}'

The results:

{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "2",
        "_score" : 0.2876821,
        "_source" : {
          "first_name" : "Jane",
          "last_name" : "Smith",
          "age" : 32,
          "about" : "I like to collect rock albums",
          "interests" : [ "music" ]
        }
      },
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "first_name" : "John",
          "last_name" : "Smith",
          "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [ "sports", "music" ]
        }
      }
    ]
  }
}

Full-text search

curl -XPOST '172.18.118.222:9200/megacorp/employee/_search?pretty' -d '
{
    "query" : {
        "match" : {
            "about" : "rock climbing"
        }
    }
}'

The results:

{
  "took" : 25,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.53484553,
    "hits" : [
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "1",
        "_score" : 0.53484553,
        "_source" : {
          "first_name" : "John",
          "last_name" : "Smith",
          "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [ "sports", "music" ]
        }
      },
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "2",
        "_score" : 0.26742277,
        "_source" : {
          "first_name" : "Jane",
          "last_name" : "Smith",
          "age" : 32,
          "about" : "I like to collect rock albums",
          "interests" : [ "music" ]
        }
      }
    ]
  }
}

Notice that we again used the match query, this time to search for "rock climbing" in the about field. We get two matching documents:

Elasticsearch orders results by relevance. In the first result, John Smith's about field explicitly contains rock climbing. Jane Smith's about field mentions rock but not climbing, so her _score is lower than his.
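The real scores come from Lucene's relevance formula (TF-IDF/BM25), but the intuition can be sketched with a toy term-overlap score over the two sample documents. This is my own simplification, not Elasticsearch's algorithm:

```python
# Toy relevance: count how many query terms appear in each document's
# "about" field. Real Elasticsearch scoring (BM25) is far more nuanced.
docs = {
    "John Smith": "I love to go rock climbing",
    "Jane Smith": "I like to collect rock albums",
}
query_terms = {"rock", "climbing"}

def toy_score(text):
    # Number of distinct query terms present in the text.
    return len(query_terms & set(text.lower().split()))

scores = {name: toy_score(about) for name, about in docs.items()}
print(scores)  # John matches both terms, Jane only "rock"
```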

Phrase search

curl -XPOST '172.18.118.222:9200/megacorp/employee/_search?pretty' -d '
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    }
}'

The results:

{
  "took" : 23,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.53484553,
    "hits" : [
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "1",
        "_score" : 0.53484553,
        "_source" : {
          "first_name" : "John",
          "last_name" : "Smith",
          "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [ "sports", "music" ]
        }
      }
    ]
  }
}

Highlighting search results

curl -XPOST '172.18.118.222:9200/megacorp/employee/_search?pretty' -d '
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    },
    "highlight" : {
        "fields" : {
            "about" : {}
        }
    }
}'

The results:

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.53484553,
    "hits" : [
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "1",
        "_score" : 0.53484553,
        "_source" : {
          "first_name" : "John",
          "last_name" : "Smith",
          "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [ "sports", "music" ]
        },
        "highlight" : {
          "about" : [
            "I love to go <em>rock</em> <em>climbing</em>"
          ]
        }
      }
    ]
  }
}

Statistics

Finally, we have one more requirement to complete: let the boss run statistics over the employee directory. Elasticsearch calls this feature aggregations, and it lets us perform complex analytics over the data. It is somewhat similar to GROUP BY in SQL, but far more powerful.

For example, let’s find out what the most popular interests among employees are:

curl -XPOST '172.18.118.222:9200/megacorp/employee/_search?pretty' -d '
{
    "aggs" : {
        "all_interests" : {
            "terms" : { "field" : "interests" }
        }
    }
}'

On newer versions this request fails with:

Fielddata is disabled on text fields by default. Set fielddata=true on [interests] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.

(Fielddata can consume a lot of heap memory, especially for text fields, and once loaded it stays resident. The safer fix is to aggregate on the keyword sub-field.)
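The interests.keyword sub-field used in the corrected query below exists because Elasticsearch's dynamic mapping indexes a string value twice: as an analyzed text field for full-text search, and as a keyword sub-field for exact matching and aggregations. A sketch of the shape of that default mapping (the well-known default, not output copied from this cluster):

```python
import json

# Approximate shape of the dynamic mapping Elasticsearch generates for a
# string field such as "interests": a text field plus a keyword sub-field.
interests_mapping = {
    "interests": {
        "type": "text",
        "fields": {
            "keyword": {"type": "keyword", "ignore_above": 256}
        }
    }
}
print(json.dumps(interests_mapping, indent=2))
```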

curl -XPOST '172.18.118.222:9200/megacorp/employee/_search?pretty' -d '
{
    "aggs" : {
        "all_interests" : {
            "terms" : { "field" : "interests.keyword" }
        }
    }
}'

Results (excerpts)

"aggregations" : {
    "all_interests" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        { "key" : "music",    "doc_count" : 2 },
        { "key" : "forestry", "doc_count" : 1 },
        { "key" : "sports",   "doc_count" : 1 }
      ]
    }
}
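The terms aggregation behaves like SQL's GROUP BY with a count: one bucket per distinct interest. Over the three sample employees indexed earlier, the same buckets can be reproduced in plain Python (an analogy only, not what Elasticsearch does internally):

```python
from collections import Counter

# Interests of the three sample employees indexed earlier.
employees = [
    {"first_name": "John",    "interests": ["sports", "music"]},
    {"first_name": "Jane",    "interests": ["music"]},
    {"first_name": "Douglas", "interests": ["forestry"]},
]

# Equivalent of the terms aggregation: count documents per distinct interest.
buckets = Counter(i for e in employees for i in e["interests"])
print(buckets.most_common())  # music: 2, sports: 1, forestry: 1
```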

Combining queries and aggregations

curl -XGET '172.18.118.222:9200/megacorp/employee/_search?pretty' -d '
{
    "query" : {
        "match" : { "last_name" : "Smith" }
    },
    "aggs" : {
        "all_interests" : {
            "terms" : { "field" : "interests.keyword" }
        }
    }
}'

The results:

{
  "took" : 11,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "2",
        "_score" : 0.2876821,
        "_source" : {
          "first_name" : "Jane",
          "last_name" : "Smith",
          "age" : 32,
          "about" : "I like to collect rock albums",
          "interests" : [ "music" ]
        }
      },
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "first_name" : "John",
          "last_name" : "Smith",
          "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [ "sports", "music" ]
        }
      }
    ]
  },
  "aggregations" : {
    "all_interests" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        { "key" : "music",  "doc_count" : 2 },
        { "key" : "sports", "doc_count" : 1 }
      ]
    }
  }
}

Aggregations also support multiple levels of statistics. For example, we can compute the average age for each interest:

curl -XGET '172.18.118.222:9200/megacorp/employee/_search?pretty' -d '
{
    "aggs" : {
        "all_interests" : {
            "terms" : { "field" : "interests.keyword" },
            "aggs" : {
                "avg_age" : {
                    "avg" : { "field" : "age" }
                }
            }
        }
    }
}'

The results:

"aggregations" : {
    "all_interests" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        { "key" : "music",    "doc_count" : 2, "avg_age" : { "value" : 28.5 } },
        { "key" : "forestry", "doc_count" : 1, "avg_age" : { "value" : 35.0 } },
        { "key" : "sports",   "doc_count" : 1, "avg_age" : { "value" : 25.0 } }
      ]
    }
}
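The nested avg sub-aggregation computes, for each bucket, the mean age of the documents in that bucket. Reproducing the numbers by hand over the three sample employees confirms the values above (a plain-Python analogy, not the server-side implementation):

```python
from collections import defaultdict

# Ages and interests of the three sample employees indexed earlier.
employees = [
    {"age": 25, "interests": ["sports", "music"]},
    {"age": 32, "interests": ["music"]},
    {"age": 35, "interests": ["forestry"]},
]

# Group ages by interest, then average each bucket,
# mirroring the terms + avg aggregation.
ages = defaultdict(list)
for e in employees:
    for i in e["interests"]:
        ages[i].append(e["age"])

avg_age = {i: sum(v) / len(v) for i, v in ages.items()}
print(avg_age)  # music: 28.5, forestry: 35.0, sports: 25.0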

Elasticsearch as a distributed system

Elasticsearch hides the complexity of distributed systems from you; many operations happen automatically:

  • Your documents are partitioned into different containers, called shards, which may live on a single node or on several.
  • Indexing and search load is balanced across the nodes in the cluster.
  • Your data is automatically replicated to provide redundant copies, preventing data loss after hardware failure.
  • Requests are automatically routed between nodes to find the data you want.
  • The cluster scales out, or recovers, seamlessly.

An empty cluster

  • A node is a running instance of Elasticsearch, while a cluster consists of one or more nodes with the same cluster.name that work together, share data, and share the workload. As nodes join or leave, the cluster reorganizes itself to spread the data evenly.
  • One node in the cluster is elected as the master node, which is responsible for cluster-wide changes such as creating or deleting an index, or adding and removing nodes. The master does not need to be involved in document-level changes and searches, so a single master is not a bottleneck as traffic grows. Any node can become the master. Our example cluster has only one node, so that node acts as the master.
  • As users, we can talk to any node in the cluster, including the master. Every node knows where each document lives and can forward our request directly to the nodes that hold the data we need. Whichever node we talk to coordinates collecting the responses from the data nodes and returning the final result to the client. All of this is managed transparently by Elasticsearch.

Cluster health

There are many things to monitor in an Elasticsearch cluster, but the single most important one is cluster health, whose status is reported as green, yellow, or red.

GET /_cluster/health

{
    "cluster_name" : "elasticsearch",
    "status" : "green",
    "timed_out" : false,
    "number_of_nodes" : 1,
    "number_of_data_nodes" : 1,
    "active_primary_shards" : 0,
    "active_shards" : 0,
    "relocating_shards" : 0,
    "initializing_shards" : 0,
    "unassigned_shards" : 0
}

Status is the field we should focus on the most.

Status  Meaning
green   All primary and replica shards are available
yellow  All primary shards are available, but at least one replica shard is not
red     At least one primary shard is unavailable

Creating an index

PUT /blogs
{
    "settings" : {
        "number_of_shards" : 3,
        "number_of_replicas" : 1
    }
}
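With these settings the index will consist of 3 primary shards plus one replica of each, i.e. 6 shards in total once the replicas can be allocated. A quick sanity check of that arithmetic (simple math, not an API call):

```python
number_of_shards = 3     # primary shards, from the settings above
number_of_replicas = 1   # replica copies of each primary

# Total shards = primaries plus one full replica set per replica count.
total_shards = number_of_shards * (1 + number_of_replicas)
print(total_shards)  # 3 primaries + 3 replicas = 6 shards
```

Note that on a single-node cluster the replicas have nowhere to go, which is why such a cluster reports yellow health.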