Searching in Elasticsearch works differently from searching in a traditional RDBMS: Elasticsearch cannot use joins to combine and search two different indexes. Searching across multiple indexes is limited to a request like this:

GET index1,index2,other_index*/_search

However, this request does not make the search results any more relevant, because each index is still searched independently. In practice, we often want to take a value stored in one index and use it as a parameter for a search on another index. In other words, the search term of the second search is dynamic, not fixed. For example, in the following search, I want the term blue to come from another index rather than being hard-coded:

GET my-index-000001/_search
{
  "query": {
    "term": {
      "color": {
        "value": "blue"
      }
    }
  }
}

We want to do this with a single search request. How can we do that?

In today’s article, I’ll use the terms lookup query to show you how.

 

What is Terms Lookup?

Terms lookup fetches the field values of an existing document. Elasticsearch then uses these values as search terms. This can be helpful when searching for a large number of terms. Because terms lookup fetches values from a document, the _source mapping field must be enabled to use it. The _source field is enabled by default.

Note: By default, Elasticsearch limits a terms query to a maximum of 65,536 terms. This includes terms fetched using terms lookup. You can change this limit with the index.max_terms_count setting.
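As a rough illustration of the note above, the following Python sketch builds the settings body that raises the limit. The helper name max_terms_settings and the value 100000 are assumptions made for this example; the body would be sent with a request such as PUT my-index-000001/_settings against the index being searched.

```python
import json

# Default limit quoted in the note above.
DEFAULT_MAX_TERMS = 65536


def max_terms_settings(limit: int) -> dict:
    """Build a settings body for index.max_terms_count (hypothetical helper)."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return {"index": {"max_terms_count": limit}}


# 100000 is an arbitrary example value, not a recommendation.
settings_body = max_terms_settings(100000)
print(json.dumps(settings_body, indent=2))
```

Raising the limit makes very large lookups possible, but every term still has to be matched, so it is worth checking whether a lookup document really needs that many values.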

 

To perform a terms lookup, use the following parameters:

index

(Required, string) The name of the index from which to get the field value.

id

(Required, string) The ID of the document from which to get the field value.

path

(Required, string) The name of the field from which to get the field values. Elasticsearch uses these values as the search terms for the query. If the field value includes an array of nested inner objects, you can access those objects using dot notation syntax.

routing

(Optional, string) The custom routing value of the document from which to get the term values. This parameter is required if a custom routing value was provided when the document was indexed.
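The four parameters above map directly onto the body of a terms query. Here is a minimal Python sketch that assembles such a body; the function name terms_lookup_query and the sample values are assumptions for illustration, and the code only builds the JSON without contacting a cluster.

```python
import json
from typing import Any, Dict, Optional


def terms_lookup_query(field: str, index: str, doc_id: str, path: str,
                       routing: Optional[str] = None) -> Dict[str, Any]:
    """Build a search body that uses terms lookup (hypothetical helper)."""
    lookup: Dict[str, Any] = {"index": index, "id": doc_id, "path": path}
    # routing is optional and is only added when a value is supplied
    if routing is not None:
        lookup["routing"] = routing
    return {"query": {"terms": {field: lookup}}}


# Sample values chosen for illustration only.
body = terms_lookup_query("color", "my-index-000002", "1", "favorite_color")
print(json.dumps(body, indent=2))
```

The field passed as the first argument is the field being searched, while index, id, and path describe where the terms come from.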

 

Terms lookup example

To see how term lookup works, try the following example.

We create two different indexes as follows:

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "color": { "type": "keyword" }
    }
  }
}

PUT my-index-000002
{
  "mappings": {
    "properties": {
      "favorite_color": { "type": "keyword" }
    }
  }
}

We then index some documents into these two indexes with the following bulk request:

POST _bulk
{ "index" : { "_index" : "my-index-000001", "_id" : "1" } }
{ "color" : ["blue", "green"] }
{ "index" : { "_index" : "my-index-000001", "_id" : "2" } }
{ "color" : ["blue"] }
{ "index" : { "_index" : "my-index-000002", "_id" : "1" } }
{ "favorite_color" : "blue" }

Above, we create two documents for index my-index-000001 and one for index my-index-000002.

For a normal search that finds all documents in my-index-000001 whose color is blue, we can use the following command:

GET my-index-000001/_search
{
  "query": {
    "match": {
      "color": "blue"
    }
  }
}

Or:

GET my-index-000001/_search
{
  "query": {
    "term": {
      "color": {
        "value": "blue"
      }
    }
  }
}

The command above will return the following result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.21110919,
    "hits" : [
      {
        "_index" : "my-index-000001",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.21110919,
        "_source" : {
          "color" : ["blue", "green"]
        }
      },
      {
        "_index" : "my-index-000001",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.21110919,
        "_source" : {
          "color" : ["blue"]
        }
      }
    ]
  }
}

Above, we hard-coded the term blue in the search command. Now suppose the term is not fixed but needs to change dynamically, and its value is stored in another index. How do we run this search? We can use the terms lookup query, which is written like this:

GET my-index-000001/_search
{
  "query": {
    "terms": {
      "color": {
        "index": "my-index-000002",
        "id": "1",
        "path": "favorite_color"
      }
    }
  }
}

Above, we fetch the document with ID "1" from the my-index-000002 index and use favorite_color as the path. In that document, the value of favorite_color is blue, so blue is used as the term for our query. The query returns the following result:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my-index-000001",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "color" : ["blue", "green"]
        }
      },
      {
        "_index" : "my-index-000001",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "color" : ["blue"]
        }
      }
    ]
  }
}

This is the same result as our previous query.

Next, we run the same query again, but first we modify the document with ID "1" in the my-index-000002 index:

PUT my-index-000002/_doc/1
{
  "favorite_color": "green"
}

This changes the favorite_color of the document to green. We then run the same query:

GET my-index-000001/_search
{
  "query": {
    "terms": {
      "color": {
        "index": "my-index-000002",
        "id": "1",
        "path": "favorite_color"
      }
    }
  }
}

The command above returns the following result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my-index-000001",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "color" : ["blue", "green"]
        }
      }
    ]
  }
}

Only one document matches this time. The query correctly read the updated value from my-index-000002 and returned the documents that match it.
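Conceptually, terms lookup behaves like two steps: read the values at path from the lookup document, then run an ordinary terms query with them. The following Python sketch imitates that behavior on in-memory dictionaries. It is a simplified model for illustration, not how Elasticsearch implements the feature, and the names extract_terms and terms_match are assumptions.

```python
from typing import Any, Dict, List


def extract_terms(source: Dict[str, Any], path: str) -> List[Any]:
    """Collect the values found at a dot-notation path in a document source."""
    current: List[Any] = [source]
    for key in path.split("."):
        next_level: List[Any] = []
        for node in current:
            if isinstance(node, dict) and key in node:
                value = node[key]
                # Arrays are flattened so that each element is visited.
                next_level.extend(value if isinstance(value, list) else [value])
        current = next_level
    return current


def terms_match(doc: Dict[str, Any], field: str, terms: List[Any]) -> bool:
    """Return True if any value of the field equals one of the terms."""
    values = doc.get(field, [])
    if not isinstance(values, list):
        values = [values]
    return any(value in terms for value in values)


# In-memory stand-ins for the two indexes after the update to green.
colors = {"1": {"color": ["blue", "green"]}, "2": {"color": ["blue"]}}
favorites = {"1": {"favorite_color": "green"}}

terms = extract_terms(favorites["1"], "favorite_color")
hits = [doc_id for doc_id, doc in colors.items() if terms_match(doc, "color", terms)]
print(hits)  # ['1']
```

Changing the stored favorite_color back to blue in this model makes both documents match again, which mirrors the two results shown earlier.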

 

A practical example

Let’s say a website remembers each user’s login name. Each user’s profile stores the book categories that the user prefers, for example:

PUT user-profiles/_doc/alex
{
  "preferred_categories" : ["technology"]
}

The user, Alex, likes technology books. The site also has a books index containing the following documents:

PUT books/_bulk
{"index":{"_id":"elasticsearch-definitive-guide"}}
{"name":"Elasticsearch - The definitive guide","category":"technology"}
{"index":{"_id":"seven-databases"}}
{"name":"Seven Databases in Seven Weeks","category":"technology"}
{"index":{"_id":"seven-threads"}}
{"name":"Seven Concurrency Models in Seven Weeks","category":"technology"}
{"index":{"_id":"hell-week"}}
{"name":"Seven days to be your best self","category":"motivational"}
{"index":{"_id":"seven-ways"}}
{"name":"Seven Ways: Easy Ideas for Every Day of the Week","category":"cookbooks"}
{"index":{"_id":"seven-book"}}
{"name":"Seven: A journey from seven to seventy-seven","category":"numberphile"}

This raises a few questions.

Question 1

Search for all books with seven in the title. This one is quite simple:

GET books/_search
{
  "query": {
    "match": {
      "name": "seven"
    }
  }
}

The results are as follows:

{
  "took" : 661,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : 0.36339492,
    "hits" : [
      {
        "_index" : "books",
        "_type" : "_doc",
        "_id" : "seven-book",
        "_score" : 0.36339492,
        "_source" : {
          "name" : "Seven: A journey from seven to seventy-seven",
          "category" : "numberphile"
        }
      },
      {
        "_index" : "books",
        "_type" : "_doc",
        "_id" : "seven-databases",
        "_score" : 0.35667667,
        "_source" : {
          "name" : "Seven Databases in Seven Weeks",
          "category" : "technology"
        }
      },
      {
        "_index" : "books",
        "_type" : "_doc",
        "_id" : "seven-threads",
        "_score" : 0.3411939,
        "_source" : {
          "name" : "Seven Concurrency Models in Seven Weeks",
          "category" : "technology"
        }
      },
      {
        "_index" : "books",
        "_type" : "_doc",
        "_id" : "hell-week",
        "_score" : 0.23632807,
        "_source" : {
          "name" : "Seven days to be your best self",
          "category" : "motivational"
        }
      },
      {
        "_index" : "books",
        "_type" : "_doc",
        "_id" : "seven-ways",
        "_score" : 0.20021,
        "_source" : {
          "name" : "Seven Ways: Easy Ideas for Every Day of the Week",
          "category" : "cookbooks"
        }
      }
    ]
  }
}

These results show that, ranked by relevance alone, the technology books are not at the top of the list. Because the user Alex likes technology books, we want the titles in the technology category to appear first when he logs in.

Question 2

How do we rank all technology books first to improve relevance?

We can use the following command:

GET books/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "seven"
          }
        }
      ],
      "should": [
        {
          "match": {
            "category": "technology"
          }
        }
      ]
    }
  }
}

The command above returns the following hits:

"hits" : {
  "total" : {
    "value" : 5,
    "relation" : "eq"
  },
  "max_score" : 1.0498238,
  "hits" : [
    {
      "_index" : "books",
      "_type" : "_doc",
      "_id" : "seven-databases",
      "_score" : 1.0498238,
      "_source" : {
        "name" : "Seven Databases in Seven Weeks",
        "category" : "technology"
      }
    },
    {
      "_index" : "books",
      "_type" : "_doc",
      "_id" : "seven-threads",
      "_score" : 1.0343411,
      "_source" : {
        "name" : "Seven Concurrency Models in Seven Weeks",
        "category" : "technology"
      }
    },
    {
      "_index" : "books",
      "_type" : "_doc",
      "_id" : "seven-book",
      "_score" : 0.36339492,
      "_source" : {
        "name" : "Seven: A journey from seven to seventy-seven",
        "category" : "numberphile"
      }
    },
    {
      "_index" : "books",
      "_type" : "_doc",
      "_id" : "hell-week",
      "_score" : 0.23632807,
      "_source" : {
        "name" : "Seven days to be your best self",
        "category" : "motivational"
      }
    },
    {
      "_index" : "books",
      "_type" : "_doc",
      "_id" : "seven-ways",
      "_score" : 0.20021,
      "_source" : {
        "name" : "Seven Ways: Easy Ideas for Every Day of the Week",
        "category" : "cookbooks"
      }
    }
  ]
}

As can be seen from the above results, we have achieved our goal. All the technology books are at the top of the list.
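The effect of the should clause can be checked with a little arithmetic on the scores reported above. In a bool query, the scores of the matching must and should clauses are added together, so each technology book should gain the same amount. The Python sketch below subtracts the match-only score from the boosted score for the two technology books; it assumes both result listings come from the same index state.

```python
# Scores copied from the two result listings above.
match_only = {"seven-databases": 0.35667667, "seven-threads": 0.3411939}
with_should = {"seven-databases": 1.0498238, "seven-threads": 1.0343411}

# Contribution of the should clause, rounded to absorb floating-point noise.
boosts = {
    doc_id: round(with_should[doc_id] - match_only[doc_id], 4)
    for doc_id in match_only
}
print(boosts)  # {'seven-databases': 0.6931, 'seven-threads': 0.6931}
```

Both books gain about 0.6931, while the scores of the three books in other categories are unchanged between the two listings. That constant offset is what moves the technology titles to the top.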

Question 3

Alex likes technology, but Tom or another user may prefer a different category. We cannot hard-code technology as the boosted category in every search. Instead, the boost must be driven by each user’s profile.

To solve this problem, we can use the terms lookup query:

GET books/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "seven"
          }
        }
      ],
      "should": [
        {
          "terms": {
            "category": {
              "index": "user-profiles",
              "id": "alex",
              "path": "preferred_categories"
            }
          }
        }
      ]
    }
  }
}

When a user logs in, we only need to substitute that user’s ID in the query above. The query then reads the user’s preferred categories from the preferred_categories field of the profile document, so the user interface can show the user’s favorite content first. Run the command above and you will see the same result as in Question 2.
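Swapping in the ID per user can be sketched as a small Python function that builds the request body. The function name personalized_search is an assumption made for this example, as is the idea that the application passes the logged-in user's profile document ID; the code only builds the JSON and does not send it.

```python
import json
from typing import Any, Dict


def personalized_search(text: str, user_id: str) -> Dict[str, Any]:
    """Build a books search that boosts the user's preferred categories."""
    return {
        "query": {
            "bool": {
                # Every hit must match the search text in the book name.
                "must": [{"match": {"name": text}}],
                # Books in the user's preferred categories score higher.
                "should": [
                    {
                        "terms": {
                            "category": {
                                "index": "user-profiles",
                                "id": user_id,
                                "path": "preferred_categories",
                            }
                        }
                    }
                ],
            }
        }
    }


alex_body = personalized_search("seven", "alex")
print(json.dumps(alex_body, indent=2))
```

Because only the id value changes between users, the same function serves every profile stored in user-profiles.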

 

Conclusion

The terms lookup query is useful whenever the terms for a search on one index are stored in a document in another index. It is also very useful when searching for a large number of terms!