
Elasticsearch also supports searching by group, that is, a group-by style search. This is called aggregated search.

1. What is aggregated search

An aggregation summarizes your data as metrics, statistics, or other analytics. Aggregations are a good way to answer questions such as:

  • What is the average load time of my site?
  • Who is my most valuable customer based on transaction volume?
  • What is considered a large file on the network?
  • How many products are there in each product category?

Elasticsearch divides aggregations into three categories:

  • Metric aggregations, which calculate metrics (such as a sum or an average) from field values.
  • Bucket aggregations, which group documents into buckets (also called bins) based on field values, ranges, or other criteria.
  • Pipeline aggregations, which take their input from other aggregations rather than from documents or fields.
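As a rough in-memory analogy, the three categories can be pictured in plain Java. This sketch does not call Elasticsearch at all; the class name, the Book type, and the sample data are hypothetical and only illustrate the idea: a bucket step groups documents by a key, a metric step reduces each bucket to a number, and a pipeline step works on those per-bucket numbers rather than on the documents.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class AggregationCategoriesSketch {

    /** Hypothetical stand-in for an indexed document. */
    static final class Book {
        final int categoryId;
        final double price;

        Book(int categoryId, double price) {
            this.categoryId = categoryId;
            this.price = price;
        }
    }

    // Bucket + metric: group books by categoryId, then sum the prices in each bucket.
    static Map<Integer, Double> sumPerCategory(List<Book> books) {
        return books.stream().collect(Collectors.groupingBy(
                book -> book.categoryId,
                TreeMap::new,
                Collectors.summingDouble(book -> book.price)));
    }

    // Pipeline: consumes the per-bucket sums, not the books themselves.
    static double averageOfSums(Map<Integer, Double> sums) {
        return sums.values().stream()
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(0.0);
    }

    public static void main(String[] args) {
        List<Book> books = Arrays.asList(
                new Book(1, 10.0), new Book(1, 20.0), new Book(2, 10.0));
        Map<Integer, Double> sums = sumPerCategory(books);
        System.out.println("sum per category: " + sums);
        System.out.println("average of sums: " + averageOfSums(sums));
    }
}
```

In Elasticsearch the same work happens on the server, across shards; the sketch only mirrors the data flow between the three kinds of aggregation.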

Let’s take a look at how to implement this aggregation.

2. Basic DSL syntax

Let’s first insert a few pieces of data into the book index, and then see what happens.

2.1 Metric aggregation

For example, if we want to get the average of the prices in the index, we can write it like this:

GET /book/_search
{
    "aggs": {
        "avg_bucket": {
            "avg": {
                "field": "price"
            }
        }
    }
}

aggs is short for aggregations and marks the aggregation part of a request, just as query marked the regular search part in the earlier searches. avg_bucket is a name we choose ourselves for this aggregation. The avg key one level down is the calculation to perform; the common metric keywords are avg, sum, max, and min. The field key one level below that names the field to calculate on. The request returns the average price of all books:

{
    "took": 62,
    "timed_out": false,
    "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
    },
    "hits": { ... },
    "aggregations": {
        "avg_bucket": {
            "value": 65.64846185537485
        }
    }
}
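To make the four metric keywords concrete, here is a minimal plain-Java sketch of what avg, sum, max, and min compute over a list of prices. It is an in-memory illustration only, not the Elasticsearch API; the class name and the sample prices are hypothetical.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;

final class MetricAggregationSketch {

    // Computes count, sum, min, max and average in a single pass over the prices.
    static DoubleSummaryStatistics priceStats(List<Double> prices) {
        return prices.stream()
                .mapToDouble(Double::doubleValue)
                .summaryStatistics();
    }

    public static void main(String[] args) {
        // Hypothetical prices, standing in for the "price" field of the book index.
        DoubleSummaryStatistics stats = priceStats(Arrays.asList(10.0, 20.0, 30.0));
        System.out.println("avg = " + stats.getAverage());
        System.out.println("sum = " + stats.getSum());
        System.out.println("max = " + stats.getMax());
        System.out.println("min = " + stats.getMin());
    }
}
```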

What if we only want to return the results of our calculation, but not the hits value of our search? Set the search size to 0:

GET /book/_search
{
    "size": 0,
    "aggs": {
        "avg_bucket": {
            "avg": {
                "field": "price"
            }
        }
    }
}

You can get the following result:

{
    "took": 43,
    "timed_out": false,
    "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 13,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "avg_bucket": {
            "value": 65.64846185537485
        }
    }
}

2.2 Bucket aggregation

Now let's do a more complex bucket aggregation. Suppose we want to see which book is the most expensive in each book category. The basic logic is to group the books by category first, and then take the top 1 book by price within each category. This requires a bucket aggregation:

GET /book/_search
{
    "size": 0,
    "aggs": {
        "group": {
            "terms": {
                "field": "categoryId"
            },
            "aggs": {
                "price_top_1": {
                    "top_hits": {
                        "size": 1,
                        // Note: to keep the response small, we return only these fields
                        "_source": {
                            "includes": ["id", "bookName", "categoryId", "categoryName", "price"]
                        },
                        "sort": [
                            { "price": { "order": "desc" } }
                        ]
                    }
                }
            }
        }
    }
}

Let's go through it layer by layer. The outermost aggs holds the top-level aggregation, which we named group. terms means we group documents by a field, here categoryId. The inner aggs is a sub-aggregation that runs on the documents inside each bucket. Since we want only one document per category rather than all of them, we use top_hits: price_top_1 is the name we chose for this sub-aggregation, and top_hits is the aggregation type, which returns some of the matching documents. The properties inside top_hits work like those of a regular search: size is 1 and the sort is by price in descending order, so we keep only the most expensive book. The final result is:

{
    "took": 23,
    "timed_out": false,
    "_shards": { "total": 3, "successful": 3, "skipped": 0, "failed": 0 },
    "hits": {
        // total number of documents
        "total": { "value": 13, "relation": "eq" },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "group": {
            "buckets": [
                {
                    "key": 1,
                    "doc_count": 6,
                    // result of our custom top 1 aggregation
                    "price_top_1": {
                        "hits": {
                            // number of documents in this bucket
                            "total": { "value": 6, "relation": "eq" },
                            "max_score": null,
                            // the most expensive book in category 1
                            "hits": [
                                {
                                    "_index": "book",
                                    "_type": "_doc",
                                    "_id": "3",
                                    "_score": null,
                                    "_source": { "price": 88, "id": 3, "bookName": "Java programming ideas", "categoryName": "textbook", "categoryId": 1 }
                                }
                            ]
                        }
                    }
                },
                {
                    "key": 3,
                    "doc_count": 4,
                    "price_top_1": {
                        "hits": {
                            "total": { "value": 4, "relation": "eq" },
                            "max_score": null,
                            "hits": [
                                {
                                    "_index": "book",
                                    "_type": "_doc",
                                    "_id": "12",
                                    "_source": { "price": 99.68, "id": 12, "bookName": "zhou enlai biography", "categoryName": "biography", "categoryId": 3 }
                                }
                            ]
                        }
                    }
                },
                {
                    "key": 2,
                    "doc_count": 3,
                    "price_top_1": {
                        "hits": {
                            "total": { "value": 3, "relation": "eq" },
                            "max_score": null,
                            "hits": [
                                {
                                    "_index": "book",
                                    "_type": "_doc",
                                    "_id": "8",
                                    "_source": { "price": 40, "id": 8, "bookName": "giant", "categoryName": "novel", "categoryId": 2 }
                                }
                            ]
                        }
                    }
                }
            ]
        }
    }
}

Finally, we got the grouping results and the data we wanted.
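The combination of terms and top_hits with size 1 can be pictured in plain Java as "group by category, keep the most expensive book per group". The following is an in-memory sketch only, not the Elasticsearch API; the class name, the Book type, and the sample data are hypothetical. Unlike the response above, the sketch orders the groups by category ID.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class TopHitPerCategorySketch {

    /** Hypothetical stand-in for a document of the book index. */
    static final class Book {
        final int categoryId;
        final String bookName;
        final double price;

        Book(int categoryId, String bookName, double price) {
            this.categoryId = categoryId;
            this.bookName = bookName;
            this.price = price;
        }
    }

    // Groups by categoryId and keeps only the highest-priced book of each group.
    // On equal prices the book seen first wins.
    static Map<Integer, Book> mostExpensivePerCategory(List<Book> books) {
        return books.stream().collect(Collectors.toMap(
                book -> book.categoryId,
                book -> book,
                (a, b) -> a.price >= b.price ? a : b,
                TreeMap::new));
    }

    public static void main(String[] args) {
        List<Book> books = Arrays.asList(
                new Book(1, "A", 88.0),
                new Book(1, "B", 50.0),
                new Book(2, "C", 40.0),
                new Book(3, "D", 99.68));
        mostExpensivePerCategory(books).forEach((categoryId, book) ->
                System.out.println(categoryId + " -> " + book.bookName + " (" + book.price + ")"));
    }
}
```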

Note: as you may have noticed, the buckets in the result are not sorted by category ID. A terms aggregation can sort its buckets by two criteria. One is doc_count (written _count in the order parameter), the number of documents in the bucket; this is the default, in descending order, so buckets with more documents come first. The other is _key, the bucket key, which sorts in natural order, that is, numerically or lexicographically. So how do you define a custom sort for an aggregation? You can explore this for yourself.
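The two sort criteria can be illustrated with a small plain-Java sketch. It only mimics the ordering in memory and does not use the Elasticsearch client; the class and method names are hypothetical. With the document counts from the response above (6, 3, and 4 for categories 1, 2, and 3), ordering by count in descending order yields the keys 1, 3, 2, which is the order the buckets came back in.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class BucketOrderSketch {

    // Orders bucket keys by document count, descending; ties are broken by key, ascending.
    static List<Integer> orderByCountDesc(Map<Integer, Long> docCounts) {
        List<Map.Entry<Integer, Long>> entries = new ArrayList<>(docCounts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : Integer.compare(a.getKey(), b.getKey());
        });
        List<Integer> keys = new ArrayList<>();
        for (Map.Entry<Integer, Long> entry : entries) {
            keys.add(entry.getKey());
        }
        return keys;
    }

    // Orders bucket keys by the key itself, ascending.
    static List<Integer> orderByKeyAsc(Map<Integer, Long> docCounts) {
        List<Integer> keys = new ArrayList<>(docCounts.keySet());
        keys.sort(Integer::compare);
        return keys;
    }

    public static void main(String[] args) {
        Map<Integer, Long> docCounts = new HashMap<>();
        docCounts.put(1, 6L);
        docCounts.put(2, 3L);
        docCounts.put(3, 4L);
        System.out.println("by count desc: " + orderByCountDesc(docCounts));
        System.out.println("by key asc:    " + orderByKeyAsc(docCounts));
    }
}
```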

3. Java code

So how do we implement the DSL statements in 2.2 from Java code?

The pattern is the same as before: build a search request, then assemble the search source. The only difference is that the query conditions are replaced by aggregations. The code is as follows:

private SearchResponse searchGroupData() {
    // Build the search request
    SearchRequest searchRequest = new SearchRequest(EsConstant.BOOK_INDEX_NAME);

    AggregationBuilder groupAggregation = AggregationBuilders
            // Name of the outermost aggregation
            .terms("group")
            // Number of groups (buckets) to return
            .size(10)
            // Group by categoryId
            .field("categoryId");

    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    // Do not return the search hits themselves
    searchSourceBuilder.size(0);

    // Fields to return
    String[] includes = {"id", "bookName", "categoryId", "categoryName", "price"};
    String[] excludes = {};

    AggregationBuilder topHits = AggregationBuilders
            // Custom name of the top_hits aggregation
            .topHits("price_top_1")
            // Sort by price, descending
            .sort(SortBuilders.fieldSort("price").order(SortOrder.DESC))
            .fetchSource(includes, excludes)
            // Take only one document
            .size(1);

    // Combine the two aggregations: topHits becomes a sub-aggregation of group
    groupAggregation.subAggregation(topHits);

    // Count the total number of groups
    CardinalityAggregationBuilder totalAggregation = AggregationBuilders
            .cardinality("total")
            .field("categoryId");

    // Add the aggregations to the search source builder
    searchSourceBuilder.aggregation(groupAggregation);
    searchSourceBuilder.aggregation(totalAggregation);
    searchRequest.source(searchSourceBuilder);

    try {
        return client.search(searchRequest, COMMON_OPTIONS);
    } catch (IOException e) {
        log.error("Failed to aggregations search data", e);
    }
    return null;
}

That is the query code. Once we have the response, how do we parse it into the result we want? Here is a simple example for reference:

@Override
public List<CategoryGroup> categoryGroup() {
    SearchResponse groupSearchResponse = searchGroupData();
    if (Objects.isNull(groupSearchResponse)) {
        return Collections.emptyList();
    }
    Aggregations aggregations = groupSearchResponse.getAggregations();
    List<CategoryGroup> groupList = new ArrayList<>();
    if (Objects.nonNull(aggregations)) {
        // categoryId is stored as a long in ES, so the terms result is a ParsedLongTerms
        ParsedLongTerms group = aggregations.get("group");
        // Result of the cardinality aggregation
        ParsedCardinality totalCount = aggregations.get("total");
        // We do not return the total here, we only log it
        log.info("Total number of groups: {}", totalCount.getValue());
        // Iterate over the buckets
        for (Terms.Bucket bucket : group.getBuckets()) {
            ParsedTopHits topHits = bucket.getAggregations().get("price_top_1");
            SearchHits groupHits = topHits.getHits();
            SearchHit[] hits = groupHits.getHits();
            List<Book> books = new ArrayList<>();
            Arrays.stream(hits).forEach(hit -> {
                Map<String, Object> sourceAsMap = hit.getSourceAsMap();
                Book book = BeanUtil.mapToBean(sourceAsMap, Book.class, false, new CopyOptions());
                if (Objects.nonNull(book.getId())) {
                    books.add(book);
                }
            });
            CategoryGroup categoryGroup = new CategoryGroup();
            // The bucket key is the category ID
            categoryGroup.setCategoryId(Integer.valueOf(bucket.getKeyAsString()));
            categoryGroup.setBooks(books);
            groupList.add(categoryGroup);
        }
    }
    return groupList;
}
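The "total" aggregation above counts the distinct categoryId values. As a plain-Java illustration of that idea, the sketch below computes an exact distinct count in memory and slices an already fetched list of bucket keys into a page. Both the class and its methods are hypothetical, and the paging shown is a client-side assumption, not how Elasticsearch pages buckets. Note also that the cardinality aggregation in Elasticsearch returns an approximate count, while this sketch is exact.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

final class GroupCountSketch {

    // Exact number of distinct category IDs.
    static long distinctCategories(List<Integer> categoryIds) {
        return new HashSet<>(categoryIds).size();
    }

    // Returns at most `size` keys starting at offset `from`; out-of-range values give an empty page.
    static List<Integer> page(List<Integer> bucketKeys, int from, int size) {
        int start = Math.min(Math.max(from, 0), bucketKeys.size());
        int end = (int) Math.min((long) start + Math.max(size, 0), bucketKeys.size());
        return new ArrayList<>(bucketKeys.subList(start, end));
    }

    public static void main(String[] args) {
        System.out.println("groups: " + distinctCategories(Arrays.asList(1, 1, 3, 2, 3)));
        System.out.println("page:   " + page(Arrays.asList(1, 3, 2), 1, 2));
    }
}
```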

Conclusion

In this article we briefly showed what aggregated search can do. There is more for you to try out yourself, such as how to page aggregated results, how to count the total number of groups, and how to customize sorting. If you run into problems or have your own approach, feel free to share and discuss.

Links

  • Elasticsearch Guide Aggregations
  • Code address: github.com/lq920320/es…