preface

This blog post was supposed to be about the fifth or sixth part of the series, since queries are needed after index creation, index document data generation, and some basic concepts are introduced. Some of the current knowledge concepts are all explained after the explanation query is the best, but recently the company is busy and often work overtime, after all, the end of the year. But if I don’t write, I am afraid that I will put it off longer and longer, and finally it will be nothing. So it happened to be snowing in Shanghai at the weekend, and it was too cold to go out, so I sat down in front of my computer and typed this blog post. Because I was responsible for the company’s queries, I did more research and wrote more smoothly. So let’s go to the text.

Why SQL query

The Query language for Elasticsearch is the Query DSL (ES). So why do we use SQL queries? Is this superfluous?

In fact, there is a reason for existence, existence is reasonable. SQL as a database query language, its syntax is simple, easy to write and most server programmers are clear understanding and familiar with its writing. However, as an ES newcomer, even if he is already an experienced programmer, if he is not familiar with ES, he must first learn Query DSL if he wants to use the ES service established by the company. The cost of learning is also a factor affecting the progress of technology development and has high instability. However, if ES query supports SQL, maybe even if he is a student who has worked for one or two years, he can use ES well and participate in the development team smoothly even though he does not understand the complex concept of ES. After all, who can write SQL?

Elasticsearch-SQL

Elasticsearch-SQL is a new extension for Elasticsearch, which is available from NLPChina. The main function is to query ES through SQL. In fact, its underlying is to interpret SQL, convert SQL into DSL syntax, and then query through DSL.

Elasticsearch-sql now supports all versions of ES, and the latest version 6.5. X is also supported, so you can see that the maintenance is quite frequent.

Installing a plug-in

Because of the ES 2.x and 5.x version differences (see version selection for details), we installed the ES plugin slightly differently.

Before 5.0, plugin install was used

./bin/plugin install https://github.com/NLPchina/elasticsearch-sql/releases/download/2.4.6.0/elasticsearch-sql-2.4.6.0.zip
Copy the code

After 5.0 (including 6.x) you can install elasticSearch -plugin install

./bin/elasticsearch-plugin install https://github.com/NLPchina/elasticsearch-sql/releases/download/5.0.1/elasticsearch-sql-5.0.1.0.zip
Copy the code

If the installation fails, you can download the elasticSearch-SQL plugin, unzip it, rename the folder to SQL, and put it in the plugins directory of the ES installation path, for example: \ elasticsearch – 6.4.0 \ plugins \ SQL.

Invalid index name [SQL], must not start with ”]; “, “status” : 400}.

Front-end visual interface

The Elasticsearch-SQL extension provides a visual interface for executing SQL queries.

In elasticSearch 1.x / 2.x, you can access the following address directly:

http://localhost:9200/_plugin/sql/
Copy the code

On ElasticSearch 5.x/6.x, this requires installing Node.js and downloading and unpacking the site, then launching the Web front end like this:

cd site-server
npm install express --save
node node-server.js 
Copy the code

The query syntax

Mysql > alter table SQL > alter table SQL > alter table SQL > alter table SQL > alter table SQL > alter table SQL > alter table SQL

http://localhost:9200/_sql?sql=select * from indexName limit 10
Copy the code

A simple query

Start with a simple query syntax:

SELECT fields from indexName WHERE conditions
Copy the code

Select * from tableName where indexName = indexName;

SELECT fields from indexName/type WHERE conditions
Copy the code

You can also query multiple types of indexes at the same time. The syntax is as follows:

SELECT fields from indexName/type1,indexName/type2 WHERE conditions
Copy the code

If you want to know how the current SQL interprets SQL as the Query DSL of Elasticsearch, you can use the keyword explain.

http://localhost:9200/_sql/_explain?sql=select * from indexName limit 10
Copy the code

Aggregate class function queries

select COUNT(*),SUM(age),MIN(age) as m, MAX(age),AVG(age)
  FROM bank GROUP BY gender ORDER BY SUM(age), m DESC
Copy the code

Additional enhanced query

Search

 SELECT address FROM bank WHERE address = matchQuery('880 Holmes Lane') ORDER BY _score DESC LIMIT 3
Copy the code

Aggregations

  • Range the age group 20 to 25, 25 to 30, 30-35, 35-40
SELECT COUNT(age) FROM bank GROUP BY range(age, 20.25.30.35.40)
Copy the code
  • range date group by day
SELECT online FROM online GROUP BY date_histogram(field='insert_time'.'interval'='1d')
Copy the code
  • range date group by your config
  SELECT online FROM online GROUP BY date_range(field='insert_time'.'format'='yyyy-MM-dd' ,'2014-08-18'.'2014-08-17'.'now-8d'.'now-7d'.'now-6d'.'now')
Copy the code

Geographic query

Elasticsearch combines geolocation, full-text search, structured search, and analytics. Elasticsearch-sql also supports all location-based queries. The content of Elasticsearch is Geolocation.

1. Geographic coordinate box model filter

The Geo Bounding Box Filter specifies the top, bottom, left and right boundaries of a rectangle. The Filter then only needs to determine whether the longitude of the coordinates is between the left and right boundaries and whether the latitude is between the upper and lower boundaries.

Grammar:

GEO_BOUNDING_BOX(fieldName,topLeftLongitude,topLeftLatitude,bottomRightLongitude,bottomRightLatitude)
Copy the code

Example:

SELECT * FROM location WHERE GEO_BOUNDING_BOX(center,100.0.1.0.101.0.0)
Copy the code

2. Geographical distance filter

The geo_distance filter draws a circle centered on a given location to find documents whose geographic coordinates fall within the specified distance range.

Grammar:

GEO_DISTANCE(fieldName,distance,fromLongitude,fromLatitude)
Copy the code

Example:

SELECT * FROM location WHERE GEO_DISTANCE(center,'1km'.100.5.0.5)
Copy the code

3. Geographical distance interval filter

The Range Distance filter takes the given position as the center of the circle, draws a circle at two given distances respectively, and finds the point between the given minimum Distance and the maximum Distance from the specified point. The only difference from the geo_Distance Filter is that the Range Distance Filter is a loop that excludes the portion of the document that falls within the inner loop.

Grammar:

GEO_DISTANCE_RANGE(fieldName,distanceFrom,distanceTo,fromLongitude,fromLatitude)
Copy the code

Example:

SELECT * FROM location WHERE GEO_DISTANCE_RANGE(center,'1m'.'1km'.100.5.0.50001)
Copy the code

4, Polygon filter (works on points)

Find the points that fall into the polygon. This filter is expensive to use. When you think you need to use it, it’s best to look at Geo-Shapes first.

Grammar:

GEO_POLYGON(fieldName,lon1,lat1,lon2,lat2,lon3,lat3,...)
Copy the code

Example:

SELECT * FROM location WHERE GEO_POLYGON(center,100.0.100.5.2.101.0.0)
Copy the code

5. GeoShape Intersects Filter (Works on GeoShapes)

Here you need to use WKT to represent the shape at query time. Grammar:

GEO_INTERSECTS(fieldName,'WKT')
Copy the code

Example:

SELECT * FROM location WHERE GEO_INTERSECTS(place,'POLYGON ((102 2, 103 2, 103 3, 102 3, 102 2))
Copy the code

More information about geography can be found here.

Practical usage

Let’s use the index NBA we created in the first tutorial of this series as an example:

**1, ** Query information about all NBA teams

http://localhost:9200/_sql?sql=select * from nba limit 10
Copy the code

Query result:

** * select ** from lebron James’ team

http://localhost:9200/_sql?sql=select * from NBA where topStar = "lebron James"Copy the code

Query result:

**3, ** is arranged in descending order according to the team building time

http://localhost:9200/_sql?sql=select * from nba order by date desc
Copy the code

Query result:

** select * from teams with more than 5 championships

http://localhost:9200/_sql?sql=select * from nba where championship  >= 5
Copy the code

Query result:

Select * from team where the number of championships is 1-5, 5-10, 10-15, 15-20, respectively

http://localhost:9200/_sql?sql=SELECT COUNT(championship) FROM nba GROUP BY range(championship, 1,5,10,15,20) 
Copy the code

Query result:

Of course, there are more ways to write, the specific implementation is not described here, interested readers can build their own project and then try, more characteristics of SQL writing can refer to here:

  • Basic condition query
  • Geographic query
  • Aggregation query
  • Additional SQL functionality
  • Scan and scroll
  • Join queries with limited functionality
  • Show Commands
  • Script Fields
  • NestedTypes support
  • Union & Minus support

Java implementation

Select * from elasticSearch-SQL where you want to query elasticSearch-SQL. Select * from elasticSearch-SQL where you want to query elasticSearch-SQL. The parse result is returned in a fixed key-value format.

Introduction of depend on

We need to introduce maven dependencies before we can use them

<dependency>
    <groupId>org.nlpcn</groupId>
    <artifactId>elasticsearch-sql</artifactId>
    <version>x.x.x.0</version>
</dependency>
Copy the code

The version number (X.X.X) must be consistent with that of Elasticsearch. For details, see the following figure.

However, not all versions are available from Maven Repository. We can only obtain the following versions of dependencies directly from Maven Repository, and many of them are missing:

How do we solve the jar dependency problem if we are using another version of ES? Remember when we started downloading the SQL folder after the plug-in was unzipped? For example, the content of the decompressed plug-in folder is as follows:

There is the JAR package we need, and we can add it directly to the project. Of course, the best way is to upload it to the company’s private repository and rely on it through the POM file.

Set up the project

After the jar package problem is resolved, we can formally enter the development phase, create a new SpringBoot project, introduce various dependencies, all ready to find, how to connect to ES?

There are two ways to implement our function. One is to connect to ES through JDBC, just like connecting to database. Another option is through the Tansport client.

The way the JDBC

Code sample

public void testJDBC(a) throws Exception {
        Properties properties = new Properties();
        properties.put("url"."JDBC: elasticsearch: / / 192.168.3.31:9300192168 3.32:9300 /" + TestsConstants.TEST_INDEX);
        DruidDataSource dds = (DruidDataSource) ElasticSearchDruidDataSourceFactory.createDataSource(properties);
        Connection connection = dds.getConnection();
        PreparedStatement ps = connection.prepareStatement("SELECT gender,lastname,age from " + TestsConstants.TEST_INDEX + " where lastname='Heath'");
        ResultSet resultSet = ps.executeQuery();
        List<String> result = new ArrayList<String>();
        while (resultSet.next()) {
              System.out.println(resultSet.getString("lastname") + "," + resultSet.getInt("age") + "," + resultSet.getString("gender"))
        }
        ps.close();
        connection.close();
        dds.close();
    }
Copy the code

This approach is the most intuitive and uses the Druid connection pool, so we also need to introduce Druid dependencies into the project, and we need to pay attention to the version of the dependency, otherwise we will get an error.

<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>druid</artifactId>
    <version>1.0.15</version>
</dependency>
Copy the code

This approach was easy to understand and easy to develop, but I applied it in the project and found it had a lot of shortcomings, so I ended up looking at the source code myself and repackaging the call through the API.

API methods

Elasticsearch-sql () service (” service “, “service”, “service”) Reading the source code, we found the following obvious Service class.

public class SearchDao {

	private static final Set<String> END_TABLE_MAP = new HashSet<>();

	static {
		END_TABLE_MAP.add("limit");
		END_TABLE_MAP.add("order");
		END_TABLE_MAP.add("where");
		END_TABLE_MAP.add("group");

	}

	private Client client = null;


	public SearchDao(Client client) {
		this.client = client;
	}

    public Client getClient(a) {
        return client;
    }

    /**
	 * Prepare action And transform sql
	 * into ES ActionRequest
	 * @param sql SQL query to execute.
	 * @return ES request
	 * @throws SqlParseException
	 */
	public QueryAction explain(String sql) throws SqlParseException, SQLFeatureNotSupportedException {
		returnESActionFactory.create(client, sql); }}Copy the code

The SearchDao class has an Explain method that takes a string SQL and returns QueryAction, which is an abstract class with the following subclasses

As you can see, each subclass corresponds to a query function, aggregate query, default query, delete, hash join query, join query, nested query and so on.

QueryAction we can obtained through QueryActionElasticExecutor class executeAnyAction method to accept, and internal processes, and then can get the corresponding results.

 public static Object executeAnyAction(Client client , QueryAction queryAction) throws SqlParseException, IOException {
        if(queryAction instanceof DefaultQueryAction)
            return executeSearchAction((DefaultQueryAction) queryAction);
        if(queryAction instanceof AggregationQueryAction)
            return executeAggregationAction((AggregationQueryAction) queryAction);
        if(queryAction instanceof ESJoinQueryAction)
            return executeJoinSearchAction(client, (ESJoinQueryAction) queryAction);
        if(queryAction instanceof MultiQueryAction)
            return executeMultiQueryAction(client, (MultiQueryAction) queryAction);
        if(queryAction instanceof DeleteQueryAction )
            return executeDeleteAction((DeleteQueryAction) queryAction);
        return null;
    }
Copy the code

We got the query result, but it is of type Object, so we need to customize it a little bit. We noticed a class called ObjectResultsExtractor, which has the following constructor. The constructor contains three Boolean arguments. What they do is they include score in the result set, they include type in the result set, they include ID in the result set, we can set them all to false.

public ObjectResultsExtractor(boolean includeScore, boolean includeType, boolean includeId) {
    this.includeScore = includeScore;
    this.includeType = includeType;
    this.includeId = includeId;
    this.currentLineIndex = 0;
}
Copy the code

ObjectResultsExtractor it has only one external pulic modified method extractResults.

public ObjectResult extractResults(Object queryResult, boolean flat) throws ObjectResultsExtractException {
    if (queryResult instanceof SearchHits) {
        SearchHit[] hits = ((SearchHits) queryResult).getHits();
        List<Map<String, Object>> docsAsMap = new ArrayList<>();
        List<String> headers = createHeadersAndFillDocsMap(flat, hits, docsAsMap);
        List<List<Object>> lines = createLinesFromDocs(flat, docsAsMap, headers);
        return new ObjectResult(headers, lines);
    }
    if (queryResult instanceof Aggregations) {
        List<String> headers = new ArrayList<>();
        List<List<Object>> lines = new ArrayList<>();
        lines.add(new ArrayList<Object>());
        handleAggregations((Aggregations) queryResult, headers, lines);
        
        // remove empty line.
        if(lines.get(0).size() == 0) {
            lines.remove(0);
        }
        //todo: need to handle more options for aggregations:
        //Aggregations that inhrit from base
        //ScriptedMetric

        return new ObjectResult(headers, lines);

    }
    return null;
}
Copy the code

So far we have a rough understanding of its query API, and then we just need to do the following code calls in our project to complete our query function, and finally the ObjectResult is our final query result set.

/ / 1. Explain SQL
SearchDao searchDao = new SearchDao(transportClient);
QueryAction queryAction = searchDao.explain(sql);
/ / (2) is carried out
Object execution = QueryActionElasticExecutor.executeAnyAction(searchDao.getClient(), queryAction);
Format the query result
ObjectResult result = (new ObjectResultsExtractor(true.false.false)).extractResults(execution, true);
Copy the code

At this point, the code development is completed, we come to test the results of the operation, I provided three external interfaces, one is THE API way to query, a JDBC way to query, and an explanation of SQL.

@RestController
@RequestMapping("/es/data")
public class ElasticSearchController {

    @Autowired
    private ElasticSearchSqlService elasticSearchSqlService;

    @PostMapping(value = "/search")
    public CommonResult search(@RequestBody QueryDto queryDto) {
        SearchResultDTO resultDTO = elasticSearchSqlService.search(queryDto.getSql());
        return CommonResult.success(resultDTO.getResult());
    }

    @PostMapping(value = "/query")
    public CommonResult query(@RequestBody QueryDto queryDto) {
        SearchResultDTO resultDTO = elasticSearchSqlService.query(queryDto.getSql(), queryDto.getIndex());
        return CommonResult.success(resultDTO.getResult());
    }

    @PostMapping(value = "/explain")
    public CommonResult explain(@RequestBody QueryDto queryDto) {
        returnCommonResult.success(elasticSearchSqlService.explain(queryDto.getSql())); }}Copy the code

Example request:

Example Query result:

conclusion

Although SQL is not the official recommendation of ES query language, but because of its convenience, ES officials also began to realize this. ES has also started to support SQL after version 6.3.0, but it is by introducing X-pack. If we can use it through REST, there are still some problems when we introduce it into development, and platinum membership is required. I don’t know if it will be released in the future.

In addition, although SQL is more convenient to use, but after all, it is not officially specified, so it is inevitable that there are defects in the function. There is no DSL function is powerful, and there are more pits in it, but basic queries are supported. So if not have to, I still recommend using DSL, and some simple operations can use SQL to assist, this article source has been uploaded to my Github, if interested readers can pay attention to my Github.


Personal public account: JaJian

Welcome long press the picture below to pay attention to the public number: JaJian!

We regularly provide you with the explanation and analysis of distributed, micro-services and other first-line Internet companies.