Elasticsearch is a very powerful search engine that is now widely used across the IT industry. It is created and maintained as an open source project by Elastic; its source code lives at github.com/elastic/ela… . Elastic also maintains the open source projects Logstash and Kibana. Together, these three projects form the ELK stack, a powerful ecosystem. Put simply, Logstash is responsible for data collection and processing (data enrichment, transformation, and so on), Kibana handles data presentation, analysis, and management, and Elasticsearch sits at the heart of it all, letting you search and analyze your data quickly.

The complete Elastic stack is as follows:

  • Beats 7.3
  • APM Server 7.3
  • Elasticsearch 7.3
  • Elasticsearch Hadoop 7.3
  • Kibana 7.3
  • Logstash 7.3

Beats are lightweight agents that run on client servers; they do not need to be deployed inside our Elastic cluster itself. They help us gather all the events we need. With Beats included in the architecture, the Elastic stack gains a data-collection layer in front of Logstash and Elasticsearch.

In today’s article, I will briefly introduce what Elasticsearch is.

The Elastic Product Ecosystem

Elastic has built a number of mature solutions around Elasticsearch. Please refer to the official website www.elastic.co/ for more details.

 

Elasticsearch

Simply put, Elasticsearch is a distributed, REST-based search engine designed for the cloud. It can be downloaded from www.elastic.co/products/el… .

Elasticsearch is an open source search engine built on Apache Lucene(TM). Lucene is arguably the most advanced, high-performance, full-featured search engine library to date, whether open source or proprietary. However, Lucene is just a library: by itself it provides neither high availability nor distributed deployment. To take full advantage of it, you need to use Java and integrate it into your application, and Lucene is complex enough that you need to dig deep into information-retrieval concepts to understand how it works.

Elasticsearch is also written in Java and uses Lucene to index and search, but it is intended to make full-text search easy and hide the complexity of Lucene with a simple and coherent RESTful API.

However, Elasticsearch is more than just Lucene and more than a full-text search engine; it also provides:

  • Distributed real-time file storage where each field is indexed and searchable
  • Distributed search engine for real-time analysis
  • Scalable to hundreds of servers, processing petabytes of structured or unstructured data

Moreover, all of these functions are integrated into a single server that your application can interact with through simple RESTful APIs, clients in various languages, or even the command line. Getting started with Elasticsearch is very simple: it provides many reasonable defaults and hides complex search engine theory from beginners. It works out of the box and requires little learning before it can be used in a production environment. Elasticsearch is licensed under the Apache 2 license and is free to download, use, and modify. As you accumulate knowledge, you can customize Elasticsearch’s advanced features for different problem domains; everything is configurable and very flexible.

Elasticsearch offers an extremely fast search experience: it can return results in seconds where some other big data engines take hours. An Elasticsearch cluster is a distributed deployment that is easy to scale, which makes it straightforward to handle petabytes of data. Most importantly, Elasticsearch sorts search results by relevance score, so it surfaces the most relevant results first.

 

Distributed and highly available search engine

  1. Each index is fully sharded using a configurable number of shards
  2. Each shard can have one or more copies
  3. Read/search operations can be performed on any replica shard
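As a quick sanity check on what sharding means for capacity, the total number of shard copies an index occupies is its primary count times one plus its replica count. A minimal sketch (the numbers and the index name are illustrative; the `_cat/shards` call assumes a running cluster at localhost:9200):

```shell
# With N primary shards and R replicas per primary, an index occupies
# N * (1 + R) shard copies across the cluster.
PRIMARIES=2
REPLICAS=1
TOTAL=$((PRIMARIES * (1 + REPLICAS)))
echo "total shard copies: $TOTAL"

# On a running cluster, the actual allocation can be inspected with:
# curl -XGET 'http://localhost:9200/_cat/shards/twitter?v'
```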

Multi-tenant

  1. Support for multiple indexes
  2. Index-level configuration (number of shards, index storage, …)

A variety of APIs

  1. HTTP RESTful API
  2. Native Java API
  3. All APIs automatically reroute operations to the appropriate node

Document-oriented

  1. No upfront schema definition required
  2. Schemas can be defined to customize the indexing process
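On the second point, a schema (mapping) can be supplied when an index is created to customize how fields are indexed. The sketch below only validates the request body locally; the field names and the twitter index are assumptions carried over from the examples in this article, and the curl call would need a running cluster at localhost:9200:

```shell
# Hedged sketch of an explicit mapping for the tweet documents used in this
# article (the field types are illustrative choices, not the only valid ones).
MAPPING='{
  "mappings": {
    "properties": {
      "user":      { "type": "keyword" },
      "post_date": { "type": "date" },
      "message":   { "type": "text" }
    }
  }
}'

# Against a running cluster this would be applied with:
# curl -XPUT 'http://localhost:9200/twitter?pretty' \
#   -H 'Content-Type: application/json' -d "$MAPPING"

# Here we only check that the body is well-formed JSON:
echo "$MAPPING" | python3 -m json.tool > /dev/null && echo "mapping JSON ok"
```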

Reliable, asynchronous writes with long-term persistence

(Near) real-time search

Built on top of Lucene

  1. Each shard is a full-featured Lucene index
  2. All of Lucene’s capabilities can be easily exposed through a simple configuration/plug-in

Consistency per operation

  1. Single-document operations have atomicity, consistency, isolation, and durability.

 

Getting started guide

First, don’t panic. It takes only 5 minutes to get up and running with Elasticsearch.

Requirements

You need a recent version of Java installed on your computer (with recent Elasticsearch releases, Java does not need to be installed separately because a JDK is bundled with the installation package). See the Setup link for more information.

The installation

  1. You can download the latest release of Elasticsearch from the Download link. See the article “Elastic: A Beginner’s Guide” for how to install Elasticsearch
  2. Run bin/elasticsearch on Unix/Linux, or bin\elasticsearch.bat on Windows
  3. Run curl -X GET http://localhost:9200. On Windows you can install Cygwin to get the curl command
  4. Run more servers…

Use the cURL command to talk to Elasticsearch

You can use cURL to submit a request from the command line to a local Elasticsearch instance. A request to Elasticsearch contains the same parts as any HTTP request:

```shell
curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'
```

This example uses the following variables:

  • <VERB>: The appropriate HTTP method or verb, for example GET, POST, PUT, HEAD, or DELETE
  • <PROTOCOL>: http or https. Use the latter if you have an HTTPS proxy in front of Elasticsearch or if you use Elasticsearch security features to encrypt HTTP traffic
  • <HOST>: The host name of any node in the Elasticsearch cluster, or localhost for a node on your local machine
  • <PORT>: The port on which the Elasticsearch HTTP service is running; the default is 9200
  • <PATH>: The API endpoint, which can contain multiple components, such as _cluster/stats or _nodes/stats/jvm
  • <QUERY_STRING>: Any optional query-string parameters; for example, ?pretty pretty-prints the JSON response to make it easier to read
  • <BODY>: A JSON-encoded request body (if necessary)
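To make the template concrete, the sketch below fills in each part with illustrative values and prints the resulting command instead of executing it, so no cluster is required:

```shell
# Illustrative values for each part of the request template.
VERB='GET'
PROTOCOL='http'
HOST='localhost'
PORT='9200'
API_PATH='_cluster/health'   # named API_PATH to avoid clobbering the shell PATH
QUERY_STRING='pretty'

# Assemble and print the full curl command for inspection.
CMD="curl -X${VERB} '${PROTOCOL}://${HOST}:${PORT}/${API_PATH}?${QUERY_STRING}'"
echo "$CMD"
```

Running the printed command against a live node would return the cluster health as JSON.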

If Elasticsearch security is enabled, you must also provide a valid user name and password that is authorized to run the API, for example with the -u cURL command-line argument:

```shell
curl -u elastic:password -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'
```

Here, elastic and password represent the user name and password respectively.

 

Check that Elasticsearch is properly installed

If you’re familiar with Postman, see my article “Elastic: Using Postman to Access Elastic Stack” for the following exercise.

In our terminal we can run the following command:

```shell
$ curl -XGET 'http://localhost:9200/' -H 'Content-Type: application/json'
```

If the response is a JSON document containing the node name, cluster name, and version information, Elasticsearch is installed correctly.
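Since the screenshot of the response is not reproduced here, the sketch below shows the rough shape of the JSON the root endpoint returns (all values are illustrative; your node name, cluster name, and version will differ) and pulls out the version field:

```shell
# Hedged sample of the root-endpoint response (values illustrative).
RESPONSE='{
  "name" : "my-node",
  "cluster_name" : "elasticsearch",
  "version" : { "number" : "7.3.0" },
  "tagline" : "You Know, for Search"
}'

# Extract the version number from the response:
echo "$RESPONSE" | python3 -c 'import json, sys; print(json.load(sys.stdin)["version"]["number"])'
# → 7.3.0
```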

Later we may install the security features provided by Elastic, in which case we can use the following command:

```shell
$ curl -XGET -u "elastic:changeme" 'http://localhost:9200/' -H 'Content-Type: application/json'
```

The -u option sets the user name and password used to log in to Elasticsearch. In future chapters we will show how to install and use the security features.

Create an Index

Let’s try indexing some Twitter-like information. First, let’s index a few tweets (the twitter index will be created automatically):

```shell
curl -XPUT 'http://localhost:9200/twitter/_doc/1?pretty' -H 'Content-Type: application/json' -d '
{
  "user": "kimchy",
  "post_date": "2009-11-15T13:12:00",
  "message": "Trying out Elasticsearch, so far so good?"
}'
curl -XPUT 'http://localhost:9200/twitter/_doc/2?pretty' -H 'Content-Type: application/json' -d '
{
  "user": "kimchy",
  "post_date": "2009-11-15T14:12:12",
  "message": "Another tweet, will it be indexed?"
}'
curl -XPUT 'http://localhost:9200/twitter/_doc/3?pretty' -H 'Content-Type: application/json' -d '
{
  "user": "elastic",
  "post_date": "2010-01-15T01:46:38",
  "message": "Building the site, should be kewl"
}'
```

Now, let’s check whether the information above was actually added to the index. We can use GET to query:

```shell
curl -XGET 'http://localhost:9200/twitter/_doc/1?pretty=true'
curl -XGET 'http://localhost:9200/twitter/_doc/2?pretty=true'
curl -XGET 'http://localhost:9200/twitter/_doc/3?pretty=true'
```

Search

Let’s find all of Kimchy’s tweets:

```shell
curl -XGET 'http://localhost:9200/twitter/_search?q=user:kimchy&pretty=true'
```

We can also use the JSON query language provided by Elasticsearch instead of the query string:

```shell
curl -XGET 'http://localhost:9200/twitter/_search?pretty=true' -H 'Content-Type: application/json' -d '
{
  "query" : {
    "match" : { "user": "kimchy" }
  }
}'
```

The query above displays all tweets posted by kimchy. Just for fun, let’s retrieve all stored documents (which will also show the tweets posted by the user elastic):

```shell
curl -XGET 'http://localhost:9200/twitter/_search?pretty=true' -H 'Content-Type: application/json' -d '
{
  "query" : {
    "match_all" : {}
  }
}'
```

We can also do a range search (post_date was automatically recognized as a date when it was indexed):

```shell
curl -XGET 'http://localhost:9200/twitter/_search?pretty=true' -H 'Content-Type: application/json' -d '
{
  "query" : {
    "range" : {
      "post_date" : {
        "from" : "2009-11-15T13:00:00",
        "to" : "2009-11-15T14:00:00"
      }
    }
  }
}'
```

There are many more search options; after all, this is a search product! All the familiar Lucene queries are available through the JSON query language or the query parser.

 

Multi-tenant – index and type

Dude, that twitter index might get big (in that case, index size == valuation). Let’s see if we can restructure our tweet system slightly to support such a large volume of data.

Elasticsearch supports multiple indexes. In the previous example we used a single index called twitter that stores the tweets of every user.

Another way to define our simple Twitter system is to give each user their own index (note that each index has overhead). Here are the indexing curl commands in that case:

```shell
curl -XPUT 'http://localhost:9200/kimchy/_doc/1?pretty' -H 'Content-Type: application/json' -d '
{
  "user": "kimchy",
  "post_date": "2009-11-15T13:12:00",
  "message": "Trying out Elasticsearch, so far so good?"
}'
curl -XPUT 'http://localhost:9200/kimchy/_doc/2?pretty' -H 'Content-Type: application/json' -d '
{
  "user": "kimchy",
  "post_date": "2009-11-15T14:12:12",
  "message": "Another tweet, will it be indexed?"
}'
```

The commands above put the documents into the kimchy index; each user gets their own dedicated index.

Index-level settings can be fully controlled. For example, in the case above we might want to change from the default of 1 shard with 1 replica per index to 2 shards with 1 replica per index (because this user tweets a lot). Here’s how to do it (the configuration can also live in a YAML file):

```shell
curl -XPUT 'http://localhost:9200/another_user?pretty' -H 'Content-Type: application/json' -d '
{
  "settings" : {
    "index.number_of_shards" : 2,
    "index.number_of_replicas" : 1
  }
}'
```

Searches (and similar operations) are multi-index aware, which means we can easily search more than one index (Twitter user) at a time, for example:

```shell
curl -XGET 'http://localhost:9200/kimchy,another_user/_search?pretty=true' -H 'Content-Type: application/json' -d '
{
  "query" : {
    "match_all" : {}
  }
}'
```

Or search through all the indexes:

```shell
curl -XGET 'http://localhost:9200/_search?pretty=true' -H 'Content-Type: application/json' -d '
{
  "query" : {
    "match_all" : {}
  }
}'
```

(One-liner teaser) Want to know the cool part? You can easily search across multiple Twitter users (indexes), giving each index a different boost level, which makes social search much easier (results from my friends rank higher than results from friends of my friends).

 

Distributed and highly available

Elasticsearch is a highly available, distributed search engine. Each index is broken into shards, and each shard can have one or more replicas. By default, an index is created with 1 shard and 1 replica per shard (1/1). Many topologies are possible, including 1/10 (to improve search performance) and 20/1 (to improve indexing performance).
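One practical consequence of these topologies: the replica count can be changed on a live index, while the primary shard count is fixed at creation time. A hedged sketch (the body is only validated locally here; the commented curl call assumes a running cluster at localhost:9200 and the twitter index from earlier):

```shell
# Request body that raises the replica count of an index to 2.
SETTINGS='{ "index" : { "number_of_replicas" : 2 } }'

# Against a running cluster this would be applied with:
# curl -XPUT 'http://localhost:9200/twitter/_settings?pretty' \
#   -H 'Content-Type: application/json' -d "$SETTINGS"

# Locally, just confirm the body is well-formed JSON:
echo "$SETTINGS" | python3 -m json.tool > /dev/null && echo "settings JSON ok"
```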

To play with the distributed nature of Elasticsearch, simply bring more nodes up and shut nodes down; the system will continue to serve requests with the latest indexed data (make sure you use the correct HTTP port).

 

Where do we go from here?

We have only scratched the surface of Elasticsearch. See the elastic.co website for more information. General questions can be asked on the Elastic forum or in the #elasticsearch IRC channel on Freenode. The Elasticsearch GitHub repository is reserved for bug reports and feature requests.

 

Building from source

Elasticsearch uses Gradle as its build system.

To create a distribution, simply run the ./gradlew assemble command in the cloned directory.

The distribution for each project will be created under that project’s build/distributions directory.

For more information about running the Elasticsearch test suite, see the TESTING file.

 

Upgrading from an older version of Elasticsearch

To ensure a smooth upgrade from earlier versions of Elasticsearch, please refer to our upgrade documentation for more details on the upgrade process.

Next steps

If you want to use Kibana to manipulate indices in Elasticsearch, please read my posts:

  • Elastic: Beginner’s Guide
  • How do I install Elasticsearch on Linux, MacOS and Windows
  • Start using Elasticsearch (1)