1. Background

I first came across ElasticSearch two years ago, but I only used it for work and never studied it thoroughly. The more I used it, the more I found ES to be a good thing, so I spent some time studying it. In the process I also ran into problems: most of the material online is scattered, and most of it consists of experimental demos. Many problems were never clearly explained, and no complete plan was laid out systematically, so I patiently explored, summarized a few things, and now share them.

After all, running ES to production standards surfaces a lot of problems, and that raises a new standard for your learning.

For example, there is a bug in the ElasticSearch ServiceWrapper configuration that prevents the ElasticSearch JAR classes from loading.

There are also big differences between ES versions. For example, distributed routing from 1.0 was significantly changed in 2.0: routing used to be configured on the mapping, but in 2.0 it is configured per index. The essential purpose of this change is to give different types within the same index the chance to choose their own shard key; configured on the mapping, one routing key was forced on all types of the index.

ES is a good thing, and more and more distributed systems use it to solve problems. It is needed to support real-time big-data search and analysis, from system-level tooling like ELK to the design of the core trading systems of e-commerce platforms. For example, tens of millions of SKUs in a commodity center need real-time search, and massive volumes of online orders need real-time search too.

Several DevOps tools also rely on ES for powerful real-time search. It is worth spending some time studying.

As an e-commerce architect, there is no reason not to learn and use it to improve the overall service level of your system. This article summarizes my learning experience over this period and shares it with you.

2. Installation

First you need a few Linux machines, or you can run virtual machines. You can complete the installation and configuration on one VM, and clone the current VM to change the IP address, HWaddr, and UUID. This way, you do not need to repeat the installation and configuration.

1. I have three Linux Servers with IP addresses of 192.168.0.10, 192.168.0.20 and 192.168.0.30.

(We did the installation and configuration on 192.168.0.10; when everything was ready we cloned that node, modified its configuration, set the cluster parameters, and finally formed a working three-node cluster.)

2. Since ElasticSearch is developed in Java, you need to install a Java environment first. I am using JDK 8 and install it directly with yum; the yum repository carries a recent build.

Check whether the Java environment is installed on your current machine:

yum info installed | grep java

If a Java environment already exists and it is not the one you want, you can uninstall it and then install the version you want: yum -y remove XXX. Linux systems are file-based, and you can delete almost anything you can find.

Take a look at what’s available:

yum search java

java-1.8.0-openjdk.x86_64 : OpenJDK Runtime Environment

Then perform the installation:

yum -y install java-1.8.0-openjdk.x86_64

Check the Java version after installation:

java -version

Now that we’ve done the prep work, we’ll perform the environment installation and configuration for ElasticSearch.

2.1. Find and download the RPM package and install it

You can install it in several ways. A yum repository is the fastest and most convenient, but it usually lags behind the latest release. So I downloaded the RPM package directly from the official website.

Elasticsearch official download address: www.elastic.co/downloads/e…

Find the file for your system type; if you are on Windows, just download the zip package and use it directly. Here I need the RPM file.

You can also register the package in a local yum repository and then install it with the yum command.

I use the wget tool to download the RPM file directly to the local machine. (If your package has dependencies, you are advised to install it with yum instead.)

(yum -y install wget if wget doesn’t work)

wget download.elastic.co/elasticsear…

Then wait for the download to complete.

Rather than the latest version of ElasticSearch, I recommend installing a slightly older one; I installed version 2.3.4 locally. The reason for this emphasis is that with a very new version there is a real question of whether the Chinese analyzer supports it yet. After 2.3.5 it went straight to 2.4.0. I originally installed 2.3.5, but found that its IK Chinese analyzer had to be cloned and compiled before I could get deployable output. So I recommend version 2.3.4: its IK analyzer can be downloaded ready-built and deployed straight onto the Linux server, which is very convenient.

Perform installation:

rpm -iv elasticsearch-2.3.4.rpm

Then wait for the installation to complete.

The installation should complete without incident. Let's do a basic check to see whether any files are missing after the installation; some packages are missing the config files, and if so we have to add them ourselves.

To see all files involved in the installation, search from the root directory with find.

cd /

find . -name elasticsearch

./var/lib/elasticsearch

./var/log/elasticsearch

./var/run/elasticsearch

./etc/rc.d/init.d/elasticsearch

./etc/sysconfig/elasticsearch

./etc/elasticsearch

./usr/share/elasticsearch

./usr/share/elasticsearch/bin/elasticsearch

It's almost all there, but you will have to check whether config is missing, because it was when I installed.

cd /usr/share/elasticsearch/

ll

drwxr-xr-x. 2 root root  4096 9月  4 01:10 bin
drwxr-xr-x. 2 root root  4096 9月  4 01:10 lib
-rw-r--r--. 1 root root 11358 6月 30 19:22 LICENSE.txt
drwxr-xr-x. 5 root root  4096 9月  4 01:10 modules
-rw-r--r--. 1 root root   150 6月 30 19:22 NOTICE.txt
drwxr-xr-x. 2 elasticsearch elasticsearch 4096 6月 30 19:32 plugins
-rw-r--r--. 1 root root  8700 6月 30 19:22 README.textile

As you can see, the config folder is probably missing. We need to create this folder, and we also need an elasticsearch.yml configuration file inside it; otherwise there will definitely be an error on startup.

mkdir config

cd config

vim elasticsearch.yml

Find an elasticsearch.yml configuration and paste it in, or transfer it as a file. These settings are just the basics and will need to be adjusted later; some settings are not in the configuration file at all and have to be looked up on the official site. So there is no such thing as a complete or incomplete configuration file, and there is plenty of information about configuration items online, so it doesn't matter.
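For reference, a minimal sketch of what the file can contain at this stage; every value here is a placeholder that we will set properly in section 3:

cluster.name: orderSearch_cluster
node.name: node-1
network.host: 0.0.0.0
http.port: 9200
transport.tcp.port: 9300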

Save the elasticsearch.yml file.

You will also need a logging.yml log configuration file. ES runs as a background service, so log files are a must; they are collected and monitored by your logging platform for ops health checks. logging.yml is essentially a log4j configuration file, which should be familiar to you. As with elasticsearch.yml, either copy-paste it or transfer the file.

Logs are written to the logs directory, which is created automatically. But I still like to create it explicitly; I don't like uncertainty, and maybe it won't be created automatically.

mkdir logs

Finally, we need to set execute permission on the files we just added. Otherwise the file names will show up in white in the listing, meaning they are not executable.

cd ..

chmod -R u+x config/

Now the installation is almost complete. cd into the bin directory and start ES to check whether it works.

Not surprisingly, you will receive a "java.lang.RuntimeException: don't run elasticsearch as root". This means we have completed the first step of the installation; the next section looks at the startup account issue.

2.2. Configure elasticSearch accounts and groups

By default ES does not allow root to start it, for security reasons. ES ships with the Groovy script engine built in, plus many script-engine plugins, and these really are insecure; ES had a Groovy vulnerability early on, so it is recommended to turn scripting off on production instances. Dynamic scripting is not fully open by default, but check your configuration to be safe.
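For example, on the 2.x line, settings along these lines in elasticsearch.yml keep dynamic scripting off (verify the exact names against the docs for your version):

script.inline: false
script.indexed: false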

So we need to configure a dedicated account and group for ES. Before creating one, check whether an ES account already exists in the system: the RPM installation automatically creates an elasticsearch group and user. Check first, and only create your own if the installation did not, so you don't confuse what you added with what the system created.

View the following groups:

cat /etc/group

View the user:

cat /etc/passwd

It's basically all set up: the elasticsearch group (GID 499) has its elasticsearch account created in passwd.

If your system does not automatically create the corresponding group and account, you can create your own, as follows:

Create a group:

groupadd elasticsearch_group

Create a user:

useradd elasticsearch_user -g elasticsearch_group -s /sbin/nologin

Note: this account has no login rights; its shell is /sbin/nologin.

For demonstration purposes, I now have two sets of elasticsearch accounts on my machine. I will delete the accounts ending in "_group" and "_user" and keep the ES boot account (elasticsearch) that the RPM installed automatically.

2.3. Set the owner of the ElasticSearch file

Now you need to associate the ES files with the elasticsearch account: make the elasticsearch user the owner of the ES files, so that the elasticsearch user can use all ES files without any permission problems.

Navigate to elasticSearch parent directory:

cd /usr/share

ll

chown -R elasticsearch:elasticsearch elasticsearch/

At this point, the owner of your ElasticSearch file is ElasticSearch.

2.4. Test whether switching to the elasticSearch account can start successfully

To test starting the ES instance, we need to temporarily switch the elasticsearch user's shell to /bin/bash, so that we can su elasticsearch and start the instance.
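One way to do the temporary switch:

usermod -s /bin/bash elasticsearch

Remember to revert it once the test is done:

usermod -s /sbin/nologin elasticsearch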

su elasticsearch

cd /usr/share/elasticsearch/bin

./elasticsearch

Startup is complete, and no exceptions should occur at this point. Check whether the system port is successfully started.

netstat -tnl

Check whether the HTTP service is started properly.

curl -XGET http://192.168.0.103:9200/_cat

Since we have not yet installed any auxiliary administration tools such as plugin/head, the built-in _cat REST endpoint is quite handy.

curl -XGET http://192.168.0.103:9200/_cat/nodes

192.168.0.103 192.168.0.103 4 64 0.00 d * node-1

As you can see, there is only one active node, 192.168.0.103, and it is a data node.

(Note: to save time, I will temporarily use the clean 103 machine to demonstrate the installation and environment setup. When building the cluster, I will clone it and change the IP addresses.)

2.5. Install the elasticSearch ServiceWrapper package

ES has an open-source wrapper package available for system startup. If you don't use the wrapper you can write your own shell script, but you need to understand many of the parameters very clearly, and some key parameters must be set. It is recommended to use the ElasticSearch ServiceWrapper package as the base for your changes; you can also learn some underlying ES configuration and principles from its elasticsearch shell script.

(Note: if you are a .NET developer, you can think of ServiceWrapper as the open-source .NET Topshelf. You can install, uninstall, start, and stop with it, or simply run in the foreground.)

2.5.1. Download elasticSearch ServiceWrapper package

ElasticSearch ServiceWrapper GitHub homepage: github.com/elastic/ela…

Copy the Git Repository address to the clipboard and clone it directly to the local directory.

git clone github.com/elastic/ela…

(You need the git client on your Linux machine: yum -y install git. I got version 1.7 by default.)

Then wait for the clone to complete.

Take a look at the cloned files in the local repository. Go into elasticsearch-servicewrapper and check the current git branch.

cd /root/elasticsearch-servicewrapper

git branch

*master

ll

Everything is normal, so the clone went fine, and the branch is clear. The service directory contains the files we are about to install.

We need to copy the service directory to elasticsearch/bin.

cp -R service/ /usr/share/elasticsearch/bin/

cd /usr/share/elasticsearch/bin/

The installer in service needs to run from the elasticsearch/bin directory.

cd service/

ll

./elasticsearch

See the instructions for ElasticSearchWrapper on Github. Elasticsearch ServiceWrapper has a lot of functionality. Status and dump are great tools for checking and debugging.

Before installing, we temporarily run the ES instance in the foreground so we can see whether the logs show anything abnormal. The Parameter table documents the arguments clearly; we use console output to start the ES instance.

./elasticsearch console

2.5.2 ElasticSearch ServiceWrapper Configuration Bug

You should receive an Error message:

WrapperSimpleApp Error: Unable to locate the class org.elasticsearch.bootstrap.ElasticsearchF : java.lang.ClassNotFoundException: org.elasticsearch.bootstrap.ElasticsearchF

I was a little confused when I first saw this: what is this ElasticsearchF class? The name is peculiar, but looking at the exception it is actually a ClassNotFoundException: the ElasticsearchF class cannot be found.

There are two possibilities. The first is that the Java ElasticSearch package is missing this class; that is highly unlikely, since running ElasticSearch directly succeeded before. I checked the ES JAR with JD-GUI, and there is no such class.

There is no ElasticsearchF class.

Check whether the service/elasticsearch.conf configuration file contains this 'ElasticsearchF' string. (The wrapper uses the elasticsearch.conf in the current directory as its configuration file.)

grep -i elasticsearchf elasticsearch.conf

The string is there, so edit the file, remove the trailing 'F', and save.

Then we try to start it again.

./elasticsearch console

I don't know if you hit the same situation I did: it reported that the command arguments were not valid.

Startup runs through three stages: the service/elasticsearch shell script, the script's command parsing, and the Java ServiceWrapper binary under exec/.

The Java ServiceWrapper here is version 3.5.14. Reading the elasticsearch shell script shows that when it receives an external command, it launches the Java ServiceWrapper under exec/. I tried editing the elasticsearch shell file to print some information, to see whether a path or a parameter was being picked up wrongly. (At least we can follow along and see what is going on.)

vim ./elasticsearch

esc

:/console

Find where console is, add debug text, and print it to the screen.

Run the command again to check whether the command parameters are correct.

I checked, and the output parameters were basically fine; for the moment there was no solution. The wrapper hands off to the native library exec/elasticsearch-linux-x86-64.so. So I tried another route: I looked at the Windows version of the ServiceWrapper and found there is no 32-bit Windows build of the elasticsearch ServiceWrapper. Running it gave the same error, but the Windows wrapper's error message was more verbose and pointed at the cause of the error.

I wanted to raise the log output level to see whether it would produce useful information, so I edited service/elasticsearch.conf, the wrapper's own configuration file.

# Log Level for console output. (See docs for log levels)

wrapper.console.loglevel=TRACE

# Log Level for log file output. (See docs for log levels)

wrapper.logfile.loglevel=TRACE

Set the log output level to TRACE; there are two places to set it. Then look at the output.

It does output some useful information; you can view the details in the log file.

WrapperManager Debug: Received a packet LOGFILE : /usr/share/elasticsearch/logs/service.log

But there is only one line of information about the error.

So I stopped there. Our goal was to run in console mode and watch the run logs, but it doesn't really matter that it won't run that way, so let's go ahead and install.

(If any reader knows where the problem is, please share it. I don't think this is an occasional issue; it should be common. At least by raising the question first I can serve future users. Thanks in advance.)

Alternatively, you can download the plain Java ServiceWrapper and wrap ElasticSearch yourself instead of using the ElasticSearch ServiceWrapper.

Back to the theme: since console mode won't run and we can't see what the wrapper console shows, we proceed to the installation.

2.5.3 ServiceWrapper installation (setting user, open files, and config path in the ElasticSearch init.d boot file)

We perform the installation as instructed by the ElasticSearch ServiceWrapper Parameter table.

./elasticsearch install

Installing the Elasticsearch daemon..

The daemon is installed. We still go to the system directory to check whether the installation succeeded (technical staff should always stay careful). Check under /etc/init.d/.

ll /etc/init.d/

-rwxrwxr–. 1 root root 4496 10月 4 01:43 elasticsearch

chmod u+x ./elasticsearch — don't forget to set execute permission on the file, as covered in section 2.1; we won't repeat it here.

Let’s start editing the ElasticSearch startup file.

This is the main part: fill in the dedicated ES account we configured (elasticsearch, section 2.2) and the corresponding file paths. I skip MAX_OPEN_FILES and MAX_MAP_COUNT here; they are explained in section 3.3.
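For reference, the entries to fill in look roughly like this; treat the names as placeholders, since they vary between wrapper versions:

# in /etc/init.d/elasticsearch (the wrapper-generated script)
RUN_AS_USER=elasticsearch

# in service/elasticsearch.conf (the wrapper configuration)
set.default.ES_HOME=/usr/share/elasticsearch
set.default.ES_HEAP_SIZE=1024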

2.5.4 chkconfig --add: add the service to the Linux auto-start list

Add it to the system service so it can be automatically started by the system.

chkconfig --add elasticsearch

chkconfig --list

The service now appears in the system's automatic startup list.

service elasticsearch start

Start the ES instance and wait for the ports to come up. After a moment, check the port status.

netstat -tnl

Port 9300 starts earlier than port 9200 because 9300 is the internal cluster transport port, while 9200 is the REST endpoint port. The gap between them won't be long.

After the ports are started successfully, let’s check whether the ES instance can be accessed normally.

curl -XGET http://192.168.0.103:9200/

{
  "name" : "node-1",
  "cluster_name" : "orderSearch_cluster",
  "version" : {
    "number" : "2.3.4",
    "build_hash" : "e455fd0c13dceca8dbbdbb1665d068ae55dabe3f",
    "build_timestamp" : "2016-06-30T11:24:31Z",
    "build_snapshot" : false,
    "lucene_version" : "5.5.0"
  },
  "tagline" : "You Know, for Search"
}

Let’s use _cat rest endpoint again.

curl -XGET http://192.168.0.103:9200/_cat/nodes

192.168.0.103 192.168.0.103 4 61 0.00 d * node-1

If you can access it on the machine itself but not from an external browser, it is probably a firewall setup problem. Adjust the firewall settings:

vim /etc/sysconfig/iptables
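For example, rules like these, placed above the final REJECT line, open the two ES ports (my own rules; adapt them to your policy):

-A INPUT -m state --state NEW -m tcp -p tcp --dport 9200 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 9300 -j ACCEPT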

Restart the iptables service to load the firewall settings.

service iptables restart

Then try again to see whether it can be accessed externally. If not, test the port with telnet.

There is also a problem with elasticSearch.yml configuration item. See section 3.1.1.

Restart the machine to see if the ES instance starts automatically.

shutdown -r now

Wait a moment, then try to connect the machine.

If nothing went wrong, everything should be fine and the ports have started successfully. The ES instance now starts automatically, managed as a regular Linux system service.

Once it is installed as a service, the ElasticSearch ServiceWrapper doesn't have much more to do with us; its parameters all revolve around usage, and we simply use them through the wrapper.

2.6. Install _plugin/head management plug-in (auxiliary management)

To manage the cluster well we need a tool. head is popular, general-purpose, and free. There are many other useful tools, such as bigdesk and Marvel (commercial); plugin installation is much the same for all of them, so we use the generic head tool here.

First, head gives us a clear view of cluster node management.

This is an example of a three-node ES cluster as head shows it: a two-dimensional matrix with the indexes across the top and the nodes down the left, where each intersection shows the shard information for that index on that node.

The head plugin is easy to install, and you can use it by copying files directly. In elasticSearch’s home directory there is a plugins directory. This is where all plugins are found and loaded.

Let’s look at how to install the head plug-in. There is a plugin executable in the elasticSearch /bin directory that is used to install plug-ins.

./plugin install mobz/elasticsearch-head

The plugin tool tries several download sources, the elasticsearch site and GitHub among them. It will find the plugin on GitHub; wait a while for the installation to complete. Then open the head plugin's REST address, /_plugin/head.

Node-1 is the master node by default.

2.7. Install ElasticSearch client plugins in Chrome

There are many ElasticSearch client plugins in Chrome that make development and maintenance easier, and they are recommended. Just search for the elasticsearch keyword and you'll see plenty.

ElasticSearch Toolbox and Sense are two commonly used and useful tools. Chrome plugins are polished and enjoyable to use.

Elasticsearch Toolbox makes it easy to query and export data.

Sense lets you edit the ElasticSearch DSL with prompt-driven assistance, making it efficient and less error-prone to write complex DSLs. I haven't used the other tools; feel free to try them.

(Note: if you can't access the Chrome web store, you'll need a workaround that won't be explained here.)

2.8. Use the _cat tool provided with ElasticSearch

In some cases you may not be able to use a plugin to manage or view the cluster. You can use ElasticSearch's _cat REST endpoints instead. For example, if some nodes are missing in _plugin/head and you are not sure what happened, you can use /_cat/nodes to list all nodes. Sometimes a node simply failed to start, but more often the cluster has split and you may need to re-elect a master or speed up the election process.

http://192.168.0.20:9200/_cat/nodes?v (see the nodes)

The _cat REST endpoints take a v parameter that makes the output easier to read. The _search REST endpoint takes a pretty parameter that formats query results for reading. Almost every endpoint has its own readability parameter.

http://192.168.0.20:9200/_cat/shards?v (see the shards)

http://192.168.0.20:9200/_cat/ (see all the possible functions of the cat)

You can view aliases, segments, indices, etc.

2.9. Clone the virtual machine (Modify the IP address, HWaddr, and UUID configurations, and finally modify the system time)

Once one machine is installed, we need to build the distributed system. A distributed system needs multiple nodes, and per ES cluster-building best practice you need at least three. So we clone the installed machine twice to form a working three-node distributed system.

First clone the installed machine, 192.168.0.103. After cloning, several settings must be changed, because the clone duplicates configuration such as the NIC hardware address and the IP address.

Edit the nic configuration file:

vim /etc/sysconfig/network-scripts/ifcfg-eth0

DEVICE=eth3
HWADDR=00:0C:29:CF:48:23
TYPE=Ethernet
UUID=b848e750-d491-4c9d-b2ca-c853f21bf40b
ONBOOT=yes
NM_CONTROLLED=yes
BOOTPROTO=static
BROADCAST=192.168.233.255
IPADDR=192.168.0.103
NETMASK=255.255.255.0
GATEWAY=192.168.0.1

DEVICE is the identifier of the network adapter; change it to match your local adapter (you can check with ifconfig). HWADDR is the MAC address of the NIC; change it to any value that does not clash on your network segment. Do the same for UUID: any non-duplicate value.

Change the IP address to something appropriate, ideally by referring to your physical machine's configuration. GATEWAY should follow your physical machine's gateway; if your VM is in bridged mode this must be set, otherwise the network cannot connect.

Restart the network service:

service network restart

Wait a moment, reconnect over SSH, and run ifconfig to check the network parameters. Finally, ping an external site and your physical machine's IP to ensure the network is working.

Finally, we need to change the Linux system time to prevent server time inconsistencies from causing minor problems, such as timestamp problems with es cluster master elections, logging problems with log4j output, and so on. In distributed systems, clocks are very important.

date -s '20161008 20:47:00'

You can set the time zone if you want, but I don’t need it here.

Clone as many machines as you need. As agreed earlier, the three machines 192.168.0.10, 192.168.0.20, and 192.168.0.30 will form the ES distributed cluster.

3. Configuration

Now that the cluster nodes are ready, we configure them so the three nodes can connect to each other. This section covers elasticsearch configuration; you can check ElasticSearch's official website or Baidu for more information.

Some of this configuration is already simplified for us by the ElasticSearch ServiceWrapper.

From here on we configure the three machines: 192.168.0.10, 192.168.0.20, and 192.168.0.30.

3.1. elasticsearch.yml configuration

The config directory of ElasticSearch holds all configuration files. Navigate there: cd /usr/share/elasticsearch/config.

3.1.1. IP access restriction and the default port 9200

There are two caveats here: first the IP access restriction, second the instance's default port 9200. The IP access restriction limits which IPs can reach the server, which provides a degree of security filtering.

# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 0.0.0.0

If it is set to 0.0.0.0, no IP is restricted. A production server would typically restrict this to a limited set of IPs, usually the management ones.

The default port 9200 is also a bit risky in general. You can change it to another port, if only because developers might misoperate and connect to the production cluster. Of course, if your company's network isolation is good, it doesn't matter.

#

# Set a custom port for HTTP:

#

http.port: 9200

transport.tcp.port: 9300

9300 is the port used for communication within the cluster, and it can be changed too. Since one of the two ways of connecting to the cluster is to join it as a cluster node, change the default port to be safe.

(Note: remember to apply the same change on all three nodes; otherwise the nodes will fail to connect to each other and report errors.)

3.1.2. Cluster discovery IP address list, node name, and cluster name

Next, set the cluster node IP list so the cluster forms between the specified nodes. By default ElasticSearch uses automatic IP discovery: any reachable node on the network segment can be discovered and joined to the cluster. This has pros and cons. The upside is automation, which is handy when your ES cluster needs to live in the cloud; the downside is some instability, such as master-election issues and data-replication issues.

One of the triggers for a master election is nodes joining the cluster, and data replication for balancing and redundancy also loads the cluster. This is one reason to separate master duties from data-node duties, as discussed in section 3.4.

With a fixed discovery list there are two configuration styles: chained discovery, where each node lists only its predecessors, and full discovery, where every node lists every other node. Each has its strengths; I use chained discovery. An important criterion is how fast your cluster needs to scale: with full discovery, initializing the cluster becomes a real problem because electing the master takes a long time, and startup speed differs from node to node. So I went with the more reliable chained discovery.

On 192.168.0.20, configure elasticsearch as follows:

# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.zen.ping.unicast.hosts: ["192.168.0.10:9300"]

This lets node 20 discover the 10 machine; follow the same pattern to finish the remaining configuration for node 30.
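Following the chained pattern, the entry on 192.168.0.30 can list the earlier nodes, for example:

discovery.zen.ping.unicast.hosts: ["192.168.0.10:9300", "192.168.0.20:9300"]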

(Note: there are a lot of discovery configurations on the web for different scenarios, and you can find a lot of information on this topic if you are interested.)

Then configure the cluster name. This is the name of the cluster your node belongs to, and it helps you plan clusters: only nodes with the same cluster name can form one logical cluster.

# ———————————- Cluster ———————————–

#

# Use a descriptive name for your cluster:

#

cluster.name: orderSearch_cluster

#

# ———————————— Node ————————————

#

# Use a descriptive name for the node:

#

node.name: node-2

Then complete the configuration of the other two nodes. cluster.name must be identical on all three; set node.name individually.

3.1.3. Master elections on restart

A small experience to share: because my machines are virtualized, I shut down and restart the cluster often. One thing I found about master elections is that if the cluster is shut down the wrong way, it directly affects the election logic on the next startup.

I looked into the general election logic: the freshness of shard data is an important election criterion. (Logs, data, and time are all important indicators for the cluster master.)

For data consistency, of course, the node with the latest data becomes master, and the newer data is then replicated and refreshed to the other nodes.

If you find that a node fails to join the cluster, restart that ES service to trigger a fresh global master election.

3.2. Linux maximum open files (a system threshold for indexing)

On Linux you need to request the maximum system resources from the operating system in advance. ElasticSearch uses a lot of file handles while indexing, and the Linux defaults may not be enough, so this must be set up before heavy use.

(The book "ElasticSearch: an extensible open source elastic search solution" covers this as well.)

The ElasticSearch ServiceWrapper has already wired this configuration up for us.

vim /etc/init.d/elasticsearch

These values are applied to the ES instance at startup.
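In my init.d file the two thresholds look roughly like this (65535 and 262144 are the usual defaults; tune them to your workload):

MAX_OPEN_FILES=65535
MAX_MAP_COUNT=262144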

Try restarting the ES instances on all three machines, then check in _plugin/head whether the cluster status shows all three. (Remember to browse to the machine where the head plugin is installed; I installed it on the 10 machine.)

The names shown in red are the node.name values you set; the nodes are working as one cluster.

3.3. Install Chinese word segmentation IK (note the corresponding version problem)

By now the cluster should be working. We still need to configure Chinese word segmentation; after all, ElasticSearch's built-in analysis is not well suited to Chinese.

I use the IK analyzer. GitHub address: github.com/medcl/elast…

The IK analyzer is versioned against specific ElasticSearch releases.

We are using ElasticSearch 2.3.4, so we need the matching IK version; otherwise the IK plugin will fail to load at startup. Go to the releases list, find the corresponding version, and download it.

You can download it directly to a Linux machine, or you can download it to your host machine and copy it to a virtual machine. If your version of ElasticSearch is up to date, you may need to download the IK source code and compile it before deploying it.

You can build and install it with git + maven; for details see github.com/medcl/elast…

That part is straightforward, so I won't repeat it here. Once it is installed, restart the ES instance.

3.4. ElasticSearch cluster planning (avoid using masters as data nodes; keep independent masters as commanders)

You can plan a cluster like this: two master nodes used purely as commanders to coordinate cluster-level transactions, with their data rights removed; then a three-node data cluster, with master rights removed, so those nodes can concentrate on storage and retrieval. This is a minimum-granularity cluster structure that you can scale out from.

One advantage of this split is that responsibilities are clearly defined, which goes the furthest toward preventing data-node work from destabilizing the master. Data replication, data balancing, routing, and so on all directly affect the stability of the master, and that instability can lead to split-brain problems.
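In elasticsearch.yml this split is expressed with the node.master and node.data flags, roughly as follows:

# on the two commander (master-only) nodes
node.master: true
node.data: false

# on the three data-only nodes
node.master: false
node.data: true

Setting discovery.zen.minimum_master_nodes to a majority of the master-eligible nodes is the standard guard against split-brain.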

4. Development

We are entering the final stage. Everything is ready, so let's operate this powerful search engine. Onward.

4.1. Accessing the cluster

Clustering brings high availability, high concurrency, big data, horizontal scalability, and so on. So how do you access an ElasticSearch cluster?

First, to keep cluster access highly available, the client does not use VIP failover (such as Keepalived). Instead, the client SDK is configured with multiple node IPs when it connects. This is already common practice in distributed systems; only centralized clusters such as DBs and caches need a VIP, because of their usage characteristics (data consistency).

Every ElasticSearch node can handle requests: the more nodes, the higher the QPS and the lower the TPS, though the TPS drop is not proportional to the node count (a quorum algorithm is used to ensure availability). So replicating across nodes is not the free win we take for granted.

There are two ways to connect to an ES cluster. The higher-performance way is to join the cluster directly as a client node with its own data rights removed. This is usually used for secondary development: you can clone the source on GitHub, add your own scenarios, and join the cluster. Be aware that as a node you can interfere with elections, with sharding, and with cluster balancing.

ElasticSearch has its own DSL, used RESTfully against rest endpoints such as _search, _cat, and _query. You POST the DSL to the ElasticSearch server for processing.

Elasticsearch search DSL: www.elastic.co/guide/en/el…

Elasticsearch DSL API: elasticsearch-dsl.readthedocs.io/en/latest/

Ex.:

POST _search
{
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "query": "query some test"
        }
      },
      "filter": {
        "term": { "user": "plen" }
      }
    }
  }
}

It is very readable, and easy to write with the help of the Chrome plugin Sense.

However, DSL is not usually written by hand; clusters are usually accessed through SDKs. Direct DSL is mostly for testing data or debugging, checking whether the DSL the SDK emits is correct. It's similar to debugging SQL.

4.1.1. .NET Nest usage (connecting to an ES cluster using a pool)

Nest is an open-source package for .NET applications; you can install it from NuGet.

Website address: www.elastic.co/guide/en/el…

Connect to the cluster in pooled, high-availability mode:

var node1 = new Uri("http://192.168.0.10:9200");
var node2 = new Uri("http://192.168.0.20:9200");
var node3 = new Uri("http://192.168.0.30:9200");

var connectionPool = new SniffingConnectionPool(new[] { node1, node2, node3 });

var settings = new ConnectionSettings(connectionPool);

var client = new ElasticClient(settings);

At this point the client object is a soft load balancer: it balances connections across the three nodes in the background according to some policy (round-robin or weighted; I haven't looked into the specifics).

4.1.2. Java Jest usage

For Java I use Jest. Create a Maven project and add the corresponding dependencies for Jest:

<dependencies>
    <dependency>
        <groupId>io.searchbox</groupId>
        <artifactId>jest</artifactId>
        <version>2.0.3</version>
    </dependency>
    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch</artifactId>
        <version>2.3.5</version>
    </dependency>
</dependencies>
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.JestResult;
import io.searchbox.client.config.HttpClientConfig;
import io.searchbox.client.http.JestHttpClient;
import io.searchbox.core.Search;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import java.util.LinkedList;
import java.util.List;

JestClientFactory factory = new JestClientFactory();

// pool the three node addresses, as with Nest above
List<String> nodes = new LinkedList<String>();
nodes.add("http://192.168.0.10:9200");
nodes.add("http://192.168.0.20:9200");
nodes.add("http://192.168.0.30:9200");

HttpClientConfig config = new HttpClientConfig.Builder(nodes).multiThreaded(true).build();
factory.setHttpClientConfig(config);
JestHttpClient client = (JestHttpClient) factory.getObject();

// build the query DSL with the elasticsearch helper classes
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.queryStringQuery("People's Republic of China"));
searchSourceBuilder.field("name");

Search search = new Search.Builder(searchSourceBuilder.toString()).build();

JestResult rs = client.execute(search);
System.out.println(rs.getJsonString());

{
  "took": 71,
  "timed_out": false,
  "_shards": { "total": 45, "successful": 45, "failed": 0 },
  "hits": {
    "total": 6,
    "max_score": 0.6614378,
    "hits": [
      { "_index": "posts", "_type": "post", "_id": "1", "_score": 0.6614378, "fields": { "name": ["Wang Qingpei"] } },
      { "_index": "posts", "_type": "post", "_id": "5", "_score": 0.57875806, "fields": { "name": ["Wang Qingpei"] } },
      { "_index": "posts", "_type": "post", "_id": "2", "_score": 0.57875806, "fields": { "name": ["Wang Qingpei"] } },
      { "_index": "posts", "_type": "post", "_id": "AVaKENIckgl39nrAi9V5", "_score": 0.57875806, "fields": { "name": ["Wang Qingpei"] } },
      { "_index": "class", "_type": "student", "_id": "1", "_score": 0.17759356 },
      { "_index": "posts", "_type": "post", "_id": "3", "_score": 0.17759356, "fields": { "name": ["Wang Qingpei"] } }
    ]
  }
}

The returned data spans multiple indexes. You can debug repeatedly to watch the client switch between node IPs and confirm they are all available.

4.2. Index development

The general steps of index development are fairly simple: first create the corresponding mapping, configuring the characteristics of each field in each type.

4.2.1. Mapping configuration

The mapping tells the ES instance how to treat each field of the index: for a username, say, whether the field is indexed, whether it is stored, its length, and so on. Although ElasticSearch can map fields dynamically, it is recommended to create the index mapping explicitly for management and maintenance purposes; the mapping can be kept in a file for rebuilding the index later.

POST /demoindex
{
  "mappings": {
    "demotype": {
      "properties": {
        "contents": {
          "type": "string",
          "index": "analyzed"
        },
        "name": {
          "store": true,
          "type": "string",
          "index": "analyzed"
        },
        "id": {
          "store": true,
          "type": "long"
        },
        "userId": {
          "store": true,
          "type": "long"
        }
      }
    }
  }
}

This is the simplest mapping: it defines an index named demoindex with a type demotype. Each field is a JSON object containing its type and whether it needs indexing.

Edit this in Sense and then POST it directly.

{
  "acknowledged": true
}

Check whether the created index matches the mapping settings you submitted.
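For example, you can read the mapping back through the _mapping endpoint:

curl -XGET http://192.168.0.10:9200/demoindex/_mapping?pretty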

4.2.2. Mapping template configuration

Manually creating a similar mapping every time is inefficient. ElasticSearch lets you create a mapping template and then automatically matches the template to the indexes you create.

PUT log_template
{
  "order": 10,
  "template": "log_*",
  "settings": {
    "index": {
      "number_of_replicas": "2",
      "number_of_shards": "5"
    }
  },
  "mappings": {
    "_default_": {
      "_source": {
        "enabled": false
      }
    }
  }
}

This creates an index template for log-type indexes. We set two basic properties: "number_of_replicas": "2", the replica count, and "number_of_shards": "5", the shard count. In mappings, we disable the _source field by default for these indexes.

Any index we create whose name matches the "log_*" pattern will automatically hit this mapping template.
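For example, creating a hypothetical index named log_orders picks the template up; reading its settings back should show the five shards and two replicas:

PUT log_orders

GET log_orders/_settings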

You can view existing mapping templates through the _template REST endpoint, or through the Templates menu under Info in the upper-right corner of the head plugin.

{
  "mq_template" : {
    "order" : 10,
    "template" : "mq*",
    "settings" : {
      "index" : {
        "number_of_shards" : "5",
        "number_of_replicas" : "2"
      }
    },
    "mappings" : {
      "_default_" : {
        "_source" : {
          "enabled" : false
        }
      }
    },
    "aliases" : { }
  },
  "log_template" : {
    "order" : 10,
    "template" : "log_*",
    "settings" : {
      "index" : {
        "number_of_shards" : "5",
        "number_of_replicas" : "2"
      }
    },
    "mappings" : {
      "_default_" : {
        "_source" : {
          "enabled" : false
        }
      }
    },
    "aliases" : { }
  },
  "error_template" : {
    "order" : 10,
    "template" : "error_*",
    "settings" : {
      "index" : {
        "number_of_shards" : "5",
        "number_of_replicas" : "2"
      }
    },
    "mappings" : {
      "_default_" : {
        "_source" : {
          "enabled" : false
        }
      }
    },
    "aliases" : { }
  }
}
This is typically used for storage whose business cannot stop, such as logs, messages, and critical error alerts, as long as the repeated mappings follow a regular naming pattern.

4.2.3. Index routing configuration

When ES shards data it hashes a key, so you can pass a fixed key and that key becomes your fixed routing rule. In version 1.0 the _routing parameter was set when creating mappings, which meant all documents under the type could only route with that one key. After ES 2.0, routing follows the index metadata instead, letting you control routing per index: you specify a _routing parameter when submitting an index request, rather than setting it in mappings.

Configuring the _routing parameter in mappings is no longer supported after 2.0:

www.elastic.co/guide/en/el…

In 1.0, for example, you could use the userId as the routing key, which would put all of a user's data in one shard and speed up queries.

{
  "mappings": {
    "post": {
      "_routing": {
        "required": true,
        "path": "userid"
      },
      "properties": {
        "contents": {
          "type": "string"
        },
        "name": {
          "store": true,
          "type": "string"
        },
        "id": {
          "store": true,
          "type": "long"
        },
        "userId": {
          "store": true,
          "type": "long"
        }
      }
    }
  }
}

This _routing is set on the mapping and applies to all types, using userId as the shard key. In 2.0, however, you must specify the routing explicitly on each request.

With the mapping in place, in 2.0 you must pass the ?routing=xxx parameter when submitting a document. A big advantage of this is that you can freely adjust the sharding strategy for different business dimensions.
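A sketch of what that looks like in 2.x, reusing the posts/post type from above (the ID and routing values are made up):

POST /posts/post/1?routing=12345
{
  "id": 1,
  "userId": 12345,
  "name": "Wang Qingpei",
  "contents": "..."
}

A query that passes the same value, such as /posts/_search?routing=12345, then only has to visit that one shard.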

5. Summary

Distributed systems have many special and challenging aspects; their distribution means solving many consistency and availability issues, among others. I have only scratched the surface of ElasticSearch, and its distributed nature appeals to me, so I look forward to sharing more in a future article, such as the internals of routing and the replica-balancing algorithm. This article is a simple summary of how to use ElasticSearch; I hope it helps you. Thank you for reading and for your support.

References: "ElasticSearch: an extensible open source elastic search solution" and "Elasticsearch: The Definitive Guide".