Preface

Elasticsearch overview

  • Distributed, RESTful search and analytics engine built on Lucene;
  • Real-time search and analytics; stable, reliable, and fast;
  • Written in Java, open source; data is indexed as JSON documents over HTTP;

Project introduction

  • History: the project was handed over from a former colleague. There was already an ES 1.7 cluster, but frequent JVM problems made it unusable. I had no ES tuning experience and had barely even used ES, so I learned it from scratch; the versions I worked through were ES 5.3 to 5.6.
  • Background: the data source is mainly Nginx access logs. Since Nginx runs as a cluster, the logs are scattered across all of its machines; when we wanted to analyse the logs as a whole or troubleshoot issues, the data was too scattered and too large for scripts to process. To get full-text search, analysis, and real-time monitoring of the Nginx logs, we turned to ELK;
  • Data volume: 6-7 TB per day, around 10 billion docs

Architecture diagram

Part 1: Log Collection (Nginx + Rsyslog)

  • Nginx's built-in syslog module is used; each machine runs a local Rsyslog listening on UDP port 514, and Rsyslog then forwards the data to Kafka;
  • Selection comparison: there are many ways to collect logs -- Flume, Filebeat, Lua scripts, and so on -- but they all require installing an extra client, whereas Rsyslog is already part of Linux. The syslog output in Nginx is also quite flexible: the JSON log sent over the network is independent of the log written to disk (two formats coexist: the local on-disk format stays under Nginx's control, while machines read the JSON network format), the logs go straight out over the network (at very low cost) without touching the local disk, and the format can be changed per server_name or location. In short, control stays with Nginx: the people who maintain Nginx can customize the source data format themselves, which keeps the management cost low;
  • Rsyslog configuration (dual Kafka): the existing Kafka cluster is 0.8, while Logstash 5.x requires Kafka 0.10 (Logstash was eventually replaced with Hangout), so a new Kafka cluster was set up and Rsyslog writes the data to both clusters separately. The configuration is as follows:

    Module (load="imudp")
    Module (load="omkafka")
    Input (type="imudp" port="514")
    Module (load="mmsequence")
    $MaxMessageSize 4k
    
    local5.none /var/log/messages
    local5.none @log.domain.com:514
    set$! newmsg =replace($msg,'\\x'.'\\u00')
    
    template(name="kafka_topic" type="string" string="%programname%")
    template(name="kafka_msg" type="string" string="%! newmsg%")
    if ($syslogfacility-text= ='local5' and $syslogseverity-text= ='info') then{
    
    action(type="omkafka" topic="kafka_topic" partitions.auto="on"
    dynatopic="on" dynatopic.cachesize="1000"
    confParam=["compression.codec=snappy"]
    #kafka broker addr
    broker=["10.10.10.1:9092"."10.10.10.2:9092",]
    template="kafka_msg"
    errorfile="/var/log/omkafka/log_kafka_failures.log")
    
    action(type="omkafka" topic="kafka_topic" partitions.auto="on"
    dynatopic="on" dynatopic.cachesize="1000"
    confParam=["compression.codec=snappy"]
    #kafka broker addr
    broker=["20.20.20.1:9092"."20.20.20.2:9092",]
    template="kafka_msg"
    errorfile="/var/log/omkafka/log_kafka_failures.log")
    
    stop
    }
    Copy the code
  • Configure Nginx JSON logs

    log_format json_format '{"@timestamp":"$time_iso8601", '
        '"cookie_id":"$cookie_id", '    # internal cookie_id
        '"client_ip":"$remote_addr", '
        '"remote_user":"$remote_user", '
        '"request_method":"$request_method", '
        '"domain":"$host", '
        '"user_agent":"$http_user_agent", '
        '"xff":"$http_x_forwarded_for", '
        '"upstream_addr":"$upstream_addr", '
        '"upstream_response_time":"$upstream_response_time", '
        '"request_time":"$request_time", '
        '"size":"$body_bytes_sent", '
        '"idc_tag":"tjtx", '
        '"cluster":"$host_pass", '
        '"status":"$status", '
        '"upstream_status":"$upstream_status", '
        '"host":"$hostname", '
        '"via":"$http_via", '
        '"protocol":"$scheme", '
        '"request_uri":"$request_uri", '
        '"http_referer":"$http_referer"}';
  • Nginx then enables the built-in syslog module in access_log and references the JSON log format defined above

    access_log syslog:server=127.0.0.1:514,facility=local5,severity=info,tag=nginx_aggregation_log json_format;
    # nginx_aggregation_log is the custom tag, which becomes the Kafka topic

    Introduction to the Nginx syslog module

    Note:

    1) UDP is fast, but an Ethernet data frame is limited to 46-1500 bytes, and unlike TCP, UDP cannot reassemble packets; after subtracting the IP and UDP headers, only 1472 bytes are usable. A message longer than that is not simply dropped the way plain UDP loss would drop it -- instead the source data format gets broken and everything beyond the limit is truncated.

    2) For Nginx logs, as long as POST bodies are not kept, a single message basically stays within the limit. I did not see TCP support mentioned in the Nginx syslog documentation; TCP transport can be done with a Lua script, but many posts advise against shipping Nginx logs over TCP: TCP is reliable, yet on network jitter or transmission failures it may keep retrying or waiting, which directly affects the request and therefore the user.

    3) What if a message does exceed the UDP limit? I keep the important fields of a message by putting the potentially long ones -- request_uri, http_referer and the like -- at the end of the json_format shown above. If a message really does arrive incomplete, only the part up to request_uri is kept and the JSON is repaired. (This is implemented in the Logstash or Hangout filters; see the Hangout filters below.)

Part 2: Storage Middleware (Kafka)

  • Kafka is a high-performance, high-throughput distributed publish-subscribe messaging system that writes to disk sequentially
  • Kafka has never been a bottleneck here, so there has been no deep tuning; topic data is kept for 12 hours with 1 replica
  • The partition count varies slightly per topic. There are currently 5 servers. A simple test of increasing partitions (8, 16, 32, 64) showed that CPU usage grows as partitions grow; since Kafka itself is not the bottleneck, nothing else notable came up;
  • The largest topic here is close to 5 TB of data per day and runs fine with 64 partitions; smaller topics use 16. The whole Kafka cluster sits above 80% idle CPU with no memory or IO pressure, so we may shrink the cluster later. For reference, the Kafka team's recommendations by daily data volume: under 50 GB: 4 partitions; 50-100 GB: 8 partitions; 100-500 GB: 16 partitions; over 500 GB: 24 partitions.
  • Kafka monitoring plugins: kafka-monitor and kafka-manager

    Note: the Kafka cluster runs kafka_2.10-0.8.1.1, but Logstash 5.x requires Kafka 0.10 or later. Hangout was used instead, and replacing a few JAR packages fixed the compatibility problem

Part 3: Data Shipper (Hangout)

  • Hangout is an application that mimics Logstash. It is not as feature-rich, but it covers all the basic use cases and, being written in Java, performs several times better. Here it subscribes to messages from Kafka, does some simple filtering, and writes to ES. Hangout currently runs on two servers, with 8 GB of memory per process and CPU usage around 60-70%.

    inputs:
        - Kafka:
            topic:
                nginx_aggregation_log: 32
            codec: json
            consumer_settings:
                group.id: es-nginx_aggregation_log
                zookeeper.connect: "10.10.10.1:2181,10.10.10.2:2181"
                auto.commit.interval.ms: "20000"
                socket.receive.buffer.bytes: "1048576"
                fetch.message.max.bytes: "1048576"
                num.consumer.fetchers: "1"
    filters:
        - Filters:
            # if the JSON arrived incomplete (truncated by UDP), a raw "message"
            # field shows up; only then run the repair logic below
            if:
                - '<#if message??>true</#if>'
            filters:
                - Grok:
                    match:
                        # match from "@timestamp" up to the first "?" after request_uri
                        - '(?<msg>{"@timestamp":.*"request_uri":([^\?]+)\?)'
                - Gsub:
                    fields:
                        # append a closing quote and brace so msg is valid JSON again
                        msg: ['$', '"}']
                - Json:
                    field: msg
                    # drop the broken original message
                    remove_fields: ['message']
        - Convert:
            fields:
                request_time:
                    to: float
                    remove_if_fail: true
                upstream_response_time:
                    to: float
                    remove_if_fail: true
                size:
                    to: integer
                    remove_if_fail: true
        - GeoIP2:
            source: client_ip
            database: '/opt/soft/hangout/etc/other/GeoLite2-City.mmdb'
        - Json:
            field: geoip
        - Remove:
            fields:
                - msg
        - Add:
            fields:
                # request_uri has a very high cardinality, so the part before "?"
                # is kept in request_url for aggregation; the original stays for search
                request_url: '<#assign a=request_uri?split("?")>${a[0]}'
            if:
                - '<#if request_uri??>true</#if>'
    outputs:
        - Elasticsearch:
            cluster: es-nginx
            timezone: "Asia/Shanghai"
            hosts: "10.10.10.1:9300,10.10.10.2:9300"
            index: 'hangout-nginx_aggregation_log-%{+YYYY.MM.dd}'
  • A Hangout process-management tool was built to monitor the Hangout processes and to start, stop, and restart them from a web interface
  • topic: nginx_aggregation_log: 32 sets the number of consumer threads reading from Kafka; it should equal the number of partitions. If it is smaller than the partition count, one thread reads from more than one partition; if it is larger, some threads simply sit idle

Part 4: Elasticsearch (ES)

  • Hardware environment

    CPU: 32 cores; memory: 128 GB; disk: 12 x 6 TB SATA; NIC: 10 GbE
  • Software environment:

    [System]: CentOS 7, kernel 3.10
    [JDK]: 1.8.0_66, 31 GB heap (this JDK build is said to have bugs; install the latest 1.8 JDK)
    [System parameter 1]: vm.swappiness = 1 (minimize swapping)
    [System parameter 2]: vm.max_map_count = 262144 (Elasticsearch uses a mix of NioFS and MMapFS for its files, so make sure enough virtual memory areas are available for memory-mapped files)
  • ES Configuration file

    cluster.name: es-nginx
    node.name: 10.10.10.1
    # used later for hot/cold data separation
    node.attr.rack_id: hdd
    path.data: /data
    path.logs: /opt/logs/elasticsearch/
    network.host: 0.0.0.0
    http.port: 9200
    # list of master-eligible nodes a new node contacts when it starts
    discovery.zen.ping.unicast.hosts: ["10.10.10.1", "10.10.10.2", "10.10.10.3"]
    # prevent split brain (n/2 + 1)
    discovery.zen.minimum_master_nodes: 2
    node.master: true
    node.data: false
  • The first ES pitfall: node.master and node.data serving on the same nodes

    The first version deployed was ES 5.3: three machines, one node each, with both master and data roles enabled so the cluster was "highly available". Write performance, however, was terrible -- CPU sat at 20-30% with very little IO. Since the ES hardware was that idle, I assumed Logstash was the problem; its GC was indeed heavy, so I scaled Logstash from 2 servers to 4, and later found it changed nothing. I kept increasing -- 2, 4, 6, 8 ... up to 64 -- still fruitless. That pit kept me busy for quite a while. Then I happened upon a post on Google saying not to enable master and data on the same node. I changed the layout accordingly -- one dedicated master, two data nodes -- and the problem was solved; I can no longer find the before/after graph, but throughput at least doubled;

    [A dedicated master consumes almost nothing beyond occupying a machine]
  • template

    Since the number of shards, field types, and other settings are fixed when an index is created, the corresponding templates should be created in advance for standardized management and reference. Below are some settings for shards and aliases:

    {
      "template": "agg-nginx-*",
      "aliases": {
        "agg-nginx": {}
      },
      "settings": {
        "number_of_shards": 4,
        "number_of_replicas": 1,
        "index.routing.allocation.include.rack_id": "ssd"
      }
    }

Define the template with PUT _template/your_name and the shard settings take effect for new indices. But note the wildcard matching: an index named something like agg-nginx-test-2018.01.01 matches agg-nginx-* as well as a second template defined as "template": "agg-nginx-test-*". Which template wins is controlled by "order" (default 0): the higher the value, the higher the priority, as the sketch below shows.
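A minimal sketch of how "order" resolves such an overlap (the template names and shard counts here are just illustrative); the template with the higher order is applied last, so its settings win where the two conflict:

    PUT _template/agg-nginx
    {
      "template": "agg-nginx-*",
      "order": 0,
      "settings": { "number_of_shards": 4 }
    }

    PUT _template/agg-nginx-test
    {
      "template": "agg-nginx-test-*",
      "order": 1,
      "settings": { "number_of_shards": 1 }
    }

An index named agg-nginx-test-2018.01.01 matches both patterns and ends up with number_of_shards: 1 from the higher-order template.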

  • mapping

    ES mapping is much like typing in a statically typed language: declare a variable as int and it can only hold ints; likewise, a field mapped as a numeric type can only hold numbers. Mapping not only tells ES what type of value a field holds, it also tells ES how to index the data and whether the field can be searched

    Here is an abridged version of the mapping:
    "mappings": {
        "ngx_log": {
           "_all": {
            "enabled": false
          },
          "properties": {
            "@timestamp": {
              "type": "date"
            },
            "client_ip": {
              "type": "ip"
            },
            "domain": {
              "type": "keyword"
            },
            "geoip": {
              "properties": {
                "city_name": {
                  "type": "keyword"
                },
                "country_name": {
                  "type": "keyword"
                },
                "latitude": {
                  "type": "float"
                },
                "location": {
                  "type": "geo_point"
                },
                "longitude": {
                  "type": "float"}}},"request_time": {
              "type": "float"
            },
            "request_url": {
              "type": "keyword"
            },
            "status": {
              "type": "keyword"
      ype":"keyword"},}}}Copy the code
  1. The _all field: this special catch-all field concatenates the values of all other fields into one big string, delimited by spaces, which is then analyzed and indexed but not stored -- it can be queried but not retrieved. Since every key's value is already defined and Nginx log queries never need cross-field full-text search, _all is disabled, saving close to half the storage space

  2. Don't leave everything as the default text type

    The data is for log analysis and every field's format is already well defined, so there is no need for analysis/tokenization here. Analyzed text suits businesses that actually need it, such as full-text search engines;

  3. The geoip location field, for example, would default to text; unless it is declared as geo_point, the map visualizations cannot be used, which matters
  4. Fields such as request_time need computation -- averages, sums, greater-than/less-than -- and while the default text type would work, it is far less efficient than float
  5. There are many more field types, such as ip, date, and so on; see the official documentation for details
  • shard & replicas

    1) Sharding algorithm: shard = hash(routing) % number_of_primary_shards. The routing value (by default the document's _id) is hashed and divided by the number of primary shards; the remainder, which always falls between 0 and number_of_primary_shards - 1, is the shard the document lives in.

    This also explains why the number of primary shards can only be defined at index creation time and never changed: if it changed later, all previous routing values would become invalid and documents could never be found again.

    All document APIs (get, index, delete, bulk, update, mget) accept a routing parameter, which maps documents to shards according to your own rule. A custom routing value can guarantee that all related documents -- for example, all articles belonging to the same user, routed by the user account -- end up in the same shard.
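    A minimal sketch of the routing parameter on the document APIs (index name, type, id, and routing value are illustrative):

        # index a document with an explicit routing value
        PUT nginx-log-2018.01.01/ngx_log/1?routing=user123
        { "domain": "example.com", "status": "200" }

        # the same routing value has to be supplied to fetch it directly,
        # otherwise ES may look on the wrong shard
        GET nginx-log-2018.01.01/ngx_log/1?routing=user123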

    2) Sharding and copy interaction:

    Create, index, and delete requests are write operations: they must complete successfully on the primary shard before being copied to the associated replica shards. The steps for successfully creating, indexing, or deleting a document on the primary and replica shards are:

    1. The client sends a create, index, or delete request to Node 1.

    2. The node uses the document's _id to determine that the document belongs to shard 0, and forwards the request to Node 3, which holds the primary of shard 0.

    3. Node 3 executes the request on the primary shard. If it succeeds, it forwards the request to the replica shards on Node 1 and Node 2. Once every replica reports success, Node 3 reports success to the coordinating node, which reports back to the client.

    By the time the client receives a successful response, the document change has been applied on the primary shard and on all replica shards: the change has taken effect.

    How many shards should an index have, and when should you scale out? It depends on the hardware and on your latency requirements. As a rule of thumb, keeping each shard at the level of tens of millions of documents gives acceptable speed; past a hundred million, things get noticeably slow. But every coin has two sides: the smaller each shard, the more shards you end up with. Searching a day or two of data may be fine, but a longer time range means more shards searched in parallel, and with a limited number of nodes that becomes difficult -- you have to scale out more nodes

  • routing

    Routing is said to be the king of optimizations. Cities are the usual example: say I want the PV of the site from Beijing. With the default hash logic I have to scan every shard and then aggregate the results; but if routing is set in advance -- hashing on the city field, for instance -- documents with the same routing value land in specific shards, and the query no longer needs to touch every shard. The downside is that shard sizes become uneven, so routing takes some time to observe and test. Kibana currently does not support routing in its queries, so routing is not used on the Kibana side yet; it remains a key optimization point, noted here for later. My plan: nginx log domains are all ASCII letters, so route on the first letter of the domain -- then looking at one domain no longer scans everything, and query performance should improve visibly; I will share results later (a sketch of the idea is below). Also, Hangout did not support routing at first; I raised a small issue on GitHub and it was added quickly -- much appreciated.
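    A minimal sketch of a routed search under that first-letter-of-domain idea (index name and routing value are illustrative, and it assumes the documents were indexed with the same routing value); only the shards that "b" hashes to get searched instead of all of them:

        GET nginx-log-2018.01.01/_search?routing=b
        {
          "query": { "term": { "domain": "bbs.example.com" } }
        }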

Part 5: Kibana Graphic Display

Kibana is an open source analytics and visualization platform designed to work with Elasticsearch. You can use Kibana to search, view, and interact with the data in Elasticsearch indices, and its charts, tables, maps and more make it easy to present advanced analysis and visualizations. A few highlights:

  • X-Pack provides some features for free, most notably Monitoring, which collects the essential cluster metrics
  • Dev Tools has IDE-like code completion -- a real gem; there is also a solid visual Search Profiler for pinpointing problems directly
  • There is a good post on using the Kibana search box gracefully; the search bar, Visualize, Dashboard, and Timelion are all well worth learning
  • There is also an official Elastic demo whose dashboards you can borrow ideas from for your own charts

(╯﹏╰) A quick gripe: writing this post on SF, Chrome has become almost unusable -- typing lags noticeably

  • Without further ado, here are some of the final results

  • [QPS show]

  • [QPS period-over-period comparison]

  • [Abnormal status code trend]

  • [Slow request QPS]

  • [Status code ratio]

  • [Domain and top URLs]

  • Another late night [heat map] -- folks in Beijing and Shanghai never get enough sleep

  • The chart above was built with Timelion; the others were all made with the regular Visualize tools

    .es(index=index1,timefield=@timestamp).label('Today').title(QPS).color(#1E90FF), 
    .es(offset=-24h,index=index2,timefield=@timestamp).label('Yesterday').lines(fill=1,width=0.5).color(gray)
  • Because the X-Pack trial strips out the authentication feature, exposing Kibana directly would be too risky, so Nginx's auth_basic module is used in front of it as a simple authentication layer

Final tuning

Core problem: queries are slow; even a query over just 15 minutes of data times out

  • Data volume: 6-7 TB per day, around 10 billion docs, with the largest single index at 4-5 TB
  • ES version: upgrades bring new features and bug fixes, but also new problems. Going from ES 5.x to 5.5 is a rolling upgrade; the procedure itself is simple, but with this much data, upgrading node by node takes a long time. (The 1.x to 5.x upgrade is said to be very painful; if interested, see the reindex-upgrade documentation.) The rolling upgrade under 5.x, with a sketch of the calls below:
    1) Disable shard allocation: "cluster.routing.allocation.enable": "none"
    2) Force a synced flush so as much cached data as possible is on disk: _flush/synced
    3) Stop one node, upgrade it and its plugins, then start it again
    4) Re-enable shard allocation: "cluster.routing.allocation.enable": "all"
    5) Wait for the cluster to return to green, then continue with the next node
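    A minimal sketch of the cluster-level calls used around each node restart (standard 5.x APIs; everything else follows the steps above):

        # 1) disable shard allocation before stopping the node
        PUT _cluster/settings
        { "transient": { "cluster.routing.allocation.enable": "none" } }

        # 2) synced flush so as much as possible is already on disk
        POST _flush/synced

        # 3) stop the node, upgrade ES and its plugins, start it again

        # 4) re-enable allocation, then wait for green before the next node
        PUT _cluster/settings
        { "transient": { "cluster.routing.allocation.enable": "all" } }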
  • Capacity expansion

    Expanding the cluster is simple and there is little to say: apart from node.name in the configuration file, everything else is identical;

    At first only the data nodes were expanded, from 3 to 20. Writing the 1-2 KB documents was never a problem, with single-node write performance around 30-40 K per second.

  • Mapping field optimization:

    Mostly the type optimizations described in the mapping section above: whatever doesn't need analysis becomes keyword, whatever is used in math becomes integer or float, and so on.

    One point worth a second look is whether status should be an integer or a keyword. Its values are 200, 301, 302, 404, 502 and the like -- a few hundred distinct values at most -- and it is only ever matched exactly, never used in range arithmetic. Numeric types are optimized for range queries, which fit fields like body_size whose minimum and maximum differ wildly; keyword, on the other hand, is an inverted index that only has to hold the small dictionary of status values, so keyword is the better fit here.

  • Shard number adjustment:

    At first there was one primary shard per machine (20 shards x 1 replica), and each shard grew to nearly 200 GB, far above the commonly recommended 30-50 GB (it depends on the actual workload). So the count was doubled to 40 shards x 1 replica: no visible difference for queries or writes. Doubled again: still no difference -- and the more shards there are, the more CPU each write consumes.

    Over-allocating shards increases the complexity of merging the per-shard query results in Lucene, which increases the time spent merging them, as well as the cost of creating new indexes. In general it is better to start with as few shards as possible and add shards only when a single shard holds too much data and individual queries become too slow. More shards also bring more concurrency overhead and usually more segment files; with the same number of nodes, each node ends up with more small files, and the extra search concurrency only pays off if disk IO and CPU can still keep up


  • Splitting the index

    Changing the shard count had no obvious effect, so the next step was splitting the index. The Nginx logs were split by the first letter of the domain and written to the matching index, e.g. nginx-log-{letter}, creating 26 indexes (20 shards x 1 replica each). CPU load immediately jumped past 30, ES could no longer keep up with the writes and rejected a large number of them, as shown below

    This way of splitting the index was naive: the first known problem is that domains starting with "a" may be huge while other letters stay tiny -- extremely uneven

  • SSD: some machines were switched to SSD and tagged via node.attr.rack_id so indexes could be allocated to them (sketch below). With data going to SSD there was no IO pressure at all, and writing the 4-5 TB/day index kept CPU around 50%, yet queries still saw no real benefit; even after countless shard adjustments, IO was still not sitting idle
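    A minimal sketch of pinning an index to the SSD machines via that attribute and later moving it to the HDD ones (the index name is illustrative; ES relocates the shards by itself after the setting changes):

        PUT nginx-log-2018.01.01/_settings
        { "index.routing.allocation.include.rack_id": "ssd" }

        # once the index has gone cold, point it at the HDD nodes
        PUT nginx-log-2018.01.01/_settings
        { "index.routing.allocation.include.rack_id": "hdd" }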
  • kibana BUG

    Having no prior ES experience, I had exhausted every approach and angle I could think of. Just when I was at my wits' end I was lucky enough to meet KennyW -- thanks again for his support. The root cause turned out to be a query problem in Kibana 5.4-5.5: to save space I had disabled the _all field, and when nothing is typed in the search box those versions add a * query against every field, producing a huge number of bool conditions -- a defect in Kibana.



    The workaround KennyW gave at the time was known to be temporary, as follows:

    For details see "Kibana performance issues caused by ES 5.4+". The issue has been fixed since Kibana 5.5.2, so in the end the cluster was simply upgraded to 5.6, the latest version at the time
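    Roughly, the shape of the query involved (a much-simplified sketch; the query Kibana actually generates is far larger): with _all disabled, the empty search box turns into a query_string that gets expanded across every mapped field instead of a cheap match_all:

        # with no _all field, the "*" below is expanded over every field in the mapping
        GET nginx-log-*/_search
        {
          "query": {
            "query_string": {
              "query": "*",
              "analyze_wildcard": true
            }
          }
        }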
  • Beyond that, SSD showed no other performance highlights. Talking with KennyW: they had compared SSD against machines with multiple HDDs and seen no significant gain in write throughput, so disk is not the real bottleneck for writes. SSD does improve queries noticeably, but at its price, saving a few seconds here doesn't mean much. By contrast, for companies whose search data volume is small, or for monitoring workloads with strict real-time requirements, SSD is a good choice: fast, doesn't need much capacity, and relatively affordable

    [KennyW's experience] When aggregations build lots of complex buckets, the cost is mainly CPU and memory; disk is not the bottleneck. In real use, HDD and SSD feel about the same in big-data scenarios. SSD's advantage is heavy random disk IO, and ES already does a lot of work to make its disk access sequential; when searching a block of data, ES can lean on the filesystem cache to speed things up. So even on HDD, the first search may pay a few extra seconds of disk access, but running the same search repeatedly costs very little disk IO afterwards and gets noticeably faster

  • The importance of RAID0

    Once the cluster was switched back to HDD, a previously ignored metric surfaced: per-disk utilization. Initially the disks were mounted raw, one data path per disk -- path.data: data1,data2,data... -- but under 5.x this kind of disk imbalance showed up (cmd: iostat -x 1). It seemed strange that one disk was so busy; after watching for a while, other disks took their turn being busy too, and a df of disk usage drove the point home

    With that many disks, some were not even half written. At that point there was nothing more to discuss -- switch to RAID 0
  • Wrapping up

    With indexes under 1 TB a day there is no problem with any query: everything returns within 10 s (as the Kibana screenshot above shows). The 4-5 TB/day index, however, still has problems. The root cause is that request_uri is far too scattered; aggregations run into the situation shown in the figure -- more than 20 million distinct values appear within 15 minutes, and over a long time range the numbers become frighteningly large

    Aggregating on request_uri also generates a lot of disk IO. When ES runs a terms aggregation, to save memory it does not read every term's content straight into the buckets -- some terms can be very long and would eat memory -- so it relies on a data structure called ordinals, which looks roughly like: 1 abc / 2 efg / 3 hfa ...

    That is, an ordered list of all the distinct values of the field. During bucket aggregation only the ordinal is used as the key; once the aggregation finishes, the ordinals table is consulted with that key to fill in the actual values. But ordinals are generated per segment file, and each segment may order them differently, so aggregations need global ordinals -- a structure built on the fly at aggregation time and cached in memory. If a field has an enormous number of distinct values, building it is very expensive: slow, heavy on disk IO, and heavy on memory (a sketch of the offending kind of aggregation follows).
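    For reference, this is roughly the shape of the aggregation that hurts (the index pattern is illustrative; request_uri is a keyword field here). With tens of millions of distinct values, building the global ordinals dominates the cost:

        GET nginx-log-*/_search
        {
          "size": 0,
          "aggs": {
            "top_uris": {
              "terms": { "field": "request_uri", "size": 20 }
            }
          }
        }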

    Below is a before-and-after comparison from disabling the problematic visualization -- not a satisfying fix, but the load drops considerably

    Later the parameters after the "?" in the uri were dropped and only the remainder used for aggregation; that only reduces the problem. The original field is kept for search, but with this much data a complex calculation still hits the 30 s timeout. At that point the options are reducing the cardinality further, adding shards and machines, or splitting the index. So a few adjustments were made to the Kibana timeout, along with the minor peripheral changes below

    1) Kibana's default 30 s timeout is raised to 2 minutes, changed in kibana.yml

    elasticsearch.requestTimeout: 120000

    2) Kibana's default map tiles are replaced with AutoNavi (Gaode); add to kibana.yml

    tilemap.url: 'http://webrd02.is.autonavi.com/appmaptile?lang=zh_cn&size=1&scale=1&style=7&x={x}&y={y}&z={z}'

    3) The cerebro plugin is used alongside for efficient index management

Follow-up

This is the end of the basic phase of the project. Some queries still do not return quickly, but the essential data can be displayed. Next, the focus will be on the points below for deeper optimization and tuning; results will be shared as they come

  • Routing to optimize queries
  • Curator to manage expired indexes
  • Evaluate splitting the largest indexes
  • Add client (coordinating) nodes to reduce the impact on the cluster
  • Hot/cold data separation (node.attr.rack_id), with regular force_merge of cold data to speed up queries (sketch below)
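    A minimal sketch of the force_merge call planned for cold indexes (the index name and segment count are illustrative):

        POST nginx-log-2018.01.01/_forcemerge?max_num_segments=1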

Conclusion

Thanks to rockybean and KennyW for their support. My own takeaway: I had too little ES experience and stepped into many avoidable pits, and I did not read the official documentation carefully enough, which hurt efficiency and progress -- too often I simply didn't know where to start. This is my first technical post and time was a little tight; please point out anything badly written or inappropriate and I will fix it promptly. I hope this article helps newcomers like me step into a few less pits, with a little more to love.
