Elasticsearch (1)

Course Content Introduction

Core Knowledge (Key Points)

Course characteristics

(1) Uses the latest version, Elasticsearch 5.2, to explain how the ES core works

(2) Covers all the core knowledge points of Elasticsearch systematically, completely, and in detail, with some depth, including a complete Java development demonstration

(2-1) A comprehensive knowledge system, including working principles, document management, index management, search, aggregation analysis, word segmentation, data modeling, the Java API, and more

(2-2) In-depth and detailed knowledge that goes well beyond the existing books and videos on the market, such as the principles of index segments and segment merging, optimistic-lock concurrency control, index aliases and zero-downtime switching, the relevance scoring algorithm and its customization, approximate aggregation algorithms, the principles of the doc values and fielddata mechanisms, parent-child relationship data modeling, Java API implementations of scroll search and other complex operations, and so on

(3) Every lecture is hands-on from start to finish, with a large number of practical cases and hands-on exercises. Real knowledge comes from practice, and the course includes a practical project

(4) You will learn all the core knowledge of ES and understand its core principles, be able to use all of its features fluently, be able to deploy a basic ES cluster, and be able to develop in Java a search engine and a data analysis system suitable for small and medium-sized enterprises, so that what you learn can be applied directly in small to medium-sized projects

Advanced Mastery (Key Points)

Course characteristics

(1) Uses the latest version, Elasticsearch 5.2

(2) Contains advanced Elasticsearch knowledge that is almost unavailable elsewhere on the market: geolocation search and aggregation analysis, term vectors, suggester search, search template customization, query execution analysis, dozens of comprehensive aggregation analyses, span queries, shard allocation customization, and ES plug-in development

(3) Every lecture is hands-on from start to finish, with a large number of practical cases and hands-on exercises

(4) Includes a complex practical project: using the knowledge learned to develop a search engine and data analysis system for a location-based smart restaurant app, drawing on all the ES knowledge points from the core through the advanced material

(5) You will master all ES knowledge points from core to advanced, build a complete and in-depth knowledge system of ES, and be able to operate every feature by hand. Finally, through the project work, you will be able to develop in Java an advanced location-based search engine for a small or medium-sized company, as well as an advanced real-time data analysis system that uses complex aggregation operations

Operation, Maintenance, and Optimization of Large Clusters

Course characteristics

(1) The most comprehensive coverage of Elasticsearch operations, management, tuning, and troubleshooting: building an enterprise monitoring system, deploying enterprise clusters, day-to-day cluster management strategies, cluster upgrade plans, cluster benchmarking and stress-testing plans, cluster data backup and recovery, core system configuration parameters, performance tuning plans, and fault-handling solutions

(2) Every lecture is hands-on from start to finish, with a large number of hands-on exercises covering all of the operation, maintenance, management, deployment, and optimization topics

(3) Starting from scratch, gradually builds a large distributed cluster that is scalable and high-performance, with a complete monitoring system and a complete management system

(4) Besides developing complex ES search and analysis systems, you will be able to build a large distributed ES cluster from scratch in any company and develop complete plans for monitoring, operation and maintenance, management, and optimization

Large Project Architecture

Course characteristics

(1) Covers the two core application areas of Elasticsearch: vertical search engine and real-time data analysis

(2) Develops two large, complex, enterprise-level projects modeled on real large enterprise systems: an e-commerce search engine and an e-commerce real-time data analysis platform

(2-1) Large e-commerce search engine: a realistic, complex, commercial-grade search engine architecture for a large enterprise, including core modules such as retrieval, data updates, sorting, word segmentation, querying, and analysis. The architecture also implements a complex caching mechanism, a warm-up mechanism, an avalanche-prevention mechanism, an automatic degradation mechanism, high availability, and so on

(2-2) Large e-commerce real-time data analysis platform: comprehensive, complex data analysis for a large e-commerce business, including a full system of analysis metrics (operations metrics, traffic metrics, sales conversion metrics, customer value metrics, product metrics, marketing metrics, risk-control metrics, and market competition metrics), building a complex, enterprise-level e-commerce data analysis platform in one place

(2-3) The large projects are separated out into their own section because the earlier projects have relatively simple architectures and are driven more by business logic than by architectural complexity. They suit small and medium-sized companies and focus mainly on using ES itself to build the required functionality (search and analysis). This section instead uses the large, complex projects of large companies as its background, so that students can learn to design large-scale project architectures based on ES and reach the level of an architect. For example, the large e-commerce search engine is implemented mainly with ES, but beyond ES there is the complex architecture of a large-scale system to explain. Similarly, the large e-commerce real-time data analysis platform is characterized mainly by intricate and complex business requirements, so a large-scale data analysis platform architecture has to be built on top of ES.

(3) Having mastered enterprise cluster operations and management and the development of complex search engines and data analysis systems in the earlier courses, you can now combine that knowledge with e-commerce domain knowledge to develop realistic, complex, large-scale e-commerce search and analysis systems along with their architecture, and thoroughly acquire the experience and ability to design large project architectures with ES and the related ELK technology stack in medium and large enterprises

ELK Made Easy

Course characteristics

(1) An e-commerce system log retrieval platform built on the ELK technology stack. Logstash and Kibana are explained in detail, including the Logstash plug-in mechanism, monitoring, large-scale expansion, upgrades, and performance tuning, and Kibana visualization, along with how to use the ELK stack to develop a large-scale log storage and retrieval platform

(2) Finally, explains the Logstash and Kibana technologies in depth and combines them with ES, as the ELK technology stack, to build a large-scale, enterprise-level log collection and retrieval platform

(3) You will thoroughly master the ELK technology stack and be able to use it to quickly build a log retrieval platform as well as a data visualization platform

Elasticsearch (2)

What is search?

Baidu: when we want to find any information, we search for it on Baidu, for example to find a favorite movie, a favorite book, or an interesting news story. This is most people's first impression of search, but the idea that Baidu = search is wrong.

Vertical search (site search)

  • Internet search: e-commerce websites, recruitment websites, news websites, various apps

  • IT system search: OA (office automation) software, meeting management, schedule management, project management, staff management

    • Search for “Zhang San”, “Zhang San'er”, or “Zhang Xiaosan”;
    • In the seller's back-end management system of an e-commerce site, search the orders for “toothpaste” to find the toothpaste-related orders.

Search, in any situation, means finding the information you want: you enter some keywords for what you are looking for and expect to get back information related to those keywords.


What if you use a database to do a search?

If you do software development, or have some understanding of IT and computing, you know that data is stored in databases: product information for e-commerce websites, job listings for recruitment websites, news articles for news websites, and so on. So, thinking from a technical point of view about how to implement, say, the internal search function of an e-commerce site, it is natural to consider using the database to do the search.

1. The text in a given field of each record can be very long. A “product description” field, for example, may run to thousands or even tens of thousands of characters. Each search then has to scan the full text of every record and check whether it contains the specified keyword (such as “toothpaste”).

2. The search terms cannot be split apart to find as many of the results you expect as possible. For example, if you type in “biochemical machine”, you won't get “Resident Evil”, because the database only matches the query string as a whole.

The weaknesses can be summed up simply:

  1. A large amount of data leads to poor search performance
  2. Search terms cannot be split into words for matching, so search accuracy is relatively low

Using a database to implement search is not very reliable. Generally speaking, performance will be poor.
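
The two weaknesses above can be illustrated with a minimal sketch. It assumes a tiny in-memory list standing in for a database table; the rows and the scan_search helper are hypothetical and only mimic a substring match over every record.

```python
# Toy stand-in for a database table; the rows are made up for illustration.
products = [
    {"id": 1, "description": "Mint toothpaste for sensitive teeth"},
    {"id": 2, "description": "Soft toothbrush, pack of two"},
    {"id": 3, "description": "Whitening tooth paste with fluoride"},
]


def scan_search(rows, keyword):
    """Return (matching ids, rows examined) for a plain substring search."""
    examined = 0
    hits = []
    for row in rows:
        # Every row is visited, much like a full table scan.
        examined += 1
        # The whole keyword must appear verbatim in the text.
        if keyword in row["description"]:
            hits.append(row["id"])
    return hits, examined


hits, examined = scan_search(products, "toothpaste")
print(hits, examined)
```

All three rows are examined to answer one query, and row 3 is missed because its text says "tooth paste" rather than "toothpaste": the query is never split into words, so only an exact substring counts as a match.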

What are full-text search and Lucene?

(1) Full-text retrieval and the inverted index

Key points:

  1. Inverted index: split each entry into terms (word segmentation), associate each term with the entries that contain it, and then find the matching entries by looking up the search terms
  2. Why the database performs poorly: with 1 million entries, a search scans the whole table and must match against the full text of every row (fuzzy matching exists, but it is time-consuming), and an entry that differs even slightly from the query is not returned
  3. With an inverted index, the 1 million entries are split into terms, say 10 million terms in total. When we search, we can stop as soon as we reach the first term that meets the condition, then return the entries associated with that term as the result. Because the lookup can stop early instead of scanning all the data, the overall search is very fast
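
The inverted index described above can be sketched in a few lines. This is a minimal illustration, not how Lucene stores its index: the sample documents, the whitespace tokenizer, and the function names are all assumptions made for the example.

```python
from collections import defaultdict

# Made-up sample entries keyed by document id.
docs = {
    1: "Resident Evil movie",
    2: "Resident Evil game guide",
    3: "toothpaste order history",
}


def tokenize(text):
    # Simplified word segmentation: lowercase, then split on whitespace.
    return text.lower().split()


def build_inverted_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing at least one query term."""
    result = set()
    for term in tokenize(query):
        # A direct lookup by term; no document text is scanned here.
        result |= index.get(term, set())
    return sorted(result)


index = build_inverted_index(docs)
print(search(index, "evil"))              # [1, 2]
print(search(index, "toothpaste guide"))  # [2, 3]
```

Because the query is split into terms and each term is looked up directly, the second query finds documents that contain either word, which a whole-string substring match could not do.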

(2) Lucene is a JAR package that contains encapsulated code for building inverted indexes and searching them, including the various algorithms involved. When developing in Java, we add the Lucene JAR and develop against the Lucene API. With Lucene, we can index existing data, and Lucene organizes the index data structures on the local disk for us. We can then use the functions and APIs that Lucene provides to search the index data on disk

What is Elasticsearch?

(1) Graphic analysis

The process of evolution:

  1. Lucene can already handle the whole process of building an inverted index and retrieving data, but only on a single machine. We could extend it to multiple machines to form a distributed structure, but the machines would have no connection to one another. We would have to write our own scheduling program to control data storage and data retrieval across the distributed Lucene instances, keep word segmentation working across them, and also ensure that no data is lost and that the service is highly available. These are the problems that Lucene alone leaves to be solved

  2. Elasticsearch solves all of the above problems

    • An Elasticsearch shard is a Lucene instance with additional functionality; it performs the basic functions of inverted indexing and data retrieval
    • Elasticsearch also has a structure of primary shards and replica shards, which ensures that data is not lost and that load is balanced
    • Elasticsearch is distributed, which solves the problems of large data volumes and scaling across machines. It also solves the problem of coordinating the separate Lucene instances
    • Elasticsearch packages more advanced features on top, supporting more complex search, aggregation analysis, and other data analysis capabilities of the kind found in traditional databases
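
The idea of spreading data across shards, with a replica of each shard on another machine, can be sketched as follows. This is a toy model under stated assumptions: the node names, the shard count, the crc32 hash, and the "replica on the next node" placement rule are all hypothetical and are not a description of Elasticsearch's actual routing or allocation logic.

```python
import zlib

NUM_PRIMARY_SHARDS = 3                   # hypothetical setting
NODES = ["node-a", "node-b", "node-c"]   # hypothetical machines


def shard_for(doc_id, num_shards=NUM_PRIMARY_SHARDS):
    # Hash-modulo routing; crc32 is only a stand-in hash for this sketch.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def layout(num_shards, nodes):
    """Put the primary of shard i on one node and its replica on the next."""
    placement = {}
    for shard in range(num_shards):
        primary = nodes[shard % len(nodes)]
        replica = nodes[(shard + 1) % len(nodes)]
        placement[shard] = {"primary": primary, "replica": replica}
    return placement


def copies_after_failure(placement, failed_node):
    """List the nodes that still hold each shard once one node is gone."""
    return {
        shard: [node for node in copies.values() if node != failed_node]
        for shard, copies in placement.items()
    }


placement = layout(NUM_PRIMARY_SHARDS, NODES)
print(shard_for("order-1001"))
print(copies_after_failure(placement, "node-a"))
```

In this layout every shard keeps at least one copy after any single node fails, which is the sense in which replicas protect against data loss, and a search can be served by either copy of a shard, which is the sense in which they help balance load.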