When it comes to search and data analysis, Elasticsearch is everywhere. Developers and the wider community use Elasticsearch for a wide variety of use cases, from application search and website search to logging, infrastructure monitoring, APM, and security analytics. While free solutions now exist for these use cases, developers first need to get their data into Elasticsearch.

This article describes several of the most common ways to ingest data into Elasticsearch Service, whether the cluster is hosted on Elastic Cloud or on its on-premise counterpart, Elastic Cloud Enterprise. Although we focus mainly on these services, ingesting data into a self-managed Elasticsearch cluster looks much the same; the only thing that changes is how you address the cluster.

Before we delve into the technical details, a word of advice: if you run into questions while reading this article, feel free to visit discuss.elastic.co. Our community is very active, and you can expect to find answers to your questions there.

Next, we delve into data ingestion using the following methods:

  • Elastic Beats
  • Logstash
  • Language clients
  • Kibana Dev Tools

Ingesting into Elasticsearch Service

Elasticsearch provides a flexible RESTful API for communication with client applications. REST calls are therefore used to ingest data, perform search and data analytics, and manage the cluster and its indices. In fact, all of the methods above rely on this API to ingest data into Elasticsearch.

In the remainder of this article, we assume that you have already created an Elasticsearch Service cluster. If you haven't, sign up for an Elastic Cloud free trial. After you create the cluster, you will be provided with the Cloud ID and the password for the elastic superuser account. The Cloud ID has the format cluster_name:ZXVy…Q2Zg==. It encodes the URLs of your cluster and, as we will see, simplifies data ingestion.
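
The part after the colon is plain Base64. If you are curious, decoding it (a quick sketch, assuming a Unix shell; ZXVy…Q2Zg== stands in for your own Cloud ID) reveals the endpoint information that clients derive the cluster URLs from:

# Decode the portion of the Cloud ID after the colon (placeholder value shown).
echo "ZXVy…Q2Zg==" | base64 --decode
# The output has the form: host$elasticsearch_uuid$kibana_uuid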

Elastic Beats

Elastic Beats is a set of lightweight data collectors that send data to Elasticsearch. Being lightweight, Beats does not incur much runtime overhead, so it can run and collect data on devices with limited hardware resources, such as IoT devices, edge devices, or embedded devices. If you need to collect data but don’t have the resources to run a resource-intensive data collector, then Beats is your best choice. This ubiquitous data collection approach, covering all connected devices, allows you to quickly detect and respond to unusual situations, such as system-wide issues and security incidents.

Of course, Beats isn’t limited to systems with limited resources; they can also be used on systems with more hardware resources available.

The Beats family: an overview

Beats come in a variety of flavors to collect different types of data:

  • Filebeat lets you read, preprocess, and ship data from sources that arrive as files. Although most users use Filebeat to read log files, it supports any non-binary file format. Filebeat also supports a number of other data sources, including TCP/UDP, containers, Redis, and syslog. A wealth of modules makes it easy to collect and parse the log formats of common applications such as Apache, MySQL, and Kafka.
  • Metricbeat collects and preprocesses system and service metrics. System metrics include information about running processes as well as CPU/memory/disk/network utilization numbers. Modules are available to collect data from many different services, including Kafka, Palo Alto Networks, Redis, and many more.
  • Packetbeat collects and preprocesses live network data, enabling application monitoring as well as security and network performance analytics. Among others, Packetbeat supports the following protocols: DHCP, DNS, HTTP, MongoDB, NFS, and TLS.
  • Winlogbeat captures event logs from Windows operating systems, including application, hardware, security, and system events. The wealth of information available in the Windows event log is very useful for many use cases.
  • Auditbeat detects changes to critical files and collects events from the Linux Audit Framework. Different modules ease its deployment; it is mostly used in security analytics use cases.
  • Heartbeat uses probing to monitor the availability of systems and services. Heartbeat is therefore useful in many scenarios, such as infrastructure monitoring and security analytics. ICMP, TCP, and HTTP are supported protocols.
  • Functionbeat collects logs and metrics from within serverless environments such as AWS Lambda.

Once you’ve decided which Beats to use in a particular scenario, getting started, as described in the next section, is pretty straightforward.

Getting started with Beats

In this section we'll learn how to get started with Beats, using Metricbeat as an example; the steps for the other Beats are similar. For your specific Beat and operating system, refer to the Beats documentation and follow the steps below.

  1. Download and install the desired Beat. There are many ways to install Beats, but most users choose either the Elastic-provided repositories for the operating system's package manager (DEB/RPM), or to simply download and unzip the provided tgz/zip packages.
  2. Configure the Beat and enable any desired modules.
  • For example, to collect metrics about Docker containers running on your system, enable the Docker module with sudo metricbeat modules enable docker (if you installed using the package manager) or ./metricbeat modules enable docker (if you installed from the tgz/zip package).
  • The Cloud ID is a convenient way to specify the Elasticsearch Service that the collected data is sent to. Add the Cloud ID and the authentication information to the Metricbeat configuration file (metricbeat.yml):

cloud.id: cluster_name:ZXVy…Q2Zg==
cloud.auth: "elastic:YOUR_PASSWORD"

  • As mentioned earlier, cloud.id was provided to you when you created the cluster. cloud.auth is a colon-separated concatenation of a username and password that have been granted sufficient privileges on the Elasticsearch cluster.
  • To get started quickly, use the elastic superuser and the password that was provided when the cluster was created. If you installed using the package manager, you can find the configuration file in the /etc/metricbeat directory; if you installed from the tgz/zip package, it is in the unzipped directory.
  3. Load the prebuilt dashboards into Kibana. Most Beats and their modules come with predefined Kibana dashboards. Load them into Kibana with sudo metricbeat setup if you installed using the package manager, or by running ./metricbeat setup in the unzipped directory if you installed from the tgz/zip package.
  4. Run the Beat. Use sudo systemctl start metricbeat if you installed using the package manager on a systemd-based Linux system, or ./metricbeat -e if you installed from the tgz/zip package. The full command sequence is sketched below.
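
Collected in one place, here is a sketch of the whole sequence for a package manager (DEB/RPM) install, using the Docker module and the Cloud ID configuration from above; adjust the values to your own setup:

# Enable the Docker module.
sudo metricbeat modules enable docker
# Load the index templates and prebuilt dashboards into Kibana.
sudo metricbeat setup
# Start shipping metrics to the Elasticsearch Service.
sudo systemctl start metricbeat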

If everything works, data starts flowing into your Elasticsearch Service.

Exploring the prebuilt dashboards

Head over to Kibana in your Elasticsearch Service to view the data:

  • In Kibana Discover, select the metricbeat-* index pattern to see the individual documents that have been collected.
  • The Kibana Infrastructure tab lets you inspect the system and Docker metrics in a more graphical way, showing the usage of system resources (CPU, memory, network) in a variety of charts.
  • In the Kibana Dashboards view, select any of the dashboards prefixed with [Metricbeat System] to view the data interactively.
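
If you prefer to confirm from the Dev Tools console instead, a simple search against the indices created by Metricbeat (a sketch; size and sort are only there to fetch the newest document) shows the incoming data:

GET metricbeat-*/_search
{
  "size": 1,
  "sort": [ { "@timestamp": "desc" } ]
}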

Logstash

Logstash is a powerful and flexible tool that can read, process, and ship data of any kind. Logstash provides a number of capabilities that are not currently available or are too costly to perform with Beats, such as enriching documents with lookups against external data sources. However, that power and flexibility comes at a price: the hardware requirements of Logstash are significantly higher than those of Beats, which means Logstash should generally not be deployed on low-resource devices. Logstash is therefore used as an alternative to Beats where the functionality of a specific use case exceeds what Beats can do.

A common architectural pattern is to combine Beats and Logstash: use Beats to collect data, and use Logstash to perform any data processing that Beats can't, as in the sketch below.
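
As a sketch of this pattern (the port number and credentials are illustrative; the beats input and elasticsearch output are standard Logstash plugins), Logstash listens for events shipped by Beats and forwards them to the Elasticsearch Service:

input {
  beats {
    port => 5044    # conventional port that Beats send to
  }
}
output {
  elasticsearch {
    cloud_id => "cluster_name:ZXVy…Q2Zg=="
    cloud_auth => "elastic:YOUR_PASSWORD"
  }
}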

Logstash overview

Logstash works by executing event processing pipelines, each of which consists of at least one of each of the following:

  • Inputs read from data sources. Many data sources are natively supported, including file, HTTP, IMAP, JDBC, Kafka, syslog, TCP, and UDP.
  • Filters process and enrich the data in various ways. In many cases, the unstructured log lines first need to be parsed into a more structured format; among others, Logstash provides filters to parse CSV, JSON, key/value pairs, delimited unstructured data, and complex unstructured data on the basis of regular expressions (the grok filter). Logstash also provides additional filters to enrich the data by performing DNS lookups, adding geo information about IP addresses, or performing lookups against a custom dictionary or an Elasticsearch index. Further filters allow for diverse transformations of the data, for example to rename, remove, or copy data fields and values (the mutate filter).
  • Outputs write the parsed and enriched data to data sinks and are the final stage of the Logstash processing pipeline. While many output plugins are available, here we focus on ingesting into Elasticsearch Service using the elasticsearch output.
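
To make that structure concrete, here is a minimal sketch of a pipeline with one element of each kind; the log file path and line format are hypothetical:

input {
  # Read a hypothetical application log file.
  file { path => "/var/log/myapp.log" }
}
filter {
  # Parse lines such as "2019-08-15T14:12:12 INFO Started" into structured fields.
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
}
output {
  elasticsearch {
    cloud_id => "cluster_name:ZXVy…Q2Zg=="
    cloud_auth => "elastic:YOUR_PASSWORD"
  }
}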

A sample Logstash pipeline

No two use cases are the same. As a result, you will likely have to develop a Logstash pipeline that fits your specific data inputs and requirements.

We provide a sample Logstash pipeline that:

  • Reads the Elastic blog RSS feed
  • Performs some light data preprocessing by copying/renaming fields and removing special characters and HTML tags
  • Ingests the documents into Elasticsearch

Here are the steps:

  1. Install Logstash from the package manager or by downloading and unzipping the tgz/zip file.
  2. Install the Logstash RSS input plugin, which enables reading RSS data sources: ./bin/logstash-plugin install logstash-input-rss
  3. Copy the following Logstash pipeline definition to a new file, such as ~/elastic-rss.conf:

input {
  rss {
    url => "/blog/feed"
    interval => 120
  }
}

filter {
  mutate {
    rename => [ "message", "blog_html" ]
    copy => { "blog_html" => "blog_text" }
    copy => { "published" => "@timestamp" }
  }
  mutate {
    gsub => [
      "blog_text", "<.*?>", "",
      "blog_text", "[\n\t]", " "
    ]
    remove_field => [ "published", "author" ]
  }
}

output {
  stdout {
    codec => dots
  }
  elasticsearch {
    hosts => [ "https://<your-elasticsearch-url>" ]
    index => "elastic_blog"
    user => "elastic"
    password => "<your-elastic-password>"
  }
}

  4. In the file above, change the hosts and password parameters to match your Elasticsearch Service endpoint and the password of your elastic user. In Elastic Cloud, you can get the Elasticsearch endpoint URL from the details of your deployment page (Copy Endpoint URL).

  5. Execute the pipeline by starting Logstash: ./bin/logstash -f ~/elastic-rss.conf

Starting up Logstash takes a few seconds. You should start seeing dots (…) printed to the console. Each dot represents one document that has been ingested into Elasticsearch.

  6. Open Kibana. In the Kibana Dev Tools console, execute the following search to confirm that 20 documents have been ingested:

POST elastic_blog/_search

For more details, see the excellent blog post A Practical Introduction to Logstash. Refer to the Logstash documentation for full details.

Language clients

In some situations it is preferable to integrate data ingestion with your custom application code. For this, we recommend using one of the officially supported Elasticsearch clients. These clients are libraries that abstract away the low-level details of data ingestion, allowing you to focus on the actual work that is specific to your application. Official clients exist for Java, JavaScript, Go, .NET, PHP, Perl, Python, and Ruby. For all the details and code examples for your language of choice, just refer to the documentation. If your application is written in a language not listed above, chances are there is a community-contributed client.
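
As a minimal sketch with the official Python client (the index name and document are illustrative; the Cloud ID and password placeholders match the ones used earlier):

# pip install elasticsearch
from elasticsearch import Elasticsearch

# Connect using the Cloud ID and credentials provided when the cluster was created.
es = Elasticsearch(
    cloud_id="cluster_name:ZXVy…Q2Zg==",
    http_auth=("elastic", "YOUR_PASSWORD"),
)

# Index a single document; the client issues the underlying REST call for you.
doc = {
    "title": "How to Ingest Into Elasticsearch Service",
    "date": "2019-08-15T14:12:12",
    "description": "An example document ingested from application code",
}
es.index(index="my_first_index", id=1, body=doc)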

Kibana Dev Tools

Our recommended tool for developing and debugging Elasticsearch requests is the Kibana Dev Tools console. Dev Tools exposes the full power and flexibility of the generic Elasticsearch REST API while abstracting away the technicalities of the underlying HTTP requests. Unsurprisingly, you can use the Dev Tools console to PUT a raw JSON document into Elasticsearch:

PUT my_first_index/_doc/1
{
  "title": "How to Ingest Into Elasticsearch Service",
  "date": "2019-08-15T14:12:12",
  "description": "This is an overview article about the various ways to ingest into Elasticsearch Service"
}
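
To verify the write, a follow-up request with the same index and document ID retrieves the document:

GET my_first_index/_doc/1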

Other REST clients

With Elasticsearch providing its generic REST interface, you are free to use your favorite REST client to communicate with Elasticsearch and ingest documents. While we recommend trying the tools mentioned above first, there are a number of reasons why you might consider other options. For example, curl is a tool that is frequently used as a last resort, be it for development, debugging, or integration with custom scripts.
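
For example, ingesting a document with curl could look like the following sketch; the endpoint URL and password are placeholders for your deployment's values:

curl -X PUT "https://<your-elasticsearch-url>/my_first_index/_doc/2" \
  -u "elastic:YOUR_PASSWORD" \
  -H "Content-Type: application/json" \
  -d '{ "title": "Ingesting with curl" }'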

Conclusion

There are numerous ways of ingesting data into Elasticsearch Service. No two scenarios are the same; the specific methods or tools you choose to ingest data depend on your specific use case, requirements, and environment.

Beats provide a convenient and lightweight out-of-the-box solution to collect and ingest data from many different sources. Packaged with the Beats are modules that provide the configuration for data acquisition, parsing, indexing, and visualization for many common databases, operating systems, container environments, web servers, caches, and more. These modules provide a five-minute data-to-dashboard experience. Because Beats are lightweight, they are ideal for resource-constrained embedded devices, such as IoT devices or firewalls.

Logstash, on the other hand, is a flexible tool for reading, transforming, and ingesting data, with a wealth of input, filter, and output plugins. If the functionality of Beats is not sufficient for certain use cases, a common architectural pattern is to use Beats to collect data and to further process it through Logstash before ingesting it into Elasticsearch.

When collecting data directly from your application, we recommend using the officially supported client libraries. The Kibana Dev Tools console is great for development and debugging. Finally, the generic Elasticsearch REST API gives you the flexibility to use your favorite REST client.

Ready to learn more? We recommend reading the following articles:

  • Should I use Logstash or Elasticsearch ingest nodes?
  • Ingest system logs and metrics into Elasticsearch with the Beats System module

To learn more about Elastic technologies, follow us and sign up for our webinars. The upcoming schedule is as follows:

Wednesday, February 19, 2020, 15:00-16:00: Building omni-observable instances using Elastic Stack

Wednesday, February 26, 2020, 15:00-16:00: Kibana Lens webinar

Wednesday, March 4, 2020, 15:00-16:00: Elastic Endpoint Security overview webinar

Wednesday, March 11, 2020, 15:00-16:00: Monitoring website resources with Elastic Stack