Migrating to Elastic Common Schema (ECS) in Beats environments
Author: Mathieu Martin

In February 2019, we introduced the Elastic Common Schema (ECS). In a nutshell, ECS defines a common set of fields and data types so that users can search, visualize, and analyze different data sources in a uniform way. This is a huge advantage in heterogeneous environments that mix vendor standards, where similar but different data sources are often used at the same time.

We also mentioned that implementing ECS is not a trivial task. To produce ECS-compliant events, many of the fields emitted by an event source must be copied or renamed during ingestion.

At the end of our ECS introduction, we said that if you have already configured Elasticsearch index templates and written a few transformation functions with Logstash or Elasticsearch ingest pipelines, you have a pretty good idea of what this involves. How much work it takes to migrate your environment to ECS depends on how you designed your Elastic Stack ingestion pipelines. At one end of the spectrum, Beats and Beats modules give you a well-organized migration to ECS. Beats 7 events are already in ECS format, so the only remaining work is to connect your existing analysis content to the new ECS data. At the other end of the spectrum are the custom pipelines that users have built themselves.

Migrating to ECS with Elastic Stack 7

Renaming many of the event fields that Beats produces is a breaking change for users. For this reason, we introduced the ECS field names in the latest major release of the Elastic Stack, version 7.

This post begins with an overview of how to migrate to ECS with Beats, in the context of an upgrade from Elastic Stack 6.8 to 7.2. We then walk through a sample migration of a Beats event source.

This post covers only part of the upgrade to version 7. Our Stack upgrade guide recommends upgrading Beats after Elasticsearch and Kibana. The sample migration therefore covers only the Beats upgrade and assumes that Elasticsearch and Kibana are already on version 7. This lets us focus on the details of moving Beats from its previous schema to ECS.

When you plan an Elastic Stack 7 upgrade, consult the Stack upgrade guide mentioned above and the Kibana Upgrade Assistant. Also read the release notes and breaking changes carefully for every part of the Stack you currently use.

Note: If you are adopting Beats and have no Beats 6 data, you don’t need to worry about migration. Simply start with Beats 7.0 or later, which produces ECS events from the start.

A conceptual overview of migration to ECS

Any migration to ECS will involve the following steps:

  1. Convert the data sources to ECS
  2. Resolve differences and conflicts between pre-ECS and ECS events
  3. Adapt analysis content, pipelines, and applications to consume ECS events
  4. Make pre-ECS events more compatible with ECS for a smoother transition
  5. Remove the field aliases once the migration to ECS is complete

In this post, we discuss these steps in detail in the context of migrating a Beats environment to ECS.

After the overview, we give a step-by-step example of upgrading a Filebeat module from 6.8 to 7.2. The sample migration is meant to be easy to reproduce on your workstation, so you can carry out each part of the migration and experiment along the way.

Overview of migrating a Beats environment to ECS

Each part of the migration process described above can be done in many ways. Let’s go through each part in the context of migrating your Beats events to ECS.

Convert the data source to ECS

Beats ships with a variety of event sources. Since Beats 7.0, all of them have been converted to ECS format for you. Beats processors that add metadata to events, such as add_host_metadata, have also been converted to ECS.

However, Beats sometimes acts as a simple transport for events. Examples include events collected by Winlogbeat and Journalbeat, and any Filebeat input that collects custom logs and events (that is, anything other than Filebeat modules). Every custom event source that you currently collect and parse yourself has to be mapped to ECS by you.

Resolve schema differences and conflicts

At its core, migrating to ECS means standardizing field names across many data sources, so many fields will be renamed.

Field renames and field aliases

During the transition between the two formats, there are several ways to handle pre-ECS and ECS events side by side. The main options are:

  • Use Elasticsearch field aliases, so that new indices also recognize the old field names
  • Duplicate the data in each event (populate both the old field and the ECS field)
  • Do nothing: old content works only on old data, and new content works only on new data

The simplest and cheapest option is Elasticsearch field aliases. This is the migration path chosen for the Beats upgrade process.

However, field aliases have some limitations and are not a foolproof solution. Let’s discuss the benefits and limitations of field aliases.

A field alias is simply an additional field entry in the Elasticsearch mapping of the new index. Aliases let the new index answer queries that use the old field names. Let’s look at a simplified example showing only one field.
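As a minimal sketch of such a one-field example, here is a mapping fragment written as a Python dict that mirrors the JSON. It uses the system.syslog.pid to process.pid alias that appears in the sample migration later in this post. The lookup and resolve_field helpers are hypothetical. They only imitate how Elasticsearch answers a query on the old name with the data of the ECS field.

```python
# Simplified fragment of a Beats 7 index mapping (a Python dict mirroring the JSON).
# The ECS field process.pid holds the data; the Beats 6 name is only an alias.
MAPPING = {
    "properties": {
        "process": {"properties": {"pid": {"type": "long"}}},
        "system": {
            "properties": {
                "syslog": {
                    "properties": {
                        "pid": {"type": "alias", "path": "process.pid"},
                    }
                }
            }
        },
    }
}


def lookup(mapping: dict, dotted: str):
    """Return the mapping entry for a dotted field name, or None if undefined."""
    node = mapping
    for part in dotted.split("."):
        node = node.get("properties", {}).get(part)
        if node is None:
            return None
    return node


def resolve_field(mapping: dict, dotted: str) -> str:
    """Follow an alias to its target; aliases cannot point at other aliases."""
    entry = lookup(mapping, dotted)
    if entry is not None and entry.get("type") == "alias":
        return entry["path"]
    return dotted


print(resolve_field(MAPPING, "system.syslog.pid"))  # process.pid
print(lookup(MAPPING, "process.pid"))  # {'type': 'long'}
```

The alias adds an entry to the mapping only. No data is copied, which is why aliases are cheaper than duplicating fields.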

More precisely, field aliases help with:

  • Aggregations and visualizations on alias fields
  • Filtering and searching on alias fields
  • Autocompletion of alias fields

Here are some caveats about field aliases:

  • Field aliases are purely a feature of the Elasticsearch mapping (the search index). They do not modify the document source or its field names, so each document contains either the old field names or the new ones. Aliases therefore don’t help wherever the fields of the document are accessed directly, for example:

    • Columns displayed in a saved search
    • Further processing in your ingestion pipeline
    • Any application that consumes Beats events (for example, through the Elasticsearch API)
  • A field alias is itself a field entry, so an alias cannot be created where a new ECS field already has the same name
  • Field aliases only work for leaf fields. They cannot alias complex fields, such as object fields that contain other nested fields
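The first caveat can be illustrated with a toy model in plain Python (not Elasticsearch itself). Here, matches stands in for a query, which resolves aliases through the index mapping. column stands in for a saved search column, which reads the document source as-is. The two sample documents and the system.syslog.program to process.name alias are assumptions modeled on the sample migration later in this post.

```python
# Toy model: one Beats 6 document and one Beats 7 (ECS) document.
DOC_68 = {"system": {"syslog": {"program": "sshd"}}}
DOC_72 = {"process": {"name": "sshd"}}

# Aliases defined in the Beats 7 index mapping: old name -> ECS name.
ALIASES_72 = {"system.syslog.program": "process.name"}


def get_path(doc: dict, dotted: str):
    """Read a dotted field straight from the document source."""
    node = doc
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def matches(doc: dict, aliases: dict, field: str, value) -> bool:
    """Query-time lookup: the mapping resolves the alias before searching."""
    return get_path(doc, aliases.get(field, field)) == value


def column(doc: dict, field: str):
    """Saved-search column: reads _source as-is, aliases play no part."""
    return get_path(doc, field)


# The old field name finds both documents at query time ...
print(matches(DOC_68, {}, "system.syslog.program", "sshd"))          # True
print(matches(DOC_72, ALIASES_72, "system.syslog.program", "sshd"))  # True
# ... but the 7.2 document has no such field in its source.
print(column(DOC_72, "system.syslog.program"))                       # None
```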

By default, these field aliases are not created in Beats 7 indices. To enable them, set migration.6_to_7.enabled: true in each Beat’s YAML configuration file before running the Beat’s setup step. This option and its aliases will remain available for the lifetime of Elastic Stack 7.x, but will be removed in 8.0.

Conflicts

Depending on the data sources you use, you may also run into field conflicts when migrating to ECS.

Some types of conflicts are only detected on fields that are actually used. Changes or conflicts in data sources you don’t use will therefore not affect you. It also means that, when planning your migration, you should sample events from each data source in both formats (Beats 6 and 7) in a test environment. This is how you find the conflicts you need to resolve.

There are two types of conflicts:

  • A field’s data type changes to a more appropriate type
  • A field name used before ECS is also defined in ECS, but with a different meaning. We call these incompatible fields

The exact consequences of each conflict vary. In general, you will not be able to query such a field across both pre-ECS and ECS data sources when:

  • the field changes its data type, or
  • an incompatible field also changes its nesting (for example, a keyword field becomes an object field).

Once data flows into both the Beats 6 and Beats 7 indices, refreshing the Kibana index pattern exposes these conflicts. If you see no conflict warning after the refresh, there are no conflicts to resolve. If you do get a warning, set the data type selector to show only conflicts.
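You could also check for conflicts yourself by comparing the mappings of a Beats 6 index and a Beats 7 index, for example using the output of the get mapping API. The sketch below is an assumption about how such a check might work, not how Kibana implements it. It flattens both mappings, reports an alias with the type of its target, and lists the fields whose types differ. The trimmed mappings reproduce the two conflicts examined in the sample migration (system.syslog.pid and source).

```python
def flatten(mapping: dict, prefix: str = "") -> dict:
    """Flatten a mapping's 'properties' tree into {dotted_name: field_spec}."""
    fields = {}
    for name, spec in mapping.get("properties", {}).items():
        full = prefix + name
        if "properties" in spec:
            # Object (or nested) field: record it, then descend into its children.
            fields[full] = {"type": spec.get("type", "object")}
            fields.update(flatten(spec, full + "."))
        else:
            fields[full] = spec
    return fields


def effective_types(mapping: dict) -> dict:
    """Return {dotted_name: type}, reporting an alias with its target's type."""
    fields = flatten(mapping)
    types = {}
    for name, spec in fields.items():
        if spec.get("type") == "alias":
            spec = fields.get(spec["path"], {})
        types[name] = spec.get("type", "object")
    return types


def find_conflicts(old_mapping: dict, new_mapping: dict) -> dict:
    """Fields present in both mappings whose effective types differ."""
    old, new = effective_types(old_mapping), effective_types(new_mapping)
    return {
        name: (old[name], new[name])
        for name in sorted(old.keys() & new.keys())
        if old[name] != new[name]
    }


# Trimmed-down mappings reproducing the two conflicts from the sample migration.
BEATS_68 = {"properties": {
    "source": {"type": "keyword"},
    "system": {"properties": {"syslog": {"properties": {
        "pid": {"type": "keyword"},
        "program": {"type": "keyword"},
    }}}},
}}
BEATS_72 = {"properties": {
    "source": {"properties": {"ip": {"type": "ip"}, "port": {"type": "long"}}},
    "process": {"properties": {"pid": {"type": "long"}, "name": {"type": "keyword"}}},
    "system": {"properties": {"syslog": {"properties": {
        "pid": {"type": "alias", "path": "process.pid"},
        "program": {"type": "alias", "path": "process.name"},
    }}}},
}}

print(find_conflicts(BEATS_68, BEATS_72))
# {'source': ('keyword', 'object'), 'system.syslog.pid': ('keyword', 'long')}
```

The aliased system.syslog.program field does not show up as a conflict because its target, process.name, keeps the keyword type.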

You resolve these conflicts by reindexing past data so that it is more compatible with the new schema. Conflicts caused by type changes are easy to resolve:

  1. Override the Beats 6 index template to use the more appropriate data type.
  2. Reindex the data into a new index, so that the updated mapping applies.
  3. Delete the old index.

For incompatible fields, you must decide whether to delete or rename them. If you rename a field, be sure to define the new name in the index template first.

Adjust your environment to use ECS events

Because many field names change, the sample analysis content that Beats provides (such as dashboards) has been modified to use the new ECS field names. This new content only works with ECS data produced by Beats 7.0 and later. For that reason, the Beats setup does not overwrite existing Beats 6 content. Instead, it creates a second copy of each Kibana visualization with the same name as the old one, followed by "ECS".

Thanks to field aliases, the Beats 6 sample content, and any custom content built on the old schema, will generally keep working with both Beats 6 and Beats 7 data. However, as mentioned earlier, field aliases are only a partial and temporary solution to help with the migration to ECS. Your migration should therefore also include updating or duplicating your custom dashboards so that they start using the new field names.

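Let’s illustrate this with a table:

  Content \ Data           Beats 6 data                        Beats 7 data
  Beats 6 content          Works                               Mostly works, through field aliases
  Beats 7 (ECS) content    Doesn’t work (unless backfilled)    Works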

Besides reviewing and modifying analysis content in Kibana, also check any custom parts of your ingestion pipeline, and any application that accesses Beats events through the Elasticsearch API.

Make pre-ECS events compatible with ECS

We have already discussed how reindexing can resolve data type conflicts and incompatible fields. Reindexing is not the only way to resolve these two kinds of changes, but it is simple to implement, so it is worth considering in most cases. For simple use cases, ignoring the conflicts is also an option. Be aware, though, that the field conflicts will affect you from the moment you start collecting Beats 7 data until the Beats 6 data ages out of the cluster.

Reindexing

If field aliases are not enough for your situation, you can also reindex past data to backfill ECS field names into your Beats 6 data. All new analysis content that relies on ECS fields (the new Beats 7 content and your updated custom content) can then query older data as well as Beats 7 data.

Modify events at ingest time

If you expect the rollout of Beats 7 agents to take a long time, you can go beyond reindexing past data and also modify incoming Beats 6 events at ingest time.

There are several ways to reindex and to manipulate documents, for example by copying, deleting, or renaming fields. The most straightforward is to use Elasticsearch ingest pipelines, which have several advantages:

  • They are easy to test with the _simulate API
  • You can use a pipeline when reindexing past data
  • You can use a pipeline to modify Beats 6 events that are still coming in

To modify events as they come in, you usually only need to set the pipeline option of the Elasticsearch output so that events go through your pipeline. This works the same way in Logstash and in Beats.

Note that Filebeat modules already use ingest pipelines to do their parsing. You can modify those pipelines too: overwrite the Filebeat 6 pipeline and add a call to your adjustment pipeline.

Remove the field aliases

Once the field aliases are no longer needed, consider removing them. As mentioned, field aliases are more lightweight than duplicating all the data. They still consume a key resource, though: memory in the cluster state. They also show up in Kibana’s autocompletion, which adds unnecessary clutter.

To remove the old field aliases:

  1. Remove migration.6_to_7.enabled (or set it to false) in the Beat’s configuration, such as filebeat.yml.
  2. Run the setup step again so that it overwrites the template.

Even after the template is overwritten without the aliases, the index mapping keeps them until the index rolls over. After the rollover, the aliases only disappear completely once the Beats 7 indices that still contain them have aged out of the cluster.
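One way to check whether a given Beats 7 index still carries the migration aliases is to scan the response of the get mapping API (GET <index>/_mapping) for fields of type alias. The helper below is a hypothetical sketch that works on an already-fetched response. The index names in the sample response are made up.

```python
def alias_fields(mapping_response: dict) -> dict:
    """Map each index in a GET <index>/_mapping response to its alias fields."""

    def walk(properties: dict, prefix: str = ""):
        for name, spec in properties.items():
            full = prefix + name
            if spec.get("type") == "alias":
                yield full
            # Recurse into object fields; leaf fields have no 'properties'.
            yield from walk(spec.get("properties", {}), full + ".")

    return {
        index: sorted(walk(body.get("mappings", {}).get("properties", {})))
        for index, body in mapping_response.items()
    }


# Hypothetical response: one index created before the template change, one after.
RESPONSE = {
    "filebeat-7.2.0-2019.07.01-000001": {"mappings": {"properties": {
        "process": {"properties": {"pid": {"type": "long"}}},
        "system": {"properties": {"syslog": {"properties": {
            "pid": {"type": "alias", "path": "process.pid"},
        }}}},
    }}},
    "filebeat-7.2.0-2019.07.02-000002": {"mappings": {"properties": {
        "process": {"properties": {"pid": {"type": "long"}}},
    }}},
}

for index, aliases in alias_fields(RESPONSE).items():
    print(index, aliases or "no aliases left")
```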

Build your own migration strategy

We have reviewed how Beats helps you migrate your Beats data to ECS, and discussed additional steps you can take to make the migration smoother.

Evaluate the effort required for each data source separately. For the least important data sources, you can scale the effort down accordingly.

Here are some criteria to consider when looking at each data source:

  • How long is your retention period, and is it externally mandated? Could you delete data early for this migration?
  • Do you need continuity in the data, or can you switch over directly? As mentioned above, the answer tells you whether you need to backfill
  • How long will your Beats 7 rollout take? Do you need to modify incoming Beats 6 events?

If you plan to backfill many fields, check dev-tools/ecs-migration.yml in the Beats repository. This file lists every field change in the migration from Beats 6 to 7.

Sample migration

In the rest of this post, we walk step by step through a migration to ECS by upgrading a Beat from 6.8 to 7.2. We cover:

  • how aliases help and where their limits are
  • how to resolve conflicts
  • how to reindex recent data to help the migration
  • how to modify Beats 6 events that are still coming in

We use the syslog fileset of the Filebeat system module as the example.

As mentioned, this example does not cover the full Elastic Stack upgrade process. We assume that Elasticsearch and Kibana have already been upgraded to version 7, so we can focus on upgrading your data schema to ECS.

To follow along, use the latest versions of Elasticsearch 7 and Kibana 7. You can try Elastic Cloud for free, or run both locally by following the Elasticsearch and Kibana installation instructions.

Run Beats 6.8

In this demo, we run Filebeat 6.8 and 7.2 on the same machine, so it is important to use the archive installation (.zip or .tar.gz). Each archive installation is self-contained in its own directory, which keeps the process simple.

Once Elasticsearch and Kibana 7 are running, install Filebeat 6.8. If you are on Windows, you can also experiment with Winlogbeat.

On most systems, syslog timestamps use local time and do not specify a time zone. We will therefore configure Filebeat to add the time zone to each event with the add_locale processor. We then configure the system module’s pipeline to interpret timestamps accordingly. This lets us verify that the ECS migration is correct later, when we look at recently received events.
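Here is a small illustration of why the time zone matters, written in plain Python rather than taken from Filebeat. A classic syslog timestamp such as "Jul  1 10:00:00" has neither a year nor a time zone. It can only be turned into an absolute time once an offset is known, such as the "+02:00"-style value that add_locale records. The function name and the explicitly supplied year are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def syslog_to_utc(stamp: str, year: int, offset: str) -> datetime:
    """Interpret a zone-less syslog timestamp using an explicit UTC offset.

    stamp  -- classic BSD syslog timestamp, e.g. "Jul  1 10:00:00"
    year   -- syslog omits the year, so the caller must supply it
    offset -- offset string such as "+02:00" or "-07:00"
    """
    # Collapse the double space syslog uses to pad single-digit days.
    normalized = " ".join(stamp.split())
    local = datetime.strptime(f"{year} {normalized}", "%Y %b %d %H:%M:%S")
    sign = 1 if offset[0] == "+" else -1
    hours, minutes = (int(part) for part in offset[1:].split(":"))
    tz = timezone(sign * timedelta(hours=hours, minutes=minutes))
    # Attach the offset, then normalize to UTC.
    return local.replace(tzinfo=tz).astimezone(timezone.utc)


print(syslog_to_utc("Jul  1 10:00:00", 2019, "+02:00"))  # 2019-07-01 08:00:00+00:00
```

Without the offset, the same line would be read as 10:00 UTC, two hours off, and the 6.8 and 7.2 copies of an event would no longer line up.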

In filebeat.yml, find the "Processors" section and add the add_locale processor. Below the "Processors" section, add a configuration for the system module that tells its syslog pipeline to interpret timestamps using that time zone.

If you run Elasticsearch and Kibana locally, this configuration is sufficient. If you use Elastic Cloud, also add your Cloud credentials to filebeat.yml (the cloud.id and cloud.auth settings). Elastic Cloud provides both when you create the cluster.

Now run the setup step for Filebeat 6.8, then start it so that it begins collecting system logs.

Verify that data is coming in by opening the dashboard called "[Filebeat System] Syslog dashboard". It should show recent syslog events from the system where Filebeat is installed.

This dashboard is useful here because it contains both visualizations and a saved search. That makes it a good place to show where field aliases help and where they fall short.

Run Beats 7 (ECS)

Not every environment can switch directly from one version of Beats to the next, so events may arrive from both Filebeat 6 and 7 for a while. We reproduce that situation in this example.

To do so, we simply run Filebeat 7.2 alongside 6.8 on the same system. Extract Filebeat 7.2 into a different directory and apply the same configuration changes we made for 6.8.

Don’t run the setup yet! For Beats 7, we also need to enable the migration setting that creates the field aliases. Uncomment the migration.6_to_7.enabled: true line near the end of filebeat.yml.

Our 7.2 configuration file should now contain:

  • this additional migration setting
  • the add_locale processor configuration
  • the system module configuration
  • our Cloud credentials, if needed

Conflicts

Before looking at the dashboards, go to Kibana index management and confirm that a new 7.2 index has been created alongside the 6.8 one and that data is coming in.

Next, open the index patterns page and refresh the filebeat-* index pattern. Once both 6.8 and 7.2 data are present, the refresh should report some conflicts. To highlight them, change the data type selector on the right from "All field types" to "conflict".

Let’s look at the two conflicts and discuss how to resolve them.

First, the syslog-specific conflict: system.syslog.pid. On the index management page, the 6.8 index mapping shows this field indexed as keyword. In the 7.2 index mapping, system.syslog.pid is an alias for process.pid. The alias itself is not what causes the conflict. Following the alias to process.pid, however, shows that its data type is now long. The change from keyword to long is what causes the data type conflict.

Second, the conflict caused by an incompatible field, which is common to all Filebeat migrations: the source field. In Filebeat 6, source is a keyword field that usually contains the path of the file (and sometimes the syslog source address). In ECS and the Filebeat 7 mapping, source becomes an object whose nested fields describe the source of network events (source.ip, source.port, and so on). Because Beats 7 still has a field named source, no alias can be created there.

We have now identified two fields to handle as part of the migration. We come back to them shortly.

Aliases

Now open the Beats 6 "[Filebeat System] Syslog dashboard". The filebeat-* index pattern has changed since this dashboard was first loaded, so reload the whole page with Command-R or F5.

In a new tab, open the new dashboard "[Filebeat System] Syslog dashboard ECS".

The saved search at the bottom of the 6.8 dashboard shows gaps in the data. Some events have values for system.syslog.program and system.syslog.message, while others do not. Expanding an event without values shows that it is one of the same syslog events, received by 7.2 under different field names. The same time period in the ECS dashboard tab shows the opposite: the ECS fields process.name and message are populated for the 7.2 events, but not for the 6.8 events.

This is a concrete case where field aliases don’t help, because saved searches rely on the document contents, not on the index mapping. As mentioned in the overview, if you need continuity, you can solve this by reindexing to backfill the fields and by modifying events as they come in. We will do that shortly.

Now let’s look at where field aliases do help. In the 6.8 dashboard, hover over the outer ring of the pie chart visualization to display the values of system.syslog.program.

Clicking a slice of the ring filters on the messages produced by that program. Let’s select the filter on the program name.

We have just added a filter on system.syslog.program, a field that no longer exists in 7.2 events. Yet the saved search still shows both sets of messages.

Examining the 7.2 events confirms that the filter was applied to them as well. The filter on system.syslog.program works on our 7.2 data thanks to the system.syslog.program alias.

Visualizations backed by Elasticsearch aggregations also correctly show both 6.8 and 7.2 results on the aliased system.syslog.program field.

Back on the 7.2 dashboard with no active filters, we see 6.8 and 7.2 data together. Applying the equivalent filter there, however, gives a different picture. The filter process.name:iTunes now returns only 7.2 events, because the 6.8 index has no field named process.name and no alias by that name.

Reindex for a smoother migration

We have discussed how reindexing helps with three different aspects of the migration:

  • resolving data type conflicts
  • resolving incompatible fields
  • backfilling ECS fields to maintain continuity

Now let’s look at a concrete example.

Here is how we will modify the Beats 6 data:

  • Data type conflict: change the data type of system.syslog.pid from keyword to long
  • Incompatible field: copy the contents of the Filebeat source field to log.file.path, then delete it. This removes the conflict with the ECS source field set. Beats 6.6 and later already populate log.file.path with the same value, but earlier Beats 6 versions did not, so we copy it where needed
  • Backfill the ECS field process.name with the value of system.syslog.program

Here’s how we will implement these changes:

  • We will modify the Filebeat 6.8 index template to use the new data type, and to add and remove field definitions
  • We will create a new ingest pipeline that modifies 6.8 events by deleting or copying fields
  • We will test the pipeline with the _simulate API
  • We will use the pipeline to reindex past data
  • We will also append a call to this new pipeline at the end of the Filebeat 6.8 ingest pipeline, to modify events as they come in

Index template changes

The data type improvements must be made in the index template, and they take effect when the index rolls over. By default, this happens the next day. If you use index lifecycle management (ILM) in 6.8, you can force a rollover with the rollover API.
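With ILM, Filebeat writes through a rollover alias, and the rollover API is POST /<alias>/_rollover. Below is a minimal sketch using only the Python standard library. Two values are assumptions: the alias name filebeat-6.8.1 and the address http://localhost:9200. Check the alias shown in Kibana index management for your own setup. The request is built by a separate function so it can be inspected before anything is sent.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local, unsecured cluster


def rollover_request(alias: str, es_url: str = ES_URL) -> urllib.request.Request:
    """Build (but do not send) a rollover call for the given write alias."""
    return urllib.request.Request(
        f"{es_url}/{alias}/_rollover",
        data=b"{}",
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


req = rollover_request("filebeat-6.8.1")  # assumption: the 6.8 rollover alias
print(req.get_method(), req.full_url)
# send(req)  # uncomment to actually roll the index over
```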

Display the current index template in Kibana Dev Tools, for example with GET _template/filebeat-6.8.1.

Index templates cannot be partially modified; they must be overwritten in full (see the documentation). Prepare a PUT API call containing the entire index template, with a few changes:

  • Delete the source field (its entire definition, starting from its opening line)
  • Add a field definition for process.name
  • Change the type of system.syslog.pid to long
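Editing a long template by hand is error-prone, so the same three changes can also be applied in code. The input is the template body returned by GET _template/filebeat-6.8.1 (the inner object under the template name). This sketch rests on some assumptions:

  • The template name matches your Filebeat version.
  • The body may or may not keep the 6.x mapping type wrapper (doc).
  • The result is sent back in the same shape with PUT _template/filebeat-6.8.1.

The ignore_above value mirrors what Beats templates commonly use for keyword fields.

```python
import copy


def mapping_root(template: dict) -> dict:
    """Return the part of the mappings that holds 'properties'.

    Handles both a typeless body and a 6.x-style body with a single
    mapping type wrapper such as {"doc": {"properties": {...}}}.
    """
    mappings = template["mappings"]
    if "properties" in mappings:
        return mappings
    (typed,) = mappings.values()  # fails loudly if there is more than one type
    return typed


def patch_template(template: dict) -> dict:
    """Return a copy of the template with the three changes applied."""
    patched = copy.deepcopy(template)
    props = mapping_root(patched)["properties"]

    # Remove the Beats 6 keyword field that clashes with the ECS 'source' object.
    props.pop("source", None)

    # Define the ECS field that will be backfilled (a keyword, as in ECS).
    process = props.setdefault("process", {}).setdefault("properties", {})
    process["name"] = {"type": "keyword", "ignore_above": 1024}

    # Align system.syslog.pid with the long type of ECS process.pid.
    syslog = props["system"]["properties"]["syslog"]["properties"]
    syslog["pid"] = {"type": "long"}
    return patched


# Heavily trimmed stand-in for the body of GET _template/filebeat-6.8.1.
TEMPLATE_68 = {
    "index_patterns": ["filebeat-6.8.1-*"],
    "mappings": {"doc": {"properties": {
        "source": {"type": "keyword", "ignore_above": 1024},
        "system": {"properties": {"syslog": {"properties": {
            "pid": {"type": "keyword", "ignore_above": 1024},
        }}}},
    }}},
}

new_body = patch_template(TEMPLATE_68)
print(new_body["mappings"]["doc"]["properties"])
```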

Once the API call body is ready, execute it to overwrite the index template. If you plan to backfill many ECS fields, take a look at the sample Elasticsearch template for ECS in the ECS Git repository.

Reindexing

The next step is to write a new ingest pipeline that modifies our Beats 6.8 events. In our example, the pipeline:

  • copies system.syslog.program to process.name
  • copies source to log.file.path, unless log.file.path is already populated
  • removes the source field
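A minimal sketch of such a pipeline follows. It is a Python dict that mirrors the JSON body of PUT _ingest/pipeline/<id>. The pipeline name filebeat-6-to-ecs is hypothetical. The set processors use a Mustache template for the value and a Painless if condition. The emulate function is only a local approximation of the three processors for quick experiments; it does not replace testing against Elasticsearch.

```python
import copy
import json

PIPELINE_ID = "filebeat-6-to-ecs"  # hypothetical pipeline name

# Body for PUT _ingest/pipeline/filebeat-6-to-ecs
PIPELINE = {
    "description": "Move Filebeat 6.8 syslog events closer to ECS",
    "processors": [
        # Backfill the ECS field from the Beats 6 field.
        {"set": {
            "field": "process.name",
            "value": "{{system.syslog.program}}",
            "if": "ctx.system?.syslog?.program != null",
        }},
        # Keep the path when only 'source' has it (Beats 6.5 and earlier).
        {"set": {
            "field": "log.file.path",
            "value": "{{source}}",
            "if": "ctx.source != null && ctx.log?.file?.path == null",
        }},
        # Drop the field that conflicts with the ECS 'source' object.
        {"remove": {"field": "source", "ignore_missing": True}},
    ],
}


def _get(doc, dotted):
    node = doc
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def _set(doc, dotted, value):
    *parents, leaf = dotted.split(".")
    node = doc
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value


def emulate(doc):
    """Local approximation of the three processors, for quick experiments only."""
    doc = copy.deepcopy(doc)
    program = _get(doc, "system.syslog.program")
    if program is not None:
        _set(doc, "process.name", program)
    if doc.get("source") is not None and _get(doc, "log.file.path") is None:
        _set(doc, "log.file.path", doc["source"])
    doc.pop("source", None)
    return doc


print(json.dumps(PIPELINE, indent=2))
print(emulate({"source": "/var/log/messages", "system": {"syslog": {"program": "cron"}}}))
```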

We could test this pipeline with the _simulate API using fully populated events, but a simpler test suits a blog post better. One event has log.file.path populated (Beats 6.6 and later), and the other does not (6.5 and earlier).
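The simpler test can be prepared as follows. This sketch assumes the pipeline was stored as filebeat-6-to-ecs (a hypothetical name). It only builds the request for POST _ingest/pipeline/<id>/_simulate, whose body is a list of docs that each wrap a _source. modified_sources shows where the transformed events sit in the response.

```python
import json

PIPELINE_ID = "filebeat-6-to-ecs"  # hypothetical: the name used when creating the pipeline

# Two trimmed-down Beats 6 events: the first already carries log.file.path
# (Beats 6.6 and later), the second only has 'source' (6.5 and earlier).
SAMPLE_EVENTS = [
    {"source": "/var/log/system.log",
     "log": {"file": {"path": "/var/log/system.log"}},
     "system": {"syslog": {"program": "sshd"}}},
    {"source": "/var/log/messages",
     "system": {"syslog": {"program": "cron"}}},
]


def simulate_request(pipeline_id, events):
    """Method, path and body for the ingest _simulate API."""
    path = f"/_ingest/pipeline/{pipeline_id}/_simulate"
    body = {"docs": [{"_source": event} for event in events]}
    return "POST", path, body


def modified_sources(response):
    """Extract the transformed documents from a _simulate response."""
    return [entry["doc"]["_source"] for entry in response["docs"]]


method, path, body = simulate_request(PIPELINE_ID, SAMPLE_EVENTS)
print(method, path)
print(json.dumps(body, indent=2))
```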

The API response contains the two modified events. The pipeline works as expected: the source field is gone, and both events have the path stored in log.file.path.

We can now reindex the indices that no longer receive writes (for example, yesterday’s and older ones), using this ingest pipeline for each Filebeat index we are migrating. Read the _reindex documentation to learn how to run reindexing in the background, throttle it, and so on. Here is a simple reindex example for the few events we have.
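A sketch of the reindex call for one inactive index follows. The source index name and the pipeline name are hypothetical. The destination keeps the filebeat-6.8.1- prefix, on the assumption that your modified template matches filebeat-6.8.1-*, so the corrected mapping applies to the new index. wait_for_completion=false runs the operation as a background task, and requests_per_second throttles it. Both are standard _reindex parameters.

```python
from urllib.parse import urlencode

PIPELINE_ID = "filebeat-6-to-ecs"  # hypothetical pipeline name


def reindex_request(source_index, pipeline_id, suffix="-ecs", background=True, rate=None):
    """Method, path and body for reindexing one inactive Beats 6 index through the pipeline.

    The destination name keeps the source prefix so that the updated index
    template (assumed to match 'filebeat-6.8.1-*') applies to the new index.
    """
    params = {}
    if background:
        params["wait_for_completion"] = "false"  # returns a task id instead of blocking
    if rate is not None:
        params["requests_per_second"] = rate  # throttles the operation
    query = f"?{urlencode(params)}" if params else ""
    body = {
        "source": {"index": source_index},
        "dest": {"index": source_index + suffix, "pipeline": pipeline_id},
    }
    return "POST", f"/_reindex{query}", body


method, path, body = reindex_request("filebeat-6.8.1-2019.07.01", PIPELINE_ID, rate=500)
print(method, path)
print(body)
```

Keeping the source index untouched until the new one has been checked lets you simply retry if something goes wrong.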

If you are following along and only have today’s index, feel free to try the API call and check the mapping of the migrated index. Don’t delete today’s index afterwards, though. Filebeat 6.8 is still sending data, so the index would simply be recreated.

Otherwise, once the inactive indices are reindexed, confirm that the new indices contain all the fixes we need, then delete the old indices.

Modify incoming events

Most Beats can be configured to send events to an ingest pipeline directly from their Elasticsearch output, and the same is true for Logstash’s Elasticsearch output. This demo uses a Filebeat module, which already relies on an ingest pipeline, so we have to modify the module’s pipeline instead.

The ingest pipeline to modify, installed by the Filebeat 6.8 setup, is called filebeat-6.8.1-system-syslog-pipeline. All we need to do is add a call to our own pipeline at the end of the Filebeat syslog pipeline.

First, retrieve the pipeline we are about to modify with GET _ingest/pipeline/filebeat-6.8.1-system-syslog-pipeline.

Next, prepare an API call that overwrites the pipeline, by pasting the complete pipeline below a PUT API call. Then add a pipeline processor at the end that calls our new pipeline.
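Appending the call can also be scripted against the response of GET _ingest/pipeline/filebeat-6.8.1-system-syslog-pipeline, which is keyed by the pipeline id. In this sketch, the name of our new pipeline (filebeat-6-to-ecs) is hypothetical. The module pipeline in the sample is a trimmed stand-in with its real processors elided. The function also refuses to append the same call twice.

```python
import copy

MODULE_PIPELINE = "filebeat-6.8.1-system-syslog-pipeline"
ECS_PIPELINE = "filebeat-6-to-ecs"  # hypothetical name of the pipeline created earlier


def chain_pipeline(get_response, module_pipeline, extra_pipeline):
    """Body for PUT _ingest/pipeline/<module_pipeline> with a call to extra_pipeline appended.

    get_response is the JSON returned by GET _ingest/pipeline/<module_pipeline>,
    which is keyed by pipeline id.
    """
    body = copy.deepcopy(get_response[module_pipeline])
    processors = body.setdefault("processors", [])
    call = {"pipeline": {"name": extra_pipeline}}
    if call not in processors:  # running the script twice must not add a second call
        processors.append(call)
    return body


# Trimmed stand-in for the real module pipeline (its actual processors are elided).
GET_RESPONSE = {
    MODULE_PIPELINE: {
        "description": "...",
        "processors": [{"grok": {"field": "message", "patterns": ["..."]}}],
    }
}

new_body = chain_pipeline(GET_RESPONSE, MODULE_PIPELINE, ECS_PIPELINE)
print(new_body["processors"][-1])  # {'pipeline': {'name': 'filebeat-6-to-ecs'}}
```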

Once this API call is made, every incoming event is modified to better match ECS before it is indexed.

Finally, we can use _update_by_query to modify the documents that were indexed into the live index before we changed the pipeline. Documents that still need updating are easy to find, because they still have a source field.
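A sketch of that request follows:

  • An exists query on source selects the documents that have not been converted yet.
  • The pipeline query parameter runs each one through our pipeline (named filebeat-6-to-ecs here, a hypothetical name).
  • conflicts=proceed counts version conflicts instead of aborting the whole operation.

The index name is also hypothetical.

```python
from urllib.parse import urlencode

PIPELINE_ID = "filebeat-6-to-ecs"  # hypothetical pipeline name


def update_by_query_request(index, pipeline_id):
    """Method, path and body that send still-unconverted documents through the pipeline.

    Documents that still have a 'source' field have not been converted yet.
    """
    params = urlencode({"pipeline": pipeline_id, "conflicts": "proceed"})
    body = {"query": {"exists": {"field": "source"}}}
    return "POST", f"/{index}/_update_by_query?{params}", body


method, path, body = update_by_query_request("filebeat-6.8.1-2019.07.02", PIPELINE_ID)
print(method, path)
print(body)
```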

Verify the conflicts

Once all conflicting indices have been deleted, only the reindexed indices remain. Refresh the index pattern to confirm that the conflicts are resolved. Back on the Filebeat 7 dashboard, the 6.8 data is now more useful because we backfilled the process.name field.

In our example, we have backfilled only one field. Of course, you can backfill as many fields as you want.

Post-migration cleanup

Your migration may also involve updating custom dashboards, and applications that consume Beats events through the API, to use the new ECS field names.

Once you have fully migrated to Beats 7 and no longer rely on the field aliases, you can remove them to save memory, as mentioned earlier. To do so, remove the migration.6_to_7.enabled setting from filebeat.yml. Then run the setup again so that it overwrites the Filebeat 7.2 template.

As with the earlier change to the Filebeat 6.8 template, the new template without aliases takes effect the next time the Filebeat 7.2 index rolls over.

Conclusion

In this post, we discussed the steps required to migrate data to ECS in a Beats environment, and the benefits and limitations of the upgrade process. You can address those limitations by reindexing past data, or even by modifying incoming Beats 6 data at ingest time.

After the overview, we walked step by step through upgrading Filebeat’s system module from 6.8 to 7.2. We compared Filebeat 6.8 and 7.2 events, then detailed the steps you can take to reindex past data and to modify data that is still coming in.

