CSV is a very common data storage format. In previous articles, we used several methods to import CSV files into Elasticsearch. You can refer to these articles:

  • Beats: Use Elastic Stack to analyze COVID-19 data and perform visual analysis
  • Logstash: Use Elastic Stack to analyze CSDN readings
  • Logstash: Import the Zipcode CSV file and Geo Search experience
  • Kibana: Use Data Visualizer to analyze CSV Data

Today we will use yet another approach and show how to import CSV-formatted data using the Logstash dissect filter.

 

Preparation

Elasticsearch and Kibana

First, install your own Elasticsearch and Kibana. If you haven’t done so yet, check out my previous post “Elastic: A Beginner’s Guide.”

Logstash

See my previous article “How to install Logstash in the Elastic Stack” to install your own Logstash.

The CSV file

To illustrate, I created a simple CSV file as follows:

test.csv

"device1","London","Engineering","Computer"
"device2","Toronto","Consulting","Mouse"
"device3","Winnipeg","Sales","Computer"
"device4","Barcelona","Engineering","Phone"
"device5","Toronto","Consulting","Computer"
"device6","London","Consulting","Computer"

We save this file in the Logstash installation root directory. Note that the fields in each line are separated by commas. Also, the file has no header row; that is, it is not formatted like this:

"Device_ID","Device_Location","Device_Owner","Device_Type"
"device1","London","Engineering","Computer"
"device2","Toronto","Consulting","Mouse"
"device3","Winnipeg","Sales","Computer"
"device4","Barcelona","Engineering","Phone"
"device5","Toronto","Consulting","Computer"
"device6","London","Consulting","Computer"
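If your CSV file did contain a header line like the one above, one simple way to deal with it in this dissect-based setup is to drop that line before parsing. A minimal sketch (the regular expression is my own assumption, matching the quoted header shown above):

filter {
  # Drop the header row so it is not indexed as a document
  if [message] =~ /^"Device_ID"/ {
    drop {}
  }
}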

Configure Logstash

We can now have Logstash process the CSV file above. We create the following configuration file:

logstash_dissect_csv.conf

input {
  stdin {}
}

filter {
  mutate {
    gsub => [ "message", "\"", "" ]
  }

  dissect {
    mapping => {
      "message" => "%{Device_ID},%{Device_Location},%{Device_Owner},%{Device_Type}"
    }
  }

  mutate {
    remove_field => ["message"]
  }
}

output {
  stdout {
    codec => "rubydebug"
  }

  elasticsearch {
    index => "devices"
  }
}

As shown above, the pipeline takes its input from stdin and applies three filters (an alternative sketch using the csv filter follows this list):

  • Mutate-gsub: Removes the double quotes from the input message
  • Dissect: Extracts the corresponding fields, which are separated by commas
  • Mutate-remove_field: Removes the now-redundant message field
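
For comparison, the same parsing could be done with Logstash’s csv filter, which handles the quoting natively, so the mutate-gsub step would not be needed. This is just a minimal sketch of the filter section, not part of the pipeline above:

filter {
  # The csv filter parses quoted, comma-separated values directly
  csv {
    separator => ","
    columns => ["Device_ID", "Device_Location", "Device_Owner", "Device_Type"]
  }
}

The dissect approach shown in this article achieves the same effect with a simple delimiter-based pattern instead.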

Run Logstash

We can run Logstash in the following way:

cat test.csv | sudo ./bin/logstash -f ./logstash_dissect_csv.conf   
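Before piping in any data, you can optionally let Logstash check the configuration file for syntax errors and exit (a standard Logstash command-line flag):

sudo ./bin/logstash -f ./logstash_dissect_csv.conf --config.test_and_exit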

We can see output similar to the following in the Logstash console. One representative event is shown here (reconstructed from the indexed documents below; your host and timestamps will differ):
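{
          "Device_ID" => "device1",
    "Device_Location" => "London",
       "Device_Owner" => "Engineering",
        "Device_Type" => "Computer",
               "host" => "liuxg",
           "@version" => "1",
         "@timestamp" => 2020-07-20T06:26:58.645Z
}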

This shows that our Logstash pipeline is working.

Next, we check in Kibana whether we now have an index called devices:

GET devices/_search

The command above displays the result:

{ "took" : 0, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : {" total ": {" value" : 6, "base" : "eq"}, "max_score" : 1.0, "hits" : [{" _index ":" devices ", "_type" : "_doc", "_id" : "qE3nanMB6PiWomqxb5U3", "_score" : 1.0, "_source" : {"Device_Owner" : "Engineering", "@timestamp" : "2020-07-20T06:26:58.645Z", "Device_Location" : "London", "host" : "liuxg", "Device_Type" : "Computer", "@version" : "1", "Device_ID" : "device1" } }, { "_index" : "devices", "_type" : "_doc", "_id" : "rU3nanMB6PiWomqxb5X4", "_score" : 1.0, "_source" : {"Device_Owner" : "Sales", "@timestamp" : "2020-07-20T06:26:58.656z ", "Device_Location" : "Winnipeg", "host" : "liuxg", "Device_Type" : "Computer", "@version" : "1", "Device_ID" : "device3" } }, { "_index" : "Devices", "_type" : "_doc", "_id" : "rE3nanMB6PiWomqxb5U5", "_score" : 1.0, "_source" : {" Device_Owner ": "Consulting", "@timestamp" : "2020-07-20T06:26:58.656z ", "Device_Location" : "Toronto", "host" : "liuxg", "Device_Type" : "Mouse", "@version" : "1", "Device_ID" : "device2" } }, { "_index" : "devices", "_type" : "_doc", "_id" : "qk3nanMB6PiWomqxb5U4", "_score" : 1.0, "_source" : {"Device_Owner" : "Engineering", "@timestamp" : "2020-07-20T06:26:58.657Z", "Device_Location" : "Barcelona", "host" : "liuxg", "Device_Type" : "Phone", "@version" : "1", "Device_ID" : "device4" } }, { "_index" : "devices", "_type" : "_doc", "_id" : "qU3nanMB6PiWomqxb5U4", "_score" : 1.0, "_source" : {"Device_Owner" : "Consulting", "@timestamp" : "2020-07-20T06:26:58.657z ", "Device_Location" : "London", "host" : "liuxg", "Device_Type" : "Computer", "@version" : "1", "Device_ID" : "device6" } }, { "_index" : "Devices", "_type" : "_doc", "_id" : "q03nanMB6PiWomqxb5U4", "_score" : 1.0, "_source" : {" Device_Owner ": "Consulting", "@timestamp" : "2020-07-20T06:26:58.657z ", "Device_Location" : "Toronto", "host" : "liuxg", "Device_Type" : "Computer", "@version" : "1", "Device_ID" : "device5" } } ] } }Copy the code

 

Conclusion

In this article, we imported CSV data using a completely different approach. The Elastic Stack is, true to its name, elastic: we can use different methods to achieve the same result. It’s true that all roads lead to Beijing.