In today’s post, I’m going to show you how to use the Elastic Stack to analyze the browsing history on your computer. Browsers store their history in SQLite databases: all of the browsers below save their settings and history data this way.

This database file can be read with a programming language and written out in a format the Elastic Stack can ingest, letting us import the data into Elasticsearch and analyze it with Kibana.

The entire project source is at github.com/liu-xiao-gu…

 

Preparation

To download the code from GitHub, use the following command:

git clone https://github.com/liu-xiao-guo/elastic-stack-browser-history

Then we go to the root directory of the project:

$ pwd
/Users/liuxg/python/elastic-stack-browser-history
liuxg:elastic-stack-browser-history liuxg$ tree -L 2
.
├── LICENSE
├── README.md
├── docker-compose-ingest.yml
├── docker-compose.yml
├── elastic-stack
│   └── config
├── images
│   ├── BrowserHistory.png
│   ├── BrowserHistoryDashboard.png
│   ├── brave-logo.png
│   ├── chrome-logo.png
│   ├── edge-logo.png
│   ├── edge_logo.png
│   ├── firefox-logo.png
│   └── safari-logo.png
└── scripts
    ├── __pycache__
    ├── browser_history.py
    └── local.py

We can see from the above that there is a subdirectory called scripts that contains a Python file called browser_history.py:

browser_history.py

import json
import os
import sqlite3
import time
from local import LOG_DIR
from pathlib import Path

# Update the list based on the browsers you use
browsers = ['brave', 'chrome', 'firefox', 'safari', 'edge']

USER_PATH = str(Path.home())

def get_path(browser, browser_params):
    if browser == 'brave':
        BRAVE_PATH = "/Library/Application Support/BraveSoftware/Brave-Browser/Default/History"
        hist_path = USER_PATH + BRAVE_PATH
        hist_sql = "SELECT datetime(last_visit_time/1000000-11644473600,'unixepoch','localtime') AS visit_date, url, title AS url_title FROM urls ORDER BY visit_date DESC"
        browser_params['browser_name'] = browser
        browser_params['hist_path'] = hist_path
        browser_params['hist_sql'] = hist_sql
        return browser_params
    elif browser == 'chrome':
        CHROME_PATH = "/Library/Application Support/Google/Chrome/Default/History"
        hist_path = USER_PATH + CHROME_PATH
        hist_sql = "SELECT datetime(last_visit_time/1000000-11644473600,'unixepoch','localtime') AS visit_date, url, title AS url_title FROM urls ORDER BY visit_date DESC"
        browser_params['browser_name'] = browser
        browser_params['hist_path'] = hist_path
        browser_params['hist_sql'] = hist_sql
        return browser_params
    elif browser == 'edge':
        EDGE_PATH = "/Library/Application Support/Microsoft Edge/Default/History"
        hist_path = USER_PATH + EDGE_PATH
        hist_sql = "SELECT datetime(last_visit_time/1000000-11644473600,'unixepoch','localtime') AS visit_date, url, title AS url_title FROM urls ORDER BY visit_date DESC"
        browser_params['browser_name'] = browser
        browser_params['hist_path'] = hist_path
        browser_params['hist_sql'] = hist_sql
        return browser_params
    elif browser == 'firefox':
        FIREFOX_PATH = "/Library/Application Support/Firefox/Profiles/qs6kpgry.default-release/places.sqlite"
        hist_path = USER_PATH + FIREFOX_PATH
        hist_sql = "SELECT datetime(moz_historyvisits.visit_date/1000000,'unixepoch','localtime') AS visit_date, moz_places.url AS url, moz_places.title AS url_title FROM moz_places, moz_historyvisits WHERE moz_places.id = moz_historyvisits.place_id ORDER BY visit_date DESC"
        browser_params['browser_name'] = browser
        browser_params['hist_path'] = hist_path
        browser_params['hist_sql'] = hist_sql
        return browser_params
    elif browser == 'safari':
        SAFARI_PATH = "/Library/Safari/History.db"
        hist_path = USER_PATH + SAFARI_PATH
        hist_sql = "SELECT datetime(visit_time + 978307200, 'unixepoch', 'localtime') AS visit_date, url, title AS url_title FROM history_visits INNER JOIN history_items ON history_items.id = history_visits.history_item ORDER BY visit_date DESC"
        browser_params['browser_name'] = browser
        browser_params['hist_path'] = hist_path
        browser_params['hist_sql'] = hist_sql
        return browser_params

def dict_factory(cursor, row):
    d = {}
    for idx, col in enumerate(cursor.description):
        d[col[0]] = row[idx]
    return d

def get_browser_hist(browser_params, timestr):
    bn = browser_params['browser_name']
    # Connect to the SQLite database
    print("browser: " + bn)
    print("hist_path: " + browser_params['hist_path'])
    try:
        connection = sqlite3.connect(browser_params['hist_path'])
    except:
        print("Connection to SQLite: " + bn + " failed!")
        print("It may not be installed or not closed")
        return
    connection.row_factory = dict_factory
    cursor = connection.cursor()
    try:
        cursor.execute(browser_params['hist_sql'])
    except:
        browser_open = ("\t" + bn + "!! " + "is open. Please close your browser and retry.")
        print(browser_open.upper())
    tables = cursor.fetchall()
    results = len(tables)
    print("{}: Results {}".format(bn, results))
    print()
    if results > 0:
        dict_bn = {"browser": bn}
        jsonFile = open(LOG_DIR + "/" + bn + "-" + timestr + ".json", "w")
        for v in tables:
            v.update(dict_bn)
            jsonFile.write(json.dumps(v) + '\n')

def main():
    timestr = time.strftime("%Y%m%d-%H%M%S")
    for browser in browsers:
        print("Starting {}".format(browser))
        browser_params = {}
        path = get_path(browser, browser_params)
        if path:
            get_browser_hist(browser_params, timestr)

main()
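The SQL statements above do the epoch conversion inside SQLite: Chromium-based browsers (Brave, Chrome, Edge) store `last_visit_time` as microseconds since 1601-01-01 (the WebKit epoch, hence the `-11644473600` offset), while Safari stores `visit_time` as seconds since 2001-01-01 (the Core Data epoch, hence `+978307200`). A minimal Python sketch of the same arithmetic; the raw sample values below are hypothetical:

```python
from datetime import datetime, timezone

WEBKIT_EPOCH_OFFSET = 11644473600    # seconds between 1601-01-01 and 1970-01-01
COREDATA_EPOCH_OFFSET = 978307200    # seconds between 1970-01-01 and 2001-01-01

def webkit_to_unix(last_visit_time: int) -> int:
    """Chromium history: microseconds since 1601 -> Unix seconds."""
    return last_visit_time // 1_000_000 - WEBKIT_EPOCH_OFFSET

def coredata_to_unix(visit_time: float) -> int:
    """Safari history: seconds since 2001 -> Unix seconds."""
    return int(visit_time) + COREDATA_EPOCH_OFFSET

# Hypothetical raw value for illustration
unix_ts = webkit_to_unix(13242153877000000)
print(datetime.fromtimestamp(unix_ts, tz=timezone.utc).isoformat())
```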

The history paths above are for my macOS machine; you'll need to adjust them for your actual operating system. When I ran the import from Terminal, I found that Terminal did not have sufficient permissions (on macOS, reading these history databases requires granting Terminal Full Disk Access), so I had to do the following configuration first:

We can then go into the scripts directory and run:

$ python3 browser_history.py 

On my computer:

$ python3 browser_history.py 
Starting brave
browser: brave
hist_path: /Users/liuxg/Library/Application Support/BraveSoftware/Brave-Browser/Default/History
Connection to SQLite: brave failed!
It may not be installed or not closed
Starting chrome
browser: chrome
hist_path: /Users/liuxg/Library/Application Support/Google/Chrome/Default/History
chrome: Results 28647

Starting firefox
browser: firefox
hist_path: /Users/liuxg/Library/Application Support/Firefox/Profiles/qs6kpgry.default-release/places.sqlite
firefox: Results 9

Starting safari
browser: safari
hist_path: /Users/liuxg/Library/Safari/History.db
safari: Results 325

Starting edge
browser: edge
hist_path: /Users/liuxg/Library/Application Support/Microsoft Edge/Default/History
Connection to SQLite: edge failed!
It may not be installed or not closed

The output shows that I have three browsers installed: Chrome, Safari, and Firefox. For each of them, the Python script exports the SQLite data into a .json file under the data/logs directory at the root of the project:

$ pwd
/Users/liuxg/python/elastic-stack-browser-history
liuxg:elastic-stack-browser-history liuxg$ ls 
LICENSE                   docker-compose-ingest.yml images
README.md                 docker-compose.yml        scripts
data                      elastic-stack
liuxg:elastic-stack-browser-history liuxg$ ls data/logs
chrome-20200819-160808.json  safari-20200819-160808.json
firefox-20200819-160808.json

As shown above, three JSON files were generated, one for each of the three browsers on my computer. Each file contains that browser's history:

{"visit_date": "2020-08-19 16:04:37", "url": "https://github.com/liu-xiao-guo/elastic-stack-browser-history", "url_title": "liu-xiao-guo/elastic-stack-browser-history: Import your browser history into Elastic Stack", "browser": "chrome"}
{"visit_date": "2020-08-19 16:04:36", "url": "https://github.com/liu-xiao-guo/elastic-stack-browser-history/tree/main/scripts", "url_title": "elastic-stack-browser-history/scripts at main \u00b7 liu-xiao-guo/elastic-stack-browser-history", "browser": "chrome"}
{"visit_date": "2020-08-19 16:03:59", "url": "https://github.com/liu-xiao-guo/elastic-stack-browser-history/blob/main/scripts/browser_history.py", "url_title": "elastic-stack-browser-history/browser_history.py at main \u00b7 liu-xiao-guo/elastic-stack-browser-history", "browser": "chrome"}
{"visit_date": "2020-08-19 16:03:46", "url": "https://github.com/?q=brow", "url_title": "GitHub", "browser": "chrome"}

A single record, pretty-printed, looks like this:

{
  "visit_date": "2020-08-19 16:04:37",
  "url": "https://github.com/liu-xiao-guo/elastic-stack-browser-history",
  "url_title": "liu-xiao-guo/elastic-stack-browser-history: Import your browser history into Elastic Stack",
  "browser": "chrome"
}
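Each output file is newline-delimited JSON (NDJSON): one document per line, exactly what the script's `jsonFile.write(json.dumps(v) + '\n')` produces. A small sketch of reading such a file back, assuming the file path shown in the comment:

```python
import json

def read_ndjson_lines(lines):
    """Parse NDJSON content: one JSON document per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]

# With a real export file, e.g.:
# with open("data/logs/chrome-20200819-160808.json", encoding="utf-8") as f:
#     records = read_ndjson_lines(f)
```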

Obviously, the data above is still fairly unstructured. We need Logstash (or another tool) to structure it, for example by extracting the domain name from each record.
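As an illustration of that structuring step, the domain can be extracted from each url with Python's standard library. In this project the real parsing is done by Logstash; this snippet is only a sketch:

```python
from urllib.parse import urlparse

# One record from the exported history, as in the sample above
record = {
    "visit_date": "2020-08-19 16:04:37",
    "url": "https://github.com/liu-xiao-guo/elastic-stack-browser-history",
}

# Break the URL into protocol, domain, and path fields
parsed = urlparse(record["url"])
record["url_proto"] = parsed.scheme   # "https"
record["domain"] = parsed.netloc      # "github.com"
record["url_path"] = parsed.path
```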

 

Start the Elastic Stack

In today’s exercise we’ll use Docker to install the Elastic Stack:

  • Elasticsearch
  • Kibana
  • Logstash

We can start all three dockers with the docker-comemage. yml file. In the root directory of the project, we’ll find a file called docker-comedy-ingest.yml:

docker-compose-ingest.yml

version: '2'

services:

  #Elasticsearch container
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:${ELASTIC_VERSION}
    environment:
      # - cluster.name=docker-pocket
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - "discovery.type=single-node"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    mem_limit: 1g
    volumes:
      - esdata1:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
    networks:
      - esnet

  #Logstash container
  logstash:
    image: docker.elastic.co/logstash/logstash:${ELASTIC_VERSION}
    volumes:
      - ./elastic-stack/config/logstash/logstash.yml:/usr/share/logstash/config/logstash.yml
      - ./elastic-stack/config/logstash/pipeline:/usr/share/logstash/pipeline
      - ./data/logs:/usr/share/data
    ports:
      - 5000:5000
      - 9600:9600
    environment:
      LS_JAVA_OPTS: "-Xmx256m -Xms256m"
    networks:
      - esnet

  #Kibana container      
  kibana:
    image: docker.elastic.co/kibana/kibana:${ELASTIC_VERSION}
    ports:
      - 5601:5601      
    networks:
      - esnet

volumes:
  esdata1:
    driver: local

networks:
  esnet:

The file above is very simple. It references a variable, ELASTIC_VERSION, which is defined in the .env file:

ELASTIC_VERSION=7.8.0

That is, version 7.8.0 will be installed.

Since Logstash will process our data and import it into Elasticsearch, let's look at its pipeline configuration, which is located at:

./elastic-stack/config/logstash/pipeline

It defines:

logstash.conf

# Logstash Input Plugin
# File: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-file.html
# JSON Codec: https://www.elastic.co/guide/en/logstash/current/plugins-codecs-json.html
input {
  file {
    path => [ "/usr/share/data/*.json" ]
    # sincedb_path => "/usr/share/data/sincedb"
    start_position => "beginning"
    codec => "json"
  }
}

# Logstash Filter Plugins
# Grok: https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
# Date: https://www.elastic.co/guide/en/logstash/current/plugins-filters-date.html
# Mutate: https://www.elastic.co/guide/en/logstash/current/plugins-filters-mutate.html
# Urldecode: https://www.elastic.co/guide/en/logstash/current/plugins-filters-urldecode.html
filter {
  date {
    match => ["visit_date", "yyyy-MM-dd HH:mm:ss"]
    timezone => "America/Los_Angeles"
    target => ["@timestamp"]
    locale => "en"
  }
  if [url] {
    grok {
      break_on_match => "false"
      match => { "[url]" => "%{URIPROTO:url_proto}://%{URIHOST:domain}%{URIPATH:url_path}%{URIPARAM:url_params}?" }
      keep_empty_captures => "false"
    }
    if "." not in [domain] {
      if ":" in [domain] {
        mutate {
          split => ["domain", ":"]
          add_field => ["site_domain", "%{[domain][0]}"]
        }
        mutate {
          replace => { "domain" => "%{[domain][0]}" }
        }
      } else {
        mutate {
          copy => ["domain", "site_domain"]
        }
      }
    } else {
      mutate {
        copy => { "domain" => "domain_copy" }
      }
      mutate {
        split => ["domain_copy", "."]
        add_field => ["sub_domain", "%{[domain_copy][0]}"]
        add_field => ["site_domain", "%{[domain_copy][-2]}.%{[domain_copy][-1]}"]
      }
      mutate {
        # Join together what's left as the site domain.
        remove_field => ["domain_copy"]
      }
    }
    urldecode {
      field => "url_path"
    }
  }
}

# Logstash Output Plugins
# Elasticsearch: https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html
# StdOut: https://www.elastic.co/guide/en/logstash/current/plugins-outputs-stdout.html
output {
  elasticsearch {
    hosts => [ 'elasticsearch:9200' ]
    manage_template => "false"
    index => "logstash-browser"
  }
  stdout { }
}

Through this Logstash pipeline, each document in the JSON files above is read, structured, and imported into Elasticsearch. From the output section we can see that the destination index is named logstash-browser.
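For illustration, the pipeline's sub_domain/site_domain logic can be mimicked in plain Python. This is only a sketch of the filter's behavior, not part of the project; the real work is done by the mutate filters in the pipeline:

```python
def split_domain(domain: str):
    """Mimic the pipeline's domain handling:
    'docs.elastic.co' -> sub_domain 'docs', site_domain 'elastic.co';
    a dotless host like 'localhost:5601' -> site_domain 'localhost'."""
    if "." not in domain:
        # No dot: strip an optional port and use the host as site_domain
        host = domain.split(":")[0]
        return None, host
    parts = domain.split(".")
    # First label is the sub_domain, last two labels form the site_domain
    return parts[0], ".".join(parts[-2:])
```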

We start the Elastic Stack as follows. In the project root directory:

docker-compose -f docker-compose-ingest.yml up

This launches the Elastic Stack. If you haven't downloaded the 7.8.0 images yet, be patient while they are pulled. Elasticsearch, Kibana, and Logstash will then start automatically:

If all is well, you can view Kibana in your browser:

Hooray! Our Elastic Stack is up and running.

 

Analyzing historical data

We first open Kibana and look at the imported index:

GET logstash-browser/_count

The count above shows how many documents were imported. Next we need to create an index pattern for this index; I won't repeat the steps here, as I've covered them in many previous tutorials.

We can open Discover to analyze the data:

 

Click the Add button above to add the domain field. We can add url the same way:

The url values above are not clickable. We can change the format of the url field as follows:

Select the Url format and save. Let's go back to the Discover screen:

This time, we can see that the links in the url column are clickable.

We can do the search we want in Discover.

 

Visualization

We can use the visualization tools provided by Kibana to analyze our data visually.

The chart above shows which browsers I used; Chrome is clearly my main one:

A few domains I use most often.

Some of the websites I visit most often.

In the project, there is a file like this:

$ pwd
/Users/liuxg/python/elastic-stack-browser-history
liuxg:elastic-stack-browser-history liuxg$ ls elastic-stack/config/kibana/BrowserHistory-Kibana.ndjson 
elastic-stack/config/kibana/BrowserHistory-Kibana.ndjson

This is a ready-made dashboard file. We can import it directly into Kibana:

Select the BrowserHistory-Kibana.ndjson file in our project and click Import:

We see the picture above. Let’s go to Dashboard:

 

We will see the statistics of all the visits.