An open source real-time Web log analysis tool with interactive view interface!

preface

In Linux operating system, analyzing log files is a very headache, it records a lot of logs, for most novice and system administrators do not know how to start analysis, unless you have enough experience in analyzing log, that is, the Linux system expert.

There are many log analysis tools available on Linux. GoAccess is one of the tools that allows users to easily analyze Web server logs. In this article, we will cover GoAcccess in detail.

What is GoAccess?

GoAccess is an open source real-time Web log analyzer and interactive viewer, can be run in * NIx system terminal or accessed through a browser, it requires less dependence, written in C language, only ncurses, Support for Apache, Nginx, and Lighttpd logging provides efficient and valuable HTTP statistics for system administrators who need dynamic visual server reporting.

Why GoAccess?

GoAccess can parse the specified Web log file and output the data to the terminal and browser. The terminal based fast log analyzer mainly analyzes and views the statistics on the Web server in real time, without using the browser, and outputs the data in the terminal by default. Ability to combine complete real-time HTML reports as well as JSON and CSV reports.

GoAccess supports, but is not limited to, any custom log format, the combined log format in Apache/Nginx: XLF/ELF, and the common log format in Apache: CLF.

The function of GoAccess

  • Full real-time: the frequency with which all panels and metrics are scheduled to be updated every 200 ms on terminal output and updated every second on HTML output;
  • Support for almost any Web logging format: GoAccess allows any custom logging format string. Predefined options include Apache, Nginx, Amazon S3, Elastic Load Balancing, CloudFront, and more
  • Support for tracking application response time: Tracking the time it takes to process a request is useful when your site is slow;
  • Supports incremental log processing: Logs can be incrementally processed by the DISK B + Tree database.
  • Minimal configuration is required: you can run it only against access log files, select a log format and have GoAccess parse access logs and show you statistics;
  • Visitors: Determine the number of clicks, visitors, bandwidth, metrics, etc. of the slowest request by hour or date;
  • Metrics per virtual host: a panel showing which virtual host is consuming most Web server resources;
  • Customizable color matching: adjust according to your own color, via terminal or simply apply a style sheet to the HTML output;
  • Just one dependency: written in C and run with ncurses as a dependency;
  • Support for large data sets: one disk B + Tree store for large data sets cannot hold all memory;
  • Docker support: Docker images that can build GoAccess from upstream.

GoAccess Specifies the Web log format supported by default

  • Amazon CloudFront: Amazon CloudFront Web distributed system
  • AWSS3: Amazon Simple Storage Service (S3)
  • AWSELB: AWS elastic load balancing
  • COMBINED: COMBINED log format (XLF/ELF) Apache | Nginx
  • COMMON: COMMON Logging Format (CLF) Apache
  • Google Cloud Storage: Google Cloud Storage
  • Apache Virtual hosts: indicates the Apache virtual host
  • Squid Native Format: Squid Native Format
  • W3C: W3C (IIS) format

GoAccess Date format

  • Time – the format:parametertime-formatVariable is followed by a space to specify the log format date. The date containsRegular charactersandSpecial format specifierAny combination of. In order toPercentage (%) symbolAt the beginning. May refer to:man strftime.%Tor%H:%M:%S.

Note: the timestamp in milliseconds, %f must be used as the time format.

  • The date format: –parameterdate-formatVariable is followed by a space to specify the log format date. The date containsRegular charactersandSpecial format specifierAny combination of. In order toPercentage (%) symbolAt the beginning. May refer to:man strftime.

Note: The timestamp is in microseconds, so %f must be used as the date format.

  • Log format:The log format variable is followed by oneThe blank spaceor\t TAB delimiter, specifies the log format string.

The meaning of a special character

  • %x: Date and time fields that match the time format and date format variables. Use this method when a timestamp is used instead of putting the date and time in two separate variables;
  • %t: the time field that matches the time format variable;
  • %d: matches the date field of the date format variable;
  • %v: server name (server block or virtual host) set according to the specification name;
  • %e: user ID determined by HTTP validation when requesting a document;
  • %h: host (client IP address, IPv4 or IPv6)
  • % r:Client request line. This makes the request’s specific delimiters (single quotes, double quotes, and so on) resolvable. Otherwise, use a special format specifier, for example:%m.%U.%qand%HParse each field, available with%rGet the full request, also available%m.%U.%qand%HCombine your requests, but not at the same time;
  • %m: request method;
  • % U:Request URL path if the query string is in%UIs not required%q. ifThe URL pathDoes not contain any query string%q, the query string is appended to the request;
  • %q: query string;
  • %H: request protocol;
  • %s: status code that the server sends back to the client;
  • %b: size of the object returned to the client;
  • %R: HTTP request “Referer” value;
  • %u: the “UserAgent” value of the HTTP request;
  • %D: time (in microseconds) taken to process the request;
  • %T: time (in milliseconds) spent processing the request;
  • %L: the time (in decimal milliseconds) taken to process the request;
  • %^ : Ignore this field;
  • %~ : Moves the log string forward until a non-space (! Isspace) character;
  • ~ H: X-Forwarded-For (XFF) Specifies the host (IPv4 or IPv6 client IP address).

GoAccess Three storage options

  • Default hash table: Memory storage provides better performance, with the disadvantage of limiting the size of the dataset to the amount of physical memory available. By default, GoAccess will use an in-memory hash table. Data sets perform well if they are in memory. Because it has good memory usage and fairly good performance;

  • Tokyo Cabinet disk B+ tree: Use this storage method primarily for large data sets that cannot fit everything in memory. A B+ tree database is slower than any hash database because its data must be committed to disk. Thus using SSDS can greatly improve performance. Use this storage method if you need data persistence and statistics to load quickly later;

  • Tokyo Cabinet memory hash table: It is an alternative to the default hash table, uses generic types, and has average performance in terms of memory and speed;

Install GoAccess

The source code to install

Wget # https://tar.goaccess.io/goaccess-1.3.tar.gz # tar - XZVF goaccess - 1.3. Tar. Gz # CD goaccess - 1.3 / #. / configure --enable-utf8 --enable-geoip=legacy # make # make installCopy the code

Debian/Ubuntu system

# apt-get install goaccess
Copy the code

To obtain the latest GoAccess package, please use the official GoAccess repository as follows:

$ echo "deb https://deb.goaccess.io/ $(lsb_release -cs) main" | sudo tee -a /etc/apt/sources.list.d/goaccess.list $ wget  -O - https://deb.goaccess.io/gnugpg.key | sudo apt-key add - $ sudo apt-get update $ sudo apt-get install goaccessCopy the code

Note:

  • To get on-disk support (Trusty +Wheezy +) Executable:
$ sudo apt-get install goaccess-tcb
Copy the code
  • .debOfficial library packages are available throughHTTPSObtain, here may need to install, can be executed:
$ apt-transport-https
Copy the code

RHEL/CentOS

# yum install goaccess
Copy the code

OS X / Homebrew

# brew install goaccess
Copy the code

Using GoAccess

Different output formats:

Output to terminal and generate interactive report:

# goaccess access.log
Copy the code

Generate HTML report:

# goaccess access.log -a -o report.html
Copy the code

Generate a JSON report:

# goaccess access.log -a -d -o report.json
Copy the code

Generate a CSV file:

# goaccess access.log --no-csv-summary -o report.csv
Copy the code

GoAccess provides great flexibility for real-time filtering and parsing. If you want to quickly diagnose problems by monitoring logs since goAccess was started:

# tail -f access.log | goaccess-
Copy the code

If you want to filter while keeping open pipes for real-time analysis, we can use tail -f and pattern-matching tools such as grep, awk, sed, etc

# tail -f access.log | grep -i --line-buffered 'firefox' | goaccess --log-format=COMBINED -
Copy the code

Parse from the beginning of the file, keep the pipe open and apply the filter

# tail -f -n +0 access.log | grep -i --line-buffered 'firefox' | goaccess -o report.html --real-time-html -
Copy the code

Multi-log file output format:

Pass multiple log files to the command line:

# goaccess access.log access.log.1
Copy the code

Parsing a file from a pipe when reading a regular file:

# cat access.log.2 | goaccess access.log access.log.1-
Copy the code

Note: Single dashes are appended to the command line so that GoAccess knows it should read from the pipe. On Mac OS X, use gunzip -c instead of zcat.

Real-time HTML output format:

The process of generating a real-time HTML report is similar to the process of creating a static report, with only one parameter: –real-time HTML makes it look real.

# goaccess access.log -o /usr/share/nginx/html/site/report.html --real-time-html
Copy the code

In addition to the above three operations, it can also be used in combination with date, virtual host, file, status code and startup, server, please refer to its MAN page or help for more details.

# man goaccess or # goaccess --helpCopy the code

Matters needing attention

Each active panel has a total of 366 items, or 50 items in a live HTML report, the number of items can be customized using Max-items. However, only CSV and JSON outputs allow a maximum number greater than the default of 366 items per panel.

Using disk B + Tree to analyze the same log file twice –keep-db-files and –load-from-disk to use and on each run, GoAccess counts each entry twice.

Matches are the contents of the request access log, 10 requests = 10 matches. HTTP requests with the same IP, date, and user agent are treated as unique access.

Problems during installation

During the installation process, some problems will inevitably occur. For details, please refer to the following links:

1, www.cnblogs.com/zkfopen/p/1… 2, www.cnblogs.com/jshp/p/1014…

Reference

1, github.com/allinurl/go… IO /get-started 4, goAccess. IO /man 5, goaccess. IO /download

conclusion

Through this article introduces what is GoAccess, why to use GoAccess, GoAccess functions, GoAccess default supported Web log format, GoAccess date format, GoAccess special characters represent the meaning of GoAccess Three storage options, installation and use of GoAccess in different scenarios, I hope you can use this tool in your future work and solve some problems related to daily Web server logs.

Original is not easy, if you think this article is useful to you, please kindly like, comment or forward this article, because this will be my power to output more high-quality articles, thank you!