In conventional log collection setups, the client has to install an extra agent, such as Logstash or Filebeat, to ship logs. Extra programs mean a more complex environment and more resource consumption. Is there a way to collect logs without installing anything extra? Rsyslog is the answer!

Rsyslog

Rsyslog is a high-speed log collection and processing service. It features high performance, security, reliability, and a modular design. It can receive log input from a variety of sources (such as files, TCP, UDP, and Unix sockets), write the results to different destinations (such as MySQL, MongoDB, Elasticsearch, and Kafka), and process over a million log messages per second.

As an enhanced successor to syslog, Rsyslog ships by default on mainstream Linux distributions, so no additional installation is required.

Collect Nginx logs

The flowchart for ELK collecting logs through Rsyslog is as follows:

  1. Nginx --syslog--> Rsyslog --omkafka--> Kafka --> Logstash --> Elasticsearch --> Kibana
  2. Logs generated by Nginx are sent to the Rsyslog server via the syslog facility. After receiving them, Rsyslog writes the logs to Kafka through the omkafka module; Logstash then reads from the Kafka queue and writes to Elasticsearch, and Kibana is used to search the logs stored in Elasticsearch
  3. Rsyslog ships with the system, so the client does not need to install any additional application in this process
  4. Although Rsyslog is installed on the server by default, the omkafka module is not. To have Rsyslog write to Kafka, you need to install this module first
  5. The omkafka module is only available in Rsyslog v8.7.0 and later, so first run the rsyslogd -v command to check your Rsyslog version, and upgrade it if it is older (a quick check follows this list)
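A quick way to check the version on the server; the sample output line below is illustrative and your version string will differ:

# rsyslogd -v
rsyslogd 8.16.0, compiled with:
...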

Upgrade Rsyslog

1. Add the key for the Rsyslog repository

# apt-key adv --recv-keys --keyserver keys.gnupg.net AEF0CF8E

2. Add the Rsyslog repository address

echo "deb http://debian.adiscon.com/v8-stable wheezy/" >> /etc/apt/sources.list
echo "deb-src http://debian.adiscon.com/v8-stable wheezy/" >> /etc/apt/sources.list

3. Upgrade the Rsyslog service

# apt-get update && apt-get -y install rsyslog

Add the omkafka module

1. Install the build tools required by autoreconf; without them the configure script cannot be generated

# apt-get -y install pkg-config autoconf automake libtool unzip

2. omkafka requires a number of dependency packages to be installed

# apt-get -y install libdbi-dev libmysqlclient-dev postgresql-client libpq-dev libnet-dev librdkafka-dev libgrok-dev libgrok1 libpcre3-dev libtokyocabinet-dev libglib2.0-dev libmongo-client-dev libhiredis-dev
# apt-get -y install libestr-dev libfastjson-dev uuid-dev liblogging-stdlog-dev libgcrypt-dev
# apt-get -y install flex bison librdkafka1 librdkafka-dev librdkafka1-dbg

3. Compile and install the omkafka module

# mkdir tmp && cd tmp

# git init
# git pull git@github.com:VertiPub/omkafka.git

# autoreconf -fvi
# ./configure --sbindir=/usr/sbin --libdir=/usr/lib --enable-omkafka && make && make install && cd ..
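If the build succeeds, the module should land in rsyslog's module directory. A quick sanity check (the exact path can vary by distribution):

# ls /usr/lib/rsyslog/omkafka.so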

Rsyslog Collects Nginx logs

Nginx configuration on the Client
log_format  jsonlog '{'
    '"host": "$host",'
    '"server_addr": "$server_addr",'
    '"http_x_forwarded_for":"$http_x_forwarded_for",'
    '"remote_addr":"$remote_addr",'
    '"time_local":"$time_local",'
    '"request_method":"$request_method",'
    '"request_uri":"$request_uri",'
    '"status":$status,'
    '"body_bytes_sent":$body_bytes_sent,'
    '"http_referer":"$http_referer",'
    '"http_user_agent":"$http_user_agent",'
    '"upstream_addr":"$upstream_addr",'
    '"upstream_status":"$upstream_status",'
    '"upstream_response_time":"$upstream_response_time",'
    '"request_time":$request_time'
'} ';


access_log syslog:server=rsyslog.domain.com,facility=local7,tag=nginx_access_log,severity=info jsonlog;

1. Nginx supports syslog output only in newer versions; make sure your Nginx version is later than v1.10

2. To reduce the processing load on Logstash and keep the overall configuration simple, we have Nginx write its logs directly in JSON format

3. We stop logging Nginx output to local text files and instead use syslog to send logs directly to the remote Rsyslog server for downstream processing. Another very important benefit is that we no longer need to split and delete Nginx logs on a schedule (normally the logrotate service splits logs daily and deletes old ones so the disk does not fill up).

4. access_log writes directly to the syslog service. Its parameters are described as follows (a quick way to test delivery is sketched after this list):

  • syslog: indicates that the logs are sent to a syslog service
  • server: address of the Rsyslog server that receives the logs; the default port is 514 and the UDP protocol is used
  • facility: specifies the category of the log message, such as auth for authentication, cron for scheduled tasks, or local0-7 for custom application use; it has no special meaning here. The default is local7
  • tag: attaches a tag to the log so you can tell which service or client sent it. For example, we use the tag nginx_access_log here; if multiple services write to Rsyslog at the same time, each with a different tag, the Rsyslog server can identify from the tag which logs are Nginx's
  • severity: defines the log level, such as debug, info, or notice; the default is info
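
Before touching Nginx, you can verify that the Rsyslog server is reachable and the tag arrives intact. A minimal sketch using the util-linux logger tool (rsyslog.domain.com is the placeholder server name used above, and the JSON payload is made up):

# logger -n rsyslog.domain.com -P 514 -d -t nginx_access_log '{"test": "hello"}'
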
Rsyslog configuration on the Server
# cat /etc/rsyslog.d/rsyslog_nginx_kafka_cluster.conf 
module(load="imudp")
input(type="imudp" port="514")

# nginx access log ==> rsyslog server(local) ==> kafka
module(load="omkafka")

template(name="nginxLog" type="string" string="%msg%")

if $inputname == "imudp" then {
    if ($programname == "nginx_access_log") then
        action(type="omkafka"
            template="nginxLog"
            broker=["10.82.9.202:9092"."10.82.9.203:9092"."10.82.9.204:9092"]
            topic="rsyslog_nginx"
            partitions.auto="on"
            confParam=[
                "socket.keepalive.enable=true"
            ]
        )
}

:rawmsg, contains, "nginx_access_log" ~

1. Add a dedicated configuration file for Nginx logs under the rsyslog.d directory

2. The important settings in this rsyslog configuration file are explained as follows:

  • module: loads the imudp module to receive syslog data from the Nginx server, and the omkafka module to write the logs to Kafka
  • input: listens on UDP port 514; TCP can be enabled at the same time, and the two can coexist (a TCP sketch follows this list)
  • template: defines a template named nginxLog; the log format could be specified here, but since we already send JSON no extra format is needed. Note that template names must be unique
  • action: logs whose inputname matches imudp and whose programname matches nginx_access_log are written to the Kafka cluster by the omkafka module; for more details on omkafka parameters, refer to the omkafka module documentation
  • :rawmsg, contains ...: the last line discards any message containing nginx_access_log; by default rsyslog would also write all logs to the local message files, and since the logs already go to Kafka there is no need to keep another local copy

3. The omkafka module checks whether the topic exists in Kafka and creates it automatically if it does not (you can also pre-create it yourself, as sketched below)
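
If you also want to receive logs over TCP as mentioned above, a minimal sketch of the extra rsyslog directives (the port mirrors the UDP setup and can be changed):

module(load="imtcp")
input(type="imtcp" port="514")

And if you would rather create the Kafka topic yourself instead of relying on auto-creation, something like the following should work on a broker (the partition and replication counts are illustrative):

# bin/kafka-topics.sh --create --zookeeper 127.0.0.1:2181 --replication-factor 2 --partitions 3 --topic rsyslog_nginx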

Logstash configuration on the Server
input {
    kafka {
        bootstrap_servers => "10.82.9.202:9092,10.82.9.203:9092,10.82.9.204:9092"
        topics => ["rsyslog_nginx"]
    }
}

filter {
    mutate {
        gsub => ["message", "\\x", "\\\x"]
    }

    json {
        source => "message"
    }

    date {
        match => ["time_local", "dd/MMM/yyyy:HH:mm:ss Z"]
        target => "@timestamp"
    }

}

output {
    elasticsearch {
        hosts => ["10.82.9.205", "10.82.9.206", "10.82.9.207"]
        index => "rsyslog-nginx-%{+YYYY.MM.dd}"
    }
}

Important parameters are described as follows:

  • input: sets the Kafka cluster address and the topic name
  • filter: little filtering is needed because the messages from Kafka are already JSON. The one thing to note is that if a log contains non-ASCII content, such as Chinese characters in a URL, Nginx escapes the bytes as \x sequences; the gsub above doubles the backslash so they survive JSON parsing, otherwise the json filter reports an error (an example follows this list)
  • output: sets the address of the ES cluster and the index name; the index is automatically split by day
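
To make the gsub rule concrete, here is a hypothetical before/after for a message containing an escaped multi-byte character (the URI is invented for illustration):

raw message from Nginx (non-ASCII bytes escaped as \xNN):
    {"request_uri":"/search?q=\xE4\xB8\xAD", ...}
after the gsub filter doubles the backslash:
    {"request_uri":"/search?q=\\xE4\\xB8\\xAD", ...}
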
Integration test

After the configuration is complete, restart the rsyslog and nginx services, then access Nginx to generate some logs

1. Check whether Kafka generates topics properly

# bin/kafka-topics.sh --list --zookeeper 127.0.0.1:2181
__consumer_offsets
rsyslog_nginx

2. Check whether the Topic can receive logs properly

# bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic rsyslog_nginx
{"host": "domain.com"."server_addr": "172.17.0.2"."http_x_forwarded_for":"58.52.198.68"."remote_addr":"10.120.89.84"."time_local":"28/Aug/2018:14:26:00 +0800"."request_method":"GET"."request_uri":"/"."status": 200,"body_bytes_sent": 1461,"http_referer":"-"."http_user_agent":"Mozilla / 5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"."upstream_addr":"-"."upstream_status":"-"."upstream_response_time":"-"."request_time": 0.000}Copy the code

3. Add the index in Kibana and check whether Elasticsearch has received data. If the first two steps are normal but Kibana cannot find the index, or the index contains no data, the problem most likely lies in the index name or the Logstash stage

Kibana query display

  • Open Kibana, add an index pattern of rsyslog-nginx-*, and select @timestamp as the time field to create the Index Pattern

  • After entering the Discover page, you can see at a glance how the number of requests changes over time. Simple filtering can be done with the fields on the left; for example, to view all URIs returning a 404 status, click add next to request_uri and status so the two fields appear on the right, then click the plus sign next to the 404 status code under status to show only requests in that state. The auto-refresh button at the top sets the page's automatic refresh interval

  • With Visualize you can build charts for many needs, such as requests per second, bandwidth usage, error ratio, slow responses, top IPs, and top URLs, and then combine these visualizations into a Dashboard

Final thoughts

  1. The Nginx access log is an absolute treasure trove for a website: changes in log volume reveal traffic trends, analysis of status codes reveals the reliability of the service, tracking a specific campaign URL shows that campaign's popularity in real time, and combining conditions can inform site operations, making the website friendlier and easier to use
  2. The single point of failure of the Rsyslog service can be addressed by deploying multiple Rsyslog instances behind load balancing for high availability. In our experience, however, Rsyslog is very stable: it has run for more than a year, handling about 200,000 logs per minute, without any downtime. If you do not want that much complexity, you can run a background script that checks the Rsyslog service status and automatically restarts it if it dies (a minimal sketch follows this list)
  3. We used the UDP protocol throughout. First, Nginx's syslog mode uses UDP by default, and I found no TCP support on the official site; I suspect this is also because UDP performs much better than TCP. Second, with TCP an unstable network could cause endless retries or waits, affecting Nginx's stability. We have not yet run into the problem of messages exceeding the Ethernet frame length
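
A minimal sketch of such a watchdog, assuming a Debian-style service command; the check interval and restart command are illustrative and should be adapted to your system:

#!/bin/bash
# naive rsyslog watchdog: restart the service whenever the daemon is not running
while true; do
    if ! pgrep -x rsyslogd > /dev/null; then
        service rsyslog restart
    fi
    sleep 10
done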

If you find this article helpful, please share it with more people. And if you want to read more, check out the following:

  • The principle and application of ELK system in production environment
  • General application log access scheme for ELK log system
  • Kafka Group is used to realize high availability of Logstash under ELK architecture