This article shows how to collect every command that users run on the command line. Besides sending all commands to Elasticsearch for storage, we also want to raise alerts on sensitive commands.

At the heart of this is the PROMPT_COMMAND environment variable: bash executes its value as a command just before it prints each prompt.

So we can set this variable to a command and see what happens:

# export PROMPT_COMMAND="date '+%F %T'"
2019-04-23 11:25:53    # the date command runs before the next prompt appears
# a
-bash: a: command not found
2019-04-23 11:25:56    # it appears again
# v
-bash: v: command not found
2019-04-23 11:25:58    # it appears every time the prompt is displayed

In effect, PROMPT_COMMAND runs once after every command the user executes. On this foundation we can make it collect all the commands a user runs.
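
To make the mechanism concrete, here is a minimal sketch that uses a shell function as the hook. The function name print_timestamp is made up for illustration, and the final eval line only imitates what an interactive bash does before each prompt, so that the effect is visible in a plain script.

```shell
# Hypothetical hook function (the name is an assumption, not from the article).
print_timestamp() {
    date '+%F %T'
}

# An interactive bash evaluates this value before it prints each prompt.
PROMPT_COMMAND='print_timestamp'

# Imitate that step once by hand; a script has no prompt of its own.
eval "$PROMPT_COMMAND"
```

Pointing PROMPT_COMMAND at a function or script keeps the variable itself short, which is the approach the rest of the article takes.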

The read command

First, because PROMPT_COMMAND runs after the previous command has finished, history 1 returns the command that was just executed. The output of history is prefixed with the command's sequence number, however, so we need to strip it.

The first version of PROMPT_COMMAND is therefore:

# export PROMPT_COMMAND="history 1 | { read _ cmd; echo \$cmd; }"

It looks complicated, but it's actually quite simple:

  • The output of history 1 is piped into the braces, where the sequence number in front of the command is removed;
  • The curly braces group the two commands into a single unit that shares the piped input. The braces themselves do not start a subshell, although bash runs each part of a pipeline in its own subshell;
  • read is a shell builtin that splits its input on whitespace by default.
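
The three points above can be tried without touching PROMPT_COMMAND. The sketch below feeds a made-up line, shaped like the output of history 1, through the same kind of brace group, and then shows that the variable is gone once the pipeline has ended, which is why the echo has to sit inside the braces. It assumes bash with its default settings (the lastpipe option off).

```shell
unset cmd

# A made-up line in the shape history 1 prints: sequence number, then command.
printf '%s\n' '  1024  ls -l /var/log' | {
    read -r _ cmd                  # _ takes the number, cmd takes the rest
    printf 'inside: %s\n' "$cmd"   # prints "inside: ls -l /var/log"
}

# The pipeline segment ran in a subshell, so cmd was never set here.
printf 'outside: %s\n' "${cmd:-<unset>}"
```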

We normally use read to read keyboard input, but it can also strip the sequence number from the front of a command. Since we pass two variable names, read splits the input at the first run of whitespace, assigning the first part to the variable _ and the rest to the variable cmd.

The sequence number ends up in _, and echo $cmd then prints the last command executed. In the export line the dollar sign is escaped as \$cmd so that the variable is expanded when PROMPT_COMMAND runs, not when it is assigned.

You can test read directly from the command line:

# read x y z
23232 xxxxx ewewewe ssssss zzzzz
# echo "$x | $y | $z"
23232 | xxxxx | ewewewe ssssss zzzzz

Collect relevant information

Collecting the command text alone is of little use. We should also collect the following information:

  • Time when the command is executed
  • Directory where the command is executed
  • The current user who runs the command
  • Login user (the user may have switched to another account after logging in)
  • User’s TTY (multiple shells may be open at the same time)
  • The login IP
  • Command executed

All of the above information needs to be obtained by executing commands, which together look like this:

date "+%F %T"; pwd; whoami; who -u am i | { read user tty _ _ _ _ ip; echo $user $tty $ip| tr -d "()"; }; history 1 | { read _ cmd; echo $cmd; }
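
To see how the who -u am i part is split, the sketch below applies the same read pattern to a sample line. The line is an assumption about the column layout (user, terminal, date, time, idle, PID, host), which matches the seven variables in the command above; the values are made up, and the layout can differ between systems.

```shell
# Made-up sample in the assumed layout of `who -u am i`.
who_line='root     pts/0        2019-04-25 14:20   .         12345 (10.201.2.170)'

# Keep the 1st, 2nd and 7th fields; the four in between go to the throwaway _.
read -r user tty _ _ _ _ ip <<< "$who_line"

# Strip the parentheses around the address, like tr -d "()" does.
ip=${ip//[()]/}

printf '%s %s %s\n' "$user" "$tty" "$ip"
```

If the who output on your system has a different number of columns, the number of _ placeholders has to be adjusted accordingly.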

Don't rush to use it yet. For now you only need to see which commands are involved and what they do, because this isn't the final version.

If the HISTTIMEFORMAT environment variable is set, the output of history may not be what we expect. We should therefore set HISTTIMEFORMAT to an empty value when users log in. To prevent users from changing it, you can make it read-only. Note that once a variable is read-only, it cannot be changed or unset for the rest of that shell session.

vim /etc/profile
export HISTTIMEFORMAT=""
readonly HISTTIMEFORMAT

The logger command

Once we have this information, simply printing it is not enough; we need to store it in a file. Appending directly to a file raises problems, though. A system usually has more than one user, and any of them can execute commands. Who should own the log file, and which group should it belong to? Should the log be rotated? Should old logs be deleted? All of these need to be considered.

The data ends up in a file either way, but appending to it directly, while simple, is hard to control. A better approach is to send it to rsyslog with the logger command and let rsyslog write the file for us. With rsyslog's features we can also do more with the log.

We only need two options of the logger command:

  • -p: specifies the facility and severity (log level) of the message;
  • -t: specifies the tag.
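
As a small illustration of the -p argument, the sketch below splits a priority such as local6.debug into its facility and severity, and then sends one test message if logger is installed. The message text is arbitrary.

```shell
tag="bashlog"
priority="local6.debug"      # written as facility.severity

facility=${priority%%.*}     # text before the dot
severity=${priority##*.}     # text after the dot
printf 'facility=%s severity=%s\n' "$facility" "$severity"

# Send a test message only when logger exists; ignore delivery errors
# (for example when no syslog daemon is running).
if command -v logger >/dev/null 2>&1; then
    logger -t "$tag" -p "$priority" "test message" 2>/dev/null || true
fi
```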

We need to modify the rsyslog configuration file to receive the logs we send to it and output them to the file. Here the log is output to /var/log/bashlog.

# vim /etc/rsyslog.d/bashlog.conf
local6.debug /var/log/bashlog

Check the Rsyslog configuration and restart:

# rsyslogd -N1
# /etc/init.d/rsyslog restart

Then test it to see if /var/log/bashlog has what you want.

echo "hehe" | logger -t bashlog -p local6.debug

Assigning all of this directly to PROMPT_COMMAND would be unwieldy, so we put the commands in a script and assign the script's path to PROMPT_COMMAND.

# vim /etc/collect_cmd.sh
echo `date "+%F_%T"; pwd; whoami; who -u am i | { read user tty _ _ _ _ ip; echo $user $tty $ip| tr -d "()"; }; history 1 | { read _ cmd; echo $cmd; }` | logger -t bashlog -p local6.debug

# chmod +x /etc/collect_cmd.sh
# export PROMPT_COMMAND="/etc/collect_cmd.sh"

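Note that this one line is the entire script. Do not add a #!/bin/bash line; with it, the history command returns nothing, for reasons I have not confirmed. A likely explanation is that a shebang makes the kernel start a fresh, non-interactive bash with no history loaded, whereas without one the current bash runs the file in a forked copy of itself that still holds the session's history.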

Each time the end user executes a command on the command line, a line like this is added to the log file:

Apr 25 14:23:03 localhost bashlog: 2019-04-25_14:23:03 root root pts/0 10.201.2.170 cat /etc/collect_cmd.sh

The timestamp, host name, and tag at the start are added automatically by rsyslog; what follows is the message we sent.

Upgrade rsyslog

PROMPT_COMMAND is what makes it possible to collect the commands users run. Although we have now defined it, it can still be modified, even by ordinary users. Once a user logs in and runs unset PROMPT_COMMAND, all of our settings become useless. The best defense is to make this variable read-only.

The logs are now collected in a file, but to send them to Elasticsearch later, I want to write them to the file in JSON format.

How do we do that? Again with rsyslog. However, the rsyslog version that ships with CentOS 6 is too old and has limited functionality, so it needs to be upgraded to the latest version.

In our experience the upgrade carries little risk: our production environment has been running rsyslog 8 for more than two years without any problems.

You can download the yum repo file from the official rsyslog website and then run yum update rsyslog to upgrade to the latest version.

I downloaded all the rsyslog-related packages and created a local yum repository from which intranet machines can download and upgrade.

After the upgrade, one line of the configuration needs to be changed because the old syntax is no longer compatible:

*.emerg :omusrmsg:*

Restart after the modification is complete:

service rsyslog restart

Parsing the log

rsyslog parses logs with the mmnormalize module, and the parsed result is in JSON format. The module uses liblognorm to do the parsing; the liblognorm rule syntax is described in the official documentation.

Why parse into JSON? Mainly because Elasticsearch stores data as JSON, so the parsed data can be sent to Elasticsearch directly, without using Logstash to parse it.

Install the module first:

yum install rsyslog-mmnormalize liblognorm5-utils

liblognorm5-utils is used to check whether the parsing rules are correct.

When we send an executed command to rsyslog, what we need to parse is the following message, which does not include the timestamp, host name, and tag that rsyslog adds automatically, as seen above.

2019-04-30_14:01:45 /root root root pts/0 10.201.2.170 vim hehe

rsyslog puts a space at the beginning of the message. I don't know where that space comes from, so the rule has to account for one leading space.

To use mmnormalize, you need to specify a rulebase, which follows the liblognorm syntax:

# vim /etc/bashlog.rb
version=2

# The space after the colon below is the leading space mentioned above.
# The space between each pair of percent signs (% %) is the literal space
# between two fields, for example between the output of date and pwd.
rule=: %
    time:word
    % %
    directory:word
    % %
    exec_user:word
    % %
    login_user:word
    % %
    tty:word
    % %
    src_ip:ipv4
    % %
    command:rest
    %

This file is the rulebase for parsing the message above. Although its name ends in .rb, it has nothing to do with Ruby.

A brief explanation of what it does:

  • version=2 must be on the first line, and that line must contain exactly these characters and nothing else. It selects the v2 engine, which is the officially recommended one. It is not necessarily more feature-rich than v1, but it is enough for our needs. If the line is missing or misspelled, the v1 engine is used;
  • A rule starts with rule=. The colon that follows separates the tag from the rule body, so a tag can be placed between the equals sign and the colon. We don't need a tag, but the colon is still required;
  • After the colon comes the field parsing. Each field to be parsed is wrapped in percent signs (%). Inside the percent signs, a colon separates the field name (the key in the resulting JSON) from one of liblognorm's built-in field types. A type can be followed by parameters enclosed in braces {}, which are not used above. The available parameters differ from type to type, and some types take none;
  • Spaces and newlines are allowed inside the percent signs, so a rule can be spread over several lines instead of being crammed onto one, which is easier to read;
  • Outside the percent signs, spaces are matched one-to-one. If two fields are separated by three spaces, the rule must have three spaces between the corresponding percent signs. If the number of spaces is not known in advance, use the whitespace type;
  • Field types (only the common ones are listed here; see the official documentation for more):
    • word: any characters up to the next space, that is, matching stops at the first space;
    • whitespace: matches all whitespace up to the first non-whitespace character, which is useful when there may be more than one space;
    • date-rfc3164: the timestamp at the start of a traditional syslog line (for example Apr 25 14:23:03);
    • ipv4: an IPv4 address;
    • rest: matches everything up to the end of the line;
    • -: used in place of a field name; the field is matched but not included in the output. It is typically used to discard fields such as whitespace.
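
As a rough analogy for how the word and rest types carve up the message, the sketch below splits the same sample line with bash's read. This only mimics the rule and is not how liblognorm works internally: read collapses repeated spaces, whereas liblognorm matches literal spaces one-to-one. The variable names simply mirror the field names in the rulebase.

```shell
# Sample message, including the leading space discussed above.
msg=' 2019-04-30_14:01:45 /root root root pts/0 10.201.2.170 vim hehe'

# Six space-delimited fields (like word/ipv4), then the remainder (like rest).
read -r timestamp directory exec_user login_user tty src_ip command <<< "$msg"

printf 'time=%s\ndirectory=%s\ntty=%s\nsrc_ip=%s\ncommand=%s\n' \
    "$timestamp" "$directory" "$tty" "$src_ip" "$command"
```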

Test the rulebase:

# echo " 2019-04-30_14:01:45 /root root root pts/0 10.201.2.170 vim hehe" | lognormalizer -r /etc/bashlog.rb -e json
{ "command": "vim hehe", "src_ip": "10.201.2.170", "tty": "pts\/0", "login_user": "root", "exec_user": "root", "directory": "\/root", "time": "2019-04-30_14:01:45" }

This is the parsed result. The only blemish is that every / is escaped with a backslash (\/), which is still valid JSON.
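
The escaped slashes are harmless to any JSON parser. If you nevertheless want them gone when reading the file by eye, a sed filter such as the following sketch can strip the backslashes. It is purely cosmetic, and it blindly rewrites every \/ pair, so treat it as a viewing aid rather than part of the pipeline. The sample JSON is shortened from the output above.

```shell
json='{ "tty": "pts\/0", "directory": "\/root" }'

# Replace each backslash-slash pair with a plain slash.
clean=$(printf '%s\n' "$json" | sed 's#\\/#/#g')

printf '%s\n' "$clean"
```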

With a short configuration, rsyslog can now save the parsed JSON data to a file.

# vim /etc/rsyslog.d/bashlog.conf
# load module
module(load="mmnormalize")

template(name="all-json" type="list"){
  property(name="$!all-json")
  # without the next line, the parsed records are not separated by newlines
  constant(value="\n")
}

if $syslogfacility-text == 'local6' and $syslogseverity-text == 'debug' then {
  action(type="mmnormalize" rulebase="/etc/bashlog.rb")
  action(type="omfile" File="/var/log/bashlog" template="all-json")
}

The previous content of this file can be deleted.

The mmnormalize action parses each message and stores the result in the $! property tree. The template we defined outputs $!all-json, which is the whole tree rendered as JSON, followed by a newline. With the parsed data available, you can define an action that stores it in a file, or in a NoSQL database.

The original idea was to send it straight to Kafka or Elasticsearch, but rsyslog sends each message only once and does not resend it on failure. So we save it to a file and let Filebeat read the file and ship it.

You can also set the owner, group, and permissions of the file in the omfile action. By default, the owner and group are root and the permission is 600.

After restarting rsyslog, we can see the command we executed in /var/log/bashlog.

In order for PROMPT_COMMAND to take effect upon login, we can define it in /etc/profile and make it read-only.

vim /etc/profile
export PROMPT_COMMAND="/etc/collect_cmd.sh"
readonly PROMPT_COMMAND
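
The effect of readonly can be checked safely in a subshell. The sketch below marks PROMPT_COMMAND read-only and then tries to reassign it and to unset it; each attempt runs in its own nested subshell so that the failure does not abort the rest of the demonstration. The function name demo_readonly is made up.

```shell
# Hypothetical demo function; the ( ) body keeps readonly out of the calling shell.
demo_readonly() (
    PROMPT_COMMAND="/etc/collect_cmd.sh"
    readonly PROMPT_COMMAND

    if ( PROMPT_COMMAND="" ) 2>/dev/null; then
        echo "assign: allowed"
    else
        echo "assign: rejected"
    fi

    if ( unset PROMPT_COMMAND ) 2>/dev/null; then
        echo "unset: allowed"
    else
        echo "unset: rejected"
    fi
)

demo_readonly
```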

Improvements

We can now collect and parse command-line logs, but one problem remains. If you press Enter on an empty line, history 1 still returns the last command that was executed, which is fine. But if there is no previous command either, the collected command is empty and parsing fails.

You’ll see something like this in the parsed log file:

{ "originalmsg": " 2019-05-01_14:19:19 \/home\/user1 user1 root pts\/0 10.201.2.170", "unparsed-data": "" }

Our rule assumes that there is always content after the login IP; when there is none, parsing fails. This happens, for example, when you switch to another user with su - and then press Enter on an empty line.

To handle this, the script should check whether the command is empty and exit if it is. The /etc/collect_cmd.sh file can therefore be changed as follows:

cmd=`history 1 | { read _ cmd; echo $cmd; }`
[ -z "$cmd" ] && exit  # exit when the command is empty
echo `date "+%F_%T"; pwd; whoami; who -u am i | { read user tty _ _ _ _ ip; echo $user $tty $ip| tr -d "()"; }; echo $cmd` | logger -t bashlog -p local6.debug
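
One caveat with the script above: the unquoted echo of a backtick substitution lets the shell split words and expand wildcards, so a recorded command such as ls *.log may be logged with the pattern replaced by matching file names. A possible hardening, sketched below, is to build the record with printf and quoted variables. The helper name format_record and its argument order are assumptions for illustration, not part of the original script.

```shell
# Hypothetical helper: joins the seven fields with single spaces, in the
# order the rulebase expects. Quoting keeps the command text untouched.
format_record() {
    local ts=$1 dir=$2 exec_user=$3 login_user=$4 tty=$5 ip=$6 cmd=$7
    printf '%s %s %s %s %s %s %s\n' \
        "$ts" "$dir" "$exec_user" "$login_user" "$tty" "$ip" "$cmd"
}

# Sample values only; in the real script they would come from date, pwd,
# whoami, who and history as shown above.
record=$(format_record "2019-05-01_14:19:19" "/home/user1" "user1" "root" \
    "pts/0" "10.201.2.170" 'ls *.log')
printf '%s\n' "$record"
```

The resulting line could then be piped to logger -t bashlog -p local6.debug in place of the echo.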

One final point concerns the two environment variables HISTTIMEFORMAT and PROMPT_COMMAND. To avoid parsing errors, it is best to make both read-only so that no user can modify them. If they are not read-only, users can reset them in ~/.bashrc or directly on the command line.

If you keep these settings in a separate file under /etc/profile.d, don't assign a value to HISTTIMEFORMAT there. Otherwise, users will see an error about modifying a read-only variable. Unset it instead:

unset HISTTIMEFORMAT
readonly HISTTIMEFORMAT
export PROMPT_COMMAND="/etc/collect_cmd.sh"
readonly PROMPT_COMMAND