Preface

Linux has many tools for text processing, such as sort, cut, split, join, paste, comm, uniq, column, rev, tac, tr, nl, pr, head, and tail. The lazy way to learn Linux text processing (not the best way) might be to learn just grep, sed, and awk.

With these three tools, you can solve nearly 99% of the text-processing problems on Linux systems without having to remember all the commands and parameters above.

Once you have learned and used all three, you will know the difference between them. In practice, that difference is which tool is good at solving which problem.

An even lazier approach might be to learn a scripting language (Python, Perl, or Ruby) and use it for every text-processing task.

Overview

awk, grep, and sed are three powerful Linux tools for working with text, and they are among the Linux commands that must be mastered.

All three deal with text, but in different ways. awk is the most powerful of the three, and also the most complex. grep is best suited to simply searching for or matching text, sed to editing the text it matches, and awk to formatting text and to more complex field-based processing.

In a nutshell:

    grep: find data
    awk: slice data
    sed: modify data

grep = global regular expression print

In its simplest terms, grep (global regular expression print) is used to find the lines in a file that match a given pattern. Starting at the first line of the file, grep copies the line into a buffer, compares it to the search string, and prints the line to the screen if the comparison passes. grep repeats this process until every line of the file has been searched.

Note: grep does not store lines, change lines, or search only part of a line.

Sample data file

Copy and paste the following data into a file called sampler.log:

    boot
    book
    booze
    machine
    boots
    bungie
    bark
    aardvark
    broken$tuff
    robots

A simple example

The simplest example of grep is:

    grep "boo" sampler.log

In this example, grep walks through each line of the file sampler.log and prints every line that contains the string "boo":

    boot
    book
    booze
    boots

With large files, it is often more useful if the output also shows the line number of each match. If you then need to open the file in an editor to make changes, the matching strings are easier to track down. This can be done by adding the -n option:

    grep -n "boo" sampler.log

This yields a more useful result that shows which lines match the search string:

    1:boot
    2:book
    3:booze
    5:boots

Another interesting option is -v, which prints the inverse result. In other words, grep prints all lines that do not match the search string, rather than the lines that do.

In the following case, grep prints each line that does not contain the string "boo", along with its line number, as in the previous example:

    grep -vn "boo" sampler.log
    4:machine
    6:bungie
    7:bark
    8:aardvark
    9:broken$tuff
    10:robots

The -c option tells grep to suppress the printing of matching lines and display only the number of lines that match the query. For example, the command below prints the number 4, because four lines in sampler.log contain "boo".

    grep -c "boo" sampler.log
    4

The -l option prints only the names of the files that contain lines matching the search. This is useful if you want to search for the same string in multiple files, like this:

    grep -l "boo" *

A more useful option for searching non-code files is -i, which ignores case. With this option, upper and lower case are treated as equal when matching the search string. In the example below, lines containing "boo" are printed even though the search string is in capitals.

    grep -i "BOO" sampler.log
    boot
    book
    booze
    boots

The -x option matches whole lines only. In other words, the following command yields no results, because no line consists of only "boo":

    grep -x "boo" sampler.log

Finally, -A lets you ask for extra lines of context after each match, so you get the matching line plus a number of the lines that follow it, for example:

    grep -A2 "mach" sampler.log
    machine
    boots
    bungie

Regular expressions

Regular expressions are a compact way to describe complex patterns in text.

With grep you can use regular expressions (regexps) to search for patterns; other tools let you use them in more complex ways. The plain strings searched for so far are in fact very simple regular expressions. If you have used wildcards such as '*' or '?' when listing file names, the idea will be familiar, although the regular-expression syntax is different.

For example, to search for lines ending with the letter e in a file:

    grep "e$" sampler.log
    booze
    machine
    bungie

Use grep -E if you need the extended regular expression syntax.

For example, the regular expression operator ? matches the preceding character zero or one times:

    grep -E "boots?" sampler.log
    boot
    boots

You can also combine multiple searches with the pipe (|), which means "or", so you can do:

    grep -E "boot|boots" sampler.log
    boot
    boots

Special characters

What if you want to search for a special character? If you want to find all the lines that contain the dollar character '$', you can't use grep "$" a_file, because '$' will be interpreted as a regular expression (the end-of-line anchor), and you will get every line, since every line has an end. The solution is to "escape" the symbol with a backslash, so you would use:

    grep '\$' sampler.log
    broken$tuff

You can also use the -F option, which stands for "fixed string" or "fast", because it searches only for literal strings, not regular expressions.
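The difference between an unescaped '$', an escaped one, and a fixed-string search can be checked with a short sketch. It assumes the sample file holds the ten lowercase words from the "Sample data file" section and recreates it so the commands run on their own.

```shell
# Recreate the sample file (ten words, one per line).
printf '%s\n' boot book booze machine boots bungie bark aardvark 'broken$tuff' robots > sampler.log

# Unescaped, $ is the end-of-line anchor, so every line matches: prints 10.
grep -c '$' sampler.log

# Escaped, \$ matches a literal dollar sign: prints broken$tuff.
grep '\$' sampler.log

# -F treats the pattern as a fixed string, so no escaping is needed: prints broken$tuff.
grep -F '$' sampler.log
```

Single quotes are used so that the shell passes the backslash and dollar sign to grep unchanged.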

More examples of regexp

Reference: gnosis.cx/publish/pro…
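As a small illustration of a few more patterns, the sketch below runs against the same sample data (recreated here so the block is self-contained). The patterns are chosen for illustration and are not taken from the referenced page.

```shell
# Recreate the sample file (ten words, one per line).
printf '%s\n' boot book booze machine boots bungie bark aardvark 'broken$tuff' robots > sampler.log

# ^ anchors the match at the start of the line: prints bark.
grep '^ba' sampler.log

# A bracket expression matches any one of the listed characters: prints boot and book.
grep 'boo[kt]$' sampler.log

# . matches any single character: prints book and bark.
grep '^b..k$' sampler.log

# With -E, () groups an alternation and ? makes the s optional: prints boot, book, boots.
grep -E '^(boot|book)s?$' sampler.log
```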

AWK

A pattern scanning and text-processing language created by Aho, Weinberger, and Kernighan.

AWK is a large topic, so this is not a complete guide, but it should give you an idea of what AWK can do. The basics are relatively easy to pick up, and it is highly recommended.

AWK basics

An AWK program operates on each line of the input file. It can have an optional BEGIN{} section whose commands run before any of the file is processed, then the main {} section that runs on each line of the file, and finally an optional END{} section whose commands run after the whole file has been read:

    BEGIN { … initialization awk commands … }
    { … awk commands for each line of the file … }
    END { … finalization awk commands … }

For each line of the input file, awk checks whether there are any pattern-matching instructions. If there are, the commands run only on the lines that match the pattern; otherwise they run on all lines. These pattern-matching instructions can contain the same regular expressions as grep.
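The three sections and a pattern can be seen together in a minimal sketch. It assumes the ten-word sample file from the grep section and recreates it here; the variable name count is chosen for illustration.

```shell
# Recreate the sample file (ten words, one per line).
printf '%s\n' boot book booze machine boots bungie bark aardvark 'broken$tuff' robots > sampler.log

awk '
BEGIN { print "Lines containing boo:" }   # runs once, before any input is read
/boo/ { count++; print "  " $0 }          # runs only on lines matching the pattern
END   { print "Total:", count }           # runs once, after the last line
' sampler.log
```

The final line of output is "Total: 4", the same count that grep -c "boo" reports.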

AWK commands can do quite sophisticated maths and string manipulation, and AWK also supports associative arrays. AWK treats each line as made up of a number of fields, each separated by a field separator. By default, this is one or more whitespace characters, so the line:

    This is a line of text

contains 6 fields. In AWK, the first field is called $1, the second field $2, and so on, and the entire line is called $0.

The field separator is set by the AWK internal variable FS, so if you set FS=":" it will split a line at each ':', which is useful for files like /etc/passwd. Other useful internal variables are NR, the current record number (that is, the line number), and NF, the number of fields in the current line.

AWK can operate on any file, including standard input, in which case it is usually used with the '|' (pipe) command, for example in combination with grep or other commands.
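A short sketch of FS, NR, and NF in use. The file users.txt and its two entries are made up for illustration, in the style of /etc/passwd; -F: is the command-line way of setting FS to ":".

```shell
# Default splitting on whitespace: prints 6, then the first and last field.
echo 'This is a line of text' | awk '{ print NF; print $1, $NF }'

# A small passwd-style file (hypothetical entries, for illustration only).
printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
              'alice:x:1000:1000:Alice:/home/alice:/bin/sh' > users.txt

# Print the record number, the field count, the login name ($1) and the shell ($NF).
awk -F: '{ print NR, NF, $1, $NF }' users.txt

# The same separator set inside the program.
awk 'BEGIN { FS = ":" } { print $1 }' users.txt
```

The second command prints "1 7 root /bin/bash" and "2 7 alice /bin/sh".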

For example, if I list all the files in the current directory:

    ls -l
    total 140
    -rw-r--r-- 1 root root 55121 Jan  3 17:03 combined_log_format.log
    -rw-r--r-- 1 root root 80644 Jan  3 17:03 combined_log_format_w_resp_time.log
    -rw-r--r-- 1 root root    71 Jan  3 17:55 sampler.log

I can see that the file size is reported as the 5th column of data. If I want to know the total size of the files in this directory, I can do:

    ls -l | awk 'BEGIN {sum=0} {sum=sum+$5} END {print sum}'
    135836

Note that 'print sum' prints the value of the variable sum, so if sum=2 then 'print sum' gives the output '2', whereas 'print $sum' prints '1', because the second field contains the value '1'.
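The distinction between a variable and a field reference can be tried on a single made-up line whose second field is 1, like the link count in the listing above:

```shell
# sum holds 2, so "print sum" prints 2, while "print $sum" means "print $2" and prints 1.
echo '-rw-r--r-- 1 root root 71' | awk '{ sum = 2; print sum; print $sum }'
```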

It would therefore be easy to write an AWK command that calculates the mean and the standard deviation of a list of numbers: accumulate 'sumx' and 'sumx2' in the main section, then calculate the mean and standard deviation in the END section using the standard formulae.

AWK supports loops ('for' and 'while') and branches (using 'if'). So, if you want to thin out a file and keep only every 3rd line, you can do this:
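One possible sketch of that calculation follows. The file numbers.txt and its values are made up for illustration, and the formula used is the population standard deviation (dividing by N); a sample standard deviation would divide by N-1 instead.

```shell
# Hypothetical input: one number per line.
printf '%s\n' 2 4 4 4 5 5 7 9 > numbers.txt

awk '
{ sumx += $1; sumx2 += $1 * $1 }           # accumulate the sum and the sum of squares
END {
    if (NR > 0) {
        mean = sumx / NR
        var  = sumx2 / NR - mean * mean    # population variance: E[x^2] - (E[x])^2
        if (var < 0) var = 0               # guard against tiny negative rounding errors
        printf "mean=%.2f sd=%.2f\n", mean, sqrt(var)
    }
}' numbers.txt
```

For this input the sum is 40 and the sum of squares is 232, so the command prints mean=5.00 sd=2.00.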

    ls -l | awk '{for (i=1; i<3; i++) {getline}; print NR,$0}'
    3 -rw-r--r-- 1 root root 80644 Jan  3 17:03 combined_log_format_w_resp_time.log
    4 -rw-r--r-- 1 root root    71 Jan  3 17:55 sampler.log

The for loop steps through the input using the getline command, so that only every third line is printed.

Note that because the input has 4 lines, which is not divisible by 3, the loop ends early at the end of the input, so the final 'print NR,$0' command prints line 4. You can see this in the output, where the NR variable is used to print the line number.
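The end-of-input behaviour is easier to see on the ten-line sample file, recreated below. For comparison, the sketch also shows an alternative that is not part of the original text: a pattern on NR with the modulus operator, which prints only exact multiples of 3.

```shell
# Recreate the sample file (ten words, one per line).
printf '%s\n' boot book booze machine boots bungie bark aardvark 'broken$tuff' robots > sampler.log

# getline loop: prints lines 3, 6 and 9, then line 10, because getline fails at end of input.
awk '{ for (i = 1; i < 3; i++) { getline }; print NR, $0 }' sampler.log

# Alternative: select records whose number is divisible by 3; prints lines 3, 6 and 9 only.
awk 'NR % 3 == 0 { print NR, $0 }' sampler.log
```

Which form is preferable depends on whether the leftover lines at the end of the input should be printed.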

AWK pattern matching

AWK is a line-oriented language. The pattern comes first, then the action. Action statements are enclosed in { and }. Either the pattern or the action may be missing, but, of course, not both. If the pattern is missing, the action is performed on every input record. A missing action prints the entire record.

AWK patterns include regular expressions (using the same syntax as grep -E) and combinations built with the special symbols '&&' for logical AND, '||' for logical OR, and '!' for logical NOT.

You can also use relational patterns, pattern groups, ranges, and so on.
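A few of these pattern forms, sketched against the ten-word sample file (recreated here so the block runs on its own). None of the commands has an action, so each matching record is printed in full.

```shell
# Recreate the sample file (ten words, one per line).
printf '%s\n' boot book booze machine boots bungie bark aardvark 'broken$tuff' robots > sampler.log

# AND plus NOT: lines that start with b and do not end in s.
awk '/^b/ && !/s$/' sampler.log

# OR: lines that start with a or with r; prints aardvark and robots.
awk '/^a/ || /^r/' sampler.log

# Relational pattern on NR: lines 2 to 4; prints book, booze, machine.
awk 'NR >= 2 && NR <= 4' sampler.log

# Range pattern: from the line matching bark through the line matching tuff.
awk '/bark/,/tuff/' sampler.log
```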

AWK control statements

    if (condition) statement [ else statement ]
    while (condition) statement
    do statement while (condition)
    for (expr1; expr2; expr3) statement
    for (var in array) statement
    break
    continue
    exit [ expression ]

AWK input/output statements

Note: The printf command lets you specify the output format more precisely, using a C-like syntax. For example, you can specify integers, floating-point numbers, or strings of a given width.
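A minimal printf sketch, using a made-up two-line file items.txt with a name, a count, and a price on each line:

```shell
# Hypothetical data: name, quantity, price.
printf '%s\n' 'widget 3 2.5' 'gadget 12 10' > items.txt

# %-10s: string left-justified in 10 columns
# %3d:   integer right-justified in 3 columns
# %8.2f: floating-point number with 2 decimals, right-justified in 8 columns
awk '{ printf "%-10s|%3d|%8.2f\n", $1, $2, $3 }' items.txt
```

Unlike print, printf does not add a newline on its own, so the format string ends with \n.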

AWK mathematical functions

AWK string functions

AWK command line and usage

You can use the '-v' flag as many times as necessary to pass variables to an awk program, for example:

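    awk -v skip=3 '{for (i=1; i < skip; i++) {getline}; print $0}' sampler.log

You can also save an awk program in a script file, for example:

    cat awk_strip
    #!/usr/bin/awk -f
    # only print out every 3rd line of input file
    BEGIN {skip=3}
    {for (i=1; i < skip; i++) {getline}; print $0}

which can then be made executable and run as a command:

    chmod u+x awk_strip
    ./awk_strip sampler.dat

sed = stream editor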

sed performs basic text transformations on an input stream (a file or input from a pipe) in a single pass, so it is very efficient. What particularly distinguishes sed from other types of editor, however, is its ability to filter text in a pipeline.

sed basics

sed can be used on the command line or in shell scripts to edit files non-interactively. Perhaps its most useful feature is "search and replace" of one string with another. You can embed sed commands on the command line that calls sed with the '-e' option, or put them in a separate file such as 'sed.in' and call sed with the '-f sed.in' option. The latter is most commonly used when the sed commands are complex and involve a lot of regexps. For example:

    sed -e 's/input/output/' sampler.log

This echoes each line of sampler.log to standard output, changing the first occurrence of 'input' on each line to 'output'. Note that sed is line-oriented, so if you want to change every occurrence on every line, you need to make it a global search and replace by adding the 'g' flag, as follows:

    sed -e 's/input/output/g' sampler.log
    boot
    book
    booze
    machine
    boots
    bungie
    bark
    aardvark
    broken$tuff
    robots

The expression between the /…/ delimiters can be a literal string or a regular expression. Note that by default the output is written to stdout. You can redirect it to a new file, or, if you prefer to edit the existing file in place, use the '-i' flag:

    sed -e 's/input/output/' sampler.log > new_file
    sed -i -e 's/input/output/' sampler.log

sed and regular expressions

What if one of the characters you want to use in your search is a special symbol, such as '/' (for example, in a file name) or '*'? Then you must escape the symbol, just as with grep (and awk). If you want to edit your shell script to refer to /usr/local/bin instead of /bin, you can do this:

    sed -e 's/\/bin/\/usr\/local\/bin/' my_script > new_script

What if you want to use a wildcard in your search: how do you write the output string? Use the special symbol '&', which stands for the pattern that was matched. So, if you want to take every line in your file that starts with a number and enclose that number in parentheses:

    sed -e 's/[0-9]*/(&)/'

where [0-9] is a regexp range matching any single digit, and '*' is a repeat count meaning any number of digits. Because '*' also matches zero digits, a line that does not start with a digit gets an empty '()' prepended; write [0-9][0-9]* to require at least one digit. You can also use positional anchors in a regexp, and you can even save part of the match in a pattern buffer for reuse elsewhere.
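The '&' symbol, saved sub-matches, and the escaping of '/' can be sketched on one-line inputs. The input strings are made up for illustration. The last command uses '|' as the delimiter of the s command, which is an alternative to escaping each slash.

```shell
# & stands for the whole match; requiring at least one digit avoids an empty "()".
echo '42 apples' | sed -e 's/^[0-9][0-9]*/(&)/'      # prints (42) apples

# \( \) saves part of the match; \1 and \2 reuse the saved parts in the replacement.
echo 'key=value' | sed -e 's/\(.*\)=\(.*\)/\2=\1/'   # prints value=key

# Another delimiter after s avoids escaping the slashes in a path.
echo '#!/bin/sh' | sed -e 's|/bin|/usr/local/bin|'   # prints #!/usr/local/bin/sh
```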

Other sed commands

The general form is

    sed -e '/pattern/ command' sampler.log

where 'pattern' is a regular expression and 'command' can be 's' = search and replace, 'p' = print, 'd' = delete, 'i' = insert, 'a' = append, and so on. Note that by default sed prints every line, whether or not it matches, so if you want to suppress this you need to call sed with the '-n' flag and then use the 'p' command to control what is printed. So, if you want to list all the subdirectories, you can use:

    ls -l | sed -n -e '/^d/p'

Because the long listing starts each line with a 'd' if the entry is a directory, this prints only the lines that begin with 'd'. Likewise, if you want to delete all lines that begin with the comment symbol '#', you can use:

    sed -e '/^#/d' sampler.log

You can also use the range form:

    sed -e '1,100 command' sampler.log

which executes 'command' on lines 1-100. You can also use the special line number $ to mean the end of the file. So, if you want to delete all but the first 10 lines of a file, you can use:

    sed -e '11,$d' sampler.log

You can also use the pattern range form, where the first regular expression defines the start of the range and the second defines the end. So, for example, if you wanted to print all lines from 'boot' to 'machine', you could do this:

    sed -n -e '/boot$/,/mach/p' sampler.log
    boot
    book
    booze
    machine

which prints (because of -n) only the lines within the range given by the two regexps.
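The print, delete, and range forms can be tried together on the ten-word sample file, recreated here so the sketch is self-contained. The line numbers are chosen for illustration.

```shell
# Recreate the sample file (ten words, one per line).
printf '%s\n' boot book booze machine boots bungie bark aardvark 'broken$tuff' robots > sampler.log

# -n suppresses the default output; p prints only lines 2 to 4: book, booze, machine.
sed -n -e '2,4p' sampler.log

# d deletes every line that starts with b; the rest is printed: machine, aardvark, robots.
sed -e '/^b/d' sampler.log

# $ is the last line: delete from line 4 to the end, keeping boot, book, booze.
sed -e '4,$d' sampler.log
```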

Further reading

sed can do much more than this. For details, see: www.grymoire.com/Unix/Sed.ht…

Conclusion

The "three swordsmen" of Linux, awk, sed, and grep, are widely used in performance work, including performance modeling, performance monitoring, and performance analysis. They also come up frequently in interviews for testing positions at Internet companies and are a necessary skill for mid-level and senior testers.