Original author: Curly Brace MC (WeChat official account: Huakuohao-MC). Focused on Java fundamentals and big data, with an emphasis on sharing experience and personal growth.

AWK was born in 1977 at Bell Labs. It was developed by Alfred Aho, Peter Weinberger and Brian Kernighan, and it is named after the initials of the three authors’ family names.

AWK was designed to process text and data quickly, which mattered a great deal in the days before Python and Go existed. Even now, with both of them available, AWK still has a place in day-to-day production operations, where it is hard to replace.

If you still don’t know much about AWK, I hope this blog post will give you a brief introduction, and if this article can arouse your interest in learning AWK, so much the better.

AWK programs tend to be short, but that’s not to say AWK can’t express complex logic. Like most languages you’ve learned, AWK supports user-defined variables, arrays, for loops, and so on, as well as user-defined functions.
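To make that claim concrete, here is a minimal sketch that combines a function, an array, and a for loop in one short program. The three input rows, the helper name revenue, and the array name by_author are made up for illustration; the output is piped through sort because AWK does not promise any particular order when looping over an array.

```shell
# Three illustrative rows in the layout: title author price copies.
out=$(printf '%s\n' \
    'python_advanced Solon 40.00 6' \
    'java_programming_ideas Frank 48.00 6' \
    'python_practice Solon 38 3' |
awk '
# revenue() is a made-up helper: price multiplied by copies sold.
function revenue(price, copies) { return price * copies }

# Associative array keyed by author name.
{ by_author[$2] += revenue($3, $4) }

# The for loop visits every key once the whole input has been read.
END {
    for (author in by_author)
        printf "%s %.2f\n", author, by_author[author]
}' | sort)
printf '%s\n' "$out"
```

With these rows the script prints one revenue total per author.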

Suppose we want to process a back-end log from an online store that looks like this:

python_basics Yamaharu 34.50 5
python_advanced Solon 40.00 6
python_practice Nami 38 3
java_programming_ideas Frank 48.00 6
vue_tutorial javascript_team 55.00 8
awk_basics Lufei 42 3

The first column is the title of the book, the second column is the author, the third column is the price, and the fourth column is the number of copies sold that day. Each title and author must be a single word without spaces, because AWK splits columns on whitespace. We’ll name this file awk_books.log. Next, we’ll look at AWK through some practical requirements.

Find books that cost more than 40

awk '$3>40 {print $0}' awk_books.log

Output:

$ awk '$3>40 {print $0}' awk_books.log
java_programming_ideas Frank 48.00 6
vue_tutorial javascript_team 55.00 8
awk_basics Lufei 42 3

Explanation

The leading awk tells the shell to run the AWK program that follows. The content inside the single quotation marks is the program itself, and awk_books.log is the file to be processed.

Writing AWK programs is actually very easy because they follow a set form: ‘pattern {action}’. The pattern selects which lines to act on, and the action says what to do with them.

AWK automatically reads the file line by line and tests each line against the pattern. If the line matches, the action is executed. In the example above, the pattern is $3>40 and the action is {print $0}.

Note: the pattern and the action do not both have to be present; either one may be omitted.
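The three possible shapes of a rule can be sketched in a single program. This is only an illustration: the two input rows and the words "read" and "expensive" are made up. Note that the opening brace of an action has to sit on the same line as its pattern, otherwise AWK treats them as two separate rules.

```shell
out=$(printf '%s\n' \
    'python_practice Nami 38 3' \
    'java_programming_ideas Frank 48.00 6' |
awk '
$3 > 40                            # pattern only: matching lines are printed
{ print "read", $1 }               # action only: runs for every line
$3 > 40 { print "expensive", $1 }  # pattern and action together
')
printf '%s\n' "$out"
```

Every rule is tried against every line, in the order the rules are written, so the second row produces three lines of output and the first row only one.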

At this point, you should understand everything except $3 and $0.

AWK splits each input line on whitespace, so the first column after splitting is $1, the second column is $2, and so on. In our case the book price is the third column, $3, and the variable $0 represents the entire line.
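A small sketch of the default splitting, using a made-up line with deliberately messy spacing: leading blanks are ignored, and any run of spaces or tabs counts as a single separator.

```shell
# The input has two leading spaces, a tab, and a run of three spaces.
out=$(printf '  vue_tutorial\tjavascript_team   55.00 8\n' |
awk '{ print $2, $1 }')
printf '%s\n' "$out"
```

The two columns come out swapped and cleanly separated by one space, regardless of how they were spaced in the input.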

What if the text content you want to work with is not whitespace delimited? You can do this.

echo "hello-world" | awk 'BEGIN{FS="-"} {print $1}'

The output is hello

FS is a built-in AWK variable that holds the input column separator; it defaults to whitespace. In the example above we override that default with the value we want, which is -. Besides FS, the built-in variables related to separators are the output column separator OFS, the input line separator RS, and the output line separator ORS. They are set in the same way as FS.
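For readers who want to see the other three separators at work, here is a minimal sketch that sets all four at once. The input string is made up: records are separated by semicolons and columns by hyphens, and the output uses commas between columns and a vertical bar after each record.

```shell
# No trailing newline in the input, so the last record ends at the final semicolon.
out=$(printf 'a-b-c;d-e-f;' |
awk 'BEGIN { FS = "-"; OFS = ","; RS = ";"; ORS = "|" } { print $1, $3 }')
printf '%s\n' "$out"
```

The comma in print $1, $3 is what inserts OFS, and ORS is appended after every print statement in place of the usual newline.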

BEGIN means that the statements after the BEGIN keyword are executed only once, before AWK starts reading input, so it is often used to initialize variables or print a file header. Besides BEGIN there is also END; an example appears later, so don’t worry.

All right, a quick summary here, and then we’ll move on.

1. We know how to run AWK programs from the command line.
2. We know that AWK programs follow the fixed ‘pattern {action}’ form.
3. We have met the built-in variables $0, $1, and FS.

Let’s continue with some concrete examples.

Print out the last column of each line

awk '{print $NF}' awk_books.log

Output result:

$ awk '{print $NF}' awk_books.log
5
6
3
6
8
3

In this example there is no pattern, only an action. When the pattern is omitted, every line matches by default. NF is a built-in AWK variable that holds the number of columns in the current line, so $NF is the last column. Building on this example, you can print the second-to-last column of each line as follows:

awk '{print $(NF-1) }' awk_books.log
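NF is also handy as a pattern. The sketch below, with made-up input and a made-up message, flags any line that does not have exactly four columns, which is what happens when a title or author is written with spaces.

```shell
out=$(printf '%s\n' \
    'python_practice Nami 38 3' \
    'Vue Tutorial javascript team 55.00 8' |
awk 'NF != 4 { print "unexpected field count " NF ": " $0 }')
printf '%s\n' "$out"
```

Only the second row is reported, because AWK sees six columns in it.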

Print only the second line

awk 'NR == 2' awk_books.log

Output result:

$ awk 'NR == 2' awk_books.log
python_advanced Solon 40.00 6

This example has a pattern but no action. When the action is omitted, the default action runs, which prints the matching line. NR is also a built-in variable; it holds the number of lines AWK has read so far, which makes it the current line number.
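Because NR is an ordinary number, it can be used in any comparison or arithmetic expression. A short sketch with four made-up lines, selecting a range of lines and then every even-numbered line:

```shell
# Lines 2 through 3.
range=$(printf '%s\n' first second third fourth | awk 'NR >= 2 && NR <= 3')

# Every even-numbered line.
even=$(printf '%s\n' first second third fourth | awk 'NR % 2 == 0')

printf '%s\n' "$range" "$even"
```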

Print the contents of each line and the line number

awk '{print NR,$0}' awk_books.log

Output result:

$ awk '{print NR,$0}' awk_books.log
1 python_basics Yamaharu 34.50 5
2 python_advanced Solon 40.00 6
3 python_practice Nami 38 3
4 java_programming_ideas Frank 48.00 6
5 vue_tutorial javascript_team 55.00 8
6 awk_basics Lufei 42 3

print uses commas to separate the items to be printed, and each comma becomes a space in the output (more precisely, it becomes the output column separator OFS, which is a space by default).
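The difference between a comma and no comma is easy to miss, so here is a minimal sketch on one made-up row. It also shows printf, which is the usual choice when columns need fixed widths or a fixed number of decimals.

```shell
out=$(printf '%s\n' 'vue_tutorial javascript_team 55.00 8' |
awk '{
    print $1, $3                     # comma: items are joined by OFS (a space by default)
    print $1 $3                      # no comma: the two values are glued together
    printf "%-14s %6.2f\n", $1, $3   # printf: left-aligned title, price with two decimals
}')
printf '%s\n' "$out"
```

Unlike print, printf does not add a newline on its own, which is why the format string ends with \n.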

Better output

Following the example above, let’s make the output more readable.

awk 'BEGIN{print "Start of output"; print ""} {print NR,$0} END{print ""; print "End of output!"}' awk_books.log

Output result:

$ awk 'BEGIN{print "Start of output"; print ""} {print NR,$0} END{print ""; print "End of output!"}' awk_books.log
Start of output

1 python_basics Yamaharu 34.50 5
2 python_advanced Solon 40.00 6
3 python_practice Nami 38 3
4 java_programming_ideas Frank 48.00 6
5 vue_tutorial javascript_team 55.00 8
6 awk_basics Lufei 42 3

End of output!

BEGIN has already been mentioned; this example just makes it more concrete. The statements that follow BEGIN and END each run exactly once: BEGIN before any input is read, and END after all input has been processed.
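A common use of this three-part shape is a small report: a header in BEGIN, one line of detail per input line, and a summary in END. The sketch below is an illustration on three made-up rows; the header text and the word "average" are assumptions, not part of the original log.

```shell
out=$(printf '%s\n' \
    'python_advanced Solon 40.00 6' \
    'python_practice Nami 38 3' \
    'awk_basics Lufei 42 3' |
awk '
BEGIN { print "title price" }                             # runs once, before any input
      { print $1, $3; sum += $3 }                         # runs for every line
END   { if (NR > 0) printf "average %.2f\n", sum / NR }   # runs once, after the last line
')
printf '%s\n' "$out"
```

Inside END, NR still holds the total number of lines read, so it can serve as the divisor; the NR > 0 check avoids a division by zero on empty input.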

Print the lines that contain java

awk '/java/ {print $0}' awk_books.log

Output result:

$ awk '/java/ {print $0}' awk_books.log
java_programming_ideas Frank 48.00 6
vue_tutorial javascript_team 55.00 8

The pattern in this example is a regular expression; every line it matches is printed. Note that the second line of the output matches because of javascript in the author column. You can write more complex regular expressions based on your business needs.
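As a small taste of slightly richer expressions, the sketch below uses an anchor and an alternation on three made-up rows. AWK uses extended regular expressions, so ^, parentheses, and | work without backslashes.

```shell
data='python_basics Yamaharu 34.50 5
java_programming_ideas Frank 48.00 6
vue_tutorial javascript_team 55.00 8'

# ^ anchors the match at the start of the line, so javascript_team no longer counts.
anchored=$(printf '%s\n' "$data" | awk '/^java/')

# Alternation: lines that start with either java_ or vue_.
either=$(printf '%s\n' "$data" | awk '/^(java|vue)_/')

printf '%s\n' "$anchored" "$either"
```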

Print the rows whose first column contains java

awk '$1 ~/java/ {print $0}' awk_books.log

Output result:

$ awk '$1 ~/java/ {print $0}' awk_books.log
java_programming_ideas Frank 48.00 6

Print the rows whose first column does not contain java

awk '$1 !~/java/ {print $0}' awk_books.log

Output result:

$ awk '$1 !~/java/ {print $0}' awk_books.log
python_basics Yamaharu 34.50 5
python_advanced Solon 40.00 6
python_practice Nami 38 3
vue_tutorial javascript_team 55.00 8
awk_basics Lufei 42 3
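One thing to keep in mind with ~ and !~ is that matching is case-sensitive, so /java/ does not match Java. A portable way around this, sketched here on two made-up rows, is to lower-case the column with the built-in tolower function before matching.

```shell
rows='Java_Programming_Ideas Frank 48.00 6
vue_tutorial javascript_team 55.00 8'

# Case-sensitive: the capital J means the first column does not match.
strict=$(printf '%s\n' "$rows" | awk '$1 ~ /java/')

# Lower-casing the column first makes the match ignore case.
relaxed=$(printf '%s\n' "$rows" | awk 'tolower($1) ~ /java/')

printf 'strict: [%s]\nrelaxed: [%s]\n' "$strict" "$relaxed"
```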

Count the total number of lines of input text

awk 'END {print NR}' awk_books.log

Output result:

$ awk 'END {print NR}' awk_books.log
6

Output lines with more than 33 characters

awk 'length($0)> 33' awk_books.log

Output result:

$ awk 'length($0)> 33' awk_books.log
java_programming_ideas Frank 48.00 6
vue_tutorial javascript_team 55.00 8

length is a built-in function. AWK has many more built-in functions that are not listed here; if you are interested, look them up.
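As a starting point for that exploration, here is a minimal sketch of a few standard string functions applied to one made-up row. The array name parts is an arbitrary choice.

```shell
out=$(printf '%s\n' 'java_programming_ideas Frank 48.00 6' |
awk '{
    print length($1)            # number of characters in the title
    print toupper($2)           # author in upper case
    print substr($1, 1, 4)      # first four characters of the title
    print index($1, "_")        # position of the first underscore
    n = split($1, parts, "_")   # break the title into pieces at each underscore
    print n, parts[n]           # number of pieces and the last piece
}')
printf '%s\n' "$out"
```

Positions in AWK strings and arrays created by split start at 1, not 0.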

Output the row with the highest price

awk '$3>max {max = $3; maxline=$0} END {print max,maxline}' awk_books.log

Output result:

$ awk '$3>max {max = $3; maxline=$0} END {print max,maxline}' awk_books.log
55.00 vue_tutorial javascript_team 55.00 8

Here max and maxline are user-defined variables. AWK variables do not need to be declared in advance; just use them. An unset variable behaves as an empty string or as 0, depending on the context.
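The implicit 0 is convenient for prices, but it is worth knowing where it stops working. The sketch below uses made-up rows with negative amounts (the refund_ names are hypothetical): the first variant relies on the default and finds nothing, while the second seeds max from the first line.

```shell
rows=$(printf '%s\n' 'refund_a -5' 'refund_b -2' 'refund_c -9')

# Relying on the implicit 0: no value is ever greater than max, so nothing is stored.
naive=$(printf '%s\n' "$rows" | awk '$2 > max { max = $2; line = $0 } END { print line }')

# Seeding max from the first line makes the comparison independent of the default.
seeded=$(printf '%s\n' "$rows" | awk 'NR == 1 || $2 > max { max = $2; line = $0 } END { print line }')

printf 'naive: [%s]\nseeded: [%s]\n' "$naive" "$seeded"
```

For the book log this does not matter, since every price is positive, but the seeded form is the safer habit when the data may contain zero or negative values.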

Calculate and output the day’s total sales revenue

awk '{total += $3*$4} END { print total}' awk_books.log

Output result:

$ awk '{total += $3*$4} END { print total}' awk_books.log
1380.5

Find all Python titles and sort them by price

awk '$1 ~/python/ {print $0}' awk_books.log|sort -n -k 3 -t ' '

Output result:

$ awk '$1 ~/python/ {print $0}' awk_books.log|sort -n -k 3 -t ' '
python_basics Yamaharu 34.50 5
python_practice Nami 38 3
python_advanced Solon 40.00 6

Find all Python titles, sort them by price, and output the two most expensive ones

awk '$1 ~/python/ {print $0}' awk_books.log|sort -r -n -k 3 -t ' '|head -n 2

Output result:

$ awk '$1 ~/python/ {print $0}' awk_books.log|sort -r -n -k 3 -t ' '|head -n 2
python_advanced Solon 40.00 6
python_practice Nami 38 3

The last two examples use sort and head. If you don’t know these two commands, you can look them up.
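The same division of labour works for values that AWK computes itself. In this sketch, on three made-up rows, AWK puts the revenue (price times copies) in front of each title, and sort and head then pick the two best sellers.

```shell
out=$(printf '%s\n' \
    'python_advanced Solon 40.00 6' \
    'java_programming_ideas Frank 48.00 6' \
    'vue_tutorial javascript_team 55.00 8' |
awk '{ print $3 * $4, $1 }' | sort -rn | head -n 2)
printf '%s\n' "$out"
```

Putting the computed number first means sort needs no -k or -t options: -n compares the leading number and -r reverses the order.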

Well, all this AWK knowledge should be enough for you to cope with your daily work needs. What if it’s not enough? Go buy two books and keep studying!

