Get to know the awk command for Linux

Awk lets you find and process patterns in text. It can get complicated, but the basics are pretty simple and a good place to start learning about Linux commands.

Awk is a very powerful Linux command-line tool. Programs can get quite complex, but it's important to start with the basics.

Awk is an interpreted programming language developed for text processing by Alfred Aho, Peter Weinberger, and Brian Kernighan in 1977. The initials of the three authors' last names make up the program's name. Various forms of awk (nawk, gawk and others) are available in almost every version of Linux, and they are easy to use from the command line. Being a powerful command-line tool also means awk is effective over Secure Shell.

The language is used to process text files. By default, each line of a file is treated as a record, and each record is further broken up into a sequence of fields. An awk program is just a list of pattern-action statements that are applied to the records and fields. Awk reads through the files sequentially, and when a record matches a pattern, it performs the associated action.
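To make the record and field model concrete, here is a minimal sketch that feeds awk two invented lines of text (the sample data and the variable names are assumptions for illustration). It uses awk's built-in variables NR (the current record number) and NF (the number of fields in the current record).

```shell
# Two made-up lines of input; each line becomes one record.
sample='alpha beta gamma
one two'

# No pattern is given, so the action runs for every record.
# NR is the record number, NF the field count, $1 the first field.
fields=$(printf '%s\n' "$sample" |
  awk '{print "record " NR ": " NF " fields, first is " $1}')

printf '%s\n' "$fields"
# record 1: 3 fields, first is alpha
# record 2: 2 fields, first is one
```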

Pattern matching with the awk command

For example, say I want to print out all the lines in a file that match a certain string.

First I need a text file. You could use the ls command with a couple of options to get a listing of your local directory to use as a file. The following command line runs ls with the -l (long listing) and -h (print file sizes in kilobytes or megabytes) options and redirects the output to a text file named rob-list.txt.

ls -lh > rob-list.txt

Here's what the resulting rob-list.txt looks like.

total 40K

-rw-rw-r-- 1 rob rob 23K Jul 12 15:29 awk-basics.odt

-rw-rw-r-- 1 rob rob 110 Jul  7 12:52 rob2.data

-rw-rw-r-- 1 rob rob 220 Jul 12 16:26 rob3.data

-rwxrwxrwx 1 rob rob  59 Jul 12 16:28 rob.awk

-rw-rw-r-- 1 rob rob 220 Jun 27 10:55 rob.data

-rw-rw-r-- 1 rob rob   0 Jul 12 16:57 rob-list.txt

An awk command line to find "220" might look like this.

awk '/220/ {print $0}' rob-list.txt

And here's the result.

-rw-rw-r-- 1 rob rob 220 Jul 12 16:26 rob3.data

-rw-rw-r-- 1 rob rob 220 Jun 27 10:55 rob.data

In this case, the text between the slashes is matched against each line as awk moves from the beginning to the end of the file. The $0 field represents the entire line, but you can also print only certain fields from each line.

Suppose you want to print only the file size and file name from each line in the file. The file size in the text file is field number 5, and the file name is field 9. White space is the default field separator. This awk command line shows only the file size and file name:

awk '{print $5 " " $9}' rob-list.txt

And this is the result:

23K awk-basics.odt

110 rob2.data

220 rob3.data

59 rob.awk

220 rob.data

0 rob-list.txt

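If you looked at the text on the actual command line on your computer screen, there'd be a nearly blank line at the beginning of the printout. It's there because awk prints fields 5 and 9 for every line, and the first line of the file (total 40K) has only two fields, so both come out empty and only the separating space is printed.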
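One way to avoid that leading blank line is to add a pattern that skips records with too few fields. The sketch below is only an illustration: it uses a shortened copy of the listing held in a shell variable (an assumption made so the example is self-contained) and the built-in NF variable, which holds the number of fields in the current record.

```shell
# A shortened copy of the listing, kept in a variable for illustration.
listing='total 40K
-rw-rw-r-- 1 rob rob 23K Jul 12 15:29 awk-basics.odt
-rwxrwxrwx 1 rob rob  59 Jul 12 16:28 rob.awk'

# NF >= 9 is the pattern: the action runs only on lines with at least
# nine fields, so the two-field "total 40K" line is skipped.
sizes=$(printf '%s\n' "$listing" | awk 'NF >= 9 {print $5 " " $9}')

printf '%s\n' "$sizes"
# 23K awk-basics.odt
# 59 rob.awk
```

Against the real file, the same pattern would be used as awk 'NF >= 9 {print $5 " " $9}' rob-list.txt.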

Possible combinations

You could then combine matching with printing certain fields. For example, if you just want to find the lines that contain the string "220" and then print out the file size and name, this line will do the trick:

awk '/220/ {print $5 " " $9}' rob-list.txt

The output looks like the following:

220 rob3.data

220 rob.data

Be aware that when you use the matching pattern '/target string/', awk matches the string anywhere in each line of the text file, not just in the fields you print. It's conceivable that you might have a 220 on the line somewhere other than in one of your printed fields. The matched line still shows up, even though in this case the file size (field 5) may not be 220.
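If you only care about lines whose file size is exactly 220, one option is to compare field 5 directly instead of matching against the whole line. The following sketch uses two invented listing lines (the file name notes-220.txt is hypothetical, chosen so that "220" appears outside the size field) to show the difference.

```shell
# Two made-up listing lines; the second has "220" only in its file name.
listing='-rw-rw-r-- 1 rob rob 220 Jul 12 16:26 rob3.data
-rw-rw-r-- 1 rob rob 110 Jul  7 12:52 notes-220.txt'

# /220/ matches anywhere in the line, so both lines are printed.
anywhere=$(printf '%s\n' "$listing" | awk '/220/ {print $5 " " $9}')

# $5 == "220" tests only the file size field, so one line is printed.
exact=$(printf '%s\n' "$listing" | awk '$5 == "220" {print $5 " " $9}')

printf 'anywhere:\n%s\n' "$anywhere"
printf 'exact:\n%s\n' "$exact"
```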

Obviously, these examples use very small text files. Awk can handle much larger files, with tens or even hundreds of thousands of lines of text. Awk simply starts at the beginning and methodically steps through the file until it reaches the end, matching patterns and printing text as it goes.

You can also get a bit more advanced and look for patterns in the fields themselves. For example, maybe you want to search with some conditions and print out just those lines:

awk '(index($9, "rob") != 0) && (index($9, "awk") !=0) {print $5" "$9}' rob-list.txt

The result would be:

59 rob.awk

Here the index function looks for "rob" and "awk" in field 9, the file name, and returns 0 when the string isn't found. In this case I used the && (and) operator to print out only the line(s) containing both. You could also use the || operator to do an "or" comparison.
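As a rough sketch of the "or" form, the example below prints lines whose file name contains either "odt" or "awk". The search strings and the shortened listing in the shell variable are chosen purely for illustration.

```shell
# A shortened copy of the listing, kept in a variable for illustration.
listing='-rw-rw-r-- 1 rob rob 23K Jul 12 15:29 awk-basics.odt
-rw-rw-r-- 1 rob rob 110 Jul  7 12:52 rob2.data
-rwxrwxrwx 1 rob rob  59 Jul 12 16:28 rob.awk'

# index() returns 0 when the substring is absent from field 9.
# With ||, a line is printed when either test succeeds.
either=$(printf '%s\n' "$listing" |
  awk '(index($9, "odt") != 0) || (index($9, "awk") != 0) {print $5 " " $9}')

printf '%s\n' "$either"
# 23K awk-basics.odt
# 59 rob.awk
```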

You could also substitute in data when you find certain strings. With a simple change to the last awk command, you could substitute "I found it" into the output when "rob" and "awk" are found together on a line. For example:

awk '(index($9, "rob") != 0) && (index($9, "awk") !=0) {print $5" I found it"}' rob-list.txt

This is the output:

59 I found it

Putting awk to work

I've used awk in some Internet of Things projects to move data between platforms and applications. For example, I have taken the data from a DS18B20 digital temperature sensor, run it through Arduino and output the text data via XBee radios to a Linux notebook. The data was simply pulled into the notebook by using the cat command to read the USB port and redirecting the output to a text file.

cat /dev/ttyUSB0 > rob.data

Here's some of the data right out of the Arduino:

a001|83.52|a002|92.11

a001|83.52|a002|92.31

a001|83.52|a002|94.36

a001|83.52|a002|93.92

a001|83.64|a002|93.50

a001|83.64|a002|93.12

a001|83.64|a002|92.91

a001|83.64|a002|92.85

a001|83.52|a002|92.43

a001|83.64|a002|92.17

This data represents temperature readings from two different sensors, "a001" and "a002," that are polled about once per second. I used awk to separate the readings and put them in a form that could be read into the kst graphing program on a Linux notebook.

I used the following awk command line to prep the data for the kst graphing program.

awk -F "|" '{print $3","$4}' < rob.data > rob2.data

The output is now just X and Y coordinates, separated by a comma, that feed into kst.

a002,92.11

a002,92.31

a002,94.36

a002,93.92

a002,93.50

a002,93.12

a002,92.91

a002,92.85

a002,92.43

a002,92.17

Notice that you can subtly change the format of the output simply by adding characters, spaces and so on. Awk is ideal for data format conversion jobs.
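Two other standard ways to control the output format are the OFS variable and the printf statement. The sketch below reuses two of the sensor lines in a shell variable; the tab-separated, one-decimal layout is an invented target format for illustration, not something kst requires.

```shell
# Two of the sensor readings, kept in a variable for illustration.
readings='a001|83.52|a002|92.11
a001|83.64|a002|93.50'

# -F sets the input field separator. OFS is the separator that print
# places between comma-separated arguments.
csv=$(printf '%s\n' "$readings" |
  awk -F '|' 'BEGIN {OFS = ","} {print $3, $4}')

# printf gives finer control: here a tab separator and a reading
# rounded to one decimal place.
tabbed=$(printf '%s\n' "$readings" |
  awk -F '|' '{printf "%s\t%.1f\n", $3, $4}')

printf '%s\n' "$csv"
printf '%s\n' "$tabbed"
```

Setting OFS once in a BEGIN block keeps the print statement readable when many fields are involved, while printf is the better fit when numbers need a fixed width or precision.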

Next Steps

There's a vast treasure trove of knowledge and information on the Web about awk; these are just a few basic examples. Check out the gawk user's guide, the awk man page and an awk tutorial. You can even get a version of awk for Windows. The Web is also full of sites dedicated to awk one-liners. Try tuxgraphics.org. Another good one is awk.info. Catonmat.net has a nice list of one-liners dealing with line spacing and numbering.

This was last published in October 2015

Join the conversation

What tasks do you use awk for?
Ah, awk! Those were the days.
I think a lot of people moved to using Perl a long time back.

There was a brief flurry of Tcl, but Perl seemed to attract a lot of attention and wasn't bad to use either, as well as having a lot of utilities, e.g., to obfuscate your release code.

Fun though.