
#1 2012-02-09 17:44:40

johnraff
#!Drunkard
From: Nagoya, Japan
Registered: 2009-01-07
Posts: 2,706

Getting started with awk.

Well, I'm not an awk guru - what I know is just a tiny bit of what you can do with awk, but even that tiny bit can come in quite useful. Awk's been a good friend of late. There are plenty of tutorials available already (links below) but awk can seem a bit off-putting at first and I'll try here to get round a couple of the stumbling blocks and show some of the handy things you can do easily.

Why awk? You can replace a pipeline of 'stuff | grep | sed | cut...' with a single call to awk. For a simple script, most of the time lag is in loading these programs into memory, so it's much faster to do it all with one. This is ideal for something like an Openbox pipe menu where you want to generate something on the fly. You can use awk to make a neat one-liner for some quick job in the terminal, or build an awk section into a shell script. Read up on the docs and you'll be able to do some quite fancy stuff. cool
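
As a rough illustration (not a line from any real script), these two should print the same list of users whose login shell is bash - the awk version is explained piece by piece below:

grep '/bin/bash' /etc/passwd | cut -d: -f1
awk -F: '/\/bin\/bash/ {print $1}' /etc/passwd     # -F: sets the field separator to a colon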

What awk seems to be best for is taking a file full of data and going through it systematically, pulling something useful out of it. The basic idea is that the file is divided into a number of records, each consisting of several fields, for example an address book where the records might be people and the fields phone number, address, email... Usually the records are divided by line breaks and the fields are space separated, but both these can be changed with interesting results.
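
A throwaway example of records and fields (one record per line, fields separated by spaces):

echo "John 123-4567 john@example.com" | awk '{print $3}'     # prints the third field: john@example.com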

The way to call awk is

awk 'do awk stuff' /path/to/file
# or
process | awk 'do stuff'

You can either give awk a file to work on or pipe it the output of some command. It's also possible to import variables from your bash script (see below). Getting output is a bit more limited: it's basically 'print' to standard output, producing the same kind of lines-and-fields data as the input. It's not easy to output multiple variables or arrays, though it is possible to run system commands from inside an awk script (which I haven't tried yet).
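
For instance, to pull a couple of columns out of a command's output (the column numbers depend on your version of df, so treat this as a sketch):

df -h | awk '{print $5 "  " $6}'     # just the Use% and mount-point columns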

Quoting:
We have to wrap awk's commands in single quotes to keep the shell from trying to do things with them. If you need a single quote inside the awk script, you can't just escape it with a backslash, because inside single quotes the shell attaches no special meaning to the backslash and will think the awk script has ended at that point. You have to exit the quotes, then escape your single quote, then re-enter awkland, like this:

'...awk stuff'\''more awk stuff...'
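
It looks odd, but it works:

echo | awk '{print "it'\''s working"}'     # prints: it's working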

Strings are enclosed in double quotes. Unlike in bash, a variable inside double quotes is not expanded: {print "$1"} will output the literal text $1, not the value of the first field.
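
A quick check in the terminal:

echo hello | awk '{print "$1", $1}'     # prints: $1 hello  - the quoted "$1" comes out literally, the bare $1 is the field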

Variables:
Variables inside the single quote section are in awk's world and are unconnected with any variables you might have created in your bash part of the script. $1, $2... stand for the first, second... field of the current record, not bash's $1 etc. $0 stands for the complete record (or line). You can make your own variables with any name you want. To call a variable other than the fields $0, $1 etc. you don't need the $:

john@raffles3:~$ echo|awk '{a="test";print a}'
test
john@raffles3:~$ echo|awk '{a="test";b="phrase";print a b}'
testphrase
john@raffles3:~$ echo|awk '{a="test";b="phrase";print a " " b}' # you need quotes round the space
test phrase
john@raffles3:~$ echo|awk '{a="test";b="phrase";print "\"" a " " b "\""}' # escape literal double quotes with backslash
"test phrase"
john@raffles3:~$ echo|awk '{a="test";b="phrase";print"\""a" "b"\""}' # spaces are less important than in bash
"test phrase"

There are also some useful builtin variables like FS, RS and NF. If you want to import variables from your bash code, the safest way is to use the -v option when calling awk.

awk -v awk_var="$var" -v other_awk_var="$othervar" 'do stuff'

You could give them the same names as your bash variables but it might get confusing.
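
A made-up example, reusing the names from above:

var="world"
awk -v awk_var="$var" 'BEGIN {print "hello " awk_var}'     # prints: hello world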


The basic command is:
condition {action;other action}
Separate actions with semicolons or line breaks. The condition is often a match of the record ($0) against a regular expression /regex/. The condition is applied to each record in turn and, if it is satisfied, the actions are carried out.

You can often omit things and use the defaults - try these in a terminal:

awk '/\/bin\/bash/' /etc/passwd

Each record (line by default) is checked against the regex /bin/bash, and the ones that match are printed out. That expression is the whole awk script!
(The forward slashes in the expression have to be escaped with backslashes.)

awk '$1 ~ /daemon/' /etc/passwd

The default is to match against the whole line, but here the ~ operator matches the regex against just the first field, $1.

awk 'BEGIN {FS=":"}{print $1}' /etc/passwd

If there's no action defined, the default is to print the whole record, ie {print $0}.
If there's no condition defined, the default is to do the action for every record.
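
Putting a condition and an action together (on most Linux systems ordinary users' UIDs start at 1000, but that's an assumption - check your own /etc/passwd):

awk 'BEGIN {FS=":"} $3 >= 1000 {print $1}' /etc/passwd     # login names of ordinary (non-system) users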

BEGIN and END are special patterns to run commands before going through the records, and afterwards. BEGIN {FS=":"} sets the Field Separator to a colon instead of the default space. BEGIN {RS="."} will make the Record Separator a full stop instead of the default linebreak, so now you'll be looking at each sentence of the file in turn. FS and RS can also be Regular Expressions!
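
END is handy for things like totals - NR is another builtin variable holding the number of the current record, so by the time END runs it's the total count:

awk 'END {print NR " lines"}' /etc/passwd     # prints the number of lines in the file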

Another function you'll use often is sub or gsub to do string substitution, like sed 's/pattern/replacement/[g]',
so:

gsub(/pattern/,"replacement",variable_to_modify)

The variable is modified directly - default is $0. (Awk's full of defaults.)
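
For example:

echo "foo bar foo" | awk '{sub(/foo/,"baz"); print}'      # prints: baz bar foo  (sub replaces only the first match)
echo "foo bar foo" | awk '{gsub(/foo/,"baz"); print}'     # prints: baz bar baz  (gsub replaces them all)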

I find it's easiest to gradually build up your script in a terminal till you get what you want. Try these, in turn:

curl -s crunchbanglinux.org | awk 'BEGIN {RS="<"} {print"TAG: " $0}'
curl -s crunchbanglinux.org | awk 'BEGIN {RS="<"} {gsub(/[ \t\n]+/," ");print"TAG: " $0}' # remove linebreaks and squash spaces
curl -s crunchbanglinux.org | awk 'BEGIN {RS="<"} /^a / {gsub(/[ \t\n]+/," ");print"LINK: " $0}'
curl -s crunchbanglinux.org | awk 'BEGIN {RS="<"} /^a / {gsub(/[ \t\n]+/," ");sub(/a .*href="/,"");print"LINK: " $0}'
curl -s crunchbanglinux.org | awk 'BEGIN {RS="<"} /^a / {gsub(/[ \t\n]+/," ");sub(/a .*href="/,"");sub(/\".*$/,"");print"LINK: " $0}'
curl -s crunchbanglinux.org | awk 'BEGIN {RS="<"} /^a / {gsub(/[ \t\n]+/," ");sub(/a .*href="/,"");sub(/\".*$/,"");if(/crunchbang/)next;print"EXTERNAL LINK: " $0}'

Have fun!


OK now here's where to read this stuff properly explained. roll
Two thorough tutorials:
http://www.gnu.org/software/gawk/manual/gawk.html
http://www.grymoire.com/Unix/Awk.html
A famous list of useful one-liners - though they're short, many are quite tricky:
http://www.pement.org/awk/awk1line.txt
And some nice explanations of those one-liners. After reading this you'll have a pretty good grasp!
http://www.catonmat.net/blog/awk-one-li … -part-one/
http://www.catonmat.net/blog/ten-awk-ti … -pitfalls/


A couple more notes:

Gawk vs Mawk:
The default version in Debian and Ubuntu is mawk, which is a smaller, faster implementation, but gawk is available in the repositories if you want it. There are a few extra things that gawk can do, but mawk's fine most of the time.

Regular expressions:
Mawk will handle most extended regular expressions, but two things I've hit so far that you can't use are POSIX character classes like [:digit:] and backreferences like \1.
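
If you do install gawk, note that the classes go inside a bracket expression, so it's [[:digit:]] in practice - something like this should work:

echo "abc123def" | gawk '{gsub(/[[:digit:]]+/,"#"); print}'     # gawk understands [[:digit:]] - prints abc#def
echo "abc123def" | awk '{gsub(/[0-9]+/,"#"); print}'            # a plain range does the same job in mawk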

btw regular expressions are pretty much essential for getting useful stuff done here, but they're so generally useful anyway that if you're not yet up to speed I'd recommend getting a grip on at least the basics. Just google, but here are some links I liked:
http://gnosis.cx/publish/programming/re … sions.html
http://www.grymoire.com/Unix/Regular.html
http://www.thegeekstuff.com/2011/01/reg … p-command/
http://www.thegeekstuff.com/2011/01/adv … 3-part-ii/

...phew... Major respect to the people who've written all the HOW TOs to date! It's not as easy as it might look.

Last edited by johnraff (2012-02-11 16:34:48)


John
--------------------
( a boring Japan blog , idle twitterings  and GitStuff )
#! forum moderator


Be excellent to each other!

#2 2012-02-09 18:38:43

VastOne
#! Ranger
From: #! Fringe Division
Registered: 2011-04-26
Posts: 10,163

Re: Getting started with awk.

Nice How To johnraff!  I have added it to the Quick Reference page.

awk indeed is a great tool, thanks for this!


VSIDO | SolusOS

Words That Build Or Destroy


#3 2012-02-09 23:14:08

rhowaldt
#!*$%:)
Registered: 2011-03-09
Posts: 4,396

Re: Getting started with awk.

really nice johnraff, thanks for that! will read up on awk some more again. i know it is a powerful tool but i have just never been able to really 'get into it'. strange thing is that sed looks way stranger and i get that a lot better...


#4 2012-02-10 17:36:17

johnraff
#!Drunkard
From: Nagoya, Japan
Registered: 2009-01-07
Posts: 2,706

Re: Getting started with awk.

^Hmm, interesting - I found the opposite: apart from the popular s/foo/bar thing I couldn't get a feel for sed at all, and you have to use so many backslashes all the time that it's really hard to read. Awk just seemed friendlier somehow.

PS of course any suggestions, additions and corrections will (probably) be incorporated into the top post. smile

Last edited by johnraff (2012-02-10 17:48:26)


John
--------------------
( a boring Japan blog , idle twitterings  and GitStuff )
#! forum moderator


#5 2012-02-10 17:46:03

rhowaldt
#!*$%:)
Registered: 2011-03-09
Posts: 4,396

Re: Getting started with awk.

^ well, my affinity with sed comes from my detect_syllables script. i use sed for the entire process of extracting the syllables, so i figured lots of stuff out while doing that. on the other hand, i know there are functions of sed which i can't quite wrap my head around. but having some knowledge helps with gathering even more knowledge. which is why i'm gonna read your awk-how-to thoroughly smile

