
#1 2015-01-11 17:10:38

iMBeCil
WAAAT?
From: Edrychwch o'ch cwmpas
Registered: 2012-03-22
Posts: 1,026
Website

How To: scripting: bash internals vs. sed | cut | awk

1) Introduction
2) Examples
2.1) 'sed+cut' way
2.2) 'bash-only'-way
3) Benchmarks
3.1) Test script
3.2) Benchmark results
4) Final words

1. Introduction
What?
Bash scripts often need to extract information from a string variable (or from command output) such as:

variable="   something   somethingelse XXXX-YYYY-ZZ  0xab2345   whatever"
#         ^^^ note spaces at the beginning of the string

I needed this myself, and while searching the internet for solutions I noticed that most of them rely heavily on some combination of 'sed', 'cut' and/or 'awk'. Basically, one 'precooks' the string with 'sed', and then uses 'cut' or 'awk' to extract the required part of the string.

Why?
Being a somewhat stubborn (I don't want to use python/ruby/perl...) purist (why call other tools like 'sed' when 'bash' itself might be powerful enough?), I stumbled across the so-called 'surprising number of string manipulation operations' and 'arrays', both internal to bash, and discovered that most of this work can be done with them in place of 'sed', 'cut' and 'awk'. Since most Google searches point to the sed/cut/awk tools, I thought it might be good to promote the bash-internals approach via a few examples.

How?
It can be done surprisingly easily and, IMHO, with cleaner syntax than 'sed', 'cut' and 'awk'. There are some drawbacks, most notably that certain one-liners are impossible to construct (because of the way arrays and string manipulation work in bash), but my overall impression is that 'arrays' and 'string manipulation operations' make scripts more readable.
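
For the string shown under 'What?', a minimal bash-only sketch could look like the following. The names 'fields', 'part1' to 'part3' and 'hex' are made up for illustration and are not from the original post; the block assumes bash with here-strings ('<<<').

```bash
variable="   something   somethingelse XXXX-YYYY-ZZ  0xab2345   whatever"

# split on whitespace into an array; leading and repeated spaces are ignored
read -r -a fields <<< "$variable"

# third word, split once more on '-' (IFS is changed for this 'read' only)
IFS=- read -r part1 part2 part3 <<< "${fields[2]}"

# fourth word with the leading '0x' stripped
hex=${fields[3]#0x}

echo "${fields[2]} -> $part1 $part2 $part3"
echo "hex digits: $hex"
```

No external program is started here; 'read' and the '${var#pattern}' expansion are both part of bash.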

Stay tuned for next posts with Examples ...

Last edited by iMBeCil (2015-01-12 21:22:41)


Postpone all your duties; if you die, you won't have to do them ..
--> The very new BL forum! <--


#2 2015-01-11 17:25:36

iMBeCil

Re: How To: scripting: bash internals vs. sed | cut | awk

2. Examples
OK, here is a very simple example. I want to extract all four numbers into separate variables from a string like:

1680x1050+2880+23

Of course, you will recognize this as a typical 'geometry' string. You can get a lot of those from 'xrandr', for example.

2.1 'sed+cut'-way
Here is a simple script that does it via 'sed':

# example-SED
# define string
xrandroutput="1680x1050+2880+23"

# define TAB ('\t') character; needed for 'sed'
# and convenient for 'cut'
TAB=$(echo -e "\t")

# replace 'x' with '\t'
array=`echo "$xrandroutput" | sed "s/x/$TAB/"`

# replace '+' with '\t'
array=`echo -e "$array" | sed "s/+/$TAB/g"`

# store values (the geometry string is WIDTHxHEIGHT+X+Y,
# so the first field is the width)
W=`echo -e "$array" | cut -f 1`
H=`echo -e "$array" | cut -f 2`
X=`echo -e "$array" | cut -f 3`
Y=`echo -e "$array" | cut -f 4`

A hacker will shout: 'why two seds?' And they will be right; it can be a little shorter:

# example-SED-SINGLE
# define string
xrandroutput="1680x1050+2880+23"

# define TAB ('\t') character; needed for 'sed'
# and convenient for 'cut'
TAB=$(echo -e "\t")

# replace 'x' and '+' with '\t'
array=`echo "$xrandroutput" | sed "s/[x+]/$TAB/g"`

# store values (the geometry string is WIDTHxHEIGHT+X+Y,
# so the first field is the width)
W=`echo -e "$array" | cut -f 1`
H=`echo -e "$array" | cut -f 2`
X=`echo -e "$array" | cut -f 3`
Y=`echo -e "$array" | cut -f 4`

That's it ... that is, more or less, how I have seen people do it. It can probably be optimized further, but this is the gist of it.
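
Since 'awk' is mentioned in the title but not shown, here is a minimal sketch of how the same split might look with a single 'awk' call. It is not from the original post and was not part of the benchmarks; it assumes bash (for process substitution and here-strings) and an 'awk' that accepts a bracket expression as field separator. The geometry string is WIDTHxHEIGHT+X+Y, so the first field goes into W.

```bash
xrandroutput="1680x1050+2880+23"

# one awk process splits on 'x' or '+'; 'read' stores the four fields
read -r W H X Y < <(awk -F'[x+]' '{ print $1, $2, $3, $4 }' <<< "$xrandroutput")

echo "W=$W H=$H X=$X Y=$Y"
```

This still starts one external program, but only one, instead of one 'sed' plus four 'cut' calls.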

Last edited by iMBeCil (2015-01-12 22:46:59)



#3 2015-01-11 17:37:19

iMBeCil

Re: How To: scripting: bash internals vs. sed | cut | awk

2.2 'bash-only'-way
Here is the promised example using bash internal commands only:

# example-ARRAY
# define string
xrandroutput="1680x1050+2880+23"

# replace all 'x' with space " ", using powerful
# bash internal string search-replace pattern:
# ${string//substring/replacement},
# and store result in variable called 'array' 
# (not yet of array type)
array=${xrandroutput//x/" "}      # "1680 1050+2880+23"

# replace all '+' with space " ", and store result as
# an array in variable called 'array' using '(' and ')'
# note: parentheses will honor space as delimiter
# and make an array of values
array=( ${array//+/" "} )     # ( "1680" "1050" "2880" "23" )

# print data
echo "array[0] = ${array[0]}"
echo "array[1] = ${array[1]}"
echo "array[2] = ${array[2]}"
echo "array[3] = ${array[3]}"

Of course, the two replacement patterns can be combined into a single one:

# example-ARRAY-SINGLE
...
# replace all 'x' and '+' with space " "
# and store result in array variable called 'array'
array=( ${xrandroutput//[x+]/" "} )
...

Isn't it simpler and cleaner? In addition, all the data ends up in a single (array) variable, which is sometimes convenient, for example to reduce namespace clutter.
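
One caveat with 'array=( ${...} )': the expansion is unquoted, so the result goes through word splitting and pathname expansion, and a stray '*' in the input would be replaced by file names. A possible alternative, sketched here as a suggestion rather than as the method used above, is to let 'read' do the splitting with a temporary IFS. The array name 'geometry' is made up for this example.

```bash
xrandroutput="1680x1050+2880+23"

# IFS is changed only for this one 'read'; the string is split on 'x' and '+'
IFS='x+' read -r W H X Y <<< "$xrandroutput"

# same idea, but into an array instead of four named variables
IFS='x+' read -r -a geometry <<< "$xrandroutput"

echo "W=$W H=$H X=$X Y=$Y"
echo "array: ${geometry[*]}"
```

Both forms stay inside bash, so they should keep the speed advantage of the array example while avoiding the unquoted expansion.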

OK, writing this made me very hungry ... going to eat and drink a beer, and then I will do some benchmarking.

Last edited by iMBeCil (2015-01-12 22:47:19)



#4 2015-01-11 19:32:20

iMBeCil

Re: How To: scripting: bash internals vs. sed | cut | awk

3. Benchmarks
3.1 Test script
Is it worth using arrays and string manipulation with internal bash commands, or is it just one more way of doing things? One way to find out is to run some benchmarks and see how fast each solution is. First, I will explain how I did the benchmarks. Below is a skeleton of the benchmark script 'test'. The idea is to run the above examples a (large) number of times and time the execution.

#!/bin/bash
# 
# Usage: test [ITER]

iter=10000
# use the number of iterations given on the command line, if any
if [ -n "$1" ]
then    
    iter="$1"
fi

TAB=$(echo -e "\t")

# iterate
for i in `seq $iter`
do
    #
    # do stuff
    #
    :   # no-op placeholder; a loop body with only comments is a syntax error
done

and then we run this script from the command line as:

$ /usr/bin/time -f "\nReal: %E\nUser: %U\nSys: %S" ./test 100000

Inside the 'for' loop we put code like the following (I removed most of the comments to keep it short):

# example-SED
xrandroutput="1680x1050+2880+23"
array=`echo "$xrandroutput" | sed "s/x/$TAB/"`
array=`echo -e "$array" | sed "s/+/$TAB/g"`
W=`echo -e "$array" | cut -f 1`

or

# example-ARRAY
xrandroutput="1680x1050+2880+23"
array=${xrandroutput//x/" "}      # "1680 1050+2880+23"
array=( ${array//+/" "} )     # ( "1680" "1050" "2880" "23" )

Note that 'example-SED' extracts only the first value (into variable 'W'), while in 'example-ARRAY' all four values end up in the array 'array', accessible via the ${array[i]} syntax.
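
As an alternative to '/usr/bin/time', both variants can be timed from inside one script with the bash 'time' keyword. The following is only a sketch: the function names 'run_sed' and 'run_array', the variables 'last_sed' and 'last_array', and the small default of 200 iterations are assumptions made for this example, not part of the original test script.

```bash
#!/bin/bash
# Sketch: time both variants inside one script with the 'time' keyword.
iter=${1:-200}          # hypothetical default, kept small on purpose
TAB=$'\t'
xrandroutput="1680x1050+2880+23"

run_sed() {
    local i array W
    for ((i = 0; i < iter; i++)); do
        array=$(echo "$xrandroutput" | sed "s/x/$TAB/")
        array=$(echo -e "$array" | sed "s/+/$TAB/g")
        W=$(echo -e "$array" | cut -f 1)
    done
    last_sed=$W         # keep the last result for a sanity check
}

run_array() {
    local i array
    for ((i = 0; i < iter; i++)); do
        array=${xrandroutput//x/" "}
        array=( ${array//+/" "} )
    done
    last_array=${array[0]}
}

# TIMEFORMAT controls what 'time' prints (on stderr)
TIMEFORMAT='Real: %R  User: %U  Sys: %S'
echo "example-SED:";   time run_sed
echo "example-ARRAY:"; time run_array
echo "sanity check: $last_sed / $last_array"
```

The absolute numbers will depend on the machine, so no expected timings are given here.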



#5 2015-01-11 19:45:18

iMBeCil

Re: How To: scripting: bash internals vs. sed | cut | awk

3.2 Benchmark results
To get meaningful results, I ran every benchmark several times. (I could have done it more systematically, with proper statistics, but the difference is so large that it is not necessary.) Furthermore, I tried to choose the number of iterations so that each script ran for about 10 seconds, and normalized the results afterwards.

The results are:

example-SED:   1000 iterations = 2.17 secs
example-ARRAY: 1000 iterations = 0.011 secs

Yes, this is a factor of almost 200 (2.17 / 0.011 ≈ 197) in favor of example-ARRAY!

Therefore, using internal bash commands is much faster than calling 'sed'!

Is it surprising? Well, I did expect some increase, but not by a factor of 200 ... Actually, I invite more knowledgeable people here to make 'example-SED' faster. Perhaps it can be programmed significantly better than I did.
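
One possible way to make the 'sed' variant cheaper is to call 'sed' once and let 'read' do the rest, so that no 'cut' and no second 'sed' is needed. This is a sketch only; it was not part of the measurements above, so no timing is claimed for it.

```bash
xrandroutput="1680x1050+2880+23"

# a single sed call turns 'x' and '+' into spaces; 'read' does the splitting
read -r W H X Y <<< "$(sed 's/[x+]/ /g' <<< "$xrandroutput")"

echo "W=$W H=$H X=$X Y=$Y"
```

It still starts one external process per string, so it should not be expected to match the bash-only version.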

Last edited by iMBeCil (2015-01-11 19:50:10)



#6 2015-01-11 19:59:47

iMBeCil

Re: How To: scripting: bash internals vs. sed | cut | awk

4. Final words
So is it worth using bash internals? Let me try to summarize:
PROS (for bash internals):
- it is significantly faster
- code is cleaner (IMHO), with less namespace clutter
- it seems to be easier

CONS (against):
- it is strongly bash-dependent
- although arrays and string manipulation commands should be available in modern bash (above version 3), there might be compatibility problems with older bash versions (but only really old ones)
- can't do certain one-liners (and impress friends) that are otherwise easily built via piping.
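
To soften the first two points, a script can check at startup that it really runs under bash. A minimal sketch follows; the version threshold of 3 is simply taken from the estimate in the list above, and the variable 'bash_ok' is made up for the example.

```bash
# refuse to run under a shell that is not bash (e.g. when started as 'sh script')
if [ -z "${BASH_VERSION:-}" ]; then
    echo "this script needs bash" >&2
    exit 1
fi

# BASH_VERSINFO[0] holds the major version number of the running bash
if (( BASH_VERSINFO[0] < 3 )); then
    echo "bash ${BASH_VERSION} is too old for this script" >&2
    exit 1
fi

bash_ok=yes
echo "running under bash ${BASH_VERSION}"
```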

And, as final words:
a) I hope someone will find use of this TL;DR of mine. I know that some will say 'Oh, I know this', some will say 'What on earth is he talking about', but I hope there will be someone who will learn something from it.
b) Sorry for TL;DR, couldn't find shorter way to explain it. Sorry for awkward EngRish.
c) Do not hesitate to make a fool of me if I did something wrong and/or stupid above.

The End.

Last edited by iMBeCil (2015-01-11 20:03:16)



#7 2015-01-11 20:05:42

damo
#! gimpbanger
From: N51.5 W002.8 (mostly)
Registered: 2011-11-24
Posts: 5,434

Re: How To: scripting: bash internals vs. sed | cut | awk

THANK YOU - it has given me a lot of food for thought



#8 2015-01-11 21:28:14

iMBeCil

Re: How To: scripting: bash internals vs. sed | cut | awk

damo wrote:

THANK YOU - it has given me a lot of food for thought

Thanks a lot, damo.



#9 2015-01-12 02:48:43

machinebacon
#! unstable
From: China
Registered: 2009-07-02
Posts: 6,826
Website

Re: How To: scripting: bash internals vs. sed | cut | awk

Imbecil, that's a great collection, thanks a lot. Gives some food for thought indeed.



#10 2015-01-12 06:21:36

brontosaurusrex
#! Red Menace
Registered: 2012-06-15
Posts: 1,643

Re: How To: scripting: bash internals vs. sed | cut | awk

subscribed.


#11 2015-01-12 06:41:35

aiBo
#! CrunchBanger
Registered: 2010-11-08
Posts: 243

Re: How To: scripting: bash internals vs. sed | cut | awk

Great comparison and benchmarking.

However, there is a minor issue with your combined replacement line in your bash-only example.
As character classes in the matching pattern behave like in regex or globbing, your character class doesn't need the "," as a separator. Actually, it is currently matching 'x' or ',' or '+'. So in your scenario the pattern just needs to be "[x+]".

So the line should look like this: (The space character in the replacement part doesn't need to be quoted.)

array=( ${xrandroutput//[x+]/ } )
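
The difference between the two character classes only shows up when the input contains a comma. A small sketch with a made-up input string (the names 'sample', 'with_comma' and 'without' are hypothetical):

```bash
sample="10,5x20+30"                  # hypothetical input that contains a comma

with_comma=( ${sample//[x,+]/ } )    # splits on 'x', ',' and '+'
without=( ${sample//[x+]/ } )        # splits on 'x' and '+' only

echo "${#with_comma[@]} elements: ${with_comma[*]}"
echo "${#without[@]} elements: ${without[*]}"
```

With '[x,+]' the comma is treated as a delimiter too, so the first value is cut in two; with '[x+]' it stays in one piece.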


#12 2015-01-12 08:19:11

tivasyk
Member
From: kyiv, ukraine
Registered: 2010-08-30
Posts: 26
Website

Re: How To: scripting: bash internals vs. sed | cut | awk

thank you! i've learned something new today



#13 2015-01-12 12:31:28

luc
#! Die Hard
From: Munich, Germany
Registered: 2010-03-21
Posts: 597

Re: How To: scripting: bash internals vs. sed | cut | awk

iMBeCil wrote:

Is it surprising? Well, I did expect some increase, but not by a factor of 200 ...

The thing that makes the bash version significantly faster than the version working with sed etc. is the opening and not opening of pipes and the starting and not starting of new processes. The same holds for any shell script solution where you can reduce the number of calls to external programs (use one sed instead of two, use sed to search and then replace instead of piping grep into sed, ...).
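
The process creation can be made visible with '$BASHPID' (available in bash 4 and later, which is an assumption of this sketch). The variable names are made up for illustration.

```bash
parent_pid=$BASHPID

# command substitution runs in a forked subshell, so it reports another PID
subshell_pid=$(echo "$BASHPID")

# parameter expansion is done by the current shell; no new process is needed
geometry="1680x1050+2880+23"
width=${geometry%%x*}

echo "parent:   $parent_pid"
echo "subshell: $subshell_pid"
echo "width:    $width (computed inside PID $BASHPID)"
```

Every backtick or '$( )' in the sed examples costs at least one such fork, plus one more process for each external tool in the pipe.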

I once wrote a shell function to do some simple benchmarking for shell code, it is implemented for zsh (because that's what I use) but could easily be adapted for bash, I think. It is here.

Last edited by luc (2015-01-12 12:32:16)


#14 2015-01-12 12:58:32

Unia
#! Octo-portal-pussy
From: The Netherlands
Registered: 2010-07-17
Posts: 4,634
Website

Re: How To: scripting: bash internals vs. sed | cut | awk

Very nice explanation and benchmarking!



#15 2015-01-12 16:59:37

tivasyk

Re: How To: scripting: bash internals vs. sed | cut | awk

iMBeCil wrote:

I hope someone will find use of this TL;DR of mine. I know that some will say 'Oh, I know this', some will say 'What on earth is he talking about', but I hope there will be someone who will learn something from it.

in fact i loved it so much i absolutely had to translate it and keep in my blog for future reference (ukrainian). link to the original post & attribution are there of course. please let me know if you'd rather want the translation to not be re-posted separately.



#16 2015-01-12 21:21:47

iMBeCil

Re: How To: scripting: bash internals vs. sed | cut | awk

Thanks all of you for your kind words ... I'm glad I could be of help.

aiBo wrote:

However, there is a minor issue with your combined replacement line in your bash-only example.
As character classes in the matching pattern behave like in regex or globbing, your character class doesn't need the "," as a separator. Actually, it is currently matching 'x' or ',' or '+'. So in your scenario the pattern just needs to be "[x+]".

Thanks aiBo, you are right. Corrected!

luc wrote:

... The thing that makes the bash version significantly faster than the version working with sed etc is the opening and not opening of pipes and the starting and not starting of new processes ...

Exactly my thoughts. And thanks for the link for your benchmark script.

tivasyk wrote:

in fact i loved it so much i absolutely had to translate it and keep in my blog for future reference (ukrainian). link to the original post & attribution are there of course. please let me know if you'd rather want the translation to not be re-posted separately.

By all means, you have my permission. I'm honored by this.

BTW, I changed the title of the topic to better reflect the content.

