#1 2012-01-25 02:17:22

ackernan
#! Junkie
Registered: 2011-01-10
Posts: 403

Curl & grep help

I'm writing a script to get images from a website. I'm using curl piped to grep, but the grep command is ignored. Here's roughly what I'm using:

curl http://www.slothradio.com/covers/?artist=creed&album=my+own+prison | grep "img src"
curl http://api.discogs.com/search?q=creed+my%20own%20prison&btn=&type=all | grep thumb

Why is the grep command not working?

I thought it was because the lines had no line breaks.  I downloaded the page source with wget and each line had a line break.

Last edited by ackernan (2012-01-25 02:21:02)

Offline

#2 2012-01-25 02:39:32

oylenshpeegul
Member
From: Maryland, USA
Registered: 2012-01-01
Posts: 37
Website

Re: Curl & grep help

You need to escape some of those characters in the URLs from your shell. The easiest thing to do is probably to put the whole thing in quotes.

    curl -s 'http://www.slothradio.com/covers/?artist=creed&album=my+own+prison' | grep 'img src'

    curl -s 'http://api.discogs.com/search?q=creed+my%20own%20prison&btn=&type=all' | grep thumb

Also, you might want to add an -s to curl.
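
A small offline sketch of why the quoting matters, assuming a POSIX shell. The example.com URL and the show_args helper are made up for illustration. Left unquoted, the & ends the command and runs it in the background, so only the part of the URL before the & is passed along. In the original commands that also means the pipe to grep attaches to the leftover album=... text rather than to curl. (The -s flag just silences curl's progress meter.)

```shell
#!/bin/sh
# show_args is a hypothetical helper: it prints each argument it receives in brackets.
show_args() { printf '[%s]\n' "$@"; }

# Quoted: the whole URL, & included, arrives as a single argument.
quoted=$(show_args 'http://example.com/covers/?artist=creed&album=my+own+prison')

# Unquoted (run through eval so the shell re-parses the text): the & ends the
# command and backgrounds it, and album=... is treated as a variable assignment.
unquoted=$(eval 'show_args http://example.com/covers/?artist=creed&album=my+own+prison'; wait)

printf 'quoted:   %s\n' "$quoted"
printf 'unquoted: %s\n' "$unquoted"
```

Single quotes are the simplest choice here because nothing inside them is interpreted by the shell.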

Last edited by oylenshpeegul (2012-01-25 02:40:35)

Offline

#3 2012-01-25 02:43:47

ackernan
#! Junkie
Registered: 2011-01-10
Posts: 403

Re: Curl & grep help

oylenshpeegul wrote:

You need to escape some of those characters in the URLs from your shell. The easiest thing to do is probably to put the whole thing in quotes.

    curl -s 'http://www.slothradio.com/covers/?artist=creed&album=my+own+prison' | grep 'img src'

    curl -s 'http://api.discogs.com/search?q=creed+my%20own%20prison&btn=&type=all' | grep thumb

Also, you might want to add an -s to curl.

Neither worked.  Just what I grepped for was highlighted.

Offline

#4 2012-01-25 03:04:00

oylenshpeegul
Member
From: Maryland, USA
Registered: 2012-01-01
Posts: 37
Website

Re: Curl & grep help

Grep is going to print out every line that matches. The lines can be quite long and include lots of junk, but the thing you grepped for is there somewhere.

Your second example returns JSON, rather than HTML...try running it through Python's json.tool before grepping

curl -s 'http://api.discogs.com/search?q=creed+my%20own%20prison&btn=&type=all' | python -mjson.tool | grep thumb
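
Why the pretty-printer helps: grep works line by line, and an API response like this typically arrives as one long line, so a match prints the whole document. A sketch with made-up JSON (the field values are placeholders, not real Discogs output), assuming the interpreter is available as python3, which is the usual name on current systems:

```shell
#!/bin/sh
# Made-up one-line JSON standing in for the API response.
json='{"results":[{"title":"A","thumb":"http://example.com/a.jpeg"},{"title":"B","thumb":"http://example.com/b.jpeg"}]}'

# Without pretty-printing, the only matching "line" is the entire document.
raw_lines=$(printf '%s\n' "$json" | grep -c thumb)

# json.tool puts each key on its own line, so grep returns just the thumb entries.
pretty=$(printf '%s\n' "$json" | python3 -m json.tool | grep thumb)

printf 'matching lines before pretty-printing: %s\n' "$raw_lines"
printf '%s\n' "$pretty"
```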

Offline

#5 2012-01-25 03:07:33

ackernan
#! Junkie
Registered: 2011-01-10
Posts: 403

Re: Curl & grep help

oylenshpeegul wrote:

Grep is going to print out every line that matches. The lines can be quite long and include lots of junk, but the thing you grepped for is there somewhere.

Your second example returns JSON, rather than HTML...try running it through Python's json.tool before grepping

curl -s 'http://api.discogs.com/search?q=creed+my%20own%20prison&btn=&type=all' | python -mjson.tool | grep thumb

That worked, thanks.

Offline

#6 2012-01-25 03:10:02

mrpeachy
20% cooler
From: The Everfree Forest
Registered: 2009-11-08
Posts: 3,460

Re: Curl & grep help

ackernan wrote:

I'm writing a script to get images from a website. I'm using curl piped to grep, but the grep command is ignored. Here's roughly what I'm using:

curl http://www.slothradio.com/covers/?artist=creed&album=my+own+prison | grep "img src"
curl http://api.discogs.com/search?q=creed+my%20own%20prison&btn=&type=all | grep thumb

Why is the grep command not working?

I thought it was because the lines had no line breaks.  I downloaded the page source with wget and each line had a line break.

huh, i always thought you had to put your grep matches in quotes

but the first line is working correctly... it's just that the lines that have "img src" on them are looooooooong

you need to be more specific in your matching

seems my reply was too late :)

Last edited by mrpeachy (2012-01-25 03:13:37)

Offline

#7 2012-01-25 03:22:48

oylenshpeegul
Member
From: Maryland, USA
Registered: 2012-01-01
Posts: 37
Website

Re: Curl & grep help

You could try parsing the HTML with Perl or something, instead of trying to grep out the stuff you want.

$ perl -MLWP::Simple -MHTML::TokeParser -E '$c = get shift; $p = HTML::TokeParser->new(\$c); while ($t = $p->get_tag("img")){say $t->[1]{src}}' 'http://www.slothradio.com/covers/?artist=creed&album=my+own+prison'
https://www.paypal.com/en_US/i/scr/pixel.gif
http://ecx.images-amazon.com/images/I/51Y5ZCMV2QL.jpg
buy-button.gif
http://ecx.images-amazon.com/images/I/41B750QM6BL.jpg
buy-button.gif
http://ecx.images-amazon.com/images/I/51Y5ZCMV2QL.jpg
buy-button.gif
http://ecx.images-amazon.com/images/I/41qZeFFdFdL.jpg
buy-button.gif
http://ecx.images-amazon.com/images/I/515QKW7AVgL.jpg
buy-button.gif
http://ecx.images-amazon.com/images/I/51I8Cqs1luL.jpg
buy-button.gif
http://ecx.images-amazon.com/images/I/41CANTYE0YL.jpg
buy-button.gif
http://www.assoc-amazon.com/s/noscript?tag=coverfinder-20

Offline

#8 2012-01-25 03:30:09

mrpeachy
20% cooler
From: The Everfree Forest
Registered: 2009-11-08
Posts: 3,460

Re: Curl & grep help

this works also (that gawk command is useful for lots of things)

curl 'http://www.slothradio.com/covers/?artist=creed&album=my+own+prison' | grep 'images-amazon.com' | gawk -F'><img src="' -v RS='" width=' 'RT{print $NF}' | grep 'jpg'

i'm not an expert by any means, i just like doing this stuff :)

there are probably 100 other ways to get the same result

no need for 2 greps

curl 'http://www.slothradio.com/covers/?artist=creed&album=my+own+prison' | gawk -F'><img src="' -v RS='" width=' 'RT{print $NF}' | grep 'amazon.com'
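
For anyone puzzling over how that gawk line works, here is a sketch on made-up markup (the HTML and the example.com URLs are assumptions). RS splits the input into records at every `" width=`. RT holds the text that actually ended the record and is empty for the leftover at the end, so the RT pattern skips that piece. With `><img src="` as the field separator, the last field of each record is the URL. RT is a gawk extension, so other awks will not do this.

```shell
#!/bin/sh
# Made-up markup: two linked images on one long line.
html='<a href="/1"><img src="http://example.com/a.jpg" width="75"></a><a href="/2"><img src="http://example.com/b.jpg" width="75"></a>'

if command -v gawk >/dev/null 2>&1; then
    # One record per '" width=' terminator; print the text after the last '><img src="'.
    urls=$(printf '%s\n' "$html" |
        gawk -F'><img src="' -v RS='" width=' 'RT{print $NF}')
    printf '%s\n' "$urls"
else
    echo 'gawk not found; RT is a gawk extension' >&2
fi
```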

Last edited by mrpeachy (2012-01-25 03:35:14)

Offline

#9 2012-01-25 05:06:17

johnraff
nullglob
From: Nagoya, Japan
Registered: 2009-01-07
Posts: 4,148
Website

Re: Curl & grep help

Grep's -o option can be useful sometimes too. That will print out only the matching part of the line. Then you have to make a regular expression that will cover the bit you want. For example:

john@raffles3:~$ curl -s 'http://www.slothradio.com/covers/?artist=creed&album=my+own+prison' | grep -o 'img src="[^"]*"'
img src="http://ecx.images-amazon.com/images/I/51Y5ZCMV2QL.jpg"
img src="buy-button.gif"
img src="http://ecx.images-amazon.com/images/I/41B750QM6BL.jpg"
img src="buy-button.gif"
img src="http://ecx.images-amazon.com/images/I/51Y5ZCMV2QL.jpg"
img src="buy-button.gif"
img src="http://ecx.images-amazon.com/images/I/41qZeFFdFdL.jpg"
img src="buy-button.gif"
img src="http://ecx.images-amazon.com/images/I/515QKW7AVgL.jpg"
img src="buy-button.gif"
img src="http://ecx.images-amazon.com/images/I/51I8Cqs1luL.jpg"
img src="buy-button.gif"
img src="http://ecx.images-amazon.com/images/I/41CANTYE0YL.jpg"
img src="buy-button.gif"
img src="http://www.assoc-amazon.com/s/noscript?tag=coverfinder-20"

So we've got all the images. The [^"]*" bit is useful in cases like this. It matches anything that isn't a double quote, up to the next double quote. Now you've got to decide which of those ???????????.jpg images to grab, and extend the regular expression a bit more.
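
One hedged way to take that next step, using a made-up HTML snippet rather than the live page: tighten the pattern so it only accepts values ending in .jpg, then strip the img src=" wrapper with sed. The sample markup and example.com URLs are assumptions.

```shell
#!/bin/sh
# Made-up markup imitating the page: cover images mixed with button images.
html='<td><a href="/buy"><img src="http://example.com/images/I/AAA.jpg" width="75"></a><img src="buy-button.gif"></td>
<td><a href="/buy"><img src="http://example.com/images/I/BBB.jpg" width="75"></a><img src="buy-button.gif"></td>'

# -o prints only the matched text; [^"]* cannot cross a closing quote,
# and \.jpg" requires the value to end in .jpg, which drops the .gif buttons.
covers=$(printf '%s\n' "$html" |
    grep -o 'img src="[^"]*\.jpg"' |
    sed -e 's/^img src="//' -e 's/"$//')

printf '%s\n' "$covers"
```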

As MrP says, there are so many ways to do this kind of thing, and awk is great!


John

Offline

#10 2012-01-25 11:21:43

ackernan
#! Junkie
Registered: 2011-01-10
Posts: 403

Re: Curl & grep help

Thanks for the help guys.  Here's what I got, using sed to get the first instance.

curl -s 'http://api.discogs.com/search?q=creed+my%20own%20prison&btn=&type=all' | grep -o 'http://api.discogs.com/image/[^"]*.jpeg' | sed -n 1p

This works great in a terminal, but when I use it in a script surrounded by double quotes it gets messed up. The double quote inside the brackets of the grep pattern throws everything off. How can I modify this statement so it works in a script?

Here's the statement in the script.

local f = io.popen("curl -s 'http://api.discogs.com/search?q='"..mocartist.."'+'"..mocalbum.."'&btn=&type=all' | grep -o 'http://api.discogs.com/image/[^"]*.jpeg' | sed -n 1p")

Last edited by ackernan (2012-01-25 11:54:00)

Offline

#11 2012-01-25 14:34:22

mrpeachy
20% cooler
From: The Everfree Forest
Registered: 2009-11-08
Posts: 3,460

Re: Curl & grep help

ackernan wrote:

Thanks for the help guys.  Here's what I got, using sed to get the first instance.

curl -s 'http://api.discogs.com/search?q=creed+my%20own%20prison&btn=&type=all' | grep -o 'http://api.discogs.com/image/[^"]*.jpeg' | sed -n 1p

This works great in a terminal, but when I use it in a script surrounded by double quotes it gets messed up. The double quote inside the brackets of the grep pattern throws everything off. How can I modify this statement so it works in a script?

Here's the statement in the script.

local f = io.popen("curl -s 'http://api.discogs.com/search?q='"..mocartist.."'+'"..mocalbum.."'&btn=&type=all' | grep -o 'http://api.discogs.com/image/[^"]*.jpeg' | sed -n 1p")

you can use backslash \ right before to escape any quotes (ie turn them into plain text) that are messing up your line

eg

command("do \"this\" and this")

the other thing you can do: if all the quotes that are causing problems are double, surround the command with single quotes; if the ones inside are single, enclose it in doubles :)
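
The same escaping idea, sketched in shell rather than in the script's own language (io.popen hands its string to a shell, so the quoting rules inside the string are the shell's). Inside a double-quoted string, \" produces a literal double quote, so the grep pattern survives intact. The sample input and variable names are made up.

```shell
#!/bin/sh
# Made-up input standing in for the API response.
sample='{"thumb": "http://example.com/image/R-1.jpeg", "title": "x"}'

# The whole pipeline is built as one double-quoted string, as the script does.
# The double quote inside the bracket expression is written \" so that it
# does not end the string early.
cmd="grep -o 'http://example.com/image/[^\"]*.jpeg' | sed -n 1p"

# Feed the sample to the pipeline and keep the first match.
first=$(printf '%s\n' "$sample" | sh -c "$cmd")
printf '%s\n' "$first"
```

Here both quote types are needed inside the command, so swapping the outer quotes would not be enough on its own; escaping is the simpler route.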

Last edited by mrpeachy (2012-01-25 14:36:39)

Offline

#12 2012-01-25 21:32:15

ackernan
#! Junkie
Registered: 2011-01-10
Posts: 403

Re: Curl & grep help

mrpeachy wrote:

you can use backslash \ right before to escape any quotes (ie turn them into plain text) that are messing up your line

eg

command("do \"this\" and this")

the other thing you can do: if all the quotes that are causing problems are double, surround the command with single quotes; if the ones inside are single, enclose it in doubles :)

Thanks. I knew that, just didn't think of it before I asked. Sometimes I'm a slow thinker. BTW the \ did the trick. :D

Offline

Copyright © 2012 CrunchBang Linux.