I'm writing a script to get images from a website. I'm using curl piped to grep, but the grep command is ignored. Here's something like what I'm using:
curl http://www.slothradio.com/covers/?artist=creed&album=my+own+prison | grep "img src"
curl http://api.discogs.com/search?q=creed+my%20own%20prison&btn=&type=all | grep thumb
Why is the grep command not working?
I thought it was because the lines had no line breaks. I downloaded the page source with wget and each line had a line break.
Last edited by ackernan (2012-01-25 02:21:02)
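You need to escape some of those characters in the URLs from your shell. An unquoted & ends the command and runs it in the background, so curl only gets the URL up to the first &, and grep ends up reading from whatever follows it (a variable assignment, which prints nothing). The easiest thing to do is probably to put the whole thing in quotes.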
curl -s 'http://www.slothradio.com/covers/?artist=creed&album=my+own+prison' | grep 'img src'
curl -s 'http://api.discogs.com/search?q=creed+my%20own%20prison&btn=&type=all' | grep thumb
Also, you might want to add an -s (silent) to curl, which hides the progress meter it otherwise prints when its output goes to a pipe.
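To see what the quoting changes, here is a small sketch that runs offline. show_args is a hypothetical stand-in for curl that just prints the arguments it receives, and eval is used only to re-parse the unquoted line the way the shell would if you typed it.

```shell
#!/bin/sh
# show_args is a hypothetical stand-in for curl: it prints each argument it gets.
show_args() { for a in "$@"; do printf '[%s]\n' "$a"; done; }

# Quoted: the whole URL, '?' and '&' included, arrives as a single argument.
quoted=$(show_args 'http://www.slothradio.com/covers/?artist=creed&album=my+own+prison')
printf 'quoted:   %s\n' "$quoted"

# Unquoted: the shell treats '&' as "run in the background", so the command
# ends there and the rest of the line becomes a separate variable assignment.
unquoted=$(eval 'show_args http://www.slothradio.com/covers/?artist=creed&album=my+own+prison')
printf 'unquoted: %s\n' "$unquoted"
```

The second line of output should stop at artist=creed, which is why the original pipeline never handed grep the page it was meant to search.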
Last edited by oylenshpeegul (2012-01-25 02:40:35)
Neither worked. Just what I grepped for was highlighted.
Grep is going to print out every line that matches. The lines can be quite long and include lots of junk, but the thing you grepped for is there somewhere.
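Your second example returns JSON rather than HTML, probably all on one line, so grep prints the whole response. Try running it through Python's json.tool before grepping, which pretty-prints it with one field per line: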
curl -s 'http://api.discogs.com/search?q=creed+my%20own%20prison&btn=&type=all' | python -mjson.tool | grep thumb
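A rough offline illustration of why the pretty-printing step matters. The JSON below is made-up sample data, not real Discogs output, and tr is used here only as a crude stand-in for json.tool (it splits on every comma, which is not real JSON parsing).

```shell
#!/bin/sh
# Made-up compact JSON on a single line (hypothetical sample data).
json='{"results":[{"thumb":"http://example.invalid/a.jpeg","title":"x"},{"thumb":"http://example.invalid/b.jpeg","title":"y"}]}'

# grep works line by line: one long line means one match, and that match
# is the entire document.
matches=$(printf '%s\n' "$json" | grep -c thumb)

# After splitting into shorter lines, each thumb field is on a line of its own.
split_matches=$(printf '%s\n' "$json" | tr ',' '\n' | grep -c thumb)

printf 'matching lines before splitting: %s\n' "$matches"
printf 'matching lines after splitting:  %s\n' "$split_matches"
```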
That worked, thanks.
huh, i always thought you had to put your grep matches in quotes
but the first line is working correctly... it's just that the lines that have "img src" on them are looooooooong
you need to be more specific in your matching
seems my reply was too late
Last edited by mrpeachy (2012-01-25 03:13:37)
You could try parsing the HTML with Perl or something, instead of trying to grep out the stuff you want.
$ perl -MLWP::Simple -MHTML::TokeParser -E '$c = get shift; $p = HTML::TokeParser->new(\$c); while ($t = $p->get_tag("img")){say $t->[1]{src}}' 'http://www.slothradio.com/covers/?artist=creed&album=my+own+prison'
https://www.paypal.com/en_US/i/scr/pixel.gif
http://ecx.images-amazon.com/images/I/51Y5ZCMV2QL.jpg
buy-button.gif
http://ecx.images-amazon.com/images/I/41B750QM6BL.jpg
buy-button.gif
http://ecx.images-amazon.com/images/I/51Y5ZCMV2QL.jpg
buy-button.gif
http://ecx.images-amazon.com/images/I/41qZeFFdFdL.jpg
buy-button.gif
http://ecx.images-amazon.com/images/I/515QKW7AVgL.jpg
buy-button.gif
http://ecx.images-amazon.com/images/I/51I8Cqs1luL.jpg
buy-button.gif
http://ecx.images-amazon.com/images/I/41CANTYE0YL.jpg
buy-button.gif
http://www.assoc-amazon.com/s/noscript?tag=coverfinder-20
this works also (that gawk command is useful for lots of things)
curl 'http://www.slothradio.com/covers/?artist=creed&album=my+own+prison' | grep 'images-amazon.com' | gawk -F'><img src="' -v RS='" width=' 'RT{print $NF}' | grep 'jpg'
I'm not an expert by any means, I just like doing this stuff
there are probably 100 other ways to get the same result
no need for 2 greps
curl 'http://www.slothradio.com/covers/?artist=creed&album=my+own+prison' | gawk -F'><img src="' -v RS='" width=' 'RT{print $NF}' | grep 'amazon.com'
Last edited by mrpeachy (2012-01-25 03:35:14)
Grep's -o option can be useful sometimes too. That will print out only the matching part of the line. Then you have to make a regular expression that will cover the bit you want. For example:
john@raffles3:~$ curl -s 'http://www.slothradio.com/covers/?artist=creed&album=my+own+prison' | grep -o 'img src="[^"]*"'
img src="http://ecx.images-amazon.com/images/I/51Y5ZCMV2QL.jpg"
img src="buy-button.gif"
img src="http://ecx.images-amazon.com/images/I/41B750QM6BL.jpg"
img src="buy-button.gif"
img src="http://ecx.images-amazon.com/images/I/51Y5ZCMV2QL.jpg"
img src="buy-button.gif"
img src="http://ecx.images-amazon.com/images/I/41qZeFFdFdL.jpg"
img src="buy-button.gif"
img src="http://ecx.images-amazon.com/images/I/515QKW7AVgL.jpg"
img src="buy-button.gif"
img src="http://ecx.images-amazon.com/images/I/51I8Cqs1luL.jpg"
img src="buy-button.gif"
img src="http://ecx.images-amazon.com/images/I/41CANTYE0YL.jpg"
img src="buy-button.gif"
img src="http://www.assoc-amazon.com/s/noscript?tag=coverfinder-20"
So we've got all the images. The [^"]*" bit is useful in cases like this. It matches anything that isn't a double quote, up to the next double quote. Now you've got to decide which of those ???????????.jpg images to grab, and extend the regular expression a bit more.
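As a sketch of "extending the regular expression a bit more": the snippet below keeps only the src values ending in .jpg and strips the img src="..." wrapper with sed. The HTML is a cut-down sample modelled on the output above, not the real page, and it assumes a grep that supports -o (GNU and BSD grep both do).

```shell
#!/bin/sh
# Cut-down sample HTML on one line (hypothetical, modelled on the output above).
html='<a href="x"><img src="http://ecx.images-amazon.com/images/I/51Y5ZCMV2QL.jpg" width="100"></a><img src="buy-button.gif">'

# \.jpg" restricts the match to src values that end in .jpg, which drops
# buy-button.gif; sed then removes the leading img src=" and the closing quote.
covers=$(printf '%s\n' "$html" \
  | grep -o 'img src="[^"]*\.jpg"' \
  | sed 's/^img src="//; s/"$//')

printf '%s\n' "$covers"
```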
As MrP says, there are so many ways to do this kind of thing, and awk is great!
John
Thanks for the help guys. Here's what I got, using sed to get the first instance.
curl -s 'http://api.discogs.com/search?q=creed+my%20own%20prison&btn=&type=all' | grep -o 'http://api.discogs.com/image/[^"]*.jpeg' | sed -n 1p
This works great in a terminal, but when I use it in a script surrounded by double quotes it gets messed up. The double quote in the brackets of grep throws everything off. How can I modify/use this statement in a script?
Here's the statement in the script.
local f = io.popen("curl -s 'http://api.discogs.com/search?q='"..mocartist.."'+'"..mocalbum.."'&btn=&type=all' | grep -o 'http://api.discogs.com/image/[^"]*.jpeg' | sed -n 1p")
Last edited by ackernan (2012-01-25 11:54:00)
you can put a backslash \ right before any quote that is messing up your line to escape it (ie turn it into plain text)
eg
command("do \"this\" and this")
the other thing you can do: if all the quotes causing problems are double, surround the command with single quotes, and if the ones inside are single, enclose it in doubles
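The same backslash trick, sketched in plain shell so it can be tried without the script. The command is stored in a double-quoted string, much like the string handed to io.popen, with the inner double quote escaped. The sample input and its image names are made up for illustration.

```shell
#!/bin/sh
# The outer string is double-quoted, so the double quote inside the bracket
# expression is written as \" to keep it from ending the string early.
cmd="grep -o 'http://api.discogs.com/image/[^\"]*.jpeg' | sed -n 1p"

# Made-up input with two image URLs (hypothetical file names).
sample='{"thumb": "http://api.discogs.com/image/R-90-1234.jpeg", "thumb": "http://api.discogs.com/image/R-90-5678.jpeg"}'

# sh -c runs the stored command line, roughly what popen does with its string.
first=$(printf '%s\n' "$sample" | sh -c "$cmd")
printf '%s\n' "$first"
```

Since the command itself contains both single and double quotes, swapping the outer quote type alone would not be enough here, which is why escaping is the simpler route.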
Last edited by mrpeachy (2012-01-25 14:36:39)
Thanks. I knew that, just didn't think of it before I asked. Sometimes I'm slow thinking. BTW the \ did the trick.
Copyright © 2012 CrunchBang Linux.