I'm not sure if this is the right place to ask, but I know you guys could help 
I currently have this code:
#!/usr/bin/python
import os
import re
import glob

os.system('wget -r --no-parent -Q1m -U Mozilla -A.stm -erobots=off http://news.bbc.co.uk/2/hi/business/default.stm')
os.chdir('/home/npn_/news.bbc.co.uk/2/hi/business)
names = glob.glob('*.stm')
for name in names:
    file = open(name, 'r')
    text = file.readlines()
    file.close()
    keyword = re.compile(r"December 3, 2009 ")
    for line in text:
        if keyword.search(line):
            print line, "\n"

This downloads 4 GB of news from the Business section of the BBC, and is supposed to give me the articles from that date. The only problem is that when I run it, nothing happens. Sorry if this is really obvious, but I'm pretty new to Python. Anyone have a fix?
"Lost packet, 42 bytes, last seen on a saturated OC3, reward $$$."
try changing it to:

os.chdir(os.environ["HOME"])
os.system('wget -r --no-parent -Q1m -U Mozilla -A.stm -erobots=off http://news.bbc.co.uk/2/hi/business/default.stm')
os.chdir(os.environ["HOME"] + '/news.bbc.co.uk/2/hi/business')

it was missing a '. Also, unless your username is npn_ it wouldn't work, and the way it was written you would have to run it from your home directory.
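fwiw, os.path.expanduser is another way to get at the home directory without hard-coding a username; a minimal sketch (the news.bbc.co.uk/... path is just the tree wget creates above):

```python
import os

# expanduser('~') resolves the current user's home directory,
# so nothing like /home/npn_ needs to be hard-coded
download_dir = os.path.join(os.path.expanduser('~'),
                            'news.bbc.co.uk', '2', 'hi', 'business')
print(download_dir)
```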
is this a learning exercise, or for actual use?
if it's a learning exercise, python includes a module for downloading files:

from urllib import FancyURLopener

class MyOpener(FancyURLopener):
    version = ''  # change the user agent string to ''

s = MyOpener().open('http://news.bbc.co.uk/2/hi/business/default.stm').read()

also, you could reuse the re module rather than use glob, for 'glob.glob('*.stm')'
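for example, a quick sketch of an re-based filter over os.listdir() that gives the same result as glob.glob('*.stm'):

```python
import os
import re

# match names ending in .stm, like glob's *.stm pattern
stm = re.compile(r'\.stm$')
names = [n for n in os.listdir('.') if stm.search(n)]
```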
although perhaps a better approach would be to just do a google search?
' "3 December 2009" site:http://news.bbc.co.uk/2/hi/business filetype:stm '
would do the same job in a fraction of the time. i suppose you could wrap that into some sort of script to download them automatically if you wanted
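if you did want to script that, here's a rough sketch of building the search URL; bbc_query_url is just a made-up helper name, and quote_plus does the %22/%3A escaping shown above:

```python
try:
    from urllib import quote_plus          # Python 2
except ImportError:
    from urllib.parse import quote_plus    # Python 3

def bbc_query_url(date):
    # turn e.g. '3 December 2009' into the google query above
    query = '"%s" site:http://news.bbc.co.uk/2/hi/business filetype:stm' % date
    return 'http://www.google.com/search?q=' + quote_plus(query)
```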
@Benji- yeah, I just need to automatically download news by date.
That python script now seems pretty stupid, so how about a bash script instead?
#! /bin/bash
echo -e "Please enter the date. (eg. 3+December+2009): \c"
read
wget -r --no-parent -Q4096m -U Mozilla -A.stm -erobots=off "http://www.google.com/search?q=%22$REPLY%22+site%3Ahttp%3A%2F%2Fnews.bbc.co.uk%2F2%2Fhi%2Fbusiness+filetype%3Astm&ie=utf-8&oe=utf-8&aq=t&rls=com.ubuntu:en-US:official&client=firefox-a"

I've never written a bash script before. I intended this to just download the google search page and all the .stm files it links to; unfortunately, it downloads the entire website, even files that aren't .stm.
So is there any way I could just have it download the search page and the URLs that are .stm files?
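one possible approach (just a sketch; the regex is naive and may not match real google result markup): fetch the search page in python instead, pull out only the hrefs that end in .stm, and download each of those individually:

```python
import re

def stm_links(html):
    # keep only absolute links that end in .stm
    return re.findall(r'href="(http://[^"]+\.stm)"', html)
```

then you'd urlopen() each link in that list yourself, rather than letting wget recurse over the whole site.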
Copyright © 2012 CrunchBang Linux.
Proudly powered by Debian. Hosted by Linode.
Debian is a registered trademark of Software in the Public Interest, Inc.