
#1 2009-12-04 04:21:37

TheLivestockMan
Member
From: Wisconsin, USA
Registered: 2009-09-12
Posts: 15

Python trouble?

I'm not sure if this is the right place to ask, but I know you guys can help.

I currently have this code:

#! usr/bin/python
import os
import re
import glob

os.system('wget -r --no-parent -Q1m -U Mozilla A.stm -erobots=off http://news.bbc.co.uk/2/hi/business/default.stm')
os.chdir('/home/npn_/news.bbc.co.uk/2/hi/business)
names = glob.glob('*.stm')
for name in names:
    file = open(names, 'r')
    text = file.readlines()
    file.close()
    keyword = re.compile(r"December 3, 2009 ")
    for line in text:
      if keyword.search(line):
         print line,"\n"

This downloads 4gb of news from the Business section of the BBC, and is supposed to give me articles on that date. The only problem is when I run it, nothing happens. Sorry if this is really obvious, but I'm pretty new to Python. Anyone have a fix?


"Lost packet, 42 bytes, last seen on a saturated OC3, reward $$$."


#2 2009-12-04 07:23:27

iggykoopa
Script Master
Registered: 2008-12-13
Posts: 1,486

Re: Python trouble?

Try changing it to:
os.chdir(os.environ["HOME"])
os.system('wget -r --no-parent -Q1m -U Mozilla A.stm -erobots=off http://news.bbc.co.uk/2/hi/business/default.stm')
os.chdir(os.environ["HOME"] + '/news.bbc.co.uk/2/hi/business')

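The os.chdir() path was missing its closing '. Also, unless your username is npn_, that hard-coded /home/npn_ path won't work, and as written you would have to run the script from your home directory, because wget saves the mirrored site under the current directory.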
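Putting those fixes together with a few other problems in the original script (the shebang should be #!/usr/bin/python, wget needs -A in front of .stm or it treats A.stm as a URL, and open(names, 'r') should be open(name, 'r')), a minimal Python 3 sketch might look like this. It assumes wget is installed and keeps the original date pattern; whether the BBC pages actually print dates as "December 3, 2009" is an assumption the thread doesn't confirm, and the function names are hypothetical.

```python
#!/usr/bin/env python3
import glob
import os
import re
import subprocess

# Same wget options as the original post, with -A added in front of .stm.
WGET_CMD = [
    "wget", "-r", "--no-parent", "-Q1m", "-U", "Mozilla", "-A.stm",
    "-erobots=off", "http://news.bbc.co.uk/2/hi/business/default.stm",
]


def find_matching_lines(directory, pattern):
    """Return (path, line) pairs for lines in directory/*.stm matching pattern."""
    keyword = re.compile(pattern)  # compile once, not once per file
    matches = []
    for name in sorted(glob.glob(os.path.join(directory, "*.stm"))):
        # Open each file by its own name (the original passed the whole list).
        with open(name, "r", errors="replace") as f:
            for line in f:
                if keyword.search(line):
                    matches.append((name, line.rstrip("\n")))
    return matches


def main(pattern=r"December 3, 2009"):
    home = os.path.expanduser("~")  # works for any user, not just npn_
    os.chdir(home)  # wget saves its mirror under the current directory
    subprocess.call(WGET_CMD)
    target = os.path.join(home, "news.bbc.co.uk", "2", "hi", "business")
    for name, line in find_matching_lines(target, pattern):
        print(os.path.basename(name) + ": " + line)
```

Run it by calling main(), for example from an if __name__ == "__main__": guard. Passing wget's arguments as a list to subprocess.call skips the shell entirely, so characters such as & in a URL are never interpreted by it.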


I say never be complete, I say stop being perfect, I say lets evolve, let the chips fall where they may.


#3 2009-12-04 12:09:54

benj1
Wiki Wizard
From: Yorkshire, England
Registered: 2009-09-05
Posts: 1,084

Re: Python trouble?

Is this a learning exercise, or for actual use?

If it's a learning exercise, Python includes a module for downloading files (urllib):

from urllib import FancyURLopener  # Python 2

class MyOpener(FancyURLopener):
    version = ''  # change the User-Agent string to ''

s = MyOpener().open('http://news.bbc.co.uk/2/hi/business/default.stm').read()
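That snippet is Python 2. In Python 3, urllib was split up and FancyURLopener is deprecated, so a rough equivalent uses urllib.request with an explicit User-Agent header. This is a sketch assuming a custom User-Agent is all that's needed; the "Mozilla" value just mirrors the wget -U option used earlier in the thread, the helper names are hypothetical, and the old BBC URL may no longer serve the same page.

```python
from urllib.request import Request, urlopen

URL = "http://news.bbc.co.uk/2/hi/business/default.stm"


def make_request(url, user_agent="Mozilla"):
    # Request() passes each header through add_header(), which stores the
    # key capitalised, so it is looked up later as "User-agent".
    return Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent="Mozilla"):
    """Download url and return the response body as bytes."""
    with urlopen(make_request(url, user_agent), timeout=30) as response:
        return response.read()
```

Calling fetch(URL) returns raw bytes; decode them (for example with .decode("utf-8", "replace")) before running regular expressions over the text.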

Also, you could reuse the re module instead of glob for the glob.glob('*.stm') step.
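A minimal sketch of that idea, listing the directory with os.listdir and keeping only names that match a regular expression (the stm_files name is hypothetical):

```python
import os
import re

STM_RE = re.compile(r"\.stm$")  # names that end in .stm


def stm_files(directory="."):
    """Return the sorted names in directory that end in .stm."""
    return sorted(n for n in os.listdir(directory) if STM_RE.search(n))
```

One difference from glob: glob's '*' skips names starting with a dot, while this regex would also match a hidden file such as .draft.stm.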


Although perhaps a better approach would be to just do a Google search:
' "3 December 2009" site:http://news.bbc.co.uk/2/hi/business filetype:stm '
That would do the same job in a fraction of the time. I suppose you could wrap it in a script to download the results automatically if you wanted.
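A sketch of the first step of such a script: turning a date into that search phrase and URL-encoding the query, so the quotes, spaces and colons don't have to be escaped by hand. It reuses the query from this post; the function names are hypothetical, and whether Google still answers scripted requests to this URL is an assumption.

```python
from datetime import date
from urllib.parse import urlencode


def search_phrase(day):
    # e.g. "3 December 2009": no leading zero on the day. %B gives the
    # month name for the current locale (English under the default C locale).
    return "%d %s %d" % (day.day, day.strftime("%B"), day.year)


def build_search_url(day, site="http://news.bbc.co.uk/2/hi/business"):
    query = '"%s" site:%s filetype:stm' % (search_phrase(day), site)
    # urlencode percent-encodes " : / and turns spaces into +
    return "http://www.google.com/search?" + urlencode({"q": query})
```

For date(2009, 12, 3) this produces the same q= value that appears in hand-encoded form in the bash script later in the thread.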


- - - - - - - - Wiki Pages - - - - - - -
#! install guide           *autostart programs, modify the menu & keybindings
configuring Conky       *installing scripts


#4 2009-12-04 22:40:42

TheLivestockMan
Member
From: Wisconsin, USA
Registered: 2009-09-12
Posts: 15

Re: Python trouble?

@benj1: Yeah, I just need to download news by date automatically.

That Python script now seems pretty stupid, so how about a bash script instead?

#!/bin/bash

# With no variable name, read stores the answer in $REPLY.
read -p "Please enter the date (e.g. 3+December+2009): "

# Quote the URL: an unquoted & would make bash run wget in the background
# and cut the URL off at that point.
wget -r --no-parent -Q4096m -U Mozilla -A.stm -erobots=off "http://www.google.com/search?q=%22${REPLY}%22+site%3Ahttp%3A%2F%2Fnews.bbc.co.uk%2F2%2Fhi%2Fbusiness+filetype%3Astm&ie=utf-8&oe=utf-8&aq=t&rls=com.ubuntu:en-US:official&client=firefox-a"

I've never written a bash script before. I intended this to download just the Google search page and all the .stm files it links to. Unfortunately, it downloads the entire website, even files that aren't .stm.

So is there any way to have it download just the search page and the URLs that end in .stm?
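One way around the recursion, sketched in Python rather than bash, is to fetch the results page once, pull out only the links whose path ends in .stm, and hand that list to wget without -r. The sketch below covers only the extraction step. The class and function names are hypothetical, and it assumes result links point directly at the .stm pages; search engines have often wrapped result links in redirect URLs, which this would skip.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class StmLinkParser(HTMLParser):
    """Collects absolute URLs from <a href> tags whose path ends in .stm."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.base_url, href)  # resolve relative links
        # Check only the path, so a query string like ?x=1 doesn't hide .stm
        if urlparse(absolute).path.endswith(".stm") and absolute not in self.links:
            self.links.append(absolute)


def stm_links(html, base_url):
    """Return the unique .stm links found in html, in document order."""
    parser = StmLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Writing the returned links one per line to a file, say links.txt, lets wget fetch just those pages with wget -i links.txt, with no recursion into the rest of the site.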




Powered by FluxBB

Copyright © 2012 CrunchBang Linux.
Proudly powered by Debian. Hosted by Linode.
Debian is a registered trademark of Software in the Public Interest, Inc.
