#1 2014-10-01 23:41:46

Joe90
#! Junkie
Registered: 2013-10-10
Posts: 337

[SOLVED] Wget and "not" (edit) the Underscore

Does anyone know how to work around this?

I am building a website in PHP and have named some of the stock PHP files (e.g. header/footer) with a leading underscore (e.g. _header.php).

I just attempted to wget the whole site from the server, and everything comes down except the files starting with an underscore.

I tried adding -A php to the command, but that didn't grab them either.

Here is my command:

wget -r -p --no-parent -nc http://website/

I can grab them one by one if I use the full filename, but that seems a bit daft and defeats the point of using one command to bring the whole website down:

wget -r -p --no-parent -nc http://website/_header.php

Any suggestions? I can't find a solution on the web or in the man page.

Last edited by Joe90 (2014-10-04 00:04:37)


#2 2014-10-02 00:39:56

Sector11
#!'er to BL'er
From: SR11 Cockpit
Registered: 2010-05-05
Posts: 15,667

Re: [SOLVED] Wget and "not" (edit) the Underscore

I don't know about wget, but here is one way to work around it:

httrack

Package: httrack

Description: Copy websites to your computer (Offline browser)

HTTrack is an offline browser utility, allowing you to download a World Wide website from the Internet to a local directory, building recursively all directories, getting html, images, and other files from the server to your computer.

HTTrack arranges the original site's relative link-structure. Simply open a page of the "mirrored" website in your browser, and you can browse the site from link to link, as if you were viewing it online. HTTrack can also update an existing mirrored site, and resume interrupted downloads. HTTrack is fully configurable, and has an integrated help system.
Homepage: http://www.httrack.com

Just a sudo apt-get install away.


·  ↓   ↓   ↓   ↓   ↓   ↓  ·
BunsenLabs Forums now Open for Registration
·  ↑   ↑   ↑   ↑   ↑   ↑  · BL ModSquad


#3 2014-10-02 17:34:59

Joe90
#! Junkie
Registered: 2013-10-10
Posts: 337

Re: [SOLVED] Wget and "not" (edit) the Underscore

Thanks Sector11, I hadn't spotted that it was available for Linux as well. I may give it a spin, but it would be good to understand the problem with wget.

I read somewhere that "_" is treated as an escape character of some kind?

Last edited by Joe90 (2014-10-02 17:35:46)


#4 2014-10-02 19:31:28

Sector11
#!'er to BL'er
From: SR11 Cockpit
Registered: 2010-05-05
Posts: 15,667

Re: [SOLVED] Wget and "not" (edit) the Underscore

I can understand you wanting to know, but I can't help there. I was just trying to help you get the site.

Maybe something to do with this URL Encoding:

URL Encoding

URLs can only be sent over the Internet using the ASCII character-set.

Since URLs often contain characters outside the ASCII set, the URL has to be converted into a valid ASCII format.

URL encoding replaces unsafe ASCII characters with a "%" followed by two hexadecimal digits.

URLs cannot contain spaces. URL encoding normally replaces a space with a plus (+) sign or with %20.
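As a quick sanity check on the encoding idea: the underscore is one of the "unreserved" characters in RFC 3986, so URL encoding leaves it alone. A minimal sketch in POSIX shell (the `urlencode` helper is made up for illustration, and it only handles plain ASCII input):

```shell
# Hypothetical helper: percent-encode every character except the
# RFC 3986 unreserved set (letters, digits, "-", ".", "_", "~").
# ASCII input only; variables are global because POSIX sh has no "local".
urlencode() {
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}          # everything after the first character
        c=${s%"$rest"}       # the first character itself
        s=$rest
        case $c in
            [a-zA-Z0-9._~-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
    done
    printf '%s\n' "$out"
}

urlencode '_header.php'      # prints: _header.php
urlencode 'my header.php'    # prints: my%20header.php
```

So a name like _header.php goes over the wire unchanged, which makes encoding an unlikely culprit.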

Did httrack work for you?



#5 2014-10-02 20:17:13

Joe90
#! Junkie
Registered: 2013-10-10
Posts: 337

Re: [SOLVED] Wget and "not" (edit) the Underscore

httrack didn't bring down the files that start with "_" either.


#6 2014-10-02 20:25:44

twoion
Moderator
Registered: 2012-05-11
Posts: 1,648

Re: [SOLVED] Wget and "not" (edit) the Underscore

There are two possibilities here:

a) wget omits the files during retrieval. This is unlikely; I can't reproduce this on any of my servers. You can make sure by specifying --accept-regex '.*' or a similar option.

b) your HTTP server (apache, nginx, lighttpd ...) doesn't serve the files to begin with and is disallowing access to them, for example using .htaccess configurations. It could be that this is intended for protecting 'special' files that start with a non-alphanumeric character (like .dotfiles or _data). You might want to check this yourself or with the server administrator.
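To test possibility (b) directly, one could ask the server about one of the files without downloading it and look at the status code. A rough sketch, assuming wget's --spider and -S (--server-response) options; `last_status` and `check_url` are hypothetical helper names, and http://website/_header.php is the placeholder URL from the first post:

```shell
# Hypothetical helper: read HTTP response headers on stdin and print the
# status code of the last response (redirects produce more than one).
last_status() {
    awk '/^[ \t]*HTTP\// { code = $2 } END { print code }'
}

# Hypothetical helper: query a URL without saving anything.
# wget writes the response headers to stderr, hence the 2>&1.
check_url() {
    wget --spider -S "$1" 2>&1 | last_status
}

# Example (placeholder URL):
#   check_url http://website/_header.php
```

A 200 would mean the server is handing the file out, while a 403 or 404 would point at the server configuration rather than wget. For a .php file, a 200 only shows that the script ran, not that the source is downloadable.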

Last edited by twoion (2014-10-02 20:26:07)


Tannhäuser ~ {www,pkg,ddl}.bunsenlabs.org/{gitlog,repoidx}


#7 2014-10-02 20:36:30

gutterslob
#! Resident Bum
Registered: 2009-11-03
Posts: 3,207

Re: [SOLVED] Wget and "not" (edit) the Underscore

Joe90 wrote:

I attempted to use -A php in the command but that didn't grab them either

Just double-checking: did you try putting the suffix in quotes? I'm not entirely sure whether it makes a difference, though. Or maybe quotes are only needed for patterns?

Last edited by gutterslob (2014-10-02 20:46:44)


Point & Squirt


#8 2014-10-02 21:24:59

Sector11
#!'er to BL'er
From: SR11 Cockpit
Registered: 2010-05-05
Posts: 15,667

Re: [SOLVED] Wget and "not" (edit) the Underscore

OK, it's official: I'm stuck!



#9 2014-10-02 23:24:26

Joe90
#! Junkie
Registered: 2013-10-10
Posts: 337

Re: [SOLVED] Wget and "not" (edit) the Underscore

Hmmm my wget doesn't seem to like "--accept-regex"

@gutterslob

-A php works (that is, it limits the download to PHP files only) with or without quotes, but it still doesn't fetch the files that start with "_".

Last edited by Joe90 (2014-10-02 23:27:19)


#10 2014-10-04 00:03:41

Joe90
#! Junkie
Registered: 2013-10-10
Posts: 337

Re: [SOLVED] Wget and "not" (edit) the Underscore


Got to the bottom of this, having overcome my ignorance of how wget worked in http mode.

Nothing to do with the underscore at all, and everything to do with php and how wget in http mode works.

My underscore files were only ever pulled in by PHP require calls for the headers and footers of other web pages. I looked inside one of the files that was being downloaded, expecting to see the PHP call sections, e.g.

<?php require($DOCUMENT_ROOT . "_header.php"); ?>

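only to find that wget over HTTP sees the pages exactly as the server presents them to a browser (what you get from "view source"). The PHP runs on the server first, so the contents of _header.php had already been merged into each of the downloaded files, and nothing in them links to _header.php for wget to follow. Doh!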
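The effect is easy to reproduce locally. A small sketch, assuming the php command-line interpreter is installed (the file contents are invented for the demo and `render_demo` is a hypothetical name): running index.php through PHP gives the same kind of output the web server hands to wget.

```shell
render_demo() {
    # Skip quietly if the php CLI is not available.
    command -v php >/dev/null 2>&1 || { echo "php not installed"; return 0; }

    dir=$(mktemp -d) || return 1

    # Made-up stand-ins for the real site files.
    printf '%s\n' '<h1>Site header</h1>' > "$dir/_header.php"
    printf '%s\n' '<?php require(__DIR__ . "/_header.php"); ?>' \
                  '<p>Page body</p>' > "$dir/index.php"

    # Roughly what the server does before it answers an HTTP request.
    php "$dir/index.php"

    rm -rf "$dir"
}

render_demo
```

The output should show the header markup followed by the page body, with no <?php ... ?> line left in it, so a crawler never learns that _header.php exists as a separate file.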

I discovered this because I renamed all the underscore files (replacing "_" with "x") and changed all the references, and still the files did not show up.

The only way to do what I want with wget is to use FTP mode, which fetches the raw source files:

wget -r --user myuser --password mypass ftp://mywebsite.co.uk/www/html/

My apologies for leading you up the garden path, but I got there in the end.


#11 2014-10-04 12:50:42

twoion
Moderator
Registered: 2012-05-11
Posts: 1,648

Re: [SOLVED] Wget and "not" (edit) the Underscore

^ php? Annoying you since 1995 !1!!
--
No, seriously, good job in figuring this one out...

Last edited by twoion (2014-10-04 12:51:07)




Copyright © 2012 CrunchBang Linux.