Perhaps someone knows how to work around this?
I am building a website in PHP and have named some of the stock PHP files (e.g. the header and footer) to start with an underscore (e.g. _header.php).
I just attempted to wget the whole site from the server, and everything comes down except the files starting with an underscore.
I attempted to use -A php in the command, but that didn't grab them either.
Here is my command:
wget -r -p --no-parent -nc http://website/
I can grab them one by one if I use the full filename, but that seems a bit daft and misses the whole point of using one command to bring the whole website down.
wget -r -p --no-parent -nc http://website/_header.php
Any suggestions? I can't find a solution on the interweb or in the man page.
Last edited by Joe90 (2014-10-04 00:04:37)
I don't know about wget, but here's a way to work around this:
Package: httrack
Description: Copy websites to your computer (Offline browser)
HTTrack is an offline browser utility, allowing you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting html, images, and other files from the server to your computer.
HTTrack arranges the original site's relative link-structure. Simply open a page of the "mirrored" website in your browser, and you can browse the site from link to link, as if you were viewing it online. HTTrack can also update an existing mirrored site, and resume interrupted downloads. HTTrack is fully configurable, and has an integrated help system.
Homepage: http://www.httrack.com
Just a sudo apt-get install away.
· ↓ ↓ ↓ ↓ ↓ ↓ ·
BunsenLabs Forums now Open for Registration
· ↑ ↑ ↑ ↑ ↑ ↑ · BL ModSquad
Thanks Sector11. I didn't spot that it was Linux-ready as well; I may give it a spin, but it would be good to understand the problem with wget.
I spotted somewhere that the "_" is seen as an escape character of some kind?
Last edited by Joe90 (2014-10-02 17:35:46)
I can understand your wanting to know. I can't help there; I was just trying to help you get the site.
Maybe it's something to do with this URL encoding:
URL Encoding
URLs can only be sent over the Internet using the ASCII character-set.
Since URLs often contain characters outside the ASCII set, the URL has to be converted into a valid ASCII format.
URL encoding replaces unsafe ASCII characters with a "%" followed by two hexadecimal digits.
URLs cannot contain spaces. URL encoding normally replaces a space with a plus (+) sign or with %20.
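For what it's worth, URL encoding can probably be ruled out here: "_" is one of the "unreserved" characters in RFC 3986, so it is never percent-encoded. A minimal percent-encoder sketched in shell shows this (the urlencode function name is just for illustration):

```shell
# Minimal percent-encoder: RFC 3986 unreserved characters
# (A-Z a-z 0-9 - . _ ~) pass through; everything else becomes %XX.
urlencode() {
    s=$1 out=
    while [ -n "$s" ]; do
        c=${s%"${s#?}"}        # first character of the string
        s=${s#?}               # rest of the string
        case $c in
            [A-Za-z0-9._~-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
    done
    printf '%s\n' "$out"
}

urlencode "_header.php"    # _header.php   (underscore untouched)
urlencode "my file.php"    # my%20file.php (the space is what gets encoded)
```

So whatever is blocking the underscore files, it isn't URL encoding.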
Did httrack work for you?
httrack didn't bring down the files that start with a "_" either.
There are two possibilities here:
a) wget omits the files during retrieval. This is unlikely; I can't reproduce this on any of my servers. You can make sure by specifying --accept-regex '.*' or a similar option.
b) your HTTP server (Apache, nginx, lighttpd, ...) doesn't serve the files to begin with and is disallowing access to them, for example via .htaccess configuration. This could be intended to protect 'special' files that start with a non-alphanumeric character (like .dotfiles or _data). You might want to check this yourself or with the server administrator.
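As a concrete illustration (hypothetical, not taken from the actual server), an Apache .htaccess fragment like the following would produce exactly the symptom described: ordinary pages download fine, while anything whose name starts with "_" or "." is refused:

```apache
# Hypothetical .htaccess rule (Apache 2.4 syntax): deny any file
# whose basename starts with "_" or "."
<FilesMatch "^[_.]">
    Require all denied
</FilesMatch>
```

If something like this were in place, wget would get a 403 for _header.php no matter which accept options are used.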
Last edited by twoion (2014-10-02 20:26:07)
I attempted to use -A php in the command but that didn't grab them either
Just double-checking: did you try putting the suffix in quotes? Not entirely sure if this makes a difference or not, though. Or maybe quotes are only for patterns?
Last edited by gutterslob (2014-10-02 20:46:44)
Point & Squirt
OK, it's official: I'm stuck!
Hmmm, my wget doesn't seem to like "--accept-regex" (it's a relatively recent option, so an older wget may not have it).
@ gutterslob
It works (that is, it limits the download to php files only) with or without quotes, but still doesn't fetch the files that start with "_".
Last edited by Joe90 (2014-10-02 23:27:19)
Got to the bottom of this, having overcome my ignorance of how wget works in HTTP mode.
It was nothing to do with the underscore at all, and everything to do with PHP and how wget in HTTP mode works.
My underscore files were only ever used as PHP includes for headers and footers inside other web pages. I looked inside one of the files that was being downloaded, expecting to see the PHP include calls, e.g.
<?php require($DOCUMENT_ROOT . "_header.php"); ?>
only to find that wget in HTTP mode sees the pages as they are presented to a browser (view source): the server had already executed the PHP, so the contents of _header.php were inlined into each downloaded file. Doh!
I discovered this after I renamed all the underscore files (swapping "_" for "x") and changed all the references, and the files still did not show up.
The only way to do what I want with wget is to use FTP mode:
wget -r --user myuser --password mypass ftp://mywebsite.co.uk/www/html/
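For anyone else who hits this, the mechanism can be re-created locally without a web server or PHP. This sketch (all filenames invented for the demo) fakes require() with sed: the "rendered" output is what the server hands to the browser, and that is all wget ever sees over HTTP, with no separate _header.php in sight:

```shell
# Stand-ins for the real files: a header fragment, and a page that
# "includes" it via a marker line (playing the role of require()).
printf '<h1>Site header</h1>\n' > _header.php
printf 'REQUIRE_HEADER\n<p>Body</p>\n' > page.php

# "Render" the page the way the PHP engine would: splice _header.php
# in where the marker sits, then drop the marker line itself.
sed '/REQUIRE_HEADER/{
r _header.php
d
}' page.php > rendered.html

cat rendered.html
# <h1>Site header</h1>
# <p>Body</p>
```

Over HTTP, only rendered.html-style output exists; the raw source files, underscores and all, are only reachable out-of-band, e.g. over FTP.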
My apologies for leading you up the garden path, but I got there in the end.
Copyright © 2012 CrunchBang Linux.
Proudly powered by Debian. Hosted by Linode.
Debian is a registered trademark of Software in the Public Interest, Inc.