There are many tools out there to download/scrape websites, e.g. curl, httrack, sitesucker, deepvacuum (which is actually a GUI wrapper for wget) and probably more.
I find wget to be one of the most usable tools for grabbing an entire website. Make sure to use the option --convert-links, which converts any links between the subpages into local relative URLs; otherwise the links would still point to the original site, making local browsing impossible. Also use --restrict-file-names=windows to ensure safe file names for your respective OS. In a nutshell, these are the arguments I use with wget to make a local copy of an entire website:
wget -H -r --level=5 --restrict-file-names=windows --convert-links -e robots=off http://example.org
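Here -H allows the recursion to span to other hosts (handy when assets are served from a different domain), -r together with --level=5 makes the download recursive up to five levels deep, and -e robots=off tells wget to ignore robots.txt exclusions. Since -H can easily pull in far more than intended, it may be worth bounding it with --domains, for example:
wget -H -D example.org -r --level=5 --restrict-file-names=windows --convert-links -e robots=off http://example.org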
or
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent http://example.org
--mirror – makes (among other things) the download recursive.
--convert-links – converts all links (including those to things like CSS stylesheets) into relative ones, so the copy is suitable for offline viewing.
--adjust-extension – adds suitable extensions to filenames (html or css) depending on their content type.
--page-requisites – downloads things like CSS stylesheets and images required to properly display the page offline.
--no-parent – when recursing, does not ascend to the parent directory; useful for restricting the download to only a portion of the site.
Alternatively, the command above may be shortened:
wget -mkEpnp http://example.org
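By default wget saves everything under a directory named after the host (here example.org), so once the download finishes the copy can be opened straight from disk, typically via its index page, for instance on a Linux desktop:
xdg-open example.org/index.html
Any browser's "open file" dialog works just as well.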