I am downloading files with the wget command:
wget abc.com -nH -r -l1 --no-parent
This stores the files in different subfolders. I want the path of each downloaded file. How do I get it?
Example:
wget is downloading files to:
c:/test/com/test/package/filename1.text
c:/test/com1/test/package1/filename2.text
So how do I retrieve the path of each file relative to the download directory, i.e. com/test/package/filename1.text?
Thanks
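One possible approach, sketched here rather than taken from the thread, is to let wget finish and then list every regular file under the download directory with find. The helper name list_downloaded and its directory argument are assumptions for illustration; pass the directory that wget wrote into (the current directory by default).

```shell
#!/bin/sh
# Hypothetical helper: print the path of every regular file under a
# download directory, relative to that directory, one per line.
# $1 is the directory wget saved into (assumed; defaults to ".").
list_downloaded() {
    ( cd "${1:-.}" && find . -type f | sed 's|^\./||' | sort )
}

# Example usage after a recursive download into the current directory:
#   wget abc.com -nH -r -l1 --no-parent
#   list_downloaded .
```

The subshell keeps the cd from affecting the caller, and the sed step only strips the leading "./" that find adds.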
I have a file, urls.txt, which contains multiple URLs.
I am using wget to download the web content like this:
wget -i urls.txt
The web content is saved in a separate file for each link. I want everything saved in a single txt file.
This will store the request details to messages.txt and all of the downloaded content to html.txt
$ wget -a messages.txt -i urls.txt -O html.txt
From wget --help:
Logging and input file:
-a, --append-output=FILE append messages to FILE
Download:
-O, --output-document=FILE write documents to FILE
Tested on GNU Wget 1.19.1 built on darwin16.6.0.
I am using wget to try and download two .zip files (SWVF_1_44.zip and SWVF_44_88.zip) from this site: http://www2.sos.state.oh.us/pls/voter/f?p=111:1:0::NO:RP:P1_TYPE:STATE
when I run:
wget -r -l1 -H -t1 -nd -N -np -A.zip -erobots=off "http://www2.sos.state.oh.us/pls/voter/f?p=111:1:0::NO:RP:P1_TYPE:STATE/SWVF_1_44.zip"
I get a downloaded zip file that has a screwed up name (f#p=111%3A1%3A0%3A%3ANO%3ARP%3AP1_TYPE%3ASTATE%2FSWVF_1_44) and it cannot be opened.
Any thoughts on where my code is wrong?
There's nothing "wrong" with your code. Wget is simply assuming you want to save the file under the name that appears in the URL. Use the -O option to specify an output file:
wget blahblahblah -O useablefilename.zip
wget -r -np www.a.com/b/c/d
The above will create a directory called 'www.a.com' in the current working directory on my local computer, containing all subdirectories on the path to 'd'.
I only want directory 'd' (and its contents) created in my cwd.
How can I achieve this?
You can specify the directory name explicitly and avoid the creation of sub-directories with the following line (-r and -np are kept from your command so that the download stays recursive):
wget -r -np -nd -P /home/d www.a.com/b/c/d
The -nd option prevents the creation of sub-directories, and -P sets the target directory to /home/d, so all of your files will be downloaded into the "/home/d" folder only.
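If the goal is to keep a directory named 'd' in the current directory instead of flattening everything, a possible alternative is to combine -nH (no host directory) with --cut-dirs, which drops a given number of leading path components. The sketch below uses a hypothetical helper, cut_dirs_for, to count the components that precede the last one; it only prints the resulting command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: count the path components that come before the
# last one. That count is the value to give --cut-dirs so that only the
# last directory of the URL remains locally.
cut_dirs_for() {
    rest=${1#*://}              # drop the scheme, if present
    case $rest in
        */*) path=${rest#*/} ;; # drop the host name
        *)   path= ;;
    esac
    path=${path%/}              # ignore one trailing slash
    n=0
    while :; do
        case $path in
            */*) n=$((n + 1)); path=${path#*/} ;;
            *)   break ;;
        esac
    done
    echo "$n"
}

url=www.a.com/b/c/d
# Print the command instead of running it.
echo wget -r -np -nH --cut-dirs="$(cut_dirs_for "$url")" "$url"
```

For www.a.com/b/c/d this prints a command with --cut-dirs=2, so files under /b/c/d/ would be saved under d/ in the current directory.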
I need files to be downloaded to /tmp/cron_test/. My wget code is
wget --random-wait -r -p -nd -e robots=off -A".pdf" -U mozilla http://math.stanford.edu/undergrad/
So is there some parameter to specify the directory?
From the manual page:
-P prefix
--directory-prefix=prefix
Set directory prefix to prefix. The directory prefix is the
directory where all other files and sub-directories will be
saved to, i.e. the top of the retrieval tree. The default
is . (the current directory).
So you need to add -P /tmp/cron_test/ (short form) or --directory-prefix=/tmp/cron_test/ (long form) to your command. Also note that if the directory does not exist it will get created.
-O specifies the path of the file to write the download to:
wget <uri> -O /path/to/file.ext
-P specifies the directory (prefix) to download into; the file keeps its default name:
wget <uri> -P /path/to/folder
Make sure you have the correct URL for whatever you are downloading. First of all, URLs containing characters such as ? and & should be quoted so that the shell does not interpret them. Also, wget derives the local file name from the last part of the URL, including the query string, so anything after the final / in the path ends up in the name of the file you are downloading into.
For example:
wget "sourceforge.net/projects/ebosse/files/latest/download?source=typ_redirect"
will download into a file named download?source=typ_redirect.
As you can see, knowing a thing or two about URLs helps to understand wget.
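To make the naming rule concrete, here is a rough approximation of how the default file name follows from a URL: the last path component, or index.html when there is none, plus the query string. The function name wget_default_name and the example.com URL are made up for illustration. This is not wget's actual code, and real wget may additionally escape characters depending on the platform and on --restrict-file-names.

```shell
#!/bin/sh
# Approximation only (an assumption, not wget's implementation):
# derive a default local file name from a URL.
wget_default_name() {
    url=${1%%#*}                # drop any #fragment
    rest=${url#*://}            # drop the scheme, if present
    case $rest in
        *\?*) suffix='?'${rest#*\?}; rest=${rest%%\?*} ;;  # split off the query
        *)    suffix= ;;
    esac
    case $rest in
        */*) name=${rest##*/} ;; # last path component
        *)   name= ;;            # host only, no path
    esac
    [ -n "$name" ] || name=index.html
    printf '%s%s\n' "$name" "$suffix"
}

# Hypothetical URL, quoted so the shell leaves ? and & alone.
wget_default_name "http://example.com/files/report.zip?mirror=1&lang=en"
```

When the derived name is not what you want, passing -O with an explicit file name sidesteps the rule entirely.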
I was booting from a Hiren's disk and only had Linux 2.6.1 as a resource (import os is unavailable). The correct syntax that solved my problem of downloading an ISO onto the physical hard drive was:
wget "(source url)" -O "(directory where HD was mounted)/isofile.iso"
One can find the correct URL by checking at what point wget downloads into a file named index.html (the default file name) and reports the correct size and other attributes for the file you need, using the following command:
wget "(source url)"
Once the URL is correct and the file is downloading into index.html, you can stop the download (Ctrl+C; note that Ctrl+Z only suspends it) and change the output file by adding:
-O "<specified download directory>/filename.extension"
after the source URL.
In my case this results in downloading an ISO and storing it as a binary file under isofile.iso, which hopefully mounts.
"-P" is the right option. Read on for related information:
wget -nd -np -P /dest/dir --recursive http://url/dir1/dir2
Relevant snippets from man pages for convenience:
-P prefix
--directory-prefix=prefix
Set directory prefix to prefix. The directory prefix is the directory where all other files and subdirectories will be saved to, i.e. the top of the retrieval tree. The default is . (the current directory).
-nd
--no-directories
Do not create a hierarchy of directories when retrieving recursively. With this option turned on, all files will get saved to the current directory, without clobbering (if a name shows up more than once, the filenames will get extensions .n).
-np
--no-parent
Do not ever ascend to the parent directory when retrieving recursively. This is a useful option, since it guarantees that only the files below a certain hierarchy will be downloaded.
man wget:
-O file
--output-document=file
wget "url" -O /tmp/cron_test/<file>