Retrieve file path downloaded via wget

I am downloading files with the wget command:
wget abc.com -nH -r -l1 --no-parent
This stores the files in different subfolders. I want the path of each downloaded file. How do I get it?
Example:
wget is downloading files to:
c:/test/com/test/package/filename1.text
c:/test/com1/test/package1/filename2.text
So, how do I retrieve the complete file path, i.e. com/test/package/filename1.text?
Thanks
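One possible approach, sketched under the assumption that the download has finished and that everything wget wrote sits under a single root directory (c:/test in the example above): walk that directory with find and print each file's path relative to the root. The function name list_downloaded is hypothetical.

```shell
# Print the path of every regular file under a download root,
# relative to that root (one path per line, sorted bytewise).
# Assumption: the root holds only files written by wget.
list_downloaded() {
    root=${1:-.}
    ( cd "$root" && find . -type f | sed 's|^\./||' | LC_ALL=C sort )
}

# Possible use (the directory is taken from the question):
# list_downloaded c:/test
```

Running find after the download avoids depending on wget's log format, at the cost of also listing any files that were already in the directory.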

Related

wget download only the sub directory

I need to download only the subdirectory named pyVim with all its content, but I am getting the parents as well, even though I tried the following options:
wget -r --no-parent http://server/pub/scripts/pyVim
Result: the server directory with its subdirectories.
I also tried:
wget -r -X pub,scripts --no-parent http://server/pub/scripts/pyVim
I tried a few more options; none of them work.
I just need to download the pyVim directory with its content to the current directory.
You said pyVim is a directory, but the URL you passed to wget indicates that pyVim is a file in the directory scripts.
To explicitly tell wget that pyVim is a directory, pass a trailing /. So your final command is:
wget -r --no-parent http://server/pub/scripts/pyVim/
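Even with the trailing slash, wget by default recreates the host and parent directories (server/pub/scripts/) locally. Its -nH (--no-host-directories) and --cut-dirs=N options drop the host directory and the first N remote directory components. A minimal sketch, assuming a URL of the shape shown above; parent_dir_count is a hypothetical helper that computes N from the URL.

```shell
# Count the directory components that precede the last one in a URL path,
# e.g. http://server/pub/scripts/pyVim/ -> 2 (pub and scripts).
parent_dir_count() {
    path=${1#*://}          # drop the scheme:  server/pub/scripts/pyVim/
    path=${path#*/}         # drop the host:    pub/scripts/pyVim/
    path=${path%/}          # drop final slash: pub/scripts/pyVim
    case $path in
        */*) ;;             # at least one parent component
        *)   echo 0; return ;;
    esac
    parents=${path%/*}      # keep the parents: pub/scripts
    printf '%s\n' "$parents" | tr '/' '\n' | wc -l | tr -d ' '
}

# Possible use (not run here):
# url=http://server/pub/scripts/pyVim/
# wget -r --no-parent -nH --cut-dirs="$(parent_dir_count "$url")" "$url"
```

With both options set this way, the files should land in ./pyVim/ rather than ./server/pub/scripts/pyVim/.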

Using wget to download multiple urls.

I have a file urls.txt, which contains multiple URLs.
I am using wget to download the web content like this:
wget -i urls.txt
The web content is saved in a separate file for each link. I want everything saved in a single txt file.
This will store the request details in messages.txt and all of the downloaded content in html.txt:
$ wget -a messages.txt -i urls.txt -O html.txt
From wget --help:
Logging and input file:
-a, --append-output=FILE append messages to FILE
Download:
-O, --output-document=FILE write documents to FILE
Tested on GNU Wget 1.19.1 built on darwin16.6.0.

wget - list of all the files url in the directory recursively but not download them?

wget --recursive and other options are used to download files. Is there a way to list the files' URLs without downloading the files themselves?
You can run wget -r --spider, which issues HEAD requests to retrieve the response headers without the file content.
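To turn the spider run into a list of URLs, one option is to capture wget's log and pull out the request lines. A minimal sketch, assuming the log uses request lines of the form "--DATE TIME--  URL" (the format may differ between wget versions); urls_from_log and spider.log are hypothetical names.

```shell
# Read a wget log on stdin and print each requested URL once.
# Assumption: request lines look like
#   --2017-08-01 12:00:00--  http://server/dir/file
urls_from_log() {
    awk '/^--[0-9]/ { print $3 }' | LC_ALL=C sort -u
}

# Possible use (not run here):
# wget -r --no-parent --spider -o spider.log http://server/dir/
# urls_from_log < spider.log
```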

wget corrupted file .zip file error

I am using wget to try and download two .zip files (SWVF_1_44.zip and SWVF_44_88.zip) from this site: http://www2.sos.state.oh.us/pls/voter/f?p=111:1:0::NO:RP:P1_TYPE:STATE
when I run:
wget -r -l1 -H -t1 -nd -N -np -A.zip -erobots=off "http://www2.sos.state.oh.us/pls/voter/f?p=111:1:0::NO:RP:P1_TYPE:STATE/SWVF_1_44.zip"
I get a downloaded zip file that has a screwed up name (f#p=111%3A1%3A0%3A%3ANO%3ARP%3AP1_TYPE%3ASTATE%2FSWVF_1_44) and it cannot be opened.
Any thoughts on where my code is wrong?
There's nothing "wrong" with your code. Wget simply assumes you want to save the file under the same name that appears in the URL. Use the -O option to specify an output file:
wget blahblahblah -O useablefilename.zip
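If the renamed file still cannot be opened, it may help to check whether the server actually returned a ZIP archive rather than, say, an HTML page. A rough sketch, assuming a head that supports -c (the GNU and BSD versions do); looks_like_zip is a hypothetical helper that only checks the two-byte "PK" signature ZIP archives start with.

```shell
# Succeed if the file starts with the bytes "PK", the signature that
# opens a ZIP archive. This is a quick sanity check, not a full test.
looks_like_zip() {
    [ "$(head -c 2 "$1" 2>/dev/null)" = "PK" ]
}

# Possible use (not run here):
# looks_like_zip useablefilename.zip || echo "not a ZIP archive" >&2
```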

How to specify the download location with wget?

I need files to be downloaded to /tmp/cron_test/. My wget command is:
wget --random-wait -r -p -nd -e robots=off -A".pdf" -U mozilla http://math.stanford.edu/undergrad/
So is there some parameter to specify the directory?
From the manual page:
-P prefix
--directory-prefix=prefix
Set directory prefix to prefix. The directory prefix is the
directory where all other files and sub-directories will be
saved to, i.e. the top of the retrieval tree. The default
is . (the current directory).
So you need to add -P /tmp/cron_test/ (short form) or --directory-prefix=/tmp/cron_test/ (long form) to your command. Also note that if the directory does not exist it will get created.
-O is the option to specify the path of the file you want to download to:
wget <uri> -O /path/to/file.ext
-P is the prefix, i.e. the directory into which the file is downloaded:
wget <uri> -P /path/to/folder
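Put together with the command from the question, the -P variant might look like the sketch below. download_pdfs is a hypothetical wrapper; the wget options are the ones from the question plus -P.

```shell
# Run the recursive PDF download from the question, saving into "$1".
# wget creates the directory if it does not exist yet.
download_pdfs() {
    dest=$1
    url=$2
    wget --random-wait -r -p -nd -e robots=off -A '.pdf' -U mozilla \
         -P "$dest" "$url"
}

# Possible use (not run here):
# download_pdfs /tmp/cron_test/ http://math.stanford.edu/undergrad/
```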
Make sure the URL is correct for whatever you are downloading. First of all, URLs containing characters such as ? must be quoted, because the shell otherwise tries to interpret them itself. Second, wget derives the output file name from the URL, so any part of the URL that is not resolved ends up in the name of the file you are downloading into.
For example:
wget "sourceforge.net/projects/ebosse/files/latest/download?source=typ_redirect"
will download into a file named ?source=typ_redirect.
As you can see, knowing a thing or two about URLs helps to understand wget.
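To keep a query string out of the file name, one option is to derive a name from the URL yourself and pass it to -O. A minimal sketch; name_from_url is a hypothetical helper that keeps the last path component and drops everything from the first ? onward. It assumes the query string itself contains no slash.

```shell
# Derive a local file name from a URL: last path component, no query string.
# Falls back to index.html when the URL ends in a slash.
name_from_url() {
    name=${1##*/}       # keep what follows the last slash
    name=${name%%\?*}   # drop the query string, if any
    printf '%s\n' "${name:-index.html}"
}

# Possible use (not run here):
# url="sourceforge.net/projects/ebosse/files/latest/download?source=typ_redirect"
# wget "$url" -O "$(name_from_url "$url")"
```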
I was booting from a Hiren's disk and only had Linux 2.6.1 as a resource (import os is unavailable). The syntax that solved my problem of downloading an ISO onto the physical hard drive was:
wget "(source url)" -O "(directory where HD was mounted)/isofile.iso"
You can find the correct URL by checking at what point wget downloads into a file named index.html (the default file) and reports the correct size and other attributes of the file you need, using the following command:
wget "(source url)"
Once the URL and source file are correct and wget is downloading into index.html, you can stop the download (Ctrl+C) and change the output file by adding:
-O "<specified download directory>/filename.extension"
after the source url.
In my case this results in downloading an ISO and storing it as a binary file under isofile.iso, which hopefully mounts.
"-P" is the right option. Read on for more related information:
wget -nd -np -P /dest/dir --recursive http://url/dir1/dir2
Relevant snippets from man pages for convenience:
-P prefix
--directory-prefix=prefix
Set directory prefix to prefix. The directory prefix is the directory where all other files and subdirectories will be saved to, i.e. the top of the retrieval tree. The default is . (the current directory).
-nd
--no-directories
Do not create a hierarchy of directories when retrieving recursively. With this option turned on, all files will get saved to the current directory, without clobbering (if a name shows up more than once, the
filenames will get extensions .n).
-np
--no-parent
Do not ever ascend to the parent directory when retrieving recursively. This is a useful option, since it guarantees that only the files below a certain hierarchy will be downloaded.
man wget:
-O file
--output-document=file
wget "url" -O /tmp/cron_test/<file>