wget not following links with mirror

I'm trying to partially mirror a site. I want to download all of the MP3s without re-downloading those I already have (hence the "mirror" part). I ran the following:
wget -m -nd -e robots=off --random-wait -A "*.mp3" -P FOLDER http://www.example.com/
It downloads all the MP3s on the current page, but it never follows the links to the "Next Page" or the like. I've replaced -m with -N -c -r without success. What other options can I use?

Try:
wget --execute robots=off --recursive --accept mp3,MP3 --random-wait --no-parent --continue --no-clobber //site.com/

Related

wget converted some of the links - how to convert links after download?

I used
wget --mirror --convert-links http://example.com/ 2>&1 | tee -a wget.log
to download a website. It turns out that only some of the links were converted. How can I have all of the links converted, even after the download? I do not want to download all of the contents again.

First, be aware that --convert-links does its job only after everything has been downloaded, so if you inspect a downloaded file before wget has finished, you may see unconverted links.
Since you wrote "I do not want to download all of the contents again", you should use --no-clobber. However, according to the man page, --mirror is equivalent to -r -N -l inf --no-remove-listing, and --no-clobber and -N are mutually exclusive. You therefore cannot use --mirror itself, only its parts other than -N. Taking this into account, your command should look like this:
wget -r --no-clobber -l inf --no-remove-listing --convert-links http://example.com/

wget doesn't see/download certain files?

I am trying to download all files starting with traceroute from https://data-store.ripe.net/datasets/atlas-daily-dumps/ via wget.
I am running the following command:
wget -A traceroute* -m -np https://data-store.ripe.net/datasets/atlas-daily-dumps/ --no-check-certificate
It creates the directories, checks the index.html files, and then stops within 5 minutes without downloading any traceroute files.
When I try another type of file via
wget -A connection* -m -np https://data-store.ripe.net/datasets/atlas-daily-dumps/ --no-check-certificate
it downloads the connection files with no problem. What could be the issue?

You probably have a local file that matches the glob traceroute*; you need to put single quotes around it so the shell won't match anything:
wget -A 'traceroute*' -m -np https://data-store.ripe.net/datasets/atlas-daily-dumps/ --no-check-certificate
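To see why the quotes matter, here is a small sketch that needs no network. It assumes a stray local file named traceroute-notes.txt (a made-up name) sits in the working directory, and compares what a command receives with and without the quotes.

```shell
# Work in a throwaway directory so the demo cannot touch real files.
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1

# Hypothetical stray file whose name happens to match the pattern.
touch traceroute-notes.txt

# Unquoted: the shell expands the pattern before the command sees it.
unquoted=$(printf '%s' traceroute*)

# Quoted: the pattern is passed through unchanged.
quoted=$(printf '%s' 'traceroute*')

echo "unquoted argument: $unquoted"
echo "quoted argument:   $quoted"

# Clean up the throwaway directory.
cd / && rm -rf "$demo_dir"
```

When no local file matches, bash and POSIX sh leave the unquoted pattern as it is, which may be why connection* happened to work.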

Specifying traceroute*.bz2 seems to have fixed the problem.

Renaming a downloaded file using wget

Here is my wget command, with which I am trying to rename the file I am downloading. I am using the -O option, but it is not working.
access="http://mvn:8081/nexus/content/com/mvn/"
wget -r -np -nd -l1 -O "access.war" "$access" -A "com.infa.products.ldm.ingestion.access.web-"$n"-.-1-ldm-access-web.war"
Here I am renaming it to access.war. I can only use wget for this job due to some restrictions.
Thanks for the help.

The option -A is "comma separated", but you are using dots to separate the extensions!
Instead of
-A "com.infa.products.ldm.ingestion.access.web-"$n"-.-1-ldm-access-web.war"
Try
-A "com,infa,products,ldm,ingestion,access,web-"$n"-,-1-ldm-access-web,war"
If this is not the solution to your problem, I suggest you simplify your wget call down to something like this:
wget -r -np -nd -l1 -O "access.war" "$access"
just to verify that everything else is working.
Or even better (to get fewer files):
wget -r -np -nd -l1 -O "access.war" "$access" -A "war"
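Before swapping the dots for commas, it may help to check how the accept list is split. As far as the wget manual describes it, -A takes a comma-separated list of suffixes or patterns, so only commas act as separators and a dot is just part of the name. The sketch below mimics that splitting in plain shell; the value 42 for $n is a placeholder, count_patterns is a hypothetical helper, and the splitting is done by the shell here, not by wget itself.

```shell
# Hypothetical helper: split a value on commas only and count the pieces.
count_patterns() {
    old_ifs=$IFS
    IFS=,
    set -f            # keep wildcards in the value from being expanded
    set -- $1
    set +f
    IFS=$old_ifs
    echo "$#"
}

# Placeholder standing in for $n from the question.
n="42"

# The accept list as the question passes it (dots kept).
accept="com.infa.products.ldm.ingestion.access.web-${n}-.-1-ldm-access-web.war"

# The same value with every dot turned into a comma.
comma_version=$(printf '%s' "$accept" | tr '.' ',')

dotted_count=$(count_patterns "$accept")
comma_count=$(count_patterns "$comma_version")

echo "patterns with dots kept:   $dotted_count"
echo "patterns with commas used: $comma_count"
```

With the dots kept, the whole file name stays one pattern; with commas, each piece becomes its own entry in the list.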

wget -k -p -r exampleserver.com doesn't download images?

Title says it all. I tried to download a page with wget -k -p -r and it downloads only .html, .js and robots.txt. I need the images as well, but they didn't land in my folder. What's wrong? I used the same command on another page and it did what I wanted.

Add the -H option and it works. -H (--span-hosts) allows the recursive download to follow links to other hosts.
No it does not. It only downloaded a load of crap.
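A possible explanation for the "load of crap": -H (--span-hosts) lets the recursive download wander onto any host the pages link to. A hedged sketch, assuming the images are served from a separate host, is to pair -H with --domains so that only the listed domains are followed. The host cdn.example.net is made up for illustration, and the command is only printed here, not run.

```shell
# exampleserver.com comes from the question; cdn.example.net is a
# hypothetical image host used only for illustration.
site="exampleserver.com"
image_host="cdn.example.net"

# Span hosts, but only onto the domains listed after --domains.
cmd="wget -k -p -r -H --domains=$site,$image_host $site"

# Print rather than run, so the sketch has no side effects.
echo "$cmd"
```

Which host actually serves the images can be read from the src attributes in the downloaded HTML.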

how to use wget on a site with many folders and subfolders

I am trying to download this site with this command:
wget -r -l1 -H -t1 -nd -N -np -A.mp3 -erobots=off tenshi.spb.ru/anime-ost/
But I only get the index, and wget only enters the first folder, not the subfolders. Can you help me?

I use this command to download sites including their subfolders:
wget --mirror -p --convert-links -P . [site address]
A little explanation:
--mirror is a shortcut for -N -r -l inf --no-remove-listing.
--convert-links makes links in downloaded HTML or CSS point to local files.
-p gets all images and other files needed to display the HTML pages.
-P specifies that the next argument is the directory the files will be saved to.
I found the command at:
http://www.thegeekstuff.com/2009/09/the-ultimate-wget-download-guide-with-15-awesome-examples/

You use -l 1, also known as --level=1, which limits recursion to one level. Set it to a higher value to download more pages. BTW, I like long options like --level because it's easier to see what you are doing without going back to the man pages.
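As a sketch of that advice, here is the command from the question spelled out with long options and a deeper recursion limit. The depth of 3 is an assumed value, not something the question specifies; the command is only printed, not run.

```shell
# Target from the question; the depth of 3 is an assumption, adjust to the site.
url="tenshi.spb.ru/anime-ost/"
level=3

# Same options as the question, in long form, with a higher --level.
cmd="wget --recursive --level=$level --span-hosts --tries=1 \
--no-directories --timestamping --no-parent --accept=.mp3 \
--execute robots=off $url"

# Print rather than run, so the sketch has no side effects.
echo "$cmd"
```

Using --level=inf instead would remove the depth limit entirely, which is what --mirror does.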
