wget can't download - 404 error

I tried to download an image using wget but got an error like the following:
--2011-10-01 16:45:42-- http://www.icerts.com/images/logo.jpg
Resolving www.icerts.com... 97.74.86.3
Connecting to www.icerts.com|97.74.86.3|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2011-10-01 16:45:43 ERROR 404: Not Found.
My browser has no problem loading the image.
What's the problem?
curl can't download either.
Thanks.
Sam

You need to add the Referer field to the headers of the HTTP request. With wget, you just need the --header argument:
wget http://www.icerts.com/images/logo.jpg --header "Referer: www.icerts.com"
And the result:
--2011-10-02 02:00:18-- http://www.icerts.com/images/logo.jpg
Resolving www.icerts.com (www.icerts.com)... 97.74.86.3
Connecting to www.icerts.com (www.icerts.com)|97.74.86.3|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6102 (6.0K) [image/jpeg]
Saving to: 'logo.jpg'
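Since the question mentions that curl fails too: curl has an equivalent switch, -e / --referer, so a roughly equivalent call (a sketch, not tested against this server) would be:
curl -e "http://www.icerts.com/" -O http://www.icerts.com/images/logo.jpg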

I had the same problem with a Google Docs URL. Enclosing the URL in quotes did the trick for me:
wget "https://docs.google.com/spreadsheets/export?format=tsv&id=1sSi9f6m-zKteoXA4r4Yq-zfdmL4rjlZRt38mejpdhC23" -O sheet.tsv

You will also get a 404 error if you are using IPv6 and the server only accepts IPv4.
To force IPv4, add -4 to the request:
wget -4 http://www.php.net/get/php-5.4.13.tar.gz/from/this/mirror
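To confirm that this is what is happening, you can check which address families the host actually advertises (a quick sketch, assuming dig is installed):
dig +short AAAA www.php.net
dig +short A www.php.net
If the AAAA lookup returns an address but the server never answers on it, forcing -4 as above is the simplest workaround.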

I had the same problem.
I solved it using single quotes, like this:
$ wget 'http://www.icerts.com/images/logo.jpg'
wget version in use:
$ wget --version
GNU Wget 1.11.4 Red Hat modified

A wget 404 error also regularly happens when you try to download the pages of a WordPress website by typing
wget -r http://somewebsite.com
If the website is built with WordPress, you'll get an error like this:
ERROR 404: Not Found.
There's no way to mirror a WordPress website this way, because the content is stored in the database and wget is not able to grab the .php files. That's why you get the wget 404 error.
I know that's not the case in this question, since Sam only wants to download a single picture, but it may be helpful for others.

I don't know the exact reason, but I have faced this kind of problem myself.
If you have the domain's IP address (e.g. 208.113.139.4), use the IP address instead of the domain (in this case www.icerts.com):
wget 192.243.111.11/images/logo.jpg
You can find the IP for a URL at https://ipinfo.info/html/ip_checker.php
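If you prefer staying on the command line, a lookup with host or nslookup (available on most systems) gives the same information:
host www.icerts.com
nslookup www.icerts.com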

I want to add something to blotus's answer.
In case adding the referer header does not solve the issue, maybe you are using the wrong referer (sometimes the referer is different from the URL's domain name).
Paste the URL into a web browser and find the referer in the developer tools (Network -> Request Headers).
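For example, if the Network tab shows the image being requested from a gallery page, pass that page as the referer (the URLs below are hypothetical placeholders):
wget --header "Referer: https://www.example.com/gallery/photos.html" https://www.example.com/images/logo.jpg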

I ran into exactly the same problem while setting up GitHub Actions with Cygwin. Only after running wget --debug <url> did I realize that the URL had a 0x0d byte appended to it, i.e. \r (carriage return).
For this kind of problem there is a solution described in the Cygwin docs:
you can also use igncr in the SHELLOPTS environment variable
So I added the following lines to my YAML workflow to make wget, as well as the other shell commands in my GitHub Actions workflow, work properly:
env:
  SHELLOPTS: igncr
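If you'd rather not touch SHELLOPTS, another option is to strip the carriage return from the value yourself before calling wget (a sketch, assuming the URL is in a shell variable named url):
url=$(printf '%s' "$url" | tr -d '\r')
wget "$url"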

Related

wget gets Read error (Connection reset by peer) in headers

I searched a lot but no solution helped me with this problem where I get this error constantly.
HTTP request sent, awaiting response... Read error (Connection reset by peer) in headers. Retrying.
I tried the following commands:
wget url
wget -O url
wget -O url username="user" password="pass" host="host" (something like this)
I am just trying to download the HTML of a page on a secure website, but it shows this error every time. I then tried downloading other web pages, but that didn't work either. Is it a server configuration problem?
This error can occur if you access a website via HTTP but it's trying to redirect you to HTTPS
So if your command was
wget http://url
Try changing it to
wget https://url
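You can check whether that redirect is what's happening by requesting only the headers first (a sketch; example.com stands in for the real host):
curl -sI http://example.com/ | grep -i '^location'
If a Location: https://... line comes back, switch the wget call to the https:// URL.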
I encountered a similar issue today. Our IT team suggested using "https" instead of "http" in the URL and adding --no-check-certificate to the wget call; that worked for me.
Websites may stop serving unencrypted HTTP transfers at some point, which might lead to this issue.
The following command works for me:
wget -O test.html http://url --auth-no-challenge --force-directories
try with "sudo" privilege, it worked for me.
sudo wget url

wget, curl or whatever: get filename only in case of 302 redirect

I want to download some files, but first check whether I already have them. The problem is that I don't have their real URLs; they are behind a 302 redirect. See the wget output:
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: ./epub/balzac_37_un_grand_homme_de_province_a_paris.plain.epub [following]
--2016-02-26 19:38:29-- http://www.ebooksgratuits.com/epub/balzac_37_un_grand_homme_de_province_a_paris.plain.epub
Now this "./epub/balzac_37_un_grand_homme_de_province_a_paris.plain.epub"
string is exactly what I would like to have, but WITHOUT downloading,
because I want to check if I have the file already and can avoid downloading.
Is it possible to tell wget, curl or whatever tool
to give me that local path without downloading?
It seems this does the job:
curl -w "%{redirect_url}" URL
Without the -L option, it doesn't download from the new place,
and with -w "%{redirect_url}" it prints the desired redirect URL.
I still have to investigate to be sure...
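In the meantime, a minimal sketch of the whole check (assuming a POSIX shell and that the basename of the redirect target is the name you keep locally):
target=$(curl -s -o /dev/null -w '%{redirect_url}' "$url")   # resolve the 302 without following it
file=$(basename "$target")
[ -e "$file" ] || wget -O "$file" "$target"                   # download only if we don't have it yet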

How to download a page with wget but ignore 404 error messages if the page does not exist?

Is there any way to have wget ignore HTTP error response codes when downloading a URL or spidering a webpage?
Assuming I understood what you mean by "ignoring errors", you can try the --content-on-error option. According to the wget manual, it forces wget to keep the content it receives even when the server responds with an error status code.
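For example (a sketch; --content-on-error needs a reasonably recent wget, 1.14 or later):
wget --content-on-error -O page.html http://example.com/maybe-missing-page
wget still exits with a non-zero status on a 404, but the error page the server sent is saved to page.html instead of being discarded.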

Issue processing RSS feed with Perl/CURL

I have this RSS feed URL:
http://mediosymedia.com/wp-content/plugins/nextgen-gallery/xml/media-rss.php
A client is trying to access this RSS feed programmatically via Perl, like this:
# Fetch the content available in source HTTP URL
`curl -g --compressed "$source_url" > $tempRSSFile`;
Where $source_url is http://mediosymedia.com/wp-content/plugins/nextgen-gallery/xml/media-rss.php
But they said they couldn't access the feed this way with my URL. I know nothing about Perl, so could you guys point me in the right direction to make a compatible URL for the feed?
Thanks a lot!
The problem has nothing to do with Perl. If you run the curl command from the command line, you get a 406 Not Acceptable error. One possibility is to trick mod_security by using another User-Agent header. This works right now:
curl --user-agent Mozilla/5.0 -g --compressed http://mediosymedia.com/wp-content/plugins/nextgen-gallery/xml/media-rss.php > /tmp/feed.rss
But it would be better, as amon already said, to fix the server so that RSS downloads are also allowed for curl.
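If the client wants to keep the backtick call from the question, the same workaround carries over directly (a sketch reusing the question's variables):
# Pretend to be a regular browser so mod_security accepts the request
`curl --user-agent "Mozilla/5.0" -g --compressed "$source_url" > $tempRSSFile`;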

How to skip selected url while mirroring site with wget

I have the following problem: I need to mirror a password-protected site. Sounds like a simple task:
wget -m -k -K -E --cookies=on --keep-session-cookies --load-cookies=myCookies.txt http://mysite.com
In myCookies.txt I keep the proper session cookie. This works until wget comes across the logout page; then the session is invalidated and, effectively, further mirroring is useless.
I tried to add the --reject option, but it works only with file types: I can block only HTML file downloads or SWF file downloads, but I can't say
--reject http://mysite.com/*.php?type=Logout*
Any ideas how to skip certain URLs with wget? Maybe there is another tool that can do the job (it must work on MS Windows).
What if you first download (or even just touch) the logout page, and then run
wget --no-clobber --your-original-arguments
This should skip the logout page, as it has already been downloaded.
(Disclaimer: I didn't try this myself.)
I have also encountered this problem and later solved it like this: --reject-regex logout (see the wget-dev tips for more).
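Combined with the original mirroring command, that would look roughly like this (a sketch; --reject-regex matches against the complete URL, so any URL containing type=Logout is skipped, and it requires wget 1.14 or newer):
wget -m -k -K -E --keep-session-cookies --load-cookies=myCookies.txt --reject-regex "type=Logout" http://mysite.com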