wget gets Read error (Connection reset by peer) in headers

I have searched a lot, but no solution has helped with this problem; I get the following error constantly:
HTTP request sent, awaiting response... Read error (Connection reset
by peer) in headers. Retrying.
I tried the following commands:
wget url
wget -O url
wget -O url username="user" password="pass" host="host" (something like this)
I am just trying to download the HTML of a page on a secure website, but it shows this error every time. I then tried downloading other web pages and it still doesn't work. Is it a server configuration problem?

This error can occur if you access a website via HTTP but it tries to redirect you to HTTPS.
So if your command was
wget http://url
Try changing it to
wget https://url

I encountered a similar issue today. Our IT team suggested using "https" instead of "http" in the URL, together with "wget --no-check-certificate", and that worked for me.
Websites may stop serving unencrypted HTTP at some point, which can lead to this issue.
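A minimal sketch of that suggestion (the URL is just a placeholder):
# Use HTTPS and skip certificate verification (only do this if you trust the host)
wget --no-check-certificate https://example.com/page.html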

The following command works for me:
wget -O test.html http://url --auth-no-challenge --force-directories

Try with "sudo" privileges; it worked for me:
sudo wget url

Related

wget, curl or whatever: get filename only in case of 302 redirect

I want to download some files, but first check whether I already have them.
The problem is that I don't have their real URLs; they sit behind
a 302 redirect. See the wget output:
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: ./epub/balzac_37_un_grand_homme_de_province_a_paris.plain.epub [following]
--2016-02-26 19:38:29-- http://www.ebooksgratuits.com/epub/balzac_37_un_grand_homme_de_province_a_paris.plain.epub
Now this "./epub/balzac_37_un_grand_homme_de_province_a_paris.plain.epub"
string is exactly what I would like to have, but WITHOUT downloading,
because I want to check if I have the file already and can avoid downloading.
Is it possible to tell wget, curl or whatever tool
to give me that local path without downloading?
It seems this does the job:
curl -w "%{redirect_url}" URL
Without the -L option it does not download from the new location,
and with -w "%{redirect_url}" it prints the desired redirect URL.
I still have to investigate to be sure...
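A slightly fuller sketch of the same idea (the URL is a placeholder for the page that issues the 302, and it assumes the redirect target maps directly to a local file name):
# Print only the redirect target; discard the 302 response body and don't follow it
target=$(curl -s -o /dev/null -w "%{redirect_url}" "http://www.ebooksgratuits.com/some-download-page")
# Check for a local copy before deciding to download
if [ -e "$(basename "$target")" ]; then echo "already downloaded"; else echo "need to fetch it"; fi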

How to download a page with wget but ignore 404 error messages if the page does not exist?

Is there any way to have wget ignore HTTP error response codes when downloading a URL or spidering a webpage?
Assuming I understood what you mean by "ignoring errors", you can try the --content-on-error option. According to the wget manual, it tells wget not to skip the content when the server responds with an HTTP status code that indicates an error.
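For example (the URL is a placeholder):
# Save whatever body the server returns, even on a 404
wget --content-on-error -O page.html http://example.com/maybe-missing
# wget still exits with a non-zero status on a 404, so check $? if you need to detect it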

Issue processing RSS feed with Perl/CURL

I have this RSS feed URL:
http://mediosymedia.com/wp-content/plugins/nextgen-gallery/xml/media-rss.php
A client is trying to access this RSS feed programmatically via Perl, like this:
# Fetch the content available in source HTTP URL
`curl -g --compressed "$source_url" > $tempRSSFile`;
Where $source_url is http://mediosymedia.com/wp-content/plugins/nextgen-gallery/xml/media-rss.php
But they said they couldn't access the feed this way with my URL. I know nothing about Perl, so could you point me in the right direction to make a compatible URL for the feed?
Thanks a lot!
The problem has nothing to do with Perl. If you run the curl command from the command line, you get an Error 406 - Not Acceptable. One possibility is to trick mod_security by using another User-Agent header. This works right now:
curl --user-agent Mozilla/5.0 -g --compressed http://mediosymedia.com/wp-content/plugins/nextgen-gallery/xml/media-rss.php > /tmp/feed.rss
But the better fix, as amon already said, is to repair the server configuration so that the RSS feed can also be downloaded with curl.
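If the client wants to keep the Perl backtick call, the same workaround can be dropped into their existing snippet (a sketch reusing their $source_url and $tempRSSFile variables):
# Fetch the feed with a browser-like User-Agent so mod_security accepts the request
`curl --user-agent "Mozilla/5.0" -g --compressed "$source_url" > $tempRSSFile`;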

wget returns only (an expected) "401 unauthorized" but no html

In using a browser to test our website, I enter a purposefully incorrect username/password and get an HTML page back from the application telling me the login has failed and to "Please check my username & password, then try again". Viewing the source for this page, I can't find a "401" embedded anywhere.
But mimicking the above using wget:
wget http://servername:8011/ui/login.do --post-data="loginId=NotAUser&password=NotAPassword" -U Mozilla -o log.txt
the output file contains "401 Unauthorized", but none of the html mentioned above.
Is there a way to get the HTML page I was expecting, and just check for the 401 return code with a "$?" test?
The "401 Unauthorised" you're getting is in the headers of the response from the server. A 401 response doesn't have an HTML - it's just an error code.
The page that you describe in the browser is actually generated by the browser, not sent back from the server. Browser vendors generate meaningful error pages rather than just displaying "401 Unauthorized" or "404 Page Not Found" to the user. There's no way to get the HTML code you're seeing using wget because it's not part of the HTTP conversation.
wget used to stop on a non-200 response code and not retrieve the body of the reply. This behaviour has since been corrected, and versions 1.14-1 and later should work as expected:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=247985
If you're stuck with an older version or can't get it to work, why not try curl instead?
https://superuser.com/questions/253826/how-to-use-wget-to-download-http-error-pages
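If curl is an option, here is a sketch that saves whatever body the server does send and checks the status code separately (the server name, port, and POST fields are taken from the question):
# Save the response body (if any) and capture the HTTP status code
status=$(curl -s -o reply.html -w "%{http_code}" -A Mozilla \
  --data "loginId=NotAUser&password=NotAPassword" \
  http://servername:8011/ui/login.do)
if [ "$status" = "401" ]; then
  echo "Login rejected (401); any body the server sent is in reply.html"
fi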

wget can't download - 404 error

I tried to download an image using wget but got an error like the following.
--2011-10-01 16:45:42-- http://www.icerts.com/images/logo.jpg
Resolving www.icerts.com... 97.74.86.3
Connecting to www.icerts.com|97.74.86.3|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2011-10-01 16:45:43 ERROR 404: Not Found.
My browser has no problem loading the image.
What's the problem?
curl can't download either.
Thanks.
Sam
You need to add the Referer field to the headers of the HTTP request. With wget, you just need the --header argument:
wget http://www.icerts.com/images/logo.jpg --header "Referer: www.icerts.com"
And the result:
--2011-10-02 02:00:18-- http://www.icerts.com/images/logo.jpg
Resolving www.icerts.com (www.icerts.com)... 97.74.86.3
Connecting to www.icerts.com (www.icerts.com)|97.74.86.3|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6102 (6.0K) [image/jpeg]
Saving to: "logo.jpg"
I had the same problem with a Google Docs URL. Enclosing the URL in quotes did the trick for me (without quotes the shell interprets the & in the query string and cuts the URL short):
wget "https://docs.google.com/spreadsheets/export?format=tsv&id=1sSi9f6m-zKteoXA4r4Yq-zfdmL4rjlZRt38mejpdhC23" -O sheet.tsv
You will also get a 404 error if you are using IPv6 and the server only accepts IPv4.
To force IPv4, add -4 to the request:
wget -4 http://www.php.net/get/php-5.4.13.tar.gz/from/this/mirror
I had the same problem.
I solved it by using single quotes, like this:
$ wget 'http://www.icerts.com/images/logo.jpg'
wget version in use:
$ wget --version
GNU Wget 1.11.4 Red Hat modified
A wget 404 error also commonly appears when you try to download the pages of a WordPress website by typing
wget -r http://somewebsite.com
If the website is built with WordPress you'll get an error like this:
ERROR 404: Not Found.
There's no way to mirror a WordPress website this way, because the content is stored in the database and wget is not able to fetch the .php files. That's why you get the wget 404 error.
I know that's not this question's case, because Sam only wants to download a single picture, but it can be helpful for others.
I don't know exactly what the reason is, but I have run into this kind of problem.
If you have the domain's IP address (e.g. 208.113.139.4), use the IP address instead of the domain name (in this case www.icerts.com):
wget 192.243.111.11/images/logo.jpg
You can look up the IP for a URL at https://ipinfo.info/html/ip_checker.php
I want to add something to blotus's answer above:
If adding the Referer header does not solve the issue, you may be using the wrong referrer (sometimes the referrer is different from the URL's domain name).
Open the URL in a web browser and find the referrer in the developer tools (Network -> Request Headers).
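For example, if the developer tools show that the working request was sent with a different Referer (the values here are hypothetical):
wget http://cdn.example.com/images/logo.jpg --header "Referer: http://www.example.com/gallery/"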
I ran into exactly the same problem while setting up GitHub Actions with Cygwin. Only after running wget --debug <url> did I realize that the URL had a 0x0d byte appended, which is \r (carriage return).
For this kind of problem there is a solution described in the docs:
you can also use igncr in the SHELLOPTS environment variable
So I added the following lines to my YAML workflow to make wget (and the other shell commands in my GitHub Actions workflow) work properly:
env:
  SHELLOPTS: igncr
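If changing SHELLOPTS is not an option, another way is to strip the carriage return yourself before calling wget (a sketch; url is a hypothetical shell variable holding the affected value):
# Remove any \r picked up from Windows-style line endings
url=$(printf '%s' "$url" | tr -d '\r')
wget "$url"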