Wget works for some websites but not for others?

I am on CentOS 8 and using the wget command to download some files. I am able to do so on certain websites but not on others. Here is an example that works for me:
wget https://forums.centos.org/index.php
--2021-07-26 20:50:40-- https://forums.centos.org/index.php
Resolving forums.centos.org (forums.centos.org)... 35.178.235.168
Connecting to forums.centos.org (forums.centos.org)|35.178.235.168|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘index.php’
index.php [ <=> ] 49.80K 319KB/s in 0.2s
2021-07-26 20:50:41 (319 KB/s) - ‘index.php’ saved [50997]
And here is an example that doesn't:
wget -d https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/_build/html/index.html
DEBUG output created by Wget 1.19.5 on linux-gnu.
Reading HSTS entries from /home/tuser1/.wget-hsts
URI encoding = ‘UTF-8’
Converted file name 'index.html' (UTF-8) -> 'index.html' (UTF-8)
--2021-07-26 20:27:16-- https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/_build/html/index.html
Certificates loaded: 147
Resolving www.bioinformatics.babraham.ac.uk (www.bioinformatics.babraham.ac.uk)... 149.155.133.4
Caching www.bioinformatics.babraham.ac.uk => 149.155.133.4
Connecting to www.bioinformatics.babraham.ac.uk (www.bioinformatics.babraham.ac.uk)|149.155.133.4|:443... Closed fd 3
failed: Connection timed out.
EDIT: I have tried pinging as well; I can ping yahoo.com but cannot ping google.com. With ping, too, some websites work and some do not.
I have disabled the firewall (firewalld) and also tried downloading with curl -O, but have not found a solution. Please let me know if there is any way to fix this.
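Since ping fails to the same hosts that wget cannot reach, this looks like a routing or path-MTU problem on the network rather than anything wget-specific. A diagnostic sketch, assuming a typical 1500-byte Ethernet MTU (the host below is just the failing example from above):
traceroute www.bioinformatics.babraham.ac.uk
# 1472 data bytes + 28 bytes of ICMP/IP headers = a full 1500-byte packet
ping -M do -s 1472 www.bioinformatics.babraham.ac.uk
# if the full-size ping is lost but a smaller one gets through, suspect a path-MTU problem
ping -M do -s 1200 www.bioinformatics.babraham.ac.uk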

Related

In Minikube Buildroot OS wget: not an http or ftp url

I have set up minikube on my machine using Hyper-V on Windows 10. Everything is working fine, but when I tried to set up the Flannel network I executed the following command:
wget http://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Output:
Connecting to raw.githubusercontent.com (151.101.192.133:80)
wget: not an http or ftp url: https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
I tried some solutions, like reinstalling wget and running hash -r, but they did not work.
Any ideas or suggestions to solve this?
Thank you,
Note that the actual URL your wget is trying to connect to is not an http but an https URL, and the output you attached says:
wget: not an http or ftp url
which is true, as https is neither an http nor an ftp URL. It looks like your wget version supports only the two mentioned protocols.
You can easily check this by issuing the following commands:
wget -V | grep https
and
wget -V | grep ssl
I tried to reproduce it on a system possibly similar to the one you're using. For this purpose I created a buildroot Pod from the advancedclimatesystems/docker-buildroot image:
kubectl run --generator=run-pod/v1 buildroot --image=advancedclimatesystems/docker-buildroot --command sleep 3600
and I attached to it by:
kubectl exec -ti buildroot /bin/sh
Once there, I tested your wget command and it was successful. Its output on my system looks like this (note the 301 redirect to the https URL):
root@buildroot:~/buildroot# wget http://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
--2019-12-31 16:04:27-- http://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml [following]
--2019-12-31 16:04:27-- https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14416 (14K) [text/plain]
Saving to: 'kube-flannel.yml'
kube-flannel.yml 100%[=====================================================================================================================>] 14.08K --.-KB/s in 0s
2019-12-31 16:04:27 (53.3 MB/s) - 'kube-flannel.yml' saved [14416/14416]
As you can see, it has built-in ssl and https support:
root@buildroot:~/buildroot# wget -V | grep ssl
+ntlm +opie +psl +ssl/openssl
-Wl,-z,relro -Wl,-z,now -lpcre -luuid -lidn2 -lssl -lcrypto -lpsl
ftp-opie.o openssl.o http-ntlm.o ../lib/libgnu.a
root@buildroot:~/buildroot# wget -V | grep https
-cares +digest -gpgme +https +ipv6 +iri +large-file -metalink +nls
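If your wget really does lack https support (BusyBox's wget applet is a common culprit on minimal images), one workaround that avoids wget altogether, assuming you are applying the manifest from a machine where a full kubectl is available, is to let kubectl fetch the URL itself, since kubectl apply -f accepts URLs directly:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml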

Wget - downloading all files from a webpage?

I'm using this wget command to download all the .fits files from that URL:
wget -r -np -nd -l inf -A fits https://archive.stsci.edu/missions/tess/ete-6/tid/00/000/000/057/
This is based on an adaptation of this answer.
All I'm getting is a directory structure that mirrors the URL on the website all the way down to /057/, but there are no files.
If I add -nd, then I only get a robots.txt file, which isn't very instructive, but still no files.
What am I not getting about how to use wget for this?
EDIT: based on Turgbek's answer below, I do see that the robots.txt file from that website actually has /missions/ in the "Disallow" section... maybe this is what is preventing me from using the wget command? Is that the source of the problem? How can I get around that?
In robots.txt there's a statement:
Disallow: /missions/
which your requested files are in. Since the URL builds up as /missions/tess/ete-6/tid/00/000/000/057/, I believe robots.txt is blocking you.
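You can confirm this yourself by printing the site's robots.txt to standard output (wget's -O - writes the download to stdout instead of a file):
wget -O - https://archive.stsci.edu/robots.txt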
I've saved two of the files from that URL on my Raspberry Pi and ran a local test without robots.txt, with this command:
wget -r -np -nd -l inf -A fits 192.168.1.250/test/
It worked as intended and I've received both of the files.
--2018-05-03 23:46:51-- http://192.168.1.250/test/tess2019128220341-0000000005712108-0016-s_lc.fits
Reusing existing connection to 192.168.1.250:80.
HTTP request sent, awaiting response... 200 OK
Length: 2090880 (2.0M)
Saving to: `192.168.1.250/test/tess2019128220341-0000000005712108-0016-s_lc.fits'
100%[==============================================================================>] 2,090,880 3.77M/s in 0.5s
2018-05-03 23:46:51 (3.77 MB/s) - `192.168.1.250/test/tess2019128220341-0000000005712108-0016-s_lc.fits' saved [2090880/2090880]
--2018-05-03 23:46:51-- http://192.168.1.250/test/tess2019128220341-0000000005715814-0016-s_lc.fits
Reusing existing connection to 192.168.1.250:80.
HTTP request sent, awaiting response... 200 OK
Length: 2090880 (2.0M)
Saving to: `192.168.1.250/test/tess2019128220341-0000000005715814-0016-s_lc.fits'
100%[==============================================================================>] 2,090,880 4.61M/s in 0.4s
2018-05-03 23:46:52 (4.61 MB/s) - `192.168.1.250/test/tess2019128220341-0000000005715814-0016-s_lc.fits' saved [2090880/2090880]
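As for the "How can I get around that?" part of the edit: by default wget honors robots.txt during recursive retrievals, but its documented -e robots=off switch turns that off (do check the archive's usage policy before relying on it). A minimal variant of the original command:
wget -r -np -nd -l inf -A fits -e robots=off https://archive.stsci.edu/missions/tess/ete-6/tid/00/000/000/057/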

Why can `wget` not follow the redirect for a certain website?

wget hangs while it accesses the following website, but when I use a browser to access it, it is redirected to https://nyulangone.org. Does anybody know why wget does not get redirected in this case? Thanks.
$ wget http://nyumc.org
--2018-02-20 20:27:05-- http://nyumc.org/
Resolving nyumc.org (nyumc.org)... 216.165.125.106
Connecting to nyumc.org (nyumc.org)|216.165.125.106|:80...
When I used wget on the site you mentioned, this is what I got:
--2018-02-21 21:16:38-- http://www.nyumc.org/
Resolving www.nyumc.org (www.nyumc.org)... 216.165.125.112
Connecting to www.nyumc.org (www.nyumc.org)|216.165.125.112|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 179 [text/html]
Saving to: ‘index.html’
index.html 100%[==================================>] 179 --.-KB/s in 0s
2018-02-21 21:16:38 (8.16 MB/s) - ‘index.html’ saved [179/179]
In the index.html file, which bears the logo of NYU Langone Medical Center, it says: "The following URL has been rejected for security concerns. If you believe you have received this message in error, please summit an incident with our helpdesk at 212-263-6868..." So, it may not redirect because the website can detect that you are a bot and not a browser. You could attempt to change the user-agent string and other HTTP headers to avoid detection, but I'm not sure why you wouldn't just point wget at https://nyulangone.org directly. Judging from information on archive.org, nyumc.org has been redirecting to other sites for at least the last 5 years: it was redirecting to http://www.med.nyu.edu until 2016, at which point it started redirecting to https://www.nyulangone.org.
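If you do want to experiment with masquerading as a browser, wget's --user-agent and --header options are the standard knobs; a sketch (the user-agent string below is just an example, not one the site is known to accept):
wget --user-agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36" http://nyumc.org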
I hope that helps.

WGET a Redmine CSV file

I am trying to get a CSV file from Redmine in a shell script. wget is failing with a 406 Not Acceptable error. Any ideas what the magical incantation is, or how to find it?
$ wget --no-check-certificate --accept csv https://username:password@company.com/redmine/issues.csv?utf8=%E2%9C%93&columns=all&description=1
Resolving company.com (company.com)... 192.168.1.45
Connecting to company.com (company.com)|192.168.1.45|:443... connected.
WARNING: The certificate of ‘company.com’ is not trusted.
WARNING: The certificate of ‘company.com’ hasn't got a known issuer.
HTTP request sent, awaiting response... 406 Not Acceptable
2017-04-04 10:14:20 ERROR 406: Not Acceptable.
You can try to replace --accept csv with --accept "*.csv". See the wget manual: https://www.gnu.org/software/wget/manual/wget.html#Recursive-Accept_002fReject-Options
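Note that wget's --accept/-A option only filters file-name suffixes during recursive downloads; it never sets the HTTP Accept header, and a 406 Not Acceptable is precisely a content-negotiation failure. A sketch worth trying (assuming your Redmine honors the Accept header) sends the header explicitly, and also quotes the URL so the shell does not interpret the & characters as command separators:
wget --no-check-certificate --header="Accept: text/csv" "https://username:password@company.com/redmine/issues.csv?utf8=%E2%9C%93&columns=all&description=1"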

Trying to recursively download zip files with wget fails with 401 error

I am trying to download all the zip files from the website http://www.minetad.gob.es/energia/balances/Publicaciones/ElectricasMensuales/Paginas/ElectricasMensuales.aspx
Note that the .zip files listed have URLs (found by right-clicking a link and selecting "Copy link address") of the form, for example (the first one): http://www.minetad.gob.es/energia/balances/Publicaciones/ElectricasMensuales/2015/Abril_2015.zip
I have tried all the combinations listed in How to download multiple .zip files?, but none of them works. They either download an empty structure of directories, or I get an error message.
For example, this command
wget -r -l1 -nd -H -t1 -N -np -A.zip -erobots=off -U mozilla --random-wait http://www.minetad.gob.es/energia/balances/Publicaciones/ElectricasMensuales/2015
gives me the following error message:
Resolving www.minetad.gob.es (www.minetad.gob.es)... 193.146.1.81, 2001:720:438:400::81
Connecting to www.minetad.gob.es (www.minetad.gob.es)|193.146.1.81|:80... connected.
HTTP request sent, awaiting response... 302 Redirect
Location: http://www.minetad.gob.es/energia/balances/Publicaciones/ElectricasMensuales/2015/Forms/AllItems.aspx [following]
--2017-02-05 18:52:40-- http://www.minetad.gob.es/energia/balances/Publicaciones/ElectricasMensuales/2015/Forms/AllItems.aspx
Reusing existing connection to www.minetad.gob.es:80.
HTTP request sent, awaiting response... 401 Unauthorized
Username/Password Authentication Failed.
Note that I can download every file individually by using its complete URL, like this:
wget http://www.minetad.gob.es/energia/balances/Publicaciones/ElectricasMensuales/2015/Abril_2015.zip
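Given that the direct file URLs work while the SharePoint listing page (Forms/AllItems.aspx) answers 401 Unauthorized, one pragmatic workaround is to skip recursion entirely and generate the URLs yourself. A minimal sketch, assuming the archives all follow the Spanish Mes_Año.zip naming seen above (the exact file names are not verified):
for mes in Enero Febrero Marzo Abril Mayo Junio Julio Agosto Septiembre Octubre Noviembre Diciembre; do
  wget "http://www.minetad.gob.es/energia/balances/Publicaciones/ElectricasMensuales/2015/${mes}_2015.zip"
done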