wget not working with 404 not found

The exact same URL downloads fine in Chrome, but when I try it with wget I get this error:
xxx#yyy:~/dataset/imagenet_synsets$ wget http://image-net.org/download/synset?wnid=xxx&username=xxx&accesskey=xxx&release=latest&src=stanford
[1] 5842
[2] 5843
[3] 5844
[4] 5845
[2] Done username=xxx
xxx#xxx:~/dataset/imagenet_synsets$ --2017-05-12 11:11:31-- http://image-net.org/download/synset?wnid=xxx
Resolving image-net.org (image-net.org)... 171.64.68.16
Connecting to image-net.org (image-net.org)|171.64.68.16|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2017-05-12 11:11:32 ERROR 404: Not Found.

The shell output gives the cause away: the unquoted & characters split the command into several background jobs (the [1] 5842 ... lines), so wget only receives the URL up to the first &, and that truncated request returns 404. I would suggest using curl with the special characters escaped.
Use the -L flag to make curl follow any redirects, and -O or -o <filename> to write the output to a file:
curl -L -O http://image-net.org/download/synset\?wnid\=xxx\&username\=xxx\&accesskey\=xxx\&release\=latest\&src\=stanford
or (saving to myfile.html):
curl -L -o myfile.html http://image-net.org/download/synset\?wnid\=xxx\&username\=xxx\&accesskey\=xxx\&release\=latest\&src\=stanford
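Alternatively, quoting the whole URL should work just as well and avoids escaping each character individually, with either curl or wget; for example, saving to myfile.html:
curl -L -o myfile.html 'http://image-net.org/download/synset?wnid=xxx&username=xxx&accesskey=xxx&release=latest&src=stanford'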

Related

Executing wget command: 404 Not Found error is being thrown

When I execute the wget command below, the following error is thrown. Any suggestions?
command:
wget --user 'myusername' --password 'mypassword' -r -A '*.jpg' 'http://123.12.12.123:8090/file/settings'
exception:
--2021-04-18 20:17:46-- http://123.12.12.123:8090/file/settings
Connecting to 123.12.12.123:8090... connected.
Proxy request sent, awaiting response... 404 Not Found
2021-04-18 20:17:46 ERROR 404: Not Found.

How to confirm Solr is running from the command line?

We have a few servers that are going to be rebooted soon and I may have to restart Apache Solr manually.
How can I verify (from the command line) that Solr is running?
The proper way is to use Solr's STATUS command. You could parse its XML response, but as long as it returns something to you with an HTTP status of 200, it should be safe to assume it's running. You can perform an HTTP HEAD request using curl with:
curl -s -o /dev/null -I -w '%{http_code}' 'http://example.com:8983/solr/admin/cores?action=STATUS'
Note: you can also add -m <seconds> to the command to wait at most that many seconds for a response.
This makes a request to the Solr admin interface and prints 200 on success, which can be used from a bash script such as:
RESULT=$(curl -s -o /dev/null -I -w '%{http_code}' 'http://example.com:8983/solr/admin/cores?action=STATUS')
if [ "$RESULT" -eq 200 ]; then
  # Solr is running...
else
  # Solr is not running...
fi
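If you want the timeout from the note above, the same check with a time limit could look like this (a sketch, assuming a 5-second limit is acceptable):
RESULT=$(curl -s -m 5 -o /dev/null -I -w '%{http_code}' 'http://example.com:8983/solr/admin/cores?action=STATUS')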
If you are on the same machine where Solr is running, then this is my favourite:
$> solr status

How to set up cron using curl command?

After Apache was rebuilt, my cron jobs stopped working.
I used the following command:
wget -O - -q -t 1 http://example.com/cgi-bin/loki/autobonus.pl
Now my DC support suggests that I switch from wget to curl. What would be the equivalent command in this case?
-O - is equivalent to curl's default behavior, so that's easy.
-q is curl's -s (or --silent)
--retry N will substitute for wget's -t N
All in all:
curl -s --retry 1 http://example.com/cgi-bin/loki/autobonus.pl
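In the crontab itself the entry could look something like this (a sketch; the every-15-minutes schedule is just a placeholder, keep whatever schedule the original job used):
*/15 * * * * curl -s --retry 1 http://example.com/cgi-bin/loki/autobonus.pl > /dev/null 2>&1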
Alternatively, try running the command with the full path to wget:
/usr/bin/wget -O - -q -t 1 http://example.com/cgi-bin/loki/autobonus.pl
You can find the full path with:
which wget
Also check whether you can reach the destination domain with ping or another method:
ping example.com
Update:
Based on the comments, the problem seems to be caused by this line in /etc/hosts:
127.0.0.1 example.com #change example.com to the real domain
That leaves you with limited options: on the server where the cron job runs, the domain is pinned to 127.0.0.1, but the virtual host configuration does not serve the site on that address.
What you can do is let wget connect by IP but send the Host header, so that virtual host matching still works:
wget -O - -q -t 1 --header 'Host: example.com' http://xx.xx.35.162/cgi-bin/loki/autobonus.pl
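Since the goal was to switch to curl anyway, the same Host-header trick should work there too; a sketch using the question's placeholder IP:
curl -s --retry 1 -H 'Host: example.com' http://xx.xx.35.162/cgi-bin/loki/autobonus.pl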
Update
Also, you probably don't need to go through the web server at all, so why not just run the script directly:
perl /path/to/your/script/autobonus.pl

wget not following links with spider

I am trying to check a page and all of its links as well as images.
The following command stops after the initial page, and I get very little output.
wget -v -r --spider -o /Users/SSSSS/Desktop/file21.txt http://www.WWWWWWW.com/SSSSSS/index.php

Can I use wget to check, but not download

Can I use wget to check for a 404 and not actually download the resource?
If so how?
Thanks
There is a command-line parameter --spider exactly for this. In this mode, wget does not download the file, and its return value is zero if the resource was found and non-zero if it was not. Try this (in your favorite shell):
wget -q --spider address
echo $?
Or, if you want full output, leave off the -q, so just wget --spider address. -nv shows some output, but not as much as the default.
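Since the result is carried in the exit status, a small wrapper can act on it directly; a sketch (the URL is just an example):
if wget -q --spider "http://example.com/some/page"; then
  echo "resource found"
else
  echo "resource not found (or another wget error)"
fi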
If you want to check quietly via $? without the hassle of grepping wget's output, you can use:
wget -q "http://blah.meh.com/my/path" -O /dev/null
This works even on URLs with just a path, but has the disadvantage that the resource is actually downloaded, so it is not recommended when checking big files for existence.
You can also use the --delete-after option to check for the file (note that this still downloads the resource; it is simply deleted afterwards):
wget --delete-after URL
Yes, it's easy:
wget --spider www.bluespark.co.nz
That will give you
Resolving www.bluespark.co.nz... 210.48.79.121
Connecting to www.bluespark.co.nz[210.48.79.121]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
200 OK
Yes, to use wget to check, but not download, the target URL/file, just run:
wget --spider -S www.example.com
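With -S the server response headers are printed to stderr, so if you only want the status line you can filter for it; a sketch:
wget --spider -S "http://www.example.com" 2>&1 | grep "HTTP/"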
If you are in a directory where only root has write access, you can simply run wget www.example.com/wget-test as a standard user account. It will hit the URL, but because the user has no write permission the file won't be saved.
This method works fine for me, as I use it in a cron job.
Thanks.