there is a url:
http://118.26.57.16/1Q2W3E4R5T6Y7U8I9O0P1Z2X3C4V5B/218.11.178.160/edge.v.iask.com/95687694.hlv?KID=sina,viask&Expires=1358956800&ssig=WHgIi1wQOW&wsiphost=ipdbm
You can download it in Chrome or Firefox, so why can't I download it with:
wget -c http://118.26.57.16/1Q2W3E4R5T6Y7U8I9O0P1Z2X3C4V5B/218.11.178.160/edge.v.iask.com/95687694.hlv?KID=sina,viask&Expires=1358956800&ssig=WHgIi1wQOW&wsiphost=ipdbm
Because of the special characters in the URL (the '&'), you need to put the URL in quotation marks:
wget -c "http://118.26.57.16/1Q2W3E4R5T6Y7U8I9O0P1Z2X3C4V5B/218.11.178.160/edge.v.iask.com/95687694.hlv?KID=sina,viask&Expires=1358956800&ssig=WHgIi1wQOW&wsiphost=ipdbm"
You could alternatively escape the special characters, but wrapping the URL in quotation marks is probably easiest.
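For reference, the escaped form mentioned above might look like the following sketch (same URL as before, with each & preceded by a backslash so the shell does not treat it as a background operator; the ? is also a shell glob character, but it is usually left alone when no file matches):
wget -c http://118.26.57.16/1Q2W3E4R5T6Y7U8I9O0P1Z2X3C4V5B/218.11.178.160/edge.v.iask.com/95687694.hlv?KID=sina,viask\&Expires=1358956800\&ssig=WHgIi1wQOW\&wsiphost=ipdbm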
I want to download all files matching a certain pattern from https://download.osmand.net/list.php.
For example running
wget -nd -r -A 'list.php, *Sweden_*' https://download.osmand.net/list.php
downloads only list.php.
If I use
wget -nd -r -A 'list.php, *' https://download.osmand.net/list.php
instead all files are downloaded.
What is wrong with my acclist in the first example?
I didn't know this myself; researching this answer, I learned something new.
The wget docs say:
‘-A acclist --accept acclist’
‘-R rejlist --reject rejlist’
Specify comma-separated lists of file name suffixes or patterns to accept or reject (see Types of Files). Note that if any of the wildcard characters, ‘*’, ‘?’, ‘[’ or ‘]’, appear in an element of acclist or rejlist, it will be treated as a pattern, rather than a suffix. In this case, you have to enclose the pattern into quotes to prevent your shell from expanding it, like in ‘-A "*.mp3"’ or ‘-A '*.mp3'’.
‘--accept-regex urlregex’
‘--reject-regex urlregex’
Specify a regular expression to accept or reject the complete URL.
So it seems that -A matches against the file name only, not the full URL of the link. To match against the full URL, you need the --accept-regex urlregex option.
The following command worked for me.
wget -nd -r --accept-regex 'Sweden_' https://download.osmand.net/list.php
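If you want to tie the match more closely to the files themselves, the regex can be extended; note that the .zip suffix below is only an assumption about how the files on that page are named:
wget -nd -r --accept-regex 'Sweden_.*\.zip' https://download.osmand.net/list.php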
I've created a direct link to a file in Box:
The previous link points to the browser web interface, so I then shared the file with a direct link:
However, if I download the file with wget, I receive garbage.
How can I download the file with wget?
I was able to download the file by making the link public, then replacing /s/ in the URL with /shared/static.
So my final command was:
curl -L https://MYUNI.box.com/shared/static/EXAMPLEtzwosac6pz --output myfile.zip
This can probably be modified for wget.
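A wget equivalent would presumably look like the following (same placeholder link; -O names the output file, and wget follows HTTP redirects by default):
wget -O myfile.zip "https://MYUNI.box.com/shared/static/EXAMPLEtzwosac6pz"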
I might be a bit late to the party, but FWIW:
I tried to do the same thing in order to download a folder.
I went to the box UI and opened the browser's network tab on the developer tools.
Then I clicked on download and copied the first generated request as cURL; it was something like this (many headers and options removed for readability):
curl 'https://app.box.com/index.php?folder_id=122215143745&rm=box_v2_zip_folder'
The response to this request is a JSON object containing a link for downloading the folder:
{
"use_zpdl": "true",
"result": "success",
"download_url": <somg long url>,
"progress_reporting_url": <some other url>
}
I then executed wget -L <download_url> and was able to download the file with wget.
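If you want to script that step, one possible approach (assuming jq is installed, and keeping in mind that the real curl command needs the headers and cookies that were stripped above) is to extract download_url from the JSON response and pass it straight to wget:
curl 'https://app.box.com/index.php?folder_id=122215143745&rm=box_v2_zip_folder' | jq -r '.download_url' | xargs wget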
The solution was to add the -L option to follow the HTTP redirect:
wget -v -O myfile.tgz -L https://ibm.box.com/shared/static/xxxxx.tgz
What you can do in 2022 is something like this:
wget "https://your_university.app.box.com/index.php?rm=box_download_shared_file&vanity_name=your_private_name&file_id=f_your_file_id"
You can find this link in the POST request shown in Google Chrome's Network tab when using an incognito window. Note that the double quotes keep the shell from interpreting the special characters in the URL.
I download an HTML page and its files via Wget on Windows:
wget -m -k -p -np --html-extension
That HTML content has a lot of URLs with special characters (example: Chp1).
There are two issues:
Inside the HTML content, URLs containing special characters become seemingly random words:
Expectation:
Chp1
Actual:
Chp1
The file names also become random words.
The second issue can be solved by adding --restrict-file-names=nocontrol.
How do I solve the first one? Is the Windows version the problem?
Obviously, inside the HTML, it converts URLs with special characters into something...
Your problem comes from the fact that Windows will still treat your UTF-8 characters as Latin-1 characters, even with the --restrict-file-names=nocontrol command line argument.
GNU's site documents this bug here, and unfortunately it is still an issue for Windows users to this day. Your command would work in a Linux environment, however.
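If you want to experiment further, wget builds with IRI support (the +iri shown by wget --version) also accept encoding hints; whether they help on a given Windows build is not guaranteed, and the URL below is just a placeholder:
wget -m -k -p -np --html-extension --restrict-file-names=nocontrol --local-encoding=UTF-8 http://example.com/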
I have a MoinMoin site which I've inherited from a previous system
administrator. I'd like to shut it down but keep a static copy of the
content as an archive, ideally with the same URLs. At the moment I'm
trying to accomplish this using wget with the following parameters:
--mirror
--convert-links
--page-requisites
--no-parent
-w 1
-e robots=off
--user-agent="Mozilla/5.0"
-4
This seems to work for getting the HTML and CSS, but it fails to
download any of the attachments. Is there an argument I can add to wget
which will get round this problem?
Alternatively, is there a way I can tell MoinMoin to link directly to
files in the HTML it produces? If I could do that then I think wget
would "just work" and download all the attachments. I'm not bothered
about the attachment URLs changing as they won't have been linked to
directly in other places (e.g. email archives).
The site is running MoinMoin 1.9.x.
My version of wget:
$ wget --version
GNU Wget 1.16.1 built on linux-gnu.
+digest +https +ipv6 +iri +large-file +nls +ntlm +opie -psl +ssl/openssl
The solution in the end was to use MoinMoin's export dump functionality:
https://moinmo.in/FeatureRequests/MoinExportDump
It doesn't preserve the file paths in the way that wget does, but has the major advantage of including all the files and the attachments.
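For anyone else going this route, the invocation looks roughly like the sketch below; the paths, URL, and exact flags are assumptions on my part, so check the MoinExportDump page above for the precise syntax of your 1.9.x release:
moin --config-dir=/path/to/wiki/config --wiki-url=http://wiki.example.org/ export dump --target-dir=/var/backups/wiki-static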
I want to replace the URL for the shortcut icon in an HTML file, using sed's c (change) command. I use sed to replace the URL, but the command gives an error saying it can't read the icon, even though the icon is present at the specified location.
If I replace the URL manually, it works fine.
My command is:
sed -i '/<link id=/c\\<link id='test' rel='shortcut icon' href='path_of_icon' type='image/x-icon'/>' path_of_html_file
sed -i "s|<link id=|<link id='test' rel='shortcut icon' href='path_of_icon' type='image/x-icon'|" path_of_html_file
You used only single quotes, so the shell mixes up content and interpretation.
You forgot the leading s of the substitute command.
You used / as the pattern separator, but / also appears in your pattern.
If you want to use sed's 'change' command to replace the whole line containing "link id=...", then why not put it into a script file and source it via -f? That would help with the quoting issue (I believe your error is due to quoting, as already mentioned):
s.sed:
/<link id=/c\
<link id='test' rel='shortcut icon' href='path_of_icon' type='image/x-icon'/>
then
sed -i -f s.sed path_of_html_file
It's perfectly possible to do this on the command line, but anything involving quotes inside other quotes can get pretty ugly on a command line.
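For what it's worth, GNU sed also accepts the replacement text on the same line as the c\ command, so a command-line version of the same script could look like this (double quotes on the outside so the single quotes inside the replacement survive; the paths are the placeholders from the question):
sed -i "/<link id=/c\\<link id='test' rel='shortcut icon' href='path_of_icon' type='image/x-icon'/>" path_of_html_file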