I'm trying to download a .exe using the command line.
download link: https://go.microsoft.com/fwlink/?LinkId=691980&clcid=0x409
Running wget <link> results in a file named index.html#LinkId=691980&clcid=0x409.
How do you deal with links that have parameters at the end? The LinkId is needed to download the correct .exe, so I can't just drop or ignore it.
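What usually helps is quoting the URL, so the shell does not treat the & as a command separator, and naming the output file explicitly with -O; wget's --content-disposition option can also let the server suggest the real filename. A minimal sketch (the output name installer.exe is just a placeholder I chose):

# Quote the URL so ? and & survive the shell, and pick the output name yourself
wget "https://go.microsoft.com/fwlink/?LinkId=691980&clcid=0x409" -O installer.exe

# Alternatively, let the server's Content-Disposition header name the file, if it sends one
wget --content-disposition "https://go.microsoft.com/fwlink/?LinkId=691980&clcid=0x409"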
Related
I'm currently mirroring www.typingstudy.com
wget --mirror --page-requisites --convert-links --no-clobber --no-parent --domains typingstudy.com https://www.typingstudy.com/
And wget only creates the directories that hold the site's HTML files at the end of the scrape, so when it tries to download those HTML files before the directories they belong in exist, it reports an error.
Sometimes it downloads only one file into a directory (for example "part") and then refuses to see that directory while downloading the other ~10 files from it, saying the directory does not exist.
Can someone help me understand what's wrong with my command? Or is it a bug in wget? (Probably not.)
Thanks in advance.
When I start the download process again, everything is fine: wget downloads the other ~10 HTML files into the directories (such as "part") that were created during the previous session. So the problem is that I have to start the download twice, at least for this site.
And I totally do not understand why this is happening.
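Since a second run reliably picks up the missing files, one pragmatic workaround is simply to invoke the mirror twice from a small wrapper script. A hedged sketch of that idea is below; note that --no-clobber is left out because some wget versions refuse to combine it with the timestamping that --mirror turns on:

# Workaround sketch: run the mirror twice, so the directories created at the end of
# the first pass already exist when the second pass fetches the remaining files
for pass in 1 2; do
  wget --mirror --page-requisites --convert-links --no-parent \
       --domains typingstudy.com https://www.typingstudy.com/
done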
Setting up a PowerShell script to auto-download applications with varying URLs.
I had a batch file script that downloaded certain applications to ensure my USB toolkit was always up to date. However, I want to switch to PowerShell because I found it has a wget command to download directly from a URL. What I was hoping was to have the URL always fetch the latest version.
wget https://download.ccleaner.com/ccsetup556.exe -O ccleaner.exe
Note that the 556 in the URL is the part I would like to vary so that the highest version is always selected.
They have two download pages: a direct link, and one where the download starts after 2-5 seconds; however, when I point wget at that page, it just downloads the HTML page itself.
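One way to approach this, sketched here with GNU wget and standard shell tools (the same idea can be translated to PowerShell), is to fetch the download page, pull out the highest-numbered ccsetupNNN.exe link, and download that. This assumes the page still embeds direct links of that form, which may of course change:

# Hypothetical sketch: scrape the vendor's download page for the newest ccsetupNNN.exe link
page_url="https://www.ccleaner.com/ccleaner/download"   # assumed page that lists direct links
latest=$(wget -qO- "$page_url" \
  | grep -oE 'https://download\.ccleaner\.com/ccsetup[0-9]+\.exe' \
  | sort -V | tail -n 1)
[ -n "$latest" ] && wget "$latest" -O ccleaner.exe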
I have a wget download I'm trying to perform.
It downloads several thousand files, unless I start to restrict the file type (junk files etc). In theory restricting the file type is fine.
However, there are lots of files that wget downloads without a file extension which, when opened manually with Adobe Reader for example, turn out to be PDFs. These are the files I actually want.
Restricting wget to the PDF file type does not download these files.
So far my syntax is wget -r --no-parent -A.pdf www.websitehere.com
Using wget -r --no-parent www.websitehere.com brings me every file type, so in theory I have everything. But this means I have 1000's of junk files to remove, and then several hundred of the useful files of unknown file type to rename.
Any ideas on how to wget and save the files with the appropriate file extension?
Alternatively, is there a way to restrict wget to only files without a file extension, and then a separate batch method to determine the file type and rename them appropriately?
Manually testing every file to determine the appropriate application will take a lot of time.
Appreciate any help!
wget has an --adjust-extension option, which will add the correct extensions to HTML and CSS files. Other files (like PDFs) may not be covered, though. See the wget documentation for the full details.
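Since --adjust-extension won't cover the PDFs, a practical follow-up is the two-step approach suggested in the question: download everything, then detect the type of the extensionless files and rename them. A hedged sketch (assumes bash plus the standard file utility, and that the extensionless files sit under the mirrored directory):

# Step 1: full recursive download, as in the question
wget -r --no-parent www.websitehere.com

# Step 2: find files with no extension, check their MIME type, and rename real PDFs
find www.websitehere.com -type f ! -name '*.*' -print0 | while IFS= read -r -d '' f; do
  file -b --mime-type "$f" | grep -q 'application/pdf' && mv "$f" "$f.pdf"
done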
I am currently using a wget command that is fairly complicated, but the essence of it is the -p and -k flags to download all the prerequisites and convert the links. How do I rename the main downloaded file to index.html?
For instance, I download a webpage
http://myawesomewebsite.com/something/derp.html
This will, for example, download:
derp.html
style.css
firstimage.png
secondimage.jpg
And maybe even an iFrame:
iframe.html
iframe-style.css
So now the question is how do I rename derp.html to index.html, without accidentally renaming iframe.html to index.html as well, given that I don't know what the name of the resolved downloaded file may be?
When I tried this method on a Tumblr page with URL http://something.tumblr.com/34324/post it downloaded as page.html.
I've tried the --output-document flag, but that results in nothing being downloaded at all.
Thanks!
This is what I ended up doing:
If there was no index.html found after downloading, I used Ruby to get the derp.html part of the URL, and then searched for derp.html and then renamed it to index.html.
It's not as elegant as I would like, but it works.
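For reference, the same idea can be expressed directly in the shell: take the basename of the URL, find that file inside the directory wget created, and rename it to index.html. A rough sketch with a hypothetical URL; it assumes wget saved the page under the URL's basename, which the Tumblr example shows is not always the case:

url="http://myawesomewebsite.com/something/derp.html"
wget -p -k "$url"

# Derive the expected filename and the directory wget created for this host/path
name=$(basename "$url")                                        # derp.html
dir=$(echo "$url" | sed -e 's#^[a-z]*://##' -e 's#/[^/]*$##')  # myawesomewebsite.com/something

# Only rename if the expected file is there and no index.html exists yet
if [ -f "$dir/$name" ] && [ ! -e "$dir/index.html" ]; then
  mv "$dir/$name" "$dir/index.html"
fi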
How do I download a site for offline viewing without a specific folder?
For example, I want to download the site without the http://site.com/forum/ subdirectory.
wget --help
might lead you to
-nH, --no-host-directories don't create host directories.
I'd try that first, but I'm not sure whether it will do what you want.
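For what it's worth, a command along these lines might be a starting point. -nH stops wget from creating a site.com/ host directory locally, and if the goal is really to skip the /forum subtree altogether, wget's -X/--exclude-directories option may be closer to what is wanted (untested against that site):

# -nH: no host directory; -X /forum: skip that subtree; -k/-p make the pages usable offline
wget -r -k -p --no-parent -nH -X /forum http://site.com/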