wget download and rename files that originally have no file extension - wget

I have a wget download I'm trying to perform.
It downloads several thousand files, unless I start to restrict the file types (to exclude junk files etc.). In theory restricting the file type is fine.
However, there are lots of files that wget downloads without a file extension which, when opened manually with Adobe Reader for example, are actually PDFs. These are exactly the files I want.
Restricting wget to the filetype PDF does not download these files.
So far my syntax is wget -r --no-parent -A.pdf www.websitehere.com
Using wget -r --no-parent www.websitehere.com brings me every file type, so in theory I have everything. But this leaves me with thousands of junk files to remove, and then several hundred useful files of unknown type to rename.
Any ideas on how to wget and save the files with the appropriate file extension?
Alternatively, is there a way to restrict wget to only files without a file extension, combined with a separate batch method to determine each file's type and rename it appropriately?
Manually testing every file to determine the appropriate application will take a lot of time.
Appreciate any help!

wget has an --adjust-extension option, which will add the correct extensions to HTML and CSS files. It may not work for other file types (like PDFs), though. See the complete documentation here.
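As a fallback, one can download everything and then rename the extensionless files based on their detected content, using the `file` utility's MIME detection rather than wget itself. A minimal sketch (the demo file names are invented; after a real wget -r run, point find at the download directory instead of demo):

```shell
# Demo setup: an extensionless file with PDF magic bytes, plus a text file
mkdir -p demo
printf '%%PDF-1.4 dummy content' > demo/report
printf 'plain text' > demo/readme

# Rename every extensionless file whose detected MIME type is PDF
find demo -type f ! -name "*.*" | while IFS= read -r f; do
  if [ "$(file --mime-type -b "$f")" = "application/pdf" ]; then
    mv "$f" "$f.pdf"
  fi
done
```

The `! -name "*.*"` test limits the loop to files with no dot in their name, so files that already have an extension are left alone.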

Related

Wget creates some of directories only at the end of the mirroring

I'm currently mirroring www.typingstudy.com
wget --mirror --page-requisites --convert-links --no-clobber --no-parent --domains typingstudy.com https://www.typingstudy.com/
And wget creates some of the site's directories (those containing HTML files) only at the end of the scraping run; accordingly, when it tries to download those HTML files before the directories they belong in have been created, wget reports an error.
Sometimes it downloads only one file into a directory (for example, "part") and then refuses to see that directory while trying to download the other ~10 files from it, saying the directory does not exist.
Can someone help me understand what's wrong with my command? Or is it a bug in wget? (Probably not.)
Thanks in advance.
When I start the download process again, everything is fine: wget downloads the other ~10 HTML files into the directories (such as "part") that were created in the previous session. So the problem is that I need to start the download at least twice, at least in the case of this site.
And I totally do not understand why this is happening.

Opening all files of a certain filetype in vscode

For my website, I have one "root" folder with a bunch of subfolders containing many different types of files. Example:
root folder
    subfolder
    subfolder
        HTML file
        other files...
    subfolder
        HTML file
        other files...
Many of these subfolders have HTML files in them. I am wondering if there are any commands I can run to open all of the HTML files in vscode, or of any filetype for that matter.
I am aware that Cmd+Ctrl+P allows selecting a specific file by search, but is there any variation of that for all files of a given type? It is probably possible to write a bash script to do this, but I am not well versed in bash, and I am wondering if there are any built-in ways to do this in VS Code, or plugins I can install.
It also should be noted I am on a mac.
Here's one option: find all files that match a given filename pattern (in*.html in my example) and send that list of files to VS Code via xargs:
find . -iname "in*.html" -print0 | xargs -0 code
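The same pattern works for all HTML files regardless of name. Here is a small self-contained demo (folder and file names invented) that lists the matches; swap the final piped command for VS Code's code CLI (installed on macOS via the "Shell Command: Install 'code' command in PATH" palette entry) to actually open them:

```shell
# Demo tree with HTML files at different depths
mkdir -p site/sub1 site/sub2
touch site/index.html site/sub1/page.html site/sub2/style.css

# List every .html file under the tree
find site -type f -name "*.html" | sort

# To open them in VS Code instead of listing them:
#   find site -type f -name "*.html" -print0 | xargs -0 code
```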

Run exiftool across all file types

I am using exiftool to recursively search directories containing hundreds of files. At present, it returns the results to a .csv file. Comparing my results manually, some files within the target directory do not have a valid file extension. Nonetheless, when examining the raw data (or after adding .jpeg as the extension), the files are indeed image files.
Is there any way to force exiftool to process all files regardless of what the file extension is or indeed whether it has a file extension?
Thanks
You will want to use the -ext option.
Add -ext "*" to your command to process all files.
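An untested sketch of a full invocation (the photos/ directory, the demo file, and the output name are placeholders; the exiftool call is wrapped in a guard so the demo is harmless on machines where exiftool isn't installed):

```shell
# -ext "*"  process every file, with or without an extension
# -r        recurse into subdirectories
# -csv      emit CSV output, one row per file
mkdir -p photos
printf 'sample' > photos/unlabelled    # extensionless demo file
if command -v exiftool >/dev/null 2>&1; then
  exiftool -r -ext "*" -csv -FileType -MIMEType photos/ > results.csv
else
  echo "exiftool not installed" > results.csv
fi
```

The FileType and MIMEType columns report what exiftool detects from the file contents, which is exactly what distinguishes the mislabelled image files.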

Wget - Overwrite files that are different

So, I'm making an updater for my game using wget. The latest version of the game (with all its files) is on my server. The thing is that I want wget to download only the files that are different from a directory on the server into the www folder in the root of the game files. (This has to be recursive, since not all of the game's files are stored directly in that folder.) So, when the command is run, it should check whether each file's hash on the server (if possible; otherwise its size) matches the one in the game's files. If it doesn't match, it should download the file from the server and replace the one in the game's directory. This way, the player won't have to re-download all of the game's files.
Here's the command I used for testing:
wget.exe -N -r ftp://10.42.0.1/gamupd/arpg -O ./www
The server is running on my local network, so I use that IP address.
The thing is that it doesn't save the contents of /gamupd/arpg to the www folder, instead it seems to copy the directory tree of arpg.
Maybe the timestamping flag will satisfy you. From wget --help:
-N, --timestamping    don't re-retrieve files unless newer than local
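One caveat worth noting: with -r, the -O flag does not choose an output directory (it concatenates all retrieved documents into a single file), which would explain why the contents never land in www. A hedged variant of the command, untested against the server in question, using -P for the target folder and -nH/--cut-dirs to drop the remote path components:

```shell
# -N            skip files that are not newer than the local copy
# -P ./www      save everything under ./www
# -nH           don't create a 10.42.0.1/ host directory
# --cut-dirs=2  drop the gamupd/arpg path components
wget -N -r -nH --cut-dirs=2 -P ./www ftp://10.42.0.1/gamupd/arpg/
```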

How to move a file in to zip uncompressed, with zip cmd tool

I'm trying to determine how to use the zip command-line tool to move a file, uncompressed, into a zip of compressed files (i.e., I want to end up with a zip in which all files but one are compressed, because that one file is itself already a compressed file).
Does anyone know how to do this?
It looks like you could use the -n option to store (rather than compress) files with the listed extensions, together with the -g option to append the file to the archive.
I didn't test it, but something like this should do the trick:
zip -g -n .foo archive.zip myAddedFile.foo
Note that the documentation states that, by default, zip does not compress files with extensions in the list .Z:.zip:.zoo:.arc:.lzh:.arj, so if you are adding a file with one of those extensions you should be fine without -n.
The documentation for the command is here.
-m is what I wanted; it moves the file(s) into the zip, deleting the originals after they're added.