wget :: rename downloaded files and only download if newer - wget

I am trying to use wget to download a file under a different local name and only download if the file on the server is newer.
What I thought I could do was use the -O option of wget so as to be able to choose the name of the downloaded file, as in:
wget http://example.com/weird-name -O local-name
and combine that with the -N option, which skips the download unless the timestamp on the server is newer. However, wget refuses to combine both flags:
WARNING: timestamping does nothing in combination with -O. See the manual
for details.
Any ideas on succinct workarounds?

Download it, then create a link
wget -N example.com/weird-name
ln weird-name local-name
After that you can run wget -N and it will work as expected:
It only downloads if the copy on the server is newer.
If a new file is downloaded, it will be accessible from either name without costing you extra drive space.
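If you want to confirm that both names share a single copy on disk, a quick check (using the file names from the example above):
ls -li weird-name local-name
# both entries show the same inode number, so only one copy of the data exists on disk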

If using another tool is an option in your case, I recommend the free, open-source lwp-mirror:
lwp-mirror [-options] <url> <file>
It works just as you wish, with no workarounds.
This command is provided by the libwww-perl package on Ubuntu and Debian among other places.
Note that lwp-mirror doesn't support all of wget's other features. For example, it doesn't allow you to set a User Agent for the request like wget does.
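For the example in the question, the call would look something like this (a sketch using the URL and local name from above):
lwp-mirror http://example.com/weird-name local-name
lwp-mirror fetches the file only when the server copy is newer than local-name, which is exactly the -N behaviour being asked for.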

Installing cpan or cpanm modules on a behind-firewall machine with no Internet connection

I've already read related threads like these, but they do not fully capture our situation.
This is on a firewalled machine. No net access. We can ftp files to folders and install modules from there.
Our users have chmod 777 permissions on some folders. We can install Perl modules by downloading the relevant .pm files and building them locally. But when those files fail to install, we have no cpan or cpanm to fall back on.
I'd like to install, for example, HTML::Restrict. If I do the manual download-and-install, Restrict.pm gives me this error:
/lib/HTML/Restrict.PM:328: Unknown command paragraph "=encoding UTF-8"
Reading a bit online suggests that this could be an old-Perl problem. We use 5.8.x. Our own dev machines have the luxury of 5.16.x and internet access, so installing modules is a cinch. Anyway, one of my older machines also has 5.8.x, and installing the module via cpanminus worked there (with internet).
So, question: is it possible to install "cpanminus" (cpanm) through FTP, then upload specific module files to the server through FTP too, and then go into shell and install modules via cpanm by pointing it to respective .pm files?
Thank you for any pointers.
You should take a look at perldoc perlmodinstall, which goes into detail about how to install a module from its distribution. It follows what should be a familiar incantation:
Decompress
Unpack
Build
Test
Install
Assuming you're on a Linux system, this commonly takes the form of:
gzip -d My-Module-Distribution.tar.gz
tar -xof My-Module-Distribution.tar
perl Makefile.PL
make
make test
make install
But after the Unpack stage you will often find a README or other text file that describes any unusual steps to be taken.
Clearly some of these steps can be combined. For instance, most people will probably want to use
tar -xzvf My-Module-Distribution.tar.gz
to avoid having to invoke gzip separately. Likewise, the make system will force a build phase as a prerequisite if you use just
make test
without the preceding make
The linked document has a lot to say about how to install on other platforms, should you not be running a Linux variant
I still don't really understand your thinking, but you can get a stand-alone version of cpanm using curl. For instance
curl -sS --location https://cpanmin.us/ --output cpanm
then you should be able to copy it to your target machine, make it executable (chmod +x cpanm), put it on your PATH, and do
cpanm HTML-Restrict-2.2.2.tar.gz
but I doubt you will see any change in the specific errors you are getting.
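To tie that to the FTP-only workflow in the question, a rough sketch (the tarball name is just an example; the module's dependency tarballs would need to be transferred and installed the same way, since cpanm cannot reach CPAN from that machine):
# on a machine with internet access
curl -sS --location https://cpanmin.us/ --output cpanm
# also download HTML-Restrict-2.2.2.tar.gz and its dependency tarballs, then FTP everything across

# on the firewalled machine
chmod +x cpanm
./cpanm HTML-Restrict-2.2.2.tar.gz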

wget on Windows replaces path characters in --directory-prefix

I'm trying to download a directory and all its subdirectories from a website, using wget.
Reading all other SO questions I arrived at this:
wget -nH --recursive --no-parent --cut-dirs=5 --reject "index.html*" --directory-prefix="c:\temp" http://blahblah.com/directory/
However, no matter how I try to formulate the c:\temp, wget always creates "#5Ctemp" in the current directory and does the download in that directory. I checked the documentation but to no avail.
Preferably I would also be able to use an environment variable as --directory-prefix, e.g.
--directory-prefix=%PREFIX%
Looks like the version of wget you're using (1.8.2) is either buggy or too old. It definitely works with newer versions; get one here:
wget 1.11.4 from gnuwin32
wget 1.14 from osspack32
wget 1.15 from eternallybored.org
For completeness, here's a link to the wget wiki download section.
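With a newer build, the command from the question should work unchanged. As a sketch, the environment-variable form being asked about would look like this in a Windows command prompt (assuming PREFIX is set beforehand):
set PREFIX=c:\temp
wget -nH --recursive --no-parent --cut-dirs=5 --reject "index.html*" --directory-prefix=%PREFIX% http://blahblah.com/directory/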

Using downloaded wget

I downloaded the source code of wget using apt-get source wget. I want to modify it a little, then use this wget rather than the one I'm using in /usr/bin/wget. How can I do that?
apt-get source wget retrieves your distribution's source package for wget.
You may want to work on the genuine upstream wget source, which you can get (with some wget or some browser) by following links from http://www.gnu.org/software/wget/
Then you configure, build and install, usually with ./configure; make; sudo make install, but the details may vary from package to package. You should look into files named README and INSTALL.
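As a sketch of one way to use the modified build instead of the one in /usr/bin/wget, without touching the system copy (the prefix path here is just an example):
./configure --prefix=$HOME/wget-custom
make
make install                            # no sudo needed for a prefix under $HOME
export PATH=$HOME/wget-custom/bin:$PATH
which wget                              # should now point at your modified build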
You could also be interested in libcurl.
Notice that the GPL license more or less requires you to publish your patch (in source form) if you redistribute a binary of your patched, improved wget.

Why wget doesn't get java file recursively?

I am trying to download all the folder structure and files under a folder in a website using wget.
Say there is a website like http://test/root. Under root, the layout is:
/A
/A1/file1.java
/B
/B1/file2.html
My wget cmd is:
wget -r http://test/root/
I got all the folders and the html files, but no java files. Why is that?
UPDATE1:
I can access the file in the browser using:
http://test/root/A/A1/file1.java
I can also download this individual file using:
wget http://test/root/A/A1/file1.java
wget can just follow links.
If there is no link to the files in the subdirectories, then wget will not find those files. wget will not guess any file names, it will not test exhaustively for file names, and it does not practice black magic.
Just because you can access the files in a browser does not mean that wget can necessarily retrieve them. Your browser has code that can recognize the directory structure; wget only knows what you tell it.
You can try adding the java file to an accept list first; perhaps that's all it needs:
wget -r -A "*.java" http://test/root
But it sounds like you're trying to get a complete offline mirror of the site. Let's start, as with any command we're trying to figure out, with man wget:
Wget can follow links in HTML, XHTML, and CSS pages, to create local
versions of remote web sites, fully recreating the directory structure
of the original site. This is sometimes referred to as "recursive
downloading." While doing that, Wget respects the Robot Exclusion
Standard (/robots.txt). Wget can be instructed to convert the links in
downloaded files to point at the local files, for offline viewing.
What We Need
1. Proper links to the file to be downloaded.
In your index.html file, you must provide a link to the Java file, otherwise wget will not recognize it as needing to be downloaded. For your current directory structure, ensure file2.html contains a link to the Java file. Format it to point back up out of the current directory, for example:
<a href="../../A/A1/file1.java">JavaFile</a>
However, if file1.java is not sensitive and you routinely do this, it's cleaner and less code to put an index.html file in your root directory and link from there:
<a href="A/A1/file1.java">JavaFile</a>
If you only want the Java files and want to ignore HTML, you can use --reject like so:
wget -r -nH --reject="file2.html" http://test/root
### Or to reject ALL html files ###
wget -r -nH --reject="*.html" http://test/root
This will recursively (-r) go through all directories starting at the point we specify.
2. Respect robots.txt
Ensure that if you have a /robots.txt file in your root directory it does not prevent crawling. If it does, you need to instruct wget to ignore it by adding -e robots=off to your command:
wget ... -e robots=off http://test/root
3. Convert remote links to local files.
Additionally, wget must be instructed to convert links so they point at the downloaded files. If you've done everything above correctly, you should be fine here. The easiest way I've found to get all files, provided nothing is hidden behind a non-public directory, is to use mirror mode.
Try this:
wget -mpEk http://test/root/
# If robots.txt is present:
wget -mpEk -e robots=off http://test/root/
Using -m instead of -r is preferred because it has no maximum recursion depth and it downloads all assets. Mirror mode is pretty good at determining the full depth of a site; however, if you have many external links you could end up downloading more than just your site, which is why we also use -p -E -k. The output should be all the prerequisite files needed to render the pages, with the directory structure preserved. -k converts links to point at the local files.
Since you should have a link set up, you should end up with file1.java inside the ../A1/ directory. The command should also work as is without a specific link to the Java file inside index.html or file2.html, but having one doesn't hurt, as it preserves the rest of your directory structure. Mirror mode also works with a directory structure served over ftp://.
General rule of thumb:
Depending on the size of the site you are mirroring, you're sending many calls to the server. To avoid being blacklisted or cut off, use the wait options to rate-limit your downloads. For a site the size of the one you posted you shouldn't have to, but for any large site you're mirroring you'll want to use it:
wget -mpEk --no-parent -e robots=off --random-wait http://test/root/

How to convert WOFF to TTF/OTF via command line?

I know about services like Online Font Converter, but I am interested in an offline solution, preferably on the command line. Does anyone know a tool or workflow to convert WOFF to OTF/TTF offline?
I wrote a simple tool for that:
https://github.com/hanikesn/woff2otf
Currently only tested with ttf files.
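A usage sketch, assuming the script takes the input WOFF and the output file name in that order (check the repository's README for the exact interface):
python woff2otf.py myfont.woff myfont.otf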
Here is the reference code for making WOFF files: http://people.mozilla.org/~jkew/woff/ (I have a mirror: https://github.com/samboy/WOFF).
To compile and install, make sure you have the zlib development libraries installed (e.g. on CentOS 6, run yum -y install zlib-devel as root), then:
git clone https://github.com/samboy/WOFF
cd WOFF
make
Then, as root:
cp sfnt2woff /usr/local/bin
Once this is done, to make a webfont, enter the directory with the .ttf file, then run sfnt2woff
sfnt2woff Chortle2014f.ttf
This creates a Chortle2014f.woff webfont file. Replace “Chortle2014f.ttf” with the name of the actual webfont to convert.
The first link I provide has Windows and MacOS binaries for people who do not wish to install a compiler.
Here is the reference code for making WOFF2 files: https://github.com/google/woff2. Note that this code will not build on CentOS 6, but it compiles and installs just fine on CentOS 7:
git clone --recursive https://github.com/google/woff2.git
cd woff2
make clean all
woff2 font generation is similar:
woff2_compress Chortle2014f.ttf
I didn't like the fact that the current best answer is a Python script, and there are also reports that it doesn't work for some people. In addition, none of the current answers mention compiling WOFF converters with the zopfli compression algorithm, which is superior to the standard zlib algorithm that other tools use. For these reasons I decided to go the "proper" (i.e. non-script) route and add my own answer in the process.
Note: the compilation process for both of the utilities below is very easy, and made even easier by simply copying and running the snippets of code I've provided, but they do still require a working compiler. If you haven't compiled software from source before, you may need to set up a compiler environment first. If you're using Cygwin, you can follow the first part of my answer here to set up the MinGW-w64 cross-compiler.
WOFF CLI converter (with ZOPFLI compression)
First, compile and install sfnt2woff1 by pasting all of the following into a terminal and pressing Enter:
git clone https://github.com/bramstein/sfnt2woff-zopfli.git woff &&
cd woff &&
make &&
chmod 755 woff2sfnt-zopfli sfnt2woff-zopfli &&
mv woff2sfnt-zopfli sfnt2woff-zopfli /usr/local/bin &&
rm -rf ../woff
Once the tool has been compiled and installed, convert a TTF or OTF file to WOFF by running:
sfnt2woff-zopfli <inputfile>.ttf
You can also use the -n option to increase the number of iterations the compressor runs, improving compression at the cost of conversion time (the default is 15 iterations).
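For example, to trade extra conversion time for a smaller file (the iteration count here is arbitrary):
sfnt2woff-zopfli -n 50 <inputfile>.ttf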
To convert all files in the current directory to WOFF:
for i in *; \
do sfnt2woff-zopfli.exe "$i"; \
done
WOFF2 CLI converter (with Brotli compression)
First, compile and install Google's woff2 tools by pasting all of the following into a terminal and pressing Enter:
git clone --recursive https://github.com/google/woff2.git &&
cd woff2 &&
make clean all &&
mv woff2_compress woff2_decompress woff2_info /usr/local/bin &&
rm -rf ../woff2
Once the tool has been compiled and installed, convert a single TTF or OTF file to WOFF2 by running:
woff2_compress.exe <inputfile>.ttf
To convert all files in the current directory to WOFF2:
for i in *; \
do woff2_compress.exe "$i"; \
done
You can even convert a WOFF2 file back to TTF or OTF:
woff2_decompress.exe <inputfile>.woff2
1 Note that SFNT here refers to the SFNT table format that both TTF and OTF font formats are built around.
I've been looking for this too but, sorry, I couldn't find an offline one. I did find this:
http://orionevent.comxa.com/woff2otf.html - no longer available
It's really good.
EDIT: Found a command line tool
https://superuser.com/questions/192146/converting-from-woffweb-open-font-format
I used the Python script linked above by barethon to write an online JavaScript converter of WOFF to OTF.
I realise this thread has been inactive for some time now, but with the help of a few Stack Overflow users, I was able to use the above-mentioned Python script [woff2otf.py by @hanikesn] to create a workflow allowing batch conversion of WOFF files.
If not for the original poster's use, then for others who come across this thread in search of the same thing, check out my thread for details on how to do this:
Modify Python Script to Batch Convert all "WOFF" Files in Directory
Even if you don't need to batch convert, onlinefontconverter.com produces unreliable results, and everythingfonts.com has a 0.4 MB limit on conversions unless you upgrade to a paid account, and both are needlessly time consuming compared to offline solutions.
Good luck!
EverythingFonts has an online tool that appears to work well.
If you wish to do it offline, following Erik Tjernlund's answer on Super User, you can download the source and compile executables of woff2sfnt and sfnt2woff.
The latest version as of this writing was from 2009/09/09. Unfortunately I've discovered that it doesn't appear to work for all WOFF files, sometimes complaining of a bad signature and sometimes simply giving a broken OTF file.
On a Mac with Homebrew it's simpler than the other mentioned approaches.
.woff2 to .ttf
brew install woff2
woff2_decompress somefont.woff2
This will leave you with somefont.ttf in the same directory.
.woff to .ttf
Converting WOFF (not WOFF2) is a little trickier; woff2_decompress probably won't handle it. You would first want to convert the .woff file to .woff2, then use the woff2_decompress command to turn that into a .ttf file.
There's a brew tap that can be used to install sfnt2woff, which can be used to convert your .woff to .woff2.
brew tap bramstein/webfonttools;
brew install sfnt2woff;
sfnt2woff somefont.woff;
woff2_decompress somefont.woff2