Is Strawberry Perl safe? - perl

Strawberry Perl lists SHA1 digests on its download page.
However, looking at download page snapshots on archive.org, their SHA1 digests for the same perl version and build seem to change over time.
Example: in the download page snapshot from 2013-05-10, strawberry-perl-5.16.3.1-32bit-portable.zip is shown to be 86.8 MB long with an SHA1 digest of 3b9c4c32bf29e141329c3be417d9c425a7f6c2ff.
In the download page snapshot from 2017-02-14, the same strawberry-perl-5.16.3.1-32bit-portable.zip is shown to be 87.3 MB long with an SHA1 digest of 7f6da2c3e1b7a808f27969976777f47a7a7c6544.
And on the current download page, the same strawberry-perl-5.16.3.1-32bit-portable.zip is shown to be 91.0 MB long with an SHA1 digest of 00ba29e351e2f74a7dbceaad5d9bc20159dd7003.
I thought they might have recompiled the package for some reason, but the current strawberry-perl-5.10.0.6-portable.zip has only one file dated later than 2009 (it's portable.perl), so this doesn't explain why the archive grew over time. Sadly, I don't have older zip files, so I have no way of knowing what changed inside the archive.
What's going on here? Why do past builds change over time? I am kind of concerned that someone might be injecting malicious code or something into the binary Perl packages...
Is there a rational explanation here? Thanks...

A hash such as a SHA1 digest is good for defending against communication errors, that is, for ensuring the integrity of what you downloaded (basically proving: "file on your hard disk" = "file on the web server"), but by itself it does not ensure authentication.
For that, files should be signed with a PGP signature or an X.509 certificate. That is the only way you can verify that the file was indeed produced by the intended authors.
So, by itself, your observation neither signals an attack nor helps you defend against one.
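For illustration, here is a minimal Python sketch of such an integrity check (the file name and digest are the ones quoted in the question); note that a match only proves the download is intact, not who built it:

import hashlib

expected = "00ba29e351e2f74a7dbceaad5d9bc20159dd7003"  # SHA1 published on the download page

h = hashlib.sha1()
with open("strawberry-perl-5.16.3.1-32bit-portable.zip", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):   # hash in 1 MiB chunks
        h.update(chunk)

# Proves "file on disk" matches the published digest, nothing about who produced it.
print("intact" if h.hexdigest() == expected else "mismatch: corrupted or different file")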
As ikegami said, you can even configure compressors with different memory/time trade-offs, and the same input will produce different results.
See for example in Unix zip:
-Z cm
--compression-method cm
Set the default compression method. Currently the main methods supported by zip are store and deflate. Compression method can be set to:
store - Setting the compression method to store forces zip to store entries with no compression. This is generally faster than compressing entries, but results in no space savings. This is the same as using -0 (compression level zero).
deflate - This is the default method for zip. If zip determines that storing is better than deflation, the entry will be stored instead.
bzip2 - If bzip2 support is compiled in, this compression method also becomes available. Only some modern unzips currently support the bzip2 compression method, so test the unzip you will be using before relying on archives using this method (compression method 12).
and
-#
(-0, -1, -2, -3, -4, -5, -6, -7, -8, -9)
Regulate the speed of compression using the specified digit #, where -0 indicates no compression (store all files), -1 indicates the fastest compression speed (less compression) and -9 indicates the slowest compression speed (optimal compression, ignores the suffix list). The default compression level is -6.
Though still being worked, the intention is this setting will control compression speed for all compression methods. Currently only deflation is controlled.
The same files could have been re-archived over time when the website was regenerated, so the same content ends up in a different archive.
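As a quick illustration (a self-contained Python sketch; the file names are made up), archiving identical content with different zip settings produces archives with different digests:

import hashlib, zipfile

# Create some sample content (a stand-in for the real files being archived).
with open("payload.bin", "wb") as f:
    f.write(b"identical content " * 100000)

def zip_and_hash(method, out_name):
    # Archive the same input with the given compression method and return the archive's SHA1.
    with zipfile.ZipFile(out_name, "w", compression=method) as z:
        z.write("payload.bin")
    with open(out_name, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()

print(zip_and_hash(zipfile.ZIP_STORED, "a.zip"))    # no compression
print(zip_and_hash(zipfile.ZIP_DEFLATED, "b.zip"))  # deflate
# Same content, yet the two archives (and therefore their SHA1 digests) differ.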
I have downloaded all three files. They are in fact exactly the same size (unless the Wayback Machine did not store them correctly; it does not do a redirection), and all of them have a SHA1 hash of 00ba29e351e2f74a7dbceaad5d9bc20159dd7003.
Your best bet is probably to ask the Strawberry Perl organization directly.

Related

Get Maximum Compression from 7zip compression algorithm

I am trying to compress some of my large document files, but most of the files are getting compressed by only 10% at most. I am using 7-Zip terminal commands.
7z a filename.7z -m0=LZMA -mx=9 -mmt=on -aoa -mfb=64 filename.pptx
Any suggestions on changing the parameters? I need at least a 30% compression ratio.
.pptx and .docx files are internally .zip archives. You cannot expect a lot of compression on an already compressed file.
The documentation states that LZMA2 handles incompressible data better, so you can try
7z a -m0=lzma2 -mx filename.7z filename.pptx
But the required 30% is almost unreachable.
If you really need that compression, you could use the fact that a pptx is just a fancy zip file:
Unzip the pptx, then compress the extracted contents with 7-Zip. To recover an equivalent (but not identical) pptx, decompress with 7-Zip and re-zip the contents.
There are probably some complications; for example, with epub there is a certain file that must be stored uncompressed as the first file in the archive, at a certain offset from the start. I'm not familiar with pptx, but it might have similar requirements.
I think it's unlikely that the small reduction in file size is worth the trouble, but it's the only approach I can think of.
Depending on what's responsible for the size of the pptx you could also try to compress the contained files. For example by recompressing png files with a better compressor, stripping unnecessary data (e.g. meta-data or change histories) or applying lossy compression with lower quality settings for jpeg files.
Well, just an idea for maximizing compression:
'recompress' these .zip archives (the .docx, .pptx, .jar...) using -m0 (store = no compression), and then
apply LZMA2 to the result.
LZMA2 is pretty good; however, if the file contains many JPEGs, consider giving the open-source packer PeaZip, or more specifically paq8o, a try. PAQ8 has a built-in JPEG compressor and supports range coding, so it also copes with JPEGs that sit inside some other file. WinZip's zipx, in contrast, requires standalone JPEG files and is useless in this case.
But again, to make PAQ compress your target file effectively, you'll need to 'null' the zip/deflate compression, i.e. turn it into an uncompressed zip.
PAQ is probably a little exotic, but in my eyes it's more honest and clear than zipx. PAQ is unsupported, so, as always, it's a good idea to just google for whatever you don't have or don't know, and you will find something.
Zipx, in contrast, can be a little misleading: it looks like a normal zip and the files are listed properly in WinRAR or 7-Zip, but extracting the JPEGs will fail, so an inexperienced user may think the zip is corrupted. It's much harder to figure out that it is a zipx, which so far only WinZip or The Unarchiver (unar.exe) can handle properly.
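If you'd rather not unpack to a folder first, here is a small Python sketch of that 'nulling' step (file names are placeholders): rewrite the Office file, which is itself a zip, with every entry stored uncompressed, so that paq8o or LZMA2 gets to see the raw data:

import zipfile

with zipfile.ZipFile("filename.pptx") as src, \
     zipfile.ZipFile("filename.uncompressed.zip", "w", compression=zipfile.ZIP_STORED) as dst:
    for name in src.namelist():
        dst.writestr(name, src.read(name))   # copy every entry without deflate

# filename.uncompressed.zip can now be fed to paq8o or 7z -m0=lzma2.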
PPTX, XLSX, and DOCX files can indeed be compressed effectively if there are many of them. By unzipping each of them into its own directory, an archiver can find commonalities between them, deduplicating the boilerplate XML as well as any shared text.
If you must use the ZIP format, first create a zero-compression "store" archive containing all of them, then ZIP that. This is necessary because each file in a ZIP archive is compressed from scratch without taking advantage of redundancies across different files.
By taking advantage of boilerplate deduplication, 30% should be a piece of cake.
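Here is a rough Python sketch of that store-then-compress idea (file names are made up). For the outer pass I've used LZMA via Python's lzma module rather than plain zip/deflate, since its much larger dictionary is what actually lets redundancy between documents be found:

import lzma, zipfile

docs = ["report1.pptx", "report2.pptx", "report3.docx"]   # hypothetical inputs

# Put every document's uncompressed entries into one store-only container,
# so the outer compressor can see the repeated boilerplate XML across documents.
with zipfile.ZipFile("bundle.zip", "w", compression=zipfile.ZIP_STORED) as bundle:
    for doc in docs:
        with zipfile.ZipFile(doc) as src:
            for name in src.namelist():
                bundle.writestr(doc + "/" + name, src.read(name))

# Compress the whole container in one go; LZMA's large dictionary spans all the documents.
with open("bundle.zip", "rb") as f, open("bundle.zip.xz", "wb") as out:
    out.write(lzma.compress(f.read(), preset=9))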

Why do some sites have a md5 string on each file?

On some sites, in their download section, each file has an MD5. MD5 of what? I can't understand the purpose.
On phpBB.com, for example:
Download phpBB 3.0.6 (zip)
Size: 2.30 MiB
MD5: 63e2bde5bd03d8ed504fe181a70ec97a
It is a hash (digest) of the file. The idea is that you can run MD5 against the downloaded file, then compare it against that value to make sure you did not end up with a corrupted download.
This is a checksum, for verifying that the file as-downloaded is intact, without transmission errors. If the checksum listing is on a different server than the download, it also may give a little peace of mind that the download server hasn't been hacked (with the presumption that two servers are harder to hack than one).
It's a hash of the file. Used to ensure file integrity once you download said file. You'd use an md5 checksum tool to verify the file state.
Sites will post checksums so that you can make sure the file downloaded is the same as the file they're offering. This lets you ensure that file has not been corrupted or tampered with.
On most unix operating systems you can run md5 or md5sum on a file to get the hash for it. If the hash you get matches the hash from the website, you can be reasonably certain that the file is intact. A quick Google search will get you md5sum utilities for Windows.
You might also see an SHA-1 hash sometimes. It's the same concept, but a different and more secure algorithm.
This is an md5 hash of the entire binary contents of the file. The point is that if two files have different md5 hashes, they are different. This helps you determine whether a local file on your computer is the same as the file on the website, without having to download it again. For instance:
You downloaded your local copy somewhere else and think there might be a virus inside.
Your connection is lossy and you fear the file might be corrupted by the download.
You have changed the local file name and want to know which version you have.

Sync (Diff) of the remote binary file

Usually both files are available for running some diff tool, but I need to find the differences between two binary files when one of them resides on the server and the other is on the mobile device. Then only the differing parts can be sent to the server and the file updated.
There is the bsdiff tool. Debian has a bsdiff package, too, and there are high-level programming language interfaces like python-bsdiff.
I think that a jailbroken iPhone, an Android phone, or a similar mobile device can run bsdiff, but you may have to compile the software yourself.
But note: if you use the binary diff only to decide which parts of the file to update, you are better off using rsync, which has a built-in binary diff algorithm.
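As a rough illustration, here is a sketch using the bsdiff4 Python package (a binding in the same family as the python-bsdiff mentioned above; treat the exact module and function names as assumptions and check the package's documentation):

import bsdiff4

# On the device: compute a patch taking the old file to the new file.
with open("data.old", "rb") as f_old, open("data.new", "rb") as f_new:
    patch = bsdiff4.diff(f_old.read(), f_new.read())   # small byte string to send

# On the server: apply the patch to its copy of the old file.
with open("data.old", "rb") as f_old:
    rebuilt = bsdiff4.patch(f_old.read(), patch)
# rebuilt now equals the contents of data.new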
You're probably using the name generically, because diff expects its arguments to be text files.
If given binary files, it can only say they're different, not what the differences are.
But you need to update only the modified parts of binary files.
This is how the Open Source program called Rsync works, but I'm not aware of any version running on mobile devices.
To find the differences, you must compare. If you cannot compare, you cannot compute the minimal differences.
What kind of changes do you do to the local file?
Inserts?
Deletions?
Updates?
If only updates (i.e., the size and location of the unchanged data stay constant), then a block-type checksum solution might work: split the file into blocks, compute the checksum of each, and compare against a list of the previous checksums. Then you only have to send the modified blocks.
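A minimal Python sketch of that block-checksum idea (block size and hash choice are arbitrary; rsync refines this with rolling checksums so it also copes with insertions and deletions):

import hashlib

BLOCK = 4096  # arbitrary block size

def block_checksums(path):
    # Checksum of every fixed-size block in the file.
    sums = []
    with open(path, "rb") as f:
        while chunk := f.read(BLOCK):
            sums.append(hashlib.md5(chunk).hexdigest())
    return sums

def changed_blocks(old_sums, new_path):
    # Indices of blocks whose checksum differs from the previous snapshot;
    # only these blocks need to be sent to the server.
    new_sums = block_checksums(new_path)
    return [i for i, s in enumerate(new_sums)
            if i >= len(old_sums) or s != old_sums[i]]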
Also, if possible, you could store two versions of the file locally, the old and modified.
Sounds like a job for rsync. See also librsync and pyrsync.
Cool thing about the rsync algorithm is that you don't need both files to be accessible on the same machine.

Hash of an .exe file

I'm wondering whether I will ever get a different result when producing a checksum on an .exe file before and then while or after running that file. I'm more concerned with common practice (such as producing a SHA hash of popular app like firefox.exe) than with boundary cases, but both are interesting. Thanks.
The hash of a file should be constant for as long as the file is identical (i.e. contains only the same bytes, in the same order). It's very rare to find applications that rewrite their on-disk representation at runtime, so the hash should be constant. There are self-modifying programs, but they tend to operate on the in-memory loaded copy of their code, rather than the disk copy.
Edit: We should consider "self-updating" applications, but these tend to launch a little helper program to download and update the core application. It's difficult (especially on Windows) to update an executable while it's running. UNIX systems tend to use copy-on-write schemes, so it's possible that a software update might change your executable under your feet - but again, this is a corner case.
The hash will only change if the exe changes. That will only happen if the app modifies itself, which isn't going to happen on windows without the app restarting. Firefox might update itself (including a restart), but apart from such cases, the hash will remain the same.
The hash will change if the file changes.
EXE files rarely change on their own. firefox.exe would change if the user updates to a new version.
You can check the "date modified" attribute of an EXE file (like firefox.exe) after running it to see whether it has changed, but you'll probably find it hasn't.
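A quick way to check this yourself (a sketch; the path is a placeholder) is to record the hash and modification time, run the program, and compare afterwards:

import hashlib, os

exe = r"C:\Program Files\Mozilla Firefox\firefox.exe"   # placeholder path

def snapshot(path):
    with open(path, "rb") as f:
        digest = hashlib.sha1(f.read()).hexdigest()
    return digest, os.path.getmtime(path)

before = snapshot(exe)
input("Run the program now, then press Enter...")   # launch firefox.exe yourself
print("unchanged" if snapshot(exe) == before else "the file on disk changed")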
If you mean the change to the last-access time: don't worry, that is stored at the filesystem level, not within the file, so the hash will remain the same.

why are downloads sometimes tagged md5, sha1 and other hash indicators?

I've seen this all over the place:
Download here! SHA1 =
8e1ed2ce9e7e473d38a9dc7824a384a9ac34d7d0
What does it mean? How does a hash come into play as far as downloads and... What use can I make of it? Is this a legacy item where you used to have to verify some checksum after you downloaded the whole file?
It's a security measure. It allows you to verify that the file you just downloaded is the one that the author posted to the site. Note that using hashes from the same website you're getting the files from is not especially secure. Often a good place to get them from is a mailing list announcement where a PGP-signed email contains the link to the file and the hash.
Since this answer has been ranked so highly compared to the others for some reason, I'm editing it to add the other major reason mentioned first by the other authors below, which is to verify the integrity of the file after transferring it over the network.
So:
Security - verify that the file that you downloaded was the one the author originally published
Integrity - verify that the file wasn't damaged during transmission over the network.
When downloading larger files, it's often useful to compute a checksum to ensure your download was successful and wasn't mangled in transit. There are tons of freeware apps that can generate the checksum for you so you can validate your download. To me this is an interesting mainstreaming of procedures that popular mp3 and warez sites used back in the day when distributing files.
SHA1 and MD5 hashes are used to verify the integrity of files you've downloaded. They aren't necessarily a legacy technology, and can be used with tools like those in the OpenSSL suite to verify whether or not your file has been corrupted or changed from the original.
It's to ensure that you downloaded the file correctly. If you hash the downloaded file and it matches the hash on the page, all is well.
A cryptographic hash (such as SHA-1 or MD5) allows you to verify that the file you have was downloaded correctly and has not been tampered with.
To go along with what everyone here is saying I use HashTab when I need to generate/compare MD5 and SHA1 hashes on Windows. It adds a new tab to the file properties window and will calculate the hashes.
With a hash (MD5, SHA-1), a given input always produces the same output, so if you download the file and calculate the hash again you should obtain the same value.
If the output is different, the file is corrupt.
import hashlib

with open("downloaded_file.zip", "rb") as f:        # the file you downloaded (placeholder name)
    digest = hashlib.md5(f.read()).hexdigest()

valid_file = (digest == "hash shown on the page")   # compare against the value published next to the download