Why are downloads sometimes tagged with MD5, SHA1 and other hash indicators?

I've seen this all over the place:
Download here! SHA1 =
8e1ed2ce9e7e473d38a9dc7824a384a9ac34d7d0
What does it mean? How does a hash come into play as far as downloads and... What use can I make of it? Is this a legacy item where you used to have to verify some checksum after you downloaded the whole file?

It's a security measure. It allows you to verify that the file you just downloaded is the one that the author posted to the site. Note that using hashes from the same website you're getting the files from is not especially secure. Often a good place to get them from is a mailing list announcement where a PGP-signed email contains the link to the file and the hash.
Since this answer has been ranked so highly compared to the others for some reason, I'm editing it to add the other major reason mentioned first by the other authors below, which is to verify the integrity of the file after transferring it over the network.
So:
Security - verify that the file that you downloaded was the one the author originally published
Integrity - verify that the file wasn't damaged during transmission over the network.
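For illustration, here is a minimal sketch of checking a download against a published digest, assuming Python's hashlib; the file name is made up, and the expected value is just the example digest from the question:

import hashlib

# Hypothetical values for illustration: substitute your own file name and the
# digest published on the download page.
expected_sha1 = "8e1ed2ce9e7e473d38a9dc7824a384a9ac34d7d0"
path = "download.zip"

sha1 = hashlib.sha1()
with open(path, "rb") as f:
    # Read in chunks so large downloads don't have to fit into memory at once.
    for chunk in iter(lambda: f.read(8192), b""):
        sha1.update(chunk)

if sha1.hexdigest() == expected_sha1:
    print("Digest matches: the download is intact.")
else:
    print("Digest mismatch: the file is corrupted or was tampered with.")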

When downloading larger files, it's often useful to run a checksum to ensure your download was successful and wasn't mangled in transport. There are tons of freeware apps that can generate the checksum for you so you can validate your download. To me this is an interesting mainstreaming of procedures that popular mp3 and warez sites used to use back in the day when distributing files.

SHA1 and MD5 hashes are used to verify the integrity of files you've downloaded. They aren't a legacy technology, and tools such as those in the OpenSSL suite can use them to verify whether or not a file has been corrupted or changed from the original.

It's to ensure that you downloaded the file correctly. If you hash the downloaded file and it matches the hash on the page, all is well.

A cryptographic hash (such as SHA-1 or MD5) allows you to verify that the file you have was downloaded correctly and has not been tampered with.

To go along with what everyone here is saying I use HashTab when I need to generate/compare MD5 and SHA1 hashes on Windows. It adds a new tab to the file properties window and will calculate the hashes.

With a hash (MD5, SHA-1), a given input always produces the same output, so if you download the file and calculate the hash again you should obtain the same output.
If the output is different, the file is corrupt.
if (hash(file) == "hash in page")
    validFile = true;
else
    validFile = false;

Related

What can go wrong if I use MD5 instead of SHA256 for the sake of file names?

I'm developing a cache system. This system creates cache files for each API call.
Each API has a URL of course, and I'm using that URL to create a hash and use that hash as the name of the file.
The reason I can't use the URL directly as the name of the file is that many URLs contain Unicode characters, and when they are URL-encoded they become very long and the OS won't let me save them.
Now, I just tested some hashing algorithms online and it seems that MD5 generates shorter digests for me than SHA256.
So I want to use MD5. But do I lose anything? I mean, does it have any shortcomings that I might not see now?
Is there a greater chance for it to generate duplicate digests for different URLs?
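As a rough sketch of the approach described in the question (assuming Python's hashlib; the function name and URL are made up for illustration), the digest length is fixed no matter how long the URL is:

import hashlib

def cache_filename(url, algo="md5"):
    # Hash the URL so the cache file name has a fixed, filesystem-safe length.
    return hashlib.new(algo, url.encode("utf-8")).hexdigest() + ".cache"

url = "https://api.example.com/search?q=some+very+long+query"
print(cache_filename(url, "md5"))     # 32 hex characters
print(cache_filename(url, "sha256"))  # 64 hex characters

For accidental collisions between different URLs, either digest is effectively collision-free; MD5's real weakness is that collisions can be constructed deliberately, which may or may not matter for a cache.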

Is Strawberry Perl safe?

Strawberry Perl lists SHA1 digests on its download page.
However, looking at download page snapshots on archive.org, their SHA1 digests for the same perl version and build seem to change over time.
Example: in the download page snapshot from 2013-05-10, strawberry-perl-5.16.3.1-32bit-portable.zip is shown to be 86.8 MB long with an SHA1 digest of 3b9c4c32bf29e141329c3be417d9c425a7f6c2ff.
In the download page snapshot from 2017-02-14, the same strawberry-perl-5.16.3.1-32bit-portable.zip is shown to be 87.3 MB long with an SHA1 digest of 7f6da2c3e1b7a808f27969976777f47a7a7c6544.
And on the current download page, the same strawberry-perl-5.16.3.1-32bit-portable.zip is shown to be 91.0 MB long with an SHA1 digest of 00ba29e351e2f74a7dbceaad5d9bc20159dd7003
I thought they might have recompiled the package for some reason, but the current strawberry-perl-5.10.0.6-portable.zip has only one file dated later than 2009 (it's portable.perl), so this doesn't explain why the archive grew over time. Sadly, I don't have older zip files, so I have no way of knowing what changed inside the archive.
What's going on here? Why do past builds change over time? I am kind of concerned that some hackers might be injecting malicious code or something into binary perl packages...
Is there a rational explanation here? Thanks...
A hash such as an SHA1 digest is good for defending against communication errors, that is, for ensuring the integrity of what you downloaded (basically proving "file on your hard disk" = "file on the webserver"), but by itself it does not ensure authenticity.
For that, files should be signed with a PGP signature or an X.509 certificate. That is the only way you can verify that the file was indeed produced by the intended authors.
So by itself your observation neither signals an attack nor, in fact, helps you defend against one.
As ikegami said, you can even configure compressors with a different RAM/time trade-off, and the same compressor will produce different results.
See for example in Unix zip:
-Z cm
--compression-method cm
Set the default compression method. Currently the main methods supported by zip are store and deflate. Compression method can be set to:
store - Setting the compression method to store forces zip to store entries with no compression. This is generally faster than compressing entries, but results in no space savings. This is the same as using -0 (compression level zero).
deflate - This is the default method for zip. If zip determines that storing is better than deflation, the entry will be stored instead.
bzip2 - If bzip2 support is compiled in, this compression method also becomes available. Only some modern unzips currently support the bzip2 compression method, so test the unzip you will be using before relying on archives using this method (compression method 12).
and
-#
(-0, -1, -2, -3, -4, -5, -6, -7, -8, -9)
Regulate the speed of compression using the specified digit #, where -0 indicates no compression (store all files), -1 indicates the fastest compression speed (less compression) and -9 indicates the slowest compression speed (optimal compression, ignores the suffix list). The default compression level is -6.
Though still being worked, the intention is this setting will control compression speed for all compression methods. Currently only deflation is controlled.
The same source content could have been recompressed over time as the website is regenerated, so the same content can yield a different archive.
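As a rough illustration of that point (a sketch using Python's zipfile module and made-up content, not the actual Strawberry Perl build process): the same payload compressed with different settings yields byte-for-byte different archives, and therefore different sizes and SHA1 digests.

import hashlib
import io
import zipfile

def make_zip(payload, level):
    buf = io.BytesIO()
    # Same content, different deflate level (1 = fastest, 9 = best compression).
    # Zip entries also store timestamps, which alone can change the bytes.
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED,
                         compresslevel=level) as z:
        z.writestr("portable.perl", payload)
    return buf.getvalue()

payload = b"identical content\n" * 10000
for level in (1, 9):
    archive = make_zip(payload, level)
    print(level, len(archive), hashlib.sha1(archive).hexdigest())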
I have downloaded all three files. They are in fact exactly the same size (or else the Wayback Machine did not store them correctly; it does not do a redirection), and all have an SHA1 hash of 00ba29e351e2f74a7dbceaad5d9bc20159dd7003.
Your best bet is probably to ask the StrawberryPerl organization directly.

Check if downloaded file is corrupted and delete if corrupted from iPhone

Problem:
I have to download some files from the server. Partway through, the connection with the server is lost. When the file is then opened, it opens without any problem, except that it is blank.
Question
How can I check whether the file downloaded from the server is corrupted? Is there any way to do that?
If the file is corrupted it must be deleted from the documents folder.
Thank you!
You can create a hash of the file on the server and then compare that hash to the hash of the file you actually downloaded.
Here's an example on creating a hash for iOS:
http://iosdevelopertips.com/core-services/create-md5-hash-from-nsstring-nsdata-or-file.html
It should work pretty well because the hash only changes if the file contents change and is unaffected by creation times, modification times, and the file name.
Edit
You can also sign your files with PGP or GPG and use your public key to verify its contents.
Hope this helps :)
Send a hash of the file with the file, and then compare the hashes.
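The general pattern, sketched here in Python purely for illustration (the expected digest would have to come from your server; the names are made up): hash the downloaded file, compare against the server's value, and delete the file on mismatch.

import hashlib
import os

def verify_or_delete(path, expected_md5):
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            md5.update(chunk)
    if md5.hexdigest() == expected_md5:
        return True              # download is intact
    os.remove(path)              # corrupted: delete it from the documents folder
    return False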

Why do some sites have a md5 string on each file?

On some sites, in their download section, each file has an MD5. MD5 of what? I can't understand the purpose.
On phpBB.com, for example:
Download phpBB 3.0.6 (zip)
Size: 2.30 MiB
MD5: 63e2bde5bd03d8ed504fe181a70ec97a
It is the file's MD5 hash, a kind of fingerprint. The idea is that you can run MD5 against the downloaded file, then compare it against that value to make sure you did not end up with a corrupted download.
This is a checksum, for verifying that the file as-downloaded is intact, without transmission errors. If the checksum listing is on a different server than the download, it also may give a little peace of mind that the download server hasn't been hacked (with the presumption that two servers are harder to hack than one).
It's a hash of the file. Used to ensure file integrity once you download said file. You'd use an md5 checksum tool to verify the file state.
Sites will post checksums so that you can make sure the file downloaded is the same as the file they're offering. This lets you ensure that file has not been corrupted or tampered with.
On most unix operating systems you can run md5 or md5sum on a file to get the hash for it. If the hash you get matches the hash from the website, you can be reasonably certain that the file is intact. A quick Google search will get you md5sum utilities for Windows.
You might also see an SHA-1 hash sometimes. It's the same concept, but a different and more secure algorithm.
This is an md5 hash of the entire binary contents of the file. The point is that if two files have different md5 hashes, they are different. This helps you determine whether a local file on your computer is the same as the file on the website, without having to download it again. For instance:
You downloaded your local copy somewhere else and think there might be a virus inside.
Your connection is lossy and you fear the file might be corrupted by the download.
You have changed the local file name and want to know which version you have.

Sync (Diff) of the remote binary file

Usually both files are available for running some diff tool, but I need to find the differences between 2 binary files when one of them resides on the server and the other is on the mobile device. Then only the differing parts can be sent to the server and the file updated.
There is the bsdiff tool. Debian has a bsdiff package, too, and there are high-level programming language interfaces like python-bsdiff.
I think that a jailbroken iPhone, Android or similar mobile device can run bsdiff, but you may have to compile the software yourself.
But note! If you use the binary diff only to decide which parts of the file to update, you are better off using rsync. rsync has a built-in binary diff algorithm.
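As a hedged sketch of the binary-diff idea (this assumes the bsdiff4 Python package rather than python-bsdiff itself, and the file names are made up): compute a patch between the two versions, send only the patch, and apply it on the other side.

import bsdiff4

# old_bytes: the version the server already has; new_bytes: the updated copy.
old_bytes = open("file.v1.bin", "rb").read()
new_bytes = open("file.v2.bin", "rb").read()

patch = bsdiff4.diff(old_bytes, new_bytes)   # typically far smaller than new_bytes
# ... transmit only `patch` over the network ...
rebuilt = bsdiff4.patch(old_bytes, patch)
assert rebuilt == new_bytes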
You're probably using the name generically, because diff expects its arguments to be text files.
If given binary files, it can only say they're different, not what the differences are.
But you need to update only the modified parts of binary files.
This is how the Open Source program called Rsync works, but I'm not aware of any version running on mobile devices.
To find the differences, you must compare. If you cannot compare, you cannot compute the minimal differences.
What kind of changes do you make to the local file?
Inserts?
Deletions?
Updates?
If only updates, i.e. the size and location of unchanged data stay constant, then a block-type checksum solution might work, where you split the file into blocks, compute the checksum of each, and compare against a list of previous checksums. Then you only have to send the modified blocks (see the sketch after this answer).
Also, if possible, you could store two versions of the file locally, the old and modified.
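A minimal sketch of that block-checksum idea (fixed-size blocks, Python's hashlib; the block size and function names are illustrative). It only works when changes are in-place updates, since an insertion or deletion shifts every later block:

import hashlib

BLOCK_SIZE = 64 * 1024  # 64 KiB; an arbitrary choice for illustration

def block_checksums(path):
    # One checksum per fixed-size block of the file.
    sums = []
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(BLOCK_SIZE), b""):
            sums.append(hashlib.md5(block).hexdigest())
    return sums

def changed_blocks(old_sums, new_sums):
    # Indices of blocks whose contents differ; only these need to be sent.
    return [i for i, (a, b) in enumerate(zip(old_sums, new_sums)) if a != b]

rsync generalizes this with a rolling checksum so that it also copes with insertions and deletions.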
Sounds like a job for rsync. See also librsync and pyrsync.
The cool thing about the rsync algorithm is that you don't need both files to be accessible on the same machine.