Why do some sites have a md5 string on each file? - hash

On some sites, in their download section each file has a md5. md5 of what? i cant understand the purpose
on phpBB.com for example:
Download phpBB 3.0.6 (zip)
Size: 2.30 MiB
MD5: 63e2bde5bd03d8ed504fe181a70ec97a

It is the signature of the file's hash. The idea is that you can run MD5 against the downloaded file, then compare it against that value to make sure you did not end up with a corrupted download.

This is a checksum, for verifying that the file as-downloaded is intact, without transmission errors. If the checksum listing is on a different server than the download, it also may give a little peace of mind that the download server hasn't been hacked (with the presumption that two servers are harder to hack than one).

It's a hash of the file. Used to ensure file integrity once you download said file. You'd use an md5 checksum tool to verify the file state.

Sites will post checksums so that you can make sure the file downloaded is the same as the file they're offering. This lets you ensure that file has not been corrupted or tampered with.
On most unix operating systems you can run md5 or md5sum on a file to get the hash for it. If the hash you get matches the hash from the website, you can be reasonably certain that the file is intact. A quick Google search will get you md5sum utilities for Windows.
You might also see an SHA-1 hash sometimes. It's the same concept, but a different and more secure algorithm.

This is an md5 hash of the entire binary contents of the file. The point is that if two files have different md5 hashes, they are different. This helps you determine whether a local file on your computer is the same as the file on the website, without having to download it again. For instance:
You downloaded your local copy somewhere else and think there might be a virus inside.
Your connection is lossy and you fear the file might be corrupted by the download.
You have changed the local file name and want to know which version you have.

Related

What can go wrong if I use MD5 instead of SHA256 for the sake of file names?

I'm developing a cache system. This system creates cache files for each API call.
Each API has a URL of course, and I'm using that URL to create a hash and use that hash as the name of the file.
The reason I can't use the URL directly as the name of the file is because many URLs contain Unicode characters and when they are URL encoded, they become very long and OS won't let me save them.
Now, I just tested some hashing algorithms online and it seems that MD5 generates shorter digests for me than SHA256.
So I want to use MD5. But do I lose anything? I mean, does it have any shortcomings that I might not see now?
Is there a greater chance for it to generate duplicate digests for different URLs?

Check if downloaded file is corrupted and delete if corrupted from iPhone

Problem:
I have to download some files from the server. In between the connection with the server is lost. And when the file is opened it opened without any problem, except that it was blank.
Question
How to check if the downloaded file from the server is corrupted or not? Is there any way to do that?
If the file is corrupted it must be deleted from the documents folder.
Thank you!
You can create a hash of the file and then use that hash to compare the current hash to the new hash.
Here's an example on creating a hash for iOS:
http://iosdevelopertips.com/core-services/create-md5-hash-from-nsstring-nsdata-or-file.html
It should work pretty well because the hash only changes if the file contents change and is unaffected by creation times, modification times, and the file name.
Edit
You can also sign your files with PGP or GPG and use your public key to verify its contents.
Hope this helps :)
Send a hash of the file with the file, and then compare the hashes.

How to read a file from the disk if less than X days old, if older, refetch the html file

I wish to read an html file off of the internet and cache it. Then when I go back, because I'm debugging, I don't want to hammer the servers with the numerous requests I'll need. I don't want to get my IP banned for slamming the server over and over again just because I'm debugging. So my code needs to look something like:
if ((file > days_old) || !(file exists))
fetch html file from internet
save file to disk
else
read it from the disk
Because there will be multiple files, I'll need to include a variable name in the file name so the file is unique and I can easily look it up again.
I just learned Perl this semester and we only learned the basics & a bit of regex, once I get this I should be mostly fine.
Thanks!
Use an existing module:
Cache::Cache
HTTP::Cache::Transparent
If you really want to implement your own, you'll want to look at the If-Modified-Since and ETag HTTP headers to determine when to re-fetch a file, rather than an arbitrary days_old number you suck out of your thumb. You will also have to generate a unique filename, preferably with a hash function, while retaining the original URL to cater for hash collisions.

Sync (Diff) of the remote binary file

Usually both files are availble for running some diff tool but I need to find the differences in 2 binary files when one of them resides in the server and another is in the mobile device. Then only the different parts can be sent to the server and file updated.
There is the bsdiff tool. Debian has a bsdiff package, too, and there are high-level programming language interfaces like python-bsdiff.
I think that a jailbreaked iPhone, Android or similar mobile device can run bsdiff, but maybe you have to compile the software yourself.
But note! If you use the binary diff only to decide which part of the file to update, better use rsync. rsync has a built-in binary diff algorithm.
You're probably using the name generically, because diff expects its arguments to be text files.
If given binary files, it can only say they're different, not what the differences are.
But you need to update only the modified parts of binary files.
This is how the Open Source program called Rsync works, but I'm not aware of any version running on mobile devices.
To find the differences, you must compare. If you cannot compare, you cannot compute the minimal differences.
What kind of changes do you do to the local file?
Inserts?
Deletions?
Updates?
If only updates, ie. the size and location of unchanged data is constant, then a block-type checksum solution might work, where you split the file up into blocks, compute the checksum of each, and compare with a list of previous checksums. Then you only have to send the modified blocks.
Also, if possible, you could store two versions of the file locally, the old and modified.
Sounds like a job for rsync. See also librsync and pyrsync.
Cool thing about the rsync algorithm is that you don't need both files to be accessible on the same machine.

why are downloads sometimes tagged md5, sha1 and other hash indicators?

I've seen this all over the place:
Download here! SHA1 =
8e1ed2ce9e7e473d38a9dc7824a384a9ac34d7d0
What does it mean? How does a hash come into play as far as downloads and... What use can I make of it? Is this a legacy item where you used to have to verify some checksum after you downloaded the whole file?
It's a security measure. It allows you to verify that the file you just downloaded is the one that the author posted to the site. Note that using hashes from the same website you're getting the files from is not especially secure. Often a good place to get them from is a mailing list announcement where a PGP-signed email contains the link to the file and the hash.
Since this answer has been ranked so highly compared to the others for some reason, I'm editing it to add the other major reason mentioned first by the other authors below, which is to verify the integrity of the file after transferring it over the network.
So:
Security - verify that the file that you downloaded was the one the author originally published
Integrity - verify that the file wasn't damaged during transmission over the network.
When downloading larger files, it's often useful to perform a checksum to ensure your download was successful and not mangled along transport. There's tons of freeware apps that can be used to gen the checksum for you to validate your download. This to me is an interesting mainstreaming of procedures that popular mp3 and warez sites used to use back in the day when distributing files.
SHA1 and MD5 hashes are used to verify the integrity of files you've downloaded. They aren't necessarily a legacy technology, and can be used by tools like those in the openssl to verify whether or not your a file has been corrupted/changed from its original.
It's to ensure that you downloaded the file correctly. If you hash the downloaded the file and it matches the hash on the page, all is well.
A cryptographic hash (such as SH1 or MD5) allows you to verify that file you have has been downloaded correctly and has not been tampered with.
To go along with what everyone here is saying I use HashTab when I need to generate/compare MD5 and SHA1 hashes on Windows. It adds a new tab to the file properties window and will calculate the hashes.
With a has (MD5, SHA-1) one input matches only with one output, and then if you down load the file and calculate the hash again should obtain the same output.
If the output is different the file is corrupt.
If (hash(file) == “Hash in page”)
validFile = true;
else
validFile = false;