I am working on writing a SHA-512 function. When I check the file I am hashing against different sources, the Linux sha512sum tool, a couple of websites, and the old SHA-512 source code I have, they all give different hash values. My assumption going into this project was that all implementations of a hash algorithm, if correct, will output the same hash value for the same input, which is what makes it usable as a checksum. Am I wrong in thinking this? If I am wrong, how would I actually check whether my work is correct?
Thanks in advance.
Yes, that's one of the basic building blocks of PKI: the same data block passed to the same hash function should always return the same hash value.
Beware of the interpretation, though: the result of a SHA-2 (512) hash is a block of 512 bits, not a string value, so it is first encoded for human consumption. It is therefore possible to see what look like visually different results when it's simply a matter of different encodings of the same bytes.
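For example, here is a minimal Python sketch (the input string "hello" is just a placeholder) showing the same SHA-512 digest rendered in two common encodings; comparing the hex form against a known-good tool such as sha512sum is also a quick way to validate your own implementation:

    import base64
    import hashlib

    digest = hashlib.sha512(b"hello").digest()   # the raw 64-byte (512-bit) block
    print(digest.hex())                          # hex encoding, as sha512sum prints it
    print(base64.b64encode(digest).decode())     # Base64 encoding of the exact same bytes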
So the April 1, 2013 xkcd "Externalities" web comic features a Skein 1024-1024 (1024-bit state, 1024-bit output) hash-breaking contest. I'm assuming that this must be nothing more than a brute-force effort where random strings are hashed in an attempt to match Randall's posted hash? Is this correct?
Also, my knowledge of Skein hashing theory is virtually non-existent, but being a halfway decent programmer I was able to download and run both SkeinFish (C#) and Maarten Bodewes' Skein implementation (Java) locally in 1024-1024 mode with some input strings. The hashes they gave, however, were different from the hash that xkcd returned for the same input. This may be an extremely naive question, but do different Skein implementations give different hashes? And which Skein implementation is xkcd using?
Thanks for pardoning my ignorance!
There are several different iterations of the Skein algorithm. xkcd is using version 1.3, which is also the most recent. Sources can be found here (look for "V1.3").
Interestingly enough, this brute-force method is the same one employed by Bitcoin to "mine" bitcoins. The big differences are the hash algorithm (SHA-256 in that case) and the target hash (which is dynamically determined to be any hash starting with a certain number of zeros). It takes a lot of work to discover such a hash, but once it has been found it is trivial to verify the source bits and confirm that the resulting hash meets the criteria.
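As a rough illustration (a minimal Python sketch of the idea, not Bitcoin's actual block format), note the asymmetry: finding the nonce takes many attempts on average, but checking it takes a single hash:

    import hashlib

    def mine(data: bytes, difficulty: int) -> int:
        """Brute-force a nonce until SHA-256(data + nonce) starts with `difficulty` zero hex digits."""
        target = "0" * difficulty
        nonce = 0
        while True:
            if hashlib.sha256(data + str(nonce).encode()).hexdigest().startswith(target):
                return nonce
            nonce += 1

    nonce = mine(b"block header", 4)   # slow: tens of thousands of attempts on average
    print(hashlib.sha256(b"block header" + str(nonce).encode()).hexdigest())  # fast: one hash to verify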
Here's the source code the Stanford team used. We ran this on about a hundred 8-core EC2 servers for a while, but not the whole competition.
https://github.com/jhiesey/skeincrack
If you were hashing non-alphanumeric characters (spaces, punctuation, etc.), you may have been getting different results due to HTML form encoding. The "enctype" attribute on the form xkcd was hosting was "application/octet-stream", which according to https://developer.mozilla.org/en-US/docs/HTML/Element/form is not a browser-supported standard. I assume the browser falls back on URL-encoding when it sees an enctype it doesn't recognize.
I observed the string "=" being submitted URL-encoded in Chrome, and it returned a different hash than what I got locally with the latest pyskein. But when I submitted it with this curl command line (which no longer works), I got the expected hash:
curl -X POST --data-binary "hashable==" "http://almamater.xkcd.com/?edu=school.edu"
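To see why the encoding matters, here is a small Python sketch (using SHA-512 from hashlib as a stand-in for Skein, purely for illustration) showing that the URL-encoded form of "=" hashes to something entirely different from the raw byte:

    import hashlib
    from urllib.parse import quote

    raw = "="
    encoded = quote(raw)   # a browser submits "%3D" under URL-encoding
    print(hashlib.sha512(raw.encode()).hexdigest())      # hash of the raw byte
    print(hashlib.sha512(encoded.encode()).hexdigest())  # a completely different digest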
The Stanford code in another answer does the same thing, and they apparently had some success. I never got any random data to hash locally to a better score than even my own school's, so I never got a chance to test thoroughly how to pass arbitrary data in properly. I don't know what the exact behavior was (e.g., perhaps if you omitted "hashable=" the server would detect that and just hash the whole POST body), but it may have intentionally been a little tricky as part of April Fools'.
I'm going to be using a key:value store and would like to create collision-free (perfect) hashes in Perl. Is there a Perl module or function I can use to generate a perfect hash function or table (maybe something like gperf)? I already know my range of input values.
I can't find a pure Perl solution; the closest is Reini Urban's examination of using perfect hashes with a type system. If you were to do it in XS, CMPH (the C Minimal Perfect Hashing Library) might be more apropos than gperf. CMPH seems to be optimized for non-trivial key sizes and run-time generation.
The cost of generating a perfect hash function at runtime in Perl might swamp the value of using it. To gain any benefit, you'd want it compiled and cached. So again, writing an XS module which generates the function from a fixed key list at XS compile time might be the best way to go.
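As a toy illustration of why run-time generation is costly (this is not how CMPH works, just a naive Python sketch of the general idea): you can brute-force a seed that maps a fixed key set into a table without collisions, but the search gets expensive as the key count grows:

    def find_perfect_seed(keys: list[str], table_size: int) -> int:
        """Brute-force a seed so that every key lands in a distinct slot."""
        for seed in range(1_000_000):
            slots = {hash((seed, k)) % table_size for k in keys}
            if len(slots) == len(keys):   # no two keys share a slot
                return seed
        raise ValueError("no collision-free seed found")

    keys = ["apple", "banana", "cherry", "date"]
    seed = find_perfect_seed(keys, len(keys))
    print({k: hash((seed, k)) % len(keys) for k in keys})

(Python's hash() is salted per process, so the seed found is only valid within one run; that throwaway cost is exactly the run-time-generation overhead being discussed.)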
Out of curiosity, how big is your data and how many keys does the set contain?
You might be interested in Judy. It's not a hash table implementation, but it's supposedly a very efficient associative array implementation.
Mind you, Perl's hashes are very well tuned, and they automatically get rehashed when a bucket starts growing large.
I've been wondering how the CryptographyManager is able to compare a salted hash with plain text. It has to save the salt for each hash somewhere, right? Does anyone have any insight on this?
We ship source code. Take a look at CryptographyManagerImpl.cs in the Cryptography solution.
Also, you may want to review our unit tests - the ones that start with HashProvider should give you additional insight.
So I checked out the source code and it is actually quite trivial: the salt is prepended to the actual hash value. When the hash is compared to a plaintext, the salt is extracted and used to hash the plaintext. The two values (= salt + hash) are then compared.
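In other words, the scheme looks roughly like this (a minimal Python sketch of the same idea, not the Enterprise Library's actual code; the 16-byte salt and SHA-256 are assumptions for illustration):

    import hashlib
    import hmac
    import os

    SALT_LEN = 16  # assumed salt length

    def create(plaintext: bytes) -> bytes:
        salt = os.urandom(SALT_LEN)
        return salt + hashlib.sha256(salt + plaintext).digest()   # salt prepended to the digest

    def compare(stored: bytes, plaintext: bytes) -> bool:
        salt = stored[:SALT_LEN]                                  # extract the prepended salt
        recomputed = salt + hashlib.sha256(salt + plaintext).digest()
        return hmac.compare_digest(stored, recomputed)            # constant-time comparison

    token = create(b"secret")
    print(compare(token, b"secret"), compare(token, b"wrong"))    # True False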
I'm looking for a special hash function. Let's say I have a large list of strings; if I order them by their hash values, they should end up in a quasi-random order.
The most important point is: it must be super fast. I've tried MD5 and SHA-1 and they use too much CPU power.
Collisions are not a problem.
I'm using JavaScript, so it shouldn't be too complicated to implement.
Take a look at Murmur hash. It has a nice space/collision trade-off:
http://sites.google.com/site/murmurhash/
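Just to illustrate the interface (a Python sketch, since that's easiest to show here; it assumes the third-party mmh3 package, and the question itself is about JavaScript, where several MurmurHash ports exist):

    import mmh3  # third-party MurmurHash bindings: pip install mmh3

    strings = ["apple", "banana", "cherry"]
    # Sorting by the 32-bit Murmur hash gives a fast, quasi-random order.
    print(sorted(strings, key=lambda s: mmh3.hash(s)))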
It looks as if you want the sort of hash function used in a hash table, not the sort used to detect duplicates or tampering.
Googling will yield you a wealth of information on alternative hash functions. To start with, stay away from cryptographic hashes (like MD5 or SHA-1); they solve a different problem.
You can read this, or this, or this, to start with.
If speed is paramount, you can implement a simple ad-hoc hash, e.g. take the first and last letter of each string and order your strings by the last and then the first letter. The result would look, as you say, "quasi-random", and it would be fast. For instance, part of my answer sorted that way would look like this (a short code sketch follows the list):
ca ad-hoc
el like
es simple
gt taking
hh hash
nc can
ti implement
uy you
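A minimal Python sketch of that ad-hoc scheme (the word list is just sample data; the equivalent JavaScript one-liner is just as short):

    def quick_key(s: str) -> str:
        # Ad-hoc "hash": last letter, then first letter.
        return s[-1] + s[0]

    words = ["you", "can", "implement", "a", "simple", "ad-hoc", "hash", "like", "taking"]
    print(sorted(words, key=quick_key))   # quasi-random order, computed very cheaply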
Hsieh, Murmur, and Bob Jenkins's hashes come to mind.
There's also a nice page about hash functions that has some quality tests and a simple S-box hash as well.
I've seen MD5 and SHA-1 hashes on the net used to verify files. What are common hashes used on the net and in other programs? This is to verify a file, not to hash a password.
I've used some hash functions from the following site before; they are usually pretty quick, and full code is given on the website, along with a description of each function and its strengths/weaknesses:
http://www.partow.net/programming/hashfunctions
Examples of the hashes given are the Kernighan and Ritchie hash (from "The C Programming Language") and the Knuth hash (from "The Art of Computer Programming", Volume 3).
To verify files you can use cyclic redundancy checks, such as CRC32, which have, as far as I know, long been the de facto standard for file checksums in IT, if you want to look at something other than MD5/SHA.
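CRC32 is built into many standard libraries; for instance, in Python (the file name here is just a placeholder):

    import zlib

    with open("somefile.bin", "rb") as f:             # placeholder file name
        checksum = zlib.crc32(f.read()) & 0xFFFFFFFF  # mask kept for portability; already unsigned on Python 3
    print(f"{checksum:08x}")                          # matches the output of common crc32 tools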
See also this list of checksum algorithms for more ways to check your files.
I've never used anything other than MD5. Add a salt if you use it for passwords.
Wikipedia has a list of hash functions, broken up into different types (checksums, non-crypto, crypto, etc.).
The Apache Foundation (among others) uses PGP Signatures.