CRC32 vs CRC32C? - hash

What is the difference of CRC32 and CRC32C? I know CRC32 for a long time, but just heard CRC32C today. Are they basically the same method (i.e. both results in the same hash for a given data)?

The CRC32 found in zip and a lot of other places uses the polynomial 0x04C11DB7; its reversed form 0xEDB88320 is perhaps better known, being often found in little-endian implementations.
CRC32C uses a different polynomial (0x1EDC6F41, reversed 0x82F63B78) but otherwise the computation is the same. The results are different, naturally. This is also known as the Castagnoli CRC32 and most conspicuously found in newer Intel CPUs which can compute a full 32-bit CRC step in 3 cycles. That is the reason why the CRC32C is becoming more popular, since it allows advanced implementations that effectively process one 32-bit word per cycle despite the three-cycle latency (by processing 3 streams of data in parallel and using linear algebra to combine the results).

Related

How well do Non-cryptographic hashes detect errors in data vs. CRC-32 etc.?

Non-cryptographic hashes such as MurmurHash3 and xxHash are almost exclusively designed for hash tables, but they appear to function comparably (and even favorably) to CRC-32, Adler-32 and Fletcher-32. Non-crypto hashes are often faster than CRC-32 and produce more "random" output similar to slow cryptographic hashes (MD5, SHA). Despite this, I only ever see CRC-32 or MD5 recommended for data integrity/checksum purposes.
In the table below, I tested 32-bit checksum/CRC/hash functions to determine how well they detect small differences in data:
The results in each cell means: A) number of collisions found, and B) minimum and maximum probability that any of the 32 output bits are set to 1. To pass test B, the max and min should be as close as possible to 50. Anything under 45 or over 55 indicates bias.
Looking at the table, MurmurHash3 and Jenkins lookup2 compare favorably to CRC-32 (which actually fails one test). They are also well-distributed. DJB2 and FNV1a pass collision tests but aren't well distributed. Fletcher32 and Adler32 struggle with the NullBytes and 8RandBytes tests.
So then my question is, compared to other checksums, how suitable are 'non-cryptographic hashes' for detecting errors or differences in files? Is there any reason a CRC-32/Adler-32/CRC-64 might outperform any decent 32-bit/64-bit hash?
Is there any reason this function would be inferior to CRC-32 or
Adler-32 for detecting errors in data?
Yes, for certain kinds of error characteristics. A CRC can be designed to very effectively detect small numbers of bit errors in a packet, as you might expect on an actual communications or storage channel. That's what it's designed for.
For large numbers of errors, any 32-bit check that fills the 32 bits and does a reasonably good job of being sensitive to all of the bits of the packet will work about as well as any other. So your's would be as good as a CRC-32, and a smidge better than an Adler-32. (The Adler-32 deliberately does not use all possible 32-bit values, so has a slightly higher false positive rate than 32-bit checks that use all possible values.)
By the way, looking a little more at your algorithm, it does not distribute over all 32-bit values until you have many bytes of input. So your check would not be as good as any other 32-bit check on a large number of errors until you have covered the possible 32-bit values of the check.

Can a CRC32 engine be used for computing CRC16 hashes?

I'm working with a microcontroller with native HW functions to calculate CRC32 hashes from chunks of memory, where the polynomial can be freely defined. It turns out that the system has different data-links with different bit-lengths for CRC, like 16 and 8 bit, and I intend to use the hardware engine for it.
In simple tests with online tools I've concluded that it is possible to find a 32-bit polynomial that has the same result of a 8-bit CRC, example:
hashing "a sample string" with 8-bit engine and poly 0xb7 yelds a result 0x97
hashing "a sample string" with 16-bit engine and poly 0xb700 yelds a result 0x9700
...32-bit engine and poly 0xb7000000 yelds a result 0x97000000
(with zero initial value and zero final xor, no reflections)
So, padding the poly with zeros and right-shifting the results seems to work.
But is it 'always' possible to find a set of parameters that make 32-bit engines to work as 16 or 8 bit ones? (including poly, final xor, init val and inversions)
To provide more context and prevent 'bypass answers' like 'dont't use the native engine': I have a scenario in a safety critical system where it's necessary to prevent a common design error from propagating to redundant processing nodes. One solution for that is having software-based CRC calculation in one node, and hardware-based in its pair.
Yes, what you're doing will work in general for CRCs that are not reflected. The pre and post conditioning can be done very simply with code around the hardware instructions loop.
Assuming that the hardware CRC doesn't have an option for this, to do a reflected CRC you would need to reflect each input byte, and then reflect the final result. That may defeat the purpose of using a hardware CRC. (Though if your purpose is just to have a different implementation, then maybe it wouldn't.)
You don't have to guess. You can calculate it. Because CRC is a remainder of a division by an irreducible polynomial, it's a 1-to-1 function on its domain.
So, CRC16, for example, has to produce 65536 (64k) unique results if you run it over 0 through 65536.
To see if you get the same outcome by taking parts of CRC32, run it over 0 through 65535, keep the 2 bytes that you want to keep, and then see if there is any collision.
If your data has 32 bits in it, then it should not be an issue. The issue arises if you have less than 32 bit numbers and you shuffle them around in a 32-bit space. Their 1st and last byte are not guaranteed to be uniformly distributed.

Checksumming: CRC or hash?

Performance and security considerations aside, and assuming a hash function with a perfect avalanche effect, which should I use for checksumming blocks of data: CRC32 or hash truncated to N bytes? I.e. which will have a smaller probability to miss an error? Specifically:
CRC32 vs. 4-byte hash
CRC32 vs. 8-byte hash
CRC64 vs. 8-byte hash
Data blocks are to be transferred over network and stored on disk, repeatedly. Blocks can be 1KB to 1GB in size.
As far as I understand, CRC32 can detect up to 32 bit flips with 100% reliability, but after that its reliability approaches 1-2^(-32) and for some patterns is much worse. A perfect 4-byte hash reliability is always 1-2^(-32), so go figure.
8-byte hash should have a much better overall reliability (2^(-64) chance to miss an error), so should it be preferred over CRC32? What about CRC64?
I guess the answer depends on type of errors that might be expected in such sort of operation. Are we likely to see sparse 1-bit flips or massive block corruptions? Also, given that most storage and networking hardware implements some sort of CRC, should not accidental bit flips be taken care of already?
Only you can say whether 1-2-32 is good enough or not for your application. The error detection performance between a CRC-n and n bits from a good hash function will be very close to the same, so pick whichever one is faster. That is likely to be the CRC-n.
Update:
The above "That is likely to be the CRC-n" is only somewhat likely. It is not so likely if very high performance hash functions are used. In particular, CityHash appears to be very nearly as fast as a CRC-32 calculated using the Intel crc32 hardware instruction! I tested three CityHash routines and the Intel crc32 instruction on a 434 MB file. The crc32 instruction version (which computes a CRC-32C) took 24 ms of CPU time. CityHash64 took 55 ms, CityHash128 60 ms, and CityHashCrc128 50 ms. CityHashCrc128 makes use of the same hardware instruction, though it does not compute a CRC.
In order to get the CRC-32C calculation that fast, I had to get fancy with three crc32 instructions on three separate buffers in order to make use of the three arithmetic logic units in parallel in a single core, and then writing the inner loop in assembler. CityHash is pretty damned fast. If you don't have the crc32 instruction, then you would be hard-pressed to compute a 32-bit CRC as fast as a CityHash64 or CityHash128.
Note however that the CityHash functions would need to be modified for this purpose, or an arbitrary choice would need to be made in order to define a consistent meaning for the CityHash value on large streams of data. The reason is that those functions are not set up to accept buffered data, i.e. feeding the functions a chunk at a time and expecting to get the same result as if the entire set of data were fed to the function at once. The CityHash functions would need to modified to update an intermediate state.
The alternative, and what I did for the quick and dirty testing, is to use the Seed versions of the functions where I would use the CityHash from the previous buffer as the seed for the next buffer. The problem with that is that the result is then dependent on the buffer size. If you feed CityHash different size buffers with this approach, you get different hash values.
Another Update four years later:
Even faster is the xxhash family. I would now recommend that over a CRC for a non-cryptographic hash.
Putting aside "performance" issues; you might want to consider using one of the SHA-2 functions (say SHA-256).

Can one construct a "good" hash function using CRC32C as a base?

Given that SSE 4.2 (Intel Core i7 & i5 parts) includes a CRC32 instruction, it seems reasonable to investigate whether one could build a faster general-purpose hash function. According to this only 16 bits of a CRC32 are evenly distributed. So what other transformation would one apply to overcome that?
Update
How about this? Only 16 bits are suitable for a hash value. Fine. If your table is 65535 or less then great. If not, run the CRC value through the Nehalem POPCNT (population count) instruction to get the number of bits set. Then, use that as an index into an array of tables. This works if your table is south of 1mm entries. I'd bet that's cheaper/faster that the best-performing hash functions. Now that GCC 4.5 has a CRC32 intrinsic it should be easy to test...if only I had the copious spare time to work on it.
David
Revisited, August 2014
Prompted by Arnaud Bouchez in a recent comment, and in view of other answers and comments, I acknowledge that the original answer needs to be altered or for the least qualified. I left the original as-is, at the end, for reference.
First, and maybe most important, a fair answer to the question depends on the intended use of the hash code: What does one mean by "good" [hash function...]? Where/how will the hash be used? (e.g. is it for hashing a relatively short input key? Is it for indexing / lookup purposes, to produce message digests or yet other uses? How long is the desired hash code itself, all 32 bits [of CRC32 or derivatives thereof], more bits, fewer... etc?
The OP questions calls for "a faster general-purpose hash function", so the focus is on SPEED (something less CPU intensive and/or something which can make use of parallel processing of various nature). We may note here that the computation time for the hash code itself is often only part of the problem in an application of hash (for example if the size of the hash code or its intrinsic characteristics result in many collisions which require extra cycles to be dealt with). Also the requirement for "general purpose" leaves many questions as to the possible uses.
With this in mind, a short and better answer is, maybe:
Yes, the hardware implementations of CRC32C on newer Intel processors can be used to build faster hash codes; beware however that depending on the specific implementation of the hash and on its application the overall results may be sub-optimal because of the frequency of collisions, of the need to use longer codes. Also, for sure, cryptographic uses of the hash should be carefully vetted because the CRC32 algorithm itself is very weak in this regard.
The original answer cited a article on Evaluating Hash functions by Bret Mulvey and as pointed in Mdlg's answer, the conclusion of this article are erroneous in regards to CRC32 as the implementation of CRC32 it was based on was buggy/flawed. Despite this major error in regards to CRC32, the article provides useful guidance as to the properties of hash algorithms in general. The URL to this article is now defunct; I found it on archive.today but I don't know if the author has it at another location and also whether he updated it.
Other answers here cite CityHash 1.0 as an example of a hash library that uses CRC32C. Apparently, this is used in the context of some longer (than 32 bits) hash codes but not for the CityHash32() function itself. Also, the use of CRC32 by City Hash functions is relatively small, compared with all the shifting and shuffling and other operations that are performed to produce the hash code. (This is not a critique of CityHash for which I have no hands-on experience. I'll go on a limb, from a cursory review of the source code that CityHash functions produce good, e.g. ell distributed codes, but are not significantly faster than various other hash functions.)
Finally, you may also find insight on this issue in a quasi duplicate question on SO .
Original answer and edit (April 2010)
A priori, this sounds like a bad idea!.
CRC32 was not designed for hashing purposes, and its distribution is likely to not be uniform, hence making it a relatively poor hash-code. Furthermore, its "scrambling" power is relatively weak, making for a very poor one-way hash, as would be used in cryptographic applications.
[BRB: I'm looking for online references to that effect...]
Google's first [keywords = CRC32 distribution] hit seems to confirm this :
Evaluating CRC32 for hash tables
Edit: The page cited above, and indeed the complete article provides a good basis of what to look for in Hash functions.
Reading [quickly] this article, confirmed the blanket statement that in general CRC32 should not be used as a hash, however, and depending on the specific purpose of the hash, it may be possible to use, at least in part, a CRC32 as a hash code.
For example the lower (or higher, depending on implementation) 16 bits of the CRC32 code have a relatively even distribution, and, provided that one isn't concerned about the cryptographic properties of the hash code (i.e. for example the fact that similar keys produce very similar codes), it may be possible to build a hash code which uses, say, a concatenation of the lower [or higher] 16 bits for two CRC32 codes produced with the two halves (or whatever division) of the original key.
One would need to run tests to see if the efficiency of the built-in CRC32 instruction, relative to an alternative hash functions, would be such that the overhead of calling the instruction twice and splicing the code together etc. wouldn't result in an overall slower function.
The article referred to in other answers draws incorrect conclusions based on buggy crc32 code. Google's ranking algorithm does not rank based on scientific accuracy yet.
Contrary to the referred article "Evaluating CRC32 for hash tables" conclusions, CRC32 and CRC32C are acceptable for hash table use. The author's sample code has a bug in the crc32 table generation. Fixing the crc32 table, gives satifactory results using the same methodology. Also the speed of the CRC32 instruction, makes it the best choice in many contexts. Code using the CRC32 instruction is 16x faster at peak than an optimal software implementation. (Note that CRC32 is not exactly the same than CRC32C which the intel instruction implements.)
CRC32 is obviously not suitable for crypto use. (32 bit is a joke to brute force).
Yes. CityHash 1.0.1 includes some new "good hash functions" that use CRC32 instructions.
Just so long as you're not after crypto hash it just might work.
For cryptographic purposes, CRC32 is a bad fundation because it is linear (over the vector space GF(2)^32) and that is hard to correct. It may work for non-cryptographic purposes.
However, recent Intel cores have the AES-NI instructions, which basically perform 1/10th of an AES block encryption in two clock cycles. They are available on the most recent i5 and i7 processors (see the Wikipedia page for some details). This looks like a good start for building a cryptographic hash function (and a hash function which is good for cryptography will also be good for about anything else).
Indeed, at least one of the SHA-3 "round 2" candidates (the ECHO hash function) is built around the AES elements so that the AES-NI opcodes provide a very substantial performance boost. (Unfortunately, in the absence of AES-NI instruction, ECHO performance somewhat sucks.)

What hash algorithms are parallelizable? Optimizing the hashing of large files utilizing on multi-core CPUs

I'm interested in optimizing the hashing of some large files (optimizing wall clock time). The I/O has been optimized well enough already and the I/O device (local SSD) is only tapped at about 25% of capacity, while one of the CPU cores is completely maxed-out.
I have more cores available, and in the future will likely have even more cores. So far I've only been able to tap into more cores if I happen to need multiple hashes of the same file, say an MD5 AND a SHA256 at the same time. I can use the same I/O stream to feed two or more hash algorithms, and I get the faster algorithms done for free (as far as wall clock time). As I understand most hash algorithms, each new bit changes the entire result, and it is inherently challenging/impossible to do in parallel.
Are any of the mainstream hash algorithms parallelizable?
Are there any non-mainstream hashes that are parallelizable (and that have at least a sample implementation available)?
As future CPUs will trend toward more cores and a leveling off in clock speed, is there any way to improve the performance of file hashing? (other than liquid nitrogen cooled overclocking?) or is it inherently non-parallelizable?
There is actually a lot of research going on in this area. The US National Institute of Standards and Technology is currently holding a competition to design the next-generation of government-grade hash function. Most of the proposals for that are parallelizable.
One example: http://www.schneier.com/skein1.2.pdf
Wikipedia's description of current status of the contest: http://en.wikipedia.org/wiki/SHA-3
What kind of SSD do you have ? My C implementation of MD5 runs at 400 MB/s on a single Intel Core2 core (2.4 GHz, not the latest Intel). Do you really have SSD which support a bandwidth of 1.6 GB/s ? I want the same !
Tree hashing can be applied on any hash function. There are a few subtleties and the Skein specification tries to deal with them, integrating some metadata in the function itself (this does not change much things for performance), but the "tree mode" of Skein is not "the" Skein as submitted to SHA-3. Even if Skein is selected as SHA-3, the output of a tree-mode hash would not be the same as the output of "plain Skein".
Hopefully, a standard will be defined at some point, to describe generic tree hashing. Right now there is none. However, some protocols have been defined with support for a custom tree hashing with the Tiger hash function, under the name "TTH" (Tiger Tree Hash) or "THEX" (Tree Hash Exchange Format). The specification for TTH appears to be a bit elusive; I find some references to drafts which have either moved or disappeared for good.
Still, I am a bit dubious about the concept. It is kind of neat, but provides a performance boost only if you can read data faster than what a single core can process, and, given the right function and the right implementation, a single core can hash quite a lot of data per second. A tree hash spread over several cores requires having the data sent to the proper cores, and 1.6 GB/s is not the smallest bandwidth ever.
SHA-256 and SHA-512 are not very fast. Among the SHA-3 candidates, assuming an x86 processor in 64-bit mode, some of them achieve high speed (more than 300 MB/s on my 2.4 GHz Intel Core2 Q6600, with a single core -- that's what I can get out of SHA-1, too), e.g. BMW, SHABAL or Skein. Cryptographically speaking, these designs are a bit too new, but MD5 and SHA-1 are already cryptographically "broken" (quite effectively in the case of MD5, rather theoretically for SHA-1) so any of the round-2 SHA-3 candidates should be fine.
When I put my "seer" cap, I foresee that processors will keep on becoming faster than RAM, to the point that hashing cost will be dwarfed out by memory bandwidth: the CPU will have clock cycles to spare while it waits for the data from the main RAM. At some point, the whole threading model (one big RAM for many cores) will have to be amended.
You didn't say what you need your hash for.
If you're not gonna exchange it with the outside world but just for internal use, simply divide each file in chunks, compute and store all the checksums. You can then use many cores just by throwing a chunk to each one.
Two solutions that comes to mind is dividing files in fixed-size chunks (simpler, but will use less cores for smaller files where you're not supposed to need all that power) or in a fixed-number of chunks (will use all the cores for every file). Really depends on what you want to achieve and what your file size distribution looks like.
If, on the other hand, you need hashes for the outside world, as you can read from the other replies it's not possible with "standard" hashes (eg. if you want to send out SHA1 hashes for others to check with different tools) so you must look somewhere else. Like computing the hash when you store the file, for later retrieval, or compute hashes in background with the 'free' cores and store for later retrieval.
The better solution depends on what your constraints are and where you can invest space, time or cpu power.