I read that Facebook will move from SHA-1 to SHA-2 on 1 October 2015 and that we have to update our applications: https://developers.facebook.com/blog/post/2015/06/02/SHA-2-Updates-Needed/
Do you know which SHA-2 function it will use?
I read that there are several (224, 256, 384 or 512) and that one of them (SHA-224) doesn't work on Windows XP SP3, which I use (http://blogs.msdn.com/b/alejacma/archive/2009/01/23/sha-2-support-on-windows-xp.aspx)
You don't have to worry much about that, because SHA-224 is rarely used.
On this question, CBroe has made an important remark:
That blog post is about SSL connections when your app is making API
requests. This is not about anything you do with data within your app,
it is about the transport layer.
According to https://crypto.stackexchange.com/questions/15151/sha-224-purpose
Answer by Ilmari Karonen:
Honestly, in practice, there are very few if any reasons to use
SHA-224.
As fgrieu notes, SHA-224 is simply SHA-256 with a different IV and
with 32 of the output bits thrown away. For most purposes, if you want
a hash with more than 128 but less than 256 bits, simply using SHA-256
and truncating the output yourself to the desired bit length is
simpler and just as efficient as using SHA-224. As you observe,
SHA-256 is also more likely to be available on different platforms
than SHA-224, making it the better choice for portability.
Why would you ever want to use SHA-224, then?
The obvious use case is if you need to implement an existing protocol
that specifies the use of SHA-224 hashes. While, for the reasons
described above, it's not a very common choice, I'm sure such
protocols do exist.
Also, a minor advantage of SHA-224 over truncated SHA-256 is that, due
to the different IV, knowing the SHA-224 hash of a given message does
not reveal anything useful about its SHA-256 hash, or vice versa. This
is really more of an "idiot-proofing" feature; since the two hashes
have different names, careless users might assume that their outputs
have nothing in common, so NIST changed the IV to ensure that this is
indeed the case.
However, this isn't really something you should generally rely on. If
you really need to compute multiple unrelated hashes of the same input
string, what you probably want instead is a keyed PRF like HMAC, which
can be instantiated using any common hash function (such as SHA-256).
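The points in the quoted answer can be checked with a minimal sketch using Python's standard `hashlib` and `hmac` modules. The message and the two key labels are made up for illustration.

```python
import hashlib
import hmac

msg = b"example message"  # hypothetical input

# SHA-224 and truncated SHA-256 both give 28 bytes (224 bits) ...
sha224 = hashlib.sha224(msg).digest()
truncated_sha256 = hashlib.sha256(msg).digest()[:28]

# ... but the outputs are unrelated, because SHA-224 starts from a different IV.
print(sha224.hex())
print(truncated_sha256.hex())

# For several unrelated hashes of the same input, the answer suggests a keyed
# PRF such as HMAC; the keys below are placeholder labels.
tag_a = hmac.new(b"purpose-a", msg, hashlib.sha256).digest()
tag_b = hmac.new(b"purpose-b", msg, hashlib.sha256).digest()
```

Truncating SHA-256 yourself only needs the `sha256` primitive, which is the one available on more platforms.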
As you've mentioned, Windows XP with SP3 doesn't support SHA-224 but it supports SHA-256:
Check also: https://security.stackexchange.com/questions/1751/what-are-the-realistic-and-most-secure-crypto-for-symmetric-asymmetric-hash
Especially: https://stackoverflow.com/a/817121/3964066
And: https://security.stackexchange.com/a/1755
Part of Thomas Pornin's answer:
ECDSA over a 256-bit curve already achieves an "unbreakable" level of
security (i.e. roughly the same level than AES with a 128-bit key, or
SHA-256 against collisions). Note that there are elliptic curves on
prime fields, and curves on binary fields; which kind is most
efficient depends on the involved hardware (for curves of similar
size, a PC will prefer the curves on a prime field, but dedicated
hardware will be easier to build with binary fields; the CLMUL
instructions on the newer Intel and AMD processors may change that).
SHA-512 uses 64-bit operations. This is fast on a PC, not so fast on a
smartcard. SHA-256 is often a better deal on small hardware (including
32-bit architectures such as home routers or smartphones)
I'm working on design of Content-addressable storage, so I'm looking for a hash function to generate object identifiers. Every object should get short ID based on its content in that way: object_id = hash(object_content).
Prerequisites:
Hash-function should be fast.
Collision probability must be as low as possible.
Optimal ID length is 32 bytes in order to address 256^32 objects at max (but this requirement may be relaxed).
Taking these requirements into account, I picked SHA256, but unfortunately it's not fast enough for my purposes. The fastest implementations of SHA256 that I was able to benchmark were openssl and boringssl: on my desktop Intel Core i5 6400 they gave about 420 MB/s per core. Other implementations (like crypto/sha256 in Go) are even slower. I would like to replace SHA256 with another hash function that provides the same collision guarantees as SHA256 but gives better throughput (at least 600 MB/s per core).
Please share your opinion about possible options to solve this problem.
I would also like to note that a hardware upgrade (like purchasing a modern CPU with the AVX512 instruction set) is not possible. The main point is to find a hash function that provides better performance on commodity hardware.
Check out Cityhash and HighwayHash. Both have 256-bit variants and are much faster than SHA256. Cityhash is faster, but it is a non-cryptographic hash. HighwayHash is slower (but still faster than SHA256) and is a secure hash.
All modern non-cryptographic hashes are much faster than SHA256. If you're willing to use a 128-bit hash, you'll have more options.
Note that you may want to consider using a 128-bit hash, as it may be adequate for your purpose. For example, if you have 10^10 different objects, the probability that you have a collision with a quality 128-bit hash is less than 10^-18. Check out the table here.
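The collision estimate can be reproduced with the usual birthday approximation, p ≈ 1 - exp(-n(n-1) / 2^(b+1)), for n objects and a b-bit hash. This is a rough sketch that assumes the hash output is uniformly distributed; the function name is made up for illustration.

```python
from math import expm1

def collision_probability(n_objects: int, hash_bits: int) -> float:
    """Birthday approximation of the chance of at least one collision."""
    pairs = n_objects * (n_objects - 1) / 2
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -expm1(-pairs / 2**hash_bits)

p128 = collision_probability(10**10, 128)  # about 1.5e-19
p256 = collision_probability(10**10, 256)
print(p128, p256)
```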
Finally, for my use-case BLAKE2S_256 turns out to be a better option than SHA256.
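For reference, here is a minimal sketch of `object_id = hash(object_content)` with BLAKE2s and a 32-byte digest, using Python's standard `hashlib`. The function name `object_id` is an assumption taken from the question's notation, not part of any library.

```python
import hashlib

def object_id(content: bytes) -> str:
    """Return a 32-byte (64 hex character) content-derived identifier."""
    return hashlib.blake2s(content, digest_size=32).hexdigest()

oid = object_id(b"some object content")  # hypothetical object
print(oid)
```

Whether BLAKE2s actually beats SHA256 depends on the CPU and the implementation, so it is worth benchmarking on the target machine.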
I get conflicting answers when I google this, as these algos are constantly improving, new exploits are being found, and new issues come up all the time... a lot of the advice on which algo to use is simply old, or holds on to ideas from a time when those algos were the best choice.
I want to be very clear here: I'm not talking about passwords. I'm talking about message digests, not cryptographic hashes.
I could go ahead and use md5 as my first pick for a message digest (it's right in the name), but then I remembered that it has more collisions than the more modern algos out there. But then, what makes these newer algos more suitable for the message digest of a file or short string?
So that's my question, what's the modern message digest algo that should be used?
From that perspective, depending on the amount of data you are working with, SHA1 should do fine - if you will be working with larger amounts of data, a SHA-2 algorithm, such as SHA-256 might be more suitable as the fear of collisions in SHA1 is rising due to a flaw in its algorithm, but it isn't extremely serious when working with smallish amounts of data.
MD5 has been shown to be too vulnerable to collisions, as there have been attacks on SSL certificates that used MD5 to create a forged SSL certificate, so I'd stay away from there. Also depending on your application, MD5 is not FIPS 140 compliant, if that is of any importance to you.
SHA1 is preferable to MD5 because MD5 is risky to use, and SHA1 performs better than SHA-2 in most common circumstances. The SHA-2 algorithms are by no means slow, but SHA1 has an edge. However, SHA1 is slightly riskier because you've probably locked yourself into using it: if collisions start to be found, it might be hard for you to change, so it might be better to invest in a SHA-2 algorithm up-front. The penalty for using SHA-256 over SHA-1 is very small, depending on how you will be using the SHA algorithm. SHA-2 algorithms produce a much larger output than SHA1, but with the benefit of a reduced chance of a collision.
So which one is right? It depends on what you are looking for and what your use case is. Hopefully now you can make a decision.
When in doubt, use SHA-256. The other SHA-2 functions are fine too; however, SHA-384 and SHA-512 may suffer from a non-negligible performance degradation on small (32-bit only) platforms. This may matter for some specific applications.
For non-security related usages (e.g. first pass of indexing in a hash table, or detection of accidental, non-malicious data alteration -- the kind of job where you could use a CRC), consider MD4, a predecessor to MD5. MD4 is even more broken than MD5, but also simpler to implement (with shorter code) and faster (actually, it has been measured to be faster than CRC32 on some ARM platforms).
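As a concrete illustration of "when in doubt, use SHA-256" for a file digest, here is a minimal sketch using Python's `hashlib`. The function name and chunk size are arbitrary choices for this example.

```python
import hashlib

def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Swapping in another function later only means passing a different `algorithm` name, which limits the lock-in mentioned above.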
Is there a library that uses Blowfish in the Merkle–Damgård construction, for the purpose of constructing a cryptographic hash? I'm not interested in password hashing, but a general purpose cryptographic hash. (In an application where we're already using Blowfish.)
Rolling your own crypto is a VERY VERY BAD idea. Read it, repeat it loudly, do it again.
Especially for hash functions. Cryptographers around the world are currently in the process of designing a new hash function, through the SHA-3 competition. It began in 2007, it will supposedly end in 2012, and several dozens (more probably hundreds) of smart people who specialize in cryptographic design (read: PhD and more) work hard at it. Assuming that you can, by yourself in a few weeks, do better than all those people in five years, verges on the preposterous. It turns out that building a secure hash function is a difficult problem (from a theoretical point of view, we do not even know if a secure hash function can really exist). Building a secure block cipher is considerably easier.
The designer of Blowfish (Bruce Schneier) is one of the designers of Skein, one of the candidates for SHA-3. Note that he did not reuse Blowfish for that. Note that he also published in 1998 the Twofish block cipher, a candidate to the AES selection process, as a much advanced successor to Blowfish; Twofish was much more scrutinized than Blowfish, so even for symmetric encryption you should not use Blowfish but Twofish (or, better yet, use the AES, aka "Rijndael", which was preferred over Twofish).
Blowfish is problematic to use as a hash compression function, for a couple of reasons:
Firstly, many of the unbroken schemes for turning block ciphers into hash functions produce a hash that is the same length as the block cipher's block length. For Blowfish, with only a 64 bit block size, this is insufficient - a 64 bit hash length provides only 32 bits of security, which is trivially defeated.
Secondly, all of the secure schemes change the block cipher key on every block of the input message. Blowfish has a notoriously slow key setup procedure, so a hash based on it will necessarily be slow too.
If you remain undeterred, look up double-block-length hash constructions such as Tandem Davies-Meyer and Abreast Davies-Meyer. However, I would strongly suggest that you use an implementation of a function from the SHA-2 family instead - these are also easy to find, are fast and are considered secure. You will not gain anything by re-using Blowfish as your hash function.
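The first point above (an n-bit hash gives only about n/2 bits of collision resistance) can be seen with a small birthday search. This sketch does not use Blowfish; it uses SHA-256 truncated to a short tag as a stand-in for any short hash, and the function name is made up for illustration.

```python
import hashlib
from itertools import count

def find_collision(bits: int):
    """Birthday search on a hash truncated to `bits` bits (a multiple of 8).

    Returns two distinct messages with the same truncated hash and the
    number of messages hashed before the collision appeared.
    """
    nbytes = bits // 8
    seen = {}
    for i in count():
        msg = str(i).encode()
        tag = hashlib.sha256(msg).digest()[:nbytes]
        if tag in seen:
            return seen[tag], msg, i + 1
        seen[tag] = msg

# A 32-bit tag typically collides after roughly 2^16 to 2^17 messages.
first, second, tries = find_collision(32)
print(first, second, tries)
```

By the same argument, a 64-bit hash would fall after roughly 2^32 hashes, which is within reach of ordinary hardware.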
I've been asked to look for a perfect hash/one way function to be able to hash 10^11 numbers.
However, as we'll be using an embedded device, it won't have the memory to store the relevant buckets, so I was wondering if it's possible to have a decent (minimal) perfect hash without them?
The plan is to use the device to hash the number(s) and we use a rainbow table or a file using the hash as the offset.
Cheers
Edit:
I'll try to provide some more info :)
1) 10^11 is actually now 10^10, so that makes it easier. This number is the number of possible combinations. So we could get a number between 0000000001 and 10000000000 (10^10).
2) The plan is to use it as part of a one-way function to make the number secure so we can send it by insecure means.
We will then look up the original number at the other end using a rainbow table.
The problem is that the source devices generally have 512k-4Meg of memory to use.
3) It must be perfect - we 100% cannot have a collision.
Edit2:
4) We can't use encryption, as we've been told it's not really possible on the devices, and key management would be a nightmare if we could.
Edit3:
As this is not sensible, it's a purely academic question now (I promise)
Okay, since you've clarified what you're trying to do, I rewrote my answer.
To summarize: Use a real encryption algorithm.
First, let me go over why your hashing system is a bad idea.
What is your hashing system, anyway?
As I understand it, your proposed system is something like this:
Your embedded system (which I will call C) is sending some sort of data with a value space of 10^11. This data needs to be kept confidential in transit to some sort of server (which I will call S).
Your proposal is to send the value hash(salt + data) to S. S will then use a rainbow table to reverse this hash and recover the data. salt is a shared value known to both C and S.
This is an encryption algorithm
An encryption algorithm, when you boil it down, is any algorithm that gives you confidentiality. Since your goal is confidentiality, any algorithm that satisfies your goals is an encryption algorithm, including this one.
This is a very poor encryption algorithm
First, there is an unavoidable chance of collision. Moreover, the set of colliding values differs each day.
Second, decryption is extremely CPU- and memory-intensive even for the legitimate server S. Changing the salt is even more expensive.
Third, although your stated goal is avoiding key management, your salt is a key! You haven't solved key management at all; anyone with the salt will be able to crack the message just as well as you can.
Fourth, it's only usable from C to S. Your embedded system C will not have enough computational resources to reverse hashes, and can only send data.
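To make the third point concrete, here is a toy sketch of the proposed scheme with a deliberately tiny value space. It uses a plain lookup dictionary rather than a real rainbow table, and the salt value, function names, and sizes are all made up for illustration.

```python
import hashlib

def send_value(salt: bytes, value: int) -> bytes:
    """What C would transmit: hash(salt + data)."""
    return hashlib.sha256(salt + str(value).encode()).digest()

def build_lookup(salt: bytes, space: int) -> dict:
    """What S must precompute: one entry per possible value."""
    return {send_value(salt, v): v for v in range(space)}

salt = b"shared-salt"   # hypothetical shared secret
space = 10_000          # tiny stand-in for the real value space

server_table = build_lookup(salt, space)
recovered = server_table[send_value(salt, 4231)]

# Anyone else who learns the salt can build exactly the same table,
# which is why the salt is a key in all but name.
attacker_table = build_lookup(salt, space)
```

The table also has to be rebuilt from scratch whenever the salt changes, which is the cost described in the second point.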
This isn't any faster than a real encryption algorithm on the embedded device
Most secure hashing algorithms are just as computationally expensive as a reasonable block cipher, if not worse. For example, SHA-1 requires doing the following for each 512-bit block:
Allocate 12 32-bit variables.
Allocate 80 32-bit words for the expanded message
64 times: Perform three array lookups, three 32-bit xors, and a rotate operation
80 times: Perform up to five 32-bit binary operations (some combination of xor, and, or, and not, depending on the round); then a rotate, array lookup, four adds, another rotate, and several memory loads/stores.
Perform five 32-bit twos-complement adds
There is one chunk per 512 bits of the message, plus a possible extra chunk at the end. This is 1136 binary operations per chunk (not counting memory operations), or about 18 operations per byte.
For comparison, the RC4 encryption algorithm requires four operations (three additions, plus an xor on the message) per byte, plus two array reads and two array writes. It also requires only 258 bytes of working memory, vs a peak of 368 bytes for SHA-1.
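The per-byte cost of RC4 is easiest to see in code. This is a sketch for counting operations only, written from the well-known public description of RC4; it is not a vetted implementation and the function names are made up for this example.

```python
def rc4_keystream(key: bytes):
    """Yield RC4 keystream bytes for the given key."""
    # Key setup: a 256-entry permutation plus two indices is all the state.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    i = j = 0
    while True:
        # Per byte: three additions and a swap; the xor is applied by the caller.
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        yield s[(s[i] + s[j]) & 0xFF]

def rc4(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt (the operation is symmetric) by xoring with the keystream."""
    return bytes(b ^ k for b, k in zip(data, rc4_keystream(key)))

ciphertext = rc4(b"Key", b"Plaintext")
print(ciphertext.hex())
```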
Key management is fundamental
With any confidentiality system, you must have some sort of secret. If you have no secrets, then anyone else can implement the same decoding algorithm, and your data is exposed to the world.
So, you have two choices as to where to put the secrecy. One option is to make the encipherment/decipherment algorithms secret. However, if the code (or binaries) for the algorithm is ever leaked, you lose - it's quite hard to replace such an algorithm.
Thus, secrets are generally made easy to replace - this is what we call a key.
Your proposed usage of hash algorithms would require a salt - this is the only secret in the system and is therefore a key. Whether you like it or not, you will have to manage this key carefully. And it's a lot harder to replace if leaked than other keys - you have to spend many CPU-hours generating a new rainbow table every time it's changed!
What should you do?
Use a real encryption algorithm, and spend some time actually thinking about key management. These issues have been solved before.
First, use a real encryption algorithm. AES has been designed for high performance and low RAM requirements. You could also use a stream cipher like RC4 as I mentioned before - the thing to watch out for with RC4, however, is that you must discard the first 4 kilobytes or so of output from the cipher, or you will be vulnerable to the same attacks that plagued WEP.
Second, think about key management. One option is to simply burn a key into each client, and physically go out and replace it if the client is compromised. This is reasonable if you have easy physical access to all of the clients.
Otherwise, if you don't care about man-in-the-middle attacks, you can simply use Diffie-Hellman key exchange to negotiate a shared key between S and C. If you are concerned about MitMs, then you'll need to start looking at ECDSA or something to authenticate the key obtained from the D-H exchange - beware that when you start going down that road, it's easy to get things wrong, however. I would recommend implementing TLS at that point. It's not beyond the capabilities of an embedded system - indeed, there are a number of embedded commercial (and open source) libraries available already. If you don't implement TLS, then at least have a professional cryptographer look over your algorithm before implementing it.
There is obviously no such thing as a "perfect" hash unless you have at least as many hash buckets as inputs; if you don't, then inevitably it will be possible for two of your inputs to share the same hash bucket.
However, it's unlikely you'll be storing all the numbers between 0 and 10^11. So what's the pattern? If there's a pattern, there may be a perfect hash function for your actual data set.
It's really not that important to find a "perfect" hash function anyway, though. Hash tables are very fast. A function with a very low collision rate - and when hashing integers, that means nearly any simple function, like modulus - is fine and you'll get O(1) average performance.
I don't mean for this to be a debate, but I'm trying to understand the technical rationale behind why so many apps use SHA1 for hashing secrets, when SHA512 is more secure. Perhaps it's simply for backwards compatibility.
Besides the obvious larger size (128 chars vs 40), or slight speed differences, is there any other reason why folks use the former?
Also, SHA-1 I believe was first cracked by a VCR's processor years ago. Has anyone cracked 512 yet (perhaps with a leaf blower), or is it still safe to use without salting?
Most uses of SHA-1 are for interoperability: we use SHA-1 when we implement protocols where SHA-1 is mandated. Ease of development also comes into account: SHA-1 implementations in various languages and programming environments are more common than SHA-512 implementations.
Also, even though most usages of hash functions do not have performance issues (at least, no performance issue where the hash function is the bottleneck), there are some architectures where SHA-1 is vastly more efficient than SHA-512. Consider a basic Linksys router: it uses a MIPS-derivative CPU, clocked at 200 MHz. Such a machine can be reprogrammed, e.g. with OpenWRT (a small Linux for embedded systems). As a router, it has a fast network (100 Mbit/s). Suppose that you want to hash some data (e.g. as part of some VPN software -- a router looks like a good candidate for running a VPN). With SHA-1, you will get about 6 MB/s, using the full CPU. That's already quite a bit lower than the network bandwidth. SHA-512 will give you no more than 1.5 MB/s on the same machine. On such a system, the difference in performance is not negligible. Also, if I use SHA-1 on my Linksys router for some communication protocol, then the machine at the other end of the link will also have to use SHA-1.
The good news is that there is an ongoing competition to select a new standard hash function, code-named SHA-3. Some of the competing candidates provide performance similar to SHA-1, or even somewhat better, while still yielding a 512-bit output and being (probably) as secure as SHA-512.
Both SHA1 and SHA512 are hash functions. If you are using them as a cryptographic hash, then perhaps that is good reason to use SHA512; however, there are applications that use these functions simply to identify objects. For example, Git uses SHA1 to cheaply distinguish between objects. In that case, because the possibility of collision between two documents is incredibly small with SHA1, there really is no justification for the additional space requirement of SHA512 when SHA1 is more than suitable for the task.
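Throughput figures like the ones above depend heavily on the CPU and the implementation, so it is worth measuring on the actual target. Here is a rough sketch using Python's `hashlib`; the function name, buffer sizes, and the idea of hashing zero-filled buffers are arbitrary choices for this example.

```python
import hashlib
import time

def throughput_mb_per_s(name: str, total_bytes: int = 8 << 20, chunk: int = 64 << 10) -> float:
    """Hash `total_bytes` of zero bytes and return the rate in MB/s."""
    h = hashlib.new(name)
    buf = bytes(chunk)
    start = time.perf_counter()
    for _ in range(total_bytes // chunk):
        h.update(buf)
    h.digest()
    elapsed = time.perf_counter() - start
    return total_bytes / elapsed / 1e6

for name in ("sha1", "sha256", "sha512"):
    print(name, round(throughput_mb_per_s(name), 1), "MB/s")
```

On 64-bit desktops SHA-512 often comes out ahead of SHA-256, while on small 32-bit parts the ordering tends to reverse, which matches the router example above.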
In terms of cryptographic hashes and the choice to use a salt or not, you may be interested in reading Don't Hash Secrets. Even with SHA512, using a salt is a good idea (and it's cheap to do, too, so why not do it?), because you can guess the top passwords and see if they have the same hash, but the author points out that HMAC is a more secure mechanism. In any case, you will have to determine the costs associated with the extra time+space and the costs associated with the possibility of a breach, and determine how paranoid you want to be. As was recently discovered by Microsoft, constantly changing passwords is a waste of money and doesn't pay off, so while paranoia is usually good when it comes to security, you really have to do the math to determine if it makes sense.... do the gains in security outweigh time and storage costs?
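As a minimal sketch of the HMAC approach mentioned above, using Python's standard `hmac` and `hashlib` modules (the key and message are placeholders, and the helper names are made up for this example):

```python
import hashlib
import hmac

def make_tag(key: bytes, message: bytes) -> str:
    """Compute an HMAC-SHA512 tag instead of hashing key + message directly."""
    return hmac.new(key, message, hashlib.sha512).hexdigest()

def verify_tag(key: bytes, message: bytes, tag: str) -> bool:
    """Constant-time comparison avoids leaking how many characters matched."""
    return hmac.compare_digest(make_tag(key, message), tag)

key = b"server-side secret"   # hypothetical key
tag = make_tag(key, b"value to protect")
print(tag)
```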
If you need something to be hashed quickly, or only need a 160 bit hash, you'd use SHA-1.
For comparing database entries to one another quickly, you might take 100 fields and make a SHA-1 hash from them, yielding 160 bits. Those 160 bits give 2^160, or roughly 10^48, possible values.
If I'm unlikely to ever have more than a tiny fraction of 10^48 values, it's quicker to just hash what I have with the simpler and faster algorithm.
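Here is a sketch of that row-fingerprint idea, assuming each field can be rendered as a string. The length prefix is an addition of this example, so that ("ab", "c") and ("a", "bc") do not hash to the same value.

```python
import hashlib

def row_fingerprint(fields) -> str:
    """Return a 160-bit SHA-1 fingerprint (40 hex characters) of a row."""
    h = hashlib.sha1()
    for field in fields:
        data = str(field).encode("utf-8")
        h.update(len(data).to_bytes(4, "big"))  # length prefix keeps field boundaries unambiguous
        h.update(data)
    return h.hexdigest()

row_a = ["alice", 42, "2015-10-01"]   # hypothetical rows
row_b = ["alice", 42, "2015-10-02"]
print(row_fingerprint(row_a) == row_fingerprint(row_b))

# Size of the 160-bit output space, for comparison with the row count.
print(f"{2**160:.2e}")  # 1.46e+48
```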