Blowfish in the Merkle–Damgård construction? - hash

Is there a library that uses Blowfish in the Merkle–Damgård construction, for the purpose of constructing a cryptographic hash? I'm not interested in password hashing, but a general purpose cryptographic hash. (In an application where we're already using Blowfish.)

Rolling your own crypto is a VERY VERY BAD idea. Read it, repeat it loudly, do it again.
Especially for hash functions. Cryptographers around the world are currently in the process of designing a new hash function, through the SHA-3 competition. It began in 2007, it will supposedly end in 2012, and several dozen (more probably hundreds of) smart people who specialize in cryptographic design (read: PhD and more) are working hard at it. Assuming that you can, by yourself in a few weeks, do better than all those people in five years verges on the preposterous. It turns out that building a secure hash function is a difficult problem (from a theoretical point of view, we do not even know whether a secure hash function can really exist). Building a secure block cipher is considerably easier.
The designer of Blowfish (Bruce Schneier) is one of the designers of Skein, one of the candidates for SHA-3. Note that he did not reuse Blowfish for that. Note that he also published in 1998 the Twofish block cipher, a candidate in the AES selection process, as a much more advanced successor to Blowfish; Twofish has been scrutinized much more than Blowfish, so even for symmetric encryption you should not use Blowfish but Twofish (or, better yet, use the AES, aka "Rijndael", which was preferred over Twofish).

Blowfish is problematic to use as a hash compression function, for a couple of reasons:
Firstly, many of the unbroken schemes for turning block ciphers into hash functions produce a hash that is the same length as the block cipher's block length. For Blowfish, with only a 64-bit block size, this is insufficient: by the birthday bound, a 64-bit hash provides only about 32 bits of collision resistance, which is trivially defeated.
Secondly, all of the secure schemes change the block cipher key on every block of the input message. Blowfish has a notoriously slow key setup procedure, so a hash based on it will necessarily be slow too.
If you remain undeterred, look up double-block-length hash constructions such as Tandem Davies-Meyer and Abreast Davies-Meyer. However, I would strongly suggest that you use an implementation of a function from the SHA-2 family instead - these are also easy to find, are fast and are considered secure. You will not gain anything by re-using Blowfish as your hash function.
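To make the first point concrete, here is a minimal sketch of a birthday search against a deliberately short hash. It truncates SHA-256 from Python's hashlib to a chosen number of bits; the function names and the counter-based messages are assumptions made for the illustration, not part of any library.

```python
import hashlib

def truncated_hash(message: bytes, bits: int) -> int:
    """Return the top `bits` bits of SHA-256(message) as an integer."""
    digest = hashlib.sha256(message).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

def find_collision(bits: int):
    """Hash successive counter values until two inputs share a truncated hash."""
    seen = {}  # truncated hash -> message that produced it
    counter = 0
    while True:
        message = str(counter).encode()
        h = truncated_hash(message, bits)
        if h in seen:
            # Two different messages, same truncated hash
            return seen[h], message, counter + 1
        seen[h] = message
        counter += 1

first, second, tries = find_collision(24)
print(f"{first!r} and {second!r} collide after {tries} hashes")
```

For an n-bit hash the search is expected to succeed after roughly 2^(n/2) attempts, so the 24-bit demo finishes after a few thousand hashes, and a 64-bit hash would fall after a few billion.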

Related

What are the downsides of using my own hashing algorithm instead of popular ones available?

I am a noob in algorithms and not really so smart. But I have a question in my mind. There are a lot of hashing algorithms available and those might be 10 times more complex than what I wrote, but almost all of them are predictable these days. Recently, I read that writing my own hashing function is not a good idea. But why? I was wondering how a program/programmer can break my logic that (for example) creates a unique hash for each string in 5+ steps. Suppose someone successfully injected a SQL query in my server and got all the hashes stored. How a program (like hashcat) may help him to decrypt those hashes? I can see a strong side of my own algorithm in this case, that it is known by no one and the hacker has no idea how it was implemented. On the other hand, well-known algorithms (like sha-1) are not unpredictable anymore. There are websites available that are highly eligible to efficiently break those hashes. So, my simple question is, why smart people do not recommend to use self-written hashing algorithms?
Security by obscurity can be an advantage, but you should never rely on it. You rely on the fact that your code stays secret; as soon as it becomes known (shared hosting, backups, source control, ...) the stored passwords are probably not safe anymore.
Inventing a new safe algorithm is extremely difficult, even for cryptographers. There are many points to consider, like correct salting or key-stretching, making sure that similar outputs do not allow conclusions to be drawn about the similarity of the inputs, and so on... Only algorithms that have withstood years of attacks by other cryptographers are regarded as safe.
There is a better alternative to inventing your own scheme. By inventing an algorithm you actually add a secret to the hashing (your code); only with knowledge of this code can an attacker start brute-forcing the passwords. A better way to add a secret is:
Hash the passwords with a known proven algorithm (BCrypt, SCrypt, PBKDF2).
Encrypt the resulting hash with a secret server-side key (two-way encryption).
This way you can also add a secret (the server-side key). Only if the attacker has privileges on the server can he know the key, and in this case (s)he would also know your algorithm. This scheme also allows you to exchange the key when necessary; exchanging the hash algorithm would be much more difficult.
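A rough sketch of the two steps, using only Python's standard library. Note the substitution: the standard library has no symmetric cipher, so the second step here is a keyed HMAC with the server-side key rather than the two-way encryption described above. That still adds a server secret, but unlike real encryption it cannot be undone to swap the key later; for that you would use a vetted cipher from a crypto library. The function names and the iteration count are assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; tune it for your hardware

def hash_password(password, server_key, salt=None):
    """Step 1: slow salted hash (PBKDF2). Step 2: bind it to a server secret."""
    salt = salt or os.urandom(16)
    slow = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    # Stand-in for the encryption step: HMAC keyed with the server-side secret
    protected = hmac.new(server_key, slow, hashlib.sha256).digest()
    return salt, protected

def verify_password(password, server_key, salt, stored):
    """Recompute with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, server_key, salt)
    return hmac.compare_digest(candidate, stored)

server_key = os.urandom(32)  # in practice loaded from server configuration, not the database
salt, stored = hash_password("correct horse", server_key)
print(verify_password("correct horse", server_key, salt, stored))
```

Only `salt` and `stored` go into the database; a dump of the table without the server key gives the attacker nothing to brute-force against.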

Which SHA-2 function will Facebook use?

I read that Facebook on the 1st Oct 2015 will move from SHA-1 to SHA-2 and we have to update our applications: https://developers.facebook.com/blog/post/2015/06/02/SHA-2-Updates-Needed/
Do you know which function of SHA-2 will it use?
I read there are several (224, 256, 384 or 512) and one of these (SHA-224) doesn't work with the Windows XP SP3 which I use (http://blogs.msdn.com/b/alejacma/archive/2009/01/23/sha-2-support-on-windows-xp.aspx)
You don't have to care that much because usage of the SHA-224 is quite limited.
In this question CBroe has pointed out an important remark:
That blog post is about SSL connections when your app is making API
requests. This is not about anything you do with data within your app,
it is about the transport layer.
According to the https://crypto.stackexchange.com/questions/15151/sha-224-purpose
Answer by Ilmari Karonen:
Honestly, in practice, there are very few if any reasons to use
SHA-224.
As fgrieu notes, SHA-224 is simply SHA-256 with a different IV and
with 32 of the output bits thrown away. For most purposes, if you want
a hash with more than 128 but less than 256 bits, simply using SHA-256
and truncating the output yourself to the desired bit length is
simpler and just as efficient as using SHA-224. As you observe,
SHA-256 is also more likely to be available on different platforms
than SHA-224, making it the better choice for portability.
Why would you ever want to use SHA-224, then?
The obvious use case is if you need to implement an existing protocol
that specifies the use of SHA-224 hashes. While, for the reasons
described above, it's not a very common choice, I'm sure such
protocols do exist.
Also, a minor advantage of SHA-224 over truncated SHA-256 is that, due
to the different IV, knowing the SHA-224 hash of a given message does
not reveal anything useful about its SHA-256 hash, or vice versa. This
is really more of an "idiot-proofing" feature; since the two hashes
have different names, careless users might assume that their outputs
have nothing in common, so NIST changed the IV to ensure that this is
indeed the case.
However, this isn't really something you should generally rely on. If
you really need to compute multiple unrelated hashes of the same input
string, what you probably want instead is a keyed PRF like HMAC, which
can be instantiated using any common hash function (such as SHA-256).
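The two points in the quoted answer can be checked with Python's hashlib and hmac modules. This is only an illustrative sketch; the message and the label strings are made up for the example.

```python
import hashlib
import hmac

message = b"example input"

sha224 = hashlib.sha224(message).digest()
sha256 = hashlib.sha256(message).digest()
truncated = sha256[:28]  # keep 224 of the 256 output bits

print(len(sha224), len(truncated))  # both are 28 bytes long
print(sha224 == truncated)          # False: the different IV gives an unrelated value

def labelled_hash(label: bytes, data: bytes) -> bytes:
    """Keyed PRF (HMAC-SHA-256): each label yields an unrelated hash of the same data."""
    return hmac.new(label, data, hashlib.sha256).digest()

index_hash = labelled_hash(b"index-v1", message)   # hypothetical labels
dedup_hash = labelled_hash(b"dedup-v1", message)
print(index_hash != dedup_hash)
```

So where SHA-224 is unavailable, truncating SHA-256 gives a 224-bit hash of the same strength, just not the same value that a SHA-224 implementation would produce.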
As you've mentioned, Windows XP with SP3 doesn't support SHA-224, but it does support SHA-256.
Check also: https://security.stackexchange.com/questions/1751/what-are-the-realistic-and-most-secure-crypto-for-symmetric-asymmetric-hash
Especially: https://stackoverflow.com/a/817121/3964066
And: https://security.stackexchange.com/a/1755
Part of Thomas Pornin's answer:
ECDSA over a 256-bit curve already achieves an "unbreakable" level of
security (i.e. roughly the same level than AES with a 128-bit key, or
SHA-256 against collisions). Note that there are elliptic curves on
prime fields, and curves on binary fields; which kind is most
efficient depends on the involved hardware (for curves of similar
size, a PC will prefer the curves on a prime field, but dedicated
hardware will be easier to build with binary fields; the CLMUL
instructions on the newer Intel and AMD processors may change that).
SHA-512 uses 64-bit operations. This is fast on a PC, not so fast on a
smartcard. SHA-256 is often a better deal on small hardware (including
32-bit architectures such as home routers or smartphones)

is perfect hashing without buckets possible?

I've been asked to look for a perfect hash/one way function to be able to hash 10^11 numbers.
However, as we'll be using an embedded device, it won't have the memory to store the relevant buckets, so I was wondering if it's possible to have a decent (minimal) perfect hash without them?
The plan is to use the device to hash the number(s) and we use a rainbow table or a file using the hash as the offset.
Cheers
Edit:
I'll try to provide some more info :)
1) 10^11 is actually now 10^10, so that makes it easier. This number is the count of possible combinations, so we could get a number between 0000000001 and 10000000000 (10^10).
2) The plan is to use it as part of a one-way function to make the number secure so we can send it by insecure means.
We will then look up the original number at the other end using a rainbow table.
The problem is that the source devices generally have 512k-4Meg of memory to use.
3) It must be perfect - we 100% cannot have a collision.
Edit2:
4) We can't use encryption as we've been told it's not really possible on the devices, and key management would be a nightmare if we could.
Edit3:
As this is not sensible, it's a purely academic question now (I promise)
Okay, since you've clarified what you're trying to do, I rewrote my answer.
To summarize: Use a real encryption algorithm.
First, let me go over why your hashing system is a bad idea.
What is your hashing system, anyway?
As I understand it, your proposed system is something like this:
Your embedded system (which I will call C) is sending some sort of data with a value space of 10^11. This data needs to be kept confidential in transit to some sort of server (which I will call S).
Your proposal is to send the value hash(salt + data) to S. S will then use a rainbow table to reverse this hash and recover the data. salt is a shared value known to both C and S.
This is an encryption algorithm
An encryption algorithm, when you boil it down, is any algorithm that gives you confidentiality. Since your goal is confidentiality, any algorithm that satisfies your goals is an encryption algorithm, including this one.
This is a very poor encryption algorithm
First, there is an unavoidable chance of collision. Moreover, the set of colliding values differs each day.
Second, decryption is extremely CPU- and memory-intensive even for the legitimate server S. Changing the salt is even more expensive.
Third, although your stated goal is avoiding key management, your salt is a key! You haven't solved key management at all; anyone with the salt will be able to crack the message just as well as you can.
Fourth, it's only usable from C to S. Your embedded system C will not have enough computational resources to reverse hashes, and can only send data.
This isn't any faster than a real encryption algorithm on the embedded device
Most secure hashing algorithms are just as computationally expensive as a reasonable block cipher, if not worse. For example, SHA-1 requires doing the following for each 512-bit block:
Allocate 12 32-bit variables.
Allocate 80 32-bit words for the expanded message.
64 times: Perform four array lookups, three 32-bit xors, and a rotate operation.
80 times: Perform up to five 32-bit binary operations (some combination of xor, and, or, and not, depending on the round); then a rotate, array lookup, four adds, another rotate, and several memory loads/stores.
Perform five 32-bit twos-complement adds.
There is one chunk per 512 bits of the message, plus a possible extra chunk at the end. This is 1136 binary operations per chunk (64 * 4 for the message expansion plus 80 * 11 for the rounds, not counting memory operations), or about 18 operations per byte.
For comparison, the RC4 encryption algorithm requires four operations (three additions, plus an xor on the message) per byte, plus two array reads and two array writes. It also requires only 258 bytes of working memory, vs a peak of 368 bytes for SHA-1.
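To show where those per-byte operations come from, here is a plain Python sketch of RC4. It is written to make the operation count visible, not as a vetted implementation; the `drop` parameter is an assumption added so that the initial keystream bytes can be discarded.

```python
def rc4_keystream(key: bytes, drop: int = 0):
    """Yield RC4 keystream bytes, discarding the first `drop` bytes."""
    s = list(range(256))
    j = 0
    for i in range(256):  # key scheduling: mix the key into the 256-byte state
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    i = j = 0
    produced = 0
    while True:
        i = (i + 1) % 256              # addition 1
        j = (j + s[i]) % 256           # addition 2
        s[i], s[j] = s[j], s[i]        # swap: two array reads, two array writes
        k = s[(s[i] + s[j]) % 256]     # addition 3, then the output lookup
        if produced >= drop:
            yield k
        produced += 1

def rc4_crypt(key: bytes, data: bytes, drop: int = 0) -> bytes:
    """Encrypt or decrypt (same operation): one xor per message byte."""
    return bytes(b ^ k for b, k in zip(data, rc4_keystream(key, drop)))

ciphertext = rc4_crypt(b"Key", b"Plaintext")
print(ciphertext.hex())
```

The whole working state is the 256-byte array plus the two indices `i` and `j`, which is where the 258-byte figure comes from.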
Key management is fundamental
With any confidentiality system, you must have some sort of secret. If you have no secrets, then anyone else can implement the same decoding algorithm, and your data is exposed to the world.
So, you have two choices as to where to put the secrecy. One option is to make the encipherment/decipherment algorithms secret. However, if the code (or binaries) for the algorithm is ever leaked, you lose - it's quite hard to replace such an algorithm.
Thus, secrets are generally made easy to replace - this is what we call a key.
Your proposed usage of hash algorithms would require a salt - this is the only secret in the system and is therefore a key. Whether you like it or not, you will have to manage this key carefully. And it's a lot harder to replace if leaked than other keys - you have to spend many CPU-hours generating a new rainbow table every time it's changed!
What should you do?
Use a real encryption algorithm, and spend some time actually thinking about key management. These issues have been solved before.
First, use a real encryption algorithm. AES has been designed for high performance and low RAM requirements. You could also use a stream cipher like RC4 as I mentioned before - the thing to watch out for with RC4, however, is that you must discard the first 4 kilobytes or so of output from the cipher, or you will be vulnerable to the same attacks that plagued WEP.
Second, think about key management. One option is to simply burn a key into each client, and physically go out and replace it if the client is compromised. This is reasonable if you have easy physical access to all of the clients.
Otherwise, if you don't care about man-in-the-middle attacks, you can simply use Diffie-Hellman key exchange to negotiate a shared key between S and C. If you are concerned about MitMs, then you'll need to start looking at ECDSA or something to authenticate the key obtained from the D-H exchange - beware that when you start going down that road, it's easy to get things wrong, however. I would recommend implementing TLS at that point. It's not beyond the capabilities of an embedded system - indeed, there are a number of embedded commercial (and open source) libraries available already. If you don't implement TLS, then at least have a professional cryptographer look over your algorithm before implementing it.
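For a feel of what the unauthenticated Diffie-Hellman exchange between C and S computes, here is a toy sketch. The tiny prime, the generator and the fixed private values are assumptions chosen so the arithmetic can be followed by hand; a real deployment would use a standardized large group or an elliptic curve, with random private values, from a vetted library.

```python
# Toy public parameters (assumption, far too small for real use)
P = 23  # prime modulus
G = 5   # generator

def public_value(private: int) -> int:
    """Value sent over the insecure channel."""
    return pow(G, private, P)

def shared_secret(their_public: int, my_private: int) -> int:
    """Value each side computes locally; it is never transmitted."""
    return pow(their_public, my_private, P)

client_private = 4  # chosen at random and kept secret in practice
server_private = 3

client_public = public_value(client_private)  # C -> S
server_public = public_value(server_private)  # S -> C

client_key = shared_secret(server_public, client_private)
server_key = shared_secret(client_public, server_private)
print(client_key == server_key)  # True: both sides derive the same key
```

Nothing in this exchange tells C that `server_public` really came from S, which is exactly the man-in-the-middle gap that signatures or TLS close.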
There is obviously no such thing as a "perfect" hash unless you have at least as many hash buckets as inputs; if you don't, then inevitably it will be possible for two of your inputs to share the same hash bucket.
However, it's unlikely you'll be storing all the numbers between 0 and 10^11. So what's the pattern? If there's a pattern, there may be a perfect hash function for your actual data set.
It's really not that important to find a "perfect" hash function anyway, though. Hash tables are very fast. A function with a very low collision rate - and when hashing integers, that means nearly any simple function, like modulus - is fine and you'll get O(1) average performance.

Why use SHA1 for hashing secrets when SHA-512 is more secure?

I don't mean for this to be a debate, but I'm trying to understand the technical rationale behind why so many apps use SHA1 for hashing secrets, when SHA512 is more secure. Perhaps it's simply for backwards compatibility.
Besides the obvious larger size (128 chars vs 40), or slight speed differences, is there any other reason why folks use the former?
Also, SHA-1 I believe was first cracked by a VCR's processor years ago. Has anyone cracked 512 yet (perhaps with a leaf blower), or is it still safe to use without salting?
Most uses of SHA-1 are for interoperability: we use SHA-1 when we implement protocols where SHA-1 is mandated. Ease of development also comes into account: SHA-1 implementations in various languages and programming environments are more common than SHA-512 implementations.
Also, even though most usages of hash functions do not have performance issues (at least, no performance issue where the hash function is the bottleneck), there are some architectures where SHA-1 is vastly more efficient than SHA-512. Consider a basic Linksys router: it uses a MIPS-derivative CPU, clocked at 200 MHz. Such a machine can be reprogrammed, e.g. with OpenWRT (a small Linux for embedded systems). As a router, it has a fast network (100 Mbit/s, i.e. about 12.5 MB/s). Suppose that you want to hash some data (e.g. as part of some VPN software -- a router looks like a good candidate for running a VPN). With SHA-1, you will get about 6 MB/s, using the full CPU. That's already considerably lower than the network bandwidth. SHA-512 will give you no more than 1.5 MB/s on the same machine. On such a system, the difference in performance is not negligible. Also, if I use SHA-1 on my Linksys router for some communication protocol, then the machine at the other end of the link will also have to use SHA-1.
The good news is that there is an ongoing competition to select a new standard hash function, code-named SHA-3. Some of the competing candidates provide performance similar to SHA-1, or even somewhat better, while still yielding a 512-bit output and being (probably) as secure as SHA-512.
Both SHA1 and SHA512 are hash functions. If you are using them as a cryptographic hash, then perhaps that is good reason to use SHA512; however, there are applications that use these functions simply to identify objects. For example, Git uses SHA1 to cheaply distinguish between objects. In that case, because the possibility of collision between two documents is incredibly small with SHA1, there really is no justification for the additional space requirement of SHA512 when SHA1 is more than suitable for the task.
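If you want to see how the two functions compare on your own hardware, a rough measurement with Python's hashlib could look like the sketch below. The helper name, buffer size and round count are arbitrary choices for the illustration, and the numbers it prints depend heavily on the CPU and on the library build.

```python
import hashlib
import time

def throughput_mb_per_s(algorithm: str, size: int = 1 << 20, rounds: int = 20) -> float:
    """Hash `rounds` buffers of `size` zero bytes and return megabytes per second."""
    data = bytes(size)
    start = time.perf_counter()
    for _ in range(rounds):
        hashlib.new(algorithm, data).digest()
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
    return (size * rounds) / elapsed / 1e6

for name in ("sha1", "sha512"):
    print(f"{name}: {throughput_mb_per_s(name):.1f} MB/s")
```

On a 64-bit desktop the gap may look very different from the router figures above, since SHA-512 relies on 64-bit arithmetic that a small 32-bit CPU has to emulate.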
In terms of cryptographic hashes and the choice to use a salt or not, you may be interested in reading Don't Hash Secrets. Even with SHA512, using a salt is a good idea (and it's cheap to do, too, so why not do it?), because you can guess the top passwords and see if they have the same hash, but the author points out that HMAC is a more secure mechanism. In any case, you will have to determine the costs associated with the extra time+space and the costs associated with the possibility of a breach, and determine how paranoid you want to be. As was recently discovered by Microsoft, constantly changing passwords is a waste of money and doesn't pay off, so while paranoia is usually good when it comes to security, you really have to do the math to determine if it makes sense.... do the gains in security outweigh time and storage costs?
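As a small illustration of the HMAC suggestion, this sketch uses Python's hmac module with SHA-512 instead of hashing a secret concatenated with the data. The key and message values are placeholders.

```python
import hashlib
import hmac

def sign(key: bytes, message: bytes) -> bytes:
    """Authentication tag for `message`, computed as HMAC-SHA-512 under `key`."""
    return hmac.new(key, message, hashlib.sha512).digest()

def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Constant-time comparison avoids leaking how many bytes matched."""
    return hmac.compare_digest(sign(key, message), tag)

key = b"placeholder-secret-key"
tag = sign(key, b"amount=100&to=alice")

print(verify(key, b"amount=100&to=alice", tag))   # True
print(verify(key, b"amount=900&to=alice", tag))   # False
```

HMAC applies the hash twice with the key mixed in each time, which sidesteps the length-extension problem that a bare hash of the secret followed by the data has with Merkle-Damgård hashes such as SHA-1 and SHA-512.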
If you need something to be hashed quickly, or only need a 160 bit hash, you'd use SHA-1.
For comparing database entries to one another quickly, you might take 100 fields and make a SHA-1 hash from them, yielding 160 bits. Those 160 bits are 10^48ish values.
If I'm unlikely to ever have more than a tiny fraction of 10^48 values, it's quicker to just hash what I have with the simpler and faster algorithm.
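A sketch of that kind of record fingerprint in Python. The function name and the length-prefix encoding are assumptions; the prefix is there so that field boundaries cannot shift without changing the hash.

```python
import hashlib

def record_fingerprint(fields) -> bytes:
    """160-bit SHA-1 fingerprint of a sequence of field values."""
    h = hashlib.sha1()
    for field in fields:
        encoded = str(field).encode("utf-8")
        # Length prefix keeps ("ab", "c") distinct from ("a", "bc")
        h.update(len(encoded).to_bytes(4, "big"))
        h.update(encoded)
    return h.digest()

row_a = ["alice", 42, "2015-10-01"]
row_b = ["alice", 42, "2015-10-01"]
row_c = ["alice", 43, "2015-10-01"]

print(record_fingerprint(row_a) == record_fingerprint(row_b))  # True
print(record_fingerprint(row_a) == record_fingerprint(row_c))  # False
print(len(str(2 ** 160)))  # 49 digits: 2^160 is roughly 1.46 * 10^48
```

Comparing two 20-byte fingerprints is much cheaper than comparing 100 fields, and accidental collisions are negligible at realistic table sizes. This is about accidental equality only, not about resisting someone who crafts colliding records on purpose.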

Is Md5 Encryption Symmetric or Asymmetric?

For my iPhone application, Apple wants to know if my password encryption (md5) is greater than 64-bit symmetric or greater than 1024-bit asymmetric. I have not been able to find it online, so I am wondering if anyone knows the answer. In addition, is this considered an appropriate encryption technology for passwords, or should I use something different?
Thanks for any help!
MD5 is a hashing function, thus by definition it is not reversible. This is not the case for encryption (either symmetric or asymmetric), which has to be reversible to be useful.
To be more precise, hashes are one-way functions, in that an infinite number of inputs can map to a single output, thus it is impossible to obtain the exact input, with certainty, that resulted in a given output.
However, it may be possible to find a different input that hashes to the same output. This is called a collision.
Generally, hashing passwords instead of storing the plain text (even encrypted) is a good idea (even better if using a salt). However, MD5 has known weaknesses (and there are large collections of rainbow tables that aid in recovering the inputs of unsalted hashes), thus it would be a good idea to switch to something like SHA-1 or one of the SHA-2 family of hashes.
However, to answer your original question, there really is no way to compare MD5 or any hash against any type of encryption; they have no equivalents because it's like comparing apples and oranges.
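A quick way to see why a hash cannot be measured like a cipher, sketched with Python's hashlib. The helper name and sample inputs are arbitrary.

```python
import hashlib

def md5_bits(data: bytes) -> int:
    """Size of the MD5 digest of `data`, in bits."""
    return len(hashlib.md5(data).digest()) * 8

for text in (b"", b"password", b"x" * 10_000):
    print(len(text), "bytes in ->", md5_bits(text), "bits out")

# No key is involved, and inputs of any length map to the same 128-bit output size
print(hashlib.md5(b"").hexdigest())  # d41d8cd98f00b204e9800998ecf8427e
```

The output is always 128 bits and no key is supplied anywhere, so there is no key length to report and nothing to decrypt.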
md5 isn't really symmetric or asymmetric encryption because it isn't reversible either symmetrically or asymmetrically. It's a Message Digest (secure hash) algorithm.
It's not encryption, it's a digest. If you didn't salt it, it's not particularly secure, but they're asking you the wrong question.
What exactly are you doing with MD5 and passwords? There are standard ways of doing things here, and it's always better to use one, but without knowing what you want to do it's hard to point you at a relevant standard.
It is NOT encryption at all.
Apple asks the question about the use of MD5 for hashing passwords to see if it requires authorization for export from the Department of Commerce/Bureau of Industry and Security.
The answer for that purpose is that using MD5 for password protection is not controlled as strong encryption (like symmetric algorithms in excess of 64 bits) in accord with the Technical Note to 15 CFR part 774, Supplement 1, ECCN 5A002, paragraph a.1, which describes using encryption for password protection. However, it may still be controlled under ECCN 5A992.
http://www.bis.doc.gov/encryption/ccl5pt2.pdf
The other answers are not helpful in the context of why the question was asked.
Also, you may want to call the Department of Commerce/Bureau of Industry and Security at 202-482-0707 and ask about your specific application.
A hash function is, most of the time, a way to compress your data into a short digest. These are one-way functions, meaning that they are difficult to reverse (given the digest of a message, it is difficult to find the original message that produced that specific hash value). On the other hand, they are very easy to implement because there is no need for any type of key.
It is not a symmetric or asymmetric algorithm. Those kinds of algorithms are used to encrypt data, not to hash it. Encryption is used for confidentiality, to protect data from attackers who try to read someone else's messages.
Encryption (cipher) algorithms need keys to perform their tasks, in contrast to hashes, which do not need any kind of key. Hashes are used not for confidentiality but for integrity, even if some of them are not strong enough. MD5 is one hash function among many; the others exist because MD5 is not strong enough.
I think MD5 is used for better security. Encryption and decryption algorithms just convert plain text into cipher text and back, whereas MD5 provides a unique fingerprint of the plain text sent by a source (Alice). So we can say that, for better security, or to provide an envelope around the plain text, MD5 should be applied before using any encryption algorithm (symmetric or asymmetric).
As the numerous other guys on here have mentioned, MD5 is not a symmetric or an asymmetric algorithm.
Instead it comes under a different branch of cryptography altogether. It's one of the smallest hashing algorithms available in the .NET Framework, with a digest of a mere 16 bytes, i.e. 128 bits (a hash has no key, so this is an output size rather than a key size). Something that you learn your bread and butter with.
So yes, it is greater than 64 bits, which is only 8 bytes in size.
The maximum key size the common symmetric encryption algorithms use is 256 bits (RijndaelManaged).
If you want to be looking at key sizes greater than that, then you can use the RC2 symmetric encryption algorithm, which supports variable key sizes. Something that you can experiment with?
If you want higher than 1024 bits, then you need to be looking at asymmetric encryption algorithms like the RSACryptoServiceProvider class, which supports key sizes going up to 16K bits, I think?
If you want to use passwords, then you need to use keyed hashing algorithms, like anything HMAC-something or MACTripleDES. These all use secret keys to protect the hash that is generated from the data you supply. The keys are created by using passwords and salt values via the Rfc2898DeriveBytes class. <-- Don't forget that RC2, Rijndael, AES, DES, etc. can all be set up to use passwords to help derive the secret keys, in case you are thinking that the opening sentence of this paragraph is a little misleading. I added this just to be sure, in the event that hashing is not what you need altogether.
*REMEMBER THAT THERE ARE UNIQUE INHERITANCE HIERARCHIES IN .NET's Cryptography namespace.
So MD5 is the abstract base class that all MD5 classes derive from. .NET provides one such derived class, called MD5CryptoServiceProvider, which is essentially a managed wrapper that calls the unmanaged Windows crypto API. MD5 is known in official MS textbooks under the umbrella term of non-keyed hashing algorithm. *
There are plenty of options available to you.
: ) Enjoy !