Can we reverse second sha256 hash?

Can I reverse a SHA-256 hash, i.e. get from the 2nd hash below back to the 1st hash?
ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb
da3811154d59c4267077ddd8bb768fa9b06399c486e1fc00485116b57c9872f5
The 2nd hash was generated by sha256(1), i.e. by hashing the 1st hash, so is it possible to reverse it back to the 1st hash?

In short, as of 2019, NO.
Cryptographic hash functions are, in short, one-way functions that are deterministic yet random-looking. Deterministic means the same input always produces the same output; random means the output is unpredictable.
In cryptography, we judge the security of a hash function by three properties:
Preimage-Resistance: for essentially all pre-specified outputs, it is computationally infeasible to find any input which hashes to that output, i.e., to find any preimage x' such that h(x') = y when given any y for which a corresponding input is not known.
2nd-preimage resistance, weak-collision: it is computationally infeasible to find any second input which has the same output as any specified input, i.e., given x, to find a 2nd-preimage x' != x such that h(x) = h(x').
Collision resistance: it is computationally infeasible to find any two distinct inputs x, x' which hash to the same output, i.e., such that h(x) = h(x').
What you are looking for is a preimage. Collisions have been found for some cryptographic hash functions, such as MD4 and SHA-1, but no practical preimage or second-preimage attacks are known against them.
For SHA-256 there are no known preimage, second-preimage, or collision attacks. It is considered a secure hash function.
You may find rainbow tables for SHA-256 that happen to include your hash values, but probably not, since the input space is too big to cover.

Hashing is meant to be a one way process. If a hashing algorithm were easily reversible, then it would be insecure. To answer your question, no, it's not possible to "unhash" 2 and obtain 1. In order to "crack" the second hash, you would have to brute force it by computing the sha256 of other strings and comparing the result with 2. If they match, then you (probably) have the original string.
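The brute-force approach described above can be sketched as follows. This is only an illustration: the helper names `sha256_hex` and `brute_force` are made up for this example, and the candidate set (single lowercase letters) is deliberately tiny. It works here only because the first hash in the question happens to be the SHA-256 of a one-character string; for an unknown input of realistic length the candidate space is far too large to enumerate.

```python
import hashlib
import string

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as lowercase hex."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def brute_force(target_hex: str, candidates):
    """Hash every candidate and return the first one matching target_hex."""
    for candidate in candidates:
        if sha256_hex(candidate) == target_hex:
            return candidate
    return None  # no candidate produced the target hash

# First hash from the question.
first_hash = "ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb"

# Try every single lowercase letter as a guess.
recovered = brute_force(first_hash, string.ascii_lowercase)
print(recovered)
```

The same loop is all an attacker can do against a preimage-resistant hash; the only thing that changes is how many candidates have to be tried.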

SHA-256 is a cryptographic hash function. As defined on Wikipedia (https://en.wikipedia.org/wiki/Cryptographic_hash_function):
The ideal cryptographic hash function has five main properties:
it is deterministic so the same message always results in the same hash
it is quick to compute the hash value for any given message
it is infeasible to generate a message from its hash value except by trying all possible messages
a small change to a message should change the hash value so extensively that the new hash value appears uncorrelated with the old hash value
it is infeasible to find two different messages with the same hash value
By definition, a cryptographic hash function is only useful as long as you cannot get from the output back to the input.
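To make the "small change, uncorrelated hash" property from the list above concrete, here is a minimal sketch that counts how many of the 256 output bits differ when one character of the message changes. The function name `bit_difference` and the sample messages are assumptions chosen for illustration.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bits that differ between SHA-256(a) and SHA-256(b)."""
    digest_a = int.from_bytes(hashlib.sha256(a).digest(), "big")
    digest_b = int.from_bytes(hashlib.sha256(b).digest(), "big")
    # XOR leaves a 1 wherever the two digests disagree.
    return bin(digest_a ^ digest_b).count("1")

# Two messages that differ only in their last character.
changed_bits = bit_difference(b"hello world", b"hello worle")
print(f"{changed_bits} of 256 bits differ")
```

For a well-behaved hash you would expect roughly half of the bits to flip, which is why the second digest tells you nothing useful about the first.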

Related

Why isn't modulus sufficient within a hash function for hash tables?

I often see or hear of modulus being used as a last step of hashing or after hashing. e.g. h(input)%N where h is the hash function and % is the modulus operator. If I am designing a hash table, and want to map a large set of keys to a smaller space of indices for the hash table, doesn't the modulus operator achieve that? Furthermore, if I wanted to randomize the distribution across those locations within the hash table, is the remainder generated by modulus not sufficient? What does the hashing function h provide on top of the modulus operator?
I often see or hear of modulus being used as a last step of hashing or after hashing. e.g. h( input ) % N where h is the hash function and % is the modulus operator.
Indeed.
If I am designing a hash table, and want to map a large set of keys to a smaller space of indices for the hash table, doesn't the modulus operator achieve that?
That's precisely the purpose of the modulo operator: to restrict the range of array indexes, so yes.
But you cannot simply use the modulo operator by itself: the modulo operator requires an integer value: you cannot get the "modulo of a string over N" or "modulo of an object-graph over N"[1].
Furthermore, if I wanted to randomize the distribution across those locations within the hash table, is the remainder generated by modulus not sufficient?
No, it is not, because the modulo operator doesn't give you pseudorandom output, nor does it have any kind of avalanche effect. Similar input values will have similar outputs, which results in clustering in your hashtable bins, which in turn results in subpar performance due to the greatly increased likelihood of hash collisions (requiring slower fallbacks like linear probing, which defeat the purpose of a hashtable because you lose O(1) lookup times).
What does the hashing function h provide on top of the modulus operator?
The domain of h can be anything, especially non-integer values.
[1] Technically speaking, this is possible if you use the value of the memory address of an object (i.e. an object pointer), but that doesn't work if you have hashtable keys that don't use object identity, such as a stack-allocated object or custom struct.
First, the hash function's primary purpose is to turn something that's not a number into a number. Even if you just use modulus after that to get a number in your range, getting the number is still the first step and is the responsibility of the hash function. If you're hashing integers and you just use the integers as their own hashes, it isn't that there's no hash function, it's that you've chosen the identity function as your hash function. If you don't write out the function, that means you inlined it.
Second, the hash function can provide a more unpredictable distribution to reduce the likelihood of unintentional collisions. The data people work with often contain patterns and if you're just using a simple identity function with modulus, the pattern in inputs may be such that the modulus is more likely to cause collisions. The hash function presents an opportunity to break this up so it becomes unlikely that modulus exposes patterns in the original data sequence.
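A small experiment shows the clustering problem. This is a sketch under assumptions: the keys are all multiples of 16, the table has 16 buckets, and SHA-256 stands in for "some mixing hash" only because it ships with the standard library (real hash tables use much cheaper functions). The names `identity_bucket` and `mixed_bucket` are made up for this example.

```python
import hashlib
from collections import Counter

def identity_bucket(key: int, n: int) -> int:
    """Identity hash followed by modulus."""
    return key % n

def mixed_bucket(key: int, n: int) -> int:
    """Mix the key with a hash function first, then apply modulus."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n

n_buckets = 16
keys = [16 * i for i in range(100)]  # patterned input: every key is a multiple of 16

identity_usage = Counter(identity_bucket(k, n_buckets) for k in keys)
mixed_usage = Counter(mixed_bucket(k, n_buckets) for k in keys)

print("buckets used with identity hash:", len(identity_usage))
print("buckets used with mixing hash:  ", len(mixed_usage))
```

With the identity hash every key lands in bucket 0, so the table degenerates into one long collision chain; the mixing step spreads the same keys over many buckets.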

Can we repeatedly hash an input and hash it indefinitely?

Was just wondering that if we are given an input string x and we hash it with function f to get f(x) can we repeat this process indefinitely i.e f(f(x)) and so on. Because most hash functions generate a different fixed output that is not the same as the input.
So by this premise, would we be able to carry this out indefinitely? One possible issue I can think is that it has to be fixed length and usually hashes are shorter than the input?
Please correct me if I am wrong. Would love an explanation!
Yes you absolutely can hash the prior hash output.
When we do this with cryptographic keys it’s called ratcheting.
The output size of the hashing algorithm determines how many outputs you can expect to rehash before one repeats, after which the chain runs in a cycle.
By the birthday bound, for a 256-bit hash function we expect to see a collision with about 50% probability after roughly 2^128 hashing calls.
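Here is a rough sketch of both points. `hash_chain` simply feeds each output back in as the next input. Since 2^128 calls cannot be demonstrated, `steps_until_repeat` uses a toy 16-bit hash (the first two bytes of SHA-256, an assumption made purely to shrink the output space) so that a repeat shows up quickly. All names here are hypothetical.

```python
import hashlib

def hash_chain(seed: bytes, rounds: int) -> bytes:
    """Apply SHA-256 repeatedly: f(f(...f(seed)...))."""
    value = seed
    for _ in range(rounds):
        value = hashlib.sha256(value).digest()
    return value

def tiny_hash(data: bytes) -> bytes:
    """Toy 16-bit hash: SHA-256 truncated to two bytes."""
    return hashlib.sha256(data).digest()[:2]

def steps_until_repeat(seed: bytes) -> int:
    """Count hash calls until the toy chain produces an output it has produced before."""
    seen = set()
    value = tiny_hash(seed)
    steps = 1
    while value not in seen:
        seen.add(value)
        value = tiny_hash(value)
        steps += 1
    return steps

print(hash_chain(b"x", 3).hex())
print(steps_until_repeat(b"x"))
```

With only 65,536 possible outputs the toy chain must repeat within 65,537 calls, and typically does so after a few hundred; the same argument applied to a 256-bit output gives the 2^128 figure.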

Hash UUIDs without requiring ordering

I have two UUIDs. I want to hash them perfectly to produce a single unique value, but with a constraint that f(m,n) and f(n,m) must generate the same hash.
UUIDs are 128-bit values
the hash function should have no collisions - all possible input pairings must generate unique hash values
f(m,n) and f(n,m) must generate the same hash - that is, ordering is not important
I'm working in Go, so the resulting value must fit in a 256-bit int
the hash does not need to be reversible
Can anyone help?
Concatenate them with the smaller one first.
To build on user2357112's brilliant solution and boil down the comment chain, let's consider your requirements one by one (and out of order):
No collisions
Technically, that's not a hash function. A hash function is about mapping heterogeneous, arbitrary-length data inputs into fixed-width, homogeneous outputs. The only way to accomplish that if the input is longer than the output is through some data loss. For most applications, this is tolerable because the hash function is only used as a fast lookup key and the code falls back onto the slower, complete comparison of the data. That's why many guides and languages insist that if you implement one, you must implement the other.
Fortunately, you say:
Two UUID inputs m and n
UUIDs are 128 bits each
Output of f(m,n) must be 256 bits or less
Combined, your two inputs are exactly 256 bits, which means you do not have to lose any data. If you needed a smaller output, then you would be out of luck. As it is, you can concatenate the two numbers together and generate a perfect, unique representation.
f(m,n) and f(n,m) must generate the same hash
To accomplish this final requirement, make a decision on the concatenation order by some intrinsic value of the two UUIDs. The suggested smaller-first works just great. However...
The hash does not need to be reversible
If you specifically need irreversible hashing, that's a different question entirely. You could still use the less-than comparison to ensure order independence when feeding the pair to a cryptographic hash function, but you would be hard pressed to find one that guarantees no collisions, even with fixed-width inputs and a 256-bit output width.
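The smaller-first concatenation can be sketched like this. The question is about Go; this is a Python illustration of the same idea using the standard `uuid` module, and the function name `combine` is made up for the example. The result is an integer below 2^256.

```python
import uuid

def combine(m: uuid.UUID, n: uuid.UUID) -> int:
    """Order-independent, collision-free combination of two UUIDs.

    The smaller UUID goes first, then the 16 + 16 bytes are read
    as one 256-bit big-endian integer.
    """
    low, high = sorted((m, n))  # UUIDs compare by their 128-bit integer value
    return int.from_bytes(low.bytes + high.bytes, "big")

a = uuid.UUID(int=5)
b = uuid.UUID(int=9)
print(combine(a, b) == combine(b, a))
```

Because nothing is discarded, two different unordered pairs can never produce the same value, which is also why the result is trivially reversible.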

Faster way to find the correct order of chunks to get a known SHA1 hash?

Say a known SHA1 hash was calculated by concatenating several chunks of data and that the order in which the chunks were concatenated is unknown. The straightforward way to find the order of the chunks that gives the known hash would be to calculate an SHA1 hash for each possible ordering until the known hash is found.
Is it possible to speed this up by calculating an SHA1 hash separately for each chunk and then find the order of the chunks by only manipulating the hashes?
In short, No.
If you are using SHA-1, then due to the avalanche effect, any tiny change in the plaintext (in your case, your chunks) alters the corresponding SHA-1 hash significantly.
Say you have 4 chunks: A, B, C and D.
The SHA1 hash of A+B+C+D (concatenated) is supposed to be uncorrelated with the SHA1 hashes of A, B, C and D computed separately.
Since they are unrelated, you cannot draw any relationship between the concatenated chunks (A+B+C+D, B+C+A+D, etc.) and each individual chunk (A, B, C or D).
If you could identify any relationship in-between, the SHA1 hashing algorithm would be in trouble.
Practical answer: no. If the hash function you use is any good, then it is supposed to look like a Random Oracle, the output of which on an exact given input being totally unknown until that input is tried. So you cannot infer anything from the hashes you compute until you hit the exact input ordering that you are looking for. (Strictly speaking, there could exist a hash function which has the usual properties of a hash function, namely collision and preimage resistances, without being a random oracle, but departing from the RO model is still considered as a hash function weakness.)(Still strictly speaking, it is slightly improper to talk about a random oracle for a single, unkeyed function.)
Theoretical answer: it depends. Assuming, for simplicity, that you have N chunks of 512 bits, then you can arrange for the cost not to exceed N*2^160 elementary evaluations of SHA-1, which is lower than N! when N >= 42. The idea is that the running state of SHA-1, between two successive blocks, is limited to 160 bits. Of course, that cost is ridiculously infeasible anyway. More generally, your problem is about finding a preimage to SHA-1 with inputs in a custom set S (the N! sequences of your N chunks) so the cost has a lower bound of the size of S and the preimage resistance of SHA-1, whichever is lower. The size of S is N!, which grows very fast when N is increased. SHA-1 has no known weakness with regards to preimages, so its resistance is still assumed to be about 2^160 (since it has a 160-bit output).
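The crossover point between the state-based bound N*2^160 and the N! brute-force orderings can be checked with exact integer arithmetic. This is only a sanity check of the numbers in the paragraph above; the function name and the search limit are arbitrary choices.

```python
import math

STATE_SPACE = 2 ** 160  # number of possible 160-bit SHA-1 running states

def smallest_n_where_state_bound_wins(limit: int = 100):
    """Return the smallest N for which N * 2^160 is below N!."""
    for n in range(1, limit + 1):
        if n * STATE_SPACE < math.factorial(n):
            return n
    return None  # no crossover found within the limit

print(smallest_n_where_state_bound_wins())
```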
Depending on your hashing library, something like this may work: Say you have blocks A, B, C, and D. You can process the hash for block A, and then clone that state and calculate A+B, A+C, and A+D without having to recalculate A each time. And then you can clone each of those to calculate A+B+C and A+B+D from A+B, A+C+B and A+C+D from A+C, and so on.
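In Python, `hashlib` hash objects have a `copy()` method, so the state-cloning idea above can be sketched as a depth-first search that never rehashes a shared prefix. This still visits up to N! orderings; it only saves the repeated work per ordering. The function name `find_order` and the sample chunks are assumptions for illustration.

```python
import hashlib

def find_order(chunks, target_hex):
    """Return the list of chunk indices whose concatenation hashes to target_hex."""

    def search(state, remaining, order):
        if not remaining:
            return order if state.hexdigest() == target_hex else None
        for i in sorted(remaining):
            branch = state.copy()      # clone the hash state for this prefix
            branch.update(chunks[i])   # extend it with one more chunk
            found = search(branch, remaining - {i}, order + [i])
            if found is not None:
                return found
        return None

    return search(hashlib.sha1(), frozenset(range(len(chunks))), [])

chunks = [b"alpha", b"beta", b"gamma", b"delta"]
target = hashlib.sha1(b"gamma" + b"alpha" + b"delta" + b"beta").hexdigest()
print(find_order(chunks, target))
```

The saving matters most when the chunks are large, because hashing a long prefix once and cloning the state is much cheaper than rehashing it for every ordering that shares it.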
Nope. Calculating the complete SHA1 hash requires that the chunks be put in in order. The calculation of the next hash chunk requires the output of the current one. If that wasn't true then it would be much easier to manipulate documents so that you could reorder the chunks at will, which would greatly decrease the usefulness of the algorithm.

What are the important points about cryptographic hash functions?

I was reading this question on MD5 hash values and the accepted answer confuses me. One of the main properties, as I understand it, of a cryptographic hash function is that it is infeasible to find two different messages (inputs) with the same hash value.
Yet the consensus answer to the question "Why aren't MD5 hash values reversible?" is "Because an infinite number of input strings will generate the same output." This seems completely contradictory to me.
Also, what perplexes me somewhat is the fact that the algorithms are public, yet the hash values are still irreversible. Is this because there is always data loss in a hash function so there's no way to tell which data was thrown away?
What happens when the input data size is smaller than the fixed output data size (e.g., hashing a password "abc")?
EDIT:
OK, let me see if I have this straight:
It is really, really hard to infer the input from the hash because there are an infinite number of input strings that will generate the same output (irreversible property).
However, finding even a single instance of multiple input strings that generate the same output is also really, really hard (collision resistant property).
Warning: Long answer
I think all of these answers are missing a very important property of cryptographic hash functions: Not only is it impossible to compute the original message that was hashed to get a given hash, it's impossible to compute any message that would hash to a given hash value. This is called preimage resistance.
(By "impossible" - I mean that no one knows how to do it in less time than it takes to guess every possible message until you guess the one that was hashed into your hash.)
(Despite popular belief in the insecurity of MD5, MD5 is still preimage resistant. Anyone who doesn't believe me is free to give me anything that hashes to 2aaddf751bff2121cc51dc709e866f19. What MD5 doesn't have is collision resistance, which is something else entirely.)
Now, if the only reason you can't "work backwards" in a cryptographic hash function was because the hash function discards data to create the hash, then it would not guarantee preimage resistance: You can still "work backwards", and just insert random data wherever the hash function discards data, and while you wouldn't come up with the original message, you'd still come up with a message that hashes to the desired hash value. But you can't.
So the question becomes: Why not? (Or, in other words, how do you make a function preimage resistant?)
The answer is that cryptographic hash functions simulate chaotic systems. They take your message, break it into blocks, mix those blocks around, have some of the blocks interact with each other, mix those blocks around, and repeat that a lot of times (well, one cryptographic hash function does that; others have their own methods). Since the blocks interact with each other, block C not only has to interact with block D to produce block A, but it has to interact with block E to produce block B. Now, sure, you can find values of blocks C, D, E that would produce the blocks A and B in your hash value, but as you go further back, suddenly you need a block F that interacts with C to make D, and with E to make B, and no such block can do both at the same time! You must have guessed wrong values for C, D, and E.
While not all cryptographic hash functions are exactly as described above with block interaction, they have the same idea: That if you try to "work backwards", you're going to end up with a whole lot of dead ends, and the time it takes for you to try enough values to generate a preimage is on the order of hundreds to millions of years (depending on the hash function), not much better than the time it would take just to try messages until you find one that works.
1: The primary purpose of a hash is to map a very, very large space to a smaller but still very large space (e.g., MD5, which will take 'anything' and convert it into a space of size 2^128 -- big, but not nearly as big as aleph-0.)
In addition to other features, good hashes fill the destination space homogeneously. Bad hashes fill the space in a clumpy fashion, coming up with the same hash for many common inputs.
Imagine the idiotic hash function sum(), which just adds all the digits of the input number: it succeeds in mapping down, but there are a bunch of collisions (inputs with the same output, like 3 and 12 and 21) at the low end of the output space and the upper end of the space is nearly empty. As a result it makes very poor use of the space, is easy to crack, etc.
So a good hash that makes even use of the destination space will make it difficult to find two inputs with the same output, just by the odds: if MD5 were perfect, the odds that two inputs would have the same output would be 2^-128. That's pretty decent odds: the best you can do without resorting to a larger output space. (In truth MD5 isn't perfect, which is one of the things that makes it vulnerable.)
But it will still be true that a huge number of inputs will map to any given hash, because the input space is 'infinite', and dividing infinity by 2^128 still gives you infinity.
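The digit-sum example above is easy to try out. This sketch (the name `digit_sum` and the input range 0 to 999 are choices made for the example) shows both the collisions and how unevenly the outputs are spread.

```python
from collections import Counter

def digit_sum(n: int) -> int:
    """The deliberately bad hash: add up the decimal digits."""
    return sum(int(d) for d in str(n))

# 3, 12 and 21 all collide.
print(digit_sum(3), digit_sum(12), digit_sum(21))

# Hash every number below 1000 and see how the outputs are distributed.
outputs = Counter(digit_sum(n) for n in range(1000))
print("distinct outputs:", len(outputs))
print("most crowded output:", outputs.most_common(1))
```

A thousand inputs collapse into only 28 distinct outputs, and those outputs are far from evenly used, which is exactly the clumpy behaviour a good hash avoids.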
2: Yes, hashes always cause data loss, except in the case where your output space is the same as, or larger than, your input space -- and in that case you probably didn't need to hash!
3: For smaller inputs, best practice is to salt the input. Actually, that's good practice for any cryptographic hashing, because otherwise an attacker can feed you specific inputs and try to figure out which hash you are using. 'Salt' is just a set of additional information that you append (or prepend) to your input; you then hash the result.
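As a minimal sketch of salting, assuming a random 16-byte salt that is prepended to the input: the function name `salted_hash` is hypothetical, and plain SHA-256 is used only to keep the example short. Real password storage normally uses a dedicated key-derivation function such as `hashlib.pbkdf2_hmac` or `hashlib.scrypt` rather than a single hash call.

```python
import hashlib
import os

def salted_hash(password: str, salt: bytes) -> str:
    """Hash salt + password and return the hex digest."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()

salt_a = os.urandom(16)  # the salt is stored alongside the hash, not kept secret
salt_b = os.urandom(16)

# Same short password, different salts: the digests no longer match
# each other or a precomputed table for the bare password.
print(salted_hash("abc", salt_a))
print(salted_hash("abc", salt_b))
```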
edit: In cryptography, it is also important that the hash function is resistant to preimage attacks; intuitively, that it is hard to guess the input for a given output even knowing many other input/output pairs. The "sum" function could probably be guessed rather easily (but since it destroys data it still might not be easy to reverse).
You may be confused, because the answer to the question you cite is confusing.
One of the requirements for a cryptographic hash function is that it should be preimage resistant. That is, if you know MD5(x) but not the message x, then it is difficult to find any x' (either equal to x or different from x) such that MD5(x') = MD5(x).
Being preimage resistant is a different property than being reversible. A function is reversible if given y = f(x) there is exactly one x which fits (whether this is easy or not). For example define f(x) = x mod 10.
Then f is not reversible. From f(x) = 7 you can't determine whether x was 17, 27 or something else. But f is not preimage resistant, since values x' such that f(x') = 7 are easy to find. x' = 17, 27, 12341237 etc. all work.
When doing crypto you usually need functions that are preimage resistant (and other properties such as collision resistance), not just something that is not reversible.
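The f(x) = x mod 10 example can be written out directly. In this sketch the helper `preimages` is a made-up name; it shows that although f cannot be reversed to one specific input, producing inputs that hit a given output takes no search at all.

```python
def f(x: int) -> int:
    """Not reversible, but not preimage resistant either."""
    return x % 10

def preimages(y: int, count: int):
    """Produce `count` inputs that f maps to y, by construction."""
    return [y + 10 * k for k in range(count)]

candidates = preimages(7, 3)
print(candidates, [f(x) for x in candidates])
print(f(12341237))
```

For a cryptographic hash there is no such formula; the only known way to hit a given output is to try inputs one by one.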
These are the properties of hash functions in general.
A word of caution though: MD5 shouldn't be used anymore because of vulnerabilities that have been found in it. Check the 'Vulnerabilities' section of http://en.wikipedia.org/wiki/Md5 and the external links detailing these attacks. Practical MD5 collisions can be built from messages that differ in only a small number of bits.
SHA-1 has no known practical preimage attack, but collisions against it have been demonstrated, so it should be avoided wherever collision resistance matters, particularly against well-funded entities (governments, large corporations).
SHA-256 is a safe starting point against technology for the next couple decades.
Yet the consensus answer to the question "why aren't MD5 hash values reversible?" is because "an infinite number of input strings will generate the same output."
This is true for any hash function, but it is not the essence of a cryptographic hash function.
For short input strings such as passwords it is theoretically possible to reverse a cryptographic hash function, but it ought to be computationally infeasible. I.e. your computation would run too long to be useful.
The reason for this infeasibility is that the input is so thoroughly "mixed together" in the hash value that it becomes impossible to disentangle it with any less effort than the brute force attack of computing the hash value for all inputs.
"why aren't MD5 hash values reversible?" is because "an infinite number of input strings will generate the same output"
This is the reason that it isn't possible to reverse the hash function (i.e., to get the original input back).
Cryptographic hash functions are also collision resistant, which means that it's hard to find another input value that maps to the same output. (If your hash function were mod 2, then 134 mod 2 = 0; you can't get 134 back from the result, but you can still easily find another number, such as 2, with the same output value: 134 and 2 collide.)
When the input is smaller than the block size, padding is used to fit it to the block size.
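A quick check with Python's `hashlib` illustrates the point for SHA-256: the digest length is fixed no matter how short or long the input is, because the input is padded up to whole blocks internally. The helper name `digest_length` is just for this example.

```python
import hashlib

def digest_length(message: bytes) -> int:
    """Length in bytes of the SHA-256 digest of message."""
    return len(hashlib.sha256(message).digest())

# Inputs shorter than, and much longer than, one block.
for message in (b"", b"abc", b"x" * 10_000):
    print(len(message), "->", digest_length(message))

# Internal block size and output size, both in bytes.
print(hashlib.sha256().block_size, hashlib.sha256().digest_size)
```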