Does adding two hash values generate a valid hash? - hash

Does adding 2 hash values generate another valid hash value? In other words will this hash(a) + hash(b) != hash(c) + hash(d) always be true? I don't think it will but does it matter? Are the essential properties of the hash function preserved under addition?

Since several values can have the same hash, it could be that hash(a) = hash(b) = hash(c) = hash(d), so also hash(a) + hash(b) = hash(c) + hash(d).

A hash, by the pigeonhole principle, can't be collision-free, so hash(a) + hash(b) == hash(c) + hash(d) for some values of a, b, c, and d. Adding hash functions still gets you the good qualities of the hashes that you added together, but it won't make the result any better than the better of the two. (You're not increasing your hash table space.) On second thought, the result will be only as good as the worse hash that you added.
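To make the pigeonhole argument concrete, here is a minimal sketch. It assumes a deliberately tiny, hypothetical 8-bit hash (tiny_hash, the first byte of SHA-256) so that a collision between sums shows up quickly. With 64 inputs there are 2016 pairs but only 511 possible sums, so two different pairs must share a sum.

```python
import hashlib
from itertools import combinations

def tiny_hash(data: bytes) -> int:
    """Hypothetical 8-bit hash for illustration: first byte of SHA-256."""
    return hashlib.sha256(data).digest()[0]

def find_sum_collision(n_inputs: int = 64):
    """Return two different input pairs whose hash sums are equal."""
    inputs = [str(i).encode() for i in range(n_inputs)]
    seen = {}  # hash sum -> first pair that produced it
    for a, b in combinations(inputs, 2):
        total = tiny_hash(a) + tiny_hash(b)
        if total in seen:
            return seen[total], (a, b)
        seen[total] = (a, b)
    return None  # unreachable for n_inputs=64: 2016 pairs, 511 possible sums

result = find_sum_collision()
```

The same counting argument applies to real hashes such as MD5 or SHA1. The output space is just so large that you will not find such pairs by brute force.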

This depends on your hashing algorithm, and in general it will not be true. Given a finite hash table, any two hash keys near the end of the table when added together will clearly give you a hash key off past the limit of your legal hash values.
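A small sketch of the overflow problem, assuming a hypothetical table of 16 buckets (TABLE_SIZE) and a helper named add_hashes that is not from the question. The raw sum of two valid bucket indices can fall outside the table, and reducing it modulo the table size is one common way to bring it back into range.

```python
TABLE_SIZE = 16  # assumed number of buckets, for illustration only

def add_hashes(h1: int, h2: int, table_size: int = TABLE_SIZE) -> int:
    """Add two bucket indices and wrap the result back into the table."""
    return (h1 + h2) % table_size

h1, h2 = 14, 15               # two legal indices near the end of the table
raw = h1 + h2                 # 29: not a legal index for a 16-bucket table
wrapped = add_hashes(h1, h2)  # 29 % 16 == 13: legal again
```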

I'm working under the assumption that when you say "hash" you're referring to a cryptographic one like MD5 or SHA1; if you're talking about something else, ignore me.
Adding hashes together would be kind of a weird process; XORing them might make more sense... ish.
It's possible for hash(a) + hash(b) == hash(c) + hash(d), but incredibly unlikely. By merging the two hashes you're creating the possibility of such a collision (though there's already the possibility that hash(a) == hash(c) right off the bat; it's just slim). Hashing identical items would clearly result in equality.
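For comparison, here is a rough sketch of both ways of combining two SHA1 digests, treating each digest as a 160-bit integer. The helper names (sha1_int, combine_xor, combine_add) are made up for this example. Note that both combinations ignore the order of the inputs, and that XORing a hash with itself always gives zero.

```python
import hashlib

def sha1_int(data: bytes) -> int:
    """SHA1 digest of data as a 160-bit integer."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")

def combine_xor(a: bytes, b: bytes) -> int:
    """Combine two digests with XOR; the result always fits in 160 bits."""
    return sha1_int(a) ^ sha1_int(b)

def combine_add(a: bytes, b: bytes) -> int:
    """Combine two digests by addition, wrapped back into 160 bits."""
    return (sha1_int(a) + sha1_int(b)) % (1 << 160)

same_order = combine_xor(b"a", b"b") == combine_xor(b"b", b"a")  # True
self_xor = combine_xor(b"a", b"a")                               # 0
```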

Your question expressed in English doesn't match the expression you give. Do you want to know if:
hash(a) + hash(b) != hash(c)
will always be true?
The answer is no. Any value might be a valid hash value.

Related

Declaring a Perl array and assigning it values by an array-slice

I was trying to split a string and rearrange the results, all in a single statement:
my $date_str = '15/5/2015';
my @directly_assigned_date_array[2,1,0] = split ('/', $date_str);
This resulted in:
syntax error at Array_slice_test.pl line 16, near "@directly_assigned_date_array["
Why is that an error?
The following works well though:
my @date_array;
@date_array[2,1,0] = split ('/', $date_str);
@vol7ron offered a different way to do it:
my @rvalue_array = (split '/', $date_str)[2,1,0];
And it indeed does the job, but it looks unintuitive, to me at least.
As you are just reversing the split array, you can accomplish the same using this single statement: @date_array = reverse(split('/',$date_str));
Others here know much more about Perl internals than myself, but I assume it cannot perform the operation because an array slice is referencing an element of an array, which does not yet exist. Because the array has not yet been declared, it wouldn't know what address to reference.
my @array = ( split '/', $date_str )[2,1,0];
This works because split returns values in list context. Lists and arrays are very similar in Perl. You could think of an array as a super list, with extra abilities. However you choose to think of it, you can perform a list slice just like an array slice.
In the above code, you're taking the list, then reordering it using the slice and then assigning that to array. It may feel different to think about at first, but it shouldn't be too hard. Generally, you want your data operations (modifications and ordering) to be performed on the rhs of the assignment and your lhs to be the receiving end.
Keep in mind that I've also dropped some parentheses and relied on Perl's operator precedence to reduce the syntax. The same code might otherwise look like the following (same operations, just more fluff):
my @array = ( split( '/', $date_str ) )[2,1,0];
As @luminos mentioned, since you only have 3 elements and are manually reversing them, you could use the reverse function; again we can rely on Perl's precedence rules and drop the parentheses here:
my @array = reverse split '/', $date_str;
But in this case it might be too magical, so depending on your coding practice guidelines, you may want to include a set of parentheses for the split or reverse, if it increases readability and comprehension.

Prefix preserving hash function

I am looking for a hash function f() whose outputs preserve the prefixes of its inputs. The detailed requirements are as follows:
f() takes variable-length bit strings as input and outputs bit strings;
assume a and b are bit strings and a is a substring of b, then f(a) is also a substring of f(b);
the length of the output bit string should be smaller than the input bit string.
Any idea?
There will be no such hash function that meets your criterion.
Suppose you have such hash function Hash that preserves prefix, then answer these questions:
(1) Hash("a") =? It could be anything, right?
(2) What about Hash("xa")=? to preserve the prefix, it has to be
|Hash("xa")-Hash("a")| + Hash("a")
(3) What about Hash("yxa")=? similarly as (2), it has to be
|Hash("yxa")-Hash("xa")| + |Hash("xa")-Hash("a")| + Hash("a")
So the hash will always be longer than the original input.

case and white space insensitive hashing function

I am looking for a hashing function that is case-insensitive and ignores whitespace as well.
For example, the hash values generated for "this is a hash" and "ThisIsAHash" should be exactly the same.
Does any such hash function exist?
Hash functions are whatever we make them. For example:
First, for all strings:
Step 1. Lowercase them (or uppercase them).
Step 2. Strip all whitespace.
By now, both strings would map to: thisisahash
Step 3. Now apply any hash function to the result: CRC32, Java's polynomial string hash, or whatever.
Given a string, you can now always do a lookup and see whether other strings hash to the same key.
Note that hash functions are one-way anyway, so the normalization in Steps 1 and 2 doesn't make the result any less valid as a hash.
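The three steps above could be sketched like this. The function name normalized_hash is made up for the example, and CRC32 (via Python's zlib module) merely stands in for "any hash function".

```python
import zlib

def normalized_hash(text: str) -> int:
    """Hash text while ignoring case and all whitespace."""
    # Step 2: str.split() with no arguments splits on any run of whitespace,
    # so joining the pieces back together removes every whitespace character.
    stripped = "".join(text.split())
    # Step 1: casefold() is a more aggressive lower() suited to caseless matching.
    normalized = stripped.casefold()
    # Step 3: apply an ordinary hash function to the normalized string.
    return zlib.crc32(normalized.encode("utf-8"))

h1 = normalized_hash("this is a hash")
h2 = normalized_hash("ThisIsAHash")
same = (h1 == h2)  # True: both normalize to "thisisahash"
```

Any other hash (MD5, SHA1, a language's built-in string hash) could be swapped in at step 3. The normalization is what provides the case and whitespace insensitivity.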

Perl: basic question about hashmap

$hash_map{$key}->{$value1} = 1;
I'm just a beginner at Perl and I need help with this expression. What does it mean? I assume that a new key/value pair will be created, but what is the meaning of 1 here?
What you've got here is a hash of hashes, or a two-level hash. $hash_map{$key} holds a hash reference, which points to another hash. $hash_map{$key}{$value1} (the arrow can be omitted in this case) is a particular key in the second hash. The 1 is the value being assigned to that hash key.
For more on this topic, see the Perl Data Structures Cookbook (perldsc) section on Hashes of Hashes, and also see the Perl references tutorial (perlreftut) for how references work.

hash collision and appending data

Assume I have two strings (or byte arrays) A and B which both have the same hash (by hash I mean things like MD5 or SHA1). If I concatenate another string C behind them, will A+C and B+C have the same hash H' as well? What happens to C+A and C+B?
I tested it with MD5 and in all my tests, appending something to the end made the hash the same, but appending at the beginning did not.
Is this always true (for all inputs)?
Is this true for all (well-known) hash functions? If no, is there a (well-known) hash function, where A+C and B+C will not collide (and C+A and C+B do not either)?
(apart from MD5(x + reverse(x)) and other constructed stuff, I mean)
Details depend on the hash function H, but generally they work as follows:
1. Consume a block of input X (say, 512 bits).
2. Break the block into smaller pieces (say, 32 bits) and update the hash's internal state based on them.
3. If there's more input, go to step 1.
4. At the end, spit the internal state out as the hash value H(X).
So, if A and B collide i.e. H(A) = H(B), the hash will be in the same state after consuming them. Updating the state further with the same input C can make the resulting hash value identical. This explains why H(A+C) is sometimes H(B+C). But it depends how A's and B's sizes are aligned to input block size and how the hash breaks the input block internally.
C+A and C+B can be identical if C is a multiple of the hash block size but probably not otherwise.
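Here is a toy model of that block-by-block process. It is a sketch only: BLOCK, compress, and toy_hash are invented for the illustration, the state is a single byte so collisions are easy to find, and there is no length padding as real hashes like MD5 have. It finds two one-block messages A and B with the same hash and then appends the same suffix C to both.

```python
BLOCK = 2  # bytes per block (assumed toy value; MD5 uses 64-byte blocks)

def compress(state: int, block: bytes) -> int:
    """Toy compression function: mix one block into an 8-bit state."""
    for byte in block:
        state = ((state ^ byte) * 167 + 13) % 256
    return state

def toy_hash(data: bytes) -> int:
    """Feed the input through compress block by block; no padding."""
    state = 7  # arbitrary initial state
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state

def find_collision():
    """Return two different one-block messages with the same toy hash."""
    seen = {}  # hash value -> first message that produced it
    for n in range(1 << 16):
        msg = n.to_bytes(BLOCK, "big")
        h = toy_hash(msg)
        if h in seen:
            return seen[h], msg
        seen[h] = msg
    return None  # unreachable: 65536 messages, only 256 states

A, B = find_collision()
C = b"suffix"
suffix_collides = toy_hash(A + C) == toy_hash(B + C)
```

Once A and B have been consumed the internal state is identical, so everything appended afterwards is processed identically and suffix_collides is True. Nothing comparable is guaranteed for C + A and C + B, because A and B are then mixed into whatever state C left behind rather than the initial state.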
This depends entirely on the hash function. Also, the probability that you have those collisions is really small.
The hash functions being discussed here are typically cryptographic (SHA1, MD5). These hash functions exhibit an avalanche effect: the output changes drastically with a slight change in the input.
The prefix and suffix extension of C will effectively make a longer input.
So, adding anything to the front or rear of the input should change the effective hash outputs significantly.
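One way to see the avalanche effect is to count how many of the 128 MD5 output bits differ between two nearly identical inputs. This is a minimal sketch using Python's hashlib; the helper name bit_difference is made up for the example.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Number of differing bits between the MD5 digests of a and b."""
    da = int.from_bytes(hashlib.md5(a).digest(), "big")
    db = int.from_bytes(hashlib.md5(b).digest(), "big")
    return bin(da ^ db).count("1")

diff_suffix = bit_difference(b"abcd", b"abcd0")  # one byte appended
diff_prefix = bit_difference(b"abcd", b"0abcd")  # one byte prepended
```

For a well-behaved hash you would expect each count to land somewhere around half of the 128 output bits, whichever end the extra byte is added to.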
I do not understand how you did the MD5 check, here is my test.
echo "abcd" | md5sum
70fbc1fdada604e61e8d72205089b5eb
echo "0abcd" | md5sum
f5ac8127b3b6b85cdc13f237c6005d80
echo "abcd0" | md5sum
4c8a24d096de5d26c77677860a3c50e3
Are you saying that you located two inputs which had the same MD5 hash and then appended something to the end or beginning of the input and found that adding at the end resulted in the same MD5 as that for the original input?
Please provide samples with your test results.