MD5 Merkle–Damgård construction in Postgres?

I have a column which contains the MD5 hashes of some unknown data.
I need the MD5 hash of "that unknown data, concatenated with another string".
I think that the Merkle–Damgård construction means that there should be a function which I can use to pipe some more plain-text into the hash, without knowing the original plain-text.
Does such a function exist for MD5?
What is it called in Postgres? I've only found MD5(plain-text). I need MD5extend(md5hash, extra-plain-text).
I think that what I'm trying to do is a "length-extension attack".

Related

Any way to get original data from hashed values in Snowflake?

I have a table which uses the Snowflake HASH function to store values in some columns.
Is there any way to reverse the encryption from the hash function and get the original values from the table?
As per the documentation, the function is "not a cryptographic hash function", and will always return the same result for the same input expression.
Examples:
select hash(1) always returns -4730168494964875235
select hash('a') always returns -947125324004678632
select hash('1234') always returns -4035663806895772878
I was wondering if there is any way to reverse the hashing and get the original input expression from the hashed values.
I think these disclaimers are there to prevent potential legal disputes:
Cryptographic hash functions have a few properties which this function does not, for example:
The cryptographic hashing of a value cannot be inverted to find the original value.
It's not possible to reverse a hash value in general. Even a very long text ends up represented as a single 64-bit value, so the original data obviously is not preserved. On the other hand, if you use a brute-force technique, you may find the input that produces the hash, and that could be counted as reversing the hash value.
For example, if you store the hash values of all numbers between 0 and 5000 in a table, then when someone comes to you with the hash value '-7875472545445966613', you can look that value up in your table and say it belongs to the number 1000.
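The lookup-table approach is easy to sketch. Snowflake's HASH() is not reproduced here: stand_in_hash is a hypothetical 64-bit function (BLAKE2b truncated to 8 bytes, read as a signed integer) used only to show the idea, and build_lookup is likewise an illustrative name.

```python
import hashlib
from collections import defaultdict


def stand_in_hash(value):
    """Hypothetical stand-in for a signed 64-bit hash; NOT Snowflake's HASH()."""
    digest = hashlib.blake2b(str(value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big", signed=True)


def build_lookup(candidates):
    """Map each hash value to every candidate input that produces it."""
    table = defaultdict(list)
    for value in candidates:
        table[stand_in_hash(value)].append(value)
    return table


table = build_lookup(range(5001))   # hash every number from 0 to 5000
observed = stand_in_hash(1000)      # a hash value found in the stored column
print(table.get(observed, []))      # candidate inputs that produce it
```

This works only because the candidate inputs are few and known in advance. It is a search, not an inversion, and because arbitrarily many inputs map onto 64 bits, a hit identifies a candidate rather than a guaranteed original.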

Hash Conversion from Postgres to Rails

I am not sure what kind of encoded hash I am getting back from a Postgres database; when I query the table directly, it shows a different representation.
The question is how to convert this hash (as it is returned from Rails):
\x158\x06\xDB\xCD\x13M\xDE\xE6\x9A\x8CR\x04\xE3\x8A\xC8\x04H\xF6#B\xF8\xC2<\xFEK~\xDF
into this (as it shows inside the postgres database):
\x153806dbcd134ddee69a8c5204e38ac80448f62342f8c23cfe4b7edf
The first hash (as you say, coming from Rails) is a byte array, in which any printable character is left as is instead of being converted to hex: \x158 is really two characters: '\x15' and '\x38' ('8').
In the Postgres table, that byte array is the same, but the format is to hexlify the whole thing.
So:
\x158\x06\xDB\xCD\x13M... is really \x15,8,\x06,\xDB,\xCD,\x13,M
-- becomes
\x153806dbcd134d... ('8':\x38, 'M':\x4d)
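The conversion is mechanical once you have the raw bytes. A minimal Python sketch, assuming the value is available as bytes (here written as a literal copied from the question):

```python
# The Rails value as a Python bytes literal: \xNN escapes take exactly two hex
# digits, so "\x158" is the byte 0x15 followed by the character "8" (0x38).
raw = b"\x158\x06\xDB\xCD\x13M\xDE\xE6\x9A\x8CR\x04\xE3\x8A\xC8\x04H\xF6#B\xF8\xC2<\xFEK~\xDF"

# Postgres's hex bytea output: "\x" followed by two lowercase hex digits per byte.
pg_hex = "\\x" + raw.hex()
print(pg_hex)  # prints \x153806dbcd134ddee69a8c5204e38ac80448f62342f8c23cfe4b7edf

# And back again: drop the "\x" prefix and decode the hex digits.
assert bytes.fromhex(pg_hex[2:]) == raw
```

In Ruby, `"\\x" + value.unpack1("H*")` should produce the same string from the binary String that Rails returns.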

Data hashing in Pentaho

Can anyone suggest the best options in Pentaho for my requirement? We need to convert the first_name and last_name attributes into hashes and load the hash values for these columns into the user table to support the business reports. The reports don't need the actual values of these columns; the reporting code only checks for NULL values in first_name and last_name and validates the length of these fields.
I tried converting the fields to hashes using the Add checksum transformation but wasn't sure which type of checksum to use (CRC 32, ADLER 32, MD5, SHA-1). Any suggestions?
The source and target DB is PostgreSQL, in case that's relevant.
Thanks in advance.
Hashing and encryption are not the same thing.
It seems you want a one-way hash. What hash you choose depends mainly on how much you care about collisions. If you don't care that multiple names could generate the same hash, a short fast hash like CRC32 is fine. If you do care about collisions then I'd use at least MD5.
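The four algorithms the Add checksum step offers differ mainly in output size, which is what drives the collision trade-off. The sketch below is plain Python with the standard zlib and hashlib modules, not Pentaho; the helper name hash_name and the choice to pass NULL through unchanged are assumptions, made so the reports' NULL checks keep working.

```python
import hashlib
import zlib


def hash_name(value, algorithm="md5"):
    """Hash one name value; None (SQL NULL) passes through so NULL checks still work."""
    if value is None:
        return None
    data = value.encode("utf-8")
    if algorithm == "crc32":
        return format(zlib.crc32(data), "08x")    # 32-bit checksum, 8 hex chars
    if algorithm == "adler32":
        return format(zlib.adler32(data), "08x")  # 32-bit checksum, 8 hex chars
    return hashlib.new(algorithm, data).hexdigest()  # "md5": 32 hex chars, "sha1": 40


for algo in ("crc32", "adler32", "md5", "sha1"):
    print(algo, hash_name("Smith", algo))
print(hash_name(None))
```

Every output has a fixed length regardless of the input, so a length check on the hashed columns will no longer reflect the length of the original names.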

Hashing of timestamp

I need a hash function (maybe I should not call it a "hash" function) that:
1. is used for hashing timestamps only;
2. has a reverse function through which I can restore the timestamp;
3. does not generate duplicate hash values;
4. whether or not it is a hash function, is nearly as fast as a hash function.
PS: As for the data type of the timestamp, imagine it as a 4-byte "long" in C.
Is that possible?
(I need the timestamp to be secret. In fact, I need the hash value as a session ID and the original timestamp as an index in my database. Whenever a user requests something with the session ID, I can recover the timestamp and use it as an index to look up the request info.)
If you can skip #2, MurmurHash might be a good option:
https://sites.google.com/site/murmurhash/
(2) If you must encrypt/decrypt, there are standard implementations of the various algorithms for most languages (AES, for instance). This will be much slower than hashing.
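Requirements 2 and 3 together describe a keyed permutation of 32-bit values, which is what a block cipher is. AES itself works on 128-bit blocks, so it will not map a 4-byte timestamp to a 4-byte token. As an illustration only, here is a minimal Python sketch of a 4-round Feistel network over 32 bits with a BLAKE2b-keyed round function. A Feistel network is always invertible, so it never produces duplicates, but this is a swapped-in construction, not a vetted cipher, and a 32-bit token can simply be guessed, which matters for a session ID. The key and the function names are hypothetical.

```python
import hashlib

MASK16 = 0xFFFF


def _round(half, key, i):
    """Keyed round function: 16 bits of BLAKE2b over (round number, half-block)."""
    data = bytes([i]) + half.to_bytes(2, "big")
    return int.from_bytes(hashlib.blake2b(data, key=key, digest_size=2).digest(), "big")


def encode(ts, key, rounds=4):
    """Map a 32-bit timestamp to a 32-bit token (a permutation, so no duplicates)."""
    left, right = (ts >> 16) & MASK16, ts & MASK16
    for i in range(rounds):
        left, right = right, left ^ _round(right, key, i)
    return (left << 16) | right


def decode(token, key, rounds=4):
    """Undo encode() by running the same rounds in reverse order."""
    left, right = (token >> 16) & MASK16, token & MASK16
    for i in reversed(range(rounds)):
        left, right = right ^ _round(left, key, i), left
    return (left << 16) | right


KEY = b"hypothetical-server-secret"  # assumption: a secret kept only on the server
token = encode(1400000000, KEY)
assert decode(token, KEY) == 1400000000
```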
If you don't actually need this to secure the data (which raises the question: why bother with any conversion at all?) and just want to make some non-timestamp-looking string that is easily reversible (by you, and by anyone else), then check this question:
Rot13 for numbers

How does the hash part in hash maps work?

So there is this nice picture in the hash maps article on Wikipedia:
Everything clear so far, except for the hash function in the middle.
How can a function generate the right index from any string? Are the indexes integers in reality too? If yes, how can the function output 1 for John Smith, 2 for Lisa Smith, etc.?
That's one of the key problems of hashmaps, dictionaries and so on: you have to choose a good hash function. A very bad but fast hash function could be the length of the key. You can see immediately that you will get a lot of collisions (different keys, same hash). Another bad hash function could be the ASCII value of the first character of your key. Lots of collisions, too.
So you need a function that is a lot better than those two. You could, for instance, combine (e.g. XOR) the ASCII values of all the key's characters and mix in the length. In practice you often depend on the values (fields) of the object that you want to hash (same values give the same hash => value type). For reference types you can mix in a memory location, for instance.
In your example that's just simplified a lot. No real hash function would map these keys to sequential numbers.
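To make those collisions concrete, here is a small Python sketch of the two bad hash functions just described. The names are illustrative keys in the style of the Wikipedia picture, and a table of 16 buckets is an arbitrary assumption.

```python
def length_hash(key, table_size):
    """Bad: every key of the same length lands in the same bucket."""
    return len(key) % table_size


def first_char_hash(key, table_size):
    """Bad: every key starting with the same character lands in the same bucket."""
    return ord(key[0]) % table_size


names = ["John Smith", "Lisa Smith", "Sam Doe", "Sandra Dee"]
for name in names:
    print(name, length_hash(name, 16), first_char_hash(name, 16))
```

"John Smith" and "Lisa Smith" share a length, and "Sam Doe" and "Sandra Dee" share a first letter, so each pair collides under one of the two functions.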
You may also want to read one of my previous answers on hashmaps.
A simple hash function may be as follows:
$hash = ord($string[0]) % HASH_TABLE_SIZE;
This function will return a number between 0 and HASH_TABLE_SIZE - 1, depending on the first letter of the string. This number can be used to go to the correct position in the hash table.
A real hash function will consider all letters in a string, and it will be designed so that there is an even spread among the buckets.
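As one concrete example of a hash that uses every character, here is 32-bit FNV-1a in Python. It is swapped in for illustration; the answer does not name a specific algorithm. The final modulo step maps the 32-bit value onto a table index, just as in the one-line example above.

```python
FNV_OFFSET_BASIS = 0x811C9DC5
FNV_PRIME = 0x01000193


def fnv1a_32(key: str) -> int:
    """32-bit FNV-1a over the UTF-8 bytes of `key`; every byte affects the result."""
    h = FNV_OFFSET_BASIS
    for byte in key.encode("utf-8"):
        h ^= byte
        h = (h * FNV_PRIME) & 0xFFFFFFFF
    return h


def bucket_index(key: str, table_size: int) -> int:
    """Reduce the 32-bit hash to a bucket index in 0..table_size-1."""
    return fnv1a_32(key) % table_size


print(hex(fnv1a_32("John Smith")), bucket_index("John Smith", 16))
```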
The hash function most often (but not necessarily always) outputs an integer within the wanted range (often a parameter to the hash function). This integer can be used as an index. Notice that a hash function cannot be guaranteed to always produce a unique result when given different data. This is called a hash collision, and the hash table must always handle it in some way.
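One common way a hash table handles collisions is separate chaining, sketched minimally in Python below (open addressing is the other common family; the class name ChainedHashTable is hypothetical). Each bucket holds a list of (key, value) pairs, and lookups compare the keys themselves, so correctness never depends on the hash being unique.

```python
class ChainedHashTable:
    """Minimal hash table using separate chaining: each bucket is a list of (key, value) pairs."""

    def __init__(self, size=16):
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        # Python's built-in hash() gives an integer; modulo maps it to a bucket index.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:  # same key already stored: overwrite its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))  # new key, possibly sharing the bucket with others

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:  # colliding keys are told apart by comparing the keys themselves
                return v
        return default


table = ChainedHashTable(size=1)  # one bucket: every key collides
table.put("John Smith", 1)
table.put("Lisa Smith", 2)
print(table.get("John Smith"), table.get("Lisa Smith"))
```

With size=1 every key collides, yet get still returns the right value; only speed suffers.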
As for your specific question of how a string becomes a number: any string is composed of characters (J, o, h, n, ...), and computers represent characters as numbers. The ASCII and Unicode standards assign fixed values to characters, so the result is deterministic and the same on all computers. The hash function operates on these characters as numbers and produces another number (the output). You could, for example, simply sum all the values and use a modulo operation to limit the result to the table's range.
This would be quite a horrible hashing function because for example "ab" and "ba" would get same result. Design of hash function is difficult and so one should use some ready-made algorithm unless situation dictates some other solution.
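The "ab"/"ba" problem is easy to demonstrate. The sketch below contrasts the sum-and-modulo hash with a polynomial hash that weights each character by its position; base 31 is an arbitrary choice here (it is the multiplier Java's String.hashCode uses), and 101 buckets is likewise assumed.

```python
def sum_hash(key, table_size):
    """Sum the character codes, then reduce modulo the table size (a poor hash)."""
    return sum(ord(ch) for ch in key) % table_size


def polynomial_hash(key, table_size, base=31):
    """Weight each character by its position, so that order matters."""
    h = 0
    for ch in key:
        h = (h * base + ord(ch)) % table_size
    return h


print(sum_hash("ab", 101), sum_hash("ba", 101))                # prints 94 94
print(polynomial_hash("ab", 101), polynomial_hash("ba", 101))  # prints 75 4
```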
There's a really good article on MSDN about how hash functions (and collision detection/resolution) work:
Part 2: The Queue, Stack, and Hashtable
You can skip down to the header Compressing Ordinal Indexing with a Hash Function
There are some bits and pieces that are .NET specific (when they talk about which Hash algorithm .NET uses by default) but for the most part it is language agnostic.
All that is required of a hash function is that it returns the same integer given the same key. Technically, a hash function that always returns 1 is not incorrect; it just makes every key collide, so every lookup degrades to a linear scan.