Which incremental hash function is best for a generic hash table implementation?
I need to search for a message, for example "ABC", in the hash table. If the message is in the hash table, I append a short piece of information, for example "D", to the message and then search for "ABCD" in the hash table. I keep appending more information until the extended message no longer exists in the hash table.
I need to do a lot of this kind of search, so an efficient incremental hash function (fast to calculate, with a low collision rate) is very important to my algorithm.
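The search loop described above can be sketched as follows. This is only a minimal illustration, assuming 64-bit FNV-1a as the incremental hash (the question does not name one) and a plain dict of buckets as the table; `extend_hash`, `build_table`, and `first_missing` are hypothetical names. The point is that appending a piece only hashes the new bytes, because the running state is carried over.

```python
FNV_OFFSET = 0xCBF29CE484222325  # 64-bit FNV offset basis
FNV_PRIME = 0x100000001B3        # 64-bit FNV prime
MASK64 = 0xFFFFFFFFFFFFFFFF


def extend_hash(state: int, piece: bytes) -> int:
    """Fold extra bytes into an existing FNV-1a state."""
    for byte in piece:
        state ^= byte
        state = (state * FNV_PRIME) & MASK64
    return state


def build_table(messages):
    """Toy chained table: hash value -> list of messages with that hash."""
    table = {}
    for message in messages:
        table.setdefault(extend_hash(FNV_OFFSET, message), []).append(message)
    return table


def first_missing(table, start: bytes, pieces):
    """Append pieces until the message is no longer in the table."""
    message = start
    state = extend_hash(FNV_OFFSET, start)
    for piece in pieces:
        if message not in table.get(state, ()):
            return message
        message += piece
        state = extend_hash(state, piece)  # only the new bytes are hashed
    return message if message not in table.get(state, ()) else None


table = build_table([b"ABC", b"ABCD"])
print(first_missing(table, b"ABC", [b"D", b"E"]))
```

The bucket check still compares whole messages, so this sketch only saves the rehashing cost, not the comparison cost.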
I have a table which uses the snowflake hash function to store values in some columns.
Is there any way to reverse the encryption from the hash function and get the original values from the table?
As per the documentation, the function is "not a cryptographic hash function" and will always return the same result for the same input expression.
Example:
select hash(1) always returns -4730168494964875235
select hash('a') always returns -947125324004678632
select hash('1234') always returns -4035663806895772878
I was wondering if there is any way to reverse the hashing and get the original input expression from the hashed values.
I think these disclaimers are for preventing potential legal disputes:
Cryptographic hash functions have a few properties which this function
does not, for example:
The cryptographic hashing of a value cannot be inverted to find the
original value.
It's not possible to reverse a hash value in general. If you consider that even a very long text is represented as a 64-bit value, it's obvious that the data is not preserved. On the other hand, if you use a brute-force technique, you may find a value that produces the hash, and that can be counted as reversing the hash value.
For example, if you store the hash values of all numbers between 0 and 5000 in a table, then when I come to you with the hash value '-7875472545445966613', you can look that value up in your table and say it belongs to the number 1000.
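The lookup-table idea above can be sketched like this. Note that `toy_hash64` is a hypothetical stand-in that merely produces a signed 64-bit value; it is not Snowflake's HASH function and will not reproduce the values quoted in the question. Only the precompute-then-look-up structure is the point.

```python
import hashlib
import struct


def toy_hash64(value) -> int:
    """Stand-in signed 64-bit hash (assumption: NOT Snowflake's HASH)."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return struct.unpack(">q", digest[:8])[0]


# Precompute hash -> input for every candidate in the known input range.
lookup = {toy_hash64(n): n for n in range(0, 5001)}


def reverse_hash(hash_value):
    """Return the candidate that produced hash_value, or None if unknown."""
    return lookup.get(hash_value)


print(reverse_hash(toy_hash64(1000)))
```

This only works when the set of possible inputs is small enough to enumerate; for arbitrary text there is no such table.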
I need to implement a program that inserts numbers from the input into a hash table. I want to use the chaining method to handle collisions. The program must have a function to resize the hash table. My question is: how do I compute the load factor, and when should the hash table be resized?
I want to store some object data in Redis and display all the objects in a table.
In SQL I would just get the entire row for every object and display it in a view.
In Redis, I don't want to query each hash separately, since that would be unbearably slow.
Assuming I know the hash keys and the hash names I want to pull, is there a way to do this efficiently?
I'm not sure why you believe querying each hash would be unbearably slow. If you loop through your hash keys and do an HMGET for each with the field names you should be good, provided you pipeline the requests.
Alternatively, you could do this in a Lua script that accepts (some of) the key names as KEYS and the fields as ARGV, returning the answer in whatever format you need.
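The pipelined HMGET loop from this answer might look roughly like the sketch below. It assumes a redis-py style client (one that offers `pipeline()`, `hmget()`, and `execute()`); `fetch_rows` and the key and field names in the usage comment are hypothetical.

```python
def fetch_rows(client, hash_keys, fields):
    """Fetch the given fields from every hash using one pipelined batch."""
    pipe = client.pipeline(transaction=False)
    for key in hash_keys:
        pipe.hmget(key, fields)   # queued locally, not sent yet
    results = pipe.execute()      # all queued commands are sent together
    # Each result is a list of values in the same order as `fields`.
    return [dict(zip(fields, values)) for values in results]


# Usage with redis-py (requires a running server; names are made up):
#   import redis
#   client = redis.Redis(decode_responses=True)
#   rows = fetch_rows(client, ["user:1", "user:2"], ["name", "email"])
```

Fields that do not exist in a hash come back as None, so the rows can be rendered directly as table cells.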
Store all hash keys in a set; let's call it 'hashkeyset'.
Use the 'sort' command to retrieve all hash values:
sort hashkeyset get *->field0 get *->field1 ... get *->fieldN
You can find more about 'sort' at this link: http://redis.io/commands/sort
I'm new to hashing and here's my question:
Can you insert in a DELETED slot of the hash table?
Yes, you can insert to a deleted slot. But...
First, you should know that there is soft deletion and hard deletion. With a soft delete you just flip a flag and mark the slot as "deleted"; with a hard delete you empty the slot.
Let me explain why we need soft deletes. Suppose you're using a hash table with linear probing and your hash function maps three input values to the same slot. With linear probing you place these three elements by advancing linearly through the table until you find an empty slot. If you then hard-delete one of them, you break the hash table: a lookup stops at the first empty slot it meets, so any element stored after the emptied slot in that probe sequence becomes unreachable.
On the other hand, if you have a perfect hash function you are OK to use hard deletes. A perfect hash function maps every input value to its own slot, so no probing scheme is needed and a hard delete doesn't break your table.
Coming back to your question: when inserting into a deleted slot, you should also figure out how to avoid duplicate insertions, since the same key may already be stored further along the probe sequence.
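A minimal sketch of soft deletion with linear probing, under the assumption of a fixed-size table that stores bare keys (`ProbingTable`, `EMPTY`, and `DELETED` are hypothetical names). Insertion remembers the first deleted slot but keeps scanning until it reaches a truly empty slot, which is one way to avoid inserting a duplicate.

```python
EMPTY = object()    # slot has never been used
DELETED = object()  # tombstone: slot was used, then soft-deleted


class ProbingTable:
    def __init__(self, capacity=8):
        self.slots = [EMPTY] * capacity

    def _probe(self, key):
        start = hash(key) % len(self.slots)
        for step in range(len(self.slots)):
            yield (start + step) % len(self.slots)

    def contains(self, key):
        for i in self._probe(key):
            if self.slots[i] is EMPTY:
                return False            # an empty slot ends the probe chain
            if self.slots[i] is not DELETED and self.slots[i] == key:
                return True
        return False

    def insert(self, key):
        first_free = None
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                if first_free is None:
                    first_free = i
                break
            if slot is DELETED:
                if first_free is None:
                    first_free = i      # reusable, but keep checking for a duplicate
            elif slot == key:
                return False            # key is already stored
        if first_free is None:
            raise RuntimeError("table is full")
        self.slots[first_free] = key
        return True

    def delete(self, key):
        for i in self._probe(key):
            if self.slots[i] is EMPTY:
                return False
            if self.slots[i] is not DELETED and self.slots[i] == key:
                self.slots[i] = DELETED  # soft delete keeps the chain intact
                return True
        return False
```

With capacity 8, the integer keys 1, 9, and 17 all start probing at slot 1, so deleting 9 and then looking up 17 exercises exactly the case described above.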
So there is this nice picture in the hash maps article on Wikipedia:
Everything clear so far, except for the hash function in the middle.
How can a function generate the right index from any string? Are the indexes integers in reality too? If yes, how can the function output 1 for John Smith, 2 for Lisa Smith, etc.?
That's one of the key problems of hashmaps, dictionaries and so on. You have to choose a good hash function. A very bad but fast hash function could be the length of the key. You can see instantly that you will get a lot of collisions (different keys, but the same hash). Another bad hash function could be the ASCII value of the first character of your key. Lots of collisions, too.
So you need a function that is a lot better than those two. You could add (xor) all ASCII values of the key characters and mix the length in for instance. In practice you often depend on the values (fields) of the object that you want to hash (same values give same hash => value type). For reference types you can mix in a memory location for instance.
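To make the comparison concrete, here is a small sketch of the three ideas mentioned so far. The function names and the sample keys are assumptions for illustration (only "John Smith" and "Lisa Smith" appear in the question), and the way the length is mixed in is just one arbitrary choice.

```python
from functools import reduce


def length_hash(key: str) -> int:
    """Very fast, very bad: all keys of equal length collide."""
    return len(key)


def first_char_hash(key: str) -> int:
    """Also bad: all keys sharing a first character collide."""
    return ord(key[0])


def xor_mix_hash(key: str) -> int:
    """XOR all character codes together, then mix in the length."""
    folded = reduce(lambda acc, ch: acc ^ ord(ch), key, 0)
    return folded ^ (len(key) << 8)


names = ["John Smith", "Lisa Smith", "Sam Doe", "Sandra Dee"]
for fn in (length_hash, first_char_hash, xor_mix_hash):
    distinct = len({fn(name) for name in names})
    print(f"{fn.__name__}: {distinct} distinct values for {len(names)} keys")
```

The XOR variant separates these particular keys, but it still ignores character order, so it is an improvement rather than a good general-purpose hash.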
In your example that's just simplified a lot. No real hash function would map these keys to sequential numbers.
Maybe you want to read one of my previous answers to hashmaps
A simple hash function may be as follows:
$hash = ord($string[0]) % HASH_TABLE_SIZE;
This function will return a number between 0 and HASH_TABLE_SIZE - 1, depending on the first letter of the string. This number can be used to go to the correct position in the hash table.
A real hash function will consider all letters in a string, and it will be designed so that there is an even spread among the buckets.
The hash function most often (but not necessarily always) outputs an integer within a desired range (the range is often a parameter to the hash function). This integer can be used as an index. Note that a hash function cannot be guaranteed to always produce a unique result when given different data to hash. This is called a hash collision, and the hashing algorithm must always handle it in some way.
As for your specific question, how a string becomes a number: any string is composed of characters (J, o, h, n ...), and characters can be interpreted as numbers (in computers). The ASCII and UTF standards bind certain values to certain characters, so the result is deterministic and always the same on all computers. So the hash function performs operations on these characters, treating them as numbers, and comes up with another number (the output). You could, for example, simply sum all the values and use the modulo operation to range-limit the result.
This would be quite a horrible hashing function because, for example, "ab" and "ba" would get the same result. Designing a hash function is difficult, so one should use a ready-made algorithm unless the situation dictates some other solution.
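The sum-and-modulo idea, and why it fails on "ab" versus "ba", can be sketched as below. `TABLE_SIZE` and both function names are hypothetical, and the polynomial variant (multiply the running value by a base before adding each byte) is shown only as one simple order-sensitive alternative, not as a recommended production hash.

```python
TABLE_SIZE = 16  # hypothetical number of buckets
MASK64 = 0xFFFFFFFFFFFFFFFF


def sum_hash(key: str, size: int = TABLE_SIZE) -> int:
    """Sum of the byte values, range-limited with modulo."""
    return sum(key.encode("utf-8")) % size


def poly_hash(key: str, size: int = TABLE_SIZE, base: int = 31) -> int:
    """Order-sensitive variant: h = h * base + byte for each byte."""
    h = 0
    for byte in key.encode("utf-8"):
        h = (h * base + byte) & MASK64  # keep the running value within 64 bits
    return h % size


for key in ("ab", "ba"):
    print(key, sum_hash(key), poly_hash(key))
```

Addition is commutative, so any reordering of the same characters gives the same sum; multiplying by the base first makes each character's contribution depend on its position.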
There's a really good article on how hash functions (and collision detection/resolution) work on MSDN:
Part 2: The Queue, Stack, and Hashtable
You can skip down to the header Compressing Ordinal Indexing with a Hash Function
There are some bits and pieces that are .NET specific (when they talk about which Hash algorithm .NET uses by default) but for the most part it is language agnostic.
All that is required of a hash function is that it returns the same integer given the same key. Technically, a hash function that always returns '1' is not incorrect.
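As a quick illustration of that last point, the sketch below gives a key type a constant hash and uses it in a Python dict. `ConstantKey` is a hypothetical class; the lookups still return the right values because equality is what finally distinguishes keys, but every key lands in the same probe sequence, so performance degrades toward a linear scan.

```python
class ConstantKey:
    """Key whose hash is always 1: valid, but every key collides."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 1

    def __eq__(self, other):
        return isinstance(other, ConstantKey) and self.name == other.name


table = {ConstantKey("John Smith"): 1, ConstantKey("Lisa Smith"): 2}
print(table[ConstantKey("Lisa Smith")])
```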