Meaning of Open hashing and Closed hashing - hash

Open Hashing (Separate Chaining):
In open hashing, keys are stored in linked lists attached to cells of a hash table.
Closed Hashing (Open Addressing):
In closed hashing, all keys are stored in the hash table itself without the use of linked lists.
I am unable to understand why they are called open, closed and Separate. Can some one explain it?

The use of "closed" vs. "open" reflects whether or not we are locked in to using a certain position or data structure (this is an extremely vague description, but hopefully the rest helps).
For instance, the "open" in "open addressing" tells us the index (aka. address) at which an object will be stored in the hash table is not completely determined by its hash code. Instead, the index may vary depending on what's already in the hash table.
The "closed" in "closed hashing" refers to the fact that we never leave the hash table; every object is stored directly at an index in the hash table's internal array. Note that this is only possible by using some sort of open addressing strategy. This explains why "closed hashing" and "open addressing" are synonyms.
Contrast this with open hashing - in this strategy, none of the objects are actually stored in the hash table's array; instead once an object is hashed, it is stored in a list which is separate from the hash table's internal array. "open" refers to the freedom we get by leaving the hash table, and using a separate list. By the way, "separate list" hints at why open hashing is also known as "separate chaining".
In short, "closed" always refers to some sort of strict guarantee, like when we guarantee that objects are always stored directly within the hash table (closed hashing). Then, the opposite of "closed" is "open", so if you don't have such guarantees, the strategy is considered "open".

You have an array that is the "hash table".
In Open Hashing each cell in the array points to a list containg the collisions. The hashing has produced the same index for all items in the linked list.
In Closed Hashing you use only one array for everything. You store the collisions in the same array. The trick is to use some smart way to jump from collision to collision until you find what you want. And do this in a reproducible / deterministic way.

The name open addressing refers to the fact that the location ("address") of the element is not determined by its hash value. (This method is also called closed hashing).
In separate chaining, each bucket is independent, and has some sort of ADT (list, binary search trees, etc) of entries with the same index.
In a good hash table, each bucket has zero or one entries, because we need operations of order O(1) for insert, search, etc.
This is a example of separate chaining using C++ with a simple hash function using mod operator (clearly, a bad hash function)

Related

Perl: Quickly find object in list of objects - looking for a fitting datastructure

I am developing a program to sync users between to different LDAP Servers. I have two types of user groups: Master-Groups and Target-Groups (those are predefined in a config-file. There can be multiple Master and Targets per Group definition).
Users in Master-Groups missing in the Target-Groups shall be added to the Targets, Users in Target-Groups missing in Master-Groups shall be removed from the Targets.
The Users in those Groups are Objects themselves. My problem is as follows:
I loop through my availiable master groups and have to perform a quick lookup wheter a user is already part of a target-group. I am struggeling to pick the right datastructure to solve this problem. I tried using a hash, but quickly realized that hash-keys are stringyfied, so I cannot perform
if ( exists( $master_members->{$target_user_object} ) )
When using an array for storing the objects, everytime I have to check if a user object exists, I have to loop through the whole array which essentially kills performance.
How do I perfom a lookup if a specific object exists in a list of objects?
Kind Regards,
Yulivee
You're right that hash keys are stringified. You cannot use objects as keys. But a hash is the right data structure.
Instead of just letting Perl stringify your references, build your own serializer. That could be as simple as using the cn. Or a concatenation of all the fields of the object. Make a sub, put that in there, call that sub within your exist.
... if exists $master_members->{ my_serializer($target_user_object) };

Cryptographically hashing a tree of elements

I am working on a project where I have a tree of objects. This tree of objects can be quite large, and can be subject to very frequent modifications (e.g. adding or removing a node, changing some properties of a node, and so on) by more users. Now, every time an update is published by an user, I need to be able to get some hash of the tree as it is after the user modified it, so that the user can sign the update with his private RSA key. Therefore I obviously need the hash to be cryptographically secure. However, hashing a linear representation of the whole tree over and over every time an user changes just one node is unfeasible.
I thought about this strategy, but I am not sure if that will work out properly:
I add to each node of a new field, that is the SHA256 hash of all its children nodes.
The hash of a node is now the hash of each of the fields of the node, therefore included the hash of its children.
Now, updating the tree should be easy: every time I update a node, I change the hash field of its parent, then it's grandparent and so on until the root is reached, and use the hash of the root as hash value. This would reduce the complexity of this operation to O(ln(N)) rather than O(N).
However, I know that it is never safe to trust one's own intuition about cryptography. So is this procedure secure?
This is called a hash tree or Merkle tree. It's nothing new and it is secure. It is often used to parallelize hashing as the hash methods themselves are strictly sequential in nature.
Don't concatenate data and hashes though unless you explicitly include the size of the data. It's better to only concatenate hashes.
In my opinion your algorithm is already good enough.
Assuming that SHA-256 is secure (well, at least, its name is "Secure Hash Algorithm"), one can prove, by induction on the depth of the tree, that your algorithm is secure as well.

how can get original value from hash value?

My original Text : "sanjay"
SHA-1 Text : "25ecbcb559d14a98e4665d6830ac5c99991d7c25"
Now how can i get original value - "sanjay" from this hash value ?
is there any code or algorithm or method?
No. That's usually the point -- the process of hashing is normally one-way.
This is especially important for hashes designed for passwords or cryptology -- which differ from hashes designed, for say, hash-maps. Also, with an unbounded input length, there is an infinite amount of values which result in the same hash.
One method that can be used is to hash a bunch of values (e.g. brute-force from aaaaaaaa-zzzzzzz) and see which value has the same hash. If you have found this, you have found "the value" (the time is not cheap). "Rainbow tables" work on this idea (but use space instead of time), but are defeated with a nonce salt.
From what I've been taught on the subject, if you were the one that turned your value into a hash value, chances are you have full access to the hash function, and would be able to reverse it in the same way. If you only have the original value and the end value, and don't know what hash function was used, you can't really reverse it without doing what was said above (going over every possibility).

How do I access random element in a Perl DBM hash?

I have a Perl DBM hash containing a list of URLs that I want to pick randomly from to load balance spidering sites. As a result I want to pick a key at random, or select the nth element (so I can randomise n).
I'm aware this goes against the concept of a hash, but is this possible?
NOTE: missed a valuable point that hash size will be too large to load all the keys to randomly select.
I don't think any of the DBM packages have an API for retrieving a random key, or for retrieving keys by index number. You can look up a particular key, or you can read through all the keys in whatever order the database chooses to return them in (which may change if the database is modified, and may or may not be "random" enough for whatever you want to do).
You could read through all the keys and pick one, but that would require reading the entire database each time (or at least a sizable chunk of it), and that's probably too slow.
I think you'll need to rearrange your data structure.
You could use a real SQL database
(like SQLite), so you could
look up rows both by a sequential
row number and by URL. This would
be the most flexible.
You could use a sequential integer
as the key for your DBM file. That
would make picking a random one
easy, but you could no longer look
up entries by URL.
You could use two DBM files: the one you have now and a second keyed by sequential integer with the URL as value. (Actually, since URLs don't look like integers, you could store both sets of records in the same DBM file, but that would complicate any code that uses each.) This would use twice the disk space, and would make inserting/removing entries a bit more complicated. You'd probably be better off with approach #1, unless you can't install SQLite for some reason.
Picking a random element from an array is simpler so you can use keys(%foo) to get the array of keys and pick randomly from that.
I believe this will return a random element $x from an array:
$x = $array[rand #array];
If you want to shuffle an array, consider List::Util::shuffle. See http://search.cpan.org/perldoc/List::Util#shuffle_LIST
Of course, it is possible. First, get a list of the keys. Then, randomize the list, using shuffle from List::Util.
Then, loop over the keys.
If there are too many keys (so keeping them all in a list and shuffling is not possible), just remember that you are using tied hashes. Just use each to iterate over key value pairs.
The order will be deterministic but AFAIK, it will not be alphabetical or order of insertion. That, by itself, might be able to get you what you want.
You could use DBM::Deep instead of a traditional DB file to keep your data.
tie %hash, "DBM::Deep", {
file => "foo.db",
locking => 1,
autoflush => 1
};
# $hash{keys} = [ ... ]
# $hash{urls} = { ... } <- same as your current DB file.
my $like_old = $hash{urls}; # a ref to a hash you can use like your old hashref.
my $count = #{$hash{keys}};
With that you can pull out random values as needed.

What is hash exactly?

I am learning MD5. I found a term 'hash' in most description of MD5. I googled 'hash', but I could not find exact term of 'hash' in computer programming.
Why are we using 'hash' in computer programming? What is origin of the word??
I would say any answer must be guesswork, so I will make this a community wiki.
Hash, or hash browns, is breakfast food made from cutting potatoes into long thin strips (smaller than french fries, and shorter, but proportionally similar), then frying the mass of strips in animal or vegetable fat until browned, stuck together, and cooked. By analogy, 'hashing' a number meant turning it into another, usually smaller, number using a method which still deterministically depending on the input number.
I believe the term "hash" was first used in the context of "hash table", which was commonly used in the 1960's on mainframe-type machines. In these cases, usually an integer value with a large range is converted to a "hash table index" which is a small integer. It is important for an efficient hash table that the "hash function" be evenly distributed, or "flat."
I don't have a citation, that is how I have understood the analogy since I heard it in the 80's. Someone must have been there when the term was first applied, though.
A hash value (or simply hash), also
called a message digest, is a number
generated from a string of text. The
hash is substantially smaller than the
text itself, and is generated by a
formula in such a way that it is
extremely unlikely that some other
text will produce the same hash value.
You're refering to the "hash function". It is used to generate a unique value for a given set of parameters.
One great use of a hash is password security. Instead of saving a password in a database, you save a hash of the password.
A hash is supposed to be a unique combination of values from 00 to FF (hexadecimal) that represents a certain piece of data, be it a file or a string of bytes. It is used primarily for password storage and verification, and to test if a file is the same as another file (i.e. you hash two files, if they match, they're the same file).
Generally, any of the SHA algorithms are preferred over MD5, due to hash collisions that can occur when using it. See this Wikipedia article.
According to the Wikipedia article on hash functions, Donald Knuth in the Art of Computer Programming was able to trace the concept of hash functions back to an internal IBM memo by Hans Peter Luhn in 1953.
And just for fun, here's a scrap of overheard conversation quoted in Two Women in the Klondike: the Story of a Journey to the Gold Fields of Alaska (1899):
They'll have to keep the hash table going all day long to feed us. 'T will be a short order affair.
the hash function hashes input to a value, requires a salt value and no proof salt is needed to store. Indications are everybody says we must store the salt same time match and new still work. Mathematically related concept is bijection
adding to gabriel1836's answer, one of the important properties of hash function is that it is a one way function, which means you cannot generate the original string from its hash value.