Hashing functions and Universal Hashing Family

I need to determine whether the following Hash Functions Set is universal or not:
Let U be the set of keys {000, 001, 002, 003, ..., 999}: all the numbers between 0 and 999, padded with leading zeros where needed. Let n = 10 and let a be an integer with 1 ≤ a ≤ 9. We denote by ha(x) the rightmost digit of the number a*x.
For example, h2(123) = 6, because, 2 * 123 = 246.
We also denote H = {h1, h2, h3, ... ,h9} as our set of hash functions.
Is H universal? Prove it.
I know I need to calculate the probability of a collision between 2 different keys and check whether it is less than or equal to 1/n (which is 1/10). I tried to split into cases depending on whether a is odd or even, because when a is even the last digit of a*x will be 0/2/4/6/8, and otherwise it could be anything. But that didn't help me much, and I'm stuck.
Would be very glad for some help here.
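The thread leaves the question open. As a hedged sketch (not from any of the original posts), a brute-force check in Python makes the collision probabilities concrete. It encodes ha(x) as (a * x) % 10, which is exactly "the rightmost digit of a*x", and takes H to be h1 ... h9 as listed above. Since (a * x) % 10 depends only on x % 10, two keys with the same last digit, such as 000 and 010, collide under every function in H, so their collision probability is 1, far above the 1/n = 1/10 that universality requires. The helper names are illustrative.

```python
from fractions import Fraction

H = range(1, 10)   # a = 1, ..., 9, i.e. the functions h1 ... h9
U = range(1000)    # keys 000 ... 999
N = 10             # number of possible hash values (last digits)

def h(a: int, x: int) -> int:
    """ha(x): the rightmost decimal digit of a * x."""
    return (a * x) % 10

def collision_probability(x: int, y: int) -> Fraction:
    """Fraction of the functions in H under which keys x and y collide."""
    hits = sum(1 for a in H if h(a, x) == h(a, y))
    return Fraction(hits, len(H))

print(h(2, 123))                                       # 6, as in the question
print(collision_probability(0, 10))                    # 1: 000 and 010 always collide
print(collision_probability(0, 10) <= Fraction(1, N))  # False: the 1/n bound fails
# The same happens for every pair of keys that share their last digit.
print(all(collision_probability(x, x + 10) == 1 for x in U if x + 10 in U))  # True
```

One such pair is enough to show that H is not universal; the odd/even case split is not needed for that direction.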


How can I extract a specific bit from a 16-bit register using math ONLY?

I have a 16-bit WORD and I want to read the status of a specific bit or several bits.
I've tried a method that divides the word by the bit that I want, converts the result to two values, an integer and a real, and compares the two. If they are not equal, then it equates to false. This appears to work only if I am looking for the bit that is the last 'TRUE' bit in the word. If there are any successive TRUE bits, it fails. Perhaps I just haven't done it right. I don't have the ability to use code, just basic math, boolean operations, and type conversion. Any ideas? I hope this isn't a dumb question, but I have a feeling it is.
eg:
WORD 0010000100100100 = 9348 (written with bit 0 on the left)
I want to know the value of bit 2. how can i determine it from 9348?
There are many ways, depending on what operations you can use. It appears you don't have much to choose from. But this should work, using just integer division and multiplication, and a test for equality.
(pseudocode):
x = 9348 (binary 0010000100100100, bit 0 = 0, bit 1 = 0, bit 2 = 1, ...)
x = x / 4 (now x is 1000010010010000)
y = (x / 2) * 2 (y is 0000010010010000)
if (x == y) {
(bit 2 must have been 0)
} else {
(bit 2 must have been 1)
}
Every time you divide by 2, you move the bits one position to the left (in your representation, which writes the least significant bit first). Every time you multiply by 2, you move the bits one position to the right. Odd numbers have a 1 in the least significant position; even numbers have a 0 there. If you divide an odd number by 2 in integer math and then multiply by 2, you lose the odd bit. So the idea above is to first move the bit you want to know about into the least significant position, then divide by 2 and multiply by 2. If the result is the same as what you had before, there must have been a 0 in the bit you care about. If the result differs, there must have been a 1 in the bit you care about.
Having explained the idea, we can simplify (with x being the original word) to
((x / 8) * 2) <> (x / 4)
which will resolve to true if the bit was set, and false if the bit was not set.
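A minimal Python sketch of the divide-and-multiply idea, assuming non-negative words; Python's // is floor division, which matches truncating integer division for non-negative values. The helper names are hypothetical, and n is the zero-based bit index, so the divisor is 2**n (4 for bit 2). The second helper generalises the simplified form ((x / 8) * 2) <> (x / 4) from bit 2 to any bit n.

```python
def bit_is_set_arith(x: int, n: int) -> bool:
    """Test bit n (0 = least significant) using only //, * and comparison."""
    shifted = x // 2**n            # move bit n into the least significant position
    cleared = (shifted // 2) * 2   # integer-divide, then multiply: drops that bit
    return shifted != cleared      # they differ only if the bit was 1

def bit_is_set_short(x: int, n: int) -> bool:
    """Simplified form: ((x / 2^(n+1)) * 2) <> (x / 2^n), with integer division."""
    return (x // 2**(n + 1)) * 2 != x // 2**n

word = 9348  # 0010010010000100, most significant bit first
print(bit_is_set_arith(word, 2), bit_is_set_short(word, 2))  # True True
print(bit_is_set_arith(word, 1), bit_is_set_short(word, 1))  # False False
```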
AND the word with a mask [1].
In your example, you're interested in bit 2 (counting from bit 0), so the mask (in binary) is
0000000000000100. (Which is 4 in decimal.)
In binary, written most significant bit first, your word 9348 is 0010010010000100 [2]
0010010010000100 (your word)
AND 0000000000000100 (mask)
----------------
0000000000000100 (result of ANDing your word and the mask)
Because the value is not zero, the bit is set. If it were equal to zero, the bit would not be set.
This technique works for extracting one bit at a time. You can however use it repeatedly with different masks if you're interested in extracting multiple bits.
[1] For more information on masking techniques see http://en.wikipedia.org/wiki/Mask_(computing)
[2] See http://www.binaryhexconverter.com/decimal-to-binary-converter
The nth bit (counting from 0) is equal to the word integer-divided by 2^n, mod 2, i.e. floor(word / 2^n) mod 2.
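Read with explicit precedence, the formula is (word // 2**n) % 2 for a zero-based bit index n. A short Python sketch that applies it to every bit, 0 through 15, of the example word; the helper name bit is hypothetical.

```python
def bit(word: int, n: int) -> int:
    """Return bit n (0 = least significant) as 0 or 1: floor(word / 2^n) mod 2."""
    return (word // 2**n) % 2

word = 9348
# Collect bits 15..0 so the string reads most significant bit first.
print("".join(str(bit(word, n)) for n in range(15, -1, -1)))  # 0010010010000100
print(bit(word, 2))  # 1
```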
I think you'll have to test each bit, 0 through 15 inclusive.
You could try 9348 AND 4 (4 is the equivalent of 1<<2, where 2 is the index of the bit you want)
9348 AND 4
should give 4 if bit is set, 0 if not.
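In a language that does have bitwise operators, the masking approach is a one-liner. A minimal Python sketch, with the mask built as 1 << n for a zero-based bit index n (the helper name is hypothetical):

```python
def bit_is_set_mask(word: int, n: int) -> bool:
    """AND the word with a mask that has only bit n set (0 = least significant)."""
    mask = 1 << n              # bit 2 -> 0b100 == 4
    return (word & mask) != 0  # the result is either 0 or the mask itself

print(9348 & 4)                  # 4, so bit 2 is set
print(bit_is_set_mask(9348, 2))  # True
print(bit_is_set_mask(9348, 1))  # False

# A mask with several bits set tests them all at once:
print(9348 & 0b110)              # 4: bit 2 is set, bit 1 is not
```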
So here is what I have come up with: 3 solutions. One is Hatchet's as proposed above, and his answer helped me immensely with actually understanding HOW this works, which is of utmost importance to me! The proposed AND masking solutions could have worked if my system supported bitwise operators, but it apparently does not.
Original technique:
( ( ( INT ( TAG / BIT ) ) / 2 ) - ( INT ( ( INT ( TAG / BIT ) ) / 2 ) ) <> 0 )
Explanation:
In the first part of the equation, integer division is performed on TAG/BIT, followed by REAL division by 2. In the second part, integer division is performed on TAG/BIT, followed by another integer division by 2. The difference between these two results is compared to 0. If the difference is not 0, then the formula resolves to TRUE, which means the specified bit is also TRUE.
eg: 9348/4 = 2337 w/ integer division. Then 2337/2 = 1168.5 w/ REAL division but 1168 w/ integer division. 1168.5-1168 <> 0, so the result is TRUE.
My modified technique:
( INT ( TAG / BIT ) / 2 ) <> ( INT ( INT ( TAG / BIT ) / 2 ) )
Explanation:
Effectively the same as above, but instead of subtracting the two results and comparing the difference to 0, I am just comparing the two results themselves. If they are not equal, the formula resolves to TRUE, which means the specified bit is also TRUE.
eg: 9348/4 = 2337 w/ integer division. Then 2337/2 = 1168.5 w/ REAL division but 1168 w/ integer division. 1168.5 <> 1168, so the result is TRUE.
Hatchet's technique as it applies to my system:
( INT ( TAG / BIT )) <> ( INT ( INT ( TAG / BIT ) / 2 ) * 2 )
Explanation:
In the first part of the equation, integer division is performed on TAG/BIT. In the second part, integer division is performed on TAG/BIT, followed by another integer division by 2 and then multiplication by 2. The two results are compared. If they are not equal, the formula resolves to TRUE, which means the specified bit is also TRUE.
eg: 9348/4 = 2337. Then 2337/2 = 1168 w/ integer division. Then 1168x2=2336. 2337 <> 2336 so the result is TRUE. As Hatchet stated, this method 'drops the odd bit'.
Note - 9348/4 = 2337 w/ both REAL and integer division, but it is important that these parts of the formula use integer division and not REAL division (12164/32 = 380 w/ integer division and 380.125 w/ REAL division)
I feel it important to note for any future readers that the BIT value in the equations above is not the bit number, but the actual value of the resulting decimal if the bit in the desired position was the only TRUE bit in the binary string (bit 2 = 4 (2^2), bit 6 = 64 (2^6))
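To compare the three formulas side by side, here is a hedged Python emulation. It assumes INT(...) truncates toward zero and that / is REAL (floating-point) division, as described in the post; Python's int() and true division / play those roles, and for 16-bit words the floating-point values involved are exact. BIT is the bit value (4 for bit 2), not the bit number. The function names are hypothetical.

```python
def original(tag: int, bit: int) -> bool:
    # ( ( INT(TAG/BIT) / 2 ) - INT( INT(TAG/BIT) / 2 ) ) <> 0
    q = int(tag / bit)
    return (q / 2) - int(q / 2) != 0

def modified(tag: int, bit: int) -> bool:
    # ( INT(TAG/BIT) / 2 ) <> INT( INT(TAG/BIT) / 2 )
    q = int(tag / bit)
    return q / 2 != int(q / 2)

def hatchet(tag: int, bit: int) -> bool:
    # INT(TAG/BIT) <> INT( INT(TAG/BIT) / 2 ) * 2
    q = int(tag / bit)
    return q != int(q / 2) * 2

for f in (original, modified, hatchet):
    print(f.__name__, f(9348, 4), f(9348, 2))   # True False for each formula

# Why the INT matters: 12164 / 32 is 380.125 with REAL division, 380 with INT.
print(12164 / 32, int(12164 / 32))              # 380.125 380
print(modified(12164, 32))                      # False: bit 5 of 12164 is not set
q_real = 12164 / 32                             # skipping INT on the first division...
print(q_real / 2 != int(q_real / 2))            # True: ...gives the wrong answer
```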
This explanation may be a bit too verbose for some, but may be perfect for others :)
Please feel free to comment/critique/correct me if necessary!
I just needed to resolve an integer status code to a bit state in order to interface with some hardware. Here's a method that works for me:
private bool resolveBitState(int value, int bitNumber)
{
    // bitNumber is 1-based: bit #1 is the least significant bit (mask 1 << 0)
    return (value & (1 << (bitNumber - 1))) != 0;
}
I like it, because it's non-iterative, requires no cast operations and essentially translates directly to machine code operations like Shift, And and Comparison, which probably means it's really optimal.
To explain in a little more detail, I'm comparing the bitwise value to a mask for the bit I am interested in (value & mask) using an AND operation. If the bitwise AND operation result is zero, then the bit is not set (return false). If the AND operation result is not zero, then the bit is set (return true). The result of the AND operation is either zero or the value of the bit (1, 2, 4, 8, 16, 32...). Hence the boolean evaluation comparing the AND operation result and 0. The mask is created by taking the number 1 and shifting it left (bit wise), by the appropriate number of binary places (1 << n). The number of places is the number of the bit targeted minus 1. If it's bit #1, I want to shift the 1 left by 0 and if it's #2, I want to shift it left 1 place, etc.
I'm surprised no one rates my solution. I think it's the most logical and succinct... and it works.

Calculating prime numbers in Scala: how does this code work?

So I've spent hours trying to work out exactly how this code produces prime numbers.
lazy val ps: Stream[Int] = 2 #:: Stream.from(3).filter(i =>
ps.takeWhile{j => j * j <= i}.forall{ k => i % k > 0});
I've used a number of printlns etc., but nothing's making it clearer.
This is what I think the code does:
/**
* [2,3]
*
* takeWhile 2*2 <= 3
* takeWhile 2*2 <= 4 found match
* (4 % [2,3] > 1) return false.
* takeWhile 2*2 <= 5 found match
* (5 % [2,3] > 1) return true
* Add 5 to the list
* takeWhile 2*2 <= 6 found match
* (6 % [2,3,5] > 1) return false
* takeWhile 2*2 <= 7
* (7 % [2,3,5] > 1) return true
* Add 7 to the list
*/
But if I change j * j in the code to 2 * 2, which I assumed would work exactly the same, it causes a StackOverflowError.
I'm obviously missing something fundamental here, and could really use someone explaining this to me like I was a five year old.
Any help would be greatly appreciated.
I'm not sure that seeking a procedural/imperative explanation is the best way to gain understanding here. Streams come from functional programming and they're best understood from that perspective. The key aspects of the definition you've given are:
It's lazy. Other than the first element in the stream, nothing is computed until you ask for it. If you never ask for the 5th prime, it will never be computed.
It's recursive. The list of prime numbers is defined in terms of itself.
It's infinite. Streams have the interesting property (because they're lazy) that they can represent a sequence with an infinite number of elements. Stream.from(3) is an example of this: it represents the list [3, 4, 5, ...].
Let's see if we can understand why your definition computes the sequence of prime numbers.
The definition starts out with 2 #:: .... This just says that the first number in the sequence is 2 - simple enough so far.
The next part defines the rest of the prime numbers. We can start with all the counting numbers starting at 3 (Stream.from(3)), but we obviously need to filter a bunch of these numbers out (i.e., all the composites). So let's consider each number i. If i is not a multiple of a lesser prime number, then i is prime. That is, i is prime if, for all primes k less than i, i % k > 0. In Scala, we could express this as
nums.filter(i => ps.takeWhile(k => k < i).forall(k => i % k > 0))
However, it isn't actually necessary to check all lesser prime numbers -- we really only need to check the prime numbers whose square is less than or equal to i (this is a fact from number theory*). So we could instead write
nums.filter(i => ps.takeWhile(k => k * k <= i).forall(k => i % k > 0))
So we've derived your definition.
Now, if you happened to try the first definition (with k < i), you would have found that it didn't work. Why not? It has to do with the fact that this is a recursive definition.
Suppose we're trying to decide what comes after 2 in the sequence. The definition tells us to first determine whether 3 belongs. To do so, we consider the list of primes up to the first one greater than or equal to 3 (takeWhile(k => k < i)). The first prime is 2, which is less than 3 -- so far so good. But we don't yet know the second prime, so we need to compute it. Fine, so we need to first see whether 3 belongs ... BOOM!
* It's pretty easy to see that if a number n is composite then the square of one of its factors must be less than or equal to n. If n is composite, then by definition n == a * b, where 1 < a <= b < n (we can guarantee a <= b just by labeling the two factors appropriately). From a <= b it follows that a^2 <= a * b, so it follows that a^2 <= n.
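For readers more at home in Python, here is a hedged imitation of the same trial-division idea. It is not a translation of the self-referential Stream: instead of reading from ps while ps is being built, it keeps an explicit list of the primes found so far. That list always already contains every prime p with p * p <= i, because such primes are smaller than i; this is the same fact that keeps the Scala definition from running ahead of itself. Because nothing is read before it has been appended, 2 does not need to be seeded by hand here.

```python
from itertools import count, islice, takewhile

def primes():
    """Yield primes by trial division, testing only primes p with p * p <= i."""
    found = []                 # primes produced so far (the role played by ps)
    for i in count(2):
        candidates = takewhile(lambda p: p * p <= i, found)
        if all(i % p > 0 for p in candidates):
            found.append(i)    # appended only after the test has finished
            yield i

print(list(islice(primes(), 10)))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```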
Your explanation is mostly correct; you made only two mistakes:
takeWhile doesn't include the last checked element:
scala> List(1,2,3).takeWhile(_<2)
res1: List[Int] = List(1)
You assume that ps always contains only a two and a three but because Stream is lazy it is possible to add new elements to it. In fact each time a new prime is found it is added to ps and in the next step takeWhile will consider this new added element. Here, it is important to remember that the tail of a Stream is computed only when it is needed, thus takeWhile can't see it before forall is evaluated to true.
Keep these two things in mind and you should come up with this:
ps = [2]
i = 3
takeWhile
2*2 <= 3 -> false
forall on []
-> true
ps = [2,3]
i = 4
takeWhile
2*2 <= 4 -> true
3*3 <= 4 -> false
forall on [2]
4%2 > 0 -> false
ps = [2,3]
i = 5
takeWhile
2*2 <= 5 -> true
3*3 <= 5 -> false
forall on [2]
5%2 > 0 -> true
ps = [2,3,5]
i = 6
...
While these steps describe the behavior of the code, they are not fully correct, because not only is adding elements to the Stream lazy, but so is every operation on it. This means that when you call xs.takeWhile(f), not all values up to the point where f is false are computed at once; they are computed when forall wants to see them (because forall is the only function here that needs to look at all elements before it can definitely return true; for false it can abort earlier). Here is the computation order when laziness is considered everywhere (looking only at 9 as an example):
ps = [2,3,5,7]
i = 9
takeWhile on 2
2*2 <= 9 -> true
forall on 2
9%2 > 0 -> true
takeWhile on 3
3*3 <= 9 -> true
forall on 3
9%3 > 0 -> false
ps = [2,3,5,7]
i = 10
...
Because forall is aborted when it evaluates to false, takeWhile doesn't calculate the remaining possible elements.
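The interleaving described above can be reproduced with Python's lazy itertools.takewhile and the short-circuiting built-in all(), which behave like takeWhile and forall here. A hedged sketch: the logging helpers are hypothetical, and the known primes are passed in as a plain list rather than built lazily.

```python
from itertools import takewhile

def trace_candidate(i, known_primes):
    """Test candidate i and record every takeWhile/forall step in order."""
    log = []

    def square_fits(p):                 # the takeWhile predicate
        ok = p * p <= i
        log.append(f"takeWhile {p}*{p} <= {i} -> {ok}")
        return ok

    def leaves_remainder(p):            # the forall predicate
        ok = i % p > 0
        log.append(f"forall {i}%{p} > 0 -> {ok}")
        return ok

    is_prime = all(leaves_remainder(p) for p in takewhile(square_fits, known_primes))
    return is_prime, log

is_prime, log = trace_candidate(9, [2, 3, 5, 7])
print("\n".join(log))   # the steps alternate; 5 and 7 are never examined
print(is_prime)         # False
```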
That code is easier (for me, at least) to read with some variables renamed suggestively, as
lazy val ps: Stream[Int] = 2 #:: Stream.from(3).filter(i =>
ps.takeWhile{p => p * p <= i}.forall{ p => i % p > 0});
This reads left-to-right quite naturally, as
primes are 2, and those numbers i from 3 up such that none of the primes p whose square does not exceed i divides i evenly (i.e. each leaves a non-zero remainder).
In a true recursive fashion, to understand this definition as defining the ever increasing stream of primes, we assume that it is so, and from that assumption we see that no contradiction arises, i.e. the truth of the definition holds.
The only potential problem after that is the timing of accessing the stream ps while it is being defined. As the first step, imagine we just have another stream of primes provided to us from somewhere, magically. Then, after seeing the truth of the definition, check that the timing of the access is okay, i.e. that we never try to access areas of ps before they are defined; that would make the definition stuck and unproductive.
I remember reading somewhere (don't recall where) something like the following -- a conversation between a student and a wizard,
student: which numbers are prime?
wizard: well, do you know what number is the first prime?
s: yes, it's 2.
w: okay (quickly writes down 2 on a piece of paper). And what about the next one?
s: well, the next candidate is 3. We need to check whether it is divisible by any prime whose square does not exceed it, but I don't yet know what the primes are!
w: don't worry, I'll give them to you. It's a magic I know; I'm a wizard after all.
s: okay, so what is the first prime number?
w: (glances over the piece of paper) 2.
s: great, so its square is already greater than 3... HEY, you've cheated! .....
Here's a pseudocode1 translation of your code, read partially right-to-left, with some variables again renamed for clarity (using p for "prime"):
ps = 2 : filter (\i-> all (\p->rem i p > 0) (takeWhile (\p->p^2 <= i) ps)) [3..]
which is also
ps = 2 : [i | i <- [3..], and [rem i p > 0 | p <- takeWhile (\p->p^2 <= i) ps]]
which is a bit more visually apparent, using list comprehensions. and checks that all entries in a list of Booleans are True (read | as "for", <- as "drawn from", , as "such that" and (\p-> ...) as "lambda of p").
So you see, ps is a lazy list of 2, and then of numbers i drawn from a stream [3,4,5,...] such that for all p drawn from ps such that p^2 <= i, it is true that i % p > 0. Which is actually an optimal trial division algorithm. :)
There's a subtlety here, of course: the list ps is open-ended. We use it as it is being "fleshed out" (which works because it is lazy). When primes are taken from ps, it could potentially happen that we run past its end, in which case we'd have a non-terminating calculation on our hands (a "black hole"). It just so happens :) (and this needs to be, and can be, proved mathematically) that this is impossible with the above definition. And 2 is put into ps unconditionally, so there's something in it to begin with.
But if we try to "simplify",
bad = 2 : [i | i <- [3..], and [rem i p > 0 | p <- takeWhile (\p->p < i) bad]]
it stops working after producing just one number, 2: when considering 3 as the candidate, takeWhile (\p->p < 3) bad demands the next number in bad after 2, but there aren't yet any more numbers there. It "jumps ahead of itself".
This is "fixed" with
bad = 2 : [i | i <- [3..], and [rem i p > 0 | p <- [2..(i-1)] ]]
but that is a much, much slower trial division algorithm, very far from the optimal one.
--
1 (Haskell actually, it's just easier for me that way :) )

Algorithm to convert integer (represented as an array) with base n to integer with base m

I have a very long integer, represented as an array of unsigned chars.
Example: the integer 1234 with base 10 is represented in the array as [4,3,2,1], [2,2,3,2] (base 8) and [2,13,4] (base 16)
Now I want to convert my integer in base n to an integer in base m. In my pursuit of an answer I came across Wallar's algorithm, originally from here.
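def baseExpansion(n, c, b):
    # n: digits, most significant first; c: source base; b: target base
    j = 0
    # Collapse all digits into one integer (in a fixed-width port, this is the step that overflows)
    base10 = sum([pow(c, len(n) - k - 1) * n[k] for k in range(0, len(n))])
    # Count how many base-b digits are needed (integer division keeps this exact)
    while base10 // pow(b, j) != 0: j = j + 1
    return [base10 // pow(b, j - p) % b for p in range(1, j + 1)]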
At first I thought this was my answer, but unfortunately it is not. The problem is that the algorithm computes the sum: in my case the variable base10 is a 32-bit unsigned integer, so when my integer, represented as an array, has more than 10 digits, it can no longer convert the number. Does anyone have a solution?
Here's the school-book algorithm for doing what you're trying. Start with a representation of zero (in the output base) and call it the running total. Then, for each digit of the number to be converted, starting with the most significant and going to the least significant, 1) multiply the running total by the base of the source number and 2) add the digit to the running total. Now all you need are algorithms for the multiplication and the addition (and you can actually do both at once). Here's how:
1) set a variable, call it "carry", to the current source digit;
2) for each digit of the running total, starting with the least significant and going to the most significant:
2a) set carry to that digit times the source base, plus carry;
2b) set that digit to carry mod the output base;
2c) set carry to carry divided by the output base (integer division);
3) while carry is not zero, append carry mod the output base as a new most significant digit, then divide carry by the output base.
And that should do it. There is an implementation of what you are trying to do somewhere here: http://www.cis.ksu.edu/~howell/calculator/comparison.html
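A minimal Python sketch of the school-book method, using the same little-endian digit lists as the question ([4,3,2,1] for 1234 in base 10). No big intermediate value is ever formed: the carry stays below the source base and each intermediate stays below src_base * dst_base, so with unsigned-char digits (bases up to 256) everything fits easily in a 32-bit unsigned integer. The function name convert_base is hypothetical.

```python
def convert_base(digits: list[int], src_base: int, dst_base: int) -> list[int]:
    """Convert a little-endian digit list from src_base to dst_base.

    For each source digit (most significant first), compute
    running_total = running_total * src_base + digit, one output digit at a time.
    """
    total: list[int] = []              # running total, little-endian, in dst_base
    for digit in reversed(digits):     # most significant source digit first
        carry = digit
        for k in range(len(total)):    # least significant output digit first
            carry += total[k] * src_base
            total[k] = carry % dst_base
            carry //= dst_base
        while carry:                   # the total grew: append new high digits
            total.append(carry % dst_base)
            carry //= dst_base
    return total or [0]                # represent zero as [0]

print(convert_base([4, 3, 2, 1], 10, 8))    # [2, 2, 3, 2]
print(convert_base([4, 3, 2, 1], 10, 16))   # [2, 13, 4]
```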

How to create a unique integer from 3 different integers (1 Oracle LONG, 1 Date field, 1 Short)

The thing is that the first number is already an Oracle LONG,
the second one is a date (SQL DATE, with no extra timestamp info), and the last one is a Short value in the range 1000-100'000.
How can I create a sort of hash value that will be unique for each combination, optimally?
I don't want string concatenation followed by conversion to long, because, for example:
Day Month
12 1 --> 121
1 12 --> 121
When you have a few numeric values and need to have a single "unique" (that is, statistically improbable duplicate) value out of them you can usually use a formula like:
h = (a*P1 + b)*P2 + c
where P1 and P2 are either well-chosen numbers (e.g. if you know b is always in the 0-31 range, you can use P1 = 32, and likewise choose P2 larger than the range of c) or, when you know nothing in particular about the allowable ranges of a, b, c, the best approach is to make P1 and P2 big prime numbers (they have the least chance of generating values that collide).
For an optimal solution the math is a bit more complex than that, but using prime numbers you can usually have a decent solution.
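When the ranges are known, choosing P1 just larger than the range of b and P2 just larger than the range of c turns the formula into a mixed-radix encoding, which is exactly unique and reversible rather than merely unlikely to collide. A hedged Python sketch for the question's three fields; the bounds (the date as a day count since 1970-01-01 below 100,000, the short value in 0 to 100,000) and the function names are assumptions for illustration. Python integers do not overflow; in a signed 64-bit long the result only fits while the first field stays below roughly 9.2 × 10^8.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)  # assumed earliest date
P1 = 100_000              # assumed: days since EPOCH are in 0 .. 99,999
P2 = 100_001              # assumed: the short field is in 0 .. 100,000

def pack(a: int, d: date, c: int) -> int:
    """h = (a*P1 + b)*P2 + c, where b is the date as a day count."""
    b = (d - EPOCH).days
    if not (a >= 0 and 0 <= b < P1 and 0 <= c < P2):
        raise ValueError("field outside the assumed range")
    return (a * P1 + b) * P2 + c

def unpack(h: int) -> tuple[int, date, int]:
    """Invert pack() with two divmod steps."""
    rest, c = divmod(h, P2)
    a, b = divmod(rest, P1)
    return a, EPOCH + timedelta(days=b), c

h = pack(123456789, date(2012, 1, 12), 4321)
print(h)
print(unpack(h))
```

Swapping day and month, the problem case from the question, now gives different keys, because the date enters as a single day count rather than as concatenated digits.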
For example, Java's implementation of .hashCode() for an array (or a String) is something like:
h = 0;
for (int i = 0; i < a.length; ++i)
h = h * 31 + a[i];
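The loop above is easy to emulate, which also makes its collisions easy to find. A hedged Python sketch of Java's String.hashCode (h = 31*h + c over the characters, wrapped to a signed 32-bit int), assuming characters in the Basic Multilingual Plane, where ord() matches Java's UTF-16 char value. For "Aa" and "BB" both sums come out as 65*31 + 97 = 66*31 + 66 = 2112: the first characters differ by 1 and the second by -31, which cancel.

```python
def java_string_hash(s: str) -> int:
    """Emulate Java's String.hashCode for BMP-only strings."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF         # keep 32 bits, like Java int overflow
    return h - (1 << 32) if h >= (1 << 31) else h   # reinterpret as a signed int

print(java_string_hash("Aa"), java_string_hash("BB"))   # 2112 2112
```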
Personally, though, I would have chosen a prime bigger than 31, since values inside a String can easily collide: a delta of 31 places can be quite common, e.g.:
"BB".hashCode() == "Aa".hashCode() == 2112
Your
12 1 --> 121
1 12 --> 121
problem is easily fixed by zero-padding your input numbers to the maximum width expected for each input field.
For example, if the first field can range from 0 to 10000 and the second field can range from 0 to 100, your example becomes:
00012 001 --> 00012001
00001 012 --> 00001012
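In code, the padding can be done with a format string, and it is equivalent to the arithmetic form first * 1000 + second (1000 being 10 to the width of the second field), so no string round trip is actually needed. A small Python sketch using the example widths (5 and 3 digits); the helper names are hypothetical.

```python
def padded_key(first: int, second: int) -> int:
    # Zero-pad to the widths from the example, concatenate, and convert back.
    return int(f"{first:05d}{second:03d}")

def arithmetic_key(first: int, second: int) -> int:
    # Same value without strings: shift the first field past the 3-digit second field.
    return first * 1000 + second

print(padded_key(12, 1), padded_key(1, 12))          # 12001 1012
print(arithmetic_key(12, 1), arithmetic_key(1, 12))  # 12001 1012
```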
In Python, you can use this:
#pip install pairing
import pairing as pf
n = [12,6,20,19]
print(n)
key = pf.pair(pf.pair(n[0],n[1]),
pf.pair(n[2], n[3]))
print(key)
m = [pf.depair(pf.depair(key)[0]),
pf.depair(pf.depair(key)[1])]
print(m)
Output is:
[12, 6, 20, 19]
477575
[(12, 6), (20, 19)]
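The printed key is consistent with the Cantor pairing function, pi(x, y) = (x + y)(x + y + 1)/2 + y. A dependency-free Python sketch of that function and its inverse reproduces the output above; whether the pairing package uses exactly this formula internally is an assumption, inferred only from the matching numbers. The function names are hypothetical.

```python
from math import isqrt

def cantor_pair(x: int, y: int) -> int:
    """Cantor pairing of two non-negative integers."""
    s = x + y
    return s * (s + 1) // 2 + y

def cantor_unpair(z: int) -> tuple[int, int]:
    """Inverse of cantor_pair."""
    w = (isqrt(8 * z + 1) - 1) // 2   # largest w with w(w+1)/2 <= z
    y = z - w * (w + 1) // 2
    return w - y, y

n = [12, 6, 20, 19]
key = cantor_pair(cantor_pair(n[0], n[1]), cantor_pair(n[2], n[3]))
print(key)                                          # 477575
left, right = cantor_unpair(key)
print([cantor_unpair(left), cantor_unpair(right)])  # [(12, 6), (20, 19)]
```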

Generate a hash sum for several integers

I am facing the problem of having several integers from which I have to generate a single value. For example:
Int 1: 14
Int 2: 4
Int 3: 8
Int 4: 4
Hash Sum: 43
I have some restrictions on the values: the maximum value that an attribute can have is 30, the sum of all of them is always 30, and the attributes are always positive.
The key is that I want to generate the same hash sum for similar integers, for example if I have the integers, 14, 4, 10, 2 then I want to generate the same hash sum, in the case above 43. But of course if the integers are very different (4, 4, 2, 20) then I should have a different hash sum. Also it needs to be fast.
Ideally I would like the output of the hash sum to be between 0 and 512, and it should be evenly distributed. With my restrictions I can have around 5K different possibilities, so what I would like is around 10 per bucket.
I am sure there are many algorithms that do this, but I could not find a way of googling for this. Can anyone please post an algorithm to do this?
Some more information
The whole thing with this is that those integers are attributes for a function. I want to store the values of the function in a table, but I do not have enough memory to store all the different options. That is why I want to generalize between similar attributes.
The reason why 10, 5, 15 is totally different from 5, 10, 15 is that, if you imagine this in 3D, they are two different points.
Some more information 2
Some answers try to solve the problem using hashing. But I do not think this is so complex. Thanks to one of the comments I have realized that this is a clustering algorithm problem. If we have only 3 attributes and we imagine the problem in 3d, what I just need is divide the space in blocks.
In fact this can be solved with rules of this type
if (att[0] < 5 && att[1] < 5 && att[2] < 5 && att[3] < 5)
Block = 21
if ( (5 < att[0] < 10) && (5 < att[1] < 10) && (5 < att[2] < 10) && (5 < att[3] < 10))
Block = 45
The problem is that I need a fast and general way to generate those ifs; I cannot write out all the possibilities.
The simple solution:
Convert the integers to strings separated by commas, and hash the resulting string using a common hashing algorithm (md5, sha, etc).
If you really want to roll-your-own, I would do something like:
Generate large prime P
Generate random numbers 0 < a[i] < P (for each dimension you have)
To generate hash, calculate: sum(a[i] * x[i]) mod P
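Both suggestions from this answer, sketched in Python. The prime 2**31 - 1 and the fixed random seed are arbitrary illustrative choices, and the function names are hypothetical. The comma separator matters: it keeps [1, 23] and [12, 3] apart. Note that neither function groups similar inputs together; like any ordinary hash, they scatter nearby inputs.

```python
import hashlib
import random

def md5_of_ints(xs: list[int]) -> str:
    """Join the integers with commas and hash the resulting string."""
    return hashlib.md5(",".join(map(str, xs)).encode("ascii")).hexdigest()

P = 2**31 - 1                                  # a large (Mersenne) prime
rng = random.Random(42)                        # fixed seed: reproducible coefficients
A = [rng.randrange(1, P) for _ in range(4)]    # one coefficient 0 < a[i] < P per dimension

def linear_hash(xs: list[int]) -> int:
    """sum(a[i] * x[i]) mod P."""
    return sum(a * x for a, x in zip(A, xs)) % P

print(md5_of_ints([14, 4, 8, 4]))
print(linear_hash([14, 4, 8, 4]))
```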
Given the inputs a, b, c, and d, each ranging in value from 0 to 30 (5 bits), the following will produce a number in the range of 0 to 255 (8 bits).
bucket = ((a & 0x18) << 3) | ((b & 0x18) << 1) | ((c & 0x18) >> 1) | ((d & 0x18) >> 3)
Whether the general approach is appropriate depends on how the question is interpreted. The 3 least significant bits are dropped, grouping 0-7 in the same set, 8-15 in the next, and so forth.
0-7,0-7,0-7,0-7 -> bucket 0
0-7,0-7,0-7,8-15 -> bucket 1
0-7,0-7,0-7,16-23 -> bucket 2
...
24-30,24-30,24-30,24-30 -> bucket 255
Trivially tested with:
#include <stdio.h>

int main(void)
{
    for (int a = 0; a <= 30; a++)
        for (int b = 0; b <= 30; b++)
            for (int c = 0; c <= 30; c++)
                for (int d = 0; d <= 30; d++) {
                    int bucket = ((a & 0x18) << 3) |
                                 ((b & 0x18) << 1) |
                                 ((c & 0x18) >> 1) |
                                 ((d & 0x18) >> 3);
                    printf("%d, %d, %d, %d -> %d\n",
                           a, b, c, d, bucket);
                }
    return 0;
}
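A Python port of the same bucket formula, applied to the examples from the question. It shows the intended effect: 14, 4, 8, 4 and 14, 4, 10, 2 share a bucket, while 4, 4, 2, 20 lands elsewhere. Values that straddle a multiple of 8, such as 7 and 8, still fall into different buckets.

```python
def bucket(a: int, b: int, c: int, d: int) -> int:
    """Keep bits 3-4 of each 5-bit input (mask 0x18) and pack them into 8 bits."""
    return (((a & 0x18) << 3) |
            ((b & 0x18) << 1) |
            ((c & 0x18) >> 1) |
            ((d & 0x18) >> 3))

print(bucket(14, 4, 8, 4))    # 68
print(bucket(14, 4, 10, 2))   # 68
print(bucket(4, 4, 2, 20))    # 2
```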
You want a hash function that depends on the order of inputs and where similar sets of numbers will generate the same hash? That is, you want 50 5 5 10 and 5 5 10 50 to generate different values, but you want 52 7 4 12 to generate the same hash as 50 5 5 10? A simple way to do something like this is:
long hash = 13;
for (int i = 0; i < array.length; i++) {
hash = hash * 37 + array[i] / 5;
}
This is imperfect, but should give you an idea of one way to implement what you want. It will treat the values 50 - 54 as the same value, but it will treat 49 and 50 as different values.
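A Python port of the hash above, assuming non-negative inputs (Python's // floors, which matches Java's truncating / only for non-negative values). It also illustrates the imperfection: 50 5 5 10 and 54 9 5 14 collide, but 52 7 4 12 does not, because 4 and 5 sit on opposite sides of a multiple of 5. The function name is hypothetical.

```python
def coarse_hash(values: list[int]) -> int:
    h = 13
    for v in values:
        h = h * 37 + v // 5   # values in the same block of five contribute equally
    return h

print(coarse_hash([50, 5, 5, 10]) == coarse_hash([54, 9, 5, 14]))  # True
print(coarse_hash([50, 5, 5, 10]) == coarse_hash([52, 7, 4, 12]))  # False
print(coarse_hash([5, 10, 20]) == coarse_hash([20, 10, 5]))        # False: order matters
```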
If you want the hash to be independent of the order of the inputs (so the hash of 5 10 20 and 20 10 5 are the same) then one way to do this is to sort the array of integers into ascending order before applying the hash. Another way would be to replace
hash = hash * 37 + array[i] / 5;
with
hash += array[i] / 5;
EDIT: Taking into account your comments in response to this answer, it sounds like my attempt above may serve your needs well enough. It won't be ideal, nor perfect. If you need high performance you have some research and experimentation to do.
To summarize, order is important, so 5 10 20 differs from 20 10 5. Also, you would ideally store each "vector" separately in your hash table, but to handle space limitations you want to store some groups of values in one table entry.
An ideal hash function would return a number evenly spread across the possible values based on your table size. Doing this right depends on the expected size of your table and on the number of and expected maximum value of the input vector values. If you can have negative values as "coordinate" values then this may affect how you compute your hash. If, given your range of input values and the hash function chosen, your maximum hash value is less than your hash table size, then you need to change the hash function to generate a larger hash value.
You might want to try using vectors to describe each number set as the hash value.
EDIT:
Since you're not describing why you don't want to run the function itself, I'm guessing it's long-running. You also haven't described the breadth of the argument set, so there are two cases:
If every value is expected, then a full lookup table in a database might be faster.
If you're expecting repeated calls with the same arguments and little overall variation, then you could look at memoizing, so that only the first run for an argument set is expensive and each additional request is fast, with less memory usage.
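In Python, memoization is available through the standard functools.lru_cache decorator. A hedged sketch: expensive_function is a hypothetical stand-in for the asker's function, and maxsize bounds how many argument sets the cache may hold in memory.

```python
from functools import lru_cache

calls = 0  # counts how often the real computation runs

@lru_cache(maxsize=4096)   # keep at most 4096 argument tuples in memory
def expensive_function(a: int, b: int, c: int, d: int) -> int:
    """Hypothetical stand-in for a slow function of four integer attributes."""
    global calls
    calls += 1
    return a * b + c * d   # placeholder computation

expensive_function(14, 4, 8, 4)
expensive_function(14, 4, 8, 4)        # served from the cache, no recomputation
print(calls)                           # 1
print(expensive_function.cache_info())
```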
You would need to define what you mean by "similar". Hashes are generally designed to create unique results from unique input.
One approach would be to normalize your input and then generate a hash from the results.
Generating the same hash sum is called a collision, and is a bad thing for a hash to have. It makes it less useful.
If you want similar values to give the same output, you can divide the input by however close you want them to count. If the order makes a difference, use a different divisor for each number. The following function does what you describe:
int SqueezedSum( int a, int b, int c, int d )
{
return (a/11) + (b/7) + (c/5) + (d/3);
}
This is not a hash, but does what you describe.
You want to look into geometric hashing. In "standard" hashing you want
1. a short key
2. inverse (preimage) resistance
3. collision resistance
With geometric hashing you substitute number 3 with something that is almost the opposite: namely, close initial values give close hash values.
Another way to view my problem is through multidimensional scaling (MDS). In MDS we start with a matrix of (dis)similarities between items, and what we want is to assign each item a location in an N-dimensional space, reducing the number of dimensions in this way.
http://en.wikipedia.org/wiki/Multidimensional_scaling