how to create unique integer number from 3 different integers numbers(1 Oracle Long, 1 Date Field, 1 Short) - hash

the thing is that, the 1st number is already ORACLE LONG,
second one a Date (SQL DATE, no timestamp info extra), the last one being a Short value in the range 1000-100'000.
how can I create sort of hash value that will be unique for each combination optimally?
string concatenation and converting to long later:
I don't want this, for example.
Day Month
12 1 --> 121
1 12 --> 121

When you have a few numeric values and need to have a single "unique" (that is, statistically improbable duplicate) value out of them you can usually use a formula like:
h = (a*P1 + b)*P2 + c
where P1 and P2 are either well-chosen numbers (e.g. if you know 'a' is always in the 1-31 range, you can use P1=32) or, when you know nothing particular about the allowable ranges of a,b,c best approach is to have P1 and P2 as big prime numbers (they have the least chance to generate values that collide).
For an optimal solution the math is a bit more complex than that, but using prime numbers you can usually have a decent solution.
For example, Java implementation for .hashCode() for an array (or a String) is something like:
h = 0;
for (int i = 0; i < a.length; ++i)
h = h * 31 + a[i];
Even though personally, I would have chosen a prime bigger than 31 as values inside a String can easily collide, since a delta of 31 places can be quite common, e.g.:
"BB".hashCode() == "Aa".hashCode() == 2122

Your
12 1 --> 121
1 12 --> 121
problem is easily fixed by zero-padding your input numbers to the maximum width expected for each input field.
For example, if the first field can range from 0 to 10000 and the second field can range from 0 to 100, your example becomes:
00012 001 --> 00012001
00001 012 --> 00001012

In python, you can use this:
#pip install pairing
import pairing as pf
n = [12,6,20,19]
print(n)
key = pf.pair(pf.pair(n[0],n[1]),
pf.pair(n[2], n[3]))
print(key)
m = [pf.depair(pf.depair(key)[0]),
pf.depair(pf.depair(key)[1])]
print(m)
Output is:
[12, 6, 20, 19]
477575
[(12, 6), (20, 19)]

Related

Hashing functions and Universal Hashing Family

I need to determine whether the following Hash Functions Set is universal or not:
Let U be the set of the keys - {000, 001, 002, 003, ... ,999} - all the numbers between 0 and 999 padded with 0 in the beginning where needed. Let n = 10 and 1 < a < 9 ,an integer between 1 and 9. We denote by ha(x) the rightmost digit of the number a*x.
For example, h2(123) = 6, because, 2 * 123 = 246.
We also denote H = {h1, h2, h3, ... ,h9} as our set of hash functions.
Is H is universal? prove.
I know I need to calculate the probability for collision of 2 different keys and check if it's smaller or equal to 1/n (which is 1/10), so I tried to separate into cases - if a is odd or even, because when a is even the last digit of a*x will be 0/2/4/6/8, else it could be anything. But it didn't help me so much as I'm stuck on it.
Would be very glad for some help here.

Using SUM and UNIQUE to count occurrences of value within subset of a matrix

So, presume a matrix like so:
20 2
20 2
30 2
30 1
40 1
40 1
I want to count the number of times 1 occurs for each unique value of column 1. I could do this the long way by [sum(x(1:2,2)==1)] for each value, but I think this would be the perfect use for the UNIQUE function. How could I fix it so that I could get an output like this:
20 0
30 1
40 2
Sorry if the solution seems obvious, my grasp of loops is very poor.
Indeed unique is a good option:
u=unique(x(:,1))
res=arrayfun(#(y)length(x(x(:,1)==y & x(:,2)==1)),u)
Taking apart that last line:
arrayfun(fun,array) applies fun to each element in the array, and puts it in a new array, which it returns.
This function is the function #(y)length(x(x(:,1)==y & x(:,2)==1)) which finds the length of the portion of x where the condition x(:,1)==y & x(:,2)==1) holds (called logical indexing). So for each of the unique elements, it finds the row in X where the first is the unique element, and the second is one.
Try this (as specified in this answer):
>>> [c,~,d] = unique(a(a(:,2)==1))
c =
30
40
d =
1
3
>>> counts = accumarray(d(:),1,[],#sum)
counts =
1
2
>>> res = [c,counts]
Consider you have an array of various integers in 'array'
the tabulate function will sort the unique values and count the occurances.
table = tabulate(array)
look for your unique counts in col 2 of table.

Algorithm to convert integer (represented as an array) with base n to integer with base m

I have a, very long, integer. The integer is represented by a array of unsigned chars.
Example: the integer 1234 with base 10 is represented in the array as [4,3,2,1], [2,2,3,2] (base 8) and [2,13,4] (base 16)
Now I want to convert my integer with base n to another integer with base m. In my persued for a answer I came accross Wallar's algorithm, originally from here.
from math import *
def baseExpansion(n,c,b):
j = 0
base10 = sum([pow(c,len(n)-k-1)*n[k] for k in range(0,len(n))])
while floor(base10/pow(b,j)) != 0: j = j+1
return [floor(base10/pow(b,j-p)) % b for p in range(1,j+1)]
At first I thought this was my answer but unfortunately it is not. The problem I have is that the algorithm computes the sum. In my case this is a problem because the variable base10 is of type unsigned integer of 32 bits. Therefore when my integer, represented as a array, has more then 10 digits it can not convert the number anymore. Anyone has a solution?
Here's the school-book algorithm for doing what you're trying. You start with a representation for zero and call it a running total. Then, for each digit of the number to be converted, starting with the most significant and going to the least significant, 1) multiply the running total by the base of the source number and 2) add the digit to the running total. Now all you need is algorithms to do the multiplication and addition (and you can actually do both at once). Here's how to do that: 1) set the current digit to a variable, call it "carry", 2) for each digit in your new number, starting with the least significant and going to the most significant: 2a) set carry to the current digit in the new number times the output base plus carry, 2b) set the current digit to carry mod the output base, 2c) set carry to carry divided by the output base. And that should do it. There is an implementation of what you are trying to do somewhere here: http://www.cis.ksu.edu/~howell/calculator/comparison.html

binary to decimal in objective-c

I want to convert the decimal number 27 into binary such a way that , first the digit 2 is converted and its binary value is placed in an array and then the digit 7 is converted and its binary number is placed in that array. what should I do?
thanks in advance
That's called binary-coded decimal. It's easiest to work right-to-left. Take the value modulo 10 (% operator in C/C++/ObjC) and put it in the array. Then integer-divide the value by 10 (/ operator in C/C++/ObjC). Continue until your value is zero. Then reverse the array if you need most-significant digit first.
If I understand your question correctly, you want to go from 27 to an array that looks like {0010, 0111}.
If you understand how base systems work (specifically the decimal system), this should be simple.
First, you find the remainder of your number when divided by 10. Your number 27 in this case would result with 7.
Then you integer divide your number by 10 and store it back in that variable. Your number 27 would result in 2.
How many times do you do this?
You do this until you have no more digits.
How many digits can you have?
Well, if you think about the number 100, it has 3 digits because the number needs to remember that one 10^2 exists in the number. On the other hand, 99 does not.
The answer to the previous question is 1 + floor of Log base 10 of the input number.
Log of 100 is 2, plus 1 is 3, which equals number of digits.
Log of 99 is a little less than 2, but flooring it is 1, plus 1 is 2.
In java it is like this:
int input = 27;
int number = 0;
int numDigits = Math.floor(Log(10, input)) + 1;
int[] digitArray = new int [numDigits];
for (int i = 0; i < numDigits; i++) {
number = input % 10;
digitArray[numDigits - i - 1] = number;
input = input / 10;
}
return digitArray;
Java doesn't have a Log function that is portable for any base (it has it for base e), but it is trivial to make a function for it.
double Log( double base, double value ) {
return Math.log(value)/Math.log(base);
}
Good luck.

Generate a hash sum for several integers

I am facing the problem of having several integers, and I have to generate one using them. For example.
Int 1: 14
Int 2: 4
Int 3: 8
Int 4: 4
Hash Sum: 43
I have some restriction in the values, the maximum value that and attribute can have is 30, the addition of all of them is always 30. And the attributes are always positive.
The key is that I want to generate the same hash sum for similar integers, for example if I have the integers, 14, 4, 10, 2 then I want to generate the same hash sum, in the case above 43. But of course if the integers are very different (4, 4, 2, 20) then I should have a different hash sum. Also it needs to be fast.
Ideally I would like that the output of the hash sum is between 0 and 512, and it should evenly distributed. With my restrictions I can have around 5K different possibilities, so what I would like to have is around 10 per bucket.
I am sure there are many algorithms that do this, but I could not find a way of googling this thing. Can anyone please post an algorithm to do this?.
Some more information
The whole thing with this is that those integers are attributes for a function. I want to store the values of the function in a table, but I do not have enough memory to store all the different options. That is why I want to generalize between similar attributes.
The reason why 10, 5, 15 are totally different from 5, 10, 15, it is because if you imagine this in 3d then both points are a totally different point
Some more information 2
Some answers try to solve the problem using hashing. But I do not think this is so complex. Thanks to one of the comments I have realized that this is a clustering algorithm problem. If we have only 3 attributes and we imagine the problem in 3d, what I just need is divide the space in blocks.
In fact this can be solved with rules of this type
if (att[0] < 5 && att[1] < 5 && att[2] < 5 && att[3] < 5)
Block = 21
if ( (5 < att[0] < 10) && (5 < att[1] < 10) && (5 < att[2] < 10) && (5 < att[3] < 10))
Block = 45
The problem is that I need a fast and a general way to generate those ifs I cannot write all the possibilities.
The simple solution:
Convert the integers to strings separated by commas, and hash the resulting string using a common hashing algorithm (md5, sha, etc).
If you really want to roll-your-own, I would do something like:
Generate large prime P
Generate random numbers 0 < a[i] < P (for each dimension you have)
To generate hash, calculate: sum(a[i] * x[i]) mod P
Given the inputs a, b, c, and d, each ranging in value from 0 to 30 (5 bits), the following will produce an number in the range of 0 to 255 (8 bits).
bucket = ((a & 0x18) << 3) | ((b & 0x18) << 1) | ((c & 0x18) >> 1) | ((d & 0x18) >> 3)
Whether the general approach is appropriate depends on how the question is interpreted. The 3 least significant bits are dropped, grouping 0-7 in the same set, 8-15 in the next, and so forth.
0-7,0-7,0-7,0-7 -> bucket 0
0-7,0-7,0-7,8-15 -> bucket 1
0-7,0-7,0-7,16-23 -> bucket 2
...
24-30,24-30,24-30,24-30 -> bucket 255
Trivially tested with:
for (int a = 0; a <= 30; a++)
for (int b = 0; b <= 30; b++)
for (int c = 0; c <= 30; c++)
for (int d = 0; d <= 30; d++) {
int bucket = ((a & 0x18) << 3) |
((b & 0x18) << 1) |
((c & 0x18) >> 1) |
((d & 0x18) >> 3);
printf("%d, %d, %d, %d -> %d\n",
a, b, c, d, bucket);
}
You want a hash function that depends on the order of inputs and where similar sets of numbers will generate the same hash? That is, you want 50 5 5 10 and 5 5 10 50 to generate different values, but you want 52 7 4 12 to generate the same hash as 50 5 5 10? A simple way to do something like this is:
long hash = 13;
for (int i = 0; i < array.length; i++) {
hash = hash * 37 + array[i] / 5;
}
This is imperfect, but should give you an idea of one way to implement what you want. It will treat the values 50 - 54 as the same value, but it will treat 49 and 50 as different values.
If you want the hash to be independent of the order of the inputs (so the hash of 5 10 20 and 20 10 5 are the same) then one way to do this is to sort the array of integers into ascending order before applying the hash. Another way would be to replace
hash = hash * 37 + array[i] / 5;
with
hash += array[i] / 5;
EDIT: Taking into account your comments in response to this answer, it sounds like my attempt above may serve your needs well enough. It won't be ideal, nor perfect. If you need high performance you have some research and experimentation to do.
To summarize, order is important, so 5 10 20 differs from 20 10 5. Also, you would ideally store each "vector" separately in your hash table, but to handle space limitations you want to store some groups of values in one table entry.
An ideal hash function would return a number evenly spread across the possible values based on your table size. Doing this right depends on the expected size of your table and on the number of and expected maximum value of the input vector values. If you can have negative values as "coordinate" values then this may affect how you compute your hash. If, given your range of input values and the hash function chosen, your maximum hash value is less than your hash table size, then you need to change the hash function to generate a larger hash value.
You might want to try using vectors to describe each number set as the hash value.
EDIT:
Since you're not describing why you want to not run the function itself, I'm guessing it's long running. Since you haven't described the breadth of the argument set.
If every value is expected then a full lookup table in a database might be faster.
If you're expecting repeated calls with the same arguments and little overall variation, then you could look at memoizing so only the first run for a argument set is expensive, and each additional request is fast, with less memory usage.
You would need to define what you mean by "similar". Hashes are generally designed to create unique results from unique input.
One approach would be to normalize your input and then generate a hash from the results.
Generating the same hash sum is called a collision, and is a bad thing for a hash to have. It makes it less useful.
If you want similar values to give the same output, you can divide the input by however close you want them to count. If the order makes a difference, use a different divisor for each number. The following function does what you describe:
int SqueezedSum( int a, int b, int c, int d )
{
return (a/11) + (b/7) + (c/5) + (d/3);
}
This is not a hash, but does what you describe.
You want to look into geometric hashing. In "standard" hashing you want
a short key
inverse resistance
collision resistance
With geometric hashing you susbtitute number 3 with something whihch is almost opposite; namely close initial values give close hash values.
Another way to view my problem is using the multidimesional scaling (MS). In MS we start with a matrix of items and what we want is assign a location of each item to an N dimensional space. Reducing in this way the number of dimensions.
http://en.wikipedia.org/wiki/Multidimensional_scaling