I'm trying to write my own hash function that produces a 30-bit hash.
Here is some code for an FNV-1a 32-bit hash.
func fnva32(data string) uint32 {
	var hash uint32 = 2166136261
	for _, c := range data {
		hash ^= uint32(c)
		hash *= 16777619
	}
	return hash
}
Now here is my code that converts lowercase letters a-z into a 30-bit hash:
func id(s string) uint {
	var id uint
	var power uint = 1
	for _, c := range s {
		id += (uint(c) - 96) * power
		power *= 26
	}
	return id % 1073741824
}
That specifically limits my hash function to a maximum of 30 bits because I take the result modulo that number (2^30). But how is the FNV-1a 32-bit hash limited to 32 bits? It does not use a modulus. How does it not generate a number larger than that?
Also, you have probably noticed that I'm not using prime numbers. I tried some primes, but they increased the collisions. Currently I'm getting 291 collisions and FNV-1a is getting 76 collisions when hashing 600,000 (real) words.
My question is: what limits fnva32 to 32 bits, and how would I change it to produce 30 bits instead?
The return type of the fnva32 function is uint32, so it cannot return a value with more bits. The calculation also uses a uint32 variable internally, so every multiplication that overflows silently wraps around modulo 2^32.
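As for getting 30 bits out of it, one common approach is the XOR-folding described on the FNV page (http://www.isthe.com/chongo/tech/comp/fnv/index.html#xor-fold). The following is only a sketch under that assumption; the names fnv1a32, fnv1a30 and mask30 are made up for illustration, and it hashes bytes rather than runes.

```go
package main

import "fmt"

// fnv1a32 is a 32-bit FNV-1a hash over the bytes of s.
// The uint32 arithmetic wraps modulo 2^32, so no explicit modulus is needed.
func fnv1a32(s string) uint32 {
	hash := uint32(2166136261) // 32-bit offset basis
	for i := 0; i < len(s); i++ {
		hash ^= uint32(s[i])
		hash *= 16777619 // 32-bit FNV prime
	}
	return hash
}

// fnv1a30 (hypothetical name) XOR-folds the 32-bit hash down to 30 bits:
// the top 2 bits are XORed into the low 30 bits.
func fnv1a30(s string) uint32 {
	const mask30 = 1<<30 - 1
	h := fnv1a32(s)
	return (h >> 30) ^ (h & mask30)
}

func main() {
	for _, w := range []string{"hello", "world"} {
		fmt.Printf("%-6s 32-bit=%#08x 30-bit=%#08x\n", w, fnv1a32(w), fnv1a30(w))
	}
}
```

Taking fnv1a32(s) % (1 << 30) would also stay within 30 bits; folding just mixes the two discarded bits back in instead of dropping them.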
The standard library has an FNV-1 hash implementation (https://golang.org/pkg/hash/fnv/) that returns a uint64 value (range: 0 through 18446744073709551615).
I need to store this value in a PostgreSQL bigserial column, but its range is 1 to 9223372036854775807.
Is it possible to change the hash size to, e.g., 56 bits? http://www.isthe.com/chongo/tech/comp/fnv/index.html#xor-fold
Can someone help me change the standard algorithm to produce 56-bit hashes?
https://golang.org/src/hash/fnv/fnv.go
Update
Did it myself using this doc http://www.isthe.com/chongo/tech/comp/fnv/index.html#xor-fold
package main

import (
	"fmt"
	"hash/fnv"
)

func main() {
	const MASK uint64 = 1<<63 - 1
	h := fnv.New64()
	h.Write([]byte("1133"))
	hash := h.Sum64()
	fmt.Printf("%#x\n", MASK)
	fmt.Println(hash)
	hash = (hash >> 63) ^ (hash & MASK)
	fmt.Println(hash)
}
http://play.golang.org/p/j7q3D73qqu
Is it correct?
Yes, it's a correct XOR-folding to 63 bits. But there's a much easier way:
hash = hash % 9223372036854775808
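The distribution properties of XOR-folding are not immediately obvious, though they are probably proven somewhere. Modulo, however, is clearly a wrapping of the hash algorithm's distribution onto a smaller codomain; with a power-of-two modulus like this one, it is the same as keeping the low 63 bits (hash & MASK).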
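To make the two reductions concrete, here is a small sketch that puts them side by side. The helper names fold63, mod63 and fnv64 are invented for this example; only fnv.New64 and Sum64 come from the standard library.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const mask63 uint64 = 1<<63 - 1

// fold63 XOR-folds the top bit into the low 63 bits.
func fold63(h uint64) uint64 { return (h >> 63) ^ (h & mask63) }

// mod63 reduces modulo 2^63, which simply drops the top bit.
func mod63(h uint64) uint64 { return h % (1 << 63) }

// fnv64 hashes s with the standard library's 64-bit FNV-1.
func fnv64(s string) uint64 {
	h := fnv.New64()
	h.Write([]byte(s))
	return h.Sum64()
}

func main() {
	h := fnv64("1133")
	fmt.Println(h, fold63(h), mod63(h))
}
```

Both results fit in 63 bits, but either one can still be 0, so if the column really must hold values starting at 1, that case needs separate handling.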
I am a complete beginner with the D language.
How do I get a hash of a string as a uint (unsigned 32-bit integer) in the D language?
I need a quick and dirty hash code (I don't care much about the "randomness" or the "lack of collision", I care slightly more about performance).
import std.digest.crc;
uint string_hash(string s) {
    return crc32Of(s);
}
is not good...
(using gdc-5 on Linux/x86-64 with phobos-2)
While Adam's answer does exactly what you're looking for, you can also use a union to do the casting.
This is a pretty useful trick, so I may as well put it here:
/**
 * Returns a crc32Of hash of a string
 * Uses a union to store the ubyte[]
 * And then simply reads that memory as a uint
 */
uint string_hash(string s){
    import std.digest.crc;
    union hashUnion{
        ubyte[4] hashArray;
        uint hashNumber;
    }
    hashUnion x;
    x.hashArray = crc32Of(s); // stores the result of crc32Of into the array.
    return x.hashNumber;      // reads the exact same memory as the hashArray
                              // but reads it as a uint.
}
A really quick thing could just be this:
uint string_hash(string s) {
    import std.digest.crc;
    auto r = crc32Of(s);
    return *(cast(uint*) r.ptr);
}
Since crc32Of returns a ubyte[4] instead of the uint you want, a conversion is necessary, but since ubyte[4] and uint are the same thing to the machine, we can just do a reinterpret cast with the pointer trick seen there to convert types for free at runtime.
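For comparison, the same reinterpretation can be written without pointer or union tricks by assembling the four bytes explicitly. This is a sketch in Go rather than D, and bytesToUint32 is a made-up helper name. It assumes little-endian byte order, which is what the pointer cast gives on x86-64; on a big-endian machine the cast would read the bytes the other way round.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// bytesToUint32 (hypothetical helper) reads 4 bytes as a little-endian
// unsigned 32-bit integer, independent of the host's byte order.
func bytesToUint32(b [4]byte) uint32 {
	return binary.LittleEndian.Uint32(b[:])
}

func main() {
	digest := [4]byte{0x78, 0x56, 0x34, 0x12} // example bytes, not a real CRC
	fmt.Printf("%#x\n", bytesToUint32(digest))
}
```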
I'm implementing a GitHub push hook listener in Dart, and I've come across this document: https://developer.github.com/webhooks/securing/
where it's written:
Using a plain == operator is not advised. A method like secure_compare
performs a “constant time” string comparison, which renders it safe
from certain timing attacks against regular equality operators.
I have to compare two hashes for equality. Is there a way to compare strings in constant time in Dart? (Read: is there a constant-time string comparison function in Dart?)
The default implementation is not constant time, but you can just create your own comparison function that compares every code unit in the String and does not short circuit:
bool secureCompare(String a, String b) {
    if(a.codeUnits.length != b.codeUnits.length)
        return false;
    var r = 0;
    for(int i = 0; i < a.codeUnits.length; i++) {
        r |= a.codeUnitAt(i) ^ b.codeUnitAt(i);
    }
    return r == 0;
}
This function will perform a constant time String compare as long as the two input Strings are of the same length. Since you are comparing hashes this shouldn't be a problem, but for variable length Strings this method will still leak timing info because it immediately returns if the lengths are not equal.
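The same accumulate-the-XOR idea carries over to other languages. As an illustration only, here is a Go sketch of the technique over bytes instead of code units; the function mirrors the Dart one above and is not a library API.

```go
package main

import "fmt"

// secureCompare reports whether a and b are equal. For inputs of equal
// length it always touches every byte, so the loop does not exit early
// at the first mismatch.
func secureCompare(a, b string) bool {
	if len(a) != len(b) {
		return false // leaks only the fact that the lengths differ
	}
	var r byte
	for i := 0; i < len(a); i++ {
		r |= a[i] ^ b[i] // any differing bit sticks in r
	}
	return r == 0
}

func main() {
	fmt.Println(secureCompare("3f2a", "3f2a"), secureCompare("3f2a", "3f2b"))
}
```

In Go specifically, the standard library already offers crypto/subtle.ConstantTimeCompare and hmac.Equal, which would normally be preferred over a hand-written loop.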
Can anyone give an example of two strings, consisting of alphabetical characters only, that produce the same hash value with ELFHash?
I need these to test my code, but they don't seem easy to produce. To my surprise, there are a lot of code examples for various hash functions on the internet, but none of them provide examples of colliding strings.
Below is the ELF hash, in case you need it.
unsigned int ELFHash(const std::string& str)
{
    unsigned int hash = 0;
    unsigned int x = 0;
    for(std::size_t i = 0; i < str.length(); i++)
    {
        hash = (hash << 4) + str[i];
        if((x = hash & 0xF0000000L) != 0)
        {
            hash ^= (x >> 24);
            hash &= ~x;
        }
    }
    return (hash & 0x7FFFFFFF);
}
You can find collisions by brute force (e.g. compute the hash of every possible string up to length 5 and look for repeated values).
Some examples of collisions that I found that way:
hash = 23114:
-------------
UMz
SpJ
hash = 4543841:
---------------
AAAAQ
AAABA
hash = 5301994:
---------------
KYQYZ
KYQZJ
KYRIZ
KYRJJ
KZAYZ
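These collisions can be checked with a direct port of the function. The sketch below translates the C++ code above into Go (elfHash is simply a renamed port) and prints the hashes for the first three groups.

```go
package main

import "fmt"

// elfHash is a Go port of the ELFHash function shown in the question.
func elfHash(s string) uint32 {
	var hash uint32
	for i := 0; i < len(s); i++ {
		hash = (hash << 4) + uint32(s[i])
		if x := hash & 0xF0000000; x != 0 {
			hash ^= x >> 24
			hash &^= x // clear the top nibble
		}
	}
	return hash & 0x7FFFFFFF
}

func main() {
	for _, w := range []string{"UMz", "SpJ", "AAAAQ", "AAABA", "KYQYZ", "KYQZJ"} {
		fmt.Printf("%-5s %d\n", w, elfHash(w))
	}
}
```

For strings this short the top nibble never fills up, so the hash is just a base-16 weighted sum of the character codes, which is why swapping "AQ" for "BA" (65*16+81 = 66*16+65) leaves it unchanged.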
I have an HTTP connector in my iPhone project, and queries must include a parameter derived from the username using the Fowler–Noll–Vo (FNV) hash.
I have a working Java implementation at the moment; this is the code:
long fnv_prime = 0x811C9DC5;
long hash = 0;
for(int i = 0; i < str.length(); i++)
{
    hash *= fnv_prime;
    hash ^= str.charAt(i);
}
Now on the iPhone side, I did this:
int64_t fnv_prime = 0x811C9DC5;
int64_t hash = 0;
for (int i=0; i < [myString length]; i++)
{
    hash *= fnv_prime;
    hash ^= [myString characterAtIndex:i];
}
This code doesn't give me the same result as the Java one.
In the first loop iterations, I get this:
hash = 0
hash = 100 (first letter is "d")
hash = 1865261300 (for hash = 100 and fnv_prime = -2128831035 like in Java)
Does anyone see something I'm missing?
Thanks in advance for the help!
In Java, this line:
long fnv_prime = 0x811C9DC5;
gives fnv_prime the numerical value -2128831035, because the constant is interpreted as an int, which is a 32-bit signed value in Java. That value is then sign-extended when it is widened to a long.
Conversely, in the Objective-C code:
int64_t fnv_prime = 0x811C9DC5;
the 0x811C9DC5 is interpreted as an unsigned int constant (because it does not fit in a signed 32-bit int), with numerical value 2166136261. That value is then written into fnv_prime, and there is no sign to extend since, as far as the C compiler is concerned, the value is positive.
Thus you end up with distinct values for fnv_prime, which explains your distinct results.
This can be corrected in Java by adding a "L" suffix, like this:
long fnv_prime = 0x811C9DC5L;
which forces the Java compiler to interpret the constant as a long, with the same numerical value as the one you get with the Objective-C code.
Incidentally, 0x811C9DC5 is not an FNV prime (it is not even prime); it is the 32-bit FNV "offset basis". You will get incorrect hash values if you use this value (and more hash collisions). The correct value for the 32-bit FNV prime is 0x1000193. See http://www.isthe.com/chongo/tech/comp/fnv/index.html
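The two widening behaviours described above can be reproduced in isolation. The sketch below uses Go only as neutral ground; widenSigned and widenUnsigned are hypothetical helpers that mimic what Java and the C compiler, respectively, do with the same 32 bits.

```go
package main

import "fmt"

// widenSigned treats the 32 bits as a signed int first (as Java does for
// the literal without the L suffix), so widening sign-extends.
func widenSigned(bits uint32) int64 { return int64(int32(bits)) }

// widenUnsigned treats the 32 bits as unsigned (as the C compiler does for
// this constant), so widening zero-extends.
func widenUnsigned(bits uint32) int64 { return int64(bits) }

func main() {
	const c = 0x811C9DC5
	fmt.Println(widenSigned(c))   // -2128831035
	fmt.Println(widenUnsigned(c)) // 2166136261
}
```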
It is a difference in sign extension when assigning the 32-bit value 0x811C9DC5 to a 64-bit variable.
Are the characters in Java and Objective-C the same? NSString will give you unichars.