Modified FNV-1 hash algorithm in golang - postgresql

The standard library provides the FNV-1 hash algorithm (https://golang.org/pkg/hash/fnv/), which returns a uint64 value (range: 0 through 18446744073709551615).
I need to store this value in a PostgreSQL bigserial, but its range is 1 to 9223372036854775807.
Is it possible to change the hash size to e.g. 56 bits? http://www.isthe.com/chongo/tech/comp/fnv/index.html#xor-fold
Can someone help me change the native algorithm to produce 56-bit hashes?
https://golang.org/src/hash/fnv/fnv.go
Update
I did it myself using this doc: http://www.isthe.com/chongo/tech/comp/fnv/index.html#xor-fold
package main

import (
    "fmt"
    "hash/fnv"
)

func main() {
    const MASK uint64 = 1<<63 - 1 // lower 63 bits set

    h := fnv.New64()
    h.Write([]byte("1133"))
    hash := h.Sum64()
    fmt.Printf("%#x\n", MASK)
    fmt.Println(hash)

    // XOR-fold: mix the top bit into the low 63 bits.
    hash = (hash >> 63) ^ (hash & MASK)
    fmt.Println(hash)
}
http://play.golang.org/p/j7q3D73qqu
Is it correct?

Is it correct?
Yes, it's a correct XOR-folding to 63 bits. But there's a much easier way:
hash = hash % 9223372036854775808
The distribution of XOR-folding is dubious: it is probably proven sound somewhere, but it is not immediately obvious. Modulo, however, is clearly just a wrapping of the hash algorithm's distribution onto a smaller codomain.
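For comparison, here is a minimal sketch (variable names are my own) showing that the modulo form is the same as masking off the top bit, while the XOR-fold mixes that bit back in:
package main

import (
    "fmt"
    "hash/fnv"
)

func main() {
    h := fnv.New64()
    h.Write([]byte("1133"))
    hash := h.Sum64()

    const mask uint64 = 1<<63 - 1

    mod := hash % (1 << 63)                // 9223372036854775808 == 2^63
    masked := hash & mask                  // drops the top bit, same as mod
    folded := (hash >> 63) ^ (hash & mask) // folds the top bit into bit 0

    fmt.Println(mod == masked) // always true
    fmt.Println(mod, folded)   // both fit in PostgreSQL's bigint range
}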

Related

Seed for hash-table non cryptographic hash functions

If one sets the hash table seed during resize or table creation to a random number, will that prevent DDoS attacks on such a hash table, or will an attacker who knows the hash algorithm still easily get around the seed? What if the algorithm uses the Pearson hash function with randomly generated tables, unknown to the attacker? Does such a table hash still need a seed, or is it safe enough?
Context: I want to use an on-disk hash table for a key-value database for my toy web server, where the keys may depend on the user input.
There exist several approaches to protect your hash subsystem from "adverse selection" attacks; the most popular of them is called Universal Hashing, where the hash function, or its parameters, is randomly selected at initialization.
In my own approach, I use the same hash function, where each character is added to the result with non-linear mixing that depends on a random array of uint32_t[256]. The array is created during system initialization, and in my code this happens at each start by reading /dev/urandom. See my implementation in the open-source emerSSL program. You're welcome to borrow the entire hash-table implementation, or the hash function only.
Currently, my hash function from the referenced source computes two independent hashes for a double-hashing search algorithm.
Here is a "reduced" hash function from that source, to demonstrate the idea of non-linear mixing with the S-block array:
#include <stdint.h>

uint32_t S_block[0x100]; // substitution block, filled with random contents at startup

#define NLF(h, c) (S_block[(unsigned char)((c) + (h))] ^ (c))
#define ROL(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

uint32_t hash(const char *key) {
    uint32_t h = 0x1F351F35; // Barker code * 2
    char c;
    for (int i = 0; (c = key[i]) != 0; i++) {
        h = ROL(h, 5);
        h += NLF(h, c);
    }
    return h;
}
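For a Go flavor of the same idea, here is a minimal illustrative sketch (my own naming, not the emerSSL code): it seeds a substitution table from crypto/rand once at startup, so an attacker who knows the algorithm still cannot precompute colliding keys.
package main

import (
    "crypto/rand"
    "encoding/binary"
    "fmt"
)

// sBlock plays the role of S_block above: random contents,
// generated once per process start.
var sBlock [256]uint32

func init() {
    var buf [256 * 4]byte
    if _, err := rand.Read(buf[:]); err != nil {
        panic(err)
    }
    for i := range sBlock {
        sBlock[i] = binary.LittleEndian.Uint32(buf[i*4:])
    }
}

// rol rotates x left by n bits.
func rol(x uint32, n uint) uint32 { return x<<n | x>>(32-n) }

func hash(key string) uint32 {
    var h uint32 = 0x1F351F35
    for i := 0; i < len(key); i++ {
        c := uint32(key[i])
        h = rol(h, 5)
        h += sBlock[byte(c+h)] ^ c // non-linear mixing through the random table
    }
    return h
}

func main() {
    fmt.Printf("%#x\n", hash("example"))
}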

Go: How does this hash function range from 0-32 bits?

I'm trying to write my own hash function that uses a 30-bit hash.
Here is some code for an FNVa 32-bit hash.
func fnva32(data string) uint32 {
    var hash uint32 = 2166136261 // FNV-1a 32-bit offset basis
    for _, c := range data {
        hash ^= uint32(c)
        hash *= 16777619 // FNV-1a 32-bit prime
    }
    return hash
}
Now here is my code that converts lowercase letters a-z into a 30-bit hash:
func id(s string) uint {
    var id uint
    var power uint = 1
    for _, c := range s {
        id += (uint(c) - 96) * power // 'a' is 97, so a-z map to 1-26
        power *= 26
    }
    return id % 1073741824 // 1073741824 == 2^30
}
That specifically limits my hash function to a maximum of 30 bits, because I'm using a modulus against that number. But how is that FNVa32 hash limited to 32 bits? It doesn't use a modulus. How does it not generate a number larger than that?
Also you probably notice that I'm not using prime numbers. I tried some prime numbers but it increased the collisions. Currently I'm getting 291 collisions and FNVa32 is getting 76 collisions, from hashing 600,000 (real) words.
My question is... what is making FNVa32 limit to 32-bit, and how would I change it to be 30-bit instead?
The return type of the fnva32 function is uint32, so there is no way it could return an answer with more bits. The calculation also uses a uint32 variable internally, and unsigned integer arithmetic in Go wraps around modulo 2^32, so the XOR and multiply can never produce more than 32 bits.
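Given that, a 30-bit variant just computes the full 32-bit hash and reduces it afterwards, either by masking (equivalent to your modulus) or by XOR-folding the discarded top bits back in. A minimal sketch, with function names of my own:
package main

import "fmt"

func fnva32(data string) uint32 {
    var hash uint32 = 2166136261 // FNV-1a offset basis
    for _, c := range data {
        hash ^= uint32(c)
        hash *= 16777619 // FNV-1a prime
    }
    return hash
}

// fnva30Mask keeps the low 30 bits; same as fnva32(data) % 1073741824.
func fnva30Mask(data string) uint32 {
    return fnva32(data) & (1<<30 - 1)
}

// fnva30Fold XOR-folds the top 2 bits into the low 30,
// so no input entropy is simply thrown away.
func fnva30Fold(data string) uint32 {
    h := fnva32(data)
    return (h >> 30) ^ (h & (1<<30 - 1))
}

func main() {
    fmt.Println(fnva30Mask("hello"), fnva30Fold("hello"))
}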

Golang md5 Sum() function

package main

import (
    "crypto/md5"
    "fmt"
)

func main() {
    hash := md5.New()
    b := []byte("test")
    fmt.Printf("%x\n", hash.Sum(b))
    hash.Write(b)
    fmt.Printf("%x\n", hash.Sum(nil))
}
Output:
74657374d41d8cd98f00b204e9800998ecf8427e
098f6bcd4621d373cade4e832627b4f6
Could someone please explain why I get different results from the two prints?
I'm building on the already good answers. I'm not sure Sum is actually the function you want. From the hash.Hash documentation:
// Sum appends the current hash to b and returns the resulting slice.
// It does not change the underlying hash state.
Sum(b []byte) []byte
This function has a dual use-case, which you seem to mix in an unfortunate way. The use-cases are:
Computing the hash of a single run
Chaining the output of several runs
In case you simply want to compute the hash of something, either use md5.Sum(data) or
digest := md5.New()
digest.Write(data)
hash := digest.Sum(nil)
This code will, according to the excerpt of the documentation above, append the checksum of data to nil, resulting in the checksum of data.
If you want to chain several blocks of hashes, the second use-case of hash.Sum, you can do it like this:
hashed := make([]byte, 0)
for hasData {
    digest.Write(data)
    hashed = digest.Sum(hashed)
}
This will append each iteration's hash to the already computed hashes. Probably not what you want.
So, now you should be able to see why your code is failing. If not, take this commented version of your code (On play):
hash := md5.New()
b := []byte("test")
fmt.Printf("%x\n", hash.Sum(b)) // gives 74657374<hash> (74657374 = "test")
fmt.Printf("%x\n", hash.Sum([]byte("AAA"))) // gives 414141<hash> (41 = 'A')
fmt.Printf("%x\n", hash.Sum(nil)) // gives <hash> as append(nil, hash) == hash
fmt.Printf("%x\n", hash.Sum(b)) // gives 74657374<hash> (74657374 = "test")
fmt.Printf("%x\n", hash.Sum([]byte("AAA"))) // gives 414141<hash> (41 = 'A')
hash.Write(b)
fmt.Printf("%x\n", hash.Sum(nil)) // gives a completely different hash since internal bytes changed due to Write()
You have two ways to actually get the md5 sum of a byte slice:
func main() {
    hash := md5.New()
    b := []byte("test")
    hash.Write(b)
    fmt.Printf("way one : %x\n", hash.Sum(nil))
    fmt.Printf("way two : %x\n", md5.Sum(b))
}
According to http://golang.org/src/pkg/crypto/md5/md5.go#L88, your hash.Sum(b) is like calling append(b, actual-hash-of-an-empty-md5-hash).
The definition of Sum:
func (d0 *digest) Sum(in []byte) []byte {
    // Make a copy of d0 so that caller can keep writing and summing.
    d := *d0
    hash := d.checkSum()
    return append(in, hash[:]...)
}
When you call Sum(nil) it returns d.checkSum() directly as a byte slice; however, if you call Sum([]byte), it appends d.checkSum() to your input.
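A quick sketch (mine, not from the md5 package docs) to convince yourself that Sum(b) is just an append of the checksum onto b:
package main

import (
    "bytes"
    "crypto/md5"
    "fmt"
)

func main() {
    h := md5.New()
    h.Write([]byte("test"))

    b := []byte("prefix")
    // Sum does not mutate the hash state, so both calls see the same digest.
    viaSum := h.Sum(b)
    manual := append([]byte("prefix"), h.Sum(nil)...)

    fmt.Println(bytes.Equal(viaSum, manual)) // true
}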
From the docs:
// Sum appends the current hash to b and returns the resulting slice.
// It does not change the underlying hash state.
Sum(b []byte) []byte
so "74657374d41d8cd98f00b204e9800998ecf8427e" is actually the hex representation of "test", followed by the hash of the empty input (the hash's initial state).
fmt.Printf("%x", []byte("test"))
will result in... "74657374"!
So basically hash.Sum(b) is not doing what you think it does. The second statement is the right hash.
Let me address the question directly:
why/how do I get different results for the two prints?
Answer:
hash := md5.New()
Since hash is a freshly created md5 instance, calling hash.Sum(b) appends the checksum of the still-empty hash to b itself, hence you got 74657374d41d8cd98f00b204e9800998ecf8427e as output (the hex of "test" followed by the md5 of empty input).
In the next statement, hash.Write(b), you write b into the hash instance; calling hash.Sum(nil) then returns the md5 of the b you just wrote, appended to nil, i.e. 098f6bcd4621d373cade4e832627b4f6.
This is the reason you are getting these outputs.
For your reference, look at the Sum implementation:
func (d0 *digest) Sum(in []byte) []byte {
    // Make a copy of d0 so that caller can keep writing and summing.
    d := *d0
    hash := d.checkSum()
    return append(in, hash[:]...)
}

Why do I need to add the original salt to each hash iteration of a password?

I understand it is important to hash passwords over multiple iterations to make things harder for an attacker. I have read numerous times that when processing these iterations, it is critical to hash not only the result of the previous hashing, but also append the original salt each time. In other words:
I need to not do this:
var hash = sha512(salt + password);
for (i = 0; i < 1000; i++) {
    hash = sha512(hash);
}
And instead, need to do this:
var hash = sha512(salt + password);
for (i = 0; i < 1000; i++) {
    hash = sha512(salt + hash);
}
My question is regarding the math here. Why does my bad example above make things easier for an attacker? I've heard that it would increase the likelihood of collisions but I am not understanding why.
It is not that you simply need to do "hash = sha512(salt + hash)" - it's more complex than that. An HMAC is a better way of adding your salt (and PBKDF2 is based on HMAC - see below for more detail on PBKDF2) - there's a good discussion at When is it safe to use a broken hash function? for those details.
You are correct in that you need to have multiple iterations of a hash function for security.
However, don't roll your own. See How to securely hash passwords?, and note that PBKDF2, BCrypt, and Scrypt are all means of doing so.
PBKDF2, also known as PKCS #5 v2.0 and RFC 2898, is in fact reasonably close to what you're doing (multiple iterations of a normal hash function), particularly in the form of PBKDF2-HMAC-SHA-512. In particular, section 5.2 reads:
For each block of the derived key apply the function F defined
below to the password P, the salt S, the iteration count c, and
the block index to compute the block:
T_1 = F (P, S, c, 1) ,
T_2 = F (P, S, c, 2) ,
...
T_l = F (P, S, c, l) ,
where the function F is defined as the exclusive-or sum of the
first c iterates of the underlying pseudorandom function PRF
applied to the password P and the concatenation of the salt S
and the block index i:
F (P, S, c, i) = U_1 \xor U_2 \xor ... \xor U_c
where
U_1 = PRF (P, S || INT (i)) ,
U_2 = PRF (P, U_1) ,
...
U_c = PRF (P, U_{c-1}) .
Here, INT (i) is a four-octet encoding of the integer i, most
significant octet first.
P.S. SHA-512 was a good choice of hash primitive. SHA-512 (and SHA-384) are superior to MD5, SHA-1, and even SHA-224 and SHA-256 because they use 64-bit operations, where current GPUs (early 2014) have less of an advantage over current CPUs than they do with 32-bit operations, thus reducing the margin of superiority attackers have for offline attacks.
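As an aside, in Go you don't have to hand-roll the loop at all: the golang.org/x/crypto/pbkdf2 package implements the construction above. A minimal sketch (the iteration count here is illustrative; tune it to your hardware):
package main

import (
    "crypto/rand"
    "crypto/sha512"
    "fmt"

    "golang.org/x/crypto/pbkdf2"
)

func main() {
    password := []byte("correct horse battery staple")

    // A fresh random salt per password.
    salt := make([]byte, 16)
    if _, err := rand.Read(salt); err != nil {
        panic(err)
    }

    // PBKDF2-HMAC-SHA-512 with an illustrative iteration count.
    key := pbkdf2.Key(password, salt, 100000, 64, sha512.New)
    fmt.Printf("%x\n", key)
}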

Purpose of avalanching

I was researching different hash functions and came across SuperFastHash. This hash function uses a technique called "avalanching", which is defined like this:
/* Force "avalanching" of final 127 bits */
hash ^= hash << 3;
hash += hash >> 5;
hash ^= hash << 4;
hash += hash >> 17;
hash ^= hash << 25;
hash += hash >> 6;
What is the purpose of avalanching? Why are these specific bit-shift steps used (3, 5, 4, ...)?
Avalanching is a term for the "diffusion" of small changes in the input into the final result. For cryptographic hashes, where non-reversibility is crucial, having similar inputs produce very different results is a desirable feature, because it prevents an attacker from cracking a hash by approximation.
See more info about this at http://en.wikipedia.org/wiki/Avalanche_effect
I cannot say why those specific shift amounts were chosen, but the code is mixing the hash with shifted copies of itself via ADD and XOR to increase diffusion; other values would probably perform similarly, but confirming that would require deeper analysis.
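One way to see the effect empirically is to flip a single input bit and count how many output bits change; with good avalanching, roughly half of the 32 should. A small sketch using the exact finalizer steps from the question:
package main

import (
    "fmt"
    "math/bits"
)

// avalanche applies the SuperFastHash final mixing steps.
func avalanche(hash uint32) uint32 {
    hash ^= hash << 3
    hash += hash >> 5
    hash ^= hash << 4
    hash += hash >> 17
    hash ^= hash << 25
    hash += hash >> 6
    return hash
}

func main() {
    x := uint32(0x12345678)
    y := x ^ 1 // flip one input bit

    // Count differing output bits; ideally close to 16 of 32.
    fmt.Println(bits.OnesCount32(avalanche(x) ^ avalanche(y)))
}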