Purpose of avalanching - hash

I was researching different hash functions and came across SuperFastHash. This hashing function used a technique called "avalanching" which was defined like this:
/* Force "avalanching" of final 127 bits */
hash ^= hash << 3;
hash += hash >> 5;
hash ^= hash << 4;
hash += hash >> 17;
hash ^= hash << 25;
hash += hash >> 6;
What is the purpose of avalanching? Why are these specific bit-shift amounts used (3, 5, 4, ...)?

Avalanching is just a term for the "diffusion" of small changes in the input into the final result. For cryptographic hashes, where non-reversibility is crucial, having similar inputs produce very different results is a desirable feature, because it stops an attacker from approximating an input in order to crack a single hash.
See more info about this at http://en.wikipedia.org/wiki/Avalanche_effect
I can't say why those particular shift amounts are used, but the code combines the hash with shifted copies of itself using addition and XOR to increase diffusion; other values would probably perform similarly, though confirming that would require deeper analysis.
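To see the effect concretely, here is a small sketch (just the final mixing steps quoted above, wrapped in a standalone program for illustration) that flips a single input bit and counts how many output bits change; with good avalanching, roughly half of the 32 bits should flip:
#include <stdio.h>
#include <stdint.h>

/* The final mixing steps quoted above, extracted into a helper. */
static uint32_t final_mix(uint32_t hash) {
    hash ^= hash << 3;
    hash += hash >> 5;
    hash ^= hash << 4;
    hash += hash >> 17;
    hash ^= hash << 25;
    hash += hash >> 6;
    return hash;
}

/* Count set bits without relying on compiler builtins. */
static int popcount32(uint32_t x) {
    int n = 0;
    for (; x != 0; x >>= 1)
        n += (int)(x & 1u);
    return n;
}

int main(void) {
    uint32_t a = 0x12345678u;   /* arbitrary example value */
    uint32_t b = a ^ 1u;        /* the same value with one bit flipped */
    uint32_t diff = final_mix(a) ^ final_mix(b);
    printf("output bits changed: %d of 32\n", popcount32(diff));
    return 0;
}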

Related

Seed for non-cryptographic hash-table hash functions

If one sets the hash-table seed to a random number during resize or table creation, will that prevent DoS attacks on such a hash table, or will an attacker who knows the hash algorithm still easily get around the seed? What if the algorithm uses the Pearson hash function with randomly generated tables unknown to the attacker? Does such a table-based hash still need a seed, or is it safe enough on its own?
Context: I want to use an on-disk hash table for a key-value database for my toy web server, where the keys may depend on the user input.
There exist several approaches to protect your hash subsystem from "adverse selection" attacks; the most popular of them is called universal hashing, where the hash function, or one of its parameters, is selected randomly at initialization.
In my own approach, I use the same hash function each time, but every character is added to the result with a non-linear mixing step that depends on a random array of uint32_t[256]. The array is created during system initialization; in my code this happens at every start, by reading /dev/urandom. See my implementation in the open-source emerSSL program. You're welcome to borrow the entire hash-table implementation, or just the hash function.
Currently, my hash function in the referenced source computes two independent hashes for a double-hashing search algorithm.
Here is a "reduced" form of the hash function from that source, to demonstrate the idea of non-linear mixing with the S-block array:
#include <stdint.h>

uint32_t S_block[0x100]; // Substitution block, filled with random contents at startup
#define NLF(h, c) (S_block[(unsigned char)((c) + (h))] ^ (c))
#define ROL(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

uint32_t hash(const char *key) {
    uint32_t h = 0x1F351F35; // Barker code * 2
    char c;
    for (int i = 0; (c = key[i]) != '\0'; i++) {
        h = ROL(h, 5);      // rotate the running hash
        h += NLF(h, c);     // non-linear mixing through the S-block
    }
    return h;
}
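The substitution block itself is filled with random words once at startup, as described above. Here is a minimal sketch of that initialization step, assuming a POSIX system with /dev/urandom available (the function name init_S_block is illustrative and not taken from the emerSSL source):
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

extern uint32_t S_block[0x100]; // the substitution block declared above

/* Fill the substitution block with random words once, at program start. */
void init_S_block(void) {
    FILE *f = fopen("/dev/urandom", "rb");
    if (f == NULL) {
        perror("open /dev/urandom");
        exit(EXIT_FAILURE);
    }
    if (fread(S_block, sizeof S_block[0], 0x100, f) != 0x100) {
        fprintf(stderr, "short read from /dev/urandom\n");
        exit(EXIT_FAILURE);
    }
    fclose(f);
}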

Hash function for 8 / 16 bit "graphics" on 8 bit processor

For an implementation of coherent noise (similar to Perlin noise), I'm looking for a hash function suitable for graphics.
I don't need it to be in any way cryptographic, and really, I don't even need it to be a super brilliant hash.
I just want to combine two 16-bit numbers and output an 8-bit hash. As random as possible is good, but so is being fast on an AVR processor (8-bit, as used by Arduino).
Currently I'm using an implementation here:
#include <stdint.h>

uint32_t hash(uint32_t a)
{
    a -= (a << 6);
    a ^= (a >> 17);
    a -= (a << 9);
    a ^= (a << 4);
    a -= (a << 3);
    a ^= (a << 10);
    a ^= (a >> 15);
    return a;
}
But given that I'm truncating all but 8 bits, and I don't need anything spectacular, can I get away with something using fewer instructions?
… My search is inspired by the lib8tion library that's packaged with FastLED. It has specific functions to, for example, multiply two uint8_t numbers into a uint16_t result in the fewest possible clock cycles.
Check out Pearson hashing:
unsigned char hash(unsigned short a, unsigned short b) {
    static const unsigned char t[256] = {...}; /* a fixed permutation of 0..255 */
    return t[t[t[t[a & 0xFF] ^ (b & 0xFF)] ^ (a >> 8)] ^ (b >> 8)];
}
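For Pearson hashing the table t needs to hold a permutation of 0..255. A minimal sketch of generating one with a Fisher-Yates shuffle (shown as a runtime-filled global rather than the static const table above; seeding via srand is purely illustrative):
#include <stdlib.h>

unsigned char t[256];

/* Fill t with a random permutation of 0..255 using a Fisher-Yates shuffle. */
void init_pearson_table(unsigned int seed) {
    srand(seed);
    for (int i = 0; i < 256; i++)
        t[i] = (unsigned char)i;
    for (int i = 255; i > 0; i--) {
        int j = rand() % (i + 1);
        unsigned char tmp = t[i];
        t[i] = t[j];
        t[j] = tmp;
    }
}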

Fastest type to use for comparing hashes in matlab

I have a table in Matlab with some columns representing 128 bit hashes.
I would like to match rows to one or more other rows, based on these hashes.
Currently, the hashes are represented as hexadecimal strings, and compared with strcmp(). Still, it takes many seconds to process the table.
What is the fastest way to compare two hashes in matlab?
I have tried turning them into categorical variables, but that is much slower. As far as I know, Matlab does not have a 128-bit numeric type, and the nominal and ordinal types are deprecated.
Are there any others that could work?
The code below is analogous to what I am doing:
nodetype = { 'type1'; 'type2'; 'type1'; 'type2' };
hash = {'d285e87940fb9383ec5e983041f8d7a6'; 'd285e87940fb9383ec5e983041f8d7a6'; 'ec9add3cf0f67f443d5820708adc0485'; '5dbdfa232b5b61c8b1e8c698a64e1cc9' };
entries = table(categorical(nodetype),hash,'VariableNames',{'type','hash'});
%nodes to match. filter by type or some other way so rows don't match to
%themselves.
A = entries(entries.type=='type1',:);
B = entries(entries.type=='type2',:);
%pick a node/row with a hash to find all counterparts of
row_to_match_in_A = A(1,:);
matching_rows_in_B = B(strcmp(B.hash,row_to_match_in_A.hash),:);
% do stuff with matching rows...
disp(matching_rows_in_B);
The hash strings are faithful representations of what I am using, but they are not necessarily read or stored as strings in the original source. They are just converted for this purpose because it's the fastest way I have found to do the comparison.
Optimization is nice, if you need it. Try it out yourself and measure the performance gain for relevant test cases.
Some suggestions:
Sorted arrays are easier/faster to search
Matlab's default numeric type is double, but you can also construct integers. Why not use two uint64 values instead of the 128-bit column? First search for the upper 64 bits, then for the lower; or, even better, use ismember with the 'rows' option and put your hashes in rows:
A = uint64([0 0;
            0 1;
            1 0;
            1 1;
            2 0;
            2 1]);
srch = uint64([1 1;
               0 1]);
[ismatch, loc] = ismember(srch, A, 'rows')
> loc =
    4
    2
Look into the comparison functions you use (e.g. edit ismember) and strip out unnecessary operations (e.g. sort) and safety checks that you know in advance won't pose a problem, like this solution does. Or, if you intend to call a search function multiple times, sort in advance and skip the check/sort inside the search function later on.
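The split into two uint64 values is language-agnostic; since the other examples in this thread are in C, here is a hedged C sketch of how a 32-character hex hash could be cut into its upper and lower 64-bit halves (the helper name split_hash is illustrative, and the hash string is taken from the question above):
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <string.h>

/* Split a 32-character hex hash into its upper and lower 64-bit halves. */
int split_hash(const char *hex, uint64_t *hi, uint64_t *lo) {
    if (strlen(hex) != 32)
        return -1;
    return sscanf(hex, "%16" SCNx64 "%16" SCNx64, hi, lo) == 2 ? 0 : -1;
}

int main(void) {
    uint64_t hi, lo;
    if (split_hash("d285e87940fb9383ec5e983041f8d7a6", &hi, &lo) == 0)
        printf("%" PRIu64 " %" PRIu64 "\n", hi, lo);
    return 0;
}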

Modified FNV-1 hash algorithm in golang

The native library has an FNV-1 hash implementation (https://golang.org/pkg/hash/fnv/) that returns a uint64 value (range: 0 through 18446744073709551615).
I need to store this value in a PostgreSQL bigserial column, but its range is 1 to 9223372036854775807.
Is it possible to change the hash size to, e.g., 56 bits? http://www.isthe.com/chongo/tech/comp/fnv/index.html#xor-fold
Can someone help me change the native algorithm to produce 56-bit hashes?
https://golang.org/src/hash/fnv/fnv.go
Update
Did it myself using this doc http://www.isthe.com/chongo/tech/comp/fnv/index.html#xor-fold
package main

import (
    "fmt"
    "hash/fnv"
)

func main() {
    const MASK uint64 = 1<<63 - 1
    h := fnv.New64()
    h.Write([]byte("1133"))
    hash := h.Sum64()
    fmt.Printf("%#x\n", MASK)
    fmt.Println(hash)
    hash = (hash >> 63) ^ (hash & MASK)
    fmt.Println(hash)
}
http://play.golang.org/p/j7q3D73qqu
Is it correct?
Is it correct?
Yes, it's a correct XOR-folding to 63 bits. But there's a much easier way:
hash = hash % 9223372036854775808
The quality of XOR-folding's distribution is less obvious; it has probably been proven somewhere, but it is not immediately apparent. Modulo, however, is clearly just a wrapping of the hash algorithm's distribution onto a smaller codomain.
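Note that for a power-of-two modulus such as 2^63, the modulo simply discards the top bit, whereas the XOR-fold mixes that bit back into bit 0. A small side-by-side sketch (in C for brevity; the input value is an arbitrary example, not a real FNV output):
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void) {
    uint64_t hash = 0xC4FB4FA5E64D4DD5ULL;  /* arbitrary 64-bit example value */
    uint64_t mask = (1ULL << 63) - 1;

    /* XOR-fold to 63 bits: mix the dropped top bit back into bit 0. */
    uint64_t folded = (hash >> 63) ^ (hash & mask);

    /* Modulo 2^63: simply discard the top bit (same as hash & mask). */
    uint64_t reduced = hash % (1ULL << 63);

    printf("folded : %" PRIu64 "\n", folded);
    printf("reduced: %" PRIu64 "\n", reduced);
    return 0;
}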

I'm using ELF Hash to write a specially tweaked version of a hash map, and I want to produce collisions

Can anyone give an example of two strings, consisting of alphabetical characters only, that produce the same hash value with ELFHash?
I need these to test my code, but they don't seem easy to produce. To my surprise, there are a lot of example implementations of various hash functions on the internet, but none of them provides examples of colliding strings.
Below is the ELF Hash, in case you need it.
#include <string>

unsigned int ELFHash(const std::string& str)
{
    unsigned int hash = 0;
    unsigned int x = 0;
    for (std::size_t i = 0; i < str.length(); i++)
    {
        hash = (hash << 4) + str[i];
        if ((x = hash & 0xF0000000L) != 0)
        {
            hash ^= (x >> 24);
            hash &= ~x;
        }
    }
    return (hash & 0x7FFFFFFF);
}
You can find collisions using a brute-force method (e.g. compute the hashes of all possible strings shorter than 5 characters).
Some examples of collisions (which I found that way):
hash = 23114:
-------------
UMz
SpJ
hash = 4543841:
---------------
AAAAQ
AAABA
hash = 5301994:
---------------
KYQYZ
KYQZJ
KYRIZ
KYRJJ
KZAYZ
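For reference, the brute-force search described above fits in a few dozen lines. A hedged C sketch (it restates the ELF hash for plain C strings, only enumerates 3-letter alphabetic strings, and the names elf_hash and struct entry are illustrative):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* The same ELF hash as above, restated for plain C strings. */
static unsigned int elf_hash(const char *s) {
    unsigned int hash = 0, x;
    for (; *s; s++) {
        hash = (hash << 4) + (unsigned char)*s;
        if ((x = hash & 0xF0000000u) != 0) {
            hash ^= x >> 24;
            hash &= ~x;
        }
    }
    return hash & 0x7FFFFFFF;
}

struct entry { unsigned int h; char s[4]; };

static int cmp(const void *a, const void *b) {
    unsigned int ha = ((const struct entry *)a)->h;
    unsigned int hb = ((const struct entry *)b)->h;
    return (ha > hb) - (ha < hb);
}

int main(void) {
    const char *alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    size_t n = strlen(alpha), total = n * n * n, k = 0;
    struct entry *e = malloc(total * sizeof *e);
    if (e == NULL)
        return 1;

    /* Enumerate every 3-letter alphabetic string and record its hash. */
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t l = 0; l < n; l++) {
                e[k].s[0] = alpha[i];
                e[k].s[1] = alpha[j];
                e[k].s[2] = alpha[l];
                e[k].s[3] = '\0';
                e[k].h = elf_hash(e[k].s);
                k++;
            }

    /* Sort by hash so colliding strings end up next to each other. */
    qsort(e, total, sizeof *e, cmp);
    for (size_t i = 1; i < total; i++)
        if (e[i].h == e[i - 1].h)
            printf("%u: %s %s\n", e[i].h, e[i - 1].s, e[i].s);

    free(e);
    return 0;
}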