Objective-C : Fowler–Noll–Vo (FNV) Hash implementation - iphone

I have an HTTP connector in my iPhone project, and queries must include a parameter derived from the username using the Fowler–Noll–Vo (FNV) hash.
I have a working Java implementation; here is the code:
long fnv_prime = 0x811C9DC5;
long hash = 0;
for (int i = 0; i < str.length(); i++)
{
    hash *= fnv_prime;
    hash ^= str.charAt(i);
}
Now on the iPhone side, I did this:
int64_t fnv_prime = 0x811C9DC5;
int64_t hash = 0;
for (int i = 0; i < [myString length]; i++)
{
    hash *= fnv_prime;
    hash ^= [myString characterAtIndex:i];
}
This code doesn't give me the same result as the Java one.
In the first loop iterations, I get this:
hash = 0
hash = 100 (first letter is "d")
hash = 1865261300 (for hash = 100 and fnv_prime = -2128831035 as in Java)
Does someone see what I'm missing?
Thanks in advance for the help!

In Java, this line:
long fnv_prime = 0x811C9DC5;
will yield in fnv_prime the numerical value -2128831035, because the constant is interpreted as an int, which is a 32-bit signed type in Java. That value is then sign-extended when written into a long.
Conversely, in the Objective-C code:
int64_t fnv_prime = 0x811C9DC5;
the 0x811C9DC5 is interpreted as an unsigned int constant (because it does not fit in a signed 32-bit int), with numerical value 2166136261. That value is then written into fnv_prime, and there is no sign to extend since, as far as the C compiler is concerned, the value is positive.
Thus you end up with distinct values for fnv_prime, which explains your distinct results.
This can be corrected in Java by adding a "L" suffix, like this:
long fnv_prime = 0x811C9DC5L;
which forces the Java compiler to interpret the constant as a long, with the same numerical value as what you get in the Objective-C code.
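Alternatively, if the Java code cannot be changed, the Objective-C side can reproduce the sign extension explicitly. A one-line sketch (assuming the goal is to match the existing Java behaviour exactly):
int64_t fnv_prime = (int64_t)(int32_t)0x811C9DC5; // reinterpret as signed 32-bit, then sign-extend: -2128831035, as in Java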

Incidentally, 0x811C9DC5 is not an FNV prime (it is not even prime); it is the 32-bit FNV "offset basis". You will get incorrect hash values (and more hash collisions) if you use it as the multiplier. The correct 32-bit FNV prime is 0x01000193. See http://www.isthe.com/chongo/tech/comp/fnv/index.html
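For reference, if the goal is the standard 32-bit FNV-1 rather than matching the existing Java output, a minimal sketch using the constants from that page looks like this (uint32_t on both platforms sidesteps the sign-extension issue entirely; note this will not reproduce the current Java hashes):

#include <stdint.h>
#include <stddef.h>

uint32_t fnv1_32(const char *data, size_t len)
{
    uint32_t hash = 0x811C9DC5u;             // FNV offset basis: the starting value
    const uint32_t fnv_prime = 0x01000193u;  // the actual 32-bit FNV prime
    for (size_t i = 0; i < len; i++)
    {
        hash *= fnv_prime;                   // FNV-1: multiply first...
        hash ^= (unsigned char)data[i];      // ...then XOR in the next byte
    }
    return hash;
}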

It is a difference in sign extension when assigning the 32-bit value 0x811C9DC5 to a 64-bit variable.

Are the characters in Java and Objective-C the same? NSString will give you unichars.

Related

How can I make a good hash function without unsigned integers?

I'm looking for a simple hash function that doesn't rely on integer overflow, and doesn't rely on unsigned integers.
The problem is that I have to implement the hash function both in Blueprint in Unreal Engine (which only has signed 32-bit integers, with undefined overflow behavior) and in PHP5, in a version that uses 64-bit signed integers.
So when I use the 'common' simple hash functions, they don't give the same result on both platforms, because they all rely on the overflow behavior of unsigned integers.
The only thing that is really important is that it has good 'randomness'. Does anyone know something simple that would accomplish this?
It's meant for a very basic signing system for sending messages to a server. It doesn't need to be top security... it's for storing high scores of a simple game on a server. The idea is that I would generate several hash integers from the message (using different 'start numbers') and append them to make a hash signature. I just need to make sure that if people sniff the network messages sent to the server, they cannot easily send faked messages. They would need to provide the correct hash signature with their message, which they shouldn't be able to do unless they know the hash function being used. Of course, if they reverse engineer the game they can still 'hack' it, but I wouldn't know how to counter that...
I have no access to existing hash functions in the Unreal Engine blueprint system.
The first thing I would try would be to simulate the behavior of unsigned integers using signed integers, by explicitly applying the modulo operator whenever the accumulated hash-value gets large enough that it might risk overflowing.
Example code in C (apologies for the poor hash function, but the same technique should be applicable to any hash function, at least in principle):
#include <stdio.h>
#include <string.h>

int hashFunction(const char * buf, int numBytes)
{
    const int multiplier = 33;
    const int maxAllowedValue = 2147483647-255;    // == 2^31 - 256, assuming 32-bit ints here
    const int maxPreMultValue = maxAllowedValue/multiplier;
    int hash = 536870912;                          // arbitrary starting number
    for (int i=0; i<numBytes; i++)
    {
        hash = hash % maxPreMultValue;             // make sure hash cannot overflow in the next operation!
        hash = (hash*multiplier)+(unsigned char)buf[i];  // cast keeps the added byte non-negative
    }
    return hash;
}

int main(int argc, char ** argv)
{
    while(1)
    {
        printf("Enter a string to hash:\n");
        char buf[1024];
        if (fgets(buf, sizeof(buf), stdin) == NULL) break;   // stop on EOF
        printf("Hash code for that string is: %i\n", hashFunction(buf, (int)strlen(buf)));
    }
    return 0;
}

kdb c++ interface: create byte list from std::string

The following is very slow for long strings:
std::string s = "long string";
K klist = DBVec::CreateList(KG, s.length());
for (int i = 0; i < s.length(); i++)
{
    kG(klist)[i] = s.c_str()[i];
}
It works acceptably fast (<100 ms) for strings up to 100k characters, but slows to a crawl (tens of minutes, possibly hours) for strings of a few million characters. I don't see anything other than kG that could introduce the nonlinearity. I see no reason for the accessor function kG to take non-constant time, but there is nothing else in this loop. Unfortunately I don't know how kG works, due to the lack of documentation.
Question: given a blob of binary data as std::string, what's the efficient way to construct a byte list?
kG is a macro defined in k.h which expands to ((x)->G0), i.e. follow the G0 pointer of the K object
http://kx.com/q/d/a/c.htm#Strings documents kp, which creates a K string object directly from a string, so presumably you could do K klist = kp(s.c_str()), which is probably faster
This works:
memcpy(kG(klist), s.c_str(), s.length());
I still wonder why that loop is not O(N).
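For context, here is what the bulk copy looks like using the plain kdb+ C API from k.h (a sketch with a hypothetical makeByteList helper; ktn is used to allocate the list here, and DBVec::CreateList is assumed to be a wrapper around something like it):

#include <cstring>
#include <string>
#include "k.h"   // kdb+ C API: K, KG, ktn, kG, kpn

// Build a byte (KG) list from arbitrary binary data in a single O(N) copy.
K makeByteList(const std::string& s)
{
    K klist = ktn(KG, s.length());                 // allocate the byte list
    std::memcpy(kG(klist), s.data(), s.length());  // bulk copy instead of per-element assignment
    return klist;
}

// For character data there is also kpn, which copies a length-delimited
// buffer directly (unlike kp(s.c_str()), which would stop at an embedded
// NUL byte in binary data):
//     K kstr = kpn(const_cast<char*>(s.data()), s.length());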

I'm using ELF Hash to write a specially tweaked version of a hash map, and I want to produce collisions

Can any one give an example of 2 strings, consisting of alphabetical characters only, that will produce the same hash value with ELFHash?
I need these to test my code. But it doesn't seem easy to produce them. And to my surprise, there are a lot of example implementations of various hash functions on the internet, but none of them provides examples of colliding strings.
Below is the ELF Hash, in case you need it.
unsigned int ELFHash(const std::string& str)
{
    unsigned int hash = 0;
    unsigned int x = 0;
    for (std::size_t i = 0; i < str.length(); i++)
    {
        hash = (hash << 4) + str[i];
        if ((x = hash & 0xF0000000L) != 0)
        {
            hash ^= (x >> 24);
            hash &= ~x;
        }
    }
    return (hash & 0x7FFFFFFF);
}
You can find collisions with a brute-force search (e.g. by hashing every possible string of length less than 5).
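A minimal brute-force sketch (my own illustration, not the original search code): hash every string of 1 to 4 uppercase letters and print the first few pairs that land on the same ELF hash value.

#include <iostream>
#include <string>
#include <unordered_map>

// ELFHash as given in the question.
unsigned int ELFHash(const std::string& str)
{
    unsigned int hash = 0, x = 0;
    for (std::size_t i = 0; i < str.length(); i++)
    {
        hash = (hash << 4) + str[i];
        if ((x = hash & 0xF0000000L) != 0)
        {
            hash ^= (x >> 24);
            hash &= ~x;
        }
    }
    return (hash & 0x7FFFFFFF);
}

int main()
{
    std::unordered_map<unsigned int, std::string> seen;
    int reported = 0;

    for (int len = 1; len <= 4 && reported < 10; len++)
    {
        std::string s(len, 'A');
        while (true)
        {
            unsigned int h = ELFHash(s);
            auto it = seen.find(h);
            if (it == seen.end())
                seen.emplace(h, s);
            else if (++reported <= 10)
                std::cout << it->second << " and " << s << " both hash to " << h << "\n";

            // Advance s like an odometer over 'A'..'Z'.
            int pos = len - 1;
            while (pos >= 0 && s[pos] == 'Z') { s[pos--] = 'A'; }
            if (pos < 0) break;   // wrapped around: all strings of this length done
            s[pos]++;
        }
    }
    return 0;
}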
Some examples of collisions (found that way):
hash = 23114:
-------------
UMz
SpJ
hash = 4543841:
---------------
AAAAQ
AAABA
hash = 5301994:
---------------
KYQYZ
KYQZJ
KYRIZ
KYRJJ
KZAYZ

How to unpack (64-bit) unsigned long in 64-bit Perl?

I'm trying to unpack an unsigned long value that is passed from a C program to a Perl script via SysV::IPC.
It is known that the value is correct (I made a test which sends the same value into two queues, one read by Perl and the other by the C application), and all preceding values are read correctly (I used q instead of i! to work with 64-bit integers).
It is also known that PHP had a similar bug (search for "unsigned long on 64 bit machines"); this question seems related:
Pack / unpack a 64-bit int on 64-bit architecture in PHP
Arguments tested so far:
..Q ( = some value that is larger than expected)
..L ( = 0)
..L! ( = large value)
..l ( = 0)
..l! ( = large value)
..lN! ( = 0)
..N, ..N! ( = 0)
use bigint; use bignum; -- no effect.
Details:
sizeof(unsigned long) = 8;
Data::Dumper->new([$thatstring])->Useqq(1)->Dump(); shows a lot of null bytes along with some meaningful ones;
byteorder='12345678';
Solution:
- x4Q, i.e. skip four padding bytes before the quad value.
Unpacking using Q in the template works out of the box if you have 64-bit Perl:
The TEMPLATE is a sequence of characters that give the order
and type of values, as follows:
...
q A signed quad (64-bit) value.
Q An unsigned quad value.
(Quads are available only if your system supports 64-bit
integer values _and_ if Perl has been compiled to support those.
Causes a fatal error otherwise.)
For a more robust solution, unpack the value into an 8-byte string and use the Math::Int64 module to convert it to an integer:
use Math::Int64 qw( :native_if_available int64 );
...
$string_value = unpack("A8", $longint_from_the_C_program);
# one of these two functions will work, depending on your system's endian-ness
$int_value = Math::Int64::native_to_int64($string_value);
$int_value = Math::Int64::net_to_int64($string_value);
The solution was simple: add x4Q to skip four bytes before the actual value. I need to think more visually about padding/alignment.
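As an illustration of where those four bytes can come from (the real C message struct is not shown in the question, so this layout is hypothetical): a 32-bit field followed by an unsigned long makes a typical 64-bit compiler insert 4 padding bytes to align the long, and that padding is exactly what the leading x4 skips.

#include <cstddef>
#include <cstdio>

// Hypothetical message layout on the C side.
struct Message {
    int           type;    // 4 bytes
                           // 4 padding bytes inserted here on a typical 64-bit ABI
    unsigned long value;   // 8 bytes, aligned to an 8-byte boundary
};

int main()
{
    std::printf("offsetof(Message, value) = %zu\n", offsetof(Message, value)); // 8, not 4
    std::printf("sizeof(Message)          = %zu\n", sizeof(Message));          // 16
    return 0;
}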

What does the & symbol mean in Objective-C?

What does the & symbol mean in Objective-C? I am currently looking at data constructs and am getting really confused by it.
I have looked around the web for it but have not found an answer at all. I know this is probably a basic Objective-C concept, but I just can't get my head around it.
For example:
int *pIntData = (int *)&incomingPacket[0];
What is the code doing with incoming packet here?
& is the C address-of unary operator. It returns the memory address of its operand.
In your example, it will return the address of the first element of the incomingPacket array, which is then cast to an int* (pointer to int)
Same thing it means in C.
int *pIntData = (int *)&incomingPacket[0];
Basically this says that the address of the beginning of incomingPacket (&incomingPacket[0]) is a pointer to an int (int *). The local variable pIntData is defined as a pointer to an int, and is set to that value.
Thus:
*pIntData will be equal to the first int at the beginning of incomingPacket.
pIntData[0] is the same thing.
pIntData[5] will be the 6th int in incomingPacket.
Why do this? If you know the data you are being streamed is an array of ints, then this makes it easier to iterate through the ints.
This statement, if I am not mistaken, could also have been written as:
int *pIntData = (int *) incomingPacket;
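A small self-contained sketch (hypothetical packet contents; plain C-style code that works the same way in an Objective-C file) showing why the cast is convenient once you know the buffer holds ints:

#include <cstdio>
#include <cstring>

int main()
{
    int values[3] = {10, 20, 30};
    unsigned char incomingPacket[sizeof(values)];
    std::memcpy(incomingPacket, values, sizeof(values)); // pretend these bytes arrived over the network

    // View the byte buffer as an array of ints (assumes suitable alignment,
    // just as the original code does).
    int *pIntData = (int *)&incomingPacket[0];
    for (int i = 0; i < 3; i++)
        std::printf("int %d = %d\n", i, pIntData[i]);    // prints 10, 20, 30
    return 0;
}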