How can I make a good hash function without unsigned integers?

I'm looking for a simple hash function that doesn't rely on integer overflow, and doesn't rely on unsigned integers.
The problem is that I have to create the hash function in Blueprint in Unreal Engine (which only has signed 32-bit integers, with undefined overflow behavior) and in PHP5, with a version that uses 64-bit signed integers.
So when I use the 'common' simple hash functions, they don't give the same result on both platforms, because they all rely on the overflow behavior of unsigned integers.
The only thing that is really important is that it has good 'randomness'. Does anyone know something simple that would accomplish this?
It's meant for a very basic signing system for sending messages to a server. It doesn't need to be top security; it's for storing high scores of a simple game on a server. The idea is that I would generate several hash integers from the message (using different 'start numbers') and append them to make a hash signature. I just need to make sure that if people sniff the network messages sent to the server, they cannot easily send faked messages. They would need to provide the correct hash signature with their message, which they shouldn't be able to do unless they know the hash function being used. Of course, if they reverse engineer the game they can still 'hack' it, but I wouldn't know how to counter that...
I have no access to existing hash functions in the unreal engine blueprint system.

The first thing I would try would be to simulate the behavior of unsigned integers using signed integers, by explicitly applying the modulo operator whenever the accumulated hash-value gets large enough that it might risk overflowing.
Example code in C (apologies for the poor hash function, but the same technique should be applicable to any hash function, at least in principle):
#include <stdio.h>
#include <string.h>

int hashFunction(const char * buf, int numBytes)
{
    const int multiplier = 33;
    const int maxAllowedValue = 2147483648-256; // assuming 32-bit ints here
    const int maxPreMultValue = maxAllowedValue/multiplier;
    int hash = 536870912; // arbitrary starting number
    for (int i=0; i<numBytes; i++)
    {
        hash = hash % maxPreMultValue; // make sure hash cannot overflow in the next operation!
        // read each byte as 0..255 so the result is the same whether char is signed or not
        hash = (hash*multiplier)+(unsigned char)buf[i];
    }
    return hash;
}

int main(int argc, char ** argv)
{
    char buf[1024];
    while(1)
    {
        printf("Enter a string to hash:\n");
        if (fgets(buf, sizeof(buf), stdin) == NULL) break; // stop at end of input
        printf("Hash code for that string is: %i\n", hashFunction(buf, (int)strlen(buf)));
    }
    return 0;
}
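The question mentions building a signature from several hashes with different 'start numbers'. A minimal C++ sketch of that idea, using the same modulo technique and only signed 32-bit arithmetic, might look like the following. The names `boundedHash` and `signature` are hypothetical, and the sketch assumes every start number is non-negative.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: multiplicative hash that never exceeds the signed
// 32-bit range. Assumes start >= 0.
std::int32_t boundedHash(const std::string& msg, std::int32_t start) {
    const std::int32_t multiplier = 33;
    const std::int32_t maxAllowed = 2147483647 - 255;  // room for one added byte
    const std::int32_t maxPreMult = maxAllowed / multiplier;
    std::int32_t hash = start;
    for (unsigned char byte : msg) {
        hash %= maxPreMult;               // keep hash * multiplier in range
        hash = hash * multiplier + byte;  // byte is 0..255, so no overflow
    }
    return hash;
}

// Hypothetical helper: one hash per start number, in order.
std::vector<std::int32_t> signature(const std::string& msg,
                                    const std::vector<std::int32_t>& starts) {
    std::vector<std::int32_t> parts;
    parts.reserve(starts.size());
    for (std::int32_t start : starts) {
        parts.push_back(boundedHash(msg, start));
    }
    return parts;
}
```

Because every intermediate value stays between 0 and 2147483647, the same steps should give the same numbers on a platform with 64-bit signed integers, provided both sides read the message as the same bytes.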

Related

c++ libtomcrypt library outputting shorter hashes/truncated hashes

I am trying to generate hashes to use in a blockchain project. While looking for a crypto library I stumbled across tomcrypt and chose it since it was easy to install, but now I have a problem: when I create the hashes (I'm using SHA3_512, but the bug is present with every other SHA hashing algorithm), it sometimes outputs the correct hash but truncated.
[Image: hash truncation example]
This is the code for the hashing function:
string hashSHA3_512(const std::string& input) {
    //Initial
    unsigned char* hashResult = new unsigned char[sha3_512_desc.hashsize];
    //Initialize a state variable for the hash
    hash_state md;
    sha3_512_init(&md);
    //Process the text - remember you can call process() multiple times
    sha3_process(&md, (const unsigned char*) input.c_str(), input.size());
    //Finish the hash calculation
    sha3_done(&md, hashResult);
    // Convert to string
    string stringifiedHash(reinterpret_cast<char*>(hashResult));
    // Return the result
    return stringToHex(stringifiedHash);
}
And here is the code for the toHex function, although I already checked and the truncation happens before this function is called:
string stringToHex(const std::string& input)
{
    static const char hex_digits[] = "0123456789abcdef";
    std::string output;
    output.reserve(input.length() * 2);
    for (unsigned char c : input)
    {
        output.push_back(hex_digits[c >> 4]);
        output.push_back(hex_digits[c & 15]);
    }
    return output;
}
If someone knows this library, or this problem in general and possible fixes, please explain it to me; I have been stuck for 3 days.
UPDATE
I figured out the program is truncating the hashes when it encounters 2 consecutive zeros in hex, so 8 zero bits in binary (that is, a single zero byte), but I still don't understand why. If you do, please let me and hopefully other people with the same problem know.
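A likely explanation for the observation in the update: constructing a std::string from a bare char* treats the buffer as a C string, so copying stops at the first zero byte. The sketch below reproduces this without libtomcrypt; `fromCString` and `fromBytes` are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <string>

// Mirrors the conversion in the question: the constructor scans for a
// terminating zero byte, so everything after the first 0x00 is dropped.
std::string fromCString(const unsigned char* bytes) {
    return std::string(reinterpret_cast<const char*>(bytes));
}

// Passing the length explicitly keeps every byte, including zeros.
std::string fromBytes(const unsigned char* bytes, std::size_t length) {
    return std::string(reinterpret_cast<const char*>(bytes), length);
}
```

Under that assumption, building the string in hashSHA3_512 with the explicit digest length (sha3_512_desc.hashsize in the question's code) should avoid the truncation. Note also that a digest with no zero byte at all would make the one-argument constructor read past the end of the buffer.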

Seed for hash-table non cryptographic hash functions

If one sets the hash table seed to a random number during resize or table creation, will that prevent DDoS attacks on such a hash table, or will an attacker who knows the hash algorithm still easily get around the seed? What if the algorithm uses the Pearson hash function with randomly generated tables, unknown to the attacker? Does such a table-based hash still need a seed, or is it safe enough?
Context: I want to use an on-disk hash table for a key-value database for my toy web server, where the keys may depend on the user input.
There are several approaches to protecting a hash subsystem from an "adverse selection" attack. The most popular is universal hashing, where the hash function or one of its properties is selected randomly at initialization.
In my own approach, I use a single hash function in which each char is added to the result with non-linear mixing that depends on a random uint32_t[256] array. The array is created during system initialization; in my code this happens at each start, by reading /dev/urandom. See my implementation in the open source emerSSL program. You're welcome to borrow the entire hash-table implementation, or only the hash function.
Currently, the hash function in the referenced source computes two independent hashes for a double-hashing search algorithm.
Here is a "reduced" hash function from that source, to demonstrate the idea of non-linear mixing with an S-block array:
#include <stdint.h>

uint32_t S_block[0x100]; // Substitution block, filled with random contents at startup
#define NLF(h, c) (S_block[(unsigned char)((c) + (h))] ^ (c))
#define ROL(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

int32_t hash(const char *key) {
    uint32_t h = 0x1F351F35; // Barker code * 2
    char c;
    for(int i = 0; (c = key[i]) != 0; i++) {
        h = ROL(h, 5);   // rotate the accumulated hash left by 5 bits
        h += NLF(h, c);  // mix in the next char through the S-block
    }
    return h;
}
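As a rough illustration of the "random table chosen at startup" idea, here is a C++ sketch that fills the table from std::random_device and passes it to the hash explicitly. This is a variant written for illustration, not the emerSSL code: `SBlock`, `makeRandomSBlock` and `seededHash` are hypothetical names, and bytes are read as unsigned, which differs slightly from the macro version above when char is signed.

```cpp
#include <array>
#include <cstdint>
#include <random>
#include <string>

using SBlock = std::array<std::uint32_t, 256>;

// Fill the substitution table once, e.g. at process start.
// std::random_device stands in for reading /dev/urandom.
SBlock makeRandomSBlock() {
    std::random_device rd;
    SBlock table;
    for (auto& entry : table) {
        entry = rd();
    }
    return table;
}

// Rotate-and-add hash whose mixing step depends on the secret table.
std::uint32_t seededHash(const SBlock& sblock, const std::string& key) {
    std::uint32_t h = 0x1F351F35u;
    for (unsigned char c : key) {
        h = (h << 5) | (h >> 27);                          // rotate left by 5
        h += sblock[static_cast<unsigned char>(c + h)] ^ c; // table lookup mix
    }
    return h;
}
```

With the same table the hash is repeatable within one run; a different table generally gives different values for the same key, which is what makes collisions hard to precompute for someone who does not know the table.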

cmph Minimal perfect hashing

I've spent days trying to make the library work on my system.
The library has several algorithms which generate MPHFs.
My understanding of minimal hash function is, that when I hash two distinct keys using the MPHF, they'll return two different ids.
This does not seem to be the case with the 2 million keys that I've generated (integers, read as strings by the algorithm). I've tried a couple of the algorithms that the library implements, but all of them result in duplicate 'ids' for a lot of keys.
Here is what I've written:
#include <cmph.h>
#include <iostream>
#include <fstream>
#include <bitset>
#include <string>
#include <sstream>
#include <limits.h>
using namespace std;

int main(int argc, char** argv){
    FILE *fp = fopen("keys.txt", "r");
    FILE *read = fopen("keys2.txt", "r");
    ofstream ids("ids2.txt");
    if(!fp || !read || !ids.is_open()){
        cerr<<"Failed to open the file\n";
        exit(1);
    }
    cmph_t* hash = NULL;
    // source of keys
    cmph_io_adapter_t *source = cmph_io_nlfile_adapter(fp);
    cmph_config_t *config = cmph_config_new(source);
    cmph_config_set_algo(config, CMPH_BDZ);
    hash = cmph_new(config);
    cmph_config_destroy(config);
    char *k = (char *)malloc(sizeof(12));
    while(fgets(k, INT_MAX, read) != NULL){
        string key = k;
        unsigned int id = cmph_search(hash, k, (cmph_uint32)key.length());
        ids<<id<<"\n";
    }
    cmph_destroy(hash);
    cmph_io_nlfile_adapter_destroy(source);
    fclose(fp);
    fclose(read);
    ids.close();
}
Shouldn't the ids be unique for every distinct key if the algorithm claims to generate a minimal perfect hash function? There are 2048383 keys. For my project I would need the ids to map from 0 to 2048382, since I plan to use a minimal perfect hash function.
I am not sure where I am going wrong with my understanding.
Please help.
If your keys2.txt contains keys that weren't part of the set that was used to generate your hash, then, by the definition of an MPHF, you'll get either duplicate hashes or, possibly, values out of your range. It's up to you to store all the keys that were used to generate the hash, and then verify that the key passed to cmph_search is the same as the stored key for the hash/id that cmph_search returned.
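The verification step described above can be sketched in C++ as follows. To keep the example self-contained, the lookup is passed in as a callable that stands in for a call such as cmph_search; `checkedLookup` and `keysById` are hypothetical names, and `keysById[id]` is assumed to hold the original key that the hash function maps to `id`.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Returns true and sets idOut only if `key` is the stored key for its id.
// `lookup` stands in for the minimal perfect hash lookup.
bool checkedLookup(const std::vector<std::string>& keysById,
                   const std::function<std::uint32_t(const std::string&)>& lookup,
                   const std::string& key,
                   std::uint32_t& idOut) {
    const std::uint32_t id = lookup(key);
    if (id >= keysById.size()) {
        return false;  // id out of range: key was not in the build set
    }
    if (keysById[id] != key) {
        return false;  // id belongs to a different key: not in the build set
    }
    idOut = id;
    return true;
}
```

The cost of this design is that the original keys have to be kept somewhere, indexed by id, since the hash function alone cannot tell a member from a non-member.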

Does 'mixing' the result of a 32-bit hash to create a 64-bit hash have any value?

For example, if you're programming in Java, and you want to create a 64-bit hash function for an arbitrary object, does it make sense to apply something like murmurHash3's 'finalizer' to the result of Object.hashCode()?
Specifically, is the following hash function
long Mix(int i)
{
    long result = i;
    return result ^ (result << 32) ^ (result << 33); // Or some 'better' way of mixing up the bits of i.
}
long Hash(Object o)
{
    return Mix(o.hashCode());
}
better than simply doing
long Hash(Object o)
{
    return o.hashCode();
}
(I'm well aware that the second one gives you nothing over a 32-bit hash)
The hash is going to be used to implement (recursive) hash-join, and the buckets are going to be determined by doing hash % prime. A concern is that it's going to be hard to make a good sequence of independent hash functions for the 'recursive' part if we only have 32-bits to start out with.
I'm thinking the answer is 'no', and that you really need to start out with a 64-bit hash which was computed directly from the value of the object.
I guess a side question is whether you actually need a 64-bit hash in the first place for the purposes of hash-join.
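For reference, here is what applying a 64-bit finalizer to a 32-bit hash looks like, sketched in C++ rather than Java. The constants are those of MurmurHash3's 64-bit finalizer; `widen` is a hypothetical name. The sketch only illustrates the point the question already suspects: the mix is a function of the 32-bit input alone, so two objects with the same 32-bit hash still collide, and at most 2^32 distinct 64-bit values can come out.

```cpp
#include <cstdint>

// MurmurHash3 64-bit finalizer: an invertible bit mixer.
std::uint64_t fmix64(std::uint64_t k) {
    k ^= k >> 33;
    k *= 0xff51afd7ed558ccdULL;
    k ^= k >> 33;
    k *= 0xc4ceb9fe1a85ec53ULL;
    k ^= k >> 33;
    return k;
}

// Hypothetical helper: zero-extend a 32-bit hash, then mix it.
std::uint64_t widen(std::int32_t hash32) {
    return fmix64(static_cast<std::uint32_t>(hash32));
}
```

What the mixing can do is spread the 32 input bits over all 64 output bits, which may help when bucket indices are taken from different parts of the value; it cannot make the result behave like a hash computed from the object itself.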

How to hash with ed25519-donna

I apologize for asking somewhat of a programming question, but I want to be sure I'm properly using this library cryptographically.
I have managed to implement ed25519-donna except for hashing the data for a signature.
As far as I can tell, this is the function that hashes data:
void ed25519_hash(uint8_t *hash, const uint8_t *in, size_t inlen);
but I can't figure out what *hash is. I'm fairly certain that *in and inlen are the data to be hashed and its length.
Is it something specific to SHA512?
How can one hash with ed25519-donna?
Program hangs
I've compiled with ed25519-donna-master/ed25519.o and the OpenSSL flags -lssl -lcrypto. The key generation, signing, and verification functions work as expected.
It's running without error, but the application hangs on these lines, and the cores are not running at 100%, so I don't think it's busy processing:
extern "C"
{
#include "ed25519-donna-master/ed25519.h"
#include "ed25519-donna-master/ed25519-hash.h"
}
#include <openssl/rand.h>
unsigned char* hash;
const unsigned char* in = convertStringToUnsignedCharStar( myString );
std::cout << in << std::endl;
std::cout << "this is the last portion output and 'in' outputs correctly" << std::endl;
ed25519_hash(hash, in, sizeof(in) );
std::cout << hash << std::endl;
std::cout << "this is never output" << std::endl;
How can this code be modified so that ed25519_hash can function? It works the same way regardless of whether hash and in are unsigned char* or uint8_t*s.
For uint8_t*, I used this code:
uint8_t* hash;
const uint8_t* in = reinterpret_cast<const uint8_t*>(myString.c_str());
“…but I can't figure out what *hash is.”
That uint8_t *hash is the buffer (unsigned char*) that will contain the resulting hash after you call the function. The caller has to provide that buffer; the function only writes into it.
So, you're looking at a function that expects 3 parameters (also known as arguments):
a uint8_t * buffer to hold the resulting hash,
the input data to be hashed,
the length of the input data to be hashed.
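To illustrate the three parameters, here is a C++ sketch of a caller that owns a 64-byte output buffer (the readme quoted below requires a 512-bit digest) and passes the real data length instead of sizeof on a pointer. The function to call is passed in as a pointer with the same signature as ed25519_hash so that the sketch compiles without the library; `HashFn`, `Digest512` and `hashString` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// Same shape as: void ed25519_hash(uint8_t *hash, const uint8_t *in, size_t inlen);
using HashFn = void (*)(std::uint8_t* hash, const std::uint8_t* in, std::size_t inlen);
using Digest512 = std::array<std::uint8_t, 64>;  // 512 bits

Digest512 hashString(HashFn hashFn, const std::string& message) {
    Digest512 digest{};  // caller-owned output buffer, never an uninitialized pointer
    hashFn(digest.data(),
           reinterpret_cast<const std::uint8_t*>(message.data()),
           message.size());  // actual byte count, not sizeof(pointer)
    return digest;
}
```

In the question's code, `hash` is an uninitialized pointer and `sizeof(in)` is the size of a pointer, so the call writes to an arbitrary address and hashes the wrong number of bytes; either could plausibly explain the observed misbehavior.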
“Is it something specific to SHA512?”
Nope, it's regular C source. But I think you’re a bit confused by the documentation. It states…
If you are not compiling against OpenSSL, you will need a hash function.
…
To use a custom hash function, use -DED25519_CUSTOMHASH
when compiling ed25519.c and put your custom hash implementation
in ed25519-hash-custom.h. The hash must have a 512bit digest and
implement
…
void ed25519_hash(uint8_t *hash, const uint8_t *in, size_t inlen);
So you only need this function if you are not compiling against OpenSSL and are implementing your own hash function. Looking at your code, you are compiling against OpenSSL, which means you're playing with the wrong function.
“How can one hash with ed25519-donna?”
By using the provided functionality the library offers.
Your question makes me wonder if you scrolled down to the “Usage” part of the readme, because it completely answers your question and tells you what functions to use.
For your convenience, let me point you to the part of the documentation you need to follow and where you find the functions you need to hash, sign, verify etc. using ed25519-donna:
To use the code, link against ed25519.o -mbits and:
#include "ed25519.h"
Add -lssl -lcrypto when using OpenSSL (Some systems don't
need -lcrypto? It might be trial and error).
To generate a private key, simply generate 32 bytes from a secure cryptographic source:
ed25519_secret_key sk;
randombytes(sk, sizeof(ed25519_secret_key));
To generate a public key:
ed25519_public_key pk;
ed25519_publickey(sk, pk);
To sign a message:
ed25519_signature sig;
ed25519_sign(message, message_len, sk, pk, signature);
To verify a signature:
int valid = ed25519_sign_open(message, message_len, pk, signature) == 0;
To batch verify signatures:
const unsigned char *mp[num] = {message1, message2..}
size_t ml[num] = {message_len1, message_len2..}
const unsigned char *pkp[num] = {pk1, pk2..}
const unsigned char *sigp[num] = {signature1, signature2..}
int valid[num]
/* valid[i] will be set to 1 if the individual signature was valid, 0 otherwise */
int all_valid = ed25519_sign_open_batch(mp, ml, pkp, sigp, num, valid) == 0;
…
As you see, it's all in there… just follow the documentation.