D language: unsigned hash of a string

I am a complete beginner with the D language.
How can I get some hash of a string as a uint (an unsigned 32-bit integer) in the D language?
I need a quick and dirty hash code (I don't care much about "randomness" or the absence of collisions; I care slightly more about performance).
import std.digest.crc;
uint string_hash(string s) {
    return crc32Of(s);
}
does not compile, because crc32Of returns a ubyte[4] rather than a uint...
(using gdc-5 on Linux/x86-64 with phobos-2)

While Adam's answer does exactly what you're looking for, you can also use a union to do the conversion.
This is a pretty useful trick, so I may as well put it here:
/**
 * Returns the crc32Of hash of a string.
 * Uses a union to store the ubyte[4]
 * and then simply reads that memory as a uint.
 */
uint string_hash(string s){
    import std.digest.crc;
    union hashUnion{
        ubyte[4] hashArray;
        uint hashNumber;
    }
    hashUnion x;
    x.hashArray = crc32Of(s); // stores the result of crc32Of into the array.
    return x.hashNumber;      // reads the exact same memory as hashArray,
                              // but reads it as a uint.
}

A really quick way is just this:
uint string_hash(string s) {
    import std.digest.crc;
    auto r = crc32Of(s);
    return *(cast(uint*) r.ptr);
}
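Since crc32Of returns a ubyte[4] instead of the uint you want, a conversion is necessary. Because ubyte[4] and uint are both 4 bytes, we can simply reinterpret the bytes with the pointer cast shown above, which costs nothing at runtime. Note that the resulting number depends on the machine's byte order, and on some non-x86 targets a ubyte[4] is not guaranteed to be aligned like a uint; std.bitmanip's littleEndianToNative!uint(crc32Of(s)) performs the same conversion independently of the host's byte order.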
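For comparison, here is a C sketch of the two ways to turn 4 raw bytes into a 32-bit unsigned integer; the function names are hypothetical. bytes_as_native_u32 does what the D pointer cast does (read the bytes in the host's byte order), but through memcpy, which sidesteps alignment and aliasing concerns and is usually compiled to a single load. bytes_as_le_u32 assembles the value explicitly as little-endian, so it gives the same result on every host. That the CRC32 digest bytes are stored little-endian is my reading of Phobos and is worth double-checking.

```c
#include <stdint.h>
#include <string.h>

/* Reads 4 bytes as a uint32_t in the host's byte order (like the D cast).
 * memcpy avoids misaligned loads and strict-aliasing problems. */
uint32_t bytes_as_native_u32(const unsigned char b[4])
{
    uint32_t v;
    memcpy(&v, b, sizeof v);
    return v;
}

/* Assembles the value explicitly as little-endian (b[0] is the lowest
 * byte), so the result does not depend on the host.
 * Assumption: the digest bytes being converted are little-endian. */
uint32_t bytes_as_le_u32(const unsigned char b[4])
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}
```

On a little-endian machine such as x86-64 both functions return the same value; on a big-endian machine only bytes_as_le_u32 keeps it unchanged.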

Related

In DPI-C, how to map a data type to reg or wire

I am writing a CRC16 function in C to use in SystemVerilog.
The requirements are:
The output of CRC16 is 16 bits.
The input of CRC16 is more than 72 bits.
The difficulty is that I don't know whether DPI-C can map a reg/wire data type in SystemVerilog to C or not.
What is the maximum reg/wire width that DPI-C supports?
Can anybody help me?
Stay with compatible types across the language boundary. For the output, use shortint. For the input, use an array of byte in SystemVerilog, which maps to an array of char in C.
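A minimal sketch of the C side following that advice. The function name crc16, the SystemVerilog import line in the comment, and the CRC variant (CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF, no reflection, no final XOR) are all assumptions, since the question does not name a specific CRC16. It assumes a fixed-size byte array on the SystemVerilog side, which the C function receives as a pointer to its elements.

```c
#include <stdint.h>

/* Hypothetical SystemVerilog declaration (72 bits = 9 bytes):
 *   import "DPI-C" function shortint crc16(input byte data[9], input int len);
 * shortint maps to C short; byte maps to C char.
 * CRC variant (assumed): CRC-16/CCITT-FALSE, MSB first. */
short crc16(const char *data, int len)
{
    uint16_t crc = 0xFFFF;                          /* initial value */
    for (int i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint8_t)data[i] << 8);   /* feed next byte, MSB first */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x1021);  /* shift and apply polynomial */
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return (short)crc;  /* same 16 bits, returned as SystemVerilog shortint */
}
```

For this variant the standard check value, the CRC of the ASCII string "123456789", is 0x29B1, which is a quick way to confirm the C side before wiring it into the testbench.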
DPI supports any bit width, converting packed arrays into C arrays. The question is: what are you going to do with 72-bit data on the C side?
On the C side, svBitVecVal (for 2-state bit vectors) and svLogicVecVal (for 4-state logic vectors) can be used to retrieve the values. See sections H.7.6 and H.7.7 of the LRM for more information.
Here is an example from LRM section H.10.2 for 4-state data (logic):
SystemVerilog:
typedef struct {int x; int y;} pair;
import "DPI-C" function void f1(input int i1, pair i2, output logic [63:0] o3);
C:
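#include <stdio.h>
#include "svdpi.h"                       /* declares svLogicVecVal */

typedef struct {int x; int y;} pair;     /* must match the SystemVerilog typedef */

void f1(const int i1, const pair *i2, svLogicVecVal* o3)
{
    int tab[8];
    printf("%d\n", i1);
    o3[0].aval = i2->x;                  /* bits [31:0] of o3 */
    o3[0].bval = 0;                      /* bval = 0: plain 0/1 values, no X/Z */
    o3[1].aval = i2->y;                  /* bits [63:32] of o3 */
    o3[1].bval = 0;
    ...
}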
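If the data stays a packed 2-state vector on the SystemVerilog side (for example bit [71:0]), the C side receives it as an array of svBitVecVal words, with word 0 holding bits [31:0]. The sketch below converts such a vector into bytes so it could be fed to a byte-oriented CRC. To stay self-contained it defines a stand-in svBitVecVal typedef (svdpi.h declares it as a 32-bit unsigned integer); in a real DPI build you would include "svdpi.h" instead. The helper name packed_to_bytes and the least-significant-byte-first order are assumptions.

```c
#include <stdint.h>

/* Stand-in for svdpi.h's svBitVecVal (a 32-bit unsigned word).
 * Remove this typedef and #include "svdpi.h" in a real DPI build. */
typedef uint32_t svBitVecVal;

/* Hypothetical helper: copies the low nbytes of a packed 2-state vector
 * into out[], least significant byte first. In the canonical DPI layout,
 * v[0] holds bits [31:0], v[1] bits [63:32], v[2] bits [95:64], ... */
void packed_to_bytes(const svBitVecVal *v, int nbytes, unsigned char *out)
{
    for (int i = 0; i < nbytes; i++)
        out[i] = (unsigned char)(v[i / 4] >> (8 * (i % 4)));
}
```

For bit [71:0], a hypothetical import such as import "DPI-C" function shortint crc16_72(input bit [71:0] data); would receive const svBitVecVal *data on the C side, and packed_to_bytes(data, 9, buf) would produce the 9 bytes to run the CRC over.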

c++ libtomcrypt library outputting shorter hashes/truncated hashes

I am trying to generate hashes for a blockchain project. While looking for a crypto library I stumbled across libtomcrypt and chose it because it was easy to install, but now I have a problem: when I create the hashes (I'm using SHA3-512, but the bug is present in every other SHA hashing algorithm too), it sometimes outputs the correct hash, but truncated.
(Screenshot: hash truncating example)
This is the code for the hashing function:
string hashSHA3_512(const std::string& input) {
    //Initial
    unsigned char* hashResult = new unsigned char[sha3_512_desc.hashsize];
    //Initialize a state variable for the hash
    hash_state md;
    sha3_512_init(&md);
    //Process the text - remember you can call process() multiple times
    sha3_process(&md, (const unsigned char*) input.c_str(), input.size());
    //Finish the hash calculation
    sha3_done(&md, hashResult);
    // Convert to string
    string stringifiedHash(reinterpret_cast<char*>(hashResult));
    // Return the result
    return stringToHex(stringifiedHash);
}
And here is the code for the stringToHex function, even though I already checked that the truncation happens before this function is called:
string stringToHex(const std::string& input)
{
    static const char hex_digits[] = "0123456789abcdef";
    std::string output;
    output.reserve(input.length() * 2);
    for (unsigned char c : input)
    {
        output.push_back(hex_digits[c >> 4]);
        output.push_back(hex_digits[c & 15]);
    }
    return output;
}
If anyone knows this library, or this problem in general, and possible fixes, please explain; I've been stuck for 3 days.
UPDATE
I figured out that the program truncates the hashes when it encounters 2 consecutive zeros in hex, i.e. 8 zero bits (a single zero byte), but I still don't understand why. If you do, please let me, and hopefully other people with the same problem, know.
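The zero byte is the clue. The line string stringifiedHash(reinterpret_cast<char*>(hashResult)); uses the std::string constructor that treats its argument as a NUL-terminated C string, so copying stops at the first 0x00 byte (two hex zeros are one byte). Because hashResult is not NUL-terminated at all, a hash without any zero byte can even be read past the end of its buffer. Passing the length explicitly, as in std::string(reinterpret_cast<char*>(hashResult), sha3_512_desc.hashsize), avoids this, and hashResult should also be released with delete[]. The C sketch below shows the same idea: hex-encode a known number of bytes instead of relying on a terminator. The function name to_hex is hypothetical.

```c
#include <stddef.h>

/* Hex-encodes exactly len bytes of binary data into out, which must have
 * room for 2 * len + 1 characters. A hash may contain 0x00 bytes, so its
 * length must be passed in rather than found by looking for a terminator. */
void to_hex(const unsigned char *buf, size_t len, char *out)
{
    static const char digits[] = "0123456789abcdef";
    for (size_t i = 0; i < len; i++) {
        out[2 * i]     = digits[buf[i] >> 4];   /* high nibble */
        out[2 * i + 1] = digits[buf[i] & 15];   /* low nibble */
    }
    out[2 * len] = '\0';
}
```

For SHA3-512 the length is 64 bytes (sha3_512_desc.hashsize), giving 128 hex characters whether or not the digest contains zero bytes.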

How can I make a good hash function without unsigned integers?

I'm looking for a simple hash function that doesn't rely on integer overflow, and doesn't rely on unsigned integers.
The problem is that I have to create the hash function both in Blueprint in Unreal Engine (which only has signed 32-bit integers, with undefined overflow behavior) and in PHP5, in a version that uses 64-bit signed integers.
So when I use the common simple hash functions, they don't give the same result on both platforms, because they all rely on the wrap-around behavior of unsigned integers.
The only thing that really matters is that it has good "randomness". Does anyone know of something simple that would accomplish this?
It's meant for a very basic signing system for messages sent to a server. It doesn't need to be highly secure; it's for storing high scores of a simple game on a server. The idea is that I would generate several hash integers from the message (using different "start numbers") and append them to make a hash signature. I just need to make sure that people who sniff the network messages sent to the server cannot easily send faked messages. They would need to provide the correct hash signature with their message, which they shouldn't be able to do unless they know the hash function being used. Of course, if they reverse engineer the game they can still "hack" it, but I wouldn't know how to counter that...
I have no access to existing hash functions in the unreal engine blueprint system.
The first thing I would try would be to simulate the behavior of unsigned integers using signed integers, by explicitly applying the modulo operator whenever the accumulated hash-value gets large enough that it might risk overflowing.
Example code in C (apologies for the poor hash function, but the same technique should be applicable to any hash function, at least in principle):
#include <stdio.h>
#include <string.h>

int hashFunction(const char * buf, int numBytes)
{
    const int multiplier = 33;
    const int maxAllowedValue = 2147483647 - 255; // assuming 32-bit ints here
    const int maxPreMultValue = maxAllowedValue/multiplier;
    int hash = 536870912; // arbitrary starting number
    for (int i=0; i<numBytes; i++)
    {
        hash = hash % maxPreMultValue; // make sure hash cannot overflow in the next operation!
        // read each byte as 0..255, so the result doesn't depend on whether char is signed
        hash = (hash*multiplier) + (unsigned char)buf[i];
    }
    return hash;
}

int main(int argc, char ** argv)
{
    while(1)
    {
        printf("Enter a string to hash:\n");
        char buf[1024];
        if (fgets(buf, sizeof(buf), stdin) == NULL) break; // stop at end of input
        printf("Hash code for that string is: %i\n", hashFunction(buf, (int)strlen(buf)));
    }
    return 0;
}
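The question's idea of a signature built from several differently seeded hashes could be sketched on top of the same technique. Everything here is hypothetical: the names seededHash and makeSignature, the 8-hex-digit encoding, and the requirement that seeds be non-negative, which keeps every hash between 0 and 2^31 - 1 so the same values are easy to reproduce with PHP's 64-bit integers. Reading bytes as 0 to 255 mirrors what PHP's ord() returns.

```c
#include <stdio.h>

/* Same overflow-free technique as hashFunction, with the starting value
 * ("seed") passed in. Assumes seed >= 0, so the hash never goes negative. */
int seededHash(const char *buf, int numBytes, int seed)
{
    const int multiplier = 33;
    const int maxPreMultValue = (2147483647 - 255) / multiplier;
    int hash = seed;
    for (int i = 0; i < numBytes; i++) {
        hash = hash % maxPreMultValue;              /* keeps hash*33 + 255 below 2^31 */
        hash = hash * multiplier + (unsigned char)buf[i];
    }
    return hash;
}

/* Hypothetical signature: one hash per seed, each printed as 8 hex digits
 * and concatenated. out must hold at least 8 * nSeeds + 1 characters. */
void makeSignature(const char *msg, int len, const int *seeds, int nSeeds, char *out)
{
    for (int k = 0; k < nSeeds; k++)
        sprintf(out + 8 * k, "%08x", (unsigned)seededHash(msg, len, seeds[k]));
    out[8 * nSeeds] = '\0';   /* also terminates the string when nSeeds == 0 */
}
```

Because every intermediate value stays below 2^31, a PHP version using plain integer arithmetic and % should produce the same numbers without depending on overflow behavior.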

C programming on IAR: timestamp conversion to readable format

I am using Z-Stack-CC2530-2.5 to develop a ZigBee-based application, and I've come across a timestamp conversion problem.
I am using osal_ConvertUTCTime method to convert a uint32 timestamp value to timestampStruct as follows:
osal_ConvertUTCTime(& timestampStruct, timestamp);
The Struct is defined as follows:
typedef struct{
    uint8 seconds;
    uint8 min;
    uint8 hour;
    uint8 day;
    uint8 month;
    uint16 year;
} UTCTimeStruct;
My Question:
How can I write the struct's contents to the UART port in a human-readable format?
Example:
HalUARTWrite (Port0, timestampStruct, len) // Output: 22/1/2013 12:05:45
Thank you.
I don't have the prototype of the function HalUARTWrite at hand, but I googled it and found someone using it like this:
HalUARTWrite(DEBUG_UART_PORT, "12345", 6);
so I guess the second argument must be a pointer to char. You can't just pass a UTCTimeStruct variable as the second argument. If you just need to send the raw data to the serial port, you have to cast (the address of) the struct to char * to make the compiler happy, but in general this is bad practice. It might not be a problem in your case, since you are working on an 8-bit processor and all the struct fields are either a char or a short. In general, though, if you cast a struct to a char * and print it, struct padding puts meaningless bytes between your struct fields.
OK, a bit off topic. Back to your question: you need to convert the struct into a human-readable string yourself. If you zero-pad the fields, the output format "22/01/2013 12:05:45" has a fixed length (19 characters), so you can simply declare a char array of that length plus one for the terminator and fill in the digits from the struct fields (for example with sprintf). After that, you can pass the char array as the second argument and the exact length as the third.
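A minimal sketch of that conversion, assuming your IAR runtime library provides snprintf (if not, sprintf into a 20-byte buffer works the same way). The uint8/uint16 typedefs are stand-ins for Z-Stack's own so the example is self-contained, and the name formatUTCTime is hypothetical. The fields are printed as stored; check your OSAL_Clock.h, because if your Z-Stack version stores day and month zero-based you would add 1 to each before printing.

```c
#include <stdio.h>
#include <stddef.h>

/* Stand-ins for Z-Stack's typedefs (drop these when building against Z-Stack). */
typedef unsigned char  uint8;
typedef unsigned short uint16;

/* Field layout as given in the question. */
typedef struct {
    uint8  seconds;
    uint8  min;
    uint8  hour;
    uint8  day;
    uint8  month;
    uint16 year;
} UTCTimeStruct;

/* Formats t as "DD/MM/YYYY hh:mm:ss"; zero-padding keeps it at 19 characters.
 * Returns the number of characters written (excluding the NUL), as snprintf does. */
int formatUTCTime(const UTCTimeStruct *t, char *out, size_t outSize)
{
    return snprintf(out, outSize, "%02u/%02u/%04u %02u:%02u:%02u",
                    (unsigned)t->day, (unsigned)t->month, (unsigned)t->year,
                    (unsigned)t->hour, (unsigned)t->min, (unsigned)t->seconds);
}
```

With char buf[20], the result could then be sent with something like HalUARTWrite(Port0, (uint8 *)buf, n), where n is the return value, assuming the buffer parameter is a byte pointer as in the example found above.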

Using memcpy/memset

When using memset or memcpy within an Obj-C program, will the compiler optimise the setting (memset) or copying (memcpy) of data into 32-bit writes or will it do it byte by byte?
You can see the libc implementations of these functions in the Darwin source. In Mac OS X 10.6.3, memset works at the word level. I didn't check memcpy, but it is probably the same.
You are correct that it's possible for the compiler to do the work inline instead of calling these functions. I suppose I'll let someone who knows better answer what it will do, though I would not expect a problem.
memset comes as part of your standard C library, so it depends on the implementation you are using. I would guess most implementations copy in blocks of the native CPU word size (32/64 bits) and then copy the remainder byte by byte.
Here is glibc's version of memcpy for an example implementation:
#include <string.h>
#include <memcopy.h>   /* glibc-internal: OP_T_THRES, OPSIZ, BYTE_COPY_FWD, WORD_COPY_FWD */
#include <pagecopy.h>  /* glibc-internal: PAGE_COPY_FWD_MAYBE */

void *
memcpy (dstpp, srcpp, len)
     void *dstpp;
     const void *srcpp;
     size_t len;
{
  unsigned long int dstp = (long int) dstpp;
  unsigned long int srcp = (long int) srcpp;
  /* Copy from the beginning to the end. */
  /* If there not too few bytes to copy, use word copy. */
  if (len >= OP_T_THRES)
    {
      /* Copy just a few bytes to make DSTP aligned. */
      len -= (-dstp) % OPSIZ;
      BYTE_COPY_FWD (dstp, srcp, (-dstp) % OPSIZ);
      /* Copy whole pages from SRCP to DSTP by virtual address manipulation,
         as much as possible. */
      PAGE_COPY_FWD_MAYBE (dstp, srcp, len, len);
      /* Copy from SRCP to DSTP taking advantage of the known alignment of
         DSTP. Number of bytes remaining is put in the third argument,
         i.e. in LEN. This number may vary from machine to machine. */
      WORD_COPY_FWD (dstp, srcp, len, len);
      /* Fall out and copy the tail. */
    }
  /* There are just a few bytes to copy. Use byte memory operations. */
  BYTE_COPY_FWD (dstp, srcp, len);
  return dstpp;
}
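So you can see it first copies a few bytes to align the destination, then copies whole words, and finally copies the remaining bytes one at a time. On systems that support it (e.g. GNU Hurd, where it uses Mach's vm_copy), PAGE_COPY_FWD_MAYBE can also copy whole pages by manipulating virtual memory; on other systems it does nothing.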
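The same three-phase strategy (align the destination, copy whole words, finish the tail byte by byte) can be sketched in standalone C. This is a simplified illustration, not glibc's code: simple_memcpy is a hypothetical name, it uses word copies only when source and destination share the same alignment (glibc's WORD_COPY_FWD also handles the misaligned case by shifting and merging words), and it relies on the GCC/Clang may_alias attribute so that word-sized accesses to byte data are allowed.

```c
#include <stddef.h>
#include <stdint.h>

/* Word type allowed to alias any object (GCC/Clang extension). */
typedef unsigned long __attribute__((__may_alias__)) word_t;

void *simple_memcpy(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Word copies only pay off for longer runs, and only work here if both
     * pointers can reach word alignment at the same time. */
    if (len >= 2 * sizeof(word_t) &&
        (uintptr_t)d % sizeof(word_t) == (uintptr_t)s % sizeof(word_t)) {
        /* 1. Copy single bytes until the destination is word-aligned. */
        while ((uintptr_t)d % sizeof(word_t) != 0) {
            *d++ = *s++;
            len--;
        }
        /* 2. Copy whole words. */
        word_t *dw = (word_t *)d;
        const word_t *sw = (const word_t *)s;
        while (len >= sizeof(word_t)) {
            *dw++ = *sw++;
            len -= sizeof(word_t);
        }
        d = (unsigned char *)dw;
        s = (const unsigned char *)sw;
    }
    /* 3. Copy the remaining tail (or everything, for short/misaligned runs). */
    while (len-- > 0)
        *d++ = *s++;
    return dst;
}
```

Whether a compiler inlines such copies or calls the library routine is a separate question; this only illustrates the runtime strategy the glibc code above follows.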