Golang md5 Sum() function - hash

package main
import (
	"crypto/md5"
	"fmt"
)
func main() {
	hash := md5.New()
	b := []byte("test")
	fmt.Printf("%x\n", hash.Sum(b))
	hash.Write(b)
	fmt.Printf("%x\n", hash.Sum(nil))
}
Output:
*md5.digest74657374d41d8cd98f00b204e9800998ecf8427e
098f6bcd4621d373cade4e832627b4f6
Could someone please explain why/how I get different results for the two prints?

I'm building on the already good answers. I'm not sure if Sum is actually the function you want. From the hash.Hash documentation:
// Sum appends the current hash to b and returns the resulting slice.
// It does not change the underlying hash state.
Sum(b []byte) []byte
This function has a dual use-case, which you seem to mix in an unfortunate way. The use-cases are:
Computing the hash of a single run
Chaining the output of several runs
In case you simply want to compute the hash of something, either use md5.Sum(data) or
digest := md5.New()
digest.Write(data)
hash := digest.Sum(nil)
This code will, according to the excerpt of the documentation above, append the checksum of data to nil, resulting in the checksum of data.
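To make that concrete, here is a minimal runnable sketch showing that the one-shot md5.Sum and the Write/Sum(nil) form produce the same digest for "test":
package main
import (
	"bytes"
	"crypto/md5"
	"fmt"
)
func main() {
	data := []byte("test")
	// One-shot helper: md5.Sum returns a [16]byte array.
	sum1 := md5.Sum(data)
	// Streaming form: write into a digest, then append its checksum to nil.
	digest := md5.New()
	digest.Write(data)
	sum2 := digest.Sum(nil)
	fmt.Printf("%x\n%x\n", sum1, sum2)
	fmt.Println(bytes.Equal(sum1[:], sum2)) // true: both are the MD5 of "test"
}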
If you want to chain several blocks of hashes, the second use-case of hash.Sum, you can do it like this:
hashed := make([]byte, 0)
for hasData {
	digest.Write(data)
	hashed = digest.Sum(hashed)
}
This will append each iteration's hash to the already computed hashes. Probably not what you want.
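For illustration, a rough sketch of that chaining loop, using two made-up blocks ("foo" and "bar") as the data:
package main
import (
	"crypto/md5"
	"fmt"
)
func main() {
	blocks := [][]byte{[]byte("foo"), []byte("bar")} // hypothetical data blocks
	digest := md5.New()
	hashed := make([]byte, 0)
	for _, block := range blocks {
		digest.Write(block)
		hashed = digest.Sum(hashed) // append the digest of everything written so far
	}
	// hashed is now 32 bytes: md5("foo") followed by md5("foobar").
	fmt.Printf("%x\n", hashed)
}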
So, now you should be able to see why your code is failing. If not, take this commented version of your code (On play):
hash := md5.New()
b := []byte("test")
fmt.Printf("%x\n", hash.Sum(b)) // gives 74657374<hash> (74657374 = "test")
fmt.Printf("%x\n", hash.Sum([]byte("AAA"))) // gives 414141<hash> (41 = 'A')
fmt.Printf("%x\n", hash.Sum(nil)) // gives <hash> as append(nil, hash) == hash
fmt.Printf("%x\n", hash.Sum(b)) // gives 74657374<hash> (74657374 = "test")
fmt.Printf("%x\n", hash.Sum([]byte("AAA"))) // gives 414141<hash> (41 = 'A')
hash.Write(b)
fmt.Printf("%x\n", hash.Sum(nil)) // gives a completely different hash since internal bytes changed due to Write()

You have two ways to actually get an MD5 sum of a byte slice:
package main
import (
	"crypto/md5"
	"fmt"
)
func main() {
	hash := md5.New()
	b := []byte("test")
	hash.Write(b)
	fmt.Printf("way one : %x\n", hash.Sum(nil))
	fmt.Printf("way two : %x\n", md5.Sum(b))
}
According to http://golang.org/src/pkg/crypto/md5/md5.go#L88, your hash.Sum(b) is like calling append(b, actual-hash-of-an-empty-md5-hash).
The definition of Sum:
func (d0 *digest) Sum(in []byte) []byte {
	// Make a copy of d0 so that caller can keep writing and summing.
	d := *d0
	hash := d.checkSum()
	return append(in, hash[:]...)
}
When you call Sum(nil), it returns d.checkSum() directly as a byte slice; if you call Sum(b) with a non-nil slice, it appends d.checkSum() to your input.
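As a quick sanity check, here is a small sketch comparing hash.Sum(b) against a hand-built append of the empty-string digest:
package main
import (
	"bytes"
	"crypto/md5"
	"fmt"
)
func main() {
	hash := md5.New() // nothing written yet, so the current state is the empty-string digest
	b := []byte("test")
	empty := md5.Sum(nil) // d41d8cd98f00b204e9800998ecf8427e, the MD5 of ""
	got := hash.Sum(b)    // equivalent to append(b, <digest of current state>...)
	want := append([]byte("test"), empty[:]...)
	fmt.Println(bytes.Equal(got, want)) // true
	fmt.Printf("%x\n", got)             // 74657374 followed by d41d8cd9...
}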

From the docs:
// Sum appends the current hash to b and returns the resulting slice.
// It does not change the underlying hash state.
Sum(b []byte) []byte
so "*74657374*d41d8cd98f00b204e9800998ecf8427e" is actually a hex representation of "test", plus the initial state of the hash.
fmt.Printf("%x", []byte{"test"})
will result in... "74657374"!
So basically hash.Sum(b) is not doing what you think it does. The second statement is the right hash.

I would like to answer this to the point:
why/how do I get different results for the two prints?
Ans:
hash := md5.New()
As you are creating a new instance of the md5 hash, when you call hash.Sum(b) the hash itself is still empty, so Sum appends the digest of that empty state to b; hence you get hex("test") followed by the MD5 of the empty string, i.e. 74657374d41d8cd98f00b204e9800998ecf8427e.
In the next statement, hash.Write(b) writes b into the hash state; calling hash.Sum(nil) then appends the MD5 of "test" to nil, which is why you get just 098f6bcd4621d373cade4e832627b4f6.
This is the reason you are getting these outputs.
For your reference, here is the implementation of Sum:
func (d0 *digest) Sum(in []byte) []byte {
	// Make a copy of d0 so that caller can keep writing and summing.
	d := *d0
	hash := d.checkSum()
	return append(in, hash[:]...)
}

Related

Twincat 3: How to convert 4 HEX array to Float?

We are receiving (via UDP datagram) a float value encoded as a 4-byte hex array.
We need to convert these 4 hex bytes to a float.
udp_data[0] = 'BE';
udp_data[1] = '7A';
udp_data[2] = 'E0';
udp_data[3] = 'F4';
In the given example, after the transformation, udp_data should be equivalent to -0.24499.
What is the optimal way to do this conversion in a TwinCAT 3 PLC? Maybe some library? We need to perform 52 transformations of this type at once.
I attached an example taken from an online calculator.
Thanks!!
You can use a UNION type, which holds at the same address both a byte array (like the one you get from your UDP communication) and the REAL variable you want to convert to.
When you change the byte array, the REAL automatically reflects it; in fact, the conversion also works the other way around.
TYPE U_Convert :
UNION
	arrUDP_Data: ARRAY [0 .. 3] OF BYTE; // Array must start with LSB
	rReal : REAL;
END_UNION
END_TYPE
In MAIN you can declare the following var.
VAR
	uConvert: U_Convert;
	fValue : REAL;
END_VAR
And in the body of MAIN, update the byte array to requested values.
// Here we update the byte array
uConvert.arrUDP_Data[0] := 16#F4; // LSB
uConvert.arrUDP_Data[1] := 16#E0;
uConvert.arrUDP_Data[2] := 16#7A;
uConvert.arrUDP_Data[3] := 16#BE; // MSB
// Here we 'use' the converted value
fValue := uConvert.rReal;
I assume you have an array of bytes.
Header (put this in its own function block if you want):
PROGRAM MAIN
VAR
	aByteArray : ARRAY[1..4] OF BYTE := [16#F4, 16#E0, 16#7A, 16#BE];
	pt : POINTER TO REAL;
	fRealValue : REAL;
END_VAR
Body:
pt := ADR(aByteArray);
fRealValue := pt^;
This will give you the desired result.

Modified FNV-1 hash algorithm in golang

The native library has an FNV-1 hash algorithm (https://golang.org/pkg/hash/fnv/) that returns a uint64 value (range: 0 through 18446744073709551615).
I need to store this value in a PostgreSQL bigserial, but its range is 1 to 9223372036854775807.
Is it possible to change the hash size to e.g. 56 bits? http://www.isthe.com/chongo/tech/comp/fnv/index.html#xor-fold
Can someone help me change the native algorithm to produce 56-bit hashes?
https://golang.org/src/hash/fnv/fnv.go
Update
I did it myself using this doc: http://www.isthe.com/chongo/tech/comp/fnv/index.html#xor-fold
package main
import (
	"fmt"
	"hash/fnv"
)
func main() {
	const MASK uint64 = 1<<63 - 1
	h := fnv.New64()
	h.Write([]byte("1133"))
	hash := h.Sum64()
	fmt.Printf("%#x\n", MASK)
	fmt.Println(hash)
	hash = (hash >> 63) ^ (hash & MASK)
	fmt.Println(hash)
}
http://play.golang.org/p/j7q3D73qqu
Is it correct?
Is it correct?
Yes, it's a correct XOR-folding to 63 bits. But there's a much easier way:
hash = hash % 9223372036854775808
The distribution of XOR-folding is dubious, probably proven somewhere but not immediately obvious. Modulo, however, is clearly a wrapping of the hash algo's distribution to a smaller codomain.
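For comparison, a small sketch (reusing the question's "1133" input) computing the XOR-fold and the modulo variant side by side; both results fit in a signed 64-bit column such as PostgreSQL bigint:
package main
import (
	"fmt"
	"hash/fnv"
)
func main() {
	const mask uint64 = 1<<63 - 1
	h := fnv.New64()
	h.Write([]byte("1133"))
	hash := h.Sum64()
	folded := (hash >> 63) ^ (hash & mask) // XOR-fold to 63 bits, as in the question
	modded := hash % (1 << 63)             // simple modulo, as suggested above
	fmt.Println(folded, modded)
}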

Go: How does this hash function range from 0-32 bits?

I'm trying to write my own hash function that uses a 30-bit hash.
Here is some code for an FNV-1a 32-bit hash.
func fnva32(data string) uint32 {
	var hash uint32 = 2166136261
	for _, c := range data {
		hash ^= uint32(c)
		hash *= 16777619
	}
	return hash
}
Now here is my code that converts lowercase letters a-z into a 30-bit hash:
func id(s string) uint {
	var id uint
	var power uint = 1
	for _, c := range s {
		id += (uint(c) - 96) * power
		power *= 26
	}
	return id % 1073741824
}
That specifically limits my hash function to a maximum of 30 bits because I'm using a modulus against that number. But how is that FNVa32 hash limited to 32 bits? It doesn't use a modulus. How does it not generate a number larger than that?
Also, you will probably notice that I'm not using prime numbers. I tried some primes but they increased the collisions. Currently I'm getting 291 collisions and FNVa32 is getting 76 collisions, from hashing 600,000 (real) words.
My question is: what limits FNVa32 to 32 bits, and how would I change it to be 30-bit instead?
The return type of the fnva32 function is uint32, so there is no way it could return an answer with more bits. The calculation also uses a uint32 variable internally, and Go's unsigned integer arithmetic wraps around on overflow, so every multiplication is implicitly reduced modulo 2^32.
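For the second part of the question, one possible approach (fnva30 is just an illustrative name) is to compute the full 32-bit FNV-1a value and then XOR-fold it down to 30 bits, for example:
package main
import "fmt"
// fnva30 computes the 32-bit FNV-1a hash and then XOR-folds the top 2 bits
// back into the low 30 bits, so the result is always below 1<<30.
// Masking with hash & (1<<30 - 1), or hash % (1<<30), would also work,
// with slightly different collision behaviour.
func fnva30(data string) uint32 {
	var hash uint32 = 2166136261
	for _, c := range data {
		hash ^= uint32(c)
		hash *= 16777619
	}
	return (hash >> 30) ^ (hash & (1<<30 - 1))
}
func main() {
	fmt.Println(fnva30("example")) // always < 1073741824
}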

Read mongodump output with go and mgo

I'm trying to read a collection dump generated by mongodump. The file is a few gigabytes so I want to read it incrementally.
I can read the first object with something like this:
buf := make([]byte, 100000)
f, _ := os.Open(path)
f.Read(buf)
var m bson.M
bson.Unmarshal(buf, &m)
However I don't know how much of the buf was consumed, so I don't know how to read the next one.
Is this possible with mgo?
Using mgo's bson.Unmarshal() alone is not enough -- that function is designed to take a []byte representing a single document, and unmarshal it into a value.
You will need a function that can read the next whole document from the dump file, then you can pass the result to bson.Unmarshal().
Comparing this to encoding/json or encoding/gob, it would be convenient if mgo.bson had a Reader type that consumed documents from an io.Reader.
Anyway, from the source for mongodump, it looks like the dump file is just a series of bson documents, with no file header/footer or explicit record separators.
BSONTool::processFile shows how mongorestore reads the dump file. Their code reads 4 bytes to determine the length of the document, then uses that size to read the rest of the document. Confirmed that the size prefix is part of the bson spec.
Here is a playground example that shows how this could be done in Go: read the length field, read the rest of the document, unmarshal, repeat.
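A sketch along those lines, assuming the gopkg.in/mgo.v2/bson import path and a placeholder dump.bson file name:
package main
import (
	"encoding/binary"
	"fmt"
	"io"
	"os"
	"gopkg.in/mgo.v2/bson"
)
func main() {
	f, err := os.Open("dump.bson") // placeholder path for the mongodump output
	if err != nil {
		panic(err)
	}
	defer f.Close()
	for {
		// Each BSON document starts with its total length as a little-endian int32.
		var header [4]byte
		if _, err := io.ReadFull(f, header[:]); err == io.EOF {
			break // clean end of file
		} else if err != nil {
			panic(err)
		}
		size := binary.LittleEndian.Uint32(header[:])
		// Read the rest of the document and reassemble the full byte slice.
		doc := make([]byte, size)
		copy(doc, header[:])
		if _, err := io.ReadFull(f, doc[4:]); err != nil {
			panic(err)
		}
		var m bson.M
		if err := bson.Unmarshal(doc, &m); err != nil {
			panic(err)
		}
		fmt.Println(m)
	}
}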
The method File.Read returns the number of bytes read.
File.Read
Read reads up to len(b) bytes from the File. It returns the number of bytes read and an error, if any. EOF is signaled by a zero count with err set to io.EOF.
So you can get the number of bytes read by simply storing the return values of your read:
n, err := f.Read(buf)
I managed to solve it with the following code:
for len(buf) > 0 {
	var r bson.Raw
	var m userObject
	bson.Unmarshal(buf, &r)
	r.Unmarshal(&m)
	fmt.Println(m)
	buf = buf[len(r.Data):]
}
Niks Keets' answer did not work for me. Somehow len(r.Data) was always the whole buffer length. So I came up with this other code:
for len(buff) > 0 {
	messageSize := binary.LittleEndian.Uint32(buff)
	err = bson.Unmarshal(buff, &myObject)
	if err != nil {
		panic(err)
	}
	// Do your stuff
	buff = buff[messageSize:]
}
Of course you have to handle truncated structs at the end of the buffer. In my case I could load the whole file into memory.

Maple - Sequence element assignment

I am encountering a problem when manipulating sequence elements in Maple. First of all, here is the code.
b[0] := t -> (1-t)^3;
b[1] := t -> 3*t*(1-t)^2;
b[2] := t -> 3*t^2*(1-t);
b[3] := t -> t^3;
P := seq([seq([j*(i+1), j*(i-1)], i = 1 .. 4)], j = 1 .. 3);
EvalGamma := proc (b, P, i, t)
	local CP, res;
	option trace;
	CP := P[i];
	res := CP[1]*b[0](t)+CP[2]*b[1](t)+CP[3]*b[2](t)+CP[4]*b[3](t);
	RETURN res;
end proc;
The variable P is a sequence of sequences: P[i] is a sequence of four 2D points.
But the assignment CP := P[i]; doesn't do what I want: I don't know why, but inside the procedure the result is not P[i].
And the weird thing is that, outside the procedure, the following lines work:
CP := P[1];
CP[1];
I would appreciate any suggestions. Thanks.
I'm assuming you call the procedure as
EvalGamma(b,P,i,t)
The problem you are having is that when P is inserted into the sequence of arguments, the nested sequence of arguments is "flattened" to produce the final argument list. An easy way to fix this is to place the sequence for P inside a list structure. So use
P := [seq([seq([j*(i+1), j*(i-1)], i = 1 .. 4)], j = 1 .. 3)];
Once you do that, I think everything will work as expected.
When you call EvalGamma you cannot pass that global P, which is an expression sequence of (three) lists (of lists). If you try to do so then EvalGamma will receive 6 arguments instead of the 4 you intend, because each of the three lists (of lists) in the expression sequence P gets interpreted as a separate argument of the call.
Instead, you could create P as a list, ie,
P := [seq([seq([j*(i+1), j*(i-1)], i = 1 .. 4)], j = 1 .. 3)];
or you could pass it like EvalGamma(b, [P], some_i, some_name). But you should do only one of those two things, not both.
Note that the return syntax should be either return res; or (deprecated) RETURN(res);.