How to generate a hash number of a string in Go?

For example:
hash("HelloWorld") = 1234567
Is there any built-in function that could do this?
Thanks.

The hash package is helpful for this. Note that it is an abstraction over specific hash implementations; some ready-made implementations are found in the package's subdirectories (for example hash/fnv, hash/crc32 and hash/adler32).
Example:
package main
import (
	"fmt"
	"hash/fnv"
)
// hash returns the 32-bit FNV-1a hash of s.
func hash(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s)) // Write on a hash.Hash never returns an error
	return h.Sum32()
}
func main() {
	fmt.Println(hash("HelloWorld"))
	fmt.Println(hash("HelloWorld."))
}
Output:
926844193
107706013

Here is a function you could use to generate a hash number:
import "hash/fnv"
// FNV32a hashes text using the 32-bit FNV-1a algorithm
func FNV32a(text string) uint32 {
	algorithm := fnv.New32a()
	algorithm.Write([]byte(text))
	return algorithm.Sum32()
}
I put together a group of these utility hash functions here: https://github.com/shomali11/util
You will find FNV32, FNV32a, FNV64, FNV64a, MD5, SHA1, SHA256 and SHA512.
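If you would rather not add a dependency, similar helpers are short to write with the standard library alone. A sketch, assuming hex-encoded output for the cryptographic digest; the function names mirror the list above, but they are not taken from that library and its actual signatures may differ:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"hash/fnv"
)

// FNV64a hashes text with the 64-bit FNV-1a algorithm.
func FNV64a(text string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(text)) // never returns an error
	return h.Sum64()
}

// SHA256 returns the hex-encoded SHA-256 digest of text.
func SHA256(text string) string {
	sum := sha256.Sum256([]byte(text)) // [32]byte
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(FNV64a("HelloWorld"))
	fmt.Println(SHA256("HelloWorld"))
}
```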

Related

Swift - Create post- and prefix operator?

In math it is common to write the absolute value of a number x as |x|. I would like to adopt a similar notation in my code. My attempt looks like this:
prefix operator |
postfix operator |
extension Int {
    lazy var absolute = false
    static prefix func | (right: Int) -> Int {
        assert(right.absolute, "Missed closing absolute value bar.")
        right.absolute = false
        if right < 0 {
            return -right
        }
        return right
    }
    static postfix func | (left: Int) -> Int {
        assert(!left.absolute, "Missed opening absolute value bar.")
        left.absolute = true
        return left
    }
}
(As far as I know, this code won't compile, because extensions cannot add stored properties. It is only there to demonstrate my attempt. I added this functionality to my own custom types.)
Apart from the fact that this feels like a rather bad solution to me, another problem with this code is that it won't raise any error if I forget the opening bar. The assert only stops the running program the next time I use the absolute-value operator after having forgotten the opening bar in the previous use.
Let me know if you have a better solution!
Thanks.
Let me first say that I don't think this is a good idea. It's much more trouble than it's worth. But here goes:
prefix operator |
postfix operator |
prefix func | <T: Comparable & SignedNumeric>(f: () -> T) -> T {
    return f()
}
postfix func | <T: Comparable & SignedNumeric>(n: T) -> () -> T {
    return { abs(n) }
}
|42|    // returns 42
|(-42)| // returns 42
The idea is that the postfix operator returns a function that is then used as the argument to the prefix operator, which then returns the end result. I originally had it the other way around (the prefix operator returning the function), but the compiler did not like that; it seems the postfix operator has higher precedence.
The advantage of returning a function is that |42 doesn't compile (because the argument types don't match) and while 42| compiles, you will get an error as soon as you use it in a computation because of a type mismatch.
If you use this with literals, you still have to parenthesize negative numbers because Swift can't parse two consecutive prefix operators. I also haven't tested this very much, there may be other edge cases where it doesn't compile.

Could not cast value of type 'Swift.Array<Swift.String>' to 'Swift.AnyHashable'

I am making a structure that acts like a String, except that it only deals with Unicode UTF-32 scalar values. Thus, it is an array of UInt32. (See this question for more background.)
What I want to do
I want to be able to use my custom ScalarString struct as a key in a dictionary. For example:
var suffixDictionary = [ScalarString: ScalarString]() // Unicode key, rendered glyph value
// populate dictionary
suffixDictionary[keyScalarString] = valueScalarString
// ...
// check if dictionary contains Unicode scalar string key
if let renderedSuffix = suffixDictionary[unicodeScalarString] {
    // do something with value
}
Problem
In order to do that, ScalarString needs to implement the Hashable protocol. I thought I would be able to do something like this:
struct ScalarString: Hashable {
    private var scalarArray: [UInt32] = []
    var hashValue : Int {
        get {
            return self.scalarArray.hashValue // error
        }
    }
}
func ==(left: ScalarString, right: ScalarString) -> Bool {
    return left.hashValue == right.hashValue
}
but then I discovered that Swift arrays don't have a hashValue.
What I read
The article Strategies for Implementing the Hashable Protocol in Swift had a lot of great ideas, but I didn't see any that seemed like they would work well in this case. Specifically,
Object property (an array does not have hashValue)
ID property (not sure how this could be implemented well)
Formula (it seems like any formula for a string of 32-bit integers would be processor-heavy and have lots of integer overflow)
ObjectIdentifier (I'm using a struct, not a class)
Inheriting from NSObject (I'm using a struct, not a class)
Here are some other things I read:
Implementing Swift's Hashable Protocol
Swift Comparison Protocols
Perfect hash function
Membership of custom objects in Swift Arrays and Dictionaries
How to implement Hashable for your custom class
Writing a good Hashable implementation in Swift
Question
Swift Strings have a hashValue property, so I know it is possible to do.
How would I create a hashValue for my custom structure?
Updates
Update 1: I would like to do something that does not involve converting to String and then using String's hashValue. My whole point in making my own structure was to avoid doing lots of String conversions. String gets its hashValue from somewhere, so it seems like I could compute mine the same way.
Update 2: I've been looking into string hash code algorithms from other contexts. I'm having a little difficulty knowing which is best and expressing them in Swift, though.
Java hashCode algorithm
C algorithms
hash function for string (SO question and answers in C)
Hashing tutorial (Virginia Tech Algorithm Visualization Research Group)
General Purpose Hash Function Algorithms
Update 3
I would prefer not to import any external frameworks unless that is the recommended way to go for these things.
I submitted a possible solution using the DJB Hash Function.
Update
Martin R writes:
As of Swift 4.1, the compiler can synthesize Equatable and Hashable conformance for types automatically, if all members conform to Equatable/Hashable (SE-0185). And as of Swift 4.2, a high-quality hash combiner is built into the Swift standard library (SE-0206). Therefore there is no need anymore to define your own hashing function; it suffices to declare the conformance:
struct ScalarString: Hashable, ... {
    private var scalarArray: [UInt32] = []
    // ...
}
Thus, the answer below needs to be rewritten (yet again). Until that happens refer to Martin R's answer from the link above.
Old Answer:
This answer has been completely rewritten after submitting my original answer to code review.
How to implement the Hashable protocol
The Hashable protocol allows you to use your custom class or struct as a dictionary key. In order to implement this protocol you need to:
Implement the Equatable protocol (Hashable inherits from Equatable)
Return a computed hashValue
These points follow from the axiom given in the documentation:
x == y implies x.hashValue == y.hashValue
where x and y are values of some Type.
Implement the Equatable protocol
In order to implement the Equatable protocol, you define how your type uses the == (equivalence) operator. In your example, equivalence can be determined like this:
func ==(left: ScalarString, right: ScalarString) -> Bool {
    return left.scalarArray == right.scalarArray
}
The == function is global so it goes outside of your class or struct.
Return a computed hashValue
Your custom class or struct must also have a computed hashValue variable. A good hash algorithm will provide a wide range of hash values. However, you do not need to guarantee that the hash values are all unique. When two different values have identical hash values, this is called a hash collision. A collision requires some extra work (which is why a good distribution is desirable), but some collisions are to be expected. As I understand it, the == function does that extra work: when two keys share a hashValue, the dictionary compares them with == to tell them apart. (Update: It looks like == may do all the work.)
There are a number of ways to calculate the hash value. For example, you could do something as simple as returning the number of elements in the array.
var hashValue: Int {
    return self.scalarArray.count
}
This would give a hash collision every time two arrays had the same number of elements but different values. NSArray apparently uses this approach.
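To see why collisions cost extra work without breaking correctness, here is a toy chained hash set, written in Go for illustration only (it is not how Swift's Dictionary is implemented). It deliberately uses the element count as the hash, as described above, so [1, 2] and [3, 4] land in the same bucket and are told apart only by the element-wise equality check. All names are hypothetical:

```go
package main

import "fmt"

// countHash mimics the "number of elements" hash from the text:
// every array of the same length collides.
func countHash(a []uint32) int { return len(a) }

// equal plays the role of ==, used whenever two entries share a bucket.
func equal(a, b []uint32) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// set is a fixed-size chained hash set keyed by []uint32.
type set struct{ buckets [8][][]uint32 }

func (s *set) bucket(a []uint32) int { return countHash(a) % len(s.buckets) }

func (s *set) contains(a []uint32) bool {
	for _, item := range s.buckets[s.bucket(a)] {
		if equal(item, a) { // colliding entries are distinguished here
			return true
		}
	}
	return false
}

func (s *set) insert(a []uint32) {
	if !s.contains(a) {
		b := s.bucket(a)
		s.buckets[b] = append(s.buckets[b], a)
	}
}

func main() {
	var s set
	s.insert([]uint32{1, 2})
	s.insert([]uint32{3, 4}) // same hash (2), so same bucket
	fmt.Println(s.contains([]uint32{3, 4}), s.contains([]uint32{5, 6}))
	fmt.Println(len(s.buckets[2])) // both arrays share one bucket
}
```

Every lookup in the crowded bucket has to walk the chain and compare element by element, which is why a hash with a wide spread of values pays off.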
DJB Hash Function
A common hash function that works with strings is the DJB hash function. This is the one I will be using, but check out some others here.
A Swift implementation provided by Martin R follows:
var hashValue: Int {
    return self.scalarArray.reduce(5381) {
        ($0 << 5) &+ $0 &+ Int($1)
    }
}
This is an improved version of my original implementation, but let me also include the older expanded form, which may be more readable for people not familiar with reduce. This is equivalent, I believe:
var hashValue: Int {
    // DJB hash function
    var hash = 5381
    for scalar in self.scalarArray {
        hash = ((hash << 5) &+ hash) &+ Int(scalar)
    }
    return hash
}
The &+ operator is Swift's overflow addition: instead of trapping when the sum exceeds Int.max, it wraps around, which will happen for long strings.
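For comparison, here is the same DJB (djb2) loop translated to Go, the language of the first question. Go's integer arithmetic wraps around on overflow by definition, so plain + and << play the role of Swift's &+ here. This is an illustrative translation; the names djb2 and toScalars and the choice of uint64 are assumptions:

```go
package main

import "fmt"

// djb2 computes hash = hash*33 + c for each scalar, starting from 5381.
// (hash << 5) + hash equals hash*33; uint64 arithmetic wraps silently on overflow.
func djb2(scalars []uint32) uint64 {
	var hash uint64 = 5381
	for _, c := range scalars {
		hash = (hash << 5) + hash + uint64(c)
	}
	return hash
}

// toScalars converts a string to its Unicode scalar values (UTF-32).
func toScalars(s string) []uint32 {
	out := make([]uint32, 0, len(s))
	for _, r := range s {
		out = append(out, uint32(r))
	}
	return out
}

func main() {
	fmt.Println(djb2(toScalars("a"))) // 177670, i.e. 5381*33 + 97
	fmt.Println(djb2(toScalars("HelloWorld")))
}
```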
Big Picture
We have looked at the pieces, but let me now show the whole example code as it relates to the Hashable protocol. ScalarString is the custom type from the question. This will be different for different people, of course.
// Include the Hashable keyword after the class/struct name
struct ScalarString: Hashable {
    private var scalarArray: [UInt32] = []
    // required var for the Hashable protocol
    var hashValue: Int {
        // DJB hash function
        return self.scalarArray.reduce(5381) {
            ($0 << 5) &+ $0 &+ Int($1)
        }
    }
}
// required function for the Equatable protocol, which Hashable inherits from
func ==(left: ScalarString, right: ScalarString) -> Bool {
    return left.scalarArray == right.scalarArray
}
Other helpful reading
Which hashing algorithm is best for uniqueness and speed?
Overflow Operators
Why are 5381 and 33 so important in the djb2 algorithm?
How are hash collisions handled?
Credits
A big thanks to Martin R over in Code Review. My rewrite is largely based on his answer. If you found this helpful, then please give him an upvote.
Update
Swift is open source now so it is possible to see how hashValue is implemented for String from the source code. It appears to be more complex than the answer I have given here, and I have not taken the time to analyze it fully. Feel free to do so yourself.
Edit (31 May '17): Please refer to the accepted answer. This answer is pretty much just a demonstration of how to use the CommonCrypto framework.
Okay, I went ahead and extended all arrays with the Hashable protocol by using the SHA-256 hashing algorithm from the CommonCrypto framework. You have to put
#import <CommonCrypto/CommonDigest.h>
into your bridging header for this to work. It's a shame that pointers have to be used though:
extension Array : Hashable, Equatable {
    public var hashValue : Int {
        var hash = [Int](count: Int(CC_SHA256_DIGEST_LENGTH) / sizeof(Int), repeatedValue: 0)
        withUnsafeBufferPointer { ptr in
            hash.withUnsafeMutableBufferPointer { (inout hPtr: UnsafeMutableBufferPointer<Int>) -> Void in
                CC_SHA256(UnsafePointer<Void>(ptr.baseAddress), CC_LONG(count * sizeof(Element)), UnsafeMutablePointer<UInt8>(hPtr.baseAddress))
            }
        }
        return hash[0]
    }
}
Edit (31 May '17): Don't do this. Even though SHA-256 has pretty much no hash collisions, it's the wrong idea to define equality by hash equality:
public func ==<T>(lhs: [T], rhs: [T]) -> Bool {
    return lhs.hashValue == rhs.hashValue
}
This is as good as it gets with CommonCrypto. It's ugly, but fast, and has pretty much no hash collisions.
Edit (15 July '15): I just made some speed tests:
Hashing randomly filled Int arrays of size n took, on average over 1000 runs:
n         SHA-256       string hashing
1000      0.000037 s    0.001359 s
10000     0.000379 s    0.011036 s
100000    0.003402 s    0.122177 s
So the SHA-256 way is about 33 times faster than the string way. I'm not saying that using a string is a very good solution, but it's the only one we can compare it to right now.
It is not a very elegant solution but it works nicely:
"\(scalarArray)".hashValue
or
scalarArray.description.hashValue
This simply uses the textual representation as the hash source.
One suggestion - since you are modeling a String, would it work to convert your [UInt32] array to a String and use the String's hashValue? Like this:
var hashValue : Int {
    get {
        return String(self.scalarArray.map { UnicodeScalar($0) }).hashValue
    }
}
That could conveniently allow you to compare your custom struct against Strings as well, though whether or not that is a good idea depends on what you are trying to do...
Note also that, using this approach, instances of ScalarString would have the same hashValue if their String representations were canonically equivalent, which may or may not be what you desire.
So I suppose that if you want the hashValue to represent a unique String, my approach would be good. If you want the hashValue to represent a unique sequence of UInt32 values, Kametrixom's answer is the way to go...

Can a macro generate additional data?

foo!(x, y, z);
// expands to
fn xx(self) -> T {..}
fn xy(self) -> T {..}
...
fn xxx(self) -> T {..}
fn xxy(self) -> T {..}
fn xyz(self) -> T {..}
fn xzx(self) -> T {..}
//and so on
...
Is it possible for macros to generate additional data? I would like to implement vector swizzling. There are many combinations for a Vector4. 4 + 2^2 + 3^3 + 4^4 = 291 combinations
I haven't done anything with macros besides simple substitution, so I am wondering whether something like that can be expressed with macros or whether I need compiler plugins for it.
Rust supports 3 methods of code generation:
macros declared with macro_rules!
procedural macros relying on plugins (unstable)
build.rs
The last is a build script, built into Cargo, that specifically supports code generation and building third-party libraries (such as C libraries).
In your case, you are specifically interested in the code generation part, which is simple enough (quoting the docs):
// build.rs
use std::env;
use std::fs::File;
use std::io::Write;
use std::path::Path;
fn main() {
    let out_dir = env::var("OUT_DIR").unwrap();
    let dest_path = Path::new(&out_dir).join("hello.rs");
    let mut f = File::create(&dest_path).unwrap();
    f.write_all(b"
        pub fn message() -> &'static str {
            \"Hello, World!\"
        }
    ").unwrap();
}
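Given this, you can automatically generate any .rs file before the build starts, without encountering the macro hygiene issue or having to rely on a nightly compiler. The generated file is then pulled into your crate with include!(concat!(env!("OUT_DIR"), "/hello.rs"));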

How to dynamically switch between hash algorithms in golang?

I want to be able to switch between hash algorithms depending on caller input, for example, implement a function:
func GenericHash(dat []byte, hash uint) (string, error) { ... }
where hash is the algorithm type as specified by crypto.Hash.
I'm not sure how to write this function, in particular where the import statements should go. If I include the import statements for all the algorithms I will use at the top, Go complains that they're imported and not used. Is there any way to import on demand?
What you need to do is import the packages for their side effects only (i.e. use the blank identifier when importing the packages). This means that the imported packages' init functions will be executed, but you will not be able to access any of their exported members directly.
Here is one way you could solve your problem:
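import (
	"crypto"
	_ "crypto/md5"
	_ "crypto/sha1"
	"encoding/hex"
	"errors"
	// import more hash packages
)
func GenericHash(dat []byte, hash crypto.Hash) (string, error) {
	if !hash.Available() {
		return "", errors.New("hash unavailable")
	}
	h := hash.New()
	h.Write(dat) // feed the data first
	// Sum(nil) returns just the digest; Sum(dat) would instead append
	// the digest of the still-empty state to dat.
	return hex.EncodeToString(h.Sum(nil)), nil
}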

Setting An Interface{} Parameter By Reference

I am having difficulty understanding how to set an interface value that has been passed as a pointer. I am trying to accomplish something along the lines of this:
import "fmt"
var Stuff map[string]interface{}
func main() {
	var num int
	Stuff["key"] = 9001
	get("key", &num)
	fmt.Println("num:", num)
}
func get(k string, v interface{}) {
	*v = Stuff[k]
}
What would I have to do to make my program output be
num: 9001
Edit: is there a possible catch-all solution using reflect?
You can emulate the AppEngine datastore interface using reflect; usually I say minimize reflection, but you (and AppEngine and other ORMs) have no other great option here to present the interface you want. For something emulating Get you:
get a reflect.Value with ValueOf()
get the type of the thing you want to create
create it with reflect.Zero
optionally fill in some data with Value.Field(), etc.
use reflect.Indirect() and Value.Set() to set the original through the pointer.
A trivial example that just zeroes a value (here an int) through a pointer is at http://play.golang.org/p/g7dNlrG_vr and copied here:
package main
import (
	"fmt"
	"reflect"
)
func main() {
	i := 1
	clear(&i)
	fmt.Println(i)
}
func clear(dst interface{}) {
	// ValueOf to enter reflect-land
	dstPtrValue := reflect.ValueOf(dst)
	// need the type to create a value
	dstPtrType := dstPtrValue.Type()
	// *T -> T, crashes if not a ptr
	dstType := dstPtrType.Elem()
	// the *dst in *dst = zero
	dstValue := reflect.Indirect(dstPtrValue)
	// the zero in *dst = zero
	zeroValue := reflect.Zero(dstType)
	// the = in *dst = zero
	dstValue.Set(zeroValue)
}
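Applying the steps listed above to the original question gives a reflection-based get that copies a map entry into whatever variable the caller points at. A minimal sketch, assuming you want an error rather than a panic for a non-pointer destination, a missing key or a type mismatch; the error wording is illustrative:

```go
package main

import (
	"fmt"
	"reflect"
)

var Stuff = map[string]interface{}{}

// get copies Stuff[k] into the variable dst points to, using reflection.
func get(k string, dst interface{}) error {
	ptr := reflect.ValueOf(dst)
	if ptr.Kind() != reflect.Ptr || ptr.IsNil() {
		return fmt.Errorf("get: dst must be a non-nil pointer, got %T", dst)
	}
	src, ok := Stuff[k]
	if !ok {
		return fmt.Errorf("get: key %q not found", k)
	}
	srcValue := reflect.ValueOf(src)
	target := ptr.Elem() // the *dst in *dst = src
	if !srcValue.IsValid() || !srcValue.Type().AssignableTo(target.Type()) {
		return fmt.Errorf("get: cannot assign %T to %s", src, target.Type())
	}
	target.Set(srcValue) // the = in *dst = src
	return nil
}

func main() {
	Stuff["key"] = 9001
	var num int
	if err := get("key", &num); err != nil {
		fmt.Println(err)
	}
	fmt.Println("num:", num)
}
```

The AssignableTo check is what turns a wrong destination type (for example a *string) into an ordinary error instead of a reflect panic.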
For emulating GetMulti you need more steps to work with the slice. An example is at http://play.golang.org/p/G_6jit2t-2 and below:
package main
import (
	"fmt"
	"reflect"
)
func main() {
	s := []int{}
	getMultiZeroes(&s, 10)
	fmt.Println(s)
}
func getMultiZeroes(slicePtrIface interface{}, howMany int) {
	// enter `reflect`-land
	slicePtrValue := reflect.ValueOf(slicePtrIface)
	// get the type
	slicePtrType := slicePtrValue.Type()
	// navigate from `*[]T` to `T`
	sliceElemType := slicePtrType.Elem().Elem() // crashes if input type not `*[]T`
	// we'll need this to Append() to
	sliceValue := reflect.Indirect(slicePtrValue)
	// and this to Append()
	sliceElemValue := reflect.Zero(sliceElemType)
	// append requested number of zeroes
	for i := 0; i < howMany; i++ {
		// s = append(s, v)
		sliceValue.Set(reflect.Append(sliceValue, sliceElemValue))
	}
}
In live code (as opposed to testing like you're doing), it'd be faster to use a type switch (as Martin suggested) so that specialized native code runs for each type; that might also be handy if you have different behavior by type. An example for GetMulti is at http://play.golang.org/p/q-9WyUqv6P and below:
package main
import "fmt"
func main() {
	s := []int{}
	getZeroes(&s)
	fmt.Println(s)
	fails := []float32{}
	getZeroes(&fails)
}
func getZeroes(slicePtrIface interface{}) {
	switch sp := slicePtrIface.(type) {
	case *[]int:
		*sp = append(*sp, 0, 0)
	case *[]string:
		*sp = append(*sp, "", "")
	default:
		panic(fmt.Sprintf("getZeroes: passed type %T, which is not a pointer to a slice of a supported type", slicePtrIface))
	}
}
You could even trivially combine the two: write custom code for common types and call the slow reflect-based version in the default case. Demo at http://play.golang.org/p/6qw52B7eC3 (not copied here because it's such a simple stitching together of the two above).
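A sketch of that combination, assuming the caller passes the number of zeroes to append; appendZeroes is a hypothetical name. The type switch handles *[]int and *[]string natively, and every other slice type falls through to the reflection path from getMultiZeroes:

```go
package main

import (
	"fmt"
	"reflect"
)

// appendZeroes appends n zero values to the slice that slicePtr points to.
// Common types take a fast, reflection-free path; anything else falls back
// to reflection, which panics if slicePtr is not a pointer to a slice.
func appendZeroes(slicePtr interface{}, n int) {
	switch sp := slicePtr.(type) {
	case *[]int:
		*sp = append(*sp, make([]int, n)...)
	case *[]string:
		*sp = append(*sp, make([]string, n)...)
	default:
		sliceValue := reflect.ValueOf(slicePtr).Elem() // *[]T -> []T
		zero := reflect.Zero(sliceValue.Type().Elem()) // zero value of T
		for i := 0; i < n; i++ {
			sliceValue.Set(reflect.Append(sliceValue, zero))
		}
	}
}

func main() {
	ints := []int{}
	appendZeroes(&ints, 2)
	floats := []float32{}
	appendZeroes(&floats, 3) // handled by the reflection fallback
	fmt.Println(ints, floats)
}
```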
There happened to be another recent question on how to make a value to pass to GetMulti, rather than emulating the GetMulti itself, if that comes up.
More for general reference than to answer this:
"Go lacks pass by reference" is useful to know, but also needs some elaboration. Go has pointers, and other types like slices that contain pointers to data. The sense in which there isn't "pass by reference" is just that Go will never change a value argument (int, struct) into a pointer implicitly. C++ reference arguments do exactly that: in C++, void f(int& i) { i++; } changes i in the caller without the caller explicitly passing in a pointer at the call site. The Go func(i int) { i++ } doesn't.
In Go, you can look at the types passed to a function call and tell what data it can change. With C++ reference arguments or some languages' "pass by reference" semantics, any call might change locals; you can't tell without looking up the declarations.
For purposes of avoiding unnecessary copying of data, there are already pointers in the implementations of slice, string, map, interface, and channel values. Of these, slices and maps (like pointers themselves) actually let you modify the underlying data through them. Also, like in C++, Go's this-like receiver parameter can be a pointer without an explicit & in the calling code. There's more about this in Russ Cox's godata post and this summary on when you need a pointer or not.
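The distinctions above can be seen in a few lines. A minimal sketch with hypothetical helper names: the int copy is unaffected, the explicit & at the call site signals that x may change, writes through a slice or map reach the caller's data, and append inside a function does not change the caller's slice length:

```go
package main

import "fmt"

// incValue gets a copy of the int; the caller's variable is unchanged.
func incValue(i int) { i++ }

// incPointer gets a copy of the pointer, which still points at the caller's int.
func incPointer(i *int) { (*i)++ }

// setFirst writes through the slice's pointer into the shared backing array.
func setFirst(s []int) { s[0] = 100 }

// grow appends to its own copy of the slice header; the caller's length stays the same.
func grow(s []int) { s = append(s, 4) }

// setKey writes into the map the caller also references.
func setKey(m map[string]int) { m["k"] = 1 }

func main() {
	x := 1
	incValue(x)    // x is still 1
	incPointer(&x) // the & at the call site shows x may change
	s := []int{1, 2, 3}
	setFirst(s)
	grow(s)
	m := map[string]int{}
	setKey(m)
	fmt.Println(x, s, len(s), m)
}
```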
The Go Programming Language Specification
Calls
In a function call, the function value and arguments are evaluated in the usual order. After they are evaluated, the parameters of the call are passed by value to the function and the called function begins execution. The return parameters of the function are passed by value back to the calling function when the function returns.
In Go everything is passed by value; nothing is passed by reference. Therefore, pass a pointer. For example,
package main
import "fmt"
var Stuff map[string]interface{}
func main() {
	Stuff = make(map[string]interface{})
	Stuff["key"] = 9001
	var value interface{}
	get("key", &value)
	num := value.(int)
	fmt.Println("num:", num)
}
func get(k string, v interface{}) {
	*v.(*interface{}) = Stuff[k]
}
Output:
num: 9001
First: There is absolutely no concept of "pass by reference" in Go. There isn't. What you can do is pass around a pointer. This pointer value is passed by value as everything in Go is passed by value.
Second: Instead of passing in a pointer to an interface and modifying the pointee's value (doable but ugly), you could return the value (much nicer).
Third: It cannot (i.e. not without reflection or unsafe) be done without type assertions.
Fourth: You should never (in the sense of "not until you have mastered Go and interfaces") use a pointer to an interface.
Fifth: If your solution requires interface{}, you might be doing something wrong. Are you sure your entities cannot be described by some (non-empty) interface?
That said, something like this works:
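package main
import "fmt"
var Stuff = map[string]interface{}{} // must be initialized before assigning to it
func main() {
	Stuff["key"] = 9001
	num := get("key").(int) // type assertion; panics if the value is not an int
	fmt.Println("num:", num)
}
func get(k string) interface{} {
	return Stuff[k]
}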
Martin Gallagher's solution works perfectly, but, as he said, you can't use generics in Go (generics only arrived in Go 1.18), so the code looks a bit ugly. I guess another solution is to always use interface{} as the type and then use a type assertion (or check the type) in your program. Something like this: http://play.golang.org/p/0o20jToXHV
http://play.golang.org/p/kx5HvEiOm9
Without generics you will have to implement the switch {} for each of your supported types.
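Checking the type instead of asserting it blindly can look like this. A minimal sketch, reusing the Stuff map and a get that returns interface{} as in the answers above; getInt and describe are hypothetical helpers. The two-value (comma-ok) assertion reports a mismatch instead of panicking, and the type switch is the per-type code mentioned above:

```go
package main

import "fmt"

var Stuff = map[string]interface{}{"key": 9001, "name": "gopher"}

func get(k string) interface{} { return Stuff[k] }

// getInt returns the value for k if it exists and holds an int.
func getInt(k string) (int, bool) {
	n, ok := get(k).(int) // ok is false on mismatch or missing key, no panic
	return n, ok
}

// describe handles each supported type in its own case.
func describe(k string) string {
	switch v := get(k).(type) {
	case int:
		return fmt.Sprintf("%s is an int: %d", k, v)
	case string:
		return fmt.Sprintf("%s is a string: %q", k, v)
	default:
		return fmt.Sprintf("%s has unsupported type %T", k, v)
	}
}

func main() {
	if num, ok := getInt("key"); ok {
		fmt.Println("num:", num)
	}
	fmt.Println(describe("name"))
	fmt.Println(describe("missing"))
}
```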