How should I compute a hashValue? - swift

I saw several examples of implementations of the Hashable variable hashValue, such as these:
extension Country: Hashable {
var hashValue: Int {
return name.hashValue ^ capital.hashValue ^ visited.hashValue
}
}
var hashValue: Int {
return self.scalarArray.reduce(5381) {
($0 << 5) &+ $0 &+ Int($1)
}
}
var hashValue : Int {
get {
return String(self.scalarArray.map { UnicodeScalar($0) }).hashValue
}
}
and so on.
In some cases bitwise operators such as OR or XOR are used. In other cases, other algorithms, map or filter functions, etc.
Now I'm a bit confused. What is the rule of thumb to compute a good hashValue?
In the simplest case with two string variables, should I combine these two with an OR operator?

Since Swift 4.2, you can use Hasher:
Swift 4.2 implements hashing based on the SipHash family of
pseudorandom functions, specifically SipHash-1-3 and SipHash-2-4, with
1 or 2 rounds of hashing per message block and 3 or 4 rounds of
finalization, respectively.
Now if you want to customize how your type implements Hashable, you
can override the hash(into:) method instead of hashValue. The
hash(into:) method passes a Hasher object by reference, which you call
combine(_:) on to add the essential state information of your type.
// Swift >= 4.2
struct Color: Hashable {
let red: UInt8
let green: UInt8
let blue: UInt8
// Synthesized by compiler
func hash(into hasher: inout Hasher) {
hasher.combine(self.red)
hasher.combine(self.green)
hasher.combine(self.blue)
}
// Default implementation from protocol extension
var hashValue: Int {
var hasher = Hasher()
self.hash(into: &hasher)
return hasher.finalize()
}
}
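For the simplest case raised in the question (two string properties), a minimal sketch might look like the following. The type `Person` and its property names are hypothetical and not from the original post; the point is only that each property is fed to the hasher in a fixed order instead of being combined with OR or XOR.

```swift
// Hypothetical type with two string properties (illustration only).
struct Person: Hashable {
    let firstName: String
    let lastName: String

    // Equality and hashing should look at the same properties.
    static func == (lhs: Person, rhs: Person) -> Bool {
        return lhs.firstName == rhs.firstName && lhs.lastName == rhs.lastName
    }

    func hash(into hasher: inout Hasher) {
        hasher.combine(firstName)
        hasher.combine(lastName)
    }
}

let a = Person(firstName: "Ada", lastName: "Lovelace")
let b = Person(firstName: "Ada", lastName: "Lovelace")
let swapped = Person(firstName: "Lovelace", lastName: "Ada")

// Equal values hash equally within one run of the program.
print(a.hashValue == b.hashValue)   // true

// a and b are the same element; swapped is a different one.
let people: Set<Person> = [a, b, swapped]
print(people.count)                 // 2
```

With a plain XOR of the two hashValues, `a` and `swapped` would always collide, because XOR is symmetric. Feeding the properties to the hasher one after another makes the result depend on their order, so such collisions become unlikely (though never impossible).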

Related

Cache results of Swift hash(into:) Hashable protocol requirement

I have a class being heavily used in Sets and Dictionaries.
For performance reasons this class implements Hashable in the old way and caches the computed hash:
let hashValue: Int
init(...) {
self.hashValue = ...
}
In Xcode 10.2 I see a warning that hashValue is deprecated and will stop being a protocol requirement soon.
What bothers me is that there seems to be no way to cache the computed hash, because hash(into:) does not return anything.
func hash(into hasher: inout Hasher) {
hasher.combine(...)
}
Consider the following example in a playground
class Class: Hashable {
let param: Int
init(param: Int) {
self.param = param
}
static func ==(lhs: Class, rhs: Class) -> Bool {
return lhs.param == rhs.param
}
public func hash(into hasher: inout Hasher) {
print("in hash")
hasher.combine(param)
}
}
var dict = [Class: Int]()
let instance = Class(param: 1)
dict[instance] = 1
dict[instance] = 2
You will see the following logs
in hash
in hash
in hash
I have no idea why we see 3 calls instead of 2, but we do =).
So, every time you use the same instance as a dictionary key or add it to a set, you get a new hash(into:) call.
In my code this overhead turned out to be very expensive. Does anyone know a workaround?
One option is to create your own Hasher, feed it the "essential components" of your instance and then call finalize() in order to get out an Int hash value, which can be cached.
For example:
class C : Hashable {
let param: Int
private lazy var cachedHashValue: Int = {
var hasher = Hasher()
hasher.combine(param)
// ... repeat for other "essential components"
return hasher.finalize()
}()
init(param: Int) {
self.param = param
}
static func ==(lhs: C, rhs: C) -> Bool {
return lhs.param == rhs.param
}
public func hash(into hasher: inout Hasher) {
hasher.combine(cachedHashValue)
}
}
A couple of things to note about this:
It relies on your "essential components" being immutable, otherwise a new hash value would need calculating upon mutation.
Hash values aren't guaranteed to remain stable across executions of the program, so don't serialise cachedHashValue.
Obviously in the case of storing a single Int this won't be all that effective, but for more expensive instances this could well help improve performance.
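If an essential component does have to be mutable, one possible variation is to invalidate the cache whenever it changes. The sketch below is an assumption about how that could be done, not part of the original answer; the class name `MutableC` is hypothetical, and the code is not thread-safe.

```swift
// Hypothetical variant of the class above with a mutable essential component.
final class MutableC: Hashable {
    var param: Int {
        didSet { cachedHashValue = nil }   // invalidate the cache on mutation
    }
    private var cachedHashValue: Int?

    init(param: Int) {
        self.param = param
    }

    static func == (lhs: MutableC, rhs: MutableC) -> Bool {
        return lhs.param == rhs.param
    }

    func hash(into hasher: inout Hasher) {
        // Recompute the cached value only when it has been invalidated.
        if cachedHashValue == nil {
            var inner = Hasher()
            inner.combine(param)
            cachedHashValue = inner.finalize()
        }
        hasher.combine(cachedHashValue!)
    }
}

let c = MutableC(param: 1)
_ = c.hashValue            // fills the cache for param == 1
c.param = 2                // didSet clears the cache
let fresh = MutableC(param: 2)
print(c.hashValue == fresh.hashValue)   // true
```

Note that mutating an instance while it is stored in a Set or used as a Dictionary key still breaks the collection, whether or not the hash is cached.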

Extension for average to return Double from Numeric generic

Suppose I create a protocol and structure for a Column of homogeneously typed data:
protocol Columnizable {
associatedtype Item
var name: String { get }
var values: [Item] { get }
}
struct Column<T>: Columnizable {
var name: String
var values = [T]()
}
I would like to create a protocol extension that gives Columnizable an average function that computes the average of values whenever Item conforms to the Numeric protocol, for instance Double and Int:
extension Columnizable where Item: Numeric {
func average() -> Double {
return Double(sum()) / values.count
}
func sum() -> Item {
return values.reduce(0, +)
}
}
My attempt at the average function cannot be compiled because of:
Cannot invoke initializer for type 'Double' with an argument list of type '(Self.item)'
Attempts to cast to Double do not work. Any advice for best practices here would be appreciated.
I needed to use the BinaryInteger or BinaryFloatingPoint protocols, since they can easily be converted to a Double. As @rob napier called out, a Complex type would not be convertible to Double.
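extension Columnizable where Item: BinaryInteger {
var average: Double {
// Sum in the element type, then convert to Double once.
return Double(values.reduce(0, +)) / Double(values.count)
}
}
extension Columnizable where Item: BinaryFloatingPoint {
var average: Item {
// Floating-point items can be averaged in their own type.
return values.reduce(0, +) / Item(values.count)
}
}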
stackoverflow.com/a/28288619/2019221
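A usage sketch for the two constrained extensions, with the sum computed inline via reduce(0, +). The guard for empty columns and the sample column names are assumptions added for illustration.

```swift
protocol Columnizable {
    associatedtype Item
    var name: String { get }
    var values: [Item] { get }
}

struct Column<T>: Columnizable {
    var name: String
    var values = [T]()
}

extension Columnizable where Item: BinaryInteger {
    var average: Double {
        // Assumption: an empty column averages to 0 instead of dividing by zero.
        guard !values.isEmpty else { return 0 }
        return Double(values.reduce(0, +)) / Double(values.count)
    }
}

extension Columnizable where Item: BinaryFloatingPoint {
    var average: Item {
        guard !values.isEmpty else { return 0 }
        return values.reduce(0, +) / Item(values.count)
    }
}

let ints = Column(name: "counts", values: [1, 2, 3, 4])
let doubles = Column(name: "weights", values: [1.5, 2.5])
print(ints.average)     // 2.5
print(doubles.average)  // 2.0
```

The integer version returns Double so that the fractional part is not lost, while the floating-point version stays in the element type.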

Swift Increment through Enum

I love that Swift allows for the use of enum methods. I'm trying to work with a method but am looking for a more extensible method of doing this:
enum CopyState{
case binary, hex, both
init(){
self = .both
}
mutating func next() {
if self == .binary{
self = .hex
} else if self == .hex {
self = .both
} else if self == .both{
self = .binary
}
}
}
var state = CopyState()
state.next()
I'd like to essentially cast the enum to an integer and increment it modulo the total number of enum options.
Adding or removing enum options is a hassle (I'm using a last() and a next() method).
Update: Starting with Swift 4.2 you can make use of the newly added CaseIterable protocol, which adds compiler support for generating a list of all cases of an enum. Though @ninestones's comment pointed out that allCases is not guaranteed to return the cases in the order they are defined, the synthesized implementation does, and that behavior is unlikely to change.
Your enum could then look something like this (no more hardcoded start value):
enum CopyState: CaseIterable {
case binary, hex, both
mutating func next() {
let allCases = type(of: self).allCases
self = allCases[(allCases.firstIndex(of: self)! + 1) % allCases.count]
}
}
You can make this piece of functionality available to all CaseIterable enums:
extension CaseIterable where Self: Equatable {
mutating func next() {
let allCases = Self.allCases
// just a sanity check, as the possibility of an enum case not being
// present in `allCases` is quite low
guard let selfIndex = allCases.firstIndex(of: self) else { return }
let nextIndex = Self.allCases.index(after: selfIndex)
self = allCases[nextIndex == allCases.endIndex ? allCases.startIndex : nextIndex]
}
}
enum CopyState: CaseIterable {
case binary, hex, both
}
var state = CopyState.hex
state.next()
print(state) // both
state.next()
print(state) // binary
Or, a little bit more verbose, but with a better separation of concerns:
extension Collection {
// adding support for computing indexes in a circular fashion
func circularIndex(after i: Index) -> Index {
let nextIndex = index(after: i)
return nextIndex == endIndex ? startIndex : nextIndex
}
}
extension Collection where Element: Equatable {
// adding support for retrieving the next element in a circular fashion
func circularElement(after element: Element) -> Element? {
return firstIndex(of: element).map { self[circularIndex(after: $0)] }
}
}
// Protocol to allow iterating in place (similar to a type conforming to both Sequence and IteratorProtocol)
protocol InPlaceIterable {
mutating func next()
}
extension InPlaceIterable where Self: CaseIterable, Self: Equatable {
// adding default implementation for enums
mutating func next() {
self = type(of: self).allCases.circularElement(after: self)!
}
}
// now the enums need only the protocol conformances, they get the
// functionalities for free
enum CopyState: CaseIterable, InPlaceIterable {
case binary, hex, both
}
You could use Int as the raw value type for your enum (the compiler then assigns consecutive raw values starting at 0 if you don't specify them), and use it like this:
enum CopyState: Int {
case binary, hex, both
mutating func next(){
self = CopyState(rawValue: rawValue + 1) ?? .binary
}
}
var state = CopyState.hex
state.next()
print(state) // both
state.next()
print(state) // binary
This works fine as long as you have the raw values of the enum cases in consecutive order. By default the compiler assigns consecutive raw values.
You'd also need to remember to update the next() method if the first case changes, otherwise it will no longer work correctly.
An alternative that avoids the above limitation, suggested by @MartinR, is to force unwrap the case with raw value zero:
mutating func next(){
self = CopyState(rawValue: rawValue + 1) ?? CopyState(rawValue: 0)!
}
The above code won't require updating the method when the first enum case changes, however it has the potential of crashing the app if the starting raw value of the enum changes.
The Swift documentation says:
When you’re working with enumerations that store integer or string raw
values, you don’t have to explicitly assign a raw value for each case.
When you don’t, Swift automatically assigns the values for you.
For example, when integers are used for raw values, the implicit value
for each case is one more than the previous case. If the first case
doesn’t have a value set, its value is 0.
So this is safe (Swift 5):
enum CopyState: Int {
case binary, hex, both
mutating func next(){
self = CopyState(rawValue: rawValue + 1) ?? CopyState(rawValue: 0)!
}
}
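A small check of the implicit raw values this relies on. The second enum, `Shifted`, is a hypothetical counter-example added here to show when the force unwrap would fail.

```swift
enum CopyState: Int {
    case binary, hex, both
    mutating func next() {
        self = CopyState(rawValue: rawValue + 1) ?? CopyState(rawValue: 0)!
    }
}

// Implicit Int raw values start at 0 and increase by one per case.
print(CopyState.binary.rawValue, CopyState.hex.rawValue, CopyState.both.rawValue) // 0 1 2

var state = CopyState.both
state.next()        // rawValue 3 does not exist, so it wraps to rawValue 0
print(state)        // binary

// Hypothetical enum whose first case does not start at 0:
// here the `CopyState(rawValue: 0)!` pattern would crash.
enum Shifted: Int {
    case first = 1, second
}
print(Shifted(rawValue: 0) == nil)   // true
```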
If someone is interested in both previous() and next(), and in a more generic cyclic offset function like advanced(by n: Int):
extension CaseIterable where Self: Equatable {
func previous() -> Self {
let all = Self.allCases
var idx = all.firstIndex(of: self)!
if idx == all.startIndex {
let lastIndex = all.index(all.endIndex, offsetBy: -1)
return all[lastIndex]
} else {
all.formIndex(&idx, offsetBy: -1)
return all[idx]
}
}
func next() -> Self {
let all = Self.allCases
let idx = all.firstIndex(of: self)!
let next = all.index(after: idx)
return all[next == all.endIndex ? all.startIndex : next]
}
func advanced(by n: Int) -> Self {
let all = Array(Self.allCases)
let idx = (all.firstIndex(of: self)! + n) % all.count
if idx >= 0 {
return all[idx]
} else {
return all[all.count + idx]
}
}
}
Usage:
let s = CopyState.hex.advanced(by: -4)
print(s) // binary
2020 update: just for the record, here is @Cristik's awesome answer in the latest syntax:
enum Fruits: CaseIterable {
case apple, banana, pitahaya, cherry
mutating func loopme() {
let a = type(of: self).allCases
self = a[(a.firstIndex(of: self)! + 1) % a.count]
}
}
I would not rely on CaseIterable here. CaseIterable only promises a collection of all enum values and makes no guarantee about their order.
The fact that it returns the cases in declaration order is an implementation detail and may change without breaking protocol conformance.
Instead, I would assign Int values to the cases explicitly:
Your code then states that you want an order.
You keep your teammates from changing the order of implicitly assigned cases and breaking your next() logic.
The closest is @Cristik's Int-backed enum.
enum CopyState: Int {
case binary = 0
case hex = 1
case both = 2
mutating func next() {
self = CopyState(rawValue: rawValue + 1) ?? .binary // or what ever should be the default
}
}
What you've got here is a "circular case sequence". You can make that! 😺
By doing so, you can start with just this, as you had…
enum CopyState {
case binary, hex, both
init() { self = .both }
}
…and be able to do stuff like this:
var state = CopyState()
state.next() // binary
state.offset(by: -2) // hex
CopyState.allCases.elementsEqual( CopyState().prefix(3) ) // true
In reverse order of how you'd write the required pieces:
Adopt the protocol.
extension CopyState: CircularCaseSequence { }
Declare the protocol.
public protocol CircularCaseSequence:
CaseIterable, Sequence, IteratorProtocol, Equatable
{ }
Note the CaseIterable there. Even if CopyState is RawRepresentable, as shown in other answers, it should still be CaseIterable.
Conform to Sequence.
public extension CircularCaseSequence {
mutating func next() -> Self? {
self = offset(by: 1)
return self
}
}
Allow cases to be offset.
public extension CaseIterable where Self: Equatable {
/// Another case from `allCases`.
///
/// Circularly wraps `offset` to always provide an element,
/// even when the resulting `index` is not valid.
func offset(by offset: Int) -> Self {
Self.allCases[self, moduloOffset: offset]!
}
}
Allow Collection elements to be offset.
public extension Collection where Element: Equatable {
/// Circularly wraps `index`, to always provide an element,
/// even when `index` is not valid.
subscript(
_ element: Element,
moduloOffset offset: Int
) -> Element? {
firstIndex(of: element).map {
self[modulo: index($0, offsetBy: offset)]
}
}
}
Allow Collection indices to be wrapped.
public extension Collection {
/// Circularly wraps `index`, to always provide an element,
/// even when `index` is not valid.
subscript(modulo index: Index) -> Element {
self[
self.index(
startIndex,
offsetBy:
distance(from: startIndex, to: index)
.modulo(count)
)
]
}
}
Define modulo.
public extension BinaryInteger {
func modulo(_ divisor: Self) -> Self {
let remainder = self % divisor
return
remainder >= 0
? remainder
: remainder + divisor
}
}
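The reason a separate modulo is needed: Swift's % operator is a remainder that keeps the sign of the left-hand operand, so negative offsets would otherwise produce negative indices. A quick sketch, assuming a positive divisor as in the code above:

```swift
extension BinaryInteger {
    // Same definition as above, repeated so the snippet is self-contained.
    func modulo(_ divisor: Self) -> Self {
        let remainder = self % divisor
        return remainder >= 0 ? remainder : remainder + divisor
    }
}

print(-1 % 3)           // -1  (remainder keeps the sign of the dividend)
print((-1).modulo(3))   // 2   (wrapped into 0..<3)
print((4).modulo(3))    // 1
print((-3).modulo(3))   // 0
```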
Something I sometimes do is create a simple dictionary like this:
let copyStateDictionary = [
CopyState.binary: CopyState.hex,
CopyState.hex: CopyState.both,
CopyState.both: CopyState.binary
]
Then you can "increment" your variable with:
state = copyStateDictionary[state]! // dictionary lookup returns an optional
There might be a programmatic way of generating this dictionary, rather than hard-coding it, but if it's just 3-4 values, hard-coding is OK.
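One possible way to generate such a dictionary programmatically is sketched below. It assumes the enum conforms to CaseIterable and that allCases is returned in declaration order, which is what the synthesized implementation does.

```swift
enum CopyState: CaseIterable {
    case binary, hex, both
}

// Pair each case with its successor; the last case wraps around to the first.
let cases = CopyState.allCases
let successors = Array(cases.dropFirst()) + Array(cases.prefix(1))
let copyStateDictionary = Dictionary(uniqueKeysWithValues: zip(cases, successors))

var state = CopyState.both
state = copyStateDictionary[state]!   // the lookup returns an optional
print(state)                          // binary
```

Because every case appears exactly once in allCases, the keys are unique and the force unwrap on lookup cannot fail for this enum.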

Implementing a hash combiner in Swift

I'm extending a struct to conform to Hashable. I'll use the DJB2 hash combiner to accomplish this.
To make it easy to write hash function for other things, I'd like to extend the Hashable protocol so that my hash function can be written like this:
extension MyStruct: Hashable {
public var hashValue: Int {
return property1.combineHash(with: property2).combineHash(with: property3)
}
}
But when I try to write the extension to Hashable that implements combineHash(with:), like this:
extension Hashable {
func combineHash(with hashableOther:Hashable) -> Int {
let ownHash = self.hashValue
let otherHash = hashableOther.hashValue
return (ownHash << 5) &+ ownHash &+ otherHash
}
}
… then I get this compilation error:
/Users/benjohn/Code/Nice/nice/nice/CombineHash.swift:12:43: Protocol 'Hashable' can only be used as a generic constraint because it has Self or associated type requirements
Is this something that Swift won't let me do, or am I just doing it wrong and getting an unhelpful error message?
Aside: A comment from JAL links to a code review of a Swift hash function that was also written by Martin, who provides the accepted answer below! He mentions a different hash combiner in that discussion, which is based on one in the C++ Boost library. The discussion really is worth reading. The alternative combiner has fewer collisions (on the data tested).
Use the method hash(into:) from the Apple Developer Documentation:
https://developer.apple.com/documentation/swift/hashable
struct GridPoint {
var x: Int
var y: Int
}
extension GridPoint: Hashable {
static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
return lhs.x == rhs.x && lhs.y == rhs.y
}
func hash(into hasher: inout Hasher) {
hasher.combine(x)
hasher.combine(y)
}
}
You cannot define a parameter of type P if P
is a protocol which has Self or associated type requirements.
In this case it is the Equatable protocol from which Hashable
inherits, which has a Self requirement:
public static func ==(lhs: Self, rhs: Self) -> Bool
What you can do is to define a generic method instead:
extension Hashable {
func combineHash<T: Hashable>(with hashableOther: T) -> Int {
let ownHash = self.hashValue
let otherHash = hashableOther.hashValue
return (ownHash << 5) &+ ownHash &+ otherHash
}
}
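A usage sketch for the generic version. The stored properties of `MyStruct` are hypothetical (the question does not show them); the chained call compiles because combineHash returns an Int, which is itself Hashable.

```swift
extension Hashable {
    func combineHash<T: Hashable>(with hashableOther: T) -> Int {
        let ownHash = self.hashValue
        let otherHash = hashableOther.hashValue
        // DJB2-style step: ownHash * 33 + otherHash, with wrapping addition.
        return (ownHash << 5) &+ ownHash &+ otherHash
    }
}

// Hypothetical struct with properties of different Hashable types.
struct MyStruct: Hashable {
    let property1: String
    let property2: Int
    let property3: Bool

    // == is synthesized; the combined value is fed into the hasher.
    func hash(into hasher: inout Hasher) {
        hasher.combine(
            property1.combineHash(with: property2).combineHash(with: property3)
        )
    }
}

let first = MyStruct(property1: "a", property2: 1, property3: true)
let second = MyStruct(property1: "a", property2: 1, property3: true)
print(first.hashValue == second.hashValue)   // true
```

In current Swift the hand-written combiner is rarely needed, since calling hasher.combine(_:) once per property achieves the same goal.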

Make struct Hashable?

I'm trying to create a dictionary of the sort [petInfo : UIImage]() but I'm getting the error Type 'petInfo' does not conform to protocol 'Hashable'. My petInfo struct is this:
struct petInfo {
var petName: String
var dbName: String
}
So I want to somehow make it hashable but none of its components are an integer which is what the var hashValue: Int requires. How can I make it conform to the protocol if none of its fields are integers? Can I use the dbName if I know it's going to be unique for all occurrences of this struct?
Simply return dbName.hashValue from your hashValue property. FYI - the hash value does not need to be unique. The requirement is that two objects that compare equal must also have the same hash value.
struct PetInfo: Hashable {
var petName: String
var dbName: String
var hashValue: Int {
return dbName.hashValue
}
static func == (lhs: PetInfo, rhs: PetInfo) -> Bool {
return lhs.dbName == rhs.dbName && lhs.petName == rhs.petName
}
}
As of Swift 5, var hashValue: Int has been deprecated in favour of func hash(into hasher: inout Hasher) (introduced in Swift 4.2), so to update the answer @rmaddy gave, use:
func hash(into hasher: inout Hasher) {
hasher.combine(dbName)
}
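Put together, the updated struct might look like the sketch below. A String value stands in for UIImage so that the example runs without UIKit; the sample names are made up.

```swift
struct PetInfo: Hashable {
    var petName: String
    var dbName: String

    static func == (lhs: PetInfo, rhs: PetInfo) -> Bool {
        return lhs.dbName == rhs.dbName && lhs.petName == rhs.petName
    }

    // Only dbName feeds the hash; values that compare equal still hash equally.
    func hash(into hasher: inout Hasher) {
        hasher.combine(dbName)
    }
}

// String stands in for UIImage here (assumption for a UIKit-free example).
var pictures = [PetInfo: String]()
let rex = PetInfo(petName: "Rex", dbName: "pets/rex")
pictures[rex] = "rex.png"

let lookup = PetInfo(petName: "Rex", dbName: "pets/rex")
print(pictures[lookup] ?? "missing")   // rex.png
```

Since both stored properties are already Hashable, declaring `struct PetInfo: Hashable` with no custom members also works: the compiler then synthesizes both == and hash(into:) over all stored properties.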