Swift semantics regarding dictionary access - swift

I'm currently reading the excellent Advanced Swift book from objc.io, and I'm running into something that I don't understand.
If you run the following code in a playground, you will notice that when modifying a struct contained in a dictionary a copy is made by the subscript access, but then it appears that the original value in the dictionary is replaced by the copy. I don't understand why. What exactly is happening ?
Also, is there a way to avoid the copy ? According to the author of the book, there isn't, but I just want to be sure.
import Foundation
class Buffer {
let id = UUID()
var value = 0
func copy() -> Buffer {
let new = Buffer()
new.value = self.value
return new
}
}
struct COWStruct {
var buffer = Buffer()
init() { print("Creating \(buffer.id)") }
mutating func change() -> String {
if isKnownUniquelyReferenced(&buffer) {
buffer.value += 1
return "No copy \(buffer.id)"
} else {
let newBuffer = buffer.copy()
newBuffer.value += 1
buffer = newBuffer
return "Copy \(buffer.id)"
}
}
}
var array = [COWStruct()]
array[0].buffer.value
array[0].buffer.id
array[0].change()
array[0].buffer.value
array[0].buffer.id
var dict = ["key": COWStruct()]
dict["key"]?.buffer.value
dict["key"]?.buffer.id
dict["key"]?.change()
dict["key"]?.buffer.value
dict["key"]?.buffer.id
// If the above `change()` was made on a copy, why has the original value changed ?
// Did the copied & modified struct replace the original struct in the dictionary ?

dict["key"]?.change() // Copy
is semantically equivalent to:
if var value = dict["key"] {
value.change() // Copy
dict["key"] = value
}
The value is pulled out of the dictionary, unwrapped into a temporary, mutated, and then placed back into the dictionary.
Because there's now two references to the underlying buffer (one from our local temporary value, and one from the COWStruct instance in the dictionary itself) – we're forcing a copy of the underlying Buffer instance, as it's no longer uniquely referenced.
So, why doesn't
array[0].change() // No Copy
do the same thing? Surely the element should be pulled out of the array, mutated and then stuck back in, replacing the previous value?
The difference is that unlike Dictionary's subscript which comprises of a getter and setter, Array's subscript comprises of a getter and a special accessor called mutableAddressWithPinnedNativeOwner.
What this special accessor does is return a pointer to the element in the array's underlying buffer, along with an owner object to ensure that the buffer isn't deallocated from under the caller. Such an accessor is called an addressor, as it deals with addresses.
Therefore when you say:
array[0].change()
you're actually mutating the actual element in the array directly, rather than a temporary.
Such an addressor cannot be directly applied to Dictionary's subscript because it returns an Optional, and the underlying value isn't stored as an optional. So it currently has to be unwrapped with a temporary, as we cannot return a pointer to the value in storage.
In Swift 3, you can avoid copying your COWStruct's underlying Buffer by removing the value from the dictionary before mutating the temporary:
if var value = dict["key"] {
dict["key"] = nil
value.change() // No Copy
dict["key"] = value
}
As now only the temporary has a view onto the underlying Buffer instance.
And, as #dfri points out in the comments, this can be reduced down to:
if var value = dict.removeValue(forKey: "key") {
value.change() // No Copy
dict["key"] = value
}
saving on a hashing operation.
Additionally, for convenience, you may want to consider making this into an extension method:
extension Dictionary {
mutating func withValue<R>(
forKey key: Key, mutations: (inout Value) throws -> R
) rethrows -> R? {
guard var value = removeValue(forKey: key) else { return nil }
defer {
updateValue(value, forKey: key)
}
return try mutations(&value)
}
}
// ...
dict.withValue(forKey: "key") {
$0.change() // No copy
}
In Swift 4, you should be able to use the values property of Dictionary in order to perform a direct mutation of the value:
if let index = dict.index(forKey: "key") {
dict.values[index].change()
}
As the values property now returns a special Dictionary.Values mutable collection that has a subscript with an addressor (see SE-0154 for more info on this change).
However, currently (with the version of Swift 4 that ships with Xcode 9 beta 5), this still makes a copy. This is due to the fact that both the Dictionary and Dictionary.Values instances have a view onto the underlying buffer – as the values computed property is just implemented with a getter and setter that passes around a reference to the dictionary's buffer.
So when calling the addressor, a copy of the dictionary's buffer is triggered, therefore leading to two views onto COWStruct's Buffer instance, therefore triggering a copy of it upon change() being called.
I have filed a bug over this here. (Edit: This has now been fixed on master with the unofficial introduction of generalised accessors using coroutines, so will be fixed in Swift 5 – see below for more info).
In Swift 4.1, Dictionary's subscript(_:default:) now uses an addressor, so we can efficiently mutate values so long as we supply a default value to use in the mutation.
For example:
dict["key", default: COWStruct()].change() // No copy
The default: parameter uses #autoclosure such that the default value isn't evaluated if it isn't needed (such as in this case where we know there's a value for the key).
Swift 5 and beyond
With the unofficial introduction of generalised accessors in Swift 5, two new underscored accessors have been introduced, _read and _modify which use coroutines in order to yield a value back to the caller. For _modify, this can be an arbitrary mutable expression.
The use of coroutines is exciting because it means that a _modify accessor can now perform logic both before and after the mutation. This allows them to be much more efficient when it comes to copy-on-write types, as they can for example deinitialise the value in storage while yielding a temporary mutable copy of the value that's uniquely referenced to the caller (and then reinitialising the value in storage upon control returning to the callee).
The standard library has already updated many previously inefficient APIs to make use of the new _modify accessor – this includes Dictionary's subscript(_:) which can now yield a uniquely referenced value to the caller (using the deinitialisation trick I mentioned above).
The upshot of these changes means that:
dict["key"]?.change() // No copy
will be able to perform an mutation of the value without having to make a copy in Swift 5 (you can even try this out for yourself with a master snapshot).

Related

Does Swift know not to initialize an object if my set already contains it?

I have a global var notes: Set<Note> that contains notes initialized with downloaded data.
In the code below, does Swift know to skip the initialization of my Note object if notes already contains it?
for dictionary in downloadedNoteDictionaries {
let note = Note(dictionary: dictionary)
notes.insert(note)
}
I'm wondering because my app downloads dozens of notes per request and initializing a Note object seems rather computationally expensive.
If the answer to my question is no, then how could I improve my code's performance?
My Note class—which I just realized should probably be a struct instead—has the property let id: Int64 as its sole essential component, but apparently, you can't access an element of a set by its hash value? I don't want to use Set's instance method first(where:) because it has a complexity of O(n), and notes could contain millions of Note objects.
You cannot rely on Swift to eliminate the construction of a new Note in your code. Your Set needs to ask the Note for its hashValue, and may need to call == with your Note as an argument. Those computations require the Note object. Possibly if Swift can inline everything, it can notice that your hashValue and == depend only on the id property, but it is certainly not guaranteed to notice or to act on that information.
It sounds like you should be using an [Int64: Note] instead of a Set<Note>.
No, Swift will not avoid creating the new Note object. The problem here is trying to determine if an object already exists in a set. In order to check if an object already exists in the set, you must have some way to identify this object consistently and have that identification persist across future reads and writes to that set. Assuming we want to adopt Swift's hashing enhancements which deprecates the old methods for having to manually provide a hashValue for any object conforming to the Hashable, we should not use a Set as a solution to this problem. The reason is because Swift's new recommended hashing methods use a random seed to generate hashes for added security. Depending on hash values to identify an element in a set alone would therefore not be possible.
It seems that your method of identifying these objects are by an id. I would do as Rob suggests and use a dictionary where the keys are the id. This would help you answer existential questions in order avoid instantiating a Note object.
If you need the resulting dictionary as a Set, you can still create a new Set from this resulting dictionary sequence.
It is possible to pull out a particular element from a set as long as you know how to identify it, and the operations would be O(1). You would use these two methods:
https://developer.apple.com/documentation/swift/set/2996833-firstindex
https://developer.apple.com/documentation/swift/set/2830801-subscript
Here is an example that I was able to run in a playground. However, this assumes that you have a way to identify in your data / model the id of the Note object beforehand in order to avoid creating the Note object. In this case, I am assuming that a Note shares an id with a Dictionary it holds:
import UIKit
struct DictionaryThatNoteUses {
var id: String
init(id: String = UUID().uuidString) {
self.id = id
}
}
struct Note: Hashable {
let dictionary: DictionaryThatNoteUses
var id: String {
return dictionary.id
}
func hash(into hasher: inout Hasher) {
hasher.combine(id)
}
static func == (lhs: Note, rhs: Note) -> Bool {
return lhs.id == rhs.id
}
}
var downloadedNoteDictionaries: [DictionaryThatNoteUses] = [
DictionaryThatNoteUses(),
DictionaryThatNoteUses(),
DictionaryThatNoteUses()
]
var notesDictionary: [String: Note] = [:]
var notesSet: Set<Note> = []
// Add a dictionary with the same id as the first dictionary
downloadedNoteDictionaries.append(downloadedNoteDictionaries.first!) // so now there are four Dictionary objects in this array
func createDictionaryOfNotes() {
func avoidCreatingDuplicateNotesObject(with id: String) {
print("avoided creating duplicate notes object with id \(id)") // prints once because of the duplicated note at the end matching the first, so a new Note object is not instantiated
}
downloadedNoteDictionaries.forEach {
guard notesDictionary[$0.id] == nil else { return avoidCreatingDuplicateNotesObject(with: $0.id) }
let note = Note(dictionary: $0)
notesDictionary[note.id] = note
}
}
createDictionaryOfNotes()
// Obtain a set for set operations on those unique Note objects
notesSet = Set<Note>(notesDictionary.values)
print("number of items in dictionary = \(notesDictionary.count), number of items in set = \(notesSet.count)") // prints 3 and 3 because the 4th object was a duplicate
// Grabbing a specific element from the set
// Let's test with the second Note object from the notesDictionary
let secondNotesObjectFromDictionary = notesDictionary.values[notesDictionary.values.index(notesDictionary.values.startIndex, offsetBy: 1)]
let thirdNotesObjectFromDictionary = notesDictionary.values[notesDictionary.values.index(notesDictionary.values.startIndex, offsetBy: 2)]
if let secondNotesObjectIndexInSet = notesSet.firstIndex(of: secondNotesObjectFromDictionary) {
print("do the two objects match: \(notesSet[secondNotesObjectIndexInSet] == secondNotesObjectFromDictionary)") // prints true
print("does the third object from dictionary match the second object from the set: \(thirdNotesObjectFromDictionary == notesSet[secondNotesObjectIndexInSet])") // prints false
}

Swift - Detecting whether item was inserted into NSMutableSet

This is more for interest rather than a problem, but I have an NSMutableSet, retrieved from UserDefaults and my objective is to append an item to it and then write it back. I am using an NSMutableSet because I only want unique items to be inserted.
The type of object to be inserted is a custom class, I have overrode hashCode and isEqual.
var stopSet: NSMutableSet = []
if let ud = UserDefaults.standard.object(forKey: "favStops") as? Data {
stopSet = NSKeyedUnarchiver.unarchiveObject(with: ud) as! NSMutableSet
}
stopSet.add(self.theStop!)
let outData = NSKeyedArchiver.archivedData(withRootObject: stopSet)
UserDefaults.standard.set(outData, forKey: "favStops")
NSLog("Saved to UserDefaults")
I get the set, call mySet.add(obj) and then write the set back to UserDefaults. Everything seems to work fine and (as far as I can see) there don't appear to be duplicates.
However is it possible to tell whether a call to mySet.add(obj) actually caused an item to be written to the set. mySet.add(obj) doesn't have a return value and if you use Playgrounds (rather than a project) you get in the output on the right hand side an indication of whether the set was actually changed based on the method call.
I know sets are not meant to store duplicate objects so in theory I should just trust that, but I was just wondering if the set did return a response that you could access - as opposed to just getting the length before the insert and after if I really wanted to know!
Swift has its own native type, Set, so you should use it instead of NSMutableSet.
Set's insert method actually returns a Bool indicating whether the insertion succeeded or not, which you can see in the function signature:
mutating func insert(_ newMember: Element) -> (inserted: Bool, memberAfterInsert: Element)
The following test code showcases this behaviour:
var set = Set<Int>()
let (inserted, element) = set.insert(0)
let (again, newElement) = set.insert(0)
print(inserted,element) //true, 0
print(again,oldElement) //false,0
The second value of the tuple returns the newly inserted element in case the insertion succeeded and the oldElement otherwise. oldElement is not necessarily equal in every aspect to the element you tried to insert. (since for custom types you might define the isEqual method in a way that doesn't compare each property of the type).
You don't need to handle the return value of the insert function, there is no compiler warning if you just write insert like this:
set.insert(1)

Struct cannot have a stored property that references itself [duplicate]

Reference cycles in Swift occur when properties of reference types have strong ownership of each other (or with closures).
Is there, however, a possibility of having reference cycles with value types only?
I tried this in playground without succes (Error: Recursive value type 'A' is not allowed).
struct A {
var otherA: A? = nil
init() {
otherA = A()
}
}
A reference cycle (or retain cycle) is so named because it indicates a cycle in the object graph:
Each arrow indicates one object retaining another (a strong reference). Unless the cycle is broken, the memory for these objects will never be freed.
When capturing and storing value types (structs and enums), there is no such thing as a reference. Values are copied, rather than referenced, although values can hold references to objects.
In other words, values can have outgoing arrows in the object graph, but no incoming arrows. That means they can't participate in a cycle.
As the compiler told you, what you're trying to do is illegal. Exactly because this is a value type, there's no coherent, efficient way to implement what you're describing. If a type needs to refer to itself (e.g., it has a property that is of the same type as itself), use a class, not a struct.
Alternatively, you can use an enum, but only in a special, limited way: an enum case's associated value can be an instance of that enum, provided the case (or the entire enum) is marked indirect:
enum Node {
case None(Int)
indirect case left(Int, Node)
indirect case right(Int, Node)
indirect case both(Int, Node, Node)
}
Disclaimer: I'm making an (hopefully educated) guess about the inner workings of the Swift compiler here, so apply grains of salt.
Aside from value semantics, ask yourself: Why do we have structs? What is the advantage?
One advantage is that we can (read: want to) store them on the stack (resp. in an object frame), i.e. just like primitive values in other languages. In particular, we don't want to allocate dedicated space on the heap to point to. That makes accessing struct values more efficient: we (read: the compiler) always knows where exactly in memory it finds the value, relative to the current frame or object pointer.
In order for that to work out for the compiler, it needs to know how much space to reserve for a given struct value when determining the structure of the stack or object frame. As long as struct values are trees of fixed size (disregarding outgoing references to objects; they point to the heap are not of interest for us), that is fine: the compiler can just add up all the sizes it finds.
If you had a recursive struct, this fails: you can implement lists or binary trees in this way. The compiler can not figure out statically how to store such values in memory, so we have to forbid them.
Nota bene: The same reasoning explains why structs are pass-by-value: we need them to physically be at their new context.
Quick and easy hack workaround: just embed it in an array.
struct A {
var otherA: [A]? = nil
init() {
otherA = [A()]
}
}
You normally cannot have a reference cycle with value types simply because Swift normally doesn't allow references to value types. Everything is copied.
However, if you're curious, you actually can induce a value-type reference cycle by capturing self in a closure.
The following is an example. Note that the MyObject class is present merely to illustrate the leak.
class MyObject {
static var objCount = 0
init() {
MyObject.objCount += 1
print("Alloc \(MyObject.objCount)")
}
deinit {
print("Dealloc \(MyObject.objCount)")
MyObject.objCount -= 1
}
}
struct MyValueType {
var closure: (() -> ())?
var obj = MyObject()
init(leakMe: Bool) {
if leakMe {
closure = { print("\(self)") }
}
}
}
func test(leakMe leakMe: Bool) {
print("Creating value type. Leak:\(leakMe)")
let _ = MyValueType(leakMe: leakMe)
}
test(leakMe: true)
test(leakMe: false)
Output:
Creating value type. Leak:true
Alloc 1
Creating value type. Leak:false
Alloc 2
Dealloc 2
Is there, however, a possibility of having reference cycles with value types only?
Depends on what you mean with "value types only".
If you mean completely no reference including hidden ones inside, then the answer is NO. To make a reference cycle, you need at least one reference.
But in Swift, Array, String or some other types are value types, which may contain references inside their instances. If your "value types" includes such types, the answer is YES.

Reference cycles with value types?

Reference cycles in Swift occur when properties of reference types have strong ownership of each other (or with closures).
Is there, however, a possibility of having reference cycles with value types only?
I tried this in playground without succes (Error: Recursive value type 'A' is not allowed).
struct A {
var otherA: A? = nil
init() {
otherA = A()
}
}
A reference cycle (or retain cycle) is so named because it indicates a cycle in the object graph:
Each arrow indicates one object retaining another (a strong reference). Unless the cycle is broken, the memory for these objects will never be freed.
When capturing and storing value types (structs and enums), there is no such thing as a reference. Values are copied, rather than referenced, although values can hold references to objects.
In other words, values can have outgoing arrows in the object graph, but no incoming arrows. That means they can't participate in a cycle.
As the compiler told you, what you're trying to do is illegal. Exactly because this is a value type, there's no coherent, efficient way to implement what you're describing. If a type needs to refer to itself (e.g., it has a property that is of the same type as itself), use a class, not a struct.
Alternatively, you can use an enum, but only in a special, limited way: an enum case's associated value can be an instance of that enum, provided the case (or the entire enum) is marked indirect:
enum Node {
case None(Int)
indirect case left(Int, Node)
indirect case right(Int, Node)
indirect case both(Int, Node, Node)
}
Disclaimer: I'm making an (hopefully educated) guess about the inner workings of the Swift compiler here, so apply grains of salt.
Aside from value semantics, ask yourself: Why do we have structs? What is the advantage?
One advantage is that we can (read: want to) store them on the stack (resp. in an object frame), i.e. just like primitive values in other languages. In particular, we don't want to allocate dedicated space on the heap to point to. That makes accessing struct values more efficient: we (read: the compiler) always knows where exactly in memory it finds the value, relative to the current frame or object pointer.
In order for that to work out for the compiler, it needs to know how much space to reserve for a given struct value when determining the structure of the stack or object frame. As long as struct values are trees of fixed size (disregarding outgoing references to objects; they point to the heap are not of interest for us), that is fine: the compiler can just add up all the sizes it finds.
If you had a recursive struct, this fails: you can implement lists or binary trees in this way. The compiler can not figure out statically how to store such values in memory, so we have to forbid them.
Nota bene: The same reasoning explains why structs are pass-by-value: we need them to physically be at their new context.
Quick and easy hack workaround: just embed it in an array.
struct A {
var otherA: [A]? = nil
init() {
otherA = [A()]
}
}
You normally cannot have a reference cycle with value types simply because Swift normally doesn't allow references to value types. Everything is copied.
However, if you're curious, you actually can induce a value-type reference cycle by capturing self in a closure.
The following is an example. Note that the MyObject class is present merely to illustrate the leak.
class MyObject {
static var objCount = 0
init() {
MyObject.objCount += 1
print("Alloc \(MyObject.objCount)")
}
deinit {
print("Dealloc \(MyObject.objCount)")
MyObject.objCount -= 1
}
}
struct MyValueType {
var closure: (() -> ())?
var obj = MyObject()
init(leakMe: Bool) {
if leakMe {
closure = { print("\(self)") }
}
}
}
func test(leakMe leakMe: Bool) {
print("Creating value type. Leak:\(leakMe)")
let _ = MyValueType(leakMe: leakMe)
}
test(leakMe: true)
test(leakMe: false)
Output:
Creating value type. Leak:true
Alloc 1
Creating value type. Leak:false
Alloc 2
Dealloc 2
Is there, however, a possibility of having reference cycles with value types only?
Depends on what you mean with "value types only".
If you mean completely no reference including hidden ones inside, then the answer is NO. To make a reference cycle, you need at least one reference.
But in Swift, Array, String or some other types are value types, which may contain references inside their instances. If your "value types" includes such types, the answer is YES.

Implementing a multimap in Swift with Arrays and Dictionaries

I'm trying to implement a basic multimap in Swift. Here's a relevant (non-functioning) snippet:
class Multimap<K: Hashable, V> {
var _dict = Dictionary<K, V[]>()
func put(key: K, value: V) {
if let existingValues = self._dict[key] {
existingValues += value
} else {
self._dict[key] = [value]
}
}
}
However, I'm getting an error on the existingValues += value line:
Could not find an overload for '+=' that accepts the supplied arguments
This seems to imply that the value type T[] is defined as an immutable array, but I can't find any way to explicitly declare it as mutable. Is this possible in Swift?
The problem is that you are defining existingValues as a constant with let. However, I would suggest changing the method to be:
func put(key: K, value: V) {
var values = [value]
if let existingValues = self._dict[key] {
values.extend(existingValues)
}
self._dict[key] = values
}
}
I feel that the intent of this is clearer as it doesn't require modifying the local array and reassigning later.
if var existingValues = self._dict[key] { //var, not let
existingValues += value;
// should set again.
self._dict[key] = existingValues
} else {
self._dict[key] = [value]
}
Assignment and Copy Behavior for Arrays
The assignment and copy behavior for Swift’s Array type is more complex than for its Dictionary type. Array provides C-like performance when you work with an array’s contents and copies an array’s contents only when copying is necessary.
If you assign an Array instance to a constant or variable, or pass an Array instance as an argument to a function or method call, the contents of the array are not copied at the point that the assignment or call takes place. Instead, both arrays share the same sequence of element values. When you modify an element value through one array, the result is observable through the other.
For arrays, copying only takes place when you perform an action that has the potential to modify the length of the array. This includes appending, inserting, or removing items, or using a ranged subscript to replace a range of items in the array. If and when array copying does take place, the copy behavior for an array’s contents is the same as for a dictionary’s keys and values, as described in Assignment and Copy Behavior for Dictionaries.
See: https://itunes.apple.com/us/book/the-swift-programming-language/id881256329?mt=11
Buckets is a data structures library for swift. It provides a multimap and allows subscript notation.
One easy way to implement a multi-map is to use a list of pairs (key, value) sorted by key, using binary search to find ranges of entries. This works best when you need to get a bunch of data, all at once. It doesn't work so well when you are constantly deleting and inserting elements.
See std::lower_bound from C++ for a binary search implementation which can be easily written in swift.