Implementing a multimap in Swift with Arrays and Dictionaries

I'm trying to implement a basic multimap in Swift. Here's a relevant (non-functioning) snippet:
class Multimap<K: Hashable, V> {
    var _dict = Dictionary<K, [V]>()
    func put(key: K, value: V) {
        if let existingValues = self._dict[key] {
            existingValues += value
        } else {
            self._dict[key] = [value]
        }
    }
}
However, I'm getting an error on the existingValues += value line:
Could not find an overload for '+=' that accepts the supplied arguments
This seems to imply that the value type [V] is defined as an immutable array, but I can't find any way to explicitly declare it as mutable. Is this possible in Swift?

The problem is that you are defining existingValues as a constant with let. However, I would suggest changing the method to be:
func put(key: K, value: V) {
    // Note: the new value is placed before any existing values.
    var values = [value]
    if let existingValues = self._dict[key] {
        values.append(contentsOf: existingValues)
    }
    self._dict[key] = values
}
I feel that the intent of this is clearer as it doesn't require modifying the local array and reassigning later.

if var existingValues = self._dict[key] { // var, not let
    existingValues.append(value)
    // existingValues is a copy, so it has to be stored back
    self._dict[key] = existingValues
} else {
    self._dict[key] = [value]
}
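In Swift 4 and later, the whole put can be a single in-place append using Dictionary's subscript(_:default:). A minimal sketch, assuming insertion order per key should be preserved; it uses a struct rather than the question's class, and the names `storage` and `values(for:)` are illustrative, not from the question:

```swift
/// Minimal multimap sketch: each key maps to an array of values.
struct Multimap<K: Hashable, V> {
    private var storage: [K: [V]] = [:]

    /// Appends `value` under `key`, starting from an empty array if the key is new.
    mutating func put(key: K, value: V) {
        storage[key, default: []].append(value)
    }

    /// Returns the values stored under `key`, or an empty array if there are none.
    func values(for key: K) -> [V] {
        return storage[key] ?? []
    }
}

var multimap = Multimap<String, Int>()
multimap.put(key: "a", value: 1)
multimap.put(key: "a", value: 2)
multimap.put(key: "b", value: 3)
print(multimap.values(for: "a"))  // [1, 2]
print(multimap.values(for: "c"))  // []
```

Because the default-value subscript mutates the stored array directly, there is no temporary copy to write back, unlike the if var version.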
Assignment and Copy Behavior for Arrays
The assignment and copy behavior for Swift’s Array type is more complex than for its Dictionary type. Array provides C-like performance when you work with an array’s contents and copies an array’s contents only when copying is necessary.
If you assign an Array instance to a constant or variable, or pass an Array instance as an argument to a function or method call, the contents of the array are not copied at the point that the assignment or call takes place. Instead, both arrays share the same sequence of element values. When you modify an element value through one array, the result is observable through the other.
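For arrays, copying only takes place when you perform an action that has the potential to modify the length of the array. This includes appending, inserting, or removing items, or using a ranged subscript to replace a range of items in the array. If and when array copying does take place, the copy behavior for an array’s contents is the same as for a dictionary’s keys and values, as described in Assignment and Copy Behavior for Dictionaries.
(Note: this passage describes the early Swift 1 betas. Since Xcode 6 beta 3, Array has full value semantics like Dictionary: copies may share storage until one of them is mutated, but a change made through one array is never observable through another.)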
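A quick check of array copy behaviour on a current Swift toolchain (the variable names are illustrative): an element write and an append through one variable both leave the other untouched.

```swift
let original = [1, 2, 3]
var duplicate = original   // shares storage with `original` until a mutation

duplicate[0] = 100         // element write: `duplicate` gets its own storage first
duplicate.append(4)        // length change: also only affects `duplicate`

print(original)   // [1, 2, 3]
print(duplicate)  // [100, 2, 3, 4]
```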
See: https://itunes.apple.com/us/book/the-swift-programming-language/id881256329?mt=11

Buckets is a data structures library for Swift. It provides a multimap and allows subscript notation.

One easy way to implement a multimap is to use a list of (key, value) pairs sorted by key, using binary search to find the range of entries for a key. This works best when you need to get a bunch of data all at once. It doesn't work so well when you are constantly deleting and inserting elements.
See std::lower_bound from C++ for a binary search implementation that can easily be written in Swift.
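A minimal Swift sketch of that sorted-pairs approach. `SortedMultimap`, `insert(_:forKey:)`, `values(forKey:)` and the `partitionIndex(where:)` helper are hypothetical names; the helper plays the role of C++'s std::lower_bound and std::upper_bound. Insertion shifts elements (O(n)), which is why this layout favours bulk loading over frequent inserts and deletes.

```swift
/// Multimap stored as (key, value) pairs kept sorted by key.
struct SortedMultimap<Key: Comparable, Value> {
    private(set) var entries: [(key: Key, value: Value)] = []

    /// Binary search: first index whose key does NOT satisfy `isBefore`.
    /// Relies on `entries` being sorted so `isBefore` holds for a prefix.
    private func partitionIndex(where isBefore: (Key) -> Bool) -> Int {
        var low = 0
        var high = entries.count
        while low < high {
            let mid = low + (high - low) / 2
            if isBefore(entries[mid].key) {
                low = mid + 1
            } else {
                high = mid
            }
        }
        return low
    }

    /// Like std::lower_bound: first index with entry.key >= key.
    private func lowerBound(_ key: Key) -> Int {
        return partitionIndex { $0 < key }
    }

    /// Like std::upper_bound: first index with entry.key > key.
    private func upperBound(_ key: Key) -> Int {
        return partitionIndex { $0 <= key }
    }

    /// O(n) insert; inserting at upperBound keeps equal keys in insertion order.
    mutating func insert(_ value: Value, forKey key: Key) {
        entries.insert((key: key, value: value), at: upperBound(key))
    }

    /// All values for `key`: the contiguous range [lowerBound, upperBound).
    func values(forKey key: Key) -> [Value] {
        return entries[lowerBound(key)..<upperBound(key)].map { $0.value }
    }
}

var byKey = SortedMultimap<String, Int>()
byKey.insert(1, forKey: "b")
byKey.insert(2, forKey: "a")
byKey.insert(3, forKey: "b")
print(byKey.values(forKey: "b"))  // [1, 3]
print(byKey.values(forKey: "z"))  // []
```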

Related

Iterating with for .. in on a changing collection

I'm experimenting with iteration on an array using a for .. in .. loop. My question is related to the case where the collection is changed within the loop body.
It seems that the iteration is safe, even if the list shrinks in the meantime. The for iteration variables successively take the values of the (indexes and) elements that were in the array at the start of the loop, despite the changes made during the loop. Example:
var slist = [ "AA", "BC", "DE", "FG" ]
for (i, st) in slist.enumerated() { // for st in slist gives a similar result
    print("Index \(i): \(st)")
    if st == "AA" { // on one iteration, completely change the list
        print(" --> check 0: \(slist[0]), and 2: \(slist[2])")
        slist.append("KLM")
        slist.insert(st + "XX", at: 0) // shift the elements in the array
        slist[2] = "bc" // replace some elements to come
        print(" --> check again 0: \(slist[0]), and 2: \(slist[2])")
        slist.remove(at: 3)
        slist.remove(at: 3)
        slist.remove(at: 1) // makes list shorter
    }
}
print(slist)
This works very well, the iteration being made on the values [ "AA", "BC", "DE", "FG" ] even if after the first iteration the array is completely changed to ["AAXX", "bc", "KLM"]
I wanted to know if I can safely rely on this behavior. Unfortunately, the language guide says nothing about iterating over a collection while the collection is modified, and the for .. in section doesn't address this question either. So:
Can I safely rely on a guarantee about this iteration behavior provided in the language specification?
Or am I simply lucky with the current version of Swift 5.4? In this case, is there any clue in the language specification that one cannot take it for granted? And is there a performance overhead for this iteration behavior (e.g. some copy) compared to indexed iteration?

The documentation for IteratorProtocol says "whenever you use a for-in loop with an array, set, or any other collection or sequence, you’re using that type’s iterator." So, we are guaranteed that a for in loop is going to be using .makeIterator() and .next(), which are defined most generally on Sequence and IteratorProtocol respectively.
The documentation for Sequence says that "the Sequence protocol makes no requirement on conforming types regarding whether they will be destructively consumed by iteration." As a consequence, this means that an iterator for a Sequence is not required to make a copy, and so I do not think that modifying a sequence while iterating over it is, in general, safe.
This same caveat does not occur in the documentation for Collection, but I also don't think there is any guarantee that the iterator makes a copy, and so I do not think that modifying a collection while iterating over it is, in general, safe.
But, most collection types in Swift are structs with value semantics or copy-on-write semantics. I'm not really sure where the documentation for this is, but this link does say that "in Swift, Array, String, and Dictionary are all value types... You don’t need to do anything special — such as making an explicit copy — to prevent other code from modifying that data behind your back." In particular, this means that for Array, .makeIterator() cannot hold a reference to your array because the iterator for Array does not have to "do anything special" to prevent other code (i.e. your code) from modifying the data it holds.
We can explore this in more detail. The Iterator type of Array is defined as IndexingIterator<Array<Element>>. The documentation for IndexingIterator says that it is the default implementation of the iterator for collections, so we can assume that most collections will use it. We can see in the source code for IndexingIterator that it holds a copy of its collection:
@frozen
public struct IndexingIterator<Elements: Collection> {
    @usableFromInline
    internal let _elements: Elements
    @usableFromInline
    internal var _position: Elements.Index
    @inlinable
    @inline(__always)
    /// Creates an iterator over the given collection.
    public /// @testable
    init(_elements: Elements) {
        self._elements = _elements
        self._position = _elements.startIndex
    }
    ...
}
and that the default .makeIterator() simply creates this copy:
extension Collection where Iterator == IndexingIterator<Self> {
    /// Returns an iterator over the elements of the collection.
    @inlinable // trivial-implementation
    @inline(__always)
    public __consuming func makeIterator() -> IndexingIterator<Self> {
        return IndexingIterator(_elements: self)
    }
}
Although you might not want to trust this source code, the documentation for library evolution claims that "the @inlinable attribute is a promise from the library developer that the current definition of a function will remain correct when used with future versions of the library" and @frozen also means that the members of IndexingIterator cannot change.
Altogether, this means that any collection type with value semantics and an IndexingIterator as its Iterator must make a copy when used in for in loops (at least until the next ABI break, which should be a long way off). Even then, I don't think Apple is likely to change this behavior.
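A small experiment consistent with this reasoning, calling makeIterator() directly as a for in loop would: the iterator keeps producing the elements that existed when it was created, even after the array is emptied. This only demonstrates Array's current behaviour; it is not a documented guarantee.

```swift
var numbers = [1, 2, 3]
// For Array, makeIterator() returns an IndexingIterator holding its own copy of `numbers`.
var iterator = numbers.makeIterator()

numbers.removeAll()   // mutate the array after the iterator was created
numbers.append(99)

var seen: [Int] = []
while let n = iterator.next() {
    seen.append(n)
}
print(seen)     // [1, 2, 3]
print(numbers)  // [99]
```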
In Conclusion
I don't know of any place that it is explicitly spelled out in the docs "you can modify an array while you iterate over it, and the iteration will proceed as if you made a copy" but that's also the kind of language that probably shouldn't be written down as writing such code could definitely confuse a beginner.
However, there is enough documentation lying around which says that a for in loop just calls .makeIterator() and that for any collection with value semantics and the default iterator type (for example, Array), .makeIterator() makes a copy and so cannot be influenced by code inside the loop. Further, because Array and some other types like Set and Dictionary are copy-on-write, modifying these collections inside a loop will incur a one-time copy penalty, as the body of the loop will not have a unique reference to its storage (because the iterator also holds one). This is exactly the same penalty that modifying the collection outside the loop would have if you don’t have a unique reference to the storage.
Without these assumptions, you aren't guaranteed safety, but you might have it anyway in some circumstances.
Edit:
I just realized we can create some cases where this is unsafe for sequences.
import Foundation
/// This is clearly fine and works as expected.
print("Test normal")
for _ in 0...10 {
    let x: NSMutableArray = [0,1,2,3]
    for i in x {
        print(i)
    }
}
/// This is also okay. Reassigning `x` does not mutate the reference that the iterator holds.
print("Test reassignment")
for _ in 0...10 {
    var x: NSMutableArray = [0,1,2,3]
    for i in x {
        x = []
        print(i)
    }
}
/// This crashes. The iterator assumes that the last index it used is still valid, but after removing the objects, there are no valid indices.
print("Test removal")
for _ in 0...10 {
    let x: NSMutableArray = [0,1,2,3]
    for i in x {
        x.removeAllObjects()
        print(i)
    }
}
/// This also crashes. `.enumerated()` gets a reference to `x` which it expects will not be modified behind its back.
print("Test removal enumerated")
for _ in 0...10 {
    let x: NSMutableArray = [0,1,2,3]
    for i in x.enumerated() {
        x.removeAllObjects()
        print(i)
    }
}
The fact that this is an NSMutableArray is important because this type has reference semantics. Since NSMutableArray conforms to Sequence, this shows that mutating a sequence while iterating over it is not safe in general, even when using .enumerated().

slist.enumerated() creates a new instance of EnumeratedSequence<[String]>:
To create an instance of EnumeratedSequence, call enumerated() on a sequence or collection. The following example enumerates the elements of an array. reference
Removing .enumerated() produces the same result: each st still has the old value. This happens because the for-in loop creates a new instance of IndexingIterator<[String]>.
Whenever you use a for-in loop with an array, set, or any other collection or sequence, you’re using that type’s iterator. Swift uses a sequence’s or collection’s iterator internally to enable the for-in loop language construct. reference
About the questions:
You can remove all the elements and still run the loop safely, because a new iterator instance is created to perform the iterations.
Swift always uses the iterator internally to implement for-in, so there is no alternative mechanism to compare it against. Naturally, the larger the array, the more the performance is affected.

Swift semantics regarding dictionary access

I'm currently reading the excellent Advanced Swift book from objc.io, and I'm running into something that I don't understand.
If you run the following code in a playground, you will notice that when modifying a struct contained in a dictionary, a copy is made by the subscript access, but then it appears that the original value in the dictionary is replaced by the copy. I don't understand why. What exactly is happening?
Also, is there a way to avoid the copy ? According to the author of the book, there isn't, but I just want to be sure.
import Foundation
class Buffer {
    let id = UUID()
    var value = 0
    func copy() -> Buffer {
        let new = Buffer()
        new.value = self.value
        return new
    }
}
struct COWStruct {
    var buffer = Buffer()
    init() { print("Creating \(buffer.id)") }
    mutating func change() -> String {
        if isKnownUniquelyReferenced(&buffer) {
            buffer.value += 1
            return "No copy \(buffer.id)"
        } else {
            let newBuffer = buffer.copy()
            newBuffer.value += 1
            buffer = newBuffer
            return "Copy \(buffer.id)"
        }
    }
}
var array = [COWStruct()]
array[0].buffer.value
array[0].buffer.id
array[0].change()
array[0].buffer.value
array[0].buffer.id
var dict = ["key": COWStruct()]
dict["key"]?.buffer.value
dict["key"]?.buffer.id
dict["key"]?.change()
dict["key"]?.buffer.value
dict["key"]?.buffer.id
// If the above `change()` was made on a copy, why has the original value changed?
// Did the copied & modified struct replace the original struct in the dictionary?

dict["key"]?.change() // Copy
is semantically equivalent to:
if var value = dict["key"] {
value.change() // Copy
dict["key"] = value
}
The value is pulled out of the dictionary, unwrapped into a temporary, mutated, and then placed back into the dictionary.
Because there are now two references to the underlying buffer (one from our local temporary value, and one from the COWStruct instance in the dictionary itself), we're forcing a copy of the underlying Buffer instance, as it's no longer uniquely referenced.
So, why doesn't
array[0].change() // No Copy
do the same thing? Surely the element should be pulled out of the array, mutated and then stuck back in, replacing the previous value?
The difference is that unlike Dictionary's subscript, which consists of a getter and a setter, Array's subscript consists of a getter and a special accessor called mutableAddressWithPinnedNativeOwner.
What this special accessor does is return a pointer to the element in the array's underlying buffer, along with an owner object to ensure that the buffer isn't deallocated from under the caller. Such an accessor is called an addressor, as it deals with addresses.
Therefore when you say:
array[0].change()
you're actually mutating the actual element in the array directly, rather than a temporary.
Such an addressor cannot be directly applied to Dictionary's subscript because it returns an Optional, and the underlying value isn't stored as an optional. So it currently has to be unwrapped with a temporary, as we cannot return a pointer to the value in storage.
In Swift 3, you can avoid copying your COWStruct's underlying Buffer by removing the value from the dictionary before mutating the temporary:
if var value = dict["key"] {
    dict["key"] = nil
    value.change() // No Copy
    dict["key"] = value
}
Now only the temporary has a view onto the underlying Buffer instance.
And, as @dfri points out in the comments, this can be reduced down to:
if var value = dict.removeValue(forKey: "key") {
    value.change() // No Copy
    dict["key"] = value
}
saving on a hashing operation.
Additionally, for convenience, you may want to consider making this into an extension method:
extension Dictionary {
    @discardableResult
    mutating func withValue<R>(
        forKey key: Key, mutations: (inout Value) throws -> R
    ) rethrows -> R? {
        guard var value = removeValue(forKey: key) else { return nil }
        defer {
            updateValue(value, forKey: key)
        }
        return try mutations(&value)
    }
}
// ...
dict.withValue(forKey: "key") {
    $0.change() // No copy
}
In Swift 4, you should be able to use the values property of Dictionary in order to perform a direct mutation of the value:
if let index = dict.index(forKey: "key") {
    dict.values[index].change()
}
As the values property now returns a special Dictionary.Values mutable collection that has a subscript with an addressor (see SE-0154 for more info on this change).
However, currently (with the version of Swift 4 that ships with Xcode 9 beta 5), this still makes a copy. This is due to the fact that both the Dictionary and Dictionary.Values instances have a view onto the underlying buffer – as the values computed property is just implemented with a getter and setter that passes around a reference to the dictionary's buffer.
So calling the addressor triggers a copy of the dictionary's buffer, which leads to two views onto COWStruct's Buffer instance and therefore to a copy of it when change() is called.
I have filed a bug over this here. (Edit: This has now been fixed on master with the unofficial introduction of generalised accessors using coroutines, so will be fixed in Swift 5 – see below for more info).
In Swift 4.1, Dictionary's subscript(_:default:) now uses an addressor, so we can efficiently mutate values so long as we supply a default value to use in the mutation.
For example:
dict["key", default: COWStruct()].change() // No copy
The default: parameter uses @autoclosure, so the default value isn't evaluated if it isn't needed (as in this case, where we know there's a value for the key).
Swift 5 and beyond
With the unofficial introduction of generalised accessors in Swift 5, two new underscored accessors have been introduced, _read and _modify which use coroutines in order to yield a value back to the caller. For _modify, this can be an arbitrary mutable expression.
The use of coroutines is exciting because it means that a _modify accessor can now perform logic both before and after the mutation. This allows them to be much more efficient when it comes to copy-on-write types, as they can for example deinitialise the value in storage while yielding a temporary mutable copy of the value that's uniquely referenced to the caller (and then reinitialising the value in storage upon control returning to the callee).
The standard library has already updated many previously inefficient APIs to make use of the new _modify accessor – this includes Dictionary's subscript(_:) which can now yield a uniquely referenced value to the caller (using the deinitialisation trick I mentioned above).
The upshot of these changes means that:
dict["key"]?.change() // No copy
will be able to perform a mutation of the value without having to make a copy in Swift 5 (you can even try this out for yourself with a master snapshot).
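A trimmed-down version of the question's experiment for checking this on your own toolchain. `Storage`, `Wrapper` and `increment()` are illustrative names, not from the book. Whether each call reports `true` (no copy) depends on the Swift version and build settings, so no particular output is claimed for those; either way the stored value ends up at 2, because both spellings mutate the dictionary's value in place or write it back.

```swift
final class Storage {
    var value = 0
}

struct Wrapper {
    var storage = Storage()

    /// Increments the value, copying `storage` first if it is shared.
    /// Returns true when no copy was needed.
    mutating func increment() -> Bool {
        if isKnownUniquelyReferenced(&storage) {
            storage.value += 1
            return true
        }
        let fresh = Storage()
        fresh.value = storage.value + 1
        storage = fresh
        return false
    }
}

var dict = ["key": Wrapper()]
// Optional chaining through subscript(_:) (uses _modify from Swift 5 on).
let optionalChained = dict["key"]?.increment()
// subscript(_:default:) (uses an addressor from Swift 4.1 on).
let withDefault = dict["key", default: Wrapper()].increment()

print(optionalChained as Any, withDefault)
print(dict["key"]!.storage.value)  // 2
```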

How to deeply duplicate a multidimensional array in Swift

This question has been asked and answered for a couple of other programming languages, but I think I may have a unique problem anyway. So, I want to duplicate a three-dimensional array (filled with arbitrary objects). I believe I found that this:
var duplicateArray = originalArray
does not work, since, for whatever reason, they thought it would be a nice safety measure to have this create a duplicate array, but filled with pointers as sub-arrays instead of duplicating the sub-arrays as well. This seems like a strange design choice, since if duplicateArray and originalArray were one-dimensional, this would work as intended. Anyway, so I tried this (where object is some arbitrary object):
var duplicateArray = [[[object]]]()
for x in 0..<originalArray.count {
    var tempArrYZ = [[object]]()
    for y in 0..<originalArray[x].count {
        var tempArrZ = [object]()
        for z in 0..<originalArray[x][y].count {
            let copiedObj = originalArray[x][y][z]
            tempArrZ.append(copiedObj)
        }
        tempArrYZ.append(tempArrZ)
    }
    duplicateArray.append(tempArrYZ)
}
This still does not work; all the values in duplicateArray will act like a pointer for their values in originalArray. Perhaps someone has a simple way of deeply duplicating multidimensional arrays, or perhaps someone can find my error?
EDIT: How is this a duplicate of that other question? I'm asking specifically how to "deeply" duplicate. The question that's being referred to nebulously asked about duplicating arrays.

var duplicateArray = originalArray
would work if the objects were not reference types. For reference types, however, you need to actually create a copy of each object with copy(). Your original code was pretty close.
var duplicateArray = [[[object]]]()
for x in 0..<originalArray.count {
    var tempArrYZ = [[object]]()
    for y in 0..<originalArray[x].count {
        var tempArrZ = [object]()
        for z in 0..<originalArray[x][y].count {
            // NSObject's copy() returns Any, so cast back to the element type
            let copiedObj = originalArray[x][y][z].copy() as! object
            tempArrZ.append(copiedObj)
        }
        tempArrYZ.append(tempArrZ)
    }
    duplicateArray.append(tempArrYZ)
}

As already stated, your problem isn't really the copying of the array, it's the copying of Objects. Arrays, like all structs, are copied by value. Objects are copied by reference.
When you copy an array of objects, it's a brand new array with brand new references to the contained objects. Your code is simply creating additional references to the same objects then organizing them in a similar fashion.
Anyway, here's my simpler/functional implementation for copying arrays:
// `Copying` is not defined in this answer; assume a protocol along these lines:
protocol Copying {
    func copy() -> Self
}
func copyArrayWithObjects<T: Copying>(_ items: [T]) -> [T] {
    return items.map { $0.copy() }
}
func copy2DArrayWithObjects<T: Copying>(_ items: [[T]]) -> [[T]] {
    return items.map { copyArrayWithObjects($0) }
}
func copy3DArrayWithObjects<T: Copying>(_ items: [[[T]]]) -> [[[T]]] {
    return items.map { copy2DArrayWithObjects($0) }
}
Then you can simply do this:
let copiedArray = copy3DArrayWithObjects(originalArray)
Theoretically, I think it's possible to create a function to do this for an n-dimensional array, but I haven't found a solution yet.

I think it would be best to write an extension on Array that adds conformance to NSCopying, which recursively copies the elements. This solution would be very elegant because it could scale to any number of dimensions.
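A sketch of that recursive idea using a swapped-in technique: instead of NSCopying, a small Swift protocol plus conditional conformance on Array (Swift 4.1 or later). `DeepCopyable`, `deepCopy()` and the `Cell` class are hypothetical names for illustration. A single Array conformance covers [T], [[T]], [[[T]]] and any deeper nesting, which also answers the n-dimensional question above.

```swift
/// Hypothetical protocol standing in for NSCopying: returns an independent copy.
protocol DeepCopyable {
    func deepCopy() -> Self
}

/// An array of deep-copyable elements is itself deep-copyable, recursively.
extension Array: DeepCopyable where Element: DeepCopyable {
    func deepCopy() -> [Element] {
        return map { $0.deepCopy() }
    }
}

/// Example reference type; `final` lets deepCopy() return the concrete type for Self.
final class Cell: DeepCopyable {
    var value: Int
    init(value: Int) { self.value = value }
    func deepCopy() -> Cell {
        return Cell(value: value)
    }
}

let original = [[[Cell(value: 1), Cell(value: 2)]], [[Cell(value: 3)]]]
let duplicate = original.deepCopy()

duplicate[0][0][0].value = 100                    // mutate an object in the copy
print(original[0][0][0].value)                    // 1
print(duplicate[0][0][0] === original[0][0][0])   // false
```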

Swift arrays are value types so the snippet you provided is fine.
var duplicateArray = originalArray
See this example in a Playground as proof:
var array = [[["test"]]]
var newarray = array
// print different memory addresses
print(unsafeAddressOf(array[0][0][0])) // 0x00007ff7a302a760
print(unsafeAddressOf(newarray[0][0][0])) // 0x00007ff7a33000e0
If you use NSArray or reference types inside the Swift array, then they will no longer copy implicitly and will be treated with the same address - this can also be proved in the Playground. You would need to call copy() explicitly on reference types.

Sense behind the empty subscript?

Why does this even compile? What is the need for an empty subscript which obviously behaves like a function without parameters?
extension Array {
    subscript() -> Int {
        return 0
    }
}
let array = [1,3,2]
print(array[]) // "0"
Note that it can also be used for an assignment, so it behaves like a computed property named [].

Why does this even compile
It compiles because you defined an empty-subscript extension to Array:
extension Array {
    subscript() -> Int {
        return 0
    }
}
Array already has a subscript defined, whereby you supply an index number and get back the element at that index. This extension adds another subscript, whereby you supply nothing and get back the number zero.
Without that extension, this would not compile:
let array = [1,3,2]
print(array[])
What is the need for an empty subscript which obviously behaves like a function without parameters
There's no "need"; it's a convenience. You could, after all, make exactly the same "objection" to subscripts in general! They do nothing that you cannot accomplish by methods. In fact, such methods exist; the subscript notation is merely a pleasant piece of syntactic sugar.
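To illustrate that point, here is a sketch (the `Counter` type and its methods are made up for illustration) of an empty subscript with both a getter and a setter, which is why it can be assigned to like a computed property named [], next to the equivalent method-based spelling:

```swift
/// Illustrative type: the empty subscript acts like a read/write property named [].
struct Counter {
    private var count = 0

    /// Read with counter[], write with counter[] = newValue.
    subscript() -> Int {
        get { return count }
        set { count = newValue }
    }

    // The same capability spelled as ordinary methods, for comparison.
    func currentCount() -> Int { return count }
    mutating func setCount(_ newValue: Int) { count = newValue }
}

var counter = Counter()
counter[] = 5
print(counter[])               // 5
counter.setCount(7)
print(counter.currentCount())  // 7
```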

swift function to iterate possibly reversed array

I'd like to create a function that will iterate over an array (or collection or sequence). Then I will call that function with an array, and the reversed version of the array (but efficiently: without creating a new array to hold the reverse).
If I do this:
func doIteration(points: [CGPoint]) {
    for p in points {
        doSomethingWithPoint(p)
    }
    // I also need random access to points
    doSomethingElseWithPoint(points[points.count-2]) // ignore obvious index error
}
And if I have this:
let points : [CGPoint] = whatever
I can do this just fine:
doIteration(points)
But then if I do this:
doIteration(points.reverse())
I get 'Cannot convert value of type 'ReverseRandomAccessCollection<[CGPoint]> to expected argument type [_]'
Now, I DON'T want to do this:
let reversedPoints : [CGPoint] = points.reverse()
doIteration(reversedPoints)
even though it will work, because that will (correct me if I'm wrong) create a new array, initializing it from the ReverseRandomAccessCollection returned by reverse().
So I guess I'd like to write my doIteration function to take some sort of sequence type, so I can pass in the result of reverse() directly, but ReverseRandomAccessCollection doesn't conform to anything at all. I think I'm missing something - what's the accepted pattern here?

If you change your parameter's type to a generic, you should get the functionality you need:
func doIteration
    <C: CollectionType where C.Index: RandomAccessIndexType, C.Generator.Element == CGPoint>
    (points: C) {
    for p in points {
        doSomethingWithPoint(p)
    }
    doSomethingElseWithPoint(points[points.endIndex - 2])
}
More importantly, this won't cause a copy of the array to be made. If you look at the type generated by the reverse() method:
let points: [CGPoint] = []
let reversed = points.reverse() // ReverseRandomAccessCollection<Array<__C.CGPoint>>
doIteration(reversed)
You'll see that it just creates a struct that references the original array, in reverse. (although it does have value-type semantics) And the original function can accept this new collection, because of the correct generic constraints.
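The answer above uses Swift 2 names (CollectionType, RandomAccessIndexType, Generator.Element, reverse()). A sketch of the same generic approach in Swift 4 or later, where reversed() on an array can return a lazy ReversedCollection. It assumes Int elements instead of CGPoint so it runs without CoreGraphics, and it returns what it visited instead of calling the question's doSomethingWithPoint helpers:

```swift
/// Accepts an Array, a ReversedCollection<[Int]>, or any other random-access
/// collection of Int, without converting it to a new Array first.
func doIteration<C: RandomAccessCollection>(_ points: C) -> (visited: [Int], secondToLast: Int)
    where C.Element == Int {
    precondition(points.count >= 2, "need at least two elements")
    var visited: [Int] = []
    for p in points {
        visited.append(p)
    }
    // Random access via index(_:offsetBy:) instead of integer arithmetic on indices.
    let secondToLast = points[points.index(points.endIndex, offsetBy: -2)]
    return (visited, secondToLast)
}

let points = [10, 20, 30, 40]
let reversedView: ReversedCollection<[Int]> = points.reversed()  // a view, no new Array

let forward = doIteration(points)
let backward = doIteration(reversedView)
print(forward.secondToLast)   // 30
print(backward.secondToLast)  // 20
```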

You can do this
let reversedPoints : [CGPoint] = points.reverse()
doIteration(reversedPoints)
or this
doIteration(points.reverse() as [CGPoint])
but I don't think there is any real difference from the point of view of memory footprint.
Scenario 1
let reversedPoints : [CGPoint] = points.reverse()
doIteration(reversedPoints)
In fact, in this case a new Array is created and the CGPoint values from the original array are copied into it. CGPoint is a struct, so the values are stored inline in the new array's buffer rather than as references.
So the memory allocated is the following:
points.count * MemoryLayout<CGPoint>.stride
Scenario 2
On the other hand you can write something like this
doIteration(points.reverse() as [CGPoint])
But are you really saving memory? Let's see.
A temporary array is created and passed to doIteration, where it stays alive for the duration of the call. It needs storage for every element contained in points, so again we have:
points.count * MemoryLayout<CGPoint>.stride
So I think you can safely choose either of the two solutions.
Considerations
We should remember that Swift manages structures in a very smart way.
When I write
var word = "Hello"
var anotherWord = word
On the first line Swift creates a struct and fills it with the value "Hello".
On the second line Swift detects that there is no need to copy the original String yet, so anotherWord shares the original value's storage.
Only when word or anotherWord is modified does Swift actually create a copy of the original value.
