How safe are swift collections when used with invalidated iterators / indices?

How safe are swift collections when used with invalidated iterators / indices? - swift

I'm not seeing a lot of info in the swift stdlib reference. For example, Dictionary says certain methods (like remove) will invalidate indices, but that's it.
For a language to call itself "safe", it needs a solution to the classic C++ footguns:
get pointer to element in a vector, then add more elements (pointer is now invalidated), now use pointer, crash
start iterating through a collection. while iterating, remove some elements (either before or after the current iterator position). continue iterating, crash.
(edit: in c++, you're lucky to crash - worse case is memory corruption)
I believe 1 is solved by swift because if a collection stores classes, taking a reference (e.g. strong pointer) to an element will increase the refcount. However, I don't know the answer for 2.
It would be super useful if there was a comparison of footguns in c++ that are/are not solved by swift.
EDIT, due to Robs answer:
It does appear that there's some undocumented snapshot-like behavior going on
with Dictionary and/or for loops. The iteration creates a snapshot / hidden
copy of it when it starts.
Which gives me both a big "WAT" and "cool, that's sort of safe, I guess", and "how expensive is this copy?".
I don't see this documented either in Generator or in for-loop.
The below code prints two logical snapshots of the dictionary. The first
snapshot is userInfo as it was at the start of the iteration loop, and does
not reflect any modifications made during the loop.
var userInfo: [String: String] = [
"first_name" : "Andrei",
"last_name" : "Puni",
"job_title" : "Mad scientist"
]
userInfo["added_one"] = "1" // can modify because it's var
print("first snapshot:")
var hijacked = false
for (key, value) in userInfo {
if !hijacked {
userInfo["added_two"] = "2" // doesn't error
userInfo.removeValueForKey("first_name") // doesn't error
hijacked = true
}
print("- \(key): \(value)")
}
userInfo["added_three"] = "3" // modify again
print("final snapshot:")
for (key, value) in userInfo {
print("- \(key): \(value)")
}

As you say, #1 is not an issue. You do not have a pointer to the object in Swift. You either have its value or a reference to it. If you have its value, then it's a copy. If you have a reference, then it's protected. So there's no issue here.
But let's consider the second and experiment, be surprised, and then stop being surprised.
var xs = [1,2,3,4]
for x in xs { // (1)
if x == 2 {
xs.removeAll() // (2)
}
print(x) // Prints "1\n2\n3\n\4\n"
}
xs // [] (3)
Wait, how does it print all the values when we blow away the values at (2). We are very surprised now.
But we shouldn't be. Swift arrays are values. The xs at (1) is a value. Nothing can ever change it. It's not "a pointer to memory that includes an array structure that contains 4 elements." It's the value [1,2,3,4]. At (2), we don't "remove all elements from the thing xs pointed to." We take the thing xs is, create an array that results if you remove all the elements (that would be [] in all cases), and then assign that new array to xs. Nothing bad happens.
So what does the documentation mean by "invalidates all indices?" It means exactly that. If we generated indices, they're no good anymore. Let's see:
var xs = [1,2,3,4]
for i in xs.indices {
if i == 2 {
xs.removeAll()
}
print(xs[i]) // Prints "1\n2\n" and then CRASH!!!
}
Once xs.removeAll() is called, there's no promise that the old result of xs.indices means anything anymore. You are not permitted to use those indices safely against the collection they came from.
"Invalidates indices" in Swift is not the same as C++'s "invalidates iterators." I'd call that pretty safe, except the fact that using collection indices is always a bit dangerous and so you should avoid indexing collections when you can help it; iterate them instead. Even if you need the indexes for some reason, use enumerate to get them without creating any of the danger of indexing.
(Side note, dict["key"] is not indexing into dict. Dictionaries are a little confusing because their key is not their index. Accessing dictionaries by their DictionaryIndex index is just as dangerous as accessing arrays by their Int index.)
Note also that the above doesn't apply to NSArray. If you modify NSArray while iterating it, you'll get a "mutated collection while iterating" error. I'm only discussing Swift data types.
EDIT: for-in is very explicit in how it works:
The generate() method is called on the collection expression to obtain a value of a generator type—that is, a type that conforms to the GeneratorType protocol. The program begins executing a loop by calling the next() method on the stream. If the value returned is not None, it is assigned to the item pattern, the program executes the statements, and then continues execution at the beginning of the loop. Otherwise, the program does not perform assignment or execute the statements, and it is finished executing the for-in statement.
The returned Generator is a struct and contains a collection value. You would not expect any changes to some other value to modify its behavior. Remember: [1,2,3] is no different than 4. They're both values. When you assign them, they make copies. So when you create a Generator over a collection value, you're going to snapshot that value, just like if I created a Generator over the number 4. (This raises an interesting problem, because Generators aren't really values, and so really shouldn't be structs. They should be classes. Swift stdlib has been fixing that. See the new AnyGenerator for instance. But they still contain an array value, and you would never expect changes to some other array value to impact them.)
See also "Structures and Enumerations Are Value Types" which goes into more detail on the importance of value types in Swift. Arrays are just structs.
Yes, that means there's logically copying. Swift has many optimizations to minimize actual copying when it's not needed. In your case, when you mutate the dictionary while it's being iterated, that will force a copy to happen. Mutation is cheap if you're the only consumer of a particular value's backing storage. But it's O(n) if you're not. (This is determined by the Swift builtin isUniquelyReferenced().) Long story short: Swift Collections are Copy-on-Write, and simply passing an array does not cause real memory to be allocated or copied.
You don't get COW for free. Your own structs are not COW. It's something that Swift does in stdlib. (See Mike Ash's great discussion of how you would recreate it.) Passing your own custom structs causes real copies to happen. That said, the majority of the memory in most structs is stored in collections, and those collections are COW, so the cost of copying structs is usually pretty small.
The book doesn't spend a lot of time drilling into value types in Swift (it explains it all; it just doesn't keep saying "hey, and this is what that implies"). On the other hand, it was the constant topic at WWDC. You may be interested particularly in Building Better Apps with Value Types in Swift which is all about this topic. I believe Swift in Practice also discussed it.
EDIT2:
#KarlP raises an interesting point in the comments below, and it's worth addressing. None of the value-safety promises we're discussing are related to for-in. They're based on Array. for-in makes no promises at all about what would happen if you mutated a collection while it is being iterated. That wouldn't even be meaningful. for-in doesn't "iterate over collections," it calls next() on Generators. So if your Generator becomes undefined if the collection is changed, then for-in will blow up because the Generator blew up.
That means that the following might be unsafe, depending on how strictly you read the spec:
func nukeFromOrbit<C: RangeReplaceableCollectionType>(var xs: C) {
var hijack = true
for x in xs {
if hijack {
xs.removeAll()
hijack = false
}
print(x)
}
}
And the compiler won't help you here. It'll work fine for all of the Swift collections. But if calling next() after mutation for your collection is undefined behavior, then this is undefined behavior.
My opinion is that it would be poor Swift to make a collection that allows its Generator to become undefined in this case. You could even argue that you've broken the Generator spec if you do (it offers no UB "out" unless the generator has been copied or has returned nil). So you could argue that the above code is totally within spec and your generator is broken. Those arguments tend to be a bit messy with a "spec" like Swift's which doesn't dive into all the corner cases.
Does this mean you can write unsafe code in Swift without getting a clear warning? Absolutely. But in the many cases that commonly cause real-world bugs, Swift's built-in behavior does the right thing. And in that, it is safer than some other options.

Related

Swift Mutating Data at Range

I am building a Data object that looks something like this:
struct StructuredData {
var crc: UInt16
var someData: UInt32
var someMoreData: UInt64
// etc.
}
I'm running a CRC algorithm that will start at byte 2 and process length 12.
When the CRC is returned, it must exist at the beginning of the Data object. As I see it, my options are:
Generate a Data object that does not include the CRC, process it, and then build another Data object that does (so that the CRC value that I now have will be at the start of the Data object.
Generate the data object to include a zero'd out CRC to start with and then mutate the data at range [0..<2].
Obviously, 2 would be preferable, as it uses less memory and less processing, but this type of an optimization I'm not sure is necessary anymore. I'd still rather go with 2, except I do not know how to mutate the data at a given index range. Any help is greatly appreciated.

I do not recommend the way of mutating Data using this:
data.replaceSubrange(0..<2, with: UnsafeBufferPointer(start: &self.crc, count: 1))
Please try this:
data.replaceSubrange(0..<2, with: &self.crc, count: 2)
It is hard to explain why, but I'll try...
In Swift, inout parameters work in copy-in-copy-out semantics. When you write something like this:
aMethod(&param)
Swift allocates some region of enough size to hold the content of param,
copies param into the region, (copy-in)
calls the method with passing the address of the region,
and when returned from the call, copies the content of the region back to param (copy-out).
In many cases, Swift optimizes (which may happen even in -Onone setting) the steps by just passing the actual address of param, but it is not clearly documented.
So, when an inout parameter is passed to the initializer of UnsafeBufferPointer, the address received by UnsafeBufferPointer may be pointing to a temporal region, which will be released immediately after the initializer is finished.
Thus, replaceSubrange(_:with:) may copy the bytes in the already released region into the Data.
I believe the first code would work in this case as crc is a property of a struct, but if there is a simple and safe alternative, you should better avoid the unsafe way.
ADDITION for the comment of Brandon Mantzey's own answer.
data.append(UnsafeBufferPointer(start: &self.crcOfRecordData, count: 1))
Using safe in the meaning above. This is not safe, for the same reason described above.
I would write it as:
data.append(Data(bytes: &self.crcOfRecordData, count: MemoryLayout<UInt16>.size))
(Assuming the type of crcOfRecordData as UInt16.)
If you do not prefer creating an extra Data instance, you can write it as:
withUnsafeBytes(of: &self.crcOfRecordData) {urbp in
data.append(urbp.baseAddress!.assumingMemoryBound(to: UInt8.self), count: MemoryLayout<UInt16>.size)
}
This is not referred in the comment, but in the meaning of above safe, the following line is not safe.
let uint32Data = Data(buffer: UnsafeBufferPointer(start: &self.someData, count: 1))
All the same reason.
I would write it as:
let uint32Data = Data(bytes: &self.someData, count: MemoryLayout<UInt32>.size)
Though, observable unexpected behavior may happen in some very limited condition and with very little probability.
Such behavior would happen only when the following two conditions are met:
Swift compiler generates non-Optimized copy-in-copy-out code
Between the very narrow period, since the temporal region is released till the append method (or Data.init) finishes copying the whole content, the region is modified for another use.
The condition #1 becomes true only limited cases in the current implementation of Swift.
The condition #2 happens very rarely only in the multi-threaded environment. (Though, Apple's framework uses many hidden threads as you can find in the debugger of Xcode.)
In fact, I have not seen any questions regarding the unsafe cases above, my safe may be sort of overkill.
But alternative safe codes are not so complex, are they?
In my opinion, you should better be accustomed to use all-cases-safe code.

I figured it out. I actually had a syntax error that was boggling me because I hadn't seen that before.
Here's the answer:
data.replaceSubrange(0..<2, with: UnsafeBufferPointer(start: &self.crc, count: 1))

Ambiguous use of 'lazy'

I have no idea why this example is ambiguous. (My apologies for not adding the code here, it's simply too long.)
I have added prefix (_ maxLength) as an overload to LazyDropWhileBidirectionalCollection. subscript(position) is defined on LazyPrefixCollection. Yet, the following code from the above example shouldn't be ambiguous, yet it is:
print([0, 1, 2].lazy.drop(while: {_ in false}).prefix(2)[0]) // Ambiguous use of 'lazy'
It is my understanding that an overload that's higher up in the protocol hierarchy will get used.
According to the compiler it can't choose between two types; namely LazyRandomAccessCollection and LazySequence. (Which doesn't make sense since subscript(position) is not a method of LazySequence.) LazyRandomAccessCollection would be the logical choice here.
If I remove the subscript, it works:
print(Array([0, 1, 2].lazy.drop(while: {_ in false}).prefix(2))) // [0, 1]
What could be the issue?

The trail here is just too complicated and ambiguous. You can see this by dropping elements. In particular, drop the last subscript:
let z = [0, 1, 2].lazy.drop(while: {_ in false}).prefix(2)
In this configuration, the compiler wants to type z as LazyPrefixCollection<LazyDropWhileBidirectionalCollection<[Int]>>. But that isn't indexable by integers. I know it feels like it should be, but it isn't provable by the current compiler. (see below) So your [0] fails. And backtracking isn't powerful enough to get back out of this crazy maze. There are just too many overloads with different return types, and the compiler doesn't know which one you want.
But this particular case is trivially fixed:
print([0, 1, 2].lazy.drop(while: {_ in false}).prefix(2).first!)
That said, I would absolutely avoid pushing the compiler this hard. This is all too clever for Swift today. In particular overloads that return different types are very often a bad idea in Swift. When they're simple, yes, you can get away with it. But when you start layering them on, the compiler doesn't have a strong enough proof engine to resolve it. (That said, if we studied this long enough, I'm betting it actually is ambiguous somehow, but the diagnostic is misleading. That's a very common situation when you get into overly-clever Swift.)
Now that you describe it (in the comments), the reasoning is straightforward.
LazyDropWhileCollection can't have an integer index. Index subscripting is required to be O(1). That's the meaning of the Index subscript versus other subscripts. (The Index subscript must also return the Element type or crash; it can't return an Element?. That's way there's a DictionaryIndex that's separate from Key.)
Since the collection is lazy and has an arbitrary number of missing elements, looking up any particular integer "count" (first, second, etc.) is O(n). It's not possible to know what the 100th element is without walking through at least 100 elements. To be a collection, its O(1) index has to be in a form that can only be created by having previously walked the sequence. It can't be Int.
This is important because when you write code like:
for i in 1...1000 { print(xs[i]) }
you expect that to be on the order of 1000 "steps," but if this collection had an integer index, it would be on the order of 1 million steps. By wrapping the index, they prevent you from writing that code in the first place.
This is especially important in highly generic languages like Swift where layers of general-purpose algorithms can easily cascade an unexpected O(n) operation into completely unworkable performance (by "unworkable" I mean things that you expected to take milliseconds taking minutes or more).

Change the last row to this:
let x = [0, 1, 2]
let lazyX: LazySequence = x.lazy
let lazyX2: LazyRandomAccessCollection = x.lazy
let lazyX3: LazyBidirectionalCollection = x.lazy
let lazyX4: LazyCollection = x.lazy
print(lazyX.drop(while: {_ in false}).prefix(2)[0])
You can notice that the array has 4 different lazy conformations - you will have to be explicit.

Swift pointer from ArraySlice

I'm trying to determine the "Swift-y" way of creating my own contiguous memory containers (in my particular case, I'm building n-dimensional arrays). I want my containers to be as close to Swift's builtin Array as possible - in terms of functionality and usability.
I need to access the pointer to memory of my containers for stuff like Accelerate and BLAS operations.
I want to know whether an ArraySlice's pointer would point to the first element of the slice, or the first element of its base.
When I tried to test UnsafePointer<Int>(array) == UnsafePointer<Int>(array[1...2]) it looks like Swift doesn't allow pointer construction from ArraySlices (or I just did it incorrectly).
I'm looking for advice on which way would be the most "Swift-y"?
I understand that when slicing an array the follow is true:
let array = [1, 2, 3]
array[1] == array[1...2][1]
and
array[1...2][0] != 2 # index out of bounds error
In other words, indexing is always performed relative to the base Array.
Which suggests: that we should return a pointer to the base's first element. Because slices are relative to their base.
However, iteration through a slice (obviously) only considers elements of that slice:
for i in array[1..2] # i takes on 2 followed by 3
Which suggests: that we should return a pointer to the slice's first element. Because slices have their own starting point.
If my user wanted to operate on a slice in a BLAS operation it would be intuitive to expect:
mmul(matrix1[1...2, 0...1].pointer, matrix2[4...5, 0...1].pointer)
to point to the first elements of slice, but I don't know if this is the way a Swift ArraySlice would do things.
My Question: Should a container slice object's pointer point to the first element of the slice, or, the first element of the base container.

This operation is unsafe:
UnsafePointer<Int>(array)
What you mean is:
array.withUnsafeBufferPointer { ... }
This applies to your types as well, and is the pattern you should employ to interoperate with BLAS and Accelerate. You should not try to use a pointer method IMO.
There is no promise that array will continue to exist by the time you actually access the pointer, even if that happens in the same line of code. ARC is free to destroy that memory shockingly quickly.
UnsafeBufferPointer is actually a very nice type in that it is already promised to be contiguous and it behaves as a Collection.
My suggestion here would be to manage your own memory internally, probably with a ManagedBuffer, but maybe just with a UnsafeMutablePointer that you alloc and destroy yourself. It's very important that you manage the layout of the data so that it's compatible with Accelerate. You don't want Array<Array<UInt8>>. That's going to add too much structure. You want a blob of bytes that you index into in the good-ol' C ways (row*width+column, etc). You probably don't want your slices to return pointers at all directly. Your mmul function is likely going to need special logic to understand how to pull the pieces it needs out of slices with minimal copying so that it works with vDSP_mmul. "Generic" and "Accelerate" seldom go together.
For example, considering this:
mmul(matrix1[1...2, 0...1].pointer, matrix2[4...5, 0...1].pointer)
(Obviously I assume your real matrices are dramatically larger; this kind of matrix doesn't make much sense to send to vDSP.)
You're going to have to write your own mmul here obviously since this memory isn't laid out correctly. So you might as well pass the slices. Then you'd do something like (totally untested, uncompiled, and I'm sure the syntax is wildly wrong):
mmul(m1: MatrixSlice, m2: MatrixSlice) -> Matrix {
var s1 = UnsafeMutablePointer<Float>.alloc(m1.rows * m1.columns)
// use vDSP_vgathr to copy each sliced row out of m1 into s1
var s2 = UnsafeMutablePointer<Float>.alloc(m2.rows * m2.columns)
// use vDSP_vgathr to copy each sliced row out of m2 into s2
var result = UnsafeMutablePointer<Float>.alloc(m1.rows * m2.columns)
vDSP_mmul(s1, 1, s2, 1, result, 1, m1.rows, m2.columns, m1.columns)
s1.destroy()
s2.destroy()
// This will need to call result.move() or moveInitializeFrom or something
return Matrix(result)
}
I'm just throwing out stuff here, but this is probably the kind of structure you'd want.
To your underlying question about whether the pointer to the container or to the data is usually passed by Swift, the answer is unfortunately "magic" for Array and no one else. Passing an Array to something that wants a pointer will magically (by the compiler, not the stdlib) pass a pointer to the storage of the Array. No other type gets this magic. Not even ContiguousArray gets this magic. If you pass a ContiguousArray to something that wants a pointer, you'll pass the pointer to the container (and if it's mutable, corrupt the container; true story, hated that one…)

Thanks in part to #RobNapier the answer to my question is: ArraySlice's pointer should point to the slice's first element.
The way I verified this was simply:
var array = [5,4,3,325,67,7,3]
array.withUnsafeBufferPointer{ $0 } != array[3...6].withUnsafeBufferPointer{ $0 } # true
^--- points to 5's address ^--- points to 325's address

How to cast [Int8] to [UInt8] in Swift

I have a buffer that contains just characters
let buffer: [Int8] = ....
Then I need to pass this to a function process that takes [UInt8] as an argument.
func process(buffer: [UInt8]) {
// some code
}
What would be the best way to pass the [Int8] buffer to cast to [Int8]? I know following code would work, but in this case the buffer contains just bunch of characters, and it is unnecessary to use functions like map.
process(buffer.map{ x in UInt8(x) }) // OK
process([UInt8](buffer)) // error
process(buffer as! [UInt8]) // error
I am using Xcode7 b3 Swift2.

I broadly agree with the other answers that you should just stick with map, however, if your array were truly huge, and it really was painful to create a whole second buffer just for converting to the same bit pattern, you could do it like this:
// first, change your process logic to be generic on any kind of container
func process<C: CollectionType where C.Generator.Element == UInt8>(chars: C) {
// just to prove it's working...
print(String(chars.map { UnicodeScalar($0) }))
}
// sample input
let a: [Int8] = [104, 101, 108, 108, 111] // ascii "Hello"
// access the underlying raw buffer as a pointer
a.withUnsafeBufferPointer { buf -> Void in
process(
UnsafeBufferPointer(
// cast the underlying pointer to the type you want
start: UnsafePointer(buf.baseAddress),
count: buf.count))
}
// this prints [h, e, l, l, o]
Note withUnsafeBufferPointer means what it says. It’s unsafe and you can corrupt memory if you get this wrong (be especially careful with the count). It works based on your external knowledge that, for example, if any of the integers are negative then your code doesn't mind them becoming corrupt unsigned integers. You might know that, but the Swift type system can't, so it won't allow it without resort to the unsafe types.
That said, the above code is correct and within the rules and these techniques are justifiable if you need the performance edge. You almost certainly won’t unless you’re dealing with gigantic amounts of data or writing a library that you will call a gazillion times.
It’s also worth noting that there are circumstances where an array is not actually backed by a contiguous buffer (for example if it were cast from an NSArray) in which case calling .withUnsafeBufferPointer will first copy all the elements into a contiguous array. Also, Swift arrays are growable so this copy of underlying elements happens often as the array grows. If performance is absolutely critical, you could consider allocating your own memory using UnsafeMutablePointer and using it fixed-size style using UnsafeBufferPointer.
For a humorous but definitely not within the rules example that you shouldn’t actually use, this will also work:
process(unsafeBitCast(a, [UInt8].self))
It's also worth noting that these solutions are not the same as a.map { UInt8($0) } since the latter will trap at runtime if you pass it a negative integer. If this is a possibility you may need to filter them first.

IMO, the best way to do this would be to stick to the same base type throughout the whole application to avoid the whole need to do casts/coercions. That is, either use Int8 everywhere, or UInt8, but not both.
If you have no choice, e.g. if you use two separate frameworks over which you have no control, and one of them uses Int8 while another uses UInt8, then you should use map if you really want to use Swift. The latter 2 lines from your examples (process([UInt8](buffer)) and
process(buffer as! [UInt8])) look more like C approach to the problem, that is, we don't care that this area in memory is an array on singed integers we will now treat it as if it is unsigneds. Which basically throws whole Swift idea of strong types to the window.
What I would probably try to do is to use lazy sequences. E.g. check if it possible to feed process() method with something like:
let convertedBuffer = lazy(buffer).map() {
UInt8($0)
}
process(convertedBuffer)
This would at least save you from extra memory overhead (as otherwise you would have to keep 2 arrays), and possibly save you some performance (thanks to laziness).

You cannot cast arrays in Swift. It looks like you can, but what's really happening is that you are casting all the elements, one by one. Therefore, you can use cast notation with an array only if the elements can be cast.
Well, you cannot cast between numeric types in Swift. You have to coerce, which is a completely different thing - that is, you must make a new object of a different numeric type, based on the original object. The only way to use an Int8 where a UInt8 is expected is to coerce it: UInt8(x).
So what is true for one Int8 is true for an entire array of Int8. You cannot cast from an array of Int8 to an array of UInt8, any more than you could cast one of them. The only way to end up with an array of UInt8 is to coerce all the elements. That is exactly what your map call does. That is the way to do it; saying it is "unnecessary" is meaningless.

Can a Swift subscript return a variable?

In C#, the only indexer that actually returns a variable1,2 are array indexers.
void Make42(ref int x) {x=42;}
void SetArray(int[] array){
Make42(ref array[0]);} //works as intended; array[0] becomes 42
void SetList(List<int> list){
Make42(ref list[0]);} //does not work as intended, list[0] stays 0
In the example above, array[0] is a variable, but list[0] is not. This is the culprit behind many developers, writing high-performance libraries, being forced to write their own List implementations (that, unlike the built-in List, expose the underlying array) to get benchmark worthy performance out of the language.
In Swift, ref is called inout and indexer seems to be called subscript. I'm wondering if there's any mechanisms in Swift's subscript to explicitly return a variable rather than a value (a value can be a value-type or reference-type).

If I may bring in C++ parlance, you'd be looking to return a reference to a value. I'm using the term here because it's generally better understood by the programming crowd.
The reason C# limits it to just arrays is that returning arbitrary references may compromise the language's safety guarantees if not done properly.
There appears to be no syntax to return a reference in Swift. Since returning references is hard to verify, and since Swift is rather new and since it aims at being a safe language, it is understandable that Apple didn't go this way.
If you get to a level where you need this kind of performance, you may want to consider C++, in which you can sacrifice almost all the safety you want to get performance.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse