Ambiguous use of 'lazy' - swift

I have no idea why this example is ambiguous. (My apologies for not adding the code here, it's simply too long.)
I have added prefix(_ maxLength:) as an overload to LazyDropWhileBidirectionalCollection, and subscript(position:) is defined on LazyPrefixCollection. The following code from the above example shouldn't be ambiguous, yet it is:
print([0, 1, 2].lazy.drop(while: {_ in false}).prefix(2)[0]) // Ambiguous use of 'lazy'
It is my understanding that an overload that's higher up in the protocol hierarchy will get used.
According to the compiler it can't choose between two types; namely LazyRandomAccessCollection and LazySequence. (Which doesn't make sense since subscript(position) is not a method of LazySequence.) LazyRandomAccessCollection would be the logical choice here.
If I remove the subscript, it works:
print(Array([0, 1, 2].lazy.drop(while: {_ in false}).prefix(2))) // [0, 1]
What could be the issue?

The trail here is just too complicated and ambiguous. You can see this by removing pieces of the expression. In particular, drop the final subscript:
let z = [0, 1, 2].lazy.drop(while: {_ in false}).prefix(2)
In this configuration, the compiler wants to type z as LazyPrefixCollection<LazyDropWhileBidirectionalCollection<[Int]>>. But that isn't indexable by integers. I know it feels like it should be, but it isn't provable by the current compiler (see below). So your [0] fails. And backtracking isn't powerful enough to get back out of this crazy maze. There are just too many overloads with different return types, and the compiler doesn't know which one you want.
But this particular case is trivially fixed:
print([0, 1, 2].lazy.drop(while: {_ in false}).prefix(2).first!)
That said, I would absolutely avoid pushing the compiler this hard. This is all too clever for Swift today. In particular overloads that return different types are very often a bad idea in Swift. When they're simple, yes, you can get away with it. But when you start layering them on, the compiler doesn't have a strong enough proof engine to resolve it. (That said, if we studied this long enough, I'm betting it actually is ambiguous somehow, but the diagnostic is misleading. That's a very common situation when you get into overly-clever Swift.)
Now that you describe it (in the comments), the reasoning is straightforward.
LazyDropWhileCollection can't have an integer index. Index subscripting is required to be O(1). That's the meaning of the Index subscript versus other subscripts. (The Index subscript must also return the Element type or crash; it can't return an Element?. That's why there's a DictionaryIndex that's separate from Key.)
Since the collection is lazy and has an arbitrary number of missing elements, looking up any particular integer "count" (first, second, etc.) is O(n). It's not possible to know what the 100th element is without walking through at least 100 elements. To be a collection, its O(1) index has to be in a form that can only be created by having previously walked the sequence. It can't be Int.
This is important because when you write code like:
for i in 1...1000 { print(xs[i]) }
you expect that to be on the order of 1000 "steps," but if this collection had an integer index, it would be on the order of 1 million steps. By wrapping the index, they prevent you from writing that code in the first place.
This is especially important in highly generic languages like Swift where layers of general-purpose algorithms can easily cascade an unexpected O(n) operation into completely unworkable performance (by "unworkable" I mean things that you expected to take milliseconds taking minutes or more).
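The most familiar instance of this design is String, whose index is opaque for exactly the same reason: finding the n-th Character requires walking the string, so the O(n) walk is made explicit at the call site while the subscript itself stays O(1). A small illustration:
let s = "hello"
let i = s.index(s.startIndex, offsetBy: 3) // O(n): walks character by character
print(s[i]) // "l" -- O(1) once you hold a valid index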

Change the last line to this:
let x = [0, 1, 2]
let lazyX: LazySequence = x.lazy                 // lazy as a plain sequence
let lazyX2: LazyRandomAccessCollection = x.lazy  // lazy, keeping random access
let lazyX3: LazyBidirectionalCollection = x.lazy // lazy, keeping bidirectional traversal
let lazyX4: LazyCollection = x.lazy              // lazy as a plain collection
print(lazyX.drop(while: { _ in false }).prefix(2)[0])
As you can see, the array has four different lazy forms, so you have to be explicit about which one you want.

Related

How to define `last` iterator without collecting/allocating?

Using the example from the Julia Docs, we can define an iterator like the following:
struct Squares
    count::Int
end
Base.iterate(S::Squares, state=1) = state > S.count ? nothing : (state*state, state+1)
Base.eltype(::Type{Squares}) = Int # Note that this is defined for the type
Base.length(S::Squares) = S.count
But even though there's a length defined, asking for last(Squares(5)) results in an error:
julia> last(Squares(5))
ERROR: MethodError: no method matching lastindex(::Squares)
Since length is defined, is there a way to iterate through and return the last value without doing an allocating collect? If so, would it be bad to extend the Base.last method for my type?
As you can read in the docstring of last:
Get the last element of an ordered collection, if it can be computed in O(1) time. This is accomplished by calling lastindex to get the last index.
The crucial part is O(1) computation time. In your example the cost of computing the last element is O(count), at least if we restrict ourselves to the iteration interface; for this particular iterator it would of course be possible to compute it in O(1) time, since the last square is simply count^2.
The idea is to avoid defining last for collections for which it is expensive to compute it. For this reason the default definition of last is:
last(a) = a[end]
which requires not only lastindex but also getindex defined for the passed value (as the assumption is that if someone defines lastindex and getindex for some type then these operations can be performed fast).
If you look at the Interfaces section of the Julia manual you will notice that the iteration interface (which your example implements) is less demanding than the indexing interface (which is defined for your example in the next section of the manual). Usually the distinction is that the indexing interface is only added for collections that can be indexed efficiently.
If you still want last to work on your type you can do either of the following (both sketched below):
add a definition to Base.last specifically - there is nothing wrong with doing this;
add a definition of getindex, firstindex, and lastindex to make the collection indexable (and then the default definition of last would work) - this is the approach presented in the Julia manual
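For the Squares iterator both options are only a few lines. Here is a sketch; the getindex definition follows the indexing example from the Julia manual:
# Option 1: extend Base.last directly -- O(1), since the last square is count^2
Base.last(S::Squares) = S.count * S.count
# Option 2: implement the indexing interface so the default last(a) = a[end] works
Base.getindex(S::Squares, i::Int) = 1 <= i <= S.count ? i * i : throw(BoundsError(S, i))
Base.firstindex(S::Squares) = 1
Base.lastindex(S::Squares) = S.count
last(Squares(5)) # 25 with either option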

How exactly does a variadic parameter work?

I'm a Swift newbie and am having a bit of trouble understanding what a variadic parameter is exactly, and why it's useful. I'm currently following along with the online Swift 5.3 guide, and this is the example that was given for this type of parameter.
func arithmeticMean(_ numbers: Double...) -> Double {
    var total: Double = 0
    for number in numbers {
        total += number
    }
    return total / Double(numbers.count)
}
arithmeticMean(1, 2, 3, 4, 5)
// returns 3.0, which is the arithmetic mean of these five numbers
arithmeticMean(3, 8.25, 18.75)
// returns 10.0, which is the arithmetic mean of these three numbers
Apparently, the variadic parameter called numbers has a type of Double..., which allows it to be used in the body of the function as a constant array. Why does the function return Double(numbers.count) instead of just numbers.count? And instead of creating a variadic parameter, why not just create a parameter that takes in an array that's outside of the function like this?
func addition(numbers: [Int]) -> Int {
    var total: Int = 0
    for number in numbers {
        total += number
    }
    return total
}
let totalBruhs : [Int] = [4, 5, 6, 7, 8, 69]
addition(numbers: totalBruhs)
Also, why can there only be one variadic parameter per function?
Variadic parameters need to exist in Swift (well, not need, but it's nice) because they exist in C, and many things in Swift bridge to C. In C, creating a quick array of arbitrary length is not so simple as in Swift.
If you were building Swift from scratch with no backwards compatibility to C, then maybe they'd have been added, and maybe not. (Though I'm betting yes, just because so many Swift developers are used to languages where they exist. But then again, languages like Zig have intentionally gotten rid of variadic parameters, so I don't know. Zig also demonstrates that you don't need variadic parameters to bridge to C, but still, it's kind of nice. And @Rob's comments below are worth reading. He's probably not wrong. Also, his answer is insightful.)
But they're also convenient because you don't need to add the [...], which makes it much nicer when there's just one value. In particular, consider something like print:
func print(_ items: Any..., separator: String = " ", terminator: String = "\n")
Without variadic parameters, you'd need to put [...] in every print call, or you'd need overloads. Variadic doesn't change the world here, but it's kind of nice. It's particularly nice when you think about the ambiguities an overload would create. Say you didn't have variadics and instead had two overloads:
func print(_ items: [Any]) { ... }
func print(_ item: Any) { print([item]) }
That's actually a bit ambiguous, since Array is also a kind of Any. So print([1,2,3]) would print [[1,2,3]]. I'm sure there are possible workarounds, but variadics fix that up very nicely.
There can be only one because otherwise there are ambiguous cases.
func f(_ xs: Int..., _ ys: Int...)
What should f(1,2,3) do in this case? What is xs and what is ys?
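(For what it's worth, Swift 5.4 later relaxed this rule with SE-0284: a function may take multiple variadic parameters as long as every parameter after the first variadic one has an argument label, which removes the ambiguity:)
func f(_ xs: Int..., ys: Int...) { }
f(1, 2, 3, ys: 4, 5) // xs == [1, 2, 3], ys == [4, 5]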
The function you've shown here doesn't return Double(numbers.count). It converts numbers.count to a Double so it can be divided into another Double. The function returns total / Double(numbers.count).
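Swift never implicitly mixes integer and floating-point operands, so the count has to be converted before the division. For instance:
let total: Double = 10.0
let count = 4 // Int
// let mean = total / count // error: '/' cannot be applied to Double and Int
let mean = total / Double(count) // 2.5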
And instead of creating a variadic parameter, why not just create a parameter that takes in an array that's outside of the function ... ?
I agree with you that it feels intuitive to use arrays for arithmetic functions like “mean”, “sum”, etc.
That having been said, there are situations where the variadic pattern feels quite natural:
There are scenarios where you are writing a function where using an array might not be logical or intuitive at the calling point.
Consider a max function that is supposed to be returning the larger of two values. It doesn’t feel quite right to impose a constraint that the caller must create an array of these values in order to return the larger of two values. You really want to allow a nice, simple syntax:
let result = max(a, b)
But at the same time, as an API developer, there's also no reason to restrict the max implementation to only allow two parameters. Maybe the caller wants to use three. Or more. As API developers, we design APIs around the natural calling points for the primary use cases, but provide as much flexibility as we can. So a variadic function parameter is both very natural and very flexible.
There are lots of possible examples of this pattern, namely any function that naturally feels like it should take two parameters, but might take more. Consider a union function for two rectangles where you want the bounding rectangle. Again, you don't want the caller to have to create an array for what might be a simple union of two rectangles.
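A sketch of this signature pattern (maxOf is a hypothetical name, chosen to avoid colliding with the standard library's max); note that the non-variadic first parameter also guarantees at compile time that the caller passes at least one value:
func maxOf<T: Comparable>(_ first: T, _ rest: T...) -> T {
    return rest.reduce(first) { $0 > $1 ? $0 : $1 }
}
maxOf(3, 8, 2) // 8
maxOf(42)      // 42 -- still valid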
Another common example would be where you might have a variable number of parameters but might not be dealing with arrays. The classic example would be the printf pattern. Another is where you are interacting with a SQL database and might be binding values to ? placeholders in the SQL (to protect against SQL injection attacks):
let sql = "SELECT book_id, isbn FROM books WHERE title = ? AND author = ?"
let resultSet = db.query(sql, title, author)
Again, in these cases, requiring the caller to create an array for this heterogeneous collection of values might not feel natural at the calling point.
So, the question isn’t “why would I use variadic parameter where arrays are logical and intuitive?” but rather “why would I force the use of array parameters where it might not be?”

Runtime complexity of getting the length of a string in different representations

In Swift 2, given a string s, what's the runtime complexity of these statements:
s.characters.count
s.endIndex
s.utf8.count
s.utf16.count
s.unicodeScalars.count
Also, if I know a string contains letters only, what's the most efficient way of getting the n-th character?
OK, I'm trying to answer my own question. Feel free to correct me if I'm wrong.
s.characters.count // O(N)
s.endIndex // O(1)
s.utf8.count // O(N)
s.utf16.count // O(N)
s.unicodeScalars.count // O(N)
Apple's documentation on CollectionType.count says "Complexity: O(1) if Index conforms to RandomAccessIndexType; O(N) otherwise." Since none of the Index of CharacterView, UnicodeScalarView, UTF16View or UTF8View conforms to RandomAccessIndexType, accessing their counts are all O(N).
If you don't have access to the source code for those expressions (you don't, unless you work for Apple), and the documentation doesn't mention the complexity (it doesn't, I've checked), it might be worth actually benchmarking the operations with strings of size 1, 10, 100, 1000 and so on.
The resulting data, while not definitive, would at least give you an indication of the time complexity for each.
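A rough sketch of such a benchmark (written against modern Swift; on Swift 2 you would use s.characters.count and NSDate instead). If the timings grow roughly linearly with the string size, the operation is O(N):
import Foundation

for size in [1, 10, 100, 1_000, 10_000] {
    let s = String(repeating: "a", count: size)
    let start = Date()
    for _ in 0..<1_000 { _ = s.unicodeScalars.count } // repeat for a stable timing
    print("size \(size): \(Date().timeIntervalSince(start))s")
}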
In terms of getting the correct character at a given index of a string, that's already covered here. Whatever method Apple has chosen for indexing a string is going to be as fast as they can make it (they're not in the business of preferring slow code over fast, all other things being equal).

How safe are swift collections when used with invalidated iterators / indices?

I'm not seeing a lot of info in the Swift stdlib reference. For example, Dictionary says certain methods (like remove) will invalidate indices, but that's it.
For a language to call itself "safe", it needs a solution to the classic C++ footguns:
get pointer to element in a vector, then add more elements (pointer is now invalidated), now use pointer, crash
start iterating through a collection. while iterating, remove some elements (either before or after the current iterator position). continue iterating, crash.
(edit: in C++, you're lucky to crash - the worst case is memory corruption)
I believe 1 is solved by Swift because if a collection stores classes, taking a reference (e.g. a strong pointer) to an element will increase the refcount. However, I don't know the answer for 2.
It would be super useful if there was a comparison of footguns in C++ that are/are not solved by Swift.
EDIT, due to Rob's answer:
It does appear that there's some undocumented snapshot-like behavior going on with Dictionary and/or for loops. The iteration creates a snapshot / hidden copy of the dictionary when it starts.
Which gives me both a big "WAT" and "cool, that's sort of safe, I guess", and "how expensive is this copy?".
I don't see this documented either in Generator or in for-loop.
The code below prints two logical snapshots of the dictionary. The first snapshot is userInfo as it was at the start of the iteration loop, and does not reflect any modifications made during the loop.
var userInfo: [String: String] = [
    "first_name" : "Andrei",
    "last_name" : "Puni",
    "job_title" : "Mad scientist"
]
userInfo["added_one"] = "1" // can modify because it's var
print("first snapshot:")
var hijacked = false
for (key, value) in userInfo {
    if !hijacked {
        userInfo["added_two"] = "2" // doesn't error
        userInfo.removeValueForKey("first_name") // doesn't error
        hijacked = true
    }
    print("- \(key): \(value)")
}
userInfo["added_three"] = "3" // modify again
print("final snapshot:")
for (key, value) in userInfo {
    print("- \(key): \(value)")
}
As you say, #1 is not an issue. You do not have a pointer to the object in Swift. You either have its value or a reference to it. If you have its value, then it's a copy. If you have a reference, then it's protected. So there's no issue here.
But let's consider the second and experiment, be surprised, and then stop being surprised.
var xs = [1,2,3,4]
for x in xs { // (1)
    if x == 2 {
        xs.removeAll() // (2)
    }
    print(x) // Prints "1\n2\n3\n4\n"
}
xs // [] (3)
Wait, how does it print all the values when we blow away the values at (2)? We are very surprised now.
But we shouldn't be. Swift arrays are values. The xs at (1) is a value. Nothing can ever change it. It's not "a pointer to memory that includes an array structure that contains 4 elements." It's the value [1,2,3,4]. At (2), we don't "remove all elements from the thing xs pointed to." We take the thing xs is, create an array that results if you remove all the elements (that would be [] in all cases), and then assign that new array to xs. Nothing bad happens.
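A minimal illustration of that value semantics, independent of any loop:
var a = [1, 2, 3]
let b = a // b is its own value, not a pointer to a's storage
a.removeAll()
print(b) // [1, 2, 3] -- unaffected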
So what does the documentation mean by "invalidates all indices?" It means exactly that. If we generated indices, they're no good anymore. Let's see:
var xs = [1,2,3,4]
for i in xs.indices {
    if i == 2 {
        xs.removeAll()
    }
    print(xs[i]) // Prints "1\n2\n" and then CRASH!!!
}
Once xs.removeAll() is called, there's no promise that the old result of xs.indices means anything anymore. You are not permitted to use those indices safely against the collection they came from.
"Invalidates indices" in Swift is not the same as C++'s "invalidates iterators." I'd call that pretty safe, except the fact that using collection indices is always a bit dangerous and so you should avoid indexing collections when you can help it; iterate them instead. Even if you need the indexes for some reason, use enumerate to get them without creating any of the danger of indexing.
(Side note, dict["key"] is not indexing into dict. Dictionaries are a little confusing because their key is not their index. Accessing dictionaries by their DictionaryIndex index is just as dangerous as accessing arrays by their Int index.)
Note also that the above doesn't apply to NSArray. If you modify NSArray while iterating it, you'll get a "mutated collection while iterating" error. I'm only discussing Swift data types.
EDIT: for-in is very explicit in how it works:
The generate() method is called on the collection expression to obtain a value of a generator type—that is, a type that conforms to the GeneratorType protocol. The program begins executing a loop by calling the next() method on the stream. If the value returned is not None, it is assigned to the item pattern, the program executes the statements, and then continues execution at the beginning of the loop. Otherwise, the program does not perform assignment or execute the statements, and it is finished executing the for-in statement.
The returned Generator is a struct and contains a collection value. You would not expect any changes to some other value to modify its behavior. Remember: [1,2,3] is no different than 4. They're both values. When you assign them, they make copies. So when you create a Generator over a collection value, you're going to snapshot that value, just like if I created a Generator over the number 4. (This raises an interesting problem, because Generators aren't really values, and so really shouldn't be structs. They should be classes. Swift stdlib has been fixing that. See the new AnyGenerator for instance. But they still contain an array value, and you would never expect changes to some other array value to impact them.)
See also "Structures and Enumerations Are Value Types" which goes into more detail on the importance of value types in Swift. Arrays are just structs.
Yes, that means there is logically a copy. Swift has many optimizations to minimize actual copying when it's not needed. In your case, when you mutate the dictionary while it's being iterated, that forces a copy to happen. Mutation is cheap if you're the only consumer of a particular value's backing storage, but it's O(n) if you're not. (This is determined by the Swift builtin isUniquelyReferenced().) Long story short: Swift collections are copy-on-write, and simply passing an array does not cause real memory to be allocated or copied.
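You can observe the sharing directly. This is only a sketch - pointer identity is an implementation detail, not API - but it shows the copy being deferred until the first mutation:
var a = [1, 2, 3]
var b = a // no element copying happens here
a.withUnsafeBufferPointer { pa -> Void in
    b.withUnsafeBufferPointer { pb -> Void in
        print(pa.baseAddress == pb.baseAddress) // true: storage is shared
    }
}
b.append(4) // the first mutation triggers the O(n) copy
a.withUnsafeBufferPointer { pa -> Void in
    b.withUnsafeBufferPointer { pb -> Void in
        print(pa.baseAddress == pb.baseAddress) // false: now distinct
    }
}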
You don't get COW for free. Your own structs are not COW. It's something that Swift does in stdlib. (See Mike Ash's great discussion of how you would recreate it.) Passing your own custom structs causes real copies to happen. That said, the majority of the memory in most structs is stored in collections, and those collections are COW, so the cost of copying structs is usually pretty small.
The book doesn't spend a lot of time drilling into value types in Swift (it explains it all; it just doesn't keep saying "hey, and this is what that implies"). On the other hand, it was the constant topic at WWDC. You may be interested particularly in Building Better Apps with Value Types in Swift which is all about this topic. I believe Swift in Practice also discussed it.
EDIT2:
@KarlP raises an interesting point in the comments below, and it's worth addressing. None of the value-safety promises we're discussing are related to for-in. They're based on Array. for-in makes no promises at all about what would happen if you mutated a collection while it is being iterated. That wouldn't even be meaningful. for-in doesn't "iterate over collections," it calls next() on Generators. So if your Generator becomes undefined if the collection is changed, then for-in will blow up because the Generator blew up.
That means that the following might be unsafe, depending on how strictly you read the spec:
func nukeFromOrbit<C: RangeReplaceableCollectionType>(var xs: C) {
    var hijack = true
    for x in xs {
        if hijack {
            xs.removeAll()
            hijack = false
        }
        print(x)
    }
}
And the compiler won't help you here. It'll work fine for all of the Swift collections. But if calling next() after mutation for your collection is undefined behavior, then this is undefined behavior.
My opinion is that it would be poor Swift to make a collection that allows its Generator to become undefined in this case. You could even argue that you've broken the Generator spec if you do (it offers no UB "out" unless the generator has been copied or has returned nil). So you could argue that the above code is totally within spec and your generator is broken. Those arguments tend to be a bit messy with a "spec" like Swift's which doesn't dive into all the corner cases.
Does this mean you can write unsafe code in Swift without getting a clear warning? Absolutely. But in the many cases that commonly cause real-world bugs, Swift's built-in behavior does the right thing. And in that, it is safer than some other options.

How to cast [Int8] to [UInt8] in Swift

I have a buffer that contains just characters
let buffer: [Int8] = ....
Then I need to pass this to a function process that takes [UInt8] as an argument.
func process(buffer: [UInt8]) {
    // some code
}
What would be the best way to cast the [Int8] buffer to [UInt8]? I know the following code would work, but in this case the buffer contains just a bunch of characters, and it is unnecessary to use functions like map.
process(buffer.map{ x in UInt8(x) }) // OK
process([UInt8](buffer)) // error
process(buffer as! [UInt8]) // error
I am using Xcode 7 b3, Swift 2.
I broadly agree with the other answers that you should just stick with map, however, if your array were truly huge, and it really was painful to create a whole second buffer just for converting to the same bit pattern, you could do it like this:
// first, change your process logic to be generic on any kind of container
func process<C: CollectionType where C.Generator.Element == UInt8>(chars: C) {
    // just to prove it's working...
    print(String(chars.map { UnicodeScalar($0) }))
}
// sample input
let a: [Int8] = [104, 101, 108, 108, 111] // ascii "Hello"
// access the underlying raw buffer as a pointer
a.withUnsafeBufferPointer { buf -> Void in
    process(
        UnsafeBufferPointer(
            // cast the underlying pointer to the type you want
            start: UnsafePointer(buf.baseAddress),
            count: buf.count))
}
// this prints [h, e, l, l, o]
Note withUnsafeBufferPointer means what it says. It’s unsafe and you can corrupt memory if you get this wrong (be especially careful with the count). It relies on your external knowledge that, for example, if any of the integers are negative, your code doesn’t mind them being reinterpreted as large unsigned integers. You might know that, but the Swift type system can’t, so it won’t allow it without resorting to the unsafe types.
That said, the above code is correct and within the rules and these techniques are justifiable if you need the performance edge. You almost certainly won’t unless you’re dealing with gigantic amounts of data or writing a library that you will call a gazillion times.
It’s also worth noting that there are circumstances where an array is not actually backed by a contiguous buffer (for example if it were cast from an NSArray) in which case calling .withUnsafeBufferPointer will first copy all the elements into a contiguous array. Also, Swift arrays are growable so this copy of underlying elements happens often as the array grows. If performance is absolutely critical, you could consider allocating your own memory using UnsafeMutablePointer and using it fixed-size style using UnsafeBufferPointer.
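A sketch of that last idea using the Swift 2-era allocation API (alloc/dealloc became allocate(capacity:)/deallocate() in Swift 3); the bytes here are arbitrary sample values:
let count = 5
let ptr = UnsafeMutablePointer<UInt8>.alloc(count)
defer { ptr.dealloc(count) } // free the memory when done
for i in 0..<count {
    ptr[i] = UInt8(104 + i) // fill with sample bytes
}
// UnsafeBufferPointer is itself a CollectionType, so the generic process() accepts it
process(UnsafeBufferPointer(start: ptr, count: count))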
For a humorous but definitely not within the rules example that you shouldn’t actually use, this will also work:
process(unsafeBitCast(a, [UInt8].self))
It's also worth noting that these solutions are not the same as a.map { UInt8($0) } since the latter will trap at runtime if you pass it a negative integer. If this is a possibility you may need to filter them first.
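If what you want is the bit-pattern conversion but without unsafe pointers, there is also the UInt8(bitPattern:) initializer, which never traps (though unlike the pointer tricks above it does allocate a new array):
let signed: [Int8] = [104, 101, -1]
let unsigned = signed.map { UInt8(bitPattern: $0) } // [104, 101, 255]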
IMO, the best way to do this would be to stick to the same base type throughout the whole application to avoid the whole need to do casts/coercions. That is, either use Int8 everywhere, or UInt8, but not both.
If you have no choice, e.g. if you use two separate frameworks over which you have no control, and one of them uses Int8 while the other uses UInt8, then you should use map if you really want to use Swift. The latter two lines from your examples (process([UInt8](buffer)) and process(buffer as! [UInt8])) look more like a C approach to the problem: we don't care that this area in memory is an array of signed integers, we will now treat it as if it were unsigned. Which basically throws the whole Swift idea of strong types out the window.
What I would probably try to do is to use lazy sequences. E.g. check whether it is possible to feed the process() method with something like:
let convertedBuffer = buffer.lazy.map {
    UInt8($0)
}
process(convertedBuffer)
This would at least save you from extra memory overhead (as otherwise you would have to keep 2 arrays), and possibly save you some performance (thanks to laziness).
You cannot cast arrays in Swift. It looks like you can, but what's really happening is that you are casting all the elements, one by one. Therefore, you can use cast notation with an array only if the elements can be cast.
Well, you cannot cast between numeric types in Swift. You have to coerce, which is a completely different thing - that is, you must make a new object of a different numeric type, based on the original object. The only way to use an Int8 where a UInt8 is expected is to coerce it: UInt8(x).
So what is true for one Int8 is true for an entire array of Int8. You cannot cast from an array of Int8 to an array of UInt8, any more than you could cast one of them. The only way to end up with an array of UInt8 is to coerce all the elements. That is exactly what your map call does. That is the way to do it; saying it is "unnecessary" is meaningless.