Compatibility of SubSequence indices

For most Swift Collections, indices of a Collection's SubSequence are compatible for use with the base Collection.
func foo<T: Collection>(_ buffer: T) -> T.Iterator.Element
    where T.Index == T.SubSequence.Index
{
    let start = buffer.index(buffer.startIndex, offsetBy: 2)
    let end = buffer.index(buffer.startIndex, offsetBy: 3)
    let sub = buffer[start ... end]
    return buffer[sub.startIndex]
}
This works fine for most collections:
print(foo([0, 1, 2, 3, 4])) // 2
And even for String.UTF8View:
print(foo("01234".utf8) - 0x30 /* ASCII 0 */) // 2
But when using String.CharacterView, things start breaking:
print(foo("01234".characters)) // "0"
For the CharacterView, SubSequences create completely independent instances, i.e. Index starts again at 0. To convert back to a main String index, one has to use the distance function and add that to the startIndex of the SubSequence in the main String.
func foo<T: Collection>(_ buffer: T) -> T.Iterator.Element
    where T.Index == T.SubSequence.Index, T.SubSequence: Collection, T.SubSequence.IndexDistance == T.IndexDistance
{
    let start = buffer.index(buffer.startIndex, offsetBy: 2)
    let end = buffer.index(buffer.startIndex, offsetBy: 3)
    let sub = buffer[start ... end]
    let subIndex = sub.startIndex
    let distance = sub.distance(from: sub.startIndex, to: subIndex)
    let bufferIndex = buffer.index(start, offsetBy: distance)
    return buffer[bufferIndex]
}
With this, all three examples now correctly print 2.
Why are String SubSequence indices not compatible with their base String? As long as everything is immutable, it doesn't make sense to me why Strings are a special case, even with all the Unicode handling. I've also noticed that substring functions return Strings and not Slices, as most other collections do. However, substrings are still documented to be returned in O(1). Strange magic.
Is there a way to constrain a generic function to collections whose SubSequence indices are compatible with the base Collection?
Can one even assume that SubSequence indices are compatible for non-String collections, or is this just a coincidence, and one should always use distance(from:to:) to convert indices?

This has been discussed on swift-evolution and filed as bug report SR-1927 – Subsequences of String Views don’t behave correctly. It has recently been fixed in StringCharacterView.swift (see the corresponding commit).
With that fix, String.CharacterView behaves like other collections in that its slices use the same indices for the same elements as the original collection.
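For illustration, here is a rough sketch of the behaviour after the fix, written against Swift 4 and later, where String itself is a Collection and its slices are Substrings that share indices with the base string:
let str = "01234"
let start = str.index(str.startIndex, offsetBy: 2)
let end = str.index(str.startIndex, offsetBy: 3)
let sub = str[start...end]          // a Substring, "23"
// The slice shares its indices with the base String,
// so an index obtained from the slice is valid in the original:
print(str[sub.startIndex])          // "2"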

Related

Is first(where:) Method always O(n) or it can be O(1) with usage of Set or Dictionary?

I'd like to know: if I use a Set instead of an Array, can my call to first(where:) become O(1)?
Apple says that the first(where:) method is O(n). Is that so in general, or does it depend on how we use it?
For example, look at these two ways of coding:
var numbers: [Int] = [Int]()
numbers = [3, 7, 4, -2, 9, -6, 10, 1]
if let searchResult = numbers.first(where: { value in value == -2 })
{
    print("The number \(searchResult) Exist!")
}
else
{
    print("The number does not Exist!")
}
and this:
var numbers: Set<Int> = Set<Int>()
numbers = [3, 7, 4, -2, 9, -6, 10, 1]
if let searchResult = numbers.first(where: { value in value == -2 })
{
    print("The number \(searchResult) Exist!")
}
else
{
    print("The number does not Exist!")
}
Can we say that in the second case the complexity is O(1)?
It's still O(n) even when you use a Set. .first(where:) is defined on a sequence, and it is necessary to check the items in the sequence one at a time to find the first one that makes the predicate true.
Your example is simply checking if the item exists in the Set, but since you are using .first(where:) and a predicate { value in value == -2} Swift will run that predicate for each element in the sequence in turn until it finds one that returns true. Swift doesn't know that you are really just checking to see if the item is in the set.
If you want O(1), then use .contains(-2) on the Set.
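To make the difference concrete, here is a small sketch using the same data as in the question:
let numbers: Set<Int> = [3, 7, 4, -2, 9, -6, 10, 1]

// O(n): first(where:) runs the predicate element by element.
let viaPredicate = numbers.first(where: { $0 == -2 })   // Optional(-2)

// O(1): contains(_:) uses the Set's hashing directly.
let viaContains = numbers.contains(-2)                  // true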
I recommend learning more about Big-O notation: O(1) is a strict subset of O(n), so every function that is in O(1) is also in O(n).
That said, Apple’s documentation is actually misleading as it does not take the complexity of the predicate function into account. The following is clearly O(n^2):
numbers.first(where: { value in numbers.contains(value + 42) })
Both Set and Dictionary conform to the Sequence protocol, which is the one that exposes the first(where:) function. And this function has the following requirement, taken from the documentation:
Complexity: O(n), where n is the length of the sequence.
Now, this is the upper limit of the function's complexity; it might well be that some sequences optimize the search based on their data type and storage details.
Bottom line: you need to read the documentation for a particular type if you want to know more about the performance of some feature; however, if you're only working through a protocol reference, you should assume the "worst", i.e. what's in the protocol documentation.
This is the implementation of the first(where:) function on Sequence:
/// - Complexity: O(*n*), where *n* is the length of the sequence.
@inlinable
public func first(
    where predicate: (Element) throws -> Bool
) rethrows -> Element? {
    for element in self {
        if try predicate(element) {
            return element
        }
    }
    return nil
}
From the Swift source code on GitHub.
As you can see, it's a simple for loop, so the complexity is O(n) (assuming the predicate itself is O(1) 🤷🏻‍♂️).
The predicate executes up to n times, so the worst case is O(n).
Set has no special overload for this function (it would be pointless, since a Set never contains more than one occurrence of a value). If you are just looking for a value rather than a predicate, use contains(_:) or firstIndex(of:) instead; Set provides overloads of these two with O(1) complexity.

Swift lazy subscript ignores filter

How does subscripting a lazy filter work?
let ary = [0,1,2,3]
let empty = ary.lazy.filter { $0 > 4 }.map { $0 + 1 }
print(Array(empty)) // []
print(empty[2]) // 3
It looks like it just ignores the filter and does the map anyway. Is this documented somewhere? What other lazy collections have exceptional behavior like this?
It comes down to subscripting a LazyFilterCollection with an integer which in this case ignores the predicate and forwards the subscript operation to the base.
For example, if we're looking for the strictly positive integers in an array :
let array = [-10, 10, 20, 30]
let lazyFilter = array.lazy.filter { $0 > 0 }
print(lazyFilter[3]) // 30
Or, if we're looking for the lowercase characters in a string :
let str = "Hello"
let lazyFilter = str.lazy.filter { $0 > "Z" }
print(lazyFilter[str.startIndex]) //H
In both cases, the subscript is forwarded to the base collection.
The proper way of subscripting a LazyFilterCollection is using a LazyFilterCollection<Base>.Index as described in the documentation :
let start = lazyFilter.startIndex
let index = lazyFilter.index(start, offsetBy: 1)
print(lazyFilter[index])
Which yields 20 for the array example, or l for the string example.
In your case, trying to access index 3:
let start = empty.startIndex
let index = empty.index(start, offsetBy: 3)
print(empty[index])
would raise the expected runtime error:
Fatal error: Index out of range
To add to Carpsen90's answer, you run into one of Collection's particularities: it's neither recommended nor safe to access a collection with an absolute index, even if the type system allows it, because the collection you receive might be a subset of another one.
Let's take a simpler example, array slicing:
let array = [0, 1, 2, 3, 4]
let slice = array[2..<3]
print(slice) // [2]
print(slice.first) // Optional(2)
print(slice[0]) // crashes with array index out of bounds
Even if slice is a collection indexable by an integer, it's still unsafe to use absolute integers to access elements of that collection, as the collection might have a different set of indices.
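A small sketch of the safe way to index a slice, using its own indices rather than absolute integers:
let array = [0, 1, 2, 3, 4]
let slice = array[2..<3]

// The slice shares indices with its base array, so its first
// valid index is 2, not 0.
print(slice.indices)           // 2..<3
print(slice[slice.startIndex]) // 2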

How to compare Range<String.Index> and DefaultBidirectionalIndices<String.CharacterView>?

This comparison worked in Swift 2 but doesn't anymore in Swift 3:
let myStringContainsOnlyOneCharacter = myString.rangeOfComposedCharacterSequence(at: myString.startIndex) == myString.characters.indices
How do I compare Range and DefaultBidirectionalIndices?
From SE-0065 – A New Model for Collections and Indices
In Swift 2, collection.indices returned a Range<Index>, but because a range is a simple pair of indices and indices can no longer be advanced on their own, Range<Index> is no longer iterable.
In order to keep code like the above working, Collection has acquired an associated Indices type that is always iterable, ...
Since rangeOfComposedCharacterSequence returns a range of
character indices, the solution is not to use indices, but
startIndex..<endIndex:
myString.rangeOfComposedCharacterSequence(at: myString.startIndex)
== myString.startIndex..<myString.endIndex
As far as I know, neither String nor String.CharacterView has a concise method returning a Range<String.Index> or something comparable to it.
You may need to create a Range explicitly with the range operator:
let myStringContainsOnlyOneCharacter = myString.rangeOfComposedCharacterSequence(at: myString.startIndex)
== myString.startIndex..<myString.endIndex
Or compare only the upper bound; in your case:
let onlyOne = myString.rangeOfComposedCharacterSequence(at: myString.startIndex).upperBound
== myString.endIndex

Swift Range behaving like NSRange and not including endIndex

I ran across some funky Range behavior and now I'm questioning everything I thought I knew about Range in Swift.
let range = Range<Int>(start: 0, end: 2)
print(range.count) // Prints 2
Since Range uses a start & end instead of the location & length that NSRange uses, I would expect the range above to have a count of 3. It almost seems like it is being treated like an NSRange, since a count of 2 makes sense if your location = 0 and length = 2.
let array = ["A", "B", "C", "D"]
let slice = array[range]
I would expect slice to contain ABC since the range's end index is 2, but slice actually contains AB. That does correspond to range.count == 2, but it doesn't add up given that the range's endIndex == 2, which should include C.
What am I missing here?
I'm using Xcode 7.2's version of Swift, not any of the open source versions.
Range objects Range<T> in Swift are, by default, presented as a half-open interval [start,end), i.e. start..<end (see HalfOpenInterval IntervalType).
You can see this if your print your range variable
let range = Range<Int>(start: 0, end: 2)
print(range) // 0..<2
Also, as Leo Dabus pointed out below (thanks!), you can simplify the declaration of Range<Int> by using the half-open range operator ..< directly:
let range = 0..<2
print(range) // 0..<2 (naturally)
Likewise, you can declare Range<Int> ranges using the closed range operator ...
let range = 0...1
print(range) // 0..<2
And we see, interestingly (in context of the question), that this again verifies that the default representation of Ranges are by means of the half-open operator.
That the half-open interval is the default for Range is stated somewhat implicitly, in text, in the language reference for Range:
Like other collections, a range containing one element has an endIndex
that is the successor of its startIndex; and an empty range has
startIndex == endIndex.
Range does, however, conform to the CollectionType protocol. In the language reference for the latter, it's stated clearly that startIndex and endIndex define a half-open interval:
The sequence view of the elements is identical to the collection view.
In other words, the following code binds the same series of values to
x as does for x in self {}:
for i in startIndex..<endIndex {
    let x = self[i]
}
To wrap it up: Range is defined as a half-open interval (startIndex ..< endIndex), even if that is somewhat obscure to find in the docs.
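A quick sanity check of the half-open behaviour, tying it back to the question's example (written with the range operators, which behave the same way):
let range = 0..<2
print(range.count)          // 2
print(Array(range))         // [0, 1]

let array = ["A", "B", "C", "D"]
print(Array(array[range]))  // ["A", "B"]; "C" at index 2 is excluded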
See also
Swift Language Guide - Basic Operators - Range Operators
Swift includes two range operators, which are shortcuts for expressing
a range of values.
...
The closed range operator (a...b) defines a range that runs from a to
b, and includes the values a and b. The value of a must not be greater
than b.
...
The half-open range operator (a..<b) defines a range that runs from a
to b, but does not include b. It is said to be half-open because it
contains its first value, but not its final value.

How to compare Swift String.Index?

So I have an instance of Range<String.Index> obtained from a search method, and also a standalone String.Index obtained by other means. How can I tell whether this index is within the aforementioned range or not?
Example code:
let str = "Hello!"
let range = Range(start: str.startIndex, end: str.endIndex)
let anIndex = advance(str.startIndex, 3)
// How to tell if `anIndex` is within `range` ?
Since comparison operators do not work on String.Index instances, the only way seems to be to loop through the string using advance, but that seems like overkill for such a simple operation.
The beta 5 release notes mention:
The idea of a Range has been split into three separate concepts:
Ranges, which are Collections of consecutive discrete ForwardIndexType values. Ranges are used for slicing and iteration.
Intervals over Comparable values, which can efficiently check for containment. Intervals are used for pattern matching in switch
statements and by the ~= operator.
Striding over Strideable values, which are Comparable and can be advanced an arbitrary distance in O(1).
Efficient containment checking is what you want, and this is possible since String.Index is Comparable:
let range = str.startIndex..<str.endIndex as HalfOpenInterval
// or this:
let range = HalfOpenInterval(str.startIndex, str.endIndex)
let anIndex = advance(str.startIndex, 3)
range.contains(anIndex) // true
// or this:
range ~= anIndex // true
(For now, it seems that explicitly naming HalfOpenInterval is necessary, otherwise the ..< operator creates a Range by default, and Range doesn't support contains and ~= because it uses only ForwardIndexType.)
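For reference, in Swift 3 and later the Interval types are gone; since String.Index is Comparable, Range<String.Index> itself supports containment checks. A rough sketch of the same test in modern Swift:
let str = "Hello!"
let range = str.startIndex..<str.endIndex                // Range<String.Index>
let anIndex = str.index(str.startIndex, offsetBy: 3)

print(range.contains(anIndex)) // true
print(range ~= anIndex)        // true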