How to compare Swift String.Index?

So I have an instance of Range<String.Index> obtained from a search method, and also a standalone String.Index obtained by other means. How can I tell whether this index is within the aforementioned range or not?
Example code:
let str = "Hello!"
let range = Range(start: str.startIndex, end: str.endIndex)
let anIndex = advance(str.startIndex, 3)
// How to tell if `anIndex` is within `range` ?
Since comparison operators do not work on String.Index instances, the only way seems to be to loop through the string using advance, but this seems overkill for such a simple operation.

The beta 5 release notes mention:
The idea of a Range has been split into three separate concepts:
Ranges, which are Collections of consecutive discrete ForwardIndexType values. Ranges are used for slicing and iteration.
Intervals over Comparable values, which can efficiently check for containment. Intervals are used for pattern matching in switch
statements and by the ~= operator.
Striding over Strideable values, which are Comparable and can be advanced an arbitrary distance in O(1).
Efficient containment checking is what you want, and this is possible since String.Index is Comparable:
let range = str.startIndex..<str.endIndex as HalfOpenInterval
// or this:
let range = HalfOpenInterval(str.startIndex, str.endIndex)
let anIndex = advance(str.startIndex, 3)
range.contains(anIndex) // true
// or this:
range ~= anIndex // true
(For now, it seems that explicitly naming HalfOpenInterval is necessary, otherwise the ..< operator creates a Range by default, and Range doesn't support contains and ~= because it uses only ForwardIndexType.)
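For readers on current Swift versions: the interval types named above no longer exist, and Range itself now requires a Comparable bound, so the containment check can be written directly on a Range<String.Index>. A minimal sketch, assuming Swift 5 (the variable names are only illustrative):

```swift
let str = "Hello!"
let range = str.startIndex..<str.endIndex
let anIndex = str.index(str.startIndex, offsetBy: 3)

// Range<String.Index> supports containment checks directly.
let isInside = range.contains(anIndex)
// The pattern-match operator performs the same check.
let matchesPattern = range ~= anIndex
// The upper bound of a half-open range is not contained in it.
let endIsInside = range.contains(str.endIndex)
// String.Index is Comparable, so plain comparison operators work as well.
let isAfterStart = str.startIndex < anIndex

print(isInside, matchesPattern, endIsInside, isAfterStart) // true true false true
```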

Related

How to shift a string's Range?

I have the Range of a word and its enclosing sentence within a big long String. After extracting that sentence into its own String, I'd like to know the position of the word within it.
If we were dealing with integer indexes, I would just subtract the sentence's starting index from the word's range and I'd be done. For example, if the word was in characters 10–12 and its sentence started at character 8, then I'd have a new word range of 2–4.
Here's what I've got, ready to copy&paste to a Playground:
// The Setup (this is just to get easy testing values, no need for feedback on this part)
let bigLongString = "A beginning is the time for taking the most delicate care that the balances are correct. This every sister of the Bene Gesserit knows."
let sentenceInString = bigLongString.range(of: "This every sister of the Bene Gesserit knows.")!
let wordInString = bigLongString.range(of: "sister")!
let sentence = String(bigLongString[sentenceInString])
// The Code In Question
let wordInSentence = ??? // Something that shifts the `wordInString` range
// The Test (again, just for testing. it should read "This every *sister* of the Bene Gesserit knows.")
print(sentence.replacingCharacters(in: wordInSentence,
                                   with: "*\(sentence[wordInSentence])*"))
Also, note that wordInString may refer to any instance of a given word, not just the first one. (So, re-finding the word in sentence, i.e., sentence.range(of: "sister"), won't do the trick here unfortunately.) The range needs to be shifted somehow.
Thanks for reading!
EDIT:
The solution I posted seems to break with a slightly more complicated bigLongString. E.g.,
let bigLongString = "Really…? Thought I had it."
let sentenceInString = bigLongString.range(of: "Thought I had it.")!
let wordInString = bigLongString.range(of: "I")!
This can get kinda tricky, depending on precisely what you need to do.
NSRange
Firstly, as you may have noticed, Range<String.Index> and NSRange are different.
Range<String.Index> is how Swift represents ranges of indices in native Swift.Strings. It's an opaque type that's only usable by the String APIs that consume it. It understands Swift strings as collections of Swift.Characters, which represent what Unicode calls "extended grapheme clusters".
NSRange is the older range representation, used by Objective-C to represent ranges in Foundation.NSStrings. It's an open container, containing a "start" location and a length. Importantly, NSRange and NSString treat strings as collections of UTF-16 code units.
Because NSRange and NSString expose so many of their internals, they haven't undergone the same migration from utf16 to utf8 that Swift.String underwent. A migration that most people probably didn't even notice, since Swift.String guarded its implementation details much more than NSString did.
NSRange is more amenable to the kinds of simple operations you might be looking for. You can offset the start location just like you describe. However, you need to be careful that the resulting range doesn't start or end in the middle of an extended grapheme cluster. In that case, slicing could lead to a substring with invalid Unicode characters. For example, you might accidentally cut an e away from its accent; the accent modifier isn't valid on its own without the e.
Bridging back and forth between NSRange and Range<String.Index> is possible, but can be error prone if you're not careful. For that reason, I suggest you try to minimize conversions, by trying to either exclusively use NSRange, or Range<String.Index>, but not mix the two too much.
replacingCharacters(in:with:)
I suspect you're only using this as an example way of consuming wordInSentence, but it's still worth noting that:
Foundation.NSString.replacingCharacters(in:with:) (https://developer.apple.com/documentation/foundation/nsstring/1412937-replacingoccurrences) is an NSString API that's imported onto Swift.String when Foundation is imported. On NSString it accepts an NSRange. If you're dealing with Range<String.Index>, you should use its Swift-native counterpart, Swift.String.replaceSubrange(_:with:).
Substring is your friend
Don't fight it; unless you absolutely need sentence to be a String, keep it as a Substring for the duration of these short-lived processing actions. Not only does this save you a copy of the string's contents, but it also makes it so that the indices can be shared between the slice and the parent string. This is valid:
let sentence = bigLongString[sentenceInString]
print(sentence[wordInString])
or even just: bigLongString[sentenceInString][wordInString] or bigLongString[wordInString]
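To make the index-sharing point concrete, here is a small sketch assuming Swift 5 with Foundation available for range(of:). The names text, sentenceSlice and the other constants are made up for illustration; the string is the one from the question's edit.

```swift
import Foundation

let text = "Really…? Thought I had it."
let sentenceRange = text.range(of: "Thought I had it.")!
let wordRange = text.range(of: "I")!

// Slicing yields a Substring that shares storage and indices with `text`.
let sentenceSlice = text[sentenceRange]

// The range found in the parent string can be applied to the slice unchanged.
let wordViaSlice = String(sentenceSlice[wordRange])
let wordViaParent = String(text[wordRange])

// The slice's startIndex is the parent's index, not a fresh zero-based one.
let sliceStartsAtParentIndex = sentenceSlice.startIndex == sentenceRange.lowerBound

print(wordViaSlice, wordViaParent, sliceStartsAtParentIndex) // I I true
```

The range only needs shifting once the slice is copied into a new String, because the copy gets its own index space.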
Shifting around
I couldn't find a native solution for this, so I rolled my own. I could definitely be missing something simpler, but here's what I came up with:
import Foundation
struct SubstringOffset {
    // Number of Characters between the parent's start and the substring's start.
    let offset: Int
    let parent: String
    init(of substring: Substring, in parent: String) {
        self.offset = parent.distance(from: parent.startIndex, to: substring.startIndex)
        self.parent = parent
    }
    // Maps an index of `parent` to the equivalent index in `newString`.
    func convert(indexInParent: String.Index, toIndexIn newString: String) -> String.Index {
        let distance = parent.distance(from: parent.startIndex, to: indexInParent)
        let distanceInNewString = distance - offset
        return newString.index(newString.startIndex, offsetBy: distanceInNewString)
    }
    // Maps a range of `parent` to the equivalent range in `newString`.
    func convert(rangeInParent: Range<String.Index>, toRangeIn newString: String) -> Range<String.Index> {
        let newLowerBound = self.convert(indexInParent: rangeInParent.lowerBound, toIndexIn: newString)
        let span = self.parent.distance(from: rangeInParent.lowerBound, to: rangeInParent.upperBound)
        let newUpperBound = newString.index(newLowerBound, offsetBy: span)
        return newLowerBound ..< newUpperBound
    }
}
// The Setup (this is just to get easy testing values, no need for feedback on this part)
let bigLongString = "Really…? Thought I had it."
let sentenceInString = bigLongString.range(of: "Thought I had it.")!
let wordInString = bigLongString.range(of: "I")!
var sentence: String = String(bigLongString[sentenceInString])
let offset = SubstringOffset(of: bigLongString[sentenceInString], in: bigLongString)
// The Code In Question
let wordInSentence: Range<String.Index> = offset.convert(rangeInParent: wordInString, toRangeIn: sentence)
sentence.replaceSubrange(wordInSentence, with: "*\(sentence[wordInSentence])*")
print(sentence)
OK, this is what I've come up with. It appears to work OK for both examples in the question.
We use the String instance method distance(from:to:) to get the distance between the bigLongString start and the sentence start. (Analogous to the "8" in the question.) Then the word range is shifted back by this amount by shifting the upper and lower bounds separately, and then reforming them into a Range.
let wordStartInSentence = bigLongString.distance(from: sentenceInString.lowerBound,
                                                 to: wordInString.lowerBound)
let wordEndInSentence = bigLongString.distance(from: sentenceInString.lowerBound,
                                               to: wordInString.upperBound)
let wordStart = sentence.index(sentence.startIndex, offsetBy: wordStartInSentence)
let wordEnd = sentence.index(sentence.startIndex, offsetBy: wordEndInSentence)
let wordInSentence = wordStart..<wordEnd
EDIT: Updated answer to work for the more complicated bigLongString example (and coincidentally also reduce the "string walking," especially when bigLongString is very big).
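The same steps can be wrapped in a small helper and run against the trickier example from the question's edit. This is only a sketch, assuming Swift 5 with Foundation; shiftedRange is a hypothetical name, not an existing API.

```swift
import Foundation

// Hypothetical helper: re-expresses `range` (valid in `parent`) for a standalone
// copy of the text that begins at `base` in `parent`.
func shiftedRange(_ range: Range<String.Index>,
                  in parent: String,
                  relativeTo base: String.Index,
                  into copy: String) -> Range<String.Index> {
    // Character distances from the start of the extracted text to each bound.
    let lowerOffset = parent.distance(from: base, to: range.lowerBound)
    let upperOffset = parent.distance(from: base, to: range.upperBound)
    // Rebuild both bounds in the copy's own index space.
    let lower = copy.index(copy.startIndex, offsetBy: lowerOffset)
    let upper = copy.index(copy.startIndex, offsetBy: upperOffset)
    return lower..<upper
}

let bigLongString = "Really…? Thought I had it."
let sentenceInString = bigLongString.range(of: "Thought I had it.")!
let wordInString = bigLongString.range(of: "I")!
var sentence = String(bigLongString[sentenceInString])

let wordInSentence = shiftedRange(wordInString,
                                  in: bigLongString,
                                  relativeTo: sentenceInString.lowerBound,
                                  into: sentence)
let word = String(sentence[wordInSentence])
sentence.replaceSubrange(wordInSentence, with: "*\(word)*")
print(sentence) // Thought *I* had it.
```

Measuring both distances from the sentence start, rather than from the start of bigLongString, keeps the walk proportional to the sentence length.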

Usage of Range operator in Data.subdata

In Swift 3, I wonder why I'm able to use the half-open range operator ..< in Data.subdata(in:) but not the closed range operator ....
I've searched everywhere but can't understand why it gives me this error:
no '...' candidates produce the expected contextual result type
'Range<Data.Index>' (aka 'Range<Int>')
Here's an example of both the one that works and the one that doesn't:
import Foundation
let x = Data(bytes: [0x0, 0x1])
let y : UInt8 = x.subdata(in: 0..<2).withUnsafeBytes{$0.pointee}
let z : UInt8 = x.subdata(in: 0...1).withUnsafeBytes{$0.pointee} // This fails
Thanks!
..< is the half-open range operator, which can either create a Range or CountableRange (depending on whether the Bound is Strideable with an Integer Stride or not). The range that is created is inclusive of the lower bound, but exclusive of the upper bound.
... is the closed range operator, which can either create a ClosedRange or CountableClosedRange (same requirements as above). The range that is created is inclusive of both the upper and lower bounds.
Therefore as subdata(in:) expects a Range<Int>, you cannot use the closed range operator ... in order to construct the argument – you must use the half-open range operator instead.
However, it would be trivial to extend Data and add an overload that does accept a ClosedRange<Int>, which would allow you to use the closed range operator.
extension Data {
    func subdata(in range: ClosedRange<Index>) -> Data {
        return subdata(in: range.lowerBound ..< range.upperBound + 1)
    }
}
let x = Data(bytes: [0x0, 0x1])
let z : UInt8 = x.subdata(in: 0...1).withUnsafeBytes {$0.pointee}
At the risk of having this closed, I'll speak out anyway: I followed the advice of a closed/deleted comment, as the selected answer did not work with Swift 5. The working solution was:
x.subdata(in: Range(0...1))
I assume this has to do with interpretations of "Range" vs "NSRange" - but honestly I don't know.
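For what it's worth, the Range(0...1) form does not involve NSRange: Range has an initializer that converts a ClosedRange over an integer-strideable bound into the equivalent half-open range, so 0...1 becomes 0..<2. A short sketch, assuming Swift 5 with Foundation (the constant names are illustrative):

```swift
import Foundation

let data = Data([0x0, 0x1, 0x2])

// Half-open range, accepted by subdata(in:) directly.
let viaHalfOpen = data.subdata(in: 0..<2)

// Converting the closed range first: Range(0...1) is the same as 0..<2.
let viaConversion = data.subdata(in: Range(0...1))

// Subscripting accepts range expressions, including closed ranges.
let viaSubscript = data[0...1]

print(Array(viaHalfOpen), Array(viaConversion), Array(viaSubscript)) // [0, 1] [0, 1] [0, 1]
```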

Compatibility of SubSequence indices

For most Swift Collections, indices of a Collection's SubSequence are compatible for use with the base Collection.
func foo<T: Collection>(_ buffer: T) -> T.Iterator.Element
    where T.Index == T.SubSequence.Index
{
    let start = buffer.index(buffer.startIndex, offsetBy: 2)
    let end = buffer.index(buffer.startIndex, offsetBy: 3)
    let sub = buffer[start ... end]
    return buffer[sub.startIndex]
}
This works fine for most collections:
print(foo([0, 1, 2, 3, 4])) // 2
And even for String.UTF8View:
print(foo("01234".utf8) - 0x30 /* ASCII 0 */) // 2
But when using String.CharacterView, things start breaking:
print(foo("01234".characters)) // "0"
For the CharacterView, SubSequences create completely independent instances, i.e. Index starts again at 0. To convert back to a main String index, one has to use the distance function and add that to the startIndex of the SubSequence in the main String.
func foo<T: Collection>(_ buffer: T) -> T.Iterator.Element
    where T.Index == T.SubSequence.Index, T.SubSequence: Collection, T.SubSequence.IndexDistance == T.IndexDistance
{
    let start = buffer.index(buffer.startIndex, offsetBy: 2)
    let end = buffer.index(buffer.startIndex, offsetBy: 3)
    let sub = buffer[start ... end]
    let subIndex = sub.startIndex
    let distance = sub.distance(from: sub.startIndex, to: subIndex)
    let bufferIndex = buffer.index(start, offsetBy: distance)
    return buffer[bufferIndex]
}
With this, all three examples now correctly print 2.
Why are String SubSequence indices not compatible with their base String? As long as everything is immutable, it doesn't make sense to me why Strings are a special case, even with all the Unicode stuff. I've also noticed that substring functions return Strings and not Slices as most other collections do. However, substrings are still documented to return in O(1). Strange magic.
Is there a way to constrain a generic function so that it only accepts collections whose SubSequence indices are compatible with the base Sequence?
Can one even assume that SubSequence indices are compatible for non-String collections, or is this just a coincidence, and one should always use distance(from:to:) to convert indices?
That has been discussed on swift-evolution, filed as bug report SR-1927 – Subsequences of String Views don’t behave correctly, and recently fixed in StringCharacterView.swift with a commit.
With that fix, String.CharacterView behaves like other collections in that its slices should use the same indices for the same elements as the original collection.
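In current Swift (assuming Swift 5 here), the Collection protocol itself requires SubSequence.Index to be the same type as Index, and slices are expected to share indices with their base collection, so the extra where clauses are no longer needed. A sketch of the question's function under that assumption, renamed to the hypothetical firstOfSlice:

```swift
// Returns the element of `buffer` found at the start index of a slice taken
// from offsets 2...3. This relies on the slice sharing indices with its base.
func firstOfSlice<C: Collection>(_ buffer: C) -> C.Element {
    let start = buffer.index(buffer.startIndex, offsetBy: 2)
    let end = buffer.index(buffer.startIndex, offsetBy: 3)
    let sub = buffer[start...end]
    return buffer[sub.startIndex]
}

let fromArray = firstOfSlice([0, 1, 2, 3, 4])
let fromUTF8 = firstOfSlice("01234".utf8) - 0x30 // subtract ASCII "0"
let fromString: Character = firstOfSlice("01234")

print(fromArray, fromUTF8, fromString) // 2 2 2
```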

How to compare Range<String.Index> and DefaultBidirectionalIndices<String.CharacterView>?

This comparison worked in Swift 2 but doesn't anymore in Swift 3:
let myStringContainsOnlyOneCharacter = myString.rangeOfComposedCharacterSequence(at: myString.startIndex) == myString.characters.indices
How do I compare Range and DefaultBidirectionalIndices?
From SE-0065 – A New Model for Collections and Indices
In Swift 2, collection.indices returned a Range<Index>, but because a range is a simple pair of indices and indices can no longer be advanced on their own, Range<Index> is no longer iterable.
In order to keep code like the above working, Collection has acquired an associated Indices type that is always iterable, ...
Since rangeOfComposedCharacterSequence returns a range of character indices, the solution is not to use indices, but startIndex..<endIndex:
myString.rangeOfComposedCharacterSequence(at: myString.startIndex)
    == myString.startIndex..<myString.endIndex
As far as I know, neither String nor String.CharacterView has a concise method returning Range<String.Index> or something comparable to it.
You may need to create a Range explicitly with the range operator:
let myStringContainsOnlyOneCharacter = myString.rangeOfComposedCharacterSequence(at: myString.startIndex)
    == myString.startIndex..<myString.endIndex
Or, in your case, compare only the upper bound:
let onlyOne = myString.rangeOfComposedCharacterSequence(at: myString.startIndex).upperBound
    == myString.endIndex
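Wrapped up as a function, the comparison could look like the sketch below, assuming Swift 5 with Foundation. The name isSingleCharacter is hypothetical, and the guard for the empty string is an added assumption, since asking for the composed character sequence at startIndex of an empty string is out of range.

```swift
import Foundation

// Hypothetical helper: true when the whole string is one composed character sequence.
func isSingleCharacter(_ s: String) -> Bool {
    // An empty string has no character at startIndex, so rule it out first.
    guard !s.isEmpty else { return false }
    return s.rangeOfComposedCharacterSequence(at: s.startIndex)
        == s.startIndex..<s.endIndex
}

let single = isSingleCharacter("a")
let combining = isSingleCharacter("e\u{301}") // "e" followed by a combining acute accent
let double = isSingleCharacter("ab")
let empty = isSingleCharacter("")
print(single, combining, double, empty) // true true false false
```

If only the Character count matters, comparing s.count to 1 may be a simpler check.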

Swift Range behaving like NSRange and not including endIndex

I ran across some funky Range behavior and now I'm questioning everything I thought I knew about Range in Swift.
let range = Range<Int>(start: 0, end: 2)
print(range.count) // Prints 2
Since Range uses a start & end instead of the location & length that NSRange uses, I would expect the range above to have a count of 3. It almost seems like it is being treated like an NSRange, since a count of 2 makes sense if your location = 0 and length = 2.
let array = ["A", "B", "C", "D"]
let slice = array[range]
I would expect slice to contain ABC since range's end index is 2, but slice actually contains AB, which does correspond to the range.count == 2, but doesn't add up since the range's endIndex == 2 which should include C.
What am I missing here?
I'm using Xcode 7.2's version of Swift, not any of the open source versions.
Range objects Range<T> in Swift are, by default, presented as a half-open interval [start,end), i.e. start..<end (see HalfOpenInterval IntervalType).
You can see this if you print your range variable:
let range = Range<Int>(start: 0, end: 2)
print(range) // 0..<2
Also, as Leo Dabus pointed out below (thanks!), you can simplify the declaration of Range<Int> by using the half-open range operator ..< directly:
let range = 0..<2
print(range) // 0..<2 (naturally)
Likewise, you can declare Range<Int> ranges using the closed range operator ...
let range = 0...1
print(range) // 0..<2
And we see, interestingly (in the context of the question), that this again verifies that the default representation of a Range is by means of the half-open operator.
That the half-open interval is default for Range is written somewhat implicitly, in text, in the language reference for range:
Like other collections, a range containing one element has an endIndex
that is the successor of its startIndex; and an empty range has
startIndex == endIndex.
Range conforms, however, to the CollectionType protocol. In the language reference for the latter, it's stated clearly that startIndex and endIndex define a half-open interval:
The sequence view of the elements is identical to the collection view.
In other words, the following code binds the same series of values to
x as does for x in self {}:
for i in startIndex..<endIndex {
    let x = self[i]
}
To wrap up: Range is defined as a half-open interval (startIndex ..< endIndex), even if that is somewhat obscure to find in the docs.
See also
Swift Language Guide - Basic Operators - Range Operators
Swift includes two range operators, which are shortcuts for expressing
a range of values.
...
The closed range operator (a...b) defines a range that runs from a to
b, and includes the values a and b. The value of a must not be greater
than b.
...
The half-open range operator (a..< b) defines a range that runs from a
to b, but does not include b. It is said to be half-open because it
contains its first value, but not its final value.
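The answer above describes the Swift 2 that shipped with Xcode 7.2. In current Swift (assuming Swift 5 here), the two operators produce distinct types, Range and ClosedRange, so a closed range is no longer stored or printed as a half-open one. A small sketch of the counts and slices the question asks about, with illustrative names:

```swift
let array = ["A", "B", "C", "D"]

let halfOpen = 0..<2 // Range<Int>: includes 0 and 1, excludes 2
let closed = 0...2   // ClosedRange<Int>: includes 0, 1 and 2

let halfOpenSlice = Array(array[halfOpen])
let closedSlice = Array(array[closed])

print(halfOpen.count, halfOpenSlice) // 2 ["A", "B"]
print(closed.count, closedSlice)     // 3 ["A", "B", "C"]
```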