How to shift a string's Range? - swift

I have the Range of a word and its enclosing sentence within a big long String. After extracting that sentence into its own String, I'd like to know the position of the word within it.
If we were dealing with integer indexes, I would just subtract the sentence's starting index from the word's range and I'd be done. For example, if the word was in characters 10–12 and its sentence started at character 8, then I'd have a new word range of 2–4.
Here's what I've got, ready to copy and paste into a Playground:
// The Setup (this is just to get easy testing values, no need for feedback on this part)
let bigLongString = "A beginning is the time for taking the most delicate care that the balances are correct. This every sister of the Bene Gesserit knows."
let sentenceInString = bigLongString.range(of: "This every sister of the Bene Gesserit knows.")!
let wordInString = bigLongString.range(of: "sister")!
let sentence = String(bigLongString[sentenceInString])
// The Code In Question
let wordInSentence = ??? // Something that shifts the `wordInString` range
// The Test (again, just for testing. it should read "This every *sister* of the Bene Gesserit knows.")
print(sentence.replacingCharacters(in: wordInSentence,
with: "*\(sentence[wordInSentence])*"))
Also, note that wordInString may refer to any instance of a given word, not just the first one. (So, re-finding the word in sentence, i.e., sentence.range(of: "sister"), won't do the trick here unfortunately.) The range needs to be shifted somehow.
Thanks for reading!
EDIT:
A slightly more complicated bigLongString seems to break the solution I posted. E.g.,
let bigLongString = "Really…? Thought I had it."
let sentenceInString = bigLongString.range(of: "Thought I had it.")!
let wordInString = bigLongString.range(of: "I")!

This can get kinda tricky, depending on precisely what you need to do.
NSRange
Firstly, as you may have noticed, Range<String.Index> and NSRange are different.
Range<String.Index> is how Swift represents ranges of indices in native Swift.Strings. It's an opaque type that's only usable by the String APIs that consume it. It understands Swift strings as collections of Swift.Characters, which represent what Unicode calls "extended grapheme clusters".
NSRange is the older range representation, used by Objective-C to represent ranges in Foundation.NSStrings. It's an open container, containing a "start" location and a length. Importantly, NSRange and NSString treat a string as a sequence of UTF-16 code units.
Because NSRange and NSString expose so many of their internals, they haven't undergone the same migration from utf16 to utf8 that Swift.String underwent. A migration that most people probably didn't even notice, since Swift.String guarded its implementation details much more than NSString did.
NSRange is more amenable to the kinds of simple operations you might be looking for. You can offset the start location just like you describe. However, you need to be careful that the resulting range doesn't start or end in the middle of an extended grapheme cluster. In that case, slicing could lead to a substring with invalid Unicode characters. For example, you might accidentally cut an e away from its accent; the accent modifier isn't valid on its own without the e.
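A minimal sketch of that offsetting, using the second example string from the question. It assumes both ranges were found in the same NSString and that the word lies inside the sentence; `shifted` is just an illustrative name.

```swift
import Foundation

let bigLongString = NSString(string: "Really…? Thought I had it.")
let sentenceInString = bigLongString.range(of: "Thought I had it.")
let wordInString = bigLongString.range(of: "I")

// NSRange locations are plain UTF-16 offsets, so shifting is integer subtraction.
let shifted = NSRange(location: wordInString.location - sentenceInString.location,
                      length: wordInString.length)

let sentence = NSString(string: bigLongString.substring(with: sentenceInString))
print(sentence.substring(with: shifted)) // I
```

This sketch does nothing to guard against a range that splits a grapheme cluster, so the caveat above still applies.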
Bridging back and forth between NSRange and Range<String.Index> is possible, but can be error prone if you're not careful. For that reason, I suggest you try to minimize conversions, by trying to either exclusively use NSRange, or Range<String.Index>, but not mix the two too much.
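For reference, Foundation provides initializers for converting in both directions. The sketch below uses a sample string (`text` is an illustrative name) and shows one round trip; the conversion back to Range<String.Index> is failable, so it has to be unwrapped.

```swift
import Foundation

let text = "Really…? Thought I had it."
let swiftRange = text.range(of: "Thought")!

// Range<String.Index> -> NSRange (UTF-16 offsets into `text`).
let nsRange = NSRange(swiftRange, in: text)
print(nsRange.location, nsRange.length) // 9 7

// NSRange -> Range<String.Index>; the initializer is failable, so unwrap it.
if let roundTripped = Range(nsRange, in: text) {
    print(text[roundTripped]) // Thought
}
```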
replacingCharacters(in:with:)
I suspect you're only using this as an example way of consuming wordInSentence, but it's still worth noting that:
Foundation.NSString.replacingCharacters(in:with:) (https://developer.apple.com/documentation/foundation/nsstring/1412937-replacingoccurrences) is an NSString API that's imported onto Swift.String when Foundation is imported. On NSString it accepts an NSRange. If you're dealing with Range<String.Index>, you should use its Swift-native counterpart, Swift.String.replaceSubrange(_:with:).
Substring is your friend
Don't fight it; unless you absolutely need sentence to be a String, keep it as a Substring for the duration of these short-lived processing actions. Not only does this save you a copy of the string's contents, but it also makes it so that the indices can be shared between the slice and the parent string. This is valid:
let sentence = bigLongString[sentenceInString]
print(sentence[wordInString])
or even just: bigLongString[sentenceInString][wordInString] or bigLongString[wordInString]
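As a sketch of how far this gets you with the question's first example: if the sentence stays a Substring, the original wordInString range can be used on it directly, with no shifting at all. This assumes the word's range lies inside the sentence's range.

```swift
import Foundation

let bigLongString = "A beginning is the time for taking the most delicate care that the balances are correct. This every sister of the Bene Gesserit knows."
let sentenceInString = bigLongString.range(of: "This every sister of the Bene Gesserit knows.")!
let wordInString = bigLongString.range(of: "sister")!

// Keep the sentence as a Substring so it shares indices with bigLongString.
var sentence = bigLongString[sentenceInString]

// The parent's range is valid in the slice, so no conversion is needed.
sentence.replaceSubrange(wordInString, with: "*\(bigLongString[wordInString])*")
print(sentence) // This every *sister* of the Bene Gesserit knows.
```

After the mutation, the old indices should not be reused on `sentence`, since replacing a subrange invalidates them.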
Shifting around
I couldn't find a native solution for this, so I rolled my own. I could definitely be missing something simpler, but here's what I came up with:
import Foundation
struct SubstringOffset {
    // Distance (in Characters) from the parent's start to the substring's start.
    let offset: Int
    let parent: String
    init(of substring: Substring, in parent: String) {
        self.offset = parent.distance(from: parent.startIndex, to: substring.startIndex)
        self.parent = parent
    }
    // Converts an index in `parent` to the matching index in `newString`.
    func convert(indexInParent: String.Index, toIndexIn newString: String) -> String.Index {
        let distance = parent.distance(from: parent.startIndex, to: indexInParent)
        let distanceInNewString = distance - offset
        return newString.index(newString.startIndex, offsetBy: distanceInNewString)
    }
    // Converts a range in `parent` to the matching range in `newString`.
    func convert(rangeInParent: Range<String.Index>, toRangeIn newString: String) -> Range<String.Index> {
        let newLowerBound = self.convert(indexInParent: rangeInParent.lowerBound, toIndexIn: newString)
        let span = self.parent.distance(from: rangeInParent.lowerBound, to: rangeInParent.upperBound)
        let newUpperBound = newString.index(newLowerBound, offsetBy: span)
        return newLowerBound ..< newUpperBound
    }
}
// The Setup (this is just to get easy testing values, no need for feedback on this part)
let bigLongString = "Really…? Thought I had it."
let sentenceInString = bigLongString.range(of: "Thought I had it.")!
let wordInString = bigLongString.range(of: "I")!
var sentence: String = String(bigLongString[sentenceInString])
let offset = SubstringOffset(of: bigLongString[sentenceInString], in: bigLongString)
// The Code In Question
let wordInSentence: Range<String.Index> = offset.convert(rangeInParent: wordInString, toRangeIn: sentence)
sentence.replaceSubrange(wordInSentence, with: "*\(sentence[wordInSentence])*")
print(sentence)

OK, this is what I've come up with. It appears to work OK for both examples in the question.
We use the String instance method distance(from:to:) to get the distance between the bigLongString start and the sentence start. (Analogous to the "8" in the question.) Then the word range is shifted back by this amount by shifting the upper and lower bounds separately, and then reforming them into a Range.
let wordStartInSentence = bigLongString.distance(from: sentenceInString.lowerBound,
to: wordInString.lowerBound)
let wordEndInSentence = bigLongString.distance(from: sentenceInString.lowerBound,
to: wordInString.upperBound)
let wordStart = sentence.index(sentence.startIndex, offsetBy: wordStartInSentence)
let wordEnd = sentence.index(sentence.startIndex, offsetBy: wordEndInSentence)
let wordInSentence = wordStart..<wordEnd
EDIT: Updated answer to work for the more complicated bigLongString example (and coincidentally also reduce the "string walking," especially when bigLongString is very big).
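The same steps can be wrapped in a small function. This is only a sketch: `shift(_:in:outOf:into:)` is a hypothetical helper name, not a standard-library API, and it assumes `child` is a copy of `parent[enclosing]` and that `range` lies inside `enclosing`.

```swift
import Foundation

/// Hypothetical helper: re-expresses `range` (valid in `parent`) as a range
/// in `child`, where `child` is a copy of `parent[enclosing]`.
func shift(_ range: Range<String.Index>,
           in parent: String,
           outOf enclosing: Range<String.Index>,
           into child: String) -> Range<String.Index> {
    // Character distances measured from the start of the enclosing range.
    let startOffset = parent.distance(from: enclosing.lowerBound, to: range.lowerBound)
    let endOffset = parent.distance(from: enclosing.lowerBound, to: range.upperBound)
    // Re-apply the same distances inside the extracted string.
    let start = child.index(child.startIndex, offsetBy: startOffset)
    let end = child.index(child.startIndex, offsetBy: endOffset)
    return start..<end
}

let bigLongString = "Really…? Thought I had it."
let sentenceInString = bigLongString.range(of: "Thought I had it.")!
let wordInString = bigLongString.range(of: "I")!
let sentence = String(bigLongString[sentenceInString])

let wordInSentence = shift(wordInString, in: bigLongString,
                           outOf: sentenceInString, into: sentence)
print(sentence.replacingCharacters(in: wordInSentence,
                                   with: "*\(sentence[wordInSentence])*"))
// Thought *I* had it.
```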

Related

Swift Dictionary is slow?

Situation: I was solving LeetCode 3, Longest Substring Without Repeating Characters. When I use a Dictionary in Swift, the result is Time Limit Exceeded on the last test case, but the same approach in C++ actually passes with a fine runtime. I thought a Swift Dictionary was the same thing as a C++ unordered_map.
Some research: I found some resources that say to use NSDictionary over the regular one, but it requires reference types instead of Int, Character, etc.
Expected result: fast performance in accessing Dictionary in Swift
Question: I know there are better answers to the problem, but the main goal here is: is there an efficient way to read from and write to a Dictionary, or something we can use as a substitute?
func lengthOfLongestSubstring(_ s: String) -> Int {
    var window: [Character: Int] = [:] // swift dictionary is kind of slow?
    let array = Array(s)
    var res = 0
    var left = 0, right = 0
    while right < s.count {
        let rightChar = array[right]
        right += 1
        window[rightChar, default: 0] += 1
        while window[rightChar]! > 1 {
            let leftChar = array[left]
            window[leftChar, default: 0] -= 1
            left += 1
        }
        res = max(res, right - left)
    }
    return res
}
Because the complexity of count on a String is O(n), you should save the count in a variable instead of evaluating s.count on every loop iteration. You can read about this in the chapter Strings and Characters in the Swift book:
Extended grapheme clusters can be composed of multiple Unicode scalars. This means that different characters—and different representations of the same character—can require different amounts of memory to store. Because of this, characters in Swift don’t each take up the same amount of memory within a string’s representation. As a result, the number of characters in a string can’t be calculated without iterating through the string to determine its extended grapheme cluster boundaries. If you are working with particularly long string values, be aware that the count property must iterate over the Unicode scalars in the entire string in order to determine the characters for that string.
The count of the characters returned by the count property isn’t always the same as the length property of an NSString that contains the same characters. The length of an NSString is based on the number of 16-bit code units within the string’s UTF-16 representation and not the number of Unicode extended grapheme clusters within the string.
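A sketch of the question's function with that change applied: the length is read once from the Array, whose count is O(1), rather than from the String inside the loop condition. Whether this alone is enough to pass the time limit is an assumption; the variable names `chars`, `n`, and `best` are illustrative.

```swift
func lengthOfLongestSubstring(_ s: String) -> Int {
    let chars = Array(s)
    let n = chars.count // Array.count is O(1); String.count walks the string
    var window: [Character: Int] = [:]
    var best = 0
    var left = 0, right = 0
    while right < n {
        let rightChar = chars[right]
        right += 1
        window[rightChar, default: 0] += 1
        // Shrink the window from the left until rightChar is unique again.
        while window[rightChar, default: 0] > 1 {
            let leftChar = chars[left]
            window[leftChar, default: 0] -= 1
            left += 1
        }
        best = max(best, right - left)
    }
    return best
}

print(lengthOfLongestSubstring("abcabcbb")) // 3
```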

What's faster/should be rather used for short(ish) Strings: Split or Substring?

Swift 5, Xcode 10.
I'm looping through an array of Strings (size probably < 20), each of them looks something like this:
johnsmith.20190202102030.conf
janedoe.19700101115959.conf
I know the first part (the name) beforehand but want to extract the middle part (birthday: 8, 12 or 14 characters long).
Version 1:
let f = "johnsmith.20190202102030.conf"
let name = "johnsmith"
let start = f.index(f.startIndex, offsetBy: name.count+1)
let end = f.index(f.startIndex, offsetBy: f.count-5)
let birthday = String(f[start..<end])
Version 2:
let f = "johnsmith.20190202102030.conf"
let farr = f.split(separator: ".").map(String.init)
let birthday = farr[1]
I'm currently only doing this for 10 Strings and (of course) didn't notice any difference in speed. Even with 100 Strings there probably won't be much of a difference anyway but I'm curious:
Ignoring the length of the code and potential errors, is there a reason (apart from personal preference) to prefer using one version over the other (e.g. speed with 100k Strings - I'm not asking for actual measurements!)?
From my very rough testing, it seems that the substring version is faster. However, in your case I would opt for using the version using split. The code is much more readable to me.
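For comparing the two side by side, here is a sketch that wraps each version in a function. The function names are hypothetical, and the slicing version measures the ".conf" suffix back from endIndex instead of computing f.count, which is a small deviation from Version 1.

```swift
// Version 1 as a function: slice by offsets (assumes the ".conf" suffix).
func birthdayBySlicing(_ f: String, name: String) -> String {
    let start = f.index(f.startIndex, offsetBy: name.count + 1)
    let end = f.index(f.endIndex, offsetBy: -5) // drop ".conf"
    return String(f[start..<end])
}

// Version 2 as a function: split on "." and take the middle part.
func birthdayBySplitting(_ f: String) -> String? {
    let parts = f.split(separator: ".")
    return parts.count > 1 ? String(parts[1]) : nil
}

let f = "johnsmith.20190202102030.conf"
print(birthdayBySlicing(f, name: "johnsmith")) // 20190202102030
print(birthdayBySplitting(f) ?? "")            // 20190202102030
```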

UTF8 String length and indices in Go vs Swift

I have apps in Go and Swift which process strings, such as finding substrings and their indices. At first it worked nicely even with multi-byte characters (e.g. emojis), using Go's utf8.RuneCountInString() and Swift's native String.
But there are some UTF8 characters that break the string length and indices for substrings, e.g. a string "Lorem 😂😃✌️🤔 ipsum":
Go's utf8.RuneCountInString("Lorem 😂😃✌️🤔 ipsum") returns 17 and the start index of ipsum is 12.
Swift's "Lorem 😂😃✌️🤔 ipsum".count returns 16 and the start index of ipsum is 11.
Using Swift String's utf8, utf16 or casting to NSString also gives different lengths and indices. There are also other emojis composed from multiple other emojis, like 👨‍👩‍👧‍👦, which give even funnier numbers.
This is with Go 1.8 and Swift 4.1.
Is there any way to get the same string lengths and substrings' indices with same values with Go and Swift?
EDIT
I created a Swift String extension based on @MartinR's great answer:
extension String {
    func runesRangeToNSRange(from: Int, to: Int) -> NSRange {
        let length = to - from
        let start = unicodeScalars.index(unicodeScalars.startIndex, offsetBy: from)
        let end = unicodeScalars.index(start, offsetBy: length)
        let range = start..<end
        return NSRange(range, in: self)
    }
}
In Swift a Character is an “extended grapheme cluster,” and each of "😂", "😃", "✌️", "🤔", "👨‍👩‍👧‍👦" counts as a single character.
I have no experience with Go, but as I understand it from "Strings, bytes, runes and characters in Go",
a “rune” is a Unicode code point, which essentially corresponds to a UnicodeScalar in Swift.
In your example, the difference comes from "✌️" which
counts as a single Swift character, but is built from two Unicode scalars:
print("✌️".count) // 1
print("✌️".unicodeScalars.count) // 2
Here is an example of how you can compute the length and offsets in
terms of Unicode scalars:
let s = "Lorem 😂😃✌️🤔 ipsum"
print(s.unicodeScalars.count) // 17
if let idx = s.range(of: "ipsum") {
print(s.unicodeScalars.distance(from: s.startIndex, to: idx.lowerBound)) // 12
}
As you can see, this gives the same numbers as in your example from Go.
A rune in Go identifies a specific Unicode code point; that does not necessarily mean it maps 1:1 to visually distinct characters. Some characters may be made up of multiple runes/code points, therefore counting runes may not give you what you'd expect from a visual inspection of the string. I don't know what "some text".count actually counts in Swift so I can't offer any comparison there.
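To fill in the Swift side of that comparison, here is a small sketch of what each String view counts for the "✌️" emoji discussed above. The emoji is written with Unicode escapes so the two scalars are explicit; `victory` is an illustrative name.

```swift
// ✌️ is VICTORY HAND (U+270C) followed by VARIATION SELECTOR-16 (U+FE0F).
let victory = "\u{270C}\u{FE0F}"

print(victory.count)                // 1: extended grapheme clusters (Characters)
print(victory.unicodeScalars.count) // 2: Unicode scalars, comparable to Go runes
print(victory.utf16.count)          // 2: UTF-16 code units, what NSString's length reports
print(victory.utf8.count)           // 6: UTF-8 bytes, comparable to Go's len()
```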

If given a Substring, is it possible to access the underlying complete String on which it is based?

Say I have the following code...
let x = "ABCDE"
// 'x' is a String
var y = x.dropFirst().prefix(3) // String has no integer subscript like x[1...3]
// 'y' is a Substring that equals "BCD"
If you only have access to y, is it possible to access x, or specifically parts of x which are outside the range of y? (i.e. can you access 'A' or 'E', or grow the range of y?)
So here's what Apple says:
Important
Don’t store substrings longer than you need them to perform a specific
operation. A substring holds a reference to the entire storage of the
string it comes from, not just to the portion it presents, even when
there is no other reference to the original string. Storing substrings
may, therefore, prolong the lifetime of string data that is no longer
otherwise accessible, which can appear to be memory leakage.
Now I find their use of the word "otherwise" in the last sentence rather interesting. It seems to me to keep the door open on this question - could a substring be manipulated to be expanded to include memory on either side that we know still exists as part of the original string?
So here's what I'd think is a fair test:
let x = "ABCDEFGH"
let substr = x.prefix(3)
var substrIndex = substr.startIndex
substr.formIndex(&substrIndex, offsetBy: 4) // offset beyond the substring
let prefix = substr.prefix(through:substrIndex)
print(prefix)
So what'cha think that would print?
Actually we never get to the print. We get a runtime fatal error instead.
Thread 1: Fatal error: Operation results in an invalid index
BTW, even trying the following results in an EXC_BAD_ACCESS crash:
let x = "ABCDEFGH"
var substr = x.prefix(3)
withUnsafePointer(to: &substr) { substrPointer in
    let z = substrPointer.advanced(by: 3)
    print(z.pointee)
}
So I don't think there's a way to get to the rest of the string if you just have a substring... from within Substring or String classes anyhow, or even dealing with unsafe pointers. I'm sure there's a way using direct memory access, for Apple claims the rest of the String's memory is there... but you'd probably have to fall back to C or C++.
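One thing worth checking against the conclusion above: the standard library's Substring has a `base` property that returns the string it was sliced from. The sketch below assumes that property is available in your Swift version and reuses the test string from this answer; `whole` and `grownEnd` are illustrative names.

```swift
let x = "ABCDEFGH"
let substr = x.prefix(3) // "ABC"

// `base` is the whole string the substring was sliced from.
let whole = substr.base
print(whole) // ABCDEFGH

// The substring's indices are valid in `base`, so the slice can be "grown".
let grownEnd = whole.index(substr.endIndex, offsetBy: 2)
print(whole[substr.startIndex..<grownEnd]) // ABCDE
```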

Display certain number of letters

I have a word that is being displayed in a label. Could I program it so that it only shows the last 2 characters of the word, or only the first 3? How can I do this?
Swift's string APIs can be a little confusing. You get access to the characters of a string via its characters property, on which you can then use prefix() or suffix() to get the substring you want. That subset of characters needs to be converted back to a String:
let str = "Hello, world!"
// first three characters:
let prefixSubstring = String(str.characters.prefix(3))
// last two characters:
let suffixSubstring = String(str.characters.suffix(2))
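In Swift 4 and later, String is itself a collection of Characters and the characters view is deprecated, so the same idea can be written directly on the string. A minimal sketch with the same sample value (`firstThree` and `lastTwo` are illustrative names):

```swift
let str = "Hello, world!"

// prefix/suffix return a Substring; wrap in String(...) for long-term storage.
let firstThree = String(str.prefix(3))
let lastTwo = String(str.suffix(2))

print(firstThree) // Hel
print(lastTwo)    // d!
```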
I agree that String indexing in Swift is confusing, and the APIs changed a little from Swift 1 to 2, which makes googling a bit of a challenge. It can actually be quite simple once you get the hang of the methods. You basically need to make it into a two-step process:
1) Find the index you need
2) Advance from there
For example:
let sampleString = "HelloWorld"
let lastThreeindex = sampleString.endIndex.advancedBy(-3)
sampleString.substringFromIndex(lastThreeindex) //prints rld
let secondIndex = sampleString.startIndex.advancedBy(2)
sampleString.substringToIndex(secondIndex) //prints He
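The advancedBy and substringFromIndex spellings above are from Swift 2. As a sketch of the same two-step process in later Swift versions, index(_:offsetBy:) finds the index and a range subscript does the slicing:

```swift
let sampleString = "HelloWorld"

// Step 1: find the index. Step 2: slice from or up to it.
let lastThreeIndex = sampleString.index(sampleString.endIndex, offsetBy: -3)
print(sampleString[lastThreeIndex...]) // rld

let secondIndex = sampleString.index(sampleString.startIndex, offsetBy: 2)
print(sampleString[..<secondIndex]) // He
```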