If given a Substring, is it possible to access the underlying complete String on which it is based? - swift

Say I have the following code...
let x = "ABCDE"
// 'x' is a String
// String has no Int-range subscript, so slice with dropFirst/prefix instead
var y = x.dropFirst().prefix(3)
// 'y' is a Substring that equals "BCD"
If you only have access to y, is it possible to access x, or specifically parts of x which are outside the range of y? (i.e. can you access 'A' or 'E', or grow the range of y?)

So here's what Apple says:
Important
Don’t store substrings longer than you need them to perform a specific
operation. A substring holds a reference to the entire storage of the
string it comes from, not just to the portion it presents, even when
there is no other reference to the original string. Storing substrings
may, therefore, prolong the lifetime of string data that is no longer
otherwise accessible, which can appear to be memory leakage.
Now I find their use of the word "otherwise" in the last sentence rather interesting. It seems to me to keep the door open on this question - could a substring be manipulated to be expanded to include memory on either side that we know still exists as part of the original string?
So here's what I'd think is a fair test:
let x = "ABCDEFGH"
let substr = x.prefix(3)
var substrIndex = substr.startIndex
substr.formIndex(&substrIndex, offsetBy: 4) // offset beyond the substring
let prefix = substr.prefix(through:substrIndex)
print(prefix)
So what'cha think that would print?
Actually we never get to the print. We get a runtime fatal error instead.
Thread 1: Fatal error: Operation results in an invalid index
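If you want to probe an offset without trapping, the standard library's index(_:offsetBy:limitedBy:) returns nil once the limit is passed. A minimal sketch, reusing the strings from the test above (the names beyond and inside are mine):

```swift
let x = "ABCDEFGH"
let substr = x.prefix(3)   // Substring "ABC"

// Returns nil instead of trapping when the offset would pass the limit.
let beyond = substr.index(substr.startIndex, offsetBy: 4, limitedBy: substr.endIndex)
let inside = substr.index(substr.startIndex, offsetBy: 2, limitedBy: substr.endIndex)

print(beyond == nil)               // the substring has only 3 characters
if let inside = inside {
    print(substr[inside])          // third character of the substring
}
```

This only confirms that the substring's own indexing API stops at its bounds; it does not reach the rest of the storage.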
BTW, even trying the following results in an EXC_BAD_ACCESS crash:
let x = "ABCDEFGH"
var substr = x.prefix(3)
withUnsafePointer(to: &substr)
{ substrPointer in
    let z = substrPointer.advanced(by: 3)
    print(z.pointee)
}
So I don't think there's a way to get to the rest of the string through index manipulation or unsafe pointers if you just have a substring. I'm sure there's a way using direct memory access, since Apple says the rest of the String's memory is there, but you'd probably have to fall back to C or C++.
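One counterpoint worth checking: the standard library's Substring has a base property that returns the string it was sliced from, and a substring shares indices with that string. A short sketch under that assumption (the variable names are only illustrative):

```swift
let x = "ABCDE"
let y = x.dropFirst().prefix(3)      // Substring "BCD"

// `base` is the whole string the substring was derived from.
let whole = y.base

// Substring indices are valid in the base string, so the parts
// outside the substring can be reached from y alone.
let before = whole[..<y.startIndex]
let after = whole[y.endIndex...]

print(whole, before, after)
```

So with only y in hand, the characters on either side are reachable without any unsafe code.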

Related

Swift Dictionary is slow?

Situation: I was solving LeetCode 3, Longest Substring Without Repeating Characters. When I use a Dictionary in Swift, the result is Time Limit Exceeded on the last test case, but the same approach in C++ actually passes with a fine runtime. I thought a Swift Dictionary was the same thing as an unordered_map.
Some research: I found some resources that say to use NSDictionary over the regular one, but it requires reference types instead of Int, Character, etc.
Expected result: fast performance in accessing Dictionary in Swift
Question: I know there are better answers to the problem, but the main goal here is: is there an efficient way to read and write a Dictionary, or something we can use as a substitute?
func lengthOfLongestSubstring(_ s: String) -> Int {
    var window:[Character:Int] = [:] //swift dictionary is kind of slow?
    let array = Array(s)
    var res = 0
    var left = 0, right = 0
    while right < s.count {
        let rightChar = array[right]
        right += 1
        window[rightChar, default: 0] += 1
        while window[rightChar]! > 1 {
            let leftChar = array[left]
            window[leftChar, default: 0] -= 1
            left += 1
        }
        res = max(res, right - left)
    }
    return res
}
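The complexity of count on a String is O(n), and the loop condition right < s.count re-evaluates it on every iteration, so save the count in a variable (or use array.count, which is O(1)). You can read about this in the chapter
Strings and Characters in the Swift book: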
Extended grapheme clusters can be composed of multiple Unicode scalars. This means that different characters—and different representations of the same character—can require different amounts of memory to store. Because of this, characters in Swift don’t each take up the same amount of memory within a string’s representation. As a result, the number of characters in a string can’t be calculated without iterating through the string to determine its extended grapheme cluster boundaries. If you are working with particularly long string values, be aware that the count property must iterate over the Unicode scalars in the entire string in order to determine the characters for that string.
The count of the characters returned by the count property isn’t always the same as the length property of an NSString that contains the same characters. The length of an NSString is based on the number of 16-bit code units within the string’s UTF-16 representation and not the number of Unicode extended grapheme clusters within the string.
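Applied to the code in the question, that could look like the sketch below. It is the same sliding-window idea; the only intended change is that the length is read once from the array instead of from the String on every iteration (the names chars, n and best are mine):

```swift
func lengthOfLongestSubstring(_ s: String) -> Int {
    let chars = Array(s)          // one O(n) pass; Array.count and subscripting are O(1)
    let n = chars.count           // computed once, not per iteration
    var window: [Character: Int] = [:]
    var best = 0
    var left = 0

    for right in 0..<n {
        let c = chars[right]
        window[c, default: 0] += 1
        // Shrink from the left until c appears only once in the window.
        while window[c, default: 0] > 1 {
            window[chars[left], default: 0] -= 1
            left += 1
        }
        best = max(best, right - left + 1)
    }
    return best
}

print(lengthOfLongestSubstring("abcabcbb"))
```

The Dictionary itself was not the bottleneck here; the repeated String.count call made the loop quadratic.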

Why are Data.endIndex and Data.count different?

let str = "This is a swift bug"
let data = Data(str.utf8)
print("data size = ", data.endIndex, data.count)
let trimmed = data[2..<data.endIndex]
print("trimmed size = ", trimmed.endIndex, trimmed.count)
The result is
data size = 19 19
trimmed size = 19 17
According to the Apple doc about endIndex:
This is the “one-past-the-end” position, and will always be equal to the count.
Is it a bug, or am I missing something?
You should open an Apple Feedback for the documentation of Data.endIndex. It's incorrect.
The startIndex of Data is not promised to be zero, and this is an example of when it isn't. Using the Int subscript on Data is unfortunately very dangerous unless you know precisely how the Data was constructed (and specifically that it has a zero index).
Data uniquely mixes two facts that make it tricky to use correctly:
It is its own Slice
Its Index is Int
For some discussion of this, and suggested patterns, see Data.popFirst(), removeFirst() adjust indices. Also see Data ranged subscribe strange behavior for another version of this question.
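To make the relationship concrete, here is a small sketch using the data from the question. It assumes nothing beyond Data's Collection conformance: count is the distance between startIndex and endIndex, not endIndex itself.

```swift
import Foundation

let data = Data("This is a swift bug".utf8)
let trimmed = data[2..<data.endIndex]

// For a slice, startIndex is inherited from the original Data.
let span = trimmed.endIndex - trimmed.startIndex

print(trimmed.startIndex, trimmed.endIndex, trimmed.count, span)
```

endIndex equals count only when startIndex happens to be zero.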
When you use an expression like array[2..<array.endIndex] you are creating a slice. A slice is a sort of window onto an array (or something similar to an array). Its startIndex is not necessarily 0 and its endIndex is not necessarily one after the last index of the original.
Example:
let arr = Array(1...10)
print(arr.startIndex) // 0
print(arr.endIndex) // 10
let slice = arr[2...4]
print(slice.startIndex) // 2
print(slice.endIndex) // 5
print(slice.count) // 3
You see how this works? The slice has its own logic. Its size (count) is the size of the slice, but its index numbers come from the original array, because the slice is nothing but a pointer into a section of the original array. It has no independent existence; it is just a way of seeing, as it were.
An important consequence is that slice[0] will crash: the first available index of slice is 2, as we have already been told. This is why it is crucial to know whether you're dealing with an original array or a slice.
However, at least you have reason to know that this issue might exist, because slice has a special type — Array<Int>.SubSequence, meaning an ArraySlice. But the fact that you are encountering this by way of Data makes it more tricky, because trimmed is typed as a Data, not as a DataSlice! It is in fact a Data.SubSequence, but you have no simple way of finding that out! That's because Data.SubSequence is typealiased to Data itself. This is to be regarded as a flaw in the Data implementation.
Nevertheless, it is exactly the same phenomenon. These answers should look strangely familiar:
let str = "This is a swift bug"
let data = Data(str.utf8)
let trimmed = data[2...4]
print(trimmed.startIndex) // 2
print(trimmed.endIndex) // 5
print(trimmed.count) // 3
The best way to solve this is Don't Do That. To take a subrange of a Data as a true Data, use subdata:
let trimmed2 = data.subdata(in: 2..<5)
print(trimmed2.startIndex) // 0, and so on; it's an independent copy

How to shift a string's Range?

I have the Range of a word and its enclosing sentence within a big long String. After extracting that sentence into its own String, I'd like to know the position of the word within it.
If we were dealing with integer indexes, I would just subtract the sentence's starting index from the word's range and I'd be done. For example, if the word was in characters 10–12 and its sentence started at character 8, then I'd have a new word range of 2–4.
Here's what I've got, ready to copy&paste to a Playground:
// The Setup (this is just to get easy testing values, no need for feedback on this part)
let bigLongString = "A beginning is the time for taking the most delicate care that the balances are correct. This every sister of the Bene Gesserit knows."
let sentenceInString = bigLongString.range(of: "This every sister of the Bene Gesserit knows.")!
let wordInString = bigLongString.range(of: "sister")!
let sentence = String(bigLongString[sentenceInString])
// The Code In Question
let wordInSentence = ??? // Something that shifts the `wordInString` range
// The Test (again, just for testing. it should read "This every *sister* of the Bene Gesserit knows.")
print(sentence.replacingCharacters(in: wordInSentence,
                                  with: "*\(sentence[wordInSentence])*"))
Also, note that wordInString may refer to any instance of a given word, not just the first one. (So, re-finding the word in sentence, i.e., sentence.range(of: "sister"), won't do the trick here unfortunately.) The range needs to be shifted somehow.
Thanks for reading!
EDIT:
Introducing a slightly more complicated bigLongString seems to be an issue with the solution I posted. E.g.,
let bigLongString = "Really…? Thought I had it."
let sentenceInString = bigLongString.range(of: "Thought I had it.")!
let wordInString = bigLongString.range(of: "I")!
This can get kinda tricky, depending on precisely what you need to do.
NSRange
Firstly, as you may have noticed, Range<String.Index> and NSRange are different.
Range<String.Index> is how Swift represents ranges of indices in native Swift.Strings. It's an opaque type that's only usable by the String APIs that consume it. It understands Swift strings as collections of Swift.Characters, which represent what Unicode calls "extended grapheme clusters".
NSRange is the older range representation, used by Objective C to represent ranges in Foundation.NSStrings. It's an open container, containing a "start" location and a length. Importantly, NSRange and NSString treat strings as collections of UTF-16 code units.
Because NSRange and NSString expose so many of their internals, they haven't undergone the same migration from utf16 to utf8 that Swift.String underwent. A migration that most people probably didn't even notice, since Swift.String guarded its implementation details much more than NSString did.
NSRange is more amenable to the kinds of simple operations you might be looking for. You can offset the start location just like you describe. However, you need to be careful that the resulting range doesn't start or end in the middle of an extended grapheme cluster. In that case, slicing could lead to a substring with invalid Unicode characters (for example, you might accidentally cut an e away from its accent; the accent modifier isn't valid on its own without the e).
Bridging back and forth between NSRange and Range<String.Index> is possible, but can be error prone if you're not careful. For that reason, I suggest you try to minimize conversions, by trying to either exclusively use NSRange, or Range<String.Index>, but not mix the two too much.
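For illustration, here is roughly what the NSRange route could look like with the strings from the question's edit. This is only a sketch: it uses Foundation's NSRange(_:in:) and Range(_:in:) bridging initializers, and the names shifted and starred are mine.

```swift
import Foundation

let bigLongString = "Really…? Thought I had it."
let sentenceInString = bigLongString.range(of: "Thought I had it.")!
let wordInString = bigLongString.range(of: "I")!
let sentence = String(bigLongString[sentenceInString])

// Bridge both ranges to UTF-16 based NSRanges in the big string.
let sentenceNS = NSRange(sentenceInString, in: bigLongString)
let wordNS = NSRange(wordInString, in: bigLongString)

// Plain integer arithmetic: subtract the sentence's start location.
let shifted = NSRange(location: wordNS.location - sentenceNS.location,
                      length: wordNS.length)

// Bridge back; this yields nil if the range is not valid for `sentence`.
var starred = sentence
if let wordInSentence = Range(shifted, in: sentence) {
    starred.replaceSubrange(wordInSentence, with: "*\(sentence[wordInSentence])*")
}
print(starred)
```

The subtraction works because both locations are counted in the same units (UTF-16 code units) from the start of the same string.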
replacingCharacters(in:with:)
I suspect you're only using this as an example way of consuming wordInSentence, but it's still worth noting that:
Foundation.NSString.replacingCharacters(in:with:) (https://developer.apple.com/documentation/foundation/nsstring/1412937-replacingoccurrences) is an NSString API that's imported onto Swift.String when Foundation is imported. The underlying NSString method takes an NSRange. If you're dealing with Range<String.Index>, you should use its Swift-native counterpart, Swift.String.replaceSubrange(_:with:).
Substring is your friend
Don't fight it; unless you absolutely need sentence to be a String, keep it as a Substring for the duration of these short-lived processing actions. Not only does this save you a copy of the string's contents, but it also makes it so that the indices can be shared between the slice and the parent string. This is valid:
let sentence = bigLongString[sentenceInString]
print(sentence[wordInString])
or even just: bigLongString[sentenceInString][wordInString] or bigLongString[wordInString]
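A slightly fuller sketch of that idea, using the trickier string from the question's edit: the sentence stays a Substring, so wordInString can be used on it unchanged, and a String is only built at the very end (the name starred is mine).

```swift
import Foundation   // only needed for range(of:)

let bigLongString = "Really…? Thought I had it."
let sentenceInString = bigLongString.range(of: "Thought I had it.")!
let wordInString = bigLongString.range(of: "I")!

// A Substring shares its indices with the string it was sliced from.
let sentence = bigLongString[sentenceInString]
let word = sentence[wordInString]

// Build the marked-up sentence from three slices of the same Substring.
let starred = "\(sentence[..<wordInString.lowerBound])*\(word)*\(sentence[wordInString.upperBound...])"
print(starred)
```

No range shifting is needed at all in this version, because nothing was copied into a new String until the final interpolation.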
Shifting around
I couldn't find a native solution for this, so I rolled my own. I could definitely be missing something simpler, but here's what I came up with:
import Foundation
struct SubstringOffset {
    // Index distances are plain Int (String.IndexDistance is a deprecated alias).
    let offset: Int
    let parent: String

    init(of substring: Substring, in parent: String) {
        self.offset = parent.distance(from: parent.startIndex, to: substring.startIndex)
        self.parent = parent
    }

    func convert(indexInParent: String.Index, toIndexIn newString: String) -> String.Index {
        let distance = parent.distance(from: parent.startIndex, to: indexInParent)
        let distanceInNewString = distance - offset
        return newString.index(newString.startIndex, offsetBy: distanceInNewString)
    }

    func convert(rangeInParent: Range<String.Index>, toRangeIn newString: String) -> Range<String.Index> {
        let newLowerBound = self.convert(indexInParent: rangeInParent.lowerBound, toIndexIn: newString)
        let span = self.parent.distance(from: rangeInParent.lowerBound, to: rangeInParent.upperBound)
        let newUpperBound = newString.index(newLowerBound, offsetBy: span)
        return newLowerBound ..< newUpperBound
    }
}
// The Setup (this is just to get easy testing values, no need for feedback on this part)
let bigLongString = "Really…? Thought I had it."
let sentenceInString = bigLongString.range(of: "Thought I had it.")!
let wordInString = bigLongString.range(of: "I")!
var sentence: String = String(bigLongString[sentenceInString])
let offset = SubstringOffset(of: bigLongString[sentenceInString], in: bigLongString)
// The Code In Question
let wordInSentence: Range<String.Index> = offset.convert(rangeInParent: wordInString, toRangeIn: sentence)
sentence.replaceSubrange(wordInSentence, with: "*\(sentence[wordInSentence])*")
print(sentence)
OK, this is what I've come up with. It appears to work OK for both examples in the question.
We use the String instance method distance(from:to:) to get the distance between the bigLongString start and the sentence start. (Analogous to the "8" in the question.) Then the word range is shifted back by this amount by shifting the upper and lower bounds separately, and then reforming them into a Range.
let wordStartInSentence = bigLongString.distance(from: sentenceInString.lowerBound,
                                                 to: wordInString.lowerBound)
let wordEndInSentence = bigLongString.distance(from: sentenceInString.lowerBound,
                                               to: wordInString.upperBound)
let wordStart = sentence.index(sentence.startIndex, offsetBy: wordStartInSentence)
let wordEnd = sentence.index(sentence.startIndex, offsetBy: wordEndInSentence)
let wordInSentence = wordStart..<wordEnd
EDIT: Updated answer to work for the more complicated bigLongString example (and coincidentally also reduce the "string walking," especially when bigLongString is very big).

Data ranged subscribe strange behavior

I was playing with Swift's Data in the following small piece of code:
var d = Data(count: 10)
d[5] = 3
let d2 = d[5..<8]
print("\(d2[0])")
To my surprise, this code throws an exception on print(), while the following code does not:
var d = Data(count: 10)
d[5] = 3
let d2 = d.subdata(in: 5..<8)
print("\(d2[0])")
I somewhat understand why this happens, but I don't get why it is designed like this. When I use subdata() I get a whole copy of the range, so indexing is valid from 0. But when I use the range subscript [], I get access to the requested range while the indexing stays the same as before. So in my first example d2[5] is 3.
But I wonder why it is designed like this? I don't want to make a copy of my data by using the subdata() method. I just wanted to access a portion of my data with better indexing.
This especially creates unexpected behavior if you pass it to a function. For example, the following code produces unexpected results and exceptions, and you may not easily find out why:
func testit(idata: Data) {
    if idata.count > 0 {
        print("\(idata.count)")
        print("\(idata[0])")
    }
}
//...
var d = Data(count: 10)
d[5] = 3
let d2 = d[5..<8]
testit(idata: d2)
This code is really strange: if you debug it, you see that print("\(idata.count)") prints 3 as the size of idata, which is correct, but accessing it with idata[0] raises an exception.
Is there any reason for this design? I was expecting that I could access the Data resulting from the subscript starting at index 0, but that is not the case. Can I do this without using subdata(), which creates a copy of the data, or passing an additional argument for the base of the data slice?
d[5..<8] returns Data.Slice – which happens to be Data. Generally, slices share the indices with their base collection, as documented in Slice.
One possible reason for this design decision is that it guarantees that subscripting a slice is an O(1) operation (adding an offset for accessing the base collection is not necessarily O(1), e.g. not for strings).
It is also convenient, as in this example to locate the text after the second occurrence of a character in a string:
let string = "abcdefgabcdefg"
// Find first occurrence of "d":
if let r1 = string.range(of: "d") {
    // Find second occurrence of "d":
    if let r2 = string[r1.upperBound...].range(of: "d") {
        print(string[r2.upperBound...]) // efg
    }
}
As a consequence, you must never assume that the indices of a collection are zero-based (unless documented, as for Array.startIndex). Use startIndex to get the first index, or first to get the first element.
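Applied to the testit example from the question, that advice could look like this sketch. The helper names firstByte(of:) and byte(at:in:) are hypothetical; they only rely on first, startIndex and count, so they work for a Data slice without copying it.

```swift
import Foundation

// First element, or nil if the Data is empty; no index arithmetic needed.
func firstByte(of data: Data) -> UInt8? {
    return data.first
}

// Zero-based offset into any Data, translated relative to its startIndex.
func byte(at offset: Int, in data: Data) -> UInt8? {
    guard offset >= 0, offset < data.count else { return nil }
    return data[data.startIndex + offset]
}

var d = Data(count: 10)
d[5] = 3
let d2 = d[5..<8]          // slice; its startIndex is 5, not 0

print(firstByte(of: d2) as Any, byte(at: 0, in: d2) as Any)
```

Both helpers behave the same for a freshly created Data and for a slice, which is the point of going through startIndex.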

Display certain number of letters

I have a word that is displayed in a label. Could I program it so that it shows only the last 2 characters of the word, or only the first 3? How can I do this?
Swift's string APIs can be a little confusing. You get access to the characters of a string via its characters property, on which you can then use prefix() or suffix() to get the substring you want. That subset of characters needs to be converted back to a String:
let str = "Hello, world!"
// first three characters:
let prefixSubstring = String(str.characters.prefix(3))
// last two characters:
let suffixSubstring = String(str.characters.suffix(2))
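In current Swift, String is itself a collection of Characters, so as far as I know the characters step is no longer needed (the property was deprecated and later removed). A sketch of the same two lines without it:

```swift
let str = "Hello, world!"

// prefix/suffix return a Substring; wrap in String(...) to store or display it.
let prefixString = String(str.prefix(3))
let suffixString = String(str.suffix(2))

print(prefixString, suffixString)
```

Both calls are safe on short strings: asking for more characters than exist simply returns the whole string.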
I agree that working with String indexing in Swift is definitely confusing, and it changed a little from Swift 1 to 2, which makes googling a bit of a challenge, but it can actually be quite simple once you get the hang of the methods. You basically need to make it a two-step process:
1) Find the index you need
2) Advance from there
For example:
let sampleString = "HelloWorld"
let lastThreeindex = sampleString.endIndex.advancedBy(-3)
sampleString.substringFromIndex(lastThreeindex) //prints rld
let secondIndex = sampleString.startIndex.advancedBy(2)
sampleString.substringToIndex(secondIndex) //prints He
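The method names above are from Swift 2. As a rough modern equivalent of the same two steps (find the index, then slice from it), something like the following should work in recent Swift versions; the variable names tail and head are mine.

```swift
let sampleString = "HelloWorld"

// Step 1: find the index. Step 2: slice from or up to it.
let lastThreeIndex = sampleString.index(sampleString.endIndex, offsetBy: -3)
let tail = String(sampleString[lastThreeIndex...])

let secondIndex = sampleString.index(sampleString.startIndex, offsetBy: 2)
let head = String(sampleString[..<secondIndex])

print(tail, head)
```

Note that index(_:offsetBy:) traps if the string is shorter than the offset; index(_:offsetBy:limitedBy:) returns nil instead, which is safer for text of unknown length.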