Swift4 right way to test substrings against strings? - swift

I'm parsing the first two characters on a line of text and doing lots of comparisons against possible patterns:
In my Card class:
static let ourTypes = ["PL", "SY", "XT"]
In lots of other places:
if Card.ourTypes.contains(line[0..<2]) { continue }
Swift4 (3?) changed the []'s to return a Substring. I know I can cast it back with String(line[0..<2]), but I suspect that's the wrong solution... is there a better way?

One way would be to declare your ourTypes array as [Substring]; then you wouldn't have to convert your Substring to make contains work:
static let ourTypes: [Substring] = ["PL", "SY", "XT"]
if Card.ourTypes.contains(line.prefix(2)) { continue }
@matt's observation that a Set is a better fit than an Array for contains lookups (because it's more efficient) can be accommodated with:
static let ourTypes: Set<Substring> = ["PL", "SY", "XT"]
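A minimal sketch of how the Set<Substring> version might be used when filtering lines. The Card struct and the sample lines are hypothetical stand-ins for the asker's code, not taken from it:

```swift
// Hypothetical stand-in for the asker's Card class.
struct Card {
    static let ourTypes: Set<Substring> = ["PL", "SY", "XT"]
}

// Sample input (assumed); note that prefix(2) is safe on lines shorter than 2 characters.
let lines = ["PLARF", "QQ hello", "XT", "S"]

// prefix(2) returns a Substring, so no String conversion is needed for the lookup.
let kept = lines.filter { !Card.ourTypes.contains($0.prefix(2)) }
print(kept)
```

Unlike an integer-range subscript, prefix(2) does not trap when a line has fewer than two characters.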

The String cast, while a bit jarring, is not expensive. Deriving a true independent substring from a string simply is a two-step process: access the slice, then unlink the indices and storage from the original. That is all that String() means here. So I think your original approach is actually correct and nonproblematic.
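The two steps described above can be sketched like this, assuming a sample line "PLARF" and a Set<String> of types (both chosen for illustration):

```swift
let line = "PLARF"
let ourTypes: Set<String> = ["PL", "SY", "XT"]

// Step 1: take the slice. This is a Substring that shares storage with `line`.
let slice = line.prefix(2)

// Step 2: String(...) copies the characters into independent storage.
let independent = String(slice)

let isOurType = ourTypes.contains(independent)
print(isOurType)
```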
If you really want to stay in the String world, though, you can, by calling removeSubrange instead of taking a slice. You give up the convenience of slice notation and slice-related methods, but everything depends on your priorities. And by the way, if contains is your main test here, use a Set, not an Array:
let ourTypes = Set(["PL", "SY", "XT"])
var line = "PLARF"
line.removeSubrange(line.index(line.startIndex, offsetBy: 2)...)
ourTypes.contains(line) // true

Related

Swift string vs [Character] and performance

From the very beginning Swift strings were tricky since they work properly with UTF and there is a standard example from Apple:
let cafe1 = "Cafe\u{301}"
let cafe2 = "Café"
print(cafe1 == cafe2)
// Prints "true"
It means that comparison has some implicit logic and is not a simple check of whether two memory areas are the same. I used to see recommendations to flatten strings into [Character], since when you do this all Unicode-related conversions take place once and then all operations are faster. Additionally, strings do not necessarily use a contiguous memory area, which makes it more expensive to compare them than character arrays.
Long story short, I solved this problem on leetcode: https://leetcode.com/problems/implement-strstr/ and tried different approaches: KMP, character arrays and strings. To my surprise strings are the fastest.
How can that be? KMP has some prework and is less efficient in general, but why are strings faster than [Character]? Is this new in some recent Swift version, or am I missing something conceptually?
Code that I used for reference:
[Character], 8ms, 15mb memory
func strStr(_ haystack: String, _ needle: String) -> Int {
    guard !needle.isEmpty else { return 0 }
    guard haystack.count >= needle.count else { return -1 }
    var result: Int = -1
    let str = Array(haystack)
    let pattern = Array(needle)
    for i in 0...(str.count - pattern.count) {
        if str[i] == pattern[0] && Array(str[i...(i + pattern.count - 1)]) == pattern {
            result = i
            break
        }
    }
    return result
}
Strings, 4ms(!!!), 14.5mb memory
func strStr(_ haystack: String, _ needle: String) -> Int {
    guard !needle.isEmpty else { return 0 }
    guard haystack.count >= needle.count else { return -1 }
    var result: Int = -1
    for i in 0...(haystack.count - needle.count) {
        let hIdx = haystack.index(haystack.startIndex, offsetBy: i)
        if haystack[hIdx] == needle[needle.startIndex] {
            let hEndIdx = haystack.index(hIdx, offsetBy: needle.count - 1)
            if haystack[hIdx...hEndIdx] == needle {
                result = i
                break
            }
        }
    }
    return result
}
First, I think there may be some misunderstandings on your part:
flatten strings into [Character], since when you do this all Unicode-related conversions take place once and then all operations are faster
This doesn't make a lot of sense. Character has exactly the same issues as String. It still may be made of composed or decomposed UnicodeScalars that need special handling for equality.
Additionally, strings do not necessarily use a contiguous memory area
This is equally true of Array in general. Nothing in Array promises that memory is contiguous in every case (an array of class instances may be backed by an NSArray, for example). That's why ContiguousArray exists.
As to why String is faster than hand-coded abstractions, that should be obvious. If you could easily out-perform String with no major tradeoffs, then stdlib would implement String to do that.
As to the mechanics of it, String does not promise any particular internal representation, so it heavily depends on how you're creating your strings. Small strings, for example, can be reduced all the way to a tagged pointer that requires no heap allocation (it can live in a register). Strings can be stored in UTF-8, but they can also be stored in UTF-16 (which is extremely fast to work with).
When Strings are compared with other Strings that know they have the same internal representations, then they can apply various optimizations. And this really points to one part of your problem:
Array(str[i...(i + pattern.count - 1)])
This forces a memory allocation and a copy to create a new Array out of str on every candidate position. You would probably do much better if you used ArraySlice for this work rather than making full Array copies. You'd almost certainly find in that case that you're close to matching String's implementation (which uses Substring).
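A sketch of what the slice-based variant might look like. The function name strStrSlices is made up for illustration, and this is one possible way to avoid the copy, not a measured or tuned implementation:

```swift
// Same search as the [Character] version, but each window is compared
// through an ArraySlice view instead of being copied into a new Array.
func strStrSlices(_ haystack: String, _ needle: String) -> Int {
    let str = Array(haystack)
    let pattern = Array(needle)
    guard !pattern.isEmpty else { return 0 }
    guard str.count >= pattern.count else { return -1 }
    for i in 0...(str.count - pattern.count) {
        // str[i..<i + pattern.count] is an ArraySlice that shares storage with str.
        if str[i..<(i + pattern.count)].elementsEqual(pattern) {
            return i
        }
    }
    return -1
}

print(strStrSlices("hello", "ll"))
```

elementsEqual compares element by element and stops at the first mismatch, so the separate first-character check from the original is no longer needed.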
But the real lesson here is that you're unlikely to beat String at its own game in the general case. If you happen to have very specialized knowledge about your Strings, then I can see where you'd be able to beat the general-purpose String algorithms. But if you think you're beating stdlib for arbitrary strings, why would stdlib not just implement what you're doing (and beat you using knowledge of the internal details of String)?

Fixed length array and a forced unwrapping of the last and the first elements

I have an array with 3 elements and want to take the first and the last elements.
let array = ["a", "b", "c"]
let first: String = array.first!
let last: String = array.last!
SwiftLint marks a force unwrap as a warning. Can I avoid force unwrapping when asking for the first and the last elements of a well-known (literal-defined) array?
I don't want to use a default value as in the example below
let first :String = array.first ?? ""
Edit:
Why am I asking about it? Because I would like to avoid the SwiftLint warnings about force unwrapping when asking for the first and the last element of an array that was defined by a literal and has enough elements to be sure that a first and a last element exist.
Edit 2:
I have found a name for what I was looking for. It's called Static-Sized Arrays. The Static-Sized Arrays discussion stopped in 2017, so there is no chance to use them.
Try with index:
let first = array[0]
let last = array[array.count - 1]
Why am I asking about it? Because I would like to avoid the SwiftLint warnings about force unwrapping when asking for the first and the last element of an array that was defined by a literal and has enough elements to be sure that a first and a last element exist.
You can't really avoid unwrapping the optional value, so if you only need it for these two cases, extensions can help here.
extension Collection {
    func first() -> Element {
        guard let first = self.first else {
            fatalError() // or maybe return any kind of default value?
        }
        return first
    }
}
let array = [1, 2]
array.first() // 1
And if it needs to be available in only one Swift file, you can place this code in that file and mark the extension with the private keyword.
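A variant of the same idea that also covers the last element. The property names requiredFirst and requiredLast are hypothetical, chosen so they don't share a name with the standard first and last properties:

```swift
extension Collection {
    /// Hypothetical helper: the first element, trapping if the collection is empty.
    var requiredFirst: Element {
        guard let element = first else {
            preconditionFailure("requiredFirst used on an empty collection")
        }
        return element
    }
}

extension BidirectionalCollection {
    /// Hypothetical helper: the last element, trapping if the collection is empty.
    var requiredLast: Element {
        guard let element = last else {
            preconditionFailure("requiredLast used on an empty collection")
        }
        return element
    }
}

let letters = ["a", "b", "c"]
let firstLetter: String = letters.requiredFirst
let lastLetter: String = letters.requiredLast
```

This still crashes on an empty collection, just like a force unwrap; it only moves the unwrapping into one place and gives the failure a message.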
Can I avoid force unwrapping when asking for the first and the last elements of a well-known (literal-defined) array?
No, you don't have to worry about it for a fixed array. The first and last properties are optional only to avoid crashes on empty arrays.

swift function to iterate possibly reversed array

I'd like to create a function that will iterate over an array (or collection or sequence). Then I will call that function with an array, and the reversed version of the array (but efficiently: without creating a new array to hold the reverse).
If I do this:
func doIteration(points: [CGPoint]) {
    for p in points {
        doSomethingWithPoint(p)
    }
    // I also need random access to points
    doSomethingElseWithPoint(points[points.count-2]) // ignore obvious index error
}
And if I have this:
let points : [CGPoint] = whatever
I can do this just fine:
doIteration(points)
But then if I do this:
doIteration(points.reverse())
I get "Cannot convert value of type 'ReverseRandomAccessCollection<[CGPoint]>' to expected argument type '[_]'"
Now, I DON'T want to do this:
let reversedPoints : [CGPoint] = points.reverse()
doIteration(reversedPoints)
even though it will work, because that will (correct me if I'm wrong) create a new array, initializing it from the ReverseRandomAccessCollection returned by reverse().
So I guess I'd like to write my doIteration function to take some sort of sequence type, so I can pass in the result of reverse() directly, but ReverseRandomAccessCollection doesn't conform to anything at all. I think I'm missing something - what's the accepted pattern here?
If you change your parameter's type to a generic, you should get the functionality you need:
func doIteration
    <C: CollectionType where C.Index: RandomAccessIndexType, C.Generator.Element == CGPoint>
    (points: C) {
    for p in points {
        doSomethingWithPoint(p)
    }
    doSomethingElseWithPoint(points[points.endIndex - 2])
}
More importantly, this won't cause a copy of the array to be made. If you look at the type generated by the reverse() method:
let points: [CGPoint] = []
let reversed = points.reverse() // ReverseRandomAccessCollection<Array<__C.CGPoint>>
doIteration(reversed)
You'll see that it just creates a struct that wraps the original array and presents it in reverse (although it does have value-type semantics). The generic function can accept this new collection because it satisfies the generic constraints.
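The code above uses Swift 2 syntax. In later Swift versions the same idea might be sketched as follows. The Point struct is a hypothetical stand-in for CGPoint so the example needs no imports, and the function returns what it saw instead of calling the asker's doSomething functions:

```swift
// Hypothetical stand-in for CGPoint.
struct Point: Equatable {
    var x: Double
    var y: Double
}

// Accepts an array, a reversed view of one, or any other random-access collection of Point.
func doIteration<C: RandomAccessCollection>(points: C) -> (xs: [Double], secondToLast: Point)
    where C.Element == Point {
    var xs: [Double] = []
    for p in points {
        xs.append(p.x)
    }
    // Index arithmetic that works for any collection, not just Int-indexed ones.
    // Like the original, this assumes at least two elements.
    let index = points.index(points.endIndex, offsetBy: -2)
    return (xs, points[index])
}

let points = (0..<4).map { Point(x: Double($0), y: Double($0)) }
let reversedView = points.reversed() // a ReversedCollection wrapper, not a new array
let forward = doIteration(points: points)
let backward = doIteration(points: reversedView)
```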
You can do this
let reversedPoints : [CGPoint] = points.reverse()
doIteration(reversedPoints)
or this
doIteration(points.reverse() as [CGPoint])
but I don't think there is any real difference from the point of view of memory footprint.
Scenario 1
let reversedPoints : [CGPoint] = points.reverse()
doIteration(reversedPoints)
In fact, in this case a new Array is created that holds the elements of the original array in reverse order. CGPoint is a struct, so the new array stores copies of the values rather than references to them.
So the memory allocated is the following:
points.count * sizeof(CGPoint)
Scenario 2
On the other hand you can write something like this
doIteration(points.reverse() as [CGPoint])
But are you really saving memory? Let's see.
A temporary array is created; it lives for the duration of the call to doIteration and needs storage for every element contained in points, so again we have:
points.count * sizeof(CGPoint)
So I think you can safely choose one of the 2 solutions.
Considerations
We should remember that Swift manages structures in a very smart way.
When I write
var word = "Hello"
var anotherWord = word
On the first line Swift creates a struct and fills it with the value "Hello".
On the second line Swift detects that there is no real reason to create a copy of the original String, so anotherWord refers to the same underlying storage as the original value.
Only when word or anotherWord is modified does Swift really create a copy of the original value.

Swift Cannot invoke '+' with an argument list of type '($T10, CGFloat)'

I am a beginner in Swift.
I have this error
Cannot invoke '+' with an argument list of type '($T10, CGFloat)'
func loadBackground(key: NSString, width:CGFloat, height:CGFloat) -> UIImage!{
    var imageName = key + "_" + width + "_" + height
    return UIImage(named: imageName)!
}
The error states that you're trying to concatenate a string and a float (or, in any case, different types that cannot be concatenated via +). You could interpolate the values to construct the string instead:
func loadBackground(key: NSString, width:CGFloat, height:CGFloat) -> UIImage!{
    var imageName = "\(key)_\(width)_\(height)"
    return UIImage(named: imageName)!
}
You can read more about this here
OK, a few more words after @Grimxn's comment...
First of all, specifying width and height as CGFloats might be handy when calling the method and grabbing values from frame/bounds, but it will most probably bite you in the future (think for example a frame with an almost 'perfect' width like 120.001 - or any crazy number that came out from a division for example). So, I believe that Ints would serve better in this case in order to maintain a (relatively safer) mapping between sizes/filenames.
PS. Also a let might be preferable over a var in your case since imageName is just constructed and returned without further modification and finally a UIImage? as a return type would force you to handle (or at least check first for) any cases that the image could not be found and hence make your code safer.
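A sketch of just the name-building part with Int sizes, as suggested above. The function name backgroundImageName and the sample key are made up for illustration, and the UIImage lookup is left out so the example has no UIKit dependency:

```swift
// Builds a file name such as "sky_320_480" from a key and integer dimensions.
func backgroundImageName(key: String, width: Int, height: Int) -> String {
    return "\(key)_\(width)_\(height)"
}

let imageName = backgroundImageName(key: "sky", width: 320, height: 480)
print(imageName) // sky_320_480
```

With Int parameters, a width such as 120.001 has to be rounded explicitly by the caller before it can reach the file name.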

Swift Looping Over a List

I'm trying to find if a string is in a word list read from a file. This is what I have so far. The content?[index] does seem to work. But the loop/optional stuff is causing things to not work.
Also, there is an efficiency question. Is it maybe better to put a list into a dictionary and have keys as say the first letter or something? Then try to see if that object exists with the same key instead of looping through the whole list each time.
let testString = "Hello"
let path = NSBundle.mainBundle().pathForResource("wordlist", ofType: "txt")
var content = String.stringWithContentsOfFile(path, encoding: NSUTF8StringEncoding, error: nil)?.componentsSeparatedByString("\n")
let count = content?.count
for word in 0..<count
{
    if testString == content?[word] {
        // found word
    }
}
It complains about count being an Int? instead of an Int. Thanks for suggestions on how best to make this work.
I think the problem is here:
let count = content?.count
which is an optional (Int?). The solution would be to unwrap it with a conditional:
if let count = content?.count {
    for word in 0..<count
    {
        if testString == content?[word] {
            // found word
        }
    }
}
As for the algorithm, it depends on the usage. If you do only one search, the current implementation is fine; it is O(n).
In case of multiple searches, I would use this algorithm:
sort all keys
sort all words
then loop through both
compare the current key with the current word:
if they are equal, 1 word is found, advance key and continue the loop
if the word is less than the key, advance word and continue
if the word is greater than the key, advance key and continue
loop ends when either no other key or no other word is available.
Not sure, but complexity should be O(N), plus the cost of sorting the 2 lists.
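The steps above could be sketched as follows, assuming the keys (the strings being searched for) and the words (the word list) are both plain [String] arrays. The function name foundKeys is made up for illustration:

```swift
// Returns the keys that also appear in words, using two sorted lists and two cursors.
func foundKeys(keys: [String], words: [String]) -> [String] {
    let sortedKeys = keys.sorted()
    let sortedWords = words.sorted()
    var found: [String] = []
    var k = 0 // cursor into sortedKeys
    var w = 0 // cursor into sortedWords
    while k < sortedKeys.count && w < sortedWords.count {
        if sortedKeys[k] == sortedWords[w] {
            found.append(sortedKeys[k]) // match: record it and advance key
            k += 1
        } else if sortedWords[w] < sortedKeys[k] {
            w += 1 // word is too small to match this or any later key
        } else {
            k += 1 // key is smaller than every remaining word
        }
    }
    return found
}

let matches = foundKeys(keys: ["pear", "apple", "kiwi"],
                        words: ["apple", "banana", "pear", "plum"])
print(matches)
```

The loop itself visits each element at most once; the two sorted() calls dominate the cost.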
Addendum A better way to implement that loop is:
if let content = content {
    for word in 0 ..< content.count
    {
        if testString == content[word] {
            // found word
        }
    }
}
Unwrap once and use anywhere (but within the block).
Addendum 2 A better algorithm is the following:
Store all keys in a hashset. Loop through all words, check if the word is in the set, and if yes add to the list of the found words. Much simpler.
If the number of words is less than the number of keys, I would invert that, by populating the hashset from the list of words and looping through the keys.
The complexity of this algorithm should be at most O(2n), which is O(n), where n is the larger of the number of keys and the number of words.
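A minimal sketch of the hash set approach, again assuming keys and words are [String] arrays. The function name foundWords is hypothetical:

```swift
// Returns the words that are present in keys, using a Set for constant-time lookups on average.
func foundWords(keys: [String], words: [String]) -> [String] {
    let keySet = Set(keys)
    return words.filter { keySet.contains($0) }
}

let hits = foundWords(keys: ["Hello", "World"],
                      words: ["Goodbye", "Hello", "Swift"])
print(hits)

// For the single test string from the question, the word list itself can become the set.
let wordSet: Set<String> = ["Goodbye", "Hello", "Swift"]
let containsTestString = wordSet.contains("Hello")
```

Building the set costs one pass over the list, so this pays off when the same list is searched more than once.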