Swift Looping Over a List

I'm trying to find if a string is in a word list read from a file. This is what I have so far. The content?[index] does seem to work. But the loop/optional stuff is causing things to not work.
Also, there is an efficiency question. Is it maybe better to put a list into a dictionary and have keys as say the first letter or something? Then try to see if that object exists with the same key instead of looping through the whole list each time.
let testString = "Hello"
let path = NSBundle.mainBundle().pathForResource("wordlist", ofType: "txt")
var content = String.stringWithContentsOfFile(path, encoding: NSUTF8StringEncoding, error: nil)?.componentsSeparatedByString("\n")
let count = content?.count
for word in 0..<count
{
if testString == content?[word]{
// found word
}
}
It complains about count being an Int? instead of an Int. Thanks for any suggestions on how best to handle this.

I think the problem is here:
let count = content?.count
which is an optional (Int?). The solution would be to unwrap it with a conditional:
if let count = content?.count {
for word in 0..<count
{
if testString == content?[word] {
// found word
}
}
}
As for the algorithm, it depends on the usage. If you do only one search, the current implementation is fine; it is O(n).
In case of multiple searches, I would use this algorithm:
sort all keys
sort all words
then loop through both
compare key with word:
if equal, 1 word is found, advance key and continue the loop
if less, advance word and continue
if greater, advance key and continue
loop ends when either no other key or no other word is available.
Not sure, but complexity should be O(N), plus the cost of sorting the 2 lists.
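The merge-style loop described above could be sketched roughly as follows. The function name matches(keys:words:) and the sample data are made up for illustration, and "less" is read here as "the word sorts before the key".

```swift
// Hypothetical helper: returns every key that also appears in `words`.
func matches(keys: [String], words: [String]) -> [String] {
    let sortedKeys = keys.sorted()
    let sortedWords = words.sorted()
    var found: [String] = []
    var k = 0   // position in sortedKeys
    var w = 0   // position in sortedWords

    // Stop as soon as either list is exhausted.
    while k < sortedKeys.count && w < sortedWords.count {
        if sortedKeys[k] == sortedWords[w] {
            found.append(sortedKeys[k])   // one word found
            k += 1                        // advance key
        } else if sortedWords[w] < sortedKeys[k] {
            w += 1                        // word is behind, advance word
        } else {
            k += 1                        // key is behind, advance key
        }
    }
    return found
}

print(matches(keys: ["pear", "apple", "kiwi"],
              words: ["kiwi", "banana", "apple", "lemon"]))
// ["apple", "kiwi"]
```

The loop itself visits each element at most once; the two sorted() calls are the dominant cost.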
Addendum A better way to implement that loop is:
if let content = content {
for word in 0 ..< content.count
{
if testString == content[word] {
// found word
}
}
}
Unwrap once and use anywhere (but within the block).
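For readers on a current Swift toolchain, here is a small self-contained sketch of the same "unwrap once" idea. It uses an in-memory string as a stand-in for the file contents (the fileText constant is an assumption for illustration, not part of the question), and iterates over the elements directly instead of over indices.

```swift
import Foundation

// Stand-in for the file contents; in an app this text would be read from wordlist.txt.
let fileText: String? = "Apple\nHello\nWorld"

// Optional chaining keeps the result optional: content is [String]?
let content = fileText?.components(separatedBy: "\n")

let testString = "Hello"
var found = false

if let content = content {
    // Inside this block `content` is a plain [String], so no `?` is needed.
    for word in content where word == testString {
        found = true
        break
    }
}

print(found) // true
```

Iterating over the unwrapped array avoids both the optional count and the optional subscript.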
Addendum 2 A better algorithm is the following:
Store all keys in a hashset. Loop through all words, check if the word is in the set, and if yes add to the list of the found words. Much simpler.
If the number of words is less than the number of keys, I would invert that, by populating the hashset from the list of words and looping through the keys.
Assuming average constant-time set lookups, the complexity of this algorithm is O(k + w) for k keys and w words, which is at most O(2n) = O(n), where n is the larger of the number of keys and the number of words.
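A minimal sketch of the hash set approach, using Swift's Set. The function name foundWords(keys:in:) and the sample words are hypothetical.

```swift
// Hypothetical helper: returns the words that are also present in `keys`.
func foundWords(keys: [String], in words: [String]) -> [String] {
    let keySet = Set(keys)                        // built once
    return words.filter { keySet.contains($0) }   // average O(1) per lookup
}

let words = ["Apple", "Hello", "World"]
print(foundWords(keys: ["Hello", "Swift"], in: words)) // ["Hello"]

// For the single-string case in the question, a Set of the word list is enough:
print(Set(words).contains("Hello")) // true
```

If the same word list is searched many times, building the Set once and keeping it around avoids rescanning the array on every lookup.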

Related

Limit text to a certain number of words in Swift

In a mobile app I use an API that can only handle about 300 words. How can I trim a string in Swift so that it doesn't contain more words?
The native .trimmingCharacters(in: CharacterSet) does not seem to be able to do this, as it is intended to trim certain characters.
There is no off-the-shelf way to limit the number of words in a string.
If you look at this post, it documents using the method enumerateSubstrings(in: Range) and setting an option of .byWords. It looks like it returns an array of Range values.
You could use that to create an extension on String that would return the first X words of that string:
import Foundation // enumerateSubstrings(in:options:) comes from Foundation
extension String {
func firstXWords(_ wordCount: Int) -> Substring {
var ranges: [Range<String.Index>] = []
self.enumerateSubstrings(in: self.startIndex..., options: .byWords) { _, range, _, _ in
ranges.append(range)
}
if ranges.count > wordCount - 1 {
return self[self.startIndex..<ranges[wordCount - 1].upperBound]
} else {
return self[self.startIndex..<self.endIndex]
}
}
}
If we then run the code:
let sentence = "I want to an algorithm that could help find out how many words are there in a string separated by space or comma or some character. And then append each word separated by a character to an array which could be added up later I'm making an average calculator so I want the total count of data and then add up all the words. By words I mean the numbers separated by a character, preferably space Thanks in advance"
print(sentence.firstXWords(10))
The output is:
I want to an algorithm that could help find out
Using enumerateSubstrings(in: Range) is going to give much better results than splitting your string using spaces, since there are a lot more separators than just spaces in normal text (newlines, commas, colons, em spaces, etc.) It will also work for languages like Japanese and Chinese that often don't have spaces between words.
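To make the difference concrete, here is a small sketch comparing a plain split on spaces with word enumeration. The sample text is made up, and the exact word boundaries reported by .byWords depend on Foundation's tokenizer, so only the space-split count is shown as expected output.

```swift
import Foundation

// Made-up sample: a comma and a newline separate words, with only one space.
let text = "one,two\nthree four"

// Splitting on spaces only sees two pieces: "one,two\nthree" and "four".
let bySpaces = text.split(separator: " ")
print(bySpaces.count) // 2

// Word enumeration lets Foundation decide where the word boundaries are.
var byWords: [String] = []
text.enumerateSubstrings(in: text.startIndex..., options: .byWords) { word, _, _, _ in
    if let word = word {
        byWords.append(word)
    }
}
print(byWords)
```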
You might be able to rewrite the function to terminate the enumeration of the string as soon as it reaches the desired number of words. If you want a small percentage of the words in a very long string, that would make it significantly faster. (The code above should have O(n) performance, although I haven't dug deeply enough to be sure of that. I also couldn't figure out how to terminate the enumerateSubstrings() function early, although I didn't try that hard.)
Leo Dabus provided an improved version of my function. It extends StringProtocol rather than String, which means it can work on substrings. Plus, it stops once it hits your desired word count, so it will be much faster for finding the first few words of very long strings:
extension StringProtocol {
func firstXWords(_ n: Int) -> SubSequence {
var endIndex = self.endIndex
var words = 0
enumerateSubstrings(in: startIndex..., options: .byWords) { _, range, _, stop in
words += 1
if words == n {
stop = true
endIndex = range.upperBound
}
}
return self[..<endIndex]
}
}

Swift string vs [Character] and performance

From the very beginning Swift strings were tricky, since they handle Unicode properly. There is a standard example from Apple:
let cafe1 = "Cafe\u{301}"
let cafe2 = "Café"
print(cafe1 == cafe2)
// Prints "true"
This means that comparison has some implicit logic; it is not a simple check that two memory areas are the same. I used to see recommendations to flatten strings into [Character], since when you do this all Unicode-related conversions take place once and then all operations are faster. Additionally, strings do not necessarily use a contiguous memory area, which makes them more expensive to compare than character arrays.
Long story short, I solved this problem on LeetCode: https://leetcode.com/problems/implement-strstr/ and tried different approaches: KMP, character arrays and strings. To my surprise, strings were the fastest.
How can that be? KMP has some prework and is less efficient in general, but why are strings faster than [Character]? Is this new in some recent Swift version, or am I missing something conceptually?
Code that I used for reference:
[Character], 8ms, 15mb memory
func strStr(_ haystack: String, _ needle: String) -> Int {
guard !needle.isEmpty else { return 0 }
guard haystack.count >= needle.count else { return -1 }
var result: Int = -1
let str = Array(haystack)
let pattern = Array(needle)
for i in 0...(str.count - pattern.count) {
if str[i] == pattern[0] && Array(str[i...(i + pattern.count - 1)]) == pattern {
result = i
break
}
}
return result
}
Strings, 4ms(!!!), 14.5mb memory
func strStr(_ haystack: String, _ needle: String) -> Int {
guard !needle.isEmpty else { return 0 }
guard haystack.count >= needle.count else { return -1 }
var result: Int = -1
for i in 0...(haystack.count - needle.count) {
var hIdx = haystack.index(haystack.startIndex, offsetBy: i)
if haystack[hIdx] == needle[needle.startIndex] {
var hEndIdx = haystack.index(hIdx, offsetBy: needle.count - 1)
if haystack[hIdx...hEndIdx] == needle {
result = i
break
}
}
}
return result
}
First, I think there may be some misunderstandings on your part:
flatten strings into [Character], since when you do this all Unicode-related conversions take place once and then all operations are faster
This doesn't make a lot of sense. Character has exactly the same issues as String. It still may be made of composed or decomposed UnicodeScalars that need special handling for equality.
Additionally, strings do not necessarily use a contiguous memory area
This is equally true of Array. Nothing in Array promises that memory is contiguous. That's why ContiguousArray exists.
As to why String is faster than hand-coded abstractions, that should be obvious. If you could easily out-perform String with no major tradeoffs, then stdlib would implement String to do that.
To the mechanics of it, String does not promise any particular internal representation, so it heavily depends on how you're creating your strings. Small strings, for example, can be reduced all the way to a tagged pointer that requires zero memory (it can live in a register). Strings can be stored in UTF-8, but they can also be stored in UTF-16 (which is extremely fast to work with).
When Strings are compared with other Strings that know they have the same internal representations, then they can apply various optimizations. And this really points to one part of your problem:
Array(str[i...(i + pattern.count - 1)])
This forces a memory allocation and a copy to create a new Array out of str on every iteration. You would probably do much better if you used Slice for this work rather than making full Array copies. You'd almost certainly find in that case that you're exactly matching String's implementation (using Substring).
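As a rough sketch of the slice idea, the [Character] version could compare an ArraySlice against the pattern in place. The name strStrSliced is hypothetical, and no timing is claimed here; this only shows how the per-iteration Array copy can be removed.

```swift
// Hypothetical variant of the [Character] solution that avoids copying the window.
func strStrSliced(_ haystack: String, _ needle: String) -> Int {
    let str = Array(haystack)
    let pattern = Array(needle)
    guard !pattern.isEmpty else { return 0 }
    guard str.count >= pattern.count else { return -1 }

    for start in 0...(str.count - pattern.count) {
        // str[start..<end] is an ArraySlice that shares storage with `str`;
        // elementsEqual walks it in place without building a new Array.
        if str[start..<(start + pattern.count)].elementsEqual(pattern) {
            return start
        }
    }
    return -1
}

print(strStrSliced("hello", "ll"))  // 2
print(strStrSliced("aaaaa", "bba")) // -1
```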
But the real lesson here is that you're unlikely to beat String at its own game in the general case. If you happen to have very specialized knowledge about your Strings, then I can see where you'd be able to beat the general-purpose String algorithms. But if you think you're beating stdlib for arbitrary strings, why would stdlib not just implement what you're doing (and beat you using knowledge of the internal details of String)?

Fixed length array and a forced unwrapping of the last and the first elements

I have an array with 3 elements and want to take the first and the last elements.
let array = ["a", "b", "c"]
let first: String = array.first!
let last: String = array.last!
SwiftLint marks a force unwrap as a warning. Can I avoid force unwrapping when asking for the first and the last elements of a well-known (defined) array?
I don't want to use a default value as in the example below
let first :String = array.first ?? ""
Edit:
Why am I asking about it? Because I would like to avoid SwiftLint warnings about force unwrapping when asking for the first and the last element of an array that was defined by a literal and has enough elements to be sure that a first and a last element exist.
Edit 2:
I have found a name for what I was looking for: static-sized arrays. The static-sized arrays discussion stopped in 2017, so there is no way to use them.
Try with index:
let first = array[0]
let last = array[array.count - 1]
Why am I asking about it? Because I would like to avoid SwiftLint
warnings about force unwrapping when asking for the first and the
last element of an array that was defined by a literal and has
enough elements to be sure that a first and a last element exist.
You can't really avoid unwrapping an optional value, so if you only need it in two cases, an extension can help here.
extension Collection {
func first() -> Element {
guard let first = self.first else {
fatalError() // or maybe return any kind of default value?
}
return first
}
}
let array = [1, 2]
array.first() // 1
And if it is only needed in one Swift file, you can place this code in that file and mark the extension with the private keyword.
Can I avoid a forced unwrapping when asking about the first and the last elements for a well known (defined) arrays?
No, you don't have to worry about it for a fixed array. The first and last properties are optional only to avoid crashes on empty arrays.
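If the goal is only to keep the linter quiet without supplying a default value, optional binding is another option, at the cost of a branch that is never taken for a non-empty literal. A minimal sketch, where the helper name ends(of:) is made up for illustration:

```swift
// Hypothetical helper: returns the first and last elements, or nil for an empty array.
func ends<T>(of items: [T]) -> (first: T, last: T)? {
    guard let first = items.first, let last = items.last else {
        return nil
    }
    return (first, last)
}

let array = ["a", "b", "c"]
if let pair = ends(of: array) {
    print(pair.first, pair.last) // a c
}
```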

Swift4 right way to test substrings against strings?

I'm parsing the first two characters on a line of text and doing lots of comparisons against possible patterns:
In my Card class:
static let ourTypes = ["PL", "SY", "XT"]
In lots of other places:
if Card.ourTypes.contains(line[0..<2]) { continue }
Swift4 (3?) changed the []'s to return a Substring. I know I can cast it back with String(line[0..<2]), but I suspect that's the wrong solution... is there a better way?
One way would be to make your ourTypes array to be [Substring], then you wouldn't have to convert your Substring to make contains work:
static let ourTypes: [Substring] = ["PL", "SY", "XT"]
if Card.ourTypes.contains(line.prefix(2)) { continue }
@matt's observation that searching with contains is better with a Set (because it's more efficient) can be accommodated with:
static let ourTypes: Set<Substring> = ["PL", "SY", "XT"]
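Put together as a runnable sketch, with a stand-in Card type and made-up input lines (both are assumptions for illustration):

```swift
// Stand-in for the Card class from the question.
struct Card {
    static let ourTypes: Set<Substring> = ["PL", "SY", "XT"]
}

let lines = ["PLARF", "ZZTOP", "XT1", "S"]

// prefix(2) returns a Substring, so no String conversion is needed,
// and it is safe on lines shorter than two characters.
let kept = lines.filter { !Card.ourTypes.contains($0.prefix(2)) }
print(kept) // ["ZZTOP", "S"]
```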
The String cast, while a bit jarring, is not expensive. Deriving a true independent substring from a string simply is a two-step process: access the slice, then unlink the indices and storage from the original. That is all that String() means here. So I think your original approach is actually correct and nonproblematic.
If you really want to stay in the String world, though, you can, by calling removeSubrange instead of taking a slice. You give up the convenience of slice notation and slice-related methods, but everything depends on your priorities. And by the way, if contains is your main test here, use a Set, not an Array:
let ourTypes = Set(["PL", "SY", "XT"])
var line = "PLARF"
line.removeSubrange(line.index(line.startIndex, offsetBy: 2)...)
ourTypes.contains(line) // true

how to make sense of pattern matching of optionals in for loop in swift

In a practice problem I was asked to print out all elements that are not nil in an array of strings, and I realized that
for case let name? in names{
print(name)
}
would do the job. But isn't it counter-intuitive?
In the snippet, I read it as "for every element (with an actual value or nil) that is in names", but in fact it should be "for every element (with an actual value) that is in names".
Can anyone help me to make sense of the snippet?
You want to know why this code:
let names = ["b", nil, "x"]
for case let name? in names {
print(name)
}
Produces this output:
b
x
You are wondering what happens to the nil.
The answer is the "Optional Pattern" found in the Language Reference:
The optional pattern provides a convenient way to iterate over an
array of optional values in a for-in statement, executing the body of
the loop only for non-nil elements.
The case keyword is vital. It changes the nature of the for loop significantly. As you can see from this compiler error, name? inside the loop is not an optional at all.
Think of the ? as an operator that removes the optionality of name. If the assignment would result in nil, that iteration of the loop does not happen, and the next iteration starts.
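It may help to see that name? in a pattern is shorthand for matching the .some case of Optional. The sketch below collects the results of three equivalent spellings; the variable names are made up for illustration.

```swift
let names: [String?] = ["b", nil, "x"]

// Optional pattern: the body runs only for non-nil elements.
var viaOptionalPattern: [String] = []
for case let name? in names {
    viaOptionalPattern.append(name)
}

// The same thing written as an explicit enum case pattern.
var viaSomePattern: [String] = []
for case .some(let name) in names {
    viaSomePattern.append(name)
}

// compactMap drops the nils and unwraps the rest in one call.
let viaCompactMap = names.compactMap { $0 }

print(viaOptionalPattern) // ["b", "x"]
print(viaSomePattern)     // ["b", "x"]
print(viaCompactMap)      // ["b", "x"]
```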
Notice that without the case you do not get the same behavior at all.
This:
for name in names {
print(name)
}
Will get this output:
Optional("b")
nil
Optional("x")
And that neither of these work at all.
You can use filter with conditional and then print the result like this:
let nameNotNil = names.filter { $0 != nil } // keep only the non-nil values (elements are still optional)
print(nameNotNil) // prints an empty array if none of the elements held a string