From the very beginning, Swift strings have been tricky, because they handle Unicode properly. There is a standard example from Apple:
let cafe1 = "Cafe\u{301}"
let cafe2 = "Café"
print(cafe1 == cafe2)
// Prints "true"
It means that comparison involves some implicit logic; it is not a simple check that two memory areas are identical. I used to see recommendations to flatten strings into [Character], since when you do this all Unicode-related conversions take place once, and all subsequent operations are faster. Additionally, strings do not necessarily use a contiguous memory area, which makes them more expensive to compare than character arrays.
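For illustration, the two strings really do contain different Unicode scalars, so the equality above must be doing canonical-equivalence work rather than a byte comparison:
cafe1.unicodeScalars.count // 5 -- "e" followed by U+0301 COMBINING ACUTE ACCENT
cafe2.unicodeScalars.count // 4 -- precomposed "é" (U+00E9)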
Long story short, I solved this problem on LeetCode: https://leetcode.com/problems/implement-strstr/ and tried different approaches: KMP, character arrays, and plain strings. To my surprise, strings are the fastest.
How is that so? KMP has some prework and is less efficient in general, but why are strings faster than [Character]? Is this new in some recent Swift version, or am I missing something conceptually?
Code that I used for reference:
[Character], 8 ms, 15 MB memory
func strStr(_ haystack: String, _ needle: String) -> Int {
    guard !needle.isEmpty else { return 0 }
    guard haystack.count >= needle.count else { return -1 }
    var result: Int = -1
    let str = Array(haystack)
    let pattern = Array(needle)
    for i in 0...(str.count - pattern.count) {
        if str[i] == pattern[0] && Array(str[i...(i + pattern.count - 1)]) == pattern {
            result = i
            break
        }
    }
    return result
}
Strings, 4 ms (!!!), 14.5 MB memory
func strStr(_ haystack: String, _ needle: String) -> Int {
    guard !needle.isEmpty else { return 0 }
    guard haystack.count >= needle.count else { return -1 }
    var result: Int = -1
    for i in 0...(haystack.count - needle.count) {
        let hIdx = haystack.index(haystack.startIndex, offsetBy: i)
        if haystack[hIdx] == needle[needle.startIndex] {
            let hEndIdx = haystack.index(hIdx, offsetBy: needle.count - 1)
            if haystack[hIdx...hEndIdx] == needle {
                result = i
                break
            }
        }
    }
    return result
}
First, I think there may be some misunderstandings on your part:
flatten strings into [Character], since when you do this all Unicode-related conversions take place once, and all subsequent operations are faster
This doesn't make a lot of sense. Character has exactly the same issues as String. It may still be made of composed or decomposed UnicodeScalars that need special handling for equality.
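For illustration, the same canonical-equivalence logic applies at the Character level:
let composed: Character = "\u{E9}"     // é as a single scalar
let decomposed: Character = "e\u{301}" // e + combining acute accent
print(composed == decomposed)          // true -- not a byte-wise comparison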
Additionally, strings do not necessarily use a contiguous memory area
This is equally true of Array. Nothing about Array promises that its memory is contiguous. That's why ContiguousArray exists.
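A small sketch of the distinction (the names here are just illustrative):
let plain = Array("hello")            // Array<Character>; no contiguity guarantee in general
let packed = ContiguousArray("hello") // always stores its elements contiguously
let n = packed.withUnsafeBufferPointer { $0.count } // 5; direct buffer access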
As to why String is faster than hand-coded abstractions, that should be obvious: if you could easily out-perform String with no major tradeoffs, then the stdlib would implement String that way.
As to the mechanics of it, String does not promise any particular internal representation, so a lot depends on how you create your strings. Small strings, for example, can be reduced all the way to a tagged pointer that requires zero heap memory (the value can live in a register). Strings can be stored as UTF-8, but they can also be stored as UTF-16 (which is extremely fast to work with).
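You can see a hint of the small-string optimization in the size of the struct itself (a small illustration; this assumes a 64-bit platform and a recent Swift):
print(MemoryLayout<String>.size) // 16 -- short strings fit entirely inline, no heap allocation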
When Strings are compared with other Strings that are known to share the same internal representation, various optimizations can be applied. And this really points to one part of your problem:
Array(str[i...(i + pattern.count - 1)])
This forces a memory allocation and a copy to create a new Array out of str. You would probably do much better if you used a slice for this work rather than making full Array copies. You'd almost certainly find in that case that you exactly match String's implementation (which uses Substring).
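As a minimal sketch, here is the [Character] version rewritten to compare a slice in place; elementsEqual is used because ArraySlice and Array are different types:
func strStr(_ haystack: String, _ needle: String) -> Int {
    guard !needle.isEmpty else { return 0 }
    guard haystack.count >= needle.count else { return -1 }
    let str = Array(haystack)
    let pattern = Array(needle)
    for i in 0...(str.count - pattern.count) {
        // str[i..<(i + pattern.count)] is an ArraySlice: a view, not a copy.
        if str[i] == pattern[0], str[i..<(i + pattern.count)].elementsEqual(pattern) {
            return i
        }
    }
    return -1
}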
But the real lesson here is that you're unlikely to beat String at its own game in the general case. If you happen to have very specialized knowledge about your Strings, then I can see where you'd be able to beat the general-purpose String algorithms. But if you think you're beating the stdlib for arbitrary strings, why would the stdlib not just implement what you're doing (and beat you using knowledge of the internal details of String)?
The String function padding(toLength:withPad:startingAt:) pads a string by adding characters at the end to "fill out" the string to the desired length.
Is there an equivalent function that pads a string by prepending characters at the beginning?
This would be useful if you want to right-justify a substring in a monospaced output string, for example.
I could certainly write one, but I would expect there to be a built-in function, seeing as how there is already a function that pads at the end.
You can do this by reversing the string, padding at the end, and then reversing again:
let string = "abc"
// Pad at end
string.padding(toLength: 7, withPad: "X", startingAt: 0)
// "abcXXXX"
// Pad at start
String(String(string.reversed()).padding(toLength: 7, withPad: "X", startingAt: 0).reversed())
// "XXXXabc"
Since Swift string manipulation is a rehash of the old NSString class, I suppose Apple never bothered to complete the feature set and just gave us toll-free bridging as manna from the gods.
Or, since Objective-C never shied away from super-verbose yet cryptic code, maybe they expect us to use the native function twice:
let a = "42"
"".padding(toLength:10, withPad:a.padding(toLength:10, withPad:"0", startingAt:0), startingAt:a.characters.count)
// 0000000042
[EDIT] Objective-C ranting aside, the solution is a bit more subtle than that, and adding some more useful padding methods to the String type will probably make things easier to use and maintain:
For example:
extension String {
    func padding(leftTo paddedLength: Int, withPad pad: String = " ", startingAt padStart: Int = 0) -> String {
        let rightPadded = self.padding(toLength: max(count, paddedLength), withPad: pad, startingAt: padStart)
        return "".padding(toLength: paddedLength, withPad: rightPadded, startingAt: count % paddedLength)
    }

    func padding(rightTo paddedLength: Int, withPad pad: String = " ", startingAt padStart: Int = 0) -> String {
        return self.padding(toLength: paddedLength, withPad: pad, startingAt: padStart)
    }

    func padding(sidesTo paddedLength: Int, withPad pad: String = " ", startingAt padStart: Int = 0) -> String {
        let rightPadded = self.padding(toLength: max(count, paddedLength), withPad: pad, startingAt: padStart)
        return "".padding(toLength: paddedLength, withPad: rightPadded, startingAt: (paddedLength + count) / 2 % paddedLength)
    }
}
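Example usage; these methods rely on Foundation's padding(toLength:withPad:startingAt:), so import Foundation first:
import Foundation

"42".padding(leftTo: 10, withPad: "0")  // "0000000042"
"42".padding(rightTo: 10, withPad: "0") // "4200000000"
"42".padding(sidesTo: 10)               // "    42    "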
How do you get the length of a String? For example, I have a variable defined like:
var test1: String = "Scott"
However, I can't seem to find a length method on the string.
As of Swift 4+
It's just:
test1.count
because String is once again a Collection of its Characters.
(Thanks to Martin R)
As of Swift 2:
With Swift 2, Apple changed global functions to protocol extensions, i.e. extensions that match any type conforming to a protocol. Thus the new syntax is:
test1.characters.count
(Thanks to JohnDifool for the heads up)
As of Swift 1
Use the global count function to count characters:
let unusualMenagerie = "Koala 🐨, Snail 🐌, Penguin 🐧, Dromedary 🐪"
println("unusualMenagerie has \(count(unusualMenagerie)) characters")
// prints "unusualMenagerie has 40 characters"
right from the Apple Swift Guide
(note, for versions of Swift earlier than 1.2, this would be countElements(unusualMenagerie) instead)
for your variable, it would be
length = count(test1) // was countElements in earlier versions of Swift
Or you can use test1.utf16Count, which counts UTF-16 code units rather than Characters.
TLDR:
For Swift 2.0 and 3.0, use test1.characters.count. But, there are a few things you should know. So, read on.
Counting characters in Swift
Before Swift 2.0, count was a global function. As of Swift 2.0, it can be called as a member function.
test1.characters.count
It will return the actual number of Unicode characters in a String, so it's the most correct alternative in the sense that, if you printed the string and counted the characters by hand, you'd get the same result.
However, because of the way Strings are implemented in Swift, characters don't always take up the same amount of memory, so be aware that this behaves quite differently from the usual character-count methods in other languages.
For example, you can also use test1.utf16.count
But, as noted below, the value it returns is not guaranteed to be the same as the result of calling count on characters.
From the language reference:
Extended grapheme clusters can be composed of one or more Unicode scalars. This means that different characters—and different representations of the same character—can require different amounts of memory to store. Because of this, characters in Swift do not each take up the same amount of memory within a string’s representation. As a result, the number of characters in a string cannot be calculated without iterating through the string to determine its extended grapheme cluster boundaries. If you are working with particularly long string values, be aware that the characters property must iterate over the Unicode scalars in the entire string in order to determine the characters for that string.
The count of the characters returned by the characters property is not always the same as the length property of an NSString that contains the same characters. The length of an NSString is based on the number of 16-bit code units within the string’s UTF-16 representation and not the number of Unicode extended grapheme clusters within the string.
An example that perfectly illustrates the situation described above is checking the length of a string containing a single emoji character, as pointed out by n00neimp0rtant in the comments.
var emoji = "👍"
emoji.characters.count // returns 1
emoji.utf16.count      // returns 2
Swift 1.2 Update: There's no longer a countElements for counting the size of collections. Just use the count function as a replacement: count("Swift")
Swift 2.0, 3.0 and 3.1:
let strLength = string.characters.count
Swift 4.2 (4.0 onwards): [Apple Documentation - Strings]
let strLength = string.count
Swift 1.1
extension String {
    var length: Int { return countElements(self) }
}
Swift 1.2
extension String {
    var length: Int { return count(self) }
}
Swift 2.0
extension String {
    var length: Int { return characters.count }
}
Swift 4.2
extension String {
    var length: Int { return self.count }
}
let str = "Hello"
let count = str.length // returns 5 (Int)
Swift 4
"string".count
;)
Swift 3
extension String {
    var length: Int {
        return self.characters.count
    }
}
usage
"string".length
If you are just trying to see whether a string is empty (i.e. has length 0), Swift offers a simple Boolean property on String:
myString.isEmpty
The other side of this coin was people asking in Objective-C how to check whether a string was empty, where the answer was to check for a length of 0:
NSString is empty
Swift 5.1, 5
let flag = "🇵🇷"
print(flag.count)
// Prints "1" -- the flag counts as a single Character
print(flag.unicodeScalars.count)
// Prints "2" -- the flag is built from two Unicode scalars (regional indicators)
print(flag.utf16.count)
// Prints "4"
print(flag.utf8.count)
// Prints "8"
tl;dr If you want the length of a String in terms of the number of human-readable characters, use countElements(). If you want to know the length in terms of the underlying Unicode scalars, use endIndex. Read on for details.
The String type is implemented as an ordered collection (i.e., sequence) of Unicode characters, and it conforms to the CollectionType protocol, which conforms to the _CollectionType protocol, which is the input type expected by countElements(). Therefore, countElements() can be called, passing a String type, and it will return the count of characters.
However, in conforming to CollectionType, which in turn conforms to _CollectionType, String also implements the startIndex and endIndex computed properties, which actually represent the position of the index before the first character cluster, and position of the index after the last character cluster, respectively. So, in the string "ABC", the position of the index before A is 0 and after C is 3. Therefore, endIndex = 3, which is also the length of the string.
So, endIndex can be used to get the length of any String type, then, right?
Well, not always. Unicode characters are actually extended grapheme clusters, which are sequences of one or more Unicode scalars combined to create a single human-readable character.
let circledStar: Character = "\u{2606}\u{20DD}" // ☆⃝
circledStar is a single character made up of U+2606 (a white star) and U+20DD (a combining enclosing circle). Let's create a String from circledStar and compare the results of countElements() and endIndex.
let circledStarString = "\(circledStar)"
countElements(circledStarString) // 1
circledStarString.endIndex // 2
In Swift 2.0, count doesn't work anymore. You can use this instead:
var testString = "Scott"
var length = testString.characters.count
Here's something shorter, and more natural than using a global function:
aString.utf16count
I don't know if it's available in beta 1, though. But it's definitely there in beta 2.
Updated for Xcode 6 beta 4: the property changed from utf16count to utf16Count.
var test1: String = "Scott"
var length = test1.utf16Count
Or
var test1: String = "Scott"
var length = test1.lengthOfBytesUsingEncoding(NSUTF16StringEncoding)
As of Swift 1.2, utf16Count has been removed. You should now use the global count() function and pass it the UTF-16 view of the string. Example below:
let string = "Some string"
count(string.utf16)
For Xcode 7.3 and Swift 2.2.
let str = "🐶"
If you want the number of visual characters:
str.characters.count
If you want the "16-bit code units within the string’s UTF-16 representation":
str.utf16.count
Most of the time, the first is what you need.
When would you need the second? I've found one use case:
let regex = try! NSRegularExpression(pattern: "🐶",
    options: NSRegularExpressionOptions.UseUnixLineSeparators)
let str = "🐶🐶🐶🐶🐶🐶"
let result = regex.stringByReplacingMatchesInString(str,
    options: NSMatchingOptions.WithTransparentBounds,
    range: NSMakeRange(0, str.utf16.count), withTemplate: "dog")
print(result) // dogdogdogdogdogdog
If you use the first, the result is incorrect:
let result = regex.stringByReplacingMatchesInString(str,
    options: NSMatchingOptions.WithTransparentBounds,
    range: NSMakeRange(0, str.characters.count), withTemplate: "dog")
print(result) // dogdogdog🐶🐶🐶
You could try it like this:
var test1: String = "Scott"
var length = test1.bridgeToObjectiveC().length
In Swift 2.x, the following is how to find the length of a string:
let findLength = "This is a string of text"
findLength.characters.count
returns 24
Swift 2.0:
Get a count: yourString.characters.count
A fun example of how this is useful is showing a character countdown from some number (150, for example) in a UITextView:
func textViewDidChange(textView: UITextView) {
    yourStringLabel.text = String(150 - yourStringTextView.text.characters.count)
}
In Swift 4 I had always used string.count; today I found that
string.endIndex.encodedOffset
is a faster substitute: for a 50,000-character string it is about 6 times faster than .count. The cost of .count grows with the string's length, while .endIndex.encodedOffset is constant.
But there is one caveat: it is not good for strings containing emoji, where it gives a wrong result, so only .count is correct in general.
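A quick illustration of the caveat; encodedOffset reports an offset in UTF-16 code units, not Characters:
let plain = "Scott"
plain.count                  // 5
plain.endIndex.encodedOffset // 5 -- matches for pure ASCII

let thumb = "👍"
thumb.count                  // 1
thumb.endIndex.encodedOffset // 2 -- wrong as a character count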
In Swift 4:
If the string contains only ASCII characters, the character count and the byte count are the same:
let str: String = "abcd"
let count = str.count // 4
If the string contains non-ASCII characters, they differ:
let spain = "España"
let count1 = spain.count      // 6
let count2 = spain.utf8.count // 7
In Xcode 6.1.1:
extension String {
    var length: Int { return self.utf16Count }
}
I think the brainiacs will change this on every minor version.
Get the string value from your text view or text field:
let textlengthstring = (yourtextview?.text)! as String
Find the count of the characters in the string:
let numberOfChars = textlengthstring.characters.count
Here is what I ended up doing:
let replacementTextAsDecimal = Double(string)
if string.characters.count > 0 &&
    replacementTextAsDecimal == nil &&
    replacementTextHasDecimalSeparator == nil {
    return false
}
Swift 4 update, compared with Swift 3
Swift 4 removes the need for the characters array on String. This means that you can call count directly on a string, without first getting the characters array.
"hello".count // 5
Whereas in Swift 3, you have to get the characters array and then count the elements in that array. Note that the following method is still available in Swift 4.0, as you can still call characters to access the characters view of the given string:
"hello".characters.count // 5
Swift 4.0 also adopts Unicode 9 and can now interpret grapheme clusters. For example, counting an emoji gives you 1, while in Swift 3.0 you may get a count greater than 1.
"👍🏽".count // Swift 4.0 prints 1, Swift 3.0 prints 2
"👨❤️💋👨".count // Swift 4.0 prints 1, Swift 3.0 prints 4
Swift 4
let str = "Your name"
str.count // 9
Remember: spaces are also counted in the number.
You can get the length simply by writing an extension (keep only the variant that matches your Swift version):
extension String {
    // MARK: Use if it's Swift 2
    func stringLength(str: String) -> Int {
        return str.characters.count
    }

    // MARK: Use if it's Swift 3
    func stringLength(_ str: String) -> Int {
        return str.characters.count
    }

    // MARK: Use if it's Swift 4
    func stringLength(_ str: String) -> Int {
        return str.count
    }
}
Another way to count a String in Swift is this (note that it counts UTF-16 code units, matching NSString's length, not Characters):
var str = "Hello World"
var length = count(str.utf16)
String and NSString are toll-free bridged, so you can use all methods available on NSString with a Swift String:
let x = "test" as NSString
let y: NSString = "string 2"
let lenx = x.length
let leny = y.length
test1.characters.count
will get you the number of letters/numbers etc in your string.
ex:
test1 = "StackOverflow"
print(test1.characters.count)
(prints "13")
Apple made it different from other major languages. The current way is to call:
test1.characters.count
However, be careful: when you say "length", you mean the count of characters, not the count of bytes, because those two can be different when you use non-ASCII characters.
For example:
"你好啊hi".characters.count will give you 5 but this is not the count of the bytes.
To get the real count of bytes, you need to do "你好啊hi".lengthOfBytes(using: String.Encoding.utf8). This will give you 11.
Right now (in Swift 2.3), if you use:
myString.characters.count
the method returns a "Distance" type; if you need the result as an Integer, you can cast it like so:
var count = myString.characters.count as Int
My two cents for Swift 3/4:
If you need to compile conditionally:
#if swift(>=4.0)
let len = text.count
#else
let len = text.characters.count
#endif
I'm trying to find out whether a string is in a word list read from a file. This is what I have so far. The content?[index] part does seem to work, but the loop/optional stuff is causing things not to work.
Also, there is an efficiency question: would it be better to put the list into a dictionary keyed by, say, the first letter, and then check whether an entry exists with the same key, instead of looping through the whole list each time?
let testString = "Hello"
let path = NSBundle.mainBundle().pathForResource("wordlist", ofType: "txt")
var content = String.stringWithContentsOfFile(path, encoding: NSUTF8StringEncoding, error: nil)?.componentsSeparatedByString("\n")
let count = content?.count
for word in 0..<count {
    if testString == content?[word] {
        // found word
    }
}
It complains about count being an Int? instead of an Int. Thanks for suggestions on how best to make this work.
I think the problem is here:
let count = content?.count
which is an optional (Int?). The solution is to unwrap it with optional binding:
if let count = content?.count {
    for word in 0..<count {
        if testString == content?[word] {
            // found word
        }
    }
}
As for the algorithm, it depends on the usage. If you only do a single search, then the current implementation is fine: it is O(n).
In the case of multiple searches, I would use this algorithm (a sketch in code follows below):
sort all keys
sort all words
then loop through both, comparing the current key with the current word:
if equal, a word is found; advance the key and continue the loop
if the key is less, advance the key and continue
if the key is greater, advance the word and continue
the loop ends when either no more keys or no more words are available.
Not sure, but the complexity of the merge should be O(n), plus the cost of sorting the two lists.
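A minimal sketch of that merge, in current Swift syntax (the names here are illustrative, not from the question):
// Intersect two sorted lists with two pointers; linear after sorting.
func matches(keys: [String], words: [String]) -> [String] {
    let sortedKeys = keys.sorted()
    let sortedWords = words.sorted()
    var found: [String] = []
    var k = 0, w = 0
    while k < sortedKeys.count && w < sortedWords.count {
        if sortedKeys[k] == sortedWords[w] {
            found.append(sortedKeys[k])
            k += 1 // found: advance the key
        } else if sortedKeys[k] < sortedWords[w] {
            k += 1 // key is smaller: advance the key
        } else {
            w += 1 // word is smaller: advance the word
        }
    }
    return found
}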
Addendum: a better way to implement that loop is:
if let content = content {
    for word in 0..<content.count {
        if testString == content[word] {
            // found word
        }
    }
}
Unwrap once and use it anywhere within the block.
Addendum 2: a better algorithm is the following:
Store all keys in a hash set. Loop through all words, check whether each word is in the set, and if so add it to the list of found words. Much simpler.
If the number of words is less than the number of keys, I would invert that, populating the hash set from the list of words and looping through the keys.
The complexity of this algorithm is at most O(2n), i.e. O(n), where n is the larger of the number of keys and the number of words.
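A minimal sketch of that approach in current Swift, assuming content was loaded as in the question:
// Build the set once (O(n)); each membership test is then O(1) on average.
let words = Set(content ?? [])
if words.contains(testString) {
    // found word
}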