From the very beginning, Swift strings have been tricky, because they handle Unicode properly. There is a standard example from Apple:
let cafe1 = "Cafe\u{301}"
let cafe2 = "Café"
print(cafe1 == cafe2)
// Prints "true"
It means that comparison involves some implicit logic; it is not a simple check that two memory areas are identical. I used to see recommendations to flatten strings into [Character], since when you do this all Unicode-related conversions take place once, and all subsequent operations are faster. Additionally, strings do not necessarily use a contiguous memory area, which makes them more expensive to compare than character arrays.
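For illustration, the two strings really do contain different Unicode scalars, so the equality above must be doing canonical-equivalence work rather than a byte comparison:
cafe1.unicodeScalars.count // 5 -- "e" followed by U+0301 COMBINING ACUTE ACCENT
cafe2.unicodeScalars.count // 4 -- precomposed "é" (U+00E9)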
Long story short, I solved this problem on LeetCode: https://leetcode.com/problems/implement-strstr/ and tried different approaches: KMP, character arrays, and plain strings. To my surprise, strings are the fastest.
How is that so? KMP has some prework and is less efficient in general, but why are strings faster than [Character]? Is this new in some recent Swift version, or am I missing something conceptually?
Code that I used for reference:
[Character], 8 ms, 15 MB memory
func strStr(_ haystack: String, _ needle: String) -> Int {
    guard !needle.isEmpty else { return 0 }
    guard haystack.count >= needle.count else { return -1 }
    var result: Int = -1
    let str = Array(haystack)
    let pattern = Array(needle)
    for i in 0...(str.count - pattern.count) {
        if str[i] == pattern[0] && Array(str[i...(i + pattern.count - 1)]) == pattern {
            result = i
            break
        }
    }
    return result
}
Strings, 4 ms (!!!), 14.5 MB memory
func strStr(_ haystack: String, _ needle: String) -> Int {
    guard !needle.isEmpty else { return 0 }
    guard haystack.count >= needle.count else { return -1 }
    var result: Int = -1
    for i in 0...(haystack.count - needle.count) {
        let hIdx = haystack.index(haystack.startIndex, offsetBy: i)
        if haystack[hIdx] == needle[needle.startIndex] {
            let hEndIdx = haystack.index(hIdx, offsetBy: needle.count - 1)
            if haystack[hIdx...hEndIdx] == needle {
                result = i
                break
            }
        }
    }
    return result
}
First, I think there may be some misunderstandings on your part:
flatten strings into [Character], since when you do this all Unicode-related conversions take place once, and all subsequent operations are faster
This doesn't make a lot of sense. Character has exactly the same issues as String. It may still be made of composed or decomposed UnicodeScalars that need special handling for equality.
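For illustration, the same canonical-equivalence logic applies at the Character level:
let composed: Character = "\u{E9}"     // é as a single scalar
let decomposed: Character = "e\u{301}" // e + combining acute accent
print(composed == decomposed)          // true -- not a byte-wise comparison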
Additionally, strings do not necessarily use a contiguous memory area
This is equally true of Array. Nothing about Array promises that its memory is contiguous. That's why ContiguousArray exists.
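A small sketch of the distinction (the names here are just illustrative):
let plain = Array("hello")            // Array<Character>; no contiguity guarantee in general
let packed = ContiguousArray("hello") // always stores its elements contiguously
let n = packed.withUnsafeBufferPointer { $0.count } // 5; direct buffer access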
As to why String is faster than hand-coded abstractions, that should be obvious: if you could easily out-perform String with no major tradeoffs, then the stdlib would implement String that way.
As to the mechanics of it, String does not promise any particular internal representation, so a lot depends on how you create your strings. Small strings, for example, can be reduced all the way to a tagged pointer that requires zero heap memory (the value can live in a register). Strings can be stored as UTF-8, but they can also be stored as UTF-16 (which is extremely fast to work with).
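You can see a hint of the small-string optimization in the size of the struct itself (a small illustration; this assumes a 64-bit platform and a recent Swift):
print(MemoryLayout<String>.size) // 16 -- short strings fit entirely inline, no heap allocation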
When Strings are compared with other Strings that are known to share the same internal representation, various optimizations can be applied. And this really points to one part of your problem:
Array(str[i...(i + pattern.count - 1)])
This forces a memory allocation and a copy to create a new Array out of str. You would probably do much better if you used a slice for this work rather than making full Array copies. You'd almost certainly find in that case that you exactly match String's implementation (which uses Substring).
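As a minimal sketch, here is the [Character] version rewritten to compare a slice in place; elementsEqual is used because ArraySlice and Array are different types:
func strStr(_ haystack: String, _ needle: String) -> Int {
    guard !needle.isEmpty else { return 0 }
    guard haystack.count >= needle.count else { return -1 }
    let str = Array(haystack)
    let pattern = Array(needle)
    for i in 0...(str.count - pattern.count) {
        // str[i..<(i + pattern.count)] is an ArraySlice: a view, not a copy.
        if str[i] == pattern[0], str[i..<(i + pattern.count)].elementsEqual(pattern) {
            return i
        }
    }
    return -1
}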
But the real lesson here is that you're unlikely to beat String at its own game in the general case. If you happen to have very specialized knowledge about your Strings, then I can see where you'd be able to beat the general-purpose String algorithms. But if you think you're beating the stdlib for arbitrary strings, why would the stdlib not just implement what you're doing (and beat you using knowledge of the internal details of String)?
The String function padding(toLength:withPad:startingAt:) pads a string by adding characters at the end to "fill out" the string to the desired length.
Is there an equivalent function that pads a string by prepending characters at the beginning?
This would be useful if you want to right-justify a substring in a monospaced output string, for example.
I could certainly write one, but I would expect there to be a built-in function, seeing as how there is already a function that pads at the end.
You can do this by reversing the string, padding at the end, and then reversing again:
let string = "abc"
// Pad at end
string.padding(toLength: 7, withPad: "X", startingAt: 0)
// "abcXXXX"
// Pad at start
String(String(string.reversed()).padding(toLength: 7, withPad: "X", startingAt: 0).reversed())
// "XXXXabc"
Since Swift string manipulation is a rehash of the old NSString class, I suppose Apple never bothered to complete the feature set and just gave us toll-free bridging as manna from the gods.
Or, since Objective-C never shied away from super-verbose yet cryptic code, maybe they expect us to use the native function twice:
let a = "42"
"".padding(toLength:10, withPad:a.padding(toLength:10, withPad:"0", startingAt:0), startingAt:a.characters.count)
// 0000000042
[EDIT] Objective-C ranting aside, the solution is a bit more subtle than that, and adding some more useful padding methods to the String type will probably make things easier to use and maintain:
For example:
extension String {
    func padding(leftTo paddedLength: Int, withPad pad: String = " ", startingAt padStart: Int = 0) -> String {
        let rightPadded = self.padding(toLength: max(count, paddedLength), withPad: pad, startingAt: padStart)
        return "".padding(toLength: paddedLength, withPad: rightPadded, startingAt: count % paddedLength)
    }

    func padding(rightTo paddedLength: Int, withPad pad: String = " ", startingAt padStart: Int = 0) -> String {
        return self.padding(toLength: paddedLength, withPad: pad, startingAt: padStart)
    }

    func padding(sidesTo paddedLength: Int, withPad pad: String = " ", startingAt padStart: Int = 0) -> String {
        let rightPadded = self.padding(toLength: max(count, paddedLength), withPad: pad, startingAt: padStart)
        return "".padding(toLength: paddedLength, withPad: rightPadded, startingAt: (paddedLength + count) / 2 % paddedLength)
    }
}
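Example usage; these methods rely on Foundation's padding(toLength:withPad:startingAt:), so import Foundation first:
import Foundation

"42".padding(leftTo: 10, withPad: "0")  // "0000000042"
"42".padding(rightTo: 10, withPad: "0") // "4200000000"
"42".padding(sidesTo: 10)               // "    42    "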
How do you get the length of a String? For example, I have a variable defined like:
var test1: String = "Scott"
However, I can't seem to find a length method on the string.
As of Swift 4+
It's just:
test1.count
because String is once again a Collection of its Characters.
(Thanks to Martin R)
As of Swift 2:
With Swift 2, Apple changed global functions to protocol extensions, i.e. extensions that match any type conforming to a protocol. Thus the new syntax is:
test1.characters.count
(Thanks to JohnDifool for the heads up)
As of Swift 1
Use the global count function to count characters:
let unusualMenagerie = "Koala 🐨, Snail 🐌, Penguin 🐧, Dromedary 🐪"
println("unusualMenagerie has \(count(unusualMenagerie)) characters")
// prints "unusualMenagerie has 40 characters"
right from the Apple Swift Guide
(note, for versions of Swift earlier than 1.2, this would be countElements(unusualMenagerie) instead)
for your variable, it would be
length = count(test1) // was countElements in earlier versions of Swift
Or you can use test1.utf16Count, which counts UTF-16 code units rather than Characters.
TLDR:
For Swift 2.0 and 3.0, use test1.characters.count. But, there are a few things you should know. So, read on.
Counting characters in Swift
Before Swift 2.0, count was a global function. As of Swift 2.0, it can be called as a member function.
test1.characters.count
It will return the actual number of Unicode characters in a String, so it's the most correct alternative in the sense that, if you printed the string and counted the characters by hand, you'd get the same result.
However, because of the way Strings are implemented in Swift, characters don't always take up the same amount of memory, so be aware that this behaves quite differently from the usual character-count methods in other languages.
For example, you can also use test1.utf16.count
But, as noted below, the value it returns is not guaranteed to be the same as the result of calling count on characters.
From the language reference:
Extended grapheme clusters can be composed of one or more Unicode scalars. This means that different characters—and different representations of the same character—can require different amounts of memory to store. Because of this, characters in Swift do not each take up the same amount of memory within a string’s representation. As a result, the number of characters in a string cannot be calculated without iterating through the string to determine its extended grapheme cluster boundaries. If you are working with particularly long string values, be aware that the characters property must iterate over the Unicode scalars in the entire string in order to determine the characters for that string.
The count of the characters returned by the characters property is not always the same as the length property of an NSString that contains the same characters. The length of an NSString is based on the number of 16-bit code units within the string’s UTF-16 representation and not the number of Unicode extended grapheme clusters within the string.
An example that perfectly illustrates the situation described above is checking the length of a string containing a single emoji character, as pointed out by n00neimp0rtant in the comments.
var emoji = "👍"
emoji.characters.count // returns 1
emoji.utf16.count      // returns 2
Swift 1.2 Update: There's no longer a countElements for counting the size of collections. Just use the count function as a replacement: count("Swift")
Swift 2.0, 3.0 and 3.1:
let strLength = string.characters.count
Swift 4.2 (4.0 onwards): [Apple Documentation - Strings]
let strLength = string.count
Swift 1.1
extension String {
    var length: Int { return countElements(self) }
}
Swift 1.2
extension String {
    var length: Int { return count(self) }
}
Swift 2.0
extension String {
    var length: Int { return characters.count }
}
Swift 4.2
extension String {
    var length: Int { return self.count }
}
let str = "Hello"
let count = str.length // returns 5 (Int)
Swift 4
"string".count
;)
Swift 3
extension String {
    var length: Int {
        return self.characters.count
    }
}
usage
"string".length
If you are just trying to see whether a string is empty (i.e. has length 0), Swift offers a simple Boolean property on String:
myString.isEmpty
The other side of this coin was people asking in Objective-C how to check whether a string was empty, where the answer was to check for a length of 0:
NSString is empty
Swift 5.1, 5
let flag = "🇵🇷"
print(flag.count)
// Prints "1" -- the flag counts as a single Character
print(flag.unicodeScalars.count)
// Prints "2" -- the flag is built from two Unicode scalars (regional indicators)
print(flag.utf16.count)
// Prints "4"
print(flag.utf8.count)
// Prints "8"
tl;dr If you want the length of a String in terms of the number of human-readable characters, use countElements(). If you want to know the length in terms of the underlying Unicode scalars, use endIndex. Read on for details.
The String type is implemented as an ordered collection (i.e., sequence) of Unicode characters, and it conforms to the CollectionType protocol, which conforms to the _CollectionType protocol, which is the input type expected by countElements(). Therefore, countElements() can be called, passing a String type, and it will return the count of characters.
However, in conforming to CollectionType, which in turn conforms to _CollectionType, String also implements the startIndex and endIndex computed properties, which actually represent the position of the index before the first character cluster, and position of the index after the last character cluster, respectively. So, in the string "ABC", the position of the index before A is 0 and after C is 3. Therefore, endIndex = 3, which is also the length of the string.
So, endIndex can be used to get the length of any String type, then, right?
Well, not always. Unicode characters are actually extended grapheme clusters, which are sequences of one or more Unicode scalars combined to create a single human-readable character.
let circledStar: Character = "\u{2606}\u{20DD}" // ☆⃝
circledStar is a single character made up of U+2606 (a white star) and U+20DD (a combining enclosing circle). Let's create a String from circledStar and compare the results of countElements() and endIndex.
let circledStarString = "\(circledStar)"
countElements(circledStarString) // 1
circledStarString.endIndex // 2
In Swift 2.0, count doesn't work anymore. You can use this instead:
var testString = "Scott"
var length = testString.characters.count
Here's something shorter, and more natural than using a global function:
aString.utf16count
I don't know if it's available in beta 1, though. But it's definitely there in beta 2.
Updated for Xcode 6 beta 4: the property changed from utf16count to utf16Count.
var test1: String = "Scott"
var length = test1.utf16Count
Or
var test1: String = "Scott"
var length = test1.lengthOfBytesUsingEncoding(NSUTF16StringEncoding)
As of Swift 1.2, utf16Count has been removed. You should now use the global count() function and pass it the UTF-16 view of the string. Example below:
let string = "Some string"
count(string.utf16)
For Xcode 7.3 and Swift 2.2.
let str = "🐶"
If you want the number of visual characters:
str.characters.count
If you want the "16-bit code units within the string’s UTF-16 representation":
str.utf16.count
Most of the time, the first is what you need.
When would you need the second? I've found one use case:
let regex = try! NSRegularExpression(pattern: "🐶",
    options: NSRegularExpressionOptions.UseUnixLineSeparators)
let str = "🐶🐶🐶🐶🐶🐶"
let result = regex.stringByReplacingMatchesInString(str,
    options: NSMatchingOptions.WithTransparentBounds,
    range: NSMakeRange(0, str.utf16.count), withTemplate: "dog")
print(result) // dogdogdogdogdogdog
If you use the first, the result is incorrect:
let result = regex.stringByReplacingMatchesInString(str,
    options: NSMatchingOptions.WithTransparentBounds,
    range: NSMakeRange(0, str.characters.count), withTemplate: "dog")
print(result) // dogdogdog🐶🐶🐶
You could try it like this:
var test1: String = "Scott"
var length = test1.bridgeToObjectiveC().length
In Swift 2.x, the following is how to find the length of a string:
let findLength = "This is a string of text"
findLength.characters.count
returns 24
Swift 2.0:
Get a count: yourString.characters.count
A fun example of how this is useful is showing a character countdown from some number (150, for example) in a UITextView:
func textViewDidChange(textView: UITextView) {
    yourStringLabel.text = String(150 - yourStringTextView.text.characters.count)
}
In Swift 4 I had always used string.count; today I found that
string.endIndex.encodedOffset
is a faster substitute: for a 50,000-character string it is about 6 times faster than .count. The cost of .count grows with the string's length, while .endIndex.encodedOffset is constant.
But there is one caveat: it is not good for strings containing emoji, where it gives a wrong result, so only .count is correct in general.
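A quick illustration of the caveat; encodedOffset reports an offset in UTF-16 code units, not Characters:
let plain = "Scott"
plain.count                  // 5
plain.endIndex.encodedOffset // 5 -- matches for pure ASCII

let thumb = "👍"
thumb.count                  // 1
thumb.endIndex.encodedOffset // 2 -- wrong as a character count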
In Swift 4:
If the string contains only ASCII characters, the character count and the byte count are the same:
let str: String = "abcd"
let count = str.count // 4
If the string contains non-ASCII characters, they differ:
let spain = "España"
let count1 = spain.count      // 6
let count2 = spain.utf8.count // 7
In Xcode 6.1.1:
extension String {
    var length: Int { return self.utf16Count }
}
I think the brainiacs will change this on every minor version.
Get the string value from your text view or text field:
let textlengthstring = (yourtextview?.text)! as String
Find the count of the characters in the string:
let numberOfChars = textlengthstring.characters.count
Here is what I ended up doing:
let replacementTextAsDecimal = Double(string)
if string.characters.count > 0 &&
    replacementTextAsDecimal == nil &&
    replacementTextHasDecimalSeparator == nil {
    return false
}
Swift 4 update, compared with Swift 3
Swift 4 removes the need for the characters array on String. This means that you can call count directly on a string, without first getting the characters array.
"hello".count // 5
Whereas in Swift 3, you have to get the characters array and then count the elements in that array. Note that the following method is still available in Swift 4.0, as you can still call characters to access the characters view of the given string:
"hello".characters.count // 5
Swift 4.0 also adopts Unicode 9 and can now interpret grapheme clusters. For example, counting an emoji gives you 1, while in Swift 3.0 you may get a count greater than 1.
"👍🏽".count // Swift 4.0 prints 1, Swift 3.0 prints 2
"👨❤️💋👨".count // Swift 4.0 prints 1, Swift 3.0 prints 4
Swift 4
let str = "Your name"
str.count // 9
Remember: spaces are also counted in the number.
You can get the length simply by writing an extension (keep only the variant that matches your Swift version):
extension String {
    // MARK: Use if it's Swift 2
    func stringLength(str: String) -> Int {
        return str.characters.count
    }

    // MARK: Use if it's Swift 3
    func stringLength(_ str: String) -> Int {
        return str.characters.count
    }

    // MARK: Use if it's Swift 4
    func stringLength(_ str: String) -> Int {
        return str.count
    }
}
Another way to count a String in Swift is this (note that it counts UTF-16 code units, matching NSString's length, not Characters):
var str = "Hello World"
var length = count(str.utf16)
String and NSString are toll-free bridged, so you can use all methods available on NSString with a Swift String:
let x = "test" as NSString
let y: NSString = "string 2"
let lenx = x.length
let leny = y.length
test1.characters.count
will get you the number of letters/numbers etc in your string.
ex:
test1 = "StackOverflow"
print(test1.characters.count)
(prints "13")
Apple made it different from other major languages. The current way is to call:
test1.characters.count
However, be careful: when you say "length", you mean the count of characters, not the count of bytes, because those two can be different when you use non-ASCII characters.
For example:
"你好啊hi".characters.count will give you 5 but this is not the count of the bytes.
To get the real count of bytes, you need to do "你好啊hi".lengthOfBytes(using: String.Encoding.utf8). This will give you 11.
Right now (in Swift 2.3), if you use:
myString.characters.count
the method returns a "Distance" type; if you need the result as an Integer, you can cast it like so:
var count = myString.characters.count as Int
My two cents for Swift 3/4:
If you need to compile conditionally:
#if swift(>=4.0)
let len = text.count
#else
let len = text.characters.count
#endif
I'm trying to find out whether a string is in a word list read from a file. This is what I have so far. The content?[index] part does seem to work, but the loop/optional stuff is causing things not to work.
Also, there is an efficiency question: would it be better to put the list into a dictionary keyed by, say, the first letter, and then check whether an entry exists with the same key, instead of looping through the whole list each time?
let testString = "Hello"
let path = NSBundle.mainBundle().pathForResource("wordlist", ofType: "txt")
var content = String.stringWithContentsOfFile(path, encoding: NSUTF8StringEncoding, error: nil)?.componentsSeparatedByString("\n")
let count = content?.count
for word in 0..<count {
    if testString == content?[word] {
        // found word
    }
}
It complains about count being an Int? instead of an Int. Thanks for suggestions on how best to make this work.
I think the problem is here:
let count = content?.count
which is an optional (Int?). The solution is to unwrap it with optional binding:
if let count = content?.count {
    for word in 0..<count {
        if testString == content?[word] {
            // found word
        }
    }
}
As for the algorithm, it depends on the usage. If you only do a single search, then the current implementation is fine: it is O(n).
In the case of multiple searches, I would use this algorithm (a sketch in code follows below):
sort all keys
sort all words
then loop through both, comparing the current key with the current word:
if equal, a word is found; advance the key and continue the loop
if the key is less, advance the key and continue
if the key is greater, advance the word and continue
the loop ends when either no more keys or no more words are available.
Not sure, but the complexity of the merge should be O(n), plus the cost of sorting the two lists.
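A minimal sketch of that merge, in current Swift syntax (the names here are illustrative, not from the question):
// Intersect two sorted lists with two pointers; linear after sorting.
func matches(keys: [String], words: [String]) -> [String] {
    let sortedKeys = keys.sorted()
    let sortedWords = words.sorted()
    var found: [String] = []
    var k = 0, w = 0
    while k < sortedKeys.count && w < sortedWords.count {
        if sortedKeys[k] == sortedWords[w] {
            found.append(sortedKeys[k])
            k += 1 // found: advance the key
        } else if sortedKeys[k] < sortedWords[w] {
            k += 1 // key is smaller: advance the key
        } else {
            w += 1 // word is smaller: advance the word
        }
    }
    return found
}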
Addendum: a better way to implement that loop is:
if let content = content {
    for word in 0..<content.count {
        if testString == content[word] {
            // found word
        }
    }
}
Unwrap once and use it anywhere within the block.
Addendum 2: a better algorithm is the following:
Store all keys in a hash set. Loop through all words, check whether each word is in the set, and if so add it to the list of found words. Much simpler.
If the number of words is less than the number of keys, I would invert that, populating the hash set from the list of words and looping through the keys.
The complexity of this algorithm is at most O(2n), i.e. O(n), where n is the larger of the number of keys and the number of words.
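A minimal sketch of that approach in current Swift, assuming content was loaded as in the question:
// Build the set once (O(n)); each membership test is then O(1) on average.
let words = Set(content ?? [])
if words.contains(testString) {
    // found word
}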