Validate Unicode code point in Swift

I'm writing a routine in Swift that needs to try to convert an arbitrary integer to a UnicodeScalar or return an error. The constructor UnicodeScalar(_:Int) does the job for valid code points, but it crashes when passed integers that are not valid code points.
Is there a Swift (or Foundation) function I can call to pre-flight that an integer i is a valid Unicode code point and won't cause UnicodeScalar(i) to crash?

Update for Swift 3:
UnicodeScalar has a failable initializer now which verifies if the
given number is a valid Unicode code point or not:
if let unicode = UnicodeScalar(0xD800) {
print(unicode)
} else {
print("invalid")
}
(Previous answer:) You can use the built-in UTF32() codec to check if a given integer
is a valid Unicode scalar:
extension UnicodeScalar {
init?(code: UInt32) {
var codegen = GeneratorOfOne(code) // As suggested by @rintaro
var utf32 = UTF32()
guard case let .Result(scalar) = utf32.decode(&codegen) else {
return nil
}
self = scalar
}
}
(Using ideas from https://stackoverflow.com/a/24757284/1187415
and https://stackoverflow.com/a/31285671/1187415.)

The Swift documentation states
A Unicode scalar is any Unicode code point in the range U+0000 to
U+D7FF inclusive or U+E000 to U+10FFFF inclusive.
In Swift 2.0b4, the UnicodeScalar constructor does not crash for any value in those ranges. I use this convenience constructor:
extension UnicodeScalar {
init?(code: Int) {
guard (code >= 0 && code <= 0xD7FF) || (code >= 0xE000 && code <= 0x10FFFF) else {
return nil
}
self.init(code)
}
}
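For newer Swift versions, the "convert or return an error" requirement from the question could be sketched as a small throwing wrapper around the failable initializer. The names CodePointError and makeScalar(from:) are made up for this illustration and are not part of the standard library.

```swift
// Hypothetical error type for this sketch; not part of the standard library.
enum CodePointError: Error {
    case invalid(Int)
}

// Converts an arbitrary Int to a Unicode.Scalar or throws.
// Negative values and values above UInt32.max fail the UInt32(exactly:) step;
// surrogates (U+D800...U+DFFF) and values above U+10FFFF fail the
// failable Unicode.Scalar initializer.
func makeScalar(from value: Int) throws -> Unicode.Scalar {
    guard let raw = UInt32(exactly: value),
          let scalar = Unicode.Scalar(raw) else {
        throw CodePointError.invalid(value)
    }
    return scalar
}
```

A caller can then use do/catch, or try? when only success or failure matters.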

Related

Swift: String starts(with:) vs hasPrefix

String.hasPrefix (or [NSString hasPrefix]) was always part of Foundation. However, I just noticed that now we also have starts(with:).
This method comes from Sequence but it also works for String.
My question is, which one should I prefer? Are there any performance considerations? I'm used to hasPrefix from Objective-C days, but starts(with:) is more intuitive and works for other sequences.
String.hasPrefix() is implemented in StringLegacy.swift as
extension String {
public func hasPrefix(_ prefix: String) -> Bool {
if _fastPath(self._guts.isNFCFastUTF8 && prefix._guts.isNFCFastUTF8) {
guard prefix._guts.count <= self._guts.count else { return false }
return prefix._guts.withFastUTF8 { nfcPrefix in
let prefixEnd = nfcPrefix.count
return self._guts.withFastUTF8(range: 0..<prefixEnd) { nfcSlicedSelf in
return _binaryCompare(nfcSlicedSelf, nfcPrefix) == 0
}
}
}
return starts(with: prefix)
}
}
which means (if I understand it correctly): If both the string and the prefix candidate use a UTF-8 based storage then the UTF-8 bytes are compared directly. Otherwise it falls back to starts(with:) and does a Character based comparison.
So there is no difference in the result, but hasPrefix() is optimized for native Swift strings.
Note: This is from the master (Swift 5) branch; the situation might be different in earlier versions.
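A small sketch of the "no difference in the result" claim, assuming Swift 5: both calls compare by Character, so a precomposed "é" matches a decomposed prefix either way. The variable names are only for illustration.

```swift
let word = "\u{E9}cole"            // "école" with a precomposed é (U+00E9)
let decomposedPrefix = "e\u{301}"  // "e" followed by a combining acute accent

// Both APIs treat the two spellings of "é" as the same Character.
let viaHasPrefix = word.hasPrefix(decomposedPrefix)       // true
let viaStartsWith = word.starts(with: decomposedPrefix)   // true

// starts(with:) comes from Sequence, so it also works for other sequences.
let numbersStart = [1, 2, 3].starts(with: [1, 2])         // true
```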

Counting character frequencies in a Swift string

I'm trying to convert Java code to Swift and facing the issue:
single-quoted string literal found, use '"' charArray[(s[i].asciiValue)! - ('a'.asciiValue)!]++ ^~~ "a"
Java Code:
for(String s: str){
char arr[] = new char[26];
for(int i =0;i< s.length(); i++){
arr[s.charAt(i) -'a']++;
}
}
Swift Code:
extension String {
var asciiArray: [UInt32] {
return unicodeScalars.filter{$0.isASCII}.map{$0.value}
}
}
extension Character {
var asciiValue: UInt32? {
return String(self).unicodeScalars.filter{$0.isASCII}.first?.value
}
}
class GroupXXX {
func groupXXX(strList: [String]) {
for str in strList {
var charArray = [Character?](repeating: nil, count: 26)
var s = str.characters.map { $0 }
for i in 0..<s.count {
charArray[(s[i].asciiValue)! - ('a'.asciiValue)!]++
}
}
}
}
There are several problems in your Swift code:
There are no single-quoted character literals in Swift (as already explained
by JeremyP).
The ++ operator has been removed in Swift 3.
s[i] does not compile because Swift strings are not indexed by
integers.
Defining the array as [Character?] makes no sense, and you cannot
increment a Character?. The Swift equivalent of the Java char
would be UInt16.
You don't check if the character is in the range "a"..."z".
Apparently you want to count the number of occurrences of
each character "a" to "z" in a string.
This is how I would do it in Swift:
Define the "frequency" array as an array of integers.
Enumerate the unicodeScalars property of the string.
Use a switch statement to check for the valid range of characters.
Then the custom extensions are not needed anymore and the code becomes
var frequencies = [Int](repeating: 0, count: 26)
for c in str.unicodeScalars {
switch c {
case "a"..."z":
frequencies[Int(c.value - UnicodeScalar("a").value)] += 1
default:
break // ignore all other characters
}
}
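The same loop, wrapped in a function so it can be reused and checked. The function name letterFrequencies(in:) is an assumption made for this sketch.

```swift
// Counts occurrences of the scalars "a"..."z"; index 0 is "a", index 25 is "z".
func letterFrequencies(in text: String) -> [Int] {
    var frequencies = [Int](repeating: 0, count: 26)
    let base = ("a" as Unicode.Scalar).value
    for scalar in text.unicodeScalars {
        switch scalar {
        case "a"..."z":
            frequencies[Int(scalar.value - base)] += 1
        default:
            break // ignore all other characters
        }
    }
    return frequencies
}

let counts = letterFrequencies(in: "banana")
// counts[0] is the number of "a"s, counts[1] the number of "b"s, and so on.
```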
charArray[(s[i].asciiValue)! - ('a'.asciiValue)!]++
As the error says, use double quotes. Swift has no syntax that differentiates between character and string literals (a character may itself be a sequence of several bytes in Swift).
You may need to explicitly construct a Character if the compiler can't infer the type.
charArray[(s[i].asciiValue)! - (Character("a").asciiValue)!]++

Why is a non-optional value printed as optional?

I've read Non-optional shown as optional on print but that doesn't help my question.
I'm returning an Int, but when I print it, it is shown as an optional. Why?
I'm trying to solve a code challenge. The goal is to:
Write an extension for collections of integers that returns the number
of times a specific digit appears in any of its numbers.
Here is my implementation:
extension Collection where Iterator.Element == Int {
func challenge37(count character : Character) -> Int?{
guard nil != Int(String(character)) else{
print("character wasn't an integer")
return nil
}
var counts : [Int] = []
for item in self{
var counter = 0
let stringInt = String(describing: item)
for currentCharacter in stringInt.characters{
if character == currentCharacter{
counter += 1
}
}
counts.append(counter)
}
guard let min = counts.min() else{
print("no min")
return nil
}
return min
}
}
As you can see here I'm printing it:
print([5,15,512,522].challenge37(count: "5")) // Optional(1)
Inside the function you're returning an Int. However, the actual return type in your method's signature is Int?, meaning the result is in fact an optional.
Basically your method signature is correct. But when you call the function you're getting an optional as the response and must unwrap it.
print([5,15,512,522].challenge37(count: "5")!) // 1
Additionally, had you paid close attention, you would have noticed that Xcode must have given you a warning (and suggestions for fixing it):
Expression implicitly coerced from Int? to Any
Xcode gave you the warning because it found out that you're attempting to print an optional and knows that's usually unwanted. Obviously its solution is to unwrap it either through force unwrap or defaulting.
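A minimal sketch of the unwrapping options mentioned above, without a force unwrap. The helper describeCount(_:) is hypothetical and only serves the illustration.

```swift
// Hypothetical helper: turns an optional count into text via optional binding.
func describeCount(_ count: Int?) -> String {
    if let count = count {
        return "count: \(count)"
    }
    return "no count"
}

let smallest: Int? = [5, 15, 512, 522].min()    // min() returns an optional
let printedAsIs = String(describing: smallest)  // "Optional(5)"
let unwrapped = describeCount(smallest)         // "count: 5"
let withDefault = smallest ?? 0                 // 5, falls back to 0 for nil
```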

Idiomatic way to unwrap an integer string input

Sorry if this is a basic question, but I am learning Swift and I don’t understand how to unwrap inputs from readLine().
For example, I would expect this
let n: Int = Int(readLine(strippingNewline: true) ?? -1)
to work, but it doesn’t. Nor does replacing the -1 with a "-1" to match types.
let n: Int = Int(readLine(strippingNewline: true) ?? "-1")
What is the “right” way to do this then? Can someone explain exactly what Swift is doing when it unwraps optionals and uses them as arguments for a constructor like Int?
The whole concept of optionals is a bit foreign to me (a Python programmer); in Python you handle invalid input the “ghetto way”, putting out fires only after they happen:
try:
n = int(input())
except ValueError:
n = None
but I assume the paradigm in Swift is different.
There are two optionals at play here.
First, readLine(strippingNewline: true) is optional. It can return nil if there's no input received prior to the End of File (EOF) character being received. It must be unwrapped before being passed into Int()
Secondly, Int() is optional, because the String it was given may not be a valid string representation of a number.
Do not use -1 in Swift to represent "no value". This is called a sentinel value, and it's exactly what optionals are invented to prevent. How do you distinguish between a -1 meaning "no/invalid input", and a -1 meaning "the user's input was -1"?
Here is how I would write this code:
guard let userInput = readLine(strippingNewline: true) else {
// If we got to here, readLine(strippingNewline:) returned nil
fatalError("Received EOF before any input was given")
}
// If we got to here, then userInput is not nil
if let n = Int(userInput) {
// If we got to here, then userInput contained a valid
// String representation of an Int
print("The user entered the Int \(n)")
}
else {
// If we got to here, then userInput did not contain a
// valid String representation of an Int.
print("That is not a valid Int.")
}
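If both failure cases can be treated the same way, the two optionals could also be collapsed into one. This is only a sketch; parseInt(_:) is a hypothetical helper, not a standard library function.

```swift
// Hypothetical helper: parses one line of input, passing nil through.
func parseInt(_ line: String?) -> Int? {
    // flatMap collapses the two layers (String? and Int?) into a single Int?.
    return line.flatMap { Int($0) }
}

// Typical call site (reads from standard input):
//     let n = parseInt(readLine())
let fromDigits = parseInt("42")
let fromGarbage = parseInt("abc")
let fromEOF = parseInt(nil)
```

The result is nil both for EOF and for non-numeric input, so use the guard/if let version above when the two cases need different handling.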

Guard Statement Parameter Error [duplicate]

How do you get the length of a String? For example, I have a variable defined like:
var test1: String = "Scott"
However, I can't seem to find a length method on the string.
As of Swift 4+
It's just:
test1.count
for reasons.
(Thanks to Martin R)
As of Swift 2:
With Swift 2, Apple has changed global functions to protocol extensions, extensions that match any type conforming to a protocol. Thus the new syntax is:
test1.characters.count
(Thanks to JohnDifool for the heads up)
As of Swift 1
Use the count characters method:
let unusualMenagerie = "Koala 🐨, Snail 🐌, Penguin 🐧, Dromedary 🐪"
println("unusualMenagerie has \(count(unusualMenagerie)) characters")
// prints "unusualMenagerie has 40 characters"
right from the Apple Swift Guide
(note, for versions of Swift earlier than 1.2, this would be countElements(unusualMenagerie) instead)
for your variable, it would be
length = count(test1) // was countElements in earlier versions of Swift
Or you can use test1.utf16count
TLDR:
For Swift 2.0 and 3.0, use test1.characters.count. But, there are a few things you should know. So, read on.
Counting characters in Swift
Before Swift 2.0, count was a global function. As of Swift 2.0, it can be called as a member function.
test1.characters.count
It will return the actual number of Unicode characters in a String, so it's the most correct alternative in the sense that, if you'd print the string and count characters by hand, you'd get the same result.
However, because of the way Strings are implemented in Swift, characters don't always take up the same amount of memory, so be aware that this behaves quite differently than the usual character count methods in other languages.
For example, you can also use test1.utf16.count
But, as noted below, the returned value is not guaranteed to be the same as that of calling count on characters.
From the language reference:
Extended grapheme clusters can be composed of one or more Unicode
scalars. This means that different characters—and different
representations of the same character—can require different amounts of
memory to store. Because of this, characters in Swift do not each take
up the same amount of memory within a string’s representation. As a
result, the number of characters in a string cannot be calculated
without iterating through the string to determine its extended
grapheme cluster boundaries. If you are working with particularly long
string values, be aware that the characters property must iterate over
the Unicode scalars in the entire string in order to determine the
characters for that string.
The count of the characters returned by the characters property is not
always the same as the length property of an NSString that contains
the same characters. The length of an NSString is based on the number
of 16-bit code units within the string’s UTF-16 representation and not
the number of Unicode extended grapheme clusters within the string.
An example that perfectly illustrates the situation described above is that of checking the length of a string containing a single emoji character, as pointed out by n00neimp0rtant in the comments.
var emoji = "👍"
emoji.characters.count //returns 1
emoji.utf16.count //returns 2
Swift 1.2 Update: There's no longer a countElements for counting the size of collections. Just use the count function as a replacement: count("Swift")
Swift 2.0, 3.0 and 3.1:
let strLength = string.characters.count
Swift 4.2 (4.0 onwards): [Apple Documentation - Strings]
let strLength = string.count
Swift 1.1
extension String {
var length: Int { return countElements(self) } //
}
Swift 1.2
extension String {
var length: Int { return count(self) } //
}
Swift 2.0
extension String {
var length: Int { return characters.count } //
}
Swift 4.2
extension String {
var length: Int { return self.count }
}
let str = "Hello"
let count = str.length // returns 5 (Int)
Swift 4
"string".count
;)
Swift 3
extension String {
var length: Int {
return self.characters.count
}
}
usage
"string".length
If you are just trying to see if a string is empty or not (checking for length of 0), Swift offers a simple boolean test method on String
myString.isEmpty
The other side of this coin was people asking how to check whether a string is empty in Objective-C, where the answer was to check for a length of 0:
NSString is empty
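A short sketch of the two checks side by side. Both give the same answer, but isEmpty does not need to walk the string the way count does; the variable names are only for illustration.

```swift
let empty = ""
let blank = " "

let emptyByFlag = empty.isEmpty        // true
let emptyByCount = empty.count == 0    // true, but has to count characters
let blankIsEmpty = blank.isEmpty       // false: a space is still a character
```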
Swift 5.1, 5
let flag = "🇵🇷"
print(flag.count)
// Prints "1" -- Counts the characters and emoji as length 1
print(flag.unicodeScalars.count)
// Prints "2" -- Counts the Unicode scalars; the flag is made of two regional indicator symbols
print(flag.utf16.count)
// Prints "4"
print(flag.utf8.count)
// Prints "8"
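To compare the four views for arbitrary strings, they could be gathered in one place. The helper lengths(of:) is a made-up name for this sketch and assumes Swift 5.

```swift
// Hypothetical helper that gathers the four common "lengths" of a string.
func lengths(of text: String) -> (characters: Int, scalars: Int, utf16: Int, utf8: Int) {
    return (text.count, text.unicodeScalars.count, text.utf16.count, text.utf8.count)
}

let ascii = lengths(of: "Scott")                   // all four are 5
let thumbsUp = lengths(of: "\u{1F44D}")            // 👍: 1, 1, 2, 4
let flagPR = lengths(of: "\u{1F1F5}\u{1F1F7}")     // 🇵🇷: 1, 2, 4, 8
```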
tl;dr If you want the length of a String in terms of the number of human-readable characters (extended grapheme clusters), use countElements(). endIndex can give a different number, because it reflects a position in the underlying storage rather than a count of characters. Read on for details.
The String type is implemented as an ordered collection (i.e., sequence) of Unicode characters, and it conforms to the CollectionType protocol, which conforms to the _CollectionType protocol, which is the input type expected by countElements(). Therefore, countElements() can be called, passing a String type, and it will return the count of characters.
However, in conforming to CollectionType, which in turn conforms to _CollectionType, String also implements the startIndex and endIndex computed properties, which actually represent the position of the index before the first character cluster, and position of the index after the last character cluster, respectively. So, in the string "ABC", the position of the index before A is 0 and after C is 3. Therefore, endIndex = 3, which is also the length of the string.
So, endIndex can be used to get the length of any String type, then, right?
Well, not always...Unicode characters are actually extended grapheme clusters, which are sequences of one or more Unicode scalars combined to create a single human-readable character.
let circledStar: Character = "\u{2606}\u{20DD}" // ☆⃝
circledStar is a single character made up of U+2606 (a white star), and U+20DD (a combining enclosing circle). Let's create a String from circledStar and compare the results of countElements() and endIndex.
let circledStarString = "\(circledStar)"
countElements(circledStarString) // 1
circledStarString.endIndex // 2
In Swift 2.0 count doesn't work anymore. You can use this instead:
var testString = "Scott"
var length = testString.characters.count
Here's something shorter, and more natural than using a global function:
aString.utf16count
I don't know if it's available in beta 1, though. But it's definitely there in beta 2.
Updated for Xcode 6 beta 4, change method utf16count --> utf16Count
var test1: String = "Scott"
var length = test1.utf16Count
Or
var test1: String = "Scott"
var length = test1.lengthOfBytesUsingEncoding(NSUTF16StringEncoding)
As of Swift 1.2 utf16Count has been removed. You should now use the global count() function and pass the UTF16 view of the string. Example below...
let string = "Some string"
count(string.utf16)
For Xcode 7.3 and Swift 2.2.
let str = "🐶"
If you want the number of visual characters:
str.characters.count
If you want the "16-bit code units within the string’s UTF-16 representation":
str.utf16.count
Most of the time, characters.count is what you need.
When would you need utf16.count? I've found a use case for it:
let regex = try! NSRegularExpression(pattern:"🐶",
options: NSRegularExpressionOptions.UseUnixLineSeparators)
let str = "🐶🐶🐶🐶🐶🐶"
let result = regex.stringByReplacingMatchesInString(str,
options: NSMatchingOptions.WithTransparentBounds,
range: NSMakeRange(0, str.utf16.count), withTemplate: "dog")
print(result) // dogdogdogdogdogdog
If you use characters.count for the range instead, the result is incorrect:
let result = regex.stringByReplacingMatchesInString(str,
options: NSMatchingOptions.WithTransparentBounds,
range: NSMakeRange(0, str.characters.count), withTemplate: "dog")
print(result) // dogdogdog🐶🐶🐶
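In Swift 4 and later, the UTF-16 length does not have to be computed by hand: Foundation offers an NSRange initializer that converts a range of String.Index values. The following is a sketch using the renamed Swift 4+ API, with variable names chosen for illustration.

```swift
import Foundation

let dogRegex = try! NSRegularExpression(pattern: "🐶", options: [])
let dogs = "🐶🐶🐶🐶🐶🐶"

// NSRange(_:in:) measures the range in UTF-16 code units, which is what
// NSRegularExpression expects.
let fullRange = NSRange(dogs.startIndex..<dogs.endIndex, in: dogs)
let replaced = dogRegex.stringByReplacingMatches(in: dogs,
                                                 options: [],
                                                 range: fullRange,
                                                 withTemplate: "dog")
```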
You could try like this
var test1: String = "Scott"
var length = test1.bridgeToObjectiveC().length
in Swift 2.x the following is how to find the length of a string
let findLength = "This is a string of text"
findLength.characters.count
returns 24
Swift 2.0:
Get a count: yourString.text.characters.count
Fun example of how this is useful would be to show a character countdown from some number (150 for example) in a UITextView:
func textViewDidChange(textView: UITextView) {
yourStringLabel.text = String(150 - yourStringTextView.text.characters.count)
}
In Swift 4 I had always used string.count, until today I found that
string.endIndex.encodedOffset
is a faster substitute: for a string of 50,000 characters it is about 6 times faster than .count. The cost of .count depends on the string length, but that of .endIndex.encodedOffset doesn't.
But there is one catch: it is not suitable for strings with emoji, where it gives the wrong result, so only .count is correct in general.
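A sketch of that mismatch, assuming Swift 5, where encodedOffset is deprecated in favor of utf16Offset(in:). The offset of endIndex counts UTF-16 code units, not characters.

```swift
let plain = "hello"
let thumbs = "\u{1F44D}" // 👍

// For ASCII-only text, the offset happens to equal the character count.
let plainOffset = plain.endIndex.utf16Offset(in: plain)     // 5
// For the emoji, the offset is the number of UTF-16 code units.
let thumbsOffset = thumbs.endIndex.utf16Offset(in: thumbs)  // 2
let thumbsCount = thumbs.count                              // 1
```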
In Swift 4 :
If the string contains only ASCII characters, count is all you need:
let str : String = "abcd"
let count = str.count // output 4
If the string contains non-ASCII characters, the character count and the UTF-8 byte count differ:
let spain = "España"
let count1 = spain.count // output 6
let count2 = spain.utf8.count // output 7
In Xcode 6.1.1
extension String {
var length : Int { return self.utf16Count }
}
I think that brainiacs will change this on every minor version.
Get string value from your textview or textfield:
let textlengthstring = (yourtextview?.text)! as String
Find the count of the characters in the string:
let numberOfChars = textlengthstring.characters.count
Here is what I ended up doing
let replacementTextAsDecimal = Double(string)
if string.characters.count > 0 &&
replacementTextAsDecimal == nil &&
replacementTextHasDecimalSeparator == nil {
return false
}
Swift 4 update comparing with swift 3
Swift 4 removes the need to go through the characters view of a String. This means that you can call count directly on a string.
"hello".count // 5
In Swift 3, by contrast, you have to get the characters view first and then count its elements. Note that this is still available in Swift 4.0, where you can still call characters on a string:
"hello".characters.count // 5
Swift 4.0 also adopts Unicode 9 and can now interpret grapheme clusters correctly. For example, counting an emoji will give you 1, while in Swift 3.0 you may get counts greater than 1.
"👍🏽".count // Swift 4.0 prints 1, Swift 3.0 prints 2
"👨‍❤️‍💋‍👨".count // Swift 4.0 prints 1, Swift 3.0 prints 4
Swift 4
let str = "Your name"
str.count
Remember: Space is also counted in the number
You can get the length simply by writing an extension:
extension String {
// MARK: Use if it's Swift 2
func stringLength(str: String) -> Int {
return str.characters.count
}
// MARK: Use if it's Swift 3
func stringLength(_ str: String) -> Int {
return str.characters.count
}
// MARK: Use if it's Swift 4
func stringLength(_ str: String) -> Int {
return str.count
}
}
Best way to count String in Swift is this:
var str = "Hello World"
var length = count(str.utf16)
String and NSString are toll-free bridged, so you can use all methods available to NSString with a Swift String. Note that NSString exposes length, not count:
let x = "test" as NSString
let y : NSString = "string 2"
let lenx = x.length
let leny = y.length
test1.characters.count
will get you the number of letters/numbers etc in your string.
ex:
test1 = "StackOverflow"
print(test1.characters.count)
(prints "13")
Apple made it different from other major languages. The current way is to call:
test1.characters.count
However, be careful: this length is the count of characters, not the count of bytes, and the two can differ when you use non-ASCII characters.
For example;
"你好啊hi".characters.count will give you 5 but this is not the count of the bytes.
To get the real count of bytes, you need to do "你好啊hi".lengthOfBytes(using: String.Encoding.utf8). This will give you 11.
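In Swift 4 and later, the same numbers can be read from the string's views without Foundation. A small sketch using the string from the example above:

```swift
let mixed = "你好啊hi"

let characterCount = mixed.count            // 5 characters
let utf8ByteCount = mixed.utf8.count        // 11 bytes: 3 per Chinese character, 1 per ASCII letter
let utf16ByteCount = mixed.utf16.count * 2  // 10 bytes: every character here is one 16-bit unit
```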
Right now (in Swift 2.3) if you use:
myString.characters.count
the method will return a "Distance" type, if you need the method to return an Integer you should type cast like so:
var count = myString.characters.count as Int
my two cents for swift 3/4
If you need to conditionally compile:
#if swift(>=4.0)
let len = text.count
#else
let len = text.characters.count
#endif