How to determine the display count of a Swift String?


I've reviewed questions such as "Get the length of a String" and "Why are emoji characters like 👩‍👩‍👧‍👦 treated so strangely in Swift strings?", but neither covers this specific question.
This all started when trying to apply skin tone modifiers to Emoji characters (see Add skin tone modifier to an emoji programmatically). This led to wondering what happens when you apply a skin tone modifier to a regular character such as "A".
Examples:
let tonedThumbsUp = "👍" + "🏻" // 👍🏻
let tonedA = "A" + "🏾" // A🏾
I'm trying to detect that second case. The count of both of those strings is 1. And the unicodeScalars.count is 2 for both.
How do I determine if the resulting string appears as a single character when displayed? In other words, how can I determine if the skin tone modifier was applied to make a single character or not?
I've tried a few ways to dump information about the string but none give the desired result.
func dumpString(_ str: String) {
    print("Raw:", str, str.count)
    print("Scalars:", str.unicodeScalars, str.unicodeScalars.count)
    print("UTF16:", str.utf16, str.utf16.count)
    print("UTF8:", str.utf8, str.utf8.count)
    print("Range:", str.startIndex, str.endIndex)
    print("First/Last:", str.first == str.last, str.first as Any, str.last as Any)
}
dumpString("A🏽")
dumpString("\u{1f469}\u{1f3fe}")
Results:
Raw: A🏽 1
Scalars: A🏽 2
UTF16: A🏽 3
UTF8: A🏽 5
First/Last: true Optional("A🏽") Optional("A🏽")
Raw: 👩🏾 1
Scalars: 👩🏾 2
UTF16: 👩🏾 4
UTF8: 👩🏾 8
First/Last: true Optional("👩🏾") Optional("👩🏾")

What happens if you print 👍🏻 on a system that doesn't support the Fitzpatrick modifiers? You get 👍 followed by whatever the system uses for an unknown character placeholder.
So I think to answer this, you must consult your system's typesetter. For Apple platforms, you can use Core Text to create a CTLine and then count the line's glyph runs. Example:
import Foundation
import CoreText
// Lays the string out as a single Core Text line and prints how many glyph runs the typesetter produced.
func test(_ string: String) {
    let richText = NSAttributedString(string: string)
    let line = CTLineCreateWithAttributedString(richText as CFAttributedString)
    let runs = CTLineGetGlyphRuns(line) as! [CTRun]
    print(string, runs.count)
}
test("👍" + "🏻")
test("A" + "🏾")
test("B\u{0300}\u{0301}\u{0302}" + "🏾")
Output from a macOS playground in Xcode 10.2.1 on macOS 10.14.6 Beta (18G48f):
👍🏻 1
A🏾 2
B̀́̂🏾 2

I think it might be possible to reason about this by looking to see whether the modifier is present and if so whether it has increased the character count.
So for example:
let tonedThumbsUp = "👍" + "🏻"
let tonedA = "A" + "🏻"
tonedThumbsUp.count // 1
tonedThumbsUp.unicodeScalars.count // 2
tonedA.count // 2
tonedA.unicodeScalars.count // 2
let c = "\u{1F3FB}"
tonedThumbsUp.contains(c) // true
tonedA.contains(c) // true
Okay, so they both contain a modifier character and they both contain two Unicode scalars, but one has a count of 1 and the other a count of 2. Surely that's a useful distinction.
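Because the count itself depends on which Unicode version your Swift toolchain implements (the question above sees a count of 1 for "A🏾"), here is a minimal sketch that swaps in Unicode character properties instead. Per Unicode TR51, a skin tone modifier is only meant to combine with a directly preceding scalar that has the Emoji_Modifier_Base property. It assumes Swift 5 or later for `Unicode.Scalar.Properties.isEmojiModifier` and `isEmojiModifierBase` (which may carry OS availability requirements on Apple platforms), and `hasOnlyAttachedModifiers` is a hypothetical name. It cannot tell you whether the installed font actually draws the combination as one glyph; only the typesetter knows that.

```swift
/// Hypothetical helper: returns false if any emoji modifier scalar (U+1F3FB...U+1F3FF)
/// does not directly follow a scalar that Unicode lists as an Emoji_Modifier_Base.
func hasOnlyAttachedModifiers(_ string: String) -> Bool {
    var previous: Unicode.Scalar? = nil
    for scalar in string.unicodeScalars {
        if scalar.properties.isEmojiModifier {
            // The modifier must come right after a base that accepts it.
            guard let base = previous, base.properties.isEmojiModifierBase else {
                return false
            }
        }
        previous = scalar
    }
    return true
}

print(hasOnlyAttachedModifiers("👍" + "🏻")) // true
print(hasOnlyAttachedModifiers("A" + "🏾"))  // false
```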

Related

Detecting Cursor position in UITextView that contains emojis returns the wrong position in swift 4

I'm using this code to detect the cursor's position in a UITextView:
if let selectedRange = textView.selectedTextRange {
    let cursorPosition = textView.offset(from: textView.beginningOfDocument, to: selectedRange.start)
    print("\(cursorPosition)")
}
I put this in the textViewDidChange func so that the cursor position is detected each time the text changes.
It works fine, but when I type emojis, textView.text.count differs from the cursor position. Since Swift 4 each emoji is counted as one character, but the cursor position seems to count them differently.
So how can I get the exact cursor position, matching the count of characters in the text?

Long story short: when using a Swift String together with NSRange, use this extension for the conversion:
import Foundation
extension String {
    /// The NSRange covering the whole string, measured in UTF-16 code units as NSRange expects.
    var range: NSRange {
        return NSRange(startIndex..<endIndex, in: self)
    }
}
Let's take a deeper look:
let myStr = "Wéll helló ⚙️"
myStr.count // 12
myStr.unicodeScalars.count // 13
myStr.utf8.count // 19
myStr.utf16.count // 13
In Swift 4, String is a collection of Characters (a composite character such as ö, or an emoji, counts as one Character). The UTF-8 and UTF-16 views are collections of UTF-8 and UTF-16 code units, respectively.
Your problem is that textView.text.count counts Characters (an emoji or a composite character counts as one element), while NSRange counts UTF-16 code units. The snippet above illustrates the difference.
More here:
Strings And Characters
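Assuming, as described above, that the cursor offset is measured in UTF-16 code units, it can be mapped back to a Character count. A minimal sketch, assuming the offset lands on a Character boundary (a cursor normally does) and Swift 5 or later for `String.Index(utf16Offset:in:)`; `characterCount(in:beforeUTF16Offset:)` is a hypothetical helper name.

```swift
/// Hypothetical helper: converts an offset in UTF-16 code units (the unit NSRange uses)
/// into the number of Characters that precede it. Returns nil when the offset is out of
/// range. Assumes the offset falls on a Character boundary.
func characterCount(in text: String, beforeUTF16Offset offset: Int) -> Int? {
    guard offset >= 0, offset <= text.utf16.count else { return nil }
    let index = String.Index(utf16Offset: offset, in: text)
    return text.distance(from: text.startIndex, to: index)
}

// "Wéll helló ⚙️" from the answer, with é and ó written as precomposed scalars.
let myStr = "W\u{E9}ll hell\u{F3} \u{2699}\u{FE0F}"
print(myStr.utf16.count)                                      // 13
print(characterCount(in: myStr, beforeUTF16Offset: 13) ?? -1) // 12
```

In the question's `textViewDidChange`, you would pass `textView.text` and the `cursorPosition` value.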

UTF8 String length and indices in Go vs Swift

I have apps in Go and Swift which process strings, such as finding substrings and their indices. At first it worked nicely, even with multi-byte characters (e.g. emojis), using Go's utf8.RuneCountInString() and Swift's native String.
But there are some UTF8 characters that break the string length and indices for substrings, e.g. a string "Lorem 😂😃✌️🤔 ipsum":
Go's utf8.RuneCountInString("Lorem 😂😃✌️🤔 ipsum") returns 17 and the start index of ipsum is 12.
Swift's "Lorem 😂😃✌️🤔 ipsum".count returns 16 and the start index of ipsum is 11.
Using Swift String's utf8 or utf16 views, or casting to NSString, also gives different lengths and indices. There are also emojis composed of multiple other emojis, like 👨‍👩‍👧‍👦, which give even funnier numbers.
This is with Go 1.8 and Swift 4.1.
Is there any way to get the same string lengths and substrings' indices with same values with Go and Swift?
EDIT
I created a Swift String extension based on @MartinR's great answer:
import Foundation
extension String {
    /// Converts a range given in Unicode scalar offsets (Go runes) into an NSRange.
    func runesRangeToNSRange(from: Int, to: Int) -> NSRange {
        let length = to - from
        let start = unicodeScalars.index(unicodeScalars.startIndex, offsetBy: from)
        let end = unicodeScalars.index(start, offsetBy: length)
        let range = start..<end
        return NSRange(range, in: self)
    }
}

In Swift a Character is an “extended grapheme cluster,” and each of "😂", "😃", "✌️", "🤔", "👨‍👩‍👧‍👦" counts as a single character.
I have no experience with Go, but as I understand it from Strings, bytes, runes and characters in Go,
a “rune” is a Unicode code point, which essentially corresponds to a UnicodeScalar in Swift.
In your example, the difference comes from "✌️" which
counts as a single Swift character, but is built from two Unicode scalars:
print("✌️".count) // 1
print("✌️".unicodeScalars.count) // 2
Here is an example how you can compute the length and offsets in
terms of Unicode scalars:
let s = "Lorem 😂😃✌️🤔 ipsum"
print(s.unicodeScalars.count) // 17
if let idx = s.range(of: "ipsum") {
print(s.unicodeScalars.distance(from: s.startIndex, to: idx.lowerBound)) // 12
}
As you can see, this gives the same numbers as in your example from Go.
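To see which Swift view matches which unit on the other side, here is a small sketch that measures the same prefix (everything before "ipsum") in each view. It relies on the fact that Go's `utf8.RuneCountInString` counts code points (Unicode scalars), while Go's `len` and `strings.Index` work in UTF-8 bytes; NSString and NSRange use UTF-16 code units.

```swift
// "Lorem 😂😃✌️🤔 ipsum", written with escapes so every scalar is explicit.
let s = "Lorem \u{1F602}\u{1F603}\u{270C}\u{FE0F}\u{1F914} ipsum"

// "ipsum" is the last five Characters, so step back five from the end.
let ipsumStart = s.index(s.endIndex, offsetBy: -5)

let characters = s.distance(from: s.startIndex, to: ipsumStart)             // Swift Characters
let scalars = s.unicodeScalars.distance(from: s.startIndex, to: ipsumStart) // Go runes
let utf16Units = s.utf16.distance(from: s.startIndex, to: ipsumStart)       // NSString / NSRange
let utf8Bytes = s.utf8.distance(from: s.startIndex, to: ipsumStart)         // Go byte offsets

print(characters, scalars, utf16Units, utf8Bytes) // 11 12 15 25
```

Whichever unit the Go side uses, pick the matching Swift view and keep all offsets in that unit.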

A rune in Go is a Unicode code point (an int32); that does not necessarily map 1:1 to visually distinct characters. Some characters are made up of multiple runes/code points, so counting runes may not give you what you'd expect from a visual inspection of the string. I don't know what "some text".count actually counts in Swift, so I can't offer any comparison there.

Why two flags only form 1 character? [duplicate]

let str1 = "🇩🇪🇩🇪🇩🇪🇩🇪🇩🇪"
let str2 = "🇩🇪.🇩🇪.🇩🇪.🇩🇪.🇩🇪."
println("\(countElements(str1)), \(countElements(str2))")
Result: 1, 10
But shouldn't str1 have 5 elements?
The bug seems to occur only when I use flag emoji.

Update for Swift 4 (Xcode 9)
As of Swift 4 (tested with Xcode 9 beta) grapheme clusters break after every second regional indicator symbol, as mandated by the Unicode 9
standard:
let str1 = "🇩🇪🇩🇪🇩🇪🇩🇪🇩🇪"
print(str1.count) // 5
print(Array(str1)) // ["🇩🇪", "🇩🇪", "🇩🇪", "🇩🇪", "🇩🇪"]
Also String is a collection of its characters (again), so one can
obtain the character count with str1.count.
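A quick check of this Swift 4+ behavior: each flag is a pair of regional indicator scalars, so the Character count comes out as half the scalar count.

```swift
let germany = "\u{1F1E9}\u{1F1EA}" // 🇩🇪 = REGIONAL INDICATOR SYMBOL LETTER D + E
let str1 = String(repeating: germany, count: 5)

print(str1.count)                // 5 Characters (flags)
print(str1.unicodeScalars.count) // 10 regional indicator scalars
```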
(Old answer for Swift 3 and older:)
From "3 Grapheme Cluster Boundaries"
in the "Standard Annex #29 UNICODE TEXT SEGMENTATION":
(emphasis added):
A legacy grapheme cluster is defined as a base (such as A or カ)
followed by zero or more continuing characters. One way to think of
this is as a sequence of characters that form a “stack”.
The base can be single characters, or be any sequence of Hangul Jamo
characters that form a Hangul Syllable, as defined by D133 in The
Unicode Standard, or be any sequence of Regional_Indicator (RI) characters. The RI characters are used in pairs to denote Emoji
national flag symbols corresponding to ISO country codes. Sequences of
more than two RI characters should be separated by other characters,
such as U+200B ZWSP.
(Thanks to @rintaro for the link).
A Swift Character represents an extended grapheme cluster, so it is (according
to this reference) correct that any sequence of regional indicator symbols
is counted as a single character.
You can separate the "flags" by a ZERO WIDTH NON-JOINER:
let str1 = "🇩🇪\u{200C}🇩🇪"
print(str1.characters.count) // 2
or insert a ZERO WIDTH SPACE:
let str2 = "🇩🇪\u{200B}🇩🇪"
print(str2.characters.count) // 3
This also resolves possible ambiguities: for example, should the regional indicators F, R, U, S in "🇫🇷🇺🇸" be read as 🇫🇷 + 🇺🇸, or as F + 🇷🇺 + S?
See also How to know if two emojis will be displayed as one emoji? about a possible method
to count the number of "composed characters" in a Swift string,
which would return 5 for your let str1 = "🇩🇪🇩🇪🇩🇪🇩🇪🇩🇪".
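In Swift 4 and later, the separators from the old answer still work, with `count` in place of `characters.count`. The sketch below also shows how current Swift resolves the "🇫🇷🇺🇸" ambiguity without any separator: regional indicators are paired from the start of the run, giving France followed by the United States.

```swift
let germany = "\u{1F1E9}\u{1F1EA}" // 🇩🇪

// ZERO WIDTH NON-JOINER attaches to the first flag, so there are two Characters.
print((germany + "\u{200C}" + germany).count) // 2
// ZERO WIDTH SPACE is a Character of its own, so there are three.
print((germany + "\u{200B}" + germany).count) // 3

// Regional indicators F, R, U, S with no separator are paired left to right.
let frus = "\u{1F1EB}\u{1F1F7}\u{1F1FA}\u{1F1F8}"
print(frus.count) // 2, read as 🇫🇷 + 🇺🇸
```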

Here's how I solved that problem, for Swift 3:
import Foundation
let str = "🇩🇪🇩🇪🇩🇪🇩🇪🇩🇪" // or whatever the string of emojis is
let range = str.startIndex..<str.endIndex
var length = 0
// Called once per composed character sequence, as Foundation defines them.
str.enumerateSubstrings(in: range, options: .byComposedCharacterSequences) { (substring, substringRange, enclosingRange, stop) in
    length += 1
}
print("Character Count: \(length)")
This fixed the character count for flag emoji in my case, and it is the simplest method I have found.

How to get the number of real words in a text in Swift [duplicate]

This question already has answers here:
Number of words in a Swift String for word count calculation
(7 answers)
Edit: there is already a similar question, but it is about numbers separated by a specific character (Get no. Of words in swift for average calculator). This question is instead about getting the number of real words in a text, separated in various ways: a line break, several line breaks, a space, more than one space, etc.
I would like to get the number of words in a string with Swift 3.
I'm using this code, but I get an imprecise result because it counts the spaces and newlines rather than the actual number of words.
let str = "Architects and city planners,are \ndesigning buildings to create a better quality of life in our urban areas."
// 18 words, 21 spaces, 2 lines
let components = str.components(separatedBy: .whitespacesAndNewlines)
let a = components.count
print(a)
// 23 instead of 18

Consecutive spaces and newlines aren't coalesced into one generic whitespace region, so you're simply getting a bunch of empty "words" between successive whitespace characters. Get rid of this by filtering out empty strings:
let components = str.components(separatedBy: .whitespacesAndNewlines)
let words = components.filter { !$0.isEmpty }
print(words.count) // 17
The above will print 17 because you haven't included , as a separation character, so the string "planners,are" is treated as one word.
You can break that string up as well by adding punctuation characters to the set of separators like so:
let characterSet = CharacterSet.whitespacesAndNewlines.union(.punctuationCharacters)
let components = str.components(separatedBy: characterSet)
let words = components.filter { !$0.isEmpty }
print(words.count) // 18
Now you'll see a count of 18 like you expect.
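In Swift 5 and later, a similar count can be had without a separate filtering step. This sketch swaps in the standard library's `split(whereSeparator:)`, which omits empty pieces by default, together with `Character.isWhitespace` and `Character.isPunctuation`. Like the `CharacterSet` version, it would also split contractions such as "don't" at the apostrophe.

```swift
let str = "Architects and city planners,are \ndesigning buildings to create a better quality of life in our urban areas."

// Split on any whitespace or punctuation Character; empty pieces are dropped automatically.
let words = str.split(whereSeparator: { $0.isWhitespace || $0.isPunctuation })
print(words.count) // 18
```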

Display certain number of letters

I have a word that is being displayed in a label. Could I program it so that it only shows the last 2 characters of the word, or only the first 3? How can I do this?

Swift's string APIs can be a little confusing. You get access to the characters of a string via its characters property, on which you can then use prefix() or suffix() to get the substring you want. That subset of characters needs to be converted back to a String:
let str = "Hello, world!"
// first three characters:
let prefixSubstring = String(str.characters.prefix(3))
// last two characters:
let suffixSubstring = String(str.characters.suffix(2))
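Since Swift 4, `String` is itself a collection of Characters and the `characters` property is deprecated, so the same thing can be written directly on the string, still converting the resulting `Substring` back to `String`:

```swift
let str = "Hello, world!"
let firstThree = String(str.prefix(3)) // "Hel"
let lastTwo = String(str.suffix(2))    // "d!"
print(firstThree, lastTwo)             // Hel d!
```

`prefix` and `suffix` simply return the whole string when it is shorter than the requested length, so no bounds check is needed.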

I agree that String indexing in Swift is definitely confusing, and it changed a little from Swift 1 to Swift 2, which makes googling it a bit of a challenge. It can actually be quite simple once you get the hang of the methods. You basically need to make it a two-step process:
1) Find the index you need
2) Advance from there
For example:
let sampleString = "HelloWorld"
let lastThreeindex = sampleString.endIndex.advancedBy(-3)
sampleString.substringFromIndex(lastThreeindex) //prints rld
let secondIndex = sampleString.startIndex.advancedBy(2)
sampleString.substringToIndex(secondIndex) //prints He
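`advancedBy`, `substringFromIndex` and `substringToIndex` are Swift 2 APIs. Here is a sketch of the same two steps in Swift 4 and later, using `index(_:offsetBy:)` and range subscripts. Note that `index(_:offsetBy:)` traps if it runs past the string's bounds, while `index(_:offsetBy:limitedBy:)` returns nil instead.

```swift
let sampleString = "HelloWorld"

// 1) Find the index, 2) slice from there.
let lastThreeIndex = sampleString.index(sampleString.endIndex, offsetBy: -3)
let lastThree = String(sampleString[lastThreeIndex...]) // "rld"

let secondIndex = sampleString.index(sampleString.startIndex, offsetBy: 2)
let firstTwo = String(sampleString[..<secondIndex])     // "He"

// Safer variant: returns nil instead of trapping when the string is too short.
if let idx = sampleString.index(sampleString.startIndex, offsetBy: 20, limitedBy: sampleString.endIndex) {
    print(sampleString[..<idx])
} else {
    print("string is shorter than 20 characters")
}
```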
