Issues when attempting to modify an emoji sequence - swift

I have the following function that takes a string with emojis; if it's an emoji sequence a+b, it leaves a as is and changes b to a different emoji:
func changeEmoji(givenString: String) -> String {
    let emojiDictionary: [String: String] = [
        "⛹": "⛹",
        "♀️": "👩",
        "🏻": "💤",
        "♂️": "👨",
    ]
    let stringCharacters = Array(givenString.characters)
    var returnedString = String()
    for character in stringCharacters {
        if emojiDictionary[String(character)] == nil {
            return "error"
        }
        else {
            returnedString = returnedString + emojiDictionary[String(character)]!
        }
    }
    return returnedString
}
I have no problem with
changeEmoji(givenString: "⛹⛹🏻")
It outputs "⛹⛹💤",
but:
changeEmoji(givenString: "⛹⛹🏻⛹🏻‍♀️")
outputs "error", while it shouldn't, as ♀ (FEMALE SIGN) plus VARIATION SELECTOR-16 is the second key in my emojiDictionary.
A similar issue appears with the male sign and variation selector.
Any ideas why this is happening?

The problem is that "⛹🏻‍♀️" is made up of 3 Swift Characters (aka extended grapheme clusters):
"⛹" (U+26F9 PERSON WITH BALL)
"🏻‍" (U+1F3FB EMOJI MODIFIER FITZPATRICK TYPE-1-2, U+200D ZERO WIDTH JOINER)
"♀️" (U+2640 FEMALE SIGN, U+FE0F VARIATION SELECTOR-16)
However, your emojiDictionary only contains a "🏻" (U+1F3FB EMOJI MODIFIER FITZPATRICK TYPE-1-2) key, which doesn't match the second Character of "⛹🏻‍♀️", as it's missing the zero-width joiner.
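You can verify this breakdown for yourself by printing the scalars of each Character in the string:
for character in "⛹🏻‍♀️" {
    // Show each extended grapheme cluster together with its scalar values.
    let scalars = character.unicodeScalars
        .map { "U+" + String($0.value, radix: 16, uppercase: true) }
        .joined(separator: " ")
    print("\(character) -> \(scalars)")
}
// ⛹ -> U+26F9
// 🏻‍ -> U+1F3FB U+200D
// ♀️ -> U+2640 U+FE0F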
The simplest solution therefore is to just add another key to your dictionary for the Emoji Modifier Fitzpatrick Type-1-2 character with a zero-width joiner suffix. The clearest way of doing this is to suffix it with the Unicode escape sequence \u{200D}.
For example:
func changeEmoji(givenString: String) -> String? {
    // I have included the unicode point breakdowns for clarity.
    let emojiDictionary: [String: String] = [
        "⛹": "⛹",          // 26f9 : 26f9
        "♀️": "👩",         // 2640, fe0f : 1f469
        "🏻": "💤",         // 1f3fb : 1f4a4
        "🏻\u{200D}": "💤", // 1f3fb, 200d : 1f4a4
        "♂️": "👨"          // 2642, fe0f : 1f468
    ]
    // Convert the string to an array of single-Character strings,
    // given that you're just going to use the String(_:) initialiser later.
    let stringCharacters = givenString.map { String($0) }
    var returnedString = ""
    for character in stringCharacters {
        guard let replacementCharacter = emojiDictionary[character] else {
            // I would advise making your method return an optional
            // in cases where the string can't be converted.
            // Failure is shown by the return of nil, rather than some
            // string sentinel.
            return nil
        }
        returnedString += replacementCharacter
    }
    return returnedString
}
print(changeEmoji(givenString: "⛹⛹🏻⛹🏻‍♀️")) // Optional("⛹⛹💤⛹💤👩")

Related

Swift. How to get the previous character?

For example: I have the character "b" and I want to get "a", so "a" is the previous character.
let b: Character = "b"
let a: Character = b - 1 // Compilation error
It's actually pretty complicated to get the previous character from Swift's Character type, because a Character is composed of one or more Unicode.Scalar values. Depending on your needs, you could restrict your efforts to just the ASCII characters, or you could support all characters composed of a single Unicode scalar. Once you get into characters composed of multiple Unicode scalars (such as the flag emojis or the various skin-toned emojis), I'm not even sure what "previous character" means.
Here is a pair of methods, added in a Character extension, that handle ASCII and single-Unicode-scalar characters.
extension Character {
    var previousASCII: Character? {
        if let ascii = asciiValue, ascii > 0 {
            return Character(Unicode.Scalar(ascii - 1))
        }
        return nil
    }
    var previousScalar: Character? {
        if unicodeScalars.count == 1 {
            if let scalar = unicodeScalars.first, scalar.value > 0 {
                if let prev = Unicode.Scalar(scalar.value - 1) {
                    return Character(prev)
                }
            }
        }
        return nil
    }
}
Examples:
let b: Character = "b"
let a = b.previousASCII // Gives Optional("a")
let emoji: Character = "😆"
let previous = emoji.previousScalar // Gives Optional("😅")
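Note that previousScalar deliberately returns nil for any character made of more than one scalar, for example:
let flag: Character = "🇺🇸" // two scalars: U+1F1FA U+1F1F8
print(flag.previousScalar as Any) // nil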

How do I compare a string element to a unicode character in Swift?

I'm writing code that takes in characters from the International Phonetic Alphabet and models them as phonemes for phonological analysis. I need to be able to compare parts of a symbol (diacritics) to certain Unicode characters. This is what I'm currently doing (which isn't working):
let diacritics: [String: String] = [
    // ...
    "\u{2B0}": "aspirated",
    // ...
]
let elementsInSample = Array(sample)
for element in elementsInSample {
    if diacritics.keys.contains(String(element)) {
        // Do things
    }
}
.contains returns false for ʰ when the key is written as a Unicode escape sequence. How do I rearrange the types so that the comparison is accurate?
You may want to work with Unicode scalars, rather than Characters.
let diacritics: [Unicode.Scalar: String] = [
    // ...
    "\u{2B0}": "aspirated",
    // ...
]
let sample = "ʰ"
for element in sample.unicodeScalars {
    if let value = diacritics[element] {
        print(value)
    }
}
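The scalar-based lookup also copes with combining diacritics, which never appear as standalone Characters because they fuse with the preceding letter into a single grapheme cluster. A sketch, assuming a hypothetical extra entry for creaky voice (U+0330 COMBINING TILDE BELOW):
let ipaDiacritics: [Unicode.Scalar: String] = [
    "\u{2B0}": "aspirated",
    "\u{330}": "creaky-voiced" // hypothetical extra entry
]
let creaky = "a\u{330}" // one Character, two scalars
print(Array(creaky).count) // 1 — the combining mark fuses with "a"
for scalar in creaky.unicodeScalars {
    if let value = ipaDiacritics[scalar] {
        print(value) // creaky-voiced
    }
}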

How do you turn a string into a unicode family in Swift?

I'm trying to make a feature in my app where, as a user types in a text field, the text is converted into a Unicode "family".
Below there is an image of a user typing (not reproduced here). As the user types, you can see the text rendered in different Unicode-family characters; when the user taps on a cell, the converted text can be copied and pasted somewhere else.
If I wanted to turn my text into the black-bubble Unicode family like the one in the screenshot, how can I do that?
You can define a character map. Here's one to get you started.
let circledMap: [Character: Character] = ["A": "🅐", "B": "🅑", "C": "🅒", "D": "🅓"] // The rest are left as an exercise
let circledRes = String("abacab".uppercased().map { circledMap[$0] ?? $0 })
print(circledRes) // 🅐🅑🅐🅒🅐🅑
If your map contains mappings for both upper and lowercase letters then don't call uppercased.
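For instance, with a map covering both cases (partial, for illustration) you can drop the uppercased() call:
let bothCasesMap: [Character: Character] = ["A": "🅐", "a": "🅐",
                                            "B": "🅑", "b": "🅑"] // etc.
print(String("AbBa".map { bothCasesMap[$0] ?? $0 })) // 🅐🅑🅑🅐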
Create whatever maps you want. Spend lots of time with the "Emoji & Symbols" viewer found on the Edit menu of every macOS program.
let invertedMap: [Character: Character] = ["a": "ɐ", "b": "q", "c": "ɔ", "d": "p", "e": "ǝ", "f": "ɟ", "g": "ƃ", "h": "ɥ"]
In a case like the circled letters, it would be nice to define a range where you can transform "A"..."Z" to "🅐"..."🅩".
That actually takes more code than I expected but the following does work:
extension String {
    // A few sample ranges to get started
    // NOTE: Make sure each mapping pair has the same number of characters or bad things may happen
    static let circledLetters: [ClosedRange<UnicodeScalar>: ClosedRange<UnicodeScalar>] = ["A"..."Z": "🅐"..."🅩", "a"..."z": "🅐"..."🅩"]
    static let boxedLetters: [ClosedRange<UnicodeScalar>: ClosedRange<UnicodeScalar>] = ["A"..."Z": "🅰"..."🆉", "a"..."z": "🅰"..."🆉"]
    static let italicLetters: [ClosedRange<UnicodeScalar>: ClosedRange<UnicodeScalar>] = ["A"..."Z": "𝐴"..."𝑍", "a"..."z": "𝑎"..."𝑧"]

    func transformed(using mapping: [ClosedRange<UnicodeScalar>: ClosedRange<UnicodeScalar>]) -> String {
        let chars: [UnicodeScalar] = self.unicodeScalars.map { ch in
            for transform in mapping {
                // If the character is found in the key range, convert it
                if let offset = transform.key.firstIndex(of: ch) {
                    // Convert the offset from the key range into an Int
                    let dist = transform.key.distance(from: transform.key.startIndex, to: offset)
                    // Build the corresponding index into the value range
                    let newIndex = transform.value.index(transform.value.startIndex, offsetBy: dist)
                    // Get the mapped character
                    let newch = transform.value[newIndex]
                    return newch
                }
            }
            // Not found in any of the mappings so return the original as-is
            return ch
        }
        // Convert the final [UnicodeScalar] into a new String
        var res = ""
        res.unicodeScalars.append(contentsOf: chars)
        return res
    }
}
print("This works".transformed(using: String.circledLetters)) // ๐Ÿ…ฃ๐Ÿ…—๐Ÿ…˜๐Ÿ…ข ๐Ÿ…ฆ๐Ÿ…ž๐Ÿ…ก๐Ÿ…š๐Ÿ…ข
The above String extension also requires the following extension (thanks to this answer):
extension UnicodeScalar: Strideable {
    public func distance(to other: UnicodeScalar) -> Int {
        return Int(other.value) - Int(self.value)
    }
    public func advanced(by n: Int) -> UnicodeScalar {
        let advancedValue = n + Int(self.value)
        guard let advancedScalar = UnicodeScalar(advancedValue) else {
            fatalError("\(String(advancedValue, radix: 16)) does not represent a valid unicode scalar value.")
        }
        return advancedScalar
    }
}
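The other sample maps work the same way, for example:
print("Swift".transformed(using: String.boxedLetters))  // 🆂🆆🅸🅵🆃
print("Swift".transformed(using: String.italicLetters)) // 𝑆𝑤𝑖𝑓𝑡
One caveat worth knowing about italicLetters: the slot for the mathematical italic small "h" (U+1D455) is unassigned in Unicode (that letter actually lives at U+210E), so this table maps "h" to an unassigned code point that typically renders as a placeholder glyph.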

Get numbers characters from a string [duplicate]

This question already has answers here:
Filter non-digits from string
(12 answers)
Closed 6 years ago.
How can I get the number characters from a string? I don't want to convert them to an Int.
var string = "string_1"
var string2 = "string_20_certified"
My results have to be formatted like this:
newString = "1"
newString2 = "20"
Pattern matching a String's unicode scalars against Western Arabic Numerals
You could pattern match the unicodeScalars view of a String to a given UnicodeScalar pattern (covering e.g. Western Arabic numerals).
extension String {
    var westernArabicNumeralsOnly: String {
        let pattern = UnicodeScalar("0")..."9"
        return String(unicodeScalars
            .compactMap { pattern ~= $0 ? Character($0) : nil })
    }
}
Example usage:
let str1 = "string_1"
let str2 = "string_20_certified"
let str3 = "a_1_b_2_3_c34"
let newStr1 = str1.westernArabicNumeralsOnly
let newStr2 = str2.westernArabicNumeralsOnly
let newStr3 = str3.westernArabicNumeralsOnly
print(newStr1) // 1
print(newStr2) // 20
print(newStr3) // 12334
Extending to matching any of several given patterns
The Unicode scalar pattern matching approach above is particularly useful when extended to matching any of several given patterns, e.g. patterns describing different variations of Eastern Arabic numerals:
extension String {
    var easternArabicNumeralsOnly: String {
        let patterns = [UnicodeScalar("\u{0660}")..."\u{0669}", // Eastern Arabic
                        "\u{06F0}"..."\u{06F9}"]                // Perso-Arabic variant
        return String(unicodeScalars
            .compactMap { uc in patterns.contains { $0 ~= uc } ? Character(uc) : nil })
    }
}
This could be used in practice, e.g., when writing an emoji filter, as ranges of Unicode scalars that cover emojis can readily be added to the patterns array in the Eastern Arabic example above.
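For instance, a rough sketch of such a filter; the ranges below cover only a few common emoji blocks, by no means all emoji:
extension String {
    var withoutCommonEmoji: String {
        let emojiPatterns = [UnicodeScalar("\u{1F300}")..."\u{1F5FF}", // Misc Symbols and Pictographs
                             "\u{1F600}"..."\u{1F64F}",                // Emoticons
                             "\u{1F680}"..."\u{1F6FF}"]                // Transport and Map
        // Drop any scalar that falls inside one of the emoji ranges.
        return String(unicodeScalars
            .compactMap { uc in emojiPatterns.contains { $0 ~= uc } ? nil : Character(uc) })
    }
}
print("fire 🔥, rocket 🚀".withoutCommonEmoji) // fire , rocket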
Why use the UnicodeScalar patterns approach over Character ones?
A Character in Swift consists of an extended grapheme cluster, which is made up of one or more Unicode scalar values. This means that Character instances in Swift do not have a fixed size in memory, which in turn means that random access to a character within a collection of sequentially (/contiguously) stored characters is not O(1), but rather O(n).
Unicode scalars in Swift, on the other hand, are stored as fixed-size UTF-32 code units, which should allow O(1) random access. Now, I'm not entirely sure whether this is a fact, or the reason for what follows; but it is a fact that when benchmarking the methods above against equivalent methods using the CharacterView (.characters property) on some test String instances, it's very apparent that the UnicodeScalar approach is faster than the Character approach; naive testing showed a factor 10–25 difference in execution times, steadily growing with String size.
Knowing the limitations of working with Unicode scalars vs Characters in Swift
Now, there are drawbacks to using the UnicodeScalar approach, however; namely when working with characters that cannot be represented by a single Unicode scalar, but where one of their Unicode scalars is contained in the pattern we want to match against.
E.g., consider a string holding the four characters "Café". The last character, "é", is represented by two Unicode scalars, "e" and "\u{301}". If we were to implement pattern matching against, say, UnicodeScalar("a")..."e", the filtering method as applied above would allow one of those two Unicode scalars to pass.
extension String {
    var onlyLowercaseLettersAthroughE: String {
        let patterns = [UnicodeScalar("a")..."e"]
        return String(unicodeScalars
            .compactMap { uc in patterns.contains { $0 ~= uc } ? Character(uc) : nil })
    }
}
let str = "Cafe\u{301}"
print(str) // Café
print(str.onlyLowercaseLettersAthroughE) // ae
    /* possibly we'd want "a" or "aé"
       as the result here */
In the particular use case queried by the OP in this Q&A, the above is not an issue, but depending on the use case, it will sometimes be more appropriate to work with Character pattern matching rather than UnicodeScalar matching, as sketched below.
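A minimal sketch of the Character-based equivalent, which keeps grapheme clusters whole, so the composed "é" is matched or dropped in its entirety:
extension String {
    var onlyLowercaseCharsAthroughE: String {
        // Character comparison respects canonical equivalence, so the
        // whole grapheme cluster "é" is compared as one unit.
        let pattern: ClosedRange<Character> = "a"..."e"
        return filter { pattern ~= $0 }
    }
}
print("Cafe\u{301}".onlyLowercaseCharsAthroughE) // a — the whole "é" falls outside the range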
Edit: Updated for Swift 4 & 5
Here's a straightforward method that doesn't require Foundation:
let newstring = string.filter { "0"..."9" ~= $0 }
or borrowing from #dfri's idea to make it a String extension:
extension String {
    var numbers: String {
        return filter { "0"..."9" ~= $0 }
    }
}
print("3 little pigs".numbers) // "3"
print("1, 2, and 3".numbers) // "123"
import Foundation
let string = "a_1_b_2_3_c34"
let result = string.components(separatedBy: CharacterSet.decimalDigits.inverted).joined(separator: "")
print(result)
Output:
12334
Here is a Swift 2 example:
let str = "Hello 1, World 62"
let intString = str.componentsSeparatedByCharactersInSet(
    NSCharacterSet
        .decimalDigitCharacterSet()
        .invertedSet)
    .joinWithSeparator("") // Returns a string with all the numbers
This method iterates through the string's characters and appends the digits to a new string:
func getNumberFrom(string: String) -> String {
    var number = ""
    for c in string {
        // Int(String(c)) succeeds only for the digit characters 0...9.
        if Int(String(c)) != nil {
            number.append(c)
        }
    }
    return number
}
For example, with a regular expression (requires Foundation):
import Foundation
let text = "string_20_certified"
let pattern = "\\d+"
let regex = try! NSRegularExpression(pattern: pattern, options: [])
if let match = regex.firstMatch(in: text, options: [], range: NSRange(text.startIndex..., in: text)) {
    let newString = (text as NSString).substring(with: match.range)
    print(newString) // 20
}
If there are multiple occurrences of the pattern, use matches(in:options:range:):
let matches = regex.matches(in: text, options: [], range: NSRange(text.startIndex..., in: text))
for match in matches {
    let newString = (text as NSString).substring(with: match.range)
    print(newString)
}

NSCharacterSet.characterIsMember() with Swift's Character type

Imagine you've got an instance of Swift's Character type, and you want to determine whether it's a member of an NSCharacterSet. NSCharacterSet's characterIsMember method takes a unichar, so we need to get from Character to unichar.
The only solution I could come up with is the following, where c is my Character:
let u: unichar = ("\(c)" as NSString).characterAtIndex(0)
if characterSet.characterIsMember(u) {
    dude.abide()
}
I looked at Character but nothing leapt out at me as a way to get from it to unichar. This may be because Character is more general than unichar, so a direct conversion wouldn't be safe, but I'm only guessing.
If I were iterating a whole string, I'd do something like this:
let s = myString as NSString
for i in 0..<countElements(myString) {
    let u = s.characterAtIndex(i)
    if characterSet.characterIsMember(u) {
        dude.abide()
    }
}
(Warning: The above is pseudocode and has never been run by anyone ever.) But this is not really what I'm asking.
My understanding is that unichar is a typealias for UInt16. A unichar is just a number.
I think the problem you are facing is that a Character in Swift can be composed of more than one Unicode "character". Thus, it cannot be converted to a single unichar value, because it may be composed of two unichars. You can decompose a Character into its individual unichar values by casting it to a string and using the utf16 property, like this:
let c: Character = "a"
let s = String(c)
var codeUnits = [unichar]()
for codeUnit in s.utf16 {
codeUnits.append(codeUnit)
}
This will produce an array - codeUnits - of unichar values.
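Equivalently, since the utf16 view is a sequence, you can build the array in one line:
let codeUnits = Array(String(c).utf16)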
EDIT: Initial code had for codeUnit in s when it should have been for codeUnit in s.utf16
You can tidy things up and test whether each individual unichar value is in a character set like this:
let char: Character = "\u{63}\u{20dd}" // This is a 'c' inside an enclosing circle
for codeUnit in String(char).utf16 {
    if NSCharacterSet(charactersInString: "c").characterIsMember(codeUnit) {
        dude.abide()
    } // dude will abide() for codeUnits[0] = "c", but not for codeUnits[1] = 0x20dd (the enclosing circle)
}
Or, if you are only interested in the first (and often only) unichar value:
if NSCharacterSet(charactersInString: "c").characterIsMember(String(char).utf16[0]) {
    dude.abide()
}
Or, wrap it in a function:
func isChar(char: Character, inSet set: NSCharacterSet) -> Bool {
    return set.characterIsMember(String(char).utf16[0])
}
let xSet = NSCharacterSet(charactersInString: "x")
isChar("x", inSet: xSet) // This returns true
isChar("y", inSet: xSet) // This returns false
Now make the function check for all unichar values in a composed character - that way, if you have a composed character, the function will only return true if both the base character and the combining character are present:
func isChar(char: Character, inSet set: NSCharacterSet) -> Bool {
    var found = true
    for ch in String(char).utf16 {
        if !set.characterIsMember(ch) { found = false }
    }
    return found
}
let acuteA: Character = "\u{e1}" // An "a" with an accent
let acuteAComposed: Character = "\u{61}\u{301}" // Also an "a" with an accent
// A character set that includes both the composed and uncomposed unichar values
let charSet = NSCharacterSet(charactersInString: "\u{61}\u{301}\u{e1}")
isChar(acuteA, inSet: charSet) // returns true
isChar(acuteAComposed, inSet: charSet) // returns true (both unichar values were matched)
The last version is important. If your Character is a composed character you have to check for the presence of both the base character ("a") and the combining character (the acute accent) in the character set or you will get false positives.
I would treat the Character as a String and let Cocoa do all the work:
func charset(cset: NSCharacterSet, containsCharacter c: Character) -> Bool {
    let s = String(c)
    let ix = s.startIndex
    let ix2 = s.endIndex
    let result = s.rangeOfCharacterFromSet(cset, options: nil, range: ix..<ix2)
    return result != nil
}
And here's how to use it:
let cset = NSCharacterSet.lowercaseLetterCharacterSet()
let c : Character = "c"
let ok = charset(cset, containsCharacter: c) // true
Do it all in a one-liner:
validCharacterSet.contains(String(char).unicodeScalars.first!)
(Swift 3)
Due to changes in Swift 3.0, matt's answer no longer works, so here is a working version (as an extension):
private extension NSCharacterSet {
    func containsCharacter(c: Character) -> Bool {
        let s = String(c)
        let ix = s.startIndex
        let ix2 = s.endIndex
        let result = s.rangeOfCharacter(from: self as CharacterSet, options: [], range: ix..<ix2)
        return result != nil
    }
}
Swift 3.0 changes mean you actually don't need to bridge to NSCharacterSet anymore; you can use Swift's native CharacterSet.
You could do something similar to Jiri's answer directly:
extension CharacterSet {
    func contains(_ character: Character) -> Bool {
        let string = String(character)
        return string.rangeOfCharacter(from: self, options: [], range: string.startIndex..<string.endIndex) != nil
    }
}
or give the extension this implementation instead:
extension CharacterSet {
    func contains(_ character: Character) -> Bool {
        let otherSet = CharacterSet(charactersIn: String(character))
        return self.isSuperset(of: otherSet)
    }
}
Note: the above crashes and doesn't work due to https://bugs.swift.org/browse/SR-3667. Not sure CharacterSet gets the kind of love it needs.
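Usage of the first, rangeOfCharacter-based extension, as a quick check:
let vowels = CharacterSet(charactersIn: "aeiou")
print(vowels.contains(Character("e"))) // true
print(vowels.contains(Character("x"))) // false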