How can I convert Unicode values in a string "U+XXXXXX" to the corresponding characters in Swift?

I receive the following JSON from the backend:
{
"id": "f33919f6-3554-4246-9e78-bca3a690c119",
"title": "Category3",
"slug": "category3",
"hex_up": "#eb4034",
"hex_down": "#80302a",
"emoji": "U+1F602",
"parent_id": "aa3f651b-f068-4ae1-a9d8-a18a9945b111"
}
It contains the field "emoji": "U+1F602".
I need to show an emoji icon like 😃 in a UILabel.
I searched online and found suggestions like this:
let scalarValue = UnicodeScalar(emojiString)
let myString = String(scalarValue!)
Unfortunately, the app crashes on the second line.
Thanks for your answers.

There's no U+... syntax in Swift. (There is a \u{...} syntax that does the same thing, but it's not necessary here.)
You'll need to parse the String yourself:
func parseUnicode(_ string: String) -> String? {
guard string.hasPrefix("U+"), // Make sure it's a U+ string
let value = Int(string.dropFirst(2), radix: 16), // Convert to Int
let scalar = UnicodeScalar(value) // Convert to UnicodeScalar
else { return nil }
return String(scalar) // Convert to String
}
if let myString = parseUnicode(emoji) { ... }
Don't use ! here. The U+... string may be invalid, and you wouldn't want to crash in that case.
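To see why the optional return value matters, it helps to feed the function a few malformed inputs. The following is a minimal sketch, not part of the original answer: it restates the function using UInt32 instead of Int (an assumption of this example, so that negative values are rejected while parsing), and the CategoryPayload struct is a hypothetical model covering only two fields of the JSON from the question.

```swift
import Foundation

// Hypothetical model: only the fields this example needs.
struct CategoryPayload: Decodable {
    let title: String
    let emoji: String
}

// Same idea as parseUnicode above, restated so the example is self-contained.
func parseUnicode(_ string: String) -> String? {
    guard string.hasPrefix("U+"),                              // must be a U+ string
          let value = UInt32(string.dropFirst(2), radix: 16),  // hex digits to a number
          let scalar = Unicode.Scalar(value)                   // nil for invalid code points
    else { return nil }
    return String(scalar)
}

let json = Data(#"{"title": "Category3", "emoji": "U+1F602"}"#.utf8)
let category = try? JSONDecoder().decode(CategoryPayload.self, from: json)

// Fall back to a placeholder instead of crashing on bad data.
let labelText = category.flatMap { parseUnicode($0.emoji) } ?? "?"

print(labelText)                       // 😂
print(parseUnicode("1F602") as Any)    // nil: missing "U+" prefix
print(parseUnicode("U+XYZ") as Any)    // nil: not hexadecimal
print(parseUnicode("U+D800") as Any)   // nil: surrogates are not valid scalars
```

In a real app, labelText would be assigned to the label's text property; the "?" fallback is only a placeholder choice for this sketch.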

You can simply apply a string transform from "Hex/Unicode" to "Any" (a set of all characters):
"U+1F602".applyingTransform(.init("Hex/Unicode-Any"), reverse: false) // "😂"
or as instance properties of StringProtocol to encode/decode from/to hexa unicode:
extension StringTransform {
static let unicodeToAny: Self = .init("Hex/Unicode-Any")
static let anyToUnicode: Self = .init("Any-Hex/Unicode")
}
extension StringProtocol {
var decodingHexaUnicode: String {
applyingTransform(.unicodeToAny, reverse: false)!
}
var encodingHexaUnicode: String {
applyingTransform(.anyToUnicode, reverse: false)!
}
}
Usage:
let hexaUnicode = "U+1F602"
let emoji = hexaUnicode.decodingHexaUnicode // "😂"
let unicodeFromEmoji = emoji.encodingHexaUnicode // "U+1F602"

Your app crashed because the scalarValue you attempted to initialize is nil, and you force-unwrap that nil value with ! on the second line. Rob's answer shows how to unwrap the optional safely.
You can get the emoji from the hexadecimal value that follows the U+, so first drop the first two characters of the string:
let parsedEmojiHex = String(emojiString.dropFirst(2))
Then convert that hexadecimal value to a character. The force-unwraps below crash on invalid input, so use them only when the string is known to be valid:
let emoji = String(UnicodeScalar(Int(parsedEmojiHex, radix: 16)!)!)
print(emoji)

Related

How to get the range of the first line in a string?

I would like to change the formatting of the first line of text in an NSTextView (give it a different font size and weight to make it look like a headline). Therefore, I need the range of the first line. One way to go is this:
guard let firstLineString = textView.string.components(separatedBy: .newlines).first else {
return
}
let range = NSRange(location: 0, length: firstLineString.count)
However, I might be working with quite long texts so it appears to be inefficient to first split the entire string into line components when all I need is the first line component. Thus, it seems to make sense to use the firstIndex(where:) method:
let firstNewLineIndex = textView.string.firstIndex { character -> Bool in
return CharacterSet.newlines.contains(character)
}
// Then: Create an NSRange from 0 up to firstNewLineIndex.
This doesn't work and I get an error:
Cannot convert value of type '(Unicode.Scalar) -> Bool' to expected argument type 'Character'
because the contains method accepts not a Character but a Unicode.Scalar as a parameter (which doesn't really make sense to me because then it should be called a UnicodeScalarSet and not a CharacterSet, but nevermind...).
My question is:
How can I implement this in an efficient way, without first slicing the whole string?
(It doesn't necessarily have to use the firstIndex(where:) method, but that appears to be the way to go.)
A String.Index range for the first line in string can be obtained with
let range = string.lineRange(for: ..<string.startIndex)
If you need that as an NSRange then
let nsRange = NSRange(range, in: string)
does the trick.
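One detail worth checking before applying the range for formatting: lineRange(for:) includes the line terminator in the range it returns. A small sketch with a made-up sample string:

```swift
import Foundation

let text = "Headline\nBody text\nMore text"

// An empty range at the start of the string selects the first line.
let range = text.lineRange(for: ..<text.startIndex)
let nsRange = NSRange(range, in: text)

print(nsRange.location, nsRange.length)   // 0 9
print(text[range] == "Headline\n")        // true: the "\n" is part of the range
```

If the newline should not receive the headline attributes, the range has to be shortened accordingly.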
You can use rangeOfCharacter, which returns the Range<String.Index> of the first character from a set in your string:
extension StringProtocol where Index == String.Index {
var partialRangeOfFirstLine: PartialRangeUpTo<String.Index> {
return ..<(rangeOfCharacter(from: .newlines)?.lowerBound ?? endIndex)
}
var rangeOfFirstLine: Range<Index> {
return startIndex..<partialRangeOfFirstLine.upperBound
}
var firstLine: SubSequence {
return self[partialRangeOfFirstLine]
}
}
You can use it like so:
var str = """
some string
with new lines
"""
var attributedString = NSMutableAttributedString(string: str)
let firstLine = NSAttributedString(string: String(str.firstLine))
// change firstLine as you wish
let range = NSRange(str.rangeOfFirstLine, in: str)
attributedString.replaceCharacters(in: range, with: firstLine)
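The firstIndex(where:) idea from the question can also be made to work by testing Character.isNewline (available in Swift 5) instead of going through CharacterSet. This is a sketch under that assumption; firstLineNSRange(of:) is a hypothetical helper name.

```swift
import Foundation

// Returns the NSRange of the first line, excluding the line terminator.
func firstLineNSRange(of text: String) -> NSRange {
    // Character.isNewline avoids the Character / Unicode.Scalar mismatch.
    let end = text.firstIndex(where: { $0.isNewline }) ?? text.endIndex
    return NSRange(text.startIndex..<end, in: text)
}

let headline = firstLineNSRange(of: "Headline\nBody text")
print(headline.location, headline.length)     // 0 8

let singleLine = firstLineNSRange(of: "No newline here")
print(singleLine.location, singleLine.length) // 0 15
```

The search stops at the first newline, so the rest of a long text is never scanned.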

Remove the first six characters from a String (Swift)

What's the best way to go about removing the first six characters of a string? Through Stack Overflow, I've found a couple of ways that were supposed to be solutions but I noticed an error with them. For instance,
extension String {
func removing(charactersOf string: String) -> String {
let characterSet = CharacterSet(charactersIn: string)
let components = self.components(separatedBy: characterSet)
return components.joined(separator: "")
}
}
If I type in a website like https://www.example.com, and store it as a variable named website, then type in the following
website.removing(charactersOf: "https://")
it removes the https:// portion but it also removes all h's, all t's, :'s, etc. from the text.
How can I just delete the first characters?
In Swift 4 it is really simple: just use dropFirst(_:), which takes the number of characters to drop.
let myString = "Hello World"
myString.dropFirst(6)
//World
In your case: website.dropFirst(6)
Why not :
let stripped = String(website.characters.dropFirst(6))
Seems more concise and straightforward to me.
(it won't work with multi-char emojis either mind you)
[EDIT] Swift 4 made this even shorter:
let stripped = String(website.dropFirst(6))
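Since the underlying goal in the question is to strip a known prefix such as "https://", one option is to drop characters only when the string actually starts with that prefix. A minimal sketch; removingPrefix(_:from:) is a hypothetical helper, not a standard library function:

```swift
// Hypothetical helper: strips `prefix` only when `string` starts with it.
func removingPrefix(_ prefix: String, from string: String) -> String {
    guard string.hasPrefix(prefix) else { return string }
    // Drop exactly as many characters as the prefix has.
    return String(string.dropFirst(prefix.count))
}

let website = "https://www.example.com"
print(removingPrefix("https://", from: website))         // www.example.com
print(removingPrefix("https://", from: "example.com"))   // example.com
```

This avoids hard-coding the count and leaves strings without the prefix untouched.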
length is the number of characters you want to remove (6 in your case)
extension String {
func toLengthOf(length:Int) -> String {
if length <= 0 {
return self
} else if let to = self.index(self.startIndex, offsetBy: length, limitedBy: self.endIndex) {
return String(self[to...]) // substring(from:) is deprecated since Swift 4
} else {
return ""
}
}
}
This removes the first 6 characters from a string (Swift 4 and later syntax):
var str = "Hello-World"
let range1 = str.index(str.startIndex, offsetBy: 6)..<str.endIndex
str = String(str[range1]) // slicing yields a Substring, so convert back to String
print("the end time is : \(str)")

Swift: How to check if a range is valid for a given string

I have written a Swift function that takes a String and a Range as its parameters. How can I check that the range is valid for the string?
Edit: Nonsensical Example
func foo(text: String, range: Range<String.Index>) ->String? {
// what can I do here to ensure valid range
guard *is valid range for text* else {
return nil
}
return text[range]
}
var str = "Hello, world"
let range = str.rangeOfString("world")
let str2 = "short"
let text = foo(str2, range: range!)
In Swift 3, this is easy: just get the string's character range and call contains to see if it contains your arbitrary range.
Edit: In Swift 4, a range no longer "contains" a range. A Swift 4.2 solution might look like this:
let string = // some string
let range = // some range of String.Index
let ok = range.clamped(to: string.startIndex..<string.endIndex) == range
If ok is true, it is safe to apply range to string.
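Wrapped into a function shaped like the one in the question, the clamping check might look like the sketch below. The function name slice(_:in:) is made up for this example. Note that the check only compares positions against the string's bounds; it does not prove that an index produced by a different string lands on a character boundary.

```swift
// Returns the slice only if `range` lies within the bounds of `text`.
func slice(_ text: String, in range: Range<String.Index>) -> Substring? {
    let bounds = text.startIndex..<text.endIndex
    // Clamping leaves the range unchanged only when it already fits.
    guard range.clamped(to: bounds) == range else { return nil }
    return text[range]
}

let str = "Hello, world"
let start = str.index(str.startIndex, offsetBy: 7)
let worldRange = start..<str.endIndex

print(slice(str, in: worldRange) ?? "nil")       // world
print(slice("short", in: worldRange) ?? "nil")   // nil
```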
Swift 5
extension String {
func hasRange(_ range: NSRange) -> Bool {
return Range(range, in: self) != nil
}
}
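A short usage sketch for the NSRange-based check, restating the extension so it runs on its own. Keep in mind that NSRange counts UTF-16 code units, not Swift Characters.

```swift
import Foundation

extension String {
    // True when the NSRange can be converted to a Range<String.Index> for this string.
    func hasRange(_ range: NSRange) -> Bool {
        return Range(range, in: self) != nil
    }
}

print("short".hasRange(NSRange(location: 0, length: 5)))   // true
print("short".hasRange(NSRange(location: 3, length: 5)))   // false: runs past the end
print("short".hasRange(NSRange(location: 7, length: 5)))   // false: starts past the end
```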
Unfortunately, I was not able to test Matt's solution as I am using Swift 2.2. However, using his idea I came up with ...
func foo(text: String, range: Range<String.Index>) -> String? {
let r = text.startIndex..<text.endIndex
if r.contains(range.startIndex) && r.contains(range.endIndex) {
return text[range]
} else {
return nil
}
}
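If the start and end indices are OK, then so is the entire range. One caveat: because r is half-open, r.contains(range.endIndex) is false for a range that ends exactly at text.endIndex, so such a range is rejected even though it is valid.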

Remove all non-numeric characters from a string in swift

I have the need to parse some unknown data which should just be a numeric value, but may contain whitespace or other non-alphanumeric characters.
Is there a new way of doing this in Swift? All I can find online seems to be the old C way of doing things.
I am looking at stringByTrimmingCharactersInSet - as I am sure my inputs will only have whitespace/special characters at the start or end of the string. Are there any built in character sets I can use for this? Or do I need to create my own?
I was hoping there would be something like stringFromCharactersInSet() which would allow me to specify only valid characters to keep.
You can use trimmingCharacters with the inverted character set to remove characters from the start or the end of the string. In Swift 3 and later:
let result = string.trimmingCharacters(in: CharacterSet(charactersIn: "0123456789.").inverted)
Or, if you want to remove non-numeric characters anywhere in the string (not just the start or end), you can filter the characters, e.g. in Swift 4.2.1:
let result = string.filter("0123456789.".contains)
Or, if you want to remove characters from a CharacterSet from anywhere in the string, use:
let result = String(string.unicodeScalars.filter(CharacterSet.whitespaces.inverted.contains))
Or, if you want to only match valid strings of a certain format (e.g. ####.##), you could use regular expression. For example:
if let range = string.range(of: #"\d+(\.\d*)?"#, options: .regularExpression) {
let result = string[range] // or `String(string[range])` if you need `String`
}
The behavior of these approaches differs slightly, so the right choice depends on precisely what you're trying to do. Include the decimal point if you want decimal numbers, or exclude it if you only want integers. There are lots of ways to accomplish this.
For older, Swift 2 syntax, see previous revision of this answer.
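To make the differences concrete, here is a small comparison of three of the approaches above on one made-up input. The closure-based filter is used instead of passing contains directly, purely to keep the example unambiguous.

```swift
import Foundation

let input = "  $1,234.50 USD "
let allowed = CharacterSet(charactersIn: "0123456789.")

// 1. Trim only the ends: the comma in the middle survives.
let trimmed = input.trimmingCharacters(in: allowed.inverted)

// 2. Filter everywhere: every character outside the set is dropped.
let allowedCharacters = Set("0123456789.")
let filtered = input.filter { allowedCharacters.contains($0) }

// 3. First regular-expression match: the match stops at the comma.
let firstMatch = input
    .range(of: #"\d+(\.\d*)?"#, options: .regularExpression)
    .map { String(input[$0]) }

print(trimmed)                    // 1,234.50
print(filtered)                   // 1234.50
print(firstMatch ?? "no match")   // 1
```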
let result = string.stringByReplacingOccurrencesOfString("[^0-9]", withString: "", options: NSStringCompareOptions.RegularExpressionSearch, range:nil).stringByTrimmingCharactersInSet(NSCharacterSet.whitespaceCharacterSet())
Swift 3
let result = string.replacingOccurrences( of:"[^0-9]", with: "", options: .regularExpression)
You can upvote this answer.
I prefer this solution, because I like extensions, and it seems a bit cleaner to me. Solution reproduced here:
extension String {
var digits: String {
return components(separatedBy: CharacterSet.decimalDigits.inverted)
.joined()
}
}
You can filter the UnicodeScalarView of the string using the pattern matching operator for ranges, pass a UnicodeScalar ClosedRange from 0 to 9 and initialise a new String with the resulting UnicodeScalarView:
extension String {
private static var digits = UnicodeScalar("0")..."9"
var digits: String {
return String(unicodeScalars.filter(String.digits.contains))
}
}
"abc12345".digits // "12345"
edit/update:
Swift 4.2
extension RangeReplaceableCollection where Self: StringProtocol {
var digits: Self {
return filter(("0"..."9").contains)
}
}
or as a mutating method
extension RangeReplaceableCollection where Self: StringProtocol {
mutating func removeAllNonNumeric() {
removeAll { !("0"..."9" ~= $0) }
}
}
Swift 5.2 • Xcode 11.4 or later
In Swift 5 we can use a new Character property called isWholeNumber:
extension RangeReplaceableCollection where Self: StringProtocol {
var digits: Self { filter(\.isWholeNumber) }
}
extension RangeReplaceableCollection where Self: StringProtocol {
mutating func removeAllNonNumeric() {
removeAll { !$0.isWholeNumber }
}
}
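One thing to be aware of: isWholeNumber is not limited to the ASCII digits 0 through 9. If only ASCII digits should be kept, combining it with isASCII is one possible approach. A short sketch with a made-up input:

```swift
let mixed = "a1५b2"   // "५" is the Devanagari digit five

// Keeps every character that represents a whole number, in any script.
let anyWholeNumber = mixed.filter { $0.isWholeNumber }

// Keeps only the ASCII digits 0 through 9.
let asciiDigitsOnly = mixed.filter { $0.isASCII && $0.isWholeNumber }

print(anyWholeNumber)    // 1५2
print(asciiDigitsOnly)   // 12
```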
To allow a period as well we can extend Character and create a computed property:
extension Character {
var isDecimalOrPeriod: Bool { "0"..."9" ~= self || self == "." }
}
extension RangeReplaceableCollection where Self: StringProtocol {
var digitsAndPeriods: Self { filter(\.isDecimalOrPeriod) }
}
Playground testing:
"abc12345".digits // "12345"
var str = "123abc0"
str.removeAllNonNumeric()
print(str) //"1230"
"Testing0123456789.".digitsAndPeriods // "0123456789."
Swift 4
I found a decent way to get only the alphanumeric characters from a string.
For instance:
func getAlphaNumericValue() {
var yourString = "123456789!##$%^&*()AnyThingYouWant"
let unsafeChars = CharacterSet.alphanumerics.inverted // Remove the .inverted to get the opposite result.
let cleanChars = yourString.components(separatedBy: unsafeChars).joined(separator: "")
print(cleanChars) // 123456789AnyThingYouWant
}
A solution using the filter function and rangeOfCharacterFromSet
let string = "sld [f]34é7*˜µ"
let alphaNumericCharacterSet = NSCharacterSet.alphanumericCharacterSet()
let filteredCharacters = string.characters.filter {
return String($0).rangeOfCharacterFromSet(alphaNumericCharacterSet) != nil
}
let filteredString = String(filteredCharacters) // -> sldf34é7µ
To filter for only numeric characters use
let string = "sld [f]34é7*˜µ"
let numericSet = "0123456789"
let filteredCharacters = string.characters.filter {
return numericSet.containsString(String($0))
}
let filteredString = String(filteredCharacters) // -> 347
or
let numericSet : [Character] = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
let filteredCharacters = string.characters.filter {
return numericSet.contains($0)
}
let filteredString = String(filteredCharacters) // -> 347
Swift 4
An approach without extensions or componentsSeparatedByCharactersInSet, which doesn't read as well:
let allowedCharSet = CharacterSet.letters.union(.whitespaces)
let filteredText = String(sourceText.unicodeScalars.filter(allowedCharSet.contains))
let string = "+1*(234) fds567#-8/90-"
let onlyNumbers = string.components(separatedBy: CharacterSet.decimalDigits.inverted).joined()
print(onlyNumbers) // "1234567890"
or
extension String {
func removeNonNumeric() -> String {
return self.components(separatedBy: CharacterSet.decimalDigits.inverted).joined()
}
}
let onlyNumbers = "+1*(234) fds567#-8/90-".removeNonNumeric()
print(onlyNumbers)// "1234567890"
Swift 3, filters all except numbers
let myString = "dasdf3453453fsdf23455sf.2234"
let result = String(myString.characters.filter { String($0).rangeOfCharacter(from: CharacterSet(charactersIn: "0123456789")) != nil })
print(result)
Swift 5 or later (Character.isNumber is not available in Swift 4.2)
let numericString = string.filter { (char) -> Bool in
return char.isNumber
}
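Note that isNumber is broader than "digit": it is also true for characters such as fractions. If that matters for the data being parsed, isWholeNumber is the stricter property. A brief sketch with a made-up input:

```swift
let sample = "7⅚x"   // "⅚" is the vulgar fraction five sixths

let numbers = sample.filter { $0.isNumber }            // keeps the fraction too
let wholeNumbers = sample.filter { $0.isWholeNumber }  // keeps whole numbers only

print(numbers)        // 7⅚
print(wholeNumbers)   // 7
```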
You can do something like this...
let string = "[,myString1. \"" // string : [,myString1. "
let characterSet = NSCharacterSet(charactersInString: "[,. \"")
let finalString = (string.componentsSeparatedByCharactersInSet(characterSet) as NSArray).componentsJoinedByString("")
print(finalString)
//finalString will be "myString1"
The issue with Rob's first solution is stringByTrimmingCharactersInSet only filters the ends of the string rather than throughout, as stated in Apple's documentation:
Returns a new string made by removing from both ends of the receiver characters contained in a given character set.
Instead, use componentsSeparatedByCharactersInSet to split the string at every character outside the character set, and then join the resulting array with an empty string separator:
"$$1234%^56()78*9££".componentsSeparatedByCharactersInSet(NSCharacterSet(charactersInString: "0123456789").invertedSet).joinWithSeparator("")
Which returns 123456789
Swift 3
extension String {
var keepNumericsOnly: String {
return self.components(separatedBy: CharacterSet(charactersIn: "0123456789").inverted).joined(separator: "")
}
}
Swift 4.0 version
extension String {
var numbers: String {
return String(describing: filter { String($0).rangeOfCharacter(from: CharacterSet(charactersIn: "0123456789")) != nil })
}
}
Swift 4
String.swift
import Foundation
extension String {
func removeCharacters(from forbiddenChars: CharacterSet) -> String {
let passed = self.unicodeScalars.filter { !forbiddenChars.contains($0) }
return String(String.UnicodeScalarView(passed))
}
func removeCharacters(from: String) -> String {
return removeCharacters(from: CharacterSet(charactersIn: from))
}
}
ViewController.swift
let character = "1Vi234s56a78l9"
let digitsOnly = character.removeCharacters(from: CharacterSet.decimalDigits.inverted)
print(digitsOnly) // will print: 123456789
let withoutDigits = character.removeCharacters(from: "0123456789")
print("no digits", withoutDigits) // will print: no digits Visal
Swift 4.2
let digitChars = yourString.components(separatedBy:
CharacterSet.decimalDigits.inverted).joined(separator: "")
Swift 3 Version
extension String
{
func trimmingCharactersNot(in charSet: CharacterSet) -> String
{
var s:String = ""
for unicodeScalar in self.unicodeScalars
{
if charSet.contains(unicodeScalar)
{
s.append(String(unicodeScalar))
}
}
return s
}
}

NSCharacterSet.characterIsMember() with Swift's Character type

Imagine you've got an instance of Swift's Character type, and you want to determine whether it's a member of an NSCharacterSet. NSCharacterSet's characterIsMember method takes a unichar, so we need to get from Character to unichar.
The only solution I could come up with is the following, where c is my Character:
let u: unichar = ("\(c)" as NSString).characterAtIndex(0)
if characterSet.characterIsMember(u) {
dude.abide()
}
I looked at Character but nothing leapt out at me as a way to get from it to unichar. This may be because Character is more general than unichar, so a direct conversion wouldn't be safe, but I'm only guessing.
If I were iterating a whole string, I'd do something like this:
let s = myString as NSString
for i in 0..<countElements(myString) {
let u = s.characterAtIndex(i)
if characterSet.characterIsMember(u) {
dude.abide()
}
}
(Warning: The above is pseudocode and has never been run by anyone ever.) But this is not really what I'm asking.
My understanding is that unichar is a typealias for UInt16. A unichar is just a number.
I think the problem you are facing is that a Character in Swift can be composed of more than one Unicode "character". Thus, it cannot always be converted to a single unichar value, because it may be composed of two or more unichars. You can decompose a Character into its individual unichar values by converting it to a string and using the utf16 property, like this:
let c: Character = "a"
let s = String(c)
var codeUnits = [unichar]()
for codeUnit in s.utf16 {
codeUnits.append(codeUnit)
}
This will produce an array - codeUnits - of unichar values.
EDIT: Initial code had for codeUnit in s when it should have been for codeUnit in s.utf16
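In current Swift the same decomposition can be written without the loop. The sketch below uses an emoji as a second sample (my choice, not from the answer) to show a single Character that needs two unichar values:

```swift
import Foundation   // for the unichar typealias (UInt16)

let plain: Character = "a"
let emoji: Character = "😂"   // U+1F602, outside the Basic Multilingual Plane

// Collect the UTF-16 code units of each character.
let plainUnits: [unichar] = Array(String(plain).utf16)
let emojiUnits: [unichar] = Array(String(emoji).utf16)

print(plainUnits)                                 // [97]
print(emojiUnits.map { String($0, radix: 16) })   // ["d83d", "de02"]
```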
You can tidy things up and test for whether or not each individual unichar value is in a character set like this:
let char: Character = "\u{63}\u{20dd}" // This is a 'c' inside of an enclosing circle
for codeUnit in String(char).utf16 {
if NSCharacterSet(charactersInString: "c").characterIsMember(codeUnit) {
dude.abide()
} // dude will abide() for codeUnits[0] = "c", but not for codeUnits[1] = 0x20dd (the enclosing circle)
}
Or, if you are only interested in the first (and often only) unichar value:
if NSCharacterSet(charactersInString: "c").characterIsMember(String(char).utf16[0]) {
dude.abide()
}
Or, wrap it in a function:
func isChar(char: Character, inSet set: NSCharacterSet) -> Bool {
return set.characterIsMember(String(char).utf16[0])
}
let xSet = NSCharacterSet(charactersInString: "x")
isChar("x", inSet: xSet) // This returns true
isChar("y", inSet: xSet) // This returns false
Now make the function check for all unichar values in a composed character - that way, if you have a composed character, the function will only return true if both the base character and the combining character are present:
func isChar(char: Character, inSet set: NSCharacterSet) -> Bool {
var found = true
for ch in String(char).utf16 {
if !set.characterIsMember(ch) { found = false }
}
return found
}
let acuteA: Character = "\u{e1}" // An "a" with an accent
let acuteAComposed: Character = "\u{61}\u{301}" // Also an "a" with an accent
// A character set that includes both the composed and uncomposed unichar values
let charSet = NSCharacterSet(charactersInString: "\u{61}\u{301}\u{e1}")
isChar(acuteA, inSet: charSet) // returns true
isChar(acuteAComposed, inSet: charSet) // returns true (both unichar values were matched)
The last version is important. If your Character is a composed character you have to check for the presence of both the base character ("a") and the combining character (the acute accent) in the character set or you will get false positives.
I would treat the Character as a String and let Cocoa do all the work:
func charset(cset:NSCharacterSet, containsCharacter c:Character) -> Bool {
let s = String(c)
let ix = s.startIndex
let ix2 = s.endIndex
let result = s.rangeOfCharacterFromSet(cset, options: nil, range: ix..<ix2)
return result != nil
}
And here's how to use it:
let cset = NSCharacterSet.lowercaseLetterCharacterSet()
let c : Character = "c"
let ok = charset(cset, containsCharacter:c) // true
Do it all in a one liner:
validCharacterSet.contains(String(char).unicodeScalars.first!)
(Swift 3)
Due to changes in Swift 3.0, matt's answer no longer works, so here is a working version (as an extension):
private extension NSCharacterSet {
func containsCharacter(c: Character) -> Bool {
let s = String(c)
let ix = s.startIndex
let ix2 = s.endIndex
let result = s.rangeOfCharacter(from: self as CharacterSet, options: [], range: ix..<ix2)
return result != nil
}
}
The changes in Swift 3.0 mean you no longer need to bridge to NSCharacterSet; you can use Swift's native CharacterSet.
You could do something similar to Jiri's answer directly:
extension CharacterSet {
func contains(_ character: Character) -> Bool {
let string = String(character)
return string.rangeOfCharacter(from: self, options: [], range: string.startIndex..<string.endIndex) != nil
}
}
or do:
func contains(_ character: Character) -> Bool {
let otherSet = CharacterSet(charactersIn: String(character))
return self.isSuperset(of: otherSet)
}
Note: the above crashes and doesn't work due to https://bugs.swift.org/browse/SR-3667. Not sure CharacterSet gets the kind of love it needs.
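Another option that sidesteps isSuperset(of:) is to test each Unicode scalar of the Character with CharacterSet's own contains(_:). This is only a sketch: containsAllScalars(of:) is a hypothetical name, chosen so it does not clash with the existing contains(_:) method.

```swift
import Foundation

extension CharacterSet {
    // True only if every Unicode scalar of `character` is a member of the set.
    func containsAllScalars(of character: Character) -> Bool {
        return character.unicodeScalars.allSatisfy { contains($0) }
    }
}

let justC = CharacterSet(charactersIn: "c")
let circledC: Character = "c\u{20dd}"   // "c" plus a combining enclosing circle

print(justC.containsAllScalars(of: "c"))        // true
print(justC.containsAllScalars(of: circledC))   // false: U+20DD is not in the set
```

Requiring all scalars to match follows the reasoning in the earlier answer about composed characters; a looser rule (any scalar matches) would be a one-word change to contains(where:).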