Convert Unicode symbols \uXXXX in a String to Character in Swift

I'm receiving a string via a REST API that contains Unicode-encoded characters in the form \uXXXX,
e.g. Ain\u2019t, which should be Ain’t.
Is there a nice way to convert these?

You can use \u{my_unicode}:
print("Ain\u{2019}t this a beautiful day")
// Prints "Ain’t this a beautiful day"
From the Language Guide - Strings and Characters - Unicode:
String literals can include the following special characters:
...
An arbitrary Unicode scalar, written as \u{n}, where n is a 1–8 digit
hexadecimal number with a value equal to a valid Unicode code point
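For illustration, here is a minimal sketch of that escape with different digit counts (the code points are arbitrary examples, not taken from the question):
// \u{n} accepts 1–8 hex digits, as long as the value is a valid Unicode scalar.
print("\u{61}")    // "a"  (U+0061)
print("\u{1F436}") // "🐶" (U+1F436)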

If the string arrives at runtime (as from your REST API), the \u{n} escape won't help, since it only works in string literals; instead you can apply a StringTransform:
extension String {
    var decodingUnicodeCharacters: String {
        applyingTransform(.init("Hex-Any"), reverse: false) ?? ""
    }
}
let string = #"Ain\u2019t"#
print(string.decodingUnicodeCharacters) // "Ain’t"
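If you prefer a named ICU transform, a roughly equivalent sketch (assuming the input really uses the \uXXXX notation) applies "Any-Hex/Java" in reverse, i.e. decoding instead of encoding:
import Foundation

let raw = #"Ain\u2019t"#
// "Any-Hex/Java" encodes characters as \uXXXX; reverse: true decodes them.
let decoded = raw.applyingTransform(StringTransform("Any-Hex/Java"), reverse: true)
print(decoded ?? raw) // "Ain’t"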

Related

Swift: Simple method to replace a single character in a String?

I wanted to replace the first character of a String and got it to work like this:
s.replaceSubrange(Range(NSMakeRange(0, 1), in: s)!, with: ".")
I wonder if there is a simpler method to achieve the same result?
[edit]
The approach in "Get nth character of a string in Swift programming language" doesn't provide a mutable substring, and it requires writing a String extension, which doesn't really help when trying to shorten code.
To replace the first character, you can use String concatenation with dropFirst():
var s = "😃hello world!"
s = "." + s.dropFirst()
print(s)
Result:
.hello world!
Note: This will not crash if the String is empty; it will just create a String with the replacement character.
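If you need to replace a character at an arbitrary position rather than the first one, a minimal sketch (assuming the offset is within the string's bounds) can use String.Index with replaceSubrange:
var s = "😃hello world!"
// Offsets count Characters (grapheme clusters), so the emoji is one step.
let i = s.index(s.startIndex, offsetBy: 1)
s.replaceSubrange(i...i, with: "H")
print(s) // "😃Hello world!"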
Strings work very differently in Swift than in many other languages. In Swift, a Character is not a single byte but a single visual element (an extended grapheme cluster). This is very important when working with multibyte characters like emoji (see: Why are emoji characters like 👩‍👩‍👧‍👦 treated so strangely in Swift strings?)
If you really do want to set a single random byte of your string to an arbitrary value as you expanded on in the comments of your question, you'll need to drop out of the string abstraction and work with your data as a buffer. This is sort of gross in Swift thanks to various safety features but it's doable:
var input = "Hello, world!"
// Access the byte buffer.
var utf8Buffer = input.utf8CString
// Replace the first byte with whatever random data we want.
utf8Buffer[0] = 46 // ASCII encoding of '.'
// Now convert back to a Swift string.
var output: String! = nil // Buffer for holding our new target
utf8Buffer.withUnsafeBufferPointer { ptr in
    // Load the byte buffer into a Swift string.
    output = String(cString: ptr.baseAddress!)
}
print(output!) // .ello, world!

Invalid dog face scalar

I thought I understood Unicode scalars in Swift pretty well, but the dog face emoji proved me wrong.
for code in "🐶".utf16 {
    print(code)
}
The UTF-16 codes are 55357 and 56374. In hex, that's d83d and dc36.
Now:
let dog = "\u{d83d}\u{dc36}"
Instead of getting a string with "🐶", I'm getting an error:
Invalid unicode scalar
I tried with the UTF-8 codes and it didn't work either. It didn't throw an error, but it returned "ð¶" instead of the dog face.
What is wrong here?
The \u{nnnn} escape sequence expects a Unicode scalar value, not the UTF-16 representation (with high and low surrogates):
for code in "🐶".unicodeScalars {
    print(String(code.value, radix: 16))
}
// 1f436
let dog = "\u{1F436}"
print(dog) // 🐶
Solutions to reconstruct a string from its UTF-16 representation can be found at "Is there a way to create a String from utf16 array in swift?". For example:
let utf16: [UInt16] = [ 0xd83d, 0xdc36 ]
let dog = String(utf16CodeUnits: utf16, count: utf16.count)
print(dog) // 🐶
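Another option, sketched here assuming you want malformed input repaired rather than a failure, is the generic String(decoding:as:) initializer:
let utf16: [UInt16] = [0xd83d, 0xdc36]
// Decodes a collection of UTF-16 code units; invalid sequences are
// replaced with U+FFFD instead of failing.
let dog = String(decoding: utf16, as: UTF16.self)
print(dog) // 🐶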

Swift string to ascii array Cyrillic

This string extension works with Latin characters, but doesn't work with Cyrillic. Can someone explain why, and how can I fix it?
extension String {
    var asciiArray: [UInt32] {
        return unicodeScalars.filter { $0.isASCII }.map { $0.value }
    }
}
I think you are confusing the original ASCII standard with one of its Cyrillic extensions (such as KOI8-R). The original ASCII is 7-bit, whereas an extension uses the codes above 127 (and up to 255) for its own purposes.
Swift's isASCII property on the UnicodeScalar type indicates whether a scalar is from the original 7-bit ASCII range, so Cyrillic scalars are filtered out by your extension.
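If the goal is simply "a numeric code for every character", Cyrillic included, one option is to drop the filter and take the Unicode scalar values directly (the property name below is just an illustration, not part of the original extension):
extension String {
    // Unicode scalar values for every character, ASCII or not.
    var scalarValues: [UInt32] {
        return unicodeScalars.map { $0.value }
    }
}
print("Привет".scalarValues) // [1055, 1088, 1080, 1074, 1077, 1090]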

How to create a single character String

I've reproduced this problem in a Swift playground but haven't solved it yet...
I'd like to print one of a range of characters in a UILabel. If I explicitly declare the character, it works:
// This works.
let value: String = "\u{f096}"
label.text = value // Displays the referenced character.
However, I want to construct the String. The code below appears to produce the same result as the line above, except that it doesn't. It just produces the String \u{f096} and not the character it references.
// This doesn't work
let n: Int = 0x95 + 1
print(String(n, radix: 16)) // Prints "96".
let value: String = "\\u{f0\(String(n, radix: 16))}"
label.text = value // Displays the String "\u{f096}".
I'm probably missing something simple. Any ideas?
How about stopping the string-conversion voodoo and using the standard library type UnicodeScalar?
You can also create Unicode scalar values directly from their numeric representation.
let airplane = UnicodeScalar(9992)
print(airplane)
// Prints "✈︎"
Note that some UnicodeScalar initializers (for example the one taking a UInt32) are failable and return an optional, so you may have to unwrap the result.
If you need a String, just convert it via the Character type:
let airplaneString: String = String(Character(airplane)) // Assuming that airplane here is unwrapped
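Putting it together for the original question, a minimal sketch (adapting the computation so the whole code point is numeric, and assuming the result is a valid scalar) might look like this; you would then assign the result to label.text as in the question:
let n = 0xf095 + 1                          // 0xf096, computed at runtime
if let scalar = UnicodeScalar(UInt32(n)) {  // failable: nil for surrogates etc.
    let value = String(Character(scalar))
    print(value)                            // prints the U+F096 character
}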

How can I get the Unicode codepoint represented by an integer in Swift?

So I know how to convert a String to its UTF-8 representation, like this:
for character in strings.utf8 {
    // For example, "A" will be converted to 65.
    var utf8Value = character
}
I have already read the guide, but I can't find how to convert a Unicode code point represented by an integer to a String. For example: converting 65 to "A". I already tried "\u" + utf8Value, but it still failed.
Is there any way to do this?
If you look at the definition of Character, you can see the following initializer:
init(_ scalar: UnicodeScalar)
If we then look at the struct UnicodeScalar, we see this initializer:
init(_ v: UInt32)
We can put them together, and we get a whole Character:
Character(UnicodeScalar(65))
and if we want it in a string, it's just another initializer away...
1> String(Character(UnicodeScalar(65)))
$R1: String = "A"
Or you can skip the Character step entirely, because String also has an initializer that takes a UnicodeScalar directly:
String(UnicodeScalar(65))
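For completeness, a small round-trip sketch (the code point is an arbitrary example) going from an integer to a String and back:
let codePoint: UInt32 = 0x1F436             // 🐶
if let scalar = UnicodeScalar(codePoint) {  // the UInt32 initializer is failable
    let s = String(Character(scalar))
    print(s)                                // "🐶"
    print(s.unicodeScalars.first!.value)    // 128054
}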