How do you decode utf8-literals like "\xc3\xa6" in Swift 5? - swift

I am fetching a list of WiFi SSID's from a Bluetooth characteristic. Each SSID is represented as a string, some have these UTF8-Literals like "\xc3\xa6".
I have tried multiple ways to decode this like
let s = "\\xc3\\xa6"
let dec = s.utf8
From this I expect
print(dec)
> æ
etc. but it doesn't work, it just results in
print(dec)
> \xc3\xa6
How do I decode UTF-8 literals in Strings in Swift 5?

You'll just have to parse the string, convert each hex string to a UInt8, and then decode the bytes with String.init(bytes:encoding:):
let s = "\\xc3\\xa6"
let bytes = s
.components(separatedBy: "\\x")
// components(separatedBy:) would produce an empty string as the first element
// because the string starts with "\x". We drop this
.dropFirst()
.compactMap { UInt8($0, radix: 16) }
if let decoded = String(bytes: bytes, encoding: .utf8) {
print(decoded)
} else {
print("The UTF8 sequence was invalid!")
}
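As a small illustration of why s.utf8 alone cannot work: the utf8 view only exposes the bytes of the characters that are literally in the string (backslash, x, c, 3, ...), not the bytes those escapes describe. The sketch below assumes the same input string as the question.

```swift
let s = "\\xc3\\xa6"

// The utf8 view exposes the bytes of the eight characters \ x c 3 \ x a 6,
// not the two bytes that the escapes describe.
let escapedBytes = Array(s.utf8)
print(escapedBytes.count)   // 8

// Decoding the two real bytes gives the intended character.
let decoded = String(decoding: [0xC3, 0xA6] as [UInt8], as: UTF8.self)
print(decoded)              // æ
```

So the escapes have to be parsed into bytes first, which is what the answer above does.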


Decoding strings including utf8-literals like '\xc3\xa6' in Swift?

Follow up question to my former thread about UTF-8 literals:
It was established that you can decode UTF-8 literals from string like this that exclusively includes UTF-8 literals:
let s = "\\xc3\\xa6"
let bytes = s
.components(separatedBy: "\\x")
// components(separatedBy:) would produce an empty string as the first element
// because the string starts with "\x". We drop this
.dropFirst()
.compactMap { UInt8($0, radix: 16) }
if let decoded = String(bytes: bytes, encoding: .utf8) {
print(decoded)
} else {
print("The UTF8 sequence was invalid!")
}
However this only works if the string only contains UTF-8 literals. As I am fetching a Wi-Fi list of names that has these UTF-8 literals within, how do I go about decoding the entire string?
Example:
let s = "This is a WiFi Name \\xc3\\xa6 including UTF-8 literals \\xc3\\xb8"
With the expected result:
print(s)
> This is a WiFi Name æ including UTF-8 literals ø
In Python there is a simple solution to this:
contents = source_file.read()
uni = contents.decode('unicode-escape')
enc = uni.encode('latin1')
dec = enc.decode('utf-8')
Is there a similar way to decode these strings in Swift 5?
To start with, add the decoding code to a String extension as a computed property (or create a function):
extension String {
var decodeUTF8: String {
let bytes = self.components(separatedBy: "\\x")
.dropFirst()
.compactMap { UInt8($0, radix: 16) }
return String(bytes: bytes, encoding: .utf8) ?? self
}
}
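Then use a regular expression in a while loop to replace every match. Note that string must be declared with var, that the pattern matches exactly two escaped bytes (so only two-byte UTF-8 sequences are handled), and that the loop never ends if a matched pair is not valid UTF-8, because decodeUTF8 then returns the text unchanged: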
while let range = string.range(of: #"(\\x[a-f0-9]{2}){2}"#, options: [.regularExpression, .caseInsensitive]) {
string.replaceSubrange(range, with: String(string[range]).decodeUTF8)
}
As far as I know there's no native Swift solution to this. To make it look as compact as the Python version at the call site you can build an extension on String to hide the complexity
extension String {
func replacingUtf8Literals() -> Self {
let regex = #"(\\x[a-fA-F0-9]{2})+"#
var str = self
while let range = str.range(of: regex, options: .regularExpression) {
let literalbytes = str[range]
.components(separatedBy: "\\x")
.dropFirst()
.compactMap{UInt8($0, radix: 16)}
guard let actuals = String(bytes: literalbytes, encoding: .utf8) else {
fatalError("Regex error")
}
str.replaceSubrange(range, with: actuals)
}
return str
}
}
This lets you call
print(s.replacingUtf8Literals())
// prints: This is a WiFi Name æ including UTF-8 literals ø
For convenience I'm trapping a failed conversion with fatalError. You may want to handle this more gracefully in production code: the conversion fails whenever the matched bytes are not valid UTF-8, which can happen with real input even when the regex is right. There needs to be some form of break or thrown error here, otherwise you have an infinite loop.
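If trapping is not acceptable, one alternative is to skip the regex and scan the string's UTF-8 bytes once, replacing each \xNN escape with its byte and decoding everything at the end. The following is only a sketch under stated assumptions: decodingHexEscapes is a hypothetical name, only a lowercase x is treated as an escape marker, and nil is returned when the resulting bytes are not valid UTF-8.

```swift
import Foundation

extension String {
    /// Hypothetical helper (not from the answers above): replaces every
    /// "\xNN" escape with the byte it names, then decodes the result as UTF-8.
    /// Returns nil if the resulting byte sequence is not valid UTF-8.
    func decodingHexEscapes() -> String? {
        // Value of an ASCII hex digit, or nil for any other byte.
        func hexValue(_ byte: UInt8) -> UInt8? {
            guard let v = Character(Unicode.Scalar(byte)).hexDigitValue else { return nil }
            return UInt8(v)
        }

        let input = Array(self.utf8)
        var output = [UInt8]()
        output.reserveCapacity(input.count)

        var i = 0
        while i < input.count {
            // An escape is a backslash, an "x", and two hex digits.
            if i + 3 < input.count,
               input[i] == UInt8(ascii: "\\"),
               input[i + 1] == UInt8(ascii: "x"),
               let hi = hexValue(input[i + 2]),
               let lo = hexValue(input[i + 3]) {
                output.append(hi << 4 | lo)
                i += 4
            } else {
                // Anything else is copied through unchanged.
                output.append(input[i])
                i += 1
            }
        }
        return String(bytes: output, encoding: .utf8)
    }
}

let s = "This is a WiFi Name \\xc3\\xa6 including UTF-8 literals \\xc3\\xb8"
print(s.decodingHexEscapes() ?? s)
```

Because the bytes are decoded only once at the end, multi-byte sequences of any length are handled, and an invalid sequence yields nil instead of a crash or an endless loop.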

Convert UTF-8 (Bytes) Emoji Code to Emoji icon as a text

I am getting this below string as a response from WS API when they send emoji as a string:
let strTemp = "Hii \\xF0\\x9F\\x98\\x81"
I want it to be converted to the emoji icon like this -> Hii 😁
I think it is arriving in UTF-8 format.
I tried decoding it online with a UTF-8 decoder, and the emoticon was decoded successfully.
But the issue here is I do not know how to work with it in Swift.
I referred to the following link, but it did not work for me:
Swift Encode/decode emojis
Any help would be appreciated.
Thanks.
As the converter tool you linked shows, this is plain UTF-8 encoding and decoding. You have a UTF-8 encoded string, so here is an example of UTF-8 decoding.
Objective-C
const char *ch = [@"Hii \xF0\x9F\x98\x81" cStringUsingEncoding:NSUTF8StringEncoding];
NSString *decode_string = [NSString stringWithUTF8String:ch];
NSLog(@"%@",decode_string);
Output: Hii 😁
Swift
I'm able to convert \\xF0\\x9F\\x98\\x81 to 😁 in Swift.
First I converted the hex string into Data and then back to a String using UTF-8 encoding.
var str = "\\xF0\\x9F\\x98\\x81"
if let data = data(fromHexaStr: str) {
print(String(data: data, encoding: String.Encoding.utf8) ?? "")
}
Output: 😁
Below is the function I used to convert the hexa string into data. I followed this answer.
func data(fromHexaStr hexaStr: String) -> Data? {
var data = Data(capacity: hexaStr.count / 2)
let regex = try! NSRegularExpression(pattern: "[0-9a-f]{1,2}", options: .caseInsensitive)
regex.enumerateMatches(in: hexaStr, range: NSMakeRange(0, hexaStr.utf16.count)) { match, flags, stop in
let byteString = (hexaStr as NSString).substring(with: match!.range)
var num = UInt8(byteString, radix: 16)!
data.append(&num, count: 1)
}
guard data.count > 0 else { return nil }
return data
}
Note: The problem with the above code is that it only converts pure hex strings, not strings that mix hex escapes with other text.
FINAL WORKING SOLUTION: SWIFT
I did this with a for loop instead of the [0-9a-f]{1,2} regex, because the regex also matches any other pair of hex digits in the text (81, 9F, any two-digit number), which is obviously wrong.
For example: 81 INR \\xF0\\x9F\\x98\\x81.
/// This line will convert "F0" into hexa bytes
let byte = UInt8("F0", radix: 16)
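To make the problem with the regex concrete, here is a small sketch that runs the [0-9a-f]{1,2} pattern over the mixed example string and lists what it matches. The variable names are only for illustration.

```swift
import Foundation

let text = "81 INR \\xF0\\x9F\\x98\\x81"
let regex = try! NSRegularExpression(pattern: "[0-9a-f]{1,2}", options: .caseInsensitive)
let range = NSRange(text.startIndex..., in: text)

// Collect every substring the pattern matches.
let hits = regex.matches(in: text, options: [], range: range).compactMap { match in
    Range(match.range, in: text).map { String(text[$0]) }
}
print(hits)   // ["81", "F0", "9F", "98", "81"]

// The leading "81" from the plain text becomes a stray byte, so decoding fails.
let bytes = hits.compactMap { UInt8($0, radix: 16) }
print(String(bytes: bytes, encoding: .utf8) == nil)   // true
```

The plain-text "81" is indistinguishable from an escaped byte once the \x prefix is ignored, which is why the prefix has to be part of the check.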
I made a String extension that collects candidate escapes four characters at a time: if a group starts with \x, is four characters long, and its last two characters convert to a byte using radix 16 as shown above, it is decoded as that byte.
extension String {
func hexaDecoededString() -> String {
var newData = Data()
var emojiStr: String = ""
for char in self {
let str = String(char)
if (emojiStr.isEmpty && str == "\\") || (emojiStr == "\\" && str.lowercased() == "x") {
/// Buffer only a backslash, or an "x" directly after that backslash
emojiStr.append(str)
}
else if emojiStr.hasPrefix("\\x") || emojiStr.hasPrefix("\\X") {
emojiStr.append(str)
if emojiStr.count == 4 {
/// It can be a hexa value
let value = String(emojiStr.dropFirst(2))
if let byte = UInt8(value, radix: 16) {
newData.append(byte)
}
else {
newData.append(emojiStr.data(using: .utf8)!)
}
/// Reset emojiStr
emojiStr = ""
}
}
else {
/// Not an escape: append any buffered backslash plus this character as it is
newData.append((emojiStr + str).data(using: .utf8)!)
emojiStr = ""
}
}
/// Flush an incomplete escape left over at the end of the string
newData.append(emojiStr.data(using: .utf8)!)
let decodedString = String(data: newData, encoding: String.Encoding.utf8)
return decodedString ?? ""
}
}
USAGE:
var hexaStr = "Hi \\xF0\\x9F\\x98\\x81 81"
print(hexaStr.hexaDecoededString())
Hi 😁 81
hexaStr = "Welcome to SP19!\\xF0\\x9f\\x98\\x81"
print(hexaStr.hexaDecoededString())
Welcome to SP19!😁
I fixed your issue, but it needs more work to make it general. The problem is that your emoji is represented by hex bytes such as x9F, so we have to convert each hex value to a byte, collect the bytes into Data, and finally decode the Data as a UTF-8 String.
Final result: Hii 😁. Please read the comments.
let strTemp = "Hii \\xF0\\x9F\\x98\\x81"
let regex = try! NSRegularExpression(pattern: "[0-9a-f]{1,2}", options: .caseInsensitive)
// get all matched hex values: F0, 9F, ... etc.
let matches = regex.matches(in: strTemp, options: [], range: NSMakeRange(0, strTemp.utf16.count))
// Data that will hold the bytes converted from hex
var emojiData = Data(capacity: strTemp.count / 2)
matches.enumerated().forEach { (offset, check) in
let byteString = (strTemp as NSString).substring(with: check.range)
var num = UInt8(byteString, radix: 16)!
emojiData.append(&num, count: 1)
}
let subStringEmoji = String(data: emojiData, encoding: String.Encoding.utf8)!
// now we have the emoji text 😁 and can replace its escaped form in the string using the `first` and `last` matched ranges
// Full range of \\xF0\\x9F\\x98\\x81 in "Hii \\xF0\\x9F\\x98\\x81", to be replaced by the emoji
if let start = matches.first?.range.location, let end = matches.last?.range.location, let endLength = matches.last?.range.length {
// step back over the leading "\x" of the first escape
let startLocation = start - 2
let length = end - startLocation + endLength
let sub = (strTemp as NSString).substring(with: NSRange(location: startLocation, length: length))
print(strTemp.replacingOccurrences(of: sub, with: subStringEmoji))
// Hii 😁
}

How to use Big5 encoding in Swift on iOS

I'm scanning a QR code with Chinese characters encoded in Big5. (主页概况)
Is there a chance to get this String decoded correctly in Swift 3?
I found this Objective-C example on GitHub and this SO question, but there are no kCFStringEncodingBig5_HKSCS_1999 and kCFStringEncodingBig5 constants in Swift.
Update:
I found the corresponding Swift variables, so I tried the following:
func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputMetadataObjects metadataObjects: [Any]!, from connection: AVCaptureConnection!) {
guard metadataObjects?.count ?? 0 > 0 else {
return
}
guard let metadata = metadataObjects.first as? AVMetadataMachineReadableCodeObject, let code = metadata.stringValue else {
return
}
let big5encoding = String.Encoding(rawValue: CFStringConvertEncodingToNSStringEncoding(CFStringEncoding(CFStringEncodings.big5.rawValue)))
print("Big5 encoded String: " + (String(data: code.data(using: .nonLossyASCII)!, encoding: big5encoding) ?? "?"))
}
Output: Big5 encoded String: \326\367\322\263\270\305\277\366
How can I get the expected output Big5 encoded String: 主页概况?
Update 2:
It seems that my QR code contained some corrupt data, so I created a new code. This time the content is definitely a Big5 encoded String (an Android app reads it correctly). The content is 傳統.
When I scan this code with my iOS app, metadata.stringValue returns the Japanese String カヌイホ.
What the hell is going on here???
CFStringEncodings are defined as enumeration values in Swift 3:
public enum CFStringEncodings : CFIndex {
// ...
case big5 /* Big-5 (has variants) */
// ...
case big5_HKSCS_1999 /* Big-5 with Hong Kong special char set supplement*/
// ...
}
so you have to convert
CFStringEncodings -> CFStringEncoding -> NSStringEncoding -> String.Encoding
Example:
let cfEnc = CFStringEncodings.big5
let nsEnc = CFStringConvertEncodingToNSStringEncoding(CFStringEncoding(cfEnc.rawValue))
let big5encoding = String.Encoding(rawValue: nsEnc) // String.Encoding
Then big5encoding can be used for conversion between String and (NS)Data.
In your case you have a string where each unicode scalar corresponds to
a byte of the Big5 encoding. Then the following should work:
// let code = "\u{00D6}\u{00F7}\u{00D2}\u{00B3}\u{00B8}\u{00C5}\u{00BF}\u{00F6}"
// Swift 4 and later; in Swift 3 this initializer was spelled UInt8(truncatingBitPattern:)
let bytes = code.unicodeScalars.map { UInt8(truncatingIfNeeded: $0.value) }
if let result = String(bytes: bytes, encoding: big5encoding) {
print(result)
}
Alternatively, using the fact that the ISO Latin 1 encoding maps
the Unicode code points U+0000 .. U+00FF to the bytes 0x00 .. 0xFF:
if let data = code.data(using: .isoLatin1),
let result = String(data: data, encoding: big5encoding) {
print(result)
}

Convert hex-encoded String to String

I want to convert following hex-encoded String in Swift 3:
dcb04a9e103a5cd8b53763051cef09bc66abe029fdebae5e1d417e2ffc2a07a4
to its equivalent String:
Ü°J:\ص7cï ¼f«à)ýë®^A~/ü*¤
Following websites do the job very fine:
http://codebeautify.org/hex-string-converter
http://string-functions.com/hex-string.aspx
But I am unable to do the same in Swift 3. Following code doesn't do the job too:
func convertHexStringToNormalString(hexString:String)->String!{
if let data = hexString.data(using: .utf8){
return String.init(data:data, encoding: .utf8)
}else{ return nil}
}
Your code doesn't do what you think it does. This line:
if let data = hexString.data(using: .utf8){
means "encode these characters into UTF-8." That means that "01" doesn't encode to 0x01 (1), it encodes to 0x30 0x31 ("0" "1"). There's no "hex" in there anywhere.
This line:
return String.init(data:data, encoding: .utf8)
just takes the encoded UTF-8 data, interprets it as UTF-8, and returns it. These two methods are symmetrical, so you should expect this whole function to return whatever it was handed.
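A quick way to see this, as a minimal sketch using the "01" example from above:

```swift
import Foundation

let hexString = "01"

// data(using: .utf8) encodes the characters "0" and "1", not the byte 0x01.
let data = hexString.data(using: .utf8)!
print(Array(data))                 // [48, 49], i.e. 0x30 0x31

// Decoding those same bytes as UTF-8 simply returns the input.
let roundTripped = String(data: data, encoding: .utf8)
print(roundTripped == hexString)   // true
```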
Pulling together Martin and Larme's comments into one place here. This appears to be encoded in Latin-1. (This is a really awkward way to encode this data, but if it's what you're looking for, I think that's the encoding.)
import Foundation
extension Data {
// From http://stackoverflow.com/a/40278391:
init?(fromHexEncodedString string: String) {
// Convert 0 ... 9, a ... f, A ...F to their decimal value,
// return nil for all other input characters
func decodeNibble(u: UInt16) -> UInt8? {
switch(u) {
case 0x30 ... 0x39:
return UInt8(u - 0x30)
case 0x41 ... 0x46:
return UInt8(u - 0x41 + 10)
case 0x61 ... 0x66:
return UInt8(u - 0x61 + 10)
default:
return nil
}
}
self.init(capacity: string.utf16.count/2)
var even = true
var byte: UInt8 = 0
for c in string.utf16 {
guard let val = decodeNibble(u: c) else { return nil }
if even {
byte = val << 4
} else {
byte += val
self.append(byte)
}
even = !even
}
guard even else { return nil }
}
}
let d = Data(fromHexEncodedString: "dcb04a9e103a5cd8b53763051cef09bc66abe029fdebae5e1d417e2ffc2a07a4")!
let s = String(data: d, encoding: .isoLatin1)
You want to use the hex encoded data as an AES key, but the
data is not a valid UTF-8 sequence. You could interpret
it as a string in ISO Latin encoding, but the AES(key: String, ...)
initializer converts the string back to its UTF-8 representation,
i.e. you'll get different key data from what you started with.
Therefore you should not convert it to a string at all. Use the
extension Data {
init?(fromHexEncodedString string: String)
}
method from hex/binary string conversion in Swift
to convert the hex encoded string to Data and then pass that
as an array to the AES(key: Array<UInt8>, ...) initializer:
let hexkey = "dcb04a9e103a5cd8b53763051cef09bc66abe029fdebae5e1d417e2ffc2a07a4"
let key = Array(Data(fromHexEncodedString: hexkey)!)
let encrypted = try AES(key: key, ....)
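The point about getting different key data can be checked without any crypto library. The sketch below takes only the first two bytes of the key, reads them as ISO Latin 1, and re-encodes the resulting string as UTF-8. It illustrates the encoding effect only and makes no claim about the AES library's internals beyond what is stated above.

```swift
import Foundation

// The first two bytes of the hex key from the question.
let keyBytes: [UInt8] = [0xDC, 0xB0]

// Interpreting them as ISO Latin 1 always succeeds: each byte becomes one scalar.
let asLatin1 = String(bytes: keyBytes, encoding: .isoLatin1)!

// Re-encoding that string as UTF-8 yields different (and more) bytes.
let reencoded = Array(asLatin1.utf8)
print(reencoded.map { String($0, radix: 16) })   // ["c3", "9c", "c2", "b0"]
```

Two key bytes have become four, so a 32-byte key passed through a String would no longer be the same key.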
It is still possible to convert the key from hex to a readable string by adding the extension below:
extension String {
func hexToString()->String{
var finalString = ""
let chars = Array(self)
for count in stride(from: 0, to: chars.count - 1, by: 2){
let firstDigit = Int.init("\(chars[count])", radix: 16) ?? 0
let lastDigit = Int.init("\(chars[count + 1])", radix: 16) ?? 0
let decimal = firstDigit * 16 + lastDigit
let decimalString = String(format: "%c", decimal) as String
finalString.append(Character.init(decimalString))
}
return finalString
}
func base64Decoded() -> String? {
guard let data = Data(base64Encoded: self) else { return nil }
return String(data: data, encoding: .init(rawValue: 0))
}
}
Example of use:
let hexToString = secretKey.hexToString()
let base64ReadableKey = hexToString.base64Decoded() ?? ""

Convert emoji to hex value using Swift

I'm trying to convert emojis in hex values, I found some code online to do it but it's only working using Objective C, how to do the same with Swift?
This is a "pure Swift" method, without using Foundation:
let smiley = "😊"
let uni = smiley.unicodeScalars // Unicode scalar values of the string
let unicode = uni[uni.startIndex].value // First element as an UInt32
print(String(unicode, radix: 16, uppercase: true))
// Output: 1F60A
Note that a Swift Character represents a "Unicode grapheme cluster"
(compare Strings in Swift 2 from the Swift blog) which can
consist of several "Unicode scalar values". Taking the example
from @TomSawyer's comment below:
let zero = "0️⃣"
let uni = zero.unicodeScalars // Unicode scalar values of the string
let unicodes = uni.map { $0.value }
print(unicodes.map { String($0, radix: 16, uppercase: true) } )
// Output: ["30", "FE0F", "20E3"]
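The two snippets above can be wrapped in a small extension. This is a sketch only: scalarHexValues and utf8HexValues are hypothetical names, and the second property is added here to show the UTF-8 bytes, which is the form used in \xNN escapes.

```swift
extension String {
    /// Hypothetical helper: hex value of every Unicode scalar in the string.
    var scalarHexValues: [String] {
        return unicodeScalars.map { String($0.value, radix: 16, uppercase: true) }
    }

    /// Hypothetical helper: hex value of every UTF-8 byte in the string.
    var utf8HexValues: [String] {
        return utf8.map { String($0, radix: 16, uppercase: true) }
    }
}

let keycap = "0\u{FE0F}\u{20E3}"      // the keycap emoji 0️⃣, written with escapes
print(keycap.count)                    // 1 (a single Character)
print(keycap.scalarHexValues)          // ["30", "FE0F", "20E3"]
print("\u{1F60A}".utf8HexValues)       // ["F0", "9F", "98", "8A"]
```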
If someone is trying to find a way to convert an emoji to a Unicode escape string:
extension String {
func decode() -> String {
let data = self.data(using: .utf8)!
return String(data: data, encoding: .nonLossyASCII) ?? self
}
func encode() -> String {
let data = self.data(using: .nonLossyASCII, allowLossyConversion: true)!
return String(data: data, encoding: .utf8)!
}
}
Example:
"😍".encode()
RESULT: \ud83d\ude0d
"\\ud83d\\ude0d".decode()
RESULT: 😍
It works similarly, but pay attention when you're printing it:
import Foundation
let smiley = "😊"
// UTF-32 little-endian data: the four bytes of each scalar value, lowest byte first
let data = smiley.data(using: .utf32LittleEndian, allowLossyConversion: false)!
// Reassemble the first four bytes into a UInt32
let unicode = data.prefix(4).reversed().reduce(UInt32(0)) { ($0 << 8) | UInt32($1) }
// print(unicode) // Prints the decimal value
print(String(format: "%2X", unicode)) // Prints the hex value of the smiley