Trying to find the shortest / most compact way to write out ASCII characters in Swift into a single string. For example, in JavaScript you can do '\x00' for the decimal equivalent of 0 in ASCII, or you can write '\0', which is 2 characters shorter. So if you have a lot of these characters, that roughly halves the file size.
Wondering how to write the ASCII characters 0-31 and 127 into a single string in Swift with minimal source length. In JavaScript, that sort of looks like this:
'\0...\33abcdef...\127¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½...'
In general, you would use \u{x} where x is the hex value. In your case \u{0} through \u{1f} and \u{7f}.
As in C-based languages, Swift strings also support \0 for "null", \t for "tab", \n for "newline", and \r for "carriage return". Unlike C, Swift does not support \b or \f.
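For example (my own short illustration of those escapes, not part of the original answer):
let escapes = "\0\t\n\r\u{1f}\u{7f}" // NUL, TAB, LF, CR, US (0x1f), DEL (0x7f)
print(escapes.unicodeScalars.map { $0.value }) // [0, 9, 10, 13, 31, 127]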
If you want to create a single String with all 128 ASCII characters, then you can do:
let ascii = String((0...127).map { Character(Unicode.Scalar(UInt8($0))) })
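As a quick sanity check (my addition, not from the original answer):
print(ascii.unicodeScalars.count) // 128
print(ascii.utf8.count) // 128 – every ASCII scalar is a single UTF-8 byte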
If you have a lot of these characters, maybe put them in a Data object and then convert it to a string:
let data = Data((0...31).map { UInt8($0) } + [127])
let text = String(data: data, encoding: .utf8)!
Based on your comment, you could do:
let tab = Data([9])
let null = Data([0])
let data = "abc".data(using: .utf8)! + tab + null + "morechars".data(using: .utf8)! + tab
I need to change this (string):
"0xab,0xcd,0x00,0x01,0xff,0xff,0xab,0xcd,0x00,0x00,0x00,0x00,0x10,0x00,0x00,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00"
to this (bytes):
[0xab,0xcd,0x00,0x01,0xff,0xff,0xab,0xcd,0x00,0x00,0x00,0x00,0x10,0x00,0x00,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00]
using Swift.
One option is to remove the 0x from each string, then split the remaining comma-separated values into an array. Finally, use compactMap to convert each hex string into a number.
// Your original string
let hexString = "0xab,0xcd,0x00,0x01,0xff,0xff,0xab,0xcd,0x00,0x00,0x00,0x00,0x10,0x00,0x00,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00"
// Remove all of the "0x"
let cleanString = hexString.replacingOccurrences(of: "0x", with: "")
// Create an array of hex strings
let hexStrings = cleanString.components(separatedBy: ",")
// Convert the array of hex strings into bytes (UInt8)
let bytes = hexStrings.compactMap { UInt8($0, radix: 16) }
I used compactMap in case there are any values that aren't valid hex byte values; those are simply dropped.
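As a hypothetical follow-up (not part of the original answer), the parsed bytes can be wrapped in a Data value or formatted back into the original string:
import Foundation
let data = Data(bytes) // 24 bytes
print(bytes.map { String(format: "0x%02x", $0) }.joined(separator: ","))
// 0xab,0xcd,0x00,0x01,... (the original comma-separated string)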
I'm trying to insert a symbol with ASCII code 255 (Telnet IAC) into a String, but when writing the data out as UTF-8 I'm getting a different byte sequence:
var s = "\u{ff}"
print(s.utf8.count) // 2
try! s.write(toFile: "output.txt", atomically: true, encoding: .utf8)
The file contains C3 BF, not FF. I've also tried using
var s = "\(Character(UnicodeScalar(255)))"
but this produced the same result. How to escape it properly?
ASCII defines 128 characters from 0x00 to 0x7F. 0xFF (255) is not included.
In Unicode, U+00FF (in Swift, "\u{ff}") represents "ÿ" (LATIN SMALL LETTER Y WITH DIAERESIS), and its UTF-8 representation is 0xC3 0xBF: in UTF-8, characters with code points from U+0080 to U+07FF are encoded as a two-byte sequence.
Also, 0xFF is not a valid byte in a UTF-8 byte sequence at all, which means you cannot get any 0xFF bytes in a UTF-8 text file.
If you want to output "\u{ff}" as a single-byte 0xFF, use ISO-8859-1 (aka ISO-Latin-1) instead:
try! s.write(toFile: "output.txt", atomically: true, encoding: .isoLatin1)
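As a quick check (my addition), encoding the same string with both encodings shows the difference:
import Foundation
let s = "\u{ff}"
print(s.data(using: .isoLatin1)!.map { String(format: "%02X", $0) }.joined()) // FF
print(s.data(using: .utf8)!.map { String(format: "%02X", $0) }.joined())      // C3BF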
I was wondering what the best way is to convert a UTF-8 array or String to its base-2 representation (each UTF-8 value of each character to its base-2 representation). Since you could have two values representing the code for the same character, I suppose extracting values from the array and then converting them is not a valid method. So which one is? Thank you!
Here is a possible approach:
Enumerate the unicode scalars of the string.
Convert each unicode scalar back to a string, and enumerate its UTF-8 encoding.
Convert each UTF-8 byte to a "binary string".
The last task can be done with the following generic method, which works for all unsigned integer types:
extension UnsignedInteger {
    func toBinaryString() -> String {
        let s = String(self, radix: 2)
        let numBits = 8 * MemoryLayout.size(ofValue: self)
        // Left-pad with zeros to the full bit width of the type.
        return String(repeating: "0", count: numBits - s.count) + s
    }
}
// Example:
// UInt8(100).toBinaryString() = "01100100"
// UInt16.max.toBinaryString() = "1111111111111111"
Then the conversion to a UTF-8 binary representation can be implemented like this:
func binaryUTF8Strings(_ string: String) -> [String] {
    return string.unicodeScalars.map { scalar in
        String(scalar).utf8.map { $0.toBinaryString() }.joined(separator: " ")
    }
}
Example usage:
for u in binaryUTF8Strings("H€llö 🇩🇪") {
print(u)
}
Output:
01001000
11100010 10000010 10101100
01101100
01101100
11000011 10110110
00100000
11110000 10011111 10000111 10101001
11110000 10011111 10000111 10101010
Note that "🇩🇪" is a single character (an "extended grapheme cluster")
but two unicode scalars.
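To illustrate the difference (a small added example):
let flag = "🇩🇪"
print(flag.count)                // 1 – one Character (extended grapheme cluster)
print(flag.unicodeScalars.count) // 2 – two regional indicator scalars
print(flag.utf8.count)           // 8 – four UTF-8 bytes per scalar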
On a String, I can use utf8 and count to get the number of bytes required to encode the String with UTF-8 encoding:
"a".utf8.count // 1
"チャオ".utf8.count // 9
"チ".utf8.count // 3
However, I don't see an equivalent method on a single Character value. To get the number of bytes required to encode a character in the string to UTF-8, I could iterate through the string by character, convert the Character to a String, and get the utf8.count of that String:
"チャオ".characters.forEach({print(String($0).utf8.count)}) // 3, 3, 3
This seems unnecessarily verbose. Is there a way to get the UTF-8 encoding of a Character in Swift?
Character has no direct (public) accessor to its UTF-8 representation.
There are some internal methods in Character.swift dealing with the UTF-8 bytes, but the public stuff is implemented in
String.UTF8View in StringUTF8.swift.
Therefore String(myChar).utf8.count is the correct way to obtain the length of the character's UTF-8 representation.
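For example (a short usage sketch):
let str = "チャオ"
for ch in str {
    print(ch, String(ch).utf8.count) // each of these characters needs 3 UTF-8 bytes
}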