URL-encode Cyrillic characters in Swift

I need to convert a Cyrillic string to its URL-encoded version using the Windows-1251 encoding. For the following example string:
Моцарт
The correct result should be:
%CC%EE%F6%E0%F0%F2
I tried addingPercentEncoding(withAllowedCharacters:) but it doesn't work.
How can I achieve the desired result in Swift?

NSString has an addingPercentEscapes(using:) method, which lets you specify an arbitrary
encoding:
let text = "Моцарт"
if let encoded = (text as NSString).addingPercentEscapes(using: String.Encoding.windowsCP1251.rawValue) {
print(encoded)
// %CC%EE%F6%E0%F0%F2
}
However, this is deprecated as of iOS 9/macOS 10.11. It causes compiler warnings and may not be available in newer OS versions.
What you can do instead is convert the string to Data with
the desired encoding,
and then convert each byte to the corresponding %NN sequence (using the approach from
How to convert Data to hex string in swift):
let text = "Моцарт"
if let data = text.data(using: .windowsCP1251) {
let encoded = data.map { String(format: "%%%02hhX", $0) }.joined()
print(encoded)
// %CC%EE%F6%E0%F0%F2
}
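Note that the snippet above escapes every byte, including plain ASCII letters and digits. If you would rather leave the RFC 3986 unreserved characters untouched, something along these lines might work. This is only a sketch: percentEncoded(_:using:) is a made-up helper name, and it assumes the target encoding is ASCII-compatible (as Windows-1251 is), so that bytes below 0x80 mean the same thing as in ASCII.

```swift
import Foundation

// Hypothetical helper, not part of Foundation: percent-encodes `text` in the
// given encoding and leaves RFC 3986 unreserved characters unescaped.
// Assumes the encoding is ASCII-compatible (true for Windows-1251).
func percentEncoded(_ text: String, using encoding: String.Encoding) -> String? {
    // data(using:) returns nil if the string cannot be represented losslessly.
    guard let data = text.data(using: encoding) else { return nil }
    let unreserved = Set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~".utf8)
    return data.map { byte in
        unreserved.contains(byte)
            ? String(UnicodeScalar(byte))          // keep the ASCII character as-is
            : String(format: "%%%02hhX", byte)     // escape everything else as %NN
    }.joined()
}

print(percentEncoded("Моцарт", using: .windowsCP1251) ?? "conversion failed")
print(percentEncoded("Моцарт 40", using: .windowsCP1251) ?? "conversion failed")
```

For the purely Cyrillic example the output is the same as before; the difference only shows up when the input also contains ASCII characters.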

Related

How do you decode utf8-literals like "\xc3\xa6" in Swift 5?

I am fetching a list of WiFi SSID's from a Bluetooth characteristic. Each SSID is represented as a string, some have these UTF8-Literals like "\xc3\xa6".
I have tried multiple ways to decode this like
let s = "\\xc3\\xa6"
let dec = s.utf8
From this I expect
print(dec)
> æ
etc. but it doesn't work, it just results in
print(dec)
> \xc3\xa6
How do I decode UTF-8 literals in Strings in Swift 5?
You'll just have to parse the string, convert each hex string to a UInt8, and then decode the bytes with String.init(bytes:encoding:):
let s = "\\xc3\\xa6"
let bytes = s
.components(separatedBy: "\\x")
// components(separatedBy:) would produce an empty string as the first element
// because the string starts with "\x". We drop this
.dropFirst()
.compactMap { UInt8($0, radix: 16) }
if let decoded = String(bytes: bytes, encoding: .utf8) {
print(decoded)
} else {
print("The UTF8 sequence was invalid!")
}

Decoding strings including utf8-literals like '\xc3\xa6' in Swift?

Follow-up question to my earlier thread about UTF-8 literals:
It was established that you can decode a string that consists exclusively of UTF-8 literals like this:
let s = "\\xc3\\xa6"
let bytes = s
.components(separatedBy: "\\x")
// components(separatedBy:) would produce an empty string as the first element
// because the string starts with "\x". We drop this
.dropFirst()
.compactMap { UInt8($0, radix: 16) }
if let decoded = String(bytes: bytes, encoding: .utf8) {
print(decoded)
} else {
print("The UTF8 sequence was invalid!")
}
However this only works if the string only contains UTF-8 literals. As I am fetching a Wi-Fi list of names that has these UTF-8 literals within, how do I go about decoding the entire string?
Example:
let s = "This is a WiFi Name \\xc3\\xa6 including UTF-8 literals \\xc3\\xb8"
With the expected result:
print(s)
> This is a WiFi Name æ including UTF-8 literals ø
In Python there is a simple solution to this:
contents = source_file.read()
uni = contents.decode('unicode-escape')
enc = uni.encode('latin1')
dec = enc.decode('utf-8')
Is there a similar way to decode these strings in Swift 5?
To start with, add the decoding code to a String extension as a computed property (or create a function):
extension String {
var decodeUTF8: String {
let bytes = self.components(separatedBy: "\\x")
.dropFirst()
.compactMap { UInt8($0, radix: 16) }
return String(bytes: bytes, encoding: .utf8) ?? self
}
}
Then use a regular expression in a while loop to replace every match (string must be declared with var, and this pattern only covers sequences of exactly two escaped bytes):
while let range = string.range(of: #"(\\x[a-f0-9]{2}){2}"#, options: [.regularExpression, .caseInsensitive]) {
string.replaceSubrange(range, with: String(string[range]).decodeUTF8)
}
As far as I know, there's no native Swift solution to this. To make it look as compact as the Python version at the call site, you can build an extension on String to hide the complexity:
extension String {
func replacingUtf8Literals() -> Self {
let regex = #"(\\x[a-fA-F0-9]{2})+"#
var str = self
while let range = str.range(of: regex, options: .regularExpression) {
let literalbytes = str[range]
.components(separatedBy: "\\x")
.dropFirst()
.compactMap{UInt8($0, radix: 16)}
guard let actuals = String(bytes: literalbytes, encoding: .utf8) else {
fatalError("Regex error")
}
str.replaceSubrange(range, with: actuals)
}
return str
}
}
This lets you call
print(s.replacingUtf8Literals())
// prints: This is a WiFi Name æ including UTF-8 literals ø
For convenience I'm trapping a failed conversion with fatalError. You may want to handle this more gracefully in production code: the conversion fails whenever the escaped bytes do not form valid UTF-8, even if the regex itself is correct. Some form of break or thrown error is needed at that point, otherwise the loop never terminates.
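If you want to avoid both the regex and the fatalError, one alternative is to walk the UTF-8 bytes once and return nil on invalid input. This is a rough sketch rather than a drop-in replacement; decodingHexEscapes(_:) is a hypothetical name, and it assumes the escapes always have the form \x followed by exactly two hex digits.

```swift
import Foundation

// Hypothetical helper: replaces every "\xNN" escape with the byte it denotes,
// copies all other bytes unchanged, and decodes the result as UTF-8.
// Returns nil if the resulting byte sequence is not valid UTF-8.
func decodingHexEscapes(_ input: String) -> String? {
    let source = Array(input.utf8)
    var output: [UInt8] = []
    var i = 0
    while i < source.count {
        // An escape is a backslash, an "x", and two hex digits.
        if i + 3 < source.count,
           source[i] == UInt8(ascii: "\\"),
           source[i + 1] == UInt8(ascii: "x"),
           let high = Character(UnicodeScalar(source[i + 2])).hexDigitValue,
           let low = Character(UnicodeScalar(source[i + 3])).hexDigitValue {
            output.append(UInt8(high * 16 + low))
            i += 4
        } else {
            // Not an escape: keep the byte as it is.
            output.append(source[i])
            i += 1
        }
    }
    return String(bytes: output, encoding: .utf8)
}

let s = "This is a WiFi Name \\xc3\\xa6 including UTF-8 literals \\xc3\\xb8"
print(decodingHexEscapes(s) ?? "The UTF-8 sequence was invalid!")
```

Because the function returns an optional, the caller decides what to do with a malformed name, for example falling back to the raw string.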

Convert UInt8 Array to String

I have decrypted using AES (CryptoSwift) and am left with a UInt8 array. What's the best approach to convert the UInt8 array into an appropriate string? Casting the array only gives back a string that looks exactly like the array. (When done in Java, a new READABLE string is obtained when casting a byte array to String.)
I'm not sure if this is new to Swift 2, but at least the following works for me:
let chars: [UInt8] = [ 49, 50, 51 ]
var str = String(bytes: chars, encoding: NSUTF8StringEncoding)
In addition, if the array is formatted as a C string (trailing 0), these work:
str = String.fromCString(UnsafePointer(chars)) // UTF-8 is implicit
// or:
str = String(CString: UnsafePointer(chars), encoding: NSUTF8StringEncoding)
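The calls above use Swift 2 era names. In current Swift the same conversions would look roughly like this. It is a sketch that assumes the bytes are meant to be UTF-8; the variable names are only for illustration.

```swift
import Foundation

let chars: [UInt8] = [49, 50, 51]

// Failable: yields nil if the bytes are not valid UTF-8.
let strict = String(bytes: chars, encoding: .utf8)

// Non-failable: invalid sequences are replaced with U+FFFD.
let lenient = String(decoding: chars, as: UTF8.self)

// NUL-terminated buffer: keep only the bytes before the first 0, then decode.
let cBuffer: [UInt8] = [49, 50, 51, 0]
let fromC = String(decoding: cBuffer.prefix(while: { $0 != 0 }), as: UTF8.self)

print(strict ?? "invalid UTF-8", lenient, fromC)
```

The failable initializer is usually the safer choice for decrypted data, since a wrong key or padding tends to produce bytes that are not valid UTF-8.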
I don't know anything about CryptoSwift. But I can read the README:
For your convenience CryptoSwift provides two function to easily convert array of bytes to NSData and other way around:
let data = NSData.withBytes([0x01,0x02,0x03])
let bytes:[UInt8] = data.arrayOfBytes()
So my guess would be: call NSData.withBytes to get an NSData. Now you can presumably call NSString(data:encoding:) to get a string.
SWIFT 3.1
Try this:
let decData = NSData(bytes: enc, length: Int(enc.count))
let base64String = decData.base64EncodedString(options: .lineLength64Characters)
The result is a Base64-encoded string of the bytes, not the decoded text.
Extensions let you add functionality to existing types to fit your needs (my favorite part, I love to customize). Try this one out: put it at the end of your view controller file and call stringToUInt8Extension() in viewDidLoad(). Note that it maps each byte to the Unicode scalar with the same value, so it suits ASCII or Latin-1 data but not multi-byte UTF-8:
func stringToUInt8Extension() {
var cache : [UInt8] = []
for byte : UInt8 in 97..<97+26 {
cache.append(byte)
print(byte)
}
print("The letters of the alphabet are \(String(cache))")
}
extension String {
init(_ bytes: [UInt8]) {
self.init()
for b in bytes {
self.append(Character(UnicodeScalar(b)))
}
}
}

Replace all Special Characters in String with valid URL characters

I cannot figure out how to replace all special characters in a string and convert it to a string I can use in a URL.
What I am using it for:
I am uploading an image, converting it to base64, and then passing it to the Laravel framework, however the base64 string can contain +, /, \, etc. which changes the meaning of the URL.
I can replace the + sign with the following code:
let withoutPlus = image.stringByReplacingCharactersInRange("+", withString: "%2B")
however then I cannot use that as a NSString to try and change the other characters.
Surely there is a way to just target every single special character and convert it something usable in a URL?
You can use stringByAddingPercentEncodingWithAllowedCharacters to escape characters as needed. You pass it an NSCharacterSet containing the characters that are valid for that string (i.e. the ones you don't want replaced). There's a built-in NSCharacterSet for characters allowed in URL query strings that will get you most of the way there, but it includes + and / so you'll need to remove those from the set. You can do that by making a mutable copy of the set and then calling removeCharactersInString:
let allowedCharacters = NSCharacterSet.URLQueryAllowedCharacterSet().mutableCopy() as NSMutableCharacterSet
allowedCharacters.removeCharactersInString("+/=")
Then you can call stringByAddingPercentEncodingWithAllowedCharacters on your string, passing in allowedCharacters:
let encodedImage = image.stringByAddingPercentEncodingWithAllowedCharacters(allowedCharacters)
Note that it will return an optional String (String?) so you'll probably want to use optional binding:
if let encodedImage = image.stringByAddingPercentEncodingWithAllowedCharacters(allowedCharacters) {
/* use encodedImage here */
} else {
/* stringByAddingPercentEncodingWithAllowedCharacters failed for some reason */
}
Example:
let unencodedString = "abcdef/+\\/ghi"
let allowedCharacters = NSCharacterSet.URLQueryAllowedCharacterSet().mutableCopy() as NSMutableCharacterSet
allowedCharacters.removeCharactersInString("+/=")
if let encodedString = unencodedString.stringByAddingPercentEncodingWithAllowedCharacters(allowedCharacters) {
println(encodedString)
}
Prints:
abcdef%2F%2B%5C%2Fghi
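The code above uses the Swift 1 API names. With the current Foundation names the same idea could be written roughly as follows; this is a sketch that keeps the variable names from the example, and CharacterSet is a value type here, so no mutable copy is needed.

```swift
import Foundation

let unencodedString = "abcdef/+\\/ghi"

// Start from the query-allowed set and remove the characters that
// should still be escaped.
var allowedCharacters = CharacterSet.urlQueryAllowed
allowedCharacters.remove(charactersIn: "+/=")

// addingPercentEncoding(withAllowedCharacters:) returns an optional String.
let encodedString = unencodedString.addingPercentEncoding(withAllowedCharacters: allowedCharacters)
print(encodedString ?? "encoding failed")
```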
Use
let withoutPlus = image.stringByReplacingOccurrencesOfString("+", withString: "%2B")
rather than image.stringByReplacingCharactersInRange. Note that your call as posted doesn't work, as that method is declared as
func stringByReplacingCharactersInRange(range: Range<String.Index>, withString replacement: String) -> String
and you are not supplying the correct parameters.
You might do better to send the file in a POST request rather than encoding it into your URL.

Append bytes to NSMutableData in Swift

In Obj-C I can successfully append bytes enclosed inside two quotation marks like so:
[commands appendBytes:"\x1b\x61\x01"
length:sizeof("\x1b\x61\x01") - 1];
In Swift I supposed I would do something like:
commands.appendBytes("\x1b\x61\x01", length: sizeof("\x1b\x61\x01") - 1)
But this throws the error "invalid escape sequence in literal", how do I escape bytes in Swift?
As already said, in Swift a string stores Unicode characters, and not – as in (Objective-)C – an arbitrary (NUL-terminated) sequence of char, which is a signed
or unsigned byte on most platforms.
Now theoretically you can retrieve a C string from a Swift string:
let commands = NSMutableData()
let cmd = "\u{1b}\u{61}\u{01}"
cmd.withCString {
commands.appendBytes($0, length: 3)
}
println(commands) // <1b6101>
But this does not produce the expected result for non-ASCII characters:
let commands = NSMutableData()
let cmd = "\u{1b}\u{c4}\u{01}"
cmd.withCString {
commands.appendBytes($0, length: 3)
}
println(commands) // <1bc384>
Here \u{c4} is "Ä" which has the UTF-8 representation C3 84.
A Swift string cannot represent an arbitrary sequence of bytes.
Therefore you are better off working with a UInt8 array for (binary) control sequences:
let commands = NSMutableData()
let cmd : [UInt8] = [ 0x1b, 0x61, 0xc4, 0x01 ]
commands.appendBytes(cmd, length: cmd.count)
println(commands) // <1b61c401>
For text you have to know which encoding the printer expects.
As an example, NSISOLatin1StringEncoding is the ISO-8859-1 encoding, which is intended for "Western European" languages:
let text = "123Ö\n"
if let data = text.dataUsingEncoding(NSISOLatin1StringEncoding) {
commands.appendData(data)
println(commands) // <313233d6 0a>
} else {
println("conversion failed")
}
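For reference, in current Swift the same approach might be written with Data instead of NSMutableData. This is a sketch under the same assumptions as above (the control bytes and the ISO-8859-1 text are only examples).

```swift
import Foundation

var commands = Data()

// Binary control sequence as raw bytes.
let cmd: [UInt8] = [0x1b, 0x61, 0x01]
commands.append(contentsOf: cmd)

// Text converted with the encoding the printer expects (ISO-8859-1 here).
let text = "123Ö\n"
if let encodedText = text.data(using: .isoLatin1) {
    commands.append(encodedText)
} else {
    print("conversion failed")
}

// Hex dump of the accumulated bytes.
print(commands.map { String(format: "%02hhx", $0) }.joined())
```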
Unicode escapes in Swift string literals are written differently: you need to put curly braces around the hex number:
"\u{1b}\u{61}\u{01}"
To avoid duplicating the literal, define a constant for it:
let toAppend = "\u{1b}\u{61}\u{01}"
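// A Swift String has no length property and no counted NUL terminator,
// so use the UTF-8 byte count (and do not subtract 1).
commands.appendBytes(toAppend, length: toAppend.utf8.count)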