I'm building an app in Swift and I'm using Backendless as my backend. It turns out their database is UTF-8, so I can't save emojis without converting the String first.
I can't seem to find the right way to make this conversion to UTF8. I tried this:
let encoding = processedText.stringByAddingPercentEscapesUsingEncoding(NSUTF8StringEncoding)
But after this operation the emojis look like this:
%F0%9F%99%84%F0%9F%98%80%F0%9F%98%92%F0%9F%98%89%F0%9F%98%B6%F0%9F%98%B6%F0%9F%98%80%F0%9F%99%81
And I tried this:
class func stringToUTF8String (string: String) -> String? {
let encodedData = string.dataUsingEncoding(NSUTF8StringEncoding)!
let attributedOptions = [NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType]
do{
let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)
return attributedString.string
}catch _ {
}
return nil
}
And the emojis look like this:
🤔🤔👅ðŸ™ðŸ˜‚ðŸ˜ðŸ˜ŽðŸ˜‰ðŸ˜…😉
Does anyone have any suggestions? Thanks
First, to create a String from NSData with utf8 encoding you use
String(data: theData, encoding: NSUTF8StringEncoding)
Second, Swift Strings are already Unicode-compliant, so you do not have to convert them. You can access the different encodings through their respective views, e.g. String.utf8, String.utf16, and so on.
Third, to have NSAttributedString properly decode your data as UTF-8, you have to add the NSCharacterEncodingDocumentAttribute key to the attributedOptions dictionary with the value NSUTF8StringEncoding.
Final note: I don't know if that's a partial method, but an attributed string shouldn't be used just to encode a string.
Here is NSAttributedString encoding the data in some format returning gibberish.
Here is NSAttributedString encoding data as utf8 and returning correct text.
Here is encoding a string as utf8 string.
I know it's an image, but I wanted the results to show. If these don't work, the database may be stripping bits. In that case I have no idea what to do, and you probably shouldn't use that database if you want Unicode support.
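As a minimal sketch of the points above in current Swift (Swift 3 and later spell the constant `.utf8` rather than `NSUTF8StringEncoding`): the sample strings are only illustrative, and the percent-escaped value is the first emoji from the question. It does not address what the database itself does with the bytes.

```swift
import Foundation

// Illustrative sample text; any String containing emoji behaves the same way.
let text = "Hi 🙄😀"

// A Swift String is already Unicode; the views expose different encodings.
let characterCount = text.count          // user-perceived characters
let utf8ByteCount = text.utf8.count      // UTF-8 code units (bytes)
let utf16UnitCount = text.utf16.count    // UTF-16 code units

// Encode to UTF-8 bytes and decode back, with no escaping involved.
let utf8Data = Data(text.utf8)
let decoded = String(data: utf8Data, encoding: .utf8)

// The percent-escaped form from the question is still UTF-8 underneath
// and can be turned back into the emoji.
let escaped = "%F0%9F%99%84"
let unescaped = escaped.removingPercentEncoding
```

If the backend only accepts a restricted character set, the percent-escaped form can be stored and reversed with `removingPercentEncoding` when reading it back.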
I have been playing with a json file in a playground and I've seen examples of reading the file like this:
do {
let jsonData = try String(contentsOf: url).data(using: .utf8)
} catch {
...
}
And like this:
do {
let jsonData = try Data(contentsOf: url)
} catch {
...
}
Is there a difference in the data? The only difference I see is that the String approach encodes the data as UTF-8 when it is read, whereas I assume the Data approach reads with some default format (also UTF-8?). I can't see a difference in the data, but I just want to make sure.
The difference is that String(contentsOf: url) tries to read text from that URL, whereas Data(contentsOf: url) reads the raw bytes.
Therefore, if the file at the URL is not a plain text file, String(contentsOf: url) could throw an error, whereas Data(contentsOf: url) would read it successfully.
Regarding the encoding, String(contentsOf: url) is undocumented, but from its implementation, we can see that it calls NSString.init(contentsOf:usedEncoding:):
public init(
contentsOf url: __shared URL
) throws {
let ns = try NSString(contentsOf: url, usedEncoding: nil)
self = String._unconditionallyBridgeFromObjectiveC(ns)
}
NSString.init(contentsOf:usedEncoding:) is documented:
Returns an NSString object initialized by reading data from a given URL and returns by reference the encoding used to interpret the data.
So apparently the encoding is guessed (?) and returned by reference, which is then ignored by String.init(contentsOf:), as it passes nil for the usedEncoding parameter.
This means that for some non-UTF-8 files, there is a chance of String(contentsOf:) guessing the correct encoding, and then data(using: .utf8) encodes the string to UTF-8 bytes, making the rest of your code work. If you had used Data(contentsOf:), you would be reading in the wrong encoding, and though it wouldn't throw an error, the JSON-parsing code later down the line probably would.
That said, JSON is supposed to be exchanged in UTF-8 (See RFC), so an error when you read a non-UTF-8 file is probably desired.
So basically, if we are choosing between these two options, just use Data(contentsOf:). It's simpler and less typing. You don't need to worry about things like wrong encodings, or the file not being plain text. If anything like that happens, it is not JSON, and the JSONDecoder later down the line would throw.
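A minimal sketch of that recommendation, assuming a hypothetical `Person` type and a throwaway file name (neither is from the question). It writes a small JSON file first so that the example is self-contained, then reads the raw bytes and lets JSONDecoder decide whether they are valid JSON.

```swift
import Foundation

// Hypothetical model type for the sample JSON below.
struct Person: Decodable {
    let name: String
    let age: Int
}

// Write a small UTF-8 JSON file to a temporary location (illustrative file name).
let url = FileManager.default.temporaryDirectory
    .appendingPathComponent("person-example.json")
let sampleJSON = #"{"name": "Ann", "age": 30}"#

var person: Person?
do {
    try Data(sampleJSON.utf8).write(to: url)

    // Read the raw bytes; no text decoding happens here.
    let jsonData = try Data(contentsOf: url)

    // JSONDecoder throws if the bytes are not valid JSON for this type.
    person = try JSONDecoder().decode(Person.self, from: jsonData)
} catch {
    print("Reading or decoding failed: \(error)")
}
```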
In order to convert a String instance to a Data instance in Swift you can use data(using:allowLossyConversion:), which returns an optional Data instance.
Can the return value of this function ever be nil if the encoding is UTF-8 (String.Encoding.utf8)?
If the return value cannot be nil it would be safe to always force-unwrap such a conversion.
UTF-8 can represent all valid Unicode code points, therefore a conversion
of a Swift string to UTF-8 data cannot fail.
The forced unwrap in
let string = "some string .."
let data = string.data(using: .utf8)!
is safe.
The same would be true for .utf16 or .utf32, but not for
encodings which represent only a restricted character set,
such as .ascii or .isoLatin1.
You can alternatively use the .utf8 view of a string to create UTF-8 data,
avoiding the forced unwrap:
let string = "some string .."
let data = Data(string.utf8)
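For contrast, here is a small sketch of a restricted encoding failing where UTF-8 cannot. The sample word is an arbitrary choice; with `allowLossyConversion` left at its default of false, the ASCII conversion is expected to return nil because "é" has no ASCII representation.

```swift
import Foundation

let word = "caf\u{E9}" // "café", with a precomposed é

// UTF-8 can encode every Swift String, so this never fails.
let utf8Data = Data(word.utf8)

// ASCII cannot represent "é", so the non-lossy conversion yields nil.
let asciiData = word.data(using: .ascii)

// For restricted encodings, prefer optional binding over a forced unwrap.
if let asciiData = asciiData {
    print("ASCII bytes: \(asciiData.count)")
} else {
    print("Not representable in ASCII")
}
```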
Can someone explain the behavior of the function below? Some have suggested to not use NSData. Do you have better alternatives to mention? If the returned value is Base64Encoded can I decode on one of the online encoders/decoders? Thanks.
func stringToData(message: String) -> NSData? {
let strData = NSData(base64Encoded: message, options: NSData.Base64DecodingOptions.ignoreUnknownCharacters)
return strData
}
NSData(base64Encoded:options:) is documented to attempt to initialize a data object with the given Base64-encoded string, and to return nil if it fails. In other words, it decodes a Base64-encoded string into an NSData object; it does not produce Base64 output.
In Swift, you would likely use the base64EncodedString() method and the Data(base64Encoded:) initializer on the Data type to encode and decode data as Base64 strings, for example like this:
let originalData = Data([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
let encodedAsBase64String = originalData.base64EncodedString()
// "AQIDBAUGBwgJCgsM"
let decodedData = Data(base64Encoded: encodedAsBase64String) // is optional because the decoding can fail
// 12 bytes: <01020304 05060708 090A0B0C>
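The function in the question passes `.ignoreUnknownCharacters`. A short sketch of what that option changes, using an illustrative Base64 string with a line break in the middle: by default the initializer rejects characters outside the Base64 alphabet, and with the option it skips them.

```swift
import Foundation

// Base64 text for the bytes 1...6, with a line break in the middle.
let wrapped = "AQID\nBAUG"

// Default options are strict: the newline is not in the Base64 alphabet.
let strict = Data(base64Encoded: wrapped)

// .ignoreUnknownCharacters skips characters outside the alphabet.
let lenient = Data(base64Encoded: wrapped, options: .ignoreUnknownCharacters)
```

Since the function only decodes, the value to paste into an online decoder would be the input string, not the returned data.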
Converting Data to String returns a nil value.
Code:
// this unwraps the image
if let image = image{
print("Saving image data")
// don't unwrap here
if let data = UIImagePNGRepresentation(image){
let str = String(data: data, encoding: .utf8)
print(str)
}
}
I don't know the reason.
Also, how do I convert the String back to Data?
This doesn't work because the bytes of the image are not valid text. Not every jumble of data is a valid UTF-8 string: UTF-8 encodes each code point as a specific sequence of 1 to 4 bytes, and arbitrary bytes usually violate those rules. String(data:encoding:) validates the data you pass it and returns nil when the bytes are not valid in the given encoding. In your case, there's no reason to think that PNG data is a valid string, so it doesn't work.
A good read on utf8:
https://www.objc.io/issues/9-strings/unicode/
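If the goal is to carry image bytes as text and get them back, one common approach (an assumption about what the asker needs, not something stated in the question) is Base64 rather than UTF-8. A minimal sketch, using the eight-byte PNG signature as a stand-in for real image data:

```swift
import Foundation

// Stand-in for image data: the first eight bytes of every PNG file.
let imageBytes = Data([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])

// 0x89 cannot start a UTF-8 sequence, so decoding as text fails.
let asText = String(data: imageBytes, encoding: .utf8)

// Base64 maps any bytes to plain ASCII text...
let base64Text = imageBytes.base64EncodedString()

// ...and the same text decodes back to the original bytes.
let restored = Data(base64Encoded: base64Text)
```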
I wrote a small snippet of ordinary code, but found that it doesn't print the hex data from the server with these lines:
let currentData = try! Data(contentsOf: fullURL!)
print("currentData=", currentData)
And the output:
currentData= 24419 bytes
I tried to use Leo's comment link:
stackoverflow.com/q/39075043/2303865
I got some hex data without spaces, and the validator (http://jsonprettyprint.com) can't recognise it and returns null.
Let's try to sort out the different issues here and summarize the
above comments.
The description method
of Data prints only a short summary "NNN bytes", and not a hex dump
as NSData did:
let o = ["foo": "bar"]
let jsonData = try! JSONSerialization.data(withJSONObject: o)
print(jsonData) // 13 bytes
You can get a hex dump by bridging to NSData (source):
print(jsonData as NSData) // <7b22666f 6f223a22 62617222 7d>
or by writing an extension method for Data (How to convert Data to hex string in swift).
But that is actually not the real problem. The JSON validator needs
the JSON as a string, not as a hex dump (source):
print(String(data: jsonData, encoding: .utf8)!) // {"foo":"bar"}
And to de-serialize the JSON data into an object you would need
none of the above and just call
let obj = try JSONSerialization.jsonObject(with: jsonData)
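Putting the three views of the same data side by side, as a sketch that avoids try! and bridging to NSData. The hex helper is just one way to write it and is not taken from the linked answer.

```swift
import Foundation

let jsonData = Data(#"{"foo":"bar"}"#.utf8)

// Hex dump in pure Swift: two lowercase hex digits per byte.
let hex = jsonData.map { byte -> String in
    let digits = String(byte, radix: 16)
    return digits.count == 1 ? "0" + digits : digits
}.joined()

// The text form is what a JSON validator expects.
let text = String(data: jsonData, encoding: .utf8)

// Deserialize and read one value, handling errors instead of crashing.
var foo: String?
do {
    let object = try JSONSerialization.jsonObject(with: jsonData)
    if let dictionary = object as? [String: Any] {
        foo = dictionary["foo"] as? String
    }
} catch {
    print("Invalid JSON: \(error)")
}
```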