How to determine string encoding in Cocoa?
I'm currently working on a radio player. Sometimes the ID3 tag text is garbled.
Here is my code:
CFDictionaryRef audioInfoDictionary;
UInt32 size = sizeof(audioInfoDictionary);
OSStatus result = AudioFileGetProperty(fileID, kAudioFilePropertyInfoDictionary, &size, &audioInfoDictionary);
The ID3 info is in audioInfoDictionary. Sometimes the ID3 tags don't use UTF-8, and the title and artist name come out garbled.
Is there any way to determine which encoding a string uses?
Special thx!
Once you have an NSString object, it has no specific encoding: it represents whatever was put into it, decoded with the encoding chosen when it was created. See the Working With Encodings section of the docs.
Where are you getting the ID3 tags from? The moment you "receive" this information is the best time to determine its encoding. See Creating and Initializing Strings and the next few sections (for creating strings from files and URLs) for a list of initializers. Some of them let you set the encoding, and others pass back (by reference) the "best guess" encoding the system determined when creating the string. Look for methods with "usedEncoding:" to get the system's reported guess.
All of this really depends on exactly what is handing you that string. Are you reading it from a file (an MP3) or from a web service (Internet radio)? If the latter, the server's response should include the encoding, and if that's wrong, there's not much you can do but guess.
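For plain text read from a file, the "usedEncoding:" initializers look roughly like this in Swift. This is a minimal sketch: the function name readReportingEncoding is hypothetical, and it does not parse ID3 frames inside an MP3; it only shows how the system's best-guess encoding is passed back by reference.

```swift
import Foundation

/// Hypothetical helper: reads a text file and also returns the encoding
/// Foundation settled on while decoding it. For legacy 8-bit data the
/// guess can be wrong, so treat it as a hint, not a guarantee.
func readReportingEncoding(_ url: URL) throws -> (text: String, encoding: String.Encoding) {
    var used = String.Encoding.utf8  // placeholder; the initializer overwrites it
    let text = try String(contentsOf: url, usedEncoding: &used)
    return (text, used)
}

// Usage: write a small file, read it back, and inspect the reported encoding.
let url = FileManager.default.temporaryDirectory.appendingPathComponent("encoding-demo.txt")
try "hello".write(to: url, atomically: true, encoding: .utf8)
let result = try readReportingEncoding(url)
print(result.text, result.encoding)
```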
I have to read text files in Swift/Cocoa, which are encoded as OEM 850. Does anybody know how to do this?
You can first read the file in as raw data and then convert that data to a string value according to your encoding. A small wrinkle in your case:
There are two types that represent the known string encodings: NSStringEncoding (String.Encoding in Swift) and CFStringEncoding. Apple only directly defines a subset of the known encodings as NSStringEncoding/String.Encoding values. The remaining known encodings have CFStringEncoding values, and the function CFStringConvertEncodingToNSStringEncoding() is provided to map these to NSStringEncoding. Unfortunately for you, OEM 850 is only provided directly as a CFStringEncoding...
That sounds worse than it is. In Objective-C you can get the encoding you require using:
NSStringEncoding dosLatin1 = CFStringConvertEncodingToNSStringEncoding(kCFStringEncodingDOSLatin1);
Note: “DOS Latin 1” is one of the names for the encoding that “OEM 850” refers to (see Wikipedia for a list), and it is the name Apple chose, hence kCFStringEncodingDOSLatin1.
In Swift this is messier:
let dosLatin1 = String.Encoding(rawValue: CFStringConvertEncodingToNSStringEncoding(CFStringEncoding(CFStringEncodings.dosLatin1.rawValue)))
Once you have the encoding, the rest is straightforward. Without error checking, an outline is:
let sourceURL = ...  // URL of the file to read
let rawData = try! Data(contentsOf: sourceURL)  // traps if the read fails
let convertedString = String(data: rawData, encoding: dosLatin1)  // String?; nil if the conversion fails
In real code you must check that the file read and the conversion are successful. Reading raw data from a URL in Swift throws if the read fails, and converting the data to a string produces an optional (String?) because the conversion may fail.
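With the error checking filled in, the outline might look like the sketch below. The names readOEM850 and OEM850Error are hypothetical; the encoding lookup is the same CFStringConvertEncodingToNSStringEncoding call as in the Swift one-liner.

```swift
import Foundation

/// Hypothetical error type for a failed conversion.
enum OEM850Error: Error {
    case conversionFailed
}

/// DOS Latin 1 (code page 850) is only available as a CFStringEncoding,
/// so it is mapped to a String.Encoding first.
let dosLatin1 = String.Encoding(rawValue: CFStringConvertEncodingToNSStringEncoding(
    CFStringEncoding(CFStringEncodings.dosLatin1.rawValue)))

/// Hypothetical helper: reads a file and decodes it as OEM 850,
/// reporting both failure modes instead of trapping.
func readOEM850(from url: URL) throws -> String {
    let rawData = try Data(contentsOf: url)  // throws if the read fails
    guard let text = String(data: rawData, encoding: dosLatin1) else {
        throw OEM850Error.conversionFailed  // String(data:encoding:) returned nil
    }
    return text
}
```

A caller wraps readOEM850(from:) in do/catch and can tell a failed read (the error thrown by Data(contentsOf:)) apart from OEM850Error.conversionFailed.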
HTH
I want to give a label text that uses multiple fonts. This can be accomplished by creating an NSMutableAttributedString. However, I am not sure how to handle the following case:
String(format: NSLocalizedString("%@ has replied in '%@'", comment: ""), username, conversationTitle)
I want to give the username and the conversation title a separate font, and I want to do it in the least bug-prone way. By that I mean:
I do not want to find the username in the string afterwards by searching for a substring. This causes issues when the conversationTitle is the same as the username, when the conversationTitle contains the username, etc.
I do not want to build up the string piece by piece, as seen here: https://stackoverflow.com/a/37992022/7715250. That works badly with NSLocalizedString; I think translators would have a hard time with strings assembled like that.
Questions like "Making text bold using attributed string in swift", "Are there approaches for using attributed strings in combination with localization?" and others mostly deal with string literals, either without NSLocalizedString or with NSLocalizedString but no parameters.
First, your .strings file should use a much more generic and readable key, something like:
"_REPLIED_IN_" = "%@ has replied in %@";
Do not confuse keys and values, as you seem to do in your example.
It also makes it easier later on to spot a hardcoded string in your code that hasn't been localized.
Now there is another issue: in English the arguments may come in that order, but not necessarily in other languages.
So use positional specifiers instead:
"_REPLIED_IN_" = "%1$@ has replied in %2$@";
Now, I'll use bold as the example because it's easier, but what you could do is use some custom tags to mark what needs to be bold, like HTML, Markdown, etc.
In HTML:
"_REPLIED_IN_" = "<b>%1$@</b> has replied in <b>%2$@</b>";
You then need to parse it into an NSAttributedString:
let translation = String(format: NSLocalizedString("_REPLIED_IN_", comment: ""), userName, conversationTitle)
let attributedText = NSAttributedString.someMethodThatParseYourTags(translation)  // placeholder for your own tag parser
It's up to you to choose the tag format that suits your needs: easy for translators to understand, and easy to parse (Cocoa Touch already has an HTML parser, etc.).
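As one possible stand-in for someMethodThatParseYourTags, here is a sketch that assumes the HTML-tag variant and relies on Apple's built-in HTML import (NSAttributedString(data:options:documentAttributes:) with the .html document type). The helper names attributedReply and htmlEscaped are hypothetical. The arguments are escaped before formatting so that a user name containing < or & cannot break the markup.

```swift
import Foundation
#if canImport(UIKit)
import UIKit
#elseif canImport(AppKit)
import AppKit
#endif

/// Hypothetical helper: escapes the characters that are special in HTML so a
/// user name or title cannot inject markup of its own. "&" goes first so the
/// entities added afterwards are not escaped twice.
func htmlEscaped(_ text: String) -> String {
    text.replacingOccurrences(of: "&", with: "&amp;")
        .replacingOccurrences(of: "<", with: "&lt;")
        .replacingOccurrences(of: ">", with: "&gt;")
}

/// Hypothetical helper: fills a localized template such as
/// "<b>%1$@</b> has replied in <b>%2$@</b>" and lets the system HTML
/// importer turn the <b> tags into bold runs. Call it on the main thread.
func attributedReply(template: String, user: String, title: String) -> NSAttributedString? {
    let html = String(format: template, htmlEscaped(user), htmlEscaped(title))
    guard let data = html.data(using: .utf8) else { return nil }
    return try? NSAttributedString(
        data: data,
        options: [.documentType: NSAttributedString.DocumentType.html,
                  .characterEncoding: String.Encoding.utf8.rawValue],
        documentAttributes: nil)
}
```

In an app the template would come from NSLocalizedString("_REPLIED_IN_", comment: ""). The HTML importer applies its own default font and size, so you would probably restyle the result afterwards (or add inline CSS to the template); the bold runs are a bold variant of that default font.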
In a friend's music directory, I came across this path and filename:
Ministry/Κî•Î¦Î‘Î›Î—Îžî˜ (Psalm 69)/Ministry - Κî•Î¦Î‘Î›Î—Îžî˜ (Psalm 69) - 06 - Scarecrow.mp3
You can google Ministry Κî•Î¦Î‘Î›Î—Îžî˜ and get results. If I feed it into a url encoder, I get %C2%9Ai%C2%95i%C2%A6i%C2%91i%C2%9Bi%C2%97i%C2%9Ei%C2%98.
It's clearly mangled in some way by traversing multiple incorrect encode/decode cycles. What is it supposed to be? How did you get that answer?
I've tried various paper and pencil scribblings with UTF-8, but can't figure out anything that makes sense.
It is supposed to be ΚΕΦΑΛΗΞΘ, which is the title of the Ministry album commonly known as Psalm 69. ÎšÎ•Î¦Î‘Î›Î—ÎžÎ˜ is what it looks like when the UTF-8 encoded ΚΕΦΑΛΗΞΘ is interpreted as Windows-1252.
This is close to, but not identical to, your Κî•Î¦Î‘Î›Î—Îžî˜, which has î in place of two of the Îs. Given what changed and where, my guess is that a title-case conversion also happened somewhere along the way.
Got there by way of an educated guess, testing, and @Remy's helpful comment.
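The diagnosis can be checked with a short round trip in Swift: encode the Greek title as UTF-8, decode the bytes as Windows-1252 to reproduce the mojibake, then reverse both steps to recover the original.

```swift
import Foundation

let original = "ΚΕΦΑΛΗΞΘ"

// UTF-8 turns each of these Greek capitals into two bytes starting with 0xCE.
let utf8Bytes = original.data(using: .utf8)!

// Decoding those bytes as Windows-1252 reproduces the mojibake:
// 0xCE becomes "Î" and the second byte becomes a separate Latin character.
let mojibake = String(data: utf8Bytes, encoding: .windowsCP1252)!
print(mojibake)  // ÎšÎ•Î¦Î‘Î›Î—ÎžÎ˜

// Undoing it: re-encode as Windows-1252 to get the original bytes back,
// then decode them as UTF-8.
let repaired = mojibake.data(using: .windowsCP1252).flatMap { String(data: $0, encoding: .utf8) }
print(repaired ?? "could not be repaired")  // ΚΕΦΑΛΗΞΘ
```

The reversal only works while every damaged character still maps back to its original Windows-1252 byte. A later case change breaks that: Î is byte 0xCE, but î is 0xEE, which is no longer the lead byte UTF-8 needs here.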
Currently, I'm trying to parse an NSData in my iOS app. The problem is, I can't find a proper Hebrew encoding for parsing. I must decode the data using Windows-1255 (the Windows Hebrew encoding) or ISO 8859-8, or I get plain gibberish. The closest I've got to solving the issue was using
CFStringConvertEncodingToNSStringEncoding(CFStringEncodings.ISOLatinHebrew)
yet the compiler reports 'CFStringEncodings' is not convertible to 'CFStringEncoding' (notice Encodings vs Encoding).
What can I do in order to decode the data correctly?
Thanks!
The problem is that CFStringEncodings is an enumeration based on CFIndex (which in turn is a type alias for Int), whereas CFStringEncoding is a type alias for UInt32. Therefore you have to convert the .isoLatinHebrew value (spelled .ISOLatinHebrew in Swift 2) explicitly to a CFStringEncoding:
let cfEnc = CFStringEncodings.isoLatinHebrew
let enc = String.Encoding(rawValue: CFStringConvertEncodingToNSStringEncoding(CFStringEncoding(cfEnc.rawValue)))
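Putting this together for the original question, here is a sketch that tries Windows-1255 first and falls back to ISO 8859-8. The helper names stringEncoding(_:) and decodeHebrew(_:) are hypothetical, and the order of the two encodings is an assumption; swap it if your data is known to be ISO 8859-8.

```swift
import Foundation

/// Hypothetical helper: turns a CoreFoundation encoding constant into a
/// String.Encoding. CFStringEncodings is Int-based while CFStringEncoding
/// is a UInt32, hence the explicit conversion.
func stringEncoding(_ cf: CFStringEncodings) -> String.Encoding {
    let cfEncoding = CFStringEncoding(cf.rawValue)
    return String.Encoding(rawValue: CFStringConvertEncodingToNSStringEncoding(cfEncoding))
}

/// Hypothetical helper: decodes legacy Hebrew bytes, trying Windows-1255
/// first and ISO 8859-8 second. Returns nil if neither conversion succeeds.
func decodeHebrew(_ data: Data) -> String? {
    for cf in [CFStringEncodings.windowsHebrew, .isoLatinHebrew] {
        if let text = String(data: data, encoding: stringEncoding(cf)) {
            return text
        }
    }
    return nil
}
```

For example, the bytes F9 EC E5 ED decode to שלום with either encoding, because the Hebrew letters occupy the same positions (0xE0 to 0xFA) in both.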
Turns out I needed to get my hands a bit dirty.
I saw that CFStringEncodings is related to the header CFStringEncodingExt.h, so I searched that file for help. There I came across a huge CF_ENUM that included exactly what I needed: all of the CFStringEncodings with their UInt32 values!
So it turns out that kCFStringEncodingISOLatinHebrew = 0x0208, /* ISO 8859-8 */
I encourage everyone facing this encoding issue to look in that file for the encoding they need.
I want to generate a barcode mixing Code 128B and Code 128C with the iTextSharp DLL. Do you know how to do that? Currently I only know how to do it with a single code set.
For example, I want to generate a barcode with the value 8L1 91450 883421 0550 001065
where "8L1 91450" is in code128B and "883421 0550 001065" is in code128C.
Thanks
Barcode128 will actually switch from B to C automatically if and when it can, but it sounds like you don't want that. For the control you're looking for, you'll need to set your barcode's CodeType property to Barcode.CODE128_RAW and set the raw values manually.
There are a couple of posts out there that give the basic idea, but unfortunately they tend to assume too much knowledge of either iText or barcodes.
I'm not a barcode expert either, but the basic idea is to create a string that starts with Barcode128.START_B, then the first part of your text, then Barcode128.CODE_AB_TO_C (value 99, the in-symbol switch to set C; START_C, value 105, is only valid as the first character), and then the second part as pairs of digits. In raw mode, however, the text isn't ASCII. You can use this site to get the character codes for various ASCII values, but basically, instead of sending the letter L (ASCII 76) you'd send (char)44, because code set B values are the ASCII code minus 32.
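To make the raw-mode values concrete, here is a sketch written in Swift (to match the other examples here, not iTextSharp's C#) that computes the Code 128 values for the question's example. Assumptions: the spaces in the set C part are only for readability and are dropped, and the hypothetical helpers code128Values and code128Checksum only produce numbers; in iTextSharp each value would become one (char) in the raw string. The check value follows the Code 128 rule; whether you must append it yourself in raw mode depends on the library, so check iText's behaviour first.

```swift
/// Code 128 symbol values used in this sketch.
let startB = 104  // Start Code B
let codeC = 99    // switch from set A/B to set C inside a symbol

/// Hypothetical helper: builds the Code 128 values for a symbol that starts
/// in set B and switches to set C. Returns nil if the input can't be encoded.
func code128Values(partB: String, partC: String) -> [Int]? {
    var values = [startB]
    for scalar in partB.unicodeScalars {
        // Set B covers ASCII 32...127; its value is the ASCII code minus 32.
        guard (32...127).contains(scalar.value) else { return nil }
        values.append(Int(scalar.value) - 32)
    }
    // Set C encodes pairs of digits; spaces are assumed to be cosmetic.
    let digits = Array(partC.filter { $0 != " " })
    guard digits.count % 2 == 0,
          digits.allSatisfy({ $0.isASCII && $0.isNumber }) else { return nil }
    values.append(codeC)
    for i in stride(from: 0, to: digits.count, by: 2) {
        values.append(Int(String(digits[i...i + 1]))!)  // "05" -> 5
    }
    return values
}

/// Hypothetical helper: Code 128 check value, i.e. the start value plus the
/// position-weighted sum of the following values, modulo 103.
func code128Checksum(_ values: [Int]) -> Int {
    precondition(!values.isEmpty, "a symbol needs at least a start code")
    var sum = values[0]
    for (position, value) in values.enumerated().dropFirst() {
        sum += position * value
    }
    return sum % 103
}
```

For "8L1 91450" and "883421 0550 001065" this gives [104, 24, 44, 17, 0, 25, 17, 20, 21, 16, 99, 88, 34, 21, 5, 50, 0, 10, 65]; note the 44 for L.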
Hopefully this gets you started at least.