NSString Encoding problem - iphone

My code
char* tmp = "abc \x80 dfg";
NSString* name = [[NSString alloc] initWithUTF8String:tmp];
It returns name as nil. I understand that -initWithUTF8String: doesn't like my extended-ASCII \x80 (Euro sign). I tried -initWithCString: with every encoding I could think of, but nothing works.
Interestingly Apple sample code below works properly
[NSString stringWithUTF8String:"Long \xe2\x80\x94 dash"];
I can't figure out how to use their approach. Any help would be much appreciated.

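The single byte 0x80 is not valid UTF-8: it is a continuation byte with no lead byte in front of it. (U+0080 itself is a control character, not the Euro sign, which is U+20AC.) The byte 0x80 is the Euro sign in Windows CP-1252, however: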
NSString* name = [[NSString alloc] initWithCString:tmp encoding:NSWindowsCP1252StringEncoding];
(Apple's code works because \xe2\x80\x94 is the complete three-byte UTF-8 encoding of U+2014, the long dash.)
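To make the difference between the two byte strings concrete, here is a minimal sketch in plain C (callable from Objective-C). The function name looks_like_utf8 is hypothetical, and the check is deliberately simplified: it only verifies that each lead byte is followed by the right number of continuation bytes, so it is looser than whatever validation NSString performs internally.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified structural check (an illustration, not a full validator):
   every lead byte must be followed by the right number of continuation
   bytes (10xxxxxx). Overlong forms and surrogates are NOT rejected. */
bool looks_like_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char b = s[i];
        size_t extra;
        if (b < 0x80)                extra = 0;  /* ASCII */
        else if ((b & 0xE0) == 0xC0) extra = 1;  /* 110xxxxx */
        else if ((b & 0xF0) == 0xE0) extra = 2;  /* 1110xxxx */
        else if ((b & 0xF8) == 0xF0) extra = 3;  /* 11110xxx */
        else return false;  /* stray continuation byte or invalid lead byte */

        if (extra > len - i - 1) return false;   /* sequence is truncated */
        for (size_t k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}
```

With this check, the 9 bytes of "abc \x80 dfg" fail at the lone 0x80, while the 13 bytes of "Long \xe2\x80\x94 dash" pass.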

The UTF-8 code for € is three bytes long, and it goes: \xe2\x82\xac.
For translating between Unicode code points and UTF-8, you can use the following site: http://www.utf8-chartable.de/unicode-utf8-table.pl . I took the code point for the Euro sign from Wikipedia.
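If you would rather compute the bytes than look them up, the translation from code point to UTF-8 is short. The following is a sketch in plain C; utf8_encode is a hypothetical helper name, not a Foundation API.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes the UTF-8 encoding of code point cp into out.
   Returns the number of bytes written (1 to 4), or 0 if cp cannot be
   encoded (a surrogate, or a value above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {    /* surrogates are not characters */
        return 0;
    }
    if (cp < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 11110xxx + three continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For U+20AC this yields the three bytes E2 82 AC quoted above.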

The C99 \u character escape for € is \u20ac
So, €1.99 will be:
NSString *euroString = [NSString stringWithUTF8String:"\u20ac1.99"];
Also check this out for more info: using UTF-32 in NSString

Related

NSString and no UTF-8 symbols

For some of you (I'm sure) this question is quite simple to answer, but I have some difficulties in understanding how to solve the problem.
I have a .txt file containing a table like this:
" 236
? 26
x00EE 16
As you probably understood, the left column lists symbols and the right one lists a code for each of them, which I defined in my app.
And... you probably noticed that some of the symbols are "strange". The 0x00EE should be the "å" (a with a ring above).
Unfortunately I can't control the left column, i.e. it comes from another software. Making some experiments I found that:
NSLog( @"\x00ee" );
for example produces a warning telling me that the code does not belong to the UTF-8 range.
So I was wondering how to convert the NSString @"\x00ee" (which I read from the file, so it is a string composed of 6 chars) to the single Unicode letter "å" (a with a ring above).
Can anyone help me?
Thanks...
You need to find out what character set encoding was used. 0xEE is unicode for î. In Unicode, å is E5. This is encoded in UTF-8 as the sequence 0xC3 0xA5. The following does the trick for me:
NSLog(@"\xc3\xa5");
If your input string contains only ASCII characters then you can use the fact that
NSNonLossyASCIIStringEncoding decodes \uNNNN to the corresponding Unicode character:
NSString *s = @"\\x00ee"; // from your text file
NSString *s1 = [s stringByReplacingOccurrencesOfString:@"\\x" withString:@"\\u"];
NSData *d = [s1 dataUsingEncoding:NSASCIIStringEncoding];
NSString *s2 = [[NSString alloc] initWithData:d encoding:NSNonLossyASCIIStringEncoding];
NSLog(@"%@", s2);
Output: î, which is U+00EE (LATIN SMALL LETTER I WITH CIRCUMFLEX).
(Remark: å is U+00E5, not U+00EE).
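An alternative to rewriting the escape is to parse the hex digits yourself. The sketch below is plain C and assumes the token has the form x00EE or \x00ee with nothing after the digits; parse_hex_token is a hypothetical name. It only recovers the code point; turning that number into a string is a separate step.

```c
#include <stdbool.h>
#include <stdint.h>

/* Parses a token such as "\x00ee" or "x00EE" into a Unicode code point.
   Assumption: the token is an optional backslash, an 'x', then 1 to 6 hex
   digits and nothing else. Returns false for anything else. */
bool parse_hex_token(const char *token, uint32_t *cp)
{
    if (*token == '\\') token++;                 /* optional backslash */
    if (*token != 'x' && *token != 'X') return false;
    token++;

    uint32_t value = 0;
    int digits = 0;
    for (; *token != '\0'; token++, digits++) {
        int c = *token, d;
        if (c >= '0' && c <= '9')      d = c - '0';
        else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
        else return false;                       /* not a hex digit */
        if (digits >= 6) return false;           /* too long for a code point */
        value = value * 16 + (uint32_t)d;
    }
    if (digits == 0 || value > 0x10FFFF) return false;
    *cp = value;
    return true;
}
```

For the table row in the question this gives 0xEE, which is î as noted above; plain rows such as ? are rejected and can be used as they are.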

NSString UTF8String mangling unicode characters

When I run [NSString UTF8String] on certain unicode characters the resulting const char* representation is mangled both in NSLog and on the device/simulator display. The NSString itself displays fine but I need to convert the NSString to a cStr to use it in CGContextShowTextAtPoint.
It's very easy to reproduce (see code below) but I've searched for similar questions without any luck. Must be something basic I'm missing.
const char *cStr = [@"章" UTF8String];
NSLog(@"%s", cStr);
Thanks!
CGContextShowTextAtPoint is only for ASCII chars.
Check this SO question for answers.
With the string format specifier (%s), there is no guarantee that a C string prints correctly if it contains non-ASCII characters. Your character can be expressed in UTF-8 as an escaped byte sequence, but %s interprets the bytes you pass to the formatting function (NSLog, in this case) in the system encoding, not UTF-8. See Apple's documentation:
https://developer.apple.com/library/mac/documentation/cocoa/Conceptual/Strings/Articles/formatSpecifiers.html
%s
Null-terminated array of 8-bit unsigned characters. %s interprets its input in the system encoding rather than, for example, UTF-8.
As for CGContextShowTextAtPoint not working: that API supports only the MacRoman character set, which is not the entire Unicode character set.
You'll need to look into another API for showing Unicode characters. Core Text is probably where you'll want to start.
I've never noticed this issue before, but some quick experimentation shows that using printf instead of NSLog will cause the correct Unicode character to show up.
Try:
printf("%s", cStr);
This gives me the desired output ("章") both in the Xcode console and in Terminal. As nob1984 stated in his answer, the interpretation of the character data is up to the callee.
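To illustrate how the same bytes come out mangled when the callee assumes a single-byte encoding, here is a sketch in plain C. It uses ISO-8859-1 as a stand-in for the system encoding purely because its mapping is trivial; that choice, and the name latin1_to_utf8, are assumptions for illustration, not what NSLog actually does.

```c
#include <stddef.h>

/* Treats every input byte as one ISO-8859-1 character and re-encodes it as
   UTF-8. out must have room for 2 * len + 1 bytes.
   Returns the number of bytes written, not counting the terminating NUL. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            out[n++] = in[i];                                   /* ASCII stays as is */
        } else {
            out[n++] = (unsigned char)(0xC0 | (in[i] >> 6));    /* lead byte */
            out[n++] = (unsigned char)(0x80 | (in[i] & 0x3F));  /* continuation */
        }
    }
    out[n] = '\0';
    return n;
}
```

The UTF-8 bytes of "章" are E7 AB A0. Read one byte at a time, they become three separate characters (ç, «, and a non-breaking space) instead of one, which is the kind of mangling described in the question.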

Decoding HTML entities on iPhone

I have a list of several locations, some of them containing the letters æ, Æ, ø, Ø, å and Å.
From the webservice I'm using, the letters come out as "&oslash;", "&Aring;" etc.
When I download the feed from the webservice, I use UTF-8 encoding.
How can I decode the occurrences of these characters?
Thanks!
There is no standard way. To keep it simple, write your own custom method (or NSString category) and do this for each entity:
string = [string stringByReplacingOccurrencesOfString:@"&amp;" withString:@"&"];
If your webservice is using utf8 and if you decode the data with [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding], all should be ok.
A NSString category called "GTMNSString+HTML" written by Google works perfectly for me. Check it out here: https://gist.github.com/takuma104/ntlniph/blob/master/gtm/Foundation/GTMNSString+HTML.h & here: https://gist.github.com/takuma104/ntlniph/blob/master/gtm/Foundation/GTMNSString+HTML.m
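For a rough idea of what such a replacement routine does, here is a sketch in plain C that works on a UTF-8 buffer in place. The table is hypothetical and covers only the six letters from the question plus &amp;; a real decoder such as the GTM category handles the full entity list and numeric references.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table: only the entities from the question, plus &amp;.
   The replacements are the UTF-8 bytes of each character. */
static const struct { const char *name; const char *utf8; } kEntities[] = {
    { "&aelig;",  "\xC3\xA6" }, { "&AElig;",  "\xC3\x86" },
    { "&oslash;", "\xC3\xB8" }, { "&Oslash;", "\xC3\x98" },
    { "&aring;",  "\xC3\xA5" }, { "&Aring;",  "\xC3\x85" },
    { "&amp;",    "&" },
};

/* Decodes known entities in place. Every replacement is shorter than its
   entity, so the result always fits. Unknown entities are left unchanged. */
void decode_entities(char *s)
{
    char *dst = s;
    const char *src = s;
    while (*src != '\0') {
        size_t matched = 0;
        if (*src == '&') {
            for (size_t i = 0; i < sizeof kEntities / sizeof kEntities[0]; i++) {
                size_t name_len = strlen(kEntities[i].name);
                if (strncmp(src, kEntities[i].name, name_len) == 0) {
                    size_t utf8_len = strlen(kEntities[i].utf8);
                    memcpy(dst, kEntities[i].utf8, utf8_len);
                    dst += utf8_len;
                    matched = name_len;
                    break;
                }
            }
        }
        if (matched) src += matched;   /* skip the whole entity */
        else *dst++ = *src++;          /* copy an ordinary byte */
    }
    *dst = '\0';
}
```

The input is scanned once, so "&amp;oslash;" becomes the literal text "&oslash;" rather than being decoded a second time.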

decoding quoted-printables

I am looking for a way to decode quoted-printables.
The quoted-printables are for arabic characters and look like this:
=D8=B3=D8=B9=D8=A7=D8=AF
I need to convert it to a string, and store it or display..
I've seen post on stackoverflow for the other way around (encoding), but couldn't find decoding.
Uhm, it's a little hacky but you could replace the = characters with a % character and use NSString's stringByReplacingPercentEscapesUsingEncoding: method. Otherwise, you could essentially split the string on the = characters, convert each element to a byte value (easily done using NSScanner), put the byte values into a C array, and use NSString's initWithBytes:length:encoding: method.
Note that your example isn't technically in quoted-printable format, which specifies that a quoted-printable is a three character sequence consisting of an = character followed by two hex digits.
In my case I was coming from EML... bensnider's answer worked great... quoted-printable (at least in EML) uses an = sign followed by \r\n to signify a line wrapping, so this was the code needed to cleanly translate:
(Made as a category cause I loves dem)
@interface NSString (QuotedPrintable)
- (NSString *)quotedPrintableDecode;
@end
@implementation NSString (QuotedPrintable)
- (NSString *)quotedPrintableDecode
{
    NSString *decodedString = [self stringByReplacingOccurrencesOfString:@"=\r\n" withString:@""]; // Ditch the line wrap indicators
    decodedString = [decodedString stringByReplacingOccurrencesOfString:@"=" withString:@"%"]; // Change the ='s to %'s
    decodedString = [decodedString stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding]; // Replace the escaped strings.
    return decodedString;
}
@end
Which worked great for decoding my EML / UTF-8 objects!
Bensnider's answer is correct, the easy way of it.
You'll need to replace each "=" with "%":
NSString *s = @"%D8%B3%D8%B9%D8%A7%D8%AF";
NSString *s2 = [s stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
s2 now holds "سعاد", which makes sense, so this should work straightforwardly, without a hack.
In some cases the line ends are not "=\r\n" but are only "=\n", in which case you need another step:
decodedString = [decodedString stringByReplacingOccurrencesOfString:@"=\n" withString:@""];
Otherwise, the final step fails due to the unbalanced "%" at the end of a line.
I know nothing of the iPhone, but most email processing libraries will contain functions to do this, as email is where this format is used. I suggest searching for MIME decoding functions.
The earlier poster's approach also seems fine to me. I feel he is being a little too self-deprecating in describing it as hacky :)
Please see a working solution that takes a string containing quoted-printable sequences and resolves those graphemes. The only thing you should pay attention to is the encoding (that answer is based on UTF-8, but it can easily be switched to any other): https://stackoverflow.com/a/32903103/2799410
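The steps discussed in these answers (drop soft line breaks, then turn each =XX into one byte) can also be done without going through percent escapes. The following is a sketch in plain C under the assumption that the input is a NUL-terminated string; qp_decode and hex_value are hypothetical names. The decoded bytes can then be handed to initWithBytes:length:encoding: with the appropriate encoding, as suggested above.

```c
#include <stddef.h>

/* Returns the value of a hex digit, or -1 if c is not one. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decodes quoted-printable text from in into out.
   out must have room for strlen(in) + 1 bytes.
   Returns the number of decoded bytes, not counting the terminating NUL. */
size_t qp_decode(const char *in, char *out)
{
    size_t n = 0;
    while (*in != '\0') {
        if (in[0] == '=' && in[1] == '\r' && in[2] == '\n') {
            in += 3;                                   /* soft break "=\r\n" */
        } else if (in[0] == '=' && in[1] == '\n') {
            in += 2;                                   /* soft break "=\n" */
        } else if (in[0] == '=' && hex_value(in[1]) >= 0 && hex_value(in[2]) >= 0) {
            out[n++] = (char)(hex_value(in[1]) * 16 + hex_value(in[2]));
            in += 3;                                   /* "=XX" is one byte */
        } else {
            out[n++] = *in++;                          /* literal character */
        }
    }
    out[n] = '\0';
    return n;
}
```

A malformed "=" that is followed by neither a line break nor two hex digits is copied through unchanged, which avoids the unbalanced-escape failure mentioned above.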

How do you add a macron to a character in an NSString? via Unicode?

Objective-C iOS Programming:
I need to display a number like 8.33333 as just 8.3, with the three having a macron (the repeating-number symbol, a bar line) above it. I have done some searching and have not found a solution to this. I have found that the encoding for C/C++/Java source code is "\u0304" and for Unicode is "U+0304". Is there a way that I can create an NSString from a Unicode character? And how would I create a Unicode character with a macron?
Thanks.
For combining characters such as U+0304, the string should contain the original letter followed by the combining character. For instance,
NSString *str = @"ca\u0304t";
is a representation of cāt.
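The same rule applies to the number in the question: the combining macron goes after the digit it decorates. As a sketch in plain C (with_macron is a hypothetical helper), the UTF-8 bytes of U+0304, CC 84, are appended to the text, and the result can be passed to stringWithUTF8String:.

```c
#include <stddef.h>
#include <stdio.h>

/* Copies text into buf and appends U+0304 COMBINING MACRON (UTF-8: CC 84),
   so the macron is drawn over the last character of text.
   Returns the number of bytes written (not counting the NUL),
   or -1 if buf is too small. */
int with_macron(char *buf, size_t size, const char *text)
{
    int n = snprintf(buf, size, "%s\xCC\x84", text);
    if (n < 0 || (size_t)n >= size) {
        return -1;
    }
    return n;
}
```

Calling it with "8.3" produces the five bytes 38 2E 33 CC 84, which renders as 8.3 with a bar over the 3 in fonts that support combining marks.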