NSString and non-UTF-8 symbols - iPhone

For some of you (I'm sure) this question is quite simple to answer, but I have some difficulty understanding how to solve the problem.
I have a .txt file containing a table like this:
" 236
? 26
\x00EE 16
As you probably understood, the left column lists symbols and the right one lists the codes I defined for them in my app.
And... you probably also noticed that some of the symbols are "strange". The 0x00EE should be "å" (a with a ring above).
Unfortunately I can't control the left column, i.e. it comes from another piece of software. Making some experiments I found that:
NSLog(@"\x00ee");
for example produces a warning telling that the code does not belong to the UTF-8 range.
So I was wondering how to convert the NSString @"\x00ee" (which I read from the file, so it is a string composed of six characters) to the single Unicode letter "å" (a with a ring above).
Can anyone help me?
Thanks...

You need to find out which character set encoding was used. 0xEE is Unicode for î. In Unicode, å is E5. This is encoded in UTF-8 as the sequence 0xC3 0xA5. The following does the trick for me:
NSLog(@"\xc3\xa5");
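If the file itself contains a raw 0xEE byte rather than the literal characters \x00ee, reading it with the right encoding fixes the problem at the source. A minimal sketch, assuming the table is Latin-1 (ISO 8859-1) encoded; the file name is hypothetical:
NSError *error = nil;
NSString *table = [NSString stringWithContentsOfFile:@"table.txt"
                                            encoding:NSISOLatin1StringEncoding
                                               error:&error];
if (table == nil) {
    NSLog(@"Could not read file: %@", error); // a wrong guess at the encoding also lands here
}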

If your input string contains only ASCII characters then you can use the fact that
NSNonLossyASCIIStringEncoding decodes \uNNNN to the corresponding Unicode character:
NSString *s = @"\\x00ee"; // from your text file
NSString *s1 = [s stringByReplacingOccurrencesOfString:@"\\x" withString:@"\\u"];
NSData *d = [s1 dataUsingEncoding:NSASCIIStringEncoding];
NSString *s2 = [[NSString alloc] initWithData:d encoding:NSNonLossyASCIIStringEncoding];
NSLog(@"%@", s2);
Output: î, which is U+00EE (LATIN SMALL LETTER I WITH CIRCUMFLEX).
(Remark: å is U+00E5, not U+00EE).
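If the file really does contain the six literal characters \x00ee, another option is to parse the hex digits yourself. A sketch, assuming every escape has the form \xNNNN with a code point that fits in a single UTF-16 unit:
NSString *s = @"\\x00ee";                     // the six characters from the file
NSScanner *scanner = [NSScanner scannerWithString:s];
unsigned int codePoint = 0;
if ([scanner scanString:@"\\x" intoString:NULL] &&
    [scanner scanHexInt:&codePoint]) {
    unichar c = (unichar)codePoint;           // assumes a BMP code point
    NSString *decoded = [NSString stringWithCharacters:&c length:1];
    NSLog(@"%@", decoded);                    // prints î (U+00EE)
}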

Related

NSString UTF8String mangling unicode characters

When I run [NSString UTF8String] on certain unicode characters the resulting const char* representation is mangled both in NSLog and on the device/simulator display. The NSString itself displays fine but I need to convert the NSString to a cStr to use it in CGContextShowTextAtPoint.
It's very easy to reproduce (see code below) but I've searched for similar questions without any luck. Must be something basic I'm missing.
const char *cStr = [@"章" UTF8String];
NSLog(@"%s", cStr);
Thanks!
CGContextShowTextAtPoint is only for ASCII chars.
Check this SO question for answers.
When using the string format specifier (a.k.a. %s) you cannot be guaranteed that the characters of a C string will print correctly if they are not ASCII. A complex character like the one you've defined can be expressed in UTF-8 using escape sequences that indicate the character set the character comes from. However, %s uses the system encoding to interpret the characters in the C string you provide to the formatting call (in this case, NSLog). See Apple's documentation:
https://developer.apple.com/library/mac/documentation/cocoa/Conceptual/Strings/Articles/formatSpecifiers.html
%s
Null-terminated array of 8-bit unsigned characters. %s interprets its input in the system encoding rather than, for example, UTF-8.
Going on to your CGContextShowTextAtPoint problem: that API supports only the MacRoman character set, which is not the entire Unicode character set.
You'll need to look into another API for showing Unicode characters. Core Text is probably where you'll want to start.
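A minimal Core Text sketch of that direction, assuming you are drawing inside a UIView's drawRect: under ARC (the coordinates are illustrative, and UIKit's flipped coordinate system needs the text matrix adjusted):
#import <CoreText/CoreText.h>

CGContextRef context = UIGraphicsGetCurrentContext();
CGContextSetTextMatrix(context, CGAffineTransformMakeScale(1.0, -1.0)); // un-flip for UIKit
NSAttributedString *text = [[NSAttributedString alloc] initWithString:@"章"];
CTLineRef line = CTLineCreateWithAttributedString((__bridge CFAttributedStringRef)text);
CGContextSetTextPosition(context, 20.0, 20.0);
CTLineDraw(line, context);
CFRelease(line);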
I've never noticed this issue before, but some quick experimentation shows that using printf instead of NSLog will cause the correct Unicode character to show up.
Try:
printf("%s", cStr);
This gives me the desired output ("章") both in the Xcode console and in Terminal. As nob1984 stated in his answer, the interpretation of the character data is up to the callee.
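In practice the simplest way to sidestep the %s trap altogether is to log the NSString itself with the object specifier:
NSString *s = @"章";
NSLog(@"%@", s); // %@ prints the string correctly regardless of the system encoding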

NSString Encoding problem

My code
char* tmp = "abc \x80 dfg";
NSString* name = [[NSString alloc] initWithUTF8String:tmp];
It returns name as nil. I understand the -initWithUTF8String: method doesn't like my extended-ASCII \x80 (euro sign). I tried to play with -initWithCString: with every possible encoding. Nothing works.
Interestingly, the Apple sample code below works properly:
[NSString stringWithUTF8String:"Long \xe2\x80\x94 dash"];
I can't figure out how to use their approach. Any help would be much appreciated.
The byte \x80 on its own is not valid UTF-8 (0x80 is a continuation byte), and U+0080 is a control character anyway, not the Euro sign (the Euro sign is U+20AC). In Windows CP-1252, however, 0x80 is the Euro sign:
NSString* name = [[NSString alloc] initWithCString:tmp encoding:NSWindowsCP1252StringEncoding];
(The reason Apple's code works is because of the way UTF-8 characters are represented in bytes.)
The UTF-8 code for € is three bytes long, and it goes: \xe2\x82\xac.
For translating between Unicode code points and UTF-8, you can use the following site: http://www.utf8-chartable.de/unicode-utf8-table.pl . I took the code point for the Euro sign from Wikipedia.
The C99 \u character escape for € is \u20ac
So, €1.99 will be:
NSString *euroString = [NSString stringWithUTF8String:"\u20ac1.99"];
Also check this out for more info: using UTF-32 in NSString
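Yet another route, sketched here, is to build the character from its code point at run time instead of in a literal:
unichar euro = 0x20AC;   // U+20AC EURO SIGN
NSString *euroSign = [NSString stringWithCharacters:&euro length:1];
NSString *price = [NSString stringWithFormat:@"%@1.99", euroSign];
NSLog(@"%@", price);     // €1.99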

How do you add a macron to a character in an NSString? via Unicode?

Objective-C iOS Programming:
I need to display a number like 8.33333 as just 8.3, with the three having a macron (the repeating-decimal symbol, a bar line) above it. I have done some searching and have not found a solution to this. I have found the escape for C/C++/Java source code to be "\u0304" and for Unicode to be "U+0304". Is there a way that I can create an NSString from a Unicode character? And how would I create a Unicode character with a macron?
Thanks.
For combining characters such as U+0304, the string should contain the original letter followed by the combining character. For instance,
NSString *str = @"ca\u0304t";
is a representation of cāt.
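Applied to the question's use case, a sketch (the digit extraction is deliberately simplistic and assumes a single repeating digit):
double value = 25.0 / 3.0;                    // 8.3333...
int whole = (int)value;
int repeating = (int)((value - whole) * 10);  // first fractional digit
NSString *display = [NSString stringWithFormat:@"%d.%d\u0304", whole, repeating];
NSLog(@"%@", display);                        // 8.3̄ (3 with a combining macron)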

NSString stringWithCharacters Unicode Problem

This has got to be simple -- surely this method is supposed to work -- but I'm having some kind of two-byte-to-one-byte problem, I think.
The purpose of the code is to generate a string of '0' characters of a certain length (10 minus the number of digits that will be tacked onto the end). It looks like this:
const unichar zero = 0x0030;
NSString *zeroBuffer = [NSString stringWithCharacters:&zero length:(10 - [[NSString stringWithFormat:@"%i", photoID] length])];
Alternate second line (casting the thing at address &zero):
NSString *zeroBuffer = [NSString stringWithCharacters:(unichar *)&zero length:(10 - [[NSString stringWithFormat:@"%i", photoID] length])];
0x0030 is the code point of the numeral 0 in the Basic Latin block of the Unicode table.
If photoID is 123 I'd want zeroBuffer to be @"0000000". What it actually ends up as is a zero and then some crazy Unicode characters along the lines of (not sure how this will show) this:
0䪨 燱ܾ뿿﹔
I'm assuming that I've got data crossing character boundaries or something. I've temporarily rewritten it as a dumb substring thing, but this seems like it would be more efficient.
What am I doing wrong?
stringWithCharacters:length: expects the first argument to be the address of a buffer containing each of the characters to be inserted in the string in sequence. It's reading your character zero for the first character, then advancing to the following memory address and reading whatever data is there for the next character, and so on. This is not the right method for doing what you're trying to do.
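For illustration, a correct call fills a whole buffer first. A sketch reusing the question's photoID, which is assumed to have at most ten digits:
NSUInteger count = 10 - [[NSString stringWithFormat:@"%i", photoID] length];
unichar buffer[10];
for (NSUInteger i = 0; i < count; i++) {
    buffer[i] = 0x0030; // the character '0'
}
NSString *zeroBuffer = [NSString stringWithCharacters:buffer length:count];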
Alas, there isn't a built-in repeat-this-string method. See the answers here for suggestions.
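One such suggestion, sketched with NSString's built-in padding (same assumption about photoID):
NSUInteger width = 10 - [[NSString stringWithFormat:@"%i", photoID] length];
NSString *zeros = [@"" stringByPaddingToLength:width
                                    withString:@"0"
                               startingAtIndex:0];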
Alternatively, you can avoid the issue completely and just do this:
[NSString stringWithFormat:@"%010i", photoID];
That makes the formatter output the decimal number zero-padded to a width of ten digits (e.g. 123 becomes 0000000123).

Special Characters from SQLite DB

I read from a SQLite DB in my iPhone app. Within the texts there are sometimes special characters like '\xf2' or '\xe0', as I can see in the debugger in the char* data. When I try to transform the chars to an NSString object by using initWithUTF8String, I get nil back.
How can I transform such special characters?
It looks like an encoding issue. You can get '\xf2' or '\xe0' when you have symbols such as © or ®. Those symbols need 2 bytes, and SQLite can interpret each byte of the symbol as a separate symbol.
So, try -initWithCString:encoding: instead of -initWithUTF8String:. Note that strict ASCII cannot represent bytes above 0x7F, so use an eight-bit encoding that covers them, such as Latin-1:
NSString *stringFromDB = [[NSString alloc] initWithCString:charsArrayFromDB
                                                  encoding:NSISOLatin1StringEncoding];
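A more defensive sketch: try UTF-8 first and fall back to Latin-1 only when the bytes are not valid UTF-8 (charsArrayFromDB is the char* from the question):
NSString *stringFromDB = [[NSString alloc] initWithUTF8String:charsArrayFromDB];
if (stringFromDB == nil) {
    // the bytes are not valid UTF-8; interpret them as Latin-1 instead
    stringFromDB = [[NSString alloc] initWithCString:charsArrayFromDB
                                            encoding:NSISOLatin1StringEncoding];
}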