Decoding HTML entities on iPhone - iphone

I have a list of several locations, some of them containing the letters æ, Æ, ø, Ø, å and Å.
From the webservice I'm using, the letters come out as "&oslash;", "&Aring;", etc.
When I download the feed from the webservice, I use UTF-8 encoding.
How can I decode the occurrences of these characters?
Thanks!

There is no standard way. To keep it simple, write your own custom method (or NSString category) and replace each entity like this:
string = [string stringByReplacingOccurrencesOfString:@"&amp;" withString:@"&"];
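As a rough illustration of the "write your own method" idea, here is a minimal sketch in plain C (which also compiles inside an Objective-C file). The function name decode_entities and the tiny ENTITIES table are hypothetical and cover only the entities mentioned in the question plus &amp;. The input is scanned once, so "&amp;oslash;" becomes "&oslash;" and is not decoded a second time.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup table: entity text -> UTF-8 bytes.
   Only a handful of entities are listed; a real table is much longer. */
struct entity { const char *name; const char *utf8; };

static const struct entity ENTITIES[] = {
    { "&aelig;",  "\xc3\xa6" },  /* U+00E6 */
    { "&AElig;",  "\xc3\x86" },  /* U+00C6 */
    { "&oslash;", "\xc3\xb8" },  /* U+00F8 */
    { "&Oslash;", "\xc3\x98" },  /* U+00D8 */
    { "&aring;",  "\xc3\xa5" },  /* U+00E5 */
    { "&Aring;",  "\xc3\x85" },  /* U+00C5 */
    { "&amp;",    "&" },
};

/* Copies `in` to `out` (capacity `cap`, including the terminating NUL),
   replacing known entities with their UTF-8 bytes. Unknown entities are
   copied through unchanged. Returns 0 on success, -1 if `out` is too small. */
int decode_entities(const char *in, char *out, size_t cap)
{
    size_t n = 0;

    if (cap == 0) return -1;
    while (*in) {
        const char *rep = NULL;   /* replacement bytes, if an entity matched */
        size_t skip = 1;          /* input bytes consumed in this step */

        if (*in == '&') {
            for (size_t i = 0; i < sizeof ENTITIES / sizeof ENTITIES[0]; i++) {
                size_t len = strlen(ENTITIES[i].name);
                if (strncmp(in, ENTITIES[i].name, len) == 0) {
                    rep = ENTITIES[i].utf8;
                    skip = len;
                    break;
                }
            }
        }

        size_t rlen = rep ? strlen(rep) : 1;
        if (n + rlen + 1 > cap) return -1;   /* keep room for the NUL */
        memcpy(out + n, rep ? rep : in, rlen);
        n += rlen;
        in += skip;
    }
    out[n] = '\0';
    return 0;
}
```

The resulting buffer holds UTF-8 bytes, so it could then be handed to something like stringWithUTF8String:.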

If your webservice is using UTF-8 and you decode the data with [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding], all should be OK.

An NSString category called "GTMNSString+HTML", written by Google, works perfectly for me. Check it out here: https://gist.github.com/takuma104/ntlniph/blob/master/gtm/Foundation/GTMNSString+HTML.h and here: https://gist.github.com/takuma104/ntlniph/blob/master/gtm/Foundation/GTMNSString+HTML.m

Related

NSString and no UTF-8 symbols

For some of you (I'm sure) this question is quite simple to answer, but I have some difficulties in understanding how to solve the problem.
I have a .txt file containing a table like this:
" 236
? 26
\x00EE 16
As you probably understood, the left column lists symbols and the right one lists codes that I defined in my app.
And, as you probably noticed, some of the symbols are "strange". The 0x00EE should be "å" (a with a ring above).
Unfortunately I can't control the left column, i.e. it comes from another piece of software. After some experiments I found that:
NSLog(@"\x00ee");
for example, produces a warning saying that the code does not belong to the UTF-8 range.
So I was wondering how to convert the NSString @"\x00ee" (which I read from the file, so it is a string of 6 characters) to the single Unicode letter "å" (a with a ring above).
Can anyone help me?
Thanks...
You need to find out which character encoding was used. 0xEE is the Unicode code point for î (U+00EE). In Unicode, å is U+00E5, which is encoded in UTF-8 as the byte sequence 0xC3 0xA5. The following does the trick for me:
NSLog(@"\xc3\xa5");
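To see where a byte sequence like 0xC3 0xA5 comes from, here is a small sketch in plain C of how a Unicode code point is turned into UTF-8 bytes. The function name utf8_encode is made up for this illustration; it is not part of Foundation.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes the UTF-8 encoding of code point `cp` into `out` (room for at
   least 4 bytes) and returns the number of bytes written, or 0 if `cp`
   cannot be encoded (a surrogate, or a value above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {                       /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {    /* surrogates are not characters */
        return 0;
    }
    if (cp < 0x10000) {                    /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes: 11110xxx then three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

With this sketch, utf8_encode(0xE5, buf) yields the two bytes 0xC3 0xA5 (å), while utf8_encode(0xEE, buf) yields 0xC3 0xAE (î).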
If your input string contains only ASCII characters, then you can use the fact that NSNonLossyASCIIStringEncoding decodes \uNNNN to the corresponding Unicode character:
NSString *s = @"\\x00ee"; // from your text file
NSString *s1 = [s stringByReplacingOccurrencesOfString:@"\\x" withString:@"\\u"];
NSData *d = [s1 dataUsingEncoding:NSASCIIStringEncoding];
NSString *s2 = [[NSString alloc] initWithData:d encoding:NSNonLossyASCIIStringEncoding];
NSLog(@"%@", s2);
Output: î, which is U+00EE (LATIN SMALL LETTER I WITH CIRCUMFLEX).
(Remark: å is U+00E5, not U+00EE).
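The answer above leans on NSNonLossyASCIIStringEncoding. A different route, sketched here in plain C, is to parse the hex digits yourself and get the code point as a number. This is only an illustration: parse_hex_escape is a hypothetical helper, and it assumes each left-column entry is exactly a backslash, an x, and one to six hex digits.

```c
#include <ctype.h>

/* Parses text of the form \xNNNN (backslash, 'x', 1 to 6 hex digits, end of
   string) and stores the value in *cp. Returns 1 on success, 0 if the text
   has any other shape (for example a plain symbol such as "?"). */
int parse_hex_escape(const char *text, unsigned long *cp)
{
    const char *p;
    unsigned long value = 0;
    int digits = 0;

    if (text[0] != '\\' || text[1] != 'x') return 0;

    for (p = text + 2; isxdigit((unsigned char)*p); p++, digits++) {
        int d = isdigit((unsigned char)*p)
                    ? *p - '0'
                    : tolower((unsigned char)*p) - 'a' + 10;
        value = value * 16 + (unsigned long)d;
    }

    /* Reject empty or overlong digit runs and trailing characters. */
    if (digits == 0 || digits > 6 || *p != '\0') return 0;

    *cp = value;
    return 1;
}
```

For the 6-character text \x00ee this gives the value 0xEE, which could then be turned into a one-character string, for instance with a %C format in stringWithFormat:.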

String encoding conversion in JSON response

I have a string encoding problem. My application is available in 5 languages: Swedish, Norwegian, English, Finnish and Danish. In one section of the app I fetch user reviews, which can arrive in any of these languages, for example the Swedish word nämndes.
The problem is that I get the reviews as JSON, and the Swedish character ä arrives as &a and is printed as &a. I want it printed as ä. The same problem occurs with the special characters of the other languages.
Please help me...
I do something like this when I'm requesting XML data from my server.
NSString *responseString = [request responseString]; //Pass request text from server over to NSString
NSData *capturedResponseData = [responseString dataUsingEncoding:NSUTF8StringEncoding];
Hope it helps.
Solution: in the end I changed the character encoding on the web service. The web service now emits special characters as UTF-8 style escapes like /U00.. When we display the text in a text view or label, they are converted automatically and the correct characters are shown.

NSString Encoding problem

My code
char* tmp = "abc \x80 dfg";
NSString* name = [[NSString alloc] initWithUTF8String:tmp];
It returns name as nil. I understand that the -initWithUTF8String: method doesn't like my extended-ASCII \x80 (Euro sign). I tried -initWithCString: with every possible encoding. Nothing works.
Interestingly, the Apple sample code below works properly:
[NSString stringWithUTF8String:"Long \xe2\x80\x94 dash"];
I can't figure out how to use their approach. Any help would be much appreciated.
The byte 0x80 on its own is not valid UTF-8 (it can only appear as a continuation byte), and the code point U+0080 is a control character, not the Euro sign (which is U+20AC). In Windows CP-1252, however, 0x80 is the Euro sign:
NSString* name = [[NSString alloc] initWithCString:tmp encoding:NSWindowsCP1252StringEncoding];
(Apple's code works because \xe2\x80\x94 is the valid three-byte UTF-8 encoding of U+2014, the em dash.)
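To make the difference between the two literals concrete, here is a rough sketch of a UTF-8 well-formedness check in plain C. The name utf8_is_valid is hypothetical, and this is not how NSString is implemented; it only shows the byte rules that make one string acceptable and the other not.

```c
#include <stddef.h>

/* Returns 1 if the NUL-terminated byte string is well-formed UTF-8, else 0. */
int utf8_is_valid(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;

    while (*s) {
        size_t extra;            /* continuation bytes expected */
        unsigned long cp, min;   /* decoded value and smallest legal value */

        if (*s < 0x80) { s++; continue; }            /* plain ASCII */
        else if ((*s & 0xE0) == 0xC0) { extra = 1; cp = *s & 0x1F; min = 0x80; }
        else if ((*s & 0xF0) == 0xE0) { extra = 2; cp = *s & 0x0F; min = 0x800; }
        else if ((*s & 0xF8) == 0xF0) { extra = 3; cp = *s & 0x07; min = 0x10000; }
        else return 0;   /* stray continuation byte (such as 0x80) or bad lead */

        s++;
        for (; extra > 0; extra--, s++) {
            /* Every following byte must look like 10xxxxxx; this also
               rejects a string that ends in the middle of a sequence. */
            if ((*s & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (unsigned long)(*s & 0x3F);
        }

        /* Reject overlong forms, surrogates and values past U+10FFFF. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
    }
    return 1;
}
```

Under these rules "abc \x80 dfg" is rejected because 0x80 appears without a lead byte, while "Long \xe2\x80\x94 dash" passes because 0xE2 announces two continuation bytes and both follow.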
The UTF-8 encoding of € is three bytes long: \xe2\x82\xac.
For translating between Unicode code points and UTF-8, you can use the following site: http://www.utf8-chartable.de/unicode-utf8-table.pl . I took the code point for the Euro sign from Wikipedia.
The C99 \u character escape for € is \u20ac
So, €1.99 will be:
NSString *euroString = [NSString stringWithUTF8String:"\u20ac1.99"];
Also check this out for more info: using UTF-32 in NSString

How to convert German characters to a UTF string when parsing a PDF on iPhone?

I have implemented PDF parsing: I parse the PDF and extract all of the text, but it displays junk characters, so I want to convert it to a UTF string. How is this possible? Please help me with this question.
First, you need to find out which encoding is currently used for the text. I guess it's ISO-8859-1, aka Latin-1, or its variant ISO-8859-15, aka Latin-9.
As soon as you know that, it's a piece of cake. You haven't said in which container you got the text, e.g. whether it's stored in a C string or NSData.
Let's assume you got a C string. In that case you would do:
myString = [[NSString alloc] initWithBytes:myCString
                                    length:strlen(myCString)
                                  encoding:NSISOLatin1StringEncoding];
If you got an NSData, you would use the initWithData:encoding: initializer instead. That's all you need to do, as according to Apple's documentation, "A string object presents itself as an array of Unicode characters". If you need a UTF-8 encoded C string, you can then query it via:
myUTF8CString = [myString UTF8String];
There's also dataUsingEncoding: to get an NSData object instead of a C string.
Have a look at the NSString class reference and the NSStringEncoding constants.
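For a feel of what the Latin-1 to UTF-8 round trip above amounts to at the byte level, here is a minimal sketch in plain C. The function latin1_to_utf8 is a hypothetical helper, and it assumes the input really is ISO-8859-1, where every byte value equals its Unicode code point.

```c
#include <stddef.h>

/* Converts a NUL-terminated ISO-8859-1 string to UTF-8 in `out` (capacity
   `cap`, including the terminating NUL). Returns the number of bytes
   written, not counting the NUL, or (size_t)-1 if `out` is too small. */
size_t latin1_to_utf8(const char *in, char *out, size_t cap)
{
    size_t n = 0;

    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        if (*p < 0x80) {
            /* ASCII is identical in both encodings. */
            if (n + 2 > cap) return (size_t)-1;
            out[n++] = (char)*p;
        } else {
            /* 0x80..0xFF map to U+0080..U+00FF, which need two UTF-8 bytes. */
            if (n + 3 > cap) return (size_t)-1;
            out[n++] = (char)(0xC0 | (*p >> 6));
            out[n++] = (char)(0x80 | (*p & 0x3F));
        }
    }
    if (n + 1 > cap) return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

For example, the single Latin-1 byte 0xFC (ü) becomes the two bytes 0xC3 0xBC. If the text is actually in another encoding such as ISO-8859-15, a few byte values would need a different mapping.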

Special Characters from SQLite DB

I read from a sqlite db to my iphone app. Within the texts sometimes there are special characters like 'xf2' or 'xe0' as I can see in the debugger in the char* data type. When I try to transform the chars to an NSString Object by using initWithUTF8String, I get a nil back.
How can I transform such special characters?
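It looks like an encoding issue. Bytes such as 0xf2 and 0xe0 are what you get for characters like ò and à in a single-byte encoding such as ISO-8859-1 or Windows CP-1252. Followed by ordinary text they do not form valid UTF-8, where these characters need 2 bytes each, which is why initWithUTF8String returns nil.
So, instead of initWithUTF8String, try initWithCString:encoding: with the encoding the data was actually stored in. For example, assuming ISO-8859-1 (NSASCIIStringEncoding is strictly 7-bit and is not meant for bytes above 0x7f):
NSString *stringFromDB = [[NSString alloc] initWithCString:charsArrayFromDB
                                                  encoding:NSISOLatin1StringEncoding];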