Special Characters from SQLite DB - iphone

I read text from an SQLite database into my iPhone app. The text sometimes contains special characters such as '\xf2' or '\xe0', as I can see in the debugger when inspecting the char* data. When I try to convert the chars to an NSString object using initWithUTF8String, I get nil back.
How can I transform such special characters?

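It looks like an encoding issue. Bytes such as 0xF2 or 0xE0 typically show up when the text was stored in a single-byte encoding such as Latin-1, where they stand for ò and à. In UTF-8 such characters need 2 bytes, so a lone 0xF2 or 0xE0 is not valid UTF-8 and initWithUTF8String returns nil.
So, try initWithCString:encoding: instead of initWithUTF8String:
NSString *stringFromDB = [[NSString alloc] initWithCString:charsArrayFromDB
                                                  encoding:NSASCIIStringEncoding];
Note that NSASCIIStringEncoding is documented as strict 7-bit ASCII, so if this still returns nil, NSISOLatin1StringEncoding is worth trying.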
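To see why initWithUTF8String gives up on these bytes, here is a minimal sketch in plain C (which also compiles inside an Objective-C .m file) of a structural UTF-8 check. The function name looks_like_utf8 is made up for this illustration, and the check is deliberately simplified: it validates lead and continuation bytes but does not catch every overlong or surrogate sequence.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if the len bytes at s have the
   lead-byte / continuation-byte structure of UTF-8. Simplified check,
   not a full validator. */
bool looks_like_utf8(const unsigned char *s, size_t len) {
    size_t i = 0;
    while (i < len) {
        unsigned char b = s[i];
        size_t extra;                    /* continuation bytes expected */
        if (b < 0x80)                    extra = 0;  /* plain ASCII */
        else if (b >= 0xC2 && b <= 0xDF) extra = 1;  /* 2-byte sequence */
        else if (b >= 0xE0 && b <= 0xEF) extra = 2;  /* 3-byte sequence */
        else if (b >= 0xF0 && b <= 0xF4) extra = 3;  /* 4-byte sequence */
        else return false;               /* stray continuation or invalid lead */
        if (extra > len - i - 1) return false;       /* sequence is truncated */
        for (size_t k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return false; /* must be 10xxxxxx */
        }
        i += extra + 1;
    }
    return true;
}
```

A lone 0xF2 or 0xE0 fails this check because each announces continuation bytes that never arrive, which is consistent with Latin-1 text being read as if it were UTF-8.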

Related

NSString and no UTF-8 symbols

For some of you (I'm sure) this question is quite simple to answer, but I have some difficulties in understanding how to solve the problem.
I have a .txt file containing a table like this:
" 236
? 26
\x00EE 16
As you have probably understood, the left column lists symbols and the right one lists a code for each of them that I defined in my app.
And, as you have probably also noticed, some of the symbols are "strange". The 0x00EE should be the "å" (a with a ring above).
Unfortunately I can't control the left column, i.e. it comes from another software. Making some experiments I found that:
NSLog(@"\x00ee");
for example produces a warning telling me that the code does not belong to the UTF-8 range.
So I was wondering how to convert the NSString @"\x00ee" (which I read from the file, so it is a string composed of 6 chars) to the single Unicode letter "å" (a with a ring above).
Can anyone help me?
Thanks...
You need to find out what character set encoding was used. 0xEE is the Unicode code point for î. In Unicode, å is U+00E5, which is encoded in UTF-8 as the sequence 0xC3 0xA5. The following does the trick for me:
NSLog(@"\xc3\xa5");
If your input string contains only ASCII characters then you can use the fact that
NSNonLossyASCIIStringEncoding decodes \uNNNN to the corresponding Unicode character:
NSString *s = @"\\x00ee"; // from your text file
NSString *s1 = [s stringByReplacingOccurrencesOfString:@"\\x" withString:@"\\u"];
NSData *d = [s1 dataUsingEncoding:NSASCIIStringEncoding];
NSString *s2 = [[NSString alloc] initWithData:d encoding:NSNonLossyASCIIStringEncoding];
NSLog(@"%@", s2);
Output: î, which is U+00EE (LATIN SMALL LETTER I WITH CIRCUMFLEX).
(Remark: å is U+00E5, not U+00EE).
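The same conversion can be done by hand at the byte level. The following is a rough sketch in plain C, assuming every token in the file has exactly the form \xNNNN (backslash, x, four hex digits); the function names decode_hex_token and utf8_encode are invented for this example. It parses the code point and encodes it as UTF-8, which could then be handed to initWithBytes:length:encoding: with NSUTF8StringEncoding.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Encodes a code point up to U+FFFF as UTF-8; returns the byte count. */
size_t utf8_encode(unsigned long cp, unsigned char out[3]) {
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Hypothetical parser for a 6-character token such as \x00ee.
   Returns the number of UTF-8 bytes written, or 0 if the token
   does not match the assumed format. */
size_t decode_hex_token(const char *token, unsigned char out[3]) {
    if (strlen(token) != 6 || token[0] != '\\' || token[1] != 'x') return 0;
    for (int i = 2; i < 6; i++) {
        if (!isxdigit((unsigned char)token[i])) return 0;
    }
    unsigned long cp = strtoul(token + 2, NULL, 16);
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  /* surrogates are not characters */
    return utf8_encode(cp, out);
}
```

For the token \x00ee this yields the two bytes 0xC3 0xAE (î); a token of \x00e5 would yield 0xC3 0xA5 (å).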

Decoding HTML entities on iPhone

I have a list of several locations, some of them containing the letters æ, Æ, ø, Ø, å and Å.
From the webservice I'm using, the letters come out as "&oslash ;" "&Aring ;" etc.
When I download the feed from the webservice, I use UTF-8 encoding.
How can I decode the occurrences of these characters?
Thanks!
There is no standard way. To keep it simple, write your own custom method (or NSString extension) and do this for each entity:
string = [string stringByReplacingOccurrencesOfString:@"&amp;" withString:@"&"];
If your webservice is using UTF-8 and you decode the data with [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding], all should be OK.
An NSString category called "GTMNSString+HTML" written by Google works perfectly for me. Check it out here: https://gist.github.com/takuma104/ntlniph/blob/master/gtm/Foundation/GTMNSString+HTML.h & here: https://gist.github.com/takuma104/ntlniph/blob/master/gtm/Foundation/GTMNSString+HTML.m
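For the custom-method route, a small lookup table is usually enough when only a handful of entities occur. Below is a minimal sketch in plain C; the table ENTITIES and the function decode_entities are hypothetical names, and the table covers only the six letters from the question plus &amp;. It is not a general HTML decoder (numeric entities such as &#229; are not handled).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table: entity name and its UTF-8 byte sequence. */
struct entity { const char *name; const char *utf8; };

static const struct entity ENTITIES[] = {
    { "&aelig;",  "\xC3\xA6" },  /* æ U+00E6 */
    { "&AElig;",  "\xC3\x86" },  /* Æ U+00C6 */
    { "&oslash;", "\xC3\xB8" },  /* ø U+00F8 */
    { "&Oslash;", "\xC3\x98" },  /* Ø U+00D8 */
    { "&aring;",  "\xC3\xA5" },  /* å U+00E5 */
    { "&Aring;",  "\xC3\x85" },  /* Å U+00C5 */
    { "&amp;",    "&" },
};

/* Copies in to out in a single pass, replacing known entities with
   UTF-8 bytes. out must hold at least strlen(in) + 1 bytes; every
   replacement is shorter than the entity it replaces. */
void decode_entities(const char *in, char *out) {
    while (*in) {
        size_t matched = 0;
        if (*in == '&') {
            for (size_t i = 0; i < sizeof ENTITIES / sizeof ENTITIES[0]; i++) {
                size_t n = strlen(ENTITIES[i].name);
                if (strncmp(in, ENTITIES[i].name, n) == 0) {
                    size_t m = strlen(ENTITIES[i].utf8);
                    memcpy(out, ENTITIES[i].utf8, m);
                    out += m;
                    matched = n;
                    break;
                }
            }
        }
        if (matched) in += matched;   /* skip the whole entity */
        else *out++ = *in++;          /* copy unknown text unchanged */
    }
    *out = '\0';
}
```

Doing the replacement in one pass, rather than one stringByReplacingOccurrencesOfString: call per entity, avoids decoding text twice (for example "&amp;oslash;" stays "&oslash;"). The resulting UTF-8 buffer can be turned into an NSString with stringWithUTF8String:.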

How to avoid UTF8 characters inside my NSDictionary?

I'm saving an NSString inside an NSArray, and that NSArray inside an NSDictionary. When I do this, I notice that if my string is something like Hi I'm XYZ, the single quote is replaced by the corresponding UTF character when it is stored.
So how can I avoid this, or how can I get my actual text, along with its special characters, back from the NSArray or the NSDictionary?
Any help is appreciated.
NSString internally uses Unicode characters. So it easily can handle all sorts of characters from different languages.
You cannot choose the internal encoding of NSString. It's always Unicode. If you have an encoding problem, then you have either created the NSString instance incorrectly or you have output the instance the wrong way.
And there's no such thing as a UTF character.
Please better describe your problem and show the relevant source code.

How to convert German characters into a UTF string when parsing a PDF on iPhone?

I have implemented PDF parsing: I parse the PDF and fetch all the text, but it displays junk characters, so I want to convert it into a UTF string. How is this possible? Please help me with this question.
First, you need to find out which encoding is currently used for the text. I guess it's ISO-8859-1, aka Latin-1, or its variant ISO-8859-15, aka Latin-9.
As soon as you know that, it's a piece of cake. You haven't said in which container you got the text, e.g. whether it's stored in a C string or NSData.
Let's assume you got a C string. In that case you would do:
myString = [[NSString alloc] initWithBytes:myCString
length:strlen(myCString)
encoding:NSISOLatin1StringEncoding];
If you have an NSData, you would use the initWithData:encoding: initializer instead. That's all you need to do, since according to Apple's documentation, "A string object presents itself as an array of Unicode characters". If you need a UTF8-encoded C string, you can then query it via:
myUTF8CString = [myString UTF8String];
There's also dataUsingEncoding: to get an NSData object instead of a C string.
Have a look at the NSString class reference and the NSStringEncoding constants.
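What the Latin-1 to UTF-8 round trip does to the bytes can be illustrated in a few lines of plain C. This is only a sketch of the mapping, assuming the input really is ISO-8859-1 (in ISO-8859-15 a few positions, such as 0xA4 for the euro sign, differ); the function name latin1_to_utf8 is made up here, and in an app NSString does this work for you.

```c
#include <stddef.h>

/* Converts len bytes of ISO-8859-1 text to UTF-8.
   out must have room for 2 * len + 1 bytes.
   Returns the number of bytes written, not counting the final NUL. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out) {
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char b = in[i];
        if (b < 0x80) {
            out[o++] = b;                                  /* ASCII is unchanged */
        } else {
            out[o++] = (unsigned char)(0xC0 | (b >> 6));   /* lead byte: 0xC2 or 0xC3 */
            out[o++] = (unsigned char)(0x80 | (b & 0x3F)); /* continuation byte */
        }
    }
    out[o] = '\0';
    return o;
}
```

Because Latin-1 byte values coincide with the first 256 Unicode code points, each byte above 0x7F simply becomes a two-byte UTF-8 sequence; for example ü (0xFC) becomes 0xC3 0xBC.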

NSStream, UTF8String & NSString... Messy Conversion

I am constructing a data packet to be sent over NSStream to a server. I am trying to separate two pieces of data with a '§' (ASCII code 167). This is the way the server is built, so I need to try to stay within those bounds...
unichar asciiChar = 167; //yields @"§"
[self setSepString:[NSString stringWithCharacters:&asciiChar length:1]];
sendData=[NSString stringWithFormat:@"USER User%@Pass", sepString];
NSLog(sendData);
const uint8_t *rawString=(const uint8_t *)[sendData UTF8String];
[oStream write:rawString maxLength:[sendData length]];
So the final outcome should look like this.. and it does when sendData is first constructed:
USER User§Pass
however, when it is received on the server side, it looks like this:
//not a direct copy and paste. The 'mystery character' may not be exact
USER UserˤPas
...the separator string has become two in length, and the last letter is getting cropped from the command. I believe this to be caused by the UTF-8 conversion.
Can anyone shed some light on this for me?
Any help would be greatly appreciated!
The correct encoding in UTF-8 for this character is the two-byte sequence 0xC2 0xA7, which is what you're getting. (Fileformat.info is invaluable for this stuff.) The character is part of the Latin-1 set, so you almost certainly want to be using NSISOLatin1StringEncoding rather than NSUTF8StringEncoding in order to get a single-byte 167 encoding. Look at NSString -dataUsingEncoding:.
What you have and what you want to transmit is not really a UTF-8 string, and it's technically not US-ASCII, because that's only 7 bits. You want to transmit an arbitrary array of bytes, according to the protocol that you're working with. The two fields of the byte array, username and password, might themselves be UTF-8 strings, but with the single-byte 167 separator the packet as a whole cannot be a UTF-8 string.
Here are some options I see:
Construct the uint8_t* byte array using at least two different NSString objects plus the 167 code. This will be necessary if the username or password can possibly contain non-ASCII characters.
Use the NSString method getBytes:maxLength:usedLength:encoding:options:range:remainingRange: and set encoding to NSASCIIStringEncoding. If you do this you must validate elsewhere that your username and password are US-ASCII only.
Use the NSString method getCString:. However, that's been deprecated because you cannot specify the encoding you want.
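The first option can be sketched in plain C as follows. The function name build_login_packet and the FIELD_SEPARATOR macro are invented for this illustration, and the two fields are assumed to arrive as NUL-terminated byte strings already in whatever encoding the server expects. The sketch also returns the byte count explicitly, which matters for the cropped last letter in the question: [sendData length] counts characters (14 for "USER User§Pass"), while the UTF-8 form is 15 bytes long, so passing the character count as maxLength drops the final byte.

```c
#include <stddef.h>
#include <string.h>

#define FIELD_SEPARATOR 0xA7  /* '§' as the single byte 167 */

/* Hypothetical helper: writes user, the separator byte, and pass into buf.
   Returns the total number of bytes written, or 0 if buf (cap bytes)
   is too small. The packet is raw bytes and is not NUL-terminated. */
size_t build_login_packet(const char *user, const char *pass,
                          unsigned char *buf, size_t cap) {
    size_t ulen = strlen(user);
    size_t plen = strlen(pass);
    size_t total = ulen + 1 + plen;
    if (total > cap) return 0;
    memcpy(buf, user, ulen);              /* first field */
    buf[ulen] = FIELD_SEPARATOR;          /* exactly one separator byte */
    memcpy(buf + ulen + 1, pass, plen);   /* second field */
    return total;
}
```

The returned count, rather than the NSString length, is what would be passed as maxLength to -write:maxLength:.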