How to convert German characters into a UTF-8 string when parsing a PDF on iPhone? - iphone

I have implemented PDF parsing: I parse the PDF and fetch all of its text, but it displays junk characters, so I want to convert the text into a UTF string. How is this possible? Please help me with this question.

First, you need to find out which encoding is currently used for the text. I guess it's ISO-8859-1, aka Latin-1, or its variant ISO-8859-15, aka Latin-9.
As soon as you know that, it's a piece of cake. You haven't said in which container you got the text, e.g. whether it's stored in a C string or an NSData.
Let's assume you got a C string. In that case you would do:
NSString *myString = [[NSString alloc] initWithBytes:myCString
                                              length:strlen(myCString)
                                            encoding:NSISOLatin1StringEncoding];
If you got an NSData, you would use the initWithData:encoding: initializer instead. That's all you need to do because, according to Apple's documentation, "A string object presents itself as an array of Unicode characters". If you need a UTF-8-encoded C string, you can then query it via:
const char *myUTF8CString = [myString UTF8String];
There's also dataUsingEncoding: to get an NSData object instead of a C string.
Have a look at the NSString class reference and the NSStringEncoding constants.
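For the NSData case mentioned above, a minimal sketch might look like the following. The helper name UTF8DataFromLatin1Data is hypothetical, and the sketch assumes the extracted PDF bytes really are ISO-8859-1, which the question does not confirm.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answer): decode raw bytes that
// are ASSUMED to be ISO-8859-1 and re-encode the text as UTF-8.
// Written for ARC; under manual reference counting `text` would need a release.
static NSData *UTF8DataFromLatin1Data(NSData *latin1Data) {
    // Step 1: tell NSString how to interpret the raw bytes.
    NSString *text = [[NSString alloc] initWithData:latin1Data
                                           encoding:NSISOLatin1StringEncoding];
    // Step 2: ask for the same characters written out as UTF-8 bytes.
    return [text dataUsingEncoding:NSUTF8StringEncoding];
}
```

If the guess about the source encoding is wrong, the call still succeeds (every byte is a valid Latin-1 character) but the resulting text will contain the wrong letters, so it is worth checking a few known words such as ones with ü or ß.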

Related

NSString to NSData encoding considerations

I understand why when going from NSData to NSString you need to specify encoding.
However I'm finding it frustrating how the reverse (NSString to NSData) needs to have an encoding specified.
In this related question the answers suggested using
NSUTF8StringEncoding or defaultCStringEncoding, with the latter not being fully explained.
So I just wanted to ask IF the following is correct when converting NSString to NSData:
In cases where you want to be 100% sure the binary representation of the NSString object is UTF8 then use NSUTF8StringEncoding (or whatever encoding is needed)
In cases where the encoding of the NSString object is known/expected to already be of a certain type and no conversion is required, then it's safe (perhaps internally faster) to use defaultCStringEncoding (from what I have read, Objective-C uses UTF-16 internally; not sure if LE or BE, but I'd assume LE because the platform is LE)
TIA
The encoding needs to be specified for converting NSString to NSData for the same reason it needs to be specified going from NSData to NSString.
An NSData object is a wrapper for a string of absolutely raw bytes. If the NSString doesn't specify some encoding, it doesn't know what to write, because at the level of ones and zeroes, a UTF-16 encoding looks different from a UTF-8 encoding of the same letter, and of course, if you write UTF-16 as big-endian and read it as little-endian you will get gibberish.
In other words, don't think of it as converting or escaping a string; it's generating a byte buffer, and the encoding tells it which ones and zeroes to write when the next character is "a" and which ones to write when it means "妈".
As for your question...here's my two cents.
1) If you are converting an NSString to an NSData so that your same program can convert it back later, and no other software will need to deal with that NSData until after you've read it back into an NSString, then none of this matters. All that matters is that your string-to-data encoding and your data-to-string encoding match.
2) If you are dealing only with ASCII characters, you can probably get away with a lot, just because many kinds of encoding use the same representation for characters under 128. But this breaks easily, even with little things like smart quotes.
3) Despite the name, defaultCStringEncoding is not something you should use as a default. It's designed for special circumstances where you need to deal with system strings and don't otherwise know how the system deals with its internal strings. It refers to the way strings are handled in the default C implementation, NOT in the NSString internals, so there's not necessarily a performance benefit.
4) If you write a string with an unknown string encoding, and you try to read it back with a different string encoding, your code will fail; in many cases, you will just end up with an empty string.
Bottom line is: who will be trying to interpret your NSData objects? If it's your own app, pick an encoding that makes sense for you (I use UTF8 for everything) and use it for both conversions. Otherwise, figure out what your ecosystem needs to read or write and make that your standard.
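The point that both conversions must use the same encoding can be sketched roughly as follows. RoundTrip is a hypothetical helper introduced only for illustration; it is not part of Foundation.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: write a string out as bytes with one encoding and
// read the bytes back with a (possibly different) encoding.
// Written for ARC.
static NSString *RoundTrip(NSString *text,
                           NSStringEncoding writeEncoding,
                           NSStringEncoding readEncoding) {
    // The encoding decides which bytes end up in the buffer.
    NSData *bytes = [text dataUsingEncoding:writeEncoding];
    // Reading with another encoding reinterprets those same bytes.
    return [[NSString alloc] initWithData:bytes encoding:readEncoding];
}
```

Calling RoundTrip with NSUTF8StringEncoding for both arguments returns the original text, while writing as UTF-8 and reading as NSISOLatin1StringEncoding does not reproduce a string such as "妈".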

Decoding HTML entities on iPhone

I have a list of several locations, some of them containing the letters æ, Æ, ø, Ø, å and Å.
From the web service I'm using, the letters come out as "&oslash ;" "&Aring ;" etc.
When I download the feed from the web service, I use UTF-8 encoding.
How can I decode the occurrences of these characters?
Thanks!
There is no standard way. To keep it simple, write your own custom method (or NSString category) and do this for each entity:
string = [string stringByReplacingOccurrencesOfString:@"&amp;" withString:@"&"];
If your webservice is using utf8 and if you decode the data with [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding], all should be ok.
An NSString category called "GTMNSString+HTML" written by Google works perfectly for me. Check it out here: https://gist.github.com/takuma104/ntlniph/blob/master/gtm/Foundation/GTMNSString+HTML.h & here: https://gist.github.com/takuma104/ntlniph/blob/master/gtm/Foundation/GTMNSString+HTML.m
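As a rough illustration of the "write your own method" approach, the sketch below handles only the six letters from the question plus the ampersand. DecodeNordicEntities is a hypothetical name, and the entity table is deliberately incomplete; a full decoder such as the category linked above covers far more cases.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: replaces a small, hand-picked set of HTML entities.
// It does not handle numeric entities (&#248;) or any names not listed here.
static NSString *DecodeNordicEntities(NSString *input) {
    NSDictionary *entities = @{
        @"&aelig;":  @"\u00e6",  // æ
        @"&AElig;":  @"\u00c6",  // Æ
        @"&oslash;": @"\u00f8",  // ø
        @"&Oslash;": @"\u00d8",  // Ø
        @"&aring;":  @"\u00e5",  // å
        @"&Aring;":  @"\u00c5",  // Å
    };
    NSMutableString *result = [input mutableCopy];
    for (NSString *entity in entities) {
        // NSLiteralSearch keeps the match case-sensitive, so &oslash; and
        // &Oslash; map to different letters.
        [result replaceOccurrencesOfString:entity
                                withString:entities[entity]
                                   options:NSLiteralSearch
                                     range:NSMakeRange(0, [result length])];
    }
    // Decode &amp; last so that "&amp;oslash;" ends up as the literal text
    // "&oslash;" instead of being decoded a second time.
    [result replaceOccurrencesOfString:@"&amp;"
                            withString:@"&"
                               options:NSLiteralSearch
                                 range:NSMakeRange(0, [result length])];
    return result;
}
```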

How to avoid UTF8 characters inside my NSDictionary?

I'm saving an NSString inside an NSArray, and that NSArray inside an NSDictionary. While doing this, I noticed that if my string is something like Hi I'm XYZ, the corresponding UTF character is stored in place of the single quote.
So how can I avoid this, or how can I get my actual text, including the special characters, back from the NSArray or the NSDictionary?
Any help is appreciated.
NSString internally uses Unicode characters. So it easily can handle all sorts of characters from different languages.
You cannot choose the internal encoding of NSString. It's always Unicode. If you have an encoding problem, then you have either created the NSString instance incorrectly or you have output the instance the wrong way.
And there's no such thing as a UTF character.
Please better describe your problem and show the relevant source code.
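The question shows no code, so the following is only a guess at one common cause: logging the whole dictionary prints its description, a debugging format that may show non-ASCII characters as escape sequences, while the stored string itself is unchanged. StoredGreeting is a hypothetical helper, and the curly apostrophe in the usage note is an assumption about the input.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: stores a string in an array inside a dictionary,
// logs the container, and returns the string read back out of it.
static NSString *StoredGreeting(NSString *greeting) {
    NSDictionary *container = @{ @"items": @[ greeting ] };
    // Logging the container prints its description, which is meant for
    // debugging and may escape characters outside ASCII.
    NSLog(@"container: %@", container);
    // Reading the element back gives the original characters.
    NSString *stored = container[@"items"][0];
    NSLog(@"element: %@", stored);
    return stored;
}
```

Passing a string such as "Hi I’m XYZ" (with a typographic apostrophe) and comparing the return value with isEqualToString: is a quick way to confirm that nothing was altered in storage.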

Convert a string containing Arabic characters to const char * (iPhone SDK)

I am working on an app in which I need to convert my string, which contains Arabic letters, into a const char *. I have the following code, but it returns nil.
I tried different encodings such as NSISOLatin1StringEncoding, NSASCIIStringEncoding, etc.
My code is as follows.
My string cmpnyname contains the Arabic characters.
const char *textcmnylogo = [cmpnyname cStringUsingEncoding:NSSymbolStringEncoding];
textcmnylogo comes back nil.
Please let me know the right encoding.
Thanks in advance
Assuming the text is in an NSString, you need to pick an encoding that can represent the Arabic character(s). NSUTF8StringEncoding will handle them, as will the other UTF encodings.
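A small sketch of that advice: check whether the chosen encoding can represent the text before asking for the C string. CStringOrNull is a hypothetical helper; canBeConvertedToEncoding: and cStringUsingEncoding: are existing NSString methods.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns a C string in the requested encoding, or NULL
// when that encoding cannot represent every character of the text.
// The returned pointer is owned by Foundation and is only valid for a limited
// time, so copy the bytes if they must outlive the current scope.
static const char *CStringOrNull(NSString *text, NSStringEncoding encoding) {
    if (![text canBeConvertedToEncoding:encoding]) {
        return NULL;  // e.g. Arabic letters with a Latin-1 encoding
    }
    return [text cStringUsingEncoding:encoding];
}
```

With an Arabic company name, NSUTF8StringEncoding yields a usable C string, whereas NSISOLatin1StringEncoding yields NULL because Latin-1 has no Arabic letters.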

Special Characters from SQLite DB

I read from an SQLite database into my iPhone app. Within the texts there are sometimes special characters like 'xf2' or 'xe0', as I can see in the debugger in the char* data type. When I try to transform the chars into an NSString object by using initWithUTF8String, I get nil back.
How can I transform such special characters?
It looks like an encoding issue. You can get 'xf2' or 'xe0' when the text contains symbols such as © or ®. Those symbols need 2 bytes, and SQLite can treat each byte of the symbol as a separate character.
So try using initWithCString:encoding: instead of initWithUTF8String:
NSString *stringFromDB = [[NSString alloc] initWithCString:charsArrayFromDB
encoding:NSASCIIStringEncoding];
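One caveat with the snippet above: NSASCIIStringEncoding is documented as strict 7-bit ASCII, so bytes such as 0xf2 or 0xe0 may still be rejected. In ISO-8859-1 those two bytes are ò and à, so a possible alternative is to try UTF-8 first and fall back to Latin-1. This is a sketch that assumes the database text is stored as Latin-1, which the question does not confirm; StringFromDatabaseBytes is a hypothetical name.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: decode a NUL-terminated C string from the database.
// ASSUMPTION: text that is not valid UTF-8 is ISO-8859-1.
static NSString *StringFromDatabaseBytes(const char *bytes) {
    if (bytes == NULL) {
        return nil;  // e.g. a NULL column value
    }
    // Valid UTF-8 (including plain ASCII) decodes here.
    NSString *text = [NSString stringWithUTF8String:bytes];
    if (text == nil) {
        // Invalid UTF-8: reinterpret each byte as one Latin-1 character.
        text = [NSString stringWithCString:bytes
                                  encoding:NSISOLatin1StringEncoding];
    }
    return text;
}
```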