NSStream, UTF8String & NSString... Messy Conversion - iphone

I am constructing a data packet to be sent over NSStream to a server. I am trying to separate two pieces of data with a '§' (character code 167). This is the way the server is built, so I need to try to stay within those bounds...
unichar asciiChar = 167; //yields @"§"
[self setSepString:[NSString stringWithCharacters:&asciiChar length:1]];
sendData=[NSString stringWithFormat:@"USER User%@Pass", sepString];
NSLog(sendData);
const uint8_t *rawString=(const uint8_t *)[sendData UTF8String];
[oStream write:rawString maxLength:[sendData length]];
So the final outcome should look like this.. and it does when sendData is first constructed:
USER User§Pass
however, when it is received on the server side, it looks like this:
//not a direct copy and paste. The 'mystery character' may not be exact
USER UserˤPas
...the separator string has become two in length, and the last letter is getting cropped from the command. I believe this to be caused by the UTF8 conversion.
Can anyone shed some light on this for me?
Any help would be greatly appreciated!

The correct encoding in UTF-8 for this character is the two-byte sequence 0xC2 0xA7, which is what you're getting. (Fileformat.info is invaluable for this stuff.) The character comes from the Latin-1 set, so you almost certainly want to use NSISOLatin1StringEncoding rather than NSUTF8StringEncoding in order to get a single-byte 167 encoding. Look at NSString -dataUsingEncoding:.
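The cropped last letter follows from the same cause: -length counts UTF-16 code units (14 for this string), while the UTF-8 form is 15 bytes, so passing [sendData length] as maxLength leaves the final byte unsent. A minimal sketch in plain C (which also compiles in an Objective-C file) makes the two counts visible; the names kPacketUTF8, utf8_byte_count and utf8_char_count are made up for this illustration and are not part of any Apple API.

```c
#include <stddef.h>
#include <string.h>

/* "USER User§Pass" as UTF-8 bytes: '§' (U+00A7) is the pair 0xC2 0xA7.
   The literal is split so the hex escape cannot swallow the next letter. */
const char kPacketUTF8[] = "USER User\xC2\xA7" "Pass";

/* Number of bytes in a NUL-terminated UTF-8 string. */
size_t utf8_byte_count(const char *s) {
    return strlen(s);
}

/* Number of code points: count every byte that is NOT a
   continuation byte (continuation bytes look like 10xxxxxx). */
size_t utf8_char_count(const char *s) {
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

In the question's code, the byte count is what the stream needs; strlen on the UTF8String result, or NSString's -lengthOfBytesUsingEncoding:, would supply it instead of -length.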

What you have and what you want to transmit is not really a UTF-8 string, and it's technically not us-ascii, because that's only 7 bits. You want to transmit an arbitrary array of bytes, according to the protocol that you're working with. The two fields of the byte array, username and password, might themselves be UTF-8 strings, but with the 167 separator it cannot be a UTF-8 string.
Here are some options I see:
Construct the uint8_t* byte array using at least two different NSString objects plus the 167 code. This will be necessary if the username or password can possibly contain non-ascii characters.
Use the NSString method getBytes:maxLength:usedLength:encoding:options:range:remainingRange: and set encoding to NSASCIIStringEncoding. If you do this you must validate elsewhere that your username and password are us-ascii only.
Use the NSString method getCString. However, that's been deprecated because you cannot specify the encoding you want.
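The first option can be sketched in plain C as follows. This is only an illustration under assumptions: build_packet and SEPARATOR_BYTE are hypothetical names, and the two fields are taken to be NUL-terminated byte strings that have already been encoded however the server expects (for example, the result of UTF8String on each NSString).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The protocol's field separator: the single raw byte 167 (0xA7). */
#define SEPARATOR_BYTE 0xA7u

/* Writes user + 0xA7 + pass into out.
   Returns the number of bytes written, or 0 if out is too small. */
size_t build_packet(uint8_t *out, size_t out_cap,
                    const char *user, const char *pass) {
    assert(out != NULL && user != NULL && pass != NULL);

    size_t user_len = strlen(user);
    size_t pass_len = strlen(pass);
    size_t total = user_len + 1 + pass_len;
    if (total > out_cap) {
        return 0;
    }

    memcpy(out, user, user_len);               /* first field, byte for byte */
    out[user_len] = (uint8_t)SEPARATOR_BYTE;   /* one separator byte */
    memcpy(out + user_len + 1, pass, pass_len);/* second field */
    return total;
}
```

The returned count, not a character count, is the value to pass as maxLength when writing the buffer to the output stream.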

Related

NSString UTF8String mangling unicode characters

When I run [NSString UTF8String] on certain unicode characters the resulting const char* representation is mangled both in NSLog and on the device/simulator display. The NSString itself displays fine but I need to convert the NSString to a cStr to use it in CGContextShowTextAtPoint.
It's very easy to reproduce (see code below) but I've searched for similar questions without any luck. Must be something basic I'm missing.
const char *cStr = [@"章" UTF8String];
NSLog(@"%s", cStr);
Thanks!
CGContextShowTextAtPoint is only for ASCII chars.
Check this SO question for answers.
When you use the string format specifier (%s), non-ASCII characters in a C string are not guaranteed to print correctly. A character like the one in your example is encoded in UTF-8 as a multi-byte sequence, but %s interprets the bytes you pass to the formatter (in this case, NSLog) in the system encoding. See Apple's documentation:
https://developer.apple.com/library/mac/documentation/cocoa/Conceptual/Strings/Articles/formatSpecifiers.html
%s
Null-terminated array of 8-bit unsigned characters. %s interprets its input in the system encoding rather than, for example, UTF-8.
As for CGContextShowTextAtPoint not working: that API supports only the MacRoman character set, which covers only a small part of Unicode.
You'll need to look into another API for showing Unicode characters. Core Text is probably where you'll want to start.
I've never noticed this issue before, but some quick experimentation shows that using printf instead of NSLog will cause the correct Unicode character to show up.
Try:
printf("%s", cStr);
This gives me the desired output ("章") both in the Xcode console and in Terminal. As nob1984 stated in his answer, the interpretation of the character data is up to the callee.
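To see which bytes are actually handed to %s, here is a small UTF-8 encoder sketch in plain C. The function name utf8_encode is hypothetical; the bit layout follows the standard UTF-8 scheme. For 章 (U+7AE0) it produces the three bytes E7 AB A0, which printf passes through unchanged and which a callee assuming a single-byte encoding would show as three unrelated characters.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes one Unicode code point as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written, or 0 for an invalid code point. */
size_t utf8_encode(uint32_t cp, uint8_t out[4]) {
    if (cp < 0x80) {                       /* 1 byte: plain ASCII */
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 3 bytes: 1110xxxx + 2 continuation */
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return 0;                      /* surrogates are not code points */
        }
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes: 11110xxx + 3 continuation */
        out[0] = (uint8_t)(0xF0 | (cp >> 18));
        out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (uint8_t)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* beyond the Unicode range */
}
```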

how to get the UTF8 binary value of a NSString

I want to send a request to a server I am working with. It requires the binary value of a UTF-8 formatted NSString, if there is such a thing; I have read that NSStrings are Unicode formatted...
Basically, the idea is to send the value of the NSString to the server without the added 3-byte header that UTF8 applies to the front of a string. The server knows I will be sending it a UTF8 formatted string in binary format, so to avoid unnecessary formatting values that could bloat my requests I would like to try and do it this way.
Does anyone have any ideas on how I might achieve this? I'm currently reading up about NSStrings in the Apple docs, but there is so much to read and process that I'm hoping someone can provide me some insight.
I don't have any code to show for this at the moment because I'm only in the planning and understanding phase, and to move forward I need to understand how this might be done so I can start coding it :)
Any help would be greatly appreciated :)
NSString has a UTF8String method. It returns chars and chars are bytes. Does that work?
-(const char *)UTF8String
Return Value:
A null-terminated UTF8 representation of the receiver.
more info on it here.
https://developer.apple.com/library/mac/#documentation/Cocoa/Reference/Foundation/Classes/NSString_Class/Reference/NSString.html
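The "3 byte header" in the question is presumably the UTF-8 byte order mark, the bytes EF BB BF; that reading is an assumption, since the question does not name it. If a buffer from some other source might carry one, a check along these lines (plain C, with the hypothetical name utf8_bom_length) tells you how many leading bytes to skip before sending.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 3 if buf begins with the UTF-8 byte order mark (EF BB BF),
   otherwise 0. The result is the offset of the first payload byte. */
size_t utf8_bom_length(const uint8_t *buf, size_t len) {
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        return 3;
    }
    return 0;
}
```

A caller would send buf + utf8_bom_length(buf, len) with the length reduced by the same amount.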

NSString to NSData encoding considerations

I understand why when going from NSData to NSString you need to specify encoding.
However I'm finding it frustrating how the reverse (NSString to NSData) needs to have an encoding specified.
In this related question the answers suggested using NSUTF8StringEncoding or defaultCStringEncoding, with the latter not being fully explained.
So I just wanted to ask IF the following is correct when converting NSString to NSData:
In cases where you want to be 100% sure the binary representation of the NSString object is UTF8 then use NSUTF8StringEncoding (or whatever encoding is needed)
In cases where the encoding of the NSString object is known/expected to already be of a certain type and no conversion is required then it's safe (perhaps internally faster) to use defaultCStringEncoding (from what I have read objective-c uses UTF-16 internally, not sure if LE or BE but I'd assume LE because the platform is LE)
TIA
The encoding needs to be specified for converting NSString to NSData for the same reason it needs to be specified going from NSData to NSString.
An NSData object is a wrapper for a string of absolutely raw bytes. If the NSString doesn't specify some encoding, it doesn't know what to write, because at the level of ones and zeroes, a UTF-16 encoding looks different from a UTF-8 encoding of the same letter, and of course, if you write UTF-16 as big-endian and read it as little-endian you will get gibberish.
In other words, don't think of it as converting or escaping a string; it's generating a byte buffer, and the encoding tells it which ones and zeroes to write when the next character is "a" and which ones to write when it means "妈".
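A small sketch in plain C shows the point about byte order. The function names are hypothetical, and only characters in the Basic Multilingual Plane (one UTF-16 code unit) are handled. "妈" is U+5988: written big-endian it is the bytes 59 88, written little-endian it is 88 59, and reading with the wrong byte order yields a different character entirely.

```c
#include <stdint.h>

/* Stores one UTF-16 code unit as two bytes in the requested byte order. */
void utf16_write_unit(uint16_t unit, int big_endian, uint8_t out[2]) {
    uint8_t hi = (uint8_t)(unit >> 8);
    uint8_t lo = (uint8_t)(unit & 0xFF);
    out[0] = big_endian ? hi : lo;
    out[1] = big_endian ? lo : hi;
}

/* Reads two bytes back as a UTF-16 code unit, trusting the caller's
   claim about the byte order. A wrong claim gives a wrong character. */
uint16_t utf16_read_unit(const uint8_t in[2], int big_endian) {
    if (big_endian) {
        return (uint16_t)(((uint16_t)in[0] << 8) | in[1]);
    }
    return (uint16_t)(((uint16_t)in[1] << 8) | in[0]);
}
```

The same letter also has a third representation in UTF-8 (E5 A6 88 for "妈"), which is why a byte buffer is meaningless without the encoding that produced it.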
As for your question...here's my two cents.
1) If you are converting an NSString to an NSData so that your same program can convert it back later, and no other software will need to deal with that NSData until after you've read it back into an NSString, then none of this matters. All that matters is that your string-to-data encoding and your data-to-string encoding match.
2) If you are dealing only with ASCII characters, you can probably get away with a lot, just because many kinds of encoding use the same representation for characters under 128. But this breaks easily, even with little things like smart quotes.
3) Despite the name, defaultCStringEncoding is not something you should use as a default. It's designed for special circumstances where you need to deal with system strings and don't otherwise know how the system deals with its internal strings. It refers to the way strings are handled in the default C implementation, NOT in the NSString internals, so there's not necessarily a performance benefit.
4) If you write a string with an unknown string encoding, and you try to read it back with a different string encoding, your code will fail; in many cases, you will just end up with an empty string.
Bottom line is: who will be trying to interpret your NSData objects? If it's your own app, pick an encoding that makes sense for you (I use UTF8 for everything) and use it for both conversions. Otherwise, figure out what your ecosystem needs to read or write and make that your standard.

Change data encoding

I get some data from the server in Unicode. However I need this data in UTF8. How can I convert data to UTF8 encoding?
The ideal solution is that the server sends you UTF-8 in the first place.
UTF-8 is an encoding of Unicode, so depending on what you mean by “Unicode” in your question, it may already be doing that.
Cocoa misuses “Unicode” in the symbol NSUnicodeStringEncoding to refer to UTF-16. It's possible, but unlikely, that that's what the server is sending you.
The server should tell you in the Content-Type header what encoding it used for the content. You should look at that in your program rather than assuming the server will use any specific encoding.
If the encoding is not specified in the header, try treating it as UTF-8, and if that doesn't work, I suggest complaining to whoever runs the server.
To convert from any encoding supported by Cocoa to UTF-8, pass the input data and the encoding it's in to the -[NSString initWithData:encoding:] method, which will decode the data and produce a string; then, send the string a dataUsingEncoding: message with NSUTF8StringEncoding as the desired encoding.
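The same decode-then-encode idea can be sketched without Cocoa. The following plain C function is not the NSString method described above but a hand-written stand-in for one specific case, UTF-16 little-endian input, and its name utf16le_to_utf8 is hypothetical. It decodes each code point (including surrogate pairs) and re-encodes it as UTF-8.

```c
#include <stddef.h>
#include <stdint.h>

/* Transcodes UTF-16LE bytes to UTF-8.
   Returns the number of bytes written to out, or 0 if the input is
   malformed or out is too small (empty input also returns 0). */
size_t utf16le_to_utf8(const uint8_t *in, size_t in_len,
                       uint8_t *out, size_t out_cap) {
    size_t i = 0, o = 0;
    if (in_len % 2 != 0) {
        return 0;                               /* odd byte count */
    }
    while (i < in_len) {
        /* Step 1: decode one code point from UTF-16LE. */
        uint32_t cp = (uint32_t)in[i] | ((uint32_t)in[i + 1] << 8);
        i += 2;
        if (cp >= 0xD800 && cp <= 0xDBFF) {     /* high surrogate */
            if (i + 2 > in_len) {
                return 0;                       /* missing low surrogate */
            }
            uint32_t low = (uint32_t)in[i] | ((uint32_t)in[i + 1] << 8);
            if (low < 0xDC00 || low > 0xDFFF) {
                return 0;
            }
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
            i += 2;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return 0;                           /* stray low surrogate */
        }

        /* Step 2: encode the code point as UTF-8. */
        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (o + need > out_cap) {
            return 0;
        }
        if (need == 1) {
            out[o++] = (uint8_t)cp;
        } else if (need == 2) {
            out[o++] = (uint8_t)(0xC0 | (cp >> 6));
            out[o++] = (uint8_t)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            out[o++] = (uint8_t)(0xE0 | (cp >> 12));
            out[o++] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (uint8_t)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (uint8_t)(0xF0 | (cp >> 18));
            out[o++] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (uint8_t)(0x80 | (cp & 0x3F));
        }
    }
    return o;
}
```

In a Cocoa program the two-method approach described above is the simpler route, because it handles every encoding the framework supports rather than just this one.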
Well UTF-8 is an encoding for Unicode, but to get a string:
NSString *string = [[NSString alloc] initWithData:yourData encoding:NSUTF8StringEncoding];

What encoding to use with raw bytes in a NSString

I need to store some raw bytes from NSData object into an NSString (basically a null encoding) but I am not sure how to do this. Obviously assigning an improper 8-bit encoding would be bad. NSASCIIStringEncoding is not OK because the docs say "Strict 7-bit ASCII encoding within 8-bit chars; ASCII values 0…127 only." but I need full range of 0x0 - 0xFF.
Base64 encoding is NOT an acceptable solution.
Basically, you don't.
An NSString is for strings of validly encoded string data; typically UTF8 or UTF16. NSData is for arbitrary binary data.
If you want to store raw bytes into an NSString, you need to encode them and base64 is one of the most common means of doing so.
Use NSNEXTSTEPStringEncoding. According to the documentation:
8-bit ASCII encoding with NEXTSTEP extensions.
It appears in the current documentation (as of writing this post) and is available in both Apple and GNU's implementation of the (OPENSTEP) standard.
Caveat: It doesn't state what exactly those "extensions" are, so tread lightly.