Simple task: I need to convert two characters to two numbers, add them together and change the result back to a character.
What I have got (this works perfectly in Java, where encoding is handled for you, I guess):
int myChar1 = (int)([myText1 characterAtIndex:i]);
int myChar2 = (int)([myText2 characterAtIndex:keyCurrent]);
int newChar = (myChar1 + myChar2);
//NSLog(@"Int's %d, %d, %d", textChar, keyChar, newChar);
char newC = ((char) newChar);
NSString *tmp1 = [NSString stringWithFormat:@"%c", newC];
NSString *tmp2 = [NSString stringWithFormat:@"%@", newString];
newString = [NSString stringWithFormat:@"%@%@", tmp2, tmp1]; //Adding these char's in a string
The algorithm is fine, but I can't figure out how to handle the encoding. I would like to do everything in UTF-8, but I have no idea how to get a char's UTF-8 value, for instance, or, once I have it, how to change that value back to a char.
The NSLog in the code outputs the correct values. But when I try to reverse the algorithm (i.e. subtract the values), it goes wrong: I get the wrong character for unusual characters.
NSString works with unichar characters, which are 2 bytes (16 bits) long. A char is one byte long, so it can only store code points from U+0000 to U+00FF (i.e. Basic Latin and Latin-1 Supplement).
You should do your math on unichar values, then use +[NSString stringWithCharacters:length:] to create the string representation.
But there is still an issue with that solution. Your code may generate values between U+D800 and U+DFFF, which aren't valid Unicode characters on their own. The standard reserves them as surrogates, used in pairs of 16-bit code units to encode code points from U+10000 to U+10FFFF in UTF-16. In such a case, your string would be ill-formed and could neither be displayed nor converted to UTF-8.
Also, the temporary variable tmp2 is useless, and rather than creating a new newString on every concatenation you should append to an NSMutableString.
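The arithmetic described above can be sketched in plain C, which compiles unchanged inside an Objective-C file. This is only an illustration: uint16_t stands in for unichar, and the helper names add_units, sub_units and is_surrogate are made up for this example, not part of any Apple API.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the 16-bit value is a UTF-16 surrogate (U+D800..U+DFFF),
   i.e. not a valid character on its own. */
static bool is_surrogate(uint16_t unit) {
    return unit >= 0xD800 && unit <= 0xDFFF;
}

/* Add two 16-bit code units, wrapping modulo 65536. */
static uint16_t add_units(uint16_t text, uint16_t key) {
    return (uint16_t)(text + key);
}

/* Inverse of add_units: recover the original text unit from the sum. */
static uint16_t sub_units(uint16_t sum, uint16_t key) {
    return (uint16_t)(sum - key);
}
```

Because both helpers keep all 16 bits, subtracting the key always restores the original unit, which a cast to char cannot guarantee. A caller would still need to decide what to do when is_surrogate reports that a sum landed in the reserved range.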
I am assuming that your strings are NSStrings consisting of numerals which represent a number. If that is the case, you could try the following:
Include the following headers:
#include <inttypes.h>
#include <stdlib.h>
#include <stdio.h>
Then use the following code:
// convert NSString to UTF8 string
const char * utf8String1 = [myText1 UTF8String];
const char * utf8String2 = [myText2 UTF8String];
// convert UTF8 string into long integers
long num1 = strtol(utf8String1, NULL, 0);
long num2 = strtol(utf8String2, NULL, 0);
// perform calculations
long calc = num1 - num2;
// convert calculated value back into NSString
NSString * calcText = [[NSString alloc] initWithFormat:@"%li", calc];
// convert calculated value back into UTF8 string
char calcUTF8[64];
snprintf(calcUTF8, 64, "%li", calc);
// log results
NSLog(@"calcText: %@", calcText);
NSLog(#"calcUTF8: %s", calcUTF8);
Not sure if this is what you meant, but from what I understood, you want to create an NSString with the UTF-8 string encoding from a char?
If that's what you want, maybe you can use the initWithCString:encoding: method of NSString.
How do I append Unicode characters in the range U+0000 to U+0099 to an NSString on iOS? I have used the following link for reference: http://en.wikipedia.org/wiki/List_of_Unicode_characters
Try to use this one....
NSString uses UTF-16 to store codepoints internally, so those in the range you're looking for (U+1F300 to U+1F6FF) will be stored as a surrogate pair (four bytes). Despite its name, characterAtIndex: (and unichar) doesn't know about codepoints and will give you the two bytes that it sees at the index you give it (the 55357 you're seeing is the lead surrogate of the codepoint in UTF-16).
To examine the raw codepoints, you'll want to convert the string/characters into UTF-32 (which encodes them directly). To do this, you have a few options:
1) Get all UTF-16 code units that make up the codepoint, and use either this algorithm or CFStringGetLongCharacterForSurrogatePair to convert the surrogate pairs to UTF-32.
2) Use either dataUsingEncoding: or getBytes:maxLength:usedLength:encoding:options:range:remainingRange: to convert the NSString to UTF-32, and interpret the raw bytes as a uint32_t.
3) Use a library like ICU.
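For option 1, the surrogate pair arithmetic is short enough to write by hand. The following is a minimal sketch in plain C (usable from Objective-C), with uint16_t standing in for unichar. The function name decode_surrogate_pair and the choice to return U+FFFD for an invalid pair are assumptions made for this example.

```c
#include <stdint.h>

/* Combine a lead surrogate (U+D800..U+DBFF) and a trail surrogate
   (U+DC00..U+DFFF) into one code point in U+10000..U+10FFFF.
   Returns U+FFFD (the replacement character) if the pair is invalid. */
static uint32_t decode_surrogate_pair(uint16_t lead, uint16_t trail) {
    if (lead < 0xD800 || lead > 0xDBFF || trail < 0xDC00 || trail > 0xDFFF) {
        return 0xFFFD;
    }
    return 0x10000u
         + (((uint32_t)lead - 0xD800u) << 10)
         + ((uint32_t)trail - 0xDC00u);
}
```

With the lead surrogate 55357 (0xD83D) mentioned above and a trail surrogate of 0xDE00, this yields U+1F600.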
I'm not sure this is 100% correct solution, but it works:
NSString *uniString = [NSString stringWithFormat:@"%C", (unichar)0x0021];
Where 0x0021 is your unicode char code.
You can test it with this loop:
for (unichar ch = 0x0000; ch <= 0x0099; ch++) {
    NSString *uniString = [NSString stringWithFormat:@"%C", ch];
    NSLog(@"%@", uniString);
}
I'm working with a database that includes hex codes for UTF32 characters. I would like to take these characters and store them in an NSString. I need to have routines to convert in both ways.
To convert the first character of an NSString to a unicode value, this routine seems to work:
const unsigned char *cs = (const unsigned char *)
[s cStringUsingEncoding:NSUTF32StringEncoding];
uint32_t code = 0;
for ( int i = 3 ; i >= 0 ; i-- ) {
    code <<= 8;
    code += cs[i];
}
return code;
However, I am unable to do the reverse (i.e. take a single code and convert it into an NSString). I thought I could just do the reverse of what I do above by simply creating a c-string with the UTF32 character in it with the bytes in the correct order, and then create an NSString from that using the correct encoding.
However, converting to / from cstrings does not seem to be reversible for me.
For example, I've tried this code, and the "tmp" string is not equal to the original string "s".
char *cs = [s cStringUsingEncoding:NSUTF32StringEncoding];
NSString *tmp = [NSString stringWithCString:cs encoding:NSUTF32StringEncoding];
Does anyone know what I am doing wrong? Should I be using "wchar_t" for the cstring instead of char *?
Any help is greatly appreciated!
Thanks,
Ron
You have a couple of reasonable options.
1. Conversion
The first is to convert your UTF32 to UTF16 and use those with NSString, as UTF16 is the "native" encoding of NSString. It's not actually all that hard. If the UTF32 character is in the BMP (i.e. its high two bytes are 0), you can just cast it to unichar directly. If it's in any other plane, you can convert it to a surrogate pair of UTF16 characters. You can find the rules on the Wikipedia page. But a quick (untested) conversion would look like this:
UTF32Char inputChar = // my UTF-32 character
inputChar -= 0x10000;
unichar highSurrogate = inputChar >> 10; // leave the top 10 bits
highSurrogate += 0xD800;
unichar lowSurrogate = inputChar & 0x3FF; // leave the low 10 bits
lowSurrogate += 0xDC00;
Now you can create an NSString using both characters at the same time:
NSString *str = [NSString stringWithCharacters:(unichar[]){highSurrogate, lowSurrogate} length:2];
To go backwards, you can use -[NSString getCharacters:range:] to get the unichars back and then reverse the surrogate pair algorithm to get your UTF32 character back (any characters which aren't in the range 0xD800-0xDFFF should just be cast to UTF32 directly).
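The conversion rules above, including the BMP case where no surrogate pair is needed, can be put into one helper. This is a sketch in plain C rather than a tested drop-in: uint16_t stands in for unichar, uint32_t for UTF32Char, and the name encode_utf16 and its return convention are assumptions for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Write code point cp as UTF-16 into out, which must have room for 2 units.
   Returns the number of units written (1 for the BMP, 2 for other planes),
   or 0 if cp is a surrogate value or lies above U+10FFFF. */
static size_t encode_utf16(uint32_t cp, uint16_t out[2]) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return 0;                                  /* not encodable */
    }
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                     /* BMP: plain cast */
        return 1;
    }
    cp -= 0x10000;                                 /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));      /* top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));    /* low 10 bits */
    return 2;
}
```

The returned count can be passed straight to stringWithCharacters:length: as the length argument, so the same call works for both BMP and non-BMP input.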
2. Byte buffers
Your other option is to let NSString do the conversion directly without using cStrings. To convert a UTF32 value into an NSString you can use something like the following:
UTF32Char inputChar = // input UTF32 value
inputChar = NSSwapHostIntToLittle(inputChar); // swap to little-endian if necessary
NSString *str = [[[NSString alloc] initWithBytes:&inputChar length:4 encoding:NSUTF32LittleEndianStringEncoding] autorelease];
To get it back out again, you can use
UTF32Char outputChar;
if ([str getBytes:&outputChar maxLength:4 usedLength:NULL encoding:NSUTF32LittleEndianStringEncoding options:0 range:NSMakeRange(0, 1) remainingRange:NULL]) {
    outputChar = NSSwapLittleIntToHost(outputChar); // swap back to host endian
    // outputChar now has the first UTF32 character
}
There are two problems here:
1:
The first one is that both [NSString cStringUsingEncoding:] and [NSString getCString:maxLength:encoding:] return the C-string in native-endianness (little) without adding a BOM to it when using NSUTF32StringEncoding and NSUTF16StringEncoding.
The Unicode standard states that: (see, "How I should deal with BOMs")
"If there is no BOM, the text should be interpreted as big-endian."
This is also stated in NSString's documentation: (see, "Interpreting UTF-16-Encoded Data")
"... if the byte order is not otherwise specified, NSString assumes that the UTF-16 characters are big-endian, unless there is a BOM (byte-order mark), in which case the BOM dictates the byte order."
Although they're referring to UTF-16, the same applies to UTF-32.
2:
The second one is that [NSString stringWithCString:encoding:] internally uses CFStringCreateWithCString to create the string from the C-string. The problem with this is that CFStringCreateWithCString only accepts strings using 8-bit encodings. From the documentation: (see, "Parameters" section)
The string must use an 8-bit encoding.
To solve this issue:
Explicitly state the encoding endianness you want to use both ways (NSString -> C-string and C-string -> NSString)
Use [NSString initWithBytes:length:encoding:] when trying to create an NSString from a C-string encoded in UTF-32 or UTF-16.
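To make the byte order explicit without relying on the host's endianness, the four bytes can also be written and read by hand. The sketch below is plain C (usable from Objective-C). The names put_utf32le and get_utf32le are made up for this example, and the resulting buffer is what one would pass to initWithBytes:length:encoding: together with NSUTF32LittleEndianStringEncoding.

```c
#include <stdint.h>

/* Store cp as four UTF-32 little-endian bytes, whatever the host byte order. */
static void put_utf32le(uint32_t cp, unsigned char out[4]) {
    out[0] = (unsigned char)(cp & 0xFF);           /* least significant byte first */
    out[1] = (unsigned char)((cp >> 8) & 0xFF);
    out[2] = (unsigned char)((cp >> 16) & 0xFF);
    out[3] = (unsigned char)((cp >> 24) & 0xFF);
}

/* Read four UTF-32 little-endian bytes back into a code point. */
static uint32_t get_utf32le(const unsigned char in[4]) {
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}
```

Shifting instead of casting a pointer means no swap call is needed, and the same code gives the same bytes on big-endian and little-endian machines.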
Hope this helps!
I am used to doing this in C or C++, ie:
myChar++;
should increment a letter.
I am trying to do the same in Objective-C, except that I have an NSString to start off with (the NSString is always just one letter). I have tried converting the NSString to a char *, but this method is deprecated and other ways of achieving this don't seem to work.
How should I convert an NSString to a char * - or, is there a way to increment a character in objective-c without needing a char * somehow?
Thanks :)
// Get the first character as a UTF-16 (2-byte) character:
unichar c = [string characterAtIndex:0];
// Increment as usual:
c++;
// And to turn it into a 1-character string again:
[NSString stringWithCharacters:&c length:1];
Of course, this assumes incrementing a Unicode character makes sense, which it does for ASCII-range characters but probably not for others.
How about NSString's
- (unichar)characterAtIndex:(NSUInteger)index;
Would that work?
I have a string like @"K_h_10_K_d_10_K_c_13_T_c_13_T_s_13".
I separate it at each @"_"
using appCardString=[substringAppCard componentsSeparatedByString:@"_"];
Then I have to convert the pieces to char and put them in a char[].
How can I do that?
Please help.
It's crashing here
appusedFaces[i]=[[NSString stringWithFormat:@"%@",[appCardString objectAtIndex:i]] charValue];
This will work:
appusedFaces[i]=[[appCardString objectAtIndex:i] characterAtIndex:0];
Though you should add a check that the string has at least one character. You should also be aware that char can only hold character codes up to 255 (unichar can hold any 16-bit UTF-16 code unit).
It also looks like you have some numeric codes in your test string. Checking if the string has more than one character and then calling [[appCardString objectAtIndex:i] intValue] for those characters will handle these.
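For comparison, the same split-and-convert step can be done on a C string. The sketch below assumes the tokens always come in groups of three (a one-character face, a one-character suit, then a number), which matches the sample string but is not stated in the question. The Card struct and the parse_cards name are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for one card in the sample string. */
typedef struct {
    char face;   /* e.g. 'K' or 'T' */
    char suit;   /* e.g. 'h', 'd', 'c', 's' */
    int value;   /* e.g. 10 or 13 */
} Card;

/* Parse "K_h_10_K_d_10..." into cards. The text buffer is modified by
   strtok. Returns the number of complete cards stored in cards. */
static size_t parse_cards(char *text, Card *cards, size_t max_cards) {
    size_t count = 0;
    char *face = strtok(text, "_");
    while (face != NULL && count < max_cards) {
        char *suit = strtok(NULL, "_");
        char *value = strtok(NULL, "_");
        if (suit == NULL || value == NULL) {
            break;                       /* incomplete group: stop */
        }
        cards[count].face = face[0];     /* strtok never returns an empty token */
        cards[count].suit = suit[0];
        cards[count].value = (int)strtol(value, NULL, 10);
        count++;
        face = strtok(NULL, "_");
    }
    return count;
}
```

Reading the numeric tokens with strtol avoids squeezing "10" or "13" into a single char, which is the case the intValue suggestion above is meant to cover.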
NSString *theString = @"a %C3%B8 b";
NSLog(@"%@", theString);
NSString *utf8string = [theString stringByReplacingPercentEscapesUsingEncoding: NSUTF8StringEncoding];
NSLog(@"%@", utf8string);
const char *theChar = [utf8string UTF8String];
NSLog(@"%s", theChar);
This logs the following:
'a %C3%B8 b'
'a ø b'
'a √∏ b'
The problem is that I want theChar to be 'a ø b'. Any help on how to achieve that would be greatly appreciated.
I don't think you can. char is an eight-bit type, so all values are between 0 and 255. In UTF-8, ø does not fit in a single char: it is encoded as the two bytes 0xC3 0xB8.
You might want to look at the unichar type, which is a 16-bit type. It can hold ø as one item, and you can use getCharacters:range: to get the characters out of the NSString.
From String Format Specifiers in String Programming Guide:
%s : Null-terminated array of 8-bit
unsigned characters. %s interprets its
input in the system encoding rather
than, for example, UTF-8.
So NSLog(@"%s", theChar) creates and displays an NSString object with the wrong encoding, while theChar itself contains the correct string data.
NSLog(@"%@", [NSString stringWithUTF8String:theChar]);
Gives the correct output. (a ø b)
I'd like to add that your theChar does contain the UTF-8 byte sequence of your desired string. The problem is that NSLog(@"%s") can't show the string correctly in the log file and/or the console.
So, if you want to pass the UTF8 byte sequence in char* to some other library, what you did is perfectly correct.
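To see why ø takes two char values, here is a minimal sketch of UTF-8 encoding in plain C (usable from Objective-C). The function name encode_utf8 is made up for this example. For U+00F8 it produces the bytes 0xC3 0xB8, the same bytes as the %C3%B8 escape in the question. The "√∏" in the log is consistent with those two bytes being read in a legacy 8-bit encoding such as Mac OS Roman, although the thread does not say which system encoding was in use.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode code point cp as UTF-8 into out, which must have room for 4 bytes.
   Returns the number of bytes written, or 0 if cp is a surrogate value
   or lies above U+10FFFF. */
static size_t encode_utf8(uint32_t cp, unsigned char out[4]) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return 0;
    }
    if (cp < 0x80) {                      /* 1 byte: ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 3 bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));   /* 4 bytes */
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```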