How to encode a numeric character code on iPhone

I have some numeric character codes and I want to encode them into a string. How can I do that? I have tried NSASCIIStringEncoding and NSUTF8StringEncoding, but neither produces the string I expect. Please help me out.
E.g.:
&#304; -> İ
&#305; -> ı
Thanks!

What you have are Unicode code points, not strings. You don't need to specify a string encoding, because what you are dealing with aren't strings at all; they're just single characters. And an NSString does not have an "encoding" in this sense.
To get those characters into a string, you need to use:
[NSString stringWithCharacters:chars length:len];
For example: you don't want to be creating a string with the contents "304"; that's just a string of digits. Instead, create a unichar with the value 304:
unichar iWithDot = 304;
unichar is just an unsigned short, so no pointer and no quotes; you are simply assigning the code point as a numeric value. Bundle all of the characters you need into a C array of unichar and pass the pointer and the count to stringWithCharacters:length:.

Related

How can I convert a single Character type to uppercase?

All I want to do is convert a single Character to uppercase without the overhead of converting to a String and then calling .uppercased(). Is there any built-in way to do this, or a way for me to call the toupper() function from C without any bridging? I really don't think I should have to go out of my way for something so simple.
To call the C toupper() you need to get the Unicode code point of the Character. But Character has no method for getting its code point (a Character may consist of multiple code points), so you have to convert the Character into a String to obtain any of its code points.
So you really have to convert to String to get anywhere. Unless you store the character as a UnicodeScalar instead of a Character. In this case you can do this:
assert(unicodeScalar.isASCII) // toupper's argument must be representable as an unsigned char
let uppercase = UnicodeScalar(UInt8(toupper(CInt(unicodeScalar.value))))
But this isn't really more readable than simply using String:
let uppercase = Character(String(character).uppercased())
Just add this to your program:
extension Character {
    /// Converts a character to uppercase.
    /// Note: this traps if uppercasing yields more than one character (e.g. "ß" -> "SS").
    func convertToUpperCase() -> Character {
        if self.isUppercase {
            return self
        }
        return Character(self.uppercased())
    }
}
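For example, a quick usage sketch:
let c: Character = "a"
print(c.convertToUpperCase()) // A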

Decode a string with both Unicode and UTF-8 codes in Python 2.x

Say we have a string:
s = '\xe5\xaf\x92\xe5\x81\x87\\u2014\\u2014\xe5\x8e\xa6\xe9\x97\xa8'
Somehow the two em-dash symbols '—', whose Unicode is \u2014, were not correctly encoded as '\xe2\x80\x94' in UTF-8 but were left as literal escape sequences. Is there an easy way to decode this string? It should decode to 寒假——厦门.
Manually using the replace function is OK:
t = u'\u2014'
s = s.replace('\u2014', t.encode('utf-8'))
print s
However, this is not automatic. If we extract the escape sequence,
index = s.find('\u')
t = s[index : index+6]
then t == '\\u2014'. How do I convert it to its UTF-8 encoding?
You're missing extra slashes in your replace()
It should be:
s.replace("\\u2014", u'\u2014'.encode("utf-8") )
Check my warning in the comments of the question. You should not end up in this situation.
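For the automatic case, here is a short Python 2 sketch (assuming every stray escape has the literal form \uXXXX) that substitutes each escape with the UTF-8 encoding of its code point:
import re

def fix_escapes(s):
    # Replace each literal \uXXXX escape with the UTF-8 bytes of its code point.
    return re.sub(r'\\u([0-9a-fA-F]{4})',
                  lambda m: unichr(int(m.group(1), 16)).encode('utf-8'),
                  s)

s = '\xe5\xaf\x92\xe5\x81\x87\\u2014\\u2014\xe5\x8e\xa6\xe9\x97\xa8'
print fix_escapes(s)  # 寒假——厦门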

Confusion with case used by CFURLCreateStringByAddingPercentEscapes encoding

I want URL encoding to be done. My input string is "ChBdgzQ3qUpNRBEHB+bOXQNjRTQ="
I get the output "ChBdgzQ3qUpNRBEHB%2BbOXQNjRTQ%3D", which is totally correct except for the case of the encoded characters.
Ideally, it would have been "ChBdgzQ3qUpNRBEHB%2bbOXQNjRTQ%3d" instead of the output I get,
i.e. I should have gotten %2b and %3d instead of %2B and %3D.
Can this be done?
The code I used is below:
NSStringEncoding encoding = NSUTF8StringEncoding; // assumed; the original post does not show this variable
NSString *inputStr = @"ChBdgzQ3qUpNRBEHB+bOXQNjRTQ=";
NSString *outputStr = (NSString *)CFURLCreateStringByAddingPercentEscapes(NULL,
                          (CFStringRef)inputStr,
                          NULL,
                          (CFStringRef)@"!*'\"();:@&=+$,/?%#[]% ",
                          CFStringConvertNSStringEncodingToEncoding(encoding));
Another, perhaps more elegant but slower, way would be to loop over your string and convert each character one at a time: get the length of the string, take a one-character substring at each index, and percent-escape just that substring. If the returned string has a length greater than 1, CFURLCreateStringByAddingPercentEscapes encoded the character, and you can safely lowercase the result.
In either case you append each returned (and possibly modified) string to a mutable string, and when done you have exactly what you want for any possible input. Even though this would appear to be a real processor hog, in reality you would probably never notice the extra consumed cycles.
Likewise, a second approach would be to convert your whole string first, then copy it character by character to a mutable string, and whenever you find a '%', lowercase the next two characters; see the sketch below. It is just a slightly different way to slice the problem.
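A minimal sketch of that second approach (outputStr is the already-escaped string from the question; the loop itself is illustrative, not from the original answer):
NSMutableString *result = [NSMutableString stringWithCapacity:outputStr.length];
NSUInteger i = 0;
while (i < outputStr.length) {
    unichar c = [outputStr characterAtIndex:i];
    if (c == '%' && i + 2 < outputStr.length) {
        // Lowercase the two hex digits that follow the percent sign.
        NSString *hex = [outputStr substringWithRange:NSMakeRange(i + 1, 2)];
        [result appendFormat:@"%%%@", [hex lowercaseString]];
        i += 3;
    } else {
        [result appendFormat:@"%C", c];
        i += 1;
    }
}
// result now holds the escaped string with lowercase hex digits.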
You can use a regular expression to perform the post-processing:
NSMutableString *finalStr = [outputStr mutableCopy];
NSRegularExpression *re = [[NSRegularExpression alloc] initWithPattern:@"(?<=%)[0-9A-F]{2}" options:0 error:nil];
for (NSTextCheckingResult *match in [re matchesInString:outputStr options:0 range:NSMakeRange(0, outputStr.length)]) {
    [finalStr replaceCharactersInRange:match.range withString:[[outputStr substringWithRange:match.range] lowercaseString]];
}
The code uses this regular expression:
(?<=%)[0-9A-F]{2}
It matches two hexadecimal characters, but only if they are preceded by a percent sign. Each match is then iterated and replaced within the mutable string. We don't have to worry about offsets shifting, because the replacement string is always the same length as the match.

Converting an NSString to and from UTF32

I'm working with a database that includes hex codes for UTF32 characters. I would like to take these characters and store them in an NSString. I need to have routines to convert in both ways.
To convert the first character of an NSString to a unicode value, this routine seems to work:
const unsigned char *cs = (const unsigned char *)
    [s cStringUsingEncoding:NSUTF32StringEncoding];
uint32_t code = 0;
for (int i = 3; i >= 0; i--) {
    code <<= 8;
    code += cs[i];
}
return code;
However, I am unable to do the reverse (i.e. take a single code and convert it into an NSString). I thought I could just do the reverse of the above by creating a C string containing the UTF32 character with its bytes in the correct order, and then creating an NSString from that using the corresponding encoding.
However, converting to and from C strings does not seem to be reversible for me.
For example, I've tried this code, and the "tmp" string is not equal to the original string "s".
const char *cs = [s cStringUsingEncoding:NSUTF32StringEncoding];
NSString *tmp = [NSString stringWithCString:cs encoding:NSUTF32StringEncoding];
Does anyone know what I am doing wrong? Should I be using "wchar_t" for the cstring instead of char *?
Any help is greatly appreciated!
Thanks,
Ron
You have a couple of reasonable options.
1. Conversion
The first is to convert your UTF32 to UTF16 and use those with NSString, as UTF16 is the "native" encoding of NSString. It's actually not all that hard. If the UTF32 character is in the BMP (i.e. its high two bytes are zero), you can just cast it to unichar directly. If it's in any other plane, you can convert it to a surrogate pair of UTF16 code units. You can find the rules on the Wikipedia page. But a quick (untested) conversion would look like:
UTF32Char inputChar = /* my UTF-32 character */;
inputChar -= 0x10000;
unichar highSurrogate = inputChar >> 10; // keep the top 10 bits
highSurrogate += 0xD800;
unichar lowSurrogate = inputChar & 0x3FF; // keep the low 10 bits
lowSurrogate += 0xDC00;
Now you can create an NSString using both characters at the same time:
NSString *str = [NSString stringWithCharacters:(unichar[]){highSurrogate, lowSurrogate} length:2];
To go backwards, you can use [NSString getCharacters:range:] to get the unichar's back and then reverse the surrogate pair algorithm to get your UTF32 character back (any characters which aren't in the range 0xD800-0xDFFF should just be cast to UTF32 directly).
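As a sketch of that reverse direction, building on the two-unit str created above (variable names are illustrative, not from the original answer):
unichar units[2];
[str getCharacters:units range:NSMakeRange(0, 2)];
UTF32Char decoded;
if (units[0] >= 0xD800 && units[0] <= 0xDBFF) {
    // Recombine the surrogate pair into a single UTF-32 value.
    decoded = (((UTF32Char)(units[0] - 0xD800)) << 10) + (units[1] - 0xDC00) + 0x10000;
} else {
    decoded = units[0]; // not a surrogate: cast directly
}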
2. Byte buffers
Your other option is to let NSString do the conversion directly without using cStrings. To convert a UTF32 value into an NSString you can use something like the following:
UTF32Char inputChar = /* input UTF-32 value */;
inputChar = NSSwapHostIntToLittle(inputChar); // swap to little-endian if necessary
NSString *str = [[[NSString alloc] initWithBytes:&inputChar length:4 encoding:NSUTF32LittleEndianStringEncoding] autorelease];
To get it back out again, you can use
UTF32Char outputChar;
if ([str getBytes:&outputChar maxLength:4 usedLength:NULL encoding:NSUTF32LittleEndianStringEncoding options:0 range:NSMakeRange(0, 1) remainingRange:NULL]) {
    outputChar = NSSwapLittleIntToHost(outputChar); // swap back to host endian
    // outputChar now holds the first UTF32 character. Note that the range is
    // measured in UTF-16 units, so a character outside the BMP needs NSMakeRange(0, 2).
}
There are two problems here:
1:
The first one is that both [NSString cStringUsingEncoding:] and [NSString getCString:maxLength:encoding:] return the C string in native endianness (little-endian) without adding a BOM to it when using NSUTF32StringEncoding or NSUTF16StringEncoding.
The Unicode standard states (see "How I should deal with BOMs"):
"If there is no BOM, the text should be interpreted as big-endian."
This is also stated in NSString's documentation (see "Interpreting UTF-16-Encoded Data"):
"... if the byte order is not otherwise specified, NSString assumes that the UTF-16 characters are big-endian, unless there is a BOM (byte-order mark), in which case the BOM dictates the byte order."
Although they're referring to UTF-16, the same applies to UTF-32.
2:
The second one is that [NSString stringWithCString:encoding:] internally uses CFStringCreateWithCString to create the string from the C string. The problem with this is that CFStringCreateWithCString only accepts strings that use an 8-bit encoding. From the documentation (see the "Parameters" section):
The string must use an 8-bit encoding.
To solve this issue:
Explicitly state the endianness of the encoding in both directions (NSString -> C string and C string -> NSString).
Use [[NSString alloc] initWithBytes:length:encoding:] when creating an NSString from a C string encoded in UTF-32 or UTF-16, as sketched below.
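For instance, a minimal round-trip sketch under those rules (the sample string is illustrative, not from the original question):
NSString *s = @"\U0001D11E"; // MUSICAL SYMBOL G CLEF (U+1D11E), outside the BMP
NSData *bytes = [s dataUsingEncoding:NSUTF32LittleEndianStringEncoding];
NSString *roundTrip = [[NSString alloc] initWithBytes:bytes.bytes
                                               length:bytes.length
                                             encoding:NSUTF32LittleEndianStringEncoding];
// roundTrip equals s: the encoding names the byte order explicitly,
// so no BOM is needed and nothing is misinterpreted.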
Hope this helps!

String encoding of a Scandinavian letter from URL to UTF-8 to const char on iPhone

NSString *theString = @"a %C3%B8 b";
NSLog(@"%@", theString);
NSString *utf8string = [theString stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
NSLog(@"%@", utf8string);
const char *theChar = [utf8string UTF8String];
NSLog(@"%s", theChar);
This logs the following:
'a %C3%B8 b'
'a ø b'
'a √∏ b'
The problem is that I want theChar to be 'a ø b'. Any help on how to achieve that would be greatly appreciated.
I don't think you can. char is an eight-bit type, so all its values lie between 0 and 255, and UTF-8 does not encode ø as a single byte in that range.
You might want to look at the unichar type, which is a 16-bit type. It can hold ø as a single unit; use getCharacters:range: to get the characters out of the NSString.
From String Format Specifiers in the String Programming Guide:
%s: Null-terminated array of 8-bit unsigned characters. %s interprets its input in the system encoding rather than, for example, UTF-8.
So NSLog(@"%s", theChar) creates and displays an NSString with the wrong encoding, while theChar itself contains the correct string data.
NSLog(@"%@", [NSString stringWithUTF8String:theChar]);
gives the correct output ('a ø b').
I'd like to add that your theChar does contain the UTF-8 byte sequence of your desired string. It is a limitation of NSLog's %s specifier that the string cannot be shown correctly in the log file and/or the console.
So, if you want to pass the UTF-8 byte sequence in a char * to some other library, what you did is perfectly correct.
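A quick way to convince yourself (a sketch, printing the raw bytes): ø is U+00F8, which UTF-8 encodes as the two bytes 0xC3 0xB8.
for (const char *p = theChar; *p != '\0'; p++) {
    printf("%02X ", (unsigned char)*p);
}
printf("\n"); // prints: 61 20 C3 B8 20 62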