NSString stringWithCharacters Unicode Problem


This has got to be simple -- surely this method is supposed to work -- but I'm having some kind of two-byte-to-one-byte problem, I think.
The purpose of the code is to generate a string of 0 characters of a certain length (10 minus the number of digits that will be tacked onto the end). It looks like this:
const unichar zero = 0x0030;
NSString *zeroBuffer = [NSString stringWithCharacters:&zero length:(10 - [[NSString stringWithFormat:@"%i", photoID] length])];
Alternate second line (casting the thing at address &zero):
NSString *zeroBuffer = [NSString stringWithCharacters:(unichar *)&zero length:(10 - [[NSString stringWithFormat:@"%i", photoID] length])];
0x0030 is the code point of the numeral 0 in the Basic Latin block of the Unicode table.
If photoID is 123 I'd want zeroBuffer to be @"0000000". What it actually ends up as is a zero and then some crazy unicode characters along the lines of (not sure how this will show) this:
0䪨 燱ܾ뿿﹔
I'm assuming that I've got data crossing character boundaries or something. I've temporarily rewritten it as a dumb substring thing, but this seems like it would be more efficient.
What am I doing wrong?

stringWithCharacters:length: expects the first argument to be the address of a buffer containing each of the characters to be inserted in the string in sequence. It's reading your character zero for the first character, then advancing to the following memory address and reading whatever data is there for the next character, and so on. This is not the right method for doing what you're trying to do.
Alas, there isn't a built-in repeat-this-string method. See the answers here for suggestions.
Alternatively, you can avoid the issue completely and just do this:
[NSString stringWithFormat:@"%010i", photoID];
That causes the number formatter to output a decimal number padded with leading zeroes to a total width of ten characters.
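Both ideas from this answer can be sketched as small helpers. This is only an illustration: the function names ZeroString and PaddedPhotoID are made up for the example, and the repeat helper uses stringByPaddingToLength:withString:startingAtIndex: as one possible way to build a run of identical characters.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns a string made of `count` copies of "0".
static NSString *ZeroString(NSUInteger count) {
    // Padding an empty string repeats the pad string until the target length is reached.
    return [@"" stringByPaddingToLength:count withString:@"0" startingAtIndex:0];
}

// Hypothetical helper: formats photoID as a ten-character, zero-padded decimal.
static NSString *PaddedPhotoID(int photoID) {
    return [NSString stringWithFormat:@"%010i", photoID];
}
```

With photoID equal to 123, ZeroString(7) gives the seven zeroes the question asked for, and PaddedPhotoID(123) gives the finished ten-character string directly.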

Related

NSString and no UTF-8 symbols

For some of you (I'm sure) this question is quite simple to answer, but I have some difficulties in understanding how to solve the problem.
I have a .txt file containing a table like this:
" 236
? 26
x00EE 16
As you probably understood, the left column lists symbols and the right one lists a code that I defined in my app.
As you can also see, some of the symbols are "strange" ones. The 0x00EE should be the "å" (a with a ring above).
Unfortunately I can't control the left column, i.e. it comes from another piece of software. After some experiments I found that:
NSLog( @"\x00ee" );
for example produces a warning saying that the code does not belong to the UTF-8 range.
So I was wondering how to convert the NSString @"\x00ee" (which I read from the file, so it is a string composed of 6 chars) to the single Unicode letter "å" (a with a ring above).
Can anyone help me?
Thanks...

You need to find out what character set encoding was used. 0xEE is Unicode for î. In Unicode, å is U+00E5, which is encoded in UTF-8 as the sequence 0xC3 0xA5. The following does the trick for me:
NSLog(@"\xc3\xa5");

If your input string contains only ASCII characters then you can use the fact that
NSNonLossyASCIIStringEncoding decodes \uNNNN to the corresponding Unicode character:
NSString *s = @"\\x00ee"; // from your text file
NSString *s1 = [s stringByReplacingOccurrencesOfString:@"\\x" withString:@"\\u"];
NSData *d = [s1 dataUsingEncoding:NSASCIIStringEncoding];
NSString *s2 = [[NSString alloc] initWithData:d encoding:NSNonLossyASCIIStringEncoding];
NSLog (@"%@", s2);
Output: î, which is U+00EE (LATIN SMALL LETTER I WITH CIRCUMFLEX).
(Remark: å is U+00E5, not U+00EE).
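An alternative sketch, not taken from the answers above, parses the hex digits with NSScanner and builds a one-character string from the resulting code unit. The helper name StringFromHexEscape is hypothetical, and the sketch assumes the field is exactly a backslash, an x and hex digits for a code point in the Basic Multilingual Plane; anything else is returned unchanged.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: turns the six-character text "\x00ee" into the single character U+00EE.
// Input that does not start with "\x", or whose value does not fit in one unichar, is returned as-is.
static NSString *StringFromHexEscape(NSString *escape) {
    if (![escape hasPrefix:@"\\x"]) {
        return escape;
    }
    unsigned int value = 0;
    NSScanner *scanner = [NSScanner scannerWithString:[escape substringFromIndex:2]];
    if (![scanner scanHexInt:&value] || value > 0xFFFF) {
        return escape;
    }
    unichar codeUnit = (unichar)value;
    // A buffer of exactly one unichar, with length 1, is what stringWithCharacters:length: expects.
    return [NSString stringWithCharacters:&codeUnit length:1];
}
```

As with the NSNonLossyASCIIStringEncoding approach, the result for "\x00ee" is î, because that is what U+00EE is.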

How to remove the last unicode symbol from NSString

I have implemented a custom keyboard associated with a text field, so when the user presses the delete button, I remove the last character from the string, and manually update the current text field text.
NSRange range = NSMakeRange(currentTextFieldString.length-1, 1);
[currentTextFieldString replaceCharactersInRange:range withString:@""];
So far so good.
Now, the problem is that the user has the option to enter some special unicode symbols. These are not always 1 byte, they can be 2 bytes too. On pressing the delete button I have to remove the entire symbol, but if I follow the above approach, the user has to press the delete button twice.
Here, if I do:
NSRange range = NSMakeRange(currentTextFieldString.length-2, 2);
[currentTextFieldString replaceCharactersInRange:range withString:@""];
it works fine, but then the normal characters, which are just 1 byte, get deleted two at a time.
How to handle such scenarios?
Thanks in advance.
EDIT:
It is strange, that if I switch to the iPhone keyboard, it handles both cases appropriately. There must be some way to do it, there is something that I am missing, but am not able to figure out what.

Here's the problem. NSStrings are encoded using UTF-16. Many common Unicode characters take up only one unichar (a 16-bit unsigned value). However, some characters take up two unichars. Even worse, some characters can be composed or decomposed, e.g. é might be one Unicode code point or it might be two: an e followed by a combining acute accent. This makes it quite difficult to do what you want, namely delete one "character", because it is really hard to tell how many unichars it takes up.
Fortunately, NSString has a method that helps with this: -rangeOfComposedCharacterSequenceAtIndex:. What you need to do is get the index of the last unichar, run this method on it, and the returned NSRange will tell you where to delete from. It goes something like this (not tested):
NSUInteger lastCharIndex = [myString length] - 1; // I assume string is not empty
NSRange rangeOfLastChar = [myString rangeOfComposedCharacterSequenceAtIndex: lastCharIndex];
NSString *myNewString = [myString substringToIndex: rangeOfLastChar.location];
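Since the question works with a mutable string, the same idea can be sketched as an in-place delete. The helper name DeleteLastComposedCharacter is made up for this example; it also guards against the empty string, which the snippet above assumes away.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: removes the last user-visible character from `text`,
// however many unichars (surrogate pairs, combining marks) it spans.
static void DeleteLastComposedCharacter(NSMutableString *text) {
    if (text.length == 0) {
        return; // nothing to delete
    }
    NSRange last = [text rangeOfComposedCharacterSequenceAtIndex:text.length - 1];
    [text deleteCharactersInRange:last];
}
```

A delete-key handler would call DeleteLastComposedCharacter(currentTextFieldString) once per key press and then update the text field, for plain ASCII and multi-unit symbols alike.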

If you can't get this to work by default, then use an if/else block and test if the last character is part of a special character. If it is, use the substring to length-2, otherwise use the substring to length-1.

I don't know exactly what the problem is with the byte length of the special characters.
What I suggest is:
Store the string length in a variable before adding any new characters.
If the user presses backspace (remove last character), remove everything from the saved length to the new length. For example, if the saved length is 5 and the new length is 7, take the substring from index 0 to 4, which crops the remaining characters.
This is a workaround, since I don't know exactly what the underlying problem is.
But I guess logically this solution should work.
Enjoy Coding :)

NSString UTF8String mangling unicode characters

When I run [NSString UTF8String] on certain unicode characters the resulting const char* representation is mangled both in NSLog and on the device/simulator display. The NSString itself displays fine but I need to convert the NSString to a cStr to use it in CGContextShowTextAtPoint.
It's very easy to reproduce (see code below) but I've searched for similar questions without any luck. Must be something basic I'm missing.
const char *cStr = [@"章" UTF8String];
NSLog(@"%s", cStr);
Thanks!

CGContextShowTextAtPoint is only for ASCII chars.
Check this SO question for answers.

When using the string format specifier (aka %s), you cannot be guaranteed that the characters of a C string will print correctly if they are not ASCII. A complex character like the one you've defined can be expressed in UTF-8 as a multi-byte sequence. However, %s uses the system encoding to interpret the characters in the C string you provide to the formatting function (in this case, NSLog). See Apple's documentation:
https://developer.apple.com/library/mac/documentation/cocoa/Conceptual/Strings/Articles/formatSpecifiers.html
%s
Null-terminated array of 8-bit unsigned characters. %s interprets its input in the system encoding rather than, for example, UTF-8.
As for CGContextShowTextAtPoint not working, that API supports only the MacRoman character set, which is not the entire Unicode character set.
You'll need to look into another API for showing Unicode characters. Core Text is probably where you'll want to start.

I've never noticed this issue before, but some quick experimentation shows that using printf instead of NSLog will cause the correct Unicode character to show up.
Try:
printf("%s", cStr);
This gives me the desired output ("章") both in the Xcode console and in Terminal. As nob1984 stated in his answer, the interpretation of the character data is up to the callee.
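To check that the bytes returned by UTF8String are intact and that only the %s interpretation is at fault, one option is to rebuild an NSString from them and log it with %@. This is a sketch; the helper name StringFromUTF8 is made up for the example.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: rebuilds an NSString from UTF-8 bytes so it can be logged with %@.
// Returns nil if the bytes are not valid UTF-8.
static NSString *StringFromUTF8(const char *cStr) {
    return [NSString stringWithUTF8String:cStr];
}
```

For example, NSLog(@"%@", StringFromUTF8([@"章" UTF8String])) goes through the %@ path, which does not depend on the system encoding used by %s.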

decoding quoted-printables

I am looking for a way to decode quoted-printables.
The quoted-printables are for arabic characters and look like this:
=D8=B3=D8=B9=D8=A7=D8=AF
I need to convert it to a string, and store it or display..
I've seen post on stackoverflow for the other way around (encoding), but couldn't find decoding.

Uhm, it's a little hacky but you could replace the = characters with a % character and use NSString's stringByReplacingPercentEscapesUsingEncoding: method. Otherwise, you could essentially split the string on the = characters, convert each element to a byte value (easily done using NSScanner), put the byte values into a C array, and use NSString's initWithBytes:length:encoding: method.
Note that your example isn't technically in quoted-printable format, which specifies that a quoted-printable is a three character sequence consisting of an = character followed by two hex digits.
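The second approach described above (turn each =XY into a byte, collect the bytes, then build the string from them) could look roughly like the following. It is a minimal sketch, not a complete MIME decoder: the function name DecodeQuotedPrintable is made up, the decoded bytes are assumed to be UTF-8, ARC is assumed, and it parses the hex digits with strtoul rather than NSScanner.

```objc
#import <Foundation/Foundation.h>
#include <ctype.h>
#include <stdlib.h>

// Hypothetical helper: decodes quoted-printable text whose bytes are UTF-8 (an assumption).
// Returns nil if the decoded bytes are not valid UTF-8.
static NSString *DecodeQuotedPrintable(NSString *input) {
    NSData *source = [input dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES];
    const unsigned char *bytes = source.bytes;
    NSUInteger length = source.length;
    NSMutableData *decoded = [NSMutableData dataWithCapacity:length];

    for (NSUInteger i = 0; i < length; i++) {
        unsigned char c = bytes[i];
        if (c != '=') {
            [decoded appendBytes:&c length:1]; // ordinary character, copied through
            continue;
        }
        // Soft line breaks ("=\r\n" or "=\n") contribute nothing to the output.
        if (i + 2 < length && bytes[i + 1] == '\r' && bytes[i + 2] == '\n') { i += 2; continue; }
        if (i + 1 < length && bytes[i + 1] == '\n') { i += 1; continue; }
        // "=XY": two hex digits encode one byte.
        if (i + 2 < length && isxdigit(bytes[i + 1]) && isxdigit(bytes[i + 2])) {
            char hex[3] = { (char)bytes[i + 1], (char)bytes[i + 2], '\0' };
            unsigned char value = (unsigned char)strtoul(hex, NULL, 16);
            [decoded appendBytes:&value length:1];
            i += 2;
            continue;
        }
        // Malformed escape: keep the "=" as it is.
        [decoded appendBytes:&c length:1];
    }
    return [[NSString alloc] initWithData:decoded encoding:NSUTF8StringEncoding];
}
```

Working on bytes avoids one weakness of the replace-with-% trick: a literal % in the message body is copied through untouched instead of being mistaken for a percent escape.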

In my case I was coming from EML... bensnider's answer worked great... quoted-printable (at least in EML) uses an = sign followed by \r\n to signify a line wrapping, so this was the code needed to cleanly translate:
(Made as a category cause I loves dem)
@interface NSString (QuotedPrintable)
- (NSString *)quotedPrintableDecode;
@end
@implementation NSString (QuotedPrintable)
- (NSString *)quotedPrintableDecode
{
    NSString *decodedString = [self stringByReplacingOccurrencesOfString:@"=\r\n" withString:@""]; // Ditch the line wrap indicators
    decodedString = [decodedString stringByReplacingOccurrencesOfString:@"=" withString:@"%"]; // Change the ='s to %'s
    decodedString = [decodedString stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding]; // Replace the escaped strings.
    return decodedString;
}
@end
Which worked great for decoding my EML / UTF-8 objects!

Bensnider's answer is correct, and it is the easy way to do it.
You'll need to replace each "=" with "%":
NSString *s = @"%D8%B3%D8%B9%D8%A7%D8%AF";
NSString *s2 = [s stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
s2 then holds "سعاد", which makes sense, so this should work in a straightforward way without a hack.

In some cases the line ends are not "=\r\n" but are only "=\n", in which case you need another step:
decodedString = [decodedString stringByReplacingOccurrencesOfString:@"=\n" withString:@""];
Otherwise, the final step fails due to the unbalanced "%" at the end of a line.

I know nothing of the iPhone, but most email processing libraries will contain functions to do this, as email is where this format is used. I suggest searching for MIME decoding functions.
The earlier poster's approach also seems fine to me - I feel he is being a little too self-deprecating in describing it as hacky :)

Please see a working solution that takes strings containing quoted-printables and resolves those graphemes. The only thing you should pay attention to is the encoding (that answer is based on UTF-8, but it can easily be switched to any other): https://stackoverflow.com/a/32903103/2799410

Newline chars somehow get added to my strings. And cant remove them

On some of my strings there seems to be some kind of newline char. I think this is the case because when I do a simple NSLog
NSLog(@"Test: %@",aNSMutableString);
I would get output like below
Test:
I am a String
I've tried using
[mutableString stringByTrimmingCharactersInSet:[NSCharacterSet newlineCharacterSet]];
But it does not remove whatever it is that's forcing the newline to happen.
For example, a string that I parse out of a file, and that should contain the 4 characters 'm3u8', has a length of 5 when I check it.
Anybody got an idea of what might be going on?
Thanks
-Code
P.S.
I know I could just zap the first char out of all my strings, but it feels like a hack and I still won't know what's going on.

[mutableString stringByTrimmingCharactersInSet:[NSCharacterSet newlineCharacterSet]];
The above will not directly modify your mutableString. It returns a new autoreleased NSString with the characters trimmed. See NSString doc.
e.g.
NSString *trimmedString = [mutableString stringByTrimmingCharactersInSet:[NSCharacterSet newlineCharacterSet]];
NSLog(@"Test: %@", trimmedString);
should give you expected results.
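If the goal is to change the mutable string itself rather than work with a copy, the trimmed result can be written back with setString:. A small sketch, with a made-up helper name:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: strips leading and trailing newline characters from `text` in place.
static void TrimNewlinesInPlace(NSMutableString *text) {
    // The trimming method returns a new string, so the result has to be assigned back.
    NSString *trimmed = [text stringByTrimmingCharactersInSet:[NSCharacterSet newlineCharacterSet]];
    [text setString:trimmed];
}
```

The newline character set covers both "\n" and "\r", so a stray carriage return at either end is removed as well.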

I think @Sam's answer will fix your problem, but I think the origin of your problem is the file source. Do you know how it is encoded? Is it part of a download? My guess is that you have a Windows file with "\r\n" terminating lines and you are using Unix string tools that break on "\n" only, thus leaving a stray "\r" attached to each line.
Verify the source of the file and read the document lines with the appropriate encoding.
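One way to split file contents into lines without leaving stray line-ending characters is enumerateLinesUsingBlock:, which treats "\r\n", "\n" and "\r" as line terminators. A minimal sketch, assuming the file has already been read into an NSString; the helper name LinesOfString is made up:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the lines of `contents` without their line terminators.
static NSArray *LinesOfString(NSString *contents) {
    NSMutableArray *lines = [NSMutableArray array];
    [contents enumerateLinesUsingBlock:^(NSString *line, BOOL *stop) {
        [lines addObject:line]; // `line` excludes the terminator
    }];
    return lines;
}
```

Compared with splitting on "\n" alone, this keeps a value such as m3u8 at its expected length of 4 even when the file came from Windows.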
