How to initialize UniChar with too many bytes in Swift?

I am trying to initialize a UniChar variable, like:
var unicode: UniChar = 0x1F63E
"Integer literal '128701' overflows when stored into 'UniChar' (aka 'UInt16')"
But if I initialize with a shorter unicode value, like:
var unicode: UniChar = 0x2705
Everything is alright.
How do I do the first one?

The function expects (the pointer to) an array of UniChar aka UInt16,
containing the UTF-16 representation of the string.
As @rmaddy said, UniChar can hold only values up to 0xFFFF.
Larger Unicode scalars need to be represented as "surrogate pairs".
The .utf16 view of a string provides the UTF-16 representation:
let c = "\u{1F63E}" // Or: let c = "😾"
let utf16Chars = Array(c.utf16)
event.keyboardSetUnicodeString(stringLength: utf16Chars.count, unicodeString: utf16Chars)
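As a quick sanity check, here is a minimal sketch of what the .utf16 view produces for U+1F63E (the two values are the expected surrogate pair):
let cat = "\u{1F63E}"            // 😾, outside the Basic Multilingual Plane
let units = Array(cat.utf16)     // UTF-16 code units
print(units.map { String($0, radix: 16, uppercase: true) })  // ["D83D", "DE3E"]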

Iterate through alphabet in Swift explanation

I accidentally wrote this simple code to print alphabet in terminal:
var alpha: Int = 97
while (alpha <= 122) {
    write(1, &alpha, 1)
    alpha += 1
}
write(1, "\n", 1)
// I'm using the write() function from C to avoid a newline after each symbol
And I've got this output:
abcdefghijklmnopqrstuvwxyz
Program ended with exit code: 0
So, here is the question: Why does it work?
In my logic, it should display a row of numbers, because an integer variable is being used. In C, it would be a char variable, so we would be referring to the character at that index of the ASCII table. Then:
char alpha = 97;
would be the code point of the character 'a', and by incrementing the alpha variable in a loop we would display every ASCII character up to the 122nd.
In Swift, though, I couldn't assign an integer to a Character or String variable. I used an Int and then declared several variables to hold UnicodeScalar values, but by accident I found out that when I call write I point to my integer, not the new UnicodeScalar variable, and yet it works! The code is very short and readable, but I don't completely understand how it works, or why it works at all.
Has anyone had such situation?
Why does it work?
This works "by chance" because the integer is stored in little-endian byte order.
The integer 97 is stored in memory as 8 bytes
0x61 0x00 0x00 0x00 0x00 0x00 0x00 0x00
and in write(1, &alpha, 1), the address of that memory location is passed to the write system call. Since the last parameter (nbyte) is 1, the first byte at that memory address is written to standard output: that is 0x61, or 97, the ASCII code of the letter "a".
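You can verify the byte layout yourself; a minimal sketch (assuming a 64-bit little-endian platform such as a current Mac):
var alpha = 97
withUnsafeBytes(of: &alpha) { rawBytes in
    print(Array(rawBytes))  // [97, 0, 0, 0, 0, 0, 0, 0]
}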
In Swift though, I couldn't assign an integer to Character or String type variable.
The Swift equivalent of char is CChar, a type alias for Int8:
var alpha: CChar = 97
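For illustration, a minimal sketch of the same loop written with CChar, so that writing a single byte no longer depends on Int's size or byte order (assuming write is in scope via import Darwin):
import Darwin
var alpha: CChar = 97        // 'a'
while alpha <= 122 {         // up to 'z'
    write(1, &alpha, 1)      // emit exactly one byte
    alpha += 1
}
var newline: CChar = 10      // '\n'
write(1, &newline, 1)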
Here is a solution that does not rely on the memory layout and works for non-ASCII characters as well:
let first: UnicodeScalar = "α"
let last: UnicodeScalar = "ω"
for v in first.value...last.value {
    if let c = UnicodeScalar(v) {
        print(c, terminator: "")
    }
}
print()
// αβγδεζηθικλμνξοπρςστυφχψω

How is the 🇩🇪 character represented in Swift strings?

Like some other emoji characters, the 0x0001F1E9 0x0001F1EA combination (German flag) is represented as a single character on screen although it is really two different Unicode code points combined. Is it represented as one or two different characters in Swift?
let flag = "\u{1f1e9}\u{1f1ea}"
then flag is 🇩🇪.
For more regional indicator symbols, see:
http://en.wikipedia.org/wiki/Regional_Indicator_Symbol
Support for "extended grapheme clusters" has been added to Swift in the meantime.
Iterating over the characters of a string produces a single character for
the "flags":
let string = "Hi🇩🇪!"
for char in string.characters {
    print(char)
}
Output:
H
i
🇩🇪
!
Swift 3 implements Unicode in its String struct. In Unicode, all flags are pairs of Regional Indicator Symbols. So, 🇩🇪 is actually 🇩 followed by 🇪 (try copying the two and pasting them next to each other!).
When two or more Regional Indicator Symbols are placed next to each other, they form an "Extended Grapheme Cluster", which means they're treated as one character. This is why "🇪🇺 = 🇫🇷🇪🇸🇩🇪...".characters gives you ["🇪🇺", " ", "=", " ", "🇫🇷🇪🇸🇩🇪", ".", ".", "."].
If you want to see every single Unicode code point (AKA "scalar"), you can use .unicodeScalars, so that "Hi🇩🇪!".unicodeScalars gives you ["H", "i", "🇩", "🇪", "!"]
tl;dr
🇩🇪 is one character (in both Swift and Unicode), which is made up of two code points (AKA scalars). Don't forget these are different! 🙂
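In current Swift (4 and later), a minimal sketch comparing the different views of the flag makes the distinction concrete:
let flag = "🇩🇪"
print(flag.count)                 // 1 - one Character (extended grapheme cluster)
print(flag.unicodeScalars.count)  // 2 - two regional indicator scalars
print(flag.utf16.count)           // 4 - each scalar needs a surrogate pair in UTF-16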
See Also
Why are emoji characters like 👩‍👩‍👧‍👦 treated so strangely in Swift strings?
The Swift Programming Language (Swift 3.1) - Strings and Characters - Unicode
With Swift 5, you can iterate over the unicodeScalars property of a flag emoji character in order to print the Unicode scalar values that compose it:
let emoji: Character = "🇮🇹"
for scalar in emoji.unicodeScalars {
    print(scalar)
}
/*
prints:
🇮
🇹
*/
If you combine those scalars (that are Regional Indicator Symbols), you get a flag emoji:
let italianFlag = "🇮" + "🇹"
print(italianFlag) // prints: 🇮🇹
print(italianFlag.count) // prints: 1
Each Unicode.Scalar instance also has a property value that you can use in order to display a numeric representation of it:
let emoji: Character = "🇮🇹"
for scalar in emoji.unicodeScalars {
    print(scalar.value)
}
/*
prints:
127470
127481
*/
You can create Unicode scalars from those numeric representations and then combine them into a string:
let scalar1 = Unicode.Scalar(127470)
let scalar2 = Unicode.Scalar(127481)
let italianFlag = String(scalar1!) + String(scalar2!)
print(italianFlag) // prints: 🇮🇹
print(italianFlag.count) // prints: 1
If needed, you can use Unicode.Scalar's escaped(asASCII:) method in order to display a string representation of the Unicode scalars (using ASCII characters):
let emoji: Character = "🇮🇹"
for scalar in emoji.unicodeScalars {
    print(scalar.escaped(asASCII: true))
}
/*
prints:
\u{0001F1EE}
\u{0001F1F9}
*/
let italianFlag = "\u{0001F1EE}\u{0001F1F9}"
print(italianFlag) // prints: 🇮🇹
print(italianFlag.count) // prints: 1
String's init(_:radix:uppercase:) may also be relevant for converting the scalar value to a hexadecimal representation:
let emoji: Character = "🇮🇹"
for scalar in emoji.unicodeScalars {
    print(String(scalar.value, radix: 16, uppercase: true))
}
/*
prints:
1F1EE
1F1F9
*/
let italianFlag = "\u{1F1EE}\u{1F1F9}"
print(italianFlag) // prints: 🇮🇹
print(italianFlag.count) // prints: 1
Swift doesn't tell you what the internal representation of a String is. You interact with a String as a collection of Characters, or through its unicodeScalars view as full-size (32-bit) Unicode scalar values:
for character in "Dog!🐶" {
    print(character)
}
// prints D, o, g, !, 🐶
If you want to work with a string as a sequence of UTF-8 or UTF-16 code points, use its utf8 or utf16 properties. See Strings and Characters in the docs.
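For instance, a minimal sketch of those views in current Swift:
let s = "Dog!🐶"
print(Array(s.utf8))                      // the UTF-8 code units (bytes)
print(Array(s.utf16))                     // the UTF-16 code units
print(s.unicodeScalars.map { $0.value })  // the Unicode scalar values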

Objective-C character encoding - Change char to int, and back

Simple task: I need to convert two characters to two numbers, add them together, and change that back into a character.
What I have got (it works perfectly in Java, where encoding is handled for you, I guess):
int myChar1 = (int)([myText1 characterAtIndex:i]);
int myChar2 = (int)([myText2 characterAtIndex:keyCurrent]);
int newChar = (myChar1 + myChar2);
//NSLog(@"Int's %d, %d, %d", textChar, keyChar, newChar);
char newC = ((char) newChar);
NSString *tmp1 = [NSString stringWithFormat:@"%c", newC];
NSString *tmp2 = [NSString stringWithFormat:@"%@", newString];
newString = [NSString stringWithFormat:@"%@%@", tmp2, tmp1]; // Adding these chars to a string
The algorithm is perfect, but now I can't figure out how to handle the encoding properly. I would like to do everything in UTF-8 but have no idea how to get a char's UTF-8 value, for instance. And if I've got it, how to change that value back into a char.
The NSLog in the code outputs the correct values. But when I try to do the opposite with the algorithm (i.e. subtracting the values) then it goes wrong. It gets the wrong character value for weird/odd characters.
NSString works with unichar characters that are 2 bytes long (16 bits). char is one byte long, so it can only store code points from U+0000 to U+00FF (i.e. Basic Latin and the Latin-1 Supplement).
You should do your math on unichar values and then use +[NSString stringWithCharacters:length:] to create the string representation.
But there is still an issue with that solution: your code may generate code points between U+D800 and U+DFFF, which aren't valid Unicode characters on their own. The standard reserves them for encoding code points from U+10000 to U+10FFFF in UTF-16 as pairs of 16-bit code units. In such a case, your string would be ill-formed and could neither be displayed nor converted to UTF-8.
Also, the temporary variable tmp2 is useless, and rather than creating a new newString on every concatenation, you should use an NSMutableString.
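A minimal Swift sketch of that approach, with placeholder strings: the math is done on 16-bit unichar values, and the result is wrapped using the stringWithCharacters:length: initializer.
import Foundation
let myText1: NSString = "ABC"     // placeholder input strings
let myText2: NSString = "!!!"
// Do the math on unichar (UInt16) values, widened to avoid overflow...
let sum = UInt32(myText1.character(at: 0)) + UInt32(myText2.character(at: 0))  // 65 + 33
// ...and keep the result out of the surrogate range U+D800...U+DFFF before narrowing it back.
let newChar: [unichar] = [unichar(sum)]
let combined = NSString(characters: newChar, length: newChar.count)
print(combined)                   // "b" (code point 98)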
I am assuming that your strings are NSStrings consisting of numerals which represent a number. If that is the case, you could try the following:
Include the following headers:
#include <inttypes.h>
#include <stdlib.h>
#include <stdio.h>
Then use the following code:
// convert NSString to UTF8 string
const char * utf8String1 = [myText1 UTF8String];
const char * utf8String2 = [myText2 UTF8String];
// convert UTF8 string into long integers
long num1 = strtol(utf8String1, NULL, 0);
long num2 = strtol(utf8String2, NULL, 0);
// perform calculations
long calc = num1 - num2;
// convert calculated value back into NSString
NSString * calcText = [[NSString alloc] initWithFormat:@"%li", calc];
// convert calculated value back into UTF8 string
char calcUTF8[64];
snprintf(calcUTF8, 64, "%li", calc);
// log results
NSLog(@"calcText: %@", calcText);
NSLog(@"calcUTF8: %s", calcUTF8);
Not sure if this is what you meant, but from what I understood, you wanted to create an NSString with the UTF-8 string encoding from a char?
If that's what you want, maybe you can use the initWithCString:encoding: method in NSString.

How to encode the Numeric code in iPhone

I have some numerical codes and I want to encode the "Numerical Code". How can I encode the string? I have tried NSASCIIStringEncoding and NSUTF8StringEncoding, but they don't encode the string. So please help me out.
Eg :
İ -> İ
ı -> ı
Thanks!
What you have are Unicode code points, not strings. You don't need to specify a string encoding, because what you are dealing with aren't strings at all; they're just single characters. And an NSString does not have an "encoding" in this sense.
To get those characters into a string, you need to use:
+[NSString stringWithCharacters:length:]
For example: you don't want to be creating a string with the contents "304"; that's just a string of numbers. Instead, create a unichar with the value of 304:
unichar iWithDot = 304;
"Unichar" is just an unsigned short, so no pointer and no quotes; you are just assigning the code point to a numerical value. Bundle all of the characters you need into a C string and pass the pointer to stringWithCharacters.

Converting an NSString to and from UTF32

I'm working with a database that includes hex codes for UTF32 characters. I would like to take these characters and store them in an NSString. I need to have routines to convert in both ways.
To convert the first character of an NSString to a unicode value, this routine seems to work:
const unsigned char *cs = (const unsigned char *)
[s cStringUsingEncoding:NSUTF32StringEncoding];
uint32_t code = 0;
for ( int i = 3 ; i >= 0 ; i-- ) {
    code <<= 8;
    code += cs[i];
}
return code;
However, I am unable to do the reverse (i.e. take a single code and convert it into an NSString). I thought I could just do the reverse of what I do above by simply creating a c-string with the UTF32 character in it with the bytes in the correct order, and then create an NSString from that using the correct encoding.
However, converting to / from cstrings does not seem to be reversible for me.
For example, I've tried this code, and the "tmp" string is not equal to the original string "s".
char *cs = [s cStringUsingEncoding:NSUTF32StringEncoding];
NSString *tmp = [NSString stringWithCString:cs encoding:NSUTF32StringEncoding];
Does anyone know what I am doing wrong? Should I be using "wchar_t" for the cstring instead of char *?
Any help is greatly appreciated!
Thanks,
Ron
You have a couple of reasonable options.
1. Conversion
The first is to convert your UTF32 to UTF16 and use those with NSString, as UTF16 is the "native" encoding of NSString. It's not actually all that hard. If the UTF32 character is in the BMP (i.e. its high two bytes are 0), you can just cast it to unichar directly. If it's in any other plane, you can convert it to a surrogate pair of UTF16 characters. You can find the rules on the Wikipedia page. But a quick (untested) conversion would look like:
UTF32Char inputChar = // my UTF-32 character
inputChar -= 0x10000;
unichar highSurrogate = inputChar >> 10; // leave the top 10 bits
highSurrogate += 0xD800;
unichar lowSurrogate = inputChar & 0x3FF; // leave the low 10 bits
lowSurrogate += 0xDC00;
Now you can create an NSString using both characters at the same time:
NSString *str = [NSString stringWithCharacters:(unichar[]){highSurrogate, lowSurrogate} length:2];
To go backwards, you can use [NSString getCharacters:range:] to get the unichars back and then reverse the surrogate pair algorithm to get your UTF32 character back (any characters which aren't in the range 0xD800-0xDFFF should just be cast to UTF32 directly).
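A minimal sketch (in Swift) of that reverse step, recombining a valid high/low surrogate pair into a UTF32 value:
let high: UInt16 = 0xD83D   // high surrogate
let low: UInt16  = 0xDE3E   // low surrogate
let utf32 = ((UInt32(high) - 0xD800) << 10 | (UInt32(low) - 0xDC00)) + 0x10000
print(String(utf32, radix: 16))   // 1f63e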
2. Byte buffers
Your other option is to let NSString do the conversion directly without using cStrings. To convert a UTF32 value into an NSString you can use something like the following:
UTF32Char inputChar = // input UTF32 value
inputChar = NSSwapHostIntToLittle(inputChar); // swap to little-endian if necessary
NSString *str = [[[NSString alloc] initWithBytes:&inputChar length:4 encoding:NSUTF32LittleEndianStringEncoding] autorelease];
To get it back out again, you can use
UTF32Char outputChar;
if ([str getBytes:&outputChar maxLength:4 usedLength:NULL encoding:NSUTF32LittleEndianStringEncoding options:0 range:NSMakeRange(0, 1) remainingRange:NULL]) {
    outputChar = NSSwapLittleIntToHost(outputChar); // swap back to host endian
    // outputChar now has the first UTF32 character
}
There are two problems here:
1:
The first one is that both [NSString cStringUsingEncoding:] and [NSString getCString:maxLength:encoding:] return the C-string in native-endianness (little) without adding a BOM to it when using NSUTF32StringEncoding and NSUTF16StringEncoding.
The Unicode standard states that: (see, "How I should deal with BOMs")
"If there is no BOM, the text should be interpreted as big-endian."
This is also stated in NSString's documentation: (see, "Interpreting UTF-16-Encoded Data")
"... if the byte order is not otherwise specified, NSString assumes that the UTF-16 characters are big-endian, unless there is a BOM (byte-order mark), in which case the BOM dictates the byte order."
Although they're referring to UTF-16, the same applies to UTF-32.
2:
The second one is that [NSString stringWithCString:encoding:] internally uses CFStringCreateWithCString to create the string from the C-string. The problem with this is that CFStringCreateWithCString only accepts strings using 8-bit encodings. From the documentation: (see, "Parameters" section)
The string must use an 8-bit encoding.
To solve this issue:
Explicitly state the encoding endianness you want to use both ways (NSString -> C-string and C-string -> NSString)
Use [NSString initWithBytes:length:encoding:] when trying to create an NSString from a C-string encoded in UTF-32 or UTF-16.
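For example, a minimal Swift sketch of the explicit-endianness round trip (the scalar value is just an arbitrary example):
import Foundation
var scalar = UInt32(0x1F63E).littleEndian                   // lay the bytes out little-endian
let data = Data(bytes: &scalar, count: MemoryLayout<UInt32>.size)
let str = String(data: data, encoding: .utf32LittleEndian)  // endianness stated explicitly
print(str ?? "nil")                                         // 😾
// ...and back again with the same explicit, endian-specific encoding
let bytes = str?.data(using: .utf32LittleEndian)
print(bytes?.count ?? 0)                                    // 4 bytes - the endian-specific encodings add no BOM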
Hope this helps!