iOS Japanese handwriting input code help please - iphone

I have a series of questions about writing code for iOS that includes handwriting recognition of Japanese. I am a beginner, so be gentle and assume I am stupid ...
I'd like to present a Japanese word in hiragana (the Japanese phonetic alphabet), then have the user handwrite the appropriate kanji (Chinese character). This is then internally compared to the correct character, and the user gets feedback (whether they were correct or not).
My questions here revolve around the handwritten input.
I know that this type of input is normally possible if one uses the Chinese keyboard.
How can I implement something similar, without using the keyboard itself? Are there already library functions for this (I feel there must be, since that input is available on the Chinese keyboard)?
Also, kanji aren't exactly the same as Chinese characters. There are unique characters that Japanese people invented themselves. How would I be able to include these in my handwriting recognition?

We worked on a similar exercise back at University.
The stroke order of kanji is well defined, and there are only 8 (?) different strokes, so each kanji is basically a well-ordered sequence of strokes. For example, te (手, hand) is the sequence "the short falling backward stroke", then twice the "left to right stroke", and finally "the long downward stroke with the little tip at the bottom". There are databases that give you this information.
Now the problem is almost reduced to identifying the correct stroke. You will still run into some ambiguities, where you have to take into account how some strokes are positioned relative to others.
EDIT: For stroke recognition we snapped the freehand writing to 45° angles, thus converting each stroke into a sequence of vectors along one of these eight directions. Let's assume that direction 0 is from bottom to top, direction 1 from bottom right to top left, 2 from right to left, and so on counterclockwise.
Then the first stroke of te (手) would be [23]+ (as some write it falling and some horizontal)
The second and third stroke would be 6+
and the last would be 4+[123] (as with the little tip, every writer uses a different direction)
This coarse snapping was actually enough for us to recognize kanji. Maybe there are more sophisticated ways, but this simple solution managed to recognize about 90% of kanji. The only handwriting it couldn't grasp was that of one professor, but then no human except himself could read his handwriting either.
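A minimal sketch of the snapping idea, written in Python for brevity rather than as iOS code. It assumes touch points arrive as (x, y) pairs in screen coordinates (y grows downward), uses the direction numbering described above, and treats the per-stroke patterns as regular expressions. The names `snap_direction`, `encode_stroke`, `matches`, `TE_PATTERNS` and the `min_step` jitter threshold are made up for illustration; this is not the original university code.

```python
import math
import re

def snap_direction(dx, dy):
    """Snap a movement vector to one of 8 directions.

    0 = bottom to top, 2 = right to left, 4 = top to bottom,
    6 = left to right, counting counterclockwise in 45 degree steps.
    Screen coordinates are assumed, so y grows downward.
    """
    angle = math.degrees(math.atan2(-dx, -dy))  # 0 = up, positive = counterclockwise
    return round(angle / 45) % 8

def encode_stroke(points, min_step=5.0):
    """Turn one stroke (a non-empty list of (x, y) points) into a digit string."""
    codes = []
    last = points[0]
    for p in points[1:]:
        dx, dy = p[0] - last[0], p[1] - last[1]
        if math.hypot(dx, dy) < min_step:
            continue  # skip tiny movements (jitter); threshold is an assumption
        codes.append(str(snap_direction(dx, dy)))
        last = p
    return "".join(codes)

# Patterns for the four strokes of te, as given in the answer.
TE_PATTERNS = ["[23]+", "6+", "6+", "4+[123]"]

def matches(strokes, patterns):
    """True if every stroke, in order, fully matches its pattern."""
    if len(strokes) != len(patterns):
        return False
    return all(re.fullmatch(pat, encode_stroke(s))
               for pat, s in zip(patterns, strokes))
```

For instance, a stroke drawn straight down and then flicked up and to the left encodes as a run of 4s followed by a 1, which matches the last pattern. The spatial relations between strokes mentioned above are not handled here at all.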
EDIT2: It is important that your user "prints" the Kanji and doesn't write in calligraphy, since in calligraphy many strokes are merged into one. Like when writing a kanji with the radical of "rice field" in calligraphy, this radical morphs into something completely different. Or radicals with a lot of horizontal dashes (like the radical of "speech" iu) just become one long wriggly line.

Related

Are all "non-grapheme" code points invisible?

In a unicode string, each grapheme consists of one or more code points. However, there are some code points, such as the Zero-width joiner (ZWJ), which are never a part of a grapheme. The ZWJ is, in itself, invisible. Are all of those "non-grapheme" code points always invisible?
The Unicode representation of the Ogham script is notable for containing a non-invisible whitespace character. (U+1680: OGHAM SPACE MARK)
Tom Scott made an excellent YouTube video on the subject: link
There are many joining characters which are intended to modify a base character. Whether they provide a grapheme on their own is partially an implementation detail, I expect.
Example: o followed by U+0308 COMBINING DIAERESIS produces ö (the glyph in isolation is rendered by your browser as ̈)
List of all code points in this category: https://codepoints.net/search?lb=CM
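To see this from code rather than in a browser, here is a small Python sketch using only the standard `unicodedata` module. The variable names are arbitrary; it just inspects the example above and is not meant as a general grapheme segmenter.

```python
import unicodedata

base, mark = "o", "\u0308"
decomposed = base + mark                       # two code points, one visible grapheme
composed = unicodedata.normalize("NFC", decomposed)

mark_name = unicodedata.name(mark)             # COMBINING DIAERESIS
mark_category = unicodedata.category(mark)     # Mn = Mark, nonspacing

print(mark_name, mark_category)
print(len(decomposed), len(composed))          # 2 1
print(composed == "\u00f6")                    # True
```

The general category `Mn` is one way to spot code points that normally attach to a preceding base character instead of standing alone.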
Recent Unicode versions also have invisible characters which modify how a previous emoji is being rendered, famously to add e.g. a skin color trait to emojis with human figures or faces. These by definition are not graphemes in their own right, though again, rendering engines are probably free to figure out a way to represent them if they are encountered in isolation.
Example: 👋 U+1F44B WAVING HAND SIGN followed by U+1F3FB EMOJI MODIFIER FITZPATRICK TYPE-1-2 (which in isolation renders as 🏻) produces 👋🏻
Full catalog: https://www.unicode.org/emoji/charts/full-emoji-modifiers.html
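The same kind of check works for the emoji example. This is only a sketch with Python's standard `unicodedata` module, and the variable names are made up:

```python
import unicodedata

wave, tone = "\U0001F44B", "\U0001F3FB"
sequence = wave + tone                         # what the user sees as one emoji

wave_name = unicodedata.name(wave)             # WAVING HAND SIGN
tone_name = unicodedata.name(tone)             # EMOJI MODIFIER FITZPATRICK TYPE-1-2
tone_category = unicodedata.category(tone)     # Sk = Symbol, modifier

print(wave_name)
print(tone_name, tone_category)
print(len(sequence))                           # 2 code points
```

Note that the modifier is classified as a symbol, not as a combining mark, which fits the point that it can still be drawn on its own.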

Are there Unicode code points that enable geometric transformations such as rotation and mirroring?

Playing with Unicode to create symbols already goes a long way, thanks to the large set of combining characters and other modifiers.
However, sometimes an arrow is only available in a single direction, or a diacritic is only available placed above, but not, for example, below on the left side.
So are there modifiers or combining characters that allow such a composition to be specified?
For example, the combining rectangle makes something like a̻ possible. At least on my current terminal, it is rendered with a rectangle at the upper right of the a glyph it is combined with, with its longest side oriented horizontally. Now, what if:
the goal is to place the rectangle at the top left, top middle, etc.?
the goal is to rotate the rectangle before it's combined with the main glyph?
the goal is to mirror the rectangle before it's combined with the main glyph?
Obviously the last point doesn't make much difference for a rectangle, but for asymmetric glyphs it would.
No, there is no such mechanism in Unicode. Different positional variants of the same diacritic are encoded as separate characters. For example, U+0307 COMBINING DOT ABOVE, U+0358 COMBINING DOT ABOVE RIGHT, and U+1DF8 COMBINING DOT ABOVE LEFT are all different codepoints. There is currently no way to represent, say, a generic combining dot below right in Unicode.
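As a quick illustration that these are separately encoded characters rather than one dot plus a position modifier, the three code points can be looked up with Python's standard `unicodedata` module. This sketch assumes Python 3.7 or later, since U+1DF8 was only added in Unicode 10.0; `DOT_VARIANTS` and `label` are illustrative names.

```python
import unicodedata

# The three dot-above variants named in the answer, each its own code point.
DOT_VARIANTS = [0x0307, 0x0358, 0x1DF8]

def label(cp):
    """Format a code point as 'U+XXXX NAME'."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp))}"

for cp in DOT_VARIANTS:
    print(label(cp))
```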
Similarly, arbitrary Unicode characters cannot be mirrored or rotated. Where such transformations make a meaningful distinction (for example the pair “E” and “Ǝ”), they have once again been encoded atomically.
There are some very specific circumstances where such modifiers can be applied. In Sutton SignWriting, rotation is a productive feature. Rotating glyphs is necessary to display text correctly, so a number of rotation modifiers have been defined. For example, U+1D800 SIGNWRITING HAND-FIST INDEX points upwards in its base orientation (𝠀), but by appending U+1DAA1 SIGNWRITING ROTATION MODIFIER-2 you can make it point north-west instead (𝠀𝪡).
For emoji only, Unicode also specifies a mechanism for defining whether a given glyph is supposed to face left or right. For example, “🚗‍⬅️” would be an automobile going to the left and “🚗‍➡️” would be an automobile going to the right. No commercially available fonts presently support this mechanism, however.

Why does the red heart emoji require two code points, but the other colored hearts require one?

It appears that the red heart emoji (❤️) "\u2764\uFE0F" requires two Unicode code points, specifically HEAVY BLACK HEART followed by a variation selector. However, blue 💙, green 💚, yellow 💛, and purple 💜 each have their own single code point.
Why is red so different?
For historical reasons. Originally, there was only U+2764 HEAVY BLACK HEART which the first applications that supported Emojis decided to render as a red heart. These early applications always rendered U+2764 as Emoji. Later it was realized that this was a bad idea and the variation selectors for Emojis were standardized. When additional heart emojis were added, there was no need for another red heart, so it was omitted. Instead there's a separate black heart emoji U+1F5A4 🖤.
In theory, an application could require that the Emoji variation selector is appended to other heart code points as well. But it doesn't make much sense to render characters like PURPLE HEART as a non-Emoji. It does make a difference for HEAVY BLACK HEART, though, which is often intended to be rendered as the original, plain heavy black heart character.
HEAVY BLACK HEART was added to Unicode decades before emoji. When emoji were incorporated in Unicode 6 some already existing characters were simply reused as emoji to avoid unnecessary duplicates. Later, variation sequences were defined for characters that also map to a non-emoji character set to allow for better control over how they display. For example, U+2744 ❄ SNOWFLAKE is originally from Zapf Dingbats (I believe) but was later also made an emoji. So if you want to force the original text-style display you can use VARIATION SELECTOR-15 (resulting in ❄︎), and if you want to force the newer emoji-style display you can use VARIATION SELECTOR-16 (resulting in ❄️).
Note, however, that not many platforms actually support those variation sequences correctly at the moment. Also not all of them automatically apply the variation selectors when using the emoji keyboard. In theory ❤ and ❄ (and many other emoji) should display as text style by default without VS16, but many applications ignore that as well.
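The difference between the red heart and the other hearts is easy to inspect programmatically. A short Python sketch using only the standard `unicodedata` module; the `describe` helper and the variable names are made up for this example:

```python
import unicodedata

red_heart = "\u2764\uFE0F"     # base character + emoji-style selector
blue_heart = "\U0001F499"      # a single code point
snow_text = "\u2744\uFE0E"     # snowflake, text style requested
snow_emoji = "\u2744\uFE0F"    # snowflake, emoji style requested

def describe(s):
    """List every code point in s as 'U+XXXX NAME'."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s]

for s in (red_heart, blue_heart, snow_text, snow_emoji):
    print(describe(s))
```

Whether the two snowflake sequences actually look different depends on the platform and font, as noted above.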
I have a list of all code points that can display differently via a variation sequence, on my website, if you're interested. The next Unicode update in June is going to add some more.

How to center align, ignoring certain characters?

Look at this UILabel. It's center-aligned:
Now look at this UILabel. Although it is technically center-aligned, it really doesn't look that way:
The reason why it looks like this is because the center-alignment considers the degree symbol a third character, thus bumping the other two off to the left a bit. My question is: is there any way to ignore certain characters whilst center-aligning a label?
Interesting question. The only solution that comes to mind for me is to pad the text string with spaces on the front to cancel out the ignored characters on the back.
That is, to center @"60d" as if it were @"60", set the text to @" 60d". This works well with a fixed-width font, but otherwise is only a rough approximation.
If you like this idea and want to get fancy with it, then you can use NSString's method
- stringByPaddingToLength:withString:startingAtIndex:
perhaps in conjunction with - rangeOfCharacterFromSet: or some such method to determine how many spaces to pad with.
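The padding logic itself does not depend on UIKit, so here is a language-neutral sketch in Python rather than Objective-C. The function name `pad_to_ignore_trailing` and the default set of ignored characters are assumptions for illustration; the same counting could be done with the NSString methods mentioned above.

```python
def pad_to_ignore_trailing(text, ignored="°"):
    """Prefix one space per trailing ignored character.

    With a fixed-width font, the leading spaces balance the trailing
    characters, so the rest of the text ends up visually centered.
    """
    trailing = len(text) - len(text.rstrip(ignored))
    return " " * trailing + text

print(repr(pad_to_ignore_trailing("60°")))   # ' 60°'
print(repr(pad_to_ignore_trailing("60")))    # '60'
```

With a proportional font a space is usually not as wide as the degree sign, so this remains the rough approximation described above.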
You could of course measure the text string(s) and compute your own positioning, rather than using text alignment in a larger field.
Assuming you don't want to do that, another idea that comes to mind is to display the string “°60°” with the first character styled with a color of opacity 0 and no shadow.
I don't do iOS development so I don't know how practical these are.

font with graphic "blackspace" character

I'm looking for a font which contains a graphic character which is (essentially) the space character, inverted. I'm looking for a graphic character equivalent to the largest-possible solid-black box. The closest I have been able to find is Wingdings 2 character 162, but that doesn't fill the entire available character space. When I insert two consecutive Wingdings 2 162 characters, there is still appreciable whitespace between them when displayed or printed. Does anyone know of a black-box font/character which would fill all available character space?
All characters are going to have whitespace between them, or they would be unreadable. This is loosely called "kerning" (strictly, the default gap comes from each glyph's side bearings, and kerning is the per-pair adjustment on top of that). You can adjust the kerning and line-height in whatever program you are using to send the malicious fax, if you want to be sure to use the maximum amount of toner per page.
Have you considered creating your own font using a font-editing software package? You could edit the space character to be a solid black square. But as Chris McCall mentioned, you may still have space between characters of any size due to kerning applied by the layout engine that draws the fonts.
Your other option is to owner-draw the text yourself, programmatically replacing spaces with black boxes. You would have complete control over kerning and everything else.
I don't know if this is exactly what you were looking for, but...
I was looking for the same thing, since I wanted to create a "textbox" when I wanted to write text using the spritefont, but I never knew how long the total string was going to be, so I wanted something that I could "write" in the same location right before the string with a contrasting color which could be expected to be as long as the string it needed to encompass. That being the case, try:
Webdings - character 103.
I tried lining them up and there wasn't even any space in between. Perfect.