Unicode character best suited for being upside-down small L, with widest font compatibility? - unicode

I want a character that's nearly a vertical line and extends significantly below the baseline, and that's supported by (at least one of) the fonts installed by default on popular operating systems. I could use 'l' or '1' or '|', but it looks bad when all the other upside-down characters extend downwards, and this is the only one pointing up. I'd use @font-face (it's for a webpage), but it also shows up in tooltips and page titles so that's not good enough.
The ideal would of course be Latin Small Letter Turned L, but the font support is awful. Latin Small Letter Dotless J would also be acceptable, and the font support is not quite as bad, but I still don't see any of the default Windows fonts in the list.
What's my least-bad option?
(A fair suggestion would be, "Why do you need this? Maybe there are other solutions" -- and I'll figure out a plan B if I have to, but first I'd like to know if I have to. I also recognize that this is a niche question. If this is not the right place to ask it, where is?)

The best I was able to find is TURNED GREEK SMALL LETTER IOTA (U+2129) “℩”, which is present in Arial Unicode MS and Lucida Sans Unicode. The main problem with it is its small size—its height is just the x-height, roughly speaking.
Or maybe THAI CHARACTER LAKKHANGYAO (U+0E45) “ๅ”? Arial Unicode MS and MS Sans Serif.
(I suppose this is for an improved upside-down converter. Such converters can be fun, but they often suffer from the effects of mixing fonts.)
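To compare the candidates mentioned in this thread side by side, a minimal Python sketch can resolve each character name to its code point with the standard `unicodedata` module. It only lists the characters; whether a given font has a glyph for them still has to be checked separately. The `CANDIDATES` list and the `describe` helper are names made up for this illustration.

```python
import unicodedata

# Character names taken from the question and the answer above.
CANDIDATES = [
    "LATIN SMALL LETTER TURNED L",
    "LATIN SMALL LETTER DOTLESS J",
    "TURNED GREEK SMALL LETTER IOTA",
    "THAI CHARACTER LAKKHANGYAO",
]

def describe(name):
    """Return (code point label, character) for a Unicode character name."""
    char = unicodedata.lookup(name)
    return f"U+{ord(char):04X}", char

for name in CANDIDATES:
    label, char = describe(name)
    print(label, char, name)
```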

Related

Why does the red heart emoji require two code points, but the other colored hearts require one?

It appears that the red heart emoji (❤️) "\u2764\uFE0F" requires two Unicode codepoints, specifically Heavy Black Heart followed by a Variation Selector. However, blue 💙, green 💚, yellow 💛, and purple 💜 each have their own single codepoint.
Why is red so different?
For historical reasons. Originally, there was only U+2764 HEAVY BLACK HEART which the first applications that supported Emojis decided to render as a red heart. These early applications always rendered U+2764 as Emoji. Later it was realized that this was a bad idea and the variation selectors for Emojis were standardized. When additional heart emojis were added, there was no need for another red heart, so it was omitted. Instead there's a separate black heart emoji U+1F5A4 🖤.
In theory, an application could require that the Emoji variation selector is appended to other heart code points as well. But it doesn't make much sense to render characters like PURPLE HEART as a non-Emoji. It does make a difference for HEAVY BLACK HEART, though, which is often intended to be rendered as the original, plain heavy black heart character.
HEAVY BLACK HEART was added to Unicode decades before emoji. When emoji were incorporated in Unicode 6 some already existing characters were simply reused as emoji to avoid unnecessary duplicates. Later, variation sequences were defined for characters that also map to a non-emoji character set to allow for better control over how they display. For example, U+2744 ❄ SNOWFLAKE is originally from Zapf Dingbats (I believe) but was later also made an emoji. So if you want to force the original text-style display you can use VARIATION SELECTOR-15 (resulting in ❄︎), and if you want to force the newer emoji-style display you can use VARIATION SELECTOR-16 (resulting in ❄️).
Note, however, that not many platforms actually support those variation sequences correctly at the moment. Also not all of them automatically apply the variation selectors when using the emoji keyboard. In theory ❤ and ❄ (and many other emoji) should display as text style by default without VS16, but many applications ignore that as well.
I have a list of all code points that can display differently via a variation sequence, on my website, if you're interested. The next Unicode update in June is going to add some more.
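The code point counts discussed in these answers can be checked with a short Python sketch. It builds the red heart and the two snowflake presentations from their parts and prints the code points in each string. The variable names and the `code_points` helper are illustrative only, and how each string actually renders depends on the platform.

```python
import unicodedata

VS15 = "\uFE0E"  # VARIATION SELECTOR-15: requests text-style presentation
VS16 = "\uFE0F"  # VARIATION SELECTOR-16: requests emoji-style presentation

red_heart = "\u2764" + VS16        # HEAVY BLACK HEART + VS16
blue_heart = "\U0001F499"          # BLUE HEART, a single code point
snowflake_text = "\u2744" + VS15   # SNOWFLAKE forced to text style
snowflake_emoji = "\u2744" + VS16  # SNOWFLAKE forced to emoji style

def code_points(s):
    """List each code point in s as 'U+XXXX NAME'."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s]

for s in (red_heart, blue_heart, snowflake_text, snowflake_emoji):
    print(len(s), code_points(s))
```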

Codepoint of the 'missing glyph'-box

When a textbox, browser or other program can't display a character, or the character is not valid Unicode, a white-box character is drawn instead to represent the missing glyph.
I assume that this box-glyph is a Unicode character itself, thus I am looking for its codepoint so I can use it. Does anyone know which codepoint is used, or perhaps if my assumption is wrong and it is not necessarily a member of the font?
At first I thought it might be the White Square (U+25A1), but, after I compared this glyph with an example, I found white square was smaller. There is a larger variant of it (medium and large), but these do not appear in the font under consideration, so these can not be the ones I am seeing.
I managed to find my answer here on Stack Overflow: https://stackoverflow.com/a/22636426/2718186
In particular, the part that talks about the .notdef glyph. It seems that fonts reserve a special glyph, which no Unicode code point maps to, to indicate that a character has no glyph in the current font.

Display fullwidth chars w/ exactly twice the length as halfwidth chars?

I am not certain whether this is the right place to ask this, but I do not know of any other sites that would fit better. And the question has something to do with programming, so:
I am writing a formatted text guide. Please take a look at this excerpt: http://mad-gaksha.homelinux.net/public/width.txt. I need full-width characters displayed so that they occupy exactly twice the space of half-width characters. While monospaced fonts seem to work fine with only half-width characters, most full-width "fixed-width" fonts I've tried didn't produce the desired result.
In Firefox, this works when I set the monospace font (Edit > Preferences > Content > Advanced) to "monospace", but only at a font size of 14. It is the same with gedit: the fixed-width font MS Gothic works only at font sizes 13/14.
As I find this behaviour quite strange and wouldn't want my readers to be troubled by technical details, does anyone have suggestions or resources, or can anyone explain what's going on here? Why does it seem so hard just to display each glyph with a fixed size?
Thanks in advance for taking your time.
It looks like it's to do with rounding fractional pixels.
A font renderer may adjust horizontal positioning when the width of a glyph isn't a whole number of screen pixels. I believe the Cairo rendering used by gedit and Firefox on Linux doesn't do sub-pixel positioning for fonts so this may be necessary here.
In a true monospace font this doesn't matter because every glyph has the same width so receives the same treatment, but where there is a mixture of full- and half-width characters, the rounding won't be uniform unless the glyphs happen to be a whole number of pixels wide (which happens in your case at font size 14).
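The rounding effect can be illustrated with a little arithmetic in Python. This is only a sketch under assumed metrics: it supposes a half-width advance of 0.5 em, a full-width advance of 1 em, and a renderer that rounds each advance to the nearest whole pixel. Real fonts and renderers use different values, so it will not reproduce the exact sizes reported in the question; the names `HALF_WIDTH_EM`, `FULL_WIDTH_EM` and `widths_line_up` are hypothetical.

```python
import math

# Assumed advance widths in em units (hypothetical, not read from any real font).
HALF_WIDTH_EM = 0.5
FULL_WIDTH_EM = 1.0

def round_half_up(x):
    """Round to the nearest integer, with .5 going up (unlike Python's round())."""
    return math.floor(x + 0.5)

def widths_line_up(font_size_px):
    """True if two rounded half-width advances equal one rounded full-width advance."""
    half = round_half_up(HALF_WIDTH_EM * font_size_px)
    full = round_half_up(FULL_WIDTH_EM * font_size_px)
    return 2 * half == full

for size in range(11, 17):
    print(size, widths_line_up(size))
```

Under these assumptions the columns stay aligned only at sizes where the half-width advance is already a whole number of pixels, which is the same kind of size dependence described above.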
Note that on Windows for most small sizes, fonts like MS Gothic will be rendered using custom built-in bitmaps in the font file, instead of rendering the outlines and their metrics. This makes all glyphs necessarily a fixed number of pixels wide. However this does result in the typical old-school ‘jaggy’ rendering style.
If you are producing formatted-text files there is really nothing you can do about this. You can only hope that your target audience has Japanese monospaced fonts that are suitable and can switch to them at a particular font size.
I would agree with Clement's comment that using HTML to get the rendering you want would be more robust, modern and convenient. Using HTML for layout relieves you of having to worry about lining up characters, and allows you to get fonts that are less ugly than all that half-width-monospaced Latin.

iOS Japanese handwriting input code help please

I have a series of questions about writing code for iOS that includes handwriting recognition of Japanese. I am a beginner, so be gentle and assume I am stupid ...
I'd like to present a Japanese word in hiragana (the Japanese phonetic alphabet), then have the user handwrite the appropriate kanji (Chinese character). This is then compared internally to the correct character, and the user gets feedback on whether they were correct.
My questions here revolve around the handwritten input.
I know that this type of input is normally possible with the Chinese keyboard.
How can I implement something similar without using the keyboard itself? Are there already library functions for this (I feel there must be, since that input is available on the Chinese keyboard)?
Also, kanji aren't exactly the same as Chinese characters. There are unique characters that Japanese people invented themselves. How would I be able to include these in my handwriting recognition?
We worked on a similar exercise back at University.
The order of the strokes is well defined for kanji, and there are only 8 (?) different strokes, so basically each kanji is a well-ordered sequence of strokes. For example, te (hand) is the sequence "the short falling backward stroke", then twice the "left to right stroke", and finally "the long downward stroke with the little tip at the bottom". There are databases that give you this information.
Now the problem is almost reduced to identifying the correct stroke. You will still run into some ambiguities where you have to take into account the spatial relation of some strokes to others.
EDIT: For stroke recognition we snapped the freehand writing to multiples of 45°, thus converting each stroke into a sequence of vectors along one of eight directions. Let's assume that direction 0 is from bottom to top, direction 1 from bottom right to top left, direction 2 from right to left, and so on counterclockwise.
Then the first stroke of te (手) would be [23]+ (as some write it falling and some horizontal)
The second and third stroke would be 6+
and the last would be 4+[123] (as with the little tip, every writer uses a different direction)
This coarse snapping was actually enough for us to recognize kanji. Maybe there are more sophisticated ways, but this simple solution managed to recognize about 90% of kanji. The only handwriting it couldn't handle was that of one professor, but the problem there was that no human except himself could read his handwriting either.
EDIT2: It is important that your user "prints" the kanji and doesn't write in calligraphy, since in calligraphy many strokes are merged into one. For example, when writing a kanji with the radical of "rice field" in calligraphy, this radical morphs into something completely different. Radicals with a lot of horizontal dashes (like the radical of "speech", iu) just become one long wriggly line.
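The snapping and the per-stroke patterns for te described in this answer can be sketched in Python as follows. This is a minimal illustration, not the original university code: it assumes strokes arrive as lists of (x, y) points with y increasing upwards (flip the sign of dy for screen coordinates), and the names `snap_direction`, `encode_stroke`, `TE_PATTERNS` and `matches` are made up for the example.

```python
import math
import re

def snap_direction(dx, dy):
    """Snap a vector to one of 8 directions: 0 = up, 1 = up-left,
    2 = left, ... counterclockwise in 45 degree steps (y points up)."""
    angle = math.degrees(math.atan2(dy, dx))  # 0 = right, 90 = up
    return int(math.floor((angle - 90.0) / 45.0 + 0.5)) % 8

def encode_stroke(points):
    """Turn a stroke (list of points) into a string of direction digits."""
    digits = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if (x0, y0) != (x1, y1):  # skip zero-length segments
            digits.append(str(snap_direction(x1 - x0, y1 - y0)))
    return "".join(digits)

# One regular expression per stroke of te, as listed in the answer.
TE_PATTERNS = ["[23]+", "6+", "6+", "4+[123]"]

def matches(strokes, patterns):
    """True if every stroke, in order, matches its direction pattern."""
    if len(strokes) != len(patterns):
        return False
    return all(re.fullmatch(p, encode_stroke(s)) is not None
               for s, p in zip(strokes, patterns))

# A hand-made example input, not real handwriting data.
te_strokes = [
    [(6, 10), (4, 9.5), (2, 9)],          # short stroke, right to left
    [(1, 7), (7, 7)],                     # left to right
    [(0, 5), (8, 5)],                     # left to right
    [(4, 10), (4, 2), (4, 0), (3, 1)],    # downward, with a tip at the bottom
]
print(matches(te_strokes, TE_PATTERNS))
```

A sketch like this ignores the spatial relations between strokes, so it would still need the disambiguation step mentioned above.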

Unicode character that lines up with ⎮ but is as long as ⎢

Sorry if this isn't the right overflow for this question. I need a Unicode character that is as long as ⎢ (U+23A2, LEFT SQUARE BRACKET EXTENSION) but lines up horizontally with ⎮ (U+23AE, INTEGRAL EXTENSION). Is there such a character?
Take a look at shapecatcher. If you draw a straight line, it shows plenty of different codepoints resembling |.
As already pointed out, exact placement and size may depend on the font, but if you know that the font is going to be a specific one (because you supply it), you could still find the character you're looking for.
It turns out this does depend on the font. If I use DejaVu Sans Mono, INTEGRAL EXTENSION is as long as I want it to be. This font appears to be almost exactly the same as the font I was using, Menlo, except for some small differences with some characters (including this one).