Justification in Unicode

I'm almost positive that Unicode has a recommendation for whether and when to add spacing around interpunctuation characters (like the raised dot, ·), but I can't seem to find it again. Can someone with better Unicode-fu help me?

Related

What is a realistic maximum number of unicode combining characters?

I'm looking for a maximum number of unicode combining characters that appear after a non-combining one in a realistic natural text.
I know that in unicode text there can be an arbitrary number of combinings placed anywhere in the text. However, I am writing a specialized application that has to operate under constrained resources and because of that and other technical reasons displaying an arbitrary number of combining chars after a non-combining one is not an option. However I would still like to display natural languages properly if possible and support for a small number of combinings should not be a problem.
My intuition is that natural languages don't need more than two or three combining characters after a base character, but I'm not sure and can't find any source for that number.

OK, for lack of a better answer, here's what I did (for future reference if needed):
I ended up using a SmallVec-like structure with a threshold of 8 bytes before allocation and an upper limit of about 50 bytes (text stored in UTF-8). That should make everyone happy, I think, and performance doesn't suffer.
Take those numbers with a pinch of salt; they are arbitrary and I might tune them anyway.
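If you want to measure this on your own sample text instead of guessing, here is a minimal Python sketch. It assumes "combining character" means any code point whose general category starts with M (Mn, Mc, Me); the function name `max_combining_run` is made up for illustration and the result only tells you about the text you feed it.

```python
import unicodedata

def max_combining_run(text: str) -> int:
    """Return the length of the longest run of consecutive combining marks."""
    longest = current = 0
    for ch in text:
        # Assumption: "combining" = general category Mn, Mc or Me.
        if unicodedata.category(ch).startswith("M"):
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest
```

For example, the Vietnamese letter ệ (U+1EC7) in NFD is e followed by two combining marks, so `max_combining_run(unicodedata.normalize("NFD", "\u1ec7"))` gives 2. Running this over a representative corpus in decomposed form would give a rough idea of what limit is safe.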

Are there Unicode symbols for even and odd numbers?

Using an option, I can specify whether even or odd days are valid. For the user interface I'm now looking for suitable symbols to display these options. As far as I know, mathematics has no symbol for even and odd numbers.
Does anyone know whether there is perhaps something corresponding in Unicode?

The answer is no.
Given that odd and even numbers are a mathematical concept and mathematics has no symbol for them, except perhaps 2N and 2N+1, you'll find it hard to find a non-existent symbol in Unicode.
You'd have to think of your own characters, or find some in Unicode and just redefine their meaning.

Subscripted 'y' in unicode

I have to display $CₓH\subscript{y}$.
Is there any chance to display a subscripted 'y' in Unicode?
\u2093 represents the subscripted 'x'

Usually you do this with formatting. Unicode's selection of superscript and subscript characters doesn't stem from the need or desire to cover whole alphabets but rather to enable specific use cases, e.g. writing IPA. Furthermore, if you're using a good OpenType font it can also support proper subscripts for arbitrary characters at the font level (where a glyph isn't simply scaled down by the layout engine, but rather a specifically-designed subscript glyph from the font is used).
In fact, since you're already using TeX or something vaguely similar to it, just let one of the many implementations render it. There are lots of things you simply cannot do in plain text without formatting, and this is one of them.

The subscript and superscript characters in Unicode do not cover the whole alphabet.
See the Wiki article on this topic or this answer on SO.

In Sublime Text this subscripted y works: ᵧ. Copied from here: https://lingojam.com/SubscriptGenerator
EDIT: This is actually the Greek subscript letter gamma (U+1D67), not a y.
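A quick way to check what a copied character really is, and whether a Latin subscript exists for a given letter, is to ask Python's `unicodedata` module. This is only a sketch: the helper name `find_subscript` is made up, and it relies on the Unicode naming pattern "LATIN SUBSCRIPT SMALL LETTER X", so the answer depends on the Unicode version shipped with your Python.

```python
import unicodedata

def find_subscript(letter: str):
    """Return the Latin subscript form of a lowercase letter, or None if there is none."""
    try:
        # Relies on the character name pattern used by the Unicode database.
        return unicodedata.lookup(f"LATIN SUBSCRIPT SMALL LETTER {letter.upper()}")
    except KeyError:
        return None

if __name__ == "__main__":
    print(unicodedata.name("\u2093"))  # the subscript x from the question
    print(unicodedata.name("\u1d67"))  # the character copied from the generator
    print(find_subscript("y"))
```

The second `print` shows the name GREEK SUBSCRIPT SMALL LETTER GAMMA, which confirms the edit above. As far as I know `find_subscript("y")` comes back as `None`, which matches the other answers.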

ways to hide secret in png image (steganography)

I'm trying to find a secret message, a string, in a 256x256 png image. It's supposed to have "used an old school trick to hide the data", and apparently that method is mentioned in the steganography Wikipedia article.
I first tried what seemed the most old-school and straightforward approach: LSB steganography. But no luck. I know the first and last characters of the string ("F" and "}"), and I thought they might have varied the common LSB method a bit, so I inspected the very first and very last pixels of the picture myself. However, no apparent combination (like only the red values of each pixel) yielded the correct character. Hence I'm fairly sure it's not using LSB.
In a second, rather desperate try I saw that Wikipedia talks about stripping the most significant six bits, leaving only the least significant two, and then normalizing the picture. I wrote a little script to do this, but no luck here either.
I also looked at the metadata with identify -verbose image.png. Nothing. The file ends as it should after the IEND chunk, so nothing hidden beyond that either.
I'm running out of ideas, so here is my question:
Any hints what might classify as old school trick, that I haven't already tried? I'm sure I missed something obvious. This exercise came with a few others, and they all looked harder at first glance than they really were.
Thanks a lot. :)

It turned out that there was a chunk in the middle of the picture with a long text, which contained the wanted string, hidden in the least bits of the blue values only, in least bit first order. Somehow I missed that combination in my preliminary tests. So there you go. :)
To anybody having a similar problem: I find it's best to write a script to test all more commonsense variations (like only single colors, vertical, least-bit or greatest-bit first, etc.) in one large run. It's too easy to miss a simple one otherwise and get hopelessly stuck in crazy complicated theories.
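Here is roughly what such a brute-force script could look like in Python. It is a sketch, not the script I actually used: it assumes the image is already decoded into a list of (r, g, b) tuples in row order, and the names `extract_lsb` and `try_all` as well as the list of channel combinations are just illustrative.

```python
from itertools import product

def extract_lsb(pixels, channels, lsb_first):
    """Collect the lowest bit of the chosen channels and pack the bits into bytes."""
    bits = [px[c] & 1 for px in pixels for c in channels]
    out = bytearray()
    for i in range(0, len(bits) - 7, 8):
        chunk = bits[i:i + 8]
        if lsb_first:
            # The first collected bit is the least significant bit of the byte.
            chunk = chunk[::-1]
        byte = 0
        for b in chunk:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)

def try_all(pixels, needle):
    """Yield (channels, lsb_first, data) for every variation whose output contains needle."""
    # Assumed set of "commonsense" variations; extend as needed.
    channel_sets = [(0,), (1,), (2,), (0, 1, 2), (2, 1, 0)]
    for channels, lsb_first in product(channel_sets, (False, True)):
        data = extract_lsb(pixels, channels, lsb_first)
        if needle in data:
            yield channels, lsb_first, data
```

Searching for a known fragment of the message with `try_all(pixels, b"F")` narrows things down quickly. Note that this only finds messages aligned to a multiple of 8 bits from the first pixel; column-wise traversal and bit offsets would need extra loops.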

Is there a Unicode character for plus over minus? (+/-)

Occasionally I've seen the symbol "plus or minus" written in fractional form, like this:
Is there a Unicode character for this?
Note: I already know about the standard "plus-minus sign" symbol, but it won't work in this context. I'm specifically looking for a version with the fraction bar.

You can approximate it to some extent with a superscript plus (U+207A), a division slash (U+2215) and a subscript minus (U+208B):
⁺∕₋
However, it requires font support to get right. Especially the super- and subscript +/− are not available in most fonts, so it might just render horribly.
For reference, that's how it looks for me (better than five years ago, but still somewhat broken):
However, using Cambria Math in Word 2010 it looks like this:
Which is probably exactly how it should look (it follows the same typesetting rules as fractions).

This is the only one I have seen in Unicode (plus over minus):
±
HTML/XML character reference:
&#177;
HTML named entity:
&plusmn;
This symbol is used to indicate the precision of an approximation.

You mean like ± (U+00B1 / "\u00b1")?
Edit: speaking specifically to a design which uses a solidus, the best I could find was ⁺⁄₋ which is U+207a (superscript plus sign) U+2044 (fraction slash) U+208b (subscript minus). The fraction slash has negative kerning in some fonts, which causes the appearance of composition. See this JSFiddle for an example of how this works with a larger font size.
<div style="font-size:20em;">⁺⁄₋</div>

+⁄−
<sup>+</sup>⁄<sub>−</sub>

In UTF-8: 0xC2 0xB1
For other encodings see:
http://www.fileformat.info/info/unicode/char/b1/index.htm
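If you'd rather check the bytes yourself than look them up, a couple of lines of Python will do. This is just a small sketch; the helper name `utf8_hex` is made up.

```python
def utf8_hex(text: str) -> str:
    """Return the UTF-8 encoding of text as space-separated hex bytes."""
    return text.encode("utf-8").hex(" ")

if __name__ == "__main__":
    print(utf8_hex("\u00b1"))              # plus-minus sign
    print(utf8_hex("\u207a\u2044\u208b"))  # superscript plus, fraction slash, subscript minus
```

The first call gives `c2 b1`, matching the bytes above. The three-character sequence from the other answers comes out as three bytes per character, since all of them lie above U+07FF.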
