I learned today that while common fractions have dedicated Unicode code points, less common fractions like ³/₁₆ have to be built from superscript and subscript characters separated by a slash. This is confirmed here and here.
This works for ¹¹/₁₆ and ¹³/₁₆, but it gets messed up with ¹⁵/₁₆. Do you see how the 5 rises higher than the one? I imagine this is because in order to show the number 5 clearly as a superscript, it requires more height than 1 and 3.
Well, that creates a problem. How do you display the fraction 15/16 nicely as Unicode characters? Unfortunately I can't use the sup and sub tags. I'm not displaying it in an HTML page. Rather, we're passing a string to a Java application that will then render these values. I know it renders Unicode values fine, but it wouldn't recognize HTML tags. Is there a Unicode solution?
The “proper” way of composing arbitrary vulgar fractions in Unicode is to not use the subscript and superscript digits at all, but to utilise the special properties of the character U+2044 FRACTION SLASH. You would simply type the regular ASCII digits and separate them with the slash like so: 15⁄16. The rendering engine will then automatically select the correct forms of the numbers, producing a clean, uniform look.
I put the word ‘proper’ in quotation marks because this method is not guaranteed to be supported on all systems, and some that do support it do so incorrectly or incompletely. If you absolutely need to make sure that 100% of recipients regardless of system will definitely see something that looks more or less right, I would therefore still (begrudgingly) recommend using the preformatted subscripts and superscripts as a substitute. As the other answer explained, the problem you are having is a font issue and cannot be solved if you do not have control over font settings.
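As a rough illustration of the two approaches described above, here is a minimal Java sketch. The class and method names (FractionText, withFractionSlash, withPreformattedDigits) are made up for this example, and it assumes non-negative integers. Whether the first form is actually drawn as a fraction still depends on the font and rendering engine on the receiving side.

```java
class FractionText {
    private static final char FRACTION_SLASH = '\u2044';

    // Superscript digits 0-9. Note that 1, 2 and 3 live in Latin-1 Supplement
    // (U+00B9, U+00B2, U+00B3); the rest are in Superscripts and Subscripts.
    private static final String SUPERSCRIPTS =
            "\u2070\u00B9\u00B2\u00B3\u2074\u2075\u2076\u2077\u2078\u2079";

    // Plain ASCII digits around U+2044; the renderer is expected to do the rest.
    static String withFractionSlash(int numerator, int denominator) {
        return Integer.toString(numerator) + FRACTION_SLASH + denominator;
    }

    // Fallback: preformatted superscript and subscript digits around U+2044.
    static String withPreformattedDigits(int numerator, int denominator) {
        StringBuilder sb = new StringBuilder();
        for (char c : Integer.toString(numerator).toCharArray()) {
            sb.append(SUPERSCRIPTS.charAt(c - '0'));
        }
        sb.append(FRACTION_SLASH);
        for (char c : Integer.toString(denominator).toCharArray()) {
            // Subscript digits are contiguous: U+2080 through U+2089.
            sb.append((char) ('\u2080' + (c - '0')));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withFractionSlash(15, 16));
        System.out.println(withPreformattedDigits(15, 16));
    }
}
```

The second form always mixes code points from two Unicode blocks for numerators such as 15, which is why its appearance depends so heavily on the font.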
This is indeed a font issue. However, the problem arises from the fact that, in Unicode, ¹, ², and ³ belong to the Latin-1 Supplement block, while the other superscript digits belong to the Superscripts and Subscripts block, so the two groups can end up being drawn from different fonts when font substitution occurs.
Please see Why the display of Unicode characters for superscripted digits are not at the same height? for extra details; it is tagged as iOS, but I have the same problem on macOS too.
I found this site, Unicode Fraction Creator: https://lights0123.com/fractions/
Here's an example: ³⁄₂
Which is:
U+00B3 superscript three
U+2044 fraction slash
U+2082 subscript two
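To check which code points a pasted fraction actually contains, something like the following Java sketch can help. The class name CodePointDump and the method describe are invented for this example; Character.getName is a standard JDK method, and the names it returns come from the Unicode data bundled with the running JDK.

```java
class CodePointDump {
    // Returns one line per code point: "U+XXXX NAME".
    static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp ->
                sb.append(String.format("U+%04X %s%n", cp, Character.getName(cp))));
        return sb.toString();
    }

    public static void main(String[] args) {
        // The example fraction from above: superscript three, fraction slash, subscript two.
        System.out.print(describe("\u00B3\u2044\u2082"));
    }
}
```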
For a general answer on displaying fractions nicely, copy, paste, and change.
Character  Name            Decimal value  Code point
⁄          Fraction Slash  8260           U+2044
0          digit 0         48             U+0030
1          digit 1         49             U+0031
2          digit 2         50             U+0032
3          digit 3         51             U+0033
4          digit 4         52             U+0034
5          digit 5         53             U+0035
6          digit 6         54             U+0036
7          digit 7         55             U+0037
8          digit 8         56             U+0038
9          digit 9         57             U+0039
example: 1/0 = 1⁄0
While dealing with unicode encoded characters in Java, I used Normalizer to normalize it and convert it to a String. Below is the code I used:
import java.text.Normalizer;

String input = "¼";
input = Normalizer.normalize(input, Normalizer.Form.NFKD);
// output: 1⁄4
The forward slash that the method used was "⁄" whose unicode encoding is \u2044 as opposed to the regular forward slash that I am able to type using my keyboard which is "/" encoded as \u002f.
What is the difference between these and when should one be used over another?
Thanks in advance.
Rishit
Unicode these days contains heaps of variations of the common non-letter characters, and slashes are no exception. (That's not even all of them - search for "solidus" to get some more.) You've got fraction slashes (your one), full-width slashes, division slashes (yup, that's separate from the fraction one), thick slashes, extra-thick slashes - the list goes on.
The good news is you get to decide what slash is appropriate for your context.
If you're wanting to normalise just because you don't want fractions to appear squashed into a single character, or you want all fractions to display identically (unicode obviously can't have a character for every possible fraction) then using this fraction slash is probably what you want to go with.
On the other hand, if you want to normalise because you want to reduce the set of available characters to those that can be easily typed on a standard keyboard, it's likely the standard forward slash you should go with.
As Michael Berry mentioned, \u2044 is the fraction slash character.
It isn’t just a slash that looks a little different; it has specific rendering behavior. From the Unicode specification, section 6.2, “Other Punctuation”:
Fraction Slash. U+2044 FRACTION SLASH is used between digits to form numeric fractions, such as 2/3 and 3/9. The standard form of a fraction built using the fraction slash is defined as follows: any sequence of one or more decimal digits (General Category = Nd), followed by the fraction slash, followed by any sequence of one or more decimal digits. Such a fraction should be displayed as a unit, such as ³⁄₄ or . The precise choice of display can depend on additional formatting information.
If the displaying software is incapable of mapping the fraction to a unit, then it can also be displayed as a simple linear sequence as a fallback (for example, 3/4). If the fraction is to be separated from a previous number, then a space can be used, choosing the appropriate width (normal, thin, zero width, and so on). For example, 1 + THIN SPACE + 3 + FRACTION SLASH + 4 is displayed as 1 ³⁄₄.
Personally, I prefer the use of the fraction slash, as it makes fractions look better, like they’re professionally typeset. But there are some contexts where an ASCII slash is better, such as monospaced text, or wanting all-ASCII output, or as Michael mentioned, limiting text to characters which can be typed on a keyboard.
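To make the two options concrete, here is a small Java sketch. The class SlashNormalization and its method names are invented for this example; Normalizer is the standard java.text class used in the question. It shows the NFKD decomposition that yields the fraction slash, and an optional extra step for the case where all-ASCII output is wanted.

```java
import java.text.Normalizer;

class SlashNormalization {
    // NFKD turns U+00BC (one quarter) into '1', U+2044 FRACTION SLASH, '4'.
    static String decompose(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKD);
    }

    // Optional extra step: swap the fraction slash for the ASCII solidus.
    static String toAsciiSlash(String text) {
        return decompose(text).replace('\u2044', '/');
    }

    public static void main(String[] args) {
        System.out.println(decompose("\u00BC"));    // digits around U+2044
        System.out.println(toAsciiSlash("\u00BC")); // digits around U+002F
    }
}
```

Keeping the replacement as a separate step means the choice between the two slashes stays explicit at the call site.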
Do the following character sets contain digits (for example 1 2 3)?
ASCII (American Standard Code for Information Interchange)
Unicode
Any other character sets
Assuming you're asking which character sets contain digits, the answers are yes, yes and yes.
ASCII has the digits from 0 thru 9 at code points 0x30 thru 0x39.
So do ISO 10646 and Unicode, which share the same character encodings in the low range at least. They are also at code points 0x30/U+0030 thru 0x39/U+0039 (since the lower 128 characters are equivalent to ASCII).
As for the final paragraph asking if there are other character sets that contain digits, I can think of at least one off the top of my head, EBCDIC. There, the digits are at code points 0xf0 thru 0xf9.
No doubt there are other encoding schemes that include digits as well, given their usefulness.
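The ASCII/Unicode overlap is easy to check from code. The following Java sketch (the class name DigitCodePoints is made up for this example) prints the code point of each digit and confirms that the digits encode to the same bytes in US-ASCII and UTF-8. EBCDIC is left out because the relevant charsets are not guaranteed to be present in every Java runtime.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class DigitCodePoints {
    static final String DIGITS = "0123456789";

    // True if the ten digits are byte-for-byte identical in US-ASCII and UTF-8.
    static boolean sameBytesInAsciiAndUtf8() {
        byte[] ascii = DIGITS.getBytes(StandardCharsets.US_ASCII);
        byte[] utf8 = DIGITS.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(ascii, utf8);
    }

    public static void main(String[] args) {
        for (char c : DIGITS.toCharArray()) {
            // A Java char holds the UTF-16 code unit, which for these
            // characters equals the Unicode code point and the ASCII code.
            System.out.printf("%c U+%04X (decimal %d)%n", c, (int) c, (int) c);
        }
        System.out.println("same bytes: " + sameBytesInAsciiAndUtf8());
    }
}
```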
I'm reading the popular Unicode article from Joel Spolsky and there's one illustration that I don't understand.
What does "Hex Min, Hex Max" mean? What do those values represent? Min and max of what?
Binary can only have 1 or 0. Why do I see tons of letter "v" here?
http://www.joelonsoftware.com/articles/Unicode.html
The Hex Min/Max define the range of unicode characters (typically represented by their unicode number in HEX).
The v is referring to the bits of the original number
So the first line is saying:
The unicode characters in the range 0 (hex 00) to 127 (hex 7F) (a 7
bit number) are represented by a 1 byte bit string starting with '0'
followed by all 7 bits of the unicode number.
The second line is saying:
The Unicode numbers in the range 128 (hex 0080) to 2047 (hex 07FF) (an 11
bit number) are represented by a 2 byte bit string where the first
byte starts with '110' followed by the first 5 of the 11 bits, and the
second byte starts with '10' followed by the remaining 6 of the 11 bits
etc
Hope that makes sense
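The bit layout described above can be written out by hand. This is only a sketch for illustration: the class Utf8Sketch and its encode method are invented here, it covers the 1 to 4 byte forms only, and it does no validation (surrogates and values above U+10FFFF are not rejected). In real code, String.getBytes(StandardCharsets.UTF_8) does this for you.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8Sketch {
    // Encodes one code point following the table: the leading byte carries a
    // length prefix, every continuation byte starts with the bits 10.
    static byte[] encode(int cp) {
        if (cp < 0x80) {                       // 0vvvvvvv
            return new byte[] { (byte) cp };
        } else if (cp < 0x800) {               // 110vvvvv 10vvvvvv
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F)) };
        } else if (cp < 0x10000) {             // 1110vvvv 10vvvvvv 10vvvvvv
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F)) };
        } else {                               // 11110vvv 10vvvvvv 10vvvvvv 10vvvvvv
            return new byte[] {
                (byte) (0xF0 | (cp >> 18)),
                (byte) (0x80 | ((cp >> 12) & 0x3F)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F)) };
        }
    }

    public static void main(String[] args) {
        int[] samples = { 0x41, 0xB1, 0x2044, 0x1F600 };
        for (int cp : samples) {
            byte[] expected = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            System.out.printf("U+%04X matches JDK encoder: %b%n",
                    cp, Arrays.equals(encode(cp), expected));
        }
    }
}
```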
Note that the table in Joel's article covers code points that do not, and never will, exist in Unicode. In fact, UTF-8 never needs more than 4 bytes, though the scheme underlying UTF-8 could be extended much further, as shown.
A more nuanced version of the table is available in How does a file with Chinese characters know how many bytes to use per character? It points out some of the gaps. For example, the bytes 0xC0, 0xC1, and 0xF5..0xFF can never appear in valid UTF-8. You can also see information about invalid UTF-8 at Really good bad UTF-8 example test data.
In the table you showed, the Hex Min and Hex Max values are the minimum and maximum U+wxyz values that can be represented using the number of bytes in the 'byte sequence in binary' column. Note that the maximum code point in Unicode is U+10FFFF (and that is defined/reserved as a non-character). This is the maximum value that can be represented using the surrogate encoding scheme in UTF-16 using just 4 bytes (two UTF-16 code units, that is, one surrogate pair).
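These limits can be confirmed from Java, whose strings are UTF-16 internally. The class MaxCodePoint and its helper methods are made up for this sketch; Character.MAX_CODE_POINT and Character.toChars are standard JDK members.

```java
import java.nio.charset.StandardCharsets;

class MaxCodePoint {
    // Number of UTF-16 code units (Java chars) needed for a code point.
    static int utf16Units(int cp) {
        return Character.toChars(cp).length;
    }

    // Number of bytes the same code point takes in UTF-8.
    static int utf8Bytes(int cp) {
        return new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        int max = Character.MAX_CODE_POINT;
        System.out.printf("max code point: U+%X%n", max);
        System.out.println("UTF-16 code units: " + utf16Units(max));
        System.out.println("UTF-8 bytes: " + utf8Bytes(max));
    }
}
```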
Occasionally I've seen the symbol "plus or minus" written in fractional form, like this:
Is there a Unicode character for this?
Note: I already know about the standard "plus-minus sign" symbol, but it won't work in this context. I'm specifically looking for a version with the fraction bar.
You can approximate it to some extent with a superscript plus (U+207A), a division slash (U+2215) and a subscript minus (U+208B):
⁺∕₋
However, it requires font support to get right. Especially the super- and subscript +/− are not available in most fonts, so it might just render horribly.
For reference, that's how it looks for me (better than five years ago, but still somewhat broken):
However, using Cambria Math in Word 2010 it looks like this:
That is probably exactly how it should look (it follows the same typesetting rules as fractions).
This is the only one I have seen in Unicode (plus over minus):
±
HTML/XML character reference:
&#177;
HTML named entity:
&plusmn;
This symbol is used to indicate the precision of an approximation.
You mean like ± (U+00B1 / "\x00b1")?
Edit: speaking specifically to a design which uses a solidus, the best I could find was ⁺⁄₋ which is U+207a (superscript plus sign) U+2044 (fraction slash) U+208b (subscript minus). The fraction slash has negative kerning in some fonts, which causes the appearance of composition. See this JSFiddle for an example of how this works with a larger font size.
<div style="font-size:20em;">⁺⁄₋</div>
+⁄−
<sup>+</sup>⁄<sub>−</sub>
In UTF-8: 0xC2 0xB1
For other encodings see:
http://www.fileformat.info/info/unicode/char/b1/index.htm
I know Unicode contains all characters from most of the world's alphabets, but what about digits? Are they part of Unicode or not? I was not able to find a straight answer.
Thanks
As already stated, Indo-Arabic numerals (0,1,..,9) are included in Unicode, inherited from ASCII. If you're talking about representation of numbers in other languages, the answer is still yes, they are also part of Unicode.
//numbers (0-9) in Malayalam (language spoken in Kerala, India)
൦ ൧ ൨ ൩ ൪ ൫ ൬ ൭ ൮ ൯
//numbers (0-9) in Hindi (Devanagari script; an official language of India)
० १ २ ३ ४ ५ ६ ७ ८ ९
You can use \p{N} or \p{Number} in a regular expression to match any kind of numeric character in any script.
This document (Page-3) describes the Unicode code points for Malayalam digits.
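Here is how the regular expression approach might look in Java. The class NumericMatch and its methods are invented for this sketch. It uses the short category name \p{N}, which java.util.regex accepts, and Character.digit to turn digits from any script into their numeric value.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumericMatch {
    // \p{N} matches any character in general category N (Nd, Nl or No).
    private static final Pattern ANY_NUMBER = Pattern.compile("\\p{N}");

    static int countNumeric(String text) {
        Matcher m = ANY_NUMBER.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Parses a run of decimal digits from any script, e.g. Devanagari or Malayalam.
    static int parseDecimal(String digits) {
        int value = 0;
        for (int i = 0; i < digits.length(); ) {
            int cp = digits.codePointAt(i);
            int d = Character.digit(cp, 10); // -1 if cp is not a decimal digit
            if (d < 0) {
                throw new NumberFormatException("not a decimal digit: U+" + Integer.toHexString(cp));
            }
            value = value * 10 + d;
            i += Character.charCount(cp);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(parseDecimal("\u0967\u0968\u0969")); // Devanagari digits 1, 2, 3
        System.out.println(parseDecimal("\u0D6A\u0D68"));       // Malayalam digits 4, 2
        System.out.println(countNumeric("15\u204416"));         // the slash itself is not a number
    }
}
```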
In short: yes, of course. There are three general categories in Unicode containing various representations of digits and numbers:
Number, Decimal Digit (characters) – e.g. Arabic, Thai, Devanagari digits;
Number, Letter (characters) – e.g. Roman numerals;
Number, Other (characters) – e.g. fractions.
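The three categories above correspond to constants on java.lang.Character, so they can be inspected directly. The class NumberCategories and the method category are made up for this sketch, and the short labels Nd, Nl and No are the usual Unicode abbreviations for the three categories.

```java
class NumberCategories {
    // Maps a code point to the abbreviation of its numeric general category.
    static String category(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.DECIMAL_DIGIT_NUMBER:
                return "Nd"; // Number, Decimal Digit
            case Character.LETTER_NUMBER:
                return "Nl"; // Number, Letter
            case Character.OTHER_NUMBER:
                return "No"; // Number, Other
            default:
                return "other";
        }
    }

    public static void main(String[] args) {
        int[] samples = { '7', 0x0969, 0x2167, 0x00BC, 0x00B2, 0x2044 };
        for (int cp : samples) {
            System.out.printf("U+%04X %s: %s%n", cp, Character.getName(cp), category(cp));
        }
    }
}
```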
The Unicode points below 128 are exactly the same as ASCII so, yes, they're at U+0030 through U+0039 inclusive.
Yes they are - codepoints 0030 to 0039, as you can see e.g. on decodeunicode.org
btw, codepoints 0000-007F are the same as ASCII (0-127; 128 and above isn't ASCII anymore), so anything that you can find in ASCII you can find in Unicode.
Yes I think so:
Information Taken From Here
U+0030 0 30 DIGIT ZERO
U+0031 1 31 DIGIT ONE
U+0032 2 32 DIGIT TWO
U+0033 3 33 DIGIT THREE
U+0034 4 34 DIGIT FOUR
U+0035 5 35 DIGIT FIVE
U+0036 6 36 DIGIT SIX
U+0037 7 37 DIGIT SEVEN
U+0038 8 38 DIGIT EIGHT
U+0039 9 39 DIGIT NINE
You can answer that question yourself: if they weren’t part of Unicode, this would rather drastically reduce the usefulness of Unicode, don’t you think?
Basically, any text that needs to use numbers couldn’t be represented using Unicode code points. (This is assuming that you don’t switch to and fro between different character encodings in one text: I don’t know a single software / programming language that supports this, and for good reason.)
If such questions crop up, you badly need to read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky. Seriously. Go read it.