Recognizing Unicode numbers from different languages - unicode

In Unicode, many languages have their own digits. For example, ASCII has "3", Japanese has the fullwidth "３", and so on. How can I identify a three no matter which Unicode code point represents it?

Read about the normative properties Decimal digit value, Digit value and Numeric value in the UnicodeData File Format:
Decimal digit value (normative): This is a numeric field. If the character has the decimal digit property, as specified in Chapter 4 of the Unicode Standard, the value of that digit is represented with an integer value in this field.
Digit value (normative): This is a numeric field. If the character represents a digit, not necessarily a decimal digit, the value is here. This covers digits which do not form decimal radix forms, such as the compatibility superscript digits.
Numeric value (normative): This is a numeric field. If the character has the numeric property, as specified in Chapter 4 of the Unicode Standard, the value of that character is represented with an integer or rational number in this field. This includes fractions as, e.g., "1/5" for U+2155 VULGAR FRACTION ONE FIFTH. Also included are numerical values for compatibility characters such as circled numbers.
For instance, Python's unicodedata module provides access to the Unicode Character Database, which defines these properties for all Unicode characters (see unicodedata — Unicode Database):
import unicodedata

numchars = '\u0033', '\u00B3', '\u0663', '\u06F3', '\u07C3', '\u0969', '\uFF13', '\u2155'
for numchar in numchars:
    # The second argument is the default returned when the property is undefined.
    print(numchar,
          unicodedata.decimal(numchar, -1),
          unicodedata.digit(numchar, -1),
          unicodedata.numeric(numchar, -1),
          unicodedata.name(numchar, '? ? ?'))
Output:
3 3 3 3.0 DIGIT THREE
³ -1 3 3.0 SUPERSCRIPT THREE
٣ 3 3 3.0 ARABIC-INDIC DIGIT THREE
۳ 3 3 3.0 EXTENDED ARABIC-INDIC DIGIT THREE
߃ 3 3 3.0 NKO DIGIT THREE
३ 3 3 3.0 DEVANAGARI DIGIT THREE
３ 3 3 3.0 FULLWIDTH DIGIT THREE
⅕ -1 -1 0.2 VULGAR FRACTION ONE FIFTH
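To answer the original question directly, you can test a character's digit (or numeric) property instead of comparing code points; a minimal sketch building on the variables above:
def is_three(ch):
    # digit() covers decimal digits and compatibility digits such as '³';
    # use numeric() instead if fractions, circled numbers, etc. should also count.
    return unicodedata.digit(ch, None) == 3

print([ch for ch in numchars if is_three(ch)])   # every "three" above except the fraction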
P.S. A Python example is given since the question is not tagged with any particular language.

Related

printf: how to set the default number of digits used by the exponent?

printf: is it possible to configure the default number of digits used by the exponent?
For portability reasons I would like to fix the number of digits used for exponents below 100.
On my machine the default is 2 digits:
printf "%.3e\n", 342.7234;
# 3.427e+02
but in How can I convert between scientific and decimal notation in Perl? the exponent has 3 digits.
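printf itself exposes no portable flag for the exponent width (the C standard requires at least two exponent digits, and some platforms historically print three), so a common workaround is to post-process the formatted string. A minimal sketch of that idea in Python, purely illustrative since the question is about Perl:
import re

def normalize_exponent(s, digits=2):
    # Rewrite e.g. '3.427e+002' as '3.427e+02' by re-padding the exponent.
    mantissa, sign, exp = re.fullmatch(r"(.+e)([+-])(\d+)", s).groups()
    return f"{mantissa}{sign}{int(exp):0{digits}d}"

print(normalize_exponent("3.427e+002"))   # 3.427e+02
print(normalize_exponent("3.427e+02"))    # 3.427e+02 (already two digits)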

In the Unicode standard, why does U+12ca = 0x12ca? Where does the 0 come from, and how does 0x12ca = 4810 decimal?

I'm learning about Unicode basics and I came across this passage:
"The Unicode standard describes how characters are represented by code points. A code point is an integer value, usually denoted in base 16. In the standard, a code point is written using the notation U+12ca to mean the character with value 0x12ca (4810 decimal)."
I have three questions from here.
What does the ca stand for? In some places I've seen it written as just U+12. What's the difference?
Where did the 0 in 0x12ca come from? What does it mean?
How does the value 0x12ca become 4810 decimal?
It's my first post here and I would appreciate any help! Have a nice day, y'all!
What does the ca stand for?
It stands for the hexadecimal digits c and a.
In some places I've seen it written as just U+12. What's the difference?
Either that is a mistake, or U+12 is another (IMO sloppy / ambiguous) way of writing U+0012 ... which is a different Unicode codepoint to U+12ca.
Where did the 0 in 0x12ca come from? what does it mean?
That is a different notation: hexadecimal (integer) literal notation as used in various programming languages, e.g. C, C++, Java and so on. It represents a number ... not necessarily a Unicode codepoint.
The 0x is just part of the notation. (It "comes from" the respective language specifications ...)
How does the value 0x12ca become 4810 decimal?
The 0x means that the remaining characters are hexadecimal digits (aka base 16), where:
a or A represents 10,
b or B represents 11,
c or C represents 12,
d or D represents 13,
e or E represents 14,
f or F represents 15,
So 0x12ca is 1 × 16³ + 2 × 16² + 12 × 16¹ + 10 × 16⁰ ... which is 4810.
(Do the arithmetic yourself to check. Converting between base 10 and base 16 is simple high-school mathematics.)
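If you would rather let a program check the conversion, most languages parse hexadecimal directly; a small Python sketch:
# 0x... literals and int(..., 16) both mean "read these digits as base 16".
print(0x12ca)                 # 4810
print(int("12ca", 16))        # 4810
print(hex(4810))              # 0x12ca
# The positional expansion from the answer:
print(1*16**3 + 2*16**2 + 12*16**1 + 10*16**0)   # 4810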

How can I check if a number is out of a specified hex range?

How can I write a regular expression that recognizes any expression of the form "\xdd", where dd represents a hexadecimal number out of the range 00-7F?
Regular expressions do not express numerical ranges, but sequences of characters in a character set. You have to express those ranges one character at a time.
So the hex digits are [0-9A-F], which describes the set of characters for one digit using the two ranges [0-9] and [A-F] (you'd also have to decide if lower-case letters are permitted). For two digits you'd have to notice that the first digit has a shorter range, using only [0-7]. The combined result would be:
[0-7][0-9A-Fa-f]
Putting the other symbols in place we could get:
\\x[0-7][0-9A-Fa-f]
(Assuming \ is a meta-character that needs escaping).
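A quick way to sanity-check the pattern is to run it against sample strings. Below is a minimal sketch with Python's re module, assuming the \xdd sequences appear literally in the text (backslash, x, two hex digits); note that [0-7] in the first position matches the values 00-7F, so to catch values outside that range the first class would instead be [89A-Fa-f]:
import re

pattern = re.compile(r"\\x[0-7][0-9A-Fa-f]")   # matches \x00 .. \x7F (upper- or lower-case hex)

print(bool(pattern.fullmatch(r"\x7f")))   # True  - 7f is within 00-7F
print(bool(pattern.fullmatch(r"\x80")))   # False - 80 is beyond 7F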

length and precision issue in Postgres

I'm using Postgres SQL. I need 12 digits before the decimal point and only 6 digits after it. What length and precision should I give the column, and what data type should I use?
I tried numeric as the data type, with a length of 12 and a precision of 6.
If you need 12 digits before the decimal and 6 digits after, you need numeric(18,6).
Quote from the manual:
The scale of a numeric is the count of decimal digits in the fractional part, to the right of the decimal point. The precision of a numeric is the total count of significant digits in the whole number, that is, the number of digits to both sides of the decimal point.
(Emphasis mine)
So the first number (precision) in the type definition is the total number of digits. The second one is the number of decimal digits.
If you specify numeric(12,6) you have a total of 12 digits and 6 decimal digits, which leaves you only 6 digits to the left of the decimal. Therefore you need numeric(18,6).
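As a loose illustration of the significant-digit bookkeeping (not of Postgres itself, which would reject an overflowing value with a numeric field overflow error rather than round it), here is a Python sketch using the decimal module:
from decimal import Decimal, Context

value = Decimal("123456789012.654321")   # 12 digits before the point, 6 after = 18 in total

print(Context(prec=18).plus(value))      # 123456789012.654321 - all 18 digits fit
print(Context(prec=12).plus(value))      # 123456789013        - only 12 significant digits, fraction lost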

how to remove last zero from number in matlab

If I set a variable in Matlab, say var1 = 2.111, after running the script Matlab returns var1 = 2.1110. I want Matlab to return the original number, with no trailing zero. Does anyone know how to do this? Thanks in advance.
By default Matlab displays results in Short fixed decimal format, with 4 digits after the decimal point.
You can change that to various other formats, such as:
long
Long fixed decimal format, with 15 digits after the decimal point for double values, and 7 digits after the decimal point for single values.
3.141592653589793
shortE
Short scientific notation, with 4 digits after the decimal point.
Integer-valued floating-point numbers with a maximum of 9 digits do not display in scientific notation.
3.1416e+00
longE
Long scientific notation, with 15 digits after the decimal point for double values, and 7 digits after the decimal point for single values.
Integer-valued floating-point numbers with a maximum of 9 digits do not display in scientific notation.
3.141592653589793e+00
shortG
The more compact of short fixed decimal or scientific notation, with 5 digits.
3.1416
longG
The more compact of long fixed decimal or scientific notation, with 15 digits for double values, and 7 digits for single values.
3.14159265358979
shortEng
Short engineering notation, with 4 digits after the decimal point, and an exponent that is a multiple of 3.
3.1416e+000
longEng
Long engineering notation, with 15 significant digits, and an exponent that is a multiple of 3.
3.14159265358979e+000
However, I don't think other options are available. If you absolutely want to remove those zeros, you would have to cast your result to a string, remove the trailing 0 characters, and then display your result as a string rather than a number.
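Purely to illustrate that format-then-strip idea (shown in Python; in MATLAB the equivalent would combine sprintf or num2str with something like regexprep to drop the trailing zeros):
def strip_trailing_zeros(x, decimals=4):
    s = f"{x:.{decimals}f}"               # e.g. 2.111 -> '2.1110'
    return s.rstrip("0").rstrip(".")      # '2.1110' -> '2.111', '2.0000' -> '2'

print(strip_trailing_zeros(2.111))   # 2.111
print(strip_trailing_zeros(2.0))     # 2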