Convert between diacritic variants of a character - Swift

I'm passing a string as a parameter to a command-line tool written in Swift, and I have a problem with some characters that carry diacritics.
If I pass à á ả ã ạ й ё as a command-line argument, inside the app I get à á ả ã ạ й ё. It looks the same, but it isn't:
func printUnicodeScalars(_ string: String) {
    print(string, "->", string.unicodeScalars.map { $0 })
}
printUnicodeScalars("à á ả ã ạ й ё")
// à á ả ã ạ й ё -> ["\u{00E0}", " ", "\u{00E1}", " ", "\u{1EA3}", " ", "\u{00E3}", " ", "\u{1EA1}", " ", "\u{0439}", " ", "\u{0451}"]
printUnicodeScalars("à á ả ã ạ й ё")
// à á ả ã ạ й ё -> ["a", "\u{0300}", " ", "a", "\u{0301}", " ", "a", "\u{0309}", " ", "a", "\u{0303}", " ", "a", "\u{0323}", " ", "\u{0438}", "\u{0306}", " ", "\u{0435}", "\u{0308}"]
I know that a character with a diacritic can be represented in Unicode in different ways: as a single precomposed character, or as a combination of two code points, a base letter plus a combining diacritic.
For some reason the command-line tool receives the first variant converted into the second one; presumably something in the argument-passing path decomposes the string (UTF-8 itself can represent both forms).
How can I convert it back, i.e. join the several Unicode scalars of a character back into a single one?

I think you need to use precomposedStringWithCanonicalMapping. This converts the string to Normalization Form C, which is:
Canonical Decomposition, followed by Canonical Composition
Example:
let string = "à á ả ã ạ й ё"
print(string.unicodeScalars.count) // 20
print(string.precomposedStringWithCanonicalMapping.unicodeScalars.count) // 13
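In a command-line tool you can apply the normalization right where the arguments come in. A minimal sketch, assuming a Foundation-based tool that reads CommandLine.arguments and reuses the printUnicodeScalars helper from the question:
import Foundation

// Normalize every argument to Normalization Form C, so that
// "a" + combining grave becomes the single scalar "\u{00E0}" again.
let arguments = CommandLine.arguments.dropFirst().map {
    $0.precomposedStringWithCanonicalMapping
}

for argument in arguments {
    printUnicodeScalars(argument)
}
Note that if you only need equality checks, Swift's String == already uses canonical equivalence, so "\u{E0}" == "a\u{300}" is true with or without this normalization; the precomposed form matters when you inspect or emit the individual scalars.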

Related

Updating my current script, which merges multiple lines into a single one

My current script copies text like this with a shortcut:
:WiltedFlower: aetheryxflower ─ 4
:Alcohol: alcohol ─ 3,709
:Ant: ant ─ 11,924
:Apple: apple ─ 15
:ArmpitHair: armpithair ─ 2
and pastes it, reformatted, as a single line:
Pls trade 4 aetheryxflower 3 alcohol 11 ant 15 apple 2 armpithair <#id>
As you can see, there are already two small problems. The first is that, when a number contains a comma, only the digits before the comma are copied instead of the comma simply being ignored. The second is that I always need to copy the text and then (re)start the script before hitting the hotkey; I've thought about modifying the script so that it uses the currently selected text instead of the copied one, so that I can bind everything to a single hotkey.
This is my current script. It would be cool if anyone could also tell me what they used and why exactly, so that I get better with AHK:
!q::
list =
While pos := RegExMatch(Clipboard, "(\w*) ─ (\d*)", m, pos ? pos + StrLen(m) : 1)
list .= m2 " " m1 " "
Clipboard := "", Clipboard := "Pls trade " list " <#951737931159187457>"
ClipWait, 0
If ErrorLevel
MsgBox, 48, Error, An error occurred while waiting for the clipboard.
return
If the pattern of your copied text doesn't change, you can use something like this:
#Persistent
OnClipboardChange:
list =
a := StrSplit(StrReplace(Clipboard, "`r"), "`n")
Loop,% a.Count() {
b := StrSplit( a[A_Index], ": " )
c := StrSplit( b[2], " - " )
list .= Trim( c[2] ) " " Trim( c[1] ) " "
}
Clipboard := "Pls trade " list " <#951737931159187457>"]
ToolTip % Clipboard ; just for debug
return
With your example text, the output will be:
Pls trade aetheryxflower ─ 4 alcohol ─ 3,709 ant ─ 11,924 apple ─ 15 armpithair ─ 2 <#951737931159187457>
And this will run EVERY TIME your clipboard changes. To avoid that, you can add #IfWinActive, WinTitle or #IfWinExist, WinTitle at the top of the script, depending on your needs.
The answer given above would solve the problem, assuming the pattern never changes, as Diesson mentions.
I have explained the code you provided with comments in the code below:
!q::
list =                                  ; initialize a blank variable
; RegExMatch(Haystack, RegexNeedle, OutputVar, StartPos)
; -- just as a frame of reference for the explanation of RegExMatch below
While                                   ; loop while 'pos' <> 0
    pos := RegExMatch(Clipboard         ; Haystack is the string to be searched,
                                        ; in this case the Clipboard
        , "(\w*) ─ (\d*)"               ; regex needle, in this case "capture word characters
                                        ; (a-z OR A-Z OR 0-9 OR _) any number of times,
                                        ; space dash space, then capture any number of digits (0-9)"
        , m                             ; output var array base name, i.e. the first capture
                                        ; will be in m1, the second in m2, and so on
        , pos ? pos + StrLen(m) : 1)    ; starting position for the search.
                                        ; "? :" used in this way is called a ternary operator; what it
                                        ; says is "if pos<>0 then pos plus the length of m is the start
                                        ; position, otherwise start at 1". Based on the docs, this
                                        ; shouldn't actually work well, since 'm' in this case should
                                        ; be left blank
    list .= m2 " " m1 " "               ; append the results to the 'list' variable,
                                        ; each followed by a space
Clipboard := ""                         ; clear the clipboard
Clipboard := "Pls trade " list " <#951737931159187457>"
ClipWait, 0                             ; wait zero seconds for the clipboard to change
If ErrorLevel                           ; if waiting zero seconds for the clipboard to change
                                        ; didn't work, give an error message to the user
    MsgBox, 48, Error, An error occurred while waiting for the clipboard.
return
Frankly this code is what I would call quick and dirty, and seems unlikely to work well all the time.

BreakPermittedHere char in filename

I received a .pdf file (as a mail attachment) with a U+0082 (Break Permitted Here) character where, presumably, an é should be:
Sciences Num<here>riques et Technologie.pdf
What could have happened?
It's a simple mojibake case: the sender and the receiver apply different code pages.
Example for the given characters: .\Py\mojibakeWindows.py é \x82
Mojibake prove using 97 codecs
string é ['U+00e9'] Latin Small Letter E With Acute
versus ['U+0082'] ??? Cc
[b'\x82']
é ['cp437', 'cp720', 'cp775', 'cp850', 'cp852', 'cp857', 'cp858', 'cp860', 'cp861', 'cp863', 'cp865']
['cp1006', 'latin_1', 'iso8859_2', 'iso8859_3', 'iso8859_4', 'iso8859_5', 'iso8859_6', 'iso8859_7', 'iso8859_8', 'iso8859_9', 'iso8859_10', 'iso8859_13', 'iso8859_14', 'iso8859_15', 'iso8859_16', 'iso8859_11']
Here is the mojibakeWindows.py script:
import sys
import codecs
import unicodedata
if len(sys.argv) == 3:
    str1st = sys.argv[1].encode('raw_unicode_escape').decode('unicode_escape').encode('utf-16_BE','surrogatepass').decode('utf-16_BE');
    str2nd = sys.argv[2].encode('raw_unicode_escape').decode('unicode_escape').encode('utf-16_BE','surrogatepass').decode('utf-16_BE');
else:
    print( 'need two `string` parameters e.g. as follows:');
    print( sys.argv[0], '"╧╤╪"', '"ÏÑØ"' )
    sys.exit();
codec_list = ['ascii', 'big5', 'big5hkscs', 'cp037', 'cp424', 'cp437', 'cp500', 'cp720', 'cp737', 'cp775', 'cp850', 'cp852', 'cp855', 'cp856', 'cp857', 'cp858', 'cp860', 'cp861', 'cp862', 'cp863', 'cp864', 'cp865', 'cp866', 'cp869', 'cp874', 'cp875', 'cp932', 'cp949', 'cp950', 'cp1006', 'cp1026', 'cp1140', 'cp1250', 'cp1251', 'cp1252', 'cp1253', 'cp1254', 'cp1255', 'cp1256', 'cp1257', 'cp1258', 'euc_jp', 'euc_jis_2004', 'euc_jisx0213', 'euc_kr', 'gb2312', 'gbk', 'gb18030', 'hz', 'iso2022_jp', 'iso2022_jp_1', 'iso2022_jp_2', 'iso2022_jp_2004', 'iso2022_jp_3', 'iso2022_jp_ext', 'iso2022_kr', 'latin_1', 'iso8859_2', 'iso8859_3', 'iso8859_4', 'iso8859_5', 'iso8859_6', 'iso8859_7', 'iso8859_8', 'iso8859_9', 'iso8859_10', 'iso8859_13', 'iso8859_14', 'iso8859_15', 'iso8859_16', 'johab', 'koi8_r', 'koi8_u', 'mac_cyrillic', 'mac_greek', 'mac_iceland', 'mac_latin2', 'mac_roman', 'mac_turkish', 'ptcp154', 'shift_jis', 'shift_jis_2004', 'shift_jisx0213', 'utf_32', 'utf_32_be', 'utf_32_le', 'utf_16', 'utf_16_be', 'utf_16_le', 'utf_7', 'utf_8', 'utf_8_sig'] + ['cp273', 'cp1125', 'iso8859_11', 'koi8_t', 'kz1048']; # 'cp65001',
print( 'Mojibake prove using', len( codec_list ), 'codecs' );
str1stname = unicodedata.name(str1st, '??? {}'.format(unicodedata.category(str1st))) if len(str1st)==1 else ''
str2ndname = unicodedata.name(str2nd, '??? {}'.format(unicodedata.category(str2nd))) if len(str2nd)==1 else ''
print( 'string', str1st, ['U+{0:04x}'.format( ord(ch)) for ch in str1st], str1stname.title() );
print( 'versus', str2nd, ['U+{0:04x}'.format( ord(ch)) for ch in str2nd], str2ndname.title() );
str1list = [];
str2list = [];
strXlist = [];
for cod in codec_list:
    for doc in codec_list:
        if cod != doc:
            # str1ste = codecs.encode( str1st,encoding=cod,errors='replace');
            try:
                str1ste = codecs.encode( str1st,encoding=cod,errors='strict');
            except:
                str1ste = b'?' * len(str1st);
            # str2nde = codecs.encode( str2nd,encoding=doc,errors='replace');
            try:
                str2nde = codecs.encode( str2nd,encoding=doc,errors='strict');
            except:
                str2nde = b'?' * len(str2nd);
            if ( str1ste == str2nde and b'?' not in str1ste ):
                if cod not in str1list: str1list.append( cod );
                if doc not in str2list: str2list.append( doc );
                if str1ste not in strXlist: strXlist.append( str1ste );
print( strXlist );
print( str1st, str1list );
print( str2nd, str2list );
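To see the specific mismatch in isolation: é is the single byte 0x82 in the DOS/OEM code pages (CP437, CP850, ...), while the byte 0x82 read back as ISO Latin-1 is U+0082, BREAK PERMITTED HERE. Here is a short sketch of that round trip in Swift; it leans on CoreFoundation for the CP850 encoding, so it assumes an Apple platform:
import Foundation

// Build a String.Encoding for DOS Latin-1 (code page 850) via CoreFoundation.
let cfEncoding = CFStringEncoding(CFStringEncodings.dosLatin1.rawValue)
let cp850 = String.Encoding(rawValue: CFStringConvertEncodingToNSStringEncoding(cfEncoding))

// The sender encodes "é" with the OEM code page: a single byte, 0x82.
let sent = "é".data(using: cp850)!
print(sent.map { String(format: "0x%02X", $0) })                          // ["0x82"]

// The receiver decodes that byte as Latin-1 and gets U+0082 instead of é.
let received = String(data: sent, encoding: .isoLatin1)!
print(received.unicodeScalars.map { String(format: "U+%04X", $0.value) }) // ["U+0082"]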
Another example uses my Alt KeyCode Finder script (see the Alt0 column for the ACP code and the Dec column for the OEMCP code): powershell -COMMAND .\PShell\MyCharMap.ps1 é,0x82
Ch Unicode Dec CP IME Alt Alt0 IME 0405/cs-CZ; CP65001; ACP 65001
é U+00E9 233 …233… Latin Small Letter E With Acute
130 CP437 en-US 0233 (ACP 1252) US & Western Eu
130 CP850 en-GB 0233 (ACP 1252) US & Western Eu
130 CP852 cs-CZ 0233 (ACP 1250) Central Europe
130 CP775 et-EE 0233 (ACP 1257) Baltic
130 CP857 tr-TR 0233 (ACP 1254) Turkish
130 CP720 ar-EG 0233 (ACP 1256) Arabic
vi-VN 0233 (ACP 1258) Vietnamese
U+0082 130 …130… Break Permitted Here
130 CP869 el-gr (ACP 1253) Greek-2
th-TH 0130 (ACP 874) Thai

Odd school assignment about displaying emojis in PowerShell

I've had the pleasure of getting the assignment of printing emojis in PowerShell; the only problem is that they have to be on the same line, and there are three of them. This is my first assignment, and we have had no prior teaching in this subject, so after googling and searching YouTube my best shot was the code below. However, it fails with an error saying something about the value being either too high or too low.
Full error text:
Exception calling "ToInt32" with "2" argument(s): "The value was either too large or too small for a UInt32."
At C:\Users\EG\Downloads\Herningsholm\Powershell H1\Hardware Information.ps1:3 char:5
$UnicodeInt = [System.Convert]::toInt32($StrippedUnicode, 16)
CategoryInfo: NotSpecified: (:) [], MethodInvocationException
FullyQualifiedErrorId: OverflowException
$FullUnicode = ('U+1F60E') + ('U+1F436') + ('U+1F642')
$StrippedUnicode = $FullUnicode -replace 'U\+',''
$UnicodeInt = [System.Convert]::toInt32($StrippedUnicode,16)
[System.Char]::ConvertFromUtf32($UnicodeInt)
The problem is that ('U+1F60E') + ('U+1F436') + ('U+1F642') concatenates the three strings into one, so after stripping the U+ you ask ToInt32 to parse 1F60E1F4361F642 as a single hex number, which overflows. Convert each code point separately and join the results instead. Try this out:
Full emoji list: https://unicode.org/emoji/charts/full-emoji-list.html
# saves unicode for each emoji https://unicode.org/emoji/charts/full-emoji-list.html
$FullUnicode0 = 'U+1F606'
$FullUnicode1 = 'U+1F605'
$FullUnicode2 = 'U+1F605'
# removes the U+ bit
$StrippedUnicode0 = $FullUnicode0 -replace 'U\+',''
$StrippedUnicode1 = $FullUnicode1 -replace 'U\+',''
$StrippedUnicode2 = $FullUnicode2 -replace 'U\+',''
# Converts the value of the specified object to a 32-bit signed integer
$UnicodeInt0 = [System.Convert]::toInt32($StrippedUnicode0,16)
$UnicodeInt1 = [System.Convert]::toInt32($StrippedUnicode1,16)
$UnicodeInt2 = [System.Convert]::toInt32($StrippedUnicode2,16)
# Converts the specified Unicode code point into a UTF-16 encoded string so that you have an emoji
$Emoji0 = [System.Char]::ConvertFromUtf32($UnicodeInt0)
$Emoji1 = [System.Char]::ConvertFromUtf32($UnicodeInt1)
$Emoji2 = [System.Char]::ConvertFromUtf32($UnicodeInt2)
write-host "$($Emoji0), $($Emoji1), $($Emoji2)"
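If it helps to see the same steps outside PowerShell, here is the equivalent conversion sketched in Swift: strip the U+, parse the hex digits, turn the value into a Unicode scalar, and print the resulting characters on one line. The code points are the ones from the question.
// Same idea as the PowerShell above, one code point at a time.
let codes = ["U+1F60E", "U+1F436", "U+1F642"]

let emojis = codes.compactMap { code -> String? in
    let hex = code.dropFirst(2)                       // drop the "U+"
    guard let value = UInt32(hex, radix: 16),         // parse the hex digits
          let scalar = Unicode.Scalar(value) else { return nil }
    return String(Character(scalar))                  // code point -> one-character string
}

print(emojis.joined(separator: ", "))                 // 😎, 🐶, 🙂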

How to break a long code line into multiple lines in NetLogo?

In Python, we break a long line of code into multiple lines with a backslash, like this:
a = 5
b = 11
print(str(a) + " plus " + \
str(b) + " is " + \
str(a + b))
# prints "5 plus 11 is 16"
How do we do that in NetLogo?
NetLogo doesn't care about line breaks at all, except in comments (the comment marker ; only extends to the end of the line). All of these are equivalent:
to testme
; single line
let a 25 print a
; command per line
let b 20
print b
; unreadable
let
c
15
print
c
end

How to identify all non-basic UTF-8 characters in a set of strings in Perl

I'm using Perl's XML::Writer to generate an import file for a program called OpenNMS. According to the documentation, I need to pre-declare all special characters as XML ENTITY declarations. Obviously I need to go through all the strings I'm exporting and catalogue the special characters used. What's the easiest way to work out which characters in a Perl string are "special" with respect to UTF-8 encoding? And is there any way to work out what the entity names for those characters should be?
In order to find "special" characters, you can use ord to find out the codepoint. Here's an example:
# Create a Unicode test file with some Latin chars, some Cyrillic,
# and some outside the BMP.
# The BMP is the basic multilingual plane, see perluniintro.
# (Not sure what you mean by saying "non-basic".)
perl -CO -lwe "print join '', map chr, 97 .. 100, 0x410 .. 0x415, 0x10000 .. 0x10003" > u.txt
# Read it and find codepoints outside the BMP.
perl -CI -nlwe "print for map ord, grep ord > 0xffff, split //" < u.txt
You can get a good introduction from reading perluniintro.
I'm not sure what the docs you're referring to mean in the section "Exported XML".
Looks like some limitation of a system which is de facto ASCII and doesn't do Unicode.
Or a misunderstanding of XML. Or both.
Anyway, if you're looking for names, you could use or reference the canonical ones.
See XML Entity Definitions for Characters or one of the older documents for HTML or MathML referenced therein.
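If declaring entities turns out to be unnecessary, the usual alternative is to emit numeric character references, which every XML parser accepts without any ENTITY declarations. A minimal sketch of that idea, written in Swift rather than Perl and with a made-up helper name:
// Replace every non-ASCII scalar with a numeric character reference
// such as &#xE8;, leaving plain ASCII untouched.
func escapingNonASCIIAsReferences(_ string: String) -> String {
    var result = ""
    for scalar in string.unicodeScalars {
        if scalar.isASCII {
            result.append(Character(scalar))
        } else {
            result += "&#x" + String(scalar.value, radix: 16, uppercase: true) + ";"
        }
    }
    return result
}

print(escapingNonASCIIAsReferences("crème brûlée"))
// cr&#xE8;me br&#xFB;l&#xE9;e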
You might look into the uniquote program. It has a --xml option. For example:
$ cat sample
1 NFD single combining characters: (crème brûlée et fiancé) and (crème brûlée et fiancé).
2 NFC single combining characters: (crème brûlée et fiancé) and (crème brûlée et fiancé).
3 NFD multiple combining characters: (hẫç̌k) and (hã̂ç̌k).
3 NFC multiple combining characters: (hẫç̌k) and (hã̂ç̌k).
5 invisible characters: (4⁄3⁢π⁢r³) and (4⁄3⁢π⁢r³).
6 astral characters: (𝐂 = sqrt[𝐀² + 𝐁²]) and (𝐂 = sqrt[𝐀² + 𝐁²]).
7 astral + combining chars: (𝐂̅ = sqrt[𝐀̅² + 𝐁̅²]) and (𝐂̅ = sqrt[𝐀̅² + 𝐁̅²]).
8 wide characters: (wide) and (wide).
9 regular characters: (normal) and (normal).
$ uniquote -x sample
1 NFD single combining characters: (cre\x{300}me bru\x{302}le\x{301}e et fiance\x{301}) and (cre\x{300}me bru\x{302}le\x{301}e et fiance\x{301}).
2 NFC single combining characters: (cr\x{E8}me br\x{FB}l\x{E9}e et fianc\x{E9}) and (cr\x{E8}me br\x{FB}l\x{E9}e et fianc\x{E9}).
3 NFD multiple combining characters: (ha\x{302}\x{303}c\x{327}\x{30C}k) and (ha\x{303}\x{302}c\x{327}\x{30C}k).
3 NFC multiple combining characters: (h\x{1EAB}\x{E7}\x{30C}k) and (h\x{E3}\x{302}\x{E7}\x{30C}k).
5 invisible characters: (4\x{2044}3\x{2062}\x{3C0}\x{2062}r\x{B3}) and (4\x{2044}3\x{2062}\x{3C0}\x{2062}r\x{B3}).
6 astral characters: (\x{1D402} = sqrt[\x{1D400}\x{B2} + \x{1D401}\x{B2}]) and (\x{1D402} = sqrt[\x{1D400}\x{B2} + \x{1D401}\x{B2}]).
7 astral + combining chars: (\x{1D402}\x{305} = sqrt[\x{1D400}\x{305}\x{B2} + \x{1D401}\x{305}\x{B2}]) and (\x{1D402}\x{305} = sqrt[\x{1D400}\x{305}\x{B2} + \x{1D401}\x{305}\x{B2}]).
8 wide characters: (\x{FF57}\x{FF49}\x{FF44}\x{FF45}) and (\x{FF57}\x{FF49}\x{FF44}\x{FF45}).
9 regular characters: (normal) and (normal).
$ uniquote -b sample
1 NFD single combining characters: (cre\xCC\x80me bru\xCC\x82le\xCC\x81e et fiance\xCC\x81) and (cre\xCC\x80me bru\xCC\x82le\xCC\x81e et fiance\xCC\x81).
2 NFC single combining characters: (cr\xC3\xA8me br\xC3\xBBl\xC3\xA9e et fianc\xC3\xA9) and (cr\xC3\xA8me br\xC3\xBBl\xC3\xA9e et fianc\xC3\xA9).
3 NFD multiple combining characters: (ha\xCC\x82\xCC\x83c\xCC\xA7\xCC\x8Ck) and (ha\xCC\x83\xCC\x82c\xCC\xA7\xCC\x8Ck).
3 NFC multiple combining characters: (h\xE1\xBA\xAB\xC3\xA7\xCC\x8Ck) and (h\xC3\xA3\xCC\x82\xC3\xA7\xCC\x8Ck).
5 invisible characters: (4\xE2\x81\x843\xE2\x81\xA2\xCF\x80\xE2\x81\xA2r\xC2\xB3) and (4\xE2\x81\x843\xE2\x81\xA2\xCF\x80\xE2\x81\xA2r\xC2\xB3).
6 astral characters: (\xF0\x9D\x90\x82 = sqrt[\xF0\x9D\x90\x80\xC2\xB2 + \xF0\x9D\x90\x81\xC2\xB2]) and (\xF0\x9D\x90\x82 = sqrt[\xF0\x9D\x90\x80\xC2\xB2 + \xF0\x9D\x90\x81\xC2\xB2]).
7 astral + combining chars: (\xF0\x9D\x90\x82\xCC\x85 = sqrt[\xF0\x9D\x90\x80\xCC\x85\xC2\xB2 + \xF0\x9D\x90\x81\xCC\x85\xC2\xB2]) and (\xF0\x9D\x90\x82\xCC\x85 = sqrt[\xF0\x9D\x90\x80\xCC\x85\xC2\xB2 + \xF0\x9D\x90\x81\xCC\x85\xC2\xB2]).
8 wide characters: (\xEF\xBD\x97\xEF\xBD\x89\xEF\xBD\x84\xEF\xBD\x85) and (\xEF\xBD\x97\xEF\xBD\x89\xEF\xBD\x84\xEF\xBD\x85).
9 regular characters: (normal) and (normal).
$ uniquote -v sample
1 NFD single combining characters: (cre\N{COMBINING GRAVE ACCENT}me bru\N{COMBINING CIRCUMFLEX ACCENT}le\N{COMBINING ACUTE ACCENT}e et fiance\N{COMBINING ACUTE ACCENT}) and (cre\N{COMBINING GRAVE ACCENT}me bru\N{COMBINING CIRCUMFLEX ACCENT}le\N{COMBINING ACUTE ACCENT}e et fiance\N{COMBINING ACUTE ACCENT}).
2 NFC single combining characters: (cr\N{LATIN SMALL LETTER E WITH GRAVE}me br\N{LATIN SMALL LETTER U WITH CIRCUMFLEX}l\N{LATIN SMALL LETTER E WITH ACUTE}e et fianc\N{LATIN SMALL LETTER E WITH ACUTE}) and (cr\N{LATIN SMALL LETTER E WITH GRAVE}me br\N{LATIN SMALL LETTER U WITH CIRCUMFLEX}l\N{LATIN SMALL LETTER E WITH ACUTE}e et fianc\N{LATIN SMALL LETTER E WITH ACUTE}).
3 NFD multiple combining characters: (ha\N{COMBINING CIRCUMFLEX ACCENT}\N{COMBINING TILDE}c\N{COMBINING CEDILLA}\N{COMBINING CARON}k) and (ha\N{COMBINING TILDE}\N{COMBINING CIRCUMFLEX ACCENT}c\N{COMBINING CEDILLA}\N{COMBINING CARON}k).
3 NFC multiple combining characters: (h\N{LATIN SMALL LETTER A WITH CIRCUMFLEX AND TILDE}\N{LATIN SMALL LETTER C WITH CEDILLA}\N{COMBINING CARON}k) and (h\N{LATIN SMALL LETTER A WITH TILDE}\N{COMBINING CIRCUMFLEX ACCENT}\N{LATIN SMALL LETTER C WITH CEDILLA}\N{COMBINING CARON}k).
5 invisible characters: (4\N{FRACTION SLASH}3\N{INVISIBLE TIMES}\N{GREEK SMALL LETTER PI}\N{INVISIBLE TIMES}r\N{SUPERSCRIPT THREE}) and (4\N{FRACTION SLASH}3\N{INVISIBLE TIMES}\N{GREEK SMALL LETTER PI}\N{INVISIBLE TIMES}r\N{SUPERSCRIPT THREE}).
6 astral characters: (\N{MATHEMATICAL BOLD CAPITAL C} = sqrt[\N{MATHEMATICAL BOLD CAPITAL A}\N{SUPERSCRIPT TWO} + \N{MATHEMATICAL BOLD CAPITAL B}\N{SUPERSCRIPT TWO}]) and (\N{MATHEMATICAL BOLD CAPITAL C} = sqrt[\N{MATHEMATICAL BOLD CAPITAL A}\N{SUPERSCRIPT TWO} + \N{MATHEMATICAL BOLD CAPITAL B}\N{SUPERSCRIPT TWO}]).
7 astral + combining chars: (\N{MATHEMATICAL BOLD CAPITAL C}\N{COMBINING OVERLINE} = sqrt[\N{MATHEMATICAL BOLD CAPITAL A}\N{COMBINING OVERLINE}\N{SUPERSCRIPT TWO} + \N{MATHEMATICAL BOLD CAPITAL B}\N{COMBINING OVERLINE}\N{SUPERSCRIPT TWO}]) and (\N{MATHEMATICAL BOLD CAPITAL C}\N{COMBINING OVERLINE} = sqrt[\N{MATHEMATICAL BOLD CAPITAL A}\N{COMBINING OVERLINE}\N{SUPERSCRIPT TWO} + \N{MATHEMATICAL BOLD CAPITAL B}\N{COMBINING OVERLINE}\N{SUPERSCRIPT TWO}]).
8 wide characters: (\N{FULLWIDTH LATIN SMALL LETTER W}\N{FULLWIDTH LATIN SMALL LETTER I}\N{FULLWIDTH LATIN SMALL LETTER D}\N{FULLWIDTH LATIN SMALL LETTER E}) and (\N{FULLWIDTH LATIN SMALL LETTER W}\N{FULLWIDTH LATIN SMALL LETTER I}\N{FULLWIDTH LATIN SMALL LETTER D}\N{FULLWIDTH LATIN SMALL LETTER E}).
9 regular characters: (normal) and (normal).
$ uniquote --xml sample
1 NFD single combining characters: (crème brûlée et fiancé) and (crème brûlée et fiancé).
2 NFC single combining characters: (crème brûlée et fiancé) and (crème brûlée et fiancé).
3 NFD multiple combining characters: (hâçk) and (hãçk).
3 NFC multiple combining characters: (hẫk) and (hãk).
5 invisible characters: (4⁄3⁢r³) and (4⁄3⁢r³).
6 astral characters: (𝐂 = sqrt[𝐀 + 𝐁]) and (𝐂 = sqrt[𝐀 + 𝐁]).
7 astral + combining chars: (𝐂 = sqrt[𝐀 + 𝐁]) and (𝐂 = sqrt[𝐀 + 𝐁]).
8 wide characters: (w) and (w).
9 regular characters: (normal) and (normal).
$ uniquote --verbose --html sample
1 NFD single combining characters: (crème brûlée et fiancé) and (crème brûlée et fiancé).
2 NFC single combining characters: (crème brûlée et fiancé) and (crème brûlée et fiancé).
3 NFD multiple combining characters: (hẫç̌k) and (hã̂ç̌k).
3 NFC multiple combining characters: (hẫç̌k) and (hã̂ç̌k).
5 invisible characters: (4⁄3⁢π⁢r³) and (4⁄3⁢π⁢r³).
6 astral characters: (𝐂 = sqrt[𝐀² + 𝐁²]) and (𝐂 = sqrt[𝐀² + 𝐁²]).
7 astral + combining chars: (𝐂̅ = sqrt[𝐀̅² + 𝐁̅²]) and (𝐂̅ = sqrt[𝐀̅² + 𝐁̅²]).
8 wide characters: (wide) and (wide).
9 regular characters: (normal) and (normal).