How can I get Mocha's Unicode output to display properly in a Windows console? - powershell

When I run Mocha, it tries to show a check mark or an X for a passing or a failing test run, respectively. I've seen great-looking screenshots of Mocha's output. But those screenshots were all taken on Macs or Linux machines. In a console window on Windows, these characters both show up as a nondescript empty-box character, the classic "huh?" glyph.
If I highlight the text in the console window and copy it to the clipboard, I do see actual Unicode characters; I can paste the fancy characters into a textbox in a Web browser and they render just fine (✔, ✖). So the Unicode output is getting to the console window just fine; the problem is that the console window isn't displaying those characters properly.
How can I fix this so that all of Mocha's output (including the ✔ and ✖) displays properly in a Windows console?

By pasting the characters into LinqPad, I was able to figure out that they were 'HEAVY CHECK MARK' (U+2714) and 'HEAVY MULTIPLICATION X' (U+2716). It looks like neither character is supported in any of the console fonts (Consolas, Lucida Console, or Raster Fonts) that are available in Windows 7. In fact, out of all the fonts that ship with Windows 7, only a handful support these characters (Meiryo, Meiryo UI, MS Gothic, MS Mincho, MS PGothic, MS PMincho, MS UI Gothic, and Segoe UI Symbol). The ones starting with "MS" are all fixed-width (monospace) fonts, but they all look awful at the font sizes typical of a console. And the others are out, since the console requires fixed-width fonts.
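If LinqPad is not at hand, the same identification can be done with a few lines of Python. This is only a minimal sketch using the standard unicodedata module; the helper name describe is made up for illustration.

```python
import unicodedata

def describe(text):
    """Return (code point, official Unicode name) for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

# The two symbols from Mocha's output, written as escapes so the
# script itself stays pure ASCII.
for code_point, name in describe("\u2714\u2716"):
    print(code_point, name)
```

Knowing the exact code points makes it easier to check whether a given font covers them.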
So you'll need to download a font. I like DejaVu Sans Mono -- it's free, it looks good at console sizes, it's easy to tell the 0 from the O and the 1 from the I from the l, and it's got all kinds of fancy Unicode symbols, including the check and X that Mocha uses.
Unfortunately, it's a bit of a pain to install a new console font, but it's doable. (Steps adapted from this post by Scott Hanselman, but extended to include the non-obvious subtleties of 000.)
Steps:
1. Download the DejaVu fonts. Unzip the files. Go into the "ttf" directory you just unzipped, select all the files, right-click and "Install".
2. Run Regedit, and go to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont.
3. Add a new string value. Give it a name that's a string of zeroes one longer than the longest string of zeroes that's already there. For example, on my Windows 7 install, there's already a value named 0 and one named 00, so I had to name the new one 000.
4. Double-click on your new value, and set its value to DejaVu Sans Mono.
5. Reboot. (Yes, this step is necessary, at least on OSes up to and including Windows 7.)
6. Now you can open a console window, open the window menu, go to Defaults > Font tab, and "DejaVu Sans Mono" should be available in the Font list box. Select it and OK.
Now Mocha's output will display in all its glory.
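The naming rule for the new registry value (a run of zeroes one longer than the longest existing one) can be sketched in Python. This is an illustration only: the function name next_zero_name is hypothetical, it works on a plain list of names you supply, and it does not read or write the registry.

```python
def next_zero_name(existing_names):
    """Return a run of zeroes one longer than the longest all-zero name."""
    zero_runs = [name for name in existing_names if name and set(name) == {"0"}]
    longest = max((len(name) for name in zero_runs), default=0)
    return "0" * (longest + 1)

# With values named 0 and 00 already present, the new one is 000.
print(next_zero_name(["0", "00"]))
```

Names that are not made up purely of zeroes are ignored, on the assumption that only the all-zero names matter for this rule.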

Update: this issue has now been fixed. Starting from Mocha 1.7.0, fallbacks are used for symbols that don't exist in default console fonts (√ instead of ✔, × instead of ✖, etc.). It's not as pretty as it could be, but it surely beats empty-box placeholder symbols.
For details, see the related pull request: https://github.com/visionmedia/mocha/pull/641
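The idea behind such a fallback can be sketched as follows. This is not Mocha's actual code (Mocha is written in JavaScript); the function name pick_symbols and the platform check are assumptions for illustration, and only the symbol pairs are taken from the update above.

```python
import sys

def pick_symbols(platform=sys.platform):
    """Choose pass/fail symbols that the default console fonts can draw."""
    if platform == "win32":
        # Fallbacks: SQUARE ROOT (U+221A) and MULTIPLICATION SIGN (U+00D7).
        return {"ok": "\u221a", "err": "\u00d7"}
    # Elsewhere: HEAVY CHECK MARK and HEAVY MULTIPLICATION X.
    return {"ok": "\u2714", "err": "\u2716"}

# ascii() keeps the demo printable even on a console with a limited font.
print(ascii(pick_symbols()))
```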

How to fix PowerShell 7 fonts not showing correctly | oh-my-posh

I've already installed Windows Terminal and set it up with "oh my posh", and everything is working as intended.
However, whenever I launch PowerShell 7 on its own (without the terminal), the font is messy.
I have already tried changing the font to the same one I used in the terminal's .json, but some parts still do not render correctly, and I cannot use it that way with VSCode.
The problem is that the Windows Console doesn't fully support UTF-8:
Windows Console was created way back in the early days of Windows,
back before Unicode itself existed! Back then, a decision was made to
represent each text character as a fixed-length 16-bit value (UCS-2).
Thus, the Console’s text buffer contains 2-byte wchar_t values per
grid cell, x columns by y rows in size.
...
One problem, for example, is that because UCS-2 is a fixed-width
16-bit encoding, it is unable to represent all Unicode codepoints.
This means you have "partial" support for Unicode characters in the Windows Console: any character that fits in a single 16-bit UCS-2 value (U+0000 to U+FFFF) can be stored, but code points above U+FFFF cannot.
When you see boxes, the character being used is likely outside the UCS-2 range. You can also tell this because you get 2 boxes (i.e. 2 x 16-bit values, the two halves of a UTF-16 surrogate pair). That is why you can't have happy faces 😀 in your Windows Console (which makes me sad ☹️).
For it to work in all locations, you will have to modify your oh-my-posh themes to use different characters that can be represented in UCS-2.
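To find out which symbols in a theme would hit this limit, one option is to count how many 16-bit units each character needs in UTF-16. The sketch below is a rough check only; the helper names utf16_units and needs_replacing are hypothetical, and the sample string is made up rather than taken from any real theme.

```python
def utf16_units(ch):
    """Number of 16-bit code units needed to store one character in UTF-16."""
    return len(ch.encode("utf-16-le")) // 2

def needs_replacing(symbols):
    """Return the characters that do not fit in a single 16-bit cell."""
    return [ch for ch in symbols if utf16_units(ch) > 1]

# U+2514 fits in one cell; U+1F600 (grinning face) needs a surrogate pair.
flagged = needs_replacing("\u2514\U0001F600")
print([f"U+{ord(ch):04X}" for ch in flagged])
```

A character that passes this check can still show up as a box if the selected font has no glyph for it, so this only rules out one cause.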
For Version 2 of Oh My Posh, to change the symbols you have to edit the $ThemeSettings variable. Follow the instructions on GitHub for configuring Theme Settings, e.g.:
$ThemeSettings.GitSymbols.BranchSymbol = [char]::ConvertFromUtf32(0x2514)
For Version 3+ of Oh My Posh, you have to edit the JSON configuration file to make the changes, e.g.:
...
{
"type": "git",
"style": "powerline",
"powerline_symbol": "\u2514",
....

I need to remove a specific unicode in my existing subtitle text file

I work on subtitles and I have this Arabic file. When I open it in Notepad, right-click and select SHOW UNICODE CONTROL CHARACTERS, it shows me some weird characters on the left of every line. I tried many ways to remove them but failed. The tools I have tried so far: Notepad++, Subtitle Edit, Excel, Word.
288
00:24:41,960 --> 00:24:43,840
‫أتعلم، قللنا من شأنك فعلاً‬
289
00:24:44,000 --> 00:24:47,120
‫كان علينا تجنيدك لتكون جاسوساً‬
‫مكان (كاي سي)‬
290
00:24:47,280 --> 00:24:51,520
‫لا تعلمون كم أنا سعيد‬
‫لسماع ذلك‬
291
00:24:54,800 --> 00:24:58,160
‫لا تقلق، سيستيقظ نشيطاً غداً‬
292
00:24:58,320 --> 00:25:00,800
‫ولن يتذكر ما حصل‬
‫في الساعات الـ٦‬
The Unicode characters are not showing in the text pasted here. The character is U+202B, which shows as a ¶ sign; after googling it, I think it's called PILCROW.
The issue is that the subtitles don't display correctly in the PS4 app.
I need this PILCROW sign to go away. With this website I can see the issue in this file: https://www.soscisurvey.de/tools/view-chars.php
The PILCROW ¶ is used by various software and publishers to show the end of a line in a document. The actual Unicode character does not exist in your file so you can't get rid of it.
The Unicode characters in these lines are 'RIGHT-TO-LEFT EMBEDDING' (code \u202b) and 'POP DIRECTIONAL FORMATTING' (code \u202c); these are used in the text to indicate that the enclosed text should be rendered right-to-left instead of the Western left-to-right direction.
Now, these characters are included as hints to the application displaying the text, rather than to actually perform the text reversing - so they likely can be removed without compromising the text displaying itself.
Now, this is a programming Q&A site, but you did not indicate any programming language you are familiar with, at least enough to run a program. So it is very hard to know how to give an answer that is suitable for you.
Python can be used to create a small program to filter such characters from a file, but I am not willing to write a full-fledged GUI program, or a web app that you could run, just as an answer here.
A program that works from the command line just to filter out a few characters is another thing, as it is just a few lines of code.
You have to store the following listing as a file named, say, "fixsubtitles.py", and, in a terminal ("cmd" if you are on Windows), type python3 fixsubtitles.py \path\to\subtitlefile.txt and press Enter.
That, of course, is after installing the Python 3 runtime from http://python.org
(if you are on Mac or Linux, it is usually already installed)
import sys
from pathlib import Path

encoding = "utf-8"
# Translation table that deletes the two directional formatting characters
remove_set = str.maketrans("", "", "\u202b\u202c")

if len(sys.argv) < 2:
    print("Usage: python3 fixsubtitles.py [filename]", file=sys.stderr)
    sys.exit(1)

path = Path(sys.argv[1])
data = path.read_text(encoding=encoding)
# Overwrite the file in place with the filtered text
path.write_text(data.translate(remove_set), encoding=encoding)
print("Done")
You may need to adjust the encoding, as Windows does not always use UTF-8 (the files can be in, for example, "cp1256"; if you get a Unicode error when running the program, try using this in place of "utf-8"). You may also need to add more characters to the set of characters to be removed; the tool you linked in the question should show you other such characters, if any. Other than that, the program above should work.
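To see which other invisible characters a file contains before deciding what to remove, a small helper can list every character in Unicode category "Cf" (format controls), which is the category the two characters above belong to. This is a sketch under assumptions: the function name invisible_format_chars is made up, and the sample string is a short stand-in for the file contents.

```python
import unicodedata
from collections import Counter

def invisible_format_chars(text):
    """Count characters in Unicode category Cf (format controls)."""
    found = Counter(ch for ch in text if unicodedata.category(ch) == "Cf")
    return {
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}": count
        for ch, count in found.items()
    }

# One Arabic word wrapped in the embedding/pop pair, as in the subtitle lines.
sample = "\u202b\u0623\u062a\u0639\u0644\u0645\u202c"
print(invisible_format_chars(sample))
```

Not every Cf character is safe to delete (some affect how letters join), so it is worth looking at the list before adding anything to the removal set.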

Where are the unicode characters on the disk and what's the mapping process?

Several Unicode-related questions have been confusing me for some time.
For the following reasons, I think the Unicode characters exist on disk:
Execute echo "\u6211" in terminal, it will print the glyph corresponding to the unicode code point U+6211.
There's the concept of the UCD (Unicode Character Database), and we can download its latest version (UCD latest).
Some new version unicode characters like latest emojis can not display on my mac until I upgrade macOS version.
So if the Unicode characters do exist on the disk, then:
Where is it ?
How can I upgrade it ?
What's the process of mapping the unicode code point to a glyph ?
If I use a specific font, then what's the process of mapping the unicode code point to a glyph ?
If not, then what's the process of mapping the unicode code point to a glyph ?
It would be much appreciated if someone could shed light on these problems.
Execute echo "\u6211" in terminal, it will print the glyph corresponding to the unicode code point U+6211.
That's echo -e in bash.
› echo "\u6211"
\u6211
› echo -e "\u6211"
我
Where is it ?
In the font file.
Some new version unicode characters like latest emojis can not display on my mac until I upgrade macOS version.
How can I upgrade it ?
Installing/upgrading a suitable font with the emojis should be enough. I don't have macOS, so I cannot verify this.
I use "Noto Color Emoji" version 2.011/20180424, it works fine.
What's the process of mapping the unicode code point to a glyph ?
The application (e.g. text editor) provides the font rendering subsystem (Quartz? on macOS) with Unicode text and a font name. The font renderer analyses the codepoints of the text and decides whether this is simple text (e.g. Latin, Chinese, stand-alone emojis) or complex text (e.g. Latin with many marks, Thai, Arabic, emojis with zero-width joiners). The renderer finds the corresponding outlines in the font file. If the file does not have the required glyph, the renderer may use a similar font, or use a configured fallback font for a poor substitute (white box, black question mark etc.). Then the outlines undergo shaping to compose a complex glyph and line-breaking. Finally, the font renderer hands off the result to the display system.
Apart from the shaping, very little of this has to do with Unicode or encoding. Font rendering already worked that way before Unicode existed, though of course font files and rendering were much simpler 30 years ago. Encoding only matters when someone wants to load or save text from an application.
Summary: investigate TrueType/OpenType font editing software, so you can see what's contained in the files, and font renderers; on Linux, look at the libraries pango and freetype.
Generally speaking, operating system components that use text use the Unicode character set. In particular, font files use the Unicode character set. But, not all font files support all the Unicode codepoints.
When a codepoint is not supported by one font, the system might fall back to another that does support it. This is particularly true of web browsers. But ultimately, if the codepoint is not supported, an unfilled rectangle is rendered. (There is no character for that because it's not a character. In fact, if you were able to copy and paste it as text, it should be the original character that couldn't be rendered.)
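The fallback behaviour can be pictured with a toy model in which each font is just a table from code point to glyph name. This is a deliberately simplified sketch, not how any real renderer is implemented: the font names, glyph names, and the find_glyph function are all hypothetical. The only real-world detail is ".notdef", the conventional name of a font's "missing glyph" box.

```python
NOTDEF = ".notdef"  # conventional name for a font's "missing glyph" box

def find_glyph(code_point, fonts):
    """Walk a fallback chain and return (font name, glyph name)."""
    for font_name, cmap in fonts:
        if code_point in cmap:
            return font_name, cmap[code_point]
    # No font in the chain covers the code point: draw the first font's box.
    return fonts[0][0], NOTDEF

# Hypothetical fonts with made-up glyph names.
fonts = [
    ("ConsoleFont", {0x41: "A"}),
    ("SymbolFont", {0x2714: "heavycheck"}),
]
print(find_glyph(0x2714, fonts))  # found in the fallback font
print(find_glyph(0x6211, fonts))  # found nowhere, so the box is used
```

Note that the text itself is unchanged in both cases; only the choice of glyph differs, which is why copying the "box" still yields the original character.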
In web development, the web page can either supply or give the location of fonts that should work for the codepoints it uses.
Other programs typically use the operating system's rendering facilities and therefore the fonts available through it. How to install a font in an operating system is not a programming question (unless you are including a font in an installer for your program). For more information on that, you could see if the question fits with the Ask Different (Apple) Stack Exchange site.

Miscellaneous characters in xmgrace

xmgrace is wonderful, but it has some problems when dealing with miscellaneous characters.
How can I make the script small l ($\ell$ in LaTeX) in xmgrace?
I believe the only way to do this is to specify a script-like system font. None of the standard ones are suitable so you will have to make sure that a suitable font is installed on your system.
You can change to any font by enclosing the name in
\f{}
e.g.
\f{Symbol}
or
\f{Century-Schoolbook-L-Bold_italic}
You can see a list of the available fonts (and their labels) by going to the Font tool in the Window menu of the xmgrace GUI.
After typing the special character you can return to your original font in a similar way, or by using \0 to get back to the default font 0.

Opening a file containing unicode characters using notepad++ appears corrupted

I'm using the latest version of Notepad++ (6.3.1) on Windows. When I open a file containing Unicode characters, the text appears corrupted despite changing the encoding to UTF-8. It is displayed like "[][][][]". Am I missing something in the settings? Kindly help.
Thanks
This is a font issue. You need a font containing the Japanese characters installed on your computer, and you also need Notepad++ set to use such a font for the kind of text being viewed. But it seems that Notepad++ is capable of using fallback fonts when needed (e.g., when the selected font does not contain all characters appearing in the text), so the problem is probably that no font on your system contains the characters. See e.g. the list East Asian Unicode fonts for Windows computers.
Not a font issue, unfortunately. Try with these characters, with UTF-16 encoding:
🔊, 🎥, 📕 (> U+FFFF)
Conclusion: Notepad++ doesn't have full Unicode support (unlike Windows Notepad or AkelPad)
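What sets these test characters apart is that each one needs two 16-bit units (a surrogate pair) in UTF-16. The arithmetic is shown below as a minimal sketch; the function name surrogate_pair is made up for illustration.

```python
def surrogate_pair(ch):
    """Split a code point above U+FFFF into its UTF-16 surrogate pair."""
    offset = ord(ch) - 0x10000
    if offset < 0:
        raise ValueError("character fits in a single 16-bit unit")
    # The top 10 bits go into the high surrogate, the low 10 bits into the low one.
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

high, low = surrogate_pair("\U0001F50A")  # the speaker character above
print(f"{high:04X} {low:04X}")
```

An editor that treats each 16-bit unit as a separate character will show two boxes for one such character, which is a different failure from a missing glyph.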
Notepad++ is also inconsistent. With a document in UTF-8, using Lucida Console font, create a line
⇐⇑⇒⇓⇔⇕⇖⇗⇘⇙
and enter a newline in the middle: the second line becomes 5 blocks. Then delete the newline, and all 10 characters display properly.
With font MS Gothic, this test always displays proper characters
Notepad++ v7.5.1 (64-bit)
Build time : Aug 29 2017 - 02:38:44
Path : C:\Program Files\Notepad++\notepad++.exe
Admin mode : OFF
Local Conf mode : OFF
OS : Windows 10 (64-bit)
Plugins : mimeTools.dll NppConverter.dll