Racket Terminal: Print Unicode characters

Is there a way to set up the Racket terminal so that it can display Unicode characters? It would be helpful, since DrRacket itself supports Unicode file formats.

Try
chcp 65001
before starting Racket. That should change the code page to UTF-8.
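For example, in a cmd.exe session (a sketch; note that the console window also needs a Unicode-capable TrueType font such as Lucida Console or Consolas, set in the window's properties):

chcp 65001
racket
> (displayln "λ")
λ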

Related

Encoding from ANSI when having non-Latin letters

I have a very old program (not a server or anything on the internet) that I think uses the ANSI (Windows-1252) encoding.
The problem is that some inputs to this program are written in Arabic.
However, when I try to read the result, the Arabic words come out as very weird characters. For example, the input "نور" is converted to "äæÑ".
The program output should contain a combination of English words and Arabic words.
E.g., it outputs "Name äæÑ" while the correct output should be something like "Name نور".
In general, the English words are correct and readable with both UTF-8 and ANSI. But the Arabic words are read for example as "���" with UTF-8 and as "äæÑ" with ANSI.
I understand that this is because ANSI doesn't support non-Latin letters.
But what should I do now? How can I convert them back to Arabic?
Note: I know the exact input and the exact output that this program should produce.
Note 2: I don't have the source code of this program. I just want to convert its output file to have the correct words or encoding.
I solved this problem by typing in the terminal:
iconv -f WINDOWS-1256 -t utf8 < my_File.ged > result.ged
I tried to write Java code that does a similar thing, but it didn't really give me the result I wanted.
I also tried the same terminal command with WINDOWS-1252 instead of WINDOWS-1256, but that didn't work. So I guess it is good to try different encodings until one works.
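For reference, here is a minimal Python sketch of the same conversion (standing in for the failed Java attempt; the file names are taken from the iconv command above):

# Read the program's output as Windows-1256 and write it back as UTF-8,
# mirroring: iconv -f WINDOWS-1256 -t utf8 < my_File.ged > result.ged
with open("my_File.ged", "r", encoding="windows-1256") as src:
    text = src.read()
with open("result.ged", "w", encoding="utf-8") as dst:
    dst.write(text)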

Save file with ANSI encoding in VS Code

I have a text file that needs to be in ANSI mode. It specifies only that: ANSI. Notepad++ has an option to convert to ANSI and that does the trick.
In VS Code I don't find this encoding option. So I read up on it and it looks like ANSI doesn't really exist and should actually be called Windows-1252.
However, there's a difference between Notepad++'s ANSI encoding and VS Code's Windows-1252. They pick different code points for characters such as an accented uppercase E (É), as is evident from the document.
When I let VS Code guess the encoding of the document converted to ANSI by Notepad++, however, it still guesses Windows-1252.
So the questions are:
Is there something like pure ANSI?
How can VS Code convert to it?
Check out the upcoming VSCode 1.48 (July 2020) and its browser support.
It also solves issue 33720, where you previously had to force the encoding for the whole project with, for instance:
"files.encoding": "utf8",
"files.autoGuessEncoding": false
Now you can set the encoding file by file:
Text file encoding support
All of the text file encodings of the desktop version of VSCode are now supported for web as well.
As you can see, the encoding list includes ISO 8859-1, which is the standard closest to what "ANSI encoding" could represent.
The Windows code page 1252 was created based on ISO 8859-1 but is not completely identical:
It differs from the IANA's ISO-8859-1 by using displayable characters rather than control characters in the 80 to 9F (hex) range
From Wikipedia:
Historically, the phrase "ANSI Code Page" was used in Windows to refer to non-DOS encodings; the intention was that most of these would be ANSI standards such as ISO-8859-1.
Even though Windows-1252 was the first and by far most popular code page named so in Microsoft Windows parlance, the code page has never been an ANSI standard.
Microsoft explains, "The term ANSI as used to signify Windows code pages is a historical reference, but is nowadays a misnomer that continues to persist in the Windows community."
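So, if "ANSI" in practice means Windows-1252 for your file, a settings entry like this should make VS Code save it that way ("windows1252" is VS Code's own spelling of the identifier; this assumes the default is acceptable workspace-wide):
"files.encoding": "windows1252"
Alternatively, the encoding indicator in the status bar offers "Save with Encoding" to pick it for a single file.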

What's the difference among the various types of 'utf-8' in Emacs?

In Emacs, after typing
M-x revert-buffer-with-coding-system
I could see many types of 'utf-8', for example utf-8, utf-8-auto-unix, utf-8-emacs-unix, etc.
I want to know what's the difference among them.
I have googled them but couldn't find a proper answer.
P.S.
I ask this question because I encountered an encoding problem a few months ago. I wrote a PHP program in Emacs, and in my ~/.emacs I set
(prefer-coding-system 'utf-8)
but when browsing the PHP page in a browser, I found the browser couldn't display the content correctly due to an encoding problem, even though I had written
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
in the page.
But after I used Notepad++ to store the file in UTF-8, the browser could display the content correctly.
So I want to learn more about encoding in Emacs.
The last part of the coding-system name (e.g. mac in utf-8-mac) usually describes the special character used at the end of lines:
-mac: CR, the standard line delimiter of MacOS (until OS X)
-unix: LF, the standard delimiter of Unix systems (so also the BSD-based Mac OS X)
-dos: CR+LF, the delimiter of DOS / Windows
Some additional encoding parameters include:
-emacs: support for encoding all Emacs characters (including non-Unicode ones)
-with-signature: force the usage of the BOM (see below)
-auto: autodetect the BOM
You can combine the different possibilities, which makes up the list shown in Emacs.
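For instance, to save the current buffer as UTF-8 with a BOM and Unix line endings, combining the suffixes above (a minimal sketch):

(set-buffer-file-coding-system 'utf-8-with-signature-unix)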
To get some information on the type of line ending, BOM and charset provided by a coding system, you can use describe-coding-system, bound to C-h C.
Concerning the BOM:
the UTF standard defines a special signature to be placed at the beginning of (text) files to indicate, for the UTF-16 encoding, the order of the bytes (as UTF-16 stores characters in 2 bytes, i.e. 16 bits), also known as endianness: some systems place the most significant byte first (big-endian -> utf-16be), others place the least significant byte first (little-endian -> utf-16le). That signature is called the BOM: the Byte Order Mark.
in UTF-8, each ASCII character is represented by a single byte (extended characters above 127 use a special multi-byte sequence), so specifying a byte order makes no sense; but the signature is still useful to detect a UTF-8 file rather than a plain ASCII text file. A UTF-8 file differs from an ASCII file only in its extended characters, and that can be impossible to detect without parsing the whole file until one is found, whereas the pseudo-BOM makes it visible instantly. (BTW, Emacs is very efficient at such auto-detection.)
FYI, the BOMs are the following bytes when they appear as the very first bytes of a file:
utf-16le : FF FE
utf-16be : FE FF
utf-8 : EF BB BF
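As a quick check outside Emacs, here is a small Python sketch that sniffs those BOM bytes (the file name is hypothetical):

# Report which BOM, if any, starts the given file.
BOMS = {
    b"\xef\xbb\xbf": "utf-8",
    b"\xff\xfe": "utf-16le",
    b"\xfe\xff": "utf-16be",
}
with open("some_file.txt", "rb") as f:
    head = f.read(3)
for bom, name in BOMS.items():
    if head.startswith(bom):
        print("BOM found:", name)
        break
else:
    print("no BOM")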
you can ask Emacs to open a file without any conversion with find-file-literally: if the first line begins with , you are seeing the undecoded UTF-8 BOM (the bytes EF BB BF displayed as if they were Latin-1)
for some additional help while playing with encodings, you can refer to this complementary answer "How to see encodings in emacs"
As @wvxvw said, your issue is probably a lack of BOM at the beginning of the file, which caused it to be wrongly interpreted and rendered.
BTW, M-x hexl-mode is also a very handy tool to check the raw content of a file. Thanks for pointing it out to me (I often use an external hex editor for that, while it can be done directly in Emacs).
Can't say much about the issue, except that after setting
(prefer-coding-system 'utf-8)
(setq coding-system-for-read 'utf-8)
(setq coding-system-for-write 'utf-8)
I haven't had any Unicode problems for more than 2 years.

What encoding in emacs corresponds to LaTeX's ansinew?

As the title says:
Which encoding do I have to choose from Emacs' long list of encodings (accessible e.g. via "change coding for saving file", C-x RET f) so that it matches ansinew in LaTeX, as given by:
\usepackage[ansinew]{inputenc}
In the inputenc manual you can read:
ansinew    Windows 3.1 ANSI encoding, extension of Latin-1 (synonym for cp1252).
In Emacs, cp1252 is an alias of windows-1252, so you can use either of them.
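For example (a sketch; either name should work): type C-x RET f cp1252 RET before saving the buffer, or put a file-local variable on the first line of the .tex file:
% -*- coding: cp1252 -*-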

How can I convert japanese characters to unicode in Perl?

Can you point me to a tool to convert Japanese characters to Unicode?
CPAN gives me Unicode::Japanese; hope this is helpful to start with. You can also look at an article on character encodings in Perl and the Perl documentation on Unicode for more information.
See http://p3rl.org/UNI.
use Encode qw(decode encode);
my $bytes_in_sjis_encoding = "\x88\xea\x93\xf1\x8e\x4f";
my $unicode_string = decode('Shift_JIS', $bytes_in_sjis_encoding); # returns 一二三
my $bytes_in_utf8_encoding = encode('UTF-8', $unicode_string); # returns "\xe4\xb8\x80\xe4\xba\x8c\xe4\xb8\x89"
For batch conversion from the command line, use piconv:
piconv -f Shift_JIS -t UTF-8 < infile > outfile
First, you need to find out the encoding of the source text if you don't know it already (a sketch for guessing it follows the list below).
The most common encodings for Japanese are:
euc-jp: often used on Unixes and some web pages, etc., with greater kanji coverage than shift-jis
shift-jis: Microsoft also added some extensions to shift-jis, which is called cp932 and is often used by non-Unicode Windows programs
iso-2022-jp: a distant third
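Here is a minimal sketch using the Encode::Guess module that ships with Encode to pick among the candidates above (the file name is hypothetical):

use Encode::Guess;
# Slurp the raw bytes, then let Encode::Guess try the usual Japanese suspects.
my $bytes = do { local $/; open my $fh, '<:raw', 'japanese.txt' or die $!; <$fh> };
my $enc = guess_encoding($bytes, qw/euc-jp shiftjis 7bit-jis/);
ref $enc or die "Could not guess the encoding: $enc";
my $text = $enc->decode($bytes);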
A common encoding conversion library for many languages is iconv (see http://en.wikipedia.org/wiki/Iconv and http://search.cpan.org/~mpiotr/Text-Iconv-1.7/Iconv.pm), which supports many other encodings as well as the Japanese ones.
This question seems a bit vague to me; I'm not sure what you're asking. Usually you would use something like this:
open my $file, "<:encoding(cp932)", "JapaneseFile.txt" or die $!;
to open a file with Japanese characters. Then Perl will automatically convert it into its internal Unicode format.