I used the NetBeans IDE to compile and run the program below.
public class Unicode {
    public static void main(String[] args) {
        char a = 3476;
        System.out.println(a);
    }
}
But the output was a box. When I ran the program on the console, it printed a question mark. How can I fix the issue?
You can't display a Unicode character on the Windows console directly from Java, because Java always writes to the console using the application code page (ANSI). However, you could use the JNA APIs to write Unicode characters to the console directly. You would still need to install a monospace font that includes a glyph for the character you're trying to display.
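For example, here is a rough sketch of that approach. The interface mapping and class name are mine, and it assumes JNA 5.x is on the classpath; WriteConsoleW takes UTF-16 text directly, bypassing the code-page conversion entirely:
import com.sun.jna.Native;
import com.sun.jna.Pointer;
import com.sun.jna.ptr.IntByReference;
import com.sun.jna.win32.StdCallLibrary;

public class ConsoleUnicode {
    // Hand-written mapping of the two kernel32 calls we need.
    public interface Kernel32 extends StdCallLibrary {
        Kernel32 INSTANCE = Native.load("kernel32", Kernel32.class);
        Pointer GetStdHandle(int nStdHandle);
        boolean WriteConsoleW(Pointer hConsoleOutput, char[] lpBuffer,
                int nNumberOfCharsToWrite, IntByReference lpNumberOfCharsWritten,
                Pointer lpReserved);
    }

    private static final int STD_OUTPUT_HANDLE = -11;

    public static void main(String[] args) {
        char[] text = "\u0D94\r\n".toCharArray(); // U+0D94 is char 3476 from the question
        Pointer stdout = Kernel32.INSTANCE.GetStdHandle(STD_OUTPUT_HANDLE);
        Kernel32.INSTANCE.WriteConsoleW(stdout, text, text.length,
                new IntByReference(), Pointer.NULL);
    }
}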
Related
I would like some way of doing essentially the following:
if supports_unicode {
    print!("some unicode");
} else {
    print!("ascii");
}
Is there any way in Rust to check whether the output supports Unicode?
Update
I found a way to check whether the device supports Unicode, but it doesn't check whether the current output is set to the correct encoding, nor does it check whether the font supports the full range of Unicode characters. If you're curious, it uses the crate locale-codes 0.3.0, and the code is
locale_codes::codeset::all_names().contains(&String::from("UTF-8"))
But, as I said, this doesn't solve my problem
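For what it's worth, here is that partial check as a runnable program (a sketch only; the function name is mine, and it assumes locale-codes 0.3.0 is declared in Cargo.toml):
fn locale_claims_utf8() -> bool {
    // Only checks the locale's advertised codeset; says nothing about
    // the terminal's current encoding or the font's glyph coverage.
    locale_codes::codeset::all_names().contains(&String::from("UTF-8"))
}

fn main() {
    if locale_claims_utf8() {
        println!("some unicode: \u{1D465}");
    } else {
        println!("ascii fallback: x");
    }
}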
Also, if you want, here is a more specific example of the problem I've been having. In the VSCode integrated terminal (Windows 10 x64, VSCode 1.47), if I run a Rust program that prints the character 𝑥 (U+1D465), I get a variety of results, such as:
It prints the correct character
It prints �
It prints nothing at all
It prints 𝐵 (U+1D435)
I hope this example helps.
I'm just trying to pick up D, having come from C++. I'm sure it's something very basic, but I can't find any documentation to help me. I'm trying to print the character à, which is U+00E0, by assigning it to a variable and then using write() to output it to the console.
I'm told by this website that U+00E0 is encoded as 0xC3 0xA0 in UTF-8, 0x00E0 in UTF-16 and 0x000000E0 in UTF-32.
Note that for everything I've tried, I've tried replacing string with char[] and wstring with wchar[]. I've also tried with and without the w or d suffixes after wide strings.
These methods return the compiler error, "Invalid trailing code unit":
string str = "à";
wstring str = "à"w;
dstring str = "à"d;
These methods print a totally different character (Ò U+00D2):
string str = "\xE0";
string str = hexString!"E0";
And all these methods print what looks like ˧á (note á ≠ à!), which is UTF-16 0x2E7 0x00E1:
string str = "\xC3\xA0";
wstring str = "\u00E0"w;
dstring str = "\U000000E0"d;
Any ideas?
I confirmed it works on my Windows box, so I'm going to type this up as an answer now.
In the source code, if you copy/paste the characters directly, make sure your editor is saving the file in UTF-8 encoding. The D compiler insists on it, so if it gives a compile error about something UTF-related, that's probably why. I have never used Code::Blocks, but an old answer on the web said Edit → Encodings; it is a setting somewhere in the editor regardless.
Or, you can replace the characters in your source code with \uxxxx escapes in the strings. Do NOT use hexString; that is for binary bytes. But your example of "\u00E0" is good, and it will work for any type of string (not just wstring as in your example).
Then, on the output side, it depends on your target, because the program just outputs bytes and it is up to the receiving program to interpret them correctly. Since you said you are on Windows, the key is to set the console code page to UTF-8 so the console knows what you are sending it. The same C function, SetConsoleOutputCP, can be called from D too, leading to this program:
import core.sys.windows.windows;
import std.stdio;

void main() {
    SetConsoleOutputCP(65001);
    writeln("Hi \u00E0");
}
printing it successfully. On older Windows versions, you might need to change your font to see the character too (as opposed to the generic box it shows because some fonts don't have all the characters), but on my Windows 10 box, it just worked with the default font.
BTW, technically the console code page is a shared setting (after your program runs and exits, you can still hit Properties on the console window and see the change reflected there), so you should perhaps set it back when your program exits. You can read the current value at startup with the get function ( https://learn.microsoft.com/en-us/windows/console/getconsoleoutputcp ), store it in a local variable, and set it back on exit: auto ccp = GetConsoleOutputCP(); scope(exit) SetConsoleOutputCP(ccp); SetConsoleOutputCP(65001); right at startup. The scope(exit) will run when the function exits, so doing it in main is kinda convenient. Just add some error checking if you want.
The Microsoft docs don't say anything about setting it back, so it probably doesn't actually matter, but I wanted to mention it just in case. Knowing that the setting is shared and persists can also help in debugging: if the program still works after you comment the call out, it isn't because the call is unnecessary; it's because the code page was set on a previous run and never unset!
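Put together, the save/restore pattern looks like this (a minimal sketch of the snippet above):
import core.sys.windows.windows;
import std.stdio;

void main() {
    // Remember the code page we started with and restore it when main exits.
    immutable oldCP = GetConsoleOutputCP();
    scope(exit) SetConsoleOutputCP(oldCP);

    SetConsoleOutputCP(65001); // 65001 = CP_UTF8
    writeln("Hi \u00E0");
}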
Note that running it from an IDE might not behave exactly the same, because IDEs often pipe the output instead of writing it straight to the Windows console. If that happens, let me know and we can write up some notes about that for future readers too. But you can also open your own console (run the program outside the IDE), and it should display correctly for you.
D source code needs to be encoded as UTF-8.
My guess is that you're putting a UTF-16 character into the UTF-8 source file.
E.g.
import std.stdio;

void main() {
    writeln(cast(char)0xC3, cast(char)0xA0);
}
This will output the character you seek, encoded as UTF-8.
Which you can then hard code like so:
import std.stdio;

void main() {
    string str = "à";
    writeln(str);
}
I'm trying to display Unicode Bengali, a native language of India through a MFC application as below:
CFont *m_pFontSmallBN = new CFont();
m_pFontSmallBN->CreateFont(34, 0, 0, 0, 600, 0, 0, 0, ANSI_CHARSET, OUT_DEFAULT_PRECIS,
    CLIP_DEFAULT_PRECIS, DEFAULT_QUALITY, DEFAULT_PITCH | FF_DONTCARE,
    _T("Ekushey Lalsalu")); // "Ekushey Lalsalu" is the Bengali font name here.

CStatic m_msg_bn;
m_msg_bn.SetFont(m_pFontSmallBN, TRUE);
m_msg_bn.SetWindowText(_T("TEXT IN NATIVE LANGUAGE")); // The text is typed with the font above.
When I run the app on Windows Vista it displays the text perfectly, but on Windows XP it cannot display the Unicode characters properly: compound letters (formed from multiple Unicode characters) of the Bengali language are displayed as separate characters. I ensured that both machines have the font installed and that the character set of my MFC project is set to Unicode.
Could anybody please help me find the issue in the Windows XP environment?
Choosing a font in Windows is tricky. You'd expect the font name to take precedence over all other font characteristics, but that's not always the case. To be sure you're getting the proper font you should make sure all the parameters to CreateFont match the font you want. This article, though old, details the font mapping process: Windows Font Mapping.
Here's a small program that puts up a font selection dialog and dumps the parameters that you can pass to CreateFont to guarantee that you're getting the font you want.
#include <Windows.h>
#include <stdio.h>
#pragma comment(lib, "Comdlg32.lib") // ChooseFont lives in the common dialog library

int wmain(int argc, wchar_t* argv[])
{
    LOGFONT lf = {};
    CHOOSEFONT cf = {sizeof(CHOOSEFONT)};
    cf.lpLogFont = &lf;
    cf.Flags = CF_BOTH | CF_FORCEFONTEXIST;
    if (ChooseFont(&cf))
    {
        // Dump the LOGFONT fields in the order CreateFont expects them.
        wprintf(L"%d,%d,%d,%d,%d,", lf.lfHeight, lf.lfWidth, lf.lfEscapement, lf.lfOrientation, lf.lfWeight);
        wprintf(L"%d,%d,%d,%d,%d,", lf.lfItalic, lf.lfUnderline, lf.lfStrikeOut, lf.lfCharSet, lf.lfOutPrecision);
        wprintf(L"%d,%d,%d,", lf.lfClipPrecision, lf.lfQuality, lf.lfPitchAndFamily);
        wprintf(L"_T(\"%s\")\n", lf.lfFaceName);
    }
    return 0;
}
@Mark I could not add my comment using the "add a comment" link, so I'm adding it here. Even in the XP environment the program displays the same values for the font properties. Another thing: using Notepad on the same system, I see the same improper display. It can display the Bengali font, but the display is improper for the compound letters of Bengali (consonant conjuncts, or a consonant attached to the diacritic form of a vowel). This is probably because XP doesn't have built-in support for complex text in native scripts like Bengali by default; Windows Vista and onward have complex text support enabled by default, so just installing a native Unicode font is enough to view native scripts properly. (On XP, I believe complex script support can be switched on under Control Panel → Regional and Language Options → Languages.)
I would like to concatenate the Rupee symbol '\u20B9' to a String in Java, but I get the following error. I am using JRE 7, and the Java docs say that Java 7 supports Unicode 6.0, which is where the Rupee symbol was added. I have attached my code and its output below.
import javax.swing.JOptionPane;

public class no {
    public static void main(String[] args) {
        String rupee = "\u20B9";
        JOptionPane.showMessageDialog(null, "Total Amount" + rupee);
    }
}
This is not a problem of string concatenation. It's a problem of the display font: it just doesn't support the character. When I try it on my machine, where the standard display fonts have full Unicode support, the dialog shows the Rupee symbol correctly.
You should use a font that has the support, rather than the standard font.
You need a font capable of displaying a glyph for that code point. Since the Rupee symbol is relatively new, that might be hard. There is no problem with your code here: the square you see just means that the font doesn't have a glyph for that character and no suitable fallback font could be found (assuming that Java does font substitution; I'm not terribly sure of that).
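One way to do that is to hand JOptionPane a JLabel with an explicit font. This is a sketch: "Segoe UI Symbol" is just one Windows font that I believe carries the glyph, and Font.canDisplay lets you check before relying on it:
import java.awt.Font;
import javax.swing.JLabel;
import javax.swing.JOptionPane;

public class RupeeDialog {
    public static void main(String[] args) {
        // Verify the chosen font actually has a glyph for U+20B9 first.
        Font font = new Font("Segoe UI Symbol", Font.PLAIN, 14);
        System.out.println("Can display rupee: " + font.canDisplay(0x20B9));

        JLabel label = new JLabel("Total Amount \u20B9");
        label.setFont(font);
        // Passing a component instead of a String keeps the font choice.
        JOptionPane.showMessageDialog(null, label);
    }
}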
How do I let my Eclipse use \uXXXX symbols?
Should I change the font?
Eclipse will never use \u escapes for display in the console window. That's just not in its repertoire.
However, that's probably not what you want.
If you have coded some Java with a \u escape in the source, your first task is to configure the run/debug configuration to use an appropriate encoding for the console window; UTF-8 is usually the right answer. Then you need to select, in the Eclipse preferences, a font that covers the particular character you've chosen. Whatever you do, though, "\uxxxx" will never be what comes out; what you will get is the character specified by your Unicode escape.
If you're just trying to see Unicode output in the console, make sure the font you're using supports Unicode and that the output encoding is set to UTF-8.
When running this in my pretty vanilla install of Eclipse:
System.out.println("\u0CA0_\u0CA0");
I get this as expected in the Eclipse console output:
ಠ_ಠ