Displaying Unicode Characters with Arduino

I am currently using the Keyboard.h library on Arduino.
I would like to display the following characters upon pressing a button on my breadboard: ♥ ♦ ♣ ♠
I don't know much about ASCII, Unicode and hexadecimal, so I'm having a hard time figuring this out.
Does anyone know how to do it?
Thanks.

See my answer at
https://arduino.stackexchange.com/a/91365/70109
for how to convert from Unicode to octal for output.
The GCC compiler used by Arduino also does not accept all Unicode escape sequences, such as \u0020. Using octal escapes avoids this problem.
Serial.print("\342\204\211");
will output ℉ (U+2109), provided the receiver has a font for that character.
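The linked answer is not reproduced here, but one way to get the octal escapes for other symbols, such as the four card suits, is to encode the character as UTF-8 and write each byte as a three-digit octal escape. A minimal Python sketch, meant to run on the PC rather than on the Arduino (to_octal_escapes is a hypothetical helper name). Octal escapes in C and C++ stop after at most three digits, so unlike \x escapes they cannot swallow a following character that happens to be a hex digit.

```python
def to_octal_escapes(text: str) -> str:
    """Encode text as UTF-8 and render each byte as a three-digit C octal escape."""
    return "".join(f"\\{byte:03o}" for byte in text.encode("utf-8"))


if __name__ == "__main__":
    # Print a ready-to-paste C/C++ string literal for each symbol
    for symbol in "℉♥♦♣♠":
        print(f'{symbol}  "{to_octal_escapes(symbol)}"')
```

For ♥ this yields \342\231\245, so Serial.print("\342\231\245"); should send a heart to a terminal that decodes UTF-8. It does not by itself make Keyboard.print type the symbol, because Keyboard sends key presses rather than raw bytes.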

Try Keyboard.print("\uUNICODE_VALUE");
Unicode values can be found at: http://www.unicode.org/charts/
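Instead of searching the charts by hand, the values can also be looked up with a few lines of Python on a PC. In this sketch (code_point_table is an illustrative name), the four hex digits after U+ are the value that goes after \u:

```python
import unicodedata


def code_point_table(chars: str) -> list:
    """Return (character, U+XXXX notation, official Unicode name) for each char."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in chars]


for row in code_point_table("♥♦♣♠"):
    print(*row)
# ♥ U+2665 BLACK HEART SUIT
# ♦ U+2666 BLACK DIAMOND SUIT
# ♣ U+2663 BLACK CLUB SUIT
# ♠ U+2660 BLACK SPADE SUIT
```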
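If that doesn't work (the Keyboard library only sends key presses for ASCII characters, so it may well not), on Linux you can press Ctrl+Shift+U, type the Unicode value in hex, and press Enter. This shortcut is handled by the GTK/IBus input method, so not every application honors it. For example: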
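#include <Keyboard.h>

// Types a character via the Linux Ctrl+Shift+U input method.
// val: Unicode code point (e.g. 0x2665 for ♥); time: pause in ms between steps.
// Call Keyboard.begin() in setup() before using this.
void typeUnicode(unsigned long val, int time){
  // Hold Ctrl+Shift and tap U to start hex entry
  Keyboard.press(KEY_LEFT_CTRL);
  Keyboard.press(KEY_LEFT_SHIFT);
  Keyboard.press('u');
  delay(time);
  Keyboard.releaseAll();
  delay(time);
  // Type the code point in hex; the newline from println is sent as Enter to commit it
  Keyboard.println(String(val, HEX));
  delay(time);
}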
On Windows you have "Alt codes", but I'm not sure how they work, since I'm a Unix geek.

Related

Is it possible to use unicodes with hex values over 0xFF with Keil uVision 5?

I am trying to print ∆ to the console. I tried printf("\u0394"); but got the following warning:
../Src/main.c(322): warning: #3488-D: Unicode character with hex value 394 not representable in the system default code page.
Am I missing an #include or #pragma required to use Unicode with uVision v5?
What is the system default code page?
Your code page could be anything, since you haven't described the operating environment.
One thing code pages do is map the bytes 0-255 to specific Unicode code points. Since there are 1,114,112 possible Unicode code points but only 256 byte values, you'll only be able to print the 256 characters mapped by whatever your code page is. Those characters don't have to be U+0000 to U+00FF for bytes 0-255 (unless the code page is ISO-8859-1, aka Latin-1, where that actually is the mapping). See, for example, code page 1252.
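To make the code-page point concrete, the following Python sketch uses Python's built-in codecs as stand-ins for a compiler's code page (it runs on a PC and says nothing about which code page Keil actually uses on your machine; representable is an illustrative name). The same byte means different characters under different code pages, and U+0394 simply has no slot in Windows-1252:

```python
def representable(ch: str, codepage: str) -> bool:
    """True if ch can be encoded with the given Python codec."""
    try:
        ch.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


# Byte 0x80 maps to different code points in different code pages
print(hex(ord(bytes([0x80]).decode("cp1252"))))   # 0x20ac (euro sign) in Windows-1252
print(hex(ord(bytes([0x80]).decode("latin-1"))))  # 0x80 in ISO-8859-1

# U+0394 GREEK CAPITAL LETTER DELTA: absent from Windows-1252, present in Windows-1253 (Greek)
print(representable("\u0394", "cp1252"))  # False
print(representable("\u0394", "cp1253"))  # True
```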
Keil's compiler is for embedded systems, and as such the notion of a "console" is a bit limited. You need to figure out how your console really works. There are some display modules that simply have a hardcoded ASCII character set in ROM; they're not going to display ∆ no matter what you do.

Unicode vector over a character string

I'm using Python 3.5 and PyQt5, and I need to print a character with a vector arrow above it.
I know I have to use a Unicode code point, and I tried the following instruction:
myLabel = QLabel(b"\U+20D6".encode('utf-16','ignore'))
Nothing worked. It does not work with any encoding (UTF-8, UTF-16, etc.).
My goal is to put an arrow above a character; according to a tutorial I found on the web, I have to use the b"\U+20D6" code point.
Do you know the right way to do this?
Thanks in advance.
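A sketch of one way to do this, not tested with PyQt5 here: U+20D6 is COMBINING LEFT ARROW ABOVE (U+20D7, COMBINING RIGHT ARROW ABOVE, is the one usually used for vectors). "U+20D6" is only a notation; in Python 3 source the character is written "\u20d6" inside an ordinary str. In b"\U+20D6" the \U is not an escape at all, and bytes objects have no .encode() method, which is why the attempt above fails. A combining character goes after the letter it decorates (with_arrow is an illustrative name):

```python
import unicodedata

COMBINING_LEFT_ARROW_ABOVE = "\u20d6"
COMBINING_RIGHT_ARROW_ABOVE = "\u20d7"


def with_arrow(base: str, arrow: str = COMBINING_RIGHT_ARROW_ABOVE) -> str:
    """Return base followed by a combining arrow, which renders above it."""
    return base + arrow


text = with_arrow("v")
for ch in text:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+0076 LATIN SMALL LETTER V
# U+20D7 COMBINING RIGHT ARROW ABOVE

# Assumed PyQt5 usage (QLabel accepts a str directly, no encoding step):
# myLabel = QLabel(text)
```

Whether the arrow is placed neatly above the letter depends on the font's support for combining marks.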

What's the ASCII character code for '—'?

I am working on decoding text. I am trying to find the ASCII character code for the — character (not to be confused with -). I have tried unsuccessfully. Does anybody know how to convert it?
Quotation from Wikipedia (Em dash):
When an actual em dash is unavailable—as in the ASCII character set—a double ("--") or triple hyphen-minus ("---") is used. In Unicode, the em dash is U+2014 (decimal 8212).
The em dash character is not part of the ASCII character set.
— is known as an em dash. Its code point is U+2014 (written \u2014 in many programming languages). It is not an ASCII character, so you cannot decode it with the ASCII character set, because it is not in the ASCII table. You would probably want to use UTF-8 instead.
Windows
On Windows, with a keyboard that has a numeric keypad:
Use Alt+0150 (en dash), Alt+0151 (em dash), or Alt+8722 (minus sign) using the numeric keypad.
This character does not exist in ASCII, only in Unicode, where it is usually encoded as UTF-8.
In UTF-8, non-ASCII characters are encoded as sequences of 2 to 4 bytes, all of which lie outside the ASCII range of 0 through 127, so none of them is a valid ASCII code.
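A quick check in Python (used here only as a convenient calculator) confirms this for the em dash: its UTF-8 form is three bytes, every one above 127, and encoding it as ASCII fails:

```python
EM_DASH = "\u2014"

utf8 = EM_DASH.encode("utf-8")
print(" ".join(f"{b:02x}" for b in utf8))  # e2 80 94
print(all(b > 127 for b in utf8))          # True: no byte falls in the ASCII range

try:
    EM_DASH.encode("ascii")
except UnicodeEncodeError as err:
    print("not ASCII:", err.reason)        # not ASCII: ordinal not in range(128)
```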
One suspects that the foregoing only partly answers your question, but if so then this is probably because your question is, inadvertently, only partly asked. For further details, you can extend your question with more specifics.
The character — is not part of the ASCII set.
But if you want to convert it to some other format (like U+hex), you can use this online tool. Put your character into the first green box and click "Convert" (above the box);
further down you'll find a number of different codes, including U+hex:
U+2014
Feel free to edit this answer if the link breaks or leave a comment so I can find a replacement.
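If the online tool is unavailable, the same conversions can be done offline. A minimal Python sketch (describe is an illustrative name) producing a few common notations for one character:

```python
def describe(ch: str) -> dict:
    """Common notations for a single character's code point."""
    cp = ord(ch)
    # \u covers code points up to U+FFFF; larger ones need the 8-digit \U form
    escape = f"\\u{cp:04x}" if cp <= 0xFFFF else f"\\U{cp:08x}"
    return {
        "U+hex": f"U+{cp:04X}",
        "decimal": cp,
        "HTML entity": f"&#{cp};",
        "escape": escape,
    }


print(describe("—"))
# {'U+hex': 'U+2014', 'decimal': 8212, 'HTML entity': '&#8212;', 'escape': '\\u2014'}
```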
Alt+0151 seems to do the trick, though perhaps it doesn't work on all keyboards.
Alt+196: while holding down the Alt key, type 196 on the numeric keypad, then release the Alt key.
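The Alt codes in these answers go through legacy code pages rather than Unicode directly. Assuming the common US-English defaults (Alt plus a leading 0 uses the ANSI code page, Windows-1252; Alt without a leading 0 uses the OEM code page, 437), Python's codecs can show which character each code yields (alt_code_char is an illustrative name). Under those assumptions Alt+196 gives U+2500 BOX DRAWINGS LIGHT HORIZONTAL, which looks like a dash but is not the em dash:

```python
def alt_code_char(number: int, leading_zero: bool) -> str:
    """Character an Alt code 0-255 produces, assuming cp1252 (ANSI) and cp437 (OEM)."""
    codepage = "cp1252" if leading_zero else "cp437"
    return bytes([number]).decode(codepage)


for number, zero in [(150, True), (151, True), (196, False)]:
    ch = alt_code_char(number, zero)
    label = f"Alt+{'0' if zero else ''}{number}"
    print(f"{label}: U+{ord(ch):04X}")
# Prints:
# Alt+0150: U+2013
# Alt+0151: U+2014
# Alt+196: U+2500
```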

Scintilla Supports Unicode? What about SCI_GETCHARAT?

Does Scintilla really support Unicode? If so, why does SCI_GETCHARAT return a char value (cast to LRESULT)?
From the SCI_SETCODEPAGE docs:
Code page SC_CP_UTF8 (65001) sets Scintilla into Unicode mode with the document treated as a sequence of characters expressed in UTF-8. The text is converted to the platform's normal Unicode encoding before being drawn by the OS and thus can display Hebrew, Arabic, Cyrillic, and Han characters.
You will have to examine the byte you retrieve with SCI_GETCHARAT(pos) and, depending on the top bits of that, maybe read SCI_GETCHARAT(pos+1) and beyond in order to get the Unicode code point. (See here.)
Edit:
For some C++ code that does this, see below (search for SciMoz::GetWCharAt):
http://vacuproj.googlecode.com/svn/trunk/npscimoz/npscimoz/oldsrc/trunk.nsSciMoz.cxx
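The approach described above (read the lead byte, use its top bits to decide how many continuation bytes follow) can be sketched in Python, with buf[pos] standing in for SCI_GETCHARAT(pos). This illustrates UTF-8 decoding only; it is not Scintilla's API, and code_point_at is a hypothetical name. Because SCI_GETCHARAT returns a char cast to LRESULT, bytes above 0x7F may come back negative on platforms where char is signed, so masking with 0xFF before testing the bits is a sensible precaution.

```python
def code_point_at(buf: bytes, pos: int) -> tuple:
    """Decode the UTF-8 sequence starting at buf[pos].

    Returns (code_point, byte_length). Raises ValueError for a byte that cannot
    start a sequence or a malformed continuation byte. Overlong forms are not rejected.
    """
    lead = buf[pos] & 0xFF        # no-op for Python bytes; needed for a signed char
    if lead < 0x80:               # 0xxxxxxx: plain ASCII
        return lead, 1
    if lead >> 5 == 0b110:        # 110xxxxx: 2-byte sequence
        length, cp = 2, lead & 0x1F
    elif lead >> 4 == 0b1110:     # 1110xxxx: 3-byte sequence
        length, cp = 3, lead & 0x0F
    elif lead >> 3 == 0b11110:    # 11110xxx: 4-byte sequence
        length, cp = 4, lead & 0x07
    else:
        raise ValueError(f"byte 0x{lead:02X} at {pos} is not a UTF-8 lead byte")
    for i in range(1, length):
        cont = buf[pos + i] & 0xFF
        if cont >> 6 != 0b10:     # continuation bytes look like 10xxxxxx
            raise ValueError(f"bad continuation byte at {pos + i}")
        cp = (cp << 6) | (cont & 0x3F)
    return cp, length


text = "a\u2014\U00010800".encode("utf-8")
pos = 0
while pos < len(text):
    cp, n = code_point_at(text, pos)
    print(f"offset {pos}: U+{cp:04X}, {n} byte(s)")
    pos += n
# offset 0: U+0061, 1 byte(s)
# offset 1: U+2014, 3 byte(s)
# offset 4: U+10800, 4 byte(s)
```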
It was a long time ago, but if I remember correctly, Scintilla is not a native Unicode application. Still, it has some Unicode support.
First, the function should be named SCI_GETBYTEAT, because it returns a byte from the internal UTF-8 buffer.
Also, the application supports Unicode keyboard input, so it does have some Unicode support :)

Displaying Unicode characters above U+FFFF on Windows

The application I'm developing with EVC++ 4 runs on Windows CE 5 and should support Unicode (as far as I know, wchar_t uses UTF-16 on Windows, so I'm using that). I want to test it with "more exotic" characters, especially characters that take 4 bytes in UTF-16 rather than 2. So I'm trying to display such characters in a text editor (at the moment on my desktop PC with Windows XP, not on the embedded device).
But I haven't managed to do so yet. As an example, I've chosen this character.
As mentioned here, "MPH 2B Damase" should support this character. So I downloaded the font and put it into Windows\Fonts. I created a text file using a hex editor (just to be sure) with the following content:
FFFE D802 DC00
When I open it with Notepad (which should be Unicode-capable, right?) and use the downloaded font, it doesn't display one character, as intended, but these two:
˘Ü
What am I doing wrong? :)
Thanks!
hrniels
Edit:
Flipping the BOM, as suggested, doesn't work. Notepad (and all the other editors I tried) displays two squares in this case. Interestingly, if I copy the two squares here (with Firefox), I see the right character:
I've also tried it with Komodo Edit, with the same result.
Using UTF-8 doesn't help Notepad either.
What happens if you put the byte order mark the other way around?
FEFF D802 DC00
(At the moment the byte sequence is being interpreted as the two characters U+02D8 U+00DC, so hopefully flipping the BOM will cause the bytes to be read in the intended order)
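This byte-order explanation can be checked with Python's utf-16 codec, which reads the BOM to choose the byte order and then drops it. The hex strings are the file contents from the question (decode_utf16_file is an illustrative name):

```python
def decode_utf16_file(hex_bytes: str) -> str:
    """Decode raw file bytes (given as hex) using BOM-based byte-order detection."""
    return bytes.fromhex(hex_bytes).decode("utf-16")


for content in ["FFFE D802 DC00", "FEFF D802 DC00"]:
    text = decode_utf16_file(content)
    print(content, "->", [f"U+{ord(ch):04X}" for ch in text])
# Output:
# FFFE D802 DC00 -> ['U+02D8', 'U+00DC']
# FEFF D802 DC00 -> ['U+10800']
```

With FF FE the codec reads little-endian and gets two BMP characters (˘Ü); with FE FF it reads big-endian and combines the surrogate pair into one character. Since the flipped file decodes correctly and Firefox shows the copied character properly, this suggests the remaining problem is in how the editors render supplementary characters, not in the file.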
Probably you forgot to read the _wfopen() documentation, which describes the encoding parameter. By the way, I assume you are already using Unicode (wchar_t).
I would recommend using UTF-8 in files, with or without a BOM, and forcing _wfopen to use the UTF-8 flag. It looks like _wfopen(L"newfile.txt", L"r, ccs=UTF-8"); will work with UTF-8 files with or without a BOM, and also with UTF-16 files that have a BOM. Do not make the mistake of using ccs=UNICODE: with that flag a file without a BOM is read as UTF-16LE, and it is common to have UTF-8 files without a BOM.
You should really read a little about Unicode before trying to work with it. Think of this as a very good investment: it will save you time if you understand how Unicode works.
Here is a start: http://blog.i18n.ro/newbie-guide-to-unicode/ and do not forget to read the links at the end of the article.
If you really need a simple text editor that lets you experiment with Unicode encodings, use Notepad++ and forget about Notepad.
Your text editor might not like UTF-16. It probably assumes ANSI or UTF-8.
Try typing in the UTF-8 equivalent instead:
0xF0 0x90 0xA0 0x80
This won't help your testing, but will make sure your font isn't at fault. A text editor that does support UTF-16 is Komodo Edit.
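To confirm that these UTF-8 bytes and the UTF-16 surrogate pair from the question encode the same character, the surrogate-pair arithmetic can be written out. A Python sketch (to_surrogate_pair is an illustrative name):

```python
def to_surrogate_pair(cp: int) -> tuple:
    """Split a code point above U+FFFF into UTF-16 high and low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary-plane code points need surrogates")
    offset = cp - 0x10000             # 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


ch = "\U00010800"
high, low = to_surrogate_pair(ord(ch))
utf8_hex = " ".join(f"{b:02X}" for b in ch.encode("utf-8"))
print(f"UTF-16: {high:04X} {low:04X}")  # UTF-16: D802 DC00
print(f"UTF-8: {utf8_hex}")             # UTF-8: F0 90 A0 80
```

A UTF-16 file needs the two surrogates; a UTF-8 file needs the four bytes. Both describe the single code point U+10800.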