Is there a way I can add Unicode text to an MBCS MFC menu?

I have an MFC application compiled with the MBCS character set. I have a submenu off of my main menu that I would like to add Unicode characters to. Can that be done?

You can force the use of Unicode strings even in MBCS apps by explicitly calling the Unicode form of an API and passing it a Unicode string.
In your case, ModifyMenuW() is the API that sets the menu item text (assuming the menu item already exists):
// Pass the same ID as uIDNewItem so the item keeps its command ID.
ModifyMenuW(GetMenu()->m_hMenu, ID_APP_ABOUT, MF_BYCOMMAND | MF_STRING, ID_APP_ABOUT, L"\u573F");
This code displays a Chinese ideogram (I have no idea what it means) in place of the original text.
The L prefix marks the literal as a wide (Unicode) string, and \u573F is how you write a Unicode character in an ASCII C++ source file. The W suffix on the API name stands for Wide and denotes the Unicode form of the API.
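To see what the \u573F escape actually denotes, it can help to inspect it outside C++. The following is a small sketch in Python, used here only for illustration and not part of the MFC code. It shows that the escape names a single code point that fits in one UTF-16 code unit, which is the encoding the wide (W) Windows APIs expect.

```python
import unicodedata

# The same escape as in the C++ literal L"\u573F".
ch = "\u573F"

code_point = ord(ch)                 # numeric value of the character
name = unicodedata.name(ch)          # formal Unicode name
utf16_le = ch.encode("utf-16-le")    # byte layout of one wide character on Windows

print(f"U+{code_point:04X} {name}")
print("UTF-16LE bytes:", utf16_le.hex())
```

A character above U+FFFF would need two UTF-16 code units (a surrogate pair) instead of one.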
Note that if your goal is to translate the full UI of your app, that is a completely different story: the method I showed here is only suitable for one-shot calls. You can't create a full UI that way.
You can translate your MBCS app to Japanese, Russian, or whatever else without switching to Unicode (although making that switch would be a very good idea, it can be costly for legacy apps).
You have two friends to help you out there: appTranslator lets you very easily translate your app and manage your translations (disclaimer: this is my own ad ;-), and Microsoft AppLocale helps you test MBCS apps in different code pages without actually changing the code page of your computer (which requires a reboot).

Related

What is this character: 🔖 ? Where can I see the similar characters?

🔖
I am not sure whether everyone can see the above character, but I can. I got it when I typed "booknote" in Chinese on my iPhone. To my surprise, this character seems to be platform-independent: it shows up on my phones, in Chrome on my laptop, and even in the macOS terminal.
Is it an ASCII character? I've never seen colorful characters like this before. How long have these been around? And where can I get a list of similar characters?
Here: http://www.unicode.org/charts/nameslist/index.html
You put the character on an HTML page. All characters on an HTML page are from the Unicode character set. Characters that are not in the Unicode character set either soon will be or are too specialized to be of general use.
The Unicode Consortium occasionally publishes a new version of the character set. Since you ask about the kind of character, the common partitions of the character set are blocks, categories, and—stretching a bit—which version the character was added in. Some characters are in a script (for a language writing system), some are not. You see the block and category of 🔖 at http://www.fileformat.info/info/unicode/char/1f516/index.htm.
The Unicode character set is published in text files called the Unicode Character Database (UCD), as well as many supplementary documents and webpages. The data includes important information about usage and relationships. For example, for applicable characters, which character is considered the uppercase form of another in a particular language.
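Much of the UCD can be queried programmatically. As a rough sketch, Python's standard unicodedata module (one of several ways to read this data; it does not expose block names) reports the name and general category of the bookmark character, and the default, locale-independent case mapping gives the uppercase form of a letter.

```python
import unicodedata

bookmark = "\U0001F516"   # the character from the question

info = {
    "code_point": f"U+{ord(bookmark):04X}",
    "name": unicodedata.name(bookmark),
    "category": unicodedata.category(bookmark),  # "So" means Symbol, other
}

# Case mappings are UCD data too: small omega and its uppercase form.
upper_omega = "\u03C9".upper()

print(info)
print(f"U+{ord(upper_omega):04X}", unicodedata.name(upper_omega))
```

Language-specific case rules (such as Turkish dotted and dotless i) are not covered by this default mapping.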
To see any character, you have to use a font that presents it. This can be a problem for some characters. There is probably no one font that presents every Unicode character as it was meant to be.
You mentioned ASCII. Although it is used every day in HTTP headers and other specialized and historical applications, ASCII is such a limited character set that it hasn't been in general use for decades.
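A quick way to check the ASCII question yourself: ASCII covers only code points 0 through 127. The sketch below (Python, with a hypothetical helper name is_ascii) tests that range and shows the four bytes the bookmark character takes in UTF-8.

```python
bookmark = "\U0001F516"

def is_ascii(ch: str) -> bool:
    """ASCII covers only code points 0 through 127."""
    return ord(ch) < 128

utf8_bytes = bookmark.encode("utf-8")

print(is_ascii("A"), is_ascii(bookmark))
print(len(utf8_bytes), utf8_bytes.hex())
```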

Unicode for Contextual forms of ټ,ګ,ځ,څ,ڼ,ښ,ډ,ۍ,ړ,ې in Pashto language

I am developing a program that gives the correctly shaped form of text. For example, if I write سلام, it gives FEB3, FEE0, FE8E and FEE2, which are the Unicode code points of سـ, ـلـ,ﺎ,ـم. But if I write ټول, there is a code point for the character ټ, which is 067C, but none for ټـ, its initial contextual form.
So I found the code points for the isolated forms of ټ,ګ,ځ,څ,ڼ,ښ,ډ,ۍ,ړ,ې on Wikipedia, but I can't find code points for their contextual forms,
for example those of ټـ ,ـټـ,ـټ.
I would appreciate a response if anyone knows the solution to this problem.
Thanks.
A Unicode character is intended to be abstract in the sense that it doesn't have a particular presentation form. The preferred way to display cursive scripts like Arabic is to store the standard, non-contextual forms, and convert them to their cursive forms at display time - that is, as one of the final stages of a text display system in an operating system or word processor.
The cursive forms are usually provided as glyphs in the font, and are chosen using information in tables in the font file embodying the contextual rules.
Unicode stores quite a large number of Arabic contextual forms, but only for compatibility with older encodings, and with traditional metal type, for which only a finite number of physical glyphs can be supplied. Unfortunately for your purposes, these contextual forms don't cover all the extended characters used in languages other than Arabic, such as the example you give, which is U+067C ARABIC LETTER TEH WITH RING, used in Pashto.
It's very unlikely that further contextual Arabic forms will be added, in my opinion. Therefore your proposed program cannot be made to work, at least according to its current design.
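The gap described above can be checked against the Unicode Character Database. The sketch below is one possible way to do it in Python, not an official algorithm: it reads the compatibility decompositions of the two Arabic Presentation Forms blocks and builds a table from each base letter to its encoded contextual forms. The function name presentation_forms is made up for this example.

```python
import unicodedata

FORMS = ("isolated", "final", "initial", "medial")

def presentation_forms():
    """Map base letter -> {form: presentation code point}, read from the UCD."""
    table = {}
    # Arabic Presentation Forms-A (U+FB50..U+FDFF) and -B (U+FE70..U+FEFF).
    for cp in list(range(0xFB50, 0xFE00)) + list(range(0xFE70, 0xFF00)):
        parts = unicodedata.decomposition(chr(cp)).split()
        # Keep entries like "<initial> 0633"; skip ligatures with several bases.
        if len(parts) == 2 and parts[0].strip("<>") in FORMS:
            base = int(parts[1], 16)
            table.setdefault(base, {})[parts[0].strip("<>")] = cp
    return table

table = presentation_forms()
seen = table.get(0x0633, {})       # ARABIC LETTER SEEN
teh_ring = table.get(0x067C, {})   # ARABIC LETTER TEH WITH RING (Pashto)

print({form: f"U+{cp:04X}" for form, cp in seen.items()})
print(teh_ring)
```

SEEN comes back with all four forms, while TEH WITH RING comes back empty, so a program built on these code points has nothing to emit for it.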
Earlier Unicode versions included separate codes for the different contextual forms of most Arabic letters. The Arabic script is used to write Pashto, Farsi, Urdu, and a few other languages. The letters used in Arabic, Farsi, and maybe a couple more languages were assigned a code for each of their forms. However, letters used only by less widely taught languages like Pashto, which you are asking about, were assigned codes for the isolated forms only. Later versions of Unicode settled on assigning a single code to each letter, which left the Pashto-only letters with codes for just their isolated forms.
In fact there was never a need for a separate code per form; that was a poor decision in the earlier versions. A rendering engine (in editors and other programs that deal with plain text) should account for the different forms of each letter and display the correct one according to its position.

Displaying Chinese characters on a form from an INI File

My plugin reads the control caption text from an INI file (ANSI as UTF-8 encoding) in order to display multiple languages. Key point being it is a plugin, I have no control nor ability to change this INI file format or file type.
They are currently being read into my plugin with TINIFile.ReadString and stored as a string. I can modify this (data type, read method, etc) as needed.
The main application reads from its own application language files that are UCS-2 Little Endian encoded as a TXT file. These display fine when the language is changed, even when the Windows OS is kept in English (in other words no OS locale changes need to be made for the application to switch display languages).
My plugin's form cannot display Asian characters (Chinese, Japanese, Korean, etc). English language is fine.
I have tried various fonts, using various combinations of AnsiString, String, etc. What am I missing to be able to display Asian characters on the form? I have not found a similar question to what I'm trying to do specifically with how my language text is being read into the plugin.
If the .INI file reader does not interpret the contents of the values and passes all bytes through transparently, then you need to decode the raw bytes as UTF-8 into a Unicode string yourself, rather than letting them be treated as text in the system's ANSI code page.
There is a similar question at Delphi 2010: how do I convert a UTF8-encoded PAnsiChar to a UnicodeString? that explains how to do the conversion. You may need to extract the contents into a RawByteString to avoid the implicit conversions.
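The underlying issue is independent of Delphi, so here is a small sketch in Python (for illustration only; the caption bytes are a made-up example) of what happens when UTF-8 bytes are interpreted in a Western ANSI code page instead of being decoded as UTF-8.

```python
# Bytes as they would sit in a UTF-8 encoded INI value (two Chinese characters).
raw = b"\xe4\xb8\xad\xe6\x96\x87"

# Wrong: treating the bytes as the Western "ANSI" code page (Windows-1252).
mojibake = raw.decode("cp1252")

# Right: decoding the same bytes as UTF-8.
caption = raw.decode("utf-8")

print(len(mojibake), len(caption))
```

The wrong decoding yields six unrelated Latin characters, while the UTF-8 decoding yields the two intended characters. In Delphi the equivalent step is the UTF-8 to UnicodeString conversion described in the linked question.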

How can I support Unicode characters throughout my VB 6.0 application

I am facing a problem in my VB 6.0 application: Unicode characters are not supported. I need to store Chinese characters in a field of a recordset (the size of each field is set by the program itself). When we put Chinese characters into the field, we get a "Multiple-step operation" error because the field is not large enough to hold them. The error does not occur if we set the system locale to Chinese on the server (Control Panel > Region and Language > Administrative tab > Change system locale... > Chinese).
However, changing that setting also changes the time settings of our application. How can we solve this problem without changing the Control Panel setting?
Please help.
Thanks in advance.
In Windows, you can set your regional settings to Chinese, while keeping the time and date format. http://www.techpavan.com/2009/04/07/change-time-format-windows/
For using Unicode in Visual Basic 6 applications, here is an article with thorough explanations and examples: http://www.example-code.com/vb/vbUnicode1.asp
Quoting this link:
Internally, VB6 stores strings as Unicode. Your VB6 program is capable of manipulating strings in any language containing any character -- whether it's Chinese, Japanese, Icelandic, Arabic, etc. It's fully Unicode capable. A single string may contain characters in multiple languages. You can save these strings to databases, files, etc., and there shouldn't be a problem. Problems arise only when trying to display (i.e. render the glyphs) for foreign characters in the standard VB6 controls.
When displaying a string, the standard VB6 textbox and label controls do an implicit (and internal) conversion from Unicode to ANSI. This is the confounding behavior that causes all the trouble. Internal to VB6, the runtime is converting Unicode to the current Windows ANSI code page identifier for the operating system. There is no way to change this conversion short of changing the ANSI code page for the system.
The standard VB6 textbox and label controls display the ANSI bytes according to a character encoding that you can specify. After the Unicode-to-ANSI conversion, VB6 then attempts to display the character data according to the control's Font.Charset property, which if left unchanged is equal to the ANSI charset. Changing the control's Font.Charset changes the way VB6 interprets the "ANSI" bytes. In other words, you're telling VB6 to treat the bytes as some other character encoding instead of "ANSI". Note: VB6 is capable of displaying characters in all the major languages. It simply needs to be told to do so, and the correct bytes need to be in place internally for it to happen.
Try setting the font on those controls to Lucida Sans Unicode to add Unicode support.
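The Unicode-to-ANSI conversion described in the quoted article can be imitated outside VB6. This Python sketch is only an illustration of the principle, not of the VB6 runtime itself: it converts the same two Chinese characters to a Western code page and to the Simplified Chinese code page (936).

```python
text = "\u4e2d\u6587"  # two Chinese characters

# Western system code page (Windows-1252): the characters cannot be represented.
western = text.encode("cp1252", errors="replace")

# Simplified Chinese system code page (936): two bytes per character.
chinese = text.encode("cp936")

print(western, chinese.hex())
```

Under the Western code page both characters are lost and replaced with question marks, which is why the result depends on the system locale.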

Using unicode / utf-8 in programmers editors

There are a lot of programmers editors that claim to support unicode / utf-8. I've tried a number of them (UltraEdit, jedit, emedit) but none of them tell you how to actually enter unicode characters into a file. Some of them tell you how to change the default file encoding to utf-8 or how to select a font that has good support for utf-8, but not how to enter utf-8 into a file using their editor.
The Go language (and some others) support utf-8 and I like the idea of using the actual utf-8 symbols for variables instead of variables with names like omega. I haven't found a programmers editor yet that actually allows you to do this, though.
The only editor / word processor that I've found that tells you how to enter unicode is Microsoft Word. Type the unicode value and Alt+X and Word converts it. To get the Greek letter omega type "03c9" followed by Alt+X. UltraEdit will let you copy utf-8 from a web page into it, but their docs don't say how to actually enter utf-8 in a file, and their tech support people don't know either.
This should be simple, but seems to be completely undocumented. Is there some key combination convention that lets you enter unicode into these editors that supposedly support unicode, the way that Ctrl-F is widely used for search?
Thanks.
The standard programmer's editor vim(1) supports Unicode input even if your operating system is too broken to do so (are there any such, still?).
Just enter ^VuXXXX, where XXXX represents up to four hex digits.
That covers the ~6% of Unicode code points allocated to the Basic Multilingual Plane.
For the rest, use ^VUXXXXXXXX (capital U, up to eight hex digits).
Otherwise, just use your mouse.
A few techniques I use if an editor is lacking:
Use the Windows charmap.exe utility to select characters and paste into a document.
Install an input method editor (IME) to write in a particular language.
Windows ALT keycodes.
Better to set your keyboard to generate Unicode characters across all Windows applications than to rely on a single application's custom input feature IMO.
Use the EnableHexNumpad feature and you can type any character in the Basic Multilingual Plane using Alt + numpad plus, followed by the hex code. (May not be of much use on a laptop without a numpad though.)
Or if there are particular characters you want to type a lot, find a keyboard layout that allows you to type them directly. For example eurokb might cover it, or you can make your own with MSKLC.
Old question, but you can type a lot of unicode in GNU Emacs or Vim
GNU Emacs: M-x set-input-method RET tex (or C-x RET C-\ tex) will let you type \omega to generate ω
Vim: Vim digraphs can generate unicode; C-k w * in insert mode gives you ω.
deceze hit the nail on the head. (S)he just didn't elaborate. bobince gave a bit more.
And I'm hazarding a guess that you're a developer or tester working on L10N or I18N. I'm also guessing you need to do more than just a few characters here or there, or you'd be satisfied with pasting from another app. So, I'll share some advice. (Note: here, "you" refers to the next person to look here. I'm sure the original poster doesn't care anymore by now. :-))
If you're on Windows 10, install an appropriate keyboard driver that lets you input the characters you want into any application. I'm sure Linux has support for the same sort of thing.
E.g. I'm teaching myself Hindi (हिंदी), so I installed Windows' Hindi (Devanagari) support. I typed "Hindi" in Hindi using that support, then I switched back to US English to do the rest of this post. If all you need are accented characters from Western European languages, you can install the INTL English support and type directly in español or français or whatever.
Don't look at entering Unicode characters as entering some sort of special data amidst your English text. It's just someone else's language. Use their keyboard. Type their language.
I'm writing a flashcard app to help my learning. I'm using the Hindi keyboard support to type characters into Word, WordPad, Excel, and the Visual Studio editor. And that Hindi keyboard support works exactly the same way in all of those apps, as I'd expect it to work in just about any text editor that supports Unicode. And as you saw above, it also works in a simple text edit control in Chrome. No copy and paste. No remembering special codes. It's as ubiquitous as ctrl-F.
It looks like the unicode support in programmers editors (except for some Microsoft products) is mostly read-only. They can open a file with unicode and display the characters, but typing unicode into a file is a different story. If you want to enter unicode in a programmers editor you can copy it from somewhere else (a web page or Microsoft Word or Notepad) and paste it into the editor, but the editors make typing unicode difficult or impossible.
UltraEdit tech support referred me to this web page which explains a lot. Unfortunately none of the solutions worked with UltraEdit.
Microsoft Word and Notepad support unicode entry. Type the unicode value followed by Alt+X and it converts the hexadecimal and displays it. You can then copy and paste it into UltraEdit or one of the other programmers editors. As others have mentioned unicode support depends on support within the operating system as well as the editor.
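If no editor at hand offers such a conversion, the same hex-to-character step is easy to script and paste from. A minimal sketch in Python, with a hypothetical helper name from_hex:

```python
def from_hex(code: str) -> str:
    """Turn a hex code point such as "03c9" into the character it names."""
    return chr(int(code, 16))

omega = from_hex("03c9")
bookmark = from_hex("1F516")   # also works beyond the Basic Multilingual Plane

print([f"U+{ord(c):04X}" for c in (omega, bookmark)])
```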
What got me interested in using unicode in source code files is Mark Summerfield's book Programming in Go. He includes an example .go file that uses unicode. It would be great to use unicode Greek characters for variable names instead of variables named "omega" or "theta".
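As an illustration of what that looks like, here is a short sketch with Greek identifiers. It is written in Python rather than Go, purely as an example (Python 3 also accepts Unicode letters in identifiers), and it is not taken from the book.

```python
def arc_length(r: float, θ: float) -> float:
    """Arc length for radius r and angle θ in radians."""
    return r * θ

ω = 2.0      # angular velocity, rad/s
t = 1.5      # elapsed time, s
θ = ω * t    # angle swept, rad

print(arc_length(1.0, θ))
```

The file has to be saved as UTF-8 for this to work.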
Using unicode in source code is a bad idea, however. Support for unicode in programmers editors is lousy, and developers would have to save or convert their source code files to utf-8 instead of ASCII. Developer tools are just not ready for writing code in unicode, no matter how neat the idea sounds.