How to display unicode char in kdb console - kdb

I am following the example here: https://code.kx.com/q/kb/unicode/, and it all works as expected, but it is still very inconveinent to work with unicode in a table:
q)select from t
sym name text ..
-----------------------------------------------------------------------------..
apples 蘋果 "\303\277\310\325\322\273\314O\271\373, \341t\311\372\337h\353x\..
bananas 香蕉 "\317\343\275\266\264\254\312\307\322\273\265\300\277\311\277\33..
oranges 橙 "\217\304\267\360\301_\300\357\337_\326\335\201\355\265\304\365r..
Is there a way to display the unicode char properly in kdb console? I know we could use symbol, but I believe symbol types requires different storage, and may not appropriate for free text, so would like to see a way for char types to work.
Or are there any other tool that could enable us to work with unicode smoothly? - I've tried qStudio, unicode seems like not supported at all, even for symbol types
Thanks.

Studio for kdb+ displays tables with unicode characters properly, so does any JetBrains' IDE with KDB+ Studio plugin installed (the plugin is based on Studio's code). Both are cross-platform.
If you are on Windows, try QInsightPad 2.1, it should handle Unicode properly too.

Related

Converting emoji from hex code to unicode

I want to use emojis in my iOS and Android app. I checked the list of emojis here and it lists out the hex code for the emojis. When I try to use the hex code such as U+1F600 directly, I don't see the emoji within the app. I found one other way of representing emoji which looks like \uD83D\uDE00. When using this notation, the emoji is seen within the app without any extra code. I think this is a Unicode string for the emoji. I think this is more of a general question that specific to emojis. How can I convert an emoji hex code to the Unicode string as shown above. I didn't find any list where the Unicode for the emojis is listed.
It seems that your question is really one of "how do I display a character, knowing its code point?"
This question turns out to be rather language-dependent! Modern languages have little trouble with this. In Swift, we do this:
$ swift
Welcome to Apple Swift version 3.0.2 (swiftlang-800.0.63 clang-800.0.42.1). Type :help for assistance.
1> "\u{1f600}"
$R0: String = "😀"
In JavaScript, it is the same:
$ node
> "\u{1f600}"
'😀'
In Java, you have to do a little more work. If you want to use the code point directly you can say:
new StringBuilder().appendCodePoint(0x1f600).toString();
The sequence "\uD83D\uDE00" also works in all three languages. This is because those "characters" are actually what Unicode calls surrogates and when they are combined together a certain way they stand for a single character. The details of how this all works can be found on the web in many places (look for UTF-16 encoding). The algorithm is there. In a nutshell you take the code point, subtract 10000 hex, and spread out the 20 bits of that difference like this: 110110xxxxxxxxxx110111xxxxxxxxxx.
But rather than worrying about this translation, you should use the code point directly if your language supports it well. You might also be able to copy-paste the emoji character into a good text editor (make sure the encoding is set to UTF-8). If you need to use the surrogates, your best best is to look up a Unicode chart that shows you something called the "UTF-16 encoding."
In Delphi XE #$1F600 is equivalent to #55357#56832 or D83D DE04 smile.
Within a program, I use it in the following way:
const smilepage : array [1..3] of WideString =(#$1F600,#$1F60A,#$2764);
JavaScript - two way
let hex = "😀".codePointAt(0).toString(16)
let emo = String.fromCodePoint("0x"+hex);
console.log(hex, emo);

wxTextCtrl OSX mutated vowel

i am using wxMac 2.8 in non-unicode build. I try to read a file with mutated vowels "ü" to a wxtextctrl. When i do, the data gets interpreted as current encoding, but it is a multibyte string. I narrowed the problem down to this:
text_ctrl->Clear();
text_ctrl->SetValue("üüüäääööößßß");
This is the result:
üüüäääööößßß
Note that the character count has doubled - printing the string in gdb displays "\303\274" and similar per original char. Typing "ü" or similar into the textctrl is no problem. I tried various wxMBConv methods but the result is always the same. Is there a way to solve this?
Best regards,
If you use anything but 7 bit ASCII, you must use Unicode build of wxWidgets. Just do yourself a favour and switch to it. If you have too much existing code that was written for "ANSI" build of wxWidgets 2.8 and earlier and doesn't compile with Unicode build, use wxWidgets 2.9 instead where it will compile -- and work as intended.
It sounds like your text editor (for program source code) is in a different encoding from the running program.
Suppose for example that your text entry control and the rest of your program are (correctly) using UTF-8. Now if your text editor is using some other encoding, then a string that looks fine on screen will actually contain garbage bytes.
Assuming you are in a position to help create a pure-UTF8 world, then you should:
1) Encode UTF-8 directly into the string literals using escapes, e.g. "\303" or "\xc3". That's annoying to do, but it means you just don't have to worry about you text editor (or the editor settings of other developers).
2) Then check that the program is using UTF-8 everywhere.

Using unicode / utf-8 in programmers editors

There are a lot of programmers editors that claim to support unicode / utf-8. I've tried a number of them (UltraEdit, jedit, emedit) but none of them tell you how to actually enter unicode characters into a file. Some of them tell you how to change the default file encoding to utf-8 or how to select a font that has good support for utf-8, but not how to enter utf-8 into a file using their editor.
The Go language (and some others) support utf-8 and I like the idea of using the actual utf-8 symbols for variables instead of variables with names like omega. I haven't found a programmers editor yet that actually allows you to do this, though.
The only editor / word processor that I've found that lets you how to enter unicode is Microsoft Word. Type the unicode and Alt+X and Word converts it. To get the Greek letter omega type "03c9" followed by Alt+X. UltraEdit will let you copy utf-8 from a web page into it, but their docs don't say how to actually enter utf-8 in a file, and their tech. support people don't know either.
This should be simple, but seems to be completely undocumented. Is there some key combination convention the lets you enter unicode into these editors that supposedly support unicode the way that Ctrl-F is widely used for search?
Thanks.
The standard programmer’s editor vim(1) supports limited Unicode input even if your operating system should be too broken to do so (are there any such, still?).
Just enter ^VuXXXX, where XXXX represents exactly four hex digits.
That will allow you to enter the ~6% of Unicode allocated to the Basic Multilingual Plane. The rest are forbidden to you.
This may be fixed in a newer release.
Otherwise, just use your mouse.
A few techniques I use if an editor is lacking:
Use the Windows charmap.exe utility to select characters and paste into a document.
Install an input method editor (IME) to write in a particular language.
Windows ALT keycodes.
Better to set your keyboard to generate Unicode characters across all Windows applications than to rely on a single application's custom input feature IMO.
Use the EnableHexNumpad feature and you can type any character in the Basic Multilingual Plane using Alt+numbad-plus,hexcode. (May not be of much use on a laptop without a numpad though.)
Or if there are particular characters you want to type a lot, find a keyboard layout that allows you to type them directly. For example eurokb might cover it, or you can make your own with MSKLC.
Old question, but you can type a lot of unicode in GNU Emacs or Vim
GNU Emacs: M-x set-input-method RET tex (or C-x RET C-\ tex) will let you type \omega to generate ω
Vim: Vim digraphs can generate unicode; C-k w * in insert mode gives you ω.
deceze hit the nail on the head. (S)he just didn't elaborate. bobince gave a bit more.
And I'm hazarding a guess that you're a developer or tester working on L14N or I18N. I'm also guessing you need to do more than just a few characters here or there, or you'd be satisfied with pasting from another app. So, I'll share some advice. (note: here, "you" refers to the next person to look here. I'm sure the original poster doesn't care anymore by now. :-))
If you're on Windows 10, install an appropriate keyboard driver that lets you input the characters you want into any application. I'm sure Linux has support for the same sort of thing.
E.g. I'm teaching myself Hindi (हिंदी), so I installed Windows' Hindi (Devanangari) support. I typed "Hindi", in Hindi using that support, then I switched back to US English to do the rest of this post. If all you need are accented characters from Western European languages, you can install the INTL English support and type directly in español or français or whatever.
Don't look at entering Unicode characters as entering some sort of special data amidst your English text. It's just someone else's language. Use their keyboard. Type their language.
I'm writing a flashcard app to help my learning. I'm using the Hindi keyboard support to type characters into Word, WordPad, Excel, and the Visual Studio editor. And that Hindi keyboard support works exactly the same way in all of those apps, as I'd expect it to work in just about any text editor that supports Unicode. And as you saw above, it also works in a simple text edit control in Chrome. No copy and paste. No remembering special codes. It's as ubiquitous as ctrl-F.
It looks like the unicode support in programmers editors (except for some Microsoft products) is mostly read-only. They can open a file with unicode and display the characters, but typing unicode into a file is a different story. If you want to enter unicode in a programmers editor you can copy it from somewhere else (a web page or Microsoft Word or Notepad) and paste it into the editor, but the editors make typing unicode difficult or impossible.
UltraEdit tech support referred me to this web page which explains a lot. Unfortunately none of the solutions worked with UltraEdit.
Microsoft Word and Notepad support unicode entry. Type the unicode value followed by Alt+X and it converts the hexadecimal and displays it. You can then copy and paste it into UltraEdit or one of the other programmers editors. As others have mentioned unicode support depends on support within the operating system as well as the editor.
What got me interested in using unicode in source code files is Mark Summerfield's book Programming in Go. He includes an example .go file that uses unicode. It would be great to use unicode Greek characters for variable names instead of variables named "omega" or "theta".
Using unicode in source code is a bad idea, however. Support for unicode in programmers editors is lousy, and developers would have to save or convert their source code files to utf-8 instead of ASCII. Developer's tools are just not ready to write code in unicode no matter how neat the idea sounds.

Plone 4.0.5 and Unicode confusion

At first, Im using FreeBSD 8.1, Plone 4.0.5 and testing both Data.fs and RelStorage 1.5.0b2 (Postgresql 9.0.3). Im from Denmark and we use danish letters ("æøå").
Im confused about encoding, but my initial guess is that the best way to go is with Unicode (utf-8). What is the correct way to configure FreeBSD, Plone (and products) and PostgreSQL to comply with Danish letters. Ive already been told that the encoding does not matter for PostgreSQL.
Ive been seeing comments about site.py and sitecustomize.py around when googling for errors - please comment.
Thanks.
Nikolaj G.
Plone and all its add-ons support Unicode by default, you don't need to configure the encoding at any level.
Even when using RelStorage, we only store binary data inside the SQL database and no strings, so there's no de/encoding taking place at this level.
Changing the Python default encoding in site.py or sitecustomize.py is actually harmful and you should not do this. It will only mask actual programming errors inside the code base and can lead to inconsistent data.
Inside the codebase we do use a mixture of both Unicode and utf-8 encoded strings. So generally your code will have to be written in a way to handle both of these. This is unfortunate but a side-effect of us slowly migrating to proper Unicode at all levels.

Toad unicode input problem

In toad, I can see unicode characters that are coming from oracle db. But when I click one of the fields in the data grid into the edit mode, the unicode characters are converted to meaningless symbols, but this is not the big issue.
While editing this field, the unicode characters are displayed correctly as I type. But as soon as I press enter and exit edit mode, they are converted to the nearest (most similar) non-unicode character. So I cannot type unicode characters on data grids. Copy & pasting one of the unicode characters also does not work.
How can I solve this?
Edit: I am using toad 9.0.0.160.
We never found a solution for the same problems with toad. In the end most people used Enterprise Manager to get around the issues. Sorry I couldn't be more help.
Quest officially states, that they currently do not fully support Unicode, but they promise a full Unicode version of Toad in 2009: http://www.quest.com/public-sector/UTF8-for-Toad-for-Oracle.aspx
An excerpt from know issues with Toad 9.6:
Toad's data layer does not support UTF8 / Unicode data. Most non-ASCII characters will display as question marks in the data grid and should not produce any conversion errors except in Toad Reports. Toad Reports will produce errors and will not run on UTF8 / Unicode databases. It is therefore not advisable to edit non-ASCII Unicode data in Toad's data grids. Also, some users are still receiving "ORA-01026: multiple buffers of size > 4000 in the bind list" messages, which also seem to be related to Unicode data.