Emacs transliteration? - emacs

Is there some way to get transliteration to work in Emacs, sort of like it does in Gmail now? I am particularly interested in getting it to work in Cyrillic.
For reference, Gmail does something like the following:
I can type svoboda and it will output свобода. This allows somebody like me who speaks Russian but cannot type to easily input Cyrillic characters. In Gmail this works with other languages, but I am only really worried about Cyrillic.

I think this will do it:
M-x set-input-method RET cyrillic-translit RET
With the method active, typing svoboda inserts свобода; C-\ toggles it off and on again.
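To avoid retyping the method name in every session, one option is to set it up in the init file. This is only a minimal sketch, assuming a stock GNU Emacs where cyrillic-translit ships with the bundled input methods; the helper name my/cyrillic-on is made up for illustration.

```elisp
;; Make C-\ (toggle-input-method) switch to Cyrillic transliteration
;; without asking for a method name first.
(setq default-input-method "cyrillic-translit")

;; Hypothetical convenience command (the name is illustrative, not part
;; of Emacs): turn the method on in the current buffer without prompting.
(defun my/cyrillic-on ()
  "Activate the cyrillic-translit input method in the current buffer."
  (interactive)
  (set-input-method "cyrillic-translit"))
```

The input method is per-buffer, so toggling it in one buffer leaves the others in plain Latin input.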

Related

Fast unicode input in Emacs with US layout

I would like to have a quick way to input unicode characters with multicharacter sequences. For example to input ä I would type \a. Searching for this, I found agda-input.
While I could adapt agda-input for my use, I don't really need the whole Emacs mode for my purpose. So I was wondering if such a thing already exists.
It is probably also not that difficult to code such an input mode. I would appreciate it if someone could suggest how to do that.
As @legoscia mentioned, you can use the TeX input method for such things, which is probably more general than agda-input (which seems to be specific to a programming language) and is also built in.
(setq default-input-method "TeX")
Then switch to the input method with C-\ or M-x toggle-input-method. You can then type "ä" with \"a. The minibuffer has hints when you type \.
There are other input methods (M-x list-input-methods), but TeX is a good one if you're not concerned with a specific language, or if you know LaTeX.
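If the exact \a style from the question is wanted rather than the TeX sequences, a small custom input method can be defined with Quail, the library behind the built-in methods. The following is a rough sketch only: the method name my-umlaut and the three rules are invented for illustration, and all optional arguments of quail-define-package beyond the docstring are left at their defaults.

```elisp
(require 'quail)

;; Register a hypothetical input method called "my-umlaut".
;; Arguments: NAME LANGUAGE TITLE GUIDANCE DOCSTRING.
(quail-define-package
 "my-umlaut" "UTF-8" "ä" t
 "Illustrative input method: backslash plus a vowel inserts its umlaut.")

;; Rules apply to the package defined just above.
;; Each rule maps a typed key sequence to the character to insert.
(quail-define-rules
 ("\\a" ?ä)
 ("\\o" ?ö)
 ("\\u" ?ü))
```

After evaluating this, M-x set-input-method RET my-umlaut RET should select it like any other method.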

Using unicode / utf-8 in programmers editors

There are a lot of programmers editors that claim to support unicode / utf-8. I've tried a number of them (UltraEdit, jedit, emedit) but none of them tell you how to actually enter unicode characters into a file. Some of them tell you how to change the default file encoding to utf-8 or how to select a font that has good support for utf-8, but not how to enter utf-8 into a file using their editor.
The Go language (and some others) support utf-8 and I like the idea of using the actual utf-8 symbols for variables instead of variables with names like omega. I haven't found a programmers editor yet that actually allows you to do this, though.
The only editor / word processor I've found that lets you enter Unicode by code point is Microsoft Word. Type the hexadecimal code point and then press Alt+X, and Word converts it. To get the Greek letter omega, type "03c9" followed by Alt+X. UltraEdit will let you paste UTF-8 text copied from a web page, but their docs don't say how to actually enter it in a file, and their tech support people don't know either.
This should be simple, but seems to be completely undocumented. Is there some key combination convention that lets you enter Unicode into these editors that supposedly support it, the way that Ctrl-F is widely used for search?
Thanks.
The standard programmer’s editor vim(1) supports Unicode input by code point even if your operating system should be too broken to do so (are there any such, still?).
In insert mode, just enter ^VuXXXX, where XXXX represents up to four hex digits.
That covers the ~6% of Unicode code points allocated to the Basic Multilingual Plane. For the rest, use ^VUXXXXXXXX (capital U, up to eight hex digits).
Otherwise, just use your mouse.
A few techniques I use if an editor is lacking:
Use the Windows charmap.exe utility to select characters and paste into a document.
Install an input method editor (IME) to write in a particular language.
Windows ALT keycodes.
Better to set your keyboard to generate Unicode characters across all Windows applications than to rely on a single application's custom input feature IMO.
Use the EnableHexNumpad feature and you can type any character in the Basic Multilingual Plane using Alt+numpad-plus, then the hex code. (May not be of much use on a laptop without a numpad though.)
Or if there are particular characters you want to type a lot, find a keyboard layout that allows you to type them directly. For example eurokb might cover it, or you can make your own with MSKLC.
Old question, but you can type a lot of unicode in GNU Emacs or Vim
GNU Emacs: M-x set-input-method RET TeX RET (or C-x RET C-\ TeX RET) will let you type \omega to generate ω
Vim: Vim digraphs can generate unicode; C-k w * in insert mode gives you ω.
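Emacs can also take a raw hexadecimal code point, much like Word's Alt+X: C-x 8 RET runs insert-char, which accepts a hex value or a character name. The same thing from Lisp might look like the sketch below; the command name my/insert-hex is invented for illustration.

```elisp
;; Hypothetical helper (the name is illustrative): insert a character
;; given its code point as a hexadecimal string such as "03c9".
(defun my/insert-hex (hex)
  "Insert the character whose code point is the hexadecimal string HEX."
  (interactive "sHex code point: ")
  ;; string-to-number with base 16 parses the hex digits;
  ;; insert-char then inserts that character at point.
  (insert-char (string-to-number hex 16)))

;; Example: (my/insert-hex "03c9") inserts GREEK SMALL LETTER OMEGA.
```

This is not limited to the Basic Multilingual Plane, since insert-char takes any valid code point.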
deceze hit the nail on the head. (S)he just didn't elaborate. bobince gave a bit more.
And I'm hazarding a guess that you're a developer or tester working on L10N or I18N. I'm also guessing you need to do more than just a few characters here or there, or you'd be satisfied with pasting from another app. So, I'll share some advice. (Note: here, "you" refers to the next person to look here. I'm sure the original poster doesn't care anymore by now. :-))
If you're on Windows 10, install an appropriate keyboard driver that lets you input the characters you want into any application. I'm sure Linux has support for the same sort of thing.
E.g. I'm teaching myself Hindi (हिंदी), so I installed Windows' Hindi (Devanagari) support. I typed "Hindi" in Hindi using that support, then I switched back to US English to do the rest of this post. If all you need are accented characters from Western European languages, you can install the INTL English support and type directly in español or français or whatever.
Don't look at entering Unicode characters as entering some sort of special data amidst your English text. It's just someone else's language. Use their keyboard. Type their language.
I'm writing a flashcard app to help my learning. I'm using the Hindi keyboard support to type characters into Word, WordPad, Excel, and the Visual Studio editor. And that Hindi keyboard support works exactly the same way in all of those apps, as I'd expect it to work in just about any text editor that supports Unicode. And as you saw above, it also works in a simple text edit control in Chrome. No copy and paste. No remembering special codes. It's as ubiquitous as ctrl-F.
It looks like the unicode support in programmers editors (except for some Microsoft products) is mostly read-only. They can open a file with unicode and display the characters, but typing unicode into a file is a different story. If you want to enter unicode in a programmers editor you can copy it from somewhere else (a web page or Microsoft Word or Notepad) and paste it into the editor, but the editors make typing unicode difficult or impossible.
UltraEdit tech support referred me to this web page which explains a lot. Unfortunately none of the solutions worked with UltraEdit.
Microsoft Word and Notepad support unicode entry. Type the unicode value followed by Alt+X and it converts the hexadecimal and displays it. You can then copy and paste it into UltraEdit or one of the other programmers editors. As others have mentioned unicode support depends on support within the operating system as well as the editor.
What got me interested in using unicode in source code files is Mark Summerfield's book Programming in Go. He includes an example .go file that uses unicode. It would be great to use unicode Greek characters for variable names instead of variables named "omega" or "theta".
Using unicode in source code is a bad idea, however. Support for unicode in programmers editors is lousy, and developers would have to save or convert their source code files to utf-8 instead of ASCII. Developer's tools are just not ready to write code in unicode no matter how neat the idea sounds.

How to do unicode(hexadecimal) in Facebook to output a character?

How do I write unicode, in hexadecimal, to Facebook "What's on your mind" box?
I have tried writing:
\u00B9
"\u00B9"
&#xb9
"&#xb9"
None of these has worked so far.
(Let me add that I am doing this from a Mac.)
thanks
I think Facebook treats everything you type as plain text, so escape sequences like these are not interpreted. I don't think this is possible.
Try Alt+0185 on the keyboard if you are using Windows, it should have the desired output for you.
The examples in the question are HTML character references. They would need to end with a semicolon, but no matter: they do not work in a Facebook comment. The Alt+0185 answer does not apply either, since the original poster said he was using a Mac, not Windows. I know that I did once enter Greek characters, as I was talking about stars in constellations with Greek designations like α Vir, but I don't recall how I did it, and the Unicode universe may since have been usurped by Facebook's stupid emoticon universe.
On Mac OS X, if you do not wish to use the Character Viewer, you can set your preferences to allow input of Unicode characters by code like in Windows.
How to use the Character Viewer
Entering Unicode characters by code

Does development with scalaz require a Unicode/APL-like keyboard?

Can scalaz be used without a keyboard containing the appropriate Unicode characters or does every Unicode identifier also have an "ASCII" equivalent (and if yes, is there any guarantee that it stays that way)? Are there special keyboard layouts for usage with scalaz?
What's the best practice? Inputting the Unicode identifiers directly or using the ASCII substitutes and using a script to replace them with the Unicode ones before commit?
No, you don't need anything besides ASCII to use Scalaz.
However, most editors and IDEs have some way of automatically or semi-automatically (like, -space) converting a sequence of characters into something else. That takes care of it if you want to keep your source code in Unicode.
Now, the problem with keeping stuff in Unicode is that you might have trouble with some fonts when displaying stuff in web pages, etc. Hell, you might even be forced to convert the code to ASCII for some reason. Yes, it is unlikely, but it is an issue you should be aware of.
This post from Superuser has some information about this.
This wikipedia article on Unicode input might be helpful.
No. Yes. Yes. No. Benign guarantees are for sissies. Write code. I use an appropriate development environment that allows me to type whatever I like.

How can I get the charset of a string/buffer?

I need an elisp function that guesses the charset of some html, and since Emacs already does that when opening a file, I wonder if I can reuse it somehow, perhaps by writing the string in a temporary buffer, setting the correct charset, and getting it. Are there such functions?
Thanks!
See detect-coding-string.
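A minimal sketch of how that function might be wrapped, assuming the HTML is available as a string of raw (still encoded) bytes; the wrapper name my/guess-coding-system is invented for illustration. Note that this looks only at the byte patterns, not at any charset declared inside the HTML.

```elisp
;; Hypothetical wrapper (the name is illustrative) around the built-in
;; detect-coding-string.
(defun my/guess-coding-system (bytes)
  "Return Emacs's best guess at the coding system of the string BYTES."
  ;; A non-nil second argument asks for only the highest-priority
  ;; candidate instead of a list of all plausible coding systems.
  (detect-coding-string bytes t))

;; Example input: Cyrillic text encoded as UTF-8 bytes.
(my/guess-coding-system (encode-coding-string "свобода" 'utf-8))
```

The result is a coding-system symbol that can be passed to decode-coding-string. For text already in a buffer, detect-coding-region does the same job on a region.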
I don't think that Emacs has something built in to guess a character encoding, but it can read character encoding hints in files, such as -*- coding: utf-8 -*-. You can take a look at this external library though. I guess that you're using some web browser for Emacs like w3m, and it probably has something to deal with character encodings based on the HTTP meta-information it receives. This article might also be of some help.