I have a list of special unicode characters that I use frequently in one of my files.
To avoid typing (and learning) unicode numbers all the time I would like to just have a line with those characters at the top of my file (it's only 25 symbols) and save/yank them when I need them.
I cannot find the proper shortcut to save the character under the point though...
It's no different to copying anything else. Move point to the character you wish to copy, set the mark with C-SPC, move forward one character so that the region covers the character of interest, and save to the kill ring with M-w.
Or you could do something like this:
(defun my-copy-character-as-kill (pos)
  "Copy the character at point (or POS) to the kill ring."
  (interactive "d")
  ;; Test POS rather than point, so non-interactive calls behave correctly.
  (if (>= pos (point-max))
      (error "End of buffer")
    (copy-region-as-kill pos (1+ pos))
    (when (called-interactively-p 'interactive)
      (let ((print-escape-newlines t))
        (message "%S" (char-to-string (char-after pos)))))))
(global-set-key (kbd "C-c c") 'my-copy-character-as-kill)
Here is another way to go, especially if you don't use a lot of such characters and don't want to fiddle with an input method.
Download library ucs-cmds.el and put it in your load-path (byte-compile it). Then put this in your init file (~/.emacs):
(require 'ucs-cmds)
(define-key global-map [remap insert-char] 'ucsc-insert)
Then use M-- C-x 8 RET and use completion to enter the Unicode name or code point of the character you want. That does two things: C-x 8 RET inserts the character you chose before the cursor. The M-- makes it also create a command with the same name as the character. You can then bind that command to a handy key sequence. For example:
M-- C-x 8 RET greek small letter lambda RET
That defines command greek-small-letter-lambda, which you can bind to some key sequence.
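As a minimal sketch of that last step, a binding might look like the following. The key C-c l is only an illustrative choice, and the command must have been defined (as described above) before the key is pressed:

```elisp
;; Hypothetical key choice: pick any key sequence that is free in your setup.
;; `greek-small-letter-lambda' must already exist when the key is used,
;; e.g. after creating it with M-- C-x 8 RET.
(global-set-key (kbd "C-c l") 'greek-small-letter-lambda)
```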
If you want to define such commands for several Unicode characters at once, you can instead just use macro ucsc-make-commands to do so. See the Commentary in file ucs-cmds.el. You provide a regexp to the macro. It is matched against all Unicode character names. An insertion command is created for each of the characters whose name matches.
Sample command creations:
(ucsc-make-commands "^math") ; Math symbols
(ucsc-make-commands "latin") ; Latin alphabet characters
(ucsc-make-commands "arabic")
(ucsc-make-commands "^cjk") ; Chinese, Japanese, Korean characters
(ucsc-make-commands "^box drawings ")
(ucsc-make-commands "^greek [a-z]+ letter") ; Greek characters
(ucsc-make-commands "\\(^hangul\\|^circled hangul\\|^parenthesized hangul\\)")
Using emacs-24.
Some unicode names are quite long. Some characters have more than one name depending on the context. I would like to add some abbreviations/synonyms. How?
This approach is not so bad, but I have problems with shorter names that alias with longer ones, and it is non-standard, i.e. not consistent with the way other names are entered:
(global-set-key (kbd "C-x g all") "∀")
The approach of putting characters on keys has problems in Emacs, partly because the keymap is already overloaded:
(define-key key-translation-map (kbd "C-~") (kbd "¬"))
As a secondary question, I am curious as to why this confuses emacs (give it a try):
(global-set-key (kbd "C-x g neg") "¬")
What I would like is to hook the abbreviations into the current emacs method for entering unicode characters by name. (I've been using C-x 8 RET name RET - though wish there was a method to do this in fewer key strokes.)
You can easily define a command that inserts a given character (or that chooses from some small set of characters rather than from the entire universe of Unicode characters).
Library ucs-cmds.el can help with this. When you use C-x 8 RET with a negative prefix arg (e.g. C--), it not only inserts the char you choose but it creates a command to insert the char - the command name is the same as the char name. And you can quickly create such commands for whole ranges or other sets of characters (e.g. by matching a regexp). You can of course rename commands to whatever you like, including shorter versions.
But you already know how to bind a key to a keyboard macro that inserts a given character, as you have shown. If it helps to provide a named command for that then ucs-cmds.el can help.
You can also just do that yourself individually, using, for example:
(defun neg (&optional n)
  "Insert \"¬\". With prefix arg N, insert N times."
  (interactive "p")
  ;; N is optional, so default to 1 when called from Lisp without it.
  (dotimes (ii (or n 1)) (insert "¬")))
(global-set-key (kbd "C-x g neg") 'neg)
But you apparently are not very interested in dedicated commands that insert particular characters, and you want to be able to use C-x 8 RET but to type an abbreviation for a character name when it prompts you, instead of trying to match the real character name.
For that, Icicles can help. When you use C-x 8 RET you can match the character name or its code point (or the character itself - useful when the char is easy to type and you want to know its name or code point). You can match any combination of these at the same time.
Matching can be substring, regexp, pcompletion or any of several kinds of fuzzy matching, and you can change the matching behavior on the fly. So you can get the effect of the abbreviations you are asking for, provided you abbreviate in a way that corresponds to matching.
As for your question about (global-set-key (kbd "C-x g neg") "¬"): I think it is a bug. Consider reporting it: M-x report-emacs-bug. This is the error that it raises:
After 0 kbd macro iterations: user-error: No M-x tags-search or M-x tags-query-replace in progress
There are several modes around which provide simplified input for symbols needed by math and logic. For example agda2-mode. http://wiki.portal.chalmers.se/agda
OP:
What I would like is to hook the abbreviations into the current emacs method for entering unicode characters by name. (I've been using C-x 8 RET name RET - though wish there was a method to do this in fewer key strokes.)
What the OP is asking for is:
a) To use the emacs function 'insert-char' with its built-in shortcut 'C-x 8 RET', and
b) To use an alias for completion in the interactive minibuffer for 'insert-char' input.
The issue is that the minibuffer for 'insert-char' has its own TAB completion. If you want to insert the greek small letter epsilon (ε) using TAB completion, you have to input a minimum number of keystrokes like this: "greek" TAB "sm" TAB "l" TAB "ep". Even if you have an alias for epsilon in your 'init.el' configuration file like this: '("eps" "GREEK SMALL LETTER EPSILON")', the minibuffer will not automatically recognize it.
You can still use the alias for epsilon that you have in your 'init.el' file by means of a second function, 'expand-abbrev'. Using the method described in the OP, you can get an 'ε' by typing "C-x 8 RET" (or "M-x insert-char"), entering your alias "eps", calling 'expand-abbrev' ("M-x expand-abbrev"), and pressing RET. This expands your alias inside the 'insert-char' minibuffer. (There is also a shortcut for 'M-x expand-abbrev': "C-x '".)
Like the OP, I prefer this method over (or in addition to) automatic alias replacement. If you have something in your config file like this:
;; a quick way to insert unicode characters by code point or name
(global-set-key [f8] 'insert-char)
;; call 'expand-abbrev', especially in the 'insert-char' input minibuffer
(global-set-key [f9] 'expand-abbrev)
;; abbreviate unicode names
(define-abbrev-table 'global-abbrev-table
  '(("ueps" "GREEK SMALL LETTER EPSILON")
    ("Ueps" "ε")
    ("ugsl" "GREEK SMALL LETTER ")
    ("uforall" "FOR ALL")
    ("Uforall" "∀")))
;; see .emacs.d/abbrev_defs
;; M-x edit-abbrevs
;; turn on abbrev mode by default
(setq-default abbrev-mode t)
then you have two ways to type an epsilon. You can let Emacs replace the alias "Ueps" automatically, or you can use seven keystrokes: "[f8]ueps[f9]RET". (Actually there are four ways here.)
As the OP suggests, it is somewhat impractical to have aliases (or key-bindings) for every single special character. That is why it makes sense to use 'insert-char' with 'expand-abbrev'. If you want to insert a less commonly used greek letter like omicron 'ο', for instance, you do not need a special alias; you can expand an alias like 'ugsl' to "GREEK SMALL LETTER ", and enter "omicron" (or "omi" + TAB).
Emacs auto-capitalize-mode misinterprets the words i.e. and e.g. to signify the end of a sentence, and, accordingly, erroneously capitalizes any word that follows them.
Does anyone have a function that can be called by entering, say, eg or ie, that will insert the characters e.g. and i.e. and then automatically lowercase whatever word gets typed next?
Bonus: Do the same thing... for ellipses.
Add this to your .emacs:
(setq auto-capitalize-predicate
      (lambda ()
        (not (looking-back
              "\\([Ee]\\.g\\|[Ii]\\.e\\)\\.[^.]*" (- (point) 20)))))
Remember that the i in i.e. will still be capitalized (giving I.e.) if your auto-capitalize-words variable contains "I". To avoid that, set the variable to the empty list:
(setq auto-capitalize-words '())
Here’s a version that also deals with ellipses:
(setq auto-capitalize-predicate
      (lambda ()
        (not (looking-back
              "\\([Ee]\\.g\\|[Ii]\\.e\\|\\.\\.\\)\\.[^.]*" (- (point) 20)))))
But you might want to look into some abbrev magic that turns three periods into a unicode ellipsis instead. It's up to you.
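A rough sketch of that idea follows. It uses post-self-insert-hook rather than abbrevs, since by default abbrev names are made of word characters and a period is not one. The function name is hypothetical, and how this interacts with auto-capitalize-mode is untested here:

```elisp
;; Hypothetical helper: after a period is typed, replace the three periods
;; just before point with a single Unicode ellipsis (U+2026).
(defun my-ellipsis-after-three-dots ()
  "Replace three consecutive periods before point with an ellipsis."
  (when (and (eq last-command-event ?.)
             (looking-back "\\.\\.\\." (max (point-min) (- (point) 3))))
    ;; `looking-back' sets the match data, so `replace-match' applies here.
    (replace-match "…" t t)))

(add-hook 'post-self-insert-hook #'my-ellipsis-after-three-dots)
```

Because the hook is global, you may prefer to add it buffer-locally from a mode hook if you only want this behavior in text buffers.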
From auto-capitalize.el:
;; To prevent a word in the `auto-capitalize-words' list from being
;; capitalized or upcased in a particular context (e.g.
;; "GNU.emacs.sources"), insert the following whitespace or
;; punctuation character with `M-x quoted-insert' (e.g. `gnu C-q .').
I use it and it is a comfortable approach.
I'm working in Ubuntu. Since the standard way of inserting Unicode characters (Ctrl+Shift+U followed by the Unicode code point) doesn't work inside Emacs, I have defined some key bindings in my .emacs for Unicode symbols that I use frequently, for example:
(global-set-key (kbd "C-c b") "☛")
and every symbol works fine, except the symbol §, which is replaced by a simple dash ("-") when I use the corresponding keystroke:
(global-set-key (kbd "C-c y") "§")
The question is: what makes this symbol different from the other symbols, and how can I solve my problem?
global-set-key usually expects a function, so this should work:
(global-set-key (kbd "C-c y") (lambda () (interactive) (insert "§")))
But you're better off using the excellent insert-char function:
(global-set-key (kbd "<f2> u") 'insert-char)
It understands hex Unicode as well as text description (with completion and all).
Just press TAB to see the completions.
You can insert Unicode chars in Emacs by doing C-x 8 RET <unicode-hex> RET. So to insert § do C-x 8 RET 00A7 RET.
You can bind insert-char (the command run by C-x 8 RET) to a simpler key if you wish, or define a custom function and bind it to a key as follows:
(global-set-key (kbd "C-c y") (lambda () (interactive) (insert "§")))
A general solution to the need to insert a number of Unicode chars quickly is to define a command for each, dedicated to inserting it, and bind that command to a simple key sequence.
That is, in effect, what @abo-abo and @Iqbal have offered you, in the form of:
(lambda () (interactive) (insert "§"))
If you have multiple such chars that you want to create such commands for, then library ucs-cmds.el can help -- in two ways:
It provides a macro, ucsc-make-commands, that lets you, in one fell swoop, define commands for whole ranges of Unicode chars. For example:
(ucsc-make-commands "^cjk") defines an insert-char command for each Chinese, Japanese, and Korean Unicode character.
(ucsc-make-commands "^greek [a-z]+ letter") does the same for each Greek letter.
(ucsc-make-commands "arabic") does the same for each Arabic character.
It provides a replacement command (ucsc-insert) for C-x 8 RET, which acts exactly the same as the vanilla command (insert-char), except that if you give it a negative prefix argument then it not only inserts the char you choose but it also creates an insertion command for it (i.e., the kind of command that the macro creates in bulk).
The names of the commands created by macro ucsc-make-commands and command ucsc-insert are exactly the same as the Unicode names of the chars themselves, except that they are lowercase and have hyphens (-) in place of space chars.
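To illustrate the naming rule only (this is not the library's own code), the transformation from a Unicode character name to a command name can be written in plain Emacs Lisp as:

```elisp
;; Lowercase the Unicode name and replace each space with a hyphen.
(replace-regexp-in-string " " "-" (downcase "GREEK SMALL LETTER LAMBDA"))
;; => "greek-small-letter-lambda"
```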
I'm looking for an equivalent cut-and-paste strategy that would replicate Vim's 'cut til'. I'm sure this is googleable if I actually knew what it was called in Vim, but here's what I'm looking for:
If I have a block of text like so:
foo bar (baz)
and I was at the beginning of the line and I wanted to cut until the first paren, in visual mode, I'd do:
ct (
I think there is probably a way to look back, and I think you can pass more specific regular expressions. Anyway, I'm looking for some Emacs equivalents for this kind of text replacement. Thanks.
Here are three ways:
1. Just type M-d M-d to delete two words. This will leave the final space, so you'll have to delete it yourself and then add it back if you paste the two words back elsewhere.
2. M-z is zap-to-char, which deletes text from the cursor up to and including a character you specify. In this case you'd have to do something like M-2 M-z SPC to zap up to and including the second space character.
3. Type C-SPC to set the mark, then go into incremental search with C-s, type a space to jump to the first space, then C-s to search forward for the next space, RET to terminate the search, and finally C-w to kill the text you selected.
Personally I'd generally go with #1.
As ataylor said, zap-to-char is the way to go. The following modification of zap-to-char does exactly what you want:
(defun zap-up-to-char (arg char)
  "Like standard zap-to-char, but stops just before the given character."
  (interactive "p\ncZap up to char: ")
  (kill-region (point)
               (progn
                 (search-forward (char-to-string char) nil nil arg)
                 (forward-char (if (>= arg 0) -1 1))
                 (point))))
(define-key global-map [(meta ?z)] 'zap-up-to-char) ; Rebind M-z to our version
BTW, don't forget that it can also go backward when given a negative prefix argument.
That sounds like zap-to-char in emacs, bound to M-z by default. Note that zap-to-char will cut all the characters up to and including the one you've selected.
I have a UTF-8 file containing some Unicode characters like LEFT-TO-RIGHT OVERRIDE (U+202D) which I want to remove from the file. In Emacs, they are hidden (which should be the correct behavior?) by default. How do I make such "exotic" Unicode characters visible (without changing the display of "regular" Unicode characters like German umlauts)? And how do I replace them afterwards, with replace-string for example? (C-x 8 RET does not work in isearch or replace-string.)
In Vim, it's quite easy: these characters are displayed with their hex representation by default (is this a bug or a missing feature?) and you can easily remove them with :%s/\%u202d//g, for example. Surely this should be possible with Emacs?
You can do M-x find-file-literally, and then you will see these characters.
You can then remove them with the usual replace-string.
How about this:
Put the U+202D character you want to match at the top of the kill ring by typing M-: (kill-new "\u202d") RET. Then you can yank that string into the various searching commands, with either C-y (e.g. query-replace) or M-y (e.g. isearch-forward).
(Edited to add:)
You could also just call commands non-interactively, which doesn't present the same keyboard-input difficulties as the interactive calls. For example, type M-: and then:
(replace-string "\u202d" "")
This is somewhat similar to your Vim version. One difference is that it only performs replacements from the cursor position to the bottom of the file (or narrowed region), so you'd need to go to the top of the file (or narrowed region) prior to running the command to replace all matches.
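If you would rather not move point first, a small sketch using the usual search-and-replace loop for Lisp code covers the whole accessible portion of the buffer and then restores the cursor position. It can be evaluated with M-: just like the expression above:

```elisp
;; Remove every U+202D in the buffer (or the narrowed region),
;; starting from the top and leaving point where it was.
(save-excursion
  (goto-char (point-min))
  (while (search-forward "\u202d" nil t)
    (replace-match "" t t)))
```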
I also have this issue, and this is particularly annoying for commits as it may be too late to fix the log message when one notices the mistake. So I've modified the function I use when I type C-x C-c to check whether there is a non-printable character, i.e. matching "[^\n[:print:]]", and if there is one, put the cursor over it, output a message, and do not kill the buffer. Then it is possible to manually remove the character, replace it by a printable one, or whatever, depending on the context.
The code to use for the detection (and positioning the cursor after the non-printable character) is:
(progn
  (goto-char (point-min))
  (re-search-forward "[^\n[:print:]]" nil t))
Notes:
There is no need to save the current cursor position since here, either the buffer will be killed or the cursor will be put over the non-printable character on purpose.
You may want to slightly modify the regexp. For instance, the tab character is a non-printable character and I regard it as such, but you may also want to accept it.
About the [:print:] character class in the regexp, you are dependent on the C library. Some printable characters may be regarded as non-printable, like some recent emojis (but not everyone cares).
The re-search-forward return value will be regarded as true if and only if there is a non-printable character. This is exactly what we want.
Here's a snippet of what I use for Subversion commits (this is between more complex code in my .emacs).
(defvar my-svn-commit-frx "/svn-commit\\.\\([0-9]+\\.\\)?tmp\\'")
and
((and (buffer-file-name)
      (string-match my-svn-commit-frx (buffer-file-name))
      (progn
        (goto-char (point-min))
        (re-search-forward "[^\n[:print:]]" nil t)))
 (backward-char)
 (message "The buffer contains a non-printable character."))
in a cond, i.e. I apply this rule only on filenames used for Subversion commits. The (backward-char) can be used or not, depending on whether you want the cursor to be over or just after the non-printable character.
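For readers who just want the detection step on its own, here is a minimal sketch of a standalone command built around the same regexp. The name my-goto-non-printable is hypothetical and this is not the code from the .emacs described above. It leaves the cursor over the offending character and returns non-nil when one is found:

```elisp
;; Hypothetical command: jump to the first non-printable character, if any.
(defun my-goto-non-printable ()
  "Move point to the first non-printable character in the buffer.
Return non-nil if one was found; otherwise leave point unchanged."
  (interactive)
  (let ((pos (save-excursion
               (goto-char (point-min))
               (and (re-search-forward "[^\n[:print:]]" nil t)
                    (match-beginning 0)))))
    (if pos
        (progn
          (goto-char pos)
          (message "The buffer contains a non-printable character.")
          t)
      (message "No non-printable character found.")
      nil)))
```

Unlike the cond clause above, this version keeps point where it was when nothing is found, which is more convenient for interactive use.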