I am writing a function which returns linguistic information about the character at point. This is easy for pre-composed characters. However, I wish to account for diacritics. I believe these are referred to as "marks" or "combining characters" in Unicode (cf. the Combining Diacritical Marks block, U+0300 - U+036F).
For example, to place the breve diacritic (U+0306, COMBINING BREVE) on the character e:
e C-x 8 <RET> 0306 <RET>
Run C-u C-x = on the resulting character and you will see something like "Composed with the following character(s) ̆ "
Functions such as following-char unfortunately only return the base character, i.e. "e", and ignore any combining diacritics. Is there any way to get these?
EDIT: slitvinov pointed out that the resulting glyph consists of two characters. If you place point before the glyph created by the above code, and execute (point) before and after running forward-char, you will see point increase by 2. I figured I could hack a solution through this behaviour, but it appears that inside a progn statement (or function definition), forward-char only moves point forward by one... try it in a defun or with (progn (forward-char) (point)). Why might this be?
I think the e with the diacritic is stored as two characters. I put this combination in a file, as e, (e + diacritic), e:
eĕe
(char-after 1)
(char-after 2)
(char-after 3)
(char-after 4)
These return:
101 101 774 101
And 774 is the decimal form of hexadecimal 0306.
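Building on that, here is a minimal sketch of how one might collect the base character together with the combining marks that follow it. The names my-combining-mark-p and my-char-with-marks-at are made up for illustration, and the sketch assumes that "diacritic" means any character whose Unicode general category is Mn, Mc or Me. It uses explicit buffer positions rather than forward-char, so it does not depend on how point is adjusted around compositions. (As far as I can tell, the interactive jump of 2 comes from the command loop moving point out of the middle of a composition after each command, which would explain why forward-char moves by only one inside progn.)

```elisp
(defun my-combining-mark-p (char)
  "Return non-nil if CHAR is a combining mark (general category Mn, Mc or Me).
Hypothetical helper; CHAR may be nil, in which case the result is nil."
  (and char
       (memq (get-char-code-property char 'general-category)
             '(Mn Mc Me))
       t))

(defun my-char-with-marks-at (&optional pos)
  "Return the character at POS (default: point) plus following combining marks.
The result is a string, or nil if there is no character at POS."
  (let* ((start (or pos (point)))
         (end (1+ start)))
    (when (char-after start)
      ;; Extend END over every combining mark that follows the base character.
      (while (my-combining-mark-p (char-after end))
        (setq end (1+ end)))
      (buffer-substring-no-properties start end))))
```

With point before the ĕ from the example, (my-char-with-marks-at) should return a two-character string, and (append (my-char-with-marks-at) nil) turns it into the list of code points.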
I am following the emacs lisp tutorial and have just successfully added two numbers:
(+ 111 234)
I enter this in a random buffer (Markdown mode, now, if that matters, but the same happens in *scratch*), and evaluate it with C-x C-e.
However, the bottom line on Emacs does not simply return 345, but it outputs this line:
345 (#o531 #x159 ?r)
When I submit (+ 2 3), the output is 5 (#o5 #x5 ?\C-e).
What is this extra output? It's not mentioned in the tutorial.
This is the same number shown in octal (#o...) and hexadecimal (#x...), as well as in character syntax (?...).
In Emacs Lisp, non-negative integers and characters are the same type:
(integerp ?d)
==> t
(characterp 123)
==> t
Thus you see ?\C-e for 5 because Ctrl-e has the ASCII code 5.
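The last item is the character whose code is 345 (#x159): ř, LATIN SMALL LETTER R WITH CARON (U+0159), a non-ASCII r. Emacs characters are Unicode code points, so this does not depend on your locale.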
This is documented in Evaluating Emacs Lisp Expressions.
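To see where each part of that echo-area line comes from, here is a small sketch that rebuilds the pieces by hand. The function name my-integer-forms is hypothetical and only serves the illustration; the real output is produced by Emacs itself when it echoes an integer result.

```elisp
(defun my-integer-forms (n)
  "Return N with its octal, hexadecimal and one-character string forms.
Hypothetical helper, for illustration only."
  (list n
        (format "#o%o" n)   ; octal
        (format "#x%x" n)   ; hexadecimal
        (string n)))        ; the character whose code is N

(my-integer-forms 345) ; → (345 "#o531" "#x159" "ř")
(= ?\C-e 5)            ; → t
```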
I have a problem making (move-to-column pos t) work correctly when the cursor is at a newline character and I have turned off indent-tabs-mode, that is: (setq-default indent-tabs-mode nil).
In that case, if for instance point is at column 0 and there is a newline character at that point (it is helpful to use (whitespace-mode) to see the newline characters), and I issue the command (move-to-column 10 t), point does not move to column 10. Instead, point moves to column 9.
Update
To give an illustration of the problem, consider first the following Emacs buffer
(the colors are due to (whitespace-mode)). The cursor is positioned at column 0 on the second line of the buffer. There is a newline character at the cursor. I now issue the command (move-to-column 10 t) and I get the following screen
Note that the cursor is positioned at column 9 (not at column 10, as it should be). If the cursor is not positioned at a newline character and there are no newline characters in the next 10 buffer positions, move-to-column works as expected. For instance, consider the following case
Notice that there is now no newline character at point (which is at the beginning of the third line in the buffer) and there are no newline characters in the following 10 buffer positions. If I now issue (move-to-column 10 t) I get
and we see that point has moved to column 10, as it should.
At least a partial answer for now:
I am using emacs 24.3.1 under Ubuntu 13.04.
There, the effect is only reproducible with whitespace-mode. Thereby, indent-tabs-mode and also the buffer encoding as dos or unix do not really matter.
whitespace-mode fiddles with the buffer-display-table.
With a modified buffer-display-table one easily gets unexpected results for move-to-column.
The effect of move-to-column with whitespace-mode can already be reproduced without whitespace-mode if one executes the following code (use, e.g., M-:):
(setq buffer-display-table (make-display-table))
(aset buffer-display-table ?\n [?$ ?\n])
You can revert this effect by:
(aset buffer-display-table ?\n nil)
You get a similarly unexpected effect of move-to-column if you change the number of displayed characters for any other text character. E.g:
With
(aset buffer-display-table ?\§ [?\§ ?\$])
and the buffer content
§
you get the display
§$
If you call (move-to-column 1 t) point moves to the end of this displayed string even if this makes two displayed characters.
You can revert this setting by:
(aset buffer-display-table ?\§ nil)
A further rather interesting setting is:
(aset buffer-display-table ?\n [?1 ?2 ?3 ?\n])
With this setting the newline character is three displayed characters long (excluding the line break itself).
One linebreak is shown as:
123
If the current point is at the beginning of that line the command (move-to-column 3 t) does not move point but returns 3.
Note that this behaviour is consistent with the case of the normal setting
(aset buffer-display-table ?\n nil)
If there are two consecutive linebreaks and point is positioned in between then (move-to-column 0 t) does place point before the linebreak even if there is no character on column 0.
Maybe, this is connected to the interpretation of point positions as being between characters. For an empty buffer one has (point) == (point-min) == (point-max). This interpretation also gives (point-max) == (1+ (buffer-size)) its meaning.
I cite here the description of following-char in the info-node (elisp) Near Point:
"Remember that point is always between characters, and the cursor
normally appears over the character following point. Therefore,
the character returned by `following-char' is the character the
cursor is over."
Point positions between characters and move-to-column
(Note, the following is just my interpretation. It would be nice if someone who really knows the intentions in the features of move-to-column could acknowledge, deny, or correct this stuff.)
The following discussion illustrates the consequences of point positions between characters for move-to-column.
We denote the point positions by pos0, pos1,... and the characters with char1, char2,....
We use the notation char1a, char1b, ... if the entry of a character char1 in buffer-display-table is a vector [char1a char1b ...] of length > 1. In the following we call such a character a compound character.
Normal case (no compound characters):
pos0 char1 pos1 char2 pos2 nl
(move-to-column 2 t) means to position the point before the nlchar.
Case with a compound character char1 = [char1a char1b] in buffer-display-table:
pos0 char1a pos1 char1b pos2 nl
move-to-column respects the display size of the compound character but it cannot put point in the middle of it.
Point can only be placed at the boundaries of the compound character.
In this case (move-to-column 1 t) moves point to position pos2.
Now, let the newline character be a compound character nl = [nla nlb].
pos0 char1 pos1 char2 pos2 nla pos3 nlb
Here, (move-to-column 3 t) arrives in the middle of the compound newline character.
Point is still on this line, so it does not make sense to put point behind nlb. Emacs cannot place point at pos3 since this is in the middle of a compound character. Thus, the only sensible way to position point is pos2.
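The display-table experiments above can be repeated without touching a real buffer. The following is only a sketch: my-move-to-column-probe is a made-up name, and I deliberately give no expected results, since they may differ between Emacs versions. It sets the newline entry of a fresh display table in a temporary buffer, calls move-to-column, and reports what happened.

```elisp
(defun my-move-to-column-probe (column newline-glyphs)
  "Probe `move-to-column' with a custom display of the newline character.
In a temporary buffer that contains a single newline, display the newline
as NEWLINE-GLYPHS (a vector of characters, or nil for the default), then
call (move-to-column COLUMN t) from the beginning of the buffer.
Return a list (RETURN-VALUE POINT CURRENT-COLUMN).
Hypothetical helper, for experimentation only."
  (with-temp-buffer
    (insert "\n")
    (goto-char (point-min))
    ;; `buffer-display-table' is buffer-local, so this affects only the
    ;; temporary buffer.
    (setq buffer-display-table (make-display-table))
    (aset buffer-display-table ?\n newline-glyphs)
    (let ((indent-tabs-mode nil))
      (list (move-to-column column t)
            (point)
            (current-column)))))
```

Comparing (my-move-to-column-probe 10 nil) with (my-move-to-column-probe 10 [?$ ?\n]) or (my-move-to-column-probe 3 [?1 ?2 ?3 ?\n]) should show whether your Emacs exhibits the off-by-one described in the question.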
I have a list of special unicode characters that I use frequently in one of my files.
To avoid typing (and learning) unicode numbers all the time I would like to just have a line with those characters at the top of my file (it's only 25 symbols) and save/yank them when I need them.
I cannot find the proper shortcut to save the character under the point though...
It's no different to copying anything else. Move point to the character you wish to copy, set the mark with C-SPC, move forward one character so that the region covers the character of interest, and save to the kill ring with M-w.
Or you could do something like this:
(defun my-copy-character-as-kill (pos)
  "Copy the character at point (or POS) to the kill ring."
  (interactive "d")
  (if (eobp)
      (error "End of buffer.")
    (copy-region-as-kill pos (1+ pos))
    (when (called-interactively-p 'interactive)
      (let ((print-escape-newlines t))
        (message "%S" (char-to-string (char-after pos)))))))

(global-set-key (kbd "C-c c") 'my-copy-character-as-kill)
Here is another way to go, especially if you don't use a lot of such characters and don't want to fiddle with an input method.
Download library ucs-cmds.el and put it in your load-path (byte-compile it). Then put this in your init file (~/.emacs):
(require 'ucs-cmds)
(define-key global-map [remap insert-char] 'ucsc-insert)
Then use M-- C-x 8 RET and use completion to enter the Unicode name or code point of the character you want. That does two things: C-x 8 RET inserts the character you chose before the cursor. The M-- makes it also create a command with the same name as the character. You can then bind that command to a handy key sequence. For example:
M-- C-x 8 RET greek small letter lambda RET
That defines command greek-small-letter-lambda, which you can bind to some key sequence.
If you want to define such commands for several Unicode characters at once, you can instead just use macro ucsc-make-commands to do so. See the Commentary in file ucs-cmds.el. You provide a regexp to the macro. It is matched against all Unicode character names. An insertion command is created for each of the characters whose name matches.
Sample command creations:
(ucsc-make-commands "^math") ; Math symbols
(ucsc-make-commands "latin") ; Latin alphabet characters
(ucsc-make-commands "arabic")
(ucsc-make-commands "^cjk") ; Chinese, Japanese, Korean characters
(ucsc-make-commands "^box drawings ")
(ucsc-make-commands "^greek [a-z]+ letter") ; Greek characters
(ucsc-make-commands "\\(^hangul\\|^circled hangul\\|^parenthesized hangul\\)")
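If you would rather not install a library, a rough equivalent for a handful of characters can be sketched with built-ins only. This is not how ucs-cmds.el works internally; it is a minimal sketch that assumes Emacs 26 or later (for char-from-name), and the names my-insert-char-by-name and my-insert-greek-small-letter-lambda are made up.

```elisp
(defun my-insert-char-by-name (name)
  "Insert the Unicode character called NAME at point.
NAME is an official Unicode character name in upper case.
Hypothetical helper; relies on `char-from-name' (Emacs 26+)."
  (let ((char (char-from-name name)))
    (unless char
      (error "Unknown Unicode character name: %s" name))
    (insert char)))

(defun my-insert-greek-small-letter-lambda ()
  "Insert a Greek small letter lambda at point."
  (interactive)
  (my-insert-char-by-name "GREEK SMALL LETTER LAMBDA"))
```

The command can then be bound like any other, for example with (global-set-key (kbd "C-c l") 'my-insert-greek-small-letter-lambda); the key choice is arbitrary.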
I have a UTF-8 file containing some Unicode characters like LEFT-TO-RIGHT OVERRIDE (U+202D) which I want to remove from the file. In Emacs, they are hidden by default (which should be the correct behavior?). How do I make such "exotic" Unicode characters visible (while not changing the display of "regular" Unicode characters like German umlauts)? And how do I replace them afterwards (with replace-string, for example; C-x 8 RET does not work for isearch/replace-string)?
In Vim, it's quite easy: these characters are displayed with their hex representation by default (is this a bug or a missing feature?) and you can easily remove them with :%s/\%u202d//g, for example. Surely this should be possible with Emacs?
You can do M-x find-file-literally; then you will see these characters.
Then you can remove them using the usual M-x replace-string.
How about this:
Put the U+202d character you want to match at the top of the kill ring by typing M-:(kill-new "\u202d"). Then you can yank that string into the various searching commands, with either C-y (eg. query-replace) or M-y (eg. isearch-forward).
(Edited to add:)
You could also just call commands non-interactively, which doesn't present the same keyboard-input difficulties as the interactive calls. For example, type M-: and then:
(replace-string "\u202d" "")
This is somewhat similar to your Vim version. One difference is that it only performs replacements from the cursor position to the bottom of the file (or narrowed region), so you'd need to go to the top of the file (or narrowed region) prior to running the command to replace all matches.
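If you do this often, the "go to the top first" step can be folded into a small function. This is a sketch only; my-remove-char-everywhere is a hypothetical name, and it uses a search-forward/replace-match loop, which is the form usually recommended for Lisp code instead of calling replace-string.

```elisp
(defun my-remove-char-everywhere (char)
  "Delete every occurrence of CHAR in the accessible part of the buffer.
Return the number of occurrences removed.  Point is left where it was.
Hypothetical helper, for illustration only."
  (save-excursion
    (goto-char (point-min))
    (let ((count 0)
          (needle (string char)))
      (while (search-forward needle nil t)
        ;; Replace the match with the empty string, literally.
        (replace-match "" t t)
        (setq count (1+ count)))
      count)))
```

For the character in the question you would evaluate (my-remove-char-everywhere #x202d) with M-:.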
I also have this issue, and this is particularly annoying for commits as it may be too late to fix the log message when one notices the mistake. So I've modified the function I use when I type C-x C-c to check whether there is a non-printable character, i.e. matching "[^\n[:print:]]", and if there is one, put the cursor over it, output a message, and do not kill the buffer. Then it is possible to manually remove the character, replace it by a printable one, or whatever, depending on the context.
The code to use for the detection (and positioning the cursor after the non-printable character) is:
(progn
  (goto-char (point-min))
  (re-search-forward "[^\n[:print:]]" nil t))
Notes:
There is no need to save the current cursor position since here, either the buffer will be killed or the cursor will be put over the non-printable character on purpose.
You may want to slightly modify the regexp. For instance, the tab character is a non-printable character and I regard it as such, but you may also want to accept it.
About the [:print:] character class in the regexp, you are dependent on the C library. Some printable characters may be regarded as non-printable, like some recent emojis (but not everyone cares).
The re-search-forward return value will be regarded as true if and only if there is a non-printable character. This is exactly what we want.
Here's a snippet of what I use for Subversion commits (this is between more complex code in my .emacs).
(defvar my-svn-commit-frx "/svn-commit\\.\\([0-9]+\\.\\)?tmp\\'")
and
((and (buffer-file-name)
      (string-match my-svn-commit-frx (buffer-file-name))
      (progn
        (goto-char (point-min))
        (re-search-forward "[^\n[:print:]]" nil t)))
 (backward-char)
 (message "The buffer contains a non-printable character."))
in a cond, i.e. I apply this rule only on filenames used for Subversion commits. The (backward-char) can be used or not, depending on whether you want the cursor to be over or just after the non-printable character.
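For use outside of my exit function, the same detection can be wrapped in a function that does not move the cursor. This is a sketch under the same regexp as above; the name my-first-nonprintable-position is made up, and which characters count as [:print:] may vary with the Emacs version.

```elisp
(defun my-first-nonprintable-position ()
  "Return the position of the first non-printable character, or nil.
Newlines are accepted; everything else must match [:print:].
Point is not moved.  Hypothetical helper, for illustration only."
  (save-excursion
    (goto-char (point-min))
    (when (re-search-forward "[^\n[:print:]]" nil t)
      ;; The match is one character long; return where it starts.
      (match-beginning 0))))
```

A caller can then decide what to do, e.g. (let ((pos (my-first-nonprintable-position))) (when pos (goto-char pos))) to put the cursor over the offending character.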
Imagine I've got the following in a text file opened under Emacs:
some 34
word 30
another 38
thing 59
to 39
say 10
here 47
and I want to turn into this, adding 1 to every number made of 2 digits:
some 35
word 31
another 39
thing 60
to 40
say 11
here 48
(this is a short example, my actual need is on a much bigger list, not my call)
How can I do this from Emacs?
I don't mind calling some external Perl/sed/whatever magic as long as the call is made directly from Emacs and operates only on the marked region I want.
How would you automate this from Emacs?
I think the answer I'm thinking of consist in calling shell-command-on-region and replace the region by the output... But I'm not sure as to how to concretely do this.
This can be solved by using the command query-replace-regexp (bound to C-M-%):
C-M-%
\b[0-9][0-9]\b
RET
\,(1+ \#&)
The expression that follows \, is evaluated as a Lisp expression, and its result is used as the replacement string. In the Lisp expression, \#& is replaced by the matched string, interpreted as a number.
By default, this works on the rest of the document, starting from the cursor. To have this work on the region, there are several possibilities:
If transient-mark-mode is turned on, you just need to select the region normally (using point and mark);
If for some reason you don't like transient-mark-mode, you may use narrow-to-region to restrict the changes to a specific region: select a region using point and mark, C-x n n to narrow, perform query-replace-regexp as described above, and finally C-x n w to widen. (Thanks to Justin Smith for this hint.)
Use the mouse to select the region.
See section Regexp Replacement of the Emacs Manual for more details.
Emacs' column editing mode is what you need.
Activate it typing M-x cua-mode.
Go to the beginning of the rectangle (leave cursor on character 3) and press C-RET.
Go to the end of the rectangle (leave cursor on character 7). You will be operating on the highlighted region.
Now press M-i which increments all values in the region.
You're done!
It doesn't protect against 99->100.
(defun add-1-to-2-digits (b e)
  "Add 1 to every 2-digit number in the region."
  (interactive "r")
  (goto-char b)
  (while (re-search-forward "\\b[0-9][0-9]\\b" e t)
    ;; `string-to-int' is obsolete; `string-to-number' replaces it.
    (replace-match (number-to-string (+ 1 (string-to-number (match-string 0)))))))
Oh, and it operates on the region. If you want the entire file, then you replace b and e with (point-min) and nil.
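One detail about 99->100: the replacement is a character longer than the match, so a fixed end position drifts relative to the text. Here is a sketch of a variant that tracks the end of the region with a marker and leaves point alone. The name my-increment-two-digit-numbers is made up, and the approach is my assumption about what "protecting" should mean here, not a fix that was agreed on above.

```elisp
(defun my-increment-two-digit-numbers (beg end)
  "Add 1 to every two-digit number between BEG and END.
Return the number of replacements made.  Point is not moved.
Hypothetical variant, for illustration only."
  (interactive "r")
  (let ((end-marker (copy-marker end)) ; follows the text as it grows
        (count 0))
    (save-excursion
      (goto-char beg)
      (while (re-search-forward "\\b[0-9][0-9]\\b" end-marker t)
        (replace-match
         (number-to-string (1+ (string-to-number (match-string 0))))
         t t)
        (setq count (1+ count))))
    ;; Detach the marker so it no longer slows down buffer edits.
    (set-marker end-marker nil)
    count))
```

Numbers with one digit or with three or more digits are left untouched, because the \b anchors require a word boundary on both sides of the two digits.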
Moderately tested; use M-: and issue the following command:
(while (re-search-forward "\\<[0-9][0-9]\\>" nil t) (let ((x (match-string 0))) (delete-char -2) (insert (format "%d" (1+ (string-to-number x))))))
I managed to get it working in a different way using the following (my awk-fu ain't strong so it probably can be done in a simpler way):
C-u M-x shell-command-on-region RET awk '$2>=0&&$2<=99 {$2++} {print}' RET
but I lost my indentation in the process : )
Seeing all these answers, I can't help but have a lot of respect for Emacs...