Emacs - noninteractive call to shell-command-on-region always deletes region? - emacs

The Emacs Help page for the function shell-command-on-region says (elided for space):
(shell-command-on-region START END COMMAND &optional OUTPUT-BUFFER
REPLACE ERROR-BUFFER DISPLAY-ERROR-BUFFER)
...
The noninteractive arguments are START, END, COMMAND,
OUTPUT-BUFFER, REPLACE, ERROR-BUFFER, and DISPLAY-ERROR-BUFFER.
...
If the optional fourth argument OUTPUT-BUFFER is non-nil,
that says to put the output in some other buffer.
If OUTPUT-BUFFER is a buffer or buffer name, put the output there.
If OUTPUT-BUFFER is not a buffer and not nil,
insert output in the current buffer.
In either case, the output is inserted after point (leaving mark after it).
If REPLACE, the optional fifth argument, is non-nil, that means insert
the output in place of text from START to END, putting point and mark
around it.
This isn't the clearest, but the last few sentences just quoted seem to say that if I want the output of the shell command to be inserted in the current buffer at point, leaving the other contents of the buffer intact, I should pass a non-nil argument for OUTPUT-BUFFER and nil for REPLACE.
However, if I execute this code in the *scratch* buffer (not the real code I'm working on, but the minimal case that demonstrates the issue):
(shell-command-on-region
(point-min) (point-max) "wc" t nil)
the entire contents of the buffer are deleted and replaced with the output of wc!
Is shell-command-on-region broken when used non-interactively, or am I misreading the documentation? If the latter, how could I change the code above to insert the output of wc at point rather than replacing the contents of the buffer? Ideally I'd like a general solution that works not only to run a command on the entire buffer as in the minimal example (i.e., (point-min) through (point-max)), but also for the case of running a command with the region as input and then inserting the results at point without deleting the region.

You are wrong
It is not a good idea to use an interactive command like shell-command-on-region in emacs lisp code. Use call-process-region instead.
Emacs is wrong
There is a bug in shell-command-on-region: it does not pass the replace argument down to call-process-region; here is the fix:
=== modified file 'lisp/simple.el'
--- lisp/simple.el 2013-05-16 03:41:52 +0000
+++ lisp/simple.el 2013-05-23 18:44:16 +0000
## -2923,7 +2923,7 ## interactively, this is t."
(goto-char start)
(and replace (push-mark (point) 'nomsg))
(setq exit-status
- (call-process-region start end shell-file-name t
+ (call-process-region start end shell-file-name replace
(if error-file
(list t error-file)
t)
I will commit it shortly.

If you follow the link to the functions' source code, you'll quickly see that it does:
(if (or replace
(and output-buffer
(not (or (bufferp output-buffer) (stringp output-buffer)))))
I don't know why it does that, tho. In any case, this is mostly meant as a command rather than a function; from Elisp I recomend you use call-process-region instead.

In my case (emacs 24.3, don't know what version you're using), the documentation is slightly different in the optional argument:
Optional fourth arg OUTPUT-BUFFER specifies where to put the
command's output. If the value is a buffer or buffer name, put
the output there. Any other value, including nil, means to
insert the output in the current buffer. In either case, the
output is inserted after point (leaving mark after it).
The code that checks whether to delete the output (current) buffer contents is the following:
(if (or replace
(and output-buffer
(not (or (bufferp output-buffer) (stringp output-buffer)))))
So clearly putting t as in your case, it is not a string or a buffer, and it is not nil, so it will replace current buffer contents with the output. However, if I try:
(shell-command-on-region
(point-min) (point-max) "wc" nil nil)
then the buffer is not deleted, and the output put into the "Shell Command Output" buffer. At first sight, I'd say the function is not correctly implemented. Even the two versions of the documentation seem not to correspond with the code.

Related

Get the content as string of a region in a buffer by elisp program

Here is the problem that I'm solving:
Convert a region of text into a string data structure for subsequent processing by elisp program. The challenge is that
I want to execute an elisp program interactively without affecting the selection of a region
store the string value into a variable so that I can further manipulate it.
By my understanding, a region is defined by a mark and the subsequent cursor position. And I usually execute elisp program in *scratch* buffer. Furthermore, the region is also in the *scratch* buffer.
But to write the function call and execute it in the buffer, I need to move the cursor away from the end of the text selection (region) in order to write the program of
(setq grabbed (buffer-substring-no-properties (region-beginning) (region-end)))
but then the region of selection would change due to the cursor movement.
So I wonder how I could execute the elisp program while keeping the selection intact and still can access the return value.
If you want to run the function from some Elisp code but as if the user had invoked it via a keybinding, you can use call-interactively:
(setq variable-to-keep-the-value (call-interactively 'lines-to-list))
But in most cases, what you want instead is to take yourself the responsibility to choose to which part of text the function should apply:
(setq variable-to-keep-the-value
(lines-to-list (region-beginning) (region-end)))
Notice that the region's boundaries are nothing magical, regardless if they've been set by the mouse or what.
Finally, I found a desirable solution! It's using ielm buffer, the real repl of elisp.
In the ielm buffer, I can set working buffer (by C-c C-b) to be a buffer where I have the text to be manipulated, for example, *scratch*.
I can then select a region of the working buffer to be processed, and in the ielm buffer then I can type and execute elisp code to extract the text in the selected region in the working buffer, for example,
ELISP> (setq grabbed (buffer-substring-no-properties (region-beginning) (region-end)))
"One\nTwo\nThre"
ELISP> grabbed
"One\nTwo\nThre"
ELISP> (split-string grabbed)
("One" "Two" "Thre")
I can then work with the value held by the set variable, grabbed.
Here is a very helpful description of ielm:
https://www.masteringemacs.org/article/evaluating-elisp-emacs

Run AStyle on current buffer, save and restore cursor position

I'm looking for a solution to apply code formatting and static code analysis on contents of current buffer in c++ mode. I'm planning to use AStyle and CppCheck. Both tools need to be executed on current code. For example if I'm editing foo.cpp the function should run
astyle --arg1 --argn foo.cpp
And
cppcheck --arg1 --arg2 foo.cpp
What I already tried is a simple function from here which is not working:
(defun astyle-this-buffer (pmin pmax)
(interactive "r")
(shell-command-on-region pmin pmax
"astyle" ;; add options here...
(current-buffer) t
(get-buffer-create "*Astyle Errors*") t))
Update:
I found that the above code is compatible with Emacs23 while I'm using 24. So I used this instead:
(defun reformat-code ()
(interactive)
(shell-command-on-region (point-min) (point-max)
"astyle --options=~/.astylerc" t t))
(global-set-key (kbd "C-x C-a") 'reformat-code)
Now it works and formats the code, though I can't find out how to save cursor's position and tell emacs to move that line.
It seems to me that reformatting tools like astyle will modify whitespace, but presumably nothing else. (It's possible this is mildly wrong, like if they reformat a C macro then they must modify backslashes as well -- but that can also be taken into account.)
So, the way I would approach this would be to count how many non-whitespace characters appear before (point), invoke astyle, revert the buffer (or whatever), and finally, starting from the start of the buffer, move forward that many non-whitespace characters.
This won't always be "the same", for example if point was in some whitespace that was modified -- but I think it ought to be reasonably close.
If you really want to just record the current line number and go back to that, you can use line-number-at-pos to get the current line number, and then (goto-char (point-min)) and use forward-line to get back to the line.

How to prevent Emacs from setting an undo boundary?

I've written an Emacs Lisp function which calls a shell command to
process a given string and return the resulting string. Here is a
simplified example which just calls tr to convert text to uppercase:
(defun test-shell-command (str)
"Apply tr to STR to convert lowercase letters to uppercase."
(let ((buffer (generate-new-buffer "*temp*")))
(with-current-buffer buffer
(insert str)
(call-process-region (point-min) (point-max) "tr" t t nil "'a-z'" "'A-Z'")
(buffer-string))))
This function creates a temporary buffer, inserts the text, calls
tr, replaces the text with the result, and returns the result.
The above function works as expected, however, when I write a wrapper
around this function to apply the command to the region, two steps are
being added to the undo history. Here's another example:
(defun test-shell-command-region (begin end)
"Apply tr to region from BEGIN to END."
(interactive "*r")
(insert (test-shell-command (delete-and-extract-region begin end))))
When I call M-x test-shell-command-on-region, the region is replaced
with the uppercase text, but when I press C-_ (undo), the first
step in the undo history is the state with the text deleted. Going
two steps back, the original text is restored.
My question is, how does one prevent the intermediate step from being
added to the undo history? I've read the Emacs documentation on undo,
but it doesn't seem to address this as far as I can tell.
Here's a function which accomplishes the same thing by calling the
built-in Emacs function upcase, just as before: on the result of
delete-and-extract-region with the result being handed off to
insert:
(defun test-upcase-region (begin end)
"Apply upcase to region from BEGIN to END."
(interactive "*r")
(insert (upcase (delete-and-extract-region begin end))))
When calling M-x test-upcase-region, there is only one step in the
undo history, as expected. So, it seems to be the case that calling
test-shell-command creates an undo boundary. Can that be avoided
somehow?
The key is the buffer name. See Maintaining Undo:
Recording of undo information in a newly created buffer is normally enabled to start with; but if the buffer name starts with a space, the undo recording is initially disabled. You can explicitly enable or disable undo recording with the following two functions, or by setting buffer-undo-list yourself.
with-temp-buffer creates a buffer named ␣*temp* (note the leading whitespace), whereas your function uses *temp*.
To remove the undo boundary in your code, either use a buffer name with a leading space, or explicitly disable undo recoding in the temporary buffer with buffer-disable-undo.
But generally, use with-temp-buffer, really. It's the standard way for such things in Emacs, making your intention clear to anyone who reads your code. Also, with-temp-buffer tries hard to clean up the temporary buffer properly.
As for why undo in the temporary buffer creates an undo boundary in the current one: If the previous change was undoable and made in some other buffer (the temporary one in this case), an implicit boundary is created. From undo-boundary:
All buffer modifications add a boundary whenever the previous undoable change was made in some other buffer. This is to ensure that each command makes a boundary in each buffer where it makes changes.
Hence, inhibiting undo in the temporary buffer removes the undo boundary in the current buffer, too: The previous change is simply not undoable anymore, and thus no implicit boundary is created.
There are many situations where using a temporary buffer is not practical. It's hard to debug what's going on for example.
In these cases you can let-bind undo-inhibit-record-point to prevent Emacs from deciding where to put boundaries:
(let ((undo-inhibit-record-point t))
;; Then record boundaries manually
(undo-boundary)
(do-lots-of-stuff)
(undo-boundary))
The solution in this case was to create the temporary output buffer
using with-temp-buffer, rather than explicitly creating one with
generate-new-buffer. The following alternative version of the
first function does not create an undo boundary:
(defun test-shell-command (str)
"Apply tr to STR to convert lowercase letters to uppercase."
(with-temp-buffer
(insert str)
(call-process-region (point-min) (point-max) "tr" t t nil "'a-z'" "'A-Z'")
(buffer-string)))
I was not able to determine whether generate-new-buffer is indeed
creating the undo boundary, but this fixed the problem.
generate-new-buffer calls get-buffer-create, which is defined in
the C source code, but I could not quickly determine what was
happening in terms of the undo history.
I suspect that the issue may be related to the following passage in
the Emacs Lisp Manual entry for undo-boundary:
All buffer modifications add a boundary whenever the previous
undoable change was made in some other buffer. This is to ensure
that each command makes a boundary in each buffer where it makes
changes.
Even though the with-temp-buffer macro calls generate-new-buffer
much as in the original function, the documentation for
with-temp-buffer states that no undo information is saved (even
though there is nothing in the Emacs Lisp source that suggests this
would be the case):
By default, undo (see Undo) is not recorded in the buffer created by
this macro (but body can enable that, if needed).

How to copy to clipboard in Emacs Lisp

I want to copy a string to the clipboard (not a region of any particular buffer, just a plain string). It would be nice if it were also added to the kill-ring. Here's an example:
(copy-to-clipboard "Hello World")
Does this function exist? If so, what is it called and how did you find it? Is there also a paste-from-clipboard function?
I can't seem to find this stuff in the Lisp Reference Manual, so please tell me how you found it.
You're looking for kill-new.
kill-new is a compiled Lisp function in `simple.el'.
(kill-new string &optional replace yank-handler)
Make string the latest kill in the kill ring.
Set `kill-ring-yank-pointer' to point to it.
If `interprogram-cut-function' is non-nil, apply it to string.
Optional second argument replace non-nil means that string will replace
the front of the kill ring, rather than being added to the list.
Optional third arguments yank-handler controls how the string is later
inserted into a buffer; see `insert-for-yank' for details.
When a yank handler is specified, string must be non-empty (the yank
handler, if non-nil, is stored as a `yank-handler' text property on string).
When the yank handler has a non-nil PARAM element, the original string
argument is not used by `insert-for-yank'. However, since Lisp code
may access and use elements from the kill ring directly, the string
argument should still be a "useful" string for such uses.
I do this:
(with-temp-buffer
(insert "Hello World")
(clipboard-kill-region (point-min) (point-max)))
That gets it on the clipboard. If you want it on the kill-ring add a kill-region form also.
The command to put your selection on the window system clipboard is x-select-text. You can give it a block of text to remember. So a (buffer-substring (point) (mark)) or something should give you what you need to pass to it. In Joe's answer, you can see the interprogram-cut-function. Look that up for how to find this.
In my .emacs file, i use this
(global-set-key "\C-V" 'yank)
(global-set-key "\C-cc" 'kill-ring-save)
I could not use Ctrl-C (or System-copy), but this may be enough in case old habits kick in.

How can I set the encoding of shell-command-on-region output?

I have a small elisp script which applies Perl::Tidy on region or whole file. For reference, here's the script (borrowed from EmacsWiki):
(defun perltidy-command(start end)
"The perltidy command we pass markers to."
(shell-command-on-region start
end
"perltidy"
t
t
(get-buffer-create "*Perltidy Output*")))
(defun perltidy-dwim (arg)
"Perltidy a region of the entire buffer"
(interactive "P")
(let ((point (point)) (start) (end))
(if (and mark-active transient-mark-mode)
(setq start (region-beginning)
end (region-end))
(setq start (point-min)
end (point-max)))
(perltidy-command start end)
(goto-char point)))
(global-set-key "\C-ct" 'perltidy-dwim)
I'm using current Emacs 23.1 for Windows (EmacsW32). The problem I'm having is that if I apply that script on a UTF-8 coded file ("U(Unix)" in the status bar) the output comes back Latin-1 coded, i.e. two or more characters for each non-ASCII source character.
Is there any way I can fix that?
EDIT: Problem seems to be solved by using (set-terminal-coding-system 'utf-8-unix) in my init.el. In anyone has other solutions, go ahead and write them!
Below are from shell-command-on-region document
To specify a coding system for converting non-ASCII characters
in the input and output to the shell command, use C-x RET c
before this command. By default, the input (from the current buffer)
is encoded using coding-system specified by `process-coding-system-alist',
falling back to `default-process-coding-system' if no match for COMMAND
is found in `process-coding-system-alist'.
During executing, it looks for coding system from process-coding-system-alist at first, if it's nil, then looks from default-process-coding-system.
If your want to change the encoding, you can add your converting option to process-coding-system-alist, below are the content of it.
Value: (("\\.dz\\'" no-conversion . no-conversion)
...
("\\.elc\\'" . utf-8-emacs)
("\\.utf\\(-8\\)?\\'" . utf-8)
("\\.xml\\'" . xml-find-file-coding-system)
...
("" undecided))
Or, if you didn't set process-coding-system-alist, it's nil, you could assign your encoding option to default-process-coding-system,
for example:
(setq default-process-coding-system '(utf-8 . utf-8))
(If input is encoded as utf-8, then output encoded as utf-8)
Or
(setq default-process-coding-system '(undecided-unix . iso-latin-1-unix))
I also wrote a post about this if you want details.
Quoting the documentation for shell-command-on-region (C-h f shell-command-on-region RET):
To specify a coding system for converting non-ASCII characters
in the input and output to the shell command, use C-x RET c
before this command. By default, the input (from the current buffer)
is encoded in the same coding system that will be used to save the file,
`buffer-file-coding-system'. If the output is going to replace the region,
then it is decoded from that same coding system.
The noninteractive arguments are START, END, COMMAND,
OUTPUT-BUFFER, REPLACE, ERROR-BUFFER, and DISPLAY-ERROR-BUFFER.
Noninteractive callers can specify coding systems by binding
`coding-system-for-read' and `coding-system-for-write'.
In other words, you'd do something like
(let ((coding-system-for-read 'utf-8-unix))
(shell-command-on-region ...) )
This is untested, not sure what the value of coding-system-for-read (or perhaps -write instead? or as well?) should be in your case. I guess you could also utilize the OUTPUT-BUFFER argument and direct the output to a buffer whose coding system is set to what you need it to be.
Another option might be to wiggle the locale in the perltidy invocation, but again, without more information about what you are using now, and no means to experiment on a system similar to yours, I can only hint.