How can I set the encoding of shell-command-on-region output? - emacs

I have a small elisp script which applies Perl::Tidy on region or whole file. For reference, here's the script (borrowed from EmacsWiki):
(defun perltidy-command(start end)
"The perltidy command we pass markers to."
(shell-command-on-region start
end
"perltidy"
t
t
(get-buffer-create "*Perltidy Output*")))
(defun perltidy-dwim (arg)
"Perltidy a region of the entire buffer"
(interactive "P")
(let ((point (point)) (start) (end))
(if (and mark-active transient-mark-mode)
(setq start (region-beginning)
end (region-end))
(setq start (point-min)
end (point-max)))
(perltidy-command start end)
(goto-char point)))
(global-set-key "\C-ct" 'perltidy-dwim)
I'm using current Emacs 23.1 for Windows (EmacsW32). The problem I'm having is that if I apply that script on a UTF-8 coded file ("U(Unix)" in the status bar) the output comes back Latin-1 coded, i.e. two or more characters for each non-ASCII source character.
Is there any way I can fix that?
EDIT: Problem seems to be solved by using (set-terminal-coding-system 'utf-8-unix) in my init.el. In anyone has other solutions, go ahead and write them!

Below are from shell-command-on-region document
To specify a coding system for converting non-ASCII characters
in the input and output to the shell command, use C-x RET c
before this command. By default, the input (from the current buffer)
is encoded using coding-system specified by `process-coding-system-alist',
falling back to `default-process-coding-system' if no match for COMMAND
is found in `process-coding-system-alist'.
During executing, it looks for coding system from process-coding-system-alist at first, if it's nil, then looks from default-process-coding-system.
If your want to change the encoding, you can add your converting option to process-coding-system-alist, below are the content of it.
Value: (("\\.dz\\'" no-conversion . no-conversion)
...
("\\.elc\\'" . utf-8-emacs)
("\\.utf\\(-8\\)?\\'" . utf-8)
("\\.xml\\'" . xml-find-file-coding-system)
...
("" undecided))
Or, if you didn't set process-coding-system-alist, it's nil, you could assign your encoding option to default-process-coding-system,
for example:
(setq default-process-coding-system '(utf-8 . utf-8))
(If input is encoded as utf-8, then output encoded as utf-8)
Or
(setq default-process-coding-system '(undecided-unix . iso-latin-1-unix))
I also wrote a post about this if you want details.

Quoting the documentation for shell-command-on-region (C-h f shell-command-on-region RET):
To specify a coding system for converting non-ASCII characters
in the input and output to the shell command, use C-x RET c
before this command. By default, the input (from the current buffer)
is encoded in the same coding system that will be used to save the file,
`buffer-file-coding-system'. If the output is going to replace the region,
then it is decoded from that same coding system.
The noninteractive arguments are START, END, COMMAND,
OUTPUT-BUFFER, REPLACE, ERROR-BUFFER, and DISPLAY-ERROR-BUFFER.
Noninteractive callers can specify coding systems by binding
`coding-system-for-read' and `coding-system-for-write'.
In other words, you'd do something like
(let ((coding-system-for-read 'utf-8-unix))
(shell-command-on-region ...) )
This is untested, not sure what the value of coding-system-for-read (or perhaps -write instead? or as well?) should be in your case. I guess you could also utilize the OUTPUT-BUFFER argument and direct the output to a buffer whose coding system is set to what you need it to be.
Another option might be to wiggle the locale in the perltidy invocation, but again, without more information about what you are using now, and no means to experiment on a system similar to yours, I can only hint.

Related

Run AStyle on current buffer, save and restore cursor position

I'm looking for a solution to apply code formatting and static code analysis on contents of current buffer in c++ mode. I'm planning to use AStyle and CppCheck. Both tools need to be executed on current code. For example if I'm editing foo.cpp the function should run
astyle --arg1 --argn foo.cpp
And
cppcheck --arg1 --arg2 foo.cpp
What I already tried is a simple function from here which is not working:
(defun astyle-this-buffer (pmin pmax)
(interactive "r")
(shell-command-on-region pmin pmax
"astyle" ;; add options here...
(current-buffer) t
(get-buffer-create "*Astyle Errors*") t))
Update:
I found that the above code is compatible with Emacs23 while I'm using 24. So I used this instead:
(defun reformat-code ()
(interactive)
(shell-command-on-region (point-min) (point-max)
"astyle --options=~/.astylerc" t t))
(global-set-key (kbd "C-x C-a") 'reformat-code)
Now it works and formats the code, though I can't find out how to save cursor's position and tell emacs to move that line.
It seems to me that reformatting tools like astyle will modify whitespace, but presumably nothing else. (It's possible this is mildly wrong, like if they reformat a C macro then they must modify backslashes as well -- but that can also be taken into account.)
So, the way I would approach this would be to count how many non-whitespace characters appear before (point), invoke astyle, revert the buffer (or whatever), and finally, starting from the start of the buffer, move forward that many non-whitespace characters.
This won't always be "the same", for example if point was in some whitespace that was modified -- but I think it ought to be reasonably close.
If you really want to just record the current line number and go back to that, you can use line-number-at-pos to get the current line number, and then (goto-char (point-min)) and use forward-line to get back to the line.

Emacs - noninteractive call to shell-command-on-region always deletes region?

The Emacs Help page for the function shell-command-on-region says (elided for space):
(shell-command-on-region START END COMMAND &optional OUTPUT-BUFFER
REPLACE ERROR-BUFFER DISPLAY-ERROR-BUFFER)
...
The noninteractive arguments are START, END, COMMAND,
OUTPUT-BUFFER, REPLACE, ERROR-BUFFER, and DISPLAY-ERROR-BUFFER.
...
If the optional fourth argument OUTPUT-BUFFER is non-nil,
that says to put the output in some other buffer.
If OUTPUT-BUFFER is a buffer or buffer name, put the output there.
If OUTPUT-BUFFER is not a buffer and not nil,
insert output in the current buffer.
In either case, the output is inserted after point (leaving mark after it).
If REPLACE, the optional fifth argument, is non-nil, that means insert
the output in place of text from START to END, putting point and mark
around it.
This isn't the clearest, but the last few sentences just quoted seem to say that if I want the output of the shell command to be inserted in the current buffer at point, leaving the other contents of the buffer intact, I should pass a non-nil argument for OUTPUT-BUFFER and nil for REPLACE.
However, if I execute this code in the *scratch* buffer (not the real code I'm working on, but the minimal case that demonstrates the issue):
(shell-command-on-region
(point-min) (point-max) "wc" t nil)
the entire contents of the buffer are deleted and replaced with the output of wc!
Is shell-command-on-region broken when used non-interactively, or am I misreading the documentation? If the latter, how could I change the code above to insert the output of wc at point rather than replacing the contents of the buffer? Ideally I'd like a general solution that works not only to run a command on the entire buffer as in the minimal example (i.e., (point-min) through (point-max)), but also for the case of running a command with the region as input and then inserting the results at point without deleting the region.
You are wrong
It is not a good idea to use an interactive command like shell-command-on-region in emacs lisp code. Use call-process-region instead.
Emacs is wrong
There is a bug in shell-command-on-region: it does not pass the replace argument down to call-process-region; here is the fix:
=== modified file 'lisp/simple.el'
--- lisp/simple.el 2013-05-16 03:41:52 +0000
+++ lisp/simple.el 2013-05-23 18:44:16 +0000
## -2923,7 +2923,7 ## interactively, this is t."
(goto-char start)
(and replace (push-mark (point) 'nomsg))
(setq exit-status
- (call-process-region start end shell-file-name t
+ (call-process-region start end shell-file-name replace
(if error-file
(list t error-file)
t)
I will commit it shortly.
If you follow the link to the functions' source code, you'll quickly see that it does:
(if (or replace
(and output-buffer
(not (or (bufferp output-buffer) (stringp output-buffer)))))
I don't know why it does that, tho. In any case, this is mostly meant as a command rather than a function; from Elisp I recomend you use call-process-region instead.
In my case (emacs 24.3, don't know what version you're using), the documentation is slightly different in the optional argument:
Optional fourth arg OUTPUT-BUFFER specifies where to put the
command's output. If the value is a buffer or buffer name, put
the output there. Any other value, including nil, means to
insert the output in the current buffer. In either case, the
output is inserted after point (leaving mark after it).
The code that checks whether to delete the output (current) buffer contents is the following:
(if (or replace
(and output-buffer
(not (or (bufferp output-buffer) (stringp output-buffer)))))
So clearly putting t as in your case, it is not a string or a buffer, and it is not nil, so it will replace current buffer contents with the output. However, if I try:
(shell-command-on-region
(point-min) (point-max) "wc" nil nil)
then the buffer is not deleted, and the output put into the "Shell Command Output" buffer. At first sight, I'd say the function is not correctly implemented. Even the two versions of the documentation seem not to correspond with the code.

emacs shell command output not showing ANSI colors but the code

When I do M-! in my emacs 21.4 the ANSI codes gets literal. Eg: ls --color
^[[0m^[[0m05420273.pdf^[[0m
^[[0m100829_Baño1.pdf^[[0m
Is there a way of having it with color and UTF8?
The same question has been answered in SO before but with not totally satisfactory results (the solution given was to open a shell-mode). I know how to have colors in a shell. I only want to know how I can have color with M! (shell-command) or if it is not possible at all.
A shell mode is too intrusive when you want only to show something quick and don't want to move to this buffer and you would like to have it disappear automatically without C-x-k. Obviously there are situations where a shell buffer is more convenient but thanks to the other question I found how to put color to the shell-mode.
[note] emacs in use
GNU Emacs 21.4.1 (x86_64-redhat-linux-gnu, X toolkit, Xaw3d scroll bars) of 2008-06-15 on builder6.centos.org
ansi-color.el contains the functions to process ANSI color codes. Unfortunately, there's not really a good way to hook it into shell-command. This is something of a hack, but it works:
(require 'ansi-color)
(defadvice display-message-or-buffer (before ansi-color activate)
"Process ANSI color codes in shell output."
(let ((buf (ad-get-arg 0)))
(and (bufferp buf)
(string= (buffer-name buf) "*Shell Command Output*")
(with-current-buffer buf
(ansi-color-apply-on-region (point-min) (point-max))))))
About UTF-8:
To specify a coding system for
converting non-ASCII characters in the
shell command output, use C-x RET c
before this command.
Noninteractive callers can specify
coding systems by binding
coding-system-for-read' and
coding-system-for-write'.
This is from the documentation of shell-command.

Force Emacs to use a particular encoding if and only if that causes no trouble

In my .emacs file, I use the line
'(setq coding-system-for-write 'iso-8859-1-unix)
to have Emacs save files in the iso-8859-1-unix encoding. When I
enter characters that cannot be encoded that way ("Łódź" for
example), I get prompted to select a different encoding, but upon
entering `iso-8859-1-unix' into the minibuffer, the file is saved and
the offending characters are lost.
If I just hit enter at the prompt, the file is saved in Unicode, and
when I close and reopen Emacs it is interpreted as a Unicode file
again. If I then remove the offending characters, save the file and
close and reopen Emacs another time, it is still interpreted as a
Unicode file -- from which I deduce that it has still been saved in
Unicode, even though saving in iso-8859-1-unix would have been
possible.
So is there a way to force Emacs to write a file in iso-8859-1
whenever possible, and never save it in that encoding if doing so
would gobble characters?
Many thanks in advance,
Thure Dührsen
As per the doc string for coding-system-for-write, you should not be setting it globally.
Perhaps what you are looking for is (prefer-coding-system 'iso-8859-1-unix)?
I'd try to write a save time hook function which would check the content of the buffer and set the encoding correspondingly. By using find-coding-system-region there shouldn't be much work.
Try
(setq-default buffer-file-coding-system 'iso-8859-1)
Edit:
Incorporating AProgrammer's suggestion, we get
(defun enforce-coding-system-priority ()
(let ((pref (car (coding-system-priority-list)))
(list (find-coding-systems-region (point-min) (point-max))))
(when (or (memq 'undecided list) (memq pref list))
(setq buffer-file-coding-system pref))))
(add-hook 'before-save-hook 'enforce-coding-system-priority)
(prefer-coding-system 'iso-8859-1)
The following should make Emacs ask when the buffer encoding is not that with which emacs is set up to save the file. Emacs will then prompt you to chose one among "safe" encodings.
(setq select-safe-coding-system-accept-default-p
'(lambda (coding)
(string=
(coding-system-base coding)
(coding-system-base buffer-file-coding-system))))

Hiding ^M in Emacs

Sometimes I need to read log files that have ^M (control-M) in the line endings. I can do a global replace to get rid of them, but then something more is logged to the log file and, of course, they all come back.
Setting Unix-style or dos-style end-of-line encoding doesn't seem to make much difference (but Unix-style is my default). I'm using the undecided-(unix|dos) coding system.
I'm on Windows, reading log files created by log4net (although log4net obviously isn't the only source of this annoyance).
(defun remove-dos-eol ()
"Do not show ^M in files containing mixed UNIX and DOS line endings."
(interactive)
(setq buffer-display-table (make-display-table))
(aset buffer-display-table ?\^M []))
Solution by Johan Bockgård. I found it here.
Modern versions of emacs know how to handle both UNIX and DOS line endings, so when ^M shows up in the file, it means that there's a mixture of both in the file. When there is such a mixture, emacs defaults to UNIX mode, so the ^Ms are visible. The real fix is to fix the program creating the file so that it uses consistent line-endings.
What about?
C-x RET c dos RET C-x C-f FILENAME RET
I made a file that has two lines, with the second having a carriage return. Emacs would open the file in Unix coding, and switching coding system does nothing. However, the universal-coding-system-argument above works.
I believe you can change the line coding system the file is using to the Unix format with
C-x RET f UNIX RET
If you do that, the mode line should change to add the word "(Unix)", and all those ^M's should go away.
If you'd like to view the log files and simply hide the ^M's rather than actually replace them you can use Drew Adam's highlight extension to do so.
You can either write elisp code or make a keyboard macro to do the following
select the whole buffer
hlt-highlight-regexp-region
C-q C-M
hlt-hide-default-face
This will first highlight the ^M's and then hide them. If you want them back use `hlt-show-default-face'
Edric's answer should get more attention. Johan Bockgård's solution does address the poster's complaint, insofar as it makes the ^M's invisible, but that just masks the underlying problem, and encourages further mixing of Unix and DOS line-endings.
The proper solution would be to do a global M-x replace-regexp to turn all line endings to DOS ones (or Unix, as the case may be). Then close and reopen the file (not sure if M-x revert-buffer would be enough) and the ^M's will either all be invisible, or all be gone.
You can change the display-table entry of the Control-M (^M) character, to make it displayable as whitespace or even disappear totally (vacuous). See the code in library pp-c-l.el (Pretty Control-L) for inspiration. It displays ^L chars in an arbitrary way.
Edited: Oops, I just noticed that #binOr already mentioned this method.
Put this in your .emacs:
(defun dos2unix ()
"Replace DOS eolns CR LF with Unix eolns CR"
(interactive)
(goto-char (point-min))
(while (search-forward "\r" nil t) (replace-match "")))
Now you can simply call dos2unix and remove all the ^M characters.
If you encounter ^Ms in received mail in Gnus, you can use W c (wash CRs), or
(setq gnus-treat-strip-cr t)
what about using dos2unix, unix2dos (now tofrodos)?
sudeepdino008's answer did not work for me (I could not comment on his answer, so I had to add my own answer.).
I was able to fix it using this code:
(defun dos2unix ()
"Replace DOS eolns CR LF with Unix eolns CR"
(interactive)
(goto-char (point-min))
(while (search-forward (string ?\C-m) nil t) (replace-match "")))
Like binOr said add this to your %APPDATA%.emacs.d\init.el on windows or where ever is your config.
;; Windows EOL
(defun hide-dos-eol ()
"Hide ^M in files containing mixed UNIX and DOS line endings."
(interactive)
(setq buffer-display-table (make-display-table))
(aset buffer-display-table ?\^M []))
(defun show-dos-eol ()
"Show ^M in files containing mixed UNIX and DOS line endings."
(interactive)
(setq buffer-display-table (make-display-table))
(aset buffer-display-table ?\^M ?\^M))
(add-hook 'text-mode-hook 'hide-dos-eol)