determining the line terminator in Emacs - emacs

I'm writing a config file and I need to define if the process expects a windows format file or a unix format file. I've got a copy of the expected file - is there a way I can check if it uses \n or \r\n without exiting emacs?

If it says (DOS) on the modeline when you open the file on Unix, the line endings are Windows-style. If it says (Unix) when you open the file on Windows, the line endings are Unix-style.
From the Emacs 22.2 manual (Node: Mode Line):
If the buffer's file uses carriage-return linefeed, the colon changes to either a backslash ('\') or '(DOS)', depending on the operating system. If the file uses just carriage-return, the colon indicator changes to either a forward slash ('/') or '(Mac)'. On some systems, Emacs displays '(Unix)' instead of the colon for files that use newline as the line separator.
Here's a function that – I think – shows how to check from elisp what Emacs has determined to be the type of line endings. If it looks inordinately complicated, perhaps it is.
(require 'cl)  ; `case' comes from the cl package (on Emacs 24.3+ you could use cl-lib and `cl-case')
(defun describe-eol ()
  (interactive)
  (let ((eol-type (coding-system-eol-type buffer-file-coding-system)))
    (when (vectorp eol-type)
      (setq eol-type (coding-system-eol-type (aref eol-type 0))))
    (message "Line endings are of type: %s"
             (case eol-type
               (0 "Unix") (1 "DOS") (2 "Mac") (t "Unknown")))))

If you switch to hexl-mode (M-x hexl-mode), you should see the line-termination bytes: 0d 0a (CR LF) for Windows-style files, 0a (LF) alone for Unix-style files.

Open the file in Emacs using find-file-literally. If the lines end with ^M, the process expects a Windows-format text file.

The following Elisp function returns nil if no "\r\n" terminators appear in a file (otherwise it returns the buffer position just after the first occurrence). You can put it in your .emacs and call it with M-x check-eol.
(defun check-eol (file)
  (interactive "fFile: ")
  (set-buffer (generate-new-buffer "*check-eol*"))
  (insert-file-contents-literally file)
  (let ((point (search-forward "\r\n" nil t)))  ; return nil, rather than signal an error, when absent
    (kill-buffer nil)
    point))
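A slightly tidier variant of the same idea, using with-temp-buffer so the scratch buffer is always cleaned up, and reporting the result in the echo area (a sketch; the name check-eol-dos-p is just an illustration):
(defun check-eol-dos-p (file)
  "Return non-nil if FILE contains at least one CR LF sequence."
  (interactive "fFile: ")
  (with-temp-buffer
    ;; Read the raw bytes so no end-of-line conversion hides the \r characters.
    (insert-file-contents-literally file)
    (goto-char (point-min))
    (let ((found (search-forward "\r\n" nil t)))
      (message "%s: CR LF %s" file (if found "found" "not found"))
      found)))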

Related

Run AStyle on current buffer, save and restore cursor position

I'm looking for a solution to apply code formatting and static code analysis on contents of current buffer in c++ mode. I'm planning to use AStyle and CppCheck. Both tools need to be executed on current code. For example if I'm editing foo.cpp the function should run
astyle --arg1 --argn foo.cpp
And
cppcheck --arg1 --arg2 foo.cpp
What I already tried is a simple function from here which is not working:
(defun astyle-this-buffer (pmin pmax)
  (interactive "r")
  (shell-command-on-region pmin pmax
                           "astyle" ;; add options here...
                           (current-buffer) t
                           (get-buffer-create "*Astyle Errors*") t))
Update:
I found that the above code works with Emacs 23, while I'm using Emacs 24. So I used this instead:
(defun reformat-code ()
  (interactive)
  (shell-command-on-region (point-min) (point-max)
                           "astyle --options=~/.astylerc" t t))
(global-set-key (kbd "C-x C-a") 'reformat-code)
Now it works and formats the code, though I can't figure out how to save the cursor's position and tell Emacs to move back to that line afterwards.
It seems to me that reformatting tools like astyle will modify whitespace, but presumably nothing else. (It's possible this is mildly wrong, like if they reformat a C macro then they must modify backslashes as well -- but that can also be taken into account.)
So, the way I would approach this would be to count how many non-whitespace characters appear before (point), invoke astyle, revert the buffer (or whatever), and finally, starting from the start of the buffer, move forward that many non-whitespace characters.
This won't always be "the same", for example if point was in some whitespace that was modified -- but I think it ought to be reasonably close.
If you really want to just record the current line number and go back to that, you can use line-number-at-pos to get the current line number, and then (goto-char (point-min)) and use forward-line to get back to the line.
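For the simpler line-number route described above, combined with the reformat-code command from the question, a rough sketch might look like this (untested; the function name and the astyle options are placeholders):
(defun reformat-code-keep-position ()
  "Run astyle on the whole buffer, then move point back to roughly where it was."
  (interactive)
  (let ((line (line-number-at-pos))   ; remember the line...
        (col  (current-column)))      ; ...and the column
    (shell-command-on-region (point-min) (point-max)
                             "astyle --options=~/.astylerc" t t)
    ;; The buffer contents were replaced, so recompute the position.
    (goto-char (point-min))
    (forward-line (1- line))
    (move-to-column col)))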

Force Emacs to use a particular encoding if and only if that causes no trouble

In my .emacs file, I use the line
(setq coding-system-for-write 'iso-8859-1-unix)
to have Emacs save files in the iso-8859-1-unix encoding. When I enter characters that cannot be encoded that way ("Łódź" for example), I get prompted to select a different encoding, but upon entering `iso-8859-1-unix' into the minibuffer, the file is saved and the offending characters are lost.
If I just hit enter at the prompt, the file is saved in Unicode, and when I close and reopen Emacs it is interpreted as a Unicode file again. If I then remove the offending characters, save the file and close and reopen Emacs another time, it is still interpreted as a Unicode file -- from which I deduce that it has still been saved in Unicode, even though saving in iso-8859-1-unix would have been possible.
So is there a way to force Emacs to write a file in iso-8859-1 whenever possible, and never save it in that encoding if doing so would gobble characters?
Many thanks in advance,
Thure Dührsen
As per the doc string for coding-system-for-write, you should not be setting it globally.
Perhaps what you are looking for is (prefer-coding-system 'iso-8859-1-unix)?
I'd try to write a save-time hook function which checks the contents of the buffer and sets the encoding accordingly. With find-coding-systems-region there shouldn't be much work.
Try
(setq-default buffer-file-coding-system 'iso-8859-1)
Edit:
Incorporating AProgrammer's suggestion, we get
(defun enforce-coding-system-priority ()
  (let ((pref (car (coding-system-priority-list)))
        (list (find-coding-systems-region (point-min) (point-max))))
    (when (or (memq 'undecided list) (memq pref list))
      (setq buffer-file-coding-system pref))))
(add-hook 'before-save-hook 'enforce-coding-system-priority)
(prefer-coding-system 'iso-8859-1)
The following should make Emacs ask when the buffer encoding is not the one Emacs is set up to save the file with. Emacs will then prompt you to choose one among the "safe" encodings.
(setq select-safe-coding-system-accept-default-p
      (lambda (coding)
        (string= (coding-system-base coding)
                 (coding-system-base buffer-file-coding-system))))

How can I set the encoding of shell-command-on-region output?

I have a small elisp script which applies Perl::Tidy on region or whole file. For reference, here's the script (borrowed from EmacsWiki):
(defun perltidy-command (start end)
  "The perltidy command we pass markers to."
  (shell-command-on-region start end
                           "perltidy"
                           t t
                           (get-buffer-create "*Perltidy Output*")))
(defun perltidy-dwim (arg)
  "Perltidy the region, or the entire buffer if no region is active."
  (interactive "P")
  (let ((point (point)) (start) (end))
    (if (and mark-active transient-mark-mode)
        (setq start (region-beginning)
              end (region-end))
      (setq start (point-min)
            end (point-max)))
    (perltidy-command start end)
    (goto-char point)))
(global-set-key "\C-ct" 'perltidy-dwim)
I'm using current Emacs 23.1 for Windows (EmacsW32). The problem I'm having is that if I apply that script on a UTF-8 coded file ("U(Unix)" in the status bar) the output comes back Latin-1 coded, i.e. two or more characters for each non-ASCII source character.
Is there any way I can fix that?
EDIT: The problem seems to be solved by using (set-terminal-coding-system 'utf-8-unix) in my init.el. If anyone has other solutions, go ahead and write them!
The following is from the shell-command-on-region documentation:
To specify a coding system for converting non-ASCII characters
in the input and output to the shell command, use C-x RET c
before this command. By default, the input (from the current buffer)
is encoded using coding-system specified by `process-coding-system-alist',
falling back to `default-process-coding-system' if no match for COMMAND
is found in `process-coding-system-alist'.
During execution, Emacs first looks for a coding system in process-coding-system-alist; if nothing there matches, it falls back to default-process-coding-system.
If you want to change the encoding, you can add a matching entry to this alist; its entries pair a regexp with a coding system (or a cons of coding systems), like so:
Value: (("\\.dz\\'" no-conversion . no-conversion)
...
("\\.elc\\'" . utf-8-emacs)
("\\.utf\\(-8\\)?\\'" . utf-8)
("\\.xml\\'" . xml-find-file-coding-system)
...
("" undecided))
Or, since process-coding-system-alist is nil unless you have set it yourself, you can simply assign your encoding to default-process-coding-system,
for example:
(setq default-process-coding-system '(utf-8 . utf-8))
(Input to the subprocess is encoded as utf-8, and its output is decoded as utf-8.)
Or
(setq default-process-coding-system '(undecided-unix . iso-latin-1-unix))
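For example, to force UTF-8 only for commands matching "perltidy" rather than for every subprocess, an entry along these lines should work (a sketch, assuming the command string is matched against process-coding-system-alist as the docstring quoted above describes):
;; Decode perltidy's output as UTF-8 and encode its input as UTF-8.
(add-to-list 'process-coding-system-alist '("perltidy" utf-8 . utf-8))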
I also wrote a post about this if you want details.
Quoting the documentation for shell-command-on-region (C-h f shell-command-on-region RET):
To specify a coding system for converting non-ASCII characters
in the input and output to the shell command, use C-x RET c
before this command. By default, the input (from the current buffer)
is encoded in the same coding system that will be used to save the file,
`buffer-file-coding-system'. If the output is going to replace the region,
then it is decoded from that same coding system.
The noninteractive arguments are START, END, COMMAND,
OUTPUT-BUFFER, REPLACE, ERROR-BUFFER, and DISPLAY-ERROR-BUFFER.
Noninteractive callers can specify coding systems by binding
`coding-system-for-read' and `coding-system-for-write'.
In other words, you'd do something like
(let ((coding-system-for-read 'utf-8-unix))
  (shell-command-on-region ...))
This is untested, not sure what the value of coding-system-for-read (or perhaps -write instead? or as well?) should be in your case. I guess you could also utilize the OUTPUT-BUFFER argument and direct the output to a buffer whose coding system is set to what you need it to be.
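Applied to the perltidy-command defun from the question, that might look like the following (equally untested; it assumes binding both coding-system-for-read and coding-system-for-write to utf-8-unix is what you want):
(defun perltidy-command (start end)
  "Run perltidy on the region, forcing UTF-8 for the subprocess I/O."
  (let ((coding-system-for-read  'utf-8-unix)   ; decode perltidy's output as UTF-8
        (coding-system-for-write 'utf-8-unix))  ; encode the region sent to perltidy as UTF-8
    (shell-command-on-region start end "perltidy" t t
                             (get-buffer-create "*Perltidy Output*"))))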
Another option might be to wiggle the locale in the perltidy invocation, but again, without more information about what you are using now, and no means to experiment on a system similar to yours, I can only hint.

How to strip CR (^M) and leave LF (^J) characters?

I am trying to use Hexl mode to manually remove some special characters from a text file and don't see how to delete anything in Hexl mode.
What I really want is to remove carriage return and keep linefeed characters.
Is Hexl mode the right way to do this?
No need to find and replace. Just use:
M-x delete-trailing-whitespace
You can also set the file encoding through
C-x RET f unix
No need for hexl-mode for this. Just do a global search-and-replace of ^J^M with ^J. Works for me. :) Then save the file, kill the buffer, and revisit the file so the window shows the new file mode (Unix vs DOS).
Oops. That ^J^M needs to be entered as two literal characters: use C-q C-j, C-q C-m, and for the replacement string use C-q C-j.
There's also a command-line tool called unix2dos/dos2unix that exists specifically to convert line endings.
Assuming you want a DOS encoded file to be changed into UNIX encoding, use M-x set-buffer-file-coding-system (C-x RET f) to set the coding-system to "unix" and save the file.
If you want to remove the carriage returns (usually displayed as ^M) and leave the line feeds, you can just visit the file without any conversion:
M-x find-file-literally /path/to/file
This is because a file with carriage returns is generally displayed in DOS mode (which hides the carriage returns); the mode line will likely display (DOS) on the left side.
Once you've done that, the ^M will show up and you can delete them like you would any character.
You don't need to use hexl-mode. Instead:
Open the file in a way that shows you those ^M's. See M-x find-file-literally /path/to/file above. In XEmacs you can also do C-u C-x C-f and select binary encoding.
Select the string you want to replace and copy it using M-w.
Do M-% (query-replace) and paste what you copied using C-y.
Press Enter when prompted for what to replace it with.
Possibly press ! now to replace all occurrences.
The point is that even if you don't know how to enter what you are trying to replace, you can always select and copy it.
(in hexl mode) I'm not sure that you can delete characters. I've always converted them to spaces or some other character, switched to the regular text editor, and deleted them there.
I use this function:
(defun l/cr-sanitise ()
  "Make sure current buffer uses unix-utf8 encoding.
If necessary remove superfluous ^M. Buffer will need to be saved
for changes to be permanent."
  (interactive)
  (set-buffer-file-coding-system 'utf-8-unix)
  (delete-trailing-whitespace)
  (message "Please save buffer to persist encoding changes."))
From http://www.xsteve.at/prg/emacs/xsteve-functions.el:
;; 02.02.2000
(defun xsteve-remove-control-M ()
  "Remove ^M at end of line in the whole buffer."
  (interactive)
  (save-match-data
    (save-excursion
      (let ((remove-count 0))
        (goto-char (point-min))
        (while (re-search-forward (concat (char-to-string 13) "$") (point-max) t)
          (setq remove-count (+ remove-count 1))
          (replace-match "" nil nil))
        (message (format "%d ^M removed from buffer." remove-count))))))
Add this to your .emacs and run it via M-x xsteve-remove-control-M, or bind it to an easier key. It will strip the ^Ms in any mode.

Hiding ^M in Emacs

Sometimes I need to read log files that have ^M (control-M) in the line endings. I can do a global replace to get rid of them, but then something more is logged to the log file and, of course, they all come back.
Setting Unix-style or dos-style end-of-line encoding doesn't seem to make much difference (but Unix-style is my default). I'm using the undecided-(unix|dos) coding system.
I'm on Windows, reading log files created by log4net (although log4net obviously isn't the only source of this annoyance).
(defun remove-dos-eol ()
  "Do not show ^M in files containing mixed UNIX and DOS line endings."
  (interactive)
  (setq buffer-display-table (make-display-table))
  (aset buffer-display-table ?\^M []))
Solution by Johan Bockgård. I found it here.
Modern versions of emacs know how to handle both UNIX and DOS line endings, so when ^M shows up in the file, it means that there's a mixture of both in the file. When there is such a mixture, emacs defaults to UNIX mode, so the ^Ms are visible. The real fix is to fix the program creating the file so that it uses consistent line-endings.
What about?
C-x RET c dos RET C-x C-f FILENAME RET
I made a file that has two lines, with the second having a carriage return. Emacs would open the file in Unix coding, and switching coding system does nothing. However, the universal-coding-system-argument above works.
I believe you can change the line coding system the file is using to the Unix format with
C-x RET f UNIX RET
If you do that, the mode line should change to add the word "(Unix)", and all those ^M's should go away.
If you'd like to view the log files and simply hide the ^M's rather than actually replace them you can use Drew Adam's highlight extension to do so.
You can either write elisp code or make a keyboard macro to do the following
select the whole buffer
hlt-highlight-regexp-region
C-q C-M
hlt-hide-default-face
This will first highlight the ^M's and then hide them. If you want them back use `hlt-show-default-face'
Edric's answer should get more attention. Johan Bockgård's solution does address the poster's complaint, insofar as it makes the ^M's invisible, but that just masks the underlying problem, and encourages further mixing of Unix and DOS line-endings.
The proper solution would be to do a global M-x replace-regexp to turn all line endings to DOS ones (or Unix, as the case may be). Then close and reopen the file (not sure if M-x revert-buffer would be enough) and the ^M's will either all be invisible, or all be gone.
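One way to script that normalization (a sketch only; the function name is made up, and it assumes you want DOS line endings everywhere):
(defun my-normalize-eol-to-dos ()
  "Strip stray CRs, re-save the file with DOS (CR LF) line endings, then revisit it."
  (interactive)
  (save-excursion
    (goto-char (point-min))
    ;; Remove any literal ^M left over from the mixed line endings.
    (while (search-forward "\r" nil t)
      (replace-match "")))
  (set-buffer-file-coding-system 'dos)  ; CR LF will be written on the next save
  (save-buffer)
  (revert-buffer :ignore-auto :noconfirm))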
You can change the display-table entry of the Control-M (^M) character, to make it displayable as whitespace or even disappear totally (vacuous). See the code in library pp-c-l.el (Pretty Control-L) for inspiration. It displays ^L chars in an arbitrary way.
Edited: Oops, I just noticed that #binOr already mentioned this method.
Put this in your .emacs:
(defun dos2unix ()
  "Replace DOS line endings (CR LF) with Unix line endings (LF)."
  (interactive)
  (goto-char (point-min))
  (while (search-forward "\r" nil t) (replace-match "")))
Now you can simply call dos2unix and remove all the ^M characters.
If you encounter ^Ms in received mail in Gnus, you can use W c (wash CRs), or
(setq gnus-treat-strip-cr t)
What about using dos2unix/unix2dos (now tofrodos)?
sudeepdino008's answer did not work for me (I could not comment on his answer, so I had to add my own).
I was able to fix it using this code:
(defun dos2unix ()
  "Replace DOS line endings (CR LF) with Unix line endings (LF)."
  (interactive)
  (goto-char (point-min))
  (while (search-forward (string ?\C-m) nil t) (replace-match "")))
Like binOr said, add this to your %APPDATA%\.emacs.d\init.el on Windows, or wherever your config lives.
;; Windows EOL
(defun hide-dos-eol ()
  "Hide ^M in files containing mixed UNIX and DOS line endings."
  (interactive)
  (setq buffer-display-table (make-display-table))
  (aset buffer-display-table ?\^M []))

(defun show-dos-eol ()
  "Show ^M in files containing mixed UNIX and DOS line endings."
  (interactive)
  (setq buffer-display-table (make-display-table))
  (aset buffer-display-table ?\^M ?\^M))

(add-hook 'text-mode-hook 'hide-dos-eol)
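If you also want this in programming modes, Emacs 24 and later provide prog-mode-hook, so (assuming that suits your setup) you could additionally add:
(add-hook 'prog-mode-hook 'hide-dos-eol)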