Sometimes I need to read log files that have ^M (control-M) in the line endings. I can do a global replace to get rid of them, but then something more is logged to the log file and, of course, they all come back.
Setting Unix-style or DOS-style end-of-line encoding doesn't seem to make much difference (Unix-style is my default). I'm using the undecided-unix / undecided-dos coding systems.
I'm on Windows, reading log files created by log4net (although log4net obviously isn't the only source of this annoyance).
(defun remove-dos-eol ()
"Do not show ^M in files containing mixed UNIX and DOS line endings."
(interactive)
(setq buffer-display-table (make-display-table))
(aset buffer-display-table ?\^M []))
Solution by Johan Bockgård. I found it here.
Modern versions of Emacs know how to handle both Unix and DOS line endings, so when ^M shows up in a file, it means there is a mixture of both in the file. When there is such a mixture, Emacs defaults to Unix mode, so the ^Ms are visible. The real fix is to fix the program creating the file so that it uses consistent line endings.
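To check whether a buffer really contains such a mixture, you could count both kinds of line ending. This is a minimal sketch; `my-count-eol-styles` is a hypothetical helper name, and it assumes the buffer was visited with Unix EOL conversion (the mixed case described above), so every CR is still present in the buffer text.

```elisp
(defun my-count-eol-styles ()
  "Return (CRLF . LF): lines ending in CR LF vs. bare LF in this buffer.
Hypothetical helper; assumes the CRs were not stripped when reading."
  (interactive)
  (save-excursion
    (goto-char (point-min))
    (let ((crlf 0) (lf 0))
      ;; After each match point is just past the LF, so the character
      ;; before the LF is at (1- (point)).
      (while (search-forward "\n" nil t)
        (if (eq (char-before (1- (point))) ?\r)
            (setq crlf (1+ crlf))
          (setq lf (1+ lf))))
      (when (called-interactively-p 'interactive)
        (message "%d CR LF line(s), %d LF-only line(s)" crlf lf))
      (cons crlf lf))))
```

If both numbers are non-zero, the file mixes the two styles. A final line without a trailing newline is not counted.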
What about this:
C-x RET c dos RET C-x C-f FILENAME RET
I made a file that has two lines, with the second ending in a carriage return. Emacs opened the file with Unix line endings, and switching the coding system afterwards did nothing. However, the universal-coding-system-argument command (C-x RET c) above works.
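The same effect can be had from Lisp by binding `coding-system-for-read` around the visit, which is roughly what C-x RET c does for the next command. A minimal sketch; the command name `my-find-file-dos` is hypothetical.

```elisp
(defun my-find-file-dos (file)
  "Visit FILE, decoding CR LF line endings as DOS.
Hypothetical command; lone LFs are left as they are."
  (interactive "fFind file (DOS line endings): ")
  ;; Binding this variable forces the coding system used when the file
  ;; is read, like C-x RET c dos RET before C-x C-f.
  (let ((coding-system-for-read 'dos))
    (switch-to-buffer (find-file-noselect file))))
```

If the file is already being visited, find-file-noselect may simply return the existing buffer, so kill that buffer first.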
I believe you can change the line coding system the file is using to the Unix format with
C-x RET f unix RET
If you do that, the mode line should change to add the word "(Unix)", and all those ^M's should go away.
If you'd like to view the log files and simply hide the ^M's rather than actually replace them, you can use Drew Adams's highlight extension to do so.
You can either write Elisp code or make a keyboard macro to do the following:
select the whole buffer (C-x h)
M-x hlt-highlight-regexp-region
enter ^M as the regexp by typing C-q C-m
M-x hlt-hide-default-face
This will first highlight the ^M's and then hide them. If you want them back, use `hlt-show-default-face'.
Edric's answer should get more attention. Johan Bockgård's solution does address the poster's complaint, insofar as it makes the ^M's invisible, but that just masks the underlying problem, and encourages further mixing of Unix and DOS line-endings.
The proper solution would be to do a global M-x replace-regexp to turn all line endings to DOS ones (or Unix, as the case may be). Then close and reopen the file (not sure if M-x revert-buffer would be enough) and the ^M's will either all be invisible, or all be gone.
You can change the display-table entry of the Control-M (^M) character, to make it displayable as whitespace or even disappear totally (vacuous). See the code in library pp-c-l.el (Pretty Control-L) for inspiration. It displays ^L chars in an arbitrary way.
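For instance, instead of making ^M vanish (the empty vector used by remove-dos-eol above), the display-table entry can be a one-glyph vector so that each CR shows up as a space. This is a minimal sketch with a hypothetical command name; it is not code taken from pp-c-l.el.

```elisp
(defun my-show-cr-as-space ()
  "Display every ^M (CR) in the current buffer as a single space.
Hypothetical command; the buffer text itself is not changed."
  (interactive)
  ;; Reuse an existing buffer display table so its other entries survive.
  (unless buffer-display-table
    (setq buffer-display-table (make-display-table)))
  ;; A display-table entry is a vector of glyphs; ?\s is a space.
  (aset buffer-display-table ?\^M (vector ?\s)))
```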
Edited: Oops, I just noticed that binOr already mentioned this method.
Put this in your .emacs:
(defun dos2unix ()
  "Replace DOS eolns CR LF with Unix eolns LF.
Note: this removes every CR in the buffer, not only those at line ends."
  (interactive)
  (save-excursion
    (goto-char (point-min))
    (while (search-forward "\r" nil t) (replace-match ""))))
Now you can simply call dos2unix and remove all the ^M characters.
If you encounter ^Ms in received mail in Gnus, you can use W c (wash CRs), or
(setq gnus-treat-strip-cr t)
What about using dos2unix/unix2dos (now tofrodos)?
sudeepdino008's answer did not work for me (I could not comment on his answer, so I had to add my own).
I was able to fix it using this code:
(defun dos2unix ()
  "Replace DOS eolns CR LF with Unix eolns LF."
  (interactive)
  (goto-char (point-min))
  (while (search-forward (string ?\C-m) nil t) (replace-match "")))
As binOr said, add this to your %APPDATA%\.emacs.d\init.el on Windows, or wherever your config is.
;; Windows EOL
(defun hide-dos-eol ()
  "Hide ^M in files containing mixed UNIX and DOS line endings."
  (interactive)
  (setq buffer-display-table (make-display-table))
  (aset buffer-display-table ?\^M []))
(defun show-dos-eol ()
  "Show ^M in files containing mixed UNIX and DOS line endings."
  (interactive)
  (setq buffer-display-table (make-display-table))
  ;; A nil entry means ^M is displayed the default way again.
  (aset buffer-display-table ?\^M nil))
(add-hook 'text-mode-hook 'hide-dos-eol)
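If you would rather have one key that switches between the two states, a single toggle can replace the pair. A minimal sketch with the hypothetical name `my-toggle-dos-eol`; unlike hide-dos-eol, it modifies the existing display table instead of replacing it, so other display-table customizations in the buffer are kept.

```elisp
(defun my-toggle-dos-eol ()
  "Toggle between hiding and showing ^M in the current buffer.
Hypothetical command."
  (interactive)
  (unless buffer-display-table
    (setq buffer-display-table (make-display-table)))
  ;; An empty vector displays nothing; nil restores the default ^M display.
  (aset buffer-display-table ?\^M
        (if (aref buffer-display-table ?\^M) nil [])))
```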
I'm looking for a way to apply code formatting and static code analysis to the contents of the current buffer in C++ mode. I'm planning to use AStyle and Cppcheck. Both tools need to be run on the current code. For example, if I'm editing foo.cpp, the function should run
astyle --arg1 --argn foo.cpp
And
cppcheck --arg1 --arg2 foo.cpp
What I already tried is a simple function from here which is not working:
(defun astyle-this-buffer (pmin pmax)
(interactive "r")
(shell-command-on-region pmin pmax
"astyle" ;; add options here...
(current-buffer) t
(get-buffer-create "*Astyle Errors*") t))
Update:
I found that the above code is compatible with Emacs 23, while I'm using Emacs 24. So I used this instead:
(defun reformat-code ()
(interactive)
(shell-command-on-region (point-min) (point-max)
"astyle --options=~/.astylerc" t t))
(global-set-key (kbd "C-x C-a") 'reformat-code)
Now it works and formats the code, but I can't figure out how to save the cursor's position and tell Emacs to move back to that line afterwards.
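For the Cppcheck half of the question, one option is to run it through `compile`, so its output lands in a *compilation* buffer instead of replacing the buffer text. A minimal sketch with hypothetical function names; the command line is an assumption, so add whatever options (the --arg1 --arg2 from the question) your setup needs.

```elisp
(defun my-cppcheck-command (file)
  "Return the shell command that runs cppcheck on FILE (hypothetical helper)."
  ;; Put your own cppcheck options before the file name.
  (concat "cppcheck " (shell-quote-argument file)))

(defun my-cppcheck-buffer ()
  "Save the current buffer and run cppcheck on its file via `compile'."
  (interactive)
  (unless buffer-file-name
    (user-error "Buffer is not visiting a file"))
  ;; cppcheck reads the file on disk, so unsaved edits would be missed.
  (save-buffer)
  (compile (my-cppcheck-command buffer-file-name)))
```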
It seems to me that reformatting tools like astyle will modify whitespace, but presumably nothing else. (It's possible this is mildly wrong, like if they reformat a C macro then they must modify backslashes as well -- but that can also be taken into account.)
So, the way I would approach this would be to count how many non-whitespace characters appear before (point), invoke astyle, revert the buffer (or whatever), and finally, starting from the start of the buffer, move forward that many non-whitespace characters.
This won't always be "the same", for example if point was in some whitespace that was modified -- but I think it ought to be reasonably close.
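The counting approach described above can be sketched like this. The function names are hypothetical, "whitespace" is assumed to mean spaces, tabs, newlines and CRs, and REFORMAT-FN stands for whatever replaces the buffer text (for example the shell-command-on-region call from the question).

```elisp
(defun my-count-non-whitespace (beg end)
  "Count characters between BEG and END that are not whitespace."
  (save-excursion
    (goto-char beg)
    (let ((count 0))
      (while (re-search-forward "[^ \t\r\n]" end t)
        (setq count (1+ count)))
      count)))

(defun my-goto-non-whitespace (n)
  "Move point just past the Nth non-whitespace character in the buffer."
  (goto-char (point-min))
  (while (and (> n 0) (re-search-forward "[^ \t\r\n]" nil t))
    (setq n (1- n))))

(defun my-reformat-keeping-position (reformat-fn)
  "Call REFORMAT-FN, then put point back after the same code character."
  (let ((n (my-count-non-whitespace (point-min) (point))))
    (funcall reformat-fn)
    (my-goto-non-whitespace n)))
```

As noted above, if point was inside whitespace that the formatter changed, it ends up just after the preceding non-whitespace character rather than at exactly the same spot.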
If you really want to just record the current line number and go back to that, you can use line-number-at-pos to get the current line number, and then (goto-char (point-min)) and use forward-line to get back to the line.
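Applied to the reformat-code command from the question, the line-number approach might look like the following sketch. `my-call-preserving-line` is a hypothetical helper; it also restores the column, which goes slightly beyond what this answer suggests.

```elisp
(defun my-call-preserving-line (fn)
  "Call FN, then return to the line and column point was on before."
  (let ((line (line-number-at-pos))
        (col (current-column)))
    (funcall fn)
    (goto-char (point-min))
    (forward-line (1- line))
    (move-to-column col)))

(defun reformat-code ()
  (interactive)
  (my-call-preserving-line
   (lambda ()
     (shell-command-on-region (point-min) (point-max)
                              "astyle --options=~/.astylerc" t t))))
```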
In my .emacs file, I use the line
(setq coding-system-for-write 'iso-8859-1-unix)
to have Emacs save files in the iso-8859-1-unix encoding. When I enter characters that cannot be encoded that way ("Łódź", for example), I get prompted to select a different encoding, but upon entering `iso-8859-1-unix' into the minibuffer, the file is saved and the offending characters are lost.
If I just hit Enter at the prompt, the file is saved in Unicode, and when I close and reopen Emacs it is interpreted as a Unicode file again. If I then remove the offending characters, save the file, and close and reopen Emacs another time, it is still interpreted as a Unicode file, from which I deduce that it has still been saved in Unicode, even though saving in iso-8859-1-unix would have been possible.
So is there a way to force Emacs to write a file in iso-8859-1 whenever possible, and never save it in that encoding if doing so would gobble characters?
Many thanks in advance,
Thure Dührsen
As per the doc string for coding-system-for-write, you should not be setting it globally.
Perhaps what you are looking for is (prefer-coding-system 'iso-8859-1-unix)?
I'd try writing a save-time hook function that checks the contents of the buffer and sets the encoding accordingly. Using find-coding-systems-region, there shouldn't be much work.
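A minimal sketch of that idea, using `unencodable-char-position` (which returns nil when every character in the region can be encoded) instead of find-coding-systems-region. The function name is hypothetical, and falling back to utf-8-unix is an assumption; pick whatever fallback you prefer.

```elisp
(defun my-prefer-latin-1-when-safe ()
  "Save as iso-8859-1-unix when every character fits, else utf-8-unix."
  (setq buffer-file-coding-system
        ;; Non-nil means some character cannot be represented in Latin-1.
        (if (unencodable-char-position (point-min) (point-max) 'iso-8859-1)
            'utf-8-unix
          'iso-8859-1-unix)))

(add-hook 'before-save-hook #'my-prefer-latin-1-when-safe)
```

This only has an effect if coding-system-for-write is not set globally, since that variable overrides buffer-file-coding-system when writing.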
Try
(setq-default buffer-file-coding-system 'iso-8859-1)
Edit:
Incorporating AProgrammer's suggestion, we get
(defun enforce-coding-system-priority ()
(let ((pref (car (coding-system-priority-list)))
(list (find-coding-systems-region (point-min) (point-max))))
(when (or (memq 'undecided list) (memq pref list))
(setq buffer-file-coding-system pref))))
(add-hook 'before-save-hook 'enforce-coding-system-priority)
(prefer-coding-system 'iso-8859-1)
The following should make Emacs ask whenever the buffer's encoding differs from the one Emacs is set up to save the file with. Emacs will then prompt you to choose among "safe" encodings.
(setq select-safe-coding-system-accept-default-p
      (lambda (coding)
        (string=
         (coding-system-base coding)
         (coding-system-base buffer-file-coding-system))))
I am trying to use Hexl mode to manually remove some special characters from a text file and don't see how to delete anything in Hexl mode.
What I really want is to remove carriage return and keep linefeed characters.
Is Hexl mode the right way to do this?
No need for find-and-replace. Just use:
M-x delete-trailing-whitespace
You can also set the file encoding through
C-x RET f unix RET
No need for hexl-mode for this. Just do a global search-and-replace of ^M^J with ^J. Works for me. :) The ^M^J needs to be entered as two literal characters: use C-q C-m, C-q C-j, and for the replacement string, use C-q C-j. Then save the file, kill the buffer, and revisit the file so the window shows the new file mode (Unix vs DOS).
There's also a command-line tool called unix2dos/dos2unix that exists specifically to convert line endings.
Assuming you want a DOS encoded file to be changed into UNIX encoding, use M-x set-buffer-file-coding-system (C-x RET f) to set the coding-system to "unix" and save the file.
If you want to remove the carriage returns (usually displayed as ^M) and leave the line feeds, you can just visit the file without any conversion:
M-x find-file-literally /path/to/file
This is needed because a file with carriage returns is generally visited in DOS mode, which hides the carriage returns; the mode line will likely display (DOS) on the left side.
Once you've done that, the ^M will show up and you can delete them like you would any character.
You don't need to use hexl-mode. Instead:
Open the file in a way that shows you those ^M's; see M-x find-file-literally /path/to/file above. In XEmacs you can also do C-u C-x C-f and select binary encoding.
Select the string you want to replace and copy it using M-w.
Do M-% (query-replace) and paste what you copied using C-y.
Press Enter when prompted for what to replace it with, leaving the replacement empty.
Optionally, press ! to replace all remaining occurrences at once.
The point is that even if you don't know how to type what you are trying to replace, you can always select and copy it.
(in hexl mode) I'm not sure that you can delete characters. I've always converted them to spaces or some other character, switched to the regular text editor, and deleted them there.
I use this function:
(defun l/cr-sanitise ()
"Make sure current buffer uses utf-8-unix encoding.
If necessary remove superfluous ^M. Buffer will need to be saved
for changes to be permanent."
(interactive)
(set-buffer-file-coding-system 'utf-8-unix)
(delete-trailing-whitespace)
(message "Please save buffer to persist encoding changes."))
From http://www.xsteve.at/prg/emacs/xsteve-functions.el:
;02.02.2000
(defun xsteve-remove-control-M ()
"Remove ^M at end of line in the whole buffer."
(interactive)
(save-match-data
(save-excursion
(let ((remove-count 0))
(goto-char (point-min))
(while (re-search-forward (concat (char-to-string 13) "$") (point-max) t)
(setq remove-count (+ remove-count 1))
(replace-match "" nil nil))
(message (format "%d ^M removed from buffer." remove-count))))))
Add this to your .emacs and run it via M-x xsteve-remove-control-M, or bind it to an easier key. It will strip the ^Ms in any mode.
I've had these functions in my .emacs.el file for years:
(defun dos2unix ()
"Convert a DOS formatted text buffer to UNIX format"
(interactive)
(set-buffer-file-coding-system 'undecided-unix nil))
(defun unix2dos ()
"Convert a UNIX formatted text buffer to DOS format"
(interactive)
(set-buffer-file-coding-system 'undecided-dos nil))
These functions allow me to easily switch between formats, but I'm not sure how to configure Emacs to write in one particular format by default regardless of which platform I'm using. As it is now, when I run on Windows, Emacs saves in Windows format; when I run in UNIX/Linux, Emacs saves in UNIX format.
I'd like to instruct Emacs to write in UNIX format regardless of the platform on which I'm running. How do I do this?
Should I perhaps add some text mode hook that calls one of these functions? For example, if I'm on Windows, then call dos2unix when I find a text file?
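One way to try the hook idea is to switch every visited file's coding system to its Unix variant when it is opened. A minimal sketch; the function name is hypothetical, and it assumes you really want every file saved with LF endings.

```elisp
(defun my-force-unix-eol ()
  "Switch the current buffer's coding system to its Unix-EOL variant."
  (let ((unix (coding-system-change-eol-conversion
               buffer-file-coding-system 'unix)))
    ;; Only act when needed: set-buffer-file-coding-system marks the
    ;; buffer as modified, so Emacs will offer to save it.
    (unless (eq unix buffer-file-coding-system)
      (set-buffer-file-coding-system unix))))

;; Run it whenever a file is visited, on any platform.
(add-hook 'find-file-hook #'my-force-unix-eol)
```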
I've got a bunch of these in my .emacs:
(setq-default buffer-file-coding-system 'utf-8-unix)
(setq-default default-buffer-file-coding-system 'utf-8-unix)
(set-default-coding-systems 'utf-8-unix)
(prefer-coding-system 'utf-8-unix)
I don't know which is right; I am just superstitious.
I upvoted the question and answer, but spent a couple of minutes possibly improving on the info, so I'll add it.
First, I checked the documentation on each variable and function in user181548's answer by (first cutting and pasting them into Emacs, then) putting the cursor on each and typing C-h v RET and C-h f RET respectively.
This suggested that I might only need
(prefer-coding-system 'utf-8-unix)
Experimenting with the other lines didn't seem to change pre-existing buffer encodings (typing C-h C RET RET to check (describe-coding-system) and g each time to refresh), so I omitted the other lines and made a key-binding to quickly change any old files that were still DOS, that is,
(defun set-bfr-to-8-unx ()
  (interactive)
  (set-buffer-file-coding-system 'utf-8-unix))
(global-set-key (kbd "C-c u") 'set-bfr-to-8-unx)
For the curious, to discover the body of the function above, (set-buffer-file-coding-system 'utf-8-unix), I used C-x RET f RET to manually change the current buffer's encoding, then M-x command-history RET to see how those keys translate to code.
Now maybe my git commits will stop whining about CRs.
I'm writing a config file and need to specify whether the process expects a Windows-format or a Unix-format file. I've got a copy of the expected file; is there a way I can check whether it uses \n or \r\n without exiting Emacs?
If it says (DOS) on the modeline when you open the file on Unix, the line endings are Windows-style. If it says (Unix) when you open the file on Windows, the line endings are Unix-style.
From the Emacs 22.2 manual (Node: Mode Line):
If the buffer's file uses carriage-return linefeed, the colon changes to either a backslash ('\') or '(DOS)', depending on the operating system. If the file uses just carriage-return, the colon indicator changes to either a forward slash ('/') or '(Mac)'. On some systems, Emacs displays '(Unix)' instead of the colon for files that use newline as the line separator.
Here's a function that – I think – shows how to check from elisp what Emacs has determined to be the type of line endings. If it looks inordinately complicated, perhaps it is.
(defun describe-eol ()
  (interactive)
  (let ((eol-type (coding-system-eol-type buffer-file-coding-system)))
    ;; A still-undecided EOL gives a vector of variants; use the first
    ;; (-unix) one, as the original did.
    (when (vectorp eol-type)
      (setq eol-type (coding-system-eol-type (aref eol-type 0))))
    (message "Line endings are of type: %s"
             (pcase eol-type
               (0 "Unix") (1 "DOS") (2 "Mac") (_ "Unknown")))))
If you switch to hexl-mode (M-x hexl-mode), you should see the line termination bytes (0d 0a for CR LF, 0a for a bare LF).
Open the file in Emacs using find-file-literally. If lines have ^M symbols at the end, the process expects a Windows-format text file.
The following Elisp function returns nil if no "\r\n" terminators appear in a file (otherwise it returns the position just after the first occurrence). You can put it in your .emacs and call it with M-x check-eol.
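(defun check-eol (file)
  "Return non-nil if FILE contains a CR LF line terminator."
  (interactive "fFile: ")
  (let ((pos (with-temp-buffer
               ;; Read raw bytes so no EOL conversion hides the CRs.
               (insert-file-contents-literally file)
               ;; NOERROR = t makes the search return nil instead of failing.
               (search-forward "\r\n" nil t))))
    (when (called-interactively-p 'interactive)
      (message (if pos "Found CR LF (DOS) line endings"
                 "No CR LF line endings found")))
    pos))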