Emacs shell mode input buffer size

Shell mode (comint.el) has a problem with long input lines, described in this bug report. I tried setting the TERM environment variable to emacs, but that didn't fix it. Any clue where I should look in comint.el?

I'd encountered this in the past and the following piece of advice works well for me:
(defadvice process-send-string (around process-send-string-in-chunks act)
  "Break the string to be sent into chunks.
This avoids some bug when sending long strings that causes a
^D character to be inserted.  Breaking the string into chunks that
are 200 chars long (arbitrary value) avoids the problem."
  (let ((str (ad-get-arg 1))
        substr
        (chunk-len 200))
    (while (> (length str) 0)
      (setq substr (substring str 0 (min (length str) chunk-len)))
      (ad-set-arg 1 substr)
      ad-do-it
      (setq str (substring str (length substr) (length str))))))
It solves the problem by breaking the string into 200 character chunks.
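On Emacs 24.4 or later, the same chunking idea could also be expressed with advice-add instead of defadvice. The following is only a sketch under that assumption; the names my-send-chunk-size, my-split-into-chunks and my-process-send-string-in-chunks are made up for illustration, and the chunk size of 200 is as arbitrary as in the advice above.

```elisp
(require 'nadvice)

(defvar my-send-chunk-size 200
  "Maximum number of characters sent to a process in one call (arbitrary).")

(defun my-split-into-chunks (string size)
  "Return STRING split into a list of substrings of at most SIZE characters."
  (let ((chunks '())
        (start 0)
        (len (length string)))
    (while (< start len)
      (push (substring string start (min len (+ start size))) chunks)
      (setq start (+ start size)))
    (nreverse chunks)))

(defun my-process-send-string-in-chunks (orig-fun process string)
  "Call ORIG-FUN once per chunk of STRING instead of once for all of it."
  (dolist (chunk (my-split-into-chunks string my-send-chunk-size))
    (funcall orig-fun process chunk)))

;; Hypothetical setup: wrap every Lisp-level call to `process-send-string'.
(advice-add 'process-send-string :around #'my-process-send-string-in-chunks)
```

Evaluating (advice-remove 'process-send-string #'my-process-send-string-in-chunks) undoes the change.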

I just ran across this limitation in shell mode... in my case, the problem was related to shell-mode appending --noediting when the shell is 'bash' (/bin/sh doesn't receive this argument).
The --noediting arg (same as set +o emacs +o vi in bash) meant bash wasn't using readline to process input and had some internal buffer-size limitation. To reproduce outside emacs, start bash --noediting -i and then just start typing... after some limit (1024 characters in my case), the input is just blocked...
Additionally, bash internally disables editing if TERM is "emacs" or "dumb", so the TERM entry needs to be something else (I chose "xterm"; you could use "ansi" or "vt100" too).
The easy workaround for .emacs is:
(setq explicit-bash-args '("-i"))
(setq comint-terminfo-terminal "xterm")
NOTE: you can't set these in shell-mode-hook, as it runs after explicit-bash-args has already been used.
This does break tab-completion, and may break with colorized prompts or similar edge cases, but I can finally paste my insanely long compile lines without the shell rejecting them. You may find a need to run 'set -o emacs' (or vi) from time to time, as some actions seem to reset noediting...

Related

Emacs ediff error "no newline at end of file"

On Debian Wheezy, Emacs 23.3.1, running ediff-files with a file that is missing a newline at the end results in the error \ No newline at end of file (I hope that's the correct translation; it's German \ Kein Zeilenumbruch am Dateiende. on my computer.)
Is it possible to have just a warning instead, so that I can see the diff and work on it (and fix the missing newline)? It's just a bit tedious to first have ediff fail, then open the file, add the newline, ediff again.
Try changing the value of the variable ediff-diff-ok-lines-regexp to include the German text ("Kein Zeilenumbruch am Dateiende"):
(setq ediff-diff-ok-lines-regexp
      (concat
       "^\\("
       "[0-9,]+[acd][0-9,]+\C-m?$"
       "\\|[<>] "
       "\\|---"
       "\\|.*Warning *:"
       "\\|.*No +newline"
       "\\|.*missing +newline"
       "\\|.*Kein +Zeilenumbruch +am +Dateiende"
       "\\|^\C-m?$"
       "\\)"))
Update: Looking at the source code, it does seem that Ediff doesn't make any attempt to deal with the issue of localization of messages from diff. It should also be possible to work around this by wrapping diff in a shell script, e.g:
#!/bin/bash
LANG=C diff "$@"
..then customising the ediff-diff-program to call the wrapper instead:
(setq ediff-diff-program "~/bin/my-diff.sh")
Other code in the Emacs source directory lisp/vc does seem to handle this, for example vc-hg-state:
(defun vc-hg-state (file)
  "Hg-specific version of `vc-state'."
  ...
  (with-output-to-string
    (with-current-buffer
        standard-output
      (setq status
            (condition-case nil
                ;; Ignore all errors.
                (let ((process-environment
                       ;; Avoid localization of messages so we
                       ;; can parse the output.
                       (append (list "TERM=dumb" "LANGUAGE=C")
                               process-environment)))
                  ...
It seems a bit strange that Ediff doesn't also do this, but perhaps I'm missing something.
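The process-environment binding used by vc-hg-state could in principle be applied to Ediff through advice. What follows is a sketch only, assuming Emacs 24.4 or later (for advice-add, so not the Emacs 23.3 from the question) and assuming that Ediff starts diff through ediff-exec-process. The name my-ediff-exec-in-c-locale is hypothetical, and this has not been tested against a localized diff.

```elisp
(require 'nadvice)

(defun my-ediff-exec-in-c-locale (orig-fun &rest args)
  "Call ORIG-FUN with ARGS while child processes see LC_ALL=C.
The binding is local to this call, so the rest of Emacs keeps the
user's locale."
  (let ((process-environment (cons "LC_ALL=C" process-environment)))
    (apply orig-fun args)))

;; Assumption: `ediff-exec-process' is the function that launches diff.
(advice-add 'ediff-exec-process :around #'my-ediff-exec-in-c-locale)
```

Compared with starting Emacs itself under LANG=C, this keeps the change scoped to the processes Ediff spawns.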
OK, I found out what's wrong, and sadly it's quite obvious: my environment has LANG=de, so when Emacs invokes diff, the warning message comes back in German, and Emacs, not recognising this “unknown” message, fails.
Starting Emacs with LANG=C emacs works around this problem. However, I consider it a (quite silly) bug for Emacs to assume that the user's language is English.

Modify Alt+f in Emacs for tex-mode

Alt+f in Emacs, when writing in tex mode, does not seem to treat the . as part of the word. How do I modify the Alt+f behavior so that it stays exactly the same when moving forward, except that punctuation following a word is included as part of that word?
I have a separate file that loads for when writing in tex so I will just throw it in there so it doesn't affect normal emacs behavior.
Thanks for any help.
I thought of an addition to this, a related problem: when deleting with Alt+d, I would like it to delete not only the word but also the punctuation following it (e.g. , . ! etc.).
The following code should work for you:
(defun unpunctuate-syntax (str)
  "Make the characters of the given string word characters."
  (let ((st (copy-syntax-table (syntax-table))))
    (dotimes (n (length str))
      (modify-syntax-entry (elt str n) "w" st))
    (set-syntax-table st)))
(defun dots-are-not-punctuation ()
  (unpunctuate-syntax "."))
(add-hook 'TeX-mode-hook 'dots-are-not-punctuation)
The way M-f (the forward-word function) works is that it skips all characters in the buffer that have type "w" (ie word) in the current syntax table.
This code makes a modified syntax table and gives it to the buffer and the add-hook bit at the bottom sets it to run when you open a file in TeX-mode. (This method avoids you having to do the separate file thing you described).
You might notice that I make a copy of the syntax table rather than editing the one belonging to the TeX major mode. This is because I always get things wrong when playing with syntax tables and you can mess things up royally... This method means you just have to close the buffer and start again!
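Since M-d (kill-word) finds the end of the word with the same forward-word motion, the syntax-table change should cover the deletion case from the question as well. The sketch below tries this out in a temporary buffer; my-delete-word-with-dot is a hypothetical helper, and it uses delete-region rather than kill-word only so that the kill ring is left alone.

```elisp
(defun my-delete-word-with-dot (text)
  "Return TEXT with its first word deleted, treating `.' as a word character.
Runs in a temporary buffer, so no real buffer or syntax table is modified."
  (with-temp-buffer
    ;; Work on a copy so the standard syntax table stays untouched.
    (let ((st (copy-syntax-table (syntax-table))))
      (modify-syntax-entry ?. "w" st)
      (set-syntax-table st))
    (insert text)
    (goto-char (point-min))
    ;; Same motion that `kill-word' relies on.
    (delete-region (point) (progn (forward-word 1) (point)))
    (buffer-string)))

;; Example call: the trailing dot goes away together with "end".
(my-delete-word-with-dot "end. Next")
```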

Stack overflow while generating tags completion table in emacs

I'm using GNU Emacs 23.3 on Windows. I work in a very large codebase for which I generate a TAGS file (using the etags binary supplied with Emacs). The TAGS file is quite large (usually hovers around 100MB). I rarely need to use any functionality beyond find-tag, but there are times when I wish I could do completion out of the TAGS table.
Calling complete-tag causes Emacs to make a completion table automatically. The process takes quite a bit of time, but my problem isn't in the amount of time it takes, but rather the fact that right at the end (around 100% completion), I get a stack overflow (sorry about the unprintable chars):
Debugger entered--Lisp error: (error "Stack overflow in regexp matcher")
re-search-forward("^\\(\\([^\177]+[^-a-zA-Z0-9_+*$:\177]+\\)?\\([-a-zA-Z0-9_+*$?:]+\\)[^-a-zA-Z0-9_+*$?:\177]*\\)\177\\(\\([^\n\001]+\\)\001\\)?\\([0-9]+\\)?,\\([0-9]+\\)?\n" nil t)
etags-tags-completion-table()
byte-code(...)
tags-completion-table()
Has anyone else run into this? Know of a way to work around it?
EDIT: Stack output after turning on debug-on-error
EDIT: Removed stack, since I now know what the failing entries look like:
^L
c:\path\to\some\header.h,0
^L
c:\path\to\some\otherheader.h,0
My tags file contains quite a few entries in this format. Looking at the headers involved, it's clear that they couldn't be correctly parsed by etags. This is fine, but I'm surprised that tags-completion-table doesn't account for this format in its regex. For reference, here's what a real entry looks like:
^L
c:\path\to\some\validheader.h,115
class CSomeClass ^?12,345
bool SomeMethod(^?CSomeClass::SomeMethod^A67,890
The regexp in question is used to match a tag entry inside the TAGS file. I guess that the error can occur if the file is incorrectly formatted (e.g. using non-native line-endings), or if an entry simply is really, really large. (An entry is typically a line or two, which should not be a problem for the regexp matcher.)
One way of tracking down the problem is to go to the TAGS buffer and see where point (the cursor) is after the error has occurred. Once you know which function it is, and if you can live without tags for it, you could simply avoid generating TAGS entries for it.
If the problem is due to an overly complex entry, I suggest you send a bug report to the Emacs team.
If you load the tags table (open the TAGS table with Emacs, then bury-buffer), try M-x dabbrev-expand (bound to M-/). If the present prefix is very common, you might end up running through many possible completions before reaching the desired one.
I don't use Windows, but on the Mac and Linux machines I use, I have not faced this issue.
This looks like a bug in Emacs, see:
https://groups.google.com/d/msg/gnu.emacs.help/Ew0sTxk0C-g/YsTPVEKTBAAJ
https://debbugs.gnu.org/db/20/20703.html
I have applied the suggested patch to etags-tags-completion-table (copied below in full for your convenience) and trapped an error case.
I'm triggering the error in an extremely long line of code (46,000 characters!). I presume somebody programmatically generated the line and pasted it into the source. A workaround could be to simply filter such lines at the ctag building or loading stage, just something that deletes "long" lines, whatever that may mean. Probably 500 characters is long enough!
I could also look at adding maximum sizes to my regexes in ctags, but that really isn't a general solution because many ctags patterns do not have such limits.
(defun etags-tags-completion-table () ; Doc string?
  (let ((table (make-vector 511 0))
        (progress-reporter
         (make-progress-reporter
          (format "Making tags completion table for %s..." buffer-file-name)
          (point-min) (point-max))))
    (save-excursion
      (goto-char (point-min))
      ;; This monster regexp matches an etags tag line.
      ;; \1 is the string to match;
      ;; \2 is not interesting;
      ;; \3 is the guessed tag name; XXX guess should be better eg DEFUN
      ;; \4 is not interesting;
      ;; \5 is the explicitly-specified tag name.
      ;; \6 is the line to start searching at;
      ;; \7 is the char to start searching at.
      (condition-case err
          (while (re-search-forward
                  "^\\(\\([^\177]+[^-a-zA-Z0-9_+*$:\177]+\\)?\
\\([-a-zA-Z0-9_+*$?:]+\\)[^-a-zA-Z0-9_+*$?:\177]*\\)\177\
\\(\\([^\n\001]+\\)\001\\)?\\([0-9]+\\)?,\\([0-9]+\\)?\n"
                  nil t)
            (intern (prog1 (if (match-beginning 5)
                               ;; There is an explicit tag name.
                               (buffer-substring (match-beginning 5) (match-end 5))
                             ;; No explicit tag name. Best guess.
                             (buffer-substring (match-beginning 3) (match-end 3)))
                      (progress-reporter-update progress-reporter (point)))
                    table))
        (error
         (message "error happened near %d" (point))
         ;; Pass the message through "%s" so that a `%' in it is not
         ;; treated as a format directive.
         (error "%s" (error-message-string err)))))
    table))
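To locate the overlong lines mentioned above before deciding how to filter them, something like the following could be run in the TAGS buffer. It is a sketch only: my-find-long-lines is a hypothetical helper, and the 500-character threshold in the example call is just the guess from the text. It only reports lines and deletes nothing.

```elisp
(defun my-find-long-lines (max-length)
  "Return a list of (LINE-NUMBER . LENGTH) for overlong lines.
A line counts as overlong when it has more than MAX-LENGTH characters.
Operates on the current buffer and does not modify it."
  (let ((result '())
        (line 1))
    (save-excursion
      (goto-char (point-min))
      (while (not (eobp))
        (let ((len (- (line-end-position) (line-beginning-position))))
          (when (> len max-length)
            (push (cons line len) result)))
        (forward-line 1)
        (setq line (1+ line))))
    (nreverse result)))

;; Example call, e.g. with the TAGS buffer current:
;; (my-find-long-lines 500)
```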

elisp compile, add a regexp to error detection

I am starting with emacs, and don't know much elisp. Nearly nothing, really.
I want to use ack as a replacement of grep.
These are the instructions I followed to use ack from within emacs:
http://www.rooijan.za.net/?q=ack_el
Now I don't like the output format that is used in this el file, I would like the output to be that of ack --group.
So I changed:
(read-string "Ack arguments: " "-i" nil "-i" nil)
to:
(read-string "Ack arguments: " "-i --group" nil "-i --group" nil)
So far so good.
But this made me lose the ability to click or press Enter on the rows of the output buffer. In the original behaviour, compilation mode was used so that I could jump to the selected line.
I figured I should add a regexp to the ack-mode. The ack-mode is defined like this:
(define-compilation-mode ack-mode "Ack"
  "Specialization of compilation-mode for use with ack."
  nil)
and I want to add the regexp [0-9]+: to be detected as an error too, since it is what every row of the output buffer includes (the line number).
I've tried to modify the define-compilation-mode above to add the regexp, but I failed miserably.
How can I make the output buffer of ack let me click on its rows?
--- EDIT, I tried also: ---
(defvar ack-regexp-alist
  '(("[0-9]+:"
     2 3))
  "Alist that specifies how to match rows in ack output.")
(setq compilation-error-regexp-alist
      (append compilation-error-regexp-alist
              ack-regexp-alist))
I stole that somewhere and tried to adapt to my needs. No luck.
--- EDIT, result after Ivan's proposal ---
With ack.el updated to include:
(defvar ack-regexp-alist
  '(("^[0-9]+:" ;; match the line number
     nil        ;; the file is not found on this line, so assume that it's the same as before
     0          ;; The line is the 0'th subexpression (the whole thing)
     )
    ("^[^: ]+$" ;; match a file -- this could be better
     0          ;; The file is the 0'th subexpression
     ))
  "Alist that specifies how to match rows in ack output.")
(setq compilation-error-regexp-alist
      (append compilation-error-regexp-alist
              ack-regexp-alist))
(define-compilation-mode ack-mode "Ack"
  "Specialization of compilation-mode for use with ack."
  nil)
Then checking the compilation-error-regexp-alist variable, I get the value:
(absoft ada aix ant bash borland caml comma edg-1 edg-2 epc ftnchek iar ibm irix java jikes-file jikes-line gnu gcc-include lcc makepp mips-1 mips-2 msft oracle perl rxp sparc-pascal-file sparc-pascal-line sparc-pascal-example sun sun-ada 4bsd gcov-file gcov-header gcov-nomark gcov-called-line gcov-never-called
("^[0-9]+:" nil 0)
("^[^: ]+$" 0))
The format of the variable looks very strange to me. Is that right? I don't know elisp (yet), so maybe it's correct that way.
Still no links or color in the *ack* buffer.
There is another package, full-ack, up on ELPA, which I have used before and which handles --group output.
That said, reading the documentation for compilation-error-regexp-alist you see that it has the form:
(REGEXP FILE [LINE COLUMN TYPE HYPERLINK HIGHLIGHT...])
In the case of --group output, you have to match file and line separately, so I think you want something like (untested)
(defvar ack-regexp-alist
  '(("^\\S +$"  ;; match a file -- this could be better
     0          ;; The file is the 0'th subexpression (the whole thing)
     )
    ("^[0-9]+:" ;; match the line number
     nil        ;; the file is not found on this line, so assume that it's the same as before
     0          ;; The line is the 0'th subexpression (the whole thing)
     ))
  "Alist that specifies how to match rows in ack output.")
-- Updated --
The variable compilation-error-regexp-alist is a list of symbols or elements like (REGEXP ...). Symbols are looked up in compilation-error-regexp-alist-alist to find the corresponding elements. So yes, it is a little weird, but it's easier to see what's turned on and off without having to look at ugly regexes and guess what they do. If you were going to distribute this I would suggest adding the regex to compilation-error-regexp-alist-alist and then turning it on in compilation-error-regexp-alist, but that is somewhat moot until you get it to work correctly.
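Registering the entries under symbols, as suggested, might look roughly like this. It is a sketch and, like the regexps above, untested against real ack --group output. The symbols my-ack-group-file and my-ack-group-line and the variable my-ack-regexp-alist are hypothetical names, and the regexps use an explicit group 1 rather than the whole match.

```elisp
(require 'compile)

;; A line holding only a file name (assumed: no colon, no space).
;; Entry layout: (SYMBOL REGEXP FILE LINE COLUMN TYPE); TYPE 0 means "info".
(add-to-list 'compilation-error-regexp-alist-alist
             '(my-ack-group-file "^\\([^: \n]+\\)$" 1 nil nil 0))

;; A match line such as "12:some text".  FILE is nil, so the file name
;; is taken from the previous message in the buffer.
(add-to-list 'compilation-error-regexp-alist-alist
             '(my-ack-group-line "^\\([0-9]+\\):" nil 1))

(defvar my-ack-regexp-alist '(my-ack-group-file my-ack-group-line)
  "Symbols naming the (hypothetical) ack entries registered above.")
```

A mode or command would then bind compilation-error-regexp-alist to my-ack-regexp-alist so that only these two entries are active.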
Looking more closely at ack.el, I notice that it uses
(let (compile-command
      (compilation-error-regexp-alist grep-regexp-alist)
      ...)
  ...
  )
In other words it locally overwrites compilation-error-regexp-alist with grep-regexp-alist, so you need to add the regexes there instead. Or even better might be to replace it with
(let (compile-command
      (compilation-error-regexp-alist ack-regexp-alist)
      ...)
  ...
  )
In the end I still recommend full-ack since the filename regex does not seem to be working correctly. It seems more complete (though more complicated), and I have been happy with it.

How can I set the encoding of shell-command-on-region output?

I have a small elisp script which applies Perl::Tidy on region or whole file. For reference, here's the script (borrowed from EmacsWiki):
(defun perltidy-command (start end)
  "The perltidy command we pass markers to."
  (shell-command-on-region start
                           end
                           "perltidy"
                           t
                           t
                           (get-buffer-create "*Perltidy Output*")))
(defun perltidy-dwim (arg)
  "Perltidy a region or the entire buffer."
  (interactive "P")
  (let ((point (point)) (start) (end))
    (if (and mark-active transient-mark-mode)
        (setq start (region-beginning)
              end (region-end))
      (setq start (point-min)
            end (point-max)))
    (perltidy-command start end)
    (goto-char point)))
(global-set-key "\C-ct" 'perltidy-dwim)
I'm using current Emacs 23.1 for Windows (EmacsW32). The problem I'm having is that if I apply that script on a UTF-8 coded file ("U(Unix)" in the status bar) the output comes back Latin-1 coded, i.e. two or more characters for each non-ASCII source character.
Is there any way I can fix that?
EDIT: The problem seems to be solved by using (set-terminal-coding-system 'utf-8-unix) in my init.el. If anyone has other solutions, go ahead and write them!
The following is from the documentation of shell-command-on-region:
To specify a coding system for converting non-ASCII characters
in the input and output to the shell command, use C-x RET c
before this command. By default, the input (from the current buffer)
is encoded using coding-system specified by `process-coding-system-alist',
falling back to `default-process-coding-system' if no match for COMMAND
is found in `process-coding-system-alist'.
While executing, it first looks for a coding system in process-coding-system-alist; if nothing matches there, it falls back to default-process-coding-system.
If you want to change the encoding, you can add your conversion option to process-coding-system-alist. Its contents look like this:
Value: (("\\.dz\\'" no-conversion . no-conversion)
...
("\\.elc\\'" . utf-8-emacs)
("\\.utf\\(-8\\)?\\'" . utf-8)
("\\.xml\\'" . xml-find-file-coding-system)
...
("" undecided))
Or, if process-coding-system-alist is not set (it is nil), you can assign your encoding option to default-process-coding-system, for example:
(setq default-process-coding-system '(utf-8 . utf-8))
(Input to the process is encoded as utf-8, and its output is decoded as utf-8.)
Or
(setq default-process-coding-system '(undecided-unix . iso-latin-1-unix))
I also wrote a post about this if you want details.
Quoting the documentation for shell-command-on-region (C-h f shell-command-on-region RET):
To specify a coding system for converting non-ASCII characters
in the input and output to the shell command, use C-x RET c
before this command. By default, the input (from the current buffer)
is encoded in the same coding system that will be used to save the file,
`buffer-file-coding-system'. If the output is going to replace the region,
then it is decoded from that same coding system.
The noninteractive arguments are START, END, COMMAND,
OUTPUT-BUFFER, REPLACE, ERROR-BUFFER, and DISPLAY-ERROR-BUFFER.
Noninteractive callers can specify coding systems by binding
`coding-system-for-read' and `coding-system-for-write'.
In other words, you'd do something like
(let ((coding-system-for-read 'utf-8-unix))
  (shell-command-on-region ...))
This is untested; I'm not sure what the value of coding-system-for-read (or perhaps -write instead? or as well?) should be in your case. I guess you could also utilize the OUTPUT-BUFFER argument and direct the output to a buffer whose coding system is set to what you need it to be.
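Filled in, the binding approach might look like the sketch below, which binds both variables, since whether -read, -write or both is needed is left open above. The function name my-filter-region-utf8 is hypothetical, the choice of utf-8-unix assumes the buffer really is UTF-8 with Unix line endings, and it has not been tried on Windows.

```elisp
(defun my-filter-region-utf8 (start end command)
  "Replace the text between START and END with the output of COMMAND.
Both the text sent to COMMAND and the output read back are treated
as UTF-8, regardless of the default process coding systems."
  (let ((coding-system-for-read 'utf-8-unix)    ; decoding COMMAND's output
        (coding-system-for-write 'utf-8-unix))  ; encoding the region sent
    (shell-command-on-region start end command
                             t   ; OUTPUT-BUFFER: the current buffer
                             t   ; REPLACE the region with the output
                             (get-buffer-create "*Perltidy Output*"))))

;; Possible use inside perltidy-command from the question:
;; (my-filter-region-utf8 start end "perltidy")
```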
Another option might be to wiggle the locale in the perltidy invocation, but again, without more information about what you are using now, and no means to experiment on a system similar to yours, I can only hint.