Stack overflow while generating tags completion table in emacs - emacs

I'm using GNU Emacs 23.3 on Windows. I work in a very large codebase for which I generate a TAGS file (using the etags binary supplied with Emacs). The TAGS file is quite large (usually hovers around 100MB). I rarely need to use any functionality beyond find-tag, but there are times when I wish I could do completion out of the TAGS table.
Calling complete-tag causes Emacs to make a completion table automatically. The process takes quite a bit of time, but my problem isn't in the amount of time it takes, but rather the fact that right at the end (around 100% completion), I get a stack overflow (sorry about the unprintable chars):
Debugger entered--Lisp error: (error "Stack overflow in regexp matcher")
re-search-forward("^\\(\\([^]+[^-a-zA-Z0-9_+*$:]+\\)?\\([-a-zA-Z0-9_+*$?:]+\\)[^-a-zA-Z0-9_+*$?:]*\\)\\(\\([^\n]+\\)\\)?\\([0-9]+\\)?,\\([0-9]+\\)?\n" nil t)
etags-tags-completion-table()
byte-code(...)
tags-completion-table()
Has anyone else run into this? Know of a way to work around it?
EDIT: Stack output after turning on debug-on-error
EDIT: Removed stack, since I now know what the failing entries look like:
^L
c:\path\to\some\header.h,0
^L
c:\path\to\some\otherheader.h,0
My tags file contains quite a few entries in this format. Looking at the headers involved, it's clear that they couldn't be correctly parsed by etags. This is fine, but I'm surprised that tags-completion-table doesn't account for this format in its regex. For reference, here's what a real entry looks like:
^L
c:\path\to\some\validheader.h,115
class CSomeClass ^?12,345
bool SomeMethod(^?CSomeClass::SomeMethod^A67,890

The regexp in question is used to match a tag entry inside the TAGS file. I guess that the error can occur if the file is incorrectly formatted (e.g. using non-native line-endings), or if an entry simply is really, really large. (An entry is typically a line or two, which should not be a problem for the regexp matcher.)
One way of tracking down the problem is go to the TAGS buffer and see where the point (cursor) is, after the error has occurred. Once you know which function it is, and you could live without tags for it, you could simply avoid generating TAGS entries for it.
If the problem is due to too complex entry, I would suggest that you should send bug report to the Emacs team.

If you load the tags table (open the TAGS table with Emacs, then bury-buffer), try M-x dabbrev-expand (bound to M-/). If the present prefix is very common, you might end up running through many possible completions before reaching the desired one.
I don't use Windows, but on the Mac and Linux machines I use, I have not faced this issue.

This looks like a bug in Emacs, see:
https://groups.google.com/d/msg/gnu.emacs.help/Ew0sTxk0C-g/YsTPVEKTBAAJ
https://debbugs.gnu.org/db/20/20703.html
I have applied the suggested patch to etags-tags-completion-table (copied below in completeness for your convenience) and trapped an error case.
I'm triggering the error in an extremely long line of code (46,000 characters!). I presume somebody programmatically generated the line and pasted it into the source. A workaround could be to simply filter such lines at the ctag building or loading stage, just something that deletes "long" lines, whatever that may mean. Probably 500 characters is long enough!
I could also look at adding maximum sizes to my regexes in ctags, but that really isn't a general solution because many ctags patterns do not have such limits.
(defun etags-tags-completion-table () ; Doc string?
(let ((table (make-vector 511 0))
(progress-reporter
(make-progress-reporter
(format "Making tags completion table for %s..." buffer-file-name)
(point-min) (point-max))))
(save-excursion
(goto-char (point-min))
;; This monster regexp matches an etags tag line.
;; \1 is the string to match;
;; \2 is not interesting;
;; \3 is the guessed tag name; XXX guess should be better eg DEFUN
;; \4 is not interesting;
;; \5 is the explicitly-specified tag name.
;; \6 is the line to start searching at;
;; \7 is the char to start searching at.
(condition-case err
(while (re-search-forward
"^\\(\\([^\177]+[^-a-zA-Z0-9_+*$:\177]+\\)?\
\\([-a-zA-Z0-9_+*$?:]+\\)[^-a-zA-Z0-9_+*$?:\177]*\\)\177\
\\(\\([^\n\001]+\\)\001\\)?\\([0-9]+\\)?,\\([0-9]+\\)?\n"
nil t)
(intern (prog1 (if (match-beginning 5)
;; There is an explicit tag name.
(buffer-substring (match-beginning 5) (match-end 5))
;; No explicit tag name. Best guess.
(buffer-substring (match-beginning 3) (match-end 3)))
(progress-reporter-update progress-reporter (point)))
table))
(error
(message "error happened near %d" (point))
(error (error-message-string err)))))
table))

Related

How do I font lock dollar signs (math mode delimiters) in AUCTeX buffer only outside comments?

I wrote the following code to highlight dollar signs in AUCTeX buffers in different colors, but then I found that it's even highlighting dollar signs in comments, which was unintended, but I am starting to like it. But now just for curiosity, I wonder if that can be avoided.
(defun my-LaTeX-mode-dollars ()
(font-lock-add-keywords
nil
`((,(rx "$") (0 'success t)))
t))
(add-hook 'LaTeX-mode-hook 'my-LaTeX-mode-dollars)
From the documentation of font-lock-keywords:
MATCH-HIGHLIGHT should be of the form:
(SUBEXP FACENAME [OVERRIDE [LAXMATCH]])
OVERRIDE and LAXMATCH are flags. If OVERRIDE is t, existing
fontification can be overwritten. If keep', only parts not already
fontified are highlighted. Ifprepend' or `append', existing
fontification is merged with the new, in which the new or existing
fontification, respectively, takes precedence.
In other words, if you drop the t after 'success, it will no longer fontify dollar signs in comments and strings.
EDIT:
Apparently, the above solution is not sufficient in this situation, probably because dollar signs have been colored using another face earlier.
One way that might work is to not pass the HOW parameter (currently t) to font-lock-add-keywords. This means that they should be added to the end of the list. However, this might cause other things to stop working.
If we need a bigger hammer, you can write a bit more advanced rule that inspects the current fontification, and decides what to do upon this. For example, the following is used by Emacs to add a warning face to parentheses placed at column 0 in strings:
"^\\s("
(0
(if
(memq
(get-text-property
(match-beginning 0)
'face)
'(font-lock-string-face font-lock-doc-face font-lock-comment-face))
(list 'face font-lock-warning-face 'help-echo "Looks like a toplevel defun: escape the parenthesis"))
prepend)
A third way to do this is to replace the regexp (rx "$") with the name of function that could search for $ and check that it appears in the correct context. One example of such font-lock rules can be found in the standard Emacs package cwarn.

Replace file at point in emacs

I found the function here for replace filen at point but it doesn't seem to work properly: http://www.emacswiki.org/emacs/InsertFileName. It correctly finds the file-at-point but the replace appears to take the file you input and not the file detected originally from what I can see. I'm not sure how to fix this.
If I Run the function on /home/testfile
First it says file to replace so for example: /home/secondfile
Then it says replace '/home/secondfile' with: /home/secondfile
and then it says: No file at point
Any ideas??
Here is the function:
(autoload 'ffap-guesser "ffap")
(autoload 'ffap-read-file-or-url "ffap")
(defun my-replace-file-at-point (currfile newfile)
"Replace CURRFILE at point with NEWFILE.
When interactive, CURRFILE will need to be confirmed by user
and will need to exist on the file system to be recognized,
unless it is a URL.
NEWFILE does not need to exist. However, Emacs's minibuffer
completion can help if it needs to be.
"
(interactive
(let ((currfile (ffap-read-file-or-url "Replace filename: "
(ffap-guesser))))
(list currfile
(ffap-read-file-or-url (format "Replace `%s' with: "
currfile) currfile))))
(save-match-data
(if (or (looking-at (regexp-quote currfile))
(let ((filelen (length currfile))
(opoint (point))
(limit (+ (point) (length currfile))))
(save-excursion
(goto-char (1- filelen))
(and (search-forward currfile limit
'noerror)
(< (match-beginning 0) opoint))
(>= (match-end 0) opoint))))
(replace-match newfile)
(error "No file at point to replace"))))
There are probably a few things wrong/going on here. The first is, your point position when you are executing this. The second is, if you are using /home/user/something, that there is a strong possibility you will have mismatch between /home/user/something and ~/something (ffap returns the latter while at the point you may have written the former).
First:
The use of looking-at with the regexp quoted filename expects the point to be at the beginning: e.g. |/home/user/something.
Its partner, looking-back expects /home/user/something|. Being somewhere in the middle will throw this error.
One quick fix for this is changing looking-at to thing-at-point-looking-at.
Second:
If you have written /home/user/something, ffap functions (in my case) shorten this using ~.
There are probably some settings that govern this, but the easiest, quick fix I know of is using expand-file-name. This will take care of the first case, and if it is written as ~/something, the save-excursion body will replace it in the alternate case.
The only negative result I see is that, you might sometimes replace:
/home/user/something with ~/somethingelse
But, anyways, these two quick fixes just result in this complete change:
(thing-at-point-looking-at (regexp-quote (expand-file-name currfile)))
can't see where "ffap-guesser" is defined. Looks like a bug.
Maybe try instead
"find-file-at-point"

fix an auto-complete-mode and linum-mode annoyance

I'm using auto-complete-mode which I think is totally fantastic. I'm also a big fan of linum-mode but I've got a very irritating issue when the two are used together, especially when I'm working in a new buffer (or a buffer with very few lines).
Basically the buffer is 'x' lines long but when auto-complete kicks in it "adds" lines to the buffer, so linum-mode keeps switching, for example, between displaying line numbers on one column or two columns, depending on whether auto-complete is suggesting a completion or not.
So you type a sentence and you see your buffer's content frantically shifting from left to right at every keypress. It is really annoying.
I take it the solution involves configuring the linum-format variable but I don't know how.
Ideally it would be great if my linum-format was:
dynamic
right-aligned
considering there are 'y' more lines to the buffer than what the buffer actually has
My rationale being that auto-complete shall not suggest more than 'y' suggestion and that, hence, the two shall start playing nicely together.
For example, if 'y' is set to 20 and my buffer has 75 lines, then linum should use two columns: because no matter where I am auto-complete shall not make the buffer 'bigger' than 99 lines.
On the contrary, if 'y' is still set to 20 and my buffer has 95 lines, then linum should use three columns because otherwise if I'm near the end of the buffer and auto-complete kicks in my buffer shall start "wobbling" left and right when I type.
I'd rather not hardcode "3 columns wide" for linum.
I guess using "dynamic but always at least two columns" would somehow fix most annoyances but still something as I described would be great.
P.S: I realize that my 'fix' would imply that linum would always display on at least two columns, and I'm fine with that... As long as it stays right-aligned and use 2, 3 or 4 columns depending on the need.
Simply put the following line in .emacs which resolves this issue. It is in auto-complete.el.
(ac-linum-workaround)
I've written a couple of previous answers on modifying the linum-mode output, which you could probably adapt to your purposes.
Relative Line Numbers In Emacs
Colorize current line number
Edit: Here's the most basic version of that code (also on EmacsWiki, albeit somewhat buried), which doesn't modify the default output at all, but uses the techniques from those other answers to be more efficient than the default code. That's probably a more useful starting point for you.
(defvar my-linum-format-string "%4d")
(add-hook 'linum-before-numbering-hook 'my-linum-get-format-string)
(defun my-linum-get-format-string ()
(let* ((width (length (number-to-string
(count-lines (point-min) (point-max)))))
(format (concat "%" (number-to-string width) "d")))
(setq my-linum-format-string format)))
(setq linum-format 'my-linum-format)
(defun my-linum-format (line-number)
(propertize (format my-linum-format-string line-number) 'face 'linum))
Just have the same problem, after seeing 'patching the source' I believe it could be done with advice. Here is what I come up with
(defadvice linum-update
(around tung/suppress-linum-update-when-popup activate)
(unless (ac-menu-live-p)
ad-do-it))
I would like to use popup-live-p as mentioned but unfortunately it requires the variable for the popup, which we couldn't know in advance.
Update:
I ended up patching the source for linum.el. I added an extra hook that runs before updates.
Here's the patched file: linum.el (github)
Here's the code I have in my init.el:
;; Load custom linum.
(load-file "~/.emacs.d/linum.el")
;; Suppress line number updates while auto-complete window
;; is displayed.
(add-hook 'linum-before-update-hook
'(lambda ()
(when auto-complete-mode
(if (ac-menu-live-p)
(setq linum-suppress-updates t)
(setq linum-suppress-updates nil)))))
Hope it helps!

Modify Alt+f in Emacs for tex-mode

Alt+f in emacs when writing in tex mode seems to not include the . as part of the word. So how do I modify the alt+f behavior to remain the same exact when going forward if there is punctiation to include that as part of the word.
I have a separate file that loads for when writing in tex so I will just throw it in there so it doesn't affect normal emacs behavior.
Thanks for any help.
Thought of an addition to this but same related problem is when using Alt+d and deleting. Getting it to delete not only the word but also the punctation following eg.. (,.! etc..).
The following code should work for you:
(defun unpunctuate-syntax (str)
"Make the characters of the given string word characters."
(let ((st (copy-syntax-table (syntax-table))))
(dotimes (n (length str))
(modify-syntax-entry (elt str n) "w" st))
(set-syntax-table st)))
(defun dots-are-not-punctuation ()
(unpunctuate-syntax "."))
(add-hook 'TeX-mode-hook 'dots-are-not-punctuation)
The way M-f (the forward-word function) works is that it skips all characters in the buffer that have type "w" (ie word) in the current syntax table.
This code makes a modified syntax table and gives it to the buffer and the add-hook bit at the bottom sets it to run when you open a file in TeX-mode. (This method avoids you having to do the separate file thing you described).
You might notice that I make a copy of the syntax table rather than editing the one belonging to the TeX major mode. This is because I always get things wrong when playing with syntax tables and you can mess things up royally... This method means you just have to close the buffer and start again!

How do I run an Emacs hook when a buffer is modified?

Building on Getting Emacs to untabify when saving certain file types (and only those file types) , I'd like to run a hook to untabify my C++ files when I start modifying the buffer. I tried adding hooks to untabify the buffer on load, but then it untabifies all my writable files that are autoloaded when emacs starts.
(For those that wonder why I'm doing this, it's because where I work enforces the use of tabs in files, which I'm happy to comply with. The problem is that I mark up my files to tell me when lines are too long, but the regexp matches the number of characters in the line, not how much space the line takes up. 4 tabs in a line can push it far over my 132 character limit, but the line won't be marked appropriately. Thus, I need a way to tabify and untabify automatically.)
Take a look at the variable "before-change-functions".
Perhaps something along this line (warning: code not tested):
(add-hook 'before-change-functions
(lambda (&rest args)
(if (not (buffer-modified-p))
(untabify (point-min) (point-max)))))
Here is what I added to my emacs file to untabify on load:
(defun untabify-buffer ()
"Untabify current buffer"
(interactive)
(untabify (point-min) (point-max)))
(defun untabify-hook ()
(untabify-buffer))
; Add the untabify hook to any modes you want untabified on load
(add-hook 'nxml-mode-hook 'untabify-hook)
This answer is tangential, but may be of use.
The package wide-column.el link text changes the cursor color when the cursor is past a given column - and actually the cursor colors can vary depending on the settings. This sounds like a less intrusive a solution than your regular expression code, but it may not suit your needs.
And a different, tangential answer.
You mentioned that your regexp wasn't good enough to tell when the 132 character limit was met. Perhaps a better regexp...
This regexp will match a line when it has more than 132 characters, assuming a tabs width is 4. (I think I got the math right)
"^\\(?: \\|[^ \n]\\{4\\}\\)\\{33\\}\\(.+\\)$"
The last parenthesized expression is the set of characters that are over the limit. The first parenthesized expression is shy.