Modify Alt+f in Emacs for tex-mode - emacs

Alt+f in emacs when writing in tex mode seems to not include the . as part of the word. So how do I modify the alt+f behavior to remain the same exact when going forward if there is punctiation to include that as part of the word.
I have a separate file that loads for when writing in tex so I will just throw it in there so it doesn't affect normal emacs behavior.
Thanks for any help.
Thought of an addition to this but same related problem is when using Alt+d and deleting. Getting it to delete not only the word but also the punctation following eg.. (,.! etc..).

The following code should work for you:
(defun unpunctuate-syntax (str)
"Make the characters of the given string word characters."
(let ((st (copy-syntax-table (syntax-table))))
(dotimes (n (length str))
(modify-syntax-entry (elt str n) "w" st))
(set-syntax-table st)))
(defun dots-are-not-punctuation ()
(unpunctuate-syntax "."))
(add-hook 'TeX-mode-hook 'dots-are-not-punctuation)
The way M-f (the forward-word function) works is that it skips all characters in the buffer that have type "w" (ie word) in the current syntax table.
This code makes a modified syntax table and gives it to the buffer and the add-hook bit at the bottom sets it to run when you open a file in TeX-mode. (This method avoids you having to do the separate file thing you described).
You might notice that I make a copy of the syntax table rather than editing the one belonging to the TeX major mode. This is because I always get things wrong when playing with syntax tables and you can mess things up royally... This method means you just have to close the buffer and start again!

Related

How to disable underscore (_) subscripting in Emacs, TeX input method

On Emacs, while editing a text document of notes for myself (a .txt document, not a .tex document), I am using M-x set-input-method Ret TeX, in order to get easy access to various Unicode characters. So for example, typing \tospace causes a "→" to be inserted into the text, and typing x^2 causes "x2" to be inserted, because the font I am using has support for Unicode codepoints 0x2192 and 0x00B2, respectively.
One of the specially handled characters in the method is for the underscore key, _. However, the font I am using for Emacs does not appear to have support for the codepoints for the various subscript characters, such as subscript zero (codepoint 0x2080), and so when I type _0, I get something rendered as a thin blank in my output. I would prefer to just have the two characters _0 in this case.
I can get _0 by the awkward keystroke sequence _spacedel0, since the space keystroke in the middle of the sequence causes Emacs to abort the TeX input method. But this is awkward.
So, my question: How can I locally customize my Emacs to not remap the _ key at all when I am in the TeX input method? Or how can I create a modified clone (or extension, etc) of the TeX input method that leaves out underscore from its magic?
Things I have tried so far:
I have already done M-xdescribe-key on _; but it is just bound to self-insert-command, like many other text characters. I did see a post-self-insert-hook there, but I have not explored trying to use that to subvert the TeX input method.
Things I have not tried so far:
I have not tried learning anything about the input method architecture or its source code. From my quick purview of the code and methods. it did not seem like something I could quickly jump into.
So here is the solution I just found: Make a personalized copy of the TeX input method, with all of the undesirable entries removed. Then when using M-x set-input-method, select the personalized version instead of TeX.
I would have tried this earlier, but the built-in documentation for set-input-mode and its ilk does not provide sufficient guidance to the actual source for the input-methods for me to find it. It was only after doing another search on SO and finding this: Emacs: Can't activate input method that I was able to get enough information to do this on my own.
Details:
In Emacs, open /usr/share/emacs/22.1/leim/leim-list.el and find the entry for the input method you want to customize. The entry will be something like the following form:
(register-input-method
"TeX" "UTF-8" 'quail-use-package
"\\" "LaTeX-like input method for many characters."
"quail/latin-ltx")
Note the file name prefix referenced in the last element in the form above. Find the corresponding Elisp source file; in this case, it is a relative path to the file quail/latin-ltx.el[.gz]. Open that file in Emacs, and check it out; it should have the entries for the method remappings, both desired and undesired.
Make a user-local copy of that Elisp source file amongst your other Emacs customizations. Open that local copy in Emacs.
In your local copy, find the (quail-define-package ...) form in the file, and change the name of the package; I used FSK-TeX as my new name, like so:
(quail-define-package
"FSK-TeX" "UTF-8" "\\" t ;; <-- The first argument here is the important bit to change.
"LaTeX-like input method for many characters but not as many as you might think.
...)
Go through your local copy, and delete all the S-expressions for mappings that you don't want.
In your .emacs configuration file, register your customized input method, using a form analogous to the one you saw when you looked at leim-list.el in step 1:
(register-input-method
"FSK-TeX" "UTF-8" 'quail-use-package
"\\" "FSK-customized LaTeX-like input method for many characters."
"~/ConfigFiles/Elisp/leim/latin-ltx")
Restart Emacs and test your new input-method; in my case, by doing M-x set-input-method FSK-TeX, typing a_0, and confirming that a_0 shows up in the buffer.
So, there's at least one answer that is less awkward once you have it installed than some of the workarounds listed in the question (and as it turns out, are also officially documented in the Emacs 22 manual as a way to cut off input method processing).
However, I am not really happy with this solution, since I would prefer to inherit future changes to TeX mode, and just have my .emacs remove the undesirable entries on startup.
So I will wait to see if anyone else comes up with a better answer than this.
I did not test this myself, but this seems to be the exact thing you are looking for:
"How to disable underscore subscript in TeX mode in emacs" - source
Two solutions are given in this blogpot:
By the author of the blogpost: (setq font-lock-maximum-decoration nil) (from maximum)
Mentioned as comment:
(eval-after-load "tex-mode" '(fset 'tex-font-lock-subscript 'ignore))
The evil plugin for vim-like modal keybinding allows to map two subsequent presses of the _ key to the insertion of a single _ character:
(set-input-method 'TeX)
(define-key evil-insert-state-local-map (kbd "_ _")
(lambda () (interactive) (insert "_")))
(define-key evil-insert-state-local-map (kbd "^ ^")
(lambda () (interactive) (insert "^")))
When _ and then 1 is pressed, we get ₁ as before, but
when _ and then _ is pressed, we get _.
Analogous for ^.
As already explained in pnkfelix answer, it seems we have to make a personalized copy of the TeX input method. But here comes a lighter way to do that, without any file tweaking. Simply put the following in your .emacs :
(eval-after-load "quail/latin-ltx"
'(let ((pkg (copy-tree (quail-package "TeX"))))
(setcar pkg "MyTeX")
(assq-delete-all ?_ (nth 2 pkg))
(quail-add-package pkg)))
(set-input-method 'TeX)
(register-input-method "MyTeX" "UTF-8" 'quail-use-package "\\")
(set-input-method 'MyTeX)
The important part is the assq-delete-all line in the middle that remove all shortcut entries starting with _. It's a bit of a lisp hack but it seems to work. Since I'm also annoyed by the shortcuts starting with - and ^, I also use the following two lines to disable them :
(assq-delete-all ?- (nth 2 pkg))
(assq-delete-all ?^ (nth 2 pkg))
Note that afterwards you can M-x set-input-method at any time and indicate TeX or MyTeX to switch between the pristine TeX input method or the customized one.

Define a character as a word boundary

I've defined the \ character to behave as a word constituent in latex-mode, and I'm pretty happy with the results. The only thing bothering me is that a sequence like \alpha\beta gets treated as a single word (which is the expected behavior, of course).
Is there a way to make emacs interpret a specific character as a word "starter"? This way it would always be considered part of the word following it, but never part of the word preceding it.
For clarity, here's an example:
\alpha\beta
^ ^
1 2
If the point is at 1 and I press M-d, the string "\alpha" should be killed.
If the point is at 2 and I press M-<backspace>, the string "\beta" should be killed.
How can I achieve this?
Another thought:
Your requirement is very like what subword-mode provides for camelCase.
You can't customize subword-mode's behaviour -- the regexps are hard-coded -- but you could certainly copy that library and modify it for your purposes.
M-x find-library RET subword RET
That would presumably be a pretty robust solution.
Edit: updated from the comments, as suggested:
For the record, changing every instance of [[:upper:]] to [\\\\[:upper:]] in the functions subword-forward-internal and subword-backward-internal inside subword.el works great =) (as long as "\" is defined as "w" syntax).
Personally I would be more inclined to make a copy of the library than edit it directly, unless for the purpose of making the existing library a little more general-purpose, for which the simplest solution would seem to be to move those regexps into variables -- after which it would be trivial to have buffer-local modified versions for this kind of purpose.
Edit 2: As of Emacs 24.3 (currently a release candidate), subword-mode facilitates this with the new subword-forward-regexp and subword-backward-regexp variables (for simple modifications), and the subword-forward-function and subword-backward-function variables (for more complex modifications).
By making those regexp variables buffer-local in latex-mode with the desired values, you can just use subword-mode directly.
You should be able to implement this using syntax text properties:
M-: (info "(elisp) Syntax Properties") RET
Edit: Actually, I'm not sure if you can do precisely this?
The following (which is just experimentation) is close, but M-<backspace> at 2 will only delete "beta" and not the preceding "\".
I suppose you could remap backward-kill-word to a function which checked for that preceding "\" and killed that as well. Fairly hacky, but it would probably do the trick if there's not a cleaner solution.
I haven't played with this functionality before; perhaps someone else can clarify.
(modify-syntax-entry ?\\ "w")
(setq parse-sexp-lookup-properties t)
(setq syntax-propertize-function 'my-propertize-syntax)
(defun my-propertize-syntax (start end)
"Set custom syntax properties."
(save-excursion
(goto-char start)
(while (re-search-forward "\\w\\\\" end t)
(put-text-property
(1- (point)) (point) 'syntax-table (cons "." ?\\)))))

Stack overflow while generating tags completion table in emacs

I'm using GNU Emacs 23.3 on Windows. I work in a very large codebase for which I generate a TAGS file (using the etags binary supplied with Emacs). The TAGS file is quite large (usually hovers around 100MB). I rarely need to use any functionality beyond find-tag, but there are times when I wish I could do completion out of the TAGS table.
Calling complete-tag causes Emacs to make a completion table automatically. The process takes quite a bit of time, but my problem isn't in the amount of time it takes, but rather the fact that right at the end (around 100% completion), I get a stack overflow (sorry about the unprintable chars):
Debugger entered--Lisp error: (error "Stack overflow in regexp matcher")
re-search-forward("^\\(\\([^]+[^-a-zA-Z0-9_+*$:]+\\)?\\([-a-zA-Z0-9_+*$?:]+\\)[^-a-zA-Z0-9_+*$?:]*\\)\\(\\([^\n]+\\)\\)?\\([0-9]+\\)?,\\([0-9]+\\)?\n" nil t)
etags-tags-completion-table()
byte-code(...)
tags-completion-table()
Has anyone else run into this? Know of a way to work around it?
EDIT: Stack output after turning on debug-on-error
EDIT: Removed stack, since I now know what the failing entries look like:
^L
c:\path\to\some\header.h,0
^L
c:\path\to\some\otherheader.h,0
My tags file contains quite a few entries in this format. Looking at the headers involved, it's clear that they couldn't be correctly parsed by etags. This is fine, but I'm surprised that tags-completion-table doesn't account for this format in its regex. For reference, here's what a real entry looks like:
^L
c:\path\to\some\validheader.h,115
class CSomeClass ^?12,345
bool SomeMethod(^?CSomeClass::SomeMethod^A67,890
The regexp in question is used to match a tag entry inside the TAGS file. I guess that the error can occur if the file is incorrectly formatted (e.g. using non-native line-endings), or if an entry simply is really, really large. (An entry is typically a line or two, which should not be a problem for the regexp matcher.)
One way of tracking down the problem is go to the TAGS buffer and see where the point (cursor) is, after the error has occurred. Once you know which function it is, and you could live without tags for it, you could simply avoid generating TAGS entries for it.
If the problem is due to too complex entry, I would suggest that you should send bug report to the Emacs team.
If you load the tags table (open the TAGS table with Emacs, then bury-buffer), try M-x dabbrev-expand (bound to M-/). If the present prefix is very common, you might end up running through many possible completions before reaching the desired one.
I don't use Windows, but on the Mac and Linux machines I use, I have not faced this issue.
This looks like a bug in Emacs, see:
https://groups.google.com/d/msg/gnu.emacs.help/Ew0sTxk0C-g/YsTPVEKTBAAJ
https://debbugs.gnu.org/db/20/20703.html
I have applied the suggested patch to etags-tags-completion-table (copied below in completeness for your convenience) and trapped an error case.
I'm triggering the error in an extremely long line of code (46,000 characters!). I presume somebody programmatically generated the line and pasted it into the source. A workaround could be to simply filter such lines at the ctag building or loading stage, just something that deletes "long" lines, whatever that may mean. Probably 500 characters is long enough!
I could also look at adding maximum sizes to my regexes in ctags, but that really isn't a general solution because many ctags patterns do not have such limits.
(defun etags-tags-completion-table () ; Doc string?
(let ((table (make-vector 511 0))
(progress-reporter
(make-progress-reporter
(format "Making tags completion table for %s..." buffer-file-name)
(point-min) (point-max))))
(save-excursion
(goto-char (point-min))
;; This monster regexp matches an etags tag line.
;; \1 is the string to match;
;; \2 is not interesting;
;; \3 is the guessed tag name; XXX guess should be better eg DEFUN
;; \4 is not interesting;
;; \5 is the explicitly-specified tag name.
;; \6 is the line to start searching at;
;; \7 is the char to start searching at.
(condition-case err
(while (re-search-forward
"^\\(\\([^\177]+[^-a-zA-Z0-9_+*$:\177]+\\)?\
\\([-a-zA-Z0-9_+*$?:]+\\)[^-a-zA-Z0-9_+*$?:\177]*\\)\177\
\\(\\([^\n\001]+\\)\001\\)?\\([0-9]+\\)?,\\([0-9]+\\)?\n"
nil t)
(intern (prog1 (if (match-beginning 5)
;; There is an explicit tag name.
(buffer-substring (match-beginning 5) (match-end 5))
;; No explicit tag name. Best guess.
(buffer-substring (match-beginning 3) (match-end 3)))
(progress-reporter-update progress-reporter (point)))
table))
(error
(message "error happened near %d" (point))
(error (error-message-string err)))))
table))

Emacs C-mode indent problem with Doxygen style comment

I am having a problem with doxygen style multi-line comments with emacs indent feature in c-mode. According to doxygen manual (http://www.doxygen.nl/manual/docblocks.html) the form below is accepted.
/********************************************//**
* ... text
***********************************************/
I am trying to use this format in emacs but when I tab in on the line '* ... text' the * ends up below the /** at the end of the first line like so:
/********************************************//**
* ... text
***********************************************/
Any suggestions on how to fix this? Still learning all the in-and-outs of emacs.
The reason it is indenting as such is that (by default) multi-line comments are lined up with the start of the comment on the previous line. In this case, the start of the containing comment is in column 47.
Now, how to fix it. Here's how I figured out how to fix it, the solution is at the end.
First, there's the cc-mode manual, specifically the section on customizing indentation. A useful key is C-c C-s which tells you which syntax is being used for indentation. In this case it is ((c 61)) - the c is the important part for now.
To customize it interactively, you can type C-c C-o (when the point is on the line whose indentation you want to fix). You'll be prompted for which syntax entry you want to customize (defaults to c in this case b/c that's the current syntax), then you'll be prompted for what you want to change the syntax entry to (default is c-lineup-C-comments).
Now we can look at that function to see how we might customize it to meet your needs. M-x find-function c-lineup-C-comments.
That's where it gets more difficult. You can customize the way cc-mode handles comment indentation, but what it looks like you want it to do (in this case) is to recognize that the c-comment you're in is immediately preceded by another c-comment, and that comment is the one you want to align indentation to.
How do you do that? The easiest way I can think of is to advise 'c-lineup-C-comments to recognize this special case and change the value of its first argument to be what you want. My limited testing shows this works for your example:
(defadvice c-lineup-C-comments (before c-lineup-C-comments-handle-doxygen activate)
(let ((langelm (ad-get-arg 0)))
(save-excursion
(save-match-data
(goto-char (1+ (c-langelem-pos langelem)))
(if (progn
(beginning-of-line)
;; only when the langelm is of form (c . something)
;; and we're at a doxygen comment line
(and (eq 'c (car langelm))
(looking-at "^\\(\\s-*\\)/\\*+//\\*\\*$")))
;; set the goal position to what we want
(ad-set-arg 0 (cons 'c (match-end 1))))))))
The end result of this advice should be that the argument passed into c-lineup-C-comments should be transformed from (c . 61) to (c . 17) (or something like that), essentially fooling the routine into lining up with the comment at the beginning of the line, and not the comment which you're currently modifying.
Which version of emacs are you using? My emacs 22 has this problem, but on another machine with emacs 23 does not. This is probalby due to some "electric" indentation. Try M-x describe-key RET RET and also M-x describe-mode to get a nice place to start searching for clues. There is also http://doxymacs.sourceforge.net/ but I have not tesed it personally.

How do I run an Emacs hook when a buffer is modified?

Building on Getting Emacs to untabify when saving certain file types (and only those file types) , I'd like to run a hook to untabify my C++ files when I start modifying the buffer. I tried adding hooks to untabify the buffer on load, but then it untabifies all my writable files that are autoloaded when emacs starts.
(For those that wonder why I'm doing this, it's because where I work enforces the use of tabs in files, which I'm happy to comply with. The problem is that I mark up my files to tell me when lines are too long, but the regexp matches the number of characters in the line, not how much space the line takes up. 4 tabs in a line can push it far over my 132 character limit, but the line won't be marked appropriately. Thus, I need a way to tabify and untabify automatically.)
Take a look at the variable "before-change-functions".
Perhaps something along this line (warning: code not tested):
(add-hook 'before-change-functions
(lambda (&rest args)
(if (not (buffer-modified-p))
(untabify (point-min) (point-max)))))
Here is what I added to my emacs file to untabify on load:
(defun untabify-buffer ()
"Untabify current buffer"
(interactive)
(untabify (point-min) (point-max)))
(defun untabify-hook ()
(untabify-buffer))
; Add the untabify hook to any modes you want untabified on load
(add-hook 'nxml-mode-hook 'untabify-hook)
This answer is tangential, but may be of use.
The package wide-column.el link text changes the cursor color when the cursor is past a given column - and actually the cursor colors can vary depending on the settings. This sounds like a less intrusive a solution than your regular expression code, but it may not suit your needs.
And a different, tangential answer.
You mentioned that your regexp wasn't good enough to tell when the 132 character limit was met. Perhaps a better regexp...
This regexp will match a line when it has more than 132 characters, assuming a tabs width is 4. (I think I got the math right)
"^\\(?: \\|[^ \n]\\{4\\}\\)\\{33\\}\\(.+\\)$"
The last parenthesized expression is the set of characters that are over the limit. The first parenthesized expression is shy.