Emacs web mode selection word delimiters include _


I'm unsure of how to properly describe this question, but here goes:
In Emacs, when I double-click to select a word, something determines which characters to select -- what is this called? (In the Terminal profile preferences, this is called select-by-word characters, so I'll use that phrase.)
Without web-mode, for example, if I double click the word title in image_title, it highlights only title -- that is, the underscore is recognized as a select-by-word delimiter.
Once I enabled web-mode, the behavior of select-by-word changes, and underscore is no longer a word delimiter. In the previous example, double-clicking now highlights the entire image_title. This irritates me, as I commonly want to select portions of an underscore-delimited-identifier. (In general, I'd prefer any mode not to change the default selection behavior.)
What is the option to change this behavior of web-mode?
Edit to add: in my preferred mode, if I double-click on the _ character, it does select the entire word including underscores. I like this subtle but precise control of the selection behavior.

@lawlist - thanks for your suggestions! I wasn't able to follow the functions through entirely, but they did lead me along the correct path:
found double-mouse-1, searched google
found that mouse word selection uses the global forward-word function
that page mentioned syntax-table, you mention syntax-table, and what do we have here web-mode.el contains a syntax-table (and a reference to bug #377.)
web-mode's syntax table contains, roughly line 1969 in v 11.2.2:
(defvar web-mode-syntax-table
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?_ "w" table)
    (modify-syntax-entry ?< "." table)
    (modify-syntax-entry ?> "." table)
    (modify-syntax-entry ?& "." table)
    (modify-syntax-entry ?/ "." table)
    (modify-syntax-entry ?= "." table)
    (modify-syntax-entry ?% "." table)
    table)
  "Syntax table used to reveal whitespaces.")
I assume the ?_ "w" means it's treating _ as a word character. I changed the "w" to "." and underscore is now being treated as a word boundary like I wanted!
Hooray for stack overflow, google, and github.
Also, now that I've got some correct keywords, Stack Overflow suggests this potentially helpful related answer: Changing Emacs Forward-Word Behaviour
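For anyone who would rather not edit web-mode.el directly, the same change can probably be made from the init file. This is a minimal sketch, assuming web-mode runs web-mode-hook when it is activated; the function name my-web-mode-underscore-syntax is made up for illustration. It uses the symbol-constituent class "_" instead of the "." used above, on the assumption that this is closer to the default behavior described in the question (double-clicking on the underscore itself selects the whole identifier). Note that web-mode may rely on the original entry internally (see the bug #377 reference), so treat this as an experiment.

```elisp
;; Assumption: web-mode is installed and runs `web-mode-hook' on activation.
;; `my-web-mode-underscore-syntax' is a hypothetical helper name.
(defun my-web-mode-underscore-syntax ()
  "Treat `_' as a symbol constituent rather than a word constituent."
  ;; With no table argument, this modifies the current buffer's syntax table.
  (modify-syntax-entry ?_ "_"))

(add-hook 'web-mode-hook #'my-web-mode-underscore-syntax)
```

Replacing "_" with "." in the call above should reproduce the exact change made in web-mode.el.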

Related

Emacs Major Mode - Keywords "special char" and "one char" Keywords

I want to write a major mode for Emacs that does syntax highlighting for MML (Music Macro Language) keywords. I followed this tutorial:
http://ergoemacs.org/emacs/elisp_syntax_coloring.html
here is my current code
(under x-events there are still placeholders, and x-functions I haven't adjusted yet and took over from the tutorial):
;;
;; to install this mode, put the following lines
;; (add-to-list 'load-path "~/.emacs.d/lisp/")
;; (load "mml-mode.el")
;; into your init.el file and activate it with
;; ALT+X mml-mode RET
;;
;; create the list for font-lock.
;; each category of keyword is given a particular face
(setq mml-font-lock-keywords
      (let* (
             ;; define several category of keywords
             (x-keywords '("#author" "#title" "#game" "#comment"))
             (x-types '("&" "?" "/" "=" "[" "]" "^" "<" ">"))
             (x-constants '("w" "t" "o" "#" "v" "y" "h" "q" "p" "n" "*" "!"))
             (x-events '("#" "##" "ooo" "oooo"))
             (x-functions '("llAbs" "llAcos" "llAddToLandBanList"
                            "llAddToLandPassList"))
             ;; generate regex string for each category of keywords
             (x-keywords-regexp (regexp-opt x-keywords 'words))
             (x-types-regexp (regexp-opt x-types 'words))
             (x-constants-regexp (regexp-opt x-constants 'words))
             (x-events-regexp (regexp-opt x-events 'words))
             (x-functions-regexp (regexp-opt x-functions 'words)))
        `(
          (,x-types-regexp . font-lock-type-face)
          (,x-constants-regexp . font-lock-constant-face)
          (,x-events-regexp . font-lock-builtin-face)
          (,x-functions-regexp . font-lock-function-name-face)
          (,x-keywords-regexp . font-lock-keyword-face)
          )))
;;;###autoload
(define-derived-mode mml-mode text-mode "mml mode"
  "Major mode for editing mml (Music Macro Language)"
  ;; code for syntax highlighting
  (setq font-lock-defaults '((mml-font-lock-keywords))))
;; add the mode to the `features' list
(provide 'mml-mode)
But now there are two problems:
First, I have several keywords that start with a # (e.g. #author). But the # doesn't seem to work, because if I leave it out, it works.
(x-keywords '("#author"))
does not work.
(x-keywords '("author"))
works, but the # is not colored. The same problem also occurs with the #. Possibly also with others, but I'll try to get them working one by one.
Second, a keyword seems to need at least two letters.
(x-keywords '("o"))
does not work.
(x-keywords '("oo"))
works.
But I have several "keywords" that consist of a single letter followed by two (arbitrary) hex digits (0-F) (e.g. o7D).
How can I specify that these one-letter keywords are found? (Preferably together with the number, but that is not a must.)

Both problems arise from the same issue: it has to do with the way you construct the regular expressions:
(regexp-opt x-blabla 'words)
The problem is the 'words parameter. What this does is enclose the generated regular expression in a \< ... \> pair. According to the Emacs manual, these special constructs are defined as follows:
\<
matches the empty string, but only at the beginning of a word.
‘\<’ matches at the beginning of the buffer only if a word-constituent
character follows.
\>
matches the empty string, but only at the end of a word.
‘\>’ matches at the end of the buffer only if the contents end with a
word-constituent character.
Now, what does "beginning of a word" mean to Emacs? That is mode-dependent. In fact, every major mode defines its own syntax-table which is a mapping of characters to syntax codes. There are a number of pre-defined classes, and one of them is "w" which defines a character as a word constituent. Normally, a text-based mode would define the letters a...z and A...Z to have the syntax code "w", but perhaps also other characters (e.g. a hyphen -).
Okay, back to the problem at hand. For, say x-keywords, the resulting x-keywords-regexp according to your definition is:
"\\<\\(#\\(?:author\\|comment\\|\\(?:gam\\|titl\\)e\\)\\)\\>"
(Note that inside strings, the backslash is a special character used to escape other special characters, e.g., \n or \t. So in order to encode a simple backslash itself, you have to quote it with another backslash.)
As discussed above, we see \< and \> (or, in string parlance: "\\<" and "\\>") at the beginning and the end of the regexp respectively. But, as we've just learned, in order for this regexp to match, both the first and the last character of the potential match need to have word-constituent syntax.
The letters are unproblematic, but let's check the syntax code for # by typing C-h s:
The parent syntax table is:
C-@ .. C-h . which means: punctuation
TAB .. C-j which means: whitespace
C-k . which means: punctuation
C-l .. RET which means: whitespace
C-n .. C-_ . which means: punctuation
SPC which means: whitespace
! . which means: punctuation
" " which means: string
# . which means: punctuation
...
(Obviously truncated.)
And there you have it! The # character does not have word-constituent syntax; it is considered punctuation.
But we can change that by putting the following line into the definition of your major-mode:
(modify-syntax-entry ?# "w" mml-mode-syntax-table)
?# is how chars are encoded in Emacs lisp (think '#' in C).
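The effect of the syntax class on \< can be checked outside the mode. The following is a small sketch that can be evaluated in a scratch buffer: it searches for the same kind of regexp twice in a temporary buffer, once with the default syntax table and once with a table in which # is a word constituent. The variable names re and table are only for illustration.

```elisp
;; Compare a `words'-style regexp under two syntax tables.
(let ((re (regexp-opt '("#author") 'words))   ; wrapped in \< ... \>
      (table (make-syntax-table)))            ; inherits the standard table
  ;; In the copy only, make `#' a word constituent.
  (modify-syntax-entry ?# "w" table)
  (with-temp-buffer
    (insert "#author foo")
    (goto-char (point-min))
    (list
     ;; Default table: `#' is punctuation, so \< cannot match before it.
     (re-search-forward re nil t)
     ;; Modified table: `#' starts a word, so the keyword is found.
     (with-syntax-table table
       (goto-char (point-min))
       (re-search-forward re nil t)))))
```

The first element of the resulting list should be nil and the second non-nil, which mirrors the "#author does not work" symptom from the question and the fix proposed here.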
Regarding the second part of your question, in order to match something like o75, we'd have to do something similar: define all numbers to be word constituents:
(modify-syntax-entry '(?0 . ?9) "w" mml-mode-syntax-table)
However, we'd also need to write an appropriate regular expression to match such keywords. The regexp itself is not difficult:
"o[0-9A-F]\\{2\\}"
However, where to put that? Since it is already a regexp, we cannot simply add it to x-keywords because that is a list of simple strings.
However, we can concatenate it to x-keywords-regexp instead, by changing the respective line in your above code to read like this:
(x-keywords-regexp (concat (regexp-opt x-keywords 'words)
"\\|\\<[o][0-9A-F]\\{2\\}\\>"))
Note the "\\|" at the beginning of the string parameter, which is the regexp syntax for alternative matches.
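Putting the pieces together, a trimmed-down version of the mode might look like the sketch below. It keeps only the keyword category from the question and is an illustration of the two changes discussed above, not a complete MML mode; it assumes that mml-mode-syntax-table is the table created automatically by define-derived-mode.

```elisp
;; Sketch: only the x-keywords category from the question is kept.
(defvar mml-font-lock-keywords
  (let ((x-keywords '("#author" "#title" "#game" "#comment")))
    `((,(concat (regexp-opt x-keywords 'words)
                ;; One-letter keyword `o' followed by two hex digits, e.g. o7D.
                "\\|\\<o[0-9A-F]\\{2\\}\\>")
       . font-lock-keyword-face)))
  "Font-lock keywords for `mml-mode' (reduced example).")

(define-derived-mode mml-mode text-mode "mml mode"
  "Major mode for editing mml (Music Macro Language)."
  ;; Make `#' a word constituent so that \< can match in front of it.
  (modify-syntax-entry ?# "w" mml-mode-syntax-table)
  (setq font-lock-defaults '((mml-font-lock-keywords))))

(provide 'mml-mode)
```

The other categories from the question can be added back to the list in the same way once each of them matches as intended.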

Emacs 26.3: changing the definition of "a word" when deleting words, moving by the word, etc

I'd like commands like alt-backspace, alt-B, alt-F etc. treat ranges of at least maybe 4 spaces as a word, or perhaps if it's a range of whitespace that includes a newline...
Any pointers for making such a modification?
Perhaps a simpler request: add underscore to set of characters recognized as a word, as C++ symbols often have underscores.

If you don't have subword-mode enabled, you can change the syntax of the characters you want to jump over, e.g. _, by wrapping any command you want in with-syntax-table. Here it wraps forward-word:
(defvar my-syntax-table
  (let ((tab (copy-syntax-table)))
    (modify-syntax-entry ?_ "w" tab)
    tab))
(define-advice forward-word (:around (fn &rest args) "modified-syntax")
  (let (inhibit-field-text-motion)
    (with-syntax-table my-syntax-table
      (apply fn args))))
With subword-mode, it looks like you could modify the subword-forward-regexp and subword-backward-regexp variables to include the additional characters, e.g. four spaces could be an addition of " \\{4\\}".
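For the simpler request (underscore as part of a word in C++ buffers), a common alternative to advising commands is to change the syntax entry from the mode hook. This is a minimal sketch, assuming the standard c++-mode and its c++-mode-hook; the function name my-c++-underscore-as-word is hypothetical.

```elisp
;; `my-c++-underscore-as-word' is a made-up name for this example.
(defun my-c++-underscore-as-word ()
  "Make `_' a word constituent in the current buffer's syntax table."
  (modify-syntax-entry ?_ "w"))

(add-hook 'c++-mode-hook #'my-c++-underscore-as-word)

;; Alternative: leave the syntax table alone and let word commands
;; operate on whole symbols via the built-in `superword-mode'.
;; (add-hook 'c++-mode-hook #'superword-mode)
```

Unlike the advice, this affects every word-based command in those buffers, including M-b and M-DEL, so it is a broader change.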

Hydra disable interpretation of prefix argument

For a long time I've used a common emacs hydra to navigate expressions, along the lines of
(defhydra hydra-word (:color red) "word"
  ("M-f" forward-word)
  ("M-b" backward-word)
  ("f" forward-word)
  ("b" backward-word)
  ;; etc..
  )
But an annoying issue I always have: pressing a number is interpreted as a prefix argument when I always mean to simply insert a number. I looked through the hydra wiki, but couldn't find an answer to disable prefix interpretation. I know I can write a ("1" self-insert-command nil :exit t) for each number, but that's dumb and results in a bunch of extra functions created.
How can I disable interpretation of the prefix arg during an active hydra? And, I guess more generally, is there a way to temporarily disable interpretation of prefix arguments?

After looking through the code I found you can override the hydra's base map, which is like universal-argument-map. So, to implement the above with only C-u starting a prefix, but all numbers and - self-inserting, the following works:
(defhydra hydra-word (:color red :base-map (make-sparse-keymap)) "word"
  ("M-f" forward-word)
  ("M-b" backward-word)
  ("f" forward-word)
  ("b" backward-word)
  ;; etc..
  )

Avoid font-locking interfering inside of comments

In my font-lock-defaults I have:
("\\(^\\| \\|\t\\)\\(![^\n]+\\)\n" 2 'factor-font-lock-comment)
The comment character is ! and this makes it so comments get the right face. This works mostly, except when there is a competing font-locked entity inside the comment, like a string (delimited by double quotes):
! this line is font-locked fine
! this one is "not" because "strings"
How do you get font-lock to understand that the comment is already font-locked fine and it doesn't need to try to font-lock any strings inside of it? The obvious way is to add ! to the comment starter class in the syntax table:
(modify-syntax-entry ?! "< 2b" table)
This solution is not possible because function names and other symbols containing ! are legal such as map! filter! and foo!bar. And adding ! would cause code containing such names to be highlighted incorrectly.

Generally, it's a bad idea to highlight comments using a font-lock keyword. It's better to use the syntactic phase for this.
Even though the syntax table isn't powerful enough to describe the syntax of your language, it's still possible to highlight comments in the syntactic font-lock phase. The solution is to provide a custom function to assign syntactic properties to the ! characters that should start a comment. This is done using the variable syntax-propertize-function.
See the elisp manual for details. Also, this tutorial covers this in great detail.
Update: The following is a simple example that defines ! to be a comment start character, but not within identifiers. A real-world example might need a more refined way to check if something is an identifier.
(defun exmark-syntax-propertize (start end)
  (funcall (syntax-propertize-rules
            ("[[:alnum:]_]\\(!\\)"
             (1 "_")))
           start
           end))
(defvar exmark-mode-syntax-table
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?\n "> " table)
    (modify-syntax-entry ?! "< " table)
    table))
(define-derived-mode exmark-mode prog-mode "!-Mark"
  "Major mode for !-mark."
  (set (make-local-variable 'syntax-propertize-function)
       'exmark-syntax-propertize))

How to disable underscore (_) subscripting in Emacs, TeX input method

In Emacs, while editing a text document of notes for myself (a .txt document, not a .tex document), I am using M-x set-input-method RET TeX, in order to get easy access to various Unicode characters. So for example, typing \to followed by a space causes a "→" to be inserted into the text, and typing x^2 causes "x²" to be inserted, because the font I am using has support for Unicode codepoints 0x2192 and 0x00B2, respectively.
One of the specially handled characters in the method is for the underscore key, _. However, the font I am using for Emacs does not appear to have support for the codepoints for the various subscript characters, such as subscript zero (codepoint 0x2080), and so when I type _0, I get something rendered as a thin blank in my output. I would prefer to just have the two characters _0 in this case.
I can get _0 by the keystroke sequence _ SPC DEL 0, since the space keystroke in the middle of the sequence causes Emacs to abort the TeX input method. But this is awkward.
So, my question: How can I locally customize my Emacs to not remap the _ key at all when I am in the TeX input method? Or how can I create a modified clone (or extension, etc) of the TeX input method that leaves out underscore from its magic?
Things I have tried so far:
I have already done M-x describe-key on _; but it is just bound to self-insert-command, like many other text characters. I did see a post-self-insert-hook there, but I have not explored trying to use that to subvert the TeX input method.
Things I have not tried so far:
I have not tried learning anything about the input method architecture or its source code. From my quick perusal of the code and methods, it did not seem like something I could quickly jump into.

So here is the solution I just found: Make a personalized copy of the TeX input method, with all of the undesirable entries removed. Then when using M-x set-input-method, select the personalized version instead of TeX.
I would have tried this earlier, but the built-in documentation for set-input-method and its ilk does not provide sufficient guidance to the actual source of the input methods for me to find it. It was only after doing another search on SO and finding this: Emacs: Can't activate input method that I was able to get enough information to do this on my own.
Details:
1. In Emacs, open /usr/share/emacs/22.1/leim/leim-list.el and find the entry for the input method you want to customize. The entry will be something like the following form:
   (register-input-method
    "TeX" "UTF-8" 'quail-use-package
    "\\" "LaTeX-like input method for many characters."
    "quail/latin-ltx")
2. Note the file name prefix referenced in the last element in the form above. Find the corresponding Elisp source file; in this case, it is a relative path to the file quail/latin-ltx.el[.gz]. Open that file in Emacs, and check it out; it should have the entries for the method remappings, both desired and undesired.
3. Make a user-local copy of that Elisp source file amongst your other Emacs customizations. Open that local copy in Emacs.
4. In your local copy, find the (quail-define-package ...) form in the file, and change the name of the package; I used FSK-TeX as my new name, like so:
   (quail-define-package
    "FSK-TeX" "UTF-8" "\\" t ;; <-- The first argument here is the important bit to change.
    "LaTeX-like input method for many characters but not as many as you might think.
    ...)
5. Go through your local copy, and delete all the S-expressions for mappings that you don't want.
6. In your .emacs configuration file, register your customized input method, using a form analogous to the one you saw when you looked at leim-list.el in step 1:
   (register-input-method
    "FSK-TeX" "UTF-8" 'quail-use-package
    "\\" "FSK-customized LaTeX-like input method for many characters."
    "~/ConfigFiles/Elisp/leim/latin-ltx")
7. Restart Emacs and test your new input-method; in my case, by doing M-x set-input-method FSK-TeX, typing a_0, and confirming that a_0 shows up in the buffer.
So, there's at least one answer that, once installed, is less awkward than the workarounds listed in the question (which, as it turns out, are also officially documented in the Emacs 22 manual as a way to cut off input method processing).
However, I am not really happy with this solution, since I would prefer to inherit future changes to TeX mode, and just have my .emacs remove the undesirable entries on startup.
So I will wait to see if anyone else comes up with a better answer than this.

I did not test this myself, but this seems to be the exact thing you are looking for:
"How to disable underscore subscript in TeX mode in emacs" - source
Two solutions are given in this blog post:
By the author of the blog post: (setq font-lock-maximum-decoration nil) (from maximum)
Mentioned as comment:
(eval-after-load "tex-mode" '(fset 'tex-font-lock-subscript 'ignore))

The evil plugin for vim-like modal keybindings lets you map two consecutive presses of the _ key to the insertion of a single _ character:
(set-input-method 'TeX)
(define-key evil-insert-state-local-map (kbd "_ _")
  (lambda () (interactive) (insert "_")))
(define-key evil-insert-state-local-map (kbd "^ ^")
  (lambda () (interactive) (insert "^")))
When _ and then 1 is pressed, we get ₁ as before, but
when _ and then _ is pressed, we get _.
Analogous for ^.

As already explained in pnkfelix's answer, it seems we have to make a personalized copy of the TeX input method. But here is a lighter way to do that, without any file tweaking. Simply put the following in your .emacs:
(eval-after-load "quail/latin-ltx"
  '(let ((pkg (copy-tree (quail-package "TeX"))))
     (setcar pkg "MyTeX")
     (assq-delete-all ?_ (nth 2 pkg))
     (quail-add-package pkg)))
(set-input-method 'TeX)
(register-input-method "MyTeX" "UTF-8" 'quail-use-package "\\")
(set-input-method 'MyTeX)
The important part is the assq-delete-all line in the middle, which removes all shortcut entries starting with _. It's a bit of a Lisp hack, but it seems to work. Since I'm also annoyed by the shortcuts starting with - and ^, I also use the following two lines to disable them:
(assq-delete-all ?- (nth 2 pkg))
(assq-delete-all ?^ (nth 2 pkg))
Note that afterwards you can M-x set-input-method at any time and indicate TeX or MyTeX to switch between the pristine TeX input method or the customized one.
