Check if a character (not string) is lowercase, uppercase, alphanumeric? - emacs

What is the idiomatic and most efficient way to check whether a single character in Elisp is lowercase, uppercase, alphanumeric, a digit, whitespace, or in any similar character category? For example, Python has string methods like isdigit(), but converting a character (which is just a number) to a string in Elisp to check if it belongs to a certain case or category seems like the wrong approach:
(string-match-p "[[:lower:]]" (char-to-string ?a))

There is no standard way, but I think it is not hard to roll your own:
(defun wordp (c) (= ?w (char-syntax c)))
(defun lowercasep (c) (and (wordp c) (= c (downcase c))))
(defun uppercasep (c) (and (wordp c) (= c (upcase c))))
(defun whitespacep (c) (= 32 (char-syntax c)))
See also cl-digit-char-p in cl-lib.el.
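As a small illustration of the cl-digit-char-p pointer, here is a sketch of what it returns. It assumes only that cl-lib is available, which is the case in any recent Emacs:

```elisp
(require 'cl-lib)

;; cl-digit-char-p returns the numeric value of the digit, or nil
;; if the character is not a digit in the given radix (default 10).
(cl-digit-char-p ?7)     ; ⇒ 7
(cl-digit-char-p ?a)     ; ⇒ nil, not a decimal digit
(cl-digit-char-p ?a 16)  ; ⇒ 10, a valid digit in radix 16
```

Because the return value is the digit's value rather than t, it can be used both as a predicate and as a conversion.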

Use get-char-code-property to look up the Unicode general category of the character. A lower-case letter, for example, gives the symbol Ll (string= accepts symbols as well as strings, so the comparison below works):
(string= "Ll" (get-char-code-property ?a 'general-category))
⇒ t
An upper-case letter gives "Lu", and a decimal number "Nd". See the full list of values.
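A minimal sketch of predicates built on this lookup. The my- names are hypothetical helpers, not part of Emacs; the category values Ll, Lu and Nd are the ones named above:

```elisp
;; Hypothetical helpers; the my- names are made up for this example.
(defun my-char-category (c)
  "Return the Unicode general category of character C as a symbol."
  (get-char-code-property c 'general-category))

(defun my-char-lowercase-p (c) (eq (my-char-category c) 'Ll))
(defun my-char-uppercase-p (c) (eq (my-char-category c) 'Lu))
(defun my-char-digit-p (c) (eq (my-char-category c) 'Nd))

(my-char-lowercase-p ?a)  ; ⇒ t
(my-char-uppercase-p ?a)  ; ⇒ nil
(my-char-digit-p ?5)      ; ⇒ t
```

Comparing with eq works because the property value is a symbol, and unlike the syntax-table approach the result does not depend on the current buffer's mode.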

Related

emacs lisp pcase error

I am having a hard time reading/understanding the syntax of the pcase statement in emacs-lisp. Please help me figure out how to make the following a valid pcase statement.
(defun execute-play (str)
(setq parse (mapcar (lambda (s) (split-string s ":")) (split-string str " ")))
(pcase (string-to-char (caar parse))``
((pred (<= (string-to-char "5"))) (t-to-pparse))
((pred (<= (string-to-char "d"))) (f-to-p parse))
((string-to-char "w") (w-to-p parse))
(_ "bad input")))
Note that typical input is "1:2 3" or "a 5".
The error from emacs that I get is: 'edebug-signal: Unknown upattern '(string-to-char w)'
This is the second to last case, -- I thought that this would just match the value of (caar parse) against (string-to-char "w") if it did not already match a case before this. Note that I also tried replacing (string-to-char "w") with (SELFQUOTING (string-to-char "w")) since the documentation says that: SELFQUOTING matches itself. This includes keywords, numbers, and strings.
Please help me get this emacs-lisp pcase statement working -- Thanks for all the help!
There are multiple issues with your code:
Since you're not doing any binding or destructuring in your patterns, you don't need pcase; the conditional is better written using cond.
You have a spurious pair of backquotes at the end of line 3.
You appear to have inverted the first two tests: (pred (<= (string-to-char "5"))) calls (<= ?5 EXPR), so the first clause matches whenever the expression is at least ?5, and the remaining clauses will never match.
pcase does not treat an arbitrary form such as (string-to-char "w") as a value to compare against, so the third clause should be written (pred (equal (string-to-char "w"))).
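The cond suggestion could look roughly like the sketch below. It assumes the intended tests were "at most ?5" and "at most ?d", and it returns a symbol naming the branch rather than calling the question's t-to-p, f-to-p and w-to-p helpers, since those are not shown. The function name execute-play-branch is made up for the example:

```elisp
;; Sketch only: returns the name of the selected branch as a symbol
;; instead of calling the question's (unseen) helper functions.
(defun execute-play-branch (str)
  "Return the branch that the first character of STR selects."
  (let* ((parse (mapcar (lambda (s) (split-string s ":"))
                        (split-string str " ")))
         (c (string-to-char (caar parse))))
    (cond ((<= c ?5) 't-to-p)    ; assumed intent: characters up to ?5
          ((<= c ?d) 'f-to-p)    ; assumed intent: characters up to ?d
          ((= c ?w) 'w-to-p)
          (t 'bad-input))))

(execute-play-branch "1:2 3")  ; ⇒ t-to-p
(execute-play-branch "a 5")    ; ⇒ f-to-p
(execute-play-branch "w")      ; ⇒ w-to-p
```

Using let* also keeps parse local, whereas the setq in the question creates a global variable.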

Emacs-lisp: prettify-symbols-mode for LaTeX

I was trying to port over the "pretty entities" behaviour from org-mode to latex-mode using the Emacs builtin prettify-symbols-mode. This mode uses font-lock-mode to display character sequences in a buffer as a single (unicode) character. By default for instance emacs-lisp code
(lambda () t)
becomes
(λ () t)
It does however seem to require the character sequences to be separated by some characters, e.g. white-spaces. For instance in my setup, the replacement
\alpha \beta -> α β
will work, but it will fail when the strings are not separated, e.g.
\alpha\beta -> \alphaβ
This is an issue specifically because I wanted to use this prettification to make quantum mechanical equations more readable, where I need, e.g., a replacement like
|\psi\rangle -> |ψ⟩
Is it possible to avoid this delimiter-issue using prettify-symbols-mode? And if it is not, is it possible by using font-lock-mode on a lower level?
Here's the code that should do what you want:
(require 'cl-lib) ; provides cl-pairlis

(defvar pretty-alist
  (cl-pairlis '("alpha" "beta" "gamma" "delta" "epsilon" "zeta" "eta"
                "theta" "iota" "kappa" "lambda" "mu" "nu" "xi"
                "omicron" "pi" "rho" "sigma_final" "sigma" "tau"
                "upsilon" "phi" "chi" "psi" "omega")
              (mapcar
               (lambda (x) (make-char 'greek-iso8859-7 x))
               (number-sequence 97 121))))
(add-to-list 'pretty-alist '("rangle" . ?\⟩))
(defun pretty-things ()
  (mapc
   (lambda (x)
     (let ((word (car x))
           (char (cdr x)))
       ;; WORD followed by another letter: undo any composition.
       (font-lock-add-keywords
        nil
        `((,(concat "\\(^\\|[^a-zA-Z0-9]\\)\\(" word "\\)[a-zA-Z]")
           (0 (progn
                (decompose-region (match-beginning 2) (match-end 2))
                nil)))))
       ;; WORD followed by a non-letter: compose it, together with the
       ;; character before it (the backslash), into CHAR.
       (font-lock-add-keywords
        nil
        `((,(concat "\\(^\\|[^a-zA-Z0-9]\\)\\(" word "\\)[^a-zA-Z]")
           (0 (progn
                (compose-region (1- (match-beginning 2)) (match-end 2)
                                ,char)
                nil)))))))
   pretty-alist))
As you can see above, pretty-alist starts out with Greek characters. Then I add \rangle just to demonstrate how to add new things.
To enable it automatically, add it to the hook:
(add-hook 'LaTeX-mode-hook 'pretty-things)
I used the code from here as a starting point; you can look there for a reference.
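To see why this handles unseparated sequences, here is a sketch that applies the same "compose" regexp to plain strings. The helper name my-pretty-compose-regexp is made up for the example; the regexp itself is the one from pretty-things:

```elisp
;; Hypothetical helper: builds the same regexp that pretty-things
;; uses for its compose-region keyword.
(defun my-pretty-compose-regexp (word)
  (concat "\\(^\\|[^a-zA-Z0-9]\\)\\(" word "\\)[^a-zA-Z]"))

(let ((text "|\\psi\\rangle "))
  (list
   ;; "psi" is preceded by a backslash and followed by a non-letter.
   (and (string-match (my-pretty-compose-regexp "psi") text)
        (match-string 2 text))
   ;; "rangle" directly follows "\psi" and still matches.
   (and (string-match (my-pretty-compose-regexp "rangle") text)
        (match-string 2 text))
   ;; "alpha" inside "\alphabet" is followed by a letter: no match.
   (string-match (my-pretty-compose-regexp "alpha") "\\alphabet ")))
;; ⇒ ("psi" "rangle" nil)
```

Note that the regexp needs one non-letter character after the word, so a sequence at the very end of the text is not matched until something follows it.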
The code of prettify-symbols-mode derives from code developed for languages like Haskell and a few others, which don't use something like TeX's \. So you may indeed be in trouble. I suggest you M-x report-emacs-bug requesting that prettify-symbols-mode be improved to support TeX-style syntax.
In the meantime, you'll have to "do it by hand" along the lines of what abo-abo suggests.
One note, tho: back in the days of Emacs-21, I ported X-Symbol to work on Emacs, specifically because I wanted to see such pretty things in LaTeX. Yet, I discovered that it was mostly useless to me. And I think it's even more the case now. Here's why:
You can just use an actual ψ character in your LaTeX code instead of \psi nowadays. So you don't need display tricks for it to look "right".
Rather than repeating |\psi\rangle I much prefer defining macros and then use \Qr{\Vangle} (where \Vangle turns into \psi ("V" stands for "metaVariable"), and \Qr wraps it in a braket) so I can easily tweak those macros and know that the document will stay consistent. At that point, pretty-display of \psi and \rangle is of no importance.

How do I create shy groups in Emacs with rx?

Generally, I can use the excellent rx macro to create readable regular expressions and be sure that I've escaped the correct metacharacters.
(rx (any "A-Z")) ;; "[A-Z]"
However, I can't work out how to create shy groups, e.g. \(?:AB\). rx sometimes produces them in its output:
(rx (or "ab" "bc")) ;; "\\(?:ab\\|bc\\)"
but I want to explicitly add them. I can do:
(rx (regexp "\\(?:AB\\)"))
but this defeats the point of rx.
In a perfect world, I'd like to be able to write:
(rx (shy-group "A"))
I'd settle for something like this (none of these work):
;; sadly, `regexp` only accepts literal strings
(rx (regexp (format "\\(?:%s\\)" (rx WHATEVER))))
;; also unfortunate, `eval` quotes the string it's given
(rx (eval (format "\\(?:%s\\)" (rx WHATEVER))))
How can I create regular expressions with shy groups using rx?
I think the structure of a rx form eliminates any need to explicitly create shy groups -- everything that a shy group could be needed for is accounted for by other syntax.
e.g. your own example:
(rx (or "ab" "bc")) ;; "\\(?:ab\\|bc\\)"
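Another case where rx adds the shy group on its own is repetition of a multi-character operand. A short sketch, checked by matching behaviour:

```elisp
;; rx brackets a multi-character operand when a repetition needs it.
(rx (+ "ab"))  ; ⇒ "\\(?:ab\\)+"

;; The repetition applies to the whole sequence, not just the last char.
(string-match-p (rx bos (+ (seq "a" "b")) eos) "ababab")  ; ⇒ 0
(string-match-p (rx bos (+ "ab") eos) "abb")              ; ⇒ nil
```

So seq (or a plain string) under an operator plays the role that an explicit shy group would play in a hand-written regexp.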
For other cases, it is also possible to extend the keywords used by rx.
Example (taken from EmacsWiki link above):
(defmacro rx-extra (&rest body-forms)
  (let ((add-ins (list
                  `(file . ,(rx (+ (or alnum digit "." "/" "-" "_"))))
                  `(ws0 . ,(rx (0+ (any " " "\t"))))
                  `(ws+ . ,(rx (+ (any " " "\t"))))
                  `(int . ,(rx (+ digit))))))
    `(let ((rx-constituents (append ',add-ins rx-constituents nil)))
       ,@body-forms)))

Unicode glyphs for keywords and operators in Coq/Proof General under Emacs

This question has to do with configuring the Coq mode within Proof General, in Emacs.
I'm trying to have Emacs automatically replace keywords and notation in Coq with the corresponding Unicode glyphs. I managed to define fun to be the Greek lowercase lambda λ, forall to be the universal quantifier symbol ∀, etc. I've had no problems defining symbols for words.
The problem is that when I try to define operators =>, ->, <->, etc. to their Unicode symbol ⇒→↔, they are not replaced with the corresponding Unicode glyphs in Coq. They are, however, replaced in the *scratch* buffer, when I test them. I'm using the same mechanism to match Unicode glyphs with Coq notation:
(defun define-glyph (string char-info)
  (font-lock-add-keywords
   nil
   `((,(format "\\<%s\\>" string)
      (0 (progn
           (compose-region
            (match-beginning 0) (match-end 0)
            ,(apply #'make-char char-info))
           nil))))))
I suspect the problem is that Coq mode marks certain punctuation marks as special, so Emacs ignores my code to replace them with the Unicode glyphs, but I'm not sure. Can someone please shed some light on this for me?
Those operators are probably symbols, not words, according to the mode-specific syntax table. Try
(defun define-glyph (string char-info)
  (font-lock-add-keywords
   nil
   `((,(format "\\_<%s\\_>" string)
      (0 (progn
           (compose-region
            (match-beginning 0) (match-end 0)
            ,(apply #'make-char char-info))
           nil))))))
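The difference between word and symbol boundaries can be checked in a scratch buffer. This is a sketch under an assumption: the syntax table below marks - and > as symbol constituents, which is a guess at how Coq mode might classify them, not something taken from Proof General. The function name is hypothetical:

```elisp
(defun my-arrow-matches-p (regexp)
  "Return t if REGEXP matches in a temporary buffer containing a -> b.
The buffer uses a syntax table in which - and > are symbol
constituents (an assumption made for this demonstration)."
  (with-temp-buffer
    (let ((table (make-syntax-table)))
      (modify-syntax-entry ?- "_" table)
      (modify-syntax-entry ?> "_" table)
      (set-syntax-table table))
    (insert "a -> b")
    (goto-char (point-min))
    (and (re-search-forward regexp nil t) t)))

(my-arrow-matches-p "\\<->\\>")    ; ⇒ nil, "->" is not a word
(my-arrow-matches-p "\\_<->\\_>")  ; ⇒ t, "->" is a whole symbol
```

If the operators turn out to have punctuation syntax in the real mode, neither boundary will match; C-u C-x = on the character shows its syntax class.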

Emacs should set the second character of a word in lower case

In most cases I'm writing German texts. Most words start with an uppercase letter followed by lowercase letters. Sometimes I type too fast and the second letter of a word also ends up in uppercase. To work around this, I wondered whether it is possible to write a function that automatically changes the case of the second letter. Optionally, this should only happen if the third and following letters are lowercase. Do you know if this is possible, and do you have any suggestions?
Here's an 'always on' version that fixes as you type. It will let you type all uppercase words, but as soon as it detects mixed case it will capitalize.
(defun blah (s e l)
  (let ((letter (string-to-char (word-before-point))))
    (if (and (eq letter (upcase letter))
             (not (eq (char-before) (upcase (char-before)))))
        (capitalize-word -1))))
(add-to-list 'after-change-functions 'blah)
Here's a command that will convert to lowercase the second letter of each word if the first letter is uppercase and all other letters in the word are lowercase:
(defun fix-double-uppercase-at-start-of-words ()
  (interactive)
  (let ((case-fold-search nil))
    (save-match-data
      (while (re-search-forward "\\b\\([[:upper:]]\\)\\([[:upper:]]\\)\\([[:lower:]]*\\)\\b" nil t)
        (replace-match (concat (match-string 1)
                               (downcase (match-string 2))
                               (match-string 3))
                       t)))))
The command will work on all words from the current cursor position to the (visible) end of the buffer.
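The same regexp can be tried on a plain string, which makes it easy to check which words it touches. This is a sketch; my-fix-double-uppercase is a made-up name and not part of the answer above:

```elisp
;; Hypothetical string version of the command, for experimenting
;; with the regexp outside a buffer.
(defun my-fix-double-uppercase (string)
  "Return STRING with words like \"HAllo\" changed to \"Hallo\"."
  (let ((case-fold-search nil))  ; keep [[:upper:]] case-sensitive
    (replace-regexp-in-string
     "\\b\\([[:upper:]]\\)\\([[:upper:]]\\)\\([[:lower:]]*\\)\\b"
     (lambda (m)
       ;; M is the matched word; the match data refers to M here.
       (concat (match-string 1 m)
               (downcase (match-string 2 m))
               (match-string 3 m)))
     string t t)))

(my-fix-double-uppercase "HAllo WElt, ABC Hallo")
;; ⇒ "Hallo Welt, ABC Hallo"
```

Words that are entirely uppercase, such as ABC, are left alone because the regexp requires a word boundary right after the lowercase run.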
You could set up a minor mode that maps all uppercase characters to a special input function.
See:
http://gist.github.com/516242