Using ispell/aspell to spell check camelcased words

Using ispell/aspell to spell check camelcased words - emacs

I need to spell check a large document containing many camelcased words. I want ispell or aspell to check if the individual words are spelled correctly.
So, in case of this word:
ScientificProgrezGoesBoink
I would love to have it suggest this instead:
ScientificProgressGoesBoink
Is there any way to do this? (And I mean, while running it on an Emacs buffer.) Note that I don't necessarily want it to suggest the complete alternative. However, if it understands that Progrez is not recognized, I would love to be able to replace that part at least, or add that word to my private dictionary, rather than including every camel-cased word into the dictionary.

I took #phils suggestions and dug around a little deeper. It turns out that if you get camelCase-mode and reconfigure some of ispell like this:
(defun ispell-get-word (following)
(when following
(camelCase-forward-word 1))
(let* ((start (progn (camelCase-backward-word 1)
(point)))
(end (progn (camelCase-forward-word 1)
(point))))
(list (buffer-substring-no-properties start end)
start end)))
then, in that case, individual camel cased words suchAsThisOne will actually be spell-checked correctly. (Unless you're at the beginning of a document -- I just found out.)
So this clearly isn't the fullblown solution, but at least it's something.

There is "--run-together" option in aspell. Hunspell can't check camelcased word.
If you read the code of aspell, you will find its algorithm actually does not split camelcase word into a list of sub-words. Maybe this algorithm is faster, but it will wrongly report word containing two character sub-word as typo. Don't waste time to tweak other aspell options. I tried and they didn't work.
So we got two problems:
aspell reports SOME camelcased words as typos
hunspell reports ALL camelcased words as typos
Solution to solve BOTH problems is to write our own predicate in Emacs Lisp.
Here is a sample predicate written for javascript:
(defun split-camel-case (word)
"Split camel case WORD into a list of strings.
Ported from 'https://github.com/fatih/camelcase/blob/master/camelcase.go'."
(let* ((case-fold-search nil)
(len (length word))
;; ten sub-words is enough
(runes [nil nil nil nil nil nil nil nil nil nil])
(runes-length 0)
(i 0)
ch
(last-class 0)
(class 0)
rlt)
;; split into fields based on class of character
(while (< i len)
(setq ch (elt word i))
(cond
;; lower case
((and (>= ch ?a) (<= ch ?z))
(setq class 1))
;; upper case
((and (>= ch ?A) (<= ch ?Z))
(setq class 2))
((and (>= ch ?0) (<= ch ?9))
(setq class 3))
(t
(setq class 4)))
(cond
((= class last-class)
(aset runes
(1- runes-length)
(concat (aref runes (1- runes-length)) (char-to-string ch))))
(t
(aset runes runes-length (char-to-string ch))
(setq runes-length (1+ runes-length))))
(setq last-class class)
;; end of while
(setq i (1+ i)))
;; handle upper case -> lower case sequences, e.g.
;; "PDFL", "oader" -> "PDF", "Loader"
(setq i 0)
(while (< i (1- runes-length))
(let* ((ch-first (aref (aref runes i) 0))
(ch-second (aref (aref runes (1+ i)) 0)))
(when (and (and (>= ch-first ?A) (<= ch-first ?Z))
(and (>= ch-second ?a) (<= ch-second ?z)))
(aset runes (1+ i) (concat (substring (aref runes i) -1) (aref runes (1+ i))))
(aset runes i (substring (aref runes i) 0 -1))))
(setq i (1+ i)))
;; construct final result
(setq i 0)
(while (< i runes-length)
(when (> (length (aref runes i)) 0)
(setq rlt (add-to-list 'rlt (aref runes i) t)))
(setq i (1+ i)))
rlt))
(defun flyspell-detect-ispell-args (&optional run-together)
"If RUN-TOGETHER is true, spell check the CamelCase words.
Please note RUN-TOGETHER will make aspell less capable. So it should only be used in prog-mode-hook."
;; force the English dictionary, support Camel Case spelling check (tested with aspell 0.6)
(let* ((args (list "--sug-mode=ultra" "--lang=en_US"))args)
(if run-together
(setq args (append args '("--run-together" "--run-together-limit=16"))))
args))
;; {{ for aspell only, hunspell does not need setup `ispell-extra-args'
(setq ispell-program-name "aspell")
(setq-default ispell-extra-args (flyspell-detect-ispell-args t))
;; }}
;; ;; {{ hunspell setup, please note we use dictionary "en_US" here
;; (setq ispell-program-name "hunspell")
;; (setq ispell-local-dictionary "en_US")
;; (setq ispell-local-dictionary-alist
;; '(("en_US" "[[:alpha:]]" "[^[:alpha:]]" "[']" nil ("-d" "en_US") nil utf-8)))
;; ;; }}
(defvar extra-flyspell-predicate '(lambda (word) t)
"A callback to check WORD. Return t if WORD is typo.")
(defun my-flyspell-predicate (word)
"Use aspell to check WORD. If it's typo return t."
(let* ((cmd (cond
;; aspell: `echo "helle world" | aspell pipe`
((string-match-p "aspell$" ispell-program-name)
(format "echo \"%s\" | %s pipe"
word
ispell-program-name))
;; hunspell: `echo "helle world" | hunspell -a -d en_US`
(t
(format "echo \"%s\" | %s -a -d en_US"
word
ispell-program-name))))
(cmd-output (shell-command-to-string cmd))
rlt)
;; (message "word=%s cmd=%s" word cmd)
;; (message "cmd-output=%s" cmd-output)
(cond
((string-match-p "^&" cmd-output)
;; it's a typo because at least one sub-word is typo
(setq rlt t))
(t
;; not a typo
(setq rlt nil)))
rlt))
(defun js-flyspell-verify ()
(let* ((case-fold-search nil)
(font-matched (memq (get-text-property (- (point) 1) 'face)
'(js2-function-call
js2-function-param
js2-object-property
js2-object-property-access
font-lock-variable-name-face
font-lock-string-face
font-lock-function-name-face
font-lock-builtin-face
rjsx-text
rjsx-tag
rjsx-attr)))
subwords
word
(rlt t))
(cond
((not font-matched)
(setq rlt nil))
;; ignore two character word
((< (length (setq word (thing-at-point 'word))) 2)
(setq rlt nil))
;; handle camel case word
((and (setq subwords (split-camel-case word)) (> (length subwords) 1))
(let* ((s (mapconcat (lambda (w)
(cond
;; sub-word wholse length is less than three
((< (length w) 3)
"")
;; special characters
((not (string-match-p "^[a-zA-Z]*$" w))
"")
(t
w))) subwords " ")))
(setq rlt (my-flyspell-predicate s))))
(t
(setq rlt (funcall extra-flyspell-predicate word))))
rlt))
(put 'js2-mode 'flyspell-mode-predicate 'js-flyspell-verify)
Or just use my new pacakge https://github.com/redguardtoo/wucuo

You should parse the camel cased words and split them, then check the individual spelling for each one and assemble a suggestion taking into account the single suggestion for each misspelled token. Considering that each misspelled token can have multiple suggestions this sounds a bit inefficient to me.

Related

How to extract XML processing instructions in Emacs Lisp?

I would like to extract the processing instructions (particularly xml-model) from an XML file; yet both (n)xml-parse-file as well as libxml-parse-xml-region do not recognize processing instructions.
Is there a clean way to extract processing instructions or do I have to regex search for PIs?
edit: Here is a first draft of the functionality I was looking for:
(cl-defun extract-processing-instructions (&rest processing-instructions)
"Extracts all/only the specified xml processing instructions from the current buffer and returns them as a list of string."
(interactive)
(let ((pi-re
(format "<\\?\\(%s\\).*\\?>" (string-join processing-instructions "\\|")))
(result))
(save-excursion
(goto-char (point-min))
(while (re-search-forward pi-re nil t)
(push (match-string 0) result)))
(nreverse result)))
(cl-defun pi-str2sexp (pi-str)
"Takes a processing instruction as a string and transforms it to a sexp-structure (in the style of xml-parse-*)."
(let (sexp attr-alist)
(save-match-data
;; get and push pi-element-name
;; (string-match "<\\?\\([[:alnum:]-]*\\)" pi-str)
(string-match "<\\?\\([[:alnum:]-]*\\)" pi-str)
(push (make-symbol (match-string 1 pi-str)) sexp)
;; construct attribute alist
(while (string-match "\\([[:alnum:]-]*\\)=\"\\([^ ]*\\)\""
pi-str (match-end 0))
(push (cons (make-symbol (match-string 1 pi-str))
(match-string 2 pi-str))
attr-alist)))
;; finally: push attr alist and return sexp
(push (nreverse attr-alist) sexp)
(nreverse sexp)))
edit 2: Turns out advicing/generally building upon xml-parse-* in this matter (like suggested by #Tom Regner) is a huge pain. :(
The thing I came up with was a context manager, the idea was to use it to around-advice string-parse-tag-1 (which is at the heart of xml-parse-* (of course stand-alone use is also an option):
(cl-defmacro --replace-first-group (regex-replace-alist)
`(save-excursion
(dolist (expression ,regex-replace-alist)
(goto-char (point-min))
(replace-regexp (car expression) (cadr expression)))))
(cl-defmacro with-parsable-pi (buffer &body body)
"Context manager that treats xml processing instructions in BUFFER as normal elements."
(declare (indent defun))
`(let ((old-buffer ,buffer))
(with-temp-buffer
(insert-buffer-substring old-buffer)
(goto-char (point-min))
(--replace-first-group '(("\\(\\?\\)>" "/>") ("<\\(\\?\\)" "<")))
,#body)))
This e.g. allows calls like
(with-parsable-pi (current-buffer)
(xml-parse-tag-1))
so it is at least possible to get an element at a time; but since the XML exposed in the context manager isn't actually valid and xml-parse-* (rightfully) errors if invalid XML is encountered, it isn't possible to process more than one element at a time.
I was thinking of maybe introducing a pseudo root element or something, but the kludge spiral is ghastly enough as it is.
Another idea of course would be to run an xpath query to extract processing instructions. If there only was a solid xpath solution in Emacs Lisp..

Ok, I think I found a satisfactory solution: xmltok-forward-prolog!
So here is the code I came up with for extracting processing instructions:
(cl-defun filter-xmltok-prolog (&optional (buffer (current-buffer))
(filter-re "processing-instruction-.*"))
"Filters the output of xmltok-forward-prolog (i.e. index 0 ('type') of each array) run in the context of BUFFER against FILTER-RE. Returns a list of vectors."
(with-current-buffer buffer
(save-excursion
(goto-char (point-min))
(let ((raw-prolog-data (xmltok-forward-prolog)))
(seq-filter
#'(lambda (x)
(string-match filter-re (symbol-name (aref x 0))))
raw-prolog-data)))))
(cl-defun --merge-pi-data (pi-data)
"Meant to operate on data filtered with filter-xmltok-prolog against 'processing-instruction-.*'.
Merges processing-instruction-left/-right and returns a list of vectors holding the start/end coordinates of a processing instruction at index 1 and 2."
(let ((left (car pi-data))
(right (cadr pi-data)))
(cond
((null pi-data) nil)
(t (cons
(vector 'processing-instruction
(aref left 1) (aref right 2))
(--merge-pi-data (cddr pi-data)))))))
;; test
(--merge-pi-data '([processing-instruction-left 40 51] [processing-instruction-right 52 126]))
(cl-defun pi-str2s-exp (pi-str)
"Takes a processing instruction as a string and transforms it into a sexp structure (in the style of xml-parse-*)."
(let (sexp attr-alist)
(save-match-data
;; get and push pi-element-name
(string-match "<\\?\\([[:alnum:]-]*\\)" pi-str)
(push (make-symbol (match-string 1 pi-str)) sexp)
;; construct attribute alist
(while (string-match "\\([[:alnum:]-]*\\)=\"\\([^ ]*\\)\""
pi-str (match-end 0))
(push (cons (make-symbol (match-string 1 pi-str))
(match-string 2 pi-str))
attr-alist)))
;; finally: push attr alist and return sexp
(push (nreverse attr-alist) sexp)
(nreverse sexp)))
(cl-defun get-processing-instructions (&optional (buffer (current-buffer)))
"Extracts processing instructions from BUFFER and returns a list of sexp representations in the style of xml-parse-*."
(save-excursion
(mapcar #'pi-str2s-exp
(mapcar #'(lambda (v)
(buffer-substring (aref v 1) (aref v 2)))
(--merge-pi-data (filter-xmltok-prolog buffer))))))
(cl-defun test/get-pis-from-file (file)
(with-temp-buffer
(insert-file-contents file)
(get-processing-instructions)))
(test/get-pis-from-file "~/some/xml/file.xml")
I'm not at all an Emacs Lisp expert and this isn't at all tested thoroughly, but it works for now! :)

Destructuring bind for regex matches

In elisp, how can I get a destructuring bind for regex matches?
For example,
;; what is the equivalent of this with destructuring?
(with-temp-buffer
(save-excursion (insert "a b"))
(re-search-forward "\\(a\\) \\(b\\)")
(cons (match-string 1)
(match-string 2)))
;; trying to do something like the following
(with-temp-buffer
(save-excursion (insert "a b"))
(cl-destructuring-bind (a b) (re-search-forward "\\(a\\) \\(b\\)")
(cons a b)))
I was thinking I would have to write a macro to expand matches if there isn't another way.

Here is one way: you first extend pcase to accept a new re-match pattern, with a definition such as:
(pcase-defmacro re-match (re)
"Matches a string if that string matches RE.
RE should be a regular expression (a string).
It can use the special syntax \\(?VAR: to bind a sub-match
to variable VAR. All other subgroups will be treated as shy.
Multiple uses of this macro in a single `pcase' are not optimized
together, so don't expect lex-like performance. But in order for
such optimization to be possible in some distant future, back-references
are not supported."
(let ((start 0)
(last 0)
(new-re '())
(vars '())
(gn 0))
(while (string-match "\\\\(\\(?:\\?\\([-[:alnum:]]*\\):\\)?" re start)
(setq start (match-end 0))
(let ((beg (match-beginning 0))
(name (match-string 1 re)))
;; Skip false positives, either backslash-escaped or within [...].
(when (subregexp-context-p re start last)
(cond
((null name)
(push (concat (substring re last beg) "\\(?:") new-re))
((string-match "\\`[0-9]" name)
(error "Variable can't start with a digit: %S" name))
(t
(let* ((var (intern name))
(id (cdr (assq var vars))))
(unless id
(setq gn (1+ gn))
(setq id gn)
(push (cons var gn) vars))
(push (concat (substring re last beg) (format "\\(?%d:" id))
new-re))))
(setq last start))))
(push (substring re last) new-re)
(setq new-re (mapconcat #'identity (nreverse new-re) ""))
`(and (pred stringp)
(app (lambda (s)
(save-match-data
(when (string-match ,new-re s)
(vector ,#(mapcar (lambda (x) `(match-string ,(cdr x) s))
vars)))))
(,'\` [,#(mapcar (lambda (x) (list '\, (car x))) vars)])))))
and once that is done, you can use it as follows:
(pcase X
((re-match "\\(?var:[[:alpha:]]*\\)=\\(?val:.*\\)")
(cons var val)))
or
(pcase-let
(((re-match "\\(?var:[[:alpha:]]*\\)=\\(?val:.*\\)") X))
(cons var val))
This has not been heavily tested, and as mentioned in the docstring it doesn't work as efficiently as it (c|sh)ould when matching a string against various regexps at the same time. Also you only get the matched substrings, not their position. And finally, it applies the regexp search to a string, whereas in manny/most cases regexps searches are used in a buffer. But you may still find it useful.

Is there a way to toggle a string between single and double quotes in emacs?

I'm looking for an emacs command that will toggle the surrounding quote characters on the string under the point, e.g. with the cursor in the string 'bar', hit a key and change it between:
foo = 'bar' <---> foo = "bar"
For bonus points it would:
handle toggling Python triple-quote strings (''' <---> """)
automatically change backslash escaping inside the string as appropriate.
e.g.
foo = 'bar "quote"' <---> foo = "bar \"quote\""

This could be a bit more robust:
(defun toggle-quotes ()
(interactive)
(save-excursion
(let ((start (nth 8 (syntax-ppss)))
(quote-length 0) sub kind replacement)
(goto-char start)
(setq sub (buffer-substring start (progn (forward-sexp) (point)))
kind (aref sub 0))
(while (char-equal kind (aref sub 0))
(setq sub (substring sub 1)
quote-length (1+ quote-length)))
(setq sub (substring sub 0 (- (length sub) quote-length)))
(goto-char start)
(delete-region start (+ start (* 2 quote-length) (length sub)))
(setq kind (if (char-equal kind ?\") ?\' ?\"))
(loop for i from 0
for c across sub
for slash = (char-equal c ?\\)
then (if (and (not slash) (char-equal c ?\\)) t nil) do
(unless slash
(when (member c '(?\" ?\'))
(aset sub i
(if (char-equal kind ?\") ?\' ?\")))))
(setq replacement (make-string quote-length kind))
(insert replacement sub replacement))))
It will use syntax information from the buffer to find the quotes at the beginning of the string (that is given that the strings are quoted), and will also try to flip quotes inside the string, unless they are escaped with backslash - which looks like it could be a common case.
PS. I've just realized you also wanted it to find triple quotes, so her goes.

Here's a quick hack to get you started:
(defun toggle-quotes ()
"Toggle single quoted string to double or vice versa, and
flip the internal quotes as well. Best to run on the first
character of the string."
(interactive)
(save-excursion
(re-search-backward "[\"']")
(let* ((start (point))
(old-c (char-after start))
new-c)
(setq new-c
(case old-c
(?\" "'")
(?\' "\"")))
(setq old-c (char-to-string old-c))
(delete-char 1)
(insert new-c)
(re-search-forward old-c)
(backward-char 1)
(let ((end (point)))
(delete-char 1)
(insert new-c)
(replace-string new-c old-c nil (1+ start) end)))))
The function swaps the internal quotes to the opposite, which is close to bonus 2.

Here's something even more robust, in that it doesn't delete the whole text between the quotes (doing so prevents save-excursion from keeping the point where it was, which is a pain). Also handles (un)backslash-ing nested quotes.
(defun toggle-quotes ()
(interactive)
(let* ((beg (nth 8 (syntax-ppss)))
(orig-quote (char-after beg))
(new-quote (case orig-quote
(?\' ?\")
(?\" ?\'))))
(save-restriction
(widen)
(save-excursion
(catch 'done
(unless new-quote
(message "Not inside a string")
(throw 'done nil))
(goto-char beg)
(delete-char 1)
(insert-char new-quote)
(while t
(cond ((eobp)
(throw 'done nil))
((= (char-after) orig-quote)
(delete-char 1)
(insert-char new-quote)
(throw 'done nil))
((= (char-after) ?\\)
(forward-char 1)
(when (= (char-after) orig-quote)
(delete-char -1))
(forward-char 1))
((= (char-after) new-quote)
(insert-char ?\\)
(forward-char 1))
(t (forward-char 1)))))))))

Here's a function I made for JavaScript, might help?
function swap_str(e, r, t) {
return e = e.split(r).join("WHAK_a_SWAP"), e = e.split(t).join("WHAK_b_SWAP"), e = e.split("WHAK_a_SWAP").join(t),
e = e.split("WHAK_b_SWAP").join(r);
}
//test 1
var str = 'this is "test" of a \'test\' of swapping strings';
var manipulated = swap_str(str,"'",'"');
document.writeln(manipulated)
//test 2
manipulated = swap_str(manipulated,"'",'"');
document.writeln('<hr>'+manipulated)

How do I convert a string of hex into ASCII using elisp?

Today I received a reply to one of my emails in the form of a string of hex bytes:
"686170707920333974682068617665206120676f6f64206f6e6521"
And I was thinking of the most efficient clean way to convert the string into it's ASCII equivalent. I'll add my answer to the question but I didn't feel it was as elegant as it could have been.

Here's an iterative solution
(defun decode-hex-string (hex-string)
(let ((res nil))
(dotimes (i (/ (length hex-string) 2) (apply #'concat (reverse res)))
(let ((hex-byte (substring hex-string (* 2 i) (* 2 (+ i 1)))))
(push (format "%c" (string-to-number hex-byte 16)) res)))))
And one using loop, if you're looking to avoid side-effect operations (you may need to (require 'cl) in order to use this one):
(defun decode-hex-string (hex-string)
(apply #'concat
(loop for i from 0 to (- (/ (length hex-string) 2) 1)
for hex-byte = (substring hex-string (* 2 i) (* 2 (+ i 1)))
collect (format "%c" (string-to-number hex-byte 16)))))
In general, it's best to avoid recursion in Elisp and Common Lisp; your stack is going to keel over with a big enough input, and neither language guarantees tail recursion (which you aren't using, but still). In Scheme, it's a different story.
Incidentally, Happy 39th.

For those that come here searching...
Elaborating a bit on Inaimathi's answer, here's the code to replace the selected region with the decoded hexa:
(defun decode-hex-string (hex-string)
(apply #'concat
(loop for i from 0 to (- (/ (length hex-string) 2) 1)
for hex-byte = (substring hex-string (* 2 i) (* 2 (+ i 1)))
collect (format "%c" (string-to-number hex-byte 16)))))
(defun hex-decode-region (start end)
"Decode a hex string in the selected region."
(interactive "r")
(save-excursion
(let* ((decoded-text
(decode-hex-string
(buffer-substring start end))))
(delete-region start end)
(insert decoded-text))))
(provide 'decode-hex-string)
(provide 'hex-decode-region)
Save that on a file and then M-x load-file. Or put on ~/emacs.d, or whatever. Then select the region with the hexa contents and M-x hex-decode-region. Enjoy!

If you use Magnar Sveen's dash.el list API (and you should), try:
(concat (--map (string-to-number (concat it) 16) (-partition 2 (string-to-list "686170707920333974682068617665206120676f6f64206f6e6521"))))
the solution uses Emacs functions string-to-number, string-to-list and concat, and dash.el functions -partition and anaphoric version of -map. What's good about concat is that it concatenates not only strings, but lists or vectors of characters too. We can rewrite this code using ->> threading macro. It takes the result of 1st argument, then applies it to 2nd, 3rd, etc arguments, just like Unix pipe.
(->> (string-to-list "686170707920333974682068617665206120676f6f64206f6e6521")
(-partition 2)
(--map (string-to-number (concat it) 16))
concat)

Building the answers provided by Inaimathi and
Shrein, I also added an encode function. Here is an implementation of both encode and decode, for both string and region arguments:
;; ASCII-HEX converion
(defun my/hex-decode-string (hex-string)
(let ((res nil))
(dotimes (i (/ (length hex-string) 2) (apply #'concat (reverse res)))
(let ((hex-byte (substring hex-string (* 2 i) (* 2 (+ i 1)))))
(push (format "%c" (string-to-number hex-byte 16)) res)))))
(defun my/hex-encode-string (ascii-string)
(let ((res nil))
(dotimes (i (length ascii-string) (apply #'concat (reverse res)))
(let ((ascii-char (substring ascii-string i (+ i 1))))
(push (format "%x" (string-to-char ascii-char)) res)))))
(defun my/hex-decode-region (start end)
"Decode a hex string in the selected region."
(interactive "r")
(save-excursion
(let* ((decoded-text
(my/hex-decode-string
(buffer-substring start end))))
(delete-region start end)
(insert decoded-text))))
(defun my/hex-encode-region (start end)
"Encode a hex string in the selected region."
(interactive "r")
(save-excursion
(let* ((encoded-text
(my/hex-encode-string
(buffer-substring start end))))
(delete-region start end)
(insert encoded-text))))

Here's mine. I'm not claiming this is particularly idiomatic or elegant, either. Maybe a bit old-skool.
(defun hex-string-decode (str)
"Decode STR of the form \"4153434949\" to corresponding \"ASCII\"."
(let (decoded sub)
(while (> (length str) 0)
(setq sub (substring str 0 2)
decoded (cons (string-to-number sub 16) decoded)
str (substring str 2) ) )
(when (not (zerop (length str))) (error "residue %s" str))
(mapconcat #'char-to-string (nreverse decoded) "") ) )

At first I didn't see a requirement that it must be Elisp, so I did it interactively and the code below follows my interactive procedure.
(defun decode-hex-string (hex-string)
(with-temp-buffer
(insert-char 32 (/ (length hex-string) 2))
(beginning-of-buffer)
(hexl-mode)
(hexl-insert-hex-string hex-string 1)
(hexl-mode-exit)
(buffer-string)))

This was the solution I came up with which struck me as a bit ugly:
(defun decode-hex-string(string)
"Decode a hex string into ASCII"
(let* ((hex-byte (substring string 0 2))
(rest (substring string 2))
(rest-as-string (if (> (length rest) 2)
(decode-hex-string rest)
"")))
(format "%c%s" (string-to-number hex-byte 16) rest-as-string)))

tweak-text: Lisp nesting exceeds `max-lisp-eval-depth

The program should reformat the string like below.
Example: (game-print '(THIS IS A SENTENCE。 WHAT ABOUT THIS？ PROBABLY.))
This is a sentence. What about ths? Probably.
But something is wrong( Lisp nesting exceeds `max-lisp-eval-depth) and i don't know why. This piece of code is actually from the book "Land of lisp" in page 97. The original code is written in common lisp. I want to rewrite it in elisp. The last two argument in tweak-text means captain and literal.
(defun tweak-text (lst caps lit)
(when lst
(let ((item (car lst))
(rest (cdr lst)))
(cond ((eql item ?\ ) (cons item (tweak-text rest caps lit)))
((member item '(?\! ?\? ?\.)) (cons item (tweak-text rest t lit)))
((eql item ?\") (tweak-text rest caps (not lit)))
(lit (cons item (tweak-text rest nil lit)))
(caps (cons (upcase item) (tweak-text rest nil lit)))
(t (cons (downcase item) (tweak-text rest nil nil)))))))
(defun game-print (lst)
(print (coerce (tweak-text (coerce (prin1-to-string lst) 'list) t nil) 'string)))
(game-print '(not only does this sentence have a "comma," it also mentions the "iPad."))
The orignal code written in common lisp.
(defun tweak-text (lst caps lit)
(when lst
(let ((item (car lst))
(rest (cdr lst)))
(cond ((eql item #\space) (cons item (tweak-text rest caps lit)))
((member item '(#\! #\? #\.)) (cons item (tweak-text rest t lit)))
((eql item #\") (tweak-text rest caps (not lit)))
(lit (cons item (tweak-text rest nil lit)))
(caps (cons (char-upcase item) (tweak-text rest nil lit)))
(t (cons (char-downcase item) (tweak-text rest nil nil)))))))
(defun game-print (lst)
(princ (coerce (tweak-text (coerce (string-trim "() " (prin1-to-string lst)) 'list) t nil) 'string))
(fresh-line))

In both cases, you have non-terminal recursions, so you're using
O(length(lst)) stack space. Obviously, systems may limit the stack
space you can use, and you do indeed reach this limit in emacs. (Now
then in emacs, you can increase the limit by changing
max-lisp-eval-depth, but this won't solve the fundamental problem).
The solution is to use iteration instead of recursion.
But first, write in emacs:
(defun character (x)
"common-lisp: return the character designated by X."
(etypecase x
(integer x)
(string (aref x 0))
(symbol (aref (symbol-name x) 0))))
(defun string-trim (character-bag string-designator)
"common-lisp: returns a substring of string, with all characters in \
character-bag stripped off the beginning and end."
(unless (sequencep character-bag)
(signal 'type-error "expected a sequence for `character-bag'."))
(let* ((string (string* string-designator))
(margin (format "[%s]*" (regexp-quote
(if (stringp character-bag)
character-bag
(map 'string 'identity character-bag)))))
(trimer (format "\\`%s\\(\\(.\\|\n\\)*?\\)%s\\'" margin margin)))
(replace-regexp-in-string trimer "\\1" string)))
(require 'cl)
so that you can write a single function for both CL and elisp:
(defun tweak-text (list caps lit)
(let ((result '()))
(dolist (item list (nreverse result))
(cond ((find item " !?.") (push item result))
((eql item (character "\"")) (setf lit (not lit)))
(lit (push item result)
(setf caps nil))
(caps (push (char-upcase item) result)
(setf caps nil))
(t (push (char-downcase item) result)
(setf caps nil
lit nil))))))
(defun game-print (list)
(princ (coerce (tweak-text (coerce (string-trim "() " (prin1-to-string list)) 'list)
t nil)
'string))
(terpri))
Then:
(game-print '(not only does this sentence have a "comma," it also mentions the "iPad."))
in emacs:
prints: Not only does this sentence have a comma, it also mentions the iPad.
returns: t
in Common Lisp:
prints: Not only does this sentence have a comma, it also mentions the iPad.
returns: nil
Now, in general there's little point of using lists to process strings, both emacs lisp and Common Lisp have powerful primitives to deal with sequences and strings directly.

Note that elisp (sadly) does not optimise for tail-recursion, so that turns out to be a very inefficient way to write this function.

You are indeed hitting the 'max-lisp-eval-depth' limit when recursing in tweak-text. I don't see anything wrong with the way the code is(I didn't check if its doing what you intend it to do).
You can configure/raise the 'max-lisp-eval-depth' limit. The documentation for that variable states that you can raise it as long as you are confident that you are not going to trip into running out of stack space. The limit is conservatively set to 541 on my machine. Raising it to 600 got the function definition above to work on the input you gave as example.

As Pascal Bourguignon already mentioned it, using strings w/o coercing them to lists and back would be a better approach, below is my take at it. Note that it is slightly different in that literal strings are verified for punctuation, and if they appear to have punctuation such as would cause it otherwise to have the next letter upper-cased, then it would be upper cased too. I'm not sure this is a disadvantage, this is why I didn't take care of this difference.
(defun tweak-text (source)
(let ((i 0) (separator "") (cap t) current)
(with-output-to-string
(dolist (i source)
(setq current
(concat separator
(etypecase i
(string i)
(symbol (downcase (symbol-name i)))))
separator " ")
(let (current-char)
(dotimes (j (length current))
(setq current-char (aref current j))
(cond
((position current-char " \t\n\r"))
(cap (setq cap nil
current-char (upcase current-char)))
((position current-char ".?!")
(setq cap t)))
(princ (char-to-string current-char))))))))
(tweak-text '(not only does this sentence have a "comma," it also mentions the "iPad."))
"Not only does this sentence have a comma, it also mentions the iPad."

I think you should write something like this:
(defun tweak-text-wrapper (&rest args)
(let ((max-lisp-eval-depth 9001)) ; as much as you want
(apply tweak-text args)))

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Using ispell/aspell to spell check camelcased words - emacs

Related

How to extract XML processing instructions in Emacs Lisp?

Destructuring bind for regex matches

Is there a way to toggle a string between single and double quotes in emacs?

How do I convert a string of hex into ASCII using elisp?

tweak-text: Lisp nesting exceeds `max-lisp-eval-depth

Categories

Resources