How to replace string in a file with lisp? - lisp

What's the lisp way of replacing a string in a file.
There is a file identified by *file-path*, a search string *search-term* and a replacement string *replace-term*.
How to make file with all instances of *search-term*s replaced with *replace-term*s, preferably in place of the old file?

One more take at the problem, but few warnings first:
To make this really robust and usable in the real-life situation you would need to wrap this into handler-case and handle various errors, like insufficient disc space, device not ready, insufficient permission for reading / writing, insufficient memory to allocate for the buffer and so on.
This does not do regular expression-like replacement, it's simple string replacement. Making a regular expression based replacement on large files may appear far less trivial than it looks like from the start, it would be worth writing a separate program, something like sed or awk or an entire language, like Perl or awk ;)
Unlike other solutions it will create a temporary file near the file being replaced and will save the data processed so far into this file. This may be worse in the sense that it will use more disc space, but this is safer because in case the program fails in the middle, the original file will remain intact, more than that, with some more effort you could later resume replacing from the temporary file if, for example, you were saving the offset into the original file in the temporary file too.
(defun file-replace-string (search-for replace-with file
&key (element-type 'base-char)
(temp-suffix ".tmp"))
(with-open-file (open-stream
file
:direction :input
:if-exists :supersede
:element-type element-type)
(with-open-file (temp-stream
(concatenate 'string file temp-suffix)
:direction :output
:element-type element-type)
(do ((buffer (make-string (length search-for)))
(buffer-fill-pointer 0)
(next-matching-char (aref search-for 0))
(in-char (read-char open-stream nil :eof)
(read-char open-stream nil :eof)))
((eql in-char :eof)
(when (/= 0 buffer-fill-pointer)
(dotimes (i buffer-fill-pointer)
(write-char (aref buffer i) temp-stream))))
(if (char= in-char next-matching-char)
(progn
(setf (aref buffer buffer-fill-pointer) in-char
buffer-fill-pointer (1+ buffer-fill-pointer))
(when (= buffer-fill-pointer (length search-for))
(dotimes (i (length replace-with))
(write-char (aref replace-with i) temp-stream))
(setf buffer-fill-pointer 0)))
(progn
(dotimes (i buffer-fill-pointer)
(write-char (aref buffer i) temp-stream))
(write-char in-char temp-stream)
(setf buffer-fill-pointer 0)))
(setf next-matching-char (aref search-for buffer-fill-pointer)))))
(delete-file file)
(rename-file (concatenate 'string file temp-suffix) file))

It can be accomplished in many ways, for example with regexes. The most self-contained way I see is something like the following:
(defun replace-in-file (search-term file-path replace-term)
(let ((contents (rutil:read-file file-path)))
(with-open-file (out file-path :direction :output :if-exists :supersede)
(do* ((start 0 (+ pos (length search-term)))
(pos (search search-term contents)
(search search-term contents :start2 start)))
((null pos) (write-string (subseq contents start) out))
(format out "~A~A" (subseq contents start pos) replace-term))))
(values))
See the implementation of rutil:read-file here: https://github.com/vseloved/rutils/blob/master/core/string.lisp#L33
Also note, that this function will replace search terms with any characters, including newlines.

in chicken scheme with the ireggex egg:
(use irregex) ; irregex, the regular expression library, is one of the
; libraries included with CHICKEN.
(define (process-line line re rplc)
(irregex-replace/all re line rplc))
(define (quickrep re rplc)
(let ((line (read-line)))
(if (not (eof-object? line))
(begin
(display (process-line line re rplc))
(newline)
(quickrep re rplc)))))
(define (main args)
(quickrep (irregex (car args)) (cadr args)))
Edit: in the above example buffering the input doesn't permit the regexp to span over
many lines.
To counter that here is an even simpler implementation which scans the whole file as one string:
(use ireggex)
(use utils)
(define (process-line line re rplc)
(irregex-replace/all re line rplc))
(define (quickrep re rplc file)
(let ((line (read-all file)))
(display (process-line line re rplc))))
(define (main args)
(quickrep (irregex (car args)) (cadr args) (caddr args)))

Related

How to extract XML processing instructions in Emacs Lisp?

I would like to extract the processing instructions (particularly xml-model) from an XML file; yet both (n)xml-parse-file as well as libxml-parse-xml-region do not recognize processing instructions.
Is there a clean way to extract processing instructions or do I have to regex search for PIs?
edit: Here is a first draft of the functionality I was looking for:
(cl-defun extract-processing-instructions (&rest processing-instructions)
"Extracts all/only the specified xml processing instructions from the current buffer and returns them as a list of string."
(interactive)
(let ((pi-re
(format "<\\?\\(%s\\).*\\?>" (string-join processing-instructions "\\|")))
(result))
(save-excursion
(goto-char (point-min))
(while (re-search-forward pi-re nil t)
(push (match-string 0) result)))
(nreverse result)))
(cl-defun pi-str2sexp (pi-str)
"Takes a processing instruction as a string and transforms it to a sexp-structure (in the style of xml-parse-*)."
(let (sexp attr-alist)
(save-match-data
;; get and push pi-element-name
;; (string-match "<\\?\\([[:alnum:]-]*\\)" pi-str)
(string-match "<\\?\\([[:alnum:]-]*\\)" pi-str)
(push (make-symbol (match-string 1 pi-str)) sexp)
;; construct attribute alist
(while (string-match "\\([[:alnum:]-]*\\)=\"\\([^ ]*\\)\""
pi-str (match-end 0))
(push (cons (make-symbol (match-string 1 pi-str))
(match-string 2 pi-str))
attr-alist)))
;; finally: push attr alist and return sexp
(push (nreverse attr-alist) sexp)
(nreverse sexp)))
edit 2: Turns out advicing/generally building upon xml-parse-* in this matter (like suggested by #Tom Regner) is a huge pain. :(
The thing I came up with was a context manager, the idea was to use it to around-advice string-parse-tag-1 (which is at the heart of xml-parse-* (of course stand-alone use is also an option):
(cl-defmacro --replace-first-group (regex-replace-alist)
`(save-excursion
(dolist (expression ,regex-replace-alist)
(goto-char (point-min))
(replace-regexp (car expression) (cadr expression)))))
(cl-defmacro with-parsable-pi (buffer &body body)
"Context manager that treats xml processing instructions in BUFFER as normal elements."
(declare (indent defun))
`(let ((old-buffer ,buffer))
(with-temp-buffer
(insert-buffer-substring old-buffer)
(goto-char (point-min))
(--replace-first-group '(("\\(\\?\\)>" "/>") ("<\\(\\?\\)" "<")))
,#body)))
This e.g. allows calls like
(with-parsable-pi (current-buffer)
(xml-parse-tag-1))
so it is at least possible to get an element at a time; but since the XML exposed in the context manager isn't actually valid and xml-parse-* (rightfully) errors if invalid XML is encountered, it isn't possible to process more than one element at a time.
I was thinking of maybe introducing a pseudo root element or something, but the kludge spiral is ghastly enough as it is.
Another idea of course would be to run an xpath query to extract processing instructions. If there only was a solid xpath solution in Emacs Lisp..
Ok, I think I found a satisfactory solution: xmltok-forward-prolog!
So here is the code I came up with for extracting processing instructions:
(cl-defun filter-xmltok-prolog (&optional (buffer (current-buffer))
(filter-re "processing-instruction-.*"))
"Filters the output of xmltok-forward-prolog (i.e. index 0 ('type') of each array) run in the context of BUFFER against FILTER-RE. Returns a list of vectors."
(with-current-buffer buffer
(save-excursion
(goto-char (point-min))
(let ((raw-prolog-data (xmltok-forward-prolog)))
(seq-filter
#'(lambda (x)
(string-match filter-re (symbol-name (aref x 0))))
raw-prolog-data)))))
(cl-defun --merge-pi-data (pi-data)
"Meant to operate on data filtered with filter-xmltok-prolog against 'processing-instruction-.*'.
Merges processing-instruction-left/-right and returns a list of vectors holding the start/end coordinates of a processing instruction at index 1 and 2."
(let ((left (car pi-data))
(right (cadr pi-data)))
(cond
((null pi-data) nil)
(t (cons
(vector 'processing-instruction
(aref left 1) (aref right 2))
(--merge-pi-data (cddr pi-data)))))))
;; test
(--merge-pi-data '([processing-instruction-left 40 51] [processing-instruction-right 52 126]))
(cl-defun pi-str2s-exp (pi-str)
"Takes a processing instruction as a string and transforms it into a sexp structure (in the style of xml-parse-*)."
(let (sexp attr-alist)
(save-match-data
;; get and push pi-element-name
(string-match "<\\?\\([[:alnum:]-]*\\)" pi-str)
(push (make-symbol (match-string 1 pi-str)) sexp)
;; construct attribute alist
(while (string-match "\\([[:alnum:]-]*\\)=\"\\([^ ]*\\)\""
pi-str (match-end 0))
(push (cons (make-symbol (match-string 1 pi-str))
(match-string 2 pi-str))
attr-alist)))
;; finally: push attr alist and return sexp
(push (nreverse attr-alist) sexp)
(nreverse sexp)))
(cl-defun get-processing-instructions (&optional (buffer (current-buffer)))
"Extracts processing instructions from BUFFER and returns a list of sexp representations in the style of xml-parse-*."
(save-excursion
(mapcar #'pi-str2s-exp
(mapcar #'(lambda (v)
(buffer-substring (aref v 1) (aref v 2)))
(--merge-pi-data (filter-xmltok-prolog buffer))))))
(cl-defun test/get-pis-from-file (file)
(with-temp-buffer
(insert-file-contents file)
(get-processing-instructions)))
(test/get-pis-from-file "~/some/xml/file.xml")
I'm not at all an Emacs Lisp expert and this isn't at all tested thoroughly, but it works for now! :)

How to create and write to file without overwrite (just adding) in clisp

i want to use if-exits but dont know how to,add gives error,
-when i try overwrite it changes the file
(defun writeToFile (filename content)
(with-open-file (stream filename :external-format charset:iso-8859-1
:direction :output
;if-exists :add
:if-does-not-exist :create )
(format stream content)
(terpri stream)))
(loop for i from x to y
do (if (= (is_me i) 0)
(format t "i = ~d ~%" i)
(writeToFile "/home/out.txt"
(concatenate 'string (write-to-string i) " is me" )))
do (if (ime i)
(format t "~d IS ME~%" i)
(writeToFile "/home/out.txt"
(concatenate 'string (write-to-string i) " is me" ))))
Quick answer, you need to use :if-exists :append.
The Common Lisp HyperSpec has the following to say about open:
if-exists---one of :error, :new-version, :rename, :rename-and-delete, :overwrite, :append, :supersede, or nil. The default is :new-version if the version component of filespec is :newest, or :error otherwise.
And if we look at what it says about :append:
:append
Output operations on the stream destructively modify the existing file. The file pointer is initially positioned at the end of the file. If direction is :io, the file is opened in a bidirectional mode that allows both reading and writing.

elisp: read file into list of lists

I need to read a file contents into a two-dimensional list, split by newlines and spaces. For example,
a b
c d
needs to become
(list (list "a" "b") (list "c" "d"))
Currently I only know how to read the contents into a simple list determined by newlines. Whenever I need to use an element from that list, I have to split it by spaces everytime, but preferably this should be done only once beforehand.
While abo-abo's answer above is fine, it creates a temporary string with the full contents of the file, which is inefficient. If the file is very large, it is better to walk a buffer collecting data line-by-line:
(defun file-to-matrix (filename)
(with-temp-buffer
(insert-file-contents filename)
(let ((list '()))
(while (not (eobp))
(let ((beg (point)))
(move-end-of-line nil)
(push (split-string (buffer-substring beg (point)) " ") list)
(forward-char)))
(nreverse list))))
Note the use of with-temp-buffer, which avoids leaving a buffer lying around, and the use of insert-file-contents, which avoids interfering with any other buffer that might be visiting the same file.
Like this:
(with-current-buffer (find-file-noselect "~/foo")
(mapcar (lambda (x) (split-string x " " t))
(split-string
(buffer-substring-no-properties (point-min) (point-max))
"\n")))
With dash, s and f third-party libraries:
(--map (s-split " " it) (s-lines (s-chomp (f-read "FILE.TXT"))))
or:
(->> "FILE.TXT" f-read s-chomp s-lines (--map (s-split " " it)))
which are the same thing.

How do I code walk in elisp?

I'm trying to open a file and read through the sexps. If the form has setq in its first position then traverse the rest of the form adding the in the setq form to an alist.
;;; File passwords.el.gpg
(setq twitter-password "Secret"
github-password "Sauce")
My goal is to able to construct an alist from the pairs in the setq forms in teh file. How I even start?
First, I second the recommendation that you store the passwords in an actual alist and, if necessary, set whatever variables you need to based on that.
That aside, here's another solution that tries to break things out a bit. The -partition function is from the dash.el library, which I highly recommend.
You don't really need to "walk" the code, just read it in and check if its car is setq. The remainder of the form should then be alternating symbols and strings, so you simply partition them by 2 and you have your alist. (Note that the "pairs" will be proper lists as opposed to the dotted pairs in Sean's solution).
(defun setq-form-p (form)
(eq (car form) 'setq))
(defun read-file (filename)
(with-temp-buffer
(insert-file-literally filename)
(read (buffer-substring-no-properties 1 (point-max)))))
(defun credential-pairs (form)
(-partition 2 (cdr form)))
(defun read-credentials-alist (filename)
(let ((form (read-file filename)))
(credential-pairs form)))
;; usage:
(read-credentials-alist "passwords.el")
Alternatively, here's how it would work if you already had the passwords in an alist, like so
(defvar *passwords*
'((twitter-password "Secret")
(github-password "Sauce")))
And then wanted to set the variable twitter-password to "Sauce" and so on. You would just map over it:
(mapcar #'(lambda (pair)
(let ((name (car pair))
(value (cadr pair)))
(set name value)))
*passwords*)
You can use streams to read in the files (read-from-string) and then do the usual elisp hacking. The below isn't robust, but you get the idea. On a file, pwd.el that has your file, it returns the alist ((github-password . "Sauce") (twitter-password . "Secret"))
(defun readit (file)
"Read file. If it has the form (sexp [VAR VALUE]+), return
an alist of the form ((VAR . VALUE) ...)"
(let* (alist
(sexp-len
(with-temp-buffer
(insert-file-contents file)
(read-from-string (buffer-substring 1 (buffer-size)))))
(sexp (car sexp-len)))
(when (equal (car sexp) 'setq)
(setq sexp (cdr sexp))
(while sexp
(let* ((l (car sexp))
(r (cadr sexp)))
(setq alist (cons (cons l r) alist)
sexp (cddr sexp)))))
alist))
(readit "pwd.el")

Using ispell/aspell to spell check camelcased words

I need to spell check a large document containing many camelcased words. I want ispell or aspell to check if the individual words are spelled correctly.
So, in case of this word:
ScientificProgrezGoesBoink
I would love to have it suggest this instead:
ScientificProgressGoesBoink
Is there any way to do this? (And I mean, while running it on an Emacs buffer.) Note that I don't necessarily want it to suggest the complete alternative. However, if it understands that Progrez is not recognized, I would love to be able to replace that part at least, or add that word to my private dictionary, rather than including every camel-cased word into the dictionary.
I took #phils suggestions and dug around a little deeper. It turns out that if you get camelCase-mode and reconfigure some of ispell like this:
(defun ispell-get-word (following)
(when following
(camelCase-forward-word 1))
(let* ((start (progn (camelCase-backward-word 1)
(point)))
(end (progn (camelCase-forward-word 1)
(point))))
(list (buffer-substring-no-properties start end)
start end)))
then, in that case, individual camel cased words suchAsThisOne will actually be spell-checked correctly. (Unless you're at the beginning of a document -- I just found out.)
So this clearly isn't the fullblown solution, but at least it's something.
There is "--run-together" option in aspell. Hunspell can't check camelcased word.
If you read the code of aspell, you will find its algorithm actually does not split camelcase word into a list of sub-words. Maybe this algorithm is faster, but it will wrongly report word containing two character sub-word as typo. Don't waste time to tweak other aspell options. I tried and they didn't work.
So we got two problems:
aspell reports SOME camelcased words as typos
hunspell reports ALL camelcased words as typos
Solution to solve BOTH problems is to write our own predicate in Emacs Lisp.
Here is a sample predicate written for javascript:
(defun split-camel-case (word)
"Split camel case WORD into a list of strings.
Ported from 'https://github.com/fatih/camelcase/blob/master/camelcase.go'."
(let* ((case-fold-search nil)
(len (length word))
;; ten sub-words is enough
(runes [nil nil nil nil nil nil nil nil nil nil])
(runes-length 0)
(i 0)
ch
(last-class 0)
(class 0)
rlt)
;; split into fields based on class of character
(while (< i len)
(setq ch (elt word i))
(cond
;; lower case
((and (>= ch ?a) (<= ch ?z))
(setq class 1))
;; upper case
((and (>= ch ?A) (<= ch ?Z))
(setq class 2))
((and (>= ch ?0) (<= ch ?9))
(setq class 3))
(t
(setq class 4)))
(cond
((= class last-class)
(aset runes
(1- runes-length)
(concat (aref runes (1- runes-length)) (char-to-string ch))))
(t
(aset runes runes-length (char-to-string ch))
(setq runes-length (1+ runes-length))))
(setq last-class class)
;; end of while
(setq i (1+ i)))
;; handle upper case -> lower case sequences, e.g.
;; "PDFL", "oader" -> "PDF", "Loader"
(setq i 0)
(while (< i (1- runes-length))
(let* ((ch-first (aref (aref runes i) 0))
(ch-second (aref (aref runes (1+ i)) 0)))
(when (and (and (>= ch-first ?A) (<= ch-first ?Z))
(and (>= ch-second ?a) (<= ch-second ?z)))
(aset runes (1+ i) (concat (substring (aref runes i) -1) (aref runes (1+ i))))
(aset runes i (substring (aref runes i) 0 -1))))
(setq i (1+ i)))
;; construct final result
(setq i 0)
(while (< i runes-length)
(when (> (length (aref runes i)) 0)
(setq rlt (add-to-list 'rlt (aref runes i) t)))
(setq i (1+ i)))
rlt))
(defun flyspell-detect-ispell-args (&optional run-together)
"If RUN-TOGETHER is true, spell check the CamelCase words.
Please note RUN-TOGETHER will make aspell less capable. So it should only be used in prog-mode-hook."
;; force the English dictionary, support Camel Case spelling check (tested with aspell 0.6)
(let* ((args (list "--sug-mode=ultra" "--lang=en_US"))args)
(if run-together
(setq args (append args '("--run-together" "--run-together-limit=16"))))
args))
;; {{ for aspell only, hunspell does not need setup `ispell-extra-args'
(setq ispell-program-name "aspell")
(setq-default ispell-extra-args (flyspell-detect-ispell-args t))
;; }}
;; ;; {{ hunspell setup, please note we use dictionary "en_US" here
;; (setq ispell-program-name "hunspell")
;; (setq ispell-local-dictionary "en_US")
;; (setq ispell-local-dictionary-alist
;; '(("en_US" "[[:alpha:]]" "[^[:alpha:]]" "[']" nil ("-d" "en_US") nil utf-8)))
;; ;; }}
(defvar extra-flyspell-predicate '(lambda (word) t)
"A callback to check WORD. Return t if WORD is typo.")
(defun my-flyspell-predicate (word)
"Use aspell to check WORD. If it's typo return t."
(let* ((cmd (cond
;; aspell: `echo "helle world" | aspell pipe`
((string-match-p "aspell$" ispell-program-name)
(format "echo \"%s\" | %s pipe"
word
ispell-program-name))
;; hunspell: `echo "helle world" | hunspell -a -d en_US`
(t
(format "echo \"%s\" | %s -a -d en_US"
word
ispell-program-name))))
(cmd-output (shell-command-to-string cmd))
rlt)
;; (message "word=%s cmd=%s" word cmd)
;; (message "cmd-output=%s" cmd-output)
(cond
((string-match-p "^&" cmd-output)
;; it's a typo because at least one sub-word is typo
(setq rlt t))
(t
;; not a typo
(setq rlt nil)))
rlt))
(defun js-flyspell-verify ()
(let* ((case-fold-search nil)
(font-matched (memq (get-text-property (- (point) 1) 'face)
'(js2-function-call
js2-function-param
js2-object-property
js2-object-property-access
font-lock-variable-name-face
font-lock-string-face
font-lock-function-name-face
font-lock-builtin-face
rjsx-text
rjsx-tag
rjsx-attr)))
subwords
word
(rlt t))
(cond
((not font-matched)
(setq rlt nil))
;; ignore two character word
((< (length (setq word (thing-at-point 'word))) 2)
(setq rlt nil))
;; handle camel case word
((and (setq subwords (split-camel-case word)) (> (length subwords) 1))
(let* ((s (mapconcat (lambda (w)
(cond
;; sub-word wholse length is less than three
((< (length w) 3)
"")
;; special characters
((not (string-match-p "^[a-zA-Z]*$" w))
"")
(t
w))) subwords " ")))
(setq rlt (my-flyspell-predicate s))))
(t
(setq rlt (funcall extra-flyspell-predicate word))))
rlt))
(put 'js2-mode 'flyspell-mode-predicate 'js-flyspell-verify)
Or just use my new pacakge https://github.com/redguardtoo/wucuo
You should parse the camel cased words and split them, then check the individual spelling for each one and assemble a suggestion taking into account the single suggestion for each misspelled token. Considering that each misspelled token can have multiple suggestions this sounds a bit inefficient to me.