elisp implementation of the "uniq -c" Unix command to count unique lines - emacs

If the region contains this data:
flower
park
flower
stone
flower
stone
stone
flower
M-x some-command should give me, in a different buffer:
4 flower
3 stone
1 park
This data can then be sorted by frequency or item.

I suppose a common method would be to just hash the strings and then print the contents. This is easy to do in Emacs.
;; See the emacs manual for creating a hash table test
;; https://www.gnu.org/software/emacs/manual/html_node/elisp/Defining-Hash.html
(defun case-fold-string= (a b)
  (eq t (compare-strings a nil nil b nil nil t)))
(defun case-fold-string-hash (a)
  (sxhash (upcase a)))
(define-hash-table-test 'case-fold
  'case-fold-string= 'case-fold-string-hash)
(defun uniq (beg end)
  "Print counts of strings in region."
  (interactive "r")
  (let ((h (make-hash-table :test 'case-fold))
        (lst (split-string (buffer-substring-no-properties beg end) "\n"
                           'omit-nulls " "))
        (output-func (if current-prefix-arg 'insert 'princ)))
    (dolist (str lst)
      (puthash str (1+ (gethash str h 0)) h))
    (maphash (lambda (key val)
               (apply output-func (list (format "%d: %s\n" val key))))
             h)))
Output when selecting that text
4: flower
1: park
3: stone
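Since the hash table prints its entries in no particular order, the "sorted by frequency" part of the question still needs a step. A minimal sketch of one way to do it, assuming the lines are already available as a list of strings; the name my-count-lines-sorted is made up for illustration, and unlike the answer above it compares strings case-sensitively:

```elisp
(defun my-count-lines-sorted (lines)
  "Return an alist of (LINE . COUNT) for LINES, most frequent first.
Ties are broken alphabetically by line."
  (let ((counts (make-hash-table :test 'equal))
        (result nil))
    ;; Count each distinct line.
    (dolist (line lines)
      (puthash line (1+ (gethash line counts 0)) counts))
    ;; Copy the table into an alist so it can be sorted.
    (maphash (lambda (key val) (push (cons key val) result)) counts)
    (sort result
          (lambda (a b)
            (or (> (cdr a) (cdr b))
                (and (= (cdr a) (cdr b))
                     (string< (car a) (car b))))))))
```

For the sample data this should return (("flower" . 4) ("stone" . 3) ("park" . 1)).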

I suppose there are lots of approaches you could take to this. Here's a fairly simple approach:
(defun uniq-c (beginning end)
  "Like M-| uniq -c"
  (interactive "r")
  (let ((source (current-buffer))
        (dest (generate-new-buffer "*uniq-c*"))
        (case-fold-search nil))
    (set-buffer dest)
    (insert-buffer-substring source beginning end)
    (goto-char (point-min))
    (while (let* ((line (buffer-substring (line-beginning-position)
                                          (line-end-position)))
                  (pattern (concat "^" (regexp-quote line) "$"))
                  (count (count-matches pattern (point) (point-max))))
             (insert (format "%d " count))
             (forward-line 1)
             (flush-lines pattern)
             (not (eobp))))
    (pop-to-buffer dest)))

It is similar to uniq -c in bash.
Then why not use uniq -c?
With the region highlighted, M-| "sort | uniq -c" will run that command on the current region. Short results show in the echo area (and are logged in the *Messages* buffer); longer output goes to the *Shell Command Output* buffer. Adding a prefix arg (C-u M-|) replaces the region with the results in the current buffer.

Related

Split lines of current paragraph in Emacs

I want to add a function (para2lines) to Emacs by which I can split the current paragraph into its sentences and print them line by line in a separate buffer. Following is code in Racket/Scheme:
(define (p2l paraString)
  (define lst (string-split paraString ". "))
  (for ((i lst))
    (displayln i)))
Testing:
(p2l "This is a test. For checking only. Only three lines.")
Output:
This is a test.
For checking only.
Only three lines.
In Emacs Lisp, I could manage following code:
(defun pl (ss)
  (interactive)
  (let ((lst (split-string (ss))))
    (while lst
      (print (pop lst)))))
But I do not know how to get the text from the paragraph with current position. How can I correct this function?
Edit: basically, I want to read it as separate lines but want to save it as paragraph.
Here's an example that might help you on your way. It converts the current paragraph (i.e. the one containing the cursor) in place, rather than writing to a new buffer. You could modify it to pass a string to your function if that's what you require.
(defun p2l ()
  "Format current paragraph into single lines."
  (interactive "*")
  (save-excursion
    (forward-paragraph)
    (let ((foo (point)))
      (backward-paragraph)
      (replace-regexp "\n" " " nil (1+ (point)) foo)
      (backward-paragraph)
      (replace-regexp "\\. ?" ".\n" nil (point) foo))))
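If a string-based version is what you need, closer to the Racket p2l in the question, something along these lines might work. It is only a sketch: the names my-split-sentences and my-current-paragraph-sentences are invented here, and a sentence is naively assumed to end at a period followed by a space.

```elisp
(defun my-split-sentences (text)
  "Split TEXT into a list of sentences, keeping each final period.
Newlines and runs of whitespace inside TEXT are treated as one space."
  (let ((flat (replace-regexp-in-string "[ \t\n]+" " " text)))
    ;; Turn every \". \" into a line break, then split on the breaks.
    (split-string (replace-regexp-in-string "\\. " ".\n" flat)
                  "\n" t "[ \t]+")))

(defun my-current-paragraph-sentences ()
  "Return the sentences of the paragraph at point as a list of strings."
  (let ((beg (save-excursion (backward-paragraph) (point)))
        (end (save-excursion (forward-paragraph) (point))))
    (my-split-sentences (buffer-substring-no-properties beg end))))
```

The buffer is left untouched, so the text stays saved as a paragraph while the returned list can be printed one sentence per line elsewhere.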
I would just run Emacs commands or write a macro to convert a paragraph to single-sentence lines, but maybe you are really just wanting to read wrapped paragraphs as lines, thus the need to have an Emacs command.
Here's something that will grab the current paragraph, insert it into a new buffer *Lines*, and then convert sentences to lines.
(defun para-lines ()
  "Split sentences of paragraph to lines in new buffer."
  (interactive)
  ;; Move the paragraph to a new buffer.
  (let ((b (generate-new-buffer "*Lines*")))
    (with-output-to-temp-buffer b
      (let ((beg (save-excursion (forward-paragraph -1) (point)))
            (end (save-excursion (forward-paragraph +1) (point))))
        (princ (buffer-substring-no-properties beg end))))
    ;; Switch to new buffer
    (with-current-buffer b
      ;; Since the name starts with "*", shut off Help Mode
      (fundamental-mode)
      ;; Make sure buffer is writable
      (setq buffer-read-only nil)
      ;; From the start of the buffer
      (goto-char (point-min))
      ;; While not at the end of the buffer
      (while (< (point) (point-max))
        (forward-sentence 1)
        ;; Delete spaces between sentences before making a new line
        (delete-horizontal-space)
        ;; Don't add a new line, if already at the end of the line
        (unless (= (line-end-position) (point))
          (newline))))))
To avoid using forward-sentence and rely on a regular expression instead, use re-search-forward. For instance, this version matches semicolons as well as periods:
(defun para-lines ()
  "Split sentences of paragraph to lines in new buffer."
  (interactive)
  ;; Move the paragraph to a new buffer.
  (let ((b (generate-new-buffer "*Lines*")))
    (with-output-to-temp-buffer b
      (let ((beg (save-excursion (forward-paragraph -1) (point)))
            (end (save-excursion (forward-paragraph +1) (point))))
        (princ (buffer-substring-no-properties beg end))))
    ;; Switch to new buffer
    (with-current-buffer b
      ;; Since the name starts with "*", shut off Help Mode
      (fundamental-mode)
      ;; Make sure buffer is writable
      (setq buffer-read-only nil)
      ;; From the start of the buffer
      (goto-char (point-min))
      ;; While not at the end of the buffer and another sentence end is found
      ;; (stopping when the search fails avoids looping forever)
      (while (and (< (point) (point-max))
                  (re-search-forward "[.;]\\s-+" nil t))
        ;; Delete spaces between sentences before making a new line
        (delete-horizontal-space)
        ;; Don't add a new line, if already at the end of the line
        (unless (= (line-end-position) (point))
          (newline))))))

How do I find and insert the average of multiple lines in Emacs / Elisp?

I have a file that looks similar to:
AT 4
AT 5.6
AT 7.2
EG 6
EG 6
S 2
OP 3
OP 1.2
OP 40
and I want to compute the average (I've just made these averages up) for each of the titles and output something like:
AT 5.42
EG 6
S 2
OP 32.1
The file is in order, so all lines with the same heading are adjacent, but the number of lines per heading varies, e.g. AT has three but S has only one.
How would I sum together each of these lines, divide by the number of lines, and then replace all of the lines in emacs / elisp?
I decided to try to solve this question while still learning elisp myself. There are perhaps more efficient ways to solve this.
After defining the function, you'll want to set the region around the scores. (If the whole file, then M-<, C-SPC, M->) I figured this would be cleanest since your scores may be in the middle of other text. My function will compute the averages and then insert the answer at the end of the region.
(defun my/averages (beg end)
  (interactive "r")
  (let ((avgs (make-hash-table :test 'equal))
        (answer "")
        (curval nil)
        (key nil)
        (val nil))
    ;; Process each line in region
    (save-excursion
      (goto-char beg)
      (while (< (point) end)
        ;; Split line into heading and value
        (let ((split-line
               (split-string
                (buffer-substring-no-properties
                 (line-beginning-position) (line-end-position)))))
          (setq
           key (car split-line)
           val (string-to-number (cadr split-line))
           ;; Each entry is (COUNT . SUM)
           curval (gethash key avgs '(0 . 0)))
          (puthash key (cons (+ (car curval) 1) (+ (cdr curval) val )) avgs))
        ;; Advance to next line
        (forward-line))
      ;; Accumulate answer string
      (maphash
       (lambda (k v)
         (setq answer
               (concat answer "\n" k " "
                       ;; `float' avoids truncating integer division
                       (number-to-string (/ (float (cdr v)) (car v))))))
       avgs)
      (end-of-line)
      (insert answer))))
As a warning, I have zero error checking for lines that do not strictly meet your formatting.
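For experimenting outside a buffer, the same bookkeeping can be sketched as a pure function over a list of lines. This is an illustrative variant rather than the function above: the name my-average-by-key is hypothetical, every line is assumed to have the form "KEY NUMBER", and the results come back in order of first appearance.

```elisp
(defun my-average-by-key (lines)
  "Return an alist of (KEY . AVERAGE) for LINES of the form \"KEY NUMBER\".
Keys appear in order of first occurrence; averages are floats."
  (let ((totals nil))                   ; list of (KEY COUNT SUM) entries
    (dolist (line lines)
      (let* ((fields (split-string line))
             (key (car fields))
             (val (string-to-number (cadr fields)))
             (entry (assoc key totals)))
        (if entry
            ;; Known key: bump the count and add to the running sum.
            (setcdr entry (list (1+ (nth 1 entry)) (+ (nth 2 entry) val)))
          (push (list key 1 val) totals))))
    (mapcar (lambda (entry)
              (cons (car entry)
                    (/ (float (nth 2 entry)) (nth 1 entry))))
            (nreverse totals))))

;; (my-average-by-key '("EG 6" "EG 7" "S 2"))
;; => (("EG" . 6.5) ("S" . 2.0))
```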
You need libraries dash, s, f, and their functions -map, -sum, -group-by, s-split, f-read-text.
(require 'dash)
(require 's)
(require 'f)
;; average
(defun avg (values)
  (/ (-sum values) (length values)))
(-map (lambda (item)
        (list (car item)
              (avg (-map (lambda (x)
                           (string-to-number (cadr x)))
                         (cdr item)))))
      (-group-by (lambda (item)
                   (car item))
                 (-map (lambda (line)
                         (s-split " " line t))
                       (s-split "[\n\r]"
                                (f-read-text "file.txt")
                                t))))
Presuming your file is called "file.txt", the code above returns (("AT" 5.6000000000000005) ("EG" 6) ("S" 2) ("OP" 14.733333333333334)).
After that you can convert that into text:
(s-join "\n"
        (-map (lambda (item)
                (s-join " "
                        (list (car item)
                              (number-to-string (cadr item)))))
              result)) ; `result' is a placeholder for the list returned above
You can write this string to a file using f-write-text. Don't forget that you can format ugly floating-point numbers like this:
(format "%.2f" 3.33333333) ; => "3.33"

How do I get all paragraphs in Emacs Lisp?

I am defining a major mode that works on paragraphs of the following nature:
: Identifier
1. some text
2. ...
3. some more text
: New Identifier
: Another Identifier
some text
I want to write a defun called get-paragraphs that will return a list that looks like:
( ("Identifier", ("1. some text", "2. ...", "3. some more text")),
("New Identifier", ()),
("Another Identifier", ("some text"))
)
How do I go about cutting up the text like this in Emacs Lisp?
Is there a function to iterate through them (and subsequently chop them up to my liking)? Should I use regular expressions? Is there an easier way?
You should iterate over the buffer and collect your text (untested):
(defun get-paragraphs ()
  (save-excursion
    (goto-char (point-min))
    (let ((ret '()))
      (while (search-forward-regexp "^: " nil t)
        (let ((header (buffer-substring-no-properties (point) (line-end-position)))
              (body '()))
          (forward-line)
          ;; Collect body lines until a blank line, the next header, or end of buffer
          (while (not (or (eobp) (looking-at "^$\\|: ")))
            (push (buffer-substring-no-properties (point) (line-end-position)) body)
            (forward-line))
          (push (cons header (list (reverse body))) ret)))
      (nreverse ret))))
Here, take this Lisp code:
(require 'cl-lib) ; provides `cl-destructuring-bind'
(defun chopchop ()
  (mapcar
   (lambda (x)
     (cl-destructuring-bind (head &rest tail)
         (split-string x "\n" t)
       (list head tail)))
   (split-string (buffer-substring-no-properties
                  (point-min)
                  (point-max)) "\n?: *" t)))
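To try the idea without a buffer, here is a rough string-based sketch that walks the lines one at a time. The name my-parse-paragraphs is invented for this example, a header is assumed to be any line starting with ": ", and empty lines are simply dropped.

```elisp
(defun my-parse-paragraphs (text)
  "Parse TEXT into a list of (HEADER BODY-LINES) entries.
A header is a line starting with \": \"; the non-empty lines that follow,
up to the next header, form its body."
  (let ((result nil)
        (header nil)
        (body nil))
    (dolist (line (split-string text "\n" t))
      (if (string-prefix-p ": " line)
          (progn
            ;; A new header closes the previous entry, if any.
            (when header
              (push (list header (nreverse body)) result))
            (setq header (substring line 2)
                  body nil))
        ;; Lines before the first header are ignored.
        (when header
          (push line body))))
    ;; Close the last entry.
    (when header
      (push (list header (nreverse body)) result))
    (nreverse result)))
```

Calling it on (buffer-substring-no-properties (point-min) (point-max)) should give the structure asked for in the question, with nil standing for an empty body.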

Has anyone used elisp as a script language? [duplicate]

In Python, you might do something like
fout = open('out','w')
fin = open('in')
for line in fin:
    fout.write(process(line)+"\n")
fin.close()
fout.close()
fout.close()
(I think it would be similar in many other languages as well).
In Emacs Lisp, would you do something like
(find-file "out")
(setq fout (current-buffer))
(find-file "in")
(setq fin (current-buffer))
(setq moreLines t)
(while moreLines
  (setq begin (point))
  (move-end-of-line 1)
  (setq line (buffer-substring-no-properties begin (point)))
  ;; maybe
  (print (process line) fout)
  ;; or
  (save-excursion
    (set-buffer fout)
    (insert (process line)))
  (setq moreLines (= 0 (forward-line 1))))
(kill-buffer fin)
(kill-buffer fout)
for which I got inspiration (and code) from Emacs Lisp: Process a File line-by-line. Or should I try something entirely different? And how do I remove the "" from the print statement?
If you actually want batch processing of stdin and sending the result to stdout, you can use the --script command line option to Emacs, which will enable you to write code that reads from stdin and writes to stdout and stderr.
Here is an example program which is like cat, except that it reverses each line:
#!/usr/local/bin/emacs --script
;;-*- mode: emacs-lisp;-*-
(defun process (string)
  "just reverse the string"
  (concat (nreverse (string-to-list string))))
(condition-case nil
    (let (line)
      ;; commented out b/c not relevant for `cat`, but potentially useful
      ;; (princ "argv is ")
      ;; (princ argv)
      ;; (princ "\n")
      ;; (princ "command-line-args is" )
      ;; (princ command-line-args)
      ;; (princ "\n")
      (while (setq line (read-from-minibuffer ""))
        (princ (process line))
        (princ "\n")))
  (error nil))
Now, if you had a file named stuff.txt which contained
abcd
1234
xyz
And you invoked the shell script written above like so (assuming it is named rcat):
rcat < stuff.txt
you will see the following printed to stdout:
dcba
4321
zyx
So, contrary to popular belief, you can actually do batch file processing on stdin and not actually have to read the entire file in at once.
Here's what I came up with. Looks a lot more idiomatic to me:
(with-temp-buffer
  (let ((dest-buffer (current-buffer)))
    (with-temp-buffer
      (insert-file-contents "/path/to/source/file")
      (while (search-forward-regexp ".*\n\\|.+" nil t)
        (let ((line (match-string 0)))
          (with-current-buffer dest-buffer
            (insert (process line)))))))
  (write-file "/path/to/dest/file" nil))
Emacs Lisp is not suitable for processing file-streams. The whole file must be read at once:
(defun my-line-fun (line)
  (concat "prefix: " line))
(let* ((in-file "in")
       (out-file "out")
       (lines (with-temp-buffer
                (insert-file-contents in-file)
                (split-string (buffer-string) "\r?\n"))))
  (with-temp-file out-file
    ;; `mapconcat' only returns a string; insert it so it reaches the file
    (insert (mapconcat 'my-line-fun lines "\n"))))
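The whole-file approach can also be wrapped in a small reusable helper that visits each line with forward-line instead of splitting one big string. This is just a sketch under the same assumption that the file fits in a buffer; my-process-file-lines and its arguments are names made up for the example.

```elisp
(defun my-process-file-lines (in-file out-file fn)
  "Write to OUT-FILE the result of calling FN on each line of IN-FILE.
FN receives a line without its newline and must return a string."
  (let ((results nil))
    (with-temp-buffer
      (insert-file-contents in-file)
      (goto-char (point-min))
      (while (not (eobp))
        (push (funcall fn (buffer-substring-no-properties
                           (line-beginning-position)
                           (line-end-position)))
              results)
        (forward-line 1)))
    ;; `with-temp-file' saves whatever was inserted into its buffer.
    (with-temp-file out-file
      (dolist (line (nreverse results))
        (insert line "\n")))))
```

For example, (my-process-file-lines "in" "out" #'upcase) would write an upper-cased copy of "in" to "out".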

Org-Mode table to s-expressions

I would like to export from Org-Mode tables to s-expressions.
| first | second | third |
|--------+--------+--------|
| value1 | value2 | value3 |
| value4 | value5 | value6 |
Would turn into:
((:FIRST "value1" :SECOND "value2" :THIRD "value3")
(:FIRST "value4" :SECOND "value5" :THIRD "value6"))
I plan on writing such a setup if it doesn't exist yet but figured I'd tap into the stackoverflow before I start reinventing the wheel.
This does the trick. It has minimal error checking.
The interface to use is either the programmatic interface:
(org-table-to-sexp <location-of-beginning-of-table> <location-of-end-of-table>)
In which case it'll return the sexp you requested.
If you wanted an interactive usage, you can call the following command to operate on the table in the region. So, set the mark at the beginning of the table, move to the end, and type:
M-x insert-org-table-to-sexp
That will insert the desired sexp immediately after the table in the current buffer.
Here is the code:
(defun org-table-to-sexp-parse-line ()
  "Helper, returns the current line as a list of strings"
  (save-excursion
    (save-match-data
      (let ((result nil)
            (end-of-line (save-excursion (end-of-line) (point))))
        (beginning-of-line)
        (while (re-search-forward "\\([^|]*\\)|" end-of-line t)
          (let ((match (mapconcat 'identity (split-string (match-string-no-properties 1)) " ")))
            (if (< 0 (length match))
                ;; really want to strip spaces from front and back
                (push match result))))
        (reverse result)))))
(require 'cl-lib) ; provides `cl-mapcar' (replaces the obsolete `cl' library)
(defun org-table-to-sexp (b e)
  "Parse an org-mode table to sexp"
  (save-excursion
    (save-match-data
      (goto-char b)
      (let ((headers (mapcar
                      (lambda (str)
                        ;; `intern' yields real keywords such as :FIRST
                        (intern (concat ":" (upcase str))))
                      (org-table-to-sexp-parse-line)))
            (sexp nil))
        (forward-line 1) ;skip |--+--+--| line
        (while (< (point) e)
          (forward-line 1)
          (let ((line-result nil))
            (cl-mapcar (lambda (h cell)
                         (push h line-result)
                         (push cell line-result))
                       headers
                       (org-table-to-sexp-parse-line))
            (if line-result
                (push (reverse line-result)
                      sexp))))
        ;; Rows were pushed in reverse; restore table order
        (nreverse sexp)))))
(defun insert-org-table-to-sexp (b e)
  "Convert the table specified by the region and insert the sexp after the table"
  (interactive "r")
  (goto-char (max b e))
  (print (org-table-to-sexp b e) (current-buffer)))
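For quick checks outside a buffer, the same conversion can be sketched on a plain string. This is a simplified illustration, not the code above: my-table-row-cells and my-table-string-to-plists are hypothetical names, every row is assumed to have the same number of cells, and empty cells are dropped rather than kept as empty strings.

```elisp
(defun my-table-row-cells (line)
  "Return the trimmed, non-empty cell strings of the Org table row LINE."
  (split-string line "|" t "[ \t]+"))

(defun my-table-string-to-plists (table)
  "Convert the Org table in the string TABLE to a list of plists.
The first row supplies the keys; separator rows such as |---+---| are skipped."
  (let* ((lines (split-string table "\n" t "[ \t]+"))
         (keys (mapcar (lambda (name) (intern (concat ":" (upcase name))))
                       (my-table-row-cells (car lines))))
         (result nil))
    (dolist (line (cdr lines))
      (unless (string-prefix-p "|-" line)
        (let ((cells (my-table-row-cells line))
              (ks keys)
              (plist nil))
          ;; Interleave keys and cell values: (:KEY1 "v1" :KEY2 "v2" ...)
          (while (and ks cells)
            (push (pop ks) plist)
            (push (pop cells) plist))
          (push (nreverse plist) result))))
    (nreverse result)))
```

Applied to the table from the question (with the header spelled "third"), this should produce the two plists shown there.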