unibyte text buffers in emacs: encode in hexa? - emacs

I have a "text" file that has some invalid byte sequences. Emacs renders these as "\340\360", is there a way to make the mighty text processor render those in hexadecimal, for instance, e.g.: "\co0a"? Thanks.
EDIT: I will not mark my own answer as accepted, but just wanted to say that it does work fine.

Found it, just in case someone needs it too... (from here)
(setq standard-display-table (make-display-table))
(let ( (i ?\x80) hex hi low )
(while (<= i ?\xff)
(setq hex (format "%x" i))
(setq hi (elt hex 0))
(setq low (elt hex 1))
(aset standard-display-table (unibyte-char-to-multibyte i)
(vector (make-glyph-code ?\\ 'escape-glyph)
(make-glyph-code ?x 'escape-glyph)
(make-glyph-code hi 'escape-glyph)
(make-glyph-code low 'escape-glyph)))
(setq i (+ i 1))))

Related

Is there a string to 32 bit integer coding system in elisp?

Is there a coding system such that (encode-coding-string msg '??? t) would convert my message into a list of 32 bit integers?
The binary coding system converts well the message to 8 bit data, and I am aware that I could post-process it to convert the result into 32 bits. I'm just wondering if there is already a coding system that does this... :) #lazy
Ok, attempt number two.
I wrote a test generator snippet in python:
def make_test_2():
with open('test2.bin', 'wb') as f:
args = [1867, 1982]
for a in args:
f.write((a).to_bytes(4, byteorder='little'))
This is a pretty hacky (a couple of reverse calls etc). But, its only meant to be a quick and dirty prototype.
(defun read-int-list2 (filename)
(let ((result '())
(accumulator '())
(accum-count 0)
(accum-max 4))
(with-temp-buffer
(set-buffer-multibyte nil)
(setq buffer-file-coding-system 'binary) ;; find a way to set temporarily? not sure
(insert-file-contents-literally filename)
(while (< (point) (point-max))
(if (< accum-count accum-max)
(progn
(setq accumulator (cons (aref (buffer-substring-no-properties (point) (1+ (point))) 0) accumulator))
(setq accum-count (1+ accum-count))))
(if (>= accum-count accum-max) ;; four bytes accumulated, lets bundle
(progn
(let* ((s (reverse accumulator))
(e1 (elt s 0))
(e2 (elt s 1))
(e3 (elt s 2))
(e4 (elt s 3))
(val (logior (lsh e4 24) (lsh e3 16) (lsh e2 8) e1))) ;; assume little endian (intel, ARM)
;; (message (format "%x %x %x %x -> %d" e1 e2 e3 e4 val))
(setq result (cons val result))
(setq accum-count 0)
(setq accumulator '()))))
(forward-char)))
(reverse result)))
(read-int-list2 "test2.bin") ;; (1867 1982)
I only did the one test. So that needs improvement. In words:
accumulate 8 byte chars from the special temp buffer (special because binary/literal load)
once the bytes per integer count has been reached, merge the accumulated bytes into an integer by bit shifting (be aware some machines are big endian, I assume here little endian) the bytes into place.
dump merged into result list
reset the accumulator
go to step 1
i have no doubt there are many improvements, my lisp is rusty.

Display the binary version of hex value in status bar

I am doing a lot of embedded C programming right now, which means that I am writing things like this all the time:
(ioe_extra_A & 0xE7)
It would be super useful, if when put my cursor on the 0xE7, emacs would display "0b1110 0111" in the status bar or mini-buffer, so I could check that my mask is what I meant it to be.
Typically, no matter what it is I want emacs to do, 10 minutes of Googling will turn up the answer, but for this one, I have exhausted my searching skills and still not turned up an answer.
Thanks ahead of time.
This seems to work:
(defvar my-hex-idle-timer nil)
(defun my-hex-idle-status-on ()
(interactive)
(when (timerp my-hex-idle-timer)
(cancel-timer my-hex-idle-timer))
(setq my-hex-idle-timer (run-with-idle-timer 1 t 'my-hex-idle-status)))
(defun my-hex-idle-status-off ()
(interactive)
(when (timerp my-hex-idle-timer)
(cancel-timer my-hex-idle-timer)
(setq my-hex-idle-timer nil)))
(defun int-to-binary-string (i)
"convert an integer into it's binary representation in string format
By Trey Jackson, from https://stackoverflow.com/a/20577329/."
(let ((res ""))
(while (not (= i 0))
(setq res (concat (if (= 1 (logand i 1)) "1" "0") res))
(setq i (lsh i -1)))
(if (string= res "")
(setq res "0"))
res))
(defun my-hex-idle-status ()
(let ((word (thing-at-point 'word)))
(when (string-prefix-p "0x" word)
(let ((num (ignore-errors (string-to-number (substring word 2) 16))))
(message "In binary: %s" (int-to-binary-string num))))))
Type M-x my-hex-idle-status-on to turn it on.
As noted, thanks to Trey Jackson for int-to-binary-string.

Switch between frames by number or letter

I'd like to create a function that offers me numbered or lettered choices (1, 2, 3, or a, b, c) of available frames to switch to, instead of manually typing the name. Aspell would be the closest example I can think of.
Could someone please share an example of how this might be done? Lines 6 to 14 of the following function creates a list of all available frame names on the fly. Additional functions related to frame switching can be found here
(defun switch-frame (frame-to)
(interactive (list (read-string (format "From: (%s) => To: %s. Select: "
;; From:
(frame-parameter nil 'name)
;; To:
(mapcar
(lambda (frame) "print frame"
(reduce 'concat
(mapcar (lambda (s) (format "%s" s))
(list "|" (frame-parameter frame 'name) "|" )
)
)
)
(frame-list) )
)))) ;; end of interactive statement
(setq frame-from (frame-parameter nil 'name))
(let ((frames (frame-list)))
(catch 'break
(while frames
(let ((frame (car frames)))
(if (equal (frame-parameter frame 'name) frame-to)
(throw 'break (select-frame-set-input-focus frame))
(setq frames (cdr frames)))))) )
(message "Switched -- From: \"%s\" To: \"%s\"." frame-from frame-to) )
EDIT (November 13, 2014):  Here is a revised function using ido-completing-read:
(defun ido-switch-frame ()
(interactive)
(when (not (minibufferp))
(let* (
(frames (frame-list))
(frame-to (ido-completing-read "Select Frame: "
(mapcar (lambda (frame) (frame-parameter frame 'name)) frames))))
(catch 'break
(while frames
(let ((frame (car frames)))
(if (equal (frame-parameter frame 'name) frame-to)
(throw 'break (select-frame-set-input-focus frame))
(setq frames (cdr frames)))))))))
I see what you're trying to do. Here's how I've solved this problem:
Part 1
The files that you use every day should be bookmarked.
The reason is that you loose focus when you're reading any sort of menu,
even as short as you describe. After some time with bookmarks,
it becomes like touch-typing: you select the buffer without thinking about it.
You can check out this question
to see my system.
I've got about 20 important files and buffers bookmarked and reachable
in two keystrokes, e.g. μ k for keys.el and μ h for hooks.el.
A nice bonus is that bookmark-bmenu-list shows all this stuff, so I can
add/remove bookmarks easily
rename bookmarks (renaming changes binding)
it's clickable with mouse (sometimes useful)
bookmark+ allows function bookmarks, so I've got org-agenda on μ a
and magit on μ m.
And of course the dired bookmarks: source is on μ s and
org-files are on μ g.
Part 2
For the files that can't be bookmarked, I'm using:
(ido-mode)
(setq ido-enable-flex-matching t)
(global-set-key "η" 'ido-switch-buffer)
This is fast as well: you need one keystroke to call ido-switch-buffer
and around 2-3 letters to find the buffer you need, and RET to select.
I've also recently added this hack:
(add-hook 'ido-setup-hook
(lambda()
(define-key ido-buffer-completion-map "η" 'ido-next-match)))
With this you can use the same key to call ido-switch-buffer and cycle the selection.
Part 3
The actual function with lettered choices has been on my todo list for a while
now. I'll post back here when I get around to implementing it,
or maybe just copy the solution from a different answer:)
This answer describes Icicles command icicle-select-frame, which lets you choose frames by name using completion.
There is also Icicles command icicle-other-window-or-frame (C-x o), which combines commands icicle-select-frame, other-frame, and other-window. It lets you select a window or a frame, by its name or by order.
With no prefix arg or a non-zero numeric prefix arg:
If the selected frame has multiple windows then this is
other-window. Otherwise, it is other-frame.
With a zero prefix arg (e.g. C-0):
If the selected frame has multiple windows then this is
icicle-select-window with windows in the frame as candidates.
Otherwise (single-window frame), this is icicle-select-frame.
With plain C-u:
If the selected frame has multiple windows then this is
icicle-select-window with windows from all visible frames as
candidates. Otherwise, this is icicle-select-frame.
Depending upon the operating system, it may be necessary to use (select-frame-set-input-focus chosen-frame) instead of select-frame / raise-frame towards the end of the function.
(defface frame-number-face
'((t (:background "black" :foreground "red" )))
"Face for `frame-number-face`."
:group 'frame-fn)
(defface frame-name-face
'((t ( :background "black" :foreground "ForestGreen")))
"Face for `frame-name-face`."
:group 'frame-fn)
(defun select-frame-number ()
"Select a frame by number -- a maximum of 9 frames are supported."
(interactive)
(let* (
choice
chosen-frame
(n 0)
(frame-list (frame-list))
(total-frames (safe-length frame-list))
(frame-name-list
(mapcar
(lambda (frame) (cons frame (frame-parameter frame 'name)))
frame-list))
(frame-name-list-sorted
(sort
frame-name-list
#'(lambda (x y) (string< (cdr x) (cdr y)))))
(frame-number-list
(mapcar
(lambda (frame)
(setq n (1+ n))
(cons n (cdr frame)))
frame-name-list-sorted))
(pretty-list
(mapconcat 'identity
(mapcar
(lambda (x) (concat
"["
(propertize (format "%s" (car x)) 'face 'frame-number-face)
"] "
(propertize (format "%s" (cdr x)) 'face 'frame-name-face)))
frame-number-list)
" | ")) )
(message "%s" pretty-list)
(setq choice (read-char-exclusive))
(cond
((eq choice ?1)
(setq choice 1))
((eq choice ?2)
(setq choice 2))
((eq choice ?3)
(setq choice 3))
((eq choice ?4)
(setq choice 4))
((eq choice ?5)
(setq choice 5))
((eq choice ?6)
(setq choice 6))
((eq choice ?7)
(setq choice 7))
((eq choice ?8)
(setq choice 8))
((eq choice ?9)
(setq choice 9))
(t
(setq choice 10)))
(setq chosen-frame (car (nth (1- choice) frame-name-list-sorted)))
(when (> choice total-frames)
(let* (
(debug-on-quit nil)
(quit-message
(format "You must select a number between 1 and %s." total-frames)))
(signal 'quit `(,quit-message ))))
(select-frame chosen-frame)
(raise-frame chosen-frame)
chosen-frame))

How to correct files with mixed encodings?

Given a corrupted file with mixed encoding (e.g. utf-8 and latin-1), how do I configure Emacs to "project" all its symbols to a single encoding (e.g. utf-8) when saving the file?
I did the following function to automatize some of the cleaning, but I would guess I could find somewhere the information to map the symbol "é" in one encoding to "é" in utf-8 somewhere in order to improve this function (or that somebody already wrote such a function).
(defun jyby/cleanToUTF ()
"Cleaning to UTF"
(interactive)
(progn
(save-excursion (replace-regexp "अ" ""))
(save-excursion (replace-regexp "आ" ""))
(save-excursion (replace-regexp "ॆ" ""))
)
)
(global-unset-key [f11])
(global-set-key [f11] 'jyby/cleanToUTF)
I have many files "corrupted" with mixed encoding (due to copy pasting from a browser with an ill font configuration), generating the error below. I sometime clean them by hand by searching and replacing for each problematic symbol by either "" or the appropriate character, or more quickly specifying "utf-8-unix" as the encoding (which will prompt the same message next time I edit and save the file). It has become an issue as in any such corrupted file any accentuated character is replaced by a sequence which doubles in size at each save, ending up doubling the size of the file. I am using GNU Emacs 24.2.1
These default coding systems were tried to encode text
in the buffer `test_accents.org':
(utf-8-unix (30 . 4194182) (33 . 4194182) (34 . 4194182) (37
. 4194182) (40 . 4194181) (41 . 4194182) (42 . 4194182) (45
. 4194182) (48 . 4194182) (49 . 4194182) (52 . 4194182))
However, each of them encountered characters it couldn't encode:
utf-8-unix cannot encode these: ...
Click on a character (or switch to this window by `C-x o'
and select the characters by RET) to jump to the place it appears,
where `C-u C-x =' will give information about it.
Select one of the safe coding systems listed below,
or cancel the writing with C-g and edit the buffer
to remove or modify the problematic characters,
or specify any other coding system (and risk losing
the problematic characters).
raw-text emacs-mule no-conversion
I have struggled with this in emacs many times. When I have a file that was messed up, e.g. in raw-text-unix mode, and save as utf-8, emacs complains even about text that is already clean utf-8. I haven't found a way to get it to only complain about non-utf-8.
I just found a reasonable semi-automated approach using recode:
f=mixed-file
recode -f ..utf-8 $f > /tmp/recode.out
diff $f recode.out | cat -vt
# manually fix lines of text that can't be converted to utf-8 in $f,
# and re-run recode and diff until the output diff is empty.
One helpful tool along the way is http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=342+200+224&mode=obytes
Then I just re-open the file in emacs, and it is recognized as clean unicode.
Here's something to maybe get you started:
(put 'eof-error 'error-conditions '(error eof-error))
(put 'eof-error 'error-message "End of stream")
(put 'bad-byte 'error-conditions '(error bad-byte))
(put 'bad-byte 'error-message "Not a UTF-8 byte")
(defclass stream ()
((bytes :initarg :bytes :accessor bytes-of)
(position :initform 0 :accessor position-of)))
(defun logbitp (byte bit) (not (zerop (logand byte (ash 1 bit)))))
(defmethod read-byte ((this stream) &optional eof-error eof)
(with-slots (bytes position) this
(if (< position (length bytes))
(prog1 (aref bytes position) (incf position))
(if eof-error (signal eof-error (list position)) eof))))
(defmethod unread-byte ((this stream))
(when (> (position-of this) 0) (decf (position-of this))))
(defun read-utf8-char (stream)
(let ((byte (read-byte stream 'eof-error)))
(if (not (logbitp byte 7)) byte
(let ((numbytes
(cond
((not (logbitp byte 5))
(setf byte (logand #2r11111 byte)) 1)
((not (logbitp byte 4))
(setf byte (logand #2r1111 byte)) 2)
((not (logbitp byte 3))
(setf byte (logand #2r111 byte)) 3))))
(dotimes (b numbytes byte)
(let ((next-byte (read-byte stream 'eof-error)))
(if (and (logbitp next-byte 7) (not (logbitp next-byte 6)))
(setf byte (logior (ash byte 6) (logand next-byte #2r111111)))
(signal 'bad-byte (list next-byte)))))
(signal 'bad-byte (list byte))))))
(defun load-corrupt-file (file)
(interactive "fFile to load: ")
(with-temp-buffer
(set-buffer-multibyte nil)
(insert-file-literally file)
(with-output-to-string
(set-buffer-multibyte t)
(loop with stream = (make-instance 'stream :bytes (buffer-string))
for next-char =
(condition-case err
(read-utf8-char stream)
(bad-byte (message "Fix this byte %d" (cdr err)))
(eof-error nil))
while next-char
do (write-char next-char)))))
What this code does - it loads a file with no conversion and tries to read it as if it was encoded using UTF-8, once it encounters a byte that doesn't seem like it belongs to UTF-8, it errors, and you need to handle it somehow, it's where "Fix this byte" message is). But you would need to be inventive about how you fixing it...

Using ispell/aspell to spell check camelcased words

I need to spell check a large document containing many camelcased words. I want ispell or aspell to check if the individual words are spelled correctly.
So, in case of this word:
ScientificProgrezGoesBoink
I would love to have it suggest this instead:
ScientificProgressGoesBoink
Is there any way to do this? (And I mean, while running it on an Emacs buffer.) Note that I don't necessarily want it to suggest the complete alternative. However, if it understands that Progrez is not recognized, I would love to be able to replace that part at least, or add that word to my private dictionary, rather than including every camel-cased word into the dictionary.
I took #phils suggestions and dug around a little deeper. It turns out that if you get camelCase-mode and reconfigure some of ispell like this:
(defun ispell-get-word (following)
(when following
(camelCase-forward-word 1))
(let* ((start (progn (camelCase-backward-word 1)
(point)))
(end (progn (camelCase-forward-word 1)
(point))))
(list (buffer-substring-no-properties start end)
start end)))
then, in that case, individual camel cased words suchAsThisOne will actually be spell-checked correctly. (Unless you're at the beginning of a document -- I just found out.)
So this clearly isn't the fullblown solution, but at least it's something.
There is "--run-together" option in aspell. Hunspell can't check camelcased word.
If you read the code of aspell, you will find its algorithm actually does not split camelcase word into a list of sub-words. Maybe this algorithm is faster, but it will wrongly report word containing two character sub-word as typo. Don't waste time to tweak other aspell options. I tried and they didn't work.
So we got two problems:
aspell reports SOME camelcased words as typos
hunspell reports ALL camelcased words as typos
Solution to solve BOTH problems is to write our own predicate in Emacs Lisp.
Here is a sample predicate written for javascript:
(defun split-camel-case (word)
"Split camel case WORD into a list of strings.
Ported from 'https://github.com/fatih/camelcase/blob/master/camelcase.go'."
(let* ((case-fold-search nil)
(len (length word))
;; ten sub-words is enough
(runes [nil nil nil nil nil nil nil nil nil nil])
(runes-length 0)
(i 0)
ch
(last-class 0)
(class 0)
rlt)
;; split into fields based on class of character
(while (< i len)
(setq ch (elt word i))
(cond
;; lower case
((and (>= ch ?a) (<= ch ?z))
(setq class 1))
;; upper case
((and (>= ch ?A) (<= ch ?Z))
(setq class 2))
((and (>= ch ?0) (<= ch ?9))
(setq class 3))
(t
(setq class 4)))
(cond
((= class last-class)
(aset runes
(1- runes-length)
(concat (aref runes (1- runes-length)) (char-to-string ch))))
(t
(aset runes runes-length (char-to-string ch))
(setq runes-length (1+ runes-length))))
(setq last-class class)
;; end of while
(setq i (1+ i)))
;; handle upper case -> lower case sequences, e.g.
;; "PDFL", "oader" -> "PDF", "Loader"
(setq i 0)
(while (< i (1- runes-length))
(let* ((ch-first (aref (aref runes i) 0))
(ch-second (aref (aref runes (1+ i)) 0)))
(when (and (and (>= ch-first ?A) (<= ch-first ?Z))
(and (>= ch-second ?a) (<= ch-second ?z)))
(aset runes (1+ i) (concat (substring (aref runes i) -1) (aref runes (1+ i))))
(aset runes i (substring (aref runes i) 0 -1))))
(setq i (1+ i)))
;; construct final result
(setq i 0)
(while (< i runes-length)
(when (> (length (aref runes i)) 0)
(setq rlt (add-to-list 'rlt (aref runes i) t)))
(setq i (1+ i)))
rlt))
(defun flyspell-detect-ispell-args (&optional run-together)
"If RUN-TOGETHER is true, spell check the CamelCase words.
Please note RUN-TOGETHER will make aspell less capable. So it should only be used in prog-mode-hook."
;; force the English dictionary, support Camel Case spelling check (tested with aspell 0.6)
(let* ((args (list "--sug-mode=ultra" "--lang=en_US"))args)
(if run-together
(setq args (append args '("--run-together" "--run-together-limit=16"))))
args))
;; {{ for aspell only, hunspell does not need setup `ispell-extra-args'
(setq ispell-program-name "aspell")
(setq-default ispell-extra-args (flyspell-detect-ispell-args t))
;; }}
;; ;; {{ hunspell setup, please note we use dictionary "en_US" here
;; (setq ispell-program-name "hunspell")
;; (setq ispell-local-dictionary "en_US")
;; (setq ispell-local-dictionary-alist
;; '(("en_US" "[[:alpha:]]" "[^[:alpha:]]" "[']" nil ("-d" "en_US") nil utf-8)))
;; ;; }}
(defvar extra-flyspell-predicate '(lambda (word) t)
"A callback to check WORD. Return t if WORD is typo.")
(defun my-flyspell-predicate (word)
"Use aspell to check WORD. If it's typo return t."
(let* ((cmd (cond
;; aspell: `echo "helle world" | aspell pipe`
((string-match-p "aspell$" ispell-program-name)
(format "echo \"%s\" | %s pipe"
word
ispell-program-name))
;; hunspell: `echo "helle world" | hunspell -a -d en_US`
(t
(format "echo \"%s\" | %s -a -d en_US"
word
ispell-program-name))))
(cmd-output (shell-command-to-string cmd))
rlt)
;; (message "word=%s cmd=%s" word cmd)
;; (message "cmd-output=%s" cmd-output)
(cond
((string-match-p "^&" cmd-output)
;; it's a typo because at least one sub-word is typo
(setq rlt t))
(t
;; not a typo
(setq rlt nil)))
rlt))
(defun js-flyspell-verify ()
(let* ((case-fold-search nil)
(font-matched (memq (get-text-property (- (point) 1) 'face)
'(js2-function-call
js2-function-param
js2-object-property
js2-object-property-access
font-lock-variable-name-face
font-lock-string-face
font-lock-function-name-face
font-lock-builtin-face
rjsx-text
rjsx-tag
rjsx-attr)))
subwords
word
(rlt t))
(cond
((not font-matched)
(setq rlt nil))
;; ignore two character word
((< (length (setq word (thing-at-point 'word))) 2)
(setq rlt nil))
;; handle camel case word
((and (setq subwords (split-camel-case word)) (> (length subwords) 1))
(let* ((s (mapconcat (lambda (w)
(cond
;; sub-word wholse length is less than three
((< (length w) 3)
"")
;; special characters
((not (string-match-p "^[a-zA-Z]*$" w))
"")
(t
w))) subwords " ")))
(setq rlt (my-flyspell-predicate s))))
(t
(setq rlt (funcall extra-flyspell-predicate word))))
rlt))
(put 'js2-mode 'flyspell-mode-predicate 'js-flyspell-verify)
Or just use my new pacakge https://github.com/redguardtoo/wucuo
You should parse the camel cased words and split them, then check the individual spelling for each one and assemble a suggestion taking into account the single suggestion for each misspelled token. Considering that each misspelled token can have multiple suggestions this sounds a bit inefficient to me.