HTML to Text Conversion in Emacs - emacs

I have a bunch of org-mode files with snippets containing HTML code and I would like to convert those to plain text.
I don't need any fancy, fully automated solution; I can just paste my HTML snippet into a scratch buffer if that's easier.
Here's a simple example of the desired behavior. Input:
<div><div>First Line<br>Second Line</div></div>
Output:
First Line
Second Line
What are the options available to Emacs users for such a task?

Emacs 24.4 (2014) added EWW, the Emacs Web Wowser, a built-in web browser. It uses the shr.el library to render HTML, e.g.,
(with-temp-buffer
  (insert
   "<div><div>First Line<br>Second Line</div></div> ")
  (shr-render-region (point-min) (point-max))
  (buffer-substring-no-properties (point-min) (point-max)))
;; =>
"First Line
Second Line
"
shr-render-region uses libxml-parse-html-region, which requires an Emacs built with libxml2 support.
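The snippet above can be wrapped into a reusable function. This is only a minimal sketch: the name my-html-to-text is hypothetical, and the libxml2 check assumes that libxml-available-p exists on newer Emacs versions (27 and later) and falls back to testing whether libxml-parse-html-region is defined on older ones.

```elisp
(require 'shr)

(defun my-html-to-text (html)
  "Return the string HTML rendered as plain text by shr.
Hypothetical helper; signals an error when libxml2 support is missing."
  (unless (if (fboundp 'libxml-available-p)
              (libxml-available-p)
            (fboundp 'libxml-parse-html-region))
    (error "This Emacs was built without libxml2 support"))
  (with-temp-buffer
    (insert html)
    ;; Replaces the buffer contents with the rendered text.
    (shr-render-region (point-min) (point-max))
    ;; Drop the faces and other text properties that shr adds.
    (buffer-substring-no-properties (point-min) (point-max))))
```

For one-off conversions in a scratch buffer, shr-render-region is itself an interactive command, so selecting the pasted HTML and running M-x shr-render-region should be enough.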

The html2org package seems to get the job done.
Its html2org function converts the HTML code and replaces it with text.

Related

How can I disable *Web Beautify Errors* from the Emacs web-beautify package?

I use the web-beautify package. While I'm tangling my org-mode files to HTML, CSS and JS files, error messages pop up in a new buffer every time. That annoys me a great deal. How do I disable those?
(Note: the messages appear in a *Web Beautify Errors* buffer.)
I think that all you can sensibly do is suppress this buffer when Emacs tries to display it, because displaying it is hard-coded in web-beautify-format-region, which sets the DISPLAY-ERROR-BUFFER argument when it calls shell-command-on-region.
I think this will do the trick:
;; Avoid displaying the *Web Beautify Errors* buffer.
(add-to-list 'display-buffer-alist
             (cons "\\*Web Beautify Errors\\*"
                   (cons 'display-buffer-no-window
                         '((allow-no-window . t)))))

Export highlighting in html when using bookmark+ in org-mode

I use the Bookmark+ extension for Emacs to permanently highlight sections of text.
Is there any way to mirror the highlighting when using the normal HTML export?
And for the reveal.js export?
I have been advised to add the text file as well, since it makes helping easier: TextFile
It is not easy to integrate this with the org-mode export, but you can probably do the following to get an HTML file with highlighting.
;; htmlize-buffer comes from the third-party htmlize package.
(require 'htmlize)
(let* ((html-buffer (htmlize-buffer))
       (html (with-current-buffer html-buffer
               (buffer-string))))
  (with-temp-file "test.html"
    (insert html))
  (kill-buffer html-buffer))
(browse-url "test.html")

Close HTML tags as soon as one opens them

I'd like the corresponding, closing HTML tag to be automatically inserted whenever I open one.
So if I type
<div>
I should get
<div></div>
without having to call sgml-close-tag myself.
How to achieve this?
Rather than calling a hook function after every single key-stroke, it makes sense to only call it after a > was typed. This can be achieved by rebinding the > character in the keymap that sgml-mode uses.
In addition, sgml-close-tag shouldn't get called if the tag is already closed. Therefore, the following code adds a simple regexp check for that:
(defun my-sgml-insert-gt ()
  "Inserts a `>' character and calls
`my-sgml-close-tag-if-necessary', leaving point where it is."
  (interactive)
  (insert ">")
  (save-excursion (my-sgml-close-tag-if-necessary)))
(defun my-sgml-close-tag-if-necessary ()
  "Calls sgml-close-tag if the tag immediately before point is
an opening tag that is not followed by a matching closing tag."
  (when (looking-back "<\\s-*\\([^</> \t\r\n]+\\)[^</>]*>")
    (let ((tag (match-string 1)))
      (unless (and (not (sgml-unclosed-tag-p tag))
                   (looking-at (concat "\\s-*<\\s-*/\\s-*" tag "\\s-*>")))
        (sgml-close-tag)))))
(eval-after-load "sgml-mode"
  '(define-key sgml-mode-map ">" 'my-sgml-insert-gt))
If you like paredit (and if you're an Emacs user, chances are you do), you may be interested in tagedit, an Emacs package written by Magnar Sveen that provides paredit-like features for editing HTML.
The library is here: https://github.com/magnars/tagedit, and can be installed through MELPA/Marmalade (package-install tagedit).
If you enable the experimental features (tagedit-add-experimental-features), it will automatically close tags for you and keep the closing tag's text in sync with the opening tag's text. That's on top of being able to splice, slurp, barf and do all the other crazy things that paredit lets you do when working with balanced expressions. I think it's great!
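A minimal setup sketch for tagedit, modelled on the configuration suggested in the package's README as far as I recall it. It assumes tagedit is already installed, and that html-mode is the mode you edit HTML in; adjust the hook if you use a different mode.

```elisp
;; Assumes the third-party tagedit package is installed.
(with-eval-after-load 'sgml-mode
  (require 'tagedit)
  ;; Optional: paredit-style key bindings for slurp, barf, splice, etc.
  (tagedit-add-paredit-like-keybindings)
  ;; Experimental features include automatic insertion of closing tags.
  (tagedit-add-experimental-features)
  (add-hook 'html-mode-hook (lambda () (tagedit-mode 1))))
```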
I'm using yasnippet for this purpose.
To type the shortcuts in this answer, like <kbd>C-o</kbd>, I have the following snippet:
# -*- mode: snippet -*-
# name: kbd
# key: kbd
# --
<kbd>$0</kbd>
So I type kbdC-o and it gets expanded to <kbd></kbd> with the cursor right in the middle. You can have the same behavior for div.
You may eval this in your SGML buffer or add it to your sgml-mode-hook:
(add-hook 'post-self-insert-hook
          (lambda () (and (eq (char-before) ?>) (sgml-close-tag))) nil t)
Whenever you insert a ">", the function sgml-close-tag will be run for you.
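Because that add-hook call passes t as its last argument, it only affects the buffer it is evaluated in. One way to make it permanent is to install it from sgml-mode-hook. This is a sketch; both function names are hypothetical, and the check itself is the same as in the lambda from the answer.

```elisp
(require 'sgml-mode)

(defun my-sgml-close-tag-after-gt ()
  "Call `sgml-close-tag' when the character just typed is `>'.
Hypothetical helper name."
  (and (eq (char-before) ?>) (sgml-close-tag)))

(defun my-sgml-enable-auto-close ()
  "Install the check buffer-locally (hypothetical helper name)."
  (add-hook 'post-self-insert-hook #'my-sgml-close-tag-after-gt nil t))

;; html-mode derives from sgml-mode, so this hook runs there too.
(add-hook 'sgml-mode-hook #'my-sgml-enable-auto-close)
```

Note that the check runs after every typed ">", including the one that ends a closing tag you typed by hand, so it is cruder than an approach that first tests whether the tag is already closed.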

Customizing org-mode exports

So, I have been using org-mode for taking my research notes for some time now. I love how I can seamlessly export to both latex (for my papers) and html (for my blog). However, whenever I define macros with \newcommand with #+LATEX_HEADER, these do not show up in the HTML export at all.
I currently handle this by putting all these commands as
(\newcommand \newcommand etc. etc.)
at the top and then manually removing the "(" and ")" from the tex file.
What I wish I could do was to keep a drawer for these commands and customize html and latex export of org mode to handle this drawer appropriately.
For example, I would add the following in the org file:
:LATEX_MACROS:
\newcommand{\norm}[1]{\lVert{#1}\rVert}
\newcommand{\abs}[1]{\lvert{#1}\rvert}
\newcommand{\half}{\frac{1}{2}}
:END:
And after export, this shows up in the latex file verbatim in header section
and in the html file as
\(
\newcommand{\norm}[1]{\lVert{#1}\rVert}
\newcommand{\abs}[1]{\lvert{#1}\rvert}
\newcommand{\half}{\frac{1}{2}}
\)
Here's a standalone way that works with Org, pdflatex and MathJax at the time of this writing. Make sure the drawers (or at least the LATEXMACROS drawer) are exported (this is the default), then insert near the top of the Org file:
#+OPTIONS: toc:nil
#+DRAWERS: LATEXMACROS ...
:LATEXMACROS:
@@html:<div style="display: none">@@
\(
\global\def\mymacro{...}
\global\def\mymacrow2args#1#2{...}
...
\)
@@html:</div>@@
:END:
#+TOC: headlines
This works around the following problems:
We have to use a math environment (here \( ... \)), otherwise Org escapes the TeX syntax.
We cannot use \newcommand in #+LATEX_HEADER, because LaTeX expects it in the preamble and MathJax never receives it. \newcommand can be used outside of the preamble, but then LaTeX (unlike MathJax) restricts the defined macros to the current math environment. We usually want to use them anywhere in the file.
We cannot use a plain \def because it behaves like \newcommand in terms of scoping (global for MathJax, local for LaTeX). We cannot use \xdef either because MathJax does not know it.
MathJax does not know \global but that won't prevent it from using \def which is global anyway. However, it will print a red warning for each \global in the web page. To get rid of them, we put the math environment inside an undisplayed HTML section.
For MathJax, the macros won't be defined in time for the table of contents in its default location. If you use macros in headlines and want a TOC, disable the default with #+OPTIONS: toc:nil and manually add it after the drawer with #+TOC: headlines.
Caveats:
It's not possible to preview LaTeX fragments that use custom macros, because the small TeX file Org builds won't include our environment.
Unlike \newcommand, \def silently replaces anything. Also, its syntax is slightly different.
The math environment takes up some vertical space in the output of pdflatex, so we have to put it where that doesn't matter (e.g. before the first headline).
This is probably really version dependent:
Org mode may get better at exporting LaTeX macros (e.g. sending them to MathJax by being smarter when parsing #+LATEX_HEADER, providing a new export option, not escaping \newcommand and putting those into a correctly handled preamble).
MathJax may start accepting \xdef as an alias of \def or ignoring \global (so that the HTML section trick is not needed anymore).
I figured out how to do it myself. Note that this is perhaps not the most elegant solution since it does not place the latex part in the beginning of the latex file (i.e. outside \begin{document}), but it works well enough for me.
(setq org-export-blocks
      (cons '(latexmacro org-export-blocks-latexmacro) org-export-blocks))
(defun org-export-blocks-latexmacro (body &rest headers)
  (message "exporting latex macros")
  (cond
   ((eq org-export-current-backend 'html) (concat "\\(" body "\\)"))
   ((eq org-export-current-backend 'latex) body)
   (t nil)))
Alternative solution (not stand-alone), using Org's dynamic blocks.
It has none of the caveats of my original solution (because it generates what Org actually expects for LaTeX or HTML)
It's possible to edit the macros in LaTeX mode (C-c ')
Callback
Create a file named org-dblock-write:block-macro.el with the following content and add it to Emacs' load path.
(defun org-dblock-write:block-macro (params)
  (let ((block-name (or (plist-get params :from) "macros"))
        (org-buf (current-buffer)))
    (with-temp-buffer
      (let ((tmp-buf (current-buffer)))
        ;; Copy the body of the named source block into the temp buffer.
        (set-buffer org-buf)
        (save-excursion
          (org-babel-goto-named-src-block block-name)
          (org-babel-mark-block)
          (let ((mblock-begin (region-beginning))
                (mblock-end (region-end)))
            (set-buffer tmp-buf)
            (insert-buffer-substring org-buf mblock-begin mblock-end)))
        ;; HTML version: the macros wrapped in a math environment.
        (set-buffer org-buf)
        (insert "#+BEGIN_HTML\n\\(\n")
        (insert-buffer-substring tmp-buf)
        (insert "\\)\n#+END_HTML\n")
        ;; LaTeX version: prefix every line with #+LATEX_HEADER:.
        (set-buffer tmp-buf)
        (goto-char (point-min))
        (while (re-search-forward "^" nil t)
          (replace-match "#+LATEX_HEADER: " nil nil))
        (set-buffer org-buf)
        (insert-buffer-substring tmp-buf)))))
Org file
Somewhere in the file, create:
A LaTeX source block named "macros" containing your macros
An empty block-macro dynamic block
You can change the name of the source block, and use a :from <custom-name> header argument in the dynamic block. Also, note the :exports none in the source block (usually you don't want to export the LaTeX source).
#+NAME: macros
#+BEGIN_SRC latex :exports none
\newcommand\a{a}
\def\b{b}
\DeclareMathOperator\c{c}
#+END_SRC
#+BEGIN: block-macro
#+END:
Now use C-c C-c with the point on the dynamic block, and it will update to:
#+BEGIN: block-macro
#+BEGIN_HTML
\(
\newcommand\a{a}
\def\b{b}
\DeclareMathOperator\c{c}
\)
#+END_HTML
#+LATEX_HEADER: \newcommand\a{a}
#+LATEX_HEADER: \def\b{b}
#+LATEX_HEADER: \DeclareMathOperator\c{c}
#+LATEX_HEADER:
#+END:
Do this whenever the macros are modified.
I usually have a file with many custom definitions which I reuse for many documents and I want to use it also in my org documents.
The following is a modification of Blout's answer, so please read his answer for more info.
Define macro
(defun org-dblock-write:insert-latex-macros (params)
  (let ((text)
        (file (plist-get params :file)))
    ;; Read the macro file as a list of non-empty lines.
    (with-temp-buffer
      (insert-file-contents file)
      (setq text (split-string (buffer-string) "\n" t)))
    (insert (mapconcat (lambda (str) (concat "#+LATEX_HEADER: " str)) text "\n"))
    (insert "\n#+BEGIN_HTML\n\\(\n")
    (insert (mapconcat 'identity text "\n"))
    (insert "\n\\)\n#+END_HTML")))
Usage
File macros.tex:
\newcommand\a{a}
\def\b{b}
\DeclareMathOperator\c{c}
In the Org file (shown after updating the block with C-c C-c):
#+BEGIN: insert-latex-macros :file "macros.tex"
#+LATEX_HEADER: \newcommand\a{a}
#+LATEX_HEADER: \def\b{b}
#+LATEX_HEADER: \DeclareMathOperator\c{c}
#+BEGIN_HTML
\(
\newcommand\a{a}
\def\b{b}
\DeclareMathOperator\c{c}
\)
#+END_HTML
#+END:

Starting any Emacs buffer with a .c extension with a template

I write a lot of short throwaway programs, and one of the things I find myself doing repeatedly is typing out code like
#include <stdio.h>
#include <stdlib.h>
int main(void){
}
To save some tendon hits, I was wondering if it was possible to insert the simple template above whenever I create a buffer with the .c extension.
Put something like this in your .emacs:
(define-skeleton c-throwaway
  "Throwaway C skeleton"
  nil
  "#include <stdio.h>\n"
  "#include <stdlib.h>\n"
  "\n"
  "int main(void){\n"
  "\n"
  "}\n")
And eval (C-x C-e) it. That'll give you a function (c-throwaway) that inserts your template.
To have it inserted automatically, you'll need to activate auto-insert-mode. Once you do this, you can describe-variable auto-mode-alist and read up on how Emacs does some of its open-file magic. Then define an entry in auto-insert-alist to apply the skeleton when you find a new file.
Maybe something like this:
(define-auto-insert "\\.\\([Cc]\\|cc\\|cpp\\)\\'" 'c-throwaway)
More detail:
Auto Insert Mode
Autotype
I use template.el from http://emacs-template.sourceforge.net/
Basically, I create a file called ~/.templates/TEMPLATE.c, and then that gets inserted into my .c files. You can also use special markup and arbitrary lisp expressions, if you don't just want to dump text into the buffer. I use this feature so that Perl modules start with "package Foo::Bar" when they are named lib/Foo/Bar.pm. Very handy.
The following function will ask for a filename and then insert a file and change to c-mode. The only problem is that you have to call this function to create the buffer instead of your normal way.
(defun defaultCtemplate (cfilename)
  (interactive "sFilename: ")
  (switch-to-buffer (generate-new-buffer cfilename))
  (insert-file-contents "~/Desktop/test.c")
  (c-mode))
P.S Thanks for the question, now I know how to do this for myself :)
You can also use the YASnippet template system for Emacs, which has a built-in template called main. So while writing your code, just type main, hit TAB, and it will expand to the form you want. (And you can always write your own snippet templates.)
I use the following code to create files from templates. There are several templates, in which placeholders are substituted with actual file names, etc.
This question is old, but this might help someone. Looking at this site, I copied and pasted this part into my .emacs file:
;; automatic insertion of templates
(require 'autoinsert)
(auto-insert-mode) ;;; Adds hook to find-file-hook
(setq auto-insert-directory "~/Documents/Emacs/templates/") ;;; *NOTE* Trailing slash important
;;(setq auto-insert-query nil) ;;; If you don't want to be prompted before insertion
;; The dot must be double-escaped inside a Lisp string; \\' anchors at the end of the file name.
(define-auto-insert "\\.tex\\'" "my-latex-template.tex")
(define-auto-insert "\\.cpp\\'" "my-cpp-template.cpp")
(define-auto-insert "\\.h\\'" "my-cpp-template.h")
After changing the directory and filenames, it works perfectly.
Here's how I do it (because I didn't know about auto-insert-mode :-)
(require 'tempo)
(setq c-new-buffer-template
      '("#include <stdio.h>\n"
        "#include <stdlib.h>\n"
        "\n"
        "int main(void){\n"
        "\n"
        "}\n"))
(defun my-c-style ()
  "My editing style for .c files."
  (c-mode)
  (if (zerop (buffer-size))
      (tempo-template-c-skeleton)))
(setq auto-mode-alist
      (cons '("\\.c\\'" . my-c-style) auto-mode-alist))
(tempo-define-template "c-skeleton" c-new-buffer-template
                       nil
                       "Insert a skeleton for a .c document")
I use a combination of Defaultcontent.el and YASnippet.el. The former fills brand-new files with default content. The latter is a sort of lightweight code-gen macro thing. Key in "for" and hit TAB and the skeleton of a for loop is inserted. Etc. You can define your own snippets pretty easily. "swi TAB" gets you a complete switch statement. And so on.
YASnippet is good at expanding templates in your file, and makes it very easy to create your own snippets.
auto-insert is good at filling a new file with a template, but writing your own template is hard. The yatemplate package bridges the gap between YASnippet and auto-insert-mode, so I can write auto-insert rules with YASnippet.