Emacs: Convert string with multiple manipulations - emacs

I know, ridiculous title. Here's what I need to do:
Take strings that look like this:
* [1.1 Training]()
* [1.1.1 Special Training by Category]()
* [A. New Hire Orientation Program]()
And add .md suffix to each and place results in the parens, like this:
* [1.1 Training](training.md)
* [1.1.1 Special Training by Category](special_training_by_category.md)
* [A. New Hire Orientation Program](new_hire_orientation_program.md)
I'm using Emacs and need a macro that will:
Search each string (line)
Copy everything after the chapter number, chapter letter, or period, and before the closing " ] "
Transform it to lower case, remove spaces and replace them with underscores
Add " .md " to the results
Paste results between the parens
I'm trying to learn regex and string manipulation, but this one seems pretty involved. Any help is appreciated.
Thanks

Depending on how often you need to do this, you might want to write a command for it. But it can be done with query-replace-regexp (C-M-%) with regexp \[\([[:alnum:].]+\) \([[:alnum:] ]+\)\]() and replacement [\1 \2](\,(replace-regexp-in-string " " "_" (downcase \2)).md). Notice the \,: it allows you to use elisp to transform strings.
Links:
https://www.gnu.org/software/emacs/manual/html_node/emacs/Regexps.html
https://www.gnu.org/software/emacs/manual/html_node/emacs/Regexp-Backslash.html
https://www.gnu.org/software/emacs/manual/html_node/emacs/Regexp-Replace.html

Related

officejs : Search Word document using regular expression

I want to search strings like "number 1" or "number 152" or "number 36985".
In all above strings "number " will be constant but digits will change and can have any length.
I tried Search option using wildcard but it doesn't seem to work.
basic regEx operators like + seem to not work.
I tried 'number*[1-9]*' and 'number*[1-9]+' but no luck.
This regular expression only selects upto one digit. e.g. If the string is 'number 12345' it only matches number 12345 (the part which is in bold).
Does anyone know how to do this?
Word doesn't use regular expressions in its search (Find) functionality. It has its own set of wildcard rules. These are very similar to RegEx, but not identical and not as powerful.
Using Word's wildcards, the search text below locates the examples given in the question. (Note that the semicolon separator in 1;100 may be soemthing else, depending on the list separator set in Windows (or on the Mac). My European locale uses a semicolon; the United States would use a comma, for example.
"number [0-9]{1;100}"
The 100 is an arbitrary number I chose for the maximum number of repeats of the search term just before it. Depending on how long you expect a number to be, this can be much smaller...
The logic of the search text is: number is a literal; the valid range of characters following the literal are 0 through 9; there may be one to one hundred of these characters - anything in that range is a match.
The only way RegEx can be used in Word is to extract a string and run the search on the string. But this dissociates the string from the document, meaning Word-specific content (formatting, fields, etc.) will be lost.
Try putting < and > on the ends of your search string to indicate the beginning and ending of the desired strings. This works for me: '<number [1-9]*>'. So does '<number [1-9]#>' which is probably what you want. Note that in Word wildcards the # is used where + is used in other RegEx systems.

How to escape double quote?

In org mode, if I want to format text a monospace verbatim, i.e. ~...~, if it is inside quotes: ~"..."~, it is not formatted (left as is).
Also, are quotes a reserved symbol, if so, what do they mean? (they don't seem to affect the generated HTML / inside Emacs display).
The culprit in this case is the regular expression in org-emph-re org-verbatim-re, responsible for determining if a sequence of characters in the document is to be set verbatim or not.
org-verbatim-re is a variable defined in `org.el'.
Its value is
"\([ ('\"{]\|^\)\(\([=~]\)\([^
\n,\"']\|[^
\n,\"'].?\(?:\n.?\)\{0,1\}[^
\n,\"']\)\3\)\([- .,:!?;'\")}\]\|$\)"
quotes and double quotes are explicitly forbidden inside verbatim characters =~ by
[^
\n,\"']\|[^
\n,\"']
I found discussions dating back 3 years comming to the conclusion that you have to tinker with this regular expression and set the variable org-emph-re/org-verbatim-re to something that matches your wishes in your emacs setup (maybe a file local variable works as well). You can experiment by excluding double quotes from the excluding character classes and outside matches as in
"\([ ('{]\|^\)\(\([*/_=~+]\)\([^
\n,']\|[^
\n,'].?\(?:\n.?\)\{0,1\}[^
\n,']\)\3\)\([- .,:!?;')}\]\|$\)"
but looking at that regex, heaven knows what happens to complex documents -- you have to try...
Edit: as it happens, if I evalute the following as region, quotes inside = are exported correctly, but nothing else is :-), I investigate further when I have more time.
(setq org-emph-re "\([ ('{]\|^\)\(\([*/_=~+]\)\([^
\n,']\|[^
\n,'].?\(?:\n.?\)\{0,1\}[^
\n,']\)\3\)\([- .,:!?;')}]\|$\)")
Edit 2:: Got it to work by changing org.el directly:
Change the line following (defvar org-emphasis-regexp-components from '(" \t('\"{" "- \t.,:!?;'\")}\\" " \t\r\n,\"'" "." 1) to '(" \t('{" "- \t.,:!?;')}\\" " \t\r\n,'" "." 1) and recompile org then restart emacs.
This was a defcustom prior to the 8.0 release, it isn't anymore, so you have to live with this manual modification.
regards,
Tom
Finally, I found a solution from http://comments.gmane.org/gmane.emacs.orgmode/82571
According to that thread, the regexp for verbatim is built from variable org-emphasis-regexp-components, which defines legal characters before, after, at the border of, or in the body of emphasis; and verbatim is one of the emphasis environment in org mode.
A workable setting given by that thread:
(setcar (nthcdr 2 org-emphasis-regexp-components) " \t\n,")
(custom-set-variables `(org-emphasis-alist ',org-emphasis-alist))
For small amounts of characters which have some unwanted effect in Emacs org-mode (because being metacharacters) it may be helpful to have a look at special symbols in org-mode (org-entities.el).
So for example " can be encoded by \quot{} (where the braces pair at the end is not mandatory, but needed if no whitespace follows).
So instead ="..."= you would write =\quot{}...\quot{}=.
That is some typing more and looks pretty ugly. But for the latter org-mode has a solution: by C-c C-x \ you can toggle a display magic for those symbols. If the magic is active, so directly after typing \quot{} resp. \quot{} a " will be displayed.
Besides, this symbols list can easily be extended, f.e.
(add-to-list 'org-entities
'("backslash" "\\textbackslash" nil "\\" "\\" "\\" "\\"))
Nevertheless I am heavily missing easier escaping in org-mode, besides the above solution and besides escaping a whole line by a : at its beginning.
I'd be happy if =verbatim= in all cases would leave the text between the ='s unchanged. Not =this*bold*text=, but =this *bold* text=. Like we know that from each well-designed markup/-down language.
But, of course, this is better placed at the org-mode development pages. Ideally with a fitting patch... :-)
I've met similar problem, and thanks #chaiko for a basic solution. However, #chaiko's solution only work for org-mode's fontification, it doesn't affect org-export. To get correct exported document, you need to do some more extra hack to org-mode's parser by (org-element--set-regexps).
So the full code snippets should be something like:
(setcar (nthcdr 2 org-emphasis-regexp-components) " \t\n\r")
(custom-set-variables `(org-emphasis-alist ',org-emphasis-alist))
(org-element--set-regexps)
I've integrated this to my oh-my-emacs project: https://github.com/xiaohanyu/oh-my-emacs/blob/e82fce10d47f7256df6d39e32ca288d0ec97a764/core/ome-org.org#code-block-fontification .

Regular Expression for number.(space), objective-c

I have an NSArray of lines (objective-c iphone), and I'm trying to find the line which starts with a number, followed by a dot and a space, but can have any number of spaces (including none) before it, and have any text following it eg:
1. random text
2. text random
3.
what regular expression would I use to get this? (I'm trying to learn it, and I needed the above expression anyway, so I thought I'd use it as an example)
With C#:
#"^ *[0-9]+\. "
It doesn't check for the presence of something after the ., so this is legal:
1.(space)
If you delete the # and escape the \ it should work with other languages (it is pretty "down-to-earth" as RegExpes go)
I may suggest (Perl-compatible regexp):
^\s*\d+\.\s
At the beginning of a line:
Any number (0-n) of spaces
One or more digits
A dot
A space
Something like
^\s*\d+\.
But it depends on the language.
/^\s*[0-9]+\.\s+/
would be my guess providing you don't have any space before the number

Comment token inside string

In pig etc. /* begins a block comment. If I put this in a regex string 'blah/blah/*', emacs thinks this is a block comment and syntax highlighting goes to hell. I am not familiar with elisp but I am certain that is a problem with script that is providing annotations for pig.
How can I fix it?
phils pointed out a better designed major mode in the question comments, but since you are still curious: The pig mode version you are using doesn't have the syntax table set up right. The most reliable way for emacs to recognize comments and strings is to use the syntax table to map characters to start/end of comments and strings. The version you are using is trying to do it with font-lock.
You have to escape the \'es and the *. All the characters that are used by the regexp engine, have to be escaped.
If you want to match "\", you might have to write "\\" when using replace-regexp interactively and "\\\\" if you use it as a lisp function.
(I even have to escape my escapes in this comment, so there are 8 escapes in the last escape sequence above)

How to use '^#' in Vim scripts?

I'm trying to work around a problem with using ^# (i.e., <ctrl-#>) characters in Vim scripts. I can insert them into a script, but when the script runs it seems the line is truncated at the point where a ^# was located.
My kludgy solution so far is to have a ^# stored in a variable, then reference the variable in the script whenever I would have quoted a literal ^#. Can someone tell me what's going on here? Is there a better way around this problem?
That is one reason why I never use raw special character values in scripts. While ^# does not work, string <C-#> in mappings works as expected, so you may use one of
nnoremap <C-#> {rhs}
nnoremap <Nul> {rhs}
It is strange, but you cannot use <Char-0x0> here. Some notes about null byte in strings:
Inserting null byte into string truncates it: vim uses old C-style strigs that end with null byte, thus it cannot appear in strings. These strings are very inefficient, so if you want to generate a very large text, try accumulating it into a list of lines (using setline is very fast as buffer is represented as a list of lines).
Most functions that return list of strings (like readfile, getline(start, end)) or take list of strings (like writefile, setline, append) treat \n (NL) as Null. It is also the internal representation of buffer lines, see :h NL-used-for-Nul.
If you try to insert \n character into the command-line, you will get Null shown (but this is really a newline). If you want to edit a file that has \n in a filename (it is possible on *nix), you will need to prepend newline with backslash.
The byte ctrl-# is also known as '\0'. Many languages, programs, etc. use it as an "end of string" marker, so it's not surprising that vim gets confused there. If you must use this byte in the middle of a script string, it sounds like your workaround is a decent one.