How to efficiently escape meta-characters in vim search - perl

Perl provides the quotemeta function, as well as the \Qlots-of-meta-characters\E construct, which makes sure that all the characters between \Q and \E are interpreted as literals.
I very often search for strings full of meta-characters in Vim, and escaping every special character individually is counterproductive. Is there anything like /\Qstring-to-search\E in Vim that would make life easier?

You can use /\Vstring-to-search.
There are two caveats:
\ is special. You can still use all regex metacharacters by putting a \ in front of them.
There is no \E equivalent. \V affects the rest of the regex.
See :help /\V.
You could combine this with the code from the answer in https://stackoverflow.com/a/676619/1848654 as follows:
vnoremap <C-f> "hy/\V<C-r>=substitute(@h,'[\/]','\\&','g')<cr>
The idea is:
Copy ("yank") the selected text into register h: "hy
Start search mode: /
Prefill the beginning of the regex: \V
Insert the contents of a register: <C-r>
Don't use a real register; take the result of evaluating an expression instead: =
Our expression (terminated by <cr>) is: substitute(@h,'[\/]','\\&','g')
Take the contents of the h register: @h
Apply a substitution. Insert a \ before every \ and /: substitute(...,'[\/]','\\&','g')
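The escaping step itself (a backslash in front of every \ and /) can be tried outside Vim. The following is only a rough sketch using sed rather than Vim's substitute(); the sample text is made up for illustration.

```shell
# Hypothetical sample text containing both a slash and a backslash.
text='path/to\file'

# Put a backslash in front of every \ and /.
# Using # as the s delimiter means / needs no escaping in the command itself.
escaped=$(printf '%s\n' "$text" | sed 's#[\\/]#\\&#g')

printf '%s\n' "$escaped"
```

The result is the string that would follow \V in the search prompt: only \ and / still need protecting there, since \V turns off the special meaning of everything else.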

Related

How to use capture groups with sed?

I'm trying to replace some text in a file using sed but I'm having troubles.
sed -ir 's/(\$hello = )true/\1false/' /path/to/my/file.txt gives the error sed: -e expression #1, char 27: invalid reference \1 on 's' command's RHS.
I want to replace $hello = true with $hello = false, so in order to avoid typing $hello = twice I wanted to use capture groups - which isn't working.
What am I doing wrong?
If the r in -ir was meant to enable extended regex mode, then you don't have to escape the parentheses. However, if you want both -i and -r, you have to keep them apart (-i -r) or write -ri instead of -ir, because with -ir the part after -i is interpreted as an optional backup suffix.
From the sed manual:
Because -i takes an optional argument, it should
not be followed by other short options:
sed -Ei '...' FILE
Same as -E -i with no backup suffix - FILE will be edited in-place without creating a backup.
sed -iE '...' FILE
This is equivalent to --in-place=E, creating FILEE as backup
of FILE
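As a minimal sketch of the point above, assuming GNU sed (BSD/macOS sed handles -i differently) and a throwaway temp file invented for the demo, keeping -E and -i separate lets the unescaped capture group work:

```shell
# Assumption: GNU sed. The temp file and its contents are made up for this demo.
tmp=$(mktemp)
printf '%s\n' '$hello = true' > "$tmp"

# -E and -i are separate options, so -i receives no backup suffix
# and the pattern is parsed as an extended regex.
sed -E -i 's/(\$hello = )true/\1false/' "$tmp"

result=$(cat "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

With -ir instead, the r would be taken as the backup suffix, the pattern would be parsed as a basic regex, and \1 would have no group to refer to, which matches the error in the question.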
In basic regex mode (the default, without -r or -E), you must escape the parentheses with backslashes, \(...\), for them to act as grouping.
See THE SED FAQ; section "3.1.2. Escape characters on the right side of "s///"" has an example:
3.1.2. Escape characters on the right side of "s///"
The right-hand side (the replacement part) in "s/find/replace/" is
almost always a string literal, with no interpolation of these
metacharacters:
. ^ $ [ ] { } ( ) ? + * |
Three things are interpolated: ampersand (&), backreferences, and
options for special seds. An ampersand on the RHS is replaced by
the entire expression matched on the LHS. There is never any
reason to use grouping like this:
s/\(some-complex-regex\)/one two \1 three/
And later in section "F. GNU sed v2.05 and higher versions":
F. GNU sed v2.05 and higher versions
...
Undocumented -r switch:
Beginning with version 3.02, GNU sed has an undocumented -r switch
(undocumented till version 4.0), activating Extended Regular
Expressions in the following manner:
? - 0 or 1 occurrence of previous character
+ - 1 or more occurrences of previous character
| - matches the string on either side, e.g., foo|bar
(...) - enable grouping without backslash
{...} - enable interval expression without backslash
When the -r switch (mnemonic: "regular expression") is used, prefix
these symbols with a backslash to disable the special meaning.
For documentation of regular expression syntax used in (GNU) sed, see Overview of basic regular expression syntax
5.3 Overview of basic regular expression syntax
...
\(regexp\)
Groups the inner regexp as a whole, this is used to:
Apply postfix operators, like \(abcd\)*: this will search for zero or more whole sequences of ‘abcd’, while abcd* would search for ‘abc’ followed by zero or more occurrences of ‘d’. Note that support for \(abcd\)* is required by POSIX 1003.1-2001, but many non-GNU implementations do not support it and hence it is not universally portable.
Use back references (see below).
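To make the two grouping syntaxes concrete, here is a small sketch that runs the substitution from the question on a sample line through a pipe (no in-place editing), once as a basic regex and once as an extended regex. The -E flag is assumed to be available, as it is in current GNU and BSD sed.

```shell
input='$hello = true'

# Basic regex (default): parentheses must be escaped to form a group.
bre=$(printf '%s\n' "$input" | sed 's/\(\$hello = \)true/\1false/')

# Extended regex (-E, or -r in GNU sed): plain parentheses form a group.
ere=$(printf '%s\n' "$input" | sed -E 's/(\$hello = )true/\1false/')

printf '%s\n' "$bre" "$ere"
```

Both commands should print the same line; the only difference is which form of the parentheses is treated as special.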

Sed special characters

I wanted to change the home directory that I have in some files.
I figured that I would have to escape the curly brackets, like this:
sed -i 's/\$\{HOME\}/\/casper\/home/g' /var/tmp/casper.txt
Everything in my experience with sed told me that I would have to escape the brackets, but I did not. The only thing that I needed to escape was the dollar sign:
sed -i 's/\${HOME}/\/casper\/home/g' /var/tmp/casper.txt
What kind of regex engine does sed use, and where is a list of the special characters that do and do not need to be escaped?
I am not an expert, just a user of regexps, so I cannot give you a strict technical answer, but according to info sed, if sed is invoked without -r (--regexp-extended), then it uses basic regular expressions. If -r is given, extended regular expressions are used. There is an explanation in the info appendix:
The only difference between basic and extended regular expressions is in
the behavior of a few characters: '?', '+', parentheses, braces ('{}'),
and '|'. While basic regular expressions require these to be escaped if
you want them to behave as special characters, when using extended
regular expressions you must escape them if you want them _to match a
literal character_.
By the way, if you need to put a slash inside sed's regex, it is very useful to use a different symbol as the separator. It doesn't need to be a slash; you can choose the symbol arbitrarily, for example #. So instead of this:
sed -i 's/\${HOME}/\/casper\/home/g' /var/tmp/casper.txt
you can make this:
sed -i 's#\${HOME}#/casper/home#g' /var/tmp/casper.txt
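A quick way to check both points (literal braces in basic mode, and the alternate separator) is to run the commands on a sample line through a pipe instead of editing a file. This is just a sketch; the input line is invented, and -E is assumed to be supported by the local sed.

```shell
# Hypothetical input line; single quotes keep ${HOME} from being expanded by the shell.
line='export PATH=${HOME}/bin'

# Basic regex: braces are literal, only the dollar sign is escaped.
slash=$(printf '%s\n' "$line" | sed 's/\${HOME}/\/casper\/home/g')

# Same substitution with # as the separator: no escaped slashes needed.
hash=$(printf '%s\n' "$line" | sed 's#\${HOME}#/casper/home#g')

# Extended regex: braces are special, so they must be escaped to match literally.
ere=$(printf '%s\n' "$line" | sed -E 's#\$\{HOME\}#/casper/home#g')

printf '%s\n' "$slash" "$hash" "$ere"
```

All three variants should produce the same output line.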

Why does this sed command contain # symbols?

I don't understand why the following sed command contains a # symbol:
sed 's#session\s*required\s*pam_loginuid.so#session optional pam_loginuid.so#g' -i /etc/pam.d/sshd
I've looked at /etc/pam.d/sshd for the before/after effects of this command:
BEFORE:
...
# Set the loginuid process attribute.
session required pam_loginuid.so
...
AFTER:
...
# Set the loginuid process attribute.
session optional pam_loginuid.so
....
Is the # symbol possibly part of regex or sed syntax?
I could not find any documentation on this.
Note: The above sed command is actually part of a Dockerfile RUN command in tutorial:
https://docs.docker.com/examples/running_ssh_service/
These are alternate delimiters for the regular expressions and replacement string. Handy when your regex or replacement string includes '/'.
From the sed manual
The syntax of the s (as in substitute) command is ‘s/regexp/replacement/flags’. The / characters may be uniformly replaced by any other single character within any given s command. The / character (or whatever other character is used in its stead) can appear in the regexp or replacement only if it is preceded by a \ character.
From the POSIX specification:
[2addr]s/BRE/replacement/flags
Substitute the replacement string for instances of the BRE in the pattern space. Any character other than <backslash> or <newline> can be used instead of a <slash> to delimit the BRE and the replacement. Within the BRE and the replacement, the BRE delimiter itself can be used as a literal character if it is preceded by a <backslash>.
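As an illustrative sketch of the rules quoted above, the command from the question can be run on a sample line without -i, followed by a second substitution that escapes the delimiter so it matches literally. The sample inputs are made up, and \s is a GNU sed extension, so GNU sed is assumed.

```shell
# Hypothetical input resembling the line in /etc/pam.d/sshd.
before='session    required     pam_loginuid.so'

# '#' only separates the regexp, the replacement and the flags.
after=$(printf '%s\n' "$before" | sed 's#session\s*required\s*pam_loginuid.so#session optional pam_loginuid.so#g')

# To match the delimiter itself, precede it with a backslash.
literal=$(printf '%s\n' 'a#b' | sed 's#a\#b#a-b#')

printf '%s\n' "$after" "$literal"
```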
As others have said, it is a delimiter other than the traditional / in the s/// command. This is typically used when / is part of the pattern or replacement, for example when searching for (or replacing with) a Unix path, where every / would otherwise need escaping:
s/\/my\/path/\/Your\/path/
# same as
s#my/path#/Your/path#
The delimiter is usually a non-alphanumeric character (though alphanumeric ones work too). The only sensible constraint is to avoid characters with a special meaning in regexes (such as ^$[]{}()+\*.): the command still works, but it becomes hard to read, and you lose that character's special function inside the pattern. For example:
echo "b(a)l" | sed 's(.)()('

PostgreSQL regexp_replace with matched expression

I am using the PostgreSQL regexp_replace function to escape square brackets, parentheses and backslashes in a string so that I can use that string as a regex pattern itself (there are other manipulations done on this string before using it, but they are outside the scope of this question). The idea is to replace:
[ with \[
] with \]
( with \(
) with \)
\ with \\
Postgres documentation page on regular expressions states the following:
The replacement string can contain \n, where n is 1 through 9, to
indicate that the source substring matching the n'th parenthesized
subexpression of the pattern should be inserted, and it can contain \&
to indicate that the substring matching the entire pattern should be
inserted. Write \\ if you need to put a literal backslash in the
replacement text.
However regexp_replace('abc [def]', '([\[\]\(\)\\])', E'\\\1', 'g'); produces abc \ def\.
Further down on that same page, an example is given, which uses \\1 notation - so I tried that.
Yet, regexp_replace('abc [def]', '([\[\]\(\)\\])', E'\\\\1', 'g'); produces abc \1def\1.
I would guess this is expected, but regexp_replace('abc [def]', '([\[\]\(\)\\])', E'.\\1', 'g'); produces abc .[def.]. That is, escaping works with characters other than the standard backslash.
At this point I don't know how to proceed. What can I do to actually give me the replacement I want?
OK, found the answer. Apparently, I need to double-escape the backslash in the replacement. Also, I need to E-prefix and double-escape backslashes in the search pattern on older versions of postgres (8.3 in my case). The final code looks like this:
regexp_replace('abc [def]', E'([\\[\\]\\(\\)\\\\\?\\|_%])', E'\\\\\\1', 'g')
Yes, it looks horrible, but it works :)
The simplest way is:
select regexp_replace('abc [def]', '([\[\]\(\)\\])', '\\\1', 'g')

How can I get sed to remove `\` followed by anything?

I am trying to write a sed script to convert LaTeX coded tables into tab delimited tables.
To do this I need to convert & into \t and strip out anything that is preceded by \.
This is what I have so far:
s/&/\t/g
s/\*/" "/g
The first line works as intended. In the second line I try to replace \ followed by anything with a space but it doesn't alter the lines with \ in them.
Any suggestions are appreciated. Also, can you briefly explain what suggested scripts "say"? I am new to sed and that really helps with the learning process!
Thanks
Assuming you're running this as a sed script, and not directly on the command line:
s/\\.*/ /g
Explanation:
\\ - a double backslash matches a literal backslash (a single \ is interpreted as "escape the following character"), followed by .* (. matches any single character, * repeats it arbitrarily many times).
You need to escape the backslash, as it is a special character.
To denote "any character", use . (a period).
The second expression should be:
s/\\.//g
This assumes you want to strip only the character after the backslash (along with the backslash itself). If you want to delete everything in the line after the backslash, add a star (*) after the period.
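The difference between the two suggested expressions is easiest to see on a concrete line. The sketch below runs both from the command line in single quotes, so the shell passes the backslashes to sed unchanged; the sample row is invented for illustration.

```shell
# Hypothetical LaTeX-like row.
row='1 & 2 \hline end'

# Backslash followed by everything up to the end of the line, replaced by one space.
to_eol=$(printf '%s\n' "$row" | sed 's/\\.*/ /')

# Backslash plus exactly one following character, removed wherever it occurs.
one_char=$(printf '%s\n' "$row" | sed 's/\\.//g')

# Brackets make the trailing spaces visible.
printf '[%s]\n' "$to_eol" "$one_char"
```

The first form drops the whole LaTeX command and anything after it on the line, while the second removes only the backslash and the single character that follows it, leaving the rest of the command name behind.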