PostgreSQL regexp_replace with matched expression - postgresql

I am using the PostgreSQL regexp_replace function to escape square brackets, parentheses and backslashes in a string so that I can use that string as a regex pattern itself (there are other manipulations done on this string before using it, but they are outside the scope of this question). The idea is to replace:
[ with \[
] with \]
( with \(
) with \)
\ with \\
Postgres documentation page on regular expressions states the following:
The replacement string can contain \n, where n is 1 through 9, to
indicate that the source substring matching the n'th parenthesized
subexpression of the pattern should be inserted, and it can contain \&
to indicate that the substring matching the entire pattern should be
inserted. Write \\ if you need to put a literal backslash in the
replacement text.
However regexp_replace('abc [def]', '([\[\]\(\)\\])', E'\\\1', 'g'); produces abc \ def\.
Further down on that same page, an example is given, which uses \\1 notation - so I tried that.
Yet, regexp_replace('abc [def]', '([\[\]\(\)\\])', E'\\\\1', 'g'); produces abc \1def\1.
I would guess this is expected, but regexp_replace('abc [def]', '([\[\]\(\)\\])', E'.\\1', 'g'); produces abc .[def.]. That is, escaping works with characters other than the standard backslash.
At this point I don't know how to proceed. What can I do to actually give me the replacement I want?

OK, found the answer. Apparently, I need to double-escape the backslash in the replacement. Also, I need to E-prefix and double-escape backslashes in the search pattern on older versions of postgres (8.3 in my case). The final code looks like this:
regexp_replace('abc [def]', E'([\\[\\]\\(\\)\\\\\?\\|_%])', E'\\\\\\1', 'g')
Yes, it looks horrible, but it works :)

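The simplest way, assuming standard_conforming_strings is on (the default since PostgreSQL 9.1), so that backslashes in a plain '...' literal are taken literally: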
select regexp_replace('abc [def]', '([\[\]\(\)\\])', '\\\1', 'g')
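
The two layers involved here (the string literal, then the replacement template) are not specific to PostgreSQL. As a rough analogue only, the same substitution can be sketched with Python's re module, where a raw string plays the role of a plain '...' literal. The helper name escape_for_regex is made up for this sketch, and Python's regex syntax is not identical to PostgreSQL's.

```python
import re

def escape_for_regex(text):
    # Hypothetical helper: put a backslash before each of [ ] ( ) and backslash.
    # Pattern layer: one capturing group around a character class.
    # Replacement layer: \\ is one literal backslash, \1 is the captured character.
    return re.sub(r'([\[\]()\\])', r'\\\1', text)

escaped = escape_for_regex('abc [def]')
print(escaped)  # abc \[def\]
```

The escaped result can then be used as a pattern that matches the original string literally.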

Oddities in fail2ban regex

This appears to be a bug in fail2ban: the fail2ban-regex tool and a failregex filter behave differently.
I am attempting to develop a new regex rule for fail2ban, to match:
\"%20and%20\"x\"%3D\"x
When using fail2ban-regex, this appears to produce the desired result:
^<HOST>.*GET.*\\"%20and%20\\"x\\"%3D\\"x.* 200.*$
As does this:
^<HOST>.*GET.*\\\"%20and%20\\\"x\\\"%3D\\\"x.* 200.*$
However, when I put either of these into a filter, I get the following error:
Failed during configuration: '%' must be followed by '%' or '(', found:…
To make this work in a filter you have to double up each ‘%’, i.e. write ‘%%’:
^<HOST>.*GET.*\\\"%%20and%%20\\\"x\\\"%%3D\\\"x.* 200.*$
While this gets the required hits running as a filter, it gets none running through fail2ban-regex.
I tried the \\\\ as Andre suggested below, but this gets no results in fail2ban-regex.
So, as this appears to be differential behaviour, I am going to file it as a bug.
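
One possible explanation for the difference, offered as an assumption rather than a confirmed diagnosis: the wording of the error matches the one raised by Python's configparser when its % interpolation meets a lone %. If filter files are read through configparser while a pattern passed directly to fail2ban-regex is not, the pattern would need %% in the file and a single % on the command line. The sketch below only demonstrates the configparser behaviour; the section name and the helper read_failregex are made up for illustration.

```python
import configparser

def read_failregex(line):
    # Hypothetical helper: parse a one-option config and return the option
    # value after configparser's default % interpolation has been applied.
    parser = configparser.ConfigParser()
    parser.read_string("[Definition]\nfailregex = " + line + "\n")
    return parser.get("Definition", "failregex")

doubled = r'^<HOST>.*GET.*\\\"%%20and%%20\\\"x\\\"%%3D\\\"x.* 200.*$'
single = doubled.replace("%%", "%")

# With %% in the file, interpolation hands back a pattern with single % signs.
resolved = read_failregex(doubled)

# With a lone % the lookup fails with an interpolation syntax error.
try:
    read_failregex(single)
    error = None
except configparser.InterpolationSyntaxError as exc:
    error = str(exc)

print(resolved)
print(error)
```

If that assumption holds, the two tools would be applying the same regex and differ only in whether the text passes through interpolation first.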
According to Python's own site, a single backslash "\" has to be written as "\\\\", and there's no mention of %.
Regular expressions use the backslash character ('\') to indicate
special forms or to allow special characters to be used without
invoking their special meaning. This collides with Python’s usage of
the same character for the same purpose in string literals; for
example, to match a literal backslash, one might have to write '\\\\'
as the pattern string, because the regular expression must be \\, and
each backslash must be expressed as \\ inside a regular Python string
literal.
I would just go with:
failregex = (?i)^<HOST> -.*"(GET|POST|HEAD|PUT).*20and.*3d.*$
The .* will match anything in between anyway, and (?i) makes the entire regex case-insensitive.

confused about what must be escaped for sed

I want to replace specific strings in PHP files automatically using sed. Some replacements work and some do not. I have already established that the problem is not the replacement string but the string to be replaced. I tried escaping [ and ] with no success. The culprit seems to be the whitespace inside the (), not whitespace in general: the first spaces (around the =) cause no problems. Can someone please point me to the problem?
sed -e "1,\$s/$adm = substr($path . rawurlencode($upload['name']) , 16);/$adm = rawurlencode($upload['name']); # fix 23/g" -i administration/identify.php
I also tried shortening the string to be replaced: if I cut it directly after $path it works, but with the following whitespace included it does not. Escaping the whitespace has no effect...
what must be escaped for sed
The following characters have special meaning in sed and have to be escaped with \ for the regex to be taken literally:
\
[
the character used to separate the parts of the s command, i.e. / here
.
*
& (in the replacement string only)
The newline character is handled specially as the end of the string, but can be matched with \n.
So first escape all special characters in input and then pass it to sed:
# Single quotes keep the shell from expanding $adm, $path and $upload
rgx='$adm = substr($path . rawurlencode($upload['"'"'name'"'"']) , 16);'
rgx_escaped=$(sed 's/[\\\[\.\*\/&]/\\&/g' <<<"$rgx")
sed "s/$rgx_escaped/ etc."
See Escape a string for a sed replace pattern for a generic escaping solution.
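
The escaping step can also be sketched outside the shell, which avoids the quoting problem altogether. This is a minimal Python sketch under the assumption that only the characters listed in this answer need a backslash; the helper name escape_sed_pattern is made up for illustration.

```python
def escape_sed_pattern(text, delimiter='/'):
    # Hypothetical helper: prefix backslash, [, ., *, & and the s-command
    # delimiter with a backslash, leaving every other character untouched.
    specials = set('\\[.*&') | {delimiter}
    return ''.join('\\' + ch if ch in specials else ch for ch in text)

rgx = "$adm = substr($path . rawurlencode($upload['name']) , 16);"
rgx_escaped = escape_sed_pattern(rgx)
print(rgx_escaped)
# $adm = substr($path \. rawurlencode($upload\['name']) , 16);
```

Only the period and the opening bracket change in this input; the parentheses stay as they are.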
You may use
sed -i 's/\$adm = substr(\$path \. rawurlencode(\$upload\['"'"'name'"'"']) , 16);/$adm = rawurlencode($upload['"'"'name'"'"']); # fix 23/g' administration/identify.php
Note:
the sed command is wrapped in single quotes, so no variable expansion occurs inside it
In POSIX BRE syntax, ( and ) match literal parentheses and need no escaping, but [ and . must be escaped to match themselves ($ is escaped here as well, to be safe)
A literal single quote inside the command needs additional quoting by concatenation ('"'"').

How to efficiently escape meta-characters in vim search

Perl provides the quotemeta function, as well as the \Qlots-of-meta-characters\E construct, which makes sure that all the characters between \Q and \E are interpreted as literals.
Very often I search strings full of meta characters in Vim. It's counterproductive to escape every special character individually. Is there anything like /\Qstring-to-search\E in Vim, which would make life easier?
You can use /\Vstring-to-search.
There are two caveats:
\ is special. You can still use all regex metacharacters by putting a \ in front of them.
There is no \E equivalent. \V affects the rest of the regex.
See :help /\V.
You could combine this with the code from the answer in https://stackoverflow.com/a/676619/1848654 as follows:
vnoremap <C-f> "hy/\V<C-r>=substitute(@h,'[\/]','\\&','g')<cr>
The idea is:
Copy ("yank") the selected text into register h: "hy
Start search mode: /
Prefill the beginning of the regex: \V
Insert the contents of a register: <C-r>
Don't use a real register; take the result of evaluating an expression instead: =
Our expression (terminated by <cr>) is: substitute(@h,'[\/]','\\&','g')
Take the contents of the h register: @h
Apply a substitution. Insert a \ before every \ and /: substitute(...,'[\/]','\\&','g')
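
The steps above boil down to a small string transformation. As an illustration only (Python cannot run inside the mapping, and the function name very_nomagic_search is made up), the search command that ends up on the Vim command line can be sketched like this, assuming a single-line selection:

```python
def very_nomagic_search(selection):
    # After \V only the backslash keeps a special meaning, and an unescaped /
    # would end the search command, so those two characters get a backslash.
    escaped = selection.replace('\\', '\\\\').replace('/', '\\/')
    return '/\\V' + escaped

command = very_nomagic_search('a/b\\c.*')
print(command)  # /\Va\/b\\c.*
```

The backslash is replaced first so that the backslashes added for / are not doubled again.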

Substitution operator in Perl

I am new to perl. I have the following substitution expression:
$tmp =~ s:/x/y/z::;
I have searched a lot for it but couldn't find a similar expression.
What does it mean?
You can use any non-whitespace character as a delimiter; here, instead of the most common / (s/foo/bar/), the delimiter is : (s:foo:bar:), because what you are substituting contains slashes, and with a slash delimiter you'd have to escape them (s/\/x\/y\/z//), which many people consider ugly.
So your expression is simply removing the first /x/y/z from $tmp.
That means: replace /x/y/z with nothing.
For example: if you have a string like /a/b/x/y/z, the result will be /a/b
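
For readers more at home in Python, a rough equivalent of that substitution is sketched below; the function name remove_first is made up, and count=1 stands in for the missing g flag on the Perl side.

```python
import re

def remove_first(path):
    # Replace only the first occurrence of /x/y/z with the empty string,
    # like a Perl s/// without the g flag.
    return re.sub('/x/y/z', '', path, count=1)

print(remove_first('/a/b/x/y/z'))  # /a/b
```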

How can I get sed to remove `\` followed by anything?

I am trying to write a sed script to convert LaTeX coded tables into tab delimited tables.
To do this I need to convert & into \t and strip out anything that is preceded by \.
This is what I have so far:
s/&/\t/g
s/\*/" "/g
The first line works as intended. In the second line I try to replace \ followed by anything with a space but it doesn't alter the lines with \ in them.
Any suggestions are appreciated. Also, can you briefly explain what suggested scripts "say"? I am new to sed and that really helps with the learning process!
Thanks
Assuming you're running this as a sed script, and not directly on the command line:
s/\\.*/ /g
Explanation:
\\ - a double backslash matches a literal backslash (a single \ is interpreted as "escape the following character"), followed by .* (. matches any single character, * repeats it arbitrarily many times).
You need to escape the backslash, as it is a special character.
To denote "any character", use . (a period).
The second expression should be:
s/\\.//g
I hope I understood your intention correctly and you want to strip only the character after the backslash. If you want to delete all the characters on the line after the backslash, add a star (*) after the period.
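
To compare the two suggested expressions side by side, here is a small Python sketch of the same substitutions. The function name latex_row_to_tsv and its strip_rest flag are made up for illustration, and the sample row is invented, not taken from the question.

```python
import re

def latex_row_to_tsv(line, strip_rest=True):
    line = line.replace('&', '\t')         # like s/&/\t/g
    if strip_rest:
        # like s/\\.*/ /g: first backslash through end of line becomes a space
        return re.sub(r'\\.*', ' ', line)
    # like s/\\.//g: drop each backslash together with the one character after it
    return re.sub(r'\\.', '', line)

row = r'x & y \\ \hline'
print(repr(latex_row_to_tsv(row)))
print(repr(latex_row_to_tsv(row, strip_rest=False)))
```

With strip_rest=False the word after \hline is only partly removed ("line" is left behind), which is why the .* form is usually the one wanted for LaTeX commands.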