How to use capture groups with sed? - sed

I'm trying to replace some text in a file using sed, but I'm having trouble.
sed -ir 's/(\$hello = )true/\1false/' /path/to/my/file.txt gives the error sed: -e expression #1, char 27: invalid reference \1 on 's' command's RHS.
I want to replace $hello = true with $hello = false, so in order to avoid typing $hello = twice I wanted to use capture groups - which isn't working.
What am I doing wrong?

If the r in -ir was meant to enable extended regex mode, note that you don't have to escape parentheses in that mode. However, if you want both -i and -r, you have to keep them apart (-i -r) or write -ri instead of -ir, because sed interprets whatever follows -i as an optional backup suffix.
From the sed manual:
Because -i takes an optional argument, it should
not be followed by other short options:
sed -Ei '...' FILE
Same as -E -i with no backup suffix - FILE will be edited in-place without creating a backup.
sed -iE '...' FILE
This is equivalent to --in-place=E, creating FILEE as backup
of FILE
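A minimal sketch of the corrected command, assuming GNU sed. The temporary file is a hypothetical stand-in for /path/to/my/file.txt, and -E is used as the equivalent of -r:

```shell
# Hypothetical sample file standing in for /path/to/my/file.txt
tmpfile=$(mktemp)
printf '%s\n' '$hello = true' > "$tmpfile"

# -E enables extended regex, so ( ) group without backslashes.
# -i is given as its own option so that nothing is read as a backup suffix.
sed -E -i 's/(\$hello = )true/\1false/' "$tmpfile"

result=$(cat "$tmpfile")
echo "$result"
rm -f "$tmpfile"
```

Running the same script without -i on a copy of the file first is a cheap way to check the substitution before editing in place.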

In basic regex mode (without -r or -E), you must escape the parentheses with backslashes, \(...\), for them to act as grouping.
THE SED FAQ, section 3.1.2, has an example:
3.1.2. Escape characters on the right side of "s///"
The right-hand side (the replacement part) in "s/find/replace/" is
almost always a string literal, with no interpolation of these
metacharacters:
. ^ $ [ ] { } ( ) ? + * |
Three things are interpolated: ampersand (&), backreferences, and
options for special seds. An ampersand on the RHS is replaced by
the entire expression matched on the LHS. There is never any
reason to use grouping like this:
s/\(some-complex-regex\)/one two \1 three/
And later in section "F. GNU sed v2.05 and higher versions":
F. GNU sed v2.05 and higher versions
...
Undocumented -r switch:
Beginning with version 3.02, GNU sed has an undocumented -r switch
(undocumented till version 4.0), activating Extended Regular
Expressions in the following manner:
? - 0 or 1 occurrence of previous character
+ - 1 or more occurrences of previous character
| - matches the string on either side, e.g., foo|bar
(...) - enable grouping without backslash
{...} - enable interval expression without backslash
When the -r switch (mnemonic: "regular expression") is used, prefix
these symbols with a backslash to disable the special meaning.
For documentation of regular expression syntax used in (GNU) sed, see Overview of basic regular expression syntax
5.3 Overview of basic regular expression syntax
...
\(regexp\)
Groups the inner regexp as a whole, this is used to:
Apply postfix operators, like \(abcd\)*: this will search for zero or more whole sequences of ‘abcd’, while abcd* would search for ‘abc’ followed by zero or more occurrences of ‘d’. Note that support for \(abcd\)* is required by POSIX 1003.1-2001, but many non-GNU implementations do not support it and hence it is not universally portable.
Use back references (see below).
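For comparison, a small sketch of the same substitution in basic regex mode, where the group has to be written with backslashes. The sample line is fed on stdin instead of a file:

```shell
# No -r/-E here: in basic regex mode \( \) delimit the capture group,
# and \1 on the right-hand side refers to it.
bre_result=$(printf '%s\n' '$hello = true' | sed 's/\(\$hello = \)true/\1false/')
echo "$bre_result"
```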

Related

Using sed -e to replace slash

I am trying to understand what this below command does with -e in sed and exclamatory marks in the command,
sed -e "s!VPC_CIDR!"$(get_cluster_vpc_cidr)"!g" "templates/network-policies-${ns}.yaml"
This command helped to replace VPC_CIDR with 1.2.3.4/16.
Could someone throw some light on this, please?
-e option just tells sed that the next argument is the script to execute. "s!VPC_CIDR!"$(get_cluster_vpc_cidr)"!g" is the script.
The " usage is strange. I would just write "s!VPC_CIDR!$(get_cluster_vpc_cidr)!g". Because $(get_cluster_vpc_cidr) is not within " quotes, the result undergoes word splitting and filename expansion, i.e. it will fail on spaces, and * or ? characters may behave strangely.
The "s!VPC_CIDR!"$(get_cluster_vpc_cidr)"!g" is a sed script. The s command does, from man 1 sed:
s/regexp/replacement/
Attempt to match regexp against the pattern space. If successful, replace that portion matched with replacement. The replacement may con‐
tain the special character & to refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to
the corresponding matching sub-expressions in the regexp.
But you might think: ! is not /! As man 1 sed itself says, "This is just a brief synopsis of sed commands to serve as a reminder to those who already know sed." The POSIX sed or man 7 sed page will shed some more light:
[2addr]s/BRE/replacement/flags
Substitute the replacement string for instances of the BRE in the pattern space. Any character other than <backslash> or <newline> can be used instead of a <slash> to delimit the BRE and the replacement. Within the BRE and the replacement, the BRE delimiter itself can be used as a literal character if it is preceded by a <backslash>.
Any character. You can even pass byte 0x01, like sed $'s\x01BRE\x01replacement\x01', and it's a valid script.
So the s!VPC_CIDR!$(get_cluster_vpc_cidr)!g command replaces every occurrence (i.e. the g global flag) of the VPC_CIDR string (the string is literal, there are no special regex expressions in it) with the output of $(get_cluster_vpc_cidr) (except that & and \1 and such are interpreted specially in the replacement part).
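A short sketch of the whole idea. The function body below is a hypothetical stand-in for the real get_cluster_vpc_cidr, whose output is not shown in the question, and the input line is made up:

```shell
# Hypothetical stand-in: the real function presumably prints a CIDR block.
get_cluster_vpc_cidr() {
    echo '10.0.0.0/16'
}

# The command substitution sits inside the double quotes, so its output is
# not word-split. "!" is the delimiter, so the "/" in the CIDR needs no escaping.
cidr_line=$(printf '%s\n' 'cidr: VPC_CIDR' | sed -e "s!VPC_CIDR!$(get_cluster_vpc_cidr)!g")
echo "$cidr_line"
```

In an interactive Bash session the ! inside double quotes may trigger history expansion, so a different delimiter can be more convenient there.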

Sed special characters

I wanted to change the home directory that I have in some files.
I figured that I would have to escape the curly brackets - like this :
sed -i 's/\$\{HOME\}/\/casper\/home/g' /var/tmp/casper.txt
Everything I have experienced with sed tells me that I would have to escape the braces, but I did not. The only thing I needed to escape was the dollar sign:
sed -i 's/\${HOME}/\/casper\/home/g' /var/tmp/casper.txt
What kind of regex engine does sed use, and where is a list of the special characters that do and do not need to be escaped?
I am not an expert, just a user of regexps, so I cannot give you a strict technical answer, but according to info sed, if invoked without -r (--regexp-extended), sed uses basic regular expressions. If -r is given, extended regular expressions are used. There is an explanation in the Appendix of the info page:
The only difference between basic and extended regular expressions is in
the behavior of a few characters: '?', '+', parentheses, braces ('{}'),
and '|'. While basic regular expressions require these to be escaped if
you want them to behave as special characters, when using extended
regular expressions you must escape them if you want them _to match a
literal character_.
By the way, if you need to put a slash inside sed's regex, it is very useful to use a different symbol as the separator. It doesn't need to be a slash; you can choose the symbol arbitrarily, for example #. So instead of this:
sed -i 's/\${HOME}/\/casper\/home/g' /var/tmp/casper.txt
you can make this:
sed -i 's#\${HOME}#/casper/home#g' /var/tmp/casper.txt
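The difference between the two modes can be checked on a sample line. This is only a sketch: the input string is made up, and the data is piped on stdin instead of editing /var/tmp/casper.txt:

```shell
line='export DIR=${HOME}/bin'   # hypothetical sample input

# Basic regex: { and } are literal unless escaped, so only $ needs a backslash.
bre_out=$(printf '%s\n' "$line" | sed 's#\${HOME}#/casper/home#g')

# Extended regex (-E): braces are special, so they are escaped to match literally.
ere_out=$(printf '%s\n' "$line" | sed -E 's#\$\{HOME\}#/casper/home#g')

echo "$bre_out"
echo "$ere_out"
```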

Terminal File replace testA with a

Is there a way to convert testThatMy to thatMy using the Terminal?
This is what I have now:
sed -i 's/test//g' MyJavaFile.java
The only thing missing would be to convert the character after test now to lower case.
Also for some reason referencing to a regex variable does not seem to work.
sed -i 's/test([A-Z]{1})/\1/g' MyJavaFile.java
You can use the following GNU sed command:
sed -r 's/test([[:upper:]])([^[:space:]]*)/\L\1\E\2/g' file.java
For in-place editing you need to pass -i, but I would test the command first.
Pattern Explanation:
-r enables extended POSIX regular expressions.
[[:upper:]] matches an uppercase character.
[^[:space:]]* matches zero or more non-space characters.
Replacement Explanation:
\L converts what follows to lowercase until \E ends the conversion, so only \1 is lowercased. \1 is the content of the first capturing group. \2 is the content of the second capturing group, inserted unchanged.
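A quick way to try the idea without touching a file, assuming GNU sed (the case-conversion escapes are GNU extensions). This sketch uses \l, which lowercases only the next character of the replacement, so a second group is not needed. The sample line is made up:

```shell
# \l lowercases just the first character of \1; the rest of the name is untouched.
renamed=$(printf '%s\n' 'public void testThatMy() {}' | sed -E 's/test([[:upper:]])/\l\1/g')
echo "$renamed"
```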

Why does this sed command contain # symbols

I don't understand why the following sed command contains a # symbol:
sed 's#session\s*required\s*pam_loginuid.so#session optional pam_loginuid.so#g' -i /etc/pam.d/sshd
I've looked at /etc/pam.d/sshd for the before/after effects of this command:
BEFORE:
...
# Set the loginuid process attribute.
session required pam_loginuid.so
...
AFTER:
...
# Set the loginuid process attribute.
session optional pam_loginuid.so
....
Is the # symbol possibly part of regex or sed syntax?
I could not find any documentation on this.
Note: The above sed command is actually part of a Dockerfile RUN command in tutorial:
https://docs.docker.com/examples/running_ssh_service/
These are alternate delimiters for the regular expressions and replacement string. Handy when your regex or replacement string includes '/'.
From the sed manual
The syntax of the s (as in substitute) command is ‘s/regexp/replacement/flags’. The / characters may be uniformly replaced by any other single character within any given s command. The / character (or whatever other character is used in its stead) can appear in the regexp or replacement only if it is preceded by a \ character.
From the POSIX specification:
[2addr]s/BRE/replacement/flags
Substitute the replacement string for instances of the BRE in the pattern space. Any character other than <backslash> or <newline> can be used instead of a <slash> to delimit the BRE and the replacement. Within the BRE and the replacement, the BRE delimiter itself can be used as a literal character if it is preceded by a <backslash>.
As others say, it is a delimiter other than the traditional / in the s/// action. This is usually used when / is part of the pattern or the replacement, such as a Unix path, where every / would otherwise need escaping:
s/\/my\/path/\/Your\/path/
# same as
s#/my/path#/Your/path#
The delimiter is usually a non-alphanumeric character (although an alphanumeric one works too). The only practical constraint is to avoid a character with a special regex meaning (such as ^$[]{}()+\*.): the command still works, but it becomes hard to read, and that character can no longer be used with its special meaning inside the pattern.
echo "b(a)l" | sed 's(.)()('
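The equivalence of the two spellings can be checked directly. A small sketch with a made-up input path:

```shell
# Same substitution written twice: once with escaped slashes, once with # as delimiter.
path_a=$(printf '%s\n' '/my/path/file' | sed 's/\/my\/path/\/Your\/path/')
path_b=$(printf '%s\n' '/my/path/file' | sed 's#/my/path#/Your/path#')

echo "$path_a"
echo "$path_b"
```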

sed rare-delimiter (other than & | / ?...)

I am using the Unix sed command on a string that can contain all types of characters (&, |, !, /, ?, etc).
Is there a complex delimiter (with two characters?) that can fix the error:
sed: -e expression #1, char 22: unknown option to `s'
The characters in the input file are of no concern - sed parses them fine. There may be an issue, however, if you have most of the common characters in your pattern - or if your pattern may not be known beforehand.
At least on GNU sed, you can use a non-printable character that is highly improbable to exist in your pattern as a delimiter. For example, if your shell is Bash:
$ echo '|||' | sed s$'\001''|'$'\001''/'$'\001''g'
In this example, Bash replaces $'\001' with the character that has the octal value 001 - in ASCII it's the SOH character (start of heading).
Since such characters are control/non-printable characters, it's doubtful that they will exist in the pattern. Unless, that is, you are doing something weird like modifying binary files - or Unicode files without the proper locale settings.
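The same trick can be written with variables, which reads more easily when the pattern is not known in advance. This is a sketch assuming GNU sed. The variable names and sample values are hypothetical, and regex metacharacters in the pattern are still interpreted by sed:

```shell
soh=$(printf '\001')     # the SOH control character, used as the delimiter
pattern='a/b|c'          # hypothetical pattern full of common delimiter characters
replacement='X'          # hypothetical replacement

# In basic regex mode / and | are ordinary characters, and neither clashes with SOH.
soh_out=$(printf '%s\n' 'see a/b|c here' | sed "s${soh}${pattern}${soh}${replacement}${soh}g")
echo "$soh_out"
```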
Another way to do this is to use Shell Parameter Substitution.
${parameter/pattern/replace} # substitute replace for pattern once
or
${parameter//pattern/replace} # substitute replace for pattern everywhere
Here is a quite complex example that is difficult with sed:
$ parameter="Common sed delimiters: [sed-del]"
$ pattern="\[sed-del\]"
$ replace="[/_%:\\#]"
$ echo "${parameter//$pattern/$replace}"
result is:
Common sed delimiters: [/_%:\#]
However, this only works with bash parameters, not with files, where sed excels.
There is no such option for multi-character expression delimiters in sed, but I doubt you need that. The delimiter character should not occur in the pattern, but if it appears in the string being processed, it's not a problem. And unless you're doing something extremely weird, there will always be some character that doesn't appear in your search pattern that can serve as a delimiter.
You need the nested delimiter facility that Perl offers. It lets you do things like matching, substituting, and transliterating without worrying about the delimiter being included in your contents. Since Perl is a superset of sed, you should be able to use it for whatever you're using sed for.
Consider this:
$ perl -nle 'print if /something/' inputs
Now if your something contains a slash, you have a problem. The way to fix this is to change the delimiter, preferably to a bracketing one. So for example, you could have anything you like in the $WHATEVER shell variable (provided the brackets are balanced), which gets interpolated by the shell before Perl is even called here:
$ perl -nle "print if m($WHATEVER)" /usr/share/dict/words
That works even if you have correctly nested parens in $WHATEVER. The four bracketing pairs which correctly nest like this in Perl are < >, ( ), [ ], and { }. They allow arbitrary contents that include the delimiter if that delimiter is balanced.
If it is not balanced, then do not use a delimiter at all. If the pattern is in a Perl variable, you don’t need to use the match operator provided you use the =~ operator, so:
$whatever = "some arbitrary string ( / # [ etc";
if ($line =~ $whatever) { ... }
With the help of Jim Lewis, I finally added a test before using sed:
# Use # as the delimiter when the search term contains |, otherwise use |
if printf '%s\n' "$1" | grep -q '|'; then
grep ".*$1.*:" "$DB_FILE" | sed "s#^.*$1*.*\(:\)## "
else
grep ".*$1.*:" "$DB_FILE" | sed "s|^.*$1*.*\(:\)|| "
fi
Thanks for the help.
Wow. I totally did not know that you could use any character as a delimiter.
At least half the time I use sed and BREs, it's on paths, code snippets, junk characters, things like that. I end up with a bunch of horribly unreadable escapes that I'm not even sure won't die on some combination I didn't think of. But if you can exclude just some character class (or even just one character):
echo '#01Y $#1+!' | sed -e 'sa$#1+ashita' -e 'su#01YuHolyug'
> > > Holy shit!
That's so much easier.
Escaping the delimiter inline for Bash to parse is cumbersome and difficult to read (although the delimiter does need escaping for sed's benefit when it's first used, once per expression).
To pull together thkala's answer and user4401178's comment:
DELIM=$(echo -en "\001");
sed -n "\\${DELIM}${STARTING_SEARCH_TERM}${DELIM},\\${DELIM}${ENDING_SEARCH_TERM}${DELIM}p" "${FILE}"
This example prints every line from the first match of ${STARTING_SEARCH_TERM} through the next match of ${ENDING_SEARCH_TERM}. It works as long as neither search term contains the SOH (start of heading) character, ASCII code 001.
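A self-contained sketch of that range address, assuming GNU sed. The search terms and the input lines are made up, and printf is used instead of echo -en so the snippet does not depend on Bash:

```shell
delim=$(printf '\001')   # SOH as the address delimiter
start='a/b'              # hypothetical starting search term containing "/"
end='c/d'                # hypothetical ending search term containing "/"

# A custom address delimiter must be introduced with a backslash: \cREGEXc.
range=$(printf '%s\n' 'x' 'a/b' 'mid' 'c/d' 'y' |
    sed -n "\\${delim}${start}${delim},\\${delim}${end}${delim}p")
echo "$range"
```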
There's no universal separator, but the separator can be escaped with a backslash so that sed does not treat it as a separator (at least unless you choose the backslash character itself as the separator).
Depending on the actual application, it might be handy to just escape those characters in both pattern and replacement.
If you're in a bash environment, you can use bash substitution to escape sed separator, like this:
safe_replace () {
sed "s/${1//\//\\\/}/${2//\//\\\/}/g"
}
It's pretty self-explanatory, except for the bizarre part.
Explanation of that:
${1//\//\\\/}
${ - bash expansion starts
1 - first positional argument - the pattern
// - bash pattern substitution pattern separator "replace-all" variant
\/ - literal slash
/ - bash pattern substitution replacement separator
\\ - literal backslash
\/ - literal slash
} - bash expansion ends
example use:
$ input="ka/pus/ta"
$ pattern="/pus/"
$ replacement="/re/"
$ safe_replace "$pattern" "$replacement" <<< "$input"
ka/re/ta
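For shells without Bash's pattern substitution, the same escaping could be done with sed itself. This is only a sketch: escape_slashes and safe_replace_posix are hypothetical helper names, and only the / delimiter is escaped, so regex metacharacters and & are still interpreted by sed:

```shell
# Put a backslash in front of every "/" so it cannot end the s/// command early.
escape_slashes() {
    printf '%s\n' "$1" | sed 's#/#\\/#g'
}

# Reads stdin; $1 is the pattern and $2 the replacement.
safe_replace_posix() {
    sed "s/$(escape_slashes "$1")/$(escape_slashes "$2")/g"
}

fixed=$(printf '%s\n' 'ka/pus/ta' | safe_replace_posix '/pus/' '/re/')
echo "$fixed"
```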