Escaping a variable with special characters within sed - comment and uncomment an arbitrary line of source code - perl

I need to comment out a line in a crontab file through a script. The line contains directories, spaces and symbols. It is stored in a variable, and I am getting mixed up on how to escape that variable. Since the line changes on a regular basis, I don't want any escaping in there. I also don't want to simply add # in front of it, since I need to be able to switch it around and replace the line again with the original without the #.
So the goal is to replace $line with #$line (comment) with the possibility to do it the other way around (uncomment).
So I have a variable:
line="* * * hello/this/line & /still/this/line"
This is a line that occurs in a file, file.txt, and it needs to be commented out.
First try:
sed -i "s/^${line}/#${line}/" file.txt
Second try:
sed -i 's|'${line}'|'"#${line}"'|g' file.txt

choroba's helpful answer shows an effective solution using perl.
sed solution
If you want to use sed, you must use a separate sed command just to escape the $line variable value, because sed has no built-in way to escape strings for use as literals in a regex context:
lineEscaped=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$line") # escape $line for use in regex
sed -i "s/^$lineEscaped\$/#&/" file.txt # Note the \$ to escape the end-of-line anchor $
With BSD/macOS sed, use -i '' instead of just -i for in-place updating without backup.
And the reverse (un-commenting):
sed -i "s/^#\($lineEscaped\)\$/\1/" file.txt
See this answer of mine for an explanation of the sed command used for escaping, which should work with any input string.
Also note how variable $lineEscaped is only referenced once, in the regex portion of the s command, whereas the substitution-string portion simply references what the regex matched (which avoids the need to escape the variable again, using different rules):
& in the substitution string represents the entire match, and \1 the first capture group (parenthesized subexpression, \(...\)).
For simplicity, the second sed command uses double quotes in order to embed the value of shell variable $lineEscaped in the sed script, but it is generally preferable to use single-quoted scripts so as to avoid confusion between what the shell interprets up front vs. what sed ends up seeing.
For instance, $ is special to both the shell and sed, and in the above script the end-of-line anchor $ in the sed regex must therefore be escaped as \$ to prevent the shell from interpreting it.
One way to avoid confusion is to selectively splice double-quoted shell-variable references into the otherwise single-quoted script:
sed -i 's/^'"$lineEscaped"'$/#&/' file.txt
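As a rough end-to-end sketch of the two sed steps above (escape the line once, then reuse the escaped form for both directions), the following wraps them in small shell functions. The function names are made up for illustration, the functions filter stdin instead of using -i so that the GNU vs. BSD difference does not matter, and the escaping assumes a single-line value:

```shell
# Escape a single-line string for use as a literal inside a sed basic regex:
# every character except ^ goes into its own bracket expression; ^ gets a backslash.
escape_sed_regex() {
  printf '%s\n' "$1" | sed 's/[^^]/[&]/g; s/\^/\\^/g'
}

# Read text on stdin and comment out lines that are exactly equal to $1.
comment_line() {
  esc=$(escape_sed_regex "$1")
  sed 's/^'"$esc"'$/#&/'
}

# Read text on stdin and strip the leading # from lines equal to "#" followed by $1.
uncomment_line() {
  esc=$(escape_sed_regex "$1")
  sed 's/^#\('"$esc"'\)$/\1/'
}

line='* * * hello/this/line & /still/this/line'
printf '%s\n' "$line" 'unrelated line' | comment_line "$line"
```

Piping the output of comment_line through uncomment_line with the same argument should give back the original text.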
awk solution
awk offers literal string matching, which obviates the need for escaping:
awk -v line="$line" '$0 == line { $0 = "#" $0 } 1' file.txt > $$.tmp && mv $$.tmp file.txt
If you have GNU Awk v4.1+, you can use -i inplace for in-place updating.
And the reverse (un-commenting):
awk -v line="#$line" '$0 == line { $0 = substr($0, 2) } 1' file.txt > $$.tmp &&
mv $$.tmp file.txt
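One thing to keep in mind with -v is that awk processes backslash escape sequences in the assigned value, so a line containing something like \t would no longer match literally. A possible variation, sketched here under the assumption that passing the line through the environment is acceptable, reads it from ENVIRON instead and toggles the comment in one pass. The function name toggle_comment and the environment variable name target are invented for this example:

```shell
# toggle_comment LINE: filter stdin; comment out LINE if it is active,
# or restore it if it is already commented out. Other lines pass through.
toggle_comment() {
  target=$1 awk '
    BEGIN { t = ENVIRON["target"] }                     # taken verbatim, no escape processing
    ($0 "") == (t "")  { print "#" $0; next }           # active line: comment it out
    ($0 "") == ("#" t) { print substr($0, 2); next }    # commented line: drop the #
    { print }
  '
}

line='echo "a\tb" > /tmp/out'
printf '%s\n' "$line" 'keep me' | toggle_comment "$line"
```

Appending "" to both operands forces a string comparison even if the line happens to look like a number.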

Perl has ways to do the quoting/escaping for you:
line=$line perl -i~ -pe '$regex = quotemeta $ENV{line}; s/^$regex/#$ENV{line}/' -- input.txt

Related

Use sed to replace every character by itself followed by $n times a char?

I'm trying to run the command below to replace every char in DECEMBER by itself followed by $n question marks. I tried both escaping {$n} like so \{$n\} and leaving it as is. Yet my output just keeps being D?{$n}E?{$n}... Is it just not possible to do this with sed?
How should I go about this?
echo 'DECEMBER' > a.txt
sed -i "s%\(.\)%\1\(?\){$n}%g" a.txt
cat a.txt
This might work for you (GNU sed):
n=5
sed -E ':a;s/[^\n]/&\n/g;x;s/^/x/;/x{'"$n"'}/{z;x;y/\n/?/;b};x;ba' file
Append a newline to each non-newline character in a line $n times then replace all newlines by the intended character ?.
N.B. The newline is chosen as the initial substitute character as it is not possible for it to be within a line (sed uses newlines to separate lines) and if the final substitution character already exists within the current line, the substitutions are correct.
Range quantifiers (also called interval or limiting quantifiers), such as {3}, {3,} and {3,6}, are part of regex syntax, not of replacement patterns.
You can use
sed -i "s/./&$(for i in {1..7}; do echo -n '?'; done)/g" a.txt
See the online demo:
#!/bin/bash
sed "s/./&$(for i in {1..7}; do echo -n '?'; done)/g" <<< "DECEMBER"
# => D???????E???????C???????E???????M???????B???????E???????R???????
Here, . matches any char, and & in the replacement pattern puts it back and $(for i in {1..7}; do echo -n '?'; done) adds seven question marks right after it.
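Since the repetition has to happen before sed sees the replacement, one way to make the count variable is to build the run of question marks in the shell first. This is only a sketch: the function name pad_each_char is made up, and the count is assumed to be a positive integer.

```shell
# pad_each_char N: filter stdin, appending N question marks after every character.
pad_each_char() {
  # printf pads an empty string to N spaces; tr then turns each space into ?
  marks=$(printf "%${1}s" '' | tr ' ' '?')
  sed "s/./&$marks/g"
}

printf '%s\n' DECEMBER | pad_each_char 3
# => D???E???C???E???M???B???E???R???
```

This works as-is because ? has no special meaning in a sed replacement; a character such as &, \ or / would need escaping first.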
This one-liner should do the trick:
sed 's/./&'$(printf '%*s' "$n" '' | tr ' ' '?')'/g' a.txt
with the assumption that $n expands to a positive integer and the command is executed in a POSIX shell.
Efficiently using any awk in any shell on every Unix box after setting n=2:
$ awk -v n="$n" '
BEGIN {
new = sprintf("%*s",n,"")
gsub(/./,"?",new)
}
{
gsub(/./,"&"new)
print
}
' a.txt
D??E??C??E??M??B??E??R??
To make the changes "inplace" use GNU awk with -i inplace just like GNU sed has -i.
Caveat: if the character you want to use in the replacement text is &, then you'd need to use gsub(/./,"\\\\\\&",new) in the BEGIN section so that it is treated as a literal instead of a backreference metacharacter. You'd have that issue and more (e.g. handling \1 or /) with any sed solution. Any solution that uses double quotes around the script would have further issues with handling $, and the solutions that have the shell expand something unquoted would have even more issues with globbing characters.
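To see the backreference behaviour behind that caveat in isolation, here is a small sketch (the function names are invented for illustration). An unescaped & in the gsub replacement stands for the matched text, while \\& in the awk string literal yields a literal ampersand:

```shell
# & in the replacement is the matched text.
bracket_e() { awk '{ gsub(/E/, "[&]"); print }'; }

# "\\&" is the two-character string \& which gsub treats as a literal ampersand.
literal_amp() { awk '{ gsub(/E/, "\\&"); print }'; }

printf '%s\n' DECEMBER | bracket_e     # => D[E]C[E]MB[E]R
printf '%s\n' DECEMBER | literal_amp   # => D&C&MB&R
```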

Insert linebreak in a file after a string

I have a unique (to me) situation:
I have a file - file.txt with the following data:
"Line1", "Line2", "Line3", "Line4"
I want to insert a linebreak each time the pattern ", is found.
The output of file.txt shall look like:
"Line1",
"Line2",
"Line3",
"Line4"
I am having a tough time trying to escape ", .
I tried sed -i -e "s/\",/\n/g" file.txt, but I am not getting the desired result.
I am looking for a one liner using either perl or sed.
You may use this gnu sed:
sed -E 's/(",)[[:blank:]]*/\1\n/g' file.txt
"Line1",
"Line2",
"Line3",
"Line4"
Note how you can use single quotes around the sed command to avoid unnecessary escaping.
If you don't have gnu sed then here is a POSIX compliant sed solution:
sed -E 's/(",)[[:blank:]]*/\1\
/g' file.txt
To save changes inline use:
sed -i.bak -E 's/(",)[[:blank:]]*/\1\
/g' file.txt
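If the literal line break inside the quoted script is awkward to keep intact in an editor, a possible variation is to hold the newline in a shell variable and splice it in after the backslash. This is a sketch only; the names nl and split_after_quote_comma are made up, and it uses a basic regex so that -E is not needed:

```shell
# A variable holding exactly one newline character.
nl='
'

# Filter stdin: after every ", (plus any following blanks) start a new line.
split_after_quote_comma() {
  sed 's/",[[:blank:]]*/",\'"$nl"'/g'
}

printf '%s\n' '"Line1", "Line2", "Line3", "Line4"' | split_after_quote_comma
```

The script that sed receives is the same backslash-newline form as above; only the way the shell assembles it differs.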
If you are OK with awk, you could try the following, which uses awk's substitution mechanism:
awk -v s1="\"" -v s2="," '{gsub(/",[[:blank:]]+"/,s1 s2 ORS s1)} 1' Input_file
Here's a Perl solution:
perl -pe 's/",\K/\n/g' file.txt
The substitution pattern matches the ", sequence, but the \K says to ignore anything to its left for the replacement (so the ", itself will not be replaced). The replacement then effectively inserts the newline.
I used the single quote for the argument to -e, but that doesn't work on Windows where you have to use ". Instead of escaping the ", you can specify it in another way. That's code number 0x22, so you can write:
perl -pe "s/\x22,\K/\n/g" file.txt
Or in octal:
perl -pe "s/\042,\K/\n/g" file.txt
Use this Perl one-liner:
perl -F'/"\K,\s*/' -lane 'print join ",\n", @F;' in_file > out_file
Or this for in-line replacement:
perl -i.bak -F'/"\K,\s*/' -lane 'print join ",\n", @F;' in_file
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option.
-F'/"\K,\s*/' : Split into @F on a double quote, followed by a comma, followed by 0 or more whitespace characters, rather than on whitespace.
\K : Cause the regex engine to "keep" everything it had matched prior to the \K and not include it in the match. This keeps the double quote in the @F elements, while the comma and whitespace are removed during the split.
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlrequick: Perl regular expressions quick start

I want to replace last / by ,

I want to replace this:
a/b/c|d,385|386|387|388|389|390|391|392|393|394|395|396|397|398|399|400/0.162,214|229|254|255|270|272|276|287|346|356|361|362|365|366|367|369/0.18,improve/11.11,
With:
a/b/c|d,385|386|387|388|389|390|391|392|393|394|395|396|397|398|399|400/0.162,214|229|254|255|270|272|276|287|346|356|361|362|365|366|367|369/0.18,improve,11.11,
With this sed command:
sed -i 's/\(.*\)\//\1,/'
This works in Unix. I tried to use this with system in Perl code, but it doesn't work. I am looking for a way to run this sed command from Perl.
First of all, the code you claim works doesn't.
$ printf 'a/b/c\n' | sed 's/(.*)//\1,/'
sed: -e expression #1, char 9: unknown option to `s'
It should be
$ printf 'a/b/c\n' | sed 's/\(.*\)\//\1,/'
a/b,c
You're asking how to execute this command from Perl. You can use the following:
system('sed', '-i', 's/\\(.*\\)\\//\\1,/', '--', $qfn)
Note that you can quite easily do the same task in Perl itself.
local @ARGV = $qfn;
local $^I = '';
while (<>) {
s{^.*\K/}{,};
print;
}
Here is a way to do this in sed:
echo "365|366|367|369/0.18,improve/11.11," | sed 's/^\(.*\)\/\(.*\)$/\1,\2/'
365|366|367|369/0.18,improve,11.11,
The regex pattern used is:
^\(.*\)\/\(.*\)$
This says to match and capture everything up until the last forward slash. Then, also match and capture everything after the last forward slash. Finally replace with the first two capture groups, but now separated by a comma.
Notes:
forward slash / needs to be escaped by a backslash, to distinguish it from being the pattern delimiter
parentheses in the capture groups also need to be escaped with backslash
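The reason the sed pattern finds the last slash is that .* is greedy: it takes as much as it can while still leaving a / to match. For a value that is already in a shell variable, the same split at the last slash can be sketched with plain POSIX parameter expansion instead of sed. This is a different technique from the one above, and the function name replace_last_slash is made up for illustration:

```shell
# replace_last_slash STRING: print STRING with its last / replaced by a comma.
replace_last_slash() {
  case $1 in
    */*) printf '%s,%s\n' "${1%/*}" "${1##*/}" ;;  # part before the last /, part after it
    *)   printf '%s\n' "$1" ;;                     # no slash at all: print unchanged
  esac
}

replace_last_slash 'a/b/c'                        # => a/b,c
printf '%s\n' 'a/b/c' | sed 's/\(.*\)\//\1,/'     # => a/b,c (the sed form, for comparison)
```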

How to search in sed for any name matching

How to find structures matching a pattern
struct struct_name {
....
....
};
I'm using
sed -n -e '/struct{/,/}/p'
How can I search for any struct_name?
To extract all struct definitions (POSIX-compliant command):
sed -n '/struct [^ {]\{1,\} {/,/}/p' file
More robust with respect to whitespace variations (POSIX-compliant):
sed -n '/struct[[:blank:]]\{1,\}[^ {]\{1,\}[[:blank:]]*{/,/}/p' file
Alternative, using an extended regular expression (works with both GNU and BSD/macOS sed):
sed -E -n '/struct[[:blank:]]+[^ {]+[[:blank:]]*\{/,/\}/p' file
awk alternative (awk only uses extended regexes):
awk '/struct[[:blank:]]+[^ {]+[[:blank:]]*\{/,/\}/' file
The awk solution has the added advantage that a given struct definition will also be extracted correctly if it is all on a single line: awk looks for the end of a range on the same input line as the start of the range, whereas sed does not.
To extract a specific struct definition by name:
sed doesn't support variables, so your best bet is to splice in a shell variable that the shell expands up front.
name='struct_name' # define name to search for as shell var.
sed -n '/struct '"$name"' {/,/}/p' file # splice shell var. into sed script
Note that I've deliberately not used sed -n "/struct $name {/,/}/p" - a single, double-quoted string expanded by the shell as a whole - so as to make it clear which part of the sed script is expanded by the shell up front.
This works in this simple case, but is tricky business in general, because you must ensure that the expanded variable value contains no regex/sed metacharacters that break the command.
Here's an awk alternative that uses awk variables and literal substring matching to bypass the problem of potentially having to escape the variable value:
awk -v name='struct_name' 'index($0, "struct " name " {"),/}/' file
This solution has the added advantage that the struct definition will also be extracted correctly if it is all on a single line: awk looks for the end of a range on the same input line as the start of the range, whereas sed does not.
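The single-line difference can be seen with a small made-up input in which the first struct opens and closes on one line. The sketch below uses simplified patterns with literal spaces, and the function names are invented for illustration:

```shell
# Sample input: one single-line struct, one multi-line struct, two unrelated lines.
sample() {
  printf '%s\n' 'struct a { int x; };' 'int y;' 'struct b {' '  int z;' '};' 'int w;'
}

extract_with_sed() { sed -n '/struct [^ {]\{1,\} {/,/}/p'; }
extract_with_awk() { awk '/struct [^ {]+ [{]/,/[}]/'; }

sample | extract_with_sed   # also prints "int y;": the range stays open until the }; of struct b
sample | extract_with_awk   # prints only the two struct definitions
```

With sed, the search for the closing } only starts on the line after the one that opened the range, so the first range swallows everything up to the end of the second struct.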
This will search a text file for struct_name. grep patterns are basic regular expressions by default; you can use the -E switch to use extended regular expressions.
grep -no struct_name test.txt
The -n switch causes the line number to be included, the -o means only the matching element of the line will be displayed.

Escape backslash character in sed

I need to modify some Windows paths.
For instance,
D:\usr
to
D:\first\usr
So, I have created a variable.
$path = "first\usr"
then used the following command:
sed -i -e 's!\\usr!${path}/g;' test.txt
However, this ends up with the following:
D:\firstSr
How do I escape \u in sed?
Assuming your path variable was assigned properly (without spaces in the assignment: path='first\usr'), fixing step by step for an input file test.txt with one example path:
$ cat test.txt
D:\usr
Your original command
$ sed 's!\\usr!${path}/g;' test.txt
sed: -e expression #1, char 18: unterminated `s' command
doesn't do much, as you've mixed ! and / as the delimiter.
Fixing delimiters:
$ sed 's!\\usr!${path}!g;' test.txt
D:${path}
Now no interpolation happens at all because of the single quotes. I suspect these are just copy-paste mistakes, as you obviously got some output.
Double quotes:
$ sed "s!\\usr!${path}!g" test.txt
bash: !\\usr!${path}!g: event not found
Now this clashes with history expansion. We could escape the !, or use a different delimiter.
/ as delimiter:
$ sed "s/\\usr/${path}/g" test.txt
D:\firstSr
Now we're where the question actually started. ${path} expands to first\usr, but \u has a special meaning in GNU sed in the replacement string: it uppercases the following character, hence the S.
Even without the special meaning, \u would most likely just expand to u and the backslash would be gone.
Escaping the backslash:
$ path='first\\usr'
$ sed "s/\\usr/${path}/g" test.txt
D:\first\usr
This works.
Depending on which shell you are using, you may be able to use parameter expansion to double \ in your substitution string and prevent the \u interpretation:
path="first\usr"
sed -e "s/\\usr/${path//\\/\\\\}/g" <<< "D:\usr"
The syntax for replacing a pattern with the shell parameter expansion is ${parameter/pattern/string} (one replacement) or ${parameter//pattern/string} (replace all matches).
This substitution is not specified by POSIX, but is available in Bash.
Where it is not available, you may need to filter $path through a process:
path=$(echo "$path" | sed 's/[][\\*.%$]/\\&/g')
(N.B. I have also quoted other sed metacharacters in this filter).
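For the replacement side specifically, the characters that matter are the backslash, the ampersand and whichever delimiter the s command uses. A minimal sketch of such a filter, assuming / as the delimiter and a single-line value (the function name escape_sed_replacement is made up):

```shell
# Escape \, / and & so the value is taken literally in the replacement part of s/.../.../
escape_sed_replacement() {
  printf '%s\n' "$1" | sed 's/[\/&]/\\&/g'
}

path='first\usr'
repl=$(escape_sed_replacement "$path")   # first\\usr

# \\usr in the single-quoted regex matches a literal backslash followed by usr.
printf '%s\n' 'D:\usr' | sed 's/\\usr/\\'"$repl"'/'
# => D:\first\usr
```

Because the value is escaped before it reaches sed, the \u in first\usr is no longer seen as a case-conversion sequence.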