How do I match multiple addresses in sed?

I want to execute a sed command on any line that matches either the AND or the OR of multiple addresses. For example, something like sed '50,70/abc/d' that would delete all lines in the range 50-70 that match /abc/, or a way to do sed -e '10,20s/complicated/regex/' -e '30,40s/complicated/regex/' without having to retype s/complicated/regex/.

Logical-and
The and part can be done with braces:
sed '50,70{/abc/d;}'
Further, braces can be nested for multiple and conditions.
(The above was tested under GNU sed. BSD sed may differ in small but frustrating details.)
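As a small check of the nesting described above, here is a sketch run against made-up input (GNU sed assumed): within lines 2 to 5, a line is deleted only if it contains both abc and xyz. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: inside lines 2-5, delete a line only when it
# contains "abc" AND "xyz" (a range address plus two nested regex tests).
delete_abc_and_xyz_2_to_5() {
  sed '2,5{/abc/{/xyz/d;};}'
}

# Lines 1 and 6 match both regexes but lie outside the range, so they stay.
printf '%s\n' 'abc xyz' 'abc 1' 'abc xyz 2' 'xyz 3' 'abc xyz 4' 'abc xyz 5' |
  delete_abc_and_xyz_2_to_5
# prints: abc xyz / abc 1 / xyz 3 / abc xyz 5 (one per line)
```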
Logical-or
The or part can be handled with branching:
sed -e '10,20{b cr;}' -e '30,40{b cr;}' -e b -e :cr -e 's/complicated/regex/' file
10,20{b cr;}: for all lines from 10 through 20, branch to label cr.
30,40{b cr;}: for all lines from 30 through 40, branch to label cr.
b: for all other lines, skip the rest of the commands (branch to the end of the script).
:cr: this marks the label cr.
s/complicated/regex/: this performs the substitution on lines that branched to cr.
With GNU sed, the syntax for the above can be shortened a bit to:
sed '10,20{b cr}; 30,40{b cr}; b; :cr; s/complicated/regex/' file
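The same branching idea can also be written with one -e fragment per script line, so no label ever has to be terminated by ; or }. This is a sketch on seq output with small, made-up ranges (2-3 and 5-6) and a stand-in substitution that appends " !"; mark_ranges is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: lines 2-3 OR 5-6 branch to label "cr" and get the
# substitution; every other line reaches the bare "b" and skips it.
# Each -e fragment becomes its own script line, so labels end at the newline.
mark_ranges() {
  sed -e '2,3b cr' -e '5,6b cr' -e 'b' -e ':cr' -e 's/$/ !/'
}

seq 1 7 | mark_ranges
# prints: 1, "2 !", "3 !", 4, "5 !", "6 !", 7 (one per line)
```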

To delete the lines from 10 to 20 and from 30 to 40 that match your complicated regex with GNU sed:
sed -e '10,20bA;30,40bA;b;:A;/complicated/d' file
or:
sed -e '10,20bA' -e '30,40bA' -e 'b;:A;/complicated/d' file
bA: jump to label :A
b: a jump without a label -> jump to the end of the script
/complicated/d: delete the line if it matches the regex
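A quick way to see which lines such a branch script removes is to run it on numbered input. This sketch assumes GNU sed (for the ;-terminated labels) and uses made-up ranges 2-4 and 8-10, with /[39]/ standing in for the complicated regex. Line 13 matches the regex but lies outside both ranges, so it survives; delete_in_ranges is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper (GNU sed): inside lines 2-4 or 8-10, delete lines that
# match /[39]/; all other lines are printed untouched.
delete_in_ranges() {
  sed '2,4bA;8,10bA;b;:A;/[39]/d'
}

seq 1 13 | delete_in_ranges
# prints 1 2 4 5 6 7 8 10 11 12 13, one per line (3 and 9 are removed)
```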

I don't think sed has the facility for multiple selection criteria; my advice would be to step up to awk, where you can do something like:
awk 'NR >= 50 && NR <= 70 && /abc/ {next} {print}' inputFile
awk '(NR >= 10 && NR <= 20) || (NR >= 30 && NR <= 40) {
sub("from-regex", "to-string", $0) } {print}' inputFile

sed is excellent for simple substitutions on individual lines, but for anything else just use awk for clarity, robustness, portability, maintainability, etc.
awk '
(NR>=50 && NR<=70) && /abc/ { next }
(NR>=10 && NR<=20) || (NR>=30 && NR<=40) { sub(/complicated/,"regex") }
{ print }
' file
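The awk version can be exercised the same way. In this sketch (hypothetical function name, made-up ranges), lines 4-6 containing a 5 are dropped, and lines 2-3 or 5-6 get " !" appended in place of the substitution:

```shell
#!/bin/sh
# Hypothetical helper: an "and" rule (range plus regex) and an "or" rule
# (either of two ranges), mirroring the awk answer with small ranges.
awk_ranges() {
  awk '
    (NR >= 4 && NR <= 6) && /5/                  { next }          # drop
    (NR >= 2 && NR <= 3) || (NR >= 5 && NR <= 6) { $0 = $0 " !" }  # mark
    { print }
  '
}

seq 1 7 | awk_ranges
# prints: 1, "2 !", "3 !", 4, "6 !", 7 (line 5 is dropped)
```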

Related

How to apply one command into another sed command?

I have one command that extracts the lines between two string patterns, 'string1' and 'string2', and stores them in a variable called 'var1'.
var1=$(awk '/string1/{flag=1; next} /string2/{flag=0} flag' text.txt)
This command works well, and its output is a set of lines:
Do you hear the people sing?
Singing a song of angry men?
It is the music of a people
Who will not be slaves again
I want the output of the above command to be inserted after the line matching the pattern 'string3' in another file called stat.txt. I used sed as follows:
sed '/string3/a'$var1'' stat.txt
I am having trouble getting the expected output. $var1 seems to work only partially, i.e. only one line is inserted:
string3
Do you hear the people sing?
Any other suggestions to solve this?
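I would be tempted to use sed to extract the lines and awk to insert them into the other file. Note that, unlike the awk command in the question, this sed range also keeps the string1 and string2 lines themselves: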
lines=$(sed -n '/string1/,/string2/ p' text.txt)
awk -v new="$lines" '{print} /string3/ {print new}' stat.txt
or perhaps do both tasks in a single awk call, skipping the marker lines as the question's command does:
awk '
NR == FNR && /string1/ {flag = 1; next}
NR == FNR && /string2/ {flag = 0}
NR == FNR && flag {lines = lines $0 ORS}
NR == FNR {next}
{print}
/string3/ {printf "%s", lines} # it already ends with a newline
' text.txt stat.txt
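The single-awk approach can be tried with two throwaway files. This sketch (hypothetical function name insert_between_markers, made-up file contents) skips the string1 and string2 marker lines, matching the awk command in the question. It assumes the first file is not empty, since NR == FNR would otherwise also be true for the second file.

```shell
#!/bin/sh
# Hypothetical helper: copy the lines strictly between "string1" and "string2"
# from the first file, and print them after each "string3" line of the second.
insert_between_markers() {
  awk '
    NR == FNR && /string1/ { flag = 1; next }  # start marker itself is skipped
    NR == FNR && /string2/ { flag = 0 }        # end marker is not collected
    NR == FNR && flag      { lines = lines $0 ORS }
    NR == FNR              { next }            # first file: collect only
    { print }
    /string3/ { printf "%s", lines }           # block already ends in a newline
  ' "$1" "$2"
}

tmp=$(mktemp -d)
printf '%s\n' 'string1' 'Do you hear the people sing?' 'Singing a song of angry men?' 'string2' > "$tmp/text.txt"
printf '%s\n' 'header' 'string3' 'footer' > "$tmp/stat.txt"
insert_between_markers "$tmp/text.txt" "$tmp/stat.txt"
# prints: header, string3, the two song lines, footer
rm -rf "$tmp"
```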
It's a data format problem.
Appending a multi-line block of text with the sed append command requires every line of the block except the last to end with a \. So if we take the two lines of code that didn't work in the question, reformat the text as required by the append command, and double-quote $var1 so the shell passes the whole block to sed as a single argument, the original approach works as expected (the one-line a text form is a GNU sed extension):
var1=$(awk '/string1/{flag=1; next} /string2/{flag=0} flag' text.txt)
var1="$(sed '$!s/$/\\/' <<< "$var1")"
sed '/string3/a'"$var1" stat.txt
Note that the 2nd line above contains a bashism (the <<< here-string). A more portable version would be:
var1="$(printf '%s\n' "$var1" | sed '$!s/$/\\/')"
Either variant would convert $var1 to:
Do you hear the people sing?\
Singing a song of angry men?\
It is the music of a people\
Who will not be slaves again
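Putting the pieces together, here is a sketch of the same idea using the POSIX a\ form, which places the text on the following script lines and should work with both GNU and BSD sed. append_block is a hypothetical helper; backslashes already present in the block would still be interpreted by sed.

```shell
#!/bin/sh
# Hypothetical helper: append the multi-line block $1 after every line of
# file $2 that matches "string3", using the POSIX "a\" form. Every block line
# except the last gets a trailing backslash, and "$block" is quoted so the
# shell never splits it into separate arguments.
append_block() {
  block=$(printf '%s\n' "$1" | sed '$!s/$/\\/')
  sed "/string3/a\\
$block" "$2"
}

tmp=$(mktemp)
printf '%s\n' 'before' 'string3' 'after' > "$tmp"
song=$(printf '%s\n' 'Do you hear the people sing?' 'Singing a song of angry men?')
append_block "$song" "$tmp"
# prints: before, string3, the two song lines, after
rm -f "$tmp"
```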

How to replace a string if double condition matches

There is a command to replace bbb with ccc if the line contains abc:
echo "abc yyy bbb xzy" | sed -e "/abc/ s/bbb/ccc/"
Does anyone know what the command would be if I want to do the replacement only when the line contains both abc and xyz?
Because it doesn't matter which one is matched first, you can look for abc first, then make the substitution if xyz also matches¹:
sed '/abc/{/xyz/s/bbb/ccc/}'
or, considerably less elegant but without nesting (\| in a basic regular expression is a GNU extension):
sed '/abc.*xyz\|xyz.*abc/s/bbb/ccc/'
¹ BSD sed requires a semicolon before the closing brace.
Just use awk and you can code it as you'd write it, with && between the conditions:
awk '/abc/ && /xyz/ { sub(/bbb/,"ccc") } 1'
Try writing:
awk '(/abc/ && /xyz/) || (/def/ && (/ghi/ || /klm/)) { sub(/bbb/,"ccc") } 1'
or any other more interesting compound condition with sed. Awk is available everywhere sed is and the above is fully portable and will work as-is in every awk in every UNIX installation.
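To check that the two answers agree, this sketch runs both the nested sed address and the awk && condition on the same made-up lines. The first line is the question's own example, which contains xzy rather than xyz, so neither command changes it. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: substitute only when a line contains abc AND xyz.
replace_sed() { sed '/abc/{/xyz/s/bbb/ccc/;}'; }                 # nested addresses
replace_awk() { awk '/abc/ && /xyz/ { sub(/bbb/, "ccc") } 1'; }  # && condition

sample=$(printf '%s\n' 'abc yyy bbb xzy' 'abc xyz bbb' 'xyz bbb abc' 'xyz bbb')
printf '%s\n' "$sample" | replace_sed
printf '%s\n' "$sample" | replace_awk
# each prints: abc yyy bbb xzy / abc xyz ccc / xyz ccc abc / xyz bbb
```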

Remove newline depending on the format of the next line

I have a special file with this kind of format:
title1
_1 texthere
title2
_2 texthere
I would like every line starting with "_" to be joined to the line before it, as a second column.
I tried to do that using sed with this command:
sed 's/_\n/ /g' filename
but it is not giving me what I want (it does nothing, basically).
Can anyone point me to the right way of doing it?
Try the following solution:
In sed, a loop is built with a label (:a): while the current line is not the last one ($!), append the next line (N) and jump back to label a:
:a
$! {
N
b a
}
After this, the whole file is in memory, so do a global substitution for each _ preceded by a newline:
s/\n_/ _/g
p
All together:
sed -ne ':a ; $! { N ; ba }; s/\n_/ _/g ; p' infile
That yields:
title1 _1 texthere
title2 _2 texthere
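The same slurp-and-substitute script can be split into one -e fragment per command, which avoids relying on how a given sed terminates labels. A sketch, assuming GNU sed (join_underscore_lines is a hypothetical name), on the sample from the question:

```shell
#!/bin/sh
# Hypothetical helper (GNU sed assumed): slurp the whole input into the
# pattern space, then turn every newline that precedes "_" into a space.
join_underscore_lines() {
  sed -n -e ':a' -e '$!{' -e 'N' -e 'ba' -e '}' -e 's/\n_/ _/g' -e 'p'
}

printf '%s\n' title1 '_1 texthere' title2 '_2 texthere' | join_underscore_lines
# prints: title1 _1 texthere / title2 _2 texthere
```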
If your whole file is like your sample (pairs of lines), then the simplest answer is the following, which joins each pair with a tab:
paste - - < file
Otherwise
awk '
NR > 1 && /^_/ {printf "%s", OFS}
NR > 1 && !/^_/ {print ""}
{printf "%s", $0}
END {print ""}
' file
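Unlike paste - -, the awk script does not assume strict pairs. A sketch on an irregular, made-up sample (two _ lines under one title, none under another); join_with_awk is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper wrapping the awk answer: "_" lines are appended to the
# current output line; any other line starts a new output line.
join_with_awk() {
  awk '
    NR > 1 && /^_/  { printf "%s", OFS }  # continuation: separate with OFS
    NR > 1 && !/^_/ { print "" }          # new title: finish previous line
    { printf "%s", $0 }
    END { print "" }
  '
}

printf '%s\n' title1 '_1 a' '_1 b' title2 title3 '_3 c' | join_with_awk
# prints: "title1 _1 a _1 b", "title2", "title3 _3 c"
```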
This might work for you (GNU sed):
sed ':a;N;s/\n_/ /;ta;P;D' file
This avoids slurping the file into memory.
or:
sed -e ':a' -e 'N' -e 's/\n_/ /' -e 'ta' -e 'P' -e 'D' file
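Running the N/P/D version on the same kind of irregular, made-up input shows that it also handles runs of _ lines. Its replacement is a single space, so the leading _ is dropped, unlike the awk answer; s/\n_/ _/ would keep it. This sketch assumes GNU sed, which prints the pattern space when N hits the end of input; join_streaming is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper (GNU sed): hold the current record plus one look-ahead
# line; join while the next line starts with "_", otherwise print and move on.
join_streaming() {
  sed ':a;N;s/\n_/ /;ta;P;D'
}

printf '%s\n' title1 '_1 a' '_1 b' title2 title3 '_3 c' | join_streaming
# prints: "title1 1 a 1 b", "title2", "title3 3 c"
```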
A Perl approach:
perl -00pe 's/\n_/ /g' file
Here, -00 makes perl read the file in paragraph mode, where a "line" (record) ends at a run of one or more blank lines. Your example has no blank lines, so perl reads the entire file into memory as one record, and a simple global substitution of \n_ with a space works.
That is not very efficient for very large files, though. If your data is too large to fit in memory, use this:
perl -ne 'chomp;
s/^_// ? print "$l " : print "$l\n" if $. > 1;
$l=$_;
END{print "$l\n"}' file
Here, the file is read line by line (-n) and the trailing newline is removed from every line (chomp). At the end of each iteration, the current line is saved as $l ($l=$_). From the second line on, if the substitution succeeds, i.e. a _ was removed from the beginning of the current line (s/^_//), the previous line is printed followed by a space instead of a newline (print "$l "). If the substitution fails, the previous line is printed with a newline. The END{} block prints the final line of the file.

Remove lines from AWK output

I would like to remove lines that have fewer than 2 columns from a file. This command prints them:
awk '{ if (NF < 2) print}' test
one
two
Is there a way to store these lines in a variable and then remove them with xargs and sed, something like:
awk '{ if (NF < 2) VARIABLE}' test | xargs sed -i /VARIABLE/d
GNU sed
I would like to remove lines that have less than 2 columns
Fewer than 2 columns means lines with zero or one column, so delete every line that does not contain at least two whitespace-separated fields:
sed -r '/^\s*\S+\s+\S+/!d' file
If you would like to split the input into two files (named "pass" and "fail") based on the condition:
awk '{if (NF > 1 ) print > "pass"; else print > "fail"}' input
If you simply want to filter/remove lines with NF < 2:
awk '(NF > 1){print}' input
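Both filters can be compared on a made-up sample that includes a blank line and a line with leading spaces; each keeps only lines with at least two fields. The sed command is the GNU-specific one above (-r, \s, \S), and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: keep lines with at least two whitespace-separated fields.
keep_awk() { awk '(NF > 1){print}'; }
keep_sed() { sed -r '/^\s*\S+\s+\S+/!d'; }   # GNU sed (-r, \s, \S)

sample=$(printf '%s\n' one two 'three four' '' '  five   six seven' eight)
printf '%s\n' "$sample" | keep_awk
printf '%s\n' "$sample" | keep_sed
# both print: "three four" and "  five   six seven"
```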

divide each line in equal part

I would be happy if anyone could suggest a command (a sed or awk one-liner) to divide each line of a file into an equal number of parts. For example, divide each line into 4 parts.
Input:
ATGCATHLMNPHLNTPLML
Output:
ATGCA THLMN PHLNT PLML
This should work using GNU sed, although it splits each line into chunks of 4 characters rather than into 4 parts:
sed -r 's/(.{4})/\1 /g'
-r is needed to use extended regular expressions
.{4} matches any four characters, and the parentheses capture them
\1 refers to the captured group, and a space is added after it
g makes sure that the replacement is done as many times as possible on each line
A test; this is the input and output in my terminal:
$ echo "ATGCATHLMNPHLNTPLML" | sed -r 's/(.{4})/\1 /g'
ATGC ATHL MNPH LNTP LML
I suspect awk is not the best tool for this, but:
gawk --posix '{ l = sprintf( "%d", 1 + (length()-1)/4);
gsub( ".{"l"}", "& " ) } 1' input-file
If you have a POSIX-compliant awk, you can omit --posix. Older versions of GNU awk (before 4.0) need --posix (or --re-interval) to support interval expressions such as .{5}, and since gawk seems to be the most commonly used implementation, I've given the solution in terms of gawk.
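If interval-expression support is a concern, the same width rule (the ceiling of the line length divided by the number of parts) can be applied with substr() instead of a regex. This sketch swaps in that technique; split_parts is a hypothetical name, and it should run in any POSIX awk.

```shell
#!/bin/sh
# Hypothetical helper: split each line into pieces of width ceil(length / n),
# separated by single spaces, using substr() rather than a regex interval.
split_parts() {
  awk -v n="$1" '{
    w = int((length($0) + n - 1) / n)   # ceil(length / n); 0 for empty lines
    out = ""
    for (i = 1; i <= length($0); i += w)
      out = out (i > 1 ? " " : "") substr($0, i, w)
    print out
  }'
}

echo "ATGCATHLMNPHLNTPLML" | split_parts 4
# prints: ATGCA THLMN PHLNT PLML
```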
This might work for you (GNU sed):
sed 'h;s/./X/g;s/^\(.*\)\1\1\1/\1 \1 \1 \1/;G;s/\n/&&/;:a;/^\n/bb;/^ /s/ \(.*\n.*\)\n\(.\)/\1 \n\2/;ta;s/^.\(.*\n.*\)\n\(.\)/\1\2\n/;ta;:b;s/\n//g' file
Explanation:
h copy the pattern space (PS) to the hold space (HS)
s/./X/g replace every character in the PS with the same non-space character (in this case X)
s/^\(.*\)\1\1\1/\1 \1 \1 \1/ split the line into 4 parts (space separated)
G append a newline followed by the contents of the HS to the PS
s/\n/&&/ double the newline (to be used later as markers)
:a set label a for the loop
/^\n/bb if we reach a newline we are done, so branch to label b
/^ /s/ \(.*\n.*\)\n\(.\)/\1 \n\2/;ta; if the first character is a space, add a space to the real line at this point and repeat
s/^.\(.*\n.*\)\n\(.\)/\1\2\n/;ta for any other character, just bump along and repeat
:b;s/\n//g all done: remove the markers and print the result
This works for any line length; however, if the line length is not exactly divisible by 4, the last portion will contain the remainder as well.
perl
perl might be a better choice here:
export cols=4
perl -ne 'chomp; $fw = 1 + int length()/$ENV{cols}; while(/(.{1,$fw})/gm) { print $1 . " " } print "\n"' infile
This recalculates the field width for every line.
coreutils
A GNU coreutils alternative (bash is needed for the process substitution); the field width is chosen based on the first line of infile:
cols=4
len=$(( $(head -n1 infile | wc -c) - 1 ))
fw=$(echo "scale=0; 1 + $len / $cols" | bc)
cut_arg=$(paste -d- <(seq 1 $fw $len) <(seq $fw $fw $len) | head -c-1 | tr '\n' ',')
In the above case, the value of cut_arg is:
1-5,6-10,11-15,16-
Now cut the line into appropriate chunks:
cut --output-delimiter=' ' -c $cut_arg infile
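For shells without process substitution, the cut list can also be built with plain POSIX shell arithmetic. This sketch (make_cut_list is a hypothetical helper) uses the same 1 + len / cols width rule:

```shell
#!/bin/sh
# Hypothetical helper: print a cut -c list for a line of length $1 split
# using a field width of 1 + $1 / $2; the last range is left open-ended.
make_cut_list() {
  len=$1 cols=$2
  fw=$(( 1 + len / cols ))
  list= start=1
  while [ "$start" -le "$len" ]; do
    end=$(( start + fw - 1 ))
    if [ "$end" -ge "$len" ]; then
      list="$list$start-"          # last chunk: open-ended range
    else
      list="$list$start-$end,"
    fi
    start=$(( end + 1 ))
  done
  printf '%s\n' "$list"
}

make_cut_list 19 4
# prints: 1-5,6-10,11-15,16-
```

With GNU cut, the result plugs in as before: cut --output-delimiter=' ' -c "$(make_cut_list "$len" "$cols")" infile.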