Tricky multiline erase in sed

Here is the input:
aaa
bbb
ccc
ddd
eee
fff
What I want is something like sed '/ccc/,/eee/d', but it should also delete the "bbb" line (the line just before "ccc"),
so that the output is:
aaa
fff
Any ideas?

This might work for you (GNU sed):
sed ':a;$!{N;/\nccc/!{P;D};/\neee/!ba;d}' file

If you are fine with awk, this should do:
$ awk '/ccc/,/eee/{if(i!=1){i=1;x="";}next}{if (x)print x;x=$0;}END{print x}' file
aaa
fff
Each line is buffered in the variable x and printed one record late. The range pattern /ccc/,/eee/ does the usual awk range filtering and skips the lines inside the range. In addition, when the range is first entered, x is cleared, so the buffered line just before the range is never printed.
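The buffering idea can be spelled out more explicitly. The following is a minimal sketch, not the exact command above: the names prev and have_prev are made up for illustration, and the explicit flag is an assumption added so that an empty line or a line containing only 0 is not lost.

```shell
# Sample input from the question, piped in so no file is needed.
result=$(printf '%s\n' aaa bbb ccc ddd eee fff | awk '
  # Inside the range: forget the buffered line and skip the current one.
  /ccc/,/eee/ { have_prev = 0; next }
  # Outside the range: emit the line buffered one record ago, then buffer this one.
  { if (have_prev) print prev; prev = $0; have_prev = 1 }
  # At end of input, flush whatever is still buffered.
  END { if (have_prev) print prev }
')
printf '%s\n' "$result"
```

On the sample input this prints aaa and fff.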
Update:
sed solution:
$ sed '${x;/^$/!p;};/ccc/,/eee/{/ccc/{s/.*//;x;};d;};1{h;d;};x;/^$/d;' file

You could do this in a simple 2-pass approach, first pass to identify the lines to delete and the second pass to print only the lines that are not marked for deletion:
awk '/ccc/,/eee/{d[NR]=d[NR-1]=1} NR!=FNR && !d[FNR]' file file
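To make the two passes easier to follow, here is the same command laid out with comments. This is only a sketch: the temporary file from mktemp is an assumption for the demo, needed because the input is read twice and so cannot come from a pipe.

```shell
tmp=$(mktemp)                      # the input must be a real file: it is read twice
printf '%s\n' aaa bbb ccc ddd eee fff > "$tmp"

result=$(awk '
  # Mark every line in the range, plus the line before it.
  # This rule fires on both passes; only the marks from the first pass matter.
  /ccc/,/eee/ { d[NR] = d[NR-1] = 1 }
  # Second pass only (NR has moved past FNR): print lines that were not marked.
  NR != FNR && !d[FNR]
' "$tmp" "$tmp")
rm -f "$tmp"
printf '%s\n' "$result"
```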

Related

Delete string after '#' using sed

I have a text file that looks like:
#filelists.txt
a
# aaa
b
#bbb
c #ccc
I want to delete everything from '#' to the end of each line and then, if nothing is left (that is, the line started with '#'), delete the whole line.
So I use the sed command in my shell:
sed -e "s/#*//g" -e "/^$/d" filelists.txt
The result I want is:
a
b
c
but the actual result is:
filelists.txt
a
aaa
b
bbb
c ccc
What's wrong with my sed command?
As far as I know, '*' means "any", so I thought '#*' meant "the string after '#'".
Is that not right?
You may use
sed 's/#.*//;/^$/d' file > outfile
The s/#.*// removes # and all the rest of the line and /^$/d drops empty lines.
A quick test:
s="#filelists.txt
a
# aaa
b
#bbb
c #ccc"
sed 's/#.*//;/^$/d' <<< "$s"
Output:
a
b
c
Another idea: match lines having #, then remove # and the rest of the line there and drop if the line is empty:
sed '/#/{s/#.*//;/^$/d}' file > outfile
This way, you keep the original empty lines.
* does not mean "any" (at least not in regular expression context). * means "zero or more of the preceding pattern element". Which means you are deleting "zero or more #". Since you only have one #, you delete it, and the rest of the line is intact.
You need s/#.*//: "delete # followed by zero or more of any character".
EDIT: was suggesting grep -v, but didn't notice the third example (# in the middle of the line).
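To see the difference between #* and #.* side by side, here is a small sketch run against the sample file from the question. The [[:space:]]* in the second command is an addition, assuming you also want to drop the blank left in front of a trailing comment such as "c #ccc".

```shell
input='#filelists.txt
a
# aaa
b
#bbb
c #ccc'

# "#*" matches zero or more "#" characters, so only the "#" signs themselves vanish.
wrong=$(printf '%s\n' "$input" | sed -e 's/#*//g' -e '/^$/d')

# "#.*" matches "#" and everything after it; optional whitespace before it goes too.
right=$(printf '%s\n' "$input" | sed -e 's/[[:space:]]*#.*//' -e '/^$/d')

printf 'wrong:\n%s\n\nright:\n%s\n' "$wrong" "$right"
```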

sed query -- begin and end keyword

How do the begin and end keywords in sed actually work? Do these keywords have to appear in the data file?
For example, suppose I'm trying to delete empty lines with sed using the code below:
sed -n '/begin/,/end/ {
s/^$/ d
p
}
'
Now, should the data file contain the begin and end keywords? I tried using these two keywords without actually putting them in the data, and it doesn't give me the expected output.
If you want to remove all empty lines from your file, use:
sed '/^$/d' file
or, to also remove lines that contain only tabs or spaces (\s is a GNU sed extension):
sed '/^\s*$/d' file
If you want to remove empty lines only between a BBB line and an AAA line:
sed '/BBB/,/AAA/{/^$/d}' file
And yes, BBB and AAA must appear in your file: in /begin/,/end/ they are ordinary regular expressions matched against each line, not sed keywords.
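A small sketch of how the address range limits the deletion. The sample lines here are invented for the demo; only the blank line between the BBB line and the AAA line is removed, and blank lines outside that range survive.

```shell
# Invented sample: blank lines before, inside, and after the BBB..AAA range.
result=$(printf '%s\n' x '' BBB '' y AAA '' z | sed '/BBB/,/AAA/{/^$/d;}')
printf '%s\n' "$result"
```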

delete strings from text file

I have a file with text strings of nucleotides (A, C, G, T). I would like to find specific strings in the text file and delete them.
For example:
ACTGGGCTGTCCAACTG
ACTTCTGGGTCGAACTG
CCCACTTCTGGGTTCAA
I would like to delete only the parts ACT and GGG from all lines.
Then I would get a file with these strings:
CTGTCCAACTG
TCTTCGAACTG
CCCTCTTTCAA
sed can help you:
sed 's/ACT//g; s/GGG//g' inputFile
i.e. replace all occurrences of ACT and GGG with an empty string.
You can try:
awk '{gsub(/ACT|GGG/,"")}1' file
Using sed (-r selects extended regular expressions in GNU sed):
sed -r 's/(ACT|GGG)//g' file
Using perl:
perl -pe 's/ACT|GGG//g' your_file
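Note that the expected output in the question keeps a trailing ACTG on the first two lines, which suggests that only the first ACT and the first GGG on each line should go, whereas the g flag removes every occurrence. That is a reading of the example, not something the asker states. A quick sketch to compare the two behaviours:

```shell
input='ACTGGGCTGTCCAACTG
ACTTCTGGGTCGAACTG
CCCACTTCTGGGTTCAA'

# Without g: only the first ACT and the first GGG on each line are removed.
first_only=$(printf '%s\n' "$input" | sed 's/ACT//; s/GGG//')

# With g: every occurrence on each line is removed.
all=$(printf '%s\n' "$input" | sed 's/ACT//g; s/GGG//g')

printf 'first only:\n%s\n\nall:\n%s\n' "$first_only" "$all"
```

The first variant reproduces the three lines listed in the question; the second also strips the ACT near the end of the first two lines.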

How to find and replace every even-numbered appearance of a match in BASH?

I am using sed -i 's/AAA/ZZZ/g' filename to replace every occurrence of "AAA" with "ZZZ" in a file. Instead, I need to replace every even-numbered appearance of "AAA" with "ZZZ", e.g.:
This is a AAA sentence. AAA
This is another AAA sentence.
This is yet AAA another AAA sentence.
This is AAA stillAAA AAA yet AAA another AAA sentence.
This would become:
This is a AAA sentence. ZZZ
This is another AAA sentence.
This is yet ZZZ another AAA sentence.
This is ZZZ stillAAA ZZZ yet AAA another ZZZ sentence.
How to replace every even-numbered appearance of a match?
Here is a short GNU awk version:
awk '{ORS=NR%2==0?"ZZZ":RS}1' RS="AAA" file
This is a AAA sentence. ZZZ
This is another AAA sentence.
This is yet ZZZ another AAA sentence.
This is ZZZ stillAAA ZZZ yet AAA another ZZZ sentence.
awk is a better tool for this than sed. Consider this awk command:
awk -F 'AAA' '{for (i=1; i<NF; i++) {OFS=c%2?"ZZZ":FS; printf "%s%s", $i, OFS; c++}
print $NF}' file
This is a AAA sentence. ZZZ
This is another AAA sentence.
This is yet ZZZ another AAA sentence.
This is ZZZ stillAAA ZZZ yet AAA another ZZZ sentence.
This awk command sets the input field separator to AAA and toggles the output field separator between AAA and ZZZ depending on whether a counter is odd or even. The counter c starts at 0 and is incremented after every separator, across lines: when c is even, OFS is set to AAA, and when it is odd, OFS is set to ZZZ.
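The same counter idea can also be written without touching FS or OFS. A minimal sketch in plain POSIX awk, assuming the search string is a literal (it uses index, not a regex); the variable names n, rest, and out are illustrative:

```shell
input='This is a AAA sentence. AAA
This is another AAA sentence.
This is yet AAA another AAA sentence.
This is AAA stillAAA AAA yet AAA another AAA sentence.'

result=$(printf '%s\n' "$input" | awk '{
  out = ""; rest = $0
  # Walk through the line, one literal AAA at a time.
  while ((i = index(rest, "AAA")) > 0) {
    n++                                   # running count across all lines
    out = out substr(rest, 1, i - 1) (n % 2 == 0 ? "ZZZ" : "AAA")
    rest = substr(rest, i + 3)            # continue after the 3-character match
  }
  print out rest
}')
printf '%s\n' "$result"
```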
Here is a perl solution:
$ cat inp
This is a AAA sentence. AAA
This is another AAA sentence.
This is yet AAA another AAA sentence.
This is AAA stillAAA AAA yet AAA another AAA sentence.
$ perl -0777 -pe 's/(.*?AAA.*?)AAA/${1}ZZZ/gs' < inp
This is a AAA sentence. ZZZ
This is another AAA sentence.
This is yet ZZZ another AAA sentence.
This is ZZZ stillAAA ZZZ yet AAA another ZZZ sentence.
Here, -0777 makes perl read the entire file as a single record. Then every second occurrence of AAA is replaced with ZZZ, using non-greedy matching.
Perl:
perl -wpe 'BEGIN{$/="AAA"} $.%2 or s/AAA/ZZZ/' foo.txt
You can do it with sed too:
sed -n -e '1,$ {
:oddline s/AAA/\n/g; :odd s/\n/AAA/m; t even ;p;N;s/.*\n//;b oddline ;
:evenline s/AAA/\n/g; :even s/\n/ZZZ/m; t odd ; p;N;s/.*\n//;b evenline ;
}' << _END_
This is a AAA sentence. AAA
This is another AAA sentence.
This is yet AAA another AAA sentence.
This is AAA stillAAA AAA yet AAA another AAA sentence.
_END_
The sed script loops through all lines and remembers odd/even replacements (across lines). In the pattern space, all AAAs are first replaced by newlines and then replaced one at a time by either AAA or ZZZ. In order to switch to the next line it is first appended (N) and then the previous one is deleted (s/.*\n//).
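sed -n '1h;1!H;${x;s/=/=e/g;s/²/=c/g;s/AAA/²/g;s/²\([^²]\{1,\}\)²/²\1ZZZ/g;s/²/AAA/g;s/=c/²/g;s/=e/=/g;p;}' YourFile
This collects the whole file in the hold space and substitutes a placeholder character (²) for AAA, because a plain .* between two AAAs could itself span an AAA. The escaping done before and undone after (= to =e, ² to =c) ensures it still works when the placeholder character already occurs in the input.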
This might work for you (GNU sed):
sed -r ':a;$!{N;ba};/\x00/q1;s/AAA/\x00/g;s/(\x00)([^\x00]*)\1/AAA\2ZZZ/g' file
This slurps the file into memory and then replaces all occurrences of AAA with a unique character (here \x00). Then each pair of that unique character is replaced: the odd occurrence by AAA and the even occurrence by ZZZ.
N.B. If the unique character is not unique (that is, it already occurs in the input), the input is printed unchanged and sed exits with status 1.
This second method is more long-winded but can be used to change the N'th value and does not rely on a unique value:
sed -r 's/AAA/\n&/g;/\n/!b;G;:a;s/$/#/;s/#{2}$//;/\n$/s/\nAAA/\nZZZ/;s/\n//;/\n.*\n/ba;P;s/^.*\n//;h;d' file
It stores the number of occurrences of the required pattern in the hold space and retrieves it when it encounters a line with such a pattern.

Printing text between regexps

I tried the '/pat1/,/pat2/p', but I want to print only the text between the patterns, not the whole line. How do I do that?
A pattern range is for multiline patterns. This is how you'd do that:
sed -n '/pat1/,/pat2/{/pat1\|pat2/!p}' inputfile
-n - don't print by default
/pat1/,/pat2/ - within the two patterns inclusive
/pat1\|pat2/!p - print everything that's not one of the patterns
What you may be asking for is what's between two patterns on the same line. One of the other answers will do that.
Edit:
A couple of examples:
$ cat file1
aaaa bbbb cccc
123 start 456
this is what
I want
789 end 000
xxxx yyyy zzzz
$ sed -n '/start/,/end/{/start\|end/!p}' file1
this is what
I want
You can shorten it by telling sed to use the most recent pattern again (//):
$ sed -n '/.*start.*/,/^[0-9]\{3\} end 0*$/{//!p}' file1
this is what
I want
As you can see, I didn't have to duplicate the long, complicated regex in the second part of the command.
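For comparison, the same between-the-markers selection is often written in awk with a flag. This is an alternative sketch rather than part of the answer above; the flag name inside is arbitrary.

```shell
input='aaaa bbbb cccc
123 start 456
this is what
I want
789 end 000
xxxx yyyy zzzz'

result=$(printf '%s\n' "$input" | awk '
  /end/   { inside = 0 }   # closing marker: stop before this line is considered
  inside                   # print only while between the markers
  /start/ { inside = 1 }   # opening marker: start printing from the next line
')
printf '%s\n' "$result"
```

The order of the three rules is what keeps both marker lines out of the output.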
sed -r 's/pat1(.*)pat2/\1/g' somefile.txt
I don't know what kind of pattern you used, but I think it is also possible with regular expressions:
sed -r 's/^(.*)pat1(.*)pat2(.*)$/\2/g' myfile
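The two commands above behave differently when there is text around the patterns. A short sketch on an invented line; it uses POSIX basic regular expressions with -n and p, which is a variation on the commands above, so that lines without both patterns are not printed at all:

```shell
line='before pat1 text i want pat2 after'

# Removes only the two markers; the text around them stays.
stripped=$(printf '%s\n' "$line" | sed 's/pat1\(.*\)pat2/\1/')

# Replaces the whole line with the captured text and prints only matching lines.
only=$(printf '%s\n' "$line" | sed -n 's/.*pat1\(.*\)pat2.*/\1/p')

# Brackets make the surrounding spaces visible.
printf '[%s]\n[%s]\n' "$stripped" "$only"
```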
You can use awk (this needs GNU awk, since RT is a gawk extension):
$ cat file
other TEXT
pat1 text i want pat2
pat1 TEXT I
WANT
pat2
other text
$ awk -vRS="pat2" 'RT{gsub(/.*pat1/,"");print}' file
text i want
TEXT I
WANT
This solution also works for patterns that span multiple lines.