I would like to match a set of data between two patterns and remove this data and the start/end patterns but only for the first occurrence of the pattern.
So if this is the test data:
PATTERNSTART
LINE1
LINE2
LINE3
PATTERNEND
PATTERNSTART
LINE1
LINE2
LINE3
PATTERNEND
TESTLINE1
TESTLINE2
TESTLINE3
PATTERNSTART
LINE1
LINE2
LINE3
PATTERNEND
This quite happily removes all the pattern matches and the lines in between, but I only want to remove the first pattern match and the lines in between:
sed '/PATTERNSTART/,/PATTERNEND/d' testsed.txt
Output:
TESTLINE1
TESTLINE2
TESTLINE3
Required output:
PATTERNSTART
LINE1
LINE2
LINE3
PATTERNEND
TESTLINE1
TESTLINE2
TESTLINE3
PATTERNSTART
LINE1
LINE2
LINE3
PATTERNEND
Any sed ideas?
It's a bit incredible-machiney, but this works:
sed '/PATTERNSTART/,/PATTERNEND/ { // { x; s/$/./; x; }; x; /.../! { x; d; }; x; }' filename
as follows:
/PATTERNSTART/,/PATTERNEND/ { # in the pattern range
// { # in the first and last line:
x
s/$/./ # increment a counter in the hold buffer by
# appending a character to it. The counter is
# the number of characters in the hold buffer.
x
}
x # for all lines in the range: inspect the
# counter
/.../! { # if it is not three or more (the counter
# becomes three with the start line of the
# second matching range)
x
d # delete the line
}
x
}
The xs in that code are largely to ensure that the counter ends up back in the hold buffer when the whole thing is over. The // bit works because // repeats the last attempted regex, which is the start pattern of the range for its first line and the end pattern for the others.
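One way to check the one-liner is to rebuild the sample data and compare the result with the required output. The sketch below is only a test harness: it assumes a sed that accepts the one-line brace syntax (such as GNU sed), and the temporary file from mktemp and the variable names are not part of the original answer.

```shell
# Rebuild the 18-line sample from the question in a temporary file.
tmp=$(mktemp)
block='PATTERNSTART\nLINE1\nLINE2\nLINE3\nPATTERNEND\n'
printf "${block}${block}TESTLINE1\nTESTLINE2\nTESTLINE3\n${block}" > "$tmp"

# Run the one-liner from the answer unchanged.
actual=$(sed '/PATTERNSTART/,/PATTERNEND/ { // { x; s/$/./; x; }; x; /.../! { x; d; }; x; }' "$tmp")

# The required output is the sample minus its first five lines.
expected=$(tail -n +6 "$tmp")
rm -f "$tmp"

if [ "$actual" = "$expected" ]; then
    echo "only the first block was removed"
else
    echo "unexpected output" >&2
fi
```

Comparing against tail -n +6 works here only because the first block occupies exactly lines 1 to 5 of the sample.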
Just use awk (the cat -n is just so you can see which line numbers are being printed):
$ cat -n file | awk '/PATTERNSTART/{f=1;++c} !(f && c==1); /PATTERNEND/{f=0}'
6 PATTERNSTART
7 LINE1
8 LINE2
9 LINE3
10 PATTERNEND
11 TESTLINE1
12 TESTLINE2
13 TESTLINE3
14 PATTERNSTART
15 LINE1
16 LINE2
17 LINE3
18 PATTERNEND
Set the test on c to be the occurrence number of whatever block you want to skip:
$ cat -n file | awk '/PATTERNSTART/{f=1;++c} !(f && c==2); /PATTERNEND/{f=0}'
1 PATTERNSTART
2 LINE1
3 LINE2
4 LINE3
5 PATTERNEND
11 TESTLINE1
12 TESTLINE2
13 TESTLINE3
14 PATTERNSTART
15 LINE1
16 LINE2
17 LINE3
18 PATTERNEND
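The occurrence number can also be passed in from the shell rather than edited into the script. A minimal sketch, assuming a hypothetical wrapper function named skip_block and an awk variable n (neither appears in the answer above):

```shell
# skip_block N FILE: print FILE without its Nth PATTERNSTART..PATTERNEND block.
# The function name and the awk variable n are illustrative assumptions.
skip_block() {
    awk -v n="$1" '
        /PATTERNSTART/ { f = 1; ++c }   # entering a block: count it
        !(f && c == n)                  # print unless inside block number n
        /PATTERNEND/   { f = 0 }        # leaving the block
    ' "$2"
}

# Example with the sample data from the question: drop the third block.
tmp=$(mktemp)
block='PATTERNSTART\nLINE1\nLINE2\nLINE3\nPATTERNEND\n'
printf "${block}${block}TESTLINE1\nTESTLINE2\nTESTLINE3\n${block}" > "$tmp"
result=$(skip_block 3 "$tmp")
first13=$(head -n 13 "$tmp")
rm -f "$tmp"
printf '%s\n' "$result"
```

With the sample data, dropping block 3 should leave the first 13 lines of the file untouched, which is what first13 holds for comparison.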
With GNU sed, which supports the 0,/regex/ address form:
sed '/PATTERNSTART/,/PATTERNEND/{0,/PATTERNEND/d}' file
You can do this with the following (quite ugly, I admit) sed code:
sed -e '/PATTERNSTART/,/PATTERNEND/{ /PATTERNEND/b after; d; :after; N; s/^.*\n//; :loop; n; b loop; }' testsed.txt
Let's look at it more closely:
sed -e '/PATTERNSTART/,/PATTERNEND/{
/PATTERNEND/b after; # if we're at the end of match, go to the hack
d; # if not, delete the line and start a new cycle
:after; # Begin "end of part to delete"
N; # get the next line...
s/^.*\n//; # ...and forget about this one
# We now only have to print everything:
:loop; n; b loop;
# And you sir, have your code!
}' testsed.txt
This might work for you (GNU sed):
sed '/PATTERNSTART/,/PATTERNEND/{x;/./{x;b};x;/PATTERNEND/h;d}' file
This uses the hold space as a switch. Check the file for the desired range of lines. If encountered and the hold space is not empty, the first range has already been deleted so bail out and print as normal. If not, set the switch on the last pattern match and delete all lines within the range.
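Spelled out over several lines, the same hold-space switch might read as follows. This is a sketch of the one-liner above with comments added, run against a rebuilt copy of the sample data; the temporary file and the variable names are assumptions.

```shell
tmp=$(mktemp)
block='PATTERNSTART\nLINE1\nLINE2\nLINE3\nPATTERNEND\n'
printf "${block}${block}TESTLINE1\nTESTLINE2\nTESTLINE3\n${block}" > "$tmp"

switched=$(sed '/PATTERNSTART/,/PATTERNEND/{
    # Swap in the hold space, which serves as the switch.
    x
    /./{
        # Switch is set: the first range is gone, so swap the line
        # back and branch to the end of the script, which prints it.
        x
        b
    }
    # Switch not set yet: swap the line back.
    x
    # On the closing line of the first range, copy it to the hold
    # space; a non-empty hold space is the "set" state.
    /PATTERNEND/h
    # Delete every line of the first range.
    d
}' "$tmp")

rm -f "$tmp"
printf '%s\n' "$switched"
```

Putting each command on its own line avoids relying on how a particular sed parses a bare b followed by a closing brace.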
Use
sed -e '/PATTERNSTART/,/PATTERNEND/d' -e '/PATTERNEND/q' some_file.txt
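The q command causes sed to quit. Note, however, that d ends the cycle for every line in the range, so the second expression never sees a PATTERNEND line, and this command still deletes every block, exactly like the command in the question.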
Related
I have two files, file1 and file2, with content as follows:
file1:
f1 line1
f1 line2
f1 line3
file2:
f2 line1
f2 line2
f2 line3
f2 line4
I wonder how I can use sed to read lines 1 to 3 from file1 and use them to replace lines 2 to 3 in file2.
The output should be like this:
f2 line1
f1 line1
f1 line2
f1 line3
f2 line4
Can anyone help? Thanks,
If you have GNU sed which supports s///e, you can use the e to call head -n3 file1
I think (w/o yet testing):
sed '2d;3s/.*/head -n3 file1/e' file2
I'll go verify...
sed is the best tool for doing s/old/new on individual strings. That's not what you're trying to do so you shouldn't be considering trying to use sed for it. Using any awk in any shell on every UNIX box:
$ cat tst.awk
NR==FNR {
if ( FNR <= num ) {
new = new $0 ORS
}
next
}
(beg <= FNR) && (FNR <= end) {
if ( !done++ ) {
printf "%s", new
}
next
}
{ print }
$ awk -v num=3 -v beg=2 -v end=3 -f tst.awk file1 file2
f2 line1
f1 line1
f1 line2
f1 line3
f2 line4
To read a different number of lines from file1, just change the value of num; to apply the replacement text to a different range of lines in file2, just change the values of beg and end. So, for example, if you wanted to use 10 lines of data from a pipe (e.g. seq 15 |) instead of file1, and wanted to replace lines 3 through 17 of a file2 like yours but with 20 lines instead of 4, you'd leave the awk script as-is and just tweak how you call it:
$ seq 15 | awk -v num=10 -v beg=3 -v end=17 -f tst.awk - file2
f2 line1
f2 line2
1
2
3
4
5
6
7
8
9
10
f2 line18
f2 line19
f2 line20
Try doing the same with sed for such a minor change in requirements and note how you're changing the script and it's not portable to other sed versions.
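The pipe example above can be reproduced end to end. A sketch, assuming a scratch directory from mktemp -d and a 20-line file2 generated on the fly (only tst.awk itself is taken from the answer):

```shell
dir=$(mktemp -d)

# tst.awk exactly as shown above.
cat > "$dir/tst.awk" <<'EOF'
NR==FNR {
    if ( FNR <= num ) {
        new = new $0 ORS
    }
    next
}
(beg <= FNR) && (FNR <= end) {
    if ( !done++ ) {
        printf "%s", new
    }
    next
}
{ print }
EOF

# A 20-line stand-in for file2: "f2 line1" .. "f2 line20".
seq 20 | sed 's/^/f2 line/' > "$dir/file2"

# Lines 3-17 of file2 are replaced by the first 10 lines from the pipe.
replaced=$(seq 15 | awk -v num=10 -v beg=3 -v end=17 -f "$dir/tst.awk" - "$dir/file2")
rm -rf "$dir"
printf '%s\n' "$replaced"
```

Only the -v values and the input sources change between the two invocations; the script file stays the same.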
I need to extract text between the first occurrence of a word called "BEGIN" and the last occurrence of a word called "END" using sed.
Input:
line1
BEGIN
line2
line3
END
line4
line5
BEGIN
line6
line7
ENDED
END
line8
END
line9
line10
Expected Output:
BEGIN
line2
line3
END
line4
line5
BEGIN
line6
line7
ENDED
END
line8
END
My approach:
It extracts the text between each BEGIN and the next END. Here there are two BEGIN/END pairs, and my solution extracts the text inside each of them.
It fails to extract the text between the first occurrence of word1 (BEGIN) and the last occurrence of word2 (END).
dsonachalam$ sed -n -e '/^BEGIN$/,/^END$/p' logs.txt
BEGIN
line2
line3
END
BEGIN
line6
line7
ENDED
END
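# Anchor the patterns so that ENDED is not mistaken for END, and quote the variables.
start=$(grep -n '^BEGIN$' "$FILE_NAME" | cut -f1 -d: | head -n 1)
end=$(grep -n '^END$' "$FILE_NAME" | cut -f1 -d: | tail -n 1)
sed -n "${start},${end}p" "$FILE_NAME"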
If the file is small enough to fit memory:
$ perl -0777 -ne 'print /(^BEGIN\n.*^END\n)/ms' ip.txt
BEGIN
line2
line3
END
line4
line5
BEGIN
line6
line7
ENDED
END
line8
END
With a 2-pass approach to avoid having to store any text in memory so it'll work for any size input file and with 1 call to 1 standard UNIX tool to avoid spawning multiple subshells, the following will work using any awk in any shell on every UNIX box:
$ awk '
NR==FNR{ if (!beg && /BEGIN/) beg=NR; if (/END/) end=NR; next}
(beg <= FNR) && (FNR <= end)
' file file
BEGIN
line2
line3
END
line4
line5
BEGIN
line6
line7
ENDED
END
line8
END
A one-liner sed command would suffice (using GNU sed):
sed -E '/^BEGIN$/,$!d; :a; /(^|\n).*END$/{p;d}; $d; N; ba'
/^BEGIN$/,$!d; deletes lines above the first BEGIN. :a; /(^|\n).*END$/{p;d}; $d; N; ba accumulates ("slurps") lines into pattern space. Whenever an END line is read then the accumulated lines are printed out and pattern space is deleted starting a new cycle. Note that this "slurping" approach may be slow, or even may crash the sed process if the input is too large.
Content of input file:
line1
BEGIN
line2
line3
END
line4
line5
BEGIN
line6
line7
ENDED
END
line8
END
line9
line10
and using GNU sed 4.8
sed -E '/^BEGIN$/,$!d; :a; /(^|\n).*END$/{p;d}; $d; N; ba' inputfile
prints
BEGIN
line2
line3
END
line4
line5
BEGIN
line6
line7
ENDED
END
line8
END
Another approach would be:
lastend=$(sed -n '/^END$/=' inputfile | tail -1)
[[ -n $lastend ]] && sed -n "/^BEGIN\$/,${lastend}p" inputfile
This two-pass approach doesn't suffer from "slurping" lines.
This might work for you (GNU sed):
sed -n '/\<BEGIN\>/{x;:a;n;/\<END\>/{x;p;ba};H;$!ba;x;//P}' file
Set automatic printing off by using the -n option and then focus on lines following one that contains the word BEGIN.
Swap to the hold space (HS) and initiate a loop that fetches the next line and if that line contains the word END swap to the HS, print its contents and repeat.
If the current line does not contain the word END, append the current line to the HS and unless it is the end of file repeat.
At the end of file, print the first line of the HS if it begins with END; whether or not it does, file processing then terminates.
Thus lines are only processed once the word BEGIN has been seen, and the accumulated lines are printed each time the word END occurs.
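The same idea (start at the first BEGIN, flush what has accumulated each time END appears) can be sketched in awk. This is an alternative rendering, not the sed answer itself, and it assumes BEGIN and END sit alone on their lines, as in the sample input; the variable names seen and buf are illustrative.

```shell
extracted=$(printf '%s\n' line1 BEGIN line2 line3 END line4 line5 \
                          BEGIN line6 line7 ENDED END line8 END line9 line10 |
awk '
    /^BEGIN$/ { seen = 1 }                 # collecting starts at the first BEGIN
    seen      { buf = buf $0 ORS }         # hold each line from then on
    seen && /^END$/ {                      # every END flushes what is held
        printf "%s", buf
        buf = ""
    }
')
printf '%s\n' "$extracted"
```

Lines that follow the last END stay in buf and are never printed, so memory use is bounded by the longest stretch between two END lines rather than by the whole file.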
I have a Verilog file that looks like this:
Line1
Line2
Line3
module1
Line4
Line5
Line6
endmodule
Line7
Line8
module2
Line9
Line11
Line12
Line13
endmodule
Line15
Line16
Here I want to delete whole modules, and the module names will be specified by me. For example, if I want to delete module1, the lines from module1 to endmodule should be deleted (module1, Line4, Line5, Line6, endmodule), and the remaining modules should be kept intact.
My expected output when I delete module1:
Line1
Line2
Line3
Line7
Line8
module2
Line9
Line11
Line12
Line13
endmodule
Line15
Line16
How do I go about it?
I'd use sed for this, not perl:
sed -e '/module1/,/endmodule/d' input.txt
X,Y specifies a range of lines to do something on, starting with the one matching X and ending with the one matching Y, and the d command basically says to delete the current line instead of printing it like normal.
If you're set on perl, the scalar form of the range operator (..) allows for the same sort of thing:
perl -ne 'print unless /module1/ .. /endmodule/' data.txt
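Since the module name is supplied by the user, it can be passed in as a variable instead of being typed into the pattern. A minimal sketch using awk, assuming (as in the sample) that the module name is the first word on its line and that endmodule starts a line; delete_module and skip are hypothetical names.

```shell
# delete_module NAME FILE: print FILE without the block that starts at the
# line whose first word is NAME and ends at the next endmodule line.
delete_module() {
    awk -v name="$1" '
        $1 == name    { skip = 1 }   # block to drop starts here
        !skip                        # print lines outside that block
        /^endmodule/  { skip = 0 }   # block ends after this line
    ' "$2"
}

tmp=$(mktemp)
printf '%s\n' Line1 Line2 Line3 module1 Line4 Line5 Line6 endmodule \
              Line7 Line8 module2 Line9 Line11 Line12 Line13 endmodule \
              Line15 Line16 > "$tmp"
kept=$(delete_module module1 "$tmp")
rm -f "$tmp"
printf '%s\n' "$kept"
```

Comparing the first field for equality, rather than matching a regex, keeps a request for module1 from also hitting a module named module10.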
I am using sed to combine lines of text files in a directory.
The command cd dir && sed -e 'N;s/\n//' *.txt works fine for that, but is there any way to tweak it so that it only joins a line starting with ** with the following line ending in **? So
This is Line1
**This is Line2
This is Line3**
This is Line4
This is Line5
Becomes
This is Line1
** This is Line2 This is Line3**
This is Line4
This is Line5
etc
sed is for simple substitutions on individual lines, that is all. For anything else you should be using awk. This will do what you show with your sample input/output:
$ awk '{ORS=(/^\*\*/?FS:RS)}1' file
This is Line1
**This is Line2 This is Line3**
This is Line4
This is Line5
but of course it doesn't address any of the requirements you haven't shared with us yet (e.g. what to do when a line starts with ** but the next line doesn't end with ** or vice-versa or a line starts and ends with ** or a line starting with ** is at the end of the input file or....).
Sed is your friend
$ sed '/^\*\*/{:l1;/\*\*$/!{N;bl1};s/\n/ /g;}' file
This is Line1
**This is Line2 This is Line3**
This is Line4
This is Line5
You can use this sed:
sed '/^\*\*/{:loop; N; /\*\*$/{s/\n/ /g;p;d;}; b loop;}' file
Test:
$ cat file
This is Line1
**This is Line2
in between
This is Line3**
This is Line4
**This is Line5
This is Line6**
$ sed '/^\*\*/{:loop; N; /\*\*$/{s/\n/ /g;p;d;}; b loop;}' file
This is Line1
**This is Line2 in between This is Line3**
This is Line4
**This is Line5 This is Line6**
$ cat ip.txt
This is Line1
**This is Line2
This is Line3**
This is Line4
This is Line5
$ # this slurps entire file
$ perl -0777 -pe 's/^(\*\*.*)\n(.*\*\*)$/$1 $2/mg' ip.txt
This is Line1
**This is Line2 This is Line3**
This is Line4
This is Line5
$ # can use this if testing start of line for ** is enough
$ perl -pe 's/\n/ / if /^\Q**/' ip.txt
Reference: How do I search and replace across multiple lines with Perl?
I saw some answers here, but can't make them work for me.
I have text like this:
line1
line2 text=^M
line3
line4
Basically, what I need is to replace =^M\n with nothing, something like s/=^M\n//, so that the output is as follows (^M is the special character typed as Ctrl+V Ctrl+M):
line1
line2 textline3
line4
I know it's some sed branches but I have problem with making them work.
One way:
$ sed '/^M/{N;s/=^M\n//;}' file
line1
line2 textline3
line4
Where ^M has to be typed as: Ctrl-V + Ctrl-M
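If typing the control character is awkward, the shell can generate it. A sketch assuming a POSIX shell, where printf '\r' produces the carriage return; the variable names cr and joined are illustrative.

```shell
# Hold a literal carriage return (the ^M above) in a variable.
cr=$(printf '\r')

# Sample input from the question, with a real CR after "text=".
joined=$(printf 'line1\nline2 text=\r\nline3\nline4\n' |
    sed "/=${cr}\$/{N;s/=${cr}\n//;}")

printf '%s\n' "$joined"
```

The script is in double quotes so that ${cr} expands; the $ that anchors the address is escaped to keep the shell from touching it.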
awk solution for this
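# Usage: awk -f myawk.sh temp.txt
# A line ending in "=" followed by a carriage return (shown as ^M above)
# is held back and joined with the line that follows it.
BEGIN { print "Start Records" }
{
    if (sub(/=\r$/, "")) {   # strip the trailing =^M; true only if it was there
        held = held $0       # keep the line, to be joined with the next one
    } else {
        print held $0        # prepend any held text and print
        held = ""
    }
}
END {
    if (held != "") print held   # flush a held line at end of input
    print "Process Complete"
}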