Delete strings between given two pattern using SED excluding matching pattern [duplicate]


I have a file like the following and I would like to print the lines between two given patterns PAT1 and PAT2.
1
2
PAT1
3 - first block
4
PAT2
5
6
PAT1
7 - second block
PAT2
8
9
PAT1
10 - third block
I have read How to select lines between two marker patterns which may occur multiple times with awk/sed but I am curious to see all the possible combinations of this, either including or excluding the pattern.
How can I print all lines between two patterns?

Print lines between PAT1 and PAT2
$ awk '/PAT1/,/PAT2/' file
PAT1
3 - first block
4
PAT2
PAT1
7 - second block
PAT2
PAT1
10 - third block
Or, using variables:
awk '/PAT1/{flag=1} flag; /PAT2/{flag=0}' file
How does this work?
/PAT1/ matches lines containing the text PAT1, and /PAT2/ matches lines containing PAT2.
/PAT1/{flag=1} sets the flag when the text PAT1 is found in a line.
/PAT2/{flag=0} unsets the flag when the text PAT2 is found in a line.
flag is a pattern with the default action, which is to print $0: if flag is equal to 1, the line is printed. This way, it prints every line from the moment PAT1 occurs until the next PAT2 is seen. It also prints the lines from the last match of PAT1 up to the end of the file when no closing PAT2 follows.
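A minimal sketch of the same flag idea wrapped in a shell function, assuming the two patterns are passed in as awk variables. The names print_between, pat1 and pat2 are made up for illustration, and note that awk -v interprets backslash escapes in the values it receives.

```shell
# Hypothetical helper: print lines from a line matching $1 through the
# next line matching $2, both markers included. $3 is the input file.
print_between() {
  awk -v pat1="$1" -v pat2="$2" '
    $0 ~ pat1 { flag = 1 }   # start marker seen: switch printing on
    flag                     # bare pattern, default action: print $0
    $0 ~ pat2 { flag = 0 }   # end marker seen: switch printing off
  ' "$3"
}

# Rebuild the sample input from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' 1 2 PAT1 '3 - first block' 4 PAT2 5 6 \
  PAT1 '7 - second block' PAT2 8 9 PAT1 '10 - third block' > "$sample"

inclusive=$(print_between PAT1 PAT2 "$sample")
printf '%s\n' "$inclusive"
rm -f "$sample"
```

Moving the bare flag line before or after the two marker rules gives the other variants discussed in this answer.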
Print lines between PAT1 and PAT2 - not including PAT1 and PAT2
$ awk '/PAT1/{flag=1; next} /PAT2/{flag=0} flag' file
3 - first block
4
7 - second block
10 - third block
This uses next to skip the line that contains PAT1 in order to avoid this being printed.
This call to next can be dropped by reshuffling the blocks: awk '/PAT2/{flag=0} flag; /PAT1/{flag=1}' file.
Print lines between PAT1 and PAT2 - including PAT1
$ awk '/PAT1/{flag=1} /PAT2/{flag=0} flag' file
PAT1
3 - first block
4
PAT1
7 - second block
PAT1
10 - third block
By placing flag at the very end, it triggers the action that was set on either PAT1 or PAT2: to print on PAT1, not to print on PAT2.
Print lines between PAT1 and PAT2 - including PAT2
$ awk 'flag; /PAT1/{flag=1} /PAT2/{flag=0}' file
3 - first block
4
PAT2
7 - second block
PAT2
10 - third block
By placing flag at the very beginning, it triggers the action that was set on the previous line and hence prints the closing pattern but not the starting one.
Print lines between PAT1 and PAT2 - excluding lines from the last PAT1 to the end of file if no other PAT2 occurs
This is based on a solution by Ed Morton.
awk 'flag{
if (/PAT2/)
{printf "%s", buf; flag=0; buf=""}
else
buf = buf $0 ORS
}
/PAT1/ {flag=1}' file
As a one-liner:
$ awk 'flag{ if (/PAT2/){printf "%s", buf; flag=0; buf=""} else buf = buf $0 ORS}; /PAT1/{flag=1}' file
3 - first block
4
7 - second block
# note the lack of third block, since no other PAT2 happens after it
This keeps all the selected lines in a buffer that starts being populated on the line after PAT1 is found. It keeps being filled with the following lines until PAT2 is found. At that point, it prints the stored content and empties the buffer.

What about the classic sed solution?
Print lines between PAT1 and PAT2 - include PAT1 and PAT2
sed -n '/PAT1/,/PAT2/p' FILE
Print lines between PAT1 and PAT2 - exclude PAT1 and PAT2
GNU sed
sed -n '/PAT1/,/PAT2/{/PAT1/!{/PAT2/!p}}' FILE
Any sed [1]
sed -n '/PAT1/,/PAT2/{/PAT1/!{/PAT2/!p;};}' FILE
or even (Thanks Sundeep):
GNU sed
sed -n '/PAT1/,/PAT2/{//!p}' FILE
Any sed
sed -n '/PAT1/,/PAT2/{//!p;}' FILE
Print lines between PAT1 and PAT2 - include PAT1 but not PAT2
The following includes just the range start:
GNU sed
sed -n '/PAT1/,/PAT2/{/PAT2/!p}' FILE
Any sed
sed -n '/PAT1/,/PAT2/{/PAT2/!p;}' FILE
Print lines between PAT1 and PAT2 - include PAT2 but not PAT1
The following includes just the range end:
GNU sed
sed -n '/PAT1/,/PAT2/{/PAT1/!p}' FILE
Any sed
sed -n '/PAT1/,/PAT2/{/PAT1/!p;}' FILE
[1] Note about BSD/Mac OS X sed
With BSD sed, a command like this:
sed -n '/PAT1/,/PAT2/{/PAT1/!{/PAT2/!p}}' FILE
would emit an error:
▶ sed -n '/PAT1/,/PAT2/{/PAT1/!{/PAT2/!p}}' FILE
sed: 1: "/PAT1/,/PAT2/{/PAT1/!{/ ...": extra characters at the end of p command
For this reason this answer has been edited to include BSD and GNU versions of the one-liners.
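Another way to sidestep the problem, sketched here as an assumption rather than as part of the original answer, is to pass each piece of the script in its own -e option. sed joins the fragments with newlines, so no command is ever followed directly by a closing brace.

```shell
# Rebuild the sample input from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' 1 2 PAT1 '3 - first block' 4 PAT2 5 6 \
  PAT1 '7 - second block' PAT2 8 9 PAT1 '10 - third block' > "$sample"

# Same logic as the "exclude PAT1 and PAT2" one-liner, one fragment per -e.
inner=$(sed -n \
  -e '/PAT1/,/PAT2/{' \
  -e '/PAT1/!{' \
  -e '/PAT2/!p' \
  -e '}' \
  -e '}' "$sample")

printf '%s\n' "$inner"
rm -f "$sample"
```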

Using grep with PCRE (where available) to print markers and lines between markers:
$ grep -Pzo "(?s)(PAT1(.*?)(PAT2|\Z))" file
PAT1
3 - first block
4
PAT2
PAT1
7 - second block
PAT2
PAT1
10 - third block
-P : perl-regexp, PCRE. Not in all grep variants.
-z : Treat the input as a set of lines, each terminated by a zero byte instead of a newline.
-o : Print only the matching part.
(?s) : DotAll, i.e. dot matches newlines as well.
(.*?) : Non-greedy match.
\Z : Match only at the end of the string, or before a newline at the end.
Print lines between markers excluding end marker:
$ grep -Pzo "(?s)(PAT1(.*?)(?=(\nPAT2|\Z)))" file
PAT1
3 - first block
4
PAT1
7 - second block
PAT1
10 - third block
(.*?)(?=(\nPAT2|\Z)) nongreedy find with lookahead for \nPAT2 and \Z
Print lines between markers excluding markers:
$ grep -Pzo "(?s)((?<=PAT1\n)(.*?)(?=(\nPAT2|\Z)))" file
3 - first block
4
7 - second block
10 - third block
(?<=PAT1\n) positive lookbehind for PAT1\n
Print lines between markers excluding start marker:
$ grep -Pzo "(?s)((?<=PAT1\n)(.*?)(PAT2|\Z))" file
3 - first block
4
PAT2
7 - second block
PAT2
10 - third block

For completeness, here is a Perl solution:
Print lines between PAT1 and PAT2 - include PAT1 and PAT2
perl -ne '/PAT1/../PAT2/ and print' FILE
or:
perl -ne 'print if /PAT1/../PAT2/' FILE
Print lines between PAT1 and PAT2 - exclude PAT1 and PAT2
perl -ne '/PAT1/../PAT2/ and !/PAT1/ and !/PAT2/ and print' FILE
or:
perl -ne 'if (/PAT1/../PAT2/) {print unless /PAT1/ or /PAT2/}' FILE
Print lines between PAT1 and PAT2 - exclude PAT1 only
perl -ne '/PAT1/../PAT2/ and !/PAT1/ and print' FILE
Print lines between PAT1 and PAT2 - exclude PAT2 only
perl -ne '/PAT1/../PAT2/ and !/PAT2/ and print' FILE
See also:
Range operator section in perldoc perlop for more on the /PAT1/../PAT2/ grammar:
Range operator
...In scalar context, ".." returns a boolean value. The operator is
bistable, like a flip-flop, and emulates the line-range (comma)
operator of sed, awk, and various editors.
For the -n option, see perldoc perlrun, which makes Perl behave like sed -n.
Perl Cookbook, 6.8 for a detailed discussion of extracting a range of lines.

Here is another approach
Include both patterns (default)
$ awk '/PAT1/,/PAT2/' file
PAT1
3 - first block
4
PAT2
PAT1
7 - second block
PAT2
PAT1
10 - third block
Mask both patterns
$ awk '/PAT1/,/PAT2/{if(/PAT2|PAT1/) next; print}' file
3 - first block
4
7 - second block
10 - third block
Mask start pattern
$ awk '/PAT1/,/PAT2/{if(/PAT1/) next; print}' file
3 - first block
4
PAT2
7 - second block
PAT2
10 - third block
Mask end pattern
$ awk '/PAT1/,/PAT2/{if(/PAT2/) next; print}' file
PAT1
3 - first block
4
PAT1
7 - second block
PAT1
10 - third block

Alternatively:
sed '/START/,/END/!d;//d'
This deletes all lines except those between and including START and END; the //d then deletes the START and END lines themselves, since the empty regex // makes sed reuse the most recently used regex.
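A small sketch of that command run against the sample input from the question, assuming PAT1 and PAT2 stand in for START and END. The script is split over two -e options here purely for readability.

```shell
# Rebuild the sample input from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' 1 2 PAT1 '3 - first block' 4 PAT2 5 6 \
  PAT1 '7 - second block' PAT2 8 9 PAT1 '10 - third block' > "$sample"

# First command: delete everything outside the PAT1..PAT2 ranges.
# Second command: // reuses whichever marker regex was tested last,
# so the marker lines themselves are deleted as well.
between=$(sed -e '/PAT1/,/PAT2/!d' -e '//d' "$sample")

printf '%s\n' "$between"
rm -f "$sample"
```

Like the other range-based commands, this also prints the unterminated third block.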

You can do what you want with sed by suppressing the normal printing of pattern space with -n. For instance to include the patterns in the result you can do:
$ sed -n '/PAT1/,/PAT2/p' filename
PAT1
3 - first block
4
PAT2
PAT1
7 - second block
PAT2
PAT1
10 - third block
To exclude the patterns and just print what is between them:
$ sed -n '/PAT1/,/PAT2/{/PAT1/{n};/PAT2/{d};p}' filename
3 - first block
4
7 - second block
10 - third block
Which breaks down as
sed -n '/PAT1/,/PAT2/ - locate the range between PAT1 and PAT2 and suppress printing;
/PAT1/{n}; - if the line matches PAT1, replace it with the next line of input (n); nothing is printed at this step because of -n;
/PAT2/{d}; - if it matches PAT2 delete line;
p - print all lines that fell within /PAT1/,/PAT2/ and were not skipped or deleted.

This is a footnote to the two top answers above (awk and sed). I needed to run this on a large number of files, so performance mattered. I load-tested both answers by running each one 10000 times:
sedTester.sh
for i in `seq 10000`;do sed -n '/PAT1/,/PAT2/{/PAT1/!{/PAT2/!p;};}' patternTester >> sedTesterOutput; done
awkTester.sh
for i in `seq 10000`;do awk '/PAT1/{flag=1; next} /PAT2/{flag=0} flag' patternTester >> awkTesterOutput; done
Here are the results:
zsh sedTester.sh 11.89s user 39.63s system 81% cpu 1:02.96 total
zsh awkTester.sh 38.73s user 60.64s system 79% cpu 2:04.83 total
The sed solution seems to be about twice as fast as the awk solution (on Mac OS).

This might work for you (GNU sed) on the proviso that PAT1 and PAT2 are on separate lines:
sed -n '/PAT1/{:a;N;/PAT2/!ba;p}' file
Turn off implicit printing by using the -n option and act like grep.
N.B. All solutions using the range idiom i.e. /PAT1/,/PAT2/ command suffer from the same edge case, where PAT1 exists but PAT2 does not and therefore will print from PAT1 to the end of the file.
For completeness:
# PAT1 to PAT2 without PAT1
sed -n '/PAT1/{:a;N;/PAT2/!ba;s/^[^\n]*\n//p}' file
# PAT1 to PAT2 without PAT2
sed -n '/PAT1/{:a;N;/PAT2/!ba;s/\n[^\n]*$//p}' file
# PAT1 to PAT2 without PAT1 and PAT2
sed -n '/PAT1/{:a;N;/PAT2/!ba;/\n.*\n/!d;s/^[^\n]*\n\|\n[^\n]*$//gp}' file
N.B. In the last solution PAT1 and PAT2 may be on consecutive lines and therefore a further edge case may arise. IMO both are deleted and nothing printed.

Related

Using sed to copy one line to another

I need to replace a line (4) with a copy of another line (6) in a range of files.
So far I know how to return a single line (although returns with carriage return?)...
sed -n '6p' *
also this doesn't work with * for files, only seems to return the first file.
And I can also replace a line with some chars...
sed -i '5s/.*/ 00/' *
But I cannot figure out how to do both together.
Edit: One step closer but now need to apply to multiple files (in the same folder). * reads in the first file only.
sed -i '4s/.*/sed -n '6p' file.nc/e' file.nc

This might work for you (GNU sed):
sed -Ei '4{:a;N;6!ba;s/^[^\n]*(\n.*\n(.*))/\2\1/}' file1 file2 file3 fileetc
Gather up lines between line 4 and 6 and then replace line 4 by line 6.
Alternative:
sed '4d;5,6{H;5h;6!d;G}' file
Delete line 4 (not needed).
In the range between lines 5 and 6, append a copy of line 6 to a copy of line 5 and append those copies to line 6.
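For comparison, here is a sketch of a different technique than the sed answers above: a two-pass awk command wrapped in a loop, so that each file in a folder is handled separately. The function name copy_line_6_to_4, the variable keep and the sample file names are made up for illustration, and the sketch assumes every file has at least six lines.

```shell
# Hypothetical helper: print file $1 with line 4 replaced by a copy of line 6.
# The file is read twice: pass 1 remembers line 6, pass 2 prints the result.
copy_line_6_to_4() {
  awk 'NR == FNR { if (FNR == 6) keep = $0; next }
       FNR == 4  { $0 = keep }
       { print }' "$1" "$1"
}

# Two small sample files standing in for the real .nc files.
dir=$(mktemp -d)
printf '%s\n' a1 a2 a3 a4 a5 a6 a7 > "$dir/one.nc"
printf '%s\n' b1 b2 b3 b4 b5 b6 > "$dir/two.nc"

# Rewrite each file through a temporary copy.
for f in "$dir"/*.nc; do
  copy_line_6_to_4 "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

result_one=$(cat "$dir/one.nc")
result_two=$(cat "$dir/two.nc")
printf '%s\n' "$result_one" "$result_two"
rm -rf "$dir"
```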

Replace the last digit for another digit

I would like to replace each digit 6 that is at the end of each text line below (file source.txt) by the digit 5 using bash.
File content source.txt:
17692186044416
36184372088832
70368744177664
140737488356328
281474976710666
562949963421312
1126899906842624
2251799813686248
4503699627370496
9007199264740992
18014398609481984
36028797018963968
72057694037927936
144115188075856872
I have attempted the command below:
sed 's/\(.*\)5/\16/' source.txt > target.txt
But target.txt is identical to source.txt, showing that nothing changed. I believe the syntax for this sed command should be different when the pattern is a number.
The expected content of target.txt should be as below:
17692186044415
36184372088832
70368744177664
140737488356328
281474976710665
562949963421312
1126899906842624
2251799813686248
4503699627370495
9007199264740992
18014398609481984
36028797018963968
72057694037927935
144115188075856872
I would like help understanding what is happening, and how I could use AWK or another tool other than sed.

Use this Perl one-liner:
perl -pe 's/6$/5/' source.txt > target.txt
Example:
echo '16' | perl -pe 's/6$/5/'
# Prints: 15
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
s/THIS/THAT/ : replace THIS with THAT.
$ : end of the line (in regex).
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes)
perldoc perlrequick: Perl regular expressions quick start

Replacing last 6 (not necessarily at the end) with 5:
sed 's/\(.*\)6/\15/' source.txt > target.txt
Replacing 6 located at the end with 5:
sed 's/6$/5/' source.txt > target.txt
EXPLANATION
--------------------------------------------------------------------------------
\( group and capture to \1:
--------------------------------------------------------------------------------
.* any character (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
\) end of \1
--------------------------------------------------------------------------------
6 '6'
--------------------------------------------------------------------------------
$ end of string
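To make the difference between the two commands concrete, here is a short sketch on two made-up inputs, 1662 and 1626. Since the question also asks about awk, an awk equivalent of the anchored version is included; it is an addition for illustration, not part of the original answer.

```shell
# Greedy .* runs up to the last 6 on the line, wherever that 6 sits.
last_anywhere=$(printf '%s\n' 1662 | sed 's/\(.*\)6/\15/')

# Anchored with $: only a 6 at the very end of the line is replaced.
only_at_end=$(printf '%s\n' 1662 1626 | sed 's/6$/5/')

# The same anchored replacement written in awk.
with_awk=$(printf '%s\n' 1662 1626 | awk '{ sub(/6$/, "5"); print }')

printf '%s\n' "$last_anywhere"   # 1652
printf '%s\n' "$only_at_end"     # 1662 then 1625
printf '%s\n' "$with_awk"        # 1662 then 1625
```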

How to find only the first and last line of a file using sed

I have a file called error_log for the apache and I want to see the first line and the last line of this file using sed command. Would you please help me how can I do that?
I know how to do that with head and tail commands, but I'm curious if it's possible in sed command too.
I have read the man sed and have googled a lot but nothing is found unfortunately.

This might work for you (GNU sed):
sed '1b;$b;d' file
All sed commands can be prefixed by an address. An address is either a line number, the $ which represents the last line, or a regexp. If no address is present, the command that follows applies to every line that reaches it.
The normal sed cycle, presents each line of input (less its newline) in the pattern space. The sed commands are then applied and the final act of the cycle is to re-attach the newline and print the result.
The b command controls command flow; if by itself it jumps out of the following sed commands to the final act of the cycle i.e. where the newline is re-attached and the result printed.
The d command deletes the pattern space and since there is nothing to be printed no further processing is executed (including re-attaching the newline and printing the result).
Thus the solution above prints the first line and the last and deletes the rest.
Sed has some command line options, one of which, -n, turns off the implicit printing of the pattern space. The p command prints the current state of the pattern space. Thus the dual of the above solution is:
sed -n '1p;$p' file
N.B. If the input file is only one line the first solution will only print one line whereas the second solution will print the same line twice. Also if more than one file is input both solutions will print the first line of the first file and last line of the last file unless the -i option is in place, in which case each file will be amended. The -s option replicates this without amending each file but streams the results to stdout as if each file is treated separately.
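The two edge cases in the note above can be sketched as follows. This assumes a plain shell loop as a stand-in for the GNU-only -s option, and spells the first solution with separate -e options so that it does not rely on GNU label parsing. The file names are made up for illustration.

```shell
dir=$(mktemp -d)
printf '%s\n' 3 4 5 6 7 > "$dir/file1"
printf '%s\n' 12 13 14 > "$dir/file2"
printf '%s\n' foo > "$dir/single"

# One sed run per file, so 1 and $ refer to each file separately.
per_file=$(for f in "$dir/file1" "$dir/file2"; do sed -n '1p;$p' "$f"; done)

# Single-line input: the branch version prints the line once,
# the -n version prints it twice (once for 1p, once for $p).
once=$(sed -e 1b -e '$b' -e d "$dir/single")
twice=$(sed -n '1p;$p' "$dir/single")

printf '%s\n' "$per_file" "$once" "$twice"
rm -rf "$dir"
```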

This will work:
sed -n '1p ; $p' error_log
1p will print the first line and $p will print the last line.
As a suggestion, take a look at info sed, not only man sed. You can find some examples related to your question in paragraph 2.1.

First line:
sed '2,$d' error_log
Last line:
sed '$!d' error_log

Based on your new requirement to output nothing if the input file is just 1 line (see How to find only the first and last line of a file using sed):
awk 'NR==1{first=$0} {last=$0} END{if (NR>1) print first ORS last}'
Original answer:
This is one of those things that you can, at face value, do easily enough in sed:
$ seq 3 7
3
4
5
6
7
$ seq 3 7 | sed -n '1p; $p'
3
7
but then how to handle edge cases like one line of input is non-obvious, e.g. is this REALLY the correct output:
$ printf 'foo\n' | sed -n '1p; $p'
foo
foo
or is the correct output just:
foo
and if the latter, how do you tweak that sed command to produce that output? @potong suggested a GNU sed command:
$ printf 'foo\n' | sed '1b;$b;d'
foo
which works but may be GNU-only (idk) and more importantly doesn't look much like the command we started with so the tiniest change in requirements meant a complete rewrite using different constructs.
Now, how about if you want to enhance it to, say, only print the first and last line if the file contained foo? I expect that'd be another challenging exercise with sed and probably involve non-portable constructs too.
It's just all pointless to learn how to do this with sed when you can use a different tool like awk and do whatever you like in a simple, consistent, portable syntax:
$ seq 3 7 |
awk 'NR==1{first=$0} {last=$0} END{print first ORS last}'
3
7
$ printf 'foo\n' |
awk 'NR==1{first=$0} {last=$0} END{print first ORS last}'
foo
foo
$ printf 'foo\n' |
awk 'NR==1{first=$0} {last=$0} END{print first (NR>1 ? ORS last : "")}'
foo
$ printf '3\nfoo\n7\n' |
awk 'NR==1{first=$0} /foo/{f=1} {last=$0} END{if (f) print first (NR>1 ? ORS last : "")}'
3
7
$ printf '3\nbar\n7\n' |
awk 'NR==1{first=$0} /foo/{f=1} {last=$0} END{if (f) print first (NR>1 ? ORS last : "")}'
$
Notice that:
Every command looks like every other command.
A minor change in requirements leads to a minor change in the code, not a complete rewrite.
Once you learn how to do any given thing A, how to do similar things B, C, D, etc. just builds on top of the syntax you already used, you don't have to learn a completely different syntax.
Each of those commands will work using any awk in any shell on every UNIX box.
Now, how about if you want to do that for multiple files such as would be created by the following commands?
$ seq 3 7 > file1
$ seq 12 25 > file2
With awk you can just store the lines in an array for printing in the END:
$ awk 'FNR==1{first[++cnt]=$0} {last[cnt]=$0}
END{for (i=1;i<=cnt;i++) print first[i] ORS last[i]}' file1 file2
3
7
12
25
or with GNU awk you can print them from ENDFILE:
$ awk 'FNR==1{first=$0} {last=$0} ENDFILE{print first ORS last}' file1 file2
3
7
12
25
With sed? An exercise left for the reader.

Delete lines by pattern in specific range of lines

I want to remove lines from file by regex pattern using sed just like in this question Delete lines in a text file that containing a specific string, but only inside a range of lines (not in the whole file). I want to do it starting from some line number till the end of file.
This is how I've done it in combination with tail:
tail -n +731 file|sed '/some_pattern/d' >> file
manually remove edited range in file from previous step
Is there a shorter way to do it with sed only?
Something like sed -i '731,1000/some_pattern/d' file?

You can use this sed,
sed -i.bak '731,1000{/some_pattern/d}' yourfile
Test:
$ cat a
1
2
3
13
23
4
5
$ sed '2,4{/3/d}' a
1
2
23
4
5

You need $ address to match end of file. With GNU sed:
sed -i '731,${/some_pattern/d;}' file
Note that this can be slower than tail -n +number, because sed will start processing at start of file instead of doing lseek() like tail.
(With BSD sed you need sed -i '' ...)
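If the head of the file should be left unread by sed, one possible sketch is to combine head and tail as in the question and write the result through a temporary file. The line number 3 and the pattern drop_me are stand-ins for 731 and some_pattern, and whether this is actually faster on a given system is not verified here.

```shell
file=$(mktemp)
printf '%s\n' keep1 drop_me keep2 drop_me keep3 > "$file"
start=3   # stand-in for 731

# Lines before $start are copied as they are; sed only sees the tail.
two_step=$({ head -n "$((start - 1))" "$file"
             tail -n "+$start" "$file" | sed '/drop_me/d'; })

# The sed-only form from this answer, for comparison.
sed_only=$(sed "$start"',${/drop_me/d;}' "$file")

# Replace the original through a temporary file, never with >> on itself.
tmp=$(mktemp)
printf '%s\n' "$two_step" > "$tmp" && mv "$tmp" "$file"
result=$(cat "$file")
printf '%s\n' "$result"
rm -f "$file"
```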

sed is for simple substitutions on individual lines, that is all. For anything even marginally more interesting an awk solution will be clearer, more robust, portable, maintainable, extensible and better in just about every other desirable attribute of software.
Given this sample input file:
$ cat file
1
2
3
4
1
2
3
4
1
2
3
4
The following script will print every line except a line containing the number 3 that occurs after the 6th line of the input file:
$ awk '!(NR>6 && /3/)' file
1
2
3
4
1
2
4
1
2
4
Want to only do the deletion between lines 6 and 10? No problem:
$ awk '!(NR>6 && NR<10 && /3/)' file
1
2
3
4
1
2
4
1
2
3
4
Want the skipped lines written to a log file? No problem:
awk 'NR>6 && /3/{print > "log";next} {print}' file
Written to stderr?
awk 'NR>6 && /3/{print | "cat>&2";next} {print}' file
Want a count of how many lines you deleted also written to stderr?
awk 'NR>6 && /3/{print | "cat>&2"; cnt++; next} {print} END{print cnt | "cat>&2"}' file
ANYTHING you want to do additionally or differently will be easy and build on what you start with. Try doing any of the above, or just about anything else, with a sed script that satisfies your original requirement.

awk to the rescue!
awk '!(NR>=731 && /pattern/)' input > output

How do I remove selected endlines with sed?

I'm trying to remove endlines for all lines in my file where the endline splits two equal signs
ie:
1
a=
=b
2
to
1
a==b
2
I have
sed -i.bak -e 's/=\n =//g' fileName
however, it doesn't seem to make any changes to my file. Is my script correct?

Try this. It reads the whole file content into pattern space and then removes all newline characters between equal signs.
sed -i.bak -e ':a ; $! { N; b a }; s/=\n=/==/g' fileName
It yields:
1
a==b
2

This might work for you (GNU sed):
sed '$!N;s/=\n=/==/;P;D' file
or
sed -e '$!N' -e 's/='$"\n"'=/==/' -e 'P' -e 'D' file

Different seds on different OSs treat newlines in different ways. The most portable way to specify a newline in sed is to use backslash before a return:
sed -e 's/=\
=//g' file
BUT that's not going to work for you until you invoke some other magic sed characters to slurp up multiple lines into a buffer, etc....
Just use awk:
$ cat file
1
a=
=b
2
$ awk '{printf "%s%s", $0, (/=$/ ? "" : "\n")}' file
1
a==b
2
Just prints the current line followed by nothing if the current line ends in an "=" or a newline otherwise. Couldn't be simpler and it's highly portable....
Oh, and if you want to change your original file, that's just:
awk '{printf "%s%s", $0, (/=$/ ? "" : "\n")}' file > tmp && mv tmp file
