SED: Operate on Last seven lines regardless of file length - sed

I would like to operate on the last 7 lines of a file with sed regardless of the filelength.
According to a related question this type of range won't work: $-6,$ {..commands..}
What is the equivalent that will?

Pipe the output of tail -7 into sed.
tail -7 test.txt | sed -e "s/e/WWW/"
More info on Pipes here.

You could just switch from sed(1) to ed(1), the commands are about the same. In this case, the command is the same, except with no limitations on address range.
$ cat > fl7.ed
ed - $1 << \eof
1,7s/$/ (one of the first seven lines)/
$-6,$s/$/ (one of the last seven lines)/
w
q
eof
$ sh fl7.ed yourfile

perl -lne 'END{print join$\,#a,"-",#b}push#a,$_ if#a<6;push#b,$_;shift#b if#b>7'
In the END{} block you can do whatever is required; #a contains the first 6, #b the last 7 lines as requested.

This should work for you:
sed '1{N;N;N;N;N};N;$s/foo/bar/g;P;D' inputfile
Explanation:
1{N;N;N;N;N} - when the first line is read, load the pattern space with five more lines (total: 6 at this point)
N - append another line
$s/foo/bar/g - when the last line is read, perform some operation on the entire contents of pattern space (the last seven lines of the file). Operations can be more complex than shown here
P - print the test before the first newline in pattern space
D - delete the text just printed and loop to the beginning of the script (the "append another line" step - the first instruction is skipped since it only applies to the first line in the file)

This might work for you:
sed ':a;1,6{$!N;ba};${s/foo/bar/g;q};N;D' file
Explanation:
Create a loop label. :a
Gather lines 1 to 6 in the pattern space (PS). 1,6{$!N;ba}
If it's the last line, process the PS and quit, therefore printing out the last seven lines. ${s/foo/bar/g;q}
If it's not the last line, append the next line to the PS. N
Delete upto the first newline and begin a new cycle without reading a new line. D

Related

I want to print the last line of group using sed

I have file which is shown below
Section1
George, 1998-1995
Peter, 1999-1990
Simon, 1988-1960
Section2
Gery, 2019-2015
John, 1984-1983
Thomson, 1978-1965
When i give Section1 Expected output is
Simon, 1988-1960
Like this i have lots of sections. I want to achieve this with sed not using awk.
I tried like this . But it has the line number hard coding. And also it is printing the complete range
sed -n '/Section1/,4{p}'
Here i could able to remove the hardcoding. But unable to print the last line. And also next section name also coming.
sed -n '/Section1/ , /Section./{p}'
This might work for you (GNU sed):
sed '$b;N;/\nSection/P;D' file
Make a moving window of two lines and print the first line if the second line is begins Section and always the last line.
For the last line of a specific section use:
sed -n '/^Section1/{:a;h;$!{n;/^\S/!ba};x;s/^\s*//p}' file
A gnu awk solution.
awk -v RS='Section' '$1=="1" {print $(NF-1),$NF}' file
Simon, 1988-1960
By setting Record Selector to Section, awk works in block. Then print the second latest and the latest field of block matching 1, since Section is stripped of.
You may consider using
sed -n '/^Section1$/,/^Section[0-9]*$/{:a;h;n;/^Section[0-9]*$/!ba;x;s/^[ \t]*//;p}' file > newfile
See the online demo.
Details
-n - the switch suppresses default line output mode
/^Section1$/,/^Section[0-9]*$/ - a block of lines between a line that is equal to Section1 and a line that fully matches a Section and any 0 or more digits pattern (the next {...} group of commands relates to the range matched with this)
:a - sets a label named a
h - copies the current line into hold buffer
n - discards the current pattern space value and reads the next line into it
/^Section[0-9]*$/!ba - if the pattern space value does not match the end block line go back to label a
x - else, once we get to the last line, the previous one is in hold space, so x is used to swap hold and pattern space
s/^[ \t]*// - remove initial whitespace
p - print the pattern space.
Regex:
(Section1)((\n.*,.*)*\n\s*)(?'lastLine'.*)
Test here.
I did not understand exactly what you want to do with the result, so I cannot tell you the exact sed command.

How does this sed command: "sed -e :a -e '$d;N;2,10ba' -e 'P;D' " work?

I saw a sed command to delete the last 10 rows of data:
sed -e :a -e '$d;N;2,10ba' -e 'P;D'
But I don't understand how it works. Can someone explain it for me?
UPDATE:
Here is my understanding of this command:
The first script indicates that a label “a” is defined.
The second script indicates that it first determines whether the
line currently reading pattern space is the last line. If it is,
execute the "d" command to delete it and restart the next cycle; if
not, skip the "d" command; then execute "N" command: append a new
line from the input file to the pattern space, and then execute
"2,10ba": if the line currently reading the pattern space is a line
in the 2nd to 10th lines, jump to label "a".
The third script indicates that if the line currently read into
pattern space is not a line from line 2 to line 10, first execute "P" command: the first line
in pattern space is printed, and then execute "D" command: the first line in pattern
space is deleted.
My understanding of "$d" is that "d" will be executed when sed reads the last line into the pattern space. But it seems that every time "ba" is executed, "d" will be executed, regardless of Whether the current line read into pattern space is the last line. why?
:a is a label. $ in the address means the last line, d means delete. N stands for append the next line into the pattern space. 2,10 means lines 2 to 10, b means branch (i.e. goto), P prints the first line from the pattern space, D is like d but operates on the pattern space if possible.
In other words, you create a sliding window of the size 10. Each line is stored into it, and once it has 10 lines, lines start to get printed from the top of it. Every time a line is printed, the current line is stored in the sliding window at the bottom. When the last line gets printed, the sliding window is deleted, which removes the last 10 lines.
You can modify the commands to see what's getting deleted (()), stored (<>), and printed by the P ([]):
$ printf '%s\n' {1..20} | \
sed -e ':a ${s/^/(/;s/$/)/;p;d};s/^/</;s/$/>/;N;2,10ba;s/^/[/;s/$/]/;P;D'
[<<<<<<<<<<1>
[<2>
[<3>
[<4>
[<5>
[<6>
[<7>
[<8>
[<9>
[<10>
(11]>
12]>
13]>
14]>
15]>
16]>
17]>
18]>
19]>
20])
a simpler resort, if your data in 'd' file by gnu sed,
sed -Ez 's/(.*\n)(.*\n){10}$/\1/' d
^
pointed 10 is number of last line to remove
just move the brace group to invert, ie. to get only the last 10 lines
sed -Ez 's/.*\n((.*\n){10})$/\1/' d

Join current and next line, then the next line and its successor using sed

Given the input:
1234
5678
9abc
defg
hijk
I'd like the output:
12345678
56789abc
9abcdefg
defghijk
There are lots of examples using sed(1) to joining a pair of lines, then the next pair after that pair and so on. But I haven't found an example that joins lines 1 with 2, 2 with 3, 3 with 4, ...
sed(1) solution preferred. Other options are less interesting - e.g., awk(1), python(1) and perl(1) implementations are fairly easy. I'm specifically stumped on a successful sed(1) incantation.
sed '1h;1d;x;G;s/\n//'
I guess it can be done some other way, but this works for me:
$ cat in
1234
5678
9abc
defg
hijk
$ sed '1h;1d;x;G;s/\n//' in
12345678
56789abc
9abcdefg
defghijk
How it works: we put first line to hold space and that's it for first line. Every line after the first - swap it with hold space, append the new hold space to the old hold space, remove newline.
This does it (now improved, thanks to potong's hint):
$ sed -n 'N;s/\n\(.*\)/\1&/;P;D' infile
12345678
56789abc
9abcdefg
defghijk
In detail:
N # Append next line to pattern space
s/\n\(.*\)/\1&/ # Make 111\n222 into 111222\n222
P # Print up to first newline
D # Delete up to first newline
The substitution makes these two lines
1111
2222
which in the pattern space look like 1111\n2222 into
11112222
2222
and the P and D print/delete the first line from the pattern space.
Notice that we never hit the bottom of the script (D starts a new loop) until the very last line, where N can't fetch a new line and would just print the last line on its own, if we didn't suppress that with -n.
Tweaking another answer (full credit to #aragaer) to handle single line input (and be more portable to bsd sed as well as gnu sed than the original version - update: that answer has been edited another way for portability):
% cat >> inputfile << eof
12
34
56
eof
% sed -e '1{$p;h;d' -e '}' -e 'x;G;s/\n//' inputfile # bsd + gnu sed [1]
1234
3456
or
% cat joinsuccessive.sed
1{
$p;h;d
}
x;G;s/\n//
% sed -f joinsuccessive.sed inputfile
1234
3456
Here's an annotated version.
1{ # special case for first line only:
$p # even MORE special case: print current line for input with
# only a single line
h # add line 1 to hold space (for joining with successive lines)
d # delete pattern space and move to next line (without printing)
}
x # for lines 2+, swap pattern space (current line) and hold space
G # add newline + hold space (now has current line) to pattern space
# (previous line) giving prev line, newline, curr line in pattern
# space (and curr line is in hold space)
s/\n// # remove newline added by G (between lines) before printing the
# pattern space
[1] bsd sed(1) wants a closing brace to be on a line by itself. Use -e to "build" the script or put the commands in a sed script file (and use -f joinsuccessive.sed).

sed: joining lines depending on the second one

I have a file that, occasionally, has split lines. The split is signaled by the fact that the line starts with '+' (possibly preceeded by spaces).
line 1
line 2
+ continue 2
line 3
...
I'd like join the split line back:
line 1
line 2 continue 2
line 3
...
using sed. I'm not clear how to join a line with the preceeding one.
Any suggestion?
This might work for you:
sed 'N;s/\n\s*+//;P;D' file
These are actually four commands:
N
Append line from the input file to the pattern space
s/\n\s*+//
Remove newline, following whitespace and the plus
P
print line from the pattern space until the first newline
D
delete line from the pattern space until the first newline, e.g. the part which was just printed
The relevant manual page parts are
Selecting lines by numbers
Addresses overview
Multiline techniques - using D,G,H,N,P to process multiple lines
Doing this in sed is certainly a good exercise, but it's pretty trivial in perl:
perl -0777 -pe 's/\n\s*\+//g' input
I'm not partial to sed so this was a nice challenge for me.
sed -n '1{h;n};/^ *+ */{s// /;H;n};{x;s/\n//g;p};${x;p}'
In awk this is approximately:
awk '
NR == 1 {hold = $0; next}
/^ *\+/ {$1 = ""; hold=hold $0; next}
{print hold; hold = $0}
END {if (hold) print hold}
'
If the last line is a "+" line, the sed version will print a trailing blank line. Couldn't figure out how to suppress it.
You can use Vim in Ex mode:
ex -sc g/+/-j -cx file
g global search
- select previous line
j join with next line
x save and close
Different use of hold space with POSIX sed... to load the entire file into the hold space before merging lines.
sed -n '1x;1!H;${g;s/\n\s*+//g;p}'
1x on the first line, swap the line into the empty hold space
1!H on non-first lines, append to the hold space
$ on the last line:
g get the hold space (the entire file)
s/\n\s*+//g replace newlines preceeding +
p print everything
Input:
line 1
line 2
+ continue 2
+ continue 2 even more
line 3
+ continued
becomes
line 1
line 2 continue 2 continue 2 even more
line 3 continued
This (or potong's answer) might be more interesting than a sed -z implementation if other commands were desired for other manipulations of the data you can simply stick them in before 1!H, while sed -z is immediately loading the entire file into the pattern space. That means you aren't manipulating single lines at any point. Same for perl -0777.
In other words, if you want to also eliminate comment lines starting with *, add in /^\s*\*/d to delete the line
sed -n '1x;/^\s*\*/d;1!H;${g;s/\n\s*+//g;p}'
versus:
sed -z 's/\n\s*+//g;s/\n\s*\*[^\n]*\n/\n/g'
The former's accumulation in the hold space line by line keeps you in classic sed line processing land, while the latter's sed -z dumps you into what could be some painful substring regexes.
But that's sort of an edge case, and you could always just pipe sed -z back into sed. So +1 for that.
Footnote for internet searches: This is SPICE netlist syntax.
A solution for versions of sed that can read NUL separated data, like here GNU Sed's -z:
sed -z 's/\n\s*+//g'
Compared to potong's solution this has the advantage of being able to join multiple lines that start with +. For example:
line 1
line 2
+ continue 2
+ continue 2 even more
line 3
becomes
line 1
line 2 continue 2 continue 2 even more
line 3

What does the 'N' command do in sed?

It looks like the 'N' command works on every other line:
$ cat in.txt
a
b
c
d
$ sed '=;N' in.txt
1
a
b
3
c
d
Maybe that would be natural because command 'N' joins the next line and changes the current line number. But (I saw this here):
$ sed 'N;$!P;$!D;$d' thegeekstuff.txt
The above example deletes the last two lines of a file. This works not only for even-line-numbered files but also for odd-line-numbered files. In this example 'N' command runs on every line. What's the difference?
And could you tell me why I cannot see the last line when I run sed like this:
# sed N odd-lined-file.txt
Excerpt from info sed:
`sed' operates by performing the following cycle on each lines of
input: first, `sed' reads one line from the input stream, removes any
trailing newline, and places it in the pattern space. Then commands
are executed; each command can have an address associated to it:
addresses are a kind of condition code, and a command is only executed
if the condition is verified before the command is to be executed.
...
When the end of the script is reached, unless the `-n' option is in
use, the contents of pattern space are printed out to the output
stream,
...
Unless special commands (like 'D') are used, the pattern space is
deleted between two cycles
...
`N'
Add a newline to the pattern space, then append the next line of
input to the pattern space. If there is no more input then `sed'
exits without processing any more commands.
...
`D'
Delete text in the pattern space up to the first newline. If any
text is left, restart cycle with the resultant pattern space
(without reading a new line of input), otherwise start a normal
new cycle.
This should pretty much resolve your query. But still I will try to explain your three different cases:
CASE 1:
sed reads a line from input. [Now there is 1 line in pattern space.]
= Prints the current line no.
N reads the next line into pattern space.[Now there are 2 lines in pattern space.]
If there is no next line to read then sed exits here. [ie: In case of odd lines, sed exits here - and hence the last line is swallowed without printing.]
sed prints the pattern space and cleans it. [Pattern space is empty.]
If EOF reached sed exits here. Else Restart the complete cycle from step 1. [ie: In case of even lines, sed exits here.]
Summary: In this case sed reads 2 lines and prints 2 lines at a time. Last line is swallowed it there are odd lines (see step 3).
CASE 2:
sed reads a line from input. [Now there is 1 line in pattern space.]
N reads the next line into pattern space. [Now there are 2 lines in pattern space.]
If it fails exit here. This occurs only if there is 1 line.
If its not last line($!) print the first line(P) from pattern space. [The first line from pattern space is printed. But still there are 2 lines in pattern space.]
If its not last line($!) delete the first line(D) from pattern space [Now there is only 1 line (the second one) in the pattern space.] and restart the command cycle from step 2. And its because of the command D (see the excerpt above).
If its last line($) then delete(d) the complete pattern space. [ie. reached EOF ] [Before beginning this step there were 2 lines in the pattern space which are now cleaned up by d - at the end of this step, the pattern space is empty.]
sed automatically stops at EOF.
Summary: In this case :
sed reads 2 lines first.
if there is next line available to read, print the first line and read the next line.
else delete both lines from cache. This way it always deletes the last 2 line.
CASE 3:
Its the same case as CASE:1, just remove the Step 2 from it.