How does this sed command: "sed -e :a -e '$d;N;2,10ba' -e 'P;D' " work? - sed

I saw a sed command to delete the last 10 rows of data:
sed -e :a -e '$d;N;2,10ba' -e 'P;D'
But I don't understand how it works. Can someone explain it for me?
UPDATE:
Here is my understanding of this command:
The first script indicates that a label “a” is defined.
The second script indicates that it first determines whether the
line currently reading pattern space is the last line. If it is,
execute the "d" command to delete it and restart the next cycle; if
not, skip the "d" command; then execute "N" command: append a new
line from the input file to the pattern space, and then execute
"2,10ba": if the line currently reading the pattern space is a line
in the 2nd to 10th lines, jump to label "a".
The third script indicates that if the line currently read into
pattern space is not a line from line 2 to line 10, first execute "P" command: the first line
in pattern space is printed, and then execute "D" command: the first line in pattern
space is deleted.
My understanding of "$d" is that "d" will be executed when sed reads the last line into the pattern space. But it seems that every time "ba" is executed, "d" will be executed, regardless of Whether the current line read into pattern space is the last line. why?

:a is a label. $ in the address means the last line, d means delete. N stands for append the next line into the pattern space. 2,10 means lines 2 to 10, b means branch (i.e. goto), P prints the first line from the pattern space, D is like d but operates on the pattern space if possible.
In other words, you create a sliding window of the size 10. Each line is stored into it, and once it has 10 lines, lines start to get printed from the top of it. Every time a line is printed, the current line is stored in the sliding window at the bottom. When the last line gets printed, the sliding window is deleted, which removes the last 10 lines.
You can modify the commands to see what's getting deleted (()), stored (<>), and printed by the P ([]):
$ printf '%s\n' {1..20} | \
sed -e ':a ${s/^/(/;s/$/)/;p;d};s/^/</;s/$/>/;N;2,10ba;s/^/[/;s/$/]/;P;D'
[<<<<<<<<<<1>
[<2>
[<3>
[<4>
[<5>
[<6>
[<7>
[<8>
[<9>
[<10>
(11]>
12]>
13]>
14]>
15]>
16]>
17]>
18]>
19]>
20])

a simpler resort, if your data in 'd' file by gnu sed,
sed -Ez 's/(.*\n)(.*\n){10}$/\1/' d
^
pointed 10 is number of last line to remove
just move the brace group to invert, ie. to get only the last 10 lines
sed -Ez 's/.*\n((.*\n){10})$/\1/' d

Related

Sed: find, replace and then append result to original line

I am on Mac, I want to find a pattern in lines, replace it with something, then append the resulting string to the end of the original line. Here is what I tried:
echo "test='123'" | sed -E '/([^a-z])/ s/$/ \1/'
sed: 1: "/([^a-z])/ s/$/ \1/": \1 not defined in the RE
What do I need to define \1? I thought I did it with ([^a-z]). No?
Edit: Perhaps this code will represent better what I want:
1) echo "test='123'" | sed 's/[a-zA-Z0-9]//g'
2) I want the new line = original line + line #1 above
In other words:
Before (what I get): test='123'
After (what I want): test='123' =''
You can edit this command this way:
echo "test='123'" | sed -E 'h;s/([a-zA-Z0-9])//g;G;s/(.*)\n(.*)/\2\1/'
For readability, the script, line by line, reads
h
s/([a-zA-Z0-9])//g
G
s/(.*)\n(.*)/\2\1/
h stores the current line in the hold space,
your s command does what it does
G appends the content of the hold space, i.e. the original line, to the pattern space, i.e. the current line as you have edited it, putting a newline \n in between.
another s command reorders the two pieces, also removing the \n that the G command inserted.
Comments
Your original attempt sed -E '/([^a-z])/ s/$/ \1/' could not work because \1 refers to what is captured by the leftmost (…) group in the search portion of the s command, it does not "remember" the group(s) you used to address the line.
Once you print the pattern space with p, a newline comes with it, and once it's been printed, there's no way you can remove it within the same sed program.

how to understand dollar sign ($) in sed script programming?

everybody.
I don't understand dollar sign ($) in sed script programming, it is stand for last line of a file or a counter of sed?
I want to reverse order of lines (emulates "tac") of /etc/passwd. like following:
$ cat /etc/passwd | wc -l ----> 52 // line numbers
$ sed '1!G;h;$!d' /etc/passwd | wc -l ----> 52 // working correctly
$ sed '1!G;h;$d' /etc/passwd | wc -l ----> 1326 // no ! followed by $
$ sed '1!G;h;$p' /etc/passwd | wc -l ----> 1430 // instead !d by p
Last two example don't work right, who can tell me what mean does dollar sign stand for?
All the commands "work right." They just do something you don't expect. Let's consider the first version:
sed '1!G;h;$!d
Start with the first two commands:
1!G; h
After these two commands have been executed, the pattern space and the hold space both contain all the lines reads so far but in reverse order.
At this point, if we do nothing, sed would take its default action which is to print the pattern space. So:
After the first line is read, it would print the first line.
After the second line is read, it would print the second line followed by the first line.
After the third line is read, it would print the third line, followed by the second line, followed by the first line.
And so on.
If we are emulating tac, we don't want that. We want it to print only after it has read in the last line. So, that is where the following command comes in:
$!d
$ means the last line. $! means not-the-last-line. $!d means delete if we are not on the last line. Thus, this tells sed to delete the pattern space unless we are on the last line, in which case it will be printed, displaying all lines in reverse order.
With that in mind, consider your second example:
sed '1!G;h;$d'
This prints all the partial tacs except the last one.
Your third example:
sed '1!G;h;$p'
This prints all the partial tacs up through the last one but the last one is printed twice: $p is an explicit print of the pattern space for the last line in addition to the implicit print that would happen anyway.

sed: joining lines depending on the second one

I have a file that, occasionally, has split lines. The split is signaled by the fact that the line starts with '+' (possibly preceeded by spaces).
line 1
line 2
+ continue 2
line 3
...
I'd like join the split line back:
line 1
line 2 continue 2
line 3
...
using sed. I'm not clear how to join a line with the preceeding one.
Any suggestion?
This might work for you:
sed 'N;s/\n\s*+//;P;D' file
These are actually four commands:
N
Append line from the input file to the pattern space
s/\n\s*+//
Remove newline, following whitespace and the plus
P
print line from the pattern space until the first newline
D
delete line from the pattern space until the first newline, e.g. the part which was just printed
The relevant manual page parts are
Selecting lines by numbers
Addresses overview
Multiline techniques - using D,G,H,N,P to process multiple lines
Doing this in sed is certainly a good exercise, but it's pretty trivial in perl:
perl -0777 -pe 's/\n\s*\+//g' input
I'm not partial to sed so this was a nice challenge for me.
sed -n '1{h;n};/^ *+ */{s// /;H;n};{x;s/\n//g;p};${x;p}'
In awk this is approximately:
awk '
NR == 1 {hold = $0; next}
/^ *\+/ {$1 = ""; hold=hold $0; next}
{print hold; hold = $0}
END {if (hold) print hold}
'
If the last line is a "+" line, the sed version will print a trailing blank line. Couldn't figure out how to suppress it.
You can use Vim in Ex mode:
ex -sc g/+/-j -cx file
g global search
- select previous line
j join with next line
x save and close
Different use of hold space with POSIX sed... to load the entire file into the hold space before merging lines.
sed -n '1x;1!H;${g;s/\n\s*+//g;p}'
1x on the first line, swap the line into the empty hold space
1!H on non-first lines, append to the hold space
$ on the last line:
g get the hold space (the entire file)
s/\n\s*+//g replace newlines preceeding +
p print everything
Input:
line 1
line 2
+ continue 2
+ continue 2 even more
line 3
+ continued
becomes
line 1
line 2 continue 2 continue 2 even more
line 3 continued
This (or potong's answer) might be more interesting than a sed -z implementation if other commands were desired for other manipulations of the data you can simply stick them in before 1!H, while sed -z is immediately loading the entire file into the pattern space. That means you aren't manipulating single lines at any point. Same for perl -0777.
In other words, if you want to also eliminate comment lines starting with *, add in /^\s*\*/d to delete the line
sed -n '1x;/^\s*\*/d;1!H;${g;s/\n\s*+//g;p}'
versus:
sed -z 's/\n\s*+//g;s/\n\s*\*[^\n]*\n/\n/g'
The former's accumulation in the hold space line by line keeps you in classic sed line processing land, while the latter's sed -z dumps you into what could be some painful substring regexes.
But that's sort of an edge case, and you could always just pipe sed -z back into sed. So +1 for that.
Footnote for internet searches: This is SPICE netlist syntax.
A solution for versions of sed that can read NUL separated data, like here GNU Sed's -z:
sed -z 's/\n\s*+//g'
Compared to potong's solution this has the advantage of being able to join multiple lines that start with +. For example:
line 1
line 2
+ continue 2
+ continue 2 even more
line 3
becomes
line 1
line 2 continue 2 continue 2 even more
line 3

What does the 'N' command do in sed?

It looks like the 'N' command works on every other line:
$ cat in.txt
a
b
c
d
$ sed '=;N' in.txt
1
a
b
3
c
d
Maybe that would be natural because command 'N' joins the next line and changes the current line number. But (I saw this here):
$ sed 'N;$!P;$!D;$d' thegeekstuff.txt
The above example deletes the last two lines of a file. This works not only for even-line-numbered files but also for odd-line-numbered files. In this example 'N' command runs on every line. What's the difference?
And could you tell me why I cannot see the last line when I run sed like this:
# sed N odd-lined-file.txt
Excerpt from info sed:
`sed' operates by performing the following cycle on each lines of
input: first, `sed' reads one line from the input stream, removes any
trailing newline, and places it in the pattern space. Then commands
are executed; each command can have an address associated to it:
addresses are a kind of condition code, and a command is only executed
if the condition is verified before the command is to be executed.
...
When the end of the script is reached, unless the `-n' option is in
use, the contents of pattern space are printed out to the output
stream,
...
Unless special commands (like 'D') are used, the pattern space is
deleted between two cycles
...
`N'
Add a newline to the pattern space, then append the next line of
input to the pattern space. If there is no more input then `sed'
exits without processing any more commands.
...
`D'
Delete text in the pattern space up to the first newline. If any
text is left, restart cycle with the resultant pattern space
(without reading a new line of input), otherwise start a normal
new cycle.
This should pretty much resolve your query. But still I will try to explain your three different cases:
CASE 1:
sed reads a line from input. [Now there is 1 line in pattern space.]
= Prints the current line no.
N reads the next line into pattern space.[Now there are 2 lines in pattern space.]
If there is no next line to read then sed exits here. [ie: In case of odd lines, sed exits here - and hence the last line is swallowed without printing.]
sed prints the pattern space and cleans it. [Pattern space is empty.]
If EOF reached sed exits here. Else Restart the complete cycle from step 1. [ie: In case of even lines, sed exits here.]
Summary: In this case sed reads 2 lines and prints 2 lines at a time. Last line is swallowed it there are odd lines (see step 3).
CASE 2:
sed reads a line from input. [Now there is 1 line in pattern space.]
N reads the next line into pattern space. [Now there are 2 lines in pattern space.]
If it fails exit here. This occurs only if there is 1 line.
If its not last line($!) print the first line(P) from pattern space. [The first line from pattern space is printed. But still there are 2 lines in pattern space.]
If its not last line($!) delete the first line(D) from pattern space [Now there is only 1 line (the second one) in the pattern space.] and restart the command cycle from step 2. And its because of the command D (see the excerpt above).
If its last line($) then delete(d) the complete pattern space. [ie. reached EOF ] [Before beginning this step there were 2 lines in the pattern space which are now cleaned up by d - at the end of this step, the pattern space is empty.]
sed automatically stops at EOF.
Summary: In this case :
sed reads 2 lines first.
if there is next line available to read, print the first line and read the next line.
else delete both lines from cache. This way it always deletes the last 2 line.
CASE 3:
Its the same case as CASE:1, just remove the Step 2 from it.

SED: Operate on Last seven lines regardless of file length

I would like to operate on the last 7 lines of a file with sed regardless of the filelength.
According to a related question this type of range won't work: $-6,$ {..commands..}
What is the equivalent that will?
Pipe the output of tail -7 into sed.
tail -7 test.txt | sed -e "s/e/WWW/"
More info on Pipes here.
You could just switch from sed(1) to ed(1), the commands are about the same. In this case, the command is the same, except with no limitations on address range.
$ cat > fl7.ed
ed - $1 << \eof
1,7s/$/ (one of the first seven lines)/
$-6,$s/$/ (one of the last seven lines)/
w
q
eof
$ sh fl7.ed yourfile
perl -lne 'END{print join$\,#a,"-",#b}push#a,$_ if#a<6;push#b,$_;shift#b if#b>7'
In the END{} block you can do whatever is required; #a contains the first 6, #b the last 7 lines as requested.
This should work for you:
sed '1{N;N;N;N;N};N;$s/foo/bar/g;P;D' inputfile
Explanation:
1{N;N;N;N;N} - when the first line is read, load the pattern space with five more lines (total: 6 at this point)
N - append another line
$s/foo/bar/g - when the last line is read, perform some operation on the entire contents of pattern space (the last seven lines of the file). Operations can be more complex than shown here
P - print the test before the first newline in pattern space
D - delete the text just printed and loop to the beginning of the script (the "append another line" step - the first instruction is skipped since it only applies to the first line in the file)
This might work for you:
sed ':a;1,6{$!N;ba};${s/foo/bar/g;q};N;D' file
Explanation:
Create a loop label. :a
Gather lines 1 to 6 in the pattern space (PS). 1,6{$!N;ba}
If it's the last line, process the PS and quit, therefore printing out the last seven lines. ${s/foo/bar/g;q}
If it's not the last line, append the next line to the PS. N
Delete upto the first newline and begin a new cycle without reading a new line. D