Explain this sed conditional branching behavior - sed

I have the following (GNU) sed script, which is intended to parse another sed script and output each command on its own line.
In other words, this script should put a newline after each semicolon ;, except semicolons that are inside a match or substitution command.
Sed script:
#!/bin/sed -rf
# IDEA:
# replace ';' by ';\n' except when it's inside a match expression or subst. expression.
# Ignored patterns:
/^#/b # commented lines
/^$/b # empty lines
# anything in a single line, without semicolon except at the end
/^[^\n;]*;?$/b
# Processed patterns (put on separate lines):
# Any match preceding a semicolon, or the end of the line, or a substitution
s_/^[^/]+/[^;s]*;?_&\n_; t printtopline
s/^\\(.)[^\1]+\1[^;s]*;?/&\n/;t printtopline
# Any substitution (TODO)
# Any other command, separated by semicolon
s/\;/\;\n/; t printtopline;
:printtopline
P;D; # print top line, delete it, start new cycle
For example, I tested it with the following file (actually adapted from an answer by @ctac_ to one of my previous sed questions):
Input file:
#!/bin/sed -f
#/^>/N;
:A;
/\n>/!{s/\n/ /;N;bA}; # join next line if not a sequence label
#h;
#s/\(.*\)\n.*/\1/p;
s/^>//g;P
#x;
#s/.*\n//;
D
bA;
Output
The above script produces the right output, for example, the line /\n>/!{s/\n/ /;N;bA}; # join next line if not a sequence label becomes:
/\n>/!{s/\n/ /;
N;
bA};
# join next line if not a sequence label
Question
However, could you help me understand why this part of the script works:
s/\;/\;\n/; t printtopline;
:printtopline
?
It seems to me that the branching command t printtopline is useless here. I thought that, regardless of whether the substitution succeeds, the next thing to be executed would be :printtopline.
However, if I comment out the t command, or if I replace it with b, the script produces the following output lines:
/\n>/!{s/\n/ /;
N;bA}; # join next line if not a sequence label
From info sed, here is the explanation of t:
't LABEL'
Branch to LABEL only if there has been a successful 's'ubstitution
since the last input line was read or conditional branch was taken.
The LABEL may be omitted, in which case the next cycle is started.
Why doesn't the t command, immediately followed by its label, behave like no command at all or like the b command?

The crucial part is this:
Branch to label only if there has been a successful substitution since the last input line was read or conditional branch was taken.
I.e. t looks into the past and takes into account the success of all recent substitutions up to the most recent
input, or
conditional branch.
Consider the input line you're asking about. After all the substitutions we have
/\n>/!{s/\n/ /;
N;bA}; # join next line if not a sequence label
in our pattern space when we reach P;D;. The P command outputs the first line, then D deletes the first line and restarts the main loop. Now we just have
N;bA}; # join next line if not a sequence label
Note that this didn't involve reading any additional lines. No input occurred; D just removed parts of the pattern space.
We process the remaining text (which does nothing because none of the other patterns match) until we reach this part of the code:
s_/^[^/]+/[^;s]*;?_&\n_; t printtopline
The substitution fails (the pattern space doesn't contain /^). But the t command doesn't check the status of just this one s command; it looks at the history of all substitutions since the most recent input or conditional branch taken.
The most recent input occurred when /\n>/!{s/\n/ /;N;bA}; was read.
The most recent conditional branch taken was
s/\;/\;\n/; t printtopline;
:printtopline
in the original version of your code. Since then no other substitution succeeded, so the t command does nothing. The rest of the program continues as expected.
But in the modified version of your code there was no conditional branch at this point (b is an unconditional branch):
s/\;/\;\n/; b printtopline;
:printtopline
That means the t from s_/^[^/]+/[^;s]*;?_&\n_; t printtopline "sees" the s/\;/\;\n/; as having succeeded, so it immediately jumps to the P;D; part. This is what outputs
N;bA}; # join next line if not a sequence label
unmodified.
In summary: t makes a difference here not because of its immediate effect of jumping to a label, but because it serves as a dynamic delimiter for the next t that gets executed. Without t here, the previously executed s command is taken into account for the next t.
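A stripped-down reproduction may make this easier to see. This is only a sketch, not the original script: the two rules, the label out and the input a;b;c are made up for illustration, and it assumes GNU sed (for \n in the replacement and for the substitution flag surviving a D restart, as described above).

```shell
# Rule 1 never matches this input; rule 2 splits at the first semicolon.
# Variant 1: rule 2 ends with the conditional branch "t out".
with_t() {
  sed -e 's/^#.*//; t out' \
      -e 's/;/;\n/; t out' \
      -e ':out' \
      -e 'P;D'
}

# Variant 2: identical, except that rule 2 ends with the unconditional "b out".
with_b() {
  sed -e 's/^#.*//; t out' \
      -e 's/;/;\n/; b out' \
      -e ':out' \
      -e 'P;D'
}

printf 'a;b;c\n' | with_t   # GNU sed: a; / b; / c on three lines
printf 'a;b;c\n' | with_b   # GNU sed: a; then b;c, which is left unsplit
```

In the second variant nothing resets the flag set by rule 2, so after D restarts the script the t of rule 1 jumps straight to P;D and b;c is printed as is.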

Part 1 - how the P;D; sequence works.
Compare the outputs of these two commands: sed 's/;/;\n/' and sed 's/;/;\n/; P;D;'.
First:
$ sed 's/;/;\n/' <<< 'one;two;three;four'
one;
two;three;four
Second:
$ sed 's/;/;\n/; P;D;' <<< 'one;two;three;four'
one;
two;
three;
four
Why the difference? I will explain.
The first command substitutes only the first occurrence of the ; character. To substitute all occurrences, the g modifier should be added to the s command: sed 's/;/;\n/g'.
The second command works this way:
s/;/;\n/; - the same as the first command, no difference. Before this command the pattern space is one;two;three;four; afterwards it is one;\ntwo;three;four.
P; -
from man: "Print up to the first embedded newline of the current pattern space."
That is, it prints up to the first newline: one;. The pattern space stays unchanged: one;\ntwo;three;four
D; -
from man: "If pattern space contains no newline, start a normal new cycle as if the d command was
issued. Otherwise, delete text in the pattern space up to the first newline, and restart
cycle with the resultant pattern space, without reading a new line of input."
In our case, the pattern space has a newline: one;\ntwo;three;four. D; removes the one;\n part and restarts the command cycle from the beginning. Now the pattern space is two;three;four.
That is, s/;/;\n/; runs again - pattern space: two;\nthree;four, then P; prints two;, pattern space unchanged: two;\nthree;four, then D; removes two;\n and the pattern space becomes three;four. And so on.
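To watch the pattern space change from step to step, one option is to add the l command, which prints the pattern space unambiguously (an embedded newline shows up as \n and the end is marked with $). The following is a small sketch on a shortened, made-up input; the function name trace_pd is hypothetical, and GNU sed is assumed for \n in the replacement.

```shell
# -n turns off the automatic print, so only l and P produce output.
trace_pd() {
  sed -n 's/;/;\n/; l; P; D'
}

printf 'one;two\n' | trace_pd
# one;\ntwo$     <- l: pattern space after the substitution
# one;           <- P: text up to the first newline
# two$           <- l: pattern space after D restarted the script
# two            <- P: no newline left, so the whole pattern space
```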
Part 2 - what is happening with branching.
I looked at the sed source code and found the following:
When the s command executes and finds a match, the replaced flag is set to true:
/* We found a match, set the 'replaced' flag. */
replaced = true;
The t command takes its branch only if the replaced flag is true, and when it does, it resets the flag to false:
case 't':
if (replaced)
{
replaced = false;
So, in the first case, s/\;/\;\n/; t printtopline;, the substitution is successful, so the replaced flag is set to true. Then the following t command runs and resets the replaced flag to false.
In the second case, without the t command (s/\;/\;\n/;), the substitution is successful too, so the replaced flag is set to true.
But now this flag is carried over into the next cycle, which is started by the D command. So when the first t command appears in the new cycle (s_/^[^/]+/[^;s]*;?_&\n_; t printtopline), it checks the replaced flag, sees that it is true, and jumps to the label :printtopline, skipping all the other commands before the label.
The pattern space doesn't contain a newline, so the P;D; sequence just prints the pattern space and starts the next cycle with a new line of input.

Related

Sed command to delete "\" which causes "*** multiple target patterns. Stop." error

In a file, I have lines like this -
a.lo a.o: abc/util.c \
/usr/lib/def.h
b.lo b.o: hash/imp.h \
/usr/lib/toy.c \
c.lo c.o: high/scan.c \
high/scan_f.c
Here you can see one extra \ (backslash) at the end of line number 4 (/usr/lib/toy.c ). How can I use sed to remove this \ (backslash)? Because of it I'm getting the "*** multiple target patterns. Stop." error.
P.S. - I have this extra \ (backslash) at multiple places in my file, so using sed to delete it by line number won't be feasible. I need something that checks for .lo .o and then checks the line before; if it finds a \ (backslash) there, it should remove it.
Maybe not the simplest but this should work:
sed -nE '${s/\\$//;p;};N;s/\\([^\\]*:)/\1/;P;D' input_file
The main idea is to concatenate input lines in the pattern space (a sed internal text buffer), such that it always contains 2 consecutive lines, separated by a newline character. We then just delete the last \ before a :, if any, print the first of the 2 lines and remove it from the pattern space before continuing with the next line.
sed commands are separated by semicolons (;) and grouped with curly braces ({...}). They are optionally preceded by a line(s) specification, for instance $, which stands for the last line of the input. So, in our case, ${s/\\$//;p;} applies only to the last line while the rest (N;s/\\([^\\]*:)/\1/;P;D) applies to all lines.
The -n option suppresses the default output. We need this to control the output ourselves with the p (print) command.
The -E option enables the use of extended regular expressions.
Let's first explain the tricky part: N;s/\\([^\\]*:)/\1/;P;D. It is a list of 4 commands that are run for each line of the input because there is no line(s) specification before the commands.
When sed starts processing the input the pattern space already contains the first line (a.lo a.o: abc/util.c \ in your example). This is how sed works: by default it puts the current line in the pattern space, applies the commands and restarts with the next line.
N appends the next input line (/usr/lib/def.h) to the pattern space with a newline character as separator. The pattern space now contains:
a.lo a.o: abc/util.c \
/usr/lib/def.h
N also increments the current line number which becomes 2.
s/\\([^\\]*:)/\1/ deletes the last \ before the first : in the pattern space, if there is one. In our example the only \ is after the first :. The pattern space is not modified.
P prints the first part of the pattern space, up to the first newline character. In our example what is printed is:
a.lo a.o: abc/util.c \
D deletes the first part of the pattern space, up to the first newline character (what has just been printed). The pattern space contains:
/usr/lib/def.h
D also starts a new cycle but, unlike normal sed processing, it does not read the next line and leaves the pattern space and current line number unmodified. So when restarting, the pattern space contains line number 2 and the current line number is still 2.
By induction we see that, each time sed restarts executing the list of commands, the pattern space contains the current line, as normal. When processing line number 4 of your example it contains:
/usr/lib/toy.c \
After N it contains:
/usr/lib/toy.c \
c.lo c.o: high/scan.c \
And there, the substitution command (s/\\([^\\]*:)/\1/) matches and deletes the first \:
/usr/lib/toy.c
c.lo c.o: high/scan.c \
It is thus:
/usr/lib/toy.c
that is printed and removed from the pattern space. Exactly what you want.
The last line needs a special treatment. When we start processing it the pattern space contains:
high/scan_f.c
If we don't do anything special N does not change it (there is no next line to concatenate) and terminates the processing. The last line is never printed.
This is why another list of commands is needed, just for the last line: ${s/\\$//;p;}. It applies only to the last line because it is preceded by a line(s) specification ($ for last line). The first command in the list (substitute s/\\$//) removes a trailing \, if there is one. The second (p) prints the pattern space.
Note: if you know that the last line does not end with a trailing backslash you can simplify a bit:
sed -nE '$p;N;s/\\([^\\]*:)/\1/;P;D' input_file
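As a quick check, the command can be run against the sample from the question. This is just a sketch: the helper names sample and fix_deps are made up for illustration, and GNU sed is assumed.

```shell
# Sample input from the question; line 4 carries the stray trailing backslash.
sample() {
  printf '%s\n' \
    'a.lo a.o: abc/util.c \' \
    '/usr/lib/def.h' \
    'b.lo b.o: hash/imp.h \' \
    '/usr/lib/toy.c \' \
    'c.lo c.o: high/scan.c \' \
    'high/scan_f.c'
}

# The full command from above, including the special case for the last line.
fix_deps() {
  sed -nE '${s/\\$//;p;};N;s/\\([^\\]*:)/\1/;P;D'
}

sample | fix_deps
```

Only the backslash itself is deleted, so the space that preceded it on line 4 stays in the output; the legitimate continuation backslashes on lines 1, 3 and 5 are untouched.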
I agree with @G.M. in general, but this will work.
This sed command captures the text before a trailing " \" (if present) and prints only that text on those lines. All other lines are printed unchanged, of course
sed -e 's/\(.* \)\\$/\1/' input_file
The question is a bit unclear about how to identify the lines from which a trailing backslash should be removed, but inasmuch as the input looks like a set of makefile-format prerequisite lists from which some lines have been removed, I take the objective to be to remove backslashes where they appear after the last (remaining) prerequisite in a list. That requires looking ahead to the next line, so it will be helpful to make use of sed's hold space to store data while you look ahead at the next line to figure out what to do with it.
This would be a pretty robust solution for that problem:
sed -nE 's/\s*(\\){0,1}$/ \\/; :a; /:/ { x; s/\s*\\$//; p; d; }; H; $ { s/.*/:/; b a }' input
That builds up each prerequisite list in the hold space, with backslashes and newlines embedded, then dumps it when the next target list or the end of the input arrives.
Details:
the -n option turns off automatically printing the pattern space after each line
the -E option turns on extended regular expressions
the sed expression contains several sub-expressions, joined by semicolons:
s/\s*(\\){0,1}$/ \\/ : ensure that the current line in the pattern space ends with a space and backslash, without adding a second backslash to lines that already have one
:a : labels that point in the script 'a'
/:/ { x; s/\s*\\$//; p; d; } : on lines that contain a colon, swap the pattern and hold spaces, remove the trailing backslash from (the new contents of) the pattern space, print the result, then start the next cycle
H : (if control reaches this point) append a newline and the contents of the pattern space to the hold space
$ { s/.*/:/; b a } : on the last line of input trigger dumping the hold space by putting a colon in the pattern space and jumping to label 'a'
[end of expression] : read the next line into the pattern space and start over
Alternatively, it would more exactly follow your request, and avoid introducing a leading blank line, to do this:
sed -n ':a; /\\$/! { p; d; }; h; :b; $ { x; s/\\//; p; }; n; /:/ { x; s/\\$//; p; x; b a; }; H; /\\$/ b b; s/.*//; x; p' input
That also assembles pieces in the hold space before ultimately printing them, but it goes about it in a different way:
it starts (at label a) by checking whether the line in the pattern space ends with a backslash. If not (/\\$/!), then it prints the pattern space and starts the next cycle.
otherwise, it replaces the current contents of the hold space with the contents of the pattern space (which must already end with a backslash), then
(at label b) if the current line is the last then it retrieves the contents of the hold space, strips the backslash, and prints the result ($ { x; s/\\//; p; }). Either way,
it attempts to read the next input line, and terminates if there are no more (n).
if that results in the pattern space containing a colon within, then the contents of the hold space are printed, less trailing backslash, and control is sent back to label a to process the colon-containing line as a new first line (/:/ { x; s/\\$//; p; x; b a; }).
otherwise, a newline and the contents of the pattern space are appended to the hold space (H).
if the pattern space ends with a backslash then control branches back to label b to consider reading another line (/\\$/ b b).
otherwise, the hold space is printed and cleared (s/.*//; x; p), and
if there are any more lines then the next is read and a new cycle started.
That makes fewer assumptions about the nature of the input, but it is a bit more complicated.

Can someone break this sed command down for me?

I found this magical command on the unix forum to move the last line of a file to the beginning of the file. I use sed quite a bit but not to this extent. Can someone explain each part to me?
sed '1h;1d;$!H;$!d;G' infile
Yes, it uses exotic commands.
1h: put first line in the "hold" space (sed has 2 spaces: 1 hold space to keep data and the pattern space: actual processed line)
1d: delete first line
$!H: append all lines BUT the last one (and the first one since d command skips to the next line) into the "hold" space
$!d: delete (do not print) all lines except the last one
G: Append a newline to the contents of the pattern space (this is the last line, the only one able to reach that part of the script), and then append the contents of the hold space to that of the pattern space, pattern space which is printed right away. Swap done.
Opinion based comment: I must admit I would never have thought of doing that using sed, and I would have had to make a test to convince me of what this command was doing... in awk, it is much much easier to do that.
But sed has a special place in my heart with its cryptic commands. I wonder if there are some sed candidates to CodeGolf :)
reference manual: https://www.gnu.org/software/sed/manual/sed.html
some exotic things you can do with sed (my best 1999 read): http://sed.sourceforge.net/grabbag/tutorials/do_it_with_sed.txt
Here is the same command in a more procedural-looking pseudocode:
for line in infile:
    # Always do this: Copy the current line to the pattern
    pattern = line
    # Process the script
    if first line:
        hold = pattern                     # 1h
        pattern = ""; continue             # 1d
    elif not last line:
        hold = hold + "\n" + pattern       # $!H
        pattern = ""; continue             # $!d
    pattern = pattern + "\n" + hold        # G
    # Always do this after the script is completed.
    # Due to the continue statements above, this
    # isn't always reached, and in this case
    # is only reached for the last line.
    print pattern
d clears the pattern space and continues to the next input line without executing the rest of the script.
h copies the pattern space to the hold space.
H appends a newline to the hold space, then appends the pattern space to the hold space.
G is like H, but in the other direction; it appends a newline and then the hold space to the pattern space.
The overall effect on a file with N lines is to build up a copy of lines 1 through N-1 in the hold space. When the pattern space holds line N, append the hold space to the pattern space and print the pattern space to standard output.
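A short run on made-up input shows the effect. The function name last_to_front is only illustrative.

```shell
# Move the last input line to the front, using the one-liner from the question.
last_to_front() {
  sed '1h;1d;$!H;$!d;G'
}

printf '%s\n' one two three four | last_to_front
# four
# one
# two
# three

# Edge case: when the first line is also the last line, 1d deletes it
# before G is ever reached, so nothing is printed.
printf 'only\n' | last_to_front
```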

Sed - stop multi-part command if pattern doesn't match?

All, I'm trying to run a sed command to strip out card numbers from certain files. I was trying to do this in a one-liner and I thought all was going well - but I realized that if my first substitute didn't match the pattern it continued into the next commands. Is there a way to get it to exit if there is no match?
We have 16-22 length card numbers on our system, so I wrote this with a variable length in mind. My specifications were to preserve the first 6 and last 4 of any 16+ digit number, and axe (asterisk) out anything in the middle.
sed 'h;s/[0-9]\{6\}\([0-9]\{5\}\)\([0-9]*\)[0-9]\{4\}/\1\2/;s/./*/g;x;s/\([0-9]\{6\}\)[0-9]*\([0-9]\{4\}\)/\1\2/;G;s/\n//;s/\([0-9]\{6\}\)\([0-9]\{4\}\)\(.*\)/\1\3\2/'
The problem lies in the fact that if this part of the command:
s/[0-9]\{6\}\([0-9]\{5\}\)\([0-9]*\)[0-9]\{4\}/\1\2/
Finds nothing, the pattern space remains the input. It continues into the next command which then replaces everything with asterisks. What I end up with is the input followed by an equal number of asterisks (if it does not match the "card number qualifications" in my first substitute). It works perfectly if it is what is deemed a possible card number.
Any ideas?
but I realized that if my first substitute didn't match the pattern it
continued into the next commands. Is there a way to get it to exit if
there is no match?
You can use branch commands. I added and commented them in place:
sed '
h;
s/[0-9]\{6\}\([0-9]\{5\}\)\([0-9]*\)[0-9]\{4\}/\1\2/;
## If last substitution command succeeds, go to label "a".
t a
## Begin next cycle (previous substitution command didn't succeed).
b
## Label "a".
:a
s/./*/g;
x;
s/\([0-9]\{6\}\)[0-9]*\([0-9]\{4\}\)/\1\2/;
G;
s/\n//;
s/\([0-9]\{6\}\)\([0-9]\{4\}\)\(.*\)/\1\3\2/
'
UPDATE due to comments.
So you want to transform
texttexttext111111222223333texttexttext
in
texttexttext111111*****3333texttexttext
Try:
echo "texttexttext111111222223333texttexttext" |
sed -e '
## Add newlines characters between the characters to substitute with "*".
s/\([0-9]\{6\}\)\([0-9]\{5\}\)\([0-9]*\)\([0-9]\{4\}\)/\1\n\2\3\n\4/;
## Label "a".
:a;
## Substitute first not-asterisk character between newlines with "*".
s/\(\n\**\)[^\n]\(.*\n\)/\1*\2/;
## If character before second newline is not an asterisk, repeat
## the substitution from label "a".
/^.*\*\n/! ta;
## Remove artificial newlines.
s/\n//g
## Implicit print.
'
Output:
texttexttext111111*****3333texttexttext
From man sed:
t label
If a s/// has done a successful substitution since the last
input line was read and since the last t or T command, then
branch to label; if label is omitted, branch to end of script.
T label
If no s/// has done a successful substitution since the last
input line was read and since the last t or T command, then
branch to label; if label is omitted, branch to end of script.
This is a GNU extension.
So I think you can just add T; after your first s command.
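Here is a sketch of that suggestion applied to the command from the question. It assumes GNU sed (T is a GNU extension, as the man page excerpt says) and input lines that consist of nothing but the number; the function name mask_cards and the sample inputs are made up for illustration.

```shell
# Each -e chunk is one command of the original one-liner; the only addition
# is T, which branches to the end of the script (where the line is printed
# unchanged) when the first substitution did not match.
mask_cards() {
  sed -e 'h' \
      -e 's/[0-9]\{6\}\([0-9]\{5\}\)\([0-9]*\)[0-9]\{4\}/\1\2/' \
      -e 'T' \
      -e 's/./*/g' \
      -e 'x' \
      -e 's/\([0-9]\{6\}\)[0-9]*\([0-9]\{4\}\)/\1\2/' \
      -e 'G' \
      -e 's/\n//' \
      -e 's/\([0-9]\{6\}\)\([0-9]\{4\}\)\(.*\)/\1\3\2/'
}

printf '%s\n' 1234567890123456 'no card here' | mask_cards
# 123456******3456
# no card here
```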

Decipher this sed one-liner

I want to remove duplicate lines from a file, without sorting the file.
Example of why this is useful to me: removing duplicates from Bash's $HISTFILE without changing the chronological order.
This page has a one-liner to do that:
http://sed.sourceforge.net/sed1line.txt
Here's the one-liner:
sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'
I asked a sysadmin and he told me "you just copy the script and it works, don't go philosophising about this", which is fine, so I am asking here as it's a developer forum and I trust people might be like me, suspicious about using things they don't understand:
Could you kindly provide a pseudo-code explanation of what that "black magic" script is doing, please? I tried parsing the incantation in my head but especially the central part is quite hard.
I'll note that this script does not appear to work with my copy of sed (GNU sed 4.1.5) in my current locale. If I run it with LC_ALL=C it works fine.
Here's an annotated version of the script. sed basically has two registers, one is called "pattern space" and is used for (basically) the current input line, and the other, the "hold space", can be used by scripts for temporary storage etc.
sed -n ' # -n: by default, do not print
G # Append hold space to current input line
s/\n/&&/ # Add empty line after current input line
/^\([ -~]*\n\).*\n\1/d # If the current input line is repeated in the hold space, skip this line
# Otherwise, clean up for storing all input in hold space:
s/\n// # Remove empty line after current input line
h # Copy entire pattern space back to hold space
P # Print current input line'
I guess the adding and removal of an empty line is there so that the central pattern can be kept relatively simple (you can count on there being a newline after the current line and before the beginning of the matching line).
So basically, the entire input file (sans duplicates) is kept (in reverse order) in the hold space, and if the first line of the pattern space (the current input line) is found anywhere in the rest of the pattern space (which was copied from the hold space when the script started processing this line), we skip it and start over.
The regex in the conditional can be further decomposed;
^ # Look at beginning of line (i.e. beginning of pattern space)
\( # This starts group \1
[ -~] # Any printable character (in the C locale)
* # Any number of times
\n # Followed by a newline
\) # End of group \1 -- it contains the current input line
.*\n # Skip any amount of lines as necessary
\1 # Another occurrence of the current input line, with newline and all
If this pattern matches, the script discards the pattern space and starts over with the next input line (d).
You can get it to work independently of locale by changing [ -~] to [[:print:]]
The code doesn't work for me, perhaps due to some locale setting, but this does:
                          vvv
sed -n 'G; s/\n/&&/; /^\([^\n]*\n\).*\n\1/d; s/\n//; h; P'
                          ^^^
Let's first translate this by the book (i.e. sed info page), into something perlish.
# The standard sed loop
my $hold = "";
while (my $pattern = <>) {
    chomp $pattern;
    $pattern = "$pattern\n$hold";              # G
    $pattern =~ s/(\n)/$1$1/;                  # s/\n/&&/
    if ($pattern =~ /^([^\n]*\n).*\n\1/s) {    # /…/ (the s flag lets . match newlines, as in sed)
        next;                                  # d
    }
    $pattern =~ s/\n//;                        # s/\n//
    $hold = $pattern;                          # h
    $pattern =~ /^([^\n]*\n?)/; print $1;      # P
}
OK, the basic idea is that the hold space contains all the lines seen so far.
G: At the beginning of each cycle, append that hold space to the current line. Now we have a single string consisting of the current line and all unique lines which preceded it.
s/\n/&&/: Turn the newline which separates them into a double newline, so that we can match subsequent and non-subsequent duplicates the same, see the next step.
/^\([^\n]*\n\).*\n\1/: Look through the current text for the following: at the beginning of the pattern space (^) look for a first line including its trailing newline (\([^\n]*\n\)), then anything (.*), then a newline (\n), and then that same first line including newline repeated again (\1). If two subsequent lines are the same, then the .* in the regular expression will match the empty string, but the two \n will still match due to the newline duplication in the preceding step. So basically this asks whether the first line appears again among the other lines.
d: If there is a match, this is a duplicate line. We discard this input, keep the hold space as it is as a buffer of all unique lines seen so far, and continue with the next line of input.
s/\n//: Otherwise, we continue and next turn the double newline back into a single newline.
h: We include the current line in our list of all unique lines.
P: And finally print this new unique line, up to the newline character.
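To see the steps above in action, the locale-independent variant can be run on a small made-up input that contains both adjacent and non-adjacent duplicates. This is only a sketch; the function name dedupe is hypothetical, and GNU sed is assumed (for \n inside the bracket expression).

```shell
# Keep the first occurrence of every line, preserving the original order.
dedupe() {
  sed -n 'G; s/\n/&&/; /^\([^\n]*\n\).*\n\1/d; s/\n//; h; P'
}

printf '%s\n' b b a b c a | dedupe
# b
# a
# c
```

Because the hold space grows with every unique line and is re-scanned for each input line, this approach is best suited to small files such as a shell history.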
For the actual problem to resolve, here is a simpler solution (at least it looks so) with awk:
awk '!_[$0]++' FILE
In short, _[$0] is a counter of how many times each distinct line has appeared. When a line ($0) appears for the second time or later, _[$0] >= 1, so !_[$0] evaluates to false and the line is not printed; only its first appearance is printed.
See https://gist.github.com/ryenus/5866268 (credit goes to a recent forum I visited.)
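For readers who find the terse form hard to parse, here is a sketch of the same idea written out longhand. The array name seen and the function name dedupe_awk are made up for illustration; the behaviour is intended to match awk '!_[$0]++'.

```shell
# Print a line only when its counter is still zero, then bump the counter.
dedupe_awk() {
  awk '{ if (seen[$0] == 0) print; seen[$0]++ }'
}

printf '%s\n' b b a b c a | dedupe_awk
printf '%s\n' b b a b c a | awk '!_[$0]++'
# both print: b, a, c (one per line)
```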

What does the 'N' command do in sed?

It looks like the 'N' command works on every other line:
$ cat in.txt
a
b
c
d
$ sed '=;N' in.txt
1
a
b
3
c
d
Maybe that would be natural because command 'N' joins the next line and changes the current line number. But (I saw this here):
$ sed 'N;$!P;$!D;$d' thegeekstuff.txt
The above example deletes the last two lines of a file. This works not only for even-line-numbered files but also for odd-line-numbered files. In this example 'N' command runs on every line. What's the difference?
And could you tell me why I cannot see the last line when I run sed like this:
# sed N odd-lined-file.txt
Excerpt from info sed:
`sed' operates by performing the following cycle on each lines of
input: first, `sed' reads one line from the input stream, removes any
trailing newline, and places it in the pattern space. Then commands
are executed; each command can have an address associated to it:
addresses are a kind of condition code, and a command is only executed
if the condition is verified before the command is to be executed.
...
When the end of the script is reached, unless the `-n' option is in
use, the contents of pattern space are printed out to the output
stream,
...
Unless special commands (like 'D') are used, the pattern space is
deleted between two cycles
...
`N'
Add a newline to the pattern space, then append the next line of
input to the pattern space. If there is no more input then `sed'
exits without processing any more commands.
...
`D'
Delete text in the pattern space up to the first newline. If any
text is left, restart cycle with the resultant pattern space
(without reading a new line of input), otherwise start a normal
new cycle.
This should pretty much resolve your query. But still I will try to explain your three different cases:
CASE 1:
1. sed reads a line from input. [Now there is 1 line in pattern space.]
2. = prints the current line number.
3. N reads the next line into pattern space. [Now there are 2 lines in pattern space.]
   If there is no next line to read, sed exits here. [i.e., with an odd number of lines, sed exits here, and hence the last line is swallowed without being printed.]
4. sed prints the pattern space and clears it. [Pattern space is empty.]
5. If EOF is reached, sed exits here. Otherwise it restarts the complete cycle from step 1. [i.e., with an even number of lines, sed exits here.]
Summary: In this case sed reads 2 lines and prints 2 lines at a time. The last line is swallowed if there is an odd number of lines (see step 3).
CASE 2:
1. sed reads a line from input. [Now there is 1 line in pattern space.]
2. N reads the next line into pattern space. [Now there are 2 lines in pattern space.]
   If it fails, sed exits here. This occurs only if there is 1 line.
3. If it's not the last line ($!), print the first line (P) from pattern space. [The first line from pattern space is printed. But there are still 2 lines in pattern space.]
4. If it's not the last line ($!), delete the first line (D) from pattern space [Now there is only 1 line (the second one) in the pattern space.] and restart the command cycle from step 2. This happens because of the D command (see the excerpt above).
5. If it's the last line ($), delete (d) the complete pattern space. [i.e., EOF has been reached.] [Before this step there were 2 lines in the pattern space, which are now cleaned up by d; at the end of this step the pattern space is empty.]
6. sed automatically stops at EOF.
Summary: In this case:
- sed reads 2 lines first.
- If there is a next line available to read, it prints the first line and reads the next line.
- Otherwise it deletes both lines from the pattern space. This way it always deletes the last 2 lines.
CASE 3:
It's the same as CASE 1, just without step 2.
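The CASE 2 walkthrough can be checked on small made-up inputs with an odd and an even number of lines. This is a minimal sketch; the function name drop_last_two is only illustrative.

```shell
# Print everything except the last two lines, using the command from the question.
drop_last_two() {
  sed 'N;$!P;$!D;$d'
}

printf '%s\n' a b c d e | drop_last_two   # prints a, b, c
printf '%s\n' a b c d   | drop_last_two   # prints a, b
```

Because D restarts the script without reading a new line, N runs once per input line here instead of once per pair, which is why the result does not depend on whether the line count is odd or even.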