sed: Why does command q add a new line? - sed

Consider the following command:
echo -en "abc1\ndef2\nghi1" | sed -n 'p; d;'
In this case the output is exactly the same as it would be without sed at all, so the last line still has no newline character. Next command:
echo -en "abc1\ndef2\nghi1" | sed -n '$! {p; d;}; /1$/ {s/1$//; p; d;}'
sed prints all but the last line without modification. The last line is shortened by one character. There is still no newline character on the last line. Next command:
echo -en "abc1\ndef2\nghi1" | sed -n '$! {p; d;}; /1$/ {s/1$//; p; q1;}'
Here "d" is replaced by "q1" in the last command block. The output is the same as before, but this time there is an additional newline character at the end of the last line.
Why?
How to fix?
For those who are interested in the intention behind this command: given some STDIN, I want to scan for the last character, pass STDIN on to STDOUT without this last character, and set an exit code based on that character. There should be no other modification. sed seems perfect, were it not for this newline problem:
sed -n '
$! {p; d;}; #print any non last line, do next cycle
/0$/ {s/0$//; p; d}; #last line ending with 0? Remove 0, print, next cycle
/1$/ {s/1$//; p; d}; #last line ending with 1? Remove 1, print, next cycle
{p} #fall back, print last line
'
So far this script works perfectly with regard to the newline issue: no newline is added. Now if I replace the "d" command with "q":
sed -n '
$! {p; d;}; #print any non last line, do next cycle
/0$/ {s/0$//; p; q0}; #last line ending with 0? Remove 0, print, exit 0
/1$/ {s/1$//; p; q1}; #last line ending with 1? Remove 1, print, exit 1
{p} #fall back, print last line
'
the newline problem suddenly arises.
Other solutions are welcome, they should be as fast as possible.

I guess this is a bug. According to the manual, q should not print the pattern space if auto-print is disabled, so it should not print anything. Since you are already using GNU sed, you can avoid the problem by using Q instead of q: Q quits without printing the pattern space and, like q, accepts an exit code. At least, this works for me (version 4.2.2).
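To make the suggestion concrete, here is a minimal sketch of the Q-based variant of the script from the question. It assumes GNU sed (Q is a GNU extension) and a POSIX shell; the function name strip_flag is made up for illustration.

```shell
# Hypothetical helper (the name is illustrative, not from the question).
# Copies stdin to stdout minus a trailing 0 or 1 on the last line and
# returns that digit as the exit status. Assumes GNU sed: Q is an extension.
strip_flag() {
  sed -n '
    $!{p;d;}
    /0$/{s/0$//;p;Q0;}
    /1$/{s/1$//;p;Q1;}
    p
  '
}

# Inspect the raw bytes to see whether a trailing newline was added.
printf 'abc1\ndef2\nghi1' | strip_flag | od -c

# Report the exit status; for this input GNU sed should return 1.
printf 'abc1\ndef2\nghi1' | strip_flag >/dev/null || echo "exit status: $?"
```

The only change from the q version is the letter: Q skips the final output step entirely, so the explicit p before it is what prints the shortened last line.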

Related

sed/awk conditionally delete lines from the start and end of a file

I have several thousand text files which might start with
"
Start of text
but not all of them have the same number of line breaks and not all of them have "
I would like to remove " (if it exists) and any line breaks, if any.
(and the ending too but I'll probably figure it out if you show me how to remove it from the start)
End of file...
"
perl is also ok
My attempt would be something like this in fish shell (awk is probably more performant, though):
if head -1 $file | grep -q '"'
sed -i 1d $file
end
if head -1 $file | grep -q '^[[:space:]]*$'
sed -i 1d $file
end
if head -1 $file | grep -q '^[[:space:]]*$'
sed -i 1d $file
end
if head -1 $file | grep -q '^[[:space:]]*$'
sed -i 1d $file
end
That might actually work; I'm going to try it.
The simplest way to do this is a 2-pass approach where on the first pass you figure out the beginning and ending line numbers for the "good" lines and on the second you print the lines between those numbers:
awk '
NR==FNR { if (NF && !/^"$/) { if (!beg) beg=NR; end=NR } next }
(beg <= FNR) && (FNR <= end)
' file file
For example given this input:
$ cat file
"
Start of text
but not all of them have the same number of line breaks and not all of them have "
I would like to remove " (if it exists) and any line breaks, if any.
(and the ending too but I'll probably figure it out if you show me how to remove it from the start)
End of file...
"
We can do the following using any awk in any shell on every UNIX box:
$ awk 'NR==FNR{if (NF && !/^"$/) {if (!beg) beg=NR; end=NR} next} (beg <= FNR) && (FNR <= end)' file file
Start of text
but not all of them have the same number of line breaks and not all of them have "
I would like to remove " (if it exists) and any line breaks, if any.
(and the ending too but I'll probably figure it out if you show me how to remove it from the start)
End of file...
You can use ed to do it in a single pass, too. Something like:
printf '%s\n' '1g/^"$/.,/^./-1d' '$g/^"$/?^.?+1,$d' w | ed -s "$file"
Translated: If the first line is nothing but a quote, delete it and any following empty lines. If the last line is nothing but a quote, delete all preceding empty lines and it. Finally write the file back to disk.
This might work for you (GNU sed):
sed '1{/^"$/d};/\S/!d;:a;${/^"$/Md};/\S/{n;ba};$d;N;ba' file
Delete the first line if it contains only a single ".
Delete all empty lines from the start of the file.
Form a loop for the remainder of the file.
Delete the last line(s) if it/they contain only a single ".
If the current line(s) is/are not empty, print it/them, fetch the next and repeat.
If the current line(s) is/are the last and empty, delete it/them.
The current line(s) is/are empty so append the next line and repeat.
N.B. This is a single pass solution and allows for empty lines within the body of the file.
Alternative, memory intensive:
sed -Ez 's/^"?\n+//;s/\n+("\n)?$/\n/' file
In addition to the two-pass processing, here's a one-pass approach:
awk '!/^"*$/{print b $0;f=1;b=""} f&&/^"*$/{b=b $0 ORS}' file
The program consists of two small parts:
1. Whenever there's content (lines that contain more than "), print possibly buffered lines and the current input line, set a flag that content has started, and clear the buffer.
2. If content had started (f), but the current line doesn't contain any content, we may have reached the end, so we buffer these empty lines. Later, (1) will print them or they will be discarded on EOF.
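As a quick check of the two rules above, here is the same awk program wrapped in a small function and run on a sample with leading and trailing quote lines and an empty line inside the body. The function name trim_edges and the sample input are assumptions for illustration only.

```shell
# Hypothetical wrapper around the one-pass awk program above; reads stdin.
trim_edges() {
  awk '
    # Content line: flush any buffered blank or quote-only lines, then print it.
    !/^"*$/ { print b $0; f = 1; b = "" }
    # Blank or quote-only line after content has started: buffer it for now.
    f && /^"*$/ { b = b $0 ORS }
  '
}

# The empty line between Start and middle survives; the edges are dropped.
printf '"\n\nStart\n\nmiddle\n\n"\n' | trim_edges
```

Note that a line consisting only of whitespace counts as content here, since the pattern only treats empty and quote-only lines as blank.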

How does this sed command: "sed -e :a -e '$d;N;2,10ba' -e 'P;D' " work?

I saw a sed command to delete the last 10 rows of data:
sed -e :a -e '$d;N;2,10ba' -e 'P;D'
But I don't understand how it works. Can someone explain it for me?
UPDATE:
Here is my understanding of this command:
The first script defines a label "a".
The second script first checks whether the line just read into the pattern space is the last line. If it is, the "d" command deletes the pattern space and starts the next cycle; if not, "d" is skipped. Then "N" appends the next line of input to the pattern space, and "2,10ba" jumps back to label "a" if the line just read is one of lines 2 to 10.
The third script runs when the line just read is not one of lines 2 to 10: "P" prints the first line of the pattern space, and then "D" deletes that first line.
My understanding of "$d" is that "d" is executed when sed reads the last line into the pattern space. But it seems that "d" is executed every time "ba" is executed, regardless of whether the line just read is the last line. Why?
:a is a label. $ in the address means the last line, d means delete. N appends the next line of input to the pattern space. 2,10 means lines 2 to 10, b means branch (i.e. goto), P prints the first line of the pattern space, and D deletes the first line of the pattern space and restarts the cycle without reading new input while the pattern space is non-empty.
In other words, you create a sliding window of 10 lines. Lines are appended to it until it is full; after that, each newly read line pushes the oldest line out of the top, where it gets printed. When the last line of input has been read, the whole window is deleted instead of printed, which removes the last 10 lines.
You can modify the commands to see what's getting deleted (()), stored (<>), and printed by the P ([]):
$ printf '%s\n' {1..20} | \
sed -e ':a ${s/^/(/;s/$/)/;p;d};s/^/</;s/$/>/;N;2,10ba;s/^/[/;s/$/]/;P;D'
[<<<<<<<<<<1>
[<2>
[<3>
[<4>
[<5>
[<6>
[<7>
[<8>
[<9>
[<10>
(11]>
12]>
13]>
14]>
15]>
16]>
17]>
18]>
19]>
20])
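The sliding-window behaviour can also be checked without the markers, by feeding the original command inputs of different lengths. This is only a sketch: the function name drop_last_10 is made up, and seq is assumed to be available.

```shell
# Hypothetical wrapper around the command from the question; reads stdin
# and drops the last 10 lines.
drop_last_10() {
  sed -e :a -e '$d;N;2,10ba' -e 'P;D'
}

seq 1 15 | drop_last_10    # only the lines before the final 10 are printed
seq 1 10 | drop_last_10    # the window never overflows, so nothing is printed
```

With 10 or fewer input lines the loop keeps appending until the last line is reached, and $d then discards everything.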
A simpler approach with GNU sed, if your data is in a file named d:
sed -Ez 's/(.*\n)(.*\n){10}$/\1/' d
The 10 is the number of trailing lines to remove.
Move the capture group to invert the result, i.e. to keep only the last 10 lines:
sed -Ez 's/.*\n((.*\n){10})$/\1/' d

How to avoid the last newline in sed?

I want to remove the last part of a file, starting at a line following a certain pattern and including the preceding newline.
So, stopping at "STOP", the following file:
keep\n
STOP\n
whatever
Should output:
keep
With no trailing newline.
I tried this, and the logic seems to work, but it seems that sed adds a newline every time it prints its buffer. How can I avoid that? When sed doesn't manipulate the buffer, I don't have that problem (i.e. if I remove the STOP, sed outputs 'whatever' at the end of the file without a newline).
printf 'keep
STOP
Whatever' | sed 'N
/\nSTOP/ {
s/\n.*$//
P
Q
}
P
D'
I'm trying to write a git cleaning filter, and I cannot have a new newline appended every time I commit.
$ awk '/^STOP/{exit} {printf "%s%s", ors, $0; ors=RS}' file
keep$
The above prints each line without a trailing newline; every line from the second onward is instead preceded by the record separator RS (\n or \r\n, whichever your environment dictates, so it'll behave correctly on UNIX or Windows or whatever). When it finds a line starting with STOP, it exits before printing anything further.
Note that the above doesn't keep anything in memory except the current line so it'll work no matter how large your input file is and no matter where the STOP appears in it - it'll even work if STOP is the first line of the file unlike the other answers you have so far.
It will also work using any awk in any shell on every UNIX box.
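To confirm that nothing is appended after the last kept line, one can count the output bytes. A small sketch, with the function name cut_at_stop made up for illustration:

```shell
# Hypothetical wrapper around the awk one-liner above; reads stdin.
cut_at_stop() {
  awk '/^STOP/ { exit } { printf "%s%s", ors, $0; ors = RS }'
}

# Count the bytes written for the sample from the question.
printf 'keep\nSTOP\nWhatever' | cut_at_stop | wc -c
```

For the sample input the count should equal the length of "keep" alone, since the separator is only ever written in front of a line, never after one.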
This might work for you (GNU sed):
sed -z 's/\nSTOP.*//' file
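The -z option makes sed split its input on NUL bytes instead of newlines, so a file without NULs is slurped into memory as a single record. The substitute command then removes everything from the first newline followed by STOP to the end of the file.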
Using awk you could:
$ awk '$0=="STOP"{exit} {b=b (b==""?"":ORS) $0} END{printf "%s",b}' file
Output:
keep$
Explained:
$ awk '
$0=="STOP" { exit } # exit at STOP, ie. go to END
{ b=b (b==""?"":ORS) $0 } # gather an output buffer, control \n
END { printf "%s",b } # in END, print the output buffer
' file
... more (focusing a bit on the conditional operator):
b=b # appending to b, so b is b and ...
(b==""?"":ORS) # if b was empty, add nothing to it, if not add ORS ie. \n ...
$0 # and the current record

sed: joining lines depending on the second one

I have a file that, occasionally, has split lines. The split is signaled by the fact that the line starts with '+' (possibly preceded by spaces).
line 1
line 2
+ continue 2
line 3
...
I'd like to join the split line back:
line 1
line 2 continue 2
line 3
...
using sed. I'm not clear on how to join a line with the preceding one.
Any suggestion?
This might work for you:
sed 'N;s/\n\s*+//;P;D' file
These are actually four commands:
N
Append line from the input file to the pattern space
s/\n\s*+//
Remove newline, following whitespace and the plus
P
Print the pattern space up to the first newline
D
Delete the pattern space up to and including the first newline, i.e. the part which was just printed
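The four commands can be tried on the sample from the question as follows. This is a sketch that assumes GNU sed, since \s is a GNU extension; the function name join_plus is made up for illustration.

```shell
# Hypothetical wrapper around the command above; reads stdin.
# Assumes GNU sed because of \s.
join_plus() {
  sed 'N;s/\n\s*+//;P;D'
}

# The "+ continue 2" line is merged into the line before it.
printf 'line 1\nline 2\n+ continue 2\nline 3\n' | join_plus
```

Because P prints the joined line right away and D then starts a fresh cycle, only a single continuation line per original line is merged with this approach.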
The relevant manual page parts are
Selecting lines by numbers
Addresses overview
Multiline techniques - using D,G,H,N,P to process multiple lines
Doing this in sed is certainly a good exercise, but it's pretty trivial in perl:
perl -0777 -pe 's/\n\s*\+//g' input
I'm not partial to sed so this was a nice challenge for me.
sed -n '1{h;n};/^ *+ */{s// /;H;n};{x;s/\n//g;p};${x;p}'
In awk this is approximately:
awk '
NR == 1 {hold = $0; next}
/^ *\+/ {$1 = ""; hold=hold $0; next}
{print hold; hold = $0}
END {if (hold) print hold}
'
If the last line is a "+" line, the sed version will print a trailing blank line. Couldn't figure out how to suppress it.
You can use Vim in Ex mode:
ex -sc g/+/-j -cx file
g global search
- select previous line
j join with next line
x save and close
A different use of the hold space, close to POSIX sed (\s is a GNU extension; [[:space:]] is the portable spelling): load the entire file into the hold space before merging lines.
sed -n '1x;1!H;${g;s/\n\s*+//g;p}'
1x on the first line, swap the line into the empty hold space
1!H on non-first lines, append to the hold space
$ on the last line:
g get the hold space (the entire file)
s/\n\s*+//g remove newlines (and any following whitespace) preceding a +
p print everything
Input:
line 1
line 2
+ continue 2
+ continue 2 even more
line 3
+ continued
becomes
line 1
line 2 continue 2 continue 2 even more
line 3 continued
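The run above can be reproduced with a small function. This sketch swaps \s for the POSIX character class [[:space:]] so that it does not depend on GNU sed; that substitution and the function name join_plus_all are my own assumptions, not part of the answer.

```shell
# Hypothetical wrapper; reads stdin. Uses [[:space:]] in place of the
# GNU-only \s, otherwise the same hold-space technique as above.
join_plus_all() {
  sed -n '1x;1!H;${g;s/\n[[:space:]]*+//g;p;}'
}

printf 'line 1\nline 2\n+ continue 2\n+ continue 2 even more\nline 3\n+ continued\n' | join_plus_all
```

Since the substitution runs once over the whole accumulated file with the g flag, any number of consecutive + lines is merged.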
This (or potong's answer) might be more interesting than a sed -z implementation if other commands are needed for other manipulations of the data: you can simply stick them in before 1!H, whereas sed -z immediately loads the entire file into the pattern space, which means you aren't manipulating single lines at any point. The same goes for perl -0777.
In other words, if you want to also eliminate comment lines starting with *, add in /^\s*\*/d to delete the line
sed -n '1x;/^\s*\*/d;1!H;${g;s/\n\s*+//g;p}'
versus:
sed -z 's/\n\s*+//g;s/\n\s*\*[^\n]*\n/\n/g'
The former's accumulation in the hold space line by line keeps you in classic sed line processing land, while the latter's sed -z dumps you into what could be some painful substring regexes.
But that's sort of an edge case, and you could always just pipe sed -z back into sed. So +1 for that.
Footnote for internet searches: This is SPICE netlist syntax.
A solution for versions of sed that can read NUL separated data, like here GNU Sed's -z:
sed -z 's/\n\s*+//g'
Compared to potong's solution this has the advantage of being able to join multiple lines that start with +. For example:
line 1
line 2
+ continue 2
+ continue 2 even more
line 3
becomes
line 1
line 2 continue 2 continue 2 even more
line 3

join 2 consecutive rows under condition

I have 5 lines like:
typeA;pointA1
typeA;pointA2
typeA;pointA3
typeB;pointB1
typeB;pointB2
result output would be:
typeA;pointA1;typeA;pointA2
typeA;pointA2;typeA;pointA3
typeB;pointB1;typeB;pointB2
Is it possible to use sed or awk for this purpose?
This is easy with awk:
awk -F';' '$1 == prevType { printf("%s;%s;%s\n", $1, prevPoint, $0) } { prevType = $1; prevPoint = $2 }'
I've assumed that the blank lines between the records are not part of the input; if they are, just run the input through grep -v '^$' before awk.
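For reference, here is the same awk program spread over several lines with comments and run on the five sample rows. The function name pair_rows is made up for illustration.

```shell
# Hypothetical wrapper around the awk program above; reads stdin.
pair_rows() {
  awk -F';' '
    # Same type as the previous row: emit the previous point joined with this row.
    $1 == prevType { printf("%s;%s;%s\n", $1, prevPoint, $0) }
    # Remember this row for the next comparison.
    { prevType = $1; prevPoint = $2 }
  '
}

printf 'typeA;pointA1\ntypeA;pointA2\ntypeA;pointA3\ntypeB;pointB1\ntypeB;pointB2\n' | pair_rows
```

Only one row of state is kept, so this works on input of any length.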
paste could be useful in this case; it can save a lot of code:
sed '1d' file|paste -d";" file -|awk -F';' '$1==$3'
see the test below
kent$ cat a
typeA;pointA1
typeA;pointA2
typeA;pointA3
typeB;pointB1
typeB;pointB2
kent$ sed '1d' a|paste -d";" a -|awk -F';' '$1==$3'
typeA;pointA1;typeA;pointA2
typeA;pointA2;typeA;pointA3
typeB;pointB1;typeB;pointB2
This GNU sed solution might work for you:
sed -rn '1{h;b};H;x;/^([^;]*);.*\n\1/!{s/.*\n//;x;d};s/\n/;/p' source_file
This assumes there are no blank lines; otherwise, preprocess the source file with sed '/^$/d' source_file and pipe the result in.
EDIT:
On reflection the above solution is far too elaborate and can be condensed to:
sed -ne '1{h;b};H;x;/^\([^;]*\);.*\1/s/\n/;/p' source_file
Explanation:
The -n prevents any lines being implicitly printed. The first line is copied to the hold space (HS, an extra register) and then b, with no label, branches to the end of the script, which ends the iteration. All subsequent lines are appended to the HS. The HS is then swapped with the pattern space (PS, the register holding the current line). The PS at this point contains the previous and current lines, which are now checked to see whether the first field in each line is identical. If so, the newline separating the two lines is replaced by a ; and, provided the substitution occurred, the PS is printed. The next iteration now takes place: the current line refreshes the PS, and the HS holds the previous line.