Does deleting sed pattern space with 'd' erase hold space as well? - sed

Can someone please explain why this is happening?
This is expected:
$ echo -e "foo\nbar" | sed -n 'h; x; p'
foo
bar
I put every line in the hold space, then swap hold space and pattern space, then print the pattern space, so every line is printed. Now, why is the following different?
$ echo -e "foo\nbar" | sed -n 'h; d; x; p'
This prints nothing. I didn't expect that: I delete the pattern space before swapping, so the stored line should be swapped back into the pattern space anyway. It's the hold space that should be empty after x, right? I delete the pattern space, then swap. Where does the line I've saved go?

When you use d, the pattern space is cleared and the current cycle ends: the next line is read and processing starts over from the beginning of the script. Thus, you never actually reach the x and p steps; each line is just copied to the hold space and then deleted. The hold space itself is not touched by d.
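A minimal sketch of that behaviour, assuming a POSIX shell and GNU sed (the variable names are only for illustration). The first command is the one from the question; the second holds line 1, deletes it, and swaps on line 2 to show that the saved line is still in the hold space:

```shell
# 'd' ends the cycle immediately, so 'x; p' never runs and nothing is printed.
nothing=$(printf 'foo\nbar\n' | sed -n 'h; d; x; p')
printf '[%s]\n' "$nothing"

# Hold line 1 and delete it. On line 2, 'x' brings the saved line back,
# which shows that 'd' did not clear the hold space.
kept=$(printf 'foo\nbar\n' | sed -n '1{h;d;}; x; p')
printf '[%s]\n' "$kept"
```

The first printf shows an empty pair of brackets, the second shows [foo].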

I guess it's related to the following line in man sed:
d Delete pattern space. Start next cycle.
The following works as expected:
$ echo -e "foo\nbar" | sed -n 'h; s/.*//; g; p'
foo
bar
Sorry for bothering you guys.

Related

Sed: find, replace and then append result to original line

I am on Mac, I want to find a pattern in lines, replace it with something, then append the resulting string to the end of the original line. Here is what I tried:
echo "test='123'" | sed -E '/([^a-z])/ s/$/ \1/'
sed: 1: "/([^a-z])/ s/$/ \1/": \1 not defined in the RE
What do I need to define \1? I thought I did it with ([^a-z]). No?
Edit: Perhaps this code will represent better what I want:
1) echo "test='123'" | sed 's/[a-zA-Z0-9]//g'
2) I want the new line = original line + line #1 above
In other words:
Before (what I get): test='123'
After (what I want): test='123' =''
You can edit this command this way:
echo "test='123'" | sed -E 'h;s/([a-zA-Z0-9])//g;G;s/(.*)\n(.*)/\2\1/'
For readability, the script, line by line, reads
h
s/([a-zA-Z0-9])//g
G
s/(.*)\n(.*)/\2\1/
h stores the current line in the hold space.
your s command deletes every alphanumeric character, leaving =''.
G appends the content of the hold space, i.e. the original line, to the pattern space, i.e. the current line as you have edited it, putting a newline \n in between.
another s command reorders the two pieces, also removing the \n that the G command inserted.
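The steps above can be tried as follows. This is a sketch, assuming a POSIX shell and a sed that supports -E. The space in the final replacement is an addition, not part of the command above, so that the result matches the "After" form shown in the question:

```shell
# h: save the original line; s: strip alphanumerics; G: append the saved line
# after a newline; final s: put the original first, then a space, then the rest.
line="test='123'"
result=$(printf '%s\n' "$line" |
  sed -E 'h;s/[a-zA-Z0-9]//g;G;s/(.*)\n(.*)/\2 \1/')
printf '%s\n' "$result"
```

With the replacement written as \2\1, as in the command above, the two pieces are joined without the space.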
Comments
Your original attempt sed -E '/([^a-z])/ s/$/ \1/' could not work because \1 refers to what is captured by the leftmost (…) group in the search portion of the s command; it does not "remember" the group(s) you used to address the line.
Once you print the pattern space with p, a newline comes with it, and once it's been printed, there's no way you can remove it within the same sed program.
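To illustrate the comment about \1, here is a small sketch (assuming a POSIX shell and a sed with -E) in which the capture group sits inside the s command, while the address only selects the line. The pattern is chosen just for demonstration and is not what the question ultimately needs:

```shell
# The address /[^a-z]/ selects the line; the group inside s/.../.../ defines \1.
# & is the whole match, so the first non-lowercase character is appended after it.
out=$(printf '%s\n' "test='123'" | sed -E '/[^a-z]/ s/([^a-z]).*/& \1/')
printf '%s\n' "$out"
```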

Use sed to take all lines containing regex and append to end of file

I'm trying to come up with a sed script to take all lines containing a pattern and move them to the end of the output. This is an exercise in learning hold vs pattern space and I'm struggling to come up with it (though I feel close).
I'm here:
$ echo -e "hi\nfoo1\nbar\nsomething\nfoo2\nyo" | sed -E '/foo/H; //d; $G'
hi
bar
something
yo

foo1
foo2
But I want the output to be:
hi
bar
something
yo
foo1
foo2
I understand why this is happening. It is because the first time we find foo the hold space is empty so the H appends \n to the blank hold space and then the first foo, which I suppose is fine. But then the $G does it again, namely another append which appends \n plus what is in the hold space to the pattern space.
I tried a final delete command with /^$/d but that didn't remove the blank line (I think this is because this pattern is being matched not against the last line, but against the, now, multiline pattern space which has a \n\n in it).
I'm sure the sed gurus have a fix for me.
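One way to check that reasoning is to dump the pattern space with the l command right after G, which shows embedded newlines as \n and marks the end with $. A sketch, assuming a POSIX shell and GNU sed, with a shortened input:

```shell
# On the last line, G appends "\n" plus the hold space ("\nfoo1"),
# so the pattern space is one string with two consecutive newlines in it.
ps=$(printf 'hi\nfoo1\nyo\n' | sed -n '/foo/H; //d; ${G;l;}')
printf '%s\n' "$ps"
```

Because the pattern space is that single multi-line string, /^$/ (which only matches an empty pattern space) does not match.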
This might work for you (GNU sed):
sed '/foo/H;//!p;$!d;x;//s/.//p;d' file
If the line contains the required string, append it to the hold space (HS); otherwise print it as normal. If it is not the last line, delete it; otherwise swap the HS with the pattern space (PS). If the PS (what was the HS) now contains the required string, its first character is a newline, because every match was appended with H; delete that first character and print. Finally, delete whatever is left.
An alternative, using the -n flag:
sed -n '/foo/H;//!p;$!b;x;//s/.//p' file
N.B. When the d command, or b without a label, is executed, no further sed commands run for that line: d discards the PS, b jumps to the end of the script (where the PS is printed unless -n is in use), a new line is read into the PS, and the script starts again from its first command, i.e. the sed commands do not resume after the previous d or b.
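For reference, the first command above run against the sample input from the question. This is a sketch assuming a POSIX shell and GNU sed, with the input fed via printf instead of a file:

```shell
# Matching lines are collected in the hold space, the rest are printed as they
# come; on the last line the collected lines are swapped in and printed.
moved=$(printf 'hi\nfoo1\nbar\nsomething\nfoo2\nyo\n' |
  sed '/foo/H;//!p;$!d;x;//s/.//p;d')
printf '%s\n' "$moved"
```

The output lists hi, bar, something, yo, foo1, foo2 with no blank line in between.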
Why? Stuff like this is absolutely trivial in awk, awk is available everywhere that sed is, and the resulting awk script will be simpler, more portable, faster and better in almost every other way than a sed script to do the same task. All that hold space stuff was necessary in sed before the mid-1970s when awk was invented but there's absolutely no use for it now other than as a mental exercise.
$ echo -e "hi\nfoo1\nbar\nsomething\nfoo2\nyo" |
awk '/foo/{buf = buf $0 RS;next} {print} END{printf "%s",buf}'
hi
bar
something
yo
foo1
foo2
The above will work as-is in every awk on every UNIX installation and I bet you can figure out how it works very easily.
This feels like a hack and I think it should be possible to handle this situation more gracefully. The following works on GNU sed:
echo -e "hi\nfoo1\nbar\nsomething\nfoo2\nyo" | sed -r '/foo/{H;d;}; $G; s/\n\n/\n/g'
However, on OSX/BSD sed, this results in the following odd output:
hi
bar
something
yonfoo1
foo2
Note that the 2 consecutive newlines were replaced with the literal character n.
The OSX/BSD vs GNU sed is explained in this article. And the following works (in GNU SED as well):
echo -e "hi\nfoo1\nbar\nsomething\nfoo2\nyo" | sed '/foo/{H;d;}; $G; s/\n\n/\'$'\n''/'
TL;DR: BSD sed does not interpret \n as a newline in the RHS of the replacement expression, so you either have to put a true LF/newline (preceded by a backslash) in there at the command line, or do the above, where you split the sed script string at the point where you need the newline on the RHS and put a dollar sign in front of '\n' so the shell interprets it as a line feed (this needs a shell with $'...' quoting, such as bash or zsh).
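A possible way to sidestep the issue is to avoid a newline in the replacement altogether and simply delete the first newline after G. The sketch below assumes a POSIX shell. It is verified only against GNU sed behaviour, and it should also suit BSD sed because \n appears only on the pattern side, but treat that as an assumption:

```shell
# After G the pattern space is "yo\n\nfoo1\nfoo2"; removing the first
# newline leaves exactly one newline between "yo" and "foo1".
portable=$(printf 'hi\nfoo1\nbar\nsomething\nfoo2\nyo\n' |
  sed -e '/foo/{H;d;}' -e '${G;s/\n//;}')
printf '%s\n' "$portable"
```

Like the commands above, this loses the held lines if the last input line itself matches foo, because d ends the cycle before the $ block runs.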

Print last "window" with sed

I have a log to process that's roughly structured like this:
...
...
sentinel
marker
...
marker
...
sentinel
marker
...
I want everything between a marker and the following sentinel, and I want the last such "window." The following works ok:
sed -e "1{h;d} ; 2,109{H;d} ; 110{H;g} ; /sentinel/h ; \${g;q} ; N ; D" file.log
Here, 110 is a rough (but consistent within a couple lines) estimate of the space between markers for this log, but I'd have to recompute this estimate for other logs, which is annoying.
I'm wondering if there's a more elegant way to achieve this with sed, i.e. to automatically return the last window between marker and sentinel (I'll also accept an answer that demonstrates why you can't do this in sed).
Thanks.
P.S. I know I could do this in any number of languages, but I'd like to exercise the sed muscles.
This might work for you (GNU sed):
sed '/marker/,/sentinel/{/marker/h;//!H};$!d;x' file
Stash lines between marker and sentinel in the hold space (overwriting old with new) and at the end of the file print whatever is left in the hold space.
EDIT:
The solution above caters for marker and sentinel pairs. If either of those is likely to be missing then use:
sed '/marker/,/sentinel/H;$!d;x;s/.*\(marker.*sentinel\).*/\1/p;d' file
This saves all marker/sentinel pairs in the hold space and at the end of the file removes all but the last complete pair.
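A small run of the second command, as a sketch assuming a POSIX shell and GNU sed. The sample log is made up for illustration: it has two markers before the sentinel and a trailing marker with no sentinel, as in the question's outline:

```shell
# Hypothetical miniature log; a, b, c, d stand in for ordinary log lines.
log='a
sentinel
marker
b
marker
c
sentinel
marker
d'

# Collect every marker..sentinel range in the hold space, then at the end keep
# only the text from the last marker that is still followed by a sentinel.
window=$(printf '%s\n' "$log" |
  sed '/marker/,/sentinel/H;$!d;x;s/.*\(marker.*sentinel\).*/\1/p;d')
printf '%s\n' "$window"
```

Because the leading .* is greedy, the window starts at the marker closest to the sentinel (the one before c), not at the first marker of that range.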
If you know that there are no blank lines in the file, you could do:
sed -e '/^marker$/i\
\
' -e '/^sentinel$/a\
\
' input | awk '/sentinel/{l=$0}END{print l}' RS=
(Not sure I'd call that elegant: basically you are inserting blank lines between the records and letting awk's RS do the hard work.) If you cannot guarantee that there are no blank lines, pre/post process the data to ensure that:
sed 's/^/x/' input | sed -e '/^xmarker$/i\
\
' -e '/^xsentinel$/a\
\
' | awk '/sentinel/{l=$0}END{print l}' RS= | sed 's/^x//'
(Of course, you could avoid the extra sed by wrapping them into the existing sed and the awk, but the idea is (I think) clearer this way.)

The pattern space and hold space of the Sed utility has an initialized value of null or empty string?

From the documentation of sed:
sed maintains two data buffers: the active pattern space, and the
auxiliary hold space. Both are initially empty.
I initially thought the value of the pattern space and hold space was null (nothing). But from the following example, it seems that their initial value is a single newline character (\n).
[root#localhost ~]# cat e.txt
aa
bb
cc
dd
[root#localhost ~]# cat e.txt | sed -r '/c/{x;p;x}'
aa
bb

cc
dd
[root#localhost ~]#
Is my understanding right?
Thanks.
I think the answer is that the p command, like the default print action, is actually adding a newline to the end of the empty pattern space. This is based on this little snippet from the GNU sed documentation (just below that bit you quote, by the way):
sed operates by performing the following cycle on each line of input: first, sed reads one line from the input stream, removes any trailing newline, and places it in the pattern space.
... blah, blah blah ...
When the end of the script is reached, unless the -n option is in use, the contents of pattern space are printed out to the output stream, adding back the trailing newline if it was removed.
In other words, the line being held in the pattern (and hold) space does not have the trailing newline - the aa line is held as aa rather than aa<newline>.
Of course, the hold space may still contain multiple lines but that just means that executing the H command on the first two lines of your file will give you a hold space containing aa<newline>bb, not aa<newline>bb<newline>.
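This can be checked with the l command, which prints the pattern space unambiguously (newlines as \n, end of buffer as $). A sketch assuming a POSIX shell and GNU sed:

```shell
# Swap in the initial hold space and list it: only the end marker appears,
# so the hold space starts out truly empty.
empty=$(printf 'aa\n' | sed -n 'x; l')
printf '%s\n' "$empty"

# G appends a newline plus the (empty) hold space; the newline comes from G,
# not from the hold space itself.
joined=$(printf 'aa\n' | sed -n 'G; l')
printf '%s\n' "$joined"
```

The first listing is just $, the second is aa\n$.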

join 2 consecutive rows under condition

I have 5 lines like:
typeA;pointA1
typeA;pointA2
typeA;pointA3
typeB;pointB1
typeB;pointB2
result output would be:
typeA;pointA1;typeA;pointA2
typeA;pointA2;typeA;pointA3
typeB;pointB1;typeB;pointB2
Is it possible to use sed or awk for this purpose?
This is easy with awk:
awk -F';' '$1 == prevType { printf("%s;%s;%s\n", $1, prevPoint, $0) } { prevType = $1; prevPoint = $2 }'
I've assumed that the blank lines between the records are not part of the input; if they are, just run the input through grep -v '^$' before awk.
paste could be useful in this case. It can save a lot of code:
sed '1d' file|paste -d";" file -|awk -F';' '$1==$3'
see the test below
kent$ cat a
typeA;pointA1
typeA;pointA2
typeA;pointA3
typeB;pointB1
typeB;pointB2
kent$ sed '1d' a|paste -d";" a -|awk -F';' '$1==$3'
typeA;pointA1;typeA;pointA2
typeA;pointA2;typeA;pointA3
typeB;pointB1;typeB;pointB2
This GNU sed solution might work for you:
sed -rn '1{h;b};H;x;/^([^;]*);.*\n\1/!{s/.*\n//;x;d};s/\n/;/p' source_file
This assumes there are no blank lines; if there are, preprocess the source file with sed '/^$/d' source_file and pipe the result in.
EDIT:
On reflection the above solution is far too elaborate and can be condensed to:
sed -ne '1{h;b};H;x;/^\([^;]*\);.*\1/s/\n/;/p' source_file
Explanation:
The -n prevents any lines being implicitly printed. The first line is copied to the hold space (HS, an extra register) and then b branches to the end of the script, which ends the iteration. All subsequent lines are appended to the HS. The HS is then swapped with the pattern space (PS, the register holding the current line). The PS at this point contains the previous and current lines, which are now checked to see if the first field in each line is identical, while the HS holds just the current line. If the fields match, the newline separating the two lines is replaced by a ; and, provided the substitution occurred, the PS is printed out. The next iteration now takes place: the current line refreshes the PS and the HS holds the previous line.
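The walk-through can be reproduced on the sample data as follows. This is a sketch assuming a POSIX shell and GNU sed. The regex is slightly tightened compared with the command above (it requires the repeated field directly after the newline and followed by ;), which is a change made here, not part of the original answer:

```shell
# Line 1 is only stored. From line 2 on, H;x leaves "previous\ncurrent" in the
# pattern space and the current line in the hold space for the next cycle.
pairs=$(printf 'typeA;pointA1\ntypeA;pointA2\ntypeA;pointA3\ntypeB;pointB1\ntypeB;pointB2\n' |
  sed -ne '1{h;b};H;x;/^\([^;]*\);.*\n\1;/s/\n/;/p')
printf '%s\n' "$pairs"
```

The typeA;pointA3 / typeB;pointB1 pair is skipped because the first fields differ.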