Are sed's pattern space and hold space initialized to null or to an empty string? - sed

From the documentation of sed:
sed maintains two data buffers: the active pattern space, and the
auxiliary hold space. Both are initially empty.
I initially thought the value of the pattern space and hold space was null (nothing). But from the following example, it seems that their initial value is a single newline character (\n).
[root@localhost ~]# cat e.txt
aa
bb
cc
dd
[root@localhost ~]# cat e.txt | sed -r '/c/{x;p;x}'
aa
bb

cc
dd
[root@localhost ~]#
Is my understanding right?
Thanks.

I think the answer is that the p command, like the default print action, is actually adding a newline to the end of the empty pattern space. This is based on this little snippet from the GNU sed documentation (just below that bit you quote, by the way):
sed operates by performing the following cycle on each line of input: first, sed reads one line from the input stream, removes any trailing newline, and places it in the pattern space.
... blah, blah blah ...
When the end of the script is reached, unless the -n option is in use, the contents of pattern space are printed out to the output stream, adding back the trailing newline if it was removed.
In other words, the line being held in the pattern (and hold) space does not have the trailing newline - the aa line is held as aa rather than aa<newline>.
Of course, the hold space may still contain multiple lines, but that just means that executing the H command on the first two lines of your file will give you a hold space containing <newline>aa<newline>bb (H always puts a newline in front of what it appends, even when the hold space is empty), not one ending in bb<newline>.
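One way to check this yourself is the l command, which prints the pattern space unambiguously and marks its end with $. This is only a sketch: the input is the e.txt from the question recreated with printf, and the variable names are made up for illustration.

```shell
# Recreate the four-line sample input from the question.
input=$(printf 'aa\nbb\ncc\ndd')

# On the "cc" line, swap in the still untouched hold space and dump it.
# An empty buffer shows up as a bare "$".
hold_dump=$(printf '%s\n' "$input" | sed -n '/c/{x;l;x;}')
echo "$hold_dump"

# Append the first two lines with H, then dump the hold space.
# H writes a newline before each appended line, so the dump starts
# with \n, and there is no \n after "bb".
held_lines=$(printf '%s\n' "$input" | sed -n '1,2H;2{x;l;}')
echo "$held_lines"
```

The first dump shows nothing between the start of the line and the $, so the blank line in the question comes from p adding a newline, not from the buffer itself.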

Related

how to type the beginning or end in sed multiple-lines mode?

As we all know, "\‘" and "\’" mark the beginning and the end of the buffer, respectively, in multiline mode. But in ASCII (English input) only "'" exists.
How do I type the beginning anchor?
This might work for you (GNU sed):
seq 3 | sed -n 'p;H;1h;$!d;g;l0
s/^.*$/ALL/mgp
s/\`.*$/START/mp
s/^.*\'\''/END/mp'
1
2
3
1\n2\n3$
ALL
ALL
ALL
START
ALL
ALL
START
ALL
END
The seq command generates three consecutive integers, one per line.
sed uses the -n option to turn off implicit printing and slurps the three integers into the hold space, printing each integer as it is read.
The first substitution replaces every line with the literal ALL.
The second substitution replaces the first line with START.
The third substitution replaces the last line with END.
N.B. Note the use of the m (multiline), g (global) and p (print) substitution flags. Lastly, if the -z option is in use, these zero-width anchors work with respect to null characters, not newlines.
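If the '\'' quoting dance is the annoying part, another option is to keep the script out of the shell's quoting altogether. The sketch below assumes GNU sed (the two anchors and the M flag are GNU extensions) and uses a temporary script file; the file and variable names are only for illustration.

```shell
# GNU sed assumed: \` and \' and the M flag are GNU extensions.
script=$(mktemp)

# A quoted here-document passes the text through untouched, so the
# backtick and the apostrophe need no extra shell escaping.
cat > "$script" <<'EOF'
1h;1!H;$!d;g
s/\`.*$/START/M
s/^.*\'/END/M
p
EOF

# Slurp three lines, then rewrite only the first and the last one.
anchored=$(printf '1\n2\n3\n' | sed -n -f "$script")
rm -f "$script"
echo "$anchored"
```

Here the middle line is left alone because neither anchor can match next to it.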

Sed command to delete "\" which causes "*** multiple target patterns. Stop." error

In a file, I have lines like this:
a.lo a.o: abc/util.c \
/usr/lib/def.h
b.lo b.o: hash/imp.h \
/usr/lib/toy.c \
c.lo c.o: high/scan.c \
high/scan_f.c
Here you can see one extra \ (backslash) at the end of line 4 (/usr/lib/toy.c \). How can I use sed to remove this \ (backslash)? Because of it I'm getting the "*** multiple target patterns. Stop." error.
P.S. I have this extra \ (backslash) in multiple places in my file, so using sed to delete it by line number isn't feasible. I need something that looks for .lo .o and checks the line before; if it finds a \ (backslash) there, it removes it.
Maybe not the simplest but this should work:
sed -nE '${s/\\$//;p;};N;s/\\([^\\]*:)/\1/;P;D' input_file
The main idea is to concatenate input lines in the pattern space (a sed internal text buffer), such that it always contains 2 consecutive lines, separated by a newline character. We then just delete the last \ before a :, if any, print the first of the 2 lines and remove it from the pattern space before continuing with the next line.
sed commands are separated by semicolons (;) and grouped with curly braces ({...}). They are optionally preceded by a line(s) specification, for instance $, which stands for the last line of the input. So, in our case, ${s/\\$//;p;} applies only to the last line while the rest (N;s/\\([^\\]*:)/\1/;P;D) applies to all lines.
The -n option suppresses the default output. We need this to control the output ourselves with the p (print) command.
The -E option enables the use of extended regular expressions.
Let's first explain the tricky part: N;s/\\([^\\]*:)/\1/;P;D. It is a list of 4 commands that are run for each line of the input because there is no line(s) specification before the commands.
When sed starts processing the input the pattern space already contains the first line (a.lo a.o: abc/util.c \ in your example). This is how sed works: by default it puts the current line in the pattern space, applies the commands and restarts with the next line.
N appends the next input line (/usr/lib/def.h) to the pattern space with a newline character as separator. The pattern space now contains:
a.lo a.o: abc/util.c \
/usr/lib/def.h
N also increments the current line number which becomes 2.
s/\\([^\\]*:)/\1/ deletes the last \ before the first : in the pattern space, if there is one. In our example the only \ is after the first :. The pattern space is not modified.
P prints the first part of the pattern space, up to the first newline character. In our example what is printed is:
a.lo a.o: abc/util.c \
D deletes the first part of the pattern space, up to the first newline character (what has just been printed). The pattern space contains:
/usr/lib/def.h
D also starts a new cycle but, unlike normal sed processing, it does not read the next line and leaves the pattern space and current line number unmodified. So when restarting, the pattern space contains line number 2 and the current line number is still 2.
By induction we see that, each time sed restarts executing the list of commands, the pattern space contains the current line, as normal. When processing line number 4 of your example it contains:
/usr/lib/toy.c \
After N it contains:
/usr/lib/toy.c \
c.lo c.o: high/scan.c \
And there, the substitution command (s/\\([^\\]*:)/\1/) matches and deletes the first \:
/usr/lib/toy.c
c.lo c.o: high/scan.c \
It is thus:
/usr/lib/toy.c
that is printed and removed from the pattern space. Exactly what you want.
The last line needs a special treatment. When we start processing it the pattern space contains:
high/scan_f.c
If we don't do anything special N does not change it (there is no next line to concatenate) and terminates the processing. The last line is never printed.
This is why another list of commands is needed, just for the last line: ${s/\\$//;p;}. It applies only to the last line because it is preceded by a line(s) specification ($ for last line). The first command in the list (substitute s/\\$//) removes a trailing \, if there is one. The second (p) prints the pattern space.
Note: if you know that the last line does not end with a trailing backslash you can simplify a bit:
sed -nE '$p;N;s/\\([^\\]*:)/\1/;P;D' input_file
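To watch the N/P/D loop do its job, you can feed it the sample from the question. This is just a sketch with the input recreated in a here-document; the variable names are mine, not part of the question.

```shell
# Sample dependency list from the question; line 4 carries the stray backslash.
deps=$(cat <<'EOF'
a.lo a.o: abc/util.c \
/usr/lib/def.h
b.lo b.o: hash/imp.h \
/usr/lib/toy.c \
c.lo c.o: high/scan.c \
high/scan_f.c
EOF
)

# Keep two lines in the pattern space, drop a backslash that is followed
# (across the newline) by a target list, print and delete the first line.
fixed=$(printf '%s\n' "$deps" | sed -nE '${s/\\$//;p;};N;s/\\([^\\]*:)/\1/;P;D')
printf '%s\n' "$fixed"
```

Only the backslash on line 4 goes away; the space that preceded it stays, which make does not mind.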
I agree with @G.M. in general, but this will work.
This sed command captures the text before a trailing "\" (if present) on lines ending with " \" and keeps only that text on those lines. All other lines are printed unchanged, of course.
sed -e 's/\(.* \)\\$/\1/' input_file
The question is a bit unclear about how to identify the lines from which a trailing backslash should be removed, but inasmuch as the input looks like a set of makefile-format prerequisite lists from which some lines have been removed, I take the objective to be to remove backslashes where they appear after the last (remaining) prerequisite in a list. That requires looking ahead to the next line, so it will be helpful to make use of sed's hold space to store data while you look ahead at the next line to figure out what to do with it.
This would be a pretty robust solution for that problem:
sed -nE 's/\s*(\\){0,1}$/ \\/; :a; /:/ { x; s/\s*\\$//; p; d; }; H; $ { s/.*/:/; b a }' input
That builds up each prerequisite list in the hold space, with backslashes and newlines embedded, then dumps it when the next target list or the end of the input arrives.
Details:
the -n option turns off automatically printing the pattern space after each line
the -E option turns on extended regular expressions
the sed expression contains several sub-expressions, joined by semicolons:
s/\s*(\\){0,1}$/ \\/ : ensure that the current line in the pattern space ends with a space and backslash, without adding a second backslash to lines that already have one
:a : labels that point in the script 'a'
/:/ { x; s/\s*\\$//; p; d; } : on lines that contain a colon, swap the pattern and hold spaces, remove the trailing backslash from (the new contents of) the pattern space, print the result, then start the next cycle
H : (if control reaches this point) append a newline and the contents of the pattern space to the hold space
$ { s/.*/:/; b a } : on the last line of input trigger dumping the hold space by putting a colon in the pattern space and jumping to label 'a'
[end of expression] : read the next line into the pattern space and start over
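Running that expression over the question's sample shows both the repaired list and the leading blank line (the first x prints the still empty hold space). A sketch, assuming GNU sed because of \s and the one-line label syntax; the variable names are made up.

```shell
# Sample dependency list from the question.
deps=$(cat <<'EOF'
a.lo a.o: abc/util.c \
/usr/lib/def.h
b.lo b.o: hash/imp.h \
/usr/lib/toy.c \
c.lo c.o: high/scan.c \
high/scan_f.c
EOF
)

# Build each prerequisite list in the hold space and dump it, minus its
# final backslash, when the next target line or the end of input arrives.
rebuilt=$(printf '%s\n' "$deps" |
  sed -nE 's/\s*(\\){0,1}$/ \\/; :a; /:/ { x; s/\s*\\$//; p; d; }; H; $ { s/.*/:/; b a }')
printf '%s\n' "$rebuilt"
```

The output has seven lines: one empty line, then the six input lines with the backslash after /usr/lib/toy.c removed.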
Alternatively, it would more exactly follow your request, and avoid introducing a leading blank line, to do this:
sed -n ':a; /\\$/! { p; d; }; h; :b; $ { x; s/\\//; p; }; n; /:/ { x; s/\\$//; p; x; b a; }; H; /\\$/ b b; s/.*//; x; p' input
That also assembles pieces in the hold space before ultimately printing them, but it goes about it in a different way:
it starts (at label a) by checking whether the line in the pattern space ends with a backslash. If not (/\\$/!), then it prints the pattern space and starts the next cycle.
otherwise, it replaces the current contents of the hold space with the contents of the pattern space (which must already end with a backslash), then
(at label b) if the current line is the last then it retrieves the contents of the hold space, strips the backslash, and prints the result ($ { x; s/\\//; p; }). Either way,
it attempts to read the next input line, and terminates if there are no more (n).
if that results in the pattern space containing a colon within, then the contents of the hold space are printed, less trailing backslash, and control is sent back to label a to process the colon-containing line as a new first line (/:/ { x; s/\\$//; p; x; b a; }).
otherwise, a newline and the contents of the pattern space are appended to the hold space (H).
if the pattern space ends with a backslash then control branches back to label b to consider reading another line (/\\$/ b b).
otherwise, the hold space is printed and cleared (s/.*//; x; p), and
if there are any more lines then the next is read and a new cycle started.
That makes fewer assumptions about the nature of the input, but it is a bit more complicated.

Sed: find, replace and then append result to original line

I am on Mac, I want to find a pattern in lines, replace it with something, then append the resulting string to the end of the original line. Here is what I tried:
echo "test='123'" | sed -E '/([^a-z])/ s/$/ \1/'
sed: 1: "/([^a-z])/ s/$/ \1/": \1 not defined in the RE
What do I need to define \1? I thought I did it with ([^a-z]). No?
Edit: Perhaps this code will represent better what I want:
1) echo "test='123'" | sed 's/[a-zA-Z0-9]//g'
2) I want the new line = original line + line #1 above
In other words:
Before (what I get): test='123'
After (what I want): test='123' =''
You can edit this command this way:
echo "test='123'" | sed -E 'h;s/([a-zA-Z0-9])//g;G;s/(.*)\n(.*)/\2\1/'
For readability, the script, line by line, reads
h
s/([a-zA-Z0-9])//g
G
s/(.*)\n(.*)/\2\1/
h stores the current line in the hold space,
your s command does what it does
G appends the content of the hold space, i.e. the original line, to the pattern space, i.e. the current line as you have edited it, putting a newline \n in between.
another s command reorders the two pieces, also removing the \n that the G command inserted.
Comments
Your original attempt sed -E '/([^a-z])/ s/$/ \1/' could not work because \1 refers to what is captured by the leftmost (…) group in the search portion of the s command, it does not "remember" the group(s) you used to address the line.
Once you print the pattern space with p, a newline comes with it, and once it's been printed, there's no way you can remove it within the same sed program.
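Here is a small check of the h/G round trip. It is a sketch of the same technique with one change that is mine: the final replacement writes a space between the two pieces, so the result matches the "After" line in the question. The variable names are only for illustration.

```shell
line="test='123'"

# h saves the original line, the first s strips letters and digits,
# G appends the saved original after a newline, and the last s puts
# the original first, separated from the stripped copy by a space.
combined=$(printf '%s\n' "$line" |
  sed -E 'h;s/[a-zA-Z0-9]//g;G;s/(.*)\n(.*)/\2 \1/')
echo "$combined"
```

Dropping the space from the replacement glues the two pieces together instead.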

Delete string after '#' using sed

I have a text file that looks like:
#filelists.txt
a
# aaa
b
#bbb
c #ccc
I want to delete the part of each line from '#' onward, and if a line starts with #, delete the whole line.
So I use the sed command in my shell:
sed -e "s/#*//g" -e "/^$/d" filelists.txt
I want the result to be:
a
b
c
but the actual result is:
filelists.txt
a
aaa
b
bbb
c ccc
What's wrong in my "sed" command?
I know that '*' means "any", so I thought '#*' means "the string after #".
Isn't that right?
You may use
sed 's/#.*//;/^$/d' file > outfile
The s/#.*// removes # and all the rest of the line and /^$/d drops empty lines.
See an online test:
s="#filelists.txt
a
# aaa
b
#bbb
c #ccc"
sed 's/#.*//;/^$/d' <<< "$s"
Output:
a
b
c
Another idea: match lines having #, then remove # and the rest of the line there and drop if the line is empty:
sed '/#/{s/#.*//;/^$/d}' file > outfile
This way, you keep the original empty lines.
* does not mean "any" (at least not in regular expression context). * means "zero or more of the preceding pattern element". Which means you are deleting "zero or more #". Since you only have one #, you delete it, and the rest of the line is intact.
You need s/#.*//: "delete # followed by zero or more of any character".
EDIT: was suggesting grep -v, but didn't notice the third example (# in the middle of the line).
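To see the two patterns side by side, you can run both over the sample file. A minimal sketch, with the file contents recreated by printf and variable names invented for the comparison:

```shell
sample=$(printf '#filelists.txt\na\n# aaa\nb\n#bbb\nc #ccc')

# "#*" matches zero or more "#" characters, so only the "#" signs vanish.
hashes_only=$(printf '%s\n' "$sample" | sed -e 's/#*//g' -e '/^$/d')

# "#.*" matches a "#" plus everything after it on the line.
comments_gone=$(printf '%s\n' "$sample" | sed -e 's/#.*//' -e '/^$/d')

printf '%s\n---\n%s\n' "$hashes_only" "$comments_gone"
```

Note that the last line comes out as "c " with the space that stood before the #; use s/ *#.*// if that space should go too.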

Join current and next line, then the next line and its successor using sed

Given the input:
1234
5678
9abc
defg
hijk
I'd like the output:
12345678
56789abc
9abcdefg
defghijk
There are lots of examples using sed(1) to join a pair of lines, then the next pair after that pair and so on. But I haven't found an example that joins lines 1 with 2, 2 with 3, 3 with 4, ...
sed(1) solution preferred. Other options are less interesting - e.g., awk(1), python(1) and perl(1) implementations are fairly easy. I'm specifically stumped on a successful sed(1) incantation.
sed '1h;1d;x;G;s/\n//'
I guess it can be done some other way, but this works for me:
$ cat in
1234
5678
9abc
defg
hijk
$ sed '1h;1d;x;G;s/\n//' in
12345678
56789abc
9abcdefg
defghijk
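How it works: we put the first line into the hold space and that's it for the first line. For every line after the first, swap it with the hold space (so the pattern space holds the previous line and the hold space the current one), append the hold space to the pattern space, and remove the newline.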
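If you want to see what x and G leave in the pattern space before the newline is removed, swap the final s for l. This is only a sketch on a three-line input; the variable name is made up.

```shell
# Same script, but dump the pattern space with l instead of joining.
# Each dump shows "previous line \n current line" followed by "$".
swap_trace=$(printf '1234\n5678\n9abc\n' | sed -n '1h;1d;x;G;l')
echo "$swap_trace"
```

Each dumped buffer is exactly what s/\n// then turns into one joined output line.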
This does it (now improved, thanks to potong's hint):
$ sed -n 'N;s/\n\(.*\)/\1&/;P;D' infile
12345678
56789abc
9abcdefg
defghijk
In detail:
N # Append next line to pattern space
s/\n\(.*\)/\1&/ # Make 111\n222 into 111222\n222
P # Print up to first newline
D # Delete up to first newline
The substitution makes these two lines
1111
2222
which in the pattern space look like 1111\n2222 into
11112222
2222
and the P and D print/delete the first line from the pattern space.
Notice that we never hit the bottom of the script (D starts a new loop) until the very last line, where N can't fetch a new line and would just print the last line on its own, if we didn't suppress that with -n.
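The same kind of peek works here: stop after the substitution and dump the buffer with l, then run the full loop. A sketch with made-up variable names:

```shell
# After N and the substitution, the second line has been copied in
# front of the newline: 1111\n2222 becomes 11112222\n2222.
peek=$(printf '1111\n2222\n' | sed -n 'N;s/\n\(.*\)/\1&/;l')
echo "$peek"

# The full loop on the question's input.
joined=$(printf '1234\n5678\n9abc\ndefg\nhijk\n' | sed -n 'N;s/\n\(.*\)/\1&/;P;D')
echo "$joined"
```

P then prints the part before the newline and D keeps the part after it for the next round.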
Tweaking another answer (full credit to @aragaer) to handle single-line input (and to be more portable to bsd sed as well as gnu sed than the original version; update: that answer has since been edited another way for portability):
% cat >> inputfile << eof
12
34
56
eof
% sed -e '1{$p;h;d' -e '}' -e 'x;G;s/\n//' inputfile # bsd + gnu sed [1]
1234
3456
or
% cat joinsuccessive.sed
1{
$p;h;d
}
x;G;s/\n//
% sed -f joinsuccessive.sed inputfile
1234
3456
Here's an annotated version.
1{ # special case for first line only:
$p # even MORE special case: print current line for input with
# only a single line
h # add line 1 to hold space (for joining with successive lines)
d # delete pattern space and move to next line (without printing)
}
x # for lines 2+, swap pattern space (current line) and hold space
G # add newline + hold space (now has current line) to pattern space
# (previous line) giving prev line, newline, curr line in pattern
# space (and curr line is in hold space)
s/\n// # remove newline added by G (between lines) before printing the
# pattern space
[1] bsd sed(1) wants a closing brace to be on a line by itself. Use -e to "build" the script or put the commands in a sed script file (and use -f joinsuccessive.sed).
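The single-line case is easy to check by comparing both scripts on a one-line input. A sketch; the variable names are only for illustration.

```shell
# Without the $p special case, a one-line input is stored and deleted,
# so nothing is printed at all.
plain=$(printf 'only\n' | sed '1h;1d;x;G;s/\n//')

# With it, the lone line is printed before being deleted.
guarded=$(printf 'only\n' | sed -e '1{$p;h;d' -e '}' -e 'x;G;s/\n//')

# Longer inputs behave the same as before.
pairs=$(printf '12\n34\n56\n' | sed -e '1{$p;h;d' -e '}' -e 'x;G;s/\n//')

printf 'plain=[%s] guarded=[%s]\n%s\n' "$plain" "$guarded" "$pairs"
```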