Delete code pattern using sed?

I want to use sed to delete a block of code (a paragraph) that begins with a pattern and ends with a semicolon (;).
I came across an example that works on paragraphs separated by blank lines:
sed -e '/./{H;$!d;}' -e 'x;/Pattern/!d'
I'm confused about how to make sed treat the semicolon as part of the pattern rather than as a command delimiter.
Thanks.

Another option is to use an address range. Ranges are standard sed; only the \s used further down is a GNU extension.
The next example means: delete everything from a line that contains pattern through the next line ending with a semicolon.
sed '/pattern/,/;$/ d' infile
EDIT in response to Harsh's comment:
Try this sed command:
sed '/^\s*LOG\s*(.*;\s*$/ d ; /^\s*LOG/,/;\s*$/ d' infile
Explanation:
/^\s*LOG\s*(.*;\s*$/ d # Delete the line if it begins with LOG( and ends with a semicolon.
/^\s*LOG/,/;\s*$/ d # Delete the range of lines from one that begins with LOG through
# the next one that ends with a semicolon.
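To see why the single-line rule comes first, here is a small sketch run against made-up input (the sample lines and the strip_log name are hypothetical, and \s assumes GNU sed). The end address of a range is only tested from the line after the one that opened it, so without the first rule a one-line LOG(...); would open a range and swallow everything through the next line that ends in a semicolon.

```shell
# strip_log is a hypothetical helper name; \s requires GNU sed.
strip_log() {
  sed '/^\s*LOG\s*(.*;\s*$/ d ; /^\s*LOG/,/;\s*$/ d'
}

# Made-up sample: one single-line LOG call and one spread over three lines.
printf '%s\n' \
  'int x = 1;' \
  'LOG("one line");' \
  'x++;' \
  'LOG("spans",' \
  '    x,' \
  '    y);' \
  'return x;' | strip_log
# Prints:
# int x = 1;
# x++;
# return x;
```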

This might work for you:
cat <<! >file
> a
> b
> ;
> x
> y
> ;
> !
sed '/^[^;]*$/{H;$!d};x;s/;//;/x/!d' file
x
y
Explanation:
For any line that does not contain a ; /^[^;]*$/
Append the line to the hold space (HS), then delete the pattern space (PS) and begin the next iteration unless it is the last line in the file. {H;$!d}
If a line contains a ; or is the last line of the file:
Swap to the HS x
Delete the first ; s/;//
Search for the pattern (x) and if it is not found delete the PS /x/!d
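N.B. This finds the pattern /x/ anywhere in the block. To match it only at the beginning of the block, anchor it; in the sample above the collected block starts with the newline that H inserted, so that means /^\nx/ rather than /^x/.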
EDIT:
After having seen your data and expected result, this may work for you:
sed '/^\s*LOG(.*);/d;/^\s*LOG(/,/);/d' file

Related

Sed: find, replace and then append result to original line

I am on Mac, I want to find a pattern in lines, replace it with something, then append the resulting string to the end of the original line. Here is what I tried:
echo "test='123'" | sed -E '/([^a-z])/ s/$/ \1/'
sed: 1: "/([^a-z])/ s/$/ \1/": \1 not defined in the RE
What do I need to define \1? I thought I did it with ([^a-z]). No?
Edit: Perhaps this code will represent better what I want:
1) echo "test='123'" | sed 's/[a-zA-Z0-9]//g'
2) I want the new line = original line + line #1 above
In other words:
Before (what I get): test='123'
After (what I want): test='123' =''
You can edit this command this way:
echo "test='123'" | sed -E 'h;s/([a-zA-Z0-9])//g;G;s/(.*)\n(.*)/\2 \1/'
For readability, the script, line by line, reads
h
s/([a-zA-Z0-9])//g
G
s/(.*)\n(.*)/\2 \1/
h stores the current line in the hold space.
Your s command strips every alphanumeric character from the pattern space, leaving =''.
G appends the content of the hold space, i.e. the original line, to the pattern space, i.e. the current line as you have edited it, putting a newline \n in between.
The final s command swaps the two pieces and replaces the \n that the G command inserted with a space, giving test='123' =''.
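The steps above can be tried as a small sketch. The append_stripped name is hypothetical, and the replacement uses a space between the two pieces so the result matches the desired test='123' =''. It has only been reasoned through against GNU sed; behaviour on other sed implementations is an assumption.

```shell
# append_stripped is a hypothetical helper name.
# h  : copy the original line to the hold space
# s  : strip alphanumerics from the pattern space
# G  : append newline + original line to the pattern space
# s  : put the original first, then a space, then the stripped copy
append_stripped() {
  sed -E 'h;s/[a-zA-Z0-9]//g;G;s/(.*)\n(.*)/\2 \1/'
}

echo "test='123'" | append_stripped
# Prints: test='123' =''
```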
Comments
Your original attempt sed -E '/([^a-z])/ s/$/ \1/' could not work because \1 refers to what is captured by the first (…) group in the search portion of the s command itself. It does not "remember" the group(s) in the address you used to select the line.
Once you print the pattern space with p, a newline comes with it, and once it's been printed, there's no way you can remove it within the same sed program.

How to use sed to remove all trailing Q's, but not remove the last Q if the line is all Q's

I am processing a file from a commercial vendor that uses the letter Q as placeholder at the end of each line. I need to remove all Q's at the end of each line, with the exception that it should never remove everything from a line. If a line is all Q's, it should leave a single Q.
I use this sed code to remove all trailing Q's...
line=$( echo $line | sed 's/Q*$//' )
...but it doesn't handle the case where the line is all Q's, where it should leave 1 Q. I can add the Q back, of course, with this code...
if [ -z "$line" ]; then
line="Q"
fi
...but I want to learn how to handle this case entirely in sed for future reference. Sample outputs:
TESTQQQQQQ --> TEST
QQQ --> Q
Using sed:
sed 's/\(.\)Q*$/\1/'
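This matches a single character (.) followed by zero or more Q's up to the end of the line, and replaces the whole match with just that character, which is captured by \( and \) and recalled with \1. It works because sed takes the leftmost possible match and Q* is greedy, so the match starts at the last non-Q character, or at the first Q when the line is all Q's.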
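A quick way to check the edge cases is to feed a few made-up lines through the command. This is only a sketch; trim_q is a hypothetical name and the inputs beyond the two samples in the question are assumptions.

```shell
# trim_q is a hypothetical helper name.
trim_q() {
  sed 's/\(.\)Q*$/\1/'
}

printf '%s\n' TESTQQQQQQ QQQ Q TEST | trim_q
# Prints:
# TEST
# Q
# Q
# TEST
```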
Using awk you could try:
awk '/^Q+$/{print "Q"; next} {gsub(/Q+$/,"")} 1' prueba.txt
I'm assuming that you have only one word per line; if that's not the case, let me know.

Join current and next line, then the next line and its successor using sed

Given the input:
1234
5678
9abc
defg
hijk
I'd like the output:
12345678
56789abc
9abcdefg
defghijk
There are lots of examples using sed(1) for joining a pair of lines, then the next pair after that pair and so on. But I haven't found an example that joins lines 1 with 2, 2 with 3, 3 with 4, ...
sed(1) solution preferred. Other options are less interesting - e.g., awk(1), python(1) and perl(1) implementations are fairly easy. I'm specifically stumped on a successful sed(1) incantation.
sed '1h;1d;x;G;s/\n//'
I guess it can be done some other way, but this works for me:
$ cat in
1234
5678
9abc
defg
hijk
$ sed '1h;1d;x;G;s/\n//' in
12345678
56789abc
9abcdefg
defghijk
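How it works: we put the first line into the hold space and that's it for the first line. For every line after the first, swap it with the hold space (so the pattern space holds the previous line and the hold space the current one), append the hold space to the pattern space, and remove the newline between them.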
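Wrapped in a function, the command can be checked against the sample and against a one-line input. This is a sketch with a hypothetical function name; the one-line case shows that 1d deletes the only line, so nothing is printed.

```shell
# join_successive is a hypothetical helper name.
join_successive() {
  sed '1h;1d;x;G;s/\n//'
}

printf '%s\n' 1234 5678 9abc defg hijk | join_successive
# Prints:
# 12345678
# 56789abc
# 9abcdefg
# defghijk

# A single input line is stored and deleted by 1h;1d, so nothing is printed.
printf '%s\n' only | join_successive
```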
This does it (now improved, thanks to potong's hint):
$ sed -n 'N;s/\n\(.*\)/\1&/;P;D' infile
12345678
56789abc
9abcdefg
defghijk
In detail:
N # Append next line to pattern space
s/\n\(.*\)/\1&/ # Make 111\n222 into 111222\n222
P # Print up to first newline
D # Delete up to first newline
The substitution makes these two lines
1111
2222
which in the pattern space look like 1111\n2222 into
11112222
2222
and the P and D print/delete the first line from the pattern space.
Notice that we never hit the bottom of the script (D starts a new loop) until the very last line, where N can't fetch a new line and would just print the last line on its own, if we didn't suppress that with -n.
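The same window can be watched on a tiny made-up input. This is a sketch with a hypothetical function name; the three-line input is an assumption chosen to keep the trace short.

```shell
# window_join is a hypothetical helper name.
# Each cycle: N pulls in the next line, s copies it in front of the newline,
# P prints the joined first line, D drops it and restarts with the remainder.
window_join() {
  sed -n 'N;s/\n\(.*\)/\1&/;P;D'
}

printf '%s\n' a b c | window_join
# Prints:
# ab
# bc
```

With -n in place the leftover last line (c here) is not printed when N runs out of input.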
Tweaking another answer (full credit to @aragaer) to handle single-line input and to be more portable across bsd sed and gnu sed than the original version (update: that answer has since been edited another way for portability):
% cat >> inputfile << eof
12
34
56
eof
% sed -e '1{$p;h;d' -e '}' -e 'x;G;s/\n//' inputfile # bsd + gnu sed [1]
1234
3456
or
% cat joinsuccessive.sed
1{
$p;h;d
}
x;G;s/\n//
% sed -f joinsuccessive.sed inputfile
1234
3456
Here's an annotated version.
1{ # special case for first line only:
$p # even MORE special case: print current line for input with
# only a single line
h # add line 1 to hold space (for joining with successive lines)
d # delete pattern space and move to next line (without printing)
}
x # for lines 2+, swap pattern space (current line) and hold space
G # add newline + hold space (now has current line) to pattern space
# (previous line) giving prev line, newline, curr line in pattern
# space (and curr line is in hold space)
s/\n// # remove newline added by G (between lines) before printing the
# pattern space
[1] bsd sed(1) wants a closing brace to be on a line by itself. Use -e to "build" the script or put the commands in a sed script file (and use -f joinsuccessive.sed).
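As a sketch, the -e form can be wrapped in a function (the name is hypothetical) and run on both a one-line and a three-line input to see the special case at work:

```shell
# join_successive_safe is a hypothetical helper name.
# The three -e fragments are joined with newlines, which puts the closing
# brace on a line by itself.
join_successive_safe() {
  sed -e '1{$p;h;d' -e '}' -e 'x;G;s/\n//'
}

# One line only: $p prints it before h;d discard the pattern space.
printf '%s\n' only | join_successive_safe
# Prints: only

printf '%s\n' 12 34 56 | join_successive_safe
# Prints:
# 1234
# 3456
```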

sed: replace pattern only if followed by empty line

I need to replace a pattern in a file, only if it is followed by an empty line. Suppose I have following file:
test
test
test
...
the following command would replace all occurrences of test with xxx
cat file | sed 's/test/xxx/g'
but I need to replace test only if the next line is empty. I have tried matching a hex code, but that does not work:
cat file | sed 's/test\x0a/xxx/g'
The desired output should look like this:
test
xxx
xxx
...
Suggested solutions for sed, perl and awk:
sed
sed -rn '1h;1!H;${g;s/test([^\n]*\n\n)/xxx\1/g;p;}' file
I got the idea from sed multiline search and replace. Basically slurp the entire file into sed's hold space and do global replacement on the whole chunk at once.
perl
$ perl -00 -pe 's/test(?=[^\n]*\n\n)$/xxx/m' file
-00 triggers paragraph mode, which makes perl read chunks separated by one or more empty lines (just what the OP is looking for). The positive lookahead (?=...) anchors the substitution to the last line of the chunk.
Caveat: -00 will squash multiple empty lines into single empty lines.
awk
$ awk 'NR==1 {l=$0; next}
/^$/ {gsub(/test/,"xxx", l)}
{print l; l=$0}
END {print l}' file
Basically store previous line in l, substitute pattern in l if current line is empty. Print l. Finally print the very last line.
Output in all three cases
test
xxx
xxx
...
This might work for you (GNU sed):
sed -r '$!N;s/test(\n\s*)$/xxx\1/;P;D' file
Keep a window of 2 lines throughout the length of the file and if the second line is empty and the first line contains the pattern then make a substitution.
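A minimal sketch of the two-line window on made-up input follows. The function name and the five sample lines are hypothetical; -r and \s assume GNU sed.

```shell
# replace_before_blank is a hypothetical helper name; GNU sed assumed.
# $!N : append the next line unless we are on the last one
# s   : replace test only when the second line of the window is blank
# P;D : print and drop the first line, keep the second for the next cycle
replace_before_blank() {
  sed -r '$!N;s/test(\n\s*)$/xxx\1/;P;D'
}

printf '%s\n' test test '' test end | replace_before_blank
# Prints:
# test
# xxx
#
# test
# end
```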
Using sed
sed -r ':a;$!{N;ba};s/test([^\n]*\n(\n|$))/xxx\1/g'
explanation
:a # set label a
$ !{ # if not end of file
N # Add a newline to the pattern space, then append the next line of input to the pattern space
b a # Unconditionally branch to label. The label may be omitted, in which case the next cycle is started.
}
# In short, :a;$!{N;ba} reads the whole file into the pattern space.
s/test([^\n]*\n(\n|$))/xxx\1/g # replace the keyword if the next line is empty (\n\n) or is an empty last line (\n at the end of the pattern space, $)

sed: joining lines depending on the second one

I have a file that, occasionally, has split lines. The split is signaled by the fact that the line starts with '+' (possibly preceded by spaces).
line 1
line 2
+ continue 2
line 3
...
I'd like to join the split line back:
line 1
line 2 continue 2
line 3
...
using sed. I'm not clear how to join a line with the preceding one.
Any suggestion?
This might work for you:
sed 'N;s/\n\s*+//;P;D' file
These are actually four commands:
N
Append the next line of input to the pattern space
s/\n\s*+//
Remove the newline, any following whitespace and the plus
P
Print the pattern space up to the first newline
D
Delete the pattern space up to the first newline, i.e. the part which was just printed
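The four commands can be tried on the sample from the question. This is a sketch with a hypothetical function name, and it assumes GNU sed for two reasons: \s is a GNU extension, and GNU sed prints the pattern space when N finds no further input, which is how the final line 3 gets out.

```shell
# join_plus is a hypothetical helper name; GNU sed assumed.
join_plus() {
  sed 'N;s/\n\s*+//;P;D'
}

printf '%s\n' 'line 1' 'line 2' '+ continue 2' 'line 3' | join_plus
# Prints:
# line 1
# line 2 continue 2
# line 3
```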
The relevant manual page parts are
Selecting lines by numbers
Addresses overview
Multiline techniques - using D,G,H,N,P to process multiple lines
Doing this in sed is certainly a good exercise, but it's pretty trivial in perl:
perl -0777 -pe 's/\n\s*\+//g' input
I'm not partial to sed so this was a nice challenge for me.
sed -n '1{h;n};/^ *+ */{s// /;H;n};{x;s/\n//g;p};${x;p}'
In awk this is approximately:
awk '
NR == 1 {hold = $0; next}
/^ *\+/ {$1 = ""; hold=hold $0; next}
{print hold; hold = $0}
END {if (hold) print hold}
'
If the last line is a "+" line, the sed version will print a trailing blank line. Couldn't figure out how to suppress it.
You can use Vim in Ex mode:
ex -sc g/+/-j -cx file
g global search
- select previous line
j join with next line
x save and close
A different use of the hold space: load the entire file into the hold space before merging lines. The hold-space commands themselves are POSIX, but \s is a GNU extension; use [[:space:]] with other seds.
sed -n '1x;1!H;${g;s/\n\s*+//g;p}'
1x on the first line, swap the line into the empty hold space
1!H on non-first lines, append to the hold space
$ on the last line:
g get the hold space (the entire file)
s/\n\s*+//g remove each newline (plus any whitespace) that precedes a +
p print everything
Input:
line 1
line 2
+ continue 2
+ continue 2 even more
line 3
+ continued
becomes
line 1
line 2 continue 2 continue 2 even more
line 3 continued
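The transformation above can be reproduced with a short sketch. The function name is hypothetical, and \s assumes GNU sed.

```shell
# join_all_plus is a hypothetical helper name; GNU sed assumed for \s.
# 1x  : move line 1 into the (empty) hold space
# 1!H : append every later line to the hold space
# $   : on the last line, fetch everything, join continuations, print
join_all_plus() {
  sed -n '1x;1!H;${g;s/\n\s*+//g;p}'
}

printf '%s\n' \
  'line 1' \
  'line 2' \
  '+ continue 2' \
  '+ continue 2 even more' \
  'line 3' \
  '+ continued' | join_all_plus
# Prints:
# line 1
# line 2 continue 2 continue 2 even more
# line 3 continued
```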
This (or potong's answer) might be more interesting than a sed -z implementation if you want other commands to manipulate the data as well: you can simply stick them in before 1!H, whereas sed -z immediately loads the entire file into the pattern space, so you aren't manipulating single lines at any point. The same goes for perl -0777.
In other words, if you want to also eliminate comment lines starting with *, add in /^\s*\*/d to delete the line
sed -n '1x;/^\s*\*/d;1!H;${g;s/\n\s*+//g;p}'
versus:
sed -z 's/\n\s*+//g;s/\n\s*\*[^\n]*\n/\n/g'
The former's accumulation in the hold space line by line keeps you in classic sed line processing land, while the latter's sed -z dumps you into what could be some painful substring regexes.
But that's sort of an edge case, and you could always just pipe sed -z back into sed. So +1 for that.
Footnote for internet searches: This is SPICE netlist syntax.
A solution for versions of sed that can read NUL-separated data, such as GNU sed with -z:
sed -z 's/\n\s*+//g'
Compared to potong's solution this has the advantage of being able to join multiple lines that start with +. For example:
line 1
line 2
+ continue 2
+ continue 2 even more
line 3
becomes
line 1
line 2 continue 2 continue 2 even more
line 3
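As a sketch of how the -z form combines with ordinary line-by-line processing, the joined output can be piped into a second sed. The function name and the comment line in the sample are hypothetical; -z and \s assume GNU sed.

```shell
# join_then_filter is a hypothetical helper name; GNU sed assumed.
# First sed: treat the whole input as one record and join + continuations.
# Second sed: back to per-line processing, here dropping lines starting with *.
join_then_filter() {
  sed -z 's/\n\s*+//g' | sed '/^\s*\*/d'
}

printf '%s\n' \
  'line 1' \
  '* comment' \
  'line 2' \
  '+ continue 2' \
  '+ more' \
  'line 3' | join_then_filter
# Prints:
# line 1
# line 2 continue 2 more
# line 3
```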