Delete a paragraph from a file using sed

I have a markdown file that looks something like this:
markdown.md
# Title1
line 1
line 2
line 3
# Title2
line 1
line 2
line 3
I'd like to be able to delete one of the paragraphs by searching for the title. I would need to delete the title, the following line, and then every subsequent line that is not blank.
The desired output would be:
# Title2
line 1
line 2
line 3
I was doing some reading about using {} to group multiple commands together but I can't seem to quite get the syntax right.
cat markdown.md | sed '/^# Title1.*/,+1d {/^\s*$/d}'
My thinking was that this would delete the line beginning with '# Title1', then the following line with ,+1d, then subsequent lines until a blank line, but I see the following error:
sed: 1: "/^# Title1.*/,+1d { ...": extra characters at the end of d command
I've tried a few variations but no luck. Any help would be appreciated!

This is the kind of sed puzzle that makes me wish for a slightly different tool.
sed -n -e '/Title1/!{p;d;};n;' -e ':a' -e 'n;/./ba'
Loosely translated: "Don't print anything. If it doesn't contain 'Title1', then all right, print it, then start over with the next line. But if it does contain 'Title1', then grab the next line (which will be blank), enter a loop, and keep grabbing new lines until you come to the next empty line."
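For comparison, the same logic can be written as a small awk state machine, which may be easier to follow than the sed loop. This is only a sketch: the function name delete_paragraph and the sample input are made up for illustration, and it assumes the layout the question describes (a title, one line to skip after it, then text up to the next blank line).

```shell
# Hypothetical helper: reads a file on stdin and drops the section whose
# title line starts with the string passed as the first argument.
delete_paragraph() {
  awk -v title="$1" '
    index($0, title) == 1 { state = 1; next }      # title line: drop it
    state == 1 { state = 2; next }                  # line right after the title: drop it
    state == 2 { if ($0 == "") state = 0; next }    # drop up to and including the next blank line
    { print }                                       # everything else passes through
  '
}

# Sample input made up for illustration.
printf '%s\n' '# Title1' '' 'line 1' 'line 2' '' '# Title2' '' 'line 1' |
  delete_paragraph '# Title1'
```

Like the sed version, this drops the blank line that ends the paragraph as well, so the output starts directly at # Title2.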

Using GNU sed
$ sed -z 's/# Title1[^#]*//' input_file
# Title2
line 1
line 2
line 3

This might work for you (GNU sed):
sed '/^# /h;G;/\n# Title1/!P;d' file
If a line begins # , make a copy.
Append the copy to each line and if that line does not contain \n# Title1, print it.
Delete all lines.
Alternative:
sed '/^# Title1/{:a;N;/\n#/!s/\n//;ta;D}' file
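The idea behind the first command (remember the most recent heading, and print a line only when that heading is not the one being removed) can also be sketched in awk. The function name drop_title1 is made up for illustration; the heading text comes from the question.

```shell
# Hypothetical helper: prints stdin without the section headed "# Title1".
drop_title1() {
  awk '
    /^# / { heading = $0 }             # like h: keep a copy of the latest heading
    index(heading, "# Title1") != 1    # like /\n# Title1/!P: print unless under that heading
  '
}

printf '%s\n' '# Title1' 'line 1' 'line 2' '# Title2' 'line 1' 'line 2' | drop_title1
```

As with the sed command, this is a prefix match, so a heading such as # Title10 would be removed too.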

Related

Can I avoid duplicate strings with the sed "a\" command?

Can I avoid duplicate strings with the sed "a" command?
I added the word "apple" under "true" in my file.txt.
The problem is that every time I run the command "apple" is appended.
$ sed -i '/true/a\apple' file.txt ...executed 3 times
$ cat file.txt
true
apple
apple
apple
If the word "apple" already exists, I don't want repeated runs of the sed command to add it again.
I have no idea, please help me
...
I want to get this result, no matter how many times the sed command is executed:
$ cat file.txt
true
apple

It seems you don't want to append the line apple if the line following the true already contains apple. Then this sed command should do the trick.
sed -i.backup '
/true/!b
$!{N;/\napple$/!s/\n/&apple&/;p;d;}
a\
apple
' file.txt
Explanation of sed commands:
If the line doesn't contain true then jump to the end of the script, which will print out the line read (/true/!b).
Otherwise the line contains true:
If it isn't the last line ($!) then
• Read the next line (N).
• If the next line doesn't consist of apple (/\napple$/!) then insert apple between the two lines (s/\n/&apple&/).
• Print out the pattern space (p) and start a new cycle (d).
Otherwise it is the last line (and contains true)
Append apple (a\ apple)
Edit:
The above sed script won't work properly if two consecutive true lines occur in the file, as pointed out by @potong. The version below should fix this, if I haven't overlooked something.
sed -i.backup ':a
/true/!b
a\
apple
n
/^apple$/d
ba
' file.txt
Explanation:
/true/!b: If the line doesn't contain true, no further processing is required. Jump to the end of the script. This will print the current pattern space.
a\ apple: Otherwise, the line contains true. Append apple.
n: Print the current pattern space and appended line (apple) and replace the pattern space with the next line. This will end the script if no next line available.
/^apple$/d: If the line read consists of string apple then delete it and start a new cycle (because it is already appended before)
ba: Jump to the start of the script (label a) without reading an input line.
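If awk is an option, the same "append only when missing" rule can be sketched by remembering whether the previous line contained true. The function name ensure_apple is hypothetical, and unlike sed -i this writes to standard output rather than editing the file in place.

```shell
# Hypothetical helper: prints stdin with "apple" after every line containing
# "true", unless the following line is already exactly "apple".
ensure_apple() {
  awk '
    prev_true && $0 != "apple" { print "apple" }   # previous line had true and no apple follows
    { print; prev_true = /true/ }                  # remember whether this line contains true
    END { if (prev_true) print "apple" }           # true on the last line still needs apple
  '
}

first=$(printf '%s\n' 'true' 'other' | ensure_apple)
second=$(printf '%s\n' "$first" | ensure_apple)    # a second run should change nothing
printf '%s\n' "$second"
```

Running the function over its own output leaves it unchanged, which is the behaviour the question asks for. To update file.txt itself you would redirect to a temporary file and move it back.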

There is no general solution for sed unless the file is sorted. If sorted, the following deletes the duplicate lines:
sed '$!N; /^\(.*\)\n\1$/!P; D'
This was taken from this link: https://www.unix.com/shell-programming-and-scripting/146404-command-remove-duplicate-lines-perl-sed-awk.html

Great answer by M. Nejat Aydin but to make things simpler just add grep:
grep -q apple file.txt || sed -i '/true/a\apple' file.txt

This might work for you (GNU sed):
sed -e ':a;/true/!b;$a apple' -e 'n;/apple/b;i apple' -e 'ba' file
If a line does not contain true just print it.
Otherwise, if it is the last line, append the line apple.
Otherwise, print that line and fetch the next.
If that line contains apple just print it.
Otherwise, insert a line apple and jump to the first sed instruction since the fetched line might be one containing true.
N.B. This uses both the a command (for end of file condition) and the i command for when there is a following line.

Remove line and the one before that matches pattern using sed

I have a file containing the following text:
>seq 1
GAA--ACGAA
>seq 2
CATCTCGGGA
>seq 3
GACG-CG-AG
>seq 4
ATTCCGTGCC
How can I delete the lines containing "-" and the ones before it using sed?
Expected output:
>seq 2
CATCTCGGGA
>seq 4
ATTCCGTGCC
I have tried sed -e '/-/-1d' file, but I get sed: -e expression #1, char 4: unknown command: `-'
Thank you in advance.

Using sed
$ sed 'N;/-/d' input_file
>seq 2
CATCTCGGGA
>seq 4
ATTCCGTGCC

This might work for you (GNU sed):
sed -n 'N;/\n.*-/!{P;D};:a;s/.*//;N;/\n.*-/ba;D' file
Open a two-line window and, if the second line does not contain -, print/delete the first line and repeat.
Otherwise, empty the pattern space and append another line and if the second line does contain - repeat this part again.
Otherwise, delete the first line and repeat.
N.B. This deletes the previous line and any run of consecutive lines containing -. Also, the D deletes up to and including the first newline in the pattern space, and the sed cycle omits the automatic reading of the next line into the pattern space if the pattern space is not empty.
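The "hold back one line until you have seen the next" idea can also be sketched in awk. The function name drop_gapped_records is made up; the sample data is the input from the question, and the sketch assumes header lines never contain - themselves.

```shell
# Hypothetical helper: drops every line containing "-" together with the
# line immediately before it.
drop_gapped_records() {
  awk '
    /-/ { have_prev = 0; next }          # gap found: forget the buffered line, skip this one
    have_prev { print prev }             # previous line survived, so emit it
    { prev = $0; have_prev = 1 }         # buffer the current line until the next one is seen
    END { if (have_prev) print prev }    # flush the last buffered line
  '
}

printf '%s\n' '>seq 1' 'GAA--ACGAA' '>seq 2' 'CATCTCGGGA' \
              '>seq 3' 'GACG-CG-AG' '>seq 4' 'ATTCCGTGCC' | drop_gapped_records
```

On the question's input this prints the seq 2 and seq 4 records only.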

sed delete block of lines after pattern1 to pattern2, but not the line matching pattern1 itself?

I am struggling to use sed to work through 'testfile.txt' and every time it encounters a line that starts delete_me: abc it will then:
leave the line delete_me: abc intact
but delete all the lines that follow until the next blank line is reached in the file.
eg. I want this input:
delete_me: abc
sSAsaAaSA
AsaSAsaSAsa
asASAsS
^--- <blank line>
...to be changed to just this one line:
delete_me: abc
I have tried:
sed '/delete_me/ {n;d}' jil_testfile.txt
# deletes only the first line after 'delete_me'
sed '/delete_me/,/^$/d' jil_testfile.txt
# nearly works but deletes the 'delete_me' line too which I want to stay preserved.
Any suggestions please?

This might work for you (GNU sed):
sed -n ':a;/delete_me/{p;:b;n;//ba;bb};p' file
Print lines as normal until the first occurrence of delete_me. Print this line and do not print any further lines unless that line contains delete_me.
As the spec has changed since I wrote the first solution, here is a new one:
sed -n '/delete_me/{p;:a;n;/^$/b;ba};p' file
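A rough awk equivalent of the second command, using a skip flag instead of a loop, might look like this. The function name trim_after_marker and the before/after lines in the sample are made up for illustration.

```shell
# Hypothetical helper: keeps each "delete_me:" line but drops the lines that
# follow it, up to and including the next blank line.
trim_after_marker() {
  awk '
    /^delete_me:/ { print; skip = 1; next }   # keep the marker line, start skipping
    skip && /^$/  { skip = 0; next }          # a blank line ends the block (and is dropped too)
    !skip                                     # print only while not skipping
  '
}

printf '%s\n' 'before' 'delete_me: abc' 'sSAsaAaSA' 'AsaSAsaSAsa' 'asASAsS' '' 'after' |
  trim_after_marker
```

Because the marker rule comes first, a second delete_me line inside a block being skipped is still printed rather than swallowed.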

sed: replace pattern only if followed by empty line

I need to replace a pattern in a file, only if it is followed by an empty line. Suppose I have following file:
test
test
test
...
the following command would replace all occurrences of test with xxx
cat file | sed 's/test/xxx/g'
but I need to replace test only if the next line is empty. I have tried matching a hex code, but that does not work:
cat file | sed 's/test\x0a/xxx/g'
The desired output should look like this:
test
xxx
xxx
...

Suggested solutions for sed, perl and awk:
sed
sed -rn '1h;1!H;${g;s/test([^\n]*\n\n)/xxx\1/g;p;}' file
I got the idea from sed multiline search and replace. Basically slurp the entire file into sed's hold space and do global replacement on the whole chunk at once.
perl
$ perl -00 -pe 's/test(?=[^\n]*\n\n)$/xxx/m' file
-00 triggers paragraph mode which makes perl read chunks separated by one or several empty lines (just what OP is looking for). Positive look ahead (?=) to anchor substitution to the last line of the chunk.
Caveat: -00 will squash multiple empty lines into single empty lines.
awk
$ awk 'NR==1 {l=$0; next}
/^$/ {gsub(/test/,"xxx", l)}
{print l; l=$0}
END {print l}' file
Basically store previous line in l, substitute pattern in l if current line is empty. Print l. Finally print the very last line.
Output in all three cases
test
xxx
xxx
...

This might work for you (GNU sed):
sed -r '$!N;s/test(\n\s*)$/xxx\1/;P;D' file
Keep a window of 2 lines throughout the length of the file and if the second line is empty and the first line contains the pattern then make a substitution.

Using sed
sed -r ':a;$!{N;ba};s/test([^\n]*\n(\n|$))/xxx\1/g'
explanation
:a # set label a
$ !{ # if not end of file
N # Add a newline to the pattern space, then append the next line of input to the pattern space
b a # Unconditionally branch to label. The label may be omitted, in which case the next cycle is started.
}
# simply, above command :a;$!{N;ba} is used to read the whole file into pattern.
s/test([^\n]*\n(\n|$))/xxx\1/g # replace the keyword if the next line is empty: either another line follows (\n\n) or the input ends there ($)

sed: joining lines depending on the second one

I have a file that, occasionally, has split lines. The split is signaled by the fact that the line starts with '+' (possibly preceded by spaces).
line 1
line 2
+ continue 2
line 3
...
I'd like to join the split line back:
line 1
line 2 continue 2
line 3
...
using sed. I'm not clear how to join a line with the preceding one.
Any suggestion?

This might work for you:
sed 'N;s/\n\s*+//;P;D' file
These are actually four commands:
N
Append line from the input file to the pattern space
s/\n\s*+//
Remove newline, following whitespace and the plus
P
print line from the pattern space until the first newline
D
delete line from the pattern space until the first newline, e.g. the part which was just printed
The relevant manual page parts are
Selecting lines by numbers
Addresses overview
Multiline techniques - using D,G,H,N,P to process multiple lines
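One limitation of the two-line window is that it joins only one continuation line per pair, so a second consecutive + line is left alone. A possible variant, sketched here with a label and the t command so that it keeps appending while a join succeeded, handles runs of + lines. The function name join_continuations is made up, and [[:space:]] is used in place of \s to stay close to POSIX sed.

```shell
# Hypothetical helper: joins every line starting with optional whitespace and
# "+" onto the line before it.
#   :a      label for the loop
#   $!N     append the next input line unless this is the last one
#   s///    remove the newline, leading whitespace and the plus
#   ta      if that substitution happened, loop to try the following line too
#   P;D     print and delete the first line of the window, then restart
join_continuations() {
  sed -e ':a' \
      -e '$!N' \
      -e 's/\n[[:space:]]*+//' \
      -e 'ta' \
      -e 'P' \
      -e 'D'
}

printf '%s\n' 'line 1' 'line 2' '+ continue 2' '+ continue 2 even more' 'line 3' |
  join_continuations
```

With this input the middle three lines come out as the single line "line 2 continue 2 continue 2 even more".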

Doing this in sed is certainly a good exercise, but it's pretty trivial in perl:
perl -0777 -pe 's/\n\s*\+//g' input

I'm not partial to sed so this was a nice challenge for me.
sed -n '1{h;n};/^ *+ */{s// /;H;n};{x;s/\n//g;p};${x;p}'
In awk this is approximately:
awk '
NR == 1 {hold = $0; next}
/^ *\+/ {$1 = ""; hold=hold $0; next}
{print hold; hold = $0}
END {if (hold) print hold}
'
If the last line is a "+" line, the sed version will print a trailing blank line. Couldn't figure out how to suppress it.

You can use Vim in Ex mode:
ex -sc g/+/-j -cx file
g global search
- select previous line
j join with next line
x save and close

Different use of hold space with POSIX sed... to load the entire file into the hold space before merging lines. (Strictly speaking, the \s in the regex is a GNU extension; [[:space:]] is the POSIX spelling.)
sed -n '1x;1!H;${g;s/\n\s*+//g;p}'
1x on the first line, swap the line into the empty hold space
1!H on non-first lines, append to the hold space
$ on the last line:
g get the hold space (the entire file)
s/\n\s*+//g remove newlines (plus any whitespace) preceding +
p print everything
Input:
line 1
line 2
+ continue 2
+ continue 2 even more
line 3
+ continued
becomes
line 1
line 2 continue 2 continue 2 even more
line 3 continued
This (or potong's answer) might be more interesting than a sed -z implementation if other commands are needed for other manipulations of the data: you can simply stick them in before 1!H. sed -z, by contrast, immediately loads the entire file into the pattern space, so you aren't manipulating single lines at any point. The same goes for perl -0777.
In other words, if you want to also eliminate comment lines starting with *, add in /^\s*\*/d to delete the line
sed -n '1x;/^\s*\*/d;1!H;${g;s/\n\s*+//g;p}'
versus:
sed -z 's/\n\s*+//g;s/\n\s*\*[^\n]*\n/\n/g'
The former's accumulation in the hold space line by line keeps you in classic sed line processing land, while the latter's sed -z dumps you into what could be some painful substring regexes.
But that's sort of an edge case, and you could always just pipe sed -z back into sed. So +1 for that.
Footnote for internet searches: This is SPICE netlist syntax.

A solution for versions of sed that can read NUL separated data, like here GNU Sed's -z:
sed -z 's/\n\s*+//g'
Compared to potong's solution this has the advantage of being able to join multiple lines that start with +. For example:
line 1
line 2
+ continue 2
+ continue 2 even more
line 3
becomes
line 1
line 2 continue 2 continue 2 even more
line 3
