Remove a line that matches a pattern, and the line before it, using sed

I have a file containing the following text:
>seq 1
GAA--ACGAA
>seq 2
CATCTCGGGA
>seq 3
GACG-CG-AG
>seq 4
ATTCCGTGCC
How can I delete the lines containing "-", and the line before each of them, using sed?
Expected output:
>seq 2
CATCTCGGGA
>seq 4
ATTCCGTGCC
I have tried sed -e '/-/-1d' file, but I get sed: -e expression #1, char 4: unknown command: `-'
Thank you in advance.

Using sed
$ sed 'N;/-/d' input_file
>seq 2
CATCTCGGGA
>seq 4
ATTCCGTGCC
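The one-liner works because every record in the question is exactly two lines: N appends the sequence line to its header, so /-/d removes the whole pair at once and every other pair is auto-printed. A minimal sketch wrapping it in a shell function; the name drop_gapped is made up, and the two-lines-per-record layout is an assumption:

```shell
# Hypothetical helper: drop every two-line record (header + sequence)
# in which either line contains "-".
# Assumption: strict two-line records, so N always pairs a header with
# its own sequence line.
drop_gapped() {
    # N appends the next line, so the pattern space holds one whole record;
    # /-/d deletes that record, otherwise it is printed automatically.
    sed 'N;/-/d' "$@"
}

# Sample data from the question
printf '%s\n' '>seq 1' 'GAA--ACGAA' '>seq 2' 'CATCTCGGGA' \
              '>seq 3' 'GACG-CG-AG' '>seq 4' 'ATTCCGTGCC' | drop_gapped
```

If a header could itself contain - (or a record could span several sequence lines), this pairing breaks down, so it is worth checking the data first.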

This might work for you (GNU sed):
sed -n 'N;/\n.*-/!{P;D};:a;s/.*//;N;/\n.*-/ba;D' file
Open a two-line window, and if the second line does not contain -, print and delete the first line and repeat.
Otherwise, empty the pattern space and append another line, and if that new line also contains -, repeat this part again.
Otherwise, delete the (now empty) first line and repeat.
N.B. This deletes the previous line and any run of consecutive lines containing -. Also, D deletes up to and including the first newline in the pattern space, and when the pattern space is not empty, the next sed cycle skips the automatic reading of the next input line into the pattern space.

Related

Delete a paragraph from a file using sed

I have a markdown file that looks something like this:
markdown.md
# Title1
line 1
line 2
line 3
# Title2
line 1
line 2
line 3
I'd like to be able to delete one of the paragraphs by searching for the title. I would need to delete the title, the following line, and then every subsequent line that is not blank.
The desired output would be:
# Title2
line 1
line 2
line 3
I was doing some reading about using {} to group multiple commands together but I can't seem to quite get the syntax right.
cat markdown.md | sed '/^# Title1.*/,+1d {/^\s*$/d}'
My thinking was that this would delete the line beginning with '# Title1', then the following line with ,+1d, then subsequent lines until a blank line, but I see the following error:
sed: 1: "/^# Title1.*/,+1d { ...": extra characters at the end of d command
I've tried a few variations but no luck. Any help would be appreciated!
This is the kind of sed puzzle that makes me wish for a slightly different tool.
sed -n -e '/Title1/!{p;d;};n;' -e ':a' -e 'n;/./ba'
Loosely translated: "Don't print anything. If it doesn't contain 'Title1', then all right, print it, then start over with the next line. But if it does contain 'Title1', then grab the next line (which will be blank), enter a loop, and keep grabbing new lines until you come to the next empty line."
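A small sketch of that script wrapped in a function (drop_title1 is a hypothetical name). It assumes the layout the explanation describes, with a blank line after each heading and between paragraphs; without those blank lines the loop keeps reading until the end of the file.

```shell
# Hypothetical helper wrapping the answer's script.
# Assumption: a blank line follows each "# Title" heading and separates
# paragraphs.
drop_title1() {
    # -n: nothing is printed automatically.
    # Lines without "Title1" are printed and the cycle restarts (p;d).
    # On the "Title1" line, n reads the blank line after it, then the
    # :a loop keeps reading lines (n) while they are non-empty (/./ba).
    # The blank line that ends the paragraph is consumed as well.
    sed -n -e '/Title1/!{p;d;};n;' -e ':a' -e 'n;/./ba' "$@"
}

printf '%s\n' '# Title1' '' 'line 1' 'line 2' '' '# Title2' '' 'line 1' | drop_title1
```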
Using GNU sed
$ sed -z 's/# Title1[^#]*//' input_file
# Title2
line 1
line 2
line 3
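With -z, GNU sed splits its input on NUL bytes instead of newlines, so an ordinary text file ends up in the pattern space as a single record and the regex can span lines. A sketch (drop_title1_z is a hypothetical name):

```shell
# Hypothetical helper; needs GNU sed for -z.
drop_title1_z() {
    # The whole file is one pattern space, so "# Title1" plus everything
    # up to (not including) the next "#" is removed in one substitution.
    # Caveat: a "#" inside the Title1 paragraph would end the match early.
    sed -z 's/# Title1[^#]*//' "$@"
}

printf '%s\n' '# Title1' 'line 1' 'line 2' '# Title2' 'line 1' | drop_title1_z
```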
This might work for you (GNU sed):
sed '/^# /h;G;/\n# Title1/!P;d' file
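If a line begins with # (hash and space), copy it to the hold space, so the hold space always holds the current section's heading.
Append the hold space to each line, and if the result does not contain \n# Title1, print the line itself (P prints only up to the newline).
Delete the pattern space, so nothing is printed automatically.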
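To see the hold-space version at work, here it is wrapped in a function (drop_section is a hypothetical name). Lines before the first heading are kept, since G then appends an empty hold space.

```shell
# Hypothetical helper wrapping the hold-space answer (GNU sed).
drop_section() {
    # /^# /h         : each heading overwrites the hold space
    # G              : append "\n<current heading>" to the line
    # /\n# Title1/!P : print the line (up to the newline) unless its
    #                  section heading is "# Title1"
    # d              : discard the pattern space; no auto-print
    sed '/^# /h;G;/\n# Title1/!P;d' "$@"
}

printf '%s\n' 'intro' '# Title1' 'line 1' 'line 2' '# Title2' 'line 3' | drop_section
```

Because /\n# Title1/ is not anchored at the end, a heading such as # Title10 would be dropped as well; /\n# Title1$/ would avoid that.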
Alternative:
sed '/^# Title1/{:a;N;/\n#/!s/\n//;ta;D}' file
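One caveat with the alternative: if # Title1 is the last section, N eventually runs out of input, and GNU sed then prints the joined-up pattern space before exiting. A sketch that adds a $d guard (an addition of mine, not part of the answer) so the section is dropped in that case too; drop_title1_alt is a hypothetical name:

```shell
# Hypothetical helper: the "Alternative" above plus a "$d" guard (GNU sed).
drop_title1_alt() {
    # :a           loop start
    # $d           if the last input line is already in the pattern
    #              space, drop everything collected so far
    # N            append the next line
    # /\n#/!s/\n// if the appended line is not a heading, join it
    # ta           loop again after a successful join
    # D            otherwise delete up to the newline, keeping the next
    #              heading, and restart the cycle with it
    sed '/^# Title1/{
:a
$d
N
/\n#/!s/\n//
ta
D
}' "$@"
}

printf '%s\n' '# Title2' 'line 2' '# Title1' 'line 1' | drop_title1_alt
```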

sed delete block of lines after pattern1 to pattern2, but not the line matching pattern1 itself?

I am struggling to use sed to work through 'testfile.txt' so that every time it encounters a line that starts with delete_me: abc, it will:
leave the line delete_me: abc intact
but delete all the lines that follow until the next blank line is reached in the file.
E.g. I want this input:
delete_me: abc
sSAsaAaSA
AsaSAsaSAsa
asASAsS
^--- <blank line>
...to be changed to just this one line:
delete_me: abc
I have tried:
sed '/delete_me/ {n;d}' jil_testfile.txt
# deletes only the first line after 'delete_me'
sed '/delete_me/,/^$/d' jil_testfile.txt
# nearly works but deletes the 'delete_me' line too which I want to stay preserved.
Any suggestions please?
This might work for you (GNU sed):
sed -n ':a;/delete_me/{p;:b;n;//ba;bb};p' file
Print lines as normal until the first occurrence of delete_me. Print that line, then print no further lines except those that contain delete_me.
As the spec has changed since I wrote the first solution, here is a new one:
sed -n '/delete_me/{p;:a;n;/^$/b;ba};p' file
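The same script written one command per line, which makes the loop easier to follow (keep_marker_drop_block is a hypothetical name). Note that the terminating blank line is dropped too, because b jumps past the final p.

```shell
# Hypothetical helper: the second answer's script, one command per line.
keep_marker_drop_block() {
    # On a delete_me line: print it, then read the following lines with n
    # (not printed, because of -n) until a blank line; b then jumps to the
    # end of the script, skipping the final p.
    # Every other line reaches the final p and is printed.
    sed -n '
/delete_me/{
  p
  :a
  n
  /^$/b
  ba
}
p' "$@"
}

printf '%s\n' 'keep 1' 'delete_me: abc' 'sSAsaAaSA' 'AsaSAsaSAsa' 'asASAsS' '' 'keep 2' | keep_marker_drop_block
```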

Remove whitespace up to each comma, but skip the first comma in each line of a file

I am learning the sed and awk commands and trying some complicated logic, but couldn't find a solution for the following.
File contents:
This is apple,apple.com 443,apple2.com 80,apple3.com 232,
We talk on 1 banana,banana.com 80,banannna.com 23,
take 5 grape,grape5.com 23,
When I try with
$ cat sample.txt | sed -e 's/[[:space:]][^,]*,/,/g'
,apple.com,apple2.com,apple3.com,
,banana.com,banannna.com,
,grape5.com,
This is close, but I want the substitution to skip the first comma in each line, so the expected output is:
This is apple,apple.com,apple2.com,apple3.com,
We talk on 1 banana,banana.com,banannna.com,
take 5 grape,grape5.com,
Any help is appreciated.
If you are using GNU sed, you can do something like
sed -e 's/[[:space:]][^,]*,/,/2g' file
where 2g means: start substituting at the 2nd match and, because of g, keep going for every match after it.
The output of the above command:
sed -e 's/[[:space:]][^,]*,/,/2g' file
This is apple,apple.com,apple2.com,apple3.com,
We talk on 1 banana,banana.com,banannna.com,
take 5 grape,grape5.com,
An excerpt from the man page of GNU sed:
g
Apply the replacement to all matches to the regexp, not just the first.
number
Only replace the numberth match of the regexp.
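In GNU sed, combining a number with g means: ignore the matches before the numberth one, then replace that one and every match after it. A sketch checking this against the question's data (strip_ports is a hypothetical name):

```shell
# Hypothetical helper; mixing a number with g is GNU sed behaviour.
strip_ports() {
    # [[:space:]][^,]*,  a whitespace character and everything up to the
    #                    next comma
    # 2g                 leave the 1st match alone, replace the 2nd and
    #                    all later matches with a bare comma
    sed 's/[[:space:]][^,]*,/,/2g' "$@"
}

printf '%s\n' 'This is apple,apple.com 443,apple2.com 80,apple3.com 232,' \
              'We talk on 1 banana,banana.com 80,banannna.com 23,' \
              'take 5 grape,grape5.com 23,' | strip_ports
```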
awk '{gsub(/[ ]+/," "); gsub(/com [0-9]+/,"com")}1' file
This is apple,apple.com,apple2.com,apple3.com,
We talk on 1 banana,banana.com,banannna.com,
take 5 grape,grape5.com,
The first gsub squeezes each run of spaces into a single space, and the second removes the space and number between com and the following comma.
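The first gsub matters when there is more than one space between the host and the number: without it, /com [0-9]+/ would not match com   23. A sketch showing that case (strip_ports_awk is a hypothetical name; like the answer, it assumes every host name ends in com):

```shell
# Hypothetical helper using the awk answer.
strip_ports_awk() {
    # 1st gsub: squeeze each run of spaces to one space ("com   23" -> "com 23")
    # 2nd gsub: drop " <digits>" that directly follows "com"
    # The trailing 1 is an always-true pattern, so every record is printed.
    awk '{gsub(/[ ]+/, " "); gsub(/com [0-9]+/, "com")}1' "$@"
}

printf '%s\n' 'take 5 grape,grape5.com   23,' | strip_ports_awk
```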

SED Command to remove first digits and spaces of each line

I have a simple text file in the format below.
1 12658003Y
2 34345345N
3 34653785Y
4 36452342N
5 86747488Y
6 34634543Y
so on
10 37456338Y
11 33535555Y
12 37456378Y
so on
100 23432434Y
As you can see, there are two spaces after the first number.
I'm trying to write a sed command to remove the digits before the whitespace. Is there a sed command that removes the spaces and the number before them?
Output file should look like below.
12658003Y
34345345N
34653785Y
36452342N
so on..
Please assist. I'm very new to shell scripting.
sed 's/[0-9]\+\s\+//' infile > outfile
Explanation:
s: we want to use substitution
/: marks the start of the expression we want to match
[0-9]: match any digit
\+: match the previous item one or more times
\s: match a whitespace character
\+: match the previous item one or more times
/: marks the end of the expression and the start of what we want to change our matches to (which is nothing)
/: marks the end of the replacement; flags would go after this (we use none)
infile: the file we want to change
>: redirect stdout to
outfile: where we want to store the output
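A sketch applying the command to a few lines of the question's data (drop_line_number is a hypothetical name). \+ and \s are GNU extensions; since the regex is not anchored with ^, it removes the first digits-then-whitespace run on each line, which in this file is the leading number.

```shell
# Hypothetical helper wrapping the answer's command (GNU sed).
drop_line_number() {
    # [0-9]\+ : one or more digits
    # \s\+    : one or more whitespace characters
    # The first such run on each line is replaced with nothing.
    sed 's/[0-9]\+\s\+//' "$@"
}

printf '%s\n' '1  12658003Y' '10  37456338Y' '100  23432434Y' | drop_line_number
```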
Your sed command would be:
sed 's/.* //g' file
This removes the leading number along with the space(s) that follow it.
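Because .* is greedy, .* followed by a space matches up to the last space on the line, not the first. That is harmless for the question's data, where the only spaces follow the number, but a line with several spaces shows the difference (drop_through_last_space is a hypothetical name):

```shell
# Hypothetical helper demonstrating the greedy match.
drop_through_last_space() {
    # ".* " matches everything up to and including the LAST space.
    # The g flag is omitted: after one match no space is left anyway.
    sed 's/.* //' "$@"
}

printf '%s\n' '1  12658003Y' '100  23432434Y' 'a b c' | drop_through_last_space
```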
Remove leading digits, then following spaces:
sed 's/^[0-9]* *//' file
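This version sticks to POSIX basic regular expressions, so it should also work outside GNU sed. Both parts of the regex can match an empty string, so lines without a leading number pass through unchanged. A sketch (drop_leading_number is a hypothetical name):

```shell
# Hypothetical helper; POSIX basic regular expressions only.
drop_leading_number() {
    # ^[0-9]* : digits at the very start of the line (possibly none)
    #  *      : then any number of spaces (possibly none)
    sed 's/^[0-9]* *//' "$@"
}

printf '%s\n' '1  12658003Y' '100  23432434Y' 'so on' | drop_leading_number
```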
sed 's/^[0-9]*[ ]*//g' input.txt

sed: joining lines depending on the second one

I have a file that occasionally has split lines. The split is signaled by the fact that the line starts with '+' (possibly preceded by spaces).
line 1
line 2
+ continue 2
line 3
...
I'd like to join the split line back:
line 1
line 2 continue 2
line 3
...
using sed. I'm not sure how to join a line with the preceding one.
Any suggestion?
This might work for you:
sed 'N;s/\n\s*+//;P;D' file
These are actually four commands:
N
Append the next line of input to the pattern space
s/\n\s*+//
Remove the newline, the following whitespace and the plus
P
Print the pattern space up to the first newline
D
Delete the pattern space up to and including the first newline, i.e. the part that was just printed, and restart the cycle with what remains (if no newline is left, start a normal new cycle)
The relevant parts of the manual are:
Selecting lines by numbers
Addresses overview
Multiline techniques - using D,G,H,N,P to process multiple lines
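A sketch wrapping the four commands in a function (join_plus_lines is a hypothetical name), including a continuation line indented with spaces, which \s* takes care of. \s is a GNU extension.

```shell
# Hypothetical helper wrapping the N;s;P;D answer (GNU sed).
join_plus_lines() {
    # N : append the next line           -> "current\nnext"
    # s : if "next" starts with optional whitespace and "+", remove the
    #     newline, that whitespace and the "+", joining the two lines
    # P : print up to the first newline
    # D : delete up to the first newline and restart with the rest
    sed 'N;s/\n\s*+//;P;D' "$@"
}

printf '%s\n' 'line 1' 'line 2' '  + continue 2' 'line 3' | join_plus_lines
```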
Doing this in sed is certainly a good exercise, but it's pretty trivial in perl:
perl -0777 -pe 's/\n\s*\+//g' input
I'm not partial to sed so this was a nice challenge for me.
sed -n '1{h;n};/^ *+ */{s// /;H;n};{x;s/\n//g;p};${x;p}'
In awk this is approximately:
awk '
NR == 1 {hold = $0; next}
/^ *\+/ {$1 = ""; hold=hold $0; next}
{print hold; hold = $0}
END {if (hold) print hold}
'
If the last line is a "+" line, the sed version will print a trailing blank line. Couldn't figure out how to suppress it.
You can use Vim in Ex mode:
ex -sc g/+/-j -cx file
g/+/ run the following on every line containing +
- move to the previous line
j join it with the next line
x save and exit
A different use of the hold space, with (mostly) POSIX sed: load the entire file into the hold space before merging lines. The \s is a GNU extension; [[:space:]] is the portable equivalent.
sed -n '1x;1!H;${g;s/\n\s*+//g;p}'
1x on the first line, swap the line into the empty hold space
1!H on non-first lines, append to the hold space
$ on the last line:
g get the hold space (the entire file)
s/\n\s*+//g remove every newline (plus any whitespace) that precedes a +, together with the +
p print everything
Input:
line 1
line 2
+ continue 2
+ continue 2 even more
line 3
+ continued
becomes
line 1
line 2 continue 2 continue 2 even more
line 3 continued
This (or potong's answer) might be more useful than a sed -z implementation if you want other manipulations of the data: you can simply insert the extra commands before 1!H, whereas sed -z loads the entire file into the pattern space immediately. That means you aren't manipulating single lines at any point. The same goes for perl -0777.
For example, if you also want to eliminate comment lines starting with *, add /^\s*\*/d to delete them:
sed -n '1x;/^\s*\*/d;1!H;${g;s/\n\s*+//g;p}'
versus:
sed -z 's/\n\s*+//g;s/\n\s*\*[^\n]*\n/\n/g'
The former's accumulation in the hold space line by line keeps you in classic sed line processing land, while the latter's sed -z dumps you into what could be some painful substring regexes.
But that's sort of an edge case, and you could always just pipe sed -z back into sed. So +1 for that.
Footnote for internet searches: This is SPICE netlist syntax.
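One thing to watch in the comment-dropping version above: d ends the cycle immediately, so if the last line is a comment the ${...} block never runs and nothing is printed, and a comment on the first line is swapped into the hold space by 1x before the d test sees it. A sketch of a variant that appends every non-comment line with H instead and strips the leading newline this leaves behind (join_plus_drop_comments is a hypothetical name; [[:space:]] replaces \s):

```shell
# Hypothetical helper: accumulate non-comment lines, then join "+" lines.
join_plus_drop_comments() {
    # /^[[:space:]]*\*/!H : append every line that is not a "*" comment to
    #                       the hold space (which starts out empty)
    # $ { ... }           : on the last line, even if it is a comment:
    #   x                 : swap the accumulated text into the pattern space
    #   s/^\n//           : drop the newline H put before the first line
    #   s/\n[[:space:]]*+//g : join the "+" continuation lines
    #   p                 : print the result
    sed -n '/^[[:space:]]*\*/!H;${x;s/^\n//;s/\n[[:space:]]*+//g;p;}' "$@"
}

printf '%s\n' '* header comment' 'line 1' 'line 2' '+ continue 2' '* trailing comment' | join_plus_drop_comments
```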
A solution for versions of sed that can read NUL-separated data, such as GNU sed with -z:
sed -z 's/\n\s*+//g'
Compared to potong's solution, this has the advantage of being able to join multiple consecutive lines that start with +. For example:
line 1
line 2
+ continue 2
+ continue 2 even more
line 3
becomes
line 1
line 2 continue 2 continue 2 even more
line 3
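A side-by-side check of that claim, running potong's N;s;P;D command and the -z command on the same input (GNU sed assumed for \s and -z):

```shell
input=$(printf '%s\n' 'line 1' 'line 2' '+ continue 2' '+ continue 2 even more' 'line 3')

# Pairwise N;P;D: after one join, D finds no newline and starts a fresh
# cycle, so the next "+" line is read as the start of a new pair instead
# of being appended to the joined line.
printf '%s\n' "$input" | sed 'N;s/\n\s*+//;P;D'

# -z: the whole input is one pattern space, so /g handles every "+" line.
printf '%s\n' "$input" | sed -z 's/\n\s*+//g'
```

The first command leaves + continue 2 even more on its own line; the second produces the fully joined output.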