Use sed to delete a matched regexp and the line (or two) underneath it - sed

OK I found this question:
How do I delete a matching line, the line above and the one below it, using sed?
and just spent the last hour trying to write something that will match a string and delete the line containing the string and the line beneath it (or a variant - delete 2 lines beneath it).
I feel I'm now typing random strings. Please somebody help me.

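If I've understood correctly, this deletes the matching line and the one line after it:
/matchstr/{N;d;}
And this deletes the matching line and the two lines after it:
/matchstr/{N;N;d;}
N appends the next input line to the pattern space.
d deletes the whole pattern space, i.e. the matched line together with the lines N pulled in.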
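A quick way to check both commands, assuming a hypothetical six-line sample (the words one to six) and two as the match string:

```shell
# Hypothetical sample input: one word per line.
sample=$(printf '%s\n' one two three four five six)

# Delete the matching line and the one line after it.
one_after=$(printf '%s\n' "$sample" | sed '/two/{N;d;}')

# Delete the matching line and the two lines after it.
two_after=$(printf '%s\n' "$sample" | sed '/two/{N;N;d;}')

printf '%s\n' "$one_after" "---" "$two_after"
```

The first result keeps one, four, five, six; the second keeps one, five, six.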

You can use awk. For example, search for the word "two" and skip it along with the 2 lines after it:
$ cat file
one
two
three
four
five
six
seven
eight
$ awk -v num=2 '/two/{for(i=0;i<=num;i++)getline}1' file
one
five
six
seven
eight
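The same effect can probably be had without getline by counting down a skip counter. This is only a sketch: the function name skip_after and the variables pat, num and skip are made up for illustration. One difference from the getline loop is that a match inside the skipped region restarts the countdown.

```shell
# skip_after PATTERN NUM: drop each line matching PATTERN plus the NUM lines after it.
skip_after() {
    awk -v pat="$1" -v num="$2" '
        $0 ~ pat { skip = num + 1 }   # the matching line itself plus num more
        skip > 0 { skip--; next }     # drop this line and count down
        { print }
    '
}

printf '%s\n' one two three four five six seven eight | skip_after two 2
```

With the sample file above this prints one, five, six, seven, eight, matching the getline version.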

Related

A way to append the beginning of every line before a pattern to the end of each same line?

I am trying to copy the beginning of every line in a text file before a certain character to the end of the same line.
I've tried duplicating each line to the end of itself, and then deleting everything after the character, but the trouble is I haven't been able to figure out how to skip the first instance of the character so the result is that the duplicated text gets deleted as well as everything beyond the first instance of the character.
I've tried things like
sed '/S/ s/$/ append text/' sample.txt > cleaned.txt
but this only adds a fixed text. I've also tried using:
s/\(.*\)/\1 \1/
to duplicate the line, and then deleting everything after the S, but I can't figure out how to get it to go to the 3rd S not the 1st to start deleting.
What I have to start with:
dog 50_50_S5_Scale
cat 10_RV_17_S76_Scale
mouse 15_EQ_S81_Scale
What I'm trying to get:
dog 50_50_S5_Scale dog 50_50_
cat 10_RV_17_S76_Scale cat 10_RV_17_
mouse 15_EQ_S81_Scale mouse 15_EQ_
Where everything before the first S gets copied to the end of the line.
You may use
sed 's/\([^S]*\)S.*/& \1/' file
Details
\([^S]*\) - Capturing group 1 (\1): any 0+ chars other than S
S.* - S and the rest of the string (actually, line, since sed processes line by line by default).
The replacement is the concatenation of the whole match (&), space and Group 1 value.
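To see the substitution at work, here is a small run on three sample lines (taken from the desired output in the question, minus the appended part). The variable name result is just for illustration.

```shell
# Copy everything before the first S to the end of each line.
result=$(printf '%s\n' \
    'dog 50_50_S5_Scale' \
    'cat 10_RV_17_S76_Scale' \
    'mouse 15_EQ_S81_Scale' |
    sed 's/\([^S]*\)S.*/& \1/')

printf '%s\n' "$result"
```

A line that contains no S does not match the pattern and is printed unchanged.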
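You could try:
awk '{print $0 " " substr($0, 1, index($0,"S") - 1)}' file
We take the substring from the first character up to, but not including, the first occurrence of "S". awk strings are indexed from 1, so the substring has to start at position 1; starting at 0 drops the last character in some awk implementations.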

Joining specific lines in file

I have a text file (snippet below) containing some public-domain corporate earnings report data, formatted as follows:
Current assets:
Cash and cash equivalents
$ 21,514 $ 21,120
Short-term marketable securities
33,769 20,481
Accounts receivable
12,229 16,849
Inventories
2,281 2,349
and what I'm trying to do (with sed) is the following: if the current line starts with a capital letter, and the next line starts with whitespace, copy the last N characters from the next line into the last N columns of the current line, then delete the next line. I'm doing it this way, because there are other lines in the files that begin with whitespace that I want to ignore. The results should look like the following:
Current assets:
Cash and cash equivalents $ 21,514 $ 21,120
Short-term marketable securities 33,769 20,481
Accounts receivable 12,229 16,849
Inventories 2,281 2,349
The closest I've come to getting what I want is:
sed -i -r ':a;N;$!ba;s/[^A-Z]*\n([[:space:]])/\1/g' file.txt
and I believe I've got the pattern matching ok, but the subsequent substitution really messes up the alignment of the columns of numbers. When I first started this, this seemed like a simple operation, but hours of searching and experimenting haven't helped. I'm open to any solutions that use something else other than sed, but would prefer to keep it strictly bash. Thank you much!
This might work for you (GNU sed):
sed -r '/^[[:upper:]]/{N;/\n\s/{h;x;s/\n.*//;s/./ /g;x;G;s/(\n *)(.*)\1$/\2/};P;D}' file
This solution only processes two consecutive lines that start with an upper-case letter and a white space respectively. All other lines are printed as is.
Having gathered the above two lines into the pattern space (PS), a copy is made and stored in the hold space (HS). Processing now swaps to the HS. The second line is removed and the contents of the first turned into spaces. Processing now swaps back to the PS. The HS is appended to the PS and using matching and back references the length of the first line in spaces is subtracted from the combined lines.
The line(s) are printed and then deleted. If the second line did not begin with a space, by use of the P and D commands, it is not deleted but re-appraised by virtue of the regexp at the start of the sed script.
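A small run may make the hold-space trick easier to follow. This is a sketch assuming GNU sed (for -r and \s) and a hypothetical input in which each value line is indented by 20 spaces, i.e. further than its label is long; the original indentation is not visible in the question.

```shell
# Build the input with printf so the 20-space indentation is explicit.
input=$(printf 'Current assets:\nInventories\n%20s%s\nAccounts receivable\n%20s%s\n' \
    '' '2,281 2,349' '' '12,229 16,849')

# Same command as above: overlay each label onto the indented line that follows it.
joined=$(printf '%s\n' "$input" |
    sed -r '/^[[:upper:]]/{N;/\n\s/{h;x;s/\n.*//;s/./ /g;x;G;s/(\n *)(.*)\1$/\2/};P;D}')

printf '%s\n' "$joined"
```

Each label replaces as many leading spaces as it has characters, so the numbers stay in their original columns. If a label is longer than the indentation of the next line, the final substitution cannot match and the lines are not joined.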

compare first 60 characters and delete the duplicate row

How can I use sed to check whether consecutive lines share the same first 10 characters? If they do, the second of the two lines should be deleted.
Example:
Before
ABCDEF123456
123456ABCDEF
123456789012
123456789090
After
ABCDEF123456
123456ABCDEF
123456789012
This might work for you (GNU sed):
sed 'N;P;/^\(.\{10\}\).*\n\1/d;D' file
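Read two lines, print the first and then compare the first ten characters of the first line with the start of the second line. If they are the same, delete both lines (the first has already been printed, so only the duplicate is lost); otherwise delete only the first and carry on with the second.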
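Running the command on the example from the question, as a check (GNU sed assumed; the variable name deduped is only for illustration):

```shell
# Drop a line when its first 10 characters repeat those of the line before it.
deduped=$(printf '%s\n' ABCDEF123456 123456ABCDEF 123456789012 123456789090 |
    sed 'N;P;/^\(.\{10\}\).*\n\1/d;D')

printf '%s\n' "$deduped"
```

Note that the comparison works on pairs: in a run of three lines with the same prefix, the second is dropped but the third starts a fresh pair and is kept.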

How do I replace lines between two patterns with a single line in sed?

This is my input file:
one
two
three
four
five
six
seven
eight
nine
ten
I want to turn the file into
one
two
three
NEW LINE
eight
nine
ten
with sed. That is, I want to replace the lines from /four/ (including) to /seven/ (including) with the single line NEW LINE.
I can do that with
sed '/four/aNEW LINE
/four/,/seven/d' file.txt
But I am wondering if there is a simpler way, notably one without having to repeat a pattern (as I needed to with /four/).
Edit As per fedorqui's comment-question, this can also be in awk (although for "academic" purposes I'd be interested in sed solutions.)
Edit 2 Unfortunately, the input file suggests that there is a logical order of words in the input file (one followed by two followed by three etc). In my "real world" problem, this is not the case, however. I have no idea how many lines the file has, nor what precedes or follows the lines four and seven. The only thing I know is that there is a line four which is (not necessarily immediately) followed by a line seven. I am sorry for not stating this clearly when I asked the question, especially because fedorqui has put so much effort into his answer.
Perl is pretty concise, and you don't need to repeat any keywords:
perl -0777 -pe 's/four.*seven/NEW LINE/s' file
Here is how you do in sed:
$ sed ':a;N;s/four.*seven/NEW LINE/;ba' file
one
two
three
NEW LINE
eight
nine
ten
The logic is much the same as in Glenn's answer: slurp the entire file into one long string with embedded newlines, then replace everything from four to seven with NEW LINE.
With sed, you can delete from line four to seven and append after seven. Which is in fact what you posted in your question :)
$ sed -e '/seven/a \NEW LINE' -e '/four/,/seven/d' file
one
two
three
NEW LINE
eight
nine
ten
With awk you can do:
$ awk '/four/ {f=1} !f; /seven/ {print "NEW LINE"; f=0}' file
one
two
three
NEW LINE
eight
nine
ten
What it does is to keep updating the flag f that stops the printing.
When "four" is found, the flag is activated.
When "seven" is found, the flag is deactivated, printing also the NEW LINE.
This might work for you (GNU sed & bash):
sed $'/^four/{:a;N;/^seven/McNEW LINE\nba}' file
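How this appears to work: once a line starting with four is seen, the loop keeps appending lines with N. The M flag (a GNU extension) lets ^ match after each embedded newline, so as soon as a line starting with seven has been appended, c throws away the collected lines and prints the replacement text. The sketch below uses a literal newline inside single quotes instead of bash's $'...' quoting, and the replacement text NEW LINE from the question.

```shell
# Collect lines from "four" until a line starting with "seven", then replace the lot.
replaced=$(printf '%s\n' one two three four five six seven eight nine ten |
    sed '/^four/{:a;N;/^seven/McNEW LINE
ba}')

printf '%s\n' "$replaced"
```

The start pattern is written only once, which is what the question asked for.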

SED: Operate on Last seven lines regardless of file length

I would like to operate on the last 7 lines of a file with sed regardless of the filelength.
According to a related question this type of range won't work: $-6,$ {..commands..}
What is the equivalent that will?
Pipe the output of tail -n 7 into sed.
tail -n 7 test.txt | sed -e "s/e/WWW/"
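The pipe only outputs the last seven lines. If the rest of the file should pass through unchanged, one option is to compute the starting line number first and hand sed an ordinary range. This is a sketch: the function name edit_last_n and its argument order are invented here, and it assumes the file ends with a newline so that wc -l counts every line.

```shell
# edit_last_n FILE N SCRIPT: apply the sed SCRIPT to the last N lines of FILE only.
edit_last_n() {
    total=$(wc -l < "$1")            # number of lines in the file
    start=$(( total - $2 + 1 ))      # first line of the last-N window
    [ "$start" -lt 1 ] && start=1    # file shorter than N lines: edit all of it
    sed "${start},\$ $3" "$1"
}

# Example usage on a throwaway five-line file.
demo=$(mktemp)
printf '%s\n' e1 e2 e3 e4 e5 > "$demo"
edit_last_n "$demo" 2 's/e/WWW/'
rm -f "$demo"
```

The file is read twice, once by wc and once by sed, which is usually fine for regular files but rules out reading from a pipe.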
You could just switch from sed(1) to ed(1); the commands are about the same. In this case the command is identical, except that ed places no such limitation on address ranges.
$ cat > fl7.ed
ed - $1 << \eof
1,7s/$/ (one of the first seven lines)/
$-6,$s/$/ (one of the last seven lines)/
w
q
eof
$ sh fl7.ed yourfile
perl -lne 'END{print join$\,@a,"-",@b}push@a,$_ if@a<6;push@b,$_;shift@b if@b>7'
In the END{} block you can do whatever is required; @a contains the first 6, @b the last 7 lines as requested.
This should work for you:
sed '1{N;N;N;N;N};N;$s/foo/bar/g;P;D' inputfile
Explanation:
1{N;N;N;N;N} - when the first line is read, load the pattern space with five more lines (total: 6 at this point)
N - append another line
$s/foo/bar/g - when the last line is read, perform some operation on the entire contents of pattern space (the last seven lines of the file). Operations can be more complex than shown here
P - print the text before the first newline in pattern space
D - delete the text just printed and loop to the beginning of the script (the "append another line" step - the first instruction is skipped since it only applies to the first line in the file)
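A short run of this sliding window on a hypothetical ten-line input (foo1 to foo10), assuming GNU sed, where N at end of input prints the pattern space before exiting:

```shell
# Keep a seven-line window; substitute only once the window holds the last line.
windowed=$(printf 'foo%s\n' 1 2 3 4 5 6 7 8 9 10 |
    sed '1{N;N;N;N;N};N;$s/foo/bar/g;P;D')

printf '%s\n' "$windowed"
```

The first three lines come out untouched (foo1 to foo3) and the last seven are rewritten (bar4 to bar10).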
This might work for you:
sed ':a;1,6{$!N;ba};${s/foo/bar/g;q};N;D' file
Explanation:
Create a loop label. :a
Gather lines 1 to 6 in the pattern space (PS). 1,6{$!N;ba}
If it's the last line, process the PS and quit, therefore printing out the last seven lines. ${s/foo/bar/g;q}
If it's not the last line, append the next line to the PS. N
Delete up to the first newline and begin a new cycle without reading a new line. The deleted line is not printed. D
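Because D discards lines without printing them, this command outputs only the last seven lines. The sketch below shows that on a hypothetical ten-line input, along with a variant that also keeps the earlier lines by printing each one with P before D drops it. The variant is an assumption added here, not part of the answer; GNU sed is assumed for both.

```shell
# Hypothetical input: foo1 .. foo10, one per line.
input=$(printf 'foo%s\n' 1 2 3 4 5 6 7 8 9 10)

# Command as given: only the (edited) last seven lines reach the output.
tail_only=$(printf '%s\n' "$input" |
    sed ':a;1,6{$!N;ba};${s/foo/bar/g;q};N;D')

# Variant: print the oldest line of the window before deleting it.
whole_file=$(printf '%s\n' "$input" |
    sed ':a;1,6{$!N;ba};${s/foo/bar/g;q};N;P;D')

printf '%s\n' "$tail_only" "---" "$whole_file"
```

Both forms expect at least seven lines of input; on a shorter file the 1,6 loop has nothing left to append and never finishes.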