Using sed to delete multiple lines using a file with patterns

I am currently using sed to delete lines matching various patterns from a file, together with the line that follows each match, using the following code:
sed -i -e "/String1/,+1d" -e "/String2/,+1d" filename.txt
This works very well; however, I have a lot of patterns, and they vary from time to time.
Is it possible to put all the patterns in another text file and make sed delete all lines matching the patterns in that file?

Here is an awk version:
awk 'NR==FNR {a[$0]++;next} {for (i in a) if ($0~i) f=2} --f<0' list yourfile
NR==FNR {a[$0]++;next} stores the patterns from the file list in the array a.
for (i in a) loops over all patterns from list for every line of yourfile.
if ($0~i) f=2 sets the flag f to 2 when a trigger line is found.
--f<0 decrements f by one and tests whether it is less than 0; if so, the line is printed. A matching line sets f to 2, so both it and the following line are suppressed.
Example:
cat yourfile
one
two
three
four
five
six
seven
eight
nine
ten
eleven
cat list
three
eight
awk 'NR==FNR {a[$0]++;next} {for (i in a) if ($0~i) f=2} --f<0' list yourfile
one
two
five
six
seven
ten
eleven
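If the lines in list are literal strings rather than regular expressions, the same countdown idea can be sketched with awk's index() instead of ~, so characters such as . or [ in list lose their regex meaning. This is a hedged variant, not the answer's own code, and delete_with_next_fixed is a hypothetical name. Empty lines in list are skipped here; as with the original, an empty list file would make NR==FNR true for the second file as well, so nothing would be printed.

```bash
#!/bin/bash
# Sketch: delete each line that CONTAINS one of the literal strings in a
# list file, plus the line that follows it. Hypothetical helper name.
# Usage: delete_with_next_fixed listfile inputfile
delete_with_next_fixed() {
  awk '
    # First file: remember each non-empty line as a literal string.
    NR == FNR { if ($0 != "") pats[$0]++; next }
    # Second file: a hit marks this line and the next one for deletion.
    { for (p in pats) if (index($0, p) > 0) skip = 2 }
    # Print only once the countdown has dropped below zero.
    --skip < 0
  ' "$1" "$2"
}
```

With a.c in list, a line abc is kept (the regex version would delete it), while a.c and the line after it are removed.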

Trying to stick with sed at all costs, and being creative :-)
Consider using sed itself to generate the sed script that performs the deletions, based on the patterns file.
Note that this solution processes each input file in a single pass, which makes it usable with large files and many patterns.
Proposed Solution:
sed -i -e "$(sed -e '/\//d;s/^/\//;s/$/\/,+1d/' < patterns.txt)" filename.txt
The embedded sed program (sed -e '/\//d;s/^/\//;s/$/\/,+1d/' ...) converts patterns.txt into a small sed script:
patterns.txt:
three
eight
foo/bar
Output (note that foo/bar is ignored because it contains '/'):
/three/,+1d
/eight/,+1d
Notes, limitations, etc.:
One limitation of the above implementation is the delimiter: the code removes any pattern containing '/' to simplify generating the sed script and to avoid potential injection. It is possible to work around this limitation and allow an alternate delimiter (by escaping special characters in the pattern, or by using '\%regexp%' addresses). This may need additional testing.
The code assumes that the patterns are valid regular expressions.
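A minimal sketch of the '\%' workaround mentioned in the notes, assuming GNU sed (addr,+N is a GNU extension) and valid regular expressions in the patterns file: the generator emits \%pattern%,+1d, so patterns containing '/' are kept and only patterns containing '%' are dropped. Empty lines are dropped too, because an empty regex would mean "reuse the last regex". The function names are hypothetical.

```bash
#!/bin/bash
# Sketch, assuming GNU sed (the addr,+N range form is a GNU extension).
# Turn each line of a patterns file into a "\%pattern%,+1d" command.
# Lines containing '%' (the delimiter) and empty lines are skipped.
gen_delete_script() {
  sed -e '/%/d' -e '/^$/d' -e 's/^/\\%/' -e 's/$/%,+1d/' "$1"
}

# Delete every line matching a pattern, plus the line after it, in one pass.
# Usage: delete_with_next patterns.txt file...
delete_with_next() {
  patterns=$1
  shift
  sed -e "$(gen_delete_script "$patterns")" "$@"
}
```

gen_delete_script can be run on its own to inspect the generated script before applying it; adding -i to the final sed call edits the files in place, as in the original answer.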

Related

How to change the first occurrence of a line containing a pattern?

I need to find the line with first occurrence of a pattern, then I need to replace the whole line with a completely new one.
I found this command that replaces the first occurrence of a pattern, but not the whole line:
sed -e "0,/something/ s//other-thing/" <in.txt >out.txt
If in.txt is
one two three
four something
five six
something seven
As a result, out.txt contains:
one two three
four other-thing
five six
something seven
However, when I try to modify this code to replace the whole line, as follows:
sed -e "0,/something/ c\COMPLETE NEW LINE" <in.txt >out.txt
This is what I get in out.txt:
COMPLETE NEW LINE
five six
something seven
Do you have any idea why the first line is lost?
When used with two addresses, the c\ command deletes all lines from the first matching address through the second matching address, inclusive, and prints the text following c\ when the second address matches. If no line matches the second address, it just deletes every line from the first matching address through the last line. In your command, the range 0,/something/ covers lines 1 and 2, so both are replaced by a single COMPLETE NEW LINE. Since you want to replace only one line, you shouldn't use the c\ command on an address range. In normal usage, c\ is immediately followed by a newline character.
The 0,/regexp/ address range is a GNU sed extension; unlike 1,/regexp/, it also tries to match regexp on the first input line. So, the correct command in GNU sed could be:
sed '0,/something/{/something/c\
COMPLETE NEW LINE
}' < in.txt
or, simplified as pointed out by Sundeep:
sed '0,/something/{//c\
COMPLETE NEW LINE
}' < in.txt
or, if a literal newline character is not desirable, as a one-liner:
sed -e '0,/something/{//cCOMPLETE NEW LINE' -e '}' < in.txt
This one-liner also works, as pointed out by potong:
sed '0,/something/!b;//cCOMPLETE NEW LINE' in.txt
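To see the difference between 0,/regexp/ and 1,/regexp/ described above, here is a small sketch, assuming GNU sed. with_zero and with_one are hypothetical helper names, and an explicit s/something/X/ is used instead of // so the demo does not depend on which regex ran last.

```bash
#!/bin/bash
# Sketch, assuming GNU sed (0,/regexp/ is a GNU extension).
input='something one
two
something three'

# 0,/something/ may end on line 1 itself, so only the first match changes.
with_zero() { printf '%s\n' "$input" | sed '0,/something/s/something/X/'; }

# 1,/something/ starts at line 1 unconditionally and looks for its end
# from line 2 on, so the range reaches line 3 and both matches change.
with_one() { printf '%s\n' "$input" | sed '1,/something/s/something/X/'; }

with_zero   # prints: X one, two, something three
with_one    # prints: X one, two, X three
```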
This might work for you (GNU sed):
sed '1!b;:a;/something/!{n;ba};cCOMPLETE NEW LINE' file
Set up a loop that will only operate from the first line.
Within the loop, if the keyword is not found in the current line, print the current line, fetch the next one, and repeat until the end of the file or until a match is found.
When a match is found, change the contents of the current line to the required result.
N.B. The c command terminates any further processing of sed commands, in the same way the d command does.
If there are lines in the input following the keyword match, the negated address (1!b) at the start of the script catches them, so they are printed without further processing.
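The same program can be laid out over several lines with comments, which may make the control flow easier to follow. This is a sketch assuming GNU sed (one-line c text, and n at end of input printing the line before exiting); replace_first_match is a hypothetical name.

```bash
#!/bin/bash
# Sketch, assuming GNU sed. Hypothetical helper name.
replace_first_match() {
  sed '
# Every line except the first branches to the end and is printed as is.
# The loop below is entered only on line 1 and reads lines itself.
1!b
:a
# No match yet: print the current line, read the next one, loop.
/something/!{
n
ba
}
# First match: replace the whole line (c also ends the cycle).
cCOMPLETE NEW LINE
' "$@"
}
```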
An alternative:
sed 'x;/./{x;b};x;/something/h;//cCOMPLETE NEW LINE' file
Or (specific to GNU sed and bash):
sed $'0,/something/{//cCOMPLETE NEW LINE\n}' file
Just use awk:
$ awk '!done && sub(/something/,"other-thing"){done=1} {print}' file
one two three
four other-thing
five six
something seven
$ awk '!done && sub(/.*something.*/,"other-thing"){done=1} {print}' file
one two three
other-thing
five six
something seven
$ awk '!done && /something/{$0="other-thing"; done=1} {print}' file
one two three
other-thing
five six
something seven
And look what you can trivially do if you want to replace the Nth occurrence of something:
$ awk -v n=1 '/something/ && (++cnt == n){$0="other-thing"} {print}' file
one two three
other-thing
five six
something seven
$ awk -v n=2 '/something/ && (++cnt == n){$0="other-thing"} {print}' file
one two three
four something
five six
other-thing

How to insert something with sed just once

I'm trying to substitute the first empty line in my input file with a multiline block, i.e., out of
one
two
three
four
five
six
I want to create
one
two
foo
three
four
five
six
For this I tried this sed script:
sed '/^$/i\
\
foo'
But it inserts at every empty line.
How can I tweak this call to sed so that it inserts only at the first empty line? Is there a way to tell sed that the rest of the input should just be copied to the output?
I do not want to switch to awk or other shell tools like read in a loop or similar. I'm just interested in the use of sed for this task.
You can loop and print lines until the end of the file:
sed '/^$/{i\
\
foo
:a;n;ba}' file
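A sketch of the loop approach as a reusable function, assuming GNU sed; insert_once is a hypothetical name, and it inserts only the line foo. Once the first empty line is seen, the loop only prints and reads lines, so the /^$/ test is never evaluated again.

```bash
#!/bin/bash
# Sketch, assuming GNU sed: insert "foo" before the first empty line only.
# After the insertion, the :a / n / ba loop prints and reads every
# remaining line until end of input. Hypothetical helper name.
insert_once() {
  sed '/^$/{
i\
foo
:a
n
ba
}' "$@"
}
```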
I found a way by replacing the i with an s command:
sed '0,/^$/s//\
foo\
/'
But I would prefer a solution using the i command because not everything I could want to do after the search might be easily replaceable with an s.

Can sed match one part of a line and replace another part, for multiple lines that each have a different match and replacement?

Given a file containing multiple lines with this format:
define('SOME_NAME', some_value);
How can sed be used to match SOME_NAME and replace some_value with some_other_value, for multiple different lines?
This is a solution for one line:
sed -re "s|^(define *\('SOME_NAME'\s*,\s*).*(\);)|\1 some_other_value \2|" defs_file
To process a number of similar definitions in a file, I had to script outside sed (this example uses a bash version 4 associative array):
#!/bin/bash
declare -A args
args=([SOME_NAME1]=some_other_value1
[SOME_NAME2]=some_other_value2
[SOME_NAME3]=some_other_value3
[SOME_NAME4]=some_other_value4
[SOME_NAME5]=some_other_value5)
for arg in "${!args[@]}"
do
sed -i -re "s|^(define *\('$arg'\s*,\s*).*(\);)|\1 ${args[$arg]} \2|" defs_file
done
Is there a more elegant way to achieve this result that relies only on sed?
This might work for you (GNU sed):
sed -r 's/$/\n#NAME:new_value/;s/([^,]*),[^\n]*\n.*#\1:([^#]*).*/\1,\2/;P;d' <<<"NAME,old_value"
This is a simplified version of your example: append a lookup table to the pattern space, then use a regexp and back references to match and rearrange the output. See here for a detailed explanation.
N.B. In the example above (for brevity) I only included one lookup; in a real solution the lookup table would have several entries, e.g. s/$/\n#NAME1:new_value1#NAME2:new_value2..../
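Applied to the define() lines from the question, the lookup-table idea might look like the sketch below. It assumes GNU sed (-E, \s, \n, and back-references inside an extended regex), that names consist of letters, digits and underscores, and that values contain none of #, /, &, \ or newlines, since the table is pasted into the sed command. The table contents and update_defs are hypothetical.

```bash
#!/bin/bash
# Sketch, assuming GNU sed. Names and values below are hypothetical.
# Each "#NAME:value" entry is terminated by '#'.
table='#SOME_NAME1:some_other_value1#SOME_NAME2:some_other_value2#'

update_defs() {
  local tbl=$1
  shift
  # 1) append the lookup table to the pattern space
  # 2) if the define() name has an entry, swap in its value
  #    (\1 = "define('NAME', ", \2 = NAME, \3 = ");", \4 = new value)
  # 3) remove whatever is left of the table
  sed -E \
    -e "s/\$/\\n${tbl}/" \
    -e 's/^(define *\(.([A-Za-z0-9_]+).\s*,\s*).*(\);)\n.*#\2:([^#]*)#.*/\1\4\3/' \
    -e 's/\n.*//' \
    "$@"
}
```

Lines whose name is not in the table, and lines that are not define() calls at all, come out unchanged; adding -i to the sed call would edit defs_file in place.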

How do I replace lines between two patterns with a single line in sed?

This is my input file:
one
two
three
four
five
six
seven
eight
nine
ten
I want to turn the file into
one
two
three
NEW LINE
eight
nine
ten
with sed. That is, I want to replace the lines from /four/ (including) to /seven/ (including) with the single line NEW LINE.
I can do that with
sed '/four/aNEW LINE
/four/,/seven/d' file.txt
But I am wondering if there is a simpler way, notably one without having to repeat a pattern (as I needed to with /four/).
Edit: As per fedorqui's comment-question, this can also be done in awk (although for "academic" purposes I'd be interested in sed solutions).
Edit 2: Unfortunately, the input file suggests that there is a logical order of words in the input file (one followed by two followed by three, etc.). In my "real world" problem, however, this is not the case. I have no idea how many lines the file has, nor what precedes or follows the lines four and seven. The only thing I know is that there is a line four which is (not necessarily immediately) followed by a line seven. I am sorry for not stating this clearly when I asked the question, especially because fedorqui has put so much effort into his answer.
Perl is pretty concise, and you don't need to repeat any keywords:
perl -00 -pe 's/four.*seven/NEW LINE/s' file
Here is how you do it in sed:
$ sed ':a;N;s/four.*seven/NEW LINE/;ba' file
one
two
three
NEW LINE
eight
nine
ten
The logic is much the same as in Glenn's answer: slurp the entire file into one long string separated by newlines, then replace everything from four to seven with NEW LINE.
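With GNU sed, the -z option gives a similar slurp without the explicit :a;N;ba loop: lines are then separated by NUL bytes, so a text file without NULs becomes one pattern space. A sketch, assuming GNU sed and a single four...seven block (the greedy .* runs to the last seven in the file); replace_block is a hypothetical name.

```bash
#!/bin/bash
# Sketch, assuming GNU sed: -z makes NUL the line separator, so a file
# without NUL bytes is read as a single pattern space, and . also
# matches the embedded newlines. Hypothetical helper name.
replace_block() {
  sed -z 's/four.*seven/NEW LINE/' "$@"
}
```

Like the N loop, this holds the whole file in memory.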
With sed, you can also delete from four to seven and append after seven, which is in fact what you posted in your question :)
$ sed -e '/seven/a \NEW LINE' -e '/four/,/seven/d' file
one
two
three
NEW LINE
eight
nine
ten
With awk you can do:
$ awk '/four/ {f=1} !f; /seven/ {print "NEW LINE"; f=0}' file
one
two
three
NEW LINE
eight
nine
ten
It keeps updating a flag f that suppresses printing.
When "four" is found, the flag is activated.
When "seven" is found, the flag is deactivated, printing also the NEW LINE.
This might work for you (GNU sed & bash):
sed $'/^four/{:a;N;/^seven/McNEWLINE\nba}' file
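Since c on an address range deletes the whole range and prints its text once, at the end of the range (as described in the answer about c\ earlier on this page), neither pattern has to be repeated. A sketch assuming GNU sed's one-line c form; replace_range is a hypothetical name. If no seven follows four, everything from four to the end of the file is deleted and NEW LINE is never printed.

```bash
#!/bin/bash
# Sketch, assuming GNU sed. With a range, c deletes every line of the
# range and prints the text once when the end of the range is reached.
# Hypothetical helper name.
replace_range() {
  sed '/four/,/seven/cNEW LINE' "$@"
}
```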

Separating every line in a file at specific points

I have a dictionary file formatted like this:
A B [C] D
where A is a word (with no spaces), B is another word (with no spaces), C is the pronunciation (which contains spaces), and D is the definition expressed in words (which contains spaces and a variety of symbols).
I wish to separate it into 4 parts, like this:
A####B####C####D
In this way, the first space is converted to ####, the first [ is converted to ####, and the first ] is converted to ####. This will allow easy import into a spreadsheet as a CSV (####'s serve as the commas).
Can this be achieved with awk or another tool in Bash?
Update:
Here are some samples:
一千零一夜 一千零一夜 [Yi1 qian1 ling2 yi1 ye4] /The Book of One Thousand and One Nights/
灰姑娘 灰姑娘 [Hui1 gu1 niang5] /Cinderella/a sudden rags-to-riches celebrity/
雪白 雪白 [xue3 bai2] /snow white/
Would be converted to:
一千零一夜####一千零一夜 ####Yi1 qian1 ling2 yi1 ye4#### /The Book of One Thousand and One Nights/
灰姑娘####灰姑娘 ####Hui1 gu1 niang5#### /Cinderella/a sudden rags-to-riches celebrity/
雪白####雪白 ####xue3 bai2#### /snow white/
Note that anything might appear after the third ####, including more spaces, [, etc.; however, everything before the third #### has a consistent format.
I think sed will be easier:
sed -e 's/ /####/' -e 's/ \[/####/' -e 's/] /####/' infile > outfile
By default (i.e. if you don't specify the g modifier at the end) substitutions only work once per line.
Or, if you want to do it in place:
sed -i -e 's/ /####/' -e 's/ \[/####/' -e 's/] /####/' infile
(but not all versions of sed support -i, and it overwrites your original input file)
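Because the question states that the layout before the third #### is always the same, a single anchored substitution is another option: lines that do not fit the A B [C] D layout are then left untouched rather than partly converted. A sketch assuming GNU sed -E; to_hashes is a hypothetical name, and it produces A####B####C####D without the extra spaces shown in the sample output.

```bash
#!/bin/bash
# Sketch, assuming GNU sed -E and the "A B [C] D" layout from the
# question: A and B contain no spaces, C contains no ']'.
# Lines that do not fit the layout are passed through unchanged.
to_hashes() {
  sed -E 's/^([^ ]+) ([^ ]+) \[([^]]*)] /\1####\2####\3####/' "$@"
}
```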