How to match a string when not followed by another with sed

Does anybody know how to match, with sed, each 'foo' instance except when it is followed by 'bar' in the following string?
'foo boo foo
foo bar foo
foo
foo'
Desired result (each matched instance replaced by MaTcHeD):
'MaTcHeD boo MaTcHeD
foo bar MaTcHeD
MaTcHeD
MaTcHeD'
After a lot of testing, I found:
sed -e "s/foo\( *\)bar/FOO\1BAR/g" -e "s/foo/MaTcHeD/g" -e "s/FOO\( *\)BAR/foo\1bar/g"
It first matches the 'foo bar' instances and replaces them with a temporary string (here 'FOO BAR'), then replaces the remaining 'foo' instances, and finally restores the 'foo bar' instances to their original form.
But this is not clean at all. I would be surprised if there were no more direct way to do it, even though I have not been able to find one so far.
Any hint would be appreciated. :-)
Thank you very much,

If you have to stick to sed, your idea is OK. However, if the original string already contains FOO BAR, your solution fails. You always have to choose a temporary string that cannot occur in the input, which makes the script fragile.
You could improve it by using non-printable characters as the temporary string instead of ordinary text.
For example, this line works:
sed -r 's/foo( *)bar/\x94\1\x98/g; s/foo/Matched/g;s/\x94( *)\x98/foo\1bar/g' file
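A variation on the same placeholder idea, as a minimal sketch: sed reads input one line at a time, so a newline can never be part of the pattern space unless the script adds one, which makes it a safe temporary marker. The function name below is made up for illustration, and the `\n` on the replacement side assumes GNU sed.

```shell
# Hypothetical helper: replace every "foo" that is NOT followed by
# optional spaces and "bar". Assumes GNU sed (\n in the replacement).
replace_foo_not_before_bar() {
  # 1. hide each "foo<spaces>bar" as "<newline><spaces><newline>"
  # 2. replace every remaining "foo"
  # 3. turn the hidden markers back into "foo<spaces>bar"
  sed 's/foo\( *\)bar/\n\1\n/g; s/foo/MaTcHeD/g; s/\n\( *\)\n/foo\1bar/g'
}

printf '%s\n' 'foo boo foo' 'foo bar foo' 'foo' 'FOO BAR foo' |
  replace_foo_not_before_bar
```

Because the marker cannot occur in a line read from the input, text such as FOO BAR passes through untouched.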

Related

Keep lines containing "list of different words" like pattern [duplicate]

How can I keep all lines matching any of the words
toto OR titi OR clic OR SOMETHING and delete all other lines?
If I do sed '/toto/ p ' file I cannot select titi for example.
What I am looking for is something similar to a Perl Regular expression as
^ (word1|word2|word3|andsoon).*. However, I need it for sed because it will be integrated into a bigger sed script.
The goal is to keep all lines starting with word where word is any word from a set of words.
The answer here depends a bit on how your master script is called. Imagine you have a file with the following content:
foo
car
bar
and you are interested in the lines matching "foo" or "bar", then you can use any of:
sed '/foo\|bar/!d'
sed -n '/foo\|bar/!d;p'
sed -n '/foo\|bar/p'
all these will output:
foo
bar
If you just do:
sed '/foo\|bar/p'
you actually duplicate the matching lines:
foo
foo
car
bar
bar
As you see, there is a bit of different handling depending on the usage of the -n flag.
-n, --quiet, --silent suppress automatic printing of pattern space
source: man sed
In general, my suggestion is to delete the lines you don't need at the beginning of your sed script.
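For the case in the question (keep only lines that start with one of several words), a minimal sketch with extended regular expressions could look like this. The function name keep_words is hypothetical, and the -E flag assumes a reasonably recent GNU or BSD sed.

```shell
# Hypothetical helper: delete every line that does not start with
# toto, titi or clic. Assumes a sed that supports -E (ERE syntax).
keep_words() {
  sed -E '/^(toto|titi|clic)/!d'
}

printf '%s\n' 'toto one' 'other line' 'titi two' 'a clic here' 'clic three' |
  keep_words
```

Since the command deletes unwanted lines instead of printing wanted ones, it does not depend on the -n flag and can sit at the top of a longer script.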

Append to non-empty line that doesn't start with whitespace AND is followed, two lines down, by a non-empty line that doesn't start with whitespace

I am converting several unruly, early 90's DOS-generated text files to something more usable. I need to append a set of characters to all of the non-empty lines in said text files that don't start with whitespace AND that are followed, two lines down, by another non-empty line that doesn't start with whitespace (I will refer to all single lines of text that meet these characteristics as "target" lines). BTW, irrelevant to the problem are the characteristics of the line directly below each of the target lines.
Of interest is the fact that all of the target lines in the above-mentioned text files end with the same character. Also, the command I'm looking for needs to slot into a rather long pipeline.
Suppose I have the following file:
foo
third line foo
fifth line foo
 this line starts with a space foo
 this line starts with a space foo
ninth line foo
eleventh line foo
 this line starts with a space foo
last line foo
I want the output to look like this:
foobar
third line foobar
fifth line foo
 this line starts with a space foo
 this line starts with a space foo
ninth line foobar
eleventh line foo
 this line starts with a space foo
last line foo
Although I'm looking for a sed solution, awk and perl are welcome as well. All solutions must be able to be used in a pipeline. Also welcomed are solutions which handle a more general case (e.g. able to append the desired text to target lines that end in various ways, including whitespace).
Now, for the backstory:
A few days ago I asked a question similar to this one (see here). As you can see, I got some great answers. It turned out, however, that I did not fully understand my problem, so I did not ask the question that would actually solve it.
Now, I'm asking the right question!
Based on what I learned by scrutinizing the answers to the question I linked to above, I've cobbled together the following sed command:
sed '1N;N;/^[^[:space:]]/s/^\([^[:space:]].*\o\)\(\n\n[^[:space:]].*\)$/\1bar\2/;P;D' infile
Ugly, yes, but it works for my humble purposes. Indeed, as my original intent with this question was to post a question, then self-answer same, you can see this sed construct posted below as one of the answers (posted by me).
I'm sure there are better ways to solve this particular problem, however...any ideas, anyone?
From your posted expected output it looks like you meant to say "is followed, two lines down, by a line that DOES NOT start with whitespace" instead of "is followed, two lines down, by a line that DOES start with whitespace".
This produces the output you show:
$ cat tst.awk
NR>2 { print p2 ((p2 ~ /^[^[:blank:]]/) && /^[^[:blank:]]/ ? "bar" : "") }
{ p2=p1; p1=$0 }
END { print p2 ORS p1 }
$ awk -f tst.awk file
foobar
third line foobar
fifth line foo
 this line starts with a space foo
 this line starts with a space foo
ninth line foobar
eleventh line foo
 this line starts with a space foo
last line foo
It simply keeps a 2 line buffer and adds "bar" to the end of the line being printed given whatever condition you need. It will work on all POSIX awks and any others that support POSIX character classes (for the rest, change [[:blank:]] to [ \t]).
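For use inside a pipeline, the same two-line buffer can be wrapped in a shell function. This is only a sketch: the function name and the suffix parameter are made up for illustration, `[ \t]` is used instead of `[[:blank:]]` so that older awks also accept it, and the END block is adjusted so that inputs shorter than two lines do not print an extra empty line.

```shell
# Hypothetical helper: append "$1" to each line that does not start with
# a blank and whose second-next line also does not start with a blank.
append_two_down() {
  awk -v suffix="$1" '
    # p2 is the line from two records ago; $0 is the line "two down" from it
    NR > 2 { print p2 ((p2 ~ /^[^ \t]/ && $0 ~ /^[^ \t]/) ? suffix : "") }
    { p2 = p1; p1 = $0 }
    END {
      # flush the buffered last two lines unchanged
      if (NR > 1) print p2
      if (NR > 0) print p1
    }
  '
}

printf '%s\n' 'foo' '' 'third line foo' '' 'fifth line foo' \
  ' starts with a space' ' starts with a space' '' 'ninth line foo' |
  append_two_down bar
```

An empty line never matches `/^[^ \t]/`, so blank lines are neither modified nor counted as a valid line two down.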
You have over-analysed the problem so that your question now reads as a computer program, and you have got that program wrong. Requirements are best explained using examples and real data, so that we have some hope of rationalising the problem in our heads.
This Perl program alters your algorithm so the output matches your required output:
use strict;
use warnings 'all';
chomp(my @data = <>);
my $i = 0;
for ( @data ) {
    $_ .= 'bar' if /^\S/ and $data[$i+2] =~ /^\S/;
    ++$i;
    last if $i+2 > $#data;
}
print "$_\n" for @data;
output
foobar
third line foobar
fifth line foo
 this line starts with a space foo
 this line starts with a space foo
ninth line foobar
eleventh line foo
 this line starts with a space foo
last line foo
This sed one-liner seems to do the trick for the specific case outlined in the OP:
sed '1N;N;/^[^[:space:]]/s/^\([^[:space:]].*\o\)\(\n\n[^[:space:]].*\)$/\1bar\2/;P;D' infile
Thanks to the excellent clarifying information given by Benjamin W. in his answer to one of my recent questions, I was able to cobble together this one-liner that solved my specific problem. Please refer to same if you wish to gain insight into said command.

Use sed to take all lines containing regex and append to end of file

I'm trying to come up with a sed script to take all lines containing a pattern and move them to the end of the output. This is an exercise in learning hold vs pattern space and I'm struggling to come up with it (though I feel close).
I'm here:
$ echo -e "hi\nfoo1\nbar\nsomething\nfoo2\nyo" | sed -E '/foo/H; //d; $G'
hi
bar
something
yo
foo1
foo2
But I want the output to be:
hi
bar
something
yo
foo1
foo2
I understand why this is happening. It is because the first time we find foo the hold space is empty so the H appends \n to the blank hold space and then the first foo, which I suppose is fine. But then the $G does it again, namely another append which appends \n plus what is in the hold space to the pattern space.
I tried a final delete command with /^$/d but that didn't remove the blank line (I think this is because the pattern is matched not against the last line but against the now multiline pattern space, which has a \n\n in it).
I'm sure the sed gurus have a fix for me.
This might work for you (GNU sed):
sed '/foo/H;//!p;$!d;x;//s/.//p;d' file
If the line contains the required string append it to the hold space (HS) otherwise print it as normal. If it is not the last line delete it otherwise swap the HS for the pattern space (PS). If the required string(s) is now in the PS (what was the HS); since all such patterns were appended, the first character will be a newline, delete the first character and print. Delete whatever is left.
An alternative, using the -n flag:
sed -n '/foo/H;//!p;$!b;x;//s/.//p' file
N.B. When the d command or the b command (without a parameter) is performed, no further sed commands are executed: a new line is read into the PS and the sed script begins again with the first command, i.e. the sed commands do not resume following the previous d command.
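To try the first command on the sample data from the question, it can be wrapped as below. This is just a usage sketch: the function name is made up, and GNU sed is assumed, as in the answer.

```shell
# Hypothetical wrapper around the command above (GNU sed assumed).
# Lines containing "foo" are collected in the hold space and printed
# after the last input line; all other lines are printed immediately.
move_foo_last() {
  sed '/foo/H;//!p;$!d;x;//s/.//p;d'
}

printf '%s\n' hi foo1 bar something foo2 yo | move_foo_last
```

When the input contains no matching line, the hold space stays empty, the final `//` test fails, and nothing extra is printed.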
Why? Stuff like this is absolutely trivial in awk, awk is available everywhere that sed is, and the resulting awk script will be simpler, more portable, faster and better in almost every other way than a sed script to do the same task. All that hold space stuff was necessary in sed before the mid-1970s when awk was invented but there's absolutely no use for it now other than as a mental exercise.
$ echo -e "hi\nfoo1\nbar\nsomething\nfoo2\nyo" |
awk '/foo/{buf = buf $0 RS;next} {print} END{printf "%s",buf}'
hi
bar
something
yo
foo1
foo2
The above will work as-is in every awk on every UNIX installation and I bet you can figure out how it works very easily.
This feels like a hack and I think it should be possible to handle this situation more gracefully. The following works on GNU sed:
echo -e "hi\nfoo1\nbar\nsomething\nfoo2\nyo" | sed -r '/foo/{H;d;}; $G; s/\n\n/\n/g'
However, on OSX/BSD sed, it results in this odd output:
hi
bar
something
yonfoo1
foo2
Note that the 2 consecutive newlines were replaced with the literal character n.
The OSX/BSD vs GNU sed difference is explained in this article. The following works (in GNU sed as well):
echo -e "hi\nfoo1\nbar\nsomething\nfoo2\nyo" | sed '/foo/{H;d;}; $G; s/\n\n/\'$'\n''/'
TL;DR: BSD sed does not accept escape sequences such as \n in the RHS of the replacement expression, so you either have to put a true LF/newline in there at the command line, or do the above, where you split the sed script string at the point where you need the newline on the RHS and put a dollar sign in front of '\n' so the shell interprets it as a line feed.
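The "true newline" option can be sketched like this: a backslash followed by an actual line break inside the replacement is the POSIX way to insert a newline, so it should behave the same in GNU and BSD sed. The function name is made up for illustration.

```shell
# Hypothetical helper: split a comma-separated line into one item per line.
# The replacement is a backslash followed by a real newline, so no \n
# escape is needed on the right-hand side.
split_commas() {
  sed 's/,/\
/g'
}

printf 'a,b,c\n' | split_commas
```

The continuation line has to start in column one, because any indentation there would become part of the replacement text.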

simple multiline sed command does not quite work

I want to match some text including line feeds. The command below almost works, but it does not match the first line
(echo foo; echo foo; echo bar) | sed '1!N; s/foo.*bar/zap\nbaz/'
foo
zap
baz
Same problem here:
(echo foo; echo bar; echo bar) | sed '1!N; s/foo.*bar/zap\nbaz/'
foo
bar
bar
I have found a much more complex sed command which works correctly in both cases but I would rather fix the simple one (if possible), or at least understand why it does not work.
(echo foo; echo bar; echo bar) | sed -n '1h;1!H;${g;s/foo.*bar/zap\nbaz/p}'
zap
baz
sed is very simply just not the right tool for anything involving multiple lines because it is line-oriented and as such is designed to handle one line at a time. All of sed's language constructs for handling multi-line input became obsolete in the mid-1970s when awk was invented because awk is record-oriented instead of line-oriented and so trivially handles newlines within records just like any other character. For example:
$ (echo foo; echo bar; echo bar) |
awk -v RS= '{sub(/foo.*bar/,"zap\nbaz"); print}'
zap
baz
Any time you find yourself using more than s, g, and p (with -n) in sed or talking about "spaces" you have the wrong approach.
Your simple approach can hold at most two lines of the text in the pattern space at once, so it can't match a three-line pattern.
In particular:
(echo foo; echo foo; echo bar) | sed '1!N; s/foo.*bar/zap\nbaz/'
foo
zap
baz
It reads the first line (foo), finds no match, and prints foo. Then it reads the second (foo), appends the next (bar), finds a match and performs the replacement, and prints zap\nbaz.
In the second run:
(echo foo; echo bar; echo bar) | sed '1!N; s/foo.*bar/zap\nbaz/'
foo
bar
bar
It reads the first line (foo), finds no match, and prints foo. Then it reads the second (bar), appends the next (bar), finds no match and prints bar\nbar.
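If a two-line window is enough, one way to make it start at every line instead of at every other line is the N;P;D idiom. The following is a sketch, not the command from the question: the function name is made up, and it relies on `.` matching a newline inside the pattern space, which is how GNU sed behaves.

```shell
# Hypothetical helper: sliding two-line window.
#   $!N  append the next line (unless we are on the last one)
#   s    try the replacement on the two-line pattern space
#   P    print only the first line of the pattern space
#   D    drop that first line and restart without reading new input
replace_pair() {
  sed '$!N; s/foo.*bar/zap\
baz/; P; D'
}

(echo foo; echo foo; echo bar) | replace_pair
(echo foo; echo bar; echo bar) | replace_pair
```

With this window every adjacent pair of lines is examined, so the foo/bar pair at the start of the second input is now replaced, but a match spanning three or more lines still cannot be seen.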
Here's a workaround:
sed 's/$/\\n/' | tr -d '\n' | sed 's/foo.*bar/zap\\nbaz/g' | sed 's/\\n/\n/g'
This might work for you (GNU sed):
sed '/foo/{:a;N;/foo.*bar/!ba;s//zap\nbaz/}' file
If the current line contains foo then append a newline and the next line and look for foo followed by bar (any number of characters apart including newlines). If this pattern is found replace it by zap\nbaz and print out the result. If not loop back to :a and repeat until it is found or the end-of-file (in which case the entire string in the pattern space will be printed out without any changes).
N.B. the N command will not allow you to read past the end-of-file and will bail out if you try. The command s//zap\nbaz/ substitutes the current regexp with zap\nbaz, where the current regexp is the last /.../ used, in this case /foo.*bar/.
An alternative without braces:
sed '/foo/!b;:a;N;/foo.*bar/!ba;s//zap\nbaz/' file
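A quick way to check the loop version against both inputs from the question, as a usage sketch. The wrapper name is made up, and GNU sed is assumed for the one-line label syntax and for `\n` in the replacement.

```shell
# Hypothetical wrapper around the loop command above (GNU sed assumed).
# Starting at a line containing "foo", lines are appended until the
# pattern space matches foo.*bar, then the whole match is replaced.
replace_span() {
  sed '/foo/{:a;N;/foo.*bar/!ba;s//zap\nbaz/}'
}

(echo foo; echo foo; echo bar) | replace_span
(echo foo; echo bar; echo bar) | replace_span
```

In the first input the match spans all three lines, which is what the two-line approach could not handle. In the second input the loop stops at the first bar, so the trailing bar is printed unchanged.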

How do you duplicate two matched lines in order with sed?

If I have a file that looks like this:
word
foo
bar
word
And I want to duplicate foo\nbar lines, so that it looks like this:
word
foo
bar
foo
bar
word
I have tried using N to load the next line into the buffer, but I must be using it incorrectly, as it appears to skip over lines sometimes.
sed -e '{
N
s/\(foo\nbar\)/\1\1/
}' foobar.txt
I think it is loading word\nfoo into the buffer, then bar\nword into the buffer, and misses the pattern entirely. How do you use N appropriately? Would this be easier with awk, perl, or some other tool?
Since you specifically tagged the question with sed I thought I'd post a sed solution:
/foo/,/bar/{
i\
foo
i\
bar
d
}
$ sed -f s.sed input
word
foo
bar
foo
bar
word
in awk:
awk -v word1="foo" -v word2="bar" '
{print}
prev==word1 && $1==word2 {print word1; print word2}
{prev=$1}
' filename
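As for using N appropriately: one possible sketch is to call N only on a line that is exactly foo, and to finish with P;D so that the appended line is examined again at the start of the next cycle instead of being skipped. The function name is made up, and the exact-line anchors (^foo$ and bar$) are an assumption about the data.

```shell
# Hypothetical helper: duplicate each "foo" line that is directly
# followed by a "bar" line.
#   $!N               append the next line (unless on the last line)
#   /^foo\nbar$/p     if the pair matches, print it once extra
#   P;D               print the first line, keep the second for the next cycle
dup_foo_bar() {
  sed '/^foo$/{$!N;/^foo\nbar$/p;P;D;}'
}

printf '%s\n' word foo bar word | dup_foo_bar
```

Because D restarts the script with the remaining line still in the pattern space, a sequence such as foo, foo, bar is handled as well: the second foo gets its own chance to pair up with bar.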