SED: Use of character as variable in repeat remove code - sed

I need to use a number contained in each line of a text file as a variable in a repeated-removal task on that line.
Specifically, per line:
(1) The column position of the START of the number to be selected is known, but not the end, i.e. it can be of any digit length, BUT it is terminated by a space.
(2) Take this number and use it in a repeated-removal task across the whole line. I would like to remove ALL text between foo and bar, similar to foo.*bar, the number of times given in (1). foo and bar are repeated in pairs (foo first) the number of times given in (1), and that number can differ per line.
Does anyone know how to do this with sed, please? I am keen to find a way, in sed, to control the sed -i 's/foo.*bar//g' type examples I keep seeing, which remove ALL text from the first foo to the last bar, inclusive. We need to remove the text between foo and bar for every foo-bar pair that appears.
UPDATE: Consider a file, example.txt
someText 1 foo someOtherText bar someOtherOtherText
someText 2 foo someOtherDiffText bar x x x foo text bar
We need to return;
someText
someText x x x
The numbers tell us how many times the pairing foo-bar, with text between it, appears, and range from 1 to 100s.
Kind Regards
HS

Sed lacks non-greedy matching, but try Perl:
perl -pe 's|foo .*? bar||g' file.txt
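If Perl is not an option, GNU sed can approximate the non-greedy match. The following is only a minimal sketch: it assumes GNU sed (for \n in the replacement and inside a bracket expression), and the function name strip_pairs is made up for illustration. It temporarily turns each bar into a newline, which cannot otherwise occur inside a single input line, so that a negated bracket expression stops at the nearest one. Any bar that was not consumed is restored at the end.

```shell
# Hypothetical helper: remove every "foo ... bar" span from each input line.
# Assumes GNU sed (\n in the replacement and in a bracket expression).
strip_pairs() {
  sed -e 's/bar/\n/g' \
      -e 's/foo[^\n]*\n//g' \
      -e 's/\n/bar/g' "$@"
}

# Usage: feed one of the example lines through the helper.
printf '%s\n' 'someText 2 foo someOtherDiffText bar x x x foo text bar' | strip_pairs
```

Like the Perl substitution, this leaves the whitespace around each removed span in place, and it does not use the per-line count at all: it simply removes every pair it finds.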

Related

Replace block of text inside a file using the contents of another file using sed

I am looking to replace a block of text that is between markers with the contents of another file.
I came across this solution but it only works with one line
$ sed -n '/foo/{p;:a;N;/bar/!ba;s/.*\n/REPLACEMENT\n/};p' file
line 1
line 2
foo
REPLACEMENT
bar
line 6
line 7
I am trying to get the following working but it's not.
content=`cat file_content`
sed -n '/foo/{p;:a;N;/bar/!ba;s/.*\n/${content}\n/};p' file
output
line 1
line 2
foo
${content}
bar
line 6
line 7
How can I get ${content} to list the output of the file?
So I guess this should be a reasonably short way of replacing the text between the foo and bar lines with the content of a file (called content_file here):
sed -e '/^foo$/,/^bar$/{/^bar$/{x;r content_file
D};d}' file
For the range of lines from one matching ^foo$ to one matching ^bar$: if the line matches ^bar$, swap the (empty) hold space into the pattern space, read content_file and queue it to be appended to the output, then delete the pattern space up to the first newline and start the next cycle with the remainder of the pattern space. For all other lines in that range, just drop the line (delete the pattern space and move to the next line of input).
As for the result in your question: any string enclosed in single quotes is taken literally by the shell, without any expansion (including of variables) taking place. '${content}' means literally ${content}, and that is what is passed to sed as part of its argument, whereas with double-quoted text ("${content}") the shell would still expand the variable to its value before it becomes part of the sed arguments. Since the content could still trip up sed, I would opt for the r method as being more generic and robust.
EDIT: Edit keeping the start and end lines in (since I've misread the question):
sed -e '/^foo$/,/^bar$/{/^foo$/{r content_file
p};/^bar$/!d}' file
This time, for the range between matches of ^foo$ and ^bar$: for the opening line matching ^foo$, it reads content_file, queues it to be appended to the output, and prints the line explicitly (because of the delete that follows). Then every line in the range that does not match the closing pattern ^bar$ is dropped, and sed moves on.
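The same range-plus-r idea can be wrapped in a small shell function so the file names are not hard-coded. This is a sketch under assumptions, not the answer's exact script: the name replace_block and its two arguments are hypothetical, the markers are assumed to be whole lines reading exactly foo and bar, and the content file name must not contain a newline (the r command takes everything up to the end of the line as the file name).

```shell
# Hypothetical helper: replace_block FILE CONTENT_FILE
# Keeps the foo and bar marker lines and replaces what is between them
# with the content of CONTENT_FILE.
replace_block() {
  sed '/^foo$/,/^bar$/{
    /^foo$/r '"$2"'
    /^foo$/b
    /^bar$/!d
  }' "$1"
}

# Usage (file names are placeholders):
#   replace_block file file_content
```

On the foo line, r queues the file and b ends the cycle, so foo is printed followed by the queued content. Every other line in the range except bar is deleted, and bar falls through to the automatic print.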
This might work for you (GNU sed):
sed '/foo/!b;:a;$b;N;/bar/!ba;P;s/.*\n//;e cat contentFile' file
Print all lines until one containing foo.
If this is the last line, then there will never be a line containing bar so break out and do not insert the contentFile.
Otherwise, append the next line and check for it containing bar, if not repeat.
The pattern space should now contain both foo and bar so, print the first line (containing foo), remove all other lines other than the one containing bar, print the file contentFile and then print the last line of the collection containing bar.
N.B. This does not insert the contentFile unless both foo and bar exist in file. Also the e command will evaluate the cat contentFile immediately and insert the result into the output stream before printing the line containing bar, whereas the r command always prints to the output stream after the implicit print of the sed cycle.
An alternative:
sed -ne '/foo/{p;:a;n;/bar/!ba;e cat contentFile' -e '};p' file
However this solution will only print lines before foo if file does not have a line containing bar.
sed '/foo/,/bar/{//!d;/foo/s//&\n'"${content}"'/}' file
From foo to bar, delete lines not matching previous match //!d.
On foo line, replace match & with match followed by \n${content}

replace all dots contained within braces

In a file, I would like to replace all occurrences of a dot within braces with an underscore.
input
something.dots {test.test} foo.bar
another.line
expected output
something.dots {test_test} foo.bar
another.line
What would be the easiest way to achieve that?
You can choose the least ugly sed from the two options below:
$ cat file
something.dots {test.test} foo.bar {a.a} x
something.dots
$ sed 's|\({[^}]*\)\.\([^}]*}\)|\1_\2|g' file
something.dots {test_test} foo.bar {a_a} x
something.dots
$ sed -E 's|(\{[^}]*)\.([^}]*\})|\1_\2|g' file
something.dots {test_test} foo.bar {a_a} x
something.dots
Explanation (I'll use the last form, but they are equivalent):
(\{[^}]*): Matching group 1 consisting of a {, and any number of non-} characters.
\.: A dot.
([^}]*\}): Matching group 2 consisting of any number of non-} characters followed by a }.
If found, replace the whole expression by [Matching group 1]_[Matching group 2].
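Because both groups are greedy and the match consumes the whole brace pair, a single pass of that substitution replaces at most one dot per {...} group, even with the g flag. A possible way around this, sketched here with a hypothetical function name dots_in_braces, is to repeat the substitution with a t loop until nothing changes:

```shell
# Hypothetical helper: replace every dot inside {...} with an underscore.
# The substitution is retried (t branches back only after a successful s)
# until no dot is left inside any brace pair.
dots_in_braces() {
  sed -E -e ':again' \
         -e 's/(\{[^}]*)\.([^}]*\})/\1_\2/' \
         -e 't again' "$@"
}

# Usage:
printf '%s\n' 'something.dots {test.test.test} foo.bar' | dots_in_braces
```

Dots outside braces are never touched, since the pattern only matches between a { and the next }.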
easiest way
Hold the line, extract the part within braces, do the substitution, grab the held line and shuffle it for the output.
sed 'h;s/.*{//;s/}.*//;s/\./_/g;G;s/^\(.*\)\n\(.*{\).*}/\2\1}/'
#edit - ignore lines without {.*}:
sed '/{.*}/!b; h;s/.*{//;s/}.*//;s/\./_/g;G;s/^\(.*\)\n\(.*{\).*}/\2\1}/'
Tested on repl.
If it's going to be the "easiest way" use AWK instead of sed and then:
awk -F"{|}" '$0 !~ /{.*}/{print($0)}; gsub("\.","_",$2) {print($1"{"$2"}"$3)}' file
This will replace any number of dots, e.g. {test.test.test}, and leaves lines without braces unchanged.
Explanation:
-F"{|}"                    Sets the field separator to { or }
$0 !~ /{.*}/{print($0)};   Prints lines without the {.*} pattern unchanged;
                           "print" can be omitted, as this is the default behavior
gsub("\.","_",$2)          Substitutes _ for . in field 2
{print($1"{"$2"}"$3)}      Formats and prints the line after the changes

Append to non-empty line that doesn't start with whitespace AND is followed, two lines down, by a non-empty line that doesn't start with whitespace

I am converting several unruly, early 90's DOS-generated text files to something more usable. I need to append a set of characters to all of the non-empty lines in said text files that don't start with whitespace AND that are followed, two lines down, by another non-empty line that doesn't start with whitespace (I will refer to all single lines of text that meet these characteristics as "target" lines). BTW, irrelevant to the problem are the characteristics of the line directly below each of the target lines.
Of interest is the fact that all of the target lines in the above-mentioned text files end with the same character. Also, the command I'm looking for needs to slot into a rather long pipeline.
Suppose I have the following file:
foo
third line foo
fifth line foo
this line starts with a space foo
this line starts with a space foo
ninth line foo
eleventh line foo
this line starts with a space foo
last line foo
I want the output to look like this:
foobar
third line foobar
fifth line foo
this line starts with a space foo
this line starts with a space foo
ninth line foobar
eleventh line foo
this line starts with a space foo
last line foo
Although I'm looking for a sed solution, awk and perl are welcome as well. All solutions must be able to be used in a pipeline. Also welcomed are solutions which handle a more general case (e.g. able to append the desired text to target lines that end in various ways, including whitespace).
Now, for the backstory:
I recently asked a question similar to the subject question a few days ago (see here). As you can see, I got some great answers. It turned out, however, that I did not fully understand my problem, so I did not ask the correct question that would actually solve said problem.
Now, I'm asking the right question!
Based on what I learned by scrutinizing the answers to the question I linked to above, I've cobbled together the following sed command
sed '1N;N;/^[^[:space:]]/s/^\([^[:space:]].*\o\)\(\n\n[^[:space:]].*\)$/\1bar\2/;P;D' infile
Ugly, yes, but it works for my humble purposes. Indeed, as my original intent with this question was to post a question, then self-answer same, you can see this sed construct posted below as one of the answers (posted by me).
I'm sure there are better ways to solve this particular problem, however...any ideas, anyone?
From your posted expected output it looks like you meant to say "is followed, two lines down, by a line that DOES NOT start with whitespace" instead of "is followed, two lines down, by a line that DOES start with whitespace".
This produces the output you show:
$ cat tst.awk
NR>2 { print p2 ((p2 ~ /^[^[:blank:]]/) && /^[^[:blank:]]/ ? "bar" : "") }
{ p2=p1; p1=$0 }
END { print p2 ORS p1 }
$ awk -f tst.awk file
foobar
third line foobar
fifth line foo
this line starts with a space foo
this line starts with a space foo
ninth line foobar
eleventh line foo
this line starts with a space foo
last line foo
It simply keeps a 2 line buffer and adds "bar" to the end of the line being printed given whatever condition you need. It will work on all POSIX awks and any others that support POSIX character classes (for the rest, change [[:blank:]] to [ \t]).
You have over-analysed the problem so that your question now reads as a computer program, and you have got that program wrong. Requirements are best explained using examples and real data, so that we have some hope of rationalising the problem in our heads
This Perl program alters your algorithm so the output matches your required output
use strict;
use warnings 'all';
chomp(my @data = <>);
my $i = 0;
for ( @data ) {
    # append 'bar' when this line and the line two below both start with non-whitespace
    $_ .= 'bar' if /^\S/ and $data[$i+2] =~ /^\S/;
    ++$i;
    last if $i+2 > $#data;
}
print "$_\n" for @data;
output
foobar
third line foobar
fifth line foo
this line starts with a space foo
this line starts with a space foo
ninth line foobar
eleventh line foo
this line starts with a space foo
last line foo
This sed one-liner seems to do the trick for the specific case outlined in the OP:
sed '1N;N;/^[^[:space:]]/s/^\([^[:space:]].*\o\)\(\n\n[^[:space:]].*\)$/\1bar\2/;P;D' infile
Thanks to the excellent clarifying information given by Benjamin W. in his answer to one of my recent questions, I was able to cobble together this one-liner that solved my specific problem. Please refer to same if you wish to gain insight into said command.

how to use sed to print line #6 from a file, but only if any other line in the file matches a pattern

and, in a more generic way, is it possible to use program sed to print any line matching PATTERN 1, but only if any other line in the file matches PATTERN 2? It can be done with a combination of grep commands, but I am trying to get it done with a single sed command.
This is NOT a job for sed:
awk 'NR==FNR{if (/PATTERN2/) f=1; next} f && (FNR==6)' file file
awk 'NR==FNR{if (/PATTERN2/) f=1; next} f && /PATTERN1/' file file
or if you don't want to specify the file name twice:
awk 'BEGIN{ARGV[ARGC]=ARGV[ARGC-1]; ARGC++} NR==FNR{if (/PATTERN2/) f=1; next} f && /PATTERN1/' file
I believe it's possible:
:l1 {
/foo/ { H }
/bar/ { x ; s/^\n//; p ; s/.*//; h ; b l2}
n
b l1
}
:l2 {
/foo/ { p }
n
b l2
}
Quick overview:
l1 is our initial loop. It checks for /foo/ (pattern 1). If it is found on a line, that line is APPENDED to the hold space.
The next line checks for /bar/. When it is found, it exchanges the hold space and pattern space (x), removes the initial newline from the data (it is there because we use H in our first line), prints the data, empties it, and stores it back in the hold space (so the hold space is empty again). Then we branch to l2, in effect leaving the loop l1.
If the line does not match pattern 1 foo or pattern 2 bar, it will go to the next line, and jump back to the start l1 again.
Once we are in l2, we check for pattern 1 /foo/. Since we KNOW that we have found pattern 2 earlier (otherwise we wouldn't be here), we can safely print this data. If not foo, we just skip that line, and loop back to the start of l2.
Pretty much tested this with the following data:
a
b
c foo
d
e foo
f bar
g foo
h
i foo
j
k
Depending on "bar" being there, it will either print all lines with foo, or nothing at all.
Granted, it will not win any beauty contests, but it's written in sed only.
Here's a sed script that prints lines matching pattern1, if there exists a line matching pattern2, regardless of the order of pattern1, pattern2:
#n
:loop
/foo/H
/bar/{
g
s/\n//
/foo/p
:loop2
n
/foo/p
b loop2
}
n
b loop
If you save this into a file like s.sed, you can do
sed -f s.sed file
The #n works the same way as -n, meaning it suppresses automatic printing. The loop appends any lines matching foo (pattern1) to the hold space. When it encounters bar (pattern2), it gets the contents of the hold space (wiping out the current pattern space) with the g command. It removes the first newline (as the H command adds a newline even when the hold space is empty). It prints the pattern space if it contains foo (meaning it's not empty). Then the n goes to the next line. Now that we have matched pattern2, we can safely print all later matches of foo in loop2.
This might work for you (GNU sed):
sed -n ':a;6H;/pattern/{z;H};n;$!ba;x;s/\n//;s///p' file
Turn off automatic printing of the pattern space by using option -n. Set up a loop that reads every line of the file and appends to the hold space a single line (for line 6) and/or an empty line (denoting that a match on pattern has occurred). At the end of the file, swap to the hold space, remove the ever-present first newline (if a line or an empty line has been appended), then remove a second newline and print the result if that succeeds.
N.B. If pattern exists in the file the hold space will contain two newlines either the first two characters or the first and the last characters.
There is no way you can do this in the general case without parsing the file twice. This means invoking sed twice.
If the line(s) matching the "trigger pattern" always occur after all occurrences of the line(s) matching the "match pattern", then this might do it for you:
$ cat testdata
1 aaa
2 match
3 match
4 ddd
5 trigger (only print line 2 and 3 if this line is present)
$ sed -n -e '/match/H' -e '/trigger/{x;^Mp;^M;q^M}' testdata
2 match
3 match
(the ^M in there are verbatim newlines)
I'm not sure how to delete the initial empty line in the output. Pointers about this are welcomed.
UPDATE: I put a final q (quit) at the end of the command sequence for the "trigger pattern" just to make sure that further trigger patterns later in the file wouldn't screw up the output.
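For the general case, where the trigger line may come anywhere in the file, the two-pass approach mentioned at the top of this answer could look like the following sketch. The function name print_if_triggered and its arguments are hypothetical, and the patterns are assumed not to contain a / because they are spliced straight into sed addresses.

```shell
# Hypothetical helper: print_if_triggered MATCH TRIGGER FILE
# Prints the lines matching MATCH, but only if some line matches TRIGGER.
print_if_triggered() {
  match=$1 trigger=$2 file=$3
  # First pass: print the line number of the first trigger line, then quit.
  hit=$(sed -n "/$trigger/{=;q;}" "$file")
  # No trigger line anywhere in the file: print nothing.
  [ -n "$hit" ] || return 0
  # Second pass: the trigger exists, so print the wanted lines.
  sed -n "/$match/p" "$file"
}

# Usage (testdata is the sample file shown above):
#   print_if_triggered match trigger testdata
```

To print line 6 instead of lines matching a pattern, the second sed call would become sed -n 6p. This avoids the hold space entirely, so there is no leading empty line to clean up.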

Swapping two lines

How can I make use of the sed H, h, x, g, G etc. commands to swap two lines?
For example in the file
START
this is a dog
this is a cat
this is something else
END
say I want to swap "this is a dog" with "this is something else".
This is what I have so far:
/this is a dog/{
h # put to hold space
}
/this is something else/{
# now i am stuck on what to do.
}
If you know a pattern on each of the two lines you want to swap, but not the full contents of the lines, you can do something like this:
sed -n ' # turn off default printing
/dog/{ # if the line matches "dog"
h # put it in hold space
:a # label "a" - the top of a loop
n # fetch the next line
/something/{ # if it matches "something"
p # print it
x # swap hold and pattern space
bb # branch out of the loop to label "b"
} # done with "something"
# if we're here, the line doesn't match "something"
H # append pattern space to hold space
x # swap hold and pattern space
s/\([^\n]*\)\n\([^\n]*\)$/\2\n\1/ # see below
x # swap hold and pattern space
ba # branch to the top of the loop to label "a"
} # done with "dog"
:b # label "b" - outside the loop
# print lines that don't match and are outside the pair
p # also prints what had been accumulating in hold space
' inputfile
The substitution pattern keeps "dog" at the end of the accumulated lines. It keeps swapping the last two lines that we're keeping in hold space so that "dog" "bubbles" to the bottom.
For example, let's put another line after the "cat" line so the process is a little clearer. We'll ignore the lines before "dog" and after "something". And I'll continue to refer to the lines using my nicknames
this is a dog
this is a cat
there's a bear here, too
this is something else
"Dog" is read, then "cat" is fetched. Some appending and swapping is done. Now pattern space looks like this (\N represents a newline, I'm using an upper case "N" so it stands out, the ^ is the beginning of the pattern space and $ is the end):
^this is a dog\Nthis is a cat$
The substitution command looks for any number of characters that are not newlines (and captures them) followed by a newline followed by any number of characters that are not newlines (and captures them) that are at the end of the line ($) and replaces all that with the two captured strings in the reverse order separated by a newline. Now pattern space looks like this:
^this is a cat\Nthis is a dog$
Now we swap and read a new line. It's not "something" so we do some appending and swapping and now we have:
^this is a cat\Nthis is a dog\Nthere's a bear here, too$
We do the substitution again and get:
^this is a cat\Nthere's a bear here, too\Nthis is a dog$
Why didn't we get "bear/dog/cat" instead? Because the regex pattern consisting of two lines (which each, as usual, consist of non-newlines followed by a newline) is anchored at the end of the line using the $ so we're ignoring anything that comes before it. Note that the last newline is implied and doesn't actually exist in pattern or hold space. That's why I'm not showing it here.
Now we read "something" and print it. We swap. Hey! there's that stuff that we've been "bubbling". Branch and print. Since "dog" is at the bottom of the lines (that had been accumulated in hold space) and we printed "something" right before that bunch, the effect is that we swapped the two lines.
This script will work regardless of how many lines appear before, between or after the two lines to be swapped. In fact, if there are multiple pairs of matching lines, the members of each pair will be swapped throughout the file.
As you can see, I'm keying on just one word in the lines of interest, but any suitable regular expression would do.
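The "bubbling" substitution can also be tried in isolation. This is just a small sketch, assuming GNU sed (for \n in the bracket expression and in the replacement); the function name bubble_last_two is made up, and N;N is used only to get three lines into the pattern space for the demonstration.

```shell
# Hypothetical demo: join three lines, then swap the last two of them
# with the same substitution the script uses on the hold space.
bubble_last_two() {
  sed 'N;N;s/\([^\n]*\)\n\([^\n]*\)$/\2\n\1/' "$@"
}

# Usage: the "dog" line starts in the middle and moves to the bottom.
printf '%s\n' 'this is a cat' 'this is a dog' 'this is a bear' | bubble_last_two
```

Because of the $ anchor, only the final two lines take part in the swap and the first line stays where it is.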
/this is a dog/{
h # put line in hold space
s//this is something else/ # replace with 'this is something else'
b # branch to end of the script
}
/this is something else/x # swap hold space with pattern space
Since sed is a stream editor you really can't replace a current line with the contents of future lines. You can, however, do the reverse and replace future lines with a line you've already seen and that's what my script does.
Perhaps another way to go about this would be to use the N command to append each line into the pattern space then do two s commands to swap the two literal strings.
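A rough sketch of that last idea follows. It is a variant rather than exactly what is described: it reads the whole file into the pattern space with an N loop, then uses a single s command with three capture groups instead of two separate s commands. The function name swap_lines is hypothetical, and the sketch assumes the "dog" text occurs before the "something else" text and that the file is small enough to hold in memory.

```shell
# Hypothetical helper: slurp the input, then swap the two literal strings.
swap_lines() {
  sed -e ':a' -e '$!{' -e 'N' -e 'ba' -e '}' \
      -e 's/\(this is a dog\)\(.*\)\(this is something else\)/\3\2\1/' "$@"
}

# Usage:
printf '%s\n' 'START' 'this is a dog' 'this is a cat' 'this is something else' 'END' | swap_lines
```

Here .* also matches the newlines embedded in the pattern space, which is what lets the middle group span the lines between the two strings.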