Find and replace wildcard that spans across lines? - sed

I would like to input some text like this:
foo
stuff
various stuff
could be anything here
variable number of lines
bar
Stuff I want to keep
More stuff I want to Keep
These line breaks are important
and get out this:
foo
teststuff
bar
Stuff I want to keep
More stuff I want to Keep
These line breaks are important
So far I've read and messed around with using sed, grep, and pcregrep and not been able to make it work. It looks like I could probably do such a thing with Vim, but I've never used Vim before and it looks like a royal pain.
Stuff I've tried includes:
sed -e s/foo.*bar/teststuff/g -i file.txt
and
pcregrep -M 'foo\s+bar\s' file.txt | xargs sed 's/foo.*bar/teststuff/g'
I'm not sure if it's just that I'm not using these commands correctly, or if I'm using the wrong tool. It's hard for me to believe that there is no way to use the terminal to find and replace wildcards that span lines.

For clarity, simplicity, robustness, maintainability, portability and most other desirable qualities of software, just use awk:
$ awk 'f&&/bar/{print "teststuff";f=0} !f; /foo/{f=1}' file
foo
teststuff
bar
Stuff I want to keep
More stuff I want to Keep
These line breaks are important
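For readers unfamiliar with awk, here is the same script spread over several lines with comments. This is only a sketch: the function name replace_between and the variable name inside (the f flag above) are made up for illustration, and foo, bar and teststuff are hard-coded as in the question.

```shell
# Hypothetical wrapper around the awk one-liner above; reads files or stdin.
replace_between() {
  awk '
    # Inside the block and the end marker appears: emit the replacement, leave the block.
    inside && /bar/ { print "teststuff"; inside = 0 }
    # Outside the block: print the line unchanged (this also prints the bar line).
    !inside { print }
    # After a start marker has been printed, enter the block so later lines are skipped.
    /foo/ { inside = 1 }
  ' "$@"
}

printf 'foo\nstuff\nvarious stuff\nbar\nStuff I want to keep\n' | replace_between
```

The order of the three rules matters: the foo line is printed before the flag is set, and the flag is cleared before the bar line is printed.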

Try this with GNU sed:
sed -e '/^foo/,/^bar/{/^foo/b;/^bar/{i teststuff' -e 'b};d}'
See: man sed

This might work for you (GNU sed):
sed '/foo/,/bar/cteststuff' file
This replaces the whole range, from the foo line through the bar line inclusive, with teststuff.
However, what you appear to want is to replace only the lines after foo and before bar, perhaps:
sed '/foo/,/bar/cfoo\nteststuff\nbar' file

Related

Pipe Grep Results to Sed — Only do sed on results of grep

[Mac OS]
It seems that sed requires an input file, and that I cannot pipe grep to it. Although sed can match like grep does, it can make the sed operation very complex if it's handling both a find and replace.
For example, if I wanted to remove the 3rd word of every line that started with 'T', it's much more convenient to separate the find/replace commands than to create a complex regex.
Looking through SO answers, there doesn't seem to be an elegant solution where you can pipe grep to sed without new files being involved. I did find this, which almost does what I want:
sed -i "s/$(grep 'old' input.txt)/new/g" input.txt
But it doesn't handle multiple matches well.
I'll generalize:
Is there a better way to find specific lines in a text file and modify those lines in-place? Preferably cli, or as low-level as possible.
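One possible approach, sketched under assumptions: sed reads standard input when no file is given, so it can sit at the end of a pipe, and an address such as /^T/ in front of a command restricts that command to matching lines, which removes the need for grep here. The helper name drop_third_word is made up, and "word" is assumed to mean a run of non-space characters.

```shell
# Hypothetical helper: on lines starting with T, drop the third space-separated word.
# Other lines, and T lines with fewer than three words, pass through unchanged.
drop_third_word() {
  sed -E '/^T/s/^([^ ]+ +[^ ]+) +[^ ]+/\1/' "$@"
}

printf 'The quick brown fox\nA slow red dog\nTo be or not\n' | drop_third_word
```

To modify a file in place, the same script can be given to sed -i, bearing in mind that the -i option takes its backup-suffix argument differently in GNU sed and in the BSD sed shipped with Mac OS.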

Use sed to take all lines containing regex and append to end of file

I'm trying to come up with a sed script to take all lines containing a pattern and move them to the end of the output. This is an exercise in learning hold vs pattern space and I'm struggling to come up with it (though I feel close).
I'm here:
$ echo -e "hi\nfoo1\nbar\nsomething\nfoo2\nyo" | sed -E '/foo/H; //d; $G'
hi
bar
something
yo

foo1
foo2
But I want the output to be:
hi
bar
something
yo
foo1
foo2
I understand why this is happening. It is because the first time we find foo the hold space is empty so the H appends \n to the blank hold space and then the first foo, which I suppose is fine. But then the $G does it again, namely another append which appends \n plus what is in the hold space to the pattern space.
I tried a final delete command with /^$/d but that didn't remove the blank line (I think this is because the pattern is being matched not against the last line, but against the now multiline pattern space, which has a \n\n in it).
I'm sure the sed gurus have a fix for me.
This might work for you (GNU sed):
sed '/foo/H;//!p;$!d;x;//s/.//p;d' file
If a line contains the required string, append it to the hold space (HS); otherwise print it as normal. If it is not the last line, delete it; otherwise swap the HS with the pattern space (PS). If the PS (formerly the HS) now contains the required string, its first character is a newline, because every match was appended with H; delete that first character and print the result. Finally, delete whatever is left.
An alternative, using the -n flag:
sed -n '/foo/H;//!p;$!b;x;//s/.//p' file
N.B. When the d command, or b without a label, is executed, no further sed commands are run for that cycle: a new line is read into the PS and the script starts again from its first command, i.e. execution does not resume after the d.
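The first script above can also be written one command per line with comments, which may make the explanation easier to follow. This is a sketch assuming GNU sed; the function name move_matches_last is made up, and the pattern foo is hard-coded as in the question.

```shell
# Hypothetical wrapper: print non-matching lines first, then all lines matching foo.
move_matches_last() {
  sed '
# Lines matching foo: append to the hold space (each append adds a leading newline).
/foo/H
# Lines not matching foo (empty regex reuses the last one): print immediately.
//!p
# Not the last line: delete the pattern space and start the next cycle.
$!d
# Last line: swap the collected lines into the pattern space.
x
# If anything was collected, drop the leading newline and print.
//s/.//p
# Suppress the automatic print of whatever remains.
d
' "$@"
}

printf 'hi\nfoo1\nbar\nsomething\nfoo2\nyo\n' | move_matches_last
```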
Why? Stuff like this is absolutely trivial in awk, awk is available everywhere that sed is, and the resulting awk script will be simpler, more portable, faster and better in almost every other way than a sed script to do the same task. All that hold space stuff was necessary in sed before awk was invented in the late 1970s, but there's absolutely no use for it now other than as a mental exercise.
$ echo -e "hi\nfoo1\nbar\nsomething\nfoo2\nyo" |
awk '/foo/{buf = buf $0 RS;next} {print} END{printf "%s",buf}'
hi
bar
something
yo
foo1
foo2
The above will work as-is in every awk on every UNIX installation and I bet you can figure out how it works very easily.
This feels like a hack and I think it should be possible to handle this situation more gracefully. The following works on GNU sed:
echo -e "hi\nfoo1\nbar\nsomething\nfoo2\nyo" | sed -r '/foo/{H;d;}; $G; s/\n\n/\n/g'
However, on OSX/BSD sed, results in this odd output:
hi
bar
something
yonfoo1
foo2
Note that the two consecutive newlines were replaced with the literal character n.
The difference between OSX/BSD sed and GNU sed is explained in this article. The following works in both (GNU sed as well):
echo -e "hi\nfoo1\nbar\nsomething\nfoo2\nyo" | sed '/foo/{H;d;}; $G; s/\n\n/\'$'\n''/'
TL;DR: BSD sed does not interpret escape sequences such as \n in the RHS of the replacement expression, so you either have to put a true LF/newline there at the command line, or do the above: split the sed script string where you need the newline on the RHS and put a dollar sign in front of '\n' so that the shell interprets it as a line feed.
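The first option mentioned above, a true newline in the RHS, might look like the sketch below: a backslash followed by an actual line break inside the quoted script. The function name move_foo_last is made up. The output shown by the demo call was reasoned through against GNU sed behavior; that BSD sed accepts it unchanged is an assumption based on the explanation above.

```shell
# Hypothetical wrapper: the replacement is a backslash followed by a real newline,
# so no \n escape is needed on the RHS.
move_foo_last() {
  sed '/foo/{H;d;}; $G; s/\n\n/\
/' "$@"
}

printf 'hi\nfoo1\nbar\nsomething\nfoo2\nyo\n' | move_foo_last
```

One caveat with this script in either form: if the very last input line matches foo, the d ends the cycle before $G runs, so the held lines are never printed.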

Remove a hyphen from a specific line in a file

I have a data file that needs to have several unique identifiers stripped of hyphens.
So I have:
(Special_Section "data-values")
and I want to have it replaced with:
(Special_Section "datavalues")
I wanted to use a simple sed find/replace, but the data and values are different each time. Preferably, I'd run this in-place since the file has a lot of other information I want to keep intact.
Does sed or awk have a way to remove the hyphen from the matched portion only?
Currently I can match with: sed -i 's/Special_Section "[a-zA-Z0-9]*-[a-zA-Z0-9]*"/&/g' *myfiles*
But I would like to then run s/-// on & if it's possible.
You seem to be using GNU sed, so something like this might work:
sed -ri '
s/(Special_Section [^-]*)-([^)]*)/\1\2/g
' <your_filename_glob>
Does this work?
sed -i '/(Special_Section ".*-.*")/{s/-//}' yourFile
Close - scan for the lines and then substitute on those that match:
sed -i '/Special_Section "[a-zA-Z0-9]*-[a-zA-Z0-9]*"/s/\( "[a-zA-Z0-9]*\)-\([a-zA-Z0-9]*\)"/\1\2/' *myfiles*
You can split that over several lines to avoid the scroll bar in SO:
sed -i '/Special_Section "[a-zA-Z0-9]*-[a-zA-Z0-9]*"/{
s/\( "[a-zA-Z0-9]*\)-\([a-zA-Z0-9]*\)"/\1\2/
}' *myfiles*
And on further thoughts, you can also do:
sed -i 's/\(Special_Section "[a-zA-Z0-9]*\)-\([a-zA-Z0-9]*"\)/\1\2/' *myfiles*
This is more compact. You can add the g qualifier if you need it. Both solutions use the special \(...\) notation to capture parts of the regular expression.
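A quick way to try the compact form without touching any files is to feed it sample lines on standard input. In this sketch the function name strip_hyphen and the second sample line (Other_Section) are made up for illustration.

```shell
# Hypothetical wrapper around the compact substitution above; reads files or stdin.
# \1 holds everything up to the hyphen, \2 everything after it up to the closing quote.
strip_hyphen() {
  sed 's/\(Special_Section "[a-zA-Z0-9]*\)-\([a-zA-Z0-9]*"\)/\1\2/' "$@"
}

printf '(Special_Section "data-values")\n(Other_Section "keep-this")\n' | strip_hyphen
```

Only the Special_Section line loses its hyphen; hyphens elsewhere in the file are left alone because the pattern does not match those lines.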

Looking for a Perl or sed one-liner to replace a string with contents from another file

In an HTML file, let’s call it index.html, I want to replace a comment string, say //gac goes here, with the contents (multi-line) from a separate file which is called: gac.js. Is there a nice one-liner for that?
I found something like: sed -e "/$str/r FileB" -e "/$str/d" FileA, but it is not working as promised.
I would like it to be as short as possible, as it will be called after an SVN revert (I don't want any of that google.script polluting my development environment).
This should work, even though it is nasty:
perl -pe 'BEGIN{open F,"gac.js";@f=<F>}s#//gac goes here#@f#' index.html
In the case that gac.js is supposed to be dynamic:
perl -pe 's#//(\S+) goes here#open+F,"$1.js";join"",<F>#e' index.html
perl -mFile::Slurp -pe 's/\/\/(\w+) goes here/@{[File::Slurp::read_file("$1.js")]}/;'
Obviously requires File::Slurp
Not very nice, but seems to work:
cat index.html | perl -pe 'open(GAC, "gac.js");@gac=<GAC>;$data=join("", @gac); s/gac goes here/$data/g'
After going through man sed, this tutorial and some experimenting I came up with:
sed -i '\_//gac goes here_ {
r gac.js
d
}' index.html
Which does exactly what I want. It's not exactly a one-liner (if I put it on one line I get: sed: -e expression #1, char 0: unmatched '{'), which I don't understand. However, the expression above fits nicely in my update script.
Lessons learned: sed is very powerful, -i behaves differently on Mac OS X and Linux, and /string/ can easily be replaced with \[other delimiter]string[other delimiter].
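The "unmatched {" error on a single line most likely comes from the r command, which treats everything up to the end of the line as the file name, so a closing brace on the same line is never seen. A possible one-line form, sketched here, passes each piece as its own -e fragment, which sed joins with newlines. The function name inject_snippet and the sample file contents are made up, and the script prints to standard output rather than using -i.

```shell
# Hypothetical helper: print file $1 with the marker line replaced by the contents of file $2.
inject_snippet() {
  sed -e '\_//gac goes here_{' -e "r $2" -e 'd' -e '}' "$1"
}

# Demo on throwaway files.
tmp=$(mktemp -d)
printf '<script>\n//gac goes here\n</script>\n' > "$tmp/index.html"
printf 'var a = 1;\nvar b = 2;\n' > "$tmp/gac.js"
inject_snippet "$tmp/index.html" "$tmp/gac.js"
```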

capturing groups in sed

I have many lines of the form
ko04062 ko:CXCR3
ko04062 ko:CX3CR1
ko04062 ko:CCL3
ko04062 ko:CCL5
ko04080 ko:GZMA
and would dearly like to get rid of the ko: bit of the right-hand column. I'm trying to use sed, as follows:
echo "ko05414 ko:ITGA4" | sed 's/\(^ko\d{5}\)\tko:\(.*$\)/\1\2/'
which simply outputs the original string I echo'd. I'm very new to command line scripting, sed, pipes etc, so please don't be too angry if/when I'm doing something extremely dumb.
The main thing that is confusing me is that the same thing happens if I reverse the \1\2 bit to read \2\1 or just use one group. This, I guess, implies that I'm missing something about the mechanics of piping the output of echo into sed, or that my regexp is wrong or that I'm using sed wrong or that sed isn't printing the results of the substitution.
Any help would be greatly appreciated!
sed is outputting its input because the substitution isn't matching. Since you're probably using GNU sed, try this:
echo "ko05414 ko:ITGA4" | sed 's/\(^ko[0-9]\{5\}\)\tko:\(.*$\)/\1\2/'
\d -> [0-9] since GNU sed doesn't recognize \d
{} -> \{\} since GNU sed by default uses basic regular expressions.
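As an alternative sketch, extended regular expressions (the -E option, accepted by both GNU and BSD sed) avoid the backslashes in front of the parentheses and braces. The function name strip_ko_prefix is made up. [[:space:]]+ is used so the pattern matches whether the columns are separated by a tab or by spaces, and the separator is rewritten as a single space, which is an assumption about the desired output.

```shell
# Hypothetical helper: remove the ko: prefix from the second column.
# With -E, ( ) and { } are special without backslashes.
strip_ko_prefix() {
  sed -E 's/^(ko[0-9]{5})[[:space:]]+ko:(.*)$/\1 \2/' "$@"
}

printf 'ko05414\tko:ITGA4\nko04062 ko:CXCR3\n' | strip_ko_prefix
```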
This should do it. You can also skip the last group and simply use \1 instead, but since you're learning sed and regex this is good stuff. I wanted to use a non-capturing group, (?: ), in the middle, but I could not get that to work with sed; the POSIX regular expressions that sed uses do not support non-capturing groups.
sed --posix 's/\(^ko[0-9]\{5\}\)\( ko:\)\(.*$\)/\1 \3/g' file > result
And of course you can use
sed --posix 's/ko://'
You don't need sed for this
Here is how you can do it with bash:
var="ko05414 ko:ITGA4"
echo ${var//"ko:"}
${var//"ko:"} replaces all "ko:" with ""
See Manipulating Strings for more info
@OP, if you just want to get rid of "ko:", then
$ cat file
ko04062 ko:CXCR3
ko04062 ko:CX3CR1
ko04062 ko:CCL3
ko04062 ko:CCL5
some text with a legit ko: this ko: will be deleted if you use gsub.
ko04080 ko:GZMA
$ awk '{sub("ko:","",$2)}1' file
ko04062 CXCR3
ko04062 CX3CR1
ko04062 CCL3
ko04062 CCL5
some text with a legit ko: this ko: will be deleted if you use gsub.
ko04080 GZMA
Just a note: while you can use pure bash string substitution, it's only more efficient when you are changing a single string. If you have a file, especially a big file, using bash's while read loop is still slower than using sed or awk.