Use sed to replace line if pattern is on next line

How do I get sed to replace the previous line? I have only come across examples that delete or insert lines, but what I actually need is to make a substitution on the current line only if a condition on the following line is met.
My sample file looks like this:
$ /bin/cat test
Cygwin
Cygwin is a cool emulator for Linux on Windows.
Unix
Maybe
the coolest environment?
Linux
Is also one of the best environments
Solaris
Why did Sun feel copying Java into Unix would matter?
AIX
Unknown
The output I expect is below. Prepend ::: to lines of at most 25 characters, but only if the next line is longer than 25 characters. Thus Cygwin, Linux and Solaris get ::: prepended, while the other lines, such as Unix and AIX (each followed by a short line), do not.
$ # See detailed sed expression in my answer below
:::Cygwin
Cygwin is a cool emulator for Linux on Windows.
Unix
Maybe
the coolest environment?
:::Linux
Is also one of the best environments
:::Solaris
Why did Sun feel copying Java into Unix would matter?
AIX
Unknown
What sed expression can help me do this?
I would prefer to use only sed, since this is part of a larger script that already runs other sed expressions, and I do not want to deviate from that if possible.

Here is one sed expression that gives me the output I want:
/bin/sed -rne '/^\s*$/{d;};{p;}' test | /bin/sed -rne '/(^.{5,26}$)/{$p;h;n;/^.{5,26}$/{x;p;x;p;D;};{x;s/(^.*$)/:::\1/;p;x;p;D;}};{$p;h;p;}'
Specifically, the two sed expressions below are piped together above:
/bin/sed -rne '/^\s*$/{d;};{p;}' test
# Remove any empty lines (including lines that contain only whitespace)
/bin/sed -rne '/(^.{5,26}$)/{$p;h;n;/^.{5,26}$/{x;p;x;p;D;};{x;s/(^.*$)/:::\1/;p;x;p;D;}};{$p;h;p;}'
# This is the killer sed expression I came up with hunting around with my limited knowledge
# A detailed breakdown of this expression follows:
/(^.{5,26}$)/ # Match a line of at least 5 and at most 26 characters
{
$p; # Print if it's already on last line (since -n is in effect)
h; # Save it to hold space
n; # Get the next line into pattern space
/^.{5,26}$/ # Check if pattern space (i.e. next line) also has min 5, max 26 chars
{ # if above condition passed, execute inside here
x; # Swap pattern with hold space; i.e. Get current line back
p; # Print it (i.e. the first line)
x; # Swap again; to get back next line
p; # Print it (i.e. the second line)
D; # Stop cycle here, and process the next line in the input file
};
{ # else block for above if-condition
x; # Swap pattern with hold space; i.e. Get current line back
s/(^.*$)/:::\1/; # Prepend ::: to the line
p; # Print it (i.e. the first line)
x; # Swap again; to get back next line
p; # Print it (i.e. the second line)
D; # Stop cycle here, and process the next line in the input file
} # End processing next line
} # End if match
{ # Current line did not match (it is shorter than 5 or longer than 26 chars)
$p; # Print if it's already on last line (since -n is in effect)
h; # Remember it in hold space
p; # Print it (i.e. the current line)
}
With the above, I am able to achieve what I need.
But I am still not sure whether this could be written or explained more concisely, or in a better way.

It's pretty simple in awk if you get tired of trying to use the hammer of sed on this particular screw :-)
awk '{x[NR]=$0} END{for(i=1;i<=NR;i++){if(length(x[i])<26 && length(x[i+1])>25)printf ":::";print x[i]}}' file
Save all the lines in array x[]. At the end, go through the lines printing them but prefixing ones that meet your conditions with :::.

This might work for you (GNU sed):
sed -r '$!N;/^.{1,25}\n.{26,}$/s/^/:::/;P;D' file
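The $!N;P;D idiom keeps a two-line window in the pattern space: N appends the next line, the substitution tests both lines together, P prints the first line, and D deletes it and restarts the cycle with the second line still in place. A minimal Python sketch of the same sliding-window logic, assuming the 25-character limit from the question (prefix_short_before_long is a hypothetical name, not an existing API):

```python
from typing import Iterable, Iterator


def prefix_short_before_long(lines: Iterable[str], limit: int = 25) -> Iterator[str]:
    """Yield lines, prefixing ':::' to a line of 1..limit chars when the next line is longer."""
    it = iter(lines)
    prev = next(it, None)  # first line of the two-line window
    if prev is None:
        return
    for nxt in it:  # like $!N: bring the next line into the window
        if 1 <= len(prev) <= limit and len(nxt) > limit:
            prev = ":::" + prev  # like s/^/:::/ applied to the first line of the window
        yield prev  # like P: print the first line of the window
        prev = nxt  # like D: drop it, keep the second line for the next round
    yield prev  # the last line has no successor, so it is never prefixed


sample = [
    "Cygwin",
    "Cygwin is a cool emulator for Linux on Windows.",
    "Unix",
    "Maybe",
    "the coolest environment?",
    "Linux",
    "Is also one of the best environments",
    "Solaris",
    "Why did Sun feel copying Java into Unix would matter?",
    "AIX",
    "Unknown",
]
for line in prefix_short_before_long(sample):
    print(line)
```

Unlike the awk answer above, this holds only two lines at a time, so like the sed idiom it can stream input of any size.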

Perl One-Liner from Command-Line
This perl one-liner will do it (tested just now):
perl -0777 -pe 's/^([^\n]{1,25}$)(?=\n[^\n]{26,}$)/:::$1/smg' yourfile
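The Perl version works on the whole file at once and uses a lookahead, so the following line is examined but not consumed and can itself be matched on the next attempt. Here is a rough Python equivalent using re.MULTILINE, assuming the same limit; it requires 26 or more characters on the next line ("longer than 25"), and prefix_with_lookahead is a hypothetical helper:

```python
import re

# ^ and $ match at every line boundary under re.MULTILINE; the lookahead
# checks the next line without consuming it.
PATTERN = re.compile(r"^([^\n]{1,25})$(?=\n[^\n]{26,}$)", re.MULTILINE)


def prefix_with_lookahead(text: str) -> str:
    return PATTERN.sub(r":::\1", text)


text = "Linux\nIs also one of the best environments\nAIX\nUnknown\n"
print(prefix_with_lookahead(text), end="")
```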

Related

Sed - replace with variable first occurrence only [duplicate]

I would like to update a large number of C++ source files with an extra include directive before any existing #includes. For this sort of task, I normally use a small bash script with sed to re-write the file.
How do I get sed to replace just the first occurrence of a string in a file rather than replacing every occurrence?
If I use
sed s/#include/#include "newfile.h"\n#include/
it replaces all #includes.
Alternative suggestions to achieve the same thing are also welcome.
A sed script that will only replace the first occurrence of "Apple" with "Banana":
Example
Input:    Output:
Apple     Banana
Apple     Apple
Orange    Orange
Apple     Apple
This is the simple script (editor's note: works with GNU sed only):
sed '0,/Apple/{s/Apple/Banana/}' input_filename
The first two parameters, 0 and /Apple/, are the range specifier. The s/Apple/Banana/ is what is executed within that range. So in this case: "within the range from the beginning (0) up to the first instance of Apple, replace Apple with Banana." Only the first Apple will be replaced.
Background: In traditional sed the range specifier is also "begin here" and "end here" (inclusive). However, the lowest "begin" is the first line (line 1), and if "end here" is a regex, it is only tried against the lines after "begin", so the earliest possible end is line 2. Since the range is inclusive, the smallest range that starts at line 1 covers both lines 1 and 2 (i.e. if there is an occurrence on line 1, an occurrence on line 2 will also be changed, which is not desired in this case). GNU sed adds its own extension that allows the start to be the "pseudo" line 0, so that the end of the range can be line 1, giving a range of "only the first line" if the regex matches the first line.
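To make the difference between 0,/re/ and 1,/re/ concrete, here is a small Python model of how such a range selects lines. It is only a sketch under stated assumptions: it covers just a start of 0 or 1 followed by a regex end address and ignores every other sed feature, and substitute_in_range is a hypothetical helper:

```python
import re
from typing import List


def substitute_in_range(lines: List[str], regex: str, repl: str, start: int) -> List[str]:
    """Model of sed 'START,/regex/ s/regex/repl/' for START = 0 (GNU) or 1 (POSIX).

    Assumed simplification: the range opens on line `start` (0 = GNU's pseudo-line
    before line 1) and closes on the first later line matching `regex`; because the
    first address is a line number, the range never reopens.
    """
    pat = re.compile(regex)
    out = []
    active = start == 0  # with 0, the range is already open when line 1 is read
    for lineno, line in enumerate(lines, start=1):
        if active:
            selected = True
            if pat.search(line):  # the end address matches: last selected line
                active = False
        elif lineno == start:
            selected = True  # the range opens here; the end regex is NOT tried on this line
            active = True
        else:
            selected = False
        # s/regex/repl/ without the g flag replaces only the first match on the line
        out.append(pat.sub(repl, line, count=1) if selected else line)
    return out


apples = ["Apple", "Apple", "Orange", "Apple"]
print(substitute_in_range(apples, "Apple", "Banana", 0))  # like GNU 0,/Apple/
print(substitute_in_range(apples, "Apple", "Banana", 1))  # like 1,/Apple/
```

With start 1 the end regex is never tried on line 1, so the range runs on to the second Apple and both get replaced; with start 0 the range can end on line 1.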
Or a simplified version (an empty RE like // means to re-use the one specified before it, so this is equivalent):
sed '0,/Apple/{s//Banana/}' input_filename
And the curly braces are optional for the s command, so this is also equivalent:
sed '0,/Apple/s//Banana/' input_filename
All of these work on GNU sed only.
You can also install GNU sed on OS X with Homebrew: brew install gnu-sed (it is installed as gsed).
# sed script to change "foo" to "bar" only on the first occurrence
1{x;s/^/first/;x;}
1,/foo/{x;/first/s///;x;s/foo/bar/;}
#---end of script---
or, if you prefer (editor's note: works with GNU sed only):
sed '0,/foo/s//bar/' file
An overview of the many helpful existing answers, complemented with explanations:
The examples here use a simplified use case: replace the word 'foo' with 'bar' in the first matching line only.
Due to the use of ANSI C-quoted strings ($'...') to provide the sample input lines, bash, ksh, or zsh is assumed as the shell.
GNU sed only:
Ben Hoffstein's answer shows us that GNU sed provides an extension to the POSIX specification for sed that allows the following 2-address form: 0,/re/ (re represents an arbitrary regular expression here).
0,/re/ allows the regex to match on the very first line also. In other words: such an address will create a range from the 1st line up to and including the line that matches re - whether re occurs on the 1st line or on any subsequent line.
Contrast this with the POSIX-compliant form 1,/re/, which creates a range that matches from the 1st line up to and including the line that matches re on subsequent lines; in other words: this will not detect the first occurrence of an re match if it happens to occur on the 1st line, and it also prevents the use of the shorthand // for reuse of the most recently used regex (see the next point, and footnote 1 below).
If you combine a 0,/re/ address with an s/.../.../ (substitution) call that uses the same regular expression, your command will effectively only perform the substitution on the first line that matches re.
sed provides a convenient shortcut for reusing the most recently applied regular expression: an empty delimiter pair, //.
$ sed '0,/foo/ s//bar/' <<<$'1st foo\nUnrelated\n2nd foo\n3rd foo'
1st bar # only 1st match of 'foo' replaced
Unrelated
2nd foo
3rd foo
A POSIX-features-only sed such as BSD (macOS) sed (will also work with GNU sed):
Since 0,/re/ cannot be used and the form 1,/re/ will not detect re if it happens to occur on the very first line (see above), special handling for the 1st line is required.
MikhailVS's answer mentions the technique, put into a concrete example here:
$ sed -e '1 s/foo/bar/; t' -e '1,// s//bar/' <<<$'1st foo\nUnrelated\n2nd foo\n3rd foo'
1st bar # only 1st match of 'foo' replaced
Unrelated
2nd foo
3rd foo
Note:
The empty regex // shortcut is employed twice here: once for the endpoint of the range, and once in the s call; in both cases, regex foo is implicitly reused, allowing us not to have to duplicate it, which makes both for shorter and more maintainable code.
POSIX sed needs actual newlines after certain functions, such as after the name of a label or even its omission, as is the case with t here; strategically splitting the script into multiple -e options is an alternative to using actual newlines: end each -e script chunk where a newline would normally need to go.
1 s/foo/bar/ replaces foo on the 1st line only, if found there.
If so, t branches to the end of the script (skips remaining commands on the line). (The t function branches to a label only if the most recent s call performed an actual substitution; in the absence of a label, as is the case here, the end of the script is branched to).
When that happens, range address 1,//, which normally finds the first occurrence starting from line 2, will not match, and the range will not be processed, because the address is evaluated when the current line is already 2.
Conversely, if there's no match on the 1st line, 1,// will be entered, and will find the true first match.
The net effect is the same as with GNU sed's 0,/re/: only the first occurrence is replaced, whether it occurs on the 1st line or any other.
NON-range approaches
potong's answer demonstrates loop techniques that bypass the need for a range; since he uses GNU sed syntax, here are the POSIX-compliant equivalents:
Loop technique 1: On first match, perform the substitution, then enter a loop that simply prints the remaining lines as-is:
$ sed -e '/foo/ {s//bar/; ' -e ':a' -e '$!{n;ba' -e '};}' <<<$'1st foo\nUnrelated\n2nd foo\n3rd foo'
1st bar
Unrelated
2nd foo
3rd foo
Loop technique 2, for smallish files only: read the entire input into memory, then perform a single substitution on it.
$ sed -e ':a' -e '$!{N;ba' -e '}; s/foo/bar/' <<<$'1st foo\nUnrelated\n2nd foo\n3rd foo'
1st bar
Unrelated
2nd foo
3rd foo
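Loop technique 2 works because, once the whole input is a single string, an s command without the g flag stops after the first match in the entire text. The same idea in Python is re.sub with count=1 on the full file contents; a minimal sketch, with replace_first_in_text as a hypothetical name:

```python
import re


def replace_first_in_text(text: str, pattern: str, repl: str) -> str:
    # count=1 stops after the first match in the WHOLE string, which is what
    # slurping the input into one pattern space achieves in sed.
    return re.sub(pattern, repl, text, count=1)


sample = "1st foo\nUnrelated\n2nd foo\n3rd foo\n"
print(replace_first_in_text(sample, "foo", "bar"), end="")
```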
Footnote 1: 1.61803 provides examples of what happens with 1,/re/, with and without a subsequent s//:
sed '1,/foo/ s/foo/bar/' <<<$'1foo\n2foo' yields $'1bar\n2bar'; i.e., both lines were updated, because line number 1 matches the 1st line, and regex /foo/ - the end of the range - is then only looked for starting on the next line. Therefore, both lines are selected in this case, and the s/foo/bar/ substitution is performed on both of them.
sed '1,/foo/ s//bar/' <<<$'1foo\n2foo\n3foo' fails: with sed: first RE may not be empty (BSD/macOS) and sed: -e expression #1, char 0: no previous regular expression (GNU), because, at the time the 1st line is being processed (due to line number 1 starting the range), no regex has been applied yet, so // doesn't refer to anything.
With the exception of GNU sed's special 0,/re/ syntax, any range that starts with a line number effectively precludes use of //.
sed '0,/pattern/s/pattern/replacement/' filename
this worked for me.
example
sed '0,/<Menu>/s/<Menu>/<Menu><Menu>Sub menu<\/Menu>/' try.txt > abc.txt
Editor's note: both work with GNU sed only.
You could use awk to do something similar:
awk '/#include/ && !done { print "#include \"newfile.h\""; done=1;}; 1;' file.c
Explanation:
/#include/ && !done
Runs the action statement between {} when the line matches "#include" and we haven't already processed it.
{print "#include \"newfile.h\""; done=1;}
This prints #include "newfile.h", we need to escape the quotes. Then we set the done variable to 1, so we don't add more includes.
1;
This means "print out the line": 1 is an always-true pattern with no action, and a missing action defaults to print $0, which prints out the whole line. A one-liner, and easier to understand than sed IMO :-)
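The awk program is a simple stateful filter: a flag records whether the insertion has happened, and every line is passed through. A Python sketch of the same logic, assuming a plain substring test instead of awk's regex match (insert_before_first is a hypothetical name):

```python
from typing import Iterable, Iterator


def insert_before_first(lines: Iterable[str], needle: str, new_line: str) -> Iterator[str]:
    """Yield every line, emitting new_line once, just before the first line containing needle."""
    done = False  # same role as the awk variable 'done'
    for line in lines:
        if not done and needle in line:
            yield new_line
            done = True
        yield line  # like awk's '1;': always print the current line


src = ["// header", "#include <stdio.h>", "#include <stdlib.h>", "int main(void) { return 0; }"]
for line in insert_before_first(src, "#include", '#include "newfile.h"'):
    print(line)
```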
There is quite a comprehensive collection of answers in the linuxtopia sed FAQ. It also highlights that some answers people provided won't work with non-GNU versions of sed, e.g.
sed '0,/RE/s//to_that/' file
in a non-GNU version will have to be
sed -e '1s/RE/to_that/;t' -e '1,/RE/s//to_that/'
However, this version won't work with gnu sed.
Here's a version that works with both:
sed -e '/RE/{s//to_that/;:a' -e '$!N;$!ba' -e '}' file
ex:
sed -e '/Apple/{s//Banana/;:a' -e '$!N;$!ba' -e '}' filename
With GNU sed's -z option, which separates input on NUL characters instead of newlines, you can process a whole text file as if it were a single line. That way a s/…/…/ only replaces the first match in the whole file. Remember: s/…/…/ only replaces the first match in each line, but with the -z option sed treats the whole file as a single line (as long as it contains no NUL characters).
sed -z 's/#include/#include "newfile.h"\n#include/' file
In the general case you have to rewrite your sed expression since the pattern space now holds the whole file instead of just one line. Some examples:
s/text.*// can be rewritten as s/text[^\n]*//. [^\n] matches everything except the newline character. [^\n]* will match all symbols after text until a newline is reached.
s/^text// can be rewritten (with -E) as s/(^|\n)text/\1/, which keeps the newline in front of text.
s/text$// can be rewritten (with -E) as s/text(\n|$)/\1/.
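The reason for capturing the newline and putting it back with \1 is that the newline belongs to the neighbouring line; dropping it would join two lines. The following Python sketch checks these rewrites against a line-anchored version using re.MULTILINE; note that Python's re.sub replaces every match by default, like sed's g flag, and the sample text is made up:

```python
import re

text = "text one\nkeep text\ntext two"

# ^text on every line: capture the newline and put it back with \1 ...
start_explicit = re.sub(r"(^|\n)text ", r"\1", text)
# ... or let ^ match after each newline
start_multiline = re.sub(r"^text ", "", text, flags=re.MULTILINE)

# text$ on every line: same idea, with the newline captured after the match
end_explicit = re.sub(r" text(\n|$)", r"\1", text)
end_multiline = re.sub(r" text$", "", text, flags=re.MULTILINE)

print(repr(start_explicit))
print(repr(end_explicit))
```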
#!/bin/sed -f
1,/^#include/ {
/^#include/i\
#include "newfile.h"
}
How this script works: For lines between 1 and the first #include (after line 1), if the line starts with #include, then prepend the specified line.
However, if the first #include is in line 1, then both line 1 and the next subsequent #include will have the line prepended. If you are using GNU sed, it has an extension where 0,/^#include/ (instead of 1,) will do the right thing.
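Just add the number of the occurrence at the end (note that this number selects the Nth match on each line, not in the whole file, so every line containing #include is still changed):
sed 's/#include/#include "newfile.h"\n#include/1'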
A possible solution:
/#include/!{p;d;}
i\
#include "newfile.h"
:a
n
ba
Explanation:
read lines until we find the #include, print these lines then start new cycle
insert the new include line
enter a loop that just reads lines (by default sed will also print these lines), we won't get back to the first part of the script from here
I know this is an old post but I had a solution that I used to use:
grep -E -m 1 -n 'old' file | sed 's/:.*$//' - | sed 's/$/s\/old\/new\//' - | sed -f - file
Basically, use grep to print the first occurrence and stop there, along with its line number, i.e. 5:line. Pipe that into sed and remove the : and everything after it, so you are left with just the line number. Pipe that into sed, which appends s/old/new/ to the number, resulting in a one-line script that is piped into the last sed to run as a script on the file.
So if the regex is #include and the replacement is blah (i.e. old and new in the command above), and the first occurrence grep finds is on line 5, then the data piped to the last sed would be 5s/#include/blah/.
Works even if first occurrence is on the first line.
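The pipeline boils down to two steps: find the number of the first matching line, then run a substitution addressed to that single line number. A Python sketch of the same two steps, with first_match_line and substitute_on_line as hypothetical helpers:

```python
import re
from typing import List, Optional


def first_match_line(lines: List[str], pattern: str) -> Optional[int]:
    """1-based number of the first line matching pattern (like grep -m 1 -n), or None."""
    for number, line in enumerate(lines, start=1):
        if re.search(pattern, line):
            return number
    return None


def substitute_on_line(lines: List[str], number: int, pattern: str, repl: str) -> List[str]:
    """Apply the substitution only on the given line (like a generated 'Ns/old/new/' script)."""
    return [re.sub(pattern, repl, line, count=1) if i == number else line
            for i, line in enumerate(lines, start=1)]


lines = ["old first", "middle", "old again"]
n = first_match_line(lines, "old")
if n is not None:  # no match: nothing to change
    lines = substitute_on_line(lines, n, "old", "new")
print(lines)
```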
I would do this with an awk script:
BEGIN {i=0}
(i==0) && /#include/ {print "#include \"newfile.h\""; i=1}
{print $0}
END {}
then run it with awk:
awk -f awkscript headerfile.h > headerfilenew.h
might be sloppy, I'm new to this.
As an alternative suggestion you may want to look at the ed command.
man 1 ed
teststr='
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
'
# for in-place file editing use "ed -s file" and replace ",p" with "w"
# cf. http://wiki.bash-hackers.org/howto/edit-ed
cat <<-'EOF' | sed -e 's/^ *//' -e 's/ *$//' | ed -s <(echo "$teststr")
H
/# *include/i
#include "newfile.h"
.
,p
q
EOF
I finally got this to work in a Bash script used to insert a unique timestamp in each item in an RSS feed:
sed "1,/====RSSpermalink====/s/====RSSpermalink====/${nowms}/" \
production-feed2.xml.tmp2 > production-feed2.xml.tmp.$counter
It changes the first occurrence only.
${nowms} is the time in milliseconds set by a Perl script, $counter is a counter used for loop control within the script, \ allows the command to be continued on the next line.
The file is read in and stdout is redirected to a work file.
The way I understand it, 1,/====RSSpermalink====/ tells sed when to stop by setting a range limitation, and then s/====RSSpermalink====/${nowms}/ is the familiar sed command to replace the first string with the second.
In my case I put the command in double quotation marks because I am using it in a Bash script with variables.
Using FreeBSD ed, and avoiding ed's "no match" error in case there is no include statement in a file to be processed:
teststr='
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
'
# using FreeBSD ed
# to avoid ed's "no match" error, see
# http://codesnippets.joyent.com/posts/show/11917
cat <<-'EOF' | sed -e 's/^ *//' -e 's/ *$//' | ed -s <(echo "$teststr")
H
,g/# *include/u\
u\
i\
#include "newfile.h"\
.
,p
q
EOF
This might work for you (GNU sed):
sed -si '/#include/{s//& "newfile.h"\n&/;:a;$!{n;ba}}' file1 file2 file...
or if memory is not a problem:
sed -si ':a;$!{N;ba};s/#include/& "newfile.h"\n&/' file1 file2 file...
If you came here to replace the first occurrence of a character on every line (as I did), use this:
sed '/old/s/old/new/1' file
-bash-4.2$ cat file
123a456a789a
12a34a56
a12
-bash-4.2$ sed '/a/s/a/b/1' file
123b456a789a
12b34a56
b12
By changing 1 to 2, for example, you can replace only the second a on each line instead.
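The numeric flag counts matches within each line separately, which is why it cannot by itself restrict a substitution to the first match in a file. A small Python sketch of s/old/new/N semantics, assuming a literal replacement string (no & or backreferences); replace_nth_per_line is a hypothetical name:

```python
import re


def replace_nth_per_line(text: str, old: str, new: str, n: int) -> str:
    """Like sed 's/old/new/N': on each line, replace only the Nth match of old."""
    pat = re.compile(old)
    out = []
    for line in text.split("\n"):
        matches = list(pat.finditer(line))
        if len(matches) >= n:  # lines with fewer than n matches stay unchanged
            m = matches[n - 1]
            line = line[:m.start()] + new + line[m.end():]
        out.append(line)
    return "\n".join(out)


print(replace_nth_per_line("123a456a789a\n12a34a56\na12", "a", "b", 1))
```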
The use case may be that your occurrences are spread throughout your file, but you know your only concern is in the first 10, 20 or 100 lines.
Then simply addressing those lines fixes the issue, even if the wording of the OP asks for the first occurrence only.
sed '1,10s/#include/#include "newfile.h"\n#include/'
The following command removes the first occurrence of a string within a file. It also strips trailing spaces and deletes empty lines (every empty line in the file, not just the one left behind). It is presented on an XML file, but it would work with any file.
Useful if you work with XML files and want to remove a tag. In this example it removes the first occurrence of the "isTag" tag.
Command:
sed -e '0,/<isTag>false<\/isTag>/{s/<isTag>false<\/isTag>//}' -e 's/ *$//' -e '/^$/d' source.txt > output.txt
Source file (source.txt)
<xml>
<testdata>
<canUseUpdate>true</canUseUpdate>
<isTag>false</isTag>
<moduleLocations>
<module>esa_jee6</module>
<isTag>false</isTag>
</moduleLocations>
<node>
<isTag>false</isTag>
</node>
</testdata>
</xml>
Result file (output.txt)
<xml>
<testdata>
<canUseUpdate>true</canUseUpdate>
<moduleLocations>
<module>esa_jee6</module>
<isTag>false</isTag>
</moduleLocations>
<node>
<isTag>false</isTag>
</node>
</testdata>
</xml>
P.S.: it didn't work for me on Solaris SunOS 5.10 (quite old), but it works on Linux 2.6 with sed version 4.1.5 (0,/re/ is a GNU extension).
Nothing new, but perhaps a slightly more concrete answer: sed -rn '0,/foo(bar).*/ s%%\1%p'
Example: xwininfo -name unity-launcher produces output like:
xwininfo: Window id: 0x2200003 "unity-launcher"
Absolute upper-left X: -2980
Absolute upper-left Y: -198
Relative upper-left X: 0
Relative upper-left Y: 0
Width: 2880
Height: 98
Depth: 24
Visual: 0x21
Visual Class: TrueColor
Border width: 0
Class: InputOutput
Colormap: 0x20 (installed)
Bit Gravity State: ForgetGravity
Window Gravity State: NorthWestGravity
Backing Store State: NotUseful
Save Under State: no
Map State: IsViewable
Override Redirect State: no
Corners: +-2980+-198 -2980+-198 -2980-1900 +-2980-1900
-geometry 2880x98+-2980+-198
Extracting window ID with xwininfo -name unity-launcher|sed -rn '0,/^xwininfo: Window id: (0x[0-9a-fA-F]+).*/ s%%\1%p' produces:
0x2200003
POSIXly (also valid in GNU sed), using only one regex and needing memory for only one line (as usual):
sed '/\(#include\).*/!b;//{h;s//\1 "newfile.h"/;G};:1;n;b1'
Explained:
sed '
/\(#include\).*/!b # Only one regex used. On lines not matching
# the text `#include` **yet**,
# branch to end, cause the default print. Re-start.
//{ # On first line matching previous regex.
h # hold the line.
s//\1 "newfile.h"/ # replace the matched text with #include "newfile.h".
G # append a newline plus the held original line.
} # end of replacement.
:1 # Once **one** replacement got done (the first match)
n # Loop continually reading a line each time
b1 # and printing it by default.
' # end of sed script.
A possible solution here might be to tell the compiler to include the header without it being mentioned in the source files. In GCC there are these options:
-include file
Process file as if "#include "file"" appeared as the first line of
the primary source file. However, the first directory searched for
file is the preprocessor's working directory instead of the
directory containing the main source file. If not found there, it
is searched for in the remainder of the "#include "..."" search
chain as normal.
If multiple -include options are given, the files are included in
the order they appear on the command line.
-imacros file
Exactly like -include, except that any output produced by scanning
file is thrown away. Macros it defines remain defined. This
allows you to acquire all the macros from a header without also
processing its declarations.
All files specified by -imacros are processed before all files
specified by -include.
Microsoft's compiler has the /FI (forced include) option.
This feature can be handy for some common header, like platform configuration. The Linux kernel's Makefile uses -include for this.
I needed a solution that would work both on GNU and BSD, and I also knew that the first line would never be the one I'd need to update:
sed -e "1,/pattern/s/pattern/replacement/"
Trying the // feature to avoid repeating the pattern did not work for me, hence the repetition.
I will make a suggestion that is not exactly what the original question asks for, but may help those who want to replace, say, the second occurrence of a match, or any other specifically numbered match: use a Python script with a for loop, and call it from a Bash script if needed. Here is what it looked like for me, where I was replacing specific lines containing the string --project:
import re


def replace_models(file_path, pixel_model, obj_model):
    # Find the lines containing --project
    pattern = re.compile(r'--project.*')
    new_file = ""
    with open(file_path, 'r') as f:
        match = 1
        for line in f:
            # Remove only the line ending before we do the replacement
            line = line.rstrip('\n')
            count = 0
            # Replace the first --project line match with the pixel model
            if match == 1:
                result, count = pattern.subn("--project='" + pixel_model + "'", line)
            # Replace the second --project line match with the object model
            elif match == 2:
                result, count = pattern.subn("--project='" + obj_model + "'", line)
            else:
                result = line
            # Check that a substitution was actually made
            if count:
                # Add a backslash (line continuation) to the replaced line
                result += " \\"
                print("\nReplaced ", line, " with ", result)
                # Increment the number of matches found
                match += 1
            # Add the potentially modified line to our new file
            new_file = new_file + result + "\n"
    # The with block has closed the input file; now save the output
    with open(file_path, "w") as fout:
        fout.write(new_file)
sed -e 's/pattern/REPLACEMENT/1' <INPUTFILE

How to avoid the last newline in sed?

I want to remove the last part of a file, starting at a line following a certain pattern and including the preceding newline.
So, stopping at "STOP", the following file:
keep\n
STOP\n
whatever
Should output:
keep
With no trailing newline.
I tried this, and the logic seems to work, but it seems that sed adds a newline every time it prints its buffer. How can I avoid that? When sed doesn't manipulate the buffer, I don't have that problem (i.e. if I remove the STOP line, sed outputs 'whatever' at the end of the file without a newline).
printf 'keep
STOP
Whatever' | sed 'N
/\nSTOP/ {
s/\n.*$//
P
Q
}
P
D'
I'm trying to write a git clean filter, and I cannot have an extra newline appended every time I commit.
$ awk '/^STOP/{exit} {printf "%s%s", ors, $0; ors=RS}' file
keep$
The above prints every line without a trailing newline, but precedes every 2nd and subsequent line with a newline (\n or \r\n, whichever your environment dictates, so it'll behave correctly on UNIX or Windows or whatever). When it finds a STOP line, it just exits without printing it.
Note that the above doesn't keep anything in memory except the current line so it'll work no matter how large your input file is and no matter where the STOP appears in it - it'll even work if STOP is the first line of the file unlike the other answers you have so far.
It will also work using any awk in any shell on every UNIX box.
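The trick in this awk answer is to print the separator before each line rather than after it, so nothing is ever written after the last line. A minimal Python sketch of the same streaming idea, assuming \n line endings and a line starting with STOP as the cut-off (copy_until_stop is a hypothetical name):

```python
import io
from typing import Iterable, TextIO


def copy_until_stop(lines: Iterable[str], out: TextIO, stop: str = "STOP") -> None:
    """Write lines up to (not including) the first line starting with stop, with no trailing newline."""
    sep = ""  # like awk's 'ors': empty before the first line, a newline afterwards
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(stop):  # like /^STOP/{exit}
            break
        out.write(sep + line)
        sep = "\n"


buf = io.StringIO()
copy_until_stop(io.StringIO("keep\nSTOP\nwhatever"), buf)
print(repr(buf.getvalue()))
```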
This might work for you (GNU sed):
sed -z 's/\nSTOP.*//' file
The -z option slurps the whole file into memory (assuming it contains no NUL characters), and the substitute command removes the remainder of the file from the first newline followed by STOP.
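For comparison, the slurping approach corresponds to a single regex substitution over the whole text. A Python sketch, assuming the input fits in memory; re.DOTALL is needed because, unlike GNU sed's ., Python's . does not match a newline by default (cut_at_stop is a hypothetical name):

```python
import re


def cut_at_stop(text: str) -> str:
    # Remove everything from the first newline that is followed by STOP
    return re.sub(r"\nSTOP.*", "", text, count=1, flags=re.DOTALL)


print(repr(cut_at_stop("keep\nSTOP\nwhatever")))
```

Like the sed command, this needs a newline before STOP, so a STOP on the very first line is not caught.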
Using awk you could:
$ awk '$0=="STOP"{exit} {b=b (b==""?"":ORS) $0} END{printf "%s",b}' file
Output:
keep$
Explained:
$ awk '
$0=="STOP" { exit } # exit at STOP, ie. go to END
{ b=b (b==""?"":ORS) $0 } # gather an output buffer, control \n
END { printf "%s",b } # in the END, print the output buffer
' file
... more (focusing a bit on the conditional operator):
b=b # appending to b, so b is b and ...
(b==""?"":ORS) # if b was empty, add nothing to it, if not add ORS ie. \n ...
$0 # and the current record

Sed insert back reference into command

Can I use a matched group from a sed command in another command that generates the replacement? Something like this:
sed -e 's/\(<regex>\)/$(<command using \1 reference and generating replacement>)/g'
I need it to make replacements in one file according to the contents of another file (the replacement is not constant; it depends on the specific line being replaced).
As @EtanReisner mentions, this is possible only with GNU sed -- and still somewhat tricky. Also, it is potentially dangerous, and you should only use it if the input comes from a trustworthy source.
Anyway, the e modifier to the s/// command treats the contents of the pattern space after the substitution was made as a shell command, runs it, and replaces the pattern space with the output of that command, which means that the output will have to be shunted into place manually. A general pattern for this is
sed '/regex/ { h; s//\n/; x; s//\n&\n/; s/.*\n\(.*\)\n.*/command \1/e; x; G; s/\([^\n]*\)\n\([^\n]*\)\n\(.*\)/\1\3\2/ }' filename
Let's go through this from the top:
/regex/ { # When we find what we seek:
h # Make a copy of the current line in
# the hold buffer.
s//\n/ # Put a newline where the match occurs
# (// reattempts the last attempted
# regex, which is the one from the
# start). This serves as a marker
# where the output of the command will
# be inserted.
x # Swap the copy back in; the marked
# line moves to the hold buffer
s//\n&\n/ # put markers around the match this
# time,
s/.*\n\(.*\)\n.*/command \1/e # then use those markers to construct
# the command and run it. The pattern
# space contains the output of the
# command now.
x # swap the marked line back in
G # append the output to it
s/\([^\n]*\)\n\([^\n]*\)\n\(.*\)/\1\3\2/ # split, reassemble all that in
# the right order, using the
# newline marker we put there in
# the beginning as a splitting
# point.
}
regex and command have to be replaced with your regex and command, obviously. You can try this out with
echo 'foo /tmp/ bar' | sed '/\/\S*/ { h; s//\n/; x; s//\n&\n/; s/.*\n\(.*\)\n.*/ls \1/e; x; G; s/\([^\n]*\)\n\([^\n]*\)\n\(.*\)/\1\3\2/ }'
This will run ls /tmp/ and put the listing between foo and bar.
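The structure of the recipe (split the line at the match, run a command on the matched text only, then reassemble before + output + after) may be easier to follow in a general-purpose language. A Python sketch of that structure, in which a Python function stands in for the shell command; splice_command_output is a hypothetical name:

```python
import re
from typing import Callable


def splice_command_output(line: str, regex: str, command: Callable[[str], str]) -> str:
    """Replace the first match of regex with command(matched_text), keeping the text around it."""
    m = re.search(regex, line)
    if m is None:
        return line  # like the /regex/ address not matching: the line passes through
    pre, matched, post = line[:m.start()], m.group(0), line[m.end():]
    output = command(matched)
    if output.endswith("\n"):
        output = output[:-1]  # GNU sed's e flag also suppresses one trailing newline
    return pre + output + post  # reassemble: before + command output + after


# A Python function stands in for the shell command here.
print(splice_command_output("foo /tmp/ bar", r"/\S*", lambda s: s.upper() + "\n"))
```

To run a real program instead, command could wrap subprocess.run(["ls", s], capture_output=True, text=True).stdout.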
You might find it simpler and clearer to use awk, e.g. to multiply some number in the middle of the input by 3:
$ echo 'abc 12 def' |
awk 'match($0,/[0-9]+/) {print substr($0,1,RSTART-1) substr($0,RSTART,RLENGTH)*3 substr($0,RSTART+RLENGTH)}'
abc 36 def
With GNU awk you can use the 3rd arg to match() to save the regexp matching segments:
$ echo 'abc 12 def' |
awk 'match($0,/(.* )([0-9]+)( .*)/,a){print a[1] a[2]*3 a[3]}'
abc 36 def
or to pass it to a shell command (probably not a good idea, but can be done):
$ echo 'abc 12 def' |
awk 'match($0,/(.* )([0-9]+)( .*)/,a){system("echo \"" a[2] "\"")}'
12
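Python's re.sub goes one step further than awk's match(): the replacement can be a function that receives the match object, so the new text is computed from the matched digits without manual substring arithmetic. A minimal sketch (triple_first_number is a hypothetical name):

```python
import re


def triple_first_number(line: str) -> str:
    # The lambda gets the match object, much like using substr($0,RSTART,RLENGTH) in awk
    return re.sub(r"[0-9]+", lambda m: str(int(m.group(0)) * 3), line, count=1)


print(triple_first_number("abc 12 def"))  # abc 36 def
```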

sed recipe: how to do stuff between two patterns that can be either on one line or on two lines?

Let's say we want to do some substitutions only between some patterns, let them be <a> and </a> for clarity... (all right, all right, they're start and end!.. Jeez!)
So I know what to do if start and end always occur on the same line: just design a proper regex.
I also know what to do if they're guaranteed to be on different lines and I don't care about anything in the line containing end and I'm also OK with applying all the commands in the line containing start before start: just specify the address range as /start/,/end/.
This, however, doesn't sound very useful. What if I need to do a smarter job, for instance, introduce changes inside a {...} block?
One thing I can think of is breaking the input on { and } before processing and putting it back together afterwards:
sed 's/[{}]/\n&\n/g' input | sed 'main stuff' | sed ':a;$!{N;ba}; s/\n\(}\|{\)\n/\1/g'
Another option is the opposite:
cat input | tr '\n' '#' | sed 'whatever; s/#/\n/g'
Both of these are ugly, mainly because the operations are not confined within a single command. The second one is even worse because one has to use some character or substring as a "newline holder" assuming it isn't present in the original text.
So the question is: are there better ways or can the above-mentioned ones be optimized? This is quite a regular task from what I read in recent SO questions, so I'd like to choose the best practice once and for all.
P.S. I'm mostly interested in pure sed solutions: can the job be done with one invocation of sed and nothing else? Please no awk, Perl, etc.: this is more of a theoretical question, not a "need the job done asap" one.
This might work for you:
# create multiline test data
cat <<\! >/tmp/a
> this
> this { this needs
> changing to
> that } that
> that
> !
sed '/{/!b;:a;/}/!{$q;N;ba};h;s/[^{]*{//;s/}.*//;s/this\|that/\U&/g;x;G;s/{[^}]*}\([^\n]*\)\n\(.*\)/{\2}\1/' /tmp/a
this
this { THIS needs
changing to
THAT } that
that
# convert multiline test data to a single line
tr '\n' ' ' </tmp/a >/tmp/b
sed '/{/!b;:a;/}/!{$q;N;ba};h;s/[^{]*{//;s/}.*//;s/this\|that/\U&/g;x;G;s/{[^}]*}\([^\n]*\)\n\(.*\)/{\2}\1/' /tmp/b
this this { THIS needs changing to THAT } that that
Explanation:
Read the data into the pattern space (PS). /{/!b;:a;/}/!{$q;N;ba}
Copy the data into the hold space (HS). h
Strip non-data from front and back of string. s/[^{]*{//;s/}.*//
Convert data e.g. s/this\|that/\U&/g
Swap to HS and append converted data. x;G
Replace old data with converted data. s/{[^}]*}\([^\n]*\)\n\(.*\)/{\2}\1/
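Outside sed, the same job needs no slurping loop or hold-space juggling, because a regex over the whole text can match a block that spans lines. A Python sketch, assuming that blocks are not nested and that the transformation is the uppercasing of this/that from the example (transform_blocks is a hypothetical name):

```python
import re


def transform_blocks(text: str) -> str:
    """Uppercase 'this'/'that' only inside {...} blocks; a block may span several lines."""

    def convert(block):
        # block is the match object for one whole {...} block
        return re.sub(r"this|that", lambda w: w.group(0).upper(), block.group(0))

    # [^{}]* also matches newlines, so a block can start and end on different lines
    return re.sub(r"\{[^{}]*\}", convert, text)


sample = "this\nthis { this needs\nchanging to\nthat } that\nthat\n"
print(transform_blocks(sample), end="")
```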
EDIT:
A more complicated answer which I think caters for more than one block per line.
# slurp file into pattern space (PS)
:a
$! {
N
ba
}
# if \v is already present, quit with exit status 1
/\v/q1
# replace original newlines with \v's
y/\n/\v/
# append a newline to PS as a delimiter
G
# copy PS to hold space (HS)
h
# starting from right to left delete everything but blocks
:b
s/\(.*\)\({.*}\).*\n/\1\n\2/
tb
# delete any non-block details from the start of the file
s/.*\n//
# PS contains only block details
# do any block processing here e.g. uppercase this and that
s/th\(is\|at\)/\U&/g
# append PS to HS
H
# swap to HS
x
# replace each original block with its processed one from right to left
:c
s/\(.*\){.*}\(.*\)\n\n\(.*\)\({.*}\)/\1\n\n\4\2\3/
tc
# delete newlines
s/\n//g
# restore original newlines
y/\v/\n/
# done!
N.B. This uses GNU-specific features, but could be tweaked to work with generic seds.

How to restrict a find and replace to only one column within a CSV?

I have a 4-column CSV file, e.g.:
0001 # fish # animal # eats worms
I use sed to do a find and replace on the file, but I need to limit this find and replace to only the text found inside column 3.
How can I have a find and replace only occur on this one column?
Are you sure you want to be using sed? What about csvfix? Is your CSV nice and simple, with no quotes or embedded commas or other nasties that make regexes a less than satisfactory way of dealing with a general CSV file? I'm assuming that the # is the 'comma' in your format.
Consider using awk instead of sed:
awk -F# '$3 ~ /pattern/ { OFS= "#"; $3 = "replace"; } { print }'
Arguably, you should have a BEGIN block that sets OFS once. For one line of input, it didn't make any odds (and you'd probably be hard-pressed to measure a difference on a million lines of input, too):
$ echo "pattern # pattern # pattern # pattern" |
> awk -F# '$3 ~ /pattern/ { OFS= "#"; $3 = "replace"; } { print }'
pattern # pattern #replace# pattern
$
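Following the BEGIN-block suggestion, a minimal sketch of the same awk program with OFS set once and the pattern and replacement passed in with -v. The function name replace_col3 is hypothetical; note that awk processes backslash escapes in -v values, and that $3 ~ pat treats pat as a dynamic regex.

```shell
# Hypothetical helper: replace field 3 of '#'-separated lines when it matches a regex.
replace_col3() {
  awk -F'#' -v pat="$1" -v rep="$2" '
    BEGIN { OFS = "#" }     # set the output separator once
    $3 ~ pat { $3 = rep }   # assigning to $3 rebuilds the line using OFS
    { print }               # print every line, changed or not
  '
}

echo 'pattern # pattern # pattern # pattern' | replace_col3 pattern replace
echo 'a # b # c # d' | replace_col3 pattern replace   # no match: printed unchanged
```

Lines whose third field does not match are never rebuilt, so their original spacing survives.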
If sed still seems appealing, then:
sed '/^\([^#]*#[^#]*\)#pattern#\(.*\)/ s//\1#replace#\2/'
For example (and note the slightly different input and output – you can fix it to handle the same as the awk quite easily if need be):
$ echo "pattern#pattern#pattern#pattern" |
> sed '/^\([^#]*#[^#]*\)#pattern#\(.*\)/ s//\1#replace#\2/'
pattern#pattern#replace#pattern
$
The regex (used as the address and, via the empty s//, as the substitution pattern) looks for the start of a line, a field of non-# characters, a #, another field of non-# characters, and remembers the lot; it then looks for a #, the pattern (which must be in the third field since the first two fields have already been matched), another #, and then the residue of the line. When the line matches, it replaces the line with the first two fields (unchanged, as required), then the replacement third field, and then the residue of the line (unchanged, as required).
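If the third field should be overwritten whatever it contains, the "skip the first two fields" part can also be written with a BRE interval expression \{n\}, which turns the field number into a parameter. A minimal sketch; set_field is a hypothetical name, N is assumed to be at least 2, and the replacement is assumed not to contain /, & or \, which are special in s///.

```shell
# Hypothetical helper: overwrite field N (N >= 2) of '#'-separated lines.
set_field() {
  skip=$(( $1 - 1 ))
  # \(\([^#]*#\)\{skip\}\) remembers the first N-1 fields plus their separators;
  # [^#]* is the field being replaced.
  sed "s/^\(\([^#]*#\)\{$skip\}\)[^#]*/\1$2/"
}

echo 'pattern#pattern#pattern#pattern' | set_field 3 replace
```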
If you need to edit rather than simply replace the third field, then consider using awk or Perl or Python. If you are still constrained to sed, then you explore using the hold space to hold part of the line while you manipulate the other part in the pattern space, and end up re-integrating your desired output line from the hold space and pattern space before printing the line. That's nearly as messy as it sounds; actually, possibly even messier than it sounds. I'd go with Perl (because I learned it long ago and it does this sort of thing quite easily), but you can use whichever non-sed tool you like.
Perl editing the third field. Note that the default output is $_, which had to be reassembled from the auto-split fields in the array @F.
$ echo "pattern#pattern#pattern#pattern" |
> perl -pa -F# -e '$F[2] =~ s/\s*pat(\w\w)rn\s*/ prefix-$1-suffix /; $_ = join "#", @F;'
pattern#pattern# prefix-te-suffix #pattern
$
An explanation. The -p means 'loop, reading lines into $_ and printing $_ at the end of each iteration'. The -a means 'auto-split $_ into the array @F'. The -F# means the field separator is #. The -e is followed by the Perl program. Arrays are indexed from 0 in Perl, so the third field ends up in $F[2] (the sigil, @ or $, changes depending on whether you're working with the array as a whole or with a single value from it). The =~ is the binding operator; it applies the regex on the RHS to the value on the LHS. The substitute pattern recognizes zero or more spaces \s* followed by pat, then two 'word' characters which are remembered into $1, then rn and zero or more spaces again; maybe there should be a ^ and $ in there to bind to the start and end of the field. The replacement is a space, 'prefix-', the remembered pair of letters, '-suffix' and a space. The $_ = join "#", @F; reassembles the input line $_ from the possibly modified separate fields, and then the -p prints that out. Not quite as tidy as I'd like (so there's probably a better way to do it), but it works. And you can do arbitrary transforms on arbitrary fields in Perl without much difficulty. Perl also has a module Text::CSV (and a high-speed C version, Text::CSV_XS) which can handle really complex CSV files.
Essentially break the line into three pieces, with the pattern you're looking for in the middle. Then keep the outer pieces and replace the middle.
/^\([^#]*#[^#]*#[^#]*\)pattern\([^#]*#.*\)/s//\1replacement\2/
^\([^#]*#[^#]*#[^#]*\) - gather everything before the pattern, from the start of the line through the 2nd # and any text in the 3rd column before the match - this becomes \1
pattern - the thing you're looking for
\([^#]*#.*\) - gather everything after the pattern - this becomes \2
Then replace the line with \1, followed by the replacement, followed by \2 (everything after the pattern).
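A sketch of the three-piece approach wrapped in a shell function (subst_col3 is a hypothetical name), applied to a line where the word also appears in columns 2 and 4. Because [^#]* is greedy, only the last occurrence inside column 3 is replaced; the word and replacement are assumed to contain no regex metacharacters, / or &.

```shell
# Hypothetical helper: replace a literal word inside column 3 only.
subst_col3() {
  # \1 = columns 1-2 plus the start of column 3, \2 = rest of column 3 onwards.
  sed "s/^\([^#]*#[^#]*#[^#]*\)$1\([^#]*#.*\)/\1$2\2/"
}

echo '0001 # worms # worms # eats worms' | subst_col3 worms food
```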
This might work for you:
echo '0001 # fish # animal # eats worms' |
sed 's/#/&\n/2;s/#/\n&/3;h;s/\n#.*//;s/.*\n//;y/a/b/;G;s/\([^\n]*\)\n\([^\n]*\).*\n/\2\1/'
0001 # fish # bnimbl # eats worms
Explanation:
Mark out the field to be worked on (in this case the 3rd) by inserting a newline (\n) directly after the 2nd # and directly before the 3rd #. s/#/&\n/2;s/#/\n&/3
Save the line in the hold space. h
Delete the fields on either side. s/\n#.*//;s/.*\n//
Now process the field i.e. change all a's to b's. y/a/b/
Now append the original line. G
Substitute the new field for the old field (also removing any newlines). s/\([^\n]*\)\n\([^\n]*\).*\n/\2\1/
N.B. In step 4 the pattern space contains only the isolated field, so any number of commands may be carried out there without affecting the rest of the line.
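The same hold-space steps generalize to any field except the first and the last by computing the two occurrence numbers. A minimal sketch assuming GNU sed and # as the separator; edit_field is a hypothetical name, and its second argument is any sed command(s) to run on the isolated field.

```shell
# Hypothetical helper: run sed command(s) $2 on field $1 only.
# Valid for 1 < N < number of fields (needs a '#' on both sides of the field).
edit_field() {
  n=$1 cmd=$2
  # newline after the (N-1)th '#', newline before the Nth '#', then the steps above
  sed "s/#/&\n/$((n - 1));s/#/\n&/$n;h;s/\n#.*//;s/.*\n//;$cmd;G;s/\([^\n]*\)\n\([^\n]*\).*\n/\2\1/"
}

echo '0001 # fish # animal # eats worms' | edit_field 3 'y/a/b/'
echo '0001 # fish # animal # eats worms' | edit_field 2 'y/i/I/'
```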