Using invert range in sed

I have this:
$ cat f2
123-foo-456
abc-xx
foo-yy
ddd-ao
abc
6778
123
This gives me: (#1)
$ sed -n -e '/456/,/ddd/{/ddd/{!s/a/A/g;!s/o/Q/g};p}' f2
123-foo-456
abc-xx
foo-yy
ddd-ao
And this gives me: (#2)
$ sed -n -e '/456/,/ddd/{/ddd/!{s/a/A/g;s/o/Q/g};p}' f2
123-fQQ-456
Abc-xx
fQQ-yy
ddd-ao
I prefer #2 since it does what I wanted to get as output.
Can someone explain the difference between the two?
And a good source of documentation that explains the difference?

/ddd/{!s/a/A/g;!s/o/Q/g}
When ddd is on the line (the pattern space),
execute the block { ... }.
Inside the block, each ! negates an empty address. An empty address selects every line, so its negation selects no line at all: s/a/A/g and s/o/Q/g never run.
So it does nothing.
/ddd/!{s/a/A/g;s/o/Q/g}
When ddd is NOT on the line (here ! applies to the address /ddd/),
execute the block { ... }:
substitute (s/a/A/g), ...
It changes a to A (and o to Q) on every line of the range that does not contain ddd.
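Command #2 spelled out as a commented multi-line script, so it is easier to see which address each ! applies to. This is a sketch for GNU sed (which accepts comment lines inside blocks); the helper name transform_range is hypothetical, and it reads standard input instead of f2:

```shell
# transform_range: hypothetical helper wrapping command #2 from the question.
# Prints only the lines from the first /456/ match through the next /ddd/
# match, turning a into A and o into Q on every line of that range except
# the one that contains ddd.
transform_range() {
  sed -n '
/456/,/ddd/{
# The ! follows the address /ddd/, so it negates that address:
# the inner block runs only on lines WITHOUT ddd.
/ddd/!{
s/a/A/g
s/o/Q/g
}
# p sits outside the inner block, so every line of the range is printed.
p
}'
}

printf '123-foo-456\nabc-xx\nfoo-yy\nddd-ao\nabc\n6778\n123\n' | transform_range
```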

There is no noteworthy difference between the 2. They are both unintelligible sequences of random characters that became obsolete in the mid-1970s when awk was invented and so should never be used. sed is for simple substitution on individual lines, that is all. If you're using more than s, g, and p (with -n) then you're using the wrong tool. Stop wasting your time on this and just use awk:
$ cat tst.awk
/456/ { f=1 }
f {
if (/ddd/) {
f=0
}
else {
gsub(/a/,"A")
gsub(/o/,"Q")
}
print
}
$ awk -f tst.awk file
123-fQQ-456
Abc-xx
fQQ-yy
ddd-ao
Clear, simple, concise, robust, efficient, portable and better in every other way than an equivalent sed solution.
Or if having everything squeezed onto one line is appealing to you:
$ awk '/456/{f=1}f{if(/ddd/)f=0;else{gsub(/a/,"A");gsub(/o/,"Q")}print}' file
123-fQQ-456
Abc-xx
fQQ-yy
ddd-ao
You COULD write the awk script in the same style as the sed script:
$ awk '/456/,/ddd/{if(!/ddd/){gsub(/a/,"A");gsub(/o/,"Q")}print}' file
123-fQQ-456
Abc-xx
fQQ-yy
ddd-ao
but then you get the duplicated conditions (/ddd/ twice) that come with using range expressions which is one reason why they should never be used. Fortunately, unlike sed, awk has variables and so you never need to write range expressions.

Related

Use sed to take all lines containing regex and append to end of file

I'm trying to come up with a sed script to take all lines containing a pattern and move them to the end of the output. This is an exercise in learning hold vs pattern space and I'm struggling to come up with it (though I feel close).
I'm here:
$ echo -e "hi\nfoo1\nbar\nsomething\nfoo2\nyo" | sed -E '/foo/H; //d; $G'
hi
bar
something
yo

foo1
foo2
But I want the output to be:
hi
bar
something
yo
foo1
foo2
I understand why this is happening. It is because the first time we find foo the hold space is empty so the H appends \n to the blank hold space and then the first foo, which I suppose is fine. But then the $G does it again, namely another append which appends \n plus what is in the hold space to the pattern space.
I tried a final delete command, /^$/d, but that didn't remove the blank line (I think this is because the pattern is matched not against the last line alone, but against the now multiline pattern space, which has a \n\n in it).
I'm sure the sed gurus have a fix for me.
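The diagnosis above can be checked with sed's l command, which prints the pattern space unambiguously: embedded newlines appear as \n and the end of the pattern space as $. A sketch with the input shortened (GNU sed); the doubled \n in the dump is the unwanted blank line:

```shell
# Same script as in the question, plus $l to dump the final pattern space.
# -n suppresses normal output, so only the l dump is shown: yo\n\nfoo1$
printf 'hi\nfoo1\nyo\n' | sed -n '/foo/H; //d; $G; $l'
```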
This might work for you (GNU sed):
sed '/foo/H;//!p;$!d;x;//s/.//p;d' file
If the line contains the required string, append it to the hold space (HS); otherwise print it as normal (the empty regex // reuses the last regex, /foo/). If it is not the last line, delete it; otherwise swap the HS with the pattern space (PS). If the required string(s) are now in the PS (what was the HS), the first character will be a newline, since each matching line was appended with H; delete that first character and print. Finally, delete whatever is left.
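The same script split into one command per line with comments (GNU sed, where . also matches a newline). The helper name move_foo_to_end is hypothetical:

```shell
# move_foo_to_end: hypothetical helper; prints lines without foo in order,
# then all lines containing foo at the end.
move_foo_to_end() {
  sed '
# Append a matching line to the hold space (H adds a leading newline).
/foo/H
# An empty regex reuses the last one (/foo/): print non-matching lines.
//!p
# On every line but the last, delete the pattern space and start the next cycle.
$!d
# Last line: swap, so the pattern space now holds the collected foo lines.
x
# If anything was collected, strip the leading newline and print the result.
//s/.//p
# Delete the pattern space so the automatic print adds nothing further.
d'
}

printf 'hi\nfoo1\nbar\nsomething\nfoo2\nyo\n' | move_foo_to_end
```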
An alternative, using the -n flag:
sed -n '/foo/H;//!p;$!b;x;//s/.//p' file
N.B. When d, or b without a label, is executed, no further commands in the script are run for the current line: the next line is read into the PS and the script starts again from the first command, i.e. execution does not resume after the d or b command.
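The difference between d and a bare b shows up when -n is not used: both end the current cycle, but d also discards the pattern space, while b still lets the automatic print happen. A small demonstration (GNU sed):

```shell
# d ends the cycle and suppresses the automatic print: no output at all.
printf 'a\nb\n' | sed 'd; s/^/never /'
# A bare b jumps to the end of the script: a and b are printed unchanged,
# and the s command is never reached.
printf 'a\nb\n' | sed 'b; s/^/never /'
```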
Why? Stuff like this is absolutely trivial in awk, awk is available everywhere that sed is, and the resulting awk script will be simpler, more portable, faster and better in almost every other way than a sed script to do the same task. All that hold space stuff was necessary in sed before the mid-1970s when awk was invented but there's absolutely no use for it now other than as a mental exercise.
$ echo -e "hi\nfoo1\nbar\nsomething\nfoo2\nyo" |
awk '/foo/{buf = buf $0 RS;next} {print} END{printf "%s",buf}'
hi
bar
something
yo
foo1
foo2
The above will work as-is in every awk on every UNIX installation and I bet you can figure out how it works very easily.
This feels like a hack and I think it should be possible to handle this situation more gracefully. The following works on GNU sed:
echo -e "hi\nfoo1\nbar\nsomething\nfoo2\nyo" | sed -r '/foo/{H;d;}; $G; s/\n\n/\n/g'
However, on OSX/BSD sed, results in this odd output:
hi
bar
something
yonfoo1
foo2
Note that the two consecutive newlines were replaced with the literal character n.
The OSX/BSD vs GNU sed difference is explained in this article. The following works on OSX/BSD (and in GNU sed as well):
echo -e "hi\nfoo1\nbar\nsomething\nfoo2\nyo" | sed '/foo/{H;d;}; $G; s/\n\n/\'$'\n''/'
TL;DR: BSD sed does not accept escape sequences such as \n in the replacement (RHS) of the s command, so you either have to type a real newline (LF), preceded by a backslash, into the replacement on the command line, or do as above: split the sed script string where the newline is needed and insert $'\n', which bash (and other shells with ANSI-C quoting) expands to a line feed.
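A variant that avoids putting a newline in the replacement at all: on the last line, G produces yo\n\nfoo1\nfoo2, and deleting only the first newline with s/\n// leaves the single separator that is wanted. Because \n then appears only in the regex, where POSIX sed accepts it, this should sidestep the BSD quoting problem (verified here with GNU sed only). As a side effect, no blank line is added when nothing matched. Like the commands above, it assumes the last input line does not contain foo; otherwise /foo/{H;d;} deletes that line before $G can run. The helper name is hypothetical:

```shell
# move_foo_portable: hypothetical helper.
# Assumes the last input line does not itself contain foo.
move_foo_portable() { sed '/foo/{H;d;}; ${G;s/\n//;}'; }

printf 'hi\nfoo1\nbar\nsomething\nfoo2\nyo\n' | move_foo_portable
```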

Select specific items from a file using sed

I'm very much a junior when it comes to the sed command, and my Bruce Barnett guide sits right next to me, but one thing has been troubling me. With a file, can you filter it using sed to select only specific items? For example, in the following file:
alpha|november
bravo|october
charlie|papa
alpha|quebec
bravo|romeo
charlie|sahara
Would it be possible to set a command to return only the bravos, like:
bravo|october
bravo|romeo
With sed:
sed '/^bravo|/!d' filename
Alternatively, with grep (because it's sort of made for this stuff):
grep '^bravo|' filename
or with awk, which works nicely for tabular data,
awk -F '|' '$1 == "bravo"' filename
The first two use a regular expression and select the lines that match it. In ^bravo|, ^ matches the beginning of the line and bravo| matches the literal string bravo| (in a basic regular expression, the default for sed and grep, | is an ordinary character), so this selects all lines that begin with bravo|.
The awk way splits the line across the field separator | and selects those lines whose first field is bravo.
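A quick check that the sed, grep and awk commands select the same lines. The sample data from the question is written to a temporary file created with mktemp; the variable names are only for illustration:

```shell
# Build the sample file from the question in a temporary location.
tmp=$(mktemp)
printf '%s\n' 'alpha|november' 'bravo|october' 'charlie|papa' \
  'alpha|quebec' 'bravo|romeo' 'charlie|sahara' > "$tmp"

# In a basic regex (the sed and grep default) | is an ordinary character.
by_sed=$(sed '/^bravo|/!d' "$tmp")
by_grep=$(grep '^bravo|' "$tmp")
# awk splits on the single character | and compares field 1 as a string.
by_awk=$(awk -F '|' '$1 == "bravo"' "$tmp")

printf '%s\n' "$by_sed"
rm -f "$tmp"
```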
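You could also use a regex with awk, but awk uses extended regular expressions, where | means alternation, so the pipe has to be bracketed (or escaped) to be matched literally:
awk '/^bravo[|]/' filename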
...but I don't think this plays to awk's strengths in this case.
Another solution with sed:
sed -n '/^bravo|/p' filename
-n option => no printing by default.
If line begins with bravo|, print it (p)
There are (at least) 2 ways with sed:
removing unwanted lines
sed '/^bravo|/ !d' YourFile
printing only wanted lines
sed -n '/^bravo|/ p' YourFile
If no other constraint or action follows, both are equivalent, and grep is a better fit.
If further commands follow, they behave differently: d ends the cycle immediately and moves on to the next line, whereas p prints the line and then continues with the following commands.
Note that in a basic regex (sed's default) the pipe is an ordinary character and must not be escaped: GNU sed treats \| as alternation, so /^bravo\|/ would match every line. Escape it (\|) only in extended mode (sed -E).
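How GNU sed treats the pipe in each regex flavour, shown on two made-up sample lines:

```shell
sample=$(printf 'alpha|x\nbravo|y')
# Basic regex: | is literal, so only the bravo line is printed.
printf '%s\n' "$sample" | sed -n '/^bravo|/p'
# Basic regex, GNU extension: \| is alternation with an empty branch,
# so the pattern matches every line and both lines are printed.
printf '%s\n' "$sample" | sed -n '/^bravo\|/p'
# Extended regex (-E): \| is a literal pipe, so only the bravo line is printed.
printf '%s\n' "$sample" | sed -En '/^bravo\|/p'
```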

Append text to a line on multiple conditions

I am very new to sed so please bear with me... I have a file with contents like
a=1
b=2,3,4
c=3
d=8
.
.
I want to append 'x' to a line which starts with 'c=' and does not contain an 'x'. What I am using right now is
sed -i '/^c=/ s/$/x/'
but this does not cover the second part of my requirement: the 'x' should only be appended if the line does not already contain one. As it stands, running the command twice turns the line into "c=3xx", which I do not want.
Any help here would be highly appreciated and I know there are a lot of sharp heads around here :) I understand that this can be handled pretty easily through bash but using sed here is a hard requirement.
You can do something like this:
sed -i '/^c=/ {/x/b; s/$/x/}' file
Curly brackets are used for grouping. The b command branches to the end of the script, skipping the remaining commands for the current line (the line is still printed as usual).
b label
Branch to label; if label is omitted, branch to end of script.
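A quick idempotence check of the b-based command (GNU sed). The helper name add_x is hypothetical, and it reads standard input instead of editing a file with -i:

```shell
# add_x: hypothetical helper; on c= lines, skip (b) if an x is already
# present, otherwise append x at the end of the line.
add_x() { sed '/^c=/ {/x/b; s/$/x/}'; }

# The second pass leaves the already-modified c= line unchanged.
printf 'a=1\nb=2,3,4\nc=3\nd=8\n' | add_x | add_x
```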
Edit: as William Pursell suggests in the comment, a shorter version would be
sed -i '/^c=/ { /x/ !s/$/x/ }' file
awk is probably a better choice here as you can easily combine regular expression matches with logical operators. Given the input:
$ cat file
a=1
b=2,3,4
c=3
c=x
c=3
d=8
The command would be:
$ awk '/^c=/ && !/x/ {$0 = $0 "x"} 1' file
a=1
b=2,3,4
c=3x
c=x
c=3x
d=8
Where $0 is the awk variable that contains the current line being read, and the trailing 1 is an always-true pattern whose default action prints every line, modified or not.
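awk writes to standard output rather than editing the file, so to mimic sed -i you can write to a temporary file and move it over the original (GNU awk 4.1 and later also offers -i inplace as an extension). A sketch; the helper name and the .tmp suffix are arbitrary choices:

```shell
# append_x_awk: hypothetical helper that rewrites FILE via a temporary copy.
append_x_awk() {
  awk '/^c=/ && !/x/ {$0 = $0 "x"} 1' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}
```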
This might work for you (GNU sed):
sed -i '/^c=[^x]*$/s/$/x/' file
or:
sed -i 's/^c=[^x]*$/&x/' file
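In these two commands the regex does the "no x yet" check itself: ^c=[^x]*$ only matches when every character after c= up to the end of the line is something other than x, and & in the replacement stands for the whole matched text. A quick check, wrapped in a hypothetical helper that reads standard input instead of using -i:

```shell
# add_x_once: hypothetical helper; appends x to c= lines that contain no x.
add_x_once() { sed 's/^c=[^x]*$/&x/'; }

# Running it twice gives the same result as running it once.
printf 'a=1\nc=3\nc=x\nd=8\n' | add_x_once | add_x_once
```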

join 2 consecutive rows under condition

I have 5 lines like:
typeA;pointA1
typeA;pointA2
typeA;pointA3
typeB;pointB1
typeB;pointB2
The resulting output should be:
typeA;pointA1;typeA;pointA2
typeA;pointA2;typeA;pointA3
typeB;pointB1;typeB;pointB2
Is it possible to use sed or awk for this purpose?
This is easy with awk:
awk -F';' '$1 == prevType { printf("%s;%s;%s\n", $1, prevPoint, $0) } { prevType = $1; prevPoint = $2 }'
I've assumed that the blank lines between the records are not part of the input; if they are, just run the input through grep -v '^$' before awk.
paste is useful in this case, and it saves a lot of code:
sed '1d' file|paste -d";" file -|awk -F';' '$1==$3'
See the test below:
kent$ cat a
typeA;pointA1
typeA;pointA2
typeA;pointA3
typeB;pointB1
typeB;pointB2
kent$ sed '1d' a|paste -d";" a -|awk -F';' '$1==$3'
typeA;pointA1;typeA;pointA2
typeA;pointA2;typeA;pointA3
typeB;pointB1;typeB;pointB2
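The paste pipeline works because sed '1d' emits lines 2..n, so paste glues each line of the file to the line that follows it, and awk keeps only the rows whose two type fields match. The same pipeline as a commented function; the name join_pairs is hypothetical, and the argument must be a regular file because it is read twice:

```shell
# join_pairs: hypothetical helper around the paste pipeline.
# Usage: join_pairs FILE
join_pairs() {
  # paste pairs line k of FILE with line k+1 (from stdin), joined by ";",
  # giving type;point;type;point. The last row gets an empty second half.
  # awk keeps rows whose first and third fields (the two types) are equal.
  sed '1d' "$1" | paste -d';' "$1" - | awk -F';' '$1 == $3'
}
```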
This GNU sed solution might work for you:
sed -rn '1{h;b};H;x;/^([^;]*);.*\n\1/!{s/.*\n//;x;d};s/\n/;/p' source_file
This assumes there are no blank lines; otherwise, remove them first with sed '/^$/d' source_file and pipe the result into the command above.
EDIT:
On reflection the above solution is far too elaborate and can be condensed to:
sed -ne '1{h;b};H;x;/^\([^;]*\);.*\1/s/\n/;/p' source_file
Explanation:
The -n prevents any lines being implicitly printed. The first line is copied to the hold space (HS, an extra register) and then b branches to the end of the script, ending the cycle. Every subsequent line is appended to the HS, and the HS is then swapped with the pattern space (PS, the register holding the current line). At this point the PS contains the previous and current lines (and the HS only the current line), and the back-reference \1 checks whether the first field of the previous line reappears in the current one. If so, the newline separating the two lines is replaced by a ; and, provided the substitution succeeded, the PS is printed. On the next cycle the new line replaces the PS, and the HS still holds the previous line.
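The condensed script again, one command per line with comments (GNU sed). The helper name join_same_type is hypothetical, and it reads standard input instead of source_file. The b sits on its own line before the closing brace so that no label parsing question arises:

```shell
# join_same_type: hypothetical helper; joins consecutive lines whose first
# ;-separated field is the same.
join_same_type() {
  sed -n '
# Line 1: copy it to the hold space, then branch to the end of the script.
1{
h
b
}
# Append the current line: the hold space is now previous\ncurrent.
H
# Swap: pattern space = previous\ncurrent, hold space = current line.
x
# If the first field of the previous line reappears, join and print.
/^\([^;]*\);.*\1/s/\n/;/p'
}

printf '%s\n' 'typeA;pointA1' 'typeA;pointA2' 'typeA;pointA3' 'typeB;pointB1' 'typeB;pointB2' | join_same_type
```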

one line using sed and bc together?

I want to add one to the last value at the end of a string in sed.
I'm thinking along the lines of
cat 0809_data.csv |sed -e 's/\([0-9]\{6\}\).*\(,[^,]*$\)/\1\2/g'| export YEARS = $(echo `grep -o '[^,]*$' + 1`|bc)
e.g. 123456, kjhsflk, lksjgrlks, 2.8 -> 123456, 3.8
Would this be more reasonable/feasible in awk?
This should work:
years=$(awk -F, 'BEGIN{ OFS=", "} {print $1, $4+1}' 0809_data.csv)
It would be really awkward to try to use sed and do arithmetic with part of the result. You'd have to pull the string apart and do the math and put everything back together. AWK does that neatly without any fuss.
Notice that cat is not necessary (even when using sed in a command similar to the one in your question), and it's probably not necessary to export the variable unless you're calling another script that needs to access it as a "global" variable. Also, shells generally only do integer math, so you don't need bc unless you need floats.
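A quick check of the awk command against the sample line from the question. The helper name bump_last is hypothetical, and it reads standard input instead of 0809_data.csv:

```shell
# bump_last: prints field 1 and field 4 plus one, joined by ", ".
# -F, splits on commas, so $4 is " 2.8"; awk ignores the leading blank
# when converting the field to a number.
bump_last() { awk -F, 'BEGIN { OFS = ", " } { print $1, $4 + 1 }'; }

printf '%s\n' '123456, kjhsflk, lksjgrlks, 2.8' | bump_last
```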