Merging 2 sed commands into 1 - sed

Is there a way to do the following in a single sed command to improve performance?
cat some_file | sed -n '/^MODIFIED/p' | sed 's/^MODIFIED\s*//g'

You could try the sed command below. The -n option together with the p flag of s prints only the lines where a substitution took place. [[:space:]]* is POSIX bracket notation that matches zero or more whitespace characters.
sed -n 's/^MODIFIED[[:space:]]*//p' some_file
Or, with GNU sed (where \s is an extension):
sed -n 's/^MODIFIED\s*//p' some_file
Example:
$ cat ri
MODIFIED foo bar
apple
mango
$ cat ri | sed -n '/^MODIFIED/p' | sed 's/^MODIFIED\s*//g'
foo bar
$ sed -n 's/^MODIFIED[[:space:]]*//p' ri
foo bar
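The single command above puts the address and the substitution into one s///p. If you prefer to keep the address visible, the same one-pass filter can also be written with a braced command block. This is a sketch in POSIX sed syntax (using [[:space:]] instead of the GNU-only \s); the function name strip_modified is only for illustration.

```shell
#!/bin/sh
# Hypothetical helper name: strip_modified.
# /^MODIFIED/ selects the lines; inside the braces, s removes the prefix
# and p prints the result. -n suppresses all other output.
strip_modified() {
  sed -n '/^MODIFIED/{
s/^MODIFIED[[:space:]]*//
p
}' "$@"
}

printf 'MODIFIED foo bar\napple\nmango\n' | strip_modified
```

Only the prefixed line survives, with the prefix removed, so this prints foo bar.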

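Here is an awk version (\s in a regex is a GNU awk extension; gsub returns the number of substitutions made, so only lines where a replacement happened are printed):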
awk 'gsub(/^MODIFIED\s*/,"")' file
Example:
cat file
test
MODIFIED data
more MODIFIED home
awk 'gsub(/^MODIFIED\s*/,"")' file
data
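A sketch for awks without the GNU \s escape, assuming any POSIX awk: because the regex is anchored with ^, at most one replacement can happen per line, so sub is enough, and its return value (1 or 0) decides whether the line is printed. The function name strip_modified_awk is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper name: strip_modified_awk.
# sub() returns 1 when it replaced something; a true pattern with no action
# prints the (already modified) record, so unmatched lines are dropped.
strip_modified_awk() {
  awk 'sub(/^MODIFIED[[:space:]]*/, "")' "$@"
}

printf 'test\nMODIFIED data\nmore MODIFIED home\n' | strip_modified_awk
```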

You can drop both the cat and the first sed and use only the last sed, like this:
sed -nr 's/^MODIFIED\s*//p' some_file

Related

Using a single sed call to split and grep

This is mostly out of curiosity. I am trying to get the same behavior as:
echo -e "test1:test2:test3"| sed 's/:/\n/g' | grep 1
in a single sed command.
I already tried
echo -e "test1:test2:test3"| sed -e "s/:/\n/g" -n "/1/p"
But I get the following error:
sed: can't read /1/p: No such file or directory
Any idea on how to fix this and combine different types of commands into a single sed call?
Of course this is overly simplified compared to the real use case, and I know I can get around it by using multiple calls; again, this is just out of curiosity.
EDIT: I am mostly interested in the sed tool, I already know how to do it using other tools, or even combinations of those.
EDIT2: Here is a more realistic script, closer to what I am trying to achieve:
arch=linux64
base=https://chromedriver.storage.googleapis.com
split="<Contents>"
curl $base \
| sed -e 's/<Contents>/<Contents>\n/g' \
| grep $arch \
| sed -e 's/^<Key>\(.*\)\/chromedriver.*/\1/' \
| sort -V > out
What I would like to simplify is the curl line, turning it into something like:
curl $base \
| sed 's/<Contents>/<Contents>\n/g' -n '/1/p' -e 's/^<Key>\(.*\)\/chromedriver.*/\1/' \
| sort -V > out
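The error in the first attempt comes from how sed reads its arguments: once any -e is given, every non-option argument is taken as an input file name, so "/1/p" was opened as a file. A sketch of the syntactically valid form, assuming GNU sed (\n in the replacement is a GNU extension), which also shows why fixing the syntax alone does not give grep-like behavior:

```shell
#!/bin/sh
# Each script fragment gets its own -e; nothing is left to be read as a file.
echo "test1:test2:test3" | sed -n -e 's/:/\n/g' -e '/1/p'
# This runs, but /1/ is tested against the whole pattern space
# "test1\ntest2\ntest3", so all three fields are printed, not only test1.
```

To test each field separately, sed has to work through the pattern space one line at a time, for example with the P and D commands.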
Here are some alternatives, awk and sed based:
sed -E "s/(.*:)?([^:]*1[^:]*).*/\2/" <<< "test1:test2:test3"
awk -v RS=":" '/1/' <<< "test1:test2:test3"
# or also
awk 'BEGIN{RS=":"} /1/' <<< "test1:test2:test3"
Or, using your logic, you would need to pipe a second sed command:
sed "s/:/\n/g" <<< "test1:test2:test3" | sed -n "/1/p"
The awk solution looks cleanest.
Details
In the sed solution, the pattern (.*:)?([^:]*1[^:]*).* matches an optional run of any characters ending in a : (Group 1), then captures into Group 2 zero or more characters other than :, a 1, and again zero or more characters other than :, and finally matches the rest of the line. The replacement keeps only the Group 2 contents.
In the awk solution, the record separator is set to :, and the /1/ regex prints only the records that contain 1.
This might work for you (GNU sed):
sed 's/:/\n/;/^[^\n]*1/P;D' file
Replace the first : with a newline, and if the first line in the pattern space contains 1, print it.
Delete that first line with D and repeat on what is left.
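The same one-liner, spread over several lines with comments. This is a sketch that assumes GNU sed, because \n in the replacement and inside [^\n] is a GNU extension; the function name split_and_grep is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper name: split_and_grep.
split_and_grep() {
  sed '
    # turn the first remaining colon into a newline
    s/:/\n/
    # if the first field (text before the first newline) contains 1, print it
    /^[^\n]*1/P
    # drop the first field; if text remains, restart the cycle on it
    # without reading new input, otherwise behave like d (no autoprint)
    D
  ' "$@"
}

echo 'test1:test2:test3' | split_and_grep
```

Because D always ends the cycle, the automatic print never fires, so only fields printed by P reach the output.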
An alternative:
sed -Ez 's/:/\n/g;s/^[^1]*$//mg;s/\n+/\n/g;s/^\n//' file
This slurps the whole file into memory and replaces all colons with newlines. Lines that do not contain 1 are then emptied, and the surplus newlines are deleted.
An alternative to the really ugly sed is: grep -o '\w*2\w*'
$ printf "test1:test2:test3\nbob3:bob2:fred2\n" | grep -o '\w*2\w*'
test2
bob2
fred2
grep -o prints only the matching parts of each line.
Or: grep -o '[^:]*2[^:]*'
echo -e "test1:test2:test3" | sed -En 's/:/\n/g;/^[^\n]*2[^\n]*(\n|$)/P;//!D'
sed -n doesn't print unless told to
sed -E enables extended regexes, so parentheses group without backslashes; (\n|$) matches a newline or the end of the pattern space
P prints the pattern buffer up to the first newline.
D trims the pattern buffer up to the first newline
[^\n] is a character class that matches anything except a newline
// (an empty regex) reuses the most recently applied regex
//! then applies the following command only when that regex did not match
So, after you split into newlines, you want to make sure the 2 character is between the start of the pattern buffer ^ and the first newline.
And if the character you are looking for is not there, use D to delete up to the first newline.
At that point, it works for one line of input, with one string containing the character you're looking for.
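To handle several matches within a line, make D unconditional: after each field has been tested (and printed by P if it matches), D removes it and restarts the cycle on the remaining fields without reading new input, so no label or branch is needed:
$ printf "test1:test2:test3\nbob3:bob2:fred2\n" | \
sed -En 's/:/\n/g;/^[^\n]*2[^\n]*(\n|$)/P;D'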
test2
bob2
fred2
This is simply NOT a job for sed. With GNU awk for multi-char RS:
$ echo "test1:test2:test3:test4:test5:test6"| awk -v RS='[:\n]' '/1/'
test1
$ echo "test1:test2:test3:test4:test5:test6"| awk -v RS='[:\n]' 'NR%2'
test1
test3
test5
$ echo "test1:test2:test3:test4:test5:test6"| awk -v RS='[:\n]' '!(NR%2)'
test2
test4
test6
$ echo "foo1:bar1:foo2:bar2:foo3:bar3" | awk -v RS='[:\n]' '/foo/ || /2/'
foo1
foo2
bar2
foo3
With any awk you'd just have to strip the \n from the final record before operating on it:
$ echo "test1:test2:test3:test4:test5:test6"| awk -v RS=':' '{sub(/\n$/,"")} /1/'
test1
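If you would rather keep the normal newline record separator, a sketch that works with any POSIX awk is to split each line into fields on : and test every field. The helper name fields_matching and its regex argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: fields_matching REGEX
# Split each input line on ":" and print every field that matches REGEX.
# Records stay line-based, so there is no trailing newline to strip.
fields_matching() {
  awk -F: -v pat="$1" '{ for (i = 1; i <= NF; i++) if ($i ~ pat) print $i }'
}

printf 'test1:test2:test3\nbob3:bob2:fred2\n' | fields_matching 2
```

This prints test2, bob2 and fred2, one per line.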

Sed not matching one or more patterns

I have this list of files:
$ more files
one_this_2017_1_abc.txt
two_that_2018_1_abc.txt
three_another_2017_10.abc.txt
four_again_2018_10.abc.txt
five_back_2018_1a.abc.txt
I would like to get this output:
one_this_XXXX_YY_abc.txt
two_that_XXXX_YY_abc.txt
three_another_XXXX_YY.abc.txt
four_again_XXXX_YY.abc.txt
five_back_XXXX_YY.abc.txt
I am trying to remove the year and the bit after the year and replace them with another string; this is to generate test cases.
I can get the year just fine, but it's that one or two character piece after it I can't seem to match.
This should work, right?
~/test_cases
$ cat files | sed -e 's/_[[:digit:]]\{4\}_/_XXXX_/' -e 's/_[[:alnum:]]\{1,2\}_/_YY_/'
one_this_XXXX_YY_abc.txt
two_that_XXXX_YY_abc.txt
three_another_XXXX_10.abc.txt
four_again_XXXX_10.abc.txt
five_back_XXXX_1a.abc.txt
Except it doesn't for the 2 character cases.
$ cat files | sed -e 's/_[[:digit:]]\{4\}_/_XXXX_/' -e 's/_[[:alnum:]]\{2\}_/_YY_/'
one_this_XXXX_1_abc.txt
two_that_XXXX_1_abc.txt
three_another_XXXX_10.abc.txt
four_again_XXXX_10.abc.txt
five_back_XXXX_1a.abc.txt
That doesn't work for the two-character cases either, and this does not work at all (but according to the docs it should):
$ cat files | sed -e 's/_[[:digit:]]\{4\}_/_XXXX_/' -e 's/_[[:alnum:]]\+_/_YY_/'
one_YY_XXXX_1_abc.txt
two_YY_XXXX_1_abc.txt
three_YY_XXXX_10.abc.txt
four_YY_XXXX_10.abc.txt
five_YY_XXXX_1a.abc.txt
Other random experiments that don't work:
$ cat files | sed -e 's/_[[:digit:]]\{4\}_/_XXXX_/' -e 's/_[a-zA-Z0-9]\+_/_YY_/'
one_YY_XXXX_1_abc.txt
two_YY_XXXX_1_abc.txt
three_YY_XXXX_10.abc.txt
four_YY_XXXX_10.abc.txt
five_YY_XXXX_1a.abc.txt
$ cat files | sed -e 's/_[[:digit:]]\{4\}_/_XXXX_/' -e 's/_[a-zA-Z0-9]\{1\}_/_YY_/'
one_this_XXXX_YY_abc.txt
two_that_XXXX_YY_abc.txt
three_another_XXXX_10.abc.txt
four_again_XXXX_10.abc.txt
five_back_XXXX_1a.abc.txt
$ cat files | sed -e 's/_[[:digit:]]\{4\}_/_XXXX_/' -e 's/_[a-zA-Z0-9]\{2\}_/_YY_/'
one_this_XXXX_1_abc.txt
two_that_XXXX_1_abc.txt
three_another_XXXX_10.abc.txt
four_again_XXXX_10.abc.txt
five_back_XXXX_1a.abc.txt
Tried with both GNU sed version 4.2.1 under Linux and sed (GNU sed) 4.4 under Cygwin.
And yes, I realize I can pipe this through multiple sed calls to get it to work, but that regex SHOULD work, right?
If your Input_file is the same as the sample shown, the following may help you.
sed 's/\([^_]*\)_\([^_]*\)_\(.*_\)\(.*\)/\1_\2_XXXX_YY_\4/g' Input_file
Output will be as follows.
one_this_XXXX_YY_abc.txt
two_that_XXXX_YY_abc.txt
three_another_XXXX_YY_10.abc.txt
four_again_XXXX_YY_10.abc.txt
five_back_XXXX_YY_1a.abc.txt
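In the question, the second expression cannot match three of the names because there the one- or two-character piece is followed by a . rather than a _ (for example _10.abc.txt), while the \+ variant matches the earlier _this_ because sed replaces the leftmost match. The answer above also keeps 10 and 1a in those three names. A minimal alternative sketch, assuming the year is always four digits followed by _ and one or two alphanumerics: capture whichever of _ or . follows and put it back. The function name mask_year_suffix is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: mask_year_suffix
# Match _YYYY_ plus the 1-2 alphanumerics after it, remember whether "_" or
# "." follows (group 1), and keep that character in the output.
mask_year_suffix() {
  sed 's/_[[:digit:]]\{4\}_[[:alnum:]]\{1,2\}\([._]\)/_XXXX_YY\1/' "$@"
}

printf '%s\n' one_this_2017_1_abc.txt three_another_2017_10.abc.txt \
  five_back_2018_1a.abc.txt | mask_year_suffix
```

This prints one_this_XXXX_YY_abc.txt, three_another_XXXX_YY.abc.txt and five_back_XXXX_YY.abc.txt.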

How to replace only last match in a line with sed?

With sed, I can replace the first match in a line using
sed 's/pattern/replacement/'
And all matches using
sed 's/pattern/replacement/g'
How do I replace only the last match, regardless of how many matches there are before it?
Copy pasting from something I've posted elsewhere:
$ # replacing last occurrence
$ # can also use sed -E 's/:([^:]*)$/-\1/'
$ echo 'foo:123:bar:baz' | sed -E 's/(.*):/\1-/'
foo:123:bar-baz
$ echo '456:foo:123:bar:789:baz' | sed -E 's/(.*):/\1-/'
456:foo:123:bar:789-baz
$ echo 'foo and bar and baz land good' | sed -E 's/(.*)and/\1XYZ/'
foo and bar and baz lXYZ good
$ # use word boundaries as necessary - GNU sed
$ echo 'foo and bar and baz land good' | sed -E 's/(.*)\band\b/\1XYZ/'
foo and bar XYZ baz land good
$ # replacing last but one
$ echo 'foo:123:bar:baz' | sed -E 's/(.*):(.*:)/\1-\2/'
foo:123-bar:baz
$ echo '456:foo:123:bar:789:baz' | sed -E 's/(.*):(.*:)/\1-\2/'
456:foo:123:bar-789:baz
$ # replacing last but two
$ echo '456:foo:123:bar:789:baz' | sed -E 's/(.*):((.*:){2})/\1-\2/'
456:foo:123-bar:789:baz
$ # replacing last but three
$ echo '456:foo:123:bar:789:baz' | sed -E 's/(.*):((.*:){3})/\1-\2/'
456:foo-123:bar:789:baz
Further Reading:
Buggy behavior if word boundaries are used inside a group with quantifiers - for example: echo 'it line with it here sit too' | sed -E 's/with(.*\bit\b){2}/XYZ/' fails
Greedy vs. Reluctant vs. Possessive Quantifiers
Reference - What does this regex mean?
sed manual: Back-references and Subexpressions
This might work for you (GNU sed):
sed 's/\(.*\)pattern/\1replacement/' file
The greedy .* swallows the whole pattern space, and the regex engine then backtracks through the line until pattern matches, so the first match it finds is the last occurrence in the line.
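The greedy-prefix trick parameterised as a small shell function. This is a sketch with a hypothetical name, replace_last, assuming both arguments are plain BRE/replacement text without /, & or backslashes, since they are pasted into the sed script unescaped.

```shell
#!/bin/sh
# Hypothetical helper: replace_last PATTERN REPLACEMENT
# The greedy \(.*\) grabs as much of the line as it can, so "$1" ends up
# matching its last occurrence; \1 puts the prefix back unchanged.
replace_last() {
  sed "s/\(.*\)$1/\1$2/"
}

echo 'foo:123:bar:baz' | replace_last : -
echo 'foo and bar and baz land good' | replace_last and XYZ
```

As in the examples above, the second call rewrites the "and" inside "land", because nothing restricts the pattern to whole words.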
A fun way to do this, is to use rev to reverse the characters of each line and write your sed replacement backwards.
rev input_file | sed 's/nrettap/tnemecalper/' | rev

Sed Editor single check

I have a sed command which appends a string to the end of a line. When I re-run the same command, the same content is appended to the end of the line again and again.
I am looking for a command which checks whether the content is already there before appending it.
Here is my sed command:
shell: sed -i '/only_from/s/$/ xx.xx.xx.xx\/24/' file.txt
This line does what you need:
sed -i '/only_from/{/ xx\.xx\.xx\.xx\/24$/!s#$# xx.xx.xx.xx/24#}' file
E.g:
kent$ cat f
only_from foo bar
kent$ sed -i '/only_from/{/xx\.xx\.xx\.xx\/24$/!s#$# xx.xx.xx.xx/24#}' f
kent$ cat f
only_from foo bar xx.xx.xx.xx/24
kent$ sed -i '/only_from/{/xx\.xx\.xx\.xx\/24$/!s#$# xx.xx.xx.xx/24#}' f
kent$ cat f
only_from foo bar xx.xx.xx.xx/24
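To check that the guarded append is idempotent without touching a real file, here is a sketch of the same command as a stdin-to-stdout filter (hypothetical name append_once; for in-place editing you would keep sed -i and a file name). The closing brace sits on its own line, which POSIX sed also accepts.

```shell
#!/bin/sh
# Hypothetical helper: append_once
# On lines containing only_from, append the suffix unless the line already
# ends with it; ! negates the inner address.
append_once() {
  sed '/only_from/{
/ xx\.xx\.xx\.xx\/24$/!s#$# xx.xx.xx.xx/24#
}'
}

echo 'only_from foo bar' | append_once
echo 'only_from foo bar' | append_once | append_once
```

Both commands print only_from foo bar xx.xx.xx.xx/24: the second pass leaves the line unchanged.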
You can try this sed:
sed '/only_from/{ / xx\.xx\.xx\.xx\/24/ !s/$/ xx\.xx\.xx\.xx\/24/}' file
This might be a bit naive, but why don't you write something as simple as
sed -i '/only_from$/s/$/ xx.xx.xx.xx\/24/' file.txt

How to find and replace all percent, plus, and pipe signs?

I have a document containing many percent, plus, and pipe signs. I want to replace them with a code, for use in TeX.
% becomes \textpercent.
+ becomes \textplus.
| becomes \textbar.
This is the code I am using, but it does not work:
sed -i "s/\%/\\\textpercent /g" ./file.txt
sed -i "s/|/\\\textbar /g" ./file.txt
sed -i "s/\+/\\\textplus /g" ./file.txt
How can I replace these symbols with this code?
Test script:
#!/bin/bash
cat << 'EOF' > testfile.txt
1+2+3=6
12 is 50% of 24
The pipe character '|' looks like a vertical line.
EOF
sed -i -r 's/%/\\textpercent /g;s/[+]/\\textplus /g;s/[|]/\\textbar /g' testfile.txt
cat testfile.txt
Output:
1\textplus 2\textplus 3=6
12 is 50\textpercent of 24
The pipe character '\textbar ' looks like a vertical line.
This was already suggested in a similar way by @tripleee, and I see no reason why it should not work. As you can see, my platform uses the very same version of GNU sed as yours. The only difference from @tripleee's version is that I use extended regex mode (-r), so I have to either escape the pipe and the plus or put them into a character class with [].
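Because [+] and [|] are bracket expressions, they are literal in both basic and extended regex mode, so the same script can be used with or without -E. A sketch to check that, with a hypothetical helper name tex_escape that passes any extra options straight to sed:

```shell
#!/bin/sh
# Hypothetical helper: tex_escape [sed options]
# % is never special; [+] and [|] are literal in BRE (default) and ERE (-E).
# \\ in the replacement produces one literal backslash.
tex_escape() {
  sed "$@" -e 's/%/\\textpercent /g' -e 's/[+]/\\textplus /g' -e 's/[|]/\\textbar /g'
}

echo '50% a+b x|y' | tex_escape
echo '50% a+b x|y' | tex_escape -E
```

Both lines print 50\textpercent  a\textplus b x\textbar y (the double space comes from the space in the replacement plus the original one).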
nawk '{gsub(/%/,"\\textpercent");gsub(/\+/,"\\textplus");gsub(/\|/,"\\textbar"); print}' file
Tested below:
> echo "% + |" | nawk '{gsub(/%/,"\\textpercent");gsub(/\+/,"\\textplus");gsub(/\|/,"\\textbar"); print}'
\textpercent \textplus \textbar
Use single quotes, so the shell passes the backslashes to sed unchanged:
$ cat in.txt
foo % bar
foo + bar
foo | bar
$ sed -e 's/%/\\textpercent /g' -e 's/\+/\\textplus /g' -e 's/|/\\textbar /g' < in.txt
foo \textpercent bar
foo \textplus bar
foo \textbar bar