In sed what does this mean?

Can anyone tell me what the following code does in a bash shell script?
sed -n 's/^.*\([Mm][Ss]_[^ ]*\).*/\1/p'

If you have this text:
cat file
her ms_45 are you
MS_boat by ms US
What Ms_time are you
You will get every word starting with ms_, matched case-insensitively.
sed -n 's/^.*\([Mm][Ss]_[^ ]*\).*/\1/p' file
ms_45
MS_boat
Ms_time
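The sed command can be shortened somewhat, from sed -n 's/^.*\([Mm][Ss]_[^ ]*\).*/\1/p' to:
sed -nr 's/.*([Mm][Ss]_[^ ]*).*/\1/p'
-r enables extended regular expressions (GNU sed; -E is the more portable spelling), so you do not need to escape the ().
The leading ^ (start of line) is not needed, since a greedy .* at the start of the pattern already matches from the beginning of the line.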
You can also use grep for this (simpler to understand):
grep -o '[Mm][Ss]_[^ ]*' file
ms_45
MS_boat
Ms_time
This matches every word of the form: (M or m), then (S or s), then _, then all characters up to the next space ([^ ]*).
PS: Because the leading .* is greedy, the sed command returns only the last ms_... word on each line, whereas grep -o returns them all.
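The difference between the two tools on a line with more than one ms_ word can be checked with a small sketch. The input line here is made up for illustration; the sed and grep commands are the ones from the answer.

```shell
# Made-up input line containing two ms_ words.
input='ms_one and MS_two'

# The greedy leading .* swallows as much as it can, so the capture group
# ends up on the last ms_ word of the line.
sed_out=$(printf '%s\n' "$input" | sed -n 's/^.*\([Mm][Ss]_[^ ]*\).*/\1/p')

# grep -o prints every match, one per line.
grep_out=$(printf '%s\n' "$input" | grep -o '[Mm][Ss]_[^ ]*')

printf 'sed:  %s\n' "$sed_out"
printf 'grep:\n%s\n' "$grep_out"
```

Here sed reports MS_two only, while grep reports ms_one and MS_two.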

Related

Using a single sed call to split and grep

This is mostly out of curiosity: I am trying to get the same behavior as:
echo -e "test1:test2:test3"| sed 's/:/\n/g' | grep 1
in a single sed command.
I already tried
echo -e "test1:test2:test3"| sed -e "s/:/\n/g" -n "/1/p"
But I get the following error:
sed: can't read /1/p: No such file or directory
Any idea on how to fix this and combine different types of commands into a single sed call?
Of course, this is overly simplified compared to the real use case, and I know I can work around it by using multiple calls; again, this is just out of curiosity.
EDIT: I am mostly interested in the sed tool, I already know how to do it using other tools, or even combinations of those.
EDIT2: Here is a more realistic script, closer to what I am trying to achieve:
arch=linux64
base=https://chromedriver.storage.googleapis.com
split="<Contents>"
curl $base \
| sed -e 's/<Contents>/<Contents>\n/g' \
| grep $arch \
| sed -e 's/^<Key>\(.*\)\/chromedriver.*/\1/' \
| sort -V > out
What I would like to simplify is the curl line, turning it into something like:
curl $base \
| sed 's/<Contents>/<Contents>\n/g' -n '/1/p' -e 's/^<Key>\(.*\)\/chromedriver.*/\1/' \
| sort -V > out

Here are some alternatives, awk and sed based:
sed -E "s/(.*:)?([^:]*1[^:]*).*/\2/" <<< "test1:test2:test3"
awk -v RS=":" '/1/' <<< "test1:test2:test3"
# or also
awk 'BEGIN{RS=":"} /1/' <<< "test1:test2:test3"
Or, using your logic, you would need to pipe a second sed command:
sed "s/:/\n/g" <<< "test1:test2:test3" | sed -n "/1/p"
The awk solution looks cleanest.
Details
In the sed solution, the pattern (.*:)?([^:]*1[^:]*).* matches an optional run of any characters ending in :, then captures into Group 2 zero or more characters other than :, a 1, and again zero or more characters other than :, and finally matches the rest of the line. The replacement keeps only the contents of Group 2. Since -n is not used, a line with no field containing 1 is printed unchanged.
In the awk solution, the record separator is set to :, and the /1/ regex prints only the records that contain 1.
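As for the error in the question: -n takes no argument, so sed treats "/1/p" as the name of an input file. Giving every script fragment its own -e fixes the syntax, but not the result. The sketch below assumes GNU sed, which accepts \n in the replacement. After the substitution the pattern space is still a single buffer that contains 1, so p prints all three pieces.

```shell
# Syntactically valid version of the attempt from the question (GNU sed assumed).
out=$(printf 'test1:test2:test3\n' | sed -n -e 's/:/\n/g' -e '/1/p')

# The whole pattern space is printed, not just the piece containing 1.
printf '%s\n' "$out"
```

This is why a single-sed solution has to work on the pieces of the pattern space one at a time instead of printing it with p.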

This might work for you (GNU sed):
sed 's/:/\n/;/^[^\n]*1/P;D' file
Replace the first : with a newline and, if the first line in the pattern space contains 1, print it.
Then delete that first line and repeat on the remainder.
An alternative:
sed -Ez 's/:/\n/g;s/^[^1]*$//mg;s/\n+/\n/;s/^\n//' file
This slurps the whole file into memory and replaces all colons by newlines. All lines that do not contain 1 are removed and surplus newlines deleted.

An alternative to the really ugly sed is: grep -o '\w*2\w*'
$ printf "test1:test2:test3\nbob3:bob2:fred2\n" | grep -o '\w*2\w*'
test2
bob2
fred2
grep -o prints only the matching part of each line.
Or: grep -o '[^:]*2[^:]*'

echo -e "test1:test2:test3" | sed -En 's/:/\n/g;/^[^\n]*2[^\n]*(\n|$)/P;//!D'
sed -n doesn't print unless told to
sed -E allows using parens to match (\n|$) which is newline or the end of the pattern space
P prints the pattern buffer up to the first newline.
D trims the pattern buffer up to the first newline
[^\n] is a character class that matches anything except a newline
// (an empty regex) is sed shorthand for reusing the last regular expression that was used
//!D therefore runs D only when that same regex does not match
So, after you split into newlines, you want to make sure the 2 character is between the start of the pattern buffer ^ and the first newline.
And, if there is not the character you are looking for, you want to D delete up to the first newline.
At that point, it works for one line of input, with one string containing the character you're looking for.
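To expand to several matches within a line, delete the first segment unconditionally with D instead of //!D, so that every segment is checked in turn. (D restarts the cycle by itself, so the ta branch back to label :a in the command below is never actually reached.)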
$ printf "test1:test2:test3\nbob3:bob2:fred2\n" | \
sed -En ':a s/:/\n/g;/^[^\n]*2[^\n]*(\n|$)/P;D;ta'
test2
bob2
fred2
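As a sketch, the same result can be obtained without the label and the branch, because D already restarts the cycle on whatever remains in the pattern space. This assumes GNU sed (for -E, \n in the replacement and \n inside a bracket expression).

```shell
# Split on ":", print the first segment if it contains 2, then drop it;
# D restarts the script on the remaining segments without reading new input.
out=$(printf 'test1:test2:test3\nbob3:bob2:fred2\n' |
    sed -En 's/:/\n/g;/^[^\n]*2[^\n]*(\n|$)/P;D')

printf '%s\n' "$out"
```

When the last segment has been handled there is no newline left, so D behaves like d and sed moves on to the next input line.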

This is simply NOT a job for sed. With GNU awk for multi-char RS:
$ echo "test1:test2:test3:test4:test5:test6"| awk -v RS='[:\n]' '/1/'
test1
$ echo "test1:test2:test3:test4:test5:test6"| awk -v RS='[:\n]' 'NR%2'
test1
test3
test5
$ echo "test1:test2:test3:test4:test5:test6"| awk -v RS='[:\n]' '!(NR%2)'
test2
test4
test6
$ echo "foo1:bar1:foo2:bar2:foo3:bar3" | awk -v RS='[:\n]' '/foo/ || /2/'
foo1
foo2
bar2
foo3
With any awk you'd just have to strip the \n from the final record before operating on it:
$ echo "test1:test2:test3:test4:test5:test6"| awk -v RS=':' '{sub(/\n$/,"")} /1/'
test1

Verbatim Match with sed

I have a list of pairs of URLs - I want to find all occurrences of the first element of the pair and replace them with the second. I'm trying to use sed for this but sed escapes characters in my URL. Is there a way to make sed find these URLs (without changing my pairs)?
Here's my code:
while read -r NAME
do
ARG1=`echo "$NAME" | awk '{print $1}'`
ARG2=`echo "$NAME" | awk '{print $2}'`
echo "$ARG1"
echo "$ARG2"
sed -i "s#$ARG1#$ARG2#g" file
done < pagetable
pagetable has the pairs of URLS, and I'm doing the find and replace in 'file'. Since my URLs have special characters, sed isn't interpreting them verbatim.

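Escape the metacharacters in the search pattern (\ * ^ $ . /) and in the replacement string (\ & /) before invoking sed, so that they are matched literally. Note that [ is also special in a pattern and would need the same treatment if it can occur in the URLs. This assumes that the script is run by Bash.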
ARG1="${ARG1//\\/\\\\}"
ARG1="${ARG1//\*/\\\*}"
ARG1="${ARG1//\//\\/}"
for mc in \^ \$ \.; do ARG1="${ARG1//$mc/\\$mc}"; done
ARG2="${ARG2//\\/\\\\}"
ARG2="${ARG2//\//\\/}"
ARG2="${ARG2//&/\\&}"
sed -i "s/$ARG1/$ARG2/g" file
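The same escaping can also be done with sed itself, which covers [ and ] as well. The following is only a sketch: the helper names escape_pattern and escape_replacement and the sample URLs are made up for illustration, and values containing newlines are not handled.

```shell
# Hypothetical helpers (names are illustrative, not from the answer above).

# Escape the characters that are special in a basic regular expression,
# plus the / used as the s command delimiter.
escape_pattern() {
    printf '%s\n' "$1" | sed 's|[][\.*^$/]|\\&|g'
}

# Escape the characters that are special in the replacement part.
escape_replacement() {
    printf '%s\n' "$1" | sed 's|[\&/]|\\&|g'
}

# Made-up sample data.
old='http://example.com/a.b?x=1'
new='https://example.org/c&d'
line='visit http://example.com/a.b?x=1 now'

result=$(printf '%s\n' "$line" |
    sed "s/$(escape_pattern "$old")/$(escape_replacement "$new")/g")
printf '%s\n' "$result"
```

In the loop from the question, the two helpers would be applied to ARG1 and ARG2 respectively before building the s command.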

sed to copy part of line to end

I'm trying to copy part of a line to append to the end:
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/169/985/GCA_900169985.1_IonXpress_024_genomic.fna.gz
becomes:
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/169/985/GCA_900169985.1/GCA_900169985_IonXpress_024_genomic.fna.gz
I have tried:
sed 's/\(.*(GCA_\)\(.*\))/\1\2\2)'

$ f1=$'ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/169/985/GCA_900169985.1_IonXpress_024_genomic.fna.gz'
$ echo "$f1"
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/169/985/GCA_900169985.1_IonXpress_024_genomic.fna.gz
$ sed -E 's/(.*)(GCA_.[^.]*)(.[^_]*)(.*)/\1\2\3\/\2\4/' <<<"$f1"
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/169/985/GCA_900169985.1/GCA_900169985_IonXpress_024_genomic.fna.gz
sed -E (or -r on some systems) enables extended regex support in sed, so you don't need to escape the group parentheses ( ).
The group (GCA_.[^.]*) means "starting at GCA_, take all characters up to, but excluding, the first dot":
$ sed -E 's/(.*)(GCA_.[^.]*)(.[^_]*)(.*)/\2/' <<<"$f1"
GCA_900169985
Similarly, (.[^_]*) means "take one character (here the dot) and then all characters up to, but excluding, the first _". This is the usual way to emulate a non-greedy (lazy) match in sed; in Perl regex it could be written with a lazy quantifier, something like .*? followed by _.
$ sed -E 's/(.*)(GCA_.[^.]*)(.[^_]*)(.*)/\3/' <<<"$f1"
.1

Short sed approach:
s="ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/169/985/GCA_900169985.1_IonXpress_024_genomic.fna.gz"
sed -E 's/(GCA_[^._]+)\.([^_]+)/\1.\2\/\1/' <<< "$s"
The output:
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/169/985/GCA_900169985.1/GCA_900169985_IonXpress_024_genomic.fna.gz

Parsing a line with sed using regular expression

Using sed I want to parse Heroku's log-runtime-metrics like this one:
2016-01-29T00:38:43.662697+00:00 heroku[worker.2]: source=worker.2 dyno=heroku.17664470.d3f28df1-e15f-3452-1234-5fd0e244d46f sample#memory_total=54.01MB sample#memory_rss=54.01MB sample#memory_cache=0.00MB sample#memory_swap=0.00MB sample#memory_pgpgin=17492pages sample#memory_pgpgout=3666pages
the desired output is:
worker.2: 54.01MB (54.01MB is being memory_total)
I could not manage although I tried several alternatives including:
sed -E 's/.+source=(.+) .+memory_total=(.+) .+/\1: \2/g'
What is wrong with my command? How can it be corrected?

The .+ after source= and memory_total= are both greedy, so they accept as much of the line as possible. Use [^ ] to mean "anything except a space" so that it knows where to stop.
sed -E 's/.+source=([^ ]+) .+memory_total=([^ ]+) .+/\1: \2/g'
Putting your content into https://regex101.com/ makes it really obvious what's going on.
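The effect can also be seen directly in the shell. The sketch below uses a shortened, made-up log line in the same shape as the one in the question and runs both the original and the corrected expression on it.

```shell
# Shortened, made-up log line (not real Heroku output).
line='t0 heroku[worker.2]: source=worker.2 dyno=d1 sample#memory_total=54.01MB sample#memory_rss=54.01MB sample#memory_swap=0.00MB'

# Original expression: each (.+) runs on past the value it was meant to capture.
greedy=$(printf '%s\n' "$line" |
    sed -E 's/.+source=(.+) .+memory_total=(.+) .+/\1: \2/')

# Corrected expression: [^ ]+ stops at the first space.
bounded=$(printf '%s\n' "$line" |
    sed -E 's/.+source=([^ ]+) .+memory_total=([^ ]+) .+/\1: \2/')

printf 'greedy:  %s\n' "$greedy"
printf 'bounded: %s\n' "$bounded"
```

The bounded version prints worker.2: 54.01MB, while the greedy version drags neighbouring fields into both captures.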

I'd go for the old-fashioned, reliable, non-extended sed expressions and make sure that the patterns are not too greedy:
sed -e 's/.*source=\([^ ]*\) .*memory_total=\([^ ]*\) .*/\1: \2/'
The -e is not the opposite of -E. -E enables extended regular expressions; it originated in BSD (Mac OS X) sed, while the traditional GNU sed option is -r (GNU sed accepts -E as well). The -e simply means that the next argument is an expression in the script.
This produces your desired output from the given line of data:
worker.2: 54.01MB
Bonus question: There are some odd lines within the stream, I can usually filter them out using a grep pipe like | grep memory_total. However if I try to use it along with the sed command, it does not work. No output is produced with this:
heroku logs -t -s heroku | grep memory_total | sed.......
Sometimes grep | sed is necessary, but it is often redundant (unless you are using a grep feature that isn't readily supported by sed, such as Perl regular expressions).
You should be able to use:
sed -n -e '/memory_total=/ s/.*source=\([^ ]*\) .*memory_total=\([^ ]*\) .*/\1: \2/p'
The -n means "don't print by default". The /memory_total=/ matches the lines you're after; the s/// content is the same as before. I removed the g suffix that was there previously; the regex would never match multiple times anyway. I added the p to print the line when the substitution occurs.
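A small sketch of that filter on mixed input, using two made-up log lines of which only the first carries memory metrics:

```shell
# Made-up sample stream: the second line has no memory_total field.
logs='t0 app: source=worker.2 dyno=d1 sample#memory_total=54.01MB sample#memory_rss=54.01MB
t1 app: source=worker.2 dyno=d1 sample#load_avg_1m=0.00 sample#load_avg_5m=0.00'

# The address /memory_total=/ selects the lines; s///p rewrites and prints them.
filtered=$(printf '%s\n' "$logs" |
    sed -n -e '/memory_total=/ s/.*source=\([^ ]*\) .*memory_total=\([^ ]*\) .*/\1: \2/p')

printf '%s\n' "$filtered"
```

Only the first line survives, rewritten as worker.2: 54.01MB, so no separate grep stage is needed.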

sed expressions: line shows twice

I'm parsing a csv file with a sed command like this:
sed -n -e 's/abc/&/p' -e 's/xyz/&/p' <input >output
Now, if a line contains both abc and xyz, it appears twice in the output. It should appear just once.
Can I do that with sed?

If you only want to print lines containing "abc" or "xyz" (\| is a GNU extension in basic regular expressions):
sed -n '/abc\|xyz/p'
Other tools:
grep -F -e abc -e xyz
awk '/abc/ || /xyz/'

I believe you are mis-using the s///p just to print the lines. This is not necessary in sed - you can get them printed using //p.
Both expressions will evaluate, though, so you are still at risk of duplication. Your best bet (and fastest, for large datasets) will be to build the 'or' behavior into the matching regexp:
sed -Ene '/abc|xyz/p' input >output

sed -n -r -e 's/(abc|xyz)/&/p' <input >output
The -r flag enables extended regular expressions (GNU sed).

sed -n '/abc/{p;b
}
/xyz/p' Input > Output
This form is for non-GNU sed (where \| is not available as OR in basic regular expressions).
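The duplication and the fix can be compared on a small made-up input. The sketch below writes the brace form with separate -e fragments, which should also work on non-GNU sed.

```shell
# Made-up input: one line with both words, one with a single word, one with neither.
input='abc and xyz
only abc
neither'

# Command from the question: a line matching both expressions is printed twice.
dup=$(printf '%s\n' "$input" | sed -n -e 's/abc/&/p' -e 's/xyz/&/p')

# Print on abc and branch to the end of the script, so the xyz test is skipped.
once=$(printf '%s\n' "$input" | sed -n -e '/abc/{p;b' -e '}' -e '/xyz/p')

printf 'with duplicates:\n%s\n' "$dup"
printf 'printed once:\n%s\n' "$once"
```

The b command with no label jumps to the end of the script, which is what prevents the second p from firing on the same line.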
