Using sed to eliminate a specific string - sed

I would appreciate help with this problem. I would like to remove everything from a string that is not a specific pattern.
For example, below I would like to remove everything that is not "5TTGTC".
But as seen here, ^5TTGTC is not right. I tried different combinations of ^(), ^{}, and ^[], but none gave me what I am looking for. I appreciate your feedback!
echo ".,..,...+5TTGTC...+5TTGCC.+5TTGTC,,.,.,,.,+5ttgtc,.,,.,.+5TTGTC.+5TTGTC,..+5TTGTC" | sed 's/^5TTGTC//g'
Thanks in advance

You may use the following command if you want case sensitivity:
echo ".,..,...+5TTGTC...+5TTGCC.+5TTGTC,,.,.,,.,+5ttgtc,.,,.,.+5TTGTC.+5TTGTC,..+5TTGTC" | sed -r 's/(5TTGTC)|[,.A-Za-z+0-9]/\1/g'
The code above prints:
5TTGTC5TTGTC5TTGTC5TTGTC5TTGTC
The regular expression used above uses alternation to capture what you are interested in.
We match and capture what we are interested in (5TTGTC), and we match without capturing every other character, in this case anything in ,.A-Za-z+0-9. Because \1 is empty whenever the second alternative matches, those characters are deleted.
You can check the behaviour of the regex here.
As pointed out by @EdMorton, the command can be simplified to:
echo ".,..,...+5TTGTC...+5TTGCC.+5TTGTC,,.,.,,.,+5ttgtc,.,,.,.+5TTGTC.+5TTGTC,..+5TTGTC" | sed -r 's/(5TTGTC)|./\1/g'
You can try this here.
For compatibility across sed versions the -r flag can be replaced by the -E flag.
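As a quick check of the alternation idea on a shorter input, the sketch below wraps the simplified command in a shell function. The name keep_only and the sample string are made up for illustration. It assumes a sed that accepts -E, and the argument is used as a regex, so it must not contain / or other characters that would need escaping.

```shell
# keep_only PATTERN: delete every character on stdin that is not part of a PATTERN match.
# (hypothetical helper; PATTERN is interpolated into the sed script unescaped)
keep_only() {
    sed -E "s/($1)|./\1/g"
}

printf '%s\n' '..+5TTGTC,.+5TTGCC.+5ttgtc,+5TTGTC' | keep_only '5TTGTC'
# prints: 5TTGTC5TTGTC
```

The near-miss 5TTGCC and the lowercase 5ttgtc are consumed one character at a time by the "." alternative, so nothing of them survives.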

You don't make it very clear what you are trying to achieve.
One way to get where you are trying to go could be the -o option in grep.
echo ".,..,...+5TTGTC...+5TTGCC.+5TTGTC,,.,.,,.,+5ttgtc,.,,.,.+5TTGTC.+5TTGTC,..+5TTGTC" | grep -o '5TTGTC'
Output:
5TTGTC
5TTGTC
5TTGTC
5TTGTC
5TTGTC
You can then change 5TTGTC into a pattern, e.g. grep -o '[0-9]TT[AG]GTC'
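grep -o prints one match per line, whereas the sed answers keep everything on one line. If a single line is wanted, the matches can be joined afterwards. A minimal sketch, where join_matches is a hypothetical helper name and the input is shortened:

```shell
# join_matches PATTERN: print all matches of PATTERN from stdin on a single line.
join_matches() {
    grep -o "$1" | tr -d '\n'   # drop the newline grep -o puts after each match
    echo                        # end the output with one newline
}

printf '%s\n' '.,+5TTGTC...+5TTGCC.+5TTGTC' | join_matches '5TTGTC'
# prints: 5TTGTC5TTGTC
```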

With any sed:
$ echo ".,..,...+5TTGTC...+5TTGCC.+5TTGTC,,.,.,,.,+5ttgtc,.,,.,.+5TTGTC.+5TTGTC,..+5TTGTC" |
sed 's/#//g; s/5TTGTC/#/g; s/[^#]//g; s/#/5TTGTC/g'
5TTGTC5TTGTC5TTGTC5TTGTC5TTGTC
With any awk:
$ echo ".,..,...+5TTGTC...+5TTGCC.+5TTGTC,,.,.,,.,+5ttgtc,.,,.,.+5TTGTC.+5TTGTC,..+5TTGTC" |
awk -v str='5TTGTC' '{gsub(str,"\n"); gsub(/[^\n]/,""); gsub(/\n/,str)}1'
5TTGTC5TTGTC5TTGTC5TTGTC5TTGTC

Related

Parsing a string with sed

I have a string like prefix-2020.80-suffix-1
Here are all the possible forms of the input string:
"2020.80-suffix-1"
"2020.80-suffix"
"prefix-2020.80"
"prefix-2020.80-1"
I need to cut out 2020 and assign it to a variable, but I cannot get my desired output.
Here is what I have so far:
set var=`echo "prefix-2020.80-suffix-1" | sed "s/[[:alnum:]]*-*\([0-9]*\).*/\1/"`
My regexp does not work for the other cases and I cannot figure out why! It's more complicated than Python's regexp syntax.
This should work for all your inputs:
sed 's/.*\(^\|-\)\([0-9]*\)\..*/\2/' test
Matches the start of the line or everything up to -[number]. and captures the number.
The problem with the original you were using was you didn't take into account when there wasn't a prefix.
You can use this grep -oP:
echo "prefix-2020.80-suffix-1" | grep -oP '^([[:alnum:]]+-)?\K[0-9]+'
2020
Using sed (with extended regex):
echo "prefix-2020.80-suffix-1" |sed -r 's/^([^-]*-|)([0-9]+).*/\2/'
Using grep:
echo "prefix-2020.80-suffix-1" |grep -oP "^([^-]*-|)\K\d+"
2020
-P is for Perl regex.
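To check a pattern against every input form from the question at once, a loop like the following may help. It is a sketch only: year_of is an invented helper name, it assumes a sed that accepts -E, and it writes the optional prefix as ([^-]*-)? rather than with an empty alternative.

```shell
# year_of STRING: strip an optional "prefix-" and keep the digits before the dot.
year_of() {
    printf '%s\n' "$1" | sed -E 's/^([^-]*-)?([0-9]+).*/\2/'
}

for s in "2020.80-suffix-1" "2020.80-suffix" "prefix-2020.80" "prefix-2020.80-1"; do
    var=$(year_of "$s")          # POSIX sh assignment; the question used csh-style "set var="
    printf '%s -> %s\n' "$s" "$var"
done
```

Each of the four lines printed should end in 2020.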

grep and/or sed to match a path from a string which has different patterns

I have a big file composed of a lot of different lines that have only one keyword in common, storaged.
PROC:storage123:0702:2108:0,1,2,3,4,5:storage:vers:storaged:storage123:Storage
123:storage123:-R /etc/orc/storage123 -e emr123#localhost -p Xxx::
PROC:storageabc:0606:2108:0,1,2,3,4,5:storage:vers:storaged:storageabc:Storage
abc:storageabc: -e emabc#localhost -R /etc/orc/storageabc -p 654::
What I need to do is grep for the path that comes after -R on every line containing the storaged keyword. But I only want the path, nothing after it. -R can appear in different places, so it has no fixed position.
I created one expression which seemed to work, but I think I made it much more complex (and not 100% sure to match) than it needs to be.
[root:~/scripts/] <conf.txt grep -o 'R *[^ ]*' | grep -o '[^ ]*$' | sed 's/.*R\///'
/etc/orc/storage123
/etc/orc/storagerabc
The expression is also hard to implement in a bash script, so something simpler would be great. I need these paths later on in the script.
Cheers
Your attempt is nice, but you can simplify it by using a look-behind:
$ grep -Po '(?<=-R )[^ ]*' file
/etc/orc/storage123
/etc/orc/storageabc
Basically, it looks for the string "-R " (note the trailing space) and, from there, prints everything up to the next space. The -P option (Perl-compatible regex) requires GNU grep.
$ sed 's/.*-R \([^ ]*\).*/\1/' file
/etc/orc/storage123
/etc/orc/storageabc
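Without -n, the sed command above passes any line that contains no -R through unchanged, which matters if records are wrapped over several lines as in the sample. A hedged variant prints only the lines where the substitution succeeded. paths_after_R is an invented name, the input is abbreviated from the question, and paths containing spaces are not handled.

```shell
# paths_after_R [FILE...]: print the word following "-R " on each line that has one.
paths_after_R() {
    sed -n 's/.*-R  *\([^ ]*\).*/\1/p' "$@"
}

printf '%s\n' \
    'PROC:storage123:0702:2108:0,1,2,3,4,5:storage:vers:storaged:storage123:Storage' \
    '123:storage123:-R /etc/orc/storage123 -e emr123#localhost -p Xxx::' \
    'abc:storageabc: -e emabc#localhost -R /etc/orc/storageabc -p 654::' |
    paths_after_R
# prints:
# /etc/orc/storage123
# /etc/orc/storageabc
```

In a script the result can be captured with something like paths=$(paths_after_R conf.txt), using the file name from the question.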

Replace string with substring in lowercase using sed / awk / tr / perl?

I have a plaintext file containing multiple instances of the pattern $$DATABASE_*$$ and the asterisk could be any string of characters. I'd like to replace the entire instance with whatever is in the asterisk portion, but lowercase.
Here is a test file:
$$DATABASE_GIBSON$$
test me $$DATABASE_GIBSON$$ test me
$$DATABASE_GIBSON$$ test $$DATABASE_GIBSON$$ test
$$DATABASE_GIBSON$$ $$DATABASE_GIBSON$$$$DATABASE_GIBSON$$
Here is the desired output:
gibson
test me gibson test me
gibson test gibson test
gibson gibsongibson
How do I do this with sed/awk/tr/perl?
Here's the perl version I ended up using.
perl -p -i.bak -e 's/\$\$DATABASE_(.*?)\$\$/lc($1)/eg' inputFile
Unfortunately there's no easy, foolproof way with awk, but here's one approach:
$ cat tst.awk
{
gsub(/[$][$]/,"\n")
head = ""
tail = $0
while ( match(tail, "\nDATABASE_[^\n]+\n") ) {
head = head substr(tail,1,RSTART-1)
trgt = substr(tail,RSTART,RLENGTH)
tail = substr(tail,RSTART+RLENGTH)
gsub(/\n(DATABASE_)?/,"",trgt)
head = head tolower(trgt)
}
$0 = head tail
gsub("\n","$$")
print
}
$ cat file
The quick brown $$DATABASE_FOX$$ jumped over the lazy $$DATABASE_DOG$$s back.
The grey $$DATABASE_SQUIRREL$$ ate $$DATABASE_NUT$$s under a $$DATABASE_TREE$$.
Put a dollar $$DATABASE_DOL$LAR$$ in the $$ string.
$ awk -f tst.awk file
The quick brown fox jumped over the lazy dogs back.
The grey squirrel ate nuts under a tree.
Put a dollar dol$lar in the $$ string.
Note the trick of converting $$ to a newline character so that we can negate that character in the match() regexp. Without it (i.e. if we used ".+" instead of "[^\n]+"), greedy matching would mean that, if the pattern appeared twice on one input line, the match would extend from the start of the first pattern to the end of the second.
This one works with complicated examples.
perl -ple 's/\$\$DATABASE_(.*?)\$\$/lc($1)/eg' filename.txt
And for simpler examples :
echo '$$DATABASE_GIBSON$$' | sed 's#$$DATABASE_\(.*\)\$\$#\L\1#'
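In GNU sed, \L converts the rest of the replacement to lowercase (\E stops the conversion if needed). Because .* is greedy, this only works when there is a single pattern per line.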
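For lines with several patterns, one possible adjustment in GNU sed is to exclude $ from the captured name and add the g flag. This is only a sketch: \L is a GNU extension, lower_db is a made-up name, and database names that themselves contain a $ are not handled.

```shell
# lower_db [FILE...]: replace each $$DATABASE_name$$ with name in lowercase (GNU sed only).
lower_db() {
    sed 's/\$\$DATABASE_\([^$]*\)\$\$/\L\1/g' "$@"
}

printf '%s\n' \
    'test me $$DATABASE_GIBSON$$ test me' \
    '$$DATABASE_GIBSON$$ $$DATABASE_GIBSON$$$$DATABASE_GIBSON$$' |
    lower_db
# prints:
# test me gibson test me
# gibson gibsongibson
```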
Using awk alone:
> echo '$$DATABASE_AWESOME$$' | awk '{sub(/.*_/,"");sub(/\$\$$/,"");print tolower($0);}'
awesome
Note that I'm in FreeBSD, so this is not GNU awk.
But this can be done using bash alone:
[ghoti#pc ~]$ foo='$$DATABASE_AWESOME$$'
[ghoti#pc ~]$ foo=${foo##*_}
[ghoti#pc ~]$ foo=${foo%\$\$}
[ghoti#pc ~]$ foo=${foo,,}
[ghoti#pc ~]$ echo $foo
awesome
Of the above substitutions, all except the last one (${foo,,}, which needs bash 4 or later) will work in any POSIX shell. If you don't have bash, you can use tr for this step instead:
$ echo $foo
AWESOME
$ foo=$(echo "$foo" | tr '[:upper:]' '[:lower:]')
$ echo $foo
awesome
$
UPDATE:
Per comments, it seems that what the OP really wants is to strip the substring out of any text in which it is included -- that is, our solutions need to account for the possibility of leading or trailing spaces, before or after the string he provided in his question.
> echo 'foo $$DATABASE_KITTENS$$ bar' | sed -nE '/\$\$[^$]+\$\$/{;s/.*\$\$DATABASE_//;s/\$\$.*//;p;}' | tr '[:upper:]' '[:lower:]'
kittens
And if you happen to have pcregrep on your path (from the devel/pcre FreeBSD port), you can use that instead, with lookaheads:
> echo 'foo $$DATABASE_KITTENS$$ bar' | pcregrep -o '(?!\$\$DATABASE_)[A-Z]+(?=\$\$)' | tr '[:upper:]' '[:lower:]'
kittens
(For Linux users reading this: this is equivalent to using grep -P.)
And in pure bash:
$ shopt -s extglob
$ foo='foo $$DATABASE_KITTENS$$ bar'
$ foo=${foo##*(?)\$\$DATABASE_}
$ foo=${foo%%\$\$*(?)}
$ foo=${foo,,}
$ echo $foo
kittens
Note that NONE of these three updated solutions will handle situations where multiple tagged database names exist in the same line of input. That's not stated as a requirement in the question either, but I'm just sayin'....
You can do this in a pretty foolproof way with the supercool command cut :)
echo '$$DATABASE_AWESOME$$' | cut -d'$' -f3 | cut -d_ -f2 | tr 'A-Z' 'a-z'
This might work for you (GNU sed):
sed 's/$\$/\n/g;s/\nDATABASE_\([^\n]*\)\n/\L\1/g;s/\n/$$/g' file
Here is the shortest (GNU) awk solution I could come up with that does everything requested by the OP:
awk -vRS='[$][$]DATABASE_([^$]+[$])+[$]' '{ORS=tolower(substr(RT,12,length(RT)-13))}1'
Even if the string indicated by the asterisk (*) contains one or more single dollar signs ($) and/or line breaks, this solution should still work.
awk '{gsub(/\$\$DATABASE_GIBSON\$\$/,"gibson")}1' file
gibson
test me gibson test me
gibson test gibson test
gibson gibsongibson
echo '$$DATABASE_WOOLY$$' | awk '{print tolower($0)}'
awk takes whatever input it is given, in this case the echoed string, applies the tolower function, and prints the result. The single quotes keep the shell from expanding $$ to its process ID.
For your bash script you can do something like this and use the variable DBLOWER:
DBLOWER=$(echo '$$DATABASE_WOOLY$$' | awk '{print tolower($0)}');

Filter text based in a multiline match criteria

I have the following sed command:
cat File | sed -n '
/NetworkName/ {
N
/\n.*ims3/ p
}' | sed -n 1p | awk -F"=" '{print $2}'
I need to execute the above command on a single line. Can anyone please help?
Assume that the contents of File are:
System.DomainName=shayam
System.Addresses=Fr6
System.Trusted=Yes
System.Infrastructure=No
System.NetworkName=AS
System.DomainName=ims5.com
System.DomainName=Ram
System.Addresses=Fr9
System.Trusted=Yes
System.Infrastructure=No
System.NetworkName=Peer
System.DomainName=ims7.com
System.DomainName=mani
System.Addresses=Hello
System.Trusted=Yes
System.Infrastructure=No
System.NetworkName=Peer
System.DomainName=ims3.com
After executing the command, you get only Peer as the output. Can anyone please help me out?
You can use a single nawk command, and you can lose the useless cat:
nawk -F"=" '/NetworkName/{n=$2;getline;if($2~/ims3/){print n} }' file
You can use sed as well, as proposed by others, but I prefer less regex and less clutter.
The above saves the value of the network name in "n". It then reads the next line and checks the 2nd field against "ims3". If it matches, it prints the value of "n".
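The same logic can also be written without getline, by remembering the network name until the following line arrives. A sketch that should run in any POSIX awk; the function name network_for, the variable names, and the shortened sample input are illustrative only.

```shell
# network_for TEXT: print the NetworkName value whose following line has TEXT in its value.
network_for() {
    awk -F= -v dom="$1" '
        /NetworkName/          { n = $2; have = 1; next }  # remember the name, go to next line
        have && index($2, dom) { print n }                  # the line right after mentions dom
                               { have = 0 }                 # any other line resets the state
    '
}

printf '%s\n' \
    'System.NetworkName=AS'   'System.DomainName=ims5.com' \
    'System.NetworkName=Peer' 'System.DomainName=ims7.com' \
    'System.NetworkName=Peer' 'System.DomainName=ims3.com' |
    network_for ims3
# prints: Peer
```

Unlike the original pipeline, this prints every matching network name, so add "| head -n 1" if only the first is wanted.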
Put that code in a separate .sh file, and run it as your single-line command.
cat File | sed -n '/NetworkName/ { N; /\n.*ims3/ p }' | sed -n 1p | awk -F"=" '{print $2}'
Assuming that you want the network name for the domain ims3, this command line works without sed:
grep -B 1 ims3 File | head -n 1 | awk -F"=" '{print $2}'
So, you want the network name where the domain name on the following line includes 'ims3', and not the one where the following line includes 'ims7' (even though the network names in the example are the same).
sed -n '/NetworkName/{N;/ims3/{s/.*NetworkName=\(.*\)\n.*/\1/p;};}' File
This avoids abuse of felines, too (not to mention reducing the number of commands executed).
Tested on MacOS X 10.6.4, but there's no reason to think it won't work elsewhere too.
However, empirical evidence shows that Solaris sed is different from MacOS sed. It can all be done in one sed command, but it needs three lines:
sed -n '/NetworkName/{N
/ims3/{s/.*NetworkName=\(.*\)\n.*/\1/p;}
}' File
Tested on Solaris 10.
You just need to put -e pretty much everywhere you'd break the command at a newline or have a semicolon. You don't need the extra call to sed or awk or cat.
sed -n -e '/NetworkName/ {' -e 'N' -e '/\n.*ims3/ s/[^\n]*=\(.*\).*/\1/P' -e '}' File

How do I push `sed` matches to the shell call in the replacement pattern?

I need to replace several URLs in a text file with some content dependent on the URL itself. Let's say for simplicity it's the first line of the document at the URL.
What I'm trying is this:
sed "s/^URL=\(.*\)/TITLE=$(curl -s \1 | head -n 1)/" file.txt
This doesn't work, since \1 is not set. However, the shell is getting called. Can I somehow push the sed match variables to that subprocess?
The accepted answer is just plain wrong. Proof:
Make an executable script foo.sh:
#! /bin/bash
echo $* 1>&2
Now run it:
$ echo foo | sed -e "s/\\(foo\\)/$(./foo.sh \\1)/"
\1
$
The $(...) is expanded before sed is run.
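Since the command substitution runs before sed starts, one way around it is to let the shell do the per-line work instead. A minimal sketch: title_for is a hypothetical stand-in for the real lookup (for example curl -s "$1" | head -n 1) so that the example runs without network access, and replace_urls is likewise an invented name.

```shell
# title_for URL: placeholder for the real lookup, e.g. curl -s "$1" | head -n 1
title_for() {
    printf 'title of %s' "$1"
}

# replace_urls: copy stdin to stdout, rewriting each "URL=..." line as "TITLE=...".
replace_urls() {
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            URL=*) printf 'TITLE=%s\n' "$(title_for "${line#URL=}")" ;;
            *)     printf '%s\n' "$line" ;;
        esac
    done
}

printf '%s\n' 'some text' 'URL=http://example.com/a' | replace_urls
# prints:
# some text
# TITLE=title of http://example.com/a
```

Here the command substitution is evaluated inside the loop, once per matching line, so it sees the actual URL rather than the literal \1.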
So you are trying to call an external command from inside the replacement pattern of a sed substitution. I don't think it can be done; the $... inside a pattern just lets you use an already existing (constant) shell variable.
I'd go with Perl, see the /e option in the search-replace operator (s/.../.../e).
UPDATE: I was wrong, sed plays nicely with the shell, and it allows you to do that. But then the backslash in \1 should be escaped. Try instead:
sed "s/^URL=\(.*\)/TITLE=$(curl -s \\1 | head -n 1)/" file.txt
Try this:
sed "s/^URL=\(.*\)/\1/" file.txt | while read url; do sed "s#URL=\($url\)#TITLE=$(curl -s $url | head -n 1)#" file.txt; done
If there are duplicate URLs in the original file, then there will be n^2 of them in the output. The # as a delimiter depends on the URLs not including that character.
Late reply, but making sure people don't get thrown off by the answers here: this can be done in GNU sed using the e flag of the s command. The following, for example, decrements a number at the beginning of a line:
echo "444 foo" | sed "s/\([0-9]*\)\(.*\)/expr \1 - 1 | tr -d '\n'; echo \"\2\";/e"
will produce:
443 foo