Parsing a string with sed

I have a string like prefix-2020.80-suffix-1
Here are all the possible forms of the input string:
"2020.80-suffix-1"
"2020.80-suffix"
"prefix-2020.80"
"prefix-2020.80-1"
I need to cut out 2020 and assign it to a variable, but I cannot get the desired output.
Here is what I have so far:
set var=`echo "prefix-2020.80-suffix-1" | sed "s/[[:alnum:]]*-*\([0-9]*\).*/\1/"`
My regexp does not work for the other cases and I cannot figure out why. It's more complicated than Python's regexp syntax.

This should work for all your inputs:
sed 's/.*\(^\|-\)\([0-9]*\)\..*/\2/' test
Matches the start of the line or everything up to -[number]. and captures the number.
The problem with the original expression is that it did not account for inputs without a prefix: in that case [[:alnum:]]* consumes the digits itself, so the capture group is left empty.

You can use this grep -oP:
echo "prefix-2020.80-suffix-1" | grep -oP '^([[:alnum:]]+-)?\K[0-9]+'
2020

Using sed (with extended regex):
echo "prefix-2020.80-suffix-1" |sed -r 's/^([^-]*-|)([0-9]+).*/\2/'
Using grep:
echo "prefix-2020.80-suffix-1" |grep -oP "^([^-]*-|)\K\d+"
2020
-P is for Perl regex.
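To check an expression against every input form from the question, a small loop helps. The following is only a sketch: the function name extract_year is hypothetical, and it assumes a sed that accepts -E (current GNU and BSD sed do). It makes the "prefix-" part optional and requires a dot after the number.

```shell
# Hypothetical helper: print the number that precedes the first dot,
# skipping an optional "prefix-" part.
extract_year() {
    printf '%s\n' "$1" | sed -E 's/^([^-]*-)?([0-9]+)\..*/\2/'
}

# Run the helper over every input form listed in the question.
for s in "prefix-2020.80-suffix-1" "2020.80-suffix-1" "2020.80-suffix" \
         "prefix-2020.80" "prefix-2020.80-1"; do
    printf '%s -> %s\n' "$s" "$(extract_year "$s")"
done

# In a POSIX shell the result is assigned with var=$(...), not "set var=...".
var=$(extract_year "prefix-2020.80-suffix-1")
```

Each line of the loop output should end in 2020 if the expression handles that form.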

Related

Finding it difficult to extract digits from string using sed

I am trying to extract the version information from a string using sed as follows
echo "A10.1.1-Vers8" | sed -n "s/^A\([0-9]+\)\.\([0-9]\)\.[0-9]+-.*/\1/p"
I want to extract the '10' after 'A', but the expression above does not give the expected result. Could someone please explain why this statement doesn't work?
I tried the command above and changed the sed options, but nothing works. I think this is a syntax error:
echo "A10.1.1-Vers10" | sed -n "s/^X\([0-9]+\)\.\([0-9]\)\.[0-9]+-.*/\1/p"
Expected result is '10'
Actual result: no output
$ echo "A10.1.1-Vers8" | sed -r 's/^A([[:digit:]]+)\.(.*)$/\1/g'
10
Search for string starting with A (^A), followed by multiple digits (I am using POSIX character class [[:digit:]]+) which is captured in a group (), followed by a literal dot \., followed by everything else (.*)$.
Finally, replace the whole thing with the Captured Group content \1.
In GNU sed, -r (called --regexp-extended in the man page) enables extended regular expressions, so + and the grouping parentheses need no backslashes.
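As for why the command in the question prints nothing: without -r or -E, sed uses basic regular expressions, where an unescaped + is a literal plus sign, so [0-9]+ looks for one digit followed by a "+". The sketch below contrasts the failing form with two working ones. The variable names are only illustrative, and -E is assumed to be available.

```shell
version='A10.1.1-Vers8'

# Basic regex: "+" is a literal character, so nothing matches and -n prints nothing.
broken=$(printf '%s\n' "$version" | sed -n 's/^A\([0-9]+\)\..*/\1/p')

# Basic regex with a POSIX interval expression instead of "+".
major_bre=$(printf '%s\n' "$version" | sed -n 's/^A\([0-9]\{1,\}\)\..*/\1/p')

# Extended regex: "+" and "( )" work without backslashes.
major_ere=$(printf '%s\n' "$version" | sed -nE 's/^A([0-9]+)\..*/\1/p')

printf 'broken=[%s] bre=[%s] ere=[%s]\n' "$broken" "$major_bre" "$major_ere"
```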
GNU grep is an alternative to sed:
$ echo "A10.1.1-Vers10" | grep -oP '(?<=^A)[0-9]+'
10
The -o option tells grep to print only the matched characters.
The -P option tells grep to match Perl regular expressions, which enables the (?<= lookbehind zero-length assertion.
The lookbehind assertion (?<=^A) ensures there is an A at the beginning of the line, but doesn't include it as part of the match for output.
If you need to match more of the version string, you can use a lookahead assertion:
$ echo "A10.1.1-Vers10" | grep -oP '(?<=^A)[0-9]+(?=\.[0-9]+\.[0-9]+-.*)'
10

Using sed to eliminate a specific string

I would appreciate your help with this problem. I would like to eliminate everything that is not a specific pattern from a string.
For example, below I would like to eliminate everything that is not "5TTGTC".
But as seen here, ^5TTGTC is not right. I tried different combinations of ^(), ^{} and ^[], but none gave me what I am looking for. I appreciate your feedback!
echo ".,..,...+5TTGTC...+5TTGCC.+5TTGTC,,.,.,,.,+5ttgtc,.,,.,.+5TTGTC.+5TTGTC,..+5TTGTC" | sed 's/^5TTGTC//g'
Thanks in advance
You may use the following command if you want case sensitivity:
echo ".,..,...+5TTGTC...+5TTGCC.+5TTGTC,,.,.,,.,+5ttgtc,.,,.,.+5TTGTC.+5TTGTC,..+5TTGTC" | sed -r 's/(5TTGTC)|[,.A-Za-z+0-9]/\1/g'
The code above prints:
5TTGTC5TTGTC5TTGTC5TTGTC5TTGTC
The regular expression used above uses alternation to capture what you are interested in.
We match and capture what we are interested in (5TTGTC), and we match without capturing everything else, in this case the characters ,.A-Za-z+0-9. Since \1 is empty for those characters, they are deleted.
As pointed out by @EdMorton, the command can be simplified to:
echo ".,..,...+5TTGTC...+5TTGCC.+5TTGTC,,.,.,,.,+5ttgtc,.,,.,.+5TTGTC.+5TTGTC,..+5TTGTC" | sed -r 's/(5TTGTC)|./\1/g'
For compatibility across sed versions the -r flag can be replaced by the -E flag.
You don't make it very clear what you are trying to achieve.
One way to get where you are trying to go could be the -o option in grep.
echo ".,..,...+5TTGTC...+5TTGCC.+5TTGTC,,.,.,,.,+5ttgtc,.,,.,.+5TTGTC.+5TTGTC,..+5TTGTC" | grep -o '5TTGTC'
Output:
5TTGTC
5TTGTC
5TTGTC
5TTGTC
5TTGTC
You can then change 5TTGTC into a pattern, e.g. grep -o '[0-9]TT[AG]GTC'
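If the matches are wanted on a single line, as in the sed answers, the output of grep -o can be joined afterwards. This is a minimal sketch; the variable names input, joined and count are only illustrative.

```shell
input='.,..,...+5TTGTC...+5TTGCC.+5TTGTC,,.,.,,.,+5ttgtc,.,,.,.+5TTGTC.+5TTGTC,..+5TTGTC'

# grep -o prints one match per line; tr removes the newlines between them.
joined=$(printf '%s\n' "$input" | grep -o '5TTGTC' | tr -d '\n')
printf '%s\n' "$joined"

# Counting the lines gives the number of occurrences
# (tr strips the padding some wc implementations add).
count=$(printf '%s\n' "$input" | grep -o '5TTGTC' | wc -l | tr -d ' ')
printf '%s\n' "$count"
```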
With any sed:
$ echo ".,..,...+5TTGTC...+5TTGCC.+5TTGTC,,.,.,,.,+5ttgtc,.,,.,.+5TTGTC.+5TTGTC,..+5TTGTC" |
sed 's/#//g; s/5TTGTC/#/g; s/[^#]//g; s/#/5TTGTC/g'
5TTGTC5TTGTC5TTGTC5TTGTC5TTGTC
With any awk:
$ echo ".,..,...+5TTGTC...+5TTGCC.+5TTGTC,,.,.,,.,+5ttgtc,.,,.,.+5TTGTC.+5TTGTC,..+5TTGTC" |
awk -v str='5TTGTC' '{gsub(str,"\n"); gsub(/[^\n]/,""); gsub(/\n/,str)}1'
5TTGTC5TTGTC5TTGTC5TTGTC5TTGTC

Sed Pattern filtering long html doc

I am trying to filter a long HTML page, leaving only fingerprints, which have a consistent structure. For example:
DCD0 5B71 EAB9 4199 527F 44AC DB6B 8C1F 96D8 BF60
I know how to do it with standard command-line tools such as grep, cut and head/tail, but is there a more elegant way to do it with sed? The shell command I use is long and does not look nice.
thank you
grep is the right tool for extracting strings from a file based on regular expression matching:
grep -Eo '([A-F0-9]{4}[[:space:]]){9}[A-F0-9]{4}' file.html
Here is a sed command tested with GNU sed 4.2.2:
sed -nr '/(([[:xdigit:]]){4} ?){10}/p' file
It matches and prints every line that contains 10 groups, each made of 4 hex digits followed by an optional space.
With GNU sed:
sed -E 's/.*(([A-F0-9]{4}[[:space:]]){9}[A-F0-9]{4}).*/\1/' file
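To print only the fingerprint and skip lines that contain none, the substitution can be combined with -n and the p flag. The sketch below assumes a sed that accepts -E; the sample HTML line is made up for illustration and only the fingerprint itself comes from the question.

```shell
# Made-up sample line; only the fingerprint is taken from the question.
sample='<td>Fingerprint: DCD0 5B71 EAB9 4199 527F 44AC DB6B 8C1F 96D8 BF60</td>'

# -n suppresses default output; p prints only lines where the substitution succeeded.
fp=$(printf '%s\n' "$sample" |
     sed -nE 's/.*(([A-F0-9]{4} ){9}[A-F0-9]{4}).*/\1/p')
printf '%s\n' "$fp"

# A line without a fingerprint produces no output at all.
none=$(printf '%s\n' '<p>no fingerprint here</p>' |
       sed -nE 's/.*(([A-F0-9]{4} ){9}[A-F0-9]{4}).*/\1/p')
```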

Replacing the last word of a path using sed

I have the following: param="/var/tmp/test"
I need to replace the word test with another word, such as new_test.
I need a smart way to replace the last word after "/" with sed.
echo 'param="/var/tmp/test"' | sed 's/\/[^\/]*"/\/REPLACEMENT"/'
param="/var/tmp/REPLACEMENT"
echo '/var/tmp/test' | sed 's/\/[^\/]*$/\/REPLACEMENT/'
/var/tmp/REPLACEMENT
Extracting bits and pieces with sed is a bit messy (as Jim Lewis says, use basename and dirname if you can). If you do go the sed route, at least you don't need a plethora of backslashes, because the delimiter character is selectable. I like to use ! when / is too awkward, but the choice is arbitrary:
$ echo 'param="/var/tmp/test"' | sed ' s!/[^/"]*"!/new_test"! '
param="/var/tmp/new_test"
We can also extract just the part that was substituted, though this is easier with two substitutions in the sed control script:
$ echo 'param="/var/tmp/test"' | sed ' s!.*/!! ; s/"$// '
test
You don't need sed for this...basename and dirname are a better choice for assembling or disassembling pathnames. All those escape characters give me a headache....
param="/var/tmp/test"
param_repl=$(dirname "$param")/newtest
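If param is a shell variable, the same result can also be had without any external command, using POSIX parameter expansion. A minimal sketch, with new_param and last as illustrative variable names:

```shell
param="/var/tmp/test"

# ${param%/*} removes the shortest suffix matching "/*", i.e. the last path component.
new_param="${param%/*}/new_test"
printf '%s\n' "$new_param"

# ${param##*/} removes the longest prefix matching "*/", leaving only the last component.
last="${param##*/}"
printf '%s\n' "$last"
```

This works in any POSIX shell, whereas the extglob approach requires Bash.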
It's not clear whether param is part of the string that you need processed or it's the variable that holds the string. Assuming the latter, you can do this using only Bash (you don't say which shell you're using):
shopt -s extglob
param="/var/tmp/test"
param="${param/%\/*([^\/])//new_test}"
If param= is part of the string:
shopt -s extglob
string='param="/var/tmp/test"'
string="${string/%\/*([^\/])\"//new\"}"
This might work for you:
echo 'param="/var/tmp/test"' | sed -r 's#(/(([^/]*/)*))[^"]*#\1newtest#'
param="/var/tmp/newtest"

How do I push `sed` matches to the shell call in the replacement pattern?

I need to replace several URLs in a text file with some content dependent on the URL itself. Let's say for simplicity it's the first line of the document at the URL.
What I'm trying is this:
sed "s/^URL=\(.*\)/TITLE=$(curl -s \1 | head -n 1)/" file.txt
This doesn't work, since \1 is not set. However, the shell is getting called. Can I somehow push the sed match variables to that subprocess?
The accepted answer is just plain wrong. Proof:
Make an executable script foo.sh:
#! /bin/bash
echo $* 1>&2
Now run it:
$ echo foo | sed -e "s/\\(foo\\)/$(./foo.sh \\1)/"
\1
$
The $(...) is expanded before sed is run.
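The order of evaluation becomes visible if the sed script is built in a variable first. In this sketch the printf call is only a stand-in for any command substitution, and the names script and result are illustrative.

```shell
# The shell expands $(...) while building the string, before sed exists as a process.
script="s/\(foo\)/$(printf '%s' 'EXPANDED-BY-SHELL')/"

# What sed will actually receive: the command's output is already baked in.
printf '%s\n' "$script"

result=$(echo foo | sed -e "$script")
printf '%s\n' "$result"
```

sed only ever sees the finished string, so a backreference such as \1 cannot reach the command inside $(...).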
So you are trying to call an external command from inside the replacement pattern of a sed substitution. I don't think it can be done; the $... inside a pattern only lets you use an already existing (constant) shell variable.
I'd go with Perl, see the /e option in the search-replace operator (s/.../.../e).
UPDATE: I was wrong, sed plays nicely with the shell, and it allows you to do that. But then the backslash in \1 should be escaped. Try instead:
sed "s/^URL=\(.*\)/TITLE=$(curl -s \\1 | head -n 1)/" file.txt
Try this:
sed "s/^URL=\(.*\)/\1/" file.txt | while read url; do sed "s#URL=\($url\)#TITLE=$(curl -s $url | head -n 1)#" file.txt; done
If there are duplicate URLs in the original file, then there will be n^2 of them in the output. The # as a delimiter depends on the URLs not including that character.
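A single pass over the file in the shell itself avoids both the duplication and the delimiter problem. The following is a sketch under assumptions: fetch_title is a hypothetical stand-in for curl -s "$1" | head -n 1 so that the example runs without network access, and rewrite_urls is an illustrative name.

```shell
# Hypothetical stand-in for: curl -s "$1" | head -n 1
fetch_title() {
    printf 'title-of-%s' "$1"
}

# Read stdin line by line; rewrite URL= lines, pass everything else through unchanged.
rewrite_urls() {
    while IFS= read -r line; do
        case $line in
            URL=*) printf 'TITLE=%s\n' "$(fetch_title "${line#URL=}")" ;;
            *)     printf '%s\n' "$line" ;;
        esac
    done
}

printf 'URL=http://example.com/a\nplain line\n' | rewrite_urls
```

With a real file this would be called as rewrite_urls < file.txt, after replacing the body of fetch_title with the curl pipeline.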
Late reply, but making sure people don't get thrown off by the answers here: this can be done in GNU sed using the e flag of the s command, which executes the result of the substitution as a shell command. The following, for example, decrements a number at the beginning of a line:
echo "444 foo" | sed "s/\([0-9]*\)\(.*\)/expr \1 - 1 | tr -d '\n'; echo \"\2\";/e"
will produce:
443 foo