sed/awk : match a pattern and return everything between the end of the pattern and a semicolon

sed/awk : match a pattern and return everything between the end of the pattern and a semicolon - sed

I have a line:
<random junk>TYPE=snp;<more random junk>
and I need to return everything between the end of TYPE= and the ; (in this case snp but it could be any of a number of text strings.
I tried various sed / awk solutions but I can't seem to get it working. I have the feeling this is a simple problem so, sorry about that.

This seems to work:
sed 's/.*TYPE=\(.*\);.*/\1/'
EDIT:
Ah, so there can be semicolons in the random junk. Try this:
sed 's/.*TYPE=\([^;]*\);.*/\1/'

requires GNU grep:
grep -Po '(?<=TYPE=)[^;]+'
meaning: preceded by "TYPE=", find some non-semicolon characters

One way using GNU sed:
sed -r 's/.*TYPE=([^;]+).*/\1/' file.txt

Since you also tagged this awk:
$ text='<random junk>TYPE=snp;<more random junk>'
$ echo "$text" | awk -FTYPE= '{sub(/;.*/,"",$2); print $2}'
snp
$ text='foo=bar;baz=fnu;TYPE=snp;XAI=0;XAM=0'
$ echo "$text" | awk -FTYPE= '{sub(/;.*/,"",$2); print $2}'
snp
(Only using the variable to keep the lines from wrapping.)
Or, to parse this as set of variable=value pairs rather than just a string of text:
$ echo "$text" | awk -vRS=";" -F= '$1=="TYPE" {print $2}'
snp

You can also do this in pure bash, if you want:
$ t="red=blue;TYPE=snp;XAI=0.0037843;XAM=0.0170293;XAS=0.013245;XRI=0;XRM=0"
$ t=${t#*TYPE=}
$ t=${t%%;*}
$ echo $t
snp

Related

How to replace consecutive symbols using only one sed command?

I have a simple .csv file with lines that holds 't' values. Here is the example:
2ABC;t;t;t;tortuga;fault;t;t;bored
I want to replace them to '1' using sed.
If I make sed "s/;t;/;1;/g" I get the next result:
2ABC;1;t;1;tortuga;fault;1;t;bored
As you can see, consecutive ';t;' have been replaced through one. Yes, I can replace all ';t;' by sed -e "s/;t;/;1;/g" -e "s/;t;/;1;/g" but this is boring.
How can I make the replacement by one sed command?

If there is something to replace, branch to replace again.
sed ': again; /;t;/{ s//;1;/; b again }'
Overall, parsing cvs with sed is crude. Consider awk.
awk -F';' -v OFS=';' '{ for(i=1;i<=NF;++i) if ($i=="t") $i=1 } 1'

Lookarounds is helpful in such cases:
$ s='t;2ABC;t;t;t;tortuga;fault;t;t;bored;t'
$ echo "$s" | perl -lpe 's/(?<![^;])t(?![^;])/1/g'
1;2ABC;1;1;1;tortuga;fault;1;1;bored;1

echo '2ABC;t;t;t;tortuga;fault;t;t;bored' |
— gawk-specific solution
gawk -be '(ORS = RT)^!(NF = NF)' FS='^t$' OFS=1 RS=';'
— cross-awk-solution
{m,g,n}awk 'gsub(FS, OFS, $!(NF = NF))^_' FS=';t;' OFS=';1;' RS=
2ABC;1;1;1;tortuga;fault;1;1;bored

Matching pattern on multiple lines

I have a file as below
NAME(BOLIVIA) TYPE(SA)
APPLIC(Java) IP(192.70.xxx.xx)
NAME(BOLIVIA) TYPE(SA)
APPLIC(Java) IP(192.71.xxx.xx)
I am trying to extract the values NAME and IP using sed:
cat file1 |
sed ':a
N
$!ba
s/\n/ /g' | sed -n 's/.*\(NAME(BOLI...)\).*\(IP(.*)\).*/\1 \2/p'
However, I'm only getting the output:
NAME(BOLIVIA) IP(192.71.xxx.xx)
What I would like is:
NAME(BOLIVIA) IP(192.70.xxx.xx)
NAME(BOLIVIA) IP(192.71.xxx.xx)
Would appreciate it if someone could give me a pointer on what I'm missing.
TIA

Your first sed commands reformats the file into one long line. You could have used tr -d "\n" for this, but that is not the problem.
The problem is in the second part, where the .* greedy eats as much as possible until finding the last match.
Your solution could be "fixed" with the ugly
# Do not use this:
sed -zn 's/[^\n]*\(NAME(BOLI...)\)[^\n]*\n[^\n]*\(IP([^)]*)\)[^\n]*/\1 \2/gp' file1
Possible solutions:
cat file1 | paste -d " " - - | sed -n 's/.*\(NAME(BOLI...)\).*\(IP(.*)\).*/\1 \2/p'
# or
grep -Eo "(NAME\(BOLI...\)|IP\(.*\))" file1 | paste -d " " - -
# or
printf "%s %s\n" $(grep -Eo "(NAME\(BOLI...\)|IP\(.*\))" file1)

In case you are ok with awk could you please try following. Written and tested in link
https://ideone.com/bJDzgf with shown samples only.
awk '
match($0,/^NAME\([^)]*/){
name=substr($0,RSTART+5,RLENGTH-5)
next
}
match($0,/IP\([^)]*/){
print name,substr($0,RSTART+3,RLENGTH-3)
name=""
}
' Input_file

This might work for you (GNU sed):
sed -n '/NAME/{N;/IP/s/\s.*\s/ /p}' file
If a line contains NAME and the following line contains IP remove everything between and print the result.

An alternative shorter awk:
awk '$1 ~ /^NAME/ {nm = $1} $2 ~ /^IP/ {print nm, $2}' file
NAME(BOLIVIA) IP(192.70.xxx.xx)
NAME(BOLIVIA) IP(192.71.xxx.xx)

The issue in your script is the use .* which matches in a greedy way
so that you have only the first NAME(BOLI...) and last IP(.*)
If you can use python :
#!/bin/bash
python -c '
import re, sys
for ar in re.findall(r"(NAME\(BOLI.*?\)).*?(IP\(.*?\))", sys.stdin.read(), re.DOTALL):
print(*ar)
' < input-file

Remove all the characters from string after last '/'

I have the followiing input file and I need to remove all the characters from the strings that appear after the last '/'. I'll also show my expected output below.
input:
/start/one/two/stopone.js
/start/one/two/three/stoptwo.js
/start/one/stopxyz.js
expected output:
/start/one/two/
/start/one/two/three/
/start/one/
I have tried to use sed but with no luck so far.

You could simply use good old grep:
grep -o '.*/' file.txt
This simple expression takes advantage of the fact that grep is matching greedy. Meaning it will consume as much characters as possible, including /, until the last / in path.
Original Answer:
You can use dirname:
while read line ; do
echo dirname "$line"
done < file.txt
or sed:
sed 's~\(.*/\).*~\1~' file.txt

perl -lne 'print $1 if(/(.*)\//)' your_file

Try this GNU sed command,
$ sed -r 's~^(.*\/).*$~\1~g' file
/start/one/two/
/start/one/two/three/
/start/one/
Through awk,
awk -F/ '{sub(/.*/,"",$NF); print}' OFS="/" file

extract a substring of 11 characters from a line using sed,awk or perl

I have a file with many lines, in each line
there is either substring
whatever_blablablalsfjlsdjf;asdfjlds;f/watch?v=yPrg-JN50sw&amp,whatever_blabla
or
whatever_blablabla"/watch?v=yPrg-JN50sw&amp" class=whatever_blablablavwhate
I want to extract a substring, like the "yPrg-JN50s" above
the matching pattern is
the 11 characters after the string "/watch?="
how to extract the substring
I hope it is sed, awk in one line
if not, a pn line perl script is also ok

You can do
grep -oP '(?<=/watch\?v=).{11}'
if your grep knows Perl regex, or
sed 's/.*\/watch?v=\(.\{11\}\).*/\1/g'

$ cat file
/watch?v=yPrg-JN50sw&amp
"/watch?v=yPrg-JN50sw&amp" class=
$
$ awk 'match($0,/\/watch\?v=/) { print substr($0,RSTART+RLENGTH,11) }' file
yPrg-JN50sw
yPrg-JN50sw

Just with the shell's parameter expansion, extract the 11 chars after "watch?v=":
while IFS= read -r line; do
tmp=${line##*watch?v=}
echo ${tmp:0:11}
done < filename

You could use sed to remove the extraneous information:
sed 's/[^=]\+=//; s/&.*$//' file
Or with awk and sensible field separators:
awk -F '[=&]' '{print $2}' file
Contents of file:
cat <<EOF > file
/watch?v=yPrg-JN50sw&amp
"/watch?v=yPrg-JN50sw&amp" class=
EOF
Output:
yPrg-JN50sw
yPrg-JN50sw
Edit accommodating new requirements mentioned in the comments
cat <<EOF > file
<div id="" yt-grid-box "><div class="yt-lockup-thumbnail"><a href="/watch?v=0_NfNAL3Ffc" class="ux-thumb-wrap yt-uix-sessionlink yt-uix-contextlink contains-addto result-item-thumb" data-sessionlink="ved=CAMQwBs%3D&ei=CPTsy8bhqLMCFRR0fAodowXbww%3D%3D"><span class="video-thumb ux-thumb yt-thumb-default-185 "><span class="yt-thumb-clip"><span class="yt-thumb-clip-inner"><img src="//i1.ytimg.com/vi/0_NfNAL3Ffc/mqdefault.jpg" alt="Miniature" width="185" ><span class="vertical-align"></span></span></span></span><span class="video-time">5:15</span>
EOF
Use awk with sensible record separator:
awk -v RS='[=&"]' '/watch/ { getline; print }' file
Note, you should use a proper XML parser for this sort of task.

grep --perl-regexp --only-matching --regexp="(?<=/watch\\?=)([^&]{0,11})"

Assuming your lines have exactly the format you quoted, this should work.
awk '{print substr($0,10,11)}'
Edit: From the comment in another answer, I guess your lines are much longer and complicated than this, in which case something more comprehensive is needed:
gawk '{if(match($0, "/watch\\?v=(\\w+)",a)) print a[1]}'

AWK/SED. How to remove parentheses in simple text file

I have a text file looking like this:
(-9.1744438E-02,7.6282293E-02) (-9.1744438E-02,7.6282293E-02) ... and so on.
I would like to modify the file by removing all the parenthesis and a new line for each couple
so that it look like this:
-9.1744438E-02,7.6282293E-02
-9.1744438E-02,7.6282293E-02
...
A simple way to do that?
Any help is appreciated,
Fred

I would use tr for this job:
cat in_file | tr -d '()' > out_file
With the -d switch it just deletes any characters in the given set.
To add new lines you could pipe it through two trs:
cat in_file | tr -d '(' | tr ')' '\n' > out_file

As was said, almost:
sed 's/[()]//g' inputfile > outputfile
or in awk:
awk '{gsub(/[()]/,""); print;}' inputfile > outputfile

This would work -
awk -v FS="[()]" '{for (i=2;i<=NF;i+=2) print $i }' inputfile > outputfile
Test:
[jaypal:~/Temp] cat file
(-9.1744438E-02,7.6282293E-02) (-9.1744438E-02,7.6282293E-02)
[jaypal:~/Temp] awk -v FS="[()]" '{for (i=2;i<=NF;i+=2) print $i }' file
-9.1744438E-02,7.6282293E-02
-9.1744438E-02,7.6282293E-02

This might work for you:
echo "(-9.1744438E-02,7.6282293E-02) (-9.1744438E-02,7.6282293E-02)" |
sed 's/) (/\n/;s/[()]//g'
-9.1744438E-02,7.6282293E-02
-9.1744438E-02,7.6282293E-02

Guess we all know this, but just to emphasize:
Usage of bash commands is better in terms of time taken for execution, than using awk or sed to do the same job. For instance, try not to use sed/awk where grep can suffice.
In this particular case, I created a file 100000 lines long file, each containing characters "(" as well as ")". Then ran
$ /usr/bin/time -f%E -o log cat file | tr -d "()"
and again,
$ /usr/bin/time -f%E -ao log sed 's/[()]//g' file
And the results were:
05.44 sec : Using tr
05.57 sec : Using sed

cat in_file | sed 's/[()]//g' > out_file
Due to formatting issues, it is not entirely clear from your question whether you also need to insert newlines.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

sed/awk : match a pattern and return everything between the end of the pattern and a semicolon - sed

This seems to work: sed 's/.TYPE=\(.\);./\1/' EDIT: Ah, so there can be semicolons in the random junk. Try this: sed 's/.TYPE=\([^;]\);./\1/'

requires GNU grep: grep -Po '(?<=TYPE=)[^;]+' meaning: preceded by "TYPE=", find some non-semicolon characters

One way using GNU sed: sed -r 's/.TYPE=([^;]+)./\1/' file.txt

You can also do this in pure bash, if you want: $ t="red=blue;TYPE=snp;XAI=0.0037843;XAM=0.0170293;XAS=0.013245;XRI=0;XRM=0" $ t=${t#TYPE=} $ t=${t%%;} $ echo $t snp

Related

How to replace consecutive symbols using only one sed command?

Matching pattern on multiple lines

Remove all the characters from string after last '/'

extract a substring of 11 characters from a line using sed,awk or perl

AWK/SED. How to remove parentheses in simple text file

Categories

Resources

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

sed/awk : match a pattern and return everything between the end of the pattern and a semicolon - sed

This seems to work: sed 's/.*TYPE=\(.*\);.*/\1/' EDIT: Ah, so there can be semicolons in the random junk. Try this: sed 's/.*TYPE=\([^;]*\);.*/\1/'

requires GNU grep: grep -Po '(?<=TYPE=)[^;]+' meaning: preceded by "TYPE=", find some non-semicolon characters

One way using GNU sed: sed -r 's/.*TYPE=([^;]+).*/\1/' file.txt

You can also do this in pure bash, if you want: $ t="red=blue;TYPE=snp;XAI=0.0037843;XAM=0.0170293;XAS=0.013245;XRI=0;XRM=0" $ t=${t#*TYPE=} $ t=${t%%;*} $ echo $t snp

Related

How to replace consecutive symbols using only one sed command?

Matching pattern on multiple lines

Remove all the characters from string after last '/'

extract a substring of 11 characters from a line using sed,awk or perl

AWK/SED. How to remove parentheses in simple text file

Categories

Resources

This seems to work: sed 's/.TYPE=\(.\);./\1/' EDIT: Ah, so there can be semicolons in the random junk. Try this: sed 's/.TYPE=\([^;]\);./\1/'

One way using GNU sed: sed -r 's/.TYPE=([^;]+)./\1/' file.txt

You can also do this in pure bash, if you want: $ t="red=blue;TYPE=snp;XAI=0.0037843;XAM=0.0170293;XAS=0.013245;XRI=0;XRM=0" $ t=${t#TYPE=} $ t=${t%%;} $ echo $t snp