Matching pattern on multiple lines

Matching pattern on multiple lines - sed

I have a file as below
NAME(BOLIVIA) TYPE(SA)
APPLIC(Java) IP(192.70.xxx.xx)
NAME(BOLIVIA) TYPE(SA)
APPLIC(Java) IP(192.71.xxx.xx)
I am trying to extract the values NAME and IP using sed:
cat file1 |
sed ':a
N
$!ba
s/\n/ /g' | sed -n 's/.*\(NAME(BOLI...)\).*\(IP(.*)\).*/\1 \2/p'
However, I'm only getting the output:
NAME(BOLIVIA) IP(192.71.xxx.xx)
What I would like is:
NAME(BOLIVIA) IP(192.70.xxx.xx)
NAME(BOLIVIA) IP(192.71.xxx.xx)
Would appreciate it if someone could give me a pointer on what I'm missing.
TIA

Your first sed command reformats the file into one long line. You could have used tr "\n" " " for this, but that is not the problem.
The problem is in the second part, where each .* is greedy: it consumes as much of the line as possible, so only the last NAME/IP pair is captured.
Your solution could be "fixed" with the ugly
# Do not use this:
sed -zn 's/[^\n]*\(NAME(BOLI...)\)[^\n]*\n[^\n]*\(IP([^)]*)\)[^\n]*/\1 \2/gp' file1
Possible solutions:
cat file1 | paste -d " " - - | sed -n 's/.*\(NAME(BOLI...)\).*\(IP(.*)\).*/\1 \2/p'
# or
grep -Eo "(NAME\(BOLI...\)|IP\(.*\))" file1 | paste -d " " - -
# or
printf "%s %s\n" $(grep -Eo "(NAME\(BOLI...\)|IP\(.*\))" file1)
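The difference between matching on one joined line and matching per pair of lines can be reproduced with a short sketch. It assumes the four sample lines from the question; the function names sample, greedy and per_pair are made up for illustration, and [^)]* stands in for the BOLI... and IP(.*) patterns used above.

```shell
# Sample input copied from the question.
sample() {
  printf '%s\n' \
    'NAME(BOLIVIA) TYPE(SA)' \
    'APPLIC(Java) IP(192.70.xxx.xx)' \
    'NAME(BOLIVIA) TYPE(SA)' \
    'APPLIC(Java) IP(192.71.xxx.xx)'
}

# Everything joined into one line: the leading .* swallows the first record,
# so only the last NAME/IP pair is left for the capture groups.
greedy() {
  sample | tr '\n' ' ' |
    sed -n 's/.*\(NAME([^)]*)\).*\(IP([^)]*)\).*/\1 \2/p'
}

# Two input lines per output line: each pair is matched on its own.
per_pair() {
  sample | paste -d ' ' - - |
    sed -n 's/.*\(NAME([^)]*)\).*\(IP([^)]*)\).*/\1 \2/p'
}

greedy     # prints one line, with the 192.71 address
per_pair   # prints one line per NAME/IP pair
```

The only change between the two functions is how the lines are joined before sed sees them.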

If you are OK with awk, you could try the following. It was written and tested at
https://ideone.com/bJDzgf with the shown samples only.
awk '
match($0,/^NAME\([^)]*/){
name=substr($0,RSTART+5,RLENGTH-5)
next
}
match($0,/IP\([^)]*/){
print name,substr($0,RSTART+3,RLENGTH-3)
name=""
}
' Input_file
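To see where the +5 and -5 offsets come from, here is a small sketch that prints what match() stores in RSTART and RLENGTH for one sample line. The function name show_match is hypothetical; the regex is the one used above.

```shell
# Print RSTART, RLENGTH and the extracted value for one sample line.
show_match() {
  printf '%s\n' 'NAME(BOLIVIA) TYPE(SA)' |
    awk 'match($0, /^NAME\([^)]*/) {
           # The match is "NAME(BOLIVIA": 12 characters starting at position 1.
           # Skipping the 5 characters of "NAME(" leaves the 7-character value.
           print RSTART, RLENGTH, substr($0, RSTART + 5, RLENGTH - 5)
         }'
}

show_match   # prints: 1 12 BOLIVIA
```

So the script above prints the bare values (BOLIVIA and the address) rather than the NAME(...) and IP(...) wrappers.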

This might work for you (GNU sed):
sed -n '/NAME/{N;/IP/s/\s.*\s/ /p}' file
If a line contains NAME and the following line contains IP remove everything between and print the result.
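For a sed without the GNU \s extension, the same idea can be sketched with the POSIX character class [[:space:]]. This is an assumed portable variant, not the command from the answer; the function name join_pairs is made up.

```shell
# Join each NAME line with the following line and keep only NAME(...) and IP(...).
join_pairs() {
  sed -n '/NAME/{
    N
    /IP/s/[[:space:]].*[[:space:]]/ /p
  }'
}

printf '%s\n' \
  'NAME(BOLIVIA) TYPE(SA)' \
  'APPLIC(Java) IP(192.70.xxx.xx)' \
  'NAME(BOLIVIA) TYPE(SA)' \
  'APPLIC(Java) IP(192.71.xxx.xx)' | join_pairs
```

After N the pattern space holds both lines separated by a newline, so the substitution removes everything from the first whitespace character to the last one, including that newline.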

An alternative shorter awk:
awk '$1 ~ /^NAME/ {nm = $1} $2 ~ /^IP/ {print nm, $2}' file
NAME(BOLIVIA) IP(192.70.xxx.xx)
NAME(BOLIVIA) IP(192.71.xxx.xx)

The issue in your script is the use of .*, which matches greedily,
so only one NAME(BOLI...) and the last IP(.*) are captured.
If you can use Python:
#!/bin/bash
python3 -c '
import re, sys
for ar in re.findall(r"(NAME\(BOLI.*?\)).*?(IP\(.*?\))", sys.stdin.read(), re.DOTALL):
    print(*ar)
' < input-file

Related

How to replace consecutive symbols using only one sed command?

I have a simple .csv file with lines that holds 't' values. Here is the example:
2ABC;t;t;t;tortuga;fault;t;t;bored
I want to replace them with '1' using sed.
If I run sed "s/;t;/;1;/g" I get the following result:
2ABC;1;t;1;tortuga;fault;1;t;bored
As you can see, only every other one of the consecutive ';t;' has been replaced. Yes, I can replace all ';t;' with sed -e "s/;t;/;1;/g" -e "s/;t;/;1;/g" but this is boring.
How can I make the replacement by one sed command?

If there is something to replace, branch to replace again.
sed ': again; /;t;/{ s//;1;/; b again }'
Overall, parsing CSV with sed is crude. Consider awk.
awk -F';' -v OFS=';' '{ for(i=1;i<=NF;++i) if ($i=="t") $i=1 } 1'
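A single s///g pass misses every other field because matches cannot overlap: the ';' that closes one ';t;' is consumed and is no longer available to open the next. The sketch below contrasts that with a loop built on the t command, which branches back as long as a substitution succeeded. The variable line and the function names are made up for illustration; the data is the sample from the question.

```shell
line='2ABC;t;t;t;tortuga;fault;t;t;bored'

# One global pass: adjacent ;t; fields share a semicolon, so half are skipped.
single_pass() {
  printf '%s\n' "$line" | sed 's/;t;/;1;/g'
}

# Replace one match, then jump back to the label until nothing matches.
looped() {
  printf '%s\n' "$line" | sed -e ':again' -e 's/;t;/;1;/' -e 't again'
}

single_pass   # prints: 2ABC;1;t;1;tortuga;fault;1;t;bored
looped        # prints: 2ABC;1;1;1;tortuga;fault;1;1;bored
```

Like the original pattern, this does not touch a t in the first or last field, since those are not surrounded by semicolons.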

Lookarounds are helpful in such cases:
$ s='t;2ABC;t;t;t;tortuga;fault;t;t;bored;t'
$ echo "$s" | perl -lpe 's/(?<![^;])t(?![^;])/1/g'
1;2ABC;1;1;1;tortuga;fault;1;1;bored;1

echo '2ABC;t;t;t;tortuga;fault;t;t;bored' |
# gawk-specific solution
gawk -be '(ORS = RT)^!(NF = NF)' FS='^t$' OFS=1 RS=';'
# cross-awk solution
{m,g,n}awk 'gsub(FS, OFS, $!(NF = NF))^_' FS=';t;' OFS=';1;' RS=
2ABC;1;1;1;tortuga;fault;1;1;bored

What is a simple and fast UNIX command to print all lines from the last occurrence of a pattern?

What is a simple and fast UNIX command to print all lines from the last occurrence of a pattern to the end of the file?
sed -n '/pattern/,$p' file
This sed command prints from the first occurrence onwards.

This might work for you (GNU sed):
sed 'H;/pattern/h;$!d;x;//!d' file
Stashes the last pattern and following lines in the hold space and at end-of-file prints them out.
Or using the same method in awk:
awk '{x=x ORS $0};/pattern/{x=$0};END{if(x ~ /pattern/)print x}' file
However, on my machine jaypal's way with sed seems to be the quickest:
tac file | sed '/pattern/q' | tac
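As a quick check of the hold-space command, here is a sketch that runs it on a five-line sample (a, b, c, b, d) with the pattern hard-coded as b. The function name last_block_sed is hypothetical.

```shell
# Keep only the lines from the last line matching /b/ to the end of input.
last_block_sed() {
  # H   append every line to the hold space
  # h   on a match, overwrite the hold space with just this line
  # $!d discard the pattern space unless this is the last line
  # x   on the last line, swap in what was collected
  # //!d delete it if it does not contain the pattern (no match at all)
  sed 'H;/b/h;$!d;x;//!d'
}

printf '%s\n' a b c b d | last_block_sed   # prints b, then d
```

With input that never matches, the final //!d deletes the collected text, so nothing is printed.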

Reverse the file, print until the first pattern, exit and reverse the file.
tac file | awk '/pattern/{print;exit}1' | tac

Here's a Perlish way to do it:
perl -ne '$seen = 1, @a = () if /pattern/; push @a, $_; END { print @a if $seen }' file

A simple solution is to match a regex against the entire file; the greedy leading .* makes the match start at the last line containing the pattern:
perl -0777 -ne 'print $1 if /.*^(.*?pattern.*)/ms' file

A standalone awk:
awk '/pattern/{delete a;c=0}{a[c++]=$0}END{for (i=0;i<c;i++){print a[i]}}' file
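A variant of the buffering approach, sketched here as a shell function, starts buffering only once the pattern has been seen, so input without any match prints nothing. The function name from_last and the variables re, seen, n and buf are made up for illustration; note that awk -v interprets backslash escapes in the pattern string.

```shell
# Usage: from_last REGEX < input
from_last() {
  awk -v re="$1" '
    $0 ~ re { n = 0; seen = 1 }   # restart the buffer at each match
    seen    { buf[n++] = $0 }     # keep lines from the most recent match on
    END     { for (i = 0; i < n; i++) print buf[i] }
  '
}

printf '%s\n' a b c b d | from_last b   # prints b, then d
```

Memory use is bounded by the distance between matches and the end of the input rather than by the whole file, as long as the pattern occurs at all.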

Here is a pure awk solution:
awk 'FNR==NR {if ($0~/pattern/) f=FNR;next} FNR==f {a=1}a' file{,}
It reads the file twice (file{,} is bash brace expansion for file file): the first pass records the line number of the last match, and the second pass prints from that line to the end of the file.
Or you can store data in an array like this:
awk '/pattern/ {f=NR} {a[NR]=$0} END {for (i=f;i<=NR;i++) print a[i]}' file

Using GNU awk for multi-char RS and gensub():
$ awk -v RS='^$' -v ORS= '{print gensub(/.*(pattern)/,"\\1","")}' file
e.g.:
$ cat file
a
b
c
b
d
$ awk -v RS='^$' -v ORS= '{print gensub(/.*(b)/,"\\1","")}' file
b
d
The above simply deletes from the start of the file up to just before the last occurrence of "b".

sed/awk : match a pattern and return everything between the end of the pattern and a semicolon

I have a line:
<random junk>TYPE=snp;<more random junk>
and I need to return everything between the end of TYPE= and the ; (in this case snp, but it could be any of a number of text strings).
I tried various sed / awk solutions but I can't seem to get it working. I have the feeling this is a simple problem so, sorry about that.

This seems to work:
sed 's/.*TYPE=\(.*\);.*/\1/'
EDIT:
Ah, so there can be semicolons in the random junk. Try this:
sed 's/.*TYPE=\([^;]*\);.*/\1/'
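The effect of the edit can be checked with a record that has several semicolons after TYPE=. This is a sketch using a made-up sample string and hypothetical function names; the two sed expressions are the ones shown above.

```shell
text='foo=bar;TYPE=snp;XAI=0;XAM=0'

# \(.*\) is greedy and runs up to the LAST semicolon on the line.
greedy_type() {
  printf '%s\n' "$text" | sed 's/.*TYPE=\(.*\);.*/\1/'
}

# \([^;]*\) cannot cross a semicolon, so it stops at the first one.
bounded_type() {
  printf '%s\n' "$text" | sed 's/.*TYPE=\([^;]*\);.*/\1/'
}

greedy_type    # prints: snp;XAI=0
bounded_type   # prints: snp
```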

requires GNU grep:
grep -Po '(?<=TYPE=)[^;]+'
meaning: preceded by "TYPE=", find some non-semicolon characters

One way using GNU sed:
sed -r 's/.*TYPE=([^;]+).*/\1/' file.txt

Since you also tagged this awk:
$ text='<random junk>TYPE=snp;<more random junk>'
$ echo "$text" | awk -FTYPE= '{sub(/;.*/,"",$2); print $2}'
snp
$ text='foo=bar;baz=fnu;TYPE=snp;XAI=0;XAM=0'
$ echo "$text" | awk -FTYPE= '{sub(/;.*/,"",$2); print $2}'
snp
(Only using the variable to keep the lines from wrapping.)
Or, to parse this as set of variable=value pairs rather than just a string of text:
$ echo "$text" | awk -vRS=";" -F= '$1=="TYPE" {print $2}'
snp

You can also do this in pure bash, if you want:
$ t="red=blue;TYPE=snp;XAI=0.0037843;XAM=0.0170293;XAS=0.013245;XRI=0;XRM=0"
$ t=${t#*TYPE=}
$ t=${t%%;*}
$ echo $t
snp
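The two expansions can be wrapped in a small helper. This is only a sketch: the function name get_value and its argument order are made up, and it assumes fields are separated by ';' and written as key=value. It prepends a ';' so that a key such as TYPE does not match inside SUBTYPE, and it returns a non-zero status when the key is absent.

```shell
# Usage: get_value KEY 'k1=v1;k2=v2;...'
get_value() {
  key=$1 record=";$2"
  case $record in
    *";$key="*) ;;          # key present at a field boundary
    *) return 1 ;;          # key missing: print nothing
  esac
  rest=${record#*";$key="}  # drop everything up to and including ;KEY=
  printf '%s\n' "${rest%%;*}"   # drop from the next semicolon to the end
}

get_value TYPE 'red=blue;TYPE=snp;XAI=0.0037843;XAM=0.0170293'   # prints: snp
```

Only POSIX parameter expansion is used, so no external process is started.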

AWK/SED. How to remove parentheses in simple text file

I have a text file looking like this:
(-9.1744438E-02,7.6282293E-02) (-9.1744438E-02,7.6282293E-02) ... and so on.
I would like to modify the file by removing all the parentheses and putting each pair on a new line,
so that it looks like this:
-9.1744438E-02,7.6282293E-02
-9.1744438E-02,7.6282293E-02
...
A simple way to do that?
Any help is appreciated,
Fred

I would use tr for this job:
cat in_file | tr -d '()' > out_file
With the -d switch it just deletes any characters in the given set.
To add new lines you could pipe it through two trs:
cat in_file | tr -d '(' | tr ')' '\n' > out_file
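Note that translating ')' to a newline leaves the separating space at the start of the following line. An alternative sketch, assuming the pairs are separated only by spaces and contain none themselves, deletes both parentheses and turns the spaces into newlines instead. The function name split_pairs is made up.

```shell
# Delete the parentheses, then turn each run of spaces into one newline.
split_pairs() {
  tr -d '()' | tr -s ' ' '\n'
}

printf '%s\n' '(-9.1744438E-02,7.6282293E-02) (-9.1744438E-02,7.6282293E-02)' |
  split_pairs
```

The -s option squeezes repeated newlines in the output, so several spaces between pairs do not produce empty lines.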

As was said, almost:
sed 's/[()]//g' inputfile > outputfile
or in awk:
awk '{gsub(/[()]/,""); print;}' inputfile > outputfile

This would work -
awk -v FS="[()]" '{for (i=2;i<=NF;i+=2) print $i }' inputfile > outputfile
Test:
[jaypal:~/Temp] cat file
(-9.1744438E-02,7.6282293E-02) (-9.1744438E-02,7.6282293E-02)
[jaypal:~/Temp] awk -v FS="[()]" '{for (i=2;i<=NF;i+=2) print $i }' file
-9.1744438E-02,7.6282293E-02
-9.1744438E-02,7.6282293E-02

This might work for you:
echo "(-9.1744438E-02,7.6282293E-02) (-9.1744438E-02,7.6282293E-02)" |
sed 's/) (/\n/;s/[()]//g'
-9.1744438E-02,7.6282293E-02
-9.1744438E-02,7.6282293E-02

Guess we all know this, but just to emphasize:
Simple dedicated commands usually take less time to execute than awk or sed doing the same job. For instance, try not to use sed/awk where grep can suffice.
In this particular case, I created a file 100000 lines long, each line containing the characters "(" and ")". Then I ran
$ /usr/bin/time -f%E -o log cat file | tr -d "()"
and again,
$ /usr/bin/time -f%E -ao log sed 's/[()]//g' file
And the results were:
05.44 sec : Using tr
05.57 sec : Using sed

cat in_file | sed 's/[()]//g' > out_file
Due to formatting issues, it is not entirely clear from your question whether you also need to insert newlines.

Using sed to extract a matched pattern using ' as pattern separator

I'm just not getting my head around the pattern matching in sed, what is worse, there are quotes as separators.
I do:
cat file | grep \'*.s\'
and get:
'PhaseRayA: ' 'sca/sca_out/sc_ray_a.s'
'PhaseRayO: ' 'sca/sca_out/sc_ray_o.s'
as output. And now I want to extract:
sca/sca_out/sc_ray_a.s
sca/sca_out/sc_ray_o.s
So my pattern would be '*.s', with the quotes being part of the pattern but not part of the wanted result.
Any ideas on that? I guess sed will do the job but I have no clue how...
Thanks for any help...
All the best, André

Your question is a little ambiguous, but this should do what I think you mean:
sed -e "s/'[^']*' *'//" -e "s/'//" file

You might want to consider awk:
$ cat test.txt
'PhaseRayA: ' 'sca/sca_out/sc_ray_a.s'
'PhaseRayO: ' 'sca/sca_out/sc_ray_o.s'
$ awk -F "'" '{print $4}' test.txt
sca/sca_out/sc_ray_a.s
sca/sca_out/sc_ray_o.s
I tend to use sed to edit files and awk to process them. awk is built for breaking up records.
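To see why $4 is the right field, it can help to list every field awk produces when the single quote is the separator. The sketch below does that for one sample line; the function name show_fields is made up.

```shell
# Print each field of the line, numbered, using ' as the field separator.
show_fields() {
  printf '%s\n' "'PhaseRayA: ' 'sca/sca_out/sc_ray_a.s'" |
    awk -F "'" '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'
}

show_fields
# 1=[]
# 2=[PhaseRayA: ]
# 3=[ ]
# 4=[sca/sca_out/sc_ray_a.s]
# 5=[]
```

Fields 1 and 5 are empty because the line starts and ends with a quote, and field 3 is the space between the two quoted strings.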

Give this a try:
sed "s/.*'\([^']*\)'/\1/" inputfile
Similarly:
sed 's/.*\o47\([^\o47]*\)\o47/\1/' inputfile # that's the letter "o" between the backslash and the 4
or
sed 's/.*\x27\([^\x27]*\)\x27/\1/' inputfile

We Keep Coding
