perl extract string and scientific number

I have data in a particular format:
capacitor #(.c(3.58782e-14)) c_1310 (vsub, vss_res);
I want to extract the capacitance value (3.58782e-14) and the two net names (vsub, vss_res) from the data set. I tried using these regexes:
$cap = $line =~ /([0-9]*\.?[0-9]+([eE][-]?[0-9]+)?)/ ;
($net1, $net2) = $line =~ /\(([A-Za-z0-9_]*) \, ([A-Za-z0-9_]*)\)/ ;
$line contains one data line. I need help correcting the regex.
I have a solution using the split() function, but I think a regex would be better.

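Assuming that the format of the data is always the same, something like this should work:
my $line = 'capacitor #(.c(3.58782e-14)) c_1310 (vsub, vss_res);';
my ($cap, $name, $nets) = $line =~ /\(.+\((.+)\)\)\s+(.+)\s+\((.+)\)/;
# $nets holds "vsub, vss_res"; split it to get the two net names
my ($net1, $net2) = split /\s*,\s*/, $nets;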

The original post seemed to do some checking and validation (in contrast with matching '.', which matches anything), so here is a more validating version:
use Modern::Perl;
use Regexp::Common;
my $line = 'capacitor #(.c(3.58782e-14)) c_1310 (vsub, vss_res);';
my ($cap, $cap_no, $net1, $net2) = $line =~ /
\([^(]+\( ($RE{num}{real}) \)\)
\s+(\w+)\s+
\(
(\w*) ,\s*
(\w*)
\)
/x;
say "cap: $cap cap_no: $cap_no net1: $net1 net2: $net2";
OUTPUT:
cap: 3.58782e-14 cap_no: c_1310 net1: vsub net2: vss_res
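For comparison, the question's own two regexes need only small changes. A minimal sketch, assuming one record per line as in the example: the first regex already finds 3.58782e-14, but `$cap = $line =~ /.../` evaluates the match in scalar context, so $cap receives 1 (true) instead of the captured text. The second regex requires a space before the comma, which the data does not have. The `\w` class stands in for `[A-Za-z0-9_]`, and the exponent group is made non-capturing.

```perl
use strict;
use warnings;

my $line = 'capacitor #(.c(3.58782e-14)) c_1310 (vsub, vss_res);';

# Parentheses on the left put the match in list context, so it returns the
# capture group rather than a true/false value.
my ($cap) = $line =~ /([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)/;

# The original pattern demanded " , " (space before the comma); \s* makes the
# whitespace on either side of the comma optional.
my ($net1, $net2) = $line =~ /\((\w+)\s*,\s*(\w+)\)/;

print "cap: $cap net1: $net1 net2: $net2\n";
```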

Related

perl regex too greedy

I went through similar questions asked by other members and tried to apply their solutions, but they did not work for my issue. My pattern match and grouping are too greedy and do not stop at the first pipe (|). If I make the pattern more specific I think it would work, but I'm trying to figure out how to stop the pattern match at the first instance of the pipe.
Here are a couple of lines:
09:30:00.063|IN:|8=FIX.4.2|9=206|35=D|34=5159|49=CLIENT|52=20191024-13:30:00.050|56=SERV|57=DEST|1=05033|11=ABZ5702|15=USD|21=1|38=2000|40=2|44=92.48|47=A|54=5|55=RC|60=20191024-13:30:00.050|111=0|114=N|336=X|5700=AP|9281=SOV|10=202
09:37:21.208|IN:|8=FIX.4.2|9=170|35=D|34=5184|49=CLIENT|52=20191024-13:37:21.206|56=SERV|57=ATXB|1=J5129|11=136404|15=USD|21=1|38=100|40=2|44=1.39|47=A|54=2|55=DIW|59=2|60=20191024-13:30:00.206|10=029
I'm expecting my perl script to return the following output from the above data:
09:30:00.063|13:30:00.050|ABZ5702
09:37:21.208|13:37:21.206|136404
I tried all of these and a few other variations, but could not get any of them to produce the above output:
#$msg =~ s/([^|]*).*|52=([^|]*).*|11=([^|]*).*/$1|$2|$3/;
$msg =~ s/(.+)\|??.*|52=([^|]*).*|11=([^|]*).*/$1|$2|$3/;
#$msg =~ s/^([^|]*).??|52=([^|]*).??|11=([^|]*).*/$1|$2|$3/;
#$msg =~ s/^([^\|??]*).*|52=([^\|??]*).*|11=([^\|??]*).*/$1|$2|$3/;
#$msg =~ s/(.*\|??).*|52=(.+\|??).*|11=(.+\|??).*/one $1|two $2|three $3/;
#$msg =~ s/(.*?|).*|52=(.*?|).*|11=(.*|?).*/$1|$2|$3/;
#$msg =~ /(.*)|??.*|52=(.*)|??.*|11=(.*)|??.*/$1|$2|$3/;
#$msg =~ s/|.*-[0-3][0-9]:/|/;
print "$msg\n";
I realize there is more than one way to skin the cat, but there are cases where I need to use the pattern-match approach. How can I get the expected output using pattern matching, with each group stopping at the first pipe (|)? Can someone tell me what I am doing wrong?
Try this:
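s/(.*?)\|.*\|52=[^-]*-([^|]*).*\|11=([^|]*).*/$1|$2|$3/;
There were a couple of pipe delimiters that needed escaping: an unescaped | means alternation in a regex, not a literal pipe. The [^-]*- after 52= skips the date, so only the time part of that field is captured.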
You need to look at non-greedy matching https://docstore.mik.ua/orelly/perl/cookbook/ch06_16.htm
The first matching group is (.*?) instead of (.*). The ? means we match as little as possible.
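To see the difference in isolation, here is a small sketch on a shortened log line (shortened only for illustration). It compares a greedy capture, a non-greedy capture, and the negated character class [^|]*, which cannot run past a pipe at all.

```perl
use strict;
use warnings;

my $msg = '09:30:00.063|IN:|8=FIX.4.2|9=206';

# Greedy: .* runs to the end of the string, then backs off to the LAST pipe.
my ($greedy) = $msg =~ /^(.*)\|/;
# Non-greedy: .*? grows one character at a time and stops at the FIRST pipe.
my ($lazy)   = $msg =~ /^(.*?)\|/;
# Negated class: [^|]* simply cannot consume a pipe.
my ($class)  = $msg =~ /^([^|]*)/;

print "greedy: $greedy\nlazy:   $lazy\nclass:  $class\n";
```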
In general, for parsing FIX in perl, as long as there are no repeating groups, I would recommend splitting on | first and then creating a hash of tag-value pairs.
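A minimal sketch of that split-then-hash approach, assuming (as the answer says) that no tag repeats within a line. parse_fix_line is a hypothetical helper name, and the first two fields (the timestamp and IN:) are treated as a log prefix rather than as FIX tags.

```perl
use strict;
use warnings;

# Split one log line on '|' and build a tag => value hash.
# Assumes no repeating groups: a repeated tag would overwrite the earlier one.
sub parse_fix_line {
    my ($line) = @_;
    chomp $line;
    my ($time, undef, @fields) = split /\|/, $line;   # skip the "IN:" field
    my %tag = map { split /=/, $_, 2 } @fields;       # limit 2: values may contain '='
    return ($time, \%tag);
}

my $line = '09:30:00.063|IN:|8=FIX.4.2|9=206|35=D|34=5159|49=CLIENT|52=20191024-13:30:00.050|56=SERV|57=DEST|1=05033|11=ABZ5702|15=USD|21=1|38=2000|40=2|44=92.48|47=A|54=5|55=RC|60=20191024-13:30:00.050|111=0|114=N|336=X|5700=AP|9281=SOV|10=202';

my ($time, $tag) = parse_fix_line($line);
(my $sending_time = $tag->{52}) =~ s/^\d+-//;   # drop the date part of tag 52
my $out = join '|', $time, $sending_time, $tag->{11};
print "$out\n";
```

Once the hash exists, picking other tags is a lookup rather than a new regex.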
I would do it a little differently: split the line into an array and work on the individual elements of the array.
A regex may be an acceptable solution for one particular case if the line format is predetermined and will never change.
use strict;
use warnings;
use Data::Dumper;
my $debug = 0;
while( my $line = <DATA> ) {
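my @array = split /\|/, $line;
print Dumper(\@array) if $debug;
# field 7 is 52=<date>-<time>: strip everything up to the first '-'
$array[7] =~ s/.+?-//;
# field 11 is 11=<value>: strip the tag number
$array[11] =~ s/\d+=//;
printf "%s\n", join '|', @array[0,7,11];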
}
__DATA__
09:30:00.063|IN:|8=FIX.4.2|9=206|35=D|34=5159|49=CLIENT|52=20191024-13:30:00.050|56=SERV|57=DEST|1=05033|11=ABZ5702|15=USD|21=1|38=2000|40=2|44=92.48|47=A|54=5|55=RC|60=20191024-13:30:00.050|111=0|114=N|336=X|5700=AP|9281=SOV|10=202
09:37:21.208|IN:|8=FIX.4.2|9=170|35=D|34=5184|49=CLIENT|52=20191024-13:37:21.206|56=SERV|57=ATXB|1=J5129|11=136404|15=USD|21=1|38=100|40=2|44=1.39|47=A|54=2|55=DIW|59=2|60=20191024-13:30:00.206|10=029

In string replacements, how do we use the '/r' modifier?

I need to increment a numeric value in a string:
my $str = "tool_v01.zip";
(my $newstr = $str) =~ s/\_v(\d+)\.zip$/ ($1++);/eri;
#(my $newstr = $str) =~ s/\_v(\d+)\.zip$/ ($1+1);/eri;
#(my $newstr = $str) =~ s/\_v(\d+)\.zip$/ $1=~s{(\d+)}{$1+1}/r; /eri;
print $newstr;
The expected output is tool_v02.zip.
Note: the version number 01 may contain any number of leading zeroes.
I don't think this question has anything to do with the /r modifier, but rather how to properly format the output. For that, I'd suggest sprintf:
my $newstr = $str =~ s{ _v (\d+) \.zip$ }
{ sprintf("_v%0*d.zip", length($1), $1+1 ) }xeri;
Or, replacing just the number with zero-width Lookaround Assertions:
my $newstr = $str =~ s{ (?<= _v ) (\d+) (?= \.zip$ ) }
{ sprintf("%0*d", length($1), $1+1 ) }xeri;
Note: With either of these solutions, something like tool_v99.zip would be altered to tool_v100.zip because the new sequence number cannot be expressed in two characters. If that's not what you want then you need to specify what alternative behaviour you require.
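A minimal sketch exercising the lookaround version on a few inputs. bump_version is a hypothetical helper; it drops the /i flag and uses /r (Perl 5.14 and later) so the input string is left unchanged and the new string is returned.

```perl
use strict;
use warnings;

# Increment the number between "_v" and ".zip", keeping its zero-padded width.
# %0*d takes the width from the argument list (length of the original digits),
# so the field only grows when the new value needs more digits (99 -> 100).
sub bump_version {
    my ($name) = @_;
    return $name =~ s{ (?<= _v ) (\d+) (?= \.zip$) }
                     { sprintf("%0*d", length($1), $1 + 1) }xer;
}

my @results = map { bump_version($_) } qw(tool_v01.zip tool_v0007.zip tool_v99.zip);
print "$_\n" for @results;
```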
The bit you're missing is sprintf, which works the same way as printf except that, rather than outputting the formatted string to stdout or a file handle, it returns it as a string. Example:
sprintf("%02d",3)
generates the string 03
Putting this into your regex: rather than using /r, you can use a zero-width lookahead ((?=...)) to match the file suffix and replace just the matched number with the new value (note that %02d always pads to two digits, whatever the original width):
s/(\d+)(?=\.zip$)/sprintf("%02d",$1+1)/ei

match multiple patterns and extract subpatterns into an array in perl

I have the following string in $str:
assign (rregbus_z_partially_resident | regbus_s_partially_resident | reg_two | )regbus_;
I want to parse this line and capture into an array only the strings that are preceded by a non-word character and match either reg_\w+ or regbus_\w+.
So in the above example, I want to capture only regbus_s_partially_resident and reg_two into an array.
I tried this and it did not work:
my (@all_matches) = ($str =~ m/\W(reg_\w+)|\W(regbus_\w+)/g);
Since I am using \W, it's copying the non-word character into the array as well, which I do not want.
it's copying the non-word character also into the array list
No, it doesn't.
$ perl -le'
my $str = "assign (rregbus_z_partially_resident | regbus_s_partially_resident | reg_two | )regbus_;";
my (@all_matches) = ($str =~ m/\W(reg_\w+)|\W(regbus_\w+)/g);
print $_ // "[undef]" for @all_matches;
'
[undef]
regbus_s_partially_resident
reg_two
[undef]
But you do have a problem: You have two captures, so you will get two values per match.
Fix:
my @all_matches;
push @all_matches, $1 // $2 while $str =~ m/\W(reg_\w+)|\W(regbus_\w+)/g;
Far better:
my @all_matches = $str =~ m/\W(reg(?:bus)?_\w+)/g;
Even better:
my @all_matches = $str =~ m/\b(reg(?:bus)?_\w+)/g;
Your regex needs only a small tweak:
my @all_matches = $str =~ m/\W(reg_\w+|regbus_\w+)/g;
or
my @all_matches = $str =~ m/\W( (?:reg|regbus)_\w+ )/gx;
or even something along the lines of
my @all_matches = $str =~ m/\W( reg(?:bus)?_\w+ )/gx;
The most suitable form depends on what patterns you may need and how this is used.
Or, reduce the regex use to the heart of the problem:
my @matches = grep { /^(?:reg_\w+|regbus_\w+)/ } split /\W/, $str;
which may be helpful if your strings and/or requirements grow more complex.
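A quick way to check that the single-capture variants above agree on the question's string. A minimal sketch; the array names are illustrative. Because each pattern has exactly one capture group, every /g match contributes exactly one element, so no undef placeholders appear.

```perl
use strict;
use warnings;

my $str = "assign (rregbus_z_partially_resident | regbus_s_partially_resident | reg_two | )regbus_;";

# Non-word character before the name; only the name itself is captured.
my @with_nonword  = $str =~ m/\W(reg(?:bus)?_\w+)/g;
# Word boundary instead of \W: a zero-width assertion, so nothing extra is consumed.
my @with_boundary = $str =~ m/\b(reg(?:bus)?_\w+)/g;
# Split on non-word characters, then keep only the tokens that match.
my @with_split    = grep { /^reg(?:bus)?_\w+/ } split /\W/, $str;

print join(',', @$_), "\n" for \@with_nonword, \@with_boundary, \@with_split;
```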

issue in matching regexp in perl

I have the following code:
$str = "
OTNPKT0553 04-02-03 21:43:46
M X DENY
PLNA
/*Privilege, Login Not Active*/
;";
$val = $str =~ /[
]*([\n]?[\n]+
[\n]?) ([^;^
]+)/s;
print "$1 and $2";
Getting output as
and PLNA
Why is it getting PLNA as output? I believe it should stop at the first \n. I expect the output to be OTNPKT0553 04-02-03 21:43:46.
Your regex is messy and contains a lot of redundancy. The following steps demonstrate how it can be simplified and then it becomes more clear why it is matching PLNA.
1) Translating the literal new lines in your regex:
$val = $str =~ /[\n\n]*([\n]?[\n]+\n[\n]?) ([^;^\n]+)/s;
2) Then simplifying this code to remove the redundancy:
$val = $str =~ /(\n{2}) ([^;^\n]+)/s;
So basically, the regex is looking for two newlines followed by three spaces.
There are three spaces before OTNPKT0553, but there is only a single newline, so it won't match.
The next three spaces are before PLNA, which IS preceded by two newlines, and so it matches.
You have a whole lot of newlines in there, some literal and some encoded as \n. I'm not clear what you were intending. Did you maybe think \n matched a number? \d matches a digit, and will also match many Unicode characters that are digits in other scripts; for simple ASCII text it works fine.
What you need is something like this:
use strict;
use warnings;
my $str = "
OTNPKT0553 04-02-03 21:43:46
M X DENY
PLNA
/*Privilege, Login Not Active*/
;";
my $val = $str =~ / (\w+) \s+ ( [\d-]+ \s [\d:]+ ) /x;
print "$1 and $2";
output
OTNPKT0553 and 04-02-03 21:43:46
You have an extra line feed; change the regex to:
$str =~ /[
]*([\n]?[\n]+[\n]?) ([^;^
]+)/s;
and simpler:
$str =~ /\n+ ([^;^\n]+)/s;
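If the goal is simply the first non-blank line, as the question expects, anchoring at the start of the string is easier than counting newlines. A minimal sketch; the three-space indentation in $str is assumed for illustration, and the pattern ignores indentation either way.

```perl
use strict;
use warnings;

my $str = "\n   OTNPKT0553 04-02-03 21:43:46\n   M X DENY\n   PLNA\n   /*Privilege, Login Not Active*/\n;";

# \A anchors at the very start; \s* skips the leading newline and indentation.
# Without /s, . does not match a newline, so .* stops at the end of that line.
my ($first_line) = $str =~ /\A\s*(\S.*)/;
print "$first_line\n";
```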

In Perl, I want to mask/cut off X number of characters at the end of a string (X can be one of a set of character strings)

I have two strings, XXXXXXnumber and XXXXXXdate, and I want to strip the suffix from each string, keeping only the XXXXXX part. The actual number of characters represented by XXXXXX can vary; the suffixes 'number' and 'date' are constant. Both XXXXXXnumber and XXXXXXdate should become XXXXXX.
my ($prefix) = ($string =~ /\A (.+?) (?:date|number) \z/x);
Alternatively:
$string =~ s/ (?:date|number) \z//x;
I would use a regular expression like $line =~ s/(number|date)$// for that task, where $line can be either line.
If your line has additional characters after number or date, they must be filtered out, too. An alternative approach would be using an expression like ($num) = ($line =~ /^(.*)(number|date).*$/);
Use a regex:
($newvar = $oldvar) =~ s/^(.*)(number|date)$/$1/;
If you have no more use for $oldvar's original value (including the Xes), this simplifies to:
$oldvar =~ s/^(.*)(number|date)$/$1/;
A simple substitution takes care of it:
$str =~ s/(?:number|date)\z//;
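A minimal sketch applying that substitution to prefixes of different lengths. The inputs are illustrative, and the /r flag (Perl 5.14 and later) returns a modified copy so the original strings stay intact.

```perl
use strict;
use warnings;

my @inputs = qw(ABC123number ABC123date LONGERPREFIXdate);

# \z anchors at the very end of the string, so only a trailing suffix is removed;
# /r returns the result instead of modifying $_ (which aliases @inputs here).
my @prefixes = map { s/(?:number|date)\z//r } @inputs;

print "$_\n" for @prefixes;
```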