Negative Match in perl

Negative Match in perl - perl

I am trying to use negative match in perl.
I want to replace compete word $FROM by $TO except if
.$FROM
or
. $FROM
How to write a expression for this?
I used negative look-ahead regex. But I get following error:
Variable length lookbehind not implemented in regex; marked by <-- HERE in m/(?<!\.\s*)

If you know how many spaces can appear between the . and the search value you can chain together fixed-width negative look-behinds. e.g. to handle zero or one space:
s/(?<!\.)(?<!\. )$FROM/$TO/;
If the number of spaces isn't bound but the search value is a literal string (not a pattern) you can reverse everything, use a negative look-ahead, then reverse the resulting value. e.g.
my $rfrom = reverse $FROM;
my $rto = reverse $TO;
my $rstring = reverse $string;
$rstring =~ s/$rfrom(?! *\.)/$rto/;
$string = reverse $rstring;
If the search value can be a pattern there's no general solution to work around the lack of variable-width negative look-behind.

Related

Need Regular expression - perl

I am looking for a regx for below expression:
my $text = "1170 KB/s (244475 bytes in 2.204s)"; # I want to retrieve last ‘2.204’ from this String.
$text =~ m/\d+[^\d*](\d+)/; #Regexp
my $num = $1;
print " $num ";
Output:
204
But I need 2.204 as output, please correct me.
Can any one help me out?

The regex is doing exactly what you asked it to: It is matching digits \d+, followed by one non-digit or star [^\d*], followed by digits \d+. The only thing that matches that in your string is 204.
If you want a quick fix, you can just move the parentheses:
m/(\d+[^\d*]\d+)/
This would (with the above input) match what you want. A more exact way to put it would be:
m/(\d+\.\d+)/
Of course this will match any float precision number, so if you can have more of those, that's not a good idea. You can shore it up by using an anchor, like so:
m/(\d+\.\d+)s\)/
Where s\) forces the match to occur at only that place. Further strictures:
m/\(\d+\D+(\d+\.\d+)s\)/
You might also want to account for the possibility of your target number not being a float:
m/\(\d+\D+(\d+\.?\d*)s\)/
By using ? and * we allow for those parts not to match at all. This is not recommended to do unless you are using anchors. You can also replace everything in the capture group with [\d.]+.
If you are not fond of matching the parentheses, you can match the text:
m/bytes in ([\d.]+)s/

I'd go with the second marker as indicator where you are in the string:
my ($num) = ($text =~ /(\d+\.\d+)s/);
with explanations:
/( # start of matching group
\d+ # first digits
\. # a literal '.', take \D if you want non-numbers
\d+ # second digits
)/x # close the matching group and the regex
You had the matching groups wrong. Also the [^\d] is a bit excessive, generally you can negate some of the backspaced special classes (\d,\h, \s and \w) with their respective uppercase letter.

Try this regex:
$text =~ m/\d+[^\d]*(\d+\.?\d*)s/;
That should match 1+ digits, a decimal point if there is one, 0 or more decimal places, and make sure it's followed by a "s".

Difference between /Regex/gm and m/Regex/g in perl string matching

Is there a difference between ($ipAddrResult =~ /Regex/gm) and ($ipAddrResult =~ m/Regex/g) in perl string matching? When I google online I get explanation for second one and not the first one. The file I tried to edit has first condition.

The ms in different places mean different things.
Let's look at the second example first.
m// is the regular expression matching operator. As a shortcut, the m can be omitted, so
$foo =~ m/$pattern/;
is exactly the same as
$foo =~ /$pattern/;
The only time the m is required is if you want to use delimiters other than / for your pattern. You can do, for example
$foo =~ m!$pattern!;
or
$foo =~ m[$pattern];
and so on, but these all require the m to be there.
In the first example, the m after the regex is a modifier flag which tells the regex how to behave. The regex flags are documented in the perlre man page, which has this to say:
m -
Treat string as multiple lines. That is, change "^" and "$" from
matching the start or end of line only at the left and right ends of
the string to matching them anywhere within the string.
So this:
$foo =~ /$pattern/m;
is the same as this:
$foo =~ m/$pattern/m;
and the same as this:
$foo =~ m{$pattern}m;

In the expression
/Regex/gm
The "m" stands for multi-line matching. In the expression:
m/Regex/g
The "m" stands for "match" as opposed to a substitution, which looks like this:
s/Regex/replacement/g
Because matching (vs. substitution) is the default, you can generally leave off the "m/" from the start of the expression. In other words "m/Regex/g" is just a synonym for "/Regex/g".

Yes, m/regex/g is syntactically equivalent to just /regex/g. That is, it doesn't activate the /m flag at all. Compare to s/foo/bar/ which is not at all the same as s/foo/bar/s. The name m stands for "match" I believe.

How do I replace all occurrences of certain characters with their predecessors?

$s = "bla..bla";
$s =~ s/([^%])\./$1/g;
I think it should replace all occurrences of . that is not after % with the character that is before ..
But $s is then: bla.bla, but
it should be blabla. Where is the problem? I know I can use quantifiers, but I need do it this way.

When a global regular expression is searching a string it will not find overlapping matches.
The first match in your string will be a., which is replaced with a. When the regex engine resumes searching it starts at the next . so it sees .bla as the rest of the string, and your regex requires a character to match before the . so it cannot match again.
Instead, use a negative lookbehind to perform the assertion that the previous character is not %:
$s =~ s/(?<!%)\.//g;
Note that if you use a positive lookbehind like (?<=[^%]), you will not replace the . if it is the first character in the string.

The problem is that even with the /g flag, each substitution starts looking where the previous one left off. You're trying to replace a. with a and then a. with a, but the second replacement doesn't happen because the a has already been "swallowed" by the previous replacement.
One fix is to use a zero-width lookbehind assertion:
$s =~ s/(?<=[^%])\.//g;
which will remove any . that is not the first character in the string, and that is not preceded by %.
But you might actually want this:
$s =~ s/(?<!%)\.//g;
which will remove any . that is not preceded by %, even if it is the first character in the string.

Much simpler than look-behinds is to use:
$s =~ s/([^%])\.+/$1/g;
This replaces any string of one or more dots after a character other than % by nothing.

Split by dot using Perl

I use the split function by two ways. First way (string argument to split):
my $string = "chr1.txt";
my #array1 = split(".", $string);
print $array1[0];
I get this error:
Use of uninitialized value in print
When I do split by the second way (regular expression argument to split), I don't get any errors.
my #array1 = split(/\./, $string); print $array1[0];
My first way of splitting is not working only for dot.
What is the reason behind this?

"\." is just ., careful with escape sequences.
If you want a backslash and a dot in a double-quoted string, you need "\\.". Or use single quotes: '\.'

If you just want to parse files and get their suffixes, better use the fileparse() method from File::Basename.

Additional details to the information provided by Mat:
In split "\.", ... the first parameter to split is first interpreted as a double-quoted string before being passed to the regex engine. As Mat said, inside a double-quoted string, a \ is the escape character, meaning "take the next character literally", e.g. for things like putting double quotes inside a double-quoted string: "\""
So your split gets passed "." as the pattern. A single dot means "split on any character". As you know, the split pattern itself is not part of the results. So you have several empty strings as the result.
But why is the first element undefined instead of empty? The answer lies in the documentation for split: if you don't impose a limit on the number of elements returned by split (its third argument) then it will silently remove empty results from the end of the list. As all items are empty the list is empty, hence the first element doesn't exist and is undefined.
You can see the difference with this particular snippet:
my #p1 = split "\.", "thing";
my #p2 = split "\.", "thing", -1;
print scalar(#p1), ' ', scalar(#p2), "\n";
It outputs 0 6.
The "proper" way to deal with this, however, is what #soulSurfer2010 said in his post.

How to get a perfect match for a regexp pattern in Perl?

I've to match a regular-expression, stored in a variable:
#!/bin/env perl
use warnings;
use strict;
my $expr = qr/\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\])?)/sx;
$str = "abcd[3] xyzg[4:0]";
if ($str =~ m/$expr/) {
print "\n%%%%%%%%% $`-----$&-----$'\n";
}
else {
print "\n********* NOT MATCHED\n";
}
But I'm getting the outout in $& as
%%%%%%%%% -----abcd[3] xyzg-----[4:0]
But expecting, it shouldn't go inside the if clause.
What is intended is:
if $str = "abcd xyzg" => %%%%%%%%% -----abcd xyzg----- (CORRECT)
if $str = "abcd[2] xyzg" => %%%%%%%%% -----abcd[2] xyzg----- (CORRECT)
if $str = "abcd[2] xyzg[3] => %%%%%%%%% -----abcd[2] xyzg[3]----- (CORRECT)
if $str = "abcd[2:0] xyzg[3] => ********* NOT MATCHED (CORRECT)
if $str = "abcd[2:0] xyzg[3:0] => ********* NOT MATCHED (CORRECT)
if $str = "abcd[2] xyzg[3:0]" => ********* NOT MATCHED (CORRECT/INTENDED)
but output is %%%%%%%%% -----abcd[2] xyzg-----[3:0] (WRONG)
OR better to say this is not intended.
In this case, it should/my_expectation go to the else block.
Even I don't know, why $& take a portion of the string (abcd[2] xyzg), and $' having [3:0]?
HOW?
It should match the full, not a part like the above. If it didn't, it shouldn't go to the if clause.
Can anyone please help me to change my $expr pattern, so that I can have what is intended?

By default, Perl regexes only look for a matching substring of the given string. In order to force comparison against the entire string, you need to indicate that the regex begins at the beginning of the string and ends at the end by using ^ and $:
my $expr = qr/^\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\])?)$/;
(Also, there's no reason to have the /x modifier, as your regex doesn't include any literal whitespace or # characters, and there's no reason for the /s modifier, as you're not using ..)
EDIT: If you don't want the regex to match against the entire string, but you want it to reject anything in which the matching portion is followed by something like "[0:0]", the simplest way would be to use lookahead:
my $expr = qr/^\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\]|(?=[^[\w])|$ ))/x;
This will match anything that takes the following form:
beginning of the string (which your example in the comments seems to imply you want)
zero or more whitespace characters
one or more word characters
optional: [, one or more digits, ]
one or more whitespace characters
one or more word characters
one of the following, in descending order of preference:
[, one or more digits, ]
an empty string followed by (but not including!) a character that is neither [ nor a word character (The exclusion of word characters is to keep the regex engine from succeeding on "a[0] bc[1:2]" by only matching "a[0] b".)
end of string (A space is needed after the $ to keep it from merging with the following ) to form the name of a special variable, and this entails the reintroduction of the /x option.)
Do you have any more unstated requirements that need to be satisfied?

The short answer is your regexp is wrong.
We can't fix it for you without you explaining what you need exactly, and the community is not going to write a regexp exactly for your purpose because that's just too localized a question that only helps you this one time.
You need to ask something more general about regexps that we can explain to you, that will help you fix your regexp, and help others fix theirs.
Here's my general answer when you're having trouble testing your regexp. Use a regexp tool, like the regex buddy one.
So I'm going to give a specific answer about what you're overlooking here:
Let's make this example smaller:
Your pattern is a(bc+d)?. It will match: abcd abccd etc. While it will not match bcd nor bzd in the case of abzd it will match as matching only a because the whole group of bc+d is optional. Similarly it will match abcbcd as a dropping the whole optional group that couldn't be matched (at the second b).
Regexps will match as much of the string as they can and return a true match when they can match something and have satisfied the entire pattern. If you make something optional, they will leave it out when they have to including it only when it's present and matches.
Here's what you tried:
qr/\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\])?)/sx
First, s and x aren't needed modifiers here.
Second, this regex can match:
Any or no whitespace followed by
a word of at least one alpha character followed by
optionally a grouped square bracketed number with at least one digit (eg [0] or [9999]) followed by
at least one white space followed by
a word of at least one alpha character followed by
optionally a square bracketed number with at least one digit.
Clearly when you ask it to match abcd[0] xyzg[0:4] the colon ends the \d+ pattern but doesn't satisfy the \] so it backtracks the whole group, and then happily finds the group was optional. So by not matching the last optional group, your pattern has matched successfully.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse