How to match a pattern and exclude another pattern within a string in Perl - perl

My objective is to match a pattern only when no '#' appears before it. For example, given:
@array = ("# abc", "# abcd", "abc", " abc ", "abcd", "abc # foo")
I want to match "abc", " abc ", "abcd" and "abc # foo".
What regular expression do I need so as to match only patterns of 'abc' that do not contain '#'?
I tried m/[^#]+abc/g but it doesn't work.

Look into regex lookbehind and lookahead.
Something like this:
m/^(?<!#).*/g
may work; it's a negative lookbehind.

Your criterion isn't at all clear. Do you want to reject anything that has # as the first character?
print "$_\n" for grep /^[^#]/, @array;
will do that. But if you also want to check for the abc after possible leading space then you need
print "$_\n" for grep /^\s*abc/, @array;
These produce the same results from your data and select the items you say you want.

If you don't want # anywhere before abc, you were almost there. Try this: ^[^#]*abc.
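As a quick check of that pattern against the sample data, here is a minimal sketch. It assumes the array from the question was meant to hold six comma-separated strings; the variable names are only illustrative.

```perl
use strict;
use warnings;

# Sample data from the question (commas assumed between all six items).
my @array = ("# abc", "# abcd", "abc", " abc ", "abcd", "abc # foo");

# ^[^#]*abc : from the start of the string, allow any run of non-# characters,
# then require abc. A # anywhere before abc makes the match fail.
my @matched = grep { /^[^#]*abc/ } @array;

print "'$_'\n" for @matched;
```

With this data the two entries that start with # are dropped, while "abc # foo" is kept because its # comes after abc.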

This should work; it uses negative lookahead:
^[^#]*abc(?!(.*#))
New thoughts...
I read your question again carefully and realized I had not understood exactly what you meant. Your intent is unclear; all I can say for sure is that you DON'T want a match if the string has one or more # before abc on the same line (and I guess you don't care whether there is anything else between them). I was confused because you explicitly say you WANT to match "abc # foo", yet you also say "to match only patterns of 'abc' that do not contain '#'".
If we want to follow this new interpretation, the correct regular expression may be:
(?<!(#.*))abc.*
The expression consumes no text before abc and matches from that point on, but only if nothing before abc contains a #. Note that Perl itself rejects an unbounded lookbehind such as (?<!(#.*)), so this form only works in regex engines that support variable-length lookbehind.

Related

Convert Perl to Shell

I have a Perl script that I use to SNMP walk devices. However, the server available to me does not allow me to install all the modules needed, so I need to convert the script to shell (sh). I can run the script on individual devices but would like it to read from a text file as it did in Perl. The Perl script starts with:
open(TEST, "cat test.txt |");
@records=<TEST>;
close(TEST);
foreach $line (@records)
{
($field1, $field2, $field3)=split(/\s+/, $line);
# Run and record SNMP walk results.
Depending on exactly what the input is and what you are trying to do, that perl code fragment would likely translate to:
while read field1 field2 field3
do
# Run and record SNMP walk results.
echo "1=$field1 2=$field2 3=$field3"
done <text.txt
For example, if text.txt is:
$ cat text.txt
one two three
i ii iii
Then, the above code produces the output:
1=one 2=two 3=three
1=i 2=ii 3=iii
As you can see, the shell read command reads a line (record) at a time and also does splitting on whitespace. There are many options for read to control whether newlines or something else divide records (-d) and whether splitting is to be done on whitespace or something else (IFS) or whether backslashes in the input are to be treated as escape characters or not (-r). See man bash.
while read string; do
str1=${string%% *}
str3=${string##* }
temp=${string#$str1 }
str2=${temp%% *}
echo $str1 $str2 $str3
done <test.txt
Alternate version:
while read string; do
str1=${string%% *}
temp=${string#$str1 }
str2=${temp%% *}
temp=${string#$str1 $str2 }
str3=${temp%% *}
echo $str1 $str2 $str3
done <test.txt
POSIX substring parameter expansion
${parameter%word}
Remove Smallest Suffix Pattern. The word shall be expanded to produce
a pattern. The parameter expansion shall then result in parameter,
with the smallest portion of the suffix matched by the pattern
deleted.
${parameter%%word}
Remove Largest Suffix Pattern. The word shall be expanded to produce a
pattern. The parameter expansion shall then result in parameter, with
the largest portion of the suffix matched by the pattern deleted.
${parameter#word}
Remove Smallest Prefix Pattern. The word shall be expanded to produce
a pattern. The parameter expansion shall then result in parameter,
with the smallest portion of the prefix matched by the pattern
deleted.
${parameter##word}
Remove Largest Prefix Pattern. The word shall be expanded to produce a
pattern. The parameter expansion shall then result in parameter, with
the largest portion of the prefix matched by the pattern deleted.

Need Regular expression - perl

I am looking for a regex for the expression below:
my $text = "1170 KB/s (244475 bytes in 2.204s)"; # I want to retrieve last ‘2.204’ from this String.
$text =~ m/\d+[^\d*](\d+)/; #Regexp
my $num = $1;
print " $num ";
Output:
204
But I need 2.204 as output.
Can anyone help me correct it?
The regex is doing exactly what you asked it to: It is matching digits \d+, followed by one non-digit or star [^\d*], followed by digits \d+. The only thing that matches that in your string is 204.
If you want a quick fix, you can just move the parentheses:
m/(\d+[^\d*]\d+)/
This would (with the above input) match what you want. A more exact way to put it would be:
m/(\d+\.\d+)/
Of course this will match any floating-point number, so if your string can contain more than one of those, that's not a good idea. You can tighten it up by using an anchor, like so:
m/(\d+\.\d+)s\)/
Where s\) forces the match to occur at only that place. Further strictures:
m/\(\d+\D+(\d+\.\d+)s\)/
You might also want to account for the possibility of your target number not being a float:
m/\(\d+\D+(\d+\.?\d*)s\)/
By using ? and * we allow for those parts not to match at all. This is not recommended unless you are using anchors. You can also replace everything in the capture group with [\d.]+.
If you are not fond of matching the parentheses, you can match the text:
m/bytes in ([\d.]+)s/
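To see the difference between the original attempt and the text-anchored pattern side by side, here is a small sketch using the string from the question. The variable names $original and $anchored are illustrative only.

```perl
use strict;
use warnings;

my $text = "1170 KB/s (244475 bytes in 2.204s)";

# Original attempt: the capture group only covers the digits that follow
# the single non-digit character, so it captures the fractional part.
my ($original) = $text =~ m/\d+[^\d*](\d+)/;

# Anchoring on the literal text around the number captures the whole value.
my ($anchored) = $text =~ m/bytes in ([\d.]+)s/;

print "original: $original\n";
print "anchored: $anchored\n";
```

A match in list context returns its capture groups, which avoids reading $1 after a match that might have failed.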
I'd go with the second marker as indicator where you are in the string:
my ($num) = ($text =~ /(\d+\.\d+)s/);
with explanations:
/( # start of matching group
\d+ # first digits
\. # a literal '.', take \D if you want non-numbers
\d+ # second digits
)/x # close the matching group and the regex
You had the capture groups in the wrong place. Also, [^\d] is a bit excessive: generally you can negate the backslashed character classes (\d, \h, \s and \w) by using the corresponding uppercase letter (\D, \H, \S and \W).
Try this regex:
$text =~ m/\d+[^\d]*(\d+\.?\d*)s/;
That should match 1+ digits, a decimal point if there is one, 0 or more decimal places, and make sure it's followed by an "s".

Difference between /Regex/gm and m/Regex/g in perl string matching

Is there a difference between ($ipAddrResult =~ /Regex/gm) and ($ipAddrResult =~ m/Regex/g) in Perl string matching? When I search online I find explanations for the second form but not the first. The file I am trying to edit uses the first form.
The ms in different places mean different things.
Let's look at the second example first.
m// is the regular expression matching operator. As a shortcut, the m can be omitted, so
$foo =~ m/$pattern/;
is exactly the same as
$foo =~ /$pattern/;
The only time the m is required is if you want to use delimiters other than / for your pattern. You can do, for example
$foo =~ m!$pattern!;
or
$foo =~ m[$pattern];
and so on, but these all require the m to be there.
In the first example, the m after the regex is a modifier flag which tells the regex how to behave. The regex flags are documented in the perlre man page, which has this to say:
m -
Treat string as multiple lines. That is, change "^" and "$" from
matching the start or end of line only at the left and right ends of
the string to matching them anywhere within the string.
So this:
$foo =~ /$pattern/m;
is the same as this:
$foo =~ m/$pattern/m;
and the same as this:
$foo =~ m{$pattern}m;
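The two roles of m can be checked with a short sketch. The two-line sample string and the variable names are assumptions made for illustration.

```perl
use strict;
use warnings;

my $foo = "first line\nsecond line";

# Leading m is only the match operator: ^ still anchors at the string start.
my $plain  = ($foo =~ m/^second/)  ? "match" : "no match";

# Trailing m is the multi-line flag: ^ may also match right after a newline.
my $multi  = ($foo =~ /^second/m)  ? "match" : "no match";

# Both at once, with alternative delimiters.
my $braces = ($foo =~ m{^second}m) ? "match" : "no match";

print "m/^second/   : $plain\n";
print "/^second/m   : $multi\n";
print "m{^second}m  : $braces\n";
```

Only the versions with the trailing m find "second", because it starts a line but not the string.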
In the expression
/Regex/gm
The "m" stands for multi-line matching. In the expression:
m/Regex/g
The "m" stands for "match" as opposed to a substitution, which looks like this:
s/Regex/replacement/g
Because matching (vs. substitution) is the default, you can generally leave off the "m/" from the start of the expression. In other words "m/Regex/g" is just a synonym for "/Regex/g".
Yes, m/regex/g is syntactically equivalent to just /regex/g. That is, it doesn't activate the /m flag at all. Compare to s/foo/bar/ which is not at all the same as s/foo/bar/s. The name m stands for "match" I believe.

How to get a perfect match for a regexp pattern in Perl?

I have to match a regular expression stored in a variable:
#!/bin/env perl
use warnings;
use strict;
my $expr = qr/\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\])?)/sx;
my $str = "abcd[3] xyzg[4:0]";
if ($str =~ m/$expr/) {
print "\n%%%%%%%%% $`-----$&-----$'\n";
}
else {
print "\n********* NOT MATCHED\n";
}
But I'm getting this output (note $&):
%%%%%%%%% -----abcd[3] xyzg-----[4:0]
I expected it not to enter the if clause at all.
What is intended is:
if $str = "abcd xyzg" => %%%%%%%%% -----abcd xyzg----- (CORRECT)
if $str = "abcd[2] xyzg" => %%%%%%%%% -----abcd[2] xyzg----- (CORRECT)
if $str = "abcd[2] xyzg[3]" => %%%%%%%%% -----abcd[2] xyzg[3]----- (CORRECT)
if $str = "abcd[2:0] xyzg[3]" => ********* NOT MATCHED (CORRECT)
if $str = "abcd[2:0] xyzg[3:0]" => ********* NOT MATCHED (CORRECT)
if $str = "abcd[2] xyzg[3:0]" => ********* NOT MATCHED (INTENDED)
but the actual output is %%%%%%%%% -----abcd[2] xyzg-----[3:0] (WRONG)
In this last case I expect it to go to the else block.
I also don't understand why $& takes only a portion of the string (abcd[2] xyzg), leaving [3:0] in $'.
It should match the full string, not a part of it as above; if it can't, it shouldn't enter the if clause.
Can anyone please help me change my $expr pattern so that I get what is intended?
By default, Perl regexes only look for a matching substring of the given string. In order to force comparison against the entire string, you need to indicate that the regex begins at the beginning of the string and ends at the end by using ^ and $:
my $expr = qr/^\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\])?)$/;
(Also, there's no reason to have the /x modifier, as your regex doesn't include any literal whitespace or # characters, and there's no reason for the /s modifier, as you're not using the . metacharacter.)
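Running the anchored pattern over the six cases listed in the question gives a quick sanity check. This is only a sketch; the @cases list and %result hash are illustrative names.

```perl
use strict;
use warnings;

# Same pattern as above, anchored at both ends of the string.
my $expr = qr/^\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\])?)$/;

my @cases = (
    'abcd xyzg',
    'abcd[2] xyzg',
    'abcd[2] xyzg[3]',
    'abcd[2:0] xyzg[3]',
    'abcd[2:0] xyzg[3:0]',
    'abcd[2] xyzg[3:0]',
);

my %result;
for my $str (@cases) {
    $result{$str} = ($str =~ $expr) ? "matched" : "NOT MATCHED";
}

printf "%-22s %s\n", $_, $result{$_} for @cases;
```

The first three strings match and the last three do not, including "abcd[2] xyzg[3:0]", which previously matched in part.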
EDIT: If you don't want the regex to match against the entire string, but you want it to reject anything in which the matching portion is followed by something like "[0:0]", the simplest way would be to use lookahead:
my $expr = qr/^\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\]|(?=[^[\w])|$ ))/x;
This will match anything that takes the following form:
beginning of the string (which your example in the comments seems to imply you want)
zero or more whitespace characters
one or more word characters
optional: [, one or more digits, ]
one or more whitespace characters
one or more word characters
one of the following, in descending order of preference:
[, one or more digits, ]
an empty string followed by (but not including!) a character that is neither [ nor a word character (The exclusion of word characters is to keep the regex engine from succeeding on "a[0] bc[1:2]" by only matching "a[0] b".)
end of string (A space is needed after the $ to keep it from merging with the following ) to form the name of a special variable, and this entails the reintroduction of the /x option.)
Do you have any more unstated requirements that need to be satisfied?
The short answer is your regexp is wrong.
We can't fix it for you without you explaining what you need exactly, and the community is not going to write a regexp exactly for your purpose because that's just too localized a question that only helps you this one time.
You need to ask something more general about regexps that we can explain to you, that will help you fix your regexp, and help others fix theirs.
Here's my general answer when you're having trouble testing your regexp. Use a regexp tool, like the regex buddy one.
So I'm going to give a specific answer about what you're overlooking here:
Let's make this example smaller:
Suppose your pattern is a(bc+d)?. It will match abcd, abccd, etc. It will not match bcd or bzd. In the case of abzd it will match, but only the a, because the whole group bc+d is optional. Similarly, for abcbcd it will match only the a, dropping the whole optional group that couldn't be matched (it fails at the second b).
Regexps will match as much of the string as they can and return a true match when they can match something and have satisfied the entire pattern. If you make something optional, they will leave it out when they have to, including it only when it's present and matches.
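The behaviour of the optional group can be observed directly. The sketch below records what $& (the text matched by the last successful match) holds for a few sample inputs; the %matched hash is an illustrative name.

```perl
use strict;
use warnings;

my %matched;
for my $str (qw(abcd abccd abzd abcbcd bcd)) {
    # Store the matched text, or undef when the pattern does not match at all.
    $matched{$str} = ($str =~ /a(bc+d)?/) ? $& : undef;
}

for my $str (sort keys %matched) {
    my $shown = defined($matched{$str}) ? "'$matched{$str}'" : "no match";
    printf "%-7s => %s\n", $str, $shown;
}
```

For abzd and abcbcd the match succeeds with just "a", which is the same effect that lets the original pattern succeed on a partial string.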
Here's what you tried:
qr/\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\])?)/sx
First, s and x aren't needed modifiers here.
Second, this regex can match:
Any or no whitespace followed by
a word of at least one word character followed by
optionally a grouped square bracketed number with at least one digit (eg [0] or [9999]) followed by
at least one white space followed by
a word of at least one word character followed by
optionally a square bracketed number with at least one digit.
Clearly when you ask it to match abcd[0] xyzg[0:4] the colon ends the \d+ pattern but doesn't satisfy the \] so it backtracks the whole group, and then happily finds the group was optional. So by not matching the last optional group, your pattern has matched successfully.

What does this following code do?

What does the following do? Can anybody explain it to me?
$data = "What is the STATUS of your mind right now?";
$data =~/.*/; print "$1,$2\n";
$data =~/(.*?)(u+).*/; print "$1, $2\n";
$data =~/(.?)(u+).*/; print "$1, $2\n";
$data =~/(\w+\s)+/; print "$1, $2\n";
What are $1 and $2? How do they get their values? And what are all these regular expressions?
Thanks in advance :)
Please read perldoc perlretut, which will answer all your questions.
The general reference for Perl regular expressions is perldoc perlre, but you should read the tutorial first as it serves as a nicer introduction.
$1 and $2 are matching variables. They refer to whatever is matched in the various parentheses matching groups in the last regular expression.
$1 has the part of the string that was matched in the first parenthesis group. $2 has the part of the string that was matched in the second parenthesis group. You can guess what $3 would contain.
Let's look at your example:
$data = "What is the STATUS of your mind right now?";
$data =~/.*/; print "$1,$2\n";
There are no parentheses here, so $1 and $2 don't contain anything.
$data =~/(.*?)(u+).*/; print "$1, $2\n";
There are two parentheses groups here. The first one is (.*?), which matches nothing or anything it can (in a non-greedy manner, but that's another topic). The second one is (u+), which matches one or more "u"s.
The first (and only) lowercase "u" in $data is in the middle of "your", so $1 matches everything up until that "u", and $2 matches that one "u".
$data =~/(.?)(u+).*/; print "$1, $2\n";
Now the first group is (.?), which matches one single character, or nothing. Then (u+) again matches one or more "u"s.
Since there's just one "u" in our string, the first group will be the one single character before it, which is "o", and the second group will match the actual "u".
$data =~/(\w+\s)+/; print "$1, $2\n";
Finally, the first group is (\w+\s), which is one or more "word" characters followed by a whitespace character. "Word" characters are any alphanumeric character or the underscore. There is no second group, but the whole group is followed by a + (one or more).
So what does it match? Because the group is quantified with +, it is matched repeatedly: "What ", "is ", "the ", and so on, for as long as another word followed by whitespace comes next. A repeated capture group keeps only the text of its last repetition.
The final word "now" is followed by "?" rather than whitespace, so the last successful repetition is "right ". The overall match runs from "What " through "right ", but $1 holds only that last piece.
So $1 is "right ", and $2 is empty.
Summary:
When you see $1 and $2, you should look for the matching group parentheses in the last regular expression.
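The captures discussed above can be checked with a short sketch. It uses list-context matches, which return the capture groups directly, instead of printing $1 and $2; the array names are illustrative.

```perl
use strict;
use warnings;

my $data = "What is the STATUS of your mind right now?";

# In list context a successful match returns its capture groups.
my @lazy     = $data =~ /(.*?)(u+).*/;  # lazy prefix, then the u
my @optional = $data =~ /(.?)(u+).*/;   # at most one character, then the u
my @repeated = $data =~ /(\w+\s)+/;     # repeated group keeps its last repetition

print "lazy:     '$lazy[0]', '$lazy[1]'\n";
print "optional: '$optional[0]', '$optional[1]'\n";
print "repeated: '$repeated[0]'\n";
```

Note that the uppercase U in STATUS is not matched by u, since the patterns are case-sensitive.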