Can I replace the binding operator with the smartmatch operator in Perl? - perl

How can I write this with the smartmatch operator (~~)?
use 5.010;
my $string = '12 23 34 45 5464 46';
while ( $string =~ /(\d\d)\s/g ) {
say $1;
}

Interesting. perlsyn states:
Any ~~ Regex pattern match $a =~ /$b/
so, at first glance, it seems reasonable to expect
use strict; use warnings;
use 5.010;
my $string = '12 23 34 45 5464 46';
while ( $string ~~ /(\d\d)\s/g ) {
say $1;
}
to print 12, 23, etc but it gets stuck in a loop, matching 12 repeatedly. Using:
$ perl -MO=Deparse y.pl
yields
while ($string ~~ qr/(\d\d)\s/g) {
say $1;
}
looking at perlop, we notice
qr/STRING/msixpo
Note that 'g' is not listed as a modifier (logically, to me).
Interestingly, if you write:
my $re = qr/(\d\d)\s/g;
perl barfs:
Bareword found where operator expected at C:\Temp\y.pl line 5,
near "qr/(\d\d)\s/g"
syntax error at C:\Temp\y.pl line 5, near "qr/(\d\d)\s/g"
and presumably it should also say something if an invalid expression is used in the code above

If we go and look at what these two variants get transformed into, we can see the reason for this.
First lets look at the original version.
perl -MO=Deparse -e'while("abc" =~ /(.)/g){print "hi\n"}'
while ('abc' =~ /(.)/g) {
print "hi\n";
}
As you can see there wasn't any changing of the opcodes.
Now if you go and change it to use the smart-match operator, you can see it does actually change.
perl -MO=Deparse -e'while("abc" ~~ /(.)/g){print "hi\n"}'
while ('abc' ~~ qr/(.)/g) {
print "hi\n";
}
It changes it to qr, which doesn't recognize the /g option.
This should probably give you an error, but it doesn't get transformed until after it gets parsed.
The warning you should have gotten, and would get if you used qr instead is:
syntax error at -e line 1, near "qr/(.)/g"
The smart-match feature was never intended to replace the =~ operator. It came out of the process of making given/when work like it does.
Most of the time, when(EXPR) is treated as an implicit smart match of $_.
...

Is the expected behaviour to output to first match endlessly? Because that's what this code must do in its current form. The problem isn't the smart-match operator. The while loop is endless, because no modification ever occurs to $string. The /g global switch doesn't change the loop itself.
What are you trying to achieve? I'm assuming you want to output the two-digit values, one per line. In which case you might want to consider:
say join("\n", grep { /^\d{2}$/ } split(" ",$string));

To be honest, I'm not sure you can use the smart match operator for this. In my limited testing, it looks like the smart match is returning a boolean instead of a list of matches. The code you posted (using =~) can work without it, however.
What you posted doesn't work because of the while loop. The conditional statement on a while loop is executed before the start of each iteration. In this case, your regex is returning the first value in $string because it is reset at each iteration. A foreach would work however:
my $string = '12 23 34 45 5464 46';
foreach my $number ($string =~ /(\d\d)\s/g) {
print $number."\n";
}

Related

Using binding operator in perl

I am working on a program in perl and I am trying to combine more than one regex in a binding operator. I have tried using the syntax below but it's not working. I will like to know if there is any other way to go with this.
$in =~ (s/pattern/replacement/)||(s/pattern/replacement/)||...
You can often get a clue about what the Perl makes of some code using B::Deparse.
$ perl -MO=Deparse -E'$in =~ (s/pattern1/replacement1/)||(s/pattern2/replacement2/)'
[ ... snip ... ]
s/pattern2/replacement2/u unless $in =~ s/pattern1/replacement1/u;
-e syntax OK
So it's attempting your first substitution on $in. And if that fails, it is then trying your second substitution. But it's not using $in for the second substitution, it's using $_ instead.
You're running up against precedence issues here. Perl interprets your code as:
($in =~ s/pattern1/replacement1/) or (s/pattern2/replacement2/)
Notice that the opening parenthesis has moved before $in.
As others have pointed out, it's best to use a loop approach here. But I thought it might be useful to explain why your version didn't work.
Update: To be clear, if you wanted to use syntax like this, then you would need:
($in =~ s/pattern1/replacement1/) or
($in =~ s/pattern2/replacement2/);
Note that I've included $in =~ in each expression. At this point, it becomes obvious (I hope) why the looping solution is better.
However, because or is a short-circuiting operator, this statement will stop after the first successful substitution. I assumed that's what you wanted from your use of it in your original code. If that's not what you want, then you need to either switch to using and or (better, in my opinion) break them out into separate statements.
$in =~ s/pattern1/replacement1/;
$in =~ s/pattern2/replacement2/;
The closest you could get with a syntax looking similar to that would be
s/one/ONE/ or
s/two/TWO/ or
...
s/ten/TEN/ for $str;
This will attempt each substitution in turn, once only, stopping after the first successful one.
Use for to "topicalize" (alias $_ to your variable).
for ($in) {
s/pattern/replacement/;
s/pattern/replacement/;
}
A simpler way might be to create an array of all such patterns and replacements, then simply iterate through your array applying the substitution one pattern at a time.
my $in = "some string you want to modify";
my #patterns = (
['pattern to match', 'replacement string'],
# ...
);
$in = replace_many($in, \#patterns);
sub replace_many {
my ($in, $replacements) = #_;
foreach my $replacement ( #$replacements ) {
my ($pattern, $replace_string) = #$replacement;
$in =~ s/$pattern/$replace_string/;
}
return $in;
}
It's not at all clear what you need, and it's not at all clear that you can accomplish what you appear to want by the means you suggest. The OR operator is a short circuit operator, and you may not want this behavior. Please give an example of the input you expect, and the output you desire, hopefully several examples of each. Meanwhile, here is a test script.
use warnings;
use strict;
my $in1 = 'George Walker Bush';
my $in2 = 'George Walker Bush';
my $in3 = 'George Walker Bush';
my $in4 = 'George Walker Bush';
(my $out1 = $in1) =~ s/e/*/g;
print "out1 = $out1 \n";
(my $out2 = $in2) =~ s/Bush/Obama/;
print "out2 = $out2 \n";
(my $out3 = $in3) =~ s/(George)|(Bush)/Obama/g;
print "out3 = $out3\n";
$in4 =~ /(George)|(Walker)|(Bush)/g;
print "$1 - $2 - $3\n";
exit(0);
You will notice in the last case that only the first OR operator matches in the regular expression. If you wanted to replace 'George Walker Bush' with Barack Hussein Obama', you could do that easily enough, but you would also replace 'George Washington'with 'Barack Washington' - is this what you want? Here is the output of the script:
out1 = G*org* Walk*r Bush
out2 = George Walker Obama
out3 = Obama Walker Obama
Use of uninitialized value $2 in concatenation (.) or string at pq_151111a.plx line 19.
Use of uninitialized value $3 in concatenation (.) or string at pq_151111a.plx line 19.
George - -

conditional substitution using hashes

I'm trying for substitution in which a condition will allow or disallow substitution.
I have a string
$string = "There is <tag1>you can do for it. that dosen't mean <tag2>you are fool.There <tag3>you got it.";
Here are two hashes which are used to check condition.
my %tag = ('tag1' => '<you>', 'tag2'=>'<do>', 'tag3'=>'<no>');
my %data = ('you'=>'<you>');
Here is actual substitution in which substitution is allowed for hash tag values not matched.
$string =~ s{(?<=<(.*?)>)(you)}{
if($tag{"$1"} eq $data{"$2"}){next;}
"I"
}sixe;
in this code I want to substitute 'you' with something with the condition that it is not equal to the hash value given in tag.
Can I use next in substitution?
Problem is that I can't use \g modifier. And after using next I cant go for next substitution.
Also I can't modify expression while matching and using next it dosen't go for second match, it stops there.
You can't use a variable length look behind assertion. The only one that is allowed is the special \K marker.
With that in mind, one way to perform this test is the following:
use strict;
use warnings;
while (my $string = <DATA>) {
$string =~ s{<([^>]*)>\K(?!\1)\w+}{I}s;
print $string;
}
__DATA__
There is <you>you can do for it. that dosen't mean <notyou>you are fool.
There is <you>you can do for it. that dosen't mean <do>you are fool.There <no>you got it.
Output:
There is <you>you can do for it. that dosen't mean <notyou>I are fool.
There is <you>you can do for it. that dosen't mean <do>I are fool.There <no>you got it.
It was simple but got my two days to think about it. I just written another substitution where it ignores previous tag which is cancelled by next;
$string = "There is <tag1>you can do for it. that dosen't mean <tag2>you are fool.There <tag3>you got it.";
my %tag = ('tag1' => '<you>', 'tag2'=>'<do>', 'tag3'=>'<no>');
my %data = ('you'=>'<you>');
my $notag;
$string =~ s{(?<=<(.*?)>)(you)}{
$notag = $2;
if($tag{"$1"} eq $data{"$2"}){next;}
"I"
}sie;
$string =~ s{(?<=<(.*?)>)(?<!$notag)(you)}{
"I"
}sie;

Why do I get a syntax error in my compound if statement?

Why do I get a syntax error in the following script?
print "Enter Sequence:";
$a = <STDIN>;
if ($a=="A")|| ($a== "T")|| ( $a == "C")|| ($a== "G")
{
print $a;
}
else
{
print "Error";
}
First, you have a syntax error: The condition expression of an if statement must be in parens.
The second error is found by using use strict; use warnings;, something you should always do. The error is the use of numerical comparison (==) where string comparison (eq) is called for.
The final problem is that $a will almost surely contain a string ending with a newline, so a chomp is in order.
The immediate problem is that he entire logical expression for an if must be in parentheses.
In addition
You must use eq instead of == for comparing strings
Your input string will have a trailing newline, so it will look like "C\n" and will not match a simple one-character string. You need to chomp the input before you compare it
It is generally better to read from STDIN using <> rather than <STDIN>. That way you can specify an input file on the command line, or read from the STDIN if no input was provided
You must always put use strict and use warnings at the top of your program. That will catch many simple errors that you may otherwise overlook
You shouldn't use $a as a variable name. It is a symbol reserved by Perl itself, and says nothing about the purpose of the variable
It is best to use a regular expression for simple comparisons like this. It makes your code much easier to read and will usually make the execution very much faster
Please take a look at this program, which I think does what you want.
use strict;
use warnings;
print "Enter Sequence: ";
my $input = <>;
chomp $input;
if ( $input =~ /^[ATCG]$/i ) {
print $input, "\n";
}
else {
print "Error";
}

Backreferences undefined after finding pattern in perl v5.14.2

I found it strange that backreferences ($1,$2,$3) were not working in my original code, so I ran this example from the web:
#!/usr/bin/perl
# matchtest2.plx
use warnings;
use strict;
$_ = '1: A silly sentence (495,a) *BUT* one which will be useful. silly (3)';
my $pattern = "silly";
if (/$pattern/) {
print "The text matches the pattern '$pattern'.\n";
print "\$1 is '$1'\n" if defined $1;
print "\$2 is '$2'\n" if defined $2;
print "\$3 is '$3'\n" if defined $3;
print "\$4 is '$4'\n" if defined $4;
print "\$5 is '$5'\n" if defined $5;
}
else {
print "'$pattern' was not found.\n";
}
Which only gave me:
The text matches the pattern 'silly'.
Why are the backreferences still undefined after the pattern was found? I am using Wubi (Ubuntu 12.04 64-bit) and my perl version is 5.14.2. Thank you in advance for your help.
You are not capturing any strings: No parentheses in your pattern. If you had done:
my $pattern = "(silly)";
You would have gotten something in $1.
In case you do not know, $1 is the text captured in the first parentheses, $2 the second parentheses, and so on.
This is expected behaviour! It is obvious that you pattern will match, so it is no suprise that the corresponding if-block is executed.
The term “backreferences” for $1, $2, ... may be slightly suboptimal, let's call them “capture groups”.
In a regex, you can enclose parts of the pattern with parens to be remembered later:
/(silly)/
This pattern has one group. The contents of this group will be stored in $1 if it matches.
All capture group variables for groups that don't exists in the pattern or were not populated are set to undef on an otherwise successfull match, so for above pattern $2, $3, ... would all be undef.

Is there a one-liner to get the first element of a split?

Instead of writing:
#holder = split /\./,"hello.world";
print #holder[0];
is it possible to just do a one-liner to just get the first element of the split? Something like:
print (split /\./,"hello.world")[0]
I get the following error when I try the second example:
print (...) interpreted as function at test.pl line 3.
syntax error at test.pl line 3, near ")["
You should have tried your hunch. That’s how to do it.
my $first = (split /\./, "hello.world")[0];
You could use a list-context assignment that grabs the first field only.
my($first) = split /\./, "hello.world";
To print it, use
print +(split /\./, "hello.world")[0], "\n";
or
print ((split(/\./, "hello.world"))[0], "\n");
The plus sign is there because of a syntactic ambiguity. It signals that everything following are arguments to print. The perlfunc documentation on print explains.
Be careful not to follow the print keyword with a left parenthesis unless you want the corresponding right parenthesis to terminate the arguments to the print; put parentheses around all arguments (or interpose a +, but that doesn't look as good).
In the case above, I find the case with + much easier to write and read. YMMV.
If you insist on using split for this then you could potentially be splitting a long string into multiple fields, only to discard all but the first. The third parameter to split should be used to limit the number of fields into which to divide the string.
my $string = 'hello.world';
print((split(/\./, $string, 2))[0]);
But I believe a regular expression better describes what you want to do, and avoids this problem completely.
Either
my $string = 'hello.world';
my ($first) = $string =~ /([^.]+)/;
or
my $string = 'hello.world';
print $string =~ /([^.]+)/;
will extract the first string of non-dot characters for you.
I get the following error when I try the second example:
"syntax error at test.pl line 3, near ")["
No, if you have warnings enabled as you should, you get:
print (...) interpreted as function at test.pl line 3.
syntax error at test.pl line 3, near ")["
which should be a big clue as to your problem.