What happens to a non-matching regex in a Perl subroutine call? - perl

I'm trying to make sense of what's happening with a non-matching regex in a subroutine call. Consider this script:
sub routine{
print Dumper(\#_);
}
my $s = 'abc123';
# These pass a single element to &routine
&routine( $s =~ /c/ ); # 1. passes (1)
&routine(2 == 3); # 2. passes ('')
&routine(3 == 3); # 3. passes (1)
# The following two calls appear to be identical
&routine( $s =~ /foobar/ ); # 4. passes ()
&routine(); # 5. passes ()
In the above script, numbers 1, 2 and 3 all pass a single value to &routine. I'm surprised that number 4 doesn't pass a false value, but rather passes nothing at all!
It doesn't seem possible that the non-matching regex evaluates to nothing at all, since the same sort of signature in a conditional isn't valid:
# This is fine
if( $s =~ /foobar/ ){
print "it's true!\n";
}
# This is a syntax error
if( ){
print "Hmm...\n"; # :/
}
What happens to the non-matching regex when it's used in a subroutine call? Further, is it possible for &routine to figure out whether or not it's been called with a non-matching regex, vs nothing at all?

When the match operator =~ is used in list context it returns a list of matches. When there are no matches this list is empty (also called the empty list), and the empty list is passed to your sub routine which in turn causes #_ to be empty.
If you explicitly want to pass the false value of "Did this expression return any matches?" you need to perform your match in scalar context. You can do this by using the scalar keyword
&routine( scalar $s =~ /foobar/ );
which will pass the value ''(false) to your routine sub. Calling a sub without any arguments effectively passes this empty list, so your final example would be correctly written:
if ( () ) {
print "Hmm...\n";
}
which is not a syntax error because in Perl 0, '', and () all represent false.

Related

I need to search for a value in a perl array and if I find a match execute some code

This is sort of what I am wanting to do. At present mm returns nothing, while searchname returns the expected value.
This is a perl script embedded in a web page.
I have tried numerous approaches to this code but nothing seems to provide the results I desire. I think it is just a case of syntax.
# search for an item
if ($modtype eq "search") {
$searchname=$modname;
print "Value of searchname $searchname\n";
my #mm = grep{$searchname} #names;
print "Value of mm #mm\n";
if ($mm eq $searchname) {
print "$searchname found!\n";
}
else {
print "$searchname not Found\n";
}
}
my #mm = grep { $_ eq $searchname } #names;
if (#mm) {
print "found\n";
}
grep takes a boolean expression, not just a variable. In that expression, $_ refers to the current list element. By using an equality comparison we get (in #mm) all elements of #names that are equal to $searchname, if any.
To check whether an array is empty, you can simply use it in boolean context, as in if (#mm).
If you don't care about the found elements themselves, just whether there are any, you can use grep in scalar context:
my $count = grep { $_ eq $searchname } #names;
if ($count > 0) {
print "found $count results\n";
}
This will give you the number of matching elements.
If you don't need to know that number, just whether there was any result at all, you can use any from List::Util:
use List::Util qw(any);
if (any { $_ eq $searchname } #names) {
...
}
If #names is big, this is potentially more efficient because it can stop after the first match is found.
I'm not sure what $mm refers to in your code. Did you start your code with use strict; use warnings;? If not, you should.
Looks like you misunderstand a couple of things.
my #mm = grep{$searchname} #names;
The grep() function takes two arguments. A block of code ({ $searchname }) and a list of values (#names). For each value in the list, it puts the value into $_ and executes the code block. If the code block returns a true value then the contents of $_ is added to the output list.
Your block of code ignores $_ and just checks for the value of $searchname. That is very likely to always be true, so all of the values from #names get copied into #mm.
I think it's more likely that you want:
my #mm = grep{ $_ eq $searchname } #names;
Secondly, you suddenly start using a new variable called $mm. I suspect you're getting confused between #mm and $mm which are completely different variables with no connection with each other.
I think what you're actually trying to do is to look at the first element of #mm so you want:
if ($mm[0] eq $searchname)
But, given that values only end up in #mm if they are equal to $searchname (because that's what your grep() does), I think you really just want to check whether or not anything ended up in #mm. So you should use:
if (#mm)
Which is, in my opinion, easier to understand.

Explanation of Perl's syntax from module MoreUtils.pm

I am seeking explanation of the syntax of Perl's uniq and fidrstidx function from module MoreUtils.pm.
Having sought that, I already know other ways to get uniq array elements from an array having duplicate elements and finding the first index from an array by below ways :
## remove duplicate elements ##
my #arr = qw (2 4 2 8 3 4 6);
my #uniq = ();
my %hash = ();
#uniq = grep {!$hash{$_}++ } #arr;
### first index ###
#arr = qw (Java ooperl Ruby cgiperl Python);
my ($index) = grep {$arr[$_] =~ /perl/} 0..$#arr;
Can anybody please explain me second line of this below sub uniq function comprising map and ternary operator from MoreUtils.pm:
map {$h{$_}++ == 0 ? $_ : () } #;
and also
the &# passed to firstidx function and the below line in the body of the function :
local *_ = \$_[$i];
What I understand that sub routine ref is passed to firstidx. But a bit more detailed explanation will be much appreciated.
Thanks.
Your second question was answered in the comments.
Your first question asks about map {$h{$_}++ == 0 ? $_ : () } #; from List::MoreUtils. In recent versions, it's actually in List::MoreUtils::PP (for Pure Perl) since many of the subroutines are also implemented in C and XS. Here's the current version of the Pure Perl uniq:
sub uniq (#)
{
my %seen = ();
my $k;
my $seen_undef;
grep { defined $_ ? not $seen{ $k = $_ }++ : not $seen_undef++ } #_;
}
This has the same map technique although it's using grep instead. The grep goes through all of the elements in #_ and has to return either true or false for each of them. The elements which evaluate to true end up in the output list. The code then wants to make an element evaluate to true the first time it sees it and false the rest of the times.
In this code it handles undef separately. If the current element is not undef, it does the first branch of the conditional operator and the second branch otherwise. Now let's look at the branches.
The defined case adds an element to a hash. No one left code comments about the use of $k but it probably has something to do with not disturbing $_. That $k becomes the key for the hash:
not $seen{ $k = $_ }++
If that is the first time that key has been encountered the value of the hash is undef. That post-increment does its work after the value is used so hold off on thinking about that for a moment. The low-precendence not sees the value of $seen{$k}, which is undef. The not turns the false value of undef into true. That true indicates that the grep has seen $_ for the first time. It becomes part of the output list. Then the ++ does its work and increments the undef value to 1. On all subsequent encounters with the same value the hash value will be true. The not will turn the true value into false and that element won't be in the output list.
The map you show implements the grep. It returns an element when the condition is true and returns no elements when it is false:
map {$h{$_}++ == 0 ? $_ : () } #_;
For each element it adds it as the key in the hash and compares the value to 0. The first time an element is seen that value is undef. In numeric context an undef is 0. So, the == returns true and the first branch of the conditional operator fires, returning $_ to the output list. The ++ then increments the hash value from undef to 1. The next time it encounters the same value the hash value is not 0 and the second branch of the conditional operator returns the empty list. That adds no elements to the output list.
Newer version of List::MoreUtils don't use the construct any more, but as Сухой27 explained,
map { CONDITION ? $_ : () } LIST
is just a fancy alternative to
grep { CONDITION } LIST
I don't think there's any overarching reason the author chose map for this implementation, and in fact it was simplified to grep in later versions of List::MoreUtils.
The firstidx syntax is firstidx BLOCK LIST. Like the builtin map and grep, it is specified that the code in BLOCK will operate on the variable $_, and that the code is allowed to make changes to $_. So in the firstidx implementation, it is not sufficient to set $_ to each value in LIST. Rather, $_ must be aliased to each element of LIST so that a change in $_ inside BLOCK also results to a change in the element in the LIST. This is accomplished by manipulating the symbol table
local *_ = \$scalar # make $_ an alias of $scalar
And you use local so that when firstidx is done, we haven't clobbered any useful information that was previously in the $_ variable.

How can I check if a subroutine returns *nothing*

How can I determine that a perl function return nothing not even undef?
Examples:
sub test {
return;
}
or
sub test {}
The problem is that the curious acting function can also return strings, lists, ... and is a part of a foreign package. So I cannot check with test() || undef because empty lists or strings will be overitten with undef.
Does anyone have an idea how I can check the "null" value, so I can create an conditional exception?
If you use return and you specify no return value, the subroutine returns an empty list
in list context, the undefined value in scalar context, or nothing in
void context.
If no return is found and if the last statement is an expression, its
value is returned. If the last statement is a loop control structure
like a foreach or a while, the returned value is unspecified. The
empty sub returns the empty list.
-perlsub
In both of your cases it will return empty list. So you cannot distinguish between them.
If I understood you correctly you are trying to avoid empty list being overridden with undef if you do test () || undef. But that doesn't matter. In Perl both empty list and undef are considered false.
All of the below evaluate to false
0
'0'
undef
'' # Empty scalar
() # Empty list
('')
There is no way to "return nothing not even undef"*
The subroutine test that you describe evaluates to undef if you call it as
my $ret = test();
or an empty list if you call it as
my #ret = test();
What you do about that depends on what valid values your subroutine may return. Is it designed to return a list or a scalar?
Clearly an error condition must be distinct from any valid return value, and a common way is to always return a scalar value, which may be a reference if you need to return multiple values
Suppose you have a trash subroutine that returns a list of all values in a given range
use strict;
use warnings;
BEGIN { require v5.10 }
use feature 'say';
STDOUT->autoflush;
sub range {
my ($start, $end) = #_;
return if $end < $start;
return [ $start .. $end ];
}
my $range = range(1, 3) or die;
say for #$range;
$range = range(10, 1) or die;
say for #$range;
output
1
2
3
Died at E:\Perl\source\twice.pl line 19.
The returned value is undef if the parameters are bad, and the calling code may use that information as it likes

Why doesn't my subroutine return value get assigned to $_ default variable

I have a Perl subroutine which updates an RSS feed. I want to test the returned value, but the function is used in many places so I wanted to just test the default variable $_ which as far as I understand should be the assigned the return value if no variable is specified.
The code is a bit too long to include all of it, but in essence it does the following
sub updateFeed {
#....
if($error) {
return 0;
}
return 1;
}
Why then does
$rtn = updateFeed("My message");
if ($rtn < 1) { &Log("updateFeed Failed with error $rtn"); }
NOT log any error
whereas
updateFeed("myMessage");
if ($_ < 1) { &Log("updateFeed Failed with error $_"); }
logs an error of "updateFeed Failed with error"? (Note no value at the end of the message.)
Can anyone tell me why the default variable seems to contain an empty string or undef?
Because Perl doesn't work that way. $_ doesn't automatically get the result of functions called in void context. There are some built-in operators that read and write $_ and #_ by default, but your own subroutines will only do that if you write code to make it happen.
An ordinary function call is not one of the contexts in which $_ is used implicitly.
Here's what perldoc perlvar (as of v5.14.1) has to say about $_:
$_
The default input and pattern-searching space. The following pairs are equivalent:
while (<>) {...} # equivalent only in while!
while (defined($_ = <>)) {...}
/^Subject:/
$_ =~ /^Subject:/
tr/a-z/A-Z/
$_ =~ tr/a-z/A-Z/
chomp
chomp($_)
Here are the places where Perl will assume $_ even if you don't use it:
The following functions use $_ as a default argument:
abs, alarm, chomp, chop, chr, chroot, cos, defined, eval, exp, glob, hex, int, lc, lcfirst, length, log, lstat, mkdir, oct, ord, pos, print, quotemeta, readlink, readpipe, ref, require,
reverse (in scalar context only), rmdir, sin, split (on its second argument), sqrt, stat, study, uc, ucfirst, unlink, unpack.
All file tests (-f, -d) except for -t, which defaults to STDIN. See -X in perlfunc
The pattern matching operations m//, s/// and tr/// (aka y///) when used without an =~ operator.
The default iterator variable in a foreach loop if no other variable is supplied.
The implicit iterator variable in the grep() and map() functions.
The implicit variable of given().
The default place to put an input record when a <FH> operation's result is tested by itself as the sole criterion of a while test. Outside a while test, this will not happen.
As $_ is a global variable, this may lead in some cases to unwanted side-effects. As of perl 5.9.1, you can now use a lexical version of $_ by declaring it in a file or in a block with my.
Moreover, declaring our $_ restores the global $_ in the current scope.
Mnemonic: underline is understood in certain operations.
You never assigned the flag to $_, so why would it contain your flag? It appears to contain an empty string (or perhaps undef, which stringifies to the empty string with a warning).
$_ isn't by set by subroutines in void context by default. It is possible to write your subs to set $_ when is void context. You start by checking the value of wantarray, and set $_ when wantarray is undefined.
sub updateFeed {
...
my $return
...
if($error) {
$return = 0;
}else{
$return = 1;
}
# $return = !$error || 0;
if( defined wantarray ){ # scalar or list context
return $return;
}else{ # void context
$_ = $return;
}
}
I would recommend against doing this as it can be quite a surprise to someone that is using your subroutine. Which can make it harder to debug their program.
About the only time I would do this, is when emulating a built-in subroutine.

Can you explain the context dependent variable assignment in perl

The following is one of the many cool things that Perl can do
my ($tmp) = ($_=~ /^>(.*)/);
It finds the pattern ^>.* in the current line in a loop, and it stores the what's in the parenthesis in the $tmp variable.
What I am curious is the concept behind this syntax. How and why(under what premises) does this work?
My understanding is the snippet $_=~ /^>(.*)/ is a boolean context, but the parenthesis renders it as a list context? But how come only what is in the parenthesis in the matched pattern is stored in the variable?!
Is it some kind of special case of variable assignments I have to "memorize" or can this be perfectly explainable? if so, what is this feature called(name like "autovivifacation?")
There are two assignment operators: list assignment and scalar assignment. The choice is determined based on the LHS of the "=". (The two operators are covered in detail in here.)
In this case, a list assignment operator is used. The list assignment operator evaluates both of its operands in list context.
So what does $_=~ /^>(.*)/ do in list context? Quote perlop:
If the /g option is not used, m// in list context returns a list consisting of the subexpressions matched by the parentheses in the pattern, i.e., ($1, $2, $3...) [...] When there are no parentheses in the pattern, the return value is the list (1) for success. With or without parentheses, an empty list is returned upon failure.
In other words,
my ($match) = $_ =~ /^>(.*)/;
is equivalent to
my $match;
if ($_ =~ /^>(.*)/) {
$match = $1;
} else {
$match = undef;
}
Were the parens omitted (my $tmp = ...;), a scalar assignment would be used instead. The scalar assignment operator evaluates both of its operands in scalar context.
So what does $_=~ /^>(.*)/ do in scalar context? Quote perlop:
returns true if it succeeds, false if it fails.
In other words,
my $matched = $_ =~ /^>(.*)/;
is equivalent to
my $matched;
if ($_ =~ /^>(.*)/) {
$matched = 1; # !!1 if you want to be picky.
} else {
$matched = 0; # !!0 if you want to be picky.
}
The brackets in the search pattern make that a "group". What $_ =~ /regex/returns is an array of all the matching groups, so my ($tmp) grabs the first group into $tmp.
All operations in perl have a return value, including assignment. Thats why you can do $a=$b=1 and set $a to the result of $b=1.
You can use =~ in a boolean (well, scalar) context, but that's just because it returns an empty list / undef if there's no match, and that evaluates to false. Calling it in an array context returns an array, just like other context-sensitive functions can do using the wantarray method to determine context.