The following is one of the many cool things that Perl can do:
my ($tmp) = ($_=~ /^>(.*)/);
It finds the pattern ^>.* in the current line of a loop, and it stores what's captured inside the parentheses in the $tmp variable.
What I am curious about is the concept behind this syntax. How and why (under what premises) does this work?
My understanding is that the snippet $_=~ /^>(.*)/ is in boolean context, but the parentheses turn it into list context? But how come only what is inside the parentheses of the matched pattern is stored in the variable?!
Is it some kind of special case of variable assignment I have to "memorize", or can it be fully explained? If so, what is this feature called (a name like "autovivification")?
There are two assignment operators: list assignment and scalar assignment. Which one is used is determined by the left-hand side (LHS) of the "=". (The two operators are covered in detail here.)
In this case, the list assignment operator is used. The list assignment operator evaluates both of its operands in list context.
So what does $_=~ /^>(.*)/ do in list context? Quoting perlop:
If the /g option is not used, m// in list context returns a list consisting of the subexpressions matched by the parentheses in the pattern, i.e., ($1, $2, $3...) [...] When there are no parentheses in the pattern, the return value is the list (1) for success. With or without parentheses, an empty list is returned upon failure.
In other words,
my ($match) = $_ =~ /^>(.*)/;
is equivalent to
my $match;
if ($_ =~ /^>(.*)/) {
    $match = $1;
} else {
    $match = undef;
}
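A small runnable sketch of the list-assignment behaviour described above. The header line is made-up sample data in the format the question's pattern expects; it shows that list context returns the captures ($1, $2, ...) and that a failed match returns the empty list, leaving the variable undef.

```perl
use strict;
use warnings;

# Made-up sample data in the format the question's pattern expects.
my $line = '>seq1 sample';

# List assignment: the match runs in list context and returns ($1).
my ($tmp) = $line =~ /^>(.*)/;

# With two capture groups, list context returns ($1, $2).
my ($id, $desc) = $line =~ /^>(\S+)\s+(.*)/;

# A failed match returns the empty list, so $none is left undef.
my ($none) = 'no header here' =~ /^>(.*)/;

print "tmp=$tmp\n";                              # tmp=seq1 sample
print "id=$id desc=$desc\n";                     # id=seq1 desc=sample
print defined $none ? "defined\n" : "undef\n";   # undef
```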
Were the parens omitted (my $tmp = ...;), a scalar assignment would be used instead. The scalar assignment operator evaluates both of its operands in scalar context.
So what does $_=~ /^>(.*)/ do in scalar context? Quoting perlop:
returns true if it succeeds, false if it fails.
In other words,
my $matched = $_ =~ /^>(.*)/;
is equivalent to
my $matched;
if ($_ =~ /^>(.*)/) {
    $matched = 1;   # !!1 if you want to be picky.
} else {
    $matched = 0;   # !!0 if you want to be picky.
}
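For contrast, a sketch of the scalar-assignment case using the same made-up sample line. The match returns only true or false (the false value is the empty string, not 0), so if you assign in scalar context you have to read the capture from $1 yourself.

```perl
use strict;
use warnings;

my $line = '>seq1 sample';   # made-up sample data

# Scalar assignment: the match runs in scalar context and returns true/false.
my $matched = $line =~ /^>(.*)/;
my $missed  = 'plain text' =~ /^>(.*)/;

# The capture is not returned, but it is available in $1 after a
# successful match, so the scalar-context version needs an explicit $1.
my $tmp;
$tmp = $1 if $line =~ /^>(.*)/;

print "matched=$matched\n";   # matched=1
print "missed='$missed'\n";   # missed='' (the empty string, not 0)
print "tmp=$tmp\n";           # tmp=seq1 sample
```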
The parentheses in the search pattern make that a capture group. In list context, $_ =~ /regex/ returns a list of all the captured groups, so my ($tmp) grabs the first group into $tmp.
All operations in Perl have a return value, including assignment. That's why you can write $a = $b = 1, which sets $a to the result of $b = 1.
You can use =~ in a boolean (well, scalar) context, but that's just because it returns a false value if there's no match. Calling it in list context returns a list, just as other context-sensitive functions can, using wantarray to determine their calling context.
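To illustrate the wantarray point, here is a sketch of a hypothetical helper, header_fields (not from any library), that mimics m// without /g: it returns the captures when called in list context and a plain true/false value when called in scalar context.

```perl
use strict;
use warnings;

# Hypothetical helper: behaves like m// without /g, returning the captures
# in list context and a plain true/false value in scalar context.
sub header_fields {
    my ($str) = @_;
    my @caps = $str =~ /^>(\S+)\s*(.*)/;
    return wantarray ? @caps : (@caps ? 1 : '');
}

my @fields = header_fields('>seq1 sample');   # list context
my $ok     = header_fields('>seq1 sample');   # scalar context, match
my $bad    = header_fields('no header');      # scalar context, no match

print "fields=@fields ok=$ok bad='$bad'\n";   # fields=seq1 sample ok=1 bad=''
```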
I'm trying to write something like:
my $q='select name as N from names';
my $sth=$dbh->prepare( $q ) ;
$sth->execute;
$test->{ $_->{N} } = 1 while $sth->fetchrow_hashref();
but $_ is undef. Does the return value get stored somewhere accessible without explicitly assigning it? Tim, it sure would be handy if it got stored somewhere!
The while loop does not always implicitly assign a value to the default variable $_. As described in perldoc perlsyn:
If the condition expression of a while statement is based on any of a
group of iterative expression types then it gets some magic treatment.
The affected iterative expression types are readline, the
input operator, readdir, glob, the globbing operator, and
each. If the condition expression is one of these expression types,
then the value yielded by the iterative operator will be implicitly
assigned to $_.
(The implication is that otherwise, it is not.)
To do what you want, you need to do
... while $_ = $sth->fetchrow_hashref();
Or better yet, use the idiomatic syntax:
while (my $row = $sth->fetchrow_hashref()) {
    $test->{ $row->{N} } = 1;
}
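A sketch that runs without a database. The $fetch closure is a hypothetical stand-in for $sth->fetchrow_hashref (it returns one hashref per call and undef when the rows run out). It contrasts the explicit lexical assignment with the readline case, where while does set $_ implicitly; the in-memory filehandle is just for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $sth->fetchrow_hashref: a closure that returns
# one hashref per call and undef once the rows run out (no DBI needed).
my @rows  = ({ N => 'alice' }, { N => 'bob' });
my $fetch = sub { return shift @rows };

my $test = {};

# A plain sub/method call in a while condition does not set $_,
# so assign each row to a lexical explicitly.
while (my $row = $fetch->()) {
    $test->{ $row->{N} } = 1;
}

# For comparison: readline (<$fh>) in a while condition does set $_.
my $text = "x\ny\n";
open my $fh, '<', \$text or die "Can't open in-memory file: $!";
my @seen;
while (<$fh>) {
    chomp;
    push @seen, $_;
}
close $fh;

print join(',', sort keys %$test), "\n";   # alice,bob
print "@seen\n";                           # x y
```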
I came across this code in a script. Can you please explain what map and grep do here?
open FILE, '<', $file or die "Can't open file $file: $!\n";
my @sets = map {
chomp;
$_ =~ m/use (\w+)/;
$1;
}
grep /^use/, ( <FILE> );
close FILE;
The file pointed to by $file contains:
use set_marvel;
use set_caprion;
and so on...
Despite the fact that your question doesn't show any research effort, I'm going to answer it anyway, because it might be helpful for future readers who come across this page.
According to perldoc, map:
Evaluates the BLOCK or EXPR for each element of LIST (locally setting
$_ to each element) and returns the list value composed of the results
of each such evaluation. In scalar context, returns the total number
of elements so generated. Evaluates BLOCK or EXPR in list context, so
each element of LIST may produce zero, one, or more elements in the
returned value.
The definition for grep, on the other hand:
Evaluates the BLOCK or EXPR for each element of LIST (locally setting
$_ to each element) and returns the list value consisting of those
elements for which the expression evaluated to true. In scalar
context, returns the number of times the expression was true.
So they're similar in their input values, their return values, and the fact that they both localize $_.
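A quick sketch of the two definitions on a made-up word list: grep keeps or drops elements, map can produce zero, one, or more results per element, and grep in scalar context returns a count.

```perl
use strict;
use warnings;

my @words = qw(apple banana cherry);   # made-up sample data

# grep keeps the elements for which the block is true.
my @with_an = grep { /an/ } @words;                 # ('banana')

# map may return zero, one, or more elements per input element.
my @lengths  = map { length($_) } @words;           # (5, 6, 6)
my @twice    = map { ($_, $_) } @with_an;           # ('banana', 'banana')
my @filtered = map { /an/ ? ($_) : () } @words;     # zero or one per element

# In scalar context, grep returns how many times the block was true.
my $count = grep { /e/ } @words;                    # 2 (apple, cherry)

print "@lengths | @twice | @filtered | $count\n";   # 5 6 6 | banana banana | banana | 2
```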
In your specific code, going from right to left:
<FILE> is evaluated in list context (it is the list argument of grep), so it slurps all the lines of the file opened on the FILE filehandle and returns them as a list.
In the context of grep, /^use/ is tested against each line and is true for the ones that match the regular expression. The return value of grep, therefore, is a list of the lines that start with use.
In the BLOCK of your map (which is only considering lines that passed the earlier grep test):
chomp removes from the end of $_ any trailing string that corresponds to the current value of $/ (by default, a newline). This is unnecessary here because, as you'll see below, \w will never match a newline.
$_ =~ m/use (\w+)/ is a regular expression match that looks for use followed by a space, followed by one or more word characters (for ASCII text, [0-9a-zA-Z_]) in a capture group. The $_ =~ is redundant, since the match operator m// matches against $_ by default.
$1 holds the text captured by the first capture group of the previous match. Since it's the last expression in the BLOCK, it becomes the return value for each list item that is evaluated.
The end result is stored in an array named @sets, which should contain 'set_marvel', 'set_caprion', etc.
Equivalently, your code could be rewritten without map and grep like this, which may make it easier for you to understand:
my @sets;
while (<FILE>) {
    next unless /^use (\w+)/;
    push(@sets, $1);
}
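A self-contained sketch that checks the two versions agree. It reads from an in-memory filehandle instead of $file, with the question's two lines plus one non-matching line added for illustration. The third variant is an alternative not in the original answer: inside map the match is already in list context, so it returns ($1) on success or the empty list on failure, which makes both grep and the explicit $1 unnecessary.

```perl
use strict;
use warnings;

# In-memory stand-in for $file: the question's two lines plus one
# non-matching line added for illustration.
my $content = "use set_marvel;\nuse set_caprion;\n# not a use line\n";

# 1. The map/grep version from the question, with a lexical filehandle.
open my $fh1, '<', \$content or die "Can't open in-memory file: $!";
my @sets_map = map { chomp; $_ =~ m/use (\w+)/; $1 } grep { /^use/ } <$fh1>;
close $fh1;

# 2. The equivalent while loop.
open my $fh2, '<', \$content or die "Can't open in-memory file: $!";
my @sets_loop;
while (<$fh2>) {
    next unless /^use (\w+)/;
    push @sets_loop, $1;
}
close $fh2;

# 3. Alternative: a list-context match returns ($1), or () when it fails.
open my $fh3, '<', \$content or die "Can't open in-memory file: $!";
my @sets_short = map { /^use (\w+)/ } <$fh3>;
close $fh3;

print "@sets_map\n@sets_loop\n@sets_short\n";
```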
The grep takes the lines read from <FILE> as input and uses the regular expression ^use to select all of the lines that start with use; that list is passed to map.
The map loops through each element of that list, aliasing $_ to it, then calls chomp, which operates on $_ by default. Then $_ =~ m/use (\w+)/; runs a regular expression on $_ that captures the word after use into $1. Finally, $1, as the last expression in the block, is the value map returns for that line, and the collected results are stored in @sets.
I'm trying to make sense of what's happening with a non-matching regex in a subroutine call. Consider this script:
use Data::Dumper;

sub routine {
    print Dumper(\@_);
}
my $s = 'abc123';
# These pass a single element to &routine
&routine( $s =~ /c/ ); # 1. passes (1)
&routine(2 == 3); # 2. passes ('')
&routine(3 == 3); # 3. passes (1)
# The following two calls appear to be identical
&routine( $s =~ /foobar/ ); # 4. passes ()
&routine(); # 5. passes ()
In the above script, numbers 1, 2 and 3 all pass a single value to &routine. I'm surprised that number 4 doesn't pass a false value, but rather passes nothing at all!
It doesn't seem possible that the non-matching regex evaluates to nothing at all, since the same sort of signature in a conditional isn't valid:
# This is fine
if( $s =~ /foobar/ ){
print "it's true!\n";
}
# This is a syntax error
if( ){
print "Hmm...\n"; # :/
}
What happens to the non-matching regex when it's used in a subroutine call? Further, is it possible for &routine to figure out whether or not it's been called with a non-matching regex, vs nothing at all?
When the match operator (here m//, bound to $s with =~) is used in list context, it returns a list of the captured groups, or (1) on success when the pattern has no capture groups. When the match fails, this list is empty (the empty list), and the empty list is passed to your subroutine, which in turn causes @_ to be empty.
If you explicitly want to pass the false value of "Did this expression match?", you need to perform your match in scalar context. You can do this with the scalar keyword:
&routine( scalar $s =~ /foobar/ );
which will pass the value '' (false) to your routine sub. Calling a sub without any arguments effectively passes this same empty list, so your final example would correctly be written as:
if ( () ) {
print "Hmm...\n";
}
which is not a syntax error, because in Perl the values 0 and '', as well as the empty list (), are all false.
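A sketch that makes the argument counts visible. arg_count is a hypothetical helper that just returns scalar @_. As the counts suggest, a failed list-context match and a call with no arguments leave the sub with the same empty @_, so the sub itself has no way to tell them apart; forcing scalar context is what turns the failure into a single '' argument.

```perl
use strict;
use warnings;

# Hypothetical helper: returns how many arguments the sub was called with.
sub arg_count { return scalar @_ }

my $s = 'abc123';

my $hit     = arg_count($s =~ /c/);               # list context, no captures: (1)
my $miss    = arg_count($s =~ /foobar/);          # list context, failure: ()
my $miss_sc = arg_count(scalar($s =~ /foobar/));  # scalar context: ('')
my $none    = arg_count();                        # no arguments: ()

print "$hit $miss $miss_sc $none\n";              # 1 0 1 0
```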
I don't understand how the "my" keyword works here. This is my Perl script.
$line = ' sdfaad(asdvfr)';
code1:
if ($tmp = $line =~ /(\(\s*[^)]+\))/ ) {
print $tmp;
}
Outputs:
1
code2:
if (my ($tmp) = $line =~ /(\(\s*[^)]+\))/ ) {
print $tmp;
}
Outputs:
(asdvfr)
Why are the two outputs different? Does it have to do with the use of my?
It is not my that makes the difference, but scalar versus list context. The parentheses around $tmp turn the assignment into a list assignment, which imposes list context on the right-hand side:
if (($tmp) = $line =~ /(\(\s*[^)]+\))/ )   # the parentheses make the difference, not 'my'
whereas my only declares the variable as lexically scoped.
Perl has two different assignment operators; a list assignment operator and a scalar assignment operator. A list assignment gives its right operand list context, while a scalar assignment gives its right operand scalar context. A match operation returns differently depending on this context.
Which operator = is depends on what is on the left side; if it is an array, a hash, a slice, or a parenthesized expression, it is a list assignment; otherwise it is a scalar assignment.
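A sketch reproducing code1 and code2 with the question's string and pattern (the non-matching string 'no parens' is made up). It also shows why code2 works as an if condition: a list assignment evaluated in scalar context yields the number of elements on its right-hand side, so 1 on success and 0 on failure.

```perl
use strict;
use warnings;

my $line = ' sdfaad(asdvfr)';         # sample string from the question
my $re   = qr/(\(\s*[^)]+\))/;        # the question's pattern

my $scalar_result = $line =~ $re;     # scalar assignment: 1 (true)
my ($list_result) = $line =~ $re;     # list assignment: '(asdvfr)'

# A list assignment in scalar context (as in an if condition) yields the
# number of elements on its right-hand side: 1 on success, 0 on failure.
my $n_hit  = (my ($x) = $line =~ $re);
my $n_miss = (my ($y) = 'no parens' =~ $re);

print "$scalar_result $list_result $n_hit $n_miss\n";   # 1 (asdvfr) 1 0
```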
I know from Learning Perl, 6th Ed. (ISBN: 978-1-449-30358-7) p.58 that ($x, $y) = "something", "new"; is a list context. So why does the following code print " bee"? Please explain how the code is parsed.
$dina = bobba;
$ba = bee;
print " " . ($dina, $ba)."\n";
The concatenation operator . imposes scalar context on the list created by the comma operator, so it returns its last member.
The most relevant documentation quote is this paragraph from perlop(1):
Comma Operator
Binary "," is the comma operator. In scalar context it evaluates its
left argument, throws that value away, then evaluates its right
argument and returns that value. This is just like C's comma operator.
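A strict-safe sketch of the question's example (the barewords are quoted, since they are errors under strict). The concatenation puts the parenthesized list in scalar context, so the comma operator returns only its right operand; joining explicitly keeps both values.

```perl
use strict;
use warnings;

my $dina = 'bobba';
my $ba   = 'bee';

my $concat;
{
    # The left operand of the scalar-context comma is evaluated and thrown
    # away, which Perl reports as a "Useless use ... in void context" warning.
    no warnings 'void';
    $concat = " " . ($dina, $ba);     # comma operator in scalar context
}

my @pair   = ($dina, $ba);            # list context keeps both elements
my $joined = " " . join(' ', @pair);  # explicit join if both are wanted

print "[$concat]\n";   # [ bee]
print "[$joined]\n";   # [ bobba bee]
```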
"($x, $y) = ("something", "new"); is a list context." makes no sense. (Added the missing parenthesis to avoid going off-topic.)
First, an expression isn't "a list context"; it is evaluated in list context.
Second, there's no way to know in which context that expression will be evaluated from what you posted, but chances are it's evaluated in void context.
You are probably referring to the subexpressions ($x, $y) and ("something", "new"). They are indeed evaluated in list context, and that's because the list assignment operator evaluates its operands in list context.
In your code, ($dina, $ba) is the operand of a concatenation operator (.). The concatenation operator combines two strings, so it expects strings as operands. Since strings are scalars, the concatenation operator evaluates its operands in scalar context.
In scalar context,
$x, $y
is about the same as
do { $x; $y }
(without the additional scope). Each item of the list is evaluated in turn in void or scalar context, and the whole evaluates to what the last item in the list returned.
>perl -E"sub f { say 'f'; 3 } sub g { say 'g'; 4 } say ':'.(f,g);"
f
g
:4