In Perl, what's the difference between "if defined $count" and "if $count"? - perl

I have the following script:
#!/usr/bin/perl
use warnings;
use strict;
my $count = 0;
my ( #first , #second , #third );
while ($count <= 7){
push ( #first , $count);
push ( #second , $count) if defined $count;
push ( #third , $count) if $count;
$count++;
}
print "first: #first\n";
print "second: #second\n";
print "third: #third\n";
This produces the following output:
first: 0 1 2 3 4 5 6 7
second: 0 1 2 3 4 5 6 7
third: 1 2 3 4 5 6 7
What's the difference between putting if defined $count vs. if $count, other than the latter method won't add the zero to the array? I've searched the perldocs but couldn't find the answer.

Truth and Falsehood in perlsyn explains what values are considered false in a boolean context:
The number 0, the strings '0' and '',
the empty list (), and undef are
all false in a boolean context. All other values are true.
undef is the value of a variable that has never been initialized (or that has been reset using the undef function). The defined function returns true if the value of the expression is not undef.
if $count is false if $count is the number 0, the string '0', the empty string, undef, or an object that has been overloaded to return one of those things when used in a boolean context. Otherwise, it's true. (The empty list can't be stored in a scalar variable.)
if defined $count is false only if $count is undef.

if you see the documentation of defined in perldoc then you will find that
Returns a Boolean value telling
whether EXPR has a value other
than the undefined value undef. If EXPR is not present, $_ is
checked.
A simple Boolean test will not
distinguish among undef, zero, the
empty string, and "0" , which are all
equally false.
that means,
push ( #second , 'undef') if defined $count;
when $count = 0, then it is defined because 0 is different from undef and defined returns true, but in this case push ( #third , 'undef') if $count; if condition fails, that's why it is not pushing 0 into the array.

The defined predicate tests to see whether the variable ($count in this case) is defined at all. In the code you've written, it always will be defined inside the loop because it's always got some value.
If you were to add:
undef $count;
push ( #first , 'undef');
push ( #second , 'undef') if defined $count;
push ( #third , 'undef') if $count;
after the loop, you would see the difference. (Note that I changed the code to add the literal 'undef' instead of $count because you'll get misleading effects from adding the actual undef value.

The if decides to run its block (or single statement) by looking at the value of the expression you give it:
if( EXPR ) { ... }
If that expression is true, it runs its block. If that expression is false, it doesn't.
That expression can be just about anything. Perl evaluates the expression, reducing it to a value that is either true or false. The if() then looks at that value.
So, removing that part of your question, you're left with "What's the difference between defined $count and $count". Well, one is the return value for defined and the other is whatever value is stored in $count.
When you want to figure out what a particular bit of code is doing, reduce it in the same logical process that perl would, one step at a time. See what each step does, and you'll often be able to answer your own questions. :)
You say that you searched the documentation, but I'm not sure where you looked. If you want to look up a built-in function, you can use perldoc's -f switch:
$ perldoc -f defined
If you want to read about Perl's syntax for things such as if, that's in perlsyn.
I have a beginner's guide to the Perl docs in Perl documentation documentation.

The way I read it.
if $count is only true when $count evaluates != 0, hence the third array has no 0 in it.
if defined $count checks to see if the $count as a scalar has been created, and as you have $count scalar, it's the same as the first one.

Related

Explanation of Perl's syntax from module MoreUtils.pm

I am seeking explanation of the syntax of Perl's uniq and fidrstidx function from module MoreUtils.pm.
Having sought that, I already know other ways to get uniq array elements from an array having duplicate elements and finding the first index from an array by below ways :
## remove duplicate elements ##
my #arr = qw (2 4 2 8 3 4 6);
my #uniq = ();
my %hash = ();
#uniq = grep {!$hash{$_}++ } #arr;
### first index ###
#arr = qw (Java ooperl Ruby cgiperl Python);
my ($index) = grep {$arr[$_] =~ /perl/} 0..$#arr;
Can anybody please explain me second line of this below sub uniq function comprising map and ternary operator from MoreUtils.pm:
map {$h{$_}++ == 0 ? $_ : () } #;
and also
the &# passed to firstidx function and the below line in the body of the function :
local *_ = \$_[$i];
What I understand that sub routine ref is passed to firstidx. But a bit more detailed explanation will be much appreciated.
Thanks.
Your second question was answered in the comments.
Your first question asks about map {$h{$_}++ == 0 ? $_ : () } #; from List::MoreUtils. In recent versions, it's actually in List::MoreUtils::PP (for Pure Perl) since many of the subroutines are also implemented in C and XS. Here's the current version of the Pure Perl uniq:
sub uniq (#)
{
my %seen = ();
my $k;
my $seen_undef;
grep { defined $_ ? not $seen{ $k = $_ }++ : not $seen_undef++ } #_;
}
This has the same map technique although it's using grep instead. The grep goes through all of the elements in #_ and has to return either true or false for each of them. The elements which evaluate to true end up in the output list. The code then wants to make an element evaluate to true the first time it sees it and false the rest of the times.
In this code it handles undef separately. If the current element is not undef, it does the first branch of the conditional operator and the second branch otherwise. Now let's look at the branches.
The defined case adds an element to a hash. No one left code comments about the use of $k but it probably has something to do with not disturbing $_. That $k becomes the key for the hash:
not $seen{ $k = $_ }++
If that is the first time that key has been encountered the value of the hash is undef. That post-increment does its work after the value is used so hold off on thinking about that for a moment. The low-precendence not sees the value of $seen{$k}, which is undef. The not turns the false value of undef into true. That true indicates that the grep has seen $_ for the first time. It becomes part of the output list. Then the ++ does its work and increments the undef value to 1. On all subsequent encounters with the same value the hash value will be true. The not will turn the true value into false and that element won't be in the output list.
The map you show implements the grep. It returns an element when the condition is true and returns no elements when it is false:
map {$h{$_}++ == 0 ? $_ : () } #_;
For each element it adds it as the key in the hash and compares the value to 0. The first time an element is seen that value is undef. In numeric context an undef is 0. So, the == returns true and the first branch of the conditional operator fires, returning $_ to the output list. The ++ then increments the hash value from undef to 1. The next time it encounters the same value the hash value is not 0 and the second branch of the conditional operator returns the empty list. That adds no elements to the output list.
Newer version of List::MoreUtils don't use the construct any more, but as Сухой27 explained,
map { CONDITION ? $_ : () } LIST
is just a fancy alternative to
grep { CONDITION } LIST
I don't think there's any overarching reason the author chose map for this implementation, and in fact it was simplified to grep in later versions of List::MoreUtils.
The firstidx syntax is firstidx BLOCK LIST. Like the builtin map and grep, it is specified that the code in BLOCK will operate on the variable $_, and that the code is allowed to make changes to $_. So in the firstidx implementation, it is not sufficient to set $_ to each value in LIST. Rather, $_ must be aliased to each element of LIST so that a change in $_ inside BLOCK also results to a change in the element in the LIST. This is accomplished by manipulating the symbol table
local *_ = \$scalar # make $_ an alias of $scalar
And you use local so that when firstidx is done, we haven't clobbered any useful information that was previously in the $_ variable.

What happens to a non-matching regex in a Perl subroutine call?

I'm trying to make sense of what's happening with a non-matching regex in a subroutine call. Consider this script:
sub routine{
print Dumper(\#_);
}
my $s = 'abc123';
# These pass a single element to &routine
&routine( $s =~ /c/ ); # 1. passes (1)
&routine(2 == 3); # 2. passes ('')
&routine(3 == 3); # 3. passes (1)
# The following two calls appear to be identical
&routine( $s =~ /foobar/ ); # 4. passes ()
&routine(); # 5. passes ()
In the above script, numbers 1, 2 and 3 all pass a single value to &routine. I'm surprised that number 4 doesn't pass a false value, but rather passes nothing at all!
It doesn't seem possible that the non-matching regex evaluates to nothing at all, since the same sort of signature in a conditional isn't valid:
# This is fine
if( $s =~ /foobar/ ){
print "it's true!\n";
}
# This is a syntax error
if( ){
print "Hmm...\n"; # :/
}
What happens to the non-matching regex when it's used in a subroutine call? Further, is it possible for &routine to figure out whether or not it's been called with a non-matching regex, vs nothing at all?
When the match operator =~ is used in list context it returns a list of matches. When there are no matches this list is empty (also called the empty list), and the empty list is passed to your sub routine which in turn causes #_ to be empty.
If you explicitly want to pass the false value of "Did this expression return any matches?" you need to perform your match in scalar context. You can do this by using the scalar keyword
&routine( scalar $s =~ /foobar/ );
which will pass the value ''(false) to your routine sub. Calling a sub without any arguments effectively passes this empty list, so your final example would be correctly written:
if ( () ) {
print "Hmm...\n";
}
which is not a syntax error because in Perl 0, '', and () all represent false.

Where does a Perl subroutine get values missing from the actual parameters?

I came across the following Perl subroutine get_billable_pages while chasing a bug. It takes 12 arguments.
sub get_billable_pages {
my ($dbc,
$bill_pages, $page_count, $cover_page_count,
$domain_det_page, $bill_cover_page, $virtual_page_billing,
$job, $bsj, $xqn,
$direction, $attempt,
) = #_;
my $billable_pages = 0;
if ($virtual_page_billing) {
my #row;
### Below is testing on the existence of the 11th and 12th parameters ###
if ( length($direction) && length($attempt) ) {
$dbc->xdb_execute("
SELECT convert(int, value)
FROM job_attribute_detail_atmp_tbl
WHERE job = $job
AND billing_sub_job = $bsj
AND xqn = $xqn
AND direction = '$direction'
AND attempt = $attempt
AND attribute = 1
");
}
else {
$dbc->xdb_execute("
SELECT convert(int, value)
FROM job_attribute_detail_tbl
WHERE job = $job
AND billing_sub_job = $bsj
AND xqn = $xqn
AND attribute = 1
");
}
$cnt = 0;
...;
But is sometimes called with only 10 arguments
$tmp_det = get_billable_pages(
$dbc2,
$row[6], $row[8], $row[7],
$domain_det_page, $bill_cover_page, $virtual_page_billing,
$job1, $bsj1, $row[3],
);
The function does a check on the 11th and 12th arguments.
What are the 11th and 12th arguments when the function is passed only 10 arguments?
Is it a bug to call the function with only 10 arguments because the 11th and 12th arguments end up being random values?
I am thinking this may be the source of the bug because the 12th argument had a funky value when the program failed.
I did not see another definition of the function which takes only 10 arguments.
The values are copied out of the parameter array #_ to the list of scalar variables.
If the array is shorter than the list, then the excess variables are set to undef. If the array is longer than the list, then excess array elements are ignored.
Note that the original array #_ is unmodified by the assignment. No values are created or lost, so it remains the definitive source of the actual parameters passed when the subroutine is called.
ikegami suggested that I should provide some Perl code to demonstrate the assignment of arrays to lists of scalars. Here is that Perl code, based mostly on his edit
use strict;
use warnings;
use Data::Dumper;
my $x = 44; # Make sure that we
my $y = 55; # know if they change
my #params = (8); # Make a dummy parameter array with only one value
($x, $y) = #params; # Copy as if this is were a subroutine
print Dumper $x, $y; # Let's see our parameters
print Dumper \#params; # And how the parameter array looks
output
$VAR1 = 8;
$VAR2 = undef;
$VAR1 = [ 8 ];
So both $x and $y are modified, but if there are insufficient values in the array then undef is used instead. It is as if the source array was extended indefinitely with undef elements.
Now let's look at the logic of the Perl code. undef evaluates as false for the purposes of conditional tests, but you apply the length operator like this
if ( length($direction) && length($attempt) ) { ... }
If you have use warnings in place as you should, Perl would normally produce a Use of uninitialized value warning. However length is unusual in that, if you ask for the length of an undef value (and you are running version 12 or later of Perl 5) it will just return undef instead of warning you.
Regarding "I did not see another definition of the function which takes only 10 arguments", Perl doesn't have function templates like C++ and Java - it is up to the code in the subroutine to look at what it has been passed and behave accordingly.
No, it's not a bug. The remaining arguments are "undef" and you can check for this situation
sub foo {
my ($x, $y) = #_;
print " x is undef\n" unless defined $x;
print " y is undef\n" unless defined $y;
}
foo(1);
prints
y is undef

When to use defined

I am a little confused as which way to test parameters. Here are two examples from source code posted below. First is this
if(!defined($DBHdl) || !defined($acct_no));
the way to test for undefined parameters?
Second, after assigning to a hashref
$ptMtrRecRef = $ptSelHdl->fetchrow_hashref;
is the best way to test for $ptMtrRecRef being defined to use
if(!$ptMtrRecRef)
or
if(!defined($ptMtrRecRef))?
###############################################################################
# Returns count of meters per account number.
# $PkNam -- package name discarded
# $DBHdl -- ICS database handle
# $acct_no -- water account number
sub mgbl_get_meter_count
{
my ($PkNam, $DBHdl, $acct_no) = #_;
die("mgbl_get_meter_count passed undef handles.\n")
if(!defined($DBHdl) || !defined($acct_no));
my $ptSelHdl;
my $ptMtrRecRef;
my $sql_statement =
"select count(*) from meter m where m.acct_no = ".$acct_no.";";
$ptSelHdl = $DBHdl->prepare($sql_statement);
die("Cannot prepare select count(*) from meter m\n")
if(!$ptSelHdl || !$ptSelHdl->execute);
$ptMtrRecRef = $ptSelHdl->fetchrow_hashref;
return $ptMtrRecRef;
}
$sth->fetchrow_hashref will either return undef or a reference to a hash. As such
if (defined($row))
and
if ($row)
are equivalent here. (undef is false, and reference is always true.) I opt for the simpler alternative.
Same idea for $dbh->prepare.
In the case of the code you posted, I would also do as ikegami said, and use the shorter form.
There are occasions when that isn't suitable, however, for example if a variable could have a legitimate value that would be treated as false if simply used in a true/false test. For example:
my $value = 0;
print "defined\n" if defined $value; # prints 'defined'
print "true\n" if $value; # does not print anything
Well , in perl script language, defined($a) is just a sub routine to test if $a is "undef",nothing else. So ,you will ask ,what is undef?
To be accurate, it is a perl subroutine ,the same as defined.But when it has no parameter, it can be considered as a perl-special scalar . For example , when you pop a value from an empty array ,it will return an undef.When you call subroutine "undef $b",then $b will become undef($b must be an left value),nothing else. Only in this case, defined($b) will return false.But if $c is an empty string like "" ,number zero ,or string "0" ,defined($c) will still return true;
But if you use a simple boolean expression instead of defined,it becomes totally different. A simple Boolean test will not distinguish among undef, zero, the empty string, and "0" .So , it absolutely depends on your pratical requirement when determining using defined() or just a boolean test.

What perl code samples can lead to undefined behaviour?

These are the ones I'm aware of:
The behaviour of a "my" statement modified with a statement modifier conditional or loop construct (e.g. "my $x if ...").
Modifying a variable twice in the same statement, like $i = $i++;
sort() in scalar context
truncate(), when LENGTH is greater than the length of the file
Using 32-bit integers, "1 << 32" is undefined. Shifting by a negative number of bits is also undefined.
Non-scalar assignment to "state" variables, e.g. state #a = (1..3).
One that is easy to trip over is prematurely breaking out of a loop while iterating through a hash with each.
#!/usr/bin/perl
use strict;
use warnings;
my %name_to_num = ( one => 1, two => 2, three => 3 );
find_name(2); # works the first time
find_name(2); # but fails this time
exit;
sub find_name {
my($target) = #_;
while( my($name, $num) = each %name_to_num ) {
if($num == $target) {
print "The number $target is called '$name'\n";
return;
}
}
print "Unable to find a name for $target\n";
}
Output:
The number 2 is called 'two'
Unable to find a name for 2
This is obviously a silly example, but the point still stands - when iterating through a hash with each you should either never last or return out of the loop; or you should reset the iterator (with keys %hash) before each search.
These are just variations on the theme of modifying a structure that is being iterated over:
map, grep and sort where the code reference modifies the list of items to sort.
Another issue with sort arises where the code reference is not idempotent (in the comp sci sense)--sort_func($a, $b) must always return the same value for any given $a and $b.