Unexpected autovivification of arguments - perl

Apparently my understanding of the no autovivification pragma is imperfect: the following script does not die on line 19, and that behaviour is extremely surprising to me.
use 5.014;
use strict;
use warnings;
no autovivification qw(fetch exists delete warn);
{
my $foo = undef;
my $thing = $foo->{bar};
# this does not die, as expected
die if defined $foo;
}
{
my $foo = undef;
do_nothing( $foo->{bar} );
# I would expect this to die, but it doesn't
die unless defined $foo;
}
sub do_nothing {
return undef;
}
Running the script produces:
Reference was vivified at test.pl line 8.
The question: why is $foo autovivified when $foo->{bar} is supplied as an argument to a sub, even though no autovivification is in effect?

In a subroutine call, the arguments to a function are aliased in @_, so it must be possible to modify them. This provides an lvalue context, which triggers autovivification.
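A minimal core-Perl sketch of that aliasing, written without the autovivification module (the sub names do_nothing and set_first are illustrative only). Passing $foo->{bar} to a sub vivifies $foo into a hash reference. As perlsub documents, the element bar itself is only created if the sub modifies it or takes a reference to it.

```perl
use strict;
use warnings;

# Never touches its arguments.
sub do_nothing { return }

# Writes through the alias in @_.
sub set_first { $_[0] = 'changed' }

my $foo;                       # undefined
do_nothing( $foo->{bar} );     # argument is an lvalue, so $foo becomes a hash ref
print 'ref($foo) = ', ref($foo), "\n";
print 'exists $foo->{bar}: ', ( exists $foo->{bar} ? 'yes' : 'no' ), "\n";

my $h = {};
set_first( $h->{key} );        # assigning to $_[0] creates the element
print "\$h->{key} = $h->{key}\n";
```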
The descriptions of the features you enabled, as given in the autovivification documentation, are:
'fetch' -- "rvalue dereferencing expressions"
'exists' -- "dereferencing expressions that are parts of an exists"
'delete' -- "dereferencing expressions that are parts of a delete"
None of these deal with lvalues (neither does warn).
To stop autovivification in subroutine calls as well, you need to add 'store':
Turns off autovivification for lvalue dereferencing expressions, such as: [...]
The docs then go on to list examples, including subroutine calls.
When I add it to your code,
no autovivification qw(fetch exists delete warn store);
# ...
I get
Reference was vivified at noautoviv.pl line 8.
Reference was vivified at noautoviv.pl line 16.
Died at noautoviv.pl line 19.

Related

Why does Perl's strict mode allow you to dereference a variable with an undefined value in this foreach context but not in an assignment context?

This code:
#!/usr/bin/perl
use 5.18.0;
use strict;
# Part 1
my $undef = undef;
print "1 $undef\n";
foreach my $index (@$undef) {
print "unreachable with no crash\n";
}
print "2 $undef\n";
# Part 2
my $undef = undef;
my @array = @$undef;
print "unreachable with crash\n";
Outputs:
1
2 ARRAY(0x7faefa803ee8)
Can't use an undefined value as an ARRAY reference at /tmp/perlfile line 12.
Questions about Part 1:
Why does dereferencing $undef in the Part 1 change $undef to an arrayref to an empty array?
Are there other contexts (other than a foreach) where dereferencing $undef would change it in the same way? What is the terminology to describe the most generic such case?
Questions about Part 2:
Why does dereferencing $undef in the Part 2 fall afoul of strict?
Are there other contexts (other than assignment) where dereferencing $undef would fall afoul of strict? What is the terminology to describe the most generic such case?
1) for() in Perl puts its operand into "l-value context", so $undef is auto-vivified into existence as an array reference with zero elements (see this relatively similar question/answer regarding l-value context).
3) Because you're dereferencing an undefined value in r-value context, and that's illegal under strict refs (nothing gets auto-vivified in this context, so you're not magically creating a variable from nothing like you would be in an l-value operation).
As far as questions 2 and 4 go, there are several other contexts, too many to think of off the top of my head. For 2, map() comes to mind, or any other operation that treats the operand as an l-value.
When you dereference an undefined variable in lvalue context, Perl will auto-vivify the reference and that which it references.
For example,
@$ref = qw( a b c );
means
@{ $ref //= [] } = qw( a b c );
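A small runnable check of that equivalence, as a sketch: both spellings leave an array reference holding the three elements (the variable names $ref and $ref2 are just for illustration).

```perl
use strict;
use warnings;

my $ref;                            # undefined
@$ref = qw( a b c );                # lvalue dereference autovivifies an array ref

my $ref2;                           # undefined
@{ $ref2 //= [] } = qw( a b c );    # the explicit spelling of the same thing

print ref($ref),  " @$ref\n";
print ref($ref2), " @$ref2\n";
```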
When you dereference an undefined variable in rvalue context, Perl won't auto-vivify. Under strict refs, this is an error. Otherwise, the undefined value is stringified (with a warning) to the empty string, which is used as a symbolic reference.
For example,
no strict qw( refs ); my $ref; my @a = @$ref;
is equivalent to
no strict qw( refs ); my @a = @{""};
(Aside from the lack of warning for the latter.)
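A sketch contrasting the two rvalue cases: with strict refs the dereference dies, and without it the result is simply an empty list and $ref stays undefined. The uninitialized warning is silenced here only to keep the output clean.

```perl
use strict;
use warnings;

my $ref;    # undefined

# Under strict refs, dereferencing undef in rvalue context is a run-time error.
my $ok  = eval { my @a = @$ref; 1 };
my $err = $@;
print $ok ? "no error\n" : "error: $err";

# Without strict refs, no error is raised and nothing is vivified.
my $count;
{
    no strict qw( refs );
    no warnings qw( uninitialized );
    my @a = @$ref;
    $count = @a;    # number of elements
}
print "got $count elements; \$ref is ", ( defined $ref ? "defined" : "still undef" ), "\n";
```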
Lvalue context is provided to:
The left-hand-side argument of assignments. (This is the "L" in "lvalue".)
Arguments of sub and method calls (because of aliasing of elements of @_).
Foreach's list (because of aliasing of $_).
The operands of some named operators (e.g. map and grep, because of aliasing of $_).
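The list above can be checked with a short sketch: each lvalue context vivifies its undefined variable into an array reference, while a plain rvalue use does not (and dies under strict refs, so it is wrapped in eval). The sub noop and the variable names are hypothetical.

```perl
use strict;
use warnings;

sub noop { return }

my ( $for_ref, $arg_ref, $map_ref, $grep_ref, $rv_ref );

for my $item (@$for_ref) { }             # foreach aliases its loop variable
noop(@$arg_ref);                         # sub arguments are aliased in @_
my @mapped  = map  { $_ } @$map_ref;     # map aliases $_
my @grepped = grep { 1 } @$grep_ref;     # grep aliases $_

# Plain rvalue use: no vivification, and strict refs makes it fatal.
my $size = eval { scalar @$rv_ref };

my %after = (
    'foreach'  => $for_ref,
    'sub args' => $arg_ref,
    'map'      => $map_ref,
    'grep'     => $grep_ref,
    'rvalue'   => $rv_ref,
);
for my $context ( sort keys %after ) {
    my $value = $after{$context};
    print "$context: ", ( defined $value ? ref $value : 'undef' ), "\n";
}
```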

Can someone explain why Perl behaves this way (variable scoping)?

My test goes like this:
use strict;
use warnings;
func();
my $string = 'string';
func();
sub func {
print $string, "\n";
}
And the result is:
Use of uninitialized value $string in print at test.pl line 10.
string
Perl allows us to call a function before it has been defined. However when the function uses a variable declared only after the function call, the variable appears to be undefined. Is this behavior documented somewhere? Thank you!
The behaviour of my is documented in perlsub. It boils down to this: perl knows $string is in scope because the my tells it so.
The my operator declares the listed variables to be lexically confined to the enclosing block, conditional (if/unless/elsif/else), loop (for/foreach/while/until/continue), subroutine, eval, or do/require/use'd file.
It means $string is 'in scope' from the point at which it's first 'seen' until the closing bracket of the current 'block' (or, in your example, the end of the file).
However, in your example my also assigns a value.
The scoping happens at compile time, when perl checks where it's valid to use $string (thanks to strict). But perl can't know the value at that point, because the value may change during execution (and is non-trivial to analyze).
So if you do this it might be a little clearer what's going on:
#!/usr/bin/env perl
use strict;
use warnings;
my $string; #undefined
func();
$string = 'string';
func();
sub func {
print $string, "\n";
}
$string is in scope in both calls because the my took effect at compile time, before the subroutine was called, but it has no value beyond the default of undef at the first invocation.
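The same behaviour in a testable form, as a sketch: instead of printing, the hypothetical sub record collects what it sees, showing that $string is already declared but still undefined on the first call.

```perl
use strict;
use warnings;

my @seen;
record();                 # runs before the assignment below has executed
my $string = 'string';    # declared at compile time, assigned at run time
record();

# Named sub: compiled once, sees the same $string and @seen as the file scope.
sub record { push @seen, defined $string ? $string : 'undef' }

print "@seen\n";
```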
Note this contrasts with:
#!/usr/bin/env perl
use strict;
use warnings;
sub func {
print $string, "\n";
}
my $string; #undefined
func();
$string = 'string';
func();
This one fails to compile under strict, because $string isn't in scope at the point where the sub is declared.
First of all, I would consider this undefined behaviour, since it skips executing the my, just as my $x if $cond; does.
That said, the behaviour is currently consistent and predictable. And in this instance, it behaves exactly as expected if the optimization that warranted the undefined behaviour notice didn't exist.
At compile-time, my has the effect of declaring and allocating the variable[1]. Scalars are initialized to undef when created. Arrays and hashes are created empty.
my $string was encountered by the compiler, so the variable was created. But since you haven't executed the assignment yet, it still has its default value (undefined) during the first call to func.
This model allows variables to be captured by closures.
Example 1:
use feature qw( say );
{
my $x = "abc";
sub foo { $x } # Named subs capture at compile-time.
}
say foo(); # abc, even though $x fell out of scope before foo was called.
Example 2:
sub make_closure {
my ($x) = @_;
return sub { $x }; # Anon subs capture at run-time.
}
my $foo = make_closure("foo");
my $bar = make_closure("bar");
say $foo->(); # foo
say $bar->(); # bar
[1] The allocation is possibly deferred until the variable is actually used.

Why is a list of undef not a read-only or constant value in Perl?

Consider the following programs in Perl.
use strict;
use warnings;
my @foo = qw(a b c);
undef = shift @foo;
print scalar @foo;
This will die with an error message:
Modification of a read-only value attempted at ...
Using a constant will give a different error:
1 = shift @foo;
Can't modify constant item in scalar assignment at ...
Execution of ... aborted due to compilation errors.
The same if we do this:
(1) = shift @foo;
All of those make sense to me. But putting undef in a list will work.
(undef) = shift @foo;
Now it prints 2.
Of course this is common practice if you have a bunch of return values and only want specific ones, like here:
my (undef, undef ,$mode, undef ,$uid, $gid, undef ,$size) = stat($filename);
The 9th line of the code example in perldoc -f undef shows this, but there is no explanation.
My question is, how is this handled internally by Perl?
Internally, Perl has different operators for scalar assignment and list assignment, even though both of them are spelled = in the source code. And the list assignment operator has the special case for undef that you're asking about.
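A sketch of the list-assignment behaviour from the user's side: an undef on the left of a list assignment is a placeholder that consumes one right-hand value and discards it, and a list assignment in scalar context returns the number of right-hand-side elements, placeholders included. Variable names are illustrative.

```perl
use strict;
use warnings;

my @foo = qw(a b c);

# undef in a list on the left-hand side swallows one value.
(undef) = shift @foo;
print scalar(@foo), "\n";

# The same placeholder trick inside my (...), as in the stat() example.
my ( undef, $second, undef, $fourth ) = ( 10, 20, 30, 40 );
print "$second $fourth\n";

# List assignment in scalar context counts the right-hand-side elements.
my $count = ( my ( undef, $kept ) = ( 'x', 'y', 'z' ) );
print "$count $kept\n";
```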

Difference between a BLOCK and a function in terms of scoping in Perl

Guys, I'm a little bit confused. I was playing with scoping in Perl when I encountered this one:
#!/usr/bin/perl
use warnings;
use strict;
sub nested {
our $x = "nested!";
}
print $x; # Error "Variable "$x" is not imported at nested line 10."
print our $x; # Doesn't print "nested!"
print our($x); # Doesn't print "nested!"
But when i do this:
{
our $x = "nested";
}
print our($x); # Prints "nested"
print our $x; # Prints "nested"
print $x; # Prints "nested"
So guys, can you explain to me why some of these work and others don't?
To restate DVK's answer, our is just a handy aliasing tool. Every variable you use in these examples is actually named $main::x. Within any lexical scope you can use our to make an alias to that variable, with a shortened name, in that same scope; outside that scope only the alias goes away, and the variable itself is neither reset nor removed. This is unlike the my keyword, which makes a new variable bound to that lexical scope.
To explain why the block example works the way it does, let's look at the explanation of our from the "Modern Perl" book, chapter 5:
Our Scope
Within given scope, declare an alias to a package variable with the our builtin.
The fully-qualified name is available everywhere, but the lexical alias is visible only within its scope.
This explains why the first two prints of your second example work (our is re-declared in print's scope, which is the file scope). The third one would not work on its own, since the our inside the block only aliases $x to the package variable within the block's scope; it works here because the preceding print our($x) already created an alias that lasts until the end of the file. Please note that printing $main::x will work correctly: it's only the alias that is scoped to the block, not the package variable itself.
As far as with the function:
print our $x; and print our($x) "don't work", namely they correctly report that the value is uninitialized, because you never called the function that would initialize the variable. Observe the difference:
c:\>perl -e "use strict; use warnings; sub x { our $x = 1;} print our $x"
Use of uninitialized value $x in print at -e line 1.
c:\>perl -e "use strict; use warnings; sub x { our $x = 1;} x(); print our $x"
1
print $x; won't work for the same reason as with the block - our only scopes the alias to the block (i.e. in this case body of the sub) therefore you MUST either re-alias it in the main block's scope (as per print our $x example), OR use fully qualified package global outside the sub, in which case it will behave as expected:
c:\>perl -e "use strict; use warnings; sub x { our $x = 1;} print $main::x"
Use of uninitialized value $x in print at -e line 1.
c:\>perl -e "sub x { our $x = 1;} x(); print $main::x"
1
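A sketch tying the two answers together: the our alias inside the sub ends at the sub's closing brace, but the package variable $main::x persists once the sub has run, and a fresh our in another block can alias it again. The sub name set_x is hypothetical.

```perl
use strict;
use warnings;

# our aliases $x to $main::x, but only inside the sub body.
sub set_x { our $x = "nested!" }

my $before = defined $main::x ? $main::x : 'undef';   # sub not called yet
set_x();
my $after = $main::x;              # the package variable itself persists

{
    our $x;                        # a fresh alias, visible only in this block
    print "inside block: $x\n";
}
# print $x;  # would not compile here under strict: no alias in scope

print "before: $before, after: $after\n";
```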

Why can't I say print $somehash{$var}{fh} "foo"?

I have a line of code along the lines of:
print $somehash{$var}{fh} "foo";
The hash contains the filehandle a few levels down. The error is:
String found where operator expected at test.pl line 10, near "} "foo""
I can fix it by doing this:
my $fh = $somehash{$var}{fh};
print $fh "foo";
...but is there a one-liner?
See http://perldoc.perl.org/functions/print.html:
Note that if you're storing FILEHANDLEs in an array, or if you're using any other expression more complex than a scalar variable to retrieve it, you will have to use a block returning the filehandle value instead: ...
So, in your case, you would use a block like this:
print { $somehash{$var}{fh} } "foo";
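A self-contained sketch of the block form, writing to an in-memory filehandle so the result can be inspected (the key mylog is hypothetical). printf accepts the same block syntax.

```perl
use strict;
use warnings;

my $buffer = '';
open my $out, '>', \$buffer or die "Can't open in-memory handle: $!";

my %somehash = ( mylog => { fh => $out } );
my $var = 'mylog';

# print $somehash{$var}{fh} "foo";           # syntax error: String found where operator expected
print { $somehash{$var}{fh} } "foo\n";        # block returning the filehandle
printf { $somehash{$var}{fh} } "%s\n", "bar"; # printf takes the same block form

close $out;
print $buffer;
```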
If you have anything other than a simple scalar as your filehandle, you need to wrap the reference holding the filehandle in braces so Perl knows how to parse the statement:
print { $somehash{$var}{fh} } $foo;
Part of Perl Best Practices says to always wrap filehandles in braces just for this reason, although I don't get that nutty with it.
The syntax is odd because print is an indirect method on a filehandle object:
method_name Object @arguments;
You might have seen this in old-school CGI.pm. Here are two indirect method calls:
use CGI;
my $cgi_object = new CGI 'cat=Buster&bird=nightengale';
my $value = param $cgi_object 'bird';
print "Indirect value is $value\n";
That almost works fine (see Schwern's answer about the ambiguity) as long as the object is in a simple scalar. However, if I put the $cgi_object in a hash, I get the same syntax error you got with print. I can put the braces around the hash access to make it work out. Continuing with the previous code:
my %hash;
$hash{animals}{cgi} = $cgi_object;
# $value = param $hash{animals}{cgi} 'cat'; # syntax error
$value = param { $hash{animals}{cgi} } 'cat';
print "Braced value is $value\n";
That's all a bit clunky so just use the arrow notation for everything instead:
my $cgi_object = CGI->new( ... );
$cgi_object->param( ... );
$hash{animals}{cgi}->param( ... );
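You can do the same with filehandles, although you have to use the IO::Handle module to make it all work out (since Perl 5.14, IO::File is loaded on demand the first time a method is called on a filehandle, so the explicit use line mainly matters on older perls):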
use IO::Handle;
STDOUT->print( 'Hello World' );
open my( $fh ), ">", $filename or die ...;
$fh->print( ... );
$hash{animals}{fh} = $fh;
$hash{animals}{fh}->print( ... );
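A runnable version of the arrow-notation approach, as a sketch: an in-memory filehandle stands in for a real file, and IO::Handle's print and printf methods are called on the handle stored in the hash, with no braces needed.

```perl
use strict;
use warnings;
use IO::Handle;

my $buffer = '';
open my $fh, '>', \$buffer or die "Can't open in-memory handle: $!";

my %hash;
$hash{animals}{fh} = $fh;
$hash{animals}{fh}->print("Hello World\n");       # method call, no block needed
$hash{animals}{fh}->printf("%d animals\n", 2);
close $fh;

print $buffer;
```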
The above answers are all correct. The reason they don't allow a full expression in there is that print FH LIST is already pretty weird syntax. To put anything more complicated in there would introduce a ton of ambiguous syntax. The block removes that ambiguity.
To see where this madness leads to, consider the horror that is indirect object syntax.
foo $bar; # Is that foo($bar) or $bar->foo()? Good luck!