Why no interpolation within `map` BLOCK? - perl

This throws an error in Perl v5.20:
use strict;
use warnings;
my @a = (2,3,9);
my %b = map { "number $_" => 2*$_ } @a;
Error:
syntax error at a.pl line 4, near "} @a"
Execution of a.pl aborted due to compilation errors.
This doesn't:
use strict;
use warnings;
my @a = (2,3,9);
my %b = map { "number ".$_ => 2*$_ } @a;
Why is interpolation of $_ disallowed within the map BLOCK?

map has two syntaxes:
map BLOCK LIST
map EXPR, LIST
Perl must determine which syntax you are using. The problem is that both BLOCK and EXPR can start with { because { ... } can be the hash constructor (e.g. my $h = { a => 1, b => 2 };).
That means that Perl's grammar is ambiguous. When an ambiguity is encountered, perl guesses what you mean after looking ahead a little. In your situation, it guessed wrong. It guessed { was the start of a hash constructor instead of the start of a block. You will need to disambiguate explicitly.
The following are convenient ways to disambiguate blocks and hash constructors:
+{ ... } # Not a valid block, so must be a hash constructor.
{; ... } # Perl looks ahead, and sees that this must be a block.
So in your case, you could use
my %b = map {; "number $_" => 2*$_ } @a;
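As a minimal sketch of the fix above, the following script builds the same hash two ways: with the leading semicolon, and with the key/value pair wrapped in parentheses (a workaround listed in the map documentation). The variable names `%with_semicolon` and `%with_parens` are only illustrative.

```perl
use strict;
use warnings;

my @a = (2, 3, 9);

# The leading semicolon makes the braces unambiguously a BLOCK.
my %with_semicolon = map {; "number $_" => 2 * $_ } @a;

# Parentheses around the pair also steer the guess toward BLOCK.
my %with_parens = map { ("number $_" => 2 * $_) } @a;

# Print the pairs in sorted key order.
print join(", ", map { "$_ => $with_semicolon{$_}" } sort keys %with_semicolon), "\n";
```

Both hashes should end up with the same three keys, so which spelling to use is mostly a matter of taste.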
Related: Difference between returning +{} or {} in perl from a function, and return ref or value

Related

Can I make a variable optional in a perl sub prototype?

I'd like to understand if it's possible to have a sub prototype and optional parameters in it. With prototypes I can do this:
sub some_sub (\@\@\@) {
    ...
}
my @foo = qw/a b c/;
my @bar = qw/1 2 3/;
my @baz = qw/X Y Z/;
some_sub(@foo, @bar, @baz);
which is nice and readable, but the minute I try to do
some_sub(@foo, @bar);
or even
some_sub(@foo, @bar, ());
I get errors:
Not enough arguments for main::some_sub at tablify.pl line 72, near "@bar)"
or
Type of arg 3 to main::some_sub must be array (not stub) at tablify.pl line 72, near "))"
Is it possible to have a prototype and a variable number of arguments? Or is something similar achievable via signatures?
I know it could be done by always passing arrayrefs, but I was wondering if there was another way. After all, TMTOWTDI.
All arguments after a semi-colon are optional:
sub some_sub(\@\@;\@) {
}
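Here is a small sketch of that prototype in use. The sub `count_all` is a hypothetical helper made up for this example; it just reports how many elements each array holds, and it can be called with two or three arrays.

```perl
use strict;
use warnings;

# Hypothetical helper: two required arrays, a third one optional.
# The prototype must be visible before the calls below are compiled.
sub count_all (\@\@;\@) {
    my @refs = @_;    # each argument arrives as an array reference
    return map { scalar(@$_) } @refs;
}

my @foo = qw/a b c/;
my @bar = qw/1 2/;
my @baz = qw/X/;

my @two   = count_all(@foo, @bar);          # third argument omitted
my @three = count_all(@foo, @bar, @baz);

print "@two\n";      # 3 2
print "@three\n";    # 3 2 1
```

Note that the sub has to be declared before the calls for the prototype to take effect.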
Most people are going to expect your argument list to flatten, and you are reaching for an outdated tool to do what people don't expect.
Instead, pass data structures by reference:
some_sub( \@array1, \@array2 );
sub some_sub {
    my @args = @_;
    say "Array 1 has " . $args[0]->@* . " elements";
}
If you want to use those as named arrays within the sub, you can use ref aliasing:
use v5.22;
use experimental qw(ref_aliasing);
sub some_sub {
    \my( @array1 ) = $_[0];
    ...
}
With v5.26, you can move the reference operator inside the parens:
use v5.26;
use experimental qw(declared_refs);
sub some_sub {
    my( \@array1 ) = $_[0];
    ...
}
And, remember that v5.20 introduced the :prototype attribute so you can distinguish between prototypes and signatures:
use v5.20;
sub some_sub :prototype(@@;@) { ... }
I write about these things at The Effective Perler (which you already read, I see), in Perl New Features, and a little bit in Preparing for Perl 7 (which is mostly about what you need to stop doing in Perl 5 to be future proof).

Can someone explain why Perl behaves this way (variable scoping)?

My test goes like this:
use strict;
use warnings;
func();
my $string = 'string';
func();
sub func {
print $string, "\n";
}
And the result is:
Use of uninitialized value $string in print at test.pl line 10.
string
Perl allows us to call a function before it has been defined. However when the function uses a variable declared only after the function call, the variable appears to be undefined. Is this behavior documented somewhere? Thank you!
The behaviour of my is documented in perlsub - it boils down to this - perl knows $string is in scope - because the my tells it so.
The my operator declares the listed variables to be lexically confined to the enclosing block, conditional (if/unless/elsif/else), loop (for/foreach/while/until/continue), subroutine, eval, or do/require/use'd file.
It means it's 'in scope' from the point at which it's first 'seen' until the closing bracket of the current 'block'. (Or in your example - the end of the code)
However - in your example my also assigns a value.
This scoping process happens at compile time - where perl checks whether it's valid to use $string or not. (Thanks to strict). However - it can't know what the value was, because that might change during code execution. (and is non-trivial to analyze)
So if you do this it might be a little clearer what's going on:
#!/usr/bin/env perl
use strict;
use warnings;
my $string; #undefined
func();
$string = 'string';
func();
sub func {
print $string, "\n";
}
$string is in scope in both cases - because the my happened at compile time - before the subroutine has been called - but it doesn't have a value set beyond the default of undef prior to the first invocation.
Note this contrasts with:
#!/usr/bin/env perl
use strict;
use warnings;
sub func {
print $string, "\n";
}
my $string; #undefined
func();
$string = 'string';
func();
Which errors because when the sub is declared, $string isn't in scope.
First of all, I would consider this undefined behaviour since it skips executing my like my $x if $cond; does.
That said, the behaviour is currently consistent and predictable. And in this instance, it behaves exactly as expected if the optimization that warranted the undefined behaviour notice didn't exist.
At compile-time, my has the effect of declaring and allocating the variable[1]. Scalars are initialized to undef when created. Arrays and hashes are created empty.
my $string was encountered by the compiler, so the variable was created. But since you haven't executed the assignment yet, it still has its default value (undefined) during the first call to func.
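A small sketch of the same point, without triggering the warning: the hypothetical sub `record` stores what it sees in `@seen` (both names are made up for this example), substituting a placeholder when the variable is still undef.

```perl
use strict;
use warnings;

my @seen;

record();                  # $string already exists, but holds undef
my $string = 'string';     # the assignment only happens here, at run time
record();

# Compiled after both declarations, so it can see @seen and $string.
sub record {
    push @seen, $string // '(undef)';
}

print "@seen\n";    # (undef) string
```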
This model allows variables to be captured by closures.
Example 1:
{
my $x = "abc";
sub foo { $x } # Named subs capture at compile-time.
}
say foo(); # abc, even though $x fell out of scope before foo was called.
Example 2:
sub make_closure {
my ($x) = @_;
return sub { $x }; # Anon subs capture at run-time.
}
my $foo = make_closure("foo");
my $bar = make_closure("bar");
say $foo->(); # foo
say $bar->(); # bar
[1] The allocation is possibly deferred until the variable is actually used.

Perl: how do I return undef into a list?

I have a little problem in Perl. Basically, I do something like this:
sub myFunction
{
return (&myOtherFnA(),&myOtherFnB(),&myOtherFnC());
}
sub myOtherFnA() {return 'A';}
sub myOtherFnB() {return undef;}
sub myOtherFnC() {return 'C';}
My problem is: when myOtherFnB() returns undef, I want a list that has undef as its 2nd element. But when myOtherFnB() does so, I just get a list with 2 elements, that of myOtherFnA() and that of myOtherFnC(). I get:
('A','C')
but I want to get:
('A', undef, 'C')
What syntax do I need to use to stop Perl from removing the return of myOtherFnB() from the list if it is undef and actually just put an element of undef into the list?
I don't know what makes you think you're not getting undef in the list. However, there are a number of problems with your code
Don't use an ampersand & when defining subroutines — it is a syntax error
Don't use an ampersand when calling subroutines. That hasn't been necessary since Perl 4 over twenty years ago
Don't use prototypes (the parentheses after the subroutine name in the definition) as they don't do what you think, and they're meant for something quite specialised
Don't use upper case letters in local identifiers: they are reserved for global identifiers like package names
This rewrite of your code fixes the syntax errors and corrects the above problems. As you see, the second element of the returned list is undef
use strict;
use warnings;
sub my_function {
return (
my_other_function_a(),
my_other_function_b(),
my_other_function_c()
);
}
sub my_other_function_a {
return 'A';
}
sub my_other_function_b {
return undef;
}
sub my_other_function_c {
return 'C';
}
use Data::Dump;
dd [ my_function ];
output
["A", undef, "C"]
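For what it's worth, one way a two-element list could come about is a bare `return;` instead of `return undef;`. This is only a guess about the original code, not something the question shows. The sketch below uses two hypothetical subs to contrast the two forms in list context.

```perl
use strict;
use warnings;

# Hypothetical subs for comparison.
sub returns_undef   { return undef }    # one-element list: (undef)
sub returns_nothing { return }          # empty list in list context

my @explicit = ('A', returns_undef(),   'C');
my @bare     = ('A', returns_nothing(), 'C');

print scalar(@explicit), "\n";    # 3
print scalar(@bare), "\n";        # 2
```

An empty list simply vanishes when it is flattened into the surrounding list, which is why the second array has only two elements.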

Why does Perl function "map" give the error "Not enough arguments for map"

Here is the thing I don't understand.
This script works correctly (notice the concatenation in the map function):
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my %aa = map { 'a' . '' => 1 } (1..3);
print Dumper \%aa;
__END__
output:
$VAR1 = {
'a' => 1
};
But without concatenation the map does not work. Here is the script I expect to work, but it does not:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my %aa = map { 'a' => 1 } (1..3);
print Dumper \%aa;
__END__
output:
Not enough arguments for map at e.pl line 7, near "} ("
syntax error at e.pl line 7, near "} ("
Global symbol "%aa" requires explicit package name at e.pl line 9.
Execution of e.pl aborted due to compilation errors.
Can you please explain such behaviour?
Perl uses heuristics to decide whether you're using:
map { STATEMENTS } LIST; # or
map EXPR, LIST;
Because although "{" is often the start of a block, it might also be the start of a hashref.
These heuristics don't look ahead very far in the token stream (IIRC two tokens).
You can force "{" to be interpreted as a block using:
map {; STATEMENTS } LIST; # the semicolon acts as a disambiguator
You can force "{" to be interpreted as a hash using:
map +{ LIST }, LIST; # the plus sign acts as a disambiguator
grep suffers similarly. (Technically so does do, in that a hashref can be given as an argument, which will then be stringified and treated as if it were a filename. That's just weird though.)
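To make the difference between the two forced forms concrete, here is a short sketch. The names `@names`, `%flags` and `@records` are illustrative only.

```perl
use strict;
use warnings;

my @names = qw(a b);

# Forced BLOCK: map returns a flat key/value list, collected into one hash.
my %flags = map {; $_ => 1 } @names;

# Forced hash constructor: map returns one hash reference per element.
my @records = map +{ name => $_ }, @names;

print join(",", sort keys %flags), "\n";    # a,b
print $records[1]{name}, "\n";              # b
```

So the two spellings are not interchangeable: one gives you a single hash, the other a list of hashrefs.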
Per the Documentation for map:
Because Perl doesn't look ahead for the closing } it has to take a guess at which it's dealing with based on what it finds just after the {. Usually it gets it right, but if it doesn't it won't realize something is wrong until it gets to the }
Giving the examples:
%hash = map { "\L$_" => 1 } @array # perl guesses EXPR. wrong
%hash = map { +"\L$_" => 1 } @array # perl guesses BLOCK. right
So adding + will give you the same as the first example you've given
my %aa = map { +'a'=> 1 } (1..3);
Perl's manpage entry for map() explains this:
"{" starts both hash references and blocks, so "map { ..."
could be either the start of map BLOCK LIST or map EXPR, LIST.
Because Perl doesn't look ahead for the closing "}" it has to
take a guess at which it's dealing with based on what it finds
just after the "{". Usually it gets it right, but if it doesn't
it won't realize something is wrong until it gets to the "}"
and encounters the missing (or unexpected) comma. The syntax
error will be reported close to the "}", but you'll need to
change something near the "{" such as using a unary "+" to give
Perl some help:
%hash = map { "\L$_" => 1 } @array # perl guesses EXPR. wrong
%hash = map { +"\L$_" => 1 } @array # perl guesses BLOCK. right
%hash = map { ("\L$_" => 1) } @array # this also works
%hash = map { lc($_) => 1 } @array # as does this.
%hash = map +( lc($_) => 1 ), @array # this is EXPR and works!
%hash = map ( lc($_), 1 ), @array # evaluates to (1, @array)
or to force an anon hash constructor use "+{":
@hashes = map +{ lc($_) => 1 }, @array # EXPR, so needs comma at end
to get a list of anonymous hashes each with only one entry
apiece.
Based on this, to get rid of the concatenation kludge, you'd need to adjust your syntax to one of these instead:
my %aa = map { +'a' => 1 } (1..3);
my %aa = map { ('a' => 1) } (1..3);
my %aa = map +( 'a' => 1 ), (1..3);
The braces are a little ambiguous in the context of map. They can be surrounding a block as you are intending, or they can be an anonymous hash constructor. There is some fuzzy logic in the perl parser which tries to guess which one you mean.
Your second case looks more like an anonymous hash to perl.
See the perldoc for map which explains this and gives some workarounds.

Why does this map block contain an apparently useless +?

While browsing the source code I saw the following lines:
my @files_to_keep = qw (file1 file2);
my %keep = map { + $_ => 1 } @files_to_keep;
What does the + do in this code snippet? I used Data::Dumper to see whether taking out the plus sign does anything, but the results were the same:
$ perl cleanme.pl
$VAR1 = {
'file1' => 1,
'file2' => 1
};
This is used to prevent a parsing problem. The plus symbol forces the parser to treat the braces as a normal block and not as an expression.
The fear is that perhaps you are trying to create a hashreference using the other (expression) formulation of map like so.
@array_of_hashrefs = map { "\L$_" => 1 }, @array
Notice the comma. If the parser guesses that you are doing this, given the statement in the OP, there will be a syntax error for the missing comma! To see the difference, try quoting "$_". For whatever reason, the parser takes this as enough to trigger the expression behavior.
Yes, it's an oddity. Therefore many extra-paranoid Perl programmers toss in the extra plus sign more often than needed (me included).
Here are the examples from the map documentation.
%hash = map { "\L$_" => 1 } @array # perl guesses EXPR. wrong
%hash = map { +"\L$_" => 1 } @array # perl guesses BLOCK. right
%hash = map { ("\L$_" => 1) } @array # this also works
%hash = map { lc($_) => 1 } @array # as does this.
%hash = map +( lc($_) => 1 ), @array # this is EXPR and works!
%hash = map ( lc($_), 1 ), @array # evaluates to (1, @array)
For a fun read (stylistically) and a case where the parser gets it wrong read this: http://blogs.perl.org/users/tom_wyant/2012/01/the-case-of-the-overloaded-curlys.html
The unary-plus operator simply returns its operand unchanged. Adding one doesn't even change the context.
In the example you gave, it is completely useless. But there are situations where it is useful to make the next token something that's undeniably an operator.
For example, map has two syntaxes.
map EXPR, LIST
and
map BLOCK LIST
A block starts with {, but so can an expression. For example, { } can be a block or a hash constructor.
So how can map tell the difference? It guesses. Which means it's sometimes wrong.
One occasion where it guesses wrong is the following:
map { $_ => 1 }, @list
You can prod it into guessing correctly using + or ;.
map {; ... # BLOCK
map +{ ... # EXPR
So in this case, you could use
map +{ foo => $_ }, @list
Note that you could also use the following:
map({ foo => $_ }, @list)
Another example is when you omit the parens around arguments, and the first argument expression starts with a paren.
print ($x+$y)*2; # Same as: 2 * print($x+$y)
It can be fixed using
print +($x+$y)*2;
But why pile on a hack just to avoid parens? I prefer
print(($x+$y)*2);
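The same parsing rule applies to any list operator called without parens, not just print. A sketch with a hypothetical user-defined sub `square` (declared first so it can be called without parentheses) makes the three spellings easy to compare, because the results differ:

```perl
use strict;
use warnings;

# Hypothetical helper, declared before use.
sub square { return $_[0] ** 2 }

my ($x, $y) = (1, 2);

my $as_call   = square ($x + $y) * 2;     # parsed as square($x + $y) * 2
my $with_plus = square +($x + $y) * 2;    # parsed as square(($x + $y) * 2)
my $explicit  = square(($x + $y) * 2);    # no guessing needed

print "$as_call $with_plus $explicit\n";    # 18 36 36
```

The explicit parens in the last form say what is meant without relying on the reader knowing the unary-plus trick.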