Perl: precedence (Leftward list operator) - perl

from Programming Perl pg 90, he says:
@ary = (1, 3, sort 4, 2);
print @ary;
"the commas on the right of the sort are evaluated before the sort but the commas on the left are evaluated after. ... list operators tend to gobble .. and then act like a simple term"
Does the assignment result in sort being processed or does that happen when @ary is expanded by print?
What does he mean by all that "comma" stuff?? My understanding is that in the assignment statement, comma has a lower priority than a list operator, therefore sort runs first and gobbles up its arguments (4 and 2). How the heck is comma being evaluated at all?? So that statement then becomes (1, 3, 2, 4), a list which is assigned. Comma is just acting as a list separator and not an operator!! In fact on pg 108 he says: do not confuse the scalar context use of comma with list context use.
What is a leftward and rightward list operator? Is print @ary a rightward list operator?? So it has very low priority?
print($foo, exit);
here, how is precedence evaluated? print is a list operator that looks like a function so it should run first! it has two arguments $foo and exit.. so why is exit not treated as a string??? After all priority-wise print(the list operator) has higher priority??
print $foo, exit;
here, you have print and , operators but the list operator has higher precedence.. so.. exit should be treated as a string - why not??
print ($foo & 255) + 1, "\n";
here, since it's a list operator, it prints $foo & 255. Shouldn't something similar happen with the exit examples above?

When in doubt about how Perl is parsing a construct, you can run the code through the B::Deparse module, which will generate Perl source code from the compiled internal representation. For your first example:
$ perl -MO=Deparse,-p -e '@ary = (1, 3, sort 4, 2); print @ary;'
(@ary = (1, 3, sort(4, 2)));
print(@ary);
-e syntax OK
So as you can see, sort takes the two arguments to its right.
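A quick way to check this parse without B::Deparse is to compare the bare form with the explicitly parenthesized one. This is only a minimal sketch; the variable names are made up for illustration.

```perl
use strict;
use warnings;

# sort is a list operator: it takes everything to its right (4, 2) as arguments.
my @bare   = (1, 3, sort 4, 2);
# The same expression with the parentheses that B::Deparse adds.
my @parens = (1, 3, sort(4, 2));

print "@bare\n";    # 1 3 2 4
print "@parens\n";  # 1 3 2 4
```

Both arrays end up identical, which matches the deparsed form shown above.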
As far as execution order goes, you can find that out with the B::Concise module (I've added the comments):
$ perl -MO=Concise,-exec -e '@ary = (1, 3, sort 4, 2); print @ary;'
1 <0> enter
2 <;> nextstate(main 1 -e:1) v:{
3 <0> pushmark s # start of list
4 <$> const[IV 1] s # 1 is added to list
5 <$> const[IV 3] s # 3 is added to list
6 <0> pushmark s # start of sort's argument list
7 <$> const[IV 4] s # 4 is added to sort's argument list
8 <$> const[IV 2] s # 2 is added to sort's argument list
9 <@> sort lK # sort is run, and returns its list into the outer list
a <0> pushmark s
b <#> gv[*ary] s
c <1> rv2av[t2] lKRM*/1
d <2> aassign[t3] vKS/COMMON # the list is assigned to the array
e <;> nextstate(main 1 -e:1) v:{
f <0> pushmark s # start of print's argument list
g <#> gv[*ary] s # the array is loaded into print's argument list
h <1> rv2av[t5] lK/1
i <@> print vK # print outputs its argument list
j <@> leave[1 ref] vKP/REFC
-e syntax OK
For your second example:
$ perl -MO=Deparse,-p -e 'print $foo, exit;'
print($foo, exit);
-e syntax OK
$ perl -MO=Concise,-exec -e 'print $foo, exit;'
1 <0> enter
2 <;> nextstate(main 1 -e:1) v:{
3 <0> pushmark s
4 <#> gvsv[*foo] s # add $foo to the argument list
5 <0> exit s # call `exit` and add its return value to the list
6 <@> print vK # print the list, but we never get here
7 <@> leave[1 ref] vKP/REFC
-e syntax OK
So as you can see, the exit builtin is run while trying to assemble the argument list for print. Since exit causes the program to quit, the print command never gets to run.
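The same ordering can be observed without ending the program. The sketch below uses two hypothetical stand-ins, show() for print and stop() for exit (it dies instead of exiting, so the result can be inspected); neither name comes from the question.

```perl
use strict;
use warnings;

my @log;

# Hypothetical stand-ins: show() plays the role of print, stop() the role of exit.
sub show { push @log, "show(@_)"; return 1 }
sub stop { push @log, "stop"; die "stopped\n" }

# The argument list is built first, so stop() runs before show() is ever entered.
eval { show("foo", stop()) };

print "log: @log\n";   # log: stop
print "error: $@";     # error: stopped
```

Only stop() appears in the log: the outer call never starts because assembling its arguments did not finish.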
And the last one:
$ perl -MO=Deparse,-p -e 'print ($foo & 255) + 1, "\n";'
((print(($foo & 255)) + 1), '???'); # '???' means this was optimized away
-e syntax OK
$ perl -MO=Concise,-exec -e 'print ($foo & 255) + 1, "\n";'
1 <0> enter
2 <;> nextstate(main 1 -e:1) v:{
3 <0> pushmark v
4 <0> pushmark s
5 <#> gvsv[*foo] s
6 <$> const[IV 255] s
7 <2> bit_and[t2] sK
8 <@> print sK
9 <$> const[IV 1] s
a <2> add[t3] vK/2
b <@> list vK
c <@> leave[1 ref] vKP/REFC
-e syntax OK

sort is evaluated when it's called; it doesn't really have anything to do with the assignment. sort returns a list, so what you're assigning is:
@ary = (1, 3, (2, 4));
Perl flattens the nested list, so you end up with 1, 3, 2, 4 as you would expect.
The comma you're referring to is not a separate element of the outer list: it separates the two arguments to sort. Perl sees three items in the outer list (1, 3, and the sort call), and the sort call expands to two elements when the list is built.
"Leftward" and "rightward" describe the same list operator seen from its two sides. Looking right, a list operator has very low precedence, so it gobbles up everything after it as its arguments (sort takes 4, 2). Looking left, the operator together with its arguments behaves like a single term of the highest precedence, which is why the commas to its left simply see one more list element.
print acts like any other function in Perl (or any other language I've ever used for that matter). If you call a function as an argument, the return value of that function is given as the argument. So your case of:
print ($foo, exit);
or equivalent (the parens don't matter)
print $foo, exit;
does nothing, because you're asking it to print the return value of exit. Your program exits first so you get nothing back. I don't understand why you'd expect exit to be treated as a string. exit is a function in all contexts unless you quoted it.
As for your last example, perlop gives exactly this case:
print ($foo & 255) + 1, "\n";
probably doesn't do what you expect at first glance. The parentheses
enclose the argument list for "print" which is evaluated (printing the
result of "$foo & 255"). Then one is added to the return value of
"print" (usually 1). The result is something like this:
1 + 1, "\n"; # Obviously not what you meant.
To do what you meant properly, you must write:
print(($foo & 255) + 1, "\n");
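The "looks like a function" rule applies to any named list operator, so it can be seen with an ordinary sub as well. In the sketch below, joined() is a hypothetical helper (not from perlop) that makes the arguments it actually received visible.

```perl
use strict;
use warnings;

# Hypothetical helper that shows which arguments it was given.
sub joined { return join '|', @_ }

my $foo = 0x1FF;    # $foo & 255 is 255

# Parsed as (joined($foo & 255) + 1, "x"): only the parenthesized part is passed.
my @surprise = (joined ($foo & 255) + 1, "x");

# Outer parentheses enclose the whole argument list.
my @intended = (joined(($foo & 255) + 1, "x"));

print scalar(@surprise), " elements: @surprise\n";   # 2 elements: 256 x
print scalar(@intended), " element: @intended\n";    # 1 element: 256|x
```

In the first form the sub only ever sees 255; the + 1 and the "x" are applied to its return value, just as with print.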

Not sure if what follows is perfectly accurate (it's a mishmash from IRC, the above mentioned answers, google and my interpretation of the book)
1. (operator)(operands): this is viewed as a leftward operator because it's to the left of the operands. (operands)(operator): this is viewed as a rightward operator because it's to the right of the operands. So, in (1, 2, 3, sort 4, 5, sort 6, 7) the second sort acts as both a leftward and a rightward operator!! sort 6, 7 is leftward, as in to the left of (6, 7), its operands. It's also to the right of sort(4, 5, so here it's rightward and of very low precedence.
2.
@ary = (1, 3, sort 4, 2);
print @ary;
here, sort is a leftward list operator so straight away its precedence is highest, as 'Cfreak' says.
print($foo, exit); print $foo, exit;
Here, print is a leftward list operator, so it has the highest precedence and should execute first, BUT to execute it has to resolve its arguments, including the bareword 'exit'. To resolve that, I guess it runs exit. Ergo, print $foo, ... gobbles up all its arguments, then has to process them, and at the bareword it runs exit.
print ($foo & 255): same as above. print gets the highest precedence, but it now needs to resolve its various arguments, so $foo & 255 etc. as 'Cfreak' explained.
Many thanks guys!

Related

Can you print the subroutine's argument name in Perl?

It's possible to print a variable's name by *var{NAME}, is it possible to print the argument's name in a subroutine?
Below is what I want to achieve
var_name($myVar); will print myVar
sub var_name{
print *_{NAME}; # Prints `_`, but want `myVar`
}
First, your attempt using print *_{NAME}; does work, but it prints the name associated with the typeglob of _ (the one for things like $_ and @_), which isn't what you want. If you pass a typeglob or a reference to a typeglob to the function, you can extract its name by dereferencing the argument:
#!/usr/bin/env perl
use warnings;
use strict;
use feature qw/say/;
# The prototype isn't strictly necessary but it makes it harder
# to pass a non-typeglob value.
sub var_name :prototype(\*) {
say *{$_[0]}{NAME}; # Note the typeglob deref
}
my $myVar = 1;
say *myVar{NAME}; # myVar
var_name *myVar; # myVar
You get _ because the NAME associated with *_ is _.
So what glob should you use? Well, the glob that contains the variable used as an argument (if any) isn't passed to the sub, so you're out of luck.
A glob-based solution would never work with my variables anyway since these aren't found in globs. This means the very concept of a glob-based varname couldn't possibly work in practice.
Getting the name of the variables would entail examining the opcode tree before the call site. I believe this is how operators achieve this in situations such as the following:
$ perl -we'my $y; my $x = 0 + $y;'
Use of uninitialized value $y in addition (+) at -e line 1.
$ perl -MO=Concise -we'my $y; my $x = 0 + $y;'
a <@> leave[1 ref] vKP/REFC ->(end)
1 <0> enter v ->2
2 <;> nextstate(main 1 -e:1) v:{ ->3
3 <0> padsv[$y:1,3] vM/LVINTRO ->4
4 <;> nextstate(main 2 -e:1) v:{ ->5
9 <2> sassign vKS/2 ->a
7 <2> add[t3] sK/2 ->8 <-- This is what's issuing
5 <$> const[IV 0] s ->6 the warning.
6 <0> padsv[$y:1,3] s ->7 <-- This is the source of
8 <0> padsv[$x:2,3] sRM*/LVINTRO ->9 the name in the warning.
-e syntax OK

Why is "keys ::" not a syntax error?

I tried the following one-liner more out of curiosity than anything and was surprised that it actually worked without the % sigil.
$ perl -E 'say for keys ::'
It works on both versions 5.8.8 and 5.16.3; though the latter version emits this warning:
Hash %:: missing the % in argument of keys() at -e line 1.
How does this even work? What is so special about %:: that allows it to run and print its keys, even without the sigil?
Note that the keys do not get printed with %main::.
$ perl -E 'say for keys main::'
Hash main:: missing the % in argument 1 of keys() at -e line 1.
TL;DR
:: isn't special; prior to Perl 5.22.0, you can omit the % and pass any identifier to keys.
However:
keys main:: is equivalent to keys %{'main'} or just keys %main
keys :: is equivalent to keys %{'::'} or just keys %::. Note that %main:: (but not %main) is an alias for %::.
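A short sketch of the sigil-carrying forms, which do not depend on the pre-5.22 bareword behaviour. The hash %h is a made-up example variable.

```perl
use strict;
use warnings;

our %h = (foo => 'bar');    # a package variable, so it lives in the main:: stash

# With the sigil, keys works the same on old and new Perls.
my @keys = keys %h;
print "@keys\n";                                        # foo

# %:: and %main:: are the same hash (the symbol table of package main).
print "same stash\n" if \%:: == \%main::;               # same stash
print "stash has an entry for h\n" if exists $::{h};    # stash has an entry for h
```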
The relevant code is in toke.c (the following is from 5.8.8):
/* Look for a subroutine with this name in current package,
unless name is "Foo::", in which case Foo is a bearword
(and a package name). */
if (len > 2 &&
PL_tokenbuf[len - 2] == ':' && PL_tokenbuf[len - 1] == ':')
{
if (ckWARN(WARN_BAREWORD) && ! gv_fetchpv(PL_tokenbuf, FALSE, SVt_PVHV))
Perl_warner(aTHX_ packWARN(WARN_BAREWORD),
"Bareword \"%s\" refers to nonexistent package",
PL_tokenbuf);
len -= 2;
PL_tokenbuf[len] = '\0';
gv = Nullgv;
gvp = 0;
}
else {
len = 0;
if (!gv)
gv = gv_fetchpv(PL_tokenbuf, FALSE, SVt_PVCV);
}
/* if we saw a global override before, get the right name */
if (gvp) {
sv = newSVpvn("CORE::GLOBAL::",14);
sv_catpv(sv,PL_tokenbuf);
}
else {
/* If len is 0, newSVpv does strlen(), which is correct.
If len is non-zero, then it will be the true length,
and so the scalar will be created correctly. */
sv = newSVpv(PL_tokenbuf,len);
}
len is the length of the current token.
If the token is main::, a new scalar is created with its PV (string component) set to main.
If the token is ::, a typeglob is fetched with gv_fetchpv.
gv_fetchpv lives in gv.c and has special logic for handling :::
if (*namend == ':')
namend++;
namend++;
name = namend;
if (!*name)
return gv ? gv : (GV*)*hv_fetch(PL_defstash, "main::", 6, TRUE);
This fetches the typeglob stored in the default stash under key main:: (i.e. typeglob *main::).
Finally, keys expects its argument to be a hash, but if you pass it an identifier, it treats it as the name of a hash. See Perl_ck_fun in op.c:
case OA_HVREF:
if (kid->op_type == OP_CONST &&
(kid->op_private & OPpCONST_BARE))
{
char *name = SvPVx(((SVOP*)kid)->op_sv, n_a);
OP * const newop = newHVREF(newGVOP(OP_GV, 0,
gv_fetchpv(name, TRUE, SVt_PVHV) ));
if (ckWARN2(WARN_DEPRECATED, WARN_SYNTAX))
Perl_warner(aTHX_ packWARN2(WARN_DEPRECATED, WARN_SYNTAX),
"Hash %%%s missing the %% in argument %"IVdf" of %s()",
name, (IV)numargs, PL_op_desc[type]);
op_free(kid);
kid = newop;
kid->op_sibling = sibl;
*tokid = kid;
}
else if (kid->op_type != OP_RV2HV && kid->op_type != OP_PADHV)
bad_type(numargs, "hash", PL_op_desc[type], kid);
mod(kid, type);
break;
This works for things other than ::, too:
$ perl -e'%h = (foo => "bar"); print for keys h'
foo
(As of 5.22.0, you're no longer allowed to omit the % sigil.)
You can also see this with B::Concise:
$ perl -MO=Concise -e'keys main::'
Hash %main missing the % in argument 1 of keys() at -e line 1.
6 <@> leave[1 ref] vKP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 1 -e:1) v:{ ->3
5 <1> keys[t2] vK/1 ->6
4 <1> rv2hv[t1] lKRM/1 ->5
3 <$> gv(*main) s ->4
-e syntax OK
$ perl -MO=Concise -e'keys ::'
Hash %:: missing the % in argument 1 of keys() at -e line 1.
6 <@> leave[1 ref] vKP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 1 -e:1) v:{ ->3
5 <1> keys[t2] vK/1 ->6
4 <1> rv2hv[t1] lKRM/1 ->5
3 <$> gv(*main::) s ->4
-e syntax OK
Using:
perl -MO=Deparse -E 'say for keys ::'
Says:
use feature 'current_sub', 'evalbytes', 'fc', 'say', 'state', 'switch', 'unicode_strings', 'unicode_eval';
say $_ foreach (keys %main::);
So in these Perl versions, without strict, :: is treated as %::.

Perl's caller() function returning incorrect line number

I've got the following script running on Perl 5.10.1:
#!/usr/bin/perl
use strict;
use warnings;
foreach( my $x =0 ; $x < 1; $x++) { # Line 5
print_line(); # Line 6
}
sub print_line {
print "Function call from line: " . [caller(0)]->[2] . "\n";
}
Despite the call to the subroutine coming from line 6, the script outputs the line number of the start of the C-style for statement:
Function call from line: 5
What's really weird is that if I throw a random statement into one of the blank lines in the C-style for loop, caller returns the correct line number:
#!/usr/bin/perl
use strict;
use warnings;
foreach( my $x =0 ; $x < 1; $x++) {
my $x = 3;
print_line(); # Line 7
}
sub print_line {
print "Function call from line: " . [caller(0)]->[2] . "\n";
}
The above script correctly outputs:
Function call from line: 7
Is this some kind of bug or is there something I can do to get caller to accurately report the line number?
I think potentially it is a bug, because the same behavior doesn't occur if you replace
foreach (my $x = 0 ; $x < 1 ; $x++) {
with
foreach my $x (0 .. 0) {
I don't understand exactly what's happening, but by comparing the optrees of the two different versions, I think that a nextstate op is getting improperly optimized out. My version has
<;> nextstate(main 4 lineno.pl:11) v:*,&,x*,x&,x$,$ ->8
as the left sibling of the entersub op that calls print_line, while yours has
<0> ex-nextstate v ->8
which has been taken out of the flow of execution.
It wouldn't hurt to write this up as a perlbug.
$ perl -MO=Concise a.pl
j <@> leave[1 ref] vKP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 6 a.pl:5) v:*,&,{,x*,x&,x$,$ ->3
5 <2> sassign vKS/2 ->6
3 <$> const[IV 0] s ->4
4 <0> padsv[$x:3,5] sRM*/LVINTRO ->5
6 <0> unstack v* ->7
i <2> leaveloop vK/2 ->j
7 <{> enterloop(next->b last->i redo->8) v ->e
- <1> null vK/1 ->i
h <|> and(other->8) vK/1 ->i
g <2> lt sK/2 ->h
e <0> padsv[$x:3,5] s ->f
f <$> const[IV 1] s ->g
- <@> lineseq vK ->-
- <@> scope vK ->b <---
- <0> ex-nextstate v ->8 <---
a <1> entersub[t5] vKS/TARG,2 ->b
- <1> ex-list K ->a
8 <0> pushmark s ->9
- <1> ex-rv2cv sK/2 ->-
9 <#> gv[*print_line] s/EARLYCV ->a
c <1> preinc[t2] vK/1 ->d
b <0> padsv[$x:3,5] sRM ->c
d <0> unstack v ->e
a.pl syntax OK
There's some optimization going on. The scope was deemed unnecessary and optimized away. (Notice the "-" meaning it's never reached.)
But at the same time, that removed the nextstate op, which is what sets the line number for warnings and for caller.
So, it's a bug that results from an improper optimization.
I suspect this may be down to statement separators (semicolons). As you may have spotted, with the code you're running, the line number reported by caller is the line of the foreach loop.
So I think what is happening is down to there being no semicolon before the call.
If you were to do a multi-line sub call, caller would report the first line:
print "first call:", __LINE__, "\n";
print "Start of statement\n",
"a bit more on line ", __LINE__, "\n",
print_line(
1,
2,
3,
5,
);
You get the line number of the start of the call, not the end. So I think that's what you're getting - the statement starts when the semicolon statement separator occurs - which is the foreach line in the first example.
So as a workaround - I might suggest making use of __LINE__. Although I'd also perhaps suggest not worrying about it too much, because it's still going to point you to the right place in the code.
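A minimal sketch of the __LINE__ workaround: the call site passes its own line number, so the callee does not have to rely on caller. The sub name where() and its return shape are assumptions made for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the call site's __LINE__ explicitly and also
# returns what caller() reports, so the two can be compared.
sub where {
    my ($passed_line) = @_;
    my $caller_line = (caller(0))[2];
    return ($passed_line, $caller_line);
}

my ($passed, $reported) = where(__LINE__);
print "passed: $passed, caller: $reported\n";
```

For a plain one-line statement like this the two numbers agree; inside the C-style loop from the question, the value passed via __LINE__ would still be the line of the call even when caller reports the loop's line.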
You get something similar if you use croak, for presumably the same reason.
As has been pointed out, this is really a bug in Perl going back at least to 5.10, or 11 years, and in reality I think longer.
It has been reported as Perl bug perl #133239, and although it is alleged that it is not that hard to fix, it hasn't been. It may also not be that easy to fix: it has performance ramifications, since adding COPs slows things down, and possibly some administrative work would be needed to adjust tests.
And even if this bug were fixed, it would only be fixed in Perl 5.29 and later, or so. This isn't going to help you with 5.10.
So here is another tack that doesn't rely on a change to Perl's core, and therefore puts users more in control. However, I'll say up front it is a bit experimental, and unless people are willing to spend coding effort on it, it's not likely to go back as far as 5.10. Right now the earliest Perl version I have working is 5.14, released 7 years ago as of this writing.
Using B::DeparseTree you can write a different, and I think better caller() which can show you the location of the caller with more detail. Here is your program modified to do that:
#!/usr/bin/perl
use strict;
use warnings;
use B::DeparseTree::Fragment;
use Devel::Callsite;
sub dt_caller
{
my $level = @_ ? $_[0] : 0; # optional first argument: how many frames to go up
# Pick up the right caller's OP address.
my $addr = callsite($level+1);
# Hack alert 'main::main' should be replaced with the function name if not the top level. caller() is a little off-sync here.
my $op_info = deparse_offset('main::main', $addr);
# When Perl is in the middle of call, it has already advanced the PC,
# so we need to go back to the preceding op.
$op_info = get_prev_addr_info($op_info);
my $extract_texts = extract_node_info($op_info);
print join("\n", @$extract_texts), "\n";
}
foreach( my $x =0 ; $x < 1; $x++) {
print_line();
}
sub print_line {
dt_caller();
}
When run it prints:
$ perl bug-caller.pl
print_line()
------------
dt_caller() could and should be wrapped up into a package like Carp so you don't see all of that ugliness. However I'll leave that for someone else. And I'll mention that just to get this working, there were some bug fixes I had to make, so this works only starting with version 3.4.0 of B::DeparseTree.

Trouble with shift and dereference operator

I have a question regarding how the left and right sides of the -> operator are evaluated. Consider the following code:
#! /usr/bin/perl
use strict;
use warnings;
use feature ':5.10';
$, = ': ';
$" = ', ';
my $sub = sub { "@_" };
sub u { shift->(@_) }
sub v { my $s = shift; $s->(@_) }
say 'u', u($sub, 'foo', 'bar');
say 'v', v($sub, 'foo', 'bar');
Output:
u: CODE(0x324718), foo, bar
v: foo, bar
I expect u and v to behave identically but they don't. I always assumed perl evaluated things left to right in these situations. Code like shift->another_method(@_) and even shift->another_method(shift, 'stuff', @_) is pretty common.
Why does this break if the first argument happens to be a code reference? Am I on undefined / undocumented territory here?
The operand evaluation order of ->() is undocumented. It happens to evaluate the arguments before the LHS (lines 3-4 and 5 respectively below).
>perl -MO=Concise,u,-exec a.pl
main::u:
1 <;> nextstate(main 51 a.pl:11) v:%,*,&,x*,x&,x$,$,469762048
2 <0> pushmark s
3 <#> gv[*_] s
4 <1> rv2av[t2] lKM/3
5 <0> shift s*
6 <1> entersub[t3] KS/TARG,2
7 <1> leavesub[1 ref] K/REFC,1
a.pl syntax OK
Both using and modifying a variable in the same expression can be dangerous. It's best to avoid it unless you can explain the following:
>perl -E"$i=5; say $i,++$i,$i"
666
You could use
$_[0]->(@_[1..$#_])
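Both safe forms side by side, as a minimal sketch. The sub names call_shifted and call_sliced are made up for this illustration; neither form depends on the order in which ->() evaluates its operands.

```perl
use strict;
use warnings;

my $sub = sub { "@_" };

# Take the code reference off @_ in its own statement, then call it.
sub call_shifted { my $code = shift; return $code->(@_) }

# Or leave @_ untouched and pass a slice of the remaining arguments.
sub call_sliced { return $_[0]->(@_[1 .. $#_]) }

print call_shifted($sub, 'foo', 'bar'), "\n";   # foo bar
print call_sliced($sub, 'foo', 'bar'), "\n";    # foo bar
```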

Explanation for 'uninitialized value' warning

Why does perl -we '$c = $c+3' raise
Use of uninitialized value $c in addition (+) at -e line 1.
while perl -we '$c += 3' doesn't complain about an uninitialized value?
UPDATE
Does documentation or some book like 'Perl best practices' mention such behavior?
I think perldoc perlop has a little explanation:
Assignment Operators
"=" is the ordinary assignment operator.
Assignment operators work as in C. That is,
$a += 2;
is equivalent to
$a = $a + 2;
although without duplicating any side effects that dereferencing the
lvalue might trigger, such as from tie()
With the help of B::Concise, we can see the trick:
$ perl -MO=Concise,-exec -e '$c += 3'
1 <0> enter
2 <;> nextstate(main 1 -e:1) v:{
3 <#> gvsv[*c] s
4 <$> const[IV 3] s
5 <2> add[t2] vKS/2
6 <@> leave[1 ref] vKP/REFC
-e syntax OK
$ perl -MO=Concise,-exec -e '$c = $c + 3'
1 <0> enter
2 <;> nextstate(main 1 -e:1) v:{
3 <#> gvsv[*c] s
4 <$> const[IV 3] s
5 <2> add[t3] sK/2
6 <#> gvsv[*c] s
7 <2> sassign vKS/2
8 <@> leave[1 ref] vKP/REFC
-e syntax OK
Update
After searching in perldoc, I saw that this problem had been documented in perlsyn:
Declarations
The only things you need to declare in Perl are report formats and subroutines (and sometimes not even subroutines). A variable
holds the undefined value ("undef") until it has been assigned a defined value, which is anything other than "undef". When used as a
number, "undef" is treated as 0; when used as a string, it is treated as the empty string, ""; and when used as a reference that
isn't being assigned to, it is treated as an error. If you enable warnings, you'll be notified of an uninitialized value whenever
you treat "undef" as a string or a number. Well, usually. Boolean contexts, such as:
my $a;
if ($a) {}
are exempt from warnings (because they care about truth rather than definedness). Operators such as "++", "--", "+=", "-=", and
".=", that operate on undefined left values such as:
my $a;
$a++;
are also always exempt from such warnings.
Because it makes sense for addition to warn when adding things other than numbers, but it's very convenient for += not to warn for undefined values.
As Gnouc found, this is documented in perlsyn:
Operators such as ++ , -- , += , -= , and .= , that operate on undefined variables such as:
undef $a;
$a++;
are also always exempt from such warnings.
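The difference can be observed directly by collecting warnings in a handler. This is a small sketch; the variable names and the @warnings collector are illustrative only.

```perl
use strict;
use warnings;

# Collect warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $c;
$c += 3;          # exempt: no warning, $c is now 3

my $d;
$d = $d + 3;      # plain addition on undef: warns, $d is now 3

print "c=$c d=$d\n";                              # c=3 d=3
print "warnings: ", scalar(@warnings), "\n";      # warnings: 1
print $warnings[0];
```

The single collected warning names $d and the addition operator; the += on $c produces none.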