What is the meaning of the double 'at' (@@) in Perl? - perl

I am reviewing a proposed vendor-supplied patch to a Perl tool we use and I'm struggling to identify the reason for a particular type of change - the prepending of an '@' to the parameters passed to a subroutine.
For instance, a line that was:
my ($genfd) = @_;
Is now:
my ($genfd) = @@_;
Not being a Perl developer, I'm learning on the go here, but so far I understand that '@_' is the parameters supplied to the enclosing subroutine.
I also understand that the assignment above (where '$genfd' is wrapped in parentheses on the left-hand side) evaluates '@_' in list context and then assigns the first element of that list to the scalar variable '$genfd'. This should result in the first parameter to the subroutine being stored in '$genfd'.
What I am completely stuck on is what difference the second '@' makes. I've found examples of this usage on GitHub but never with an explanation, nor can I find an explanation in any Perl references or on SO. Any help would be much appreciated.

Looks like a bad patch.
@@_ is a syntax error. At least, when I have the following Perl source file:
#!/usr/bin/perl
use strict;
use warnings;
sub foo {
my ($genfd) = @@_;
}
running perl -cw on it (with Perl 5.14.2) gives:
Bareword found where operator expected at tmp.pl line 7, near "@@_"
(Missing operator before _?)
syntax error at tmp.pl line 7, near "@@_"
tmp.pl had compilation errors.
I haven't looked at all the examples on GitHub, but many of them are in files with a ,v suffix. That suffix is used by RCS and CVS for their internal version control files. I think the @ character has some special meaning in those files, so it's doubled to denote a literal @ character. (Yes, it's a bit odd to have RCS or CVS internal files in a Git repository.)
Some kind of RCS or CVS interaction is the most likely explanation for the error, but there could be other causes.
You should ask the person who provided the patch.
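For reference, here is a minimal sketch of what the unpatched line does, so it can be compared against the vendor's version. The sub names first_arg and arg_count are made up for illustration and are not from the tool in question.

```perl
use strict;
use warnings;

# Hypothetical helper: parentheses on the left give list assignment,
# so $genfd receives the first element of @_.
sub first_arg {
    my ($genfd) = @_;
    return $genfd;
}

# Hypothetical helper: without parentheses this is scalar assignment,
# so $count receives the number of elements in @_.
sub arg_count {
    my $count = @_;
    return $count;
}

print first_arg('a', 'b', 'c'), "\n";   # prints: a
print arg_count('a', 'b', 'c'), "\n";   # prints: 3
```

If the patched file is meant to be plain Perl, only the single-@ form should be expected to compile.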

Related

Why Perl ignores spaces between a sigil and variable name?

The question Is space supposed to be ignored between a sigil and its variable name? was answered positively.
What is the reason Perl interprets $ foo as $foo?
perl -w -E 'my $ foo = $$; say "Perl $]\n\$ foo = ", $foo'
Perl 5.028001
$ foo = 3492
Isn't it against The Syntax of Variable Names documentation?
That documentation only discusses the name, not the sigil. The sigil can always be separated from the name by space characters. It is definitely underdocumented and I would not suggest ever making use of it, but it is used.
Perl does not have sigils, it has "dereference" operators:
$test[1] means 'give me the scalar at the index 1 of the array called "test" from this scope'. That is why you can put spaces after the "sigil".
I don't understand why everybody keeps calling them sigils; it makes things very confusing. BASIC had sigils, PHP has sigils, but Perl 5 does not, even if it looks like it has. I wish I had realized the "sigils" are in fact operators when I was learning Perl; understanding and parsing references and dereferencing would have been a lot easier, not to mention grokking symbol table manipulation.
The "sigils" are not documented as "operators" in perldoc, but it is much easier to parse Perl code if you think of them as being operators.
Later, after discussion in the comments: here is how Perl 5 uses "sigils": https://www.oreilly.com/library/view/advanced-perl-programming/0596004567/ch01.html

Ambiguous use of -CONSTANT resolved as -&CONSTANT()

I'm trying to declare magic numbers as constants in my Perl scripts, as described in perlsub. However, I get warnings:
$ cat foo.perl
use warnings ; use strict ;
sub CONSTANT() { 5 }
print 7-CONSTANT,"\n" ;
$ perl foo.perl
Ambiguous use of -CONSTANT resolved as -&CONSTANT() at foo.perl line 3.
2
$
The warning goes away if I insert a space between the minus and the CONSTANT. It makes the expressions more airy than I'd like, but it works.
I'm curious, though: What is the ambiguity it's warning me about? I don't know any other way it could be parsed.
(Perl 5.10.1 from Debian "squeeze").
First, some background. Let's look at the following for a second:
$_ = -foo;
-foo is a string literal[1].
$ perl -Mstrict -wE'say -foo;'
-foo
Except if a sub named foo has been declared.
$ perl -Mstrict -wE'sub foo { 123 } say -foo;'
Ambiguous use of -foo resolved as -&foo() at -e line 1.
-123
Now back to your question. The warning is wrong. A TERM (7) cannot be followed by another TERM, so - can't be the start of a string literal or a unary minus operator. It must be the subtraction operator, so there is no ambiguity.
This warning is still issued in 5.20.0[2]. I have filed a bug report.
[1] Look ma! No quotes!
system(grep => ( -R, $pat, $qfn ));
[2] Well, 5.20.0 isn't out yet, but we're in a code freeze running up to its release. This won't be fixed in 5.20.0.
mpapec's answer helpfully referenced perldiag (which I wasn't aware of) but quoted the wrong diagnostic. The one I'm actually getting is
Ambiguous use of -%s resolved as -&%s()
(S ambiguous) You wrote something like -foo, which might be the string "-foo", or a call to the function foo, negated. If you meant the string, just write "-foo". If you meant the function call, write -foo().
So apparently the point is that -CONSTANT is a valid bareword. I didn't know they could start with dashes.
I still don't really understand why that would give a warning here, given that (a) I'm using strict subs so obviously I'm not going to throw around barewords deliberately, and (b) even if I were, a bareword or string in this position would be a syntax error anyway.
Edit: As pointed out (more or less) by tobyink, it is not actually that -CONSTANT in itself is a bareword, but that strict subs still allows barewords after the unary minus operator. Apparently the lexer isn't context-aware enough to know that parsing -CONSTANT as a unary minus is not allowed in this context.
Still feels strange to me -- one would expect the effect of prototyping a sub with no arguments ought to be that I deliberately forfeit using that name as a bareword, no matter whether it happens to be as the operand to unary minus or in a different context.
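A minimal sketch of the two spellings mentioned in this thread that sidestep the warning: the space after the minus (from the question) and explicit call parentheses (from the perldiag entry). The variable names $spaced and $called are only illustrative.

```perl
use strict;
use warnings;

sub CONSTANT() { 5 }

# A space after the minus: the question reports this silences the warning.
my $spaced = 7 - CONSTANT;

# Explicit call parentheses: the form perldiag suggests for a function call.
my $called = 7-CONSTANT();

print "$spaced $called\n";   # prints: 2 2
```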

Meaning of the <*> symbol

I've recently been exposed to a bit of Perl code, and some aspects of it are still elusive to me. This is it:
@collection = <*>;
I understand that the at-symbol defines collection as an array. I've also searched around a bit, and landed on perldoc, specifically at the part about I/O Operators. I found the null filehandle specifically interesting; code follows.
while (<>) {
...
}
On the same topic I have also noticed that this syntax is also valid:
while (<*.c>) {
...
}
According to perldoc, it is actually calling an internal function that invokes glob in a manner similar to the following code:
open(FOO, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
while (<FOO>) {
...
}
Question
What does the less-than, asterisk, more-than (<*>) symbol mentioned on the first line actually do? Is it a reference to an internally open and referenced glob? Would it be a special case, such as the null filehandle? Or can it be something entirely different, like a legacy implementation?
<> (the diamond operator) is used in two different syntaxes.
<*.c>, <*> etc. is shorthand for the glob built-in function. So <*> returns a list of all files and directories in the current directory. (Except those beginning with a dot; use <* .*> for that).
<$fh> is shorthand for calling readline($fh). If no filehandle is specified (<>) the magical *ARGV handle is assumed, which is a list of files specified as command line arguments, or standard input if none are provided. As you mention, the perldoc covers both in detail.
How does Perl distinguish the two? It checks if the thing inside <> is either a bare filehandle or a simple scalar reference to a filehandle (e.g. $fh). Otherwise, it calls glob() instead. This even applies to stuff like <$hash{$key}> or <$x > - it will be interpreted as a call to glob(). If you read the perldoc a bit further on, this is explained - and it's recommended that you use glob() explicitly if you're putting a variable inside <> to avoid these problems.
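A small sketch of that distinction, assuming an in-memory filehandle so it runs without any files on disk. The names $text, $fh and %handles are invented for the example.

```perl
use strict;
use warnings;

# Open a read handle on an in-memory string (illustrative data only).
my $text = "first\nsecond\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my %handles = (in => $fh);

# A simple scalar inside the angle brackets is treated as readline.
my $line = <$fh>;

# <$handles{in}> would be treated as glob, so call readline explicitly
# when the handle lives in a hash element.
my $next = readline($handles{in});

print $line, $next;   # prints "first" and "second" on separate lines
```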
It collects all filenames in the current directory, except those beginning with a dot, and saves them in the array @collection. It's the same as:
@collection = glob "*";
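As a rough check of that equivalence, the following sketch runs both forms in whatever directory the script is started from and compares the results. The array names are arbitrary.

```perl
use strict;
use warnings;

# Both forms expand the pattern "*" in the current directory.
my @from_brackets = <*>;
my @from_glob     = glob "*";

print "same list\n" if "@from_brackets" eq "@from_glob";

# Adding a second pattern also picks up names beginning with a dot.
my @with_dotfiles = glob "* .*";
print scalar(@with_dotfiles), " entries including dotfiles\n";
```

Writing glob explicitly also makes the intent obvious to readers who only know angle brackets as a way of reading from a filehandle.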

Bizarre copy of UNKNOWN in subroutine entry

I'm hitting a bug in the SVN perl module when using git:
Bizarre copy of UNKNOWN in subroutine entry at
/usr/lib/perl5/vendor_perl/SVN/Base.pm line 80.
And I'm not quite sure if this is a perl bug or a subversion bug. This is the relevant part:
# insert the accessor
if (m/(.*)_get$/) {
my $member = $1;
*{"${caller}::$1"} = sub {
&{"SVN::_${pkg}::${prefix}${member}_". # <<<< line 80
(@_ > 1 ? 'set' : 'get')} (@_)
}
}
(full source)
What is a "Bizarre copy"? And whose fault is it?
Edit: software versions
subversion 1.6.15-1
perl 5.14.0-1
Resolution: This happens when you compile with incompatible flags:
https://groups.google.com/d/msg/subversion_users/EOru50ml6sk/5xrbu3luPk4J
That perldoc gives you the short answer, but a brief STFW session yields a little more detail. This is basically evidence of a smashed stack in Perl.
Trivial example:
#!/usr/bin/perl
my @A = 1..5;
sub blowUp {
undef @A;
my $throwAway = {};
print for @_; # <== line 6
}
blowUp(@A);
__END__
bash$ ./blowitup
Bizarre copy of HASH in print at ./blowitup line 6.
And to make it that much more entertaining, without the $throwAway assignment, it's an invisible error (though under 'use warnings' it will at least still tell you that you're trying to access an uninitialized value). It's just when you make a new assignment that you see the strange behavior.
Since @_ is essentially lexically scoped to the subroutine, and arguments are passed by reference, that little subroutine basically pulls the rug out from under itself by undef'ing the thing that @_ was pointing to (you get the same behavior if you change the undef to an assignment, fwiw). I've found a number of postings on perl5-porters that mention this as an artifact of the fact that items on the stack are not reference counted and therefore not cleanly freed.
So while I haven't looked through all of the code in your full source in depth, I'll go ahead and guess that something in there is messing with something that was passed in on @_ ; then when @_ is referenced again, Perl is telling you that something's rotten in Denmark.
The immediate problem is a bug in the script/module, iow. The deeper issue of Perl not reference counting these items is also there, but I suspect you'll have better luck fixing the module in the short term. :-)
HTH-
Brian
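For contrast with the trivial example above, here is a sketch of the usual defensive pattern: copy the arguments out of @_ before touching the array they alias. The sub name safe and the variable $joined are made up, and this only illustrates the aliasing point; it is not a fix for the SVN module.

```perl
use strict;
use warnings;

my @A = 1..5;

sub safe {
    my @args = @_;        # copy the values while the aliases are still valid
    undef @A;             # the copies in @args are unaffected by this
    my $throwAway = {};
    return join ',', @args;
}

my $joined = safe(@A);
print "$joined\n";   # prints: 1,2,3,4,5
```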
A "Bizarre copy" occurs when Perl's stack is corrupted or contains non-scalars. It occurs as the result of bugs in Perl itself or in XS modules. (Brian Gerard's example exercises one of a long list of known bugs related to the stack not being ref-counted.)
You could isolate the problem by adding the following to the anon sub:
warn("Calling SVN::_${pkg}::${prefix}${member}_".(@_ > 1 ? 'set' : 'get')."...");
You might even want to emit a stack trace, but you might have to build it yourself using caller to avoid triggering the panic when building the stack trace.
Probably a perl bug. SVN::Base has XS components, but the error is occurring in pure-perl code and it's my opinion that perl should never allow it to happen. However, it's possible that there's some weird XS in SVN::Base that's tweaking it.
Best idea: file it against Subversion subcomponent bindings_swig_perl and perlbug both.

Why is parenthesis optional only after sub declaration?

(Assume use strict; use warnings; throughout this question.)
I am exploring the usage of sub.
sub bb { print @_; }
bb 'a';
This works as expected. The parentheses are optional, as with many other functions, such as print, open etc.
However, this causes a compilation error:
bb 'a';
sub bb { print @_; }
String found where operator expected at t13.pl line 4, near "bb 'a'"
(Do you need to predeclare bb?)
syntax error at t13.pl line 4, near "bb 'a'"
Execution of t13.pl aborted due to compilation errors.
But this does not:
bb('a');
sub bb { print @_; }
Similarly, a sub without args, such as:
special_print;
sub special_print { print $some_stuff }
Will cause this error:
Bareword "special_print" not allowed while "strict subs" in use at t13.pl line 6.
Execution of t13.pl aborted due to compilation errors.
Ways to avoid this particular error are:
Put & before the sub name, e.g. &special_print
Put empty parentheses after the sub name, e.g. special_print()
Predeclare special_print with sub special_print at the top of the script.
Call special_print after the sub declaration.
My question is, why this special treatment? If I can use a sub globally within the script, why can't I use it any way I want it? Is there a logic to sub being implemented this way?
ETA: I know how I can fix it. I want to know the logic behind this.
I think what you are missing is that Perl uses a strictly one-pass parser. It does not scan the file for subroutines, and then go back and compile the rest. Knowing this, the following describes how the one pass parse system works:
In Perl, the sub NAME syntax for declaring a subroutine is equivalent to the following:
sub name {...} === BEGIN {*name = sub {...}}
This means that the sub NAME syntax has a compile time effect. When Perl is parsing source code, it is working with a current set of declarations. By default, the set is the builtin functions. Since Perl already knows about these, it lets you omit the parentheses.
As soon as the compiler hits a BEGIN block, it compiles the inside of the block using the current rule set, and then immediately executes the block. If anything in that block changes the rule set (such as adding a subroutine to the current namespace), those new rules will be in effect for the remainder of the parse.
Without a predeclared rule, an identifier will be interpreted as follows:
bareword === 'bareword' # a string
bareword LIST === syntax error, missing ','
bareword() === &bareword() # runtime execution of &bareword
&bareword === &bareword # same
&bareword() === &bareword() # same
When using strict and warnings as you have stated, barewords will not be converted into strings, so the first example is a syntax error.
When predeclared with any of the following:
sub bareword;
use subs 'bareword';
sub bareword {...}
BEGIN {*bareword = sub {...}}
Then the identifier will be interpreted as follows:
bareword === &bareword() # compile time binding to &bareword
bareword LIST === &bareword(LIST) # same
bareword() === &bareword() # same
&bareword === &bareword # same
&bareword() === &bareword() # same
So in order for the first example to not be a syntax error, one of the preceding subroutine declarations must be seen first.
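The rules above can be sketched with a forward declaration, one of the predeclaration forms listed. The sub bb is taken from the question, but its body here is changed to return a string (an assumption made so the result is easy to inspect).

```perl
use strict;
use warnings;

sub bb;              # forward declaration: the parser now knows bb is a sub

my $out = bb 'a';    # parsed as bb('a'), even though the body comes later
print "$out\n";      # prints: got a

# The definition can appear after the call because sub NAME is
# processed at compile time, before the call runs.
sub bb { return "got @_" }
```

Replacing the first line with use subs 'bb'; is the other predeclaration form from the list and should behave the same way.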
As to the why behind all of this, Perl has a lot of legacy. One of the goals in developing Perl was complete backwards compatibility. A script that works in Perl 1 still works in Perl 5. Because of this, it is not possible to change the rules surrounding bareword parsing.
That said, you will be hard pressed to find a language that is more flexible in the ways it lets you call subroutines. This allows you to find the method that works best for you. In my own code, if I need to call a subroutine before it has been declared, I usually use name(...), but if that subroutine has a prototype, I will call it as &name(...) (and you will get a warning "subroutine called too early to check prototype" if you don't call it this way).
The best answer I can come up with is that's the way Perl is written. It's not a satisfying answer, but in the end, it's the truth. Perl 6 (if it ever comes out) won't have this limitation.
Perl has a lot of crud and cruft from five different versions of the language. Perl 4 and Perl 5 did some major changes which can cause problems with earlier programs written in a free flowing manner.
Because of the long history, and the various ways Perl has and can work, it can be difficult for Perl to understand what's going on. When you have this:
b $a, $c;
Perl has no way of knowing if b is a string and is simply a bareword (which was allowed in Perl 4) or if b is a function. If b is a function, it should be stored in the symbol table as the rest of the program is parsed. If b isn't a subroutine, you shouldn't put it in the symbol table.
When the Perl compiler sees this:
b($a, $c);
It doesn't know what the function b does, but it at least knows it's a function and can store it in the symbol table waiting for the definition to come later.
When you pre-declare your function, Perl can see this:
sub b; #Or use subs qw(b); will also work.
b $a, $c;
and know that b is a function. It might not know what the function does, but there's now a symbol table entry for b as a function.
One of the reasons for Perl 6 is to remove much of the baggage left from the older versions of Perl and to remove strange things like this.
By the way, never ever use Perl Prototypes to get around this limitation. Use use subs or predeclare a blank subroutine. Don't use prototypes.
Parentheses are optional only if the subroutine has been predeclared. This is documented in perlsub.
Perl needs to know at compile time whether the bareword is a subroutine name or a string literal. If you use parentheses, Perl will guess that it's a subroutine name. Otherwise you need to provide this information beforehand (e.g. using subs).
The reason is that Larry Wall is a linguist, not a computer scientist.
Computer scientist: The grammar of the language should be as simple & clear as possible.
Avoids complexity in the compiler
Eliminates sources of ambiguity
Larry Wall: People work differently from compilers. The language should serve the programmer, not the compiler. See also Larry Wall's outline of the three virtues of a programmer.