Why does Perl sub s require &?

The following file does not compile:
sub s {
return 'foo';
}
sub foo {
my $s = s();
return $s if $s;
return 'baz?';
}
The error from perl -c is:
syntax error at foobar.pl line 5 near "return"
(Might be a runaway multi-line ;; string starting on line 3)
foobar.pl had compilation errors.
But if I replace s() with &s() it works fine. Can you explain why?

The & prefix definitively says you want to call your own function called "s", rather than any built-in with the same name. In this case, Perl is mistaking the call for the substitution operator (like $stuff =~ s///;, which can also be written s()()).
Here's a PerlMonks discussion about what the ampersand does.

The problem you have, as has already been pointed out, is that s() is interpreted as the s/// substitution operator. Prefixing the function name with an ampersand is a workaround, although I would not say necessarily the correct one. In perldoc perlsub the following is said about calling subroutines:
NAME(LIST); # & is optional with parentheses.
NAME LIST; # Parentheses optional if predeclared/imported.
&NAME(LIST); # Circumvent prototypes.
&NAME; # Makes current @_ visible to called subroutine.
What the ampersand does here is merely to distinguish between the built-in function and your own.
The "proper" way to deal with this, apart from renaming your subroutine, is to realize what's going on under the surface. When you say
s();
What you are really saying is
CORE::s();
When what you mean is
main::s();
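As a minimal sketch of the two unambiguous call forms discussed above (the wrapper names via_ampersand and via_package are made up for illustration), both of the following reach the user-defined s rather than the substitution operator:

```perl
use strict;
use warnings;

# Declaring a sub named "s" is legal; only the plain call s() is ambiguous.
sub s { return 'foo' }

# Hypothetical wrappers showing two ways to reach the user-defined sub.
sub via_ampersand { return &s() }        # sigil form: never a built-in
sub via_package   { return main::s() }   # fully qualified name

print via_ampersand(), "\n";
print via_package(), "\n";
```

Renaming the subroutine remains the simplest fix when that is an option.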

my $s = 's'->();
works too--oddly enough with strict on.

Related

Confusion on syntax of diamond operator in parsing and barewords

I'm very new to perl, so I'm sure my confusion here is simply due to not understanding perl syntax and how it handles bare words. I'm failing to find good answers to my question online though.
I had code I'm refactoring; it used to look like this:
@month_dirs = <$log_directory/*>;
I changed $log_directory to be loaded with a config file (AppConfig to be exact). Now instead of exporting $log_directory we output $conf which is an AppConfig object. To access loaded variables you usually make a method call to the variable name so I tried ...
@month_dirs = <$conf->log_directory()."/*">
This fails, because I can't make a method call $conf->log_directory in a location where a bareword is expected. Just playing around I tried this instead
$month_directory_command = $conf->log_directory()."/*";
@month_dirs = <$month_directory_command>;
This still fails, silently, without any indication of a problem. I also tried using a string directly in the diamond, but that fails too; apparently only barewords, not strings, are accepted by the diamond. That surprises me, since I thought a string could be used in most places a bareword can. Is this simply because most code implements separate logic to accept barewords and strings, but is not required to do so?
I can make this work by emulating exactly the original syntax
$month_directory_command = $conf->log_directory();
@month_dirs = <$month_directory_command/*>;
However, this feels ugly to me. I'm also confused why I can do that, but I can't create a bare word with:
$bare_word = $conf->log_directory()/*
or
$month_directory_command = $conf->log_directory();
$bare_word = $month_directory_command/*;
@month_dirs = <$bare_word>;
Why do some variables work for barewords but not others? Why can I use a scalar variable, but not when the value is returned from a method call?
I tried looking up perl syntax on barewords but didn't have much luck describing situations where they are not written directly, but are composed of variables.
I'm hoping someone can help me better understand the bareword syntax here. What defines when I can use a variable as part of a bare word and if I can save it as a variable?
I'd like to figure out a cleaner syntax for using the bareword in my diamond operator if one can be suggested, but more than that I'd like to understand the syntax so I know how to work with barewords in the future. I promise I did try hunting this down ahead of time, but without much luck.
Incidentally, it seems the suggestion is to not use barewords in Perl anyway? Is there some way I should be avoiding barewords in the diamond operator?
You're mistaken that the diamond operator <> only works with barewords:
$ perl -E'say for <"/*">'
/bin
/boot
/dev
...
(In fact, a bareword is just an identifier that doesn't have a sigil and is prohibited by use strict 'subs';, so none of your examples really qualify.)
This:
@month_dirs = <$log_directory/*>;
works because a level of double-quote interpolation is done inside <>, and scalar variables like $log_directory are interpolated.
It's equivalent to:
@month_dirs = glob("$log_directory/*");
This:
@month_dirs = <$conf->log_directory()."/*">
fails because the > in $conf->log_directory() closes the diamond operator prematurely, confusing the parser.
It's parsed as:
<$conf->
(a call to glob) followed by
log_directory()."/*">
which is a syntax error.
This:
$month_directory_command = $conf->log_directory()."/*";
@month_dirs = <$month_directory_command>;
fails because
<$month_directory_command>
is equivalent to
readline($month_directory_command)
and not to
glob("$month_directory_command")
From perldoc perlop:
If what the angle brackets contain is a simple scalar variable (for example, $foo), then that variable contains the name of the filehandle to input from, or its typeglob, or a reference to the same.
[...]
If what's within the angle brackets is neither a filehandle nor a simple scalar variable containing a filehandle name, typeglob, or typeglob reference, it is interpreted as a filename pattern to be globbed, and either a list of filenames or the next filename in the list is returned, depending on context. This distinction is determined on syntactic grounds alone. That means <$x> is always a readline() from an indirect handle, but <$hash{key}> is always a glob().
So you're trying to read from a filehandle ($month_directory_command) that hasn't been opened yet.
Turning on warnings with use warnings 'all'; would have alerted you to this:
readline() on unopened filehandle at foo line 6.
This:
$bare_word = $conf->log_directory()/*;
fails because you're trying to concatenate the result of a method call with a non-quoted string; to concatenate strings, you have to interpolate them into a double quoted string, or use the concatenation operator.
You could do:
$bare_word = $conf->log_directory() . "/*";
@month_dirs = <"$bare_word">;
(although $bare_word isn't a bareword at all, it's a scalar variable.)
Note that:
@month_dirs = <$bare_word>;
(without quotes) would be interpreted as readline, not glob, as explained in perlop above.
In general, though, it would probably be less confusing to use the glob operator directly:
@month_dirs = glob( $conf->log_directory() . "/*" );
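To make the glob form concrete, here is a small self-contained sketch. The My::Conf class is a hypothetical stand-in for the AppConfig object in the question (it only provides a log_directory method), and the month directory names are invented for the demo:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical stand-in for the AppConfig object from the question.
package My::Conf {
    sub new           { my ($class, %args) = @_; return bless {%args}, $class }
    sub log_directory { my ($self) = @_; return $self->{log_directory} }
}

# Build a scratch directory containing two month subdirectories.
my $log_directory = tempdir(CLEANUP => 1);
for my $month (qw(2023-01 2023-02)) {
    mkdir "$log_directory/$month" or die "mkdir failed: $!";
}

my $conf = My::Conf->new(log_directory => $log_directory);

# glob is an ordinary function, so any string expression works as its argument.
my @month_dirs = sort glob($conf->log_directory() . "/*");
print "$_\n" for @month_dirs;
```

One caveat worth keeping in mind: glob splits its pattern on whitespace, so a directory path containing spaces would need extra quoting or a different function.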
One of the main reasons to avoid the diamond operator like this is that it has two totally-unrelated meanings. The usual form you find diamond in is
$data = <$fh>;
This acts like a read function; the full (non-symbol) name for this function is readline. This line of source is equivalent to
$data = readline( $fh );
However, your original form given was
@month_dirs = <$log_directory/*>;
which is an entirely different form. This acts like a shell glob, returning a list of filename matches by scanning the filesystem. This form is better written out using the glob function:
@month_dirs = glob( "$log_directory/*" );
Note also that this being a normal function just takes a normal string argument. In this manner, you can use it with any of your provided examples, such as:
@month_dirs = glob( $conf->log_directory()."/*" );
A bare pattern can only appear inside the angle brackets <>; the syntax inside them is shell glob syntax rather than Perl syntax.
# wrong - an unquoted /* is not a string
$bare_word = $month_directory_command/*;
# right - the star is allowed because it is inside quotes (only double quotes interpolate the variable)
$bare_word = "$month_directory_command/*";
# the star is allowed simply because it is inside the brackets
@month_dirs = <$month_directory_command/*>;

Meaning of the <*> symbol

I've recently been exposed to a bit of Perl code, and some aspects of it are still elusive to me. This is it:
@collection = <*>;
I understand that the at-symbol defines collection as an array. I've also searched around a bit, and landed on perldoc, specifically at the part about I/O Operators. I found the null filehandle particularly interesting; code follows.
while (<>) {
...
}
On the same topic I have also noticed that this syntax is also valid:
while (<*.c>) {
...
}
According to perldoc, it is actually calling an internal function that invokes glob in a manner similar to the following code:
open(FOO, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
while (<FOO>) {
...
}
Question
What does the less-than, asterisk, more-than (<*>) symbol mentioned on the first line actually do? Is it a reference to an internally open and referenced glob? Would it be a special case, such as the null filehandle? Or can it be something entirely different, like a legacy implementation?
<> (the diamond operator) is used in two different syntaxes.
<*.c>, <*> etc. is shorthand for the glob built-in function. So <*> returns a list of all files and directories in the current directory. (Except those beginning with a dot; use <* .*> for that).
<$fh> is shorthand for calling readline($fh). If no filehandle is specified (<>) the magical *ARGV handle is assumed, which is a list of files specified as command line arguments, or standard input if none are provided. As you mention, the perldoc covers both in detail.
How does Perl distinguish the two? It checks if the thing inside <> is either a bare filehandle or a simple scalar reference to a filehandle (e.g. $fh). Otherwise, it calls glob() instead. This even applies to stuff like <$hash{$key}> or <$x > - it will be interpreted as a call to glob(). If you read the perldoc a bit further on, this is explained - and it's recommended that you use glob() explicitly if you're putting a variable inside <> to avoid these problems.
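The two meanings can be seen side by side in a short sketch. The file names and the in-memory text are invented for the demo; the point is only that <$fh> reads a line while <$dir/*.c> expands a pattern:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory with a few empty files to match against.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(a.c b.c notes.txt)) {
    open my $out, '>', "$dir/$name" or die "cannot create $name: $!";
    close $out;
}

# readline form: the brackets contain a simple scalar holding a filehandle.
my $text = "first\nsecond\n";
open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
my $line = <$fh>;               # same as readline($fh)
close $fh;

# glob form: anything else inside the brackets is treated as a pattern.
my @c_files = <$dir/*.c>;       # $dir is interpolated, then globbed
my @same    = glob("$dir/*.c"); # the explicit spelling of the same thing

print "read: $line";
print "matched: $_\n" for @c_files;
```

Writing glob(...) explicitly avoids having to remember which of the two parses a given expression will get.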
It collects all filenames in the current directory, except those beginning with a dot, and saves them to the array @collection. It's the same as:
@collection = glob "*";

How to tell perl to print to a file handle instead of printing the file handle?

I'm trying to wrap my head around the way Perl handles the parsing of arguments to print.
Why does this
print $fh $stufftowrite
write to the file handle as expected, but
print($fh, $stufftowrite)
writes the file handle to STDOUT instead?
My guess is that it has something to do with the warning in the documentation of print:
Be careful not to follow the print keyword with a left parenthesis unless you want the corresponding right parenthesis to terminate the arguments to the print; put parentheses around all arguments (or interpose a + , but that doesn't look as good).
Should I just get used to the first form (which just doesn't seem right to me, coming from languages that all use parentheses around function arguments), or is there a way to tell Perl to do what I want?
So far I've tried a lot of combination of parentheses around the first, second and both parameters, without success.
On lists
The structure bareword (LIST1), LIST2 means "apply the function bareword to the arguments LIST1", while bareword +(LIST1), LIST2 can, but doesn't necessarily, mean "apply bareword to the arguments of the combined list LIST1, LIST2". This is important for grouping arguments:
my ($a, $b, $c) = (0..2);
print ($a or $b), $c; # print $b
print +($a or $b), $c; # print $b, $c
The prefix + can also be used to distinguish hashrefs from blocks, and functions from barewords, e.g. when subscripting a hash: $hash{shift} returns the shift element, while $hash{+shift} calls the function shift and returns the hash element of the value of shift.
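A small sketch of the hash-subscript case, using made-up sub names (by_literal_key, by_argument) and made-up hash contents:

```perl
use strict;
use warnings;

my %hash = (
    shift => 'found under the literal key "shift"',
    apple => 'found under the key taken from @_',
);

# Inside braces a lone identifier is treated as a string key.
sub by_literal_key { return $hash{shift} }

# The unary plus forces shift to be parsed as the built-in function.
sub by_argument { return $hash{+shift} }

print by_literal_key('apple'), "\n";
print by_argument('apple'), "\n";
```

Writing $hash{shift()} or $hash{ shift @_ } has the same effect as the unary plus and may read more clearly.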
Indirect syntax
In object oriented Perl, you normally call methods on an object with the arrow syntax:
$object->method(LIST); # call `method` on `$object` with args `LIST`.
However, it is possible, but not recommended, to use an indirect notation that puts the verb first:
method $object (LIST); # the same, but stupid.
Because classes are just instances of themselves (in a syntactic sense), you can also call methods on them. This is why
new Class (ARGS); # bad style, but pretty
is the same as
Class->new(ARGS); # good style, but ugly
However, this can sometimes confuse the parser, so indirect style is not recommended.
But it does hint at what print does:
print $fh ARGS
is the same as
$fh->print(ARGS)
Indeed, the filehandle $fh is treated as an object of the class IO::Handle.
(While this is a valid syntactic explanation, it is not quite true. The source of IO::Handle itself uses the line print $this @_;. The print function is just defined this way.)
Looks like you have a typo. You have put a comma between the file handle and the argument in the second print statement. If you do that, the file handle will be seen as an argument. This seems to apply only to lexical file handles. If done with a global file handle, it will produce the fatal error
No comma allowed after filehandle at ...
So, to be clear, if you absolutely have to have parentheses for your print, do this:
print($fh $stufftowrite)
Although personally I prefer to not use parentheses unless I have to, as they just add clutter.
The Modern Perl book states in Chapter 11 ("What to Avoid"), section "Indirect Notation Scalar Limitations":
Another danger of the syntax is that the parser expects a single scalar expression as the object. Printing to a filehandle stored in an aggregate variable seems obvious, but it is not:
# DOES NOT WORK AS WRITTEN
say $config->{output} 'Fun diagnostic message!';
Perl will attempt to call say on the $config object.
print, close, and say—all builtins which operate on filehandles—operate in an indirect fashion. This was fine when filehandles were package globals, but lexical filehandles (Filehandle References) make the indirect object syntax problems obvious. To solve this, disambiguate the subexpression which produces the intended invocant:
say {$config->{output}} 'Fun diagnostic message!';
Of course, print({$fh} $stufftowrite) is also possible.
It's how the syntax of print is defined. It's really that simple. There's kind of nothing to fix. If you put a comma between the file handle and the rest of the arguments, the expression is parsed as print LIST rather than print FILEHANDLE LIST. Yes, that looks really weird. It is really weird.
The way not to get parsed as print LIST is to supply an expression that can legally be parsed as print FILEHANDLE LIST. If what you're trying to do is get parentheses around the arguments to print to make it look more like an ordinary function call, you can say
print($fh $stufftowrite); # note the lack of comma
You can also say
(print $fh $stufftowrite);
if what you're trying to do is set off the print expression from surrounding code. The key point is that including the comma changes the parse.
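The forms mentioned in these answers can be compared in one short sketch. It writes to an in-memory filehandle so the result is easy to inspect; the $config hash is a made-up example of a handle stored in an aggregate:

```perl
use strict;
use warnings;
use IO::Handle;    # makes the method form available on older perls too

my $buffer = '';
open my $fh, '>', \$buffer or die "cannot open in-memory handle: $!";

print $fh "a\n";        # print FILEHANDLE LIST, no comma after the handle
print($fh "b\n");       # same parse, with parentheses around everything
print {$fh} "c\n";      # block form, unambiguous for any expression
$fh->print("d\n");      # method form

# A handle stored in a hash needs the block form.
my $config = { output => $fh };
print { $config->{output} } "e\n";

close $fh;
print $buffer;
```

The block form is the one that keeps working when the handle comes from a more complex expression, which is why style guides tend to recommend it.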

perl encapsulate single variable in double quotes

In Perl, is there any reason to encapsulate a single variable in double quotes (no concatenation) ?
I often find this in the source of the program I am working on (written 10 years ago by people that don't work here anymore):
my $sql_host = "something";
my $sql_user = "somethingelse";
# a few lines down
my $db = sub_for_sql_conection("$sql_host", "$sql_user", "$sql_pass", "$sql_db");
As far as I know there is no reason to do this. When I work in an old script I usually remove the quotes so my editor colors them as variables not as strings.
I think they saw this somewhere and copied the style without understanding why it is so. Am I missing something ?
Thank you.
All this does is explicitly stringify the variables. In 99.9% of cases, it is a newbie error of some sort.
There are things that may happen as a side effect of this calling style:
my $foo = "1234";
sub bar { $_[0] =~ s/2/two/ }
print "Foo is $foo\n";
bar( "$foo" );
print "Foo is $foo\n";
bar( $foo );
print "Foo is $foo\n";
Here, stringification created a copy and passed that to the subroutine, circumventing Perl's pass by reference semantics. It's generally considered to be bad manners to munge calling variables, so you are probably okay.
You can also stringify an object or other value here. For example, undef stringifies to the empty string. Objects may specify arbitrary code to run when stringified. It is possible to have dual valued scalars that have distinct numerical and string values. This is a way to specify that you want the string form.
There is also one deep spooky thing that could be going on. If you are working with XS code that looks at the flags that are set on scalar arguments to a function, stringifying the scalar is a straight forward way to say to perl, "Make me a nice clean new string value" with only stringy flags and no numeric flags.
I am sure there are other odd exceptions to the 99.9% rule. These are a few. Before removing the quotes, take a second to check for weird crap like this. If you do happen upon a legit usage, please add a comment that identifies the quotes as a workable kludge, and give their reason for existence.
In this case the double quotes are unnecessary. Moreover, using them is inefficient as this causes the original strings to be copied.
However, sometimes you may want to use this style to "stringify" an object. For example, URI objects support stringification:
my $uri = URI->new("http://www.perl.com");
my $str = "$uri";
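The same effect can be reproduced with core modules only. In this sketch the Temperature class is hypothetical; it overloads the "" operator so that quoting the object yields a plain string:

```perl
use strict;
use warnings;

# Hypothetical class with overloaded stringification.
package Temperature {
    use overload '""' => sub { my ($self) = @_; return "$self->{degrees} C" };
    sub new { my ($class, $degrees) = @_; return bless { degrees => $degrees }, $class }
}

my $t   = Temperature->new(21);
my $str = "$t";    # runs the overload and stores an ordinary string

print ref($t), "\n";                        # still an object
print ref($str) eq '' ? "plain string: $str\n" : "still a reference\n";
```

After the assignment, $str no longer carries any connection to the object, which is the point of quoting in cases like this.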
I don't know why, but it's a pattern commonly used by newcomers to Perl. It's usually a waste (as it is in the snippet you posted), but I can think of two uses.
It has the effect of creating a new string with the same value as the original, and that could be useful in very rare circumstances.
In the following example, an explicit copy is done to protect $x from modification by the sub because the sub modifies its argument.
$ perl -E'
sub f { $_[0] =~ tr/a/A/; say $_[0]; }
my $x = "abc";
f($x);
say $x;
'
Abc
Abc
$ perl -E'
sub f { $_[0] =~ tr/a/A/; say $_[0]; }
my $x = "abc";
f("$x");
say $x;
'
Abc
abc
By virtue of creating a copy of the string, it stringifies objects. This could be useful when dealing with code that alters its behaviour based on whether its argument is a reference or not.
In the following example, explicit stringification is done because require handles references in @INC differently than strings.
$ perl -MPath::Class=file -E'
BEGIN { $lib = file($0)->dir; }
use lib $lib;
use DBI;
say "ok";
'
Can't locate object method "INC" via package "Path::Class::Dir" at -e line 4.
BEGIN failed--compilation aborted at -e line 4.
$ perl -MPath::Class=file -E'
BEGIN { $lib = file($0)->dir; }
use lib "$lib";
use DBI;
say "ok";
'
ok
In your case the quotes are completely useless. We can even say that it is wrong because it is not idiomatic, as others wrote.
However, quoting a variable may sometimes be necessary: this explicitly triggers stringification of the value of the variable. Stringification may give a different result for some values if those values are dual vars or if they are blessed values with overloaded stringification.
Here is an example with dual vars:
use 5.010;
use strict;
use Scalar::Util 'dualvar';
my $x = dualvar 1, "2";
say 0+$x;
say 0+"$x";
Output:
1
2
My theory has always been that it's people coming over from other languages with bad habits. It's not that they're thinking "I will use double quotes all the time", but that they're just not thinking!
I'll be honest and say that I used to fall into this trap because I came to Perl from Java, so the muscle memory was there, and just kept firing.
PerlCritic finally got me out of the habit!
It definitely makes your code less efficient, but if you're not thinking about whether or not you want your strings interpolated, you are very likely to make silly mistakes, so I'd go further and say that it's dangerous.

Why is parenthesis optional only after sub declaration?

(Assume use strict; use warnings; throughout this question.)
I am exploring the usage of sub.
sub bb { print @_; }
bb 'a';
This works as expected. The parentheses are optional, as with many other functions, such as print, open etc.
However, this causes a compilation error:
bb 'a';
sub bb { print @_; }
String found where operator expected at t13.pl line 4, near "bb 'a'"
(Do you need to predeclare bb?)
syntax error at t13.pl line 4, near "bb 'a'"
Execution of t13.pl aborted due to compilation errors.
But this does not:
bb('a');
sub bb { print @_; }
Similarly, a sub without args, such as:
special_print;
sub special_print { print $some_stuff }
Will cause this error:
Bareword "special_print" not allowed while "strict subs" in use at t13.pl line 6.
Execution of t13.pl aborted due to compilation errors.
Ways to avoid this particular error are:
Put & before the sub name, e.g. &special_print
Put empty parentheses after the sub name, e.g. special_print()
Predeclare special_print with sub special_print at the top of the script.
Call special_print after the sub declaration.
My question is, why this special treatment? If I can use a sub globally within the script, why can't I use it any way I want it? Is there a logic to sub being implemented this way?
ETA: I know how I can fix it. I want to know the logic behind this.
I think what you are missing is that Perl uses a strictly one-pass parser. It does not scan the file for subroutines, and then go back and compile the rest. Knowing this, the following describes how the one pass parse system works:
In Perl, the sub NAME syntax for declaring a subroutine is equivalent to the following:
sub name {...} === BEGIN {*name = sub {...}}
This means that the sub NAME syntax has a compile time effect. When Perl is parsing source code, it is working with a current set of declarations. By default, the set is the builtin functions. Since Perl already knows about these, it lets you omit the parentheses.
As soon as the compiler hits a BEGIN block, it compiles the inside of the block using the current rule set, and then immediately executes the block. If anything in that block changes the rule set (such as adding a subroutine to the current namespace), those new rules will be in effect for the remainder of the parse.
Without a predeclared rule, an identifier will be interpreted as follows:
bareword === 'bareword' # a string
bareword LIST === syntax error, missing ','
bareword() === &bareword() # runtime execution of &bareword
&bareword === &bareword # same
&bareword() === &bareword() # same
When using strict and warnings as you have stated, barewords will not be converted into strings, so the first example is a syntax error.
When predeclared with any of the following:
sub bareword;
use subs 'bareword';
sub bareword {...}
BEGIN {*bareword = sub {...}}
Then the identifier will be interpreted as follows:
bareword === &bareword() # compile time binding to &bareword
bareword LIST === &bareword(LIST) # same
bareword() === &bareword() # same
&bareword === &bareword # same
&bareword() === &bareword() # same
So in order for the first example to not be a syntax error, one of the preceding subroutine declarations must be seen first.
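A brief sketch of the predeclaration rules described above. The sub names greet and shout are invented; the first is predeclared with a forward declaration and the second through a glob assignment in a BEGIN block:

```perl
use strict;
use warnings;

sub greet;                      # forward declaration: greet is now a known sub

my $early = greet 'world';      # parses as greet('world'); the body comes later

sub greet { return "hello, $_[0]" }

# A glob assignment inside BEGIN has the same compile-time effect.
BEGIN { *shout = sub { return uc $_[0] } }

my $loud = shout 'hi';          # also parses as a call without parentheses

print "$early\n$loud\n";
```

Removing the forward declaration line should reproduce the "String found where operator expected" error from the question, since greet 'world' would then be seen before any declaration.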
As to the why behind all of this, Perl has a lot of legacy. One of the goals in developing Perl was complete backwards compatibility. A script that works in Perl 1 still works in Perl 5. Because of this, it is not possible to change the rules surrounding bareword parsing.
That said, you will be hard pressed to find a language that is more flexible in the ways it lets you call subroutines. This allows you to find the method that works best for you. In my own code, if I need to call a subroutine before it has been declared, I usually use name(...), but if that subroutine has a prototype, I will call it as &name(...) (and you will get a warning "subroutine called too early to check prototype" if you don't call it this way).
The best answer I can come up with is that's the way Perl is written. It's not a satisfying answer, but in the end, it's the truth. Perl 6 (if it ever comes out) won't have this limitation.
Perl has a lot of crud and cruft from five different versions of the language. Perl 4 and Perl 5 made some major changes which can cause problems with earlier programs written in a free flowing manner.
Because of the long history, and the various ways Perl has and can work, it can be difficult for Perl to understand what's going on. When you have this:
b $a, $c;
Perl has no way of knowing if b is a string and is simply a bareword (which was allowed in Perl 4) or if b is a function. If b is a function, it should be stored in the symbol table as the rest of the program is parsed. If b isn't a subroutine, you shouldn't put it in the symbol table.
When the Perl compiler sees this:
b($a, $c);
It doesn't know what the function b does, but it at least knows it's a function and can store it in the symbol table waiting for the definition to come later.
When you pre-declare your function, Perl can see this:
sub b; #Or use subs qw(b); will also work.
b $a, $c;
and know that b is a function. It might not know what the function does, but there's now a symbol table entry for b as a function.
One of the reasons for Perl 6 is to remove much of the baggage left from the older versions of Perl and to remove strange things like this.
By the way, never ever use Perl Prototypes to get around this limitation. Use use subs or predeclare a blank subroutine. Don't use prototypes.
Parentheses are optional only if the subroutine has been predeclared. This is documented in perlsub.
Perl needs to know at compile time whether the bareword is a subroutine name or a string literal. If you use parentheses, Perl will guess that it's a subroutine name. Otherwise you need to provide this information beforehand (e.g. using subs).
The reason is that Larry Wall is a linguist, not a computer scientist.
Computer scientist: The grammar of the language should be as simple & clear as possible.
Avoids complexity in the compiler
Eliminates sources of ambiguity
Larry Wall: People work differently from compilers. The language should serve the programmer, not the compiler. See also Larry Wall's outline of the three virtues of a programmer.