Function parameter separator in Perl? - perl

Function parameters are usually separated by a comma (,), but in some cases they seem to be separated by a space, as in print FILEHANDLE 'string'. Why are both separators necessary?

print is a special built-in Perl function with its own syntax rules. The documentation lists four possible invocations:
print FILEHANDLE LIST
print FILEHANDLE
print LIST
print
So while function arguments are in general separated by a comma (,), this is an exception that distinguishes the print destination from the contents to be printed. Another function that exhibits this behavior is system.
Another feature that shares this syntax is called "Indirect Object Notation". An expression of the form:
function object arg1, arg2, … argn
# ^- no comma here
is equivalent to:
object->function(arg1, arg2, … argn)
So the following two statements are equivalent:
$foo = new Bar;
$foo = Bar->new;
The Indirect Object Notation has several problems and should generally be avoided, save for a few well-known idioms such as print F "something".
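As a minimal sketch of the equivalence described above, the following defines a throwaway class (Bar and its name argument are hypothetical, used only for illustration) and constructs it both ways. Note that the indirect form is disabled by newer feature bundles such as use v5.36, so this assumes a script that does not enable them.

```perl
use strict;
use warnings;

# Hypothetical class, defined only for this illustration.
package Bar {
    sub new {
        my ($class, %args) = @_;
        return bless {%args}, $class;
    }
}

my $indirect = new Bar(name => 'x');     # indirect object notation
my $direct   = Bar->new(name => 'x');    # arrow notation, same call
```

Both variables end up holding a Bar object built from the same arguments.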

In that example, print takes a list of arguments and has defaults for both the filehandle and the list. So Perl needs a way to tell the difference between printing a scalar and printing to a filehandle.
The interpreter can tell the difference between:
print $fh "some scalar text";
print $bar, "set to a value ","\n";
print $fh;
print $bar;
But this is rather a special case; most functions don't work like that. For print, I normally suggest wrapping the filehandle argument in braces, which makes the distinction explicit.
You can look at prototypes as a way to get perl to do things with parameters, but I'd normally suggest that this makes for less clear code.
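A small sketch of the print forms discussed above, assuming an in-memory filehandle (the names $buffer and $fh are illustrative) so the result can be inspected without touching the terminal or a real file:

```perl
use strict;
use warnings;

my $buffer = '';
open(my $fh, '>', \$buffer) or die "Cannot open in-memory handle: $!";

print $fh 'a', 'b';            # print FILEHANDLE LIST (no comma after $fh)
print {$fh} 'c';               # same form, handle wrapped in a block

my $previous = select($fh);    # make $fh the default output handle
print 'd';                     # print LIST goes to the default handle
{
    local $_ = 'e';
    print;                     # bare print writes $_ to the default handle
}
select($previous);             # restore the earlier default handle

close($fh) or die "Cannot close in-memory handle: $!";
# $buffer now holds 'abcde'
```

The braces around the handle cost two characters and make it obvious which argument is the destination.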

Related

In Perl, how does readline assign to $_ in a loop condition but not elsewhere?

How is readline implemented in Perl?
The question is why readline sets $_ when it is used in a loop condition such as:
while(<>) {
#here $_ is set
print;
}
By contrast, if we just do
<>;
print; #$_ is not set here
it will not print anything.
How is this implemented? How does the function know it is being used in a loop condition? Or is it just built-in behavior that was designed that way?
In this case, there's nothing special about the implementation of readline. It never sets $_. Instead, there's a special case in the Perl compiler that examines the condition of a while loop and rewrites certain conditions internally.
For example, while (<>) {} gets rewritten into
while (defined($_ = <ARGV>)) {
();
}
You can see this with perl -MO=Deparse -e 'while (<>) {}'.
This is documented under I/O Operators in perlop:
Ordinarily you must assign the returned value to a variable, but there is one situation where an automatic assignment happens. If and only if the input symbol is the only thing inside the conditional of a while statement (even if disguised as a for(;;) loop), the value is automatically assigned to the global variable $_, destroying whatever was there previously.
It's also mentioned in Loop Control & For Loops in perlsyn.
while is a special case here: it assigns to $_. In the second case you just read input and throw it away immediately. For further details, read the docs: https://metacpan.org/pod/perlop#I-O-Operators
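To see the difference without involving the command line, here is a hedged sketch that reads from an in-memory filehandle instead of <> (the data and variable names are made up for the illustration):

```perl
use strict;
use warnings;

my $data = "one\ntwo\nthree\n";

# Bare read: the result is discarded and $_ is left alone.
open(my $in, '<', \$data) or die "Cannot open in-memory handle: $!";
$_ = 'untouched';
<$in>;
my $after_bare_read = $_;      # still 'untouched'
close($in);

# Sole condition of a while loop: each line is assigned to $_.
open($in, '<', \$data) or die "Cannot open in-memory handle: $!";
my @seen;
while (<$in>) {
    chomp;
    push @seen, $_;
}
close($in);
# @seen is ('one', 'two', 'three')
```

Writing the loop as while (defined($_ = <$in>)) would behave the same way; that is the form the compiler rewrites it into.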
Perl functions can behave differently in different contexts.
Your first example is a scalar context. Your second example is a void context.
You can determine the calling context of a function by using the built-in wantarray.
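A minimal sketch of wantarray in use; report_context and @calls are hypothetical names for this illustration:

```perl
use strict;
use warnings;

my @calls;

# Records the context in which the sub was called.
sub report_context {
    my $context = wantarray          ? 'list'
                : defined(wantarray) ? 'scalar'
                :                      'void';
    push @calls, $context;
    return;
}

my @list   = report_context();    # list context
my $scalar = report_context();    # scalar context
report_context();                 # void context
# @calls is ('list', 'scalar', 'void')
```

wantarray returns true in list context, a defined false value in scalar context, and undef in void context.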
perldoc perlvar lists all the places where $_ is modified or used:
Here are the places where Perl will assume $_ even if you don't use it:
The following functions use $_ as a default argument:
abs, alarm, chomp, chop, chr, chroot, cos, defined, eval, evalbytes, exp, fc, glob, hex, int, lc, lcfirst, length, log, lstat, mkdir, oct, ord, pos, print, printf, quotemeta, readlink, readpipe, ref, require, reverse (in scalar context only), rmdir, say, sin, split (for its second argument), sqrt, stat, study, uc, ucfirst, unlink, unpack.
All file tests (-f, -d) except for -t, which defaults to STDIN. See -X
The pattern matching operations m//, s/// and tr/// (aka y///) when used without an =~ operator.
The default iterator variable in a foreach loop if no other variable is supplied.
The implicit iterator variable in the grep() and map() functions.
The implicit variable of given().
The default place to put the next value or input record when a <FH>, readline, readdir or each operation's result is tested by itself as the sole criterion of a while test. Outside a while test, this will not happen.
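A short sketch of a few of these defaults in action (the word list is invented for the example):

```perl
use strict;
use warnings;

my @words = qw(apple Banana cherry);

my @upper   = map  { uc }   @words;    # uc defaults to $_
my @with_an = grep { /an/ } @words;    # m// without =~ matches against $_

my $total = 0;
for (@words) {                         # foreach aliases each element to $_
    $total += length;                  # length defaults to $_
}
# @upper is ('APPLE', 'BANANA', 'CHERRY'), @with_an is ('Banana'), $total is 17
```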

Why can't I use a typeglob in the diamond operator in Perl?

Usually a bareword filehandle, or a variable that holds a filehandle, can be placed inside the <> operator to read from the file, but a filehandle extracted from a typeglob cannot, as the last line below shows. Why doesn't it work, given that the last case also references a filehandle?
open FILE, 'file.txt';
my $myfile = *FILE{IO};
print <$myfile>;
print <*FILE{IO}>; # this line doesn't work.
<> is, among other things, a shortcut for readline(), and it accepts only a simple scalar or a bareword, i.e. <FILE>. For more complex expressions you have to be more explicit:
print readline *FILE{IO};
Otherwise it will be interpreted as glob():
perl -MO=Deparse -e 'print <*FILE{IO}>;'
use File::Glob ();
print glob('*FILE{IO}');
In perlop, it says:
If what's within the angle brackets is neither a filehandle nor a
simple scalar variable containing a filehandle name, typeglob, or
typeglob reference, it is interpreted as a filename pattern to be
globbed ...
Since we want to be able to say things like:
foreach (<*.c>) {
# Do something for each file that matches *.c
}
it is not possible for perl to interpret the '*' as meaning a typeglob.
As noted in the other answer, you can work around this using readline, or you can assign the typeglob to a scalar first (as your example shows).
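Both workarounds can be sketched as follows, assuming an in-memory file in place of file.txt so the example is self-contained:

```perl
use strict;
use warnings;

my $data = "line 1\nline 2\n";
open(FILE, '<', \$data) or die "Cannot open in-memory handle: $!";

my $io    = *FILE{IO};              # handle extracted from the typeglob
my $first = <$io>;                  # simple scalar inside <>: treated as readline
my @rest  = readline(*FILE{IO});    # complex expression: spell out readline

close(FILE);
# $first is "line 1\n" and @rest is ("line 2\n")
```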

Passing Arguments Without Commas and With Produces Different Results

I'm trying to figure out why these two lines produce different results:
print($fh, "text"); --> 0x10101010 text (on STDOUT)
print($fh "text"); --> text (inside of file $fh)
When I have the comma I understand I create a list and when print only has a list it prints the list to STDOUT.
But, what is print doing when I don't have a comma? The result I want is the one I get without a comma.
This is strange to me, because I expected the version with the comma to be the one that works for my purpose. Code I usually see does filehandle printing with a line like "print $file "text"", but I want to use the parentheses as I find that more consistent with other languages. Then again, leaving out the comma is just as inconsistent.
An explanation of the internals of "print" might help me understand. How is it getting the FILEHANDLE and LIST separate when there is no comma?
Docs: http://perldoc.perl.org/functions/print.html
Thanks!
print isn't a normal function, and you shouldn't call it with the parentheses because you're not really passing a parameter list to the function.
The way I typically see it written is
print {$fh} 'text';
print {$fh} 'text1', 'text2';
or not going to a file:
print 'text';
print 'text1', 'text2';
You ask "How is it getting the FILEHANDLE and LIST separate when there is no comma?" and the answer is "Magic, because it's not a normal function."
In Perl, parens are mostly just used for precedence. It is customary to call builtins like print without parens – this emphasizes that they aren't subroutines, but special syntax like for, map, split, or my.
In your case, you have a variety of possibilities:
Leave out the comma, but this is error-prone:
print($fh @list);
print $fh (@list);
Use curly braces around the file handle (which I would suggest anyway):
print { $fh } (@list);
print({ $fh } @list);
Use the object-oriented interface:
use IO::File; # on older perls
$fh->print(@list);
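As a quick check that the three possibilities above write to the same place, here is a sketch using an in-memory filehandle (the buffer and the list contents are assumptions made for the example):

```perl
use strict;
use warnings;
use IO::Handle;    # only needed on older perls for the method form

my @list   = ('x', 'y');
my $buffer = '';
open(my $fh, '>', \$buffer) or die "Cannot open in-memory handle: $!";

print($fh @list);        # no comma after the handle
print({ $fh } @list);    # block around the handle
$fh->print(@list);       # object-oriented interface

close($fh) or die "Cannot close in-memory handle: $!";
# $buffer now holds 'xyxyxy'
```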

Why does Perl sub s require &

The following file does not compile:
sub s {
return 'foo';
}
sub foo {
my $s = s();
return $s if $s;
return 'baz?';
}
The error from perl -c is:
syntax error at foobar.pl line 5 near "return"
(Might be a runaway multi-line ;; string starting on line 3)
foobar.pl had compilation errors.
But if I replace s() with &s() it works fine. Can you explain why?
The & prefix says unambiguously that you want to call your own function called "s", rather than any built-in with the same name. In this case, Perl is mistaking it for the substitution operator (like $stuff =~ s///;, which can also be written s()()).
Here's a PerlMonks discussion about what the ampersand does.
The problem you have, as has already been pointed out, is that s() is interpreted as the s/// substitution operator. Prefixing the function name with an ampersand is a workaround, although I would not say necessarily the correct one. In perldoc perlsub the following is said about calling subroutines:
NAME(LIST); # & is optional with parentheses.
NAME LIST; # Parentheses optional if predeclared/imported.
&NAME(LIST); # Circumvent prototypes.
&NAME; # Makes current @_ visible to called subroutine.
What the ampersand does here is merely to distinguish between the built-in function and your own.
The "proper" way to deal with this, apart from renaming your subroutine, is to realize what's going on under the surface. When you say
s();
What you are really saying is
CORE::s();
When what you mean is
main::s();
my $s = 's'->();
works too--oddly enough with strict on.
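A minimal sketch of two of the call forms mentioned above that avoid the s/// parse, using the sub from the question:

```perl
use strict;
use warnings;

sub s {
    return 'foo';
}

my $via_ampersand = &s();         # & forces a call to the user-defined sub
my $via_package   = main::s();    # a package-qualified name is not read as s///
# both variables hold 'foo'
```

Renaming the subroutine remains the simplest fix, since a name that collides with a quote-like operator will keep surprising readers.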

How to tell perl to print to a file handle instead of printing the file handle?

I'm trying to wrap my head around the way Perl handles the parsing of arguments to print.
Why does this
print $fh $stufftowrite
write to the file handle as expected, but
print($fh, $stufftowrite)
writes the file handle to STDOUT instead?
My guess is that it has something to do with the warning in the documentation of print:
Be careful not to follow the print keyword with a left parenthesis unless you want the corresponding right parenthesis to terminate the arguments to the print; put parentheses around all arguments (or interpose a + , but that doesn't look as good).
Should I just get used to the first form (which just doesn't seem right to me, coming from languages that all use parentheses around function arguments), or is there a way to tell Perl to do what I want?
So far I've tried a lot of combination of parentheses around the first, second and both parameters, without success.
On lists
The structure bareword (LIST1), LIST2 means "apply the function bareword to the arguments LIST1", while bareword +(LIST1), LIST2 can, but doesn't necessarily, mean "apply bareword to the arguments of the combined list LIST1, LIST2". This is important for grouping arguments:
my ($a, $b, $c) = (0..2);
print ($a or $b), $c; # print $b
print +($a or $b), $c; # print $b, $c
The prefix + can also be used to distinguish hashrefs from blocks, and functions from barewords, e.g. when subscripting a hash: $hash{shift} returns the element whose key is the string "shift", while $hash{+shift} calls the function shift and returns the hash element whose key is the value shift returned.
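The hash subscript case can be sketched like this; %hash, literal_key and shifted_key are hypothetical names for the illustration:

```perl
use strict;
use warnings;

my %hash = (shift => 'literal key', first => 'looked up via shift');

sub literal_key { return $hash{shift} }     # key is the string 'shift'
sub shifted_key { return $hash{+shift} }    # key is the sub's first argument

my $literal = literal_key('first');    # 'literal key'
my $shifted = shifted_key('first');    # 'looked up via shift'
```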
Indirect syntax
In object oriented Perl, you normally call methods on an object with the arrow syntax:
$object->method(LIST); # call `method` on `$object` with args `LIST`.
However, it is possible, but not recommended, to use an indirect notation that puts the verb first:
method $object (LIST); # the same, but stupid.
Because classes are just instances of themselves (in a syntactic sense), you can also call methods on them. This is why
new Class (ARGS); # bad style, but pretty
is the same as
Class->new(ARGS); # good style, but ugly
However, this can sometimes confuse the parser, so indirect style is not recommended.
But it does hint at what print does:
print $fh ARGS
is the same as
$fh->print(ARGS)
Indeed, the filehandle $fh is treated as an object of the class IO::Handle.
(While this is a valid syntactic explanation, it is not quite true. The source of IO::Handle itself uses the line print $this @_;. The print function is just defined this way.)
Looks like you have a typo. You have put a comma between the file handle and the argument in the second print statement. If you do that, the file handle will be seen as an argument. This seems to apply only to lexical file handles. If done with a global file handle, it will produce the fatal error
No comma allowed after filehandle at ...
So, to be clear, if you absolutely have to have parentheses for your print, do this:
print($fh $stufftowrite)
Although personally I prefer to not use parentheses unless I have to, as they just add clutter.
The Modern Perl book states in Chapter 11 ("What to Avoid"), section "Indirect Notation Scalar Limitations":
Another danger of the syntax is that the parser expects a single scalar expression as the object. Printing to a filehandle stored in an aggregate variable seems obvious, but it is not:
# DOES NOT WORK AS WRITTEN
say $config->{output} 'Fun diagnostic message!';
Perl will attempt to call say on the $config object.
print, close, and say—all builtins which operate on filehandles—operate in an indirect fashion. This was fine when filehandles were package globals, but lexical filehandles (Filehandle References) make the indirect object syntax problems obvious. To solve this, disambiguate the subexpression which produces the intended invocant:
say {$config->{output}} 'Fun diagnostic message!';
Of course, print({$fh} $stufftowrite) is also possible.
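Here is a small sketch of the block form with a handle stored in a hash, using print rather than say so that no feature needs enabling; the in-memory buffer is an assumption made so the output can be checked:

```perl
use strict;
use warnings;

my $buffer = '';
open(my $fh, '>', \$buffer) or die "Cannot open in-memory handle: $!";

my $config = { output => $fh };

# The block tells the parser that the whole expression is the filehandle.
print { $config->{output} } "Fun diagnostic message!\n";

close($fh) or die "Cannot close in-memory handle: $!";
# $buffer now holds "Fun diagnostic message!\n"
```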
It's how the syntax of print is defined. It's really that simple. There's kind of nothing to fix. If you put a comma between the file handle and the rest of the arguments, the expression is parsed as print LIST rather than print FILEHANDLE LIST. Yes, that looks really weird. It is really weird.
The way not to get parsed as print LIST is to supply an expression that can legally be parsed as print FILEHANDLE LIST. If what you're trying to do is get parentheses around the arguments to print to make it look more like an ordinary function call, you can say
print($fh $stufftowrite); # note the lack of comma
You can also say
(print $fh $stufftowrite);
if what you're trying to do is set off the print expression from surrounding code. The key point is that including the comma changes the parse.
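To make the change in parse visible, the sketch below captures both destinations in memory. It assumes a second in-memory handle ($capture, a made-up name) selected as the default output handle in place of STDOUT, so that the print LIST output can be inspected:

```perl
use strict;
use warnings;

my ($file_buffer, $default_buffer) = ('', '');
open(my $fh,      '>', \$file_buffer)    or die "Cannot open in-memory handle: $!";
open(my $capture, '>', \$default_buffer) or die "Cannot open in-memory handle: $!";

my $previous = select($capture);    # stand-in for STDOUT

print($fh 'text');     # print FILEHANDLE LIST: 'text' goes to $fh
print($fh, 'text');    # print LIST: the stringified handle and 'text'
                       # go to the default handle

select($previous);
close($fh)      or die "Cannot close in-memory handle: $!";
close($capture) or die "Cannot close in-memory handle: $!";
# $file_buffer holds 'text'; $default_buffer holds something like 'GLOB(0x...)text'
```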