Perl subroutines - do they need to be called with parentheses?

I built a simple subroutine, and I have a question about whether calling it requires parentheses.
#!/usr/bin/perl
sub echo {
    print "@_ \n";
}
echo(@ARGV);
When I use
echo @ARGV
or
echo (@ARGV)
or (without a space)
echo(@ARGV)
they all work. Which one is correct?

echo @ARGV, echo (@ARGV), and echo(@ARGV) are technically all correct in your case, but using parentheses is sometimes necessary; beyond that, it's a matter of choice, and some people employ conventions around when to use what style.
Enclosing the entire argument list in parentheses ALWAYS works - whether spaces precede the opening parenthesis or not - echo (@ARGV) or echo(@ARGV) - and whether you're calling a built-in or user-defined function (subroutine):
As @xxfelixxx notes in a comment on the question, perldoc perlstyle recommends no spaces between the function name and ( - echo(@ARGV).
Enclosing the entire argument list in parentheses can be used to disambiguate:
print (1 + 2) + 4 prints only 3, because (1 + 2) is interpreted as the entire argument list (+ 4 is then added to the return value of the print call, and that sum is never output).
print((1 + 2) + 4) resolves the ambiguity and prints 7.
Alternatively, prefix the parenthesized first argument with + to achieve the same effect: print +(1 + 2) + 4 also prints 7.
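To compare the three spellings side by side, here is a minimal sketch. It redirects print into an in-memory buffer; the buffer, the $fh handle, and the no warnings block are assumptions made for the demo, not part of the original answer.

```perl
use strict;
use warnings;

# Send print's output to an in-memory buffer so the results can be inspected.
my $out = '';
open my $fh, '>', \$out or die "cannot open in-memory handle: $!";
my $previous = select $fh;    # print with no explicit handle now writes to $fh

{
    no warnings;              # this form deliberately triggers parser warnings
    print (1 + 2) + 4;        # parsed as (print(1 + 2)) + 4, so only 3 is printed
}
print "\n";
print((1 + 2) + 4);           # outer parentheses enclose the whole argument list
print "\n";
print +(1 + 2) + 4;           # unary + stops ( from starting the argument list
print "\n";

select $previous;             # restore the original default handle
close $fh;
print $out;                   # 3, 7 and 7, one per line
```

With warnings left enabled, Perl itself flags the first form, which is usually the quickest way to spot this mistake.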
Not using parentheses - echo @ARGV - works:
with built-in functions: always
with user-defined functions: ONLY if they are predeclared, which can be ensured in one of the following ways:
The function is defined in the same script before its invocation.
The function is forward-declared with sub <name>; before its invocation.
The function is imported from a module with use before its invocation (require is not enough).
This predeclaration requirement is an unfortunate side effect of backward compatibility with the earliest Perl versions - see this answer.
In the absence of a predeclaration, the safest approach is to use parentheses (while &echo @ARGV works in principle, it bypasses any prototypes (a form of parameter typing) that the function may declare).
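A small sketch of the predeclaration rule, assuming two made-up subs (echo and shout) that are both defined at the bottom of the script: only the forward-declared one can be called without parentheses.

```perl
use strict;
use warnings;

sub echo;                       # forward declaration: the name is known from here on

my @words = ('hello', 'world');
my $joined  = echo @words;      # parenthesis-free call parses thanks to the declaration
my $shouted = shout(@words);    # not predeclared, so the parentheses are required

print "$joined\n";
print "$shouted\n";

# The definitions can come after the calls.
sub echo  { return "@_" }
sub shout { return uc "@_" }
```

Removing the sub echo; line should turn the echo @words call into a compile-time error, while the shout(@words) call keeps working.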
As for conventions:
Since using parentheses always works (even when not strictly needed) with user-defined functions, whereas they are never needed for built-in functions, some people recommend always using parentheses with user-defined functions, and never with built-in functions.
In source code that adheres to this convention, looking at any function call then tells you whether a built-in or user-defined function is being invoked.

The parens are optional. You need them in some situations to explicitly show which values are the arguments to the function, for example, when passing the result of a function call to another function:
myfunc(1, 2, otherfunc(3), "z");
Without parens around the 3, otherfunc will receive both 3 and "z" as arguments.
As xxfelixxx mentioned, it's best to use them all the time.
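To see the difference concretely, here is a sketch with hypothetical definitions of myfunc and otherfunc (the answer does not show their bodies): otherfunc reports how many arguments it received, and myfunc shows what it was handed.

```perl
use strict;
use warnings;

# Hypothetical helpers, defined before use so they can be called without parentheses.
sub otherfunc { return scalar @_ }        # reports how many arguments it received
sub myfunc    { return join ',', @_ }     # shows exactly what it was given

my $with_parens    = myfunc(1, 2, otherfunc(3), "z");   # otherfunc sees only 3
my $without_parens = myfunc(1, 2, otherfunc 3, "z");    # otherfunc swallows 3 and "z"

print "$with_parens\n";       # 1,2,1,z
print "$without_parens\n";    # 1,2,2
```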

Related

Why are @@, @!, @, etc. not interpolated in strings?

First, please note that I ask this question out of curiosity, and I'm aware that using variable names like @@ is probably not a good idea.
When using double quotes (or the qq operator), scalars and arrays are interpolated:
$v = 5;
say "$v"; # prints: 5
$@ = 6;
say "$@"; # prints: 6
@a = (1,2);
say "@a"; # prints: 1 2
Yet with array names of the form @ + special character, such as @@, @!, @,, @%, @;, etc., the array isn't interpolated:
@; = (1,2);
say "@;"; # prints the literal text @; (no interpolation)
say @; ; # prints: 12
So here is my question: does anyone know why such arrays aren't interpolated? Is it documented anywhere?
I couldn't find any information or documentation about that. There are too many articles/posts on google (or SO) about the basics of interpolation, so maybe the answer was just hidden in one of them, or at the 10th page of results..
If you wonder why I could need variable names like those :
The -n (and -p for that matter) flag adds a semicolon ; at the end of the code (I'm not sure it works on every version of perl though). So I can make this program perl -nE 'push@a,1;say"@a"}{say@a' shorter by doing instead perl -nE 'push@;,1;say"@;"}{say@', because that last ; converts say@ to say@;. Well, actually I can't do that because @; isn't interpolated in double quotes. It won't be useful every day of course, but in some golfing challenges, why not!
It can be useful to obfuscate some code. (whether obfuscation is useful or not is another debate!)
Unfortunately I can't tell you why, but this restriction comes from code in toke.c that goes back to perl 5.000 (1994!). My best guess is that it's because Perl doesn't use any built-in array punctuation variables (except for @- and @+, added in 5.6 (2000)).
The code in S_scan_const only interprets @ as the start of an array if the following character is
a word character (e.g. @x, @_, @1), or
a : (e.g. @::foo), or
a ' (e.g. @'foo (this is the old syntax for ::)), or
a { (e.g. @{foo}), or
a $ (e.g. @$foo), or
a + or - (the arrays @+ and @-), but not in regexes.
As you can see, the only punctuation arrays that are supported are @- and @+, and even then not inside a regex. Initially no punctuation arrays were supported; @- and @+ were special-cased in 2000. (The exception in regex patterns was added to make /[\c@-\c_]/ work; it used to interpolate @- first.)
There is a workaround: Because @{ is treated as the start of an array variable, the syntax "@{;}" works (but that doesn't help your golf code because it makes the code longer).
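A quick sketch of that workaround next to the plain form. The variable names are mine, and the third line uses the general "@{[ ... ]}" idiom as an alternative that is not mentioned in the answer above:

```perl
use strict;
use warnings;

@; = (1, 2);                   # punctuation variables are exempt from "use strict"

my $plain  = "@;";             # @ followed by ; is left alone: the literal text @;
my $braced = "@{;}";           # @{ starts an array, so this interpolates @;
my $listed = "@{[ @; ]}";      # the general "interpolate any expression" idiom

print "$plain\n$braced\n$listed\n";
```

On a reasonably recent perl I would expect the first line of output to be the two characters @; and the other two lines to be 1 2.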
Perl's documentation says that the result is "not strictly predictable".
The following, from perldoc perlop (Perl 5.22.1), refers to interpolation of scalars. I presume it applies equally to arrays.
Note also that the interpolation code needs to make a decision on
where the interpolated scalar ends. For instance, whether
"a $x -> {c}" really means:
"a " . $x . " -> {c}";
or:
"a " . $x -> {c};
Most of the time, the longest possible text that does not include
spaces between components and which contains matching braces or
brackets. because the outcome may be determined by voting based on
heuristic estimators, the result is not strictly predictable.
Fortunately, it's usually correct for ambiguous cases.
Some things are just because "Larry coded it that way". Or as I used to say in class, "It works the way you think, provided you think like Larry thinks", sometimes adding "and it's my job to teach you how Larry thinks."

Subroutine name beginning with underscore in Perl

Here's an example from the Markdown source:
sub _StripLinkDefinitions{ somecode }
What does it mean? Is it just a convention or a part of the language?
It is a convention, documented in perlstyle:
You can use a leading underscore to indicate that a variable or
function should not be used outside the package that defined it.
Also, Perl Best Practices (page 49) says:
Prefix “for internal use only” subroutines with an underscore.
with the explanation:
A utility subroutine exists only to simplify the implementation of a
module or class. It is never supposed to be exported from its module,
nor ever to be used in client code.
Always use an underscore as the
first “letter” of any utility subroutine’s name. A leading underscore
is ugly and unusual and reserved (by ancient C/Unix convention) for
non-public components of a system. The presence of a leading
underscore in a subroutine call makes it immediately obvious when part
of the implementation has been mistaken for part of the interface.
Related: the underscore also has special meanings as part of the language, e.g.:
variables whose name is just the underscore (see perlvar), e.g.:
$_ - the default input and pattern-searching space
@_ - the list of all subroutine arguments
_ - the special filehandle that caches the information from the last stat or file test operator (such as -f)
language constructs that start and end with a double underscore, such as __DATA__, __END__, __FILE__, __PACKAGE__, __LINE__
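Coming back to the naming convention for subroutines: the sketch below uses a made-up Greeter package (not taken from the Markdown source) to show that the leading underscore only signals intent. Perl does not stop outside code from calling such a sub.

```perl
use strict;
use warnings;

package Greeter;    # hypothetical module, for illustration only

# Public interface: meant to be called by client code.
sub greet {
    my ($name) = @_;
    return _decorate("Hello, $name");
}

# Leading underscore: "internal use only" by convention.
sub _decorate {
    my ($text) = @_;
    return "$text!";
}

package main;

my $public  = Greeter::greet('world');
my $private = Greeter::_decorate('still callable');   # nothing in the language prevents this

print "$public\n$private\n";
```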

Why does the Perl CGI module use hyphens to start named arguments?

I am a novice. My question is: what does the "-" before the keys (type, expires, name, etc.) stand for? Why not just use the plain hash table way and discard the hyphen?
#!/usr/local/bin/perl -w
use CGI;
$q = CGI->new;
print $q->header(-type=>'image/gif',-expires=>'+3d');
$q->param(-name=>'veggie',-value=>'tomato');
The author already explained this in the documentation:
Most CGI.pm routines accept several
arguments, sometimes as many as 20
optional ones! To simplify this
interface, all routines use a named
argument calling style that looks like
this:
print $q->header(-type=>'image/gif',-expires=>'+3d');
Each argument name is preceded by a
dash. Neither case nor order matters
in the argument list. -type, -Type,
and -TYPE are all acceptable. In
fact, only the first argument needs to
begin with a dash. If a dash is
present in the first argument, CGI.pm
assumes dashes for the subsequent
ones.
Several routines are commonly called
with just one argument. In the case
of these routines you can provide the
single argument without an argument
name. header() happens to be one of
these routines. In this case, the
single argument is the document type.
print $q->header('text/html');
See perlop:
If the operand is an identifier, a string consisting of a minus sign concatenated with the identifier is returned. Otherwise, if the string starts with a plus or minus, a string starting with the opposite sign is returned. One effect of these rules is that -bareword is equivalent to the string "-bareword". (emphasis mine)
This is just an older style of perl arguments that isn't usually used in newer modules. It's not exactly deprecated, it's just an older style based on how Perl allows you to not quote your hash keys if they start with a dash.
I don't know what you mean by the 'plain hashtable way'. The way CGI.pm is implemented, names of properties are (in most cases) required to be preceded by '-', presumably so that they can be identified.
Or to put it another way, the hash-key required by CGI::header to identify the 'type' property is '-type'.
That's just the way CGI.pm is defined.
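To make the mechanics concrete without depending on CGI.pm itself, here is a rough sketch. normalize_args is a hypothetical helper, not CGI.pm's actual code; it only imitates the documented behaviour that case does not matter and the dash is part of the argument name.

```perl
use strict;
use warnings;

# Hypothetical helper, not CGI.pm's actual code: accept -Name => value pairs
# and normalize them to lowercase keys without the leading dash.
sub normalize_args {
    my %given = @_;
    my %clean;
    for my $key (keys %given) {
        (my $name = lc $key) =~ s/^-//;
        $clean{$name} = $given{$key};
    }
    return %clean;
}

my %raw  = (-type => 'image/gif', -expires => '+3d');
my %args = normalize_args(-Type => 'image/gif', -EXPIRES => '+3d');

print join(',', sort keys %raw), "\n";     # the keys are the strings "-expires" and "-type"
print join(',', sort keys %args), "\n";    # expires,type
```

The first print shows that -type => ... needs no quoting: the dash simply ends up as the first character of an ordinary string key.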

Why is the 'Use of "shift" without parentheses is ambiguous' warning issued by Perl?

Does anyone know what parsing or precedence decisions resulted in the warning 'Use of "shift" without parentheses is ambiguous' being issued for code like:
shift . 'some string';
# and not
(shift) . 'some string'; # or
shift() . 'some string';
Is this intentional to make certain syntactic constructs easier? Or is it merely an artifact of the way perl's parser works?
Note: this is a discussion about language design, not a place to suggest
"@{[shift]}some string"
With use diagnostics, you get the helpful message:
Warning: Use of "shift" without parentheses is ambiguous at (eval
9)[/usr/lib/perl5/5.8/perl5db.pl:628] line 2 (#1)
(S ambiguous) You wrote a unary operator followed by something that
looks like a binary operator that could also have been interpreted as a
term or unary operator. For instance, if you know that the rand
function has a default argument of 1.0, and you write
rand + 5;
you may THINK you wrote the same thing as
rand() + 5;
but in actual fact, you got
rand(+5);
So put in parentheses to say what you really mean.
The fear is you could write something like shift .5 and it will be parsed like shift(0.5).
Ambiguous doesn't mean truly ambiguous, just ambiguous as far as the parser has determined.
shift . in particular is "ambiguous" because . can start a term (e.g. .123) or an operator, so the parser doesn't know enough to decide whether what follows is shift's operand or an operator for which shift() is the operand (and the parser isn't smart enough to know that a) the . isn't the start of such a term, or b) .123 isn't a valid operand for shift).
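For reference, a minimal sketch of the two unambiguous spellings from the question. The sub names and the ' suffix' string are placeholders of mine.

```perl
use strict;
use warnings;

# Hypothetical subs; both take their first argument and append a suffix.
sub call_parens { my $text = shift() . ' suffix'; return $text }   # explicit, empty argument list
sub term_parens { my $text = (shift) . ' suffix'; return $text }   # shift closed off as a complete term

my $first  = call_parens('a');
my $second = term_parens('b');

print "$first\n$second\n";
```

Either way the parser no longer has to guess whether the . begins an operand for shift or is the concatenation operator.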

How does this Perl one liner to check if a directory is empty work?

I got this strange line of code today, it tells me 'empty' or 'not empty' depending on whether the CWD has any items (other than . and ..) in it.
I want to know how it works because it makes no sense to me.
perl -le 'print+(q=not =)[2==(()=<.* *>)].empty'
The bit I am interested in is <.* *>. I don't understand how it gets the names of all the files in the directory.
It's a golfed one-liner. The -e flag means to execute the rest of the command line as the program. The -l enables automatic line-end processing.
The <.* *> portion is a glob containing two patterns to expand: .* and *.
This portion
(q=not =)
is a list containing a single value -- the string "not " (with a trailing space). The q=...= is an alternate string delimiter, apparently used because the single-quote is being used to quote the one-liner.
The [...] portion is the subscript into that list. The value of the subscript will be either 0 (the value "not ") or 1 (nothing, which prints as the empty string) depending on the result of this comparison:
2 == (()=<.* *>)
There's a lot happening here. The comparison tests whether or not the glob returned a list of exactly two items (assumed to be . and ..) but how it does that is tricky. The inner parentheses denote an empty list. Assigning to this list puts the glob in list context so that it returns all the files in the directory. (In scalar context it would behave like an iterator and return only one at a time.) The assignment itself is evaluated in scalar context (being on the right hand side of the comparison) and therefore returns the number of elements assigned.
The leading + is to prevent Perl from parsing the list as arguments to print. The trailing .empty concatenates the string "empty" to whatever came out of the list (i.e. either "not " or the empty string).
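The same steps can be spelled out in a longer form. In this sketch the describe helper and its @entries argument are my own stand-ins for the list that <.* *> would return, so the result does not depend on the current directory:

```perl
use strict;
use warnings;

# Hypothetical helper: @entries stands in for the list that <.* *> would return.
sub describe {
    my @entries = @_;
    my $count   = () = @entries;              # list assignment in scalar context yields the count
    my $prefix  = ('not ')[ 2 == $count ];    # index 0 gives 'not ', index 1 is past the end
    my $label   = ( defined $prefix ? $prefix : '' ) . 'empty';
    return $label;
}

my $only_dots = describe('.', '..');
my $with_file = describe('.', '..', 'notes.txt');

print "$only_dots\n$with_file\n";
```

The defined check is only there to keep use warnings quiet; the one-liner relies on the out-of-range element silently becoming an empty string.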
<.* *>
is a glob consisting of two patterns: .* matches all file names that start with . and * matches all other file names, i.e. those that do not start with . (this is different from the usual DOS/Windows conventions).
(()=<.* *>)
evaluates the glob in list context, returning all the file names that match.
Then, the comparison with 2 puts it into scalar context so 2 is compared to the number of files returned. If that number is 2, then the only directory entries are . and .., period. ;-)
<.* *> means (glob(".*"), glob("*")). glob expands file patterns the same way the shell does.
I find that the B::Deparse module helps quite a bit in deciphering some stuff that throws off most programmers' eyes, such as the q=...= construct:
$ perl -MO=Deparse,-p,-q,-sC 2>/dev/null << EOF
> print+(q=not =)[2==(()=<.* *>)].empty
> EOF
use File::Glob ();
print((('not ')[(2 == (() = glob('.* *')))] . 'empty'));
Of course, this doesn't instantly produce "readable" code, but it surely converts some of the stumbling blocks.
The documentation for that feature is here. (Scroll near the end of the section)