Using Perl `my` within actual function arguments

I want to use perl to build a document graph as readably as possible. For re-use of nodes, I want to refer to nodes using variables (or constants, if that is easier). The following code works and illustrates the idea with node types represented by literals or factory function calls to a and b. (For simple demo purposes, the functions do not create nodes but just return a string.)
sub a (@) {
return sprintf "a(%s)", join( ' ', @_ );
}
sub b (@) {
return sprintf "b(%s)", join( ' ', @_ );
}
printf "The document is: %s\n", a(
"declare c=",
$c = 1,
$e = b(
"use",
$c,
"to declare d=",
$d = $c + 1
),
"use the result",
$d,
"and document the procedure",
$e
);
The actual and expected output of this is The document is: a(declare c= 1 b(use 1 to declare d= 2) use the result 2 and document the procedure b(use 1 to declare d= 2)).
My problem arises because I want to use strict in the whole program, so variables like $c, $d, $e must be declared using my. I can, of course, write my ( $c, $d, $e ); somewhere close to the top of the text. It would be more efficient at edit time if I could use the my keyword directly at the first mention of the variable, like so:
…
printf "The document is: %s\n", a(
"declare c=",
my $c = 1,
my $e = b(
"use",
$c,
"to declare d=",
my $d = $c + 1
),
"use the result",
$d,
"and document the procedure",
$e
);
This would be kind of my favourite syntax. Unfortunately, this code yields several Global symbol "…" requires explicit package name errors. (Moreover, according to documentation, my does not return anything.)
I have the idea of such use of my from uses like in open my $file, '<', 'filename.txt' or die; or in for ( my $i = 0; $i < 100; ++$i ) {…} where declaration and definition go in one.
Since the nodes in the graph are constants, it is acceptable to use something other than lexical variables. (But I think Perl's built-in mechanisms are strongest and most efficient for lexical variables, which is why I am inclined in this direction.)
My current idea to solve the issue is to define a function named something like define which behind the scenes would manipulate the current set of lexical variables using PadWalker or similar. Yet this would not allow me to use a natural Perl-like syntax like $c = 1, which would be my preferred syntax.

I am not certain of the exact need but here's one simple way for similar manipulations.
The example in the OP wants a named variable inside the function call statement itself, so that it can be used later in that statement for another call, etc. If you must have it that way, then you can use a do block to work out your argument list:
func1(
do {
my $x = 5;
my $y = func2($x); # etc
say "Return from the do block what is then passed as arguments...";
$x, $y
}
);
This allows you to do things of the kind that your example indicates.†
If you also want to have names available in the subroutine then pass a hash (or a hashref), with suitably chosen key names for variables, and in the sub work with key names.
Alternatively, consider normally declaring your variables ahead of the function call. There's nothing bad about it, while there are many good things. You can throw in a little wrapper and make it look nice, too.
† More specifically
printf "The document is: %s\n", a( do {
my $c = 1;
my $d = $c + 1;
my $e = b( "use", $c, "to declare d=", $d );
# Return a list from this `do`, which is then passed as arguments to a()
"declare c=", $c, $e, "use the result", $d,"and document the procedure", $e
} );
(condensed into fewer lines for posting here)
This do block is a half-way measure toward moving this code into a subroutine, as I presume that there are reasons to want this inlined. However, since comments indicate that the reality is even more complex I'd urge you to write a normal sub instead (in which a graph can be built, btw).
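As a sketch of the do-block approach above, here is a self-contained version that runs under use strict. The subs a and b are the question's toy factories (without the prototype); nothing beyond core Perl is assumed.

```perl
use strict;
use warnings;

# The question's toy "factories": they only format a string.
sub a { return sprintf "a(%s)", join( ' ', @_ ); }
sub b { return sprintf "b(%s)", join( ' ', @_ ); }

# The do block declares the lexicals in its own statements; its last
# expression (a list) becomes the argument list passed to a().
my $doc = a( do {
    my $c = 1;
    my $d = $c + 1;
    my $e = b( "use", $c, "to declare d=", $d );
    ( "declare c=", $c, $e, "use the result", $d,
      "and document the procedure", $e );
} );

print "The document is: $doc\n";
```

The variables stay private to the do block, so nothing leaks into the surrounding scope.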

according to documentation, my does not return anything
The documentation doesn't say that, and it's not the case.
Haven't you ever done my $x = 123;? If so, you've assigned to the result of my $x. my simply returns the newly created variable as an lvalue (assignable value), so my $x simply returns $x.
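A small sketch of my returning its variable as an lvalue: both the parenthesised declaration and the common chomp( my $line = ... ) idiom operate on the freshly declared variable. The variable names are only illustrative.

```perl
use strict;
use warnings;

# my returns the newly declared variable as an lvalue, so the declaration
# can be assigned to and then modified in the same statement.
( my $x = 5 ) += 1;             # $x is now 6

# The familiar idiom: chomp operates on the variable my just created.
chomp( my $line = "text\n" );   # $line is now "text"

print "$x $line\n";
```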
Unfortunately, this code yields several [strict vars] errors.
Symbols (variables) created by my are only visible starting with the following statement.
For better or for worse, it allows the following:
my $x = 123;
{
my $x = $x;
$x *= 2;
say $x; # 246
}
say $x; # 123
I want to use perl to build a document graph as readably as possible.
So why not do that? Right now, you are building a string, not a graph. Build a graph of objects that resolve to a string after the graph has been constructed. You can build those object with a tree of sub calls (declare( c => [ use( c => ... ), ... ] )). I'd give a better example, but the grammar of what you are generating isn't clear to me.
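A minimal sketch of that idea, assuming a hypothetical Node class (not from any library): each node holds its children, a shared node is stored once and referenced twice, and the string is produced only by render() after the graph is built. The names Node, new and render are invented for illustration, and the package BLOCK syntax assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical node class: a node stores its type and its children and is
# turned into text only when render() is called on the finished graph.
package Node {
    sub new {
        my ( $class, $type, @children ) = @_;
        return bless { type => $type, children => [@children] }, $class;
    }

    sub render {
        my ($self) = @_;
        # Children are plain values or other Node objects (rendered recursively).
        my @parts = map { ref $_ ? $_->render : $_ } @{ $self->{children} };
        return sprintf "%s(%s)", $self->{type}, join( ' ', @parts );
    }
}

my $c = 1;
my $d = $c + 1;
my $e = Node->new( 'b', 'use', $c, 'to declare d=', $d );

# $e is referenced twice but exists once: the graph shares the node.
my $doc = Node->new( 'a', 'declare c=', $c, $e, 'use the result', $d,
    'and document the procedure', $e );

my $text = $doc->render;
print "The document is: $text\n";
```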

Your argument list makes two references each to $c, $d and $e. If you prefix the first reference with my, the new lexical won't be in scope until the next statement, so the second reference in the same statement refers to a different variable (the undeclared package variable, which violates strict vars).
Declare my ($c,$d,$e) before your function call. There is nothing wrong or inelegant about doing that.
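For comparison, the question's own call runs unchanged under use strict once the three lexicals are declared up front. This sketch reuses the toy a and b subs from the question (without prototypes).

```perl
use strict;
use warnings;

sub a { return sprintf "a(%s)", join( ' ', @_ ); }
sub b { return sprintf "b(%s)", join( ' ', @_ ); }

# Declare once, up front; the assignments inside the argument list then
# refer to these lexicals and satisfy strict vars.
my ( $c, $d, $e );
my $doc = a(
    "declare c=", $c = 1,
    $e = b( "use", $c, "to declare d=", $d = $c + 1 ),
    "use the result", $d,
    "and document the procedure", $e,
);

print "The document is: $doc\n";
```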

Related

Can I make a variable optional in a perl sub prototype?

I'd like to understand if it's possible to have a sub prototype and optional parameters in it. With prototypes I can do this:
sub some_sub (\@\@\@) {
...
}
my @foo = qw/a b c/;
my @bar = qw/1 2 3/;
my @baz = qw/X Y Z/;
some_sub(@foo, @bar, @baz);
which is nice and readable, but the minute I try to do
some_sub(@foo, @bar);
or even
some_sub(@foo, @bar, ());
I get errors:
Not enough arguments for main::some_sub at tablify.pl line 72, near "@bar)"
or
Type of arg 3 to main::some_sub must be array (not stub) at tablify.pl line 72, near "))"
Is it possible to have a prototype and a variable number of arguments? or is something similar achievable via signatures?
I know it could be done by always passing array refs, but I was wondering if there was another way. After all, TMTOWTDI.
All arguments after a semi-colon are optional:
sub some_sub(\@\@;\@) {
}
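A short sketch showing the semicolon prototype in use: both calls from the question compile, and the sub checks whether the optional third array reference arrived. The counting logic is only a placeholder for whatever some_sub really does.

```perl
use strict;
use warnings;

# Two required arrays and one optional one; each arrives as an array ref.
sub some_sub (\@\@;\@) {
    my ( $first, $second, $third ) = @_;
    $third //= [];    # the optional third array was not passed
    return scalar(@$first) + scalar(@$second) + scalar(@$third);
}

my @foo = qw/a b c/;
my @bar = qw/1 2 3/;
my @baz = qw/X Y Z/;
my $all = some_sub( @foo, @bar, @baz );   # three arrays
my $two = some_sub( @foo, @bar );         # third one omitted
```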
Most people are going to expect your argument list to flatten, and you are reaching for an outdated tool to do what people don't expect.
Instead, pass data structures by reference:
some_sub( \@array1, \@array2 );
sub some_sub {
my @args = @_;
say "Array 1 has " . $args[0]->@* . " elements";
}
If you want to use those as named arrays within the sub, you can use ref aliasing
use v5.22;
use experimental qw(ref_aliasing);
sub some_sub {
\my( @array1 ) = $_[0];
...
}
With v5.26, you can move the reference operator inside the parens:
use v5.26;
use experimental qw(declared_refs);
sub some_sub {
my( \@array1 ) = $_[0];
...
}
And, remember that v5.20 introduced the :prototype attribute so you can distinguish between prototypes and signatures:
use v5.20;
sub some_sub :prototype(\@\@;\@) { ... }
I write about these things at The Effective Perler (which you already read, I see), in Perl New Features, a little bit in Preparing for Perl 7 (which is mostly about what you need to stop doing in Perl 5 to be future proof).

How do I pass in a variable from one function into another in perl

I am initializing a variable within one function and would like to pass this variable into another function. This variable holds a char value.
I have tried passing by reference and dereferencing, declaring the variables outside of the function, and using local.
I've also looked in perlmonks, perl by example, googled and looked through this site for a solution but to no avail. I'm just starting out with perl programming so any help will be appreciated!
Sounds to me like you need to read through some documentation, not just google around. I would suggest http://www.perl.org/books/beginning-perl/.
use strict;
use warnings;
sub foo {
my $char = 'A';
bar($char);
}
sub bar {
my ($bar_char) = @_;
print "bar got char $bar_char\n";
}
foo();
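If you pass a parameter by reference (see below), it can be modified by the first function and you can then pass it to another function:
#!/usr/bin/perl
use strict;
use warnings;
sub f {
my $ref = shift;   # a reference to the caller's variable
$$ref = 'm';       # write through the reference
}
my $c = 'a';
f(\$c);
print $c;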
This will print 'm'
Is there a reason why your first function cannot return this variable?
my $config_variable = function1( $param1 );
function2 ( $config_variable, $param2 );
You can also pass more than one variable back too:
my ( $config_variable, $value ) = function1( $param1 );
my $value2 = function2( $param1, $config_variable );
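A runnable sketch of returning more than one value; function1 and function2 here are hypothetical stand-ins for the question's functions, chosen only so the result is easy to check.

```perl
use strict;
use warnings;

# Hypothetical functions: function1 computes two values and returns both,
# function2 consumes one of them.
sub function1 {
    my ($param) = @_;
    return ( uc $param, length $param );
}

sub function2 {
    my ( $param, $config ) = @_;
    return "$config:$param";
}

my ( $config_variable, $value ) = function1('abc');   # list assignment
my $value2 = function2( 'xyz', $config_variable );
print "$config_variable $value $value2\n";
```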
This would be the best way. However, you can use globally defined variables and they can be used from function to function:
#! /usr/bin/env perl
#
use strict;
use warnings;
my $value;
func1();
func2();
sub func1 {
$value = "foo";
}
sub func2 {
print "Value = $value\n";
}
Note that I declared $value outside of both functions, so it's visible in the entire file, even in the subroutines. Now func1 can set it, and func2 can print it.
The technical term for this is: a terrible, awful, evil idea, and you should never, ever think of doing it.
This is because a variable you think is set to one value can suddenly and mysteriously change value for no apparent reason. Doing this for one variable is bad enough, but if you use this as a crutch, you'll end up with dozens of variables that are impossible to track through your program.
If you find yourself doing this quite a bit, you may need to rethink your code logic.

Perl function prototypes

Why do we use function prototypes in Perl?
What are the different prototypes available? How do we use them?
Example: $$, $@, \@@ what do they mean?
You can find the description in the official documentation: http://perldoc.perl.org/perlsub.html#Prototypes
But more important: read why you should not use function prototypes: Why are Perl 5's function prototypes bad?
To write some functions, prototypes are absolutely necessary, as they change the way arguments are passed, how the sub invocations are parsed, and in what context the arguments are evaluated.
Below are discussions on prototypes with the builtins open and bless, as well as the effect on user-written code like a fold_left subroutine. I come to the conclusion that there are a few scenarios where they are useful, but they are generally not a good mechanism to cope with signatures.
Example: CORE::open
Some builtin functions have prototypes, e.g. open. You can get the prototype of any function with something like say prototype "CORE::open". We get *;$@. This means:
The * takes a bareword, glob, globref or scalar. E.g. STDOUT or my $fh.
The ; makes the following arguments optional.
The $ evaluates the next item in scalar context. We'll see in a minute why this is good.
The @ allows any number of arguments.
This allows invocations like
open FOO; (very bad style, equivalent to open FOO, our $FOO)
open my $fh, @array;, which parses as open my $fh, scalar(@array). Useless.
open my $fh, "<foo.txt"; (bad style, allows shell injection)
open my $fh, "<", "foo.txt"; (good three-arg-open)
open my $fh, "-|", @command; (now @command is evaluated in list context, i.e. is flattened)
So why should the second argument have scalar context? (1) Either you use traditional two-arg open; then it isn't difficult to pass the first element explicitly. (2) Or you want three-arg (rather: multi-arg) open; then having an explicit mode in the source code is necessary, which is good style and reduces action at a distance. So this forces you to decide between the outdated, flexible two-arg form and the safe multi-arg form.
Further restrictions, like that the < mode can only take one filename, while -| takes at least one string (the command) plus any number of arguments, are implemented on a non-syntactic level.
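To check a builtin's prototype yourself, prototype can be called with a CORE:: name, as described above. A minimal sketch:

```perl
use strict;
use warnings;

# prototype() accepts "CORE::name" to report a builtin's prototype.
my $open_proto = prototype "CORE::open";
print "open: $open_proto\n";
```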
Example: CORE::bless
Another interesting example is the bless function. Its prototype is $;$. I.e. takes one or two scalars.
This allows bless $self; (blesses into the current package), or the better bless $self, $class. However, my @array = ($self, $class); bless @array does not work, as scalar context is imposed on the first arg. So the first argument is not a reference, but the number 2. This reduces action at a distance, and fails rather than providing a probably wrong interpretation: either bless $array[0], $array[1] or bless \@array could have been meant here. So prototypes help and augment input validation, but are no substitute for it.
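The scalar context imposed by $ can be seen with a user-defined sub that copies bless's prototype. first_arg is a hypothetical name used only for this sketch:

```perl
use strict;
use warnings;

# A hypothetical sub with the same prototype as bless: one or two scalars.
sub first_arg ($;$) { return $_[0] }

my @array = ( 'self', 'class' );

# The $ slot imposes scalar context, so @array becomes its length.
my $got = first_arg(@array);    # 2, not 'self'

my $bless_proto = prototype "CORE::bless";
print "first_arg got $got; bless prototype is $bless_proto\n";
```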
Example fold_left
Let us define a function fold_left that takes an action and a list as arguments. It performs this action on the first two values of the list, and replaces them with the result. This repeats until only one element, the return value, is left.
Simple implementation:
sub fold_left {
my $code = shift;
while ($#_) { # loop while more than one element
my ($x, $y) = splice @_, 0, 2;
unshift @_, $code->($x, $y);
}
return $_[0];
}
This can be called like
my $sum = fold_left sub{ $_[0] + $_[1] }, 1 .. 10;
my $str = fold_left sub{ "$_[0] $_[1]" }, 1 .. 10;
my $undef = fold_left;
my $runtime_error = fold_left \"foo", 1..10;
But this is unsatisfactory: we know that the first argument is a sub, so the sub keyword is redundant. Also, we can call it without a sub, which we want to be illegal. With prototypes, we can work around that:
sub fold_left (&@) { ... }
The & states that we'll take a coderef. If this is the first argument, this allows the sub keyword and the comma after the sub block to be omitted. Now we can do
my $sum = fold_left { $_[0] + $_[1] } 1 .. 10; # aka List::Util::sum(1..10);
my $str = fold_left { "$_[0] $_[1]" } 1 .. 10; # aka join " ", 1..10;
my $compile_error1 = fold_left; # ERROR: not enough arguments
my $compile_error2 = fold_left "foo", 1..10; # ERROR: type of arg 1 must be sub{} or block.
which is reminiscent of map {...} @list
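Putting the pieces together, a self-contained version of fold_left with the (&@) prototype might look like this (same algorithm as the simple implementation above):

```perl
use strict;
use warnings;

# fold_left with the (&@) prototype: a leading block, then a list.
sub fold_left (&@) {
    my $code = shift;
    while ($#_) {                        # loop while more than one element
        my ( $x, $y ) = splice @_, 0, 2;
        unshift @_, $code->( $x, $y );   # replace the pair with its result
    }
    return $_[0];
}

my $sum = fold_left { $_[0] + $_[1] } 1 .. 10;
my $str = fold_left { "$_[0] $_[1]" } 1 .. 10;
print "$sum\n$str\n";
```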
On backslash prototypes
Backslash prototypes allow you to capture typed references to arguments without imposing context. This is good when we want to pass an array without flattening it. E.g.
sub mypush (\@@) {
my ($arrayref, @push_these) = @_;
my $len = @$arrayref;
@$arrayref[$len .. $len + $#push_these] = @push_these;
}
my #array;
mypush @array, 1, 2, 3;
You can think of the \ protecting the @ like in regexes, thus requiring a literal @ character on the argument. This is where prototypes are a sad story: requiring literal characters is a bad idea. We can't even pass a reference directly; we have to dereference it first:
my $array = [];
mypush @$array, 1, 2, 3;
even though the called code sees and wants exactly that reference. From v5.14 on, the + prototype can be used instead. It accepts an array, arrayref, hash or hashref (actually, it's like $ on scalar arguments, and \[@%] on hashes and arrays). This prototype does no type validation; it'll just make sure you receive a reference unless the argument already is a scalar.
sub mypush (+@) { ... }
my @array;
mypush @array, 1, 2, 3;
my $array_ref = [];
mypush $array_ref, 1, 2, 3; # works as well! yay
my %hash;
mypush %hash, 1, 2, 3; # syntactically legal, but will throw fatal on dereferencing.
mypush "foo", 1, 2, 3; # ditto
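A fuller sketch of the + version of mypush, assuming Perl 5.14 or later; the body here simply uses push rather than the slice assignment shown earlier:

```perl
use strict;
use warnings;

# The + prototype passes an array (or hash) as a reference, and passes a
# scalar argument, such as an existing array ref, through unchanged.
sub mypush (+@) {
    my ( $arrayref, @push_these ) = @_;
    push @$arrayref, @push_these;
    return scalar @$arrayref;
}

my @array;
mypush @array, 1, 2, 3;       # named array, no backslash needed

my $array_ref = [];
mypush $array_ref, 4, 5;      # an array ref works as well
print "@array / @$array_ref\n";
```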
Conclusion
Prototypes are a great way to bend Perl to your will. Recently I was investigating how pattern matching from functional languages can be implemented in Perl. The match itself has the prototype $% (one scalar thing which is to be matched, and an even number of further arguments. These are pairs of patterns and code).
They are also a great way to shoot yourself in the foot, and can be downright ugly. From List::MoreUtils:
sub each_array (\@;\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@) {
return each_arrayref(@_);
}
This allows you to call it as each_array @a, @b, @c ..., but it isn't much effort to directly do each_arrayref \@a, \@b, \@c, ..., which imposes no limit on the number of parameters and is more flexible.
Especially prototypes like sub foo ($$$$$$;$$) indicate a code smell, and that you should move to named parameters, Method::Signatures, or Params::Validate.
In my experience, good prototypes are
@, % to slurp any (or an even) number of args. Note that @ as sole prototype is equivalent to no prototype at all.
& leading codeblocks for nicer syntax.
$ iff you need to pad a slurpy @ or %, but not on their own.
I actively dislike \@ etc., and have yet to see a good use for _ aside from length (_ can be the last required argument in a prototype. If no explicit value is given, $_ is used.)
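A small sketch of the _ prototype; mylength is a hypothetical wrapper around length:

```perl
use strict;
use warnings;

# Hypothetical wrapper: _ as the last argument means "use $_ if omitted".
sub mylength (_) { return length $_[0] }

$_ = "hello";
my $implicit = mylength();        # no argument given, so $_ is used
my $explicit = mylength("hi");    # explicit argument wins
print "$implicit $explicit\n";
```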
Having a good documentation and requiring the user of your subs to include the occasional backslash before your arguments is generally preferable to unexpected action at a distance or having scalar context imposed surprisingly.
Prototypes can be bypassed by calling a sub as &foo(@args), and they aren't honoured on method calls, so they are useless there.
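A quick illustration of the & call form skipping the prototype; takes_scalar is a hypothetical sub used only for this demo:

```perl
use strict;
use warnings;

# Hypothetical sub whose ($) prototype imposes scalar context.
sub takes_scalar ($) { return $_[0] }

my @args = ( 10, 20, 30 );
my $with_proto    = takes_scalar(@args);     # scalar(@args), i.e. 3
my $without_proto = &takes_scalar(@args);    # prototype skipped: first element
print "$with_proto $without_proto\n";
```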

Data types for parameters of subroutines / functions?

In Perl, can one specify data types for the parameters of subroutines? E.g. when using a dualvar in a numeric context like exit:
use constant NOTIFY_DIE_MAIL_SEND_FAILED => dualvar 3, 'NOTIFY_DIE_MAIL_SEND_FAILED';
exit NOTIFY_DIE_MAIL_SEND_FAILED;
How does Perl in that case know that exit expects a numeric parameter? I didn't see a way to define data types for the parameters of subroutines like you do in Java (where I could understand how the data type is known, as it is explicitly defined).
The whole point of the dualvar is that it behaves as a number or text depending on what you want. In cases where that's not obvious (to you more importantly than to perl) then make it clear.
exit 0 + NOTIFY_DIE_MAIL_SEND_FAILED;
As for explicitly typing parameters, that's not something built in. Perl is a much more dynamic language than Java so it's not common to check/force the type of every parameter or variable. In particular, a perl sub can accept different numbers of parameters and even different structures.
If you want to validate parameters (for an external API for example) try something like Params::Validate
In addition, Moose and Moo allow a certain level of attribute typing and even coercion.
In Perl, scalars are both numeric and stringy at the same time. It is not the variables themselves that distinguish between strings and numbers, but the operators you work with. While the addition + only uses a number, the concatenation . only uses strings.
In more strongly typed languages, e.g. Java, the + operator doubles as addition and concatenation operator, because it can access type information.
So "1" + 2 + 3 yields the string "123" in Java, whereas Perl can cleanly distinguish between "1" + 2 + 3 == 6 and "1" . 2 . 3 eq "123".
You can force numeric or stringy context of a variable by adding 0 or concatenating the empty string:
sub foo {
my ($var) = @_;
$var += 0; # $var is numeric
$var .= ""; # $var is stringy now
}
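A small sketch of "the operator decides", including a dualvar from the core Scalar::Util module; the constant name is reused from the question:

```perl
use strict;
use warnings;
use Scalar::Util qw(dualvar);

# The operator, not the variable, decides how a value is treated.
my $sum    = "1" + 2 + 3;    # numeric addition
my $joined = "1" . 2 . 3;    # string concatenation

# A dualvar carries separate numeric and string values.
my $status    = dualvar( 3, 'NOTIFY_DIE_MAIL_SEND_FAILED' );
my $as_number = 0 + $status;    # numeric view
my $as_string = "$status";      # string view
print "$sum $joined $as_number $as_string\n";
```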
Perl is quite different from Java in that Perl is a dynamically typed language: it does not require its variables to be typed at compile time.
Java, on the other hand, is statically typed (as you know already).
Perl determines how a variable is treated depending upon the context it is used in.
There are two main contexts:
List Context
Scalar Context
And the context is defined by the operator or function that is used.
For example:
use strict;
use warnings;
use feature 'say';
# Define a list
my @arr = qw/rohit jain/;
# Define a scalar
my $num = 2;
# Here perl will evaluate @arr in scalar context and take its length,
# so the code below evaluates to: value = 2 / 2
my $value = @arr / $num;
# Here, since it is used with a foreach loop, @arr is taken in list context
foreach (@arr) {
say $_;
}
# The foreach loop above prints `rohit` and `jain` on separate lines.
You can force the type by:
use feature 'say';
use Scalar::Util qw(dualvar);
use constant NOTIFY_DIE_MAIL_SEND_FAILED => dualvar 3, 'NOTIFY_DIE_MAIL_SEND_FAILED';
say NOTIFY_DIE_MAIL_SEND_FAILED;
say int(NOTIFY_DIE_MAIL_SEND_FAILED);
output:
NOTIFY_DIE_MAIL_SEND_FAILED
3
How does Perl in that case know, that exit expects a numeric parameter?
exit expects a number as part of its specification, and its behaviour is kind of undefined if you pass it a non-integer value (i.e. you should not do it).
Now, in this particular case, how does dualvar manage to return either value type depending on the context?
I don't know how Scalar::Util's dualvar is implemented but you can write something similar with overload instead.
You certainly can modify the behaviour for a blessed object:
#!/usr/bin/env perl
use strict;
use warnings;
{package Dualvar;
use overload
fallback => 1,
'0+' => sub { $_[0]->{INT_VAL} },
'""' => sub { $_[0]->{STR_VAL} };
sub new {
my $class = shift;
my $self = { INT_VAL => shift, STR_VAL => shift };
bless($self,$class);
}
1;
}
my $x = Dualvar->new(31,'Thirty-One');
print $x . " + One = ",$x + 1,"\n"; # Thirty-One + One = 32
From the docs, it seems that overload actually changes the behaviour within the declaration scope, so you should be able to change the behaviour of some common operators locally for any operand.
If exit does use one of those overloadable operations to evaluate its parameter into an integer, then this solution would work.
I didn't see a way to define data types for the parameters of subroutines like you do it in Java?
As already said by others... this is not the case in Perl, at least not at compilation time, except for subroutine prototypes but these don't offer much type granularity (like int vs strings or different object classes).
Richard has mentioned some run-time alternatives you may use. I personally would recommend Moose if you don't mind the performance penalty.
What Rohit Jain said is correct. A function that wants input to follow certain rules simply has to explicitly check that the input is valid.
For example
sub foo
{
my ($param1,$param2) = @_;
$param1 =~ /^\d+$/ or die "Parameter 1 must be a positive integer.";
$param2 =~ /^(bar|baz)$/ or die "Parameter 2 must be either 'bar' or 'baz'";
...
}
This may seem like a pain, but:
The extra flexibility gained generally outweighs the work involved in doing this.
Simply having the correct data type is often not enough to ensure that you have valid input, so you end up doing a lot of this anyway, even in a language like Java.
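A runnable version of the validation sketch above, unpacking both arguments from @_ and anchoring with \z so a trailing newline is rejected; the eval shows how a caller can catch the error. The return value is only a placeholder.

```perl
use strict;
use warnings;

# Explicit run-time validation: die with a message on bad input.
sub foo {
    my ( $param1, $param2 ) = @_;
    $param1 =~ /^\d+\z/ or die "Parameter 1 must be a non-negative integer.\n";
    $param2 =~ /^(?:bar|baz)\z/ or die "Parameter 2 must be either 'bar' or 'baz'\n";
    return "$param1/$param2";    # placeholder for the real work
}

my $ok = foo( 42, 'bar' );

# A caller can trap the failure with eval and inspect $@.
my $err = eval { foo( 'x', 'bar' ); 1 } ? '' : $@;
print "ok=$ok err=$err";
```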

Scope of the default variable $_ in Perl

I have the following method which accepts a variable and then displays info from a database:
sub showResult {
if (@_ == 2) {
my @results = dbGetResults($_[0]);
if (@results) {
foreach (@results) {
print "$count - $_[1] (ID: $_[0])\n";
}
} else {
print "\n\nNo results found";
}
}
}
Everything works fine, except the print line in the foreach loop. This $_ variable still contains the values passed to the method.
Is there anyway to 'force' the new scope of values on $_, or will it always contain the original values?
If there are any good tutorials that explain how the scope of $_ works, that would also be cool!
Thanks
The problem here is that you're really using @_ instead of $_. The foreach loop changes $_, the scalar variable, not @_, which is what you're accessing if you index it as $_[X]. Also, check the code again to see what is inside @results. If it is an array of arrays or refs, you may need to use the indirect ${$_}[0] or something like that.
In Perl, the _ name can refer to a number of different variables:
The common ones are:
$_ the default scalar (set by foreach, map, grep)
@_ the default array (set by calling a subroutine)
The less common:
%_ the default hash (not used by anything by default)
_ the default file handle (used by file test operators)
&_ an unused subroutine name
*_ the glob containing all of the above names
Each of these variables can be used independently of the others. In fact, the only way that they are related is that they are all contained within the *_ glob.
Since the sigils vary with arrays and hashes, when accessing an element, you use the bracket characters to determine which variable you are accessing:
$_[0] # element of @_
$_{...} # element of %_
$$_[0] # first element of the array reference stored in $_
$_->[0] # same
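A compact sketch of the mix-up in the question: inside a sub, $_[0] reads the sub's arguments, while $_->[0] reads the current loop element. first_columns is a hypothetical sub for illustration:

```perl
use strict;
use warnings;

# Inside a sub, $_[0] is the first element of @_ (the sub's arguments),
# while $_->[0] is the first element of the array ref currently in $_.
sub first_columns {
    my @seen;
    for ( [ 'row1', 'x' ], [ 'row2', 'y' ] ) {
        push @seen, "$_[0]:$_->[0]";    # argument vs. loop element
    }
    return @seen;
}

my @out = first_columns('arg');
print "@out\n";
```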
The for/foreach loop can accept a variable name to use rather than $_, and that might be clearer in your situation:
for my $result (@results) {...}
In general, if your code is longer than a few lines, or nested, you should name the variables rather than relying on the default ones.
Since your question was related more to variable names than scope, I have not discussed the actual scope surrounding the foreach loop, but in general, the following code is equivalent to what you have.
for (my $i = 0; $i <= $#results; $i++) {
local *_ = \$results[$i];
...
}
The line local *_ = \$results[$i] installs the $ith element of @results into the scalar slot of the *_ glob, aka $_. At this point $_ contains an alias of the array element. The localization will unwind at the end of the loop. local creates a dynamic scope, so any subroutines called from within the loop will see the new value of $_ unless they also localize it. There is much more detail available about these concepts, but I think they are outside the scope of your question.
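A short sketch of the two properties described above: $_ is an alias to each element, and the loop restores the previous $_ on exit.

```perl
use strict;
use warnings;

my @results = ( 1, 2, 3 );
$_ = 'outer';

# foreach aliases $_ to each element in turn, so assigning to $_
# modifies the array itself.
for (@results) {
    $_ *= 10;
}

# The loop localized $_, so its previous value is back.
my $after = $_;
print "@results / $after\n";
```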
As others have pointed out:
You're really using @_ and not $_ in your print statement.
It's not good to keep stuff in these variables, since they're used elsewhere.
Officially, $_ and @_ are global variables and aren't members of any package. Perl 5.10 through 5.22 let you declare a lexical my $_, although that was experimental, probably a really, really bad idea, and has since been removed (in Perl 5.24). The problem is that Perl could use them without you even knowing it. It's bad practice to depend upon their values for more than a few lines.
Here's a slight rewrite of your program, getting rid of the dependency on @_ and $_ as much as possible:
sub showResults {
my $foo = shift; #Or some meaningful name
my $bar = shift; #Or some meaningful name
if (not defined $bar) {
print "didn't pass two parameters\n";
return; #No need to hang around
}
if (my @results = dbGetResults($foo)) {
foreach my $item (@results) {
...
}
}
}
Some modifications:
I used shift to give your two parameters actual names. foo and bar aren't good names, but I couldn't find out what dbGetResults was from, so I couldn't figure out what parameters you were looking for. The @_ is still being used when the parameters are passed, and my shift is depending upon the value of @_, but after the first two lines, I'm free.
Since your two parameters have actual names, I can use if (not defined $bar) to see if both parameters were passed. I also changed this to the negative. This way, if they didn't pass both parameters, you can exit early. Your code has one less indent, and you don't have an if structure that takes up your entire subroutine. It makes it easier to understand your code.
I used foreach my $item (@results) instead of foreach (@results) and depending upon $_. Again, it's clearer what your program is doing, and you wouldn't have confused $_->[0] with $_[0] (I think that's what you were doing). It would have been obvious you wanted $item->[0].
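A self-contained sketch of the rewritten sub; dbGetResults is a hypothetical stub returning array refs, since the real database call isn't shown, and the output format is only a guess at what the original wanted. Lines are collected and returned so the result can be checked.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real database call (not shown in the
# question): returns a list of array refs, or nothing for 'empty'.
sub dbGetResults {
    my ($id) = @_;
    return if $id eq 'empty';
    return ( [ 1, 'first' ], [ 2, 'second' ] );
}

sub showResults {
    my ( $id, $label ) = @_;    # name the arguments once, then stop using @_
    if ( not defined $label ) {
        print "didn't pass two parameters\n";
        return;                 # exit early instead of nesting everything
    }
    my @lines;
    if ( my @results = dbGetResults($id) ) {
        foreach my $item (@results) {    # named loop variable instead of $_
            push @lines, "$item->[0] - $label (ID: $id)";
        }
    }
    else {
        push @lines, "No results found";
    }
    print "$_\n" for @lines;
    return @lines;
}

my @lines = showResults( 'abc', 'Widgets' );
```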