Passing arrays to functions in Perl - perl

I think I have misunderstood some aspects of argument passing to functions in Perl. What's the difference between func(\@array) and func(@array)?
AFAIK, in both functions, arguments are passed by reference and in both functions we can change the elements of @array in the main program. So what's the difference? When should we use which?
@array = (1,2,3);
func(@array);
func(\@array);
sub func {
...
}
Also, how do I imitate pass-by-value in Perl? Is using @_ the only way?

It's impossible to pass arrays to subs. Subs take a list of scalars for argument. (And that's the only thing they can return too.)
You can pass a reference to an array:
func(\@array)
You can pass the elements of an array:
func(@array)
When should we use which?
If you want to pass more than just the elements of the array (e.g. pass $x, $y and @a), it can become tricky unless you pass a reference.
If you're going to process lists (e.g. sum mysub grep { ... } ...), you might not want to pass a reference.
If you want to modify the array (as opposed to just modifying the existing elements of the array), you need to pass a reference.
It can be more efficient to pass a reference for long arrays, since creating and putting one reference on the stack is faster than creating an alias for each element of a large array. This will rarely be an issue, though.
It's usually decided by one of the first two of the above. Beyond that, it's mostly a question of personal preference.
Also, how do I imitate pass-by-value in Perl?
sub foo {
my ($x) = @_; # Changing $x doesn't change the argument.
...
}
sub foo {
my @a = @_; # Changing @a or its contents
... # doesn't change the arguments.
}

AFAIK, in both functions, arguments are passed by reference and in both functions we can change the elements of @array in the main program.
"change the elements of", yes. However, in the func(@array) case, the sub has no means to make other changes to the array (truncating it, pushing, popping, slicing, passing a reference to something else, even undef'ing it).
I would avoid using the term "passed by reference", since the mechanism is completely different than Perl's references. It is less overloaded :) to say that in the sub, @_'s elements start off aliased to the elements passed to the sub.
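A minimal sketch of that distinction, for illustration only. The sub names bump_elements, try_push and push_via_ref are made up here and do not come from the question.

```perl
use strict;
use warnings;

my @array = (1, 2, 3);

# Elements of @_ are aliases: assigning to them changes the caller's data.
sub bump_elements { $_ += 10 for @_ }

# Pushing onto @_ only grows the sub's own @_, not the caller's array.
sub try_push { push @_, 99 }

# With a reference the sub can reach the array itself.
sub push_via_ref { my ($aref) = @_; push @$aref, 99 }

bump_elements(@array);          # existing elements are modified in place
try_push(@array);               # no effect on @array
my $len_after_try = @array;     # still 3
push_via_ref(\@array);          # @array gains a fourth element

print "@array\n";               # 11 12 13 99
```

Only the call that receives \@array can change the array's length; the other two can touch existing elements at most.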

func(\@array) passes a reference. func(@array) passes a list (of the elements in @array). As Keith pointed out, these elements are passed by reference. However, you can make a copy inside the sub in order to pass by value.
What you are after is this:
sub func {
my @array = @_;
}
This will pass a copy of the arguments of func to @array, which is a local variable within the scope of the subroutine.
Documentation here

Related

How to avoid input modification in PDL subroutines

I would like to avoid the assignment operator .= to modify the user input from a subroutine.
One way to avoid this is to perform a copy of the input inside the subroutine. Is this the best way to proceed? Are there other solutions?
use PDL;use strict;
my $a=pdl(1);
f_0($a);print "$a\n";
f_1($a);print "$a\n";
sub f_0{
my($input)=@_;
my $x=$input->copy;
$x.=0;
}
sub f_1{
my($input)=@_;
$input.=0;
}
In my case (perl 5.22.1), executing the script above prints 1 and 0 on two lines. f_0 does not modify user input in-place, while f_1 does.
According to the FAQ 6.17 What happens when I have several references to the same PDL object in different variables :
Piddles behave like Perl references in many respects. So when you say
$a = pdl [0,1,2,3]; $b = $a;
then both $b and $a point to the same
object, e.g. then saying
$b++;
will not create a copy of the original piddle but just
increment in place
[...]
It is important to keep the "reference nature" of piddles in mind when
passing piddles into subroutines. If you modify the input piddles you
modify the original argument, not a copy of it. This is different from
some other array processing languages but makes for very efficient
passing of piddles between subroutines. If you do not want to modify
the original argument but rather a copy of it just create a copy
explicitly...
So yes, to avoid modification of the original, create a copy as you did:
my $x = $input->copy;
or alternatively:
my $x = pdl( $input );

Perl: reference to a hash to pass to another routine

In code that uploads excel spreadsheets it gives me the data in array ref:
for( @{$listref} ){...
I access it with $_->{'whateverthehashkeyis'} and have no problem.
What I need to do is pass the hash I am accessing in the current iteration of the loop to another subroutine.
This is where I am having problems. I have tried different things with no luck.
This DOES NOT work, but it should be an example of what I need to do
%args = @{$_};
$results = &format_trading_card_preview_item(\%args);
....
sub format_trading_card_preview_item
{
my %args = shift;
I think what I need to do is dereference the hash to send it over. Is that right?
Thanks in advance for any help
It looks like $listref is a reference to an array of hash references.
If you need to use the variable holding the hash references then it is better if you name that variable instead of using the default scalar $_
There is also no point in dereferencing the hash and copying it to %args, only to take a reference to that hash and pass it as a parameter to your subroutine
And it is wrong to call a subroutine with an ampersand & character, and has been so ever since Perl v5.5 landed over seventeen years ago
Your loop should look like this
for my $item ( @$listref ) {
format_trading_card_preview_item($item);
}
Within the subroutine, it depends a lot on what you want to do with the hash passed in, but you don't say anything about that, so it's probably best to leave it as a reference and write
sub format_trading_card_preview_item {
my ($item) = @_;
...
}
or you could use the statement modifier form of for, like this
format_trading_card_preview_item($_) for @$listref;
To answer your question, you don't need to dereference the hash reference in order to pass it to another subroutine. Creating a shallow copy and then taking a reference to that new hash is inefficient, but it would technically work just fine.
However, your problem is that you're confusing hashes and arrays by using the syntax to dereference an array reference on something that is actually a hash reference. In fact, you should have gotten an error message basically saying the same thing:
Not an ARRAY reference at foo.pl line ...
What you actually want to do is something like this:
for my $href (@$listref) { # variable names could be better
# do something
my $results = format_trading_card_preview_item($href);
# do something else
}
sub format_trading_card_preview_item {
my $args = shift;
print $args->{foo};
return 42;
}
Check out perlreftut and perlref for more information on Perl references and nested data structures.

Pass by value vs pass by reference for a Perl hash

I'm using a subroutine to make a few different hash maps. I'm currently passing the hashmap by reference, but this conflicts when doing it multiple times. Should I be passing the hash by value or passing the hash reference?
use strict;
use warnings;
sub fromFile($){
local $/;
local our %counts =();
my $string = <$_[0]>;
open FILE, $string or die $!;
my $contents = <FILE>;
close FILE or die $!;
my $pa = qr{
( \pL {2} )
(?{
if(exists $counts{lc($^N)}){
$counts{lc($^N)} = $counts{lc($^N)} + 1;
}
else{
$counts{lc($^N)} = '1';
}
})
(*FAIL)
}x;
$contents =~ $pa;
return %counts;
}
sub main(){
my %english_map = &fromFile("english.txt");
#my %german_map = &fromFile("german.txt");
}
main();
When I run the different txt files individually I get no problems, but with both I get some conflicts.
Three comments:
Don't confuse passing a reference with passing by reference
Passing a reference is passing a scalar containing a reference (a type of value).
The compiler passes an argument by reference when it passes the argument without making a copy.
The compiler passes an argument by value when it passes a copy of the argument.
Arguments are always passed by reference in Perl
Modifying a function's parameters (the elements of @_) will change the corresponding variable in the caller. That's one of the reasons the convention to copy the parameters exists.
my ($x, $y) = @_; # This copies the args.
Of course, the primary reason for copying the parameters is to "name" them, but it saves us from some nasty surprises we'd get by using the elements of @_ directly.
$ perl -E'sub f { my ($x) = @_; "b"=~/(.)/; say $x; } "a"=~/(.)/; f($1)'
a
$ perl -E'sub f { "b"=~/(.)/; say $_[0]; } "a"=~/(.)/; f($1)'
b
One cannot pass an array or hash as an argument in Perl
The only thing that can be passed to a Perl sub is a list of scalars. (It's also the only thing that can be returned by one.)
Since @a evaluates to $a[0], $a[1], ... in list context,
foo(@a)
is the same as
foo($a[0], $a[1], ...)
That's why we create a reference to the array or hash we want to pass to a sub and pass the reference.
If we didn't, the array or hash would be evaluated into a list of scalars, and it would have to be reconstructed inside the sub. Not only is that expensive, it's impossible in cases like
foo(@a, @b)
because foo has no way to know how many arguments were returned by @a and how many were returned by @b.
Note that it's possible to make it look like an array or hash is being passed as an argument using prototypes, but the prototype just causes a reference to the array/hash to be created automatically, and that's what is actually passed to the sub.
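The flattening and the prototype behaviour described above can be sketched as follows. The sub names count_args, lengths and lengths_proto are hypothetical and chosen only for this illustration.

```perl
use strict;
use warnings;

my @a = (1, 2);
my @b = (3, 4, 5);

# Both arrays are flattened into a single list of five scalars.
sub count_args { return scalar @_ }

# Passing references keeps the two arrays distinguishable.
sub lengths { my ($x, $y) = @_; return (scalar @$x, scalar @$y) }

# A (\@\@) prototype makes Perl take the references at the call site.
sub lengths_proto (\@\@) { my ($x, $y) = @_; return (scalar @$x, scalar @$y) }

my $flat     = count_args(@a, @b);      # 5
my @explicit = lengths(\@a, \@b);       # (2, 3)
my @implicit = lengths_proto(@a, @b);   # (2, 3), references created implicitly

print "$flat / @explicit / @implicit\n";
```

Inside lengths_proto the arguments are still two scalars (array references), which matches the point that only a list of scalars ever reaches a sub.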
For a couple of reasons you should use pass-by-reference, but the code you show returns the hash by value.
You should use my rather than local except for built-in variables like $/, and then for only as small a scope as possible.
Prototypes on subroutines are almost never a good idea. They do something very specific, and if you don't know what that is you shouldn't use them.
Calling subroutines using the ampersand sigil, as in &fromFile("english.txt"), hasn't been correct since Perl 4, about twenty years ago. It affects the parameters delivered to a subroutine in at least two different ways and is a bad idea.
I'm not sure why you are using a file glob with my $string = <$_[0]>. Are you expecting wildcards in the filename passed as the parameter? If so then you will be opening and reading only the first matching file, otherwise the glob is unnecessary.
Lexical file handles like $fh are better than bareword file handles like FILE, and will be closed implicitly when they are destroyed - usually at the end of the block where they are declared.
I am not sure how your hash %counts gets populated. No regex on its own can fill a hash, but I will have to trust you!
Try this version. People familiar with Perl will thank you (ironically!) for not using camel-case variable names. And it is rare to see a main subroutine declared and called. That is C, this is Perl.
Update I have changed this code to do what your original regex did.
use strict;
use warnings;
sub from_file {
my ($filename) = @_;
my $contents = do {
open my $fh, '<', $filename or die qq{Unable to open "$filename": $!};
local $/;
my $contents = <$fh>;
};
my %counts;
$counts{lc $1}++ while $contents =~ /(?=(\pL{2}))/g;
return \%counts;
}
sub main {
my $english_map = from_file('english.txt');
my $german_map = from_file('german.txt');
}
main();
You can use either a reference or pass the entire hash or array. Your choice. There are two issues that might make you choose one over the other:
Passing other parameters
Memory Management
Perl doesn't really have subroutine parameters. Instead, you're simply passing in an array of parameters. What if your subroutine needs to see which array has more elements? I couldn't do this:
foo(@first, @second);
because all I'll be passing in is one big array that combines all the members of both. This is true with hashes too. Imagine a program that takes two hashes and finds the ones with common keys:
@common_keys = common(%hash1, %hash2);
Again, I'm combining all the keys and their values in both hashes into one big array.
The only way around this issue is to pass a reference:
foo(\@first, \@second);
@common_keys = common(\%hash1, \%hash2);
In this case, I'm passing the memory location where these two hashes are stored in memory. My subroutine can use those hash references. However, you do have to take some care which I'll explain with the second explanation.
The second reason to pass a reference is memory management. If my array or hash is a few dozen entries, it really doesn't matter all that much. However, imagine I have 10,000,000 entries in my hash or array. Copying all those members could take quite a bit of time. Passing by reference saves me memory, but at a terrible cost. Most of the time, I'm using subroutines as a way of not affecting my main program. This is why subroutines are supposed to use their own variables and why you're taught in most programming courses about variable scope.
However, when I pass a reference, I'm breaking that scope. Here's a simple program that doesn't pass a reference.
#! /usr/bin/env perl
use strict;
use warnings;
my @array = qw(this that the other);
foo (@array);
print join ( ":", @array ) . "\n";
sub foo {
my @foo_array = @_;
$foo_array[1] = "FOO";
}
Note that the subroutine foo (see footnote 1) is changing the second element of the passed-in array. However, even though I pass @array into foo, the subroutine doesn't change the value of @array. That's because the subroutine is working on a copy (created by my @foo_array = @_;). Once the subroutine exits, the copy disappears.
When I execute this program, I get:
this:that:the:other
Now, here's the same program, except I'm passing in a reference, and in the interest of memory management, I use that reference:
#! /usr/bin/env perl
use strict;
use warnings;
my @array = qw(this that the other);
foo (\@array);
print join ( ":", @array ) . "\n";
sub foo {
my $foo_array_ref = shift;
$foo_array_ref->[1] = "FOO";
}
When I execute this program, I get:
this:FOO:the:other
That's because I don't pass in the array, but a reference to that array. It's the same memory location that holds @array. Thus, changing the array through the reference in my subroutine changes it in my main program. Most of the time, you do not want to do this.
You can get around this by passing in a reference, then copying that reference to an array. For example, if I had done this:
sub foo {
my @foo_array = @{ shift() };
I would be making a copy of my reference to another array. It protects my variables, but it does mean I'm copying my array over to another object which takes time and memory. Back in the 1980s when I first was programming, this was a big issue. However, in this age of gigabyte memory and quadcore processors, the main issue isn't memory management, but maintainability. Even if your array or hash contained 10 million entries, you'll probably not notice any time or memory issues.
This also works the other way around too. I could return from my subroutine a reference to a hash or the entire hash. Many people like returning a reference, but this could be problematic.
In object oriented Perl programming, I use references to keep track of my objects. Normally, I'll have a reference to a hash I can use to store other values, arrays, and hashes.
In a recent program, I was counting IDs and how many times they are referenced in a log file. This was stored in an object (which is just a reference to a hash). I had a method that would return the entire hash of IDs and their counts. I could have done this:
return $self->{COUNT_HASH};
But, what happened, if the user started modifying that reference I passed? They would be actually manipulating my object without using my methods to add and subtract from the IDs. Not something that I want them to do. Instead, I create a new hash, and then return a reference to that hash:
my %hash_counts = %{ $self->{COUNT_HASH} };
return \%hash_counts;
This copies the hash that my reference points to, and then I return a reference to the copy. This protects my data from outside manipulation. I could still return a reference, but the user would no longer have access to my object without going through my methods.
By the way, I like using wantarray which gives the caller a choice on how they want their data:
my %hash_counts = %{ $self->{COUNT_HASH} };
return wantarray ? %hash_counts : \%hash_counts;
This allows me to return a reference or a hash depending how the user called my object:
my %hash_counts = $object->totals(); # Returns a hash
my $hash_counts_ref = $object->totals(); # Returns a reference to a hash
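A self-contained sketch of the wantarray approach, assuming a small counter class. The package name IdCounter and the methods new and add are invented for this example; only totals and COUNT_HASH come from the text above.

```perl
use strict;
use warnings;

package IdCounter;   # hypothetical class name

sub new { my ($class) = @_; return bless { COUNT_HASH => {} }, $class }

sub add { my ($self, $id) = @_; $self->{COUNT_HASH}{$id}++; return }

sub totals {
    my ($self) = @_;
    my %hash_counts = %{ $self->{COUNT_HASH} };   # shallow copy of the counts
    return wantarray ? %hash_counts : \%hash_counts;
}

package main;

my $object = IdCounter->new;
$object->add('a') for 1 .. 2;

my %copy = $object->totals;   # list context: a hash
my $ref  = $object->totals;   # scalar context: a reference to a copy
$ref->{a} = 100;              # changes the copy, not the object's data
my %after = $object->totals;

print "$after{a}\n";          # 2
```

Because totals always copies first, neither calling style hands out a reference into the object's own hash.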
1 A footnote: The @_ array is pointing to the same memory location as the parameters of your calling subroutine. Thus, if I pass in foo(@array) and then did $_[1] = "foo";, I would be changing the second element of @array.

Are Perl subroutines call-by-reference or call-by-value?

I'm trying to figure out Perl subroutines and how they work.
From perlsub I understand that subroutines are call-by-reference and that an assignment (like my(@copy) = @_;) is needed to turn them into call-by-value.
In the following, I see that change is called-by-reference because "a" and "b" are changed into "x" and "y". But I'm confused about why the array isn't extended with an extra element "z"?
use strict;
use Data::Dumper;
my @a = ( "a" ,"b" );
change(@a);
print Dumper(\@a);
sub change
{
@_[0] = "x";
@_[1] = "y";
@_[2] = "z";
}
Output:
$VAR1 = [
'x',
'y'
];
In the following, I pass a hash instead of an array. Why isn't the key changed from "a" to "x"?
use strict;
use Data::Dumper;
my %a = ( "a" => "b" );
change(%a);
print Dumper(\%a);
sub change
{
@_[0] = "x";
@_[1] = "y";
}
Output:
$VAR1 = {
'a' => 'y'
};
I know the real solution is to pass the array or hash by reference using \@, but I'd like to understand the behaviour of these programs exactly.
Perl always passes by reference. It's just that sometimes the caller passes temporary scalars.
The first thing you have to realise is that the arguments of subs can be one and only one thing: a list of scalars.* One cannot pass arrays or hashes to them. Arrays and hashes are evaluated, returning a list of their content. That means that
f(@a)
is the same** as
f($a[0], $a[1], $a[2])
Perl passes by reference. Specifically, Perl aliases each of the arguments to the elements of @_. Modifying the elements of @_ will change the scalars returned by $a[0], etc. and thus will modify the elements of @a.
The second thing of importance is that the key of an array or hash element determines where the element is stored in the structure. Otherwise, $a[4] and $h{k} would require looking at each element of the array or hash to find the desired value. This means that the keys aren't modifiable. Moving a value requires creating a new element with the new key and deleting the element at the old key.
As such, whenever you get the keys of an array or hash, you get a copy of the keys. Fresh scalars, so to speak.
Back to the question,
f(%h)
is the same** as
f(
my $k1 = "a", $h{a},
my $k2 = "b", $h{b},
my $k3 = "c", $h{c},
)
@_ is still aliased to the values returned by %h, but some of those are just temporary scalars used to hold a key. Changing those will have no lasting effect.
* — Some built-ins (e.g. grep) are more like flow control statements (e.g. while). They have their own parsing rules, and thus aren't limited to the conventional model of a sub.
** — Prototypes can affect how the argument list is evaluated, but it will still result in a list of scalars.
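The hash case from the question can be reproduced in a few lines. This is a minimal sketch that reuses the question's sub name change and adds a @keys variable only for printing.

```perl
use strict;
use warnings;

my %h = (a => 'b');

sub change {
    $_[0] = 'x';   # the key: a temporary copy, so this assignment is lost
    $_[1] = 'y';   # the value: aliased to $h{a}, so this one sticks
}

change(%h);

my @keys = keys %h;
print "$keys[0] => $h{a}\n";   # a => y
```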
Perl's subroutines accept parameters as flat lists of scalars. An array passed as a parameter is for all practical purposes a flat list too. Even a hash is treated as a flat list of one key followed by one value, followed by one key, etc.
A flat list is not passed as a reference unless you do so explicitly. The fact that modifying $_[0] modifies $a[0] is because the elements of @_ become aliases for the elements passed as parameters. Modifying $_[0] is the same as modifying $a[0] in your example. But while this is approximately similar to the common notion of "pass by reference" as it applies to any programming language, this isn't specifically passing a Perl reference; Perl's references are different (and indeed "reference" is an overloaded term). An alias (in Perl) is a synonym for something, whereas a reference is similar to a pointer to something.
As perlsyn states, if you assign to @_ as a whole, you break its alias status. Also note, if you try to modify $_[0], and $_[0] happens to be a literal instead of a variable, you'll get an error. On the other hand, modifying $_[0] does modify the caller's value if it is modifiable. So in example one, changing $_[0] and $_[1] propagates back to @a because each element of @_ is an alias for each element in @a.
Your second example is a little tricky. Hash keys are immutable. Perl doesn't provide a way to modify a hash key, aside from deleting it. That means that $_[0] is not modifiable. When you attempt to modify $_[0] Perl cannot comply with that request. It probably ought to throw a warning, but doesn't. You see, the flat list passed to it consists of unmodifiable-key followed by modifiable-value, etc. This is mostly a non-issue. I cannot think of any reason to modify individual elements of a hash in the way you're demonstrating; since hashes have no particular order you wouldn't have simple control over which elements in @_ propagate back to which values in %a.
As you pointed out, the proper protocol is to pass \@a or \%a, so that they can be referred to as $_[0]->{element} or $_[0]->[0]. Even though the notation is a little more complicated, it becomes second nature after a while, and is much clearer (in my opinion) as to what is going on.
Be sure to have a look at the perlsub documentation. In particular:
Any arguments passed in show up in the array @_. Therefore, if you called a function with two arguments, those would be stored in $_[0] and $_[1]. The array @_ is a local array, but its elements are aliases for the actual scalar parameters. In particular, if an element $_[0] is updated, the corresponding argument is updated (or an error occurs if it is not updatable). If an argument is an array or hash element which did not exist when the function was called, that element is created only when (and if) it is modified or a reference to it is taken. (Some earlier versions of Perl created the element whether or not the element was assigned to.) Assigning to the whole array @_ removes that aliasing, and does not update any arguments.
(Note that use warnings is even more important than use strict.)
@_ itself isn't a reference to anything, it is an array (really, just a view of the stack, though if you do something like take a reference to it, it morphs into a real array) whose elements each are an alias to a passed parameter. And those passed parameters are the individual scalars passed; there is no concept of passing an array or hash (though you can pass a reference to one).
So shifts, splices, additional elements added, etc. to @_ don't affect anything passed, though they may change the index of or remove from the array one of the original aliases.
So where you call change(@a), this puts two aliases on the stack, one to $a[0] and one to $a[1]. change(%a) is more complicated; %a flattens out into an alternating list of keys and values, where the values are the actual hash values and modifying them modifies what's stored in the hash, but where the keys are merely copies, no longer associated with the hash.
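A short sketch of how shift and push on @_ leave the caller's array alone while the remaining aliases still work. The sub name reshape is hypothetical.

```perl
use strict;
use warnings;

my @a = ('a', 'b');

sub reshape {
    shift @_;        # drops the alias to $a[0]; @a itself is untouched
    $_[0] = 'B';     # $_[0] is now the alias to $a[1]
    push @_, 'z';    # grows @_ only, not @a
}

reshape(@a);

print "@a\n";        # a B
```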
Perl does not pass the array or hash itself by reference, it unfurls the entries (the array elements, or the hash keys and values) into a list and passes this list to the function. @_ then allows you to access the scalars as references.
This is roughly the same as writing:
@a = (1, 2, 3);
$b = \$a[2];
${$b} = 4;
# @a is now (1, 2, 4)
You'll note that in the first case you were not able to add an extra item to @a, all that happened was that you modified the members of @a that already existed. In the second case, the hash keys don't really exist in the hash as scalars, so these need to be created as copies in temporary scalars when the expanded list of the hash is created to be passed into the function. Modifying this temporary scalar will not modify the hash key, as it is not the hash key.
If you want to modify an array or hash in a function, you will need to pass a reference to the container:
change(\%foo);
sub change {
$_[0]->{a} = 1;
}
Firstly, you are confusing the @ sigil as indicating an array. This is actually a list. When you call change(@a) you are passing the list to the function, not an array object.
The case with the hash is slightly different. Perl evaluates your call into a list and passes the values as a list instead.

What is the meaning of @_ in Perl?

What is the meaning of @_ in Perl?
perldoc perlvar is the first place to check for any special-named Perl variable info.
Quoting:
@_: Within a subroutine the array @_ contains the parameters passed to that subroutine.
More details can be found in perldoc perlsub (Perl subroutines) linked from the perlvar:
Any arguments passed in show up in the
array @_.
Therefore, if you called a function with two arguments, those
would be stored in $_[0] and $_[1].
The array @_ is a local array, but its
elements are aliases for the actual scalar parameters.
In particular, if
an element $_[0] is updated, the
corresponding argument is updated (or
an error occurs if it is not
updatable).
If an argument is an array
or hash element which did not exist
when the function was called, that
element is created only when (and if)
it is modified or a reference to it is
taken. (Some earlier versions of Perl
created the element whether or not the
element was assigned to.) Assigning to
the whole array @_ removes that
aliasing, and does not update any
arguments.
Usually, you expand the parameters passed to a sub using the @_ variable:
sub test{
my ($a, $b, $c) = @_;
...
}
# call the test sub with the parameters
test('alice', 'bob', 'charlie');
That's the way claimed to be correct by perlcritic.
First hit of a search for perl @_ says this:
@_ is the list of incoming parameters to a sub.
It also has a longer and more detailed explanation of the same.
The question was what @_ means in Perl. The answer to that question is that, insofar as $_ means it in Perl, @_ similarly means they.
No one seems to have mentioned this critical aspect of its meaning — as well as theirs.
They’re consequently both used as pronouns, or sometimes as topicalizers.
They typically have nominal antecedents, although not always.
You can also use shift for individual variables in most cases:
$var1 = shift;
This is a topic in which you should research further as Perl has a number of interesting ways of accessing outside information inside your sub routine.
All Perl's "special variables" are listed in the perlvar documentation page.
Also, a function may return a list while being called without its result being assigned to any variable, as below. Here split() is called, but its result is not assigned to any variable. We can access the returned data later through @_:
$str = "Mr.Bond|Chewbaaka|Spider-Man";
split(/\|/, $str);
print @_[0]; # 'Mr.Bond'
This will split the string $str and set the array @_. Note that this implicit assignment to @_ was removed in Perl 5.12, so it only works on older versions.
@ is used for an array.
In a subroutine, or when you call a function in Perl, you may pass a parameter list. In that case, @_ holds the parameter list passed to the function:
sub Average{
# Get total number of arguments passed.
$n = scalar(@_);
$sum = 0;
foreach $item (@_){
# foreach is like for loop... It will access every
# array element by an iterator
$sum += $item;
}
$average = $sum / $n;
print "Average for the given numbers: $average\n";
}
Function call
Average(10, 20, 30);
If you look at the code above, see the foreach $item (@_) line. That is where the input parameters are read.
Never try to edit the @_ variable! Its elements must not be touched, or you will get unexpected effects. For example:
my $size=1234;
sub sub1{
$_[0]=500;
}
sub1 $size;
Before the call to sub1, $size contains 1234; afterwards it contains 500(!!). So don't edit these values! You may pass two or more values and change them in the subroutine, and all of them will be changed in the caller. I've never seen this effect described. The programs I've seen also treat the @_ array as read-only; only then can you safely pass a variable without it being changed inside the subroutine.
You should always do this:
sub sub2{
my @m=@_;
....
}
Assign @_ to the subroutine's local variables and then work with those.
Also, in some deeply recursive algorithms that return an array, you may work on @_ directly to reduce the memory used for local variables, but only if the sub returns the same @_ array.