Demystifying the Perl glob (*) - perl

In this question the poster asked how to do the following in one line:
sub my_sub {
my $ref_array = shift;
my @array = @$ref_array;
}
which with my knowledge of the basic Perl magic I would avoid by simply using something like:
sub my_sub {
my $ref_array = shift;
for (@$ref_array) {
#do something with $_ here
};
#use $ref_array->[$element] here
}
However in this answer one of SO's local monks tchrist suggested:
sub my_sub {
local *array = shift();
#use @array here
}
When I asked
In trying to learn the mid-level Perl magic, can I ask, what is it that you are setting to what here? Are you setting a reference to @array to the arrayref that has been passed in? How do you know that you create @array and not %array or $array? Where can I learn more about this * operator (perlop?). Thanks!
I was advised to ask it as a new post, though he did give nice references. Anyway, here goes: can someone please explain what gets assigned to what, and how come @array gets created rather than perhaps %array or $array? Thanks.

Assignment to a glob
*glob = VALUE
contains some magic that depends on the type of VALUE (i.e., return value of, say, Scalar::Util::reftype(VALUE)). If VALUE is a reference to a scalar, array, hash, or subroutine, then only that entry in the symbol table will be overwritten.
This idiom
local *array = shift();
#use @array here
works as documented when the first argument to the subroutine is an array reference. If the first argument was instead, say, a scalar reference, then only $array and not @array would be affected by the assignment.
A little demo script to see what is going on:
no strict;
sub F {
local *array = shift;
print "\@array = @array\n";
print "\$array = $array\n";
print "\%array = ",%array,"\n";
print "------------------\n";
}
$array = "original scalar";
%array = ("original" => "hash");
@array = ("orignal","array");
$foo = "foo";
@foo = ("foo","bar");
%foo = ("FOO" => "foo");
F ["new","array"]; # array reference
F \"new scalar"; # scalar reference
F {"new" => "hash"}; # hash reference
F *foo; # typeglob
F 'foo'; # not a reference, but name of assigned variable
F 'something else'; # not a reference
F (); # undef
Output:
@array = new array
$array = original scalar
%array = originalhash
------------------
@array = orignal array
$array = new scalar
%array = originalhash
------------------
@array = orignal array
$array = original scalar
%array = newhash
------------------
@array = foo bar
$array = foo
%array = FOOfoo
------------------
@array = foo bar
$array = foo
%array = FOOfoo
------------------
@array =
$array =
%array =
------------------
@array = orignal array
$array = original scalar
%array = originalhash
------------------
Additional doc at perlmod and perldata. Back in the days before references were a part of Perl, this idiom was helpful for passing arrays and hashes into subroutines.
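As a minimal sketch of the array-reference case under use strict (the sub name append_item and the sample data are made up for illustration), the glob assignment aliases @array to the caller's array rather than copying it, so changes made through @array are visible to the caller:

```perl
use strict;
use warnings;

our @array;    # package variable, so @array is legal under strict

sub append_item {
    local *array = shift;    # alias @array to the caller's array (no copy)
    push @array, 'added';    # modifies the caller's data directly
    return scalar @array;
}

my @data  = ('a', 'b');
my $count = append_item(\@data);
print "count=$count data=@data\n";    # count=3 data=a b added
```

Because of local, the previous contents of the glob are restored when the sub returns.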

With my admittedly less-than-wizard knowledge of Perl, I'll venture an answer. The * operator assigns the symbol table entry. As I understand it, @array, %array, and $array all refer to the same symbol table entry for the string 'array', but to different fields in that entry: the ARRAY, HASH, and SCALAR fields. So assigning local *array = shift; actually assigns the entire local symbol table entry for 'array' (including the ARRAY, HASH, and SCALAR fields) to what was passed in by the caller.

Related

perl send internal array as argument to subroutine

I have a 2 dimensional array like this:
$map[0][0] = 'a';
$map[0][1] = 'b';
$map[1][0] = 'c';
$map[1][1] = 'd';
I want to pass only everything under $map[1] (by reference) to a subroutine. How can I do that?
Perl doesn't have multidimensional arrays.
What you have is an array and each element of that array is a reference to another array. You might want to read up about Perl References since this is the way Perl allows you to build some very complex data structures.
Many people think of it as a multidimensional array, and you could treat it as such under certain circumstances. However, I prefer the -> syntax, which reminds me that each element is merely a reference to another array.
$map[0]->[0] = 'a';
$map[0]->[1] = 'b';
$map[1]->[0] = 'c';
$map[1]->[1] = 'd';
Now, I can take the data structure apart:
@map: This is an array with two items in it, $map[0] and $map[1].
$map[0]->[]: This is a reference to another array. That array also has two items in it.
$map[1]->[]: This is another reference to yet another array. That array has two items in it.
Note that $map[1]->[] means that $map[1] contains an array reference. That means you can pass $map[1] as your reference to that inner array.
mysub ($map[1]);
Here's a simple program:
#! /usr/bin/env perl
#
use strict;
use warnings;
use feature qw(say);
my @map;
$map[0]->[0] = 'a';
$map[0]->[1] = 'b';
$map[1]->[0] = 'c';
$map[1]->[1] = 'd';
mysub( $map[1] );
sub mysub {
my $array_ref = shift;
my @array = @{ $array_ref }; # Dereference your reference
for my $index ( 0..$#array ) {
say "\$map[1]->[$index] = $array[$index]";
}
}
This prints:
$map[1]->[0] = c
$map[1]->[1] = d
Now, you see why I like that -> syntax although it's really completely unnecessary. It helps remind me what I am dealing with.
You can send an array reference:
sub mysub {
my ($aref) = @_;
# work with @$aref ..
}
mysub($map[1]);
Simply pass the scalar $map[1].
fn($map[1]);
sub fn
{
my @loc_map_1 = @{ $_[0] };
}
Remember that Perl doesn't have "real" 2 dimensional arrays. In your case @map is an array that contains 2 references to arrays.
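The answers above copy the inner array before using it. As a hedged sketch of the alternative (the sub name upcase_row is invented here), you can also work through the reference directly, in which case the caller's row is changed in place:

```perl
use strict;
use warnings;

my @map = ( [ 'a', 'b' ], [ 'c', 'd' ] );

sub upcase_row {
    my ($row) = @_;        # $row is a reference to the inner array
    $_ = uc for @$row;     # loop variable aliases each element of the caller's row
    return;
}

upcase_row( $map[1] );
print "@{ $map[1] }\n";    # C D
print "@{ $map[0] }\n";    # a b
```

Copying with my @array = @{ $row } first would leave @map untouched instead.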

Perl: Sub's arguments explain

In Perl, all of a sub's arguments are written to the @_ array, like this:
call_any_sub($a,$b,$c);
sub call_any_sub {
my $s_a = shift;
my $s_b = shift;
my $s_c = shift;
}
But if I want to pass an array as an argument to a sub, I should use:
call_any_sub(@data_array);
sub call_any_sub {
my @data = @_;
}
Instead of something like:
call_any_sub(@data_array);
sub call_any_sub {
my @data = shift;
}
So why does @data_array replace the array of arguments instead of being written into it as a single element (as I expected)?
One can only pass a list of scalars to a subroutine (and that's all they can return). After all, the arguments are presented to the sub as an array (@_), and arrays can only contain scalars.
You can either (inefficiently) recreate the array in the sub
sub foo {
my @bars = @_;
say for @bars;
}
foo(@bars);
or you can pass a reference to the array
sub foo {
my ($bars) = @_;
say for @$bars;
}
foo(\@bars);
You need to understand what shift does.
The shift/unshift pair of commands is parallel to the pop/push pair of commands. All of these commands operate on arrays. By default, shift (like pop) assumes the @_ array when called in a subroutine and @ARGV when called in the main program. This means the following two statements are identical in a subroutine:
my $foo = shift @_; # Explicit Argument
my $foo = shift; # Implicit Argument
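To connect this back to the question, here is a small sketch (the sub names take_first and take_all are hypothetical) showing that my @data = shift; receives only the first argument, while my @data = @_; receives all of them:

```perl
use strict;
use warnings;

sub take_first { my @data = shift; return @data }    # shift removes one element only
sub take_all   { my @data = @_;    return @data }    # copies every argument

my @data_array = (10, 20, 30);
my @first = take_first(@data_array);
my @all   = take_all(@data_array);
print scalar(@first), " vs ", scalar(@all), "\n";    # 1 vs 3
```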
Perl's parameter passing is an interesting concept because it doesn't really do named parameter passing like almost all other programming languages. Instead, everything is passed as one long list of scalars. This makes it hard when you aren't passing in a scalar.
It works okay if I am only passing in a single hash or array:
munge_hash ( %foo );
sub munge_hash {
my %hash = @_;
...
}
And, you have to be careful if you're passing in multiple arguments and an array. In this case, the array must be the last in your list of arguments:
my $foo = "floop";
my $bar = "bloop";
my @array = qw(loop coop soop);
munge_this ( $foo, $bar, @array );
sub munge_this {
say join ":", @_; # Prints "floop:bloop:loop:coop:soop"
my $var1 = shift; # floop
my $var2 = shift; # bloop
my @arry = @_; # The rest is the array passed.
}
However, things really fall apart if you're passing in multiple arrays or hashes. All of the elements get merged into a single list of scalars represented by @_.
munge_two_arrays ( @foo, @bar );
sub munge_two_arrays {
# Problem! Elements of both arrays are in @_.
# How do I separate them out?
}
Thus, it is common not to pass in a whole array, but an array reference:
munge_two_arrays( \@foo, \@bar ); # These are array references
sub munge_two_arrays {
my $array1_ref = shift;
my $array2_ref = shift;
my @array1 = @{ $array1_ref }; # Dereference array references to make arrays
my @array2 = @{ $array2_ref }; # Dereference array references to make arrays
}
This keeps the values of the two arrays from getting merged into a single @_.
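A quick way to see the flattening for yourself is to count what the sub receives. This is only an illustrative sketch; count_args is a made-up helper, not part of any answer above:

```perl
use strict;
use warnings;

sub count_args { return scalar @_ }    # number of scalars the sub received

my @foo = (1, 2, 3);
my @bar = (4, 5);

my $flattened = count_args(@foo, @bar);      # both arrays are flattened into @_
my $by_ref    = count_args(\@foo, \@bar);    # two array references, nothing merged
print "flattened=$flattened by_ref=$by_ref\n";    # flattened=5 by_ref=2
```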

Updated: Initialise and Clear multiple hash in one line

How can I initialise and clear multiple hashes in one line?
Ex:
my %hash1 = ();
my %hash2 = ();
my %hash3 = ();
my %hash4 = ();
to
my ( %hash1, %hash2, %hash3, %hash4 ) = ?
It appears (from your comments) that you really want to empty hashes that already have stuff in them. You can do it like this:
(%hash1,%hash2,%hash3) = ();
Complete example:
use strict;
use warnings;
my %hash1 = ('foo' => 1);
my %hash2 = ('bar' => 1);
my %hash3 = ('baz' => 1);
(%hash1,%hash2,%hash3) = ();
print (%hash1,%hash2,%hash3);
A variable declaration always gives you an empty variable, so there is no need to set it to empty. This is true even in a loop:
for (0..100)
{
my $x;
$x++;
print $x;
}
This will print 1 over and over; even though you might expect $x to retain its value, it does not.
Explanation: Perl allows list assignment like ($foo,$bar) = (1,2). If the list on the right is shorter, any remaining scalars get assigned undef, and any remaining arrays or hashes become empty. Thus assigning the empty list to a list of variables clears them all.
Another useful way to set a bunch of things is the x operator:
my ($x,$y,$z) = (100)x3;
This sets all three variables to 100. It doesn't work so well for hashes, though, because each one needs a list assigned to it.
It's as simple as doing
my ( %hash1, %hash2, %hash3, %hash4 );
and they will not contain any keys or values at that point.
The same technique applies to scalars and arrays.
To undef multiple hashes, you could do
undef %$_ for ( \%hash1, \%hash2 );
You can declare and initialize them in one statement:
my ( %hash1, %hash2, %hash3, %hash4 ) = ();
You do not need to assign anything to a new variable in order to assure it is empty. All variables are empty, if nothing has been assigned to them.
my %hash; # hash contains nothing
%hash = (); # hash still contains nothing
The only time it would be useful to assign the empty list to a hash is if you want to remove previously assigned values. And even then, that would only be a useful thing to do if it could not already be solved by applying the correct scope restriction to the hash.
my (%hash1, %hash2);
while (something) {
# some code
(%hash1,%hash2) = (); # delete old values
}
That empties the hashes on every iteration, but it is better written as:
while (something) {
my (%hash1, %hash2); # all values are now local to the while loop
# some code
}
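Both points can be checked with a short sketch (the variable names here are illustrative only): list-assigning the empty list empties existing hashes, and a my declaration inside a loop yields a fresh, empty hash on every iteration:

```perl
use strict;
use warnings;

my %hash1 = ( foo => 1 );
my %hash2 = ( bar => 2 );

(%hash1, %hash2) = ();    # assigning the empty list empties both hashes
my $remaining = keys(%hash1) + keys(%hash2);
print "$remaining\n";     # 0

my @seen_sizes;
for my $round (1 .. 3) {
    my %fresh;                               # new, empty hash each time round
    push @seen_sizes, scalar keys %fresh;    # size before anything is stored
    $fresh{$round} = 1;
}
print "@seen_sizes\n";    # 0 0 0
```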

Assigning multiple Values to a hash Reference - Array

I declared the following sub (in reality, the values come out of the database, so I simplified it):
sub get_date {
my ($ref_last)=@_;
$$ref_last->{duration}='24,0,4';
($$ref_last->{duration}->{d},
$$ref_last->{duration}->{h},
$$ref_last->{duration}->{m})
= split(/\,/, $$ref_last->{duration});
}
This sub is called from the main-Part of the script, like this:
my $hashy;
get_date(\$hashy);
print $hashy->{duration}->{d};
Everything is fine and works like a charm, until I use strict:
use strict;
my $hashy;
get_date(\$hashy);
print $hashy->{duration}->{d};
In this case Perl says "Can't use string ("24,0,4") as a HASH ref while "strict refs" in use"
I already tried ref($ref_last), but ref is a read-only function.
Any suggestions as to why this happens, and perhaps a better solution?
Here's the full (non-)working script:
#!/usr/bin/perl -w
use strict;
my $hashy;
get_date(\$hashy);
print $hashy->{duration}->{d};
sub get_date {
my ($ref_last)=@_;
$$ref_last->{duration}='24,0,4';
($$ref_last->{duration}->{d},
$$ref_last->{duration}->{h},
$$ref_last->{duration}->{m})
= split(/\,/, $$ref_last->{duration});
}
Based on comments, you're trying to change the format of an existing hash value (from «24,0,4» to «{ d=>24, h=>0, m=>4 }»). Here's how I'd do it.
sub split_duration { # Changes in-place.
my ($duration) = @_;
my %split;
@split{qw( d h m )} = split(/,/, $duration);
$_[0] = \%split;
}
my $row = $sth->fetchrow_hashref();
split_duration( $row->{duration} );
or
sub split_duration {
my ($duration) = @_;
my %split;
@split{qw( d h m )} = split(/,/, $duration);
return \%split;
}
my $row = $sth->fetchrow_hashref();
$row->{duration} = split_duration( $row->{duration} );
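To try the second variant without a database, here is a self-contained sketch. The hand-built $row hash is an assumption standing in for what fetchrow_hashref would return:

```perl
use strict;
use warnings;

sub split_duration {
    my ($duration) = @_;
    my %split;
    @split{qw( d h m )} = split /,/, $duration;    # hash slice: three keys at once
    return \%split;
}

my $row = { duration => '24,0,4' };    # stand-in for a fetched database row
$row->{duration} = split_duration( $row->{duration} );
my $summary = join ',', map { "$_=$row->{duration}{$_}" } sort keys %{ $row->{duration} };
print "$summary\n";    # d=24,h=0,m=4
```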
Explanation of the problem and initial solutions below.
Without strict, the string 24,0,4 was treated as a symbolic reference, which means Perl was creating a package hash literally named %24,0,4!!! That's bad, which is why use strict 'refs'; prevents it.
The underlying problem is your attempt to assign two values to $$ref_last->{duration}: a string
'24,0,4'
and a reference to a hash
{ d => 24, h => 0, m => 4 }
It can't hold both. You need to rearrange your data.
I suspect you don't actually use 24,0,4 after you split it, so you could fix the code as follows:
sub get_date {
my ($ref_last)=@_;
my $duration = '24,0,4';
@{ $$ref_last->{duration} }{qw( d h m )} =
split(/,/, $duration);
}
If you need 24,0,4, you can reconstruct it. Or maybe, you can store the combined duration along with d,h,m.
sub get_date {
my ($ref_last)=@_;
my $duration = '24,0,4';
$$ref_last->{duration}{full} = $duration;
@{ $$ref_last->{duration} }{qw( d h m )} =
split(/,/, $duration);
}
Or in a separate element of the enclosing hash.
sub get_date {
my ($ref_last)=@_;
my $duration = '24,0,4';
$$ref_last->{full_duration} = $duration;
@{ $$ref_last->{duration} }{qw( d h m )} =
split(/,/, $duration);
}
Inside get_date, you assign a string to $ref_last->{duration} but then attempt to access it like a hashref. You also have extra dollar signs that attempt to dereference individual values plucked from the hash.
I would write it as
sub get_date {
my($ref_last) = @_;
my $duration = '24,0,4';
@{ $ref_last->{duration} }{qw/ d h m /} = split /\,/, $duration;
}
The last line is a hash slice that allows you to assign values to the d, h, and m keys in a single list-assignment.
In the context of the caller, you need to set up a bit of scaffolding.
my $hashy = {};
get_date($hashy);
Without initializing $hashy to contain a new empty hashref, get_date does all its assignments and then throws away the newly built edifice. This is because when you copy parameters out of @_, you are using pass-by-value semantics.
Perl will accommodate pass-by-reference as well. Perl has a feature known as autovivification where the language builds necessary scaffolding for you on demand. To use that style, you would write
my $hashy;
get_date($hashy);
sub get_date {
my($ref_last) = @_;
my $duration = '24,0,4';
@{ $_[0]->{duration} }{qw/ d h m /} = split(/\,/, $duration);
}
Note the use of $_[0] to directly access the first parameter, which is an alias to $hashy in this case. That is, get_date modifies $hashy directly.
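A compact sketch of that aliasing behavior (fill_duration is a hypothetical name used only for this illustration): the caller's variable starts out undefined and ends up holding an autovivified hash reference.

```perl
use strict;
use warnings;

sub fill_duration {
    # $_[0] is an alias to the caller's variable; both hashes are autovivified
    @{ $_[0]->{duration} }{qw( d h m )} = split /,/, '24,0,4';
    return;
}

my $hashy;    # starts out undefined
fill_duration($hashy);
print ref($hashy), " $hashy->{duration}{h}\n";    # HASH 0
```

Had the sub first copied the argument with my ($ref) = @_; and assigned through $ref, $hashy would still be undefined afterwards.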
Either way, say we print the contents with
print "[", join("][" => %{ $hashy->{duration} }), "]\n";
in which case the output is some permutation of
[h][0][m][4][d][24]
Building complex data structures with Perl isn’t difficult, but you have to learn the rules.
Perl references and nested data structures, man perlref
Perl Data Structures Cookbook, perldoc perldsc
Manipulating Arrays of Arrays in Perl, perldoc perllol
This happens because you have a weird syntax for your hash reference.
#!/usr/bin/perl -w
use strict;
my $hashref = {};
get_date($hashref);
print $hashref->{duration}->{d};
sub get_date {
my ($ref_last) = @_;
my $tmp = '24,0,4';
($ref_last->{duration}->{d},
$ref_last->{duration}->{h},
$ref_last->{duration}->{m})
= split(/,/, $tmp);
}
and in your subroutine use $ref_last->{duration}, without $$.

Dereferencing a list reference in hash element

Can someone finish this for me and explain what you did?
my %hash;
#$hash{list_ref}=[1,2,3];
#my @array=@{$hash{list_ref}};
$hash{list_ref}=\[1,2,3];
my @array=???
print "Two is $array[1]";
@array = @{${$hash{list_ref}}};
(1,2,3) is a list.
[1,2,3] is a reference to an array (technically, there's no such thing in Perl as a reference to a list).
\[1,2,3] is a reference to a reference to an array.
$hash{list_ref} is a reference to a reference to an array.
${$hash{list_ref}} is a reference to an array.
@{${$hash{list_ref}}} is an array.
Since a reference is considered a scalar, a reference to a reference is a scalar reference, and the scalar dereferencing operator ${...} is used in the middle step.
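Spelled out one step at a time, the double dereference looks roughly like this (the intermediate variable $array_ref is introduced here only for illustration):

```perl
use strict;
use warnings;

my %hash;
$hash{list_ref} = \[1, 2, 3];            # reference to a reference to an array

my $array_ref = ${ $hash{list_ref} };    # first step: scalar dereference
my @array     = @{ $array_ref };         # second step: array dereference
print "Two is $array[1]\n";              # Two is 2
print ref($hash{list_ref}), " -> ", ref($array_ref), "\n";    # REF -> ARRAY
```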
Others have pretty much already answered the question, but more generally, if you are ever confused about a data structure, use Data::Dumper. This will print out the structure of the mysterious blob of data, and help you parse it.
use strict; #Always, always, always
use warnings; #Always, always, always
use feature qw(say); #Nicer than 'print'
use Data::Dumper; #Calling in the big guns!
my $data_something = \[1,2,3];
say Dumper $data_something;
say Dumper ${ $data_something };
Let's see what it prints out...
$ test.pl
$VAR1 = \[
1,
2,
3
];
$VAR1 = [
1,
2,
3
];
From the first dump, it appears that $data_something is a plain scalar reference to an array reference. That led me to add the second Dumper after I ran the program the first time. That showed me that ${ $data_something } is a reference to an array.
I can now access that array like this:
use strict; #Always, always, always
use warnings; #Always, always, always
use feature qw(say); #Nicer than 'print'
use Data::Dumper; #Calling in the big guns!
my $data_something = \[1,2,3];
# Double dereference
my @array = @{ ${ $data_something } }; #Could be written as @$$data_something
for my $element (@array) {
say "Element is $element";
}
And now...
$ test.pl
Element is 1
Element is 2
Element is 3
It looks like you meant:
$hash{list_ref} = [1,2,3];
and not:
$hash{list_ref} = \[1,2,3];
That latter one got you a scalar reference to an array reference, which really doesn't do you much good except add confusion to the situation.
Then, all you had to do to refer to a particular element is $hash{list_ref}->[0]. This is just a shortcut for ${ $hash{list_ref} }[0]. It's easier to read and understand.
use strict;
use warnings;
use feature qw(say);
my %hash;
$hash{list_ref} = [1, 2, 3];
foreach my $element (0..2) {
say "Element is " . $hash{list_ref}->[$element];
}
And...
$ test.pl
Element is 1
Element is 2
Element is 3
So, next time you are confused about what a particular data structure looks like (and it happens to the best of us. Well... not the best of us, It happens to me), use Data::Dumper.
my %hash;
#$hash{list_ref}=[1,2,3];
#Putting the list in square brackets makes it a reference so you don't need '\'
$hash{list_ref}=[1,2,3];
#If you want to use a '\' to make a reference, apply it to a named array:
# my @list = (1,2,3);
# $something = \@list; # A reference to the array @list
#
# or (as I did above)...
#
# $something = [1,2,3]; # Returns a reference to an anonymous array
#
# Note that \(1,2,3) is different: it returns a list of references, one per element.
print "Two is $hash{list_ref}->[1]\n";
# To make it completely referenced do this:
#
# $hash = {};
# $hash->{list_ref} = [1,2,3];
#
# print $hash->{list_ref}[1] . "\n";
To get at the array (as an array or list) do this:
my @array = @{ $hash{list_ref} };
[ EXPR ]
creates an anonymous array, assigns the value returned by EXPR to it, and returns a reference to it. That means it's virtually the same as
do { my @anon = ( EXPR ); \@anon }
That means that
\[ EXPR ]
is virtually the same as
do { my @anon = ( EXPR ); \\@anon }
It's not something one normally sees.
Put differently,
1,2,3 returns a list of three elements (in list context).
(1,2,3) same as previous. Parens simply affect precedence.
[1,2,3] returns a reference to an array containing three elements.
\[1,2,3] returns a reference to a reference to an array containing three elements.
In practice:
my @data = (1,2,3);
print @data;
my $data = [1,2,3]; $hash{list_ref} = [1,2,3];
print @{ $data }; print @{ $hash{list_ref} };
my $data = \[1,2,3]; $hash{list_ref} = \[1,2,3];
print @{ ${ $data } }; print @{ ${ $hash{list_ref} } };