Perl: Anonymous Multi-Dimensional Arrays - perl

NOTE: See the end of this post for final explanation.
This is probably a very basic question, but I'm still trying to master a few of the fundamentals regarding references in Perl, and came across something in the perldsc page that I'd like to confirm. The following code is in the Generating Array of Arrays section:
while ( <> ) {
push #AoA, [ split ];
}
Obviously, the <> operation in the while loop reads one line of input in at a time. I am assuming at this point that line is then put into an anonymous array via the [ ] brackets, we'll call this #zero. Then the split command places everything in a given line separated by whitespace within the array (e.g., the first word is assigned to $zero[0], the second to $zero[1] and so on). The scalar reference of #zero is then pushed onto #AoA.
The next line of input is passed via the <> operator and gets assigned to a completely new anonymous array (e.g. #one), and its scalar reference is pushed onto #AoA.
Once #AoA is populated, I could then access its contents with a nested foreach loop; the first iterating through the "rows" (e.g. for $row (#AoA)), and a second, inner loop, foreach to access the columns of that particular row.
The latter (accessing said "columns" would be done by dereferencing (e.g., for $column (#$row)) the particular $row being read by the previous, "outer" foreach loop.
Is my understanding correct? I'm assuming you could still access any element of the #AoA just as you would if it were assigned vs. being anonymous? That is $element = $AoA[8][1]; .
I'm want to verify my thought process here. Is the automatic declaration of a unique, anonymous array each time through the loop part of the autovivication in Perl? I guess that is what is throwing me off a bit. Thanks.
EDIT: Based on the comments below my understanding regarding the anonymous array is still unclear, so I want to take a shot at one more description to see if it meets everyone's understanding.
Starting with the push #AoA, [split]; statement, split takes in the line from $_ and returns a list parsed by whitepace. That list is captured by [ ], which then returns an array reference. That array reference (created by [ ]) is then pushed onto #AoA. Is this accurate re: [ ]? The next step (dereferencing / use of #AoA) was covered very well by #krico below.
FINAL ANSWER/EXPLANATION: Based on all of the comments / feedback here, some further research on my part, and testing it seems my understanding was correct. I'll break it down here, so others can easily reference it later. See #krico's response below for a more explicit code representation that follows the steps outlined here.
while ( <> ) {
push #AoA, [ split ];
}
One line of input is passed at a time to the <> operator
The split function takes that line in via $_ and parses it based on whitespace (the default).
split then returns a LIST.
The [ ] is an anonymous array that provides the perl data structure for the List passed by split.
The push #AoA pushes the reference to the anonymous array onto its queue as element $AoA[0] (the second anonymous array reference will be put into $AoA1, etc...).
This continues through the entire input file. Once completed, #AoA is a 2D array, holding reference values (scalar values) to each of the previously generated anonymous arrays.
From this point #AoA can be dereferenced appropriately to work with the underlying/reference elements taken in from the input file. The default dereferencing technique is CIRCUMFIX (see perlfef below); however as of 5.19 a new method of dereferencing is available and will be released in 5.20, POSTFIX. Articles are linked below.
References: Perl References Documentation, Perl References Tutorial, Perl References Question noted by #Eli Hubert, Mike Friedman's blog post about differences between arrays and lists, Upcoming Postfix dereferencing in Perl, and Postfix dereferencing Article

This is what is going on:
The <> will put the line into the default variable $_
The split function will read $_ and return an array
The [ ] brackets will return a scalar, in it there will be a reference to that array
That reference is then pushed into the #AoA array
When you do $AoA[8][2] you are implicitly dereferencing the scalar. It's the same as $AoA[8]->[2].

The same code a little more readeable and you should understand it.
my $line;
while ( $line = <STDIN> ) {
my #parts = split $line;
my $partsRef = \#parts;
push #AoA, $partsRef;
}
Now, if you wanted to print the 2nd part of the 5th line you could say.
my $ref = #AoA[4];
my #parts = #$ref;
print $parts[1];
Get it?

Related

While loop and diamond operator in Perl

I am trying to input a text file to Perl program and reverse its order of lines i.e. last line will become first, second last will become second etc. I am using following code
#!C:\Perl64\bin
$k = 0;
while (<>){
print "the value of i is $i";
#array[k] = $_;
++$k;
}
print "the array is #array";
But for some reason, my array is only printing the last line of the text file.
Any suggestions?
Typically, rather than keep a separate array index, perl programs use the push operator to push a string onto an array. One way to do this in your program:
push #array, $_;
If you really want to do it by array index, then you need to use the following syntax:
$array[$k] = $_;
Notice the $ rather than # in front. This tells perl that you're dealing with a single element from the array, not multiple elements. #array gives you the entire array, while $array[$k] gives you a single element. (There is a more advanced topic called "slices," but let's not get into that here. I will say that #array[$k] gives you a slice, and that isn't what you want here.)
If you really just want to slurp the entire file into an array, you can do that in one step:
#array = ( <> );
That will read the entire file into #array in one step.
You might have noticed I omitted/ignored your print statement. I'm not sure what it's doing printing out a variable named $i, since it didn't seem connected at all to the rest of the code. I reasoned it was debug code you had added, and not really relevant to the task at hand.
Anyway, that should get your input into #array. Now reversing the array... There are many ways you could do this in perl, but I'll let you discover those yourself.
Instead of:
#array[k] = $_;
you want:
$array[$k] = $_;
To reference the scalar variable $k, you need the $ on the front. Without that it is interpreted as the literal string 'k', which when used as an array index would be interpreted as 0 (since a non-numeric string will be interpreted as 0 in a numeric context).
So, each time around the loop you are setting the first element to the line read in (overwriting the value set in the previous iteration).
A few other tips:
#array[ ] is actually the syntax for an array slice rather than a single element. It works in this case because you are assigning to a slice of 1. The usual syntax for accessing a single element would be $array[ ].
I recommend placing 'use strict;' at the top of your script - you would have gotten an error pointing out the incorrect reference to $k
Instead of using an index variable, you could push the values onto the end of the array, eg:
while (<>) {
push #array, $_;
}
Accept input until it finds the word end
Solution1
#!/usr/bin/perl
while(<>) {
last if $_=~/end/i;
push #array,$_;
}
for (my $i=scalar(#array);$i>=0;$i--){
print pop #array;
}
Solution2
while(<>){
last if $_=~/end/i;
push #array,$_;
}
print reverse(#array);

A better variable naming system?

A newbie to programming. The task is to extract a particular data from a string and I chose to write the code as follows -
while ($line =<IN>) {
chomp $line;
#tmp=(split /\t/, $line);
next if ($tmp[0] !~ /ch/);
#tgt1=#tmp[8..11];
#tgt2=#tmp[12..14];
#tgt3=#tmp[15..17];
#tgt4=#tmp[18..21];
foreach (1..4) {
print #tgt($_), "\n";
}
I thought #tgt($_) would be interpreted as #tgt1, #tgt2, #tgt3, #tgt4 but I still get the error message that #tgt is a global symbol (#tgt1, #tgt2, #tgt3, #tgt4` have been declared).
Q1. Did I misunderstand foreach loop?
Q2. Why couldn't perl see #tgt($_) as #tgt1, #tgt2 ..etc?
Q2. From the experience this is probably a bad way to name variables. What would be a preferred way to name variables that have similar features?
Q2. Why couldn't perl see #tgt($_) as #tgt1, #tgt2 ..etc?
Q2. From the experience this is probably a bad way to name variables. What would be a preferred way to name variables that have similar features?
I'll asnswer both together.
#tgt($_) does NOT mean what you hope it means
First off, it's an invalid syntax (you can't use () after an array name, perl interpeter will produce a compile error).
What you're trying to do is access distinct variables by accessing a variable via an expression resulting in its name (aka symbolic references). This IS possible to do; but is typically a bad idea and poor-style Perl (as in, you CAN but you SHOULD NOT do it, without a very very good reason).
To access element $_ the way you tried, you use #{"tgt$_"} syntax. But I repeat - Do Not Do That, even if you can.
A correct idiomatic solution: use an array of arrayrefs, with your 1-4 (or rather 0-3) indexing the outer array:
# Old bad code: #tgt1=#tmp[8..11];
# New correct code:
$tgt[0]=[ #tmp[8..11] ]; # [] creates an array reference from a list.
# etc... repeat 4 times - you can even do it in a smart loop later.
What this does is, it stores a reference to an array slice into a zeroth element of a single #tgt array.
At the end, #tgt array has 4 elements , each an array reference to an array containing one of the slices.
Q1. Did I misunderstand foreach loop?
Your foreach loop (as opposed to its contents - see above) was correct, with one style caveat - again, while you CAN use a default $_ variable, you should almost never use it, instead always use named variables for readability.
You print the abovementioned array of arrayrefs as follows (ask separately if any of the syntax is unclear - this is a mid-level data structure handling, not for beginners):
foreach my $index (0..3) {
print join(",", #{ $tgt[$index]}) . "\n";
}

Is there any advantage to using keys #array instead of 0 .. $#array?

I was quite surprised to find that the keys function happily works with arrays:
keys HASH
keys ARRAY
keys EXPR
Returns a list consisting of all the keys of the named hash, or the
indices of an array. (In scalar context, returns the number of keys or
indices.)
Is there any benefit in using keys #array instead of 0 .. $#array with respect to memory usage, speed, etc., or are the reasons for this functionality more of a historic origin?
Seeing that keys #array holds up to $[ modification, I'm guessing it's historic :
$ perl -Mstrict -wE 'local $[=4; my #array="a".."z"; say join ",", keys #array;'
Use of assignment to $[ is deprecated at -e line 1.
4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29
Mark has it partly right, I think. What he's missing is that each now works on an array, and, like the each with hashes, each with arrays returns two items on each call. Where each %hash returns key and value, each #array also returns key (index) and value.
while (my ($idx, $val) = each #array)
{
if ($idx > 0 && $array[$idx-1] eq $val)
{
print "Duplicate indexes: ", $idx-1, "/", $idx, "\n";
}
}
Thanks to Zaid for asking, and jmcnamara for bringing it up on perlmonks' CB. I didn't see this before - I've often looped through an array and wanted to know what index I'm at. This is waaaay better than manually manipulating some $i variable created outside of a loop and incremented inside, as I expect that continue, redo, etc., will survive this better.
So, because we can now use each on arrays, we need to be able to reset that iterator, and thus we have keys.
The link you provided actually has one important reason you might use/not use keys:
As a side effect, calling keys() resets the internal interator of the HASH or ARRAY (see each). In particular, calling keys() in void context resets the iterator with no other overhead.
That would cause each to reset to the beginning of the array. Using keys and each with arrays might be important if they ever natively support sparse arrays as a real data-type.
All that said, with so many array-aware language constructs like foreach and join in perl, I can't remember the last time I used 0..$#array.
I actually think you've answered your own question: it returns the valid indices of the array, no matter what value you've set for $[. So from a generality point of view (especially for library usage), it's more preferred.
The version of Perl I have (5.10.1) doesn't support using keys with arrays, so it can't be for historic reasons.
Well in your example, you are putting them in a list; So, in a list context
keys #array will be replaced with all elements of array
whereas 0 .. $#array will do the same but as array slicing; So, instead $array[0 .. $#array] you can also mention $array[0 .. (some specific index)]

Are Perl subroutines call-by-reference or call-by-value?

I'm trying to figure out Perl subroutines and how they work.
From perlsub I understand that subroutines are call-by-reference and that an assignment (like my(#copy) = #_;) is needed to turn them into call-by-value.
In the following, I see that change is called-by-reference because "a" and "b" are changed into "x" and "y". But I'm confused about why the array isn't extended with an extra element "z"?
use strict;
use Data::Dumper;
my #a = ( "a" ,"b" );
change(#a);
print Dumper(\#a);
sub change
{
#_[0] = "x";
#_[1] = "y";
#_[2] = "z";
}
Output:
$VAR1 = [
'x',
'y'
];
In the following, I pass a hash instead of an array. Why isn't the key changed from "a" to "x"?
use strict;
use Data::Dumper;
my %a = ( "a" => "b" );
change(%a);
print Dumper(\%a);
sub change
{
#_[0] = "x";
#_[1] = "y";
}
Output:
$VAR1 = {
'a' => 'y'
};
I know the real solution is to pass the array or hash by reference using \#, but I'd like to understand the behaviour of these programs exactly.
Perl always passes by reference. It's just that sometimes the caller passes temporary scalars.
The first thing you have to realise is that the arguments of subs can be one and only one thing: a list of scalars.* One cannot pass arrays or hashes to them. Arrays and hashes are evaluated, returning a list of their content. That means that
f(#a)
is the same** as
f($a[0], $a[1], $a[2])
Perl passes by reference. Specifically, Perl aliases each of the arguments to the elements of #_. Modifying the elements #_ will change the scalars returned by $a[0], etc. and thus will modify the elements of #a.
The second thing of importance is that the key of an array or hash element determines where the element is stored in the structure. Otherwise, $a[4] and $h{k} would require looking at each element of the array or hash to find the desired value. This means that the keys aren't modifiable. Moving a value requires creating a new element with the new key and deleting the element at the old key.
As such, whenever you get the keys of an array or hash, you get a copy of the keys. Fresh scalars, so to speak.
Back to the question,
f(%h)
is the same** as
f(
my $k1 = "a", $h{a},
my $k2 = "b", $h{b},
my $k2 = "c", $h{c},
)
#_ is still aliased to the values returned by %h, but some of those are just temporary scalars used to hold a key. Changing those will have no lasting effect.
* — Some built-ins (e.g. grep) are more like flow control statements (e.g. while). They have their own parsing rules, and thus aren't limited to the conventional model of a sub.
** — Prototypes can affect how the argument list is evaluated, but it will still result in a list of scalars.
Perl's subroutines accept parameters as flat lists of scalars. An array passed as a parameter is for all practical purposes a flat list too. Even a hash is treated as a flat list of one key followed by one value, followed by one key, etc.
A flat list is not passed as a reference unless you do so explicitly. The fact that modifying $_[0] modifies $a[0] is because the elements of #_ become aliases for the elements passed as parameters. Modifying $_[0] is the same as modifying $a[0] in your example. But while this is approximately similar to the common notion of "pass by reference" as it applies to any programming language, this isn't specifically passing a Perl reference; Perl's references are different (and indeed "reference" is an overloaded term). An alias (in Perl) is a synonym for something, where as a reference is similar to a pointer to something.
As perlsyn states, if you assign to #_ as a whole, you break its alias status. Also note, if you try to modify $_[0], and $_[0] happens to be a literal instead of a variable, you'll get an error. On the other hand, modifying $_[0] does modify the caller's value if it is modifiable. So in example one, changing $_[0] and $_[1] propagates back to #a because each element of #_ is an alias for each element in #a.
Your second example is a little tricky. Hash keys are immutable. Perl doesn't provide a way to modify a hash key, aside from deleting it. That means that $_[0] is not modifiable. When you attempt to modify $_[0] Perl cannot comply with that request. It probably ought to throw a warning, but doesn't. You see, the flat list passed to it consists of unmodifiable-key followed by modifiable-value, etc. This is mostly a non-issue. I cannot think of any reason to modify individual elements of a hash in the way you're demonstrating; since hashes have no particular order you wouldn't have simple control over which elements in #_ propagate back to which values in %a.
As you pointed out, the proper protocol is to pass \#a or \%a, so that they can be referred to as $_[0]->{element} or $_[0]->[0]. Even though the notation is a little more complicated, it becomes second nature after awhile, and is much clearer (in my opinion) as to what is going on.
Be sure to have a look at the perlsub documentation. In particular:
Any arguments passed in show up in the array #_. Therefore, if you called a function with two arguments, those would be stored in $_[0] and $_[1]. The array #_ is a local array, but its elements are aliases for the actual scalar parameters. In particular, if an element $_[0] is updated, the corresponding argument is updated (or an error occurs if it is not updatable). If an argument is an array or hash element which did not exist when the function was called, that element is created only when (and if) it is modified or a reference to it is taken. (Some earlier versions of Perl created the element whether or not the element was assigned to.) Assigning to the whole array #_ removes that aliasing, and does not update any arguments.
(Note that use warnings is even more important than use strict.)
#_ itself isn't a reference to anything, it is an array (really, just a view of the stack, though if you do something like take a reference to it, it morphs into a real array) whose elements each are an alias to a passed parameter. And those passed parameters are the individual scalars passed; there is no concept of passing an array or hash (though you can pass a reference to one).
So shifts, splices, additional elements added, etc. to #_ don't affect anything passed, though they may change the index of or remove from the array one of the original aliases.
So where you call change(#a), this puts two aliases on the stack, one to $a[0] and one to $a[1]. change(%a) is more complicated; %a flattens out into an alternating list of keys and values, where the values are the actual hash values and modifying them modifies what's stored in the hash, but where the keys are merely copies, no longer associated with the hash.
Perl does not pass the array or hash itself by reference, it unfurls the entries (the array elements, or the hash keys and values) into a list and passes this list to the function. #_ then allows you to access the scalars as references.
This is roughly the same as writing:
#a = (1, 2, 3);
$b = \$a[2];
${$b} = 4;
#a now [1, 2, 4];
You'll note that in the first case you were not able to add an extra item to #a, all that happened was that you modified the members of #a that already existed. In the second case, the hash keys don't really exist in the hash as scalars, so these need to be created as copies in temporary scalars when the expanded list of the hash is created to be passed into the function. Modifying this temporary scalar will not modify the hash key, as it is not the hash key.
If you want to modify an array or hash in a function, you will need to pass a reference to the container:
change(\%foo);
sub change {
$_[0]->{a} = 1;
}
Firstly, you are confusing the # sigil as indicating an array. This is actually a list. When you call Change(#a) you are passing the list to the function, not an array object.
The case with the hash is slightly different. Perl evaluates your call into a list and passes the values as a list instead.

What is the meaning of #_ in Perl?

What is the meaning of #_ in Perl?
perldoc perlvar is the first place to check for any special-named Perl variable info.
Quoting:
#_: Within a subroutine the array #_ contains the parameters passed to that subroutine.
More details can be found in perldoc perlsub (Perl subroutines) linked from the perlvar:
Any arguments passed in show up in the
array #_ .
Therefore, if you called a function with two arguments, those
would be stored in $_[0] and $_[1].
The array #_ is a local array, but its
elements are aliases for the actual scalar parameters.
In particular, if
an element $_[0] is updated, the
corresponding argument is updated (or
an error occurs if it is not
updatable).
If an argument is an array
or hash element which did not exist
when the function was called, that
element is created only when (and if)
it is modified or a reference to it is
taken. (Some earlier versions of Perl
created the element whether or not the
element was assigned to.) Assigning to
the whole array #_ removes that
aliasing, and does not update any
arguments.
Usually, you expand the parameters passed to a sub using the #_ variable:
sub test{
my ($a, $b, $c) = #_;
...
}
# call the test sub with the parameters
test('alice', 'bob', 'charlie');
That's the way claimed to be correct by perlcritic.
First hit of a search for perl #_ says this:
#_ is the list of incoming parameters to a sub.
It also has a longer and more detailed explanation of the same.
The question was what #_ means in Perl. The answer to that question is that, insofar as $_ means it in Perl, #_ similarly means they.
No one seems to have mentioned this critical aspect of its meaning — as well as theirs.
They’re consequently both used as pronouns, or sometimes as topicalizers.
They typically have nominal antecedents, although not always.
You can also use shift for individual variables in most cases:
$var1 = shift;
This is a topic in which you should research further as Perl has a number of interesting ways of accessing outside information inside your sub routine.
All Perl's "special variables" are listed in the perlvar documentation page.
Also if a function returns an array, but the function is called without assigning its returned data to any variable like below. Here split() is called, but it is not assigned to any variable. We can access its returned data later through #_:
$str = "Mr.Bond|Chewbaaka|Spider-Man";
split(/\|/, $str);
print #_[0]; # 'Mr.Bond'
This will split the string $str and set the array #_.
# is used for an array.
In a subroutine or when you call a function in Perl, you may pass the parameter list. In that case, #_ is can be used to pass the parameter list to the function:
sub Average{
# Get total number of arguments passed.
$n = scalar(#_);
$sum = 0;
foreach $item (#_){
# foreach is like for loop... It will access every
# array element by an iterator
$sum += $item;
}
$average = $sum / $n;
print "Average for the given numbers: $average\n";
}
Function call
Average(10, 20, 30);
If you observe the above code, see the foreach $item(#_) line... Here it passes the input parameter.
Never try to edit to #_ variable!!!! They must be not touched.. Or you get some unsuspected effect. For example...
my $size=1234;
sub sub1{
$_[0]=500;
}
sub1 $size;
Before call sub1 $size contain 1234. But after 500(!!) So you Don't edit this value!!! You may pass two or more values and change them in subroutine and all of them will be changed! I've never seen this effect described. Programs I've seen also leave #_ array readonly. And only that you may safely pass variable don't changed internal subroutine
You must always do that:
sub sub2{
my #m=#_;
....
}
assign #_ to local subroutine procedure variables and next worked with them.
Also in some deep recursive algorithms that returun array you may use this approach to reduce memory used for local vars. Only if return #_ array the same.