Understanding data structures in Perl

I'm trying to understand the 'Common Mistake' section in the perldsc documentation. What is the author trying to convey when he mentions:
The two most common mistakes made in constructing something like an array of arrays is either accidentally counting the number of elements or else taking a reference to the same memory location repeatedly. Here's the case where you just get the count instead of a nested array:
for my $i (1..10) {
my @array = somefunc($i);
$AoA[$i] = @array; # WRONG!
}
As I understand it, on the first iteration the loop takes the first value of (1..10), which is 1, and passes it to the function like this:
my @array = somefunc(1);
Since that function is not defined, I'll create the logic.
sub somefunc {
my $a = shift;
print $a * $a;
}
which will essentially do this:
1 * 1
and the result is '1'.
To my understanding, my @array will look like:
@array = ('1');
And the next line will do:
$AoA[$i] = @array;
I'm assuming that $AoA[$i] is an anonymous array (the author didn't declare it with 'my', by the way) and that @array will be the first element of this anonymous array, which the author says is WRONG. The foreach loop will then iterate to '2'.
somefunc(2);
Which will be '4' and passed to:
$AoA[$i] = @array
What is the author's point with this wrong code? I'm trying to understand what is wrong but, more importantly, I'm trying to understand the code. Any help will be appreciated.
UPDATE
I think I understand why this is a common mistake: when I use print and Dumper, I can see what the author is trying to convey. Here is the revised code.
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
for my $i (1..10) {
my @AoA;
my @array = somefunc($i);
print "The array is Dumper(@array)\n";
$AoA[$i] = @array; # WRONG!
print Dumper($AoA[$i]);
}
sub somefunc {
my $a = shift;
return $a * $a;
}
In the Common Mistakes section of perldoc perldsc, the author states:
Here's the case where you just get the count instead of a nested array:
Below is the output of the Dumper.
The array is Dumper(1)
$VAR1 = 1;
The array is Dumper(4)
$VAR1 = 1;
The array is Dumper(9)
$VAR1 = 1;
The array is Dumper(16)
$VAR1 = 1;
The array is Dumper(25)
$VAR1 = 1;
The array is Dumper(36)
$VAR1 = 1;
The array is Dumper(49)
$VAR1 = 1;
The array is Dumper(64)
$VAR1 = 1;
The array is Dumper(81)
$VAR1 = 1;
The array is Dumper(100)
$VAR1 = 1;
So I'm assuming that the repeated
$VAR1 = 1;
is the count and not the nested array.
The author did indicate that if the count is what I truly want, I should rewrite the code like this:
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
for my $i (1..10) {
my @count;
my @array = somefunc($i);
print "The array is Dumper(@array)\n";
$count[$i] = scalar @array;
print Dumper($count[$i]);
}
sub somefunc {
my $a = shift;
return $a * $a;
}
But the documentation didn't tell me how to get the nested array?
UPDATE
Correct me if I am wrong but I rewrote the code to get the nested array:
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
my @count;
my @new_array;
for my $i (1..10) {
#my @count;
my @array = somefunc($i);
push @new_array, [@array];
}
sub somefunc {
my $a = shift;
return $a * $a;
}
print Dumper(\@new_array);
Which printed
$VAR1 = [
[
1
],
[
4
],
[
9
],
[
16
],
[
25
],
[
36
],
[
49
],
[
64
],
[
81
],
[
100
]
];

In the following statement:
$AoA[$i] = @array;
@array is evaluated in scalar context, yielding the number of its elements. The context is imposed by the left-hand side, $AoA[$i], which is a single element of the @AoA array.
In Perl, there are no arrays of arrays in a strict sense. They are emulated either by "flattened" arrays or by arrays of references. For the latter, you need the reference operator, as in:
$AoA[$i] = \@array;
For starters, you may find that Data::Dumper is very handy for examining complex data structures such as arrayrefs and hashrefs.
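To make the difference concrete, here is a minimal sketch (the sample values are made up for illustration) that puts the count, a reference to the original array, and a reference to a copy side by side:

```perl
use strict;
use warnings;

my @array = (10, 20, 30);   # made-up sample data
my @AoA;

$AoA[0] = @array;     # scalar context: stores the element count, 3
$AoA[1] = \@array;    # stores a reference to @array itself
$AoA[2] = [@array];   # stores a reference to a fresh copy of @array

print "count:  $AoA[0]\n";
print "second: $AoA[1][1]\n";
print "copy:   @{ $AoA[2] }\n";
```

Only the last two assignments give you a nested array; the first one just records how many elements there were.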

Perl is polymorphic, which means that it deals with different data types transparently, and makes what is usually a pretty good guess on how to deal with something. This makes the programmer's work much easier because it is not strongly typed like other languages.
So for example if $i is the number 4, you can do this:
print $i + 1;
and you will see a 5 - pretty logical, right?
and if you do this:
print "I am " , $i , " years old";
You will see "I am 4 years old". In this case Perl says, "you are printing $i as text, so I will treat it as a string." There is no need to convert the number into a string explicitly, as many other languages insist.
So when you assign
$AoA[$i] = @array;
the way Perl treats this depends on the context. Because the left-hand side is a scalar, @array is evaluated in scalar context, so $AoA[$i] is set to the length of the array.
For more information about scalar vs list context, read this article:
http://perl.plover.com/context.html
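As a rough illustration of how the left-hand side picks the context, the following sketch (variable names are only for illustration) assigns the same array three different ways:

```perl
use strict;
use warnings;

my @array = ('x', 'y', 'z');

my $count   = @array;          # scalar context: number of elements
my ($first) = @array;          # list context: first element
my $forced  = scalar(@array);  # scalar context made explicit

print "count=$count first=$first forced=$forced\n";
```

The parentheses around $first are what turn the assignment into a list assignment.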

Your example isn't very useful for understanding what is going on here, as your subroutine always returns 1, the result of calling print(). If you replace the print() with return(), you will at least get different values (1, 4, 9, etc).
But the next line of code:
$AoA[$i] = @array;
will always assign 1 to the element of @AoA. That's because you are assigning an array (@array) to a scalar variable ($AoA[$i]), and when you evaluate an array in scalar context, you get the number of elements in the array.
Now, as your @array only ever has a single element, you could do this:
$AoA[$i] = $array[0];
But that's not really building an array of arrays. What you really want to do is to get a reference to an array.
$AoA[$i] = \@array;
This would be more useful if your subroutine returned more than one value.
sub somefunc {
# Used $x instead of $a as $a has a special meaning in Perl
my $x = shift;
return ($x * $x, $x * $x * $x);
}
for my $i (1..10) {
my @array = somefunc($i);
$AoA[$i] = \@array;
}
A useful tool for exploring this is Data::Dumper. Try adding:
use Data::Dumper;
to the top of your code and:
print Dumper @AoA;
after the foreach loop to see the different data structures that you get back.
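The other mistake named in the perldsc quote, taking a reference to the same memory location repeatedly, is not shown anywhere in this thread. A minimal sketch of it, assuming the array is declared once outside the loop (the somefunc here is a stand-in that returns the square and the cube):

```perl
use strict;
use warnings;

# Stand-in for somefunc(): returns the square and the cube.
sub somefunc {
    my $x = shift;
    return ($x * $x, $x * $x * $x);
}

my (@shared, @fresh);
my @array;                     # declared once, OUTSIDE the loop
for my $i (1 .. 3) {
    @array = somefunc($i);
    push @shared, \@array;     # every entry refers to the same @array
    push @fresh,  [@array];    # every entry gets its own copy
}

print "shared: ", join(' ', map { $_->[0] } @shared), "\n";
print "fresh:  ", join(' ', map { $_->[0] } @fresh),  "\n";
```

Every entry in @shared ends up showing the values from the last iteration, because they all point at the one @array. Declaring my @array inside the loop, as in the code above, or copying with [@array] avoids this.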

Related

How can I get this basic Perl sub program that sorts to work properly?

I am brand new to Perl. Can someone give me a tip or a solution on how to get this sorting subroutine to work? I know it has something to do with how arrays are passed to subroutines. I searched online and did not find an answer that I was satisfied with, and I like the suggestions the helpful S.O. users give me. I would like the program to print the sorted array in the main subroutine. Currently, it prints the elements of the array @a in their original order. I want the subroutine to modify the array so that when I print the array it is in sorted order. Any suggestions are appreciated. Of course, I want to see the simplest way to fix this.
sub sort {
my @array = @_;
my $i;
my $j;
my $iMin;
for ( $i = 0; $i < @_ - 1; $i++ ) {
$iMin = $i;
for ( $j = $i + 1; $j < @_; $j++ ) {
if ( $array[$j] < $array[$iMin] ) {
$iMin = $j;
}
}
if ( $iMin != $i ) {
my $temp = $array[$i];
$array[$i] = $array[$iMin];
$array[$iMin] = $temp;
}
}
}
Then call from a main sub program:
sub main {
my @a = (-23,3,234,-45,0,32,12,54,-10000,1);
&sort(@a);
my $i;
for ( $i = 0; $i < @a; $i++ ) {
print "$a[$i]\n";
}
}
main;
When your sub does the assignment my @array = @_, it creates a copy of the passed contents. Therefore any modifications to the values of @array will not affect @a outside your subroutine.
Following the clarification that this is just a personal learning exercise, there are two solutions.
1) You can return the sorted array and assign it to your original variable
sub mysort {
my @array = @_;
...
return @array;
}
@a = mysort(@a)
2) Or you can pass a reference to the array, and work on the reference:
sub mysort {
my $arrayref = shift;
...
}
mysort(\@a)
Also, it's probably a good idea not to name a sub sort, since that's a builtin function. Duplicating your code using Perl's sort:
@a = sort {$a <=> $b} @a;
Also, the for loops inside your sub should be rewritten to use the last index of @array, which is written as $#array, and the range operator .., which is useful for incrementing counters:
for ( my $j = $i + 1; $j <= $#array; $j++ ) {
# Or simpler:
for my $j ($i+1 .. $#array) {
And finally, because you're new, I should pass on that all your scripts should start with use strict; and use warnings;. For reasons why: Why use strict and warnings?
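For the second option, working on a reference, the elided body could look roughly like the sketch below. It keeps the selection-sort idea and the $iMin name from the question; everything else is just one way to write it, not the only one:

```perl
use strict;
use warnings;

# Selection sort that modifies the caller's array through a reference.
sub mysort {
    my $arrayref = shift;
    for my $i (0 .. $#{$arrayref} - 1) {
        my $iMin = $i;
        for my $j ($i + 1 .. $#{$arrayref}) {
            $iMin = $j if $arrayref->[$j] < $arrayref->[$iMin];
        }
        # Swap the two elements with an array slice.
        @{$arrayref}[$i, $iMin] = @{$arrayref}[$iMin, $i] if $iMin != $i;
    }
    return;
}

my @a = (-23, 3, 234, -45, 0, 32, 12, 54, -10000, 1);
mysort(\@a);
print "@a\n";
```

Because the sub only ever touches the array through $arrayref, there is no copy, and the caller's @a is sorted in place.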
With very few, rare exceptions the simplest (and easiest) way to sort stuff in perl is simply to use the sort builtin.
sort takes an optional argument, either a block or a subname, which can be used to control how sort evaluates which of the two elements it is comparing at any given moment is greater.
See sort on perldoc for further information.
If you require a "natural" sort function, where you get the sequence 0, 1, 2, 3, ... instead of 0, 1, 10, 11, 12, 2, 21, 22, 3, ..., then use the perl module Sort::Naturally which is available on CPAN (and commonly available as a package on most distros).
In your case, one of the following will be quite sufficient; for a pure numeric sort, use the <=> comparison:
use Sort::Naturally; #Assuming Sort::Naturally is installed
sub main {
my @a = (-23,3,234,-45,0,32,12,54,-10000,1);
#Choose one of the following
@a = sort @a; #Sort in "ASCII" ascending order
@a = sort { $b cmp $a } @a; #Sort in reverse of the above
@a = sort { $a <=> $b } @a; #Sort in numeric ascending order
@a = nsort @a; #Sort in "natural" order
@a = sort { ncmp($b, $a) } @a; #Reverse of the above
print "$_\n" foreach @a; #To see what you actually got
}
It is also worth mentioning the use sort 'stable'; pragma which can be used to ensure that sorting occurs using a stable algorithm, meaning that elements which are equal will not be rearranged relative to one another.
As a bonus, you should be aware that sort can be used to sort data structures as well as simple scalars:
#Assume @a is an array of hashes
@a = sort { $a->{name} cmp $b->{name} } @a; #Sort @a by name key
#Sort @a by name in ascending order and date in descending order
@a = sort { $a->{name} cmp $b->{name} || $b->{date} cmp $a->{date} } @a;
#Assume @a is an array of arrays
#Sort @a by the 2nd element of the arrays it contains
@a = sort { $a->[1] cmp $b->[1] } @a;
#Assume @a is an array of VERY LONG strings
#Sort @a alphanumerically, but only care about
#the first 1,000 characters of each string
@a = sort { substr($a, 0, 1000) cmp substr($b, 0, 1000) } @a;
#Assume we want to "sort" an array without modifying it:
#Yes, the names here are confusing. See below.
my @idxs = sort { $a[$a] cmp $a[$b] } (0..$#a);
print "$a[$_]\n" foreach @idxs;
#@idxs contains the indexes to @a, in the order they would have
#to be read from @a in order to get a sorted version of @a
As a final note, please remember that $a and $b are special variables in Perl, which are pre-populated in the context of a sorting sub or sort block. The upshot is that when you're working with sort you can always expect $a and $b to contain the two elements currently being compared, and should use them accordingly; but do NOT declare my $a;, for example, or use variables with either name in non-sort-related code. This also means that naming things %a or @a, or %b or @b, can be confusing; see the final section of my example above.
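To see why the choice of comparison matters for the numbers in the question, here is a small sketch that sorts the same list with the default string comparison and with the numeric <=> operator (the variable names are only for illustration):

```perl
use strict;
use warnings;

my @nums = (-23, 3, 234, -45, 0, 32, 12, 54, -10000, 1);

my @as_strings = sort @nums;                # default: string comparison (cmp)
my @as_numbers = sort { $a <=> $b } @nums;  # numeric comparison

print "string order:  @as_strings\n";
print "numeric order: @as_numbers\n";
```

With the string comparison, 234 sorts before 3 and -23 before -45, which is rarely what you want for numbers.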

Dereferencing a list reference in hash element

Can someone finish this for me and explain what you did?
my %hash;
#$hash{list_ref}=[1,2,3];
#my @array=@{$hash{list_ref}};
$hash{list_ref}=\[1,2,3];
my @array=???
print "Two is $array[1]";
@array = @{${$hash{list_ref}}};
(1,2,3) is a list.
[1,2,3] is a reference to an array (technically, there's no such thing in Perl as a reference to a list).
\[1,2,3] is a reference to a reference to an array.
$hash{list_ref} is a reference to a reference to an array.
${$hash{list_ref}} is a reference to an array.
@{${$hash{list_ref}}} is an array.
Since a reference is considered a scalar, a reference to a reference is a scalar reference, and the scalar dereferencing operator ${...} is used in the middle step.
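If you want to check each layer rather than take it on faith, the built-in ref function reports what kind of thing a reference points to. A short sketch using the hash from the question:

```perl
use strict;
use warnings;

my %hash;
$hash{list_ref} = \[1, 2, 3];

print ref($hash{list_ref}), "\n";        # REF: a reference to a reference
print ref(${ $hash{list_ref} }), "\n";   # ARRAY: a reference to an array

my @array = @{ ${ $hash{list_ref} } };   # peel off both layers
print "Two is $array[1]\n";
```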
Others have pretty much already answered the question, but more generally, if you are ever confused about a data structure, use Data::Dumper. This will print out the structure of the mysterious blob of data, and help you parse it.
use strict; #Always, always, always
use warnings; #Always, always, always
use feature qw(say); #Nicer than 'print'
use Data::Dumper; #Calling in the big guns!
my $data_something = \[1,2,3];
say Dumper $data_something;
say Dumper ${ $data_something };
Let's see what it prints out...
$ test.pl
$VAR1 = \[
1,
2,
3
];
$VAR1 = [
1,
2,
3
];
From the first dump, it appears that $data_something is a scalar reference to an array reference. That led me to add the second Dumper call after I ran the program the first time. It showed me that ${ $data_something } is a reference to an array.
I can now access that array like this:
use strict; #Always, always, always
use warnings; #Always, always, always
use feature qw(say); #Nicer than 'print'
use Data::Dumper; #Calling in the big guns!
my $data_something = \[1,2,3];
# Double dereference
my @array = @{ ${ $data_something } }; #Could be written as @$$data_something
for my $element (@array) {
say "Element is $element";
}
And now...
$ test.pl
Element is 1
Element is 2
Element is 3
It looks like you meant:
$hash{list_ref} = [1,2,3];
and not:
$hash{list_ref} = \[1,2,3];
The latter got you a scalar reference to an array reference, which really doesn't do you much good except add confusion to the situation.
Then, all you had to do to refer to a particular element is $hash{list_ref}->[0]. This is just a shortcut for ${ $hash{list_ref} }[0]. It's easier to read and understand.
use strict;
use warnings;
use feature qw(say);
my %hash;
$hash{list_ref} = [1, 2, 3];
foreach my $element (0..2) {
say "Element is " . $hash{list_ref}->[$element];
}
And...
$ test.pl
Element is 1
Element is 2
Element is 3
So, next time you are confused about what a particular data structure looks like (and it happens to the best of us. Well... not the best of us, It happens to me), use Data::Dumper.
my %hash;
#$hash{list_ref}=[1,2,3];
#Putting the list in square brackets makes it a reference so you don't need '\'
$hash{list_ref}=[1,2,3];
#If you want to use a '\' to make a reference, take it of a named array:
# my @list = (1,2,3);
# $something = \@list; # A reference to the array @list
#
# or (as I did above)...
#
# $something = [1,2,3]; # Returns a reference to an anonymous array
#
# Note that \(1,2,3) is NOT the same: it returns a list of references, one per element.
print "Two is $hash{list_ref}->[1]\n";
# To make it completely referenced do this:
#
# $hash = {};
# $hash->{list_ref} = [1,2,3];
#
# print $hash->{list_ref}[1] . "\n";
To get at the array (as an array or list) do this:
my @array = @{ $hash{list_ref} };
[ EXPR ]
creates an anonymous array, assigns the value returned by EXPR to it, and returns a reference to it. That means it's virtually the same as
do { my @anon = ( EXPR ); \@anon }
That means that
\[ EXPR ]
is virtually the same as
do { my @anon = ( EXPR ); \\@anon }
It's not something one normally sees.
Put differently,
1,2,3 returns a list of three elements (in list context).
(1,2,3) same as previous. Parens simply affect precedence.
[1,2,3] returns a reference to an array containing three elements.
\[1,2,3] returns a reference to a reference to an array containing three elements.
In practice:
my @data = (1,2,3);
print @data;
my $data = [1,2,3]; $hash{list_ref} = [1,2,3];
print @{ $data }; print @{ $hash{list_ref} };
my $data = \[1,2,3]; $hash{list_ref} = \[1,2,3];
print @{ ${ $data } }; print @{ ${ $hash{list_ref} } };

Why does my Perl max() function always return the first element of the array?

I am relatively new to Perl and I do not want to use the List::Util max function to find the maximum value of a given array.
When I test the code below, it just returns the first value of the array, not the maximum.
sub max
{
my @array = shift;
my $cur = $array[0];
foreach $i (@array)
{
if($i > $cur)
{
$cur = $i;
}
else
{
$cur = $cur;
}
}
return $cur;
}
Replace
my @array = shift;
with
my @array = @_;
@_ is the array containing all function arguments. shift only grabs the first function argument and removes it from @_. Change that code and it should work correctly!
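A quick way to convince yourself is to count what each form actually captures. In this sketch the two sub names are made up for the comparison:

```perl
use strict;
use warnings;

# Hypothetical helpers: each returns how many elements ended up in @array.
sub with_shift { my @array = shift; return scalar @array }  # first argument only
sub with_all   { my @array = @_;    return scalar @array }  # every argument

my $n_shift = with_shift(3, 9, 4);
my $n_all   = with_all(3, 9, 4);

print "shift captured $n_shift element(s)\n";
print "\@_ captured $n_all element(s)\n";
```

With shift, @array only ever holds one element, so the loop in max() has nothing else to compare against.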
Why don't you want to use something that works?
One of the ways to solve problems like this is to debug your data structures. At each step you print the data you have to see if what you expect is actually in there. That can be as simple as:
print "array is [@array]\n";
Or for complex data structures:
use Data::Dumper;
print Dumper( \@array );
In this case, you would have seen that @array has only one element, so that element must be the maximum.
If you want to see how list assignment and subroutine arguments work, check out Learning Perl.
You can write the function as:
#!/usr/bin/perl
use strict; use warnings;
print max(@ARGV);
sub max {
my $max = shift;
$max >= $_ or $max = $_ for @_;
return $max;
}
However, it would be far more efficient to pass it a reference to the array and even more efficient to use List::Util::max.

Creating array from object function using map

I have an array of HTML::Elements obtained from HTML::TreeBuilder and HTML::Element->find and I need to assign their as_text value to some other variables. I know I can really easily do
my ($var1, $var2) = ($arr[0]->as_text, $arr[1]->as_text);
but I was hoping I could use map instead just to make the code a bit more readable as there are at least 8 elements in the array. I'm really new to Perl so I'm not quite sure what to do.
Can anyone point me in the right direction?
If you're well versed in perldoc -f map it's pretty clear:
my @as_texts = map { $_->as_text } @arr;
Works as well if you want to assign to a list of scalars:
my($var1, $var2, $var3, ...) = map { $_->as_text } @arr;
But of course the array version is better for an unknown number of elements.
Note that, if you just want to map the first two elements of @arr:
my($var1, $var2) = map { $_->as_text } @arr;
will invoke $_->as_text for all elements of @arr. In that case, use an array slice to avoid unnecessary calls:
my($var1, $var2) = map { $_->as_text } @arr[0 .. 1];
Example:
#!/usr/bin/perl
use strict;
use warnings;
my @arr = 'a' .. 'z';
my $count;
my ($x, $y) = map { $count++; ord } @arr;
print "$x\t$y\t$count\n";
$count = 0;
($x, $y) = map { $count++; uc } @arr[0 .. 1];
print "$x\t$y\t$count\n";
Output:
C:\Temp> jk
97 98 26
A B 2
ord was called for each element of @arr whereas uc was called for only the elements we were interested in.

Automatically get loop index in foreach loop in Perl

If I have the following array in Perl:
@x = qw(a b c);
and I iterate over it with foreach, then $_ will refer to the current element in the array:
foreach (@x) {
print;
}
will print:
abc
Is there a similar way to get the index of the current element, without manually updating a counter? Something such as:
foreach (@x) {
print $index;
}
where $index is updated like $_ to yield the output:
012
Like codehead said, you'd have to iterate over the array indices instead of its elements. I prefer this variant over the C-style for loop:
for my $i (0 .. $#x) {
print "$i: $x[$i]\n";
}
In Perl prior to 5.10, you can say
#!/usr/bin/perl
use strict;
use warnings;
my @a = qw/a b c d e/;
my $index;
for my $elem (@a) {
print "At index ", $index++, ", I saw $elem\n";
}
#or
for my $index (0 .. $#a) {
print "At index $index I saw $a[$index]\n";
}
In Perl 5.10, you use state to declare a variable that never gets reinitialized (unlike ones created with my). This lets you keep the $index variable in a smaller scope, but it can lead to bugs (if you enter the loop a second time it will still have the last value):
#!/usr/bin/perl
use 5.010;
use strict;
use warnings;
my @a = qw/a b c d e/;
for my $elem (@a) {
state $index;
say "At index ", $index++, ", I saw $elem";
}
In Perl 5.12 you can say
#!/usr/bin/perl
use 5.012; # This enables strict
use warnings;
my @a = qw/a b c d e/;
while (my ($index, $elem) = each @a) {
say "At index $index I saw $elem";
}
But be warned: there are restrictions on what you are allowed to do with @a while iterating over it with each.
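One of those pitfalls is that the iterator belongs to the array, not to the loop. The sketch below (assuming Perl 5.12 or later, where each works on arrays) leaves a while-each loop early, shows that the next each call carries on from where it stopped, and then resets the iterator by calling keys in void context:

```perl
use strict;
use warnings;

my @a = qw(a b c d e);

# Leave the loop early: the iterator attached to @a stays where it stopped.
while (my ($index, $elem) = each @a) {
    last if $index == 1;
}

# Without a reset, the next each() call continues from index 2.
my ($next_index) = each @a;
print "Resumed at index $next_index\n";

# Calling keys in void context resets the iterator.
keys @a;
my ($first_index) = each @a;
print "After reset, index $first_index\n";
```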
It won't help you now, but in Perl 6 you will be able to say
#!/usr/bin/perl6
my @a = <a b c d e>;
for @a Z 0 .. Inf -> $elem, $index {
say "at index $index, I saw $elem"
}
The Z operator zips the two lists together (i.e. it takes one element from the first list, then one element from the second, then one element from the first, and so on). The second list is a lazy list that contains every integer from 0 to infinity (at least theoretically). The -> $elem, $index says that we are taking two values at a time from the result of the zip. The rest should look normal to you (unless you are not familiar with the say function from 5.10 yet).
perldoc perlvar does not seem to suggest any such variable.
It can be done with a while loop (foreach doesn't support this):
my @arr = (1111, 2222, 3333);
while (my ($index, $element) = each(@arr))
{
# You may need to "use feature 'say';"
say "Index: $index, Element: $element";
}
Output:
Index: 0, Element: 1111
Index: 1, Element: 2222
Index: 2, Element: 3333
Perl version: 5.14.4
Not with foreach.
If you definitely need the element's index in the array, use a C-style 'for' loop:
for ($i=0; $i<@x; ++$i) {
print "Element at index $i is " , $x[$i] , "\n";
}
No, you must make your own counter. Yet another example:
my $index;
foreach (@x) {
print $index++;
}
when used for indexing
my $index;
foreach (@x) {
print $x[$index]+$y[$index];
$index++;
}
And of course you can use local $index; instead of my $index; and so on.
autobox::Core provides, among many more things, a handy for method:
use autobox::Core;
['a'..'z']->for( sub{
my ($index, $value) = @_;
say "$index => $value";
});
Alternatively, have a look at an iterator module, for example: Array::Iterator
use Array::Iterator;
my $iter = Array::Iterator->new( ['a'..'z'] );
while ($iter->hasNext) {
$iter->getNext;
say $iter->currentIndex . ' => ' . $iter->current;
}
Also see:
each to their own (autobox)
perl5i
Yes. I have checked so many books and other blogs... The conclusion is, there isn't any system variable for the loop counter. We have to make our own counter. Correct me if I'm wrong.
Oh yes, you can! (Sort of, but you shouldn't.) each(@array) in scalar context gives you the current index of the array.
@a = ('a'..'z');
for (@a) {
print each(@a) . "\t" . $_ . "\n";
}
Here each(@a) is in scalar context and returns only the index, not the value at that index. Since we're in a for loop, we have the value in $_ already. The same mechanism is often used in a while-each loop. Same problem.
The problem comes if you do for(@a) again. The index isn't back to 0 like you'd expect; it's undef followed by 0,1,2... one count off. The perldoc of each() says to avoid this issue. Use a for loop to track the index.
each
Basically:
for(my $i=0; $i<=$#a; $i++) {
print "The Element at $i is $a[$i]\n";
}
I'm a fan of the alternate method:
my $index=0;
for (@a) {
print "The Element at $index is $a[$index]\n";
$index++;
}
Please consider:
print "Element at index $_ is $x[$_]\n" for keys @x;
Well, there is this way:
use List::Rubyish;
$list = List::Rubyish->new( [ qw<a b c> ] );
$list->each_index( sub { say "\$_=$_" } );
See List::Rubyish.
You shouldn't need to know the index in most circumstances. You can do this:
my @arr = (1, 2, 3);
foreach (@arr) {
$_++;
}
print join(", ", @arr);
In this case, the output would be 2, 3, 4 as foreach sets an alias to the actual element, not just a copy.
I have tried it like this:
@array = qw /tomato banana papaya potato/; # Example array
my $count; # Lexical variable; starts as undef, which counts as 0 when incremented.
print "\nBefore For loop value of counter is $count"; # Just printing the value before entering the loop.
for (@array) { print "\n",$count++," $_" ; } # String and variable separated by commas so the
# counter expression is evaluated and printed.
undef $count; # Undefining so that in later parts it will
# be reset to 0.
print "\nAfter for loop value of counter is $count"; # Checking the counter value after the for loop.
In short...
@array = qw /a b c d/;
my $count;
for (@array) { print "\n",$count++," $_"; }
undef $count;