Creating array from object function using map - perl

I have an array of HTML::Elements obtained from HTML::TreeBuilder and HTML::Element->find and I need to assign their as_text value to some other variables. I know I can really easily do
my ($var1, $var2) = ($arr[0]->as_text, $arr[1]->as_text);
but I was hoping I could use map instead just to make the code a bit more readable as there are at least 8 elements in the array. I'm really new to Perl so I'm not quite sure what to do.
Can anyone point me in the right direction?

If you're well versed in perldoc -f map it's pretty clear:
my @as_texts = map { $_->as_text } @arr;
Works as well if you want to assign to a list of scalars:
my($var1, $var2, $var3, ...) = map { $_->as_text } @arr;
But of course the array version is better for an unknown number of elements.

Note that, if you just want to map the first two elements of @arr:
my($var1, $var2) = map { $_->as_text } @arr;
will invoke $_->as_text for all elements of @arr. In that case, use an array slice to avoid unnecessary calls:
my($var1, $var2) = map { $_->as_text } @arr[0 .. 1];
Example:
#!/usr/bin/perl
use strict;
use warnings;
my @arr = 'a' .. 'z';
my $count;
my ($x, $y) = map { $count++; ord } @arr;
print "$x\t$y\t$count\n";
$count = 0;
($x, $y) = map { $count++; uc } @arr[0 .. 1];
print "$x\t$y\t$count\n";
print "$x\t$y\t$count\n";
Output:
C:\Temp> jk
97 98 26
A B 2
ord was called for each element of @arr, whereas uc was called only for the elements we were interested in.
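To try the map approach without installing HTML::TreeBuilder, here is a minimal sketch. The FakeElement package is a hypothetical stand-in for HTML::Element that implements only an as_text method; it is an assumption made for illustration, not part of any library.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for HTML::Element: only as_text is implemented.
package FakeElement;
sub new     { my ($class, $text) = @_; return bless { text => $text }, $class }
sub as_text { my ($self) = @_; return $self->{text} }

package main;

my @arr = map { FakeElement->new($_) } qw(title author date);

# Call as_text on every element and collect the results in an array.
my @as_texts = map { $_->as_text } @arr;

# Assign only the first two, without calling as_text on the rest.
my ($var1, $var2) = map { $_->as_text } @arr[0 .. 1];

print "@as_texts\n";     # title author date
print "$var1 $var2\n";   # title author
```

With real HTML::Element objects the two map lines should work unchanged, since they rely only on the objects having an as_text method.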

Related

understanding data structures in perl

I'm trying to understand the 'Common Mistake' section in the perldsc documentation. What is the author trying to convey when he mentions:
The two most common mistakes made in constructing something like an array of arrays is either accidentally counting the number of elements or else taking a reference to the same memory location repeatedly. Here's the case where you just get the count instead of a nested array:
for my $i (1..10) {
my @array = somefunc($i);
$AoA[$i] = @array; # WRONG!
}
From what I understand, on the first iteration the loop takes the first value of (1..10), which is 1, and passes it to a function like this:
my @array = somefunc(1);
Since that function is not defined, I'll create the logic.
sub somefunc {
my $a = shift;
print $a * $a;
}
which will essentially do this:
1 * 1
and the result is '1'.
To my understanding my @array will look like:
@array = ('1');
And the next line will do:
$AoA[$i] = @array;
I'm assuming that $AoA[$i] is an anonymous array (he/she didn't declare it with 'my', btw) and that @array will be the first element of this anonymous array, which the author says is WRONG. Then the foreach loop will iterate to '2'.
somefunc(2);
Which will be '4' and passed to:
$AoA[$i] = @array
What point is the author making with this wrong code? I'm trying to understand what is wrong but, more importantly, I'm trying to understand the code. Any help will be appreciated.
UPDATE
I think I understand why this is a common mistake because when I use print and Dumper, I can visually see what the author is trying to convey, here is the revised code.
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
for my $i (1..10) {
my @AoA;
my @array = somefunc($i);
print "The array is Dumper(@array)\n";
$AoA[$i] = @array; # WRONG!
print Dumper($AoA[$i]);
}
sub somefunc {
my $a = shift;
return $a * $a;
}
In the Common Mistakes paragraph of perldoc perldsc, he/she states
Here's the case where you just get the count instead of a nested array:
Below is the output of the Dumper.
The array is Dumper(1)
$VAR1 = 1;
The array is Dumper(4)
$VAR1 = 1;
The array is Dumper(9)
$VAR1 = 1;
The array is Dumper(16)
$VAR1 = 1;
The array is Dumper(25)
$VAR1 = 1;
The array is Dumper(36)
$VAR1 = 1;
The array is Dumper(49)
$VAR1 = 1;
The array is Dumper(64)
$VAR1 = 1;
The array is Dumper(81)
$VAR1 = 1;
The array is Dumper(100)
$VAR1 = 1;
So I'm assuming that the repeated
$VAR1 = 1;
is the count and not the nested array.
The author did indicate that if the count is what I truly want then to rewrite the code like this:
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
for my $i (1..10) {
my @count;
my @array = somefunc($i);
print "The array is Dumper(@array)\n";
$count[$i] = scalar @array;
print Dumper($count[$i]);
}
sub somefunc {
my $a = shift;
return $a * $a;
}
But the documentation didn't tell me how to get the nested array?
UPDATE
Correct me if I am wrong but I rewrote the code to get the nested array:
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
my @count;
my @new_array;
for my $i (1..10) {
#my @count;
my @array = somefunc($i);
push @new_array, [@array];
}
sub somefunc {
my $a = shift;
return $a * $a;
}
print Dumper(\@new_array);
Which printed
$VAR1 = [
[
1
],
[
4
],
[
9
],
[
16
],
[
25
],
[
36
],
[
49
],
[
64
],
[
81
],
[
100
]
];
In the following statement:
$AoA[$i] = @array;
@array is evaluated in scalar context, which yields the number of its elements. The context is imposed by the left-hand side, $AoA[$i], which is a single element of the @AoA array.
In Perl, there are no arrays of arrays in the strict sense. They are emulated either by "flattened" arrays or by arrays of references. For the latter, you need to use the reference operator, as in:
$AoA[$i] = \@array;
For starters, you may find that Data::Dumper is very handy for examining complex data structures such as arrayrefs and hashrefs.
Perl is polymorphic, which means that it deals with different data types transparently, and makes what is usually a pretty good guess on how to deal with something. This makes the programmer's work much easier because it is not strongly typed like other languages.
So for example if $i is the number 4, you can do this:
print $i + 1;
and you will see a 5 - pretty logical, right?
and if you do this:
print "I am " , $i , " years old";
You will see "I am 4 years old". In this case Perl says, "print needs strings here, so I will treat $i as a string." There is no need to convert the number into a string explicitly, as many other languages insist.
So when you assign
$AoA[$i] = @array;
The way it treats this depends on the context. Because $AoA[$i] is a scalar, @array is evaluated in scalar context, so $AoA[$i] is set to the length of the array.
For more information about scalar vs list context, read this answer:
http://perl.plover.com/context.html
Your example isn't very useful for understanding what is going on here, as your subroutine always returns "1", the result of calling print(). If you replace the print() with return(), then you will at least get different values (1, 4, 9, etc).
But the next line of code:
$AoA[$i] = @array;
will always assign 1 to the element of @AoA. That's because you are assigning an array (@array) to a scalar variable ($AoA[$i]), and when you evaluate an array in scalar context, you get the number of elements in the array.
Now, as your #array only ever has a single element, you could do this:
$AoA[$i] = $array[0];
But that's not really building an array of arrays. What you really want to do is to get a reference to an array.
$AoA[$i] = \@array;
This would be more useful if your subroutine returned more than one value.
sub somefunc {
# Used $x instead of $a as $a has a special meaning in Perl
my $x = shift;
return ($x * $x, $x * $x * $x);
}
for my $i (1..10) {
my @array = somefunc($i);
$AoA[$i] = \@array;
}
A useful tool for exploring this is Data::Dumper. Try adding:
use Data::Dumper;
to the top of your code and:
print Dumper @AoA;
after the foreach loop to see the different data structures that you get back.
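The difference between storing the count and storing a nested array can be seen side by side in a short sketch. It reuses the two-value somefunc from the answer above; the @counts array and the loop bounds are chosen only for illustration.

```perl
use strict;
use warnings;

# Returns the square and the cube of its argument (two values).
sub somefunc {
    my $x = shift;
    return ($x * $x, $x * $x * $x);
}

my (@counts, @AoA);
for my $i (1 .. 3) {
    my @array = somefunc($i);
    $counts[$i] = @array;        # scalar context: stores the element count
    $AoA[$i]    = [ @array ];    # stores a reference to a fresh copy of @array
}

print "$counts[2]\n";               # 2
print "$AoA[2][0] $AoA[2][1]\n";    # 4 8
print scalar(@{ $AoA[3] }), "\n";   # 2
```

Because @array is declared with my inside the loop, \@array would also give a distinct array on each pass; [ @array ] makes the copy explicit and stays safe if the declaration is later moved outside the loop.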

Why isn't my sort working in Perl?

I have never used Perl, but I need to complete this exercise. My task is to sort an array in a few different ways. I've been provided with a test script. This script puts together the array and prints statements for each stage of its sorting. I've named it foo.pl:
use strict;
use warnings;
use MyIxHash;
my %myhash;
my $t = tie(%myhash, "MyIxHash", 'a' => 1, 'abe' => 2, 'cat'=>'3');
$myhash{b} = 4;
$myhash{da} = 5;
$myhash{bob} = 6;
print join(", ", map { "$_ => $myhash{$_}" } keys %myhash) . " are the starting key => val pairs\n";
$t->SortByKey; # sort alphabetically
print join(", ", map { "$_ => $myhash{$_}" } keys %myhash) . " are the alphabetized key => val pairs\n";
$t->SortKeyByFunc(sub {my ($a, $b) = @_; return ($b cmp $a)}); # sort alphabetically in reverse order
print join(", ", map { "$_ => $myhash{$_}" } keys %myhash) . " are the reverse alphabetized key => val pairs\n";
$t->SortKeyByFunc(\&abcByLength); # use abcByLength to sort
print join(", ", map { "$_ => $myhash{$_}" } keys %myhash) . " are the abcByLength sorted key => val pairs\n";
print "Done\n\n";
sub abcByLength {
my ($a, $b) = @_;
if(length($a) == length($b)) { return $a cmp $b; }
else { return length($a) <=> length($b) }
}
Foo.pl uses a package called MyIxHash, for which I've created a module called MyIxHash.pm. The script runs through the alphabetical sort, "SortByKey", which I've inherited via the "IxHash" package in my module. The last two sorts are the ones giving me issues. When the sub I've created, "SortKeyByFunc", is run on the array, it passes in the array and a subroutine as arguments. I've attempted to take those arguments and associate them with variables.
The final sort is supposed to sort by string length, then alphabetically. A subroutine for this is provided at the bottom of foo.pl as "abcByLength". In the same way as the reverse alpha sort, this subroutine is being passed as a parameter to my SortKeyByFunc subroutine.
For both of these sorts, it seems the actual sorting work is done for me, and I just need to apply this subroutine to my array.
My main issue here seems to be that I don't know how, if it is possible at all, to take my subroutine argument and run my array through it as a parameter. Am I running my method on my array incorrectly?
package MyIxHash;
#use strict;
use warnings;
use parent Tie::IxHash;
use Data::Dumper qw(Dumper);
sub SortKeyByFunc {
#my $class = shift;
my ($a, $b) = @_;
#this is a reference to the already alphabetized array being passed in
my @letters = $_[0][1];
#this is a reference to the sub being passed in as a parameter
my $reverse = $_[1];
#this is my variable to contain my reverse sorted array
my @sorted = @letters->$reverse();
return @sorted;
}
1;
"My problem occurs where I try: my @sorted = @letters->$reverse(); I've also tried: my @sorted = sort {$reverse} @letters;"
You were really close; the correct syntax is:
my $reverse = sub { $b cmp $a };
# ...
my @sorted = sort $reverse @letters;
Also note that, for what are basically historical reasons, sort passes the arguments to the comparison function in the (slightly) magic globals $a and $b, not in @_, so you don't need to (and indeed shouldn't) do my ($a, $b) = @_; in your sortsubs (unless you declare them with a prototype; see perldoc -f sort for the gritty details).
Edit: If you're given a comparison function that for some reason does expect its arguments in @_, and you can't change the definition of that function, then your best bet is probably to wrap it in a closure like this:
my $fixed_sortsub = sub { $weird_sortsub->($a, $b) };
my @sorted = sort $fixed_sortsub @letters;
or simply:
my @sorted = sort { $weird_sortsub->($a, $b) } @letters;
Edit 2: Ah, I see the/a problem. When you write:
my @letters = $_[0][1];
what you end up with is a single-element array containing whatever $_[0][1] is, which is presumably an array reference. You should either dereference it immediately, like this:
my @letters = @{ $_[0][1] };
or just keep it as a reference for now and dereference it when you use it:
my $letters = $_[0][1];
# ...
my @sorted = sort $whatever @$letters;
Edit 3: Once you do manage to sort the keys, then, as duskwuff notes in his original answer, you'll also need to call the Reorder() method from your parent class, Tie::IxHash to actually change the order of the keys. Also, the first line:
my ($a, $b) = @_;
is completely out of place in what's supposed to be an object method that takes a code reference (and, in fact, lexicalizing $a and $b is a bad idea anyway if you want to call sort later in the same code block). What it should read is something like:
my ($self, $sortfunc) = @_;
In fact, rather than enumerating all the things that seem to be wrong with your original code, it might be easier to just fix it:
package MyIxHash;
use strict;
use warnings;
use parent 'Tie::IxHash';
sub SortKeyByFunc {
my ($self, $sortfunc) = @_;
my @unsorted = $self->Keys();
my @sorted = sort { $sortfunc->($a, $b) } @unsorted;
$self->Reorder( @sorted );
}
1;
or simply:
sub SortKeyByFunc {
my ($self, $sortfunc) = @_;
$self->Reorder( sort { $sortfunc->($a, $b) } $self->Keys() );
}
(Ps. I now see why the comparison functions were specified as taking their arguments in @_ rather than in the globals $a and $b where sort normally puts them: it's because the comparison functions belong to a different package, and $a and $b are not magical enough to be the same in every package like, say, $_ and @_ are. I guess that could be worked around, but it would take some quite non-trivial trickery with caller.)
(Pps. Please do credit me and duskwuff / Stack Overflow when you hand in your exercise. And good luck with learning Perl — trust me, it'll be a useful skill to have.)
Your SortKeyByFunc method returns the results of sorting the array (@sorted), but it doesn't modify the array "in place". As a result, just calling $t->SortKeyByFunc(...); doesn't end up having any visible permanent effects.
You'll need to call $t->Reorder() within your SortKeyByFunc method to have any lasting impact on the array. I haven't tried it, but something like:
$t->Reorder(@sorted);
at the end of your method may be sufficient.
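The wrapping technique described above can be tried without Tie::IxHash. The following is a minimal sketch: sort_with and abc_by_length are hypothetical names introduced here, and the comparator reads its arguments from @_ in the same way as the ones in foo.pl.

```perl
use strict;
use warnings;

# Comparator that reads its arguments from @_ rather than from $a and $b.
# Shorter strings sort first; ties are broken alphabetically.
sub abc_by_length {
    my ($left, $right) = @_;
    return length($left) <=> length($right) || $left cmp $right;
}

# Hypothetical helper: sort a list using a comparator that expects @_.
# The block passes sort's package globals $a and $b on as ordinary arguments.
sub sort_with {
    my ($cmp, @items) = @_;
    return sort { $cmp->($a, $b) } @items;
}

my @sorted = sort_with(\&abc_by_length, qw(bob a cat da abe b));
print "@sorted\n";   # a b da abe bob cat
```

The comparator names its parameters $left and $right rather than $a and $b, which avoids shadowing the globals that sort relies on.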

Why does my Perl max() function always return the first element of the array?

I am relatively new to Perl and I do not want to use the List::Util max function to find the maximum value of a given array.
When I test the code below, it just returns the first value of the array, not the maximum.
sub max
{
my @array = shift;
my $cur = $array[0];
foreach $i (@array)
{
if($i > $cur)
{
$cur = $i;
}
else
{
$cur = $cur;
}
}
return $cur;
}
Replace
my @array = shift;
with
my @array = @_;
@_ is the array containing all function arguments. shift only grabs the first function argument and removes it from @_. Change that code and it should work correctly!
Why don't you want to use something that works?
One of the ways to solve problems like this is to debug your data structures. At each step you print the data you have to see if what you expect is actually in there. That can be as simple as:
print "array is [@array]\n";
Or for complex data structures:
use Data::Dumper;
print Dumper( \@array );
In this case, you would have seen that @array has only one element, so that element must be the maximum.
If you want to see how list assignment and subroutine arguments work, check out Learning Perl.
You can write the function as:
#!/usr/bin/perl
use strict; use warnings;
print max(@ARGV);
sub max {
my $max = shift;
$max >= $_ or $max = $_ for @_;
return $max;
}
However, it would be far more efficient to pass it a reference to the array and even more efficient to use List::Util::max.
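Passing a reference, as suggested above, might look like the following sketch. The name max_ref is hypothetical; the function takes an array reference so the elements are not copied into a new array, though whether that is measurably faster depends on the size of the list.

```perl
use strict;
use warnings;

# Hypothetical max_ref: takes an array reference and returns the largest element.
sub max_ref {
    my ($numbers) = @_;
    return unless @$numbers;        # empty input: return nothing
    my $max = $numbers->[0];
    for my $n (@$numbers) {
        $max = $n if $n > $max;     # keep the larger of the two
    }
    return $max;
}

my @values = (3, 17, -4, 9);
print max_ref(\@values), "\n";   # 17
```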

How do I refresh an array in a foreach loop?

I am writing a Perl script to do some mathematical operations on a hash. This hash contains the values as given in the sample below. I have written the code below. If I execute this code for a single array value, without the foreach loop, the output is fine. But if I run it in a foreach loop over the array values, the sum for the values in A is correct, while from B onward the output also adds in the previous values.
Hash Sample:
$VAR1 = 'A';
$VAR2 = {
'"x"' => [values],
'"y"' => [values],
and so on...
$VAR3 = 'B';
$VAR4 = {
'"x"' => [values],
'"y"' => [values],
and so on...
$VARn....
Code:
#!/usr/bin/perl -w
use strict;
use List::Util qw(sum);
my @data;
my @count;
my $total;
my @array = ("A", "B", "C", "D");
foreach my $v (@array) {
my %table = getV($v); #getV is a subroutine returning a hash.
for my $h (sort keys %table) {
for my $et (sort keys %{ $table{$h} } ) {
for $ec ($table{$h}{$et}) {
push @data, $ec;
@count = map { sum(@{$_}) } @data;
$total = sum(@count);
}
}
print "sum of $v is $total\n";
}
I think the issue is with this line. It is storing all the previous values and hence adding all the values in next foreach loop.
push @data, $ec;
So, here I have two issues:
1) How can I refresh the array (@data) in each foreach loop iteration?
2) How can I add the values in the array ref ($ec) and store them in an array? Because when I use the following code:
for $ec ($table{$h}{$et}) {
@count = map { sum(@{$_}) } @$ec;
$total = sum(@count);
}
The output gives me the same values for #count and $total.
Please provide me with suggestions.
If I understand you correctly, you only need a small change in your code: declare an empty array (@data) at the beginning of the for loop. Hope this helps.
for my $h (sort keys %table) {
my @data;
1) Declare the @data array at the top of the loop body where you want to start with a fresh, empty array. Or maybe you mean to be saying @data = @$ec, not push @data, $ec?
2) To add the values in the array referred to by $ec, you would just say sum(@$ec); no map required.
It's not completely clear what your data structure is or what you are trying to do with it.
It would help to see what a sample %table looks like and what results you expect from it.
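Both suggestions can be combined in a small sketch. Since the real getV and %table are not shown, the %source hash below is an assumed, simplified stand-in in which each name maps to a hash of array references.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Assumed stand-in for the data returned by getV(): name => { key => [values] }.
my %source = (
    A => { x => [1, 2], y => [3] },
    B => { x => [10],   y => [20, 30] },
);

my %total_for;
for my $name (sort keys %source) {
    my @data;    # declared inside the loop: fresh and empty on every pass
    for my $key (sort keys %{ $source{$name} }) {
        push @data, @{ $source{$name}{$key} };   # flatten the array reference
    }
    $total_for{$name} = sum(@data);
    print "sum of $name is $total_for{$name}\n";
}
# prints:
# sum of A is 6
# sum of B is 60
```

Moving the my @data declaration above the outer loop would reproduce the original symptom: B's total would then include A's values.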

Automatically get loop index in foreach loop in Perl

If I have the following array in Perl:
@x = qw(a b c);
and I iterate over it with foreach, then $_ will refer to the current element in the array:
foreach (@x) {
print;
}
will print:
abc
Is there a similar way to get the index of the current element, without manually updating a counter? Something such as:
foreach (@x) {
print $index;
}
where $index is updated like $_ to yield the output:
012
Like codehead said, you'd have to iterate over the array indices instead of its elements. I prefer this variant over the C-style for loop:
for my $i (0 .. $#x) {
print "$i: $x[$i]\n";
}
In Perl prior to 5.10, you can say
#!/usr/bin/perl
use strict;
use warnings;
my @a = qw/a b c d e/;
my $index;
for my $elem (@a) {
print "At index ", $index++, ", I saw $elem\n";
}
#or
for my $index (0 .. $#a) {
print "At index $index I saw $a[$index]\n";
}
In Perl 5.10, you use state to declare a variable that never gets reinitialized (unlike ones created with my). This lets you keep the $index variable in a smaller scope, but it can lead to bugs (if you enter the loop a second time it will still have the last value):
#!/usr/bin/perl
use 5.010;
use strict;
use warnings;
my @a = qw/a b c d e/;
for my $elem (@a) {
state $index;
say "At index ", $index++, ", I saw $elem";
}
In Perl 5.12 you can say
#!/usr/bin/perl
use 5.012; # This enables strict
use warnings;
my @a = qw/a b c d e/;
while (my ($index, $elem) = each @a) {
say "At index $index I saw $elem";
}
But be warned: there are restrictions on what you are allowed to do with @a while iterating over it with each.
It won't help you now, but in Perl 6 you will be able to say
#!/usr/bin/perl6
my @a = <a b c d e>;
for @a Z 0 .. Inf -> $elem, $index {
say "at index $index, I saw $elem"
}
The Z operator zips the two lists together (i.e. it takes one element from the first list, then one element from the second, then one element from the first, and so on). The second list is a lazy list that contains every integer from 0 to infinity (at least theoretically). The -> $elem, $index says that we are taking two values at a time from the result of the zip. The rest should look normal to you (unless you are not familiar with the say function from 5.10 yet).
perldoc perlvar does not seem to suggest any such variable.
It can be done with a while loop (foreach doesn't support this):
my @arr = (1111, 2222, 3333);
while (my ($index, $element) = each(@arr))
{
# You may need to "use feature 'say';"
say "Index: $index, Element: $element";
}
Output:
Index: 0, Element: 1111
Index: 1, Element: 2222
Index: 2, Element: 3333
Perl version: 5.14.4
Not with foreach.
If you definitely need the element cardinality in the array, use a 'for' iterator:
for ($i=0; $i<@x; ++$i) {
print "Element at index $i is " , $x[$i] , "\n";
}
No, you must make your own counter. Yet another example:
my $index;
foreach (@x) {
print $index++;
}
when used for indexing
my $index;
foreach (@x) {
print $x[$index]+$y[$index];
$index++;
}
And of course you can use local $index; instead of my $index; and so on.
autobox::Core provides, among many more things, a handy for method:
use autobox::Core;
['a'..'z']->for( sub{
my ($index, $value) = @_;
say "$index => $value";
});
Alternatively, have a look at an iterator module, for example: Array::Iterator
use Array::Iterator;
my $iter = Array::Iterator->new( ['a'..'z'] );
while ($iter->hasNext) {
$iter->getNext;
say $iter->currentIndex . ' => ' . $iter->current;
}
Also see:
each to their own (autobox)
perl5i
Yes. I have checked so many books and other blogs... The conclusion is, there isn't any system variable for the loop counter. We have to make our own counter. Correct me if I'm wrong.
Oh yes, you can! (Sort of, but you shouldn't.) each(@array) in scalar context gives you the current index of the array.
@a = (a..z);
for (@a) {
print each(@a) . "\t" . $_ . "\n";
}
Here each(@a) is in scalar context and returns only the index, not the value at that index. Since we're in a for loop, we have the value in $_ already. The same mechanism is often used in a while-each loop. Same problem.
The problem comes if you do for(@a) again. The index isn't back to 0 like you'd expect; it's undef followed by 0,1,2... one count off. The perldoc of each() says to avoid this issue. Use a for loop to track the index.
each
Basically:
for(my $i=0; $i<=$#a; $i++) {
print "The Element at $i is $a[$i]\n";
}
I'm a fan of the alternate method:
my $index=0;
for (@a) {
print "The Element at $index is $a[$index]\n";
$index++;
}
Please consider:
print "Element at index $_ is $x[$_]\n" for keys @x;
Well, there is this way:
use List::Rubyish;
$list = List::Rubyish->new( [ qw<a b c> ] );
$list->each_index( sub { say "\$_=$_" } );
See List::Rubyish.
You shouldn't need to know the index in most circumstances. You can do this:
my #arr = (1, 2, 3);
foreach (@arr) {
$_++;
}
print join(", ", @arr);
In this case, the output would be 2, 3, 4 as foreach sets an alias to the actual element, not just a copy.
I have tried like....
@array = qw /tomato banana papaya potato/; # Example array
my $count; # Local variable initial value will be 0.
print "\nBefore For loop value of counter is $count"; # Just printing value before entering the loop.
for (@array) { print "\n",$count++," $_" ; } # String and variable separated by comma to
# execute the value and print.
undef $count; # Undefining so that later parts again it will
# be reset to 0.
print "\nAfter for loop value of counter is $count"; # Checking the counter value after for loop.
In short...
@array = qw /a b c d/;
my $count;
for (@array) { print "\n",$count++," $_"; }
undef $count;