Calling subroutine N number of times within a foreach loop - perl

I have two hashes of arrays (HoAs) that correspond to the following file:
A 10 15 20 25
B 21 33 21 23
C 43 14 23 23
D 37 45 43 49
Here are my HoAs.
my %first_HoA = (
'A' => [ '10', '15', '20', '25'],
'B' => [ '21', '33', '21', '23'],
);
my %second_HoA = (
'A' => [ '10', '15', '20', '25'],
'B' => [ '21', '33', '21', '23'],
'C' => [ '43', '14', '23', '23'],
'D' => [ '37', '45', '43', '49'],
);
For every $key in the second HoA (A-D), I want to call a subroutine that does calculations on its corresponding array and on the array of every $key in the first HoA (A-B). Based on the calculations, the subroutine should return the key from the first HoA that yields the highest value. In other words, the subroutine should be called only once for every $key in the second HoA and return the $key in the first HoA whose array yields the best value.
Here's how I have it right now. Say I have an arbitrary subroutine called calculate:
my $iterations = 1;
foreach my $key ( keys %second_HoA ) {
for my $arrayref (values %first_HoA){
calculate($second_HoA{$key}, $arrayref);
print "Iteration: $iterations\n";
$iterations++;
}
}
As you can see, this calls calculate 8 times. I only want to call calculate once for every $key in %second_HoA, which is 4 times, but I also need to pass in the $arrayref to do the calculations in the subroutine.
Does anyone know how I can do this?
Another way I was thinking of doing this was passing in a hash_ref of the first_HoA like so:
foreach my $key ( keys %second_HoA ) {
calculate($second_HoA{$key}, \%first_HoA);
print "Iteration: $iterations\n";
$iterations++;
}
Doing this calls calculate 4 times, which is what I want, but it complicates things in the subroutine.
Any suggestions? Thanks.

You say calculate($second_HoA{$key}, \%first_HoA) "complicates things", but I don't see how that's possible. It seems to me it's the minimum of information you need, and it's in a convenient format.
Anything less would complicate things, in the sense that you wouldn't have the information you need to do your calculations.
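As a minimal sketch of what the subroutine could look like with that calling convention: the question never says what the calculation is, so the score helper below (negated sum of absolute differences, so that "closest array" means "highest value") is purely a hypothetical stand-in. Only the shape of calculate, which loops over the hash reference internally and returns the best key, is the point.

```perl
use strict;
use warnings;

my %first_HoA = (
    'A' => [ '10', '15', '20', '25' ],
    'B' => [ '21', '33', '21', '23' ],
);
my %second_HoA = (
    'A' => [ '10', '15', '20', '25' ],
    'B' => [ '21', '33', '21', '23' ],
    'C' => [ '43', '14', '23', '23' ],
    'D' => [ '37', '45', '43', '49' ],
);

# Hypothetical scoring function (an assumption, not from the question):
# negated sum of absolute differences, so a closer array scores higher.
sub score {
    my ($x, $y) = @_;
    my $sum = 0;
    $sum += abs( $x->[$_] - $y->[$_] ) for 0 .. $#$x;
    return -$sum;
}

# Takes one array reference and a reference to the whole first HoA,
# and returns the key of the first HoA with the highest score.
sub calculate {
    my ($values, $candidates) = @_;
    my ($best_key, $best_score);
    for my $key ( sort keys %$candidates ) {
        my $s = score( $values, $candidates->{$key} );
        ($best_key, $best_score) = ($key, $s)
            if !defined $best_score || $s > $best_score;
    }
    return $best_key;
}

# One call per key of %second_HoA, i.e. 4 calls in total.
for my $key ( sort keys %second_HoA ) {
    my $best = calculate( $second_HoA{$key}, \%first_HoA );
    print "$key -> $best\n";
}
```

The inner loop over the first HoA still runs, but it lives inside calculate, so the caller makes exactly one call per key of %second_HoA.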

Related

Perl hash of hashes rationale

I have decided to give Perl a try and have stumbled across a language construct that seems to be valid, although I can hardly believe it is. I assume there is some rationale behind it, so I decided to ask.
Take the following Perl code:
%data = ('John Paul' => ('Age' => 45), 'Lisa' => 30);
print "\$data{'John Paul'} = $data{'John Paul'}{'Age'}\n";
print "\$data{'Lisa'} = $data{'Lisa'}\n";
My intention was to check how hash of hashes works. The above code prints:
$data{'John Paul'} =
$data{'Lisa'} =
To make it a valid hash of hashes one needs:
%data = ('John Paul' => {'Age' => 45}, 'Lisa' => 30);
and the result would be:
$data{'John Paul'} = 45
$data{'Lisa'} = 30
Does anyone know:
Why there is non uniformity and the internal hash needs {} instead of ()?
Why do I get no error or warning that something is wrong when there is () instead of {} for the internal hash? It is very easy to make this kind of mistake. What is more, ('Age' => 45) breaks not only the value for 'John Paul' but also the one for 'Lisa'. I just can't imagine searching for this kind of "bug" in a project with thousands of lines of code.
( 'John Paul' => ( 'Age' => 45 ), 'Lisa' => 30 )
is just another way of writing
'John Paul', 'Age', 45, 'Lisa', 30
Parens don't create any data structure; they just affect precedence like in (3+4)*5. The reason we don't write
my %h = a => 4;
or the equivalent
my %h = 'a', 4;
is that it would be interpreted as
( my %h = 'a' ), 4;
What creates the hash is my %data, not the parens. The right-hand side of the assignment just places an arbitrary number of scalars on the stack, not a hash. The assignment operator adds these scalars to the hash.
But sometimes, we want to create an anonymous hash. This is where {} comes in.
my %data = ( 'John Paul' => { 'Age' => 45 }, 'Lisa' => 30 );
is basically equivalent to
my %anon = ( 'Age' => 45 );
my %data = ( 'John Paul' => \%anon, 'Lisa' => 30 );
Note that \%anon returns a scalar, a reference to a hash. This is fundamentally different than what ( 'John Paul' => \%anon, 'Lisa' => 30 ) and 'John Paul' => \%anon, 'Lisa' => 30 return, four scalars.
Why there is non uniformity and the internal hash needs {} instead of ()?
An underlying premise of this question is false: Hashes don't need (). For example, the following are perfectly valid:
my %h1 = 'm'..'p';
sub f { return x => 4, y => 5 }
my %h2 = f();
my %h3 = do { i => 6, j => 7 };
() has nothing to do with hashes. The lack of uniformity comes from the lack of parallel. One uses {} to create a hash. One uses () to override precedence.
Since parens just affect precedence, one could use
my %data = ( 'John Paul' => ({ 'Age' => 45 }), 'Lisa' => 30 ); # ok (but weird)
This is very different than the following:
my %data = ( 'John Paul' => ( 'Age' => 45 ), 'Lisa' => 30 ); # XXX
Why do I get no error or warning that something is wrong when there is () instead of {} for the internal hash?
Not only is using () valid, using () around expressions that contain commas is commonly needed. So when exactly should it warn? The point is that it's arguable whether this should be a warning or something perlcritic finds, at least at first glance. The latter should definitely find this, but I wouldn't know if a rule for it exists or not.
Why there is non uniformity and the internal hash needs {} instead of ()?
An assignment to a hash is a list of scalars (alternating between keys and values).
You can't have a hash (because it isn't a scalar) as a value there, but you can have a hash reference.
Lists get flattened.
Why do I get no error or warning that something is wrong when there is () instead of {} for the internal hash?
Because you didn't turn them on with the use strict; use warnings; pragmas (which are off by default for reasons of horrible backwards compatibility; Perl 7 was announced with both enabled by default, and in the meantime use v5.36; or later turns both on).
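To see the flattening and the diagnostics for yourself, here is a small sketch. The $SIG{__WARN__} handler and the eval are only there to capture the messages so the script keeps running; they are not needed in real code.

```perl
use strict;
use warnings;

# The inner parentheses vanish: this is a flat list of five scalars.
my @flat = ( 'John Paul' => ( 'Age' => 45 ), 'Lisa' => 30 );
print scalar(@flat), " elements: @flat\n";

# Assigning five scalars to a hash pairs them up as
# 'John Paul' => 'Age', 45 => 'Lisa', 30 => undef
# and, with warnings on, complains about the odd element count.
my @warnings;
my %data;
{
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    %data = @flat;
}
print "warning: $_" for @warnings;

# With strict refs on, treating the string 'Age' as a hash reference
# is a fatal error instead of a silent empty result.
my $age = eval { $data{'John Paul'}{'Age'} };
print "error: $@" if $@;
```

This also explains why 'Lisa' broke: after flattening, 'Lisa' is no longer a key at all but the value stored under the key 45. Without strict, $data{'John Paul'}{'Age'} quietly treats the string 'Age' as a symbolic reference to a package hash %Age, which is why the original code printed nothing.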

Perl Hash of Hashes, get lowest numeric key

I have a hash of hashes where the first key is a string and the second key is an integer. I'm trying to get the lowest second key in the hash of hashes. Here's my hash.
%HoH = (
flintstones => {
8 => "fred",
4 => "barney",
},
jetsons => {
5 => "george",
1 => "jane",
9 => "elroy",
},
simpsons => {
99 => "homer",
5 => "marge",
3 => "bart",
},
);
How do I get the lowest (minimum) key for the hash simpsons? In this case the answer would be 3. The closest related search I could find was for a way to get the key with the highest value. But I'm trying to get the key with the lowest numeric value.
================== EDIT ============ MY ATTEMPT ================
foreach my $cartoon (keys %HoH){
if ($cartoon eq "simpsons"){
$HoH{$cartoon}{<numeric key>}; # somehow store and print lowest key
}
}
I can't loop through the keys sequentially (1, 2, 3, 4, etc.) and simply store and return the lowest key, because a given key may not exist. I would probably have tried to store the keys in a separate array and get the minimum of that array. That's my attempt, and it's a rather roundabout way of doing it, so next I would have done more Googling to see if there's an easier (one-liner) way.
use List::Util qw(min);
print min(keys(%{$HoH{simpsons}}));
my $min = (sort {$a <=> $b} keys %{ $HoH{'simpsons'} })[0];
print $min;
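Both answers can be put together into one runnable script. This is only a sketch: the %lowest hash is a name introduced here for illustration. min makes a single pass over the keys, whereas the sort version orders all of them just to take the first.

```perl
use strict;
use warnings;
use List::Util qw(min);

my %HoH = (
    flintstones => { 8  => "fred",   4 => "barney" },
    jetsons     => { 5  => "george", 1 => "jane",  9 => "elroy" },
    simpsons    => { 99 => "homer",  5 => "marge", 3 => "bart" },
);

# %lowest (illustrative name) maps each family to its smallest numeric key.
my %lowest;
for my $family ( keys %HoH ) {
    $lowest{$family} = min( keys %{ $HoH{$family} } );
}

for my $family ( sort keys %lowest ) {
    my $key = $lowest{$family};
    print "$family: lowest key $key ($HoH{$family}{$key})\n";
}
```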

Perl: Get position and length of element in a string

Say I have a string like:
my $refseq="CCCC-TGA---ATAAAC--TCCAT-GCTCCCCC--------------------AAGC";
I want to detect the positions where "-" occurs and the number of contiguous "-". I want to end up with a hash with "-" position as key, and extension length as value, for this example above:
%POSLENGTH = (5 => 1, 8 => 3, 14 => 2, 19 => 1, 27 => 20);
Note that the positions should be given based on the string without "-".
Check for the @- array in perlvar.
my $refseq = "CCCC-TGA---ATAAAC--TCCAT-GCTCCCCC--------------------AAGC";
my %POSLENGTH;
$POSLENGTH{ $-[0] +1 } = length($1) while $refseq =~ s/(-+)//;
use Data::Dumper; print Dumper \%POSLENGTH;
output
$VAR1 = {
'14' => 2,
'8' => 3,
'27' => 20,
'19' => 1,
'5' => 1
};
You can do this using the built-in @- and @+ arrays. Together they hold the start and end offsets of the last successful pattern match in element 0 (and of any captures in elements 1 onwards), so the length of the last match is $+[0] - $-[0].
They're documented under Variables related to regular expressions in perldoc perlvar.
I've used Data::Dump here just to display the contents of the hash that is built.
On a side note, I'm very doubtful that a hash is a useful structure for this information, as I can't imagine a situation where you know the start position of a substring and need to know its length. I would have thought it was better represented as just an array of pairs.
use strict;
use warnings;
use Data::Dump;
my $refseq="CCCC-TGA---ATAAAC--TCCAT-GCTCCCCC--------------------AAGC";
my %pos_length;
while ( $refseq =~ /-+/g ) {
my ($pos, $len) = ( $-[0] + 1, $+[0] - $-[0] );
$pos_length{$pos} = $len;
}
dd \%pos_length;
output
{ 5 => 1, 9 => 3, 18 => 2, 25 => 1, 34 => 20 }
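Note that these positions refer to the original string, dashes included, while the question asks for positions in the string without "-". A possible adjustment, sketched here with the array of pairs suggested above, is to subtract the number of dashes already seen. The names @gaps and $removed are introduced only for this example.

```perl
use strict;
use warnings;

my $refseq = "CCCC-TGA---ATAAAC--TCCAT-GCTCCCCC--------------------AAGC";

my @gaps;           # list of [ position, length ] pairs, in order
my $removed = 0;    # number of dashes seen before the current match

while ( $refseq =~ /-+/g ) {
    my $len = $+[0] - $-[0];
    # 1-based position, shifted left by the dashes that precede it
    push @gaps, [ $-[0] + 1 - $removed, $len ];
    $removed += $len;
}

print "$_->[0] => $_->[1]\n" for @gaps;
```

Unlike the s/// approach, this leaves $refseq unmodified and keeps the gaps in order of appearance.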

Math::BaseConvert with strange basenumbers like 23 or 1000

I just found the Perl module Math::BaseConvert. My task is to convert numbers to some very strange bases: not only base 2, 8, or 16, but also 23, 134, and up to 1000. (This is part of a larger task to balance a tree of files in a directory.)
I could not make it work. Reading the module's tests on CPAN confused me as well, so I wrote a little test of my own. Maybe you can tell me what's wrong. The result is:
ok 1 - use Math::BaseConvert;
ok 2 - Convert number '23' (base10) to '27' (base8)
not ok 3 - Convert number '23' (base10) to '23' (base32)
# Failed test 'Convert number '23' (base10) to '23' (base32)'
# at test_math_baseconvert.pl line 35.
# got: 'N'
# expected: '23'
not ok 4 - Convert number '64712' (base10) to '64:712' (base1000)
# Failed test 'Convert number '64712' (base10) to '64:712' (base1000)'
# at test_math_baseconvert.pl line 35.
# got: '-1'
# expected: '64:712'
1..4
# Looks like you failed 2 tests of 4.
The test program is this:
#!/usr/bin/perl
use strict;
use warnings;
use Test::More;
use_ok( 'Math::BaseConvert', '1.7' );
my @lines = (
{
# http://www.wolframalpha.com/input/?i=23+from+base+10+to+base+16
old_number => '23',
old_base => 10,
new_number => '27',
new_base => 8,
},
{
# http://www.wolframalpha.com/input/?i=23+from+base+10+to+base+32
old_number => '23',
old_base => 10,
new_number => '23', # stays same
new_base => 32,
},
{
# http://www.wolframalpha.com/input/?i=64712+from+base+10+to+base+1000
old_number => '64712',
old_base => 10,
new_number => '64:712',
new_base => 1000,
},
);
for my $line (@lines) {
cmp_ok(
Math::BaseConvert::cnv(
$line->{old_number}, $line->{old_base}, $line->{new_base}
),
'eq',
$line->{new_number},
sprintf(
"Convert number '%s' (base%d) to '%s' (base%d)",
$line->{old_number}, $line->{old_base},
$line->{new_number}, $line->{new_base}
)
);
}
done_testing();
Wolfram Alpha's method of showing bases greater than base 16 is to separate digits with a colon. There's nothing wrong with that, as they display those numbers using CSS styling that lessens the shading of the colon to make it more obvious what they're doing. But they also add a message stating exactly how many digits they're showing since "1:1617 (2 digits)" isn't obvious enough.
The method Math::BaseConvert and other such modules use is to expand the character set for digits, just as hex numbers (0-9A-F) add the first 6 letters of the alphabet to the decimal digits. For base 32 numbers the character set is 0-9A-V. Since A stands for 10, N (the 14th letter of the alphabet) stands for 23, so it is the appropriate representation of 23 in base 32.
If you want the colon representation for bases greater than 16, you can use the module for the small bases and roll your own solution for the rest:
#!/usr/bin/perl
use strict;
use warnings;
use Test::More;
use_ok( 'Math::BaseConvert', '1.7' );
my @lines = (
# Old_Number Old_Base New_Number New_Base
[qw(23 10 27 8)], # http://www.wolframalpha.com/input/?i=23+from+base+10+to+base+16
[qw(23 10 23 32)], # http://www.wolframalpha.com/input/?i=23+from+base+10+to+base+32
[qw(64712 10 64:712 1000)], # http://www.wolframalpha.com/input/?i=64712+from+base+10+to+base+1000
);
for my $line (@lines) {
cmp_ok(
base10toN(@$line[0,3]),
'eq',
$line->[2],
sprintf("Convert number '%s' (base%d) to '%s' (base%d)", $line->[0], 10, @$line[2,3])
);
}
sub base10toN {
my ($num, $base) = @_;
return Math::BaseConvert::cnv($num, 10, $base)
if $base <= 16;
my @digits = ();
while (1) {
my $remainder = $num % $base;
unshift @digits, $remainder;
$num = ($num - $remainder) / $base
or last;
}
return join ':', @digits;
}
done_testing();
You seem to be expecting decimal output, with "digits" being decimal numbers separated by :.
Math::BaseConvert doesn't do that. It only supports having a single character per digit.
By default, the digits used are '0'..'9', 'A'..'Z', 'a'..'z', '.', '_' though you can supply your own list instead (and you would have to do so to support up to base 1000).
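To make the one-character-per-digit scheme concrete, here is a hand-rolled sketch that uses the default digit list quoted above. It is not the module's implementation, and to_base is a hypothetical helper name; it only illustrates why 23 comes out as N in base 32 and why a base as large as 1000 cannot work without a much longer digit list.

```perl
use strict;
use warnings;

# The default digit list quoted above: 64 single-character digits.
my @digits = ( '0' .. '9', 'A' .. 'Z', 'a' .. 'z', '.', '_' );

# Hypothetical helper: convert a non-negative integer to the given base,
# using exactly one character from @digits per digit.
sub to_base {
    my ($num, $base) = @_;
    die "base must be between 2 and " . scalar(@digits) . "\n"
        if $base < 2 || $base > @digits;
    my $out = '';
    do {
        $out = $digits[ $num % $base ] . $out;   # lowest digit first
        $num = int( $num / $base );
    } while $num > 0;
    return $out;
}

print to_base( 23, 8 ),  "\n";   # prints 27
print to_base( 23, 32 ), "\n";   # prints N
```

With only 64 symbols available, anything above base 64 has no character for its larger digits, which is where a separator-based notation such as 64:712 becomes necessary.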

Find duplicates in a hash, store grouped in a new hash

I have the following hash, and I need to find the duplicates between the arrays stored under the top-level keys 6 and 4. I've tried a few solutions to no avail, and I'm not familiar enough with Perl syntax to make it work.
The Hash I Have
$VAR1 = {
'6' => [ '1000', '2000', '4000' ],
'4' => [ '1000', '2000', '3000' ]
};
The Hash I Need
$VAR1 = {
'6' => ['4000'],
'4' => ['3000'],
'Both' => ['1000','2000']
}
Find all common elements, e.g. by deduplicating with a hash.
Find all elements that are not common.
Given two arrays @x, @y, this would mean:
use List::MoreUtils 'uniq';
# find all common elements
my %common;
$common{$_}++ for uniq(@x), uniq(@y); # count all elements
$common{$_} == 2 or delete $common{$_} for keys %common;
# remove entries from @x, @y that are common:
@x = grep { not $common{$_} } @x;
@y = grep { not $common{$_} } @y;
# Put the common strings in an array:
my @common = keys %common;
Now all that is left is to do a bit of dereferencing and such, but that should be fairly trivial.
No need for other modules. Perl hashes are really good for finding unique or common values.
my %both;
# count the number of times any element was seen in 4 and 6
$both{$_}++ for (@{$VAR1->{4}}, @{$VAR1->{6}});
for (keys %both) {
# if the count is one the element isn't in both 4 and 6
delete $both{$_} if( $both{$_} == 1 );
}
$VAR1->{Both} = [keys %both];
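Putting the pieces together, here is one possible core-only sketch that produces the complete "hash I need", including removing the shared values from the 6 and 4 arrays. The variable names ($data, %in4, %in6, %result) are illustrative, and unlike keys %both it keeps the original element order.

```perl
use strict;
use warnings;

my $data = {
    '6' => [ '1000', '2000', '4000' ],
    '4' => [ '1000', '2000', '3000' ],
};

# Lookup tables: which values occur under each key.
my %in6 = map { $_ => 1 } @{ $data->{6} };
my %in4 = map { $_ => 1 } @{ $data->{4} };

my %seen;    # guards against repeats within the 6 array
my %result = (
    '6'    => [ grep { !$in4{$_} } @{ $data->{6} } ],
    '4'    => [ grep { !$in6{$_} } @{ $data->{4} } ],
    'Both' => [ grep { $in4{$_} && !$seen{$_}++ } @{ $data->{6} } ],
);

for my $key ( sort keys %result ) {
    print "$key: @{ $result{$key} }\n";
}
```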