How to calculate the number of rows of a referenced array in Perl

Part of my code calculates the inverse of a matrix (generated earlier in the code) with dimensions of more than 300 x 300. I want to use the elements of the inverted matrix later in the code. I have used the code below for this, testing with only a 5 x 5 matrix:
use strict;
use warnings;
use Math::MatrixReal;
my @a=(); # @a is the matrix obtained
$a[0][0]=0.18761134;
$a[0][1]=0.010779401; # Have hard-coded the values here till $a[4][4]
my $ref_a = \@a;
my $b = Math::MatrixReal->new_from_rows($ref_a);
my $b_inv = $b->inverse();
print "\n Inverse is\n",$b_inv; #prints correct inverse
print "\n\nTest printing elements\n";
print $$b_inv[0][1][1],"\n"; #prints the correct element
my $row_b=scalar(@{$b});
print "Number of rows in b: ",$row_b,"\n"; #prints 6
my $col_b=@{$$b[0]};
print "Columns in b: ",$col_b,"\n"; #prints 5
my $row_binv=scalar(@$b_inv);
print "Number of rows in b_inv: ",$row_binv,"\n"; #prints 3
my $col_binv=@{$$b_inv[0]};
print "Number of columns in b_inv ",$col_binv,"\n"; #prints 5
I am not able to understand:
1. Why the output for the number of rows is wrong for both b and b_inv, and how to get the correct number of rows.
2. Why, although the syntax for printing elements of a referenced array is $$b_inv[1][1], I get the correct output when I use $$b_inv[0][1][1].

You are creating a Math::MatrixReal matrix object, and then accessing it as a simple Perl array. Poking around inside a Perl object indiscriminately is wrong; you must use the methods defined in the documentation.
In particular, your statement
print $$b_inv[0][1][1],"\n"; # prints the correct element
accesses a three-dimensional array, and there is no way of knowing what the "correct element" should be for this without reading the code of the module.
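To see why counting the elements of the object does not give the row count, here is a minimal sketch with a hypothetical Toy::Matrix class. It is NOT Math::MatrixReal and does not claim to reproduce that module's real internal layout; it only shows how an object stored as a blessed array reference can keep the rows in one slot and bookkeeping in others. If Math::MatrixReal uses a similar layout (an assumption), that would explain why $$b_inv[0][1][1] happens to return an element.

```perl
use strict;
use warnings;

# Hypothetical toy class for illustration only (not Math::MatrixReal)
package Toy::Matrix;

sub new_from_rows {
    my ($class, $rows) = @_;
    my @copy = map { [@$_] } @$rows;    # deep-copy the rows
    # slot 0: the row data, slot 1: row count, slot 2: column count
    return bless [ \@copy, scalar @copy, scalar @{ $copy[0] } ], $class;
}

# Public accessor: callers should use this instead of peeking inside
sub dim { my $self = shift; return ($self->[1], $self->[2]) }

package main;

my $m = Toy::Matrix->new_from_rows([ [1, 2], [3, 4], [5, 6], [7, 8] ]);

my $slots = scalar @$m;        # counts internal slots, not rows
my ($rows, $cols) = $m->dim;   # the documented-style way
my $elem = $$m[0][1][1];       # reaches through slot 0 into the row data

print "slots=$slots rows=$rows cols=$cols elem=$elem\n";
```

The point is that the object's internal structure is an implementation detail: the accessor keeps working even if the class later adds more slots.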
This modification sets up a 5 x 5 identity matrix (in future, please provide data that we can use to reproduce your results) and takes its inverse. The values output are derived using the object's methods as I described, and are all correct. Note that the rows and columns are indexed from one, instead of from zero as you would expect for Perl arrays.
use strict;
use warnings 'all';
use Math::MatrixReal;
my @arr = (
[1, 0, 0, 0, 0],
[0, 1, 0, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 1, 0],
[0, 0, 0, 0, 1],
);
my $ref_a = \@arr;
my $b = Math::MatrixReal->new_from_rows(\@arr);
my $b_inv = $b->inverse;
print "\nInverse is\n", $b_inv;
print "\n\nTest printing elements\n";
print $b_inv->element($_, $_), "\n" for 1 .. 5;
my ($row_b, $col_b) = $b->dim;
print "Number of rows in b: $row_b\n"; # prints 5
print "Columns in b: $col_b\n"; # prints 5
my ($row_binv, $col_binv) = $b_inv->dim;
print "Number of rows in b_inv: $row_binv\n"; # prints 5
print "Number of columns in b_inv $col_binv\n"; # prints 5
output
Inverse is
[ 1.000000000000E+000 0.000000000000E+000 0.000000000000E+000 0.000000000000E+000 0.000000000000E+000 ]
[ 0.000000000000E+000 1.000000000000E+000 0.000000000000E+000 0.000000000000E+000 0.000000000000E+000 ]
[ 0.000000000000E+000 0.000000000000E+000 1.000000000000E+000 0.000000000000E+000 0.000000000000E+000 ]
[ 0.000000000000E+000 0.000000000000E+000 0.000000000000E+000 1.000000000000E+000 0.000000000000E+000 ]
[ 0.000000000000E+000 0.000000000000E+000 0.000000000000E+000 0.000000000000E+000 1.000000000000E+000 ]
Test printing elements
1
1
1
1
1
Number of rows in b: 5
Columns in b: 5
Number of rows in b_inv: 5
Number of columns in b_inv 5

Related

Sort and array maintaining index value

I stored some values in an array as follows, with $score[$userIndex] holding each user's score:
Score [100, 400, 900]
userIndex (1 , 2 , 3 )
I need an ascending numeric sort based on the score that maintains the indexes, as they have significance. The result should resemble:
Score [100, 400, 900]
userIndex (2 , 1 , 3 )
After the sort is complete, how can I access the index an element was previously at? (After the sort I no longer need the score, just the indexes.)
In the sorted array, all I need is, for example: $sorted[0] = 2, $sorted[1] = 1, $sorted[2] = 3
After some extensive googling, I found this method of doing it:
while ($hashIndex < $#Score) {
$matchHash{$hashIndex} = $Score[$hashIndex];
$hashIndex++;
}
foreach my $score (sort { $matchHash{$a} <=> $matchHash{$b} } keys %matchHash) {
#DS
# printf "%-8s %s\n", $score, $matchHash{$score};
push (@sorted, $score);
}
The way to do this is to sort a list of array indices instead of the data itself. Then you can reorder the corresponding list of users using the same sorted indices.
The code would look like this
use strict;
use warnings;
my @score = (400, 100, 900);
my @users = (1, 2, 3);
my @sorted_indices = sort { $score[$a] <=> $score[$b] } 0 .. $#score;
my @sorted_users = @users[@sorted_indices];
print "@sorted_users\n";
output
2 1 3
Update
Looking at your own code, perhaps all you want is the sorted array indices (i.e. the userIndex list in your question should properly be (0, 1, 2)). If that is the case, then the sort is complete once @sorted_indices is filled:
print "@sorted_indices\n";
outputs
1 0 2
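If you also need the opposite direction (for each original index, the position it landed at after sorting), you can invert the sorted index list in one pass. A minimal sketch; the scores are made up for illustration and @rank is a hypothetical name:

```perl
use strict;
use warnings;

# Illustrative scores (made up), indexed by original position
my @score = (300, 100, 400, 200);

# Original indices in ascending score order
my @sorted_indices = sort { $score[$a] <=> $score[$b] } 0 .. $#score;

# Inverse mapping: $rank[$i] is the sorted position of original index $i
my @rank;
$rank[ $sorted_indices[$_] ] = $_ for 0 .. $#sorted_indices;

print "sorted indices: @sorted_indices\n";   # 1 3 0 2
print "rank:           @rank\n";             # 2 0 3 1
```

Both arrays are permutations of 0 .. $#score, so either one can be rebuilt from the other.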

Branch and bound using Perl

I have a problem I cannot find an answer to. I am using Perl. My input is a symmetric cost matrix, somewhat like the TSP.
I want to know all solutions that lie below my bound, which is 10.
This is my matrix:
- B E G I K L P S
B - 10 10 2 10 10 10 10
E 10 - 2 10 10 10 1 10
G 10 2 - 10 2 3 3 3
I 2 10 10 - 4 10 10 2
K 10 10 2 4 - 10 10 3
L 10 10 3 10 10 - 2 2
P 10 1 3 10 10 2 - 10
S 10 10 3 2 3 2 10 -
Does anybody know how to implement the branch and bound algorithm to solve this? So far, I have replaced every 10 in the matrix with "-".
What I did so far:
@verwbez = ( ["-", B, E, G, I, K, L, P, S],
[B,"-", 10, 10, 2, 10, 10, 10, 10],
[E, 10, "-", 2, 10, 10, 10, 1, 10],
[G, 10, 2, "-", 10, 2, 3, 3, 3],
[I, 2, 10, 10, "-", 4, 10, 10, 2],
[K, 10, 10, 2, 4, "-", 10, 10, 3],
[L, 10, 10, 3, 10, 10, "-", 2, 2],
[P, 10, 1, 3, 10, 10, 2, "-", 10],
[S, 10, 10, 3, 2, 3, 2, 10, "-"]);
for ($i=0;$i<=$#verwbez;$i++) {
for ($j=0; $j<=$#{$verwbez[$i]};$j++) {
while ($verwbez[$i][$j] >=7) {
$verwbez[$i][$j] = "-";
}
}
}
Basically, this just alters the matrix: every 10 is replaced with a "-". Now I want to find all solutions that are below 10 and contain 4 districts, where two cities are always linked together. Unfortunately, I do not know how to proceed or where to start.
You're unlikely to get someone to implement the branch and bound algorithm for you. However, the following Stack Overflow post, TSP - branch and bound, has links to some helpful resources:
Optimal Solution for TSP using Branch and Bound
B&B Implementations for the TSP - Part 1: A solution with nodes containing partial tours with constraints
B&B Implementations for the TSP - Part 2: Single threaded solution with many inexpensive nodes
Since you appear to be new to Perl, here are some quick tips:
Always include use strict; and use warnings; at the top of each and every Perl script.
Use the range operator .. when creating an incrementing for loop.
Your while loop should actually be an if statement.
For readability, consider using qw() when initializing a mixed word/number array, especially since it lets you easily align a multidimensional array's elements.
Your first goal for a project like this should be a way to output your multidimensional array in a readable format, so you can observe and verify the changes you are making.
All of that gives the following changes:
use strict;
use warnings;
my @verwbez = (
[qw(- B E G I K L P S )],
[qw(B - 10 10 2 10 10 10 10)],
[qw(E 10 - 2 10 10 10 1 10)],
[qw(G 10 2 - 10 2 3 3 3 )],
[qw(I 2 10 10 - 4 10 10 2 )],
[qw(K 10 10 2 4 - 10 10 3 )],
[qw(L 10 10 3 10 10 - 2 2 )],
[qw(P 10 1 3 10 10 2 - 10)],
[qw(S 10 10 3 2 3 2 10 - )],
);
for my $i (0 .. $#verwbez) {
for my $j (0 .. $#{$verwbez[$i]}) {
if ($verwbez[$i][$j] =~ /\d/ && $verwbez[$i][$j] >= 7) {
$verwbez[$i][$j] = ".";
}
}
}
for (@verwbez) {
for (@$_) {
printf "%2s ", $_;
}
print "\n";
}
Outputs:
- B E G I K L P S
B - . . 2 . . . .
E . - 2 . . . 1 .
G . 2 - . 2 3 3 3
I 2 . . - 4 . . 2
K . . 2 4 - . . 3
L . . 3 . . - 2 2
P . 1 3 . . 2 - .
S . . 3 2 3 2 . -
Note that B has only one city it is near to (I). So if the goal were solving the TSP, there isn't a trivial solution. However, there are only 8 cities, and n cities have (n-1)! circular permutations, which gives just 7! = 5,040 permutations. Brute force therefore works perfectly well for finding a lowest-cost solution.
use strict;
use warnings;
use Algorithm::Combinatorics qw(circular_permutations);
my @verwbez = ( ... already defined ... );
# Create a cost between two cities hash:
my %cost;
for my $i (1..$#verwbez) {
for my $j (1..$#{$verwbez[$i]}) {
$cost{ $verwbez[$i][0] }{ $verwbez[0][$j] } = $verwbez[$i][$j] if $i != $j;
}
}
# Determine all Routes and their cost (sorted)
my @cities = keys %cost;
my @perms = circular_permutations(\@cities);
my @cost_with_perm = sort {$a->[0] <=> $b->[0]} map {
my $perm = $_;
my $prev = $perm->[-1];
my $cost = 0;
for (@$perm) {
$cost += $cost{$_}{$prev};
$prev = $_
}
[$cost, $perm]
} @perms;
# Print out lowest cost routes:
print "Lowest cost is: " . $cost_with_perm[0][0] . "\n";
for (@cost_with_perm) {
last if $_->[0] > $cost_with_perm[0][0];
print join(' ', @{$_->[1]}), "\n";
}
It turns out there are only 2 lowest-cost solutions to this setup, and they are mirror images of each other, which makes sense since we didn't filter by direction in our circular permutations. I am intentionally not stating what they are here.
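Since the question asked specifically about branch and bound, here is a minimal depth-first branch-and-bound sketch in plain Perl, with no modules. It uses the simplest possible bound, the cost of the partial tour so far, and prunes any branch that already costs more than the best complete tour found. Ties are kept so that all optimal tours are reported, and B is fixed as the starting city to avoid counting rotations (each cycle still appears once per direction). A tighter lower bound would prune more aggressively but is not shown; the names branch, @best_tours and so on are illustrative.

```perl
use strict;
use warnings;

# The cost matrix from the question (10 means "far apart")
my @cities = qw(B E G I K L P S);
my @rows = (
    [qw(-  10 10 2  10 10 10 10)],
    [qw(10 -  2  10 10 10 1  10)],
    [qw(10 2  -  10 2  3  3  3 )],
    [qw(2  10 10 -  4  10 10 2 )],
    [qw(10 10 2  4  -  10 10 3 )],
    [qw(10 10 3  10 10 -  2  2 )],
    [qw(10 1  3  10 10 2  -  10)],
    [qw(10 10 3  2  3  2  10 - )],
);
my $n = @cities;

my $best_cost = 9**9**9;    # effectively +infinity
my @best_tours;

# Extend a partial tour one city at a time; prune with the partial cost.
sub branch {
    my ($tour, $cost, $used) = @_;
    return if $cost > $best_cost;                  # bound: cannot improve
    if (@$tour == $n) {                            # complete: close the cycle
        my $total = $cost + $rows[ $tour->[-1] ][ $tour->[0] ];
        if    ($total <  $best_cost) { ($best_cost, @best_tours) = ($total, [@$tour]) }
        elsif ($total == $best_cost) { push @best_tours, [@$tour] }
        return;
    }
    for my $next (0 .. $n - 1) {
        next if $used->{$next};
        $used->{$next} = 1;
        branch([@$tour, $next], $cost + $rows[ $tour->[-1] ][$next], $used);
        delete $used->{$next};                     # backtrack
    }
}

branch([0], 0, { 0 => 1 });    # start at B
print "Lowest cost: $best_cost\n";
print join(' ', @cities[@$_]), "\n" for @best_tours;
```

With only 8 cities this explores at most 7! leaves, so even this weak bound is fast; it matters more as the matrix grows.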

Randomly selecting letters by frequency of use

After feeding a few Shakespeare books to my Perl script, I have a hash with the 26 English letters as keys and the number of their occurrences in the texts as values:
%freq = (
a => 24645246,
b => 1409459,
....
z => 807451,
);
and, of course, the total number of all letters, say in the $total variable.
Is there a nice trick to generate a string holding 16 random letters (a letter can occur several times), weighted by their frequency of use?
It is to be used in a word game similar to Ruzzle.
I am looking for something elegant, like picking a random line from a file as suggested by a Perl Cookbook recipe:
rand($.) < 1 && ($line = $_) while <>;
The Perl Cookbook trick for picking a random line (which can also be found in perlfaq5) can be adapted for weighted sampling too:
my $chosen;
my $sum = 0;
foreach my $item (keys %freq) {
$sum += $freq{$item};
$chosen = $item if rand($sum) < $freq{$item};
}
Here, $sum corresponds to the line counter $. and $freq{$item} to the constant 1 in the Cookbook version.
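To get the 16-letter string the question asks for, you can wrap that loop in a sub and call it 16 times. A minimal sketch; the frequencies are made up for illustration and weighted_pick is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical frequencies for illustration; use your real %freq here
my %freq = (e => 12, t => 9, a => 8, o => 7, z => 1);

# One weighted draw using the running-sum trick shown above
sub weighted_pick {
    my ($freq) = @_;
    my ($chosen, $sum) = (undef, 0);
    for my $item (keys %$freq) {
        $sum += $freq->{$item};
        $chosen = $item if rand($sum) < $freq->{$item};
    }
    return $chosen;
}

# 16 independent draws (with replacement) joined into one string
my $board = join '', map { weighted_pick(\%freq) } 1 .. 16;
print "$board\n";
```

Each call is a single pass over the alphabet, so 16 letters cost 16 passes; that is negligible for 26 keys.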
If you're going to be picking a lot of weighted random samples, you can speed this up a bit with some preparation (note that this destroys %freq, so make a copy first if you want to keep it):
# first, scale all frequencies so that the average frequency is 1:
my $avg = 0;
$avg += $_ for values %freq;
$avg /= keys %freq;
$_ /= $avg for values %freq;
# now, prepare the array we'll need for fast weighted sampling:
my @lookup;
while (keys %freq) {
my ($lo, $hi) = (sort {$freq{$a} <=> $freq{$b}} keys %freq)[0, -1];
push @lookup, [$lo, $hi, $freq{$lo} + @lookup];
$freq{$hi} -= (1 - $freq{$lo});
delete $freq{$lo};
}
Now, to draw a random weighted sample from the prepared distribution, you just do this:
my $r = rand @lookup;
my ($lo, $hi, $threshold) = @{$lookup[$r]};
my $chosen = ($r < $threshold ? $lo : $hi);
(This is basically the Square Histogram method described in Marsaglia, Tsang & Wang (2004), "Fast Generation of Discrete Random Variables", J. Stat. Soft. 11(3) and originally due to A.J. Walker (1974).)
I don't know Perl syntax, so I'll just write pseudocode. You can do something like this:
sum <- 0
foreach letter in {a, ..., z}
    sum <- sum + freq[letter]
pick r, a random integer in [0, sum)
letter <- 'a' - 1
do
    letter <- letter + 1
    r <- r - freq[letter]
while r >= 0
letter is the resulting value
The idea behind this code is to make a stack of boxes, one for each letter. The size of each box is the frequency of the letter. Then we choose a random location on this stack and see which letter's box we landed in.
Example :
freq(a) = 5
freq(b) = 3
freq(c) = 3
sum = 11
| a a a a a | b b b | c c c |
  0 1 2 3 4   5 6 7   8 9 10
When we choose a 0 <= r < 11, we have the following probabilities
Pick 'a' = 5 / 11
Pick 'b' = 3 / 11
Pick 'c' = 3 / 11
Which is exactly what we want.
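A direct Perl translation of the pseudocode, as a sketch. The pick_letter name is hypothetical, and its optional second argument lets you pass a fixed offset r for testing; otherwise it draws one. Letters are walked in sorted order, which is an arbitrary but fixed choice.

```perl
use strict;
use warnings;

# Example frequencies from the explanation above
my %freq = (a => 5, b => 3, c => 3);

# Walk the "stack of boxes", subtracting each box size from the random
# offset until the offset falls inside the current box.
sub pick_letter {
    my ($freq, $r) = @_;
    my $sum = 0;
    $sum += $_ for values %$freq;
    $r = int rand $sum unless defined $r;    # random integer in [0, sum)
    for my $letter (sort keys %$freq) {
        $r -= $freq->{$letter};
        return $letter if $r < 0;
    }
    die "offset was outside [0, sum)";
}

print pick_letter(\%freq), "\n";
```

Offsets 0 to 4 land in a, 5 to 7 in b and 8 to 10 in c, giving the 5/11, 3/11, 3/11 split.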
You can first build a table of the running sums of the frequencies. So if you have the following data:
%freq = (
a => 15,
b => 25,
c => 30,
d => 20
);
the running sums would be:
%running_sums = (
a => 0,
b => 15,
c => 40, # 15 + 25
d => 70, # 15 + 25 + 30
);
$max_sum = 90; # 15 + 25 + 30 + 20
To pick a single letter with the weighted frequency, select a number in [0, 90), then do a linear search on the running_sums table for the range that includes it. For example, if your random number is 20, the appropriate range is 15-40, which belongs to the letter 'b'. Using linear search gives a total running time of O(m*n), where m is the number of letters we need and n is the size of the alphabet (here m=16, n=26). This is essentially what @default locale's answer does.
Instead of a linear search, you can also do a binary search on the running_sums table to find the closest sum rounded down. This gives a total running time of O(m*log(n)).
For picking m letters, though, there is a faster way than O(m*log(n)), particularly if n < m. First generate the m random numbers in sorted order (which can be done without sorting, in O(m)), then do a linear matching between the list of sorted random numbers and the list of running sums. This gives a total running time of O(m+n). Here is the code in its entirety (a running version was posted on Ideone):
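A sketch of the binary-search variant, using the example frequencies above; the array and sub names are illustrative. It finds the last running sum that is less than or equal to the random number, so 20 maps to 'b' as in the worked example.

```perl
use strict;
use warnings;

# Frequencies from the running-sum example above
my %freq = (a => 15, b => 25, c => 30, d => 20);

# Parallel arrays: letters and the running sum *before* each letter
my @letters = sort keys %freq;
my @starts;
my $total = 0;
for my $letter (@letters) {
    push @starts, $total;
    $total += $freq{$letter};
}

# Binary search for the last start that is <= $r
sub letter_for {
    my ($r) = @_;
    my ($lo, $hi) = (0, $#starts);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi + 1) / 2);    # round up so the loop advances
        if ($starts[$mid] <= $r) { $lo = $mid } else { $hi = $mid - 1 }
    }
    return $letters[$lo];
}

my $r = rand $total;    # uniform in [0, 90)
print letter_for($r), "\n";
```

Building the table is O(n) once; each lookup is then O(log n).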
use List::Util qw(shuffle);
my %freq = (...);
# list of letters in sorted order, i.e. "a", "b", "c", ..., "x", "y", "z"
# sorting is O(n*log(n)) but it can be avoided if you already have
# a list of letters you're interested in using
my @letters = sort keys %freq;
# compute the running_sums table in O(n)
my $sum = 0;
my %running_sum;
for(@letters) {
$running_sum{$_} = $sum;
$sum += $freq{$_};
}
# generate a string with letters in $freq frequency in O(m)
my $curmax = 1;
my $curletter = $#letters;
my $i = 16; # the number of letters we want to generate
my @result;
while ($i > 0) {
# $curmax generates a uniformly distributed decreasing random number in [0,1)
# see http://repository.cmu.edu/cgi/viewcontent.cgi?article=3483&context=compsci
$curmax = $curmax * (1-rand())**(1. / $i);
# scale the random number $curmax to [0,$sum)
my $num = int ($curmax * $sum);
# find the range that includes $num
while ($num < $running_sum{$letters[$curletter]}) {
$curletter--;
}
push(@result, $letters[$curletter]);
$i--;
}
# since @result is sorted, you may want to shuffle it first
# Fisher-Yates shuffle is O(m)
print join('', shuffle(@result)), "\n";

Identifying subarrays in matrices in Perl

I am relatively new to Perl, and I need to do a relatively sophisticated matrix computation, but I don't know what data structures to use.
I'm not sure if this is the appropriate forum for this, but say you have the following matrix in a multidimensional array in Perl:
0.2 0.7 0.2
0.6 0.8 0.7
0.6 0.1 0.8
0.1 0.2 0.9
0.6 0.3 0.0
0.6 0.9 0.2
I am trying to identify column segments in this matrix corresponding to continuous values that are higher than a given threshold, e.g. 0.5.
For example, if we threshold this matrix, we have:
0 1 0
1 1 1
1 0 1
0 0 1
1 0 0
1 1 0
If we now focus on the first column:
0
1
1
0
1
1
we can see that there are two continuous segments:
0 1 1 0 1 1
The first track (sequence of ones) starts at index 1 and ends at index 2.
The second track (sequence of ones) starts at index 4 and ends at index 5.
I would like to detect all such tracks in the original matrix, but I don't know how to proceed or what Perl data structures are most appropriate for this.
Ideally I would like something easy to index. For example, assuming that we use a hash %tracks, I could store the indices for the first column (index 0) as follows:
# First column, first track
$tracks{0}{0}{'start'} = 1;
$tracks{0}{0}{'end'} = 2;
# First column, second track
$tracks{0}{1}{'start'} = 4;
$tracks{0}{1}{'end'} = 5;
# ...
What are good data structures and/or libraries I can use to approach this problem in Perl?
I am just giving the algorithmic answer and you can code it in whatever language you like.
Split the problem into subproblems:
Thresholding: depending on how you store your input, this can be as simple as an iteration over an n-dimensional matrix, or a tree/list traversal if your matrices are sparse. This is the easy bit.
The algorithm for finding continuous segments is called run-length encoding (RLE). It takes a sequence with possible duplicates, like 1 0 0 1 1 1 1 0 1, and returns another sequence which tells you which element comes next and how many of them there are. So, for example, the above sequence would be encoded as 1 1 0 2 1 4 0 1 1 1. The encoding is unique, so if you ever want to invert it you are OK.
The first 1 is there because the original input starts with 1, and the second 1 is the length of that run. The first 0 is there because after the 1 there is a 0, and the fourth number is 2 because there are two consecutive zeros. There are zillions of RLE encoders if you don't want to write your own.
Its main purpose is compression and it works reasonably well for that purpose if you have long runs of identical items. Depending on your needs you may have to run it horizontally, vertically and even diagonally.
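A minimal Perl sketch of the encoder described above; rle_encode is a made-up name. It returns a flat list of (value, count) pairs and reproduces the worked example.

```perl
use strict;
use warnings;

# Run-length encode a list into (value, count, value, count, ...) pairs
sub rle_encode {
    my @out;
    for my $x (@_) {
        if (@out && $out[-2] eq $x) { $out[-1]++ }          # extend current run
        else                        { push @out, $x, 1 }    # start a new run
    }
    return @out;
}

my @encoded = rle_encode(1, 0, 0, 1, 1, 1, 1, 0, 1);
print "@encoded\n";    # 1 1 0 2 1 4 0 1 1 1
```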
You can find the precise algorithm in all the classic books on data structures and algorithms. I'd suggest Cormen, Leiserson, Rivest and Stein, 'Introduction to Algorithms', first, then Knuth.
Once you get the gist, you can safely 'fuse' the thresholding with RLE to avoid iterating twice over your inputs.
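As a sketch of that fusion, the loop below thresholds each column on the fly and records only the runs of 1s that an RLE pass would produce, as { start, end } pairs with inclusive row indices. It is a plain single-pass run detector rather than a full RLE encoder, and the variable names are illustrative; the data is the matrix from the question.

```perl
use strict;
use warnings;

my $threshold = 0.5;
my @data = (
    [0.2, 0.7, 0.2],
    [0.6, 0.8, 0.7],
    [0.6, 0.1, 0.8],
    [0.1, 0.2, 0.9],
    [0.6, 0.3, 0.0],
    [0.6, 0.9, 0.2],
);

my @tracks;
for my $col (0 .. $#{ $data[0] }) {
    $tracks[$col] = [];
    my $start;                                   # row where the current run began
    for my $row (0 .. $#data) {
        my $on = $data[$row][$col] > $threshold; # thresholding, fused in
        if ($on && !defined $start) { $start = $row }
        elsif (!$on && defined $start) {
            push @{ $tracks[$col] }, { start => $start, end => $row - 1 };
            undef $start;
        }
    }
    # close a run that reaches the last row
    push @{ $tracks[$col] }, { start => $start, end => $#data } if defined $start;
}

for my $col (0 .. $#tracks) {
    print join(' ', map { sprintf '[%d-%d]', $_->{start}, $_->{end} } @{ $tracks[$col] }), "\n";
}
```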
This seems to do what you want. I have represented the data in the form you suggested, as the ideal form depends entirely on what you want to do with the result.
It works by calculating the list of 0s and 1s from each column, adding barrier values of zero at each end (one in $prev and one in the for list), and then scanning the list for changes between 1 and 0.
Every time a change is found, a track start or end is recorded. If $start is undefined, the current index is recorded as the start of a segment; otherwise the current segment ended at one less than the current index. A hash is built with start and end keys, and pushed onto the @segments array.
The final set of nested loops dumps the calculated data in the form you show in the question.
use strict;
use warnings;
use constant THRESHOLD => 0.5;
my @data = (
[ qw/ 0.2 0.7 0.2 / ],
[ qw/ 0.6 0.8 0.7 / ],
[ qw/ 0.6 0.1 0.8 / ],
[ qw/ 0.1 0.2 0.9 / ],
[ qw/ 0.6 0.3 0.0 / ],
[ qw/ 0.6 0.9 0.2 / ],
);
my @tracks;
for my $colno (0 .. $#{$data[0]}) {
my @segments;
my $start;
my $prev = 0;
my $i = 0;
for my $val ( (map { $_->[$colno] > THRESHOLD ? 1 : 0 } @data), 0 ) {
next if $val == $prev;
if (defined $start) {
push #segments, { start => $start, end=> $i-1 };
undef $start;
}
else {
$start = $i;
}
}
continue {
$prev = $val;
$i++;
}
push @tracks, \@segments;
}
# Dump the derived @tracks data
#
for my $colno (0 .. $#tracks) {
my $col = $tracks[$colno];
for my $track (0 .. $#$col) {
my $data = $col->[$track];
printf "\$tracks[%d][%d]{start} = %d\n", $colno, $track, $data->{start};
printf "\$tracks[%d][%d]{end} = %d\n", $colno, $track, $data->{end};
}
print "\n";
}
output
$tracks[0][0]{start} = 1
$tracks[0][0]{end} = 2
$tracks[0][1]{start} = 4
$tracks[0][1]{end} = 5
$tracks[1][0]{start} = 0
$tracks[1][0]{end} = 1
$tracks[1][1]{start} = 5
$tracks[1][1]{end} = 5
$tracks[2][0]{start} = 1
$tracks[2][0]{end} = 3
Lamenting Perl's poor support for multidimensional arrays, I soon found myself throwing together a small solution of my own. The algorithm is rather similar to Borodin's idea, but with a slightly different structure:
sub tracks {
my ($data) = @_; # this sub takes a callback as argument
my @tracks; # holds all found ranges
my @state; # is true if we are inside a range/track. Also holds the starting index of the current range.
my $rowNo = 0; # current row number
while (my @row = $data->()) { # fetch new data
for my $i (0..$#row) {
if (not $state[$i] and $row[$i]) {
# a new track is found
$state[$i] = $rowNo+1; # we have to pass $rowNo+1 to ensure a true value
} elsif ($state[$i] and not $row[$i]) {
push @{$tracks[$i]}, [$state[$i]-1, $rowNo-1]; # push a found track into the @tracks array. We have to adjust the values to revert the previous adjustment.
$state[$i] = 0; # reset state to false
}
}
} continue {$rowNo++}
# flush remaining tracks
for my $i (0..$#state) {
push @{$tracks[$i]}, [$state[$i]-1, $rowNo-1] if $state[$i]
}
return @tracks;
}
@state doubles as a flag indicating whether we are inside a track and as a record of the track's starting index. In the @state and @tracks arrays, the index indicates the current column.
As a data source, I used an external file, but this can easily be plugged into anything, e.g. a preexisting array. The only contract is that it must return an arbitrary sequence of true and false values, and the empty list when no further data is available.
my $limit = 0.5;
my $data_source = sub {
defined (my $line = <>) or return (); # return empty list when data is empty
chomp $line;
return map {$_ >= $limit ? $_ : 0} split /\s+/, $line; # split the line and map the data to true and false values
};
With the data you gave copy-pasted as input, I get the following printout as output (printing code omitted):
[ [1 2], [4 5] ]
[ [0 1], [5 5] ]
[ [1 3] ]
With your structure, this would be
$tracks[0][0][0] = 1;
$tracks[0][0][1] = 2;
$tracks[0][1][0] = 4;
...;
If this is modified to a hash, further data like the original value could be incorporated.
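As mentioned above, the data source can also be a preexisting array. A minimal sketch of such a callback, using a hypothetical make_array_source helper that follows the same contract and the same value mapping as the file-based source; it should be usable as the argument to the tracks sub, although that pairing is not run here.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a callback that yields one mapped row per
# call, and the empty list once the rows run out.
sub make_array_source {
    my ($rows, $limit) = @_;
    my $i = 0;
    return sub {
        return () if $i > $#$rows;                  # no more data
        my @row = @{ $rows->[$i++] };
        return map { $_ >= $limit ? $_ : 0 } @row;  # true/false mapping
    };
}

my $source = make_array_source([ [0.2, 0.7], [0.6, 0.1] ], 0.5);
while (my @row = $source->()) {
    print "@row\n";
}
```

A row whose values are all below the limit still returns a non-empty list of zeros, so only the empty list ends the while loop, which is why the contract uses it as the end marker.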

How can I create combinations of several lists without hardcoding loops?

I have data that looks like this:
my @homopol = (
["T","C","CC","G"], # part1
["T","TT","C","G","A"], #part2
["C","CCC","G"], #part3 ...upto part K=~50
);
my @prob = ([1.00,0.63,0.002,1.00,0.83],
[0.72,0.03,1.00, 0.85,1.00],
[1.00,0.97,0.02]);
# Note also that the dimension of @homopol is always exactly the same as that of @prob,
# although the number of elements can differ from 'part' to 'part'.
What I want to do is to
Generate all combinations of elements in part1 through out partK
Find the product of the corresponding elements in #prob.
Hence at the end we hope to get this output:
T-T-C 1 x 0.72 x 1 = 0.720
T-T-CCC 1 x 0.72 x 0.97 = 0.698
T-T-G 1 x 0.72 x 0.02 = 0.014
...
G-G-G 1 x 0.85 x 0.02 = 0.017
G-A-C 1 x 1 x 1 = 1.000
G-A-CCC 1 x 1 x 0.97 = 0.970
G-A-G 1 x 1 x 0.02 = 0.020
The problem is that the following code of mine does this by hardcoding the loops. Since the number of parts of @homopol can vary and be large (e.g. K ~ 50), we need a flexible and compact way to get the same result. Is there one?
I was thinking to use Algorithm::Loops, but not sure how to achieve that.
use strict;
use Data::Dumper;
use Carp;
my @homopol = (["T","C","CC","G"],
["T","TT","C","G","A"],
["C","CCC","G"]);
my @prob = ([1.00,0.63,0.002,1.00,0.83],
[0.72,0.03,1.00, 0.85,1.00],
[1.00,0.97,0.02]);
my $i_of_part1 = -1;
foreach my $base_part1 ( @{ $homopol[0] } ) {
$i_of_part1++;
my $probpart1 = $prob[0]->[$i_of_part1];
my $i_of_part2 =-1;
foreach my $base_part2 ( @{ $homopol[1] } ) {
$i_of_part2++;
my $probpart2 = $prob[1]->[$i_of_part2];
my $i_of_part3 = -1;
foreach my $base_part3 ( @{ $homopol[2] } ) {
$i_of_part3++;
my $probpart3 = $prob[2]->[$i_of_part3];
my $nstr = $base_part1."".$base_part2."".$base_part3;
my $prob_prod = sprintf("%.3f",$probpart1 * $probpart2 *$probpart3);
print "$base_part1-$base_part2-$base_part3 \t";
print "$probpart1 x $probpart2 x $probpart3 = $prob_prod\n";
}
}
}
I would recommend Set::CrossProduct, which will create an iterator to yield the cross product of all of your sets. Because it uses an iterator, it does not need to generate every combination in advance; rather, it yields each one on demand.
use strict;
use warnings;
use Set::CrossProduct;
my @homopol = (
[qw(T C CC G)],
[qw(T TT C G A)],
[qw(C CCC G)],
);
my @prob = (
[1.00,0.63,0.002,1.00],
[0.72,0.03,1.00, 0.85,1.00],
[1.00,0.97,0.02],
);
# Prepare by storing the data in a list of lists of pairs.
my @combined;
for my $i (0 .. $#homopol){
push @combined, [];
push @{$combined[-1]}, [$homopol[$i][$_], $prob[$i][$_]]
for 0 .. @{$homopol[$i]} - 1;
};
my $iterator = Set::CrossProduct->new([ @combined ]);
while( my $tuple = $iterator->get ){
my @h = map { $_->[0] } @$tuple;
my @p = map { $_->[1] } @$tuple;
my $product = 1;
$product *= $_ for @p;
print join('-', @h), ' ', join(' x ', @p), ' = ', $product, "\n";
}
A solution using Algorithm::Loops without changing the input data would look something like:
use Algorithm::Loops qw(NestedLoops);
# Turns ([a, b, c], [d, e], ...) into ([0, 1, 2], [0, 1], ...)
my @lists_of_indices = map { [ 0 .. $#$_ ] } @homopol;
NestedLoops( [ @lists_of_indices ], sub {
my @indices = @_;
my $prob_prod = 1; # Multiplicative identity
my @base_string;
my @prob_string;
for my $n (0 .. $#indices) {
push @base_string, $homopol[$n][ $indices[$n] ];
push @prob_string, sprintf("%.3f", $prob[$n][ $indices[$n] ]);
$prob_prod *= $prob[$n][ $indices[$n] ];
}
print join "-", @base_string; print "\t";
print join "x", @prob_string; print " = ";
printf "%.3f\n", $prob_prod;
});
But I think that you could actually make the code clearer by changing the structure to one more like
[
{ T => 1.00, C => 0.63, CC => 0.002, G => 0.83 },
{ T => 0.72, TT => 0.03, ... },
...
]
because without the parallel data structures you can simply iterate over the available base sequences, instead of iterating over indices and then looking up those indices in two different places.
Why don't you use recursion? Pass the depth as a parameter and let the function call itself with depth+1 inside the loop.
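A sketch of that recursive approach; combine and @lines are names made up for this example. It uses the trimmed @prob from the Set::CrossProduct answer so that each part has matching lengths in both arrays.

```perl
use strict;
use warnings;

my @homopol = ( [qw(T C CC G)], [qw(T TT C G A)], [qw(C CCC G)] );
my @prob    = ( [1.00, 0.63, 0.002, 1.00],
                [0.72, 0.03, 1.00, 0.85, 1.00],
                [1.00, 0.97, 0.02] );

my @lines;

# $depth is the part being chosen; $bases and $probs hold the choices
# already made for parts 0 .. $depth-1.
sub combine {
    my ($depth, $bases, $probs) = @_;
    if ($depth == @homopol) {                  # one choice per part: emit
        my $product = 1;
        $product *= $_ for @$probs;
        push @lines, sprintf "%s\t%s = %.3f",
            join('-', @$bases), join(' x ', @$probs), $product;
        return;
    }
    for my $i (0 .. $#{ $homopol[$depth] }) {
        combine($depth + 1,
                [ @$bases, $homopol[$depth][$i] ],
                [ @$probs, $prob[$depth][$i] ]);
    }
}

combine(0, [], []);
print "$_\n" for @lines;
```

The recursion depth equals the number of parts, so K ~ 50 is well within Perl's limits.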
You could do it by creating an array of indices with the same length as the @homopol array (say N) to keep track of which combination you are looking at. In fact, this array works just like a number whose digits are its elements, except that each digit k has its own base, namely the number of elements in part k. Iterate the same way you would write down consecutive numbers, e.g. (0 0 0 ... 0), (0 0 0 ... 1), ..., (0 0 0 ... L-1), (0 0 ... 1 0), ..., where L is the number of elements in the last part.
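A sketch of this odometer-style iteration, assuming each part is non-empty; the variable names are illustrative. Each index k runs from 0 to the last index of part k, and overflowing the rightmost index carries to the left.

```perl
use strict;
use warnings;

my @homopol = ( [qw(T C CC G)], [qw(T TT C G A)], [qw(C CCC G)] );

my @idx = (0) x @homopol;    # one "digit" per part
my @combos;
while (1) {
    push @combos, join '-', map { $homopol[$_][ $idx[$_] ] } 0 .. $#idx;

    # increment the rightmost digit, carrying left on overflow
    my $k = $#idx;
    while ($k >= 0 && ++$idx[$k] > $#{ $homopol[$k] }) {
        $idx[$k] = 0;
        $k--;
    }
    last if $k < 0;          # carried past the leftmost digit: done
}

print "$_\n" for @combos;
```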
Approach 1: Calculation from indices
Compute the product of the lengths in @homopol (length1 * length2 * ... * lengthN). Then iterate i from 0 to that product minus 1. The indices you want are i % length1, int(i / length1) % length2, int(i / length1 / length2) % length3, ..., using integer division.
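A sketch of Approach 1 in Perl; the names are illustrative. Note that with this digit order the first part changes fastest, so the output order differs from the hardcoded loops in the question, although the set of combinations is the same.

```perl
use strict;
use warnings;

my @homopol = ( [qw(T C CC G)], [qw(T TT C G A)], [qw(C CCC G)] );
my @len = map { scalar @$_ } @homopol;

# Total number of combinations: product of the part lengths
my $total = 1;
$total *= $_ for @len;

my @combos;
for my $i (0 .. $total - 1) {
    my $rest = $i;
    my @pick;
    for my $k (0 .. $#len) {
        push @pick, $homopol[$k][ $rest % $len[$k] ];   # digit k of $i
        $rest = int($rest / $len[$k]);                  # integer division
    }
    push @combos, join '-', @pick;
}

print scalar(@combos), " combinations, first: $combos[0]\n";
```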
Approach 2: Recursion
I got beaten to it, see nikie's answer. :-)