For loop help in perl - perl

I am writing perl script and I have little question regarding for loop limit.
Let say I have two arrays, arr1 has serial numbers and arr2 is two dimensional array, the first dimension is the serial number [same as arr1] and the second dimension is the contents of that serial number , Now I want to apply the for loop for this two dimension array but I am confused at the limit . Till now I have this code
Example : I have Three serial numbers , 1 ,2 ,3 . Serial 1 has 2 contents 1,5 . Serial 2 has 1 content i.e 1. Serial 3 has two contents 1,1.
#arr1 = (1,2,3)
$arr2[0][0] = 1
$arr2[0][1] = 5
$arr2[1][0] = 1
$arr2[2][1] = 1
$arr2[2][2] = 1
Note: As you can see the contents of arr2 has arr1 elements in 1st columns and the contents in the second columns.
for (my $i = 0; $i <= $#arr1; $i++) {
print( "The First Serial number has:" );
for (my $j = 0; $j <= $#arr2; $j++) {
print( "$arr2[$i][$j]\n" );
}
}
Thanks, Sorry for the bad explaination

Why don't do this like that :
#!/usr/bin/perl
use strict;
my #arr;
$arr[0][0] = 1;
$arr[0][1] = 5;
$arr[1][0] = 1;
$arr[2][1] = 1;
$arr[2][2] = 1;
my ($i, $j);
foreach $i (#arr) {
foreach $j (#{$i}) {
print $j."\n" if($j);
}
}
1;
__END__

Fixed code:
use strict;
use warnings;
my #arr1 = (1,2,3);
my #arr2;
$arr2[0][0] = 1;
$arr2[0][1] = 5;
$arr2[1][0] = 1;
$arr2[2][0] = 1; # original code had
$arr2[2][1] = 1; # these indexes wrong
for (my $i = 0; $i <= $#arr1; $i++) {
print( "Serial number $arr1[$i] has:" );
for (my $j = 0; $j <= $#{ $arr2[$i] }; $j++) {
print( "$arr2[$i][$j]\n" );
}
}
Note the use of $#{ arrayref }; see http://perlmonks.org/?node=References+quick+reference

you can put #arr2 like this and it would be much easier for you to understand #arr2
use strict;
use warnings;
my #arr1 = (1, 2, 3);
my #arr2 = ([1, 5], [1], [1, 1]);
for my $first(#arr1) {
for my $second (#{$arr2[$first-1]}) {
print $second."\n";
}
}

Here is a version without the first array.
for (my $i = 0; $i<= $#arr; $i++)
{
print "INDEX $i\n";
for (my $j = 0; $j <= $#{$arr[$i]}; $j++)
{
print "${arr[$i][$j]}\n";
}
}
The point here is that a two dimensional array is in fact an array of arrays (well actually array references, but that does not change anything here). So in the inner loop, you should check against the size of the array that is stored in $arr[$i].

Try this.
my #arr2;
$arr2[0][0] = 1;
$arr2[0][1] = 5;
$arr2[1][0] = 1;
$arr2[2][0] = 1;
$arr2[2][1] = 1;
foreach $inside_array (#arr2){
foreach $ele (#$inside_array){
print $ele,"\n";
}
}
Its always better to use foreach instead of for/while, this will eliminate any possibility of bugs. Especially with judging proper condition to exit the loop.

Related

Perl: scope and allocation of a named variable

Here is a little sample.
my %X = ();
for (my $i = 0; $i < 5; $i ++)
{
$X {$i} = [$i .. 4]; # the assignment: reference to an unnamed array
}
# this is just for output - you can ignore it
foreach (sort keys %X)
{
print "\n" . $_ . " = ";
foreach (#{$X {$_}})
{
print $_;
}
}
The output is like expected.
0 = 01234
1 = 1234
2 = 234
3 = 34
4 = 4
If I use a local variable for the assignment it will produce the same output - thats ok!
The memory for the list is always reallocated and not overwritten because #l is always new. There is still a reference to it in %X so no release is possible(or however the memory-managment in perl is working - I dont know).
for (my $i = 0; $i < 5; $i ++)
{
my #l = ($i .. 4); # inside
$X {$i} = \#l;
}
But can I produce the same output from above with using an outside variable?
Is that possible with some allocation trick - like to give it a new memory but not garbage the old one?
my %X = ();
my #l; # outside
for (my $i = 0; $i < 5; $i ++)
{
#l = ($i .. 4);
$X {$i} = \#l;
}
All hash-elements now the the content of the last loop.
0 = 4
1 = 4
2 = 4
3 = 4
4 = 4
Is it possible to get the output from the beginning with the outer variable?
No, it's not possible for each value of %X to be a reference to a different array, while at the same time all being a reference to the same array.
If you want each value of %X to be a reference to a same array, go ahead an allocate a single array outside of the loop.
If you want each value of %X to be a reference to a different array, you'll need to allocate a new array for each pass through the loop. This can be a named one (created using my), or an anonymous one (created using [ ]).
If you simply wanted to use the values within the outside #l so that every referenced array initially has the same value, you could use
my #a = #l;
$X{$i} = \#l;
or
$X{$i} = [ #l ];

Perl - Determinant of Matrix Containing Variables

I have a Perl program containing the following methods:
det To find the determinant of a matrix.
identityMatrix Return an n by n identity matrix (1s on the main diagnal, rest 0s).
matrixAdd To add two matrices together.
matrixScalarMultiply To multiply an integer by a matrix.
I can easily find the determinant of, for example, a matrix A - I where
(It is 0)
But what if I want to find the determinant of A - RI?
In this case, I want my program to solve for the characteristic polynomial solution (a.k.a. the determinant containing variables) along these lines instead of an integer value:
Any suggestions on how to handle this? Code below:
#!/usr/bin/perl
#perl Solver.pl
use strict;
use warnings;
### solve the characteristic polynomial det(A - RI)
my #A = ( # 3x3, det = -3
[1, 3, -3],
[1, 0, 0],
[0, 1, 0],
);
# test_matrix = A - I
my $test_matrix = matrixAdd( \#A,
matrixScalarMultiply( identityMatrix(3), -1 ) );
print "\nTest:\n";
for( my $i = 0; $i <= $#$test_matrix; $i++ ){
print "[";
for( my $j = 0; $j <= $#$test_matrix; $j++ ){
$j == $#$test_matrix ? print $test_matrix->[$i][$j], "]\n" :
print $test_matrix->[$i][$j], ", ";
}
}
my $dd = det ($test_matrix);
print "det = $dd \n";
# recursively find determinant of a real square matrix
# only call on n by n matrices where n >= 2
#
# arg0 = matrix reference
sub det{
my ($A) = #_;
#base: 2x2 matrix
if( $#$A + 1 == 2 ){ #recall $#$A == last index of A
return $A->[0][0]*$A->[1][1] - $A->[1][0]*$A->[0][1];
}
#cofactor expansion for matrices > 2x2
my $answer = 0;
for( my $col = 0; $col <= $#$A; $col++ ){
my $m = (); #sub matrix
my $multiplier = $A->[0][$col];
if( $col % 2 == 1 ){ #+, -, +, -, ...
$multiplier *= -1;
}
for( my $i = 1; $i <= $#$A; $i++ ){
#j is indexer for A, k for m
for( my ($j, $k) = (0, 0); $j <= $#$A; $j++ ){
$m->[$i-1][$k++] = $A->[$i][$j] unless $j == $col;
}
}
$answer += $multiplier*det( $m );
}#end cofactor expansion
return $answer;
}#end det()
# return reference to an n by n identity matrix
# can do this in Perl!
#
# arg0 = dimension 'n'
sub identityMatrix{
my $n = shift;
my #ret;
for (my $i = 0; $i < $n; $i++ ){
for (my $j = 0; $j < $n; $j++ ){
$ret[$i][$j] = $i == $j ? 1 : 0;
}
}
return \#ret;
}
# return reference to an n by n matrix which is the sum
# of two different n by n matrices, "a" and "b"
#
# arg0, 1 = references to the pair of matrices to add
sub matrixAdd{
my #ret;
my ($a, $b) = ($_[0], $_[1]);
for (my $i = 0; $i <= $#$a; $i++ ){
for (my $j = 0; $j <= $#$a; $j++ ){
$ret[$i][$j] = $a->[$i][$j] + $b->[$i][$j];
}
}
return \#ret;
}
# return reference to a matrix multiplied by a given scalar
#
# arg0 = reference to matrix
# arg1 = scalar to multiply by
sub matrixScalarMultiply{
my #ret;
my ($a, $multiplier) = ($_[0], $_[1]);
for (my $i = 0; $i <= $#$a; $i++ ){
for (my $j = 0; $j <= $#$a; $j++ ){
$ret[$i][$j] = $a->[$i][$j] * $multiplier;
}
}
return \#ret;
}
This is called symbolic math and is in the wheelhouse of tools like Mathematica. For Perl, there are packages like Math::Synbolic but I couldn't tell you how easy they are to use.
On the other hand, if you are just interested in what values of R have a determinant of zero and not that interested in what the characteristic polynomial looks like, then you are looking for the eigenvalues of A. There are some Perl libraries for that, too.

Perl concatenate name of an array with an existing value

The question is like this:
I have a loop. And while I iterate this loop I want to create a number of arrays with the following names: array1 array2 array3...
I am wondering if there is a way to concatenate these names in perl
I tried something like this but I get an error
$i = 0;
while ($i <= 5) {
#array . $i = ();
$i++;
}
Yes, you can do this, but no, you should not do this.
What you should do instead is use an array of references to anonymous arrays:
#arrayrefs = ();
$i = 0;
while ($i <= 5) {
$arrayrefs[$i] = [];
$i++;
}
or, more tersely:
#arrayrefs = ([], [], [], [], [], []);
But for completeness' sake . . . you can do this, by using "symbolic references":
$i = 0;
while ($i <= 5) {
my $name = "array$i";
#$name = ();
$i++;
}
(of course, arrays default to the empty array anyway, so this isn't really needed . . .).
By the way, note that it's actually customary to use a for loop rather than a while loop for such simple cases. Either this:
for ($i = 0; $i <= 5; $i++) {
...
}
or this:
for $i (0 .. 5) {
...
}
You want to use hash,
use strict;
use warnings;
my %hash;
for my $i (1 .. 5) {
$hash{ "array$i" } = [];
}
Long story short: Why it's stupid to use a variable as a variable name

Using perl, given an array of any size, how do I randomly pick 1/4 of the list

For clarification, if I had a list of 8 elements, i would want to randomly pick 2. If I had a list of 20 elements, I would want to randomly pick 5. I would also like to assure (though not needed) that two elements don't touch, i.e. if possible not the 3 and then 4 element. Rather, 3 and 5 would be nicer.
The simplest solution:
Shuffle the list
select the 1st quarter.
Example implementation:
use List::Util qw/shuffle/;
my #nums = 1..20;
my #pick = (shuffle #nums)[0 .. 0.25 * $#nums];
say "#pick";
Example output: 10 2 18 3 19.
Your additional restriction “no neighboring numbers” actually makes this less random, and should be avoided if you want actual randomness. To avoid that two neighboring elements are included in the output, I would iteratively splice unwanted elements out of the list:
my #nums = 1..20;
my $size = 0.25 * #nums;
my #pick;
while (#pick < $size) {
my $i = int rand #nums;
push #pick, my $num = $nums[$i];
# check and remove neighbours
my $len = 1;
$len++ if $i < $#nums and $num + 1 == $nums[$i + 1];
$len++, $i-- if 0 < $i and $num - 1 == $nums[$i - 1];
splice #nums, $i, $len;
}
say "#pick";
use strict;
use warnings;
sub randsel {
my ($fact, $i, #r) = (1.0, 0);
while (#r * 4 < #_) {
if (not grep { $_ == $i } #r) {
$fact = 1.0;
# make $fact = 0.0 if you really don't want
# consecutive elements
$fact = 0.1 if grep { abs($i - $_) == 1 } #r;
push(#r, $i) if (rand() < 0.25 * $fact);
}
$i = ($i + 1) % #_;
}
return map { $_[$_] } sort { $a <=> $b } #r;
}
my #l;
$l[$_] = $_ for (0..19);
print join(" ", randsel(#l)), "\n";

How do I change this to "idiomatic" Perl?

I am beginning to delve deeper into Perl, but am having trouble writing "Perl-ly" code instead of writing C in Perl. How can I change the following code to use more Perl idioms, and how should I go about learning the idioms?
Just an explanation of what it is doing: This routine is part of a module that aligns DNA or amino acid sequences(using Needelman-Wunch if you care about such things). It creates two 2d arrays, one to store a score for each position in the two sequences, and one to keep track of the path so the highest-scoring alignment can be recreated later. It works fine, but I know I am not doing things very concisely and clearly.
edit: This was for an assignment. I completed it, but want to clean up my code a bit. The details on implementing the algorithm can be found on the class website if any of you are interested.
sub create_matrix {
my $self = shift;
#empty array reference
my $matrix = $self->{score_matrix};
#empty array ref
my $path_matrix = $self->{path_matrix};
#$seq1 and $seq2 are strings set previously
my $num_of_rows = length($self->{seq1}) + 1;
my $num_of_columns = length($self->{seq2}) + 1;
#create the 2d array of scores
for (my $i = 0; $i < $num_of_rows; $i++) {
push(#$matrix, []);
push(#$path_matrix, []);
$$matrix[$i][0] = $i * $self->{gap_cost};
$$path_matrix[$i][0] = 1;
}
#fill out the first row
for (my $i = 0; $i < $num_of_columns; $i++) {
$$matrix[0][$i] = $i * $self->{gap_cost};
$$path_matrix[0][$i] = -1;
}
#flag to signal end of traceback
$$path_matrix[0][0] = 2;
#double for loop to fill out each row
for (my $row = 1; $row < $num_of_rows; $row++) {
for (my $column = 1; $column < $num_of_columns; $column++) {
my $seq1_gap = $$matrix[$row-1][$column] + $self->{gap_cost};
my $seq2_gap = $$matrix[$row][$column-1] + $self->{gap_cost};
my $match_mismatch = $$matrix[$row-1][$column-1] + $self->get_match_score(substr($self->{seq1}, $row-1, 1), substr($self->{seq2}, $column-1, 1));
$$matrix[$row][$column] = max($seq1_gap, $seq2_gap, $match_mismatch);
#set the path matrix
#if it was a gap in seq1, -1, if was a (mis)match 0 if was a gap in seq2 1
if ($$matrix[$row][$column] == $seq1_gap) {
$$path_matrix[$row][$column] = -1;
}
elsif ($$matrix[$row][$column] == $match_mismatch) {
$$path_matrix[$row][$column] = 0;
}
elsif ($$matrix[$row][$column] == $seq2_gap) {
$$path_matrix[$row][$column] = 1;
}
}
}
}
You're getting several suggestions regarding syntax, but I would also suggest a more modular approach, if for no other reason that code readability. It's much easier to come up to speed on code if you can perceive the big picture before worrying about low-level details.
Your primary method might look like this.
sub create_matrix {
my $self = shift;
$self->create_2d_array_of_scores;
$self->fill_out_first_row;
$self->fill_out_other_rows;
}
And you would also have several smaller methods like this:
n_of_rows
n_of_cols
create_2d_array_of_scores
fill_out_first_row
fill_out_other_rows
And you might take it even further by defining even smaller methods -- getters, setters, and so forth. At that point, your middle-level methods like create_2d_array_of_scores would not directly touch the underlying data structure at all.
sub matrix { shift->{score_matrix} }
sub gap_cost { shift->{gap_cost} }
sub set_matrix_value {
my ($self, $r, $c, $val) = #_;
$self->matrix->[$r][$c] = $val;
}
# Etc.
One simple change is to use for loops like this:
for my $i (0 .. $num_of_rows){
# Do stuff.
}
For more info, see the Perl documentation on foreach loops and the range operator.
I have some other comments as well, but here is the first observation:
my $num_of_rows = length($self->{seq1}) + 1;
my $num_of_columns = length($self->{seq2}) + 1;
So $self->{seq1} and $self->{seq2} are strings and you keep accessing individual elements using substr. I would prefer to store them as arrays of characters:
$self->{seq1} = [ split //, $seq1 ];
Here is how I would have written it:
sub create_matrix {
my $self = shift;
my $matrix = $self->{score_matrix};
my $path_matrix = $self->{path_matrix};
my $rows = #{ $self->{seq1} };
my $cols = #{ $self->{seq2} };
for my $row (0 .. $rows) {
$matrix->[$row]->[0] = $row * $self->{gap_cost};
$path_matrix->[$row]->[0] = 1;
}
my $gap_cost = $self->{gap_cost};
$matrix->[0] = [ map { $_ * $gap_cost } 0 .. $cols ];
$path_matrix->[0] = [ (-1) x ($cols + 1) ];
$path_matrix->[0]->[0] = 2;
for my $row (1 .. $rows) {
for my $col (1 .. $cols) {
my $gap1 = $matrix->[$row - 1]->[$col] + $gap_cost;
my $gap2 = $matrix->[$row]->[$col - 1] + $gap_cost;
my $match_mismatch =
$matrix->[$row - 1]->[$col - 1] +
$self->get_match_score(
$self->{seq1}->[$row - 1],
$self->{seq2}->[$col - 1]
);
my $max = $matrix->[$row]->[$col] =
max($gap1, $gap2, $match_mismatch);
$path_matrix->[$row]->[$col] = $max == $gap1
? -1
: $max == $gap2
? 1
: 0;
}
}
}
Instead of dereferencing your two-dimensional arrays like this:
$$path_matrix[0][0] = 2;
do this:
$path_matrix->[0][0] = 2;
Also, you're doing a lot of if/then/else statements to match against particular subsequences: this could be better written as given statements (perl5.10's equivalent of C's switch). Read about it at perldoc perlsyn:
given ($matrix->[$row][$column])
{
when ($seq1_gap) { $path_matrix->[$row][$column] = -1; }
when ($match_mismatch) { $path_matrix->[$row][$column] = 0; }
when ($seq2_gap) { $path_matrix->[$row][$column] = 1; }
}
The majority of your code is manipulating 2D arrays. I think the biggest improvement would be switching to using PDL if you want to do much stuff with arrays, particularly if efficiency is a concern. It's a Perl module which provides excellent array support. The underlying routines are implemented in C for efficiency so it's fast too.
I would always advise to look at CPAN for previous solutions or examples of how to do things in Perl. Have you looked at Algorithm::NeedlemanWunsch?
The documentation to this module includes an example for matching DNA sequences. Here is an example using the similarity matrix from wikipedia.
#!/usr/bin/perl -w
use strict;
use warnings;
use Inline::Files; #multiple virtual files inside code
use Algorithm::NeedlemanWunsch; # refer CPAN - good style guide
# Read DNA sequences
my #a = read_DNA_seq("DNA_SEQ_A");
my #b = read_DNA_seq("DNA_SEQ_B");
# Read Similarity Matrix (held as a Hash of Hashes)
my %SM = read_Sim_Matrix();
# Define scoring based on "Similarity Matrix" %SM
sub score_sub {
if ( !#_ ) {
return -3; # gap penalty same as wikipedia)
}
return $SM{ $_[0] }{ $_[1] }; # Similarity Value matrix
}
my $matcher = Algorithm::NeedlemanWunsch->new( \&score_sub, -3 );
my $score = $matcher->align( \#a, \#b, { align => \&check_align, } );
print "\nThe maximum score is $score\n";
sub check_align {
my ( $i, $j ) = #_; # #a[i], #b[j]
print "seqA pos: $i, seqB pos: $j\t base \'$a[$i]\'\n";
}
sub read_DNA_seq {
my $source = shift;
my #data;
while (<$source>) {
push #data, /[ACGT-]{1}/g;
}
return #data;
}
sub read_Sim_Matrix {
#Read DNA similarity matrix (scores per Wikipedia)
my ( #AoA, %HoH );
while (<SIMILARITY_MATRIX>) {
push #AoA, [/(\S+)+/g];
}
for ( my $row = 1 ; $row < 5 ; $row++ ) {
for ( my $col = 1 ; $col < 5 ; $col++ ) {
$HoH{ $AoA[0][$col] }{ $AoA[$row][0] } = $AoA[$row][$col];
}
}
return %HoH;
}
__DNA_SEQ_A__
A T G T A G T G T A T A G T
A C A T G C A
__DNA_SEQ_B__
A T G T A G T A C A T G C A
__SIMILARITY_MATRIX__
- A G C T
A 10 -1 -3 -4
G -1 7 -5 -3
C -3 -5 9 0
T -4 -3 0 8
And here is some sample output:
seqA pos: 7, seqB pos: 2 base 'G'
seqA pos: 6, seqB pos: 1 base 'T'
seqA pos: 4, seqB pos: 0 base 'A'
The maximum score is 100