Perl: scope and allocation of a named variable - perl

Here is a little sample.
my %X = ();
for (my $i = 0; $i < 5; $i ++)
{
$X {$i} = [$i .. 4]; # the assignment: reference to an unnamed array
}
# this is just for output - you can ignore it
foreach (sort keys %X)
{
print "\n" . $_ . " = ";
foreach (#{$X {$_}})
{
print $_;
}
}
The output is like expected.
0 = 01234
1 = 1234
2 = 234
3 = 34
4 = 4
If I use a local variable for the assignment it will produce the same output - thats ok!
The memory for the list is always reallocated and not overwritten because #l is always new. There is still a reference to it in %X so no release is possible(or however the memory-managment in perl is working - I dont know).
for (my $i = 0; $i < 5; $i ++)
{
my #l = ($i .. 4); # inside
$X {$i} = \#l;
}
But can I produce the same output from above with using an outside variable?
Is that possible with some allocation trick - like to give it a new memory but not garbage the old one?
my %X = ();
my #l; # outside
for (my $i = 0; $i < 5; $i ++)
{
#l = ($i .. 4);
$X {$i} = \#l;
}
All hash-elements now the the content of the last loop.
0 = 4
1 = 4
2 = 4
3 = 4
4 = 4
Is it possible to get the output from the beginning with the outer variable?

No, it's not possible for each value of %X to be a reference to a different array, while at the same time all being a reference to the same array.
If you want each value of %X to be a reference to a same array, go ahead an allocate a single array outside of the loop.
If you want each value of %X to be a reference to a different array, you'll need to allocate a new array for each pass through the loop. This can be a named one (created using my), or an anonymous one (created using [ ]).
If you simply wanted to use the values within the outside #l so that every referenced array initially has the same value, you could use
my #a = #l;
$X{$i} = \#l;
or
$X{$i} = [ #l ];

Related

Scoping in Perl

As a biology student, I'm trying to extend my programming knowledge and I ran into a problem with Perl.
I'm trying to create a program that generates random DNA strings and performs analysis work on the generated data.
In the first part of the program, I am able to print out the strings stored in the array, but the second part I cannot retrieve all but one of the elements of the array.
Could this be part of the scoping rules of Perl?
#!usr/bin/perl
# generate a random DNA strings and print it to file specified by the user.
$largearray[0] = 0;
print "How many nucleotides for the string?\n";
$n = <>;
$mylong = $n;
print "how many strings?\n";
$numstrings = <>;
# #largearray =();
$j = 0;
while ( $j < $numstrings ) {
$numstring = ''; # start with the empty string;
$dnastring = '';
$i = 0;
while ( $i < $n ) {
$numstring = int( rand( 4 ) ) . $numstring; # generate a new random integer
# between 0 and 3, and concatenate
# it with the existing $numstring,
# assigning the result to $numstring.
$i++; # increase the value of $i by one.
}
$dnastring = $numstring;
$dnastring =~ tr/0123/actg/; # translate the numbers to DNA characters.
#print $dnastring;
#print "\n";
$largearray[j] = $dnastring; #append generated string to end of array
#print $largearray[j];
#print $j;
#IN HERE THERE ARE GOOD ARRAY VALUES
#print "\n";
$j++;
}
# ii will be used to continuously take the next couple of strings from largearray
# for LCS matching.
$mytotal = 0;
$ii = 0;
while ( $ii < $numstrings ) {
$line = $largearray[ii];
print $largearray[ii]; #CANNOT RETRIEVE ARRAY VALUES
print "\n";
$ii++;
#string1 = split( //, $line );
$line = $largearray[ii];
#print $largearray[ii];
#print "\n";
$ii++;
chomp $line;
#string2 = split( //, $line );
$n = #string1; #assigning a list to a scalar just assigns the
#number of elements in the list to the scalar.
$m = #string2;
$v = 1;
$Cm = 0;
$Im = 0;
$V[0][0] = 0; # Assign the 0,0 entry of the V matrix
for ( $i = 1; $i <= $n; $i++ ) { # Assign the column 0 values and print
# String 1 See section 5.2 of Johnson
# for loops
$V[$i][0] = -$Im * $i;
}
for ( $j = 1; $j <= $m; $j++ ) { # Assign the row 0 values and print String 2
$V[0][$j] = -$Im * $j;
}
for ( $i = 1; $i <= $n; $i++ ) { # follow the recurrences to fill in the V matrix.
for ( $j = 1; $j <= $m; $j++ ) {
# print OUT "$string1[$i-1], $string2[$j-1]\n"; # This is here for debugging purposes.
if ( $string1[ $i - 1 ] eq $string2[ $j - 1 ] ) {
$t = 1 * $v;
}
else {
$t = -1 * $Cm;
}
$max = $V[ $i - 1 ][ $j - 1 ] + $t;
# print OUT "For $i, $j, t is $t \n"; # Another debugging line.
if ( $max < $V[$i][ $j - 1 ] - 1 * $Im ) {
$max = $V[$i][ $j - 1 ] - 1 * $Im;
}
if ( $V[ $i - 1 ][$j] - 1 * $Im > $max ) {
$max = $V[ $i - 1 ][$j] - 1 * $Im;
}
$V[$i][$j] = $max;
}
} #outer for loop
print $V[$n][$m];
$mytotal += $V[$n][$m]; # append current result to the grand total
print "\n";
} # end while loop
print "the average LCS value for length ", $mylong, " strings is: ";
print $mytotal/ $numstrings;
This isn't a scoping issue. You have declared none of your variables, which has the effect of implicitly making them all global and accessible everywhere in your code
I reformatted your Perl program so that I could read it, and then added this to the top of your program
use strict;
use warnings 'all';
which are essential in every Perl program you write
Then I added
no strict 'vars';
which is a very bad idea, and lets you get away without declaring any variables
The result is this
Argument "ii" isn't numeric in array element at E:\Perl\source\dna.pl line 60.
Argument "ii" isn't numeric in array element at E:\Perl\source\dna.pl line 61.
Argument "ii" isn't numeric in array element at E:\Perl\source\dna.pl line 67.
Argument "j" isn't numeric in array element at E:\Perl\source\dna.pl line 42.
Bareword "ii" not allowed while "strict subs" in use at E:\Perl\source\dna.pl line 60.
Bareword "ii" not allowed while "strict subs" in use at E:\Perl\source\dna.pl line 61.
Bareword "ii" not allowed while "strict subs" in use at E:\Perl\source\dna.pl line 67.
Bareword "j" not allowed while "strict subs" in use at E:\Perl\source\dna.pl line 42.
Execution of E:\Perl\source\dna.pl aborted due to compilation errors.
Line 42 (of my reformatted version) is
$largearray[j] = $dnastring
and lines 60, 61 and 67 are
$line = $largearray[ii];
print $largearray[ii]; #CANNOT RETRIEVE ARRAY VALUES
and
$line = $largearray[ii];
You are using j and ii as array indexes. Those are Perl subroutine calls, not variables. Adding use strict would have stopped this from compiling unless you had also declared sub ii and sub j
You might get away with it if you just change j and ii to $j and $ii, but you are certain to get into further problems
Please make the same changes to your own code, and declare every variable that you need using my as close as possible to the first place they are used
You should also improve your variable naming. Things like #largearray are pointless: the # says that it's an array, and whether it's large or not is relative, and of little use in understanding your code. If you have no better description of its purpose then #table or #data are probably a little better
Likewise, please avoid capital letters and most single-letter names. #V, $Cm and $Im are meaningless, and you would need fewer comments if those names were better
You certainly wouldn't need comments like # end while loop and # outer for loop if you had indented your blocks properly and kept them short enough so that both the beginning and the end can be seen on the screen at the same time, and the fewer comments you can get away with the better, because they badly clutter the code structure
Finally, it's worth noting that the C-style for loop is rarely the best choice in Perl. Your
for ( $i = 1; $i <= $n; $i++ ) { ... }
is much clearer as
for my $i ( 1 .. $n ) { ... }
and declaring the control variable at that point makes it unnecessary to invent new names like $ii for each new loop
I think you have a typo in your code:
ii => must be $ii
don't forget to put this at the beginning of your code:
use strict;
use warnings;
in order to avoid this (and others) kind of errors

Variable scopes in coderefs if perl, need an explanation of strange behavior

Why are the values of $copy_of_i's returned by coderefs in the #coderefs the same?
use Modern::Perl;
my #coderefs = ();
for (my $i = 0; $i < 5; $i++){
push #coderefs, sub {
my $copy_of_i = $i;
return $copy_of_i;
};
}
say $coderefs[1]->();
say $coderefs[3]->();
I thought the $copy_of_i would be local for every coderef added to #coderefs and thus contain the current value of $i assigned to the $copy_of_i at the given iteration of the loop. But if we display the values of a couple of $copi_of_i's with 'say' we'll see that they have the same values as if the $copy_of_i wasn't local for every newly created coderef. Why?
You want to have different values associated with the closures, yet you only have the single variable $i for all the closures to capture. You need to create a variable for each closure to capture, so $copy_of_i should be created outside of the closure. Creating a copy when you call the closure is far too late; $i no longer contains the desired value at that point.
for (my $i = 0; $i < 5; $i++){
my $copy_of_i = $i;
push #coderefs, sub {
return $copy_of_i;
};
}
By the way, for my $i (0 .. 5) is preferred over for (my $i = 0; $i < 5; $i++), and it has the advantage of creating a new variable for each iteration of the loop, so you can simply use
my #coderefs;
for my $i (0 .. 4) {
push #coderefs, sub {
return $i;
};
}

For loop help in perl

I am writing perl script and I have little question regarding for loop limit.
Let say I have two arrays, arr1 has serial numbers and arr2 is two dimensional array, the first dimension is the serial number [same as arr1] and the second dimension is the contents of that serial number , Now I want to apply the for loop for this two dimension array but I am confused at the limit . Till now I have this code
Example : I have Three serial numbers , 1 ,2 ,3 . Serial 1 has 2 contents 1,5 . Serial 2 has 1 content i.e 1. Serial 3 has two contents 1,1.
#arr1 = (1,2,3)
$arr2[0][0] = 1
$arr2[0][1] = 5
$arr2[1][0] = 1
$arr2[2][1] = 1
$arr2[2][2] = 1
Note: As you can see the contents of arr2 has arr1 elements in 1st columns and the contents in the second columns.
for (my $i = 0; $i <= $#arr1; $i++) {
print( "The First Serial number has:" );
for (my $j = 0; $j <= $#arr2; $j++) {
print( "$arr2[$i][$j]\n" );
}
}
Thanks, Sorry for the bad explaination
Why don't do this like that :
#!/usr/bin/perl
use strict;
my #arr;
$arr[0][0] = 1;
$arr[0][1] = 5;
$arr[1][0] = 1;
$arr[2][1] = 1;
$arr[2][2] = 1;
my ($i, $j);
foreach $i (#arr) {
foreach $j (#{$i}) {
print $j."\n" if($j);
}
}
1;
__END__
Fixed code:
use strict;
use warnings;
my #arr1 = (1,2,3);
my #arr2;
$arr2[0][0] = 1;
$arr2[0][1] = 5;
$arr2[1][0] = 1;
$arr2[2][0] = 1; # original code had
$arr2[2][1] = 1; # these indexes wrong
for (my $i = 0; $i <= $#arr1; $i++) {
print( "Serial number $arr1[$i] has:" );
for (my $j = 0; $j <= $#{ $arr2[$i] }; $j++) {
print( "$arr2[$i][$j]\n" );
}
}
Note the use of $#{ arrayref }; see http://perlmonks.org/?node=References+quick+reference
you can put #arr2 like this and it would be much easier for you to understand #arr2
use strict;
use warnings;
my #arr1 = (1, 2, 3);
my #arr2 = ([1, 5], [1], [1, 1]);
for my $first(#arr1) {
for my $second (#{$arr2[$first-1]}) {
print $second."\n";
}
}
Here is a version without the first array.
for (my $i = 0; $i<= $#arr; $i++)
{
print "INDEX $i\n";
for (my $j = 0; $j <= $#{$arr[$i]}; $j++)
{
print "${arr[$i][$j]}\n";
}
}
The point here is that a two dimensional array is in fact an array of arrays (well actually array references, but that does not change anything here). So in the inner loop, you should check against the size of the array that is stored in $arr[$i].
Try this.
my #arr2;
$arr2[0][0] = 1;
$arr2[0][1] = 5;
$arr2[1][0] = 1;
$arr2[2][0] = 1;
$arr2[2][1] = 1;
foreach $inside_array (#arr2){
foreach $ele (#$inside_array){
print $ele,"\n";
}
}
Its always better to use foreach instead of for/while, this will eliminate any possibility of bugs. Especially with judging proper condition to exit the loop.

Concatenate scalar with array name

I am trying to concatenate a scalar with array name but not sure how to do.
Lets say we have two for loops (one nested inside other) like
for ($i = 0; $i <= 5; $i++) {
for ($k = 0; $k <=5; $k++) {
$array[$k] = $k;
}
}
I want to create 5 arrays with names like #array1, #array2, #array3 etc. The numeric at end of each array represents value of $i when array creation in progress.
Is there a way to do it?
Thanks
If you mean to create actual variables, for one thing, its a bad idea, and for another, there is no point. You can simply access a variable without creating or declaring it. Its a bad idea because it is what a hash does, exactly, and with none of the drawbacks.
my %hash;
$hash{array1} = [ 1, 2, 3 ];
There, now you have created an array. To access it, do:
print #{ $hash{array1} };
The hash keys (names) can be created dynamically, just like you want, so it is easy to create 5 different names and assign values to them.
for my $i (0 .. 5) {
push #{ $hash{"array$i"} }, "foo";
}
You need to add {} and "" to characters, when they are used as variable or array/hash name.
Try this:
for ($i = 0; $i <= 5; $i++){
for ($k = 0; $k <=5; $k++){
${"array$k"}[$k] = $k;
}
}
print "array5[4] = $array5[4]
array5[5] = $array5[5]\n";
array5[4] =
array5[5] = 5

How do I change this to "idiomatic" Perl?

I am beginning to delve deeper into Perl, but am having trouble writing "Perl-ly" code instead of writing C in Perl. How can I change the following code to use more Perl idioms, and how should I go about learning the idioms?
Just an explanation of what it is doing: This routine is part of a module that aligns DNA or amino acid sequences(using Needelman-Wunch if you care about such things). It creates two 2d arrays, one to store a score for each position in the two sequences, and one to keep track of the path so the highest-scoring alignment can be recreated later. It works fine, but I know I am not doing things very concisely and clearly.
edit: This was for an assignment. I completed it, but want to clean up my code a bit. The details on implementing the algorithm can be found on the class website if any of you are interested.
sub create_matrix {
my $self = shift;
#empty array reference
my $matrix = $self->{score_matrix};
#empty array ref
my $path_matrix = $self->{path_matrix};
#$seq1 and $seq2 are strings set previously
my $num_of_rows = length($self->{seq1}) + 1;
my $num_of_columns = length($self->{seq2}) + 1;
#create the 2d array of scores
for (my $i = 0; $i < $num_of_rows; $i++) {
push(#$matrix, []);
push(#$path_matrix, []);
$$matrix[$i][0] = $i * $self->{gap_cost};
$$path_matrix[$i][0] = 1;
}
#fill out the first row
for (my $i = 0; $i < $num_of_columns; $i++) {
$$matrix[0][$i] = $i * $self->{gap_cost};
$$path_matrix[0][$i] = -1;
}
#flag to signal end of traceback
$$path_matrix[0][0] = 2;
#double for loop to fill out each row
for (my $row = 1; $row < $num_of_rows; $row++) {
for (my $column = 1; $column < $num_of_columns; $column++) {
my $seq1_gap = $$matrix[$row-1][$column] + $self->{gap_cost};
my $seq2_gap = $$matrix[$row][$column-1] + $self->{gap_cost};
my $match_mismatch = $$matrix[$row-1][$column-1] + $self->get_match_score(substr($self->{seq1}, $row-1, 1), substr($self->{seq2}, $column-1, 1));
$$matrix[$row][$column] = max($seq1_gap, $seq2_gap, $match_mismatch);
#set the path matrix
#if it was a gap in seq1, -1, if was a (mis)match 0 if was a gap in seq2 1
if ($$matrix[$row][$column] == $seq1_gap) {
$$path_matrix[$row][$column] = -1;
}
elsif ($$matrix[$row][$column] == $match_mismatch) {
$$path_matrix[$row][$column] = 0;
}
elsif ($$matrix[$row][$column] == $seq2_gap) {
$$path_matrix[$row][$column] = 1;
}
}
}
}
You're getting several suggestions regarding syntax, but I would also suggest a more modular approach, if for no other reason that code readability. It's much easier to come up to speed on code if you can perceive the big picture before worrying about low-level details.
Your primary method might look like this.
sub create_matrix {
my $self = shift;
$self->create_2d_array_of_scores;
$self->fill_out_first_row;
$self->fill_out_other_rows;
}
And you would also have several smaller methods like this:
n_of_rows
n_of_cols
create_2d_array_of_scores
fill_out_first_row
fill_out_other_rows
And you might take it even further by defining even smaller methods -- getters, setters, and so forth. At that point, your middle-level methods like create_2d_array_of_scores would not directly touch the underlying data structure at all.
sub matrix { shift->{score_matrix} }
sub gap_cost { shift->{gap_cost} }
sub set_matrix_value {
my ($self, $r, $c, $val) = #_;
$self->matrix->[$r][$c] = $val;
}
# Etc.
One simple change is to use for loops like this:
for my $i (0 .. $num_of_rows){
# Do stuff.
}
For more info, see the Perl documentation on foreach loops and the range operator.
I have some other comments as well, but here is the first observation:
my $num_of_rows = length($self->{seq1}) + 1;
my $num_of_columns = length($self->{seq2}) + 1;
So $self->{seq1} and $self->{seq2} are strings and you keep accessing individual elements using substr. I would prefer to store them as arrays of characters:
$self->{seq1} = [ split //, $seq1 ];
Here is how I would have written it:
sub create_matrix {
my $self = shift;
my $matrix = $self->{score_matrix};
my $path_matrix = $self->{path_matrix};
my $rows = #{ $self->{seq1} };
my $cols = #{ $self->{seq2} };
for my $row (0 .. $rows) {
$matrix->[$row]->[0] = $row * $self->{gap_cost};
$path_matrix->[$row]->[0] = 1;
}
my $gap_cost = $self->{gap_cost};
$matrix->[0] = [ map { $_ * $gap_cost } 0 .. $cols ];
$path_matrix->[0] = [ (-1) x ($cols + 1) ];
$path_matrix->[0]->[0] = 2;
for my $row (1 .. $rows) {
for my $col (1 .. $cols) {
my $gap1 = $matrix->[$row - 1]->[$col] + $gap_cost;
my $gap2 = $matrix->[$row]->[$col - 1] + $gap_cost;
my $match_mismatch =
$matrix->[$row - 1]->[$col - 1] +
$self->get_match_score(
$self->{seq1}->[$row - 1],
$self->{seq2}->[$col - 1]
);
my $max = $matrix->[$row]->[$col] =
max($gap1, $gap2, $match_mismatch);
$path_matrix->[$row]->[$col] = $max == $gap1
? -1
: $max == $gap2
? 1
: 0;
}
}
}
Instead of dereferencing your two-dimensional arrays like this:
$$path_matrix[0][0] = 2;
do this:
$path_matrix->[0][0] = 2;
Also, you're doing a lot of if/then/else statements to match against particular subsequences: this could be better written as given statements (perl5.10's equivalent of C's switch). Read about it at perldoc perlsyn:
given ($matrix->[$row][$column])
{
when ($seq1_gap) { $path_matrix->[$row][$column] = -1; }
when ($match_mismatch) { $path_matrix->[$row][$column] = 0; }
when ($seq2_gap) { $path_matrix->[$row][$column] = 1; }
}
The majority of your code is manipulating 2D arrays. I think the biggest improvement would be switching to using PDL if you want to do much stuff with arrays, particularly if efficiency is a concern. It's a Perl module which provides excellent array support. The underlying routines are implemented in C for efficiency so it's fast too.
I would always advise to look at CPAN for previous solutions or examples of how to do things in Perl. Have you looked at Algorithm::NeedlemanWunsch?
The documentation to this module includes an example for matching DNA sequences. Here is an example using the similarity matrix from wikipedia.
#!/usr/bin/perl -w
use strict;
use warnings;
use Inline::Files; #multiple virtual files inside code
use Algorithm::NeedlemanWunsch; # refer CPAN - good style guide
# Read DNA sequences
my #a = read_DNA_seq("DNA_SEQ_A");
my #b = read_DNA_seq("DNA_SEQ_B");
# Read Similarity Matrix (held as a Hash of Hashes)
my %SM = read_Sim_Matrix();
# Define scoring based on "Similarity Matrix" %SM
sub score_sub {
if ( !#_ ) {
return -3; # gap penalty same as wikipedia)
}
return $SM{ $_[0] }{ $_[1] }; # Similarity Value matrix
}
my $matcher = Algorithm::NeedlemanWunsch->new( \&score_sub, -3 );
my $score = $matcher->align( \#a, \#b, { align => \&check_align, } );
print "\nThe maximum score is $score\n";
sub check_align {
my ( $i, $j ) = #_; # #a[i], #b[j]
print "seqA pos: $i, seqB pos: $j\t base \'$a[$i]\'\n";
}
sub read_DNA_seq {
my $source = shift;
my #data;
while (<$source>) {
push #data, /[ACGT-]{1}/g;
}
return #data;
}
sub read_Sim_Matrix {
#Read DNA similarity matrix (scores per Wikipedia)
my ( #AoA, %HoH );
while (<SIMILARITY_MATRIX>) {
push #AoA, [/(\S+)+/g];
}
for ( my $row = 1 ; $row < 5 ; $row++ ) {
for ( my $col = 1 ; $col < 5 ; $col++ ) {
$HoH{ $AoA[0][$col] }{ $AoA[$row][0] } = $AoA[$row][$col];
}
}
return %HoH;
}
__DNA_SEQ_A__
A T G T A G T G T A T A G T
A C A T G C A
__DNA_SEQ_B__
A T G T A G T A C A T G C A
__SIMILARITY_MATRIX__
- A G C T
A 10 -1 -3 -4
G -1 7 -5 -3
C -3 -5 9 0
T -4 -3 0 8
And here is some sample output:
seqA pos: 7, seqB pos: 2 base 'G'
seqA pos: 6, seqB pos: 1 base 'T'
seqA pos: 4, seqB pos: 0 base 'A'
The maximum score is 100