Perl concatenate name of an array with an existing value - perl

The question is like this:
I have a loop. And while I iterate this loop I want to create a number of arrays with the following names: array1 array2 array3...
I am wondering if there is a way to concatenate these names in perl
I tried something like this but I get an error
$i = 0;
while ($i <= 5) {
@array . $i = ();
$i++;
}

Yes, you can do this, but no, you should not do this.
What you should do instead is use an array of references to anonymous arrays:
@arrayrefs = ();
$i = 0;
while ($i <= 5) {
$arrayrefs[$i] = [];
$i++;
}
or, more tersely:
@arrayrefs = ([], [], [], [], [], []);
But for completeness' sake . . . you can do this, by using "symbolic references":
$i = 0;
while ($i <= 5) {
my $name = "array$i";
@$name = ();
$i++;
}
(of course, arrays default to the empty array anyway, so this isn't really needed . . .).
By the way, note that it's actually customary to use a for loop rather than a while loop for such simple cases. Either this:
for ($i = 0; $i <= 5; $i++) {
...
}
or this:
for $i (0 .. 5) {
...
}
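As a minimal sketch of how the array-of-references approach is used once the arrays exist (the fill values and the names below are made up for illustration, not taken from the question):

```perl
use strict;
use warnings;

# Six buckets, indexed 0..5, standing in for @array0 .. @array5.
my @arrayrefs = map { [] } 0 .. 5;

# Fill bucket $i with the numbers 0..$i (hypothetical data).
for my $i (0 .. $#arrayrefs) {
    push @{ $arrayrefs[$i] }, 0 .. $i;
}

# Read back a whole bucket, then a single element.
print "bucket 3: @{ $arrayrefs[3] }\n";   # bucket 3: 0 1 2 3
print "element:  $arrayrefs[3][2]\n";     # element:  2
```

The index plays the role that the numeric suffix would have played in the variable name, and the code keeps working under use strict.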

You want to use hash,
use strict;
use warnings;
my %hash;
for my $i (1 .. 5) {
$hash{ "array$i" } = [];
}
Long story short: Why it's stupid to use a variable as a variable name

Related

Can I use the size of an array without having to place it in a variable

Currently I am doing this to read the individual contents of an array
my $size = @words;
for(my $x = 0; $x < $size, $x++)
{
print $words[$x];
}
Is there a way to skip the $size assignment? A way to cast the array and have one less line?
i.e.
for(my $x = 0; $x < $(@word), $x++)
{
print $words[$x];
}
Can't seem to find the right syntax.
Thanks
Replace
for (my $i = 0; $i < $(@words), $i++) { ... $words[$i] ... }
with
for (my $i = 0; $i < @words; $i++) { ... $words[$i] ... }
Just like in your assignment, an array evaluated in scalar context produces its size.
That said, using a C-style loop is complex and wasteful.
A better solution if you need the index:
for my $i (0..$#words) { ... $words[$i] ... }
A better solution if you don't need the index:
for my $word (@words) { ... $word ... }
Yes, for has a built in array iterator and for and foreach are synonyms.
for my $word (@words) {
print $word;
}
This is the preferred way to iterate through arrays in Perl. C style 3 statement for-loops are discouraged unless necessary. They're harder to read and lead to bugs, like this one.
for(my $x = 0; $x < $size, $x++)
                         ^
                         should be a ;
Better to use foreach, but to your specific question, @foo in scalar context resolves to the length of the array, and $#foo resolves to the index of the last element:
foreach my $word (@words) { ... } # preferred
for(my $i = 0; $i < @words; ++$i) { my $word = $words[$i]; ... } # ok sometimes
for(my $i = 0; $i <= $#words; ++$i) { my $word = $words[$i]; ... } # same thing
(assuming that you haven't played with $[, which you shouldn't do.)
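A small sketch of the two forms mentioned above, using made-up sample data, may make the difference between the element count and the last index concrete:

```perl
use strict;
use warnings;

my @words = qw(alpha beta gamma);   # hypothetical sample data

my $count = @words;          # scalar assignment: number of elements (3)
my $last  = $#words;         # index of the last element (2)
my $also  = scalar(@words);  # explicit scalar context, handy inside print

print "count=$count last=$last also=$also\n";   # count=3 last=2 also=3

# A numeric comparison also puts the array in scalar context.
print "more than two words\n" if @words > 2;
```

scalar() is only needed where the surrounding code would otherwise supply list context, such as the argument list of print.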
The syntax that you are searching for is actually no syntax at all. If you use an array variable anywhere where Perl knows you should be using a scalar value (like as an operand to a comparison operator) then Perl gives you the number of elements in the array.
So, based on your example, this will work:
# Note: I've corrected a syntax error here.
# I replaced a comma with a semicolon
for (my $x = 0; $x < @words; $x++)
{
print $words[$x];
}
But there are several ways that we can improve this. Firstly, let's get rid of the ugly and potentially confusing C-style for loop and replace it with a far easier to understand foreach.
foreach my $x (0 .. @words - 1)
{
print $words[$x];
}
We can also improve on that @words - 1. Instead, we can use $#words which gives the final index in the array @words.
foreach my $x (0 .. $#words)
{
print $words[$x];
}
Finally, we don't really need the index number here as we're just using it to access each element of the array in turn. Far better to iterate over the elements of the array rather than the indexes.
foreach my $element (@words)
{
print $element;
}

Perl: scope and allocation of a named variable

Here is a little sample.
my %X = ();
for (my $i = 0; $i < 5; $i ++)
{
$X {$i} = [$i .. 4]; # the assignment: reference to an unnamed array
}
# this is just for output - you can ignore it
foreach (sort keys %X)
{
print "\n" . $_ . " = ";
foreach (@{$X {$_}})
{
print $_;
}
}
The output is like expected.
0 = 01234
1 = 1234
2 = 234
3 = 34
4 = 4
If I use a local variable for the assignment, it produces the same output, which is fine.
The memory for the list is always newly allocated and not overwritten, because @l is a new variable on every pass. There is still a reference to it in %X, so it cannot be released (or however memory management works in Perl; I don't know).
for (my $i = 0; $i < 5; $i ++)
{
my @l = ($i .. 4); # inside
$X {$i} = \@l;
}
But can I produce the same output from above with using an outside variable?
Is that possible with some allocation trick - like to give it a new memory but not garbage the old one?
my %X = ();
my @l; # outside
for (my $i = 0; $i < 5; $i ++)
{
@l = ($i .. 4);
$X {$i} = \@l;
}
All hash elements now have the content of the last loop iteration.
0 = 4
1 = 4
2 = 4
3 = 4
4 = 4
Is it possible to get the output from the beginning with the outer variable?
No, it's not possible for each value of %X to be a reference to a different array, while at the same time all being a reference to the same array.
If you want each value of %X to be a reference to the same array, go ahead and allocate a single array outside of the loop.
If you want each value of %X to be a reference to a different array, you'll need to allocate a new array for each pass through the loop. This can be a named one (created using my), or an anonymous one (created using [ ]).
If you simply wanted to use the values within the outside @l so that every referenced array initially has the same value, you could use
my @a = @l;
$X{$i} = \@a;
or
$X{$i} = [ @l ];
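To make the difference visible, here is a minimal sketch that stores both a reference to the outer array and an anonymous copy of it on each pass. The hash names %shared and %copied are made up for this illustration:

```perl
use strict;
use warnings;

my (%shared, %copied);
my @l;   # declared once, outside the loop

for my $i (0 .. 4) {
    @l = ($i .. 4);
    $shared{$i} = \@l;      # every value points at the same array
    $copied{$i} = [ @l ];   # each value gets its own anonymous copy
}

print "shared 0: @{ $shared{0} }\n";   # shared 0: 4
print "copied 0: @{ $copied{0} }\n";   # copied 0: 0 1 2 3 4
```

The copy is taken at the moment [ @l ] is evaluated, so later assignments to @l do not affect it.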

For loop help in perl

I am writing a Perl script and I have a small question regarding the limit of a for loop.
Let's say I have two arrays: arr1 holds serial numbers, and arr2 is a two-dimensional array whose first dimension is the serial number (same as arr1) and whose second dimension is the contents of that serial number. I want to loop over this two-dimensional array, but I am confused about the loop limit. So far I have the code below.
Example: I have three serial numbers: 1, 2, 3. Serial 1 has two contents, 1 and 5. Serial 2 has one content, 1. Serial 3 has two contents, 1 and 1.
@arr1 = (1,2,3)
$arr2[0][0] = 1
$arr2[0][1] = 5
$arr2[1][0] = 1
$arr2[2][1] = 1
$arr2[2][2] = 1
Note: As you can see the contents of arr2 has arr1 elements in 1st columns and the contents in the second columns.
for (my $i = 0; $i <= $#arr1; $i++) {
print( "The First Serial number has:" );
for (my $j = 0; $j <= $#arr2; $j++) {
print( "$arr2[$i][$j]\n" );
}
}
Thanks, and sorry for the bad explanation.
Why not do it like this:
#!/usr/bin/perl
use strict;
my #arr;
$arr[0][0] = 1;
$arr[0][1] = 5;
$arr[1][0] = 1;
$arr[2][1] = 1;
$arr[2][2] = 1;
my ($i, $j);
foreach $i (@arr) {
foreach $j (@{$i}) {
print $j."\n" if($j);
}
}
1;
__END__
Fixed code:
use strict;
use warnings;
my #arr1 = (1,2,3);
my #arr2;
$arr2[0][0] = 1;
$arr2[0][1] = 5;
$arr2[1][0] = 1;
$arr2[2][0] = 1; # original code had
$arr2[2][1] = 1; # these indexes wrong
for (my $i = 0; $i <= $#arr1; $i++) {
print( "Serial number $arr1[$i] has:" );
for (my $j = 0; $j <= $#{ $arr2[$i] }; $j++) {
print( "$arr2[$i][$j]\n" );
}
}
Note the use of $#{ arrayref }; see http://perlmonks.org/?node=References+quick+reference
You can initialize @arr2 like this, which makes its structure much easier to see:
use strict;
use warnings;
my #arr1 = (1, 2, 3);
my @arr2 = ([1, 5], [1], [1, 1]);
for my $first (@arr1) {
for my $second (@{$arr2[$first-1]}) {
print $second."\n";
}
}
Here is a version without the first array.
for (my $i = 0; $i<= $#arr; $i++)
{
print "INDEX $i\n";
for (my $j = 0; $j <= $#{$arr[$i]}; $j++)
{
print "${arr[$i][$j]}\n";
}
}
The point here is that a two dimensional array is in fact an array of arrays (well actually array references, but that does not change anything here). So in the inner loop, you should check against the size of the array that is stored in $arr[$i].
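A minimal sketch of that point, using the same three-row data as the other answers: each row is its own array reference, so each row reports its own size.

```perl
use strict;
use warnings;

my @arr = ([1, 5], [1], [1, 1]);   # rows of different lengths

for my $i (0 .. $#arr) {
    my $row  = $arr[$i];    # an array reference
    my $size = @{$row};     # number of elements in this row only
    print "row $i has $size element(s): @{$row}\n";
}
```

Using $#arr as the inner limit would instead apply the number of rows to every row, which is the mistake in the question's inner loop.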
Try this.
my #arr2;
$arr2[0][0] = 1;
$arr2[0][1] = 5;
$arr2[1][0] = 1;
$arr2[2][0] = 1;
$arr2[2][1] = 1;
foreach $inside_array (@arr2){
foreach $ele (@$inside_array){
print $ele,"\n";
}
}
It's usually better to use foreach instead of a C-style for or while loop, because it removes a common source of bugs: getting the loop's exit condition wrong.

Concatenate scalar with array name

I am trying to concatenate a scalar with array name but not sure how to do.
Lets say we have two for loops (one nested inside other) like
for ($i = 0; $i <= 5; $i++) {
for ($k = 0; $k <=5; $k++) {
$array[$k] = $k;
}
}
I want to create 5 arrays with names like @array1, @array2, @array3 etc. The number at the end of each array name represents the value of $i when the array is created.
Is there a way to do it?
Thanks
If you mean to create actual variables, for one thing, it's a bad idea, and for another, there is no point. You can simply access a variable without creating or declaring it. It's a bad idea because a hash does exactly the same job, with none of the drawbacks.
my %hash;
$hash{array1} = [ 1, 2, 3 ];
There, now you have created an array. To access it, do:
print @{ $hash{array1} };
The hash keys (names) can be created dynamically, just like you want, so it is easy to create 5 different names and assign values to them.
for my $i (0 .. 5) {
push @{ $hash{"array$i"} }, "foo";
}
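As a rough sketch of how such a hash of arrays might be filled and read back, with hypothetical values (multiples of the index) chosen only for illustration:

```perl
use strict;
use warnings;

my %hash;
for my $i (1 .. 3) {
    # Autovivification creates the inner array on the first push.
    push @{ $hash{"array$i"} }, map { $i * $_ } 1 .. 3;
}

# Walk the dynamically named entries in sorted key order.
for my $name (sort keys %hash) {
    print "$name: @{ $hash{$name} }\n";
}
# array1: 1 2 3
# array2: 2 4 6
# array3: 3 6 9
```

Because the names are ordinary hash keys, they can be listed, sorted, and deleted, which is not practical with variables created by name.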
To build a variable, array, or hash name from a string, wrap the string in {} and double quotes. This is a symbolic reference, so it only works when use strict 'refs' is not in effect.
Try this:
for ($i = 0; $i <= 5; $i++){
for ($k = 0; $k <=5; $k++){
${"array$k"}[$k] = $k;
}
}
print "array5[4] = $array5[4]
array5[5] = $array5[5]\n";
array5[4] =
array5[5] = 5

How do I change this to "idiomatic" Perl?

I am beginning to delve deeper into Perl, but am having trouble writing "Perl-ly" code instead of writing C in Perl. How can I change the following code to use more Perl idioms, and how should I go about learning the idioms?
Just an explanation of what it is doing: This routine is part of a module that aligns DNA or amino acid sequences (using Needleman-Wunsch, if you care about such things). It creates two 2D arrays, one to store a score for each position in the two sequences, and one to keep track of the path so the highest-scoring alignment can be recreated later. It works fine, but I know I am not doing things very concisely and clearly.
edit: This was for an assignment. I completed it, but want to clean up my code a bit. The details on implementing the algorithm can be found on the class website if any of you are interested.
sub create_matrix {
my $self = shift;
#empty array reference
my $matrix = $self->{score_matrix};
#empty array ref
my $path_matrix = $self->{path_matrix};
#$seq1 and $seq2 are strings set previously
my $num_of_rows = length($self->{seq1}) + 1;
my $num_of_columns = length($self->{seq2}) + 1;
#create the 2d array of scores
for (my $i = 0; $i < $num_of_rows; $i++) {
push(@$matrix, []);
push(@$path_matrix, []);
$$matrix[$i][0] = $i * $self->{gap_cost};
$$path_matrix[$i][0] = 1;
}
#fill out the first row
for (my $i = 0; $i < $num_of_columns; $i++) {
$$matrix[0][$i] = $i * $self->{gap_cost};
$$path_matrix[0][$i] = -1;
}
#flag to signal end of traceback
$$path_matrix[0][0] = 2;
#double for loop to fill out each row
for (my $row = 1; $row < $num_of_rows; $row++) {
for (my $column = 1; $column < $num_of_columns; $column++) {
my $seq1_gap = $$matrix[$row-1][$column] + $self->{gap_cost};
my $seq2_gap = $$matrix[$row][$column-1] + $self->{gap_cost};
my $match_mismatch = $$matrix[$row-1][$column-1] + $self->get_match_score(substr($self->{seq1}, $row-1, 1), substr($self->{seq2}, $column-1, 1));
$$matrix[$row][$column] = max($seq1_gap, $seq2_gap, $match_mismatch);
#set the path matrix
#if it was a gap in seq1, -1, if was a (mis)match 0 if was a gap in seq2 1
if ($$matrix[$row][$column] == $seq1_gap) {
$$path_matrix[$row][$column] = -1;
}
elsif ($$matrix[$row][$column] == $match_mismatch) {
$$path_matrix[$row][$column] = 0;
}
elsif ($$matrix[$row][$column] == $seq2_gap) {
$$path_matrix[$row][$column] = 1;
}
}
}
}
You're getting several suggestions regarding syntax, but I would also suggest a more modular approach, if for no other reason than code readability. It's much easier to come up to speed on code if you can perceive the big picture before worrying about low-level details.
Your primary method might look like this.
sub create_matrix {
my $self = shift;
$self->create_2d_array_of_scores;
$self->fill_out_first_row;
$self->fill_out_other_rows;
}
And you would also have several smaller methods like this:
n_of_rows
n_of_cols
create_2d_array_of_scores
fill_out_first_row
fill_out_other_rows
And you might take it even further by defining even smaller methods -- getters, setters, and so forth. At that point, your middle-level methods like create_2d_array_of_scores would not directly touch the underlying data structure at all.
sub matrix { shift->{score_matrix} }
sub gap_cost { shift->{gap_cost} }
sub set_matrix_value {
my ($self, $r, $c, $val) = @_;
$self->matrix->[$r][$c] = $val;
}
# Etc.
One simple change is to use for loops like this:
for my $i (0 .. $num_of_rows - 1){
# Do stuff.
}
For more info, see the Perl documentation on foreach loops and the range operator.
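One thing to watch when converting is that the range operator includes both endpoints, whereas the C-style loops in the question stop before $num_of_rows. A minimal sketch with a hypothetical row count:

```perl
use strict;
use warnings;

my $num_of_rows = 4;   # hypothetical value

# C-style loop with a strict "<" limit, as in the question.
my @c_style;
for (my $i = 0; $i < $num_of_rows; $i++) { push @c_style, $i }

# The range is inclusive on both ends, hence the "- 1".
my @range = (0 .. $num_of_rows - 1);

print "c-style: @c_style\n";   # c-style: 0 1 2 3
print "range:   @range\n";     # range:   0 1 2 3
```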
I have some other comments as well, but here is the first observation:
my $num_of_rows = length($self->{seq1}) + 1;
my $num_of_columns = length($self->{seq2}) + 1;
So $self->{seq1} and $self->{seq2} are strings and you keep accessing individual elements using substr. I would prefer to store them as arrays of characters:
$self->{seq1} = [ split //, $seq1 ];
Here is how I would have written it:
sub create_matrix {
my $self = shift;
my $matrix = $self->{score_matrix};
my $path_matrix = $self->{path_matrix};
my $rows = @{ $self->{seq1} };
my $cols = @{ $self->{seq2} };
for my $row (0 .. $rows) {
$matrix->[$row]->[0] = $row * $self->{gap_cost};
$path_matrix->[$row]->[0] = 1;
}
my $gap_cost = $self->{gap_cost};
$matrix->[0] = [ map { $_ * $gap_cost } 0 .. $cols ];
$path_matrix->[0] = [ (-1) x ($cols + 1) ];
$path_matrix->[0]->[0] = 2;
for my $row (1 .. $rows) {
for my $col (1 .. $cols) {
my $gap1 = $matrix->[$row - 1]->[$col] + $gap_cost;
my $gap2 = $matrix->[$row]->[$col - 1] + $gap_cost;
my $match_mismatch =
$matrix->[$row - 1]->[$col - 1] +
$self->get_match_score(
$self->{seq1}->[$row - 1],
$self->{seq2}->[$col - 1]
);
my $max = $matrix->[$row]->[$col] =
max($gap1, $gap2, $match_mismatch);
$path_matrix->[$row]->[$col] = $max == $gap1
? -1
: $max == $gap2
? 1
: 0;
}
}
}
Instead of dereferencing your two-dimensional arrays like this:
$$path_matrix[0][0] = 2;
do this:
$path_matrix->[0][0] = 2;
Also, you're using a chain of if/elsif statements to match against particular values: this could instead be written with given/when (Perl 5.10's equivalent of C's switch; note that it has been marked experimental since Perl 5.18). Read about it at perldoc perlsyn:
given ($matrix->[$row][$column])
{
when ($seq1_gap) { $path_matrix->[$row][$column] = -1; }
when ($match_mismatch) { $path_matrix->[$row][$column] = 0; }
when ($seq2_gap) { $path_matrix->[$row][$column] = 1; }
}
The majority of your code is manipulating 2D arrays. I think the biggest improvement would be switching to using PDL if you want to do much stuff with arrays, particularly if efficiency is a concern. It's a Perl module which provides excellent array support. The underlying routines are implemented in C for efficiency so it's fast too.
I would always advise to look at CPAN for previous solutions or examples of how to do things in Perl. Have you looked at Algorithm::NeedlemanWunsch?
The documentation to this module includes an example for matching DNA sequences. Here is an example using the similarity matrix from wikipedia.
#!/usr/bin/perl -w
use strict;
use warnings;
use Inline::Files; #multiple virtual files inside code
use Algorithm::NeedlemanWunsch; # refer CPAN - good style guide
# Read DNA sequences
my @a = read_DNA_seq("DNA_SEQ_A");
my @b = read_DNA_seq("DNA_SEQ_B");
# Read Similarity Matrix (held as a Hash of Hashes)
my %SM = read_Sim_Matrix();
# Define scoring based on "Similarity Matrix" %SM
sub score_sub {
if ( !@_ ) {
return -3; # gap penalty (same as Wikipedia)
}
return $SM{ $_[0] }{ $_[1] }; # Similarity Value matrix
}
my $matcher = Algorithm::NeedlemanWunsch->new( \&score_sub, -3 );
my $score = $matcher->align( \@a, \@b, { align => \&check_align, } );
print "\nThe maximum score is $score\n";
sub check_align {
my ( $i, $j ) = @_; # @a[i], @b[j]
print "seqA pos: $i, seqB pos: $j\t base \'$a[$i]\'\n";
}
sub read_DNA_seq {
my $source = shift;
my @data;
while (<$source>) {
push @data, /[ACGT-]{1}/g;
}
return @data;
}
sub read_Sim_Matrix {
#Read DNA similarity matrix (scores per Wikipedia)
my ( @AoA, %HoH );
while (<SIMILARITY_MATRIX>) {
push @AoA, [/(\S+)+/g];
}
for ( my $row = 1 ; $row < 5 ; $row++ ) {
for ( my $col = 1 ; $col < 5 ; $col++ ) {
$HoH{ $AoA[0][$col] }{ $AoA[$row][0] } = $AoA[$row][$col];
}
}
return %HoH;
}
__DNA_SEQ_A__
A T G T A G T G T A T A G T
A C A T G C A
__DNA_SEQ_B__
A T G T A G T A C A T G C A
__SIMILARITY_MATRIX__
- A G C T
A 10 -1 -3 -4
G -1 7 -5 -3
C -3 -5 9 0
T -4 -3 0 8
And here is some sample output:
seqA pos: 7, seqB pos: 2 base 'G'
seqA pos: 6, seqB pos: 1 base 'T'
seqA pos: 4, seqB pos: 0 base 'A'
The maximum score is 100