Related
When looping through two arrays, I am confused about how to move the pointer through one loop but keep it constant in another. So for example:
Array 1: A T C G T C G A G C G
Array 2: A C G T C C T G T C G
So A in the first array matches A in the second array so we move on to next elements. But since the T doesn't match the C in the 2nd index, I want the program to compare that T to the next G in array 2 and so on until it finds the matching T.
my ($array1ref, $array2ref) = #_;
my #array1 = #$array1ref;
my #array2= #$array2ref;
my $count = 0;
foreach my $element (#array1) {
foreach my $element2 (#array2) {
if ($element eq $element2) {
$count++;
}else { ???????????
}
You could use a while loop to search for matches. If you find a match, advance in both arrays. If you don't, advance the second array. At the end you could print the remaining unmatched characters from the first array:
# [1, 2, 3] is a reference to an anonymous array (1, 2, 3)
# qw(1, 2, 3) is shorthand quoted-word for ('1', '2', '3')
my $arr1 = [qw(A T C G T C G A G C G)];
my $arr2 = [qw(A C G T C C T G T C G)];
my $idx1 = 0;
my $idx2 = 0;
# Find matched characters
# #$arr_ref is the size of the array referenced by $arr_ref
while ($idx1 < #$arr1 && $idx2 < #$arr2) {
my $char1 = $arr1->[$idx1];
my $char2 = $arr2->[$idx2];
if ($char1 eq $char2) {
# Matched character, advance arr1 and arr2
printf("%s %s -- arr1[%d] matches arr2[%d]\n", $char1, $char2, $idx1, $idx2);
++$idx1;
++$idx2;
} else {
# Unmatched character, advance arr2
printf(". %s -- skipping arr2[%d]\n", $char2, $idx2);
++$idx2;
}
}
# Remaining unmatched characters
while ($idx1 < #$arr1) {
my $char1 = $arr1->[$idx1];
printf("%s . -- arr1[%d] is beyond the end of arr2\n", $char1, $idx1);
$idx1++;
}
The script prints:
A A -- arr1[0] matches arr2[0]
. C -- skipping arr2[1]
. G -- skipping arr2[2]
T T -- arr1[1] matches arr2[3]
C C -- arr1[2] matches arr2[4]
. C -- skipping arr2[5]
. T -- skipping arr2[6]
G G -- arr1[3] matches arr2[7]
T T -- arr1[4] matches arr2[8]
C C -- arr1[5] matches arr2[9]
G G -- arr1[6] matches arr2[10]
A . -- arr1[7] is beyond the end of arr2
G . -- arr1[8] is beyond the end of arr2
C . -- arr1[9] is beyond the end of arr2
G . -- arr1[10] is beyond the end of arr2
Nested loops makes no sense. You don't want to loop over anything more than once.
You didn't specify what you wanted to happen after you resync, so you'll want to start with the following and tailor it to your needs.
my ($array1, $array2) = #_;
my $idx1 = 0;
my $idx2 = 0;
while ($idx1 < #$array1 && $idx2 < #$array2) {
if ($array1->[$idx1] eq $array2->[$idx2]) {
++$idx1;
++$idx2;
} else {
++$idx2;
}
}
...
As is, the above snippet will leave $idx1 at the last index it couldn't (eventually) resync. If instead you want to stop as soon as you first resync, you want
my ($array1, $array2) = #_;
my $idx1 = 0;
my $idx2 = 0;
my $mismatch = 0;
while ($idx1 < #$array1 && $idx2 < #$array2) {
if ($array1->[$idx1] eq $array2->[$idx2]) {
last if $mismatched;
++$idx1;
++$idx2;
} else {
++$mismatched;
++$idx2;
}
}
...
The foreach loops won't cut it: We'll either want to loop while there are available elements in both arrays, or iterate through all indices, which we can increment as we like:
EL1: while (defined(my $el1 = shift #array1) and #array2) {
EL2: while(defined(my $el2 = shift #array2)) {
++$count and next EL1 if $el1 eq $el2; # break out of inner loop
}
}
or
my $j = 0; # index of #array2
for (my $i = 0; $i <= $#array1; $i++) {
$j++ until $j > $#array or $array1[$i] eq $array2[$j];
last if $j > $#array;
$count++;
}
or any combination.
This is to complex a condition for for loops use while loops instead
my ($array1ref, $array2ref) = #_;
my #array1 = #$array1ref;
my #array2= #$array2ref;
my $count = 0;
my ($index, $index2) = (0,0);
#loop while indexs are in arrays
while($index <= ##array1 && $index2 <= ##array2) {
if($array1[$index] eq $array2[$index2]) {
$index++;
$index2++;
} else {
#increment index until we find a match
$index2++ until $array1[$index] eq $array2[$index2];
}
}
Here is one possibility. It will use indexes to go through both lists.
my #array1 = qw(A T C G T C G A G C G);
my #array2 = qw(A C G T C C T G T C G);
my $count = 0;
my $idx1 = 0;
my $idx2 = 0;
while(($idx1 < scalar #array1) && ($idx2 < scalar #array2)) {
if($array1[$idx1] eq $array2[$idx2]) {
print "Match of $array1[$idx1] array1 \# $idx1 and array2 \# $idx2\n";
$idx1++;
$idx2++;
$count++;
} else {
$idx2++;
}
}
print "Count = $count\n";
It seems like you could do this pretty easily with a 'grep', if you're guaranteed that array2 is always as long as or longer than array1. Something like this:
sub align
{
my ($array1, $array2) = #_;
my $index = 0;
return grep
{
$array1->[$index] eq $array2->[$_] ? ++$index : 0
} 0 .. scalar( #$array2 ) - 1;
}
Basically, the grep is saying "Return me the list of increasing indices into array2 that match contiguous elements from array1."
If you run the above with this test code, you can see it returns the expected alignment array:
my #array1 = qw(A T C G T C G A G C G);
my #array2 = qw(A C G T C C T G T C G);
say join ",", align \#array1, \#array2;
This outputs the expected mapping:
0,3,4,7,8,9,10. That list means that #array1[0 .. 6] correspond to #array2[0,3,4,7,8,9,10].
(Note: You need to use Modern::Perl or similar to use say.)
Now, you haven't really said what you need the output of the operation to be. I've assumed you wanted this mapping array. If you just need a count of the number of elements skipped in #array2 while aligning it with #array1, you can still use the grep above, but instead of the list, just return scalar(#$array2) - $index at the end.
As you may know, your problem is called Sequence Alignment. There are well-developed algorithms to do this efficiently, and one such module Algorithm::NeedlemanWunsch is available on CPAN. Here's how you might apply it to your problem.
#!/usr/bin/perl
use Algorithm::NeedlemanWunsch;
my $arr1 = [qw(A T C G T C G A G C G)];
my $arr2 = [qw(A C G T C C T G T C G)];
my $matcher = Algorithm::NeedlemanWunsch->new(sub {#_==0 ? -1 : $_[0] eq $_[1] ? 1 : -2});
my (#align1, #align2);
my $result = $matcher->align($arr1, $arr2,
{
align => sub {unshift #align1, $arr1->[shift]; unshift #align2, $arr2->[shift]},
shift_a => sub {unshift #align1, $arr1->[shift]; unshift #align2, '.'},
shift_b => sub {unshift #align1, '.'; unshift #align2, $arr1->[shift]},
});
print join("", #align1), "\n";
print join("", #align2), "\n";
That prints out an optimal solution in terms of the costs we specified in the constructor:
ATCGT.C.GAGCG
A.CGTTCGG.TCG
A very different method from the one in your original question, but I think it's worth knowing about.
The program below should take an array and compress it so that there are no repeated products and add up the totals, so:
A B B C D A E F
100 30 50 60 100 50 20 90
Becomes:
A 150
B 80
C 60
D 100
E 20
F 90
The code below runs and works the way I want it to:
#! C:\strawberry\perl\bin
use strict;
use warnings;
my #firstarray = qw(A B B C D A E F);
my #secondarray = qw (100 30 50 60 100 50 20 90);
my #totalarray;
my %cleanarray;
my $i;
# creates the 2d array which holds variables retrieved from a file
#totalarray = ([#firstarray],[#secondarray]);
my $count = $#{$totalarray[0]};
# prints the array for error checking
for ($i = 0; $i <= $count; $i++) {
print "\n $i) $totalarray[0][$i]\t $totalarray[1][$i]\n";
}
# fills a hash with products (key) and their related totals (value)
for ($i = 0; $i <= $count; $i++) {
$cleanarray{ $totalarray[0][$i] } = $cleanarray{$totalarray[0][$i]} + $totalarray[1][$i];
}
# prints the hash
my $x = 1;
while (my( $k, $v )= each %cleanarray) {
print "$x) Product: $k Cost: $cleanarray{$k} \n";
$x++;
}
However before printing the hash it gives me the "Use of uninitialized value in addition (+)" error" six times. Being very new to Perl (this is my first Perl program outside of a text book), can someone tell me why this is happening? It seems like I have initialized everything...
It gives me compile errors in these lines:
my #cleanarray;
It is a hash.
my %cleanarray;
And here:
$cleanarray{ $totalarray[0][$i] } = $cleanarray{$totalarray[0][$i]} + totalarray[1][$i];
You missed the sigil of totalarray. It is $totalarray[1][$i]
The undefined message it is because $cleanarray{$totalarray[0][$i]} doesn't exists. Using the shorter:
$cleanarray{ $totalarray[0][$i] } += totalarray[1][$i];
will work without warnings.
You are using cleanarray as a hash but it is declared as an array
You may find you prefer this reorganization of your program.
use strict;
use warnings;
my #firstarray = qw (A B B C D A E F);
my #secondarray = qw (100 30 50 60 100 50 20 90);
# prints the data for error checking
for my $i (0 .. $#firstarray) {
printf "%d) %s %.2f\n", $i, $firstarray[$i], $secondarray[$i];
}
print "\n";
# fills a hash with products (key) and their related totals (value)
my %cleanarray;
for my $i (0 .. $#firstarray) {
$cleanarray{ $firstarray[$i] } += $secondarray[$i];
}
# prints the hash
my $n = 1;
for my $key (sort keys %cleanarray) {
printf "%d) Product: %s Cost: %.2f\n", $n++, $key, $cleanarray{$key};
}
I have two arrays:
#array1 = (A,B,C,D,E,F);
#array2 = (A,C,H,D,E,G);
The arrays could be of different size. I want to find how many mismatches are there between the arrays. The indexes should be the same. In this case there are three mismatch :b->c,c->h and F->G.(i.e , The 'C' in $array[2] should not be considered a match to 'C' in $array[1]) I would like to get the number of mismatches as well as the mismatch.
foreach my $a1 ( 0 .. $#array1) {
foreach my $a2( 0 .. $#array2)
if($array1[$a1] ne $array2[$a2]) {
}
}
}
my %array_one = map {$_, 1} #array1;
my #difference = grep {!$array_one {$_}} #array1;
print "#difference\n";
Ans: gives me H, G but not C.
with my little Perl knowledge I tried this, with no result. Could you suggest me how I should deal this? Your suggestions and pointers would be very helpful.
You shouldn't have nested loops. You only need to go through the indexes once.
use List::Util qw( max );
my #mismatches;
for my $i (0..max($#array1, $#array2)) {
push #mismatches, $i
if $i >= #array1
|| $i >= #array2
|| $array1[$i] ne $array2[$i];
}
}
say "There are " . (0+#mismatches) . " mismatches";
for my $i (#mismatches) {
...
}
Since you mentioned grep, this is how you'd replace the for with grep:
use List::Util qw( max );
my #mismatches =
grep { $_ >= #array1
|| $_ >= #array2
|| array1[$_] ne $array2[$_] }
0 .. max($#array1, $#array2);
say "There are " . (0+#mismatches) . " mismatches";
for my $i (#mismatches) {
...
}
Here's an example using each_arrayref from List::MoreUtils.
sub diff_array{
use List::MoreUtils qw'each_arrayref';
return unless #_ && defined wantarray;
my #out;
my $iter = each_arrayref(#_);
my $index = 0;
while( my #current = $iter->() ){
next if all_same(#current);
unshift #current, $index;
push #out, \#current;
}continue{ ++$index }
return #out;
}
This version should be faster if you are going to use this for determining the number of differences often. The output is exactly the same. It just doesn't have to work as hard when returning a number.
Read about wantarray for more information.
sub diff_array{
use List::MoreUtils qw'each_arrayref';
return unless #_ && defined wantarray;
my $iter = each_arrayref(#_);
if( wantarray ){
# return structure
my #out;
my $index = 0;
while( my #current = $iter->() ){
next if all_same(#current);
unshift #current, $index;
push #out, \#current;
}continue{ ++$index }
return #out;
}else{
# only return a count of differences
my $out = 0;
while( my #current = $iter->() ){
++$out unless all_same #current;
}
return $out;
}
}
diff_array uses the subroutine all_same to determine if all of the current list of elements are the same.
sub all_same{
my $head = shift;
return undef unless #_; # not enough arguments
for( #_ ){
return 0 if $_ ne $head; # at least one mismatch
}
return 1; # all are the same
}
To get just the number of differences:
print scalar diff_array \#array1, \#array2;
my $count = diff_array \#array1, \#array2;
To get a list of differences:
my #list = diff_array \#array1, \#array2;
To get both:
my $count = my #list = diff_array \#array1, \#array2;
The output for the input you provided:
(
[ 1, 'B', 'C' ],
[ 2, 'C', 'H' ],
[ 5, 'F', 'G' ]
)
Example usage
my #a1 = qw'A B C D E F';
my #a2 = qw'A C H D E G';
my $count = my #list = diff_array \#a1, \#a2;
print "There were $count differences\n\n";
for my $group (#list){
my $index = shift #$group;
print " At index $index\n";
print " $_\n" for #$group;
print "\n";
}
You're iterating over both arrays when you don't want to be doing so.
#array1 = ("A","B","C","D","E","F");
#array2 = ("A","C","H","D","E","G");
foreach my $index (0 .. $#array1) {
if ($array1[$index] ne $array2[$index]) {
print "Arrays differ at index $index: $array1[$index] and $array2[$index]\n";
}
}
Output:
Arrays differ at index 1: B and C
Arrays differ at index 2: C and H
Arrays differ at index 5: F and G
Well, first, you're going to want to go over each element of one of the arrays, and compare it to the same element of the other array. List::MoreUtils provides an easy way to do this:
use v5.14;
use List::MoreUtils qw(each_array);
my #a = qw(a b c d);
my #b = qw(1 2 3);
my $ea = each_array #a, #b;
while ( my ($a, $b) = $ea->() ) {
say "a = $a, b = $b, idx = ", $ea->('index');
}
You can extend that to find where there is a non-match by checking inside that while loop (note: this assumes your arrays don't have undefs at the end, or that if they do, undef is the same as having a shorter array):
my #mismatch;
my $ea = each_array #a, #b;
while ( my ($a, $b) = $ea->() ) {
if (defined $a != defined $b || $a ne $b) {
push #mismatch, $ea->('index');
}
}
and then:
say "Mismatched count = ", scalar(#mismatch), " items are: ", join(q{, }, #mismatch);
The following code builds a list of mismatched pairs, then prints them out.
#a1 = (A,B,C,D,E,F);
#a2 = (A,C,H,D,E,G);
#diff = map { [$a1[$_] => $a2[$_]] }
grep { $a1[$_] ne $a2[$_] }
(0..($#a1 < $#a2 ? $#a1 : $#a2));
print "$_->[0]->$_->[1]\n" for #diff
You have the right idea, but you only need a single loop, since you are looking at each index and comparing entries between the arrays:
foreach my $a1 ( 0 .. $#array1) {
if($array1[$a1] ne $array2[$a1]) {
print "$a1: $array1[$a1] <-> $array2[$a1]\n";
}
}
I have two arrays, #a and #b. I want to do a compare among the elements of the two arrays.
my #a = qw"abc def efg ghy klm ghn";
my #b = qw"def ghy jgk lom com klm";
If any element matches then set a flag. Is there any simple way to do this?
First of all, your 2 arrays need to be written correctly.
#a = ("abc","def","efg","ghy","klm","ghn");
#b = ("def","efg","ghy","klm","ghn","klm");
Second of all, for arbitrary arrays (e.g. arrays whose elements may be references to other data structures) you can use Data::Compare.
For arrays whose elements are scalar, you can do comparison using List::MoreUtils pairwise BLOCK ARRAY1 ARRAY2, where BLOCK is your comparison subroutine. You can emulate pairwise (if you don't have List::MoreUtils access) via:
if (#a != #b) {
$equals = 0;
} else {
$equals = 1;
foreach (my $i = 0; $i < #a; $i++) {
# Ideally, check for undef/value comparison here as well
if ($a[$i] != $b[$i]) { # use "ne" if elements are strings, not numbers
# Or you can use generic sub comparing 2 values
$equals = 0;
last;
}
}
}
P.S. I am not sure but List::Compare may always sort the lists. I'm not sure if it can do pairwise comparisons.
List::Compare
if ( scalar List::Compare->new(\#a, \#b)->get_intersection ) {
…
}
Check to create an intersect function, which will return a list of items that are present in both lists. Then your return value is dependent on the number of items in the intersected list.
You can easily find on the web the best implementation of intersect for Perl. I remember looking for it a few years ago.
Here's what I found :
my #array1 = (1, 2, 3);
my #array2 = (2, 3, 4);
my %original = ();
my #isect = ();
map { $original{$_} = 1 } #array1;
#isect = grep { $original{$_} } #array2;
This is one way:
use warnings;
use strict;
my #a = split /,/, "abc,def,efg,ghy,klm,ghn";
my #b = split /,/, "def,ghy,jgk,lom,com,klm";
my $flag = 0;
my %a;
#a{#a} = (1) x #a;
for (#b) {
if ($a{$_}) {
$flag = 1;
last;
}
}
print "$flag\n";
From the requirement that 'if any element matches', use the intersection of sets:
sub set{
my %set = map { $_, undef }, #_;
return sort keys %set;
}
sub compare{
my ($listA,$listB) = #_;
return ( (set(#$listA)-set(#$listB)) > 0)
}
my #a = qw' abc def efg ghy klm ghn ';
my #b = qw' def ghy jgk lom com klm ';
my $flag;
foreach my $item(#a) {
$flag = #b~~$item ? 0 : 1;
last if !$flag;
}
Note that you will need Perl 5.10, or later, to use the smart match operator (~~) .
Brute force should do the trick for small a n:
my $flag = 0;
foreach my $i (#a) {
foreach my $k (#b) {
if ($i eq $k) {
$flag = 1;
last;
}
}
}
For a large n, use a hash table:
my $flag = 0;
my %aa = ();
$aa{$_} = 1 foreach (#a);
foreach my $i (#b) {
if ($aa{$i}) {
$flag = 1;
last;
}
}
Where a large n is |#a| + |#b| > ~1000 items
IMHO, you should use List::MoreUtils::pairwise. However, if for some reason you cannot, then the following sub would return a 1 for every index where the value in the first array compares equal to the value in the second array. You can generalize this method as much as you want and pass your own comparator if you want to, but at that point, just installing List::MoreUtils would be a more productive use of your time.
use strict; use warnings;
my #a = qw(abc def ghi jkl);
my #b = qw(abc dgh dlkfj jkl kjj lkm);
my $map = which_ones_equal(\#a, \#b);
print join(', ', #$map), "\n";
sub which_ones_equal {
my ($x, $y, $compare) = #_;
my $last = $#$x > $#$y ? $#$x : $#$y;
no warnings 'uninitialized';
return [ map { 0 + ($x->[$_] eq $y->[$_]) } $[ .. $last ];
}
This is Perl. The 'obvious' solution:
my #a = qw"abc def efg ghy klm ghn";
my #b = qw"def ghy jgk lom com klm";
print "arrays equal\n"
if #a == #b and join("\0", #a) eq join("\0", #b);
given "\0" not being in #a.
But thanks for confirming that there is no other generic solution than rolling your own.
my #a1 = qw|a b c d|;
my #a2 = qw|b c d e|;
for my $i (0..$#a1) {
say "element $i of array 1 was not found in array 2"
unless grep {$_ eq $a1[$i]} #a2
}
If you would consider the arrays with different order to be different, you may use Array::Diff
if (Array::Diff->diff(\#a, \#b)->count) {
# not_same
} else {
# same
}
This question still could mean two things where it states "If any element matches then set a flag":
Elements at the same position, i.e $a[2] eq $b[2]
Values at any position, i.e. $a[3] eq $b[5]
For case 1, you might do this:
# iterate over all positions, and compare values at that position
my #matches = grep { $a[$_] eq $b[$_] } 0 .. $#a;
# set flag if there's any match at the same position
my $flag = 1 if #matches;
For case 2, you might do that:
# make a hash of #a and check if any #b are in there
my %a = map { $_ => 1 } #a;
my #matches = grep { $a{$_} } #b;
# set flag if there's matches at any position
my $flag = 1 if #matches;
Note that in the first case, #matches holds the indexes of where there are matching elements, and in the second case #matches holds the matching values in the order in which they appear in #b.
Does anyone know how to shuffle two arrays randomly in exactly the same way in Perl?
For example, say I have these two arrays:
Before shuffling:
array 1: 1, 2, 3, 4, 5
array 2: a, b, c, d, e
After shuffling:
array 1: 2, 4, 5, 3, 1
array 2: b, d, e, c, a
So every element in each array is bound to its equivalent element.
Try (something like) this:
use List::Util qw(shuffle);
my #list1 = qw(a b c d e);
my #list2 = qw(f g h i j);
my #order = shuffle 0..$#list1;
print #list1[#order];
print #list2[#order];
First: parallel arrays are a potential sign of bad code; you should see if you can use an array of objects or hashes and save yourself that trouble.
Nonetheless:
use List::Util qw(shuffle);
sub shuffle_together {
my (#arrays) = #_;
my $length = #{ $arrays[0] };
for my $array (#arrays) {
die "Arrays weren't all the same length" if #$array != $length;
}
my #shuffle_order = shuffle (0 .. $length - 1);
return map {
[ #{$_}[#shuffle_order] ]
} #arrays;
}
my ($numbers, $letters) = shuffle_together [1,2,3,4,5], ['a','b','c','d','e'];
Basically, use shuffle to produce a list of indices in random order, and then slice all of the arrays with the same list of indices.
Use List::Util shuffle to shuffle a list of indexes and map the results onto the arrays.
use strict;
use warnings;
use List::Util qw(shuffle);
my #array1 = qw( a b c d e );
my #array2 = 1..5;
my #indexes = shuffle 0..$#array1;
my #shuffle1 = map $array1[$_], #indexes;
my #shuffle2 = map $array2[$_], #indexes;
Update
Use Chris Jester-Young's solution. Array slices are a better choice that I should have thought of.
Here is another way:
use strict;
use warnings;
use List::AllUtils qw(pairwise shuffle);
my #list1 = qw(a b c d e);
my #list2 = qw(f g h i j);
my #shuffled_pairs = shuffle pairwise{[$a, $b]} #list1, #list2;
for my $pair ( #shuffled_pairs ) {
print "$pair->[0]\t$pair->[1]\n";
}
Output:
C:\Temp> sfl
e j
b g
d i
a f
c h
This way, you can iterate directly over #shuffled_pairs without needing to keep an extra array for the indexes and avoid C-style loops.