I'm trying to figure out why this keeps printing the "majority element" candidate in every cycle.
The code I've been trying to make work is a majority element search (finding an element that occurs more than n/2 times in a list of length n).
I can't separate the processes of finding the candidate and testing against the array because my input is a text file that has an indeterminate number of arrays. It's an exercise from rosalind.info that has different inputs every time you try to solve it.
An example of the input would be:
5 5 5 5 5 5 5 5
8 7 7 7 1 7 3 7
7 1 6 5 10 100 1000 1
5 1 6 7 1 1 10 1
Here's what I've written so far.
foreach my $currentrow (@lists) {
    my @row = ();
    @row = split( /\s/, $currentrow );
    my $length = $#row;
    my $count = 0;
    my $i = 0;
    for $i ( 0 .. $length - 1 ) {
        if ( $count == 0 ) {
            $candidate = $row[$i];
            $count++;
        }
        if ( ( $count > 0 ) and ( $i = $length - 1 ) ) {
            my $counter2 = 0;
            for my $j ( 0 .. $length - 1 ) {
                if ( $row[$j] == $candidate ) {
                    $counter2++;
                }
            }
            if ( $counter2 <= ( $#row / 2 ) and ( $i = $length - 1 ) ) {
                $candidate = -1;
                print $candidate, " ", $i, " ";
            }
            if ( $counter2 > ( $#row / 2 ) and ( $i = $length - 1 ) ) {
                print $candidate, " ", $i, " ";
            }
        }
        if ( $candidate == $row[$i] and $count > 0 ) {
            $count = $count + 1;
        }
        if ( $candidate != $row[$i] and $count > 0 ) {
            $count = $count - 1;
        }
    }
}
Do you have use strict and use warnings 'all' in place?
I suspect your problem is the test $i = $length - 1, which is an assignment; it should be the comparison $i == $length - 1.
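For comparison, here is a minimal sketch of the question's own two-phase idea (the Boyer-Moore majority vote: one pass to pick a candidate, a second pass to verify it) with the comparison done correctly. It assumes each row has already been read as a line of whitespace-separated integers; the sub name majority is illustrative, and returning -1 when there is no majority mirrors the question's code. Because each row is processed on its own, the two phases can be separated per row even when the number of rows is not known in advance.

```perl
use strict;
use warnings;

# Boyer-Moore majority vote: the first pass finds a candidate,
# the second pass over the same row checks that it really is a majority.
# Returns -1 when there is no majority element (as the question's code does).
sub majority {
    my @row = @_;
    my ( $candidate, $count ) = ( undef, 0 );
    for my $x (@row) {
        if ( $count == 0 ) {
            $candidate = $x;
            $count     = 1;
        }
        elsif ( $x == $candidate ) {
            $count++;
        }
        else {
            $count--;
        }
    }
    return -1 unless defined $candidate;

    # grep in scalar context returns the number of matches
    my $occurrences = grep { $_ == $candidate } @row;
    return $occurrences > @row / 2 ? $candidate : -1;
}

# Each line is handled independently, whatever the number of lines.
for my $line ( '5 5 5 5 5 5 5 5', '8 7 7 7 1 7 3 7', '7 1 6 5 10 100 1000 1' ) {
    print majority( split ' ', $line ), "\n";
}
```

For these three sample rows it prints 5, 7 and -1.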
To find a majority element I would use a hash:
perl -nae '%h=(); $h{$_}+=2 for @F; $h{$_}>@F and print for keys %h; print "\n"'
Each line of input is treated separately. Each line of output matches a line of input and presents its majority element or is empty if there is no such element.
Edit: the solution now uses autosplit (-a), which is shorter and works for more than just numbers.
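The one-liner relies on a small trick: adding 2 per occurrence and comparing the total with the number of fields is the same as testing "count > n/2" without any division. A minimal sketch of the same idea as a named sub (majority_by_hash is a hypothetical name), with the input lines hard-coded for illustration instead of being read through -n:

```perl
use strict;
use warnings;

# Hash-based majority test, expanded from the one-liner.
# Each element adds 2 to its count, so "occurs more than half the time"
# becomes "count > number of elements". Keys are strings, so this also
# works for non-numeric fields.
sub majority_by_hash {
    my @fields = @_;
    my %count;
    $count{$_} += 2 for @fields;
    my ($winner) = grep { $count{$_} > @fields } keys %count;
    return $winner;    # undef when there is no majority element
}

# One output line per input line, empty when there is no majority.
for my $line ( "5 5 5 5 5 5 5 5\n", "5 1 6 7 1 1 10 1\n" ) {
    my $m = majority_by_hash( split ' ', $line );
    print defined $m ? $m : '', "\n";
}
```

At most one key can pass the test, since two different elements cannot both fill more than half of the list.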
I am trying to create a simple bubble sort in Perl, but it doesn't seem to work. Can anyone help me?
Code:
for ( my $i = 1; $i < @array; $i++ ) {
    for ( my $k = 0; $k = @array < $i - 1; $k++ ) {
        if ( $array[$k] > $array[ $k + 1 ] ) {
            $temp = $array[$k];
            $array[$k] = $array[ $k + 1 ];
            $array[ $k + 1 ] = $temp;
        }
    }
}
Then when I iterate through the array again it is not sorted.
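Shouldn't the outer loop go from the back to the front of the array? Also, the $k = @array < $i - 1 condition in the inner for loop doesn't make sense: because < binds more tightly than =, it assigns the result of the comparison @array < $i - 1 to $k. The array is never shorter than $i - 1, so that result is false, the loop condition fails on the first test, and the inner loop body never runs.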
my @array = (5,6,3,1,7,3,2,9,10,4);
my ($i, $k);
for ($i = $#array; $i > 0; $i--) {    # $#array = last index = length-1
    for ($k = 0; $k < $i; $k++) {
        if ($array[$k] > $array[$k+1]) {
            ($array[$k], $array[$k+1]) = ($array[$k+1], $array[$k]);
        }
    }
}
print "@array\n";    # 1 2 3 3 4 5 6 7 9 10
You should avoid this sort of confusion by using the list form of for:
my @array = ( 5, 6, 3, 1, 7, 3, 2, 9, 10, 4 );
for my $i ( 1 .. $#array ) {
    for my $k ( 0 .. $i - 1 ) {
        @array[ $k, $k + 1 ] = @array[ $k + 1, $k ]
            if $array[$k] > $array[ $k + 1 ];
    }
}
print "@array\n";
This prevents a lot of bugs, and the code is more readable because the intent is clearly visible. That in turn makes it obvious that your algorithm is wrong (it turns (3, 2, 1) into (2, 1, 3), for example); what you probably want is:
my @array = ( 5, 6, 3, 1, 7, 3, 2, 9, 10, 4 );
for my $i ( reverse 1 .. $#array ) {
    for my $k ( 0 .. $i - 1 ) {
        @array[ $k, $k + 1 ] = @array[ $k + 1, $k ]
            if $array[$k] > $array[ $k + 1 ];
    }
}
print "@array\n";
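A sketch of the corrected loop wrapped in a sub, with one common refinement that is not part of the answer: stop as soon as a full pass makes no swaps, since the array is then already sorted. bubble_sort is an illustrative name; it sorts in place through an array reference.

```perl
use strict;
use warnings;

# Bubble sort with the outer index running from the back (as in the
# corrected answer), plus an early exit when a pass makes no swaps.
sub bubble_sort {
    my ($array) = @_;
    for my $i ( reverse 1 .. $#$array ) {
        my $swapped = 0;
        for my $k ( 0 .. $i - 1 ) {
            if ( $array->[$k] > $array->[ $k + 1 ] ) {
                @$array[ $k, $k + 1 ] = @$array[ $k + 1, $k ];
                $swapped = 1;
            }
        }
        last unless $swapped;    # no swaps: already sorted
    }
    return $array;
}

my @array = ( 5, 6, 3, 1, 7, 3, 2, 9, 10, 4 );
bubble_sort( \@array );
print "@array\n";    # 1 2 3 3 4 5 6 7 9 10
```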
I can't believe nobody has given him the true Perl answer yet. I believe he asked for it in Perl, not in C transcribed to Perl :)
for($i=$#a;$i>0;$i--){$m=$a[0];splice(@a,0,$i+1,map{$s=$m;$m>$_?$_:($s,$m=$_)[0]}@a[1..$i],$m);}
#!/usr/bin/perl -w
use strict;
print "Enter your input filename for original sample data values: \n";
chomp($data=<STDIN>);
print "Enter your input filename for adjustment values\n";
chomp($adj=<STDIN>) ;
print "Enter your output filename for resultant adjusted new sample data \n";
chomp($new=<STDIN>);
open(R1,"$data") or die("error");
open(R2,"$adj") or die ("error");
open(WW,"+>$new") or die ("error");
while( ($line1=(<R1>)) && ($line2=(<R2>)) )
{
$l1=$line1;
@arr1= split(" ",$l1);
$l2=$line2;
@arr2= split(" ",$l2);
$l= ( scalar @arr1);
$p= (scalar @arr2);
for ( $i = 0; $i <= $l; $i++ ){
for ( $j =($i+1); $j <= $l; $j++ ){
if ($arr1[$i]< $arr1[$j]){
$a = $arr1[$i] + ($arr2[$i]/2);
$b = $arr1[$j] - ($arr2[$i]/2);
push ( $arr1[$i]->$a , $arr1[$j]->$b);
}
elsif ( $arr1[$i]= $arr1[$j]){
$a = $arr1[$i];
$b = $arr1[$j];
push ($arr1[$i]->$a,$arr1[$j]->$b);
}
else{
$a = $arr1[$i]-($arr2[$i]/2);
$b = $arr1[$j]+ ($arr2[$i]/2);
push ($arr1[$i]->$a,$arr1[$j]->$b);
}
}
}
$l1 = scalar @arr1;
for ($k = 0; $k <= $l1 ; $k++)
{
if (($k % 10) != 0){
print WW "$arr1[$k]";
print WW "\t" ;
}
else {
print WW "\n";
print WW "$arr1[$k]";
print WW "\t";
}
}
}
close(R1);
close(R2);
close(WW);
exit;
When I run this program, I get the error "Not an ARRAY reference at line 29".
How can I create a reference to my first array @arr1, so that it stores the changed value of the element at each index after running the iteration?
Input:
@array1
1 2 3 4 5 6 7 8 9 10
@array2
1 2 3 4 5 6 7 8 9 10 9 8 7 6 5 4 3 2
Desired output:
@array1
15 1.5 2 3 6 4 11.5 5 5.5
Well, I'm not getting the answer you say you're looking for, but it appears that you're trying to store the value of $a at index $i of @arr1 and the value of $b at index $j of @arr1. I have hoisted the assignment code out of the if branches, since it is the same for all three cases. I have also fixed a subtle error in your conditions. You had
elsif ( $arr1[$i]= $arr1[$j]){
but you surely meant to do an equality comparison rather than an assignment here:
elsif ( $arr1[$i] == $arr1[$j]){
So here is the modified section. As I say, it still doesn't print what you say the desired result is, and I'm not sure whether that's because your computation or your printing is wrong (I couldn't find any obvious transformation from your inputs to your desired output), but this should at least point you in the right direction:
for ( $i = 0; $i <= $l; $i++ ){
for ( $j =($i+1); $j <= $l; $j++ ){
if ($arr1[$i]< $arr1[$j]){
$a = $arr1[$i] + ($arr2[$i]/2);
$b = $arr1[$j] - ($arr2[$i]/2);
# push ( $arr1[$i]->$a , $arr1[$j]->$b);
}
elsif ( $arr1[$i] == $arr1[$j]){
$a = $arr1[$i];
$b = $arr1[$j];
# push ($arr1[$i]->$a,$arr1[$j]->$b);
}
else{
$a = $arr1[$i]-($arr2[$i]/2);
$b = $arr1[$j]+ ($arr2[$i]/2);
# push ($arr1[$i]->$a,$arr1[$j]->$b);
}
$arr1[$i] = $a;
$arr1[$j] = $b;
}
}
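A minimal strict-clean sketch of the same loop, assuming the answer's reading of the intent (move each pair towards each other by half of $arr2[$i]); like the answer, it has not been checked against the question's desired output. Writing to $arr1->[$i] through a reference updates the caller's array directly, so no push is needed. It also uses lexical names instead of $a and $b (special package variables used by sort) and loops over 0 .. $#arr1, whereas $i <= scalar @arr1 runs one index past the end. adjust is a hypothetical sub name.

```perl
use strict;
use warnings;

# For every pair (i, j) with i < j, move $arr1[$i] and $arr1[$j] towards
# each other by half of $arr2[$i]; equal values are left unchanged.
# Assumes the answer's interpretation of the question, not verified.
sub adjust {
    my ( $arr1, $arr2 ) = @_;    # references to the two arrays
    for my $i ( 0 .. $#$arr1 ) {
        for my $j ( $i + 1 .. $#$arr1 ) {
            my $half = $arr2->[$i] / 2;
            if ( $arr1->[$i] < $arr1->[$j] ) {
                $arr1->[$i] += $half;
                $arr1->[$j] -= $half;
            }
            elsif ( $arr1->[$i] > $arr1->[$j] ) {
                $arr1->[$i] -= $half;
                $arr1->[$j] += $half;
            }
        }
    }
    return $arr1;
}

my @arr1 = ( 1 .. 10 );
my @arr2 = ( 1 .. 10, 9, 8, 7, 6, 5, 4, 3, 2 );
adjust( \@arr1, \@arr2 );

# Ten tab-separated values per line, similar to the original output loop.
my @out = @arr1;
while ( my @line = splice( @out, 0, 10 ) ) {
    print join( "\t", @line ), "\n";
}
```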
I have a file with columns of numbers:
1 0.0 0.0
2 0.0 0.0
3 15.2 0.0
4 7.0 9.0
5 0.0 3.0
6 1.0 0.0
7 0.0 2.5
8 0 0 0 0
I need to find the sum of the numbers in rows 3 to 7 of the right two columns. So for column 2 I want to sum 15.2, 7.0 and 1.0, and for column 3 I want to sum 9.0, 3.0 and 2.5. I need to keep the one-decimal-place format.
Code:
while (<INPUT>){
    my @a = split;
    my $c2 .= $a[1];
    my $c3 .= $a[2];
    my $c2_string = substr($c2, 2, 5);
    my $c3_string = substr($c3, 2, 5);
    my @sumarray = split ('', $c2);
    #then loop through each element and add them up.
This doesn't seem to work. How can I keep each number separate while maintaining the decimal format?
Wrong output for c2:
1
5
.
2
7
.
0
0
.
0
etc
Desired Output:
c2=23.2
c3=14.5
my $x = my $y = 0;
while (<INPUT>) {
    my @a = split;
    ($a[0] >= 3 and $a[0] <= 7) or next;
    $x += $a[1];
    $y += $a[2];
}
print "c2=$x\n", "c3=$y\n";
perl -lane'
($F[0] >=3 and $F[0] <=7) or next;
$x += $F[1]; $y += $F[2];
END{ print for "c2=$x","c3=$y" }
' file
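On the one-decimal requirement: Perl prints a whole-number result without a trailing .0 (7.0 + 7.0 prints as 14), so if the output must always show one decimal place, format it with sprintf '%.1f'. A self-contained sketch on the question's sample rows, read through an in-memory filehandle purely for illustration (replace it with your own INPUT handle):

```perl
use strict;
use warnings;

# The question's sample data, kept in a string for this sketch.
my $text = <<'END_DATA';
1 0.0 0.0
2 0.0 0.0
3 15.2 0.0
4 7.0 9.0
5 0.0 3.0
6 1.0 0.0
7 0.0 2.5
8 0 0 0 0
END_DATA

open my $in, '<', \$text or die "Cannot open in-memory data: $!";

my ( $c2, $c3 ) = ( 0, 0 );
while ( my $line = <$in> ) {
    my @f = split ' ', $line;
    next unless $f[0] >= 3 && $f[0] <= 7;    # rows 3 to 7 only
    $c2 += $f[1];
    $c3 += $f[2];
}
close $in;

# %.1f always prints exactly one decimal place (14 becomes 14.0)
my $c2_text = sprintf '%.1f', $c2;
my $c3_text = sprintf '%.1f', $c3;
print "c2=$c2_text\n", "c3=$c3_text\n";
```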
my @data;
while (<INPUT>) {
    push @data, [ split ];
}
my ($sum2, $sum3);
for (my $i = 2; $i < 7; $i++) {
    $sum2 += $data[$i][1];
    $sum3 += $data[$i][2];
}
print "$sum2, $sum3\n";
Output:
23.2, 14.5
And this one does not create an array for the entire file:
my ($sum2, $sum3);
while (<INPUT>) {
    my @v = split;
    if ($v[0] > 2 && $v[0] < 8) {
        $sum2 += $v[1];
        $sum3 += $v[2];
    }
}
#!/usr/bin/perl -w
use strict;
my $infile = 'in.txt';
open my $input, '<', $infile or die "Can't open $infile: $!";
my ($col1, $sum_col2, $sum_col3);
while (<$input>) {
    my (@cols) = split;
    $col1 = $cols[0];
    $sum_col2 += $cols[1] if $col1 == 3 .. 7;
    $sum_col3 += $cols[2] if $col1 == 3 .. 7;
}
print "Column2: $sum_col2\n";
print "Column3: $sum_col3\n";
Output:
Column2: 23.2
Column3: 14.5
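The last answer uses Perl's scalar range ("flip-flop") operator, which has a subtlety worth knowing. Because == binds more tightly than .., the expression $col1 == 3 .. 7 parses as ($col1 == 3) .. 7, and a constant operand of a scalar .. is compared with $., the current input line number. So the range really ends on line 7 of the file, not when the first column is 7; it works here because the first column happens to equal the line number. A sketch that tests the column at both ends instead, on the same sample data held in a string for illustration:

```perl
use strict;
use warnings;

my $text = <<'END_DATA';
1 0.0 0.0
2 0.0 0.0
3 15.2 0.0
4 7.0 9.0
5 0.0 3.0
6 1.0 0.0
7 0.0 2.5
8 0 0 0 0
END_DATA

open my $input, '<', \$text or die "Cannot open in-memory data: $!";

my ( $sum_col2, $sum_col3 ) = ( 0, 0 );
while ( my $line = <$input> ) {
    my @cols = split ' ', $line;
    # Flip-flop: true from the row whose first column is 3 up to and
    # including the row whose first column is 7. Each .. operator keeps
    # its own state, so it is evaluated exactly once per row here.
    if ( ( $cols[0] == 3 ) .. ( $cols[0] == 7 ) ) {
        $sum_col2 += $cols[1];
        $sum_col3 += $cols[2];
    }
}
close $input;

printf "Column2: %.1f\n", $sum_col2;
printf "Column3: %.1f\n", $sum_col3;
```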
I'm comparing a reference sequence of 5500 bases with a query sequence of 3600 bases using dynamic programming (semi-global alignment). I don't know much about complexity and performance, and the code blows up with the error "out of memory". It works normally on smaller sequences, so my question is: is this behavior normal, or might I have another problem in my code? If it's normal, any hints on how to solve it? Thanks in advance.
sub semiGlobal {
my ( $seq1, $seq2,$MATCH,$MISMATCH,$GAP ) = @_;
# initialization: first row to 0 ;
my #matrix;
$matrix[0][0]{score} = 0;
$matrix[0][0]{pointer} = "none";
for ( my $j = 1 ; $j <= length($seq1) ; $j++ ) {
$matrix[0][$j]{score} = 0;
$matrix[0][$j]{pointer} = "none";
}
for ( my $i = 1 ; $i <= length($seq2) ; $i++ ) {
$matrix[$i][0]{score} = $GAP * $i;
$matrix[$i][0]{pointer} = "up";
}
# fill
my $max_i = 0;
my $max_j = 0;
my $max_score = 0;
print "seq2: ".length($seq2);
print "seq1: ".length($seq1);
for ( my $i = 1 ; $i <= length($seq2) ; $i++ ) {
for ( my $j = 1 ; $j <= length($seq1) ; $j++ ) {
my ( $diagonal_score, $left_score, $up_score );
# calculate match score
my $letter1 = substr( $seq1, $j - 1, 1 );
my $letter2 = substr( $seq2, $i - 1, 1 );
if ( $letter1 eq $letter2 ) {
$diagonal_score = $matrix[ $i - 1 ][ $j - 1 ]{score} + $MATCH;
}
else {
$diagonal_score = $matrix[ $i - 1 ][ $j - 1 ]{score} + $MISMATCH;
}
# calculate gap scores
$up_score = $matrix[ $i - 1 ][$j]{score} + $GAP;
$left_score = $matrix[$i][ $j - 1 ]{score} + $GAP;
# choose best score
if ( $diagonal_score >= $up_score ) {
if ( $diagonal_score >= $left_score ) {
$matrix[$i][$j]{score} = $diagonal_score;
$matrix[$i][$j]{pointer} = "diagonal";
}
else {
$matrix[$i][$j]{score} = $left_score;
$matrix[$i][$j]{pointer} = "left";
}
}
else {
if ( $up_score >= $left_score ) {
$matrix[$i][$j]{score} = $up_score;
$matrix[$i][$j]{pointer} = "up";
}
else {
$matrix[$i][$j]{score} = $left_score;
$matrix[$i][$j]{pointer} = "left";
}
}
# set maximum score
if ( $matrix[$i][$j]{score} > $max_score ) {
$max_i = $i;
$max_j = $j;
$max_score = $matrix[$i][$j]{score};
}
}
}
my $align1 = "";
my $align2 = "";
my $stseq1;
my $j = $max_j;
my $i = $max_i;
while (1) {
if ( $matrix[$i][$j]{pointer} eq "none" ) {
$stseq1 = $j;
last;
}
if ( $matrix[$i][$j]{pointer} eq "diagonal" ) {
$align1 .= substr( $seq1, $j - 1, 1 );
$align2 .= substr( $seq2, $i - 1, 1 );
$i--;
$j--;
}
elsif ( $matrix[$i][$j]{pointer} eq "left" ) {
$align1 .= substr( $seq1, $j - 1, 1 );
$align2 .= "-";
$j--;
}
elsif ( $matrix[$i][$j]{pointer} eq "up" ) {
$align1 .= "-";
$align2 .= substr( $seq2, $i - 1, 1 );
$i--;
}
}
$align1 = reverse $align1;
$align2 = reverse $align2;
return ( $align1, $align2, $stseq1 ,$max_j);
}
One possible way to solve the problem is to tie @matrix to a file. However, this will slow the program down dramatically. Consider this:
sub semiGlobal {
use Tie::Array::CSV;
tie my @matrix, 'Tie::Array::CSV', 'temp.txt'; # Don't forget to add your own error handler.
my ( $seq1, $seq2,$MATCH,$MISMATCH,$GAP ) = @_;
# initialization: first row to 0 ;
$matrix[0][0] = '0 n';
for ( my $j = 1 ; $j <= length($seq1) ; $j++ ) {
$matrix[0][$j] = '0 n';
}
for ( my $i = 1 ; $i <= length($seq2) ; $i++ ) {
my $score = $GAP * $i;
$matrix[$i][0] = join ' ',$score,'u';
}
#print Dumper(\@matrix);
# fill
my $max_i = 0;
my $max_j = 0;
my $max_score = 0;
print "seq2: ".length($seq2)."\n";
print "seq1: ".length($seq1)."\n";
for ( my $i = 1 ; $i <= length($seq2) ; $i++ ) {
for ( my $j = 1 ; $j <= length($seq1) ; $j++ ) {
my ( $diagonal_score, $left_score, $up_score );
# calculate match score
my $letter1 = substr( $seq1, $j - 1, 1 );
my $letter2 = substr( $seq2, $i - 1, 1 );
my $score = (split / /, $matrix[ $i - 1 ][ $j - 1 ])[0];
if ( $letter1 eq $letter2 ) {
$diagonal_score = $score + $MATCH;
}
else {
$diagonal_score = $score + $MISMATCH;
}
# calculate gap scores
$up_score = (split / /,$matrix[ $i - 1 ][$j])[0] + $GAP;
$left_score = (split / /,$matrix[$i][ $j - 1 ])[0] + $GAP;
# choose best score
if ( $diagonal_score >= $up_score ) {
if ( $diagonal_score >= $left_score ) {
$matrix[$i][$j] = join ' ',$diagonal_score,'d';
}
else {
$matrix[$i][$j] = join ' ', $left_score, 'l';
}
}
else {
if ( $up_score >= $left_score ) {
$matrix[$i][$j] = join ' ', $up_score, 'u';
}
else {
$matrix[$i][$j] = join ' ', $left_score, 'l';
}
}
# set maximum score
if ( (split / /, $matrix[$i][$j])[0] > $max_score ) {
$max_i = $i;
$max_j = $j;
$max_score = (split / /, $matrix[$i][$j])[0];
}
}
}
my $align1 = "";
my $align2 = "";
my $stseq1;
my $j = $max_j;
my $i = $max_i;
while (1) {
my $pointer = (split / /, $matrix[$i][$j])[1];
if ( $pointer eq "n" ) {
$stseq1 = $j;
last;
}
if ( $pointer eq "d" ) {
$align1 .= substr( $seq1, $j - 1, 1 );
$align2 .= substr( $seq2, $i - 1, 1 );
$i--;
$j--;
}
elsif ( $pointer eq "l" ) {
$align1 .= substr( $seq1, $j - 1, 1 );
$align2 .= "-";
$j--;
}
elsif ( $pointer eq "u" ) {
$align1 .= "-";
$align2 .= substr( $seq2, $i - 1, 1 );
$i--;
}
}
$align1 = reverse $align1;
$align2 = reverse $align2;
untie @matrix; # Don't forget to add your own error handler.
unlink 'temp.txt'; # Don't forget to add your own error handler.
return ( $align1, $align2, $stseq1 ,$max_j);
}
You can still use your original sub for short sequences, and switch to this sub for long ones.
I think @j_random_hacker and @Ashalynd are on the right track regarding using this algorithm in most Perl implementations. The datatypes you're using take more memory than is absolutely needed for the calculations.
So this is "normal" in that you should expect this kind of memory usage from the way you've written this algorithm in Perl: @matrix has (length($seq2) + 1) x (length($seq1) + 1) = 3601 x 5501 cells, roughly 19.8 million, and each cell holds its own anonymous hash. You may have other problems in the surrounding code that use a lot of memory, but this algorithm will hit your memory hard with large sequences.
You can address some of the memory issues by changing the datatypes you're using, as @Ashalynd suggests. You could try changing the hash that holds score and pointer into an array, and changing the string pointers into integer values. Something like this might give you some benefit while still maintaining readability:
use strict;
use warnings;
# define constants for array positions and pointer values
# so the code is still readable.
# (If you have the "Readonly" CPAN module you may want to use it for constants
# instead although none of the downsides of the "constant" pragma apply in this code.)
use constant {
SCORE => 0,
POINTER => 1,
DIAGONAL => 0,
LEFT => 1,
UP => 2,
NONE => 3,
};
...
sub semiGlobal2 {
my ( $seq1, $seq2,$MATCH,$MISMATCH,$GAP ) = @_;
# initialization: first row to 0 ;
my #matrix;
# score and pointer are now stored in an array
# using the defined constants as indices
$matrix[0][0][SCORE] = 0;
# pointer value is now a constant integer
$matrix[0][0][POINTER] = NONE;
for ( my $j = 1 ; $j <= length($seq1) ; $j++ ) {
$matrix[0][$j][SCORE] = 0;
$matrix[0][$j][POINTER] = NONE;
}
for ( my $i = 1 ; $i <= length($seq2) ; $i++ ) {
$matrix[$i][0][SCORE] = $GAP * $i;
$matrix[$i][0][POINTER] = UP;
}
... # continue to make the appropriate changes throughout the code
However, when I tested this I didn't get a huge benefit when attempting to align a 3600 char string in a 5500 char string of random data. I programmed my code to abort when it consumed more than 2GB of memory. The original code aborted after 23 seconds while the one using constants and an array instead of a hash aborted after 32 seconds.
If you really want to use this specific algorithm, I'd check out the performance of Algorithm::NeedlemanWunsch. It doesn't look very mature, but it may have addressed your performance issues. Otherwise, look into writing an Inline or Perl XS wrapper around a C implementation.
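Going one step further than the datatype changes above (this is not from the answers): the fill step only ever reads the previous row of scores, so only two score rows need to exist at a time, and the traceback needs nothing but the pointers, which fit in one character per cell. Stored as one string per row, that is roughly one byte per cell (about 20 million bytes for 3600 x 5500) instead of one anonymous hash per cell; I have not measured it. A sketch, assuming the same arguments, tie-breaking and return values as the original semiGlobal; semi_global_lean is a hypothetical name.

```perl
use strict;
use warnings;

# Scores: only the previous and current rows are kept.
# Pointers: one character per cell, one string per row
#   'd' = diagonal, 'l' = left, 'u' = up, 'n' = none.
sub semi_global_lean {
    my ( $seq1, $seq2, $MATCH, $MISMATCH, $GAP ) = @_;
    my ( $n1, $n2 ) = ( length $seq1, length $seq2 );

    my @prev = (0) x ( $n1 + 1 );        # row 0 scores are all 0
    my @ptr  = ( 'n' x ( $n1 + 1 ) );    # row 0 pointers are all "none"
    my ( $max_i, $max_j, $max_score ) = ( 0, 0, 0 );

    for my $i ( 1 .. $n2 ) {
        my @cur     = ( $GAP * $i );     # column 0 score
        my $row     = 'u';               # column 0 pointer
        my $letter2 = substr( $seq2, $i - 1, 1 );
        for my $j ( 1 .. $n1 ) {
            my $diag = $prev[ $j - 1 ]
              + ( substr( $seq1, $j - 1, 1 ) eq $letter2 ? $MATCH : $MISMATCH );
            my $up   = $prev[$j] + $GAP;
            my $left = $cur[ $j - 1 ] + $GAP;
            my ( $score, $p );
            if ( $diag >= $up ) {
                ( $score, $p ) = $diag >= $left ? ( $diag, 'd' ) : ( $left, 'l' );
            }
            else {
                ( $score, $p ) = $up >= $left ? ( $up, 'u' ) : ( $left, 'l' );
            }
            push @cur, $score;
            $row .= $p;
            ( $max_i, $max_j, $max_score ) = ( $i, $j, $score )
              if $score > $max_score;
        }
        push @ptr, $row;
        @prev = @cur;
    }

    # Traceback from the best cell, reading the one-character pointers.
    my ( $align1, $align2 ) = ( '', '' );
    my ( $i, $j ) = ( $max_i, $max_j );
    while ( ( my $p = substr( $ptr[$i], $j, 1 ) ) ne 'n' ) {
        if ( $p eq 'd' ) {
            $align1 .= substr( $seq1, $j - 1, 1 );
            $align2 .= substr( $seq2, $i - 1, 1 );
            $i--;
            $j--;
        }
        elsif ( $p eq 'l' ) {
            $align1 .= substr( $seq1, $j - 1, 1 );
            $align2 .= '-';
            $j--;
        }
        else {    # 'u'
            $align1 .= '-';
            $align2 .= substr( $seq2, $i - 1, 1 );
            $i--;
        }
    }

    # $j is now the start column in $seq1 (the original's $stseq1).
    return ( scalar reverse($align1), scalar reverse($align2), $j, $max_j );
}

my ( $aln1, $aln2, $start, $end ) =
  semi_global_lean( 'ACGTACGT', 'GTAC', 1, -1, -1 );
print "$aln1\n$aln2\nstart=$start end=$end\n";    # GTAC, GTAC, start=2 end=6
```

This only trims memory; the running time is still proportional to the number of cells.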
For clarification: if I had a list of 8 elements, I would want to randomly pick 2. If I had a list of 20 elements, I would want to randomly pick 5. I would also like to ensure (though it's not required) that two picked elements don't touch, i.e. if possible not elements 3 and 4; 3 and 5 would be nicer.
The simplest solution:
1. Shuffle the list.
2. Select the first quarter.
Example implementation:
use List::Util qw/shuffle/;
use feature 'say';
my @nums = 1..20;
my @pick = (shuffle @nums)[0 .. 0.25 * $#nums];
say "@pick";
Example output: 10 2 18 3 19.
Your additional restriction “no neighboring numbers” actually makes this less random and should be avoided if you want actual randomness. To keep two neighboring elements out of the output, I would iteratively splice unwanted elements out of the list:
my @nums = 1..20;
my $size = 0.25 * @nums;
my @pick;
while (@pick < $size) {
    my $i = int rand @nums;
    push @pick, my $num = $nums[$i];
    # check and remove neighbours
    my $len = 1;
    $len++ if $i < $#nums and $num + 1 == $nums[$i + 1];
    $len++, $i-- if 0 < $i and $num - 1 == $nums[$i - 1];
    splice @nums, $i, $len;
}
say "@pick";
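If the no-neighbours rule is a hard requirement, a different technique from the ones in these answers picks uniformly among all valid selections: choose k distinct values from 0 .. n-k, sort them, and add 0, 1, 2, ... to them in order. That maps every k-subset of 0 .. n-k one-to-one onto a set of k indices in 0 .. n-1 with no two adjacent, so each non-adjacent selection is equally likely. A sketch using List::Util's shuffle, as the first answer does; pick_non_adjacent is a hypothetical name.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Pick $k indices from 0 .. $n-1, no two adjacent, uniformly at random
# among all such selections: choose $k distinct values from 0 .. $n-$k,
# sort them, then add 0, 1, 2, ... so consecutive picks are >= 2 apart.
sub pick_non_adjacent {
    my ( $n, $k ) = @_;
    die "cannot place $k non-adjacent picks among $n elements\n"
      if $n - $k + 1 < $k;
    my @chosen = ( shuffle 0 .. $n - $k )[ 0 .. $k - 1 ];
    my @base   = sort { $a <=> $b } @chosen;
    return map { $base[$_] + $_ } 0 .. $#base;
}

my @nums = 1 .. 20;
my $k    = int( @nums / 4 );    # a quarter of the list: 5 of 20, 2 of 8
my @pick = @nums[ pick_non_adjacent( scalar @nums, $k ) ];
print "@pick\n";
```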
use strict;
use warnings;
sub randsel {
    my ($fact, $i, @r) = (1.0, 0);
    while (@r * 4 < @_) {
        if (not grep { $_ == $i } @r) {
            $fact = 1.0;
            # make $fact = 0.0 if you really don't want
            # consecutive elements
            $fact = 0.1 if grep { abs($i - $_) == 1 } @r;
            push(@r, $i) if (rand() < 0.25 * $fact);
        }
        $i = ($i + 1) % @_;
    }
    return map { $_[$_] } sort { $a <=> $b } @r;
}
my @l;
$l[$_] = $_ for (0..19);
print join(" ", randsel(@l)), "\n";