I'm trying to figure out why this keeps printing the "majority element" candidate in every cycle.
The code I've been trying to make work is a Majority Element search (to find an element that is repeated more than half of the length of a list).
I can't separate the processes of finding the candidate and testing against the array because my input is a text file that has an indeterminate number of arrays. It's an exercise from rosalind.info that has different inputs every time you try to solve it.
An example of the input would be
-5 5 5 5 5 5 5 5 -8 7 7 7 1 7 3 7 -7 1 6 5 10 100 1000 1 -5 1 6 7 1 1 10 1
Here's what I've written so far.
foreach my $currentrow (#lists) {
my #row = ();
#row = split( /\s/, $currentrow );
my $length = $#row;
my $count = 0;
my $i = 0;
for $i ( 0 .. $length - 1 ) {
if ( $count == 0 ) {
$candidate = $row[$i];
$count++;
}
if ( ( $count > 0 ) and ( $i = $length - 1 ) ) {
my $counter2 = 0;
for my $j ( 0 .. $length - 1 ) {
if ( $row[$j] == $candidate ) {
$counter2++;
}
}
if ( $counter2 <= ( $#row / 2 ) and ( $i = $length - 1 ) ) {
$candidate = -1;
print $candidate, " ", $i, " ";
}
if ( $counter2 > ( $#row / 2 ) and ( $i = $length - 1 ) ) {
print $candidate, " ", $i, " ";
}
}
if ( $candidate == $row[$i] and $count > 0 ) {
$count = $count + 1;
}
if ( $candidate != $row[$i] and $count > 0 ) {
$count = $count - 1;
}
}
}
Do you have use strict and use warnings 'all' in place?
I imagine that your problem may be because of the test $i = $length - 1, which is an assignment, and should be $i == $length - 1
To find a majority element I would use a hash:
perl -nae '%h=(); $h{$_}+=2 for #F; $h{$_}>#F and print for keys %h; print "\n"'
Each line of input is treated separately. Each line of output matches a line of input and presents its majority element or is empty if there is no such element.
Edit: Now the solution uses autosplit (-a), which is shorter and work not only for numbers.
I am trying to finish a homework for my class. After I installed a API::Twitter module using CPAN, the Net::Twitter module seems to be affected and can no longer be used.
I tried to run this code in my Mac terminal:
song_hanlun_hw8.pl
#!/usr/bin/perl
use Net::Twitter;
use JSON;
use LWP::Simple;
use XML::Bare;
use Data::Dumper;
# keys for twitter
$consumer_key = "key";
$consumer_secret = "key";
$token = "key-key";
$token_secret = "key";
# keys for sentiment analysis
$apikey = "key";
# As of 13-Aug-2010, Twitter requires OAuth for authenticated requests
my $nt = Net::Twitter->new(
traits => [qw/API::RESTv1_1/],
# traits => [qw/API::Search/],
consumer_key => $consumer_key,
consumer_secret => $consumer_secret,
access_token => $token,
access_token_secret => $token_secret,
);
#enter a term you want to search for
$search_term = "Halloween";
$count = 100;
my $r = $nt->search( { q => $search_term, count => $count } );
# print Dumper $r;
for my $status ( #{ $r->{statuses} } ) {
push #tweets, $status->{text};
}
$nextMaxId = $r->{search_metadata}->{next_results};
$nextMaxId =~ s/\?max_id=//g;
$nextMaxId =~ s/&q=Halloween&count=100&include_entities=1//g;
# print $nextMaxId;
my $r2 = $nt->search( { q => $search_term, count => $count, max_id => $nextMaxId } );
for my $status ( #{ $r2->{statuses} } ) {
push #tweets, $status->{text};
}
foreach $tweet ( #tweets ) {
# $tweet = $tweets[1];
$return = get "http://gateway-a.watsonplatform.net/calls/text/TextGetTextSentiment?apikey=$apikey&text=$tweet&outputMode=xml";
$bare = new XML::Bare( text => $return );
$root = $bare->parse();
$sentiment = $root->{results}->{docSentiment}->{score}->{value};
push #sentiments, $sentiment;
# print "$sentiment";
}
$m100to75 = 0;
$m75to50 = 0;
$m50to25 = 0;
$m25to0 = 0;
$p0to25 = 0;
$p25to50 = 0;
$p50to75 = 0;
$p75to100 = 0;
foreach $sent ( #sentiments ) {
# smaller than zero. four ranges.
if ( $sent >= -1 && $sent < -0.75 ) {
$m100to75++;
}
if ( $sent >= -0.75 && $sent < -0.5 ) {
$m75to50++;
}
if ( $sent >= -0.5 && $sent < -0.25 ) {
$m50to25++;
}
if ( $sent >= -0.25 && $sent < 0 ) {
$m25to0++;
}
#bigger than zero. four ranges.
if ( $sent >= 0 && $sent < 0.25 ) {
$p0to25++;
}
if ( $sent >= 0.25 && $sent < 0.5 ) {
$p25to50++;
}
if ( $sent >= 0.5 && $sent < 0.75 ) {
$p50to75++;
}
if ( $sent >= 0.75 && $sent < 1 ) {
$p75to100++;
}
}
# print "$m100to75\n$m75to50\n$m50to25\n$m25to0\n$p0to25\n$p25to50\n$p50to75\n$p75to100\n";
print "tweets sentiment score summary histogram:\n";
print "-1.00 to -0.75: ";
for ( $i = 0 ; $i < $m100to75 ; $i++ ) {
print "*";
}
print "\n";
print "-0.75 to -0.50: ";
for ( $i = 0 ; $i < $m75to50 ; $i++ ) {
print "*";
}
print "\n";
print "-0.50 to -0.25: ";
for ( $i = 0 ; $i < $m50to25 ; $i++ ) {
print "*";
}
print "\n";
print "-0.25 to -0.00: ";
for ( $i = 0 ; $i < $m25to0 ; $i++ ) {
print "*";
}
print "\n";
print "+0.00 to +0.25: ";
for ( $i = 0 ; $i < $p0to25 ; $i++ ) {
print "*";
}
print "\n";
print "+0.25 to +0.50: ";
for ( $i = 0 ; $i < $p25to50 ; $i++ ) {
print "*";
}
print "\n";
print "+0.50 to +0.75: ";
for ( $i = 0 ; $i < $p50to75 ; $i++ ) {
print "*";
}
print "\n";
print "+0.75 to +1.00: ";
for ( $i = 0 ; $i < $p75to100 ; $i++ ) {
print "*";
}
print "\n";
But the terminal returns
Invalid version format (version required) at /Library/Perl/5.18/Module/Runtime.pm line 386.
BEGIN failed--compilation aborted at /Library/Perl/5.18/Net/Twitter.pm line 3.
Compilation failed in require at song_hanlun_hw8.pl line 3.
BEGIN failed--compilation aborted at song_hanlun_hw8.pl line 3.
Could someone tell me how to fix this?
After some Google search I highly believe downgrading the Module::Runtime would solve this issue. But I could't find how.
Eventually I solved this issue by delete the Runtime module file in perl folder and reinstalled the Net::Twitter module, which also reinstalled the Runtime that Net::Twitter depends on.
I'm trying to count the number of bases using a for loop and the substr function but the counts are off and I'm not sure why! Please help! I have to use these functions in my assignment. Where am I going wrong? Here is my code:
use strict;
use warnings;
my $user_input = "accgtutf5";
#initalizing the lengths
my $a_base_total = 0;
my $c_base_total = 0;
my $g_base_total = 0;
my $t_base_total = 0;
my $other_total = 0;
for ( my $position = 0; $position < length $user_input; $position++ ) {
my $nucleotide = substr( $user_input, $position, 1 );
if ( $nucleotide eq "a" ) {
$a_base_total++;
} elsif ( $nucleotide eq "c" ) {
$c_base_total++;
} elsif ( $nucleotide eq "g" ) {
$g_base_total++;
} elsif ( $nucleotide eq "t" ) {
$t_base_total++;
} else {
$other_total++;
}
$position++;
}
print "a = $a_base_total\n";
print "c = $c_base_total\n";
print "g = $g_base_total\n";
print "t = $t_base_total\n";
print "other = $other_total\n";
The output I'm getting is :
a=1
c=1
g=0
t=2
other=1
When it should be:
a = 1
c = 2
g = 1
t = 2
other = 3
Thanks in advance! :)
You're incrementing twice.
Simply remove this line:
$position++;
Also, instead of iterating on position, I would suggest iterating on character.
Your script can be simplified to just:
use strict;
use warnings;
my $user_input = "accgtutf5";
my %count;
for my $nucleotide (split '', $user_input) {
$nucleotide = 'other' unless $nucleotide =~ /[acgt]/;
$count{$nucleotide}++;
}
printf "%s = %d\n", $_, $count{$_} // 0 for qw(a c g t other);
You are incrementing $position twice: once at the for and once at the end of the loop. Remove the second $position++.
I would like to write a Perl function that gets a GFF3 filename and a range (i.e. 100000 .. 2000000). and returns a reference to an array containing all names/accessions of genes found in this range.
I guess using bioperl will make sense, but I have very little experience with it. I can write a script that parses a GFF3 by my self, but if using bioperl (or another packagae) is not too complicated - I'd rather reuse their code.
use Bio::Tools::GFF;
my $range_start = 100000;
my $range_end = 200000;
my #features_in_range = ( );
my $gffio = Bio::Tools::GFF->new(-file => $gff_file, -gff_version => 3);
while (my $feature = $gffio->next_feature()) {
## What about features that are not contained within the coordinate range but
## do overlap it? Such features won't be caught by this check.
if (
($feature->start() >= $range_start)
&&
($feature->end() <= $range_end)
) {
push #features_in_range, $feature;
}
}
$gffio->close();
DISCLAIMER: Naive implementation. I just banged that out, it's had no testing. I won't even guarantee it compiles.
You do want to use BioPerl for this, using possibly the Bio::Tools::GFF module.
You should really ask on the BioPerl mailing list. It's very friendly and the subscribers are very knowledgeable -- they'll definitely be able to help you. And once you do get an answer (and if you don't get one here first), I suggest answering your own question here with the answer so we can all benefit!
The following function takes a hash of targets and ranges and returns a function that will iterate over all targets that overlap any of the ranges. The targets should be a reference to an array of references:
my $targets =
[
[
$start,
$end,
],
...,
]
The ranges should be a reference to an array of hashes:
my $ranges =
[
{
seqname => $seqname,
source => $source,
feature => $feature,
start => $start,
end => $end,
score => $score,
strand => $strand,
frame => $frame,
attribute => $attribute,
},
...,
]
You can, of course, just pass a single target.
my $brs_iterator
= binary_range_search( targets => $targets, ranges => $ranges );
while ( my $gff_line = $brs_iterator->() ) {
# do stuff
}
sub binary_range_search {
my %options = #_;
my $targets = $options{targets} || croak 'Need a targets parameter';
my $ranges = $options{ranges} || croak 'Need a ranges parameter';
my ( $low, $high ) = ( 0, $#{$ranges} );
my #iterators = ();
TARGET:
for my $range (#$targets) {
RANGE_CHECK:
while ( $low <= $high ) {
my $try = int( ( $low + $high ) / 2 );
$low = $try + 1, next RANGE_CHECK
if $ranges->[$try]{end} < $range->[0];
$high = $try - 1, next RANGE_CHECK
if $ranges->[$try]{start} > $range->[1];
my ( $down, $up ) = ($try) x 2;
my %seen = ();
my $brs_iterator = sub {
if ( $ranges->[ $up + 1 ]{end} >= $range->[0]
and $ranges->[ $up + 1 ]{start} <= $range->[1]
and !exists $seen{ $up + 1 } )
{
$seen{ $up + 1 } = undef;
return $ranges->[ ++$up ];
}
elsif ( $ranges->[ $down - 1 ]{end} >= $range->[0]
and $ranges->[ $down - 1 ]{start} <= $range->[1]
and !exists $seen{ $down - 1 }
and $down > 0 )
{
$seen{ $down - 1 } = undef;
return $ranges->[ --$down ];
}
elsif ( !exists $seen{$try} ) {
$seen{$try} = undef;
return $ranges->[$try];
}
else {
return;
}
};
push #iterators, $brs_iterator;
next TARGET;
}
}
# In scalar context return master iterator that iterates over the list of range iterators.
# In list context returns a list of range iterators.
return wantarray
? #iterators
: sub {
while (#iterators) {
if ( my $range = $iterators[0]->() ) {
return $range;
}
shift #iterators;
}
return;
};
}
I have a function, binary_range_search, that is called like so:
my $brs_iterator = binary_range_search(
target => $range, # eg. [1, 200]
search => $ranges # eg. [ {start => 1, end => 1000},
); # {start => 500, end => 1500} ]
brs_iterator->() will iterate over all #$ranges on which $range overlaps.
I would like to extend binary_range_search to be able to call it with multiple ranges as its target, eg:
target => $target_ranges # eg. [ [1, 200], [50, 300], ... ]
search => $search_ranges # as above
So, when the search on $range->[0] is exhausted, it should move on to $range->[1], and so on. Here is the function in question, in its original form:
sub binary_range_search {
my %options = #_;
my $range = $options{target} || return;
my $ranges = $options{search} || return;
my ( $low, $high ) = ( 0, #{$ranges} - 1 );
while ( $low <= $high ) {
my $try = int( ( $low + $high ) / 2 );
$low = $try + 1, next if $ranges->[$try]{end} < $range->[0];
$high = $try - 1, next if $ranges->[$try]{start} > $range->[1];
my ( $down, $up ) = ($try) x 2;
my %seen = ();
my $brs_iterator = sub {
if ( $ranges->[ $up + 1 ]{end} >= $range->[0]
and $ranges->[ $up + 1 ]{start} <= $range->[1]
and !exists $seen{ $up + 1 } )
{
$seen{ $up + 1 } = undef;
return $ranges->[ ++$up ];
}
elsif ( $ranges->[ $down - 1 ]{end} >= $range->[0]
and $ranges->[ $down + 1 ]{start} <= $range->[1]
and !exists $seen{ $down - 1 }
and $down > 0 )
{
$seen{ $down - 1 } = undef;
return $ranges->[ --$down ];
}
elsif ( !exists $seen{$try} ) {
$seen{$try} = undef;
return $ranges->[$try];
}
else {
return;
}
};
return $brs_iterator;
}
return sub { };
}
It's a standard binary search strategy, until it finds an overlapping range. It then moves on the right, exhausts it, moves on the left, exhausts it, and finally gives up. Ideally, it should then maybe shift the next target range, and redo the search, I suppose (perhaps via recursion?). My problem is, I am not sure how to make that work with the iterator construction.
I just wrapped your iterator generation in a for loop, and built up an array of iterator functions.
Depending on context, I either return a master iterator or a list of iterator functions. I wasn't sure what you wanted.
use strict;
use warnings;
my $t = [ [1,200], [400,900] ];
my #r = (
{ start => 1, end => 100 },
{ start => 2, end => 500 },
{ start => 204, end => 500 },
{ start => 208, end => 500 },
{ start => 215, end => 1000 },
{ start => 150, end => 1000 },
{ start => 500, end => 1100 },
);
# Get a master iterator that will process each iterator in turn.
my $brs_iterator = binary_range_search(
targets => $t,
search => \#r,
);
# Get an array of iterators
my #brs_iterator = binary_range_search(
targets => $t,
search => \#r,
);
sub binary_range_search {
my %options = #_;
my $targets = $options{targets} || return;
my $ranges = $options{search} || return;
my #iterators;
TARGET:
for my $target ( #$targets ) {
my ( $low, $high ) = ( 0, $#{$ranges} );
RANGE_CHECK:
while ( $low <= $high ) {
my $try = int( ( $low + $high ) / 2 );
# Remove non-overlapping ranges
$low = $try + 1, next RANGE_CHECK
if $ranges->[$try]{end} < $target->[0];
$high = $try - 1, next RANGE_CHECK
if $ranges->[$try]{start} > $target->[1];
my ( $down, $up ) = ($try) x 2;
my %seen = ();
my $brs_iterator = sub {
if ( exists $ranges->[$up + 1]
and $ranges->[ $up + 1 ]{end} >= $target->[0]
and $ranges->[ $up + 1 ]{start} <= $target->[1]
and !exists $seen{ $up + 1 } )
{
$seen{ $up + 1 } = undef;
return $ranges->[ ++$up ];
}
elsif ( $ranges->[ $down - 1 ]{end} >= $target->[0]
and $ranges->[ $down + 1 ]{start} <= $target->[1]
and !exists $seen{ $down - 1 }
and $down > 0 )
{
$seen{ $down - 1 } = undef;
return $ranges->[ --$down ];
}
elsif ( !exists $seen{$try} ) {
$seen{$try} = undef;
return $ranges->[$try];
}
else {
return;
}
};
push #iterators, $brs_iterator;
next TARGET;
}
}
# In scalar context return master iterator that iterates over the list of range iterators.
# In list context returns a list of range iterators.
return wantarray
? #iterators
: sub {
while( #iterators ) {
if( my $range = $iterators[0]() ) {
return $range;
}
shift #iterators;
}
return;
};
}
If you're wanting to iterate over all values that overlap the search ranges, you don't need binary search.
First the customary front matter:
use warnings;
use strict;
use Carp;
First off, check that we have target and search parameters and that for each range, the starting point is no greater than its ending point. Otherwise, we refuse to proceed.
sub binary_range_search {
my %arg = #_;
my #errors;
my $target = $arg{target} || push #errors => "no target";
my $search = $arg{search} || push #errors => "no search";
for (#$target) {
my($start,$end) = #$_;
push #errors => "Target start ($start) is greater than end ($end)"
if $start > $end;
}
for (#$search) {
my($start,$end) = #{$_}{qw/ start end /};
push #errors => "Search start ($start) is greater than end ($end)"
if $start > $end;
}
croak "Invalid use of binary_range_search:\n",
map " - $_\n", #errors
if #errors;
The iterator itself is a closure that maintains the following state:
my $i;
my($ta,$tb);
my($sa,$sb);
my $si = 0;
where
$i if defined is the next value from the current overlapping range
$ta and $tb are the starting and ending points for the current target range
$sa and $sb are like the above but for the current search range
$si is an index into #$search and defines the current search range
We will be assigning and returning the iterator $it. The declaration and initialization are separate so the iterator may call itself when necessary.
my $it;
$it = sub {
We are done if no more target ranges remain or if there were no search ranges to begin with.
return unless #$target && #$search;
When $i is defined, it means we have found an overlap and iterate by incrementing $i until it is greater than the ending point of either the current target range or the current search range.
if (defined $i) {
# iterating within a target range
if ($i > $tb || $i > $sb) {
++$si;
undef $i;
return $it->();
}
else {
return $i++;
}
}
Otherwise, we need to determine whether the next target range overlaps any search range. However, if $i is undefined and we've already considered all the search ranges, we discard the current target range and start again.
else {
# does the next target range overlap?
if ($si >= #$search) {
shift #$target;
$si = 0;
return $it->();
}
Here we pull out the starting and ending points of both the current target range (always at the front of #$target) and the current search range (indexed by $si).
($ta,$tb) = #{ $target->[0] };
($sa,$sb) = #{ $search->[$si] }{qw/ start end /};
Now testing for overlap is straightforward. For disjoint search ranges, we ignore and move on. Otherwise, we find the leftmost point in the overlap and iterate from there.
if ($sb < $ta || $sa > $tb) {
# disjoint
++$si;
undef $i;
return $it->();
}
elsif ($sa >= $ta) {
$i = $sa;
return $i++;
}
elsif ($ta >= $sa) {
$i = $ta;
return $i++;
}
}
};
Finally, we return the iterator:
$it;
}
For an example similar to the one in your question
my $it = binary_range_search(
target => [ [1, 200], [50, 300] ],
search => [ { start => 1, end => 1000 },
{ start => 500, end => 1500 },
{ start => 40, end => 60 },
{ start => 250, end => 260 } ],
);
while (defined(my $value = $it->())) {
print "got $value\n";
}
Its output with internal points elided is
got 1
[...]
got 200
got 40
[...]
got 60
got 50
[...]
got 300
got 50
[...]
got 60
got 250
[...]
got 260
Split it into two functions, an outer function that loops over the ranges and calls an inner function that implements a conventional binary chop.
Warning: a very c++ biased answer:
what you have to do is define a new type of iterator, which is a pair of a usual iterator, and a segmemt iterrator (if you don't have a segment iterator, it's a pair of a const pointer / ref to the segments, and an index pointing to the correct segment). You have to define all the concepts of a random access iterator (difference, addition of integer, etc). bear in mind, that at least in c++ lingo this is not a true random iterator, since addition of an integer is not really constant time; such is life.