Shipping Handling Charge CGI/Perl - perl

I want to add a flat $25 handling fee for Alaska (AK) and Hawaii (HI) - my test breaks when I add the states and flat fee to the shipping matrix below. Can someone point me in the right direction?
my $totalPounds = sprintf("%.2f",($totalWeight / 16));
#my $shipping = &getShipUPS($totalPounds, $zip, $shipType);
if ($subtotal <= 24.99) {$shipping = '10.95';}
elsif (($subtotal > 24.99) && ($subtotal <= 74.99)) {$shipping = '13.95';}
elsif (($subtotal > 74.99) && ($subtotal <= 149.99)) {$shipping = '14.95';}
elsif ($subtotal >= $150) {$shipping = '18.95';}
elsif ($state eq 'HI','AK') ($subtotal <= 24.99) {$shipping = '10.95'+'25.00';}
elsif ($state eq 'HI','AK') (($subtotal > 24.99) && ($subtotal <= 74.99)) {$shipping = '13.95'+'25.00';}
elsif ($state eq 'HI','AK') (($subtotal > 74.99) && ($subtotal <= 149.99)) {$shipping = '14.95'+'25.00';}
elsif ($state eq 'HI','AK') ($subtotal >= $150) {$shipping = '18.95'+'25.00';}else
$shipping = sprintf("%.2f", $shipping);
my $total = $subtotal + $tax + $shipping;
$subtotal = sprintf("%.2f", $subtotal);
$total = sprintf("%.2f", $total);

You cannot use multiple parameters with eq like this
$state eq 'HI','AK'
You need to do
$state eq 'HI' or $state eq 'AK'
ALso, you cannot put another parenthesis after the first after elsif like this
elsif ($state eq 'HI','AK') ($subtotal >= $150)
You need to do
elsif ( ($state eq 'HI' or $state eq 'AK') or ($subtotal >= $150) )
# ^---- main parantheses -------^
Of course, the smarter choice might be to use a hash
%extra_charges = ( AK => 25,
HI => 25,
# etc
);
...
$subtotal += $extra_charges{$state}; # assuming no missing states
The if-else logic is also all kinds of redundant. This ought to be the equivalent of your code:
if ($subtotal <= 24.99) { $shipping = '10.95' }
elsif ($subtotal <= 74.99) { $shipping = '13.95' }
elsif ($subtotal <= 149.99) { $shipping = '14.95' }
else { $shipping = '18.95' }
if ($state eq 'AK' or $state eq 'HI') { $shipping += 25 }
Those meandering forests of ifs are enough to make one dizzy, and most of them were not required. If a value is not less than or equal to 24.99, it must be bigger than 24.99, so no need to double check that.

That code is a total mess, has multiple syntax errors, and violates DRY.
It would be best to first calculate the basic shipping fee, depending on the subtotal. In a second step you add the $25 charge if the state is Hawaii or Alaska:
my #shipping_fees = (
# max subtotal => fee
[ 24.99 => 10.95 ],
[ 74.99 => 13.95 ],
[ 149.99 => 14.95 ],
[ inf => 18.95 ],
);
my %extra_fees_per_state = (
AK => 25.00,
HI => 25.00,
);
Then:
my $shipping;
for my $shipping_fee (#shipping_fees) {
my ($max, $fee) = #$shipping_fee;
if ($subtotal <= $max) {
$shipping = $fee;
last;
}
}
if (defined( my $extra = $extra_fees_per_state{$state})) {
$shipping += $extra;
}

Related

Perl invalid version format error

I am trying to finish a homework for my class. After I installed a API::Twitter module using CPAN, the Net::Twitter module seems to be affected and can no longer be used.
I tried to run this code in my Mac terminal:
song_hanlun_hw8.pl
#!/usr/bin/perl
use Net::Twitter;
use JSON;
use LWP::Simple;
use XML::Bare;
use Data::Dumper;
# keys for twitter
$consumer_key = "key";
$consumer_secret = "key";
$token = "key-key";
$token_secret = "key";
# keys for sentiment analysis
$apikey = "key";
# As of 13-Aug-2010, Twitter requires OAuth for authenticated requests
my $nt = Net::Twitter->new(
traits => [qw/API::RESTv1_1/],
# traits => [qw/API::Search/],
consumer_key => $consumer_key,
consumer_secret => $consumer_secret,
access_token => $token,
access_token_secret => $token_secret,
);
#enter a term you want to search for
$search_term = "Halloween";
$count = 100;
my $r = $nt->search( { q => $search_term, count => $count } );
# print Dumper $r;
for my $status ( #{ $r->{statuses} } ) {
push #tweets, $status->{text};
}
$nextMaxId = $r->{search_metadata}->{next_results};
$nextMaxId =~ s/\?max_id=//g;
$nextMaxId =~ s/&q=Halloween&count=100&include_entities=1//g;
# print $nextMaxId;
my $r2 = $nt->search( { q => $search_term, count => $count, max_id => $nextMaxId } );
for my $status ( #{ $r2->{statuses} } ) {
push #tweets, $status->{text};
}
foreach $tweet ( #tweets ) {
# $tweet = $tweets[1];
$return = get "http://gateway-a.watsonplatform.net/calls/text/TextGetTextSentiment?apikey=$apikey&text=$tweet&outputMode=xml";
$bare = new XML::Bare( text => $return );
$root = $bare->parse();
$sentiment = $root->{results}->{docSentiment}->{score}->{value};
push #sentiments, $sentiment;
# print "$sentiment";
}
$m100to75 = 0;
$m75to50 = 0;
$m50to25 = 0;
$m25to0 = 0;
$p0to25 = 0;
$p25to50 = 0;
$p50to75 = 0;
$p75to100 = 0;
foreach $sent ( #sentiments ) {
# smaller than zero. four ranges.
if ( $sent >= -1 && $sent < -0.75 ) {
$m100to75++;
}
if ( $sent >= -0.75 && $sent < -0.5 ) {
$m75to50++;
}
if ( $sent >= -0.5 && $sent < -0.25 ) {
$m50to25++;
}
if ( $sent >= -0.25 && $sent < 0 ) {
$m25to0++;
}
#bigger than zero. four ranges.
if ( $sent >= 0 && $sent < 0.25 ) {
$p0to25++;
}
if ( $sent >= 0.25 && $sent < 0.5 ) {
$p25to50++;
}
if ( $sent >= 0.5 && $sent < 0.75 ) {
$p50to75++;
}
if ( $sent >= 0.75 && $sent < 1 ) {
$p75to100++;
}
}
# print "$m100to75\n$m75to50\n$m50to25\n$m25to0\n$p0to25\n$p25to50\n$p50to75\n$p75to100\n";
print "tweets sentiment score summary histogram:\n";
print "-1.00 to -0.75: ";
for ( $i = 0 ; $i < $m100to75 ; $i++ ) {
print "*";
}
print "\n";
print "-0.75 to -0.50: ";
for ( $i = 0 ; $i < $m75to50 ; $i++ ) {
print "*";
}
print "\n";
print "-0.50 to -0.25: ";
for ( $i = 0 ; $i < $m50to25 ; $i++ ) {
print "*";
}
print "\n";
print "-0.25 to -0.00: ";
for ( $i = 0 ; $i < $m25to0 ; $i++ ) {
print "*";
}
print "\n";
print "+0.00 to +0.25: ";
for ( $i = 0 ; $i < $p0to25 ; $i++ ) {
print "*";
}
print "\n";
print "+0.25 to +0.50: ";
for ( $i = 0 ; $i < $p25to50 ; $i++ ) {
print "*";
}
print "\n";
print "+0.50 to +0.75: ";
for ( $i = 0 ; $i < $p50to75 ; $i++ ) {
print "*";
}
print "\n";
print "+0.75 to +1.00: ";
for ( $i = 0 ; $i < $p75to100 ; $i++ ) {
print "*";
}
print "\n";
But the terminal returns
Invalid version format (version required) at /Library/Perl/5.18/Module/Runtime.pm line 386.
BEGIN failed--compilation aborted at /Library/Perl/5.18/Net/Twitter.pm line 3.
Compilation failed in require at song_hanlun_hw8.pl line 3.
BEGIN failed--compilation aborted at song_hanlun_hw8.pl line 3.
Could someone tell me how to fix this?
After some Google search I highly believe downgrading the Module::Runtime would solve this issue. But I could't find how.
Eventually I solved this issue by delete the Runtime module file in perl folder and reinstalled the Net::Twitter module, which also reinstalled the Runtime that Net::Twitter depends on.

perl blowing up in sequence alignment by dynamic programming

I'm comparing a reference sequence of size 5500 bases and query sequence of size 3600, using dynamic programming (semi global alignment), in fact I don't know much about complexity and performance and the code is blowing up and giving me the error "out of memory". Knowing that it works normally on smaller sequences, my question is: This behavior is normal or I might have another problem in code ?if it's normal any hint to solve this problem ? Thanks in advance.
sub semiGlobal {
my ( $seq1, $seq2,$MATCH,$MISMATCH,$GAP ) = #_;
# initialization: first row to 0 ;
my #matrix;
$matrix[0][0]{score} = 0;
$matrix[0][0]{pointer} = "none";
for ( my $j = 1 ; $j <= length($seq1) ; $j++ ) {
$matrix[0][$j]{score} = 0;
$matrix[0][$j]{pointer} = "none";
}
for ( my $i = 1 ; $i <= length($seq2) ; $i++ ) {
$matrix[$i][0]{score} = $GAP * $i;
$matrix[$i][0]{pointer} = "up";
}
# fill
my $max_i = 0;
my $max_j = 0;
my $max_score = 0;
print "seq2: ".length($seq2);
print "seq1: ".length($seq1);
for ( my $i = 1 ; $i <= length($seq2) ; $i++ ) {
for ( my $j = 1 ; $j <= length($seq1) ; $j++ ) {
my ( $diagonal_score, $left_score, $up_score );
# calculate match score
my $letter1 = substr( $seq1, $j - 1, 1 );
my $letter2 = substr( $seq2, $i - 1, 1 );
if ( $letter1 eq $letter2 ) {
$diagonal_score = $matrix[ $i - 1 ][ $j - 1 ]{score} + $MATCH;
}
else {
$diagonal_score = $matrix[ $i - 1 ][ $j - 1 ]{score} + $MISMATCH;
}
# calculate gap scores
$up_score = $matrix[ $i - 1 ][$j]{score} + $GAP;
$left_score = $matrix[$i][ $j - 1 ]{score} + $GAP;
# choose best score
if ( $diagonal_score >= $up_score ) {
if ( $diagonal_score >= $left_score ) {
$matrix[$i][$j]{score} = $diagonal_score;
$matrix[$i][$j]{pointer} = "diagonal";
}
else {
$matrix[$i][$j]{score} = $left_score;
$matrix[$i][$j]{pointer} = "left";
}
}
else {
if ( $up_score >= $left_score ) {
$matrix[$i][$j]{score} = $up_score;
$matrix[$i][$j]{pointer} = "up";
}
else {
$matrix[$i][$j]{score} = $left_score;
$matrix[$i][$j]{pointer} = "left";
}
}
# set maximum score
if ( $matrix[$i][$j]{score} > $max_score ) {
$max_i = $i;
$max_j = $j;
$max_score = $matrix[$i][$j]{score};
}
}
}
my $align1 = "";
my $align2 = "";
my $j = $max_j;
my $i = $max_i;
while (1) {
if ( $matrix[$i][$j]{pointer} eq "none" ) {
$stseq1 = $j;
last;
}
if ( $matrix[$i][$j]{pointer} eq "diagonal" ) {
$align1 .= substr( $seq1, $j - 1, 1 );
$align2 .= substr( $seq2, $i - 1, 1 );
$i--;
$j--;
}
elsif ( $matrix[$i][$j]{pointer} eq "left" ) {
$align1 .= substr( $seq1, $j - 1, 1 );
$align2 .= "-";
$j--;
}
elsif ( $matrix[$i][$j]{pointer} eq "up" ) {
$align1 .= "-";
$align2 .= substr( $seq2, $i - 1, 1 );
$i--;
}
}
$align1 = reverse $align1;
$align2 = reverse $align2;
return ( $align1, $align2, $stseq1 ,$max_j);
}
One way to possibly solve the problem is to tie the #matrix with a file. However, this will dramatically slow down the program. Consider this:
sub semiGlobal {
use Tie::Array::CSV;
tie my #matrix, 'Tie::Array::CSV', 'temp.txt'; # Don't forget to add your own error handler.
my ( $seq1, $seq2,$MATCH,$MISMATCH,$GAP ) = #_;
# initialization: first row to 0 ;
$matrix[0][0] = '0 n';
for ( my $j = 1 ; $j <= length($seq1) ; $j++ ) {
$matrix[0][$j] = '0 n';
}
for ( my $i = 1 ; $i <= length($seq2) ; $i++ ) {
my $score = $GAP * $i;
$matrix[$i][0] = join ' ',$score,'u';
}
#print Dumper(\#matrix);
# fill
my $max_i = 0;
my $max_j = 0;
my $max_score = 0;
print "seq2: ".length($seq2)."\n";
print "seq1: ".length($seq1)."\n";
for ( my $i = 1 ; $i <= length($seq2) ; $i++ ) {
for ( my $j = 1 ; $j <= length($seq1) ; $j++ ) {
my ( $diagonal_score, $left_score, $up_score );
# calculate match score
my $letter1 = substr( $seq1, $j - 1, 1 );
my $letter2 = substr( $seq2, $i - 1, 1 );
my $score = (split / /, $matrix[ $i - 1 ][ $j - 1 ])[0];
if ( $letter1 eq $letter2 ) {
$diagonal_score = $score + $MATCH;
}
else {
$diagonal_score = $score + $MISMATCH;
}
# calculate gap scores
$up_score = (split / /,$matrix[ $i - 1 ][$j])[0] + $GAP;
$left_score = (split / /,$matrix[$i][ $j - 1 ])[0] + $GAP;
# choose best score
if ( $diagonal_score >= $up_score ) {
if ( $diagonal_score >= $left_score ) {
$matrix[$i][$j] = join ' ',$diagonal_score,'d';
}
else {
$matrix[$i][$j] = join ' ', $left_score, 'l';
}
}
else {
if ( $up_score >= $left_score ) {
$matrix[$i][$j] = join ' ', $up_score, 'u';
}
else {
$matrix[$i][$j] = join ' ', $left_score, 'l';
}
}
# set maximum score
if ( (split / /, $matrix[$i][$j])[0] > $max_score ) {
$max_i = $i;
$max_j = $j;
$max_score = (split / /, $matrix[$i][$j])[0];
}
}
}
my $align1 = "";
my $align2 = "";
my $stseq1;
my $j = $max_j;
my $i = $max_i;
while (1) {
my $pointer = (split / /, $matrix[$i][$j])[1];
if ( $pointer eq "n" ) {
$stseq1 = $j;
last;
}
if ( $pointer eq "d" ) {
$align1 .= substr( $seq1, $j - 1, 1 );
$align2 .= substr( $seq2, $i - 1, 1 );
$i--;
$j--;
}
elsif ( $pointer eq "l" ) {
$align1 .= substr( $seq1, $j - 1, 1 );
$align2 .= "-";
$j--;
}
elsif ( $pointer eq "u" ) {
$align1 .= "-";
$align2 .= substr( $seq2, $i - 1, 1 );
$i--;
}
}
$align1 = reverse $align1;
$align2 = reverse $align2;
untie #matrix; # Don't forget to add your own error handler.
unlink 'temp.txt'; # Don't forget to add your own error handler.
return ( $align1, $align2, $stseq1 ,$max_j);
}
You can still use your original sub for short sequences, and switch to this sub for long ones.
I think that #j_random_hacker and #Ashalynd are on the right track regarding using this algorithm in most Perl implementations. The datatypes you're using are going to use more memory that absolutely needed for the calculations.
So this is "normal" in that you should expect to see this kind of memory usage for how you've written this algorithm in perl. You may have other problems in surrounding code that are using a lot of memory but this algorithm will hit your memory hard with large sequences.
You can address some of the memory issues by changing the datatypes that you're using as #Ashalynd suggests. You could try changing the hash which holds score and pointer into an array and changing the string pointers into integer values. Something like this might get you some benefit while still maintaining readability:
use strict;
use warnings;
# define constants for array positions and pointer values
# so the code is still readable.
# (If you have the "Readonly" CPAN module you may want to use it for constants
# instead although none of the downsides of the "constant" pragma apply in this code.)
use constant {
SCORE => 0,
POINTER => 1,
DIAGONAL => 0,
LEFT => 1,
UP => 2,
NONE => 3,
};
...
sub semiGlobal2 {
my ( $seq1, $seq2,$MATCH,$MISMATCH,$GAP ) = #_;
# initialization: first row to 0 ;
my #matrix;
# score and pointer are now stored in an array
# using the defined constants as indices
$matrix[0][0][SCORE] = 0;
# pointer value is now a constant integer
$matrix[0][0][POINTER] = NONE;
for ( my $j = 1 ; $j <= length($seq1) ; $j++ ) {
$matrix[0][$j][SCORE] = 0;
$matrix[0][$j][POINTER] = NONE;
}
for ( my $i = 1 ; $i <= length($seq2) ; $i++ ) {
$matrix[$i][0][SCORE] = $GAP * $i;
$matrix[$i][0][POINTER] = UP;
}
... # continue to make the appropriate changes throughout the code
However, when I tested this I didn't get a huge benefit when attempting to align a 3600 char string in a 5500 char string of random data. I programmed my code to abort when it consumed more than 2GB of memory. The original code aborted after 23 seconds while the one using constants and an array instead of a hash aborted after 32 seconds.
If you really want to use this specific algorithm I'd check out the performance of Algorithm::NeedlemanWunsch. It doesn't look like it's very mature but it may have addressed your performance issues. Otherwise look into writing an Inline or Perl XS wrapper around a C implementation

Add time script Perl

I would like to create a script that will receive 2 paramerters (hours and minutes) ( HH1:MN1 and HH2:MN2)
It has to valid if the #ARGV = 2
Valide if the time provide is correct (hours between 0 to 200 and minutes between 0 to 59)
Add those thow time and print to results
If it is more than 24 hr to print Nbday; HH:Min
if it is more than 7 days it will print Week; nddays; HH:Min.
I started with this but cant figureout how to continue
Any help or idea will be welcomed for the calculation
Thanks
#!/usr/bin/perl
if ($#ARGV != 2)
{
print STDERR "Erreur Parameters have to be 2\n";
exit (-1);
}
if ($ARGV[0] = ~ / ([0-9] | 1 [0-9] ? [0-9] | 200 ) : ( [0-5] ? [0-9] ) /)
{
$heures1 = $1;
$minutes1 = $2;
}
else
{
print STDERR "first parameter invalid\";
exit (-1);
}
if ($ARGV[1] = ~ / ([0-9] | 1 [0-9] ? [0-9] | 200 ) : ( [0-5] ? [0-9] ) /)
{
$heures2 = $3;
$minutes2 = $4;
}
`else `
{
print STDERR "Second parameter invalid\";
exit (-1);
$heures = $heures1 + $heures2;
$minutes = $minutes1 + $minutes2'
The validation code is pretty straightforward:
sub usage {
print STDERR $_[0] if #_;
print STDERR "usage: ...\n";
exit(1);
}
usage() if #ARGV != 2;
my ($hours1, $minutes1) = $ARGV[1] =~ /^([0-9]+):([0-9]+)\z/ or usage();
my ($hours2, $minutes2) = $ARGV[1] =~ /^([0-9]+):([0-9]+)\z/ or usage();
0 <= $hours1 && $hours1 <= 200 or usage("Invalid number of hours for first argument\n");
0 <= $minutes1 && $minutes1 <= 59 or usage("Invalid number of minutes for first argument\n");
0 <= $hours2 && $hours2 <= 200 or usage("Invalid number of hours for second argument\n");
0 <= $minutes2 && $minutes2 <= 59 or usage("Invalid number of minutes for second argument\n");
The range check can be done by regex, but it's error prone and unreadable.
/^0*(0|1[0-9]{0,2}|2(?:00?|[1-9])?|[3-9][0-9]?):0*(0|[1-5][0-9]?|[6-9])\z/
(The regex could be a little simpler, but it's written to virtually eliminate the possibility of backtracking.)
You already asked and we gracefully provided solutions to the math part, so why are you asking again?
my ($hours1, $minutes1) = split /:/, $arg1;
my ($hours2, $minutes2) = split /:/, $arg2;
my $hours = $hours1 + $hours2;
my $minutes = $minutes1 + $minutes2;
$hours += ($minutes - ($minutes % 60)) / 60; $minutes %= 60;
my $days = ($hours - ($hours % 24)) / 24; $hours %= 24;
my $weeks = ($days - ($days % 7)) / 7; $days %= 7;
As for the output part, you should be able to manage on your own. One useful tip:
sprintf('%02d', $minutes) # 0-padded to two digits
#!/usr/bin/perl
die "Erreur Parameters have to be 2" if (scalar(#ARGV) != 2)
if ($ARGV[0] =~ /^([0-9]|1[0-9]?[0-9]|200):([0-5]?[0-9])$/) {
$heures1 = $1;
$minutes1 = $2;
} else {
die "first parameter invalid";
}
if ($ARGV[1] =~ /^([0-9]|1[0-9]?[0-9]|200):([0-5]?[0-9])$/) {
$heures2 = $3;
$minutes2 = $4;
} else {
die "Second parameter invalid";
}
$heures = $heures1 + $heures2;
$minutes = $minutes1 + $minutes2'

How to get all feature in a range from a GFF3 file in Perl?

I would like to write a Perl function that gets a GFF3 filename and a range (i.e. 100000 .. 2000000). and returns a reference to an array containing all names/accessions of genes found in this range.
I guess using bioperl will make sense, but I have very little experience with it. I can write a script that parses a GFF3 by my self, but if using bioperl (or another packagae) is not too complicated - I'd rather reuse their code.
use Bio::Tools::GFF;
my $range_start = 100000;
my $range_end = 200000;
my #features_in_range = ( );
my $gffio = Bio::Tools::GFF->new(-file => $gff_file, -gff_version => 3);
while (my $feature = $gffio->next_feature()) {
## What about features that are not contained within the coordinate range but
## do overlap it? Such features won't be caught by this check.
if (
($feature->start() >= $range_start)
&&
($feature->end() <= $range_end)
) {
push #features_in_range, $feature;
}
}
$gffio->close();
DISCLAIMER: Naive implementation. I just banged that out, it's had no testing. I won't even guarantee it compiles.
You do want to use BioPerl for this, using possibly the Bio::Tools::GFF module.
You should really ask on the BioPerl mailing list. It's very friendly and the subscribers are very knowledgeable -- they'll definitely be able to help you. And once you do get an answer (and if you don't get one here first), I suggest answering your own question here with the answer so we can all benefit!
The following function takes a hash of targets and ranges and returns a function that will iterate over all targets that overlap any of the ranges. The targets should be a reference to an array of references:
my $targets =
[
[
$start,
$end,
],
...,
]
The ranges should be a reference to an array of hashes:
my $ranges =
[
{
seqname => $seqname,
source => $source,
feature => $feature,
start => $start,
end => $end,
score => $score,
strand => $strand,
frame => $frame,
attribute => $attribute,
},
...,
]
You can, of course, just pass a single target.
my $brs_iterator
= binary_range_search( targets => $targets, ranges => $ranges );
while ( my $gff_line = $brs_iterator->() ) {
# do stuff
}
sub binary_range_search {
my %options = #_;
my $targets = $options{targets} || croak 'Need a targets parameter';
my $ranges = $options{ranges} || croak 'Need a ranges parameter';
my ( $low, $high ) = ( 0, $#{$ranges} );
my #iterators = ();
TARGET:
for my $range (#$targets) {
RANGE_CHECK:
while ( $low <= $high ) {
my $try = int( ( $low + $high ) / 2 );
$low = $try + 1, next RANGE_CHECK
if $ranges->[$try]{end} < $range->[0];
$high = $try - 1, next RANGE_CHECK
if $ranges->[$try]{start} > $range->[1];
my ( $down, $up ) = ($try) x 2;
my %seen = ();
my $brs_iterator = sub {
if ( $ranges->[ $up + 1 ]{end} >= $range->[0]
and $ranges->[ $up + 1 ]{start} <= $range->[1]
and !exists $seen{ $up + 1 } )
{
$seen{ $up + 1 } = undef;
return $ranges->[ ++$up ];
}
elsif ( $ranges->[ $down - 1 ]{end} >= $range->[0]
and $ranges->[ $down - 1 ]{start} <= $range->[1]
and !exists $seen{ $down - 1 }
and $down > 0 )
{
$seen{ $down - 1 } = undef;
return $ranges->[ --$down ];
}
elsif ( !exists $seen{$try} ) {
$seen{$try} = undef;
return $ranges->[$try];
}
else {
return;
}
};
push #iterators, $brs_iterator;
next TARGET;
}
}
# In scalar context return master iterator that iterates over the list of range iterators.
# In list context returns a list of range iterators.
return wantarray
? #iterators
: sub {
while (#iterators) {
if ( my $range = $iterators[0]->() ) {
return $range;
}
shift #iterators;
}
return;
};
}

How to extend a binary search iterator to consume multiple targets

I have a function, binary_range_search, that is called like so:
my $brs_iterator = binary_range_search(
target => $range, # eg. [1, 200]
search => $ranges # eg. [ {start => 1, end => 1000},
); # {start => 500, end => 1500} ]
brs_iterator->() will iterate over all #$ranges on which $range overlaps.
I would like to extend binary_range_search to be able to call it with multiple ranges as its target, eg:
target => $target_ranges # eg. [ [1, 200], [50, 300], ... ]
search => $search_ranges # as above
So, when the search on $range->[0] is exhausted, it should move on to $range->[1], and so on. Here is the function in question, in its original form:
sub binary_range_search {
my %options = #_;
my $range = $options{target} || return;
my $ranges = $options{search} || return;
my ( $low, $high ) = ( 0, #{$ranges} - 1 );
while ( $low <= $high ) {
my $try = int( ( $low + $high ) / 2 );
$low = $try + 1, next if $ranges->[$try]{end} < $range->[0];
$high = $try - 1, next if $ranges->[$try]{start} > $range->[1];
my ( $down, $up ) = ($try) x 2;
my %seen = ();
my $brs_iterator = sub {
if ( $ranges->[ $up + 1 ]{end} >= $range->[0]
and $ranges->[ $up + 1 ]{start} <= $range->[1]
and !exists $seen{ $up + 1 } )
{
$seen{ $up + 1 } = undef;
return $ranges->[ ++$up ];
}
elsif ( $ranges->[ $down - 1 ]{end} >= $range->[0]
and $ranges->[ $down + 1 ]{start} <= $range->[1]
and !exists $seen{ $down - 1 }
and $down > 0 )
{
$seen{ $down - 1 } = undef;
return $ranges->[ --$down ];
}
elsif ( !exists $seen{$try} ) {
$seen{$try} = undef;
return $ranges->[$try];
}
else {
return;
}
};
return $brs_iterator;
}
return sub { };
}
It's a standard binary search strategy, until it finds an overlapping range. It then moves on the right, exhausts it, moves on the left, exhausts it, and finally gives up. Ideally, it should then maybe shift the next target range, and redo the search, I suppose (perhaps via recursion?). My problem is, I am not sure how to make that work with the iterator construction.
I just wrapped your iterator generation in a for loop, and built up an array of iterator functions.
Depending on context, I either return a master iterator or a list of iterator functions. I wasn't sure what you wanted.
use strict;
use warnings;
my $t = [ [1,200], [400,900] ];
my #r = (
{ start => 1, end => 100 },
{ start => 2, end => 500 },
{ start => 204, end => 500 },
{ start => 208, end => 500 },
{ start => 215, end => 1000 },
{ start => 150, end => 1000 },
{ start => 500, end => 1100 },
);
# Get a master iterator that will process each iterator in turn.
my $brs_iterator = binary_range_search(
targets => $t,
search => \#r,
);
# Get an array of iterators
my #brs_iterator = binary_range_search(
targets => $t,
search => \#r,
);
sub binary_range_search {
my %options = #_;
my $targets = $options{targets} || return;
my $ranges = $options{search} || return;
my #iterators;
TARGET:
for my $target ( #$targets ) {
my ( $low, $high ) = ( 0, $#{$ranges} );
RANGE_CHECK:
while ( $low <= $high ) {
my $try = int( ( $low + $high ) / 2 );
# Remove non-overlapping ranges
$low = $try + 1, next RANGE_CHECK
if $ranges->[$try]{end} < $target->[0];
$high = $try - 1, next RANGE_CHECK
if $ranges->[$try]{start} > $target->[1];
my ( $down, $up ) = ($try) x 2;
my %seen = ();
my $brs_iterator = sub {
if ( exists $ranges->[$up + 1]
and $ranges->[ $up + 1 ]{end} >= $target->[0]
and $ranges->[ $up + 1 ]{start} <= $target->[1]
and !exists $seen{ $up + 1 } )
{
$seen{ $up + 1 } = undef;
return $ranges->[ ++$up ];
}
elsif ( $ranges->[ $down - 1 ]{end} >= $target->[0]
and $ranges->[ $down + 1 ]{start} <= $target->[1]
and !exists $seen{ $down - 1 }
and $down > 0 )
{
$seen{ $down - 1 } = undef;
return $ranges->[ --$down ];
}
elsif ( !exists $seen{$try} ) {
$seen{$try} = undef;
return $ranges->[$try];
}
else {
return;
}
};
push #iterators, $brs_iterator;
next TARGET;
}
}
# In scalar context return master iterator that iterates over the list of range iterators.
# In list context returns a list of range iterators.
return wantarray
? #iterators
: sub {
while( #iterators ) {
if( my $range = $iterators[0]() ) {
return $range;
}
shift #iterators;
}
return;
};
}
If you're wanting to iterate over all values that overlap the search ranges, you don't need binary search.
First the customary front matter:
use warnings;
use strict;
use Carp;
First off, check that we have target and search parameters and that for each range, the starting point is no greater than its ending point. Otherwise, we refuse to proceed.
sub binary_range_search {
my %arg = #_;
my #errors;
my $target = $arg{target} || push #errors => "no target";
my $search = $arg{search} || push #errors => "no search";
for (#$target) {
my($start,$end) = #$_;
push #errors => "Target start ($start) is greater than end ($end)"
if $start > $end;
}
for (#$search) {
my($start,$end) = #{$_}{qw/ start end /};
push #errors => "Search start ($start) is greater than end ($end)"
if $start > $end;
}
croak "Invalid use of binary_range_search:\n",
map " - $_\n", #errors
if #errors;
The iterator itself is a closure that maintains the following state:
my $i;
my($ta,$tb);
my($sa,$sb);
my $si = 0;
where
$i if defined is the next value from the current overlapping range
$ta and $tb are the starting and ending points for the current target range
$sa and $sb are like the above but for the current search range
$si is an index into #$search and defines the current search range
We will be assigning and returning the iterator $it. The declaration and initialization are separate so the iterator may call itself when necessary.
my $it;
$it = sub {
We are done if no more target ranges remain or if there were no search ranges to begin with.
return unless #$target && #$search;
When $i is defined, it means we have found an overlap and iterate by incrementing $i until it is greater than the ending point of either the current target range or the current search range.
if (defined $i) {
# iterating within a target range
if ($i > $tb || $i > $sb) {
++$si;
undef $i;
return $it->();
}
else {
return $i++;
}
}
Otherwise, we need to determine whether the next target range overlaps any search range. However, if $i is undefined and we've already considered all the search ranges, we discard the current target range and start again.
else {
# does the next target range overlap?
if ($si >= #$search) {
shift #$target;
$si = 0;
return $it->();
}
Here we pull out the starting and ending points of both the current target range (always at the front of #$target) and the current search range (indexed by $si).
($ta,$tb) = #{ $target->[0] };
($sa,$sb) = #{ $search->[$si] }{qw/ start end /};
Now testing for overlap is straightforward. For disjoint search ranges, we ignore and move on. Otherwise, we find the leftmost point in the overlap and iterate from there.
if ($sb < $ta || $sa > $tb) {
# disjoint
++$si;
undef $i;
return $it->();
}
elsif ($sa >= $ta) {
$i = $sa;
return $i++;
}
elsif ($ta >= $sa) {
$i = $ta;
return $i++;
}
}
};
Finally, we return the iterator:
$it;
}
For an example similar to the one in your question
my $it = binary_range_search(
target => [ [1, 200], [50, 300] ],
search => [ { start => 1, end => 1000 },
{ start => 500, end => 1500 },
{ start => 40, end => 60 },
{ start => 250, end => 260 } ],
);
while (defined(my $value = $it->())) {
print "got $value\n";
}
Its output with internal points elided is
got 1
[...]
got 200
got 40
[...]
got 60
got 50
[...]
got 300
got 50
[...]
got 60
got 250
[...]
got 260
Split it into two functions, an outer function that loops over the ranges and calls an inner function that implements a conventional binary chop.
Warning: a very c++ biased answer:
what you have to do is define a new type of iterator, which is a pair of a usual iterator, and a segmemt iterrator (if you don't have a segment iterator, it's a pair of a const pointer / ref to the segments, and an index pointing to the correct segment). You have to define all the concepts of a random access iterator (difference, addition of integer, etc). bear in mind, that at least in c++ lingo this is not a true random iterator, since addition of an integer is not really constant time; such is life.