Trying to Develop PostFix Notation in Tree Using Perl - perl

I'm using Perl to run through a tree, and then calculate the leaf nodes of the tree using the internal nodes as operators. I want to be able to print this in a postfix manner, and I managed to this this fairly easily with the basic operands (simply call the left and right nodes respectively before calling the parent) but I am having trouble producing the desired output for an average function. I don't have any trouble printing the actual result of the calculation, but I want to be able to print the operators and operands in postfix notation.
For example, 1 + average(3, 4, 5) will be shown as 1 ; 3 4 5 average +.
Here is my code:
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Indent = 0;
$Data::Dumper::Terse = 1;
my $debug = 0;
# an arithmetic expression tree is a reference to a list, which can
# be of two kinds as follows:
# [ 'leaf', value ]
# [ 'internal', operation, leftarg, rightarg ]
# Evaluate($ex) takes an arithmetic expression tree and returns its
# evaluated value.
sub Evaluate {
my ($ex) = #_;
$debug and
print "evaluating: ", Dumper($ex), "\n";
# the kind of node is given in the first element of the array
my $node_type = $ex->[0];
# if a leaf, then the value is a number
if ( $node_type eq 'leaf' ) {
# base case
my $value = $ex->[1];
$debug and
print "returning leaf: $value\n";
return $value;
}
# if not a leaf, then is internal,
if ( $node_type ne 'internal' ) {
die "Eval: Strange node type '$node_type' when evaluating tree";
}
# should now have an operation and two arguments
my $operation = $ex->[1];
my $left_ex = $ex->[2];
my $right_ex = $ex->[3];
# evaluate the left and right arguments;
my $left_value = Evaluate($left_ex);
my $right_value = Evaluate($right_ex);
# if any arguments are undefined, our value is undefined.
return undef unless
defined($left_value) and defined($right_value);
my $result;
# or do it explicitly for the required operators ...
if ($operation eq 'average') {
$result = ($left_value + $right_value) / 2;
}
if ($operation eq '+') {
$result = $left_value + $right_value;
} elsif ($operation eq '-') {
$result = $left_value - $right_value;
} elsif ($operation eq '*') {
$result = $left_value * $right_value;
} elsif ($operation eq 'div') {
if ($right_value != 0 ) {
$result = int ($left_value / $right_value);
} else {
$result = undef;
}
} elsif ($operation eq 'mod') {
$result = $left_value % $right_value;
} elsif ($operation eq '/') {
if ( $right_value != 0 ) {
$result = $left_value / $right_value;
}
else {
$result = undef;
}
}
$debug and
print "returning '$operation' on $left_value and $right_value result: $result\n";
return $result;
}
# Display($ex, $style) takes an arithmetic expression tree and a style
# parameter ('infix' or 'postfix') and returns a string that represents
# printable form of the expression in the given style.
sub Display {
my ($ex, $style) = #_;
# the kind of node is given in the first element of the array
my $node_type = $ex->[0];
# if a leaf, then the value is a number
if ( $node_type eq 'leaf' ) {
# base case
my $value = $ex->[1];
return $value;
}
# if not a leaf, then is internal,
if ( $node_type ne 'internal' ) {
die "Display: Strange node type '$node_type' when evaluating tree";
}
# should now have an operation and two arguments
my $operation = $ex->[1];
my $left_ex = $ex->[2];
my $right_ex = $ex->[3];
# evaluate the left and right arguments;
my $left_value = Display($left_ex, $style);
my $right_value = Display($right_ex, $style);
my $result;
if ($operation ne 'average') {
$result = "($left_value $operation $right_value) \n $left_value $right_value $operation";
} else {
$result = "($left_value $operation $right_value) \n $left_value $right_value $operation";
}
return $result;
}
# module end;
1;
And here is a test:
use strict;
use warnings;
use Display;
use arith;
my $ex1 = [ 'leaf', 42];
my $ex2 = [ 'internal', '+', [ 'leaf', 42], [ 'leaf', 10 ] ];
my $ex3 = [ 'internal', 'average', $ex2, [ 'leaf', 1 ] ];
print "ex1 is ", Evaluate($ex1), "\n";
print "ex1: ", Display($ex1), "\n";
print "\n";
print "ex2 is ", Evaluate($ex2), "\n";
print "ex2: ", Display($ex2), "\n";
print "\n";
print "ex3 is ", Evaluate($ex3), "\n";
print "ex3: ", Display($ex3), "\n";
print "\n";
Display::Render(\$ex3);
In order to do this, I realize I will have to change the subroutine "Display", but I'm not sure how to get the output --> value value ; #to indicate values that aren't averaged# value value average operand etc.
Any ideas?

I am not 100% sure that I understand your problem, but here is a cleanup / improvement of your two functions:
my %ops = ( # dispatch table for operations
average => sub {my $acc; $acc += $_ for #_; $acc / #_},
'+' => sub {$_[0] + $_[1]},
'-' => sub {$_[0] - $_[1]},
'*' => sub {$_[0] * $_[1]},
'mod' => sub {$_[0] % $_[1]},
(map {$_ => sub {$_[1] ? $_[0] / $_[1] : undef}} qw (/ div)),
);
sub Evaluate {
my $ex = shift;
print "evaluating: ", Dumper($ex), "\n" if $debug;
my $node_type = $ex->[0];
if ( $node_type eq 'leaf' ) {
print "returning leaf: $$ex[1]\n" if $debug;
return $$ex[1];
}
elsif ( $node_type ne 'internal' ) {
die "Eval: Strange node type '$node_type' when evaluating tree";
}
my $operation = $ex->[1];
my #values = map {Evaluate($_)} #$ex[2 .. $#$ex];
defined or return for #values;
if (my $op = $ops{$operation}) {
return $op->(#values);
} else {
print "operation $operation not found\n";
return undef;
}
}
Here the large if/elsif block is replaced with a dispatch table. This allows you to separate the logic from the parser. I have also replaced the $left_value and $right_value variables with the #values array, allowing your code to scale to n-arity operations (like average).
The following Display function has also been updated to handle n-arity operations:
my %is_infix = map {$_ => 1} qw( * + / - );
sub Display {
my ($ex, $style) = #_;
my $node_type = $ex->[0];
# if a leaf, then the value is a number
if ( $node_type eq 'leaf' ) {
return $$ex[1];
}
# if not a leaf, then is internal,
if ( $node_type ne 'internal' ) {
die "Display: Strange node type '$node_type' when evaluating tree";
}
# should now have an operation and n arguments
my $operation = $ex->[1];
if ($style and $style eq 'infix') {
my #values = map {Display($_, $style)} #$ex[2 .. $#$ex];
if ($is_infix{$operation}) {
return "$values[0] $operation $values[1]"
} else {
local $" = ', '; # "
return "$operation( #values )"
}
} else { # postfix by default
my #out;
for (#$ex[2 .. $#$ex]) {
if (#out and $_->[0] eq 'internal') {
push #out, ';'
}
push #out, Display($_, $style)
}
return join ' ' => #out, $operation;
}
}
You can call Display as Display($tree) or Display($tree, 'postfix') for postfix notation. And Display($tree, 'infix') for the infix notation.
ex1 is 42
ex1: 42
ex1: 42
ex2 is 52
ex2: 42 10 +
ex2: 42 + 10
ex3 is 26.5
ex3: 42 10 + 1 average
ex3: average( 42 + 10, 1 )
Which I believe is what you are looking for.
Finally, using your first example 1 + average(3, 4, 5):
my $avg = ['internal', 'average', [leaf => 3], [leaf => 4], [leaf => 5] ];
my $ex4 = ['internal', '+', [leaf => 1], $avg ];
print "ex4 is ", Evaluate($ex4), "\n";
print "ex4: ", Display($ex4), "\n";
print "ex4: ", Display($ex4, 'infix'), "\n";
print "\n";
which prints:
ex4 is 5
ex4: 1 ; 3 4 5 average +
ex4: 1 + average( 3, 4, 5 )

Maybe try AlgebraicToRPN?

Related

Sum values in second column in a text file

I have a text file with the following format:
1 4730 1031782 init
4 0 6 events
2190 450 0 top
21413 5928 1 sshd
22355 1970 2009 find
I need to sum the second column of data and return that value to the user.
My current attempt reads the data from the text file like so:
sub read_file{
my $data_failed = 1;
my $file = 'task_file';
if (-z $file){print "No tasks found\n";}
if(open (INFILE, "task_file" || die "$!\n")){
my #COLUMNS = qw( memory cpu program );
my %sort_strings = ( program => sub { $a cmp $b } );
my (%process_details, %sort);
while (<INFILE>) {
$data_failed = 0;
my ($process_id, $memory_size, $cpu_time, $program_name) = split;
$process_details{$process_id} = { memory => $memory_size,
cpu => $cpu_time,
program => $program_name };
undef $sort{memory}{$memory_size}{$process_id};
undef $sort{cpu}{$cpu_time}{$process_id};
undef $sort{program}{$program_name}{$process_id};
}
for my $column ($COLUMNS[2]) {
my $cmp = $sort_strings{$column} || sub { $a <=> $b };
for my $value (sort $cmp keys %{ $sort{$column} }
) {
my #pids = keys %{ $sort{$column}{$value} };
say join ' ', $_, #{ $process_details{$_} }{#COLUMNS}
for #pids;
}
}
if($total_memory_size == 1){
sum_memory_size();
}
} else { print "No tasks found"}
}
And my attempt to sum the values:
sub sum_memory_size{
my #do_sum = qw(memory_size);
my $column;
my $value;
for $column (keys %{ $sort{$column}{$value} }) {
my $found;
my $sum = 0;
for $value (#do_sum) {
next unless exists $sort{$column}{$value};
$sum += $hash{$column}{$value};
delete $hash{$column}{$value};
$found = 1;
}
$sort{$column}{somename} = $sum if $found;
}
print $sum;
}
Use sum form List::Util:
use List::Util qw{ sum };
# ...
my $memory_sum = sum(map $_->{memory}, values %process_details);
The question describes a simple task, which appears to be part of a larger problem. One answer to the simple task is:
perl -ne '#fields=split; $sum+=$fields[1]; END { print "$sum\n" }' foo.txt
For each line (-ne), split it and add the second value ($fields[1]) to a running $sum. Then, after all lines have been processed (END), print the sum. I get 13078 from the program, which is also what I get from a calculator on the input above :) (4730 + 0 + 450 + 5928 + 1970).

Perl Mismatch among arrays

I have two arrays:
#array1 = (A,B,C,D,E,F);
#array2 = (A,C,H,D,E,G);
The arrays could be of different size. I want to find how many mismatches are there between the arrays. The indexes should be the same. In this case there are three mismatch :b->c,c->h and F->G.(i.e , The 'C' in $array[2] should not be considered a match to 'C' in $array[1]) I would like to get the number of mismatches as well as the mismatch.
foreach my $a1 ( 0 .. $#array1) {
foreach my $a2( 0 .. $#array2)
if($array1[$a1] ne $array2[$a2]) {
}
}
}
my %array_one = map {$_, 1} #array1;
my #difference = grep {!$array_one {$_}} #array1;
print "#difference\n";
Ans: gives me H, G but not C.
with my little Perl knowledge I tried this, with no result. Could you suggest me how I should deal this? Your suggestions and pointers would be very helpful.
You shouldn't have nested loops. You only need to go through the indexes once.
use List::Util qw( max );
my #mismatches;
for my $i (0..max($#array1, $#array2)) {
push #mismatches, $i
if $i >= #array1
|| $i >= #array2
|| $array1[$i] ne $array2[$i];
}
}
say "There are " . (0+#mismatches) . " mismatches";
for my $i (#mismatches) {
...
}
Since you mentioned grep, this is how you'd replace the for with grep:
use List::Util qw( max );
my #mismatches =
grep { $_ >= #array1
|| $_ >= #array2
|| array1[$_] ne $array2[$_] }
0 .. max($#array1, $#array2);
say "There are " . (0+#mismatches) . " mismatches";
for my $i (#mismatches) {
...
}
Here's an example using each_arrayref from List::MoreUtils.
sub diff_array{
use List::MoreUtils qw'each_arrayref';
return unless #_ && defined wantarray;
my #out;
my $iter = each_arrayref(#_);
my $index = 0;
while( my #current = $iter->() ){
next if all_same(#current);
unshift #current, $index;
push #out, \#current;
}continue{ ++$index }
return #out;
}
This version should be faster if you are going to use this for determining the number of differences often. The output is exactly the same. It just doesn't have to work as hard when returning a number.
Read about wantarray for more information.
sub diff_array{
use List::MoreUtils qw'each_arrayref';
return unless #_ && defined wantarray;
my $iter = each_arrayref(#_);
if( wantarray ){
# return structure
my #out;
my $index = 0;
while( my #current = $iter->() ){
next if all_same(#current);
unshift #current, $index;
push #out, \#current;
}continue{ ++$index }
return #out;
}else{
# only return a count of differences
my $out = 0;
while( my #current = $iter->() ){
++$out unless all_same #current;
}
return $out;
}
}
diff_array uses the subroutine all_same to determine if all of the current list of elements are the same.
sub all_same{
my $head = shift;
return undef unless #_; # not enough arguments
for( #_ ){
return 0 if $_ ne $head; # at least one mismatch
}
return 1; # all are the same
}
To get just the number of differences:
print scalar diff_array \#array1, \#array2;
my $count = diff_array \#array1, \#array2;
To get a list of differences:
my #list = diff_array \#array1, \#array2;
To get both:
my $count = my #list = diff_array \#array1, \#array2;
The output for the input you provided:
(
[ 1, 'B', 'C' ],
[ 2, 'C', 'H' ],
[ 5, 'F', 'G' ]
)
Example usage
my #a1 = qw'A B C D E F';
my #a2 = qw'A C H D E G';
my $count = my #list = diff_array \#a1, \#a2;
print "There were $count differences\n\n";
for my $group (#list){
my $index = shift #$group;
print " At index $index\n";
print " $_\n" for #$group;
print "\n";
}
You're iterating over both arrays when you don't want to be doing so.
#array1 = ("A","B","C","D","E","F");
#array2 = ("A","C","H","D","E","G");
foreach my $index (0 .. $#array1) {
if ($array1[$index] ne $array2[$index]) {
print "Arrays differ at index $index: $array1[$index] and $array2[$index]\n";
}
}
Output:
Arrays differ at index 1: B and C
Arrays differ at index 2: C and H
Arrays differ at index 5: F and G
Well, first, you're going to want to go over each element of one of the arrays, and compare it to the same element of the other array. List::MoreUtils provides an easy way to do this:
use v5.14;
use List::MoreUtils qw(each_array);
my #a = qw(a b c d);
my #b = qw(1 2 3);
my $ea = each_array #a, #b;
while ( my ($a, $b) = $ea->() ) {
say "a = $a, b = $b, idx = ", $ea->('index');
}
You can extend that to find where there is a non-match by checking inside that while loop (note: this assumes your arrays don't have undefs at the end, or that if they do, undef is the same as having a shorter array):
my #mismatch;
my $ea = each_array #a, #b;
while ( my ($a, $b) = $ea->() ) {
if (defined $a != defined $b || $a ne $b) {
push #mismatch, $ea->('index');
}
}
and then:
say "Mismatched count = ", scalar(#mismatch), " items are: ", join(q{, }, #mismatch);
The following code builds a list of mismatched pairs, then prints them out.
#a1 = (A,B,C,D,E,F);
#a2 = (A,C,H,D,E,G);
#diff = map { [$a1[$_] => $a2[$_]] }
grep { $a1[$_] ne $a2[$_] }
(0..($#a1 < $#a2 ? $#a1 : $#a2));
print "$_->[0]->$_->[1]\n" for #diff
You have the right idea, but you only need a single loop, since you are looking at each index and comparing entries between the arrays:
foreach my $a1 ( 0 .. $#array1) {
if($array1[$a1] ne $array2[$a1]) {
print "$a1: $array1[$a1] <-> $array2[$a1]\n";
}
}

Simple compiler Errors: Too new to Perl to know how to fix them

Hello I just began learning Perl. I have this simple temperature application but it is not compiling, I dont understand what are the errors & how I can fix them?
Can you point out what I am doing wrong & how I fix them:
#!/usr/local/bin/perl
use strict;
use warnings;
use constant false >= 0;
use constant true >= 1;
# Experimentation: Array & Hashes:
my #ar = ("a", "b", "c"); # error here
print $ar, "\n";
print $ar[0], "\n";
print $#ar, "\n";
print "The size of array = $#ar \n";
print "Trying to print $ar[1] within a string: ", $ar[1], " \n";
my %ha = ( "a" => 1, "b", "c", "d" ); # error here
print $ha, "\n";
print $ha{"a"}, "\n";
print $#ha, "\n";
print "The size of hash = $#ha \n";
print "Trying to print $ha{'b'} within string: ", $ha{'b'}, " \n";
# Functions:
sub isfloat
{
my $val = shift;
return $val =~ m/^\d+.\d+$/;
}
sub toFahrenheit
{
my $val = #_[0];
# if param var is a float
if ( isFloat($val) == 0 ) # Error here: Scalar value #_[0] better written as $_[0]
{
return -1;
}
return ( ($val * (9/5)) + 32 );
}
sub toCelsius
{
if ( isFloat(#_[0]) == 0 )
{
return -1;
}
return ( (#_[0] - 32) * (5/9) );
}
# Main Program:
my $programEnd = 0;
while ( $programEnd == 0 )
{
my $menu = "*** Welcome to Temperature Converter *** \n\n1. Convert from Celsius to Fahrenheit \n2. Convert from Fahrenheit to Celsius \n3. Exit \n Enter decision (1,2 or 3): ";
print $menu;
my $decision = <>;
if ( $decision == 3 )
{
$programEnd = 1;
# Could also just do this
# break;
}
print "Please enter a number: ";
my $val = <>;
if ( isfloat($val) )
{
$conVal = -1;
if ( decision == 1 )
{
$conVal = toFahrenheit( $val );
print $val, " C in Fahrenheit is: ", $conVal, " F \n";
}
else
{
$conVal = toCelsius( $val );
print $val, " F in Celsius is: ", $conVal, " C \n";
}
}
else { print $val, " is not a number \n"; }
}
The unhelpful messages about "Subroutine BEGIN redefined" actually is caused by these two lines:
use constant false >= 0;
use constant true >= 1;
You mean them to be:
use constant false => 0;
use constant true => 1;
The error "Scalar value #_[0] better written as $_[0]" is because in perl when you refer to an element of an array #arr, you say $arr[0], not #arr[0], because the element itself is a scalar.
>= ≠ =>

How can I check if all elements of an array are identical in Perl?

I have an array #test. What's the best way to check if each element of the array is the same string?
I know I can do it with a foreach loop but is there a better way to do this? I checked out the map function but I'm not sure if that's what I need.
If the string is known, you can use grep in scalar context:
if (#test == grep { $_ eq $string } #test) {
# all equal
}
Otherwise, use a hash:
my %string = map { $_, 1 } #test;
if (keys %string == 1) {
# all equal
}
or a shorter version:
if (keys %{{ map {$_, 1} #test }} == 1) {
# all equal
}
NOTE: The undefined value behaves like the empty string ("") when used as a string in Perl. Therefore, the checks will return true if the array contains only empty strings and undefs.
Here's a solution that takes this into account:
my $is_equal = 0;
my $string = $test[0]; # the first element
for my $i (0..$#test) {
last unless defined $string == defined $test[$i];
last if defined $test[$i] && $test[$i] ne $string;
$is_equal = 1 if $i == $#test;
}
Both methods in the accepted post give you the wrong answer if #test = (undef, ''). That is, they declare an undefined value to be equal to the empty string.
That might be acceptable. In addition, using grep goes through all elements of the array even if a mismatch is found early on and using the hash more than doubles the memory used by elements of array. Neither of these would be a problem if you have small arrays. And, grep is likely to be fast enough for reasonable list sizes.
However, here is an alternative that 1) returns false for (undef, '') and (undef, 0), 2) does not increase the memory footprint of your program and 3) short-circuits as soon as a mismatch is found:
#!/usr/bin/perl
use strict; use warnings;
# Returns true for an empty array as there exist
# no elements of an empty set that are different
# than each other (see
# http://en.wikipedia.org/wiki/Vacuous_truth)
sub all_the_same {
my ($ref) = #_;
return 1 unless #$ref;
my $cmpv = \ $ref->[-1];
for my $i (0 .. $#$ref - 1) {
my $this = \ $ref->[$i];
return unless defined $$cmpv == defined $$this;
return if defined $$this
and ( $$cmpv ne $$this );
}
return 1;
}
However, using List::MoreUtils::first_index is likely to be faster:
use List::MoreUtils qw( first_index );
sub all_the_same {
my ($ref) = #_;
my $first = \ $ref->[0];
return -1 == first_index {
(defined $$first != defined)
or (defined and $_ ne $$first)
} #$ref;
}
TIMTOWTDI, and I've been reading a lot of Mark Jason Dominus lately.
use strict;
use warnings;
sub all_the_same {
my $ref = shift;
return 1 unless #$ref;
my $cmp = $ref->[0];
my $equal = defined $cmp ?
sub { defined($_[0]) and $_[0] eq $cmp } :
sub { not defined $_[0] };
for my $v (#$ref){
return 0 unless $equal->($v);
}
return 1;
}
my #tests = (
[ qw(foo foo foo) ],
[ '', '', ''],
[ undef, undef, undef ],
[ qw(foo foo bar) ],
[ '', undef ],
[ undef, '' ]
);
for my $i (0 .. $#tests){
print "$i. ", all_the_same($tests[$i]) ? 'equal' : '', "\n";
}
You can check how many times the element in the array (#test) is repeated by counting it in a hash (%seen). You can check how many keys ($size) are present in the hash (%seen). If more than 1 key is present, you know that the elements in the array are not identical.
sub all_the_same {
my #test = #_;
my %seen;
foreach my $item (#test){
$seen{$item}++
}
my $size = keys %seen;
if ($size == 1){
return 1;
}
else{
return 0;
}
}
I think, we can use List::MoreUtils qw(uniq)
my #uniq_array = uniq #array;
my $array_length = #uniq_array;
$array_length == 1 ? return 1 : return 0;
I use List::Util::first for all similar purposes.
# try #0: $ok = !first { $_ ne $string } #test;
# try #1: $ok = !first { (defined $_ != defined $string) || !/\A\Q$string\E\z/ } #test;
# final solution
use List::Util 'first';
my $str = shift #test;
my $ok = !first { defined $$_ != defined $str || defined $str && $$_ ne $str } map \$_, #test;
I used map \$_, #test here to avoid problems with values that evaluate to false.
Note. As cjm noted fairly, using map defeats the advantage of first short-circuiting. So I tip my hat to Sinan with his first_index solution.

How to extend a binary search iterator to consume multiple targets

I have a function, binary_range_search, that is called like so:
my $brs_iterator = binary_range_search(
target => $range, # eg. [1, 200]
search => $ranges # eg. [ {start => 1, end => 1000},
); # {start => 500, end => 1500} ]
brs_iterator->() will iterate over all #$ranges on which $range overlaps.
I would like to extend binary_range_search to be able to call it with multiple ranges as its target, eg:
target => $target_ranges # eg. [ [1, 200], [50, 300], ... ]
search => $search_ranges # as above
So, when the search on $range->[0] is exhausted, it should move on to $range->[1], and so on. Here is the function in question, in its original form:
sub binary_range_search {
my %options = #_;
my $range = $options{target} || return;
my $ranges = $options{search} || return;
my ( $low, $high ) = ( 0, #{$ranges} - 1 );
while ( $low <= $high ) {
my $try = int( ( $low + $high ) / 2 );
$low = $try + 1, next if $ranges->[$try]{end} < $range->[0];
$high = $try - 1, next if $ranges->[$try]{start} > $range->[1];
my ( $down, $up ) = ($try) x 2;
my %seen = ();
my $brs_iterator = sub {
if ( $ranges->[ $up + 1 ]{end} >= $range->[0]
and $ranges->[ $up + 1 ]{start} <= $range->[1]
and !exists $seen{ $up + 1 } )
{
$seen{ $up + 1 } = undef;
return $ranges->[ ++$up ];
}
elsif ( $ranges->[ $down - 1 ]{end} >= $range->[0]
and $ranges->[ $down + 1 ]{start} <= $range->[1]
and !exists $seen{ $down - 1 }
and $down > 0 )
{
$seen{ $down - 1 } = undef;
return $ranges->[ --$down ];
}
elsif ( !exists $seen{$try} ) {
$seen{$try} = undef;
return $ranges->[$try];
}
else {
return;
}
};
return $brs_iterator;
}
return sub { };
}
It's a standard binary search strategy, until it finds an overlapping range. It then moves on the right, exhausts it, moves on the left, exhausts it, and finally gives up. Ideally, it should then maybe shift the next target range, and redo the search, I suppose (perhaps via recursion?). My problem is, I am not sure how to make that work with the iterator construction.
I just wrapped your iterator generation in a for loop, and built up an array of iterator functions.
Depending on context, I either return a master iterator or a list of iterator functions. I wasn't sure what you wanted.
use strict;
use warnings;
my $t = [ [1,200], [400,900] ];
my #r = (
{ start => 1, end => 100 },
{ start => 2, end => 500 },
{ start => 204, end => 500 },
{ start => 208, end => 500 },
{ start => 215, end => 1000 },
{ start => 150, end => 1000 },
{ start => 500, end => 1100 },
);
# Get a master iterator that will process each iterator in turn.
my $brs_iterator = binary_range_search(
targets => $t,
search => \#r,
);
# Get an array of iterators
my #brs_iterator = binary_range_search(
targets => $t,
search => \#r,
);
sub binary_range_search {
my %options = #_;
my $targets = $options{targets} || return;
my $ranges = $options{search} || return;
my #iterators;
TARGET:
for my $target ( #$targets ) {
my ( $low, $high ) = ( 0, $#{$ranges} );
RANGE_CHECK:
while ( $low <= $high ) {
my $try = int( ( $low + $high ) / 2 );
# Remove non-overlapping ranges
$low = $try + 1, next RANGE_CHECK
if $ranges->[$try]{end} < $target->[0];
$high = $try - 1, next RANGE_CHECK
if $ranges->[$try]{start} > $target->[1];
my ( $down, $up ) = ($try) x 2;
my %seen = ();
my $brs_iterator = sub {
if ( exists $ranges->[$up + 1]
and $ranges->[ $up + 1 ]{end} >= $target->[0]
and $ranges->[ $up + 1 ]{start} <= $target->[1]
and !exists $seen{ $up + 1 } )
{
$seen{ $up + 1 } = undef;
return $ranges->[ ++$up ];
}
elsif ( $ranges->[ $down - 1 ]{end} >= $target->[0]
and $ranges->[ $down + 1 ]{start} <= $target->[1]
and !exists $seen{ $down - 1 }
and $down > 0 )
{
$seen{ $down - 1 } = undef;
return $ranges->[ --$down ];
}
elsif ( !exists $seen{$try} ) {
$seen{$try} = undef;
return $ranges->[$try];
}
else {
return;
}
};
push #iterators, $brs_iterator;
next TARGET;
}
}
# In scalar context return master iterator that iterates over the list of range iterators.
# In list context returns a list of range iterators.
return wantarray
? #iterators
: sub {
while( #iterators ) {
if( my $range = $iterators[0]() ) {
return $range;
}
shift #iterators;
}
return;
};
}
If you're wanting to iterate over all values that overlap the search ranges, you don't need binary search.
First the customary front matter:
use warnings;
use strict;
use Carp;
First off, check that we have target and search parameters and that for each range, the starting point is no greater than its ending point. Otherwise, we refuse to proceed.
sub binary_range_search {
my %arg = #_;
my #errors;
my $target = $arg{target} || push #errors => "no target";
my $search = $arg{search} || push #errors => "no search";
for (#$target) {
my($start,$end) = #$_;
push #errors => "Target start ($start) is greater than end ($end)"
if $start > $end;
}
for (#$search) {
my($start,$end) = #{$_}{qw/ start end /};
push #errors => "Search start ($start) is greater than end ($end)"
if $start > $end;
}
croak "Invalid use of binary_range_search:\n",
map " - $_\n", #errors
if #errors;
The iterator itself is a closure that maintains the following state:
my $i;
my($ta,$tb);
my($sa,$sb);
my $si = 0;
where
$i if defined is the next value from the current overlapping range
$ta and $tb are the starting and ending points for the current target range
$sa and $sb are like the above but for the current search range
$si is an index into #$search and defines the current search range
We will be assigning and returning the iterator $it. The declaration and initialization are separate so the iterator may call itself when necessary.
my $it;
$it = sub {
We are done if no more target ranges remain or if there were no search ranges to begin with.
return unless #$target && #$search;
When $i is defined, it means we have found an overlap and iterate by incrementing $i until it is greater than the ending point of either the current target range or the current search range.
if (defined $i) {
# iterating within a target range
if ($i > $tb || $i > $sb) {
++$si;
undef $i;
return $it->();
}
else {
return $i++;
}
}
Otherwise, we need to determine whether the next target range overlaps any search range. However, if $i is undefined and we've already considered all the search ranges, we discard the current target range and start again.
else {
# does the next target range overlap?
if ($si >= #$search) {
shift #$target;
$si = 0;
return $it->();
}
Here we pull out the starting and ending points of both the current target range (always at the front of #$target) and the current search range (indexed by $si).
($ta,$tb) = #{ $target->[0] };
($sa,$sb) = #{ $search->[$si] }{qw/ start end /};
Now testing for overlap is straightforward. For disjoint search ranges, we ignore and move on. Otherwise, we find the leftmost point in the overlap and iterate from there.
if ($sb < $ta || $sa > $tb) {
# disjoint
++$si;
undef $i;
return $it->();
}
elsif ($sa >= $ta) {
$i = $sa;
return $i++;
}
elsif ($ta >= $sa) {
$i = $ta;
return $i++;
}
}
};
Finally, we return the iterator:
$it;
}
For an example similar to the one in your question
my $it = binary_range_search(
target => [ [1, 200], [50, 300] ],
search => [ { start => 1, end => 1000 },
{ start => 500, end => 1500 },
{ start => 40, end => 60 },
{ start => 250, end => 260 } ],
);
while (defined(my $value = $it->())) {
print "got $value\n";
}
Its output with internal points elided is
got 1
[...]
got 200
got 40
[...]
got 60
got 50
[...]
got 300
got 50
[...]
got 60
got 250
[...]
got 260
Split it into two functions, an outer function that loops over the ranges and calls an inner function that implements a conventional binary chop.
Warning: a very c++ biased answer:
what you have to do is define a new type of iterator, which is a pair of a usual iterator, and a segmemt iterrator (if you don't have a segment iterator, it's a pair of a const pointer / ref to the segments, and an index pointing to the correct segment). You have to define all the concepts of a random access iterator (difference, addition of integer, etc). bear in mind, that at least in c++ lingo this is not a true random iterator, since addition of an integer is not really constant time; such is life.