Create collection of bitwise matches from an integer range - perl

The OVS documentation
... describes populating rules in the following format:
Range matches can be expressed as a collection of bitwise matches. For
example, suppose that the goal is to match TCP source ports 1000 to
1999, inclusive. The binary representations of 1000 and 1999 are:
01111101000
11111001111
The following series of bitwise matches will match 1000 and 1999 and
all the values in between:
01111101xxx
0111111xxxx
10xxxxxxxxx
110xxxxxxxx
1110xxxxxxx
11110xxxxxx
1111100xxxx
which can be written as the following matches:
tcp,tp_src=0x03e8/0xfff8
tcp,tp_src=0x03f0/0xfff0
tcp,tp_src=0x0400/0xfe00
tcp,tp_src=0x0600/0xff00
tcp,tp_src=0x0700/0xff80
tcp,tp_src=0x0780/0xffc0
tcp,tp_src=0x07c0/0xfff0
I'm trying to determine the correct way to generate those matches based on a minimum and maximum integer value in perl. I looked at the module Bit::Vector , but I wasn't able to figure out how to effectively use it for this purpose.

Let's pretend we trying to solve the equivalent problem for decimal for a second.
Say you want 567 (inclusive) to 1203 (exclusive).
Enlarging phase
You increment by 1 until you have the a multiple of 10 or you would exceed the range.
⇒598 (Creates 597-597)
⇒599 (Creates 598-598)
⇒600 (Creates 599-599)
You increment by 10 until you have a multiple of 100 or you would exceed the range.
You increment by 100 until you have a multiple of 1000 or you would exceed the range.
⇒700 (Creates 600-699)
⇒800 (Creates 700-799)
⇒900 (Creates 800-899)
⇒1000 (Creates 900-999)
You increment by 1000 until you have a multiple of 10000 or you would exceed the range.
[Would exceed limit]
Shrinking phase
You increment by 100 until you would exceed the range.
⇒1100 (Creates 1000-1099)
⇒1200 (Creates 1100-1199)
You increment by 10 until you would exceed the range.
You increment by 1 until you would exceed the range.
⇒1201 (Creates 1200-1200)
⇒1202 (Creates 1201-1201)
⇒1203 (Creates 1202-1202)
Same in binary, but with powers of 2 instead of powers of 10.
my $start = 1000;
my $end = 1999 + 1;
my #ranges;
my $this = $start;
my $this_power = 1;
OUTER: while (1) {
my $next_power = $this_power * 2;
while ($this % $next_power) {
my $next = $this + $this_power;
last OUTER if $next > $end;
my $mask = ~($this_power - 1) & 0xFFFF;
push #ranges, sprintf("0x%04x/0x%x", $this, $mask);
$this = $next;
}
$this_power = $next_power;
}
while ($this_power > 1) {
$this_power /= 2;
while (1) {
my $next = $this + $this_power;
last if $next > $end;
my $mask = ~($this_power - 1) & 0xFFFF;
push #ranges, sprintf("0x%04x/0x%x", $this, $mask);
$this = $next;
}
}
say for #ranges;
We can optimize that by taking advantage of the fact that we're dealing with binary.
my $start = 1000;
my $end = 1999 + 1;
my #ranges;
my $this = $start;
my $power = 1;
my $mask = 0xFFFF;
while ($start & $mask) {
if ($this & $power) {
push #ranges, sprintf("0x%04x/0x%x", $this, $mask);
$this += $power;
}
$mask &= ~$power;
$power <<= 1;
}
while ($end & ~$mask) {
$power >>= 1;
$mask |= $power;
if ($end & $power) {
push #ranges, sprintf("0x%04x/0x%x", $this, $mask);
$this |= $power;
}
}
say for #ranges;
Output:
0x03e8/0xfff8
0x03f0/0xfff0
0x0400/0xfe00
0x0600/0xff00
0x0700/0xff80
0x0780/0xffc0
0x07c0/0xfff0

I attempted to use the very elegant solution provided by #ikegami but found there were edge cases with resulting ports outside of the range or missing ports (e.g. 1-6, 1000-4000, 1000-10000). This alternative approach seems to avoid these issues.
my $LIMIT = 65535;
sub maxPort {
my ($port, $mask) = #_;
my $xid = $LIMIT - $mask;
my $nid = $port & $mask;
return $nid + $xid;
}
sub portMask {
my ($port, $end) = #_;
my $mask = $LIMIT;
my $test_mask = $LIMIT;
my $bit = 1;
my $net = $port & $LIMIT;
my $max_port = maxPort($net, $LIMIT);
while ($net && ($max_port <= $end)) {
$net = $port & $test_mask;
if ($net < $port) {
last;
}
$max_port = maxPort($net, $test_mask);
if ($max_port <= $end) {
$mask = $test_mask;
}
$test_mask -= $bit;
$bit <<= 1;
}
return $mask;
}
sub maskRange {
my ($start, $end) = #_;
my #portMasks;
if (($end <= $start) || ($end > $LIMIT)) {
exit 1;
}
my $mask = $LIMIT;
my $port = $start;
while ($port <= $end) {
$mask = portMask($port, $end);
push #portMasks, sprintf("0x%04x/0x%x", $port, $mask);
$port = maxPort($port, $mask) + 1;
}
return #portMasks;
}
my #ranges = maskRange(1000, 1999);
for (#ranges) {
print("$_", "\n");
}
Outputs:
0x03e8/0xfff8
0x03f0/0xfff0
0x0400/0xfe00
0x0600/0xff00
0x0700/0xff80
0x0780/0xffc0
0x07c0/0xfff0

Related

Best data structure for a sparse ordered 2D array of floats allowing interpolation (perl)

The data are stock options. I want to make a 2D array based on days till expiration (int) & normalized distance out of the money (float), with the values being a list of normalized bid and ask prices. If the desired element is not in the array, I want to be able to interpolate between the nearest elements present.
I see 3 possible data structures:
A sparse 2D array, maybe 10000 elements, maybe 1/3 full.
A 2D linked list, ie: 4 listpointers for each data element (so 3000 elements becomes 15000)
A 2D hash (maybe 3000 elements), with 2 sorted lists of the keys (maybe 100 elements each) in each dimension.
The main problem is efficient retrieval when interpolation is required.
Retrieval of existing elements is relatively straight-forward with any method.
I'm currently using choice 3, but retrieval is a bit of a kloodge, since I have to scan along the keylists of each dimension till I find occupied elements, and then do a 2- or 4-way interpolation.
I use moreUtils::firstindx($_ > $desiredKey) to find the keys. The linked lists (choice 2) would spare me the search of the keylist arrays.
Choice 1 would also require scanning, wouldn't need the initial step of keylist lookup, but might need to look at more empty cells. And insertion would be a real hassle.
I would be doing many more searches than insertions.
Does any one have any suggestions for the most efficient data structure.
Since you predominantly perform lookups by lifespan and lookups by distance, and few inserts, I'd use sorted arrays to lookup the records by binary search.
Locating an existing element: O(log N)
Locating the box of a missing element: O(log N)
Inserting: O(N)
Given,
my #data = (
[ $lifespan0, $distance0, $bid0, $ask0 ],
[ $lifespan1, $distance1, $bid1, $ask1 ],
...
);
my $lifespan_search_cmp = sub { $a <=> $data[$b][0] };
my $distance_search_cmp = sub { $a <=> $data[$b][1] };
First, create indexes:
my #by_lifespan = sort { $data[$a][0] <=> $data[$b][0] } 0..$#data;
my #by_distance = sort { $data[$a][1] <=> $data[$b][1] } 0..$#data;
To lookup:
my $i = binsearch_first \&$lifespan_search_cmp, $lifespan, #by_lifespan;
my $j = binsearch_first \&$distance_search_cmp, $distance, #by_distance;
my #lifespan_matching_idxs = get_run_forward \&$lifespan_search_cmp, $lifespan, $i, #by_lifespan;
my #distance_matching_idxs = get_run_forward \&$distance_search_cmp, $distance, $j, #by_distance;
my #cross_match_idxs = do {
my %lifespan_matching_idxs = map { $_ => 1 } #lifespan_matching_idxs;
grep { $lifespan_matching_idxs{$_} }
#distance_matching_idxs
};
if (#cross_match_idxs) {
# Exact match(es) found.
...
} else {
my $lifespan_lowerbracket;
my $lifespan_upperbracket;
if ($i >= 0) {
$lifespan_lowerbracket = $lifespan;
$lifespan_upperbracket = $lifespan;
} else {
die "Can't interpolate" if ~$i == 0 || ~$i >= #by_lifespan;
$lifespan_lowerbracket = $data[~$i ][0];
$lifespan_lowerbracket = $data[~$i - 1][0];
}
my $distance_lowerbracket;
my $distance_upperbracket;
if ($i >= 0) {
$distance_lowerbracket = $distance;
$distance_upperbracket = $distance;
} else {
die "Can't interpolate" if ~$j == 0 || ~$j >= #by_distance;
$distance_lowerbracket = $data[~$j ][1];
$distance_upperbracket = $data[~$j - 1][1];
}
...
}
To insert:
my $i = binsearch_first \&$lifespan_search_cmp, $lifespan, #by_lifespan;
my $j = binsearch_first \&$distance_search_cmp, $distance, #by_distance;
push #data, [ $lifespan, $distance , $bid, $ask ];
splice(#by_lifespan, $i >= 0 ? $i : ~$i, 0, $#data);
splice(#by_distance, $j >= 0 ? $j : ~$j, 0, $#data);
Subs:
sub binsearch_first(&$\#) {
my $compare = $_[0];
#my $value = $_[1];
my $array = $_[2];
my $min = 0;
my $max = $#$array;
return -1 if $max == -1;
my $ap = do { no strict 'refs'; \*{caller().'::a'} }; local *$ap;
my $bp = do { no strict 'refs'; \*{caller().'::b'} }; local *$bp;
*$ap = \($_[1]);
while ($min <= $max) {
my $mid = int(($min+$max)/2);
*$bp = \($array->[$mid]);
my $cmp = $compare->();
if ($cmp < 0) {
$max = $mid - 1;
}
elsif ($cmp > 0) {
$min = $mid + 1;
}
else {
return $mid if $mid == $min;
$max = $mid;
}
}
# Converts unsigned int to signed int.
return unpack('j', pack('J', ~$min));
}
sub get_run_forward(&$\#) {
my $compare = $_[0];
#my $value = $_[1];
my $start = $_[2];
my $array = $_[3];
return if $start < 0;
my $ap = do { no strict 'refs'; \*{caller().'::a'} }; local *$ap;
my $bp = do { no strict 'refs'; \*{caller().'::b'} }; local *$bp;
*$ap = \($_[1]);
my $i = $start;
while ($i <= $#$array) {
*$bp = \($array->[$i]);
my $cmp = $compare->()
and last;
++$i;
}
return wantarray ? ($start..$i-1) : $i-1;
}
You might want to use a tolerance in the floating-point comparions (i.e. in $distance_search_cmp).

Heapsort function behaves the same with push and unshift.(Perl)

I've been trying to implement a heap-sort function but it sorts the array in reverse order. The weird part is that it doesn't matter if I use unshift or push, the elements always get printed reversed.
#!/usr/bin/perl -w
use 5.014;
no warnings 'recursion';
sub heapify{
my $index = pop #_;
my $larger;
unless ($index > int(#_/2-1))
{
my $left = 2 * $index + 1;
my $right = 2 * $index + 2;
if($right < #_ && ($_[$left]<$_[$right]))
{
$larger = $right;
} else {
$larger = $left;
}
if($_[$index] < $_[$larger])
{
($_[$index],$_[$larger]) =
($_[$larger],$_[$index]) ;
heapify(#_,$larger);
}
}
}
sub max_heap{
for(my $i = int(#_/2 -1) ; $i > -1; --$i){
heapify(#_,$i);
}
}
sub heapsort{
return unless #_ > 1 ;
max_heap(#_);
my $last = shift(#_);
heapsort(#_);
push(#_,$last);
}
my #test = (9,3,13,7,6,78,2);
heapsort(#test);
say "Heapsorted:";
say join("\n",#test);
Used pseudocode
use strict;
use warnings;
my #a = reverse 1..500;
heapSort(\#a);
print "#a\n";
sub heapSort {
my ($a) = #_;
# input: an unordered array a of length count
# (first place a in max-heap order)
heapify($a);
my $end = $#$a; # //in languages with zero-based arrays the children are 2*i+1 and 2*i+2
while ($end) {
# (swap the root(maximum value) of the heap with the last element of the heap)
#$a[$end, 0] = #$a[0, $end]; # swap(a[end], a[0])
# (decrease the size of the heap by one so that the previous max value will
# stay in its proper placement)
$end--;
# (put the heap back in max-heap order)
siftDown($a, 0, $end);
}
}
sub heapify {
my ($a) = #_;
my $count = #$a;
# (start is assigned the index in a of the last parent node)
my $start = ($count - 2 ) / 2;
while ($start >= 0) {
#(sift down the node at index start to the proper place such that all nodes below
# the start index are in heap order)
siftDown($a, $start, $count-1);
$start--;
#(after sifting down the root all nodes/elements are in heap order)
}
}
sub siftDown {
my ($a, $start, $end) = #_;
# input: end represents the limit of how far down the heap
# to sift.
my $root = $start;
while ($root * 2 + 1 <= $end) { # (While the root has at least one child)
my $child = $root * 2 + 1; # (root*2 + 1 points to the left child)
my $swap = $root; # (keeps track of child to swap with)
#(check if root is smaller than left child)
$swap = $child if $a->[$swap] < $a->[$child];
#(check if right child exists, and if it's bigger than what we're currently swapping with)
$swap = $child + 1 if $child+1 <= $end and $a->[$swap] < $a->[$child+1];
# (check if we need to swap at all)
if ($swap != $root) {
# swap(a[root], a[swap])
#$a[$root, $swap] = #$a[$swap, $root];
$root = $swap; # (repeat to continue sifting down the child now)
}
else {
return;
}
}
}
This is to mention the required change in the code by Сухой27 answered Dec 31 '13 at 17:46.
Code works fine when input array has even n number of elements, but goes wrong when n is odd.
For an input like #a = qw(10 11 2);
Output would be = 2 11 10
The output is wrong because in the first step, the above script skips checking of left child of last parent. When left is larger, the same does not swap with its parent.
The fix is simple:
my $start = int(($count - 2 ) / 2);

perl blowing up in sequence alignment by dynamic programming

I'm comparing a reference sequence of size 5500 bases and query sequence of size 3600, using dynamic programming (semi global alignment), in fact I don't know much about complexity and performance and the code is blowing up and giving me the error "out of memory". Knowing that it works normally on smaller sequences, my question is: This behavior is normal or I might have another problem in code ?if it's normal any hint to solve this problem ? Thanks in advance.
sub semiGlobal {
my ( $seq1, $seq2,$MATCH,$MISMATCH,$GAP ) = #_;
# initialization: first row to 0 ;
my #matrix;
$matrix[0][0]{score} = 0;
$matrix[0][0]{pointer} = "none";
for ( my $j = 1 ; $j <= length($seq1) ; $j++ ) {
$matrix[0][$j]{score} = 0;
$matrix[0][$j]{pointer} = "none";
}
for ( my $i = 1 ; $i <= length($seq2) ; $i++ ) {
$matrix[$i][0]{score} = $GAP * $i;
$matrix[$i][0]{pointer} = "up";
}
# fill
my $max_i = 0;
my $max_j = 0;
my $max_score = 0;
print "seq2: ".length($seq2);
print "seq1: ".length($seq1);
for ( my $i = 1 ; $i <= length($seq2) ; $i++ ) {
for ( my $j = 1 ; $j <= length($seq1) ; $j++ ) {
my ( $diagonal_score, $left_score, $up_score );
# calculate match score
my $letter1 = substr( $seq1, $j - 1, 1 );
my $letter2 = substr( $seq2, $i - 1, 1 );
if ( $letter1 eq $letter2 ) {
$diagonal_score = $matrix[ $i - 1 ][ $j - 1 ]{score} + $MATCH;
}
else {
$diagonal_score = $matrix[ $i - 1 ][ $j - 1 ]{score} + $MISMATCH;
}
# calculate gap scores
$up_score = $matrix[ $i - 1 ][$j]{score} + $GAP;
$left_score = $matrix[$i][ $j - 1 ]{score} + $GAP;
# choose best score
if ( $diagonal_score >= $up_score ) {
if ( $diagonal_score >= $left_score ) {
$matrix[$i][$j]{score} = $diagonal_score;
$matrix[$i][$j]{pointer} = "diagonal";
}
else {
$matrix[$i][$j]{score} = $left_score;
$matrix[$i][$j]{pointer} = "left";
}
}
else {
if ( $up_score >= $left_score ) {
$matrix[$i][$j]{score} = $up_score;
$matrix[$i][$j]{pointer} = "up";
}
else {
$matrix[$i][$j]{score} = $left_score;
$matrix[$i][$j]{pointer} = "left";
}
}
# set maximum score
if ( $matrix[$i][$j]{score} > $max_score ) {
$max_i = $i;
$max_j = $j;
$max_score = $matrix[$i][$j]{score};
}
}
}
my $align1 = "";
my $align2 = "";
my $j = $max_j;
my $i = $max_i;
while (1) {
if ( $matrix[$i][$j]{pointer} eq "none" ) {
$stseq1 = $j;
last;
}
if ( $matrix[$i][$j]{pointer} eq "diagonal" ) {
$align1 .= substr( $seq1, $j - 1, 1 );
$align2 .= substr( $seq2, $i - 1, 1 );
$i--;
$j--;
}
elsif ( $matrix[$i][$j]{pointer} eq "left" ) {
$align1 .= substr( $seq1, $j - 1, 1 );
$align2 .= "-";
$j--;
}
elsif ( $matrix[$i][$j]{pointer} eq "up" ) {
$align1 .= "-";
$align2 .= substr( $seq2, $i - 1, 1 );
$i--;
}
}
$align1 = reverse $align1;
$align2 = reverse $align2;
return ( $align1, $align2, $stseq1 ,$max_j);
}
One way to possibly solve the problem is to tie the #matrix with a file. However, this will dramatically slow down the program. Consider this:
sub semiGlobal {
use Tie::Array::CSV;
tie my #matrix, 'Tie::Array::CSV', 'temp.txt'; # Don't forget to add your own error handler.
my ( $seq1, $seq2,$MATCH,$MISMATCH,$GAP ) = #_;
# initialization: first row to 0 ;
$matrix[0][0] = '0 n';
for ( my $j = 1 ; $j <= length($seq1) ; $j++ ) {
$matrix[0][$j] = '0 n';
}
for ( my $i = 1 ; $i <= length($seq2) ; $i++ ) {
my $score = $GAP * $i;
$matrix[$i][0] = join ' ',$score,'u';
}
#print Dumper(\#matrix);
# fill
my $max_i = 0;
my $max_j = 0;
my $max_score = 0;
print "seq2: ".length($seq2)."\n";
print "seq1: ".length($seq1)."\n";
for ( my $i = 1 ; $i <= length($seq2) ; $i++ ) {
for ( my $j = 1 ; $j <= length($seq1) ; $j++ ) {
my ( $diagonal_score, $left_score, $up_score );
# calculate match score
my $letter1 = substr( $seq1, $j - 1, 1 );
my $letter2 = substr( $seq2, $i - 1, 1 );
my $score = (split / /, $matrix[ $i - 1 ][ $j - 1 ])[0];
if ( $letter1 eq $letter2 ) {
$diagonal_score = $score + $MATCH;
}
else {
$diagonal_score = $score + $MISMATCH;
}
# calculate gap scores
$up_score = (split / /,$matrix[ $i - 1 ][$j])[0] + $GAP;
$left_score = (split / /,$matrix[$i][ $j - 1 ])[0] + $GAP;
# choose best score
if ( $diagonal_score >= $up_score ) {
if ( $diagonal_score >= $left_score ) {
$matrix[$i][$j] = join ' ',$diagonal_score,'d';
}
else {
$matrix[$i][$j] = join ' ', $left_score, 'l';
}
}
else {
if ( $up_score >= $left_score ) {
$matrix[$i][$j] = join ' ', $up_score, 'u';
}
else {
$matrix[$i][$j] = join ' ', $left_score, 'l';
}
}
# set maximum score
if ( (split / /, $matrix[$i][$j])[0] > $max_score ) {
$max_i = $i;
$max_j = $j;
$max_score = (split / /, $matrix[$i][$j])[0];
}
}
}
my $align1 = "";
my $align2 = "";
my $stseq1;
my $j = $max_j;
my $i = $max_i;
while (1) {
my $pointer = (split / /, $matrix[$i][$j])[1];
if ( $pointer eq "n" ) {
$stseq1 = $j;
last;
}
if ( $pointer eq "d" ) {
$align1 .= substr( $seq1, $j - 1, 1 );
$align2 .= substr( $seq2, $i - 1, 1 );
$i--;
$j--;
}
elsif ( $pointer eq "l" ) {
$align1 .= substr( $seq1, $j - 1, 1 );
$align2 .= "-";
$j--;
}
elsif ( $pointer eq "u" ) {
$align1 .= "-";
$align2 .= substr( $seq2, $i - 1, 1 );
$i--;
}
}
$align1 = reverse $align1;
$align2 = reverse $align2;
untie #matrix; # Don't forget to add your own error handler.
unlink 'temp.txt'; # Don't forget to add your own error handler.
return ( $align1, $align2, $stseq1 ,$max_j);
}
You can still use your original sub for short sequences, and switch to this sub for long ones.
I think that #j_random_hacker and #Ashalynd are on the right track regarding using this algorithm in most Perl implementations. The datatypes you're using are going to use more memory that absolutely needed for the calculations.
So this is "normal" in that you should expect to see this kind of memory usage for how you've written this algorithm in perl. You may have other problems in surrounding code that are using a lot of memory but this algorithm will hit your memory hard with large sequences.
You can address some of the memory issues by changing the datatypes that you're using as #Ashalynd suggests. You could try changing the hash which holds score and pointer into an array and changing the string pointers into integer values. Something like this might get you some benefit while still maintaining readability:
use strict;
use warnings;
# define constants for array positions and pointer values
# so the code is still readable.
# (If you have the "Readonly" CPAN module you may want to use it for constants
# instead although none of the downsides of the "constant" pragma apply in this code.)
use constant {
SCORE => 0,
POINTER => 1,
DIAGONAL => 0,
LEFT => 1,
UP => 2,
NONE => 3,
};
...
sub semiGlobal2 {
my ( $seq1, $seq2,$MATCH,$MISMATCH,$GAP ) = #_;
# initialization: first row to 0 ;
my #matrix;
# score and pointer are now stored in an array
# using the defined constants as indices
$matrix[0][0][SCORE] = 0;
# pointer value is now a constant integer
$matrix[0][0][POINTER] = NONE;
for ( my $j = 1 ; $j <= length($seq1) ; $j++ ) {
$matrix[0][$j][SCORE] = 0;
$matrix[0][$j][POINTER] = NONE;
}
for ( my $i = 1 ; $i <= length($seq2) ; $i++ ) {
$matrix[$i][0][SCORE] = $GAP * $i;
$matrix[$i][0][POINTER] = UP;
}
... # continue to make the appropriate changes throughout the code
However, when I tested this I didn't get a huge benefit when attempting to align a 3600 char string in a 5500 char string of random data. I programmed my code to abort when it consumed more than 2GB of memory. The original code aborted after 23 seconds while the one using constants and an array instead of a hash aborted after 32 seconds.
If you really want to use this specific algorithm I'd check out the performance of Algorithm::NeedlemanWunsch. It doesn't look like it's very mature but it may have addressed your performance issues. Otherwise look into writing an Inline or Perl XS wrapper around a C implementation

Help me finish the last part of my app? It solves any Countdown Numbers game on Channel 4 by brute forcing every possibly equation

For those not familiar with the game. You're given 8 numbers and you have to reach the target by using +, -, / and *.
So if the target is 254 and your game numbers are 2, 50, 5, 2, 1, you would answer the question correctly by saying 5 * 50 = 250. Then 2+2 is four. Add that on aswell to get 254.
Some videos of the game are here:
Video 1
video 2
Basically I brute force the game using by generating all perms of all sizes for the numbers and all perms of the symbols and use a basic inflix calculator to calculate the solution.
However it contains a flaw because all the solutions are solved as following: ((((1+1)*2)*3)*4). It doesn't permutate the brackets and it's causing my a headache.
Therefore I cannot solve every equation. For example, given
A target of 16 and the numbers 1,1,1,1,1,1,1,1 it fails when it should do (1+1+1+1)*(1+1+1+1)=16.
I'd love it in someone could help finish this...in any language.
This is what I've written so far:
#!/usr/bin/env perl
use strict;
use warnings;
use Algorithm::Permute;
# GAME PARAMETERS TO FILL IN
my $target = 751;
my #numbers = ( '2', '4', '7', '9', '1', '6', '50', '25' );
my $num_numbers = scalar(#numbers);
my #symbols = ();
foreach my $n (#numbers) {
push(#symbols, ('+', '-', '/', '*'));
}
my $num_symbols = scalar(#symbols);
print "Symbol table: " . join(", ", #symbols);
my $lst = [];
my $symb_lst = [];
my $perms = '';
my #perm = ();
my $symb_perms = '';
my #symb_perm;
my $print_mark = 0;
my $progress = 0;
my $total_perms = 0;
my #closest_numbers;
my #closest_symb;
my $distance = 999999;
sub calculate {
my #oprms = #{ $_[0] };
my #ooperators = #{ $_[1] };
my #prms = #oprms;
my #operators = #ooperators;
#print "PERMS: " . join(", ", #prms) . ", OPERATORS: " . join(", ", #operators);
my $total = pop(#prms);
foreach my $operator (#operators) {
my $x = pop(#prms);
if ($operator eq '+') {
$total += $x;
}
if ($operator eq '-') {
$total -= $x;
}
if ($operator eq '*') {
$total *= $x;
}
if ($operator eq '/') {
$total /= $x;
}
}
#print "Total: $total\n";
if ($total == $target) {
#print "ABLE TO ACCURATELY SOLVE WITH THIS ALGORITHM:\n";
#print "PERMS: " . join(", ", #oprms) . ", OPERATORS: " . join(", ", #ooperators) . ", TOTAL=$total\n";
sum_print(\#oprms, \#ooperators, $total, 0);
exit(0);
}
my $own_distance = ($target - $total);
if ($own_distance < 0) {
$own_distance *= -1;
}
if ($own_distance < $distance) {
#print "found a new solution - only $own_distance from target $target\n";
#print "PERMS: " . join(", ", #oprms) . ", OPERATORS: " . join(", ", #ooperators) . ", TOTAL=$total\n";
sum_print(\#oprms, \#ooperators, $total, $own_distance);
#closest_numbers = #oprms;
#closest_symb = #ooperators;
$distance = $own_distance;
}
$progress++;
if (($progress % $print_mark) == 0) {
print "Tested $progress permutations. " . (($progress / $total_perms) * 100) . "%\n";
}
}
sub factorial {
my $f = shift;
$f == 0 ? 1 : $f*factorial($f-1);
}
sub sum_print {
my #prms = #{ $_[0] };
my #operators = #{ $_[1] };
my $total = $_[2];
my $distance = $_[3];
my $tmp = '';
my $op_len = scalar(#operators);
print "BEST SOLUTION SO FAR: ";
for (my $x = 0; $x < $op_len; $x++) {
print "(";
}
$tmp = pop(#prms);
print "$tmp";
foreach my $operator (#operators) {
$tmp = pop(#prms);
print " $operator $tmp)";
}
if ($distance == 0) {
print " = $total\n";
}
else {
print " = $total (distance from target $target is $distance)\n";
}
}
# look for straight match
foreach my $number (#numbers) {
if ($number == $target) {
print "matched!\n";
}
}
for (my $x = 1; $x < (($num_numbers*2)-1); $x++) {
$total_perms += factorial($x);
}
print "Total number of permutations: $total_perms\n";
$print_mark = $total_perms / 100;
if ($print_mark == 0) {
$print_mark = $total_perms;
}
for (my $num_size=2; $num_size <= $num_numbers; $num_size++) {
$lst = \#numbers;
$perms = new Algorithm::Permute($lst, $num_size);
print "Perms of size: $num_size.\n";
# print matching symb permutations
$symb_lst = \#symbols;
$symb_perms = new Algorithm::Permute($symb_lst, $num_size-1);
while (#perm = $perms->next) {
while (#symb_perm = $symb_perms->next) {
calculate(\#perm, \#symb_perm);
}
$symb_perms = new Algorithm::Permute($symb_lst, $num_size-1);
}
}
print "exhausted solutions";
print "CLOSEST I CAN GET: $distance\n";
sum_print(\#closest_numbers, \#closest_symb, $target-$distance, $distance);
exit(0);
Here is the example output:
[15:53: /mnt/mydocuments/git_working_dir/countdown_solver$] perl countdown_solver.pl
Symbol table: +, -, /, *, +, -, /, *, +, -, /, *, +, -, /, *, +, -, /, *, +, -, /, *, +, -, /, *, +, -, /, *Total number of permutations: 93928268313
Perms of size: 2.
BEST SOLUTION SO FAR: (2 + 4) = 6 (distance from target 751 is 745)
BEST SOLUTION SO FAR: (2 * 4) = 8 (distance from target 751 is 743)
BEST SOLUTION SO FAR: (4 + 7) = 11 (distance from target 751 is 740)
BEST SOLUTION SO FAR: (4 * 7) = 28 (distance from target 751 is 723)
BEST SOLUTION SO FAR: (4 * 9) = 36 (distance from target 751 is 715)
BEST SOLUTION SO FAR: (7 * 9) = 63 (distance from target 751 is 688)
BEST SOLUTION SO FAR: (4 * 50) = 200 (distance from target 751 is 551)
BEST SOLUTION SO FAR: (7 * 50) = 350 (distance from target 751 is 401)
BEST SOLUTION SO FAR: (9 * 50) = 450 (distance from target 751 is 301)
Perms of size: 3.
BEST SOLUTION SO FAR: ((4 + 7) * 50) = 550 (distance from target 751 is 201)
BEST SOLUTION SO FAR: ((2 * 7) * 50) = 700 (distance from target 751 is 51)
BEST SOLUTION SO FAR: ((7 + 9) * 50) = 800 (distance from target 751 is 49)
BEST SOLUTION SO FAR: ((9 + 6) * 50) = 750 (distance from target 751 is 1)
Perms of size: 4.
BEST SOLUTION SO FAR: (((9 + 6) * 50) + 1) = 751
Here is Java applet (source) and Javascript version.
The suggestion to use reverse polish notation is excellent.
If you have N=5 numbers, the template is
{num} {num} {ops} {num} {ops} {num} {ops} {num} {ops}
There can be zero to N ops in any spot, although the total number will be N-1. You just have to try different placements of numbers and ops.
The (((1+1)+1)+1)*(((1+1)+1)+1)=16 solution will be found when you try
1 1 + 1 + 1 + 1 1 + 1 + 1 + *
Update: Maybe not so good, since finding the above could take 433,701,273,600 tries. The number was obtained using the following:
use strict;
use warnings;
{
my %cache = ( 0 => 1 );
sub fact { my ($n) = #_; $cache{$n} ||= fact($n-1) * $n }
}
{
my %cache;
sub C {
my ($n,$r) = #_;
return $cache{"$n,$r"} ||= do {
my $i = $n;
my $j = $n-$r;
my $c = 1;
$c *= $i--/$j-- while $j;
$c
};
}
}
my #nums = (1,1,1,1,1,1,1,1);
my $Nn = 0+#nums; # Number of numbers.
my $No = $Nn-1; # Number of operators.
my $max_tries = do {
my $num_orderings = fact($Nn);
{
my %counts;
++$counts{$_} for #nums;
$num_orderings /= fact($_) for values(%counts);
}
my $op_orderings = 4 ** $No;
my $op_placements = 1;
$op_placements *= C($No, $_) for 1..$No-1;
$num_orderings * $op_orderings * $op_placements
};
printf "At most %.f tries needed\n", $max_tries;

How can I convert a number to its multiple form in Perl?

Do you know an easy and straight-forward method/sub/module which allows me to convert a number (say 1234567.89) to an easily readable form - something like 1.23M?
Right now I can do this by making several comparisons, but I'm not happy with my method:
if($bytes > 1000000000){
$bytes = ( sprintf( "%0.2f", $bytes/1000000000 )). " Gb/s";
}
elsif ($bytes > 1000000){
$bytes = ( sprintf( "%0.2f", $bytes/1000000 )). " Mb/s";
}
elsif ($bytes > 1000){
$bytes = ( sprintf( "%0.2f", $bytes/1000 )). " Kb/s";
}
else{
$bytes = sprintf( "%0.2f", $bytes ). "b/s";
}
Thank you for your help!
The Number::Bytes::Human module should be able to help you out.
An example of how to use it can be found in its synopsis:
use Number::Bytes::Human qw(format_bytes);
$size = format_bytes(0); # '0'
$size = format_bytes(2*1024); # '2.0K'
$size = format_bytes(1_234_890, bs => 1000); # '1.3M'
$size = format_bytes(1E9, bs => 1000); # '1.0G'
# the OO way
$human = Number::Bytes::Human->new(bs => 1000, si => 1);
$size = $human->format(1E7); # '10MB'
$human->set_options(zero => '-');
$size = $human->format(0); # '-'
Number::Bytes::Human seems to do exactly what you want.
sub magnitudeformat {
my $val = shift;
my $expstr;
my $exp = log($val) / log(10);
if ($exp < 3) { return $val; }
elsif ($exp < 6) { $exp = 3; $expstr = "K"; }
elsif ($exp < 9) { $exp = 6; $expstr = "M"; }
elsif ($exp < 12) { $exp = 9; $expstr = "G"; } # Or "B".
else { $exp = 12; $expstr = "T"; }
return sprintf("%0.1f%s", $val/(10**$exp), $expstr);
}
In pure Perl form, I've done this with a nested ternary operator to cut on verbosity:
sub BytesToReadableString($) {
my $c = shift;
$c >= 1073741824 ? sprintf("%0.2fGB", $c/1073741824)
: $c >= 1048576 ? sprintf("%0.2fMB", $c/1048576)
: $c >= 1024 ? sprintf("%0.2fKB", $c/1024)
: $c . "bytes";
}
print BytesToReadableString(225939) . "/s\n";
Outputs:
220.64KB/s
This snippet is in PHP, and it's loosely based on some example someone else had on their website somewhere (sorry buddy, I can't remember).
The basic concept is instead of using if, use a loop.
function formatNumberThousands($a,$dig)
{
$unim = array("","k","m","g");
$c = 0;
while ($a>=1000 && $c<=3) {
$c++;
$a = $a/1000;
}
$d = $dig-ceil(log10($a));
return number_format($a,($c ? $d : 0))."".$unim[$c];
}
The number_format() call is a PHP library function which returns a string with commas between the thousands groups. I'm not sure if something like it exists in perl.
The $dig parameter sets a limit on the number of digits to show. If $dig is 2, it will give you 1.2k from 1237.
To format bytes, just divide by 1024 instead.
This function is in use in some production code to this day.