Reverse Polish Notation with a Single Variable - perl

I've already written code that can perform the following conversions to an input string
3(7x-1)+10x-4+3x=90x+1
(3*(7x-1)+10x-4+3x)-(90x+1)
37x1-*10x+4-3x+90x1+-
But now, I'm a little stuck with this last one. I know how to write the code to solve RPN without an X involved. However, I'm not sure how to parse this with the x'es involved.
EIDT: I have fixed the incorrect RPN error, and it now reads
3 7 x * 1 - * 10 x * + 4 - 3 x * + 90 x * 1 + -

parsing this RPN string is what I'm worried about.
The whole point of RPN is that it doesn't require any parsing. You can work with the output of the tokenizer directly.
sub get_next_token {
/\G \s+ /xgc;
/\G \z /xgc && return [ 'EOF' ];
/\G ( [0-9]+ ) /xgc && return [ NUM => $1 ];
/\G ( [a-z] ) /xgc && return [ VAR => $1 ];
/\G ( [*+\-] ) /xgc && return [ $1 ];
die("Syntax error\n");
}
my %ops = (
EOF => sub { my $token = shift; ...; return 0; },
NUM => sub { my $token = shift; ...; return 1; },
VAR => sub { my $token = shift; ...; return 1; },
'*' => sub { my $token = shift; ...; return 1; },
'+' => sub { my $token = shift; ...; return 1; },
'-' => sub { my $token = shift; ...; return 1; },
);
for ($rpn) {
while (1) {
my $token = get_next_token();
my $op = $opts{$token}
or die("Internal error");
last if !$op->($token);
}
}
Note that you don't have valid RPN.
(3(7x-1)+10x-4+3x)-(90x+1)
should result in
3 7 x * 1 - * 10 x * + 4 - 3 x * + 90 x * 1 + -
but you have
37 x 1 - * 10 x + 4 - 3 x + 90 x 1 + -
You're missing some multiplications, and you merged 3 and 7 into 37.

I'm not sure how to parse this with the x'es involved.
Seems like you've written a parser which handles grammars of the following form:
stmt : expr '=' expr
expr : sum
sum : prod '+' sum
| prod '-' sum
| prod
prod : term prod
| term
term : '(' expr ')'
| NUM
All you need to change is term.
term : '(' expr ')'
| NUM
| VAR

Math::RPN:
use Math::RPN;
$value=rpn(expr...);
#array=rpn(expr...);
expr... is one or more scalars or lists of scalars which contain
RPN expressions. An RPN expression is a series of numbers and/or
operators separated by commas. (commas are only required within
scalars).

Related

Perl: Sorting multi dimensional hash

I'm currently trying to sort by values in a multi dimensional hash and I'm coming up a bit stumped.
I've done this on a single level as follows:
my %exampleHash = ();
$exampleHash{1}{value} = 10;
$exampleHash{2}{value} = 30;
$exampleHash{3}{value} = 0;
$exampleHash{4}{value} = 20;
foreach my $key ( reverse sort {$exampleHash{$a}{value} <=> $exampleHash{$b}{value}} keys %exampleHash )
{
printf( "%s => %d\n", $key, $exampleHash{$key}{value} );
}
This produces the following output, sorted on the value as expected:
2 => 30
4 => 20
1 => 10
3 => 0
I've then tried the following to do the same thing but with an extra key:
my %exampleHashTwo = ();
$exampleHashTwo{1}{"One"}{value} = 10;
$exampleHashTwo{2}{"Two"}{value} = 30;
$exampleHashTwo{3}{"Three"}{value} = 0;
$exampleHashTwo{4}{"Four"}{value} = 20;
foreach my $intKey ( keys %exampleHashTwo )
{
foreach my $stringKey ( reverse sort {$exampleHashTwo{$intKey}{$a}{value} <=> $exampleHashTwo{$intKey}{$b}{value}} keys %{$exampleHashTwo{$intKey}} )
{
printf( "{%d} - {%5s} => %d\n", $intKey, $stringKey, $exampleHashTwo{$intKey}{$stringKey}{value} );
}
}
However this seems to sort the string key's in alphabetical order. So I'm assuming I'm on the right lines, but I've misplaced something perhaps?
{4} - { Four} => 20
{1} - { One} => 10
{3} - {Three} => 0
{2} - { Two} => 30
Any ideas?
Your outer loop, which loops over the integer keys in somewhat random order, is dictating the order of your output. The inner loop is sorting but only given one value each time.
To sort all the values for all combinations of both keys, you need a single loop and to sort over a generated list of such combinations:
use strict;
use warnings;
my %exampleHashTwo = ();
$exampleHashTwo{1}{"One"}{value} = 10;
$exampleHashTwo{2}{"Two"}{value} = 20;
$exampleHashTwo{2}{"TwoB"}{value} = 5;
$exampleHashTwo{3}{"Three"}{value} = 0;
$exampleHashTwo{4}{"Two"}{value} = 15;
for my $keypair (
sort { $exampleHashTwo{$b->[0]}{$b->[1]}{value} <=> $exampleHashTwo{$a->[0]}{$a->[1]}{value} }
map { my $intKey=$_; map [$intKey, $_], keys %{$exampleHashTwo{$intKey}} } keys %exampleHashTwo
) {
printf( "{%d} - {%5s} => %d\n", $keypair->[0], $keypair->[1], $exampleHashTwo{$keypair->[0]}{$keypair->[1]}{value} );
}
Output:
{2} - { Two} => 20
{4} - { Two} => 15
{1} - { One} => 10
{2} - { TwoB} => 5
{3} - {Three} => 0
(Replaced reverse with switching order of $a and $b.)

How to print range values with specific reference number are given?

I have a set of data file looks like below. I would like to get the interpolation final value (final,P) by referring to 2 set of number range (scoreA and scoreB). Let's say for "Eric", his scoreA is 35 (value between range 30.00 - 40.00) and scoreB is 48 (a value between range 45.00 - 50.00). He will get 2 set of final values range which are (22.88,40.90) & (26.99,38.99). And I would like to get the final value of "Eric" and "George" in the data file. "George"'s scoreA = 38 and scoreB = 26.
After formula calculation, I want to get the exact final value when his scoreA=35 & scoreB=45. Let's assume formula is P=X+Y (P is final value), so far I have been trying the code as shown below. However it cannot get the correct lines.
How to get the exactly final value range by referring to the data given?
data file
Student_name ("Eric")
/* This is a junk line */
scoreA ("10.00, 20.00, 30.00, 40.00")
scoreB ("15.00, 30.00, 45.00, 50.00, 55.00")
final (
"12.23,19.00,37.88,45.98,60.00",\
"07.00,20.11,24.56,45.66,57.88",\
"05.00,15.78,22.88,40.90,57.99",\
"10.00,16.87,26.99,38.99,40.66"\)
Student_name ("Liy")
/* This is a junk line */
scoreA ("5.00, 10.00, 20.00, 60.00")
scoreB ("25.00, 30.00, 40.00, 55.00, 60.00")
final (
"02.23,15.00,37.88,45.98,70.00",\
"10.00,28.11,34.56,45.66,57.88",\
"08.00,19.78,32.88,40.90,57.66",\
"10.00,27.87,39.99,59.99,78.66"\)
Student_name ("Frank")
/* This is a junk line */
scoreA ("2.00, 15.00, 25.00, 40.00")
scoreB ("15.00, 24.00, 38.00, 45.00, 80.00")
final (
"02.23,15.00,37.88,45.98,70.00",\
"10.00,28.11,34.56,45.66,57.88",\
"08.00,19.78,32.88,40.90,57.66",\
"10.00,27.87,39.99,59.99,78.66"\)
Student_name ("George")
/* This is a junk line */
scoreA ("10.00, 15.00, 20.00, 40.00")
scoreB ("25.00, 33.00, 46.00, 55.00, 60.00")
final (
"10.23,25.00,37.88,45.98,68.00",\
"09.00,28.11,34.56,45.66,60.88",\
"18.00,19.78,32.88,40.90,79.66",\
"17.00,27.87,40.99,59.99,66.66"\)
Coding
data();
sub data() {
my $cnt = 0;
while (my #array = <FILE>) {
foreach $line(#array) {
if ($line =~ /Student_name/) {
$a = $line;
if ($a =~ /Eric/ or $cnt > 0 ) {
$cnt++;
}
if ( $cnt > 1 and $cnt <= 3 ) {
print $a;
}
if ( $cnt > 2 and $cnt <= 4 ) {
print $a;
}
if ( $cnt == 5 ) {
$cnt = 0;
}
}
}
}
}
Result
Eric final=42.66
George final=24.30
In my comment I said that parsing is fairly easy. Here is how it could be done. As the question lacks a proper specification of the file format, I will assume the following:
The file consists of properties, which have values:
document ::= property*
property ::= word "(" value ("," value)* ")"
A value is a double-quoted string containing numbers seperated by commata, or a single word:
value ::= '"' ( word | number ("," number)* ) '"'
Spaces, backslashes, and comments are irrelevant.
Here is a possible implementation; I will not go into the details of explaining how to write a simple parser.
package Parser;
use strict; use warnings;
sub parse {
my ($data) = #_;
# perform tokenization
pos($data) = 0;
my $length = length $data;
my #tokens;
while(pos($data) < $length) {
next if $data =~ m{\G\s+}gc
or $data =~ m{\G\\}gc
or $data =~ m{\G/[*].*?[*]/}gc;
if ($data =~ m/\G([",()])/gc) {
push #tokens, [symbol => $1];
} elsif ($data =~ m/\G([0-9]+[.][0-9]+)/gc) {
push #tokens, [number => 0+$1];
} elsif ($data =~ m/\G(\w+)/gc) {
push #tokens, [word => $1];
} else {
die "unreckognized token at:\n", substr $data, pos($data), 10;
}
}
return parse_document(\#tokens);
}
sub token_error {
my ($token, $expected) = #_;
return "Wrong token [#$token] when expecting [#$expected]";
}
sub parse_document {
my ($tokens) = #_;
my #properties;
push #properties, parse_property($tokens) while #$tokens;
return #properties;
}
sub parse_property {
my ($tokens) = #_;
$tokens->[0][0] eq "word"
or die token_error $tokens->[0], ["word"];
my $name = (shift #$tokens)->[1];
$tokens->[0][0] eq "symbol" and $tokens->[0][1] eq '('
or die token_error $tokens->[0], [symbol => '('];
shift #$tokens;
my #vals;
VAL: {
push #vals, parse_value($tokens);
if ($tokens->[0][0] eq 'symbol' and $tokens->[0][1] eq ',') {
shift #$tokens;
redo VAL;
}
}
$tokens->[0][0] eq "symbol" and $tokens->[0][1] eq ')'
or die token_error $tokens->[0], [symbol => ')'];
shift #$tokens;
return [ $name => #vals ];
}
sub parse_value {
my ($tokens) = #_;
$tokens->[0][0] eq "symbol" and $tokens->[0][1] eq '"'
or die token_error $tokens->[0], [symbol => '"'];
shift #$tokens;
my $value;
if ($tokens->[0][0] eq "word") {
$value = (shift #$tokens)->[1];
} else {
my #nums;
NUM: {
$tokens->[0][0] eq 'number'
or die token_error $tokens->[0], ['number'];
push #nums, (shift #$tokens)->[1];
if ($tokens->[0][0] eq 'symbol' and $tokens->[0][1] eq ',') {
shift #$tokens;
redo NUM;
}
}
$value = \#nums;
}
$tokens->[0][0] eq "symbol" and $tokens->[0][1] eq '"'
or die token_error $tokens->[0], [symbol => '"'];
shift #$tokens;
return $value;
}
Now, we get the following data structure as output from Parser::parse:
(
["Student_name", "Eric"],
["scoreA", [10, 20, 30, 40]],
["scoreB", [15, 30, 45, 50, 55]],
[
"final",
[12.23, 19, 37.88, 45.98, 60],
[7, 20.11, 24.56, 45.66, 57.88],
[5, 15.78, 22.88, 40.9, 57.99],
[10, 16.87, 26.99, 38.99, 40.66],
],
["Student_name", "Liy"],
["scoreA", [5, 10, 20, 60]],
["scoreB", [25, 30, 40, 55, 60]],
[
"final",
[2.23, 15, 37.88, 45.98, 70],
[10, 28.11, 34.56, 45.66, 57.88],
[8, 19.78, 32.88, 40.9, 57.66],
[10, 27.87, 39.99, 59.99, 78.66],
],
...,
)
As a next step, we want to transform it into nested hashes, so that we have the structure
{
Eric => {
scoreA => [...],
scoreB => [...],
final => [[...], ...],
},
Liy => {...},
...,
}
So we simply run it through this small sub:
sub properties_to_hash {
my %hash;
while(my $name_prop = shift #_) {
$name_prop->[0] eq 'Student_name' or die "Expected Student_name property";
my $name = $name_prop->[1];
while( #_ and $_[0][0] ne 'Student_name') {
my ($prop, #vals) = #{ shift #_ };
if (#vals > 1) {
$hash{$name}{$prop} = \#vals;
} else {
$hash{$name}{$prop} = $vals[0];
}
}
}
return \%hash;
}
So we have the main code
my $data = properties_to_hash(Parser::parse( $file_contents ));
Now we can move on to Part 2 fo the problem: calculating your scores. That is, once you make clear what you need done.
Edit: Bilinear interpolation
Let f be the function that returns the value at a certain coordinate. If we have a value at those coordinates, we can return that. Else, we perform bilinear interpolation with the next known values.
The formula for bilinear interpolation[1] is:
f(x, y) = 1/( (x_2 - x_1) · (y_2 - y_1) ) · (
f(x_1, y_1) · (x_2 - x) · (y_2 - y)
+ f(x_2, y_1) · (x - x_1) · (y_2 - y)
+ f(x_1, y_2) · (x_2 - x) · (y - y_1)
+ f(x_2, y_2) · (x - x_1) · (y - y_1)
)
Now, scoreA denote the positions of the data points in the final table on the first axis, scoreA the positions on the second axis. We have to do the following:
assert that the requested values x, y are inside the bounds
fetch the next smaller and next larger positions
perform interpolation
.
sub f {
my ($data, $x, $y) = #_;
# do bounds check:
my ($x_min, $x_max, $y_min, $y_max) = (#{$data->{scoreA}}[0, -1], #{$data->{scoreB}}[0, -1]);
die "indices ($x, $y) out of range ([$x_min, $x_max], [$y_min, $y_max])"
unless $x_min <= $x && $x <= $x_max
&& $y_min <= $y && $y <= $y_max;
To fetch the boxing indices x_1, x_2, y_1, y_2 we need to iterate through all possible scores. We'll also remember the physical indices x_i1, x_i2, y_i1, y_i2 of the underlying arrays.
my ($x_i1, $x_i2, $y_i1, $y_i2);
for ([$data->{scoreA}, \$x_i1, \$x_i2], [$data->{scoreB}, \$y_i1, \$y_i2]) {
my ($scores, $a_i1, $a_i2) = #$_;
for my $i (0 .. $#$scores) {
if ($scores->[$i] <= $x) {
($$a_i1, $$a_i2) = $i == $#$scores ? ($i, $i+1) : ($i-1, $i);
last;
}
}
}
my ($x_1, $x_2) = #{$data->{scoreA}}[$x_i1, $x_i2];
my ($y_1, $y_2) = #{$data->{scoreB}}[$y_i1, $y_i2];
Now, interpolation according to above formula can be performed, but each access at a known index can be changed to an access via physical index, so f(x_1, y_2) would become
$final->[$x_i1][$y_i2]
Detailed Explanation of sub f
sub f { ... } declares a sub with name f, although that is probably a bad name. bilinear_interpolation might be a better name.
my ($data, $x, $y) = #_ states that our sub takes three arguments:
$data, a hash reference containing fields scoreA, scoreB and final, which are array references.
$x, the position along the scoreA-axis where interpolation is required.
$y, the position along the scoreB-axis where interpolation is required.
Next, we want to assert that the positions for $x and $y are valid aka inside bounds. The first value in $data->{scoreA} is the minimal value; the maximal value is in the last position (index -1). To get both at once, we use an array slice. A slice accesses multiple values at once and returns a list, like #array[1, 2]. Because we use complex data structures which use references, we have to dereference the array in $data->{scoreA}. This makes the slice look like #{$data->{scoreA}}[0, 1].
Now that we have the $x_min and $x_max values, we throw and error unless the requested value $x is inside the range defined by the min/max values. This is true when
$x_min <= $x && $x <= $x_max
Should either $x or $y be out of bounds, we throw an error which shows the actual bounds. So the code
die "indices ($x, $y) out of range ([$x_min, $x_max], [$y_min, $y_max])"
could, for example, throw an error like
indices (10, 500) out of range ([20, 30], [25, 57]) at script.pl line 42
Here we can see that the value for $x is too small, and $y is too large.
The next problem is to find neighbouring values. Assuming scoreA holds [1, 2, 3, 4, 5], and $x is 3.7, we want to select the values 3 and 4. But because we can pull some nifty tricks a bit later, we would rather remember the position of the neighbouring values, not the values themselves. So this would give 2 and 3 in above example (remember that arrows are zero-based).
We can do this by looping over all indices of our array. When we find a value that is ≤ $x, we remember the index. E.g. 3 is the first value that is ≤ $x, so we remember the index 2. For the next higher value, we have to be a bit carful: Obviously, we can just take the next index, so 2 + 1 = 3. But now assume that $x is 5. This passes the bounds check. The first value that is ≤ $x would be value 5, so we can remember position 4. However, there is no entry at position 5, so we could use the current index itself. Because this would lead to division by zero later on, we would be better off remembering positions 3 and 4 (values 4 and 5).
Expressed as code, that is
my ($x_i1, $x_i2);
my #scoreA = #{ $data->{scoreA} }; # shortcut to the scoreA entry
for my $i (0 .. $#scores) { # iterate over all indices: `$#arr` is the last idx of #arr
if ($scores[$i] <= $x) { # do this if the current value is ≤ $x
if ($i != $#scores) { # if this isn't the last index
($x_i1, $x_i2) = ($i, $i+1);
} else { # so this is the last index
($x_i1, $x_i2) = ($i-1, $i);
}
last; # break out of the loop
}
}
In my original code I choose a more complex solution to avoid copy-pasting the same code for finding the neighbours of $y.
Because we also need the values, we obtain them via a slice with the indices:
my ($x_1, $x_2) = #{$data->{scoreA}}[$x_i1, $x_i2];
Now we have all surrounding values $x1, $x_2, $y_1, $y_2 which define the rectangle in which we want to perform bilinear interpolation. The mathematical formula is easy to translate to Perl: just choose the correct operators (*, not · for multiplication), and the variables need dollar signs before them.
The formula I used is recursive: The definition of f refers to itself. This would imply an infinite loop, unless we do some thinking and break the recursion. f symbolizes the value at a certain position. In most cases, this means interpolating. However, if $x and $y are both equal to values in scoreA and scoreB respectively, we don't need bilinear interpolation, and can return the final entry directly.
This can be done by checking if both $x and $y are members of their arrays, and doing an early return. Or we can use the fact that $x_1, ..., $y_2 all are members of the arrays. Instead of recursing with values we know don't need interpolating, we just do an array access. This is what we have saved the indices $x_i1, ..., $y_i2 for. So wherever the original formula says f(x_1, y_1) or similar, we write the equivalent $data->{final}[$x_i1][$y_i2].

Perl to count current value based on next value

Currently I'm learning Perl and gnuplot. I would like to know how to count certain value based on the next value. For example I have a text file consist of:
#ID(X) Y
1 1
3 9
5 11
The output should show the value of the unknown ID as well. So, the output should show:
#ID(X) Y
1 1
2 5
3 9
4 10
5 11
The Y of ID#2 is based on the following:
((2-3)/(1-3))*1 + ((2-1)/(3-1))*9 which is linear algebra
Y2=((X2-X3)/(X1-X3))*Y1 + ((X2-X1)/(X3-X1)) * Y3
Same goes to ID#5
Currently I have this code,
#! /usr/bin/perl -w
use strict;
my $prev_id = 0;
my $prev_val = 0;
my $next_id;
my $next_val;
while (<>)
{
my ($id, $val) = split;
for (my $i = $prev_id + 1; $i < $next_id; $i++)
{
$val = (($id - $next_id) / ($prev_id - $next_id)) * $prev_val + (($id - $prev_id) / ($next_id - $prev_id)) * $next_val;
printf ("%d %s\n", $i, $val);
}
printf ("%d %s\n", $id, $val);
($prev_val, $prev_id) = ($val, $id);
($next_val, $next_id) = ($prev_val, $prev_id);
}
Your formula seems more complicated than I would expect, given that you are always dealing with integer spacings of 1.
You did not say whether you want to fill gaps for multiple consecutive missing values, but let's assume you want to.
What you do is read in the first line, and say that's the current one and you output it. Now you read the next line, and if its ID is not the expected one, you fill the gaps with simple linear interpolation...
Pseudocode
(currID, currY) = readline()
outputvals( currID, currY )
while lines remain do
(nextID, nextY) = readline()
gap = nextID - currID
for i = 1 to gap
id = currID + i
y = currY + (nextY - currY) * i / gap
outputvals( id, y )
end
(currID, currY) = (nextID, nextY)
end
Sorry for the non-Perl code. It's just that I haven't been using Perl for ages, and can't remember half of the syntax. =) The concepts here are pretty easy to translate into code though.
Using an array may be the way to go. This will also make your data available for further manipulation.
** Caveat: will not work for multiple consecutive missing values of y; see #paddy's answer.
#!/usr/bin/perl
use strict;
use warnings;
my #coordinates;
while (<DATA>) {
my ($x, $y) = split;
$coordinates[$x] = $y;
}
# note that the for loop starts on index 1 here ...
for my $x (1 .. $#coordinates) {
if (! $coordinates[$x]) {
$coordinates[$x] = (($x - ($x + 1)) / (($x - 1) - ($x + 1)))
* $coordinates[$x - 1]
+ (($x - ($x - 1)) / (($x + 1) - ($x - 1)))
* $coordinates[$x + 1];
}
print "$x - $coordinates[$x]\n";
}
__DATA__
1 1
3 9
5 11
You indicated your problem is getting the next value. The key isn't to look ahead, it's to look behind.
my $prev = get first value;
my ($prev_a, $prev_b) = parse($prev);
my $this = get second value;
my ($this_a, $this_b) = parse($this);
while ($next = get next value) {
my ($next_a, $next_b) = parse($next);
...
$prev = $this; $prev_a = $this_a; $prev_b = $this_b;
$this = $next; $this_a = $next_a; $this_b = $next_b;
}
#! /usr/bin/perl -w
use strict;
my #in = (1,9,11);
my #out;
for (my $i = 0; $i<$#in; $i++) {
my $j = $i*2;
my $X1 = $i;
my $X2 = $i+1;
my $X3 = $i+2;
my $Y1 = $in[$i];
my $Y3 = $in[$i+1];
my $Y2 = $Y1*(($X2-$X3)/($X1-$X3))
+ $Y3*(($X2-$X1)/($X3-$X1));
$out[$j] = $in[$i];
$out[$j+1] = $Y2;
}
$out[$#in*2] = $in[$#in];
print (join " ",#out);

Help me finish the last part of my app? It solves any Countdown Numbers game on Channel 4 by brute forcing every possibly equation

For those not familiar with the game. You're given 8 numbers and you have to reach the target by using +, -, / and *.
So if the target is 254 and your game numbers are 2, 50, 5, 2, 1, you would answer the question correctly by saying 5 * 50 = 250. Then 2+2 is four. Add that on aswell to get 254.
Some videos of the game are here:
Video 1
video 2
Basically I brute force the game using by generating all perms of all sizes for the numbers and all perms of the symbols and use a basic inflix calculator to calculate the solution.
However it contains a flaw because all the solutions are solved as following: ((((1+1)*2)*3)*4). It doesn't permutate the brackets and it's causing my a headache.
Therefore I cannot solve every equation. For example, given
A target of 16 and the numbers 1,1,1,1,1,1,1,1 it fails when it should do (1+1+1+1)*(1+1+1+1)=16.
I'd love it in someone could help finish this...in any language.
This is what I've written so far:
#!/usr/bin/env perl
use strict;
use warnings;
use Algorithm::Permute;
# GAME PARAMETERS TO FILL IN
my $target = 751;
my #numbers = ( '2', '4', '7', '9', '1', '6', '50', '25' );
my $num_numbers = scalar(#numbers);
my #symbols = ();
foreach my $n (#numbers) {
push(#symbols, ('+', '-', '/', '*'));
}
my $num_symbols = scalar(#symbols);
print "Symbol table: " . join(", ", #symbols);
my $lst = [];
my $symb_lst = [];
my $perms = '';
my #perm = ();
my $symb_perms = '';
my #symb_perm;
my $print_mark = 0;
my $progress = 0;
my $total_perms = 0;
my #closest_numbers;
my #closest_symb;
my $distance = 999999;
sub calculate {
my #oprms = #{ $_[0] };
my #ooperators = #{ $_[1] };
my #prms = #oprms;
my #operators = #ooperators;
#print "PERMS: " . join(", ", #prms) . ", OPERATORS: " . join(", ", #operators);
my $total = pop(#prms);
foreach my $operator (#operators) {
my $x = pop(#prms);
if ($operator eq '+') {
$total += $x;
}
if ($operator eq '-') {
$total -= $x;
}
if ($operator eq '*') {
$total *= $x;
}
if ($operator eq '/') {
$total /= $x;
}
}
#print "Total: $total\n";
if ($total == $target) {
#print "ABLE TO ACCURATELY SOLVE WITH THIS ALGORITHM:\n";
#print "PERMS: " . join(", ", #oprms) . ", OPERATORS: " . join(", ", #ooperators) . ", TOTAL=$total\n";
sum_print(\#oprms, \#ooperators, $total, 0);
exit(0);
}
my $own_distance = ($target - $total);
if ($own_distance < 0) {
$own_distance *= -1;
}
if ($own_distance < $distance) {
#print "found a new solution - only $own_distance from target $target\n";
#print "PERMS: " . join(", ", #oprms) . ", OPERATORS: " . join(", ", #ooperators) . ", TOTAL=$total\n";
sum_print(\#oprms, \#ooperators, $total, $own_distance);
#closest_numbers = #oprms;
#closest_symb = #ooperators;
$distance = $own_distance;
}
$progress++;
if (($progress % $print_mark) == 0) {
print "Tested $progress permutations. " . (($progress / $total_perms) * 100) . "%\n";
}
}
sub factorial {
my $f = shift;
$f == 0 ? 1 : $f*factorial($f-1);
}
sub sum_print {
my #prms = #{ $_[0] };
my #operators = #{ $_[1] };
my $total = $_[2];
my $distance = $_[3];
my $tmp = '';
my $op_len = scalar(#operators);
print "BEST SOLUTION SO FAR: ";
for (my $x = 0; $x < $op_len; $x++) {
print "(";
}
$tmp = pop(#prms);
print "$tmp";
foreach my $operator (#operators) {
$tmp = pop(#prms);
print " $operator $tmp)";
}
if ($distance == 0) {
print " = $total\n";
}
else {
print " = $total (distance from target $target is $distance)\n";
}
}
# look for straight match
foreach my $number (#numbers) {
if ($number == $target) {
print "matched!\n";
}
}
for (my $x = 1; $x < (($num_numbers*2)-1); $x++) {
$total_perms += factorial($x);
}
print "Total number of permutations: $total_perms\n";
$print_mark = $total_perms / 100;
if ($print_mark == 0) {
$print_mark = $total_perms;
}
for (my $num_size=2; $num_size <= $num_numbers; $num_size++) {
$lst = \#numbers;
$perms = new Algorithm::Permute($lst, $num_size);
print "Perms of size: $num_size.\n";
# print matching symb permutations
$symb_lst = \#symbols;
$symb_perms = new Algorithm::Permute($symb_lst, $num_size-1);
while (#perm = $perms->next) {
while (#symb_perm = $symb_perms->next) {
calculate(\#perm, \#symb_perm);
}
$symb_perms = new Algorithm::Permute($symb_lst, $num_size-1);
}
}
print "exhausted solutions";
print "CLOSEST I CAN GET: $distance\n";
sum_print(\#closest_numbers, \#closest_symb, $target-$distance, $distance);
exit(0);
Here is the example output:
[15:53: /mnt/mydocuments/git_working_dir/countdown_solver$] perl countdown_solver.pl
Symbol table: +, -, /, *, +, -, /, *, +, -, /, *, +, -, /, *, +, -, /, *, +, -, /, *, +, -, /, *, +, -, /, *Total number of permutations: 93928268313
Perms of size: 2.
BEST SOLUTION SO FAR: (2 + 4) = 6 (distance from target 751 is 745)
BEST SOLUTION SO FAR: (2 * 4) = 8 (distance from target 751 is 743)
BEST SOLUTION SO FAR: (4 + 7) = 11 (distance from target 751 is 740)
BEST SOLUTION SO FAR: (4 * 7) = 28 (distance from target 751 is 723)
BEST SOLUTION SO FAR: (4 * 9) = 36 (distance from target 751 is 715)
BEST SOLUTION SO FAR: (7 * 9) = 63 (distance from target 751 is 688)
BEST SOLUTION SO FAR: (4 * 50) = 200 (distance from target 751 is 551)
BEST SOLUTION SO FAR: (7 * 50) = 350 (distance from target 751 is 401)
BEST SOLUTION SO FAR: (9 * 50) = 450 (distance from target 751 is 301)
Perms of size: 3.
BEST SOLUTION SO FAR: ((4 + 7) * 50) = 550 (distance from target 751 is 201)
BEST SOLUTION SO FAR: ((2 * 7) * 50) = 700 (distance from target 751 is 51)
BEST SOLUTION SO FAR: ((7 + 9) * 50) = 800 (distance from target 751 is 49)
BEST SOLUTION SO FAR: ((9 + 6) * 50) = 750 (distance from target 751 is 1)
Perms of size: 4.
BEST SOLUTION SO FAR: (((9 + 6) * 50) + 1) = 751
Here is Java applet (source) and Javascript version.
The suggestion to use reverse polish notation is excellent.
If you have N=5 numbers, the template is
{num} {num} {ops} {num} {ops} {num} {ops} {num} {ops}
There can be zero to N ops in any spot, although the total number will be N-1. You just have to try different placements of numbers and ops.
The (((1+1)+1)+1)*(((1+1)+1)+1)=16 solution will be found when you try
1 1 + 1 + 1 + 1 1 + 1 + 1 + *
Update: Maybe not so good, since finding the above could take 433,701,273,600 tries. The number was obtained using the following:
use strict;
use warnings;
{
my %cache = ( 0 => 1 );
sub fact { my ($n) = #_; $cache{$n} ||= fact($n-1) * $n }
}
{
my %cache;
sub C {
my ($n,$r) = #_;
return $cache{"$n,$r"} ||= do {
my $i = $n;
my $j = $n-$r;
my $c = 1;
$c *= $i--/$j-- while $j;
$c
};
}
}
my #nums = (1,1,1,1,1,1,1,1);
my $Nn = 0+#nums; # Number of numbers.
my $No = $Nn-1; # Number of operators.
my $max_tries = do {
my $num_orderings = fact($Nn);
{
my %counts;
++$counts{$_} for #nums;
$num_orderings /= fact($_) for values(%counts);
}
my $op_orderings = 4 ** $No;
my $op_placements = 1;
$op_placements *= C($No, $_) for 1..$No-1;
$num_orderings * $op_orderings * $op_placements
};
printf "At most %.f tries needed\n", $max_tries;

How do I change this to "idiomatic" Perl?

I am beginning to delve deeper into Perl, but am having trouble writing "Perl-ly" code instead of writing C in Perl. How can I change the following code to use more Perl idioms, and how should I go about learning the idioms?
Just an explanation of what it is doing: This routine is part of a module that aligns DNA or amino acid sequences(using Needelman-Wunch if you care about such things). It creates two 2d arrays, one to store a score for each position in the two sequences, and one to keep track of the path so the highest-scoring alignment can be recreated later. It works fine, but I know I am not doing things very concisely and clearly.
edit: This was for an assignment. I completed it, but want to clean up my code a bit. The details on implementing the algorithm can be found on the class website if any of you are interested.
sub create_matrix {
my $self = shift;
#empty array reference
my $matrix = $self->{score_matrix};
#empty array ref
my $path_matrix = $self->{path_matrix};
#$seq1 and $seq2 are strings set previously
my $num_of_rows = length($self->{seq1}) + 1;
my $num_of_columns = length($self->{seq2}) + 1;
#create the 2d array of scores
for (my $i = 0; $i < $num_of_rows; $i++) {
push(#$matrix, []);
push(#$path_matrix, []);
$$matrix[$i][0] = $i * $self->{gap_cost};
$$path_matrix[$i][0] = 1;
}
#fill out the first row
for (my $i = 0; $i < $num_of_columns; $i++) {
$$matrix[0][$i] = $i * $self->{gap_cost};
$$path_matrix[0][$i] = -1;
}
#flag to signal end of traceback
$$path_matrix[0][0] = 2;
#double for loop to fill out each row
for (my $row = 1; $row < $num_of_rows; $row++) {
for (my $column = 1; $column < $num_of_columns; $column++) {
my $seq1_gap = $$matrix[$row-1][$column] + $self->{gap_cost};
my $seq2_gap = $$matrix[$row][$column-1] + $self->{gap_cost};
my $match_mismatch = $$matrix[$row-1][$column-1] + $self->get_match_score(substr($self->{seq1}, $row-1, 1), substr($self->{seq2}, $column-1, 1));
$$matrix[$row][$column] = max($seq1_gap, $seq2_gap, $match_mismatch);
#set the path matrix
#if it was a gap in seq1, -1, if was a (mis)match 0 if was a gap in seq2 1
if ($$matrix[$row][$column] == $seq1_gap) {
$$path_matrix[$row][$column] = -1;
}
elsif ($$matrix[$row][$column] == $match_mismatch) {
$$path_matrix[$row][$column] = 0;
}
elsif ($$matrix[$row][$column] == $seq2_gap) {
$$path_matrix[$row][$column] = 1;
}
}
}
}
You're getting several suggestions regarding syntax, but I would also suggest a more modular approach, if for no other reason that code readability. It's much easier to come up to speed on code if you can perceive the big picture before worrying about low-level details.
Your primary method might look like this.
sub create_matrix {
my $self = shift;
$self->create_2d_array_of_scores;
$self->fill_out_first_row;
$self->fill_out_other_rows;
}
And you would also have several smaller methods like this:
n_of_rows
n_of_cols
create_2d_array_of_scores
fill_out_first_row
fill_out_other_rows
And you might take it even further by defining even smaller methods -- getters, setters, and so forth. At that point, your middle-level methods like create_2d_array_of_scores would not directly touch the underlying data structure at all.
sub matrix { shift->{score_matrix} }
sub gap_cost { shift->{gap_cost} }
sub set_matrix_value {
my ($self, $r, $c, $val) = #_;
$self->matrix->[$r][$c] = $val;
}
# Etc.
One simple change is to use for loops like this:
for my $i (0 .. $num_of_rows){
# Do stuff.
}
For more info, see the Perl documentation on foreach loops and the range operator.
I have some other comments as well, but here is the first observation:
my $num_of_rows = length($self->{seq1}) + 1;
my $num_of_columns = length($self->{seq2}) + 1;
So $self->{seq1} and $self->{seq2} are strings and you keep accessing individual elements using substr. I would prefer to store them as arrays of characters:
$self->{seq1} = [ split //, $seq1 ];
Here is how I would have written it:
sub create_matrix {
my $self = shift;
my $matrix = $self->{score_matrix};
my $path_matrix = $self->{path_matrix};
my $rows = #{ $self->{seq1} };
my $cols = #{ $self->{seq2} };
for my $row (0 .. $rows) {
$matrix->[$row]->[0] = $row * $self->{gap_cost};
$path_matrix->[$row]->[0] = 1;
}
my $gap_cost = $self->{gap_cost};
$matrix->[0] = [ map { $_ * $gap_cost } 0 .. $cols ];
$path_matrix->[0] = [ (-1) x ($cols + 1) ];
$path_matrix->[0]->[0] = 2;
for my $row (1 .. $rows) {
for my $col (1 .. $cols) {
my $gap1 = $matrix->[$row - 1]->[$col] + $gap_cost;
my $gap2 = $matrix->[$row]->[$col - 1] + $gap_cost;
my $match_mismatch =
$matrix->[$row - 1]->[$col - 1] +
$self->get_match_score(
$self->{seq1}->[$row - 1],
$self->{seq2}->[$col - 1]
);
my $max = $matrix->[$row]->[$col] =
max($gap1, $gap2, $match_mismatch);
$path_matrix->[$row]->[$col] = $max == $gap1
? -1
: $max == $gap2
? 1
: 0;
}
}
}
Instead of dereferencing your two-dimensional arrays like this:
$$path_matrix[0][0] = 2;
do this:
$path_matrix->[0][0] = 2;
Also, you're doing a lot of if/then/else statements to match against particular subsequences: this could be better written as given statements (perl5.10's equivalent of C's switch). Read about it at perldoc perlsyn:
given ($matrix->[$row][$column])
{
when ($seq1_gap) { $path_matrix->[$row][$column] = -1; }
when ($match_mismatch) { $path_matrix->[$row][$column] = 0; }
when ($seq2_gap) { $path_matrix->[$row][$column] = 1; }
}
The majority of your code is manipulating 2D arrays. I think the biggest improvement would be switching to using PDL if you want to do much stuff with arrays, particularly if efficiency is a concern. It's a Perl module which provides excellent array support. The underlying routines are implemented in C for efficiency so it's fast too.
I would always advise to look at CPAN for previous solutions or examples of how to do things in Perl. Have you looked at Algorithm::NeedlemanWunsch?
The documentation to this module includes an example for matching DNA sequences. Here is an example using the similarity matrix from wikipedia.
#!/usr/bin/perl -w
use strict;
use warnings;
use Inline::Files; #multiple virtual files inside code
use Algorithm::NeedlemanWunsch; # refer CPAN - good style guide
# Read DNA sequences
my #a = read_DNA_seq("DNA_SEQ_A");
my #b = read_DNA_seq("DNA_SEQ_B");
# Read Similarity Matrix (held as a Hash of Hashes)
my %SM = read_Sim_Matrix();
# Define scoring based on "Similarity Matrix" %SM
sub score_sub {
if ( !#_ ) {
return -3; # gap penalty same as wikipedia)
}
return $SM{ $_[0] }{ $_[1] }; # Similarity Value matrix
}
my $matcher = Algorithm::NeedlemanWunsch->new( \&score_sub, -3 );
my $score = $matcher->align( \#a, \#b, { align => \&check_align, } );
print "\nThe maximum score is $score\n";
sub check_align {
my ( $i, $j ) = #_; # #a[i], #b[j]
print "seqA pos: $i, seqB pos: $j\t base \'$a[$i]\'\n";
}
sub read_DNA_seq {
my $source = shift;
my #data;
while (<$source>) {
push #data, /[ACGT-]{1}/g;
}
return #data;
}
sub read_Sim_Matrix {
#Read DNA similarity matrix (scores per Wikipedia)
my ( #AoA, %HoH );
while (<SIMILARITY_MATRIX>) {
push #AoA, [/(\S+)+/g];
}
for ( my $row = 1 ; $row < 5 ; $row++ ) {
for ( my $col = 1 ; $col < 5 ; $col++ ) {
$HoH{ $AoA[0][$col] }{ $AoA[$row][0] } = $AoA[$row][$col];
}
}
return %HoH;
}
__DNA_SEQ_A__
A T G T A G T G T A T A G T
A C A T G C A
__DNA_SEQ_B__
A T G T A G T A C A T G C A
__SIMILARITY_MATRIX__
- A G C T
A 10 -1 -3 -4
G -1 7 -5 -3
C -3 -5 9 0
T -4 -3 0 8
And here is some sample output:
seqA pos: 7, seqB pos: 2 base 'G'
seqA pos: 6, seqB pos: 1 base 'T'
seqA pos: 4, seqB pos: 0 base 'A'
The maximum score is 100