I have a set of data file looks like below. I would like to get the interpolation final value (final,P) by referring to 2 set of number range (scoreA and scoreB). Let's say for "Eric", his scoreA is 35 (value between range 30.00 - 40.00) and scoreB is 48 (a value between range 45.00 - 50.00). He will get 2 set of final values range which are (22.88,40.90) & (26.99,38.99). And I would like to get the final value of "Eric" and "George" in the data file. "George"'s scoreA = 38 and scoreB = 26.
After formula calculation, I want to get the exact final value when his scoreA=35 & scoreB=45. Let's assume formula is P=X+Y (P is final value), so far I have been trying the code as shown below. However it cannot get the correct lines.
How to get the exactly final value range by referring to the data given?
data file
Student_name ("Eric")
/* This is a junk line */
scoreA ("10.00, 20.00, 30.00, 40.00")
scoreB ("15.00, 30.00, 45.00, 50.00, 55.00")
final (
"12.23,19.00,37.88,45.98,60.00",\
"07.00,20.11,24.56,45.66,57.88",\
"05.00,15.78,22.88,40.90,57.99",\
"10.00,16.87,26.99,38.99,40.66"\)
Student_name ("Liy")
/* This is a junk line */
scoreA ("5.00, 10.00, 20.00, 60.00")
scoreB ("25.00, 30.00, 40.00, 55.00, 60.00")
final (
"02.23,15.00,37.88,45.98,70.00",\
"10.00,28.11,34.56,45.66,57.88",\
"08.00,19.78,32.88,40.90,57.66",\
"10.00,27.87,39.99,59.99,78.66"\)
Student_name ("Frank")
/* This is a junk line */
scoreA ("2.00, 15.00, 25.00, 40.00")
scoreB ("15.00, 24.00, 38.00, 45.00, 80.00")
final (
"02.23,15.00,37.88,45.98,70.00",\
"10.00,28.11,34.56,45.66,57.88",\
"08.00,19.78,32.88,40.90,57.66",\
"10.00,27.87,39.99,59.99,78.66"\)
Student_name ("George")
/* This is a junk line */
scoreA ("10.00, 15.00, 20.00, 40.00")
scoreB ("25.00, 33.00, 46.00, 55.00, 60.00")
final (
"10.23,25.00,37.88,45.98,68.00",\
"09.00,28.11,34.56,45.66,60.88",\
"18.00,19.78,32.88,40.90,79.66",\
"17.00,27.87,40.99,59.99,66.66"\)
Coding
data();
sub data() {
my $cnt = 0;
while (my #array = <FILE>) {
foreach $line(#array) {
if ($line =~ /Student_name/) {
$a = $line;
if ($a =~ /Eric/ or $cnt > 0 ) {
$cnt++;
}
if ( $cnt > 1 and $cnt <= 3 ) {
print $a;
}
if ( $cnt > 2 and $cnt <= 4 ) {
print $a;
}
if ( $cnt == 5 ) {
$cnt = 0;
}
}
}
}
}
Result
Eric final=42.66
George final=24.30
In my comment I said that parsing is fairly easy. Here is how it could be done. As the question lacks a proper specification of the file format, I will assume the following:
The file consists of properties, which have values:
document ::= property*
property ::= word "(" value ("," value)* ")"
A value is a double-quoted string containing numbers seperated by commata, or a single word:
value ::= '"' ( word | number ("," number)* ) '"'
Spaces, backslashes, and comments are irrelevant.
Here is a possible implementation; I will not go into the details of explaining how to write a simple parser.
package Parser;
use strict; use warnings;
sub parse {
my ($data) = #_;
# perform tokenization
pos($data) = 0;
my $length = length $data;
my #tokens;
while(pos($data) < $length) {
next if $data =~ m{\G\s+}gc
or $data =~ m{\G\\}gc
or $data =~ m{\G/[*].*?[*]/}gc;
if ($data =~ m/\G([",()])/gc) {
push #tokens, [symbol => $1];
} elsif ($data =~ m/\G([0-9]+[.][0-9]+)/gc) {
push #tokens, [number => 0+$1];
} elsif ($data =~ m/\G(\w+)/gc) {
push #tokens, [word => $1];
} else {
die "unreckognized token at:\n", substr $data, pos($data), 10;
}
}
return parse_document(\#tokens);
}
sub token_error {
my ($token, $expected) = #_;
return "Wrong token [#$token] when expecting [#$expected]";
}
sub parse_document {
my ($tokens) = #_;
my #properties;
push #properties, parse_property($tokens) while #$tokens;
return #properties;
}
sub parse_property {
my ($tokens) = #_;
$tokens->[0][0] eq "word"
or die token_error $tokens->[0], ["word"];
my $name = (shift #$tokens)->[1];
$tokens->[0][0] eq "symbol" and $tokens->[0][1] eq '('
or die token_error $tokens->[0], [symbol => '('];
shift #$tokens;
my #vals;
VAL: {
push #vals, parse_value($tokens);
if ($tokens->[0][0] eq 'symbol' and $tokens->[0][1] eq ',') {
shift #$tokens;
redo VAL;
}
}
$tokens->[0][0] eq "symbol" and $tokens->[0][1] eq ')'
or die token_error $tokens->[0], [symbol => ')'];
shift #$tokens;
return [ $name => #vals ];
}
sub parse_value {
my ($tokens) = #_;
$tokens->[0][0] eq "symbol" and $tokens->[0][1] eq '"'
or die token_error $tokens->[0], [symbol => '"'];
shift #$tokens;
my $value;
if ($tokens->[0][0] eq "word") {
$value = (shift #$tokens)->[1];
} else {
my #nums;
NUM: {
$tokens->[0][0] eq 'number'
or die token_error $tokens->[0], ['number'];
push #nums, (shift #$tokens)->[1];
if ($tokens->[0][0] eq 'symbol' and $tokens->[0][1] eq ',') {
shift #$tokens;
redo NUM;
}
}
$value = \#nums;
}
$tokens->[0][0] eq "symbol" and $tokens->[0][1] eq '"'
or die token_error $tokens->[0], [symbol => '"'];
shift #$tokens;
return $value;
}
Now, we get the following data structure as output from Parser::parse:
(
["Student_name", "Eric"],
["scoreA", [10, 20, 30, 40]],
["scoreB", [15, 30, 45, 50, 55]],
[
"final",
[12.23, 19, 37.88, 45.98, 60],
[7, 20.11, 24.56, 45.66, 57.88],
[5, 15.78, 22.88, 40.9, 57.99],
[10, 16.87, 26.99, 38.99, 40.66],
],
["Student_name", "Liy"],
["scoreA", [5, 10, 20, 60]],
["scoreB", [25, 30, 40, 55, 60]],
[
"final",
[2.23, 15, 37.88, 45.98, 70],
[10, 28.11, 34.56, 45.66, 57.88],
[8, 19.78, 32.88, 40.9, 57.66],
[10, 27.87, 39.99, 59.99, 78.66],
],
...,
)
As a next step, we want to transform it into nested hashes, so that we have the structure
{
Eric => {
scoreA => [...],
scoreB => [...],
final => [[...], ...],
},
Liy => {...},
...,
}
So we simply run it through this small sub:
sub properties_to_hash {
my %hash;
while(my $name_prop = shift #_) {
$name_prop->[0] eq 'Student_name' or die "Expected Student_name property";
my $name = $name_prop->[1];
while( #_ and $_[0][0] ne 'Student_name') {
my ($prop, #vals) = #{ shift #_ };
if (#vals > 1) {
$hash{$name}{$prop} = \#vals;
} else {
$hash{$name}{$prop} = $vals[0];
}
}
}
return \%hash;
}
So we have the main code
my $data = properties_to_hash(Parser::parse( $file_contents ));
Now we can move on to Part 2 fo the problem: calculating your scores. That is, once you make clear what you need done.
Edit: Bilinear interpolation
Let f be the function that returns the value at a certain coordinate. If we have a value at those coordinates, we can return that. Else, we perform bilinear interpolation with the next known values.
The formula for bilinear interpolation[1] is:
f(x, y) = 1/( (x_2 - x_1) · (y_2 - y_1) ) · (
f(x_1, y_1) · (x_2 - x) · (y_2 - y)
+ f(x_2, y_1) · (x - x_1) · (y_2 - y)
+ f(x_1, y_2) · (x_2 - x) · (y - y_1)
+ f(x_2, y_2) · (x - x_1) · (y - y_1)
)
Now, scoreA denote the positions of the data points in the final table on the first axis, scoreA the positions on the second axis. We have to do the following:
assert that the requested values x, y are inside the bounds
fetch the next smaller and next larger positions
perform interpolation
.
sub f {
my ($data, $x, $y) = #_;
# do bounds check:
my ($x_min, $x_max, $y_min, $y_max) = (#{$data->{scoreA}}[0, -1], #{$data->{scoreB}}[0, -1]);
die "indices ($x, $y) out of range ([$x_min, $x_max], [$y_min, $y_max])"
unless $x_min <= $x && $x <= $x_max
&& $y_min <= $y && $y <= $y_max;
To fetch the boxing indices x_1, x_2, y_1, y_2 we need to iterate through all possible scores. We'll also remember the physical indices x_i1, x_i2, y_i1, y_i2 of the underlying arrays.
my ($x_i1, $x_i2, $y_i1, $y_i2);
for ([$data->{scoreA}, \$x_i1, \$x_i2], [$data->{scoreB}, \$y_i1, \$y_i2]) {
my ($scores, $a_i1, $a_i2) = #$_;
for my $i (0 .. $#$scores) {
if ($scores->[$i] <= $x) {
($$a_i1, $$a_i2) = $i == $#$scores ? ($i, $i+1) : ($i-1, $i);
last;
}
}
}
my ($x_1, $x_2) = #{$data->{scoreA}}[$x_i1, $x_i2];
my ($y_1, $y_2) = #{$data->{scoreB}}[$y_i1, $y_i2];
Now, interpolation according to above formula can be performed, but each access at a known index can be changed to an access via physical index, so f(x_1, y_2) would become
$final->[$x_i1][$y_i2]
Detailed Explanation of sub f
sub f { ... } declares a sub with name f, although that is probably a bad name. bilinear_interpolation might be a better name.
my ($data, $x, $y) = #_ states that our sub takes three arguments:
$data, a hash reference containing fields scoreA, scoreB and final, which are array references.
$x, the position along the scoreA-axis where interpolation is required.
$y, the position along the scoreB-axis where interpolation is required.
Next, we want to assert that the positions for $x and $y are valid aka inside bounds. The first value in $data->{scoreA} is the minimal value; the maximal value is in the last position (index -1). To get both at once, we use an array slice. A slice accesses multiple values at once and returns a list, like #array[1, 2]. Because we use complex data structures which use references, we have to dereference the array in $data->{scoreA}. This makes the slice look like #{$data->{scoreA}}[0, 1].
Now that we have the $x_min and $x_max values, we throw and error unless the requested value $x is inside the range defined by the min/max values. This is true when
$x_min <= $x && $x <= $x_max
Should either $x or $y be out of bounds, we throw an error which shows the actual bounds. So the code
die "indices ($x, $y) out of range ([$x_min, $x_max], [$y_min, $y_max])"
could, for example, throw an error like
indices (10, 500) out of range ([20, 30], [25, 57]) at script.pl line 42
Here we can see that the value for $x is too small, and $y is too large.
The next problem is to find neighbouring values. Assuming scoreA holds [1, 2, 3, 4, 5], and $x is 3.7, we want to select the values 3 and 4. But because we can pull some nifty tricks a bit later, we would rather remember the position of the neighbouring values, not the values themselves. So this would give 2 and 3 in above example (remember that arrows are zero-based).
We can do this by looping over all indices of our array. When we find a value that is ≤ $x, we remember the index. E.g. 3 is the first value that is ≤ $x, so we remember the index 2. For the next higher value, we have to be a bit carful: Obviously, we can just take the next index, so 2 + 1 = 3. But now assume that $x is 5. This passes the bounds check. The first value that is ≤ $x would be value 5, so we can remember position 4. However, there is no entry at position 5, so we could use the current index itself. Because this would lead to division by zero later on, we would be better off remembering positions 3 and 4 (values 4 and 5).
Expressed as code, that is
my ($x_i1, $x_i2);
my #scoreA = #{ $data->{scoreA} }; # shortcut to the scoreA entry
for my $i (0 .. $#scores) { # iterate over all indices: `$#arr` is the last idx of #arr
if ($scores[$i] <= $x) { # do this if the current value is ≤ $x
if ($i != $#scores) { # if this isn't the last index
($x_i1, $x_i2) = ($i, $i+1);
} else { # so this is the last index
($x_i1, $x_i2) = ($i-1, $i);
}
last; # break out of the loop
}
}
In my original code I choose a more complex solution to avoid copy-pasting the same code for finding the neighbours of $y.
Because we also need the values, we obtain them via a slice with the indices:
my ($x_1, $x_2) = #{$data->{scoreA}}[$x_i1, $x_i2];
Now we have all surrounding values $x1, $x_2, $y_1, $y_2 which define the rectangle in which we want to perform bilinear interpolation. The mathematical formula is easy to translate to Perl: just choose the correct operators (*, not · for multiplication), and the variables need dollar signs before them.
The formula I used is recursive: The definition of f refers to itself. This would imply an infinite loop, unless we do some thinking and break the recursion. f symbolizes the value at a certain position. In most cases, this means interpolating. However, if $x and $y are both equal to values in scoreA and scoreB respectively, we don't need bilinear interpolation, and can return the final entry directly.
This can be done by checking if both $x and $y are members of their arrays, and doing an early return. Or we can use the fact that $x_1, ..., $y_2 all are members of the arrays. Instead of recursing with values we know don't need interpolating, we just do an array access. This is what we have saved the indices $x_i1, ..., $y_i2 for. So wherever the original formula says f(x_1, y_1) or similar, we write the equivalent $data->{final}[$x_i1][$y_i2].
I want to create a subroutine that adds commas to elements and adds an "and" before the last element, e.g., so that "12345" becomes "1, 2, 3, 4, and 5". I know how to add the commas, but the problem is the result I get is "1, 2, 3, 4, and 5," and I don't know how to get rid of the last comma.
sub commas {
my #with_commas;
foreach (#_) {
push (#with_commas, ($_, ", ")); #up to here it's fine
}
splice #with_commas, -2, 1, ("and ", $_[-1]);
#with_commas;
}
As you can probably tell, I'm trying to delete the last element in the new array (#with_commas), since it has the comma appended, and add in the last element in the old array (#_, passed to the sub routine from the main program, with no added comma).
When I run this, the result is, e.g., "1, 2, 3, 4, and 5," -- with the comma at the end. Where is that comma coming from? Only #with_commas was supposed to get the commas.
Any help is appreciated.
sub format_list {
return "" if !#_;
my $last = pop(#_);
return $last if !#_;
return join(', ', #_) . " and " . $last;
}
print format_list(#list), "\n";
This also handles lists with only one element, unlike most of the other answers.
You could use join and modify the last element to include an and:
my #list = 1 .. 5;
$list[-1] = "and $list[-1]" if $#list;
print join ', ', #list;
There is a CPAN module for this, Lingua::Conjunction. I use it myself, and recommend it over rolling your own solution. The usage syntax is very simple:
conjunction(#list);
#!/usr/bin/perl
use warnings;
use strict;
sub commas {
return "" if #_ == 0;
return $_[0] if #_ == 1;
my $last = pop #_;
my $rest = join (", ", #_);
return $rest.", and ".$last;
}
my #a = (1,2,3,4,5);
print commas(#a), "\n";
Add the commas then add the "and ":
use v5.10;
my $string = join ', ', 1 .. 5;
substr
$string,
rindex( $string, ', ' ) + 2,
0,
'and '
;
say $string;
So, work that in as the case when you have more than two elements:
use v5.10;
my #array = 1..5;
my $string = do {
if( #array == 1 ) {
#array[0];
}
elsif( #array == 2 ) {
join ' and ', #array
}
elsif( #array > 2 ) {
my $string = join ', ', #array;
my $commas = $string =~ tr/,//;
substr
$string,
rindex( $string, ', ' ) + 2,
0,
'and '
;
$string;
}
};
say $string;
Just in the spirit of TIMTOWTDI (though, frankly, #perreal's answer is better as far as readability):
sub commas {
my $last_index = $#_;
my #with_commas = map { (($_==$last_index) ? "and " : "") . $_[$_] }
0 .. $last_index;
print join("," #with_commas)
}
This is somewhat similar to Alan's answer (more convoluted/complicated), but the benefit compared to that is that it would work if you need to add "and " to any OTHER element than the last one; Alan's only works when you know the exact offset (e.g. last element)
Small hint
for( 1 .. 10 ) {
print ;
$_ == 10 ? print '' : ($_ != 9 ? print ', ' : print ' and ');
}
I'm using Perl to run through a tree, and then calculate the leaf nodes of the tree using the internal nodes as operators. I want to be able to print this in a postfix manner, and I managed to this this fairly easily with the basic operands (simply call the left and right nodes respectively before calling the parent) but I am having trouble producing the desired output for an average function. I don't have any trouble printing the actual result of the calculation, but I want to be able to print the operators and operands in postfix notation.
For example, 1 + average(3, 4, 5) will be shown as 1 ; 3 4 5 average +.
Here is my code:
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Indent = 0;
$Data::Dumper::Terse = 1;
my $debug = 0;
# an arithmetic expression tree is a reference to a list, which can
# be of two kinds as follows:
# [ 'leaf', value ]
# [ 'internal', operation, leftarg, rightarg ]
# Evaluate($ex) takes an arithmetic expression tree and returns its
# evaluated value.
sub Evaluate {
my ($ex) = #_;
$debug and
print "evaluating: ", Dumper($ex), "\n";
# the kind of node is given in the first element of the array
my $node_type = $ex->[0];
# if a leaf, then the value is a number
if ( $node_type eq 'leaf' ) {
# base case
my $value = $ex->[1];
$debug and
print "returning leaf: $value\n";
return $value;
}
# if not a leaf, then is internal,
if ( $node_type ne 'internal' ) {
die "Eval: Strange node type '$node_type' when evaluating tree";
}
# should now have an operation and two arguments
my $operation = $ex->[1];
my $left_ex = $ex->[2];
my $right_ex = $ex->[3];
# evaluate the left and right arguments;
my $left_value = Evaluate($left_ex);
my $right_value = Evaluate($right_ex);
# if any arguments are undefined, our value is undefined.
return undef unless
defined($left_value) and defined($right_value);
my $result;
# or do it explicitly for the required operators ...
if ($operation eq 'average') {
$result = ($left_value + $right_value) / 2;
}
if ($operation eq '+') {
$result = $left_value + $right_value;
} elsif ($operation eq '-') {
$result = $left_value - $right_value;
} elsif ($operation eq '*') {
$result = $left_value * $right_value;
} elsif ($operation eq 'div') {
if ($right_value != 0 ) {
$result = int ($left_value / $right_value);
} else {
$result = undef;
}
} elsif ($operation eq 'mod') {
$result = $left_value % $right_value;
} elsif ($operation eq '/') {
if ( $right_value != 0 ) {
$result = $left_value / $right_value;
}
else {
$result = undef;
}
}
$debug and
print "returning '$operation' on $left_value and $right_value result: $result\n";
return $result;
}
# Display($ex, $style) takes an arithmetic expression tree and a style
# parameter ('infix' or 'postfix') and returns a string that represents
# printable form of the expression in the given style.
sub Display {
my ($ex, $style) = #_;
# the kind of node is given in the first element of the array
my $node_type = $ex->[0];
# if a leaf, then the value is a number
if ( $node_type eq 'leaf' ) {
# base case
my $value = $ex->[1];
return $value;
}
# if not a leaf, then is internal,
if ( $node_type ne 'internal' ) {
die "Display: Strange node type '$node_type' when evaluating tree";
}
# should now have an operation and two arguments
my $operation = $ex->[1];
my $left_ex = $ex->[2];
my $right_ex = $ex->[3];
# evaluate the left and right arguments;
my $left_value = Display($left_ex, $style);
my $right_value = Display($right_ex, $style);
my $result;
if ($operation ne 'average') {
$result = "($left_value $operation $right_value) \n $left_value $right_value $operation";
} else {
$result = "($left_value $operation $right_value) \n $left_value $right_value $operation";
}
return $result;
}
# module end;
1;
And here is a test:
use strict;
use warnings;
use Display;
use arith;
my $ex1 = [ 'leaf', 42];
my $ex2 = [ 'internal', '+', [ 'leaf', 42], [ 'leaf', 10 ] ];
my $ex3 = [ 'internal', 'average', $ex2, [ 'leaf', 1 ] ];
print "ex1 is ", Evaluate($ex1), "\n";
print "ex1: ", Display($ex1), "\n";
print "\n";
print "ex2 is ", Evaluate($ex2), "\n";
print "ex2: ", Display($ex2), "\n";
print "\n";
print "ex3 is ", Evaluate($ex3), "\n";
print "ex3: ", Display($ex3), "\n";
print "\n";
Display::Render(\$ex3);
In order to do this, I realize I will have to change the subroutine "Display", but I'm not sure how to get the output --> value value ; #to indicate values that aren't averaged# value value average operand etc.
Any ideas?
I am not 100% sure that I understand your problem, but here is a cleanup / improvement of your two functions:
my %ops = ( # dispatch table for operations
average => sub {my $acc; $acc += $_ for #_; $acc / #_},
'+' => sub {$_[0] + $_[1]},
'-' => sub {$_[0] - $_[1]},
'*' => sub {$_[0] * $_[1]},
'mod' => sub {$_[0] % $_[1]},
(map {$_ => sub {$_[1] ? $_[0] / $_[1] : undef}} qw (/ div)),
);
sub Evaluate {
my $ex = shift;
print "evaluating: ", Dumper($ex), "\n" if $debug;
my $node_type = $ex->[0];
if ( $node_type eq 'leaf' ) {
print "returning leaf: $$ex[1]\n" if $debug;
return $$ex[1];
}
elsif ( $node_type ne 'internal' ) {
die "Eval: Strange node type '$node_type' when evaluating tree";
}
my $operation = $ex->[1];
my #values = map {Evaluate($_)} #$ex[2 .. $#$ex];
defined or return for #values;
if (my $op = $ops{$operation}) {
return $op->(#values);
} else {
print "operation $operation not found\n";
return undef;
}
}
Here the large if/elsif block is replaced with a dispatch table. This allows you to separate the logic from the parser. I have also replaced the $left_value and $right_value variables with the #values array, allowing your code to scale to n-arity operations (like average).
The following Display function has also been updated to handle n-arity operations:
my %is_infix = map {$_ => 1} qw( * + / - );
sub Display {
my ($ex, $style) = #_;
my $node_type = $ex->[0];
# if a leaf, then the value is a number
if ( $node_type eq 'leaf' ) {
return $$ex[1];
}
# if not a leaf, then is internal,
if ( $node_type ne 'internal' ) {
die "Display: Strange node type '$node_type' when evaluating tree";
}
# should now have an operation and n arguments
my $operation = $ex->[1];
if ($style and $style eq 'infix') {
my #values = map {Display($_, $style)} #$ex[2 .. $#$ex];
if ($is_infix{$operation}) {
return "$values[0] $operation $values[1]"
} else {
local $" = ', '; # "
return "$operation( #values )"
}
} else { # postfix by default
my #out;
for (#$ex[2 .. $#$ex]) {
if (#out and $_->[0] eq 'internal') {
push #out, ';'
}
push #out, Display($_, $style)
}
return join ' ' => #out, $operation;
}
}
You can call Display as Display($tree) or Display($tree, 'postfix') for postfix notation. And Display($tree, 'infix') for the infix notation.
ex1 is 42
ex1: 42
ex1: 42
ex2 is 52
ex2: 42 10 +
ex2: 42 + 10
ex3 is 26.5
ex3: 42 10 + 1 average
ex3: average( 42 + 10, 1 )
Which I believe is what you are looking for.
Finally, using your first example 1 + average(3, 4, 5):
my $avg = ['internal', 'average', [leaf => 3], [leaf => 4], [leaf => 5] ];
my $ex4 = ['internal', '+', [leaf => 1], $avg ];
print "ex4 is ", Evaluate($ex4), "\n";
print "ex4: ", Display($ex4), "\n";
print "ex4: ", Display($ex4, 'infix'), "\n";
print "\n";
which prints:
ex4 is 5
ex4: 1 ; 3 4 5 average +
ex4: 1 + average( 3, 4, 5 )
Maybe try AlgebraicToRPN?