I have a set of data file looks like below. I would like to get the interpolation final value (final,P) by referring to 2 set of number range (scoreA and scoreB). Let's say for "Eric", his scoreA is 35 (value between range 30.00 - 40.00) and scoreB is 48 (a value between range 45.00 - 50.00). He will get 2 set of final values range which are (22.88,40.90) & (26.99,38.99). And I would like to get the final value of "Eric" and "George" in the data file. "George"'s scoreA = 38 and scoreB = 26.
After formula calculation, I want to get the exact final value when his scoreA=35 & scoreB=45. Let's assume formula is P=X+Y (P is final value), so far I have been trying the code as shown below. However it cannot get the correct lines.
How to get the exactly final value range by referring to the data given?
data file
Student_name ("Eric")
/* This is a junk line */
scoreA ("10.00, 20.00, 30.00, 40.00")
scoreB ("15.00, 30.00, 45.00, 50.00, 55.00")
final (
"12.23,19.00,37.88,45.98,60.00",\
"07.00,20.11,24.56,45.66,57.88",\
"05.00,15.78,22.88,40.90,57.99",\
"10.00,16.87,26.99,38.99,40.66"\)
Student_name ("Liy")
/* This is a junk line */
scoreA ("5.00, 10.00, 20.00, 60.00")
scoreB ("25.00, 30.00, 40.00, 55.00, 60.00")
final (
"02.23,15.00,37.88,45.98,70.00",\
"10.00,28.11,34.56,45.66,57.88",\
"08.00,19.78,32.88,40.90,57.66",\
"10.00,27.87,39.99,59.99,78.66"\)
Student_name ("Frank")
/* This is a junk line */
scoreA ("2.00, 15.00, 25.00, 40.00")
scoreB ("15.00, 24.00, 38.00, 45.00, 80.00")
final (
"02.23,15.00,37.88,45.98,70.00",\
"10.00,28.11,34.56,45.66,57.88",\
"08.00,19.78,32.88,40.90,57.66",\
"10.00,27.87,39.99,59.99,78.66"\)
Student_name ("George")
/* This is a junk line */
scoreA ("10.00, 15.00, 20.00, 40.00")
scoreB ("25.00, 33.00, 46.00, 55.00, 60.00")
final (
"10.23,25.00,37.88,45.98,68.00",\
"09.00,28.11,34.56,45.66,60.88",\
"18.00,19.78,32.88,40.90,79.66",\
"17.00,27.87,40.99,59.99,66.66"\)
Coding
data();
sub data() {
my $cnt = 0;
while (my #array = <FILE>) {
foreach $line(#array) {
if ($line =~ /Student_name/) {
$a = $line;
if ($a =~ /Eric/ or $cnt > 0 ) {
$cnt++;
}
if ( $cnt > 1 and $cnt <= 3 ) {
print $a;
}
if ( $cnt > 2 and $cnt <= 4 ) {
print $a;
}
if ( $cnt == 5 ) {
$cnt = 0;
}
}
}
}
}
Result
Eric final=42.66
George final=24.30
In my comment I said that parsing is fairly easy. Here is how it could be done. As the question lacks a proper specification of the file format, I will assume the following:
The file consists of properties, which have values:
document ::= property*
property ::= word "(" value ("," value)* ")"
A value is a double-quoted string containing numbers seperated by commata, or a single word:
value ::= '"' ( word | number ("," number)* ) '"'
Spaces, backslashes, and comments are irrelevant.
Here is a possible implementation; I will not go into the details of explaining how to write a simple parser.
package Parser;
use strict; use warnings;
sub parse {
my ($data) = #_;
# perform tokenization
pos($data) = 0;
my $length = length $data;
my #tokens;
while(pos($data) < $length) {
next if $data =~ m{\G\s+}gc
or $data =~ m{\G\\}gc
or $data =~ m{\G/[*].*?[*]/}gc;
if ($data =~ m/\G([",()])/gc) {
push #tokens, [symbol => $1];
} elsif ($data =~ m/\G([0-9]+[.][0-9]+)/gc) {
push #tokens, [number => 0+$1];
} elsif ($data =~ m/\G(\w+)/gc) {
push #tokens, [word => $1];
} else {
die "unreckognized token at:\n", substr $data, pos($data), 10;
}
}
return parse_document(\#tokens);
}
sub token_error {
my ($token, $expected) = #_;
return "Wrong token [#$token] when expecting [#$expected]";
}
sub parse_document {
my ($tokens) = #_;
my #properties;
push #properties, parse_property($tokens) while #$tokens;
return #properties;
}
sub parse_property {
my ($tokens) = #_;
$tokens->[0][0] eq "word"
or die token_error $tokens->[0], ["word"];
my $name = (shift #$tokens)->[1];
$tokens->[0][0] eq "symbol" and $tokens->[0][1] eq '('
or die token_error $tokens->[0], [symbol => '('];
shift #$tokens;
my #vals;
VAL: {
push #vals, parse_value($tokens);
if ($tokens->[0][0] eq 'symbol' and $tokens->[0][1] eq ',') {
shift #$tokens;
redo VAL;
}
}
$tokens->[0][0] eq "symbol" and $tokens->[0][1] eq ')'
or die token_error $tokens->[0], [symbol => ')'];
shift #$tokens;
return [ $name => #vals ];
}
sub parse_value {
my ($tokens) = #_;
$tokens->[0][0] eq "symbol" and $tokens->[0][1] eq '"'
or die token_error $tokens->[0], [symbol => '"'];
shift #$tokens;
my $value;
if ($tokens->[0][0] eq "word") {
$value = (shift #$tokens)->[1];
} else {
my #nums;
NUM: {
$tokens->[0][0] eq 'number'
or die token_error $tokens->[0], ['number'];
push #nums, (shift #$tokens)->[1];
if ($tokens->[0][0] eq 'symbol' and $tokens->[0][1] eq ',') {
shift #$tokens;
redo NUM;
}
}
$value = \#nums;
}
$tokens->[0][0] eq "symbol" and $tokens->[0][1] eq '"'
or die token_error $tokens->[0], [symbol => '"'];
shift #$tokens;
return $value;
}
Now, we get the following data structure as output from Parser::parse:
(
["Student_name", "Eric"],
["scoreA", [10, 20, 30, 40]],
["scoreB", [15, 30, 45, 50, 55]],
[
"final",
[12.23, 19, 37.88, 45.98, 60],
[7, 20.11, 24.56, 45.66, 57.88],
[5, 15.78, 22.88, 40.9, 57.99],
[10, 16.87, 26.99, 38.99, 40.66],
],
["Student_name", "Liy"],
["scoreA", [5, 10, 20, 60]],
["scoreB", [25, 30, 40, 55, 60]],
[
"final",
[2.23, 15, 37.88, 45.98, 70],
[10, 28.11, 34.56, 45.66, 57.88],
[8, 19.78, 32.88, 40.9, 57.66],
[10, 27.87, 39.99, 59.99, 78.66],
],
...,
)
As a next step, we want to transform it into nested hashes, so that we have the structure
{
Eric => {
scoreA => [...],
scoreB => [...],
final => [[...], ...],
},
Liy => {...},
...,
}
So we simply run it through this small sub:
sub properties_to_hash {
my %hash;
while(my $name_prop = shift #_) {
$name_prop->[0] eq 'Student_name' or die "Expected Student_name property";
my $name = $name_prop->[1];
while( #_ and $_[0][0] ne 'Student_name') {
my ($prop, #vals) = #{ shift #_ };
if (#vals > 1) {
$hash{$name}{$prop} = \#vals;
} else {
$hash{$name}{$prop} = $vals[0];
}
}
}
return \%hash;
}
So we have the main code
my $data = properties_to_hash(Parser::parse( $file_contents ));
Now we can move on to Part 2 fo the problem: calculating your scores. That is, once you make clear what you need done.
Edit: Bilinear interpolation
Let f be the function that returns the value at a certain coordinate. If we have a value at those coordinates, we can return that. Else, we perform bilinear interpolation with the next known values.
The formula for bilinear interpolation[1] is:
f(x, y) = 1/( (x_2 - x_1) · (y_2 - y_1) ) · (
f(x_1, y_1) · (x_2 - x) · (y_2 - y)
+ f(x_2, y_1) · (x - x_1) · (y_2 - y)
+ f(x_1, y_2) · (x_2 - x) · (y - y_1)
+ f(x_2, y_2) · (x - x_1) · (y - y_1)
)
Now, scoreA denote the positions of the data points in the final table on the first axis, scoreA the positions on the second axis. We have to do the following:
assert that the requested values x, y are inside the bounds
fetch the next smaller and next larger positions
perform interpolation
.
sub f {
my ($data, $x, $y) = #_;
# do bounds check:
my ($x_min, $x_max, $y_min, $y_max) = (#{$data->{scoreA}}[0, -1], #{$data->{scoreB}}[0, -1]);
die "indices ($x, $y) out of range ([$x_min, $x_max], [$y_min, $y_max])"
unless $x_min <= $x && $x <= $x_max
&& $y_min <= $y && $y <= $y_max;
To fetch the boxing indices x_1, x_2, y_1, y_2 we need to iterate through all possible scores. We'll also remember the physical indices x_i1, x_i2, y_i1, y_i2 of the underlying arrays.
my ($x_i1, $x_i2, $y_i1, $y_i2);
for ([$data->{scoreA}, \$x_i1, \$x_i2], [$data->{scoreB}, \$y_i1, \$y_i2]) {
my ($scores, $a_i1, $a_i2) = #$_;
for my $i (0 .. $#$scores) {
if ($scores->[$i] <= $x) {
($$a_i1, $$a_i2) = $i == $#$scores ? ($i, $i+1) : ($i-1, $i);
last;
}
}
}
my ($x_1, $x_2) = #{$data->{scoreA}}[$x_i1, $x_i2];
my ($y_1, $y_2) = #{$data->{scoreB}}[$y_i1, $y_i2];
Now, interpolation according to above formula can be performed, but each access at a known index can be changed to an access via physical index, so f(x_1, y_2) would become
$final->[$x_i1][$y_i2]
Detailed Explanation of sub f
sub f { ... } declares a sub with name f, although that is probably a bad name. bilinear_interpolation might be a better name.
my ($data, $x, $y) = #_ states that our sub takes three arguments:
$data, a hash reference containing fields scoreA, scoreB and final, which are array references.
$x, the position along the scoreA-axis where interpolation is required.
$y, the position along the scoreB-axis where interpolation is required.
Next, we want to assert that the positions for $x and $y are valid aka inside bounds. The first value in $data->{scoreA} is the minimal value; the maximal value is in the last position (index -1). To get both at once, we use an array slice. A slice accesses multiple values at once and returns a list, like #array[1, 2]. Because we use complex data structures which use references, we have to dereference the array in $data->{scoreA}. This makes the slice look like #{$data->{scoreA}}[0, 1].
Now that we have the $x_min and $x_max values, we throw and error unless the requested value $x is inside the range defined by the min/max values. This is true when
$x_min <= $x && $x <= $x_max
Should either $x or $y be out of bounds, we throw an error which shows the actual bounds. So the code
die "indices ($x, $y) out of range ([$x_min, $x_max], [$y_min, $y_max])"
could, for example, throw an error like
indices (10, 500) out of range ([20, 30], [25, 57]) at script.pl line 42
Here we can see that the value for $x is too small, and $y is too large.
The next problem is to find neighbouring values. Assuming scoreA holds [1, 2, 3, 4, 5], and $x is 3.7, we want to select the values 3 and 4. But because we can pull some nifty tricks a bit later, we would rather remember the position of the neighbouring values, not the values themselves. So this would give 2 and 3 in above example (remember that arrows are zero-based).
We can do this by looping over all indices of our array. When we find a value that is ≤ $x, we remember the index. E.g. 3 is the first value that is ≤ $x, so we remember the index 2. For the next higher value, we have to be a bit carful: Obviously, we can just take the next index, so 2 + 1 = 3. But now assume that $x is 5. This passes the bounds check. The first value that is ≤ $x would be value 5, so we can remember position 4. However, there is no entry at position 5, so we could use the current index itself. Because this would lead to division by zero later on, we would be better off remembering positions 3 and 4 (values 4 and 5).
Expressed as code, that is
my ($x_i1, $x_i2);
my #scoreA = #{ $data->{scoreA} }; # shortcut to the scoreA entry
for my $i (0 .. $#scores) { # iterate over all indices: `$#arr` is the last idx of #arr
if ($scores[$i] <= $x) { # do this if the current value is ≤ $x
if ($i != $#scores) { # if this isn't the last index
($x_i1, $x_i2) = ($i, $i+1);
} else { # so this is the last index
($x_i1, $x_i2) = ($i-1, $i);
}
last; # break out of the loop
}
}
In my original code I choose a more complex solution to avoid copy-pasting the same code for finding the neighbours of $y.
Because we also need the values, we obtain them via a slice with the indices:
my ($x_1, $x_2) = #{$data->{scoreA}}[$x_i1, $x_i2];
Now we have all surrounding values $x1, $x_2, $y_1, $y_2 which define the rectangle in which we want to perform bilinear interpolation. The mathematical formula is easy to translate to Perl: just choose the correct operators (*, not · for multiplication), and the variables need dollar signs before them.
The formula I used is recursive: The definition of f refers to itself. This would imply an infinite loop, unless we do some thinking and break the recursion. f symbolizes the value at a certain position. In most cases, this means interpolating. However, if $x and $y are both equal to values in scoreA and scoreB respectively, we don't need bilinear interpolation, and can return the final entry directly.
This can be done by checking if both $x and $y are members of their arrays, and doing an early return. Or we can use the fact that $x_1, ..., $y_2 all are members of the arrays. Instead of recursing with values we know don't need interpolating, we just do an array access. This is what we have saved the indices $x_i1, ..., $y_i2 for. So wherever the original formula says f(x_1, y_1) or similar, we write the equivalent $data->{final}[$x_i1][$y_i2].
Related
How to encode a number in Perl to 8 character Alphanumeric string starting with an alphabet and ending with a digit where the ending digit is check digit.
So how to generate check digit and I planned to start counter from 21767823360 so that my resultant string starts with A000000 but perl is not taking such a big number for calculation.
Please suggest a solution.
$AppID=alphanumero($appid,8,1);
sub alphanumero{
my ($n,$length,$type)=#_;
my #to_b36 = (0 .. 9, 'A' .. 'Z');
use integer; # so that /= 36 is easy
my $u=$n%10;$n=21767823360+($n-$u)/10;
my $t = "";do { $t = $to_b36[$n % 36] . $t, $n /= 36 } while $n;
return "$t$u";
}
Perl has little problems with big numbers, and if your numbers are really huge, just use bignum. This transparently enables infinite-precision arithmetics.
Your number 21767823360 needs about 35 bits. My perl is built with 64-bit integers (see perl -v to check your support), so your number isn't "too large" for me.
The algorithm to convert a number to base-n is simple:
# pseudocode
let "digits" be the array containing all the digits of our representation.
# the size of digits is the base of our new representation
# the digits are sorted in ascending order.
#digits[0] is zero.
var "n" is the number we want to represent.
var "size" is the number of digits of the new representation.
# floor(ln(n)/ln(digits.size))
var "representation" is the empty string.
while size >= 0:
representation ← representation.concat(digits[n / digits.length ^ size]).
n ← n.modulo(digits.length ^ size).
size ← size - 1.
return representation.
Example Perl:
#!/usr/bin/perl
use strict; use warnings;
use Carp;
sub base_n {
my ($number, $base, $max_digits, $pad) = #_;
defined $number or croak "Undefined number for base_n";
$number == int $number and $number >= 0
or croak "The number has to be a natural number for base_n";
defined $base or croak "Undefined base for base_n";
$base == int $base and $base > 0
or croak "The base has to be a positive integer for base_n";
my #digits = (0 .. 9, "A" .. "Z");
$base <= #digits or croak "base_n can only convert to base-" . #digits . " max.";
#digits = #digits[0 .. $base - 1];
my $size = $number ? int(log($number) / log($base)) : 0; # avoid log(0)
if (defined $max_digits) {
$size < $max_digits
or croak "The number $number is too large for $max_digits digits in base $base.";
$size = $max_digits - 1 if $pad;
}
my $representation = "";
while ($size >= 0) {
$representation .= $digits[$number / #digits**$size];
$number %= #digits**$size;
$size--;
}
if (wantarray) {
my $checksum = substr $representation, -1;
return $representation, $checksum;
} else {
return $representation;
}
}
A corresponding (but incomplete) unit test:
use Test::More;
my $n = 21767823360;
ok "A000000" eq base_n($n => 36), "simple";
ok "A000000" eq base_n($n => 36, 8), "valid constraint";
ok "0A000000" eq base_n($n => 36, 8, 1), "padding";
ok ! eval { base_n($n => 36, 6); 1 }, "invalid constraint";
ok "0" eq (base_n($n => 36))[1], "checksum (1)";
ok "A" eq (base_n($n+10 => 36))[1], "checksum (2)";
ok "0" eq base_n(0 => 36), "zero: simple";
ok "0"x8 eq base_n(0 => 36, 8, 1), "zero: padding";
ok ! eval { base_n($n => 0.7); 1 }, "invalid base";
ok ! eval { base_n(0.7 => 36); 1 }, "invalid number";
ok $n == base_n($n => 10), "round-trip safety";
ok $n eq base_n($n => 10, length $n, 1), "round-trip safety: padding";
done_testing;
I have two arrays:
#array1 = (A,B,C,D,E,F);
#array2 = (A,C,H,D,E,G);
The arrays could be of different size. I want to find how many mismatches are there between the arrays. The indexes should be the same. In this case there are three mismatch :b->c,c->h and F->G.(i.e , The 'C' in $array[2] should not be considered a match to 'C' in $array[1]) I would like to get the number of mismatches as well as the mismatch.
foreach my $a1 ( 0 .. $#array1) {
foreach my $a2( 0 .. $#array2)
if($array1[$a1] ne $array2[$a2]) {
}
}
}
my %array_one = map {$_, 1} #array1;
my #difference = grep {!$array_one {$_}} #array1;
print "#difference\n";
Ans: gives me H, G but not C.
with my little Perl knowledge I tried this, with no result. Could you suggest me how I should deal this? Your suggestions and pointers would be very helpful.
You shouldn't have nested loops. You only need to go through the indexes once.
use List::Util qw( max );
my #mismatches;
for my $i (0..max($#array1, $#array2)) {
push #mismatches, $i
if $i >= #array1
|| $i >= #array2
|| $array1[$i] ne $array2[$i];
}
}
say "There are " . (0+#mismatches) . " mismatches";
for my $i (#mismatches) {
...
}
Since you mentioned grep, this is how you'd replace the for with grep:
use List::Util qw( max );
my #mismatches =
grep { $_ >= #array1
|| $_ >= #array2
|| array1[$_] ne $array2[$_] }
0 .. max($#array1, $#array2);
say "There are " . (0+#mismatches) . " mismatches";
for my $i (#mismatches) {
...
}
Here's an example using each_arrayref from List::MoreUtils.
sub diff_array{
use List::MoreUtils qw'each_arrayref';
return unless #_ && defined wantarray;
my #out;
my $iter = each_arrayref(#_);
my $index = 0;
while( my #current = $iter->() ){
next if all_same(#current);
unshift #current, $index;
push #out, \#current;
}continue{ ++$index }
return #out;
}
This version should be faster if you are going to use this for determining the number of differences often. The output is exactly the same. It just doesn't have to work as hard when returning a number.
Read about wantarray for more information.
sub diff_array{
use List::MoreUtils qw'each_arrayref';
return unless #_ && defined wantarray;
my $iter = each_arrayref(#_);
if( wantarray ){
# return structure
my #out;
my $index = 0;
while( my #current = $iter->() ){
next if all_same(#current);
unshift #current, $index;
push #out, \#current;
}continue{ ++$index }
return #out;
}else{
# only return a count of differences
my $out = 0;
while( my #current = $iter->() ){
++$out unless all_same #current;
}
return $out;
}
}
diff_array uses the subroutine all_same to determine if all of the current list of elements are the same.
sub all_same{
my $head = shift;
return undef unless #_; # not enough arguments
for( #_ ){
return 0 if $_ ne $head; # at least one mismatch
}
return 1; # all are the same
}
To get just the number of differences:
print scalar diff_array \#array1, \#array2;
my $count = diff_array \#array1, \#array2;
To get a list of differences:
my #list = diff_array \#array1, \#array2;
To get both:
my $count = my #list = diff_array \#array1, \#array2;
The output for the input you provided:
(
[ 1, 'B', 'C' ],
[ 2, 'C', 'H' ],
[ 5, 'F', 'G' ]
)
Example usage
my #a1 = qw'A B C D E F';
my #a2 = qw'A C H D E G';
my $count = my #list = diff_array \#a1, \#a2;
print "There were $count differences\n\n";
for my $group (#list){
my $index = shift #$group;
print " At index $index\n";
print " $_\n" for #$group;
print "\n";
}
You're iterating over both arrays when you don't want to be doing so.
#array1 = ("A","B","C","D","E","F");
#array2 = ("A","C","H","D","E","G");
foreach my $index (0 .. $#array1) {
if ($array1[$index] ne $array2[$index]) {
print "Arrays differ at index $index: $array1[$index] and $array2[$index]\n";
}
}
Output:
Arrays differ at index 1: B and C
Arrays differ at index 2: C and H
Arrays differ at index 5: F and G
Well, first, you're going to want to go over each element of one of the arrays, and compare it to the same element of the other array. List::MoreUtils provides an easy way to do this:
use v5.14;
use List::MoreUtils qw(each_array);
my #a = qw(a b c d);
my #b = qw(1 2 3);
my $ea = each_array #a, #b;
while ( my ($a, $b) = $ea->() ) {
say "a = $a, b = $b, idx = ", $ea->('index');
}
You can extend that to find where there is a non-match by checking inside that while loop (note: this assumes your arrays don't have undefs at the end, or that if they do, undef is the same as having a shorter array):
my #mismatch;
my $ea = each_array #a, #b;
while ( my ($a, $b) = $ea->() ) {
if (defined $a != defined $b || $a ne $b) {
push #mismatch, $ea->('index');
}
}
and then:
say "Mismatched count = ", scalar(#mismatch), " items are: ", join(q{, }, #mismatch);
The following code builds a list of mismatched pairs, then prints them out.
#a1 = (A,B,C,D,E,F);
#a2 = (A,C,H,D,E,G);
#diff = map { [$a1[$_] => $a2[$_]] }
grep { $a1[$_] ne $a2[$_] }
(0..($#a1 < $#a2 ? $#a1 : $#a2));
print "$_->[0]->$_->[1]\n" for #diff
You have the right idea, but you only need a single loop, since you are looking at each index and comparing entries between the arrays:
foreach my $a1 ( 0 .. $#array1) {
if($array1[$a1] ne $array2[$a1]) {
print "$a1: $array1[$a1] <-> $array2[$a1]\n";
}
}
I want Perl (5.8.8) to find out what word has the most letters in common with the other words in an array - but only letters that are in the same place. (And preferably without using libs.)
Take this list of words as an example:
BAKER
SALER
BALER
CARER
RUFFR
Her BALER is the word that has the most letters in common with the others. It matches BAxER in BAKER, xALER in SALER, xAxER in CARER, and xxxxR in RUFFR.
I want Perl to find this word for me in an arbitrary list of words with the same length and case. Seems I've hit the wall here, so help is much appreciated!
What I've tried until now
Don't really have much of a script at the moment:
use strict;
use warnings;
my #wordlist = qw(BAKER SALER MALER BARER RUFFR);
foreach my $word (#wordlist) {
my #letters = split(//, $word);
# now trip trough each iteration and work magic...
}
Where the comment is, I've tried several kinds of code, heavy with for-loops and ++ varables. Thus far, none of my attempts have done what I need it to do.
So, to better explain: What I need is to test word for word against the list, for each letterposition, to find the word that has the most letters in common with the others in the list, at that letter's position.
One possible way could be to first check which word(s) has the most in common at letter-position 0, then test letter-position 1, and so on, until you find the word that in sum has the most letters in common with the other words in the list. Then I'd like to print the list like a matrix with scores for each letterposition plus a total score for each word, not unlike what DavidO suggest.
What you'd in effect end up with is a matrix for each words, with the score for each letter position, and the sum total score fore each word in the matrix.
Purpose of the Program
Hehe, I might as well say it: The program is for hacking terminals in the game Fallout 3. :D My thinking is that it's a great way to learn Perl while also having fun gaming.
Here's one of the Fallout 3 terminal hacking tutorials I've used for research: FALLOUT 3: Hacking FAQ v1.2, and I've already made a program to shorten the list of words, like this:
#!/usr/bin/perl
# See if one word has equal letters as the other, and how many of them are equal
use strict;
use warnings;
my $checkword = "APPRECIATION"; # the word to be checked
my $match = 4; # equal to the match you got from testing your checkword
my #checkletters = split(//, $checkword); #/
my #wordlist = qw(
PARTNERSHIPS
REPRIMANDING
CIVILIZATION
APPRECIATION
CONVERSATION
CIRCUMSTANCE
PURIFICATION
SECLUSIONIST
CONSTRUCTION
DISAPPEARING
TRANSMISSION
APPREHENSIVE
ENCOUNTERING
);
print "$checkword has $match letters in common with:\n";
foreach my $word (#wordlist) {
next if $word eq $checkword;
my #letters = split(//, $word);
my $length = #letters; # determine length of array (how many letters to check)
my $eq_letters = 0; # reset to 0 for every new word to be tested
for (my $i = 0; $i < $length; $i++) {
if ($letters[$i] eq $checkletters[$i]) {
$eq_letters++;
}
}
if ($eq_letters == $match) {
print "$word\n";
}
}
# Now to make a script on to find the best word to check in the first place...
This script will yield CONSTRUCTION and TRANSMISSION as its result, just as in the game FAQ. The trick to the original question, though (and the thing I didn't manage to find out on my own), is how to find the best word to try in the first place, i.e. APPRECIATION.
OK, I've now supplied my own solution based on your help, and consider this thread closed. Many, many thanks to all the contributers. You've helped tremendously, and on the way I've also learned a lot. :D
Here's one way. Having re-read your spec a couple of times I think it's what you're looking for.
It's worth mentioning that it's possible there will be more than one word with an equal top score. From your list there's only one winner, but it's possible that in longer lists, there will be several equally winning words. This solution deals with that. Also, as I understand it, you count letter matches only if they occur in the same column per word. If that's the case, here's a working solution:
use 5.012;
use strict;
use warnings;
use List::Util 'max';
my #words = qw/
BAKER
SALER
BALER
CARER
RUFFR
/;
my #scores;
foreach my $word ( #words ) {
my $score;
foreach my $comp_word ( #words ) {
next if $comp_word eq $word;
foreach my $pos ( 0 .. ( length $word ) - 1 ) {
$score++ if substr( $word, $pos, 1 ) eq substr( $comp_word, $pos, 1);
}
}
push #scores, $score;
}
my $max = max( #scores );
my ( #max_ixs ) = grep { $scores[$_] == $max } 0 .. $#scores;
say "Words with most matches:";
say for #words[#max_ixs];
This solution counts how many times per letter column each word's letters match other words. So for example:
Words: Scores: Because:
ABC 1, 2, 1 = 4 A matched once, B matched twice, C matched once.
ABD 1, 2, 1 = 4 A matched once, B matched twice, D matched once.
CBD 0, 2, 1 = 3 C never matched, B matched twice, D matched once.
BAC 0, 0, 1 = 1 B never matched, A never matched, C matched once.
That gives you the winners of ABC and ABD, each with a score of four positional matches. Ie, the cumulative times that column one, row one matched column one row two, three, and four, and so on for the subsequent columns.
It may be able to be optimized further, and re-worded to be shorter, but I tried to keep the logic fairly easy to read. Enjoy!
UPDATE / EDIT
I thought about it and realized that though my existing method does exactly what your original question requested, it did it in O(n^2) time, which is comparatively slow. But if we use hash keys for each column's letters (one letter per key), and do a count of how many times each letter appears in the column (as the value of the hash element), we could do our summations in O(1) time, and our traversal of the list in O(n*c) time (where c is the number of columns, and n is the number of words). There's some setup time too (creation of the hash). But we still have a big improvement. Here is a new version of each technique, as well as a benchmark comparison of each.
use strict;
use warnings;
use List::Util qw/ max sum /;
use Benchmark qw/ cmpthese /;
my #words = qw/
PARTNERSHIPS
REPRIMANDING
CIVILIZATION
APPRECIATION
CONVERSATION
CIRCUMSTANCE
PURIFICATION
SECLUSIONIST
CONSTRUCTION
DISAPPEARING
TRANSMISSION
APPREHENSIVE
ENCOUNTERING
/;
# Just a test run for each solution.
my( $top, $indexes_ref );
($top, $indexes_ref ) = find_top_matches_force( \#words );
print "Testing force method: $top matches.\n";
print "#words[#$indexes_ref]\n";
( $top, $indexes_ref ) = find_top_matches_hash( \#words );
print "Testing hash method: $top matches.\n";
print "#words[#$indexes_ref]\n";
my $count = 20000;
cmpthese( $count, {
'Hash' => sub{ find_top_matches_hash( \#words ); },
'Force' => sub{ find_top_matches_force( \#words ); },
} );
sub find_top_matches_hash {
my $words = shift;
my #scores;
my $columns;
my $max_col = max( map { length $_ } #{$words} ) - 1;
foreach my $col_idx ( 0 .. $max_col ) {
$columns->[$col_idx]{ substr $_, $col_idx, 1 }++
for #{$words};
}
foreach my $word ( #{$words} ) {
my $score = sum(
map{
$columns->[$_]{ substr $word, $_, 1 } - 1
} 0 .. $max_col
);
push #scores, $score;
}
my $max = max( #scores );
my ( #max_ixs ) = grep { $scores[$_] == $max } 0 .. $#scores;
return( $max, \#max_ixs );
}
sub find_top_matches_force {
my $words = shift;
my #scores;
foreach my $word ( #{$words} ) {
my $score;
foreach my $comp_word ( #{$words} ) {
next if $comp_word eq $word;
foreach my $pos ( 0 .. ( length $word ) - 1 ) {
$score++ if
substr( $word, $pos, 1 ) eq substr( $comp_word, $pos, 1);
}
}
push #scores, $score;
}
my $max = max( #scores );
my ( #max_ixs ) = grep { $scores[$_] == $max } 0 .. $#scores;
return( $max, \#max_ixs );
}
The output is:
Testing force method: 39 matches.
APPRECIATION
Testing hash method: 39 matches.
APPRECIATION
Rate Force Hash
Force 2358/s -- -74%
Hash 9132/s 287% --
I realize your original spec changed after you saw some of the other options provided, and that's sort of the nature of innovation to a degree, but the puzzle was still alive in my mind. As you can see, my hash method is 287% faster than the original method. More fun in less time!
As a starting point, you can efficiently check how many letters they have in common with:
$count = ($word1 ^ $word2) =~ y/\0//;
But that's only useful if you loop through all possible pairs of words, something that isn't necessary in this case:
use strict;
use warnings;
my #words = qw/
BAKER
SALER
BALER
CARER
RUFFR
/;
# you want a hash to indicate which letters are present how many times in each position:
my %count;
for my $word (#words) {
my #letters = split //, $word;
$count{$_}{ $letters[$_] }++ for 0..$#letters;
}
# then for any given word, you get the count for each of its letters minus one (because the word itself is included in the count), and see if it is a maximum (so far) for any position or for the total:
my %max_common_letters_count;
my %max_common_letters_words;
for my $word (#words) {
my #letters = split //, $word;
my $total;
for my $position (0..$#letters, 'total') {
my $count;
if ( $position eq 'total' ) {
$count = $total;
}
else {
$count = $count{$position}{ $letters[$position] } - 1;
$total += $count;
}
if ( ! $max_common_letters_count{$position} || $count >= $max_common_letters_count{$position} ) {
if ( $max_common_letters_count{$position} && $count == $max_common_letters_count{$position} ) {
push #{ $max_common_letters_words{$position} }, $word;
}
else {
$max_common_letters_count{$position} = $count;
$max_common_letters_words{$position} = [ $word ];
}
}
}
}
# then show the maximum words for each position and in total:
for my $position ( sort { $a <=> $b } grep $_ ne 'total', keys %max_common_letters_count ) {
printf( "Position %s had a maximum of common letters of %s in words: %s\n",
$position,
$max_common_letters_count{$position},
join(', ', #{ $max_common_letters_words{$position} })
);
}
printf( "The maximum total common letters was %s in words(s): %s\n",
$max_common_letters_count{'total'},
join(', ', #{ $max_common_letters_words{'total'} })
);
Here's a complete script. It uses the same idea that ysth mentioned (although I had it independently). Use bitwise xor to combine the strings, and then count the number of NULs in the result. As long as your strings are ASCII, that will tell you how many matching letters there were. (That comparison is case sensitive, and I'm not sure what would happen if the strings were UTF-8. Probably nothing good.)
use strict;
use warnings;
use 5.010;
use List::Util qw(max);
sub findMatches
{
my ($words) = #_;
# Compare each word to every other word:
my #matches = (0) x #$words;
for my $i (0 .. $#$words-1) {
for my $j ($i+1 .. $#$words) {
my $m = ($words->[$i] ^ $words->[$j]) =~ tr/\0//;
$matches[$i] += $m;
$matches[$j] += $m;
}
}
# Find how many matches in the best word:
my $max = max(#matches);
# Find the words with that many matches:
my #wanted = grep { $matches[$_] == $max } 0 .. $#matches;
wantarray ? #$words[#wanted] : $words->[$wanted[0]];
} # end findMatches
my #words = qw(
BAKER
SALER
BALER
CARER
RUFFR
);
say for findMatches(\#words);
Haven't touched perl in a while, so pseudo-code it is. This isn't the fastest algorithm, but it will work fine for a small amount of words.
totals = new map #e.g. an object to map :key => :value
for each word a
for each word b
next if a equals b
totals[a] = 0
for i from 1 to a.length
if a[i] == b[i]
totals[a] += 1
end
end
end
end
return totals.sort_by_key.last
Sorry about the lack of perl, but if you code this into perl, it should work like a charm.
A quick note on run-time: this will run in time number_of_words^2 * length_of_words, so on a list of 100 words, each of length 10 characters, this will run in 100,000 cycles, which is adequate for most applications.
Here's a version that relies on transposing the words in order to count the identical characters. I used the words from your original comparison, not the code.
This should work with any length words, and any length list. Output is:
Word score
---- -----
BALER 12
SALER 11
BAKER 11
CARER 10
RUFFR 4
The code:
use warnings;
use strict;
my #w = qw(BAKER SALER BALER CARER RUFFR);
my #tword = t_word(#w);
my #score;
push #score, str_count($_) for #tword;
#score = t_score(#score);
my %total;
for (0 .. $#w) {
$total{$w[$_]} = $score[$_];
}
print "Word\tscore\n";
print "----\t-----\n";
print "$_\t$total{$_}\n" for (sort { $total{$b} <=> $total{$a} } keys %total);
# transpose the words
sub t_word {
my #w = #_;
my #tword;
for my $word (#w) {
my $i = 0;
while ($word =~ s/(.)//) {
$tword[$i++] .= $1;
}
}
return #tword;
}
# turn each character into a count
sub str_count {
my $str = uc(shift);
while ( $str =~ /([A-Z])/ ) {
my $chr = $1;
my $num = () = $str =~ /$chr/g;
$num--;
$str =~ s/$chr/$num /g;
}
return $str;
}
# sum up the character counts
# while reversing the transpose
sub t_score {
my #count = #_;
my #score;
for my $num (#count) {
my $i = 0;
while( $num =~ s/(\d+) //) {
$score[$i++] += $1;
}
}
return #score;
}
Here is my attempt at an answer. This will also allow you to see each individual match if you need it. (ie. BALER matches 4 characters in BAKER). EDIT: It now catches all matches if there is a tie between words (I added "CAKER" to the list to test).
#! usr/bin/perl
use strict;
use warnings;
my #wordlist = qw( BAKER SALER BALER CARER RUFFR CAKER);
my %wordcomparison;
#foreach word, break it into letters, then compare it against all other words
#break all other words into letters and loop through the letters (both words have same amount), adding to the count of matched characters each time there's a match
foreach my $word (#wordlist) {
my #letters = split(//, $word);
foreach my $otherword (#wordlist) {
my $count;
next if $otherword eq $word;
my #otherwordletters = split (//, $otherword);
foreach my $i (0..$#letters) {
$count++ if ( $letters[$i] eq $otherwordletters[$i] );
}
$wordcomparison{"$word"}{"$otherword"} = $count;
}
}
# sort (unnecessary) and loop through the keys of the hash (words in your list)
# foreach key, loop through the other words it compares with
#Add a new key: total, and sum up all the matched characters.
foreach my $word (sort keys %wordcomparison) {
foreach ( sort keys %{ $wordcomparison{$word} }) {
$wordcomparison{$word}{total} += $wordcomparison{$word}{$_};
}
}
#Want $word with highest total
my #max_match = (sort { $wordcomparison{$b}{total} <=> $wordcomparison{$a}{total} } keys %wordcomparison );
#This is to get all if there is a tie:
my $maximum = $max_match[0];
foreach (#max_match) {
print "$_\n" if ($wordcomparison{$_}{total} >= $wordcomparison{$maximum}{total} )
}
The output is simply: CAKER BALER and BAKER.
The hash %wordcomparison looks like:
'SALER'
{
'RUFFR' => 1,
'BALER' => 4,
'BAKER' => 3,
'total' => 11,
'CARER' => 3
};
You can do this, using a dirty regex trick to execute code if a letter matches in its place, but not otherwise, thankfully it's quite easy to build the regexes as you go:
An example regular expression is:
(?:(C(?{ $c++ }))|.)(?:(A(?{ $c++ }))|.)(?:(R(?{ $c++ }))|.)(?:(E(?{ $c++ }))|.)(?:(R(?{ $c++ }))|.)
This may or may not be fast.
use 5.12.0;
use warnings;
use re 'eval';
my #words = qw(BAKER SALER BALER CARER RUFFR);
my ($best, $count) = ('', 0);
foreach my $word (#words) {
our $c = 0;
foreach my $candidate (#words) {
next if $word eq $candidate;
my $regex_str = join('', map {"(?:($_(?{ \$c++ }))|.)"} split '', $word);
my $regex = qr/^$regex_str$/;
$candidate =~ $regex or die "did not match!";
}
say "$word $c";
if ($c > $count) {
$best = $word;
$count = $c;
}
}
say "Matching: first best: $best";
Using xor trick will be fast but assumes a lot about the range of characters you might encounter. There are many ways in which utf-8 will break with that case.
Many thanks to all the contributers! You've certainly shown me that I still have a lot to learn, but you have also helped me tremendously in working out my own answer. I'm just putting it here for reference and possible feedback, since there are probably better ways of doing it. To me this was the simplest and most straight forward approach I could find on my own. Enjøy! :)
#!/usr/bin/perl
use strict;
use warnings;
# a list of words for testing
my #list = qw(
BAKER
SALER
BALER
CARER
RUFFR
);
# populate two dimensional array with the list,
# so we can compare each letter with the other letters on the same row more easily
my $list_length = #list;
my #words;
for (my $i = 0; $i < $list_length; $i++) {
my #letters = split(//, $list[$i]);
my $letters_length = #letters;
for (my $j = 0; $j < $letters_length; $j++) {
$words[$i][$j] = $letters[$j];
}
}
# this gives a two-dimensionla array:
#
# #words = ( ["B", "A", "K", "E", "R"],
# ["S", "A", "L", "E", "R"],
# ["B", "A", "L", "E", "R"],
# ["C", "A", "R", "E", "R"],
# ["R", "U", "F", "F", "R"],
# );
# now, on to find the word with most letters in common with the other on the same row
# add up the score for each letter in each word
my $word_length = #words;
my #letter_score;
for my $i (0 .. $#words) {
for my $j (0 .. $#{$words[$i]}) {
for (my $k = 0; $k < $word_length; $k++) {
if ($words[$i][$j] eq $words[$k][$j]) {
$letter_score[$i][$j] += 1;
}
}
# we only want to add in matches outside the one we're testing, therefore
$letter_score[$i][$j] -= 1;
}
}
# sum each score up
my #scores;
for my $i (0 .. $#letter_score ) {
for my $j (0 .. $#{$letter_score[$i]}) {
$scores[$i] += $letter_score[$i][$j];
}
}
# find the highest score
my $max = $scores[0];
foreach my $i (#scores[1 .. $#scores]) {
if ($i > $max) {
$max = $i;
}
}
# and print it all out :D
for my $i (0 .. $#letter_score ) {
print "$list[$i]: $scores[$i]";
if ($scores[$i] == $max) {
print " <- best";
}
print "\n";
}
When run, the script yields the following:
BAKER: 11
SALER: 11
BALER: 12 <- best
CARER: 10
RUFFR: 4
I'm not sure exactly how to explain this, so I'll just start with an example.
Given the following data:
Apple
Apricot
Blackberry
Blueberry
Cherry
Crabapple
Cranberry
Elderberry
Grapefruit
Grapes
Kiwi
Mulberry
Nectarine
Pawpaw
Peach
Pear
Plum
Raspberry
Rhubarb
Strawberry
I want to generate an index based on the first letter of my data, but I want the letters grouped together.
Here is the frequency of the first letters in the above dataset:
2 A
2 B
3 C
1 E
2 G
1 K
1 M
1 N
4 P
2 R
1 S
Since my example data set is small, let's just say that the maximum number to combine the letters together is 3. Using the data above, this is what my index would come out to be:
A B C D-G H-O P Q-Z
Clicking the "D-G" link would show:
Elderberry
Grapefruit
Grapes
In my range listing above, I am covering the full alphabet - I guess that is not completely neccessary - I would be fine with this output as well:
A B C E-G K-N P R-S
Obviously my dataset is not fruit, I will have more data (around 1000-2000 items), and my "maximum per range" will be more than 3.
I am not too worried about lopsided data either - so if I 40% of my data starts with an "S", then S will just have its own link - I don't need to break it down by the second letter in the data.
Since my dataset won't change too often, I would be fine with a static "maximum per range", but it would be nice to have that calculated dynamically too. Also, the dataset will not start with numbers - it is guaranteed to start with a letter from A-Z.
I've started building the algorithm for this, but it keeps getting so messy I start over. I don't know how to search google for this - I'm not sure what this method is called.
Here is what I started with:
#!/usr/bin/perl
use strict;
use warnings;
my $index_frequency = { map { ( $_, 0 ) } ( 'A' .. 'Z' ) };
my $ranges = {};
open( $DATASET, '<', 'mydata' ) || die "Cannot open data file: $!\n";
while ( my $item = <$DATASET> ) {
chomp($item);
my $first_letter = uc( substr( $item, 0, 1 ) );
$index_frequency->{$first_letter}++;
}
foreach my $letter ( sort keys %{$index_frequency} ) {
if ( $index_frequency->{$letter} ) {
# build $ranges here
}
}
My problem is that I keep using a bunch of global variables to keep track of counts and previous letters examined - my code gets very messy very fast.
Can someone give me a step in the right direction? I guess this is more of an algorithm question, so if you don't have a way to do this in Perl, pseudo code would work too, I guess - I can convert it to Perl.
Thanks in advance!
Basic approach:
#!/usr/bin/perl -w
use strict;
use autodie;
my $PAGE_SIZE = 3;
my %frequencies;
open my $fh, '<', 'data';
while ( my $l = <$fh> ) {
next unless $l =~ m{\A([a-z])}i;
$frequencies{ uc $1 }++;
}
close $fh;
my $current_sum = 0;
my #letters = ();
my #pages = ();
for my $letter ( "A" .. "Z" ) {
my $letter_weigth = ( $frequencies{ $letter } || 0 );
if ( $letter_weigth + $current_sum > $PAGE_SIZE ) {
if ( $current_sum ) {
my $title = $letters[ 0 ];
$title .= '-' . $letters[ -1 ] if 1 < scalar #letters;
push #pages, $title;
}
$current_sum = $letter_weigth;
#letters = ( $letter );
next;
}
push #letters, $letter;
$current_sum += $letter_weigth;
}
if ( $current_sum ) {
my $title = $letters[ 0 ];
$title .= '-' . $letters[ -1 ] if 1 < scalar #letters;
push #pages, $title;
}
print "Pages : " . join( " , ", #pages ) . "\n";
Problem with it is that it outputs (from your data):
Pages : A , B , C-D , E-J , K-O , P , Q-Z
But I would argue this is actually good approach :) And you can always change the for loop into:
for my $letter ( sort keys %frequencies ) {
if you need.
Here's my suggestion:
# get the number of instances of each letter
my %count = ();
while (<FILE>)
{
$count{ uc( substr( $_, 0, 1 ) ) }++;
}
# transform the list of counts into a map of count => letters
my %freq = ();
while (my ($letter, $count) = each %count)
{
push #{ $freq{ $count } }, $letter;
}
# now print out the list of letters for each count (or do other appropriate
# output)
foreach (sort keys %freq)
{
my #sorted_letters = sort #{ $freq{$_} };
print "$_: #sorted_letters\n";
}
Update: I think that I misunderstood your requirements. The following code block does something more like what you want.
my %count = ();
while (<FILE>)
{
$count{ uc( substr( $_, 0, 1 ) ) }++;
}
# get the maximum frequency
my $max_freq = (sort values %count)[-1];
my $curr_set_count = 0;
my #curr_set = ();
foreach ('A' .. 'Z') {
push #curr_set, $_;
$curr_set_count += $count{$_};
if ($curr_set_count >= $max_freq) {
# print out the range of the current set, then clear the set
if (#curr_set > 1)
print "$curr_set[0] - $curr_set[-1]\n";
else
print "$_\n";
#curr_set = ();
$curr_set_count = 0;
}
}
# print any trailing letters from the end of the alphabet
if (#curr_set > 1)
print "$curr_set[0] - $curr_set[-1]\n";
else
print "$_\n";
Try something like that, where frequency is the frequency array you computed at the previous step and threshold_low is the minimal number of entries in a range, and threshold_high is the max. number. This should give harmonious results.
count=0
threshold_low=3
threshold_high=6
inrange=false
frequency['Z'+1]=threshold_high+1
for letter in range('A' to 'Z'):
count += frequency[letter];
if (count>=threshold_low or count+frequency[letter+1]>threshold_high):
if (inrange): print rangeStart+'-'
print letter+' '
inrange=false
count=0
else:
if (not inrange) rangeStart=letter
inrange=true
use strict;
use warnings;
use List::Util qw(sum);
my #letters = ('A' .. 'Z');
my #raw_data = qw(
Apple Apricot Blackberry Blueberry Cherry Crabapple Cranberry
Elderberry Grapefruit Grapes Kiwi Mulberry Nectarine
Pawpaw Peach Pear Plum Raspberry Rhubarb Strawberry
);
# Store the data by starting letter.
my %data;
push #{$data{ substr $_, 0, 1 }}, $_ for #raw_data;
# Set max page size dynamically, based on the average
# letter-group size (in this case, a multiple of it).
my $MAX_SIZE = sum(map { scalar #$_ } values %data) / keys %data;
$MAX_SIZE = int(1.5 * $MAX_SIZE + .5);
# Organize the data into pages. Each page is an array reference,
# with the first element being the letter range.
my #pages = (['']);
for my $letter (#letters){
my #d = exists $data{$letter} ? #{$data{$letter}} : ();
if (#{$pages[-1]} - 1 < $MAX_SIZE or #d == 0){
push #{$pages[-1]}, #d;
$pages[-1][0] .= $letter;
}
else {
push #pages, [ $letter, #d ];
}
}
$_->[0] =~ s/^(.).*(.)$/$1-$2/ for #pages; # Convert letters to range.
This is an example of how I would write this program.
#! /opt/perl/bin/perl
use strict;
use warnings;
my %frequency;
{
use autodie;
open my $data_file, '<', 'datafile';
while( my $line = <$data_file> ){
my $first_letter = uc( substr( $line, 0, 1 ) );
$frequency{$first_letter} ++
}
# $data_file is automatically closed here
}
#use Util::Any qw'sum';
use List::Util qw'sum';
# This is just an example of how to calculate a threshold
my $mean = sum( values %frequency ) / scalar values %frequency;
my $threshold = $mean * 2;
my #index;
my #group;
for my $letter ( sort keys %frequency ){
my $frequency = $frequency{$letter};
if( $frequency >= $threshold ){
if( #group ){
if( #group == 1 ){
push #index, #group;
}else{
# push #index, [#group]; # copy #group
push #index, "$group[0]-$group[-1]";
}
#group = ();
}
push #index, $letter;
}elsif( sum( #frequency{#group,$letter} ) >= $threshold ){
if( #group == 1 ){
push #index, #group;
}else{
#push #index, [#group];
push #index, "$group[0]-$group[-1]"
}
#group = ($letter);
}else{
push #group, $letter;
}
}
#push #index, [#group] if #group;
push #index, "$group[0]-$group[-1]" if #group;
print join( ', ', #index ), "\n";
I recently "needed" a zip function in Perl 5 (while I was thinking about How do I calculate relative time?), i.e. a function that takes two lists and "zips" them together to one list, interleaving the elements.
(Pseudo)example:
#a=(1, 2, 3);
#b=('apple', 'orange', 'grape');
zip #a, #b; # (1, 'apple', 2, 'orange', 3, 'grape');
Haskell has zip in the Prelude and Perl 6 has a zip operator built in, but how do you do it in an elegant way in Perl 5?
Assuming you have exactly two lists and they are exactly the same length, here is a solution originally by merlyn (Randal Schwartz), who called it perversely perlish:
sub zip2 {
my $p = #_ / 2;
return #_[ map { $_, $_ + $p } 0 .. $p - 1 ];
}
What happens here is that for a 10-element list, first, we find the pivot point in the middle, in this case 5, and save it in $p. Then we make a list of indices up to that point, in this case 0 1 2 3 4. Next we use map to pair each index with another index that’s at the same distance from the pivot point as the first index is from the start, giving us (in this case) 0 5 1 6 2 7 3 8 4 9. Then we take a slice from #_ using that as the list of indices. This means that if 'a', 'b', 'c', 1, 2, 3 is passed to zip2, it will return that list rearranged into 'a', 1, 'b', 2, 'c', 3.
This can be written in a single expression along ysth’s lines like so:
sub zip2 { #_[map { $_, $_ + #_/2 } 0..(#_/2 - 1)] }
Whether you’d want to use either variation depends on whether you can see yourself remembering how they work, but for me, it was a mind expander.
The List::MoreUtils module has a zip/mesh function that should do the trick:
use List::MoreUtils qw(zip);
my #numbers = (1, 2, 3);
my #fruit = ('apple', 'orange', 'grape');
my #zipped = zip #numbers, #fruit;
Here is the source of the mesh function:
sub mesh (\#\#;\#\#\#\#\#\#\#\#\#\#\#\#\#\#\#\#\#\#\#\#\#\#\#\#\#\#\#\#\#\#) {
my $max = -1;
$max < $#$_ && ($max = $#$_) for #_;
map { my $ix = $_; map $_->[$ix], #_; } 0..$max;
}
I find the following solution straightforward and easy to read:
#a = (1, 2, 3);
#b = ('apple', 'orange', 'grape');
#zipped = map {($a[$_], $b[$_])} (0 .. $#a);
I believe it's also faster than solutions that create the array in a wrong order first and then use slice to reorder, or solutions that modify #a and #b.
For arrays of the same length:
my #zipped = ( #a, #b )[ map { $_, $_ + #a } ( 0 .. $#a ) ];
my #l1 = qw/1 2 3/;
my #l2 = qw/7 8 9/;
my #out;
push #out, shift #l1, shift #l2 while ( #l1 || #l2 );
If the lists are a different length, this will put 'undef' in the extra slots but you can easily remedy this if you don't wish to do this. Something like ( #l1[0] && shift #l1 ) would do it.
Hope this helps!
Algorithm::Loops is really nice if you do much of this kind of thing.
My own code:
sub zip { #_[map $_&1 ? $_>>1 : ($_>>1)+($#_>>1), 1..#_] }
This is totally not an elegant solution, nor is it the best solution by any stretch of the imagination. But it's fun!
package zip;
sub TIEARRAY {
my ($class, #self) = #_;
bless \#self, $class;
}
sub FETCH {
my ($self, $index) = #_;
$self->[$index % #$self][$index / #$self];
}
sub STORE {
my ($self, $index, $value) = #_;
$self->[$index % #$self][$index / #$self] = $value;
}
sub FETCHSIZE {
my ($self) = #_;
my $size = 0;
#$_ > $size and $size = #$_ for #$self;
$size * #$self;
}
sub CLEAR {
my ($self) = #_;
#$_ = () for #$self;
}
package main;
my #a = qw(a b c d e f g);
my #b = 1 .. 7;
tie my #c, zip => \#a, \#b;
print "#c\n"; # ==> a 1 b 2 c 3 d 4 e 5 f 6 g 7
How to handle STORESIZE/PUSH/POP/SHIFT/UNSHIFT/SPLICE is an exercise left to the reader.