Perl: Dice roll "fix" to translate 2-8, 5-8, to 2d4, 1d4+3 - perl

I am trying to do some importing of data that uses different dice roll text. For example you'll have 2-4, 2-12, 5-8, 2-5. I am trying to write a translator that will convert them to proper dice rolls. What I have so far will translate it but I end up with some odd rolls. For example on the 5-8 it comes up with 5d1+3. With other things like 2-8 it will give me a reasonable 2d4. I'm trying to figure out how to make it ALWAYS use "real" dice.
Currently I am using this perl script. You simply pass it the dice string you want to fix and it returns what it thinks is the right roll. My problem is I can't think of how to limit the dice "sides" to only these 2,3,4,6,8,12,20 sides.
# turn 2-7 into 1d6+1 or 2-8 into 2d4
sub fix_DiceRolls {
my($diceRoll) = #_;
use POSIX;
if ($diceRoll =~ /^(\d+)(\-)(\d+)\b/i) {
# 2-5
#Group 1. 0-1 `2`
#Group 2. 1-2 `-`
#Group 3. 2-3 `5`
my($count) = $1;
my($size) = $3;
if ($count == 1) {
$diceRoll = "$1d$3";
} else {
my ($newSize) = $size/$count;
my ($remainder) = $size % $count;
my ($round_remainder) = ceil($remainder);
my ($round_newSize) = floor($newSize);
if ($remainder == 0) {
$diceRoll = $count."d".$newSize;
} else {
$diceRoll = $count."d".$round_newSize."+".$round_remainder;
}
}
}
return $diceRoll;
}

The following might help you. It doesn't know how to do 1d6/2, but it correctly translates 4-19 to 3d6+1 | 5d4-1.
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my #valid_dice = (4, 6, 8, 10, 12, 20);
sub fix_dice_roll {
my ($dice_roll) = #_;
my ($from, $to) = split /-/, $dice_roll;
my $range = $to - $from + 1;
my #translations;
for my $dice (#valid_dice) {
if (0 == ($range - 1) % ($dice - 1)) {
my $times = ($range - 1) / ($dice - 1);
my $plus = sprintf '%+d', $to - $times * $dice;
$plus = q() if '+0' eq $plus;
push #translations, [ $dice, $times, $plus ];
}
}
#translations = sort { $a->[1] <=> $b->[1] } #translations;
return map "$_->[1]d$_->[0]$_->[2]", #translations;
}
say join ' | ', fix_dice_roll($_) while <>;

Related

input a number and output the Fibonacci number recursively Perl

For a given value N I am trying to output the corresponding Fibonacci number F(N). My script doesn't seem to enter the recursive stage. fibonnaci($number) is not calling the subroutine. It is simply outputing "fibonacci(whatever number is inputted)".
Here is my code:
#!/usr/bin/perl -w
use warnings;
use strict;
print "Please enter value of N: ";
my $number = <STDIN>;
chomp($number);
sub fibonacci
{
my $f;
if ( $number == 0 ) { # base case
$f = 0;
} elsif ( $number == 1 ) {
$f = 1;
} else { # recursive step
$f = fibonacci( $number - 1 ) + fibonacci( $number - 2 );
}
return $f;
}
print "\nf($number) = fibonacci($number)\n";
Sample Output:
Please enter value of N: 4
f(4) = fibonacci(4)
user1:~>recursiveFib.pl
Please enter value of N: 5
f(5) = fibonacci(5)
user1:~>recursiveFib.pl
Please enter value of N: 10
f(10) = fibonacci(10)
user1:~>
Not sure where I went wrong. Any help would be greatly appreciated.
You need to accept the function arguments properly and take the function call out of the quotes.
use warnings;
use strict;
sub fibonacci {
my ($number) = #_;
if ($number < 2) { # base case
return $number;
}
return fibonacci($number-1) + fibonacci($number-2);
}
print "Please enter value of N: ";
my $number = <STDIN>;
chomp($number);
print "\n$number: ", fibonacci($number), "\n";
A more efficient but still recursive version:
sub fib_r {
my ($n,$a,$b) = #_;
if ($n <= 0) { return $a; }
else { return fib_r($n-1, $b, $a+$b); }
}
sub fib { fib_r($_[0], 0, 1); } # pass initial values of a and b
print fib(10), "\n";
Other answers have already mentioned the lack of taking an argument correctly to the fibonacci function, and that you can't interpolate a function call in the print string like that. Lately my favourite way to interpolate function calls into print strings is to use the ${\ ... } notation for embedding arbitrary expressions into strings:
print "f($number) = ${\ fibonacci($number) }\n";
Other techniques include separate arguments:
print "f($number) = ", fibonacci($number), "\n";
or a helper variable:
my $result = fibonacci($number);
print "f($number) = $result\n";
or even printf:
printf "f(%d) = %d\n", $number, fibonacci($number);
Of all these techniques I tend to prefer either of the first two, because they lead to putting the expressions "in-line" with the rest of the text string, whereas in the latter two they sit elsewhere, making it harder to see at a glance what gets printed where. Especially with printf's positional arguments, it can be easy to be "off-by-one" with a large number of arguments, and put everything in the wrong place.
You are printing in wrong way. you just need to handle the return value. Also the way you are using Number in the sub is also not seems relevant. I have updated the and its working fine.
Also the values that you wanted to print is depend on the start up of the series. whether you want to start from 0 or 1.
The series example start with 1 is 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, so if you put 10 you will get 55.
#!/usr/bin/perl -w
use warnings;
use strict;
print "Please enter value of N: ";
my $number = <STDIN>;
chomp($number);
my $result=fibonacci($number);
sub fibonacci
{
my $f =0;
if ($_[0] == 1 ) { # base case
$f = 1;
} elsif ( $_[0] == 2 ) {
$f = 1;
} else { # recursive step
$f= fibonacci( $_[0] - 1 ) + fibonacci( $_[0] - 2 );
}
return $f;
}
print "\nf($number) = $result\n";

Perl to count current value based on next value

Currently I'm learning Perl and gnuplot. I would like to know how to count certain value based on the next value. For example I have a text file consist of:
#ID(X) Y
1 1
3 9
5 11
The output should show the value of the unknown ID as well. So, the output should show:
#ID(X) Y
1 1
2 5
3 9
4 10
5 11
The Y of ID#2 is based on the following:
((2-3)/(1-3))*1 + ((2-1)/(3-1))*9 which is linear algebra
Y2=((X2-X3)/(X1-X3))*Y1 + ((X2-X1)/(X3-X1)) * Y3
Same goes to ID#5
Currently I have this code,
#! /usr/bin/perl -w
use strict;
my $prev_id = 0;
my $prev_val = 0;
my $next_id;
my $next_val;
while (<>)
{
my ($id, $val) = split;
for (my $i = $prev_id + 1; $i < $next_id; $i++)
{
$val = (($id - $next_id) / ($prev_id - $next_id)) * $prev_val + (($id - $prev_id) / ($next_id - $prev_id)) * $next_val;
printf ("%d %s\n", $i, $val);
}
printf ("%d %s\n", $id, $val);
($prev_val, $prev_id) = ($val, $id);
($next_val, $next_id) = ($prev_val, $prev_id);
}
Your formula seems more complicated than I would expect, given that you are always dealing with integer spacings of 1.
You did not say whether you want to fill gaps for multiple consecutive missing values, but let's assume you want to.
What you do is read in the first line, and say that's the current one and you output it. Now you read the next line, and if its ID is not the expected one, you fill the gaps with simple linear interpolation...
Pseudocode
(currID, currY) = readline()
outputvals( currID, currY )
while lines remain do
(nextID, nextY) = readline()
gap = nextID - currID
for i = 1 to gap
id = currID + i
y = currY + (nextY - currY) * i / gap
outputvals( id, y )
end
(currID, currY) = (nextID, nextY)
end
Sorry for the non-Perl code. It's just that I haven't been using Perl for ages, and can't remember half of the syntax. =) The concepts here are pretty easy to translate into code though.
Using an array may be the way to go. This will also make your data available for further manipulation.
** Caveat: will not work for multiple consecutive missing values of y; see #paddy's answer.
#!/usr/bin/perl
use strict;
use warnings;
my #coordinates;
while (<DATA>) {
my ($x, $y) = split;
$coordinates[$x] = $y;
}
# note that the for loop starts on index 1 here ...
for my $x (1 .. $#coordinates) {
if (! $coordinates[$x]) {
$coordinates[$x] = (($x - ($x + 1)) / (($x - 1) - ($x + 1)))
* $coordinates[$x - 1]
+ (($x - ($x - 1)) / (($x + 1) - ($x - 1)))
* $coordinates[$x + 1];
}
print "$x - $coordinates[$x]\n";
}
__DATA__
1 1
3 9
5 11
You indicated your problem is getting the next value. The key isn't to look ahead, it's to look behind.
my $prev = get first value;
my ($prev_a, $prev_b) = parse($prev);
my $this = get second value;
my ($this_a, $this_b) = parse($this);
while ($next = get next value) {
my ($next_a, $next_b) = parse($next);
...
$prev = $this; $prev_a = $this_a; $prev_b = $this_b;
$this = $next; $this_a = $next_a; $this_b = $next_b;
}
#! /usr/bin/perl -w
use strict;
my #in = (1,9,11);
my #out;
for (my $i = 0; $i<$#in; $i++) {
my $j = $i*2;
my $X1 = $i;
my $X2 = $i+1;
my $X3 = $i+2;
my $Y1 = $in[$i];
my $Y3 = $in[$i+1];
my $Y2 = $Y1*(($X2-$X3)/($X1-$X3))
+ $Y3*(($X2-$X1)/($X3-$X1));
$out[$j] = $in[$i];
$out[$j+1] = $Y2;
}
$out[$#in*2] = $in[$#in];
print (join " ",#out);

How do I change this to "idiomatic" Perl?

I am beginning to delve deeper into Perl, but am having trouble writing "Perl-ly" code instead of writing C in Perl. How can I change the following code to use more Perl idioms, and how should I go about learning the idioms?
Just an explanation of what it is doing: This routine is part of a module that aligns DNA or amino acid sequences(using Needelman-Wunch if you care about such things). It creates two 2d arrays, one to store a score for each position in the two sequences, and one to keep track of the path so the highest-scoring alignment can be recreated later. It works fine, but I know I am not doing things very concisely and clearly.
edit: This was for an assignment. I completed it, but want to clean up my code a bit. The details on implementing the algorithm can be found on the class website if any of you are interested.
sub create_matrix {
my $self = shift;
#empty array reference
my $matrix = $self->{score_matrix};
#empty array ref
my $path_matrix = $self->{path_matrix};
#$seq1 and $seq2 are strings set previously
my $num_of_rows = length($self->{seq1}) + 1;
my $num_of_columns = length($self->{seq2}) + 1;
#create the 2d array of scores
for (my $i = 0; $i < $num_of_rows; $i++) {
push(#$matrix, []);
push(#$path_matrix, []);
$$matrix[$i][0] = $i * $self->{gap_cost};
$$path_matrix[$i][0] = 1;
}
#fill out the first row
for (my $i = 0; $i < $num_of_columns; $i++) {
$$matrix[0][$i] = $i * $self->{gap_cost};
$$path_matrix[0][$i] = -1;
}
#flag to signal end of traceback
$$path_matrix[0][0] = 2;
#double for loop to fill out each row
for (my $row = 1; $row < $num_of_rows; $row++) {
for (my $column = 1; $column < $num_of_columns; $column++) {
my $seq1_gap = $$matrix[$row-1][$column] + $self->{gap_cost};
my $seq2_gap = $$matrix[$row][$column-1] + $self->{gap_cost};
my $match_mismatch = $$matrix[$row-1][$column-1] + $self->get_match_score(substr($self->{seq1}, $row-1, 1), substr($self->{seq2}, $column-1, 1));
$$matrix[$row][$column] = max($seq1_gap, $seq2_gap, $match_mismatch);
#set the path matrix
#if it was a gap in seq1, -1, if was a (mis)match 0 if was a gap in seq2 1
if ($$matrix[$row][$column] == $seq1_gap) {
$$path_matrix[$row][$column] = -1;
}
elsif ($$matrix[$row][$column] == $match_mismatch) {
$$path_matrix[$row][$column] = 0;
}
elsif ($$matrix[$row][$column] == $seq2_gap) {
$$path_matrix[$row][$column] = 1;
}
}
}
}
You're getting several suggestions regarding syntax, but I would also suggest a more modular approach, if for no other reason that code readability. It's much easier to come up to speed on code if you can perceive the big picture before worrying about low-level details.
Your primary method might look like this.
sub create_matrix {
my $self = shift;
$self->create_2d_array_of_scores;
$self->fill_out_first_row;
$self->fill_out_other_rows;
}
And you would also have several smaller methods like this:
n_of_rows
n_of_cols
create_2d_array_of_scores
fill_out_first_row
fill_out_other_rows
And you might take it even further by defining even smaller methods -- getters, setters, and so forth. At that point, your middle-level methods like create_2d_array_of_scores would not directly touch the underlying data structure at all.
sub matrix { shift->{score_matrix} }
sub gap_cost { shift->{gap_cost} }
sub set_matrix_value {
my ($self, $r, $c, $val) = #_;
$self->matrix->[$r][$c] = $val;
}
# Etc.
One simple change is to use for loops like this:
for my $i (0 .. $num_of_rows){
# Do stuff.
}
For more info, see the Perl documentation on foreach loops and the range operator.
I have some other comments as well, but here is the first observation:
my $num_of_rows = length($self->{seq1}) + 1;
my $num_of_columns = length($self->{seq2}) + 1;
So $self->{seq1} and $self->{seq2} are strings and you keep accessing individual elements using substr. I would prefer to store them as arrays of characters:
$self->{seq1} = [ split //, $seq1 ];
Here is how I would have written it:
sub create_matrix {
my $self = shift;
my $matrix = $self->{score_matrix};
my $path_matrix = $self->{path_matrix};
my $rows = #{ $self->{seq1} };
my $cols = #{ $self->{seq2} };
for my $row (0 .. $rows) {
$matrix->[$row]->[0] = $row * $self->{gap_cost};
$path_matrix->[$row]->[0] = 1;
}
my $gap_cost = $self->{gap_cost};
$matrix->[0] = [ map { $_ * $gap_cost } 0 .. $cols ];
$path_matrix->[0] = [ (-1) x ($cols + 1) ];
$path_matrix->[0]->[0] = 2;
for my $row (1 .. $rows) {
for my $col (1 .. $cols) {
my $gap1 = $matrix->[$row - 1]->[$col] + $gap_cost;
my $gap2 = $matrix->[$row]->[$col - 1] + $gap_cost;
my $match_mismatch =
$matrix->[$row - 1]->[$col - 1] +
$self->get_match_score(
$self->{seq1}->[$row - 1],
$self->{seq2}->[$col - 1]
);
my $max = $matrix->[$row]->[$col] =
max($gap1, $gap2, $match_mismatch);
$path_matrix->[$row]->[$col] = $max == $gap1
? -1
: $max == $gap2
? 1
: 0;
}
}
}
Instead of dereferencing your two-dimensional arrays like this:
$$path_matrix[0][0] = 2;
do this:
$path_matrix->[0][0] = 2;
Also, you're doing a lot of if/then/else statements to match against particular subsequences: this could be better written as given statements (perl5.10's equivalent of C's switch). Read about it at perldoc perlsyn:
given ($matrix->[$row][$column])
{
when ($seq1_gap) { $path_matrix->[$row][$column] = -1; }
when ($match_mismatch) { $path_matrix->[$row][$column] = 0; }
when ($seq2_gap) { $path_matrix->[$row][$column] = 1; }
}
The majority of your code is manipulating 2D arrays. I think the biggest improvement would be switching to using PDL if you want to do much stuff with arrays, particularly if efficiency is a concern. It's a Perl module which provides excellent array support. The underlying routines are implemented in C for efficiency so it's fast too.
I would always advise to look at CPAN for previous solutions or examples of how to do things in Perl. Have you looked at Algorithm::NeedlemanWunsch?
The documentation to this module includes an example for matching DNA sequences. Here is an example using the similarity matrix from wikipedia.
#!/usr/bin/perl -w
use strict;
use warnings;
use Inline::Files; #multiple virtual files inside code
use Algorithm::NeedlemanWunsch; # refer CPAN - good style guide
# Read DNA sequences
my #a = read_DNA_seq("DNA_SEQ_A");
my #b = read_DNA_seq("DNA_SEQ_B");
# Read Similarity Matrix (held as a Hash of Hashes)
my %SM = read_Sim_Matrix();
# Define scoring based on "Similarity Matrix" %SM
sub score_sub {
if ( !#_ ) {
return -3; # gap penalty same as wikipedia)
}
return $SM{ $_[0] }{ $_[1] }; # Similarity Value matrix
}
my $matcher = Algorithm::NeedlemanWunsch->new( \&score_sub, -3 );
my $score = $matcher->align( \#a, \#b, { align => \&check_align, } );
print "\nThe maximum score is $score\n";
sub check_align {
my ( $i, $j ) = #_; # #a[i], #b[j]
print "seqA pos: $i, seqB pos: $j\t base \'$a[$i]\'\n";
}
sub read_DNA_seq {
my $source = shift;
my #data;
while (<$source>) {
push #data, /[ACGT-]{1}/g;
}
return #data;
}
sub read_Sim_Matrix {
#Read DNA similarity matrix (scores per Wikipedia)
my ( #AoA, %HoH );
while (<SIMILARITY_MATRIX>) {
push #AoA, [/(\S+)+/g];
}
for ( my $row = 1 ; $row < 5 ; $row++ ) {
for ( my $col = 1 ; $col < 5 ; $col++ ) {
$HoH{ $AoA[0][$col] }{ $AoA[$row][0] } = $AoA[$row][$col];
}
}
return %HoH;
}
__DNA_SEQ_A__
A T G T A G T G T A T A G T
A C A T G C A
__DNA_SEQ_B__
A T G T A G T A C A T G C A
__SIMILARITY_MATRIX__
- A G C T
A 10 -1 -3 -4
G -1 7 -5 -3
C -3 -5 9 0
T -4 -3 0 8
And here is some sample output:
seqA pos: 7, seqB pos: 2 base 'G'
seqA pos: 6, seqB pos: 1 base 'T'
seqA pos: 4, seqB pos: 0 base 'A'
The maximum score is 100

find extra, missing, invalid strings when comparing two lists in perl

List-1 List-2
one one
two three
three three
four four
five six
six seven
eight eighttt
nine nine
Looking to output
one | one PASS
two | * FAIL MISSING
three | three PASS
* | three FAIL EXTRA
four | four PASS
five | * FAIL MISSING
six | six PASS
* | seven FAIL EXTRA
eight | eighttt FAIL INVALID
nine | nine PASS
Actually the return from my current solution is a reference to the two modified lists and a reference to a "fail" list describing the failure for the index as either "no fail", "missing", "extra", or "invalid" which is also (obviously) fine output.
My current solution is:
sub compare {
local $thisfound = shift;
local $thatfound = shift;
local #thisorig = #{ $thisfound };
local #thatorig = #{ $thatfound };
local $best = 9999;
foreach $n (1..6) {
local $diff = 0;
local #thisfound = #thisorig;
local #thatfound = #thatorig;
local #fail = ();
for (local $i=0;$i<scalar(#thisfound) || $i<scalar(#thatfound);$i++) {
if($thisfound[$i] eq $thatfound[$i]) {
$fail[$i] = 'NO_FAIL';
next;
}
if($n == 1) { # 1 2 3
next unless __compare_missing__();
next unless __compare_extra__();
next unless __compare_invalid__();
} elsif($n == 2) { # 1 3 2
next unless __compare_missing__();
next unless __compare_invalid__();
next unless __compare_extra__();
} elsif($n == 3) { # 2 1 3
next unless __compare_extra__();
next unless __compare_missing__();
next unless __compare_invalid__();
} elsif($n == 4) { # 2 3 1
next unless __compare_extra__();
next unless __compare_invalid__();
next unless __compare_missing__();
} elsif($n == 5) { # 3 1 2
next unless __compare_invalid__();
next unless __compare_missing__();
next unless __compare_extra__();
} elsif($n == 6) { # 3 2 1
next unless __compare_invalid__();
next unless __compare_extra__();
next unless __compare_missing__();
}
push #fail,'INVALID';
$diff += 1;
}
if ($diff<$best) {
$best = $diff;
#thisbest = #thisfound;
#thatbest = #thatfound;
#failbest = #fail;
}
}
return (\#thisbest,\#thatbest,\#failbest)
}
sub __compare_missing__ {
my $j;
### Does that command match a later this command? ###
### If so most likely a MISSING command ###
for($j=$i+1;$j<scalar(#thisfound);$j++) {
if($thisfound[$j] eq $thatfound[$i]) {
$diff += $j-$i;
for ($i..$j-1) { push(#fail,'MISSING'); }
#end = #thatfound[$i..$#thatfound];
#thatfound = #thatfound[0..$i-1];
for ($i..$j-1) { push(#thatfound,'*'); }
push(#thatfound,#end);
$i=$j-1;
last;
}
}
$j == scalar(#thisfound);
}
sub __compare_extra__ {
my $j;
### Does this command match a later that command? ###
### If so, most likely an EXTRA command ###
for($j=$i+1;$j<scalar(#thatfound);$j++) {
if($thatfound[$j] eq $thisfound[$i]) {
$diff += $j-$i;
for ($i..$j-1) { push(#fail,'EXTRA'); }
#end = #thisfound[$i..$#thisfound];
#thisfound = #thisfound[0..$i-1];
for ($i..$j-1) { push (#thisfound,'*'); }
push(#thisfound,#end);
$i=$j-1;
last;
}
}
$j == scalar(#thatfound);
}
sub __compare_invalid__ {
my $j;
### Do later commands match? ###
### If so most likely an INVALID command ###
for($j=$i+1;$j<scalar(#thisfound);$j++) {
if($thisfound[$j] eq $thatfound[$j]) {
$diff += $j-$i;
for ($i..$j-1) { push(#fail,'INVALID'); }
$i=$j-1;
last;
}
}
$j == scalar(#thisfound);
}
But this isn't perfect ... who wants to simplify and improve? Specifically ... within a single data set, one order of searching is better for a subset and another order is better for a different subset.
If the arrays contain duplicate values, the answer is quite a bit more complicated than that.
See e.g. Algorithm::Diff or read about Levenshtein distance.
From perlfaq4's answer to How can I tell whether a certain element is contained in a list or array?:
(portions of this answer contributed by Anno Siegel and brian d foy)
Hearing the word "in" is an indication that you probably should have used a hash, not a list or array, to store your data. Hashes are designed to answer this question quickly and efficiently. Arrays aren't.
That being said, there are several ways to approach this. In Perl 5.10 and later, you can use the smart match operator to check that an item is contained in an array or a hash:
use 5.010;
if( $item ~~ #array )
{
say "The array contains $item"
}
if( $item ~~ %hash )
{
say "The hash contains $item"
}
With earlier versions of Perl, you have to do a bit more work. If you are going to make this query many times over arbitrary string values, the fastest way is probably to invert the original array and maintain a hash whose keys are the first array's values:
#blues = qw/azure cerulean teal turquoise lapis-lazuli/;
%is_blue = ();
for (#blues) { $is_blue{$_} = 1 }
Now you can check whether $is_blue{$some_color}. It might have been a good idea to keep the blues all in a hash in the first place.
If the values are all small integers, you could use a simple indexed array. This kind of an array will take up less space:
#primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
#is_tiny_prime = ();
for (#primes) { $is_tiny_prime[$_] = 1 }
# or simply #istiny_prime[#primes] = (1) x #primes;
Now you check whether $is_tiny_prime[$some_number].
If the values in question are integers instead of strings, you can save quite a lot of space by using bit strings instead:
#articles = ( 1..10, 150..2000, 2017 );
undef $read;
for (#articles) { vec($read,$_,1) = 1 }
Now check whether vec($read,$n,1) is true for some $n.
These methods guarantee fast individual tests but require a re-organization of the original list or array. They only pay off if you have to test multiple values against the same array.
If you are testing only once, the standard module List::Util exports the function first for this purpose. It works by stopping once it finds the element. It's written in C for speed, and its Perl equivalent looks like this subroutine:
sub first (&#) {
my $code = shift;
foreach (#_) {
return $_ if &{$code}();
}
undef;
}
If speed is of little concern, the common idiom uses grep in scalar context (which returns the number of items that passed its condition) to traverse the entire list. This does have the benefit of telling you how many matches it found, though.
my $is_there = grep $_ eq $whatever, #array;
If you want to actually extract the matching elements, simply use grep in list context.
my #matches = grep $_ eq $whatever, #array;
sub compare {
local #d = ();
my $this = shift;
my $that = shift;
my $distance = _levenshteindistance($this, $that);
my #thisorig = #{ $this };
my #thatorig = #{ $that };
my $s = $#thisorig;
my $t = $#thatorig;
#this = ();
#that = ();
#fail = ();
while($s>0 || $t>0) {
# deletion, insertion, substitution
my $min = _minimum($d[$s-1][$t],$d[$s][$t-1],$d[$s-1][$t-1]);
if($min == $d[$s-1][$t-1]) {
unshift(#this,$thisorig[$s]);
unshift(#that,$thatorig[$t]);
if($d[$s][$t] > $d[$s-1][$t-1]) {
unshift(#fail,'INVALID');
} else {
unshift(#fail,'NO_FAIL');
}
$s -= 1;
$t -= 1;
} elsif($min == $d[$s][$t-1]) {
unshift(#this,'*');
unshift(#that,$thatorig[$t]);
unshift(#fail,'EXTRA');
$t -= 1;
} elsif($min == $d[$s-1][$t]) {
unshift(#this,$thisorig[$s]);
unshift(#that,'*');
unshift(#fail,'MISSING');
$s -= 1;
} else {
die("Error! $!");
}
}
return(\#this, \#that, \#fail);
}
sub _minimum {
my $ret = 2**53;
foreach $in (#_) {
$ret = $ret < $in ? $ret : $in;
}
$ret;
}
sub _levenshteindistance {
my $s = shift;
my $t = shift;
my #s = #{ $s };
my #t = #{ $t };
for(my $i=0;$i<scalar(#s);$i++) {
$d[$i] = ();
}
for(my $i=0;$i<scalar(#s);$i++) {
$d[$i][0] = $i # deletion
}
for(my $j=0;$j<scalar(#t);$j++) {
$d[0][$j] = $j # insertion
}
for(my $j=1;$j<scalar(#t);$j++) {
for(my $i=1;$i<scalar(#s);$i++) {
if ($s[$i] eq $t[$j]) {
$d[$i][$j] = $d[$i-1][$j-1];
} else {
# deletion, insertion, substitution
$d[$i][$j] = _minimum($d[$i-1][$j]+1,$d[$i][$j-1]+1,$d[$i-1][$j-1]+1);
}
}
}
foreach $a (#d) {
#a = #{ $a };
foreach $b (#a) {
printf STDERR "%2d ",$b;
}
print STDERR "\n";
}
return $d[$#s][$#t];
}
The trick in Perl (and similar languages) is the hash, which doesn't care about order.
Suppose that the first array is the one that hold the valid elements. Construct a hash with those values as keys:
my #valid = qw( one two ... );
my %valid = map { $_, 1 } #valid;
Now, to find the invalid elements, you just have to find the ones not in the %valid hash:
my #invalid = grep { ! exists $valid{$_} } #array;
If you want to know the array indices of the invalid elements:
my #invalid_indices = grep { ! exists $valid{$_} } 0 .. $#array;
Now, you can expand that to find the repeated elements too. Not only do you check the %valid hash, but also keep track of what you have already seen:
my %Seen;
my #invalid_indices = grep { ! exists $valid{$_} && ! $Seen{$_}++ } 0 .. $#array;
The repeated valid elements are the ones with a value in %Seen that is greater than 1:
my #repeated_valid = grep { $Seen{$_} > 1 } #valid;
To find the missing elements, you look in %Seen to check what isn't in there.
my #missing = grep { ! $Seen{$_ } } #valid;
From perlfaq4's answer to How do I compute the difference of two arrays? How do I compute the intersection of two arrays?:
Use a hash. Here's code to do both and more. It assumes that each element is unique in a given array:
#union = #intersection = #difference = ();
%count = ();
foreach $element (#array1, #array2) { $count{$element}++ }
foreach $element (keys %count) {
push #union, $element;
push #{ $count{$element} > 1 ? \#intersection : \#difference }, $element;
}
Note that this is the symmetric difference, that is, all elements in either A or in B but not in both. Think of it as an xor operation.

How can I convert a number to its multiple form in Perl?

Do you know an easy and straight-forward method/sub/module which allows me to convert a number (say 1234567.89) to an easily readable form - something like 1.23M?
Right now I can do this by making several comparisons, but I'm not happy with my method:
if($bytes > 1000000000){
$bytes = ( sprintf( "%0.2f", $bytes/1000000000 )). " Gb/s";
}
elsif ($bytes > 1000000){
$bytes = ( sprintf( "%0.2f", $bytes/1000000 )). " Mb/s";
}
elsif ($bytes > 1000){
$bytes = ( sprintf( "%0.2f", $bytes/1000 )). " Kb/s";
}
else{
$bytes = sprintf( "%0.2f", $bytes ). "b/s";
}
Thank you for your help!
The Number::Bytes::Human module should be able to help you out.
An example of how to use it can be found in its synopsis:
use Number::Bytes::Human qw(format_bytes);
$size = format_bytes(0); # '0'
$size = format_bytes(2*1024); # '2.0K'
$size = format_bytes(1_234_890, bs => 1000); # '1.3M'
$size = format_bytes(1E9, bs => 1000); # '1.0G'
# the OO way
$human = Number::Bytes::Human->new(bs => 1000, si => 1);
$size = $human->format(1E7); # '10MB'
$human->set_options(zero => '-');
$size = $human->format(0); # '-'
Number::Bytes::Human seems to do exactly what you want.
sub magnitudeformat {
my $val = shift;
my $expstr;
my $exp = log($val) / log(10);
if ($exp < 3) { return $val; }
elsif ($exp < 6) { $exp = 3; $expstr = "K"; }
elsif ($exp < 9) { $exp = 6; $expstr = "M"; }
elsif ($exp < 12) { $exp = 9; $expstr = "G"; } # Or "B".
else { $exp = 12; $expstr = "T"; }
return sprintf("%0.1f%s", $val/(10**$exp), $expstr);
}
In pure Perl form, I've done this with a nested ternary operator to cut on verbosity:
sub BytesToReadableString($) {
my $c = shift;
$c >= 1073741824 ? sprintf("%0.2fGB", $c/1073741824)
: $c >= 1048576 ? sprintf("%0.2fMB", $c/1048576)
: $c >= 1024 ? sprintf("%0.2fKB", $c/1024)
: $c . "bytes";
}
print BytesToReadableString(225939) . "/s\n";
Outputs:
220.64KB/s
This snippet is in PHP, and it's loosely based on some example someone else had on their website somewhere (sorry buddy, I can't remember).
The basic concept is instead of using if, use a loop.
function formatNumberThousands($a,$dig)
{
$unim = array("","k","m","g");
$c = 0;
while ($a>=1000 && $c<=3) {
$c++;
$a = $a/1000;
}
$d = $dig-ceil(log10($a));
return number_format($a,($c ? $d : 0))."".$unim[$c];
}
The number_format() call is a PHP library function which returns a string with commas between the thousands groups. I'm not sure if something like it exists in perl.
The $dig parameter sets a limit on the number of digits to show. If $dig is 2, it will give you 1.2k from 1237.
To format bytes, just divide by 1024 instead.
This function is in use in some production code to this day.