Vector binning algorithm in Perl

I have the following vector:
19.01
20.2572347267
16.4893617021
19.0981432361
36.3636363636
20.41
It's actually much longer, but that doesn't matter. I need an algorithm to bin these values into a hash. The hash keys must be floating-point values that start at the minimum value + 1 (in this case 17.4893617021, shown below rounded to 17.49) and increase in steps of 1, so each key is the upper edge of a bin of width 1. The value for each key must be the number of elements that fall into that bin, i.e. the end result should be:
$hash{17.49}=1
$hash{18.49}=0
$hash{19.49}=2
$hash{20.49}=2
$hash{21.49}=0
$hash{22.49}=0
.
.
.
$hash{35.49}=0
$hash{36.49}=1
Please help guys.

This seems to work:
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
use List::Util qw{ min };
my @vector = qw( 19.01
20.2572347267
16.4893617021
19.0981432361
36.3636363636
20.41
);
my %hash;
my $min = min(@vector);
for my $n (@vector) {
my $diff = $n - $min;
# key = upper edge of the width-1 bin that $n falls into
++$hash{ 1 + $min + int $diff };
}
print Dumper \%hash;
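If you need the zeroes as well, add the following before the loop. It builds each key with the same 1 + $min + whole-number arithmetic as the loop, so the keys match exactly and the first key is $min + 1:
my $max = max(@vector);
for my $k (0 .. int($max - $min)) {
$hash{ 1 + $min + $k } = 0;
}
(And add max to the use List::Util line, too.)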
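Putting the binning loop and the zero-filling together, here is a minimal sketch of a reusable sub. The name bin_values is hypothetical, and formatting the keys with sprintf '%.2f' is my own assumption rather than part of the answer above: it sidesteps how a raw floating-point key happens to stringify and yields exactly the 17.49-style keys shown in the question.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper: count values in bins of width 1 whose upper edges are
# min + 1, min + 2, ...; every bin up to the maximum is present, zeros included.
sub bin_values {
    my @values = @_;
    my $min = min(@values);
    my $max = max(@values);
    my %bins;

    # Pre-create every bin with a count of 0, keyed by its (rounded) upper edge.
    $bins{ sprintf '%.2f', $min + 1 + $_ } = 0 for 0 .. int($max - $min);

    # int() is safe here because $_ - $min is never negative.
    $bins{ sprintf '%.2f', $min + 1 + int($_ - $min) }++ for @values;

    return %bins;
}

my %bins = bin_values(19.01, 20.2572347267, 16.4893617021,
                      19.0981432361, 36.3636363636, 20.41);
print "$_ => $bins{$_}\n" for sort { $a <=> $b } keys %bins;
```

Two-decimal keys only stay distinct while bins are at least 0.01 apart, which holds for width-1 bins.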

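Came up with a sweet solution; hopefully somebody else will also find it helpful.
use POSIX qw(floor);
use List::Util qw(min max);
my ($min_value, $max_value) = (min(@array), max(@array)); # @array holds the input values
my %bins;
sub frac { $_[0]-floor($_[0]) } # fractional part; saw this little function posted somewhere, kudos to the guy who came up with it
for (my $x = $min_value + 1; $x <= $max_value + 1; $x += 1) # if you don't need the zeroes, remove this loop
{
$bins{$x} = 0;
}
foreach my $n (@array)
{
# bin edges sit at $min_value + k, so shift by frac($min_value) before flooring
$bins{floor($n-frac($min_value))+1+frac($min_value)}++;
}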
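Use floor() (exported by POSIX) rather than int() if any of your values can be negative: int() truncates toward zero, so int(-1.5) is -1, while floor(-1.5) is -2. Neither function protects you from floating-point representation error, though. A whole number such as 278 is stored exactly, so int(278) is 278, but a computed value can land just below a whole number: perlfaq4's example int(0.6/0.2-2) prints 0, not 1, and floor() gives 0 there as well. (See "Why is int() broken?" in perlfaq4.)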
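A quick check of how the two functions behave in practice: they disagree for negative numbers, but both round down a value that lands just below a whole number. The 1e-9 tolerance in the last step is an arbitrary assumption for illustration; a suitable tolerance depends on your data.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Negative numbers: int() truncates toward zero, floor() rounds down.
my @neg = (int(-1.5), floor(-1.5));

# 0.6 / 0.2 - 2 should be 1, but with IEEE 754 doubles it comes out
# slightly below 1, so both int() and floor() return 0.
my $x  = 0.6 / 0.2 - 2;
my @fp = (int($x), floor($x));

# One common workaround: nudge by a tolerance before flooring
# (1e-9 is an assumed value, not a universal constant).
my $tolerant = floor($x + 1e-9);

printf "int(-1.5)=%d floor(-1.5)=%d\n", @neg;
printf "x=%.17g int=%d floor=%d tolerant=%d\n", $x, @fp, $tolerant;
```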

Related

In array how to check the maximum size of the list elements using perl

my @str = qw /(I) (II) (XIII) (IV)/;
my @max = map { my $len=0; if(length($_)>$len){ $len = length($_);} } @str;
print $max;
output is (XIII) and length is 5;
Could you please tell me what I am doing wrong in the above code?
To answer the question in the title, the cleaner option is to delegate this to List::Util::max:
use List::Util 'max';
print max map { length } @str;
The problem with the snippet in the question is that $len is local to each iteration of the map block, so it effectively gets reset to zero four times, once per element. The end result is that $len only ever reflects the length of the current item, ending with the last item in @str, which is why it doesn't do what you expect. You could do
my $len=0;
my @max = map { $len = length if $len < length } @str;
print $len;
but that's just lousy if you aren't meaningfully using @max for anything else.
I think you want to find the string length of the longest element in the array @str.
Your problem is that map is not doing what you think. It iterates over the elements of @str, evaluates the block for each of them, and returns the list of results; it does not keep a running maximum. The posted code stores that list in @max but then prints $max, a separate variable that is never assigned. And if you did assign the map to a scalar (my $max = map {...} @str;), map in scalar context returns the number of elements it generated, so you would get 4.
What you want is a for loop.
my @str = qw /(I) (II) (XIII) (IV)/;
my $max = 0;
for (@str) {
$max = length($_) if length($_) > $max;
}
print $max;
This will print 6, which is the correct length of (XIII).
(XIII)
123456
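To see those context rules at work, here is a small check. It uses map { length } instead of the question's block so that the returned list is easy to predict.

```perl
use strict;
use warnings;

my @str = qw/(I) (II) (XIII) (IV)/;

my @lengths = map { length } @str;   # list context: one length per element
my $count   = map { length } @str;   # scalar context: number of elements generated

print "@lengths\n";
print "$count\n";
```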
You could use List::Util::reduce instead of map in order to avoid creating an unnecessary list of lengths:
use strict;
use warnings;
use List::Util qw/reduce max/;
my @str = qw /(I) (II) (XIII) (IV)/;
my $max = reduce { max($a, length($b)) } 0, @str;
print $max;
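The question also asks for the element itself, (XIII), not only its length. The same reduce idea can keep the longest string instead of a number; a minimal sketch:

```perl
use strict;
use warnings;
use List::Util qw(reduce);

my @str = qw/(I) (II) (XIII) (IV)/;

# Keep whichever of the two candidates is longer; on a tie the earlier one wins.
my $longest = reduce { length($b) > length($a) ? $b : $a } @str;

print "$longest has length ", length($longest), "\n";
```

Because the comparison is strict (>), ties keep the element that appeared first.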

Using big numbers in Perl

I have a scenario where I take 2 very big binary strings (having 100 characters) and I need to add them.
The issue is that I am getting the answer in floating-point notation (e.g. 2.0022022220202e+099), whereas I need the precise answer as another 100-character string.
chomp($str1=<STDIN>);
chomp($str2=<STDIN>);
print "Str 1 is $str1\n";
print "Str 2 is $str2\n";
$t = $str1 + $str2;
print "Sum is $t\n";
Sample Input
1001101111101011011100101100100110111011111011000100111100111110111101011011011100111001100011111010
1001101111101011011100101100100110111011111011000100111100111110111101011011011100111001100011111010
Sample Output
Str1 is
1001101111101011011100101100100110111011111011000100111100111110111101011011011100111001100011111010
Str2 is
1001101111101011011100101100100110111011111011000100111100111110111101011011011100111001100011111010
Sum is
2.0022022220202e+099
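As already suggested, you can use the core module Math::BigInt:
use strict;
use warnings;
use Math::BigInt;
chomp(my $str1 = <STDIN>);
chomp(my $str2 = <STDIN>);
print "Str 1 is $str1\n";
print "Str 2 is $str2\n";
# the "0b" prefix makes Math::BigInt parse the strings as binary numbers
my $t = Math::BigInt->new("0b$str1") + Math::BigInt->new("0b$str2");
print $t->as_bin, "\n"; # as_bin() returns the sum with a leading "0b"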
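as_bin() returns its result with a leading 0b. If you need just the digits, as the question asks, a small wrapper can strip the prefix. A minimal sketch, where add_bin_strings is a hypothetical name:

```perl
use strict;
use warnings;
use Math::BigInt;

# Hypothetical helper: add two binary digit strings and return the sum as a
# plain binary string (as_bin() prefixes its result with "0b").
sub add_bin_strings {
    my ($x, $y) = @_;
    my $sum = Math::BigInt->new("0b$x") + Math::BigInt->new("0b$y");
    (my $bits = $sum->as_bin) =~ s/^0b//;
    return $bits;
}

print add_bin_strings('1011', '0110'), "\n";   # 11 + 6 = 17
```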
In order to perform arithmetic on your strings, Perl converts them to floating-point numbers, which hold only about 15 to 17 significant decimal digits, so a 100-digit value cannot be represented exactly. If you want to avoid that, use Math::BigInt as already suggested ... or roll your own.
######## WARNING/GUARANTEE: This is practically certain to be
# slower, buggier, less portable, and less secure than Math::BigInt.
# In fact, I planted a security hole just to prove a point. Enjoy.
use strict;
use warnings;
sub addition {
my ($int1, $int2) = @_;
# digits, least significant first
my @int1 = reverse split //, $int1;
my @int2 = reverse split //, $int2;
my $len = scalar(@int1 > @int2 ? @int1 : @int2);
my @result;
my $carry = 0;
for (my $i=0; $i < $len; ++$i)
{
$int1[$i] |= 0; # missing digits of the shorter number count as 0
$int2[$i] |= 0;
my $sum = $carry + $int1[$i] + $int2[$i];
# recompute the carry every time, so it drops back to 0 once it has been used
$carry = int($sum / 10);
$sum %= 10;
push @result, $sum;
}
push @result, $carry if $carry;
return join ('', reverse @result);
}
print addition('15', '5'), "\n"; # 20
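Note that addition() above works in base 10. Since the question's strings are binary, here is a sketch of the same roll-your-own idea in base 2. add_binary is a hypothetical name, the inputs are assumed to contain only 0s and 1s (no validation is done), and Math::BigInt remains the safer choice.

```perl
use strict;
use warnings;

# Hypothetical base-2 variant of addition(): adds two strings of 0s and 1s
# digit by digit from the right, carrying into the next position.
sub add_binary {
    my ($x, $y) = @_;
    my @x = reverse split //, $x;
    my @y = reverse split //, $y;
    my $len = @x > @y ? scalar @x : scalar @y;
    my $carry = 0;
    my @result;
    for my $i (0 .. $len - 1) {
        my $sum = $carry + ($x[$i] // 0) + ($y[$i] // 0);
        push @result, $sum % 2;      # digit for this position
        $carry = int($sum / 2);      # 0 or 1, carried to the next position
    }
    push @result, $carry if $carry;
    return join '', reverse @result;
}

print add_binary('1011', '110'), "\n";
```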

Builtin method of culling all values outside lower and upper, perl array

I've got an array in perl which contains sorted non-contiguous values. For example: 1 2 3 5 7 11 13 15.
I want to remove all values that are outside lower and upper, keeping lower and upper in the returned selection. My method of doing that looks like this (could probably be improved by using slice):
my @culledArray;
for ( my $i = 0; $i < scalar(@array); $i++ ) {
if ( ( $array[$i] <= $_[1] ) and ( $array[$i] >= $_[0] ) ) {
push(@culledArray, $array[$i]);
}
}
where the lower and upper are contained in $_[0] and $_[1], respectively. Is there a perl builtin that does this?
Don't know anything built-in that would do that (that is quite a specific requirement), but you can save yourself some typing by using grep:
my @culledArray = grep {( $_ <= $_[1] ) and ( $_ >= $_[0] )} @array;
If the list is long and you don't want to copy it, finding the start and end indices and using a slice might be interesting.
This is messy, but my unit tests pass, so it seems to work. Take the lower and upper indexes, based on the fact that @array is a sorted list and $_[0] <= $_[1], then create @culledArray from @array[$lowerIndex..$upperIndex]:
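my $index = 0;
# skip elements below the lower bound (stop at the end of the array)
++$index while $index <= $#array and $array[$index] < $_[0];
my $lowerIndex = $index;
# advance past every element that is still <= the upper bound
++$index while $index <= $#array and $array[$index] <= $_[1];
my $upperIndex = $index - 1; # last element <= upper
my @culledArray = @array[$lowerIndex .. $upperIndex];
return \@culledArray;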
I'd love to know the efficiency of this vs. the answer Mat gave. I'm almost sure that I don't necessarily traverse the entire @array, because I only traverse from index 0 until I find $upperIndex. I'm not sure how the grep method in the linked answer works, or how Perl implements the slicing of @array into @culledArray in the above code, though.
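Since @array is sorted, the two boundary indexes can also be found by binary search instead of a linear scan, so only O(log n) elements are compared before the slice is taken (the slice itself still copies the selected elements). A minimal sketch; the names lower_bound, upper_bound, and cull are hypothetical:

```perl
use strict;
use warnings;

# First index whose element is >= $target (scalar @$aref if there is none).
sub lower_bound {
    my ($aref, $target) = @_;
    my ($lo, $hi) = (0, scalar @$aref);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($aref->[$mid] < $target) { $lo = $mid + 1 } else { $hi = $mid }
    }
    return $lo;
}

# First index whose element is > $target (scalar @$aref if there is none).
sub upper_bound {
    my ($aref, $target) = @_;
    my ($lo, $hi) = (0, scalar @$aref);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($aref->[$mid] <= $target) { $lo = $mid + 1 } else { $hi = $mid }
    }
    return $lo;
}

# Elements of the sorted array that lie in [$lower, $upper], inclusive.
sub cull {
    my ($aref, $lower, $upper) = @_;
    my $first = lower_bound($aref, $lower);
    my $last  = upper_bound($aref, $upper) - 1;
    return @{$aref}[$first .. $last];    # empty when $first > $last
}

my @array = (1, 2, 3, 5, 7, 11, 13, 15);
print join(' ', cull(\@array, 3, 11)), "\n";
```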
It looks like you may be using percentiles or quantiles? If so then Statistics::Descriptive may help.
In list context, the percentile method returns both the value and the index at that percentile, so you can use code like this:
use strict;
use warnings;
use Statistics::Descriptive;
my @data = qw/ 1 2 3 5 7 11 13 15 /;
my $stat = Statistics::Descriptive::Full->new;
$stat->add_data(@data);
my ($d25, $i25) = $stat->percentile(25);
my ($d75, $i75) = $stat->percentile(75);
my @subset = ($stat->get_data)[$i25 .. $i75];
print "@subset\n";
output
2 3 5 7 11

How to sum two lists element-wise

I want to parse a file line by line, each line containing two integers, and sum each column into its own variable. My naive approach was like this:
my $i = 0;
my $j = 0;
foreach my $line (<INFILE>)
{
($i, $j) += ($line =~ /(\d+)\t(\d+)/);
}
But it yields the following warning:
Useless use of private variable in void context
hinting that resorting to the += operator triggers evaluation of the left-hand side in scalar instead of list context (please correct me if I'm wrong on this point).
Is it possible to achieve this elegantly (possibly in one line) without resorting to arrays or intermediate variables?
Related question: How can I sum arrays element-wise in Perl?
No, it's because the expression ($i, $j) += (something, 1) parses as adding 1 to $j only, leaving $i hanging in void context. Perl 5 has no hyper-operators or automatic zipping for the assignment operators such as +=. This works:
my ($i, $j) = (0, 0);
foreach my $line (<INFILE>) {
my ($this_i, $this_j) = split /\t/, $line;
$i += $this_i;
$j += $this_j;
}
You can avoid the repetition by using a compound data structure (such as an array) instead of named variables for the columns.
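A minimal sketch of that idea. It assumes whitespace-separated columns (split ' ' rather than the tab split above), and the hard-coded lines are a stand-in for <INFILE>:

```perl
use strict;
use warnings;

# Running total for each column; @sums grows to fit however many columns
# a line has, so there are no per-column variable names to repeat.
my @sums;
for my $line ("11\t1\n", "22\t2\n", "33\t3\n") {   # stand-in for <INFILE>
    my @fields = split ' ', $line;
    $sums[$_] += $fields[$_] for 0 .. $#fields;
}
print "@sums\n";
```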
First of all, your way of adding arrays pairwise does not work (the related question you posted yourself gives some hints there).
And for the parsing part: how about just splitting the lines, if they are formatted accordingly? (Whitespace around the numbers should not be a problem.)
split(/\t/, $line, 2)
If you really, really want to do it in one line, you could do something like this (though I don't think you would call it elegant):
my @a = (0, 0);
foreach my $line (<INFILE>)
{
@a = map { shift(@a)+$_ } split(/\t/, $line, 2);
}
For an input of @lines = ("11\t1\n", " 22 \t 2 \n", "33\t3"); it gave me @a = (66, 6)
I would advise you to use the split part of my answer, but not the adding-up part. There is nothing wrong with using more than one line! If it makes your intention clearer, more lines are better than one. But then again, I'm hardly using Perl nowadays (mostly Python instead), so my Perl coding style might have a "bad" influence there...
It is quite possible to swap the pair over for each addition, meaning you're always adding to the same element in each pair. (This generalises to rotating multi-element arrays if required.)
use strict;
use warnings;
my @pair = (0, 0);
while (<DATA>) {
@pair = ($pair[1], $pair[0] + $_) for /\d+/g;
}
print "@pair\n";
__DATA__
99 42
12 15
18 14
output
129 71
Here's another option:
use Modern::Perl;
my $i = my $j = 0;
map{$i += $_->[0]; $j += $_->[1]} [split] for <DATA>;
say "$i - $j";
__DATA__
1 2
3 4
5 6
7 8
Output:
16 - 20

Perl: Using Algorithm::Loops

I'm trying to construct a permutation program in Perl using the NestedLoops function. Here's my code:
use strict;
use warnings;
use Algorithm::Loops qw(NestedLoops);
my @a = 'a'..'o';
my $length = 5;
my $start = 0;
my $depth = 2;
NestedLoops([
[0..$length],
( sub {
$start = 0 if $start == $depth;
$start++;
[$start * $length..$start * $length + $length - 1]
}) x $depth,
], \&permute,);
sub permute {
my @ind = @_;
foreach my $i (@ind) {
print $a[$i];
}
print "\n";
}
So I've got an array that holds the letters 'a' to 'o' (size being 15). I'm treating the array as if it had 3 rows, so my imagination of the array is this:
abcde
fghij
klmno
Then each loop corresponds to each row... and I want to build permutations like:
afk
afl
afm
afn
afo
agk // fails here... I end up getting agg
...
It works for the first 5 values (the entire run of the lowest for loop), but then the second run fails because the last row's value of $start gets reset to 0... this is a problem because that breaks everything.
So what I want to know is, how can I keep the value of $start persistent based on the level... So what I'm asking for is essentially having constants. My loops really should look like this:
for my $a (0..4) { # $start is 0 at this level and never changes
for my $b (5..9) { # $start should be 5 at this level and never change
for my $c (10..14) { # $start should be 10 at this level and never change
permute($a, $b, $c);
}
}
}
Now, because I will have a variable length of for loops, I can't hard code each start value, so I'm looking for a way to initially create those start values, and then keep them for when the loop gets reset.
I realize this is a confusing question, so please ask questions, and I will help clarify.
You are making this harder than it has to be.
Part of the problem is that the documentation for NestedLoops doesn't go into much detail about how a subroutine reference in the first argument will be used.
For the following examples, assume this is written somewhere above them.
use strict;
use warnings;
use Algorithm::Loops qw'NestedLoops';
Really the simplest way to call NestedLoops to get what you want is like this:
NestedLoops(
[
['a'..'e'],
['f'..'j'],
['k'..'o'],
],
\&permute
);
sub permute {
print @_, "\n";
}
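If you would rather not depend on Algorithm::Loops at all, the same cross product can be generated with a short recursive sub. A minimal sketch; cross_rows is a hypothetical name, and unlike NestedLoops it returns all combinations as a list instead of calling a callback:

```perl
use strict;
use warnings;

# Hypothetical pure-Perl equivalent of the NestedLoops call above: walk the
# rows recursively, choosing one element per row, innermost row fastest.
sub cross_rows {
    my ($rows, @chosen) = @_;
    return join('', @chosen) if @chosen == @$rows;   # one full combination
    return map { cross_rows($rows, @chosen, $_) } @{ $rows->[scalar @chosen] };
}

my @combos = cross_rows([ ['a'..'e'], ['f'..'j'], ['k'..'o'] ]);
print "$_\n" for @combos[0 .. 5];
```

The sixth combination printed is agk, the point where the question's version went wrong.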
If you really want the arguments to NestedLoops to be generated on the fly, I would recommend using part from List::MoreUtils.
use List::MoreUtils qw'part';
my @a = 'a'..'o';
my $length = 5;
my $index;
NestedLoops(
[
part {
$index++ / $length
} @a
],
\&permute
);
sub permute {
print @_, "\n";
}
If for some reason you want to call NestedLoops with indexes into the array, it is still easy with part.
use List::MoreUtils qw'part';
my @a = 'a'..'o';
my $length = 5;
NestedLoops(
[
part {
$_ / $length
} 0..@a-1
],
\&permute
);
sub permute {
print map { $a[$_] } @_;
print "\n";
}
Really the main problem you're having is that the two subroutine references that you give to NestedLoops are modifying the same variables, and they are both called multiple times.
The best way to fix this is to rely on the last value given to the subroutine when it is called. ( From looking at the implementation, this seems to be closer to how it was meant to be used. )
my #a = 'a'..'o';
my $length = 5;
my $depth = 3;
NestedLoops(
[
[0..$length-1],
(sub{
return unless @_;
my $last = pop;
my $part = int( $last / $length ) + 1; # current partition
my $start = $part * $length; # start of this partition
my $end = $start + $length;
[$start..$end-1] # list of variables in this partition
}) x ($depth-1)
],
\&permute
);
sub permute {
print map { $a[$_] } @_;
print "\n";
}
When you use a subroutine to generate the range of a loop, it is called every time that one of the nested loops must start. That means once for each iteration of the containing loop. Before each call $_ is set to the current value of the containing loop's variable, and the values of all the containing loop variables are passed as parameters.
To clarify this, the NestedLoops statement you have coded is equivalent to
sub loop_over {
$start = 0 if $start == $depth;
$start++;
[$start * $length..$start * $length + $length - 1]
};
NestedLoops([
[0..$length],
(\&loop_over) x $depth,
], \&permute,);
which, in raw Perl, looks something like
for my $i (0 .. $length) {
$_ = $i;
my $list = loop_over($i);
for my $j (@$list) {
$_ = $j;
my $list = loop_over($i, $j);
for my $k (@$list) {
permute($i, $j, $k);
}
}
}
so perhaps it is clearer now that your calculation of $start is wrong? It is reevaluated several times for the innermost level before execution ascends to restart the containing loop.
Since the parameters passed to the subroutine consist of all the values of the containing loop variables, the size of @_ can be checked to see for which level of the loop to generate a range. For instance, in the code above, if @_ contains two values they are $i and $j, so the values for $k must be returned; alternatively, if there is only one parameter then it is the value of $i, and the returned value must be the range for $j. So the correct value for your $start is simply the number of elements in @_ and can be set using my $start = @_;.
Using this method the subroutine can return the range for the outermost loop as well. The code looks like this
use strict;
use warnings;
use Algorithm::Loops qw(NestedLoops);
my @a = 'a'..'o';
my $length = 5;
my $start = 0;
my $depth = 2;
NestedLoops([
(sub {
$start = @_;
[$start * $length .. $start * $length + $length - 1];
}) x ($depth + 1)
], \&permute,);
sub permute {
print map { $a[$_] } @_;
print "\n";
}
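To make the calling convention described above concrete without installing Algorithm::Loops, here is a toy recursive driver. It is a simplified stand-in, not the module's implementation: each range generator receives the values of the enclosing loops in @_ (the real NestedLoops also sets $_, which this sketch skips), and that is exactly what lets my $start = @_ pick the right row. The name toy_nested_loops is hypothetical.

```perl
use strict;
use warnings;

# Simplified stand-in for NestedLoops: each entry of @$loops is either an
# array ref or a code ref that returns one; code refs receive the values of
# the enclosing loops as their arguments.
sub toy_nested_loops {
    my ($loops, $callback, @outer) = @_;
    if (@outer == @$loops) {          # every level has a value: run the body
        $callback->(@outer);
        return;
    }
    my $range = $loops->[scalar @outer];
    $range = $range->(@outer) if ref $range eq 'CODE';
    toy_nested_loops($loops, $callback, @outer, $_) for @$range;
}

my @a      = 'a' .. 'o';
my $length = 5;
my @seen;
toy_nested_loops(
    [ (sub { my $start = @_; [ $start * $length .. $start * $length + $length - 1 ] }) x 3 ],
    sub { push @seen, join '', map { $a[$_] } @_ },
);
print "@seen[0 .. 5]\n";
```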