Creating a lazy hashed iterator - perl

While researching this question I looked at the Iterators chapter of the book Higher Order Perl. Some of the material there was a bit over my head, and I didn't think it necessarily addressed what I specifically want here.
What I mean by lazy hashed iterator is a way to create a structure that would emulate this behavior:
my %Ds = (
'1' => [ 1 .. 20 ],
'2' => [ 21 .. 40 ],
'3' => [ 41 .. 60 ],
'4' => [ 61 .. 80 ],
# ... and so on
);
Unfortunately, since this is a hash it would not be in order and thus useless in case of very large numbers.
The behavior is this:
I have a number.
I need to compare it with a sequence of ranges. If the number falls within a range
(>= its starting point and <= its end point), the code/sub returns another number
that is the "key" of that range.
The "keys" of the ranges are the numbers 1, 2, 3 and so on.
The code/sub must return a key for any positive integer, no matter how large it is.
By implementing all this lazily I mean emulating this behavior without recomputing the sequence of ranges and their respective "keys" on every call of the sub or iteration of a loop. Basically, compute once.
Yes, it's true that I could choose a maximum boundary, hardcode this in a loop and be done with it, but the problem is that I don't know how many of these steps I will need in the end.
Is there a way to do this with Perl constructs, or is there perhaps a CPAN module that offers this kind of behaviour that my simple search didn't uncover?
Here is a piece of code that illustrates what I mean:
sub get_nr {
my $nr = shift;
my %ds = map { $a = '1' if /1/ .. /20/;
$a = '2' if /21/ .. /40/;
$a = '3' if /41/ .. /60/;
$a = '4' if /61/ .. /80/;
$_ => $a } 1 .. 80;
while (my ($k, $v) = each %ds) {
if ( $k == $nr){
print "number is in range $v \n";
}
}
}
The output for:
get_nr(4);
get_nr(15);
get_nr(22);
get_nr(45);
Is:
number is in range 1
number is in range 1
number is in range 2
number is in range 3

Based on the discussion in the comments, the code you seem to want is a very simple subroutine
sub get_nr {
my $nr = shift;
my $range = int(($nr-1) / 20) + 1;
return $range;
}
You need to compensate for the edge cases: you wanted 20 to return 1, for example, so we subtract 1 from the number before dividing it.
If you want to customize further, you might use a variable for the range size instead of a hard-coded number.
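As a minimal sketch of that customization, the range size can be passed as an optional second argument. The parameter name $size and its default of 20 are assumptions made for illustration; they are not part of the original question.

```perl
use strict;
use warnings;

# Return the 1-based number of the range that $nr falls into.
# $size is a hypothetical optional parameter; it defaults to 20.
sub get_nr {
    my ($nr, $size) = @_;
    $size = 20 unless defined $size;
    return int( ($nr - 1) / $size ) + 1;
}

# Prints 1, 1, 2 and 3, one per line
print get_nr($_), "\n" for 4, 15, 22, 45;
```

Because the key is computed arithmetically, nothing has to be stored or precomputed, so this works for any positive integer.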

sub get_range_number {
my ($n) = @_;
return int(($n-1)/20) + 1;
}
print "$_ is in range ".get_range_number($_)."\n"
for 4, 15, 22, 45;

Related

How to count the odd number of occurrences in Perl?

I have a program in Perl that is supposed to count the number of times an element appears in an array, and prints out the value of the element if the number of times it appears is odd.
Here is my code.
#!/usr/bin/perl
use strict;
use warnings;
sub FindOddCount($)
{
my @arraynumber = @_;
my $Even = 0;
my $i = 0;
my $j = 0;
my $array_length = scalar(@_);
for ($i = 0; $i <= $array_length; $i++)
{
my $IntCount = 0;
for ($j = 0; $j <= $array_length; $j++)
{
if ($arraynumber[$i] == $arraynumber[$j])
{
$IntCount++;
print($j);
}
}
$Even = $IntCount % 2;
if ($Even != 0)
{
return $arraynumber[$i];
}
}
if ($Even == 0)
{
return "none";
}
}
my @array1 = (1,1,2,2,3,3,4,4,5,5,6,7,7,7,7);
my @array2 = (10,10,7,7,6,6,2,2,3,3,4,4,5,5,6,7,7,7,7,10,10);
my @array3 = (6,6,10,7,7,6,6,2,2,3,3,4,4,5,5,6,7,7,7,7,10.10);
my @array4 = (10,10,7,7,2,2,3,3,4,4,5,5,7,7,7,7,10,10,6);
my @array5 = (6,6);
my @array6 = (1);
my $return_value1 = FindOddCount(@array1);
my $return_value2 = FindOddCount(@array2);
my $return_value3 = FindOddCount(@array3);
my $return_value4 = FindOddCount(@array4);
my $return_value5 = FindOddCount(@array5);
my $return_value6 = FindOddCount(@array6);
print "The Odd value for the first array is $return_value1\n";
print "The Odd value for the 2nd array is $return_value2\n ";
print "The Odd value for the 3rd array is $return_value3\n ";
print "The Odd value for the 4th array is $return_value4\n ";
print "The Odd value for the 5th array is $return_value5\n ";
print "The Odd value for the sixth array is $return_value6\n ";
Here are my results.
The Odd value for the first array is 15
The Odd value for the first array is 21
The Odd value for the first array is 21
The Odd value for the first array is 19
The Odd value for the first array is 2
The Odd value for the first array is 1
In case it isn't clear: it is printing the count of all of the elements of the array instead of returning the element that occurs an odd number of times. In addition, I get this error.
Use of uninitialized value in numeric eq (==) at OddCount.pl line 17.
Line 17 is where the 1st array and the 2nd array are compared. Yet the values are clearly instantiated and they work when I print them out. What is the issue?
Build a frequency hash for an array then go through it to see which elements have odd counts
use warnings;
use strict;
use feature 'say';
my @ary = qw(7 o1 7 o2 o1 z z o1); # o1,o2 appear odd number of times
my %freq;
++$freq{$_} for @ary;
foreach my $key (sort keys %freq) {
say "$key => $freq{$key}" if $freq{$key} & 1;
}
This is far simpler than the code in the question, though that code is easily fixed, too. See below.
Some notes
++$freq{$_} increments the value for the key $_ in the hash %freq by 1; if the key doesn't exist yet, it is added to the hash (by autovivification) with its value set to one. So once the array has been iterated over with this code, the keys of %freq are the array elements and the values are the elements' counts
The test $n & 1 uses the bitwise AND: it is true if $n has its lowest bit set, that is, if $n is odd
That ++$freq{$_} for @ary; uses a Statement Modifier, running the statement for each element of @ary, with the current element aliased by the $_ variable
This prints
o1 => 3
o2 => 1
The odd-frequency elements (if any) are printed sorted alphabetically, simply to get a stable order. Please change to any particular order that may be needed, or let me know.
Comments on the code in the question, which is correct with two simple fixes.
It uses prototypes in a wrong way for the purpose, in sub FindOddCount($). I suspect that this isn't needed so let's not dwell on it -- just drop that and make it sub FindOddCount
The index in loops includes the length of the array (<=) so in the last iteration they attempt to index into the array past its last element. Off-by-one error. That can be fixed by changing the condition into < $array_length (instead of <=), but read on
There is no reason to use C-style loops, not even to iterate over the index. (Needed here since the position in the array is used.) Scripting languages provide for cleaner ways†
foreach my $i1 (0 .. $#arraynumber) {
my $IntCount = 0;
foreach my $i2 (0 .. $#arraynumber) {
if ( $arraynumber[$i1] == $arraynumber[$i2] ) {
...
That 0..N is the range operator, which creates the list of numbers within that range. The syntax $#array_name is the index of the last element in the array @array_name. Exactly what's needed. So there is no need for the array length
Multiple (six) arrays, used to check the code, can be manipulated in far better and easier ways by using references; see the tutorial for complex data structures perldsc, and in particular the page perllol, for array-of-arrays
In short: when you remove the prototype and fix the off-by-one error, your code seems to be correct.
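Putting those fixes together, the subroutine from the question might look like the sketch below. It keeps the question's names and its nested-loop approach; dropping the debugging print($j) and using a single final return "none" are assumptions about what was intended.

```perl
use strict;
use warnings;

# Return the first element that occurs an odd number of times, or "none".
sub FindOddCount {
    my @arraynumber = @_;
    foreach my $i (0 .. $#arraynumber) {
        my $IntCount = 0;
        foreach my $j (0 .. $#arraynumber) {
            $IntCount++ if $arraynumber[$i] == $arraynumber[$j];
        }
        return $arraynumber[$i] if $IntCount % 2;
    }
    return "none";
}

print FindOddCount(1,1,2,2,3,3,4,4,5,5,6,7,7,7,7), "\n";    # prints 6
print FindOddCount(6,6), "\n";                              # prints none
```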
† And not only scripting ones -- for example, C++11 introduced the range-based for loop
for (auto var: container) ... // really const auto&, or auto&, or auto&&
and the link (a standard reference) says
Used as a more readable equivalent to the traditional for loop [...]
Count the number of occurrences in a for loop using a hash. Then print the desired elements using grep, like so:
#!/usr/bin/env perl
use warnings;
use strict;
use feature qw( say );
my @array = (10,10,7,7,6,6,2,2,3,3,4,4,5,5,6,7,7,7,7,10,10);
my %cnt;
# Count each element of the array:
$cnt{$_}++ for @array;
# Print only the array elements that occurred an odd number of times,
# separated by ", ":
say join q{, }, grep { $cnt{$_} % 2 } @array;
# 6, 6, 6
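Since grep runs over the original array, an element with an odd count is listed once per occurrence. If each such element should appear only once, one option is to filter the keys of the count hash instead. This is a sketch of that variation, and the numeric sort is an assumption about the desired order.

```perl
use strict;
use warnings;

my @array = (10,10,7,7,6,6,2,2,3,3,4,4,5,5,6,7,7,7,7,10,10);
my %cnt;
$cnt{$_}++ for @array;

# Each key appears once in the hash, so each odd-count element is listed once
my @odd = sort { $a <=> $b } grep { $cnt{$_} % 2 } keys %cnt;
print join(', ', @odd), "\n";    # prints 6
```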

Binary search—Can't use string "1" as a symbol ref while strict refs is in use

I've been browsing over the already answered questions regarding this error message.
I am trying to solve a problem from the Rosalind web site that looks for some indexes using a binary search.
When my subroutine finds the number it seems to ignore it, and if I try to print the $found variable, it gives me the error
Can't use string "1" as a symbol ref while strict refs is in use
The code is this
sub binarysearch
{
my $numbertolook = shift;
my @intarray=@_;
my $lengthint = scalar @intarray;
my @sorted = sort {$a <=> $b} @intarray;
#print $numbertolook, " " , @sorted, "\n";
my $low=0;
my $high=$lengthint-1;
my $found =undef;
my $midpoint;
while ($low<$high)
{
$midpoint=int(($low+$high)/2);
#print $midpoint, " ",$low," ", $high, " ", @sorted, "\n";
if ($numbertolook<$sorted[$midpoint])
{
$high=$midpoint;
}
elsif ($numbertolook>$sorted[$midpoint])
{
$low=$midpoint;
}
elsif ($numbertolook==$sorted[$midpoint])
{
$found=1;
print $found "\n";
last;
}
if ($low==$high-1 and $low==$midpoint)
{
if ($numbertolook==$sorted[$high])
{
$found=1;
print $found "\n";
last;
}
$low=$high;
}
}
return $found;
}
You want
print $found, "\n";
Or
print $found . "\n";
With no operator between $found and the newline, it thinks $found is the filehandle to print a newline to, and is getting an error because it isn't a filehandle.
I'll try to help
First of all, as simple as it may seem, a binary search is quite difficult to code correctly. The main reason is that it's a hotbed of off-by-one errors, which are so prevalent that they have their own Wikipedia page
The issue is that an array containing, say, the values A to Z will have 26 elements with indices 0 to 25. I think FORTRAN bucks the trend, and Lua, but pretty much every other language has the first element of an array at index zero
A zero base works pretty well for everything until you start using divide and conquer algorithms. Merge Sort as well as Binary Search are such algorithms. Binary search goes
Is it in the first half?
If so then search the first half further
Else search the second half further
The hard part is when you have to decide when you've found the object, or when you need to give up looking. Splitting data in two nearly-halves is easy. Knowing when to stop is hard
It's highly efficient for sorted data, but the problem with implementing it is that, to do it properly, we have to deal with all sorts of weird index bases beyond zero or one.
Suppose I have an array
my @alpha = 'A' .. 'Q';
If I print scalar @alpha I will see 17, meaning the array has seventeen elements, indexed from 0 to 16
Now I'm looking for E in that array, so I do a binary search, so I want the "first half" and the "second half" of @alpha. If I add 0 to 16 and divide by 2 I get a neat "8", so the middle element is at index 8, which is I
But wait. There are 17 elements, which is an odd number, so the first eight (A .. H) are left of the middle and the last eight (J .. Q) are right of it. Does I itself belong to the first half or to the second?
In truth this is all a deception, because a binary search will work however we partition the data. In this case binary means two parts, and although the search would be more efficient if those parts could be equal in size it's not necessary for the algorithm to work. So it can be the first third and the last two-thirds, or just the first element and the rest
That's why using int(($low+$high)/2) is fine. It rounds down to the nearest integer, so that whenever $low+$high is odd (say 0 and 17, for an 18-element array) $mid is a usable 8 instead of 8.5
But your code still has to account for some unexpected things. In the case of our 17-element array we have calculated the middle index to be 8. So indexes 0 .. 7 are the "first half" while 8 .. 16 are the "second half", and the middle index is where the second half starts
But didn't we round the division down? So in the case of an odd number of elements, shouldn't our mid point be at the end of the first half, and not the start of the second? This is an arcane off-by-one error, but let's see if it still works with a simple even number of elements
@alpha = 'A' .. 'D';
The start and end indices are 0 and 3; the middle index is int((0+3)/2) == 1. So the first half is 0 .. 1 and the second half is 2 .. 3. That works fine
But there's still a lot more. Say I have to search an array with two elements X and Y. That has two clear halves, and I'm looking for A, which is before the middle. So I now search the one-element list X for A. The minimum and maximum indices of the target array are both zero. The mid-point is int((0+0)/2) == 0. So what happens next?
It is similar but rather worse when we're searching for Z in the same list. The code has to be exactly right, otherwise we will be either searching off the end of the array or checking the last element again and again
Saving the worst for last, suppose
my @alpha = ( 'A', 'B', 'Y', 'Z' );
and I'm looking for M. That lets loose all sorts of optimisations that involve checks that may make the ordinary case much slower
Because of all of this it's by far the best solution to use a library or a language's built-in function to do all of this. In particular, Perl's hashes are usually all you need to check for specific strings and any associated data. The algorithm used is vastly better than a binary search for any non-trivial data sets
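As a rough illustration of the hash approach, the sketch below builds a lookup table from value to position once and then answers each query with a single hash access. The variable names are made up for the example, and it assumes the values are unique.

```perl
use strict;
use warnings;

my @values = (10 .. 99);

# Map each value to its index using a hash slice; built once, queried many times
my %index_of;
@index_of{@values} = (0 .. $#values);

for my $wanted (76, 5) {
    if (exists $index_of{$wanted}) {
        print "$wanted found at index $index_of{$wanted}\n";
    }
    else {
        print "$wanted not found\n";
    }
}
```

Unlike a binary search, this does not need the data to be sorted, at the cost of the memory for the hash.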
Wikipedia shows this algorithm for an iterative binary search
The binary search algorithm can also be expressed iteratively with two index limits that progressively narrow the search range.
int binary_search(int A[], int key, int imin, int imax)
{
// continue searching while [imin,imax] is not empty
while (imin <= imax)
{
// calculate the midpoint for roughly equal partition
int imid = midpoint(imin, imax);
if (A[imid] == key)
// key found at index imid
return imid;
// determine which subarray to search
else if (A[imid] < key)
// change min index to search upper subarray
imin = imid + 1;
else
// change max index to search lower subarray
imax = imid - 1;
}
// key was not found
return KEY_NOT_FOUND;
}
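For comparison, here is a sketch of the same iterative algorithm in Perl. The subroutine name and the choice to take an array reference are assumptions for the example; the midpoint is computed as $imin plus half the distance, and an empty return stands in for KEY_NOT_FOUND.

```perl
use strict;
use warnings;

# Return the index of $key in the numerically sorted array referenced by
# $sorted, or nothing (undef in scalar context) if it is absent.
sub binary_search_index {
    my ($sorted, $key) = @_;
    my ($imin, $imax) = (0, $#{$sorted});

    # Continue while the inclusive range [imin, imax] is not empty
    while ($imin <= $imax) {
        my $imid = $imin + int( ($imax - $imin) / 2 );
        if    ($sorted->[$imid] == $key) { return $imid }
        elsif ($sorted->[$imid] <  $key) { $imin = $imid + 1 }    # search upper part
        else                             { $imax = $imid - 1 }    # search lower part
    }
    return;
}

print binary_search_index( [ 10 .. 99 ], 76 ), "\n";    # prints 66
```

Moving the limits to $imid + 1 and $imid - 1 is what guarantees the range shrinks on every iteration, including the one-element and two-element cases discussed above.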
And here is a version of your code that is far from bug-free but does what you intended. You weren't so far off
use strict;
use warnings 'all';
print binarysearch( 76, 10 .. 99 ), "\n";
sub binarysearch {
my $numbertolook = shift;
my @intarray = @_;
my $lengthint = scalar @intarray;
my @sorted = sort { $a <=> $b } @intarray;
my $low = 0;
my $high = $lengthint - 1;
my $found = undef;
my $midpoint;
while ( $low < $high ) {
$midpoint = int( ( $low + $high ) / 2 );
#print $midpoint, " ",$low," ", $high, " ", @sorted, "\n";
if ( $numbertolook < $sorted[$midpoint] ) {
$high = $midpoint;
}
elsif ( $numbertolook > $sorted[$midpoint] ) {
$low = $midpoint;
}
elsif ( $numbertolook == $sorted[$midpoint] ) {
$found = 1;
print "FOUND\n";
return $midpoint;
}
if ( $low == $high - 1 and $low == $midpoint ) {
if ( $numbertolook == $sorted[$high] ) {
$found = 1;
print "FOUND\n";
return $midpoint;
}
return;
}
}
return $midpoint;
}
output
FOUND
66
If you call print with several parameters separated only by a space, print expects the first one to be a filehandle. This is interpreted as print FILEHANDLE LIST from the documentation.
print $found "\n";
What you want to do is either to separate them with a comma, to call it as print LIST,
print $found, "\n";
or to concatenate them as strings, which will also call it as print LIST, but with only one element in LIST.
print $found . "\n";
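To make the two forms concrete, the sketch below uses both on purpose. The in-memory filehandle and the variable names are only there to keep the example self-contained; wrapping the handle in braces is an optional way to make the FILEHANDLE LIST form explicit.

```perl
use strict;
use warnings;

my $found  = 1;
my $buffer = '';

# Open a filehandle that writes into the scalar $buffer
open my $fh, '>', \$buffer or die "Cannot open in-memory handle: $!";

# print FILEHANDLE LIST: no comma after the handle
print {$fh} $found, "\n";
close $fh;

# print LIST: everything is data and goes to STDOUT
print $buffer;
```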

Perl - Returning the maximum value in a data set

I have never delved into the world of Perl before and I find it pretty confusing and could use some help. In the code below the calc() section returns a running average of an 'input' over 'count' samples. I would like to modify that so that calc() returns the maximum value within the sample set. Thanks in advance for the help!
sub calc
{
my ($this, $dim, $input, $count) = @_;
if ($count < 1)
{
warn "count=$count is less than 1.";
return undef;
}
my $inputsum_in = $this->{inputsum};
my ($inputcumsum, $inputsum_out) = PDL::CumulativeSumOver2($input, $inputsum_in);
my $inputdelay = $this->delay('inputhistory', $input);
my $inputdelaysum_in = $this->{inputdelaysum};
my ($inputdelaycumsum, $inputdelaysum_out) = PDL::CumulativeSumOver2($inputdelay, $inputdelaysum_in);
$this->{inputsum} = $inputsum_out;
$this->{inputdelaysum} = $inputdelaysum_out;
my $sampleno = $this->{sampleno};
my $divider = $count;
if($sampleno < $count)
{
my $last = $dim - 1;
$divider = sequence($dim) + ($sampleno + 1);
my $start = $count - $sampleno;
$divider->slice("$start:$last") .= $count if $start <= $last;
$this->{sampleno} = $sampleno + $dim;
}
return ($inputcumsum - $inputdelaycumsum) / $divider;
}
How about
$max = max($input);
PDL Primitives
If you want to find the maximum of a certain list of values, you do not need to write your own subroutine. There is already a function that comes shipped with perl v5.7.3 or higher:
use List::Util qw(max); # core module since v5.7.3
use strict;
use warnings;
print max(1 .. 10); # prints 10
EDIT: Here is the loop I take it you need.
Read input data from sensor
append new data to stored data
Throw away excess data
Evaluate
Here's how I'd do it.
my $storedData = pdl;
# $storedData is now a vector containing one element, 0
while (! stopCondition()) {
my $input = readSensorData(); # step 1
$storedData = $storedData->append($input); # step 2
if ($storedData->nelem > $count) { # step 3
$storedData = $storedData->slice("-$count:-1");
# note that -1 points to the last element in a piddle and -X refers to
# the element X-1 away from the end (true for piddles and native arrays)
}
my ($max, $min) = evaluate($storedData); # step 4
}
I'm not sure if this answers your question, but your comment below seems pretty different from the question you have above. Consider editing the above to better reflect what you're having trouble with or asking a new question.
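The four steps above can also be sketched without PDL, using plain Perl arrays and List::Util. This is an illustration only: running_max is a hypothetical name, and a list of samples stands in for the sensor reads.

```perl
use strict;
use warnings;
use List::Util qw(max);

# For each incoming sample, return the maximum over the last $count samples.
sub running_max {
    my ($count, @samples) = @_;
    my (@window, @result);
    for my $sample (@samples) {
        push @window, $sample;                # append new data to stored data
        shift @window if @window > $count;    # throw away excess data
        push @result, max(@window);           # evaluate
    }
    return @result;
}

print join(', ', running_max(3, 1, 5, 2, 4, 3, 1)), "\n";    # prints 1, 5, 5, 5, 4, 4
```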
An easy way to get a running average is with a finite impulse response filter, aka convolution. Convolve any signal with a (normalized) rectangular impulse and you get running average.
my $filter = ones($count) / $count;
my $runningAve = convolveND($input, $filter);
my $max = $runningAve->max;
Or in one line
my $max = convolveND($input, ones($count) / $count)->max;
convolveND is documented here.
There is one thing to be careful of with this method, which is that the values at the beginning and end of the $runningAve piddle aren't really running averages. To ensure that the output is the same size as the input convolveND (by default) effectively concatenates zeroes to the beginning and end of the input, the result being that the first and last few elements of $runningAve are lower than actual running averages. (Note that a running average should have N - (window - 1) elements in principle, N being the size of $input.) Since these "bad" values will necessarily be lower than the actual running average values, they won't disturb the maximum that you want. (Re "by default": convolveND has other ways of handling edges, as you will see in the documentation linked to above.)
(NB: I am not a PDL expert. There may be a cheaper way to get the running average than convolveND, something like $ra = $input->range(...)->sumover(0) / $count, but I don't know what you'd put in the ... and the above is readable. See also http://search.cpan.org/~jlapeyre/PDL-DSP-Iir-0.002/README.pod#moving_average)

Perl: test for an arbitrary bit in a bit string

I'm trying to parse CPU node affinity+cache sibling info in Linux sysfs.
I can get a string of bits, just for example:
0000111100001111
Now I need a function that takes a decimal number (e.g. 4 or 5) and tests whether the nth bit is set or not. So it would return true for 4 and false for 5. I could create a string by shifting 1 n times, but I'm not sure about the syntax; is there an easier way? Also, there's no limit on how long the string could be, so I want to avoid decimal <-> binary conversions.
Assuming that you have the string of bits "0000111100001111" in $str, if you do the precomputation step:
my $bit_vector = pack "b*", $str;
you can then use vec like so:
$is_set = vec $bit_vector, $offset, 1;
so for example, this code
for (0..15) {
print "$_\n" if vec $bit_vector, $_, 1;
}
will output
4
5
6
7
12
13
14
15
Note that the offsets are zero-based, so if you want the first bit to be bit 1, you'll need to add/subtract 1 yourself.
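Wrapped up as a subroutine, the same idea might look like the sketch below. The name is_bit_set_vec is made up for the example, and it counts offsets from the left end of the string, as in the loop above. To count from the right instead, the string could be reversed before packing.

```perl
use strict;
use warnings;

# Return 1 if the character at zero-based $offset (from the left) is "1".
sub is_bit_set_vec {
    my ($str, $offset) = @_;
    my $bit_vector = pack "b*", $str;    # "b*" keeps string order as bit order
    return vec($bit_vector, $offset, 1);
}

my $str = "0000111100001111";
print is_bit_set_vec($str, 4) ? "set" : "not set", "\n";    # prints set
print is_bit_set_vec($str, 3) ? "set" : "not set", "\n";    # prints not set
```

If many bits of the same string are tested, it is cheaper to pack once and reuse the vector, as in the original snippet.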
Well, this seems to work, and I'm not going for efficiency:
sub is_bit_set
{
my $bitstring = shift;
my $bit = shift;
my $index = length($bitstring) - $bit - 1;
if (substr($bitstring, $index, 1) == "1") {
return 1;
}
else {
return 0;
}
}
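A simpler variant without a bit vector, although a vector would certainly be a more efficient way to deal with this.
sub is_bit_set
{
my $bitstring = shift;
my $bit = shift;
# Bit 0 is the rightmost character, so step $bit + 1 characters back from the end
return int substr($bitstring, -($bit + 1), 1);
}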

When is the spaceship operator used outside a sort?

I've only seen the Perl spaceship operator (<=>) used in numeric sort routines. But it seems useful in other situations. I just can't think of a practical use.
What would be an example of when it could be used outside of a Perl sort?
This is a best practice question.
I'm writing a control system for robot Joe that wants to go to robot Mary and recharge her. They move along the integer points on the line. Joe starts at $j and can walk 1 meter in either direction per time unit. Mary stands still at $m and can't move -- she needs a good recharge! The controlling program would look like this:
while ($m != $j) {
$j += ($m <=> $j);
}
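A runnable version of that loop, wrapped in a subroutine that also counts the time units, could look like this sketch. The name steps_to_reach is hypothetical, and both positions are assumed to be integers.

```perl
use strict;
use warnings;

# Walk Joe from $j to Mary at $m one meter at a time; return the number of steps.
sub steps_to_reach {
    my ($j, $m) = @_;
    my $steps = 0;
    while ($m != $j) {
        $j += ($m <=> $j);    # +1 if Mary is ahead, -1 if she is behind
        $steps++;
    }
    return $steps;
}

print steps_to_reach(2, 7), "\n";    # prints 5
print steps_to_reach(7, 2), "\n";    # prints 5
```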
The <=> operator would be useful for a binary search algorithm. Most programming languages don't have an operator that does a three-way comparison, which makes it necessary to do two comparisons per iteration. With <=> you can do just one.
use feature 'switch';    # given/when is an experimental feature; needs Perl 5.10 or later
sub binary_search {
my $value = shift;
my $array = shift;
my $low = 0;
my $high = $#$array;
while ($low <= $high) {
my $mid = $low + int(($high - $low) / 2);
given ($array->[$mid] <=> $value) {
when (-1) { $low = $mid + 1 }
when ( 1) { $high = $mid - 1 }
when ( 0) { return $mid }
}
}
return;
}
In any sort of comparison method. For example, you could have a complicated object, but it still has a defined "order", so you could define a comparison function for it (which you don't have to use inside a sort method, although it would be handy):
package Foo;
# ... other stuff...
# Note: this is a class function, not a method
sub cmp
{
my $object1 = shift;
my $object2 = shift;
my $compare1 = sprintf("%04d%04d%04d", $object1->{field1}, $object1->{field2}, $object1->{field3});
my $compare2 = sprintf("%04d%04d%04d", $object2->{field1}, $object2->{field2}, $object2->{field3});
return $compare1 <=> $compare2;
}
This is a totally contrived example of course. However, in my company's source code I found nearly exactly the above, for comparing objects used for holding date and time information.
One other use I can think of is for statistical analysis -- if a value is repeatedly run against a list of values, you can tell if the value is higher or lower than the set's median:
use List::Util qw(sum);
# $result will be
# negative if $value is lower than the median of @setOfValues,
# positive if $value is higher than the median of @setOfValues,
# 0 if $value is equal to the median
my $result = sum(map { $value <=> $_ } @setOfValues);
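As a small worked sketch of that idea, the sum can be reduced to -1, 0 or 1 with one more <=>. The subroutine name is made up for the example, and it assumes a non-empty set of distinct numbers; with repeated values the sign can disagree with the true median.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Return -1, 0 or 1 depending on whether $value is below, at or above
# the median of @set (assumes @set is non-empty with distinct values).
sub compare_to_median {
    my ($value, @set) = @_;
    my $balance = sum( map { $value <=> $_ } @set );    # (count below) - (count above)
    return $balance <=> 0;
}

my @setOfValues = (1, 2, 3, 4, 5);
print compare_to_median(2,  @setOfValues), "\n";    # prints -1
print compare_to_median(3,  @setOfValues), "\n";    # prints 0
print compare_to_median(10, @setOfValues), "\n";    # prints 1
```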
Here's one more, from Wikipedia: "If the two arguments cannot be compared (e.g. one of them is NaN), the operator returns undef.", i.e., you can determine whether two values are both numbers at once, although personally I'd go for the less cryptic Scalar::Util::looks_like_number.