Unintended endless While loop - why? - perl

I have an array of numeric values that's sorted. I also have a minimum and maximum value and want to remove any values from the array that are smaller than the minimum or bigger than the maximum value. I am getting an endless loop when my minimum value is smaller than the value of the first array element. Here's a minimal example of the code in question:
#!/usr/bin/perl
use strict;
use warnings;
my @array = ( 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17 );
my $min_wert = 1;
my $max_wert = 13;
while ( $array[0] < $min_wert ) {
    shift @array;
}
while ( $array[-1] > $max_wert ) {
    pop @array;
}
print join( ' ', @array );
print "\n";
The problem is, this version works flawlessly, outputting
5 6 7 8 9 10 11 12 13
The first while is not entered in this case.
Dropping the same test case into my production code, I'm getting the following error message on the line with the shift-statement:
Use of uninitialized value in numeric lt (<) at
line 1130.
I then introduced a counter to try to figure out why the while-loop is even entered, and that completely removes the problem instead of giving me the opportunity for further diagnostics.
@werte_liste = sort {$a <=> $b} @werte_liste;
print join( ' ', @werte_liste );
print "\n";
print "Start: $start_index - Stop: $stop_index\n";
while ( $werte_liste[0] < $start_index ) {
    print "In while-loop- why?\n";
    shift @werte_liste;
}
while ( $werte_liste[-1] > $stop_index ) {
    pop @werte_liste;
}
Why do I enter that first while loop in this case? And second, is there a better solution to my specific problem? (I'm not dealing with lots of values here, so readability of the code is more important than efficiency.)

I don't know why it works with your testcase but doesn't in your production code, but here's my guess:
Your array becomes empty. $array[0] < $min_wert is true if $array[0] is undef (which happens if the array is empty), and $min_wert > 0.
undef is basically treated as a 0 in numerical comparisons (it emits a warning).
You can check that the array still has elements with this:
while ( @array and $array[0] < $min_wert ) {
The other while loop probably has the same problem.

What happens when @werte_liste is empty?
For one, $werte_liste[0] will be undefined, and the expression $werte_liste[0] < $start_index will generate the Use of uninitialized value in numeric lt ... warnings.
For another, $werte_liste[0] will evaluate to 0 for the < comparison. If $start_index is positive, then $werte_liste[0] < $start_index is true.
Finally, shift @werte_liste will have no effect on @werte_liste, @werte_liste will remain empty, and your while ... expression will repeat indefinitely.
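On the second part of the question (a more readable solution), one option is to filter with grep instead of shifting and popping. This is only a sketch: clamp_list is a hypothetical helper name, and it assumes both bounds are inclusive, as in the question. It does not rely on the array being sorted, and an empty result cannot cause a loop.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the values within [$min, $max].
sub clamp_list {
    my ( $min, $max, @values ) = @_;
    return grep { $_ >= $min and $_ <= $max } @values;
}

my @array = clamp_list( 1, 13, 5 .. 17 );
print "@array\n";    # 5 6 7 8 9 10 11 12 13

# An empty result is handled without warnings or endless looping.
my @none = clamp_list( 20, 30, 5 .. 17 );
print scalar(@none), "\n";    # 0
```

If the shift/pop loops are kept, each condition needs the "@array and ..." guard shown above.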

Related

Why array counter returns a smaller number? [duplicate]

I seem to have come across several different ways to find the size of an array. What is the difference between these three methods?
my @arr = (2);
print scalar @arr; # First way to print array size
print $#arr; # Second way to print array size
my $arrSize = @arr;
print $arrSize; # Third way to print array size
The first and third ways are the same: they evaluate an array in scalar context. I would consider this to be the standard way to get an array's size.
The second way actually returns the last index of the array, which is not (usually) the same as the array size.
First, the second ($#array) is not equivalent to the other two. $#array returns the last index of the array, which is one less than the size of the array.
The other two (scalar @arr and $arrSize = @arr) are virtually the same. You are simply using two different means to create scalar context. It comes down to a question of readability.
I personally prefer the following:
say 0+@array; # Represent @array as a number
I find it clearer than
say scalar(@array); # Represent @array as a scalar
and
my $size = @array;
say $size;
The latter looks quite clear alone like this, but I find that the extra line takes away from clarity when part of other code. It's useful for teaching what @array does in scalar context, and maybe if you want to use $size more than once.
This gets the size by forcing the array into a scalar context, in which it is evaluated as its size:
print scalar @arr;
This is another way of forcing the array into a scalar context, since it's being assigned to a scalar variable:
my $arrSize = @arr;
This gets the index of the last element in the array, so it's actually the size minus 1 (assuming indexes start at 0, which is adjustable in Perl although doing so is usually a bad idea):
print $#arr;
This last one isn't really good to use for getting the array size. It would be useful if you just want to get the last element of the array:
my $lastElement = $arr[$#arr];
Also, as you can see here on Stack Overflow, this construct isn't handled correctly by most syntax highlighters...
To use the second way, add 1:
print $#arr + 1; # Second way to print array size
All three give the same result if we modify the second one a bit:
my @arr = (2, 4, 8, 10);
print "First result:\n";
print scalar @arr;
print "\n\nSecond result:\n";
print $#arr + 1; # Add 1, because $#arr is the last index and indexing starts at 0.
print "\n\nThird result:\n";
my $arrSize = @arr;
print $arrSize;
Example:
my @a = (undef, undef);
my $size = @a;
warn "Size: " . $#a; # Size: 1. It's not the size
warn "Size: " . $size; # Size: 2
The “Perl variable types” section of the perlintro documentation contains
The special variable $#array tells you the index of the last element of an array:
print $mixed[$#mixed]; # last element, prints 1.23
You might be tempted to use $#array + 1 to tell you how many items there are in an array. Don’t bother. As it happens, using @array where Perl expects to find a scalar value (“in scalar context”) will give you the number of elements in the array:
if (@animals < 5) { ... }
The perldata documentation also covers this in the “Scalar values” section.
If you evaluate an array in scalar context, it returns the length of the array. (Note that this is not true of lists, which return the last value, like the C comma operator, nor of built-in functions, which return whatever they feel like returning.) The following is always true:
scalar(@whatever) == $#whatever + 1;
Some programmers choose to use an explicit conversion so as to leave nothing to doubt:
$element_count = scalar(@whatever);
Earlier, the same section documents how to obtain the index of the last element of an array.
The length of an array is a scalar value. You may find the length of array @days by evaluating $#days, as in csh. However, this isn’t the length of the array; it’s the subscript of the last element, which is a different value since there is ordinarily a 0th element.
From perldoc perldata, which should be safe to quote:
The following is always true:
scalar(@whatever) == $#whatever + 1;
Just so long as you don't $#whatever++ and mysteriously increase the size of your array.
The array indices start with 0.
and
You can truncate an array down to nothing by assigning the null list () to it. The following are equivalent:
@whatever = ();
$#whatever = -1;
Which brings me to what I was looking for, which is how to detect that the array is empty. I found it: if $#empty == -1;
There are various ways to print size of an array. Here are the meanings of all:
Let’s say our array is my @arr = (3,4);
Method 1: scalar
This is the right way to get the size of arrays.
print scalar @arr; # Prints size, here 2
Method 2: Index number
$#arr gives the last index of an array. So if array is of size 10 then its last index would be 9.
print $#arr; # Prints 1, as last index is 1
print $#arr + 1; # Adds 1 to the last index to get the array size
We are adding 1 here, considering the array as 0-indexed. But, if it's not zero-based then, this logic will fail.
perl -le 'local $[ = 4; my @arr = (3, 4); print $#arr + 1;' # prints 6
The above example prints 6, because we have set its initial index to 4. Now the index would be 5 and 6, with elements 3 and 4 respectively.
Method 3:
When an array is used in a scalar context, then it returns the size of the array
my $size = @arr;
print $size; # Prints size, here 2
Actually, method 3 and method 1 are same.
Use int(@array), as it treats the argument as a scalar.
To find the size of an array use the scalar keyword:
print scalar @array;
To find out the last index of an array there is $# (Perl default variable). It gives the last index of an array. As an array starts from 0, we get the size of array by adding one to $#:
print $#array + 1;
Example:
my @a = qw(1 3 5);
print scalar @a, "\n";
print $#a+1, "\n";
Output:
3
3
As numerous answers pointed out, the first and third way are the correct methods to get the array size, and the second way is not.
Here I expand on these answers with some usage examples.
@array_name evaluates to the length of the array = the size of the array = the number of elements in the array, when used in a scalar context.
Below are some examples of a scalar context, such as @array_name by itself inside if or unless, or in arithmetic comparisons such as == or !=.
All of these examples will work if you change @array_name to scalar(@array_name). This would make the code more explicit, but also longer and slightly less readable. Therefore, the more idiomatic usage omitting scalar() is preferred here.
my @a = (undef, q{}, 0, 1);
# All of these test whether 'array' has four elements:
print q{array has four elements} if @a == 4;
print q{array has four elements} unless @a != 4;
@a == 4 and print q{array has four elements};
!(@a != 4) and print q{array has four elements};
# All of the above print:
# array has four elements
# All of these test whether array is not empty:
print q{array is not empty} if @a;
print q{array is not empty} unless !@a;
@a and print q{array is not empty};
!(!@a) and print q{array is not empty};
# All of the above print:
# array is not empty

How to count the odd number of occurrences in Perl?

I have a program in Perl that is supposed to count the number of times an element appears in an array, and prints out the value of the element if the number of times it appears is odd.
Here is my code.
#!/usr/bin/perl
use strict;
use warnings;
sub FindOddCount($)
{
my @arraynumber = @_;
my $Even = 0;
my $i = 0;
my $j = 0;
my $array_length = scalar(@_);
for ($i = 0; $i <= $array_length; $i++)
{
my $IntCount = 0;
for ($j = 0; $j <= $array_length; $j++)
{
if ($arraynumber[$i] == $arraynumber[$j])
{
$IntCount++;
print($j);
}
}
$Even = $IntCount % 2;
if ($Even != 0)
{
return $arraynumber[$i];
}
}
if ($Even == 0)
{
return "none";
}
}
my @array1 = (1,1,2,2,3,3,4,4,5,5,6,7,7,7,7);
my @array2 = (10,10,7,7,6,6,2,2,3,3,4,4,5,5,6,7,7,7,7,10,10);
my @array3 = (6,6,10,7,7,6,6,2,2,3,3,4,4,5,5,6,7,7,7,7,10.10);
my @array4 = (10,10,7,7,2,2,3,3,4,4,5,5,7,7,7,7,10,10,6);
my @array5 = (6,6);
my @array6 = (1);
my $return_value1 = FindOddCount(@array1);
my $return_value2 = FindOddCount(@array2);
my $return_value3 = FindOddCount(@array3);
my $return_value4 = FindOddCount(@array4);
my $return_value5 = FindOddCount(@array5);
my $return_value6 = FindOddCount(@array6);
print "The Odd value for the first array is $return_value1\n";
print "The Odd value for the 2nd array is $return_value2\n ";
print "The Odd value for the 3rd array is $return_value3\n ";
print "The Odd value for the 4th array is $return_value4\n ";
print "The Odd value for the 5th array is $return_value5\n ";
print "The Odd value for the sixth array is $return_value6\n ";
Here are my results.
The Odd value for the first array is 15
The Odd value for the first array is 21
The Odd value for the first array is 21
The Odd value for the first array is 19
The Odd value for the first array is 2
The Odd value for the first array is 1
If you can't tell. It is printing the count of all of the elements of the array instead of returning the element that occurs an odd number of times. In addition I get this error.
Use of uninitialized value in numeric eq (==) at OddCount.pl line 17.
Line 17 is where the 1st array and the 2nd array are compared. Yet the values are clearly instantiated and they work when I print them out. What is the issue?
Build a frequency hash for an array then go through it to see which elements have odd counts
use warnings;
use strict;
use feature 'say';
my @ary = qw(7 o1 7 o2 o1 z z o1); # o1,o2 appear odd number of times
my %freq;
++$freq{$_} for @ary;
foreach my $key (sort keys %freq) {
say "$key => $freq{$key}" if $freq{$key} & 1;
}
This is far simpler than the code in the question, although that code is easily fixed too. See below.
Some notes
++$freq{$_} increments the value for the key $_ in the hash %freq by 1, or it adds the key to the hash if it doesn't exist (by autovivification) and sets its value to one. So when an array is iterated over with this code in the end the hash %freq contains for keys the array elements and for their values the elements' counts
Test $n & 1 uses the bitwise AND -- it is true if $n has the lowest bit set, so if it is odd
That ++$freq{$_} for @ary; is a Statement Modifier, running the statement for each element of @ary where the current element is aliased by the $_ variable
This prints
o1 => 3
o2 => 1
The odd-frequency elements (if any) are printed in alphabetical order here, which is an arbitrary choice. Change it to whatever order you need.
Comments on the code in the question, which is correct with two simple fixes.
It uses prototypes in a wrong way for the purpose, in sub FindOddCount($). I suspect that this isn't needed so let's not dwell on it -- just drop that and make it sub FindOddCount
The index in loops includes the length of the array (<=) so in the last iteration they attempt to index into the array past its last element. Off-by-one error. That can be fixed by changing the condition into < $array_length (instead of <=), but read on
There is no reason to use C-style loops, not even to iterate over the index. (Needed here since the position in the array is used.) Scripting languages provide for cleaner ways†
foreach my $i1 (0 .. $#arraynumber) {
my $IntCount = 0;
foreach my $i2 (0 .. $#arraynumber) {
if ( $arraynumber[$i1] == $arraynumber[$i2] ) {
...
That 0..N is the range operator, which creates the list of numbers within that range. The syntax $#array_name is the index of the last element in the array @array_name. Exactly what's needed. So there is no need for the array length
Multiple (six) arrays, used to check the code, can be manipulated in far better and easier ways by using references; see the tutorial for complex data structures perldsc, and in particular the page perllol, for array-of-arrays
In short: when you remove the prototype and fix off-by-one error your code seems to be correct.
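For illustration, here is a sketch of the question's FindOddCount with those two fixes applied (no prototype, and index loops that stop at $#arraynumber). It keeps the original quadratic approach and the "none" return value; the debugging print($j) is dropped, and the test data is taken from the question.

```perl
use strict;
use warnings;

# No prototype, so the array is passed as a list rather than as its length.
sub FindOddCount {
    my @arraynumber = @_;
    foreach my $i ( 0 .. $#arraynumber ) {
        my $IntCount = 0;
        foreach my $j ( 0 .. $#arraynumber ) {
            $IntCount++ if $arraynumber[$i] == $arraynumber[$j];
        }
        # Return the first element that occurs an odd number of times.
        return $arraynumber[$i] if $IntCount % 2;
    }
    return "none";
}

print FindOddCount( 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 7, 7, 7, 7 ), "\n";    # 6
print FindOddCount( 6, 6 ), "\n";                                          # none
```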
† And not only scripting ones -- for example, C++11 introduced the range-based for loop
for (auto var: container) ... // really const auto&, or auto&, or auto&&
and the link (a standard reference) says
Used as a more readable equivalent to the traditional for loop [...]
Count the number of occurrences in a for loop using a hash. Then print the desired elements using grep, like so:
#!/usr/bin/env perl
use warnings;
use strict;
use feature qw( say );
my @array = (10,10,7,7,6,6,2,2,3,3,4,4,5,5,6,7,7,7,7,10,10);
my %cnt;
# Count each element of the array:
$cnt{$_}++ for @array;
# Print only the array elements that occurred an odd number of times,
# separated by ", ":
say join q{, }, grep { $cnt{$_} % 2 } @array;
# 6, 6, 6
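Because the grep above runs over the array, an element with an odd count is printed once per occurrence. If each such element should appear only once, a possible variant is to grep over the hash keys instead. This is a sketch that assumes numeric data and that sorted output is acceptable; the name @odd is only illustrative.

```perl
use strict;
use warnings;

my @array = ( 10, 10, 7, 7, 6, 6, 2, 2, 3, 3, 4, 4, 5, 5, 6, 7, 7, 7, 7, 10, 10 );

my %cnt;
$cnt{$_}++ for @array;

# Iterate over the hash keys so that each value is reported once.
my @odd = sort { $a <=> $b } grep { $cnt{$_} % 2 } keys %cnt;
print join( q{, }, @odd ), "\n";    # 6
```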

Is it ok to use conditional splice?

Let's consider simple Perl code:
my @x = ( 1, 5, 9);
for my $i ( 0 .. $#x ) {
splice( @x, $i, 1 ) if ( $x[$i] >= 5 );
}
print "@x";
Output is not correct, 1 9 but there must be 1
If we run code with -w flag it prints warning
Use of uninitialized value within @x in numeric ge (>=) at splice.pl line 5.
So, it's not good practise to use conditional splice and better to push result in new variable?
The problem isn't your use of conditional splice per se, it's your loop. The most obvious problem, and the one that causes your warning, is that you're running off of the end of the array. for my $i ( 0 .. $#x ) sets the iteration endpoint to $#x before the loop starts, but after you splice one or more elements out, the last index of the array will be smaller. You could fix that using a C-style for loop, instead of the range-style loop, but I don't recommend it — keep reading.
The next problem is that after you splice an element out of the array, you continue the loop with $i one higher... but because you spliced an element out of the array, the next element that you haven't seen yet is in $x[$i], not $x[$i+1]. You say "Output is correct, 1 9", but shouldn't 9 have been removed, since it's more than 5? You could fix this using redo after splice to go through the loop again without incrementing $i, but I don't recommend that either.
So it is possible to fix your loop which uses splice in place so that it will work correctly, but the result would be pretty complicated. Unless there's a compelling reason to do it differently, I would recommend using simply
@x = grep { $_ < 5 } @x;
There's no problem with assigning the result to the same array as the source, and there is no loop management or other housekeeping for you to do.
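If removing elements in place is genuinely required, one commonly used alternative to the redo approach (not part of the answer above, so treat it as an assumption about what fits your case) is to walk the indices from the end. Removing an element then only moves elements that have already been examined, so nothing is skipped and no index runs past the end.

```perl
use strict;
use warnings;

my @x = ( 1, 5, 9 );

# Walk the indices from last to first so that a removal never shifts
# an element that has not been examined yet.
for my $i ( reverse 0 .. $#x ) {
    splice( @x, $i, 1 ) if $x[$i] >= 5;
}

print "@x\n";    # 1
```

For most cases the grep version remains shorter and easier to read.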

Binary search—Can't use string "1" as a symbol ref while strict refs is in use

I've been browsing over the already answered questions regarding this error message.
I am trying to solve a problem from the Rosalind web site that looks for some indexes using a binary search.
When my subroutine finds the number it seems to ignore it, and if I try to print the $found variable, it gives me the error
Can't use string "1" as a symbol ref while strict refs is in use
The code is this
sub binarysearch
{
my $numbertolook = shift;
my @intarray=@_;
my $lengthint = scalar @intarray;
my @sorted = sort {$a <=> $b} @intarray;
#print $numbertolook, " " , @sorted, "\n";
my $low=0;
my $high=$lengthint-1;
my $found =undef;
my $midpoint;
while ($low<$high)
{
$midpoint=int(($low+$high)/2);
#print $midpoint, " ",$low," ", $high, " ", @sorted, "\n";
if ($numbertolook<$sorted[$midpoint])
{
$high=$midpoint;
}
elsif ($numbertolook>$sorted[$midpoint])
{
$low=$midpoint;
}
elsif ($numbertolook==$sorted[$midpoint])
{
$found=1;
print $found "\n";
last;
}
if ($low==$high-1 and $low==$midpoint)
{
if ($numbertolook==$sorted[$high])
{
$found=1;
print $found "\n";
last;
}
$low=$high;
}
}
return $found;
}
You want
print $found, "\n";
Or
print $found . "\n";
With no operator between $found and the newline, it thinks $found is the filehandle to print a newline to, and is getting an error because it isn't a filehandle.
I'll try to help
First of all, as simple as it may seem, a binary search is quite difficult to code correctly. The main reason is that it's a hotbed of off-by-one errors, which are so prevalent that they have their own Wikipedia page
The issue is that an array containing, say, the values A to Z will have 26 elements with indices 0 to 25. I think FORTRAN bucks the trend, and Lua, but pretty much every other language has the first element of an array at index zero
A zero base works pretty well for everything until you start using divide and conquer algorithms. Merge Sort as well as Binary Search are such algorithms. Binary search goes
Is it in the first half?
If so then search the first half further
Else search the second half further
The hard part is when you have to decide when you've found the object, or when you need to give up looking. Splitting data in two nearly-halves is easy. Knowing when to stop is hard
It's highly efficient for sorted data, but the problem comes when implementing it that, if we do it properly, we have to deal with all sorts of weird index bases beyond zero or one.
Suppose I have an array
my @alpha = 'A' .. 'Q'
If I print scalar @alpha I will see 17, meaning the array has seventeen elements, indexed from 0 to 16
Now I'm looking for E in that array, so I do a binary search, so I want the "first half" and the "second half" of @alpha. If I add 0 to 16 and divide by 2 I get a neat "8", so the middle element is at index 8, which is H
But wait. There are 17 elements, which is an odd number, so if we say the first eight (A .. H) are left of the middle and the last eight (I .. Q) are right of the middle then surely the "middle" is I?
In truth this is all a deception, because a binary search will work however we partition the data. In this case binary means two parts, and although the search would be more efficient if those parts could be equal in size it's not necessary for the algorithm to work. So it can be the first third and the last two-thirds, or just the first element and the rest
That's why using int(($low+$high)/2) is fine. It rounds down to the nearest integer so that with our 17-element array $mid is a usable 8 instead of 8.5
But your code still has to account for some unexpected things. In the case of our 17-element array we have calculated the middle index to be 8. So indexes 0 .. 7 are the "first half" while 8 .. 16 are the "second half", and the middle index is where the second half starts
But didn't we round the division down? So in the case of an odd number of elements, shouldn't our mid point be at the end of the first half, and not the start of the second? This is an arcane off-by-one error, but let's see if it still works with a simple even number of elements
@alpha = 'A' .. 'D'
The start and end indices are 0 and 3; the middle index is int((0+3)/2) == 1. So the first half is 0 .. 1 and the second half is 2 .. 3. That works fine
But there's still a lot more. Say I have to search an array with two elements X and Y. That has two clear halves, and I'm looking for A, which is before the middle. So I now search the one-element list X for A. The minimum and maximum elements of the target array are both zero. The mid-point is int((0+0)/2) == 0. So what happens next?
It is similar but rather worse when we're searching for Z in the same list. The code has to be exactly right, otherwise we will be either searching off the end of the array or checking the last element again and again
Saving the worst for last, suppose
my @alpha = ( 'A', 'B', 'Y', 'Z' )
and I'm looking for M. That lets loose all sorts of optimisations that involve checks that may make the ordinary case much slower
Because of all of this it's by far the best solution to use a library or a language's built-in function to do all of this. In particular, Perl's hashes are usually all you need to check for specific strings and any associated data. The algorithm used is vastly better than a binary search for any non-trivial data sets
Wikipedia shows this algorithm for an iterative binary search
The binary search algorithm can also be expressed iteratively with two index limits that progressively narrow the search range.
int binary_search(int A[], int key, int imin, int imax)
{
// continue searching while [imin,imax] is not empty
while (imin <= imax)
{
// calculate the midpoint for roughly equal partition
int imid = midpoint(imin, imax);
if (A[imid] == key)
// key found at index imid
return imid;
// determine which subarray to search
else if (A[imid] < key)
// change min index to search upper subarray
imin = imid + 1;
else
// change max index to search lower subarray
imax = imid - 1;
}
// key was not found
return KEY_NOT_FOUND;
}
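The same iterative algorithm might look like this in Perl. It is a sketch, not a drop-in replacement: binary_search is a hypothetical name, it assumes the caller passes a reference to an array that is already sorted numerically, and it returns undef instead of a KEY_NOT_FOUND constant.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the index of $key in the sorted array
# referenced by $sorted, or undef if the key is not present.
sub binary_search {
    my ( $key, $sorted ) = @_;
    my ( $imin, $imax ) = ( 0, $#{$sorted} );

    # Continue while the inclusive range [$imin, $imax] is not empty.
    while ( $imin <= $imax ) {
        my $imid = int( ( $imin + $imax ) / 2 );
        if    ( $sorted->[$imid] == $key ) { return $imid }
        elsif ( $sorted->[$imid] < $key )  { $imin = $imid + 1 }
        else                               { $imax = $imid - 1 }
    }
    return;    # key not found
}

my @sorted = ( 10 .. 99 );
my $index  = binary_search( 76, \@sorted );
print defined $index ? "found at index $index\n" : "not found\n";    # found at index 66
```

Because both limits move past the midpoint on every iteration, the range always shrinks and the loop terminates even for one-element and empty arrays.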
And here is a version of your code that is far from bug-free but does what you intended. You weren't so far off
use strict;
use warnings 'all';
print binarysearch( 76, 10 .. 99 ), "\n";
sub binarysearch {
my $numbertolook = shift;
my @intarray = @_;
my $lengthint = scalar @intarray;
my @sorted = sort { $a <=> $b } @intarray;
my $low = 0;
my $high = $lengthint - 1;
my $found = undef;
my $midpoint;
while ( $low < $high ) {
$midpoint = int( ( $low + $high ) / 2 );
#print $midpoint, " ",$low," ", $high, " ", @sorted, "\n";
if ( $numbertolook < $sorted[$midpoint] ) {
$high = $midpoint;
}
elsif ( $numbertolook > $sorted[$midpoint] ) {
$low = $midpoint;
}
elsif ( $numbertolook == $sorted[$midpoint] ) {
$found = 1;
print "FOUND\n";
return $midpoint;
}
if ( $low == $high - 1 and $low == $midpoint ) {
if ( $numbertolook == $sorted[$high] ) {
$found = 1;
print "FOUND\n";
return $midpoint;
}
return;
}
}
return $midpoint;
}
output
FOUND
66
If you call print with several parameters separated by a space, print expects the first one to be a filehandle. This is interpreted as print FILEHANDLE LIST from the documentation.
print $found "\n";
What you want to do is either to separate them with a comma, so that it is called as print LIST:
print $found, "\n";
or to concat as strings, which will also call it as print LIST, but with only one element in LIST.
print $found . "\n";

I am unable to understand the following Perl code

I have the following Perl code.
#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
my @array = ( 3, 4, 1, 4, 7, 7, 4, 1, 3, 8 );
my %unordered;
@unordered{@array} = undef;
foreach my $key (keys %unordered) {
print "Unordered: $key\n";
}
my %seen;
my @ordered;
foreach my $element (@array) {
if ( not $seen{$element}++ ) {
push @ordered, $element;
}
}
In the last foreach code block, I am unable to understand this - in the first iteration, the expression not $seen{$element}++ evaluate to not 0 - true - so the if block execute. In the second iteration the expression not $seen{$element}++ should again evaluate to not 0 - true as the hash is empty. So, reading the scalar $seen{$element} will read 0 and not 0 will evaluate to true. So, the if block should execute again. But, the book says it stops after first iteration. Can anyone explain this?
In the second iteration the hash will no longer be empty, because the ++ operator will have put a 1 in there. In a third iteration the value will be 2 (which for the purposes of this program is the same as 1, it just means "seen at least once before").
At the end of your program %seen will contain the number of times each entry appears in your list.
if $a++ increments the value of $a (treating it as 0 if missing), and then returns the value before that increment to the comparison.
It is important to use the postfix operator, as if ++$a will not work here: It also places a 1 in your hash, but it returns the modified value (so 1 even for the first iteration).
The last foreach loop can be spelled out as:
# loop over all elements of the array
foreach my $element (@array) {
    # if the current element hasn't been seen yet
    if ( not exists $seen{$element} ) {
        # add the current element to the ordered array
        push @ordered, $element;
    }
    # increment the number of times the element has been seen
    $seen{$element}++;
}
At the end, @ordered will contain:
(3, 4, 1, 7, 8)
A better name would be @unique instead of @ordered.
%seen will contain:
(3 => 2, 4 => 3, 1 => 2, 7 => 2, 8 => 1)
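The same post-increment test is often written as a single grep. The sketch below uses the array from the question; @unique is just an illustrative name, following the naming suggestion above.

```perl
use strict;
use warnings;

my @array = ( 3, 4, 1, 4, 7, 7, 4, 1, 3, 8 );

my %seen;
# $seen{$_}++ returns the old count (undef, which is false, the first time),
# so the negation is true only on the first occurrence of each value.
my @unique = grep { !$seen{$_}++ } @array;

print "@unique\n";    # 3 4 1 7 8
```

After the grep, %seen holds the per-value counts listed above.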