How can I work around a round-off error that causes an infinite loop in Perl's Statistics::Descriptive? - perl

I'm using the Statistics::Descriptive library in Perl to calculate frequency distributions and coming up against a floating point rounding error problem.
I pass two values, 0.205 and 0.205 (derived from other numbers and formatted with sprintf), to the stats module and ask it to calculate the frequency distribution, but it gets stuck in an infinite loop.
Stepping through with a debugger I can see that it's doing:
my $interval = $self->{sample_range}/$partitions;
my $iter = $self->{min};
while (($iter += $interval) < $self->{max}) {
$bins{$iter} = 0;
push @k, $iter; ##Keep the "keys" unstringified
}
$self->sample_range (the range, max - min) is returning 2.77555756156289e-17 rather than 0 as I'd expect. This means that the loop condition ((min += interval) < max) never becomes false, so the loop is, for all intents and purposes, infinite.
DB<8> print $self->{max};
0.205
DB<9> print $self->{min};
0.205
DB<10> print $self->{max}-$self->{min};
2.77555756156289e-17
So this looks like a rounding problem. I can't think how to fix this on my side though, and I'm not sure editing the library is a good idea. I'm looking for suggestions of a workaround or alternative.
Cheers,
Neil

I am the Statistics::Descriptive maintainer. Due to the module's numeric nature, many rounding problems have been reported. I believe this particular one was fixed in a version I released recently, later than the one you were using, by using multiplication to compute the divisions instead of +=.
Please use the most up-to-date version from the CPAN, and it should be better.
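To illustrate why multiplication helps, here is a minimal sketch in Python (used only for illustration; Perl uses the same IEEE 754 doubles). It is not the module's actual code: the function name bin_edges and its arguments are hypothetical. With min and max one representable step apart, the interval is so small that adding it to min leaves min unchanged, so an accumulating loop never advances, while a counted loop always terminates.

```python
def bin_edges(lo, hi, partitions):
    """Return the upper edge of each bin between lo and hi (hypothetical helper)."""
    interval = (hi - lo) / partitions
    # Multiply instead of accumulating: the loop runs exactly `partitions - 1`
    # times even when `interval` is too small to change `lo` when added to it.
    edges = [lo + interval * i for i in range(1, partitions)]
    edges.append(hi)  # pin the last edge to the true maximum
    return edges

lo = 0.205
hi = lo + 2.0 ** -55          # the next representable double above 0.205
interval = (hi - lo) / 4

print(lo + interval == lo)    # True: accumulating with += would never advance
edges = bin_edges(lo, hi, 4)
print(len(edges))             # 4
```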

Not exactly a rounding problem; you can see the more precise values with something like
printf("%.18g %.18g", $self->{max}, $self->{min});
Looks to me like there's a flaw in the module where it assumes the sample range can be divided up into $partitions pieces; because floating point doesn't have infinite precision, this isn't always possible. In your case, the min and max values are exactly adjacent representable values, so there can't be more than one partition. I don't know what exactly the module is using the partitions for, so I'm not sure what the impact of this may be.
Another possible problem in the module is that it uses numbers as hash keys, which implicitly stringifies them and slightly rounds the value.
You may have some success in laundering your data through stringification before feeding it to the module:
$data = 0+"$data";
This will at least ensure that two numbers that (with the default printing precision) appear equal are actually equal.
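The same laundering idea can be sketched in Python (for illustration only). The helper name launder is hypothetical, and the 15 significant digits are an assumption chosen to mirror Perl's default printing precision.

```python
def launder(x, digits=15):
    """Round-trip x through its printed form with `digits` significant digits."""
    return float(f"{x:.{digits}g}")

a = 0.205
b = a + 2.0 ** -55   # adjacent double; also prints as 0.205 at 15 digits

print(a == b)                      # False
print(launder(a) == launder(b))    # True
```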

That shouldn't cause an infinite loop. What would cause that loop to be infinite would be if $self->{sample_range}/$partitions is 0.

Related

I've been trying to apply the polyRoots function with "TI-nspire cx" but it keeps returning an empty value, what is happening?

I've been able to use the polyRoots function since day one, but all of a sudden it stopped working and keeps returning an empty result. Does anyone know what is happening? Thank you in advance! This is the return I get; I've used a very simple input so that I know it's not an input problem.
The function polyRoots only returns the real-valued zeros of a polynomial. Instead, you can use cPolyRoots to include complex-valued zeros, which will -- in your case -- look something like this:
cPolyRoots(3*x^2+4*x+5,x)
{-2/3-sqrt(11)/3*i,-2/3+sqrt(11)/3*i}
In case you're not familiar with complex numbers, here's a Wikipedia article regarding them. Basically, i=sqrt(-1), which is a number that isn't present anywhere on the real number line; the real number line only includes negative infinity through positive infinity.
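As a cross-check of the arithmetic, here is a small Python sketch (not TI-Nspire code; the function name quadratic_roots is hypothetical). The discriminant of 3*x^2+4*x+5 is 16 - 60 = -44, which is negative, so there are no real roots to return.

```python
import cmath

def quadratic_roots(a, b, c):
    """Return both roots of a*x^2 + b*x + c as complex numbers."""
    disc = b * b - 4 * a * c
    root = cmath.sqrt(disc)   # works for negative discriminants too
    return ((-b - root) / (2 * a), (-b + root) / (2 * a))

roots = quadratic_roots(3, 4, 5)
real_roots = [r.real for r in roots if r.imag == 0]

print(real_roots)   # [] : nothing for a real-only solver to return
print(roots)        # the two complex conjugate roots
```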

Matlab not taking last number in array?

I've got a Subscripted assignment dimension mismatch problem.
I've already localized the issue, and know exactly what is going on, I just don't know why.
Here's the problematic piece of code:
mFrames(:,i) = vSignal(round(start:1:frameLength*samplingRate));
start=start+frameShift*samplingRate;
frameLength = frameLength+frameShift;
I've already checked what's going on in debug mode: usually the resulting column length of mFrames is 128, and this stays the same until i=1004. Then the column length changes to 127.
I've checked the values involved while in debug mode, and it simply does not make sense what is going on. At i==1004 start=32097 and frameLength*samplingRate=32224.
That's a difference of 127 meaning 128 points, that should work.
BUT when I assign a vector A=round(start:1:frameLength*samplingRate)
OR B=start:1:frameLength*samplingRate
In both cases I get a vector going from 32097 to 32223, even though MATLAB displays 32224 when I evaluate frameLength*samplingRate.
In other words, MATLAB is telling me it's using one number, but when I test I find it's using a different one.
Any help appreciated.
I suspect your 32224 is not actually 32224. MATLAB's default format only displays so many decimal places, so when dealing with floating point numbers, what is printed on screen is not necessarily the "exact" value.
Let's go back a step and look at how the syntax x = start:step:end works.
1:1:10 should give us numbers in steps of 1 from 1 to 10. Fair enough, that makes sense. What if we set the end value to something that's slightly above 10?
e.g.:
1:1:10.1
Well, it still gives us 1:1:10, (or 1:10, 1 being the default step) because we can't have values higher than the end-point, so 11 isn't a correct step.
So what about this:
1:1:9.99
Spoiler: it's the same as 1:9
And this?
1:1:9.9999999
Yep, still 1:9
But if we do this:
a = 9.9999999;
Then with default format, the value of a will be shown on the command line and in your list of workspace variables as 10.0000.
Now, if frameLength and samplingRate are both stored as floating point numbers, it's possible that the number you see as 32224 is not 32224 but very slightly below that. You can check this by changing your default format - e.g. format long at the command line - to show more decimal places.
The simplest solution is probably to do something like:
B=start:1:round(frameLength*samplingRate)
Or try to store the relevant values as integers (e.g., uint32).
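The element count can be sketched with a simplified model in Python (for illustration only). The function colon_count is hypothetical, MATLAB's real colon operator may handle roundoff in additional ways, and the 1e-9 offset is an assumed stand-in for whatever small error the original computation carried.

```python
import math

def colon_count(start, step, stop):
    """Number of elements in a simplified model of start:step:stop."""
    return math.floor((stop - start) / step) + 1

start = 32097
stop = 32224 - 1e-9   # looks like 32224 at low display precision, but is slightly less

print(colon_count(start, 1, stop))          # 127
print(colon_count(start, 1, round(stop)))   # 128
```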

Logical indexing and double precision numbers

I am trying to solve a non-linear system of equations using the Newton-Raphson iterative method, and in order to explore the parameter space of my variables, it is useful to store the previous solutions and use them as my first initial guess so that I stay in the basin of attraction.
I currently save my solutions in a structure array that I store in a .mat file, in about this way:
load('solutions.mat','sol');
str = struct('a',Param1,'b',Param2,'solution',SolutionVector);
sol=[sol;str];
save('solutions.mat','sol');
Now, I do another run, in which I need the above solution for different parameters NewParam1 and NewParam2. If Param1 = NewParam1-deltaParam1, and Param2 = NewParam2 - deltaParam2, then
load('solutions.mat','sol');
index = [sol.a]== NewParam1 - deltaParam1 & [sol.b]== NewParam2 - deltaParam2;
% logical index to find solution from first block
SolutionVector = sol(index).solution;
I sometimes get an error message saying that no such solution exists. The problem lies in the double precisions of my parameters, since 2-1 ~= 1 can happen in Matlab, but I can't seem to find an alternative way to achieve the same result. I have tried changing the numerical parameters to strings in the saving process, but then I ran into problems with logical indexing with strings.
Ideally, I would like to avoid multiplying my parameters by a power of 10 to make them integers as this will make the code quite messy to understand due to the number of parameters. Other than that, any help will be greatly appreciated. Thanks!
You should never use == when comparing double precision numbers in MATLAB. The reason is, as you state in the question, that some numbers can't be represented precisely in binary, the same way 1/3 can't be written precisely in decimal.
What you should do is something like this:
index = abs([sol.a] - (NewParam1 - deltaParam1)) < 1e-10 & ...
abs([sol.b] - (NewParam2 - deltaParam2)) < 1e-10;
I actually recommend not using eps, as it's so small that it might actually fail in some situations. You can, however, use a smaller number than 1e-10 if you need a very high level of accuracy (but how often do we work with numbers less than 1e-10?).
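The tolerance-based lookup can be sketched in Python (for illustration; the same comparison applies to MATLAB doubles). The function find_solution, the dictionary layout, and the sample values are hypothetical.

```python
def find_solution(solutions, a, b, tol=1e-10):
    """Return the stored solution whose parameters match a and b within tol."""
    for entry in solutions:
        if abs(entry["a"] - a) < tol and abs(entry["b"] - b) < tol:
            return entry["solution"]
    return None   # no stored solution is close enough

solutions = [{"a": 0.2, "b": 1.0, "solution": [1.0, 2.0]}]
target_a = 0.3 - 0.1   # not exactly 0.2 in double precision
target_b = 2.0 - 1.0

print(target_a == 0.2)                               # False: exact comparison misses
print(find_solution(solutions, target_a, target_b))  # [1.0, 2.0]
```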

Matlab not accepting whole number as index

I am using a while loop with an index t starting from 1 and increasing with each loop.
I'm having problems with this index in the following bit of code within the loop:
dt = 100000^(-1);
t = 1;
equi = false;
while equi==false
    % ... some code that populates the arrays S(t) and I(t) ...
    t=t+1;
    if (t>2/dt)
        n = [S(t) I(t)];
        np = [S(t-1/dt) I(t-1/dt)];
        if sum((n-np).^2)<1e-5
            equi=true;
        end
    end
end
First, the code in the "if" statement is accessed at t==200000 instead of at t==200001.
Second, the expression S(t-1/dt) results in the error message "Subscript indices must either be real positive integers or logicals", even though (t-1/dt) is whole and equals 1.0000e+005 .
I guess I can solve this using "round", but this worked before and suddenly doesn't work and I'd like to figure out why.
Thanks!
the expression S(t-1/dt) results in the error message "Subscript indices must either be real positive integers or logicals", even though (t-1/dt) is whole and equals 1.0000e+005
Is it really? ;)
mod(200000 - 1/dt, 1)
%ans = 1.455191522836685e-11
Your index is not an integer. This is one of the things to be aware of when working with floating point arithmetic. I suggest reading this excellent resource: "What every computer scientist should know about floating-point Arithmetic".
You can either use round as you did, or store 1/dt as a separate variable (many options exist).
Matlab is lying to you. You're running into floating point inaccuracies and Matlab does not have an honest printing policy. Try printing the numbers with full precision:
dt = 100000^(-1);
t = 200000;
fprintf('2/dt == %.12f\n',2/dt) % 199999.999999999971
fprintf('t - 1/dt == %.12f\n',t - 1/dt) % 100000.000000000015
While powers of 10 are very nice for us to type and read, 1e-5 (your dt) cannot be represented exactly as a floating point number. That's why your resulting calculations aren't coming out as even integers.
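The stored value can be inspected exactly in Python (for illustration; MATLAB doubles follow the same IEEE 754 format). This sketch assumes dt holds the double nearest to 1e-5 and shows why the test t>2/dt already succeeds at t==200000.

```python
from decimal import Decimal

dt = 1e-5

# Decimal(float) shows the exact binary value that is actually stored.
print(Decimal(dt))
print(Decimal(dt) == Decimal("0.00001"))   # False: dt is not exactly 1e-5

print(1 / dt == 100000)    # False: the quotient falls just below 100000
print(200000 > 2 / dt)     # True: so the branch is entered one step early
```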
The statement
S(t-1/dt)
can be replaced by
S(uint32(t-1/dt))
And similarly for I.
Also you might want to save 1/dt hardcoded as 100000 as suggested above.
I reckon this will improve the comparison.
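A minimal sketch of the integer bookkeeping idea, in Python for illustration. The name steps_per_unit is hypothetical; the point is to keep the offset as an integer rather than recomputing it as 1/dt.

```python
steps_per_unit = 100000            # integer, stored instead of computing 1/dt
dt = 1.0 / steps_per_unit

t = 200000
float_index = t - 1 / dt           # carries a tiny fractional part
int_index = t - steps_per_unit     # exact integer arithmetic

print(float_index.is_integer())    # False: not usable as an index
print(int_index)                   # 100000
```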

Turn off "smart behavior" in Matlab

There is one thing I do not like about Matlab: it sometimes tries to be too smart. For instance, if I take the square root of a negative number, as in
a = -1; sqrt(a)
Matlab does not throw an error but silently switches to complex numbers. The same happens for logarithms of negative numbers. This can lead to hard-to-find errors in a more complicated algorithm.
A similar problem is that Matlab silently "solves" non-square linear systems, as in the following example:
A=eye(3,2); b=ones(3,1); x = A \ b
Obviously x does not satisfy A*x==b (it solves a least-squares problem instead).
Is there any way to turn these "features" off, or at least make Matlab print a warning message in these cases? That would really help in many situations.
I don't think there is anything like "being smart" in your examples. The square root of a negative number is complex. Similarly, the left-division operator is defined in Matlab to return a least-squares solution for non-square inputs.
If you have an application that should not return complex numbers (beware of floating point errors!), then you can use isreal to test for that. If you do not want the left-division operator to return a least-squares solution, test whether A is square.
Alternatively, if for some reason you are really unable to do input validation, you can overload both sqrt and \ so that they only accept non-negative numbers and square matrices, respectively.
You need to understand all of the implications of what you're writing and make sure that you use the right functions if you're going to guarantee good code. For example:
For the first case, use realsqrt instead
For the second case, use inv(A) * b instead
Or alternatively, include the appropriate checks before/after you call the built-in functions. If you need to do this every time, then you can always write your own functions.
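The "write your own functions" suggestion could look roughly like this, sketched in Python for illustration rather than Matlab. Both helper names, real_sqrt and require_square, are hypothetical.

```python
import math

def real_sqrt(x):
    """Square root that refuses negative input instead of going complex."""
    if x < 0:
        raise ValueError(f"real_sqrt: negative input {x}")
    return math.sqrt(x)

def require_square(matrix):
    """Return matrix unchanged, or raise if it is not square (list of rows)."""
    rows = len(matrix)
    if any(len(row) != rows for row in matrix):
        raise ValueError("expected a square matrix")
    return matrix

print(real_sqrt(9.0))                     # 3.0
print(require_square([[1, 0], [0, 1]]))   # [[1, 0], [0, 1]]
```

Calling require_square on a 3-by-2 matrix, or real_sqrt on -1, raises an error instead of silently returning a different kind of answer.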