How to solve floating point errors in MATLAB

I have a question that I somehow, surprisingly, cannot find an answer to. The issue is about floating point errors. It is not a "why does a==b give 0" question, but rather whether there is a way to fix floating point errors. The part of the code I am working on right now is sensitive and I need to find a way to deal with these errors. I have tried with
round(100000*myDouble)/100000;
and with
double(int64(100000*myDouble))/100000
but the output still has floating point errors for some numbers (not all of them, but a few are enough to mess up my code). The problem is that the function I use in MATLAB is a polygon clipper, which I use to calculate the union of many polygons. The function looks for common points, and even a small difference messes everything up. This should really not be a problem anyway, since it is a union and partly overlapping polygons should not cause trouble. However, due to some issues with the function, I need to make sure there are no such overlaps. The function works fine in most cases, but to speed things up I have added a vector of NaN-separated polygons with holes, and since the function is not expected to handle cases like this, there are sometimes problems.
There is no point in using int64 for the calculation instead, since the function ends up in a MEX file and thus cannot be made to work with int64.

There is no single answer that solves all floating point precision issues. Some strategies are:
Use vpa (or symbolic variables).
Make the errors deterministic. If the same formula is used to calculate the same number twice, the two results should be equal.
Don't round using decimal factors. Use a power of two instead; in this case, try 2^16 instead of 100000. This way you round the fraction and keep the exponent (see the sketch below).
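For the last point, a minimal MATLAB sketch (reusing myDouble from the question) could look like this:
scale = 2^16;                                % power-of-two grid instead of 100000
snapped = round(myDouble * scale) / scale;   % snap to multiples of 2^-16
Because scale is a power of two, multiplying and dividing by it are exact operations on a double (barring overflow or underflow), so the only rounding that happens is the round() itself, unlike dividing by 100000.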

Related

How to turn off denormal number support in MATLAB?

I am trying to turn off denormal number support in MATLAB, so that basically any computation that would result in a denormal number would instead just result in zero (DAZ, FTZ).
I've researched several sites, including the one below, but I haven't found anything about doing this.
http://blogs.mathworks.com/cleve/2014/07/21/floating-point-denormals-insignificant-but-controversial-2/
I've never heard of such an option in MATLAB. It would likely require deep manipulation of a lot of the floating-point math, effectively requiring a new datatype to be supported, if this were to be an easily toggleable option in MATLAB. You could write your own MEX C code to do this (more here and here) for an individual function.
And of course you can get something like this with one line of Matlab – here's an example:
a = [1e-300 1e-310 1e-310];
b = [1e-301 1e-311 1e-310];
x = a-b;
x(abs(x(:)) < realmin(class(x))) = 0;
where realmin is the smallest normalized floating-point number. However, the floating point math is still performed using the extended denormal/subnormal values in a. It's just the output that's clipped to zero.
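If you want to apply that clipping after every operation, one way (a sketch only; flushSubnormals is a made-up name) is to wrap it in a small helper:
flushSubnormals = @(x) x .* (abs(x) >= realmin(class(x)));   % zero out anything below the smallest normal number
y = flushSubnormals(a - b);                                  % same effect as the explicit indexing above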
Unless you're doing this for fun and experimentation, or possibly running code on an embedded platform, I'd really recommend against disabling denormals as a form of optimization. Instead, focus on why your values are so small and how you might rescale your problem to avoid the issue entirely.

Getting around floating point error with logarithms?

I'm trying to write a basic digit counter (an integer is inputted and the number of digits of that integer is outputted) for positive integers. This is my general formula:
dig(x) := Math.floor(Math.log(x,10))
I tried implementing the equivalent of dig(x) in Ruby, and found that when I was computing dig(1000) I was getting 2 instead of 3 because Math.log was returning 2.9999999999999996 which would then be truncated down to 2. What is the proper way to handle this problem? (I'm assuming this problem can occur regardless of the language used to implement this approach, but if that's not the case then please explain that in your answer).
To get an exact count of the number of digits in an integer, you can do the usual thing: (in C/C++, assuming n is non-negative)
int digits = 0;
while (n > 0) {
    n = n / 10; // integer division, just drops the ones digit and shifts right
    digits = digits + 1;
}
I'm not certain but I suspect running a built-in logarithm function won't be faster than this, and this will give you an exact answer.
I thought about it for a minute and couldn't come up with a way to make the logarithm-based approach work with any guarantees, and almost convinced myself that it is probably a doomed pursuit in the first place because of floating point rounding errors, etc.
Following The Art of Computer Programming, Volume 2, we can eliminate one bit of error before the floor function is applied by adding that bit back in.
Let x be the result of log and then do x += x / 0x10000000 for a single precision floating point number (C's float). Then pass the value into floor.
This is guaranteed to be the fastest (assuming you have the answer in numerical form) because it uses only a few floating point instructions.
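Translated into MATLAB terms for the dig(1000) example above, a rough sketch of this nudge looks like the following (the 2^28 divisor just mirrors the 0x10000000 constant quoted above and is not a tuned value; the idea inherits whatever limits the other answers point out):
x = log(1000) / log(10);   % can come back as 2.9999999999999996 rather than 3
x = x + x / 2^28;          % add back roughly one part in 2^28 before truncating
d = floor(x)               % now 3, as dig() intends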
Floating point is always subject to roundoff error; that's one of the hazards you need to be aware of, and actively manage, when working with it. The proper way to handle it, if you must use floats, is to figure out what the expected amount of accumulated error is and allow for that in comparisons and printouts -- round off appropriately, compare for whether the difference is within that range rather than comparing for equality, et cetera.
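In MATLAB-flavoured pseudocode, such a tolerance comparison might look like this (the tolerance value is a placeholder you would derive from your own error analysis, not a universal constant):
tol = 1e-9;                                             % placeholder; choose from your error budget
closeEnough = abs(a - b) <= tol * max(abs(a), abs(b));  % relative comparison instead of a == b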
There is no exact binary-floating-point representation of simple things like 1/10th, for example.
(As others have noted, you could rewrite the problem to avoid the floating-point-based solution entirely, but since you asked specifically about making log() work I wanted to address that question; apologies if I'm off target. Some of the other answers provide specific suggestions for how you might round off the result. That would "solve" this particular case, but as your floating-point operations get more complicated you'll have to continue to allow for roundoff accumulating at each step, and either deal with the error at each step or deal with the cumulative error -- the latter being the more complicated but more accurate solution.)
If this is a serious problem for an application, folks sometimes use scaled fixed point instead (running financial computations in terms of pennies rather than dollars, for example). Or they use one of the "big number" packages which computes in decimal rather than in binary; those have their own round-off problems, but they round off more the way humans expect them to.
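As a sketch of the scaled fixed point idea (the price and the 8.25% tax rate are made-up numbers): store money as whole cents in an integer type and only convert to a display format at the edges.
priceCents = int64(1999);                                             % $19.99 held as 1999 cents
taxCents = idivide(priceCents * int64(825), int64(10000), 'round');   % 8.25% tax, rounded to the nearest cent
totalCents = priceCents + taxCents;                                   % all arithmetic stays exact in integers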

Numerical Integral of large numbers in Fortran 90

So I have the following integral that I need to do numerically:
Int[ Exp( 0.5*(a*cos(x) + b*sin(x) + c*cos(2*x) + d*sin(2*x)) ) ], x = 0 .. 2*Pi
The problem is that the output at any given value of x can be extremely large, e.g. e^2000, so larger than I can deal with in double precision.
I haven't had much luck googling the following: how do you deal with large numbers in Fortran? Not high precision; I don't care about knowing it beyond double precision, and at the end I'll just be taking the log. I just need to be able to handle the large numbers until I can take the log.
Are there integration packages that can handle arbitrarily large numbers? Mathematica clearly can, so there must be something like this out there.
Cheers
This is probably an extended comment rather than an answer but here goes anyway ...
As you've already observed, Fortran isn't equipped, out of the box, with the facility for handling numbers as large as e^2000. I think you have 3 options.
Use mathematics to reduce your problem to one which does (or a number of related ones which do) fall within the numerical range that your Fortran compiler can compute (a rough sketch of this approach follows this answer).
Use Mathematica or one of the other computer algebra systems (e.g. Maple, SAGE, Maxima). All (I think) of these can be integrated into a Fortran program, with varying degrees of difficulty.
Use a library for high-precision (often also called arbitrary-precision or multiple-precision) arithmetic. Your favourite search engine will turn up a number of these for you, some written in Fortran (and therefore easy to integrate), some written in C/C++ or other languages (and therefore slightly harder to integrate). You might start your search at Lawrence Berkeley or the GNU bignum library.
(Yes I know that I wrote that you have 3 options, but your question suggests that you aren't ready to consider this yet) You could write your own high-/arbitrary-/multiple-precision functions. Fortran provides everything you need to construct such a library, there is a lot of work already done in the field to learn from, and it might be something of interest to you.
In practice it generally makes sense to apply as much mathematics as possible to a problem before resorting to a computer; that process can not only assist in solving the problem but also guide your selection or construction of a program to solve what's left of it.
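As a rough illustration of option 1 (sketched in MATLAB rather than Fortran, with made-up coefficients a, b, c, d): factor the largest value of the exponent out of the integrand and add it back after taking the log, so every number the quadrature routine actually sees stays comfortably in range.
a = 1000; b = 800; c = 600; d = 400;           % placeholder coefficients; exp() of the raw exponent would overflow a double
f = @(x) 0.5*(a*cos(x) + b*sin(x) + c*cos(2*x) + d*sin(2*x));   % the exponent only
xs = linspace(0, 2*pi, 10001);
M = max(f(xs));                                % largest exponent found on a fine grid
I = integral(@(x) exp(f(x) - M), 0, 2*pi);     % shifted integrand stays of order 1
logI = M + log(I);                             % log of the original integral, which is all that is needed at the end
The same shift-by-the-maximum trick carries over directly to Fortran once a quadrature routine for the shifted integrand is in place.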
I agree with High Performance Mark that the best option here numerically is to use analytics to scale or simplify the result first.
I will mention that if you do want to brute force it, gfortran (as of 4.6, with the libquadmath library) has support for quadruple precision reals, which you can use by selecting the appropriate kind. As long as your answers (and the intermediate results!) don't get too much bigger than what you're describing, that may work, but it will generally be much slower than double precision.
This requires looking deeper at the problem you are trying to solve and the behavior of the underlying mathematics. To add to the good advice already provided by Mark and Jonathan, consider expanding the exponential and trig functions into Taylor series and truncating to the desired level of precision.
Also, take a step back and ask what you are trying to accomplish by calculating this value. As an example, I recently had to debug why I was getting outlandish results from a property correlation which was calculating the vapor pressure of a fluid to see if condensation was occurring. I spent a long time trying to understand what was wrong with the temperature being fed into the correlation until I realized the case causing the error was a simulation of vapor detonation. The problem was not in the numerics but in the logic of checking for condensation during a literal explosion; physically, a condensation check made no sense. The real problem was that the code was asking an unnecessary question; it already had the answer.
I highly recommend Forman Acton's Numerical Methods That (Usually) Work and Real Computing Made Real. Both focus on problems like this and suggest techniques to tame ill-mannered computations.

max likelihood fminsearch

I used MATLAB's fminsearch for a negative max likelihood model for a binomially distributed function. I don't get any error notice, but the parameters which I want to estimate always take the start value. Apparently, there is a mistake. I know that I am asking a totally general question, but is it possible that anybody has had the same mistake and knows how to deal with it?
Thanks a lot,
@woodchips, thank you a lot. Step by step, I've tried to do what you advised me. First of all, I actually maximized (-log(likelihood)) and this is not the problem. I think I found the problem, but I still have some questions, if I'm not bothering you. I have a model(param) to maximize at paramstart=p1. This model is built for (-log(likelihood(F))) and my F is a vectorized function like F(t,Z,X,T,param,m2,m3,k,l). I have data like (tdata,kdata,ldata), X and T are grids, and Z is a function on this grid; (m1,m2,m3) are given parameters. When I want to see the value of F(tdata,Z,X,T,m1,m2,m3,kdata,ldata), I get a good output. But I think fminsearch treats F(tdata,Z,X,T,p,m2,m3,kdata,ldata) like a constant, and that's why I always get the start value as the estimated parameter. I would be happy if you have any advice on how to fix that.
You have some options you can try to tweak. I'd start with the algorithm.
When the function value practically doesn't change around your start point, that's also problematic. Maybe switching to the log-likelihood helps.
I always use fminunc or fmincon. They also allow providing the Hessian (typically better than "estimated") or 'typical values' so the algorithm doesn't spend time in infeasible regions.
It is virtually always true that you should NEVER maximize a likelihood function, but ALWAYS maximize the log of that function. Floating point issues will almost always corrupt the problem otherwise. That your optimization starts and stops at the same point is a good indicator this is the problem.
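For a binomial model like the one in the question, a minimal sketch of minimizing the negative log-likelihood with fminsearch might look like this (the data values and the logit re-parameterization are placeholders of mine, not the original poster's code):
k = [3 5 2 7];  n = [10 10 10 10];            % made-up successes and trials
sig = @(t) 1 ./ (1 + exp(-t));                % logit transform keeps the probability inside (0,1)
negLogLik = @(t) -sum( k .* log(sig(t)) + (n - k) .* log(1 - sig(t)) );   % additive constants dropped
tHat = fminsearch(negLogLik, 0);              % unconstrained search over the transformed parameter
pHat = sig(tHat)                              % back-transform to a probability, about 0.425 here
Working with -log(likelihood) is equivalent to maximizing the likelihood itself, but it avoids the underflow that makes the raw product of probabilities look flat to the optimizer.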
You may well need to dig a little deeper than the above, but even so, this next test is the test I recommend that all users of optimization tools do for every one of their problems, BEFORE they throw a function into an optimizer. Evaluate your objective for several points in the vicinity. Does it yield significantly different values? If not, then look to see why not. Are you creating a non-smooth objective to optimize, or a zero objective? I.e., zero to within the supplied tolerances?
If it does yield different values but still not converge, then make sure you know how to call the optimizer correctly. Yeah, right, like nobody has ever made this mistake before. This is actually a very common cause of failure of optimizers.
If it does yield good values that vary, and you ARE calling the optimizer correctly, then think if there are regions into which the optimizer is trying to diverge that yield garbage results. Is the objective generating complex or imaginary results?

What values to use in my 3D-space

This is not really a functional problem I'm having but more of a strategic question. I am new to 3D programming, and when looking at tutorials and examples I reckon that the coordinates are usually between -1 and 1.
It feels more natural to use integers as coordinates, I think. Is there any particular reason why small float values are used, perhaps performance or something else?
I haven't gotten that far yet, so perhaps this question is a bit too early to ask, but when creating objects/textures that I will import, they are created in applications where the coordinates usually have integer sizes, I guess (e.g. Photoshop for textures). Doesn't this matter for how I define my x/y/z sizes?
Thanks in advance!
I've never seen such small ranges used. This is likely to introduce problems in calculations, I would say.
A more common style is to use a real-world scale, so 1 unit = 1 metre. And using floating-point values is more realistic: you need fractional values, because when you rotate something the new coordinates will nearly always be non-integral. Using integers you'll run into problems of scale and precision.
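A tiny MATLAB-style illustration of why rotation forces fractional coordinates (the 30-degree angle is arbitrary):
theta = pi/6;                                            % 30 degrees
R = [cos(theta) -sin(theta); sin(theta) cos(theta)];     % 2-D rotation matrix
p = R * [1; 0]                                           % roughly [0.8660; 0.5000]; no longer integer coordinates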