Does Fortran have inherent limitations on numerical accuracy compared to other languages?

While working on a simple programming exercise, I produced a while loop (DO loop in Fortran) that was meant to exit when a real variable had reached a precise value.
I noticed that due to the precision being used, the equality was never met and the loop became infinite. This is, of course, not unheard of, and one is advised that, rather than comparing two numbers for equality, it is best to see if the absolute difference between them is less than a set threshold.
What I found disappointing was how low I had to set this threshold, even with variables at double precision, for my loop to exit properly. Furthermore, when I rewrote a "distilled" version of this loop in Perl, I had no problems with numerical accuracy and the loop exited fine.
Since the code to produce the problem is so small, in both Perl and Fortran, I'd like to reproduce it here in case I am glossing over an important detail:
Fortran Code
PROGRAM precision_test
IMPLICIT NONE
! Data Dictionary
INTEGER :: count = 0 ! Number of times the loop has iterated
REAL(KIND=8) :: velocity
REAL(KIND=8), PARAMETER :: MACH_2_METERS_PER_SEC = 340.0
velocity = 0.5 * MACH_2_METERS_PER_SEC ! Initial Velocity
DO
WRITE (*, 300) velocity
300 FORMAT (F20.8)
IF (count == 50) EXIT
IF (velocity == 5.0 * MACH_2_METERS_PER_SEC) EXIT
! IF (abs(velocity - (5.0 * MACH_2_METERS_PER_SEC)) < 1E-4) EXIT
velocity = velocity + 0.1 * MACH_2_METERS_PER_SEC
count = count + 1
END DO
END PROGRAM precision_test
Perl Code
#! /usr/bin/perl -w
use strict;
my $mach_2_meters_per_sec = 340.0;
my $velocity = 0.5 * $mach_2_meters_per_sec;
while (1) {
printf "%20.8f\n", $velocity;
exit if ($velocity == 5.0 * $mach_2_meters_per_sec);
$velocity = $velocity + 0.1 * $mach_2_meters_per_sec;
}
The commented-out line in Fortran is what I would need to use for the loop to exit normally. Notice that the threshold is set to 1E-4, which I feel is quite pathetic.
The names of the variables come from the self-study-based programming exercise I was performing and don't have any relevance.
The intent is that the loop stops when the velocity variable reaches 1700.
Here are the truncated outputs:
Perl Output
170.00000000
204.00000000
238.00000000
272.00000000
306.00000000
340.00000000
...
1564.00000000
1598.00000000
1632.00000000
1666.00000000
1700.00000000
Fortran Output
170.00000000
204.00000051
238.00000101
272.00000152
306.00000203
340.00000253
...
1564.00002077
1598.00002128
1632.00002179
1666.00002229
1700.00002280
What good is Fortran's speed and ease of parallelization if its accuracy stinks? Reminds me of the three ways to do things:
The Right Way
The Wrong Way
The Max Power Way
"Isn't that just the wrong way?"
"Yeah! But faster!"
All kidding aside, I must be doing something wrong.
Does Fortran have inherent limitations on numerical accuracy compared to other languages, or am I (quite likely) the one at fault?
My compiler is gfortran (gcc version 4.1.2), Perl v5.12.1, on a Dual Core AMD Opteron @ 1 GHz.

Your assignment is accidentally converting the value to single precision and then back to double: the literal 0.1 is a default (single-precision) real constant.
Try making your 0.1 * be 0.1D0 * and you should see your problem fixed.
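For anyone who wants to see the size of that error outside Fortran: numpy's float32/float64 types reproduce the same arithmetic. This is only an illustrative sketch of the effect, not your program:

import numpy as np

mach = np.float64(340.0)
step_single = np.float64(np.float32(0.1)) * mach  # like the default-real literal 0.1
step_double = np.float64(0.1) * mach              # like the double-precision 0.1D0
print(step_single)  # ~34.00000051 -- carries the single-precision error
print(step_double)  # 34.0 -- accurate to double precision

The ~5.1e-7 error in the step is exactly the per-iteration drift visible in your Fortran output.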

As already answered, "plain" floating-point constants in Fortran default to the default real type, which is typically single precision. This is an almost classic mistake.
Also, using "kind=8" is not portable: it gives you double precision with gfortran, but not with some other compilers. The safe, portable way to specify precision in Fortran >= 90 is to use the intrinsic functions (selected_real_kind, selected_int_kind) to request the precision that you need, and then to specify that kind on every constant where precision matters. A convenient method is to define your own kind parameter. For example:
integer, parameter :: DR_K = selected_real_kind(14)
real(DR_K), parameter :: MACH_2_METERS_PER_SEC = 340.0_DR_K
real(DR_K) :: mass, velocity, energy
energy = 0.5_DR_K * mass * velocity**2
This can also be important for integers, e.g., if large values are needed. For the analogous integer questions, see "Fortran: integer*4 vs integer(4) vs integer(kind=4)" and "Long ints in Fortran".

Related

Matrix Multiplication Simplification not giving same results in code

I want to evaluate the expression x^T A x - 2 y^T A x + y^T A y in Matlab.
If my calculations are correct, this should be equivalent to evaluating (x-y)^T A (x-y).
My main problem right now is that I am not getting the same results from the above two expressions. I'm wondering if it's a precision error or a conceptual error.
I tried a very simple example with some random values to check my math was right. The mini example I used is as follows
A = [1,2,0;2,1,2;0,2,1];
x = [1;2;3];
y = [4;4;4];
(transpose(x)*A*x)-(2*transpose(y)*A*x)+(transpose(y)*A*(y))
transpose(x-y)*A*(x-y)
Both give 46.
I then tried a more realistic example of values for what I'm doing and it failed. For example
A = [2.66666666666667,-0.333333333333333,0,-0.333333333333333,-0.333333333333333,0,0,0,0;
-0.333333333333333,2.66666666666667,-0.333333333333333,-0.333333333333333,-0.333333333333333,-0.333333333333333,0,0,0;
0,-0.333333333333333,2.66666666666667,0,-0.333333333333333,-0.333333333333333,0,0,0;
-0.333333333333333,-0.333333333333333,0,2.66666666666667,-0.333333333333333,0,-0.333333333333333,-0.333333333333333,0;
-0.333333333333333,-0.333333333333333,-0.333333333333333,-0.333333333333333,2.66666666666667,-0.333333333333333,-0.333333333333333,-0.333333333333333,-0.333333333333333;
0,-0.333333333333333,-0.333333333333333,0,-0.333333333333333,2.66666666666667,0,-0.333333333333333,-0.333333333333333;
0,0,0,-0.333333333333333,-0.333333333333333,0,2.66666666666667,-0.333333333333333,0;
0,0,0,-0.333333333333333,-0.333333333333333,-0.333333333333333,-0.333333333333333,2.66666666666667,-0.333333333333333;
0,0,0,0,-0.333333333333333,-0.333333333333333,0,-0.333333333333333,2.66666666666667];
x =[1.21585420370805;
1.00388159497757e-16;
-0.405284734569351;
1.36809776609658e-16;
-1.04796659533634e-17;
-7.52459042423650e-17;
-0.607927101854027;
-8.49163704356314e-17;
0.303963550927013];
v =[0.0319067068305797,0.00786616506360615,0.0709811622828447,0.0719615328671117;
1.26150800194897e-17,5.77584497721720e-18,7.89740111567879e-18,7.14396333930938e-18;
-0.158358815125228,-0.876275686098803,0.0539216716399594,0.0450616819309899;
7.90937837037453e-18,3.24196519177793e-18,3.99402664932776e-18,4.17486202509670e-18;
5.35533279761622e-18,-8.91723780863019e-19,-1.56128480212603e-18,1.84423677629470e-19;
-2.18478057810330e-17,-6.63779320738873e-18,-3.21099714760257e-18,-3.93612240449303e-18;
-0.0213963226092681,-0.0168256143048771,-0.0175695110350900,-0.0128155908603601;
-4.06029341772399e-18,-5.65705978843172e-18,-1.80182480882056e-18,-1.59281757789645e-18;
0.221482525259710,-0.0576644539359728,0.0163934384910353,0.0197432976432437];
u = [1.37058079022968;
1.79302486419321;
69.4330887239959;
-52.3949662214410];
y = v*u;
Gives 0 for the first expression and 7.1387e-28 for the second. Is this a precision error thing? Which version is better/ more accurate to use and why? Thank you!
I have not checked your simplification, but floating-point math is in general inexact. Even doing the same operations in a different order can give different results. With the deviations you are seeing, I would believe they could reasonably come from floating-point inexactness, and it is up to you to decide whether they are acceptable for your purpose.
To determine which version is more accurate you would have to compute reference values to compare to, which might prove difficult.
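If you want to convince yourself, here is a small numpy sketch (hypothetical random data, not your matrices) comparing the two forms. Note that the identity x^T A x - 2 y^T A x + y^T A y = (x-y)^T A (x-y) only holds when A is symmetric, which yours is:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((9, 9))
A = (A + A.T) / 2            # symmetric, as in the question
x = rng.standard_normal(9)
y = rng.standard_normal(9)

expanded = x @ A @ x - 2 * (y @ A @ x) + y @ A @ y
factored = (x - y) @ A @ (x - y)
print(expanded, factored)    # agree to machine precision...
print(expanded - factored)   # ...but typically differ in the last few bits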

Arima antipersistence

I’m running RStudio Version 1.1.419 with R-3.4.3 on Windows 10. I am trying to fit an (f)arima model and setting the fractional differencing parameter during the optimization process to be between (-0.5,0.5), i.e. allowing for antipersistence (d < 0), short memory (d = 0) and long memory (d > 0). I have tried multiple functions to accomplish that. I am aware that the default of fracdiff$drange is (0,0.5). Therefore this ...
> result <- fracdiff(MeanPrice, nar = 2, nma = 1, drange = c(-0.5,0.5))
sadly returns this..
Warning: C fracdf() optimization failure
Warning message: unable to compute correlation matrix; maybe change 'h'
Is there a way to fit fracdiff or other models (maybe arfima::arfima()?) with that drange? Your help is very much appreciated.
If you look at the package documentation, it states that the h argument for fracdiff "is used to compute a finite difference approximation to the Hessian, and hence only influences the cov, cor, and std.error computations." However, as they are referring to the Hessian, I would assume that this affects the results of the MLE. There are other functions in that package that may be helpful: fdGPH for estimating the order of fractional differencing based on the Geweke and Porter-Hudak method, and similarly fdSperio.
Take a look at the forecast package. If you estimate the order of fractional differencing using the above mentioned functions, you might be able to use the same method described in the details of the arfima function.
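If it helps to build intuition for what d does: fractional differencing expands (1-B)^d into binomial lag weights, and the sign of d flips their pattern. Here is a minimal Python sketch of those weights (purely illustrative; this is not what fracdiff does internally):

import numpy as np

def frac_diff_weights(d, n):
    # coefficients w_k of (1 - B)^d = sum_k w_k B^k,
    # via the recursion w_0 = 1, w_k = -w_{k-1} * (d - k + 1) / k
    w = np.ones(n)
    for k in range(1, n):
        w[k] = -w[k - 1] * (d - k + 1) / k
    return w

print(frac_diff_weights(0.3, 5))   # long memory, d > 0: negative, slowly decaying lag weights
print(frac_diff_weights(-0.3, 5))  # antipersistence, d < 0: positive lag weights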

SciPy.optimize.least_squares() Objective Function Questions

I am trying to minimize a highly non-linear function by optimizing three unknown parameters a, b, and c0. I'm attempting to replicate some governing equations of a casino roulette ball in Python 3.
Here is the link to the research paper:
http://www.dewtronics.com/tutorials/roulette/documents/Roulette_Physik.pdf
I will be referencing equations (35) and (40) in the paper.
Basically, I take stopwatch lap measurements of the roulette ball spinning on the wheel. For each successive lap, the lap time increases because of losses of momentum to non-conservative friction forces. Then I take these time measurements and fit equation (35) using the Levenberg-Marquardt least squares method of equation (40).
My question is twofold:
(1) I'm using the scipy.optimize.least_squares() method='lm', and I'm not sure how to write the objective function! Right now I have the function written exactly as is in the paper:
def fall_time(k, a, b, c0):
    F = (1 / (a * b)) * (c0 - np.arcsinh(c0) * np.exp(a * k * 2 * np.pi))
    return F

def parameter_estimation_function(x0, tk):
    a = x0[0]
    b = x0[1]
    c0 = x0[2]
    S = 0
    for i, t in enumerate(tk):
        k = i + 1
        S += (t - fall_time(k, a, b, c0))**2
    return [S, 1, 1]

sol = least_squares(parameter_estimation_function, [0.1, 0.8, -0.1], args=([tk1]), method='lm', jac='2-point', max_nfev=2000)
print(sol)
Now, in the documentation examples, I never saw the objective function written the way I have it. In the documentation, the objective function always returns the residual, not the square of the residual. Additionally, they never use the sum! So I'm wondering if the sum and the square are automatically handled under the hood of least_squares()?
(2) Perhaps my second question is a result of my failure to understand how to write the objective function. But anyhow, I'm having trouble getting the algorithm to converge on the minimum. I know this is because the Levenberg-Marquardt algorithm is "greedy" and stops near the closest minimum, but I figured that I would be able to at least converge on about the same result given different initial guesses. With slight alterations in the initial guess, I'm getting parameter results with different signs. Additionally, I've yet to find a combination of initial guesses that allows the algorithm to converge! It always times out before it finds the solution. I've even increased the number of function evaluations to 10,000 to see if it would. To no avail!
Perhaps somebody could shed some light on my mistakes here! I'm still relatively new to python and the scipy library!
Here is some sample data for tk that I've measured myself from the video here: https://www.youtube.com/watch?v=0Zj_9ypBnzg
tk = [0.52,1.28,2.04,3.17,4.53,6.22]
tk1 = [0.51,1.4,2.09,3,4.42,6.17]
tk2 = [0.63,1.35,2.19,3.02,4.57,6.29]
tk3 = [0.63,1.39,2.23,3.28,4.70,6.32]
tk4 = [0.57,1.4,2.1,3.06,4.53,6.17]
Thanks
1) Yes, as you suspected, the sum and the square of the residuals are handled automatically under the hood: your objective function should return the vector of residuals.
2) Hard to say, since I'm not deeply familiar with the problem (e.g., how many local minima exist, what constitutes a 'reasonable' result, etc.). I may investigate more later.
But for kicks I fiddled with some of the values to see what would happen. For example, you can just replace the 1/b constant with a standalone variable b_inv, and this seemed to stabilize the results quite a bit. Here's the code I used to check results. (Note that I rewrote the objective function for brevity. It simply leverages the element-wise operations of numpy arrays, without changing the overall result.)
import numpy as np
from scipy.optimize import least_squares

def fall_time(k, a, b_inv, c0):
    return (b_inv / a) * (c0 - np.arcsinh(c0) * np.exp(a * k * 2 * np.pi))

def parameter_estimation_function(x, tk):
    return np.asarray(tk) - fall_time(k=np.arange(1, len(tk) + 1), a=x[0], b_inv=x[1], c0=x[2])

tk_samples = [
    [0.52, 1.28, 2.04, 3.17, 4.53, 6.22],
    [0.51, 1.4, 2.09, 3, 4.42, 6.17],
    [0.63, 1.35, 2.19, 3.02, 4.57, 6.29],
    [0.63, 1.39, 2.23, 3.28, 4.70, 6.32],
    [0.57, 1.4, 2.1, 3.06, 4.53, 6.17]
]

for i in range(len(tk_samples)):
    sol = least_squares(parameter_estimation_function, [0.1, 1.25, -0.1],
                        args=(tk_samples[i],), method='lm', jac='2-point', max_nfev=2000)
    print(sol.x)
with console output:
[ 0.03621789 0.64201913 -0.12072879]
[ 3.59319972e-02 1.17129458e+01 -6.53358716e-03]
[ 3.55516005e-02 1.48491493e+01 -5.31098257e-03]
[ 3.18068316e-02 1.11828091e+01 -7.75329834e-03]
[ 3.43920725e-02 1.25160378e+01 -6.36307506e-03]
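If you want to confirm point (1) numerically: with the default linear loss, least_squares reports the value 0.5 * sum(residuals**2) as sol.cost, so you can check it against the residual vector your function returns (reusing the objects defined above):

# verify the squared-and-summed objective that least_squares minimized
res = parameter_estimation_function(sol.x, tk_samples[-1])
print(0.5 * np.sum(res**2), sol.cost)  # the two values should match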

Logical indexing and double precision numbers

I am trying to solve a non-linear system of equations using the Newton-Raphson iterative method, and in order to explore the parameter space of my variables, it is useful to store the previous solutions and use them as my first initial guess so that I stay in the basin of attraction.
I currently save my solutions in a structure array that I store in a .mat file, in about this way:
load('solutions.mat','sol');
str = struct('a',Param1,'b',Param2,'solution',SolutionVector);
sol=[sol;str];
save('solutions.mat','sol');
Now, I do another run, in which I need the above solution for different parameters NewParam1 and NewParam2. If Param1 = NewParam1-deltaParam1, and Param2 = NewParam2 - deltaParam2, then
load('solutions.mat','sol');
index = [sol.a]== NewParam1 - deltaParam1 & [sol.b]== NewParam2 - deltaParam2;
% logical index to find solution from first block
SolutionVector = sol(index).solution;
I sometimes get an error message saying that no such solution exists. The problem lies in the double precision of my parameters, since, with floating-point values, something like 2-1 ~= 1 can happen in Matlab, but I can't seem to find an alternative way to achieve the same result. I have tried changing the numerical parameters to strings in the saving process, but then I ran into problems with logical indexing on strings.
Ideally, I would like to avoid multiplying my parameters by a power of 10 to make them integers as this will make the code quite messy to understand due to the number of parameters. Other than that, any help will be greatly appreciated. Thanks!
You should never use == when comparing double precision numbers in MATLAB. The reason is, as you state in the question, that some numbers can't be represented precisely using binary numbers, the same way 1/3 can't be written precisely using decimal numbers.
What you should do is something like this:
index = abs([sol.a] - (NewParam1 - deltaParam1)) < 1e-10 & ...
abs([sol.b] - (NewParam2 - deltaParam2)) < 1e-10;
I actually recommend not using eps, as it's so small that it might actually fail in some situations. You can, however, use a smaller number than 1e-10 if you need a very high level of accuracy (but how often do we work with differences smaller than 1e-10?).
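The same idea carries over to any language; here is a Python sketch for comparison (math.isclose handles the relative scaling for you):

import math

target = 0.1 + 0.2
stored = 0.3
print(stored == target)              # False: both sides carry binary representation error
print(abs(stored - target) < 1e-10)  # True: absolute-tolerance comparison, as above
print(math.isclose(stored, target))  # True: relative tolerance, scales with magnitude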

Matlab not accepting whole number as index

I am using a while loop with an index t starting from 1 and increasing with each loop.
I'm having problems with this index in the following bit of code within the loop:
dt = 100000^(-1);
t = 1;
equi = false;
while equi==false
    ***some code that populates the arrays S(t) and I(t)***
    t=t+1;
    if (t>2/dt)
        n = [S(t) I(t)];
        np = [S(t-1/dt) I(t-1/dt)];
        if sum((n-np).^2)<1e-5
            equi=true;
        end
    end
end
First, the code in the "if" statement is accessed at t==200000 instead of at t==200001.
Second, the expression S(t-1/dt) results in the error message "Subscript indices must either be real positive integers or logicals", even though (t-1/dt) is whole and equals 1.0000e+005 .
I guess I can solve this using "round", but this worked before and suddenly doesn't work and I'd like to figure out why.
Thanks!
the expression S(t-1/dt) results in the error message "Subscript indices must either be real positive integers or logicals", even though (t-1/dt) is whole and equals 1.0000e+005
Is it really? ;)
mod(200000 - 1/dt, 1)
%ans = 1.455191522836685e-11
Your index is not an integer. This is one of the things to be aware of when working with floating point arithmetic. I suggest reading this excellent resource: "What Every Computer Scientist Should Know About Floating-Point Arithmetic".
You can either use round as you did, or store 1/dt as a separate variable (many options exist).
Matlab is lying to you. You're running into floating point inaccuracies and Matlab does not have an honest printing policy. Try printing the numbers with full precision:
dt = 100000^(-1);
t = 200000;
fprintf('2/dt == %.12f\n',2/dt) % 199999.999999999971
fprintf('t - 1/dt == %.12f\n',t - 1/dt) % 100000.000000000015
While powers of 10 are very nice for us to type and read, 1e-5 (your dt) cannot be represented exactly as a floating point number. That's why your resulting calculations aren't coming out as even integers.
The statement
S(t-1/dt)
can be replaced by
S(uint32(t-1/dt))
And similarly for I.
Also, you might want to hardcode 1/dt as 100000, as suggested above.
I reckon this will improve the comparison.
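For what it's worth, the same effect reproduces in plain Python, independent of MATLAB's display settings (a quick sketch):

dt = 100000 ** -1
print(repr(2 / dt))               # not exactly 200000.0
print(repr(200000 - 1 / dt))      # not exactly 100000.0
print(200000 - 1 / dt == 100000)  # False
print(round(200000 - 1 / dt))     # 100000 -- rounding recovers a safe integer index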