Matlab and Fortran precision [duplicate]

This question already has answers here: Different precision in C++ and Fortran (2 answers); Precision not respected (1 answer). Closed 1 year ago.
I am running the same code in MATLAB and Fortran 90, but I get different results even though the codes are the same. Is this due to different precision in the two languages?
My MATLAB code is posted below:
XSTART = 2.0;
EPSA = 1.0;
EPSW = 80.0;
BULK_STRENGTH = 9.42629*1.0;
KAPPA = 8.486902807*BULK_STRENGTH/EPSW;
VK = sqrt(KAPPA);
EC2_KBT = (332.06364/0.5921830)*1.0;
F1 = 1.1682185947500601;
UNC1 = F1 - ((EC2_KBT*1.0)/(EPSA*XSTART));
FREE_ENERGY = (0.50)*(1.0)*(UNC1)*(0.5921830*1.0);
ORIGINAL_FE = (0.50)*(1.0^2)*(332.06364)*(0.50)* ...
(1.0/((VK*XSTART+1.0)*EPSW) - 1.0/EPSA)
abs(FREE_ENERGY-ORIGINAL_FE);
For ORIGINAL_FE, I get -82.670010385176070 in MATLAB but -82.670007683885615 in Fortran 90, and I am not sure why. My Fortran code is posted below (I still get the same result when using IMPLICIT DOUBLE PRECISION (A-H,O-Z)):
PROGRAM MIB_HDM
IMPLICIT real*8 (A-H,O-Z)
REAL*8 :: EPSW,VK,XSTART
REAL*8 :: EC2_KBT,KAPPA
REAL*8 :: UNC1,BULK_STRENGTH
REAL*8 :: ORIGINAL_FE
REAL*8 :: EPSA
XSTART = 2.0
EPSA = 1.0
EPSW = 80.0
BULK_STRENGTH = 9.42629*1.0
KAPPA = 8.486902807*BULK_STRENGTH/EPSW
VK = sqrt(KAPPA)
EC2_KBT = (332.06364/0.5921830)*1.0
F1 = 1.1682217268107287
UNC1 = F1 - ((EC2_KBT*1.0)/(EPSA*XSTART))
FREE_ENERGY = (0.50)*(1.0)*(UNC1)*(0.5921830)
ORIGINAL_FE = (0.50)*(1.0**2)*(332.06364)*(0.50)* &
(1.0/((VK*XSTART+1.0)*EPSW) - 1.0/EPSA)
print *, original_fe
end program

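Thanks so much for your help, everyone. I figured out the answer to my question: in Fortran, a real literal without a D0 suffix (such as 9.42629) is single precision, so it is rounded to about 7 significant digits before it is stored in a REAL*8 variable, whereas MATLAB treats every literal as double precision. Below are my Fortran and MATLAB codes that give the same result. The Fortran code is the first one and the MATLAB code is the second one. If you run the two codes as posted, you should get the same result (-82.670010385176070). If you uncomment the double(single(...)) expressions in MATLAB, then you have to take out all of the D0 suffixes in Fortran to get the same result as MATLAB (-82.670007683885615).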
Fortran Code
PROGRAM MIB_HDM
IMPLICIT real*8 (A-H,O-Z)
REAL*8 :: EPSW,VK,XSTART
REAL*8 :: EC2_KBT,KAPPA
REAL*8 :: UNC1,BULK_STRENGTH
REAL*8 :: ORIGINAL_FE
REAL*8 :: EPSA,F1
XSTART = 2.D0
EPSA = 1.D0
EPSW = 80.D0
BULK_STRENGTH = 9.42629D0
KAPPA = 8.486902807D0*BULK_STRENGTH/EPSW
VK = sqrt(KAPPA)*1.D0
EC2_KBT = (332.06364D0/0.5921830D0)
F1 = 1.1682217268107287D0
UNC1 = F1 - ((EC2_KBT*1.D0)/(EPSA*XSTART))
FREE_ENERGY = (0.5D0)*(1.D0)*(UNC1)*(0.5921830D0)
ORIGINAL_FE = (0.5D0)*(1.D0**2)*(332.06364D0)*(0.5D0)* &
(1.D0/((VK*XSTART+1.D0)*EPSW) - 1.D0/EPSA)
print *, original_fe
end program
Matlab code
XSTART = 2.0;
EPSA = 1.0;
EPSW = 80.0;
BULK_STRENGTH = 9.42629*1.0;
% BULK_STRENGTH = double(single(BULK_STRENGTH));
c1 = 8.486902807;
% c1 = double(single(c1));
KAPPA = c1*BULK_STRENGTH/EPSW;
VK = sqrt(KAPPA);
% EC2_KBT = (332.06364/0.5921830)*1.0;
% F1 = 1.1682217268107287;
% F1 = double(single(F1))
% UNC1 = F1 - ((EC2_KBT*1.0)/(EPSA*XSTART));
% FREE_ENERGY = (0.50)*(1.0)*(UNC1)*(0.5921830*1.0);
c2 = 332.06364;
% c2 = double(single(c2));
ORIGINAL_FE = (0.50)*(1.0^2)*c2*(0.50)* ...
(1.0/((VK*XSTART+1.0)*EPSW) - 1.0/EPSA)
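The same effect can be reproduced outside both languages. What follows is a minimal Python sketch, not the author's code: Python floats are IEEE doubles (like MATLAB's default), `to_single` and `original_fe` are hypothetical helper names introduced here, and rounding each decimal literal through a 4-byte float is an assumption meant to mimic what a Fortran literal without D0 does.

```python
import math
import struct


def to_single(value):
    """Round a double to the nearest IEEE single and return it as a double."""
    return struct.unpack("f", struct.pack("f", value))[0]


def original_fe(literal=lambda v: v):
    """Evaluate ORIGINAL_FE; `literal` is applied to every non-trivial constant."""
    xstart, epsa, epsw = 2.0, 1.0, 80.0   # exactly representable in single precision
    bulk_strength = literal(9.42629)
    kappa = literal(8.486902807) * bulk_strength / epsw
    vk = math.sqrt(kappa)
    return 0.5 * 1.0**2 * literal(332.06364) * 0.5 * (
        1.0 / ((vk * xstart + 1.0) * epsw) - 1.0 / epsa)


fe_double = original_fe()            # all literals kept in double precision
fe_single = original_fe(to_single)   # literals rounded to single first
print(f"{fe_double:.15f}")
print(f"{fe_single:.15f}")
```

The first value should agree with the all-D0 Fortran run and the second with the run without D0, up to the last printed digits.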

Related

Numerical Analysis help in MatLab

I am in a numerical analysis class, and I am working on a homework question. It comes from Timothy Sauer's Numerical Analysis, and is in the second suggested activity section. I have been talking with my professor about this code, and it seems the error and the approximation are wrong, but neither of us is able to figure out why. The following is the code I am using, written in MATLAB. Does anyone know enough about Euler-Bernoulli beams and MATLAB to help out?
function ebbeamerror %This is for part three
disp(['table of errors at x=L for each n'])
disp([' n ',' Approx ',' Actual value',' Error'])
disp(['======================================================='])
format bank
for k = 1:11
n = 10*(2^k);
D = sparse(1:n,1:n,6*ones(1,n),n,n);
G = sparse(2:n,1:n-1,-4*ones(1,n-1),n,n);
F = sparse(3:n,1:n-2,ones(1,n-2),n,n);
S = G+D+F+G'+F';
S(1,1) = 16;
S(1,2) = -9;
S(1,3) = 8/3;
S(1,4) = -1/4;
S(2,1) = -4;
S(2,2) = 6;
S(2,3) = -4;
S(2,4) = 1;
S(n-1,n-3)=16/17;
S(n-1,n-2)=-60/17;
S(n-1,n-1)=72/17;
S(n-1,n)=-28/17;
S(n,n-3)=-12/17;
S(n,n-2)=96/17;
S(n,n-1)=-156/17;
S(n,n)=72/17;
E = 1.3e10;
w = 0.3;
d = 0.03;
I = w*d^3/12;
g = -9.81;
f = 480*d*g*w;
h = 2/10;
L = 2;
x = (h^4)*f/(E*I);
x1 = ones(n ,1);
b = x*x1;
size (S);
size(b);
pause
y = S\b;
x=2;
a = (f/(24*E*I))*(x^2)*(x^2-4*L*x+6*L^2);
disp([n y(n) a abs(y(n)-a)])
end
end

Not enough input arguments error in matlab fmincon optimization

I'm new to MATLAB and I need help to solve this optimization problem. Here is the objective function m file:
function f = objfun(x,w1,w2)
w = 6:1:125;
z1 = x(4).*((x(1).*(1i.^2).*w.^2)+(x(5).*1i.*w)+x(3));
z2 = x(1).*x(2).*(1i.^4).*w.^4;
z3 = ((x(1)+x(2)).*x(5).*(1i.^3).*w.^3);
z4 = (1i.^2).*w.^2.*((x(1).*x(3))+(x(2).*x(3))+(x(1).*x(4)));
z5 = 1i.*w.*(x(5).*x(4));
z6 = x(3).*x(4);
z7 = x(4).*(-x(5).*1i.*w-x(3));
z8 = (z7./(z2+z3+z4+z5+z6));
trfs = 0.1.*(w.^2).*(z7./(z2+z3+z4+z5+z6));
trfs2 = 0.1.*(z7./(z2+z3+z4+z5+z6));
trfu = 0.1.*(z1./(z2+z3+z4+z5+z6));
abstrfs = abs(trfs);
abstrfs2 = abs(trfs2);
abstrfu = abs(trfu);
y1 = rms(abstrfs);
y2 = rms(abstrfs2);
y3 = abs(rms(abstrfu)-y2);
f = w1.*(y1.^2)+w2.*(y3.^2);
end
These are the constraints:
function [c,ceq] = confun(x,z8,y1,trfu)
c(1) = y1 - 0.315;
c(2) = abs(z8-trfu) - 0.217;
c(3) = abs(trfu) - 0.07;
c(4) = sqrt(x(3)./x(1)) - 9.425;
ceq = [];
end
Main file
x0 = [510 85 81000 650000 3000];
UB = [764 124 120720 839170 3840];
LB = [509 83 80480 559440 2560];
j = 1;
for i = 0:0.05:1
w1 = i;
w2 = 1-i;
[x,fval] = fmincon(@objfun,x0,[],[],[],[],LB,UB,@confun,[],w1,w2);
w = 6:1:125;
z1 = x(4).*((x(1).*(1i.^2).*w.^2)+(x(5).*1i.*w)+x(3));
z2 = x(1).*x(2).*(1i.^4).*w.^4;
z3 = ((x(1)+x(2)).*x(5).*(1i.^3).*w.^3);
z4 = (1i.^2).*w.^2.*((x(1).*x(3))+(x(2).*x(3))+(x(1).*x(4)));
z5 = 1i.*w.*(x(5).*x(4));
z6 = x(3).*x(4);
z7 = x(4).*(-x(5).*1i.*w-x(3));
z8 = (z7./(z2+z3+z4+z5+z6));
trfs =(w.^2).*(z7./(z2+z3+z4+z5+z6));
trfs2 = (z7./(z2+z3+z4+z5+z6));
trfu = (z1./(z2+z3+z4+z5+z6));
abstrfs = abs(trfs);
abstrfs2 = abs(trfs2);
abstrfu = abs(trfu);
y1(j) = rms(abstrfs);
y2(j) = rms(abstrfs2);
y3(j) = abs(rms(abstrfu)-y2(j));
j = j+1;
end
plot (y1,y3,'r.','MarkerSize',10)
I'm getting the error message;
Not enough input arguments.
Error in confun (line 5)
c(2) = abs(z8-trfu) - 0.217;
Error in fmincon (line 633)
[ctmp,ceqtmp] = feval(confcn{3},X,varargin{:});
Error in main (line 13)
[x,fval] = fmincon(@objfun,x0,[],[],[],[],LB,UB,@confun,[],w1,w2);
Caused by:
Failure in initial nonlinear constraint function evaluation. FMINCON cannot continue.
I know fmincon accepts constraint function with input in form of one vector with number of elements corresponding to number of constrained variables. But what I don't know is how to set all the input arguments as one vector.
Because the objective function is a bit bulky I separated it into different variables in objfun. Do I have to expand the function when setting constraints or is there another way? I have done a lot of research and still not sure how this works.
Both functions whose handles are passed as the fun and nonlcon arguments must take exactly one argument, the design vector (not the constant values of the problem). So you should construct new anonymous functions and pass them to fmincon like this:
[x,fval] = fmincon(@(x)objfun(x, w1, w2),...
x0,[],[],[],[],LB,UB,@(x)confun(x,z8,y1,trfu));
But for this to work, z8, y1, and trfu must be assigned before the call to fmincon. Since these values actually depend on x, I'm afraid you need to calculate them again inside confun. If this is not a very time-consuming optimization, simply move the calculation into a third function and call it from both objfun and confun. Otherwise follow the method described here to reuse values calculated in the objective function inside the constraint function.
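The "third function" idea can be sketched in a language-neutral way. This is a Python illustration of the pattern only, not fmincon code: `responses` is a hypothetical helper name, `functools.partial` stands in for the anonymous function `@(x) objfun(x, w1, w2)`, and treating `abs(z8-trfu)` and `abs(trfu)` as one constraint per frequency is an assumption about the intent.

```python
import math
from functools import partial

W = range(6, 126)  # same frequency grid as w = 6:1:125 in the question


def rms(values):
    """Root mean square of a sequence of magnitudes."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def responses(x):
    """Hypothetical shared helper: everything that depends only on x."""
    x1, x2, x3, x4, x5 = x
    z8, trfs, trfs2, trfu = [], [], [], []
    for w in W:
        s = 1j * w
        den = (x1 * x2 * s**4 + (x1 + x2) * x5 * s**3
               + (x1 * x3 + x2 * x3 + x1 * x4) * s**2 + x5 * x4 * s + x3 * x4)
        z1 = x4 * (x1 * s**2 + x5 * s + x3)
        z7 = x4 * (-x5 * s - x3)
        z8.append(z7 / den)
        trfs.append(0.1 * w**2 * z7 / den)
        trfs2.append(0.1 * z7 / den)
        trfu.append(0.1 * z1 / den)
    y1 = rms([abs(v) for v in trfs])
    y2 = rms([abs(v) for v in trfs2])
    y3 = abs(rms([abs(v) for v in trfu]) - y2)
    return y1, y3, z8, trfu


def objfun(x, w1, w2):
    y1, y3, _, _ = responses(x)
    return w1 * y1**2 + w2 * y3**2


def confun(x):
    """Inequality constraints c(x) <= 0; takes only the design vector."""
    y1, _, z8, trfu = responses(x)
    c = [y1 - 0.315]
    c += [abs(p - t) - 0.217 for p, t in zip(z8, trfu)]  # assumption: per frequency
    c += [abs(t) - 0.07 for t in trfu]                   # assumption: per frequency
    c.append(math.sqrt(x[2] / x[0]) - 9.425)
    return c


x0 = [510, 85, 81000, 650000, 3000]
objective = partial(objfun, w1=0.3, w2=0.7)  # one-argument callable for the solver
f0 = objective(x0)
c0 = confun(x0)
print(f0, len(c0))
```

Both callables now depend on x alone, so a solver can evaluate them without being told about w1, w2, or any intermediate quantity.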

Speed in Matlab vs. Julia vs. Fortran

I am playing around with different languages to solve a simple value function iteration problem where I loop over a state-space grid. I am trying to understand the performance differences and how I could tweak each code. For posterity I have posted full-length working examples for each language below. However, I believe that most of the tweaking is to be done in the while loop. I am a bit confused about what I am doing wrong in Fortran, as the speed seems subpar.
Matlab, ~2.7 s: I am avoiding a more efficient solution using the repmat function for now to keep the codes comparable. The code seems to be automatically multithreaded across 4 threads.
beta = 0.98;
sigma = 0.5;
R = 1/beta;
a_grid = linspace(0,100,1001);
tic
[V_mat, next_mat] = valfun(beta, sigma, R ,a_grid);
toc
where valfun()
function [V_mat, next_mat] = valfun(beta, sigma, R, a_grid)
zeta = 1-1/sigma;
len = length(a_grid);
V_mat = zeros(2,len);
next_mat = zeros(2,len);
u = zeros(2,len,len);
c = zeros(2,len,len);
for i = 1:len
c(1,:,i) = a_grid(i) - a_grid/R + 20.0;
c(2,:,i) = a_grid(i) - a_grid/R;
end
u = c.^zeta * zeta^(-1);
u(c<=0) = -1e8;
tol = 1e-4;
outeriter = 0;
diff = 1000.0;
while (diff>tol) %&& (outeriter<20000)
outeriter = outeriter + 1;
V_last = V_mat;
for i = 1:len
[V_mat(1,i), next_mat(1,i)] = max( u(1,:,i) + beta*V_last(2,:));
[V_mat(2,i), next_mat(2,i)] = max( u(2,:,i) + beta*V_last(1,:));
end
diff = max(abs(V_mat - V_last));
end
fprintf("\n Value Function converged in %i steps. \n", outeriter)
end
Julia (after compilation), ~5.4 s with 4 threads (9425469 allocations: 22.43 GiB), ~7.8 s with 1 thread (2912564 allocations: 22.29 GiB)
[EDIT: after adding correct broadcasting and @views it's only 1.8-2.1 seconds now, see below!]
using LinearAlgebra, UnPack, BenchmarkTools
struct paramsnew
β::Float64
σ::Float64
R::Float64
end
function valfun(params, a_grid)
@unpack β,σ, R = params
ζ = 1-1/σ
len = length(a_grid)
V_mat = zeros(2,len)
next_mat = zeros(2,len)
u = zeros(2,len,len)
c = zeros(2,len,len)
@inbounds for i in 1:len
c[1,:,i] = @. a_grid[i] - a_grid/R .+ 20.0
c[2,:,i] = @. a_grid[i] - a_grid/R
end
u = c.^ζ * ζ^(-1)
u[c.<=0] .= typemin(Float64)
tol = 1e-4
outeriter = 0
test = 1000.0
while test>tol
outeriter += 1
V_last = deepcopy(V_mat)
@inbounds Threads.@threads for i in 1:len # loop over grid points
V_mat[1,i], next_mat[1,i] = findmax( u[1,:,i] .+ β*V_last[2,:])
V_mat[2,i], next_mat[2,i] = findmax( u[2,:,i] .+ β*V_last[1,:])
end
test = maximum( abs.(V_mat - V_last)[.!isnan.( V_mat - V_last )])
end
print("\n Value Function converged in ", outeriter, " steps.")
return V_mat, next_mat
end
a_grid = collect(0:0.1:100)
p1 = paramsnew(0.98, 1/2, 1/0.98);
@time valfun(p1,a_grid)
print("\n should be compiled now \n")
@btime valfun(p1,a_grid)
Fortran (O3, mkl, qopenmp), ~9.2 s: I must also be doing something wrong when declaring the OpenMP variables, as it crashes for some grid sizes when using OpenMP (SIGSEGV error).
module mod_calc
use omp_lib
implicit none
integer, parameter :: dp = selected_real_kind(33,4931), len = 1001
public :: dp, len
contains
subroutine linspace(from, to, array)
real(dp), intent(in) :: from, to
real(dp), intent(out) :: array(:)
real(dp) :: range
integer :: n, i
n = size(array)
range = to - from
if (n == 0) return
if (n == 1) then
array(1) = from
return
end if
do i=1, n
array(i) = from + range * (i - 1) / (n - 1)
end do
end subroutine
subroutine calc_val()
real(dp):: bbeta, sigma, R, zeta, tol, test
real(dp):: a_grid(len), V_mat(2,len), V_last(2,len), &
u(len,len,2), c(len,len,2)
integer :: outeriter, i, sss, next_mat(2,len), fu
character(len=*), parameter :: FILE_NAME = 'data.txt' ! File name.
call linspace(from=0._dp, to=100._dp, array=a_grid)
bbeta = 0.98
sigma = 0.5
R = 1.0/0.98
zeta = 1.0 - 1.0/sigma
tol = 1e-4
test = 1000.0
outeriter = 0
do i = 1,len
c(:,i,1) = a_grid(i) - a_grid/R + 20.0
c(:,i,2) = a_grid(i) - a_grid/R
end do
u = c**zeta * 1.0/zeta
where (c<=0)
u = -1e6
end where
V_mat = 0.0
next_mat = 0.0
do while (test>tol .and. outeriter<20000)
outeriter = outeriter+1
V_last = V_mat
!$OMP PARALLEL DEFAULT(NONE) &
!$OMP SHARED(V_mat, next_mat,V_last, u, bbeta) &
!$OMP PRIVATE(i)
!$OMP DO SCHEDULE(static)
do i=1,len
V_mat(1,i) = maxval(u(:,i,1) + bbeta*V_last(2,:))
next_mat(1,i) = maxloc(u(:,i,1) + bbeta*V_last(2,:),1)
V_mat(2,i) = maxval(u(:,i,2) + bbeta*V_last(1,:))
next_mat(2,i) = maxloc(u(:,i,2) + bbeta*V_last(1,:),1)
end do
!$OMP END DO
!$OMP END PARALLEL
test = maxval(abs(log(V_last/V_mat)))
end do
end subroutine
end module mod_calc
program main
use mod_calc
implicit none
integer:: clck_counts_beg,clck_rate,clck_counts_end
call omp_set_num_threads(4)
call system_clock ( clck_counts_beg, clck_rate )
call calc_val()
call system_clock ( clck_counts_end, clck_rate )
write (*, '("Time = ",f6.3," seconds.")') (clck_counts_end - clck_counts_beg) / real(clck_rate)
end program main
There should be ways to reduce the number of allocations (Julia reports 32-45% gc time!), but for now I am too much of a novice to see them, so any comments and tips are welcome.
Edit:
Adding @views and correct broadcasting to the while loop improved the Julia speed considerably (as expected, I guess), and it now beats the Matlab loop. With 4 threads the code now takes only 1.97 s. Specifically,
@inbounds for i in 1:len
c[1,:,i] = @views @. a_grid[i] - a_grid/R .+ 20.0
c[2,:,i] = @views @. a_grid[i] - a_grid/R
end
u = @. c^ζ * ζ^(-1)
@. u[c<=0] = typemin(Float64)
while test>tol && outeriter<20000
outeriter += 1
V_last = deepcopy(V_mat)
@inbounds Threads.@threads for i in 1:len # loop over grid points
V_mat[1,i], next_mat[1,i] = @views findmax( @. u[1,:,i] + β*V_last[2,:])
V_mat[2,i], next_mat[2,i] = @views findmax( @. u[2,:,i] + β*V_last[1,:])
end
test = @views maximum( @. abs(V_mat - V_last)[!isnan( V_mat - V_last )])
end
The reason the Fortran is so slow is that it is using quadruple precision: selected_real_kind(33,4931) requests a kind with at least 33 decimal digits. I don't know Julia or Matlab, but it looks as though double precision is being used there. Further, as noted in the comments, some of the array index orders are inefficient for Fortran, which stores arrays in column-major order, and you are also not consistent in your use of precision in the Fortran code: most of your constants are single precision. Correcting all of these leads to the following:
Original: test = 9.83440674663232047922921588613472439E-0005 Time = 31.413 seconds.
Optimised: test = 9.8343643237979391E-005 Time = 0.912 seconds.
Note that I have turned off parallelisation for these; all results are single threaded. The code is below:
module mod_calc
!!$ use omp_lib
implicit none
!!$ integer, parameter :: dp = selected_real_kind(33,4931), len = 1001
integer, parameter :: dp = selected_real_kind(15), len = 1001
public :: dp, len
contains
subroutine linspace(from, to, array)
real(dp), intent(in) :: from, to
real(dp), intent(out) :: array(:)
real(dp) :: range
integer :: n, i
n = size(array)
range = to - from
if (n == 0) return
if (n == 1) then
array(1) = from
return
end if
do i=1, n
array(i) = from + range * (i - 1) / (n - 1)
end do
end subroutine
subroutine calc_val()
real(dp):: bbeta, sigma, R, zeta, tol, test
real(dp):: a_grid(len), V_mat(len,2), V_last(len,2), &
u(len,len,2), c(len,len,2)
integer :: outeriter, i, sss, next_mat(len,2), fu
character(len=*), parameter :: FILE_NAME = 'data.txt' ! File name.
call linspace(from=0._dp, to=100._dp, array=a_grid)
bbeta = 0.98_dp
sigma = 0.5_dp
R = 1.0_dp/0.98_dp
zeta = 1.0_dp - 1.0_dp/sigma
tol = 1e-4_dp
test = 1000.0_dp
outeriter = 0
do i = 1,len
c(:,i,1) = a_grid(i) - a_grid/R + 20.0_dp
c(:,i,2) = a_grid(i) - a_grid/R
end do
u = c**zeta * 1.0_dp/zeta
where (c<=0)
u = -1e6_dp
end where
V_mat = 0.0_dp
next_mat = 0.0_dp
do while (test>tol .and. outeriter<20000)
outeriter = outeriter+1
V_last = V_mat
!$OMP PARALLEL DEFAULT(NONE) &
!$OMP SHARED(V_mat, next_mat,V_last, u, bbeta) &
!$OMP PRIVATE(i)
!$OMP DO SCHEDULE(static)
do i=1,len
V_mat(i,1) = maxval(u(:,i,1) + bbeta*V_last(:, 2))
next_mat(i,1) = maxloc(u(:,i,1) + bbeta*V_last(:, 2),1)
V_mat(i,2) = maxval(u(:,i,2) + bbeta*V_last(:, 1))
next_mat(i,2) = maxloc(u(:,i,2) + bbeta*V_last(:, 1),1)
end do
!$OMP END DO
!$OMP END PARALLEL
test = maxval(abs(log(V_last/V_mat)))
end do
Write( *, * ) test
end subroutine
end module mod_calc
program main
use mod_calc
implicit none
integer:: clck_counts_beg,clck_rate,clck_counts_end
!!$ call omp_set_num_threads(2)
call system_clock ( clck_counts_beg, clck_rate )
call calc_val()
call system_clock ( clck_counts_end, clck_rate )
write (*, '("Time = ",f6.3," seconds.")') (clck_counts_end - clck_counts_beg) / real(clck_rate)
end program main
Compilation / linking:
ian@eris:~/work/stack$ gfortran --version
GNU Fortran (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
ian@eris:~/work/stack$ gfortran -Wall -Wextra -O3 jul.f90
jul.f90:36:48:
character(len=*), parameter :: FILE_NAME = 'data.txt' ! File name.
1
Warning: Unused parameter ‘file_name’ declared at (1) [-Wunused-parameter]
jul.f90:35:57:
integer :: outeriter, i, sss, next_mat(2,len), fu
1
Warning: Unused variable ‘fu’ declared at (1) [-Wunused-variable]
jul.f90:35:36:
integer :: outeriter, i, sss, next_mat(2,len), fu
1
Warning: Unused variable ‘sss’ declared at (1) [-Wunused-variable]
Running:
ian@eris:~/work/stack$ ./a.out
9.8343643237979391E-005
Time = 0.908 seconds.
What @Ian Bush says in his answer about the precision is correct. Moreover,
You will likely not need OpenMP for the kind of parallelization you have done in your code. Fortran's do concurrent construct lets the compiler parallelize the loop for you (when the code is compiled with the parallelization flag of the respective compiler).
Also, the where/elsewhere construct can be slow, as it often requires creating a logical mask array and then applying it in a do loop. You can use do concurrent in place of where to both avoid the extra temporary array and parallelize the computation on multiple cores.
Also, when comparing 64-bit real numbers, it's good to make sure both values have the same type and kind, to avoid an implicit type/kind conversion before the comparison is made.
Also, the expression a_grid(i) - a_grid/R is evaluated twice when filling the c array; the second evaluation is redundant and can be avoided by reusing the first result.
Here is the modified, optimized, parallel Fortran code without any OpenMP:
module mod_calc
use iso_fortran_env, only: dp => real64
implicit none
integer, parameter :: len = 1001
public :: dp, len
contains
subroutine linspace(from, to, array)
real(dp), intent(in) :: from, to
real(dp), intent(out) :: array(:)
real(dp) :: range
integer :: n, i
n = size(array)
range = to - from
if (n == 0) return
if (n == 1) then
array(1) = from
return
end if
do concurrent(i=1:n)
array(i) = from + range * (i - 1) / (n - 1)
end do
end subroutine
subroutine calc_val()
implicit none
real(dp) :: bbeta, sigma, R, zeta, tol, test
real(dp) :: a_grid(len), V_mat(len,2), V_last(len,2), u(len,len,2), c(len,len,2)
integer :: outeriter, i, j, k, sss, next_mat(len,2), fu
character(len=*), parameter :: FILE_NAME = 'data.txt' ! File name.
call linspace(from=0._dp, to=100._dp, array=a_grid)
bbeta = 0.98_dp
sigma = 0.5_dp
R = 1.0_dp/0.98_dp
zeta = 1.0_dp - 1.0_dp/sigma
tol = 1e-4_dp
test = 1000.0_dp
outeriter = 0
do concurrent(i=1:len)
c(1:len,i,2) = a_grid(i) - a_grid/R
c(1:len,i,1) = c(1:len,i,2) + 20.0_dp
end do
u = c**zeta * 1.0_dp/zeta
do concurrent(i=1:len, j=1:len, k=1:2)
if (c(i,j,k)<=0._dp) u(i,j,k) = -1e6_dp
end do
V_mat = 0.0_dp
next_mat = 0.0_dp
do while (test>tol .and. outeriter<20000)
outeriter = outeriter + 1
V_last = V_mat
do concurrent(i=1:len)
V_mat(i,1) = maxval(u(:,i,1) + bbeta*V_last(:, 2))
next_mat(i,1) = maxloc(u(:,i,1) + bbeta*V_last(:, 2),1)
V_mat(i,2) = maxval(u(:,i,2) + bbeta*V_last(:, 1))
next_mat(i,2) = maxloc(u(:,i,2) + bbeta*V_last(:, 1),1)
end do
test = maxval(abs(log(V_last/V_mat)))
end do
Write( *, * ) test
end subroutine
end module mod_calc
program main
use mod_calc
implicit none
integer:: clck_counts_beg,clck_rate,clck_counts_end
call system_clock ( clck_counts_beg, clck_rate )
call calc_val()
call system_clock ( clck_counts_end, clck_rate )
write (*, '("Time = ",f6.3," seconds.")') (clck_counts_end - clck_counts_beg) / real(clck_rate)
end program main
Compiling your original code with the Intel Fortran compiler flags /standard-semantics /F0x1000000000 /O3 /Qip /Qipo /Qunroll /Qunroll-aggressive /inline:all /Ob2 /Qparallel yields the following timing:
original.exe
Time = 37.284 seconds.
Compiling and running the parallel do concurrent Fortran code above (on at most 4 cores, if more than one is used at all) yields:
concurrent.exe
Time = 0.149 seconds.
For comparison, this is MATLAB's timing:
Value Function converged in 362 steps.
Elapsed time is 3.575691 seconds.
One last tip: there are several vectorized array computations and loops in the above code that can still be merged to further improve the speed of your Fortran code. For example,
u = c**zeta * 1.0_dp/zeta
do concurrent(i=1:len, j=1:len, k=1:2)
if (c(i,j,k)<=0._dp) u(i,j,k) = -1e6_dp
end do
in the above code can all be merged into the do concurrent loop appearing before it,
do concurrent(i=1:len)
c(1:len,i,2) = a_grid(i) - a_grid/R
c(1:len,i,1) = c(1:len,i,2) + 20.0_dp
end do
If you decide to do so, then you can define an auxiliary variable inverse_zeta = 1.0_dp / zeta to use in the computation of u inside the loop instead of using * 1.0_dp / zeta, thus avoiding the extra division (which is more costly than multiplication), without degrading the readability of the code.
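To make the merging idea concrete, here is a small Python sketch of the fused loop. It is an illustration of the control flow only and says nothing about Fortran performance; `build_u` is a hypothetical name, the five-point grid is a stand-in for the 1001-point one, and the index layout `c[k][i][j]` is an assumption chosen to mirror `c(j,i,k)`.

```python
def build_u(a_grid, R, sigma):
    """Fill c and u in one pass, testing c <= 0 before taking the power."""
    zeta = 1.0 - 1.0 / sigma
    inverse_zeta = 1.0 / zeta  # hoisted: one division instead of one per element
    n = len(a_grid)
    # c[k][i][j] mirrors the Fortran c(j,i,k), with k = 1,2 stored as index 0,1
    c = [[[0.0] * n for _ in range(n)] for _ in range(2)]
    u = [[[0.0] * n for _ in range(n)] for _ in range(2)]
    for i in range(n):
        for j in range(n):
            base = a_grid[i] - a_grid[j] / R  # computed once, reused for both k
            for k, value in enumerate((base + 20.0, base)):
                c[k][i][j] = value
                if value <= 0.0:
                    u[k][i][j] = -1e6
                else:
                    u[k][i][j] = value**zeta * inverse_zeta
    return c, u


a_grid = [0.0, 25.0, 50.0, 75.0, 100.0]  # tiny illustrative grid
c, u = build_u(a_grid, R=1.0 / 0.98, sigma=0.5)
print(u[0][0][0], u[1][0][0])
```

Testing the sign of c first also means the power is never evaluated for non-positive values, which the separate masking pass cannot avoid.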

Matlab integral returning infinite not-a-number

I'm attempting to integrate a function and keep receiving the warning:
Warning: Infinite or Not-a-Number value encountered.
I've been unable to determine why this is the case and was hoping someone may be able to shed some light. I believe one of the parameters is giving off an Inf value but I haven't been able to determine which one. Any help would be appreciated.
lm = 1.75;
Cm = 3.2E6;
fe = 1380;
H = 13.5;
q = 1E-5;
Cw = 4.2E6;
y = 0.0;
x = 0.1;
ts = [0.1 97/24];
Mt = 100;
t = linspace(ts(1)*86400, ts(2)*86400, Mt); % [s]
QL = fe/H;
z = H/2;
Dt = lm/Cm;
r = x.^2+y.^2;
vT = q*Cw*Cm;
T = zeros(size(t));
for i = 1:length(t)
tt = t(i);
fun = @(ze) T_GIGF(z,ze,Dt,tt,vT,r)/sqrt(pi)./sqrt(r+(z-ze).^2);
T(i) = QL/(4*pi*lm)*exp(vT*x/2*Dt).*...
(integral(fun,0,H)-...
integral(fun,-H,0));
end
function func = T_GIGF(z,ze,a,tt,VT,r)
u1 = (r+(z-ze).^2)/(4*a*tt);
u2 = VT^2*(r+(z-ze).^2)/(16*a^2);
func = 0.5*sqrt(pi)*(exp(-2*sqrt(u2)).*erfc(sqrt(u1)-sqrt(u2./u1))+...
exp(+2*sqrt(u2)).*erfc(sqrt(u1)+sqrt(u2./u1)));
end
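You are getting this warning because u2 is a huge number: with your parameters it ranges from roughly 1e25 to 1e30 over the integration interval (u1 stays moderate by comparison). The argument 2*sqrt(u2) passed to exp is therefore of order 1e13 to 1e15, and exp of a number that large is beyond the range a double can represent. Even with a much smaller exponent than your values produce, the result is already Inf:
exp(1e29) > realmax results in 1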
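The magnitudes can be checked with a few lines of arithmetic. This is a hedged Python sketch using the parameter values from the question; `u2`, `exp_limit`, and the other names are introduced here for illustration. Note one difference in behaviour: Python's `math.exp` raises `OverflowError` where MATLAB returns Inf, so the sketch compares the argument against the overflow threshold instead of calling `exp` on it.

```python
import math
import sys

# Parameters copied from the question
lm, Cm, H, q, Cw = 1.75, 3.2e6, 13.5, 1e-5, 4.2e6
x, y = 0.1, 0.0
z = H / 2
a = lm / Cm        # Dt in the question
VT = q * Cw * Cm   # vT in the question
r = x**2 + y**2


def u2(ze):
    """Same expression as u2 inside T_GIGF."""
    return VT**2 * (r + (z - ze)**2) / (16 * a**2)


exp_limit = math.log(sys.float_info.max)  # exp overflows above roughly 709.78
smallest_arg = 2 * math.sqrt(u2(z))       # ze = z minimises u2 on [-H, H]
largest_arg = 2 * math.sqrt(u2(-H))       # ze = -H maximises it

print(f"exp overflows for arguments above {exp_limit:.2f}")
print(f"2*sqrt(u2) ranges from {smallest_arg:.3g} to {largest_arg:.3g}")

# In IEEE arithmetic an overflowed exp (Inf) times an underflowed erfc (0) is NaN
overflowed = float("inf")
underflowed = math.erfc(1e6)
product = overflowed * underflowed
print(underflowed, product)
```

The last two lines suggest where the "Not-a-Number" half of the warning may come from: the second term of `func` multiplies an overflowed exponential by a complementary error function that has underflowed to zero.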

How to stop MATLAB from rounding extremely small values to 0?

I have a code in MATLAB which works with very small numbers, for example, I have values that are on the order of 10^{-25}, however when MATLAB does the calculations, the values themselves are rounded to 0. Note, I am not referring to format to display these extra decimals, but rather the number itself is changed to 0. I think the reason is because MATLAB, by default, uses up to 15 digits after the decimal point for its calculations. How can I change this so that numbers that are very very small are retained as they are in the calculations?
EDIT:
My code is the following:
clc;
clear;
format long;
% Import data
P = xlsread('Data.xlsx', 'P');
d = xlsread('Data.xlsx', 'd');
CM = xlsread('Data.xlsx', 'Cov');
Original_PD = P; %Store original PD
LM_rows = size(P,1)+1; %Expected LM rows
LM_columns = size(P,2); %Expected LM columns
LM_FINAL = zeros(LM_rows,LM_columns); %Dimensions of LM_FINAL
for ii = 1:size(P,2)
P = Original_PD(:,ii);
% c1, c2, ..., cn, c0, f
interval = cell(size(P,1)+2,1);
for i = 1:size(P,1)
interval{i,1} = NaN(size(P,1),2);
interval{i,1}(:,1) = -Inf;
interval{i,1}(:,2) = d;
interval{i,1}(i,1) = d(i,1);
interval{i,1}(i,2) = Inf;
end
interval{i+1,1} = [-Inf*ones(size(P,1),1) d];
interval{i+2,1} = [d Inf*ones(size(P,1),1)];
c = NaN(size(interval,1),1);
for i = 1:size(c,1)
c(i,1) = mvncdf(interval{i,1}(:,1),interval{i,1}(:,2),0,CM);
end
c0 = c(size(P,1)+1,1);
f = c(size(P,1)+2,1);
c = c(1:size(P,1),:);
b0 = exp(1);
b = exp(1)*P;
syms x;
eqn = f*x;
for i = 1:size(P,1)
eqn = eqn*(c0/c(i,1)*x + (b(i,1)-b0)/c(i,1));
end
eqn = c0*x^(size(P,1)+1) + eqn - b0*x^size(P,1);
x0 = solve(eqn);
x0 = double(x0);
for i = 1:size(x0)
id(i,1) = isreal(x0(i,1));
end
x0 = x0(id,:);
x0 = x0(x0 > 0,:);
clear x;
for i = 1:size(P,1)
x(i,:) = (b(i,1) - b0)./(c(i,1)*x0) + c0/c(i,1);
end
% x = [x0 x1 ... xn]
x = [x0'; x];
x = x(:,sum(x <= 0,1) == 0);
% lamda
lamda = -log(x);
LM_FINAL(:,ii) = lamda;
end
The problem is in this step:
for i = 1:size(P,1)
x(i,:) = (b(i,1) - b0)./(c(i,1)*x0) + c0/c(i,1);
end
where the "difference" gets very close to 0. How can I stop this rounding from occurring at this step?
For example, when i = 10, I have the following values:
b_10 = 0.006639735483297
b_0 = 2.71828182845904
c_10 = 0.000190641848119641
c_0 = 0.356210110252579
x_0 = 7.61247930625269
After doing the calculations we get: -1868.47805854794 + 1868.47805854794 which yields a difference of -2.27373675443232E-12, that gets rounded to 0 by MATLAB.
EDIT 2:
Here is my data file, which is used by the code. After you run the code (it should take about a minute and a half to finish), row 11 in the variable x shows 0 (even after double-clicking to check its real value), when it shouldn't.
The problem you're having is that IEEE double-precision floating point does not carry enough bits to resolve your result: a double holds about 16 significant decimal digits, so when two numbers of magnitude 1868 nearly cancel, their difference is only known to within a few multiples of 2e-13 and cannot be reliably distinguished from zero.
Have a look at John D'Errico's Big Decimal Class and Variable Precision Integer Arithmetic. Another option would be to use the Big Integer Class from Java, but that might be more challenging if you are unfamiliar with using Java and other external libraries in MATLAB.
Can you give an example of the calculations in which you are using 1e-25 and getting zero? Here's what I get for a floating-point number called small_num and one of John's high-precision floats called small_hpf when assigning them and multiplying by pi.
>> small_num = 1e-25
small_num =
1.0000e-25
>> small_hpf = hpf(1e-25)
small_hpf =
1.000000000000000038494869749191839081371989361591338301396127644e-25
>> small_num * pi
ans =
3.1416e-25
>> small_hpf * pi
ans =
3.141592653589793236933163473501228686498684350685747717239459106e-25
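The distinction between "too small to represent" and "too small relative to its neighbours" can be checked directly. Below is a minimal Python sketch (Python floats are the same IEEE doubles MATLAB uses by default); it assumes Python 3.9 or later for `math.ulp`, and the two numbers are copied from the question.

```python
import math

term = 1868.47805854794                # magnitude of the two nearly cancelling terms
reported_diff = -2.27373675443232e-12  # difference quoted in the question

spacing = math.ulp(term)               # gap between adjacent doubles near 1868
print(spacing)                         # equals 2**-42
print(reported_diff / spacing)         # how many spacings the difference spans

# A tiny number is representable on its own ...
tiny = 1e-25
print(tiny > 0.0)
# ... but it is absorbed when added to a number 28 orders of magnitude larger
print(term + tiny == term)
```

The quoted difference is a small whole number of spacings, which suggests it is rounding noise left over from the earlier operations rather than a genuine small value that was rounded away.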