I am trying to write a program in Matlab to solve a system of linear equations using LU decomposition, which relies on Gaussian elimination and therefore involves a lot of arithmetic steps.
The answers are close to the correct solution, but the round-off errors are much larger than what I get from other languages such as Python.
For instance, one of the solutions is exactly 3 but I get 2.9877.
I know built-in functions should be used for such trivial things since Matlab is a language built for numerical computing, but if I still wanted to do it with loops, would I always get round-off errors of this size, or is there a way to reduce them while doing numerical calculations?
I am attaching the code; it is long and not worth reading in detail, but I include it for the sake of completeness.
One can just note the use of many arithmetic operations which introduce a lot of round-off errors.
Are these round-off errors intrinsic to Matlab and unavoidable?
clc
clear
%No of equations is n
n=3;
%WRITING THE Coefficients
A(1,1)=3;
A(1,2)=-0.1;
A(1,3)=-0.2;
B(1)=7.85;
A(2,1)=0.1;
A(2,2)=7;
A(2,3)=-0.3;
B(2)=-19.3;
A(3,1)=0.3;
A(3,2)=-0.2;
A(3,3)=10;
B(3)=71.4;
%Forward Elimination
for i=1:n-1
    for j=i+1:n
        fact=A(j,i)/A(i,i);
        A(j,i)=fact;
        A(j,j:n)=A(j,j:n)-fact*A(i,j:n);
        B(j)=B(j)-fact*B(i);
    end
end
disp(A)
% Calculating d matrix
sum=0;
D(1)=B(1);
for i=2:n
    for j=1:i-1
        sum=sum+A(i,j)*B(j);
        D(i)=B(i)-sum;
    end
end
disp("D =")
disp(transpose(D))
%Back Substitution
X(n)=D(n)/A(n,n);
for z=n-1:-1:1
    sum=0;
    for w=z+1:n
        sum=sum+A(z,w)*X(w);
    end
    X(z)=(D(z)-sum)/(A(z,z));
end
disp("Solution X is:")
disp(transpose(X))
Never forget to distrust your own code.
If you comment out the line A(j,i)=fact; and replace the row update with A(j,i:n)=A(j,i:n)-fact*A(i,i:n); you get the nice solution (long format):
3.000000000000000
-2.500000000000000
7.000000000000002
I am not saying that this is the best fix (it is not), but it clearly shows that round-off is not to blame. The wrong solution remained close to the true one because the system is strongly diagonally dominant.
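For completeness, here is a minimal self-contained sketch with that fix applied (my own restatement of the corrected elimination, not the original script), so the result can be checked directly:
% Minimal sketch of the corrected elimination (restated, not the original code)
A = [3 -0.1 -0.2; 0.1 7 -0.3; 0.3 -0.2 10];
B = [7.85; -19.3; 71.4];
n = size(A,1);
% Forward elimination: the row update now spans columns i:n
for i = 1:n-1
    for j = i+1:n
        fact = A(j,i)/A(i,i);
        A(j,i:n) = A(j,i:n) - fact*A(i,i:n);
        B(j) = B(j) - fact*B(i);
    end
end
% Back substitution on the already-eliminated B
X = zeros(n,1);
X(n) = B(n)/A(n,n);
for z = n-1:-1:1
    X(z) = (B(z) - A(z,z+1:n)*X(z+1:n)) / A(z,z);
end
disp('Solution X is:')
disp(X)   % 3, -2.5, 7 up to round-off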
First off, I'm not sure if this is the best place to post this, but since there isn't a dedicated Matlab community I'm posting this here.
To give a little background, I'm currently prototyping a plasma physics simulation which involves a triple integration. The innermost integral can be done analytically, but for the outer two this is just impossible. I always thought it best to work with values close to unity and thus normalized my innermost integral so that it is unitless and usually takes values close to unity. However, compared to an earlier version of the code where this innermost integral evaluated to values of the order of 1e-50, the numerical double integration, which uses the native Matlab function integral2 with a target relative tolerance of 1e-6, now requires around 1000 times more function evaluations to converge. As a consequence my simulation now takes roughly 12 hours instead of the previous 20 minutes.
Question
So my questions are:
Is it possible that the faster convergence in the older version is simply due to the additional evaluations vanishing as round-off errors, and that the results thus aren't trustworthy even though they pass the 1e-6 relative tolerance? In the few tests I ran, the results seemed to be the same in both versions though.
What is the best practice concerning the normalization of the integrand for numerical integration?
Is there some way to improve the convergence of numerical integrals, especially if the integrand might have singularities?
I'm thankful for any help or insight, especially since I don't fully understand the inner workings of Matlab's integral2 function and what should be paid attention to when using it.
If I didn't know any better I would actually conclude that an integrand of the order of 1e-50 works much better than one of, say, the order of 1e+0, but that doesn't seem to make sense. Is there some numerical reason why this could actually be the case?
TL;DR: when I multiply the function to be numerically integrated by Matlab's integral2 by a factor of 1e-50 and then multiply the result by 1e+50, the integral gives the same result but converges much faster, and I don't understand why.
edit:
I prepared a short script to illustrate the problem. Here the relative difference between the two results was of the order of 1e-4 and thus below the actual relative tolerance of integral2. In my original problem however the difference was even smaller.
fun = @(x,y,l) l./(sqrt(1-x.*cos(y)).^5).*((1-x).*sin(y));
x = linspace(0,1,101);
y = linspace(0,pi,101).';
figure
surf(x,y,fun(x,y,1));
l = linspace(0,1,101); l=l(2:end);
v1 = zeros(1,100); v2 = v1;
tval = tic;
for i=1:100
    fun1 = @(x,y) fun(x,y,l(i));
    v1(i) = integral2(fun1,0,1,0,pi,'RelTol',1e-6);
end
t1 = toc(tval)
tval = tic;
for i=1:100
    fun1 = @(x,y) 1e-50*fun(x,y,l(i));
    v2(i) = 1e+50*integral2(fun1,0,1,0,pi,'RelTol',1e-6);
end
t2 = toc(tval)
figure
hold all;
plot(l,v1);
plot(l,v2);
plot(l,abs((v2-v1)./v1));
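One thing that may be worth checking here (an assumption on my part, based only on integral2's documented interface, not something established in this post): integral2 enforces an absolute tolerance alongside 'RelTol', and for an integrand scaled down to around 1e-50 an absolute tolerance is satisfied far earlier. Setting 'AbsTol' explicitly, scaled consistently with the integrand, keeps the two runs comparable; a minimal sketch:
% Illustrative only: pin down both tolerances so the scaled and unscaled
% integrands face an equivalent requirement ('AbsTol' scaled with the integrand).
fun1 = @(x,y) fun(x,y,l(1));
v_plain  = integral2(fun1, 0,1,0,pi, 'RelTol',1e-6, 'AbsTol',1e-12);
v_scaled = 1e+50*integral2(@(x,y) 1e-50*fun1(x,y), 0,1,0,pi, 'RelTol',1e-6, 'AbsTol',1e-62);
disp([v_plain v_scaled])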
This question might be too broad to be posted here but I'll try to be as specific as possible. If you still consider it to be too broad, I'll simply delete it.
Have a look at the EDIT at the bottom for my final thoughts on the subject.
Also have a look at Ander Biguri's answer if you have access to the Parallel Computing Toolbox and have an NVIDIA GPU.
My problem:
I'm solving dynamic equations by using a Newmark scheme (2nd order implicit), which involves solving a lot of linear systems of the form A*x=b for x.
I've already optimized all the code that doesn't involve solving linear systems. As it stands now, solving the linear systems takes up to 70% of the calculation time in the process.
I've thought about using MATLAB's linsolve, but my matrix A doesn't have any of the properties that could be used as the opts input for linsolve.
The idea:
As seen in the documentation of linsolve :
If A has the properties in opts, linsolve is faster than mldivide,
because linsolve does not perform any tests to verify that A has the
specified properties
As far as I know, by using mldivide, MATLAB will use LU decomposition, as my matrix A doesn't have any specific property except for being square.
My question:
So I'm wondering if I'd gain some time by first decomposing A using MATLAB's lu, and then feeding the factors to linsolve in order to solve x = U\(L\b), with opts set to lower and upper triangular respectively.
That way I'd prevent MATLAB from doing all the property checking that takes place during the mldivide process.
Note: I'm absolutely not expecting a huge time gain. But on calculations that take up to a week, even 2% matters.
Now why don't I try this myself, you may ask? Well, I've got calculations running until roughly Tuesday, and I wanted to ask if someone has already tried this and gained time by getting rid of the overhead due to matrix property checking in mldivide.
Toy example:
A=randn(2500);
% Getting A to be non singular
A=A.'*A;
x_=randn(2500,1);
b=A*x_;
clear x_
% Case 1 : mldivide
tic
for ii=1:100
    x=A\b;
end
out=toc;
disp(['Case 1 time per iteration :' num2str((out)/100)]);
% Case 2 : LU+linsolve
opts1.LT=true;
opts2.UT=true;
tic;
for ii=1:100
    % Use the three-output form so L is truly lower triangular; the two-output
    % form returns a row-permuted L, which would break the LT option below.
    [L,U,P]=lu(A);
    % It seems that these could be directly replaced by U\(L\(P*b)) as mldivide checks for triangularity first
    Tmp=linsolve(L,P*b,opts1);
    x=linsolve(U,Tmp,opts2);
end
out2=toc;
disp(['Case 2 time per iteration :' num2str((out2)/100)]);
EDIT
So I just had the possibility to try a few things.
I missed earlier in the documentation of linsolve that if you don't specify any opts input it will default to using the LU solver, which is what I want. Doing a bit of time testing with it (and taking into account @rayryeng's remark to "timeit that bad boy"), it saves around 2~3% of processing time when compared to mldivide, as shown below. It's not a huge deal in terms of time gain, but it's something non-negligible on calculations that take up to a week.
timeit results on a 1626*1626 linear system:
mldivide :
t1 =
0.102149773097083
linsolve :
t2 =
0.099272037768204
relative : 0.028171725121151
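For reference, a minimal sketch of the kind of timing comparison described in this edit (the matrix size and the use of timeit come from above; the test matrix itself is illustrative, not the actual system):
% Illustrative timing sketch, not the original benchmark.
n = 1626;
A = randn(n) + n*eye(n);          % stand-in square, non-singular matrix
b = randn(n,1);
t1 = timeit(@() A\b);             % mldivide
t2 = timeit(@() linsolve(A,b));   % linsolve with no opts (defaults to the LU solver)
rel = (t1 - t2)/t1;               % relative saving
fprintf('mldivide: %.6f s, linsolve: %.6f s, relative: %.4f\n', t1, t2, rel)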
I know you do not have an NVIDIA GPU and the Parallel Computing Toolbox, but if you did, this would work:
If you replace the second test in your code by:
tic;
for ii=1:10
    A2=gpuArray(A); % so we account for memory management
    b2=gpuArray(b);
    x=A2\b2;
end
out2=toc;
My PC says (CPU vs GPU)
Case 1 time per iteration :0.011881
Case 2 time per iteration :0.0052003
I have a simple question. I'm trying to evaluate the improper integral of the 0th-order Bessel function using Matlab R2012a:
v = integral(@(x)besselj(0, x), 0, Inf)
which gives me v = 3.7573e+09. However, this should be v = 1 in theory. When I try to do
v = integral(@(l)besselj(0,l), 0, 1000)
it results in v = 1.0047. Could you briefly explain what is going wrong with the integration, and how to properly integrate Bessel-type functions?
From the docs, to do an improper integral of an oscillatory function:
q = integral(fun,0,Inf,'RelTol',1e-8,'AbsTol',1e-13)
in the docs the example is
fun = @(x)x.^5.*exp(-x).*sin(x);
but I guess in your case try:
q = integral(@(x)besselj(0, x),0,Inf,'RelTol',1e-8,'AbsTol',1e-13)
At first I was sceptical that taking an integral over a Bessel function would produce finite results. Mathematica/Wolfram Alpha however showed that the result is finite, but it is not for the faint of heart.
However, then I was pointed to this site where it is explained how to do it properly, and that the value of the integral should be 1.
I experimented a bit to verify the correctness of their statements:
F = @(z) arrayfun(@(y) quadgk(@(x)besselj(0,x), 0, y), z);
z = 10:100:1e4;
plot(z, F(z))
which gave a plot of the running integral oscillating around and settling toward 1,
so clearly, the integral indeed seems to converge to 1. Shame on Wolfram Alpha!
(Note that this is kind of a misleading plot; try to do it with z = 10:1e4; and you'll see why. But oh well, the principle is the same anyway).
This figure also shows precisely the problem you're experiencing in Matlab; the value of the integral behaves like a damped oscillation around 1 for increasing x. The problem is, the damping is very weak -- as you can see, my z needed to go all the way to 10,000 just to produce this plot, whereas the oscillation amplitude only decreased by ~0.5.
When you try to do the improper integral by messing around with the 'MaxIntervalCount' setting, you get this:
>> quadgk(@(x)besselj(0,x), 0, inf, 'maxintervalcount', 1e4)
Warning: Reached the limit on the maximum number of intervals in use.
Approximate bound on error is 1.2e+009. The integral may not exist, or
it may be difficult to approximate numerically.
Increase MaxIntervalCount to 10396 to enable QUADGK to continue for
another iteration.
> In quadgk>vadapt at 317
In quadgk at 216
It doesn't matter how high you set the MaxIntervalCount; you'll keep running into this error. Similar things also happen when using quad, quadl, or similar (these underlie the R2012 integral function).
As this warning and plot show, the integral is just not suited to accurate approximation by any quadrature method implemented in standard MATLAB (at least, any that I know of).
I believe the proper analytical derivation, as done on the physics forum, is really the only way to get to the result without having to resort to specialized quadrature methods.
I'm currently working on a rudimentary optimization algorithm in Matlab, and I'm running into issues with Matlab saving variables at ridiculous precision. Within a few iterations the variables are so massive that it's actually triggering some kind of infinite loop in sym.m.
Here's the line of code that's starting it all:
SLine = (m * (X - P(1))) + P(2);
Where P = [2,2] and m = 1.2595. When I type this line of code into the command line manually, SLine is saved as the symbolic expression (2519*X)/2000 - 519/1000. I'm not sure why it isn't using a decimal approximation, but at least these fractions have the correct value. When this line of code runs in my program, however, it saves SLine as the expression (2836078626493975*X)/2251799813685248 - 584278812808727/1125899906842624, which when divided out isn't even precise to four decimals. These massive fractions are getting carried through my program, growing with each new line of code, and causing it to grind to a halt.
Does anyone have any idea why Matlab is behaving in this way? Is there a way to specify what precision it should use while performing calculations? Thanks for any help you can provide.
You've told us what m and P are, but what is X? X is apparently a symbolic variable. So further computations are all done symbolically.
Welcome to the Joys of Symbolic Computing!
Most Symbolic Algebra systems represent numbers as rationals, $(p,q) = \frac{p}{q}$, and perform rational arithmetic operations (+,-,*,/) on these numbers, which produce rational results. Generally, these results are exact (also called infinite precision).
It is well known that the sizes of the rationals generated by rational operations on rationals grow exponentially. Hence, if you try to solve a realistic problem with any Symbolic Algebra system, you eventually run out of space or time.
Here is the last word on this topic, where Nick Trefethen FRS shows why floating point arithmetic is absolutely vital for solving realistic numeric problems.
http://people.maths.ox.ac.uk/trefethen/publication/PDF/2007_123.pdf
Try this in Matlab:
function xnew = NewtonSym(xstart,niters)
% Symbolic Newton on simple polynomial
% Derek O'Connor 2 Dec 2012. derekroconnor@eircom.net
x = sym(xstart,'f');
for iter = 1:niters
    xnew = x - (x^5-2*x^4-3*x^3+3*x^2-2*x-1)/...
               (5*x^4-8*x^3-9*x^2+6*x-2);
    x = xnew;
end
function xnew = TestNewtonSym(maxits)
% Test the running time of Symbolic Newton
% Derek O'Connor 2 Dec 2012.
time=zeros(maxits,1);
for niters=1:maxits
    xstart=0;
    tic;
    xnew = NewtonSym(xstart,niters);
    time(niters,1)=toc;
end
semilogy((1:maxits)',time)
So, from the MATLAB reference documentation on symbolic computations, the symbolic representation will always be in exact rational form, as opposed to a decimal approximation of a floating-point number [1]. The reason this is done, apparently, is "to help avoid rounding errors and representation errors" [2].
The exact representation is something that cannot be overcome by doing only symbolic arithmetic. However, you can use Variable-Precision Arithmetic (vpa) in Matlab to evaluate expressions to a chosen precision [3].
For example
>> sym(pi)
ans =
pi
>> vpa(sym(pi))
ans =
3.1415926535897932384626433832795
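As a follow-up sketch (my own addition, using the documented digits function, not something from the references below), the working precision used by vpa can be set globally or per call:
% Illustrative only: control how many significant digits vpa uses.
digits(10)          % default precision for subsequent vpa calls
vpa(sym(pi))        % ans = 3.141592654
vpa(sym(pi), 50)    % per-call override: 50 significant digits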
References
[1] http://www.mathworks.com/help/symbolic/create-symbolic-numbers-variables-and-expressions.html
[2] https://en.wikibooks.org/wiki/MATLAB_Programming/Advanced_Topics/Toolboxes_and_Extensions/Symbolic_Toolbox
[3] http://www.mathworks.com/help/symbolic/vpa.html
Sorry if this is obvious but I searched a while and did not find anything (or missed it).
I'm trying to solve linear systems of the form Ax=B with A a 4x4 matrix, and B a 4x1 vector.
I know that for a single system I can use mldivide to obtain x: x=A\B.
However I am trying to solve a great number of systems (possibly > 10000) and I am reluctant to use a for loop because I was told it is notably slower than matrix formulation in many MATLAB problems.
My question is then: is there a way to solve Ax=B using vectorization, with A a 4x4xN array and B a 4xN matrix?
PS: I do not know if it is important but the B vector is the same for all the systems.
You should use a for loop. There might be a benefit in precomputing a factorization and reusing it, if A stays the same and B changes. But for your problem where A changes and B stays the same, there's no alternative to solving N linear systems.
You shouldn't worry too much about the performance cost of loops either: the MATLAB JIT compiler means that loops can often be just as fast on recent versions of MATLAB.
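To make the two situations concrete, here is a minimal sketch (my own illustration, not part of the original answer); the decomposition object used in the second half assumes you are on R2017b or newer:
% Your case: A changes per system, B is fixed -> just loop over the solves.
N = 10000;
A = rand(4,4,N);  B = rand(4,1);
X = zeros(4,N);
for i = 1:N
    X(:,i) = A(:,:,i)\B;
end

% Opposite case, for reference: A fixed, many right-hand sides ->
% factor once and reuse (decomposition requires R2017b or newer).
A0 = rand(4);  Bs = rand(4,N);
dA = decomposition(A0);   % or: [L,U,P] = lu(A0);
Xs = dA \ Bs;             % all columns solved with the same factorization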
I don't think you can optimize this further. As explained by @Tom, since A is the one changing, there is no benefit in factoring the various A's beforehand...
Besides, the looped solution is pretty fast given the dimensions you mention:
A = rand(4,4,10000);
B = rand(4,1); % same for all linear systems
tic
X = zeros(4,size(A,3));
for i=1:size(A,3)
    X(:,i) = A(:,:,i)\B;
end
toc
Elapsed time is 0.168101 seconds.
Here's the problem:
you're trying to perform a 2D operation (mldivide) on a 3D array. No matter how you look at it, you need to reference the matrix by index, which is where the time penalty kicks in... it's not the for loop that is the problem, but how people use it.
If you can structure your problem differently, then perhaps you can find a better option, but right now you have a few options:
1 - mex
2 - parallel processing (write a parfor loop; see the sketch after this list)
3 - CUDA
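A minimal sketch of option 2 (my own illustration; it assumes the Parallel Computing Toolbox is available and reuses the shapes from the question):
% Illustrative parfor version of the looped solve (needs the Parallel Computing Toolbox).
A = rand(4,4,10000);
B = rand(4,1);
X = zeros(4,size(A,3));
parfor i = 1:size(A,3)
    X(:,i) = A(:,:,i)\B;   % each small system is independent, so iterations run in parallel
end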
Here's a rather esoteric solution that takes advantage of MATLAB's peculiar optimizations. Construct an enormous 4k x 4k sparse matrix with your 4x4 blocks down the diagonal. Then solve all simultaneously.
On my machine this gets the same solution up to single precision accuracy as @Amro/Tom's for-loop solution, but faster.
n = size(A,1);
k = size(A,3);
AS = reshape(permute(A,[1 3 2]),n*k,n);
S = sparse( ...
    repmat(1:n*k,n,1)', ...
    bsxfun(@plus,reshape(repmat(1:n:n*k,n,1),[],1),0:n-1), ...
    AS, ...
    n*k,n*k);
X = reshape(S\repmat(B,k,1),n,k);
for a random example:
For k = 10000
For loop: 0.122570 seconds.
Giant sparse system: 0.032287 seconds.
If you know that your 4x4 matrices are positive definite then you can use chol on S to improve the accuracy.
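A sketch of that suggestion (my own, under the assumption that every 4x4 block is symmetric positive definite, which makes the block-diagonal S symmetric positive definite as well):
% Illustrative only: replace the general sparse solve with a sparse Cholesky
% factorization of S (valid only for symmetric positive definite blocks).
R = chol(S);                              % S = R'*R, with R sparse upper triangular
X = reshape(R\(R'\repmat(B,k,1)), n, k);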
This is silly. But so is how slow matlab's for loops still are in 2015, even with JIT. This solution seems to find a sweet spot when k is not too large so everything still fits into memory.
I know this post is years old now, but I'll contribute my two cents anyway. You CAN put all of your A matrices into a bigger block diagonal matrix, where there will be 4x4 blocks on the diagonal of a big matrix. The right hand side will be all of your b vectors stacked on top of each other, one for each system. Once you set this up, it is represented as a sparse system and can be efficiently solved with the algorithms mldivide chooses. The blocks are numerically decoupled, so even if there are singular blocks in there, the answers for the nonsingular blocks should be right when you use mldivide. There is code that takes this approach on MATLAB Central:
http://www.mathworks.com/matlabcentral/fileexchange/24260-multiple-same-size-linear-solver
I suggest experimenting to see if the approach is any faster than looping. I suspect it can be more efficient, especially for large numbers of small systems. In particular, if there are nice formulas for the coefficients of A across the N matrices, you can build the full left hand side using MATLAB vector operations (without looping), which could give you additional cost savings. As others have noted, vectorized operations aren't always faster, but they often are in my experience.