Matlab inverse operation and warning - matlab

I'm not quite sure what this warning means:
"Warning: Matrix is singular to working precision."
I have a 3x4 matrix called bestM.
Matrix Q is the left 3x3 block of bestM, and m is the last column of bestM.
I would like to compute C = -inv(Q) * m,
but I get that warning,
and C = [Inf Inf Inf], which isn't right because I am calculating the camera center in world coordinates.
bestM = [-0.0031 -0.0002 0.0005 0.9788;
-0.0003 -0.0006 0.0028 0.2047;
-0.0000 -0.0000 0.0000 0.0013];
Q = bestM(1:3,1:3);
m = bestM(:,4);
X = inv(Q);
C = -X*m;
disp(C);

A singular matrix can be thought of as the matrix equivalent of zero: when you try to invert 0 it blows up (goes to infinity), which is what you are getting here. user 1281385 is absolutely wrong about using the format command to increase precision; the format command only changes how values are displayed to you. In fact, the very first line of the help text for format says
format does not affect how MATLAB computations are done.
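To see why Q counts as singular here, a quick check is to look at its rank and reciprocal condition number. This is only a sketch based on the values as printed in the question, which are rounded to four decimals: with those rounded values the last row of Q is exactly zero, so no inverse exists. The unrounded bestM may behave differently.

```matlab
% Values copied from the question. They are rounded to four decimals,
% so the third row of Q is exactly zero here (an assumption about the
% posted data, not about the original unrounded matrix).
bestM = [-0.0031 -0.0002 0.0005 0.9788;
         -0.0003 -0.0006 0.0028 0.2047;
         -0.0000 -0.0000 0.0000 0.0013];
Q = bestM(1:3,1:3);

r  = rank(Q);    % number of linearly independent rows/columns
rc = rcond(Q);   % reciprocal condition estimate; values near 0 mean (nearly) singular

fprintf('rank(Q) = %d, rcond(Q) = %g\n', r, rc);
```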

As found here, a singular matrix is one that does not have an inverse. As dvreed77 already pointed out, you can think of this as 1/0 for matrices.
The reason I'm answering is to tell you that using inv explicitly is almost never a good idea. If you need the same inverse a few hundred times, it might be worth it; in most circumstances, however, you're interested in the product C:
C = -inv(Q)*m
which can be computed much more accurately and faster in Matlab using the backslash operator:
C = -Q\m
Type help slash for more information on that. And even if you happen to find yourself in a situation where you really need the inverse explicitly, I'd still advise you to avoid inv:
invQ = Q\eye(size(Q))
Below is a little performance test to demonstrate one of the very few situations where the explicit inverse can be handy:
% This test will demonstrate the one case I ever encountered where
% an explicit inverse proved useful. Unfortunately, I cannot disclose
% the full details without breaking the law, but roughly, it came down
% to this: The (large) design matrix A, a result of a few hundred
% co-registered images, needed to be used to solve several thousands
% of systems, where the result matrices b came from processing the
% images one-by-one.
%
% That means the same design matrix was re-used thousands of times, to
% solve thousands of systems at a time. To add to the fun, the images
% were also complex-valued, but I'll leave that one out of consideration
% for now :)
clear; clc
% parameters for this demo
its = 1e2;
sz = 2e3;
Bsz = 2e2;
% initialize design matrix
A = rand(sz);
% initialize cell-array to prevent allocating memory from consuming
% unfair amounts of time in the first loop.
% Also, initialize them, NOT copy them (as in D=C,E=D), because Matlab
% follows a lazy copy-on-write scheme, which would influence the results
C = {cellfun(@(~) zeros(sz,Bsz), cell(its,1), 'uni', false) zeros(its,1)};
D = {cellfun(@(~) zeros(sz,Bsz), cell(its,1), 'uni', false) zeros(its,1)};
E = {cellfun(@(~) zeros(sz,Bsz), cell(its,1), 'uni', false) zeros(its,1)};
% The impact of rand() is the same in both loops, so it has no
% effect, it just gives a longer total run time. Still, we do the
% rand explicitly to *include* the indexing operation in the test.
% Also, caching will most definitely influence the results, because
% any compiler (JIT), even without optimizations, might recognize the
% easy performance gain when the code computes the same array over and
% over again. It probably will, but we have no control over when and
% where that happens. So, we prevent that from happening at all, by
% re-initializing b at every iteration.
% The assignment to cell is a necessary part of the demonstration;
% it is the desired output of the whole calculation. Assigning to cell
% instead of overwriting 'ans' takes some time, which is to be included
% in the demonstration, again for cache reasons: the extra time is now
% guaranteed to be equal in both loops, so it really does not matter --
% only the total run time will be affected.
% Direct computation
start = tic;
for ii = 1:its
b = rand(sz,Bsz);
C{ii,1} = A\b;
C{ii,2} = max(max(abs( A*C{ii,1}-b )));
end
time0 = toc(start);
[max([C{:,2}]) mean([C{:,2}]) std([C{:,2}])]
% LU factorization (everyone's
start = tic;
[L,U,P] = lu(A, 'vector');
for ii = 1:its
b = rand(sz,Bsz);
D{ii,1} = U\(L\b(P,:));
D{ii,2} = max(max(abs( A*D{ii,1}-b )));
end
time1 = toc(start);
[max([D{:,2}]) mean([D{:,2}]) std([D{:,2}])]
% explicit inv
start = tic;
invA = A\eye(size(A)); % NOTE: DON'T EVER USE INV()!
for ii = 1:its
b = rand(sz,Bsz);
E{ii,1} = invA*b;
E{ii,2} = max(max(abs( A*E{ii,1}-b )));
end
time2 = toc(start);
[max([E{:,2}]) mean([E{:,2}]) std([E{:,2}])]
speedup0_1 = (time0/time1-1)*100
speedup1_2 = (time1/time2-1)*100
speedup0_2 = (time0/time2-1)*100
Results:
% |Ax-b|
1.0e-12 * % max. mean st.dev.
0.1121 0.0764 0.0159 % A\b
0.1167 0.0784 0.0183 % U\(L\b(P,:))
0.0968 0.0845 0.0078 % invA*b
speedup0_1 = 352.57 % percent
speedup1_2 = 12.86 % percent
speedup0_2 = 410.80 % percent
It should be clear that an explicit inverse has its uses, but just like a goto construct in any language, it should be used sparingly and wisely.

Related

Coding a recursion in MATLAB

I have been stuck trying to write a MATLAB algorithm that computes a recursion in reverse, or that is what it seems like to me.
y_n = 1/n - 10*y_{n-1} for n = 1,...,30 works in MATLAB, but because of the factor of 10, round-off error is amplified at every step, which makes the algorithm unstable and useless. Rearranging the recursion as y_{n-1} = (1/10)*(1/n - y_n) should work: round-off errors are reduced tenfold at each step, potentially making this a stable algorithm.
After a couple days, I still cannot understand the logic needed to code this. Evaluating at y_n-1 is really throwing me in a loop. I was able to tackle the unstable algorithm, but I cannot think of the logic to manipulate the code to make it work. My question lies with how do you code this in MATLAB? I am truly stumped.
% Evaluate the integral yn = integral from 0 to 1 of x^n/(x+10).
% Unstable algorithm:
y(1) = log(11) - log(10);
k = 30;
for n = 1:k
y(n+1) = (1/n) - 10*y(n);
end
n_vector = 0:k;
[n_vector;y]
By manipulating the recursion, the results will be close to true values because of the bound on the error. The current output:
0.0953101798043248
0.0468982019567523
0.0310179804324768
0.0231535290085650
0.0184647099143501
0.0153529008564988
0.0131376581016785
0.0114805618403582
0.0101943815964183
0.00916729514692832
0.00832704853071678
0.00763860560192312
0.00694727731410218
0.00745030378205516
-0.00307446639198020
0.0974113305864686
-0.911613305864686
9.17495658805863
-91.6940103250307
916.992734829255
-9169.87734829255
91698.8211019731
-916988.165565185
9169881.69913012
-91698816.9496345
916988169.536345
-9169881695.32499
91698816953.2869
-916988169532.833
9169881695328.37
-91698816953283.7
What I expect, once the round-off errors are taken care of, is for the results to stay between 0 and 1.
This output you are getting is correct, and as pointed out in the comments by Mad Physicist, the recursive function you have should behave this way.
If you look at the behavior of the two terms, as n gets bigger the initial subtraction will have less of an effect on the 10*y(n) term. So for large n, we can ignore 1/n.
At large n we then expect each step will increase our value by roughly a factor of 10. This is what you see in your output.
As for writing a backward recursion: by definition you need a starting value, so you would need to assume a value for y(30) and run the recursion backward, as suggested in the comments.
So, I was able to answer my own question. The code needed would look like this:
% This function calculates the value of y20 with a guarantee to have an
% absolute error less than 10^-5
% The yn1 chosen to be high enough to guarantee this is n1 = 25
% Returns the value of y(20)
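function x = formula(k)
% RECURSION APPROXIMATION (requires k >= 21)
y = zeros(1,k); % y(n) stores y_n; the starting value y_k is approximated by 0
for n = k:-1:21 % step backward one index at a time: each step needs the previous result
y(n-1) = (1/10)*(1/n - y(n));
end
x = y(20);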
% FURTHER: I needed to guarantee y20 to have <= 10^-5 magnitude error
% I determined n=25 would be my starting point, approximating y25=0 and working
% backwards to n=20 as I did above.
% y(n-1) = 1/10*(1/n - y(n)) "exact solution"
% y(n-1)* = 1/10*(1/n - y(n)*) "approximate solution with error"
% y(n-1) - y(n-1)* = 1/10*(1/n - y(n)) - 1/10*(1/n - y(n)*) calculating the error
% = 1/10*(y(n)* - y(n))
% So, in magnitude,
% E(n-1) = 1/10*E(n)
% E(n-2) = 1/100*E(n)
% E(n-3) = 1/1000*E(n)
% E(n-4) = 1/10000*E(n)
% E(n-5) = 1/100000*E(n) => a factor of 10^(-5)
% E(20) = (10^-5)*E(25)
% Therefore, if we start with n1=25, y20 is guaranteed to carry only 10^-5 times
% the initial propagating error.
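One way to sanity-check the backward recursion is to compare its y_20 against direct numerical quadrature of the integral from the question. The sketch below assumes the built-in integral function is available; the variable names k, y, and yRef are just illustrative, and the starting guess y_25 = 0 follows the reasoning above.

```matlab
% Backward recursion y_{n-1} = (1/n - y_n)/10, started from the guess y_25 = 0.
k = 25;
y = zeros(1, k);                 % y(n) stores y_n
for n = k:-1:21
    y(n-1) = (1/n - y(n)) / 10;  % each step shrinks the inherited error tenfold
end

% Reference value: y_20 = integral from 0 to 1 of x^20/(x+10) dx.
yRef = integral(@(x) x.^20 ./ (x + 10), 0, 1);

fprintf('recursion: %.10f, quadrature: %.10f, difference: %.2e\n', ...
        y(20), yRef, abs(y(20) - yRef));
```

The difference should be well below the 10^-5 target, because the true y_25 is itself small and its error is divided by 10 five times.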

Vectorizing a matlab loop with internal functions

I have a 3D Mesh grid, X, Y, Z. I want to create a new 3D array that is a function of X, Y, & Z. That function comprises the sum of several 3D Gaussians located at different points. Currently, I have a for loop that runs over the different points where I have my gaussians, and I have an array of center locations r0(nGauss, 1:3)
[X,Y,Z]=meshgrid(-10:.1:10);
Psi=0*X;
for index = 1:nGauss
Psi = Psi + Gauss3D(X,Y,Z,[r0(index,1),r0(index,2),r0(index,3)]);
end
where my 3D gaussian function is
function output=Gauss3D(X,Y,Z,r0)
output=exp(-((X-r0(1)).^2 + (Y-r0(2)).^2 + (Z-r0(3)).^2));
end
I'm happy to redesign the function, which is the slowest part of my code and has to run many, many times, but I can't figure out how to vectorize this so that it will run faster. Any suggestions would be appreciated.
NB: the original function had a square root in it, and has been modified to make it an actual Gaussian.
NOTE! I've modified your code to create a Gaussian, which was:
output=exp(-sqrt((X-r0(1)).^2 + (Y-r0(2)).^2 + (Z-r0(3)).^2));
That does not make a Gaussian. I changed this to:
output = exp(-((X-r0(1)).^2 + (Y-r0(2)).^2 + (Z-r0(3)).^2));
(note no sqrt). This is a Gaussian with sigma = sqrt(1/2).
If this is not what you want, then this answer might not be very useful to you, because your function does not go to 0 as fast as the Gaussian, and therefore is harder to truncate, and it is not separable.
Vectorizing this code is pointless, as the other answers attest. MATLAB's JIT is perfectly capable of running this as fast as it'll go. But you can reduce the amount of computation significantly by noting that the Gaussian goes to almost zero very quickly, and is separable:
Most of the exp evaluations you're doing here yield a very tiny number. You don't need to compute those, just fill in 0.
exp(-x.^2-y.^2) is the same as exp(-x.^2).*exp(-y.^2), which is much cheaper to compute.
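The separability claim can be checked on a small 2D example before applying it in 3D. This is a minimal sketch with made-up grid vectors x and y; it builds the same array once with one exp call per grid point and once as an outer product of two 1D Gaussians.

```matlab
% Illustrative grid vectors (not the ones from the question).
x = -3:0.5:3;
y = -2:0.5:2;

% Full evaluation: numel(x)*numel(y) calls to exp.
[X, Y] = meshgrid(x, y);
full2d = exp(-(X.^2 + Y.^2));

% Separable evaluation: numel(x)+numel(y) calls to exp, then an outer product.
sep2d = exp(-y(:).^2) * exp(-x.^2);   % (column over y) * (row over x)

maxDiff = max(abs(full2d(:) - sep2d(:)));
fprintf('max difference = %g\n', maxDiff);
```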
Let's put these two things to the test. Here is the test code:
function gaussian_test
N = 100;
r0 = rand(N,3)*20 - 10;
% Original
tic
[X,Y,Z] = meshgrid(-10:.1:10);
Psi1 = zeros(size(X));
for index = 1:N
Psi1 = Psi1 + Gauss3D(X,Y,Z,r0(index,:));
end
t = toc;
fprintf('original, time = %f\n',t)
% Fast, large truncation
tic
[X,Y,Z] = deal(-10:.1:10);
Psi2 = zeros(numel(X),numel(Y),numel(Z));
for index = 1:N
Psi2 = Gauss3D_fast(Psi2,X,Y,Z,r0(index,:),5);
end
t = toc;
fprintf('truncation = 5, time = %f\n',t)
fprintf('mean abs error = %f\n',mean(reshape(abs(Psi2-Psi1),[],1)))
fprintf('mean square error = %f\n',mean(reshape((Psi2-Psi1).^2,[],1)))
fprintf('max abs error = %f\n',max(reshape(abs(Psi2-Psi1),[],1)))
% Fast, smaller truncation
tic
[X,Y,Z] = deal(-10:.1:10);
Psi3 = zeros(numel(X),numel(Y),numel(Z));
for index = 1:N
Psi3 = Gauss3D_fast(Psi3,X,Y,Z,r0(index,:),3);
end
t = toc;
fprintf('truncation = 3, time = %f\n',t)
fprintf('mean abs error = %f\n',mean(reshape(abs(Psi3-Psi1),[],1)))
fprintf('mean square error = %f\n',mean(reshape((Psi3-Psi1).^2,[],1)))
fprintf('max abs error = %f\n',max(reshape(abs(Psi3-Psi1),[],1)))
% DIPimage, same smaller truncation
tic
Psi4 = newim(201,201,201);
coords = (r0+10) * 10;
Psi4 = gaussianblob(Psi4,coords,10*sqrt(1/2),(pi*100).^(3/2));
t = toc;
fprintf('DIPimage, time = %f\n',t)
fprintf('mean abs error = %f\n',mean(reshape(abs(Psi4-Psi1),[],1)))
fprintf('mean square error = %f\n',mean(reshape((Psi4-Psi1).^2,[],1)))
fprintf('max abs error = %f\n',max(reshape(abs(Psi4-Psi1),[],1)))
end % of function gaussian_test
function output = Gauss3D(X,Y,Z,r0)
output = exp(-((X-r0(1)).^2 + (Y-r0(2)).^2 + (Z-r0(3)).^2));
end
function Psi = Gauss3D_fast(Psi,X,Y,Z,r0,trunc)
% sigma = sqrt(1/2)
x = X-r0(1);
y = Y-r0(2);
z = Z-r0(3);
mx = abs(x) < trunc*sqrt(1/2);
my = abs(y) < trunc*sqrt(1/2);
mz = abs(z) < trunc*sqrt(1/2);
Psi(my,mx,mz) = Psi(my,mx,mz) + exp(-x(mx).^2) .* reshape(exp(-y(my).^2),[],1) .* reshape(exp(-z(mz).^2),1,1,[]);
% Note! the line above uses implicit singleton expansion. For older MATLABs use bsxfun
end
This is the output on my machine, reordered for readability (I'm still on MATLAB R2017a):
method         | time (s) | mean abs | mean sq. | max abs
---------------+----------+----------+----------+----------
original       | 5.035762 |          |          |
truncation = 5 | 0.169807 | 0.000000 | 0.000000 | 0.000005
truncation = 3 | 0.054737 | 0.000452 | 0.000002 | 0.024378
DIPimage       | 0.044099 | 0.000452 | 0.000002 | 0.024378
As you can see, using these two properties of the Gaussian we can reduce time from 5.0 s to 0.17 s, a 30x speedup, with hardly noticeable differences (truncating at 5*sigma). A further 3x speedup can be gained by allowing a small error. The smaller the truncation value, the faster this will go, but the larger the error will be.
I added that last method, the gaussianblob function from DIPimage (I'm an author), just to show that option in case you need to squeeze that bit of extra time from your code. That function is implemented in C++. This version that I used you will need to compile yourself. Our current official release implements this function still in M-file code, and is not as fast.
Further chance of improvement is if the fractional part of the coordinates is always the same (w.r.t. the pixel grid). In this case, you can draw the Gaussian once, and shift it over to each of the centroids.
Another alternative involves computing the Gaussian once, at a somewhat larger scale, and interpolating into it to generate each of the 1D Gaussians needed to generate the output. I did not implement this, I have no idea if it will be faster or if the time difference will be significant. In the old days, exp was expensive, I'm not sure this is still the case.
So, I am building off of the answer above me by @Durkee. I enjoy these kinds of problems, so I thought a little about how to make each of the expansions implicit, and I have the one-line function below. Using this function I shaved .11 seconds off of the call, which is completely negligible. It looks like yours is pretty decent. The only advantage of mine might be how the code scales on a finer mesh.
xLin = [-10:.1:10]';
tic
psi2 = sum(exp(-sqrt((permute(xLin-r0(:,1)',[3 1 4 2])).^2 ...
+ (permute(xLin-r0(:,2)',[1 3 4 2])).^2 ...
+ (permute(xLin-r0(:,3)',[3 4 1 2])).^2)),4);
toc
The relative run times on my computer were (all things kept the same):
Original - 1.234085
Other - 2.445375
Mine - 1.120701
So this is a bit of an unusual problem where, on my computer, the unvectorized code actually works better than the vectorized code. Here is my script:
clear
[X,Y,Z]=meshgrid(-10:.1:10);
Psi=0*X;
nGauss = 20; %Sample nGauss as you didn't specify
r0 = rand(nGauss,3); % Just make this up as it doesn't really matter in this case
% Your original code
tic
for index = 1:nGauss
Psi = Psi + Gauss3D(X,Y,Z,[r0(index,1),r0(index,2),r0(index,3)]);
end
toc
% Vectorize these functions so we can use implicit broadcasting
X1 = X(:);
Y1 = Y(:);
Z1 = Z(:);
tic
val = [X1 Y1 Z1];
% Change the dimensions so that r0 operates on the right elements
r0_temp = permute(r0,[3 2 1]);
% Perform the gaussian combination
out = sum(exp(-sqrt(sum((val-r0_temp).^2,2))),3);
toc
% Check to make sure both functions match
sum(abs(vec(Psi)-vec(out)))
function output=Gauss3D(X,Y,Z,r0)
output=exp(-sqrt((X-r0(1)).^2 + (Y-r0(2)).^2 + (Z-r0(3)).^2));
end
function out = vec(in)
out = in(:);
end
As you can see, this is probably about as vectorized as you can get. The whole function is done using broadcasting and vectorized operations, which normally improve performance ten- to a hundredfold. However, in this case, this is not what we see:
Elapsed time is 1.876460 seconds.
Elapsed time is 2.909152 seconds.
This actually shows the unvectorized version as being faster.
There could be a few reasons for this, though I am by no means an expert on them:
MATLAB uses a JIT compiler now, which means that for loops are no longer inefficient.
Your code is already reasonably vectorized: you are operating on about 8 million elements at once.
Unless nGauss is 1000 or something, you're not looping through that much, and at that point, vectorization means you will run out of memory.
I could be hitting some memory threshold where I am using too much memory and that is making my code inefficient. I noticed that when I lowered the resolution on the meshgrid, the vectorized version worked better.
As an aside, I tested this on my GTX 1060 GPU with single precision (single precision is 10x faster than double precision on most GPUs):
Elapsed time is 0.087405 seconds.
Elapsed time is 0.241456 seconds.
Once again the unvectorized version is faster. Sorry I couldn't help you out, but it seems that your code is about as good as you are going to get unless you lower the resolution of your meshgrid.

How to improve the time consuming of log and exponentiation operation

The theoretical result is a mixture of gamma ratios, like sum(Ai*Gamma(Bi)/Gamma(Ci)), in which Ai is a binomial coefficient that is very hard to calculate using nchoosek directly in MATLAB. So my solution was to decompose every element of the result into prod(vector); however, as the vector gets longer, I run into overflow and underflow problems. So I changed the solution to compute x(1:n) = log(vector) and then rst = sum(exp(x)). In practice, I found this quite time-consuming, especially when the number of gamma terms is very large.
Here is a code section:
gamma_sum = zeros(1,x2+1);
coef = ones(1,x2+1);
% sub_gamma_sum = zeros(1,x2+1);
% coef(1) = prod(1./sqrt(1:x2));
coef(1) = sum(log(1:x2))/2-sum(log([1:1-1 1:x2-1+1]));
if x1>0
% gamma_sum(1) = gamma(beta)/gamma(alpha+beta)/...
% prod((alpha+beta:alpha+beta+x1-1));
% gamma_sum(1) = prod(1./(alpha+beta:alpha+beta+x1-1));
gamma_sum(1) = sum(log(1./(alpha+beta:alpha+beta+x1-1)));
else
% gamma_sum(1) = gamma(beta)/gamma(alpha+beta);
% gamma_sum(1) = 1;
gamma_sum(1) = log(1);
end
for i = 2:x2+1
% coef(i) = prod((1:x2)./[1:i-1 1:x2-i+1]);
% coef(i) = exp(sum(log(1:x2))/2-sum(log([1:i-1 1:x2-i+1])));
coef(i) = sum(log(1:x2))/2-sum(log([1:i-1 1:x2-i+1]));
% coef(i) = prod(1./[1:i-1 1:x2-i+1])*exp(sum(log(1:x2))/2);
% gamma_sum(i) = prod((beta:beta+i-2)./(alpha+beta:alpha+beta+i-2))*prod(1./(alpha+beta+i-1:alpha+beta+x1+i-2));%% den has x1+i-1 terms
gamma_sum(i) = sum(log((beta:beta+i-2)./(alpha+beta:alpha+beta+i-2)))+sum(log(1./(alpha+beta+i-1:alpha+beta+x1+i-2)));
end
In the code, coef is the Ai, and gamma_sum is the remaining part. I found that when x2, i.e. the number of gamma terms, is large, the computing time becomes really troublesome. P.S.: I tried to replace all for loops with matrix operations, but as x2 increases the matrix size also makes the computation time-consuming. Is there any way to solve the problem, e.g. some other method that handles the range problem (numbers exceeding 1e300 or smaller than 1e-200) more efficiently, i.e. one that guarantees the precision and increases the speed?
This might make your system slower, but you can try vpa() for your big numbers, if you need a high precision and if you have a Symbolic Math toolbox. Here is the example:
>> exp(1000)
ans =
Inf
>> vpa('exp(1000)',1000)
ans =
197007111401704699388887935224332312531693798532384578995280299138506385078244119347497807656302688993096381798752022693598298173054461289923262783660152825232320535169584566756192271567602788071422466826314006855168508653497941660316045367817938092905299728580132869945856470286534375900456564355589156220422320260518826112288638358372248724725214506150418881937494100871264232248436315760560377439930623959705844189509050047074217568.2267578083308102070668818911968536445918206584929433885943734416066833995904928281627706135987730904979566512246702227965470280600169740154332169201122794194769119334980240147712089576923975942544366215939426101781299421858554271852298015286303411058042095685866168239536053428580900735188184273075136717125183129388223688310255949141146674987544438726686065824907707203395789112200325628195551034220107289821072957315749621922062772097208051047568893649549635990627082681006282905378167473398226026683503867394140748723651685213836918959449223430784235236845739442
In this way you would be able to use enormous numbers in your calculations at the expense of increased memory usage and lower speed.
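A possible alternative that stays in double precision (my own suggestion, not part of the answer above) is to compute the logarithms directly with the built-in gammaln and to combine the terms with the log-sum-exp trick, so that nothing ever overflows. The sketch below uses a made-up example: the sum of all binomial coefficients for n = 2000, whose exact value 2^n is far beyond realmax but whose logarithm is easy to check.

```matlab
% Hypothetical example: log of sum_{k=0..n} nchoosek(n,k), which equals n*log(2).
n = 2000;
k = 0:n;

% log(nchoosek(n,k)) without forming any factorial or product explicitly.
logBinom = gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1);

% Log-sum-exp: factor out the largest term so exp never overflows.
m = max(logBinom);
logTotal = m + log(sum(exp(logBinom - m)));

fprintf('log of sum = %.6f, n*log(2) = %.6f\n', logTotal, n*log(2));
```

Because gammaln is vectorized, the loop over i in the question could in principle be replaced by a few array expressions of this kind; whether that is fast enough for a given x2 would need to be measured.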

Vectorizing the solution of a linear equation system in MATLAB

Summary: This question deals with the improvement of an algorithm for the computation of linear regression.
I have a 3D (dlMAT) array representing monochrome photographs of the same scene taken at different exposure times (the vector IT) . Mathematically, every vector along the 3rd dimension of dlMAT represents a separate linear regression problem that needs to be solved. The equation whose coefficients need to be estimated is of the form:
DL = R*IT^P, where DL and IT are obtained experimentally and R and P must be estimated.
The above equation can be transformed into a simple linear model after applying a logarithm:
log(DL) = log(R) + P*log(IT) => y = a + b*x
Presented below is the most "naive" way to solve this system of equations, which essentially involves iterating over all "3rd dimension vectors" and fitting a polynomial of order 1 to (IT,DL(ind1,ind2,:)):
%// Define some nominal values:
R = 0.3;
IT = 600:600:3000;
P = 0.97;
%// Impose some believable spatial variations:
pMAT = 0.01*randn(3)+P;
rMAT = 0.1*randn(3)+R;
%// Generate "fake" observation data:
dlMAT = bsxfun(@times,rMAT,bsxfun(@power,permute(IT,[3,1,2]),pMAT));
%// Regression:
sol = cell(size(rMAT)); %// preallocation
for ind1 = 1:size(dlMAT,1)
for ind2 = 1:size(dlMAT,2)
sol{ind1,ind2} = polyfit(log(IT(:)),log(squeeze(dlMAT(ind1,ind2,:))),1);
end
end
fittedP = cellfun(@(x)x(1),sol); %// Estimate of pMAT
fittedR = cellfun(@(x)exp(x(2)),sol); %// Estimate of rMAT
The above approach seems like a good candidate for vectorization, since it does not utilize MATLAB's main strength that is MATrix operations. For this reason, it does not scale very well and takes much longer to execute than I think it should.
There exist alternative ways to perform this computation based on matrix division, as demonstrated here and here, which involve something like this:
sol = [ones(size(x)),log(x)]\log(y);
That is, appending a vector of 1s to the observations, followed by mldivide to solve the equation system.
The main challenge I'm facing is how to adapt my data to the algorithm (or vice versa).
Question #1: How can the matrix-division-based solution be extended to solve the problem presented above (and potentially replace the loops I am using)?
Question #2 (bonus): What is the principle behind this matrix-division-based solution?
The secret ingredient behind the solution that includes matrix division is the Vandermonde matrix. The question discusses a linear problem (linear regression), and those can always be formulated as a matrix problem, which \ (mldivide) can solve in a mean-square error sense. Such an algorithm, solving a similar problem, is demonstrated and explained in this answer.
Below is benchmarking code that compares the original solution with two alternatives suggested in chat:
function regressionBenchmark(numEl)
clc
if nargin<1, numEl=10; end
%// Define some nominal values:
R = 5;
IT = 600:600:3000;
P = 0.97;
%// Impose some believable spatial variations:
pMAT = 0.01*randn(numEl)+P;
rMAT = 0.1*randn(numEl)+R;
%// Generate "fake" measurement data using the relation "DL = R*IT.^P"
dlMAT = bsxfun(@times,rMAT,bsxfun(@power,permute(IT,[3,1,2]),pMAT));
%% // Method1: loops + polyfit
disp('-------------------------------Method 1: loops + polyfit')
tic; [fR,fP] = method1(IT,dlMAT); toc;
fprintf(1,'Regression performance:\nR: %d\nP: %d\n',norm(fR-rMAT,1),norm(fP-pMAT,1));
%% // Method2: loops + Vandermonde
disp('-------------------------------Method 2: loops + Vandermonde')
tic; [fR,fP] = method2(IT,dlMAT); toc;
fprintf(1,'Regression performance:\nR: %d\nP: %d\n',norm(fR-rMAT,1),norm(fP-pMAT,1));
%% // Method3: vectorized Vandermonde
disp('-------------------------------Method 3: vectorized Vandermonde')
tic; [fR,fP] = method3(IT,dlMAT); toc;
fprintf(1,'Regression performance:\nR: %d\nP: %d\n',norm(fR-rMAT,1),norm(fP-pMAT,1));
function [fittedR,fittedP] = method1(IT,dlMAT)
sol = cell(size(dlMAT,1),size(dlMAT,2));
for ind1 = 1:size(dlMAT,1)
for ind2 = 1:size(dlMAT,2)
sol{ind1,ind2} = polyfit(log(IT(:)),log(squeeze(dlMAT(ind1,ind2,:))),1);
end
end
fittedR = cellfun(@(x)exp(x(2)),sol);
fittedP = cellfun(@(x)x(1),sol);
function [fittedR,fittedP] = method2(IT,dlMAT)
sol = cell(size(dlMAT,1),size(dlMAT,2));
for ind1 = 1:size(dlMAT,1)
for ind2 = 1:size(dlMAT,2)
sol{ind1,ind2} = flipud([ones(numel(IT),1) log(IT(:))]\log(squeeze(dlMAT(ind1,ind2,:)))).'; %'
end
end
fittedR = cellfun(@(x)exp(x(2)),sol);
fittedP = cellfun(@(x)x(1),sol);
function [fittedR,fittedP] = method3(IT,dlMAT)
N = 1; %// Degree of polynomial
VM = bsxfun(@power, log(IT(:)), 0:N); %// Vandermonde matrix
result = fliplr((VM\log(reshape(dlMAT,[],size(dlMAT,3)).')).');
%// Compressed version:
%// result = fliplr(([ones(numel(IT),1) log(IT(:))]\log(reshape(dlMAT,[],size(dlMAT,3)).')).');
fittedR = exp(real(reshape(result(:,2),size(dlMAT,1),size(dlMAT,2))));
fittedP = real(reshape(result(:,1),size(dlMAT,1),size(dlMAT,2)));
The reason why method 2 can be vectorized into method 3 is essentially that matrix multiplication can be separated by the columns of the second matrix. If A*B produces matrix X, then by definition A*B(:,n) gives X(:,n) for any n. Moving A to the right-hand side with mldivide, this means that the divisions A\X(:,n) can be done in one go for all n with A\X. The same holds for an overdetermined system (linear regression problem), in which there is no exact solution in general, and mldivide finds the matrix that minimizes the mean-square error. In this case too, the operations A\X(:,n) (method 2) can be done in one go for all n with A\X (method 3).
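The column-by-column argument can be checked numerically on a toy problem. In this sketch the design matrix A reuses the log(IT) values from the question, while the right-hand sides Y are random placeholders (an assumption for illustration only); the point is that one call A\Y gives the same answer as solving each column separately.

```matlab
rng(0);                                  % reproducible placeholder data
IT = 600:600:3000;
A  = [ones(numel(IT),1) log(IT(:))];     % 5x2 design matrix (overdetermined)
Y  = randn(numel(IT), 4);                % four independent right-hand sides

% All regression problems in a single mldivide call.
allAtOnce = A \ Y;                       % 2x4: one column of coefficients per problem

% The same problems solved one column at a time.
oneByOne = zeros(2, size(Y,2));
for n = 1:size(Y,2)
    oneByOne(:,n) = A \ Y(:,n);
end

maxDiff = max(abs(allAtOnce(:) - oneByOne(:)));
fprintf('max difference = %g\n', maxDiff);
```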
The implications of improving the algorithm when increasing the size of dlMAT can be seen below:
For the case of 500*500 (or 2.5E5) elements, the speedup from Method 1 to Method 3 is about x3500!
It is also interesting to observe the output of profile (here, for the case of 500*500):
[Profiler output screenshots for Method 1, Method 2, and Method 3 are not reproduced here.]
From the above it is seen that rearranging the elements via squeeze and flipud takes up about half (!) of the runtime of Method 2. It is also seen that some time is lost on the conversion of the solution from cells to matrices.
Since the 3rd solution avoids all of these pitfalls, as well as the loops altogether (which mostly means re-evaluation of the script on every iteration) - it unsurprisingly results in a considerable speedup.
Notes:
There was very little difference between the "compressed" and the "explicit" versions of Method 3 in favor of the "explicit" version. For this reason it was not included in the comparison.
A solution was attempted where the inputs to Method 3 were gpuArray-ed. This did not improve performance (and even degraded it somewhat), possibly due to a wrong implementation, or the overhead associated with copying matrices back and forth between RAM and VRAM.

tensile tests in matlab

The problem says:
Three tensile tests were carried out on an aluminum bar. In each test the strain was measured at the same values of stress. The results were
where the units of strain are mm/m. Use linear regression to estimate the modulus of elasticity of the bar (modulus of elasticity = stress/strain).
I used this program for this problem:
function coeff = polynFit(xData,yData,m)
% Returns the coefficients of the polynomial
% a(1)*x^(m-1) + a(2)*x^(m-2) + ... + a(m)
% that fits the data points in the least squares sense.
% USAGE: coeff = polynFit(xData,yData,m)
% xData = x-coordinates of data points.
% yData = y-coordinates of data points.
A = zeros(m); b = zeros(m,1); s = zeros(2*m-1,1);
for i = 1:length(xData)
temp = yData(i);
for j = 1:m
b(j) = b(j) + temp;
temp = temp*xData(i);
end
temp = 1;
for j = 1:2*m-1
s(j) = s(j) + temp;
temp = temp*xData(i);
end
end
for i = 1:m
for j = 1:m
A(i,j) = s(i+j-1);
end
end
% Rearrange coefficients so that coefficient
% of x^(m-1) is first
coeff = flipdim(gaussPiv(A,b),1);
The problem is solved without a program as follows
MY ATTEMPT
T=[34.5,69,103.5,138];
D1=[.46,.95,1.48,1.93];
D2=[.34,1.02,1.51,2.09];
D3=[.73,1.1,1.62,2.12];
Mod1=T./D1;
Mod2=T./D2;
Mod3=T./D3;
xData=T;
yData1=Mod1;
yData2=Mod2;
yData3=Mod3;
coeff1 = polynFit(xData,yData1,2);
coeff2 = polynFit(xData,yData2,2);
coeff3 = polynFit(xData,yData3,2);
x1=(0:.5:190);
y1=coeff1(2)+coeff1(1)*x1;
subplot(1,3,1);
plot(x1,y1,xData,yData1,'o');
y2=coeff2(2)+coeff2(1)*x1;
subplot(1,3,2);
plot(x1,y2,xData,yData2,'o');
y3=coeff3(2)+coeff3(1)*x1;
subplot(1,3,3);
plot(x1,y3,xData,yData3,'o');
What do I have to do to get this result?
As general advice:
avoid for loops wherever possible.
avoid using i and j as variable names, as they are Matlab built-in names for the imaginary unit (I really hope that disappears in a future release...)
Because Matlab's M-code is interpreted, for-loops can be very slow compared to their compiled alternatives. Matlab is named for MATrix LABoratory, meaning it is highly optimized for matrix/array operations. Usually, when an operation seems impossible without a loop, Matlab has a built-in function for it that runs way, way faster than a for-loop in Matlab ever will. For example, the mean of the elements in an array is mean(x), the sum of all elements in an array is sum(x), the standard deviation of the elements in an array is std(x), etc. Matlab's power comes from these built-in functions.
So, your problem. You have a linear regression problem. The easiest way in Matlab to solve this problem is this:
%# your data
stress = [ %# in Pa
34.5 69 103.5 138] * 1e6;
strain = [ %# in m/m
0.46 0.95 1.48 1.93
0.34 1.02 1.51 2.09
0.73 1.10 1.62 2.12]' * 1e-3;
%# make linear array for the data
yy = strain(:);
xx = repmat(stress(:), size(strain,2),1);
%# re-formulate the problem into linear system Ax = b
A = [xx ones(size(xx))];
b = yy;
%# solve the linear system
x = A\b;
%# modulus of elasticity is coefficient
%# (NOTE: y-offset is relatively small and can be ignored)
E = 1/x(1)
What you did in the function polynFit is done by A\b, but the \-operator is capable of doing it way faster, way more robustly and way more flexibly than what you tried to do yourself. I'm not saying you shouldn't try to make these things yourself (please keep on doing that, you learn a lot from it!), I'm saying that for the "real" results, always use the \-operator (and check your own results against it as well).
The backslash operator (type help \ on the command prompt) is extremely useful in many situations, and I advise you learn it and learn it well.
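As a quick cross-check of the backslash approach, the same straight-line fit can be compared against the built-in polyfit. This sketch reuses the data from the question but keeps stress in MPa and strain in mm/m (my choice of units, so that the inverse slope comes out directly in GPa); the variable names are illustrative.

```matlab
% Data from the question, kept in MPa and mm/m so the numbers stay well scaled.
stress = [34.5 69 103.5 138];            % MPa
strain = [0.46 0.95 1.48 1.93
          0.34 1.02 1.51 2.09
          0.73 1.10 1.62 2.12].';        % mm/m, one column per test

xx = repmat(stress(:), size(strain,2), 1);   % stress repeated once per test
yy = strain(:);                              % all strain readings stacked

x = [xx ones(size(xx))] \ yy;            % [slope; offset] via backslash
p = polyfit(xx, yy, 1);                  % [slope, offset] via polyfit

maxDiff = max(abs(x(:) - p(:)));
E = 1/x(1);                              % MPa per (mm/m) is GPa
fprintf('max difference = %g, E = %.1f GPa\n', maxDiff, E);
```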
I leave you with this: here's how I would write your polynFit function:
function coeff = polynFit(X,Y,m)
if numel(X) ~= numel(Y)
error('polynFit:size_mismatch',...
'number of elements in matrices X and Y must be equal.');
end
%# bad condition number, rank errors, etc. taken care of by \
coeff = bsxfun(@power, X(:), m:-1:0) \ Y(:);
end
I leave it up to you to figure out how this works.