Vectorization of double for loop including sine of two variables - matlab

I need to numerically evaluate some integrals which are all of the form shown in this image:
These integrals are the matrix elements of an N x N matrix, so I need to evaluate them for all possible combinations of n and m in the range 1 to N. The integrals are symmetric in n and m, which I have exploited in my current nested for loop approach:
function [V] = coulomb3(N, l, R, R0, c, x)
r1 = 0.01:x:R;
r2 = R:x:R0;
r = [r1 r2];
rl1 = r1.^(2*l);
rl2 = r2.^(2*l);
sines = zeros(N, length(r));
V = zeros(N, N);
for i = 1:N;
sines(i, :) = sin(i*pi*r/R0);
end
x1 = length(r1);
x2 = length(r);
for nn = 1:N
for mm = 1:nn
f1 = (1/6)*rl1.*r1.^2.*sines(nn, 1:x1).*sines(mm, 1:x1);
f2 = ((R^2/2)*rl2 - (R^3/3)*rl2.*r2.^(-1)).*sines(nn, x1+1:x2).*sines(mm, x1+1:x2);
value = 4*pi*c*x*trapz([f1 f2]);
V(nn, mm) = value;
V(mm, nn) = value;
end
end
I figured that calling sin(x) in the loop was a bad idea, so I calculate all the needed values and store them. To evaluate the integrals I used trapz, but as the first and the second/third integrals have different ranges the function values need to be calculated separately and then combined.
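Reading the integrands off coulomb3 above (with the lower limit 0.01 standing in for 0), each matrix element appears to be the following. The first term is integrated over r1 and the other two over r2, which is why f1 and f2 are built separately.

$$
V_{nm} = 4\pi c \left[ \int_{0.01}^{R} \frac{r^{2l+2}}{6}\, \sin\frac{n\pi r}{R_0}\, \sin\frac{m\pi r}{R_0}\, dr
+ \int_{R}^{R_0} \left( \frac{R^2}{2}\, r^{2l} - \frac{R^3}{3}\, r^{2l-1} \right) \sin\frac{n\pi r}{R_0}\, \sin\frac{m\pi r}{R_0}\, dr \right]
$$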
I've tried a couple of different ways of vectorizing this, but the only one that gives the correct results takes much longer than the above loop (I used gmultiply, but the arrays it creates are enormous). I've also derived an analytical solution (which is possible assuming m and n are integers and R0 > R > 0), but it involves the cosine integral (cosint in MATLAB), which is extremely slow for large N.
I'm not sure the entire thing can be vectorized without creating very large arrays, but it should at least be possible for the inner loop. Any ideas would be greatly appreciated!
The inputs I use currently are:
R0 = 1000;
R = 8.4691;
c = 0.393*10^(-2);
x = 0.01;
l = 0;   % can reasonably be 0-6
N = 20;  % increasing the value gives the same results,
         % but I would like to be able to do at least N = 600
Using these values:
V(1, 1:3) = [873.379900963549, -5.80688363271849, -3.38139152472590]
The diagonal values, however, never converge with increasing R0, so they are less interesting.

You will lose the gain from the symmetry of the problem with my approach, but that only costs a factor of 2. Odds are that you'll still benefit in the end.
The idea is to use multidimensional arrays, making use of the fact that trapz supports such inputs. I'll demonstrate the first term in your figure, as the other two should be handled similarly, and the point is the technique:
r1 = 0.01:x:R;
r2 = R:x:R0;
r = [r1 r2].';
rl1 = r1.'.^(2*l);
rl2 = r2.'.^(2*l);
sines = zeros(length(r),N); %// CHANGED!!
%// V = zeros(N, N); not needed now, see later
%// you can define sines in a vectorized way as well:
sines = sin(r*(1:N)*pi/R0); %//' now size [Nr, N] !
%// note that implicitly r is of size [Nr, 1, 1]
%// and sines is of size [Nr, N, 1]
sines2mat = permute(sines,[1, 3, 2]); %// size [Nr, 1, N]
%// the first term in V: perform integral along first dimension
%//V1 = 1/6*squeeze(trapz(bsxfun(@times,bsxfun(@times,r.^(2*l+2),sines),sines2mat),1))*x; %// 4*pi*c prefactor might be physics, not math
V1 = 1/6*permute(trapz(bsxfun(@times,bsxfun(@times,r.^(2*l+2),sines),sines2mat),1),[2,3,1])*x; %// 4*pi*c prefactor might be physics, not math
%// (for the first integral proper, restrict r and the rows of sines to the r1 range)
The key point is that bsxfun(@times,r.^(2*l+2),sines) is a matrix of size [Nr,N,1], which is then multiplied by sines2mat using bsxfun again. The result is of size [Nr,N,N], and an element (k1,k2,k3) corresponds to the integrand at radial point k1 with n=k2 and m=k3. Using trapz() explicitly along the first dimension (which would be the default anyway) reduces this to an array of size [1,N,N], which is just what you need once the leading singleton dimension is removed. Update: as per @Dev-iL's comment, you should use permute instead of squeeze to get rid of the leading singleton dimension, as that might be more efficient.
The two other terms can be handled the same way, and of course it might still help if you restructure the integrals based on overlapping and non-overlapping parts.
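A different way to vectorize, not the one used in this answer: with unit spacing, trapz(f) equals sum(w.*f), where w is 1/2 at the two end points and 1 elsewhere. Every V(n,m) is therefore a weighted dot product of two columns of the sine matrix, and the whole N x N matrix is a single matrix product, so only an Nr-by-N sine matrix is stored instead of an Nr-by-N-by-N array. The sketch below assumes the question's integrands and quadrature, uses a smaller R0 and N than the question so that the loop check runs quickly, and relies on implicit expansion (R2016b or later; use bsxfun on older releases).

```matlab
% Illustrative inputs: the question's values, except smaller R0 and N (for speed)
N = 10; l = 0; R = 8.4691; R0 = 50; c = 0.393e-2; x = 0.01;

r1 = 0.01:x:R;
r2 = R:x:R0;
r  = [r1 r2].';                                   % column of all radial points
% Radial factors of the integrands, as in f1 and f2 of the question
g  = [(1/6)*r1.^(2*l).*r1.^2, ...
      ((R^2/2)*r2.^(2*l) - (R^3/3)*r2.^(2*l).*r2.^(-1))].';
% trapz with unit spacing weights the two end points by 1/2 and all others by 1
w  = ones(numel(r), 1);
w([1 end]) = 0.5;
S  = sin(r*(1:N)*pi/R0);                          % S(k,n) = sin(n*pi*r_k/R0)
V  = 4*pi*c*x * (S.' * (S .* (w.*g)));            % N-by-N in one matrix product

% Reference: the question's nested loop with the same quadrature
Vloop = zeros(N);
for nn = 1:N
    for mm = 1:nn
        f = g.' .* S(:, nn).' .* S(:, mm).';      % same integrand as [f1 f2]
        Vloop(nn, mm) = 4*pi*c*x*trapz(f);
        Vloop(mm, nn) = Vloop(nn, mm);
    end
end
fprintf('max relative difference: %g\n', max(abs(V(:) - Vloop(:))) / max(abs(Vloop(:))));
```

The result is symmetric up to rounding, so nothing is lost by not exploiting the symmetry explicitly. For the question's R0 = 1000 and N = 600, each Nr-by-N array is roughly half a gigabyte, which may be the practical limit of this approach.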

MATLAB Indexing Conventions for Vectors / 1D-Arrays

Consider the preallocation of the following two vectors:
vecCol = NaN( 3, 1 );
vecRow = NaN( 1, 3 );
Now the goal is to assign values to those vectors (e.g. within a loop if vectorization is not possible). Is there a convention or best practice regarding the indexing?
Is the following approach recommended?
for k = 1:3
vecCol( k, 1 ) = 1; % Row, Column
vecRow( 1, k ) = 2; % Row, Column
end
Or is it better to code as follows?
for k = 1:3
vecCol(k) = 1; % Element
vecRow(k) = 2; % Element
end
It makes no difference functionally. If the context means that the vectors are always 1D (your naming convention in this example helps) then you can just use vecCol(i) for brevity and flexibility. However, there are some advantages to using the vecCol(i,1) syntax:
It's explicitly clear which type of vector you're using. This is good if it matters, e.g. when using linear algebra, but might be irrelevant if direction is arbitrary.
If you forget to preallocate (bad, but it happens), it ensures the vector grows in the expected direction
It's a good habit to get into so you don't forget when using 2D arrays
It appears to be slightly quicker. The difference is negligible for small arrays, but the benchmark below, with vectors of 10^8 elements, shows a speed improvement of more than 10%.
function benchie()
% Benchmark. Set up large row/column vectors, time value assignment using timeit.
n = 1e8;
vecCol = NaN(n, 1); vecRow = NaN(1, n);
f = @()fullidx(vecCol, vecRow, n);
s = @()singleidx(vecCol, vecRow, n);
timeit(f)
timeit(s)
end
function fullidx(vecCol, vecRow, n)
% 2D indexing, copied from the example in question
for k = 1:n
vecCol(k, 1) = 1; % Row, Column
vecRow(1, k) = 2; % Row, Column
end
end
function singleidx(vecCol, vecRow, n)
% Element indexing, copied from the example in question
for k = 1:n
vecCol(k) = 1; % Element
vecRow(k) = 2; % Element
end
end
Output (tested on Windows 64-bit R2015b, your mileage may vary!)
% f (full indexing): 2.4874 secs
% s (element indexing): 2.8456 secs
Iterating this benchmark over increasing n, we can produce the following plot for reference.
A general rule of thumb in programming is "explicit is better than implicit". Since there is no functional difference between the two, I'd say it depends on context which one is cleaner/better:
if the context uses a lot of matrix algebra and the distinction between row and column vectors is important, use the 2-argument indexing to reduce bugs and facilitate reading
if the context doesn't discriminate much between the two and you're just using vectors as simple arrays, using 1-argument indexing is cleaner
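The point about forgetting to preallocate can be checked directly. This small sketch (the names grownA and grownB are made up for illustration) shows that both forms address the same elements of a preallocated vector, but that on an undefined variable element indexing grows a row vector, while (row, column) indexing grows a column vector.

```matlab
% Element (linear) and (row, column) indexing address the same elements
vecCol = NaN(3, 1);
vecRow = NaN(1, 3);
for k = 1:3
    vecCol(k) = k;        % element indexing
    vecRow(1, k) = k;     % row, column indexing
end
disp(vecCol.')            % 1 2 3
disp(vecRow)              % 1 2 3

% Without preallocation, the index form decides the orientation
clear grownA grownB
grownA(3) = 1;            % element indexing creates a 1-by-3 row vector
grownB(3, 1) = 1;         % (row, column) indexing creates a 3-by-1 column vector
disp(size(grownA))        % 1 3
disp(size(grownB))        % 3 1
```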

"out of memory" error for mvregress in matlab

I am trying to use mvregress with data whose dimensionality is a few hundred (about 300 to 400). With 32 GB of RAM I cannot compute beta, and I get an "out of memory" message. I couldn't find any limitation of mvregress that would prevent me from applying it to vectors of this dimensionality. Am I doing something wrong? Is there any way to do multivariate linear regression on my data?
Here is an example of what goes wrong:
dim=400;
nsamp=1000;
dataVariance = .10;
noiseVariance = .05;
mixtureCenters=randn(dim,1);
X=randn(dim, nsamp)*sqrt(dataVariance ) + repmat(mixtureCenters,1,nsamp);
N=randn(dim, nsamp)*sqrt(noiseVariance ) + repmat(mixtureCenters,1,nsamp);
A=2*eye(dim);
Y=A*X+N;
%without residual term:
A_hat=mvregress(X',Y');
%with residual term:
[B, y_hat]=mlrtrain(X,Y)
where
function [B, y_hat]=mlrtrain(X,Y)
[n,d] = size(Y);
Xmat = [ones(n,1) X];
Xmat_sz=size(Xmat);
Xcell = cell(1,n);
for i = 1:n
Xcell{i} = [kron([Xmat(i,:)],eye(d))];
end
[beta,sigma,E,V] = mvregress(Xcell,Y);
B = reshape(beta,d,Xmat_sz(2))';
y_hat=Xmat * B ;
end
the error is:
Error using bsxfun
Out of memory. Type HELP MEMORY for your options.
Error in kron (line 36)
K = reshape(bsxfun(@times,A,B),[ma*mb na*nb]);
Error in mvregress (line 319)
c{j} = kron(eye(NumSeries),Design(j,:));
and this is result of whos command:
whos
Name                Size          Bytes  Class     Attributes
A                 400x400       1280000  double
N                 400x1000      3200000  double
X                 400x1000      3200000  double
Y                 400x1000      3200000  double
dataVariance        1x1               8  double
dim                 1x1               8  double
mixtureCenters    400x1            3200  double
noiseVariance       1x1               8  double
nsamp               1x1               8  double
Okay, I think I have a solution for you, short version first:
dim=400;
nsamp=1000;
dataVariance = .10;
noiseVariance = .05;
mixtureCenters=randn(dim,1);
X=randn(dim, nsamp)*sqrt(dataVariance ) + repmat(mixtureCenters,1,nsamp);
N=randn(dim, nsamp)*sqrt(noiseVariance ) + repmat(mixtureCenters,1,nsamp);
A=2*eye(dim);
Y=A*X+N;
[n,d] = size(Y);
Xmat = [ones(n,1) X];
Xmat_sz=size(Xmat);
Xcell = cell(1,n);
for i = 1:n
Xcell{i} = kron(Xmat(i,:),speye(d));
end
[beta,sigma,E,V] = mvregress(Xcell,Y);
B = reshape(beta,d,Xmat_sz(2))';
y_hat=Xmat * B ;
Strangely, I could not access the function's workspace (it did not appear in the call stack), which is why I put the body of mlrtrain directly into the script here.
Here's the explanation that might also help you in the future:
Looking at the kron definition, the Kronecker product of an m by n and a p by q matrix has size mp by nq. In your mlrtrain, each Xcell{i} is the product of a 1 by 1001 row of Xmat and the 1000 by 1000 identity, which gives a 1000 by 1001000 matrix with about 10^9 elements. Each element takes up 8 bytes in double precision, so that is about 8 GB per cell, and you have four hundred cells: a total of about 3.2 TB, well out of reach even with your grand 32 GiB.
Seeing that one of your matrices, the eye one, contains mostly zeros, and the resulting matrix contains all possible element product combinations, most of those will be zero, too. For such cases specifically, MATLAB offers the sparse matrix format, which saves a lot of memory, depending on the number of zero elements in a matrix, by storing only the nonzero ones. You can convert a full matrix to a sparse representation with sparse(X), or you can get an identity matrix directly in sparse form with speye(n), which is what I did above. The sparse property propagates to the result of kron, which you should now have enough memory for (it works for me with a quarter of your memory).
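To see the effect on a smaller scale, here is a sketch that builds one Kronecker product the same way as Xcell{i} (a row with a leading 1, times an identity), once with eye and once with speye. The size d = 200 and the 51-element row are made-up values chosen so that the full version fits comfortably in memory.

```matlab
% Made-up small sizes so that the full version is cheap to build
d   = 200;
row = [1, 0.5:0.5:25];              % hypothetical row of Xmat: constant term plus 50 values
Kfull   = kron(row, eye(d));        % full d-by-(d*numel(row)) matrix
Ksparse = kron(row, speye(d));      % same values, stored as sparse
infoFull   = whos('Kfull');
infoSparse = whos('Ksparse');
fprintf('full: %d bytes, sparse: %d bytes, nonzeros: %d of %d\n', ...
    infoFull.bytes, infoSparse.bytes, nnz(Ksparse), numel(Ksparse));
```

Only d*numel(row) of the entries are nonzero, so the sparse storage grows with that count rather than with numel(Ksparse).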
However, what remains is the problem Matthew Gunn mentioned in a comment. I get an error saying:
Error using mvregress (line 260)
Insufficient data to estimate either full or least-squares models.
Preface
If your regressors are all the same across each regression equation and you're interested in the OLS estimate, you can replace a call to mvregress with a simple call to \.
It appears that in the call to mlrtrain you had a matrix transposition error (since corrected). In the language of mvregress, n is the number of observations and d is the number of outcome variables. You generate a matrix Y that is d by n, so you should call mlrtrain(X', Y'), not mlrtrain(X, Y).
If what follows isn't specifically what you're looking for, I suggest you define precisely what you're trying to estimate.
What I would have written if I were you
So much that's been said here is completely off base that I'm posting code of what I would have written if I were you. I've reduced the dimensionality to show the equivalence in your special case to simply calling \. I've also written stuff in a more standard way (i.e. having observations run down the rows and not making matrix transposition errors).
dim=5; % These can go way higher but only if you use my code
nsamp=20; % rather than call mvregress
dataVariance = .10;
noiseVariance = .05;
mixtureCenters=randn(dim,1);
X = randn(nsamp, dim)*sqrt(dataVariance ) + repmat(mixtureCenters', nsamp, 1); %'
E = randn(nsamp, dim)*sqrt(noiseVariance); %noise should be mean zero
B = 2*eye(dim);
Y = X*B+E;
% without constant:
B_hat = mvregress(X,Y); %<-------- slow, blows up with high dimension
B_hat2 = X \ Y; %<-------- fast, fine with higher dimensions
norm(B_hat - B_hat2) % basically 0, i.e. numerically equivalent
% with constant:
B_constant_hat = mlrtrain(X,Y) %<-------- slow, blows up with high dimension
B_constant_hat2 = [ones(nsamp, 1), X] \ Y; % <-- fast, and fine with higher dimensions
norm(B_constant_hat - B_constant_hat2) % basically 0, i.e. numerically equivalent
Explanation
I'll assume you have:
An nsamp by dim sized data matrix X.
An nsamp by ny sized matrix of outcome variables Y
You want the results from regressing each column of Y on data matrix X. That is, we're doing multivariate regression but there's a common data matrix X.
That is, we're estimating:
y_{ij} = \sum_k b_{kj} x_{ik} + e_{ij}   for i = 1...nsamp, j = 1...ny, k = 1...dim
If you're trying to do something different than this, you need to clearly state what you're trying to do!
To regress Y on X you could do:
[beta_mvr, sigma_mvr, resid_mvr] = mvregress(X, Y);
This appears to be horribly slow. The following should match mvregress for the case where you're using the same data matrix for each regression.
beta_hat = X \ Y; % estimate beta using least squares
resid = Y - X * beta_hat; % calculate residual
If you want to construct a new data matrix with a vector of ones, you would do:
X_withones = [ones(nsamp, 1), X];
Further clarification for some that are confused
Let's say we want to run the regression
y_i = \sum_j b_j x_{ij} + e_i   for i = 1...n, j = 1...k
We can construct the n by k data matrix X and an n by 1 outcome vector y. The OLS estimate is bhat = pinv(X' * X) * X' * y, which can also be computed in MATLAB with bhat = X \ y.
If you want to do this multiple times (i.e. run multivariate regression on the same data matrix X), you can construct an outcome matrix Y where EACH column represents a separate outcome variable. Y = [ya, yb, yc, ...]. Trivially, the OLS solution is B = pinv(X'*X)*X'*Y which can be computed as B = X \ Y. The first column of B is the result of regressing Y(:,1) on X. The second column of B is the result of regressing Y(:,2) on X, etc... Under these conditions, this is equivalent to a call to B = mvregress(X, Y)
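A quick numerical check of that equivalence, using deterministic made-up data (the sizes and the sin/cos columns are arbitrary choices for illustration), compares X \ Y against regressing each column of Y separately and against the textbook formula pinv(X'*X)*X'*Y:

```matlab
% Deterministic toy data: common data matrix X with a constant column
nsamp = 30; dim = 3; ny = 4;
t = (1:nsamp).';
X = [ones(nsamp, 1), sin(t*(1:dim))];   % nsamp-by-(dim+1)
Y = cos(t*(1:ny)) + t/nsamp;            % nsamp-by-ny, one outcome per column

B = X \ Y;                               % all ny regressions at once

Bcols = zeros(size(X, 2), ny);
for j = 1:ny
    Bcols(:, j) = X \ Y(:, j);           % regress one outcome column at a time
end
Bpinv = pinv(X' * X) * X' * Y;           % textbook OLS formula

disp(norm(B - Bcols))                    % essentially 0
disp(norm(B - Bpinv))                    % essentially 0, up to rounding
```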
Even more test code
If regressors are the same and estimation is by simple OLS, there is an equivalence between multivariate regression and equation by equation ordinary least squares.
d = 10;
k = 15;
n = 100;
C = RandomCorr(d + k, 1); %Use any method you like to generate a random correlation matrix
s = randn(d+k , 1) * 10;
S = (s * s') .* C; % generate covariance matrix
mu = randn(d+k,1);
data = mvnrnd(ones(n, 1) * mu', S);
Y = data(:,1:d);
X = data(:,d+1:end);
[b1, sigma] = mvregress(X, Y);
b2 = X \ Y;
norm(b1 - b2)
You will notice b1 and b2 are numerically equivalent. They are equivalent even though sigma is EXTREMELY different from zero.

Determining regression coefficients for data - MATLAB

I am doing a project involving scientific computing. The following are three variables and their values I got after some experiments.
There is also an equation with three unknowns, a, b and c:
x=(a+0.98)/y+(b+0.7)/z+c
How do I get values of a,b,c using the above? Is this possible in MATLAB?
This sounds like a regression problem. Assuming that the unexplained errors in measurements are Gaussian distributed, you can find the parameters via least squares. Basically, you'd have to rewrite the equation so that you get this to the form of ma + nb + oc = p and then you have 6 equations with 3 unknowns (a, b, c) and these parameters can be found through optimization by least squares. Therefore, with some algebra, we get:
za + yb + yzc = xyz - 0.98z - 0.7z
As such, m = z, n = y, o = yz, p = xyz - 0.98z - 0.7z. I'll leave that for you as an exercise to verify that my algebra is right. You can then form the matrix equation:
Ax = d
We would have 6 equations and we want to solve for x where x = [a b c]^{T}. To solve for x, you can employ what is known as the pseudoinverse to retrieve the parameters that best minimize the error between the true output and the output that is generated by these parameters if you were to use the same input data.
In other words:
x = A^{+}d
A^{+} is the pseudoinverse of the matrix A and is matrix-vector multiplied with the vector d.
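As a tiny illustration of x = A^{+}d, here is a made-up straight-line fit with four equations and two unknowns; pinv and the backslash operator give the same least-squares parameters:

```matlab
% Fit a straight line d = p1 + p2*t through four points (toy data)
t = (1:4).';
d = [6; 5; 7; 10];
A = [ones(4, 1), t];        % 4 equations, 2 unknowns
p_pinv = pinv(A) * d;       % x = A^{+} d
p_bs   = A \ d;             % least-squares solve with backslash
disp([p_pinv p_bs])         % both columns are [3.5; 1.4]
```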
To put our thoughts into code, we define our input data, form the A matrix and d vector, where each row of both is one equation, and then solve for our parameters. You can use the mldivide (\) operator to do the job; for an overdetermined system where A has full column rank, it returns the same least-squares solution as the pseudoinverse:
%// Define x y and z
x = [9.98; 8.3; 8.0; 7; 1; 12.87];
y = [7.9; 7.5; 7.4; 6.09; 0.9; 11.23];
z = [7.1; 5.6; 5.9; 5.8; -1.8; 10.8];
%// Define A matrix
A = [z y y.*z];
%// Define d vector
d = x.*y.*z - 0.98*z - 0.7*z;
%// Find parameters via least-squares
params = A\d;
params stores the parameters a, b and c, and we get:
params =
-37.7383
-37.4008
19.5625
If you want to double-check how close the values are, you can simply use the above expression in your post and compare with each of the values in x:
a = params(1); b = params(2); c = params(3);
out = (a+0.98)./y+(b+0.7)./z+c;
disp([x out])
9.9800 9.7404
8.3000 8.1077
8.0000 8.3747
7.0000 7.1989
1.0000 -0.8908
12.8700 12.8910
You can see that it's not exactly close, but the parameters you got would be the best in a least-squares error sense.
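One correction to the algebra: multiplying the original equation through by yz gives (a + 0.98)z + (b + 0.7)y + cyz = xyz, so the right-hand side should be xyz - 0.98z - 0.7y rather than the 0.7z used in the code above (the same applies to d in the RANSAC loop below). Since 0.7z - 0.7y equals A*[0.7; -0.7; 0], the corrected least-squares solution is the one above shifted by exactly [0.7; -0.7; 0], with c unchanged, but the predicted x values, and hence the comparison tables in this answer, would change when rerun. The following sketch checks both facts; a0, b0 and c0 are made-up values for illustration only.

```matlab
% Data from the question
x = [9.98; 8.3; 8.0; 7; 1; 12.87];
y = [7.9; 7.5; 7.4; 6.09; 0.9; 11.23];
z = [7.1; 5.6; 5.9; 5.8; -1.8; 10.8];
A = [z y y.*z];

% Multiplying x = (a+0.98)/y + (b+0.7)/z + c through by y*z gives
%   a*z + b*y + c*y*z = x*y*z - 0.98*z - 0.7*y
d_fixed = x.*y.*z - 0.98*z - 0.7*y;
d_orig  = x.*y.*z - 0.98*z - 0.7*z;     % right-hand side used in the code above
params_fixed = A\d_fixed;
params_orig  = A\d_orig;
disp(params_fixed - params_orig)        % approximately [0.7; -0.7; 0]

% Sanity check with noise-free synthetic data and made-up a, b, c
a0 = 1.5; b0 = -2; c0 = 3;
x_syn = (a0+0.98)./y + (b0+0.7)./z + c0;
p_syn = A \ (x_syn.*y.*z - 0.98*z - 0.7*y);
disp(p_syn.')                           % recovers [a0 b0 c0] up to rounding
```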
Bonus - Fitting with RANSAC
You can see that some of the predicted values (right column in the output) are further off than others. This is because we used all of the points in your data to find the model. One technique used to reduce the influence of bad points and increase the robustness of the model estimate is RANSAC, or RANdom SAmple Consensus. The basic methodology behind RANSAC is that, for a certain number of iterations, you randomly sample the minimum number of points needed to find a model. Once you have this model, you compute the overall error you would get if you used these parameters to describe all of your data. You keep randomly choosing points, fitting the model and computing the error, and the iteration that produced the smallest error gives the parameters you keep for the overall model. (This is a simplified variant: classic RANSAC scores each candidate model by the number of inliers within a threshold.)
One error measure we can define is the sum of absolute differences between the true x values and the predicted x values. There are many other measures, such as the sum of squared errors, but let's stick with something simple for now. Looking at the formulation above, we need a minimum of three equations to determine a, b and c, so for each iteration we randomly select three points (without replacement), find our model, determine the error, and keep the parameters with the smallest error.
Therefore, you could write a RANSAC algorithm like so:
%// Define cost and number of iterations
cost = Inf;
iterations = 50;
%// Set seed for reproducibility
rng(123);
%// Define x y and z
x = [9.98; 8.3; 8.0; 7; 1; 12.87];
y = [7.9; 7.5; 7.4; 6.09; 0.9; 11.23];
z = [7.1; 5.6; 5.9; 5.8; -1.8; 10.8];
for idx = 1 : iterations
%// Determine where we would need to sample
ind = randperm(numel(x), 3);
xs = x(ind); ys = y(ind); zs = z(ind); %// Sample
%// Define A matrix
A = [zs ys ys.*zs];
%// Define d vector
d = xs.*ys.*zs - 0.98*zs - 0.7*zs;
%// Find parameters via least-squares
params = A\d;
%// Determine error
a = params(1); b = params(2); c = params(3);
out = (a+0.98)./y+(b+0.7)./z+c;
err = sum(abs(x - out));
%// If error produced is less than current error
%// then save parameters
if err < cost
cost = err;
final_params = params;
end
end
When I run the above code, I get for my parameters:
final_params =
-38.1519
-39.1988
19.7472
Comparing this with our x, we get:
a = final_params(1); b = final_params(2); c = final_params(3);
out = (a+0.98)./y+(b+0.7)./z+c;
disp([x out])
9.9800 9.6196
8.3000 7.9162
8.0000 8.1988
7.0000 7.0057
1.0000 -0.1667
12.8700 12.8725
As you can see, several values improve, especially the fourth and sixth points. Compare this with the previous version:
9.9800 9.7404
8.3000 8.1077
8.0000 8.3747
7.0000 7.1989
1.0000 -0.8908
12.8700 12.8910
You can see that the first and second values are worse than in the previous version, but the other values are much closer to the true values.
Have fun!

MATLAB: Block matrix multiplying without loops

I have a block matrix [A B C...] and a matrix D (all 2-dimensional). D has dimensions y-by-y, and A, B, C, etc are each z-by-y. Basically, what I want to compute is the matrix [D*(A'); D*(B'); D*(C');...], where X' refers to the transpose of X. However, I want to accomplish this without loops for speed considerations.
I have been playing with the reshape command for several hours now, and I know how to use it in other cases, but this use case is different from the other ones and I cannot figure it out. I also would like to avoid using multi-dimensional matrices if at all possible.
Honestly, a loop is probably the best way to do it. In my image-processing work I have found that a well-written loop that takes advantage of MATLAB's JIT compiler is often faster than all the extra overhead of manipulating the data to be able to use a vectorised operation. A loop like this:
[m, n] = size(A);   % A is the whole block matrix [A1 A2 ...], z-by-(k*y)
b = size(D, 1);     % block width y
T = zeros(n, m);    % result [D*A1'; D*A2'; ...] is (k*y)-by-z
AT = A';
for ii=1:b:n
T(ii:ii+b-1, :) = D * AT(ii:ii+b-1, :);
end
contains only built-in operators and the bare minimum of copying, and given the JIT it is going to be hard to beat. Even if you factor in interpreter overhead, the loop body is still only a single statement with no function calls to consider.
The "loop-free" version, with extra faffing around and memory copying, is to split the matrix and iterate over the blocks with a hidden loop:
blksize = size(D, 1);
blkcnt = size(A, 2) / blksize;
blocks = mat2cell(A, size(A, 1), repmat(blksize,1,blkcnt));
blocks = cellfun(@(x) D*x', blocks, 'UniformOutput', false);
T = cell2mat(blocks.');   % stack the y-by-z products vertically
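Of course, if you have access to the Image Processing Toolbox, you can also cheat horribly (blockproc keeps the block layout, so this gives the products side by side, [D*A1' D*A2' ...]):
T = blockproc(A, [size(A, 1), size(D, 1)], @(x) D*x.data');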
Prospective approach & Solution Code
Given:
M is the block matrix [A B C...], where each A, B, C etc. are of size z x y. Let the number of such matrices be num_mat for easy reference later on.
If those matrices are concatenated along the columns, then M would be of size z x num_mat*y.
D is the matrix to be multiplied with each of those matrices A, B, C etc. and is of size y x y.
Now, as stated in the problem, the output you are after is [D*(A'); D*(B'); D*(C');...], i.e. the multiplication results being concatenated along the rows.
If you are okay with those multiplication results being concatenated along the columns instead, i.e. [D*(A') D*(B') D*(C') ...], you can achieve the same with some reshaping and then performing the matrix multiplication for the entire M with D in one go, which gives a vectorized no-loop approach. To get such a matrix multiplication result, you can do -
mults = D*reshape(permute(reshape(M,z,y,[]),[2 1 3]),y,[]);
But, if you HAVE to get an output with the multiplication results being concatenated along the rows, you need to do some more reshaping like so -
out = reshape(permute(reshape(mults,y,z,[]),[1 3 2]),[],z);
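A small check of the two one-liners above, using made-up sizes with z different from y so that a mix-up between the two dimensions would show, compares them against an explicit loop over the blocks:

```matlab
% Made-up sizes with z ~= y
z = 3; y = 4; num_mat = 5;
M = rand(z, num_mat*y);          % block matrix [A B C ...], each block z-by-y
D = rand(y, y);

% Vectorized versions from above
mults = D*reshape(permute(reshape(M,z,y,[]),[2 1 3]),y,[]);   % [D*A' D*B' ...]
out   = reshape(permute(reshape(mults,y,z,[]),[1 3 2]),[],z); % [D*A'; D*B'; ...]

% Reference: explicit loop over the blocks
ref_cols = zeros(y, num_mat*z);
ref_rows = zeros(num_mat*y, z);
for k = 1:num_mat
    Ak = M(:, (k-1)*y+1 : k*y);                    % k-th z-by-y block
    ref_cols(:, (k-1)*z+1 : k*z) = D*Ak';
    ref_rows((k-1)*y+1 : k*y, :) = D*Ak';
end
fprintf('max abs differences: %g, %g\n', ...
    max(abs(mults(:) - ref_cols(:))), max(abs(out(:) - ref_rows(:))));
```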
Benchmarking
This section covers benchmarking codes comparing the proposed vectorized approach against a naive JIT powered loopy approach to get the desired output. As discussed earlier, depending on how the output array must hold the multiplication results, you can have two cases.
Case I: Multiplication results concatenated along the columns
%// Define size parameters and then define random inputs with those
z = 500; y = 500; num_mat = 500;
M = rand(z,num_mat*y);
D = rand(y,y);
%// Warm up tic/toc.
for k = 1:100000
tic(); elapsed = toc();
end
disp('---------------------------- With loopy approach')
tic
out1 = zeros(z,y*num_mat);
for k1 = 1:y:y*num_mat
out1(:,k1:k1+y-1) = D*M(:,k1:k1+y-1).'; %//'
end
toc, clear out1 k1
disp('---------------------------- With proposed approach')
tic
mults = D*reshape(permute(reshape(M,z,y,[]),[2 1 3]),y,[]);
toc
Case II: Multiplication results concatenated along the rows
%// Define size parameters and then define random inputs with those
z = 500; y = 500; num_mat = 500;
M = rand(z,num_mat*y);
D = rand(y,y);
%// Warm up tic/toc.
for k = 1:100000
tic(); elapsed = toc();
end
disp('---------------------------- With loopy approach')
tic
out1 = zeros(y*num_mat,z);
for k1 = 1:y:y*num_mat
out1(k1:k1+y-1,:) = D*M(:,k1:k1+y-1).'; %//'
end
toc, clear out1 k1
disp('---------------------------- With proposed approach')
tic
mults = D*reshape(permute(reshape(M,z,y,[]),[2 1 3]),y,[]);
out2 = reshape(permute(reshape(mults,y,z,[]),[1 3 2]),[],z);
toc
Runtimes
Case I:
---------------------------- With loopy approach
Elapsed time is 3.889852 seconds.
---------------------------- With proposed approach
Elapsed time is 3.051376 seconds.
Case II:
---------------------------- With loopy approach
Elapsed time is 3.798058 seconds.
---------------------------- With proposed approach
Elapsed time is 3.292559 seconds.
Conclusions
The runtimes show the proposed vectorized approach running about 1.27x faster than the loop in Case I and about 1.15x faster in Case II. So, hopefully this works out for you!
If you want to get A, B, and C from a bigger matrix you can do this, assuming the bigger matrix is called X:
A = X(:,1:y)
B = X(:,y+1:2*y)
C = X(:,2*y+1:3*y)
If there are N such matrices, each of size x by y, you can use reshape like:
F = reshape(X, x, y, N)
Then use a loop to generate a new matrix I call it F1 as:
F1=[];
for n=1:N
F1 = [F1 F(:,:,n)'];
end
Then compute F2 as:
F2 = D*F1;
and finally get your result as:
R = reshape(permute(reshape(F2, y, x, N), [1 3 2]), N*y, x)
Note: this for loop does not slow you down as it is just to reformat the matrix and the multiplication is done in matrix form.

Vectorising 5 nested FOR loops

I am writing a program in MATLAB as a part of my project based on DFT.
Let the N x N data matrix be X and the corresponding DFT matrix be Y. Then the DFT coefficients can be expressed as
Y(k_1,k_2) = \sum_{n_1=0}^{N-1} \sum_{n_2=0}^{N-1} X(n_1,n_2)\, W_N^{n_1 k_1 + n_2 k_2}, \qquad 0 \le k_1, k_2 \le N-1 \qquad (1)
where W_N^{k} = e^{-j 2\pi k / N}.
Since the twiddle factor W_N is periodic, (1) can be expressed as
Y(k_1,k_2) = \sum_{n_1=0}^{N-1} \sum_{n_2=0}^{N-1} X(n_1,n_2)\, W_N^{(n_1 k_1 + n_2 k_2) \bmod N} \qquad (2)
For a given (k_1,k_2), the condition (n_1 k_1 + n_2 k_2) \bmod N = p is satisfied by a set of pairs (n_1,n_2). Hence, by grouping such data and applying the property W_N^{p+N/2} = -W_N^{p}, (2) can be expressed as
Y(k_1,k_2) = \sum_{p=0}^{M-1} Y(k_1,k_2,p)\, W_N^{p}, \qquad M = N/2 \qquad (3)
where
Y(k_1,k_2,p) = \sum_{(n_1,n_2):\, z=p} X(n_1,n_2) - \sum_{(n_1,n_2):\, z=p+M} X(n_1,n_2) \qquad (4)
z = (n_1 k_1 + n_2 k_2) \bmod N \qquad (5)
I am coding a program to find Y(k1,k2,p), i.e. I need to create slices of 2D matrices (a 3D matrix in which each slice is a 2D matrix) from a given 2D square matrix X. The dimensions of X can be up to 512.
Based on the above equations, I have written the following code, which I need to vectorise:
N=size(X,1);
M=N/2;
Y(1:N,1:N,1:M)=0;
for k1 = 1:N
for k2 = 1:N
for p= 1:M
for n1=1:N
for n2=1:N
N1=n1-1; N2=n2-1; P=p-1; K1=k1-1; K2=k2-1;
z=mod((N1*K1+N2*K2),N);
if (z==P)
Y(k1,k2,p)= Y(k1,k2,p)+ X(n1,n2);
elseif (z==(P+M))
Y(k1,k2,p)= Y(k1,k2,p)- X(n1,n2);
end
end
end
end
end
As there are 5 for loops, the execution time is very large for large N. Please suggest a way to eliminate the for loops and vectorise the code; I need it to run as fast as possible. Thanks again!
Here is a first hint to vectorize the innermost loop.
From your code, we can notice that n1, N1, P, K1 and K2 are constant in this loop.
So we can rewrite z as a mask vector as follows:
z = mod(N1*K1+K2*(0:N-1),N);
Then your if-statement is equivalent to adding the sum of the elements of row n1 of X where z==P and subtracting the sum of the elements where z==P+M. Rewriting this is straightforward:
Y(k1,k2,p)= Y(k1,k2,p)+sum(X(n1,z==P))-sum(X(n1,z==P+M));
So your program can be first written as follows:
N=size(X,1);
M=N/2;
Y(1:N,1:N,1:M)=0;
for k1 = 1:N
for k2 = 1:N
for p= 1:M
for n1=1:N
N1=n1-1; P=p-1; K1=k1-1; K2=k2-1;
z=mod(N1*K1+K2*(0:N-1),N);
Y(k1,k2,p) = Y(k1,k2,p)+sum(X(n1,z==P))-sum(X(n1,z==P+M));
end
end
end
end
Then you can do the same thing with n1; for that, you need to construct a 2D array for z, with n1 running down the rows and n2 across the columns:
z = mod(K1*repmat((0:N-1).',1,N)+K2*repmat(0:N-1,N,1),N);
Notice that size(z)==size(X).Then the 2D sum for Y becomes:
Y(k1,k2,p) = Y(k1,k2,p)+sum(X(z==P))-sum(X(z==P+M));
The += operation is no longer needed here, since each element of Y is now accessed only once:
Y(k1,k2,p) = sum(X(z==P))-sum(X(z==P+M));
And so we discard one more loop:
N=size(X,1);
M=N/2;
Y(1:N,1:N,1:M)=0;
for k1 = 1:N
for k2 = 1:N
for p= 1:M
P=p-1; K1=k1-1; K2=k2-1;
z = mod(K1*repmat((0:N-1).',1,N)+K2*repmat(0:N-1,N,1),N);
Y(k1,k2,p) = sum(X(z==P))-sum(X(z==P+M));
end
end
end
Concerning the other loops, I don't think it is worth vectorizing them, as you would have to build a 5D array, which could be huge in memory. My advice is to keep z as a 2D array, the same size as X. If even that does not fit well in memory, just vectorize the innermost loop.
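Going one step further than the loops above, a sketch that swaps in a different technique, accumarray, removes the p loop as well: for each (k1,k2) it sums X over every residue class z = 0...N-1 in one call and then takes the differences of equation (4). It then checks equation (3) against MATLAB's built-in fft2. The sketch uses 0-based k1 and k2 loop variables, a small made-up N = 8 with X = magic(8), and implicit expansion (R2016b or later); it has not been benchmarked against the loops above.

```matlab
% Small test size; N must be even so that M = N/2 is an integer
N = 8;
M = N/2;
X = magic(N);                       % any N-by-N test matrix

Y = zeros(N, N, M);
[n2, n1] = meshgrid(0:N-1);         % n1 varies down rows, n2 across columns
for k1 = 0:N-1
    for k2 = 0:N-1
        z = mod(n1*k1 + n2*k2, N);  % eq. (5) for every (n1, n2) at once
        % accumarray sums X over each residue class z = 0..N-1
        s = accumarray(z(:) + 1, X(:), [N 1]);
        Y(k1+1, k2+1, :) = reshape(s(1:M) - s(M+1:N), 1, 1, M);   % eq. (4)
    end
end

% Check eq. (3) against the built-in 2D FFT
W = exp(-2i*pi/N);
Yrec = sum(Y .* reshape(W.^(0:M-1), 1, 1, M), 3);
fprintf('max |Yrec - fft2(X)| = %g\n', max(abs(Yrec(:) - reshape(fft2(X), [], 1))));
```

If the goal is only the DFT itself, fft2(X) is the direct route; the loop above is mainly useful if the intermediate Y(k1,k2,p) slices are what you need.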