I am using MATLAB to prototype a few matrix multiply techniques and compare efficiency. Eventually, I will move the prototype codes to C. It is for a homework assignment where we need to write an efficient matrix multiply routine (by being aware of cache size, locality, etc.).
I am curious about the efficiency differences between these two very similar loops:
Matrix Multiply Loop 1
- sum over columns of A times elements of B -> column of C
function [C] = dgemm_naivepe( A,B,C,n )
for j=1:n
tempcol=zeros(n,1);
for k=1:n
for i=1:n
tempcol(i)=tempcol(i)+A(i+(k-1)*n)*B(k+(j-1)*n);
end
end
for k=1:n
C(k+(j-1)*n)=tempcol(k);
end
end
end
Matrix Multiply Loop 2
- sum over columns of A times elements of B, accumulated directly into the column of C
function [C] = dgemm_naivepe( A,B,C,n )
for j=1:n
for k=1:n
for i=1:n
C(i+(j-1)*n)=C(i+(j-1)*n)+A(i+(k-1)*n)*B(k+(j-1)*n);
end
end
end
end
After several test runs of various matrix sizes, I found that Loop 1 is faster than Loop 2. Could someone help me understand why this is?
edit: As you can see, the matrices are stored as 1D arrays in column major ordering.
Given two random variables X and Y, where X=(x1,...,xn) and Y=(y1,...,yn) are stored in an n-by-2 matrix A, so A=[X Y], I need to compute the following:
median((x-median(x)).*(y-median(y)))
I'm trying to obtain an estimator of the covariance matrix using the median instead of the mean, for an n-by-t matrix where t is the number of random variables and n is the length of the data set.
So far, I have written the following code:
for i=1:n
for j=1:n
a1=median(A(:,i));
a2=median(A(:,j));
SMM(i,j)=median(((A(:,i)-a1(ones(t,1),:)).*(A(:,j)-a2(ones(t,1),:))));
end
end
Theoretically I should obtain a symmetric semidefinite (positive or negative) matrix, but that's not the case with this code.
Am I making a mistake in the code?
Various points:
For each of your columns of A (x, y), the median (a1, a2) doesn't change. You should compute these outside the loops.
The loops run over n rather than t, but it is t that counts the variables and therefore indexes the output matrix.
I would first subtract the median from each column, to avoid repeatedly doing the same computations:
A = A - median(A,1); % be explicit about which dimension to take the median over!
Next, we'd loop over the t-by-t output elements of the covariance matrix, and compute each of the elements:
t = size(A,2);
SMM = zeros(t,t); % always preallocate output arrays before a loop
for j=1:t
for i=1:t
SMM(i,j) = median(A(:,i).*A(:,j));
end
end
The loop can likely be vectorized, but that leads to a large intermediate matrix, which also slows down the code. So it might not be worth the effort to vectorize. Only try it if this code is too slow!
It should also be possible to run the inner loop from i=j:t, to skip computing the redundant half of the symmetric matrix, instead copying over the previously computed values.
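The symmetric shortcut described above could be sketched as follows. This is only an illustration: the input from rand is made-up test data, and the subtraction is written with bsxfun so that it also runs on releases without implicit expansion.

```matlab
% Hypothetical test data: n = 200 observations of t = 4 variables
A = rand(200, 4);
A = bsxfun(@minus, A, median(A, 1));   % subtract each column's median
t = size(A, 2);
SMM = zeros(t, t);                     % preallocate the t-by-t output
for j = 1:t
    for i = j:t                        % lower triangle, including the diagonal
        SMM(i,j) = median(A(:,i) .* A(:,j));
        SMM(j,i) = SMM(i,j);           % mirror into the upper triangle
    end
end
```

Each off-diagonal median is computed once and copied, so roughly half of the median calls are saved.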
My code works, but it's fairly slow, and I need to run it many times, so it's very inefficient. I am positive there is a more efficient way of calculating it.
The code is an implementation of this equation:
where k(x,y) is the dot product of the two vectors
xi and yj are the rows i,j of the two matrices A and B, respectively.
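Read off from the code below, the quantity being computed appears to be the following. This is a reconstruction from the code only, so treat it as an assumption; m and n are the numbers of rows of A and B:

```latex
D_{xy} = \frac{1}{m(m-1)} \sum_{i=1}^{m} \sum_{\substack{j=1 \\ j \neq i}}^{m} k(x_i, x_j)
       + \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \neq i}}^{n} k(y_i, y_j)
       - \frac{2}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} k(x_i, y_j)
```

The three terms correspond to the variables Kxx, Kyy and Kxy in the code.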
I'd like to also note that the number of rows in each matrix is in the thousands.
Here is my code:
m=size(A,1);
Kxx=0;
for i=1:m
x=A(i,:);
X=A(i+1:end,:);
Kxx=Kxx+2*sum(dot(ones(m-i,1)*x,X,2));
end
Kxx=Kxx/(m*(m-1));
n=size(B,1);
Kyy=0;
for j=1:n
y=B(j,:);
YY=B(j+1:end,:);
Kyy=Kyy+2*sum(dot(ones(n-j,1)*y,YY,2));
end
Kyy=Kyy/(n*(n-1));
Kxy=0;
for i=1:m
x=A(i,:);
for j=1:n
y=B(j,:);
Kxy=Kxy+dot(x,y);
end
end
Kxy=Kxy*2/(m*n);
Dxy=Kxx+Kyy-Kxy;
Your edit makes our job much easier. Here's all you have to do for a fully vectorized solution:
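m=size(A,1);
n=size(B,1);
C=A*A'; % all pairwise dot products between rows of A
Kxx=sum(sum(C-diag(diag(C))))/m/(m-1); % drop the i==j terms on the diagonal
C=B*B'; % same for the rows of B
Kyy=sum(sum(C-diag(diag(C))))/n/(n-1);
Kxy=2*mean(reshape(A*B.',[],1)); % twice the mean over all pairs (x_i, y_j)
Dxy=Kxx+Kyy-Kxy;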
Thanks to @hiandbaii for pointing out that the equivalent of dot for complex vectors involves the conjugate transpose rather than the transpose.
Original, looping version for historical sentimental reasons:
I'm not sure whether the first two loops can be vectorized without huge memory overhead. So while I figure this out, here's a version in which the first two loops are a bit simplified, and the third loop is replaced by a vectorized operation:
%dummy input data
A=rand(5);
B=rand(5);
m=size(A,1);
Kxx=0;
for l=1:m
x=A(l,:);
X=A(l+1:end,:);
Kxx=Kxx+2*sum(X*x.');
end
Kxx=Kxx/(m*(m-1));
n=size(B,1);
Kyy=0;
for l=1:n
y=B(l,:);
YY=B(l+1:end,:);
Kyy=Kyy+2*sum(YY*y.');
end
Kyy=Kyy/(n*(n-1));
Kxy=2*mean(reshape(A*B.',[],1));
Dxy=Kxx+Kyy-Kxy;
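A quick way to convince yourself that the vectorized expressions match the pairwise definition is to compare both on small inputs. The sketch below assumes real-valued data; the sizes and the names D_loop and D_vec are made up for the check.

```matlab
% Hypothetical real-valued test data; rows are the vectors x_i and y_j
A = rand(6, 4);
B = rand(7, 4);
m = size(A, 1);
n = size(B, 1);

% Reference: explicit loops over all pairs with i ~= j
Kxx = 0;
for i = 1:m
    for j = [1:i-1, i+1:m]
        Kxx = Kxx + dot(A(i,:), A(j,:));
    end
end
Kxx = Kxx / (m*(m-1));
Kyy = 0;
for i = 1:n
    for j = [1:i-1, i+1:n]
        Kyy = Kyy + dot(B(i,:), B(j,:));
    end
end
Kyy = Kyy / (n*(n-1));
Kxy = 0;
for i = 1:m
    for j = 1:n
        Kxy = Kxy + dot(A(i,:), B(j,:));
    end
end
Kxy = 2*Kxy / (m*n);
D_loop = Kxx + Kyy - Kxy;

% Vectorized: Gram matrices with the diagonal removed
Cx = A*A';
Cy = B*B';
D_vec = sum(sum(Cx - diag(diag(Cx))))/m/(m-1) ...
      + sum(sum(Cy - diag(diag(Cy))))/n/(n-1) ...
      - 2*mean(reshape(A*B.', [], 1));

fprintf('absolute difference: %g\n', abs(D_loop - D_vec));
```

The difference should be at the level of floating-point round-off.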
I was wondering if someone could help me with my problem.
Let's say that I have the coordinates of MxN vectors in a tensor r of dimensions [M,N,3]. I would like to store all dyadic products r_0'*r_0 in a 3M-by-3N block matrix, where r_0 is the row vector formed from r(m,n,:) for some m and n, and I would like to do this without using for loops.
In case I haven't explained myself clearly, here is example code that shows what I would like to obtain (but using for loops, of course):
N=10;
M=5;
r=rand(M,N,3);
Dyadic=zeros(3*M,3*N);
for m=1:M
a1=3*m-2;
a2=3*m;
for n=1:N
b1=3*n-2;
b2=3*n;
aux(3)=r(m,n,3);
aux(2)=r(m,n,2);
aux(1)=r(m,n,1);
Dyadic(a1:a2,b1:b2)=transpose(aux)*aux;
end
end
Thanks in advance!
You need to use bsxfun(@times, ...) and then re-arrange the elements to get the desired output -
%// Get the multiplication result
mat_mult = bsxfun(@times,permute(r,[1 2 4 3]),r);
%// OR if you would like to keep mat_mult as 3D that could be potentially faster -
%// mat_mult = bsxfun(@times,reshape(r,[],3),permute(reshape(r,[],3),[1 3 2]));
%// Re-arrange elements to have them the way you are indexing in nested loops
Dyadic = reshape(permute(reshape(mat_mult,M,N,3,[]),[3 1 4 2]),M*3,N*3);
The key part of this solution is really the re-arrangement of the elements after we have the multiplication result.
Quick runtime tests with r as a 1000 x 1000 x 3 array show that this bsxfun-based approach gives over a 20x speedup over the nested-loop code listed in the question!
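To check that the re-arrangement really reproduces the block layout of the nested loops, the two results can be compared on a small input. This is a sketch with made-up sizes; D_loop and D_vec are names introduced only for the comparison.

```matlab
M = 5; N = 10;                  % small, made-up sizes
r = rand(M, N, 3);

% Reference: nested loops, one 3-by-3 block per (m,n)
D_loop = zeros(3*M, 3*N);
for m = 1:M
    for n = 1:N
        aux = reshape(r(m,n,:), 1, 3);              % 1-by-3 row vector
        D_loop(3*m-2:3*m, 3*n-2:3*n) = aux.' * aux; % outer product
    end
end

% Vectorized: element-wise products, then re-arrange into blocks
mat_mult = bsxfun(@times, permute(r, [1 2 4 3]), r);  % M-by-N-by-3-by-3
D_vec = reshape(permute(reshape(mat_mult, M, N, 3, []), [3 1 4 2]), M*3, N*3);

fprintf('max abs difference: %g\n', max(abs(D_loop(:) - D_vec(:))));
```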
I am trying to improve the performance of my code by converting some iterations into matrix operations in MATLAB. One of these is the following code, and I need to figure out how to avoid the loops in this operation.
Here gamma_ic & bow are two dimensional matrices.
c & z are variables set from outer iterations.
for z=1:maxNumber,
for c=1:K,
n = 0;
for y2=1:number_documents,
n = n+(gamma_ic(y2,c)*bow(y2,z));
end
mu(z,c) = n / 2.3;
end
end
Appreciate your assistance.
Edit: Added the loops for c and z. The iterations run up to the maximum indices in gamma_ic & bow. Added mu, which is another two-dimensional matrix, to show how n is used.
This should work for you to get mu, which seems to be the desired output -
mu = bow(1:number_documents,1:maxNumber).'*gamma_ic(1:number_documents,1:K)./2.3
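The one-liner works because each mu(z,c) is a dot product between column z of bow and column c of gamma_ic, which is exactly one entry of bow.'*gamma_ic. A small check, with made-up sizes and the hypothetical names mu_loop and mu_vec:

```matlab
number_documents = 8; K = 3; maxNumber = 5;   % made-up sizes
gamma_ic = rand(number_documents, K);
bow = rand(number_documents, maxNumber);

% Reference: the triple loop from the question
mu_loop = zeros(maxNumber, K);
for z = 1:maxNumber
    for c = 1:K
        n = 0;
        for y2 = 1:number_documents
            n = n + gamma_ic(y2,c)*bow(y2,z);
        end
        mu_loop(z,c) = n / 2.3;
    end
end

% Vectorized: one matrix product, then the scalar division
mu_vec = (bow.' * gamma_ic) ./ 2.3;

fprintf('max abs difference: %g\n', max(abs(mu_loop(:) - mu_vec(:))));
```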
I am currently writing code to implement a numerical approximation to the 3D steady-state heat equation using finite difference matrix methods. This involves discretising the 2nd-order PDE into the matrix A and solving Ax=b, where x is the temperature at each of the specified grid points. Further information on this type of problem can be found here:
http://people.nas.nasa.gov/~pulliam/Classes/New_notes/Matrix_ODE.pdf
To set up this problem, I have represented the 3D matrix A by a 2D array, and I address the values in the 1D array b using an indexing function of the form:
i+(j-1)*Nx+Nx*Ny*(k-1)
for the (i,j,k)th element of the 3D matrix, where Nx, Ny, Nz are the number of points in the x, y, z directions. There ends up being a lot of loop computation to create the matrix A and the vector b, and I was wondering what the most computationally efficient and least memory-intensive way to run these loops is, i.e. is it better to use something like
for j=1:Ny
for i=2:Nx-1
b(i+(j-1)*Nx)=D4;
end
end
for j=1:Ny
for i=2:Nx-1
b(i+(j-1)*Nx+Nx*Ny*(Nz-1))=D3;
end
end
or should I condense these into a single loop like:
for j=1:Ny
for i=2:Nx-1
b(i+(j-1)*Nx)=D4;
b(i+(j-1)*Nx+Nx*Ny*(Nz-1))=D3;
end
end
I have preallocated both the arrays A and b. Is there a vectorised way to do this also?
Assuming Nx, Ny, Nz, D3 and D4 to be scalars and that you are using pre-allocation for b with zeros, you may try this vectorized approach -
I = 2:Nx-1; %// Vectors to represent i
J = 1:Ny; %// Vectors to represent j
ind1 = bsxfun(@plus,I,[(J-1)*Nx]'); %// Indices, 1st set of nested loops
ind2 = bsxfun(@plus,I,[(J-1)*Nx+Nx*Ny*(Nz-1)]'); %// Indices, 2nd set of loops
b(ind1) = D4; %// Assign values for 1st set
b(ind2) = D3; %// Assign values for 2nd set
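As a sanity check, the index arrays can be compared against the nested loops on a small grid. The sizes and the values of D3 and D4 below are made up, and b_loop and b_vec are names used only for the comparison.

```matlab
Nx = 5; Ny = 4; Nz = 3;          % made-up grid size
D3 = 3; D4 = 4;                  % made-up scalar boundary values

% Reference: the merged nested loop from the question
b_loop = zeros(Nx*Ny*Nz, 1);
for j = 1:Ny
    for i = 2:Nx-1
        b_loop(i+(j-1)*Nx) = D4;
        b_loop(i+(j-1)*Nx+Nx*Ny*(Nz-1)) = D3;
    end
end

% Vectorized: build all linear indices at once
b_vec = zeros(Nx*Ny*Nz, 1);
I = 2:Nx-1;
J = 1:Ny;
ind1 = bsxfun(@plus, I, ((J-1)*Nx).');   % Ny-by-(Nx-2) indices in the first z-plane
ind2 = ind1 + Nx*Ny*(Nz-1);              % same positions in the last z-plane
b_vec(ind1) = D4;
b_vec(ind2) = D3;

disp(isequal(b_loop, b_vec));
```

Here ind2 is obtained by shifting ind1 by a constant offset, which avoids a second bsxfun call.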
The second method should be slightly faster since it performs the same number of calculations with fewer increments of the loop variables. You can look into MATLAB's built-in stopwatch commands tic and toc to time your code. http://www.mathworks.com/help/matlab/ref/tic.html
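A minimal timing sketch with tic and toc might look like this. The grid size and the values of D3 and D4 are made up, and a single run like this is noisy, so it is worth repeating the measurement a few times before drawing conclusions.

```matlab
Nx = 200; Ny = 200; Nz = 10;     % made-up grid size
D3 = 3; D4 = 4;                  % made-up scalar values

% Method 1: two separate pairs of nested loops
b1 = zeros(Nx*Ny*Nz, 1);
tic;
for j = 1:Ny
    for i = 2:Nx-1
        b1(i+(j-1)*Nx) = D4;
    end
end
for j = 1:Ny
    for i = 2:Nx-1
        b1(i+(j-1)*Nx+Nx*Ny*(Nz-1)) = D3;
    end
end
t_split = toc;

% Method 2: one pair of nested loops doing both assignments
b2 = zeros(Nx*Ny*Nz, 1);
tic;
for j = 1:Ny
    for i = 2:Nx-1
        b2(i+(j-1)*Nx) = D4;
        b2(i+(j-1)*Nx+Nx*Ny*(Nz-1)) = D3;
    end
end
t_merged = toc;

fprintf('two loops: %.4f s, one loop: %.4f s\n', t_split, t_merged);
```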
Something more vectorized might be possible but I would need to know more about the format of the arrays that contain D3 and D4. The reshape() function might be able to help.