Vectorizing code - matlab

I don't quite get MATLAB's vectorized way of thinking, mostly because the examples in the documentation are so simple, and I hope someone can help me understand it a little better.
So, what I'm trying to accomplish is to take an NxN sample from a matrix of ncols x nrows x ielements, compute the average for each ielement, and store the maximum of the averages. Using for loops, the code would look like this:
for x = 1+margin : nrows-margin
    for y = 1+margin : ncols-margin
        for i=1:ielem
            % take a NxN sample
            sample = input_matrix(y-margin:y+margin,x-margin:x+margin,i);
            % compute the average of all elements
            result(i) = mean2(sample);
        end %for i
        % store the max of the computed averages
        output_matrix(y,x)=max(result);
    end %for y
end %for x
Can anyone show a good vectorization of this example?

First of all, vectorization is not as important as it once was, thanks to improvements in how the code is compiled before it is run, but it's still a very common practice and can still give some speedup. Older MATLAB versions executed one line at a time, which made a for loop much slower than a vectorized version of the same code.
The part of your code that could be vectorized is the innermost for loop. I'll show a simple example of what you are trying to do and let you work it into your code.
input=randn(5,5,3);
max(mean(mean(input,1),2))
Basically, the two inner mean calls average over the first two dimensions of the input array, and the outer max finds the maximum of the resulting values. If you want, you can break it out step by step and see what each piece does: mean(input,1) takes the mean over the first dimension, and applying mean(...,2) to that result takes the mean over the second. After the two means are done, all that is left is a vector, which the max function handles easily. Note that the size of the array before the max is [1 1 3]; the dimensions are preserved by these operations.
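To connect this back to the loop in the question, here is a minimal sketch that replaces only the innermost loop over i with the mean/mean/max pattern. The array sizes and the value of margin are made up for illustration, and the first dimension is treated as rows here, which is an assumption about the layout rather than something stated in the question.

```matlab
% Hypothetical sizes, chosen only for illustration
margin = 1;
input_matrix = randn(6, 7, 3);
[nrows, ncols, ielem] = size(input_matrix);

output_matrix = zeros(nrows, ncols);     % preallocate the result
for x = 1+margin : ncols-margin
    for y = 1+margin : nrows-margin
        % take the NxN window across all layers at once
        sample = input_matrix(y-margin:y+margin, x-margin:x+margin, :);
        % mean over rows, then over columns, leaves a 1x1xielem array;
        % max then reduces along the only non-singleton dimension
        output_matrix(y, x) = max(mean(mean(sample, 1), 2));
    end
end
```

The border of output_matrix stays zero because no full window fits there. Using mean twice also avoids mean2, which needs the Image Processing Toolbox.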

Understanding how to count FLOPs

I am having a hard time grasping how to count FLOPs. One moment I think I get it, and the next it makes no sense to me. Some help explaining this would greatly be appreciated. I have looked at all other posts about this topic and none have completely explained in a programming language I am familiar with (I know some MATLAB and FORTRAN).
Here is an example, from one of my books, of what I am trying to do.
For the following piece of code, the total number of flops can be written as (n*(n-1)/2)+(n*(n+1)/2) which is equivalent to n^2 + O(n).
[m,n]=size(A)
nb=n+1;
Aug=[A b];
x=zeros(n,1);
x(n)=Aug(n,nb)/Aug(n,n);
for i=n-1:-1:1
x(i) = (Aug(i,nb)-Aug(i,i+1:n)*x(i+1:n))/Aug(i,i);
end
I am trying to apply the same principle above to find the total number of FLOPs as a function of the number of equations n in the following code (MATLAB).
% e = subdiagonal vector
% f = diagonal vector
% g = superdiagonal vector
% r = right hand side vector
% x = solution vector
n=length(f);
% forward elimination
for k = 2:n
    factor = e(k)/f(k-1);
    f(k) = f(k) - factor*g(k-1);
    r(k) = r(k) - factor*r(k-1);
end
% back substitution
x(n) = r(n)/f(n);
for k = n-1:-1:1
    x(k) = (r(k)-g(k)*x(k+1))/f(k);
end
I'm by no means expert at MATLAB but I'll have a go.
I notice that none of the lines of your code index ranges of your vectors. Good, that means that every operation I see before me is involving a single pair of numbers. So I think the first loop is 5 FLOPS per iteration, and the second is 3 per iteration. And then there's that single operation in the middle.
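As a rough check of that count, here is a sketch that runs the same algorithm with a hand-maintained counter. The counter variable flops and the test data are hypothetical additions; only arithmetic on the vector elements is counted, not index arithmetic such as k-1.

```matlab
n = 6;                                  % hypothetical system size
e = [0; -ones(n-1, 1)];                 % subdiagonal (e(1) is unused)
f = 4*ones(n, 1);                       % diagonal
g = [-ones(n-1, 1); 0];                 % superdiagonal (g(n) is unused)
r = ones(n, 1);                         % right-hand side
x = zeros(n, 1);
flops = 0;                              % manual operation counter

% forward elimination: 1 divide + 2 multiplies + 2 subtracts per pass
for k = 2:n
    factor = e(k)/f(k-1);               flops = flops + 1;
    f(k) = f(k) - factor*g(k-1);        flops = flops + 2;
    r(k) = r(k) - factor*r(k-1);        flops = flops + 2;
end

% back substitution: 1 divide, then 3 operations per pass
x(n) = r(n)/f(n);                       flops = flops + 1;
for k = n-1:-1:1
    x(k) = (r(k) - g(k)*x(k+1))/f(k);   flops = flops + 3;
end
```

Under this counting convention the total is 5(n-1) + 1 + 3(n-1) = 8n - 7, so the cost grows linearly in n.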
However, MATLAB stores everything by default as a double. So the loop variable k is itself being operated on once per loop and then every time an index is calculated from it. So that's an extra 4 for the first loop and 2 for the second.
But wait - the first loop has 'k-1' three times, so in theory one could optimise that a bit by calculating it once and storing it, reducing the number of FLOPs by two per iteration. The MATLAB interpreter is probably able to spot that sort of optimisation for itself. And for all I know it can work out that k could in fact be an integer and everything is still okay.
So the answer to your question is that it depends. Do you want to know the number of FLOPs the CPU does, or the minimum number expressed in your code (ie the number of operations on your vectors alone), or the strict number of FLOPs that MATLAB would perform if it did no optimisation at all? MATLAB used to have a flops() function to count this sort of thing, but it's not there anymore. I'm not an expert in MATLAB by any means, but I suspect that flops() has gone because the interpreter has gotten too clever and does a lot of optimisation.
I'm slightly curious to know why you wish to know. I used to use flops() to count how many operations a piece of maths did as a crude way of estimating how much computing grunt I'd need to make it work in real time written in C.
Nowadays I look at the primitives themselves (eg there's a 1k complex FFT, that'll be 7us on that CPU according to the library datasheet, there's a 2k vector multiply, that'll be 2.5us, etc). It gets a bit tricky because one has to consider cache speeds, data set sizes, etc. The maths libraries (eg fftw) themselves are effectively opaque so that's all one can do.
So if you're counting the FLOPs for that reason you'll probably not get a very good answer.

Duplicating a 2d matrix in matlab along a 3rd axis MANY times

I'm looking to duplicate a 784x784 matrix in MATLAB along a 3rd axis. The following code seems to work:
mat = reshape(repmat(mat, 1,10000),784,784,10000);
Unfortunately, it takes so long to run that it's useless (changing the 10,000s to 1000 makes it take a few minutes, and using 10,000 practically freezes my whole machine). Is there a faster way to do this?
For reference, I'm looking to use mvnpdf on 10,000 vectors each of length 784, using the same covariance matrix for each. So my final call looks like
mvnpdf(X,mu,mat)
%size(X) = (10000,784), size(mu) = (10000,784), size(mat) = 784,784,10000
If there's a way to do this that's not repeating the covariance matrix 10,000 times, that'd be helpful too. Thanks!
For replication in more than 2 dimensions, you need to supply the replication counts as an array:
out = repmat(mat,[1,1,10000]);
Replicating a 784x784 matrix 10,000 times isn't going to benefit from vectorization in MATLAB, which is more useful for small arrays: in double precision the result alone needs about 49 GB (784*784*10000*8 bytes). Avoiding a for loop also won't help much, given the following:
The main speedup you can gain here is by computing the inverse of the covariance matrix once, and then computing the pdf yourself. The inverse of sigma takes O(n^3), and you are needlessly doing that 10,000 times. (Also, the square root determinant can be precomputed.) For reference, the PDF of the multivariate normal distribution is computed as follows:
http://en.wikipedia.org/wiki/Multivariate_normal_distribution#Properties
Better to just compute the inverse once, and then compute z = x - mu for each value, then doing z'Sz for each pdf value, and applying a simple function and a constant. But wait! You can vectorize that, too.
I don't have MATLAB in front of me, but this is basically what you need to do, and it'll run in an instant.
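s = inv(sigma); % invert the covariance once
k = size(x, 2); % dimension of each vector (784 here)
c = 0.5*log(det(s)) - (k/2)*log(2*pi); % log normalizing constant; note det(s) = 1/det(sigma)
z = x - mu; % 10000 x 784 matrix
p = exp( c - 0.5 .* dot(z*s, z, 2) ); % 10000 x 1 vector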
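One caveat with the determinant: for a 784x784 matrix it can easily overflow or underflow in double precision. A possible variation, sketched here on a made-up 3-dimensional problem, works from a Cholesky factor instead of an explicit inverse and keeps everything in log space until the end. This is a swapped-in technique, not the code above, and all the data values are hypothetical.

```matlab
% Hypothetical small problem: 5 vectors of length 3, shared covariance
x = [0.5 -1.0 2.0; 1.0 0.0 0.0; -0.3 0.7 1.1; 2.0 2.0 -1.0; 0.0 0.0 0.0];
mu = zeros(5, 3);
sigma = [2.0 0.3 0.1; 0.3 1.0 0.2; 0.1 0.2 1.5];

k = size(x, 2);
R = chol(sigma);              % sigma = R'*R with R upper triangular
z = x - mu;
y = z / R;                    % row i of y satisfies y*R = z(i,:)
quad = sum(y.^2, 2);          % equals z(i,:)*inv(sigma)*z(i,:)' per row

% log(det(sigma)) = 2*sum(log(diag(R))), so half of it is sum(log(diag(R)))
logp = -sum(log(diag(R))) - (k/2)*log(2*pi) - 0.5*quad;
p = exp(logp);                % 5 x 1 vector of density values
```

If the densities themselves underflow, which is likely in 784 dimensions, it may be more useful to keep logp and skip the final exp.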

MATLAB: Convolution of Matrix Valued Function

I've written this code to perform the 1-d convolution of a 2-d matrix valued function (k is my time index, kend is on the order of 10e3). Is there a faster or cleaner way to do this, perhaps using built in functions?
for k=1:kend
    C(:,:,k)=zeros(3);
    for l=0:k-1
        C(:,:,k)=C(:,:,k)+A(:,:,k-l)*B(:,:,l+1);
    end
end
NEW SOLUTION:
This is a newer solution built on the older solution, which solved the previously given formula. The code in the question is actually a modification of that formula, in which the overlap between the two matrices in the third dimension is repeatedly shifted (it's akin to a convolution along the third dimension of the data). The previous solution I gave only computed the result for the last iteration of the code in the question (i.e. k = kend). So, here's a full solution that should be much more efficient than the code in the question for kend on the order of 1000:
kend = size(A,3); %# Get the value for kend
C = zeros(3,3,kend); %# Preallocate the output
Anew = reshape(flipdim(A,3),3,[]); %# Reshape A into a 3-by-3*kend matrix
Bnew = reshape(permute(B,[1 3 2]),[],3); %# Reshape B into a 3*kend-by-3 matrix
for k = 1:kend
    C(:,:,k) = Anew(:,3*(kend-k)+1:end)*Bnew(1:3*k,:); %# Index Anew and Bnew so they overlap in steps of three
end
Even when using just kend = 100, this solution came out to be about 30 times faster for me than the one in the question and about 4 times faster than a pure for-loop-based solution (which would involve 5 loops!). Note that the discussion below of floating-point accuracy still applies, so it is normal and expected that you will see slight differences between the solutions on the order of the relative floating-point accuracy.
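To see that the reshaped version really reproduces the double loop, here is a small self-check on random data. The value of kend and the variable names Cref and maxDiff are made up for the check, and the third dimension is reversed with plain indexing rather than flipdim, which is an assumption made so the sketch also runs on releases where flipdim is unavailable.

```matlab
kend = 8;                                   % hypothetical small size
A = rand(3, 3, kend);
B = rand(3, 3, kend);

% Reference: the double loop from the question (m plays the role of l)
Cref = zeros(3, 3, kend);
for k = 1:kend
    for m = 0:k-1
        Cref(:,:,k) = Cref(:,:,k) + A(:,:,k-m)*B(:,:,m+1);
    end
end

% Reshaped version: Anew = [A_kend ... A_1], Bnew = [B_1; ...; B_kend]
Anew = reshape(A(:,:,end:-1:1), 3, []);     % 3-by-3*kend
Bnew = reshape(permute(B, [1 3 2]), [], 3); % 3*kend-by-3
C = zeros(3, 3, kend);
for k = 1:kend
    % the trailing blocks of Anew are A_k ... A_1, matched with B_1 ... B_k
    C(:,:,k) = Anew(:, 3*(kend-k)+1:end) * Bnew(1:3*k, :);
end

maxDiff = max(abs(C(:) - Cref(:)));         % expected to be at rounding level
```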
OLD SOLUTION:
Based on this formula you linked to in a comment:
it appears that you actually want to do something different than the code you provided in the question. Assuming A and B are 3-by-3-by-k matrices, the result C should be a 3-by-3 matrix and the formula from your link written out as a set of nested for loops would look like this:
%# Solution #1: for loops
k = size(A,3);
C = zeros(3);
for i = 1:3
    for j = 1:3
        for r = 1:3
            for l = 0:k-1
                C(i,j) = C(i,j) + A(i,r,k-l)*B(r,j,l+1);
            end
        end
    end
end
Now, it is possible to perform this operation without any for loops by reshaping and reorganizing A and B appropriately:
%# Solution #2: matrix multiply
Anew = reshape(flipdim(A,3),3,[]); %# Create a 3-by-3*k matrix
Bnew = reshape(permute(B,[1 3 2]),[],3); %# Create a 3*k-by-3 matrix
C = Anew*Bnew; %# Perform a single matrix multiply
You could even rework the code you have in your question to create a solution with a single loop that performs a matrix multiply of your 3-by-3 submatrices:
%# Solution #3: mixed (loop and matrix multiplication)
k = size(A,3);
C = zeros(3);
for l = 0:k-1
C = C + A(:,:,k-l)*B(:,:,l+1);
end
So now the question: Which one of these approaches is faster/cleaner?
Well, "cleaner" is very subjective, and I honestly couldn't tell you which of the above pieces of code makes it any easier to understand what the operation is doing. All the loops and variables in the first solution make it a little hard to track what's going on, but it clearly mirrors the formula. The second solution breaks it all down into a simple matrix operation, but it's difficult to see how it relates to the original formula. The third solution seems like a middle-ground between the two.
So, let's make speed the tie-breaker. If I time the above solutions for a number of values of k, I get these results (in seconds needed to perform 10,000 iterations of the given solution, MATLAB R2010b):
   k |   loop | matrix multiply |  mixed
-----+--------+-----------------+--------
   5 | 0.0915 |          0.3242 | 0.1657
  10 | 0.1094 |          0.3093 | 0.2981
  20 | 0.1674 |          0.3301 | 0.5838
  50 | 0.3181 |          0.3737 | 1.3585
 100 | 0.5800 |          0.4131 | 2.7311   * The matrix multiply is now fastest
 200 | 1.2859 |          0.5538 | 5.9280
Well, it turns out that for smaller values of k (around 50 or less) the for-loop solution actually wins out, showing once again that for loops are not as "evil" as they used to be considered in older versions of MATLAB. Under certain circumstances, they can be more efficient than a clever vectorization. However, when the value of k is larger than around 100, the vectorized matrix-multiply solution starts to win out, scaling much more nicely with increasing k than the for-loop solution does. The mixed for-loop/matrix-multiply solution scales atrociously for reasons that I'm not exactly sure of.
So, if you expect k to be large, I'd go with the vectorized matrix-multiply solution. One thing to keep in mind is that the results you get from each solution (the matrix C) will differ ever so slightly (on the level of the floating-point precision) since the order of additions and multiplications performed for each solution are different, thus leading to a difference in accumulation of rounding errors. In short, the difference between the results for these solutions should be negligible, but you should be aware of it.
Have you looked into Matlab's conv method?
I can't compare it against your provided code, because what you provided gives me a problem with trying to access the zeroth element of A. (When k=1, k-1=0.)
Have you considered using FFTs to convolve? A convolution operation is simply a point-wise multiplication in the frequency domain. You'll have to take some precaution with finite sequences, as you'll end up with circular convolution if you're not careful (but this is trivial to take care of).
Here's a simple example for a 1D case.
>> a=rand(4,1);
>> b=rand(3,1);
>> c=conv(a,b)
c =
0.1167
0.3133
0.4024
0.5023
0.6454
0.3511
The same using FFTs
>> A=fft(a,6);
>> B=fft(b,6);
>> C=real(ifft(A.*B))
C =
0.1167
0.3133
0.4024
0.5023
0.6454
0.3511
A convolution of an M point vector and an N point vector results in an M+N-1 point vector. So, I've padded each of the vectors a and b with zeros before taking the FFT (this is automatically taken care of when I take the 4+3-1=6 point FFT of it).
EDIT
Although the equation that you showed is similar to a circular convolution, it's not exactly it. So you can ditch the FFT approach, and the built-in conv* functions. To answer your question, here's the same operation done without explicit loops:
dim1=3;dim2=dim1;
dim3=10;
a=rand(dim1,dim2,dim3);
b=rand(dim1,dim2,dim3);
mIndx=cellfun(@(x)(1:x),num2cell(1:dim3),'UniformOutput',false);
fun=@(x)sum(reshape(cell2mat(cellfun(@(y,z)a(:,:,y)*b(:,:,z),num2cell(x),num2cell(fliplr(x)),'UniformOutput',false)),[dim1,dim2,max(x)]),3);
c=reshape(cell2mat(cellfun(@(x)fun(x),mIndx,'UniformOutput',false)),[dim1,dim2,dim3]);
mIndx here is a cell, where the ith cell contains a vector 1:i. This is your l index (as others have noted, please don't use l as a variable name).
The next line is an anonymous function that does the convolution operation, making use of the fact that the k index is just the l index flipped around. The operations are carried out on individual cells, and then assembled.
The last line actually performs the operations on the matrices.
The answer is the same as that obtained with the loops. However, you'll find that the looped solution is actually an order of magnitude faster (I averaged 0.007s for my code and 0.0006s for the loop). This is because the loop is pretty straightforward, whereas with this sort of nested construction, there's plenty of function call overheads and repeated reshaping that slow it down.
MATLAB's loops have come a long way since the early days when loops were dreaded. Certainly, vectorized operations are blazing fast; but not everything can be vectorized, and sometimes, loops are more efficient than such convoluted anonymous functions. I could probably shave off a few more tenths here and there by optimizing my construction (or maybe taking a different approach), but I'm not going to do that.
Remember that good code should be readable as well as efficient, and minor optimization at the cost of readability serves no one. Although I wrote the code above, I certainly would not be able to decipher what it does if I revisited it a month later. Your looped code was clear, readable and fast, and I would suggest that you stick with it.

how to get the maximally independent vectors given a set of vectors in MATLAB?

If I am given a set of vectors (they can be provided as the column vectors of a matrix), and I want to get the maximally independent vectors, what is the best way to go about it?
I could add one vector to the result set at a time and see whether the rank of the newly formed matrix increases. But I feel that is not very efficient. Of course, I could go back to Gaussian elimination to work this out. But I am wondering if there is a better (efficient, numerically stable and robust) approach to this problem.
Thanks.
Edit
On reflection, adding vectors one at a time while watching for the rank to increase is probably not valid. We could instead delete vectors while watching whether the rank decreases.
This code will do the trick. It's a little bit dirty because it grows rInd on the fly, which isn't the most efficient, but the idea is more important. It uses the QR decomposition, which is basically Gram-Schmidt orthogonalization. From this, it goes through the rows of r until it finds the next vector in A that adds something linearly independent to the currently known basis.
iUnderConsideration = 1;
[q,r] = qr(A);
rInd = [];
for j = 1:size(r,2)
    if(r(iUnderConsideration,j) ~= 0)
        rInd = [rInd r(:,j)];
        iUnderConsideration = iUnderConsideration + 1;
    end
    if(iUnderConsideration > size(r,1))
        break;
    end
end
q*rInd %here's your answer
As a side note, this code will choose vectors from your matrix A without changing them. svd wouldn't give you these directly.
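A related variation, sketched here as an alternative rather than as the code above, is QR with column pivoting: [Q,R,p] = qr(A,0) returns a permutation vector p with A(:,p) = Q*R and the magnitudes on the diagonal of R in non-increasing order, so a threshold on that diagonal gives a numerical rank. The example matrix and the relative threshold tol are assumptions and would need tuning for real data.

```matlab
% Hypothetical example: column 2 is 2*column 1, column 4 is column 3 - 3*column 1
A = [1 2 3 0; 0 0 1 1; 1 2 4 1];

[Q, R, p] = qr(A, 0);        % column-pivoted QR, p is a permutation vector
d = abs(diag(R));            % non-increasing pivot magnitudes
tol = 1e-10 * d(1);          % assumed relative threshold, not a universal choice
r = sum(d > tol);            % numerical rank estimate

cols = sort(p(1:r));         % indices of the selected original columns
indep = A(:, cols);          % unmodified columns of A that are independent
```

Which columns get picked depends on the pivoting order, so this selects one valid maximal independent set, not necessarily the leftmost one. Comparing against a tolerance also avoids the exact test against zero, which is fragile in floating point.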
[U,S,V]=svd(vectors);
U(1:size(vectors,1),1:size(vectors,2))=vectors;
U now contains the original vectors plus an optimally orthogonal set.
Doing RREF and looking for the columns with the leading ones (the pivot columns) is your best bet. rref returns their indices as its second output:
[~, jb] = rref(matr);
matr(:, jb)
This will give you the basis for the column space of the matrix.
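A small sketch of why the pivot columns are what matter, using a made-up matrix: the second output of rref lists the pivot columns, whereas testing the reduced matrix for entries equal to 1 can also flag a non-pivot column that happens to contain a 1.

```matlab
% Hypothetical example: column 2 is 2*column 1, column 4 is column 3 - 3*column 1
matr = [1 2 3 0; 0 0 1 1; 1 2 4 1];

[R, jb] = rref(matr);            % R is [1 2 0 -3; 0 0 1 1; 0 0 0 0], jb is [1 3]
basis = matr(:, jb);             % original columns 1 and 3 span the column space

% Columns of R containing any entry equal to 1: this also picks column 4
flagged = find(any(R == 1, 1));  % [1 3 4]
```

Here column 4 is a combination of columns 1 and 3, so selecting it as well would give a dependent set.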
SVD is your answer.
The MATLAB reference for SVD.

Beginning Matlab question (matrix of zeros)

Why create a matrix of 0's in Matlab? For example,
A=zeros(5,5);
for i = 1:5
A(i)=exp(i);
end
Following on from j_random_hacker's answer, it's much more efficient in MATLAB to pre-allocate an array rather than letting MATLAB expand it. MATLAB can expand arrays if you simply assign elements off the current "end" of the array, like so:
x = [];
for ii=1:1e4
    x(ii) = 1/ii;
end
That's really inefficient because at each step in the loop, MATLAB will re-allocate "x" to be one element larger than it was previously. The following is much faster:
x = zeros( 1, 1e4 );
for ii=1:1e4
x(ii) = 1/ii;
end
(Probably fastest still in this case is: x = 1./(1:1e4);, but the pre-allocation route is what you need when you can't resolve things to a vectorised operation)
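If you want to see the difference on your own machine, a rough timing sketch along these lines should do. The variable names are made up, and the timings depend heavily on the MATLAB release and hardware, so the gap may turn out to be small for an array of this size.

```matlab
n = 1e4;

% 1) grow the array one element at a time
tic;
xGrow = [];
for ii = 1:n
    xGrow(ii) = 1/ii;
end
tGrow = toc;

% 2) preallocate, then fill
tic;
xPre = zeros(1, n);
for ii = 1:n
    xPre(ii) = 1/ii;
end
tPre = toc;

% 3) fully vectorized
tic;
xVec = 1 ./ (1:n);
tVec = toc;

fprintf('grow: %.4f s, prealloc: %.4f s, vectorized: %.4f s\n', tGrow, tPre, tVec);
same = isequal(xGrow, xPre, xVec);   % all three produce the same values
```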
This is identical to asking: Why create a variable with value 0?
Usually you would do this if you plan to accumulate a bunch of results together somehow. In this case, you have to start "somewhere".
Although it is possible to start out with an empty matrix and expand it by concatenating (adding) new elements, vector extension is highly inefficient in MATLAB because it requires new memory every time another element is concatenated. Preallocation establishes a matrix that's the right size in advance, then each zero element can be replaced with the correct value. This method is much more efficient, especially in programs involving looping.
This is helpful if you are going to work on a large matrix, or with a sparse matrix. It is also helpful when you are using the same vector or matrix again and again.