Beginning Matlab question (matrix of zeros) - matlab

Why create a matrix of 0's in Matlab? For example,
A=zeros(5,5);
for i = 1:5
A(i)=exp(i);
end

Following on from j_random_hacker's answer, it's much more efficient in MATLAB to pre-allocate an array rather than letting MATLAB expand it. MATLAB can expand arrays if you simply assign elements off the current "end" of the array, like so:
x = []
for ii=1:1e4
x(ii) = 1/ii;
end
That's really inefficient because at each step in the loop, MATLAB will re-allocate "x" to be one element larger than it was previously. The following is much faster:
x = zeros( 1, 1e4 );
for ii=1:1e4
x(ii) = 1/ii;
end
(Probably fastest still in this case is: x = 1./(1:1e4);, but the pre-allocation route is what you need when you can't resolve things to a vectorised operation)

This is identical to asking: Why create a variable with value 0?
Usually you would do this if you plan to accumulate a bunch of results together somehow. In this case, you have to start "somewhere".

Although it is possible to start out with an empty matrix and expand it by concatenating (adding) new elements, vector extension is highly inefficient in MATLAB because it requires allocating new memory and copying the existing data every time another element is concatenated. Preallocation establishes a matrix of the right size in advance, and then each zero element can be replaced with the correct value. This method is much more efficient, especially in programs involving looping.
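As a rough illustration of the difference (a minimal sketch; exact timings depend on your MATLAB version and machine), you can compare the two approaches with tic/toc:
n = 1e5;
tic
x = [];                  % grows by one element every iteration
for ii = 1:n
    x(ii) = 1/ii;        % MATLAB may reallocate and copy here
end
tGrow = toc;
tic
y = zeros(1,n);          % preallocated once
for ii = 1:n
    y(ii) = 1/ii;
end
tPrealloc = toc;
fprintf('grow: %.3f s, preallocate: %.3f s\n', tGrow, tPrealloc);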

This is helpful if you are going to work on a large matrix, or with a sparse matrix. It is also helpful when you are reusing the same vector or matrix again and again.
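For example (a small sketch), a buffer that is reused inside a loop only needs to be allocated once; later passes just overwrite its contents:
buf = zeros(100,100);           % allocated once, outside the loop
for trial = 1:50
    buf(:) = 0;                 % reset in place, no new allocation
    buf(1:10,1:10) = rand(10);  % fill with this trial's data
    % ... use buf ...
end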

Related

Sparse Matrix Assignment becomes very slow in Matlab

I am filling a sparse matrix P (230k by 290k) with values coming from a text file, which I read line by line. Here is the (simplified) code:
while ...
C = textscan(text_line,'%d','delimiter',',','EmptyValue', 0);
line_number = line_number+1;
P(line_number,:)=C{1};
end
the problem I have is that while at the beginning the
P(line_number,:)=C{1};
statement is fast, after a few thousand lines it becomes extremely slow, I guess because MATLAB needs to find new memory space to allocate every time. Is there a way to pre-allocate memory with sparse matrices? I don't think so, but maybe I am missing something. Any other advice that can speed up the operation (e.g. would having a lot of free RAM make a difference)?
There's a sixth input argument to sparse that tells the number of nonzero elements in the matrix. That's used by Matlab to preallocate:
S = sparse(i,j,s,m,n,nzmax) uses vectors i, j, and s to generate an
m-by-n sparse matrix such that S(i(k),j(k)) = s(k), with space
allocated for nzmax nonzeros.
So you could initialize with
P = sparse([],[],[],230e3,290e3,nzmax);
You can make a guess about the number of nonzeros (perhaps by checking the file size?) and use that as nzmax. If it turns out you need more nonzero elements in the end, MATLAB will allocate the extra space on the fly (slowly).
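For example (the estimate of 10 nonzeros per row below is just a placeholder; use whatever your data suggests), either of these equivalent forms reserves the storage up front:
nzmax = 230e3 * 10;                       % guessed number of nonzeros
P = sparse([],[],[],230e3,290e3,nzmax);   % empty sparse matrix with room for nzmax entries
% or, equivalently:
P = spalloc(230e3,290e3,nzmax);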
By far the fastest way to generate a sparse matrix within MATLAB is to load all the values in at once, then generate the sparse matrix in a single call to sparse. You have to load the data and arrange it into vectors defining the row indices, column indices, and values for each filled cell. You can then call sparse using the S = sparse(i,j,s,m,n) syntax.
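A minimal sketch of that approach (the file name and the assumption that every line holds the same number of comma-separated values are placeholders; adapt them to your actual format):
fid = fopen('data.txt','r');                   % hypothetical file name
C = textscan(fid,'%d','Delimiter',',','EmptyValue',0);
fclose(fid);
vals = double(C{1});                           % sparse needs double values

nCols = 290e3;                                 % known number of columns
nRows = numel(vals)/nCols;                     % assumes a full rectangular file
k = (0:numel(vals)-1).';                       % zero-based position in reading order
iIdx = floor(k/nCols) + 1;                     % row of each value
jIdx = mod(k,nCols) + 1;                       % column of each value
keep = vals ~= 0;                              % only store the nonzeros
P = sparse(iIdx(keep),jIdx(keep),vals(keep),nRows,nCols);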

Vectorizing code

I don't quite get the vectorized way of thinking in MATLAB, mostly due to the simple examples provided in the documentation, and I hope someone can help me understand it a little better.
So, what I'm trying to accomplish is to take an NxN sample from a matrix of ncols x nrows x ielements, compute the average for each ielement, and store the maximum of the averages. Using for loops, the code would look like this:
for x = 1+margin : nrows-margin
for y = 1+margin : ncols-margin
for i=1:ielem
% take a NxN sample
sample = input_matrix(y-margin:y+margin,x-margin:x+margin,i)
% compute the average of all elements
result(i) = mean2(sample);
end %for i
% store the max of the computed averages
output_matrix(y,x)=max(result);
end %for y
end %for x
Can anyone show a good vectorization of this example?
First of all, vectorization is not as important as it once was, due to enhancements in how MATLAB compiles code before it is run, but it's still a very common practice and can lead to real speedups. Older MATLAB versions executed one line at a time, which would leave a for loop much slower than a vectorized version of the same code.
The part of your code that can be vectorized is the innermost for loop. I'll show a simple example of what you are trying to do; I'll let you take the example and put it into your code.
input=randn(5,5,3);
max(mean(mean(input,1),2))
Basically, the two inner mean calls take the mean of the input array over the first two dimensions, and the outer max finds the maximum value over what remains. If you want, you can break it out step by step and see what each call does. mean(input,1) takes the mean over the first dimension, mean(input,2) over the second, and so on. After the first two means are done, all that is left is effectively a vector, which the max function can easily handle. It should be noted that the size of the array before the max is [1 1 3]; the dimensions are preserved when doing this operation.
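For the full problem in the question, the sliding NxN average itself can also be vectorized. A sketch using a box filter via convn (this assumes the window size is N = 2*margin+1; border pixels will differ from the loop version because of zero-padding, so crop the margin afterwards if you need exact agreement):
N = 2*margin + 1;                                  % window size
kernel = ones(N)/N^2;                              % box filter = local mean
localMean = convn(input_matrix, kernel, 'same');   % averages every NxN window in each slice
output_matrix = max(localMean, [], 3);             % max of the averages over the ielem dimension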

apply matrix randomisation without for loop in matlab

A question about matlab and randomisation of a 3d matrix respecting the rows and columns.
I have a n x n x s matrix M and I want to mess it up a bit, but with some control.
I can achieve my wish with a for loop
for j=1:size(M,3)
r=randperm(size(M,1));
random_M(:,:,j)=M(r,r,j);
end
Is there a way to perform this without having to loop over j? I need many randomisation iterations and would benefit from an indexing-based approach.
Cheers!
edit: Some more thoughts following Alexandrew's comments
I have created a function that randomises a squeezed version of M:
function randomMat=randomiseMat(Mat)
[rows,cols]=size(Mat);
r=randperm(rows);
randomMat=Mat(r,r);
then, using arrayfun I seem to get what I want:
randomM=arrayfun(@(x) randomiseMat(M(:,:,x)),1:size(M,3),'UniformOutput', false)
however, randomM is now a cell array of size (1,size(M,3)) with each cell containing randomised array.
Is there a way to make it in a 3d matrix just like the input M?
You can calculate all the values for r in one go, and then use arrayfun:
[nRows,nCols,nPages] = size(M);
[~,r]=sort(rand(nRows,nPages));
%# you should test on a realistic example whether a for-loop
%# isn't faster here
outCell = arrayfun(@(x) M(r(:,x),r(:,x),x), 1:nPages,'UniformOutput',false);
randomM = cat(3,outCell{:});
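A quick sanity check that this gives the same result as the original loop (a small sketch):
M = rand(4,4,3);
[nRows,~,nPages] = size(M);
[~,r] = sort(rand(nRows,nPages));          % one random permutation per page
outCell = arrayfun(@(x) M(r(:,x),r(:,x),x), 1:nPages,'UniformOutput',false);
randomM = cat(3,outCell{:});

loopM = zeros(size(M));                    % same operation with the original loop
for j = 1:nPages
    loopM(:,:,j) = M(r(:,j),r(:,j),j);
end
isequal(randomM,loopM)                     % returns 1 (true)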

MATLAB: Convolution of Matrix Valued Function

I've written this code to perform the 1-d convolution of a 2-d matrix valued function (k is my time index, kend is on the order of 10e3). Is there a faster or cleaner way to do this, perhaps using built in functions?
for k=1:kend
C(:,:,k)=zeros(3);
for l=0:k-1
C(:,:,k)=C(:,:,k)+A(:,:,k-l)*B(:,:,l+1);
end
end
NEW SOLUTION:
This is a newer solution built on the older solution, which solved the previously given formula. The code in the question is actually a modification of that formula, in which the overlap between the two matrices in the third dimension is repeatedly shifted (it's akin to a convolution along the third dimension of the data). The previous solution I gave only computed the result for the last iteration of the code in the question (i.e. k = kend). So, here's a full solution that should be much more efficient than the code in the question for kend on the order of 1000:
kend = size(A,3); %# Get the value for kend
C = zeros(3,3,kend); %# Preallocate the output
Anew = reshape(flipdim(A,3),3,[]); %# Reshape A into a 3-by-3*kend matrix
Bnew = reshape(permute(B,[1 3 2]),[],3); %# Reshape B into a 3*kend-by-3 matrix
for k = 1:kend
C(:,:,k) = Anew(:,3*(kend-k)+1:end)*Bnew(1:3*k,:); %# Index Anew and Bnew so
end %# they overlap in steps
%# of three
Even when using just kend = 100, this solution came out to be about 30 times faster for me than the one in the question and about 4 times faster than a pure for-loop-based solution (which would involve 5 loops!). Note that the discussion below of floating-point accuracy still applies, so it is normal and expected that you will see slight differences between the solutions on the order of the relative floating-point accuracy.
OLD SOLUTION:
Based on the formula you linked to in a comment, which in index form computes C(i,j) as the sum over r = 1..3 and l = 0..k-1 of A(i,r,k-l)*B(r,j,l+1), it appears that you actually want to do something different than the code you provided in the question. Assuming A and B are 3-by-3-by-k arrays, the result C should be a 3-by-3 matrix, and the formula from your link written out as a set of nested for loops would look like this:
%# Solution #1: for loops
k = size(A,3);
C = zeros(3);
for i = 1:3
for j = 1:3
for r = 1:3
for l = 0:k-1
C(i,j) = C(i,j) + A(i,r,k-l)*B(r,j,l+1);
end
end
end
end
Now, it is possible to perform this operation without any for loops by reshaping and reorganizing A and B appropriately:
%# Solution #2: matrix multiply
Anew = reshape(flipdim(A,3),3,[]); %# Create a 3-by-3*k matrix
Bnew = reshape(permute(B,[1 3 2]),[],3); %# Create a 3*k-by-3 matrix
C = Anew*Bnew; %# Perform a single matrix multiply
You could even rework the code you have in your question to create a solution with a single loop that performs a matrix multiply of your 3-by-3 submatrices:
%# Solution #3: mixed (loop and matrix multiplication)
k = size(A,3);
C = zeros(3);
for l = 0:k-1
C = C + A(:,:,k-l)*B(:,:,l+1);
end
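A quick way to convince yourself that these solutions agree (up to floating-point rounding), as a sketch with random data:
A = rand(3,3,20);  B = rand(3,3,20);
k = size(A,3);

C1 = zeros(3);                            % Solution #3 (mixed)
for l = 0:k-1
    C1 = C1 + A(:,:,k-l)*B(:,:,l+1);
end

Anew = reshape(flipdim(A,3),3,[]);        % Solution #2 (matrix multiply)
Bnew = reshape(permute(B,[1 3 2]),[],3);
C2 = Anew*Bnew;

max(abs(C1(:)-C2(:)))                     % tiny, on the order of eps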
So now the question: Which one of these approaches is faster/cleaner?
Well, "cleaner" is very subjective, and I honestly couldn't tell you which of the above pieces of code makes it any easier to understand what the operation is doing. All the loops and variables in the first solution make it a little hard to track what's going on, but it clearly mirrors the formula. The second solution breaks it all down into a simple matrix operation, but it's difficult to see how it relates to the original formula. The third solution seems like a middle-ground between the two.
So, let's make speed the tie-breaker. If I time the above solutions for a number of values of k, I get these results (in seconds needed to perform 10,000 iterations of the given solution, MATLAB R2010b):
k | loop | matrix multiply | mixed
-----+--------+-----------------+--------
5 | 0.0915 | 0.3242 | 0.1657
10 | 0.1094 | 0.3093 | 0.2981
20 | 0.1674 | 0.3301 | 0.5838
50 | 0.3181 | 0.3737 | 1.3585
100 | 0.5800 | 0.4131 | 2.7311 * The matrix multiply is now fastest
200 | 1.2859 | 0.5538 | 5.9280
Well, it turns out that for smaller values of k (around 50 or less) the for-loop solution actually wins out, showing once again that for loops are not as "evil" as they used to be considered in older versions of MATLAB. Under certain circumstances, they can be more efficient than a clever vectorization. However, when the value of k is larger than around 100, the vectorized matrix-multiply solution starts to win out, scaling much more nicely with increasing k than the for-loop solution does. The mixed for-loop/matrix-multiply solution scales atrociously for reasons that I'm not exactly sure of.
So, if you expect k to be large, I'd go with the vectorized matrix-multiply solution. One thing to keep in mind is that the results you get from each solution (the matrix C) will differ ever so slightly (on the level of the floating-point precision) since the order of additions and multiplications performed for each solution are different, thus leading to a difference in accumulation of rounding errors. In short, the difference between the results for these solutions should be negligible, but you should be aware of it.
Have you looked into Matlab's conv method?
I can't compare it against your provided code, because what you provided gives me a problem with trying to access the zeroth element of A. (When k=1, k-1=0.)
Have you considered using FFTs to convolve? A convolution operation is simply a point-wise multiplication in the frequency domain. You'll have to take some precaution with finite sequences, as you'll end up with circular convolution if you're not careful (but this is trivial to take care of).
Here's a simple example for a 1D case.
>> a=rand(4,1);
>> b=rand(3,1);
>> c=conv(a,b)
c =
0.1167
0.3133
0.4024
0.5023
0.6454
0.3511
The same using FFTs
>> A=fft(a,6);
>> B=fft(b,6);
>> C=real(ifft(A.*B))
C =
0.1167
0.3133
0.4024
0.5023
0.6454
0.3511
A convolution of an M point vector and an N point vector results in an M+N-1 point vector. So, I've padded each of the vectors a and b with zeros before taking the FFT (this is automatically taken care of when I take the 4+3-1=6 point FFT of it).
EDIT
Although the equation that you showed is similar to a circular convolution, it's not exactly it. So you can ditch the FFT approach, and the built-in conv* functions. To answer your question, here's the same operation done without explicit loops:
dim1=3;dim2=dim1;
dim3=10;
a=rand(dim1,dim2,dim3);
b=rand(dim1,dim2,dim3);
mIndx=cellfun(@(x)(1:x),num2cell(1:dim3),'UniformOutput',false);
fun=@(x)sum(reshape(cell2mat(cellfun(@(y,z)a(:,:,y)*b(:,:,z),num2cell(x),num2cell(fliplr(x)),'UniformOutput',false)),[dim1,dim2,max(x)]),3);
c=reshape(cell2mat(cellfun(@(x)fun(x),mIndx,'UniformOutput',false)),[dim1,dim2,dim3]);
mIndx here is a cell, where the ith cell contains a vector 1:i. This is your l index (as others have noted, please don't use l as a variable name).
The next line is an anonymous function that does the convolution operation, making use of the fact that the k index is just the l index flipped around. The operations are carried out on individual cells, and then assembled.
The last line actually performs the operations on the matrices.
The answer is the same as that obtained with the loops. However, you'll find that the looped solution is actually an order of magnitude faster (I averaged 0.007s for my code and 0.0006s for the loop). This is because the loop is pretty straightforward, whereas with this sort of nested construction, there's plenty of function call overheads and repeated reshaping that slow it down.
MATLAB's loops have come a long way since the early days when loops were dreaded. Certainly, vectorized operations are blazing fast; but not everything can be vectorized, and sometimes, loops are more efficient than such convoluted anonymous functions. I could probably shave off a few more tenths here and there by optimizing my construction (or maybe taking a different approach), but I'm not going to do that.
Remember that good code should be readable as well as efficient; minor optimization at the cost of readability serves no one. Although I wrote the code above, I certainly will not be able to decipher what it does if I revisit it a month later. Your looped code is clear, readable, and fast, and I would suggest that you stick with it.

how to get the maximally independent vectors given a set of vectors in MATLAB?

If I am given a set of vectors (they can be provided as the column vectors of a matrix), and I want to get the maximally independent vectors, what is the best way to go about it?
I could add one vector to the result set at a time and check whether the rank of the newly formed matrix increases, but I feel that is not very efficient. Of course, I could go back to Gaussian elimination to work this out. But I am just wondering if there is a better (efficient, numerically stable, and robust) approach to this problem.
Thanks.
Edit
I feel that adding vectors while watching for the rank to increase is probably not valid. We can do deletion by watching whether the rank decreases, though.
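A sketch of that deletion idea (assuming the vectors are the columns of A; rank is recomputed on every trial, so this is simple rather than fast): drop a column whenever removing it does not lower the rank.
B = A;                           % start from all vectors
j = 1;
while j <= size(B,2)
    trial = B;
    trial(:,j) = [];             % tentatively drop column j
    if rank(trial) == rank(B)    % column j added nothing independent
        B = trial;               % remove it for good
    else
        j = j + 1;               % keep it and move on
    end
end
% B now holds a maximal set of linearly independent columns of A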
This code will do the trick. It's a little bit dirty because it grows rInd on the fly, which isn't the most efficient, but the idea is more important. It uses the QR decomposition, which is essentially Gram-Schmidt orthogonalization. From this, it goes through the columns of r until it finds the next vector in A that adds something linearly independent to the currently known basis.
iUnderConsideration = 1;
[q,r] = qr(A);
rInd = [];
for j = 1:size(r,2),
if(r(iUnderConsideration,j) ~= 0)
rInd = [rInd r(:,j)];
iUnderConsideration = iUnderConsideration + 1;
end
if(iUnderConsideration > size(r,1))
break;
end
end
q*rInd %here's your answer
As a side note, this code will choose the vectors of your matrix A without changing them. svd wouldn't give you these directly.
[U,S,V]=svd(vectors);
U(1:size(vectors,1),1:size(vectors,2))=vectors;
U now contains the original vectors plus an optimally orthogonal set.
Doing RREF and looking for the columns with leading 1s (the pivot columns) is your best bet:
matr(:,logical(sum(rref(matr)==1)))
This will give you the basis for the column space of the matrix.
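For example (a small sketch with an obviously rank-deficient matrix):
matr = [1 2 3; 4 5 6; 7 8 9];                    % rank 2
basis = matr(:,logical(sum(rref(matr)==1)))      % picks the first two columns
Note that this test selects any column whose reduced form contains a 1, which coincides with the pivot columns here but can misfire if a non-pivot column of rref happens to contain a 1; checking the pivot positions explicitly is more robust.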
SVD is your answer.
The MATLAB reference for SVD.