Construct a cell array containing matrices of different sizes - MATLAB

I want to construct a cell array C containing matrices whose first dimensions are given by the vector
n = [12 23 54],
while their second dimension is fixed at
r = 3.
So I want the cell C = {rand(12,3), rand(23,3), rand(54,3)}.
I know a for loop can serve my purpose:
C = cell(length(n),1); % pre-allocation
for i = 1 : length(n)
C{i} = rand(n(i),r);
end
Is there a smarter way to do this in MATLAB without using a for loop? Thank you.

There's really no harm in using a for loop in this particular scenario (and in most cases where the only alternative is cellfun or arrayfun), as it is easier for MATLAB's JIT compiler to handle, but if you're really averse to a for loop you can use arrayfun combined with non-uniform output to get the result you want.
C = arrayfun(@(x)rand(x, r), n, 'UniformOutput', false);
This may actually be slower than the for loop for the reasons mentioned above. But hey, it's one line so that's all that matters!

for and while loops have their place, even in Matlab. You've probably been told to avoid them because vectorized operations are so much faster when you're iterating over the rows, columns or other dimensions of a packed numeric array. But with higher-level constructs, like cell arrays, there's often no advantage (and a readability penalty) to trying to do things all in neat quasi-vectorized statements. Your existing solution is probably the best approach.

A shorter alternative, just for fun:
C = mat2cell(rand(sum(n),r), n,r)';
But a plain loop is almost certainly fastest in this case, because mat2cell uses a loop, as well as copious checks on its inputs.

Related

Matlab: need some help for a seemingly simple vectorization of an operation

I would like to optimize this piece of MATLAB code but so far I have failed. I have tried different combinations of repmat, sums, and cumsums, but none of my attempts give the correct result. I would appreciate some expert guidance on this tough problem.
S=1000; T=10;
X=rand(T,S);
X=sort(X,1,'ascend');
Result=zeros(S,1);
for c=1:T-1
for cc=c+1:T
d=(X(cc,:)-X(c,:))-(cc-c)/T;
Result=Result+abs(d');
end
end
Basically, I create 1000 vectors of 10 random numbers, and for each vector I calculate, for each pair of values (say the mth and the nth), the difference between them minus (n-m)/T. I then sum the absolute values over all possible pairs and return the result for every vector; in other words, Result(s) = sum over all pairs m < n of |(X(n,s) - X(m,s)) - (n-m)/T|.
I hope this explanation is clear.
Thanks a lot in advance.
It is at least easy to vectorize your inner loop:
Result=zeros(S,1);
for c=1:T-1
d=(X(c+1:T,:)-X(c,:))-((c+1:T)'-c)./T;
Result=Result+sum(abs(d),1)';
end
Here, I'm using the new automatic singleton expansion (implicit expansion). If you have an older version of MATLAB you'll need to use bsxfun for two of the subtraction operations. For example, X(c+1:T,:)-X(c,:) is the same as bsxfun(@minus,X(c+1:T,:),X(c,:)).
What is happening in this bit of code is that instead of looping over cc=c+1:T, we take all of those indices at once, simply replacing cc with c+1:T. d is then a matrix with multiple rows (9 in the first iteration, and one fewer in each subsequent iteration).
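For reference, here is a sketch of the same vectorized inner loop written entirely with bsxfun, for MATLAB versions that predate implicit expansion (pre-R2016b); it assumes S, T and X are defined as in the question:
Result = zeros(S,1);
for c = 1:T-1
% both subtractions go through bsxfun instead of implicit expansion
d = bsxfun(@minus, bsxfun(@minus, X(c+1:T,:), X(c,:)), ((c+1:T)'-c)./T);
Result = Result + sum(abs(d),1)';
end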
Surprisingly, this is slower than the double loop, and similar in speed to Jodag's answer.
Next, we can try to improve indexing. Note that the code above extracts data row-wise from the matrix. MATLAB stores data column-wise. So it's more efficient to extract a column than a row from a matrix. Let's transpose X:
X=X';
Result=zeros(S,1);
for c=1:T-1
d=(X(:,c+1:T)-X(:,c))-((c+1:T)-c)./T;
Result=Result+sum(abs(d),2);
end
This is more than twice as fast as the code that indexes row-wise.
But of course the same trick can be applied to the code in the question, speeding it up by about 50%:
X=X';
Result=zeros(S,1);
for c=1:T-1
for cc=c+1:T
d=(X(:,cc)-X(:,c))-(cc-c)/T;
Result=Result+abs(d);
end
end
My takeaway message from this exercise is that MATLAB's JIT compiler has improved things a lot. Back in the day, any sort of loop would grind code to a halt. Today it's not necessarily the worst approach, especially if all you do inside the loop is call built-in functions.
The nchoosek(v,k) function generates all combinations of the elements in v taken k at a time. We can use this to generate all possible pairs of indices and then use those to vectorize the loops. It appears that in this case the vectorization doesn't actually improve performance (at least on my machine with 2017a). Maybe someone will come up with a more efficient approach.
idx = nchoosek(1:T,2);
d = bsxfun(@minus,(X(idx(:,2),:) - X(idx(:,1),:)), (idx(:,2)-idx(:,1))/T);
Result = sum(abs(d),1)';
Update: comparing running times for the different proposals (10^5 trials), it looks like transposing the matrix is the most effective intervention, and the original double-loop structure is, surprisingly, faster than the vectorized versions. However, in my hands (2017a) the improvement is only 16.6% compared to the original using the mean (18.2% using the median).
Maybe there is still room for improvement?

bsxfun() Invalid output dimensions

I have a function that takes up to seven arguments and returns a row vector. The first three arguments are vectors (column, column, row) and the remaining four are optional scalars.
I want to use bsxfun() to apply the function to a vector of its last argument. Below is my attempt to do that.
o = @(m,pulse,N0,samples_per_pulse,sample_select,filter,channel_cutoff) ELE452Functions.EvaluateBER(m,pulse,N0,samples_per_pulse,sample_select,filter,channel_cutoff);
oo = @(m,pulse,N0,samples_per_pulse,sample_select,filter,channel_cutoff) bsxfun(@(N0,channel_cutoff) o(m,pulse,N0,samples_per_pulse,sample_select,filter,channel_cutoff), N0', channel_cutoff);
When I try to call the function with a vector, for example oo(m,pulse,N0,1,1,1,[0.5 0.2]);, I get this error:
Error using bsxfun
Invalid output dimensions.
I am not experienced in using bsxfun and I tried to follow the documentation.
Update:
Maybe this is a clearer way to ask my question:
I want to use bsxfun to rewrite (improve) the code below without a loop.
for i=1:length(channel_normalized_cuttoffs)
BER_LPchannel(i,:) = ELE452Functions.EvaluateBER(m,pulse,N0,1,1,1,channel_normalized_cuttoffs(i));
end
The idea behind bsxfun is to evaluate a certain function for all possible combinations of two elements (b in bsxfun stands for binary), each coming from one of the arrays. (NB: This is valid if you use it with a row and a column vector. But bsxfun can also do more.)
What you want to achieve is simply: For all entries of a single array, evaluate a function.
So bsxfun is just not the correct choice here.
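As a small illustration (unrelated to the BER code), a column vector and a row vector expand against each other, so every pairwise combination is evaluated:
col = (1:3)'; % 3x1 column
row = [10 20 30 40]; % 1x4 row
bsxfun(@plus, col, row)
% ans =
%     11    21    31    41
%     12    22    32    42
%     13    23    33    43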
You could use arrayfun instead, but this still may not perform much better than your original for loop, as it looks like the MATLAB JIT compiler would be able to optimize most of it, considering its simplicity.
As I don't have the code of your function, I'm not able to test it, but your solution might look a lot like this:
evalBER = @(CNcutoffi) ELE452Functions.EvaluateBER(m,pulse,N0,1,1,1,CNcutoffi);
BER_LPchannel = arrayfun(evalBER, channel_normalized_cuttoffs, 'UniformOutput', false)
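Since the original loop built a matrix with one row per cutoff, and assuming each EvaluateBER call returns a row vector of the same length, the cell output can be stacked back into that form:
BER_LPchannel = vertcat(BER_LPchannel{:});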

Addressing multiple ranges via indices in a vector

To begin, this problem is easily solvable with a for-loop. However, I'm trying to force/teach myself to think vector-wise to take advantage of what Matlab does best.
Simplified, here is the problem explanation:
I have a vector with data in it.
I have a 2xN array of start/stop indices that represent ranges of interesting data in the vector.
I want to perform calculations on each of those ranges, resulting in a number (N results, corresponding to each start/stop range.)
In code, here's a pseudoexample of what I'd like to have at the end:
A = 1:10000;
startIndicies = [5 100 1000];
stopIndicies = [10 200 5000];
...
calculatedResults = [func(A(5:10)) func(A(100:200)) func(A(1000:5000))]
The length of A, and of the start/stop index array is variable.
Like I said, I can easily solve this with a for loop. However, since this could be used with a large data set, I'd like to know if there's a good solution without a for loop.
Here is one possible solution, although I won't call it a fully vectorized solution, rather a one-liner.
out = cellfun(@(i,j) fun(A(i:j)), num2cell(startIndicies), num2cell(stopIndicies));
or, if you plan to have homogeneous outputs,
out = arrayfun(@(i,j) fun(A(i:j)), startIndicies, stopIndicies);
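For example, with the data from the question and sum standing in as a placeholder for func:
A = 1:10000;
startIndicies = [5 100 1000];
stopIndicies = [10 200 5000];
out = arrayfun(@(i,j) sum(A(i:j)), startIndicies, stopIndicies)
% out =
%           45        15150     12003000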

Parallelize or vectorize all-against-all operation on a large number of matrices?

I have approximately 5,000 matrices with the same number of rows and varying numbers of columns (20 x ~200). Each of these matrices must be compared against every other in a dynamic programming algorithm.
In a previous question, I asked how to perform the comparison quickly and was given an excellent answer involving a 2D convolution. Applying that method serially and iteratively, like so
list = who('data_matrix_prefix*');
H = cell(numel(list),numel(list));
for i=1:numel(list)
for j=1:numel(list)
if i ~= j
eval([ 'H{i,j} = compare(' char(list(i)) ',' char(list(j)) ');']);
end
end
end
is fast for small subsets of the data (e.g. for 9 matrices, 9*9 - 9 = 72 calls are made in ~1 s, 870 calls in ~2.5 s).
However, operating on all the data requires almost 25 million calls.
I have also tried using deal() to make a cell array composed entirely of the next element in data, so I could use cellfun() in a single loop:
% who(), load() and struct2cell() calls place k data matrices in a 1D cell array called data.
nextData = cell(k,1);
for i=1:k
[nextData{:}] = deal(data{i});
H(:,i) = cellfun(@compare,data,nextData,'UniformOutput',false);
end
Unfortunately, this is not really any faster, because all the time is in compare(). Both of these code examples seem ill-suited for parallelization. I'm having trouble figuring out how to make my variables sliced.
compare() is totally vectorized; it uses matrix multiplication and conv2() exclusively (I am under the impression that all of these operations, including the cellfun(), should be multithreaded in MATLAB?).
Does anyone see a (explicitly) parallelized solution or better vectorization of the problem?
Note
I realize both my examples are inefficient - the first would be twice as fast if it calculated a triangular cell array, and the second is still calculating the self comparisons, as well. But the time savings for a good parallelization are more like a factor of 16 (or 72 if I install MATLAB on everyone's machines).
Aside
There is also a memory issue. I used a couple of evals to append each column of H into a file, with names like H1, H2, etc. and then clear Hi. Unfortunately, the saves are very slow...
Does compare(a,b) == compare(b,a) hold, and is compare(a,a) == 1?
If so, change your loop
for i=1:numel(list)
for j=1:numel(list)
...
end
end
to
for i=1:numel(list)
for j= i+1 : numel(list)
...
end
end
and deal with the symmetry and identity case. This will cut your calculation time by half.
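A sketch of filling in the remaining entries afterwards, assuming the symmetry and identity properties above do hold:
for i = 1:numel(list)
H{i,i} = 1; % identity case: compare(a,a) == 1
for j = i+1:numel(list)
H{j,i} = H{i,j}; % symmetry: compare(b,a) == compare(a,b)
end
end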
The second example can easily be sliced for use with the Parallel Computing Toolbox. This toolbox distributes iterations of your code among up to 8 local workers. If you want to run the code on a cluster, you also need the Distributed Computing Toolbox.
% who(), load() and struct2cell() calls place k data matrices in a 1D cell array called data.
parfor i = 1:k-1 % this will run the loop in parallel with the Parallel Computing Toolbox
% only make the necessary comparisons
H(i+1:k,i) = cellfun(@compare,data(i+1:k),repmat(data(i),k-i,1),'UniformOutput',false);
% if the above doesn't work (the indexing may not count as sliced), try this
hSlice = cell(k,1);
hSlice(i+1:k) = cellfun(@compare,data(i+1:k),repmat(data(i),k-i,1),'UniformOutput',false);
H(:,i) = hSlice;
end
If I understand correctly, you have to perform 5000^2 matrix comparisons? Rather than trying to parallelise the compare function, perhaps you should think of your problem as being composed of 5000^2 independent tasks. The MATLAB Parallel Computing Toolbox supports task-based parallelism. Unfortunately my experience with PCT is with parallelisation of large linear-algebra-type problems, so I can't really tell you much more than that. The documentation will undoubtedly help you more.
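To make the task-based framing concrete, here is one possible sketch (not tested against the actual compare function): linearise all unique (i,j) pairs and let parfor hand them out. It assumes the k matrices are already in a cell array called data, as in the second example above, and that only the upper triangle is needed:
k = numel(data);
pairs = nchoosek(1:k, 2); % all unique (i,j) pairs with i < j (~12.5 million rows for k = 5000)
results = cell(size(pairs,1), 1);
parfor p = 1:size(pairs,1)
results{p} = compare(data{pairs(p,1)}, data{pairs(p,2)});
end
% scatter the results back into the upper triangle of H
H = cell(k,k);
for p = 1:size(pairs,1)
H{pairs(p,1), pairs(p,2)} = results{p};
end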

What's the best way to iterate through columns of a matrix?

I want to apply a function to all columns in a matrix with MATLAB. For example, I'd like to be able to call smooth on every column of a matrix, instead of having smooth treat the matrix as a vector (which is the default behaviour if you call smooth(matrix)).
I'm sure there must be a more idiomatic way to do this, but I can't find it, so I've defined a map_column function:
function result = map_column(m, func)
result = m;
for col = 1:size(m,2)
result(:,col) = func(m(:,col));
end
end
which I can call with:
smoothed = map_column(input, #(c) (smooth(c, 9)));
Is there anything wrong with this code? How could I improve it?
The MATLAB "for" statement actually loops over the columns of whatever's supplied - normally, this just results in a sequence of scalars since the vector passed into for (as in your example above) is a row vector. This means that you can rewrite the above code like this:
function result = map_column(m, func)
result = [];
for m_col = m
result = horzcat(result, func(m_col));
end
If func does not return a column vector, then you can add something like
f = func(m_col);
result = horzcat(result, f(:));
to force it into a column.
Your solution is fine.
Note that horzcat exacts a substantial performance penalty for large matrices: it makes the code O(N^2) instead of O(N). For a 100x10,000 matrix, your implementation takes 2.6s on my machine, while the horzcat one takes 64.5s. For a 100x5,000 matrix, the horzcat implementation takes 15.7s.
If you wanted, you could generalize your function a little so that it can iterate over the final dimension, or even over arbitrary dimensions (not just columns).
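One possible sketch of such a generalization (the name map_dim is made up here): bring the chosen dimension to the front, loop over the columns of the flattened array, and undo the reshaping afterwards. Like map_column, it assumes func returns a column of the same length as its input:
function result = map_dim(m, func, dim)
perm = [dim, setdiff(1:ndims(m), dim)];
p = permute(m, perm); % move the chosen dimension to the front
sz = size(p);
p = reshape(p, sz(1), []); % flatten the remaining dimensions
result = p;
for col = 1:size(p,2)
result(:,col) = func(p(:,col));
end
result = ipermute(reshape(result, sz), perm); % restore the original shape
end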
Maybe you could always transform the matrix with the ' operator and then transform the result back.
smoothed = smooth(input', 9)';
That at least works with the fft function.
A way to cause an implicit loop across the columns of a matrix is to use cellfun. That is, you must first convert the matrix to a cell array, with each cell holding one column, and then call cellfun. For example:
A = randn(10,5);
Here I've computed the standard deviation of each column.
cellfun(@std,mat2cell(A,size(A,1),ones(1,size(A,2))))
ans =
0.78681 1.1473 0.89789 0.66635 1.3482
Of course, many functions in MATLAB are already set up to work on rows or columns of an array as the user indicates. This is true of std, of course, but it gives us a convenient way to check that the cellfun call worked correctly.
std(A,[],1)
ans =
0.78681 1.1473 0.89789 0.66635 1.3482
Don't forget to preallocate the result matrix if you are dealing with large matrices. Otherwise your CPU will spend lots of cycles repeatedly re-allocating the matrix every time it adds a new row/column.
If this is a common use-case for your function, it would perhaps be a good idea to make the function iterate through the columns automatically if the input is not a vector.
This doesn't exactly solve your problem but it would simplify the functions' usage. In that case, the output should be a matrix, too.
You can also transform the matrix into one long column vector using m = m(:). However, it depends on your function whether this would make sense.