Finding the index of a specific value in a cell in MATLAB - matlab

I have a two dimensional cell where every element is either a) empty or b) a vector of varying length with values ranging from 0 to 2. I would like to get the indices of the cell elements where a certain value occurs or even better, the "complete" index of every occurrence of a certain value.
I'm currently working on an agent based model of disease spreading and this is done in order to find the positions of infected agents.
Thanks in advance.

Here's how I would do it:
% some example data
A = { [], [], [3 4 5]
[4 8 ], [], [0 2 3 0 1] };
p = 4; % value of interest
% Finding the indices:
% -------------------------
% use cellfun to find indices
I = cellfun(@(x) find(x==p), A, 'UniformOutput', false);
% check again for empties
% (just for consistency; you may skip this step)
I(cellfun('isempty', I)) = {[]};
Call this method1.
A loop is also possible:
I = cell(size(A));
for ii = 1:numel(I)
I{ii} = find(A{ii} == p);
end
I(cellfun('isempty',I)) = {[]};
Call this method2.
Comparing the two methods for speed like so:
tic; for ii = 1:1e3, [method1], end; toc
tic; for ii = 1:1e3, [method2], end; toc
gives
Elapsed time is 0.483969 seconds. % method1
Elapsed time is 0.047126 seconds. % method2
on Matlab R2010b/32bit w/ Intel Core i3-2310M@2.10GHz w/ Ubuntu 11.10/2.6.38-13. This is mostly due to JIT on loops (and how terribly cellfun and anonymous functions seem to be implemented, mumblemumble..)
Anyway, in short, use the loop: it's more readable, and an order of magnitude faster than the vectorized solution.
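For the "complete" index the question asks about, the loop can be extended to record the cell subscripts together with the position inside each vector. This is only a sketch: the name hits and its three-column layout are my own choice, not part of the answer above.

```matlab
% Same example data as above
A = { [],    [], [3 4 5]
      [4 8], [], [0 2 3 0 1] };
p = 4;  % value of interest

% Each row of hits is [cellRow, cellCol, positionInsideVector] (assumed layout)
hits = [];
for ii = 1:numel(A)
    pos = find(A{ii} == p);
    if ~isempty(pos)
        [r, c] = ind2sub(size(A), ii);   % cell subscripts of element ii
        hits = [hits; repmat([r c], numel(pos), 1), pos(:)]; %#ok<AGROW>
    end
end
disp(hits)
```

For this data, hits has two rows: p occurs at position 1 of A{2,1} and at position 2 of A{1,3}.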

Related

How to compute the GCD of a vector in GNU Octave / Matlab

gcd (A1, A2, ...) computes the GCD of elements A1(1), A2(1), .... Being the elements stored in a vector A, how to compute gcd (A)?
(I mean, gcd (4, 2, 8) = 2, but gcd ([4, 2, 8]) will raise an error in GNU Octave 4.0.0).
With cell array expansion
Here is a one-liner, valid only in octave (thanks to nirvana-msu for pointing out matlab's limitation):
A = [10 25 15];
gcd(num2cell(A){:})
# ans = 5
This uses cell array expansion, which is a bit hidden there:
Accessing multiple elements of a cell array with the ‘{’ and ‘}’
operators will result in a comma separated list of all the requested
elements
so here A{:} is interpreted as A(1), A(2), A(3), and thus gcd(A{:}) as gcd(A(1), A(2), A(3))
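In MATLAB the result of num2cell cannot be indexed directly, so the same expansion needs a temporary variable. A minimal sketch (the name c is arbitrary); horzcat is used here only because it accepts any number of arguments in both MATLAB and Octave:

```matlab
A = [10 25 15];
c = num2cell(A);      % c is the 1x3 cell {10, 25, 15}
v = horzcat(c{:});    % c{:} expands to horzcat(10, 25, 15)
% In Octave, gcd(c{:}) would likewise expand to gcd(10, 25, 15).
```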
Performance
Still under octave
A = 3:259;
tic; gcd(num2cell(A){:}); toc
Elapsed time is 0.000228882 seconds.
while with the gcd_vect in @nirvana-msu's answer,
tic; gcd_vect(A); toc
Elapsed time is 0.0184669 seconds.
This is because using recursion implies a high performance penalty (at least under octave). And actually for more than 256 elements in A, recursion limit is exhausted.
tic; gcd_vect(1:257); toc
<... snipped bunch of errors as ...>
error: evaluating argument list element number 2
error: called from
gcd_vect at line 8 column 13
This could be improved a lot by using a divide-and-conquer algorithm.
Meanwhile, the cell array expansion (Octave only) scales well:
A = 127:100000;
tic; gcd(num2cell(A){:}); toc
Elapsed time is 0.0537438 seconds.
Divide and conquer algorithm (best)
This one should work under matlab too (not tested though. Feedback welcome).
It uses recursion too, like in other answers, but with Divide and conquer
function g = gcd_array(A)
    N = numel(A);
    if (mod(N, 2) == 0)
        % even number of elements
        % separate in two parts of equal length
        idx_cut = N / 2;
        part1 = A(1:idx_cut);
        part2 = A(idx_cut+1:end);
        % use standard gcd to compute gcd of pairs
        g = gcd(part1(:), part2(:));
        if ~isscalar(g)
            % the result was an array, compute its gcd
            g = gcd_array(g);
        end
    else
        % odd number of elements
        % separate in one scalar and an array with even number of elements
        g = gcd(A(1), gcd_array(A(2:end)));
    end
end
timings:
A = 127:100000;
tic; gcd_array(A); toc
Elapsed time is 0.0184278 seconds.
So this seems even better than cell array expansion.
The following is crude, but seems to work on simple examples:
function g = gcd_array(vals)
    if length(vals) == 1
        g = vals;
    else
        g = gcd(vals(1), gcd_array(vals(2:end)));
    end
end
Note that unlike Octave, Matlab gcd function requires exactly two input arguments. You can use recursion to handle that, due to the fact that gcd(a,b,c) = gcd(a,gcd(b,c)). The following function accepts both input formats - either a single vector, or multiple scalars inputs, and should work both in Matlab and Octave:
function divisor = gcd_vect(a, varargin)
    if ~isempty(varargin)
        a = [a, varargin{:}];
    elseif length(a) == 1
        divisor = a;
        return;
    end
    divisor = gcd(a(1), gcd_vect(a(2:end)));
end
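Since gcd(a,b,c) = gcd(a,gcd(b,c)), the recursion can also be unrolled into a plain loop, which avoids the recursion limit and uses only the two-argument gcd available in both MATLAB and Octave. A minimal sketch, assuming A is a non-empty vector of integers (the early exit is my own addition):

```matlab
A = [10 25 15];          % example input
g = A(1);
for k = 2:numel(A)
    g = gcd(g, A(k));    % fold the two-argument gcd over the vector
    if g == 1
        break;           % the gcd cannot get smaller than 1
    end
end
disp(g)
```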

Matlab/Octave - use cellfun to index one matrix with another

I have a cell containing a random number of matrices, say a = {[300*20],....,[300*20]};. I have another cell of the same format, call it b, that contains the logicals of the position of the nan terms in a.
I want to use cellfun to loop through the cell and set the NaN terms to 0, i.e. a(b)=0.
Thanks,
j
You could define a function that replaces any NaN with zero.
function a = nan2zero(a)
a(isnan(a)) = 0;
Then you can use cellfun to apply this function to your cell array.
a0 = cellfun(@nan2zero, a, 'UniformOutput', 0)
That way, you don't even need any matrices b.
First, you should probably give the tick to @s.bandara, as that was the first correct answer and it used cellfun (as you requested). Do NOT give it to this answer. The purpose of this answer is to provide some additional analysis.
I thought I'd look into the efficiency of some of the possible approaches to this problem.
The first approach is the one advocated by @s.bandara.
The second approach is similar to the one advocated by @s.bandara, but it uses b to convert nan to 0, rather than using isnan. In theory, this method may be faster, since nothing is assigned to b inside the function, so it should be treated "By Ref".
The third approach uses a loop to get around using cellfun, since cellfun is often slower than an explicit loop.
The results of a quick speed test are:
Elapsed time is 3.882972 seconds. %# First approach (a, isnan, and cellfun, eg @s.bandara)
Elapsed time is 3.391190 seconds. %# Second approach (a, b, and cellfun)
Elapsed time is 3.041992 seconds. %# Third approach (loop-based solution)
In other words, there are (small) savings to be made by passing b in rather than using isnan. And there are further (small) savings to be made by using a loop rather than cellfun. But I wouldn't lose sleep over it. Remember, the results of any simulation are specific to the specified inputs.
Note, these results were consistent across several runs, I used tic and toc to do this, albeit with many loops over each method. If I wanted to be really thorough, I should use timeit from FEX. If anyone is interested, the code for the three methods follows:
%# Build some example matrices
T = 1000; N = 100; Q = 50; M = 100;
a = cell(1, Q); b = cell(1, Q);
for q = 1:Q
a{q} = randn(T, N);
b{q} = logical(randi(2, T, N) - 1);
a{q}(b{q}) = nan;
end
%# Solution using a, isnan, and cellfun (@s.bandara solution)
tic
for m = 1:M
Soln2 = cellfun(@f1, a, 'UniformOutput', 0);
end
toc
%# Solution using a, b, and cellfun
tic
for m = 1:M
Soln1 = cellfun(@f2, a, b, 'UniformOutput', 0);
end
toc
%# Solution using a loop to avoid cellfun
tic
for m = 1:M
Soln3 = cell(1, Q);
for q = 1:Q
Soln3{q} = a{q};
Soln3{q}(b{q}) = 0;
end
end
toc
%# Solution proposed by @EitanT
[K, N] = size(a{1});
tic
for m = 1:M
a0 = [a{:}]; %// Concatenate matrices along the 2nd dimension
a0(isnan(a0)) = 0; %// Replace NaNs with zeroes
Soln4 = mat2cell(a0, K, N * ones(size(a)));
end
toc
where:
function x1 = f1(x1)
x1(isnan(x1)) = 0;
and:
function x1 = f2(x1, x2)
x1(x2) = 0;
UPDATE: A fourth approach has been suggested by @EitanT. This approach concatenates the cell array of matrices into one large matrix, performs the operation on the large matrix, then optionally converts it back to a cell array. I have added the code for this procedure to my testing routine above. For the inputs specified in my testing code, i.e. T = 1000, N = 100, Q = 50, and M = 100, the timed run is as follows:
Elapsed time is 3.916690 seconds. %# @s.bandara
Elapsed time is 3.362319 seconds. %# a, b, and cellfun
Elapsed time is 2.906029 seconds. %# loop-based solution
Elapsed time is 4.986837 seconds. %# @EitanT
I was somewhat surprised by this as I thought the approach of @EitanT would yield the best results. On paper, it seems extremely sensible. Note, we can of course mess around with the input parameters to find specific settings that advantage different solutions. For example, if the matrices are small, but the number of them is large, then the approach of @EitanT does well, e.g. T = 10, N = 5, Q = 500, and M = 100 yields:
Elapsed time is 0.362377 seconds. %# @s.bandara
Elapsed time is 0.299595 seconds. %# a, b, and cellfun
Elapsed time is 0.352112 seconds. %# loop-based solution
Elapsed time is 0.030150 seconds. %# @EitanT
Here the approach of @EitanT dominates.
For the scale of the problem indicated by the OP, I found that the loop-based solution usually had the best performance. However, for some Q, e.g. Q = 5, the solution of @EitanT managed to edge ahead.
Hmm.
Given the nature of the contents of your cell array, there may exist an even faster solution: you can convert your cell data to a single matrix and use vector indexing to replace all NaN values in it at once, without the need of cellfun or loops:
a0 = [a{:}]; %// Concatenate matrices along the 2nd dimension
a0(isnan(a0)) = 0; %// Replace NaNs with zeroes
If you want to convert it back to a cell array, that's fine:
[M, N] = size(a{1});
mat2cell(a0, M, N * ones(size(a)))
P.S.
Work with a 3-D matrix instead of a cell array, if possible. Vectorized operations are usually much faster in MATLAB.
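A minimal sketch of the 3-D suggestion, assuming every matrix in the cell has the same size (the tiny example data and the name a3 are made up for illustration):

```matlab
% Hypothetical example: two 2x2 matrices containing NaNs
a = {[1 NaN; 3 4], [NaN 6; 7 NaN]};

a3 = cat(3, a{:});     % stack the matrices along the 3rd dimension (2x2x2)
a3(isnan(a3)) = 0;     % replace all NaNs in one vectorized step

% a3(:,:,q) now plays the role of a{q}
disp(a3(:,:,1))
```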

Use a vector to index a matrix without linear index

G'day,
I'm trying to find a way to use a vector of [x,y] points to index from a large matrix in MATLAB.
Usually, I would convert the subscript points to linear indices into the matrix (e.g. as in "Use a vector as an index to a matrix"). However, the matrix is 4-dimensional, and I want to take all of the elements of the 3rd and 4th dimensions that have the same 1st and 2nd dimension. Let me hopefully demonstrate with an example:
Matrix = nan(4,4,2,2); % where the dimensions are (x,y,depth,time)
Matrix(1,2,:,:) = 999; % note that this value could change in depth (3rd dim) and time (4th dim)
Matrix(3,4,:,:) = 888; % note that this value could change in depth (3rd dim) and time (4th dim)
Matrix(4,4,:,:) = 124;
Now, I want to be able to index with the subscripts (1,2) and (3,4), etc. and return not only the 999 and 888 which exist in Matrix(:,:,1,1) but also the contents which exist at Matrix(:,:,1,2), Matrix(:,:,2,1) and Matrix(:,:,2,2), and so on (IRL, the dimensions of Matrix might be more like size(Matrix) = (300 250 30 200)).
I don't want to use linear indices because I would like the results to be in a similar vector fashion. For example, I would like a result which is something like:
ans(time=1)
999 888 124
999 888 124
ans(time=2)
etc etc etc
etc etc etc
I'd also like to add that due to the size of the matrix I'm dealing with, speed is an issue here - thus why I'd like to use subscript indices to index to the data.
I should also mention that (unlike this question: Accessing values using subscripts without using sub2ind) since I want all the information stored in the extra dimensions, 3 and 4, of the i and jth indices, I don't think that a slightly faster version of sub2ind would cut it.
I can think of three ways to go about this
Simple loop
Just loop over all the 2D indices you have, and use colons to access the remaining dimensions:
for jj = 1:size(twoDinds,1)
M(twoDinds(jj,1),twoDinds(jj,2),:,:) = rand;
end
Vectorized calculation of Linear indices
Skip sub2ind and vectorize the computation of linear indices:
% generalized for arbitrary dimensions of M
sz = size(M);
nd = ndims(M);
arg = arrayfun(@(x)1:x, sz(3:nd), 'UniformOutput', false);
[argout{1:nd-2}] = ndgrid(arg{:});
argout = cellfun(...
@(x) repmat(x(:), size(twoDinds,1),1), ...
argout, 'Uniformoutput', false);
twoDinds = kron(twoDinds, ones(prod(sz(3:nd)),1));
% the linear indices
inds = twoDinds(:,1) + ([twoDinds(:,2) [argout{:}]]-1) * cumprod(sz(1:nd-1)).';
Sub2ind
Just use the ready-made tool that ships with Matlab:
inds = sub2ind(size(M), twoDinds(:,1), twoDinds(:,2), argout{:});
Speed
So which one's the fastest? Let's find out:
clc
M = nan(4,4,2,2);
sz = size(M);
nd = ndims(M);
twoDinds = [...
1 2
4 3
3 4
4 4
2 1];
tic
for ii = 1:1e3
for jj = 1:size(twoDinds,1)
M(twoDinds(jj,1),twoDinds(jj,2),:,:) = rand;
end
end
toc
tic
twoDinds_prev = twoDinds;
for ii = 1:1e3
twoDinds = twoDinds_prev;
arg = arrayfun(@(x)1:x, sz(3:nd), 'UniformOutput', false);
[argout{1:nd-2}] = ndgrid(arg{:});
argout = cellfun(...
@(x) repmat(x(:), size(twoDinds,1),1), ...
argout, 'Uniformoutput', false);
twoDinds = kron(twoDinds, ones(prod(sz(3:nd)),1));
inds = twoDinds(:,1) + ([twoDinds(:,2) [argout{:}]]-1) * cumprod(sz(1:3)).';
M(inds) = rand;
end
toc
tic
for ii = 1:1e3
twoDinds = twoDinds_prev;
arg = arrayfun(@(x)1:x, sz(3:nd), 'UniformOutput', false);
[argout{1:nd-2}] = ndgrid(arg{:});
argout = cellfun(...
@(x) repmat(x(:), size(twoDinds,1),1), ...
argout, 'Uniformoutput', false);
twoDinds = kron(twoDinds, ones(prod(sz(3:nd)),1));
inds = sub2ind(size(M), twoDinds(:,1), twoDinds(:,2), argout{:});
M(inds) = rand;
end
toc
Results:
Elapsed time is 0.004778 seconds. % loop
Elapsed time is 0.807236 seconds. % vectorized linear inds
Elapsed time is 0.839970 seconds. % linear inds with sub2ind
Conclusion: use the loop.
Granted, the tests above are largely influenced by JIT's failure to compile the last two loops, and by the non-specificity to 4D arrays (the last two methods also work on ND arrays). Making a specialized version for 4D will undoubtedly be much faster.
Nevertheless, the indexing with simple loop is, well, simplest to do, easiest on the eyes and very fast too, thanks to JIT.
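For reading values (which is what the question asks for), one alternative that was not timed in this thread is to merge the first two dimensions with reshape, so that only a 2-D sub2ind is needed and the trailing dimensions come along via colons. A sketch using the example from the question; the names M2, rows and out are my own, and no speed claim is made for it:

```matlab
% Example data from the question
M = nan(4,4,2,2);
M(1,2,:,:) = 999;
M(3,4,:,:) = 888;
M(4,4,:,:) = 124;
twoDinds = [1 2; 3 4; 4 4];    % one (x,y) pair per row

sz = size(M);
% Merge dims 1 and 2 into one; reshape does not reorder the data
M2 = reshape(M, sz(1)*sz(2), sz(3), sz(4));
% Row of M2 that corresponds to each (x,y) pair
rows = sub2ind(sz(1:2), twoDinds(:,1), twoDinds(:,2));
out = M2(rows, :, :);          % size: numPoints x depth x time

disp(out(:,:,1))
```

Here out(:,k,t) holds the values at all requested points for depth k and time t, which is close to the layout asked for in the question.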
So, here is a possible answer... but it is messy. I suspect it would be more computationally expensive than a more direct method... And this would definitely not be my preferred answer. It would be great if we could get the answer without any for loops!
Matrix = rand(100,200,30,400);
grabthese_x = [1 30 50 90];
grabthese_y = [61 9 180 189];
result = nan(length(grabthese_x), size(Matrix,3), size(Matrix,4));
for tt = 1:size(Matrix,4)
subset = squeeze(Matrix(grabthese_x,grabthese_y,:,tt));
for NN=1:size(Matrix,3)
result(:,NN,tt) = diag(subset(:,:,NN));
end
end
The resulting matrix, result, should have size [4, size(Matrix,3), size(Matrix,4)], i.e. one row per requested (x,y) point.
I think this should work, even if Matrix isn't square. However, it is not ideal, as I said above.

MATLAB: duplicating vector 'n' times [duplicate]

I have a vector, e.g.
vector = [1 2 3]
I would like to duplicate it within itself n times, i.e. if n = 3, it would end up as:
vector = [1 2 3 1 2 3 1 2 3]
How can I achieve this for any value of n? I know I could do the following:
newvector = vector;
for i = 1 : n-1
newvector = [newvector vector];
end
This seems a little cumbersome though. Any more efficient methods?
Try
repmat([1 2 3],1,3)
I'll leave you to check the documentation for repmat.
This is a Faster Method Than repmat or reshape by an Order of Magnitude
One of the best methods for doing such things is using Tony's trick. repmat and reshape are usually found to be slower than Tony's trick, as it directly uses Matlab's inherent indexing. To answer your question,
let's say you want to tile the row vector r=[1 2 3] N times like r=[1 2 3 1 2 3 1 2 3...], then,
c=r'
cc=c(:,ones(N,1));
r_tiled = cc(:)';
This method has significant time savings against reshape or repmat for large N's.
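As a quick sanity check (not a timing test), the trick can be compared against repmat on the small example from the question; N = 3 is chosen here just for illustration:

```matlab
r = [1 2 3];
N = 3;

c = r';                    % column vector
cc = c(:, ones(N,1));      % index column 1 N times: 3xN, every column equals c
r_tiled = cc(:)';          % read out column by column, back to a row

same = isequal(r_tiled, repmat(r, 1, N));   % true: both give [1 2 3 1 2 3 1 2 3]
disp(r_tiled)
```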
EDIT : Reply to #Li-aung Yip's doubts
I conducted a small Matlab test to check the speed differential between repmat and tony's trick. Using the code mentioned below, I calculated the times for constructing the same tiled vector from a base vector A=[1:N]. The results show that YES, Tony's-Trick is FASTER BY AN ORDER of MAGNITUDE, especially for larger N. People are welcome to try it themselves. This much time differential can be critical if such an operation has to be performed in loops. Here is the small script I used;
N = 10; % also try N = 100, 1000, 10000
% time for tony_trick
tic;
A=(1:N)';
B=A(:,ones(N,1));
C=B(:)';
t_tony=toc;
clearvars -except t_tony N
% time for repmat
tic;
A=(1:N);
B=repmat(A,1,N);
t_repmat=toc;
clearvars -except t_tony t_repmat N
The Times (in seconds) for both methods are given below;
N=10, time_repmat = 8e-5 , time_tony = 3e-5
N=100, time_repmat = 2.9e-4 , time_tony = 6e-5
N=1000, time_repmat = 0.0302 , time_tony = 0.0058
N=10000, time_repmat = 2.9199 , time_tony = 0.5292
My RAM didn't permit me to go beyond N=10000. I am sure, the time difference between the two methods will be even more significant for N=100000. I know, these times might be different for different machines, but the relative difference in order-of-magnitude of times will stand. Also, I know, the avg of times could have been a better metric, but I just wanted to show the order of magnitude difference in time consumption between the two approaches. My machine/os details are given below :
Relevant Machine/OS/Matlab Details : Athlon i686 Arch, Ubuntu 11.04 32 bit, 3gb ram, Matlab 2011b
Based on Abhinav's answer and some tests, I wrote a function which is ALWAYS faster than repmat()!
It uses the same parameters, except for the first parameter which must be a vector and not a matrix.
function vec = repvec( vec, rows, cols )
%REPVEC Replicates a vector.
% Replicates a vector rows times in dim1 and cols times in dim2.
% Auto optimization included.
% Faster than repmat()!!!
%
% Copyright 2012 by Marcel Schnirring
if ~isscalar(rows) || ~isscalar(cols)
error('Rows and cols must be scalar')
end
if rows == 1 && cols == 1
return % no modification needed
end
% check parameters
if size(vec,1) ~= 1 && size(vec,2) ~= 1
error('First parameter must be a vector but is a matrix or array')
end
% check type of vector (row/column vector)
if size(vec,1) == 1
% set flag
isrowvec = 1;
% swap rows and cols
tmp = rows;
rows = cols;
cols = tmp;
else
% set flag
isrowvec = 0;
end
% optimize code -> choose version
if rows == 1
version = 2;
else
version = 1;
end
% run replication
if version == 1
if isrowvec
% transform vector
vec = vec';
end
% replicate rows
if rows > 1
cc = vec(:,ones(1,rows));
vec = cc(:);
%indices = 1:length(vec);
%c = indices';
%cc = c(:,ones(rows,1));
%indices = cc(:);
%vec = vec(indices);
end
% replicate columns
if cols > 1
%vec = vec(:,ones(1,cols));
indices = (1:length(vec))';
indices = indices(:,ones(1,cols));
vec = vec(indices);
end
if isrowvec
% transform vector back
vec = vec';
end
elseif version == 2
% calculate indices
indices = (1:length(vec))';
% replicate rows
if rows > 1
c = indices(:,ones(rows,1));
indices = c(:);
end
% replicate columns
if cols > 1
indices = indices(:,ones(1,cols));
end
% transform index when row vector
if isrowvec
indices = indices';
end
% get vector based on indices
vec = vec(indices);
end
end
Feel free to test the function with all your data and give me feedback. If you find a way to improve it further, please tell me.

Indexing of unknown dimensional matrix

I have a non-fixed dimensional matrix M, from which I want to access a single element.
The element's indices are contained in a vector J.
So for example:
M = rand(6,4,8,2);
J = [5 2 7 1];
output = M(5,2,7,1)
This time M has 4 dimensions, but this is not known in advance. This is dependent on the setup of the algorithm I'm writing. It could likewise be that
M = rand(6,4);
J = [3 1];
output = M(3,1)
so I can't simply use
output=M(J(1),J(2))
I was thinking of using sub2ind, but this also needs its variables comma separated..
@gnovice
This works, but I intend to use this kind of element extraction from the matrix M quite a lot. So if I have to create a temporary variable cellJ every time I access M, wouldn't this tremendously slow down the computation?
I could also write a separate function
function x= getM(M,J)
x=M(J(1),J(2));
% M doesn't change in this function, so no mem copy needed = passed by reference
end
and adapt this for different configurations of the algorithm. This is of course a speed vs flexibility consideration which I hadn't included in my question..
BUT: this is only available for getting the element, for setting there is no other way than actually using the indices (and preferably the linear index). I still think sub2ind is an option. The final result I had intended was something like:
function idx = getLinearIdx(J, size_M)
idx = ...
end
RESULTS:
function lin_idx = Lidx_ml( J, M )%#eml
%LIDX_ML converts an array of indices J for a multidimensional array M to
%linear indices, directly useable on M
%
% INPUT
% J NxP matrix containing P sets of N indices
% M A example matrix, with same size as on which the indices in J
% will be applicable.
%
% OUTPUT
% lin_idx Px1 array of linear indices
%
% method 1
%lin_idx = zeros(size(J,2),1);
%for ii = 1:size(J,2)
% cellJ = num2cell(J(:,ii));
% lin_idx(ii) = sub2ind(size(M),cellJ{:});
%end
% method 2
sizeM = size(M);
J(2:end,:) = J(2:end,:)-1;
lin_idx = cumprod([1 sizeM(1:end-1)])*J;
end
Method 2 is 20 times (for a small number P of index sets to convert) to 80 times (for a large P) faster than method 1. Easy choice.
For the general case where J can be any length (which I assume always matches the number of dimensions in M), there are a couple options you have:
You can place each entry of J in a cell of a cell array using the num2cell function, then create a comma-separated list from this cell array using the colon operator:
cellJ = num2cell(J);
output = M(cellJ{:});
You can sidestep the sub2ind function and compute the linear index yourself with a little bit of math:
sizeM = size(M);
index = cumprod([1 sizeM(1:end-1)]) * (J(:) - [0; ones(numel(J)-1, 1)]);
output = M(index);
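A small check that both options (and sub2ind fed with the same comma-separated list) address the same element, using the 4-D example from the question. The names out1, out2 and idx3 are only for this sketch:

```matlab
M = rand(6,4,8,2);
J = [5 2 7 1];

% Option 1: comma-separated list from a cell array
cellJ = num2cell(J);
out1 = M(cellJ{:});                  % same as M(5,2,7,1)

% Option 2: linear index computed by hand
sizeM = size(M);
index = cumprod([1 sizeM(1:end-1)]) * (J(:) - [0; ones(numel(J)-1, 1)]);
out2 = M(index);

% sub2ind also accepts the comma-separated list
idx3 = sub2ind(sizeM, cellJ{:});

% The same indexing works for setting an element, e.g. M(cellJ{:}) = 0;
```

For this J, the strides are [1 6 24 192] and the zero-based offsets are [5 1 6 0], so index = 5 + 6 + 144 + 0 = 155.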
Here is a version of gnovice's option 2 that can process a whole matrix of subscripts, where each row contains one subscript. E.g. for 3 subscripts:
J = [5 2 7 1
1 5 2 7
4 3 9 2];
sizeM = size(M);
idx = cumprod([1 sizeM(1:end-1)])*(J - [zeros(size(J,1),1) ones(size(J,1),size(J,2)-1)]).';
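A worked example of the batch version, as a sketch: M is filled so that every element equals its own linear index, which makes the result easy to check. The subscripts in J are chosen here to fit a 6x4x8x2 array and are not the ones from the answer; the name offsets is my own.

```matlab
% Each element of M equals its own linear index
M = reshape(1:6*4*8*2, 6, 4, 8, 2);

% One subscript per row (hypothetical values that fit size(M))
J = [5 2 7 1
     1 4 2 2
     4 3 8 2];

sizeM = size(M);
% Subtract 1 from every subscript except the first one
offsets = [zeros(size(J,1),1) ones(size(J,1), size(J,2)-1)];
idx = cumprod([1 sizeM(1:end-1)]) * (J - offsets).';   % 1x3 row of linear indices

disp(idx)
disp(M(idx))   % same values as idx, by construction of M
```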