Using find with a struct - matlab

I have a struct that holds thousands of samples of data. Each data point contains multiple objects. For example:
Structure(1).a = 7
Structure(1).b = 3
Structure(2).a = 2
Structure(2).b = 6
Structure(3).a = 1
Structure(3).b = 6
...
... (thousands more)
...
Structure(2345).a = 4
Structure(2345).b = 9
... and so on.
If I wanted to find the index number of all the '.b' objects containing the number 6, I would have expected the following function would do the trick:
find(Structure.b == 6)
... and I would expect the answer to contain '2' and '3' (for the input shown above).
However, this doesn't work. What is the correct syntax and/or could I be arranging my data in a more logical way in the first place?

The syntax Structure.b for an array of structs gives you a comma-separated list, so you'll have to concatenate them all (for instance, using brackets []) in order to obtain a vector:
find([Structure.b] == 6)
For the input shown above, the result is as expected:
ans =
2 3
As Jonas noted, this would work only if there are no fields containing empty matrices, because empty matrices will not be reflected in the concatenation result.
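To make the caveat concrete, here is a small sketch showing how an empty field silently shifts the indices returned by the concatenation approach. The four-element struct array S is made up for illustration and is not part of the original question.

```matlab
% Hypothetical data: element 3 has an empty .b field
S = struct('b', {3, 6, [], 6});

v   = [S.b];          % empties vanish: v has 3 elements, not 4
idx = find(v == 6);   % positions within v, not within S

disp(numel(S));       % 4
disp(v);              % 3 6 6
disp(idx);            % 2 3, although the matching structs are S(2) and S(4)
```

The indices refer to the concatenated vector, so every empty field before a match shifts that match's reported position by one.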
Handling structs with empty fields
If you suspect that these fields may contain empty matrices, either convert them to NaNs (if possible...) or consider using one of the safer solutions suggested by Rody.
In addition, I've thought of another workaround that uses strings. We can concatenate everything into a delimited string, which preserves the information about empty fields, and then tokenize it back (in my humble opinion, this is easier to do in MATLAB than handling numerical values stored in cells).
Inspired by Jonas' comment, we can convert empty fields to NaNs like so:
str = sprintf('%f,', Structure.b)
B = textscan(str, '%f', 'delimiter', ',', 'EmptyValue', NaN)
and this allows you to apply find on the contents of B:
find(B{:} == 6)
ans =
2
3
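The NaN conversion can also be sketched without going through strings, by filling the empty cells before concatenating. This is an illustrative variant, not part of the original answer, and it assumes every non-empty .b field holds a scalar; the struct array S is made up for the example.

```matlab
% Hypothetical data: element 3 has an empty .b field
S = struct('b', {3, 6, [], 6});

vals = {S.b};                             % one cell per struct element
vals(cellfun('isempty', vals)) = {NaN};   % replace empties with NaN placeholders
idx  = find([vals{:}] == 6);              % positions now line up with S

disp(idx);                                % 2 4
```

Since NaN never compares equal to anything, the placeholders cannot produce false matches.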

Building on EitanT's answer with Jonas' comment, a safer way could be
>> S(1).a = 7;
S(1).b = 3;
S(2).a = 2;
S(2).b = 6;
S(3).a = 1;
S(3).b = [];
S(4).a = 1;
S(4).b = 6;
>> find( cellfun(@(x)isequal(x,6),{S.b}) )
ans =
2 4
It's probably not very fast though (compared to EitanT's version), so only use this when needed.

Another answer to this question! This time, we'll compare the performance of the following 5 methods:
My original method
EitanT's original method (which does not handle empties)
EitanT's improved method using strings
A new method: a simple for-loop
Another new method: a vectorized, empty-safe version
Test code:
% Set up test
N = 1e5;
S(N).b = [];
for ii = 1:N
S(ii).b = randi(6);
end
% Rody Oldenhuis 1
tic
sol1 = find( cellfun(@(x)isequal(x,6),{S.b}) );
toc
% EitanT 1
tic
sol2 = find([S.b] == 6);
toc
% EitanT 2
tic
str = sprintf('%f,', S.b);
values = textscan(str, '%f', 'delimiter', ',', 'EmptyValue', NaN);
sol3 = find(values{:} == 6);
toc
% Rody Oldenhuis 2
tic
ids = false(N,1);
for ii = 1:N
ids(ii) = isequal(S(ii).b, 6);
end
sol4 = find(ids);
toc
% Rody Oldenhuis 3
tic
idx = false(size(S));
SS = {S.b};
inds = ~cellfun('isempty', SS);
idx(inds) = [SS{inds}]==6;
sol5 = find(idx);
toc
% make sure they are all equal
all(sol1(:)==sol2(:))
all(sol1(:)==sol3(:))
all(sol1(:)==sol4(:))
all(sol1(:)==sol5(:))
Results on my machine at work (AMD A6-3650 APU (4 cores), 4GB RAM, Windows 7 64 bit):
Elapsed time is 28.990076 seconds. % Rody Oldenhuis 1 (cellfun)
Elapsed time is 0.119165 seconds. % EitanT 1 (no empties)
Elapsed time is 22.430720 seconds. % EitanT 2 (string manipulation)
Elapsed time is 0.706631 seconds. % Rody Oldenhuis 2 (loop)
Elapsed time is 0.207165 seconds. % Rody Oldenhuis 3 (vectorized)
ans =
1
ans =
1
ans =
1
ans =
1
On my Homebox (AMD Phenom(tm) II X6 1100T (6 cores), 16GB RAM, Ubuntu64 12.10):
Elapsed time is 0.572098 seconds. % cellfun
Elapsed time is 0.119557 seconds. % no empties
Elapsed time is 0.220903 seconds. % string manipulation
Elapsed time is 0.107345 seconds. % loop
Elapsed time is 0.180842 seconds. % cellfun-with-string
Gotta love that JIT :)
and wow...anyone know why the two systems behave so differently?
Also, little known fact -- cellfun with one of the possible string arguments is incredibly fast (which goes to show how much overhead anonymous functions require...).
Still, if you can be absolutely sure there are no empties, go for EitanT's original answer; that's what Matlab is for. If you can't be sure, just go for the loop.
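Single tic/toc measurements like the ones above can be noisy. As a hedged sketch, the same comparison could be repeated with timeit, which calls the function several times and reports a typical run time. This assumes a MATLAB release that ships timeit (R2013b or later); the smaller N and the variable names t_concat and t_cellfun are choices made for this example only.

```matlab
% Smaller hypothetical test set so the sketch runs quickly
N = 1e4;
S = struct('b', cell(1, N));       % 1-by-N struct array with empty .b fields
for ii = 1:N
    S(ii).b = randi(6);
end

% timeit expects a function handle that takes no inputs
t_concat  = timeit(@() find([S.b] == 6));
t_cellfun = timeit(@() find(cellfun(@(x) isequal(x, 6), {S.b})));

fprintf('concatenation: %g s, cellfun: %g s\n', t_concat, t_cellfun);
```

The absolute numbers will still differ between machines and MATLAB versions, so only the relative ordering on one system is meaningful.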

Related

MATLAB - sparse to dense matrix

I have a sparse matrix in a data file produced by a code (which is not MATLAB). The data file consists of four columns. The first two columns are the real and imaginary parts of a matrix entry, and the third and fourth columns are the corresponding row and column indices, respectively.
I convert this into a dense matrix in Matlab using the following script.
tic
dataA = load('sparse_LHS.dat');
toc
% Initialise matrix
tic
Nr = 15; Nz = 15; Neq = 5;
A (Nr*Nz*Neq,Nr*Nz*Neq) = 0;
toc
tic
lA = length(dataA)
rowA = dataA(:,3); colA = dataA(:,4);
toc
tic
for i = 1:lA
A(rowA(i), colA(i)) = complex(dataA(i,1), dataA(i,2));
end
toc
This script is, however, very slow (the for loop is the culprit).
Elapsed time is 0.599023 seconds.
Elapsed time is 0.001978 seconds.
Elapsed time is 0.000406 seconds.
Elapsed time is 275.462138 seconds.
Is there any fast way of doing this in matlab?
Here is what I tried so far:
parfor - This gives me
valid indices are restricted in parfor loops
I tried to recast the for loop as something like this:
A(rowA(:),colA(:)) = complex(dataA(:,1), dataA(:,2));
and I get an error
Subscripted assignment dimension mismatch.
The reason your last try doesn't work is that Matlab can't take a list of subscripts for both columns and rows, and match them to assign elements in order. Instead, it's making all the combinations of rows and columns from the list - this is how it looks:
dataA = magic(4)
dataA =
16 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
dataA([1,2],[1,4]) =
16 13
5 8
So we got 4 elements ([1,1],[1,4],[2,1],[2,4]) instead of 2 ([1,1] and [2,4]).
In order to use subscripts in a list, you need to convert them to linear indices, and one simple way to do this is with the function sub2ind.
Using this function you can write the following code to do it all at once:
% Initialise matrix
Nr = 15; Nz = 15; Neq = 5;
A(Nr*Nz*Neq,Nr*Nz*Neq) = 0;
% Place all complex values from dataA(:,1:2) in to A by the subscripts in dataA(:,3:4):
A(sub2ind(size(A),dataA(:,3),dataA(:,4))) = complex(dataA(:,1), dataA(:,2));
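To see the difference between the two indexing styles side by side, here is a short sketch on the same magic(4) matrix used earlier. The variable names M, rows, cols, block and pairs are made up for this illustration.

```matlab
M    = magic(4);
rows = [1 2];
cols = [1 4];

% Subscript lists: every row is combined with every column (2x2 result)
block = M(rows, cols);

% Linear indices: rows(k) is paired with cols(k) (1x2 result)
pairs = M(sub2ind(size(M), rows, cols));

disp(block);   % [16 13; 5 8]
disp(pairs);   % [16 8], i.e. M(1,1) and M(2,4)
```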
sub2ind is not such a quick function (but it will be much quicker than your loop), so if you have a lot of data, you might want to do the computation of the linear index by yourself:
rowA = dataA(:,3);
colA = dataA(:,4);
% compute the linear index:
ind = (colA-1)*size(A,1)+rowA;
% Place all complex values from dataA(:,1:2) into A using the index 'ind':
A(ind) = complex(dataA(:,1), dataA(:,2));
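As a quick sanity check on the hand-written formula, the sketch below compares it against sub2ind for a few subscripts. MATLAB stores arrays column by column, which is why the column subscript is the one multiplied by the number of rows. The size sz and the subscript values are arbitrary example data.

```matlab
sz   = [4 5];            % hypothetical matrix size: 4 rows, 5 columns
rowA = [1; 3; 4];
colA = [2; 5; 1];

ind_manual = (colA - 1) * sz(1) + rowA;   % column-major linear index
ind_builtin = sub2ind(sz, rowA, colA);

disp(ind_manual.');                        % 5 19 4
disp(isequal(ind_manual, ind_builtin));    % 1
```

Unlike sub2ind, the manual formula does no bounds checking, so out-of-range subscripts will not raise an error at this step.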
P.S.:
If you are using Matlab R2015b or later:
A = zeros(Nr*Nz*Neq,Nr*Nz*Neq);
is quicker than:
A(Nr*Nz*Neq,Nr*Nz*Neq) = 0;
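Since the file is already in (value, row, column) form, another option worth sketching is to let the built-in sparse function assemble the matrix and then convert it with full. This is an alternative to the approaches above, not something taken from the original answer. The small dataA below is a made-up stand-in for the contents of sparse_LHS.dat, and n plays the role of Nr*Nz*Neq.

```matlab
% Hypothetical data: columns are [real, imag, row, col]
dataA = [1.0  0.5  1 1;
         2.0 -1.0  3 2;
         0.0  4.0  2 4];
n = 4;   % stand-in for Nr*Nz*Neq

vals = complex(dataA(:,1), dataA(:,2));
A = full(sparse(dataA(:,3), dataA(:,4), vals, n, n));

disp(A(3,2));    % 2 - 1i
disp(nnz(A));    % 3
```

One behavioral difference to keep in mind: sparse adds up values that share the same (row, column) pair, whereas indexed assignment keeps only the last one. If the file can contain repeated entries, the two approaches will not give the same matrix.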

Finding the index of a specific value in a cell in MATLAB

I have a two dimensional cell where every element is either a) empty or b) a vector of varying length with values ranging from 0 to 2. I would like to get the indices of the cell elements where a certain value occurs or even better, the "complete" index of every occurrence of a certain value.
I'm currently working on an agent based model of disease spreading and this is done in order to find the positions of infected agents.
Thanks in advance.
Here's how I would do it:
% some example data
A = { [], [], [3 4 5]
[4 8 ], [], [0 2 3 0 1] };
p = 4; % value of interest
% Finding the indices:
% -------------------------
% use cellfun to find indices
I = cellfun(@(x) find(x==p), A, 'UniformOutput', false);
% check again for empties
% (just for consistency; you may skip this step)
I(cellfun('isempty', I)) = {[]};
Call this method1.
A loop is also possible:
I = cell(size(A));
for ii = 1:numel(I)
I{ii} = find(A{ii} == p);
end
I(cellfun('isempty',I)) = {[]};
Call this method2.
Comparing the two methods for speed like so:
tic; for ii = 1:1e3, [method1], end; toc
tic; for ii = 1:1e3, [method2], end; toc
gives
Elapsed time is 0.483969 seconds. % method1
Elapsed time is 0.047126 seconds. % method2
on Matlab R2010b/32bit w/ Intel Core i3-2310M @ 2.10GHz w/ Ubuntu 11.10/2.6.38-13. This is mostly due to JIT on loops (and how terribly cellfun and anonymous functions seem to be implemented, mumblemumble..)
Anyway, in short, use the loop: it's more readable, and an order of magnitude faster than the vectorized solution.
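For the "complete" index the question asks about, one possible extension of the loop is to record the cell row, the cell column, and the position inside the vector for every occurrence. This is only a sketch: the name hits and its three-column layout are assumptions made for this example.

```matlab
% Same example data as above
A = { [],     [], [3 4 5]
      [4 8 ], [], [0 2 3 0 1] };
p = 4;   % value of interest

hits = zeros(0, 3);   % one row per occurrence: [cellRow, cellCol, position]
for k = 1:numel(A)
    pos = find(A{k} == p);
    if ~isempty(pos)
        [r, c] = ind2sub(size(A), k);   % linear cell index -> row/column
        hits = [hits; [repmat([r c], numel(pos), 1), pos(:)]];
    end
end

disp(hits);   % [2 1 1; 1 3 2]
```

Here the value 4 is found as element 1 of A{2,1} and as element 2 of A{1,3}. Growing hits inside the loop is fine for a handful of matches; for very many matches, collecting the pieces in a cell array and concatenating once would avoid repeated reallocation.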

Use a vector to index a matrix without linear index

G'day,
I'm trying to find a way to use a vector of [x,y] points to index from a large matrix in MATLAB.
Usually, I would convert the subscript points to the linear index of the matrix.(for eg. Use a vector as an index to a matrix) However, the matrix is 4-dimensional, and I want to take all of the elements of the 3rd and 4th dimensions that have the same 1st and 2nd dimension. Let me hopefully demonstrate with an example:
Matrix = nan(4,4,2,2); % where the dimensions are (x,y,depth,time)
Matrix(1,2,:,:) = 999; % note that this value could change in depth (3rd dim) and time (4th dim)
Matrix(3,4,:,:) = 888; % note that this value could change in depth (3rd dim) and time (4th dim)
Matrix(4,4,:,:) = 124;
Now, I want to be able to index with the subscripts (1,2) and (3,4), etc and return not only the 999 and 888 which exist in Matrix(:,:,1,1) but the contents which exist at Matrix(:,:,1,2), Matrix(:,:,2,1) and Matrix(:,:,2,2), and so on (IRL, the dimensions of Matrix might be more like size(Matrix) = (300 250 30 200)).
I don't want to use linear indices because I would like the results to be in a similar vector fashion. For example, I would like a result which is something like:
ans(time=1)
999 888 124
999 888 124
ans(time=2)
etc etc etc
etc etc etc
I'd also like to add that due to the size of the matrix I'm dealing with, speed is an issue here - thus why I'd like to use subscript indices to index to the data.
I should also mention that (unlike this question: Accessing values using subscripts without using sub2ind) since I want all the information stored in the extra dimensions, 3 and 4, of the i and jth indices, I don't think that a slightly faster version of sub2ind would cut it.
I can think of three ways to go about this
Simple loop
Just loop over all the 2D indices you have, and use colons to access the remaining dimensions:
for jj = 1:size(twoDinds,1)
M(twoDinds(jj,1),twoDinds(jj,2),:,:) = rand;
end
Vectorized calculation of Linear indices
Skip sub2ind and vectorize the computation of linear indices:
% generalized for arbitrary dimensions of M
sz = size(M);
nd = ndims(M);
arg = arrayfun(@(x)1:x, sz(3:nd), 'UniformOutput', false);
[argout{1:nd-2}] = ndgrid(arg{:});
argout = cellfun(...
@(x) repmat(x(:), size(twoDinds,1),1), ...
argout, 'UniformOutput', false);
twoDinds = kron(twoDinds, ones(prod(sz(3:nd)),1));
% the linear indices
% (sz(1:nd-1) equals sz(1:3) for the 4D case and keeps the formula valid for other nd)
inds = twoDinds(:,1) + ([twoDinds(:,2) [argout{:}]]-1) * cumprod(sz(1:nd-1)).';
Sub2ind
Just use the ready-made tool that ships with Matlab:
inds = sub2ind(size(M), twoDinds(:,1), twoDinds(:,2), argout{:});
Speed
So which one's the fastest? Let's find out:
clc
M = nan(4,4,2,2);
sz = size(M);
nd = ndims(M);
twoDinds = [...
1 2
4 3
3 4
4 4
2 1];
tic
for ii = 1:1e3
for jj = 1:size(twoDinds,1)
M(twoDinds(jj,1),twoDinds(jj,2),:,:) = rand;
end
end
toc
tic
twoDinds_prev = twoDinds;
for ii = 1:1e3
twoDinds = twoDinds_prev;
arg = arrayfun(@(x)1:x, sz(3:nd), 'UniformOutput', false);
[argout{1:nd-2}] = ndgrid(arg{:});
argout = cellfun(...
@(x) repmat(x(:), size(twoDinds,1),1), ...
argout, 'Uniformoutput', false);
twoDinds = kron(twoDinds, ones(prod(sz(3:nd)),1));
inds = twoDinds(:,1) + ([twoDinds(:,2) [argout{:}]]-1) * cumprod(sz(1:3)).';
M(inds) = rand;
end
toc
tic
for ii = 1:1e3
twoDinds = twoDinds_prev;
arg = arrayfun(@(x)1:x, sz(3:nd), 'UniformOutput', false);
[argout{1:nd-2}] = ndgrid(arg{:});
argout = cellfun(...
@(x) repmat(x(:), size(twoDinds,1),1), ...
argout, 'Uniformoutput', false);
twoDinds = kron(twoDinds, ones(prod(sz(3:nd)),1));
inds = sub2ind(size(M), twoDinds(:,1), twoDinds(:,2), argout{:});
M(inds) = rand;
end
toc
Results:
Elapsed time is 0.004778 seconds. % loop
Elapsed time is 0.807236 seconds. % vectorized linear inds
Elapsed time is 0.839970 seconds. % linear inds with sub2ind
Conclusion: use the loop.
Granted, the tests above are largely influenced by JIT's failure to compile the last two loops, and by the non-specificity to 4D arrays (the last two methods also work on ND arrays). Making a specialized version for 4D will undoubtedly be much faster.
Nevertheless, the indexing with simple loop is, well, simplest to do, easiest on the eyes and very fast too, thanks to JIT.
So, here is a possible answer... but it is messy. I suspect it would be more computationally expensive than a more direct method... And this would definitely not be my preferred answer. It would be great if we could get the answer without any for loops!
Matrix = rand(100,200,30,400);
grabthese_x = [1 30 50 90];
grabthese_y = [61 9 180 189];
result = nan(length(grabthese_x), size(Matrix,3), size(Matrix,4));
for tt = 1:size(Matrix,4)
subset = squeeze(Matrix(grabthese_x,grabthese_y,:,tt));
for NN=1:size(Matrix,3)
result(:,NN,tt) = diag(subset(:,:,NN));
end
end
The resulting matrix, result, should have size(result) = [4 size(Matrix,3) size(Matrix,4)].
I think this should work, even if Matrix isn't square. However, it is not ideal, as I said above.
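A loop-free alternative that was not suggested in the answers above is to merge the first two dimensions with reshape, so that each (x,y) pair becomes a single row index and the depth and time dimensions come along for free. The sketch below uses the small example from the question; the names pts, flat, lin and result are assumptions made for this illustration.

```matlab
% Example data from the question: dimensions are (x, y, depth, time)
Matrix = nan(4,4,2,2);
Matrix(1,2,:,:) = 999;
Matrix(3,4,:,:) = 888;
Matrix(4,4,:,:) = 124;

pts = [1 2; 3 4; 4 4];          % one [x y] pair per row

sz   = size(Matrix);
flat = reshape(Matrix, sz(1)*sz(2), sz(3), sz(4));   % (x*y) by depth by time
lin  = sub2ind(sz(1:2), pts(:,1), pts(:,2));         % 2D linear index per point
result = flat(lin, :, :);                            % points by depth by time

disp(size(result));       % 3 2 2
disp(result(:,:,1).');    % [999 888 124; 999 888 124], i.e. time = 1
```

Transposing a single time slice gives the depth-by-points layout sketched in the question. Whether this beats the simple loop on a (300 250 30 200) array would have to be measured.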

MATLAB: duplicating vector 'n' times [duplicate]

This question already has answers here:
Octave / Matlab: Extend a vector making it repeat itself?
(3 answers)
Closed 9 years ago.
I have a vector, e.g.
vector = [1 2 3]
I would like to duplicate it within itself n times, i.e. if n = 3, it would end up as:
vector = [1 2 3 1 2 3 1 2 3]
How can I achieve this for any value of n? I know I could do the following:
newvector = vector;
for i = 1 : n-1
newvector = [newvector vector];
end
This seems a little cumbersome though. Any more efficient methods?
Try
repmat([1 2 3],1,3)
I'll leave you to check the documentation for repmat.
This is a Faster Method Than repmat or reshape by an Order of Magnitude
One of the best methods for doing such things is using Tony's trick. repmat and reshape are usually found to be slower than Tony's trick, as it directly uses MATLAB's inherent indexing. To answer your question,
Let's say you want to tile the row vector r=[1 2 3] N times like r=[1 2 3 1 2 3 1 2 3...], then,
c=r'
cc=c(:,ones(N,1));
r_tiled = cc(:)';
This method has significant time savings against reshape or repmat for large N's.
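Before relying on the trick, it is easy to confirm that it produces exactly what repmat does. The following sketch steps through the three lines above for N = 3; the name r_repmat is added here only for the comparison.

```matlab
r = [1 2 3];
N = 3;

c  = r.';                  % 3x1 column vector
cc = c(:, ones(N,1));      % index column 1 N times -> 3xN copies of c
r_tiled = cc(:).';         % stack the columns, then back to a row

r_repmat = repmat(r, 1, N);

disp(r_tiled);                       % 1 2 3 1 2 3 1 2 3
disp(isequal(r_tiled, r_repmat));    % 1
```

The indexing step works because repeating a column subscript copies that column, so c(:, [1 1 1]) is a 3-by-3 matrix whose columns are all equal to c.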
EDIT : Reply to #Li-aung Yip's doubts
I conducted a small Matlab test to check the speed differential between repmat and Tony's trick. Using the code mentioned below, I calculated the times for constructing the same tiled vector from a base vector A=[1:N]. The results show that YES, Tony's trick is FASTER BY AN ORDER of MAGNITUDE, especially for larger N. People are welcome to try it themselves. This much time differential can be critical if such an operation has to be performed in loops. Here is the small script I used:
N = 10; % Also try for values N = 10, 100, 1000, 10000
% time for tony_trick
tic;
A=(1:N)';
B=A(:,ones(N,1));
C=B(:)';
t_tony=toc;
clearvars -except t_tony N
% time for repmat
tic;
A=(1:N);
B=repmat(A,1,N);
t_repmat=toc;
clearvars -except t_tony t_repmat N
The Times (in seconds) for both methods are given below;
N=10, time_repmat = 8e-5 , time_tony = 3e-5
N=100, time_repmat = 2.9e-4 , time_tony = 6e-5
N=1000, time_repmat = 0.0302 , time_tony = 0.0058
N=10000, time_repmat = 2.9199 , time_tony = 0.5292
My RAM didn't permit me to go beyond N=10000. I am sure, the time difference between the two methods will be even more significant for N=100000. I know, these times might be different for different machines, but the relative difference in order-of-magnitude of times will stand. Also, I know, the avg of times could have been a better metric, but I just wanted to show the order of magnitude difference in time consumption between the two approaches. My machine/os details are given below :
Relevant Machine/OS/Matlab Details : Athlon i686 Arch, Ubuntu 11.04 32 bit, 3gb ram, Matlab 2011b
Based on Abhinav's answer and some tests, I wrote a function which is ALWAYS faster than repmat()!
It uses the same parameters, except for the first parameter which must be a vector and not a matrix.
function vec = repvec( vec, rows, cols )
%REPVEC Replicates a vector.
% Replicates a vector rows times in dim1 and cols times in dim2.
% Auto optimization included.
% Faster than repmat()!!!
%
% Copyright 2012 by Marcel Schnirring
if ~isscalar(rows) || ~isscalar(cols)
error('Rows and cols must be scalar')
end
if rows == 1 && cols == 1
return % no modification needed
end
% check parameters
if size(vec,1) ~= 1 && size(vec,2) ~= 1
error('First parameter must be a vector but is a matrix or array')
end
% check type of vector (row/column vector)
if size(vec,1) == 1
% set flag
isrowvec = 1;
% swap rows and cols
tmp = rows;
rows = cols;
cols = tmp;
else
% set flag
isrowvec = 0;
end
% optimize code -> choose version
if rows == 1
version = 2;
else
version = 1;
end
% run replication
if version == 1
if isrowvec
% transform vector
vec = vec';
end
% replicate rows
if rows > 1
cc = vec(:,ones(1,rows));
vec = cc(:);
%indices = 1:length(vec);
%c = indices';
%cc = c(:,ones(rows,1));
%indices = cc(:);
%vec = vec(indices);
end
% replicate columns
if cols > 1
%vec = vec(:,ones(1,cols));
indices = (1:length(vec))';
indices = indices(:,ones(1,cols));
vec = vec(indices);
end
if isrowvec
% transform vector back
vec = vec';
end
elseif version == 2
% calculate indices
indices = (1:length(vec))';
% replicate rows
if rows > 1
c = indices(:,ones(rows,1));
indices = c(:);
end
% replicate columns
if cols > 1
indices = indices(:,ones(1,cols));
end
% transform index when row vector
if isrowvec
indices = indices';
end
% get vector based on indices
vec = vec(indices);
end
end
Feel free to test the function with all your data and give me feedback. If you find something that improves it further, please tell me.

How do I compare elements of one row with every other row in the same matrix

I have the matrix:
a = [ 1 2 3 4;
2 4 5 6;
4 6 8 9]
and I want to compare every row with every other two rows one by one. If they share the same key then the result will tell they have a common key.
Using @gnovice's idea of getting all combinations with nchoosek, I propose yet another two solutions:
one using ismember (as noted by @loren)
the other using bsxfun with the eq function handle
The only difference is that intersect sorts and keeps only the unique common keys.
a = randi(30, [100 20]);
%# a = sort(a,2);
comparisons = nchoosek(1:size(a,1),2);
N = size(comparisons,1);
keys1 = cell(N,1);
keys2 = cell(N,1);
keys3 = cell(N,1);
tic
for i=1:N
keys1{i} = intersect(a(comparisons(i,1),:),a(comparisons(i,2),:));
end
toc
tic
for i=1:N
query = a(comparisons(i,1),:);
set = a(comparisons(i,2),:);
keys2{i} = query( ismember(query, set) ); %# unique(...)
end
toc
tic
for i=1:N
query = a(comparisons(i,1),:);
set = a(comparisons(i,2),:)';
keys3{i} = query( any(bsxfun(@eq, query, set),1) ); %'# unique(...)
end
toc
... with the following time comparisons:
Elapsed time is 0.713333 seconds.
Elapsed time is 0.289812 seconds.
Elapsed time is 0.135602 seconds.
Note that even by sorting a beforehand and adding a call to unique inside the loops (commented parts), these two methods are still faster than intersect.
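To show what the ismember and bsxfun variants compute, here is a sketch for a single pair of rows taken from the matrix in the question. The names other, eqMat and mask are made up for this example (other is used instead of set to avoid shadowing the built-in function).

```matlab
query = [1 2 3 4];   % row 1 of the question's matrix
other = [2 4 5 6];   % row 2 of the question's matrix

% bsxfun expands the 1x4 row against the 4x1 column into a 4x4 comparison table
eqMat = bsxfun(@eq, query, other.');
mask  = any(eqMat, 1);             % true where a query element appears in other
common_bsx = query(mask);

common_ism = query(ismember(query, other));

disp(mask);         % 0 1 0 1
disp(common_bsx);   % 2 4
disp(common_ism);   % 2 4
```

Both variants keep the elements in the order and multiplicity of query, which is the difference from intersect mentioned above.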
Here's one solution (which is generalizable to larger matrices than the sample in the question):
comparisons = nchoosek(1:size(a,1),2);
N = size(comparisons,1);
keys = cell(N,1);
for i = 1:N
keys{i} = intersect(a(comparisons(i,1),:),a(comparisons(i,2),:));
end
The function NCHOOSEK is used to generate all of the unique combinations of row comparisons. For the matrix a in your question, you will get comparisons = [1 2; 1 3; 2 3], meaning that we will need to compare rows 1 and 2, then 1 and 3, and finally 2 and 3. keys is a cell array that stores the results of each comparison. For each comparison, the function INTERSECT is used to find the common values (i.e. keys). For the matrix a given in the question, you will get keys = {[2 4], 4, [4 6]}.
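Put together for the matrix in the question, the steps described above can be run as follows. The extra variable hasCommon is an assumption added for this sketch, to answer the yes/no part of the question directly.

```matlab
a = [1 2 3 4;
     2 4 5 6;
     4 6 8 9];

comparisons = nchoosek(1:size(a,1), 2);   % all unordered row pairs
N = size(comparisons, 1);
keys = cell(N, 1);
for i = 1:N
    keys{i} = intersect(a(comparisons(i,1),:), a(comparisons(i,2),:));
end

hasCommon = ~cellfun('isempty', keys);    % true where a pair shares a key

disp(comparisons);     % [1 2; 1 3; 2 3]
disp(hasCommon.');     % 1 1 1
```

Here keys{1} is [2 4], keys{2} is 4 and keys{3} is [4 6], so every pair of rows shares at least one key.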