cellfun with two arrays of indices - matlab

I have one big cell with N by 1 dimension. Each row is either a string or a double. A string is a variable name and the sequential doubles are its values until the next string (another variable name). For example:
data = {
var_name1;
val1;
val2;
val3;
val4;
val5;
var_name2;
val1;
val2;
var_name3;
val1;
val2;
val3;
val4;
val5;
val6;
val7}
and so on. I want to separate the data cell into three cells; {var_name and it's 5 values}, {var_name and it's 2 values}, {var_name and it's 7 values}. I try not to loop as much as possible and have found that vectorization along with cellfun works really well. Is it possible? The data cell has close to million rows.

I believe the following should do what you're after. The main pieces are to use cumsum to work out which name each row corresponds to, and then accumarray to build up lists per name.
% Make some data
data = {'a'; 1; 2; 3;
'b'; 4; 5;
'c'; 6; 7; 8; 9;
'd';
'e'; 10; 11; 12};
% Which elements are the names?
isName = cellfun(#ischar, data);
% Use CUMSUM to work out for each row, which name it corresponds to
whichName = cumsum(isName);
% Pick out only the values from 'data', and filter 'whichName'
% for just the values
justVals = data(~isName);
whichName = whichName(~isName);
% Use ACCUMARRAY to build up lists per name. Note that the function
% used by ACCUMARRAY must return something scalar from a column of
% values, so we return a scalar cell containing a row-vector
% of those values
listPerName = accumarray(whichName, cell2mat(justVals), [], #(x) {x.'});
% All that remains is to prepend the name to each cell. This ends
% up with each row of output being a cell like {'a', [1 2 3]}.
% It's simple to make the output be {'a', 1, 2, 3} by adding
% a call to NUM2CELL on 'v' in the anonymous function.
nameAndVals = cellfun(#(n, v) [{n}, v], data(isName), listPerName, ...
'UniformOutput', false);

cellfun is for applying a function to each element of a cell.
When you pass multiple arguments to cellfun like that, it takes the ith argument of data, indx_first, and indx_last, and uses each of them in the anonymous function. Substituting those variables in, your function evaluates to x(y : z), for each element x in data. In other words, you're doing data{i}(y : z), i.e., indexing the actual elements of the cell array, rather than indexing the cell array itself. I don't think that's what you want. Really you want data{y : z}, for each (y, z) pair given by corresponding elements in indx_first and indx_last, right?
If that's indeed the case, I don't see a vectorized way to solve your problem, because each of the "variables" has different size. But you do know how many variables you have, which is the size of indx_first. So I'd pre-allocate and then loop, like so:
>> vars = cell(length(indx_first), 2);
>> for i = 1:length(vars)
vars{i, 1} = data{indx_first(i) - 1}; % store variable name in first column
vars{i, 2} = [data{indx_first(i) : indx_last(i)}]; % store data in last column
end
At the end of this, you'll have a cell array with 2 columns. The first column in each row is the name of the variable. The second is the actual data. I.e.
{'var_name1', [val1 val2 val3 val4 val5];
'var_name2', [val1 val2];
.
.
.

Related

How to sort all elements of a 2d matrix in MATLAB?

The sort() function sorts the elements row/column wise but how to sort the elements absolutely? The result should be another matrix with smallest element in (1,1) , second smallest in (1,2) and so on.
Take some random input
input = rand(5,10);
If you want the output to be a row vector, simply use
sortedRow = sort(input(:)).';
If you want the result to be the same shape as the input, then use
sortedOriginalShape = reshape(sort(input(:)), size(input,2), size(input,1)).';
Note that when maintaining the shape, we must use the reversed size dimensions and then transpose. This is because otherwise the result is column-wise i.e. smallest element in (1,1), next in (2,1) etc, which is the opposite of what you requested.
You can use the column operator (:) to vectorize all elements of 'nxm' matrix as a vector of 'nxm' elements and sort this vector. Then you can use direct assignement or 'reshape' function to store elements as matricial form.
All you need to know is that matlab use column-major-ordering to vectorize/iterate elements:
A = rand(3, 5);
A(:) = sort(A(:);
Will preserve colum-major-ordering, or as you said you prefer row-major ordering:
A = rand(3, 5);
A = reshape(sort(A(:)), fliplr(size(A)).';
Note the fliplr to store columnwise with reversed dimension and then the .' operator to transpose again the result.
EDIT
Even if matlab uses column-major-ordering for storing elements in memory, here below are two generic routines to work with row-major-order whatever the number of dimension of your array (i.e. no limited to 2D):
function [vector] = VectorizeWithRowMajorOrdering(array)
%[
axisCount = length(size(array)); % Squeezed size of original array
permutation = fliplr(1:(axisCount + 2)); % +2 ==> Trick to vectorize data in correct order
vector = permute(array, permutation);
vector = vector(:);
%]
end
function [array] = ReshapeFromRowMajorOrdering(vector, siz)
%[
siz = [siz( : ).' 1]; % Fix size if only one dim
array = NaN(siz); % Init
axisCount = length(size(array)); % Squeezed size!
permutation = fliplr(1:(axisCount + 2)); % +2 ==> Trick to vectorize data in correct order
array = reshape(vector, [1 1 fliplr(size(array))]);
array = ipermute(array, permutation);
%]
end
This can be useful when working with data coming from C/C++ (these languages use row-major-ordering). In your case this can be used this way:
A = rand(3, 5);
A = ReshapeFromRowMajorOrdering(sort(A(:)), size(A));

Acessing multiple structure fields in matlab without looping through it

I Have a 8x18 structure with each cel containing a column vector of occurrences of a single event. I want to obtain data from some of these fields concatenated in a single array, without having to loop through it. I can't seem to find a way to vertically concatenate the fields I am interested in in a single array.
As an example I create the following structure with between 1 and 5 occurrences per cell:
s(62).vector(8,18).heading.occurrences=[1;2;3];
for i=1:62
for j=1:8
for k=1:18
y=ceil(rand(1)*5);
s(i).vector(j,k).heading.occurrences=rand(y,1);
end
end
end
Now if want to obtain all occurrences in several cells while keeping i constant at for instant i=1 the following works:
ss=s(1).vector([1 26 45]);
h=[ss.heading];
cell2mat({h.occurrences}')
Now I would want to do the same for s, for instance s([1 2 3]).vector([1 26 45]), how would that work? I have tried xx=s([1 2 3]), yy=xx.vector([1 26 45]) but this however yields the error:
Expected one output from a curly brace or dot indexing expression, but there were 3 results.
Is this also possible with a vector operation?
Here's a vectorized solution that accommodates using index vectors for s and the field vector:
sIndex = [1 2 3]; % Indices for s
vIndex = [1 26 45]; % Indices for 'vector' field
v = reshape(cat(3, s(sIndex).vector), 144, []);
h = [v(vIndex, :).heading];
out = vertcat(h.occurrences);
It uses cat to concatenate all the vector fields into an 8-by-18-by-numel(sIndex) matrix, reshapes that into a 144-by-numel(sIndex) matrix, then indexes the rows specified by vIndex and collects their heading and occurrences fields, using vertcat instead of cell2mat.
It's difficult to vectorize the entire operation, but this should work.
% get vector field and store in cell array
s_new = { s(1:3).vector };
% now extract heading field, this is a cell-of-cells
s_new_heading = cellfun(#(x) { x.heading }', s_new, 'UniformOutput', false);
occurences = {};
for iCell = 1:length(s_new_heading)
% use current cell
cellHere = s_new_heading{iCell};
% retain indices of interest, these could be different for each cell
cellHere = cellHere([ 1 26 45 ]);
% extract occurrences
h = cellfun(#(x) x.occurrences, cellHere, 'UniformOutput', false);
h_mat = cell2mat(h);
% save them in cell array
occurences = cat(1, occurences, h_mat);
end

operations with structure in matlab

I have a structure 1x300 called struct with 3 fields but I'm using only the third field called way. This field is, for each 300 lines, a vertor of index.
Here an exemple with 3 lines to explain my problem : I woud like to search if the last index of the first line is present in an other vector (line) of the field way.
way
[491751 491750 491749 492772 493795 494819 495843 496867]
[491753 491754 491755 491756]
[492776 493800 494823 495847 496867]
I tried with intersect function :
Inter=intersect(struct(1).way(end), struct.way);
but Matlab returns me an error :
Error using intersect (line 80)
Too many input arguments.
Error in file2 (line 9)
Inter=intersect(struct(1).way(end), struct.way);
I don't understand why I have this error. Any explanations and/or other(s) solution(s)?
Let the data be defined as
st(1).way = [491751 491750 491749 492772 493795 494819 495843 496867];
st(2).way = [491753 491754 491755 491756];
st(3).way = [492776 493800 494823 495847 496867]; % define the data
sought = st(1).way(end);
If you want to know which vectors contain the desired value: pack all vectors into a cell array and pass that to cellfun with an anonymous function as follows:
ind = cellfun(#(x) ismember(sought, x), {st.way});
This gives:
ind =
1×3 logical array
1 0 1
If you want to know for each vector the indices of the matching: modify the anonymous function to output a cell with the indices:
ind = cellfun(#(x) {find(x==sought)}, {st.way});
or equivalently
ind = cellfun(#(x) find(x==sought), {st.way}, 'UniformOutput', false);
The result is:
ind =
1×3 cell array
[8] [1×0 double] [5]
Or, to exclude the reference vector:
n = 1; % index of vector whose final element is sought
ind = cellfun(#(x) {find(x==st(n).way(end))}, {st([1:n-1 n+1:end]).way});
You propbably want to use ismember.
Consider what you are passing to the intersect/ismember functions too, struct.way isn't a valid argument, you may need to loop to iterate over each line of your struct (in this case it would be easier to have a cell array, or matrix with equal length rows).
output = zeros(300);
for ii = 1:300
for jj = 1:300
if ii ~= jj && ismember(struct(ii).way(end), struct(jj).way)
output(ii,jj) = 1;
end
end
end
Now you have a matrix output where the elements which are 1 identify a match between the last element in way in the struct row ii and the vector struct(jj).way, where ii are the matrix row numbers and jj the column numbers.

How to make calculations on certain cells (within a table) that meet specific criteria?

I have the following code:
L_sum = zeros(height(ABC),1);
for i = 1:height(ABC)
L_sum(i) = sum(ABC{i, ABC.L(i,4:281)});
end
Here my table:
Problem: My sum function sums the entire row values (col. 4-281) per date whereas I only want those cells to be added whose headers are in the cell array of ABC.L, for any given date.
X = ABC.L{1, 1}; gives (excerpt):
Red arrow: what sum function is referencing (L of same date).
Green arrow: what I am trying to reference now (L of previous date).
Thanks for your help
In general, in matlab you dont need to use for loops to do simple operations like selective sums.
Example:
Data=...
[1 2 3;
4 5 6;
7 8 9;
7 7 7];
NofRows=size(Data,1);
RowsToSum=3:NofRows;
ColToSum=[1,3];
% sum second dimension 2d array
Result=sum(Data(RowsToSum,ColToSum), 2)
% table mode
DataTable=array2table(Data);
Result2=sum(DataTable{RowsToSum,ColToSum}, 2)
To do that you need to first extract the columns you want to sum, and then sum them:
% some arbitrary data:
ABC = table;
ABC.L{1,1} = {'aa','cc'};
ABC.L{2,1} = {'aa','b'};
ABC.L{3,1} = {'aa','d'};
ABC.L{4,1} = {'b','d'};
ABC{1:4,2:5} = magic(4);
ABC.Properties.VariableNames(2:5) = {'aa','b','cc','d'}
% summing the correct columns:
L_sum = zeros(height(ABC),1);
col_names = ABC.Properties.VariableNames; % just to make things shorter
for k = 1:height(ABC)
% the following 'cellfun' compares each column to the values in ABC.L{k},
% and returns a cell array of the result for each of them, then
% 'cell2mat' converts it to logical array, and 'any' combines the
% results for all elements in ABC.L{k} to one logical vector:
col_to_sum = any(cell2mat(...
cellfun(#(x) strcmp(col_names,x),ABC.L{k},...
'UniformOutput', false).'),1);
% then a logical indexing is used to define the columns for summation:
L_sum(k) = sum(ABC{k,col_to_sum});
end

Find index of all (non-unique) elements in a cell array as they appear in a second (sorted and unique) cell array

A = {'A'; 'E'; 'A'; 'F'};
B = {'A';'B';'C';'D';'E'; 'F'};
I am trying to get for each string in cell array A, the index that matches that string in cell array B. A will have repeated values, B will not.
find(ismember(B, A) == 1)
outputs
1
5
6
but I want to get
1
5
1
6
preferably in a one liner. I can't use strcmp instead of ismember either as the vectors are different sizes.
The vectors will actually contain date strings, and I need the index not a logical index matrix, I'm interested in the number not to use it for indexing.
How do I do it?
You flip the arguments to ismember, and you use the second output argument:
[~,loc]=ismember(A,B)
loc =
1
5
1
6
The second output tells you where the elements of A are in B.
If you are working with very strict limits to how many lines you can have in your code, and are in no position to fire the manager who imposed such limitations, you may want to access the second output of ismember directly. In order to do this, you can create the following helper function that allows to directly access the i-th output of a function
function out = accessIthOutput(fun,ii)
%ACCESSITHOUTPUT returns the i-th output variable of the function call fun
%
% define fun as anonymous function with no input, e.g.
% #()ismember(A,B)
% where A and B are defined in your workspace
%
% Using the above example, you'd access the second output argument
% of ismember by calling
% loc = accessIthOutput(#()ismember(A,B),2)
%# get the output
[output{1:ii}] = fun();
%# return the i-th element
out = output{ii};