Vectorized extraction of a list from a MATLAB cell array

I have a two-index MATLAB Cell Array (AllData{1:12,1:400}) where each element is a structure. I would like to extract a list of values from this structure.
For example, I would like to do something like this to obtain a list of 12 values from this structure
MaxList = AllData{1:12,1}.MaxVal;
This comes up with an error
Expected one output from a curly brace or dot indexing expression, but there were 12 results
I can do this as a loop, but would prefer to vectorize:
clear MaxList
for i=1:12
MaxList(i) = AllData{i,1}.MaxVal;
end
How do I vectorize this?

If all structs are scalar and have the same fields, it's better to avoid the cell array and directly use a struct array. For example,
clear AllData
AllData(1,1).MaxVal = 10;
AllData(1,2).MaxVal = 11;
AllData(2,1).MaxVal = 12;
AllData(2,2).MaxVal = 13;
[AllData(:).OtherField] = deal('abc');
defines a 2×2 struct array. Then, what you want can be done simply as
result = [AllData(:,1).MaxVal];
If you really need a cell array of scalar structs, such as
clear AllData
AllData{1,1} = struct('MaxVal', 10, 'OtherField', 'abc');
AllData{1,2} = struct('MaxVal', 11, 'OtherField', 'abc');
AllData{2,1} = struct('MaxVal', 12, 'OtherField', 'abc');
AllData{2,2} = struct('MaxVal', 13, 'OtherField', 'abc');
you can use these two steps:
tmp = [AllData{:,1}];
result = [tmp.MaxVal];
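As a small check of the two-step approach, and a one-step cellfun variant for comparison, here is a sketch on the same 2×2 example data. The variable name resultCellfun is only illustrative. Note that the two results differ in orientation: the bracket form gives a row vector, while cellfun keeps the column shape of AllData(:,1).

```matlab
% Example data: a 2x2 cell array of scalar structs (same layout as above)
AllData = cell(2,2);
AllData{1,1} = struct('MaxVal', 10, 'OtherField', 'abc');
AllData{1,2} = struct('MaxVal', 11, 'OtherField', 'abc');
AllData{2,1} = struct('MaxVal', 12, 'OtherField', 'abc');
AllData{2,2} = struct('MaxVal', 13, 'OtherField', 'abc');

% Two-step approach: concatenate column 1 into a 1x2 struct array,
% then collect the MaxVal field into a numeric row vector
tmp = [AllData{:,1}];
result = [tmp.MaxVal];

% One-step alternative: cellfun visits each cell of column 1 and
% returns a numeric array shaped like AllData(:,1), i.e. a column
resultCellfun = cellfun(@(s) s.MaxVal, AllData(:,1));
```

Both forms assume every MaxVal is a numeric scalar; with non-scalar values cellfun would need 'UniformOutput', false.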

Using the answer above as a starting point, it is also possible to extract a 2d array of vectors from the Cell Array Structure. In each element of the 2d AllData cell array is a 2048 element vector called DataSet. The following commands will extract all of these vectors to a 2d array:
tmp = [AllData{:,1}];
len = length(tmp(1).DataSet); % Gets the length of one vector of DataSet
tmp2 = [tmp.DataSet]; % Extracts all vectors to a large 1-d array
AllDataSets = reshape(tmp2,len,[])'; % Reshapes into a 2d array of vectors

Related

Extracting field from cell array of structure in matlab

I have a cell array (let's say size 10) where each cell is a structure with the same fields. Let's say they all have a field name x.
Is there a way to retrieve, as a vector, the value of the field x for all the structures in the cell array? I would expect the function to return a vector of size 10 with, in position 1, the value of the field x of the structure in cell 1, and so on.
EDIT 1:
The structures in the cell array have one field that is common to all of them, but their other fields differ.
First convert your cell array of structures, c, (with identical field names in the same order) to a structure array:
c = cell2mat(c)
Then, depending on the data types and sizes of the elements of the field, you may be able to use
[c.x]
to extract your vector of field x values in the "standard" way.
It is also possible that you can skip the conversion step and use cellfun(@(e) e.x, c) to do the extraction in one go.
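For the situation described in EDIT 1 (one shared field, other fields differing), structs with different field names cannot be concatenated into a single struct array, so the cellfun route is the one that applies. A minimal sketch, with made-up field names a and b standing in for the differing fields:

```matlab
% Two structs that share field x but otherwise have different fields
% (the fields a and b are hypothetical placeholders)
s1 = struct('x', 1, 'a', 'hello');
s2 = struct('x', 2, 'b', [4 5 6]);
c = {s1, s2};

% cellfun reads field x from each cell without building a struct array;
% this works because every x is a numeric scalar
v = cellfun(@(e) e.x, c);
```

If x were not a scalar in every struct, 'UniformOutput', false would be needed and the result would be a cell array.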
The below code creates a cell array of structures, and extracts field 'x' of each structure to a vector v.
%create a cell array of structures
s1.a = 'hello';
s1.x = 1;
s2.a = 'world';
s2.x = 2;
c{1} = s1;
c{2} = s2;
v = zeros(1,2);
%extract to vector
for idx=1:size(c,2)
v(1,idx) = c{idx}.x;
end
Let's say you have
c = {s1, s2, s3, ...., sn};
where common field is 'field_1', then you have two options
Use cell2mat.
cc = cell2mat(c); % which converts your cell array of structs into an array of structs
value = [cc.field_1]; % if the values are numeric
or
value = {cc.field_1}; % if values are characters, for example
Another option is to use cellfun.
If the field values are characters, you should set "UniformOutput" to "false"
value = cellfun(@(x) x.field_1, c, 'UniformOutput', false)
The first option is better. Also, try to avoid cell arrays, cellfun and arrayfun whenever you can: numeric vectors are much faster, and even a plain for loop is more efficient.

Logical index of structure with various dimensioned fields

Let's say I have a structure like this:
S.index = 1:10;
S.testMatrix = zeros(3,3,10);
for x = 1:10
S.testMatrix(:,:,x) = magic(3) + x;
end
S.other = reshape(0:39, 4, 10);
It contains a 1x10 vector, a 3x3x10 multi-paged array and a 4x10 matrix. Now say I want to select only the entries corresponding to the indices between 2 and 8:
mask = S.index > 2 & S.index < 8;
I first tried structfun(@(x) x(mask), S, 'UniformOutput', 0); which worked correctly only for the vector, which makes perfect sense. So then I figured all I needed to do was expand my mask, and I did this:
test = structfun(@(x) x(repmat(mask, size(x, ndims(x) - 1), 1)), S, 'UniformOutput',0);
The expanded mask was correct for the matrix but not the multi-paged array. And the 2D matrix was flattened to a vector.
If I was going to index these elements individually I would do something like this:
S2.index = S.index(mask);
S2.other = S.other(:,mask);
S2.testMatrix = S.testMatrix(:,:,mask);
My use case is for hundreds of structures, each with 20+ fields. How do I script the indexing? The exact problem is limited to structures with 1xN vectors, 3xN and 4xN matrices, and 3x3xN arrays. The mask is constructed from one of the vectors, which represents time. The field names are the same for every structure, so I could brute-force it by typing in the commands and running them as a function, but I'm looking for an intelligent way to index it.
Update: Here is something that looks promising.
fn = fieldnames(S);
for x = 1:length(fn)
extraDim = repmat({':'}, 1, ndims(S.(fn{x})) - 1);
S2.(fn{x}) = S.(fn{x})(extraDim{:}, mask);
end
You can exploit the fact that the string ':' can be used as an index instead of :, and build a comma-separated list of that string repeated the appropriate number of times for each field:
s = {':',':'}; % auxiliary cell array to generate the comma-separated list
S2 = structfun(@(f) f(s{1:ndims(f)-1}, mask), S, 'UniformOutput', false);
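To see what the comma-separated-list trick produces, here is a sketch that runs it on the example structure from the question and prints the resulting field sizes. Since the auxiliary cell array holds two ':' entries, this form covers fields with up to three dimensions.

```matlab
% Example structure from the question
S.index = 1:10;
S.testMatrix = zeros(3,3,10);
for x = 1:10
    S.testMatrix(:,:,x) = magic(3) + x;
end
S.other = reshape(0:39, 4, 10);

% Logical mask along the last dimension: keeps index values 3..7
mask = S.index > 2 & S.index < 8;

% s{1:ndims(f)-1} expands to ':' (2-D fields) or ':',':' (3-D fields),
% so the mask is always applied to the last dimension
s = {':',':'};
S2 = structfun(@(f) f(s{1:ndims(f)-1}, mask), S, 'UniformOutput', false);

disp(size(S2.index))        % 1 5
disp(size(S2.testMatrix))   % 3 3 5
disp(size(S2.other))        % 4 5
```

Each field keeps its leading dimensions and only the last one shrinks from 10 to 5, which matches the manual indexing shown in the question.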

How to store results created into a cell array during a multiple for loop into a 3d array? In Matlab

In Matlab, at the end of three nested for loops (for a=1:240, b=1:5 and c=1:3), I generate a {1,3} cell array where each cell contains a (1,5) array that reports only the result of the last of the 240 iterations.
How can I generate, in addition to this cell array, a (240,5,3) 3d array that stores the result of each iteration?
Or, equivalently, a cell array that stores again the information and then convert it into a (240,5,3) 3d array?
The code would be along the lines of:
%// Size of the problem
Na = 240;
Nb = 5;
Nc = 3;
%// Allocate empty cell array
result = cell(Na, Nb, Nc);
%// Loop
for a = 1:Na
for b = 1:Nb
for c = 1:Nc
%// Here is the code for computing the
%// result x of the last iteration.
result{a,b,c} = x;
end;
end;
end;
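If each result x is a numeric scalar, the values can also be written straight into a preallocated numeric array, or the cell array can be converted afterwards. The sketch below assumes scalar results and uses a made-up formula for x as a stand-in for the real computation; resultArray and converted are illustrative names.

```matlab
% Size of the problem
Na = 240;
Nb = 5;
Nc = 3;

% Preallocate both the cell array and a numeric 3-D array
result = cell(Na, Nb, Nc);
resultArray = zeros(Na, Nb, Nc);

for a = 1:Na
    for b = 1:Nb
        for c = 1:Nc
            % Placeholder for the real computation (assumption: scalar x)
            x = a + 1000*b + 10000*c;
            result{a,b,c} = x;        % cell storage, as in the code above
            resultArray(a,b,c) = x;   % direct numeric storage
        end
    end
end

% Converting the cell array afterwards: [result{:}] lists the contents
% in column-major order, and reshape restores the 240x5x3 layout
converted = reshape([result{:}], size(result));
```

Writing into resultArray directly avoids the cell array altogether when every iteration yields one number.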

matlab parse file into cell array

I have a file in the following format in matlab:
user_id_a: (item_1,rating),(item_2,rating),...(item_n,rating)
user_id_b: (item_25,rating),(item_50,rating),...(item_x,rating)
....
....
so each line has values separated by a colon where the value to the left of the colon is a number representing user_id and the values to the right are tuples of item_ids (also numbers) and rating (numbers not floats).
I would like to read this data into a matlab cell array or better yet ultimately convert it into a sparse matrix wherein the user_id represents the row index, and the item_id represents the column index and store the corresponding rating in that array index. (This would work as I know a-priori the number of users and items in my universe so ids cannot be greater than that ).
Any help would be appreciated.
I have thus far tried the textscan function as follows:
c = textscan(f,'%d %s','delimiter',':') %this creates two cells one with all the user_ids
%and another with all the remaining string values.
Now if I try to do something like str2mat(c{2}), it works but it stores the '(' and ')' characters also in the matrix. I would like to store a sparse matrix in the fashion that I described above.
I am fairly new to matlab and would appreciate any help regarding this matter.
f = fopen('data.txt','rt'); %// data file. Open as text ('t')
str = textscan(f,'%s'); %// gives a cell which contains a cell array of strings
str = str{1}; %// cell array of strings
r = str(1:2:end);
r = cellfun(@(s) str2num(s(1:end-1)), r); %// rows; numeric vector
pairs = str(2:2:end);
pairs = regexprep(pairs,'[(,)]',' ');
pairs = cellfun(@(s) str2num(s(1:end-1)), pairs, 'uni', 0);
%// pairs; cell array of numeric vectors
cols = cellfun(@(x) x(1:2:end), pairs, 'uni', 0);
%// columns; cell array of numeric vectors
vals = cellfun(@(x) x(2:2:end), pairs, 'uni', 0);
%// values; cell array of numeric vectors
rows = arrayfun(@(n) repmat(r(n),1,numel(cols{n})), 1:numel(r), 'uni', 0);
%// rows repeated to match cols; cell array of numeric vectors
matrix = sparse([rows{:}], [cols{:}], [vals{:}]);
%// concat rows, cols and vals into vectors and use as inputs to sparse
For the example file
1: (1,3),(2,4),(3,5)
10: (1,1),(2,2)
this gives the following sparse matrix:
matrix =
(1,1) 3
(10,1) 1
(1,2) 4
(10,2) 2
(1,3) 5
I think newer versions of Matlab have a strsplit function that makes this approach overkill, but the following works, if not quickly. It splits the file into user ids and "other stuff" as you show, initializes a large empty matrix, and then iterates through the other stuff, breaking it apart and placing each value in the correct place in the matrix.
(I didn't see the previous answer when I opened this for some reason. It is more sophisticated than this one, though this one may be a little easier to follow at the expense of speed.) I throw the \s* into the regex in case the spacing is inconsistent, but otherwise don't perform much in the way of data sanity checking. The output is the full array, which you can then turn into a sparse array if desired.
% matlab_test.txt:
% 101: (1,42),(2,65),(5,0)
% 102: (25,78),(50,12),(6,143),(2,123)
% 103: (23,6),(56,3)
clear all;
fclose('all');
% your path will vary, of course
file = '<path>/matlab_test.txt';
f = fopen(file);
c = textscan(f,'%d %s','delimiter',':');
celldisp(c)
uids = c{1}
tuples = c{2}
% These are stated as known
num_users = 3;
num_items = 40;
desired_array = zeros(num_users, num_items);
expression = '\((\d+)\s*,\s*(\d+)\)'
% Assuming length(tuples) == num_users for simplicity
for k = 1:num_users
uid = uids(k)
tokens = regexp(tuples{k}, expression, 'tokens');
for l = 1:length(tokens)
item_id = str2num(tokens{l}{1})
rating = str2num(tokens{l}{2})
desired_array(uid, item_id) = rating;
end
end
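A variant of the same regexp idea can collect (row, column, value) triplets and build the sparse matrix directly, without going through a full array. This is a sketch under stated assumptions: fileLines is a hypothetical in-memory stand-in for the lines of the data file, and it reuses the two-line example file from the earlier answer.

```matlab
% Stand-in for the file contents, one cell per line (hypothetical data)
fileLines = {'1: (1,3),(2,4),(3,5)', '10: (1,1),(2,2)'};

rows = [];
cols = [];
vals = [];
for k = 1:numel(fileLines)
    % Split "user_id: tuples" at the colon
    parts = strsplit(fileLines{k}, ':');
    uid = str2double(parts{1});
    % Each token is a {item_id, rating} pair of strings
    tokens = regexp(parts{2}, '\((\d+)\s*,\s*(\d+)\)', 'tokens');
    for t = 1:numel(tokens)
        rows(end+1) = uid;                        %#ok<AGROW>
        cols(end+1) = str2double(tokens{t}{1});   %#ok<AGROW>
        vals(end+1) = str2double(tokens{t}{2});   %#ok<AGROW>
    end
end

% Rows are user ids, columns are item ids, values are ratings
matrix = sparse(rows, cols, vals);
```

Since the number of users and items is known in advance, sparse(rows, cols, vals, num_users, num_items) would fix the matrix size instead of inferring it from the largest ids seen.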

3D cell arrays in matlab

I am currently working using matlab, I have uploaded a csv file into a cell array that I have named B. What I now wish to do is to input the information of B into a 3-D cell array, the 3rd dimension of the array being the first column of B which are strings ranging from "chr1" to "chr24". The full length of B is m, and the maximum length of any "chr" is maxlength. I doubt that this is the best way of going about it but here is my code:
for j = 1:m ,
Ind = findstr(B{1}{j}, 'chr');
Num = B{1}{j}(Ind+3:end-1);
cnum = str2num(Num);
for i = 1:24,
if cnum == i;
for k = 2:9 ,
for l = 1:maxlength ,
C{l}{k}{i} = B{k}{j};
C{l}{k}{i}
end
end
end
end
end
The 3-D array that comes out of this does not match the corresponding values in the initial array. I also want to know if this is the right way to create a 3-D array, I can't seem to find anything on the matlab website about them.
Thanks
There are a few possible issues with your approach. First of all, Matlab indexing is different from C-style indexing into tables: myCell{i}{j} is the j-th element of the cell array that is contained in the i-th element of the cell array myCell. If you want to index into a 2-d cell array, you would get the contents of the element in row i, column j as myCell{i,j}.
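The difference between nested and multi-dimensional cell indexing can be seen in a small sketch; the variable names nested, flat and C3 are only illustrative.

```matlab
% Nested cells: a 1x2 cell whose elements are themselves 1x2 cells
nested = {{1, 2}, {3, 4}};
a = nested{2}{1};    % first element of the second inner cell

% A true 2-D cell array with two rows and two columns
flat = {1, 2; 3, 4};
b = flat{2,1};       % contents of row 2, column 1

% A 3-D cell array is created with cell() and indexed with three subscripts
C3 = cell(2, 3, 4);
C3{2,3,4} = 'chr24';
```

Here a and b both hold the value 3, but only flat and C3 are multi-dimensional arrays in the sense the question asks about.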
If the columns 2 through 9 of your .csv file contain all numeric data, it may be a lot more convenient to use either a 1D cell array with an entry for every chromosome, or to use a 2D or 3D numeric array if you get, for each chromosome, a single row, or a table, respectively.
Here's one way to do it
%# convert chromosomes to numbers
chromosomes = B{1};
chromosomes = strrep(chromosomes,'X','25'); %# strrep needs text replacements
chromosomes = strrep(chromosomes,'Y','26');
tmp = regexp(chromosomes,'chr(\d+)','tokens','once');
cnum = cellfun(@(x)str2double(x{1}),tmp);
%# catenate the rest of B into a 2D cell array
allNumbers = cell2mat(cat(2,B{2:end}));
%# now we can make a table with [chromosomeNumber,allOtherNumbers]
finalTable = [cnum,allNumbers]
%# alternatively, if there are multiple entries for each chromosome, we can
%# group the data in a cell array, so that the i-th entry corresponds to chr.i
%# for readability, use a loop
outputCell = cell(26,1); %# assume 26 chromosomes
for i=1:26
outputCell{i} = allNumbers(cnum==i,:);
end
I've managed to do this with only two for loops, here is my code:
C = zeros(26,8,maxlength);
next = zeros(1,26);
for j = 1:m ,
Ind = findstr(B{1}{j}, 'chr');
Num = B{1}{j}(Ind+3:end-1);
cnum = str2num(Num);
if Num == 'X'
cnum = 25;
end
if Num == 'Y'
cnum = 26;
end
next(cnum) = next(cnum) + 1;
for k = 2:9 ,
D{cnum}{k-1}{next(cnum)} = B{k}{j};
C(cnum,k-1,next(cnum)) = str2num(B{k}{j});
end
end