Loop to create cell array - matlab

I have a structure named data. The structure has 250 elements and one field called codes (whose dimension varies).
As an example: data(1).codes is a 300 x 1 cell of strings and data(2).codes is a 100 x 1 cell of strings.
What I am trying to do is to create a big cell with three columns: id count codes where id indexes the element number (1 to 250), count indexes the row of the string and codes are just the codes.
An example to make it clear:
for k = 1:size(data,2)
id = repmat(k,size(data(k).codes,1),1);
count = linspace(1, size(data(k).codes,1), size(data(k).codes,1))';
codes= data(k).codes;
end
The loop above creates the columns I want. Now I just need to append them one below the other and then save to Excel. If these were only numbers I would know how to concatenate/append matrices. But with cells I am unsure how to do it.
Here is what I have tried:
output = {};
for k = 1:size(data,2)
id = repmat(k,size(data(k).codes,1),1);
count = linspace(1, size(data(k).codes,1), size(data(k).codes,1))';
codes= data(k).codes;
output{1,1} = {output{1,1}; id};
output{1,2} = {output{1,2}; count};
output{1,3} = {output{1,3}; codes};
end

Build up your output into a new cell array, allowing for pre-allocation, then concatenate all of your results.
% Initialise
output = cell(size(data,2), 1);
% Create output for each element of data
for k = 1:size(data,2)
id = repmat(k,size(data(k).codes,1),1);
count = linspace(1, size(data(k).codes,1), size(data(k).codes,1))';
codes = data(k).codes;
% add to output
output{k} = [id, count, codes];
end
% Vertically concatenate all cell elements
output = vertcat(output{:});
Note: this assumes codes is numeric, in which case the output will be a numeric matrix. If it isn't (for example, if codes is a cell array of strings as in the question), you will need to convert your numeric data (id and count) to cells as well, like so:
id = repmat({k}, size(data(k).codes,1), 1);
count = num2cell(linspace( ... )');
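Putting the cell conversions together, a minimal sketch of the whole loop could look like the following. The two-element data struct is made-up stand-in data, not the asker's 250-element one, and 1:n is used in place of linspace as an equivalent way to number the rows.

```matlab
% Hypothetical stand-in for the real struct (two elements instead of 250)
data(1).codes = {'a1'; 'b2'; 'c3'};
data(2).codes = {'d4'; 'e5'};

output = cell(numel(data), 1);            % pre-allocate one slot per element
for k = 1:numel(data)
    n = numel(data(k).codes);
    id = repmat({k}, n, 1);               % n-by-1 cell holding the element number
    count = num2cell((1:n)');             % n-by-1 cell holding the row numbers
    output{k} = [id, count, data(k).codes];   % n-by-3 cell
end
output = vertcat(output{:});              % stack all blocks into one 5-by-3 cell
```

From there, something like xlswrite('codes.xlsx', output), or writecell in newer releases, should be able to write the cell to a spreadsheet; the file name is only a placeholder.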

Related

accessing nth column of a cell array in matlab

I have a cell array with, for example, 3 cells containing (3,8), (3,2) and (3,30) matrices. I want to access the nth column of the whole data without converting my cell to a matrix. For example, if I search for the 8th column, it must be the second column of the 3rd cell. One way is to convert it into a matrix, but my cell is too long and I get an out-of-memory error when I try to convert the whole cell to a matrix. I then tried the code below, but it doesn't work correctly. I want to know what I'm doing wrong.
Any help is appreciated.
function [col,i,idx] = find_cellCol(cel, idx)
lgh = length(cel);
i = 1;
me = zeros(2,length(cel));
while( i <= lgh && length(cel{1,i})<=idx)
idx = idx - length(cel{1,i});
i = i+1;
end%end while
if idx == 0
col = cel{1,i-1}(:,end);
else
col = cel{1,i}(:,idx);
end
end
Get only the number of rows of each matrix in each cell, then take the cumulative sum of those counts and check in which cell you reach the 8th row.
%dummy data
x{1} = rand(3,8);
x{2} = rand(3,2);
x{3} = rand(3,20);
val = 8;
csize = cellfun(@(x) size(x,1),x); %get the number of rows in each cell
csum = cumsum(csize); % [3,6,9]
ind = find(csum>=val,1); % in which cell do we reach row number val
x{ind}((val-csum(ind))+csize(ind),:) %access the right row
fprintf('Accessing the line %d of the cell %d',(val-csum(ind))+csize(ind),ind)
Which will return:
Accessing the line 2 of the cell 3
EDIT:
The given example misled me: I was sure that you were trying to access a row (first dimension) and not a column (second dimension).
But if you want to access a column you can simply adjust the code above:
val = 8;
csize = cellfun(@(x) size(x,2),x); %get the size of the second dimension now.
csum = cumsum(csize);
ind = find(csum>=val,1);
x{ind}(:,(val-csum(ind))+csize(ind)) %access the right column
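As a possible explanation for the question's function going wrong: length returns the largest dimension of a matrix, so for a 3-by-2 block it reports 3 rather than the 2 columns. A sketch of the column lookup with the matrix sizes from the question, using size(...,2) throughout, might look like this. The variable names and the choice of column 12 are illustrative only.

```matlab
% Dummy data with the block sizes mentioned in the question
cel = {rand(3,8), rand(3,2), rand(3,30)};
idx = 12;                                   % global column number to fetch

widths = cellfun(@(m) size(m,2), cel);      % columns per block: [8 2 30]
edges  = cumsum(widths);                    % last global column of each block: [8 10 40]
i      = find(edges >= idx, 1);             % block that contains the column
local  = idx - (edges(i) - widths(i));      % column number inside that block
col    = cel{i}(:, local);                  % 3-by-1 result, no full matrix needed
```

With these sizes, global column 12 lands on the second column of the third block, and only the per-block widths are ever held in memory.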

How to make calculations on certain cells (within a table) that meet specific criteria?

I have the following code:
L_sum = zeros(height(ABC),1);
for i = 1:height(ABC)
L_sum(i) = sum(ABC{i, ABC.L(i,4:281)});
end
Here is my table:
Problem: My sum function sums the entire row values (col. 4-281) per date whereas I only want those cells to be added whose headers are in the cell array of ABC.L, for any given date.
X = ABC.L{1, 1}; gives (excerpt):
Red arrow: what sum function is referencing (L of same date).
Green arrow: what I am trying to reference now (L of previous date).
Thanks for your help
In general, in Matlab you don't need to use for loops to do simple operations like selective sums.
Example:
Data=...
[1 2 3;
4 5 6;
7 8 9;
7 7 7];
NofRows=size(Data,1);
RowsToSum=3:NofRows;
ColToSum=[1,3];
% sum second dimension 2d array
Result=sum(Data(RowsToSum,ColToSum), 2)
% table mode
DataTable=array2table(Data);
Result2=sum(DataTable{RowsToSum,ColToSum}, 2)
To do that you need to first extract the columns you want to sum, and then sum them:
% some arbitrary data:
ABC = table;
ABC.L{1,1} = {'aa','cc'};
ABC.L{2,1} = {'aa','b'};
ABC.L{3,1} = {'aa','d'};
ABC.L{4,1} = {'b','d'};
ABC{1:4,2:5} = magic(4);
ABC.Properties.VariableNames(2:5) = {'aa','b','cc','d'}
% summing the correct columns:
L_sum = zeros(height(ABC),1);
col_names = ABC.Properties.VariableNames; % just to make things shorter
for k = 1:height(ABC)
% the following 'cellfun' compares each column to the values in ABC.L{k},
% and returns a cell array of the result for each of them, then
% 'cell2mat' converts it to logical array, and 'any' combines the
% results for all elements in ABC.L{k} to one logical vector:
col_to_sum = any(cell2mat(...
cellfun(@(x) strcmp(col_names,x),ABC.L{k},...
'UniformOutput', false).'),1);
% then a logical indexing is used to define the columns for summation:
L_sum(k) = sum(ABC{k,col_to_sum});
end
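The column mask built above with cellfun, cell2mat and any can probably also be obtained with ismember, which accepts cell arrays of strings. The sketch below uses a plain matrix and made-up header names instead of a table so that it stays self-contained; with a table, col_names would come from ABC.Properties.VariableNames and the sum would read sum(ABC{k,mask}).

```matlab
% Hypothetical stand-ins for the table contents
col_names = {'aa','b','cc','d'};           % column headers
values    = magic(4);                      % one row per date
L = {{'aa','cc'}; {'aa','b'}; {'aa','d'}; {'b','d'}};   % headers to sum, per row

L_sum = zeros(size(values,1), 1);
for k = 1:size(values,1)
    mask = ismember(col_names, L{k});      % true where the header is listed in L{k}
    L_sum(k) = sum(values(k, mask));       % sum only the selected columns
end
```

To use the list from the previous date instead, the mask could be built from L{k-1} for k >= 2.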

Join rows in Matrix

I have a very big matrix that looks like this:
id,value
1,434
2,454353
1,4353
3,3432
3,4323
[...]
There can be at most 2 rows with the same id.
I want to reshape the matrix into the following, preferably removing the id's which only appear once:
id,value1,value2
1,434,4353
3,3432,4323
[...]
Here is an alternative using accumarray to identify values sharing the same index. The code is commented and you can have a look at every intermediary output to see what exactly is going on.
clear
clc
%// Create matrix with your data
id = [1;2;1;3;3];
value = [434 ;454353;4353;3432;4323];
M = [id value]
%// Find unique indices to build final output.
UniqueIdx = unique(M(:,1),'rows')
%// Find values corresponding to every index. Use cell array to account for different sized outputs.
NewM = accumarray(id,value,[],@(x) {x})
%// Get number of elements
NumElements = cellfun(@(x) size(x,1),NewM)
%// Discard rows having orphan index.
NewM(NumElements==1) = [];
UniqueIdx(NumElements==1) = [];
%// Build Output. Each remaining cell holds the two values of one id as a column,
%// so concatenate the cells side by side and transpose to get one row per id.
Results = [UniqueIdx [NewM{:}].']
And the output. I can't use the function table to build a nice output but if you do the result looks much nicer :)
Results =
1 434 4353
3 3432 4323
This code does the interesting job of sorting the matrix according to the id and removing the orphans.
x = sortrows(x,1); % sort x according to index
idx = x(:,1);
idxs = 1:max(idx);
rm = idxs(hist(idx, idxs) == 1); %find orphans
x( ismember(x(:,1),rm), : ) = [] %remove orphans
This last part then just shapes the array the way you want it
y = reshape(x', 4, []);
y( 3, : ) = [];
y=y';
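For reference, here is a self-contained sketch of the same sort-then-pair idea on the sample data from the question. It assumes the ids are positive integers and that every id occurs at most twice, as stated, and it swaps accumarray in for hist to count the occurrences.

```matlab
% Sample data from the question: [id value]
x = [1 434; 2 454353; 1 4353; 3 3432; 3 4323];

x = sortrows(x, 1);                        % bring rows with equal ids together
counts = accumarray(x(:,1), 1);            % number of occurrences of each id
x = x(counts(x(:,1)) == 2, :);             % keep only ids that appear twice

% Rows now come in pairs: odd rows carry value1, even rows carry value2
y = [x(1:2:end,1), x(1:2:end,2), x(2:2:end,2)];
```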

matlab parse file into cell array

I have a file in the following format in matlab:
user_id_a: (item_1,rating),(item_2,rating),...(item_n,rating)
user_id_b: (item_25,rating),(item_50,rating),...(item_x,rating)
....
....
so each line has values separated by a colon where the value to the left of the colon is a number representing user_id and the values to the right are tuples of item_ids (also numbers) and rating (numbers not floats).
I would like to read this data into a matlab cell array or better yet ultimately convert it into a sparse matrix wherein the user_id represents the row index, and the item_id represents the column index and store the corresponding rating in that array index. (This would work as I know a-priori the number of users and items in my universe so ids cannot be greater than that ).
Any help would be appreciated.
I have thus far tried the textscan function as follows:
c = textscan(f,'%d %s','delimiter',':') %this creates two cells one with all the user_ids
%and another with all the remaining string values.
Now if I try to do something like str2mat(c{2}), it works but it stores the '(' and ')' characters also in the matrix. I would like to store a sparse matrix in the fashion that I described above.
I am fairly new to matlab and would appreciate any help regarding this matter.
f = fopen('data.txt','rt'); %// data file. Open as text ('t')
str = textscan(f,'%s'); %// gives a cell which contains a cell array of strings
str = str{1}; %// cell array of strings
r = str(1:2:end);
r = cellfun(@(s) str2num(s(1:end-1)), r); %// rows; numeric vector
pairs = str(2:2:end);
pairs = regexprep(pairs,'[(,)]',' ');
pairs = cellfun(@(s) str2num(s(1:end-1)), pairs, 'uni', 0);
%// pairs; cell array of numeric vectors
cols = cellfun(@(x) x(1:2:end), pairs, 'uni', 0);
%// columns; cell array of numeric vectors
vals = cellfun(@(x) x(2:2:end), pairs, 'uni', 0);
%// values; cell array of numeric vectors
rows = arrayfun(@(n) repmat(r(n),1,numel(cols{n})), 1:numel(r), 'uni', 0);
%// rows repeated to match cols; cell array of numeric vectors
matrix = sparse([rows{:}], [cols{:}], [vals{:}]);
%// concat rows, cols and vals into vectors and use as inputs to sparse
For the example file
1: (1,3),(2,4),(3,5)
10: (1,1),(2,2)
this gives the following sparse matrix:
matrix =
(1,1) 3
(10,1) 1
(1,2) 4
(10,2) 2
(1,3) 5
I think newer versions of Matlab have a strsplit function that makes this approach overkill, but the following works, if not quickly. It splits the file into user ids and "other stuff" as you show, initializes a large empty matrix, and then iterates through the other stuff, breaking it apart and placing each rating in the correct place in the matrix.
(I didn't see the previous answer when I opened this for some reason. It is more sophisticated than this one, though this one may be a little easier to follow at the expense of speed.) I throw the \s* into the regex in case the spacing is inconsistent, but otherwise don't perform much in the way of data sanity checking. The output is the full array, which you can then turn into a sparse array if desired.
% matlab_test.txt:
% 101: (1,42),(2,65),(5,0)
% 102: (25,78),(50,12),(6,143),(2,123)
% 103: (23,6),(56,3)
clear all;
fclose('all');
% your path will vary, of course
file = '<path>/matlab_test.txt';
f = fopen(file);
c = textscan(f,'%d %s','delimiter',':');
celldisp(c)
uids = c{1}
tuples = c{2}
% These are stated as known
num_users = 3;
num_items = 40;
desired_array = zeros(num_users, num_items);
expression = '\((\d+)\s*,\s*(\d+)\)'
% Assuming length(tuples) == num_users for simplicity
for k = 1:num_users
uid = uids(k)
tokens = regexp(tuples{k}, expression, 'tokens');
for l = 1:length(tokens)
item_id = str2num(tokens{l}{1})
rating = str2num(tokens{l}{2})
desired_array(uid, item_id) = rating;
end
end
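The two answers can be combined: the regex tokens from the loop above can feed sparse directly instead of filling a full matrix. The sketch below works on a cell array of strings standing in for the file contents (the example file from the first answer), so the variable txt and the way the lines are obtained are assumptions.

```matlab
% Stand-in for the lines read from the file
txt = {'1: (1,3),(2,4),(3,5)', '10: (1,1),(2,2)'};

rows = []; cols = []; vals = [];
for k = 1:numel(txt)
    % split "user_id: tuples" at the colon
    parts = regexp(txt{k}, '^\s*(\d+)\s*:(.*)$', 'tokens', 'once');
    uid = str2double(parts{1});
    % each token is an {item_id, rating} pair of strings
    tok = regexp(parts{2}, '\((\d+)\s*,\s*(\d+)\)', 'tokens');
    nums = str2double([tok{:}]);            % flat [item rating item rating ...]
    cols = [cols, nums(1:2:end)];           % item ids become column indices
    vals = [vals, nums(2:2:end)];           % ratings become the stored values
    rows = [rows, repmat(uid, 1, numel(nums)/2)];   % user id repeated per tuple
end
S = sparse(rows, cols, vals);               % user-by-item rating matrix
```

If the number of users and items is known in advance, passing them as the fourth and fifth arguments to sparse fixes the matrix size regardless of which ids occur in the file.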

3D cell arrays in matlab

I am currently working in matlab and have loaded a csv file into a cell array that I have named B. What I now wish to do is to put the information of B into a 3-D cell array, the 3rd dimension of the array being the first column of B, which contains strings ranging from "chr1" to "chr24". The full length of B is m, and the maximum length of any "chr" is maxlength. I doubt that this is the best way of going about it but here is my code:
for j = 1:m ,
Ind = findstr(B{1}{j}, 'chr');
Num = B{1}{j}(Ind+3:end-1);
cnum = str2num(Num);
for i = 1:24,
if cnum == i;
for k = 2:9 ,
for l = 1:maxlength ,
C{l}{k}{i} = B{k}{j};
C{l}{k}{i}
end
end
end
end
end
The 3-D array that comes out of this does not match the corresponding values in the initial array. I also want to know if this is the right way to create a 3-D array, I can't seem to find anything on the matlab website about them.
Thanks
There are a few possible issues with your approach. First of all, Matlab indexing is different from C-style indexing into tables: myCell{i}{j} is the j-th element of the cell array that is contained in the i-th element of the cell array myCell. If you want to index into a 2-D cell array, you get the contents of the element in row i, column j as myCell{i,j}.
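A small illustration of the difference between nested and multi-dimensional cell indexing, using made-up contents:

```matlab
nested = {{'a','b'}, {'c','d'}};   % 1-by-2 cell whose elements are themselves cells
flat   = {'a','b'; 'c','d'};       % a true 2-by-2 cell array

x = nested{2}{1};                  % first element of the second inner cell: 'c'
y = flat{2,1};                     % row 2, column 1 of the 2-D cell: 'c'

cube = cell(2,3,4);                % an empty 3-D cell array
cube{1,2,3} = 'chr3';              % row 1, column 2, page 3
```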
If the columns 2 through 9 of your .csv file contain all numeric data, it may be a lot more convenient to use either a 1D cell array with an entry for every chromosome, or to use a 2D or 3D numeric array if you get, for each chromosome, a single row, or a table, respectively.
Here's one way to do it
%# convert chromosomes to numbers
chromosomes = B{1};
chromosomes = strrep(chromosomes,'X','25');
chromosomes = strrep(chromosomes,'Y','26');
tmp = regexp(chromosomes,'chr(\d+)','tokens','once');
cnum = cellfun(@(x)str2double(x{1}),tmp);
%# catenate the rest of B into a 2D numeric array
allNumbers = cell2mat(cat(2,B{2:end}));
%# now we can make a table with [chromosomeNumber,allOtherNumbers]
finalTable = [cnum,allNumbers]
%# alternatively, if there are multiple entries for each chromosome, we can
%# group the data in a cell array, so that the i-th entry corresponds to chr.i
%# for readability, use a loop
outputCell = cell(26,1); %# assume 26 chromosomes
for i=1:26
outputCell{i} = allNumbers(cnum==i,:);
end
I've managed to do this with only two for loops, here is my code:
C = zeros(26,8,maxlength);
next = zeros(1,26);
for j = 1:m ,
Ind = findstr(B{1}{j}, 'chr');
Num = B{1}{j}(Ind+3:end-1);
cnum = str2num(Num);
if Num == 'X'
cnum = 25;
end
if Num == 'Y'
cnum = 26;
end
next(cnum) = next(cnum) + 1;
for k = 2:9 ,
D{cnum}{k-1}{next(cnum)} = B{k}{j};
C(cnum,k-1,next(cnum)) = str2num(B{k}{j});
end
end