3D cell arrays in matlab - matlab

I am currently working using matlab, I have uploaded a csv file into a cell array that I have named B. What I now wish to do is to input the information of B into a 3-D cell array, the 3rd dimension of the array being the first column of B which are strings ranging from "chr1" to "chr24". The full length of B is m, and the maximum length of any "chr" is maxlength. I doubt that this is the best way of going about it but here is my code:
for j = 1:m ,
Ind = findstr(B{1}{j}, 'chr');
Num = B{1}{j}(Ind+3:end-1);
cnum = str2num(Num);
for i = 1:24,
if cnum == i;
for k = 2:9 ,
for l = 1:maxlength ,
C{l}{k}{i} = B{k}{j};
C{l}{k}{i}
end
end
end
end
end
The 3-D array that comes out of this does not match the corresponding values in the initial array. I also want to know if this is the right way to create a 3-D array, I can't seem to find anything on the matlab website about them.
Thanks

There are a few possible issues with your approach: First of all, Matlab indexing is different from c-style indexing into tables. myCell{i}{j} is the j-th element of the cell array that is contained in the i-th element of the cell array myCell. If you want to index into a 2-d cell array, you would get the contents of the element in row i, column j as myCell{i,j}.
If the columns 2 through 9 of your .csv file contain all numeric data, it may be a lot more convenient to use either a 1D cell array with an entry for every chromosome, or to use a 2D or 3D numeric array if you get, for each chromosome, a single row, or a table, respectively.
Here's one way to do it
%# convert chromosomes to numbers
chromosomes = B{1};
chromosomes = strrep(chromosomes,'X',25);
chromosomes = strrep(chromosomes,'Y',26);
tmp = regexp(chromsomes,'chr(\d+)','tokens','once');
cnum = cellfun(#(x)str2double(x{1}),tmp);
%# catenate the rest of B into a 2D cell array
allNumbers = cell2mat(cat(2,B{2:end}));
%# now we can make a table with [chromosomeNumber,allOtherNumbers]
finalTable = [chromosomeNumber,allNumbers]
%# alternatively, if there are multiple entries for each chromosome, we can
%# group the data in a cell array, so that the i-th entry corresponds to chr.i
%# for readability, use a loop
outputCell = cell(26,1); %# assume 26 chromosomes
for i=1:26
outputCell{i} = allNumbers(cnum==i,:);
end

I've managed to do this with only two for loops, here is my code:
C = zeros(26,8,maxlength);
next = zeros(1,26);
for j = 1:m ,
Ind = findstr(B{1}{j}, 'chr');
Num = B{1}{j}(Ind+3:end-1);
cnum = str2num(Num);
if Num == 'X'
cnum = 25;
end
if Num == 'Y'
cnum = 26;
end
next(cnum) = next(cnum) + 1;
for k = 2:9 ,
D{cnum}{k-1}{next(cnum)} = B{k}{j};
C(cnum,k-1,next(cnum)) = str2num(B{k}{j});
end
end

Related

accessing nth column of a cell array in matlab

I have a cell array with for example 3 cells, in which cells are (3,8), (3,2), (3, 30) matrices, now I want to access the nth column of whole data without converting my cell to matrix, for example, if I search for 8th column, it must be the second column of 3th cell. one way is to convert it into a matrix, but my cell is too long and it gives me out of memory when I try to convert the whole cell to the matrix. then I tried this the code below, but it doesn't work correctly. i want to know what i'm doing wrong.
any help is appreciated.
function [col,i,idx] = find_cellCol(cel, idx)
lgh = length(cel);
i = 1;
me = zeros(2,length(cel));
while( i <= lgh && length(cel{1,i})<=idx)
idx = idx - length(cel{1,i});
i = i+1;
end%end while
if idx == 0
col = cel{1,i-1}(:,end);
else
col = cel{1,i}(:,idx);
end
end
Get only the number of line of each matrix of each cell, then sum those number of line and check on wich cell you reach the 8th line.
%dummy data
x{1} = rand(3,8);
x{2} = rand(3,2);
x{3} = rand(3,20);
val = 8;
csize = cellfun(#(x) size(x,1),x); %get the number of line for each cell
csum = cumsum(csize); % [3,6,9]
ind = find(csum>=val,1); % on which cell do we reach the # line
x{ind}((val-csum(ind))+csize(ind),:) %access the right line
fprintf('Accessing the line %d of the cell %d',(val-csum(ind))+csize(ind),ind)
Which will return:
Accessing the line 2 of the cell 3
EDIT:
The given example mislead me since I was sure that you were trying to access a line (first dimension) and not a column (2nd dimension).
But if you want to access a column you can simply adjust the code above:
val = 8;
csize = cellfun(#(x) size(x,2),x); %get the size of the second dimension now.
csum = cumsum(csize);
ind = find(csum>=val,1);
x{ind}(:,(val-csum(ind))+csize(ind)) %access the right column

Loop to create cell array

I have a structure named data. The structure has 250 elements and one field called codes (whose dimension varies).
As an example: data(1).codes is a 300 x 1 cell of strings and data(2).codes is a 100 x 1 cell of strings.
What I am trying to do is to create a big cell with three columns: id count codes where id indexes the element number (1 to 250), count indexes the row of the string and codes are just the codes.
An example to make it clear:
for k = 1:size(data,2)
id = repmat(k,size(data(k).codes,1),1);
count = linspace(1, size(data(k).codes,1), size(data(k).codes,1))';
codes= data(k).codes;
end
The loop above creates the columns I want. Now I just need to append them one below the other and then save to excel. If these where only numbers I knew how to concatenate/append matrices. But with cells I am unsure how to do it.
Here is what I have tried:
output = {};
for k = 1:size(data,2)
id = repmat(k,size(data(k).codes,1),1);
count = linspace(1, size(data(k).codes,1), size(data(k).codes,1))';
codes= data(k).codes;
output{1,1} = {output{1,1}; id};
output{1,2} = {output{1,2}; count};
output{1,3} = {output{1,3};
end
Build up your output into a new cell array, allowing for pre-allocation, then concatenate all of your results.
% Initialise
output = cell(size(data,2), 1);
% Create output for each element of data
for k = 1:size(data,2)
id = repmat(k,size(data(k).codes,1),1);
count = linspace(1, size(data(k).codes,1), size(data(k).codes,1))';
codes = data(k).codes;
% add to output
output{k} = [id, count, codes];
end
% Vertically concatenate all cell elements
output = vertcat(output{:});
Note: this assumes codes is numerical, and the output will be a numerical matrix. If it isn't, you will need to do some cell conversions for your numerical data (id and count) like so:
id = repmat({k}, size(data(k).codes,1), 1);
count = num2cell(linspace( ... )');

Vectorize MATLAB code

Let's say we have three m-by-n matrices of equal size: A, B, C.
Every column in C represents a time series.
A is the running maximum (over a fixed window length) of each time series in C.
B is the running minimum (over a fixed window length) of each time series in C.
Is there a way to determine T in a vectorized way?
[nrows, ncols] = size(A);
T = zeros(nrows, ncols);
for row = 2:nrows %loop over the rows (except row #1).
for col = 1:ncols %loop over the columns.
if C(row, col) > A(row-1, col)
T(row, col) = 1;
elseif C(row, col) < B(row-1, col)
T(row, col) = -1;
else
T(row, col) = T(row-1, col);
end
end
end
This is what I've come up with so far:
T = zeros(m, n);
T(C > circshift(A,1)) = 1;
T(C < circshift(B,1)) = -1;
Well, the trouble was the dependency with the ELSE part of the conditional statement. So, after a long mental work-out, here's a way I summed up to vectorize the hell-outta everything.
Now, this approach is based on mapping. We get column-wise runs or islands of 1s corresponding to the 2D mask for the ELSE part and assign them the same tags. Then, we go to the start-1 along each column of each such run and store that value. Finally, indexing into each such start-1 with those tagged numbers, which would work as mapping indices would give us all the elements that are to be set in the new output.
Here's the implementation to fulfill all those aspirations -
%// Store sizes
[m1,n1] = size(A);
%// Masks corresponding to three conditions
mask1 = C(2:nrows,:) > A(1:nrows-1,:);
mask2 = C(2:nrows,:) < B(1:nrows-1,:);
mask3 = ~(mask1 | mask2);
%// All but mask3 set values as output
out = [zeros(1,n1) ; mask1 + (-1*(~mask1 & mask2))];
%// Proceed if any element in mask3 is set
if any(mask3(:))
%// Row vectors for appending onto matrices for matching up sizes
mask_appd = false(1,n1);
row_appd = zeros(1,n1);
%// Get 2D mapped indices
df = diff([mask_appd ; mask3],[],1)==1;
cdf = cumsum(df,1);
offset = cumsum([0 max(cdf(:,1:end-1),[],1)]);
map_idx = bsxfun(#plus,cdf,offset);
map_idx(map_idx==0) = 1;
%// Extract the values to be used for setting into new places
A1 = out([df ; false(1,n1)]);
%// Map with the indices obtained earlier and set at places from mask3
newval = [row_appd ; A1(map_idx)];
mask3_appd = [mask_appd ; mask3];
out(mask3_appd) = newval(mask3_appd);
end
Doing this vectorized is rather difficult because the current row's output depends on the previous row's output. Doing vectorized operations usually means that each element should stand out on its own using some relationship that is independent of the other elements that surround it.
I don't have any input on how you would achieve this without a for loop but I can help you reduce your operations down to one instead of two. You can do the assignment vectorized per row, but I can't see how you'd do it all in one shot.
As such, try something like this instead:
[nrows, ncols] = size(A);
T = zeros(nrows, ncols);
for row = 2:nrows
out = T(row-1,:); %// Change - Make a copy of the previous row
out(C(row,:) > A(row-1,:)) = 1; %// Set those elements of C
%// in the current row that are larger
%// than the previous row of A to 1
out(C(row,:) < B(row-1,:)) = -1; %// Same logic but for B now and it's
%// less than and the value is -1 instead
T(row,:) = out; %// Assign to the output
end
I'm currently figuring out how to do this with any loops whatsoever. I'll keep you posted.

How to store results created into a cell array during a multiple for loop into a 3d array? In Matlab

In Matlab, at the end of three different for loops (for a=1:240,b=1:5 and c=1:3), I generate a {1,3} cell array where each cell contains a (1,5) array that reports only the last result of the 240 iterations.
How can I generate, apart of this cell array, a (240,5,3) 3d array that stores the result of each iteration?
Or, equivalently, a cell array that stores again the information and then convert it into a (240,5,3) 3d array?
The code would be along the lines of:
%// Size of the problem
Na = 240;
Nb = 5;
Nc = 3;
%// Allocate empty cell array
result = cell(Na, Nb, Nc);
%// Loop
for a = 1:Na
for b = 1:Nb
for c = 1:Nc
%// Here is the code for computing the
%// result x of the last iteration.
result{a,b,c} = x;
end;
end;
end;

matlab parse file into cell array

I have a file in the following format in matlab:
user_id_a: (item_1,rating),(item_2,rating),...(item_n,rating)
user_id_b: (item_25,rating),(item_50,rating),...(item_x,rating)
....
....
so each line has values separated by a colon where the value to the left of the colon is a number representing user_id and the values to the right are tuples of item_ids (also numbers) and rating (numbers not floats).
I would like to read this data into a matlab cell array or better yet ultimately convert it into a sparse matrix wherein the user_id represents the row index, and the item_id represents the column index and store the corresponding rating in that array index. (This would work as I know a-priori the number of users and items in my universe so ids cannot be greater than that ).
Any help would be appreciated.
I have thus far tried the textscan function as follows:
c = textscan(f,'%d %s','delimiter',':') %this creates two cells one with all the user_ids
%and another with all the remaining string values.
Now if I try to do something like str2mat(c{2}), it works but it stores the '(' and ')' characters also in the matrix. I would like to store a sparse matrix in the fashion that I described above.
I am fairly new to matlab and would appreciate any help regarding this matter.
f = fopen('data.txt','rt'); %// data file. Open as text ('t')
str = textscan(f,'%s'); %// gives a cell which contains a cell array of strings
str = str{1}; %// cell array of strings
r = str(1:2:end);
r = cellfun(#(s) str2num(s(1:end-1)), r); %// rows; numeric vector
pairs = str(2:2:end);
pairs = regexprep(pairs,'[(,)]',' ');
pairs = cellfun(#(s) str2num(s(1:end-1)), pairs, 'uni', 0);
%// pairs; cell array of numeric vectors
cols = cellfun(#(x) x(1:2:end), pairs, 'uni', 0);
%// columns; cell array of numeric vectors
vals = cellfun(#(x) x(2:2:end), pairs, 'uni', 0);
%// values; cell array of numeric vectors
rows = arrayfun(#(n) repmat(r(n),1,numel(cols{n})), 1:numel(r), 'uni', 0);
%// rows repeated to match cols; cell array of numeric vectors
matrix = sparse([rows{:}], [cols{:}], [vals{:}]);
%// concat rows, cols and vals into vectors and use as inputs to sparse
For the example file
1: (1,3),(2,4),(3,5)
10: (1,1),(2,2)
this gives the following sparse matrix:
matrix =
(1,1) 3
(10,1) 1
(1,2) 4
(10,2) 2
(1,3) 5
I think newer versions of Matlab have a stringsplit function that makes this approach overkill, but the following works, if not quickly. It splits the file into userid's and "other stuff" as you show, initializes a large empty matrix, and then iterates through the other stuff, breaking it apart and placing in the correct place in the matrix.
(I Didn't see the previous answer when I opened this for some reason - it is more sophisticated than this one, though this may be a little easier to follow at the expense of slowness). I throw in the \s* into the regex in case the spacing is inconsistent, but otherwise don't perform much in the way of data-sanity-checking. Output is the full array, that you can then turn into a sparse array if desired.
% matlab_test.txt:
% 101: (1,42),(2,65),(5,0)
% 102: (25,78),(50,12),(6,143),(2,123)
% 103: (23,6),(56,3)
clear all;
fclose('all');
% your path will vary, of course
file = '<path>/matlab_test.txt';
f = fopen(file);
c = textscan(f,'%d %s','delimiter',':');
celldisp(c)
uids = c{1}
tuples = c{2}
% These are stated as known
num_users = 3;
num_items = 40;
desired_array = zeros(num_users, num_items);
expression = '\((\d+)\s*,\s*(\d+)\)'
% Assuming length(tuples) == num_users for simplicity
for k = 1:num_users
uid = uids(k)
tokens = regexp(tuples{k}, expression, 'tokens');
for l = 1:length(tokens)
item_id = str2num(tokens{l}{1})
rating = str2num(tokens{l}{2})
desired_array(uid, item_id) = rating;
end
end