How to shuffle the rows in a cell array - matlab

I have a cell array with x columns, each with a yx1 cell. I would like to randomize the "rows" within the columns. That is, for each yx1 cell with elements a_1, a_2, ... a_y, I would like to apply the same permutation to the indices of a_i.
I've got a function that does this,
function[Oarray] = shuffleCellArray(Iarray);
len = length(Iarray{1});
width = length(Iarray);
perm = randperm(len);
Oarray=cell(width, 0);
for i=1:width;
for j=1:len;
Oarray{i}{j}=Iarray{i}{perm(j)};
end;
end;
but as you can see it's a bit ugly. Is there a more natural way to do this?
I realize that I'm probably using the wrong data type, but for legacy reasons I'd like to avoid switching. But, if the answer is "switch" then I guess that's the answer.

I'm assuming you have a cell array of column vectors, such as
Iarray = {(1:5).' (10:10:50).' (100:100:500).'};
In that case, you could do it this way:
ind = randperm(numel(Iarray{1})); %// random permutation
Oarray = cellfun(@(x) x(ind), Iarray, 'UniformOutput', 0); %// apply that permutation
%// to each "column"
Or converting to an intermediate matrix and then back to a cell array:
ind = randperm(numel(Iarray{1})); %// random permutation
x = cat(2,Iarray{:}); %// convert to matrix
Oarray = mat2cell(x(ind,:), size(x,1), ones(1,size(x,2))); %// apply permutation to rows
%// and convert back
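One way to convince yourself that the rows stay aligned across columns is to stack the columns into a matrix before and after the shuffle and compare the rows. This is only a sketch on the example data above; the names M_in, M_out and rowsAligned are introduced here for illustration.

```matlab
% Example input from above: three column vectors of equal length
Iarray = {(1:5).' (10:10:50).' (100:100:500).'};

% One permutation shared by all columns
ind = randperm(numel(Iarray{1}));
Oarray = cellfun(@(x) x(ind), Iarray, 'UniformOutput', false);

% Stack the columns side by side so whole rows can be compared
M_in  = cat(2, Iarray{:});
M_out = cat(2, Oarray{:});

% The shuffled rows must be exactly the original rows in a new order
rowsAligned = isequal(sortrows(M_out), sortrows(M_in));
disp(rowsAligned)
```

Because x(ind) is ordinary parenthesis indexing, the same cellfun call also works when each column is itself a yx1 cell, as in the question.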

Related

Random permutation of each cell in a cell array

I have a 1-by-4 cell array, D. Each cell element contains a 2-by-2 double matrix. I want to apply a random permutation to each matrix independently, so that the result is a cell array of the same size as D whose matrices have permuted elements, and then invert the permutation to recover the original D.
For the single-matrix case I have code that works well:
A=rand(3,3)
p=randperm(numel(A));
A(:)=A(p)
[p1,ind]=sort(p);
A(:)=A(ind)
but it doesn't work for a cell array.
The simplest solution for you is to use a loop:
nd = numel(D);
D_permuted{1,nd} = [];
D_ind{1,nd} = [];
for d = 1:nd
A=D{d};
p=randperm(numel(A));
A(:)=A(p)
[~,ind]=sort(p);
D_permuted{d} = A;
D_ind{d} = ind;
end
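To get the original D back, the stored inverse indices can be applied to each permuted matrix in the same way. A minimal sketch, assuming a 1-by-4 cell of 2-by-2 matrices as in the question; D_restored and B are names made up for this example.

```matlab
% Example data: 1-by-4 cell array of 2-by-2 matrices
D = {rand(2), rand(2), rand(2), rand(2)};
nd = numel(D);
D_permuted = cell(1, nd);
D_ind = cell(1, nd);

% Shuffle each matrix independently and remember the inverse permutation
for d = 1:nd
    A = D{d};
    p = randperm(numel(A));
    A(:) = A(p);            % element k now holds the old element p(k)
    [~, ind] = sort(p);     % ind satisfies p(ind) == 1:numel(A)
    D_permuted{d} = A;
    D_ind{d} = ind;
end

% Undo the shuffle with the stored inverse indices
D_restored = cell(1, nd);
for d = 1:nd
    B = D_permuted{d};
    B(:) = B(D_ind{d});
    D_restored{d} = B;
end
disp(isequal(D_restored, D))
```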
Assuming your D matrix is just a list of identically sized (e.g. 2-by-2) matrices, then you could avoid the loop by using a 3D double matrix instead of the cell-array.
For example, if you had a D like this:
n = 5;
D = repmat([1,3;2,4],1,1,n)*10 %// Example data
Then you can do the permutation like this
m = 2*2; %// Here m is the product of the dimensions of each matrix you want to shuffle
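[~,I] = sort(rand(m,n)); %// Sorting random numbers column-wise gives one random permutation of 1..m per column; a vectorized stand-in for randperm, which only accepts scalars
idx = bsxfun(@plus, I, m*(0:n-1)); %// Offset column k by m*(k-1) so its indices address slice k of D rather than slice 1
D_shuffled = reshape(D(idx),2,2,n);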
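The 3D variant can be inverted as well, since sorting the permutation matrix column-wise yields the inverse permutation of every slice at once. The sketch below uses distinct values in each slice so that any mixing between slices would show up; the per-slice offset and the names offset, Iinv and D_restored are assumptions of this example.

```matlab
n = 5;
m = 2*2;
D = reshape(1:m*n, 2, 2, n);          % distinct values in every slice

[~, I] = sort(rand(m, n));            % column k is a random permutation of 1..m
offset = m * (0:n-1);                 % linear-index offset of each slice
idx = bsxfun(@plus, I, offset);       % m-by-n linear indices into D
D_shuffled = reshape(D(idx), 2, 2, n);

[~, Iinv] = sort(I);                  % inverse permutation, column by column
idxInv = bsxfun(@plus, Iinv, offset);
D_restored = reshape(D_shuffled(idxInv), 2, 2, n);
disp(isequal(D_restored, D))
```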

How to find indices of the largest N elements in a cell?

In Matlab, I know that I can use this to get the largest number in each cell.
cell_max = cellfun(@(x) max(x(:)), the_cell);
However, there are two problems with this. First, I need the index of the maximum values as well. Second, I need not the single largest value of each cell, but its N largest values.
Is this possible with cells in Matlab?
Update: I have a pixel matrix that I get by running a filter on some input image file. From that matrix, I then split this matrix into tiles and want to keep only the N largest values per tile, while all other entries should be set to zero. (So I don't need the indices in the end, but they would allow me to create a new empty cell and copy over the large values.)
Tiles = mat2tiles(FilterResult, tileSize, tileSize);
If there is an easier way for my use case than using the mat2tiles script, I'd be grateful to know.
The routine cellfun can return the multiple arguments of the function you're passing in (see the documentation). So, assuming each cell contains a numeric vector of values, you can obtain the N largest elements of each cell like this:
% Using random data for the_cell
the_cell{1, 1} = rand(1, 12);
the_cell{1, 2} = rand(1, 42);
the_cell{2, 1} = rand(1, 18);
the_cell{2, 2} = rand(1, 67);
% First sort elements in each cell in descending order and keep indices
[s, i] = cellfun(@(x)sort(x(:), 'descend'), the_cell, 'UniformOutput', false);
% Then, in each resulting `s` and `i` cell arrays,
% take only the `N` first elements and indices
N = 4;
NLargestValues = cellfun(@(x)x(1:N), s, 'UniformOutput', false);
NLargestIndices = cellfun(@(x)x(1:N), i, 'UniformOutput', false);
NB: UniformOutput is set to false because outputs are not scalar.
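With the indices in hand, a zero-filled copy of each cell that keeps only its N largest entries can be built as follows. This is a rough sketch on random data; keepIdx, result and out are illustrative names, and ties are ignored.

```matlab
% Random example data, one numeric vector per cell
the_cell = {rand(1, 12), rand(1, 42); rand(1, 18), rand(1, 67)};
N = 4;

% Indices of the elements in descending order, then keep the first N
[~, idx] = cellfun(@(x) sort(x(:), 'descend'), the_cell, 'UniformOutput', false);
keepIdx = cellfun(@(k) k(1:N), idx, 'UniformOutput', false);

% Copy only the N largest entries into zero-filled arrays of the same size
result = cell(size(the_cell));
for c = 1:numel(the_cell)
    out = zeros(size(the_cell{c}));
    out(keepIdx{c}) = the_cell{c}(keepIdx{c});
    result{c} = out;
end
disp(cellfun(@nnz, result))
```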
Update
Based on your update and our comments, you can put all the operations in a tileOperations function:
% Operation to perform for each tile in the_cell array
function [tile] = tileOperations(tile, N)
%[
% Return unique sorted values of the tile (ascending)
v = unique(tile(:));
% Index of the N-th largest unique value
i = length(v) - N + 1;
if (i <= 1), return; end % Quick exit: N or fewer distinct values, nothing to zero
% Set elements below threshold to zero
threshold = v(i);
tile(tile < threshold) = 0.0;
%]
end
You can then call cellfun only once to apply the operations to every tile in the_cell:
filteredTiles = cellfun(@(x)tileOperations(x, N), the_cell, 'UniformOutput', false);

Downsampling cell array elements, Matlab

Given a cell array of n elements (n > 1), each element being a 2-d array with x=k number of rows and y columns (variable across cell elements), what would be the best way to down-sample each cell element by randomly removing samples in the y-dim to match the shortest y length across all cell elements?
The snippet below is a mis-implementation and only for n=2, but goes in the right direction (I hope). Any help would be greatly appreciated, thanks!
sizeShortest = min(cellfun('size', data, 2));
sizeLongest = max(cellfun('size', data, 2));
idx = randperm(sizeLongest);
data = cellfun(@(x) x(:,idx(1:sizeShortest)), data, 'UniformOutput', false);
I guess I could use a for loop to go through each cell of the data array and check whether this element has a y length longer than the shortest y of all cells and randomly remove samples. But there's probably a better solution..
Thanks!
This does what you want:
sizeShortest = min(cellfun('size', data, 2));
sizeLongest = max(cellfun('size', data, 2));
f=@(x)(x(:,sort(getfield(randperm(size(x,2)),{1:sizeShortest}))));
data = cellfun(f, data, 'UniformOutput', false);
To explain:
Generate the permutation up to each array's own width, not up to sizeLongest. Otherwise the narrower arrays get an index-out-of-bounds error:
g=randperm(size(x,2))
getfield is used because an anonymous function cannot index the result of a function call directly. It stands in for:
g(1:sizeShortest)
which selects the first sizeShortest indices. sort then puts the selected indices in ascending order, so the kept columns retain their original order, and finally those indices pick out the columns:
x(:,sort(...))
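As a possible simplification, assuming a MATLAB release whose randperm accepts a second argument (the number of indices to draw), the getfield step can be dropped. The sketch below runs on made-up data; the variable out and the sample sizes are chosen only for illustration.

```matlab
% Hypothetical input: same number of rows, different numbers of columns
data = {rand(3, 7), rand(3, 4), rand(3, 10)};

sizeShortest = min(cellfun('size', data, 2));

% randperm(n, k) draws k distinct indices from 1..n;
% sort keeps the surviving columns in their original order
f = @(x) x(:, sort(randperm(size(x, 2), sizeShortest)));
out = cellfun(f, data, 'UniformOutput', false);

disp(cellfun('size', out, 2))
```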
Assuming a column cell array of numeric arrays, you may try this -
%// c1 is input cell array
k = size(c1{1},1)
t1 = cellfun(@size,c1,'uni',0)
t2 = cellfun(@numel,c1)./k
mincols = min(t2)
m1 = (t2-1)./(mincols-1)
p1 = round(bsxfun(@times,0:mincols-1,m1)+1)
p2 = [0; cumsum(t2(1:end-1))]
p3 = reshape(bsxfun(@plus,p1,p2)',[],1) %//'
ha1 = horzcat(c1{:})
g1 = reshape(ha1(:,p3),k,mincols,[])
g2 = reshape(permute(g1,[1 3 2]),size(g1,1)*size(g1,3),[])
out = mat2cell(g2,k*ones(1,numel(c1)),mincols) %// desired downsampled output cell array

Matlab Mean over same-indexed elements across cells

I have a cell array of 53 different (40,000 x 2000) sparse matrices. I need to take the mean over the third dimension, so that for example element (2,5) is averaged across the 53 cells. This should yield a single (33,000 x 2016) output. I think there ought to be a way to do this with cellfun(), but I am not able to write a function that works across cells on the same within-cell indices.
You can convert each sparse matrix to the indices and values of its nonzero entries, and then use sparse to automatically obtain the sum in sparse form:
myCell = {sparse([0 1; 2 0]), sparse([3 0; 4 0])}; %// example
C = numel(myCell);
M = cell(1,C); %// preallocate
N = cell(1,C);
V = cell(1,C);
for c = 1:C
[m n v] = find(myCell{c}); %// rows, columns and values of nonzero entries
M{c} = m.';
N{c} = n.';
V{c} = v.';
end
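[nr, nc] = size(myCell{1}); %// pass the size explicitly so trailing all-zero rows/columns are kept
result = sparse([M{:}],[N{:}],[V{:}],nr,nc)/C; %'// "sparse" sums over repeated indices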
This should do the trick, just initialize an empty array and sum over each element of the cell array. I don't see any way around using a for loop without concatenating it into one giant 3D array (which will almost definitely run out of memory)
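running_sum=sparse(size(cell_arr{1},1), size(cell_arr{1},2)); % all-zero sparse accumulator; zeros(...) would allocate a full dense matrix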
for i=1:length(cell_arr)
running_sum=running_sum+cell_arr{i};
end
means = running_sum./length(cell_arr);
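On a small example the two approaches can be compared directly. This is just a sanity-check sketch with the two 2-by-2 matrices used above; meanLoop and meanAccum are names invented here, and the matrix size is passed to sparse explicitly as a precaution.

```matlab
myCell = {sparse([0 1; 2 0]), sparse([3 0; 4 0])};
C = numel(myCell);
[nr, nc] = size(myCell{1});

% Approach 1: running sum in a sparse accumulator
running_sum = sparse(nr, nc);
for c = 1:C
    running_sum = running_sum + myCell{c};
end
meanLoop = running_sum / C;

% Approach 2: collect (row, column, value) triplets and let sparse add duplicates
M = cell(1, C);  N = cell(1, C);  V = cell(1, C);
for c = 1:C
    [m, n, v] = find(myCell{c});
    M{c} = m.';  N{c} = n.';  V{c} = v.';
end
meanAccum = sparse([M{:}], [N{:}], [V{:}], nr, nc) / C;

disp(full(meanLoop))
```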

matlab parse file into cell array

I have a file in the following format in matlab:
user_id_a: (item_1,rating),(item_2,rating),...(item_n,rating)
user_id_b: (item_25,rating),(item_50,rating),...(item_x,rating)
....
....
so each line has values separated by a colon where the value to the left of the colon is a number representing user_id and the values to the right are tuples of item_ids (also numbers) and rating (numbers not floats).
I would like to read this data into a matlab cell array or better yet ultimately convert it into a sparse matrix wherein the user_id represents the row index, and the item_id represents the column index and store the corresponding rating in that array index. (This would work as I know a-priori the number of users and items in my universe so ids cannot be greater than that ).
Any help would be appreciated.
I have thus far tried the textscan function as follows:
c = textscan(f,'%d %s','delimiter',':') %this creates two cells one with all the user_ids
%and another with all the remaining string values.
Now if I try to do something like str2mat(c{2}), it works but it stores the '(' and ')' characters also in the matrix. I would like to store a sparse matrix in the fashion that I described above.
I am fairly new to matlab and would appreciate any help regarding this matter.
f = fopen('data.txt','rt'); %// data file. Open as text ('t')
str = textscan(f,'%s'); %// gives a cell which contains a cell array of strings
str = str{1}; %// cell array of strings
r = str(1:2:end);
r = cellfun(@(s) str2num(s(1:end-1)), r); %// rows; numeric vector
pairs = str(2:2:end);
pairs = regexprep(pairs,'[(,)]',' ');
pairs = cellfun(@(s) str2num(s(1:end-1)), pairs, 'uni', 0);
%// pairs; cell array of numeric vectors
cols = cellfun(@(x) x(1:2:end), pairs, 'uni', 0);
%// columns; cell array of numeric vectors
vals = cellfun(@(x) x(2:2:end), pairs, 'uni', 0);
%// values; cell array of numeric vectors
rows = arrayfun(@(n) repmat(r(n),1,numel(cols{n})), 1:numel(r), 'uni', 0);
%// rows repeated to match cols; cell array of numeric vectors
matrix = sparse([rows{:}], [cols{:}], [vals{:}]);
%// concat rows, cols and vals into vectors and use as inputs to sparse
For the example file
1: (1,3),(2,4),(3,5)
10: (1,1),(2,2)
this gives the following sparse matrix:
matrix =
(1,1) 3
(10,1) 1
(1,2) 4
(10,2) 2
(1,3) 5
I think newer versions of Matlab have a strsplit function that makes this approach overkill, but the following works, if not quickly. It splits the file into user ids and "other stuff" as you show, initializes a large empty matrix, and then iterates through the other stuff, breaking it apart and placing each value in the correct place in the matrix.
(I didn't see the previous answer when I opened this for some reason. It is more sophisticated than this one, though this may be a little easier to follow at the expense of speed.) I throw the \s* into the regex in case the spacing is inconsistent, but otherwise don't do much in the way of data sanity checking. The output is the full array, which you can then turn into a sparse array if desired.
% matlab_test.txt:
% 101: (1,42),(2,65),(5,0)
% 102: (25,78),(50,12),(6,143),(2,123)
% 103: (23,6),(56,3)
clear all;
fclose('all');
% your path will vary, of course
file = '<path>/matlab_test.txt';
f = fopen(file);
c = textscan(f,'%d %s','delimiter',':');
celldisp(c)
uids = c{1}
tuples = c{2}
% These are stated as known
num_users = 3;
num_items = 40;
desired_array = zeros(num_users, num_items);
expression = '\((\d+)\s*,\s*(\d+)\)'
% Assuming length(tuples) == num_users for simplicity
for k = 1:num_users
uid = uids(k)
tokens = regexp(tuples{k}, expression, 'tokens');
for l = 1:length(tokens)
item_id = str2num(tokens{l}{1})
rating = str2num(tokens{l}{2})
desired_array(uid, item_id) = rating;
end
end
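If the sparse matrix is the real goal, the same regular expression can feed row, column and value triplets straight into sparse, without building the full array first. The sketch below parses the sample lines from an in-memory cell array instead of a file so that it runs on its own; txt, parts and S are names chosen for this example.

```matlab
% Sample lines in the format from the question (held in memory, not read from a file)
txt = {'101: (1,42),(2,65),(5,0)', ...
       '102: (25,78),(50,12),(6,143),(2,123)', ...
       '103: (23,6),(56,3)'};

rows = [];  cols = [];  vals = [];
for k = 1:numel(txt)
    % Split "uid: rest" into the user id and the list of tuples
    parts = regexp(txt{k}, '^\s*(\d+)\s*:(.*)$', 'tokens', 'once');
    uid = str2double(parts{1});
    tokens = regexp(parts{2}, '\((\d+)\s*,\s*(\d+)\)', 'tokens');
    for t = 1:numel(tokens)
        rows(end+1) = uid;                        %#ok<SAGROW>
        cols(end+1) = str2double(tokens{t}{1});   %#ok<SAGROW>
        vals(end+1) = str2double(tokens{t}{2});   %#ok<SAGROW>
    end
end

% user_id is the row index, item_id is the column index
S = sparse(rows, cols, vals);
disp(size(S))
```

If the number of users and items is known in advance, passing them as the fourth and fifth arguments to sparse fixes the matrix size regardless of which ids actually appear in the data.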