matlab parse file into cell array - matlab

I have a file in the following format in matlab:
user_id_a: (item_1,rating),(item_2,rating),...(item_n,rating)
user_id_b: (item_25,rating),(item_50,rating),...(item_x,rating)
....
....
so each line has values separated by a colon where the value to the left of the colon is a number representing user_id and the values to the right are tuples of item_ids (also numbers) and rating (numbers not floats).
I would like to read this data into a matlab cell array or better yet ultimately convert it into a sparse matrix wherein the user_id represents the row index, and the item_id represents the column index and store the corresponding rating in that array index. (This would work as I know a-priori the number of users and items in my universe so ids cannot be greater than that ).
Any help would be appreciated.
I have thus far tried the textscan function as follows:
c = textscan(f,'%d %s','delimiter',':') %this creates two cells one with all the user_ids
%and another with all the remaining string values.
Now if I try to do something like str2mat(c{2}), it works but it stores the '(' and ')' characters also in the matrix. I would like to store a sparse matrix in the fashion that I described above.
I am fairly new to matlab and would appreciate any help regarding this matter.

f = fopen('data.txt','rt'); %// data file. Open as text ('t')
str = textscan(f,'%s'); %// gives a cell which contains a cell array of strings
str = str{1}; %// cell array of strings
r = str(1:2:end);
r = cellfun(#(s) str2num(s(1:end-1)), r); %// rows; numeric vector
pairs = str(2:2:end);
pairs = regexprep(pairs,'[(,)]',' ');
pairs = cellfun(#(s) str2num(s(1:end-1)), pairs, 'uni', 0);
%// pairs; cell array of numeric vectors
cols = cellfun(#(x) x(1:2:end), pairs, 'uni', 0);
%// columns; cell array of numeric vectors
vals = cellfun(#(x) x(2:2:end), pairs, 'uni', 0);
%// values; cell array of numeric vectors
rows = arrayfun(#(n) repmat(r(n),1,numel(cols{n})), 1:numel(r), 'uni', 0);
%// rows repeated to match cols; cell array of numeric vectors
matrix = sparse([rows{:}], [cols{:}], [vals{:}]);
%// concat rows, cols and vals into vectors and use as inputs to sparse
For the example file
1: (1,3),(2,4),(3,5)
10: (1,1),(2,2)
this gives the following sparse matrix:
matrix =
(1,1) 3
(10,1) 1
(1,2) 4
(10,2) 2
(1,3) 5

I think newer versions of Matlab have a stringsplit function that makes this approach overkill, but the following works, if not quickly. It splits the file into userid's and "other stuff" as you show, initializes a large empty matrix, and then iterates through the other stuff, breaking it apart and placing in the correct place in the matrix.
(I Didn't see the previous answer when I opened this for some reason - it is more sophisticated than this one, though this may be a little easier to follow at the expense of slowness). I throw in the \s* into the regex in case the spacing is inconsistent, but otherwise don't perform much in the way of data-sanity-checking. Output is the full array, that you can then turn into a sparse array if desired.
% matlab_test.txt:
% 101: (1,42),(2,65),(5,0)
% 102: (25,78),(50,12),(6,143),(2,123)
% 103: (23,6),(56,3)
clear all;
fclose('all');
% your path will vary, of course
file = '<path>/matlab_test.txt';
f = fopen(file);
c = textscan(f,'%d %s','delimiter',':');
celldisp(c)
uids = c{1}
tuples = c{2}
% These are stated as known
num_users = 3;
num_items = 40;
desired_array = zeros(num_users, num_items);
expression = '\((\d+)\s*,\s*(\d+)\)'
% Assuming length(tuples) == num_users for simplicity
for k = 1:num_users
uid = uids(k)
tokens = regexp(tuples{k}, expression, 'tokens');
for l = 1:length(tokens)
item_id = str2num(tokens{l}{1})
rating = str2num(tokens{l}{2})
desired_array(uid, item_id) = rating;
end
end

Related

How to sort all elements of a 2d matrix in MATLAB?

The sort() function sorts the elements row/column wise but how to sort the elements absolutely? The result should be another matrix with smallest element in (1,1) , second smallest in (1,2) and so on.
Take some random input
input = rand(5,10);
If you want the output to be a row vector, simply use
sortedRow = sort(input(:)).';
If you want the result to be the same shape as the input, then use
sortedOriginalShape = reshape(sort(input(:)), size(input,2), size(input,1)).';
Note that when maintaining the shape, we must use the reversed size dimensions and then transpose. This is because otherwise the result is column-wise i.e. smallest element in (1,1), next in (2,1) etc, which is the opposite of what you requested.
You can use the column operator (:) to vectorize all elements of 'nxm' matrix as a vector of 'nxm' elements and sort this vector. Then you can use direct assignement or 'reshape' function to store elements as matricial form.
All you need to know is that matlab use column-major-ordering to vectorize/iterate elements:
A = rand(3, 5);
A(:) = sort(A(:);
Will preserve colum-major-ordering, or as you said you prefer row-major ordering:
A = rand(3, 5);
A = reshape(sort(A(:)), fliplr(size(A)).';
Note the fliplr to store columnwise with reversed dimension and then the .' operator to transpose again the result.
EDIT
Even if matlab uses column-major-ordering for storing elements in memory, here below are two generic routines to work with row-major-order whatever the number of dimension of your array (i.e. no limited to 2D):
function [vector] = VectorizeWithRowMajorOrdering(array)
%[
axisCount = length(size(array)); % Squeezed size of original array
permutation = fliplr(1:(axisCount + 2)); % +2 ==> Trick to vectorize data in correct order
vector = permute(array, permutation);
vector = vector(:);
%]
end
function [array] = ReshapeFromRowMajorOrdering(vector, siz)
%[
siz = [siz( : ).' 1]; % Fix size if only one dim
array = NaN(siz); % Init
axisCount = length(size(array)); % Squeezed size!
permutation = fliplr(1:(axisCount + 2)); % +2 ==> Trick to vectorize data in correct order
array = reshape(vector, [1 1 fliplr(size(array))]);
array = ipermute(array, permutation);
%]
end
This can be useful when working with data coming from C/C++ (these languages use row-major-ordering). In your case this can be used this way:
A = rand(3, 5);
A = ReshapeFromRowMajorOrdering(sort(A(:)), size(A));

Acessing multiple structure fields in matlab without looping through it

I Have a 8x18 structure with each cel containing a column vector of occurrences of a single event. I want to obtain data from some of these fields concatenated in a single array, without having to loop through it. I can't seem to find a way to vertically concatenate the fields I am interested in in a single array.
As an example I create the following structure with between 1 and 5 occurrences per cell:
s(62).vector(8,18).heading.occurrences=[1;2;3];
for i=1:62
for j=1:8
for k=1:18
y=ceil(rand(1)*5);
s(i).vector(j,k).heading.occurrences=rand(y,1);
end
end
end
Now if want to obtain all occurrences in several cells while keeping i constant at for instant i=1 the following works:
ss=s(1).vector([1 26 45]);
h=[ss.heading];
cell2mat({h.occurrences}')
Now I would want to do the same for s, for instance s([1 2 3]).vector([1 26 45]), how would that work? I have tried xx=s([1 2 3]), yy=xx.vector([1 26 45]) but this however yields the error:
Expected one output from a curly brace or dot indexing expression, but there were 3 results.
Is this also possible with a vector operation?
Here's a vectorized solution that accommodates using index vectors for s and the field vector:
sIndex = [1 2 3]; % Indices for s
vIndex = [1 26 45]; % Indices for 'vector' field
v = reshape(cat(3, s(sIndex).vector), 144, []);
h = [v(vIndex, :).heading];
out = vertcat(h.occurrences);
It uses cat to concatenate all the vector fields into an 8-by-18-by-numel(sIndex) matrix, reshapes that into a 144-by-numel(sIndex) matrix, then indexes the rows specified by vIndex and collects their heading and occurrences fields, using vertcat instead of cell2mat.
It's difficult to vectorize the entire operation, but this should work.
% get vector field and store in cell array
s_new = { s(1:3).vector };
% now extract heading field, this is a cell-of-cells
s_new_heading = cellfun(#(x) { x.heading }', s_new, 'UniformOutput', false);
occurences = {};
for iCell = 1:length(s_new_heading)
% use current cell
cellHere = s_new_heading{iCell};
% retain indices of interest, these could be different for each cell
cellHere = cellHere([ 1 26 45 ]);
% extract occurrences
h = cellfun(#(x) x.occurrences, cellHere, 'UniformOutput', false);
h_mat = cell2mat(h);
% save them in cell array
occurences = cat(1, occurences, h_mat);
end

operations with structure in matlab

I have a structure 1x300 called struct with 3 fields but I'm using only the third field called way. This field is, for each 300 lines, a vertor of index.
Here an exemple with 3 lines to explain my problem : I woud like to search if the last index of the first line is present in an other vector (line) of the field way.
way
[491751 491750 491749 492772 493795 494819 495843 496867]
[491753 491754 491755 491756]
[492776 493800 494823 495847 496867]
I tried with intersect function :
Inter=intersect(struct(1).way(end), struct.way);
but Matlab returns me an error :
Error using intersect (line 80)
Too many input arguments.
Error in file2 (line 9)
Inter=intersect(struct(1).way(end), struct.way);
I don't understand why I have this error. Any explanations and/or other(s) solution(s)?
Let the data be defined as
st(1).way = [491751 491750 491749 492772 493795 494819 495843 496867];
st(2).way = [491753 491754 491755 491756];
st(3).way = [492776 493800 494823 495847 496867]; % define the data
sought = st(1).way(end);
If you want to know which vectors contain the desired value: pack all vectors into a cell array and pass that to cellfun with an anonymous function as follows:
ind = cellfun(#(x) ismember(sought, x), {st.way});
This gives:
ind =
1×3 logical array
1 0 1
If you want to know for each vector the indices of the matching: modify the anonymous function to output a cell with the indices:
ind = cellfun(#(x) {find(x==sought)}, {st.way});
or equivalently
ind = cellfun(#(x) find(x==sought), {st.way}, 'UniformOutput', false);
The result is:
ind =
1×3 cell array
[8] [1×0 double] [5]
Or, to exclude the reference vector:
n = 1; % index of vector whose final element is sought
ind = cellfun(#(x) {find(x==st(n).way(end))}, {st([1:n-1 n+1:end]).way});
You propbably want to use ismember.
Consider what you are passing to the intersect/ismember functions too, struct.way isn't a valid argument, you may need to loop to iterate over each line of your struct (in this case it would be easier to have a cell array, or matrix with equal length rows).
output = zeros(300);
for ii = 1:300
for jj = 1:300
if ii ~= jj && ismember(struct(ii).way(end), struct(jj).way)
output(ii,jj) = 1;
end
end
end
Now you have a matrix output where the elements which are 1 identify a match between the last element in way in the struct row ii and the vector struct(jj).way, where ii are the matrix row numbers and jj the column numbers.

how to check the values of each variables from a resultant matrix in matlab?

I have sum of 3 cell arrays
A=72x1
B=72x720
C=72x90
resultant=A+B+C
size of resultant=72x64800
now when I find the minimum value with row and column indices I can locate the row element easily but how can I locate the column element in variables?
for example
after dong calculations for A,B,C I added them all and got a resultant in from of <72x(720x90)> or can say a matrix of integers of size <72x64800> then I found the minimum value of resultant with row and column index using the code below.
[minimumValue,ind]=min(resultant(:));
[row,col]=find(result== minimumValue);
then row got 14 and column got 6840 value..
now I can trace row 14 of all A,B,C variables easily but how can I know that the resultant column 6480 belongs to which combination of A,B,C?
Instead of using find, use the ind output from the min function. This is the linear index for minimumValue. To do that you can use ind2sub:
[r,c] = ind2sub(size(resultant),ind);
It is not quite clear what do you mean by resultant = A+B+C since you clearly don't sum them if you get a bigger array (72x64800), on the other hand, this is not a simple concatenation ([A B C]) since this would result in a 72x811 array.
However, assuming this is a concatenation you can do the following:
% get the 2nd dimension size of all matrices:
cols = cellfun(#(x) size(x,2),{A,B,C})
% create a vector with reapiting matrices names for all their columns:
mats = repelem(['A' 'B' 'C'],cols);
% get the relevant matrix for the c column:
mats(c)
so mats(c) will be the matrix with the minimum value.
EDIT:
From your comment I understand that your code looks something like this:
% arbitrary data:
A = rand(72,1);
B = rand(72,720);
C = rand(72,90);
% initializing:
K = size(B,2);
N = size(C,2);
counter = 1;
resultant = zeros(72,K*N);
% summing:
for k = 1:K
for n = 1:N
resultant(:,counter) = A + B(:,k) + C(:,n);
counter = counter+1;
end
end
% finding the minimum value:
[minimumValue,ind] = min(resultant(:))
and from the start of the answer you know that you can do this:
[r,c] = ind2sub(size(resultant),ind)
to get the row and column of minimumValue in resultant. So, in the same way you can do:
[Ccol,Bcol] = ind2sub([N,K],c)
where Bcol and Ccol is the column in B and C, respectively, so that:
minimumValue == A(r) + B(r,Bcol) + C(r,Ccol)
To see how it's working imagine that the loop above fills a matrix M with the value of counter, and M has a size of N-by-K. Because we fill M with a linear index, it will be filled in a column-major way, so the row will correspond to the n iterator, and the column will correspond to the k iterator. Now c corresponds to the counter where we got the minimum value, and the row and column of counter in M tells us the columns in B and C, so we can use ind2sub again to get the subscripts of the position of counter. Off course, we don't really need to create M, because the values within it are just the linear indices themselves.

How to shuffle the rows in a cell array

I have a cell array with x columns, each with a yx1 cell. I would like to randomize the "rows" within the columns. That is, for each yx1 cell with elements a_1, a_2, ... a_y, I would like to apply the same permutation to the indices of a_i.
I've got a function that does this,
function[Oarray] = shuffleCellArray(Iarray);
len = length(Iarray{1});
width = length(Iarray);
perm = randperm(len);
Oarray=cell(width, 0);
for i=1:width;
for j=1:len;
Oarray{i}{j}=Iarray{i}{perm(j)};
end;
end;
but as you can see it's a bit ugly. Is there a more natural way to do this?
I realize that I'm probably using the wrong data type, but for legacy reasons I'd like to avoid switching. But, if the answer is "switch" then I guess that's the answer.
I'm assuming you have a cell array of column vectors, such as
Iarray = {(1:5).' (10:10:50).' (100:100:500).'};
In that case, you could do it this way:
ind = randperm(numel(Iarray{1})); %// random permutation
Oarray = cellfun(#(x) x(ind), Iarray, 'UniformOutput', 0); %// apply that permutation
%// to each "column"
Or converting to an intermediate matrix and then back to a cell array:
ind = randperm(numel(Iarray{1})); %// random permutation
x = cat(2,Iarray{:}); %// convert to matrix
Oarray = mat2cell(x(ind,:), size(x,1), ones(1,size(x,2))); %// apply permutation to rows
%// and convert back