Join rows in Matrix - matlab

I have a very big matrix that looks like this:
id,value
1,434
2,454353
1,4353
3,3432
3,4323
[...]
There can be at most 2 rows with the same id.
I want to reshape the matrix into the following, preferably removing the id's which only appear once:
id,value1,value2
1,434,4353
3,3432,4323
[...]

Here is an alternative using accumarray to identify values sharing the same index. The code is commented and you can have a look at every intermediary output to see what exactly is going on.
clear
clc
%// Create matrix with your data
id = [1;2;1;3;3];
value = [434 ;454353;4353;3432;4323];
M = [id value]
%// Find unique indices to build final output.
UniqueIdx = unique(M(:,1),'rows')
%// Find values corresponding to every index. Use cell array to account for different sized outputs.
NewM = accumarray(id,value,[],@(x) {x})
%// Get number of elements
NumElements = cellfun(@(x) size(x,1),NewM)
%// Discard rows having orphan index.
NewM(NumElements==1) = [];
UniqueIdx(NumElements==1) = [];
%// Build Output.
Results = [UniqueIdx [NewM{:}].']
And the output. I can't use the table function to build a nicer output, but if you can, the result looks much better :)
Results =
1 434 4353
3 3432 4323

This code does the interesting job of sorting the matrix according to the id and removing the orphans.
x = sortrows(x,1); % sort x according to index
idx = x(:,1);
idxs = 1:max(idx);
rm = idxs(hist(idx, idxs) == 1); %find orphans
x( ismember(x(:,1),rm), : ) = [] %remove orphans
This last part then just shapes the array the way you want it
y = reshape(x', 4, []);
y( 3, : ) = [];
y=y';
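For reference, the sort-then-pair idea can also be sketched end to end on the question's sample data. This is only a sketch under an assumption the answers do not state: the ids are positive integers, so they can be used directly as subscripts for accumarray. The name counts is illustrative.

```matlab
% Sample data from the question: [id value]
x = [1 434; 2 454353; 1 4353; 3 3432; 3 4323];

x = sortrows(x, 1);                  % group equal ids together (stable within an id)
counts = accumarray(x(:,1), 1);      % number of occurrences of each id
x = x(counts(x(:,1)) == 2, :);       % keep only ids that appear exactly twice

% Odd rows hold the first value of each id, even rows hold the second
y = [x(1:2:end, 1), x(1:2:end, 2), x(2:2:end, 2)];
```

Counting with accumarray avoids hist, but it allocates one counter per possible id up to max(id), so it suits dense id ranges better than sparse ones.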

Related

Generating all combinations without repetition using MATLAB

I have 4 sets, each containing 6 elements, from which I want to generate all possible vectors of size 8, where the first two elements are from set 1, the next two from set 2, the next two from set 3 and the last two from set 4, without repetition in the points taken from each set, so that elements 1,2 / 3,4 / 5,6 / 7,8 are always different. My target number of combinations is (6 choose 2)^4. Any help please.
D1=[2+2i,2+1i,1+2i,1+1i,2,1i];
D2=[-2+2i,-2+1i,-1+2i,-1+1i,-1,2i];
D3=[-2-2i,-2-i,-1-i,-1-1i,-2,-1i];
D4=[2-2i,2-i,1-2i,-1+1i,1,-2i];
So I found a way to get your combinations. You should really have given a more minimal example to explain your problem (that's how I solved it by the way).
The procedure is:
Get all the {2 element} unique combination for each set.
Then build an index of the result you obtain. Normally there should be an index for each subset but since they are all the same length, the number of unique combinations will be the same so you can just reuse 4x the same index.
Get all the combinations of these 4 sets of indices
Finally, rebuild the final matrix based on the indices combinations
The code looks like:
%// prepare a few helper numbers
nSets = 4 ;
nElemPerSet = 2 ;
nCombs = nchoosek( numel(D1) ,nElemPerSet).^nSets ; %// <= nCombs=50625
%// for each set, get the unique combinations of 2 elements
s1 = nchoosek( D1 , nElemPerSet ) ;
s2 = nchoosek( D2 , nElemPerSet ) ;
s3 = nchoosek( D3 , nElemPerSet ) ;
s4 = nchoosek( D4 , nElemPerSet ) ;
%// now get the index of all the combinations of the above subsets
s = 1:size(s1,1) ;
combindex = all_combinations( repmat({s},1,4) ) ; %// <= size(combindex)=[50625 4]
%// now rebuild the full combinations based on above indices
combinations = zeros( nCombs , nSets*nElemPerSet ) ;
for ic = 1:nCombs
combinations(ic,:) = [s1(combindex(ic,1),:) s2(combindex(ic,2),:) s3(combindex(ic,3),:) s4(combindex(ic,4),:)] ;
end
There is probably a way to get rid of the last loop with an intelligent use of arrayfun but I leave that as an exercise to the reader.
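One possible way to drop that loop, without arrayfun, is to index each table of pairs with a whole column of indices at once. The following is a sketch on small integer sets rather than the complex ones from the question, and it builds the index columns with ndgrid directly so that it runs on its own; the names p and i1 to i4 are illustrative.

```matlab
% Four small, disjoint integer sets (stand-ins for D1..D4)
D1 = 1:4;  D2 = 5:8;  D3 = 9:12;  D4 = 13:16;

% All unordered pairs within each set
s1 = nchoosek(D1, 2);  s2 = nchoosek(D2, 2);
s3 = nchoosek(D3, 2);  s4 = nchoosek(D4, 2);

% Index grids: i4 varies fastest, i1 slowest (lexicographic row order)
p = 1:size(s1, 1);
[i4, i3, i2, i1] = ndgrid(p, p, p, p);

% Index every pair table with a full column of indices, no loop needed
combinations = [s1(i1(:),:) s2(i2(:),:) s3(i3(:),:) s4(i4(:),:)];
```

With 4 values per set there are 6 pairs per set, so the result has 6^4 = 1296 rows.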
This code works for your initial values of D1, D2, D3 and D4 as described in your question, but if you or anybody want to run it step by step to understand what's happening, I strongly recommend to try it with much simpler starting values. Something like:
%// define 4 non-complex sets of 4 values each (all different)
nVal=4 ;
D1 = 1:nVal ;
D2 = D1(end)+1:D1(end)+nVal ;
D3 = D2(end)+1:D2(end)+nVal ;
D4 = D3(end)+1:D3(end)+nVal ;
Note the use of the function all_combinations. This is just the answer I was mentioning in the comment (Generate a matrix containing all combinations of elements taken from n vectors) repackaged in a function. I suggest you have a look and bookmark it if you deal with combination problems often (also you can upvote it if it helps you, which it does here).
The repackaged function is:
function combs = all_combinations( vectors )
%// function combs = all_combinations( vectors )
%//
%// example input :
%// vectors = { [1 2], [3 6 9], [10 20] }; %//cell array of vectors
%//
%// Credit: Luis Mendo : https://stackoverflow.com/questions/21895335/generate-a-matrix-containing-all-combinations-of-elements-taken-from-n-vectors
n = numel(vectors); %// number of vectors
combs = cell(1,n); %// pre-define to generate comma-separated list
[combs{end:-1:1}] = ndgrid(vectors{end:-1:1}); %// the reverse order in these two
%// comma-separated lists is needed to produce the rows of the result matrix in
%// lexicographical order
combs = cat(n+1, combs{:}); %// concat the n n-dim arrays along dimension n+1
combs = reshape(combs,[],n); %// reshape to obtain desired matrix
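To see what the function produces, here is the same body run inline on the example input from its header comment. It is written as a plain script (with the illustrative name grids for the intermediate cell) so it can be pasted and run without saving the function file first.

```matlab
vectors = { [1 2], [3 6 9], [10 20] };   % example input from the function header

n = numel(vectors);
grids = cell(1, n);
[grids{end:-1:1}] = ndgrid(vectors{end:-1:1});   % reversed so the last vector varies fastest
combs = reshape(cat(n+1, grids{:}), [], n);      % one combination per row

disp(size(combs))      % 12 3
disp(combs(1:3, :))    % [1 3 10; 1 3 20; 1 6 10]
```

Each row takes one element from each vector, and the rows come out in lexicographic order, which is why the index columns in the answer above line up with s1 to s4.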

Generate random 2D matrix with unique rows in octave/matlab

I want to generate a 2D matrix (1000x3) with random values in the range of 1 to 10 in Octave. Using randi(10,1000,3) will generate a matrix with repeated rows, but I want to generate unique (unrepeated) rows. Is there any way I can do that?
You can do that easily by getting the cartesian product to create all possibilities and shuffle the array as follows. To create the cartesian product, you will need my custom cartprod.m function that generates a cartesian product.
C = cartprod(1:10,1:10,1:10);
The following line then shuffles the cartesian product C.
S = C(randperm( size(C,1) ),:);
Notes:
Every row in S is unique and you can verify that size( unique( S, 'rows' ), 1 ) == 1000.
I should note that this code works on Matlab 2015a. I haven't tested it in Octave, which is what OP seems to be using. I've been told the syntax is pretty much identical though.
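Since cartprod.m is a custom function whose source is not shown here, the same idea can be sketched with the built-in ndgrid standing in for it. This is an assumption about what cartprod returns (one combination per row); only ndgrid, randperm and unique are used.

```matlab
% Build every (a,b,c) triple with a,b,c in 1..10: 10^3 = 1000 rows
[a, b, c] = ndgrid(1:10, 1:10, 1:10);
C = [a(:) b(:) c(:)];

% Shuffle the rows; every row of C is distinct, so every row of S is too
S = C(randperm(size(C, 1)), :);

% Uniqueness must be checked row-wise, hence the 'rows' option
nUnique = size(unique(S, 'rows'), 1);
```

Because 1000 rows is exactly the number of possible triples, S always contains each triple once and only the order is random.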
You can generate all possible three-item sequences drawn from 1 through 10, with replacement, using the following function:
function result = nchoosek_replacement(n, k)
%// Edge cases: just return an empty matrix
if k < 1 || n < 1
result = [];
return
end
reps = n^(k-1);
result = zeros(n^k, k);
cur_col = repmat(1:n, reps, 1);
result(:,1) = cur_col(:);
%// Base case: when k is 1, just return the
%// fully populated matrix 'result'
if k == 1
return
end
%// Recursively generate a matrix that will
%// be used to populate columns 2:end
next = nchoosek_replacement(n, k-1);
%// Repeatedly use the matrix above to
%// populate the matrix 'result'
for i = 1:n
cur_range = (i-1)*reps+1:i*reps;
result(cur_range, 2:end) = next;
end
end
With this function defined, you can now generate all possible sequences. In this case there are exactly 1000 so they could simply be shuffled with randperm. A more general approach is to sample from them with randsample, which would also allow for smaller matrices if desired:
max_value = 10;
row_size = 3;
num_rows = 1000;
possible = nchoosek_replacement(max_value, row_size);
indices = randsample(size(possible, 1), num_rows);
data = possible(indices, :);
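If randsample is not available (it belongs to the Statistics Toolbox), or if enumerating every sequence is too expensive, one alternative is to draw unique linear indices with randperm and convert them to rows with ind2sub. This is a sketch of a different technique from the answer above, not a restatement of it; idx and subs are illustrative names.

```matlab
max_value = 10;
row_size  = 3;
num_rows  = 50;    % any value up to max_value^row_size

% Unique linear indices into a virtual max_value x ... x max_value grid
idx = randperm(max_value^row_size, num_rows);

% Convert each linear index to one subscript per column
subs = cell(1, row_size);
[subs{:}] = ind2sub(repmat(max_value, 1, row_size), idx(:));
data = [subs{:}];  % num_rows x row_size, all rows distinct
```

Distinct linear indices map to distinct subscript tuples, so the rows cannot repeat.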

randomly disperse numbers in array

I am trying to randomly disperse different numbers in MATLAB array:
I have two 3's and four 2's, and I want to randomly place them in a vector of ones (size 10,1).
The end result should look something like this:
A = [1;3;1;2;3;2;2;1;1;2;1;1]
Then I want to fix the values in A but add more random elements but I can only replace with higher numbers:
For example, to the matrix above I will randomly add two more 2's and two more 3's giving something like this
A= [3;3;2;2;3;2;2;2;1;2;1;3]
M = [3;3;2;2;2;2];
M(end+1:end+4) = 1;
M=M(randperm(10))
The second half of your question needs a lot of clarification.
First part
You can use randsample for that:
A = ones(1,12); %// original values
v = [3 3 2 2 2 2]; %// values to "disperse" in A
ind_replace = randsample(1:numel(A), numel(v)); %// index of entries to be replaced
A(ind_replace) = v;
If you don't have randsample (which is part of the Statistics Toolbox), use randperm and select the first few elements:
ind_replace = randperm(numel(A));
ind_replace = ind_replace(1:numel(v));
A(ind_replace) = v;
Second part
To only replace entries which equal 1:
v = [2 2 3 3]; %// values to "disperse" among the 1 values in A
ind_ones = find(A==1); %// index of entries which equal one
ind_replace = randsample(1:numel(ind_ones), numel(v)); %// index within the above
%// Or: ind_replace = randperm(numel(ind_ones));
%// ind_replace = ind_replace(1:numel(v));
A(ind_ones(ind_replace)) = v;
Note this generalizes the first part, that is, it can also be used when all entries of A equal 1.
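Both parts can be checked together in a short script. This sketch uses only randperm (so no toolbox is needed) and keeps the result of the second step in a separate variable B, an illustrative name, so the two stages can be compared.

```matlab
% First part: scatter two 3's and four 2's into a vector of twelve ones
A = ones(1, 12);
v = [3 3 2 2 2 2];
p = randperm(numel(A));
A(p(1:numel(v))) = v;

% Second part: add two more 2's and two more 3's, overwriting only ones
v2 = [2 2 3 3];
ind_ones = find(A == 1);
q = randperm(numel(ind_ones));
B = A;
B(ind_ones(q(1:numel(v2)))) = v2;

% Entries that were already 2 or 3 are untouched, so B >= A everywhere
disp([A; B])
```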

matlab parse file into cell array

I have a file in the following format in matlab:
user_id_a: (item_1,rating),(item_2,rating),...(item_n,rating)
user_id_b: (item_25,rating),(item_50,rating),...(item_x,rating)
....
....
so each line has values separated by a colon where the value to the left of the colon is a number representing user_id and the values to the right are tuples of item_ids (also numbers) and rating (numbers not floats).
I would like to read this data into a matlab cell array or better yet ultimately convert it into a sparse matrix wherein the user_id represents the row index, and the item_id represents the column index and store the corresponding rating in that array index. (This would work as I know a-priori the number of users and items in my universe so ids cannot be greater than that ).
Any help would be appreciated.
I have thus far tried the textscan function as follows:
c = textscan(f,'%d %s','delimiter',':') %this creates two cells one with all the user_ids
%and another with all the remaining string values.
Now if I try to do something like str2mat(c{2}), it works but it stores the '(' and ')' characters also in the matrix. I would like to store a sparse matrix in the fashion that I described above.
I am fairly new to matlab and would appreciate any help regarding this matter.
f = fopen('data.txt','rt'); %// data file. Open as text ('t')
str = textscan(f,'%s'); %// gives a cell which contains a cell array of strings
str = str{1}; %// cell array of strings
r = str(1:2:end);
r = cellfun(@(s) str2num(s(1:end-1)), r); %// rows; numeric vector
pairs = str(2:2:end);
pairs = regexprep(pairs,'[(,)]',' ');
pairs = cellfun(@(s) str2num(s(1:end-1)), pairs, 'uni', 0);
%// pairs; cell array of numeric vectors
cols = cellfun(@(x) x(1:2:end), pairs, 'uni', 0);
%// columns; cell array of numeric vectors
vals = cellfun(@(x) x(2:2:end), pairs, 'uni', 0);
%// values; cell array of numeric vectors
rows = arrayfun(@(n) repmat(r(n),1,numel(cols{n})), 1:numel(r), 'uni', 0);
%// rows repeated to match cols; cell array of numeric vectors
matrix = sparse([rows{:}], [cols{:}], [vals{:}]);
%// concat rows, cols and vals into vectors and use as inputs to sparse
For the example file
1: (1,3),(2,4),(3,5)
10: (1,1),(2,2)
this gives the following sparse matrix:
matrix =
(1,1) 3
(10,1) 1
(1,2) 4
(10,2) 2
(1,3) 5
I think newer versions of Matlab have a strsplit function that makes this approach overkill, but the following works, if not quickly. It splits the file into user ids and "other stuff" as you show, initializes a large empty matrix, and then iterates through the other stuff, breaking it apart and placing each rating in the correct place in the matrix.
(I didn't see the previous answer when I opened this for some reason - it is more sophisticated than this one, though this may be a little easier to follow at the expense of slowness). I throw the \s* into the regex in case the spacing is inconsistent, but otherwise don't perform much in the way of data-sanity-checking. Output is the full array, which you can then turn into a sparse array if desired.
% matlab_test.txt:
% 101: (1,42),(2,65),(5,0)
% 102: (25,78),(50,12),(6,143),(2,123)
% 103: (23,6),(56,3)
clear all;
fclose('all');
% your path will vary, of course
file = '<path>/matlab_test.txt';
f = fopen(file);
c = textscan(f,'%d %s','delimiter',':');
celldisp(c)
uids = c{1}
tuples = c{2}
% These are stated as known
num_users = 3;
num_items = 40;
desired_array = zeros(num_users, num_items);
expression = '\((\d+)\s*,\s*(\d+)\)'
% Assuming length(tuples) == num_users for simplicity
for k = 1:num_users
uid = uids(k)
tokens = regexp(tuples{k}, expression, 'tokens');
for l = 1:length(tokens)
item_id = str2num(tokens{l}{1})
rating = str2num(tokens{l}{2})
desired_array(uid, item_id) = rating;
end
end
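If the sparse matrix is the real goal, the same regex tokens can be collected as (row, column, value) triplets and passed to sparse in one call, instead of filling a full array first. The sketch below parses in-memory strings rather than a file so it runs on its own; txt, rowIdx, colIdx and the sizes num_users and num_items are illustrative assumptions, and the data are the two example lines from the first answer.

```matlab
txt = {'1: (1,3),(2,4),(3,5)', '10: (1,1),(2,2)'};
num_users = 10;   % assumed known in advance, as stated in the question
num_items = 3;

rowIdx = []; colIdx = []; vals = [];
for k = 1:numel(txt)
    % Split "uid: rest" into the user id and the tuple list
    parts = regexp(txt{k}, '^\s*(\d+)\s*:(.*)$', 'tokens', 'once');
    uid = str2double(parts{1});
    % Each match yields {item, rating} as strings
    tok = regexp(parts{2}, '\((\d+)\s*,\s*(\d+)\)', 'tokens');
    if isempty(tok), continue; end           % user with no ratings
    nums = str2double(vertcat(tok{:}));      % n-by-2: [item rating]
    rowIdx = [rowIdx; repmat(uid, size(nums, 1), 1)];
    colIdx = [colIdx; nums(:, 1)];
    vals   = [vals;   nums(:, 2)];
end

% Fixing the size keeps the matrix dimensions independent of the data
matrix = sparse(rowIdx, colIdx, vals, num_users, num_items);
```

Passing the dimensions explicitly means users or items that never appear still get their (empty) row or column.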

Sum every n rows of matrix

Is there any way that I can sum up columns values for each group of three rows in a matrix?
I can sum three rows up in a manual way.
For example
% matrix is the one I wanna store the new data.
% data is the original dataset.
matrix(1,1:end) = sum(data(1:3, 1:end))
matrix(2,1:end) = sum(data(4:6, 1:end))
...
But if the dataset is huge, this wouldn't work.
Is there any way to do this automatically without loops?
Here are four other ways:
The obligatory for-loop:
% for-loop over each three rows
matrix = zeros(size(data,1)/3, size(data,2));
counter = 1;
for i=1:3:size(data,1)
matrix(counter,:) = sum(data(i:i+3-1,:));
counter = counter + 1;
end
Using mat2cell for tiling:
% divide each three rows into a cell
matrix = mat2cell(data, ones(1,size(data,1)/3)*3);
% compute the sum of rows in each cell
matrix = cell2mat(cellfun(@sum, matrix, 'UniformOutput',false));
Using third dimension (based on this):
% put each three row into a separate 3rd dimension slice
matrix = permute(reshape(data', [], 3, size(data,1)/3), [2 1 3]);
% sum rows, and put back together
matrix = permute(sum(matrix), [3 2 1]);
Using accumarray:
% build array of group indices [1,1,1,2,2,2,3,3,3,...]
idx = floor(((1:size(data,1))' - 1)/3) + 1;
% use it to accumulate rows (applied to each column separately)
matrix = cell2mat(arrayfun(@(i)accumarray(idx,data(:,i)), 1:size(data,2), ...
'UniformOutput',false));
Of course, all the solutions so far assume that the number of rows is evenly divisible by 3.
This one-liner reshapes so that all the values needed for a particular cell are in a column, does the sum, and then reshapes the result back to the expected shape.
reshape(sum(reshape(data, 3, [])), [], size(data, 2))
The naked 3 could be changed if you want to sum a different number of rows together. It's on you to make sure the number of rows in each group divides evenly.
Slice the matrix into three pieces and add them together:
matrix = data(1:3:end, :) + data(2:3:end, :) + data(3:3:end, :);
This will give an error if size(data,1) is not a multiple of three, since the three pieces wouldn't be the same size. If appropriate to your data, you might work around that by truncating data, or appending some zeros to the end.
You could also do something fancy with reshape and 3D arrays. But I would prefer the above (unless you need to replace 3 with a variable...)
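The zero-padding workaround mentioned above combines naturally with the reshape one-liner and a variable group size. A minimal sketch, assuming that padding with zeros is acceptable for your data (it is for sums, but it would bias averages); n, pad, padded and sums are illustrative names.

```matlab
data = reshape(1:14, 7, 2);   % 7 rows: not a multiple of 3
n = 3;                        % number of rows to sum together

% Rows needed to reach the next multiple of n (0 if already a multiple)
pad = mod(-size(data, 1), n);
padded = [data; zeros(pad, size(data, 2))];

% Each column of the reshaped array holds one group of n rows of one column
sums = reshape(sum(reshape(padded, n, []), 1), [], size(data, 2));
disp(sums)   % [6 27; 15 36; 7 14]
```

The explicit dimension argument in sum keeps the result correct even when n is 1.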
Prashant answered nicely before but I would have a simple amendment:
fl = filterLength;
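A = yourVector; % its length must satisfy mod(length(A),fl)==0
sum(reshape(A,fl,[]),1).'/fl;
The ",1" makes the line work even when fl==1 (returning the original values): in that case reshape yields a row vector, and sum without a dimension argument would collapse it to a single number.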
I discovered this while running it in a for loop like so:
% ... read A and t here; L is assumed to be their number of rows ...
% Plot data
hold on;
averageFactors = [1 3 10 30 100 300 1000];
colors = hsv(length(averageFactors));
clear legendTxt;
for i=1:length(averageFactors)
% ------ FILTERING ----------
clear Atrunc;
clear ttrunc;
clear B;
fl = averageFactors(i); % filter length
Atrunc = A(1:L-mod(L,fl),:);
ttrunc = t(1:L-mod(L,fl),:);
B = sum(reshape(Atrunc,fl,[]),1).'/fl;
tB = sum(reshape(ttrunc,fl,[]),1).'/fl;
length(B)
plot(tB,B,'color',colors(i,:) )
%kbhit ()
endfor
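The effect of the ",1" noted above is easy to see on a tiny vector. This is a small illustration with made-up data; without_dim, with_dim and avg3 are illustrative names.

```matlab
A = (1:6).';     % column vector of 6 samples

% fl == 1: reshape(A,1,[]) is a 1x6 row vector
fl = 1;
without_dim = sum(reshape(A, fl, []));          % sums along the row: scalar 21
with_dim    = sum(reshape(A, fl, []), 1).'/fl;  % sums each column: the original 6 values

% fl == 3: both forms agree, giving the mean of each block of 3
fl = 3;
avg3 = sum(reshape(A, fl, []), 1).'/fl;         % [2; 5]
```

For any fl greater than 1 the reshaped array has several rows, so sum already works along dimension 1 by default; the explicit argument only matters for the fl == 1 case.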