Generate random 2D matrix with unique rows in Octave/MATLAB

I want to generate a 2D matrix (1000x3) with random values in the range of 1 to 10 in Octave. Using randi(10,1000,3) will generate a matrix with repeated rows, but I want every row to be unique (unrepeated). Is there any way I can do that?

You can do that easily by getting the cartesian product to create all possibilities and shuffle the array as follows. To create the cartesian product, you will need my custom cartprod.m function that generates a cartesian product.
C = cartprod(1:10,1:10,1:10);
The following line then shuffles the cartesian product C.
S = C(randperm( size(C,1) ),:);
Notes:
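Every row in S is unique, and you can verify that size( unique( S, 'rows' ), 1 ) == 1000. (Without the 'rows' option, unique returns the distinct elements rather than the distinct rows.)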
I should note that this code works on Matlab 2015a. I haven't tested it in Octave, which is what OP seems to be using. I've been told the syntax is pretty much identical though.
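Since cartprod.m is a custom function that is not shown here, the following is a minimal sketch of the same idea using the built-in ndgrid as a stand-in. The row order of C may differ from what cartprod returns, which does not matter because the rows are shuffled afterwards.

```matlab
% Sketch: build all 10^3 triples with ndgrid (stand-in for the custom cartprod.m)
[a, b, c] = ndgrid(1:10, 1:10, 1:10);
C = [a(:), b(:), c(:)];              % 1000-by-3, every row distinct
S = C(randperm(size(C, 1)), :);      % shuffle the rows
```

Because there are exactly 10^3 = 1000 distinct triples, S necessarily contains every one of them exactly once.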

You can generate all possible three-item sequences drawn from 1 through 10, with replacement, using the following function:
function result = nchoosek_replacement(n, k)
%// Edge cases: just return an empty matrix
if k < 1 || n < 1
result = [];
return
end
reps = n^(k-1);
result = zeros(n^k, k);
cur_col = repmat(1:n, reps, 1);
result(:,1) = cur_col(:);
%// Base case: when k is 1, just return the
%// fully populated matrix 'result'
if k == 1
return
end
%// Recursively generate a matrix that will
%// be used to populate columns 2:end
next = nchoosek_replacement(n, k-1);
%// Repeatedly use the matrix above to
%// populate the matrix 'result'
for i = 1:n
cur_range = (i-1)*reps+1:i*reps;
result(cur_range, 2:end) = next;
end
end
With this function defined, you can now generate all possible sequences. In this case there are exactly 1000 so they could simply be shuffled with randperm. A more general approach is to sample from them with randsample, which would also allow for smaller matrices if desired:
max_value = 10;
row_size = 3;
num_rows = 1000;
possible = nchoosek_replacement(max_value, row_size);
indices = randsample(size(possible, 1), num_rows);
data = possible(indices, :);
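If enumerating every possible row is too expensive for larger ranges, one alternative (not part of the answer above) is to draw distinct integers and decode each one into its base-n digits. This is only a sketch: it assumes the two-argument form randperm(n, k) is available, and the variable names idx and place are illustrative.

```matlab
% Sketch: sample distinct rows without building the full list of candidates
n = 10;           % values range over 1..n
k = 3;            % row length
num_rows = 1000;  % must not exceed n^k

idx   = randperm(n^k, num_rows)' - 1;   % distinct integers in 0..n^k-1
place = n .^ (0:k-1);                   % 1, n, n^2, ...
% Decode each integer into k base-n digits, then shift digits to 1..n
data  = mod(floor(bsxfun(@rdivide, idx, place)), n) + 1;
```

Distinct integers have distinct digit expansions, so the rows of data are unique by construction.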

Related

Vectorize MATLAB code

Let's say we have three m-by-n matrices of equal size: A, B, C.
Every column in C represents a time series.
A is the running maximum (over a fixed window length) of each time series in C.
B is the running minimum (over a fixed window length) of each time series in C.
Is there a way to determine T in a vectorized way?
[nrows, ncols] = size(A);
T = zeros(nrows, ncols);
for row = 2:nrows %loop over the rows (except row #1).
for col = 1:ncols %loop over the columns.
if C(row, col) > A(row-1, col)
T(row, col) = 1;
elseif C(row, col) < B(row-1, col)
T(row, col) = -1;
else
T(row, col) = T(row-1, col);
end
end
end
This is what I've come up with so far:
T = zeros(m, n);
T(C > circshift(A,1)) = 1;
T(C < circshift(B,1)) = -1;
Well, the trouble was the dependency in the ELSE part of the conditional statement. So, after a long mental work-out, here's a way I came up with to vectorize the hell-outta everything.
Now, this approach is based on mapping. We get column-wise runs or islands of 1s corresponding to the 2D mask for the ELSE part and assign them the same tags. Then, we go to the start-1 along each column of each such run and store that value. Finally, indexing into each such start-1 with those tagged numbers, which would work as mapping indices would give us all the elements that are to be set in the new output.
Here's the implementation to fulfill all those aspirations -
%// Store sizes
[m1,n1] = size(A);
%// Masks corresponding to three conditions
mask1 = C(2:m1,:) > A(1:m1-1,:);
mask2 = C(2:m1,:) < B(1:m1-1,:);
mask3 = ~(mask1 | mask2);
%// All but mask3 set values as output
out = [zeros(1,n1) ; mask1 + (-1*(~mask1 & mask2))];
%// Proceed if any element in mask3 is set
if any(mask3(:))
%// Row vectors for appending onto matrices for matching up sizes
mask_appd = false(1,n1);
row_appd = zeros(1,n1);
%// Get 2D mapped indices
df = diff([mask_appd ; mask3],[],1)==1;
cdf = cumsum(df,1);
offset = cumsum([0 max(cdf(:,1:end-1),[],1)]);
map_idx = bsxfun(@plus,cdf,offset);
map_idx(map_idx==0) = 1;
%// Extract the values to be used for setting into new places
A1 = out([df ; false(1,n1)]);
%// Map with the indices obtained earlier and set at places from mask3
newval = [row_appd ; A1(map_idx)];
mask3_appd = [mask_appd ; mask3];
out(mask3_appd) = newval(mask3_appd);
end
Doing this vectorized is rather difficult because the current row's output depends on the previous row's output. Doing vectorized operations usually means that each element should stand out on its own using some relationship that is independent of the other elements that surround it.
I don't have any input on how you would achieve this without a for loop but I can help you reduce your operations down to one instead of two. You can do the assignment vectorized per row, but I can't see how you'd do it all in one shot.
As such, try something like this instead:
[nrows, ncols] = size(A);
T = zeros(nrows, ncols);
for row = 2:nrows
out = T(row-1,:); %// Change - Make a copy of the previous row
out(C(row,:) > A(row-1,:)) = 1; %// Set those elements of C
%// in the current row that are larger
%// than the previous row of A to 1
out(C(row,:) < B(row-1,:)) = -1; %// Same logic but for B now and it's
%// less than and the value is -1 instead
T(row,:) = out; %// Assign to the output
end
I'm currently figuring out how to do this without any loops whatsoever. I'll keep you posted.
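For comparison, here is a shorter loop-free sketch based on a forward fill: mark each cell with +1, -1, or 0, then let every 0 inherit the most recent nonzero value above it in the same column. It is not taken from either answer above and assumes cummax is available (MATLAB R2014b or later, or Octave). The inputs below are random placeholders rather than real running maxima and minima.

```matlab
% Placeholder inputs (assumed): any three equally sized matrices will do
C = rand(50, 4);
A = 0.5 + 0.5 * rand(50, 4);   % stands in for the running maximum
B = 0.5 * rand(50, 4);         % stands in for the running minimum

[nrows, ncols] = size(C);
up   = C(2:end, :) > A(1:end-1, :);
down = ~up & (C(2:end, :) < B(1:end-1, :));
S    = [zeros(1, ncols); up - down];     % +1, -1, or 0 (0 means "carry forward")

rowIdx = repmat((1:nrows)', 1, ncols);
rowIdx(S == 0) = 0;                      % keep row numbers only where a signal fires
last = cummax(rowIdx, 1);                % row of the most recent signal in each column
last(last == 0) = 1;                     % no signal yet: point at row 1, which holds 0

colIdx = repmat(1:ncols, nrows, 1);
T = S(sub2ind([nrows ncols], last, colIdx));
```

The elseif in the original loop is reproduced by the ~up term, so a cell that satisfies both conditions is set to 1.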

Join rows in Matrix

I have a very big matrix that looks like this:
id,value
1,434
2,454353
1,4353
3,3432
3,4323
[...]
There can be at most 2 rows with the same id.
I want to reshape the matrix into the following, preferably removing the id's which only appear once:
id,value1,value2
1,434,4353
3,3432,4323
[...]
Here is an alternative using accumarray to identify values sharing the same index. The code is commented and you can have a look at every intermediary output to see what exactly is going on.
clear
clc
%// Create matrix with your data
id = [1;2;1;3;3];
value = [434 ;454353;4353;3432;4323];
M = [id value]
%// Find unique indices to build final output.
UniqueIdx = unique(M(:,1),'rows')
%// Find values corresponding to every index. Use cell array to account for different sized outputs.
NewM = accumarray(id,value,[],@(x) {x})
%// Get number of elements
NumElements = cellfun(@(x) size(x,1),NewM)
%// Discard rows having orphan index.
NewM(NumElements==1) = [];
UniqueIdx(NumElements==1) = [];
%// Build Output.
Results = [UniqueIdx [NewM{:}]']
And the output. I can't use the function table to build a nice output, but if you can, the result looks much nicer :)
Results =
1 434 4353
3 3432 4323
This code does the interesting job of sorting the matrix according to the id and removing the orphans.
x = sortrows(x,1); % sort x according to index
idx = x(:,1);
idxs = 1:max(idx);
rm = idxs(hist(idx, idxs) == 1); %find orphans
x( ismember(x(:,1),rm), : ) = [] %remove orphans
This last part then just shapes the array the way you want it
y = reshape(x', 4, []);
y( 3, : ) = [];
y=y';
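A more compact variant of the sort-based idea is sketched below. It relies on the stated guarantee that an id appears at most twice, and it assumes sortrows keeps tied rows in their original order. The variable name first is illustrative.

```matlab
% Example data from the question
M = [1 434; 2 454353; 1 4353; 3 3432; 3 4323];

x = sortrows(M, 1);                    % bring equal ids next to each other
first = find(diff(x(:, 1)) == 0);      % first row of every id that occurs twice
Results = [x(first, 1), x(first, 2), x(first + 1, 2)];
```

Ids that occur only once never produce a zero in the diff, so they are dropped without a separate removal step.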

Find the product of all entries of vector x

Here is what I am trying to do:
Let x be a vector with n entries x1, x2, ..., xn. Write a MATLAB program which computes the vector p with entries defined by
pk = x1*x2*...*x(k-1)*x(k+1)*...*xn
for each k = 1, 2, ..., n.
pk is the product of all the entries of x except xk. (Use the prod command to compute the product of all the entries, then divide by xk.) Take the appropriate special action if one or more of the entries of x is zero. Use vectors throughout and no 'for' loop.
I spent too much time to figure out this problem. I still could not get it. Please help!
Brute force:
n = numel(x);
X = repmat(x(:),1,n); %// put vector in column form and repeat
X(1:n+1:end) = 1; %// make diagonal 1
result = prod(X); %// product of each column
Saving computations:
ind = find(x==0);
if numel(ind)>1 %// result is all zeros
result = zeros(size(x));
elseif numel(ind)==1 %// result is all zeros except at one entry
result = zeros(size(x));
result(ind) = prod(nonzeros(x));
else %// compute product of all elements and divide by each element
result = prod(x)./x;
end
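A short worked example shows why the zero case needs special handling: dividing the full product by each entry yields 0/0 = NaN at the position of the zero, whereas the brute-force version stays correct. The sample vectors are illustrative.

```matlab
% No zeros: plain division works
x1 = [2 3 4];
p1 = prod(x1) ./ x1;             % [12 8 6]

% One zero: plain division produces NaN where x is zero
x2 = [2 0 5];
naive = prod(x2) ./ x2;          % [0 NaN 0]

% The brute-force approach handles the same input correctly
n = numel(x2);
X = repmat(x2(:), 1, n);         % each column is a copy of x2
X(1:n+1:end) = 1;                % replace the k-th entry of column k by 1
p2 = prod(X);                    % [0 10 0]
```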

Removing a random number of columns from a matrix

I need to take away a random number of columns from an arbitrarily large matrix. I've put my attempt below, but I'm certain that there is a better way.
function new = reduceMatrices(original, colsToTakeAway)
a = colsToTakeAway(1);
b = colsToTakeAway(2);
c = colsToTakeAway(3);
x = original(1:a-1);
y = original(a+1:b-1);
z = original(b+1:c-1);
if c == size(original, 2);
new = [x,y,z];
elseif (c+1) == size(original, 2);
new = [x,y,z,c+1]
else
new = [x,y,z,c+1:size(original, 2)];
end
Here's one approach. First, generate a row vector of random numbers with numcols elements, where numcols is the number of columns in the original matrix:
rc = rand(1,numcols)
Next make a vector of 1s and 0s from this, for example
lv = rc>0.75
which will produce something like
0 1 1 0 1
and you can use Matlab's logical indexing feature to write
original(:,lv)
which will return only those columns of original which correspond to the 1s in lv.
It's not entirely clear from your question how you want to make the vector of column selections, but this should give you some ideas.
function newM = reduceMatrices(original, colsToTakeAway)
% define the columns to keep := cols \ colsToTakeAway
colsToKeep = setdiff(1:size(original,2), colsToTakeAway);
newM = original(:, colsToKeep);
end
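To actually pick a random number of random columns, something like the following sketch could be used. It assumes the two-argument form randperm(n, k) is available; the names numToRemove and reduced are illustrative, and the deletion is done in place instead of calling reduceMatrices, which gives the same result.

```matlab
original = magic(6);                              % example matrix
numCols  = size(original, 2);

numToRemove    = randi(numCols - 1);              % random count, keeps at least one column
colsToTakeAway = randperm(numCols, numToRemove);  % distinct random column indices

reduced = original;
reduced(:, colsToTakeAway) = [];                  % delete the selected columns
```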

Apply function to all rows

I have a function, ranker, that takes a vector and assigns numerical ranks to it in ascending order. For example,
ranker([5 1 3 600]) = [3 1 2 4] or
ranker([42 300 42 42 1 42]) = [3.5 6 3.5 3.5 1 3.5].
I am using a matrix, variable_data, and I want to apply the ranker function to each row of variable_data. This is my current solution, but I feel there is a way to vectorize it and have it be equally fast :p
variable_ranks = nan(size(variable_data));
for i=1:1:numel(nmac_ids)
variable_ranks(i,:) = ranker(abs(variable_data(i,:)));
end
If you place the matrix rows into a cell array, you can then apply a function to each cell.
Consider this simple example of applying the SORT function to each row
a = rand(10,3);
b = cell2mat( cellfun(@sort, num2cell(a,2), 'UniformOutput',false) );
%# same as: b = sort(a,2);
You can even do this:
b = cell2mat( arrayfun(@(i) sort(a(i,:)), 1:size(a,1), 'UniformOutput',false)' );
Again, your version with the for loop is probably faster.
With collaboration from Amro and Jonas
variable_ranks = tiedrank(variable_data')';
Ranker has been replaced by the Matlab function in the Stat toolbox (sorry for those who don't have it),
[R,TIEADJ] = tiedrank(X) computes the ranks of the values in the vector X. If any X values are tied, tiedrank computes their average rank. The return value TIEADJ is an adjustment for ties required by the nonparametric tests signrank and ranksum, and for the computation of Spearman's rank correlation.
TIEDRANK will compute along columns in Matlab 7.9.0 (R2009b), although this is undocumented. So by transposing the input matrix, rows turn into columns and TIEDRANK ranks them. The second transpose is then used to organize the data in the same manner as the input. In essence, this is a very classy hack :p
One way would be to rewrite ranker to take array input
sizeData = size(variable_data);
[sortedData,almostRanks] = sort(abs(variable_data),2);
[rowIdx,colIdx] = ndgrid(1:sizeData(1),1:sizeData(2));
linIdx = sub2ind(sizeData,rowIdx,almostRanks);
variable_ranks = variable_data;
variable_ranks(linIdx) = colIdx;
%# break ties by finding subsequent equal entries in sorted data
[rr,cc] = find(diff(sortedData,1,2) == 0);
ii = sub2ind(sizeData,rr,cc);
ii2 = sub2ind(sizeData,rr,cc+1);
ii = sub2ind(sizeData,rr,almostRanks(ii));
ii2 = sub2ind(sizeData,rr,almostRanks(ii2));
variable_ranks(ii) = variable_ranks(ii2);
EDIT
Instead, you can just use TIEDRANK from TMW (thanks, @Amro):
variable_ranks = tiedrank(variable_data')';
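For those without the Statistics Toolbox, average ranks can also be computed per row by counting: the rank of an entry equals the number of strictly smaller entries in its row plus (number of equal entries + 1)/2. The sketch below is not from any of the answers here, and it builds an m-by-n-by-n comparison array, so it is only reasonable for moderate row lengths. The sample data is illustrative.

```matlab
variable_data = [5 1 3 600 7 2; 42 300 42 42 1 42];   % example input

X = abs(variable_data);                    % the question ranks absolute values
P = permute(X, [1 3 2]);                   % m-by-1-by-n copy, entries along dim 3
nLess  = sum(bsxfun(@lt, P, X), 3);        % per entry: count of smaller entries in its row
nEqual = sum(bsxfun(@eq, P, X), 3);        % per entry: count of equal entries (incl. itself)
variable_ranks = nLess + (nEqual + 1) / 2; % ties receive their average rank
```

For the second row this reproduces the result from the question, [3.5 6 3.5 3.5 1 3.5].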
I wrote a function that does this, it's on the FileExchange tiedrank_(X,dim). And it looks like this...
%[Step 0a]: force dim to be 1, and compress everything else into a single
%dimension. We will reverse this process at the end.
if dim > 1
otherDims = 1:length(size(X));
otherDims(dim) = [];
perm = [dim otherDims];
X = permute(X,perm);
end
originalSiz = size(X);
X = reshape(X,originalSiz(1),[]);
siz = size(X);
%[Step 1]: sort and get sorting indices
[X,Ind] = sort(X,1);
%[Step 2]: create matrix [D], which has +1 at the start of consecutive runs
% and -1 at the end, with zeros elsewhere.
D = zeros(siz,'int8');
D(2:end-1,:) = diff(X(1:end-1,:) == X(2:end,:));
D(1,:) = X(1,:) == X(2,:);
D(end,:) = -( X(end,:) == X(end-1,:) );
clear X
%[Step 3]: calculate the averaged rank for each consecutive run
[a,~] = find(D);
a = reshape(a,2,[]);
h = sum(a,1)/2;
%[Step 4]: insert the troublesome ranks in the relevant places
L = zeros(siz);
L(D==1) = h;
L(D==-1) = -h;
L = cumsum(L);
L(D==-1) = h; %cumsum set these ranks to zero, but we wanted them to be h
clear D h
%[Step 5]: insert the simple ranks (i.e. the ones that didn't clash)
[L(~L),~] = find(~L);
%[Step 6]: assign the ranks to the relevant position in the matrix
Ind = bsxfun(@plus,Ind,(0:siz(2)-1)*siz(1)); %equivalent to using sub2ind + repmat
r(Ind) = L;
%[Step 0b]: As promised, we reinstate the correct dimensional shape and order
r = reshape(r,originalSiz);
if dim > 1
r = ipermute(r,perm);
end
I hope that helps someone.