MATLAB : Get k random subscripts with condition from a Matrix

MATLAB : Get k random subscripts with condition from a Matrix - matlab

I'd like get k random subscripts from A to be used in a C++ program with currently load data from a file given a set of subscripts.
I've a MxN matrix A with double values and a Mx1 matrix B with integers from 0 to 20.
How could i get k random subscripts from A with a condition from B ? In example, given:
A = [ 0.25 0.25 0.25 0.25
0.18 0.18 0.18 0.18
0.36 0.36 0.36 0.36
0.51 0.51 0.51 0.51 ]
B = [ 0
1
2
1 ]
I'm trying to get k = 1 random row subscript i from A if B(i) == 1. So, I'm looking for i == 1 or i == 4.
I've tried first create a logical index as:
idx = B == 1;
And then, get from A the elements with that condition as follow:
r = A( idx, : );
And finally, do a permutation in r to in order to get k rows:
randperm( size(r) )
But I'm stuck now as I do not know how to translate the permutation to the matrix A.
I'm also trying to understand the function [I,J] = ind2sub(siz,IND) but no idea right now how to join the subscripts with the randon permutation. Moreover, the results of randperm( size(r) ) are related to the size of r.
So, How could i get k random subscripts from A with a condition from B ? The idea is use the subscripts in a C++ program as input parameters

You can get the relevant row numbers as follows:
nrs=1:length(idx);
nrs=nrs(idx);
Now to permute them:
nrs_permidx = randperm(length(nrs))
permuted = nrs(nrs_permidx)
And I guess you can take it from here.

Related

vectorize lookup values in table of interval limits

Here is a question about whether we can use vectorization type of operation in matlab to avoid writing for loop.
I have a vector
Q = [0.1,0.3,0.6,1.0]
I generate a uniformly distributed random vector over [0,1)
X = [0.11,0.72,0.32,0.94]
I want to know whether each entry of X is between [0,0.1) or [0.1,0.3) or [0.3,0.6), or [0.6,1.0) and I want to return a vector which contains the index of the maximum element in Q that each entry of X is less than.
I could write a for loop
Y = zeros(length(X),1)
for i = 1:1:length(X)
Y(i) = find(X(i)<Q, 1);
end
Expected result for this example:
Y = [2,4,3,4]
But I wonder if there is a way to avoid writing for loop? (I see many very good answers to my question. Thank you so much! Now if we go one step further, what if my Q is a matrix, such that I want check whether )
Y = zeros(length(X),1)
for i = 1:1:length(X)
Y(i) = find(X(i)<Q(i), 1);
end

Use the second output of max, which acts as a sort of "vectorized find":
[~, Y] = max(bsxfun(#lt, X(:).', Q(:)), [], 1);
How this works:
For each element of X, test if it is less than each element of Q. This is done with bsxfun(#lt, X(:).', Q(:)). Note each column in the result corresponds to an element of X, and each row to an element of Q.
Then, for each element of X, get the index of the first element of Q for which that comparison is true. This is done with [~, Y] = max(..., [], 1). Note that the second output of max returns the index of the first maximizer (along the specified dimension), so in this case it gives the index of the first true in each column.
For your example values,
Q = [0.1, 0.3, 0.6, 1.0];
X = [0.11, 0.72, 0.32, 0.94];
[~, Y] = max(bsxfun(#lt, X(:).', Q(:)), [], 1);
gives
Y =
2 4 3 4

Using bsxfun will help accomplish this. You'll need to read about it. I also added a Q = 0 at the beginning to handle the small X case
X = [0.11,0.72,0.32,0.94 0.01];
Q = [0.1,0.3,0.6,1.0];
Q_extra = [0 Q];
Diff = bsxfun(#minus,X(:)',Q_extra (:)); %vectorized subtraction
logical_matrix = diff(Diff < 0); %find the transition from neg to positive
[X_categories,~] = find(logical_matrix == true); % get indices
% output is 2 4 3 4 1
EDIT: How long does each method take?
I got curious about the difference between each solution:
Test Code Below:
Q = [0,0.1,0.3,0.6,1.0];
X = rand(1,1e3);
tic
Y = zeros(length(X),1);
for i = 1:1:length(X)
Y(i) = find(X(i)<Q, 1);
end
toc
tic
result = arrayfun(#(x)find(x < Q, 1), X);
toc
tic
Q = [0 Q];
Diff = bsxfun(#minus,X(:)',Q(:)); %vectorized subtraction
logical_matrix = diff(Diff < 0); %find the transition from neg to positive
[X_categories,~] = find(logical_matrix == true); % get indices
toc
Run it for yourself, I found that when the size of X was 1e6, bsxfun was much faster, while for smaller arrays the differences were varying and negligible.
SAMPLE: when size X was 1e3
Elapsed time is 0.001582 seconds. % for loop
Elapsed time is 0.007324 seconds. % anonymous function
Elapsed time is 0.000785 seconds. % bsxfun

Octave has a function lookup to do exactly that. It takes a lookup table of sorted values and an array, and returns an array with indices for values in the lookup table.
octave> Q = [0.1 0.3 0.6 1.0];
octave> x = [0.11 0.72 0.32 0.94];
octave> lookup (Q, X)
ans =
1 3 2 3
The only issue is that your lookup table has an implicit zero which be fixed easily with:
octave> lookup ([0 Q], X) # alternatively, just add 1 at the results
ans =
2 4 3 4

You can create an anonymous function to perform the comparison, then apply it to each member of X using arrayfun:
compareFunc = #(x)find(x < Q, 1);
result = arrayfun(compareFunc, X, 'UniformOutput', 1);
The Q array will be stored in the anonymous function ( compareFunc ) when the anonymous function is created.
Or, as one line (Uniform Output is the default behavior of arrayfun):
result = arrayfun(#(x)find(x < Q, 1), X);

Octave does a neat auto-vectorization trick for you if the vectors you have are along different dimensions. If you make Q a column vector, you can do this:
X = [0.11, 0.72, 0.32, 0.94];
Q = [0.1; 0.3; 0.6; 1.0; 2.0; 3.0];
X <= Q
The result is a 6x4 matrix indicating which elements of Q each element of X is less than. I made Q a different length than X just to illustrate this:
0 0 0 0
1 0 0 0
1 0 1 0
1 1 1 1
1 1 1 1
1 1 1 1
Going back to the original example you have, you can do
length(Q) - sum(X <= Q) + 1
to get
2 4 3 4
Notice that I have semicolons instead of commas in the definition of Q. If you want to make it a column vector after defining it, do something like this instead:
length(Q) - sum(X <= Q') + 1
The reason that this works is that Octave implicitly applies bsxfun to an operation on a row and column vector. MATLAB will not do this until R2016b according to #excaza's comment, so in MATLAB you can do this:
length(Q) - sum(bsxfun(#le, X, Q)) + 1
You can play around with this example in IDEOne here.

Inspired by the solution posted by #Mad Physicist, here is my solution.
Q = [0.1,0.3,0.6,1.0]
X = [0.11,0.72,0.32,0.94]
Temp = repmat(X',1,4)<repmat(Q,4,1)
[~, ind]= max( Temp~=0, [], 2 );
The idea is that make the X and Q into the "same shape", then use element wise comparison, then we obtain a logical matrix whose row tells whether a given element in X is less than each of the element in Q, then return the first non-zero index of each row of this logical matrix. I haven't tested how fast this method is comparing to other methods

How to zero out the centre k by k matrix in an input matrix with odd number of columns and rows

I am trying to solve this problem:
Write a function called cancel_middle that takes A, an n-by-m
matrix, as an input where both n and m are odd numbers and k, a positive
odd integer that is smaller than both m and n (the function does not have to
check the input). The function returns the input matrix with its center k-by-k
matrix zeroed out.
Check out the following run:
>> cancel_middle(ones(5),3)
ans =
1 1 1 1 1
1 0 0 0 1
1 0 0 0 1
1 0 0 0 1
1 1 1 1 1
My code works only when k=3. How can I generalize it for all odd values of k? Here's what I have so far:
function test(n,m,k)
A = ones(n,m);
B = zeros(k);
A((end+1)/2,(end+1)/2)=B((end+1)/2,(end+1)/2);
A(((end+1)/2)-1,((end+1)/2)-1)= B(1,1);
A(((end+1)/2)-1,((end+1)/2))= B(1,2);
A(((end+1)/2)-1,((end+1)/2)+1)= B(1,3);
A(((end+1)/2),((end+1)/2)-1)= B(2,1);
A(((end+1)/2),((end+1)/2)+1)= B(2,3);
A(((end+1)/2)+1,((end+1)/2)-1)= B(3,1);
A(((end+1)/2)+1,((end+1)/2))= B(3,2);
A((end+1)/2+1,(end+1)/2+1)=B(3,3)
end

You can simplify your code. Please have a look at
Matrix Indexing in MATLAB. "one or both of the row and column subscripts can be vectors", i.e. you can define a submatrix. Then you simply need to do the indexing correct: as you have odd numbers just subtract m-k and n-k and you have the number of elements left from your old matrix A. If you divide it by 2 you get the padding on the left/right, top/bottom. And another +1/-1 because of Matlab indexing.
% Generate test data
n = 13;
m = 11;
A = reshape( 1:m*n, n, m )
k = 3;
% Do the calculations
start_row = (n-k)/2 + 1
start_col = (m-k)/2 + 1
A( start_row:start_row+k-1, start_col:start_col+k-1 ) = zeros( k )

function b = cancel_middle(a,k)
[n,m] = size(a);
start_row = (n-k)/2 + 1;
start_column = (m-k)/2 + 1;
end_row = (n-k)/2 + k;
end_column = (m-k)/2 + k;
a(start_row:end_row,start_column:end_column) = 0;
b = a;
end
I have made a function in an m file called cancel_middle and it basically converts the central k by k matrix as a zero matrix with the same dimensions i.e. k by k.
the rest of the matrix remains the same. It is a general function and you'll need to give 2 inputs i.e the matrix you want to convert and the order of submatrix, which is k.

Gcd of polynomials modulo k

I want to ask Matlab to tell me, for example, the greatest common divisor of polynomials of x^4+x^3+2x+2 and x^3+x^2+x+1 over fields like Z_3[x] (where an answer is x+1) and Z_5[x] (where an answer is x^2-x+2).
Any ideas how I would implement this?

Here's a simple implementation. The polynomials are encoded as arrays of coefficients, starting from the lowest degree: so, x^4+x^3+2x+2 is [2 2 0 1 1]. The function takes two polynomials p, q and the modulus k (which should be prime for the algorithm to work property).
Examples:
gcdpolyff([2 2 0 1 1], [1 1 1 1], 3) returns [1 1] meaning 1+x.
gcdpolyff([2 2 0 1 1], [1 1 1 1], 5) returns [1 3 2] meaning 1+3x+2x^2; this disagrees with your answer but I hand-checked and it seems that yours is wrong.
The function first pads arrays to be of the same length. As long as they are not equal, is identifies the higher-degree polynomial and subtracts from it the lower-degree polynomial multiplied by an appropriate power of x. That's all.
function g = gcdpolyff(p, q, k)
p = [p, zeros(1, numel(q)-numel(p))];
q = [q, zeros(1, numel(p)-numel(q))];
while nnz(mod(p-q,k))>0
dp = find(p,1,'last');
dq = find(q,1,'last');
if (dp>=dq)
p(dp-dq+1:dp) = mod(p(1+dp-dq:dp) - q(1:dq), k);
else
q(dq-dp+1:dq) = mod(q(dq-dp+1:dq) - p(1:dp), k);
end
end
g = p(1:find(p,1,'last'));
end
The names of the variables dp and dq are slightly misleading: they are not degrees of p and q, but rather degrees + 1.

Matlab convert columnar data into ndarray

Is there a simple (ideally without multiple for loops) way to group a vector of values according to a set of categories in Matlab?
I have data matrix in the form
CATEG_A CATEG_B CATEG_C ... VALUE
1 1 1 ... 0.64
1 2 1 ... 0.86
1 1 1 ... 0.74
1 1 2 ... 0.56
...
etc.
and what I want is an N-dimensional array
all_VALUE( CATEG_A, CATEG_B, CATEG_C, ..., index ) = VALUE_i
of course there may be any number of values with the same category combination, so size(end) would be the number of value in the biggest category -- and the remaining items would be padded with nan.
Alternatively I'd be happy with
all_VALUE { CATEG_A, CATEG_B, CATEG_C, ... } ( index )
i.e. a cell array of vectors. I suppose it's a bit like creating a pivot table, but with n-dimensions, and not computing the mean.
I found this function in the help
A = accumarray(subs,val,[],#(x) {x})
but I couldn't fathom how to make it do what I wanted!

This is also a mess, but works. It goes the ND-array way.
X = [1 1 1 0.64
1 2 1 0.86
1 1 1 0.74
1 1 2 0.56]; %// data
N = size(X,1); %// number of values
[~, ~, label] = unique(X(:,1:end-1),'rows'); %// unique labels for indices
cumLabel = cumsum(sparse(1:N, label, 1),1); %// used for generating a cumulative count
%// for each label. The trick here is to separate each label in a different column
lastInd = full(cumLabel((1:N).'+(label-1)*N)); %'// pick appropriate values from
%// cumLabel to generate the cumulative count, which will be used as last index
%// for the result array
sizeY = [max(X(:,1:end-1),[],1) max(lastInd)]; %// size of result
Y = NaN(sizeY); %// initiallize result with NaNs
ind = mat2cell([X(:,1:end-1) lastInd], ones(1,N)); %// needed for comma-separated list
Y(sub2ind(sizeY, ind{:})) = X(:,end); %// linear indexing of values into Y
The result in your example is the following 4D array:
>> Y
Y(:,:,1,1) =
0.6400 0.8600
Y(:,:,2,1) =
0.5600 NaN
Y(:,:,1,2) =
0.7400 NaN
Y(:,:,2,2) =
NaN NaN

It's a mess but here is one solution
[U,~,subs] = unique(X(:,1:end-1),'rows');
sz = max(U);
Uc = mat2cell(U, size(U,1), ones(1,size(U,2)));
%// Uc is converted to cell matrices so that we can take advantage of the {:} notation which returns a comma-separated-list which allows us to pass a dynamic number of arguments to functions like sub2ind
I = sub2ind(sz, Uc{:});
G = accumarray(subs, X(:,end),[],#(x){x});
A{prod(max(U))} = []; %// Pre-assign the correct number of cells to A so we can reshape later
A(I) = G;
reshape(A, sz)
On your example data (ignoring the ...s) this returns:
A(:,:,1) =
[2x1 double] [0.8600]
A(:,:,2) =
[0.5600] []
where A(1,1,1) is [0.74; 0.64]

Random binary matrix with two non-trivial constraints

I need to generate a random matrix of K columns and N rows containing ones and zeroes, such that:
a) Each row contains exactly k ones.
b) Each row is different from the other (combinatorics imposes that if N > nchoosek(K, k) there will be nchoosek(K,k) rows).
Assume I want N = 10000 (out of all the possible nchoosek(K, k) = 27405 combinations), different 1×K vectors (with K = 30) containing k (with k = 4) ones and K - k zeroes.
This code:
clear all; close
N=10000; K=30; k=4;
M=randi([0 1],N,K);
plot(sum(M,2)) % condition a) not satisfied
does not satisfy neither a) nor b).
This code:
clear all; close;
N=10000;
NN=N; K=30; k=4;
tempM=zeros(NN,K);
for ii=1:NN
ttmodel=tempM(ii,:);
ttmodel(randsample(K,k,false))=1; %satisfies condition a)
tempM(ii,:)=ttmodel;
end
Check=bi2de(tempM); %from binary to decimal
[tresh1,ind,tresh2] = unique(Check);%drop the vectors that appear more than once in the matrix
M=tempM(ind,:); %and satisfies condition b)
plot(sum(M,2)) %verify that condition a) is satisfied
%Effective draws, Wanted draws, Number of possible combinations to draw from
[sum(sum(M,2)==k) N nchoosek(K,k) ]
satisfies condition a) and partially condition b). I say partially because unless NN>>N the final matrix will contain less than N rows each different from each other.
Is there a better and faster way (that possible avoids the for cycle and the need of having NN>>N) to solve the problem?

First, generate N unique k-long permutations of the positions of ones:
cols = randperm(K, N);
cols = cols(:, 1:k);
Then generate the matching row indices:
rows = meshgrid(1:N, 1:k)';
and finally create the sparse matrix with:
A = sparse(rows, cols, 1, N, K);
To obtain the full form of the matrix, use full(A).
Example
K = 10;
k = 4;
N = 5;
cols = randperm(K, N);
cols = cols(:, 1:k);
rows = meshgrid(1:N, 1:k)';
A = sparse(rows, cols , 1, N, K);
full(A)
The result I got is:
ans =
1 1 0 0 0 0 0 1 0 1
0 0 1 1 0 1 0 0 0 1
0 0 0 1 1 0 1 0 1 0
0 1 0 0 0 0 1 0 1 1
1 1 1 0 0 1 0 0 0 0
This computation should be pretty fast even for large values of K and N. For K = 30, k = 4, N = 10000 the result was obtained in less than 0.01 seconds.

You could use randperm(n) to generate random sequences of integers from 1 to n, and store the nonrepeated sequences as rows in a matrix M until size(unique(M,'rows'),1)==size(M,1). Then you could use M to index a logical matrix with the appropriate number of true values in each row.

If you have enough memory for nchoosek(K,k) integers, build an array of those, use a partial Fisher-Yates shuffle to get a proper uniformly random subset of N of those. Now, given the array of N integers, interpret each as the rank of the combination representing each row of your final array. If you use colexicographical ordering of combinations, computing the combination from a rank is pretty simple (though it uses lots of binomial combination functions, so it pays to have a fast one).
I'm not a Matlab guy, but I've done things similar to this in C. This code, for example:
for (i = k; i >= 1; --i) {
while ((b = binomial(n, i)) > r) --n;
buf[i-1] = n;
r -= b;
}
will fill the array buf[] with indices from 0 to n-1 for the rth combination of k out of n elements in colex order. You would interpret these as the positions of the 1s in your row.