Sample specific rows in MATLAB

I have a scenario in which there is a Label matrix of dimension N x 1.
Example entries in the label matrix are given below:
Label = [1; 3; 5; ....... 6]
I would like to randomly sample 'm1' records of label 1, 'm2' records of label 2, etc., so that the output LabelIndicatorMatrix (N x 1) looks something like
LabelIndicatorMatrix = [1; 1; 0;.....1]
where 1 means the record has been chosen and 0 means the record was not chosen during sampling. The output matrix satisfies the following condition:
sum(LabelIndicatorMatrix) = m1 + m2 + ... + m6

One possible solution:
Label = randi([1 6], [100 1]); %# random Nx1 vector of labels
m = [2 3 1 0 1 2]; %# number of records to sample from each category
LabelIndicatorMatrix = false(size(Label)); %# marks selected records
uniqL = unique(Label); %# unique labels k present in Label (here 1,...,6)
for i=1:numel(uniqL)
idx = find(Label == uniqL(i)); %# indices where label==k
ord = randperm(length(idx)); %# random permutation
ord = ord(1:min(m(uniqL(i)),end)); %# pick first m_k, with k = uniqL(i)
LabelIndicatorMatrix( idx(ord) ) = true; %# mark them as selected
end
To make sure we satisfy the requirements, we check the total (this equality holds as long as every label k has at least m(k) records):
>> sum(LabelIndicatorMatrix) == sum(m)
ans =
1
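The check above only compares totals. A stricter, per-label check can be sketched with accumarray: count how many selected records carry each label and compare that with min(m(k), number of records with label k). The small Label, m and LabelIndicatorMatrix values below are hypothetical, hand-picked so the expected counts are easy to verify by eye.

```matlab
% Hypothetical small example: labels 1..3 and per-label quotas m
Label = [1; 3; 3; 2; 1; 3];
m = [1 0 2];
% One valid selection: record 1 (label 1), records 3 and 6 (label 3)
LabelIndicatorMatrix = logical([1; 0; 1; 0; 0; 1]);

% Number of selected records per label, padded to numel(m) entries
picked = accumarray(Label(LabelIndicatorMatrix), 1, [numel(m) 1]);
% Number of available records per label
available = accumarray(Label, 1, [numel(m) 1]);

% Each label must contribute exactly min(m(k), available(k)) records
assert(isequal(picked, min(m(:), available)))
```

With real data, replace the hand-made LabelIndicatorMatrix by the one produced by the loop.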
Here is my attempt at a vectorized solution:
Label = randi([1 6], [100 1]); %# random Nx1 vector of labels
m = [2 3 1 0 1 2]; %# number of records to sample from each category
%# some helper functions
firstN = @(V,n) V(1:min(n,end)); %# first n elements from vector
pickN = @(V,n) firstN(V(randperm(length(V))), n); %# pick n elements from vector
%# randomly sample labels, and get indices
u = unique(Label); %# unique labels present in Label
idx = bsxfun(@eq, Label, u'); %# idx(:,k) indicates where Label == u(k)
[r c] = find(idx); %# row/column indices
idx = arrayfun(@(k) pickN(r(c==k),m(u(k))), 1:size(idx,2), ...
'UniformOutput',false); %# sample m(u(k)) records from Label == u(k)
%# mark selected records
LabelIndicatorMatrix = false(size(Label));
LabelIndicatorMatrix( vertcat(idx{:}) ) = true;
%# check results are correct
assert( sum(LabelIndicatorMatrix)==sum(m) )

You could start with this little sample of code. It draws random record indices (with replacement, and from all labels) and finds which records of your label vector have been selected at least once:
Label = [1; 3; 5; ....... 6];
N = numel(Label);
index = randi(N,m1,1);
index = unique(index);
LabelIndicatorMatrix = zeros(N,1);
LabelIndicatorMatrix(index)=1;
Because duplicates are removed by unique, fewer than m1 records may end up selected. That said, I am not sure I understand the final condition on the LabelIndicatorMatrix.
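If the goal is to draw m1 records that all carry label 1, without replacement, one way is to restrict the candidates with find and draw with the two-input form randperm(n, k), which very old MATLAB releases lack. This is a sketch under assumptions: the random Label vector and m1 = 3 are hypothetical stand-ins for the elided data.

```matlab
Label = randi([1 6], [100 1]);   % hypothetical stand-in for the elided Label vector
N = numel(Label);
m1 = 3;                          % hypothetical number of label-1 records to draw

cand = find(Label == 1);                  % records whose label is 1
k = min(m1, numel(cand));                 % cannot draw more records than exist
pick = cand(randperm(numel(cand), k));    % k distinct candidates, no replacement

LabelIndicatorMatrix = zeros(N, 1);
LabelIndicatorMatrix(pick) = 1;
```

Repeating this per label (with m2, m3, ...) and OR-ing the results gives the per-label sampling described in the question.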

Related

Read data rows one by one without specifying a range

Can I read each record of a dataset without specifying a range, i.e. without writing for i=1:n?
For example :
A = [4 2;
2 4;
2 3;
3 6;
4 4];
I want to read/get rows from A one by one, A(1,:) to A(5,:), and stop reading when the last record is found: A(5,:).
Thanks.
So you don't want to specify some maximum length?
To get the number of rows in a MATLAB matrix, you can use any of these methods:
n = size(A, 1); % Size in dimension 1 (rows)
% or
n = length(A); % Length of largest array dimension, so only correct when rows >= columns
% or
n = numel(A(:,1)); % Gets number of elements (numel) in column 1 of A
Then loop like so
for k = 1:size(A,1)
temp = A(k, :); % Do something with row k
end
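MATLAB can also loop over rows without writing an explicit range: a for loop over a matrix assigns one column per iteration, so looping over the transpose A.' visits the rows of A in order and stops after the last one. The rows and rowSums variables below are only illustrative.

```matlab
A = [4 2;
     2 4;
     2 3;
     3 6;
     4 4];

rows = zeros(0, size(A, 2));     % collects each visited row (illustration only)
rowSums = zeros(1, 0);
for col = A.'                    % each iteration receives one column of A.'
    row = col.';                 % transpose back to a 1-by-2 row of A
    rows(end+1, :) = row;        %#ok<AGROW>
    rowSums(end+1) = sum(row);   %#ok<AGROW>
end
```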

Matlab: How to read data into a matrix

I have a data file matrix.txt, it has three columns. The first column stores the row index, the second column stores the column index, the third column stores the value. How do I read these into a matrix called mat. To be explicit, suppose our mat is a n*n square matrix, let n=2 for instance. In the text file, it has:
0 0 10
1 1 -10
The elements of mat that are not specified are 0. Thus mat is supposed to be:
mat =
10 0
0 -10
How do I achieve this?
This should work for the generic 2-D case.
% Read in matrix specification
fID = fopen('matrix.txt');
tmp = fscanf(fID, '%u%u%f', [3 inf])';
fclose(fID);
% Use the maximum row and column subscripts to obtain the matrix size
tmp(:, 1:2) = tmp(:, 1:2) + 1; % MATLAB doesn't use 0-based indexing
matsize = [max(tmp(:,1)), max(tmp(:,2))];
% Convert subscripts to linear indices
lidx = sub2ind(matsize, tmp(:,1), tmp(:,2));
mat = zeros(matsize); % Initialize matrix
mat(lidx) = tmp(:,3); % Assign data
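To see what sub2ind computes, here is the same conversion applied to the three-line sample file used below, with the data hard-coded instead of read with fscanf (only to keep the snippet self-contained). The linear index of subscript (r, c) in a matrix with nrows rows is (c-1)*nrows + r, because MATLAB stores matrices column by column.

```matlab
% Sample triplets (0-based row, 0-based column, value), hard-coded for illustration
tmp = [0 0 10;
       1 1 -10;
       1 2 20];
tmp(:, 1:2) = tmp(:, 1:2) + 1;                    % convert to 1-based subscripts
matsize = [max(tmp(:,1)), max(tmp(:,2))];         % [2 3]

lidx = sub2ind(matsize, tmp(:,1), tmp(:,2));      % column-major linear indices
manual = (tmp(:,2) - 1) * matsize(1) + tmp(:,1);  % same formula written out

mat = zeros(matsize);
mat(lidx) = tmp(:,3);
```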
Using a sample matrix.txt:
0 0 10
1 1 -10
1 2 20
We receive:
>> mat
mat =
10 0 0
0 -10 20
Since indices in MATLAB begin with 1 (not zero), we add 1 to the indices read from the file.
r and c stand for row and column.
m and n are the largest (0-based) row and column indices, so B is an (m+1)-by-(n+1) zero matrix.
A = importdata('matrix.txt');
r = A(:, 1)';
c = A(:, 2)';
m = max(r);
n = max(c);
B = zeros(m + 1, n + 1);
for k = 1:size(A,1);
B(r(k) + 1, c(k) + 1) = A(k, 3);
end
Result:
B =
10 0
0 -10
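The approaches above infer the size from the largest index in the file, so if the last row or column of the intended n-by-n matrix holds only zeros, the result comes out too small. When n is known in advance, it can be passed explicitly. A sketch with a hypothetical n = 3 and the question's two data lines hard-coded:

```matlab
n = 3;                          % hypothetical known matrix size
A = [0 0 10;                    % question's data lines, hard-coded
     1 1 -10];
B = zeros(n);                   % n-by-n, regardless of the largest index in A
B(sub2ind([n n], A(:,1) + 1, A(:,2) + 1)) = A(:,3);
```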
I see I was too slow, but I decided to post my answer anyway...
I initialized matrix A as a vector, and used reshape:
%Load all file to matrix at once
%You may consider using fopen and fscanf, in case matrix.txt is not ordered perfectly.
row_column_val = load('matrix.txt', '-ascii');
R = row_column_val(:, 1) + 1; %Get vector of row indices (add 1 to convert to MATLAB indices).
C = row_column_val(:, 2) + 1; %Get vector of column indices (add 1 to convert to MATLAB indices).
V = row_column_val(:, 3); %Get vector of values.
nrows = max(R); %Number of rows in matrix.
ncols = max(C); %Number of columns in matrix.
A = zeros(nrows*ncols, 1); %Initialize A as a vector instead of a matrix (length of A is nrows*ncols).
%Put value v at linear index (c-1)*nrows + r for all elements of V, C and R.
%The formula is for column-major order (MATLAB stores matrices in column-major format).
A((C-1)*nrows + R) = V;
A = reshape(A, [nrows, ncols]);
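A more compact alternative (a different technique from the answers above, not part of them) is to let accumarray or sparse build the matrix directly from the subscript/value triplets. The sample data is hard-coded here in place of load. One behavioral difference: both functions add up values that share the same subscripts, whereas the indexed assignments above keep the last one.

```matlab
row_column_val = [0 0 10;       % sample file contents, hard-coded for illustration
                  1 1 -10;
                  1 2 20];
R = row_column_val(:, 1) + 1;   % 1-based row subscripts
C = row_column_val(:, 2) + 1;   % 1-based column subscripts
V = row_column_val(:, 3);

A1 = accumarray([R C], V);      % size defaults to [max(R) max(C)]
A2 = full(sparse(R, C, V));     % same result via a sparse intermediate
```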

How to do this kind of sorting (in MATLAB)?

I have a square matrix and I want to get an n-by-2 matrix holding the indices of the matrix elements in sorted order. For example, I want to get from this matrix
0 0 0
1 0 0
2 3 0
something like this
[3 2; 3 1; 2 1] ....
(3,2) being the indices of the biggest element in the matrix, (3,1) the second biggest and so on. It would be good if it could ignore zeros (or NaNs instead of zeros).
Additional information about the matrix: it is positive, but not necessarily 3 by 3; the diagonal elements and every element above the diagonal are either 0 or NaN (a side question: which is processed faster, NaNs or 0s?)
This considers only non-zero elements (find treats NaN as nonzero, so replace NaNs with 0 first if you use them):
[ii, jj, aa] = find(A);
[~, kk] = sort(aa, 'descend');
result = [ii(kk) jj(kk)];
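Applied to the example matrix from the question, this gives exactly the requested order. The isnan line is an addition for the NaN variant mentioned in the question: since find treats NaN as nonzero, NaNs are set to 0 first.

```matlab
A = [0 0 0;
     1 0 0;
     2 3 0];
A(isnan(A)) = 0;                   % treat NaN placeholders like zeros (no-op here)

[ii, jj, aa] = find(A);            % subscripts and values of the nonzero elements
[~, kk] = sort(aa, 'descend');     % order of values, largest first
result = [ii(kk) jj(kk)];          % [3 2; 3 1; 2 1]
```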
Assuming your matrix is in A, you can use the ind2sub function (edited to remove zero indices):
[Ap, i] = sort(A(:), 'descend');
[r,c] = ind2sub(size(A), i);
orderedPairs = [r,c];
orderedPairsSansZeros = orderedPairs(Ap ~= 0, :);
The following should work. The matrix sortix is what you're looking for. I've replaced the zero in your (1,3) element with NaN so you can see that NaNs don't show up in your final ordered matrix.
matrix = [0, 0, NaN;
1, 3, 0;
2, 3, 0];
new_matrix = matrix;
%new_matrix(new_matrix(:)==0) = NaN; % uncomment to get rid of zeros
saveix = 1;
for i=1:length(matrix(:))
[maxVal, maxIndex] = max(new_matrix(:));
allMax = ismember(new_matrix, maxVal);
idx = find(allMax);
for ix=1:length(idx)
[sortix(saveix, 1), sortix(saveix, 2)] = ind2sub(size(matrix), ...
idx(ix));
saveix = saveix + 1;
end
new_matrix(idx) = NaN;
end

How to index a matrix with the column maxima of other matrix

I have 2 matrices A and B.
I find the max values in the columns of A, and keep their indices in I. So far so good.
Now, I need to choose the elements of B with the same indices as stored in I. I don't know how to do this.
See below:
A = [1,2,3; 0,8,9]
B = [0,1,2; 4,2,3]
[~,I] = max(A)
h = B(I)
I need to get these values of B:
h = [0 2 3]
But the code results in a different one. How can I fix it?
A =
1 2 3
0 8 9
B =
0 1 2
4 2 3
I =
1 2 2
h =
0 4 4
Thanks in advance
The max function, as you used it, works like this:
If A is a matrix, then max(A) is a row vector containing the maximum value of each column.
So M = max(A) is equivalent to M = max(A,[],1), but it is safer to pass the dimension argument explicitly if you're not sure.
If you use max to find the maxima in the columns of the matrix, it returns the row indices. The column indices are for your case simply 1:size(A,2) = [1 2 3].
Now you need to convert your row and column indices to linear indices with sub2ind:
%// data
A = [1,2,3; 0,8,9]
B = [0,1,2; 4,2,3]
%// find maxima of each column in A
[~, I] = max( A, [], 1 ) %// returns row indices
%// get linear indices for both, row indices and column indices
I = sub2ind( size(A), I, 1:size(A,2) )
%// index B
h = B(I)
returns:
h =
0 2 3
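The same linear indices can also be computed without sub2ind, using the column-major formula directly: the element in row I(j) of column j has linear index I(j) + (j-1)*size(A,1). A minimal sketch on the question's data:

```matlab
A = [1,2,3; 0,8,9];
B = [0,1,2; 4,2,3];

[~, I] = max(A, [], 1);                   % row index of each column maximum
lin = I + (0:size(A,2)-1) * size(A,1);    % column-major linear indices
h = B(lin);
```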

matlab remove for loop in matrix computation

I'm working on a matrix problem in MATLAB. I think my code could be improved by removing the for loop, but I really don't know how to do it. Can anyone help me, please?
The code is:
K = 3;
X = [1 2; 3 4; 5 6; 7 8];
idx = [1;2;3;1];
for i = 1:K
ids = (idx == i);
centroids(i,:) = sum(bsxfun(@times, X, ids))./ sum(ids);
end
In this code, the data matrix X is 4x2. There are K=3 centroids, so centroids is a 3x2 matrix. This code is part of a k-means function, which uses the data points and their closest centroids to compute the new centroid positions.
I want to make the code as something without the FOR loop, maybe beginning like this:
ids = bsxfun(@eq, idx, 1:K);
centroids = ..............
You can avoid bsxfun by using logical indexing; this seems to give a worthwhile performance increase, at least for small matrices X. It is best for small K and for a small number of rows of X.
K = 3;
X = [1 2; 3 4; 5 6; 7 8];
idx = [1;2;3;1];
centroids=zeros(K,2);
for i = 1:K
ids = (idx == i);
centroids(i,:) = sum(X(ids,:),1)./sum(ids);
end
If X has a large number of rows, this method is fastest:
K = 3;
X = [1 2; 3 4; 5 6; 7 8];
idx = [1;2;3;1];
centroids=zeros(K,2);
t=bsxfun(@eq,idx,1:K);
centroids=bsxfun(@rdivide,t.'*X,sum(t).');
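In MATLAB R2016b and later (and in Octave), arithmetic and relational operators expand singleton dimensions implicitly, so the bsxfun calls can be written with plain operators. The sketch below uses the question's sample data; the double conversion of the membership matrix only makes the matrix product explicit.

```matlab
K = 3;
X = [1 2; 3 4; 5 6; 7 8];
idx = [1; 2; 3; 1];

t = double(idx == 1:K);                  % 4-by-K one-hot membership (implicit expansion)
centroids = (t.' * X) ./ sum(t, 1).';    % per-cluster sums divided by cluster sizes
```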
And if K is very large, Luis' accumarray method is fastest.
You could apply accumarray. Note that accumarray only accepts a vector of values (one column of X at a time). If X has two columns, you can call accumarray twice:
centroids(:,1) = accumarray(idx, X(:,1), [], @mean)
centroids(:,2) = accumarray(idx, X(:,2), [], @mean)
Alternatively, if X contains two columns of real numbers, you can use complex to "pack" the two columns into one complex column, and then unpack the results:
centroids = accumarray(idx, complex(X(:,1),X(:,2)), [], @mean);
centroids = [ real(centroids) imag(centroids)];
If X has an arbitrary number of columns, possibly with complex numbers, you can loop over columns:
centroids = NaN(K, size(X,2)); %// preallocate (empty clusters stay NaN)
for col = 1:size(X,2)
centroids(:,col) = accumarray(idx, X(:,col), [K 1], @mean, NaN);
end
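The column loop can also be folded into a single accumarray call by giving it two-column subscripts (cluster index, column index), built here with ndgrid. This is a sketch on the question's data; the NaN fill value marks clusters that receive no points.

```matlab
K = 3;
X = [1 2; 3 4; 5 6; 7 8];
idx = [1; 2; 3; 1];

% rr(i,j) = idx(i) and cc(i,j) = j, aligned element-wise with X
[rr, cc] = ndgrid(idx, 1:size(X,2));
centroids = accumarray([rr(:) cc(:)], X(:), [K size(X,2)], @mean, NaN);
```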