Faster way to group data in the same quartile range - matlab

Consider a column of a 10 x 10 matrix K, say K(:,1)
I would like to create a 10x4 binary matrix which tells us which quarter range the row entry belongs to. For example
ith row of binary matrix : [ 1 0 0 0 ] => K(i,1)<prctile(K(:,1),25)
My code:
%%%
K = randi(10,10);
BINMAT = zeros(size(K,1),4);
y_1 = prctile(K(:,1),25) ;
ID_1 = find(K(:,1) < y_1);
BINMAT(ID_1,1)=1;
y_2 = prctile(K(:,1),50);
ID_2 = find(( K(:,1) > y_1 & K(:,1) < y_2 ));
BINMAT(ID_2,2)=1;
y_3 = prctile(K(:,1),75);
ID_3 = find(( K(:,1) > y_2 & K(:,1) < y_3 ));
BINMAT(ID_3,3)=1;
y_4 = prctile(K(:,1),100);
ID_4 = find((K(:,1) > y_3 & K(:,1) < y_4 ));
BINMAT(ID_4,4)=1;
%%%
If I have to do this not just for one column but for a set of columns, say A = [ 1 2 5 6], then BINMAT should have 16 columns (4 for each selected column of K). Is there a faster way to do this?

You can use a for loop that iterates over the desired column indexes given by A:
K = randi(10,10);
A = [1 2 5 6]; % columns in K to process
BINMAT = zeros(size(K,1), 4*length(A));
cnt = 0; % helper
for col_indx = A
y_1 = prctile(K(:,col_indx),25) ;
ID_1 = find(K(:,col_indx) < y_1);
BINMAT(ID_1, 4*cnt + 1) = 1;
y_2 = prctile(K(:,col_indx),50);
ID_2 = find(( K(:,col_indx) > y_1 & K(:,col_indx) < y_2 ));
BINMAT(ID_2, 4*cnt + 2)=1;
y_3 = prctile(K(:,col_indx),75);
ID_3 = find(( K(:,col_indx) > y_2 & K(:,col_indx) < y_3 ));
BINMAT(ID_3, 4*cnt + 3)=1;
y_4 = prctile(K(:,col_indx),100);
ID_4 = find((K(:,col_indx) > y_3 & K(:,col_indx) < y_4 ));
BINMAT(ID_4, 4*cnt + 4)=1;
cnt = cnt + 1;
end
I have noticed that many of the rows of BINMAT contain only zeros because the code you posted does not take values equal to y_1, y_2, y_3 and y_4 into account. I think you should use K(:,col_indx) >= y_1 ... and so on.
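One way to guarantee that every row lands in exactly one bin is to count how many quartile edges each value exceeds, instead of testing each range separately. The following is a minimal sketch under that assumption; `col`, `q` and `bin` are illustrative names that do not appear in the code above, and putting values equal to an edge into the lower bin is an arbitrary choice.

```matlab
col = (1:10).';                          % hypothetical example column; use K(:,col_indx) in practice
q = prctile(col, [25 50 75]);            % the three interior quartile edges
q = q(:).';                              % force a row vector so bsxfun expands against the column
bin = 1 + sum(bsxfun(@gt, col, q), 2);   % number of edges exceeded, plus one: a value in 1..4
BINMAT = zeros(numel(col), 4);
BINMAT(sub2ind(size(BINMAT), (1:numel(col)).', bin)) = 1;   % exactly one 1 per row
```

Because `bin` is always between 1 and 4, no row of BINMAT can stay all zeros, which is the problem described above.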

Another suggestion:
K = randi(10,10)
p = 25:25:100;
Y = prctile(K, p);
Y = [zeros(1, size(Y, 2)) ;Y];
BINMAT = zeros(size(K, 1), length(p), size(K, 2));
for j = 1:size(K, 2)
for i = 1:length(p)
BINMAT(Y(i, j) <= K(:,j) & K(:, j) <= Y(i+1, j), i, j) = 1;
end
end
Then, BINMAT(:, :, i) is the binary matrix, as you defined it, for K(:, i).

Percentile is, at its heart, the position of an element in the sorted list. So using sort directly will provide the most efficient solution, since you want multiple percentiles out of multiple columns.
First we need a way to assign fixed bins to the sorted positions. Here's the vector that I think prctile uses, but since 10 doesn't split evenly into 4 bins, it's somewhat arbitrary (in other words, do you assign element 3 to the 0-25% bin or the 25%-50% bin?):
floor(4*(0.5+(0:9).')/10)+1
Now we just need to sort each column, and assign the sort position of each original element to one of those positions. The second output of sort does most of the work:
K = randi(10,10);
A = [1 2 5 6]; % columns in K to process
BINMAT = zeros(size(K,1), 4*length(A));
bins = floor(4*(0.5+(0:9).')/10)+1;
[sortedK, idx] = sort(K(:,A));
% The k'th element of idx belongs to the bins(k) bin. So now generate the output.
% We need to offset to the correct block of BINMAT for each column
offset_bins = bsxfun(@plus, bins, 4*(0:length(A)-1));
BINMAT(sub2ind(size(BINMAT), idx, offset_bins)) = 1;
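The bin vector above is hard-coded for 10 rows. A possible generalisation, sketched here under the assumption that the same rounding rule is wanted for any row count `n` (the `magic(8)` data is only a hypothetical example), looks like this:

```matlab
K = magic(8);                                  % hypothetical 8 x 8 example data
A = [1 2 5 6];                                 % columns in K to process
n = size(K, 1);
bins = floor(4*((0:n-1).' + 0.5)/n) + 1;       % n x 1, bin (1..4) for each sorted position
[~, idx] = sort(K(:, A));                      % idx(k,c): original row of the k'th smallest value
offset_bins = bsxfun(@plus, bins, 4*(0:numel(A)-1));   % shift each column into its own block of 4
BINMAT = zeros(n, 4*numel(A));
BINMAT(sub2ind(size(BINMAT), idx, offset_bins)) = 1;
```

Every row of BINMAT then holds exactly one 1 per block of four columns, because each original row appears once in every column of `idx`.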

How can I do vectorization for this matlab "for loop"?

I have some matlab code as follow, constructing KNN similarity weight matrix.
[D,I] = pdist2(X, X, 'squaredeuclidean', 'Smallest', k+1);
D = D < threshold;
W = zeros(n, n);
for i=1:size(I,2)
W(I(:,i), i) = D(:,i);
W(i, I(:,i)) = D(:,i)';
end
I want to vectorize the for loop. I have tried
W(I) = D;
but failed to get the correct value.
I add test case here:
n = 5;
D = [
1 1 1 1 1
0 1 1 1 1
0 0 0 0 0
];
I = [
1 2 3 4 5
5 4 5 2 3
3 1 1 1 1
];
There are some undefined variables that make it hard to check what it is doing, but this should do the same as your for loop:
[D,I] = pdist2(X, X, 'squaredeuclidean', 'Smallest', k+1);
D = D < threshold;
W = zeros(n);
% set the diagonal values
W(sub2ind(size(W), I(1, :), I(1, :))) = D(1,:);
% set the other values
W(sub2ind(size(W), I(2, :), 1:size(I, 2))) = D(2, :);
W(sub2ind(size(W), 1:size(I, 2), I(2, :))) = D(2, :).';
I split the two directions; it now works with your test case.
A possible solution:
idx1 = reshape(1:n*n,n,n).';
idx2 = bsxfun(@plus,I,0:n:n*size(I,2)-1);
W=zeros(n,n);
W(idx2) = D;
W(idx1(idx2)) = D;
This assumes that you want to compute D and I repeatedly, so idx1 is computed only once and reused:
n = 5;
idx1 = reshape(1:n*n,n,n).';
%for k = 1 : 1000
%[D,I] = pdist2(X, X, 'squaredeuclidean', 'Smallest', k+1);
%D = D < threshold;
idx2 = bsxfun(@plus,I,0:n:n*size(I,2)-1);
W=zeros(n,n);
W(idx2) = D;
W(idx1(idx2)) = D;
%end
But if n isn't constant and varies in each iteration, it is better to change the way idx1 is computed:
n = 5;
%for k = 1 : 1000
%n = randi([2 10]);%n isn't constant
%[D,I] = pdist2(X, X, 'squaredeuclidean', 'Smallest', k+1);
%D = D < threshold;
idx1 = bsxfun(@plus,(0:n:n^2-1).',1:size(I,2));
idx2 = bsxfun(@plus,I,0:n:n*size(I,2)-1);
W=zeros(n,n);
W(idx2) = D;
W(idx1(idx2)) = D;
%end
You can cut some corners with linear indices, but if your matrices are big then you should only take the nonzero components of D. The following copies all values of D:
W = zeros(n);
W(reshape(sub2ind([n,n],I,[1;1;1]*[1:n]),1,[])) = reshape(D,1,[]);
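The vectorized approaches can be checked against the loop using the test case from the question. The sketch below is one way to do that; `cols`, `W_loop`, `W_vec` and `same` are illustrative names. It assumes that no position is written twice with different values; if that can happen, the loop and the vectorized form may resolve the conflict in a different order.

```matlab
n = 5;
D = [1 1 1 1 1; 0 1 1 1 1; 0 0 0 0 0];
I = [1 2 3 4 5; 5 4 5 2 3; 3 1 1 1 1];

% Reference: the loop from the question
W_loop = zeros(n);
for i = 1:size(I, 2)
    W_loop(I(:, i), i) = D(:, i);
    W_loop(i, I(:, i)) = D(:, i).';
end

% Vectorized: build the column index of every entry of I, then write both orientations
cols = repmat(1:size(I, 2), size(I, 1), 1);
W_vec = zeros(n);
W_vec(sub2ind([n n], I(:), cols(:))) = D(:);   % W(I(r,c), c) = D(r,c)
W_vec(sub2ind([n n], cols(:), I(:))) = D(:);   % W(c, I(r,c)) = D(r,c)

same = isequal(W_loop, W_vec);                 % true when both constructions agree
```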

ind2sub for nonzero elements of triangular matrix

I just wanted to simply find the index of (row, col) that is a minimum point of a matrix A. I can use
[minval, imin] = min( A(:) )
and the MATLAB built-in function
[irow, icol] = ind2sub(size(A), imin);
But for efficiency reasons, since matrix A is triangular, I wanted to implement the following function
function [i1, i2] = myind2ind(ii, N);
k = 1;
for i = 1:N
for j = i+1:N
I(k, 1) = i; I(k, 2) = j;
k = k + 1;
end
end
i1 = I(ii, 1);
i2 = I(ii, 2);
this function returns 8 and 31 for the following input
[irow, icol] = myind2ind(212, 31); % irow=8, icol = 31
How can I implement myind2ind more efficiently, without building the internal matrix "I"?
The I matrix can be generated by nchoosek.
For example if N = 5 we have:
N =5
I= nchoosek(1:N,2)
ans =
1 2
1 3
1 4
1 5
2 3
2 4
2 5
3 4
3 5
4 5
so that
4 repeated 1 times
3 repeated 2 times
2 repeated 3 times
1 repeated 4 times
We can get the number of rows of I with the Gauss formula for triangular number
(N-1) * (N-1+1) /2 =
N * (N -1) / 2 =
10
Given jj = size(I,1) + 1 - ii as a row index into I counted from the end of I, and using N * (N -1) / 2, we can formulate a quadratic equation:
N * (N -1) / 2 = jj
(N^2 -N)/2 =jj
So
N^2 -N - 2*jj = 0
Its positive root is (1 + sqrt(1 + 8*jj))/2. Dropping the 1 under the square root gives a slightly smaller value whose floor is the number of elements in the row that contains jj, even when jj is exactly a triangular number:
r = (1+sqrt(8*jj))/2
floor(r) can be subtracted from N to get the first element (row number of the triangular matrix) of the desired output.
R = N - floor(r);
For the column number we find the index idx_first of the first element of the current row R, counted from the end like jj:
idx_first=(floor(r+1) .* floor(r)) /2;
The column number can be found by subtracting the current index jj from idx_first and adding R + 1 to it.
Here is the implemented function:
function [R , C] = myind2ind(ii, N)
jj = N * (N - 1) / 2 + 1 - ii;
r = (1 + sqrt(8 * jj)) / 2;
R = N -floor(r);
idx_first = (floor(r + 1) .* floor(r)) / 2;
C = idx_first-jj + R + 1;
end
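As a sanity check, the same formulas can be applied to every valid index at once and compared with the table produced by nchoosek. This is only a sketch; it repeats the body of the function above in vectorized form, and `I_ref` and `matches` are illustrative names.

```matlab
N = 31;                                   % size used in the question's example
ii = (1:N*(N-1)/2).';                     % every valid linear index
jj = N*(N-1)/2 + 1 - ii;                  % the same index counted from the end
r = (1 + sqrt(8*jj))/2;
R = N - floor(r);                         % row numbers
idx_first = floor(r + 1).*floor(r)/2;     % first element of each row, counted from the end
C = idx_first - jj + R + 1;               % column numbers
I_ref = nchoosek(1:N, 2);                 % explicit table of all (row, col) pairs
matches = isequal([R C], I_ref);
```

For ii = 212 this reproduces the pair (8, 31) quoted in the question.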

How to oppositely order two vectors in Matlab?

I have the code below for oppositely ordering two vectors. It works, but I want the line
B_diff(i) = B(i) - B(i+1);
to hold not only in the form
B_diff(i) = B(i) - B(i+1); but also as
B_diff(i) = B(i) - B(i+k); where k can be any integer less than or equal to n. The same applies to "A". Any clues as to how I can achieve this in the program?
For example, I want to rearrange the first column of the matrix
A =
1 4
6 9
3 8
4 2
such that, the condition should hold true not only for
(a11-a12)(a21-a22)<=0;
but also for all
(a11-a13)(a21-a23)<=0;
(a11-a14)(a21-a24)<=0;
(a12-a13)(a22-a23)<=0;
(a12-a14)(a22-a24)<=0; and
(a13-a14)(a23-a24)<=0;
## MATLAB CODE ##
A = xlsread('column 1');
B = xlsread('column 2');
n = numel(A);
B_diff = zeros(n-1,1); %Vector to contain the differences between the elements of B
count_pos = 0; %To count the number of positive entries in B_diff
for i = 1:n-1
B_diff(i) = B(i) - B(i+1);
if B_diff(i) > 0
count_pos = count_pos + 1;
end
end
A_desc = sort(A,'descend'); %Sort the vector A in descending order
if count_pos > 0 %If B_diff contains positive entries, divide A_desc into two vectors
A_less = A_desc(count_pos+1:n);
A_great = sort(A_desc(1:count_pos),'ascend');
A_new = zeros(n,1); %To contain the sorted elements of A
else
A_new = A_desc; %This is then the sorted elements of A
end
if count_pos > 0
A_new(1) = A_less(1);
j = 2; %To keep track of the index for A_less
k = 1; %To keep track of the index for A_great
for i = 1:n-1
if B_diff(i) <= 0
A_new(i+1) = A_less(j);
j = j + 1;
else
A_new(i+1) = A_great(k);
k = k + 1;
end
end
end
A_diff = zeros(n-1,1);
for i = 1:n-1
A_diff(i) = A_new(i) - A_new(i+1);
end
diff = [A_diff B_diff]
prod = A_diff.*B_diff
The following code orders the first column of A opposite to the order of the second column.
A= [1 4; 6 9; 3 8; 4 2]; % sample matrix
[~,ix]=sort(A(:,2)); % ix is the sorting permutation of A(:,2)
inverse=zeros(size(ix));
inverse(ix) = numel(ix):-1:1; % the un-sorting permutation, reversed
B = sort(A(:,1)); % sort the first column
A(:,1)=B(inverse); % permute the first column according to inverse
Result:
A =
4 4
1 9
3 8
6 2
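To confirm that the reordered matrix satisfies the condition for every pair of rows, and not only for adjacent ones, the products of differences can be checked over all pairs. A minimal sketch, with `pairs`, `d1`, `d2` and `all_opposite` as illustrative names:

```matlab
A = [1 4; 6 9; 3 8; 4 2];                        % sample matrix from the question
[~, ix] = sort(A(:, 2));
inverse = zeros(size(ix));
inverse(ix) = numel(ix):-1:1;                    % reversed un-sorting permutation
B = sort(A(:, 1));
A(:, 1) = B(inverse);                            % first column now ordered opposite to the second

pairs = nchoosek(1:size(A, 1), 2);               % all row pairs (i, j) with i < j
d1 = A(pairs(:, 1), 1) - A(pairs(:, 2), 1);      % differences in the first column
d2 = A(pairs(:, 1), 2) - A(pairs(:, 2), 2);      % differences in the second column
all_opposite = all(d1 .* d2 <= 0);               % true if every pair is oppositely ordered
```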

Multiplying a vector by random numbers while keeping the sum the same (MATLAB)

I'm trying to multiply (element wise) a vector V of length N by a randomly generated number in the range (a,b), while keeping the sum of the vector equal to a total amount, E. I want to do this in MATLAB, but I'm not sure how. Getting random numbers between a certain range I know how to do:
minrand = 0;
maxrand = 1;
randfac = (maxrand-minrand).*rand(1,N) + minrand;
But yeah, beyond that I'm pretty clueless. I guess the random numbers can't really be generated like this, because if we call the random numbers the vector R, then I want that
R_1*V1 + R_2*V2 .... + R_N*V_N = E. So I guess it's a big equation. Is there any way to solve it, while putting constraints on the max and min values of R?
You can pick pairs of two elements (in all combinations) and add and subtract an equal random number.
% Make up a random vector
N=10;
randfac = 10*rand(1,N);
%OP Answer here: Given randfac with sum E re-randomize it
E = sum(randfac);
minrand = 0;
maxrand = 2;
disp(randfac)
% v = [6.4685 2.9652 6.6567 1.6153 7.3581 0.0237 7.1025
% 3.2381 1.9176 1.3561]
disp(sum(randfac))
% E = 38.7019
r = minrand + (maxrand-minrand)*rand(N*N,1);
k = 1;
for i=1:N
for j=1:N
randfac(i) = randfac(i)-r(k);
randfac(j) = randfac(j)+r(k);
k = k + 1;
end
end
disp(randfac)
% v = [5.4905 0.7051 4.7646 1.3479 9.3722 -1.4222 7.9275
% 7.5777 1.7549 1.1836]
disp(sum(randfac))
% E = 38.7019
Just divide the vector by its sum and multiply by the target E.
randfac = (maxrand-minrand).*rand(1,N) + minrand;
randfac = E*randfac/sum(randfac);
As long as the operation is linear, the result retains its randomness. Below is some sample code:
minrand = 0;
maxrand = 1;
N = 1000; %size
v = (maxrand-minrand).*rand(1,N) + minrand;
E = 100; %Target sum
A = sum(v);
randfac = (E/A)*v;
disp(sum(randfac))
% 100.0000
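The same rescaling idea can be applied to the weighted sum from the question, R_1*V_1 + ... + R_N*V_N = E. The sketch below assumes a positive vector V (a hypothetical example) and uses the illustrative names `a`, `b`, `R` and `total`. Note that after rescaling the factors are in general no longer confined to [a, b]; only their ratios to one another are preserved.

```matlab
N = 10;
V = 1:N;                          % hypothetical positive vector
E = 100;                          % target value of sum(R.*V)
a = 0.5; b = 1.5;                 % range of the raw random factors
R = (b - a).*rand(1, N) + a;      % raw factors in (a, b)
R = R * (E / sum(R .* V));        % one common scale factor fixes the weighted sum
total = sum(R .* V);              % equals E up to rounding error
```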
First of all, with random numbers in the interval [a b] you can't guarantee that the sum stays the same (same E). For example, if [a b]=[1 2], E will of course increase.
Here is an idea; I don't know how random it is!
For even N, I shuffle V, reshape it into two rows, and multiply the first row by random numbers in [a b]; the second row is then multiplied by factors chosen to keep the sum fixed.
N = 10;
V = randi(100,[1 N]);
E = sum(V);
idx = randperm(N);
Vr = V(idx);
[~,ridx] = sort(idx);
Vr = reshape(Vr,[2 N/2]);
a = 1;
b = 3;
r1 = (b - a).*rand(1,N/2) + a;
r2 = (sum(Vr) - r1.*Vr(1,:))./Vr(2,:);
r = reshape([r1;r2],1,[]);
r = r(ridx);
Enew = sum(V.*r);
The example results are,
V = [12 82 25 51 81 51 31 87 6 74];
r = [2.8018 0.7363 1.9281 0.5451 1.9387 -0.4909 1.3076 0.8904 2.9236 0.8440];
with E = 500 as well as Enew.
I'm simply assigning one random number to a pair (It can be considered as half random!).
Okay, I have found a way to somewhat do this, but it is not elegant and there are probably better solutions. Starting with an initial vector e, for which sum(e) = E, I can randomize its values and end up with an e for which sum(e) is in the range [(1-threshold)E, (1+threshold)E]. It is computationally expensive, and not pretty.
The idea is to first multiply e by random numbers in a certain range. Then, I will check what the sum is. If it is too big, I will decrease the value of the random numbers smaller than half of the range until the sum is no longer too big. If it is too small, I do the converse, and iterate until the sum is within the desired range.
e = somepredefinedvector
minrand = 0;
maxrand = 2;
randfac = (maxrand-minrand).*rand(1,N) + minrand;
e = randfac.*e;
threshold = 0.001;
while sum(e) < (1-threshold)*E || sum(e) > (1+threshold)*E
if sum(e) > (1+threshold)*E
for j = 1:N
if randfac(j) > (maxrand-minrand)/2
e(j) = e(j)/randfac(j);
randfac(j) = ((maxrand-minrand)/2-minrand).*rand(1,1) + minrand;
e(j) = randfac(j)*e(j);
end
if sum(e) > (1-threshold)*E && sum(e) < (1+threshold)*E
break
end
end
elseif sum(e) < (1-threshold)*E
for j = 1:N
if randfac(j) < (maxrand-minrand)/2
e(j) = e(j)/randfac(j);
randfac(j) = (maxrand-(maxrand-minrand)/2).*rand(1,1) + (maxrand-minrand)/2;
e(j) = randfac(j)*e(j);
end
if sum(e) > (1-threshold)*E && sum(e) < (1+threshold)*E
break
end
end
end
end

Subtracting each elements of a row vector , size (1 x n) from a matrix of size (m x n)

I have two matrices of big sizes, which are something similar to the following matrices.
m; with size 1000 by 10
n; with size 1 by 10.
I would like to subtract each element of n from all elements of m to get ten different matrices, each has size of 1000 by 10.
I started as follows
clc;clear;
nrow = 10000;
ncol = 10;
t = length(n)
for i = 1:nrow;
for j = 1:ncol;
for t = 1:length(n);
m1(i,j) = m(i,j)-n(1);
m2(i,j) = m(i,j)-n(2);
m3(i,j) = m(i,j)-n(3);
m4(i,j) = m(i,j)-n(4);
m5(i,j) = m(i,j)-n(5);
m6(i,j) = m(i,j)-n(6);
m7(i,j) = m(i,j)-n(7);
m8(i,j) = m(i,j)-n(8);
m9(i,j) = m(i,j)-n(9);
m10(i,j) = m(i,j)-n(10);
end
end
end
Can anyone help me do this without writing the ten equations inside the loop, or suggest a more convenient way, especially when the two matrices have many columns?
Why can't you just do this:
m01 = m - n(1);
...
m10 = m - n(10);
What do you need the loop for?
Even better:
N = length(n);
m2 = cell(N, 1);
for k = 1:N
m2{k} = m - n(k);
end
Here we go loopless:
nrow = 10000;
ncol = 10;
%example data
m = ones(nrow,ncol);
n = 1:ncol;
M = repmat(m,1,1,ncol);
N = permute( repmat(n,nrow,1,ncol) , [1 3 2] );
result = bsxfun(@minus, M, N );
%or just
result = M-N;
Elapsed time is 0.018499 seconds.
or as recommended by Luis Mendo:
result = bsxfun(@minus, m, permute(n, [1 3 2]) ); % no repmat needed, bsxfun expands both inputs
Elapsed time is 0.000094 seconds.
Please make sure that your input vectors have the same orientation as in my example, otherwise you could get into trouble. You should be able to achieve that by transposing, or you have to modify this line:
permute( repmat(n,nrow,1,ncol) , [1 3 2] )
according to your needs.
You mentioned in a comment that you want to count the negative elements in each of the obtained columns:
A = result; %backup results
A(A > 0) = 0; %set positive elements to zero, so only negative elements remain nonzero
D = sum( logical(A),3 );
which will return the desired 10000x10 matrix with quantities of negative elements. (Please verify it, I may have gotten a little confused with the dimensions ;))
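The count can also be obtained without forming the differences at all, since m(i,j) - n(t) is negative exactly when m(i,j) < n(t). A small sketch with hypothetical 2 x 2 data, chosen so the result is easy to verify by hand:

```matlab
m = [1 5; 7 3];                                   % hypothetical small data matrix
n = [2 6];                                        % hypothetical values to subtract
isNeg = bsxfun(@lt, m, permute(n, [1 3 2]));      % isNeg(i,j,t) is true when m(i,j) - n(t) < 0
D = sum(isNeg, 3);                                % number of negative differences per element of m
% D is [2 1; 0 1] for this data
```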
Create a three-dimensional result matrix and store the result for each element of n along the third dimension.
% m (nrow x ncol) and n (1 x N) must already exist in the workspace, so do not call clear here
nrow = 10000;
ncol = 10;
N = length(n);
resultMatrix = zeros(nrow, ncol, N);
neg = zeros(ncol, N); % amount of negative values
for j = 1:ncol
for i = 1:nrow
for t = 1:N
resultMatrix(i,j,t) = m(i,j) - n(t);
end
end
for t = 1:N
neg(j,t) = length( find(resultMatrix(:,j,t) < 0) );
end
end