I am super new to MATLAB. I want to implement the KNN algorithm. I tried to read the documentation for the fitcknn classifier, but I don't understand it.
I have a matrix x that holds 4 input vectors (each vector has 3 features):
1 2 3
5 19 20
1 2 4
8 19 21
I want to get out an output matrix Y that gives me the nearest neighbors (in order) for each vector of the input matrix.
For example: y in this case will be
3 2 4
4 3 1
1 2 4
2 3 1
Explanation: the first row of matrix Y shows that the closest vectors to vector 1 are: vector 3 then vector 2 then vector 4.
Is there a library to do this classification (using the cosine distance as a similarity function)?
Thanks.
n = size(x,1);
dist = squareform(pdist(x,'cosine')); %// distance matrix
dist(1:n+1:end) = inf; %// self-distance doesn't count
[~, y] = sort(dist,2);
y = y(:,1:n-1);
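For readers without the Statistics Toolbox, the same ranking can be sketched with base functions only. This is a minimal sketch, not the answer's original code: it assumes implicit expansion (MATLAB R2016b or later, or Octave) and computes the cosine distance as one minus the dot product of the row-normalised vectors. The name xn is introduced here for illustration.

```matlab
% Data from the question: 4 vectors with 3 features each
x = [1 2 3; 5 19 20; 1 2 4; 8 19 21];
n = size(x,1);

% Normalise every row to unit length (relies on implicit expansion)
xn = x ./ sqrt(sum(x.^2, 2));

% Cosine distance = 1 - cosine similarity
dist = 1 - xn*xn.';
dist(1:n+1:end) = inf;      % self-distance doesn't count

% Sort each row; the self entry (inf) ends up last and is dropped
[~, y] = sort(dist, 2);
y = y(:,1:n-1);
disp(y)
```

With the cosine distance, the first row of y for this data is 3 4 2, which differs from the 3 2 4 given in the question, so the expected output there was presumably not computed with this metric.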
To save memory, you can work in chunks using pdist2 instead of pdist:
n = size(x,1);
m = 100; %// chunk size. As large as memory allows
y = NaN(n,n-1); %// pre-allocate results
for ii = 0:m:n-1
    ind = ii+1:min(ii+m,n); %// current chunk: these rows (last chunk may be shorter)
    dist_chunk = pdist2(x(ind,:),x,'cosine'); %// results for this chunk
    dist_chunk(sub2ind(size(dist_chunk),1:numel(ind),ind)) = inf; %// self-distance doesn't count
    [~, y_chunk] = sort(dist_chunk,2);
    y(ind,:) = y_chunk(:,1:n-1); %// fill results; the self entry is sorted last
end
Is there a way to do the following?
I would like to turn a MATLAB array:
>> (1:10)'
ans =
1
2
3
4
5
6
7
8
9
10
Into the following sequential matrix (I am not sure what is the name for this):
ans =
NaN NaN NaN 1
NaN NaN NaN 2
NaN NaN NaN 3
1 2 3 4
2 3 4 5
3 4 5 6
4 5 6 7
5 6 7 8
6 7 8 9
7 8 9 10
I am able to do this, with the following function, but it iterates over each row and it is slow:
function V = vec2stack(X, num_sequences)
n = numel(X);
len_out = n - num_sequences + 1;
V = zeros(len_out,num_sequences);
for kk = 1:len_out
V(kk,:) = X(kk:kk +num_sequences - 1)';
end
V = [nan(num_sequences,num_sequences); V(1:end-1,:)];
end
X = 1:10;
AA = vec2stack(X,3)
Running the above function for a vector of length 1,000,000 takes about 1 second, which is not fast enough for my purposes:
tic;
Lag = vec2stack(1:1000000,5);
toc;
Elapsed time is 1.217854 seconds.
You can use repmat() to repeat the X vector horizontally. Then, notice that each column is one more than the previous one. You can add a row vector that has the same number of columns as the matrix, and MATLAB will broadcast the vector onto the entire matrix and do the addition for you.
In versions of MATLAB before R2016b, which lack implicit expansion, you need to explicitly repmat() the row vector to get the shapes to match.
function V = vec2stack(X, num_sequences)
% Repeat X, 1 time vertically, num_seq times horizontally
% X(:) ensures it's a column vector so we have the correct shape
mat = repmat(X(:), 1, num_sequences);
% make a row vector from 0:num_sequences-1
vec = 0:(num_sequences-1);
% or explicitly repmat on vec if you need to:
% vec = repmat(0:(num_sequences-1), numel(X), 1);
% Add the two. Matlab broadcasts the row vector onto the matrix
% Because they have the same number of columns
mat = mat + vec;
% Build the return matrix
V = [nan(num_sequences, num_sequences); mat];
end
X = (1:10)';
AA = vec2stack(X,3)
% You can easily add the last column as another column
Testing speed on Octave Online:
%% Loopy version
tic;
Lag = vec2stack(1:1000000,5);
toc;
Elapsed time is 17.4092 seconds
%% Vectorized version
tic;
Lag = vec2stack((1:1000000)',5);
toc;
Elapsed time is 0.110762 seconds.
~150x speedup. Pretty cool!
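Note that the broadcasting trick above adds offsets to the values of X, so it relies on X holding consecutive integers. For arbitrary data, the same idea can be applied to the indices instead of the values. The following is a minimal sketch under that assumption, not the answer's original code; it assumes implicit expansion (R2016b or later, or Octave), and the names idx and Xc are introduced here for illustration.

```matlab
X = (1:10)' * 10;            % arbitrary data, not consecutive integers
num_sequences = 3;

n = numel(X);
len_out = n - num_sequences + 1;

% Row kk of idx holds the indices kk : kk+num_sequences-1
idx = (1:len_out-1)' + (0:num_sequences-1);

Xc = X(:);                   % force a column so Xc(idx) has the shape of idx
V = [nan(num_sequences, num_sequences); Xc(idx)];
disp(V)
```

This reproduces the row count of the original loop version (n rows), because the last window is dropped in the same way as V(1:end-1,:) does there.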
I am trying to index (not get) the diagonals of a matrix in MATLAB.
Say I have a matrix "M" that is n by n. Then I want to obtain all indices of all possible diagonals in the matrix "M".
I know that the center diagonal is indexed by
M(1:(n+1):end)
and all the following diagonals above it are indexed as:
M((1+1*n):(n+1):end)
M((1+2*n):(n+1):end)...
M((1+(n-1)*n):(n+1):end)
Now I also want to get the diagonals below. However, I cannot for the life of me figure out how.
Reproducible example:
rng(1); % set seed
n = 4;
M = rand(n);
yielding
M =
0.562408 0.947364 0.655088 0.181702
0.960604 0.268834 0.469042 0.089167
0.578719 0.657845 0.516215 0.419000
0.226410 0.601666 0.169212 0.378740
where I would like to index the lower diagonals, e.g. the subdiagonal:
0.960604 0.657845 0.169212
That is, I don't need to get the diagonal with e.g. the diag function, but to access its indices (since I ultimately want to replace the matrix entries diagonal by diagonal).
As you already noted, you can use the diag function to get the main diagonal and other diagonals above or below the main diagonal:
M = magic(4) % Test data
M =
16 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
diag(M, -1)
ans =
5
7
15
but you cannot assign values to the diagonal with the diag function:
diag(M, -1) = [3; 2; 1]
Index in position 2 is invalid. Array indices must be positive integers or logical values.
Instead, we can use logical indexing by indexing the array M with a logical matrix of the same size. We can easily create this matrix using the diag function, by creating a diagonal matrix with ones on the specified diagonal:
diag(ones(1, 3), -1)
ans =
0 0 0 0
1 0 0 0
0 1 0 0
0 0 1 0
To use this matrix for logical indexing, we need to convert it from double to logical with the logical function.
M(logical(diag(ones(1, 3), -1)))
ans =
5
7
15
or assign new values to it with
M(logical(diag(ones(1, 3), -1))) = [99, 98, 97]
M =
16 2 3 13
99 11 10 8
9 98 6 12
4 14 97 1
There is a slightly more performant way of using diag to get indices to a diagonal:
n = 5; % matrix size
M = reshape(1:n*n,n,n); % matrix with linear indices
indices = diag(M, ii); % indices to diagonal ii
However, it is much easier to just compute the right indices directly. As discovered by OP, the upper diagonal elements are given by:
indices = (1+ii*n):(n+1):(n*n);
(Note that the parentheses are not necessary, as the colon operator has lower precedence than the arithmetic operators.)
The lower diagonal elements are given by:
indices = (1+ii):(n+1):((n-ii)*n);
Both series are identical for the main diagonal, where ii=0.
We can verify correctness of these calculations by using the first method:
n = 5; % matrix size
M = reshape(1:n*n,n,n); % matrix with linear indices
for ii=1:n-1
indices = (1+ii*n):(n+1):(n*n);
assert(isequal(indices, diag(M, ii).'))
indices = (1+ii):(n+1):((n-ii)*n);
assert(isequal(indices, diag(M, -ii).'))
end
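Since the goal in the question is to replace entries diagonal by diagonal, the index formulas above can be used directly on the left-hand side of an assignment. A minimal sketch using the magic(4) test matrix from the other answer; the names lowerIdx and upperIdx are introduced here for illustration.

```matlab
M = magic(4);
n = size(M,1);
ii = 1;                                   % first diagonal off the main one

lowerIdx = (1+ii):(n+1):((n-ii)*n);       % linear indices of the subdiagonal
upperIdx = (1+ii*n):(n+1):(n*n);          % linear indices of the superdiagonal

M(lowerIdx) = [99 98 97];                 % overwrite the subdiagonal
M(upperIdx) = [1 2 3];                    % overwrite the superdiagonal
disp(M)
```

The main diagonal is left untouched, and no logical mask of the full matrix size has to be built.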
I have two (or more but if solved for two, it's solved for any number) 2-by-N matrices which represent points with an x (the first row) and y (the second row) coordinates. The points are always sorted in the increasing x coordinate. What I want to do is I want to merge these two matrices into one 3-by-N matrix so that if two points (one from each matrix) have the same x coordinate, they would form one column in the new matrix, the first row being the x coordinate and the second and third row being the two y coordinates. However, if there is a point in one matrix that has x coordinate different than all other points in the second matrix, I still want to have full 3-element column that is placed such that the x coordinates are still sorted and the missing value from the other matrix is replaced by the nearest value with lower x coordinate (or NaN if there is none).
Better to explain by example.
First matrix:
1 3 5 7 % x coordinate
1 2 3 4 % y coordinate
Second matrix:
2 3 4 7 8 % x coordinate
5 6 7 8 9 % y coordinate
Desired result:
1 2 3 4 5 7 8 % x coordinate
1 1 2 2 3 4 4 % y coordinate from first matrix
NaN 5 6 7 7 8 9 % y coordinate from second matrix
My question is, how can I do it efficiently in MATLAB/Octave and NumPy? (Efficiently because I can always do it "manually" with loops but that doesn't seem right.)
You can do this with interp1, using the 'previous' method (or 'nearest' if you do not care whether the matched x coordinate is larger or smaller) together with 'extrap' to allow extrapolation.
Define the matrices
a=[...
1 3 5 7;...
1 2 3 4];
b=[...
2 3 4 7 8;...
5 6 7 8 9];
Then find the interpolation points
x = unique([a(1,:),b(1,:)]);
And interpolate
[x ; interp1(a(1,:),a(2,:),x,'previous','extrap') ; interp1(b(1,:),b(2,:),x,'previous','extrap') ]
Timeit results:
I tested the algorithms on
n = 1e6;
a = cumsum(randi(3,2,n),2);
b = cumsum(randi(2,2,n),2);
and got:
Wolfie: 1.7473 s
Flawr: 0.4927 s
Mine: 0.2757 s
This version uses set operations:
a=[...
1 3 5 7;...
1 2 3 4];
b=[...
2 3 4 7 8;...
5 6 7 8 9];
% compute union of x coordinates
c = union(a(1,:),b(1,:));
% find indices of x of a and b coordinates in c
[~,~,ia] = intersect(a(1,:),c);
[~,~,ib] = intersect(b(1,:),c);
% create output matrix
d = NaN(3,numel(c));
d(1,:) = c;
d(2,ia) = a(2,:);
d(3,ib) = b(2,:);
% fill NaNs
m = isnan(d);
m(:,1) = false;
i = find(m(:,[2:end,1])); %if you have multiple consecutive nans you have to repeat these two steps
d(m) = d(i);
disp(d);
Your example:
a = [1 3 5 7; 1 2 3 4];
b = [2 3 4 7 8; 5 6 7 8 9];
% Get the combined (unique, sorted) `x` coordinates
output(1,:) = unique([a(1,:), b(1,:)]);
% Initialise y values to NaN
output(2:3, :) = NaN;
% Add x coords from `a` and `b`
output(2, ismember(output(1,:),a(1,:))) = a(2,:);
output(3, ismember(output(1,:),b(1,:))) = b(2,:);
% Replace NaNs in columns `2:end` with the previous value.
% A simple loop has the advantage of capturing multiple consecutive NaNs.
for ii = 2:size(output,2)
colNaN = isnan(output(:, ii));
output(colNaN, ii) = output(colNaN, ii-1);
end
If you have more than 2 matrices (as suggested in your question), then I'd advise storing them in a cell array and looping over them to do the calls to ismember, instead of hardcoding one line of code per matrix.
The NaN replacement loop is already vectorised for any number of rows.
This is the generic solution for any number of matrices, demonstrated with a and b:
mats = {a, b};
cmats = horzcat(mats{:}); % expand the cell array so the matrices are concatenated
output = unique(cmats(1,:)); % combined (unique, sorted) x coordinates
output(2:numel(mats)+1, :) = NaN;
for ii = 1:numel(mats)
    output(ii+1, ismember(output(1,:), mats{ii}(1,:))) = mats{ii}(2,:);
end
for ii = 2:size(output,2)
colNaN = isnan(output(:,ii));
output(colNaN, ii) = output(colNaN, ii-1);
end
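The forward-fill loop above can also be written without a loop. The following is a minimal sketch, not part of the original answer: it assumes cummax is available (MATLAB R2014b or later, or Octave) along with implicit expansion, and it starts from the merged matrix of the example before the NaNs are filled. The names valid, last, has, rows and filled are introduced here for illustration.

```matlab
% Merged example matrix before filling: x coordinates, then y values per matrix
d = [1   2   3   4   5   7   8;
     1   NaN 2   NaN 3   4   NaN;
     NaN 5   6   7   NaN 8   9];
[R, N] = size(d);

valid = ~isnan(d);
% For every entry, the column of the most recent non-NaN value (0 if none yet)
last = cummax(valid .* (1:N), 2);

filled = NaN(R, N);
has = last > 0;                          % entries that have a previous value
rows = repmat((1:R)', 1, N);
filled(has) = d(sub2ind([R N], rows(has), last(has)));
disp(filled)
```

Like the loop, this handles several consecutive NaNs, and entries with no earlier value stay NaN.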
How to convert vector A to symmetric matrix M in MATLAB
Such that M is a symmetric matrix (i.e. M21=M12) and all diagonal terms are equal (i.e. M11=M22=M33=M44).
Use hankel to help you create the symmetric matrix, then when you're finished, set the diagonal entries of this intermediate result to be the first element of the vector in A:
M = hankel(A,A(end:-1:1));
M(eye(numel(A))==1) = A(1);
Example
>> A = [1;2;3;4]
A =
1
2
3
4
>> M = hankel(A,A(end:-1:1));
>> M(eye(numel(A))==1) = A(1)
M =
1 2 3 4
2 1 4 3
3 4 1 2
4 3 2 1
As you can see, M(i,j) = M(j,i) for all i and j, and every element on the diagonal is equal to A(1).
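Both properties can be checked programmatically rather than by eye. A minimal sketch that rebuilds the example and asserts symmetry and the constant diagonal:

```matlab
A = [1;2;3;4];

% Build the matrix as in the answer
M = hankel(A, A(end:-1:1));
M(eye(numel(A))==1) = A(1);

% Symmetry: M equals its transpose
assert(isequal(M, M.'));
% Constant diagonal: every diagonal entry equals A(1)
assert(all(diag(M) == A(1)));
disp(M)
```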
Suppose I have a matrix, where each cell of this matrix describes a location (e.g. a bin of a histogram) in a two-dimensional space. Let's say some of these cells contain a '1' and some a '2', indicating where object number 1 and 2 are located, respectively.
I now want to find those cells that describe the "touching points" between the two objects. How do I do that efficiently?
Here is a naive solution:
% M is the matrix described above (the name is assumed here)
[r1, c1] = find(M == 1); X = [r1, c1]; % locations of object number 1 (x,y)
[r2, c2] = find(M == 2); Y = [r2, c2]; % locations of object number 2 (x,y)
distances = pdist2(X,Y,'cityblock');
Locations (x,y) and (u,v) touch iff the respective entry in distances is 1. I believe that should work, but it does not seem very clever or efficient.
Does anyone have a better solution? :)
Thank you!
Use morphological operations.
Let M be your matrix with zeros (no object), ones, and twos indicating the locations of the different objects.
M1 = M == 1; % create a logical mask of the first object
M2 = M == 2; % logical mask of second object
dM1 = imdilate( M1, [0 1 0; 1 1 1; 0 1 0] ); % "expand" the mask to the neighboring pixels
[touchesY touchesX] =...
find( dM1 & M2 ); % locations where the expansion of first object overlap with second one
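If the Image Processing Toolbox is not available, the dilation step can be approximated with conv2 from base MATLAB/Octave: convolving the mask with the same cross-shaped kernel and thresholding at zero marks every cell that has an object-1 cell in its 4-neighbourhood. This is a minimal sketch that swaps conv2 in for imdilate, and it uses a small label matrix chosen here for illustration.

```matlab
% Label matrix: 0 = empty, 1 = first object, 2 = second object
M = [0 0 2 0 0;
     2 2 2 1 1;
     2 2 1 1 0;
     0 1 1 1 1];

M1 = M == 1;                               % mask of the first object
M2 = M == 2;                               % mask of the second object

% "Expand" the first mask to its 4-connected neighbours without imdilate
kernel = [0 1 0; 1 1 1; 0 1 0];
dM1 = conv2(double(M1), kernel, 'same') > 0;

% Cells of the second object that touch the first one
[touchesY, touchesX] = find(dM1 & M2);
disp([touchesY, touchesX])
```

For this matrix the touching cells of object 2 are (row 3, column 2) and (row 2, column 3). Swapping the roles of M1 and M2 gives the touching cells of object 1.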
Code
%%// Label matrix
L = [
0 0 2 0 0;
2 2 2 1 1;
2 2 1 1 0
0 1 1 1 1]
[X_row,X_col] = find(L==1);
[Y_row,Y_col] = find(L==2);
X = [X_row X_col];
Y = [Y_row Y_col];
%%// Your code works till this point to get X and Y
%%// Perform subtractions so that later on they can be used to detect
%%// where Y has any index that touches X
%%// Subtract all Y from all X. This can be done by getting one
%%//of them and in this case Y into the third dimension and then subtracting
%%// from all X using bsxfun. The output would be used to index into Y.
Y_touch = abs(bsxfun(@minus,X,permute(Y,[3 2 1])));
%%// Perform similar subtractions, but this time subtracting all X from Y
%%// by putting X into the third dimension. The idea this time is to index
%%// into X.
X_touch = abs(bsxfun(@minus,Y,permute(X,[3 2 1]))); %%// for X too
%%// Find all touching indices for X, which are the entries equal to [1 1] in
%%// X_touch (so their row-sum would be 2). These are detected with the `all`
%%// command along the second dimension. The output is then "squeezed" into a
%%// 2D matrix using the `squeeze` command, and the touching indices are the
%%// columns that contain any `ones`.
ind_X = any(squeeze(all(X_touch==1,2)),1)
%%// Similarly for Y
ind_Y = any(squeeze(all(Y_touch==1,2)),1)
%%// Get the touching locations for X and Y
touching_loc = [X(ind_X,:) ; Y(ind_Y,:)]
%%// To verify, let us make the touching indices 10
L(sub2ind(size(L),touching_loc(:,1),touching_loc(:,2)))=10
Output
L =
0 0 2 0 0
2 2 2 1 1
2 2 1 1 0
0 1 1 1 1
L =
0 0 10 0 0
2 10 10 10 1
10 10 10 10 0
0 10 10 1 1