How to improve the distance calculation on two separated datasets in matlab? - matlab

How to improve the distance calculation on the 2 separated datasets?
This is the code:
X = [ 3.6 79
1.8 54
3.333 74
2.283 62
4.533 85
2.883 55
4.7 88
3.6 85
1.95 51
4.35 85
1.833 54
3.917 84
4.2 78
1.75 47
4.7 83
2.167 52
1.75 62
4.8 84
1.6 52
4.25 79
1.8 51
1.75 47
3.45 78
3.067 69
4.533 74
3.6 83
1.967 55
4.083 76
3.85 78
4.433 79
4.3 73
4.467 77
3.367 66
4.033 80
3.833 74
2.017 52
1.867 48
4.833 80
1.833 59
4.783 90 ]
clc;
close all;
figure;
h(1) = plot(X(:,1),X(:,2),'bx');
hold on;
X1 = X(1:3,:);
X2 = X(4:40,:);
h(2) = plot(X1(1:3,1), X1(1:3,2),'rs','MarkerSize',10);
k=5;
[D2 ind] = sort(squeeze(sqrt(sum(bsxfun(#minus,X2,permute(X1,[3 2 1])).^2,2))))
ind_closest = ind(1:k,:)
x_closest = X(ind_closest,:)
for j = 1:length(x_closest);
h(3) =plot(x_closest(j,1),x_closest(j,2),'ko','MarkerSize',10);
end
The output is shown as in the picture below:
The problem is, the code does not pick the closest data points of red squared data points. I also tried to use pdist2 function from statistical toolbox,the result yields similar with the bsxfun function that i applied in my code.
I'm not sure which part in the code need to improve so that i can pick the data points that closest to the target.
Really appreciate if anyone can help me to improve my code

If the closest point means closest to X, line 19 & line 20 should be replaced as
[D2 ind] = sort(squeeze(sqrt(sum(bsxfun(#minus,X,permute(X1,[3 2 1])).^2,2))))
ind_closest = ind(2:k+1,:)
If the closest point means closest to X2, then try this:
x_closest = X2(ind_closest,:)
In the meanwhile, I modified your code a little bit, since your h(3) could be optimized.
clc; clear; close all;
%load fisheriris
%X=meas(:,3:4);
load X
X=unique(X,'rows');
figure;
h(1) = plot(X(:,1),X(:,2),'bx');
hold on;
X1 = X([5 15 30],:);
h(2) = plot(X1(:,1), X1(:,2),'rs','MarkerSize',10);
[D2,ind] = sort(squeeze(sqrt(sum(bsxfun(#minus,X,permute(X1,[3 2 1])).^2,2))));
k=3;
ind_closest = unique(ind(2:k+1,:));
x_closest = X(ind_closest,:);
h(3) =plot(x_closest(:,1),x_closest(:,2),'ko','MarkerSize',10);
axis equal
It seems to be working fine.

Related

Matlab - Create a vector using another vector as the limits

Say I have the following columns vector Z
1 53 55 57 60 64 68 70 71 72 74 76 77 78 79 80 255
I want to use it to create a matrix such that each row would contain all the number between (and including) 2 adjacent elements in Z
So the output matrix should be something like this:
1 2 3 .... 53
53 54 55
55 56 57
57 58 60
....
80 81 ... 255
I've been searching for something similar but couldn't find it.
Thanks
See if this works for you -
lens = diff(Z)+1;
mask1 = bsxfun(#le,[1:max(lens)]',lens); %//'
array1 = zeros(size(mask1));
array1(mask1) = sort([1:255 Z(2:end-1)]);
out = array1.'; %//'# out is the desired output
Try this to break the monotony of bsxfun :) :
d = diff(Z);
N = max(d)+1;
R = zeros(length(Z)-1,N);
for i = 1:length(Z)-1
R(i,1:1+d(i)) = Z(i):Z(i+1);
end
EDIT:
I know that the general consensus is that one always should try to avoid loops in Matlab, but is this valid for this example? I know that this is a broad question, so lets focus on this particular problem and compare bsxfun to JIT loop. Comparing the two proposed solutions:
the code used for testing:
Z = [1 53 55 57 60 64 68 70 71 72 74 76 77 78 79 80 255];
%[1 3 4, 6];
nn = round(logspace(1,4,10));
tm1_nn = zeros(length(nn),1);
tm2_nn = zeros(length(nn),1);
for o = 1:length(nn)
tm1 = zeros(nn(o),1);
tm2 = zeros(nn(o),1);
% approach1
for k = 1:nn(o)+1
tic
d = diff(Z);
N = max(d)+1;
R = zeros(length(Z)-1,N);
for i = 1:length(Z)-1
R(i,1:1+d(i)) = Z(i):Z(i+1);
end
tm1(k) = toc;
end
%approach 2
for k = 1:nn(o)+1
tic
lens = diff(Z)+1;
mask1 = bsxfun(#le,[1:max(lens)]',lens); %//'
array1 = zeros(size(mask1));
array1(mask1) = sort([1:255 Z(2:end-1)]);
out = array1.';
tm2(k) = toc;
end
tm1_nn(o) = mean(tm1);%sum(tm1);%mean(tm1);%
tm2_nn(o) = mean(tm2);%sum(tm2);%mean(tm2);%
end
semilogx(nn,tm1_nn, '-ro', nn,tm2_nn, '-bo')
legend('JIT loop', 'bsxfun')
xlabel('log_1_0(Number of runs)')
%ylabel('Sum execution time')
ylabel('Mean execution time')
grid on
I encountered other tasks previously where the loop was faster. (or I mess up the comparison?)

Conditional coloring of histogram graph in MATLAB

I have a histogram that I want conditional coloring in it with this rule :
Values that are upper than 50 have red bars and values lower than 50 have blue bars.
Suppose that we have this input matrix:
X = [32 64 32 12 56 76 65 44 89 87 78 56 96 90 86 95 100 65];
I want default bins of MATLAB and applying this coloring on X-axes (bins). I'm using GUIDE to design my GUI and this histogram is an axes in my GUI.
This is our normal graph. Bars with upper values than 50 should be red and bars with lower values than 50 should be green (X-axes). Bars with upper values than 50 should be red and ?
I think this does what you want (as per comments). The bar around 50 is split into the two colors. This is done by using a patch to change the color of part of that bar.
%// Data:
X = [32 64 32 12 56 76 65 44 89 87 78 56 96 90 86 95 100 65]; %// data values
D = 50; %// where to divide into two colors
%// Histogram plot:
[y n] = hist(X); %// y: values; n: bin centers
ind = n>50; %// bin centers: greater or smaller than D?
bar(n(ind), y(ind), 1, 'r'); %// for greater: use red
hold on %// keep graph, Or use hold(your_axis_handle, 'on')
bar(n(~ind), y(~ind), 1, 'b'); %// for smaller: use blue
[~, nd] = min(abs(n-D)); %// locate bar around D: it needs the two colors
patch([(n(nd-1)+n(nd))/2 D D (n(nd-1)+n(nd))/2], [0 0 y(nd) y(nd)], 'b');
%// take care of that bar with a suitable patch
X = [32 64 32 12 56 76 65 44 89 87 78 56 96 90 86 95 100 65];
then you create an histogram, but you are only going to use this to get the numbers of bins, the numbers of elements and positions:
[N,XX]=hist(X);
close all
and finally here is the code where you use the Number of elements (N) and the position (XX) of the previous hist and color them
figure;
hold on;
width=8;
for i=1:length(N)
h = bar(XX(i), N(i),8);
if XX(i)>50
col = 'r';
else
col = 'b';
end
set(h, 'FaceColor', col)
end
here you can consider using more than one if and then you can set multiple colors
cheers
First sort X:
X = [32 64 32 12 56 76 65 44 89 87 78 56 96 90 86 95 100 65];
sorted_X = sort(X)
sorted_X :
sorted_X =
Columns 1 through 14
12 32 32 44 56 56 64 65 65 76 78 86 87 89
Columns 15 through 18
90 95 96 100
Then split the data based on 50:
idx1 = find(sorted_X<=50,1,'last');
A = sorted_X(1:idx1);
B = sorted_X(idx1+1:end);
Display it as two different histograms.
hist(A);
hold on;
hist(B);
h = findobj(gca,’Type’,’patch’);
display(h)
set(h(1),’FaceColor’,’g’,’EdgeColor’,’k’);
set(h(2),’FaceColor’,’r’,’EdgeColor’,’k’);

Contour plot coloured by clustering of points matlab

I have two vectors which are paired values
size(X)=1e4 x 1; size(Y)=1e4 x 1
Is it possible to plot a contour plot of some sort making the contours by the highest density of points? Ie highest clustering=red, and then gradient colour elsewhere?
If you need more clarification please ask.
Regards,
EXAMPLE DATA:
X=[53 58 62 56 72 63 65 57 52 56 52 70 54 54 59 58 71 66 55 56];
Y=[40 33 35 37 33 36 32 36 35 33 41 35 37 31 40 41 34 33 34 37 ];
scatter(X,Y,'ro');
Thank you for everyone's help. Also remembered we can use hist3:
x={0:0.38/4:0.38}; % # How many bins in x direction
y={0:0.65/7:0.65}; % # How many bins in y direction
ncount=hist3([X Y],'Edges',[x y]);
pcolor(ncount./sum(sum(ncount)));
colorbar
Anyone know why edges in hist3 have to be cells?
This is basically a question about estimating the probability density function generating your data and then visualizing it in a good and meaningful way I'd say. To that end, I would recommend using a more smooth estimate than the histogram, for instance Parzen windowing (a generalization of the histogram method).
In my code below, I have used your example dataset, and estimated the probability density in a grid set up by the range of your data. You here have 3 variables you need to adjust to use on your original data; Borders, Sigma and stepSize.
Border = 5;
Sigma = 5;
stepSize = 1;
X=[53 58 62 56 72 63 65 57 52 56 52 70 54 54 59 58 71 66 55 56];
Y=[40 33 35 37 33 36 32 36 35 33 41 35 37 31 40 41 34 33 34 37 ];
D = [X' Y'];
N = length(X);
Xrange = [min(X)-Border max(X)+Border];
Yrange = [min(Y)-Border max(Y)+Border];
%Setup coordinate grid
[XX YY] = meshgrid(Xrange(1):stepSize:Xrange(2), Yrange(1):stepSize:Yrange(2));
YY = flipud(YY);
%Parzen parameters and function handle
pf1 = #(C1,C2) (1/N)*(1/((2*pi)*Sigma^2)).*...
exp(-( (C1(1)-C2(1))^2+ (C1(2)-C2(2))^2)/(2*Sigma^2));
PPDF1 = zeros(size(XX));
%Populate coordinate surface
[R C] = size(PPDF1);
NN = length(D);
for c=1:C
for r=1:R
for d=1:N
PPDF1(r,c) = PPDF1(r,c) + ...
pf1([XX(1,c) YY(r,1)],[D(d,1) D(d,2)]);
end
end
end
%Normalize data
m1 = max(PPDF1(:));
PPDF1 = PPDF1 / m1;
%Set up visualization
set(0,'defaulttextinterpreter','latex','DefaultAxesFontSize',20)
fig = figure(1);clf
stem3(D(:,1),D(:,2),zeros(N,1),'b.');
hold on;
%Add PDF estimates to figure
s1 = surfc(XX,YY,PPDF1);shading interp;alpha(s1,'color');
sub1=gca;
view(2)
axis([Xrange(1) Xrange(2) Yrange(1) Yrange(2)])
Note, this visualization is actually 3-dimensional:
See this 4 minute video on the mathworks site:
http://blogs.mathworks.com/videos/2010/01/22/advanced-making-a-2d-or-3d-histogram-to-visualize-data-density/
I believe this should provide very close to exactly the functionality you require.
I would divide the area the plot covers into a grid and then count the number of points in each square of the grid. Here's an example of how that could be done.
% Get random data with high density
X=randn(1e4,1);
Y=randn(1e4,1);
Xmin=min(X);
Xmax=max(X);
Ymin=min(Y);
Ymax=max(Y);
% guess of grid size, could be divided into nx and ny
n=floor((length(X))^0.25);
% Create x and y-axis
x=linspace(Xmin,Xmax,n);
y=linspace(Ymin,Ymax,n);
dx=x(2)-x(1);
dy=y(2)-y(1);
griddata=zeros(n);
for i=1:length(X)
% Calculate which bin the point is positioned in
indexX=floor((X(i)-Xmin)/dx)+1;
indexY=floor((Y(i)-Ymin)/dy)+1;
griddata(indexX,indexY)=griddata(indexX,indexY)+1;
end
contourf(x,y,griddata)
Edit: The video in the answer by Marm0t uses the same technique but probably explains it in a better way.

Remove the minimum values per each column of a Matrix

If I had a matrix A such as:
63 55 85 21 71
80 65 85 48 53
55 60 93 71 66
21 65 40 33 21
61 90 80 48 50
... and so on how would I find the minimum values of each column and remove those numbers from the matrix completely, meaning essentially I would have one less row overall.
I though about using:
[C,I] = min(A);
A(I) = [];
but that wouldn't remove the necessary numbers, and also reshape would not work either. I would like for this to work with an arbitrary number of rows and columns.
A = [
63 55 85 21 71
80 65 85 48 53
55 60 93 71 66
21 65 40 33 21
61 90 80 48 50
];
B = zeros( size(A,1)-1, size(A,2));
for i=1:size(A,2)
x = A(:,i);
maxIndex = find(x==min(x(:)),1,'first');
x(maxIndex) = [];
B(:,i) = x;
end
disp(B);
Another vectorized solution:
M = mat2cell(A,5,ones(1,size(A,2)));
z = cellfun(#RemoveMin,M);
B = cell2mat(z);
disp(B);
function x = RemoveMin(x)
minIndex = find(x==min(x(:)),1,'first');
x(minIndex) = [];
x = {x};
end
Another solution:
[~,I] = min(A);
indexes = sub2ind(size(A),I,1:size(A,2));
B = A;
B(indexes) = [];
out = reshape(B,size(A)-[1 0]);
disp(out);
Personally I prefer the first because:
For loops aren't evil - many times they are actually faster (By using JIT optimizer)
The algorithm is clearer to the developer who reads your code.
But of course, its up to you.
Your original approach works if you convert the row indices resulting from min into linear indices:
[m, n] = size(A);
[~, row] = min(A,[],1);
A(row + (0:n-1)*m) = [];
A = reshape(A, m-1, n);

MATLAB: create new matrix from existing matrix according to specifications

Assume we have the following data:
H_T = [36 66 21 65 52 67 73; 31 23 19 33 36 39 42]
P = [40 38 39 40 35 32 37]
Using MATLAB 7.0, I want to create three new matrices that have the following properties:
The matrix H (the first part in matrix H_T) will be divided to 3 intervals:
Matrix 1: the 1st interval contains the H values between 20 to 40
Matrix 2: the 2nd interval contains the H values between 40 to 60
Matrix 3: the 3rd interval contains the H values between 60 to 80
The important thing is that the corresponding T and P will also be included in their new matrices meaning that H will control the new matrices depending on the specifications defined above.
So, the resultant matrices will be:
H_T_1 = [36 21; 31 19]
P_1 = [40 39]
H_T_2 = [52; 36]
P_2 = [35]
H_T_3 = [66 65 67 73; 23 33 39 42]
P_3 = [38 40 32 37]
Actually, this is a simple example and it is easy by looking to create the new matrices depending on the specifications, BUT in my values I have thousands of numbers which makes it very difficult to do that.
Here's a quick solution
[~,bins] = histc(H_T(1,:), [20 40 60 80]);
outHT = cell(3,1);
outP = cell(3,1);
for i=1:3
idx = (bins == i);
outHT{i} = H_T(:,idx);
outP{i} = P(idx);
end
then you access the matrices as:
>> outHT{3}
ans =
66 65 67 73
23 33 39 42
>> outP{3}
ans =
38 40 32 37