How to detect multiple instances of symbols in image using matlab - matlab

I have tried to use the code provided in this answer to detect symbols using template matching with the FFT (via fft2)
However, the code only detects one symbol but it doesn't detect all similar symbols.
The code has been adapted from the linked post and is shown below.
template=im2bw(imread(http://www.clipartkid.com/images/543/floor-plan-symb-aT0MYg-clipart.png));
background=im2bw(imread(http://www.the-house-plans-guide.com/images/blueprints/draw-floor-plan-step-6.png));
bx = size(background, 2);
by = size(background, 1);
tx = size(template, 2); % used for bbox placement
ty = size(template, 1);
pos=[];
%// Change - Compute the cross power spectrum
Ga = fft2(background);
Gb = fft2(template, by, bx);
c = real(ifft2((Ga.*conj(Gb))./abs(Ga.*conj(Gb))));
%% find peak correlation
[max_c, imax] = max(abs(c(:)));
[ypeak, xpeak] = find(c == max(c(:))); % Added to make code work
if ~isempty(ypeak) || ~isempty(xpeak)
pos=position;
plot(xpeak,ypeak,'x','LineWidth',1,'Color','g');
rectangle('position',position,'edgecolor','b','linewidth',1, 'LineStyle', '- ');
end
How may I use the above code to detect multiple symbols as opposed to just one?

Amitay is correct in his assessment. BTW, the code that you took comes from the following post: Matlab Template Matching Using FFT.
The code is only designed to detect one match from the template you specify. If you wish to detect multiple templates, there are various methodologies you can try each with their own advantages and disadvantages:
Use a global threshold and from the cross power spectrum, any values that surpass this threshold deem that there is a match.
Find the largest similarity in the cross power spectrum, and anything that is some distance away from this maximum would be deemed that there is a match. Perhaps a percentage away, or one standard deviation away may work.
Try to make a histogram of the unique values in the cross power spectrum and find the point where there is a clear separation between values that are clearly uncorrelated with the template and values that are correlated. I won't implement this for you here because it requires that we look at your image then find the threshold by examining the histogram so I won't do that for you. Instead you can try the first two cases and see where that goes.
You will have to loop over multiple matches should they arise, so you'll need to loop over the code that draws the rectangles in the image.
Case #1
The first case is very simple. All you have to do is modify the find statement so that instead of searching for the location with the maximum, simply find locations that exceed the threshold.
Therefore:
%% find peak correlation
thresh = 0.1; % For example
[ypeak, xpeak] = find(c >= thresh);
Case #2
This is very similar to the first case but instead of finding values that exceed the threshold, determine what the largest similarity value is (already done), and threshold anything that is above (1 - x)*max_val where x is a value between 0 and 1 and denotes the percentage you'd like away from the maximum value to be considered as match. Therefore, if you wanted at most 5% away from the maximum, x = 0.05 and so the threshold now becomes 0.95*max_val. Similarly for the standard deviation, just find what it is using the std function and ensuring that you convert it into one single vector so that you can compute the value for the entire image, then the threshold becomes max_val - std_val where std_val is the standard deviation of the similarity values.
Therefore, do something like this for the percentage comparison:
%% find peak correlation
x = 0.05; % For example
[max_c, imax] = max(abs(c(:)));
[ypeak, xpeak] = find(c >= (1-x)*max_c);
... and do this for the standard deviation comparison:
std_dev = std(abs(c(:)));
[max_c, imax] = max(abs(c(:)));
[ypeak, xpeak] = find(c >= (max_c - std_dev));
Once you finally establish this, you'll see that there are multiple matches. It's now a point of drawing all of the detected templates on top of the image. Using the post that you "borrowed" the code from, the code to draw the detected templates can be modified to draw multiple templates.
You can do that below:
%% display best matches
tx = size(template, 2);
ty = size(template, 1);
hFig = figure;
hAx = axes;
imshow(background, 'Parent', hAx);
hold on;
for ii = 1 : numel(xpeak)
position = [xpeak(ii), ypeak(ii), tx, ty]; % Draw match on figure
imrect(hAx, position);
end

Related

Fitting largest circle in free area in image with distributed particle

I am working on images to detect and fit the largest possible circle in any of the free areas of an image containing distributed particles:
(able to detect the location of particle).
One direction is to define a circle touching any 3-point combination, checking if the circle is empty, then finding the largest circle among all empty circles. However, it leads to a huge number of combination i.e. C(n,3), where n is the total number of particles in the image.
I would appreciate if anyone can provide me any hint or alternate method that I can explore.
Lets do some maths my friend, as maths will always get to the end!
Wikipedia:
In mathematics, a Voronoi diagram is a partitioning of a plane into
regions based on distance to points in a specific subset of the plane.
For example:
rng(1)
x=rand(1,100)*5;
y=rand(1,100)*5;
voronoi(x,y);
The nice thing about this diagram is that if you notice, all the edges/vertices of those blue areas are all to equal distance to the points around them. Thus, if we know the location of the vertices, and compute the distances to the closest points, then we can choose the vertex with highest distance as our center of the circle.
Interestingly, the edges of a Voronoi regions are also defined as the circumcenters of the triangles generated by a Delaunay triangulation.
So if we compute the Delaunay triangulation of the area, and their circumcenters
dt=delaunayTriangulation([x;y].');
cc=circumcenter(dt); %voronoi edges
And compute the distances between the circumcenters and any of the points that define each triangle:
for ii=1:size(cc,1)
if cc(ii,1)>0 && cc(ii,1)<5 && cc(ii,2)>0 && cc(ii,2)<5
point=dt.Points(dt.ConnectivityList(ii,1),:); %the first one, or any other (they are the same distance)
distance(ii)=sqrt((cc(ii,1)-point(1)).^2+(cc(ii,2)-point(2)).^2);
end
end
Then we have the center (cc) and radius (distance) of all possible circles that have no point inside them. We just need the biggest one!
[r,ind]=max(distance); %Tada!
Now lets plot
hold on
ang=0:0.01:2*pi;
xp=r*cos(ang);
yp=r*sin(ang);
point=cc(ind,:);
voronoi(x,y)
triplot(dt,'color','r','linestyle',':')
plot(point(1)+xp,point(2)+yp,'k');
plot(point(1),point(2),'g.','markersize',20);
Notice how the center of the circle is on one vertex of the Voronoi diagram.
NOTE: this will find the center inside [0-5],[0-5]. you can easily modify it to change this constrain. You can also try to find the circle that fits on its entirety inside the interested area (as opposed to just the center). This would require a small addition in the end where the maximum is obtained.
I'd like to propose another solution based on a grid search with refinement. It's not as advanced as Ander's or as short as rahnema1's, but it should be very easy to follow and understand. Also, it runs quite fast.
The algorithm contains several stages:
We generate an evenly-spaced grid.
We find the minimal distances of points in the grid to all provided points.
We discard all points whose distances are below a certain percentile (e.g. 95th).
We choose the region which contains the largest distance (this should contain the correct center if my initial grid is fine enough).
We create a new meshgrid around the chosen region and find distances again (this part is clearly sub-optimal, because the distances are computed to all points, including far and irrelevant ones).
We iterate the refinement within the region, while keeping an eye on the variance of the top 5% of values -> if it drops below some preset threshold we break.
Several notes:
I have made the assumption that circles cannot go beyond the scattered points' extent (i.e. the bounding square of the scatter acts as an "invisible wall").
The appropriate percentile depends on how fine the initial grid is. This will also affect the amount of while iterations, and the optimal initial value for cnt.
function [xBest,yBest,R] = q42806059
rng(1)
x=rand(1,100)*5;
y=rand(1,100)*5;
%% Find the approximate region(s) where there exists a point farthest from all the rest:
xExtent = linspace(min(x),max(x),numel(x));
yExtent = linspace(min(y),max(y),numel(y)).';
% Create a grid:
[XX,YY] = meshgrid(xExtent,yExtent);
% Compute pairwise distance from grid points to free points:
D = reshape(min(pdist2([XX(:),YY(:)],[x(:),y(:)]),[],2),size(XX));
% Intermediate plot:
% figure(); plot(x,y,'.k'); hold on; contour(XX,YY,D); axis square; grid on;
% Remove irrelevant candidates:
D(D<prctile(D(:),95)) = NaN;
D(D > xExtent | D > yExtent | D > yExtent(end)-yExtent | D > xExtent(end)-xExtent) = NaN;
%% Keep only the region with the largest distance
L = bwlabel(~isnan(D));
[~,I] = max(table2array(regionprops('table',L,D,'MaxIntensity')));
D(L~=I) = NaN;
% surf(XX,YY,D,'EdgeColor','interp','FaceColor','interp');
%% Iterate until sufficient precision:
xExtent = xExtent(~isnan(min(D,[],1,'omitnan')));
yExtent = yExtent(~isnan(min(D,[],2,'omitnan')));
cnt = 1; % increase or decrease according to the nature of the problem
while true
% Same ideas as above, so no explanations:
xExtent = linspace(xExtent(1),xExtent(end),20);
yExtent = linspace(yExtent(1),yExtent(end),20).';
[XX,YY] = meshgrid(xExtent,yExtent);
D = reshape(min(pdist2([XX(:),YY(:)],[x(:),y(:)]),[],2),size(XX));
D(D<prctile(D(:),95)) = NaN;
I = find(D == max(D(:)));
xBest = XX(I);
yBest = YY(I);
if nanvar(D(:)) < 1E-10 || cnt == 10
R = D(I);
break
end
xExtent = (1+[-1 +1]*10^-cnt)*xBest;
yExtent = (1+[-1 +1]*10^-cnt)*yBest;
cnt = cnt+1;
end
% Finally:
% rectangle('Position',[xBest-R,yBest-R,2*R,2*R],'Curvature',[1 1],'EdgeColor','r');
The result I'm getting for Ander's example data is [x,y,r] = [0.7832, 2.0694, 0.7815] (which is the same). The execution time is about half of Ander's solution.
Here are the intermediate plots:
Contour of the largest (clear) distance from a point to the set of all provided points:
After considering distance from the boundary, keeping only the top 5% of distant points, and considering only the region which contains the largest distance (the piece of surface represents the kept values):
And finally:
You can use bwdist from Image Processing Toolbox to compute the distance transform of the image. This can be regarded as a method to create voronoi diagram that well explained in #AnderBiguri's answer.
img = imread('AbmxL.jpg');
%convert the image to a binary image
points = img(:,:,3)<200;
%compute the distance transform of the binary image
dist = bwdist(points);
%find the circle that has maximum radius
radius = max(dist(:));
%find position of the circle
[x y] = find(dist == radius);
imshow(dist,[]);
hold on
plot(y,x,'ro');
The fact that this problem can be solved using a "direct search" (as can be seen in another answer) means one can look at this as a global optimization problem. There exist various ways to solve such problems, each appropriate for certain scenarios. Out of my personal curiosity I have decided to solve this using a genetic algorithm.
Generally speaking, such an algorithm requires us to think of the solution as a set of "genes" subject to "evolution" under a certain "fitness function". As it happens, it's quite easy to identify the genes and the fitness function in this problem:
Genes: x , y, r.
Fitness function: technically, maximum area of circle, but this is equivalent to the maximum r (or minimum -r, since the algorithm requires a function to minimize).
Special constraint - if r is larger than the euclidean distance to the closest of the provided points (that is, the circle contains a point), the organism "dies".
Below is a basic implementation of such an algorithm ("basic" because it's completely unoptimized, and there is lot of room for optimizationno pun intended in this problem).
function [x,y,r] = q42806059b(cloudOfPoints)
% Problem setup
if nargin == 0
rng(1)
cloudOfPoints = rand(100,2)*5; % equivalent to Ander's initialization.
end
%{
figure(); plot(cloudOfPoints(:,1),cloudOfPoints(:,2),'.w'); hold on; axis square;
set(gca,'Color','k'); plot(0.7832,2.0694,'ro'); plot(0.7832,2.0694,'r*');
%}
nVariables = 3;
options = optimoptions(#ga,'UseVectorized',true,'CreationFcn',#gacreationuniform,...
'PopulationSize',1000);
S = max(cloudOfPoints,[],1); L = min(cloudOfPoints,[],1); % Find geometric bounds:
% In R2017a: use [S,L] = bounds(cloudOfPoints,1);
% Here we also define distance-from-boundary constraints.
g = ga(#(g)vectorized_fitness(g,cloudOfPoints,[L;S]), nVariables,...
[],[], [],[], [L 0],[S min(S-L)], [], options);
x = g(1); y = g(2); r = g(3);
%{
plot(x,y,'ro'); plot(x,y,'r*');
rectangle('Position',[x-r,y-r,2*r,2*r],'Curvature',[1 1],'EdgeColor','r');
%}
function f = vectorized_fitness(genes,pts,extent)
% genes = [x,y,r]
% extent = [Xmin Ymin; Xmax Ymax]
% f, the fitness, is the largest radius.
f = min(pdist2(genes(:,1:2), pts, 'euclidean'), [], 2);
% Instant death if circle contains a point:
f( f < genes(:,3) ) = Inf;
% Instant death if circle is too close to boundary:
f( any( genes(:,3) > genes(:,1:2) - extent(1,:) | ...
genes(:,3) > extent(2,:) - genes(:,1:2), 2) ) = Inf;
% Note: this condition may possibly be specified using the A,b inputs of ga().
f(isfinite(f)) = -genes(isfinite(f),3);
%DEBUG:
%{
scatter(genes(:,1),genes(:,2),10 ,[0, .447, .741] ,'o'); % All
z = ~isfinite(f); scatter(genes(z,1),genes(z,2),30,'r','x'); % Killed
z = isfinite(f); scatter(genes(z,1),genes(z,2),30,'g','h'); % Surviving
[~,I] = sort(f); scatter(genes(I(1:5),1),genes(I(1:5),2),30,'y','p'); % Elite
%}
And here's a "time-lapse" plot of 47 generations of a typical run:
(Where blue points are the current generation, red crosses are "insta-killed" organisms, green hexagrams are the "non-insta-killed" organisms, and the red circle marks the destination).
I'm not used to image processing, so it's just an Idea:
Implement something like a gaussian filter (blur) which transforms each particle (pixels) to a round gradiant with r=image_size (all of them overlapping). This way, you should get a picture where the most white pixels should be the best results. Unfortunately, the demonstration in gimp failed because the extreme blurring made the dots disappearing.
Alternatively, you could incrementelly extend all existing pixels by marking all neighbour pixels in an area (example: r=4), the pixels left would be the same result (those with the biggest distance to any pixel)

Matlab Template Matching Using FFT

I am struggling with template matching in the Fourier domain in Matlab. Here are my images (the artist is RamalamaCreatures on DeviantArt):
My aim is to place a bounding box around the ear of the possum, like this example (where I performed template matching using normxcorr2):
Here is the Matlab code I am using:
clear all; close all;
template = rgb2gray(imread('possum_ear.jpg'));
background = rgb2gray(imread('possum.jpg'));
%% calculate padding
bx = size(background, 2);
by = size(background, 1);
tx = size(template, 2); % used for bbox placement
ty = size(template, 1);
%% fft
c = real(ifft2(fft2(background) .* fft2(template, by, bx)));
%% find peak correlation
[max_c, imax] = max(abs(c(:)));
[ypeak, xpeak] = find(c == max(c(:)));
figure; surf(c), shading flat; % plot correlation
%% display best match
hFig = figure;
hAx = axes;
position = [xpeak(1)-tx, ypeak(1)-ty, tx, ty];
imshow(background, 'Parent', hAx);
imrect(hAx, position);
The code is not functioning as intended - it is not identifying the correct region. This is the failed result - the wrong area is boxed:
This is the surface plot of the correlations for the failed match:
Hope you can help! Thanks.
What you're doing in your code is actually not correlation at all. You are using the template and performing convolution with the input image. If you recall from the Fourier Transform, the multiplication of the spectra of two signals is equivalent to the convolution of the two signals in time/spatial domain.
Basically, what you are doing is that you are using the template as a kernel and using that to filter the image. You are then finding the maximum response of this output and that's what is deemed to be where the template is. Where the response is being boxed makes sense because that region is entirely white, and using the template as the kernel with a region that is entirely white will give you a very large response, which is why it most likely identified that area to be the maximum response. Specifically, the region will have a lot of high values (~255 or so), and naturally performing convolution with the template patch and this region will give you a very large output due to the operation being a weighted sum. As such, if you used the template in a dark area of the image, the output would be small - which is false because the template is also consisting of dark pixels.
However, you can certainly use the Fourier Transform to locate where the template is, but I would recommend you use Phase Correlation instead. Basically, instead of computing the multiplication of the two spectra, you compute the cross power spectrum instead. The cross power spectrum R between two signals in the frequency domain is defined as:
Source: Wikipedia
Ga and Gb are the original image and the template in frequency domain, and the * is the conjugate. The o is what is known as the Hadamard product or element-wise product. I'd also like to point out that the division of the numerator and denominator of this fraction is also element-wise. Using the cross power spectrum, if you find the (x,y) location here that produces the absolute maximum response, this is where the template should be located in the background image.
As such, you simply need to change the line of code that computes the "correlation" so that it computes the cross power spectrum instead. However, I'd like to point out something very important. When you perform normxcorr2, the correlation starts right at the top-left corner of the image. The template matching starts at this location and it gets compared with a window that is the size of the template where the top-left corner is the origin. When finding the location of the template match, the location is with respect to the top-left corner of the matched window. Once you compute normxcorr2, you traditionally add the half of the rows and half of the columns of the maximum response to find the centre location.
Because we are more or less doing the same operations for template matching (sliding windows, correlation, etc.) with the FFT / frequency domain, when you finish finding the peak in this correlation array, you must also take this into account. However, your call to imrect to draw a rectangle around where the template matches takes in the top left corner of a bounding box anyway, so there's no need to do the offset here. As such, we're going to modify that code slightly but keep the offset logic in mind when using this code for later if want to find the centre location of the match.
I've modified your code as well to read in the images directly from StackOverflow so that it's reproducible:
clear all; close all;
template = rgb2gray(imread('http://i.stack.imgur.com/6bTzT.jpg'));
background = rgb2gray(imread('http://i.stack.imgur.com/FXEy7.jpg'));
%% calculate padding
bx = size(background, 2);
by = size(background, 1);
tx = size(template, 2); % used for bbox placement
ty = size(template, 1);
%% fft
%c = real(ifft2(fft2(background) .* fft2(template, by, bx)));
%// Change - Compute the cross power spectrum
Ga = fft2(background);
Gb = fft2(template, by, bx);
c = real(ifft2((Ga.*conj(Gb))./abs(Ga.*conj(Gb))));
%% find peak correlation
[max_c, imax] = max(abs(c(:)));
[ypeak, xpeak] = find(c == max(c(:)));
figure; surf(c), shading flat; % plot correlation
%% display best match
hFig = figure;
hAx = axes;
%// New - no need to offset the coordinates anymore
%// xpeak and ypeak are already the top left corner of the matched window
position = [xpeak(1), ypeak(1), tx, ty];
imshow(background, 'Parent', hAx);
imrect(hAx, position);
With that, I get the following image:
I also get the following when showing a surface plot of the cross power spectrum:
There is a clear defined peak where the rest of the output has a very small response. That's actually a property of Phase Correlation and so obviously, the location of the maximum value is clearly defined and this is where the template is located.
Hope this helps!
Just ended up implementing the same with python with similar ideas as #rayryeng's using scipy.fftpack.fftn() / ifftn() functions with the following result on the same target and template images:
import numpy as np
import scipy.fftpack as fp
from skimage.io import imread
from skimage.color import rgb2gray, gray2rgb
import matplotlib.pylab as plt
from skimage.draw import rectangle_perimeter
im = 255*rgb2gray(imread('http://i.stack.imgur.com/FXEy7.jpg')) # target
im_tm = 255*rgb2gray(imread('http://i.stack.imgur.com/6bTzT.jpg')) # template
# FFT
F = fp.fftn(im)
F_tm = fp.fftn(im_tm, shape=im.shape)
# compute the best match location
F_cc = F * np.conj(F_tm)
c = (fp.ifftn(F_cc/np.abs(F_cc))).real
i, j = np.unravel_index(c.argmax(), c.shape)
print(i, j)
# 214 317
# draw rectangle around the best match location
im2 = (gray2rgb(im)).astype(np.uint8)
rr, cc = rectangle_perimeter((i,j), end=(i + im_tm.shape[0], j + im_tm.shape[1]), shape=im.shape)
for x in range(-2,2):
for y in range(-2,2):
im2[rr + x, cc + y] = (255,0,0)
# show the output image
plt.figure(figsize=(10,10))
plt.imshow(im2)
plt.axis('off')
plt.show()
Also, the below animation shows the result obtained while locating a bird's template image inside a set of (target) frames extracted from a video with a flock of birds.
One thing to note: the output is very much dependent on the similarity of the size and shape of the object that is to be matched with the template, if it's quite different from that of the template image, the template may not be matched at all.

Logic of this FWHM script?

Could someone explain the logic of this program.
I dont understand why the y=y/max(y)
and,
interp = (0.5-y(i-1)) / (y(i)-y(i-1));
tlead = x(i-1) + interp*(x(i)-x(i-1));
The script:
function width = fwhm(x,y)
y = y / max(y);
N = length(y);
MicroscopeMag=10;
PixelWidth=7.8; % Pixel Pitch is 7.8 Microns.
%------- find index of center (max or min) of pulse---------------%
[~,centerindex] = max(y);% 479 S10 find center peak and coordinate
%------- find index of center (max or min) of pulse-----------------%
i = 2;
while sign(y(i)-0.5) == sign(y(i-1)-0.5) %trying to see the curve raise
i = i+1; %474 S10
end %first crossing is between v(i-1) & v(i)
interp = (0.5-y(i-1)) / (y(i)-y(i-1));
tlead = x(i-1) + interp*(x(i)-x(i-1));
i=centerindex+1; %471
%------- start search for next crossing at center--------------------%
while ((sign(y(i)-0.5) == sign(y(i-1)-0.5)) && (i <= N-1))
i = i+1;
end
if i ~= N
interp = (0.5-y(i-1)) / (y(i)-y(i-1));
ttrail = x(i-1) + interp*(x(i)-x(i-1));
%width = ttrail - tlead; % FWHM
width=((ttrail - tlead)/MicroscopeMag)*PixelWidth;
% Lateral Magnification x Pixel pitch of 7.8 microns.
end
Thanks.
The two segments of code you specifically mention are both housekeeping: it's more about the compsci of it than the optics.
So the first line
y = y/max(y);
is normalising it to 1, i.e. dividing the whole series through by the maximum value. This is a fairly common practice and it's sensible to do it here, it saves the programmer from having to divide through by it later.
The next part,
interp = (0.5-y(i-1)) / (y(i)-y(i-1));
tlead = x(i-1) + interp*(x(i)-x(i-1));
and the corresponding block later on for ttrail, are about trying to interpolate the exact point(s) where the signal's value would be 0.5. Earlier it identifies the centre of the peak and the last index position before half-maximum, so now we have a range containing the leading edge of the signal.
The 'half-maximum' criterion requires us to find the point where that leading edge's value is 0.5 (we normalised to 1, so the half-maximum is by definition 0.5). The data probably won't have a sample at exactly that value - it'll go [... 0.4856 0.5024 ...] or something similar.
So these two lines are an attempt to determine in fractions of an index exactly where the line would cross the 0.5 value. It does this by simple linear interpolation:
y(i)-y(i-1)
gives us the delta_y between the two values either side, and
0.5-y(i-1)
gives us the shortfall. By taking the ratio we can linearly interpolate how far between the two index positions we should go to hit exactly 0.5.
The next line then works out the corresponding delta_x, which gives you the actual distance in terms of the timebase.
It does the same thing for the trailing edge, then uses these two interpolated values to give you a more precise value for the full-width.
To visualise this I would put a breakpoint at the i = 2 line and step through it, noting or plotting the values of y(i) as you go. stem is helpful for visualising discrete data, especially when you're working between index positions.
The program computes the resolution of a microscope using the Full Width at Half Maximum (FWHM) of the Point Spread Function (PSF) characterizing the microscope with a given objective/optics/etc.
The PSF normally looks like a gaussian:
and the FWHM tells you how good is your microscope system to discern small objects (i.e. the resolution). Let's say you are looking at 2 point objects, then the resolution (indirectly FWHM) is the minimum size those objects need to be if you are indeed to tell that there are 2 objects close to one another instead of one big object.
Now for the above function, it looks like it first compute the maximum of the PSF and then progressively goes down along the curve until it approximately reaches the half maximum. Then it's possible to compute the FWHM from the distribution of the PSF.
Hope that makes things a bit clearer!

Finding 2D area defined by contour lines in Matlab

I am having difficulty with calculating 2D area of contours produced from a Kernel Density Estimation (KDE) in Matlab. I have three variables:
X and Y = meshgrid which variable 'density' is computed over (256x256)
density = density computed from the KDE (256x256)
I run the code
contour(X,Y,density,10)
This produces the plot that is attached. For each of the 10 contour levels I would like to calculate the area. I have done this in some other platforms such as R but am having trouble figuring out the correct method / syntax in Matlab.
C = contourc(density)
I believe the above line would store all of the values of the contours allowing me to calculate the areas but I do not fully understand how these values are stored nor how to get them properly.
This little script will help you. Its general for contour. Probably working for contour3 and contourf as well, with adjustments of course.
[X,Y,Z] = peaks; %example data
% specify certain levels
clevels = [1 2 3];
C = contour(X,Y,Z,clevels);
xdata = C(1,:); %not really useful, in most cases delimters are not clear
ydata = C(2,:); %therefore further steps to determine the actual curves:
%find curves
n(1) = 1; %n: indices where the certain curves start
d(1) = ydata(1); %d: distance to the next index
ii = 1;
while true
n(ii+1) = n(ii)+d(ii)+1; %calculate index of next startpoint
if n(ii+1) > numel(xdata) %breaking condition
n(end) = []; %delete breaking point
break
end
d(ii+1) = ydata(n(ii+1)); %get next distance
ii = ii+1;
end
%which contourlevel to calculate?
value = 2; %must be member of clevels
sel = find(ismember(xdata(n),value));
idx = n(sel); %indices belonging to choice
L = ydata( n(sel) ); %length of curve array
% calculate area and plot all contours of the same level
for ii = 1:numel(idx)
x{ii} = xdata(idx(ii)+1:idx(ii)+L(ii));
y{ii} = ydata(idx(ii)+1:idx(ii)+L(ii));
figure(ii)
patch(x{ii},y{ii},'red'); %just for displaying purposes
%partial areas of all contours of the same plot
areas(ii) = polyarea(x{ii},y{ii});
end
% calculate total area of all contours of same level
totalarea = sum(areas)
Example: peaks (by Matlab)
Level value=2 are the green contours, the first loop gets all contour lines and the second loop calculates the area of all green polygons. Finally sum it up.
If you want to get all total areas of all levels I'd rather write some little functions, than using another loop. You could also consider, to plot just the level you want for each calculation. This way the contourmatrix would be much easier and you could simplify the process. If you don't have multiple shapes, I'd just specify the level with a scalar and use contour to get C for only this level, delete the first value of xdata and ydata and directly calculate the area with polyarea
Here is a similar question I posted regarding the usage of Matlab contour(...) function.
The main ideas is to properly manipulate the return variable. In your example
c = contour(X,Y,density,10)
the variable c can be returned and used for any calculation over the isolines, including area.

Find only relevant points in MATLAB

I have a MATLAB function that finds charateristic points in a sample. Unfortunatley it only works about 90% of the time. But when I know at which places in the sample I am supposed to look I can increase this to almost 100%. So I would like to know if there is a function in MATLAB that would allow me to find the range where most of my results are, so I can then recalculate my characteristic points. I have a vector which stores all the results and the right results should lie inside a range of 3% between -24.000 to 24.000. Wheras wrong results are always lower than the correct range. Unfortunatley my background in statistics is very rusty so I am not sure how this would be called.
Can somebody give me a hint what I would be looking for? Is there a function build into MATLAB that would give me the smallest possible range where e.g. 90% of the results lie.
EDIT: I am sorry if I didn't make my question clear. Everything in my vector can only range between -24.000 and 24.000. About 90% of my results will be in a range which spans approximately 1.44 ([24-(-24)]*3% = 1.44). These are very likely to be the correct results. The remaining 10% are outside of that range and always lower (why I am not sure taking then mean value is a good idea). These 10% are false and result from blips in my input data. To find the remaining 10% I want to repeat my calculations, but now I only want to check the small range.
So, my goal is to identify where my correct range lies. Delete the values I have found outside of that range. And then recalculate my values, not on a range between -24.000 and 24.000, but rather on a the small range where I already found 90% of my values.
The relevant points you're looking for are the percentiles:
% generate sample data
data = [randn(900,1) ; randn(50,1)*3 + 5; ; randn(50,1)*3 - 5];
subplot(121), hist(data)
subplot(122), boxplot(data)
% find 5th, 95th percentiles (range that contains 90% of the data)
limits = prctile(data, [5 95])
% find data in that range
reducedData = data(limits(1) < data & data < limits(2));
Other approachs exist to detect outliers, such as the IQR outlier test and the three standard deviation rule, among many others:
%% three standard deviation rule
z = 3;
bounds = z * std(data)
reducedData = data( abs(data-mean(data)) < bounds );
and
%% IQR outlier test
Q = prctile(data, [25 75]);
IQ = Q(2)-Q(1);
%a = 1.5; % mild outlier
a = 3.0; % extreme outlier
bounds = [Q(1)-a*IQ , Q(2)+a*IQ]
reducedData = data(bounds(1) < data & data < bounds(2));
BTW if you want to get the z value (|X|<z) that corresponds to 90% area under the curve, use:
area = 0.9; % two-tailed probability
z = norminv(1-(1-area)/2)
Maybe you should try mean value (in matlab: mean) and standard deviation (in matlab: std)?
What is the statistic distribution of your data?
See also this wiki page, section "Interpretation and application".
In general for almost every distribution, very useful Chebyshev's inequalities take place.
In most of the cases this should work:
meanval = mean(data)
stDev = std(data)
and probably the most (75%) of your values will be placed in range:
<meanVal - 2*stDev, meanVal + 2*stDev>
it seems like maybe you want to find the number x in [-24,24] that maximizes the number of sample points in [x,x+1.44]; probably the fastest way to do this involves a sort of the sample points, which is ultimately nlog(n) time; a cheesy approximation would be as follows:
brkpoints = linspace(-24,24-1.44,n_brkpoints); %choose n_brkpoints big, but < # of sample points?
n_count = histc(data,[brkpoints,inf]); %count # data points between breakpoints;
accbins = 1.44 / (brkpoints(2) - brkpoints(1); %# of bins to accumulate;
cscount = cumsum(n_count); %half of the boxcar sum computation;
boxsum = cscount - [zeros(accbins,1);cscount(1:end-accbins)]; %2nd half;
[dum,maxi] = max(boxsum); %which interval has the maximal # counts?
lorange = brkpoints(maxi); %the lower range;
hirange = lorange + 1.44
this solution does fudge some of the corner case stuff about the bottom and top bin, etc.
note that if you're going to go by the Chebyshev inequality route, Petunin's Inequality is probably applicable, and will give a slight boost.