I have handwritten samples from two writers and am using a feature extractor to extract features from both.
I want to display the similarity between the two classes, to show how alike they are and how difficult it can be for a classifier to tell them apart.
I have read papers that use PCA to demonstrate this. I tried PCA, but I don't think I'm implementing it correctly. I'm using this to display the similarity:
[COEFF,SCORE] = princomp(features_extracted);
plot(COEFF,'.')
But for every class and every sample I get exactly the same plot. They should be similar, not identical. What am I doing wrong?
You will struggle to show anything significant with only 10 samples per class and over 4000 features.
Nevertheless, the following code will calculate PCA and show the relationship between the first two principal components (the components that contain 'most' of the variance).
% Truly indistinguishable data
dummy_data = randn(20, 4000);
% Uncomment this to make the data distinguishable
%dummy_data(1:10, :) = dummy_data(1:10, :) - 0.5;
% Normalise the data - this isn't technically required for the dummy data
% above, but is included for completeness.
dummy_data_normalised = dummy_data;
for f = 1:size(dummy_data_normalised, 2)
dummy_data_normalised(:, f) = dummy_data_normalised(:, f) - nanmean(dummy_data_normalised(:, f));
dummy_data_normalised(:, f) = dummy_data_normalised(:, f) / nanstd(dummy_data_normalised(:, f));
end
% Generate vector of 10 0's and 10 1's
class_labels = reshape(repmat([0 1], 10, 1), 20, 1);
% Perform PCA
pca_coeffs = pca(dummy_data_normalised);
% Calculate transformed data
dummy_data_pca = dummy_data_normalised * pca_coeffs;
figure;
hold on;
for class = unique(class_labels)'
% Plot first two components for this class
scatter(dummy_data_pca(class_labels == class, 1), dummy_data_pca(class_labels == class, 2), 'filled')
end
legend(strcat({'Class '},int2str(unique(class_labels)))')
For indistinguishable data, this will show a scatter plot similar to the following:
Clearly it is not possible to draw a separation boundary between the two classes.
If you uncomment the marked line to make the data distinguishable, then the plot will instead come out as follows:
However, to repeat what I wrote in my comment, PCA does not necessarily find the components that give the best separation: it is an unsupervised method and only finds the directions of largest variance. In some applications these also happen to be the components that give good separation. With only 10 samples per class, you will not be able to demonstrate anything statistically significant. Also have a look at this question for more details on PCA and the number of samples per class.
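To see how much of the total variance those first two components actually capture, pca's third output gives the per-component variances (a small sketch of mine; latent holds the eigenvalues of the covariance matrix):
[~, ~, latent] = pca(dummy_data_normalised);
explained = 100 * latent / sum(latent); % percentage of variance per component
fprintf('PC1: %.1f%%, PC2: %.1f%% of total variance\n', explained(1), explained(2));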
EDIT: This also extends naturally to having more classes:
number_of_classes = 10;
samples_per_class = 20;
% Truly indistinguishable data
dummy_data = randn(number_of_classes * samples_per_class, 4000);
% Make the data distinguishable
for i = 1:number_of_classes
idx = (((i - 1) * samples_per_class) + 1):(i * samples_per_class);
dummy_data(idx, :) = dummy_data(idx, :) - (0.5 * (i - 1));
end
% Normalise the data
dummy_data_normalised = dummy_data;
for f = 1:size(dummy_data_normalised, 2)
dummy_data_normalised(:, f) = dummy_data_normalised(:, f) - nanmean(dummy_data_normalised(:, f));
dummy_data_normalised(:, f) = dummy_data_normalised(:, f) / nanstd(dummy_data_normalised(:, f));
end
% Generate vector of classes (1 to number_of_classes)
class_labels = reshape(repmat(1:number_of_classes, samples_per_class, 1), number_of_classes * samples_per_class, 1);
% Perform PCA
pca_coeffs = pca(dummy_data_normalised);
% Calculate transformed data
dummy_data_pca = dummy_data_normalised * pca_coeffs;
figure;
hold on;
for class = unique(class_labels)'
% Plot first two components for this class
scatter(dummy_data_pca(class_labels == class, 1), dummy_data_pca(class_labels == class, 2), 'filled')
end
legend(strcat({'Class '},int2str(unique(class_labels)))')
How can I plot a bar chart out of data = 1x10 cell, where each value in the cell has a different dimension, like 3x100, 3x40, 66x2, etc.?
My goal is a bar plot with 10 groups of bars and, in every group, three bars, one for each of the values. Each bar should show the median of the values, and I want to calculate the confidence interval and show it additionally.
In this example there are no groups of bars, but my point is to show how I want the confidence intervals displayed. On the site where I found this example they offer a solution with this command line
e1 = errorbar(mean(data), ci95);
but my problem is that MATLAB can't find any ci95.
So, are there any other effective ways to do this, without installing or downloading additional packages?
I've found that Patrick Happel's answer does not work, because the figure window (and therefore the variable b) gets cleared out by subsequent calls to errorbar. Simply adding a hold on command takes care of this. To avoid confusion, here's a new answer that reproduces all of Patrick's original code, plus my small tweak:
%% Old answer
%Just to be safe, let's clear everything
clear all
data = cell(1,10);
% Random length of the data
l = randi(500, 10, 1) + 50;
% Random "width" of the data, with 3 more likely
w = randi(4, 10, 1);
w(w==4) = 3;
% random "direction" of the data
d = randi(2, 10, 1);
% sigma of the data (in fraction of mean)
sigma = rand(10,1) / 3;
% means of the data
dmean = randi(150,10,1);
dsigma = dmean.*sigma;
for c = 1 : 10
if d(c) == 1
data{c} = randn(l(c), w(c)) .* dsigma(c) + dmean(c);
else
data{c} = randn(w(c), l(c)) .* dsigma(c) + dmean(c);
end
end
%============================================
%Next thing is
% On the bar, I want it to be shown the median of the values, and I
% want to calculate the confidence interval and show it additionally.
%
%Are you really sure you want to plot the median? The median of some data
%is not connected to the variance of the data, and thus no type of error
%bars are required. I guess you want to show the mean. If you really want
%to show the median, a box plot might be a better alternative.
%
%The following code computes and plots the mean in a bar plot:
%============================================
means = zeros(numel(data),3);
stds = zeros(numel(data),3);
n = zeros(numel(data),3);
for c = 1:numel(data)
d = data{c};
if size(d,1) < size(d,2)
d = d';
end
cols = size(d,2);
means(c, 1:cols) = nanmean(d);
stds(c, 1:cols) = nanstd(d);
n(c, 1:cols) = sum(~isnan((d)));
end
b = bar(means);
%% New code
%This ensures that b continues to reference existing data in the next for
%loop, as the graphics objects can otherwise be deleted.
hold on
%% Continuing Patrick Happel's answer
%============================================
%Now, we need to compute the length of the error bars. Typical choices are
%the standard deviation of the data (already computed by the code above,
%stored in stds), the standard error or the 95% confidence interval (which
%is 1.96 times the standard error, assuming the underlying data
%follows a normal distribution).
%============================================
% for standard deviation use stds
% for standard error
ste = stds./sqrt(n);
% for 95% confidence interval
ci95 = 1.96 * ste;
%============================================
%Last thing is to plot the error bars. Here I chose the ci95 as you asked
%in your question; if you want to change that, simply change the variable
%in the call to errorbar:
%============================================
for c = 1:3
% (debug) sanity-check that the sizes match:
% size(means(:, c)), size(b(c).XData)
e = errorbar(b(c).XData + b(c).XOffset, means(:,c), ci95(:, c));
e.LineStyle = 'none';
end
Since I am not sure what your data looks like, because in your question you stated that the elements of the cell contain data with different dimensions, like
3x100, 3x40, 66x2
I assume that your data can be arranged in columns or rows and that not all data requires three bars.
Since you did not provide a short piece of your data for us to test, I generated some artificial data:
data = cell(1,10);
% Random length of the data
l = randi(500, 10, 1) + 50;
% Random "width" of the data, with 3 more likely
w = randi(4, 10, 1);
w(w==4) = 3;
% random "direction" of the data
d = randi(2, 10, 1);
% sigma of the data (in fraction of mean)
sigma = rand(10,1) / 3;
% means of the data
dmean = randi(150,10,1);
dsigma = dmean.*sigma;
for c = 1 : 10
if d(c) == 1
data{c} = randn(l(c), w(c)) .* dsigma(c) + dmean(c);
else
data{c} = randn(w(c), l(c)) .* dsigma(c) + dmean(c);
end
end
Next thing is
On the bar, I want it to be shown the median of the values, and I want to calculate the confidence interval and show it additionally.
Are you really sure you want to plot the median? The median of some data is not connected to the variance of the data, and thus no type of error bars is required. I guess you want to show the mean. If you really want to show the median, a box plot might be a better alternative.
The following code computes and plots the mean in a bar plot:
means = zeros(numel(data),3);
stds = zeros(numel(data),3);
n = zeros(numel(data),3);
for c = 1:numel(data)
d = data{c};
if size(d,1) < size(d,2)
d = d';
end
cols = size(d,2);
means(c, 1:cols) = nanmean(d);
stds(c, 1:cols) = nanstd(d);
n(c, 1:cols) = sum(~isnan((d)));
end
b = bar(means);
Now we need to compute the length of the error bars. Typical choices are the standard deviation of the data (already computed by the code above and stored in stds), the standard error, or the 95% confidence interval (which is 1.96 times the standard error, assuming the underlying data follows a normal distribution).
% for standard deviation use stds
% for standard error
ste = stds./sqrt(n);
% for 95% confidence interval
ci95 = 1.96 * ste;
The last thing is to plot the error bars. Here I chose the ci95 as you asked in your question; if you want to change that, simply change the variable in the call to errorbar:
for c = 1:3
% (debug) sanity-check that the sizes match:
% size(means(:, c)), size(b(c).XData)
e = errorbar(b(c).XData + b(c).XOffset, means(:,c), ci95(:, c));
e.LineStyle = 'none';
end
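If you do decide to show medians instead, here is a minimal box-plot sketch (an assumption on my part, not part of the answer above: it uses boxplot from the Statistics Toolbox and pools each cell entry into a single group rather than three bars per group):
% Flatten the cell into one vector plus a group index so boxplot can group it
vals = [];
grp = [];
for c = 1:numel(data)
d = data{c}(:);
vals = [vals; d]; %#ok<AGROW>
grp = [grp; c * ones(numel(d), 1)]; %#ok<AGROW>
end
figure;
boxplot(vals, grp); % one box per cell entry, with the median marked
xlabel('Cell index');
ylabel('Value');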
I'm trying to estimate the (unknown) original datapoints that went into calculating a (known) moving average. However, I do know some of the original datapoints, and I'm not sure how to use that information.
I am using the method given in the answers here: https://stats.stackexchange.com/questions/67907/extract-data-points-from-moving-average, but in MATLAB (my code below). This method works quite well for large numbers of data points (>1000), but less well with fewer data points, as you'd expect.
window = 3;
datapoints = 150;
data = 3*rand(1,datapoints)+50;
moving_averages = zeros(1, datapoints); % first window-1 entries stay zero
for i = window:size(data,2)
moving_averages(i) = mean(data(i+1-window:i));
end
len = size(moving_averages,2)+(window-1); % avoid shadowing the built-in length()
a = (tril(ones(len,len),window-1) - tril(ones(len,len),-1))/window;
a = a(1:len-(window-1),:);
ai = pinv(a);
daily = mtimes(ai,moving_averages');
x = 1:size(data,2);
figure(1)
hold on
plot(x,data,'Color','b');
plot(x(window:end),moving_averages(window:end),'Linewidth',2,'Color','r');
plot(x,daily(window:end),'Color','g');
hold off
axis([0 size(x,2) min(daily(window:end))-1 max(daily(window:end))+1])
legend('original data','moving average','back-calculated')
Now, say I know a smattering of the original data points. I'm having trouble figuring out how I might use that information to reconstruct the rest more accurately. Thank you for any assistance.
You should be able to calculate the original data exactly if at any point you can determine one window's worth of data, i.e. in this case n-1 known samples within a window of length n. (In your case) if you know A, B and (A+B+C)/3, you can solve for C. Now when you have the next moving average (B+C+D)/3, you can solve exactly for D. Rinse and repeat. This logic works going backwards too.
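A minimal sketch of that recursion (my own illustration, assuming the first window-1 samples are known exactly, and using movmean, available since R2016a, to build the trailing averages):
win = 3;
data = rand(1, 10); % hypothetical original data
m = movmean(data, [win-1 0]); % trailing moving average
m = m(win:end); % keep only averages over full windows
recovered = nan(1, numel(data));
recovered(1:win-1) = data(1:win-1); % the known samples
for k = win:numel(data)
% m(k-win+1) = mean(recovered(k-win+1:k)) => solve for the newest sample
recovered(k) = win * m(k-win+1) - sum(recovered(k-win+1:k-1));
end
max(abs(recovered - data)) % ~0 up to rounding error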
Here is an example with the same idea:
% the actual vector of values
a = cumsum(rand(150,1) - 0.5);
% compute moving average
win = 3; % sliding window length
idx = hankel(1:win, win:numel(a));
m = mean(a(idx));
% coefficient matrix: m(i) = sum(a(i:i+win-1))/win
A = repmat([ones(1,win) zeros(1,numel(a)-win)], numel(a)-win+1, 1);
for i=2:size(A,1)
A(i,:) = circshift(A(i-1,:), [0 1]);
end
A = A / win;
% solve linear system
%x = A \ m(:);
x = pinv(A) * m(:);
% plot and compare
subplot(211), plot(1:numel(a),a, 1:numel(m),m)
legend({'original','moving average'})
title(sprintf('length = %d, window = %d',numel(a),win))
subplot(212), plot(1:numel(a),a, 1:numel(a),x)
legend({'original','reconstructed'})
title(sprintf('error = %f',norm(x(:)-a(:))))
You can see the reconstruction error is very small, even using the data sizes in your example (150 samples with a 3-samples moving average).
I have a dataset containing two vectors of points, X and Y, that represent measurements of an "exponential-like" phenomenon (i.e. Y = A*exp(B*x)). When I fit it with an exponential equation I get a nice-looking fit, but when I use it to compute things it turns out that the fit is not quite as accurate as I would hope.
Currently my most promising idea is a piecewise exponential fit (taking about 6 (x,y) pairs each time), which seems to provide better results in the cases I tested manually. Here's a diagram to illustrate what I mean to do:
// Assuming a window of size WS=4:
- - - - - - - - - - - - //the entire X span of the data
[- - - -]- - // the fit that should be evaluated for X(1)<= x < X(floor(WS/2)+1)
-[- - - -]- // the fit that should be evaluated for X(3)<= x < X(4)
...
- - - - - -[- - - -]- - // X(8)<= x < X(9)
... //and so on
Some Considerations:
I considered filtering the data before fitting, but this is tricky since I don't really know anything about the type of noise it contains.
I would like the piecewise fit (including all different cases) to be accessible using a single function handle. It's very similar to MATLAB's Shape-preserving "PCHIP" interpolant, only that it should use an exponential function instead.
The creation of the fit does not need to happen during the runtime of another code. I even prefer to create it separately.
I'm not worried about the potential unsmoothness of the final function.
The only way of going about this I could think of is defining a .m file as explained in Fit a Curve Defined by a File, but that would require manually writing conditions for almost as many cases as I have points (or, obviously, writing code that generates this code for me, which is also a considerable effort).
Relevant code snippets:
clear variables; clc;
%% // Definitions
CONSTS.N_PARAMETERS_IN_MODEL = 2; %// For the model y = A*exp(B*x);
CONSTS.WINDOW_SIZE = 4;
assert(~mod(CONSTS.WINDOW_SIZE,2) && CONSTS.WINDOW_SIZE>0,...
'WINDOW_SIZE should be a natural even number.');
%% // Example Data
X = [0.002723682418630,0.002687088539567,0.002634005004610,0.002582978173834,...
0.002530684550171,0.002462144527884,0.002397219225698,0.002341097974950,...
0.002287544321171,0.002238889510803]';
Y = [0.178923076435990,0.213320004768074,0.263918364216839,0.324208349386613,...
0.394340431220509,0.511466688684182,0.671285738221314,0.843849959919278,...
1.070756756433334,1.292800046096531]';
assert(CONSTS.WINDOW_SIZE <= length(X),...
'WINDOW_SIZE cannot be larger than the amount of points.');
X = flipud(X); Y = flipud(Y); % ascending sorting is needed later for histc
%% // Initialization
nFits = length(X)-CONSTS.WINDOW_SIZE+1;
coeffMat(nFits,CONSTS.N_PARAMETERS_IN_MODEL) = 0; %// Preallocation
ft = fittype( 'exp1' );
opts = fitoptions( 'Method', 'NonlinearLeastSquares' );
%% // Populate coefficient matrix
for ind1 = 1:nFits
[xData, yData] = prepareCurveData(...
X(ind1:ind1+CONSTS.WINDOW_SIZE-1),Y(ind1:ind1+CONSTS.WINDOW_SIZE-1));
%// Fit model to data:
fitresult = fit( xData, yData, ft, opts );
%// Save coefficients:
coeffMat(ind1,:) = coeffvalues(fitresult);
end
clear ft opts ind1 xData yData fitresult ans
%% // Get the transition points
xTrans = [-inf; X(CONSTS.WINDOW_SIZE/2+1:end-CONSTS.WINDOW_SIZE/2); inf];
At this point, xTrans and coeffMat contain all the required information to evaluate the fits. To show this we can look at a vector of some test data:
testPts = [X(1), X(1)/2, mean(X(4:5)), X(CONSTS.WINDOW_SIZE)*1.01,2*X(end)];
...and finally the following should roughly happen internally within the handle:
%% // Get the correct fit# to be evaluated:
if ~isempty(xTrans)
rightFit = find(histc(testPts(3),xTrans));
else %// The case of nFits==1
rightFit = 1;
end
%% // Actually evaluate the right fit
f = @(x,A,B)A*exp(B*x);
yy = f(testPts(3),coeffMat(rightFit,1),coeffMat(rightFit,2));
And so my problem is: how do I hold that last bit (along with all the fit coefficients) inside a single handle that accepts an arbitrarily-sized input of points to interpolate on?
Related resources:
stackoverflow.com/questions/16777921/matlab-curve-fitting-exponential-vs-linear/
It's not all clear, but why not put things into a class?
classdef Piecewise < handle
methods
% Construction
function [this] = Piecewise(xmeas, ymeas)
... here compute xTrans and coeffMat...
end
% Interpolation
function [yinterp] = Evaluate(this, xinterp)
... Here use previously computed xTrans and coeffMat ...
end
end
properties(SetAccess = private, GetAccess = private)
xTrans;
coeffMat;
end
end
In this way you can precompute the xTrans vector and coeffMat matrix during construction and then later reuse these properties when you need to evaluate the interpolant at xinterp values in the Evaluate method.
% The real measured data
xmeas = ...
ymeas = ...
% Build piecewise interpolation object
piecewise = Piecewise(xmeas, ymeas);
% Rebuild curve at any new positions
xinterp = ...
yinterp = piecewise.Evaluate(xinterp);
Function-handle-like syntax
If you truly prefer to have function-handle-like syntax, you can still use the above Piecewise object internally and add an extra static method that returns it wrapped in a function handle:
classdef Piecewise < handle
... see code above...
methods(Static=true)
function [f] = MakeHandle(xmeas, ymeas)
%[
obj = Piecewise(xmeas, ymeas);
f = @(x)obj.Evaluate(x);
%]
end
end
end
This can be used like this:
f = Piecewise.MakeHandle(xmeas, ymeas);
yinterp = f(xinterp);
PS1: You can later make the Evaluate and Piecewise constructor methods private if you absolutely want to force this syntax.
PS2: To fully hide the object-oriented design, you can turn MakeHandle into a fully classic routine (it will work the same as the static method, and users won't have to type Piecewise. in front of MakeHandle).
A last solution without OOP
Put everything in a single .m file:
function [f] = MakeHandle(xmeas, ymeas)
... Here compute xTrans and coeffMat ...
f = @(x)compute(x, xTrans, coeffMat); % Passing xTrans/coeffMat as hidden parameters
end
function [yinterp] = compute(xinterp, xTrans, coeffMat)
... do interpolation here...
end
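For completeness, a minimal sketch of what the compute body might look like, reusing the histc-based lookup from the question (an assumption of mine, not CitizenInsane's code; xTrans and coeffMat as defined in the question):
function [yinterp] = compute(xinterp, xTrans, coeffMat)
% Pick the right exponential fit for each query point and evaluate it
yinterp = nan(size(xinterp));
for k = 1:numel(xinterp)
rightFit = find(histc(xinterp(k), xTrans)); % bin index = fit index
yinterp(k) = coeffMat(rightFit, 1) * exp(coeffMat(rightFit, 2) * xinterp(k));
end
end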
As an extension of CitizenInsane's answer, the following is an alternative approach that allows "handle-like" access to the inner Evaluate function.
function b = subsref(this,s)
switch s(1).type
case '()'
xval = s.subs{:};
b = this.Evaluate(xval);
otherwise %// Fallback to the default behavior for '.' and '{}':
b = builtin('subsref',this,s);
end
end
References: docs1, docs2, docs3, question1
P.S. docs2 is referenced because I initially tried to do subsref@handle (which calls the superclass method, as one would expect in OOP when overriding methods in a subclass), but this doesn't work in MATLAB, which instead requires builtin() to achieve the same functionality.
Assume a noiseless AR(1) process y(t) = a*y(t-1). I have the following conceptual questions and would be glad for some clarification.
Q1 - Discrepancy between mathematical formulation and implementation - The mathematical formulation of an AR model is y(t) = -sum_{i=1}^{p} a_i*y(t-i) + eta(t), where p is the model order and eta(t) is white Gaussian noise. But when estimating the coefficients using any method, like arburg() or least squares, we simply call that function; I do not know whether white Gaussian noise is implicitly added. Then, when we resolve the AR equation with the estimated coefficients, I have seen that neither the negative sign is considered nor the noise term added.
What is the correct representation of an AR model, and how do I find the average coefficients over k trials when I have only a single sample of 1000 data points?
Q2 - Coding problem: how to simulate fitted_data for k trials and then find the residuals - I fitted data generated from an unknown system and obtained the coefficient with
load('data.txt');
for trials = 1:10
model = ar(data,1,'ls');
original_data=data;
fitted_data(i)=coeff1*data(i-1); % OR
data(i)=coeff1*data(i-1);
fitted_data=data;
residual= original_data - fitted_data;
plot(original_data,'r'); hold on; plot(fitted_data);
end
When calculating the residual, is fitted_data obtained as above, by resolving the AR equation with the obtained coefficients? MATLAB has a function for doing this, but I wanted to make my own. So, after finding the coefficients from the original data, how do I resolve the equation? The code above is incorrect. Attached is the plot of the original data and the fitted_data.
If your model is simply y(n) = a*y(n-1) with scalar a, then here is the solution.
y = randn(10, 1);
a = y(1 : end - 1) \ y(2 : end);
y_estim = a * y(1 : end - 1); % one-step-ahead predictions of y(2:end)
residual = y(2 : end) - y_estim;
Of course, you should separate the data into train and test sets, and apply a to the test data. You can generalize this approach to y(n) = a*y(n-1) + b*y(n-2), etc.
Note that \ represents mldivide() function: mldivide
Edit:
% model: y(n) = c + a*y(n-1) + b*y(n-2) + ... + z*y(n-n_order)
n_order = 3;
allow_offset = true; % allows c in the model
% train
y_train = randn(20,1); % from your data
[y_in, y_out] = shifted_input(y_train, n_order, allow_offset);
a = y_in \ y_out;
% now test
y_test = randn(20,1); % from your data
[y_in, y_out] = shifted_input(y_test, n_order, allow_offset);
y_estim = y_in * a; % same a
residual = y_out - y_estim;
here is shifted_input():
function [y_in, y_out] = shifted_input(y, n_order, allow_offset)
y_out = y(n_order + 1 : end);
n_rows = size(y, 1) - n_order;
y_in = nan(n_rows, n_order);
for k = 1 : n_order
y_in(:, k) = y(1 : n_rows);
y = circshift(y, -1);
end
if allow_offset
y_in = [y_in, ones(n_rows, 1)];
end
return
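As a quick round-trip check (a sketch of mine, not part of the original answer): feed a noiseless AR(2) series through shifted_input and confirm that the least-squares fit recovers the coefficients with near-zero residuals. Note the column order: oldest lag first.
N = 200;
y = zeros(N, 1);
y(1:2) = [1; 0.5];
for n = 3:N
y(n) = 0.6 * y(n-1) - 0.2 * y(n-2); % exact AR(2), no noise
end
[y_in, y_out] = shifted_input(y, 2, false); % false: no offset term c
a = y_in \ y_out; % approx [-0.2; 0.6] (coefficient of y(n-2) first)
residual = y_out - y_in * a;
max(abs(residual)) % ~0 for noiseless data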
AR-type models can serve a number of purposes, including linear prediction, linear predictive coding, and noise filtering. The eta(t) are not something we are interested in retaining; rather, part of the point of the algorithms is to remove their influence to any extent possible by looking for persistent patterns in the data.
I have textbooks that, in the context of linear prediction, do not include the negative sign that your expression has before the sum. On the other hand, MATLAB's function lpc does:
Xp(n) = -A(2)*X(n-1) - A(3)*X(n-2) - ... - A(N+1)*X(n-N)
I recommend you look at the function lpc if you haven't already, and at the examples from its documentation, such as the following:
randn('state',0); % legacy seeding, as in the doc example; use rng(0) on newer releases
noise = randn(50000,1); % Normalized white Gaussian noise
x = filter(1,[1 1/2 1/3 1/4],noise);
x = x(45904:50000);
% Compute the predictor coefficients, estimated signal, prediction error, and autocorrelation sequence of the prediction error:
p = lpc(x,3);
est_x = filter([0 -p(2:end)],1,x); % Estimated signal
e = x - est_x; % Prediction error
[acs,lags] = xcorr(e,'coeff'); % ACS of prediction error
The estimated x is computed as est_x. Note how the example uses filter. Quoting the MATLAB documentation again, filter(b,a,x) is a "Direct Form II Transposed" implementation of the standard difference equation:
a(1)*y(n) = b(1)*x(n) + b(2)*x(n-1) + ... + b(nb+1)*x(n-nb)
- a(2)*y(n-1) - ... - a(na+1)*y(n-na)
which means that in the prior example est_x(n) is computed as
est_x(n) = -p(2)*x(n-1) -p(3)*x(n-2) -p(4)*x(n-3)
which is what you expect!
Edit:
As regards the function ar, the MATLAB documentation explains that the output coefficients have the same meaning as in the lpc scenario discussed above.
The right way to evaluate the output of the AR model is to compute
data_armod(i)= -coeff(2)*data(i-1) -coeff(3)*data(i-2) -coeff(4)*data(i-3)
where coeff is the coefficient vector returned by
model = ar(data,3,'ls');
coeff = model.a;
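Equivalently, since this matches the lpc convention above, filter can carry out the whole recursion in one call (a sketch, assuming coeff = model.a and data as in the question):
% One-step-ahead prediction with the estimated AR(3) coefficients
data_armod = filter([0 -coeff(2:end)], 1, data);
residual = data - data_armod;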
I am using Gonzalez's frdescp function to get the Fourier descriptors of a boundary. With the code below, I get two totally different sets of numbers describing two identical shapes that differ only in scale.
So what is wrong?
im = imread('c:\classes\a1.png');
im = im2bw(im);
b = bwboundaries(im);
f = frdescp(b{1}); % fourier descriptors for the boundary of the first object (my pic only contains one object anyway)
% Normalization
f = f(2:20); % getting the first 20 & deleting the dc component
f = abs(f) ;
f = f/f(1);
Why do I get different descriptors for two identical circles that differ only in scale?
The problem is that the frdescp code (I used this code, which should be the same as the one you referred to) is also written to center the Fourier descriptors.
If you want to describe your shape correctly, it is mandatory to retain some descriptors that are symmetric with respect to the one representing the DC component.
The following image summarizes the concept:
In order to solve your problem (and others like yours), I wrote the following two functions:
function descriptors = fourierdescriptor( boundary )
%I assume that the boundary is a N x 2 matrix
%Also, N must be an even number
np = size(boundary, 1);
s = boundary(:, 1) + 1i*boundary(:, 2);
descriptors = fft(s);
descriptors = [descriptors((1+(np/2)):end); descriptors(1:np/2)];
end
function significativedescriptors = getsignificativedescriptors( alldescriptors, num )
%num is the number of significative descriptors (in your example, it was 20)
%In the following, I assume that num and size(alldescriptors,1) are even numbers
dim = size(alldescriptors, 1);
if num >= dim
significativedescriptors = alldescriptors;
else
a = (dim/2 - num/2) + 1;
b = dim/2 + num/2;
significativedescriptors = alldescriptors(a : b);
end
end
Now, you can use the above functions as follows:
im = imread('test.jpg');
im = im2bw(im);
b = bwboundaries(im);
b = b{1};
%force the number of boundary points to be even
if mod(size(b,1), 2) ~= 0
b = [b; b(end, :)];
end
%define the number of significative descriptors I want to extract (it must be even)
numdescr = 20;
%Now, you can extract all fourier descriptors...
f = fourierdescriptor(b);
%...and get only the most significative:
f_sign = getsignificativedescriptors(f, numdescr);
I just went through the same problem as you.
According to this link, if you want invariance to scaling, make the comparison ratio-like, for example by dividing every Fourier coefficient by the DC coefficient: f*_1 = f_1/f_0, f*_2 = f_2/f_0, and so on. Thus you need the DC coefficient, but after your step "f = f(2:20); % getting the first 20 & deleting the dc component", the f(1) in your code is no longer the actual DC coefficient. The problem can be solved by keeping the value of the DC coefficient first; the adjusted code should look as follows:
% Normalization
DC = f(1);
f = f(2:20); % getting the first 20 & deleting the dc component
f = abs(f) ; % use magnitudes to be invariant to translation & rotation
f = f/DC; % divide the Fourier coefficients by the DC-coefficient to be invariant to scale
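As a quick sanity check of this normalization (a sketch of mine with a hypothetical boundary: a circle of 64 points centered at (2,3)), the normalized magnitudes come out identical for the shape and a scaled copy:
theta = linspace(0, 2*pi, 65)';
theta(end) = []; % 64 evenly spaced boundary points
for scale = [1 2]
s = scale * ((2 + cos(theta)) + 1i * (3 + sin(theta)));
f = fft(s);
DC = f(1); % nonzero because the shape is not centered at the origin
g = abs(f(2:20)) / abs(DC); % scale-invariant magnitudes
disp(g(1:5)'); % prints the same values for both scales
end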