MATLAB - Changepoint analysis or "findchangepts": How does it work?

I am using the function findchangepts with the 'linear' statistic, which detects changes in mean and slope. How does it decide that a change has occurred? Does it walk through consecutive points until the next point has a different mean and slope?
Mathworks has the following explanation:
If x is a vector with N elements, then findchangepts partitions x into two regions, x(1:ipt-1) and x(ipt:N), that minimize the sum of the residual (squared) error of each region from its local mean.
How does the function get ipt?
Thanks in advance!
I am working with a single vector with N elements.
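The quoted description amounts to an exhaustive search: try every candidate split, add up the residual error of the two regions, and keep the split with the smallest total. Below is a minimal brute-force sketch of that idea for a single changepoint. It is not MathWorks' implementation; the toy signal and the helper segCost are made up for illustration, and using a least-squares line as the 'linear' analogue of the local mean is an assumption based on my reading of the documentation.

```matlab
% Brute-force sketch of a single changepoint search (not MathWorks' code).
x = [1:5, 20:-2:12];              % toy signal: level and slope change at index 6
N = numel(x);

% Residual of one segment from its least-squares line ('linear' analogue).
% For the 'mean' analogue, use: segCost = @(s) sum((s - mean(s)).^2);
segCost = @(s) sum((s - polyval(polyfit(1:numel(s), s, 1), 1:numel(s))).^2);

bestCost = Inf;
ipt = NaN;
for k = 3:N-1                     % keep at least two points in each region
    % candidate partition: x(1:k-1) and x(k:N)
    cost = segCost(x(1:k-1)) + segCost(x(k:N));
    if cost < bestCost
        bestCost = cost;
        ipt = k;                  % first index of the second region
    end
end
```

So a change is not detected point by point; ipt is simply the split whose two regions are best explained by their own local fits.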

Related

Kullback Leibler Divergence of 2 Histograms in MATLAB

I would like a function to calculate the KL distance between two histograms in MATLAB. I tried this code:
http://www.mathworks.com/matlabcentral/fileexchange/13089-kldiv
It says that I should have two distributions P and Q of size n x nbins, but I am having trouble understanding how the author of the package wants me to arrange the histograms. I thought that providing the discretized values of the random variable together with the number of bins would suffice (I assumed the algorithm would use an arbitrary support to evaluate the expectations).
Any help is appreciated.
Thanks.
The function you link to requires the two histograms to be aligned and to have the same size, NBIN x N (not N x NBIN): if N > 1, the number of rows in each input must equal the number of bins in the histograms. If you are only comparing two histograms (N = 1), it doesn't really matter; you can pass either row or column vectors, as long as you are consistent and the order of the bins matches.
A generic call to the function looks like this:
dists = kldiv(bins,P,Q)
The implementation allows comparison of multiple histograms to each other (that is, N>1), in which case pairs of columns (with matching column index) in each array are compared and the result is a row vector with distances for each matching pair.
Array bins should be the same size as P and Q and is used to perform a very minimal check that the inputs are of the same size, but is not used in the computation. The routine expects bins to contain the numeric labels of your bins so that it can check for repeated bin labels and warn you if repeats occur, but otherwise doesn't use the information.
You could do away with bins and compute the distance with
KL = sum(P .* (log2(P)-log2(Q)));
without using the Matlab Central versions. However the version you link to performs the abovementioned minimal checks and in addition allows computation of two alternative distances (consult the documentation).
The version linked to by eigenchris checks whether any histogram bins are empty (which would make the computation blow up numerically) and, if there are, removes their contribution to the sum (not sure this is entirely appropriate - consult an expert on the subject). You should probably also be aware of the exact form of the formula: note the use of log2 above versus the natural logarithm in the version linked to by eigenchris.
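As a rough sketch of the direct computation, assuming you start from raw counts over the same, identically ordered bins (the counts below are made up):

```matlab
% Two histograms as raw counts over the same bins (illustrative values).
countsP = [10 20 30 40];
countsQ = [25 25 25 25];

% Normalise the counts so each histogram sums to 1.
P = countsP / sum(countsP);
Q = countsQ / sum(countsQ);

% Bins with P == 0 contribute nothing by the usual 0*log(0) = 0 convention,
% so skip them to avoid NaN. A bin with Q == 0 but P > 0 gives Inf.
keep = P > 0;
KL = sum(P(keep) .* (log2(P(keep)) - log2(Q(keep))));   % in bits
```

Swapping log2 for log would give the same distance in nats instead of bits. Note that the result is not symmetric in P and Q.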

Dividing equally a vector matlab

For MFCCs I have specified f_low and f_high, the minimum and maximum of my frequency band, and I want to compute N equally spaced mel values between these two frequencies. So I wrote:
f_low=1000;
f_high=fs/2;
filt_num=26; % number of filters
stp=round(f_high/filt_num); % step
f=f_low:stp:f_high; % my frequency vector
but I can't divide my f vector equally. Maybe there is a MATLAB function that does this, or am I missing something? Please help, and thanks in advance.
A bit of digging around leads me to believe you want a linearly spaced vector with filt_num entries, starting at f_low and ending at f_high. You should use linspace for that as follows:
f = linspace(f_low,f_high,filt_num);
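This gives what your last two lines were aiming for. Note that your version steps by round(f_high/filt_num); to get filt_num points from f_low to f_high the step would need to be (f_high-f_low)/(filt_num-1), so the colon form as written generally returns a different number of points and stops short of f_high. Keep in mind that the colon form also only works when f_high is larger than f_low. linspace does not have this issue, as it also supports descending vectors.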
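A short sketch of both spacings follows. The sampling rate fs = 16000 is an assumption (the question does not give it), and the hz2mel/mel2hz helpers use the common 2595*log10(1 + f/700) mel formula, which may not be the exact variant your MFCC code expects.

```matlab
fs = 16000;                 % assumed sampling rate, not given in the question
f_low = 1000;
f_high = fs/2;
filt_num = 26;              % number of filters

% Equally spaced in Hz: exactly filt_num points, both endpoints included.
f = linspace(f_low, f_high, filt_num);

% Equally spaced on the mel scale, then mapped back to Hz.
hz2mel = @(hz) 2595 * log10(1 + hz/700);
mel2hz = @(m)  700 * (10.^(m/2595) - 1);
m = linspace(hz2mel(f_low), hz2mel(f_high), filt_num);
f_mel = mel2hz(m);
```

Here f has a constant step in Hz, while f_mel has a constant step in mel and therefore a growing step in Hz.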

How To Do Linearly Separable Binary Classification?

I want to solve the following optimization problem:
Cost Function: 1/2 ||W||^2
Subject to : Y_i(w.X_i - b) >= 1
where X is a 700x3 matrix, Y is a vector that stores the class labels of those instances (valued 1/-1), and w.X_i is the dot product of w and X_i.
I am using CVX -
cvx_begin
variable W(3);
variable B;
minimize (0.5*W'*W)
subject to
Y'*(X*W - B) >= 1;
cvx_end
Then I plot w1.x1 + w2.x2 - b,
which does not seem to be a separating hyperplane.
What am I doing wrong?
In short:
When you plot w1.x1 + w2.x2 - b you are trying to specify a hyperplane at a particular location, which is equivalent to specifying a particular point along a vector. To do either in a 3D space you need to use all three dimensions, so: w1.x1 + w2.x2 + w3.x3 - b
In more detail:
When performing a linear classification such as this, the task can be viewed in two ways:
1. Finding a separating hyperplane such that all samples of one class are on one side, and all samples of the other class are on the other side.
2. Finding a projection of the multidimensional space which the samples are in, into a single dimensional line, such that there is a point on the line which clearly separates them.
These are identical tasks, since the single dimension in 2 is essentially how far each sample is from the separating hyperplane (and which side said sample is on). I find it helps to bear both of these viewpoints in mind, particularly since the separating hyperplane is the plane orthogonal to the single dimensional vector.
So, in the case you are dealing with, the weight vector w provided by the model is used to project the samples in matrix X onto a single dimensional line and the offset b indicates at which point along this vector the separating hyperplane occurs. By subtracting b from the projected values they are shifted such that this hyperplane is the one orthogonal to the line at point 0 which makes for simple thresholding.
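To make the two viewpoints concrete, here is a small sketch. The values of W and B are hypothetical stand-ins for whatever CVX returns, and X is a few made-up 3D samples rather than your 700x3 matrix.

```matlab
% Hypothetical solution values standing in for the CVX output.
W = [1; -2; 0.5];
B = 0.25;

% A few made-up 3D samples, one per row.
X = [ 1 0  0;
      0 1  0;
      2 1  4;
     -1 1 -2];

% Viewpoint 2: project onto w and shift by b, then threshold at 0.
d = X*W - B;
labels = sign(d);

% Viewpoint 1: the plane itself, solving w1*x1 + w2*x2 + w3*x3 - b = 0 for x3.
[x1, x2] = meshgrid(linspace(-2, 2, 5));
x3 = (B - W(1)*x1 - W(2)*x2) / W(3);
onPlane = W(1)*x1 + W(2)*x2 + W(3)*x3 - B;   % zero (up to rounding) everywhere
% surf(x1, x2, x3) would draw the plane; add the samples with plot3.
```

Separately, the constraint as posted, Y'*(X*W - B) >= 1, is a single scalar inequality on the sum over all samples. The per-sample form from the problem statement would be Y.*(X*W - B) >= 1.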

How to remove the outliers located outside the prediction bound in Matlab?

Hey guys, I have a question about processing time series. I have xy data and want to remove the outliers, which I define as the points located outside the prediction bound. I applied the regress function, [B, Bint, R, Rint, stats] = regress(y, x); but I am confused about how to remove those points.
Any help?
Straight from the docs
[b,bint,r,rint] = regress(y,X) returns an n-by-2 matrix rint of
intervals that can be used to diagnose outliers. If the interval
rint(i,:) for observation i does not contain zero, the corresponding
residual is larger than expected in 95% of new observations,
suggesting an outlier.
Therefore, to find the location of outliers in your data, it should be just:
n = rint(:,1)>0|rint(:,2)<0;
Then you can either remove them, plot them in a different colour, or whatever.
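A minimal sketch of the removal step follows. The rint matrix here is made up so the example runs without the Statistics Toolbox; in practice it would come from the regress call shown in the comment.

```matlab
% Made-up data; in practice rint would come from something like
%   [b, bint, r, rint] = regress(y, [ones(size(x)) x]);
% (the column of ones gives the model a constant term).
x = (1:4)';
y = [2; 9; -5; 8];
rint = [-1.0  1.0;
         0.5  2.0;
        -3.0 -0.2;
        -0.1  0.4];

% An observation is flagged when its residual interval excludes zero.
isOutlier = rint(:,1) > 0 | rint(:,2) < 0;

% Logical indexing keeps only the rows that were not flagged.
xClean = x(~isOutlier);
yClean = y(~isOutlier);
```

If you refit after removing points, the intervals change, so a point that passed the first time may be flagged on the second pass.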

Neighbouring nodes and separation in matlab

In Matlab, I have an adjacency matrix and, using a function, I would like to plot a histogram showing the degrees of separation between 2 given nodes (up to 10).
As of now I only have a function that finds a node's neighbours. Basically it will be similar to the notion of 6 degrees of separation, except with 10.
Thanks!
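function n = neighbour(A, v)
% Return the neighbours of the nodes in v, excluding the nodes in v themselves.
n = [];
for i = 1:length(v)
    % nodes joined to v(i) by an edge in either direction
    a = find(A(:,v(i)) + A(v(i),:)');
    n = union(n, a(:)');   % accumulate instead of overwriting on each pass
end
n = setdiff(n, v);
end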
In general, this is solved using the Floyd–Warshall algorithm, which computes the shortest paths between all pairs of nodes in a graph.
Since you're using Matlab and because the distance between any two connected nodes is always the same ("1 step"), you could use a trick that involves matrix multiplication: if you have an adjacency matrix A, then raising A to the Nth power gives you a new matrix that tells you how many paths of length N exist between each pair of nodes. So, in a loop, raise A to the 1st power, the 2nd power, etc, and note at which power each element first becomes nonzero. A shortest path has at most (number of nodes - 1) steps, so you can stop there.
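A rough sketch of that loop on a made-up 5-node graph is below. The 10-step cap comes from the question. Thresholding the product back to 0/1 on every pass is my own addition so the path counts cannot grow without bound; it does not change which entries are nonzero.

```matlab
% Made-up undirected graph: path 1-2-3-4, node 5 isolated.
A = [0 1 0 0 0;
     1 0 1 0 0;
     0 1 0 1 0;
     0 0 1 0 0;
     0 0 0 0 0];
n = size(A, 1);
maxSep = 10;                      % cap from the question

D = Inf(n);                       % D(i,j) = degrees of separation, Inf if unknown
D(1:n+1:end) = 0;                 % a node is 0 steps from itself
P = eye(n);
for k = 1:min(maxSep, n-1)
    P = double((P * A) > 0);      % 1 where a walk of exactly k steps exists
    newly = (P > 0) & isinf(D);   % pairs reached for the first time at step k
    D(newly) = k;
end

% Histogram over distinct pairs (upper triangle), ignoring unreachable ones.
sep = D(triu(true(n), 1));
sep = sep(isfinite(sep));
counts = accumarray(sep, 1);      % counts(k) = number of pairs k steps apart
% bar(counts) would plot the histogram.
```

For the separation between two given nodes i and j, read off D(i,j).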
A Scale-Free Network Visualization, including a histogram of the degrees of separation, can be found in this link; it might be helpful...