I would like to apply the Mahalanobis distance method to data obtained from observations.
Each observation is a time response of the system. I have 30 observations of 14000 points each.
I would like to use the mahal command in MATLAB, but it tells me that the number of rows in X must be greater than the number of columns. By the nature of my data, however, each observation is 1 row with 14000 columns (time points).
I don't know how to overcome this problem.
If anybody knows, please help me.
You can't do that. The Mahalanobis distance of a point x (a row vector) from a group of values with mean mu and covariance matrix sigma is defined as sqrt((x-mu)*sigma^-1*(x-mu)'). If sigma is not invertible, the Mahalanobis distance is not defined, and that is the case here: a covariance matrix estimated from 30 observations has rank at most 29, so the 14000-by-14000 sigma is singular.
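To see the rank argument concretely, here is a minimal sketch using random data. The 30-by-200 size is an assumption chosen to keep the example small; the same reasoning applies to 30-by-14000.

```matlab
% Hypothetical data: 30 observations (rows) of 200 variables (columns).
n_obs = 30;
n_var = 200;
X = randn(n_obs, n_var);

C = cov(X);      % n_var-by-n_var sample covariance matrix
r = rank(C);     % cannot exceed n_obs - 1 because the mean is subtracted

fprintf('covariance is %d-by-%d with rank %d\n', size(C,1), size(C,2), r);
```

Because the rank is below the matrix dimension, the covariance has no inverse, which is why mahal insists on more rows than columns.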
In computing MFCCs I have specified f_low and f_high, the minimum and maximum frequencies of my band, and I want to compute N equally spaced mel values between these two frequencies. So I wrote:
f_low=1000;
f_high=fs/2;
filt_num=26; % number of filters
stp=round(f_high/filt_num); % step
f=f_low:stp:f_high; % my frequency vector
But I can't divide my f vector equally. Maybe there is a function in MATLAB that does it, or am I missing something? Please help, and thanks in advance.
A bit of digging around leads me to believe you want a linearly spaced vector with filt_num entries, starting at f_low and ending at f_high. You should use linspace for that as follows:
f = linspace(f_low,f_high,filt_num);
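This replaces your last two lines of code. Note that your colon version steps by round(f_high/filt_num), which ignores f_low, so it usually does not produce exactly filt_num entries and need not end exactly on f_high; linspace always returns exactly filt_num points including both endpoints. Keep in mind that your code also only works when f_high is larger than f_low. linspace does not have this issue, as it also supports descending vectors.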
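The question mentions mel values, and linspace applied directly to frequencies in Hz gives linear spacing, not mel spacing. If spacing on the mel scale is what is wanted, one possible sketch is to apply linspace in the mel domain and convert back. The sampling rate fs = 16000 and the 2595*log10(1 + f/700) conversion are assumptions here (other mel formulas are in use), and hz2mel and mel2hz are illustrative helper names, not built-in functions.

```matlab
fs       = 16000;        % sampling rate in Hz, assumed for illustration
f_low    = 1000;
f_high   = fs/2;
filt_num = 26;           % number of filters

% Hypothetical helpers using one common mel-scale formula (an assumption).
hz2mel = @(f) 2595 * log10(1 + f / 700);
mel2hz = @(m) 700 * (10.^(m / 2595) - 1);

% Equally spaced in mel, then mapped back to Hz.
mel_pts = linspace(hz2mel(f_low), hz2mel(f_high), filt_num);
f_mel   = mel2hz(mel_pts);
```

The resulting f_mel starts at f_low, ends at f_high, and its points get further apart in Hz as frequency increases.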
The original question was to model light bulbs that are used 24/7, where one bulb usually lasts 25 days. A box contains 12 bulbs. What is the probability that the box will last longer than a year?
I had to use MATLAB to model a Gaussian curve based on an exponential variable.
The code below generates a Gaussian model with mean = 300 and std= sqrt(12)*25.
The reason I had to use so many different variables and add them up is that I was supposed to be demonstrating the central limit theorem. The Gaussian curve represents the probability of a box of bulbs lasting for a given number of days, where 300 is the average number of days a box will last.
I am having trouble using the Gaussian I generated to find the probability for days > 365. The statement 1-normcdf(365,300, sqrt(12)*25) was an attempt to work out the theoretical value of that probability, which came out as 0.2265. Any tips on how to find the probability for days > 365 based on the Gaussian I generated would be greatly appreciated.
Thank you!!!
clear all
samp_num=10000000;
param=1/25;
a=-log(rand(1,samp_num))/param;
b=-log(rand(1,samp_num))/param;
c=-log(rand(1,samp_num))/param;
d=-log(rand(1,samp_num))/param;
e=-log(rand(1,samp_num))/param;
f=-log(rand(1,samp_num))/param;
g=-log(rand(1,samp_num))/param;
h=-log(rand(1,samp_num))/param;
i=-log(rand(1,samp_num))/param;
j=-log(rand(1,samp_num))/param;
k=-log(rand(1,samp_num))/param;
l=-log(rand(1,samp_num))/param;
x=a+b+c+d+e+f+g+h+i+j+k+l;
mean_x=mean(x);
std_x=std(x);
bin_sizex=.01*10/param;
binsx=[0:bin_sizex:800];
u=hist(x,binsx);
u1=u/samp_num;
1-normcdf(365,300, sqrt(12)*25)
bar(binsx,u1)
legend(['mean=',num2str(mean_x),', std=',num2str(std_x)]);
[f, y] = ecdf(x) will create an empirical CDF for the data in x. Find the value of f where y first crosses 365; since the CDF gives P(x <= 365), your answer is 1 minus that value.
Generate N replicates of x, where N should be several thousand or tens of thousands. Then p-hat = count(x > 365) / N, and has a standard error of sqrt[p-hat * (1 - p-hat) / N]. The larger the number of replications is, the smaller the margin of error will be for the estimate.
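The counting estimate described above can be sketched in MATLAB as follows. N = 100000 is an assumed replicate count, and the exponential lifetimes are drawn by the same inverse-transform trick used in the question.

```matlab
N         = 100000;   % number of simulated boxes (assumed value)
n_bulbs   = 12;       % bulbs per box
mean_life = 25;       % mean bulb lifetime in days

% Each column is one box: 12 exponential lifetimes, summed per box.
life = -mean_life * log(rand(n_bulbs, N));
x    = sum(life, 1);

p_hat = mean(x > 365);                    % fraction of boxes lasting > 365 days
se    = sqrt(p_hat * (1 - p_hat) / N);    % standard error of the proportion
ci    = p_hat + [-1.96, 1.96] * se;       % approximate 95% confidence interval

fprintf('p_hat = %.4f, 95%% CI = [%.4f, %.4f]\n', p_hat, ci(1), ci(2));
```

The same two lines for p_hat and se can be applied directly to the x vector already generated in the question.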
When I did this in JMP with N=10,000 I ended up with [0.2039, 0.2199] as a 95% CI for the true proportion of the time that a box of bulbs lasts more than a year. The discrepancy with your value of 0.2265, along with a histogram of the 10,000 outcomes, indicates that the actual distribution is still somewhat skewed. In other words, using a CLT approximation for the sum of 12 exponentials is going to give answers that are slightly off.
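As a cross-check on the simulation, a sum of 12 independent exponential lifetimes with the same mean follows a gamma (Erlang) distribution, so the tail probability can also be computed exactly. The sketch below uses the Erlang tail written as a finite Poisson sum, which needs no toolbox; the variable names are illustrative.

```matlab
n     = 12;     % bulbs per box
theta = 25;     % mean bulb lifetime in days
t     = 365;    % threshold in days

% Erlang tail: P(X > t) = sum_{k=0}^{n-1} exp(-t/theta) * (t/theta)^k / k!
k       = 0:n-1;
p_exact = sum(exp(-t/theta) .* (t/theta).^k ./ factorial(k));

% Normal (CLT) approximation from the question, written with erfc
% so that it does not depend on normcdf.
z     = (t - n*theta) / (sqrt(n) * theta);
p_clt = 0.5 * erfc(z / sqrt(2));

fprintf('exact = %.4f, CLT approximation = %.4f\n', p_exact, p_clt);
```

With the Statistics Toolbox, 1 - gamcdf(365, 12, 25) should give the same exact value.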
When I use [f,xi] = ksdensity(x) in Matlab, I get the probability density estimate, f, and xi evaluation points at which ksdensity calculates f.
My question is: How is each xi point calculated/determined? Is there a formula?
The documentation says: "Default is 100 equally spaced points that cover the range of data in x." So they cover the range, but this does not explain how they are calculated.
Thank you very much!
Juan
The standard method of achieving equally spaced points in MATLAB is using the linspace command. linspace(a,b,n) generates n linearly spaced points between and including a and b.
So it most probably is equivalent to xi = linspace(min(x),max(x)) (the default number of points in linspace is 100).
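For the "is there a formula" part: linearly spaced points follow xi(k) = a + (k-1)*(b-a)/(n-1) for k = 1..n. The sketch below checks that against linspace; the endpoints a and b are arbitrary assumed values, and whether ksdensity uses exactly min(x) and max(x) as its endpoints is not confirmed here.

```matlab
a = 2; b = 7; n = 100;    % assumed endpoints and number of points

k           = 0:n-1;
xi_formula  = a + k * (b - a) / (n - 1);   % point k sits k steps away from a
xi_linspace = linspace(a, b, n);

max_diff = max(abs(xi_formula - xi_linspace));
fprintf('largest difference from linspace: %g\n', max_diff);
```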
I have a set of samples, S, and I want to find its PDF. The problem is when I use ksdensity I get values greater than one!
[f,xi] = ksdensity(S)
In array f, most of the values are greater than one! Would you please tell me what the problem can be? Thanks for your help.
For example:
S=normrnd(0.3035, 0.0314,1,1000);
ksdensity(S)
ksdensity, as the name says, estimates a probability density function over a continuous variable. Probability densities can be larger than 1; they can take arbitrary values from zero upwards. The constraint on probabilities is that their sum over an exhaustive range of possibilities has to be 1. For probability densities, the constraint is that the integral over the whole range of values is 1.
A crude approximation of an integral of the pdf estimated by ksdensity can be obtained in Matlab like this:
sum(f) * min(diff(xi))
assuming that the values in xi are equally spaced. The value of this expression should be approximately 1.
If in your application you believe this approximation is not close enough to 1, you might want to specify the grid of estimation points (second parameter pts) such that the spacing is finer or the range is wider than the one automatically generated by ksdensity.
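To illustrate with the numbers from the question, the sketch below evaluates the exact normal density with mean 0.3035 and standard deviation 0.0314 instead of a ksdensity estimate (an assumption made so that no toolbox is needed). The peak is far above 1, yet the area under the curve is still about 1.

```matlab
mu    = 0.3035;
sigma = 0.0314;

% Evaluation grid covering +/- 6 standard deviations (assumed range).
xi = linspace(mu - 6*sigma, mu + 6*sigma, 1001);

% Normal probability density evaluated on the grid.
f = exp(-0.5 * ((xi - mu) / sigma).^2) / (sigma * sqrt(2*pi));

peak = max(f);          % height of the density at its mode
area = trapz(xi, f);    % numerical integral of the density

fprintf('peak density = %.2f, area under curve = %.4f\n', peak, area);
```

The same trapz(xi, f) call can be applied to the f and xi returned by ksdensity as an alternative to the sum-based approximation.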
I am trying to find the Mahalanobis distance of some points from the origin. The MATLAB command for that is mahal(Y,X).
But if I use this I get NaN, because the matrix X is 0 since the distance needs to be found from the origin. Can someone please help me with this? How should it be done?
I think you are a bit confused about what mahal() is doing. First, computation of the Mahalanobis distance requires a population of points, from which the covariance will be calculated.
The MATLAB docs for this function make it clear that the distance being computed is:
d(I) = (Y(I,:)-mu)*inv(SIGMA)*(Y(I,:)-mu)'
where mu is the population average of X and SIGMA is the population covariance matrix of X. Note that this is the squared distance; mahal does not take the square root. Since your population consists of a single point (the origin), it has no covariance, so the SIGMA matrix is not invertible, which is why you get NaN/Inf values in the distances.
If you know the covariance structure that you want to use for the Mahalanobis distance, then you can just use the formula above to compute it for yourself. Let's say that the covariance you care about is stored in a matrix S. You want the distance w.r.t. the origin, so you don't need to subtract anything from the values in Y, all you need to compute is:
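d = zeros(size(Y,1), 1);               % preallocate the result
for ii = 1:size(Y,1)
    % Squared distance of row ii from the origin; Y(ii,:) is a row vector.
    d(ii) = (Y(ii,:) / S) * Y(ii,:)';  % Y(ii,:)/S equals Y(ii,:)*inv(S)
end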
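The same computation can also be written without a loop. This is a minimal sketch with made-up values for Y and S; both are assumptions for illustration only.

```matlab
% Hypothetical inputs: 5 points in 3 dimensions and an assumed covariance S.
Y = [1 0 0; 0 2 0; 0 0 3; 1 1 1; 2 -1 0.5];
S = [2 0.5 0; 0.5 1 0; 0 0 4];

% Row-wise quadratic form y * inv(S) * y', computed without forming inv(S).
d_sq = sum((Y / S) .* Y, 2);   % squared distances from the origin
d    = sqrt(d_sq);             % take the root if the distance itself is wanted
```

Using Y / S solves the linear system directly, which is generally preferable numerically to calling inv(S).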