ttest results in Matlab?

I am computing these three vectors:
A = [1 2 3 4 5 6]';
B = [50987548463 45764568 606978 7318 1674 4]';
C = [50 45 60 78 1 4]';
Why on earth does
ttest(A,B) return 0 (no rejection of the null hypothesis, which means the means are the same at the 95% confidence level), while
ttest(A,C) returns 1 (rejection of the null hypothesis, which means the means should be different at the 95% confidence level)?
I would expect the null hypothesis to be rejected in both tests, and even more so for ttest(A,B)!

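The mean and standard deviation of (A-B) are both huge, but the t statistic is only -1.0011, which is not sufficient to reject H0. A single difference (about -5.1e10) dominates the sample, so the standard error grows as fast as the mean and t stays near -1. The mean and standard deviation of (A-C) are much smaller, but the t statistic is -2.7612, which is sufficient to reject H0 even with only 5 degrees of freedom (the two-sided 5% critical value is about 2.571). You can check this using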
[h1,p1,ci1,stats1] = ttest(A,B)
[h2,p2,ci2,stats2] = ttest(A,C)
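As a cross-check outside MATLAB, the paired t statistic behind ttest(A,B) is mean(d)/(std(d)/sqrt(n)) with d = A - B. The sketch below recomputes it in Python using only the standard library. The helper name paired_t is hypothetical. The critical value 2.5706 (two-sided 5%, 5 degrees of freedom) is hard-coded from tables rather than computed.

```python
import math
import statistics

def paired_t(x, y):
    """Paired t statistic and degrees of freedom for H0: mean(x - y) == 0,
    i.e. mean(d) / (std(d) / sqrt(n)) with d = x - y (sample std, n - 1 denominator)."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    t = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))
    return t, n - 1

A = [1, 2, 3, 4, 5, 6]
B = [50987548463, 45764568, 606978, 7318, 1674, 4]
C = [50, 45, 60, 78, 1, 4]

# Two-sided 5% critical value of Student's t with 5 degrees of freedom (from tables).
T_CRIT_5DF = 2.5706

for name, other in (("B", B), ("C", C)):
    t, df = paired_t(A, other)
    print(f"A vs {name}: t = {t:.4f}, df = {df}, reject H0 at 5%: {abs(t) > T_CRIT_5DF}")
```

When one difference dwarfs all the others, the sample standard deviation scales with that difference just as the mean does. The ratio then settles near -1 or +1 however large the outlier is, which is why the (A-B) test cannot reject H0.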

Related

Calculating the probability of k (or fewer) successes in n Bernoulli trials using MATLAB

I am trying to calculate the probability of 70 (or fewer) successes in 100 Bernoulli trials. I wrote it in MATLAB, but the probability I get is 1 (it can't be 1, since not all 100 trials are successes).
Is my function OK?
syms k
f = nchoosek(100,k)*0.5^k*0.5^(100-k);
F = double(symsum(nchoosek(100,k)*0.5^k*0.5^(100-k),k,0,70));
If it is, how can I get a more accurate result in MATLAB?
Thanks
edit:
I have a binary vector that represents success/failure in n trials (like tossing a coin 100 times), and I need the error of my sample (the way statistics does it, but I don't know statistics). So I thought that maybe I could calculate "how far I am from being correct in all trials", which should be 1-F in my code. But then 70 successes out of 100 gives me error = 0, which is obviously not true.
edit2: In the example I gave here, I need the probability that there are 70 successes in 100 trials.
You do have everything you need to answer this question.
In the formula you have posted, you sum the probabilities from 0 to 70. That is, it calculates the probability of having 0 or 1 or 2 ... or 70 successes, which means 70 or fewer successes.
Without the sum, you get the probability of exactly k successes. The probability of getting exactly 70 successes is:
k = 70;
f = nchoosek(100,k)*0.5^k*0.5^(100-k)
Warning: Result may not be exact. Coefficient is greater than 9.007199e+15 and is only
accurate to 15 digits
> In nchoosek (line 92)
f =
2.3171e-05
You receive a warning that the computation of nchoosek(100,70) is not exact (see below for a better way).
To compute the probability of getting 70 or fewer successes, sum the probabilities of getting 0 or 1 or ... or 70 successes:
>> f = 0;
>> for k=0:70;
f = f + nchoosek(100,k)*.5^k*.5^(100-k);
end
You will receive a lot of warnings, but you can look at f:
>> f
f =
1.0000
As you can see, rounded to four digits the probability is 1. We know, however, that it must be slightly less than one. If we ask MATLAB to show more digits:
>> format long
we see that it is not exactly 1:
>> f
f =
0.999983919992352
If you compute 1-f, you will see that the result is not 0 (I switch back to showing fewer digits):
>> format short
>> 1-f
ans =
1.6080e-05
To get rid of the warnings and to simplify the code for computing the probabilities, Matlab provides several functions to deal with binomial distributions. For the probability to get exactly 70 successes, use
>> binopdf(70,100,.5)
ans =
2.3171e-05
and for 70 or fewer successes:
>> format long
>> binocdf(70,100,.5)
ans =
0.999983919992352
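The same numbers can be reproduced with no rounding warnings at all by doing the arithmetic exactly. The sketch below uses Python's math.comb and fractions.Fraction (standard library) to build P(X = 70) and P(X ≤ 70) for X ~ Binomial(100, 0.5) as exact rationals. It converts to float only for printing. The helper names are hypothetical. The shortcut of dividing by 2**n works only because p = 0.5, so every outcome has the same probability.

```python
from fractions import Fraction
from math import comb

def binom_half_pmf(k, n):
    """Exact P(X = k) for X ~ Binomial(n, 0.5): C(n, k) / 2**n."""
    return Fraction(comb(n, k), 2 ** n)

def binom_half_cdf(k, n):
    """Exact P(X <= k) for X ~ Binomial(n, 0.5): sum of C(n, i) for i = 0..k, over 2**n."""
    return Fraction(sum(comb(n, i) for i in range(k + 1)), 2 ** n)

n = 100
pmf70 = binom_half_pmf(70, n)
cdf70 = binom_half_cdf(70, n)
print(f"P(X = 70)      = {float(pmf70):.4e}")
print(f"P(X <= 70)     = {float(cdf70):.15f}")
print(f"1 - P(X <= 70) = {float(1 - cdf70):.4e}")
```

Because the cumulative probability is kept as an exact fraction, 1 - P(X ≤ 70) is computed without any cancellation error before it is printed.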

Numbers between the 0.25 and 0.75 quantiles of a vector in MATLAB

I have a vector of simple numbers such as:
a=[1 2 3 4 5 6 7 8]
I would like to get all the numbers of the vector that fall between the 25% and 75% quantiles. However, when I use the command below:
quantile(a,[0.25 0.75])
it only gives me two numbers, 2 and 6 (instead of 3, 4, 5, 6).
Is there any way I can do this?
Based on the mathematical definition of a quantile, the quantile() function should not be returning {3,4,5,6} given [0.25 0.75].
A quantile of a may be thought of as the inverse of the cumulative distribution function (CDF) of a. The CDF F_a(x) = P(a ≤ x) is a right-continuous, non-decreasing function, so its (generalized) inverse F_a^(-1)(q) = min{x : F_a(x) ≥ q} assigns exactly one value to each probability q.
Thus quantile(a,0.25) can only return a single value (a scalar): roughly, the smallest x such that P(a ≤ x) ≥ 0.25 (MATLAB additionally interpolates between neighboring sorted values).
However, logical indexing will do the trick. See code below.
% MATLAB R2017a
a = [1 2 3 4 5 6 7 8];
Q = quantile(a,[0.25 0.75]) % returns 25th & 75th quantiles of a
aQ = a(a>=Q(1) & a<=Q(2)) % returns elements of a between 25th & 75th quantiles (inclusive)
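The logical-indexing filter only needs the two cutoffs, so it helps to see where they come from. The sketch below assumes the default method described in MATLAB's quantile documentation: the k-th smallest of n sorted values is treated as the (k - 0.5)/n quantile, with linear interpolation in between and clamping at the ends. It re-implements that rule in Python as a hypothetical helper, matlab_quantile, and applies the same inclusive filter as a(a>=Q(1) & a<=Q(2)).

```python
def matlab_quantile(values, p):
    """Quantile by the midpoint-interpolation rule: the k-th smallest of n values is the
    (k - 0.5)/n quantile; linear interpolation in between, clamped outside [0.5/n, (n-0.5)/n]."""
    xs = sorted(values)
    n = len(xs)
    pos = n * p + 0.5            # 1-based fractional position in the sorted data
    if pos <= 1:
        return float(xs[0])
    if pos >= n:
        return float(xs[-1])
    lo = int(pos)                # floor, since pos > 1 here
    frac = pos - lo
    return xs[lo - 1] + frac * (xs[lo] - xs[lo - 1])

a = [1, 2, 3, 4, 5, 6, 7, 8]
q1, q3 = matlab_quantile(a, 0.25), matlab_quantile(a, 0.75)
between = [x for x in a if q1 <= x <= q3]   # same idea as a(a>=Q(1) & a<=Q(2))
print(q1, q3, between)
```

Under this rule, the cutoffs for a = 1:8 come out as 2.5 and 6.5. The inclusive filter therefore keeps exactly 3, 4, 5 and 6.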

Determine lag between two vectors

I want to find the minimum lag between two vectors, i.e. the minimum shift after which something in one vector is repeated in the other.
For example, for
x=[0 0 1 2 2 2 0 0 0 0]
y=[1 2 2 2 0 0 1 2 2 2]
I want to obtain 4 for x to y and 2 for y to x.
I found the finddelay(x,y) function, which works correctly only for x to y (it gives -4 for y to x).
Is there any function that gives me the lag only in the rightward direction along the vector? I would be very thankful for any help getting this result.
I think this may be a potential bug in finddelay. Note this excerpt from the documentation (emphasis mine):
X and Y need not be exact delayed copies of each other, as finddelay(X,Y) returns an estimate of the delay via cross-correlation. However this estimated delay has a useful meaning only if there is sufficient correlation between delayed versions of X and Y. Also, if several delays are possible, as in the case of periodic signals, the delay with the smallest absolute value is returned. In the case that both a positive and a negative delay with the same absolute value are possible, the positive delay is returned.
This would seem to imply that finddelay(y, x) should return 2, when it actually returns -4.
EDIT:
This would appear to be an issue related to floating-point errors introduced by xcorr, as I describe in my answer to this related question. If you type "type finddelay" into the Command Window, you can see that finddelay uses xcorr internally. Even when the inputs to xcorr are integer values, the results (which you would expect to be integer values as well) can end up having floating-point errors that make them slightly larger or smaller than an integer value. This can then change the indices where the maxima are located. The solution is to round the output from xcorr when you know your inputs are all integer values.
A better implementation of finddelay for integer values might be something like this, which would actually return the delay with the smallest absolute value:
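function delay = finddelay_int(x, y)
[d, lags] = xcorr(x, y);       % cross-correlation and the corresponding lags
d = round(d);                  % integer inputs: remove floating-point noise
lags = -lags(d == max(d));     % all maximizing lags, converted to delays (descending order)
[~, index] = min(abs(lags));   % smallest |delay|; a positive delay comes first, so it wins a tie
delay = lags(index);
end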
However, in your question you are asking for the positive delays to be returned, which won't necessarily be the smallest in absolute value. Here's a different implementation of finddelay that works correctly for integer values and gives preference to positive delays:
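function delay = finddelay_pos(x, y)
[d, lags] = xcorr(x, y);
d = round(d);                  % integer inputs: remove floating-point noise
lags = -lags(d == max(d));     % maximizing delays, in descending order
index = (lags <= 0);
if all(index)                  % no positive delay: take the one closest to zero
    delay = lags(1);
elseif ~any(index)             % only positive delays: take the smallest
    delay = lags(end);
else                           % smallest positive delay sits just before the first non-positive one
    delay = lags(find(index, 1)-1);
end
end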
And here are the various results for your test case:
>> x = [0 0 1 2 2 2 0 0 0 0];
>> y = [1 2 2 2 0 0 1 2 2 2];
>> [finddelay(x, y) finddelay(y, x)] % The default behavior, which fails to find
% the delays with smallest absolute value
ans =
4 -4
>> [finddelay_int(x, y) finddelay_int(y, x)] % Correctly finds the delays with the
% smallest absolute value
ans =
-2 2
>> [finddelay_pos(x, y) finddelay_pos(y, x)] % Finds the smallest positive delays
ans =
4 2
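The integer-only idea behind finddelay_int and finddelay_pos can be checked with no floating point at all. The Python sketch below (standard library only) computes the raw cross-correlation R_xy(m) = sum over n of x(n+m)·y(n) directly. It turns the maximizing lags into delays with the same sign flip (delay = -lag) used above, then applies both selection rules. All function names are hypothetical, and the direct O(N²) loop is chosen for clarity, not speed.

```python
def xcorr_int(x, y):
    """Raw cross-correlation R_xy(m) = sum over n of x[n + m] * y[n], for every lag m
    at which the sequences overlap, using exact integer arithmetic."""
    lags = list(range(-(len(y) - 1), len(x)))
    values = [sum(x[n + m] * y[n] for n in range(len(y)) if 0 <= n + m < len(x))
              for m in lags]
    return values, lags

def candidate_delays(x, y):
    """All delays of y relative to x that maximize the correlation (delay = -lag), ascending."""
    values, lags = xcorr_int(x, y)
    best = max(values)
    return sorted(-m for m, v in zip(lags, values) if v == best)

def delay_smallest_abs(x, y):
    """Delay with the smallest absolute value; a positive delay wins a tie."""
    return min(candidate_delays(x, y), key=lambda k: (abs(k), -k))

def delay_prefer_positive(x, y):
    """Smallest positive delay if one exists, otherwise the delay closest to zero."""
    cands = candidate_delays(x, y)
    positives = [k for k in cands if k > 0]
    return min(positives) if positives else max(cands)

x = [0, 0, 1, 2, 2, 2, 0, 0, 0, 0]
y = [1, 2, 2, 2, 0, 0, 1, 2, 2, 2]
print(candidate_delays(x, y), candidate_delays(y, x))
print(delay_smallest_abs(x, y), delay_smallest_abs(y, x))
print(delay_prefer_positive(x, y), delay_prefer_positive(y, x))
```

For this test case, both candidate delays give the same peak correlation (13). The result therefore depends entirely on the tie-breaking rule, which is exactly where floating-point noise in xcorr can tip the choice.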

Issue Regarding KL Divergence in MATLAB

I posted a question about KL divergence earlier but unfortunately did not receive a reply, maybe because of its complexity, so I have tried to explain my problem with a simple example.
I have one reference sensor signal and one measured signal from some sensor.
I want to find the error, or difference, between the reference and measured sensor signal values.
So I am using KL divergence.
First I normalized the histograms of my reference and sensor signals and then applied KL divergence.
My data is large and complicated: it contains a lot of zeros, negative values, and small values such as 0.001.
I applied KL divergence but unfortunately could not get good results, so I wonder whether I have misunderstood the concept of KL divergence or am doing something wrong in the code.
It would be nice if someone could help me with this. I would be grateful.
I have also read a post explaining KL divergence, but some concepts are still unclear to me, so I am posting my example along with my question.
Am I on the right track, or is there a fault in my concepts?
Thanks a lot in advance.
ref = [5 6 7 5 8 7 8 9 -2 -3 -4];
measured_sensor = [3 3 4 5 7 8 9 9 -1 -2 -3];
% normalized histograms for both signals
C= hist( ref);
C1 = C ./ sum(C);
D = hist(measured_sensor);
D1 = D ./ sum(D);
figure(1)
ax11=subplot(321);
bar(C1)
ax12=subplot(322);
bar(D1)
d = zeros(size(C1));
goodIdx = C1>0 & D1>0;
d1 = sum(C1(goodIdx) .* log(C1(goodIdx) ./ D1(goodIdx)))
d2 = sum(D1(goodIdx) .* log(D1(goodIdx) ./ C1(goodIdx)))
d(goodIdx) = d1 + d2
Mean-Based Gaussian Hysteresis (Mean Error Finding)
ref = [5 6 7 5 8 7 8 9 -2 -3 -4];
measured_sensor = [3 3 4 5 7 8 9 9 -1 -2 -3];
sig_diff = ref - measured_sensor ;
m = mean(sig_diff)
deviation = std(sig_diff);
pos = sig_diff(sig_diff>0)
neg = sig_diff(sig_diff<0)
m_pos = mean(pos)
m_neg = mean (neg)
hysterisis = abs( m_pos)+ abs(m_neg)
figure(6)
ax11=subplot(321);
histfit(sig_diff)
hold on
plot([m m],[0 5000],'r')
plot([m-deviation m-deviation],[0 5000],'r')
plot([m+deviation m+deviation],[0 5000],'r')
hold off
The error (hysteresis) value I get with the mean-based Gaussian approach is 3.2500.
So I expect the error value from KL divergence to be close to 3.2500, or at least within some tolerance of it.
Actually, I am confused about which of these techniques gives the more precise measure of error, and which one will give me the best results.
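One thing that may explain the poor results: hist(ref) and hist(measured_sensor) each spread 10 bins over their own minimum and maximum. Bin k of C1 and bin k of D1 therefore generally cover different value ranges, and comparing them bin by bin is not meaningful.

The Python sketch below (standard library only) builds both normalized histograms on the same 10 equal-width bins. Instead of dropping bins where either histogram is zero, it adds a small eps to every bin. That smoothing is an assumed choice, not the only option. It also recomputes the hysteresis value for comparison. The helper names are hypothetical.

Note that KL divergence (in nats here, because of the natural log) compares the shapes of two distributions and carries no signal units. There is no particular reason for it to land near 3.25.

```python
import math

ref = [5, 6, 7, 5, 8, 7, 8, 9, -2, -3, -4]
measured = [3, 3, 4, 5, 7, 8, 9, 9, -1, -2, -3]

def hysteresis(a, b):
    """|mean of positive differences| + |mean of negative differences|, as in the question."""
    diff = [p - q for p, q in zip(a, b)]
    pos = [v for v in diff if v > 0]
    neg = [v for v in diff if v < 0]
    return abs(sum(pos) / len(pos)) + abs(sum(neg) / len(neg))

def shared_histograms(a, b, nbins=10, eps=1e-10):
    """Normalized histograms of a and b over the SAME nbins equal-width bins spanning
    min(a + b)..max(a + b); eps is added to every bin so no probability is zero."""
    lo, hi = min(a + b), max(a + b)
    width = (hi - lo) / nbins or 1.0   # guard against all values being equal
    def normalized_counts(values):
        counts = [0] * nbins
        for v in values:
            counts[min(int((v - lo) / width), nbins - 1)] += 1   # max value -> last bin
        total = sum(counts) + nbins * eps
        return [(c + eps) / total for c in counts]
    return normalized_counts(a), normalized_counts(b)

def kl(p, q):
    """KL divergence D(p || q) in nats (natural logarithm)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

P, Q = shared_histograms(ref, measured)
print("hysteresis:", hysteresis(ref, measured))
print("D(P||Q) + D(Q||P) in nats:", kl(P, Q) + kl(Q, P))
```

The sum D(P||Q) + D(Q||P) mirrors the d1 + d2 in the question's code. It is zero only when the two shared-bin histograms coincide.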

MATLAB chol function returns a single-number Cholesky decomposition

I have a matrix A which is 390 by 390 and contains numbers such as:
141270,991258825 -92423,2972762164
-92423,2972762164 60465,8168198016
139998,877391881 -91591,0460330842
30573,0969789307 -20001,7456206658 ...
If I try chol(A), MATLAB fails and says that the matrix must be positive definite. OK, I saw in the API that [R,p] = chol(A) also works for negative definite matrices. I tried this, but R then becomes a 1x1 matrix, whereas I expect a 390x390 matrix.
The help file is slightly unclear here, but it doesn't mean that you can just use a non-positive definite matrix and get the same result by changing the way you call the function:
[R,p] = chol(A) for positive definite A, produces an upper triangular
matrix R from the diagonal and upper triangle of matrix A, satisfying
the equation R'*R=A and p is zero. If A is not positive definite, then
p is a positive integer and MATLAB® does not generate an error. When A
is full, R is an upper triangular matrix of order q=p-1 such that
R'*R=A(1:q,1:q).
If your matrix is not positive definite, p > 0, therefore the size of your result R will depend on p. In fact, I think this particular syntax is simply designed to allow you to use chol as a way of checking if A is positive definite, rather than just giving an error if it is not. The help file even says:
Note Using chol is preferable to using eig for determining positive definiteness.
Example - take pascal(5) and set the last element to something negative:
A =
1 1 1 1 1
1 2 3 4 5
1 3 6 10 15
1 4 10 20 35
1 5 15 35 -3
[R,p] = chol(A) returns
R =
1 1 1 1
0 1 2 3
0 0 1 3
0 0 0 1
p =
5
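Sure enough, R'*R == A(1:4,1:4).
Setting element A(2,2) to be negative, on the other hand, gives p = 2 and therefore a single value in R, which will be sqrt(A(1,1)). A 1x1 R like the one in the question likewise means the factorization already breaks down at the second pivot, i.e. the leading 2x2 block of A is not positive definite. Setting A(1,1) to be negative returns p = 1 and an empty R.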
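To see where p comes from, here is a textbook row-by-row Cholesky factorization in Python (standard library only). It is written to return the same kind of (R, p) pair that the documentation excerpt above describes for full matrices. This is an illustrative sketch, not MATLAB's implementation, and the helper names chol_upper and pascal are hypothetical.

```python
import math

def pascal(n):
    """n x n symmetric Pascal matrix (as MATLAB's pascal(n) builds it): P[i][j] = C(i + j, i)."""
    return [[math.comb(i + j, i) for j in range(n)] for i in range(n)]

def chol_upper(A):
    """Textbook row-by-row Cholesky factorization A = R^T R with R upper triangular.
    Returns (R, p): p == 0 on success. Otherwise p is the 1-based index of the first
    non-positive pivot, and R is the (p-1) x (p-1) factor of the leading (p-1) x (p-1) block."""
    n = len(A)
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # pivot: diagonal entry minus the squares already placed above it in column j
        s = A[j][j] - sum(R[k][j] ** 2 for k in range(j))
        if s <= 0:
            return [row[:j] for row in R[:j]], j + 1
        R[j][j] = math.sqrt(s)
        for i in range(j + 1, n):
            R[j][i] = (A[j][i] - sum(R[k][j] * R[k][i] for k in range(j))) / R[j][j]
    return R, 0

A = pascal(5)
A[4][4] = -3                      # same modification as in the example above
R, p = chol_upper(A)
print("p =", p)
for row in R:
    print(row)
```

With the last element set to -3, this gives p = 5 and the 4x4 R listed above. Setting A(2,2) to -2 instead gives p = 2 and the 1x1 R [[1.0]], the same shape of result reported in the question.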