Calculate the 95% confidence interval around the mean in MATLAB

If I have a matrix of monthly-averaged values like
aa = [1,2,3,2,1,3,5,3,4,8,9,7;...
11,12,3,21,1,3,15,3,4,8,19,7;...
21,2,3,2,1,23,5,3,34,84,9,7]';
where each column holds the monthly-averaged values from a different location and each row represents a month of the year, I can calculate the average across all of the sites as:
mean_a = nanmean(aa,2);
and thus can plot the averages of these as:
plot(1:12, mean_a);
How would I now calculate the 95 % confidence interval around these mean values?
Any advice would be appreciated.
My attempt:
Assuming a normal distribution:
aa = [1,2,3,2,1,3,5,3,4,8,9,7;...
11,12,3,21,1,3,15,3,4,8,19,7;...
21,2,3,2,1,23,5,3,34,84,9,7]';
mean_a = nanmean(aa,2);
sem = (nanstd(aa')./sqrt(size(aa,2))).*1.96;
errorbar(1:12,mean_a,sem);

Calculate the quantile with the quantile function, or, if you know the distribution, multiply the standard deviation of the mean (the standard error) by the quantile value for that distribution.
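With only three sites per month, the normal quantile 1.96 probably understates the interval. A minimal sketch, assuming the site values for each month are roughly normally distributed and that tinv from the Statistics and Machine Learning Toolbox is available, uses the Student's t quantile instead. The names n, tcrit and ci are illustrative and not from the question:

```matlab
aa = [1,2,3,2,1,3,5,3,4,8,9,7;...
      11,12,3,21,1,3,15,3,4,8,19,7;...
      21,2,3,2,1,23,5,3,34,84,9,7]';

n      = size(aa, 2);                % number of sites per month (3 here)
mean_a = mean(aa, 2);                % 12x1 mean across sites for each month
sem    = std(aa, 0, 2) ./ sqrt(n);   % 12x1 standard error of each monthly mean
tcrit  = tinv(0.975, n - 1);         % two-sided 95% t quantile with n-1 degrees of freedom
ci     = tcrit .* sem;               % half-width of the 95% confidence interval

errorbar((1:12)', mean_a, ci);       % error bars span mean_a - ci to mean_a + ci
```

This data contains no NaN values, so plain mean and std are used. If some sites had missing months, the 'omitnan' flag and a per-row count of non-NaN values would be needed in place of n.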

I know this is an old question but for the record, here is a function called confidence_intervals() that will give any confidence intervals for a dataset and can be used with the errorbar() function in Matlab. It can also be used, given the optional argument, to find the confidence intervals with the log-normal variance.
As in your example, the code becomes:
aa = [1,2,3,2,1,3,5,3,4,8,9,7;...
11,12,3,21,1,3,15,3,4,8,19,7;...
21,2,3,2,1,23,5,3,34,84,9,7]';
errorbar( 1:12, mean(aa), confidence_intervals( aa, 95 ) )

Related

Zero crossings around mean

I am working on developing a suite of classifiers for EEG signals, and I will need a zero-crossings-around-the-mean function, defined in the following manner:
Ideally if I have some vector with a range of values representing a sinusoid or any time varying signal, I will want to return a vector of Booleans of the same size as the vector saying if that particular value is a mean crossing. I have the following Matlab implementation:
ZX = @(x) sum(((x - mean(x)>0) & (x - mean(x)<0)) | ((x - mean(x)<0) & (x - mean(x)>0)));
Testing it on toy data:
[0 4 -6 9 -20 -5]
Yields:
0
EDIT:
Yet I believe it should return:
3
What am I missing here?
An expression like:
((x-m)>0) & ((x-m)<0)
is always going to return a vector of all zeros because no individual element of x is both greater and less than zero. You need to take into account the subscripts on the xs in the definition of ZX:
((x(1:end-1)-m)>0) & ((x(2:end)-m)<0)
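Putting the shifted subscripts into a complete expression, here is a minimal sketch for a row vector x. The names d, isCross and nCross are illustrative. By assumption, a crossing is marked at the second element of each pair, and the first element is never marked:

```matlab
x = [0 4 -6 9 -20 -5];
m = mean(x);                 % m = -3
d = x - m;                   % deviations from the mean: [3 7 -3 12 -17 -2]

% Compare each element with its successor; a crossing is a strict sign change.
down = d(1:end-1) > 0 & d(2:end) < 0;   % above the mean, then below
up   = d(1:end-1) < 0 & d(2:end) > 0;   % below the mean, then above

isCross = [false, down | up];  % logical vector, same size as x
nCross  = sum(isCross);        % 3 for the toy data
```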
You can use the findpeaks function on -abs(x), where x is your original data, to find the peak locations. This would give you the zero crossings in general for continuous signals which do not have zero as an actual maximum of the signal.
t = 0:0.01:10;
x = sin(pi*t);
plot(t,x)
grid
y = -abs(x);
[P,L] = findpeaks(y,t);
hold on
plot(L,P,'*')
A simple solution is to use movprod, and count the products which are negative, i.e.,
cnt = sum(sign(movprod(x-mean(x),2))<0);
With your toy example, you will get cnt = 3.
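One caveat, sketched below on a made-up vector: with the default 'shrink' endpoint handling, the first window contains only the first element, so a signal that starts below its mean is counted as one extra crossing. Passing 'Endpoints','discard' keeps only the full two-element windows. The variable names here are illustrative:

```matlab
x = [-5 -1 2 3];
d = x - mean(x);             % mean is -0.25, so d = [-4.75 -0.75 2.25 3.25]

% Default endpoints: the first "product" is just d(1), which is negative.
cntShrink = sum(movprod(d, 2) < 0);                            % 2 (one spurious)

% Discarding partial windows leaves only products of neighbouring pairs.
cntDiscard = sum(movprod(d, 2, 'Endpoints', 'discard') < 0);   % 1
```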

Sampling according to difference in function value

I have 20 values x1,...x20. Each value is between 0 and 1, for example 0.22,0.23,0.25,...
x = rand(20,1);
x = sort(x);
Now I would like to choose one data point but not uniform at random. The data point with the lowest value should have the highest probability and the other values should have a probability proportional to the difference in function value to the lowest value.
For example, if the lowest function value is 0.22, a data point with a function value of 0.23 has a difference to the best value of 0.23 - 0.22 = 0.01 and should therefore have a probability similar to the 0.22 value. But a value of 0.3 has a difference of 0.3 - 0.22 = 0.08 and should therefore have a much smaller probability.
How can this be done?
I would leave this as a comment, but I unfortunately don't have the rep yet.
This looks interesting, and I have a few questions for you. (I will edit this answer to be an answer later.)
The data point with the lowest value should have the highest probability and the other values should have a probability proportional to the difference in function value to the lowest value.
Let's take an array of 20 items and subtract the lowest number from the entire array. This leaves our smallest value (which you want to be the most probable) at 0. We now need to define a function that goes over all of the points and integrates (sums) to 1.
I've done the following:
x = rand(20, 1);
x = sort(x);
xx = x - x(1);
I suppose at this point we can invert our answers so the lowest point is 1.
Px = 1 - xx; %For probabilities
TotalP = sum(Px);
Now we have everything we need, I think... So let's see what we can make.
P = Px/TotalP; %This will be our probability.
SanityCheck = sum(P); %Make sure that it sums up to 1.
Looks like that works, so let's make our cumulative sum array and get an element.
PI = cumsum(P); %This will be the integral form of the probability function.
test = rand; %Create a test number so we can place it in the integral function
index = find(PI > test, 1); %This will return the first entry that is greater than our test value...
result = x(index); %And here's our value
I hope this is along what you were looking for. If not, please comment and I'll get back to you. :)
[edited to incorporate comments]
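With Px = 1 - xx, a value 0.08 above the minimum is still almost as likely as the minimum itself. If a much smaller probability is wanted for such values, one possible variant (an assumption, not part of the answer above) is an exponential weighting with a hypothetical scale parameter T, sampled with the same cumulative-sum approach:

```matlab
x = sort(rand(20, 1));

T = 0.05;                    % hypothetical scale: smaller T favours the lowest values more
w = exp(-(x - x(1)) / T);    % weight 1 for the lowest value, decaying with the difference
p = w / sum(w);              % normalise so the probabilities sum to 1

c = cumsum(p);               % cumulative distribution over the sorted points
c(end) = 1;                  % guard against round-off leaving c(end) slightly below 1
idx = find(c >= rand, 1);    % first entry whose cumulative probability reaches the draw
result = x(idx);             % the sampled data point
```

With T = 0.05, a difference of 0.01 scales the weight by about 0.82 and a difference of 0.08 by about 0.20.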

How can I detect the minimum and maximum values every 50 rows

I'm trying to detect peak values in MATLAB using the findpeaks function. The problem is that my data consists of 4200 rows and I just want to detect the minimum and maximum points in every 50 rows. Afterwards, I'll use this code for real-time accelerometer data.
This is my code:
[peaks,peaklocations] = findpeaks( filteredX, 'minpeakdistance', 50 );
plot( x, filteredX, x( peaklocations ), peaks, 'or' )
So you want to first reshape your vector into columns of 50 samples each and then compute the peaks for each column.
A = randn(4200,1);
B = reshape (A,[50,size(A,1)/50]); %//which gives B the structure of 50*84 Matrix
pks=zeros(50,size(A,1)/50); %//pre-define and set to zero/NaN for stability
pklocations = zeros(50,size(A,1)/50); %//pre-define and set to zero/NaN for stability
for i = 1:size(B,2)
[p,loc] = findpeaks(B(:,i)); %//call findpeaks once per segment; you can alter its parameters here
n = numel(p); %//number of peaks found in this segment
pks(1:n,i) = p;
pklocations(1:n,i) = loc;
end
This generates two matrices, pks and pklocations, with one column per segment. The downside, of course, is that you do not know in advance how many peaks each segment will have, and every column of a matrix must have the same length, so I padded the columns with zeros; you can pad with NaN instead if you prefer.
EDIT, since the OP is looking for only 1 maximum and 1 minimum for each 50 samples, this can easily be satisfied by the min/max function in MATLAB.
A = randn(4200,1);
B = reshape (A,[50,size(A,1)/50]); %//which gives B the structure of 50*84 Matrix
[pks,pklocations] = max(B);
[trghs,trghlocations] = min(B);
I guess alternatively, you could do a max(pks), but it is simply making it complicated.
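The locations returned by max and min are row indices within each 50-sample column, not positions in the original vector. A short sketch of converting them back, where blockLen, offsets, peakIdx and troughIdx are illustrative names:

```matlab
A = randn(4200,1);
blockLen = 50;
B = reshape(A, blockLen, []);              % 50x84: one column per block of 50 samples

[pks, pklocations]     = max(B);           % maximum of each column and its row index
[trghs, trghlocations] = min(B);           % minimum of each column and its row index

offsets   = (0:size(B,2)-1) * blockLen;    % index just before the start of each block in A
peakIdx   = pklocations   + offsets;       % positions of the maxima in A
troughIdx = trghlocations + offsets;       % positions of the minima in A
```

These indices can then be used directly for plotting, for example against an x vector of the same length as A.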

How to average every 10 values of an 8200x1 array in MATLAB

I have data in .txt format and have successfully imported it into a variable V, which is an 8200x1 matrix. Now I need to get the average of every 10 values. Can anyone help me with the code?
I think you are looking for colfilt. You can take the average of every 10 values over a sliding window, [1,...,10], then [2,...,11], then [3,...,12], etc., as follows:
a=randi(10,[8200 1]);
b=colfilt(a,[10 1],'sliding',@(x) mean(x))
If you want to average over distinct blocks of 10 values as: [1,...,10],[11,...,20] etc., then just replace 'sliding' with 'distinct'.
You can do the same operation with blockproc and nlfilter but colfilt executes faster as stated in Mathworks colfilt documentation.
If you want the average of each separate block of size 10: reshape into a 10-row matrix and then average each column:
n = 10;
result = mean(reshape(V, n, []), 1);
If you want the average on a sliding window of length 10: use convolution:
result = conv(V, ones(1,n)/n, 'valid');
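As a quick consistency check of the two approaches, the sketch below runs both on a made-up column vector (the question's V comes from a file, so V = (1:8200)' is an assumption). Every n-th sliding-window mean should equal the corresponding block mean:

```matlab
V = (1:8200)';                                 % stand-in for the imported 8200x1 data
n = 10;

blockMeans = mean(reshape(V, n, []), 1);       % 1x820: means of [1..10], [11..20], ...
slidingMeans = conv(V, ones(n,1)/n, 'valid');  % 8191 means of [1..10], [2..11], ...

% The windows starting at 1, 11, 21, ... are exactly the separate blocks.
maxDiff = max(abs(slidingMeans(1:n:end) - blockMeans(:)));
```

The reshape approach requires the length of V to be a multiple of n, which holds for 8200 and 10.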

Determine the frequency of a number in a simulation

I have the following function:
I have to generate 2000 random numbers from this function and then make a histogram.
Then I have to determine how many of them are greater than 2, i.e. estimate P(X>2).
this is my function:
%function [ output_args ] = Weibullverdeling( X )
%UNTITLED Summary of this function goes here
% Detailed explanation goes here
for i=1:2000
% x= rand*1000;
%x=ceil(x);
x=i;
Y(i) = 3*(log(x))^(6/5);
X(i)=x;
end
plot(X,Y)
and it gives me the following image:
How can I make it tell me how many values are greater than 2?
Very simple:
>> Y_greater_than_2 = Y(Y>2);
>> size(Y_greater_than_2)
ans =
1 1998
So that's 1998 values out of 2000 that are greater than 2.
EDIT
If you want to find the values between two other values, say between 1 and 4, you need to do something like:
>> Y_between = Y(Y>=1 & Y<=4);
>> size(Y_between)
ans =
1 2
This is what I think:
for i=1:2000
x=rand(1); % uniform random number U in (0,1)
Y(i) = 3*(-log(x))^(6/5); % -log keeps the result real, since log(x) < 0 for 0 < x < 1
X(i)=x;
end
plot(X,Y)
U is a uniform random variable from which you can get the X. So you need to use rand function in MATLAB.
After which you implement:
size(Y(Y>2),2);
You can implement the code directly (here k is your root, n is number of data points, y is the highest number of distribution, x is smallest number of distribution and lambda the lambda in your equation):
X=(log(x+rand(1,n).*(y-x)).*lambda).^(1/k);
result=numel(X(X>2));
Let's split it up and explain it in detail:
You want the k-th root of a number:
number.^(1/k)
you want the natural logarithm of a number:
log(number)
you want to multiply something:
numberA.*numberB
you want to get, let's say, 1000 random numbers between x and y:
(x+rand(1,1000).*(y-x))
you want to combine all of that:
x= lower_bound;
y= upper_bound;
n= No_Of_data;
lambda=wavelength; %my guess
k= No_of_the_root;
X=(log(x+rand(1,n).*(y-x)).*lambda).^(1/k);
So you just have to insert your x,y,n,lambda and k
and then check
bigger_2 = X(X>2);
which would return only the values bigger than 2 and if you want the number of elements bigger than 2
No_bigger_2=numel(bigger_2);
I'm going to go with the assumption that what you've presented is supposed to be a random variate generation algorithm based on inversion, and that you want real-valued (not complex) solutions so you've omitted a negative sign on the logarithm. If those assumptions are correct, there's no need to simulate to get your answer.
Under the stated assumptions, your formula is the inverse of the complementary cumulative distribution function (CCDF). It's complementary because smaller values of U give larger values of X, and vice-versa. Solve the (corrected) formula for U. Using the values from your Matlab implementation:
X = 3 * (-log(U))^(6/5)
X / 3 = (-log(U))^(6/5)
-log(U) = (X / 3)^(5/6)
U = exp(-((X / 3)^(5/6)))
Since this is the CCDF, plugging in a value for X gives the probability (or proportion) of outcomes greater than X. Solving for X=2 yields 0.49, i.e., 49% of your outcomes should be greater than 2.
Make suitable adjustments if lambda is inside the radical; the algebra leading to the solution is similar. Unless I messed up my arithmetic, the proportion would then be 55.22%.
If you still are required to simulate this, knowing the analytical answer should help you confirm the correctness of your simulation.
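A minimal sketch of such a check, under the same assumptions as above (inversion with the negative sign restored and the constants 3 and 6/5 taken from the question's code). The names pSim and pExact are illustrative:

```matlab
n = 2000;
U = rand(n, 1);                    % uniform random numbers in (0,1)
X = 3 * (-log(U)).^(6/5);          % inversion: real-valued because -log(U) > 0

pSim   = mean(X > 2);              % simulated proportion of outcomes greater than 2
pExact = exp(-((2 / 3)^(5/6)));    % analytical CCDF at X = 2, about 0.49
```

With 2000 draws, pSim should typically land within a few percentage points of pExact. Calling histogram(X) on the same vector gives the histogram requested in the question.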