Is there a simple way to calculate the p-value of a t-test in MATLAB?
I found something like this, but I think it does not return correct values:
Pval=2*(1-tcdf(abs(t),n-2))
I want to calculate the p-value for the test that the slope of regression is equal to 0. Therefore I calculate the Standard Error
$SE = \sqrt{\frac{\sum_{s=i-w}^{i+w}(y_{s}-\widehat{y}_{s})^2}{(w-2)\sum_{s=i-w}^{i+w}(x_{s}-\bar{x})^2}}$
where $y_s$ is the value of analyzed parameter in time period $s$,
$\widehat{y}_s$ is the estimated value of the analyzed parameter in time period $s$,
$x_s$ is the time point of the observed value of the analyzed parameter,
$\bar{x}$ is the mean of the time points in the analyzed period, and then
$t_{score} = (a - a_{0})/SE$, where $a$ is the estimated slope and $a_{0} = 0$.
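The calculation described above can be sketched in base MATLAB as follows. This is only an illustration under stated assumptions: the data vectors x and y are made up, the window is taken to hold n points, and the standard n-2 degrees of freedom of a simple linear regression are used. The p-value is obtained from betainc, which gives the same two-sided value as 2*(1-tcdf(abs(t),n-2)) without needing the Statistics Toolbox.

```matlab
% Hypothetical data: time points x and observations y in one window
x = (1:10)';
y = [2.1 3.9 6.2 7.8 10.1 12.2 13.8 16.1 18.0 19.9]';
n = numel(x);

% Least-squares fit y = a*x + b
xm   = mean(x);
a    = sum((x - xm).*(y - mean(y))) / sum((x - xm).^2);   % slope
b    = mean(y) - a*xm;                                    % intercept
yhat = a*x + b;                                           % fitted values

% Standard error of the slope, assuming n-2 degrees of freedom
df = n - 2;
SE = sqrt(sum((y - yhat).^2) / (df*sum((x - xm).^2)));

% t-score for the null hypothesis that the slope equals a0 = 0
a0     = 0;
tscore = (a - a0)/SE;

% Two-sided p-value via the regularized incomplete beta function
pval = betainc(df/(df + tscore^2), df/2, 0.5);
```

If tcdf is available, comparing pval with 2*(1-tcdf(abs(tscore),df)) is a quick way to confirm that both routes agree.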
I checked the p-values from the ttest function against the ones calculated using this formula:
% Let n be your sample size
% Let v be the number of estimated coefficients, so n-v is the degrees of freedom
% Then:
pvalues = 2*(1-tcdf(abs(t),n-v))
and they are the same!
Example with a MATLAB demo dataset:
load accidents
x = hwydata(:,2:3);
y = hwydata(:,4);
stats = regstats(y,x,eye(size(x,2)));
fprintf('T stat using built-in function: \t %.4f\n', stats.tstat.t);
fprintf('P value using built-in function: \t %.4f\n', stats.tstat.pval);
fprintf('\n\n');
n = size(x,1);
v = size(x,2);
b = x\y;
% residual sum of squares written with sum(); sumsqr belongs to a separate toolbox
se = sqrt(diag(sum((y-x*b).^2)/(n-v)*inv(x'*x)));
t = b./se;
p = 2*(1-tcdf(abs(t),n-v));
fprintf('T stat using own calculation: \t\t %.4f\n', t);
fprintf('P value using own calculation: \t\t %.4f\n', p);
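One caveat about the formula used above: for large t statistics, 1-tcdf(...) subtracts two numbers that are both very close to 1, so some precision is lost. A small sketch of an equivalent form that reads the lower tail directly; the values of t and df are hypothetical, and tcdf still requires the Statistics Toolbox, as in the example above.

```matlab
t  = 12;     % hypothetical, fairly large t statistic
df = 20;     % hypothetical degrees of freedom

% Form used above: loses digits once tcdf(abs(t),df) is close to 1
p_naive = 2*(1 - tcdf(abs(t), df));

% Equivalent by symmetry of the t distribution: use the lower tail directly
p_stable = 2*tcdf(-abs(t), df);

fprintf('naive:  %.3e\nstable: %.3e\n', p_naive, p_stable);
```

For moderate t values both forms print the same number, so this only matters when the p-values are extremely small.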
I have 2 vectors A and B, each of length 10,000. For each ind=1:10000, I want to compute the Pearson correlation of A(1:ind) and B(1:ind). When I do this in a for loop, it takes too much time. parfor does not work with more than 2 workers on my machine. Is there a way to do this operation fast and save the results in a vector C (of length 10,000, where the first element is NaN)? I found the question Fast rolling correlation in Matlab, but this is a little different from what I need.
You can use this method to compute cumulative correlation coefficient:
function result = cumcor(x,y)
n = reshape(1:numel(x),size(x));
sumx = cumsum(x);
sumy = cumsum(y);
sumx2 = cumsum(x.^2);
sumy2 = cumsum(y.^2);
sumxy = cumsum(x.*y);
result = (n.*sumxy-sumx.*sumy)./(sqrt((sumx.^2-n.*sumx2).*(sumy.^2-n.*sumy2)));
end
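As a quick sanity check of this cumulative-sum formula, the last element should match the ordinary correlation of the full vectors. The sketch below repeats the body of cumcor inline so that it runs as a plain script; the test data are made up, and corrcoef from base MATLAB serves as the reference.

```matlab
% Hypothetical test data (column vectors)
x = (1:50)';
y = sin(x) + 0.1*x;

% Same running sums as in cumcor
n     = (1:numel(x))';
sumx  = cumsum(x);
sumy  = cumsum(y);
sumx2 = cumsum(x.^2);
sumy2 = cumsum(y.^2);
sumxy = cumsum(x.*y);

% Cumulative correlation; the first element is 0/0 = NaN
c = (n.*sumxy - sumx.*sumy) ./ sqrt((sumx.^2 - n.*sumx2).*(sumy.^2 - n.*sumy2));

% Reference value for the full vectors
R   = corrcoef(x, y);          % 2x2 correlation matrix
err = abs(c(end) - R(1,2));    % should be at rounding-error level
```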
Solution
I suggest the following approach:
1. The Pearson correlation can be calculated using the following formula:
$\rho_{X,Y} = \frac{E[XY]-E[X]E[Y]}{\sqrt{E[X^2]-E[X]^2}\,\sqrt{E[Y^2]-E[Y]^2}}$
2. Calculating the accumulative mean of each of the random variables above (X, Y, XY, X^2, Y^2) efficiently is relatively easy.
3. Given the accumulative means calculated in 2, we can calculate the accumulative std of X and Y.
4. Given the accumulative std of X and Y and the accumulative means above, we can calculate the accumulative Pearson coefficient.
Code
%defines inputs
N = 10000;
X = rand(N,1);
Y = rand(N,1);
%calculates accumulative mean for X, Y, X^2, Y^2, XY
EX = accumMean(X);
EY = accumMean(Y);
EX2 = accumMean(X.^2);
EY2 = accumMean(Y.^2);
EXY = accumMean(X.*Y);
%calculates accumulative Pearson correlation
accumPearson = zeros(N,1);
for ii=2:N
stdX = (EX2(ii)-EX(ii)^2).^0.5;
stdY = (EY2(ii)-EY(ii)^2).^0.5;
accumPearson(ii) = (EXY(ii)-EX(ii)*EY(ii))/(stdX*stdY);
end
%accumulative mean function, to be defined in an additional m file.
function [ accumMean ] = accumMean( vec )
accumMean = zeros(size(vec));
accumMean(1) = vec(1);
for ii=2:length(vec)
accumMean(ii) = (accumMean(ii-1)*(ii-1) +vec(ii))/ii;
end
end
Runtime
for N=10000:
Elapsed time is 0.002096 seconds.
for N=1000000:
Elapsed time is 0.240669 seconds.
Correctness
The correctness of the code above can be tested by calculating the accumulative Pearson coefficient with the corr function and comparing it to the result of the code above:
%ground truth for correctness comparison
gt = zeros(N,1);
for z=1:N
gt(z) = corr(X(1:z),Y(1:z));
end
Unfortunately, I don't have the Statistics and Machine Learning Toolbox, so I can't make this check.
I do think that it is a good start though, and you can continue from here :)
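The check can also be run without the toolbox, since corrcoef is part of base MATLAB. The sketch below is one possible way to do it, not the author's own test: it computes the accumulative means with cumsum instead of the accumMean loop (an assumption that both give the same values), uses a small N so the reference loop stays quick, and starts the comparison at index 2 because a single point has no defined correlation.

```matlab
N = 200;                       % kept small so the reference loop stays quick
X = rand(N,1);
Y = rand(N,1);
n = (1:N)';

% Accumulative means via cumsum (the same quantities accumMean returns)
EX  = cumsum(X)./n;
EY  = cumsum(Y)./n;
EX2 = cumsum(X.^2)./n;
EY2 = cumsum(Y.^2)./n;
EXY = cumsum(X.*Y)./n;

% Accumulative Pearson coefficient; element 1 is 0/0 = NaN
accumPearson = (EXY - EX.*EY) ./ sqrt((EX2 - EX.^2).*(EY2 - EY.^2));

% Ground truth with corrcoef on every prefix of length >= 2
gt = nan(N,1);
for z = 2:N
    R = corrcoef(X(1:z), Y(1:z));
    gt(z) = R(1,2);
end

% Largest deviation between the fast result and the reference
maxErr = max(abs(accumPearson(2:end) - gt(2:end)));
```

A value of maxErr near machine precision would indicate that the fast calculation and the reference agree.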
While using MATLAB's lsim command, I found out that the initial condition in my program doesn't affect the simulation's output.
y = lsim(F,input,time,x0);
Where F is a transfer function, and x0 the initial condition that I calculate with the state-space model.
The value of x0 doesn't affect y, I even replaced it with different numbers and the simulation's output y is always the same.
I'm actually trying to get the parameters of the tf from a real measured output. So that's the main part of the code:
tsmpl = 1e-2;
Sizeof_y = length(y_real);
R = zeros(Sizeof_y,3);
R(1,:) = [y_real(1) input(1) 0];
for i=2:Sizeof_y
R(i,:)=[y_real(i-1) input(i) input(i-1)];
end
p = pinv(R)*y_real;
z = tf('z',tsmpl);
num = p(2)*z+p(3);
den = z-p(1);
F = num/den;
sys = ss(F);
x0 = (y_real(1) - (sys.d*sig(1)))*pinv(sys.c);
y = lsim(F,input,time,x0);
y_real is the measured output. It's a vector of complex numbers.
time is the time vector that represents the duration of the process (given by the measurement): time = 6:0.01:24
input represents the test signal which is a vector defined like this:
Size_input = length(time);
Size_sine = length(halftime) ; %halftime is the duration of the excitation, also known from the measurement
input = zeros(Size_input,1);
input(1:Size_sine) = complex(30);
Vectors y_real, time, and input have the same length.
I would be thankful for any idea :))
I have a stock index for which I need to compute the VaR using a MC simulation with the geometric brownian motion model as the stochastic process. This is my first try, disregarding the gbm, just to get familiar with the program and syntax:
x=logreturn; %file with 500 returns
mu=mean(x),
sigma=std(x),
rand=normrnd(mu,sigma,2000,1); %random normal distr numbers
VaR=quantile(rand,0.05); %95 percent VaR
VaR=-0.045
It's very basic and calculates the VaR only for one day, but I need to calculate the VaR for the first 250 days with a 250-day rolling window for mu and sigma.
Based on Ahmed's comment I tried to implement arrayfun and cellfun for the full function:
x=logreturn;
mu=movmean(x,250);
sigma=movstd(x,250);
mydata = normrnd(0,1,1,20000)
muforsample=arrayfun(@(v) mu*250, mu, 'un', false);
sigmaforsample=arrayfun(@(v) sigma*sqrt(250), sigma, 'un', false);
k=arrayfun(@(v) muforsample-(sigmaforsample.^2)/2, muforsample, sigmaforsample, 'un', false); %this line is faulty and gives an error message (too many input arguments)
t=1/504;
sqrtt=sqrt(t);
gbm=k*t+sigmaforsample*sqrtt; %didn't try to fix this since I don't have a k
VaR=quantile(gbm, 0.05) %unchanged, needs cellfun too, right?
For k I need basically a value based on the corresponding muforsample and sigmaforsample so that I can calculate the gbm
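For reference, the "too many input arguments" error arises because the anonymous function takes one argument while arrayfun passes one element from each of the two arrays. A minimal sketch of two ways around it, using made-up mu and sigma values: plain elementwise arithmetic, which needs no arrayfun at all, and a two-argument anonymous function.

```matlab
% Hypothetical rolling estimates (three windows only, for illustration)
mu    = [0.0004 0.0005 0.0003];
sigma = [0.012  0.011  0.013];

% Annualize; elementwise operations already work on whole arrays
muforsample    = mu*250;
sigmaforsample = sigma*sqrt(250);

% Option 1: plain elementwise arithmetic
k_vec = muforsample - sigmaforsample.^2/2;

% Option 2: arrayfun with one function argument per input array
k_fun = arrayfun(@(m,s) m - s^2/2, muforsample, sigmaforsample);
```

Both options return a numeric array of the same size as mu, so no cell arrays or cellfun calls are needed afterwards.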
Using a for loop is definitely easier than cellfun and arrayfun. Here's my solution:
% file with log returns until t(n-1)
x = logreturns;
t = 1/250;
% mu and sigma based on 250 returns, moving window
mu = movmean(x,250);
sigma = movstd(x,250);
% k for every cell
k = zeros(size(mu));
for i = 1:length(mu);
k(i) = mu(i)*250 - (((sigma(i)*sqrt(250)).^2)/2);
end
% 0.99 or 0.95 VaR, quantiles based on 500,000 values
VaR = zeros(size(mu));
for j = 1:length(mu)
VaR(j) = quantile(normrnd(0,1,1,500000) * sqrt(t) * ...
(sigma(j) * sqrt(250)) + k(j) * t, 0.01);
end
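This is the formula behind the simulation, by the way, with annualized $\mu$ and $\sigma$ and $\varepsilon \sim N(0,1)$:
$r_t = \left(\mu - \frac{\sigma^2}{2}\right)t + \sigma\sqrt{t}\,\varepsilon$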
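Since the simulated one-period log return in the loop above is normally distributed, its quantile can also be written in closed form, which gives a cheap cross-check of the Monte Carlo result. The sketch below uses made-up daily mean and standard deviation values and only base MATLAB functions: randn replaces normrnd, a sorted sample replaces quantile, and the standard normal quantile is obtained from erfcinv.

```matlab
% Hypothetical daily estimates from one rolling window
mu_d    = 0.0004;
sigma_d = 0.012;

t     = 1/250;                     % one trading day in years
muA   = mu_d*250;                  % annualized mean
sigA  = sigma_d*sqrt(250);         % annualized volatility
k     = muA - sigA^2/2;            % drift term, as in the loop above
alpha = 0.01;                      % 99 percent VaR

% Closed form: standard normal quantile via erfcinv
zq        = -sqrt(2)*erfcinv(2*alpha);
VaR_exact = k*t + sigA*sqrt(t)*zq;

% Monte Carlo estimate with an empirical quantile from a sorted sample
r      = sort(k*t + sigA*sqrt(t)*randn(500000,1));
VaR_mc = r(ceil(alpha*numel(r)));
```

The two values should differ only by sampling noise, so a large gap would point to a mistake in the simulation.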
I'm completely lost using MATLAB functions for this, so here is the case:
Let's assume I have SUM=0.
I have a constant probability P that the user gives me, and I have to compare this constant P with M other random probabilities (the user also gives M). If P is larger, I add 1 to SUM; if P is smaller, I add -1 to SUM. At the end I want to plot the graph of the process on the screen.
So far I have managed to implement only one stage with this code:
function [result] = ex1(p)
if (rand>=p) result=1;
else result=-1;
end
(it's like M=1)
How do you suggest I modify this code to make it work the way I described above (including getting a graph)?
Or maybe I'm getting the logic wrong? The question says I get 1 with probability P and -1 with probability (1-P), and the SUM is the same.
Many thanks
I'm not sure how you obtain your input, but this should get you on the way:
p = 0.5; % Constant probability
m = 10;
randoms = rand(m,1) % Random probabilities
results = ones(m,1);
idx = find(randoms < p)
results(idx) = -1;
plot(cumsum(results))
(Figure: plot of cumsum(results) for m = 1000)
You can do it like this:
p = 0.25; % example data
M = 20; % example data
random = rand(M,1); % generate values
y = cumsum(2*(random>=p)-1); % compute cumulative sum of +1/-1
plot(y) % do the plot
The important function here is cumsum, which does the cumulative sum on the sequence of +1/-1 values generated by 2*(random>=p)-1.
(Figure: example graph with p=0.5, M=2000)
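On the question of which comparison to use: if a step of +1 should occur with probability P, as the last paragraph of the question states, then the test would be rand < p, because rand falls below p with probability p. A small sketch under that assumption, with example values for p and M and a check of the empirical fraction of +1 steps:

```matlab
p = 0.25;      % example probability of a +1 step
M = 100000;    % example number of steps

% +1 where the random draw is below p, -1 otherwise
steps  = 2*(rand(M,1) < p) - 1;

% Empirical fraction of +1 steps; should be close to p
fracUp = mean(steps == 1);

% Running sum of the process and its graph
y = cumsum(steps);
plot(y)
xlabel('step'), ylabel('SUM')
```

With rand >= p instead, the roles are swapped and +1 occurs with probability 1-p.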
I have a series of numbers. I calculated the "auto-regression" between them using the Yule-Walker method.
But now how do I extend the series?
The whole procedure is as follows:
a) the series I use:
143.85 141.95 141.45 142.30 140.60 140.00 138.40 137.10 138.90 139.85 138.75 139.85 141.30 139.45 140.15 140.80 142.50 143.00 142.35 143.00 142.55 140.50 141.25 140.55 141.45 142.05
b) this data is loaded into data using:
data = load('c:\\input.txt', '-ascii');
c) the calculation of the coefficients:
ar_coeffs = aryule(data,9);
this gives:
ar_coeffs =
1.0000 -0.9687 -0.0033 -0.0103 0.0137 -0.0129 0.0086 0.0029 -0.0149 0.0310
d) Now using this, how do I calculate the next number in the series?
[any other method of doing this (except using aryule()) is also fine... this is what I did, if you have a better idea, please let me know!]
For a real valued sequence x of length N, and a positive order p:
coeff = aryule(x, p)
returns the AR coefficients of order p of the data x (Note that coeff(1) is a normalizing factor). In other words it models values as a linear combination of the past p values. So to predict the next value, we use the last p values as:
x(N+1) = -sum_[k=1:p] ( coeff(k+1)*x(N+1-k) )
or in actual MATLAB code:
p = 9;
data = [...]; % the seq you gave
coeffs = aryule(data, p);
nextValue = -coeffs(2:end) * data(end:-1:end-p+1)';
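To extend the series by more than one value, the same one-step prediction can be applied repeatedly, feeding each predicted value back in as if it were data. The sketch below takes the series and the coefficient vector exactly as printed in the question, so it runs without aryule; the horizon h is an arbitrary choice for illustration.

```matlab
% Series and AR coefficients copied from the question
data = [143.85 141.95 141.45 142.30 140.60 140.00 138.40 137.10 138.90 ...
        139.85 138.75 139.85 141.30 139.45 140.15 140.80 142.50 143.00 ...
        142.35 143.00 142.55 140.50 141.25 140.55 141.45 142.05];
coeffs = [1.0000 -0.9687 -0.0033 -0.0103 0.0137 -0.0129 0.0086 0.0029 -0.0149 0.0310];

p = numel(coeffs) - 1;   % model order
h = 5;                   % hypothetical forecast horizon

% Append one predicted value at a time, each based on the last p values
ext = data;
for i = 1:h
    ext(end+1) = -coeffs(2:end) * ext(end:-1:end-p+1)';
end
forecast = ext(end-h+1:end);
```

Because every step builds on earlier predictions, errors accumulate, so forecasts far beyond the data should be treated with caution.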
EDIT: If you have access to System Identification Toolbox, then you can use any of a number of functions to estimate AR/ARMAX models (ar/arx/armax) (or even find the order of AR model using selstruc):
m = ar(data, p, 'yw'); % yw for Yule-Walker method
pred = predict(m, data, 1);
coeffs = m.a;
nextValue = pred(end);
subplot(121), plot(data)
subplot(122), plot( cell2mat(pred) )
Your data has a non-zero mean. Doesn't the Yule-Walker model assume the data is the output of a linear filter excited by a zero-mean white noise process?
If you remove the mean, this example using ARYULE and LPC might be what you're looking for. The procedure boils down to:
a = lpc(data,9); % uses Yule-Walker modeling
pred = filter(-a(2:end),1,data);
disp(pred(end)); % the predicted value at time N+1
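The mean removal mentioned above could look roughly like this: subtract the sample mean, fit and predict on the centered series, then add the mean back. This is a sketch under the assumption that the sample mean is an adequate estimate of the level; the variable names are illustrative, and lpc requires the Signal Processing Toolbox.

```matlab
% Series copied from the question
data = [143.85 141.95 141.45 142.30 140.60 140.00 138.40 137.10 138.90 ...
        139.85 138.75 139.85 141.30 139.45 140.15 140.80 142.50 143.00 ...
        142.35 143.00 142.55 140.50 141.25 140.55 141.45 142.05];

m  = mean(data);     % sample mean, removed before modeling
xc = data - m;       % centered series

a    = lpc(xc, 9);                    % AR coefficients of the centered series
pred = filter(-a(2:end), 1, xc);      % pred(n) is the prediction for time n+1

nextValue = pred(end) + m;            % add the mean back for time N+1
disp(nextValue)
```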