Stochastic spread method for pairs trading by Elliott et al. (2005) - Kalman filter + EM algorithm in MATLAB, am I doing something wrong?

I am implementing the stochastic spread method for pairs trading by Elliott et al. (2005).
The procedure consists of modeling the spread between two stocks, log(P1)-log(P2), as a mean reverting process, calibrated from market observations.
The hidden state process for the spread can be written like this:
x_{t+1} = A + Bx_t + Ce_{t+1}
The observation process is:
y_t = x_t + D*w_t
Both e_t and w_t are i.i.d. Gaussian N(0,1).
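For concreteness, a minimal simulation of this model would look like the following (the parameter values here are made up purely for illustration):

% Minimal simulation of the model above (parameter values are made up)
T = 252;
A = 0.02; B = 0.9; C = 0.05; D = 0.02;
x = zeros(T,1); y = zeros(T,1);
x(1) = A/(1-B);                     % start at the long-run mean A/(1-B)
y(1) = x(1) + D*randn;
for t = 2:T
    x(t) = A + B*x(t-1) + C*randn;  % hidden spread
    y(t) = x(t) + D*randn;          % noisy observation
end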
Elliott gives the Kalman filter equations in his paper, which I have implemented in my code for the updating step:
function [xt_t,st_t,xt_tm,kt,st_tm] = EMupdate(DATA_t,xt_t_m1,st_t_m1,A,B,C2,D2)
st_tm = B^2*st_t_m1 + C2;              % predicted MSE    s_{t|t-1}
kt    = st_tm/(st_tm + D2);            % Kalman gain      k_t
xt_tm = A + B*xt_t_m1;                 % predicted state  x_{t|t-1}
xt_t  = xt_tm + kt*(DATA_t - xt_tm);   % filtered state   x_{t|t}
st_t  = st_tm - kt*st_tm;              % filtered MSE     s_{t|t}
end
where
xt_t is x_{t|t}
xt_t_m1 is x_{t-1|t-1}
xt_tm is x_{t|t-1}
st_t is s_{t|t} (the MSE, denoted as P in e.g. Hamilton (1994))
st_t_m1 is s_{t-1|t-1}
st_tm is s_{t|t-1}
kt is the Kalman gain for time t
DATA_t is the observed data for time t, y_t
A, B, C2, D2 are the estimated parameters (which I have estimated using the EM algorithm in another code).
This update step is done every time a new data point arrives. I am storing all the x's, s's and k's in vectors. I am supposed to compare y_t with x_{t|t-1}, and given a large deviation between the two, a trade should be initiated. However, the two follow each other very closely, and I am unsure whether I have done something wrong:
Can someone see if I am doing something wrong?
Please tell me if I should link more of my code.
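For reference, the loop that applies this update at each new data point looks roughly like this (a minimal sketch; the storage vector names are mine):

% Filtering loop, storing all filtered/predicted quantities in vectors
T  = numel(DATA);
xf = zeros(T,1); sf = zeros(T,1);   % x_{t|t} and s_{t|t}
xp = zeros(T,1); sp = zeros(T,1);   % x_{t|t-1} and s_{t|t-1}
k  = zeros(T,1);
xf(1) = DATA(1); sf(1) = D2;        % initialization (see 2a below)
for t = 2:T
    [xf(t),sf(t),xp(t),k(t),sp(t)] = EMupdate(DATA(t),xf(t-1),sf(t-1),A,B,C2,D2);
end
% a trade signal would compare DATA(t) with the prediction xp(t)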
UPDATE: My procedure (P is the same as s above):
1. To generate the spread between two stocks, I take the difference between the log-prices: y = log(p1) - log(p2).
2. I set a training period of 252 days, where I estimate the initial parameters (A, B, C2 and D2) using the EM algorithm. I implement the EM algorithm using all the data for the training period, that is y(1), y(2), ..., y(252), as well as initial guesses for A, B, C2 and D2:
2a. I set x_{1|1} = y(1). Furthermore, I set the MSE P_{1|1} = D2, my initial guess for D^2.
2b. I recursively calculate the Kalman filter quantities x_{t|t}, x_{t+1|t}, P_{t|t}, P_{t+1|t} and k_t for all t = 1...252 (the entire training period) using my initial guesses for A, B, C2 and D2.
2c. After I have calculated the Kalman filter quantities for the entire training period, I (backward) recursively calculate the Kalman smoother quantities for the entire training period as well, t = 1...252. These are x_{t|T}, P_{t|T}, P_{t,t-1|T} and j_t (see the sketch after this list).
3. I then compute the log-likelihood value and the updated values for A, B, C2 and D2. I repeat the steps from 1 until the log-likelihood converges and I obtain optimal values for A, B, C2 and D2.
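A minimal sketch of the backward pass in 2c (scalar case; xf, sf, xp, sp are the stored filter outputs from the loop sketched above, and the recursions are the standard smoother equations, as in e.g. Shumway and Stoffer):

% Backward (smoothing) pass over the training window
T  = 252;
xs = zeros(T,1); Ps = zeros(T,1); J = zeros(T,1);
xs(T) = xf(T); Ps(T) = sf(T);
for t = T-1:-1:1
    J(t)  = sf(t)*B/sp(t+1);                    % smoother gain j_t
    xs(t) = xf(t) + J(t)*(xs(t+1) - xp(t+1));   % x_{t|T}
    Ps(t) = sf(t) + J(t)^2*(Ps(t+1) - sp(t+1)); % P_{t|T}
end
% (the lag-one covariances P_{t,t-1|T} needed for the EM updates are
% computed with a similar backward recursion, omitted here)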
Is it correct to calculate Kalman filters for the entire training period before starting to calculate Kalman smoothers? Or should I, for example, calculate Kalman filters up till t=2, then Kalman smoothers for T=2, then Kalman filters up till t=3, then smoothers for T=3 etc.?
Now I have values for A, B, C2 and D2 and can begin my test period, also 252 days. I don't update my estimates of A, B, C2 and D2, but keep them constant. For each new observation I compute the Kalman filter quantities (the same as in 2b). Finally, I can compare y(t) to x_{t|t-1} for the test period.
My results look like this:
While a paper by Chen, Ren and Lu has the following results:
NB: Not the same security... but the difference is obvious nonetheless.

It seems that either you're underestimating the noise variance from the training data, or your training data is not stationary over the training window. Try increasing the noise variance and you'll see that the filter actually smooths the time series. Your current underestimation of the noise variance leads the Kalman filter to "forget" the past and give your last sample a high weighting.

Checking this is quite easy: increase the measurement noise variance (the matrix R in the Kalman filter) and see how it affects the output.
If the model is not linear-Gaussian, the Kalman filter will no longer be optimal. However, it should still smooth your data, so keep "training" it until it provides acceptable predictions.
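For example, a quick check along these lines, reusing the EMupdate function and variable names from the question (the scaling values are arbitrary):

% Rerun the filter with the measurement noise variance inflated and see
% whether the filtered/predicted series becomes smoother.
T = numel(DATA);
for scale = [1 10 100]
    xf = zeros(T,1); sf = zeros(T,1); xp = zeros(T,1);
    sp = zeros(T,1); k = zeros(T,1);
    xf(1) = DATA(1); sf(1) = scale*D2;
    for t = 2:T
        [xf(t),sf(t),xp(t),k(t),sp(t)] = EMupdate(DATA(t),xf(t-1),sf(t-1),A,B,C2,scale*D2);
    end
    plot(xp); hold on        % the prediction smooths out as D2 grows
end
plot(DATA,'k'); hold off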

Related

Can I use ANOVA to compare coefficient significance from three different conditions (in linear regression)?

I'm trying to compare coefficients from linear regressions between three different groups (A, B, C), and I want to check whether one of the groups has a significantly higher coefficient than the others. Can I use ANOVA for this?
The data with the coefficients looks like this:
Condition | IV    | DV     | coef    | p value
----------|-------|--------|---------|--------
A         | force | moving | 0.1833  | 0.008
B         | force | moving | -0.0758 | 0.001
C         | force | moving | 0.4973  | 0.000
Additional Info
I used a 7-point Likert scale for the survey.
I ran a linear regression with multiple IVs and a single DV.
The table above is the data I got from the linear regression.

How does covariance matrix (P) in Kalman filter get updated in relation to measurements and state estimate?

I am in the midst of implementing a Kalman filter based AHRS in C++. There's something rather strange to me in the equations of the filter.
I can't find the part where the P (covariance) matrix is actually updated to represent uncertainty of predictions.
During the "predict" step, the P estimate is calculated from its previous value, A and Q. From what I understand, A (the system matrix) and Q (the noise covariance) are constant. Then during the "correct" step, P is calculated from K, H and the predicted P. H (the observation matrix) is constant, so the only variable that affects P is K (the Kalman gain). But K is calculated from the predicted P, H and R (the observation noise), which are either constants or P itself. So where is the part of the equations that makes P relate to x? To me it seems like P is just looping recursively here, depending only on the constants and the initial value of P. This doesn't make any sense. What am I missing?
You are not missing anything.
It can come as a surprise to realise that, indeed, the state error covariance matrix (P) in a linear Kalman filter does not depend on the data (z). One way to lessen the surprise is to note what the covariance is saying: it is how uncertain you should be about the estimated state, given that the models you are using (effectively A, Q and H, R) are accurate. It is not saying: this is the uncertainty. By judicious tweaking of Q and R you could change P arbitrarily. In particular, you should not interpret P as a 'quality' figure; look instead at the observation residuals. You could, for example, make P smaller by reducing R, but then the residuals would be larger compared with their computed standard deviations.
When the observations come in at a constant rate, and always as the same set of observations, P will tend to a steady state that could, in principle, be computed ahead of time.
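A small scalar example makes this concrete: the recursion below never touches a measurement, yet P settles to its steady state (the system values here are made up):

% Scalar Kalman covariance recursion -- note no measurement z appears
A = 1; Q = 0.01; H = 1; R = 0.1;    % made-up scalar system
P = 1;                              % initial state error variance
for t = 1:50
    Pp = A*P*A + Q;                 % predict:  P_{t|t-1}
    K  = Pp*H/(H*Pp*H + R);         % gain, from Pp, H, R only
    P  = (1 - K*H)*Pp;              % update:   P_{t|t}
end
disp(P)                             % steady-state value, identical for any data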
However, there is no difficulty in applying the Kalman filter when you have varying times between observations and varying sets of observations at each time, for example if you have several sensor systems with different sampling periods. In this case you will see more variation in P, though again this could in principle be computed ahead of time.
Further, the Kalman filter can be extended (in various ways, e.g. the extended Kalman filter and the unscented Kalman filter) to handle nonlinear dynamics and nonlinear observations. In that case, because the transition matrix (A) and the observation model matrix (H) have a state dependency, so too will P.

Identify parameters of ARIMA model

I am trying to build an ARIMA model. I have 144 terms in my standardized time series, which represent the residuals from the original time series. These residuals, on which I would like to build the ARIMA model, were obtained by subtracting a linear trend and a periodic component from the original time series, so the residuals are the stochastic component.
Because of that subtraction I model the residuals as a stationary series (d=0), so the model is ARIMA(p,d,q) = ARIMA(?,0,?).
The ACF and PACF of my residuals are not as clear as the cases in the literature on ARIMA model identification, and when I choose the parameters p and q according to the criterion that they are the last values outside the confidence interval, I get p=109, q=97. MATLAB gave me an error for this case:
Error using arima/estimate (line 386)
Input response series has an insufficient number of observations.
On the other hand, when I look only at N/4 of the length of the time series to identify the p and q parameters, I get p=36, q=34. MATLAB gave me an error for this case:
Warning: Nonlinear inequality constraints are active; standard errors may be inaccurate.
In arima.estimate at 1113
Error using arima/validateModel (line 1306)
The non-seasonal autoregressive polynomial is unstable.
Error in arima/setLagOp (line 391)
Mdl = validateModel(Mdl);
Error in arima/estimate (line 1181)
Mdl = setLagOp(Mdl, 'AR' , LagOp([1 -coefficients(iAR)' ], 'Lags', [0 LagsAR ]));
How do I correctly identify the p and q parameters, and what is wrong here? Also, what does this partial autocorrelation diagram mean; why are the last values so big?
This guide contains a lot of useful information about the correct estimation of ARIMA p and q parameters.
As far as I can remember from my studies, since the ACF tails off after lag (q - p) and the PACF tails off after lag (p - q), the correct identification of the p and q orders is not always straightforward, and even the best practices provided by the above guide may not be enough to point you in the right direction.
Usually, a fail-safe approach is to apply an information criterion (like the AIC, BIC or FPE) to several models of different p and q orders. The model that yields the smallest value of the criterion is the best one. Let's say your maximum desired p and q order is 6 and that k is the number of observations; you could then proceed as follows:
% Fit ARIMA(p,0,q) for every p and q from 1 to 6 and store the results
ll = zeros(6);      % log-likelihood of each (p,q) fit
pq = zeros(6);      % number of AR + MA coefficients of each fit
for p = 1:6
    for q = 1:6
        mod = arima(p,0,q);
        [fit,~,fit_ll] = estimate(mod,Y,'print',false);
        ll(p,q) = fit_ll;
        pq(p,q) = p + q;
    end
end
ll = reshape(ll,36,1);
pq = reshape(pq,36,1);
[~,bic] = aicbic(ll,pq+1,k);   % +1 accounts for the constant term
bic = reshape(bic,6,6);
Once this is done, use the indices returned by the min function to find the optimal p and q orders.
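That last step might look like this (a small sketch, assuming the 6x6 bic matrix from above):

[~,idx] = min(bic(:));                    % linear index of the smallest BIC
[p_opt,q_opt] = ind2sub(size(bic),idx);   % convert back to (p,q) orders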
On a side note, as far as your errors are concerned: the first one is pretty straightforward and self-explanatory. The second one basically means that a correct model estimation is not possible.

sequence prediction using HMM Matlab

I'm currently learning murphyk's toolbox for Hidden Markov Models. However, I have a problem determining my model's coefficients, and also with the algorithm for sequence prediction by log-likelihood.
My Scenario:
I have the flying bird's trajectory in 3D space, i.e. its X, Y and Z, which falls in the continuous HMM category. I have 200 observations of the flying bird, i.e. 500 rows of trajectory data, and I want to predict the sequence. I want to sample that into 20 data points, i.e. one after every 10 points. So my first question is: are the following parameters valid for my case?
O = 3; %Number of coefficients in a vector
T = 20; %Number of vectors in a sequence
nex = 50; %Number of sequences
M = 2; %Number of mixtures
Q = 20; %Number of states
And the second question is: what algorithm is appropriate for sequence prediction, and is training compulsory for that?
From what I understand, I'm assuming you're training 200 different classes (HMMs) and each class has 500 training examples (observation sequences).
O is the dimensionality of the vectors, which seems to be correct.
There is no need to have a fixed T; it depends on the observation sequences you have.
M is the number of multivariate Gaussians (mixture components) in the GMM of a state. More mixtures will fit your data better and give you better accuracy, but at the cost of performance. Choose a suitable value.
Q (the number of states) does not need to be equal to T. For the best number of states, you'll have to benchmark and see for yourself:
Determining the number of hidden states in a Hidden Markov Model
Yes, you have to train your classes using the Baum-Welch algorithm, optionally preceded by something like the segmental k-means procedure. After that you can easily perform isolated unit recognition using Forward/Backward probability or Viterbi probability by simply selecting the class with the highest probability.
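For what it's worth, the train-then-score flow with Murphy's toolbox might look roughly like the sketch below; the function names follow the toolbox's own demos, but treat the initialization details as assumptions:

% data: O x T x nex array of observation sequences for one class
prior0    = normalise(rand(Q,1));
transmat0 = mk_stochastic(rand(Q,Q));
[mu0, Sigma0] = mixgauss_init(Q*M, reshape(data, O, []), 'full');
mu0     = reshape(mu0, [O Q M]);
Sigma0  = reshape(Sigma0, [O O Q M]);
mixmat0 = mk_stochastic(rand(Q,M));

% Baum-Welch (EM) training for this class
[LL, prior1, transmat1, mu1, Sigma1, mixmat1] = ...
    mhmm_em(data, prior0, transmat0, mu0, Sigma0, mixmat0, 'max_iter', 20);

% recognition: score a test sequence against the class, then pick the
% class with the highest log-likelihood
loglik = mhmm_logprob(testdata, prior1, transmat1, mu1, Sigma1, mixmat1);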

MATLAB: IIR Filter coefficients

I am fairly new to signal processing, and one of my projects is to implement a C++ filter class. I need the higher-order coefficients of typical filters such as Chebyshev types I and II, Butterworth, and elliptic, and unfortunately most of the coefficient tables on the net only list up to 10th order at most. I decided to use MATLAB to generate these filters and get their higher-order coefficients. However, one thing I'm confused about is that they only give out one set of coefficients, which I assume is analogous to saying (a0, a1, a2, ..., an).
I learned that IIR filters have two sets of coefficients, usually expressed as a0, a1, ..., an and b0, b1, ..., bn. Here is my MATLAB code to generate these coefficients and export them to an Excel file:
%Chebyshev Filter Coefficients
filename = 'cheby2coefs.xlsx';
for Order = 1:64
    fprintf('This is');
    disp(Order);
    fprintf('coefficients');
    [i,j] = cheby2(Order, 20, 300/500);
    disp([i,j]);
    fprintf('\n');
    xlswrite(filename,[i,j]',Order);
end
So far there have been few sources on the net on how to come up with these coefficients in MATLAB, so I'm having a hard time. My question is: how exactly does one produce the IIR coefficients for these filters (assuming they're IIR)?
It looks like you're on the right track, but double-check your call to cheby2 against MATLAB's official documentation: the three arguments are the order n, the stopband ripple R in dB, and the normalized stopband edge frequency Wst, in that order.
Also, don't name the output variables i and j; that's bad practice, because i and j are MATLAB's built-in names for the imaginary unit sqrt(-1). Name the output variables b and a instead.
Once you're done with Chebyshev, use butter and ellip for the Butterworth and elliptic filters, respectively.
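Putting that together, a corrected version of the loop from the question might look like this (just a sketch, reusing the design parameters from the question):

% Export numerator (b) and denominator (a) coefficients of
% Chebyshev Type II filters of orders 1..64
filename = 'cheby2coefs.xlsx';
for Order = 1:64
    [b, a] = cheby2(Order, 20, 300/500);  % 20 dB stopband ripple, Wst = 0.6
    xlswrite(filename, [b; a], Order);    % one sheet per order: row 1 = b, row 2 = a
end

Keep in mind that transfer-function (b, a) coefficients become numerically ill-conditioned at very high orders, so it's worth sanity-checking each design, e.g. with freqz.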
This seems to be covered in the MATLAB documentation:
[b,a] = cheby2(n,R,Wst) designs an order n lowpass digital Chebyshev Type II filter with normalized stopband edge frequency Wst and stopband ripple R dB down from the peak passband value. It returns the filter coefficients in the length n+1 row vectors b and a, with coefficients in descending powers of z.