ARMA model forecast with Matlab

Using the data of the first sub-sample, identify a set of at least two ARMA models that provide a good representation of the time series. Compute the in-sample forecasts for horizons of 3 months (h = 1), 6 months (h = 2), and 12 months (h = 4).
The data are the consumer price index of Italy from https://fred.stlouisfed.org/, sampled quarterly. The inflation rate is computed from it with the following commands:
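T = length(cpi);
time = table2array(table(:,1)); % dates from the first column of the imported table
rate = nan(1,T); % preallocate; rate(1) stays NaN because there is no previous quarter
for t = 2:T
    rate(t) = (cpi(t)-cpi(t-1))/cpi(t-1); % quarter-on-quarter relative change
end
rate(:,1) = []; % drop the undefined first entry, leaving T-1 rates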
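The loop above can also be written without explicit indexing. A minimal sketch, assuming cpi is a numeric vector of quarterly index levels (the sample values below are made up for illustration and are not FRED data):

```matlab
% Hypothetical quarterly CPI levels, for illustration only (not FRED data).
cpi = [100; 102; 101; 103];

% diff(cpi) gives cpi(t) - cpi(t-1); dividing element-wise by the previous
% level gives the quarter-on-quarter inflation rate, of length T-1.
rate = diff(cpi) ./ cpi(1:end-1);

disp(rate)
```

This form works for row or column vectors and returns T-1 values directly, so no placeholder entry has to be removed afterwards.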

Related

Getting the correct output units from the PLOMB (Lomb-scargle periodogram) function

I am trying to analyze time series of wheel turns that were sampled at 1-minute intervals for 10 days. t is a 1 x 14000 array that goes from .1666 hours to 240 hours. analysis.timeseries.(grp).(chs) is a 1 x 14000 array for each of my groups of interest and their specific channels, giving the activity at each sampled minute. I'm interested in collecting the maximum power and the frequency at which it occurs. My problem is that I'm not sure what units f comes out in. I would like it returned in cycles per hour, spanning up to a maximum period of 30 hours. I tried to use the Galileo example in the documentation as a guide, but it didn't seem to work.
Below is my code:
groups = {'GFF' 'GMF' 'SFF' 'SMF'};
chgroups = {chnamesGF chnamesGM chnamesSF chnamesSM};
t1 = (t * 3600); %matlab treats this as seconds so convert it to an hour form
onehour = seconds(hours(1));
for i = 1:4
grp = groups{1,i};
chn = chgroups{1,i};
for channel = 1:length(chn)
chs = chn{channel,1};
[pxx,f]= plomb(analysis.timeseries.(grp).(chs),t, 30/onehour,'normalized');
analysis.pxx.(grp).(chs) = pxx;
analysis.f.(grp).(chs) = f;
analysis.lsp.power.(grp).(chs) = max(pxx);
[row,col,v] = find(analysis.pxx.(grp).(chs) == analysis.lsp.power.(grp).(chs));
analysis.lsp.tau.(grp).(chs) = analysis.f.(grp).(chs)(row);
end
end
Not really an answer, but it is hard to put an image in a comment.
Judging by this (plomb manual matlab),
I think that pxx is dimensionless, while f is a frequency and so has dimension 1/(dimension of t). If your t is in hours, I would say h^-1.
So I would suggest trying:
[pxx,f]= plomb(analysis.timeseries.(grp).(chs),t*30.0/onehour,'normalized');
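To illustrate the point about units, here is a minimal sketch that does not call plomb at all: it evaluates a plain Fourier sum (a stand-in, not the Lomb-Scargle estimate) on a frequency grid chosen in cycles per hour, using a synthetic 24-hour rhythm. The signal, the grid, and all variable names are assumptions for illustration. Because t is in hours, the peak frequency comes out in h^-1 and its reciprocal is the period in hours.

```matlab
t = (1:1440)' / 6;                 % time in hours, 10-minute steps over 10 days
x = sin(2*pi*t/24);                % synthetic activity with a 24 h period

f = (1/30 : 1/600 : 1/2)';         % frequency grid in cycles per hour; longest period is 30 h

% Power of the Fourier sum at each grid frequency (rows: frequencies, columns: samples).
P = abs(exp(-2i*pi*f*t') * x).^2 / numel(x);

[pmax, idx] = max(P);              % peak power and its position in one call
fpeak = f(idx);                    % cycles per hour, because t is in hours
period = 1 / fpeak;                % hours

fprintf('Peak at %.4f cycles/hour (period %.1f h)\n', fpeak, period);
```

For this synthetic signal the peak should land at the 24-hour period. Using the second output of max also avoids the separate find call when looking up the frequency of the maximum.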

Matrix For Arrival Process of Data Packets for Different Time Slots in Matlab

I want to define a matrix (ag,t) that holds the values of an arrival process following a binomial distribution. Its values should be different for each time slot. I am not sure how to approach the problem and would really appreciate help in this regard. The attached picture explains the query.
This is what I have tried, but I don't know how to define it so that it is different for each time slot.
cluster_nbrs = 3;
G = zeros(1,cluster_nbrs);
p = 0.97; % Probability of packet being transferred
n = 1000; % # of time slots
arrival_gt = zeros(1,cluster_nbrs); % Arrival Process during slot t
for q = 1:cluster_nbrs
% Probability that arrival = n
arrival_gt(q) = binopdf(G(q),n,p);
end
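One way to read the requirement is a matrix with one row per cluster and one column per time slot, where every entry is an independent binomial draw. A minimal sketch under that reading: m (the number of packets that may arrive per slot) is an assumed parameter that does not appear in the question, and the draws are built from rand so no toolbox function is needed. Note that binopdf returns a probability, not a random sample, which is why the loop above yields the same value every time.

```matlab
cluster_nbrs = 3;     % number of clusters g
n = 1000;             % number of time slots t
p = 0.97;             % per-packet arrival probability, as in the question
m = 5;                % ASSUMPTION: maximum number of packets per slot

% Each binomial(m, p) draw is the number of successes in m Bernoulli(p) trials.
% rand(m, cluster_nbrs, n) < p gives the trials; summing over the first
% dimension counts the successes for every (cluster, slot) pair.
arrival_gt = squeeze(sum(rand(m, cluster_nbrs, n) < p, 1));

size(arrival_gt)      % cluster_nbrs-by-n, one column per time slot
```

If the Statistics and Machine Learning Toolbox is available, binornd(m, p, cluster_nbrs, n) should give the same kind of matrix in one call.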

Which Function in Matlab Should I use to Validate a Model forecast() or predict()?

I have used two types of models for modeling a SISO system with time series data. The first is ARMAX and the second is Output-Error (OE). Now I need to know which of the two performs better at forecasting the output over a certain horizon (15 days in my case), given the input and only the observed outputs needed to initialize the model properly. Matlab offers two functions that seem to be used to validate models, forecast() and predict(). I have been reading about the difference between predicting and forecasting, and apparently people confuse the two terms a lot. I would like to know which of the two I should use to validate a model and choose the best one. The main point is that I have to test the model's performance over many horizons: in other words, how well the model forecasts one day ahead, two days ahead, and so on up to 15 days ahead. I wrote the following code as an example:
close all
clear all
tic;
uhe = {'furnas'};
% Set the structures to be evaluated in ARMAx model
na = 10;
nb = 2;
nc = 1;
nk = 2;
% Set the structures to be evaluated in OE model
nbb = 10;
nf = 6;
nkk = 0;
u = 1;
% Read training dataset file and set iddata definitions
data_train = importdata(strcat('train_',uhe{u},'.dat'));
data_test = importdata(strcat('test_',uhe{u},'.txt'));
data_valid = importdata(strcat('valid_',uhe{u},'.txt'));
data_complet = vertcat(data_train, data_valid, data_test);
data_complet = iddata(data_complet(:,2),data_complet(:,1));
data_complet.TimeUnit = 'days';
data_complet.InputName = 'Chuva';
data_complet.OutputName = 'Vazão';
data_complet.InputUnit = 'm³/s';
data_complet.OutputUnit = 'm³/s';
data_complet.Name = 'Sistema Chuva-Vazão';
data_train = iddata(data_train(:,2),data_train(:,1));
data_train.TimeUnit = 'days';
data_train.InputName = 'Chuva';
data_train.OutputName = 'Vazão';
data_train.InputUnit = 'm³/s';
data_train.OutputUnit = 'm³/s';
data_train.Name = 'Sistema Chuva-Vazão';
data_valid = iddata(data_valid(:,2),data_valid(:,1));
data_valid.TimeUnit = 'days';
data_valid.InputName = 'Chuva';
data_valid.OutputName = 'Vazão';
data_valid.InputUnit = 'm³/s';
data_valid.OutputUnit = 'm³/s';
data_valid.Name = 'Sistema Chuva-Vazão';
data_test = iddata(data_test(:,2),data_test(:,1));
data_test.TimeUnit = 'days';
data_test.InputName = 'Chuva';
data_test.OutputName = 'Vazão';
data_test.InputUnit = 'm³/s';
data_test.OutputUnit = 'm³/s';
data_test.Name = 'Sistema Chuva-Vazão';
% Modeling training dataset with ARMAx
models_train_armax = armax(data_train,[na nb nc nk]);
% Modeling training dataset with OE
models_train_oe = oe(data_train,[nbb nf nkk]);
% Evaluating the validation dataset ARMAX
x0 = findstates(models_train_armax,data_valid);
OPT = simOptions('InitialCondition',x0);
ssmodel_armax=idss(models_train_armax);
models_valid_armax = sim(ssmodel_armax,data_valid,OPT);
% Evaluating the validation dataset OE
x0 = findstates(models_train_oe,data_valid);
OPT = simOptions('InitialCondition',x0);
ssmodel_oe=idss(models_train_oe);
models_valid_oe = sim(ssmodel_oe,data_valid,OPT);
% Predicting Horizon
hz = 20;
% Applying predict function
opt = predictOptions('InitialCondition','e');
[y_armax_pred] = predict(ssmodel_armax,data_valid(1:end),hz,opt);
[y_oe_pred] = predict(ssmodel_oe,data_valid(1:end),hz,opt);
% Applying forecast function
opt = forecastOptions('InitialCondition','e');
[y_armax_fc] = forecast(ssmodel_armax,data_train((end-max([na nb nc nk])):end),hz,data_test.u(1:hz),opt);
[y_oe_fc] = forecast(ssmodel_oe,data_train((end-max([nbb nf nkk])):end),hz,data_test(1:hz),opt);
It depends on how you are trying to validate the model. Generally you would use the predict command, since you would want to backtest against previous data.
Alternatively, you could use forecast if you have a cross-validation/holdout sample and would like to test against that.
Matlab's help has an interesting passage on the difference between forecast and predict:
forecast performs prediction into the future, in a time range beyond the last instant of measured data. In contrast, the predict command predicts the response of an identified model over the time span of measured data. Use predict to determine if the predicted result matches the observed response of an estimated model. If sys is a good prediction model, consider using it with forecast.
Also note that Matlab's help for predict says that careful model validation should not use the default value of the prediction horizon:
For careful model validation, a one-step-ahead prediction (K = 1) is usually not a good test for validating the model sys over the time span of measured data. Even the trivial one step-ahead predictor, y(hat)(t)=y(t−1), can give good predictions. So a poor model may look fine for one-step-ahead prediction of data that has a small sample time. Prediction with K = Inf, which is the same as performing simulation with sim command, can lead to diverging outputs because low-frequency disturbances in the data are emphasized, especially for models with integration. Use a K value between 1 and Inf to capture the mid-frequency behavior of the measured data.
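The point about the trivial predictor can be checked numerically. A minimal sketch with a synthetic, slowly varying signal (an assumption for illustration, not the rainfall-flow data from the question): the naive predictor yhat(t) = y(t-K) looks very good for K = 1 and degrades as K grows, which is why a one-step-ahead fit alone says little about a model.

```matlab
t = (0:999)';
y = sin(2*pi*t/200);               % synthetic signal that changes slowly per sample

K = [1 5 15];                      % prediction horizons to compare
rmse = zeros(size(K));
for i = 1:numel(K)
    k = K(i);
    yhat = y(1:end-k);             % naive K-step predictor: yhat(t) = y(t-k)
    err = y(1+k:end) - yhat;       % error against the value actually observed
    rmse(i) = sqrt(mean(err.^2));
end

for i = 1:numel(K)
    fprintf('K = %2d: RMSE of naive predictor = %.4f\n', K(i), rmse(i));
end
```

Reporting the error separately for each horizon, as done here for the naive predictor, is one way to compare two candidate models at 1 to 15 days ahead.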

Time delay / lag estimation non periodic signals (and periodic signals)

I'm working on aligning different measurements from sensors. Some of these are periodic; for those I just used the maximum of the cross-correlation and it worked fine. Now I have a couple of non-periodic signals, similar to ramp/sigmoid/step/hill functions, that I want to align, but for these the cross-correlation fails miserably (it always gives me the maximum at lag 0).
What is the approach for these kinds of signals?
Ideally the approach would work for both kinds of signal, without prior knowledge of which one I'm encountering.
Here is an example (with noise)
One possible approach is to take your aperiodic signal and coerce it into a periodic signal.
One way to do this is to first normalize your signal, and then append an inverted version of your signal (1 - normalizedSignal) to your signal. This makes it a periodic signal which then should be able to be fed into cross-correlation analysis relatively easily.
Here is an example I whipped up using an inverted sigmoid shifted in time.
function aperiodicxcorr()
% Time step at which to sample the sigmoid
dt = 0.1;
t = -10:dt:5;
% Artificial lags to apply to the second and third signals
actualLag2 = 3;
actualLag3 = 5;
% Now create signals that are negative sigmoids with delays
S1 = -sigmoid(t);
S2 = -sigmoid(t + actualLag2);
S3 = -sigmoid(t + actualLag3);
% Normalize each signal
S1 = normalize(S1);
S2 = normalize(S2);
S3 = normalize(S3);
% Concatenate the inverted signal with signal to make it periodic
S1 = cat(2, 1-S1, S1);
S2 = cat(2, 1-S2, S2);
S3 = cat(2, 1-S3, S3);
% Retrieve lag (in samples)
[corr2, lag2] = computeLag(S1, S2);
[corr3, lag3] = computeLag(S1, S3);
% Convert lags to time by multiplying by time step
lag2 = lag2 * dt;
lag3 = lag3 * dt;
fprintf('Lag of S2: %0.2f (r = %0.2f)\n', lag2, corr2);
fprintf('Lag of S3: %0.2f (r = %0.2f)\n', lag3, corr3);
end
function [corr, lag] = computeLag(A, B)
[corr, lags] = xcorr(A, B, 'coeff');
[corr, ind] = max(corr);
lag = lags(ind);
end
function data = normalize(data)
data = data - min(data(:));
data = data ./ max(data(:));
end
function S = sigmoid(t)
S = 1 ./ (1 + exp(-t));
end
The modification to the signal that I discussed looks like this for the above code.
And the output of the fprintf statements at the bottom is:
Lag of S2: 3.00 (r = 1.00)
Lag of S3: 5.00 (r = 1.00)
And these match up with the specified lags.
The drawback of this is that it won't work for signals which are already periodic. That being said, periodicity is relatively easy to check (particularly for a normalized signal) by comparing the first and last values of your signal and ensuring that they are within a specified tolerance of one another.
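The end-point comparison mentioned above might look like the following. This is a rough heuristic sketch: the helper name looksPeriodic and the 5% tolerance are illustrative choices, and matching end points are only a necessary sign of a whole number of periods, not a proof of periodicity.

```matlab
% True when the first and last samples differ by at most tol times the signal range.
looksPeriodic = @(x, tol) abs(x(1) - x(end)) <= tol * (max(x) - min(x));

t = linspace(0, 4*pi, 200);
sineWave = sin(t);                        % two full periods: end points match
sigmoidStep = 1 ./ (1 + exp(-(t - 6)));   % step-like signal: end points differ

isSinePeriodic = looksPeriodic(sineWave, 0.05);     % true
isStepPeriodic = looksPeriodic(sigmoidStep, 0.05);  % false
```

A signal flagged as non-periodic by this check could then be sent through the mirror-and-append step before cross-correlation, while the others go to xcorr directly.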

MuPad in Matlab

I have a simple question that I want to solve using MuPad in Matlab. I spent about an hour working it out with pen and paper, but I am curious whether it can be solved with MuPad.
I have n numbers, clustered in two groups (p and q), each with a mean (Mp and Mq). I have a measure called SSE (sum of squared errors) that is the sum of the squared distances between each number in a group and its mean (sum (x[i]-Mp)^2 + sum (x[j]-Mq)^2, where i runs over the first group and j over the second). My question is about the value of this measure if I exchange two records, moving each from its original group to the neighboring group ( q <= xq,xp => p ). Please note that the means of the groups also change after the exchange. The final formula (based on pen and paper) is as follows:
d = xq - xp
deltaSSE = SSE1 - SSE2 = d*(d*(np + nq)/(np*nq) - 2*(Mq - Mp))
where np and nq are the numbers of records in the groups, xq and xp are the two records considered for the exchange, and Mq and Mp are the corresponding means (before the exchange).
The most important problem I have with MuPad concerns the number of records in the groups (it is always below 10).
Thank you for your help.
Example of the formula above: you have two groups, "1 2 3" and "4 5 6". The SSE of this clustering is 1^2+0^2+1^2 + 1^2+0^2+1^2 = 4. Now I want to know the SSE after exchanging 3 and 6, without the complete calculation. Based on the formula above, d=6-3=3, np=nq=3, Mp=(1+2+3)/3=2 and Mq=(4+5+6)/3=5, so deltaSSE = 3(3(3+3)/(3*3)-2(5-2))=-12, i.e. the new SSE is 4+12=16. My question is how to represent clusters of numbers in MuPad without knowing exactly how many there are. The simple form, where the number of elements in each group is known, can be solved easily in MuPad.
Maybe all you need to represent a cluster of numbers is the count, mean and variance.
Mp = SUM(x{i},i=1..np)/np
Sp = (SUM(x{i}^2,i=1..np)-np*Mp^2)/(np-1)
With your example:
np = 3
nq = 3
Mp1 = (1.0+2.0+3.0)/3 = 2.0
Mq1 = (4.0+5.0+6.0)/3 = 5.0
Sp1 = ((1^2+2^2+3^2)-3*2^2)/(3-1) = 1.0
Sq1 = ((4^2+5^2+6^2)-3*5^2)/(3-1) = 1.0
SSE1 = (np-1)*Sp1 + (nq-1)*Sq1 = 4.0
Now, to exchange xp=3.0 and xq=6.0, you have the new quantities
d = xq - xp = 3.0
Mp2 = Mp1+d/np = 3.0
Sp2 = Sp1 + d*(2*(xp-Mp1)/(np-1)+d/np) = 7.0
Mq2 = Mq1-d/nq = 4.0
Sq2 = Sq1 + d*(2*(Mq1-xq)/(nq-1)+d/nq) = 1.0
SSE2 = (np-1)*Sp2 + (nq-1)*Sq2 = 16.0
Or, with a little algebra:
SSE2 - SSE1 = 2*d*(Mq1-Mp1)-d^2/np-d^2/nq = 12.0
So to do all this, you don't need to keep track of all the numbers x{i} and x{j}, just their mean Mp & Mq and variance Sp & Sq.
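As a numerical check of the update formula, here is a small sketch in plain Matlab (not MuPad) that recomputes the example both ways: once from the full lists and once from the counts and means only. The variable names are illustrative.

```matlab
p = [1 2 3];                       % group p before the exchange
q = [4 5 6];                       % group q before the exchange
sse = @(x) sum((x - mean(x)).^2);  % sum of squared distances to the group mean

SSE1 = sse(p) + sse(q);            % 4 for this example

xp = 3; xq = 6;                    % records to exchange
d = xq - xp;
np = numel(p); nq = numel(q);

% Incremental update: only the counts and the means before the exchange are needed.
delta = 2*d*(mean(q) - mean(p)) - d^2/np - d^2/nq;

% Brute-force recomputation after the exchange, for comparison.
p2 = [1 2 xq];
q2 = [4 5 xp];
SSE2 = sse(p2) + sse(q2);

fprintf('SSE1 = %g, delta = %g, SSE2 = %g\n', SSE1, delta, SSE2);
```

Both routes give SSE2 = SSE1 + delta = 16 for the groups "1 2 3" and "4 5 6", matching the hand calculation.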