I have a series of numbers. I calculated the autoregression (AR) coefficients for them using the Yule-Walker method. But now how do I extend the series?
My whole working is as follows:
a) the series I use:
143.85 141.95 141.45 142.30 140.60 140.00 138.40 137.10 138.90 139.85 138.75 139.85 141.30 139.45 140.15 140.80 142.50 143.00 142.35 143.00 142.55 140.50 141.25 140.55 141.45 142.05
b) this data is loaded into the variable data using:
data = load('c:\\input.txt', '-ascii');
c) the calculation of the coefficients:
ar_coeffs = aryule(data,9);
this gives:
ar_coeffs =
1.0000 -0.9687 -0.0033 -0.0103 0.0137 -0.0129 0.0086 0.0029 -0.0149 0.0310
d) Now using this, how do I calculate the next number in the series?
[any other method of doing this (except using aryule()) is also fine... this is what I did, if you have a better idea, please let me know!]
For a real valued sequence x of length N, and a positive order p:
coeff = aryule(x, p)
returns the AR coefficients of order p of the data x (note that coeff(1) is the leading coefficient, normalized to 1). In other words, it models each value as a linear combination of the past p values. So to predict the next value, we use the last p values (note the sign flip on the coefficients):
x(N+1) = -sum_[k=1:p] ( coeff(k+1)*x(N+1-k) )
or in actual MATLAB code:
p = 9;
data = [...]; % the seq you gave
coeffs = aryule(data, p);
% one-step-ahead prediction: the last p values, most recent first,
% weighted by the negated AR coefficients
nextValue = -coeffs(2:end) * data(end:-1:end-p+1)';
EDIT: If you have access to the System Identification Toolbox, then you can use any of a number of functions to estimate AR/ARMAX models (ar/arx/armax), or even find the order of the AR model using selstruc:
m = ar(data, p, 'yw');      % 'yw' for Yule-Walker method
pred = predict(m, data, 1); % one-step-ahead prediction
coeffs = m.a;               % the estimated AR polynomial coefficients
nextValue = pred(end);
subplot(121), plot(data)
subplot(122), plot( cell2mat(pred) )   % in some versions predict returns a cell array
Your data has a non-zero mean. Doesn't the Yule-Walker model assume the data is the output of a linear filter excited by a zero-mean white noise process?
If you remove the mean, this example using ARYULE and LPC might be what you're looking for. The procedure boils down to:
a = lpc(data,9); % uses Yule-Walker modeling
pred = filter(-a(2:end),1,data);
disp(pred(end)); % the predicted value at time N+1
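Putting the pieces together, here is a minimal sketch with the mean removed first and added back to the prediction (my own assembly of the steps above, not code from the original answer):
mu = mean(data);                  % remove the non-zero mean
a = lpc(data - mu, 9);            % Yule-Walker modeling on the zero-mean series
pred = filter(-a(2:end), 1, data - mu);
disp(pred(end) + mu);             % predicted value at time N+1, mean restored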
I'm trying to verify if my implementation of Logistic Regression in Matlab is good. I'm doing so by comparing the results I get via my implementation with the results given by the built-in function mnrfit.
The dataset D,Y that I have is such that each row of D is an observation in R^2 and the labels in Y are either 0 or 1. Thus, D is a matrix of size (n,2) and Y is a vector of size (n,1).
Here's how I do my implementation:
I first normalize my data and augment it to include the offset term:
d = 2;            % dimension of data
n = size(D,1);    % number of observations
M = mean(D);
centered = D - repmat(M,n,1);
devs = sqrt(sum(centered.^2));   % column norms (std up to a factor sqrt(n-1))
normalized = centered./repmat(devs,n,1);
X = [normalized, ones(n,1)];
I will be doing my calculations on X.
Second, I define the gradient and Hessian of the log-likelihood of Y|X:
function grad = gradient(w)
% gradient of the log-likelihood at w
grad = zeros(1,d+1);
for i=1:n
grad = grad + (Y(i)-sigma(w'*X(i,:)'))*X(i,:);
end
end
function hess = hessian(w)
% Hessian of the log-likelihood at w
hess = zeros(d+1,d+1);
for i=1:n
hess = hess - sigma(w'*X(i,:)')*sigma(-w'*X(i,:)')*X(i,:)'*X(i,:);
end
end
with sigma being a Matlab function encoding the sigmoid function z-->1/(1+exp(-z)).
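For concreteness, sigma could be defined as an anonymous function; this definition is my assumption, it is not shown in the question:
sigma = @(z) 1./(1 + exp(-z));   % hypothetical definition of the sigmoid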
Third, I run Newton's method to find a root of the gradient of the log-likelihood. I implemented it myself; it behaves as expected, with the norm of the difference between iterates going to 0. I wrote it based on this script.
I verified that the gradient at the wOPT returned by my Newton implementation is (numerically) zero:
gradient(wOPT)
ans =
1.0e-15 *
0.0139 -0.0021 0.2290
and that the Hessian has strictly negative eigenvalues:
eig(hessian(wOPT))
ans =
-7.5459
-0.0027
-0.0194
Here's the wOPT I get with my implementation:
wOPT =
-110.8873
28.9114
1.3706
the offset being the last element. In order to plot the decision line, I should convert the slope wOPT(1:2) back to the original coordinates using M and devs. So I set:
my_offset = wOPT(end);
my_slope = wOPT(1:d)'.*devs + M ;
and I get:
my_slope =
1.0e+03 *
-7.2109 0.8166
my_offset =
1.3706
Now, when I run B=mnrfit(D,Y+1), I get
B =
-1.3496
1.7052
-1.0238
The offset is stored in B(1).
I get very different values. I would like to know what I am doing wrong. I have some doubt about the normalization and 'un-normalization' process. But I'm not sure, maybe I'm doing something else wrong.
Additional Info
When I type:
B=mnrfit(normalized,Y+1)
I get
-1.3706
110.8873
-28.9114
which is a rearranged, sign-flipped version of my wOPT: it contains exactly the same elements, negated, with the offset moved to the first position.
It seems likely that my scaling back of the learnt parameters is wrong; otherwise, it would have given the same result as B=mnrfit(D,Y+1).
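For reference, here is the algebra I would expect for undoing the scaling (a sketch only; it does not explain the sign flip and reordering that mnrfit introduces): since normalized = (D - M)./devs row-wise, a linear score w'*x_norm + b on normalized data equals (w./devs')'*x + (b - sum(w.*M'./devs')) on the raw data, so the slope should be divided by devs, not multiplied:
% hypothetical un-scaling, using the variables defined above
raw_slope  = wOPT(1:d)' ./ devs;                        % divide, don't multiply
raw_offset = wOPT(end) - sum(wOPT(1:d)' .* M ./ devs);  % offset absorbs a term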
I am fitting an exponential decay function with lsqcurvefit in MATLAB. To do this I first normalize my data, because the variables differ by several orders of magnitude. However, I'm not sure how to denormalize my fitted parameters.
My fitting model is s = O + A * exp(-t/T), where t and s are known, t is on the order of 10^-3 and s on the order of 10^5. So I subtract their means and divide them by their standard deviations. My goal is to find the best A, O and T such that, at the given times t, the model comes closest to s. However, I don't know how to denormalize the resulting A, O and T.
Might somebody know how to do this? I only found this question on SO about normalisation, but does not really address the same problem.
When you normalize, you must record the means and standard deviations of each of your features. Then you can easily use those values to denormalize.
e.g.
A = [1 4 7 2 9]';
B = [100 475 989 177 399]';
So you could just normalize right away:
An = (A - mean(A)) / std(A)
but then you can't get back to the original A. So first save the means and stds.
Am = mean(A); Bm = mean(B);
As = std(A); Bs = std(B);
An = (A - Am)/As;
Bn = (B - Bm)/Bs;
now do whatever processing you want and then to denormalize:
Ad = An*As + Am;
Bd = Bn*Bs + Bm;
I'm sure you can see that that's going to be an issue if you have a lot of features (i.e. you have to type code out for each feature, what a mission!) so lets assume your data is arranged as a matrix, data, where each sample is a row and each column is a feature. Now you can do it like this:
data = [A, B]
means = mean(data);
stds = std(data);
datanorm = bsxfun(@rdivide, bsxfun(@minus, data, means), stds);
% Do processing on datanorm
datadenorm = bsxfun(@plus, bsxfun(@times, datanorm, stds), means);
EDIT:
After you have fit your model parameters (A, O and T) using normalized t and s, your model will expect normalized inputs and produce normalized outputs. So to use it, you should first normalize t and then denormalize the resulting s.
That is, to find a new s you run the model on a normalized new t: s(tn), where tn = (t - tm)/ts, with tm the mean of your training (or fitting) t set and ts its std. Then, to get the correct magnitude, you must denormalize only the output, so the full solution is
s(tn)*ss + sm
with sm and ss the mean and std of the training s.
So once again, all you need to do is save the mean and std you used to normalize.
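Putting it together, a minimal sketch with made-up data (assumes the Optimization Toolbox's lsqcurvefit; the model handle, starting point and data here are illustrative only):
t = (0:9)' * 1e-3;                        % made-up times, order 1e-3
s = 1e5 * (2 + 3*exp(-t/2e-3));           % made-up decay data, order 1e5
tm = mean(t); ts = std(t);                % save the normalization constants
sm = mean(s); ss = std(s);
tn = (t - tm)/ts;  sn = (s - sm)/ss;      % normalized inputs and outputs
model = @(P,x) P(1) + P(2)*exp(-x/P(3));  % P = [O A T], in normalized units
P = lsqcurvefit(model, [0 1 1], tn, sn);
% to predict at a new time: normalize the input, denormalize the output
tq = 5e-3;
sq = model(P, (tq - tm)/ts)*ss + sm;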
I'm completely lost at this using MATLAB functions, so here is the case:
Let's assume I have SUM = 0, and a constant probability P that the user gives me. I have to compare this constant P with M other random probabilities (the user also gives M): if P is larger I add 1 to SUM, if P is smaller I add -1 to SUM. At the end I want to print the graph of the process on the screen.
I managed till now to make only one stage with this code:
function [result] = ex1(p)
if (rand>=p) result=1;
else result=-1;
end
(it's like M = 1)
How do you suggest I modify this code in order to make it work the way I described (including getting a graph)?
Or maybe I'm getting the logic wrong? The question says I get 1 with probability P and -1 with probability (1-P), and SUM works the same way.
Many thanks
I'm not sure how you achieve your input, but this should get you on the way:
p = 0.5;                 % constant probability
m = 10;
randoms = rand(m,1);     % random probabilities
results = ones(m,1);     % start with +1 everywhere
idx = find(randoms < p); % positions where the random value is below p...
results(idx) = -1;       % ...get -1 instead
plot(cumsum(results))    % running value of SUM
For m = 1000, the cumulative sum traces out a random-walk-like path (plot not reproduced here).
You can do it like this:
p = 0.25; % example data
M = 20; % example data
random = rand(M,1); % generate values
y = cumsum(2*(random>=p)-1); % compute cumulative sum of +1/-1
plot(y) % do the plot
The important function here is cumsum, which does the cumulative sum on the sequence of +1/-1 values generated by 2*(random>=p)-1.
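Note the comparison direction: random>=p yields +1 when the draw is at least p. If, as the end of the question suggests, you want +1 with probability P instead, just flip the inequality (my reading of the ambiguity, not part of the original answer):
y = cumsum(2*(rand(M,1) < p) - 1);   % +1 with probability p, -1 otherwise
plot(y)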
An example graph with p = 0.5 and M = 2000 is not reproduced here.
The problem says:
Three tensile tests were carried out on an aluminum bar. In each test the strain was measured at the same values of stress. The results were:
Stress (MPa):    34.5   69     103.5   138
Strain test 1:   0.46   0.95   1.48    1.93
Strain test 2:   0.34   1.02   1.51    2.09
Strain test 3:   0.73   1.10   1.62    2.12
where the units of strain are mm/m. Use linear regression to estimate the modulus of elasticity of the bar (modulus of elasticity = stress/strain).
I used this program for this problem:
function coeff = polynFit(xData,yData,m)
% Returns the coefficients of the polynomial
% a(1)*x^(m-1) + a(2)*x^(m-2) + ... + a(m)
% that fits the data points in the least squares sense.
% USAGE: coeff = polynFit(xData,yData,m)
% xData = x-coordinates of data points.
% yData = y-coordinates of data points.
A = zeros(m); b = zeros(m,1); s = zeros(2*m-1,1);
for i = 1:length(xData)
temp = yData(i);
for j = 1:m
b(j) = b(j) + temp;
temp = temp*xData(i);
end
temp = 1;
for j = 1:2*m-1
s(j) = s(j) + temp;
temp = temp*xData(i);
end
end
for i = 1:m
for j = 1:m
A(i,j) = s(i+j-1);
end
end
% Rearrange coefficients so that coefficient
% of x^(m-1) is first
coeff = flipdim(gaussPiv(A,b),1); % gaussPiv: Gaussian-elimination solver (defined elsewhere)
The problem is solved without a program as follows (the worked solution is not reproduced here).
MY ATTEMPT
T=[34.5,69,103.5,138];
D1=[.46,.95,1.48,1.93];
D2=[.34,1.02,1.51,2.09];
D3=[.73,1.1,1.62,2.12];
Mod1=T./D1;
Mod2=T./D2;
Mod3=T./D3;
xData=T;
yData1=Mod1;
yData2=Mod2;
yData3=Mod3;
coeff1 = polynFit(xData,yData1,2);
coeff2 = polynFit(xData,yData2,2);
coeff3 = polynFit(xData,yData3,2);
x1=(0:.5:190);
y1=coeff1(2)+coeff1(1)*x1;
subplot(1,3,1);
plot(x1,y1,xData,yData1,'o');
y2=coeff2(2)+coeff2(1)*x1;
subplot(1,3,2);
plot(x1,y2,xData,yData2,'o');
y3=coeff3(2)+coeff3(1)*x1;
subplot(1,3,3);
plot(x1,y3,xData,yData3,'o');
What do I have to do to get this result?
As general advice:
- avoid for-loops wherever possible;
- avoid using i and j as variable names, as they are MATLAB's built-in names for the imaginary unit (I really hope that disappears in a future release...).
Because MATLAB is an interpreted language, for-loops can be very slow compared to their compiled alternatives. MATLAB is named MATrix LABoratory, meaning it is highly optimized for matrix/array operations. Usually, when there is an operation that seemingly cannot be done without a loop, MATLAB has a built-in function for it that runs far faster than a for-loop in MATLAB ever will. For example: computing the mean of the elements in an array: mean(x); the sum of all elements in an array: sum(x); the standard deviation of elements in an array: std(x); etc. MATLAB's power comes from these built-in functions.
So, to your problem: it is a linear regression problem, and the easiest way to solve it in MATLAB is this:
%# your data
stress = [ %# in Pa
34.5 69 103.5 138] * 1e6;
strain = [ %# in m/m
0.46 0.95 1.48 1.93
0.34 1.02 1.51 2.09
0.73 1.10 1.62 2.12]' * 1e-3;
%# make linear array for the data
yy = strain(:);
xx = repmat(stress(:), size(strain,2),1);
%# re-formulate the problem into linear system Ax = b
A = [xx ones(size(xx))];
b = yy;
%# solve the linear system
x = A\b;
%# the modulus of elasticity is the reciprocal of the fitted slope
%# (NOTE: the y-offset is relatively small and can be ignored)
E = 1/x(1)
What you did in the function polynFit is done by A\b, but the \-operator does it far faster, more robustly and more flexibly than what you tried to do yourself. I'm not saying you shouldn't try to make these things yourself (please keep on doing that, you learn a lot from it!), I'm saying that for the "real" results, always use the \-operator (and check your own results against it as well).
The backslash operator (type help \ on the command prompt) is extremely useful in many situations, and I advise you learn it and learn it well.
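As a quick sanity check, the built-in polyfit solves the same first-degree least-squares problem, so (using the xx and yy arrays built above) it should agree with the \-based solution; pp and E_check are my own variable names:
pp = polyfit(xx, yy, 1);   % pp = [slope, intercept]
E_check = 1/pp(1)          % should match E from above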
I leave you with this: here's how I would write your polynFit function:
function coeff = polynFit(X,Y,m)
if numel(X) ~= numel(Y)
error('polynFit:size_mismatch',...
'number of elements in matrices X and Y must be equal.');
end
%# bad condition number, rank errors, etc. taken care of by \
coeff = bsxfun(@power, X(:), m:-1:0) \ Y(:);
end
I leave it up to you to figure out how this works.
Is there a simple way of calculating the p-value of a t-test in MATLAB?
I found something like it; however, I think it does not return correct values:
Pval=2*(1-tcdf(abs(t),n-2))
I want to calculate the p-value for the test that the slope of the regression is equal to 0. Therefore I calculate the standard error
$SE = \sqrt{\frac{\sum_{s=i-w}^{i+w}(y_{s}-\widehat{y}_{s})^2}{(w-2)\sum_{s=i-w}^{i+w}(x_{s}-\bar{x})^2}}$
where $y_s$ is the value of analyzed parameter in time period $s$,
$\widehat{y}_s$ is the estimated value of the analyzed parameter in time period $s$,
$x_s$ is the time point of the observed value of the analysed parameter,
$\bar{x}$ is the mean of time points from analysed period and then
$t_{score} = (a - a_{0})/SE$, where $a_{0} = 0$.
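A direct MATLAB transcription of the above might look like this (my sketch; it assumes column vectors x, y and fitted values yhat restricted to the window, the fitted slope a, and w as in the formula):
SE = sqrt( sum((y - yhat).^2) / ((w-2) * sum((x - mean(x)).^2)) );
t = a / SE;                                % a0 = 0
Pval = 2*(1 - tcdf(abs(t), numel(y)-2));   % two-sided p-value, df = n-2 as above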
I checked the p-values from the ttest function against the ones calculated with this formula:
% Let n be your sample size
% Let v be the number of estimated coefficients
% (so n-v is the degrees of freedom). Then:
pvalues = 2*(1-tcdf(abs(t),n-v))
and they are the same!
Example with Matlab demo dataset:
load accidents
x = hwydata(:,2:3);
y = hwydata(:,4);
stats = regstats(y,x,eye(size(x,2)));
fprintf('T stat using built-in function: \t %.4f\n', stats.tstat.t);
fprintf('P value using built-in function: \t %.4f\n', stats.tstat.pval);
fprintf('\n\n');
n = size(x,1);   % sample size
v = size(x,2);   % number of estimated coefficients
b = x\y;         % least-squares coefficients (no intercept term)
se = diag(sqrt(sumsqr(y-x*b)/(n-v)*inv(x'*x)));   % standard errors of b
t = b./se;
p = 2*(1-tcdf(abs(t),n-v));
fprintf('T stat using own calculation: \t\t %.4f\n', t);
fprintf('P value using own calculation: \t\t %.4f\n', p);