I have a stock index for which I need to compute the VaR using a Monte Carlo (MC) simulation with geometric Brownian motion (GBM) as the stochastic process. This is my first try, disregarding the GBM, just to get familiar with the program and syntax:
x = logreturn;                     % file with 500 log returns
mu = mean(x);
sigma = std(x);
r = normrnd(mu, sigma, 2000, 1);   % random normally distributed numbers
VaR = quantile(r, 0.05);           % 95 percent VaR
VaR = -0.045
It's very basic and calculates the VaR only for one day, but I need to calculate the VaR for the first 250 days with a 250-day rolling window for mu and sigma.
Based on Ahmed's comment I tried to implement arrayfun and cellfun for the full function:
x = logreturn;
mu = movmean(x,250);
sigma = movstd(x,250);
mydata = normrnd(0,1,1,20000);
muforsample = arrayfun(@(v) mu*250, mu, 'un', false);
sigmaforsample = arrayfun(@(v) sigma*sqrt(250), sigma, 'un', false);
k = arrayfun(@(v) muforsample-(sigmaforsample.^2)/2, muforsample, sigmaforsample, 'un', false); %this line is faulty and gives an error message (too many input arguments)
t = 1/504;
sqrtt = sqrt(t);
gbm = k*t + sigmaforsample*sqrtt; %didn't try to fix this since I don't have a k
VaR = quantile(gbm, 0.05) %unchanged, needs cellfun too, right?
For k I basically need a value based on the corresponding muforsample and sigmaforsample so that I can calculate the GBM.
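From the documentation, arrayfun accepts several input arrays as long as the anonymous function takes one argument per array, so presumably the k line should look something like this (an untested sketch that works directly on mu and sigma instead of the cell arrays):
k = arrayfun(@(m,s) m*250 - (s*sqrt(250)).^2/2, mu, sigma); % one element of mu and sigma per call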
Using a for loop is definitely easier than cellfun and arrayfun. Here's my solution:
% file with log returns until t(n-1)
x = logreturns;
t = 1/250;
% mu and sigma based on 250 returns, moving window
mu = movmean(x,250);
sigma = movstd(x,250);
% k for every cell
k = zeros(size(mu));
for i = 1:length(mu)
    k(i) = mu(i)*250 - (((sigma(i)*sqrt(250)).^2)/2);
end
% 0.99 or 0.95 VaR, quantiles based on 500,000 values
VaR = zeros(size(mu));
for j = 1:length(mu)
    VaR(j) = quantile(normrnd(0,1,1,500000) * sqrt(t) * ...
        (sigma(j) * sqrt(250)) + k(j) * t, 0.01);
end
This is the formula, by the way: each simulated return over one time step is r = k*t + (sigma(j)*sqrt(250))*sqrt(t)*Z, with k = mu(j)*250 - (sigma(j)*sqrt(250))^2/2 and Z ~ N(0,1); the VaR is then the 0.01 (or 0.05) quantile of the simulated returns.
I have 2 vectors A and B, each of length 10,000. For each ind=1:10000, I want to compute the Pearson correlation of A(1:ind) and B(1:ind). When I do this in a for loop, it takes too much time. parfor does not work with more than 2 workers on my machine. Is there a way to do this operation fast and save the results in a vector C (of length 10,000, where the first element is NaN)? I found the question Fast rolling correlation in Matlab, but this is a little different from what I need.
You can use this method to compute the cumulative correlation coefficient:
function result = cumcor(x,y)
    n = reshape(1:numel(x), size(x));   % sample counts 1,2,...,N
    sumx  = cumsum(x);
    sumy  = cumsum(y);
    sumx2 = cumsum(x.^2);
    sumy2 = cumsum(y.^2);
    sumxy = cumsum(x.*y);
    % Pearson correlation of x(1:n) and y(1:n), computed for every n at once
    result = (n.*sumxy - sumx.*sumy) ./ sqrt((sumx.^2 - n.*sumx2).*(sumy.^2 - n.*sumy2));
end
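Called on the two vectors from the question, this gives the whole cumulative correlation in one shot; as expected, the first element is 0/0 = NaN:
C = cumcor(A, B);   % C(ind) equals corr(A(1:ind), B(1:ind)); C(1) is NaN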
Solution
I suggest the following approach:
1. The Pearson correlation can be calculated using the following formula: rho(X,Y) = (E[XY] - E[X]*E[Y]) / (std(X)*std(Y)), where std(X) = sqrt(E[X^2] - E[X]^2).
2. Calculating the cumulative mean of each of the random variables above (X, Y, XY, X^2, Y^2) efficiently is relatively easy.
3. Given the cumulative means calculated in 2, we can calculate the cumulative std of X and Y.
4. Given the cumulative std of X and Y and the cumulative means above, we can calculate the cumulative Pearson coefficient.
Code
%defines inputs
N = 10000;
X = rand(N,1);
Y = rand(N,1);
%calculates cumulative mean for X, Y, X^2, Y^2, XY
EX = accumMean(X);
EY = accumMean(Y);
EX2 = accumMean(X.^2);
EY2 = accumMean(Y.^2);
EXY = accumMean(X.*Y);
%calculates cumulative Pearson correlation
accumPearson = zeros(N,1);
for ii=2:N
    stdX = (EX2(ii)-EX(ii)^2).^0.5;
    stdY = (EY2(ii)-EY(ii)^2).^0.5;
    accumPearson(ii) = (EXY(ii)-EX(ii)*EY(ii))/(stdX*stdY);
end
%accumulative mean function, to be defined in an additional m file.
function [ accumMean ] = accumMean( vec )
    accumMean = zeros(size(vec));
    accumMean(1) = vec(1);
    for ii=2:length(vec)
        accumMean(ii) = (accumMean(ii-1)*(ii-1) + vec(ii))/ii;
    end
end
Runtime
for N=10000:
Elapsed time is 0.002096 seconds.
for N=1000000:
Elapsed time is 0.240669 seconds.
Correctness
Testing the correctness of the code above can be done by calculating the cumulative Pearson coefficient with the corr function and comparing it to the result given by the code above:
%ground truth for correctness comparison
gt = zeros(N,1);
for z=1:N
    gt(z) = corr(X(1:z),Y(1:z));
end
Unfortunately, I don't have the Statistics and Machine Learning Toolbox, so I can't make this check.
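If corr is unavailable, corrcoef (which is part of base Matlab) could presumably serve as a substitute ground truth; an untested sketch:
%ground truth via corrcoef instead of corr
gt = zeros(N,1);
for z=2:N
    c = corrcoef(X(1:z), Y(1:z));   % 2x2 correlation matrix
    gt(z) = c(1,2);
end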
I do think that it is a good start though, and you can continue from here :)
I'm trying to verify if my implementation of Logistic Regression in Matlab is good. I'm doing so by comparing the results I get via my implementation with the results given by the built-in function mnrfit.
The dataset D,Y that I have is such that each row of D is an observation in R^2 and the labels in Y are either 0 or 1. Thus, D is a matrix of size (n,2) and Y is a vector of size (n,1).
Here's how I do my implementation:
I first normalize my data and augment it to include the offset:
d = 2; %dimension of data
M = mean(D) ;
centered = D-repmat(M,n,1) ;
devs = sqrt(sum(centered.^2)) ;
normalized = centered./repmat(devs,n,1) ;
X = [normalized,ones(n,1)];
I will be doing my calculations on X.
Second, I define the gradient and Hessian of the log-likelihood of Y|X:
function grad = gradient(w)
    grad = zeros(1,d+1);
    for i=1:n
        grad = grad + (Y(i)-sigma(w'*X(i,:)'))*X(i,:);
    end
end

function hess = hessian(w)
    hess = zeros(d+1,d+1);
    for i=1:n
        hess = hess - sigma(w'*X(i,:)')*sigma(-w'*X(i,:)')*X(i,:)'*X(i,:);
    end
end
with sigma being a Matlab function encoding the sigmoid function z -> 1/(1+exp(-z)).
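(The sigma helper itself is not shown above; presumably it is something like this elementwise implementation:)
function s = sigma(z)
    % elementwise logistic sigmoid 1/(1+exp(-z)) (assumed implementation)
    s = 1 ./ (1 + exp(-z));
end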
Third, I run the Newton algorithm on gradient to find the roots of the gradient of the likelihood. I implemented it myself; it behaves as expected, as the norm of the difference between successive iterates goes to 0. I wrote it based on this script.
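(My Newton code is not reproduced here; the update it performs is essentially the following sketch, where the starting point, tolerance and iteration cap are arbitrary choices:)
w = zeros(d+1,1);                          % starting point (arbitrary)
for it = 1:100
    w_new = w - hessian(w) \ gradient(w)'; % Newton step on the gradient
    if norm(w_new - w) < 1e-10
        w = w_new;
        break;
    end
    w = w_new;
end
wOPT = w;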
I verified that the gradient at the wOPT returned by my Newton implementation is numerically zero:
gradient(wOPT)
ans =
1.0e-15 *
0.0139 -0.0021 0.2290
and that the Hessian has strictly negative eigenvalues:
eig(hessian(wOPT))
ans =
-7.5459
-0.0027
-0.0194
Here's the wOPT I get with my implementation:
wOPT =
-110.8873
28.9114
1.3706
the offset being the last element. In order to plot the decision line, I should convert the slope wOPT(1:2) back using M and devs. So I set:
my_offset = wOPT(end);
my_slope = wOPT(1:d)'.*devs + M ;
and I get:
my_slope =
1.0e+03 *
-7.2109 0.8166
my_offset =
1.3706
Now, when I run B=mnrfit(D,Y+1), I get
B =
-1.3496
1.7052
-1.0238
The offset is stored in B(1).
I get very different values, and I would like to know what I am doing wrong. I have some doubts about the normalization and 'un-normalization' process, but I'm not sure; maybe I'm doing something else wrong.
Additional Info
When I type:
B=mnrfit(normalized,Y+1)
I get
-1.3706
110.8873
-28.9114
which is a rearranged, sign-flipped version of my wOPT: it contains exactly the same elements.
It seems likely that my scaling back of the learnt parameters is wrong. Otherwise, it would have given the same result as B=mnrfit(D,Y+1).
I'm writing a program in Matlab to observe how a function evolves in time. I'd like to set up a matrix that fills its first row with the initial function and progressively fills the following rows based on a time derivative (which is itself dependent on the spatial derivative). The function is arbitrary; the program just needs to 'evolve' it. This is what I have so far:
xleft = -10;
xright = 10;
xsampling = 1000;
tmax = 1000;
tsampling = 1000;
dt = tmax/tsampling;
x = linspace(xleft,xright,xsampling);
t = linspace(0,tmax,tsampling);
funset = [exp(-(x.^2)/100);cos(x)]; %Test functions.
funsetvel = zeros(size(funset)); %The functions velocities.
spacetimevalue1 = zeros(length(x), length(t));
spacetimevalue2 = zeros(length(x), length(t));
% Loop that fills the first function's spacetime matrix.
for j=1:length(t)
funsetvel(1,j) = diff(funset(1,:),x,2);
spacetimevalue1(:,j) = funsetvel(1,j)*dt + funset(1,j);
end
This produces the error "Difference order N must be a positive integer scalar." I'm unsure what this means; I'm fairly new to Matlab. I will exchange the Euler method for another algorithm once I can actually get output that behaves as expected. Aside from the error associated with taking the spatial derivative, do you have any suggestions on how to evaluate this sort of process? Thank you.
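For what it's worth, my current guess is that the spatial derivative should come from something like gradient with the grid spacing, so the time stepping would look roughly like this (an untested sketch; the df/dt = df/dx update is just a placeholder for the actual equation):
dx = x(2) - x(1);                 % uniform grid spacing
f = funset(1,:);                  % initial profile of the first test function
spacetimevalue1(:,1) = f;
for j = 2:length(t)
    dfdx = gradient(f, dx);       % numerical spatial derivative
    f = f + dt*dfdx;              % forward-Euler step with a placeholder time derivative
    spacetimevalue1(:,j) = f;
end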
The problem says:
Three tensile tests were carried out on an aluminum bar. In each test the strain was measured at the same values of stress. The results were

Stress (MPa)      34.5   69     103.5   138
Strain, test 1    0.46   0.95   1.48    1.93
Strain, test 2    0.34   1.02   1.51    2.09
Strain, test 3    0.73   1.10   1.62    2.12

where the units of strain are mm/m. Use linear regression to estimate the modulus of elasticity of the bar (modulus of elasticity = stress/strain).
I used this program for this problem:
function coeff = polynFit(xData,yData,m)
% Returns the coefficients of the polynomial
% a(1)*x^(m-1) + a(2)*x^(m-2) + ... + a(m)
% that fits the data points in the least squares sense.
% USAGE: coeff = polynFit(xData,yData,m)
% xData = x-coordinates of data points.
% yData = y-coordinates of data points.
A = zeros(m); b = zeros(m,1); s = zeros(2*m-1,1);
for i = 1:length(xData)
temp = yData(i);
for j = 1:m
b(j) = b(j) + temp;
temp = temp*xData(i);
end
temp = 1;
for j = 1:2*m-1
s(j) = s(j) + temp;
temp = temp*xData(i);
end
end
for i = 1:m
for j = 1:m
A(i,j) = s(i+j-1);
end
end
% Rearrange coefficients so that coefficient
% of x^(m-1) is first
coeff = flipdim(gaussPiv(A,b),1);
The problem is solved without a program as follows
MY ATTEMPT
T=[34.5,69,103.5,138];
D1=[.46,.95,1.48,1.93];
D2=[.34,1.02,1.51,2.09];
D3=[.73,1.1,1.62,2.12];
Mod1=T./D1;
Mod2=T./D2;
Mod3=T./D3;
xData=T;
yData1=Mod1;
yData2=Mod2;
yData3=Mod3;
coeff1 = polynFit(xData,yData1,2);
coeff2 = polynFit(xData,yData2,2);
coeff3 = polynFit(xData,yData3,2);
x1=(0:.5:190);
y1=coeff1(2)+coeff1(1)*x1;
subplot(1,3,1);
plot(x1,y1,xData,yData1,'o');
y2=coeff2(2)+coeff2(1)*x1;
subplot(1,3,2);
plot(x1,y2,xData,yData2,'o');
y3=coeff3(2)+coeff3(1)*x1;
subplot(1,3,3);
plot(x1,y3,xData,yData3,'o');
What do I have to do to get this result?
As general advice:
avoid for loops wherever possible.
avoid using i and j as variable names, as they are Matlab built-in names for the imaginary unit (I really hope that disappears in a future release...)
Because Matlab is an interpreted language, for-loops can be very slow compared to their compiled alternatives. Matlab is named MATrix LABoratory, meaning it is highly optimized for matrix/array operations. Usually, when there is an operation that seems to require a loop, Matlab has a built-in function for it that runs far faster than a for-loop in Matlab ever will. For example: the mean of the elements in an array: mean(x). The sum of all elements in an array: sum(x). The standard deviation of the elements in an array: std(x). And so on. Matlab's power comes from these built-in functions.
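As a small illustration (exact timings will vary by machine and Matlab version), compare a manual accumulation loop against the built-in sum:
x = rand(1e7,1);
tic; s = 0; for k = 1:numel(x), s = s + x(k); end; toc   % explicit loop
tic; s2 = sum(x); toc                                    % built-in, typically much faster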
So, your problem. You have a linear regression problem. The easiest way in Matlab to solve this problem is this:
%# your data
stress = [ %# in Pa
34.5 69 103.5 138] * 1e6;
strain = [ %# in m/m
0.46 0.95 1.48 1.93
0.34 1.02 1.51 2.09
0.73 1.10 1.62 2.12]' * 1e-3;
%# make linear array for the data
yy = strain(:);
xx = repmat(stress(:), size(strain,2),1);
%# re-formulate the problem into linear system Ax = b
A = [xx ones(size(xx))];
b = yy;
%# solve the linear system
x = A\b;
%# the modulus of elasticity is the reciprocal of the fitted slope
%# NOTE: the y-offset is relatively small and can be ignored
E = 1/x(1)
What you did in the function polynFit is done by A\b, but the \-operator does it faster, more robustly and more flexibly than what you tried to do yourself. I'm not saying you shouldn't try to build these things yourself (please keep doing that, you learn a lot from it!), I'm saying that for the "real" results you should always use the \-operator (and check your own results against it as well).
The backslash operator (type help \ on the command prompt) is extremely useful in many situations, and I advise you learn it and learn it well.
I leave you with this: here's how I would write your polynFit function:
function coeff = polynFit(X,Y,m)
    if numel(X) ~= numel(Y)
        error('polynFit:size_mismatch',...
            'number of elements in matrices X and Y must be equal.');
    end
    %# bad condition number, rank errors, etc. taken care of by \
    coeff = bsxfun(@power, X(:), m:-1:0) \ Y(:);
end
I leave it up to you to figure out how this works.
The following function calculates the Gaussian kernel and is part of the Kernel Ridge Regression algorithm that I wrote. I was wondering how I could modify this function to improve the execution time (i.e. get rid of the two for loops). Any ideas?
function [K] = calculate_krr_gaussiankernel(Xi,Xj,S)
K = zeros(size(Xi,1),size(Xj,1));
for Ixi = 1:size(Xi,1)
    for Ixj = 1:size(Xj,1)
        K(Ixi,Ixj) = exp((-norm(Xi(Ixi,:) - Xj(Ixj,:)) .^ 2) ./ (2 * (S .^ 2)));
    end
end
end
EDIT: the formula is K(i,j) = exp( -||Xi(i,:) - Xj(j,:)||^2 / (2*S^2) ).
Here's a version that's most likely faster. It might however give rise to memory issues for large Xi/Xj.
function K = calculate_krr_gaussiankernel(Xi, Xj, S)
%# create an array of difference between Xi(r,:) and Xj(s,:) for all r,s
delta = bsxfun(@minus, permute(Xi,[1 3 2]), permute(Xj,[3 1 2]));
%# calculate the squared norm
ssq = sum(delta.^2, 3);
%# calculate the kernel
K = exp(-ssq./(2*S.^2));
Here's an explanation of what I'm doing:
the bsxfun line: I reshape the inputs, such that I can get, at every (i,j), the difference vector in the third dimension
the ssq line simply takes the sum of squares. I could take the square root here and thus get the norm, but since we'll square that again, anyway, there's no point in that.
the final line implements the formula in the OP, where ssq is the squared norm of the differences.
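As a quick sanity check on made-up data, a single entry of the vectorized result can be compared against the direct formula:
Xi = rand(5,3); Xj = rand(4,3); S = 0.7;              % arbitrary test data
K = calculate_krr_gaussiankernel(Xi, Xj, S);          % 5-by-4 kernel matrix
abs(K(2,3) - exp(-norm(Xi(2,:)-Xj(3,:))^2/(2*S^2)))   % should be ~0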
You can roughly double the speed, since K is symmetric (assuming Xi and Xj hold the same set of points). In addition, you can calculate the norms of the difference vectors first and then make a single call to exp(), which may be faster than calling exp() over and over again. Putting this together:
function [K] = calculate_krr_gaussiankernel(Xi,Xj,S)
arg = zeros(size(Xi,1),size(Xj,1));
for Ixi = 1:size(Xi,1)
    % diagonal elements can be done in the outer loop:
    arg(Ixi,Ixi) = norm(Xi(Ixi,:) - Xj(Ixi,:));
    for Ixj = Ixi+1:size(Xj,1) % off-diagonals done once and copied
        arg(Ixi,Ixj) = norm(Xi(Ixi,:) - Xj(Ixj,:));
        arg(Ixj,Ixi) = arg(Ixi,Ixj);
    end
end
K = exp((-arg.^2) ./ (2 * (S.^2)));
end