I'm trying to do an MCS (Monte Carlo simulation) on a spread option, where one leg of the spread is dependent on four underlyings. So I found the code at http://www.goddardconsulting.ca/matlab-monte-carlo-assetpaths-corr.html
and modified it to handle more than two assets, because the original didn't work with more than two.
My code looks like this:
% do a Cholesky factorization on the correlation matrix
R = chol(corr);
% pre-allocate the output
S = nan(steps+1, nsims, nAssets);
% generate correlated random sequences and paths
for idx = 1:nsims
    % draw fresh uncorrelated normals for this simulation and correlate them
    x  = randn(steps, size(corr,2));
    ep = x*R;
    % drift and diffusion terms, both steps-by-nAssets
    A = repmat(drift*dt, steps, 1);
    B = ep*diag(sig)*sqrt(dt);
    [s1, s2] = size(B);
    A = reshape(A, s1, s2);   % A and B have to be the same size
    % generate potential paths
    S(:,idx,:) = [ones(1,nAssets); cumprod(exp(A + B)) ]*diag(S0);
end
% if only one simulation then remove the singleton dimension
if nsims == 1
    S = squeeze(S);
end
Plotting it all, I get a visually useful result; that is, I can use the generated paths as input to the "leg" I need for the option calculation.
My quite ridiculous problem is: I don't actually understand what
S(:,idx,:) = [ones(1,nAssets); cumprod(exp(A + B)) ]*diag(S0);
is doing. Could someone explain? I have stepped through it, but it doesn't get any clearer. S0 holds the four end values of my four underlyings, which I use as the start values for the simulations.
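As far as I can tell, unpacked into separate steps the line is equivalent to something like this (with A and B of size steps-by-nAssets), though I may be reading it wrong:
growth    = exp(A + B);                    % per-step growth factors, steps-by-nAssets
cumgrowth = cumprod(growth);               % running product down each column, i.e. per asset
paths     = [ones(1,nAssets); cumgrowth];  % prepend a row of ones for time zero
S(:,idx,:) = paths*diag(S0);               % scale each asset's column by its start value from S0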
What I would actually have preferred to do, since it worked well with non-correlated paths and I understand the MATLAB code, is to simply create an HWV model in MATLAB, calculate the drift, long-term mean and other needed factors, apply the Cholesky factor to the Wiener process in some way, and then generate correlated paths instead of uncorrelated ones as before. I would need to pass the random numbers, multiplied by the Cholesky factor so that they are correlated, to the MATLAB function "simulate", but I can't find out how to do that.
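Something like the following is what I have in mind (just a sketch; the 'Correlation' parameter of hwv and the 'Z' argument of simulate are my guesses at the interface, speed and level stand for the mean-reversion speed and long-term mean, and corrMat is the correlation matrix):
nAssets = numel(S0);
% Option 1: let the hwv object correlate the Brownian motions itself
obj = hwv(diag(speed), level, diag(sig), ...
          'StartState', S0(:), 'Correlation', corrMat);
[paths, times] = simulate(obj, steps, 'DeltaTime', dt, 'NTrials', nsims);
% Option 2: correlate the normals myself with the Cholesky factor and pass
% them to simulate through the 'Z' argument (steps-by-nAssets-by-nsims)
R = chol(corrMat);
Z = zeros(steps, nAssets, nsims);
for k = 1:nsims
    Z(:,:,k) = randn(steps, nAssets)*R;   % correlated standard normals
end
objU   = hwv(diag(speed), level, diag(sig), 'StartState', S0(:));
paths2 = simulate(objU, steps, 'DeltaTime', dt, 'NTrials', nsims, 'Z', Z);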
%%% EDIT SEPTEMBER 26 %%%
Ok, I finally managed to create an HWV object for my 4 assets, passing all kinds of vectors and arrays and matrices of the right sizes...
This is the correlation matrix I passed:
(order NATGAS GASOIL FUELOIL EURUSD)
1.0000 0.8819 0.8118 -0.5096
0.8819 1.0000 0.9744 -0.3065
0.8118 0.9744 1.0000 -0.2832
-0.5096 -0.3065 -0.2832 1.0000
As you can see, obviously the eurusd fx rate is negatively correlated with the hydrocarbons.
So why are the simulated asset paths practically identical?
The EURUSD path should look completely different, as it does when I do the correlated GBM using the code above.
I have a cell array myBasis of sparse matrices B_1, ..., B_n.
I want to compute in MATLAB the matrix Q(i,j) = trace(B_i^T * B_j).
Therefore, I wrote the following code:
for i=1:n
for j=1:n
B=myBasis{i};
C=myBasis{j};
Q(i,j)=trace(B'*C);
end
end
This already takes 68 seconds when n = 1226 and each B_i has 50 rows and 50 columns.
Is there any chance to speed this up? Usually I eliminate for-loops from my MATLAB code by moving them into a C++ MEX file, but I have no experience with handling a cell array of sparse matrices in C++.
As noted by Inox, Q is symmetric, and therefore you only need to explicitly compute half the entries.
Computing trace( B.'*C ) is equivalent to B(:).'*C(:):
trace(B.'*C) = sum_i [B.'*C]_ii = sum_i sum_j B_ij * C_ij
which is the sum of element-wise products and therefore equivalent to B(:).'*C(:).
When explicitly computing trace( B.'*C ) you are actually pre-computing all k-by-k entries of B.'*C only to use the diagonal later on. AFAIK, Matlab does not optimize its calculation to save it from computing all the entries.
Here's a way
Q = zeros(n);                       % pre-allocate the (symmetric) result
for ii = 1:n
    B = myBasis{ii};
    for jj = ii:n
        C = myBasis{jj};
        t = full( B(:).'*C(:) );    % equivalent to trace(B.'*C)!
        Q(ii,jj) = t;
        Q(jj,ii) = t;
    end
end
PS,
It is best not to use i and j as variable names in MATLAB, since they shadow the built-in imaginary unit.
PPS,
You should note that the ' operator in MATLAB is not the plain matrix transpose but the complex conjugate (Hermitian) transpose; for the actual transpose you need to use .'. In most cases complex numbers are not involved and there is no difference between the two operators, but once complex data is introduced, confusing the two makes debugging quite a mess...
Well, a couple of thoughts
1) Basic stuff: A'*B = (B'*A)' and trace(A) = trace(A'). This trick alone cuts your calculations by almost 50%: your Q(i,j) matrix is symmetric, so you only need to calculate n(n+1)/2 terms (and not n²).
2) To calculate the trace you don't need to compute every term of B'*C, just the diagonal. Nevertheless, I don't know whether it is easy to write a MATLAB script that is actually faster than just computing B'*C (MATLAB is pretty fast with matrix operations).
But I would definitely implement (1)
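For point (2), one way to get just the trace without forming the full product is the element-wise sum, essentially the same trick as in the other answer (a sketch, assuming real-valued matrices):
t = full(sum(sum(B .* C)));   % equals trace(B.'*C) when B and C are real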
I'm probably being a little dense but I'm not very mathsy and can't seem to understand the covariance element of creating multivariate data.
I'm after two columns of random data (representing two correlated variables).
I think I am right that I need to use the mvnrnd function, and I understand that 'mu' must hold my mean vectors. As I need 4 distinct classes within my data, these are going to be (1, 1), (-1, 1), (1, -1) and (-1, -1). I assume I will have to call the function four times with a different mean vector each time and then combine the results to get my full data set.
I don't understand what I should put for SIGMA - Matlab help tells me that it must be 'a d-by-d symmetric positive semi-definite matrix, or a d-by-d-by-n array' i.e. a covariance matrix. I don't understand how I create a covariance matrix for numbers that I am yet to generate.
Any advice would be greatly appreciated!
Assuming that I understood your case properly, I would go this way:
data = [normrnd(0,1,5000,1),normrnd(0,1,5000,1)]; %% your starting data series
MU = mean(data,1);
SIGMA = cov(data);
Now, it should be possible to feed mvnrnd with MU and SIGMA:
r = mvnrnd(MU,SIGMA,5000);
plot(r(:,1),r(:,2),'+') %% in case you wanna plot the results
I hope this helps.
I think your aim is to generate simulated multivariate Gaussian-distributed data. For example, I use
k = 6; % feature dimension
mu = rand(1,k);
sigma = 10*eye(k,k);
The identity matrix scaled by 10 is a symmetric positive semi-definite matrix, and with this sigma the Gaussian cloud will be spherical (equal variance in every dimension, no correlation) rather than elongated as with a more general sigma.
Then you can use it as in the mvnrnd example above and look at the plot.
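Putting that together with the four class means from the question, a sketch could look like this (the 0.5 standard deviation and 0.3 correlation are made-up values, to be replaced by whatever spread and dependence you actually want):
s     = 0.5;  rho = 0.3;                   % assumed standard deviation and correlation
SIGMA = [s^2, rho*s^2; rho*s^2, s^2];      % 2-by-2 covariance matrix
MU    = [1 1; -1 1; 1 -1; -1 -1];          % the four class means
nPerClass = 250;
data   = zeros(4*nPerClass, 2);
labels = zeros(4*nPerClass, 1);
for c = 1:4
    idx = (c-1)*nPerClass + (1:nPerClass);
    data(idx,:) = mvnrnd(MU(c,:), SIGMA, nPerClass);
    labels(idx) = c;
end
gscatter(data(:,1), data(:,2), labels)     % plot, coloured by class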
Here is my testing function:
function diff = svdtester()
y = rand(500,20);
[U,S,V] = svd(y);
%{
y = sprand(500,20,.1);
[U,S,V] = svds(y);
%}
diff_mat = y - U*S*V';
diff = mean(abs(diff_mat(:)));
end
There are two very similar parts: one finds the SVD of a random dense matrix, the other finds the SVD of a random sparse matrix. Whichever branch is active (right now the second one is commented out), the function computes the difference between the original matrix and the product of its SVD factors and returns the average absolute difference.
When using rand/svd, the typical return value (mean error) is around 8.8e-16, basically zero. When using sprand/svds, the typical return value is around 0.07, which is fairly terrible considering the sparse matrix is 90% zeros to start with.
Am I misunderstanding how SVD should work for sparse matrices, or is something wrong with these functions?
Yes, the behavior of svds is a little bit different from svd. According to MATLAB's documentation:
[U,S,V] = svds(A,...) returns three output arguments, and if A is m-by-n:
U is m-by-k with orthonormal columns
S is k-by-k diagonal
V is n-by-k with orthonormal columns
U*S*V' is the closest rank k approximation to A
By default k is 6, so you will get a rather crude approximation. To get a more exact approximation, specify k to be min(size(y)):
[U, S, V] = svds(y, min(size(y)))
and you will get an error of the same order of magnitude as in the svd case.
P.S. Also, MATLAB's documentation says:
Note svds is best used to find a few singular values of a large, sparse matrix. To find all the singular values of such a matrix, svd(full(A)) will usually perform better than svds(A,min(size(A))).
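For completeness, the sparse branch of the test function above would then look something like this (a sketch):
y = sprand(500,20,.1);
[U,S,V] = svds(y, min(size(y)));   % ask for all 20 singular values instead of the default 6
diff_mat = y - U*S*V';             % sparse minus full gives a full matrix
diff = mean(abs(diff_mat(:)));     % should now be near machine precision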
Basically I have two matrices, something like this:
Matrix A (100 rows x 2 features)
Height    Weight
1.48 75
1.55 65
1.60 70
etc...
And Matrix B (same dimension of Matrix A but with different values of course)
I would like to understand whether there is some correlation between Matrix A and Matrix B. Which strategy would you suggest?
The concept you are looking for is known as canonical correlation. It is a well developed bit of theory in the field of multivariate analysis. Essentially, the idea is to find a linear combination of the columns in your first matrix and a linear combination of the columns in your second matrix, such that the correlation between the two linear combinations is maximized.
This can be done manually using eigenvectors and eigenvalues, but if you have the Statistics Toolbox, then MATLAB already has it packaged and ready to go for you. The function is called canoncorr; see its documentation for the details.
A brief example of the usage of this function follows:
%# Set up some example data
CovMat = randi(5, 4, 4) + 20 * eye(4); %# Build a random covariance matrix
CovMat = (1/2) * (CovMat + CovMat'); %# Ensure the random covariance matrix is symmetric
X = mvnrnd(zeros(500, 4), CovMat); %# Simulate data using multivariate Normal
%# Partition the data into two matrices
X1 = X(:, 1:2);
X2 = X(:, 3:4);
%# Find the canonical correlations of the two matrices
[A, B, r] = canoncorr(X1, X2);
The first canonical correlation is the first element of r, and the second canonical correlation is the second element of r.
The canoncorr function also has a lot of other outputs. I'm not sure I'm clever enough to provide a satisfactory yet concise explanation of them here so instead I'm going to be lame and recommend you read up on it in a multivariate analysis textbook - most multivariate analysis textbooks will have a full chapter dedicated to canonical correlations.
Finally, if you don't have the Statistics Toolbox, a quick Google search turned up a File Exchange (FEX) submission that claims to provide canonical correlation analysis - note, I haven't tested it myself.
Ok, let's have a short try:
A = [1:20; rand(1,20)]'; % Generate some data...
The best way to examine a 2-dimensional relationship is by looking at the data plots:
plot(A(:,1), A(:,2), 'o') % With random data you should not see any pattern...
If we really want to compute some correlation coefficients, we can do this with corrcoef, as you mentioned:
B = corrcoef(A)
B =
1.0000 -0.1350
-0.1350 1.0000
Here, B(1,1) is the correlation between column 1 and column 1, B(2,1) between column 1 and column 2 (and vice versa, thus B is symmetric).
One may argue about the usefulness of such a measure in a two-dimensional context - in my opinion you usually gain more insights by looking at the plots.
Suppose I have an arbitrary transformation matrix A such as,
A =
0.9966 0.0007 -6.5625
0.0027 0.9938 1.0598
0 0 1.0000
And a set of points such that their x and y coordinates are represented by X and Y respectively.
And suppose,
[Xf Yf] = tformfwd(maketform('projective',A),X,Y);
Now,
[Xff Yff] = tformfwd(maketform('projective',inv(A)),Xf,Yf);
[Xfi Yfi] = tforminv(maketform('projective',A),Xf,Yf);
[Xff Yff] and [Xfi Yfi] seem to be exactly the same (and they should).
Is tforminv just there for convenience or am I missing something here?
I'll preface this by saying it is my best guess...
It's possible that tforminv may perform the transformation without actually forming the inverse matrix. For example, you can solve a system of linear equations Ax = b in two ways:
x = inv(A)*b;
x = A\b;
According to the documentation for inv, the second option (using the matrix division operator) can perform better "from both an execution time and numerical accuracy standpoint" since it "produces the solution using Gaussian elimination, without forming the inverse". tforminv may do something similar and thus show better overall behavior compared with passing the inverse matrix to tformfwd.
If you were so inclined, you could probably try a number of different transformation matrices and test the two approaches (tforminv or tformfwd and inv) to see how accurate the results are and how fast they are each computed.
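For example, a quick sketch of such a comparison (untested, using the same projective convention as in the question) might be:
A = [0.9966 0.0007 -6.5625; 0.0027 0.9938 1.0598; 0 0 1];
X = 100*rand(1e5,1);  Y = 100*rand(1e5,1);
T = maketform('projective', A);
[Xf, Yf] = tformfwd(T, X, Y);
tic, [Xi1, Yi1] = tforminv(T, Xf, Yf); t_inv = toc;
tic, [Xi2, Yi2] = tformfwd(maketform('projective', inv(A)), Xf, Yf); t_fwd = toc;
fprintf('tforminv:        %.4f s, max error %.3g\n', t_inv, max(abs([Xi1-X; Yi1-Y])));
fprintf('tformfwd + inv:  %.4f s, max error %.3g\n', t_fwd, max(abs([Xi2-X; Yi2-Y])));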