I'm trying to take 5 years of data for a variable from a netCDF file and create an annual cycle: take all 5 Jans and average them, take all 5 Febs and average them, etc., and plot the result on a line graph. I'm just starting the code (I want to get this to work before I move on) and am getting the following error message: In an assignment A(:) = B, the number of elements in A and B must be the same.
My question is, is there a better way to do this?
Thanks for your help in advance. I'm a newbie, so I know this may be a simple question.
ncid = netcdf.open('example.nc','NC_NOWRITE');
PS1 = netcdf.getVar(ncid,netcdf.inqVarID(ncid, 'ps'), 'single');
for i = 1:12
MonthlyPS1(i) = PS1(month==i);
end
That should dump data into bins for each month, where I can later calculate the average.
If PS1 is a 5-by-12 array of real numbers, then:
MonthlyPS1 = mean(PS1);
If PS1 is a 12-by-5 array of real numbers, then:
MonthlyPS1 = mean(PS1, 2);
If PS1 is a 60-by-1 array of real numbers, then:
MonthlyPS1 = mean(reshape(PS1, 12, 5), 2);
If PS1 is not an array of real numbers, you need to explain what PS1 is, as requested by CST-Link. Also, it is not clear what the variable month is.
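For reference, here is a minimal sketch of how the logical-indexing idea from the question could work, assuming PS1 is a 60-by-1 vector of monthly values and assuming a month vector built by hand (in practice it would come from the file's time variable):
% hypothetical month labels: 1..12 repeated for each of the 5 years
month = repmat((1:12)', 5, 1);
MonthlyPS1 = zeros(12, 1);
for i = 1:12
    % mean() collapses the 5 matching values to a single number, which is
    % why the original assignment MonthlyPS1(i) = PS1(month==i) failed
    MonthlyPS1(i) = mean(PS1(month == i));
end
plot(1:12, MonthlyPS1);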
I will try to write a solution, but I'm not sure it is what you need (there's a certain amount of guessing involved in my answer).
Also, I will not write it in idiomatic MATLAB, but rather with explicit loops and calculations; that way it should be easier for you to see whether it's what you want or not:
% allocate space for monthly mean values
PS1_mean = zeros(size(PS1,1), size(PS1,2), 12);
for d = 1:size(PS1,3)
% calculate the month for date d
m = mod(d, 12);
if m == 0
m = 12;
end;
% cumulate the data
PS1_mean(:, :, m) = PS1_mean(:, :, m) + PS1(:, :, d);
end;
% calculate the mean value
% (this might be tricky for incomplete years)
n_years = fix(size(PS1,3) / 12);
PS1_mean = PS1_mean / n_years;
To plot a "slice" of data for e.g. March, you may try:
mesh(PS1_mean(:,:,3));
Again, I hope that's what you're looking for.
Related
Trying to calculate the variance of a European option using repeated trials (instead of 1 trial). I want to compare the variance using the standard randn function and the sobolset. I'm not quite sure how to draw repeated samples from the latter.
Generating from randn is easy:
num_steps = 100;
num_paths = 10;
z = randn(num_steps, num_paths); % 100 steps for each of 10 paths
Once I have this, I can loop through all 10 columns of the z matrix, and can also repeat the experiment many times, as the randn function will provide a new set of random variables every time.
for exp_num = 1: 20
for col = 1: 10
price_vec = z(:, col);
end
end
I'm not quite sure how to do this with the sobolset. I understand I can create a matrix of points to start with (say 100-by-10). I can loop through all the columns as above for the first experiment. However, when I try the next experiment (#2), the loop starts from the beginning and all the numbers are the same, meaning I don't get any variation in my pricing. It seems I will need to find a way to randomize the column selection at the start of every experiment number. Is there a better way to do this?
data1 = sobolset(1000, 'Skip', 1000, 'Leap', 100);
data2 = net(data1, 10);
for exp_num = 1: 20
% how do I change the start of the column selection here, so that the next data3 is different from the one in the previous exp_num?
for col = 1:10
data3(:, col) = data2(:, col);
% perform calculations
end
end
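For what it's worth, one possible way to get a different quasi-random set for each experiment is randomised QMC, i.e. re-scrambling the Sobol point set on every run instead of re-reading the same columns. This is only a sketch of that idea (the 100-by-10 shape simply mirrors the randn example above, and scramble draws its randomisation from the global random stream):
num_steps = 100;
num_paths = 10;
p = sobolset(num_paths, 'Skip', 1000, 'Leap', 100);
for exp_num = 1:20
    ps = scramble(p, 'MatousekAffineOwen');   % a new random scrambling each run
    u = net(ps, num_steps);                   % num_steps-by-num_paths points in (0,1)
    for col = 1:num_paths
        z_col = norminv(u(:, col), 0, 1);     % map uniforms to standard normals
        % perform calculations with z_col here
    end
end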
I hope this is making sense....
Thanks for the help!
Update: 8/21
I tried the following:
num_runs = 100;
num_samples = 1000;
for j = 1: num_runs
for i = 1 : num_samples
sobol_set = sobolset(num_samples,'Skip',j*50,'Leap',1e2);
sobol_set = net(sobol_set, 5);
sobol_seq = sobol_set(:, i)';
z_uncorr = norminv(sobol_seq, 0, 1);
% do pricing with z_uncorr through some function F
end
end
After generating 100 prices (through some function F, mentioned above), I find that the variance of the 100 prices is higher than what I get from the standard pseudo-random numbers. This should not be the case. I think I'm still not sampling correctly from the sobolset. Any advice would be appreciated.
I will preface this post with the obvious fact that I'm not very experienced in MATLAB and this post may be somewhat confusing. Any help is appreciated!
I need to store data inside two parameters but I'm unsure how to do it. The number of "x" values is known, but it is a user-inputted value, so it's not something that can be hard-coded. The same goes for the "y" values. Here's a simplified example of what I think I need (numbers are hard-coded here for the sake of the example).
Then, the final figure should have multiple plots on it. Each "x" variable is its own "output" that needs to be plotted. In the end I need "x" number of plots with "z" and "y" being the (X,Y) coordinates for each "x" plot, respectively.
EDIT: Updated example code.
list = [.0025, .005, .0075];
x = input('How many? ');
y = linspace(2.4*10^9, 5.0*10^9, 1000);
z = zeros(x, length(y));
for i = x
time = list(i)/(3*10^8);
for j = y
z(i,j) = (time * j);
end
end
for i = x
plot(z(i,j));
end
I get the following error:
Requested 3x2400000000 (53.6GB) array exceeds maximum array size preference. Creation of arrays greater
than this limit may take a long time and cause MATLAB to become unresponsive. See array size limit or
preference panel for more information.
The example that I provided could be totally wrong but I hope I have explained enough for someone to provide feedback.
Create the z array beforehand with the size you need: https://uk.mathworks.com/help/matlab/ref/zeros.html
Then you can fill it with z(x, y) = x + y, as sketched below.
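A rough sketch of what that looks like for the code in the question, with the loops running over indices rather than over the raw values of y (looping over the values of y is what makes MATLAB try to build the 3-by-2400000000 array):
list = [.0025, .005, .0075];
x = input('How many? ');              % must not exceed numel(list) in this sketch
y = linspace(2.4*10^9, 5.0*10^9, 1000);
z = zeros(x, length(y));              % preallocate: one row per "x"
for i = 1:x                           % loop over indices, not values
    time = list(i)/(3*10^8);
    for j = 1:length(y)
        z(i, j) = time * y(j);        % index with j, use the value y(j)
    end
end
figure; hold on
for i = 1:x
    plot(y, z(i, :));                 % one curve per "x", with y on the horizontal axis
end
hold off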
HTH
I have to construct the following function in MATLAB and am having trouble.
Consider the function s(t) defined for t in [0,4) by
       { sin(pi*t/2) , for t in [0,1)
s(t) = { -(t-2)^3    , for t in [1,3)
       { sin(pi*t/2) , for t in [3,4)
(i) Generate a column vector s consisting of 512 uniform
samples of this function over the interval [0,4). (This
is best done by concatenating three vectors.)
I know it has to be something of the form.
N = 512;
s = sin(5*t/N).' ;
But I need s to be the piecewise function, can someone provide assistance with this?
If I understand correctly, you're trying to create 3 vectors which calculate the specific function outputs for all t, then take slices of each and concatenate them depending on the actual value of t. This is inefficient, as you're initialising 3 times as many vectors as you actually want (memory) and also making 3 times as many calculations (CPU), most of which will just be thrown away. To top it off, the concatenation will be a bit tricky if your t is ever not as you expect (i.e. monotonically increasing). It might be an unlikely situation, but it's better to be general.
Here are two alternatives. The first is, imho, the nice MATLAB way; the second is the more conventional way (you might be more used to that if you're coming from C++ or something; I was for a long time).
function example()
t = linspace(0,4,513); % generate your time-trajectory
t = t(1:end-1); % exclude final value which is 4
tic
traj1 = myFunc(t);
toc
tic
traj2 = classicStyle(t);
toc
end
function trajectory = myFunc(t)
trajectory = zeros(size(t)); % since you know the size of your output, generate it at the beginning. More efficient than dynamically growing this.
% you could put an assert for t>=0 and t<4, otherwise you could end up with 0s wherever t is outside your expected range
% find the indices for each piecewise segment you care about
idx1 = find(t<1);
idx2 = find(t>=1 & t<3);
idx3 = find(t>=3 & t<4);
% now calculate each entry appropriately
trajectory(idx1) = sin(pi.*t(idx1)./2);
trajectory(idx2) = -(t(idx2)-2).^3;
trajectory(idx3) = sin(pi.*t(idx3)./2);
end
function trajectory = classicStyle(t)
trajectory = zeros(size(t));
% conventional way: loop over each t, and differentiate with if-else
% works, but a lot more code and ugly
for i=1:numel(t)
if t(i)<1
trajectory(i) = sin(pi*t(i)/2);
elseif t(i)>=1 && t(i)<3
trajectory(i) = -(t(i)-2)^3;
elseif t(i)>=3 && t(i)<4
trajectory(i) = sin(pi*t(i)/2);
else
error('t is beyond bounds!')
end
end
end
Note that when I tried it, the 'conventional way' was sometimes faster for the sampling size you're working with, although the first way (myFunc) is definitely faster as you scale up. In any case, I recommend the first approach, as it is much easier to read.
I have a matrix time-series data for 8 variables with about 2500 points (~10 years of mon-fri) and would like to calculate the mean, variance, skewness and kurtosis on a 'moving average' basis.
Let's say frames = [100 252 504 756] - I would like to calculate the four functions above over each of the (time-)frames, on a daily basis - so the result for day 300 in the case of the 100-day frame would be the [mean variance skewness kurtosis] of the period day 201 to day 300 (100 days in total)... and so on.
I know this means I would get an array output, and that the first frame-length of days would be NaNs, but I can't figure out the required indexing to get this done...
This is an interesting question because I think the optimal solution is different for the mean than it is for the other sample statistics.
I've provided a simulation example below that you can work through.
First, choose some arbitrary parameters and simulate some data:
%#Set some arbitrary parameters
T = 100; N = 5;
WindowLength = 10;
%#Simulate some data
X = randn(T, N);
For the mean, use filter to obtain a moving average:
MeanMA = filter(ones(1, WindowLength) / WindowLength, 1, X);
MeanMA(1:WindowLength-1, :) = nan;
I had originally thought to solve this problem using conv as follows:
MeanMA = nan(T, N);
for n = 1:N
MeanMA(WindowLength:T, n) = conv(X(:, n), ones(WindowLength, 1), 'valid');
end
MeanMA = (1/WindowLength) * MeanMA;
But as @PhilGoddard pointed out in the comments, the filter approach avoids the need for the loop.
Also note that I've chosen to make the dates in the output matrix correspond to the dates in X so in later work you can use the same subscripts for both. Thus, the first WindowLength-1 observations in MeanMA will be nan.
For the variance, I can't see how to use either filter or conv or even a running sum to make things more efficient, so instead I perform the calculation manually at each iteration:
VarianceMA = nan(T, N);
for t = WindowLength:T
VarianceMA(t, :) = var(X(t-WindowLength+1:t, :));
end
We could speed things up slightly by exploiting the fact that we have already calculated the mean moving average. Simply replace the within loop line in the above with:
VarianceMA(t, :) = (1/(WindowLength-1)) * sum((bsxfun(@minus, X(t-WindowLength+1:t, :), MeanMA(t, :))).^2);
However, I doubt this will make much difference.
If anyone else can see a clever way to use filter or conv to get the moving window variance I'd be very interested to see it.
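For what it's worth, here is a sketch of one running-sum possibility, using the identity that within each window sum((x - m).^2) = sum(x.^2) - W*m^2; it reuses MeanMA from above, but it hasn't been benchmarked here and the running-sum form can be less numerically stable than the explicit loop:
SumX2 = filter(ones(1, WindowLength), 1, X.^2);                 % moving sum of squares
VarianceMA2 = (SumX2 - WindowLength * MeanMA.^2) / (WindowLength - 1);
VarianceMA2(1:WindowLength-1, :) = nan;                         % same nan convention as MeanMA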
I leave the case of skewness and kurtosis to the OP, since they are essentially just the same as the variance example, but with the appropriate function.
A final point: if you were converting the above into a general function, you could pass in an anonymous function as one of the arguments, then you would have a moving average routine that works for arbitrary choice of transformations.
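A minimal sketch of that idea (the name movingstat and its signature are made up for illustration):
function Out = movingstat(X, WindowLength, fh)
% Apply an arbitrary column-wise function handle fh over a trailing
% window of WindowLength observations; rows before the first full
% window are left as nan, matching the convention above.
[T, N] = size(X);
Out = nan(T, N);
for t = WindowLength:T
    Out(t, :) = fh(X(t-WindowLength+1:t, :));
end
end
So, for example, VarianceMA = movingstat(X, WindowLength, @var); should reproduce the loop above, and @skewness or @kurtosis can be passed in the same way.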
Final, final point: For a sequence of window lengths, simply loop over the entire code block for each window length.
I have managed to produce a solution which only uses basic functions within MATLAB and can also be expanded to include other functions (for finance, e.g. a moving Sharpe ratio or a moving Sortino ratio). The code below shows this and contains hopefully sufficient commentary.
I am using a time series of Hedge Fund data, with ca. 10 years worth of daily returns (which were checked to be stationary - not shown in the code). Unfortunately I haven't got the corresponding dates in the example so the x-axis in the plots would be 'no. of days'.
% start by importing the data you need - here it is a selection out of an
% excel spreadsheet
returnsHF = xlsread('HFRXIndices_Final.xlsx','EquityHedgeMarketNeutral','D1:D2742');
% two years to be used for the moving average. (250 business days in one year)
window = 500;
% create zero-matrices to fill with the MA values at each point in time.
mean_avg = zeros(length(returnsHF)-window+1,1);
st_dev = zeros(length(returnsHF)-window+1,1);
skew = zeros(length(returnsHF)-window+1,1);
kurt = zeros(length(returnsHF)-window+1,1);
% Now work through the time-series with each of the functions (one can add
% any other functions required), assigning the values to the zero-matrices
for count = window:length(returnsHF)
% This is the most tricky part of the script, the indexing in this section
% The TwoYearReturn is what is shifted along one period at a time with the
% for-loop.
TwoYearReturn = returnsHF(count-window+1:count);
mean_avg(count-window+1) = mean(TwoYearReturn);
st_dev(count-window+1) = std(TwoYearReturn);
skew(count-window+1) = skewness(TwoYearReturn);
kurt(count-window+1) = kurtosis(TwoYearReturn);
end
% Plot the MAs
subplot(4,1,1), plot(mean_avg)
title('2yr mean')
subplot(4,1,2), plot(st_dev)
title('2yr stdv')
subplot(4,1,3), plot(skew)
title('2yr skewness')
subplot(4,1,4), plot(kurt)
title('2yr kurtosis')
This is a follow-up to an earlier question of mine posted here. Based on Oleg Komarov's answer I wrote a little tool to get daily, hourly, etc. averages or sums of my data that uses accumarray() and datevec()'s output structure. Feel free to have a look at it here (it's probably not written very well, but it works for me).
What I would like to do now is add the functionality to calculate n-minute, n-hour, n-day, etc. statistics instead of the 1-minute, 1-hour, 1-day, etc. statistics my function does. I have a rough idea that simply loops over my time vector t (which is pretty much what I would have done already if I hadn't learnt about the beautiful accumarray()), but that means I have to do a lot of error-checking for data gaps, uneven sampling times, etc.
I wonder if there is a more elegant/efficient approach that lets me re-use/extend my old function posted above, i.e. something that still makes use of accumarray() and datevec(), since this makes working with gaps very easy.
You can download some sample data taken from my last question here. These were sampled at 30 min intervals, so a possible example of what I want to do would be to calculate 6 hour averages without relying on the assumption that they are free of gaps and/or always sampled at exactly 30 min.
This is what I have come up with so far, which works reasonably well, apart from a small but easily fixed problem with the time stamps (e.g. 0:30 is representative for the interval from 0:30 to 0:45 -- my old function suffers from the same problem, though):
[ ... see my answer below ...]
Thanks to woodchips for inspiration.
The linked method of using accumarray seems overkill and too complex to me if you start with evenly spaced measurements without any gaps. I have the following function in my private toolbox for calculating an N-point average of vectors:
function y = blockaver(x, n)
% y = blockaver(x, n)
% input points are averaged over n points
% always returns column vector
if n == 1
y = x(:);
else
nblocks = floor(length(x) / n);
y = mean(reshape(x(1:n * nblocks), n, nblocks), 1).';
end
Works pretty well for quick and dirty decimating by a factor N, but note that it does not apply proper anti-alias filtering. Use decimate if that is important.
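As a quick usage sketch with the 30-minute data from the question (x30 and t30 are hypothetical names for the measurement and datenum time vectors), 6-hour block averages would be:
y6h = blockaver(x30, 12);                          % 12 half-hour samples per 6-hour block
t6h = t30(1:12:12*floor(length(t30)/12));          % time stamp of the first sample in each block
This assumes the series is evenly spaced and gap-free, as stated above.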
I guess I figured it out using parts of @Bas Swinckels' answer and @woodchips' code linked above. Not exactly what I would call good code, but working and reasonably fast.
function [ t_acc, x_acc, subs ] = ts_aggregation( t, x, n, target_fmt, fct_handle )
% t is time in datenum format (i.e. days)
% x is whatever variable you want to aggregate
% n is the number of minutes, hours, days
% target_fmt is 'minute', 'hour' or 'day'
% fct_handle can be an arbitrary function (e.g. @sum)
t = t(:);
x = x(:);
switch target_fmt
case 'day'
t_factor = 1;
case 'hour'
t_factor = 1 / 24;
case 'minute'
t_factor = 1 / ( 24 * 60 );
end
t_acc = ( t(1) : n * t_factor : t(end) )';
subs = ones(length(t), 1);
for i = 2:length(t_acc)
subs(t > t_acc(i-1) & t <= t_acc(i)) = i;
end
x_acc = accumarray( subs, x, [], fct_handle );
end
/edit: Updated to a much shorter function that does use loops, but appears to be faster than my previous solution.
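As a quick usage sketch with the 30-minute sample data (t being the datenum time stamps and x the measurements), 6-hour means or, say, 2-day sums would be obtained with:
[t6h, x6h] = ts_aggregation(t, x, 6, 'hour', @mean);
[t2d, x2d] = ts_aggregation(t, x, 2, 'day', @sum);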