Interpolate missing values (with dates as sample points) in MATLAB

I am new to MATLAB and struggling to understand its data types (especially cell arrays); there is probably an elegant solution I do not know about.
I have a cell which contains other cells with dates:
30/09/2005
30/12/2005
...
30/09/2016
I have also a cell with cells containing corresponding values:
1
5
...
3
I want to interpolate those values for all days, or (better for me) for working days only.
What I have been thinking of doing is:
use datenum to get the numbers corresponding to the dates;
plug these dates (now numbers), the corresponding values, and all dates (now numbers) in between them into interp1.
It seemed like a good plan, but the function gives
datenum('30/12/2005') = 13297
datenum('30/09/2016') = 13217
These numbers cannot be used, because the earlier date maps to a larger number than the later one.

You can add any number of days to a datetime.
t = datetime('now') + days(1);
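In addition, days can give you the length of a duration in days. Hence:
% state the input format explicitly so day and month are not swapped
t0 = datetime('30/09/2005', 'InputFormat', 'dd/MM/yyyy');
tEnd = datetime('30/09/2016', 'InputFormat', 'dd/MM/yyyy');
durationInDays = days(tEnd - t0);
myDates(1) = t0; % MATLAB indices start at 1, not 0
for i = 2:durationInDays + 1 % +1 so that tEnd itself is included
    myDates(i) = myDates(i-1) + days(1);
end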
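The plan from the question (datenum plus interp1) can also be made to work once the date format is passed explicitly; without it, datenum has to guess which field is the day and which is the month. The following is only a sketch under assumptions: the dates are taken to be in a flat cell array of character vectors, the third date and the values are made up for illustration, and the names dateStr, vals, tq, vq and isWorkday are hypothetical.

```matlab
% Sample data: the first two dates come from the question, the rest is made up
dateStr = {'30/09/2005'; '30/12/2005'; '30/03/2006'};
vals    = [1; 5; 3];

% Convert to serial date numbers; in datenum formats 'mm' means month
t = datenum(dateStr, 'dd/mm/yyyy');

% One query point per calendar day between the first and last date
tq = (t(1):t(end))';

% Linear interpolation at every day
vq = interp1(t, vals, tq);

% Keep working days only: weekday returns 1 for Sunday and 7 for Saturday
dayOfWeek = weekday(tq);
isWorkday = dayOfWeek ~= 1 & dayOfWeek ~= 7;
tqWork = tq(isWorkday);
vqWork = vq(isWorkday);
```

This filter only removes weekends; public holidays would need a separate list.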

Related

MATLAB - Create Array Variable in For Loop and Plot

I will preface this post with the obvious fact that I'm not very experienced in MATLAB and this post may be somewhat confusing. Any help is appreciated!
I need to store data inside two parameters but unsure on how to do it. The number of "x" values is known but it is a user inputted value, so it's not something that can be hard coded. Same as the "y" values. Here's a simplified example of what I think I need (numbers are hard coded here for the sake of the example).
Then, the final figure should have multiple plots on it. Each "x" variable is its own "output" that needs to be plotted. In the end I need "x" number of plots with "z" and "y" being the (X,Y) coordinates for each "x" plot, respectively.
EDIT: Updated example code.
list = [.0025, .005, .0075];
x = input('How many? ');
y = linspace(2.4*10^9, 5.0*10^9, 1000);
z = zeros(x, length(y));
for i = x
time = list(i)/(3*10^8);
for j = y
z(i,j) = (time * j);
end
end
for i = x
plot(z(i,j));
end
I get the following error:
Requested 3x2400000000 (53.6GB) array exceeds maximum array size preference. Creation of arrays greater
than this limit may take a long time and cause MATLAB to become unresponsive. See array size limit or
preference panel for more information.
The example that I provided could be totally wrong but I hope I have explained enough for someone to provide feedback.
Create the z array beforehand with the size you need: https://uk.mathworks.com/help/matlab/ref/zeros.html
Then you can fill it with z(x,y) = x+y (MATLAB indexes with parentheses, not square brackets).
HTH
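For reference, the size error in the question comes from the loop headers: for j = y makes j take the values in y (around 2.4e9), which are then used as column indices, and for i = x runs once with i equal to x. A minimal sketch of one way to restructure the loops follows; x is hard-coded here instead of read with input, and it assumes x does not exceed the length of list.

```matlab
list = [.0025, .005, .0075];
x = 3;                              % hard-coded; the question reads this with input()
y = linspace(2.4e9, 5.0e9, 1000);
z = zeros(x, numel(y));             % one row per entry of list

for i = 1:x                         % loop over indices 1..x, not over the value x
    time = list(i) / 3e8;
    z(i, :) = time * y;             % fill the whole row at once
end

% One line per row, with z on the horizontal axis and y on the vertical axis
figure;
hold on;
for i = 1:x
    plot(z(i, :), y);
end
hold off;
```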

Trying to calculate annual cycle

I'm trying to take 5 years of data for a variable from a netCDF file and create an annual cycle: take all 5 Januaries and average them, take all 5 Februaries and average them, and so on, then plot the result on a line graph. I'm just starting the code (I want to get this working before I move on) and I get the following error message: In an assignment A(:) = B, the number of elements in A and B must be the same.
My question is, is there a better way to do this?
Thanks for your help in advance. I'm a newbie, so I know this may be a simple question.
ncid = netcdf.open('example.nc','NC_NOWRITE');
PS1 = netcdf.getVar(ncid,netcdf.inqVarID(ncid, 'ps'), 'single');
for i = 1:12
MonthlyPS1(i) = PS1(month==i);
end
That should dump data into bins for each month, where I can later calculate the average.
If PS1 is a 5-by-12 array of real numbers, then:
MonthlyPS1 = mean(PS1);
If PS1 is a 12-by-5 array of real numbers, then:
MonthlyPS1 = mean(PS1, 2);
If PS1 is a 60-by-1 array of real numbers, then:
MonthlyPS1 = mean(reshape(PS1, 12, 5), 2);
If PS1 is not an array of real numbers, you need to explain what PS1 is, as requested by CST-Link. Also, it is not clear what the variable month is.
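A small sketch of the 60-by-1 case, with made-up data so the result is easy to check by eye. It assumes the series starts in January and contains complete years; the month vector below is a hypothetical stand-in for the undefined month variable in the question.

```matlab
% Made-up monthly series: 5 years x 12 months, starting in January
PS1 = (1:60)';

% Hypothetical month index for each sample: 1..12 repeated 5 times
month = repmat((1:12)', 5, 1);

% Variant 1: reshape so each row holds one calendar month, then average rows
MonthlyPS1 = mean(reshape(PS1, 12, 5), 2);

% Variant 2: the loop from the question, averaging inside the loop so that
% each assignment receives a single number
MonthlyLoop = zeros(12, 1);
for i = 1:12
    MonthlyLoop(i) = mean(PS1(month == i));
end

% Variant 3: group by month index with accumarray
MonthlyAcc = accumarray(month, PS1, [], @mean);
```

The original error arises because PS1(month==i) returns several values, which cannot be stored in the single element MonthlyPS1(i).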
I will try to write a solution, but I'm not sure it is what you need (there's a certain amount of guessing in my answer).
Also, I will not write it in idiomatic MATLAB code, but rather with explicit loops and calculations; this way it may be easier for you to see whether it's what you want:
% allocate space for monthly mean values
PS1_mean = zeros(size(PS1,1), size(PS1,2), 12);
for d = 1:size(PS1,3)
    % calculate the month for time step d (assumes the data starts in January)
    m = mod(d, 12);
    if m == 0
        m = 12;
    end
    % accumulate the data
    PS1_mean(:, :, m) = PS1_mean(:, :, m) + PS1(:, :, d);
end
% calculate the mean value
% (this might be tricky for incomplete years)
n_years = fix(size(PS1,3) / 12);
PS1_mean = PS1_mean / n_years;
To plot a "slice" of data for e.g. March, you may try:
mesh(PS1_mean(:,:,3));
Again, I hope that's what you're looking for.

A moving average with different functions and varying time-frames

I have a matrix time-series data for 8 variables with about 2500 points (~10 years of mon-fri) and would like to calculate the mean, variance, skewness and kurtosis on a 'moving average' basis.
Let's say frames = [100 252 504 756]. I would like to calculate the four functions above over each of the (time-)frames, on a daily basis, so the return for day 300 with the 100-day frame would be [mean variance skewness kurtosis] over the period day 201 to day 300 (100 days in total), and so on.
I know this means I would get an array output, and that the first frame-length days would be NaNs, but I can't figure out the required indexing to get this done...
This is an interesting question because I think the optimal solution is different for the mean than it is for the other sample statistics.
I've provided a simulation example below that you can work through.
First, choose some arbitrary parameters and simulate some data:
%#Set some arbitrary parameters
T = 100; N = 5;
WindowLength = 10;
%#Simulate some data
X = randn(T, N);
For the mean, use filter to obtain a moving average:
MeanMA = filter(ones(1, WindowLength) / WindowLength, 1, X);
MeanMA(1:WindowLength-1, :) = nan;
I had originally thought to solve this problem using conv as follows:
MeanMA = nan(T, N);
for n = 1:N
MeanMA(WindowLength:T, n) = conv(X(:, n), ones(WindowLength, 1), 'valid');
end
MeanMA = (1/WindowLength) * MeanMA;
But as @PhilGoddard pointed out in the comments, the filter approach avoids the need for the loop.
Also note that I've chosen to make the dates in the output matrix correspond to the dates in X so in later work you can use the same subscripts for both. Thus, the first WindowLength-1 observations in MeanMA will be nan.
For the variance, I can't see how to use either filter or conv or even a running sum to make things more efficient, so instead I perform the calculation manually at each iteration:
VarianceMA = nan(T, N);
for t = WindowLength:T
VarianceMA(t, :) = var(X(t-WindowLength+1:t, :));
end
We could speed things up slightly by exploiting the fact that we have already calculated the mean moving average. Simply replace the within loop line in the above with:
VarianceMA(t, :) = (1/(WindowLength-1)) * sum((bsxfun(@minus, X(t-WindowLength+1:t, :), MeanMA(t, :))).^2);
However, I doubt this will make much difference.
If anyone else can see a clever way to use filter or conv to get the moving window variance I'd be very interested to see it.
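One possible way to do this, offered as a sketch and not as part of the original answer, is to apply filter to both X and X.^2 and use the identity sum((x - mean(x)).^2) = sum(x.^2) - sum(x)^2/n. The names kernel, S1 and S2 are hypothetical. Note the assumption that the data are roughly centred: when the mean is large compared with the spread, this formula can lose precision through cancellation.

```matlab
% Same set-up as the simulation above
T = 100; N = 5;
WindowLength = 10;
X = randn(T, N);

kernel = ones(1, WindowLength);
S1 = filter(kernel, 1, X);       % running sum of X over each window
S2 = filter(kernel, 1, X.^2);    % running sum of X.^2 over each window

% Sample variance (normalised by n-1, like var) from the two running sums
VarianceMA = (S2 - S1.^2 / WindowLength) / (WindowLength - 1);
VarianceMA(1:WindowLength-1, :) = nan;   % incomplete windows
```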
I leave the case of skewness and kurtosis to the OP, since they are essentially just the same as the variance example, but with the appropriate function.
A final point: if you were converting the above into a general function, you could pass in an anonymous function as one of the arguments, then you would have a moving average routine that works for arbitrary choice of transformations.
Final, final point: For a sequence of window lengths, simply loop over the entire code block for each window length.
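The last two points can be combined in a short sketch: a function handle selects the statistic, and an outer loop runs over the window lengths. The names statFun, frames and Out are hypothetical, and the window lengths are arbitrary values chosen to fit the simulated data.

```matlab
T = 100; N = 5;
X = randn(T, N);

frames  = [10 20];         % window lengths to loop over (arbitrary)
statFun = @(W) var(W);     % any column-wise statistic, e.g. @(W) mean(W)

% Out(t, n, k) holds the statistic for series n at time t and window frames(k)
Out = nan(T, N, numel(frames));
for k = 1:numel(frames)
    w = frames(k);
    for t = w:T
        Out(t, :, k) = statFun(X(t-w+1:t, :));
    end
end
```

Rows before the first complete window stay NaN, so the subscripts of Out line up with those of X.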
I have managed to produce a solution, which only uses basic functions within MATLAB and can also be expanded to include other functions, (for finance: e.g. a moving Sharpe Ratio, or a moving Sortino Ratio). The code below shows this and contains hopefully sufficient commentary.
I am using a time series of Hedge Fund data, with ca. 10 years worth of daily returns (which were checked to be stationary - not shown in the code). Unfortunately I haven't got the corresponding dates in the example so the x-axis in the plots would be 'no. of days'.
% start by importing the data you need - here it is a selection out of an
% excel spreadsheet
returnsHF = xlsread('HFRXIndices_Final.xlsx','EquityHedgeMarketNeutral','D1:D2742');
% two years to be used for the moving average. (250 business days in one year)
window = 500;
% create zero-matrices to fill with the MA values at each point in time.
mean_avg = zeros(length(returnsHF)-window+1,1);
st_dev = zeros(length(returnsHF)-window+1,1);
skew = zeros(length(returnsHF)-window+1,1);
kurt = zeros(length(returnsHF)-window+1,1);
% Now work through the time-series with each of the functions (one can add
% any other functions required), assigning the values to the zero-matrices
for count = window:length(returnsHF)
    % This is the trickiest part of the script: the indexing in this section.
    % TwoYearReturn is what is shifted along one period at a time by the
    % for-loop.
    TwoYearReturn = returnsHF(count-window+1:count);
    mean_avg(count-window+1) = mean(TwoYearReturn);
    st_dev(count-window+1) = std(TwoYearReturn);
    skew(count-window+1) = skewness(TwoYearReturn);
    kurt(count-window+1) = kurtosis(TwoYearReturn);
end
% Plot the MAs
subplot(4,1,1), plot(mean_avg)
title('2yr mean')
subplot(4,1,2), plot(st_dev)
title('2yr stdv')
subplot(4,1,3), plot(skew)
title('2yr skewness')
subplot(4,1,4), plot(kurt)
title('2yr kurtosis')

find indices in array, use indices as lookup, plot w/r/t time

I'm looking to find the n largest values in an array, then use the indices of those values as a lookup into another array representing time. But I am wondering how I can plot this if I want time to display as a continuous variable. Do I need to zero out data? That wouldn't be preferable for my use case, as I'm looking to save memory.
Let's say that I have array A, which is where I am looking for the max values. Then I have array T, which represents timestamps. I want my plot to display continuous time and plot() doesn't like arguments of differing size. How do most people deal with this?
Here's what I've got so far:
numtofind = 4;
A = m{:,10};
T = ((m{:,4} * 3600.0) + (m{:,5} * 60.0) + m{:,6});
[sorted, sortindex] = sort(A(:), 'descend');
maxvalues = sorted(1:numtofind);
maxindex = sortindex(1:numtofind);
corresponding_timestamps = T(maxindex);
%here i plot the max values against time/corresponding timestamps,
%but i want to place them in the right timestamp and display time as continuous
%rather than the filtered set:
plot(time_values, maxvalues);
When you say "time as continuous", do you mean you want time going from minimum to maximum? If so, you can just sort corresponding_timestamps and use that to reorder maxvalues. Even if you don't do that, you can still do plot(time_values, maxvalues, '.') to get a scatter plot which won't mess up your graph with lines.
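A minimal sketch of the sorting suggestion, using made-up column vectors in place of the table columns from the question. The names time_sorted, order and maxvalues_sorted are hypothetical.

```matlab
% Made-up data standing in for m{:,10} and the computed timestamps
A = [3; 9; 1; 7; 5; 8];
T = [10; 20; 30; 40; 50; 60];
numtofind = 4;

% Indices of the numtofind largest values
[~, sortindex] = sort(A, 'descend');
maxindex = sortindex(1:numtofind);

% Sort the matching timestamps and reorder the values the same way
[time_sorted, order] = sort(T(maxindex));
maxvalues_sorted = A(maxindex(order));

% Markers only, so the points are not joined by lines
plot(time_sorted, maxvalues_sorted, '.');
xlim([min(T) max(T)]);   % show the full time range on the axis
```

Setting the axis limits to the full range of T shows the selected points in their place on the complete time axis without padding the data with zeros.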

Looping over matrix elements more efficiently in Matlab

I am writing some MATLAB code and have written an algorithm that works, but I don't think it's particularly efficient. Since I am trying to improve my programming skills, I would like to know if there is a more efficient way of doing this.
I have a (reasonably large ~ E07) matrix of values which are unordered, but fall within the range [-100, 100]. I want to create a second matrix based on the first, by using the following rules:
If the value of the point is > 70, then the value of the point should be set to 70.
If the value of the point is < -70, then the value of the point should be set to -70.
All other values should be rounded to the nearest multiple of 5.
Here is what I am currently doing:
data = 100*(-1+2*rand(1,10000000)); % create random dataset for stackoverflow
new_data = zeros(1,length(data));
for i = 1:length(data)
if (data(i) > 70)
new_data(i) = 70;
elseif (data(i) < -70)
new_data(i) = -70;
else
new_data(i) = round(data(i)/5.0)*5.0;
end
end
Is there a more efficient method? I think there should be a way to do this using logical indexes but those are a new discovery for me...
You do not need a loop at all:
data = 100*(-1+2*rand(1,10000000)); % create random dataset for stackoverflow
new_data = zeros(1,length(data)); % note that this memory allocation is not necessary at this point
new_data = round(data/5.0)*5.0;
new_data(data>70) = 70;
new_data(data<-70) = -70;
Even easier is to use max and min to clamp the data, then round to the nearest multiple of 5. Do it in one simple line.
new_data = 5*round(max(-70,min(70,data))/5);
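As a quick check on a handful of made-up values, the clamp-then-round form can be compared with the logical-indexing form. To round to the nearest multiple of 5, the division by 5 has to happen inside round and the multiplication by 5 outside it. The names clamped, by_minmax and by_index are hypothetical.

```matlab
% A few made-up values covering both limits and the rounding cases
data = [-100 -72.4 -3 2.4 2.6 68 99];

% Clamp to [-70, 70], then round to the nearest multiple of 5
clamped   = max(-70, min(70, data));
by_minmax = 5 * round(clamped / 5);

% Same result using logical indexing
by_index = 5 * round(data / 5);
by_index(data > 70)  = 70;
by_index(data < -70) = -70;
```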
The two answers by H.Muster and woodchips are of course the way to do it, but there still are small improvements to be found. If you are after performance you might want to exploit specifics of your problem. For example, your output data is integers -100 <= x <= 100. This obviously qualifies for 8-bit signed integer data type. This code (note explicit cast to int8 from arbitrary double precision data)
% your double precision input data
data = 100*(-1+2*rand(1,10000000));
% cast to int8 - matlab does usual round here
data = int8(data);
new_data = 5*(max(-70,min(70,data))/5);
is the fastest for two reasons:
1 data element takes 1 byte, not 8. Memory bandwidth is a limiting factor here, so you get a lot of improvement
round is no longer necessary
Here are some timings from the codes of H.Muster, woodchips, and my small modification:
H.Muster Elapsed time is 0.235885 seconds.
woodchips Elapsed time is 0.167659 seconds.
my code Elapsed time is 0.023061 seconds.
The difference is quite striking. Although MATLAB uses doubles by default, you should try to use integer data types when possible.
Edit: This works because of how MATLAB implements integer arithmetic. Unlike in C, casting a double to an integer type rounds to the nearest integer instead of truncating:
a = 0.1;
int8(a)
ans =
0
a = 0.9;
int8(a)
ans =
1
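The same rounding behaviour applies to division of integer types, which is what makes the explicit round unnecessary in the int8 version. A small sketch with made-up values:

```matlab
% Made-up int8 values covering both limits and both rounding directions
data = int8([-100 -72 -3 2 3 68 99]);

% Clamp to [-70, 70]; the result stays int8
clamped = max(-70, min(70, data));

% Integer division rounds to the nearest integer, so 5*(x/5) is the
% nearest multiple of 5
new_data = 5 * (clamped / 5);
```

For example, int8(3)/5 evaluates to 1 (0.6 rounded) and int8(2)/5 evaluates to 0 (0.4 rounded).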