MATLAB how to filter timeseries minute bar data so as to calculate realised volatility? - matlab

I have a data set that looks like this:
'2014-01-07 22:20:00' [0.0016]
'2014-01-07 22:25:00' [0.0013]
'2014-01-07 22:30:00' [0.0017]
'2014-01-07 22:35:00' [0.0020]
'2014-01-07 22:40:00' [0.0019]
'2014-01-07 22:45:00' [0.0022]
'2014-01-07 22:50:00' [0.0019]
'2014-01-07 22:55:00' [0.0019]
'2014-01-07 23:00:00' [0.0021]
'2014-01-07 23:05:00' [0.0021]
'2014-01-07 23:10:00' [0.0026]
The first column is the time stamp, recorded every 5 minutes; the second column is the return.
For each day, I want to calculate the sum of squared 5-minute bar returns. Here I define a day as running from 5:00 pm to 5:00 pm (so date 2014-01-07 runs from 2014-01-06 17:00 to 2014-01-07 17:00). So for each day, I would sum the squared returns from 5:00 pm to 5:00 pm. The output will be something like:
'2014-01-07' [0.046]
'2014-01-08' [0.033]
How should I do this?

Here is an alternative solution.
First, define some random data:
t1 = datetime('2016-05-31 00:00:00','InputFormat','yyyy-MM-dd HH:mm:ss');
t2 = datetime('2016-06-05 00:00:00','InputFormat','yyyy-MM-dd HH:mm:ss');
Samples = 288; %because your sampling time is 5 mins
t = (t1:1/Samples:t2).';
X = rand(1,length(t));
Next, find the first sample that meets the given criterion. This can be anything; the code below uses 05:00, so change the threshold to 17 for the 5:00 pm boundary in your question.
n = find(t.Hour >= 5,1,'first')
b = n;
Find the total number of days after the given sample
totaldays = length(find(diff(t.Day)))
Then square and accumulate the returns for each day:
for i = 1:totaldays - 1
    sum_acc(i) = sum(X(b:b + (Samples - 1)).^2);
    b = b + Samples;
end
This is just for visualization of the data:
Dates = datetime(datestr(bsxfun(@plus, datenum(t(n)), 0:totaldays - 2)), 'Format', 'yyyy-MM-dd')
table(Dates,sum_acc.','VariableNames',{'Date' 'Sum'})
Date Sum
__________ ______
2016-05-31 93.898
2016-06-01 90.164
2016-06-02 90.039
2016-06-03 91.676

I assume that your dates are in a cell array and your values in a vector.
So for example you have:
date = {'2014-01-07 16:20:00','2014-01-07 22:25:00','2014-01-08 16:20:00','2014-01-08 22:25:00'};
value = [1 2 3 4];
You can find the sum for each date with:
% Create an index that separates each "day".
% datenum(0,0,0,7,0,0) is the 7-hour offset that moves 17:00 to midnight.
[~,~,ind] = unique(floor(cellfun(@datenum,date)+datenum(0,0,0,7,0,0)))
for i = 1:length(unique(ind))
    sumdate(i) = sum(value(ind==i).^2)
end
And you can find the corresponding day of each sum with:
datesum = cellstr(datestr(unique(floor(cellfun(@datenum,date)+datenum(0,0,0,7,0,0)))))
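For data already held as datetime values, the same 7-hour shift can be written without datenum. The following is a minimal sketch, not taken from either answer: the sample data and the names tradingDay, dayLabels, and sumSq are made up for illustration, and it assumes that a bar stamped exactly 17:00 belongs to the next trading day.

```matlab
% Hypothetical sample data: constant 5-minute returns over three calendar days.
t = (datetime(2014,1,6,0,0,0):minutes(5):datetime(2014,1,8,23,55,0)).';
r = 0.001 * ones(size(t));

% Shift by 7 hours so that 17:00 lands on midnight of the following day,
% then truncate to the start of the day to get the trading-day label.
tradingDay = dateshift(t + hours(7), 'start', 'day');

% Group by trading day and sum the squared returns within each group.
[G, dayLabels] = findgroups(tradingDay);
sumSq = splitapply(@(x) sum(x.^2), r, G);

result = table(dayLabels, sumSq, 'VariableNames', {'Date', 'SumSq'});
```

The first and last groups cover partial days here, so in practice you may want to drop them before using the sums.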

Related

How do i convert monthly data into quarterly data in matlab as part of a table

How do I convert monthly data into quarterly data?
For example:
my 1 table looks like this:
TB3m
0.08
0.07
0.06
0.12
0.13
0.14
my second table is table of dates:
dates
1975/01/31
1975/02/28
1975/03/31
1975/04/30
1975/05/31
1975/06/30
I want to convert table 1 such that it takes the average of 3 months to give me quarterly data.
It should ideally look like:
TB3M_quarterly
0.07
0.13
so that it can match my other quarterly dates table which looks like:
dates_quarterly
1975/03/31
1975/06/30
Overall, my data runs from January 1975 to June 2021, which would give me around 186 quarterly values. Please suggest what I can use. It is the first time I am using MATLAB.
In case you can't understand a command, search for it in the MATLAB documentation. One quick way to do so on Windows is to click on the function you don't understand and then press F1. For example, if you can't understand what calmonths is doing, click on calmonths and press F1.
TBm = [0.08; 0.07; 0.06; 0.12; 0.13; 0.14]; %values
% months
t1 = datetime(1975, 01, 31); %1st month
% datetime is a datatype to represent time
t2 = datetime(1975, 06, 30); %change this as your last month
dates = t1:calmonths(1):t2;
dates = dates'; %row vector to column vector
TBm_reshaped = reshape(TBm, 3, []);
TB3m_quarterly = mean(TBm_reshaped);
dates_quarterly = dates(3:3:end);
TB3m_quarterly = TB3m_quarterly';
T = table(dates_quarterly, TB3m_quarterly)
I suggest you organize your data in a timetable, and then use the function retime()
% Replicating your data
dates = datetime(1975,1,31) : calmonths(1) : datetime(1975,6,30);
T = timetable([8 7 6 12 13 14]'./100, 'RowTimes', dates);
% Using retime to get quarterly values:
T_quarterly = retime(T, 'quarterly', 'mean')
Here you aggregate by taking the mean of the monthly data. For other aggregation methods, look at the documentation for retime()
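If the series has gaps, or its length is not a multiple of three, the reshape approach fails. One alternative is to group by calendar year and quarter. This is a rough sketch under the assumption that the dates are datetime values; the names G, yr, and q are illustrative.

```matlab
% Monthly values and month-end dates, as in the question.
TB3m  = [0.08; 0.07; 0.06; 0.12; 0.13; 0.14];
dates = datetime(1975,1,31) + calmonths(0:5).';

% One group per (year, quarter) pair; yr and q identify each group.
[G, yr, q] = findgroups(year(dates), quarter(dates));

% Average the monthly values within each quarter.
TB3m_quarterly = splitapply(@mean, TB3m, G);

T = table(yr, q, TB3m_quarterly);
```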

Convert milliseconds into hours and plot

I'm trying to convert an array of milliseconds and its respective data. However I want to do so in hours and minutes.
Millis = [60000 120000 180000 240000....]
Power = [ 12 14 12 13 14 ...]
I've set it up so the data records every minute, hence the 60000 millis (= 1 minute). I am trying to plot time on the x axis and power on the y axis. I would like the x axis displayed in hours and minutes, with each power value plotted against its corresponding time.
I've tried this
for i=2:length(Millis)
Conv2Min(i) = Millis(i) / 60000;
Time(i) = startTime + Conv2Min(i);
if (Time(i-1) > Time(i) + 60)
Time(i) + 100;
end
end
s = num2str(Time);
This is an attempt to turn the milliseconds into hours, starting at 08:00 and rolling over to 09:00 once 60 minutes have passed. The problem is plotting this: I get a gap between 08:59 and 09:00. I also cannot keep the leading zero.
In this scenario it is preferable to work with datenum values and then use datetick to set the format of the tick labels of your plot to 'HH:MM'.
Let's suppose that you started taking measurements at t_1 = [HH_1, MM_1] and stopped taking measurements at t_2 = [HH_2, MM_2].
A cool trick to generate the array of datenum values is to use the following expression:
time_datenums = HH_1/24 + MM_1/1440 : 1/1440 : HH_2/24 + MM_2/1440;
Explanation:
We are creating a regularly-spaced vector time_datenums = A:B:C using the colon (:) operator, where A is the starting datenum value, B is the increment between datenum values and C is the ending datenum value.
Since your measurements have been taken every minute (60000 milliseconds), then the increment between datenum values should be of 1 minute too. As a day has 24 hours, that makes 1440 minutes a day, so use B = 1/1440 as the increment between vector elements, to get 1 minute increments.
For A and C we simply need to divide the hour digits by 24 and the minute digits by 1440 and sum them up like this:
A = HH_1/24 + MM_1/1440
C = HH_2/24 + MM_2/1440
So for example, if t_1 = [08, 00], then A = 08/24 + 00/1440. As simple as that.
Notice that this procedure doesn't use the datenum function at all, yet it still generates a valid array of datenum values from the time of day alone, without needing to bother about the date.
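The fraction-of-day arithmetic above can be checked quickly with datestr. This is only an illustrative sketch; the 08:30 start time and the variable names firstLabel and nextLabel are made up.

```matlab
A = 8/24 + 30/1440;   % 08:30 expressed as a fraction of a day
B = 1/1440;           % one minute expressed as a fraction of a day

% Format the fractional day numbers as clock times.
firstLabel = datestr(A, 'HH:MM');
nextLabel  = datestr(A + B, 'HH:MM');
```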
Going back to your original problem, let's have a look at the code:
time_millisec = 0:60000:9e6; % Time array in milliseconds.
power = 10*rand(size(time_millisec)); % Random power data.
% Elapsed time in milliseconds.
elapsed_millisec = time_millisec(end) - time_millisec(1);
% Integer part of elapsed hours.
elapsed_hours_int = fix(elapsed_millisec/(1000*60*60));
% Fractional part of elapsed hours.
elapsed_hours_frac = (elapsed_millisec/(1000*60*60)) - elapsed_hours_int;
t_1 = [08, 00]; % Start time 08:00
t_2 = [t_1(1) + elapsed_hours_int, t_1(2) + elapsed_hours_frac*60]; % Compute End time.
HH_1 = t_1(1); % Hour digits of t_1
MM_1 = t_1(2); % Minute digits of t_1
HH_2 = t_2(1); % Hour digits of t_2
MM_2 = t_2(2); % Minute digits of t_2
time_datenums = HH_1/24+MM_1/1440:1/1440:HH_2/24+MM_2/1440; % Array of datenums.
plot(time_datenums, power); % Plot data.
datetick('x', 'HH:MM'); % Set 'HH:MM' datetick format for the x axis.
I would use datenums:
Millis = [60000 120000 180000 240000 360000];
Power = [ 12 14 12 13 14 ];
d = [2017 05 01 08 00 00]; %starting point (y,m,d,h,m,s)
d = repmat(d,[length(Millis),1]);
d(:,6)=Millis/1000; %add them as seconds
D=datenum(d); %convert to datenums
plot(D,Power) %plot
datetick('x','HH:MM') %set the x-axis to datenums with HH:MM as format
An even shorter approach would be (thanks to codeaviator for the idea):
Millis = [60000 120000 180000 240000 360000];
Power = [ 12 14 12 13 14 ];
D = 8/24+Millis/86400000; %24h / day, 86400000ms / day
plot(D,Power) %plot
datetick('x','HH:MM') %set the x-axis to datenums with HH:MM as format
I guess, there is an easier way using datetick and datenum, but I couldn't figure it out. This should solve your problem for now:
Millis=6e4:6e4:6e6;
power=randi([5 15],1,numel(Millis));
hours=floor(Millis/(6e4*60))+8; minutes=mod(Millis,(6e4*60))/6e4; % Calculate the hours and minutes of your Millisecond vector.
plot(Millis,power)
xlabels=arrayfun(@(x,y) sprintf('%02d:%02d',x,y),hours,minutes,'UniformOutput',0); % Create zero-padded time strings of the format HH:MM for your XTickLabels
tickDist=10; % define how often you want your XTicks (e.g. 1 if you want the ticks every minute)
set(gca,'XTick',Millis(tickDist:tickDist:end),'XTickLabel',xlabels(tickDist:tickDist:end))
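On releases that support duration arrays on plot axes, the conversion can also be done without datenum or manual tick labels. The following is a sketch under that assumption, using the sample values from the answers above; startTime and t are illustrative names.

```matlab
Millis = [60000 120000 180000 240000 360000];
Power  = [12 14 12 13 14];

startTime = duration(8, 0, 0);             % measurements begin at 08:00
t = startTime + milliseconds(Millis);      % elapsed time as a duration array

plot(t, Power)                             % duration values go straight on the x axis
xtickformat('hh:mm')                       % show tick labels as hours and minutes
```

Because the x values stay numeric durations, there is no artificial gap between 08:59 and 09:00.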

Count days in MATLAB

I need to count X days starting on some specific day, but I don't know how to add these days according to the number of days in each month (e.g. February having 28 or 29, March 31 but April 30...).
I know there is the function daysact( startDate, endDate ), but this way I have to try dates until I reach the result I want, and what I need is for the program to count X days from the startDate and return the endDate. For example, if I want to count 90 days from tomorrow, I have done:
startDate = '8-jan-2016';
endDate = '7-apr-2016';
numDays = daysact( startDate, endDate );
but I have had to try dates until the function returned exactly 90 (I know it's fairly simple, but the final program will have to do this for different values of days and different starting dates...)
2014b and later:
datetime(2016,1,8) + days(90)
datetime(2016,1,8) + 90
datetime('today') + days(90)
datetime('today') + 90
datetime('tomorrow') + days(90)
datetime('tomorrow') + 90
startDate = '8-jan-2016';
NumDays = 90;
tmp = datevec(startDate);
tmp(3) = tmp(3) + NumDays;
endDate = datestr(tmp)
This uses datevec to transform your string into a vector of [Y M D HH MM SS], so by adding the desired number of days, 90, to the third element you add 90 days to the start date. Then transform the vector back using datestr, which makes it a recognisable date again.
As per @excaza's comment, datenum does approximately the same thing, returning the number of days since 0 January 0000, so the same could be accomplished using:
startDate = '8-jan-2016';
NumDays = 90;
endDate = datestr(datenum(startDate)+NumDays);
which is a bit more concise, at the cost of having to convert everything to fractions of days.
One-lining it because why not:
endDate = datestr(datenum('8-jan-2016')+90);
I wanted to post this as an alternative. It was mentioned by @excaza in a comment to @Adriaan's answer.
myDate = datenum(2016,1,8); % datenumber for january 8 2016
newDate = myDate + 90; % add 90 days
newDate2 = myDate + 2/24; % add 2 hours
newDate3 = myDate + 93/1440; % add 93 minutes
Then you can print the dates with datestr.
datestr(newDate)
ans =
07-Apr-2016
>> datestr(newDate2)
ans =
08-Jan-2016 02:00:00
>> datestr(newDate3)
ans =
08-Jan-2016 01:33:00
datenum can take many inputs and even user defined formats to read the date strings.
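As a quick cross-check, the three approaches from the answers above can be run side by side. This is only a sketch; the names viaDatevec, viaDatenum, and viaDatetime are made up for illustration.

```matlab
startDate = '8-jan-2016';
NumDays   = 90;

% datevec route: push the day element past the end of the month and
% let datestr normalise it back into a valid calendar date.
v = datevec(startDate);
v(3) = v(3) + NumDays;
viaDatevec = datestr(v);

% datenum route: serial date numbers count days, so just add.
viaDatenum = datestr(datenum(startDate) + NumDays);

% datetime route (R2014b and later).
viaDatetime = datetime(2016, 1, 8) + days(NumDays);
```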

Matlab: count and sort data results in an inverse cumulative order

I have a script that cumulates my data and plots it afterwards. In my case my data are temperatures and the plots show the number of hours a year in which these temperatures and every temperature below are reached.
For example:
in 7500 hours a year it is 25 degree and colder
in 6000 hours a year it is 20 degree and colder
I get the result that I need using the MATLAB script below:
filenameTRY2035='TZ10.dat';
daten = dlmread(filenameTRY2035);
TZ10 = sort(daten(1:length(daten)));
A = (1:length(TZ10))'; % number of hours at or below each sorted temperature
% plot
figure(1)
clf(1)
hold on;
h1 = plot(TZ10,A);
Now I want the temperatures counted the other way around.
For example:
in 1000 hours a year it is 25 degrees and hotter
in 3500 hours a year it is 20 degrees and hotter
Could anyone help me modify my script in the way that I get the plots I need?
Thanks a lot,
Cheyenne
So let's say you have
TZ10 =
.... 7000 7300 7500 ....
7500 -> 25° or colder
7300 -> 24° or colder
7000 -> 23° or colder
...
And there are 8766 hours in a year.
Then the reversed order would be
l = length(TZ10);
TZ10_reverse(l) = 8766 - TZ10(1)
for temp = 2:l
TZ10_reverse(l - temp + 1) = (8766 - TZ10(temp)) + (TZ10(temp) - TZ10(temp - 1));
end
This works because if there are 8766 hours in a year and 7500 of them are at 25° or colder, then 8766 - 7500 hours are strictly warmer than 25°, and TZ10(25) - TZ10(24) hours are at exactly 25°.
I also arranged the loop so that the result comes out sorted!
By the way....
TZ10 = sort(daten(1:length(daten)));
is equivalent to
TZ10 = sort(daten);
For a vector, the elements of daten from 1 to the maximum index of daten are basically daten itself!
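If daten holds the raw hourly temperatures (as the script in the question suggests), the "and hotter" count can also be read directly off the sorted vector by counting from the top instead of the bottom. The following is a sketch with made-up sample temperatures; hoursAtOrBelow and hoursAtOrAbove are illustrative names.

```matlab
temps = [18 25 20 22 25 19 30 21];     % hypothetical hourly temperatures

sortedT = sort(temps(:));              % ascending order, as a column
n = numel(sortedT);

hoursAtOrBelow = (1:n).';              % original cumulative count
hoursAtOrAbove = (n:-1:1).';           % reversed count: this value and hotter

% Example lookup: hours at 25 degrees or hotter. With repeated values,
% the first occurrence in the sorted vector carries the full count.
k = find(sortedT == 25, 1, 'first');
hoursAt25OrHotter = hoursAtOrAbove(k);
```

Plotting sortedT against hoursAtOrAbove then gives the reversed curve.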

Calculate week numbers in matlab

I have an annual time series where measurements are recorded at hourly intervals:
StartDate = '2011-01-01 00:00';
EndDate = '2011-12-31 23:00';
DateTime=datevec(datenum(StartDate,'yyyy-mm-dd HH:MM'):60/(60*24):...
datenum(EndDate,'yyyy-mm-dd HH:MM'));
dat = 2+(20-2).*rand(size(DateTime,1),1);
I would like to calculate the mean 24-hour cycle for each week of the year, i.e. for week 1 (day of year 1 to 7) I want to calculate the average for 00:00, 01:00, and so on, so that I eventually end up with 52 24-hour series, one for each week of the year. MATLAB does have a function called 'weeknum' which returns the week number from a given serial date number; however, this function is in the Financial Toolbox. Can anyone suggest an alternative method for finding the week number?
Maybe this can be of help (I'm using the current date and time as an example):
c = datevec(datestr(now));
week_num = ceil(diff(datenum([c(1), 1, 1, 0, 0, 0; c])) / 7)
I'm not sure whether this solution handles edge cases properly, but I think it's a good place to start.
You can also verify it with this website that tells the current week number.
Applying this to your example can be done, for example, like so:
weeknum = @(v)ceil(diff(datenum([v(1), 1, 1, 0, 0, 0; v(:)'])) / 7);
arrayfun(@(n)weeknum(DateTime(n, :)), 1:size(DateTime, 1))'
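Once a week index is available, the mean 24-hour cycle per week asked for in the question can be computed with accumarray. This is a sketch under a few assumptions: weeks are simple 7-day blocks counted from 1 January, day 365 is folded into week 52, and the variable names wk, hr, and meanCycle are made up.

```matlab
% Hourly time stamps for 2011 and random data, as in the question.
t   = (datetime(2011,1,1,0,0,0):hours(1):datetime(2011,12,31,23,0,0)).';
dat = 2 + (20 - 2) .* rand(numel(t), 1);

% Week index from the day of year (days 1-7 -> week 1, and so on).
wk = min(ceil(day(t, 'dayofyear') / 7), 52);

% Hour-of-day index, shifted to 1..24 so it can be used as a subscript.
hr = hour(t) + 1;

% Row w, column h holds the mean of all values in week w at hour h-1.
meanCycle = accumarray([wk hr], dat, [52 24], @mean);
```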
According to Wikipedia, ISO week number is calculated like this.
I did it for today as an example.
offsetNotLeap = [0 31 59 90 120 151 181 212 243 273 304 334];
offsetLeap = [0 31 60 91 121 152 182 213 244 274 305 335];
todayVector = datevec(today);
todayNum = today;
ordinalDay = offsetLeap(todayVector(2)) + todayVector(3);
dayOfTheWeek = weekday(todayNum);
weekNumber = fix((ordinalDay - dayOfTheWeek + 10)/ 7)
weekNumber =
51
I didn't do the checks for 0 and 53 cases.
Also note that MATLAB's weekday function gives Sunday's index as 1, so if you want to make Monday's index 1, you need to do some adjustments. ISO says that Monday should be 1.
Just a simple weekday function with Monday as the first day of the week:
function [ daynumber,dayname ] = weekd( date )
%WEEKD Returns the day number and day name based on the input date
% First day of week
% -----------------
% The first day of the week is Monday.
%
% Parameter
% ---------
% date = Input date -> format: dd.mm.yyyy / Example: 21.11.2015
[dn,dayname] = weekday(datenum(date,'dd.mm.yyyy'));
if dn == 1
    daynumber = 7;
elseif dn >= 2 && dn <= 7
    daynumber = dn - 1;
else
    error('Invalid weekday number.');
end
end
Although some time has passed since you posted your question, I want to give an additional answer in case other people have the same issue as me. As HebeleHododo already indicated, there is a rule posted on Wikipedia for computing the ISO week number. However, I wasn't able to find a generic code snippet for it.
Therefore, I developed the following generic method, which is able to take vectors, matrices or single values of arbitrary datetimes with arbitrary reference years. I would appreciate any constructive feedback. I successfully ran it on MATLAB R2017a and tested it further for some edge cases.
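function week_num = get_week_num(dtimes)
    dnums = datenum(dtimes);
    firsts = datenum(year(dtimes), 1, 1);
    ordinal = floor(dnums - firsts) + 1;   % ordinal day of the year, 1 January = 1
    wday = weekday(dtimes) - 1;            % MATLAB has Sunday = 1; shift so Monday = 1
    wday(wday == 0) = 7;                   % ISO: Sunday = 7
    week_num = floor((ordinal - wday + 10) / 7);   % results of 0 or 53 at year boundaries need the additional ISO check
end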
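The year-boundary cases (a result of 0, or a 53 in a year with only 52 weeks) can be handled on top of the same formula. The following is a sketch, not the original author's code: p and weeksInYear are helper names introduced here, and the 52/53-week rule follows my reading of the ISO week date description on Wikipedia, so treat it as an assumption to verify.

```matlab
% Weeks in an ISO year: 53 if the year ends on a Thursday or the
% previous year ends on a Wednesday, otherwise 52.
p = @(y) mod(y + floor(y/4) - floor(y/100) + floor(y/400), 7);
weeksInYear = @(y) 52 + (p(y) == 4 | p(y - 1) == 3);

% A few dates around year boundaries.
d = datetime([2015 2016 2018 2021], [1 1 12 1], [1 1 31 3]).';

y       = year(d);
ordinal = day(d, 'dayofyear');           % 1 January = 1
wd      = mod(weekday(d) - 2, 7) + 1;    % Monday = 1 ... Sunday = 7

wRaw = floor((ordinal - wd + 10) / 7);
low  = wRaw < 1;                         % belongs to the last week of the previous year
high = wRaw > weeksInYear(y);            % belongs to week 1 of the next year

w = wRaw;
w(low)  = weeksInYear(y(low) - 1);
w(high) = 1;
```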