How to fill the last observations with retime in MATLAB?

I am interpolating variables from quarterly to monthly frequency in MATLAB. However, when I use retime, the result does not reach the end of the sample: it stops 2 months early.
Let me give you an example:
T = datetime(2002,01,01):calquarters:datetime(2019,12,01);
TT = timetable(T', randn(72,1))
x = retime(TT, 'monthly', 'spline') % interpolate
As you can see it gives me back 214 observations rather than 216, November and December 2019 are missing.
How can I fix it?
Thanks!

I don't have enough reputation to add a comment, but TT has 72 quarters rather than 73, which means you are actually storing dates from 1 January 2002 to 1 October 2019: the next quarter would start on 1 January 2020, which lies past your end date and so is not in the original array (you can check this by printing TT).
If this is the case, retime has nothing to interpolate between for the missing months, since they lie after the last row time in TT (that is, retime cannot interpolate from October to January because TT has no January 2020 entry).
Replacing datetime(2019,12,01) with datetime(2020,01,01), and randn(72,1) with randn(73,1), might solve your issue. The monthly result then runs through January 2020, so drop the last row if you only need data up to December 2019.
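A minimal sketch of that fix, assuming you can supply a value for the quarter starting 1 January 2020 (here it is just another random number, as in the question). The variable names follow the question; the final trimming step is my own addition.

```matlab
% Quarterly row times from Jan 2002 through Jan 2020 (73 quarter starts)
T  = (datetime(2002,1,1):calquarters(1):datetime(2020,1,1))';
TT = timetable(T, randn(numel(T),1));      % placeholder data, as in the question

% Monthly spline interpolation now covers Jan 2002 .. Jan 2020
x = retime(TT, 'monthly', 'spline');

% Drop the extra Jan 2020 row so the sample ends in Dec 2019
x = x(x.Properties.RowTimes <= datetime(2019,12,1), :);
```

After trimming, x should hold one row per month from January 2002 to December 2019.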

Related

UTC to GPS time for finding TOW in Simulink

For my project, I need to calculate TOW (time of week) in Simulink. I know this can be achieved by converting UTC time to GPS time.
I have written a simple m-file that does this for me in Matlab, as follows:
date_gps_int = 10000*y + 100*m + d
date_gps_str = int2str(date_gps_int)
date_gps_str_to_serial = datenum(date_gps_str,'yyyymmdd')
date_str_format = datestr(date_gps_str_to_serial,'dd-mmmm-yyyy')
Num_Days = daysact('06-jan-1980',date_str_format)
Num_Weeks = Num_Days/7
TOW = Num_Weeks - 1024
My first intention was to use this as a function in simulink. But apparently because of 'datenum' and 'datestr' it is not possible, since simulink does not handle strings.
Now I am wondering if anyone can help me with this issue. Is there any way to calculate TOW from the UTC date in Matlab without using those predefined functions?
I also tried to write an algorithm for calculating number of days since '6 January 1980' and then calculating number of weeks by dividing that by 7. But since I am not very familiar with leap year calculation and I don't really know the formula for these kinds of calculations, my result differs from real TOW.
I would appreciate if anybody can help me on this.
There are three formats handled by Matlab for time: formatted date strings - what datestr outputs -, serial date - scalar double, what datenum outputs - and date vectors (see datevec). Conversion functions work with these three, and the most convenient way to convert individual variables (year, month, etc) to a date is to build a date vector [yyyy mm dd HH MM SS].
date_gps_str_to_serial = datenum([y m d 0 0 0]); % midnight on day y-m-d
date_Jan_6_1980 = datenum([1980 01 06 0 0 0]); % midnight on Jan 6th, 1980
Num_Days = date_gps_str_to_serial - date_Jan_6_1980;
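If datenum itself is not available where you need it, the same day count can be obtained with integer arithmetic only. The sketch below uses the standard Julian Day Number formula for Gregorian dates, which handles leap years implicitly; the example date (15 March 2019) is hypothetical, and whether your Simulink setup accepts this code as-is is an assumption you would need to check.

```matlab
% Day count via the Julian Day Number formula: plain integer arithmetic only.
% Element 1 is the GPS epoch, element 2 is a hypothetical example date.
y = [1980 2019];
m = [1    3];
d = [6    15];

a  = floor((14 - m) / 12);      % 1 for Jan/Feb, 0 otherwise
yy = y + 4800 - a;              % shifted year, so that the year starts in March
mm = m + 12*a - 3;              % month index: March = 0 ... February = 11
jdn = d + floor((153*mm + 2) / 5) + 365*yy ...
        + floor(yy/4) - floor(yy/100) + floor(yy/400) - 32045;

Num_Days = jdn(2) - jdn(1);     % days between Jan 6th 1980 and the example date
```

The result should agree with the datenum difference above for the same date.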
Now, beware of leap seconds...
GPS time is computed from the time elapsed since Jan 6th 1980. Take the number of seconds elapsed since that day, as measured by the satellites' atomic clocks, and divide by (24*3600) to get a number of days; the remainder is the time of day (in seconds since midnight).
But, once in a while, the International Earth Rotation and Reference Systems Service will decide that a day will last one second longer to accommodate the slowing of Earth's rotation. It may happen twice a year, on June 30th or December 31st. GPS time does not take into account that some UTC days last 86401 seconds (so dividing by 24*3600 does not give the UTC day), and it advances by 1 second with respect to UTC each time this happens. There have been 18 such leap seconds since Jan 6th 1980, so one should subtract 18 seconds from GPS time to find UTC time. The next time a leap second may be added is June 2019.
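Putting the pieces together, here is a rough sketch of how the GPS week number and the time of week (in seconds) could be derived from a UTC date and time. The input time is a hypothetical example, and the fixed 18-second offset is an assumption that only holds for dates after the most recent leap second mentioned above; earlier dates need a smaller offset.

```matlab
% Hypothetical UTC input: 15 March 2019, 12:00:00
y = 2019; m = 3; d = 15; HH = 12; MM = 0; SS = 0;

leap_seconds = 18;   % GPS minus UTC; assumed constant, valid only for recent dates

% Whole days since the GPS epoch (midnight, Jan 6th 1980)
Num_Days = datenum(y, m, d) - datenum(1980, 1, 6);

% Seconds since the GPS epoch, shifted from UTC to the GPS time scale
sec_of_day  = HH*3600 + MM*60 + SS;
gps_seconds = Num_Days*86400 + sec_of_day + leap_seconds;

% Week number (without the 1024-week rollover) and time of week in seconds
sec_per_week = 7*86400;
gps_week = floor(gps_seconds / sec_per_week);
TOW      = gps_seconds - gps_week*sec_per_week;
```

If you need the 10-bit week number broadcast by the satellites, take mod(gps_week, 1024).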

How to calculate averages for a 3D matrix for the same day of data over multiple years in Matlab?

I am interested in calculating GPH anomalies in Matlab. I have a 3D matrix of lat, lon, and data. Where the data (3rd dimension) is a daily GPH value spaced in one-day increments for 32 years (from Jan. 1st 1979 to Jan. 1st 2011). The matrix is 95x38x11689. How do I compute a daily average across all years for each day of data, when the matrix is 3D?
In other words, how do I compute the average of Jan. 1st dates for all years to compute the climatological mean of all Jan. 1st's from 1979-2010 (where I don't have time information, but a GPH value for each day)? And so forth for each day after. The data also includes leap years. How do I handle that?
Example: Sort, and average all Jan. 1st GPH values for indices 1, 365, 730, etc. And for each day of all years after that in the same manner.
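First, let's take out all the Feb 29ths, because these days sit in the middle of the data and do not appear in every year, which would throw off the averaging:
Feb29=60+365*(1:4:29)+(0:7); % Feb 29th of 1980, 1984, ..., 2008; the (0:7) term accounts for the earlier leap days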
mean_Feb29=mean(GPH(:,:,Feb29),3); % A matrix of 95x38 with the mean of all February 29th
GPH(:,:,Feb29)=[]; % omit Feb 29th from the data
Last_Jan_1=GPH(:,:,end); % for Jan 1st you have additional data set, of year 2011
GPH(:,:,end)=[]; % omit Jan 1st 2011
re_GPH=reshape(GPH,95,38,365,[]);
av_GPH=mean(re_GPH,4);
Now av_GPH is a matrix of 95x38x365, where each slice in the 3rd dimension is the average of one day of the year, starting with Jan 1st.
If you want to include the last Jan 1st (Jan 1st 2011), run this line after the previous code:
av_GPH(:,:,1)=(32*av_GPH(:,:,1)+Last_Jan_1)/33; % weighted so that all 33 Jan 1st values count equally
To make it easy to see which slice number corresponds to which date, you can make an array of all the dates in the year:
t1 = datetime(2011,1,1,'format','MMMMd')
t2 = datetime(2011,12,31,'format','MMMMd')
t3=t1:t2;
Now, for example :
t3(156)=
datetime
June5
So av_GPH(:,:,156) is the average of June 5th.
For your comment, if you want to subtract each day from its average:
sub_GPH=GPH-repmat(av_GPH,1,1,32);
And for February 29th, you will need to do that BEFORE you erase them from the data (line 3 up there):
sub_GPH_Feb_29=GPH(:,:,Feb29)-repmat(mean_Feb29,1,1,8);
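As an alternative, here is a sketch that avoids deleting slices by grouping on a month-day key. It assumes the time axis can be rebuilt as consecutive daily dates from Jan 1st 1979 to Jan 1st 2011; the small random GPH array is only a stand-in for the real 95x38x11689 data, and the names md, clim and anom are my own.

```matlab
% Rebuild the (assumed) time axis: 11689 consecutive days
t = datetime(1979,1,1):caldays(1):datetime(2011,1,1);

GPH = rand(4, 3, numel(t));        % small stand-in for the real 95x38x11689 array

% Month-day key, e.g. 101 for Jan 1st and 229 for Feb 29th
md = month(t)*100 + day(t);
[keys, ~, idx] = unique(md);       % 366 distinct calendar days

% Climatological mean for each calendar day
clim = zeros(size(GPH,1), size(GPH,2), numel(keys));
for k = 1:numel(keys)
    clim(:,:,k) = mean(GPH(:,:,idx == k), 3);
end

% Anomaly: each day minus the mean of its calendar day
anom = GPH - clim(:,:,idx);
```

With this grouping, Feb 29th is averaged over the 8 leap years and Jan 1st over all 33 occurrences, without special cases.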

datenum series to the end of February - Leap year or not

I want to create a list of dates that go until the end of February. However, since the end of February changes from 28 to 29 depending on whether there's a leap year, I'm having trouble with how to consider both options.
Here's what I have so far:
date = datenum(years(i),12,01):1:datenum(years(i)+1,02,29);
This case, when run on a year that is not a leap year, ends up counting March 1st instead of ending on Feb. 28th.
Here's a little hack I came up with. You can check whether a year is a leap year quite easily by calculating the number of days between February 28 and March 1, like so:
datenum(years(i)+1, 3, 1) - datenum(years(i)+1, 2, 28)
If the result is larger than 1, that February falls in a leap year. Since MATLAB represents the outcome of a logical comparison as 1 or 0, the comparison itself is exactly the number of days you need to add to Feb 28: 0 if not a leap year, 1 if a leap year. Here, therefore, is the full hack:
date = datenum(years(i),12,01):datenum(years(i)+1,02, ...
28 + ((datenum(years(i)+1,3,1) - datenum(years(i)+1,2,28))>1) );
UPDATE / IMPROVEMENT:
Answer already accepted, but I came up with an even better solution. I didn't realize that datenum simply counts days. In this case, we can simply say that the last day of February is the day before March 1. This yields the following drastic simplification:
date = datenum(years(i),12,01):1:(datenum(years(i)+1,3,1)-1);
Datenum, for good or ill, takes negative and zero numbers. So the last day of February can be written:
datenum(2015, 3, 0)
With a comment explaining this madness, of course.
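A short sketch of the day-zero trick applied to the original problem. The start years are hypothetical, and I use yrs rather than years to avoid shadowing the MATLAB function of that name.

```matlab
yrs = [2014 2015];   % hypothetical start years: Feb 2015 has 28 days, Feb 2016 has 29

for i = 1:numel(yrs)
    % Day 0 of March is the last day of February, leap year or not
    date = datenum(yrs(i),12,1):datenum(yrs(i)+1,3,0);
    fprintf('%s to %s (%d days)\n', ...
        datestr(date(1),'yyyy-mm-dd'), datestr(date(end),'yyyy-mm-dd'), numel(date));
end
```

The series ends on Feb 28th for the first start year and on Feb 29th for the second.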

Bins in Stata that will work in cem

In Stata is it possible (using the cem command) to create overlapping bins? For example, if a record in my treatment has DATE January 1, 2012, I want a match to be 'true' if a control record's DATE is within 2 days in either direction. I tried coding the bins manually with the treatment dates in the middle but since I have thousands of dates this is taking too long.
Using the above example control cases that would match could have dates December 30, 2011; December 31, 2011; January 1, 2012; January 2, 2012; or January 3, 2012.
You say:
I want a match to be 'true' if a control record's DATE is within 2 days
in either direction.
I have not checked the inner workings of the user-written command cem, but the variable cem_matched
(created after running cem) denotes whether an observation is matched or not and it
seems to depend on the observation belonging to a stratum in which there are
control and treatment observations. If a stratum has both control and treated
observations, they are all considered matched and cem_matched = 1. If not,
then all observations in the stratum have cem_matched = 0. So I do not see very
well how you want to modify this variable using as reference another.
Maybe you want to create the strata using the DATE variable. I'm no expert,
but to my knowledge, an observation must belong exclusively to one stratum or
another (this seems true for cem, at least). Overlapping bins violate this.
Your rule implies observations that could be to the right and left of a
certain cutpoint. From help cem:
. cem age (10 20 30 40 50) education (scott) re74, treatment(treated)
will coarsen the first variable, age into bins of (0-10), (10-20), (20-30), (30-40), (40-50) and (50+).
As you see, non-overlapping bins. What would it do if some overlapped? Where
would it assign the observation, to the bin on the left or to the right?
Some other criteria would be needed.
Maybe you want to discard (or flag) some observations per stratum based on the
DATE variable, after you run cem with other confounding covariates?
I'm not sure. Recall however that date variables in Stata can be computed on. See for example: http://www.ats.ucla.edu/stat/stata/modules/dates.htm
Note: cem is made available running ssc install cem.

MATLAB Change numbers to date

I have time set up as serial dates. Each number corresponds to a day, in order, from 20100101 to 20130611. How do I convert the serial date to a date in the format month-year? I need this because I want to plot data and need the x axis to show the date.
Thanks!
The first step is to convert your date format into one of the standard Matlab date formats. The best format to use for plots is the "serial date format". The numbers themselves are a bit awkward, since they represent the "amount of time after 0/0/0000, in days", which makes them very large. Also, that date never actually existed, which makes things really weird when you want to work with dates that are BC.
However, the conversion is easy, since your format also counts the days, but you count after 31st of December, 2009. You can convert this using
numeric_date_vec = datenum(2009, 12, 31) + x;
You then plot your data using
plot(numeric_date_vec, y)
and you let Matlab add the date-ticks automatically by calling
datetick('mmm yyyy')
The problem is, the ticks do not update after zooming in. You can either call
datetick('mmm yyyy','keeplimits')
again, after each zooming or panning, or you download datetickzoom from the Matlab file exchange. It takes the same arguments as datetick, but it hooks into the zoom function and updates the ticks automatically.
Edit:
Sometimes, the dateticks are not spaced in any sensible way, then you can either try to zoom in and out a little until it snaps to something good, or you have to set the ticks manually:
% Set ticks to first day of the months in 2012
tick_locations = datenum(2012,[1:12],1);
% Set ticks on x-axis
set(gca, 'XTick', tick_locations)
% Call datetick again to get the right date labels, use option "keepticks"
datetick('mmm yyyy','keeplimits', 'keepticks')
You might have to modify the tick_locations = datenum(2012,[1:12],1) a bit to get the ticks that you want. For instance, you can use
tick_locations = datenum(2012,[1:2:25],1)
to get every second month between Jan 2012 and Jan 2014 (month numbers above 12 roll over into the following years).
For day number n use
datestr(datenum(2009, 12, 31) + n, 'yyyy-mm')
for example
>> datestr(datenum(2009, 12, 31)+365, 'yyyy-mm')
ans =
2010-12
>> datestr(datenum(2009, 12, 31)+366, 'yyyy-mm')
ans =
2011-01
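If your MATLAB version has datetime (and xtickformat, which I believe needs R2016b or later), a possible alternative is to convert the day counter to datetime values and let plot handle the date axis. This is only a sketch: the data y is made up, and n is assumed to count days with 1 meaning 1 January 2010, as in the question.

```matlab
n = (1:1258)';                            % day counter: 1 = 2010-01-01, 1258 = 2013-06-11
d = datetime(2009,12,31) + caldays(n);    % convert the counter to calendar dates

y = cumsum(randn(size(n)));               % hypothetical data to plot

plot(d, y)                                % datetime x-values give a date axis automatically
xtickformat('MMM yyyy')                   % month-year tick labels
```

With a datetime axis, the tick labels should also update on zooming without a further datetick call.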