MATLAB: Find all values on one date, then filter down to an hour and find average [duplicate] - matlab

This question already has answers here:
Counting values by day/hour with timeseries in MATLAB
(3 answers)
Closed 6 years ago.
I have a year's worth of data, recorded at one-minute intervals each day of the year.
The date and time were imported from Excel (in the form 243.981944; adding 42004 (so that it falls in 2015) and formatting as a date gives 31.8.15 23:34:00).
Importing to MATLAB it becomes
'31/08/2015 23:34:00'
I require the data for each day of the year to be at hourly intervals, so I need to sum the data recorded in each hour and divide that by the number of data recorded for that hour, giving me the hourly average.
For some reason the data in August actually increments in two-minute intervals, whereas the data for every other month increments in one-minute intervals,
i.e.
...
31/07/2015 23:57:00
31/07/2015 23:58:00
31/07/2015 23:59:00
31/08/2015 00:00:00
31/08/2015 00:02:00
31/08/2015 00:04:00
...
I'm not sure how I can find all the values for a specific date and hour in order to work out the averages. I was thinking of using a for loop to find the values on each day, but when I got down to writing code realised this wouldn't work the way I was thinking.
I presume there must be some kind of functions available that would allow for data to be filtered by the date and time?
edit:
So I tried the following but I get these errors.
dates is a 520000x1 cell array containing the dates in the format given by formatIn.
formatIn = 'DD/MM/YYYY HH:MM:SS';
[~,M,D,H] = datevec(dates, formatIn);
Error using cnv2icudf (line 131)
Unrecognized minute format. Format string: DD/MM/YYYY HH:MM:SS.
Error in datevec (line 112)
icu_dtformat = cnv2icudf(varargin{isdateformat});

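Assume your dates are in a character matrix or cell array of strings called A, and your other data is in a vector X. Let's say all the data is in the same year (so we can ignore years). Note that the format specifiers are case-sensitive: lowercase dd, mm and yyyy are day, month and year, while uppercase MM means minutes, which is why 'DD/MM/YYYY HH:MM:SS' is rejected.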
[~,M,D,H] = datevec(A, 'dd/mm/yyyy HH:MM:SS');
mean_A = accumarray([M, D, H+1], X, [], @mean);
Then data from February will be in
mean_A(2,:,:)
To look at the data, you may find the squeeze() function useful, e.g.
squeeze(mean_A(2,1:10,13:24))
shows the average for the hours after midday (by column) for the first ten days (by row) of February.
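A minimal end-to-end sketch of the approach above, using a few made-up timestamps and values. The sample data, the explicit [12 31 24] output size and the NaN fill value are assumptions added for illustration; they are not part of the original answer.

```matlab
% Hypothetical sample data: timestamps as a cell array of strings, one value each
A = {'31/07/2015 23:57:00'; '31/07/2015 23:58:00'; ...
     '01/08/2015 00:00:00'; '01/08/2015 00:02:00'};
X = [1; 3; 10; 20];

% Lowercase dd/mm/yyyy = day/month/year; uppercase HH:MM:SS = hour:minute:second
[~, M, D, H] = datevec(A, 'dd/mm/yyyy HH:MM:SS');

% Average per (month, day, hour+1); bins without samples are filled with NaN
mean_A = accumarray([M, D, H+1], X, [12 31 24], @mean, NaN);

july31_23h = mean_A(7, 31, 24);   % mean of 1 and 3
aug01_00h  = mean_A(8, 1, 1);     % mean of 10 and 20
```

Because the mean is taken per bin, it should not matter that August is sampled every two minutes while the other months are sampled every minute.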
See also:
Counting values by day/hour with timeseries in MATLAB

Related

Indexing precipitation data by month in matlab

I have an array of daily precipitation data over a geographic region that is structured long x lat x time. I want to be able to index this data by both month and year, and so used datevec to find the months/years of each timestep. All good! However, I'm having trouble creating the index because the number of dates in each month and each year are not equal.
Originally, I'd thought that I would create two indexes using a months loop and year loop to find the rows associated with each month/year in the datevec. That is, I wanted something like (row_loc x month) and ( year x row_loc), with NaN filling in places where there would be no data (i.e. the last several values in February, or the last values of non-leap years). However, matlab doesn't seem to like the fact that the number of rows for each month/year aren't the same.
dates = datevec;
months = 1:12;
years = 1979:2016;
ind_mo = NaN(1178,length(months));
ind_yr = NaN(366,length(years));
for m = months
ind_mo(:,m) = find(dates(:,2) == m);
end
for y = 1:length(years)
ind_yr(:,y) = find(dates(:,1) == y);
end
Eventually, I'd like to have a way to index the precipitation data by month and by year. If there's a way to do that through the structure I'm imagining...great! If there's another workaround that's good too, although I'd prefer not to have a separate index for each of the 12 months and 38 years since I think this will be cumbersome.
Thanks in advance!
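One possible way around the unequal row counts, sketched under assumptions: use logical masks (or a cell array of index vectors) instead of a NaN-padded index matrix. The time axis, the array sizes and all variable names below are placeholders, not the asker's actual data.

```matlab
% Hypothetical daily time axis covering 1979-1980 (stand-in for the real one)
t  = (datenum(1979,1,1):datenum(1980,12,31))';
dv = datevec(t);                 % columns: year, month, day, hour, minute, second
yr = dv(:,1);
mo = dv(:,2);

% Placeholder precipitation array, long x lat x time
precip = ones(2, 3, numel(t));

% Logical mask: no padding needed, and any month/year combination works
is_feb_1980 = (yr == 1980) & (mo == 2);
feb_1980    = precip(:, :, is_feb_1980);

% Alternative: one cell per month, each holding a different number of indices
idx_mo = arrayfun(@(m) find(mo == m), 1:12, 'UniformOutput', false);
```

A mask has the same length as the time axis regardless of how many days match, so months and years of different lengths do not need special handling.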

UTC to GPS time for finding TOW in Simulink

For my project, I need to calculate TOW (time of week) in Simulink. I know this can be achieved by converting UTC time to GPS time.
I have written a simple m-file in Matlab which does this for me, as follows:
date_gps_int = 10000*y + 100*m + d
date_gps_str = int2str(date_gps_int)
date_gps_str_to_serial = datenum(date_gps_str,'yyyymmdd')
date_str_format = datestr(date_gps_str_to_serial,'dd-mmmm-yyyy')
Num_Days = daysact('06-jan-1980',date_str_format)
Num_Weeks = Num_Days/7
TOW = Num_Weeks - 1024
My first intention was to use this as a function in simulink. But apparently because of 'datenum' and 'datestr' it is not possible, since simulink does not handle strings.
Now I am wondering if anyone can help me with this issue. Is there any way to calculate TOW from the UTC date in Matlab without using those predefined functions?
I also tried to write an algorithm for calculating number of days since '6 January 1980' and then calculating number of weeks by dividing that by 7. But since I am not very familiar with leap year calculation and I don't really know the formula for these kinds of calculations, my result differs from real TOW.
I would appreciate if anybody can help me on this.
There are three formats handled by Matlab for time: formatted date strings - what datestr outputs -, serial date - scalar double, what datenum outputs - and date vectors (see datevec). Conversion functions work with these three, and the most convenient way to convert individual variables (year, month, etc) to a date is to build a date vector [yyyy mm dd HH MM SS].
date_gps_str_to_serial = datenum([y m d 0 0 0]); % midnight on day y-m-d
date_Jan_6_1980 = datenum([1980 01 06 0 0 0]); % midnight on Jan 6th, 1980
Num_Days = date_gps_str_to_serial - date_Jan_6_1980;
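As a rough sketch of how the day count could be turned into a week number and a day within the week (the example date and the variable names gps_week, day_of_week and week_mod_1024 are assumptions for illustration, and any UTC/GPS offset is ignored here):

```matlab
% Hypothetical example date
y = 2019; m = 4; d = 7;

% Whole days between the GPS epoch (6 January 1980) and the given date
Num_Days = datenum([y m d 0 0 0]) - datenum([1980 1 6 0 0 0]);

gps_week      = floor(Num_Days / 7);   % full weeks since the epoch
day_of_week   = mod(Num_Days, 7);      % 0 = Sunday, the day a GPS week starts
week_mod_1024 = mod(gps_week, 1024);   % week number as a 10-bit counter
```

Using floor and mod keeps the week number an integer, whereas Num_Days/7 on its own gives a fractional number of weeks.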
Now, beware of leap seconds...
GPS time is computed from the time elapsed since Jan 6th 1980. Take the number of seconds elapsed since that day, as measured by the satellites' atomic clocks, and divide by (24*3600) to get a number of days; the remainder is the time of day (in seconds since midnight).
But once in a while, the International Earth Rotation and Reference Systems Service decides that a day will last one second longer, to accommodate the slowing of Earth's rotation. This can happen up to twice a year, on June 30th or December 31st. GPS time does not take into account that some UTC days last 86401 seconds (so dividing by 24*3600 does not give UTC days), and it advances by 1 second with respect to UTC each time this happens. There have been 18 such days since Jan 6th 1980, so one should subtract 18 seconds from GPS time to find UTC time. The next time a leap second may be added is June 2019.
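A sketch of a TOW calculation in seconds that uses only arithmetic (no datenum or datestr), which may be easier to port to a Simulink MATLAB Function block. The day count uses a standard civil-calendar formula, not anything from the original answer; the 18 s offset is an assumption that has to be updated whenever a new leap second is introduced, and all variable names are illustrative.

```matlab
% Hypothetical UTC input
y = 2019; m = 4; d = 6; hh = 23; mi = 59; ss = 42;
leap_seconds = 18;                 % assumed GPS-UTC offset, must be maintained by hand

% Days since 1970-01-01 from the civil date, with years shifted to start in March
ya  = y - (m <= 2);                % January and February belong to the previous year
era = floor(ya / 400);             % 400-year Gregorian cycle
yoe = ya - era * 400;              % year within the cycle
mp  = mod(m + 9, 12);              % March = 0, ..., February = 11
doy = floor((153 * mp + 2) / 5) + d - 1;
doe = yoe * 365 + floor(yoe / 4) - floor(yoe / 100) + doy;
days_since_1970 = era * 146097 + doe - 719468;

% The GPS epoch, 6 January 1980, is 3657 days after 1 January 1970
days_since_gps_epoch = days_since_1970 - 3657;

gps_seconds = days_since_gps_epoch * 86400 + hh * 3600 + mi * 60 + ss + leap_seconds;
gps_week    = floor(gps_seconds / 604800);   % 604800 seconds per week
tow         = mod(gps_seconds, 604800);      % seconds into the current GPS week
```

The example time was chosen so that, with an 18 s offset, it lands exactly on a week boundary and tow comes out as 0.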

How to get monthly totals from linearly interpolated data

I am working with a data set of 10,000s of variables which have been repeatedly measured since the 1980s. The first measurements for each variable are not on the same date and the variables are irregularly measured: sometimes measurements are only a month apart, in a small number of cases they are decades apart.
I want to get the change in each variable per month.
So far I have a cell array of measurement dates and a cell array of interpolated rates of change between measurements (each cell represents a single variable, and I've only posted the first 5 cells of each array)
DateNumss= {[736614;736641;736669] [736636;736666] 736672 [736631;736659;736685] 736686}
LinearInterpss={[17.7777777777778;20.7142857142857;0] [0.200000000000000;0] 0 [2.57142857142857;2.80769230769231;0]}
How do I get monthly sums of the interpolated change in variable?
i.e.
If the first measurement for a variable is made on January 1st, and the linearly interpolated change between that and the next measurement is 1 per day; and the next measurement is on February the 5th and the corresponding linearly interpolated change is 2; then January has a total change of 1*31 (31 days at 1) and February has a total change of 1*5+2*23 (5 days at 1, 23 days at 2).
You would need the points in the serial dates that correspond with the change of a month.
mat(:,1)=sort(repmat(1980:1989,[1,12]));
mat(:,2)=repmat(1:12,[1,size(mat,1)/12]);
mat(:,3)=1;
monthseps=datenum(mat);
This gives you a list of all 120 changes of months in the eighties.
Now you want, for each month, the change per day, and sum it. If you take the original data it is easier, since you can just interpolate each day's value using matlab. If you only have the "LinearInterpss" you need to map it on the days using interp1 with the method 'previous'.
monthly_sums = zeros(length(monthseps)-1, 1);
for ct = 2:length(monthseps)
    days = monthseps(ct-1):(monthseps(ct)-1); % days in the month
    % Now each day needs a rate of change assigned, taken from "LinearInterpss".
    % interp1 with method 'previous' picks the last value at or before each day.
    % DateNumss and LinearInterpss are cell arrays, so index one variable at a time
    % (k is a hypothetical variable index; a cell needs at least two measurements).
    vals = interp1(DateNumss{k}, LinearInterpss{k}, days, 'previous');
    monthly_sums(ct-1) = sum(vals(~isnan(vals))); % total change in the month
end
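A small self-contained sketch of the same idea for a single variable, with made-up dates and rates. It assumes that each day carries the rate of the most recent measurement on or before it, and that days outside the measured range contribute nothing; meas_dates, rates and monthly_change are illustrative names.

```matlab
% Hypothetical single variable: measurement dates and the rate of change (per day)
% that applies from each measurement until the next one
meas_dates = [datenum(2015,1,10); datenum(2015,2,20); datenum(2015,3,15)];
rates      = [1; 2; 0];

% Month boundaries: first day of January, February, March and April 2015
monthseps = datenum(2015, 1:4, 1);

monthly_change = zeros(numel(monthseps)-1, 1);
for ct = 2:numel(monthseps)
    days = monthseps(ct-1):(monthseps(ct)-1);            % every day in the month
    vals = interp1(meas_dates, rates, days, 'previous'); % rate in force on each day
    vals(isnan(vals)) = 0;                               % days outside the measured range
    monthly_change(ct-1) = sum(vals);
end
```

With these inputs January covers 22 days at rate 1, February 19 days at rate 1 plus 9 days at rate 2, and March 14 days at rate 2, so the monthly totals add up to the total change over the whole period.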

how to simulate date for one year in kdb

I would like to simulate random timestamp data:
100 records per day for one year.
How can I do that?
When I set a:2013.01.01D00:00:00.000000000
100?a
the randomized data doesn't stay within a single day.
Thanks for your input.
I am not sure whether this can be done easily, but you can generate 100 random timestamps for every day of 2013 as follows:
daysInYear: 365;
year: 2013.01.01D00:00:00.000000000;
//array of 365 elements, where every element represents corresponding date of year
dates: year + 01D * til daysInYear;
//array of 365 elements, where every element is an array of 100 random timestamps [0 .. 1D)
randomNanos: cut[100; (100 * daysInYear)?1D];
//array of 365 elements, where each element is an array of 100 random dateTimes for given day
result: dates + randomNanos;
//put all the dates in single array
raze result
The short version which does the same is below:
raze (2013.01.01D+01D * til 365) + cut[100; (100*365)?1D]
In order to simulate data for a single day, it's possible to generate random times (as floats less than one) and add them to the day you would like to generate data for. In this case:
D:2016.03.01;
D+100?1f
Will return 100 random times on 2016.03.01. If you want to generate data within a time range you can restrict the size of the float to something less than 1, or greater than a certain minimum value.
If you want to handle leap years... Not sure of a better way at the minute other than adding the max number of days onto the start of the year and asking whether it's the 31st. Adding on 366, it can either be 31st or 1st. If it's the 31st good, otherwise drop off the last date.
/e.g.
q)last 2015.01.01+til 365
2015.12.31
q)last 2016.01.01+til 365
2016.12.30 /we are a day short
q)
/return the dates and the number of days based on whether its a leap year
q)dd:$[31i~`dd$last d:2016.01.01+til 366;(366;d);(365;-1_d)]
q)/returns (366;2016.01.01 2016.01.02...)
q)/the actual logic below is pretty much the same as the other answer
q)raze{[n;dy;dt] dt+n cut(n*dy)?.z.N}[100;].dd
2016.01.01D16:06:53.957527121 2016.01.01D10:55:10.892935198 2016.01.01D15:36:..

Iterate for loop by hour in MATLAB

I am writing a for loop to average 10 years of hourly measurements made on the hour. The dates of the measurements are recorded as MATLAB datenums.
I am trying to iterate through using 0.0417 as it is the datenum for 1AM 00/00/00 but it is adding in a couple of seconds of error each time I iterate.
Can anyone recommend a better way for me to iterate by hour?
date = a(:,1);
load = a(:,7);
%loop for each hour of the year
for i=0:0.0417:366
%set condition
%condition removes year from current date
c = date(:)-datenum(year(date(:)),0,0)==i;
%evaluate condition on load vector and find mean
X(i,2)=mean(load(c==1));
end
An hour has a duration of 1/24 day, not 0.0417. Use 1/24 and the precision is sufficiently high for a year.
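To put a number on the error, here is a quick sketch comparing the rounded step with the exact one (variable names are illustrative):

```matlab
step_approx = 0.0417;   % rounded value used in the question
step_exact  = 1/24;     % one hour expressed in days

% Error introduced by each step, in seconds
err_per_step_s = (step_approx - step_exact) * 86400;

% Accumulated error after stepping through one 365-day year of hours, in hours
drift_hours = err_per_step_s * 24 * 365 / 3600;
```

The per-step error is just under three seconds, which matches the "couple of seconds" seen in the question, and it adds up to roughly seven hours over a year.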
For an even higher precision, use something like datenum(y,1,1,1:24*365,0,0) to generate all timestamps.
To avoid error drift entirely, specify the index using integers, and divide the result down inside the loop:
for hour_index = 1:365*24
    hour_datenum = (hour_index - 1) / 24;
end
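The integer-index idea can also be applied without a loop. The sketch below assumes on-the-hour datenums and uses made-up data; dn, load_vals, hour_index and hourly_mean are illustrative names (load_vals is used instead of load to avoid shadowing the built-in load function).

```matlab
% Hypothetical data: two years, first 48 hourly samples of each, taken on the hour
dn        = [datenum(2014,1,1,0:47,0,0)'; datenum(2015,1,1,0:47,0,0)'];
load_vals = [(1:48)'; (3:50)'];

% Integer hour-of-year index (1 = 00:00 on 1 January);
% round() absorbs floating-point noise in the datenums
dv         = datevec(dn);
year_start = datenum(dv(:,1), 1, 1);
hour_index = round((dn - year_start) * 24) + 1;

% Mean load for each hour of the year, averaged across years
hourly_mean = accumarray(hour_index, load_vals, [], @mean);
```

Because the comparison is done on integers, no floating-point equality test against a datenum is needed.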