For two arrays (of different lengths) with datetime values, I want to flag the entries where the date (year, month, day) is the same in both, ignoring the time values. The issue is that the datetime type internally always contains a time as well (even the constructor datetime(Y,M,D) sets the time to 00:00:00). Therefore a direct comparison of datetimes (dt1 == dt2) is true only if the times also match.
E.g.:
arDt1 = datetime(2021,5,1:10,0,0,0);    % May 1st to 10th, always at 00:00:00
arDt2 = datetime(2021,5,1:2:10,12,0,0); % May 1st, 3rd, 5th, 7th, 9th, always at 12:00:00
The desired logical array of flags for arDt1 would thus be
[ 1 0 1 0 1 0 1 0 1 0] % days in arDt1 which are also in arDt2, *never mind the time*
The real data extend across multiple months and years, so using the day-of-month number alone is not sufficient (even if it would be for the example above).
I can think of various workarounds (converting datetime to string and parsing; using ymd() and then comparing year, month and day piecewise; or forcing all times, at least temporarily, to the same value, e.g. 00:00:00), but it all gets a bit clumsy, especially when arrays of different lengths are involved, which means ismember() has to be used instead of a direct one-to-one array comparison.
The problem of processing only the date part of datetime values, regardless of the time part, should be a rather general one, but I can't find any examples or references. Can someone advise a more elegant/efficient way?
You can use the DATESHIFT function to move all of the dates to the start of the day and then do the comparison.
ismember(dateshift(arDt1,'start','day'),dateshift(arDt2,'start','day'))
ans =
1×10 logical array
1 0 1 0 1 0 1 0 1 0
An alternative solution is to use the properties of datetime directly, i.e. to compare the Year, Month and Day properties. I prefer this approach since it also allows me, for example, to filter for a given month (neglecting the day).
arDt1 = datetime(2021,5,1:10,0,0,0);    % May 1st to 10th, always at 00:00:00
arDt2 = datetime(2021,5,1:2:10,12,0,0); % May 1st, 3rd, 5th, 7th, 9th, always at 12:00:00
arDtYMD1 = [arDt1.Year', arDt1.Month', arDt1.Day']; % 10x3 double
arDtYMD2 = [arDt2.Year', arDt2.Month', arDt2.Day']; % 5x3 double
all(ismember(arDtYMD1, arDtYMD2),2) % 10x1 logical array
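As a small illustration of the month-filtering point above, the same properties can be compared against a single component directly (a minimal sketch using the arrays defined above; the variable name isMay is my own):

```matlab
% Flag all entries of arDt1 that fall in May, regardless of day or time.
isMay = (arDt1.Month == 5);   % 1x10 logical, all true for this example
```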
I am working with a data set of 10,000s of variables which have been repeatedly measured since the 1980s. The first measurements for each variable are not on the same date, and the variables are irregularly measured - sometimes measurements are only a month apart; in a small number of cases they are decades apart.
I want to get the change in each variable per month.
So far I have a cell array of measurement dates, and interpolated rates of change between measurements (each cell represents a single variable in either array; I've only posted the first 5 cells of each):
DateNumss= {[736614;736641;736669] [736636;736666] 736672 [736631;736659;736685] 736686}
LinearInterpss={[17.7777777777778;20.7142857142857;0] [0.200000000000000;0] 0 [2.57142857142857;2.80769230769231;0]}
How do I get monthly sums of the interpolated change in variable?
i.e.
If the first measurement for a variable is made on January 1st, and the linearly interpolated change between that and the next measurement is 1 per day; and the next measurement is on February 5th with a corresponding linearly interpolated change of 2; then January has a total change of 1*31 (31 days at 1) and February has a total change of 1*5+2*23 (5 days at 1, 23 days at 2).
You first need the serial date numbers that correspond to the month boundaries.
mat(:,1)=sort(repmat(1980:1989,[1,12]));
mat(:,2)=repmat(1:12,[1,size(mat,1)/12]);
mat(:,3)=1;
monthseps=datenum(mat);
This gives you a list of all 120 changes of months in the eighties.
Now you want, for each month, the change per day, and sum it. If you have the original data it is easier, since you can just interpolate each day's value in MATLAB. If you only have the "LinearInterpss", you need to map it onto the days using interp1 with the method 'previous'.
for ct = 2:length(monthseps)
days = monthseps(ct-1):(monthseps(ct)-1); %days in the month
%now we need each day assigned a certain change. This value depends on your "LinearInterpss". interp1 with method 'previous' searches LinearInterpss for the last preceding value.
vals = interp1(DateNumss,LinearInterpss,days,'previous');
sum(vals); %the sum over the change in each day is the total change in a month
end
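Since DateNumss and LinearInterpss are cell arrays with one cell per variable, the loop above has to be applied per cell; a minimal sketch of the full computation, assuming each cell of LinearInterpss lines up with the corresponding cell of DateNumss:

```matlab
monthlyChange = cell(size(DateNumss));   % one row of monthly sums per variable
for v = 1:numel(DateNumss)
    dn  = DateNumss{v};
    lin = LinearInterpss{v};
    if numel(dn) < 2, continue; end      % interp1 needs at least two sample points
    sums = zeros(1, length(monthseps)-1);
    for ct = 2:length(monthseps)
        days = monthseps(ct-1):(monthseps(ct)-1);   % serial day numbers in this month
        vals = interp1(dn, lin, days, 'previous');  % rate in effect on each day
        sums(ct-1) = sum(vals, 'omitnan');          % NaN for days outside the measured range
    end
    monthlyChange{v} = sums;
end
```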
I have two excel files with dates in each of them. The goal is to find the location of datetimes in file A in file B.
e.g.
Excel file A has dates at each hour in column A, from 1-Jan-1970 1AM to 31-Dec-2015 1AM, with a lot of random missing dates and hours.
Excel file B has a date, e.g. 1-Jan-1978 5PM.
I read file A into an array called A and do the following:
ind = find( x2mdate(A) == x2mdate(28491.7083333333) ); %datestr(x2mdate(28491.7083333333)) ans = 01-Jan-1978 17:00:00
it returns empty even though I can see that 1/1/1978 all hours are available in file A.
This is clearly a rounding issue. So, how do I deal with this? I tried using datestr but it is very slow.
Instead of x2mdate(28491.7083333333), try using:
datenum('01-Jan-1978 17:00:00', 'dd-mmm-yyyy HH:MM:SS')
It's easy to see that because of the rounding, they are not considered equal:
>> datenum('01-Jan-1978 17:00:00', 'dd-mmm-yyyy HH:MM:SS') == x2mdate(28491.7083333333)
ans =
0
You are comparing to the wrong value: 28491.7083333333 is slightly off from the value you are looking for. If you want an exact match against a floating-point constant, you have to write out 17 significant digits. Otherwise, compare with a reasonable tolerance.
tol=datenum(0,0,0,0,0,60) %60 seconds tolerance
ind = find( abs(x2mdate(A) - x2mdate(28491.7083333333)) < tol );
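To see why the truncated literal misses, you can print the serial date with full precision (a minimal illustration):

```matlab
target = datenum('01-Jan-1978 17:00:00', 'dd-mmm-yyyy HH:MM:SS');
fprintf('%.17g\n', target)   % all 17 digits you would need to write for an exact match
```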
I will jump straight into a minimal example as I find this difficult to put into words. I have the following example:
Data.Startdate=[datetime(2000,1,1,0,0,0) datetime(2000,1,2,0,0,0) datetime(2000,1,3,0,0,0) datetime(2000,1,4,0,0,0)];
Data.Enddate=[datetime(2000,1,1,24,0,0) datetime(2000,1,2,24,0,0) datetime(2000,1,3,24,0,0) datetime(2000,1,4,24,0,0)];
Data.Value=[0.5 0.1 0.2 0.4];
Event_start=[datetime(2000,1,1,12,20,0) datetime(2000,1,1,16,0,0) datetime(2000,1,4,8,0,0)];
Event_end=[datetime(2000,1,1,14,20,0) datetime(2000,1,1,23,0,0) datetime(2000,1,4,16,0,0)];
What I want to do is add a flag to the Data structure (say a 1) if any time between Data.Startdate and Data.Enddate falls between Event_start and Event_end. In the example above, Data.Flag would have the values 1 0 0 1, because from the Event_start and Event_end vectors you can see there are events on January 1st and January 4th. The idea is that I will use this flag to process the data further.
I am sure this is straightforward but would appreciate any help you can give.
I would convert the dates to numbers using datenum, which then allows fairly convenient comparisons using bsxfun:
isStartBeforeEvent = bsxfun(@gt,datenum(Event_start)',datenum(Data.Startdate));
isEndAfterEvent = bsxfun(@lt,datenum(Event_end)',datenum(Data.Enddate));
flag = any(isStartBeforeEvent & isEndAfterEvent, 1)
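On newer MATLAB releases (R2016b+), the same test can be written without bsxfun or datenum, since datetime arrays support direct comparison with implicit expansion; a minimal sketch of the same logic:

```matlab
% Rows: events; columns: Data intervals (implicit expansion of the transposed vectors).
isStartBeforeEvent = Event_start' > Data.Startdate;  % event starts after the interval opens
isEndAfterEvent    = Event_end'   < Data.Enddate;    % event ends before the interval closes
Data.Flag = any(isStartBeforeEvent & isEndAfterEvent, 1);  % [1 0 0 1] for the example data
```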
How can I convert this kind of data 08:00:43.771 given as string into a number specifying the number of milliseconds since midnight corresponding to this time instance?
I generally use MATLAB datenum outputs for timestamping. Datenums count the number of days since January 0, 0000, expressed as a double (double-precision numbers are precise to about 14 µs for contemporary dates).
Using datenums:
currentDateTime1 = datenum('08:00:43.771'); %Assumes today
currentDateTime2 = datenum('6/8/1975 08:00:43.771'); %Using an explicit date
millisecondsSinceMidnight = mod(currentDateTime1 ,1) *24*60*60*1000; %Mod 1 removes any day component
millisecondsSinceMidnight = mod(currentDateTime2 ,1) *24*60*60*1000; %Then this is just a unit conversion
For unusual string formats, use the extended form of datenum, which can accept a string format specifier.
Use 1000*etime(datevec('08:00:43.771'),datevec('0:00')) to give the number of milliseconds since midnight. etime gives the number of seconds between two date vectors, datevec converts strings to date vectors (assuming Jan 1 this year if only a time is given).
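If you are on a release with the datetime type, timeofday and milliseconds give the same number without serial-number arithmetic (a minimal sketch; the 'HH:mm:ss.SSS' input format is an assumption about the string):

```matlab
dt = datetime('08:00:43.771', 'InputFormat', 'HH:mm:ss.SSS');
msSinceMidnight = milliseconds(timeofday(dt));   % 8*3600e3 + 43e3 + 771 = 28843771
```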
I have a data file which contains time data. The list is quite long, 100,000+ points. There is data every 0.1 seconds, and the time stamps are so:
'2010-10-10 12:34:56'
'2010-10-10 12:34:56.1'
'2010-10-10 12:34:56.2'
'2010-10-10 12:34:53.3'
etc.
Not every 0.1 second interval is necessarily present. I need to check whether a 0.1 second interval is missing, then insert this missing time into the date vector. Comparing strings seems unnecessarily complicated. I tried comparing seconds since midnight:
date_nums=datevec(time_stamps);
secs_since_midnight=date_nums(:,4)*3600+date_nums(:,5)*60+date_nums(:,6);
comparison_secs=linspace(0,86400,864000);
res=(ismember(comparison_secs,secs_since_midnight)~=1);
However, this approach doesn't work due to rounding errors: the seconds since midnight and the linspace of comparison seconds never quite match up (due to the tenth-of-a-second resolution?). The intent is to later do an FFT on the data associated with the time stamps, so I want the data to be as uniform as possible (the data associated with the missing intervals will be interpolated). I've considered blocking it into smaller chunks of time and just checking the small chunks one at a time, but I don't know if that's the best way to go about it. Thanks!
Multiply your numbers-of-seconds by 10 and round to the nearest integer before comparing against your range.
There may be more efficient ways to do this than ismember. (I don't know offhand how clever the implementation of ismember is, but if it's The Simplest Thing That Could Possibly Work then you'll be taking O(N^2) time that way.) For instance, you could use the timestamps that are actually present (as integer numbers of 0.1-second intervals) as indices into an array.
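The round-to-integer-ticks suggestion can be sketched like this, reusing the asker's secs_since_midnight and assuming all timestamps fall within a single day:

```matlab
ticks = round(secs_since_midnight * 10);   % integer number of 0.1 s slots since midnight
present = false(1, 864000);                % one slot per 0.1 s over the whole day
present(ticks + 1) = true;                 % +1 because MATLAB indices start at 1
missingSecs = (find(~present) - 1) / 10;   % seconds since midnight of the missing slots
```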
Since you're concerned with missing data records and not other timing issues such as a drifting time channel, you could check for missing records by converting the time values to seconds, doing a DIFF and finding those first differences that are greater than some tolerance. This would tell you the indices where the missing records should go. It's then up to you to do something about this. Remember, if you're going to use this list of indices to fill the gaps, process the list in descending index order since inserting the records will cause the index list to be unsynchronized with the data.
>> time_stamps = now:.1/86400:now+1; % Generate test data.
>> time_stamps(randi(length(time_stamps), 10, 1)) = []; % Remove 10 random records.
>> t = datenum(time_stamps); % Convert to date numbers.
>> t = 86400 * t; % Convert to seconds.
>> index = find(diff(t) > 1.999 * 0.1)' + 1 % Find missing records.
index =
30855
147905
338883
566331
566557
586423
642062
654682
733641
806963