Counting values by day/hour with timeseries in MATLAB - matlab

So, I'm beginning to use timeseries in MATLAB and I'm kinda stuck.
I have a list of timestamps of events which I imported into MATLAB. It's now a 3000x25 array which looks like
2000-01-01T00:01:01+00:00
2000-01-01T00:01:02+00:00
2000-01-01T00:01:03+00:00
2000-01-01T00:01:04+00:00
As you can see, each event was recorded by date, hour, minute, second, etc.
Now, I would like to count the number of events by date, hour, etc. and then do various analyses (regression, etc.).
I considered creating a timeseries object for each day, but considering the size of the data, that's not practical.
Is there any way to manipulate this array such that we have "date: # of events"?
Perhaps there's just a simpler way to count events using timeseries?

As others have suggested, you should convert the string dates to serial date numbers. This makes it easy to work with the numeric data.
An efficient way to count number of events per interval (days, hours, minutes, etc...) is to use functions like HISTC and ACCUMARRAY. The process will involve manipulating the serial dates into units/format required by such functions (for example ACCUMARRAY requires integers, whereas HISTC needs to be given the bin edges to specify the ranges).
Here is a vectorized solution (no-loop) that uses ACCUMARRAY to count number of events. This is a very efficient function (even of large input). In the beginning I generate some sample data of 5000 timestamps unevenly spaced over a period of 4 days. You obviously want to replace it with your own:
%# lets generate some random timestamp between two points (unevenly spaced)
%# 1000 timestamps over a period of 4 days
dStart = datenum('2000-01-01'); % inclusive
dEnd = datenum('2000-01-5'); % exclusive
t = sort(dStart + (dEnd-dStart).*rand(5000,1));
%#disp( datestr(t) )
%# shift values, by using dStart as reference point
dRange = (dEnd-dStart);
tt = t - dStart;
%# number of events by day/hour/minute
numEventsDays = accumarray(fix(tt)+1, 1, [dRange*1 1]);
numEventsHours = accumarray(fix(tt*24)+1, 1, [dRange*24 1]);
numEventsMinutes = accumarray(fix(tt*24*60)+1, 1, [dRange*24*60 1]);
%# corresponding datetime range/interval label
days = cellstr(datestr(dStart:1:dEnd-1));
hours = cellstr(datestr(dStart:1/24:dEnd-1/24));
minutes = cellstr(datestr(dStart:1/24/60:dEnd-1/24/60));
%# display results
[days num2cell(numEventsDays)]
[hours num2cell(numEventsHours)]
[minutes num2cell(numEventsMinutes)]
Here is the output for the number of events per day:
'01-Jan-2000' [1271]
'02-Jan-2000' [1258]
'03-Jan-2000' [1243]
'04-Jan-2000' [1228]
And an extract of the number of events per hour:
'02-Jan-2000 09:00:00' [50]
'02-Jan-2000 10:00:00' [54]
'02-Jan-2000 11:00:00' [53]
'02-Jan-2000 12:00:00' [74]
'02-Jan-2000 13:00:00' [49]
'02-Jan-2000 14:00:00' [59]
similarly for minutes:
'03-Jan-2000 08:54:00' [1]
'03-Jan-2000 08:55:00' [1]
'03-Jan-2000 08:56:00' [1]
'03-Jan-2000 08:57:00' [0]
'03-Jan-2000 08:58:00' [0]
'03-Jan-2000 08:59:00' [0]
'03-Jan-2000 09:00:00' [1]
'03-Jan-2000 09:01:00' [2]

You can convert those timestamps to a number with datenum:
A serial date number represents the whole and fractional number of days from a specific date and time, where datenum('Jan-1-0000 00:00:00') returns the number 1. (The year 0000 is merely a reference point and is not intended to be interpreted as a real year in time.)
This way, it's easier to check where a period starts and end. Eg: the week your looking for starts at x and ends at x+7.999... ; all you have to do to find events in that period is checking if the datenum value is between x and x+8:
week_x_events = find(dn_timestamp>=x & dn_timestamp<x+8)
The difficulty is in converting your timestamp to datenum acceptable format, which is doable using regexp, good luck!

I don't know what +00:00 means (maybe time zone?), but you can simply convert your string timestamps into numerical format:
>> t = datenum('2000-01-01T00:01:04+00:00', 'yyyy-mm-ddTHH:MM:SS')
t =
7.3049e+005
>> datestr(t)
ans =
01-Jan-2000 00:01:04

Related

Shift time series to start from zero H:M:S:MS (possibly in Matlab)

I have some ECG data for a number of subjects. For each subject, I can export an excel file with the RR interval, Heart Rate and other measures. The problem is that I have a timestamp starting at the time of recording (in this case 11:22:3:00).
I need to compare the date with other subjects and I want to automate the procedure in Matlab.
I need to flexibly compare, for instance, the first 3 minutes of subjects in condition 1 with those of sbj in condition 2. Or minutes 4 to 8 of condition 1 and 2 and so forth. To do this, I am thinking that the best way is to shift the time vector for each subject so that it starts from 0.
There are a couple of problems to note: I CANNOT create just one vector for all subjects. This would be inaccurate because the heart measures are variable for each individual.
So, IN SHORT I need to shift the time vector for each participant so that it starts at 0 and increases exactly like the original one. So, in this example:
H: M: S: MS RR HR
11:22:03:000 0.809 74.1
11:22:03:092 0.803 74.7
11:22:03:895 0.768 78.1
11:22:04:663 0.732 81.9
11:22:05:395 0.715 83.9
11:22:06:110 0.693 86.5
11:22:06:803 0.705 85.1
11:22:07:508 0.706 84.9
11:22:08:214 0.749 80.1
11:22:08:963 0.762 78.7
11:22:09:725 0.766 78.3
would become:
00:00:00:0000
00:00:00:092
00:00:00:895
00:00:01:663
and so forth...
I would like to do it in Matlab...
P.S.
I was working around the idea of extracting the info in 4 different variables.
Then, I could subtract the values for each cell from the first cell.
For instance:
11-11 = 0; 22-22=0; 03-03=0; ms: keep the same value
Maybe this could kind of work, except that it wouldn't if I have a subject that started, say, at 11:55:05:00
Thank you all for any help.
Gluce
Basic timestamp normalization just subtracts the minimum (or first, assuming they're properly ordered) time from the rest.
With MATLAB's datetime object, this is just subtraction, which yields a duration object:
ts = ["11:22:03:000", "11:22:03:092", "11:22:03:895", "11:22:04:663"];
% Convert to datetime & normalize
t = datetime(ts, 'InputFormat', 'HH:mm:ss:SSS');
t.Format = 'HH:mm:ss:SSS';
nt = t - t(1);
% Reformat & display
nt.Format = 'hh:mm:ss.SSS';
Which returns:
>> nt
nt =
1×4 duration array
00:00:00.000 00:00:00.092 00:00:00.895 00:00:01.663
Alternatively, you can normalize the datetime array itself:
ts = ["11:22:03:000", "11:22:03:092", "11:22:03:895", "11:22:04:663"];
t = datetime(ts, 'InputFormat', 'HH:mm:ss:SSS');
t.Format = 'HH:mm:ss:SSS';
[h, m, s] = hms(t);
[t.Hour, t.Minute, t.Second] = deal(h - h(1), m - m(1), s - s(1));
Which returns the same:
>> t
t =
1×4 datetime array
00:00:00:000 00:00:00:092 00:00:00:895 00:00:01:663

MATLAB Find the indexes that have a particular date

I am trying to find the index in Date3, a column vector of date numbers from 01/01/2008 to 01/31/2014 that has been repeated many times, matches day. I basically want to organize idx into a cell array like idx{i} in which each cell is one day (So return all the indexes where Date3 equals day, where day is one of the days between 01/01/2008 and 01/31/2014. Eventually, I want to pull out the data for each day by applying the index I found to the variable Data2 (Reshape Data2 so that instead of a long column vector of data concentrations, I'll have a cell array in which each cell is all the data from one day)
This is what I have been doing:
for day = datenum(2008,01,01):1:datenum(2014,01,31); % All the days under consideration
idx = find(Date3, day); % Index of where Date3 equals the day under consideration
Data_PM25 = Data2(idx); % Pull out the data based on the idx
end
Example:
If Date3 looks like the following (It's actually much larger and repeats many many more times)
733408
733409
733410
733411
733412
733413
733414
733415
733416
733417
733418
733419
733420
733421
733408
733409
733410
733411
733412
733413
733414
733415
733416
733417
733418
733419
733420
733421
I want idx to be
`idx{1}` = (1, 15) % Where 733408 repeats
`idx{2}` = (2, 16) % Where 733409 repeats
...
And then Data2, which looked like:
[NaN]
[NaN]
[NaN]
[NaN]
[NaN]
[NaN]
[NaN]
[NaN]
[NaN]
[NaN]
'25.8'
'26.1'
'28.9'
'37.5'
'25.2'
'20'
'32.3'
'41'
'46.7'
'28.2'
'34.5'
'31.8'
'37.6'
'45.5'
'54.9'
'54.8'
'36.3'
'18.5'
Will now look like
'Data_PM25{1}' = ([NaN], '25.2')
'Data_PM25{2}' = ([NaN], '20')
...
Of course, the actual outputs will be much longer than just two matches.
What appears to be happening though is that I am comparing every day against Date3, a list of days, so I am getting all the days back.
This question expands off of a previous question: Find where a value matches and concatentate into column vector MATLAB
Use find(ismember…) to find where a certain day shows up in the long column of Date3 and then used that to pull out all the data, Lat, and Lon from that day.
group2cell for some reason, created double the number of days there were supposed to be. Basically, it somehow pulled out two years in which the first year it pulled out (1:365 or 1:366) was random.
day = datenum(years(y), 01, 01):datenum(years(y), 12, 31); % Create a column vector of all days in one year
for d = 1:length(day)
% Find index where Date3 matches one day, one day at a time
ind{d} = find(ismember(datenum(Date3), day(d)) == 1);
data_O3{d} = Data2(ind{d});
end
How about this: The result should be a cell array of vectors, each vector corresponding to the data from a given day.
This of course assumes Date3 and Data2 are the same size
Date3=[733408
733409
733410
733411
733412
733413
733414
733415
733416
733417
733418
733419
733420
733421
733408
733409
733410
733411
733412
733413
733414
733415
733416
733417
733418
733419
733420
733421];
Data2 = Date3.*50; % //just some dummy data for testing
Data_pm25=cell(0);
for day=datenum(2008,01,01):datenum(2014,01,01)
idx=day==Date3;
Data_pm25 = [Data_pm25,Data2(idx)];
end
The only problem I can see with this is if you don't have data for every day, you wouldn't necessarily know which day each cell is representing. This could easily be solved by storing a vector of dates with data.
As I mentioned in your other question try GROUP2CELL function from FileExchange.
You can use it on index as
[newidx, uniqueDate] = group2cell(idx,Date2);
or directly on data
[Data_PM25, uniqueDate] = group2cell(Data2,Date2);
uniqueDate will be an array of all unique Dates. uniqueDate(i) will correspond to Data_PM25(i).

subtracting one from another in matlab

My time comes back from a database query as following:
kdbstrbegtime =
09:15:00
kdbstrendtime =
15:00:00
or rather this is what it looks like in the command window.
I want to create a matrix with the number of rows equal to the number of seconds between the two timestamps. Are there time funcitons that make this easily possible?
Use datenum to convert both timestamps into serial numbers, and then subtract them to get the amount of seconds:
secs = fix((datenum(kdbstrendtime) - datenum(kdbstrbegtime)) * 86400)
Since the serial number is measured in days, the result should be multiplied by 86400 ( the number of seconds in one day). Then you can create a matrix with the number of rows equal to secs, e.g:
A = zeros(secs, 1)
I chose the number of columns to be 1, but this can be modified, of course.
First you have to convert kdbstrendtime and kdbstrbegtime to char by datestr command, then:
time = datenum(kdbstrendtime )-datenum(kdbstrbegtime )
t = datestr(time,'HH:MM:SS')

Vectorising Date Array Calculations

I simply want to generate a series of dates 1 year apart from today.
I tried this
CurveLength=30;
t=zeros(CurveLength);
t(1)=datestr(today);
x=2:CurveLength-1;
t=addtodate(t(1),x,'year');
I am getting two errors so far?
??? In an assignment A(I) = B, the number of elements in B and
Which I am guessing is related to the fact that the date is a string, but when I modified the string to be the same length as the date dd-mmm-yyyy i.e. 11 letters I still get the same error.
Lsstly I get the error
??? Error using ==> addtodate at 45
Quantity must be a numeric scalar.
Which seems to suggest that the function can't be vectorised? If this is true is there anyway to tell in advance which functions can be vectorised and which can not?
To add n years to a date x, you do this:
y = addtodate(x, n, 'year');
However, addtodate requires the following:
x must be a scalar number, not a string.
n must be a scalar number, not a vector.
Hence the errors you get.
I suggest you use a loop to do this:
CurveLength = 30;
t = zeros(CurveLength, 1);
t(1) = today; % # Whatever today equals to...
for ii = 2:CurveLength
t(ii) = addtodate(t(1), ii - 1, 'year');
end
Now that you have all your date values, you can convert it to strings with:
datestr(t);
And here's a neat one-liner using arrayfun;
datestr(arrayfun(#(n)addtodate(today, n, 'year'), 0:CurveLength))
If you're sequence has a constant known start, you can use datenum in the following way:
t = datenum( startYear:endYear, 1, 1)
This works fine also with months, days, hours etc. as long as the sequence doesn't run into negative numbers (like 1:-1:-10). Then months and days behave in a non-standard way.
Here a solution without a loop (possibly faster):
CurveLength=30;
t=datevec(repmat(now(),CurveLength,1));
x=[0:CurveLength-1]';
t(:,1)=t(:,1)+x;
t=datestr(t)
datevec splits the date into six columns [year, month, day, hour, min, sec]. So if you want to change e.g. the year you can just add or subtract from it.
If you want to change the month just add to t(:,2). You can even add numbers > 12 to the month and it will increase the year and month correctly if you transfer it back to a datenum or datestr.

find mean or median date of event

I have a dataset for which I have extracted the date at which an event occurred. The date is in the format of MMDDYY although MatLab does not show leading zeros so often it's MDDYY.
Is there a method to find the mean or median (I could use either) date? median works fine when there is an odd number of days but for even numbers I believe it is averaging the two middle ones which doesn't produce sensible values. I've been trying to convert the dates to a MatLab format with regexp and put it back together but I haven't gotten it to work. Thanks
dates=[32381 41081 40581 32381 32981 41081 40981 40581];
You can use datenum to convert dates to a serial date number (1 at 01/01/0000, 2 at 02/01/0000, 367 at 01/01/0001, etc.):
strDate='27112011';
numDate = datenum(strDate,'ddmmyyyy')
Any arithmetic operation can then be performed on these date numbers, like taking a mean or median:
mean(numDates)
median(numDates)
The only problem here, is that you don't have your dates in a string type, but as numbers. Luckily datenum also accepts numeric input, but you'll have to give the day, month and year separated in a vector:
numDate = datenum([year month day])
or as rows in a matrix if you have multiple timestamps.
So for your specified example data:
dates=[32381 41081 40581 32381 32981 41081 40981 40581];
years = mod(dates,100);
dates = (dates-years)./100;
days = mod(dates,100);
months = (dates-days)./100;
years = years + 1900; % set the years to the 20th century
numDates = datenum([years(:) months(:) days(:)]);
fprintf('The mean date is %s\n', datestr(mean(numDates)));
fprintf('The median date is %s\n', datestr(median(numDates)));
In this example I converted the resulting mean and median back to a readable date format using datestr, which takes the serial date number as input.
Try this:
dates=[32381 41081 40581 32381 32981 41081 40981 40581];
d=zeros(1,length(dates));
for i=1:length(dates)
d(i)=datenum(num2str(dates(i)),'ddmmyy');
end
m=mean(d);
m_str=datestr(m,'dd.mm.yy')
I hope this info to be useful, regards
Store the dates as YYMMDD, rather than as MMDDYY. This has the useful side effect that the numeric order of the dates is also the chronological order.
Here is the pseudo-code for a function that you could write.
foreach date:
year = date % 100
date = (date - year) / 100
day = date % 100
date = (date - day) / 100
month = date
newdate = year * 100 * 100 + month * 100 + day
end for
Once you have the dates in YYMMDD format, then find the median (numerically), and this is also the median chronologically.
You see above how to present dates as numbers.
I will add no your issue of finding median of the list. The default matlab median function will average the two middle values when there are an even number of values.
But you can do it yourself! Try this:
dates; % is your array of dates in numeric form
sdates = sort(dates);
mediandate = sdates(round((length(sdates)+1)/2));