MATLAB Find the indexes that have a particular date - matlab

I am trying to find the indexes at which Date3, a column vector of date numbers from 01/01/2008 to 01/31/2014 in which each date repeats many times, matches a given day. I want to organize the result into a cell array idx{i} in which each cell holds all the indexes where Date3 equals one day between 01/01/2008 and 01/31/2014. Eventually, I want to pull out the data for each day by applying those indexes to the variable Data2, so that instead of one long column vector of concentration data I have a cell array in which each cell holds all the data from one day.
This is what I have been doing:
for day = datenum(2008,01,01):1:datenum(2014,01,31); % All the days under consideration
idx = find(Date3, day); % Index of where Date3 equals the day under consideration
Data_PM25 = Data2(idx); % Pull out the data based on the idx
end
Example:
If Date3 looks like the following (It's actually much larger and repeats many many more times)
733408
733409
733410
733411
733412
733413
733414
733415
733416
733417
733418
733419
733420
733421
733408
733409
733410
733411
733412
733413
733414
733415
733416
733417
733418
733419
733420
733421
I want idx to be
`idx{1}` = (1, 15) % Where 733408 repeats
`idx{2}` = (2, 16) % Where 733409 repeats
...
And then Data2, which looked like:
[NaN]
[NaN]
[NaN]
[NaN]
[NaN]
[NaN]
[NaN]
[NaN]
[NaN]
[NaN]
'25.8'
'26.1'
'28.9'
'37.5'
'25.2'
'20'
'32.3'
'41'
'46.7'
'28.2'
'34.5'
'31.8'
'37.6'
'45.5'
'54.9'
'54.8'
'36.3'
'18.5'
Will now look like
'Data_PM25{1}' = ([NaN], '25.2')
'Data_PM25{2}' = ([NaN], '20')
...
Of course, the actual outputs will be much longer than just two matches.
What appears to be happening though is that I am comparing every day against Date3, a list of days, so I am getting all the days back.
This question expands off of a previous question: Find where a value matches and concatentate into column vector MATLAB

I used find(ismember…) to find where a given day shows up in the long column Date3, and then used that index to pull out all the data, Lat, and Lon from that day.
For some reason, group2cell created double the number of days there were supposed to be. It somehow pulled out two years, and the first year it pulled out (1:365 or 1:366) was random.
day = datenum(years(y), 01, 01):datenum(years(y), 12, 31); % Row vector of all days in one year
for d = 1:length(day)
% Find index where Date3 matches one day, one day at a time
ind{d} = find(ismember(datenum(Date3), day(d)) == 1);
data_O3{d} = Data2(ind{d});
end

How about this: The result should be a cell array of vectors, each vector corresponding to the data from a given day.
This of course assumes Date3 and Data2 are the same size
Date3=[733408
733409
733410
733411
733412
733413
733414
733415
733416
733417
733418
733419
733420
733421
733408
733409
733410
733411
733412
733413
733414
733415
733416
733417
733418
733419
733420
733421];
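Data2 = Date3.*50; % just some dummy data for testing
Data_pm25 = cell(0); % grows inside the loop
for day = datenum(2008,01,01):datenum(2014,01,31)
idx = (day == Date3); % logical mask of the rows recorded on this day
Data_pm25 = [Data_pm25, Data2(idx)]; % append this day's data
end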
The only problem I can see with this is if you don't have data for every day, you wouldn't necessarily know which day each cell is representing. This could easily be solved by storing a vector of dates with data.

As I mentioned in your other question, try the GROUP2CELL function from the File Exchange.
You can use it on the index as
[newidx, uniqueDate] = group2cell(idx,Date3);
or directly on the data
[Data_PM25, uniqueDate] = group2cell(Data2,Date3);
uniqueDate will be an array of all unique dates, and uniqueDate(i) will correspond to Data_PM25(i).
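If adding a File Exchange dependency is not an option, a similar grouping can be sketched with the built-in unique function. This is only a minimal sketch, not how group2cell itself works; the sample values are made up, and Data2 is assumed to be numeric rather than the mixed cell array shown in the question.

```matlab
% Hypothetical sample data: three days, each recorded twice
Date3 = [733408; 733409; 733410; 733408; 733409; 733410];
Data2 = [10; 20; 30; 40; 50; 60];

% uniqueDays(k) is the k-th distinct day; grp(i) is the group number of row i
[uniqueDays, ~, grp] = unique(Date3);

% One cell per distinct day: row indices first, then the matching data
nDays = numel(uniqueDays);
idx = cell(nDays, 1);
Data_PM25 = cell(nDays, 1);
for k = 1:nDays
    idx{k} = find(grp == k);       % rows of Date3 that fall on day k
    Data_PM25{k} = Data2(idx{k});  % data recorded on day k
end
```

Because uniqueDays is kept alongside the cells, uniqueDays(k) tells you which day idx{k} and Data_PM25{k} belong to, even when some calendar days have no data.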

Related

MATLAB drop observations from a timetable not contained in another timetable

I have two timetables, each of them have 4 columns, where the first 2 columns are of my particular interest. The first column is a date and the second is an hour.
How can I know which observations (by date and hour) are in timetable 1 but not in timetable 2 and, therefore, drop those observations from timetable 1?
For example, just by looking I realized that timetable 1 included the day 25/05/2015 with hours 1 and 2, but timetable 2 did not include them, so I would like to drop those observations from timetable 1.
I tried using the command groups_timetable1 = findgroups(timetable1.Date,timetable1.Hour); but unfortunately it does not help much in distinguishing between observations.
Thank you!
Call ismember to find the elements of one data set in another.
To match whole records (rows) against another set of composite records, call ismember(..., 'rows').
For example:
baseline=[
100, 2.1
200, 7.5
120, 11.0
];
isin=ismember(baseline,[200, 7.5],'rows');
pos=find(isin)
If you have date/time strings or datetime objects, convert them to numerical values first, for example by calling datenum or posixtime.
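Applied to the date-and-hour case, the idea might look like the sketch below. The key matrices and values are hypothetical stand-ins for the first two timetable columns, assumed to be already converted to numbers (serial date number and hour).

```matlab
% Hypothetical keys: [serial date number, hour] for each observation
d25 = datenum(2015, 5, 25);
d26 = datenum(2015, 5, 26);
keys1   = [d25 1; d25 2; d26 1; d26 2];   % observations in table 1
values1 = [0.5; 0.7; 0.9; 1.1];           % made-up data column of table 1
keys2   = [d26 1; d26 2];                 % observations in table 2

% keep(i) is true when row i of keys1 also appears as a row of keys2
keep = ismember(keys1, keys2, 'rows');

% Drop the observations of table 1 that table 2 does not contain
keys1   = keys1(keep, :);
values1 = values1(keep);
```

The same logical mask can be used to index the rows of the original table, so the remaining columns stay aligned with the keys.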
You can use the timetable method innerjoin to do this. Like so:
% Fabricate some data
dates1 = datetime(2015, 5, ones(10,1));
hours1 = (1:10)';
timetable1 = timetable(dates1(:), hours1, rand(10,1), rand(10,1), ...
'VariableNames', {'Hour', 'Price', 'Volume'});
% Subselect a few rows for timetable2
timetable2 = timetable1([1:3, 6:10],:);
% Use innerjoin to pick rows where Time & Hour intersect:
innerjoin(timetable1, timetable2, 'Keys', {'Time', 'Hour'})
By default, the result of innerjoin contains the table variables from both input tables - that may or may not be what you want.

Matlab regexp split time array and save time output for plotting

I'm trying to loop through an array of dates/times in matlab, split each column using regexp with the following delimiters ('/' or ':' or '.'), and store each column separately as year, day, hour, min, sec, ss, respectively. Ultimately I'm trying to turn this array of Julian dates and times into a plot-able format in matlab. So far I've been able to loop through my array called 'time' and create a new 1x6 cell called 'clean2_time' which splits each row into 6 columns (year, day, hour, min, sec, ss) based on the delimiters '/' ':' and '.'. My issue is that the loop overwrites 'clean2_time' every iteration and I am left with only the final 1x6 time stamp for the last row. I have tried creating a new variable of all zeros 'z' and setting 'clean2_time' equal to z but have had no luck.
Sample of 'time':
'2013/231/21:38:09.856619'
'2013/231/21:38:09.955640'
'2013/231/21:38:10.156685'
'2013/231/21:38:10.356550'
'2013/231/21:38:10.556770'
'2013/231/21:38:10.756565'
'2013/231/21:38:10.955627'
'2013/231/21:38:11.256588'
'2013/231/21:38:11.556649'
'2013/231/21:38:11.955597'
'2013/231/21:38:12.356627'
'2013/231/21:38:12.856557'
'2013/231/21:38:13.356558'
'2013/231/21:38:14.156530'
'2013/231/21:38:14.970500'
'2013/231/21:38:16.256545'
'2013/231/21:38:16.266736'
'2013/231/21:38:18.156398'
Code I've tried so far:
z=zeros(size(time,1),6);
for i = 1:size(time,1) % for i = 1 to 5922
clean2_time = regexp(time{i,1}, '[/:.]', 'split');
z{i,1} = clean2_time(i,1)
z{i,2} = clean2_time(i,2)
z{i,3} = clean2_time(i,3)
z{i,4} = clean2_time(i,4)
z{i,5} = clean2_time(i,5)
z{i,6} = clean2_time(i,6)
end
You are on the right track, however, you don't need the for loop.
Simply doing this would suffice:
clean2_time=regexp(time, '[/:.]', 'split');
Then clean2_time is a cell array in which every row contains another 1x6 cell array. You can then access the different values with clean2_time{row}{column}. If you really want clean2_time to be an nx6 numerical matrix instead of this cell array of strings, simply use this to reshape:
clean2_time=cellfun(@str2num,vertcat(clean2_time{:}))
Alternatively, if you keep the loop, preallocate a cell array and fill one row per iteration:
clean2_time=cell(size(time,1),6);
for i = 1:size(time,1) % for i = 1 to 5922
clean2_time(i,:)=regexp(time{i,1}, '[/:.]', 'split');
end
clean2_time(i,:) indexes the i-th row of the cell array.
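To get from the split fields to something plottable, one option is to turn them into serial date numbers. The sketch below assumes the second field is the day of the year and the last field is always a six-digit fraction of a second (microseconds), as in the sample; the two-row input is just an excerpt of that sample.

```matlab
% Two rows taken from the sample in the question
time = {'2013/231/21:38:09.856619'; '2013/231/21:38:10.156685'};

% Split every row at '/', ':' or '.', then stack into an n-by-6 cell
parts = regexp(time, '[/:.]', 'split');
parts = vertcat(parts{:});

% Convert all fields to numbers: [year, dayOfYear, hour, min, sec, microsec]
nums = str2double(parts);

% Seconds including the fractional part (assumes six fractional digits)
secs = nums(:,5) + nums(:,6) ./ 1e6;

% datenum rolls days past the end of January into later months,
% so "January 231" becomes day 231 of the year
t = datenum(nums(:,1), 1, nums(:,2), nums(:,3), nums(:,4), secs);
```

The vector t can then be used as the x-axis of a plot, for example together with datetick.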

MATLAB - Count Number of Entries in Each Year

I have a .mat file that contains data from the years 2006-2100. Each year, there is a different number of lines. I need to count how many lines are 2006, how many are 2007, etc.
The set up, by column, is: Year, Month, Day, Lat, Long
I just want to count the number of rows containing the same Year entry and get back an array containing that info.
I'm thinking a for or while loop should work, but I don't know how to write it.
If we assume your data are in a numeric matrix, you can just do:
num_lines2006 = sum(data(:,1)==2006);
data2006 = data(data(:,1)==2006,:);
If you want to add a column with number of rows for corresponding year, here is a solution with a loop:
for k=size(data,1):-1:1
num_year(k,1) = sum(data(:,1)==data(k,1));
end
data = [data num_year];
Here is a solution without loop:
[unq_year,~,idx] = unique(data(:,1),'stable');
num_year = grpstats(data(:,1),idx,@numel); % count rows per group, using idx as the grouping variable
data = [data num_year(idx)];
To count numeric entries, you may want to use histc
years = unique(data(:,1));
counts = histc(data(:,1),years);
Since you just want to count the number of rows you could just write something simple like:
years = unique(data(:, 1));
counts = arrayfun(@(year) nnz(data(:, 1) == year), years);
years contains the unique years, and counts the number of times each one is found.
You could also use a one-liner inspired by Jonas' answer:
[counts, years] = hist(data(:,1), unique(data(:,1))');
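Another loop-free sketch, which needs no toolbox functions, combines unique with accumarray. The small data matrix below is made up for illustration and follows the column layout from the question (Year, Month, Day, Lat, Long).

```matlab
% Hypothetical data: [Year, Month, Day, Lat, Long]
data = [2006 1 1 10 20;
        2006 2 1 11 21;
        2007 1 1 12 22;
        2009 5 5 13 23;
        2007 3 3 14 24];

% years(k) is the k-th distinct year; grp(i) is the group number of row i
[years, ~, grp] = unique(data(:,1));

% Add 1 to the counter of each row's group
counts = accumarray(grp, 1);

% Two-column summary: year, number of rows
result = [years counts];
```

Years that never occur in the data (2008 in this example) simply do not appear in the summary.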

Vectorising Date Array Calculations

I simply want to generate a series of dates 1 year apart from today.
I tried this
CurveLength=30;
t=zeros(CurveLength);
t(1)=datestr(today);
x=2:CurveLength-1;
t=addtodate(t(1),x,'year');
I am getting two errors so far?
??? In an assignment A(I) = B, the number of elements in B and
Which I am guessing is related to the fact that the date is a string, but when I modified the string to be the same length as the date dd-mmm-yyyy i.e. 11 letters I still get the same error.
Lastly I get the error
??? Error using ==> addtodate at 45
Quantity must be a numeric scalar.
Which seems to suggest that the function can't be vectorised? If this is true, is there any way to tell in advance which functions can be vectorised and which cannot?
To add n years to a date x, you do this:
y = addtodate(x, n, 'year');
However, addtodate requires the following:
x must be a scalar number, not a string.
n must be a scalar number, not a vector.
Hence the errors you get.
I suggest you use a loop to do this:
CurveLength = 30;
t = zeros(CurveLength, 1);
t(1) = today; % # Whatever today equals to...
for ii = 2:CurveLength
t(ii) = addtodate(t(1), ii - 1, 'year');
end
Now that you have all your date values, you can convert it to strings with:
datestr(t);
And here's a neat one-liner using arrayfun:
datestr(arrayfun(@(n)addtodate(today, n, 'year'), 0:CurveLength-1))
If your sequence has a constant known start, you can use datenum in the following way:
t = datenum( startYear:endYear, 1, 1)
This works fine also with months, days, hours etc. as long as the sequence doesn't run into negative numbers (like 1:-1:-10). Then months and days behave in a non-standard way.
Here is a solution without a loop (possibly faster):
CurveLength=30;
t=datevec(repmat(now(),CurveLength,1));
x=[0:CurveLength-1]';
t(:,1)=t(:,1)+x;
t=datestr(t)
datevec splits the date into six columns [year, month, day, hour, min, sec]. So if you want to change e.g. the year you can just add or subtract from it.
If you want to change the month just add to t(:,2). You can even add numbers > 12 to the month and it will increase the year and month correctly if you transfer it back to a datenum or datestr.
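The same idea can be sketched with datenum alone, since it accepts a vector of years together with a scalar month and day. The fixed start date below is a hypothetical stand-in so the result is reproducible; datevec(now) would give today's date instead.

```matlab
CurveLength = 5;

% Hypothetical fixed start date, split into [year month day hour min sec]
startVec = datevec(datenum(2012, 3, 15));

% One year value per curve point: start year, start year + 1, ...
yrs = startVec(1) + (0:CurveLength-1)';

% Vector of years, scalar month and day: one serial date per year
t = datenum(yrs, startVec(2), startVec(3));

% Char matrix with one formatted date per row
dates = datestr(t, 'dd-mmm-yyyy');
```

Each row of dates is then one anniversary of the start date, and t stays numeric for further calculations.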

Counting values by day/hour with timeseries in MATLAB

So, I'm beginning to use timeseries in MATLAB and I'm kinda stuck.
I have a list of timestamps of events which I imported into MATLAB. It's now a 3000x25 array which looks like
2000-01-01T00:01:01+00:00
2000-01-01T00:01:02+00:00
2000-01-01T00:01:03+00:00
2000-01-01T00:01:04+00:00
As you can see, each event was recorded by date, hour, minute, second, etc.
Now, I would like to count the number of events by date, hour, etc. and then do various analyses (regression, etc.).
I considered creating a timeseries object for each day, but considering the size of the data, that's not practical.
Is there any way to manipulate this array such that we have "date: # of events"?
Perhaps there's just a simpler way to count events using timeseries?
As others have suggested, you should convert the string dates to serial date numbers. This makes it easy to work with the numeric data.
An efficient way to count number of events per interval (days, hours, minutes, etc...) is to use functions like HISTC and ACCUMARRAY. The process will involve manipulating the serial dates into units/format required by such functions (for example ACCUMARRAY requires integers, whereas HISTC needs to be given the bin edges to specify the ranges).
Here is a vectorized solution (no loop) that uses ACCUMARRAY to count the number of events. This is a very efficient function, even on large inputs. At the beginning I generate some sample data of 5000 timestamps unevenly spaced over a period of 4 days. You obviously want to replace it with your own:
%# let's generate some random timestamps between two points (unevenly spaced)
%# 5000 timestamps over a period of 4 days
dStart = datenum('2000-01-01'); % inclusive
dEnd = datenum('2000-01-5'); % exclusive
t = sort(dStart + (dEnd-dStart).*rand(5000,1));
%#disp( datestr(t) )
%# shift values, by using dStart as reference point
dRange = (dEnd-dStart);
tt = t - dStart;
%# number of events by day/hour/minute
numEventsDays = accumarray(fix(tt)+1, 1, [dRange*1 1]);
numEventsHours = accumarray(fix(tt*24)+1, 1, [dRange*24 1]);
numEventsMinutes = accumarray(fix(tt*24*60)+1, 1, [dRange*24*60 1]);
%# corresponding datetime range/interval label
days = cellstr(datestr(dStart:1:dEnd-1));
hours = cellstr(datestr(dStart:1/24:dEnd-1/24));
minutes = cellstr(datestr(dStart:1/24/60:dEnd-1/24/60));
%# display results
[days num2cell(numEventsDays)]
[hours num2cell(numEventsHours)]
[minutes num2cell(numEventsMinutes)]
Here is the output for the number of events per day:
'01-Jan-2000' [1271]
'02-Jan-2000' [1258]
'03-Jan-2000' [1243]
'04-Jan-2000' [1228]
And an extract of the number of events per hour:
'02-Jan-2000 09:00:00' [50]
'02-Jan-2000 10:00:00' [54]
'02-Jan-2000 11:00:00' [53]
'02-Jan-2000 12:00:00' [74]
'02-Jan-2000 13:00:00' [49]
'02-Jan-2000 14:00:00' [59]
similarly for minutes:
'03-Jan-2000 08:54:00' [1]
'03-Jan-2000 08:55:00' [1]
'03-Jan-2000 08:56:00' [1]
'03-Jan-2000 08:57:00' [0]
'03-Jan-2000 08:58:00' [0]
'03-Jan-2000 08:59:00' [0]
'03-Jan-2000 09:00:00' [1]
'03-Jan-2000 09:01:00' [2]
You can convert those timestamps to a number with datenum:
A serial date number represents the whole and fractional number of days from a specific date and time, where datenum('Jan-1-0000 00:00:00') returns the number 1. (The year 0000 is merely a reference point and is not intended to be interpreted as a real year in time.)
This way, it's easier to check where a period starts and ends. E.g., the week you're looking for starts at x and ends at x+6.999...; all you have to do to find events in that period is check whether the datenum value is at least x and less than x+7:
week_x_events = find(dn_timestamp>=x & dn_timestamp<x+7)
The difficulty is in converting your timestamp to datenum acceptable format, which is doable using regexp, good luck!
I don't know what +00:00 means (maybe time zone?), but you can simply convert your string timestamps into numerical format:
>> t = datenum('2000-01-01T00:01:04+00:00', 'yyyy-mm-ddTHH:MM:SS')
t =
7.3049e+005
>> datestr(t)
ans =
01-Jan-2000 00:01:04
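Putting the conversion and the counting together for the "date: # of events" case might look like the sketch below. It assumes the timestamps are stored as a char matrix with one 25-character timestamp per row, as described in the question, and that every row carries the same +00:00 offset so it can be cut off; the three sample rows are made up.

```matlab
% Hypothetical input: one timestamp per row of a char matrix
stamps = ['2000-01-01T00:01:01+00:00';
          '2000-01-01T23:59:59+00:00';
          '2000-01-02T12:00:00+00:00'];

% Keep the first 19 characters (date and time) and drop the offset
t = datenum(stamps(:, 1:19), 'yyyy-mm-ddTHH:MM:SS');

% The integer part of a serial date number identifies the calendar day
[days, ~, grp] = unique(floor(t));

% Number of events on each day that occurs in the data
counts = accumarray(grp, 1);

% Readable labels for the counted days
labels = cellstr(datestr(days, 'yyyy-mm-dd'));
```

Counting per hour works the same way with floor(t*24) in place of floor(t).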