MATLAB: only keep days where 24 values exist

Say that I have a dataset:
Jday = datenum('2009-01-01 00:00','yyyy-mm-dd HH:MM'):1/24:...
datenum('2009-01-05 23:00','yyyy-mm-dd HH:MM');
DateV = datevec(Jday);
DateV(4,:) = [];
DateV(15,:) = [];
DateV(95,:) = [];
Dat = rand(size(DateV,1),1); % one value per remaining timestamp
How can I remove all of the days that have fewer than 24 measurements? For example, the first day has only 22 measurements, so I would need to remove that entire day. How could I repeat this for the whole array?

A quick solution is to group by year, month and day with unique(), count the observations per day with accumarray(), and exclude the days with fewer than 24 observations using two steps of logical indexing:
% Count observations per day
[unDate,~,subs] = unique(DateV(:,1:3),'rows');
counts = [unDate accumarray(subs,1)]
counts =
2009 1 1 22
2009 1 2 24
2009 1 3 24
2009 1 4 24
2009 1 5 23
Then apply the criterion to the counts to get a logical index:
% index only those that meet criteria
idxC = counts(:,end) == 24
idxC =
0
1
1
1
0
% keep those which meet criteria (optional, for visual inspection)
counts(idxC,:)
ans =
2009 1 2 24
2009 1 3 24
2009 1 4 24
Finally, find the members of Dat that fall into the selected days with a second round of logical indexing through ismember():
idxDat = ismember(subs,find(idxC))
Dat(idxDat,:)
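Putting the steps together, a minimal sketch that keeps DateV and Dat aligned might look like this. It assumes Dat has exactly one row per remaining row of DateV; the names gone, keep, DateVfull and DatFull are illustrative, and counts(subs) is used here as a shortcut for the ismember() step.

```matlab
% Hourly timestamps for 2009-01-01 .. 2009-01-05 (120 rows)
DateV = datevec(datenum(2009, 1, 1, (0:119).', 0, 0));
Dat   = (1:120).';                 % placeholder data, one value per timestamp

% Drop three timestamps (two on day 1, one on day 5) from both arrays
gone = [4 16 97];
DateV(gone, :) = [];
Dat(gone)      = [];

% Count observations per calendar day
[~, ~, subs] = unique(DateV(:, 1:3), 'rows');
counts = accumarray(subs, 1);

% Row-wise mask: true where the row's day has all 24 observations
keep = counts(subs) == 24;

DateVfull = DateV(keep, :);
DatFull   = Dat(keep);
```

Because keep has one entry per row, the same mask can be applied to any other array that shares the row order of DateV.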

This is a rather long answer, but I think it should be useful. I would do this using containers.Map. There may be a faster way, but this one works:
Jday = datenum('2009-01-01 00:00','yyyy-mm-dd HH:MM'):1/24:...
datenum('2009-01-05 23:00','yyyy-mm-dd HH:MM');
DateV = datevec(Jday);
DateV(4,:) = [];
DateV(15,:) = [];
DateV(95,:) = [];
% create a map
dateMap = containers.Map();
% count measurements in each date (i.e. first three columns of DateV)
for rowi = 1:size(DateV,1)
    dateRow = DateV(rowi, :);
    dateStr = num2str(dateRow(1:3));
    if ~isKey(dateMap, dateStr)
        % initialize the Map entry for this date with 1 measurement
        % (i.e. start our counter of measurements)
        dateMap(dateStr) = 1;
        continue;
    end
    % increment the measurement counter for this date
    dateMap(dateStr) = dateMap(dateStr) + 1;
end
% get the dates
dateStrSet = keys(dateMap);
for keyi = 1:numel(dateStrSet)
    dateStr = dateStrSet{keyi};
    % get the number of measurements on a given date
    numOfmeasurements = dateMap(dateStr);
    % if fewer than 24, do something about it, e.g. save the date
    % for later removal from DateV
    if numOfmeasurements < 24
        fprintf(1, 'This date has fewer than 24 measurements: %s\n', dateStr);
    end
end
The result is:
This date has fewer than 24 measurements: 2009 1 1
This date has fewer than 24 measurements: 2009 1 5
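The loop above only reports the incomplete days. As a rough sketch of the removal step, the per-row keys can be stored while counting and then looked up again to build a row mask. The key format 'yyyy-mm-dd' and the names keysPerRow, keep and DateVfull are my own choices for illustration, not part of the answer.

```matlab
% Same test data as in the question: 120 hourly rows with three removed
DateV = datevec(datenum(2009, 1, 1, (0:119).', 0, 0));
DateV([4 16 97], :) = [];

% Map from a date string to the number of rows seen for that date
dateMap = containers.Map('KeyType', 'char', 'ValueType', 'double');
n = size(DateV, 1);
keysPerRow = cell(n, 1);
for i = 1:n
    k = sprintf('%04d-%02d-%02d', DateV(i, 1:3));   % e.g. '2009-01-01'
    keysPerRow{i} = k;
    if isKey(dateMap, k)
        dateMap(k) = dateMap(k) + 1;
    else
        dateMap(k) = 1;
    end
end

% Keep only rows whose date reached the full 24 measurements
keep = cellfun(@(k) dateMap(k) == 24, keysPerRow);
DateVfull = DateV(keep, :);
```

This does the same job as the unique()/accumarray() answer, just with an explicit loop, so it will likely be slower on large arrays.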

MATLAB get date from week number and year

In Matlab 2016a, I have a vector of week numbers:
weekNum = [20 21 22 23];
Is there a way to convert the week numbers to their respective dates in 2018 starting on Sunday?
You can do this using the functions days and weekday. Starting with the first day of the year, you can subtract days to find the previous Sunday (if it isn't Sunday already), then multiply your vector weekNum by 7 to add offsets to that date for each desired week:
weekNum = [20 21 22 23];
t = datetime(2018, 1, 1, 'Format', 'dd-MMMM-yyyy'); % First day of the year
t = t - days(weekday(t)-1) + days((weekNum-1).*7);
And this gives you the following array of Sunday datetimes:
t =
1×4 datetime array
13-May-2018 20-May-2018 27-May-2018 03-June-2018
And you can confirm it works using week:
>> week(t)
ans =
20 21 22 23 % Same as weekNum
NOTE: The first week is usually defined as the week that includes January 1st, and can include days from the previous year. This means you could end up with a date in 2017 for a Sunday in week 1.
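If datetime is not available (it was introduced in R2014b), the same arithmetic can be sketched with serial date numbers. This assumes the same week definition as above, i.e. week 1 is the week containing January 1 and weeks start on Sunday; the variable names are illustrative.

```matlab
weekNum = [20 21 22 23];
jan1 = datenum(2018, 1, 1);                 % serial date number of January 1
week1Sunday = jan1 - (weekday(jan1) - 1);   % weekday: 1 = Sunday, ..., 7 = Saturday
sundays = week1Sunday + 7*(weekNum - 1);    % one serial date per requested week
disp(datestr(sundays, 'dd-mmm-yyyy'))
```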
% 1st day of year
D = datetime(2018,1,1);
% datetime objects representing the Sunday of the specified weeks
Dt = D + (20:23)*7-7-(weekday(D)-1);
This assumes that week 1 is defined to be the first week for which at least one day is in 2018. If you use a different definition, adjust your indexes accordingly.
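What I love about MATLAB is that you can sometimes delegate computational tasks to the underlying Java framework, if you know it a little. Note that the week numbering and the conversion from getTime() depend on the JVM's default locale and time zone, so verify the output on your system: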
year = 2018;
week = [20 21 22 23];
dow = 1;
sep = repmat('-',4,1);
chain = [repmat(num2str(year),4,1) sep num2str(week.','%d') sep repmat(num2str(dow),4,1)];
df = java.text.SimpleDateFormat('yyyy-w-u');
for i = 1:4
date = 719529 + (df.parse(chain(i,:)).getTime() / 1000 / 3600 / 24);
disp(datestr(date,'dd-mm-yyyy'));
end
Output:
13-05-2018
20-05-2018
27-05-2018
03-06-2018

Make a timeline from year and day numbers

I want to make a timeline.
The below code extracts information from columns A and B of some Excel workbooks. In Column A are years, column B contains the day number (for that year) when an event happened.
My question is: How can I plot this with Station1, Station2 etc. on the Y-axis, and year on the X-axis? I want the graph to show a point on the day (and the right year) where my Excel sheet has data.
num = xlsread('station1.xlsx', 1, 'A:B');
num3 = xlsread('station2.xlsx', 1, 'A:B');
num4 = xlsread('station3.xlsx', 1, 'A:B');
num5 = xlsread('station5.xlsx', 1, 'A:B');
Example data:
num = 2000 193
2000 199
2000 220
2000 228
2000 241
2000 244
2000 250
2000 257
2016 287
2016 292
2016 294
2016 300
Use datetime and caldays to convert your year / day-of-year data into actual dates (subtracting 1 because day 1 of the year is January 1 itself):
dnum = datetime(num(:,1),1,1) + caldays(num(:,2) - 1);
% dnum = '11-Jul-2000'
% '17-Jul-2000'
% '07-Aug-2000'
% ...
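A quick way to sanity-check the conversion is to round-trip it: day(d,'dayofyear') should give back the original day numbers. The sketch below assumes the day numbers are 1-based (January 1 is day 1) and uses a few rows from the example data.

```matlab
num = [2000 193; 2000 199; 2000 220; 2016 287];   % [year, day of year]

% January 1 of each year plus (day number - 1) calendar days
d = datetime(num(:,1), 1, 1) + caldays(num(:,2) - 1);

% Round trip: recover the day of the year from the resulting dates
doy = day(d, 'dayofyear');
```

If doy does not match num(:,2), the day numbers in the sheet probably use a different convention (for example 0-based).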
Plot a line with marks on every date:
hold on % to plot multiple lines
plot(dnum, 1*ones(size(dnum)), 'x-') % Change the 1 to the y-axis value
plot(dnum2, 2*ones(size(dnum2)), 'x-') % Line at y=2 with other dates dnum2
hold off
[Figure: output plot, zoomed in on the x-axis to show the year 2000 dates]
If your files are named as in your example, then you can replace your whole code with a loop to avoid declaring loads of num variables and calling plot over many lines:
stations = [1 2 3 5]; % station numbers from the file names in the question
figure; hold on;
for k = 1:numel(stations)
    num = xlsread(['station', num2str(stations(k)), '.xlsx'], 1, 'A:B');
    dnum = datetime(num(:,1),1,1) + caldays(num(:,2) - 1); % day 1 = January 1
    plot(dnum, k*ones(size(dnum)), 'x-'); % one horizontal line per station
end
hold off
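To get the station names on the y-axis, the tick labels can be set explicitly. Here is a small self-contained sketch with made-up data in place of the Excel files; stationData and labels are hypothetical, and the day numbers are assumed to be 1-based.

```matlab
% Hypothetical [year, day of year] data for two stations
stationData = { [2000 193; 2000 199; 2000 220], [2000 195; 2016 287] };
labels = {'Station1', 'Station2'};

figure
for k = 1:numel(stationData)
    num = stationData{k};
    d = datetime(num(:,1), 1, 1) + caldays(num(:,2) - 1);
    plot(d, k*ones(size(d)), 'x-');   % station k drawn at y = k
    hold on                           % keep earlier stations on the axes
end
hold off

% One tick per station, labelled with the station name
set(gca, 'YTick', 1:numel(labels), 'YTickLabel', labels);
ylim([0 numel(labels) + 1]);
```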

Generating consistent arrays of datenums

Whenever I need to plot data vs time I generate the corresponding array of datenums so I can visualize a timeline on the plot by calling datetick.
Let's suppose I need all the datenums with a 1 minute interval between hours h_1 and h_2. This is how I would generate my array of datenums vd:
h_1 = [01 00]; % 01:00 AM
h_2 = [17 00]; % 17:00 PM
HH_1 = h_1(1); % Hour digits of h_1
MM_1 = h_1(2); % Minute digits of h_1
HH_2 = h_2(1); % Hour digits of h_2
MM_2 = h_2(2); % Minute digits of h_2
% Vector of 01:00 - 17:00 with 1 minute increments (24 hours a day, 1440 minutes a day)
vd = HH_1/24+MM_1/1440:1/1440:HH_2/24+MM_2/1440;
I learnt this technique reading this answer.
On the other hand, whenever I wish to generate a single datenum, I use the datenum function like this:
d = datenum('14:20','HH:MM');
Since 14:20 lies within the interval 01:00 - 17:00, its datenum should too! Unfortunately this does not work as expected, since the number assigned to d is radically different from the values contained in vd. I think I might be missing something, probably to do with setting up a reference date or similar.
So what would be the appropriate way to generate datenums consistently?
The reason is that the datenum function gives you a number of days from January 0, 0000.
So when calling
d = datenum('14:20','HH:MM');
you get a number of about 735965 (when no date is given, datenum fills in January 1 of the current year), while the numbers in your vd array are between 0 and 1. To subtract the days between January 0, 0000 and that date, you can write
d = datenum('14:20','HH:MM') - datenum('00:00','HH:MM');
Then your code will look like
h_1 = [01 00]; % 01:00 AM
h_2 = [17 00]; % 17:00 PM
HH_1 = h_1(1); % Hour digits of h_1
MM_1 = h_1(2); % Minute digits of h_1
HH_2 = h_2(1); % Hour digits of h_2
MM_2 = h_2(2); % Minute digits of h_2
vd = HH_1/24 + MM_1/1440 : 1/1440 : HH_2/24+MM_2/1440;
d = datenum('14:20','HH:MM') - datenum('00:00','HH:MM');
display(d);
display(vd(801));
And the result:
d = 0.5972
ans = 0.5972
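Another way to keep things consistent, sketched below, is to put both the vector and the single value on the same reference day and build them from integer minute counts. The reference day refDay is an arbitrary choice of mine; mod(...,1) is shown as an alternative way to strip the date part from a datenum.

```matlab
refDay = datenum(2015, 1, 1);          % arbitrary reference day (assumption)

% 01:00 to 17:00 in 1-minute steps, built from integer minutes of the day
vd = refDay + (60:1020)/1440;

% 14:20 on the same reference day
d = refDay + (14*60 + 20)/1440;

% Position of d inside vd
[~, idx] = min(abs(vd - d));

% Alternative: keep only the time-of-day fraction of a parsed datenum
frac = mod(datenum('14:20', 'HH:MM'), 1);
```

With a real date in vd, datetick can label the axis directly, and values parsed with datenum can be shifted onto the same day with refDay + frac.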

Reading date vector with different formats: Matlab

Consider a cell array of dates date = { 1000000 x 1 } such that it has dates in different formats.
date = 27-01-2009
28-Mar-2003
.
.
.
21-02-2003 06:35:20
21-02-2003 06:35:20.42
.
.
and so on
How do I get a 1000000x3 matrix A = [ year month day ] from date?
Approach 1
date = {
'27-01-2009'
'28-Mar-2003'
'21-02-2003 06:35:20'
'21-02-2003 06:35:20.42'}
date_double_arr = datevec(date,'dd-mm-yyyy')
out = date_double_arr(:,1:3) %// desired output
Output -
out =
2009 1 27
2003 3 28
2003 2 21
2003 2 21
Approach 2
In case of inconsistencies between the date-month-year part and the time part, one might want to separate out the former and use it to get the final Nx3 array like so -
t1 = cellfun(@(x) strsplit(x,' '), date,'uni',0)
t2 = cellfun(@(x) x(1), t1)
t3 = datevec(t2,'dd-mm-yyyy')
out = t3(:,1:3) %// desired output
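If you would rather not rely on datevec coping with mixed numeric and named months, a regexp-based sketch is another option. It assumes every entry starts with day-month-year separated by hyphens and that month names are English three-letter abbreviations; tok, monthNames and A are illustrative names.

```matlab
dates = {'27-01-2009'
         '28-Mar-2003'
         '21-02-2003 06:35:20'
         '21-02-2003 06:35:20.42'};

% Capture day, month (digits or name) and year from the start of each string
tok = regexp(dates, '^(\d{1,2})-(\w+)-(\d{4})', 'tokens', 'once');

monthNames = {'jan','feb','mar','apr','may','jun', ...
              'jul','aug','sep','oct','nov','dec'};

A = zeros(numel(dates), 3);
for i = 1:numel(dates)
    dd   = str2double(tok{i}{1});
    mStr = tok{i}{2};
    mm   = str2double(mStr);                 % NaN when the month is a name
    if isnan(mm)
        mm = find(strcmpi(mStr(1:3), monthNames));
    end
    yy = str2double(tok{i}{3});
    A(i, :) = [yy mm dd];                    % [year month day]
end
```

Any time part after the date is simply ignored, since the pattern only looks at the beginning of each string.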

How to delete specific days of week from MATLAB timeseries or financial timeseries objects?

How to delete specific days of week (Mondays for instance) from MATLAB timeseries or financial timeseries objects?
This is what I came up with.
function [ ret_fts ] = deleteWeekDays( fts, dayName )
    tsz = size(fts);
    sz = tsz(1);
    for i=1:sz
        mat=fts2mat(fts(i),1);
        [dnum, dnam] = weekday(mat(1));
        if strcmp(dnam, dayName) % compare day names, e.g. 'Mon'
            fts(i) = NaN;
        end
    end
    ret_fts=fts;
end
Here are some ideas, though this only removes a specific date, not a specific day of the week. There doesn't seem to be a clever built-in way to do that, so you may have to generate the vector of dates to delete yourself:
% Set time series
ts = timeseries([3 6 8 0 10 3 6 8 0 10 3 6 8 0 10 3 6 8 0 10 3 6 8 0 10])
ts.Data
tsc = tscollection(ts);
tsc.TimeInfo.Units = 'days';
tsc.TimeInfo.StartDate = '10/27/2005 07:05:36';
% Plot
ts.DataInfo.Interpolation = tsdata.interpolation('zoh');
tsc1.TimeInfo.Format='DD:HH:MM';
figure
plot(ts)
% Change the date-string format of the time vector.
tsc.TimeInfo.Format = 'mm/dd/yy';
tscTime = getabstime(tsc)
% Spot the days you're interested in, get indices and replace them by NaN
% in ts.
dayToDelete = '11/11/05';
idx = strcmp(tscTime, dayToDelete);
ts.Data(idx) = NaN;
% Plot after deleting the specific date
ts.DataInfo.Interpolation = tsdata.interpolation('zoh');
figure
plot(ts)
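For a whole day of the week rather than a single date, weekday() can build the mask directly from serial date numbers. The sketch below works on plain vectors rather than on a timeseries object; t, x and isMonday are illustrative names, and the daily time vector starting on 27 October 2005 is an assumption chosen to mirror the start date above.

```matlab
% Daily serial dates from 27-Oct-2005 to 20-Nov-2005, with placeholder data
t = (datenum(2005, 10, 27) : datenum(2005, 11, 20)).';
x = (1:numel(t)).';

% weekday returns 1 for Sunday, 2 for Monday, ..., 7 for Saturday
isMonday = weekday(t) == 2;

% Either blank out the Mondays ...
xNaN = x;
xNaN(isMonday) = NaN;

% ... or drop them from both vectors
tKept = t(~isMonday);
xKept = x(~isMonday);
```

With a timeseries whose samples line up with t, the same logical mask could presumably be applied as ts.Data(isMonday) = NaN, in the same way the single-date index idx is used above.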