How to convert 3-dimensional daily data into monthly? - matlab

I have a 3-dimensional data matrix for each of ten years (2001-2010). In each file the data matrix is 180 x 360 x 365/366 (latitude x longitude x daily rainfall). For example: 2001: 180x360x365, 2002: 180x360x365, 2003: 180x360x365, 2004: 180x360x366, ..., 2010: 180x360x365.
Now I want to convert this daily rainfall into monthly rainfall (by summing) and combine all the years in one file.
So my final output will be 180x360x120 (latitude x longitude x monthly rainfall over the ten years).

It might be time consuming, but I suppose you could use some form of loop to iterate over the data in each year on a monthly basis, pick out the appropriate number of data points for each month, and then add that to a final data set. Something to the effect of the (very rough) code below might work:
years = 2001:2010;
months = {'Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'};
finalDataset = [];
for i = 1:length(years)
    yr = years(i);
    % loadYearData is a placeholder: load the 180x360x365/366 array for that year
    yearData = loadYearData(yr);
    for j = 1:length(months)
        switch months{j}
            case {'Jan','Mar','May','Jul','Aug','Oct','Dec'}
                days = 31;
            case {'Apr','Jun','Sep','Nov'}
                days = 30;
            case 'Feb'
                days = 28;
                if yr == 2004 || yr == 2008 % leap years within 2001-2010
                    days = 29;
                end
        end
        monthData = yearData(:,:,1:days);  % extract the days of this month
        yearData(:,:,1:days) = [];         % delete data already extracted
        monthSummed = sum(monthData, 3);   % monthly total, latitude x longitude preserved
        finalDataset = cat(3, finalDataset, monthSummed); % stack along the time dimension
    end
end
Apologies it's very shabby and I haven't included some of the indexing details, but I hope that helps in at least illustrating an idea? I'm also not entirely sure whether 'if' statements work within 'switch' statements, but the days amendment can be added elsewhere if not.

I am sure you can vectorise this to work faster, but it should do the job. Haven't tested properly
% range of years
years = 2000:2016;
leap_years = [2000 2004 2008 2012 2016];
% Generating random data
nr_of_years = numel(years);
rainfall_data = cell(nr_of_years, 1);
for i=1:nr_of_years
nr_of_days = 365;
if ismember(years(i), leap_years);
nr_of_days = 366;
end
rainfall_data{i} = rand(180, 360, nr_of_days);
end
The actual code you need is below
% fixed stuff
months = 12;
nr_of_days = [31 28 31 30 31 30 31 31 30 31 30 31];
nr_of_days_leap = [31 29 31 30 31 30 31 31 30 31 30 31];
% building vectors of month indices for days
month_indices = [];
month_indices_leap = [];
for i=1:months
month_indices_temp = repmat(i, nr_of_days(i), 1);
month_indices_leap_temp = repmat(i, nr_of_days_leap(i), 1);
month_indices = [month_indices; month_indices_temp];
month_indices_leap = [month_indices_leap; month_indices_leap_temp];
end
% the result will be stored here
% size taken from the first year (i is a stale loop index at this point)
result = zeros(size(rainfall_data{1}, 1), size(rainfall_data{1}, 2), months*nr_of_years);
for i=1:nr_of_years
% determining which indices to use depending if it is a leap year
month_indices_temp = month_indices;
if size(rainfall_data{i}, 3)==366
month_indices_temp = month_indices_leap;
end
% data for the current year
current_data = rainfall_data{i};
% this holds the data for current year
monthy_sums = zeros(size(rainfall_data{i}, 1), size(rainfall_data{i}, 2), months);
for j=1:months
monthy_sums(:,:,j) = sum(current_data(:,:,j==month_indices_temp), 3);
end
% putting it into the combined matrix
result(:,:,((i-1)*months+1):(i*months)) = monthy_sums;
end
You can probably achieve a more elegant solution using the built-in datetime, datestr and datenum, but I am not sure it would be much faster or shorter.
EDIT: An alternative using built in date functions
months = 12;
% where the result will be stored
result = zeros(size(rainfall_data{1}, 1), size(rainfall_data{1}, 2), months*nr_of_years);
for i=1:nr_of_years
current_data = rainfall_data{i};
% first day of the year
year_start_timestamp = datenum(datetime(years(i), 1, 1));
% holding current sums
monthy_sums = zeros(size(current_data, 1), size(current_data, 2), months);
% finding the month indices vector
datetime_obj = datetime(datestr(year_start_timestamp:(year_start_timestamp+size(current_data, 3)-1)));
month_indices = datetime_obj.Month;
% summing
for j=1:months
monthy_sums(:,:,j) = sum(current_data(:,:,j==month_indices), 3);
end
% result
result(:,:,((i-1)*months+1):(i*months)) = monthy_sums;
end
This 2nd solution took 1.45 seconds for me, compared to the 1.2 seconds for the first solution. The results were the same for both cases. Hope this helps.
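The first answer mentions that the inner loop could probably be vectorised. A minimal sketch of one way to do that, under stated assumptions: summing the days of each month is the same as multiplying the daily data (reshaped to cells x days) by a days x 12 indicator matrix. The small 4x5 grid and the all-ones data are stand-ins so the result is easy to check; they are not the asker's data, and no timing claim is made.

```matlab
yrs = 2001:2010;                 % years from the question
sz  = [4 5];                     % stand-in grid; the real one would be [180 360]
result = zeros(sz(1), sz(2), 12*numel(yrs));
for k = 1:numel(yrs)
    % number of days in this year (365 or 366)
    nDays = datenum(yrs(k)+1, 1, 1) - datenum(yrs(k), 1, 1);
    daily = ones(sz(1), sz(2), nDays);            % stand-in daily rainfall
    % month number (1..12) of every day of the year
    [~, mo] = datevec(datenum(yrs(k), 1, 1) + (0:nDays-1).');
    % nDays x 12 indicator: entry (d, m) is 1 when day d falls in month m
    indicator = double(bsxfun(@eq, mo(:), 1:12));
    % (cells x nDays) * (nDays x 12) gives the monthly sums per grid cell
    monthly = reshape(daily, [], nDays) * indicator;
    result(:, :, (k-1)*12 + (1:12)) = reshape(monthly, sz(1), sz(2), 12);
end
```

With all-ones input, each slice of result simply holds the number of days in that month, which makes the month boundaries easy to verify before switching to real data.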

Related

Count Days in Months (MATLAB)

I want to know the number of days of each month from 1982-01-01 to 2015-12-31.
I tried some code from the MATLAB help. So far I have written this:
t1 = datetime(1982,01,01); %start date
t2 = datetime(2015,12,31); %end date
T = t1:t2; %creating a range
I have no idea how to continue from here. In the end, I want one array (1x408) with the number of days in each month.
Thank you all.
Here's one approach. See ndgrid and datenum.
years = 1982:2015; % desired range of years
[mm, yy] = ndgrid(1:12, years); % all pairs of month, year
result = datenum(yy(:), mm(:)+1, 1) - datenum(yy(:), mm(:), 1); % adding 1 to the month
% works even for December. 'datenum' gives a result where each unit is one day
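As a cross-check on the datenum difference above, here is a short sketch using eomday, which returns the last day of a given month and year and therefore the month length directly. The variable name daysPerMonth is only illustrative.

```matlab
yrs = 1982:2015;                          % desired range of years
[mm, yy] = ndgrid(1:12, yrs);             % all pairs of month, year
daysPerMonth = eomday(yy(:), mm(:)).';    % 1x408 row, January 1982 first
```

The total of daysPerMonth should equal the number of days between the two end points, which gives a quick sanity check.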

Function for extracting the smallest value from data set [MATLAB]

I have already made a function which extracts all temperatures from a data set for a certain time for a given month and year.
It looks like this: extractperiod(data, year, month, time)
As said, this extracts all temperatures for a particular month at a given time, say 1400.
I now want to find the minimum temperature for a certain month, say January, across many years. For example, for January between 1997 and 2006, I want the lowest registered temperature over that whole period.
My progress so far is the code below:
function [algo] = getMiniserieone(data, startYear, endYear, time)
v = zeros(12,2);
for month = 1:12
for year = startYear:1:endYear
p = extractperiodtwo(data, year, month, time);
q = min(p);
v(month,1) = v(month,1) + q;
v(month,2) = v(month,2) + 1;
algo = v(12,2);
end
end
end
I do however get the error message:
Unable to perform assignment because the size of the left side is
1-by-1 and the size of the right
side is 0-by-1.
When calling the function in the command window:
>> getMiniserieone(data, 1996, 2006, 1400)
Error in getMiniserieone (line 12)
v(month,1) = v(month,1) + q;
But I have not been able remedy this.
My intention with the program above is, for a particular time and for the years 1996-2006, to extract the lowest temperature for every month from January to December. For example, for January at 1300, I want the lowest temperature recorded across those years. I then want to store it in column 1 of my matrix v, with column 2 denoting the month.
My question is how I can fix this. I'm not really sure what the error message means. Does it maybe mean that q is not a single-element value?
I hope the information given is enough to understand the problem; if not, feel free to ask.
As requested, the code for extractPeriod()
function [mdata] = extractperiodtwo(data, year, month, time)
x = year*100 +month;
k = find( floor(TEMPUF(:,1)/100) == x & (data(:,2)==time));
mdata = data(k,3);
end
So there are two issues that you're running into. The first is that when your extractperiodtwo() function doesn't find a value, it returns an empty vector, and min() of an empty vector is also empty, so it cannot be assigned to a single element. You can write a check into the loop that reports that no data was found for a given entry and handles the error. I recommend a try-catch block (which I implement in the code below). The second issue has to do with how you're storing the minimum for each iteration.
Here is my solution:
function [algo] = getMiniserieone(data, startYear, endYear, time)
%initialize output
algo = [zeros(12,1), (1:12)']; %col1 = min(temps), col2 = month
% loop through months
yearsToTest = startYear:1:endYear;
for month = 1:12
%initialize a year storage, i.e. on the first iteration
% this will be january minimum temps from startYear:endYear,
% on iteration 2, it will be all february temps from startYear:endYear
% and so on until it completes month == 12.
lowTempsOverYears = nan(1,length(yearsToTest));
for yearNum = 1:length(yearsToTest)
%edit: forgot to paste this line, sorry.
year = yearsToTest(yearNum);
p = extractperiodtwo(data, year, month, time);
q = min(p);
try
%try appending the min
lowTempsOverYears(yearNum) = q;
catch
fprintf(2, ...
'\nNo data found for month=%d, year=%d.\n Skipping...\n', ...
month, ...
year);
continue
end
end
% inner loop is over, now we need the minimum of mins for the month
algo(month, 1) = min(lowTempsOverYears,[],'omitnan');
end
end
So what I've done is add a second vector, lowTempsOverYears, that tracks min(p) for each month over the startYear:endYear interval. Then I take the minimum of that and store it in the row for that month in the output variable. I initialize lowTempsOverYears with nan() because it allows me to later take the min() with the 'omitnan' flag, effectively ignoring any NaN that appears in the vector.
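If you would rather not rely on try-catch, an explicit isempty check does the same job. The sketch below is a self-contained script under stated assumptions: the sample rows are made up, and the anonymous function extract is a stand-in that mirrors the filtering in extractperiodtwo (column 1 as yyyymmdd, column 2 as time, column 3 as temperature).

```matlab
% Made-up sample rows: [yyyymmdd, time, temperature]
data = [19970115 1400 -3.2;
        19980110 1400 -7.4;
        19980210 1400  1.0];
% Stand-in for extractperiodtwo: temperatures for one year, month and time
extract = @(d, yr, mo, t) d(floor(d(:,1)/100) == yr*100 + mo & d(:,2) == t, 3);

yearsToTest = 1997:1998;
minTemps = nan(12, 1);              % NaN marks months without any data
for mo = 1:12
    for yr = yearsToTest
        p = extract(data, yr, mo, 1400);
        if isempty(p)
            continue                % nothing recorded for this month/year
        end
        % min of two arguments ignores NaN, so the first value replaces it
        minTemps(mo) = min(minTemps(mo), min(p));
    end
end
```

Months that never had data stay NaN, so the caller can tell "no data" apart from a genuine minimum.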
Alternatively
You could capture all the temps over the years and months in a matrix and then perform the statistic of your choice. For example, the inside of your function could be:
function algo = getMiniserieone(data, startYear, endYear, time)
algo = [zeros(12,1), (1:12)'];
yearsToTest = startYear:1:endYear;
% create local storage of values as matrix
v = nan(12,length(yearsToTest));
%loop
for month = 1:12
for yearNum = 1:length(yearsToTest)
% current year to gather data from
year = yearsToTest(yearNum);
% p could be scalar, vector, empty
p = extractperiodtwo(data, year, month, time);
% q is scalar or empty
q = min(p);
try
%try inserting the min temp
v(month,yearNum) = q;
catch
fprintf(2,'\nNo data at month=%d, year=%d.\n Skipping...\n', ...
month, year ...
);
continue
end
end
end
% now all data is stored in a matrix perform statistic
algo(:,1) = min(v, [], 2, 'omitnan');
end
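For comparison, the same monthly minima can be computed without any loop by grouping on the month number with accumarray. This is only a sketch: it assumes column 1 of data is yyyymmdd (as the floor(.../100) test in extractperiodtwo suggests), column 2 is the time and column 3 the temperature. The sample rows are made up.

```matlab
% Made-up sample rows: [yyyymmdd, time, temperature]
data = [19970115 1400  -3.2;
        19970120 1400  -5.1;
        19980110 1400  -7.4;
        19980210 1400   1.0;
        19980211 1300  -9.0;    % different time, must be ignored
        20070105 1400 -20.0];   % outside the year range, must be ignored
startYear = 1997; endYear = 2006; time = 1400;

ym = floor(data(:,1) / 100);    % yyyymm
yr = floor(ym / 100);           % yyyy
mo = mod(ym, 100);              % mm
keep = data(:,2) == time & yr >= startYear & yr <= endYear;

% One minimum per month; months with no rows are filled with NaN
algo = [accumarray(mo(keep), data(keep,3), [12 1], @min, NaN), (1:12)'];
```

The output has the same layout as in the answers above: column 1 is the minimum temperature, column 2 the month.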

Loop for monthly or yearly averages by day number (in Julian day format)

I have daily '.mat' files whose nomenclature is like 'Data_grid_day_ppt_ice003.mat', the last '003' is the date of the year in Julian day format, It can start from '001' and end in '365' and '366' depending on whether it is a leap year or not.
I need monthly averages of the data. Any idea how can I make loops for the date files?
Some dates can be missing also, so just counting the number of files would not work.
I am writing in Matlab, I was trying something like
year = linspace(2007,2016,10); % Years of data
months = linspace(01,12,12);% Months of the year
for n = 1:length(year)
foldery = int2str(year(n));
if ~exist(folderyear)
continue
else
if mod(year,4) == 0 % if the year is divisible by four
dpm = [31 60 91 121 152 182 213 244 274 305 335 366];
else
dpm = [31 59 90 120 151 181 212 243 273 304 334 365];
end
for j = 1:length(months) %loop over months
if ~exist(months)
continue
else
for ii = 1:dpm(j) %loop over days
% ... Not sure what to do here ...
end
end
end
end
end
However, I am not able to work out the next step...
Firstly, don't use year as a variable name, because it is a built in Matlab function! Then be careful with your variable names, like confusing foldery and folderyear.
The easiest way to do this would be to just add the day of the year to the 1st Jan in that year, then get the month using Matlab's built in month function. This will take care of all of the leap year issues as long as you know the year (which you seem to). Also see this documentation, Date and Time Arithmetic, which confirms that adding a number to a datetime value will add full 24 hour days.
The key is this method for getting month from Julian day of year:
thisyear = 2016; % Define which year you're in
J1 = datetime(thisyear,1,1); % Set January first datetime
dayofyear = 360; % Christmas day (in a leap year) as an example!
m = month(J1 + dayofyear - 1) % Add day of year to Jan 1st, get month
% J1 + dayofyear - 1 = 25-Dec-2016 00:00:00
% m = 12
% month(J1 + dayofyear - 1, 'name') = 'December'
See this commented extension of your code for more detail...
y = 2007:2016; % Years of data [2007, 2008, ... 2016]
outputdata = cell(1,12);
for n = 1:length(y)
folderyear = int2str(y(n));
if ~exist(folderyear, 'dir')
continue
else
% Get all .mat files within this year folder
f = dir(fullfile(folderyear, '*.mat'));
filenames = {f.name};
% Set Jan 1st date
J1 = datetime(y(n), 1, 1);
% Loop over months
for m = 1:12
% Loop over files
for myfile = 1:numel(filenames)
% Get number from end of file name
dayofyear = filenames{myfile};
dayofyear = str2double(dayofyear(end-6:end-4));
% Get month of day by adding to Jan 1st of that year
% -1 used assuming Jan 1st is day 1. Not needed if Jan 1st is day 0.
if month(J1 + dayofyear - 1) == m
% hurrah, this file is in the month m, add it to an average
% or anything else you want, then you could assign to
% an output cell or matrix
% outputdata{m} = ...
end
end
end
end
end
It works now if I do:
Num = regexp(dayofyear,'\d','Match');
tt=cellstr(Num{:});
pp = strcat(tt(1),tt(2),tt(3));
dayofyear=str2double(pp);
It is a bit childish, but it returns an error if I do it in one line :P Thank you Wolfie :)
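The day-of-year parsing can also be done in one step with a regexp token. The sketch below assumes every file name ends in three digits followed by .mat, as in the question; the file names listed are examples, and a name that does not match the pattern would make the cellfun call fail. It uses datenum, which accepts a day number beyond the end of January and carries it over into later months.

```matlab
yr = 2016;                                         % year of the folder
filenames = {'Data_grid_day_ppt_ice003.mat', ...
             'Data_grid_day_ppt_ice045.mat', ...
             'Data_grid_day_ppt_ice360.mat'};      % example names only

% Capture the three digits before ".mat" in each name
tok = regexp(filenames, '(\d{3})\.mat$', 'tokens', 'once');
doy = cellfun(@(t) str2double(t{1}), tok);         % day of year per file

% datenum(yr, 1, doy) treats doy as "day of January" and carries over
[~, mo] = datevec(datenum(yr, 1, doy(:)));
mo = mo(:).';                                      % month per file, as a row

febFiles = filenames(mo == 2);                     % files belonging to February
```

Because the month is derived from the file name, missing days simply do not show up and need no special handling.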

How to remove repeated date tick labels

I'm trying to plot 63 years of monthly data and some labels are repeating (indicated by arrows), illustrated by the following figure:
In order to handle x-labels, I used the following code:
xdate = datenum (datestring2, 'yyyy/mm');
plot (xdate,data1.data(:,4), 'b', xdate,data1.data(:,6),'r', ...
xdate,data1.data(:,8),'k', xdate,data1.data(:,10),'g');
xtic=xdate(1):12:xdate(end);
set(gca,'XTick',xtic);
datetick('x', 'mm/yyyy', 'keepticks');
How can I solve that issue?
You can exploit the carryover feature of datenum():
out = datenum(2000,1:14,1);
Let's verify:
datestr(out)
ans =
01-Jan-2000
01-Feb-2000
01-Mar-2000
01-Apr-2000
01-May-2000
01-Jun-2000
01-Jul-2000
01-Aug-2000
01-Sep-2000
01-Oct-2000
01-Nov-2000
01-Dec-2000
01-Jan-2001
01-Feb-2001
Now set them as Xtick:
set(gca,'Xtick',out)
EDIT: use start end dates
% Provide example initial and final date
in = '20100324';
fi = '20130215';
% Decompose with datevec, count number of months, and skip one month if day > 1
infi = datevec({in,fi},'yyyymmdd');
nmonths = diff(infi(:,1:2))*[12 1]' + infi(1,2);
skipone = infi(1,3) ~= 1;
% Calculate dates
out = datenum(infi(1), (infi(1,2) + skipone):nmonths, 1);
Your expression xtic=xdate(1):12:xdate(end) says that you want a tick every 12 days, because datenum values count days; that usually gives you two (or even three) ticks within the same month, hence the repeated labels. A quick and dirty solution is
xtic = xdate(1):30:xdate(end);
But that may in some situations skip February, and it will drift when you cover a large number of months.
To get around this properly, you need to place ticks at the first of every month. A possible way to do that is this:
xdate = datenum(datestring2, 'yyyy/mm');
d1 = datevec(xdate(1)); % [year, month, date, hour, min, sec]
d2 = datevec(xdate(end)+30); % one month past the last data point
nm = ceil((xdate(end) - xdate(1))*12/365); % whole number of months
mv = mod((1:nm) + d1(2) - 2, 12) + 1; % months
yv = d1(1) + floor(((1:nm) + d1(2) - 1)/12); % years
ymdv = [yv' mv' ones(nm,1)]; % year, month, day for each tic
xtic = datenum(ymdv); % will turn this into "the first of every month"
EDIT Oleg Komarov's answer points to a much cleaner way to generate tics at every first of the month - this is using the fact that datenum can cope with months greater than 12. You could probably make the above code a little more compact and cleaner by using that approach (for example, you could leave out the mod operation for mv, and just use d1(1) for the year). But sometimes being explicit about what you are doing isn't a bad thing.
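Combining the two answers, here is a minimal sketch that places one tick per year using the datenum month carryover. The xdate vector is a stand-in for the asker's 63 years of monthly data (the start year 1950 is an assumption), and the step of 12 months is a choice; use 0:nMonths for a tick on every first of the month.

```matlab
% Stand-in for the asker's data: first of each month over 63 years
xdate = datenum(1950, 1:63*12, 1);

d1 = datevec(xdate(1));                 % [year month day hour min sec]
d2 = datevec(xdate(end));
nMonths = (d2(1) - d1(1))*12 + (d2(2) - d1(2));   % months between first and last point

% Months beyond 12 roll over into following years
xtic = datenum(d1(1), d1(2) + (0:12:nMonths), 1);
```

The result can then be passed to set(gca,'XTick',xtic) followed by datetick('x','mm/yyyy','keepticks'), as in the question.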

For command + interpolation: need some tips

I have a matrix A with three columns: daily dates, prices, and hours - all same size vector - there are multiple prices associated to hours in a day.
sample data below:
A_dates =    A_hours =   A_prices =
[20080902    [ 9.698     [24.09
 20080902      9.891      24.59
 20080902     10.251      24.60
 20080903      9.584      25.63
 20080903     10.45       24.96
 20080903     12.12       24.78
 20080904     12.95       26.98
 20080904     13.569      26.78
 20080904]    14.589]     25.41]
Keep in mind that I have about two years of daily data with about 10,000 prices per day, covering almost every minute from 9:30 to 16:00. My initial dataset's time was in milliseconds, which I converted to hours. Some hours, like 14.589, are repeated three times with three different prices. Hence I did the following:
time=[A_dates,A_hours,A_prices];
[timeinhr,price]=consolidator(time,A_prices,'mean');
where timeinhr combines both vectors A_dates and A_hours, to take an average price at each time, say 14.589 hours.
Then, for any missing hours at .25, .50, .75 and integer hours, I wish to interpolate.
For each date, hours repeat and I need to interpolate linearly prices that I don't have for some "wanted" hours. But of course I can't use the command interp1 if my hours repeats in my column because I have multiple days. So say:
%# here I want hours in 0.25unit increments (like 9.5hrs)
new_timeinhr = 0:0.25:max(A_hours);
day_hour = rem(new_timeinhr, 24);
%# Here I want only prices between 9.5hours and 16hours
new_timeinhr( day_hour <= 9.2 | day_hour >= 16.1 ) = [];
I then create a unique vectors of day and want to use a for and if command to interpolate daily and then stack my new prices in a vector one after the other:
days = unique(A_dates);
for j = 1:length(days);
if A_dates == days(j)
int_prices(j) = interp1(A_hours, A_prices, new_timeinhr);
end;
end;
My error is:
In an assignment A(I) = B, the number of elements in B and I must be the same.
How can I write the int_prices(j) to the stack?
I recommend converting your input to a single monotonic time value. Use the MATLAB datenum format, which represents one day as 1. There are plenty of advantages to this: You get the builtin MATLAB time/date functions, you get plot labels formatted nicely as date/time via datetick, and interpolation just works. Without test data, I can't test this code, but here's the general idea.
Based on your new information that dates are stored as 20080902 (I assume yyyymmdd), I've updated the initial conversion code. Also, since the layout of A is causing confusion, I'm going to refer to the columns of A as the vectors A_prices, A_hours, and A_dates.
% This datenum vector matches A. I'm assuming they're already sorted by date and time
At = datenum(num2str(A_dates), 'yyyymmdd') + datenum(0, 0, 0, A_hours, 0, 0);
incr = datenum(0, 0, 0, 0.25, 0, 0); % 0.25 hour
t = (At(1):incr:At(end)).'; % Full timespan of dataset, in 0.25 hour increments
frac_hours = 24*(t - floor(t)); % Fractional hours into the day
t_business_day = t((frac_hours > 9.4) & (frac_hours < 16.1)); % Time vector only where you want it
P = interp1(At, A_prices, t_business_day);
I repeat, since there's no test data, I can't test the code. I highly recommend testing the date conversion code by using datestr to convert back from the datenum to readable dates.
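A small round-trip check along those lines, as a sketch only: the three sample rows are taken from the values in the question, and A_hours/24 is used as a shorter equivalent of datenum(0, 0, 0, A_hours, 0, 0).

```matlab
A_dates = [20080902; 20080902; 20080903];   % yyyymmdd, sample values
A_hours = [9.698; 10.251; 12.12];           % fractional hours into the day

% Whole days from the date, plus the fraction of a day from the hours
At = datenum(num2str(A_dates), 'yyyymmdd') + A_hours/24;

% Convert back to readable text and to fractional hours for comparison
readable  = datestr(At, 'yyyy-mm-dd HH:MM:SS');
backHours = 24*(At - floor(At));
disp(readable)
```

If the dates printed by datestr match the yyyymmdd column and backHours matches A_hours, the conversion is doing what you expect.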
Converting days/hours to serial date numbers, as suggested by @Peter, is definitely the way to go. Based on his code (which I already upvoted), I present below a simple example.
First I start by creating some fake data resembling what you described (with some missing parts as well):
%# three days in increments of 1 hour
dt = datenum(num2str((0:23)','2012-06-01 %02d:00'), 'yyyy-mm-dd HH:MM'); %#'
dt = [dt; dt+1; dt+2];
%# price data corresponding to each hour
p = cumsum(rand(size(dt))-0.5);
%# show plot
plot(dt, p, '.-'), datetick('x')
grid on, xlabel('Date/Time'), ylabel('Prices')
%# lets remove some rows as missing
idx = ( rand(size(dt)) < 0.1 );
hold on, plot(dt(idx), p(idx), 'ro'), hold off
legend({'prices','missing'})
dt(idx) = [];
p(idx) = [];
%# matrix same as yours: days,prices,hours
ymd = str2double( cellstr(datestr(dt,'yyyymmdd')) );
hr = str2double( cellstr(datestr(dt,'HH')) );
A = [ymd p hr];
%# let clear all variables except the data matrix A
clearvars -except A
Next we interpolate the price data across the entire range in 15 minutes increments:
%# convert days/hours to serial date number
dt = datenum(num2str(A(:,[1 3]),'%d %d'), 'yyyymmdd HH');
%# create a vector of 15 min increments
t_15min = (0:0.25:(24-0.25))'; %#'
tt = datenum(0,0,0, t_15min,0,0);
%# offset serial date across all days
ymd = datenum(num2str(unique(A(:,1))), 'yyyymmdd');
tt = bsxfun(@plus, ymd', tt); %#'
tt = tt(:);
%# interpolate data at new datetimes
pp = interp1(dt, A(:,2), tt);
%# extract desired period of time from each day
idx = (9.5 <= t_15min & t_15min <= 16);
idx2 = bsxfun(@plus, find(idx), (0:numel(ymd)-1)*numel(t_15min));
P = pp(idx2(:));
%# plot interpolated data, and show extracted periods
figure, plot(tt, pp, '.-'), datetick('x'), hold on
plot([tt(idx2);nan(1,numel(ymd))], [pp(idx2);nan(1,numel(ymd))], 'r.-')
hold off, grid on, xlabel('Date/Time'), ylabel('Prices')
legend({'interpolated prices','period of 9:30 - 16:00'})
and here are the two plots showing the original and interpolated data:
I think I might have solved it this way:
new_timeinhr = 0:0.25:max(A(:,2));
day_hour = rem(new_timeinhr, 24);
new_timeinhr( day_hour <= 9.4 | day_hour >= 16.1 ) = [];
days=unique(A(:,1));
P=[];
for j=1:length(days);
condition=A(:,1)==days(j);
intprices = interp1(A(condition,2), A(condition,3), new_timeinhr);
P=vertcat(P,intprices');
end;
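A self-contained variant of this per-day loop, as a sketch with made-up numbers: it assumes the column order used in the loop above (dates, hours, prices) and preallocates one column per day instead of growing P with vertcat. The name wanted is illustrative.

```matlab
% Made-up sample: [yyyymmdd, hour, price]
A = [20080902  9.5 24.0;
     20080902 10.5 26.0;
     20080903  9.5 30.0;
     20080903 10.0 31.0;
     20080903 10.5 34.0];
wanted = (9.5:0.25:10.5).';           % hours at which prices are needed

days = unique(A(:,1));
P = nan(numel(wanted), numel(days));  % one column of interpolated prices per day
for j = 1:numel(days)
    rows = A(:,1) == days(j);         % logical mask selecting this day's rows
    P(:,j) = interp1(A(rows,2), A(rows,3), wanted);
end
```

Hours outside the range recorded for a day come back as NaN from interp1, so gaps at the start or end of a day remain visible. Use P(:) to get the stacked vector the question asks for.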