How to calculate averages for a 3D matrix for the same day of data over multiple years in Matlab? - matlab

I am interested in calculating GPH anomalies in Matlab. I have a 3D matrix of lat, lon, and data, where the data (3rd dimension) is a daily GPH value in one-day increments for 32 years (from Jan. 1st 1979 to Jan. 1st 2011). The matrix is 95x38x11689. How do I compute a daily average across all years for each day of data, when the matrix is 3D?
In other words, how do I compute the average of Jan. 1st dates for all years to compute the climatological mean of all Jan. 1st's from 1979-2010 (where I don't have time information, but a GPH value for each day)? And so forth for each day after. The data also includes leap years. How do I handle that?
Example: Collect and average all Jan. 1st GPH values (indices 1, 366, 731, and so on, shifting by one after each leap year), and then do the same for every other day of the year.

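First, take out every Feb 29th, because these days sit in the middle of the data, do not appear in every year, and would throw off the averaging. Each leap day also shifts all later indices by one, so the offset grows by one per leap year:
Feb29=60+365*(1:4:32)+(0:7); % indices of Feb 29th in 1980, 1984, ..., 2008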
mean_Feb29=mean(GPH(:,:,Feb29),3); % A matrix of 95x38 with the mean of all February 29th
GPH(:,:,Feb29)=[]; % omit Feb 29th from the data
Last_Jan_1=GPH(:,:,end); % for Jan 1st you have additional data set, of year 2011
GPH(:,:,end)=[]; % omit Jan 1st 2011
re_GPH=reshape(GPH,95,38,365,[]);
av_GPH=mean(re_GPH,4);
Now av_GPH is a 95x38x365 matrix, where each slice along the 3rd dimension is the average for one day of the year, starting with Jan 1st.
If you want to include the last Jan 1st (Jan 1st 2011), run this line after the previous code. It weights the existing 32-year mean by 32, so that all 33 Jan 1st values count equally:
av_GPH(:,:,1)=(32*av_GPH(:,:,1)+Last_Jan_1)/33;
To make it easy to see which slice number corresponds to which date, you can make an array of all the dates in a (non-leap) year:
t1 = datetime(2011,1,1,'format','MMMMd')
t2 = datetime(2011,12,31,'format','MMMMd')
t3=t1:t2;
Now, for example :
t3(156)=
datetime
June5
So av_GPH(:,:,156) is the average of June 5th.
Regarding your comment: if you want to subtract the daily average from each day (after Feb 29th and Jan 1st 2011 have been removed as above):
sub_GPH=GPH-repmat(av_GPH,1,1,32);
For February 29th, you need to do this BEFORE you erase those days from the data (that is, before the GPH(:,:,Feb29)=[] line above):
sub_GPH_Feb_29=GPH(:,:,Feb29)-repmat(mean_Feb29,1,1,8);
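As a cross-check, the leap-day positions can also be derived from a calendar instead of by index arithmetic. The following is only a sketch: it assumes slice k of GPH is day k counted from Jan 1st 1979, and it uses a small random array (2x3 instead of 95x38) as a stand-in for the real data. The names t, isFeb29, isLast, core and clim are made up for the illustration.

```matlab
% Assumption: slice k of GPH corresponds to day k, counted from 1 Jan 1979.
t   = datetime(1979,1,1):datetime(2011,1,1);   % one time stamp per slice
GPH = rand(2, 3, numel(t));                    % synthetic stand-in for the 95x38x11689 data

isFeb29 = (month(t) == 2) & (day(t) == 29);    % logical mask of the leap days
isLast  = false(size(t));
isLast(end) = true;                            % the extra 1 Jan 2011

% Keep 32 complete 365-day years, then average over the year dimension
core = GPH(:, :, ~isFeb29 & ~isLast);
clim = mean(reshape(core, size(core,1), size(core,2), 365, []), 4);

Feb29 = find(isFeb29);                         % slice numbers of the leap days
```

Here Feb29 holds the slice numbers of the eight leap days, and clim plays the role of av_GPH. Working from a datetime vector avoids hand-computing offsets, at the cost of requiring a MATLAB release that has datetime.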

Related

Calculating monthly averages of daily temperature data including negative and positive values

I am trying to calculate monthly average temperatures for a dataset with daily temperature values that spans three years. The data.frame looks like this example:
Date Month Temperature
12-2-2016 December -10
12-3-2016 December -12
01-2-2017 January -15
01-3-2017 January -14
02-3-2017 February 3
02-4-2017 February 7
03-2-2017 March 8
03-3-2017 March 9
I tried running the following code in order to create a new dataframe with the Month and the average temperature:
group_by(df$month) %>%
summarise(mean_airtemp = mean(Temperature))
However, when running this code I get NA for certain months which I believe is attributed to negative values. I have tried to figure it out but have only found solutions that seem to separate the values based on whether they are negative or positive.
In pandas, you can group by month and take the mean of the temperature:
df.groupby(['Month'])['Temperature'].mean().reset_index(name = 'avg')

How to fill the last observations with retime in matlab?

I am interpolating variables from quarterly to monthly frequency in MATLAB. However, when I use retime it doesn't go as far as the end of the sample but it stops 2 months before.
Let me give you an example:
T = datetime(2002,01,01):calquarters:datetime(2019,12,01);
TT = timetable(T', randn(72,1))
x = retime(TT, 'monthly', 'spline') % interpolate
As you can see it gives me back 214 observations rather than 216, November and December 2019 are missing.
How can I fix it?
Thanks!
TT has 72 quarters rather than 73, which means you are actually storing dates from 1st January 2002 to 1st October 2019. The next quarter would start on 1st January 2020, which is not included in your original array (you can check this by printing TT and looking for that date).
If this is the case, there is no way for retime to interpolate the missing months, as they aren't in the original matrix (that is, retime cannot interpolate from October to January, since there is no such thing in TT).
Replacing datetime(2019,12,01) with datetime(2020,01,01), as well as replacing randn(72,1) with randn(73,1), might solve your issue.
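A minimal sketch of that suggestion, assuming a value for the quarter starting 1st January 2020 is actually available (here it is just another random number). The final line trims the extra January 2020 row so that the result ends in December 2019.

```matlab
% Quarterly series that includes the quarter starting 1 Jan 2020
T  = (datetime(2002,1,1):calquarters(1):datetime(2020,1,1))';  % 73 quarter starts
TT = timetable(T, randn(numel(T),1));

x = retime(TT, 'monthly', 'spline');                 % Jan 2002 ... Jan 2020
x = x(x.Properties.RowTimes < datetime(2020,1,1), :); % keep Jan 2002 ... Dec 2019
```

After trimming, x has 216 monthly rows, with November and December 2019 interpolated between the October 2019 and January 2020 quarters.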

UTC to GPS time for finding TOW in Simulink

For my project, I need to calculate TOW (Time of week) in Simulink. I know this can be achieved through conversion of UTC time to GPS time.
I have written a simple m-file in Matlab which does this for me, as follows:
date_gps_int = 10000*y + 100*m + d
date_gps_str = int2str(date_gps_int)
date_gps_str_to_serial = datenum(date_gps_str,'yyyymmdd')
date_str_format = datestr(date_gps_str_to_serial,'dd-mmmm-yyyy')
Num_Days = daysact('06-jan-1980',date_str_format)
Num_Weeks = Num_Days/7
TOW = Num_Weeks - 1024
My first intention was to use this as a function in simulink. But apparently because of 'datenum' and 'datestr' it is not possible, since simulink does not handle strings.
Now I am wondering if anyone can help me with this issue. Is there any way to calculate TOW from the UTC date in Matlab without using those predefined functions?
I also tried to write an algorithm for calculating number of days since '6 January 1980' and then calculating number of weeks by dividing that by 7. But since I am not very familiar with leap year calculation and I don't really know the formula for these kinds of calculations, my result differs from real TOW.
I would appreciate if anybody can help me on this.
There are three formats handled by Matlab for time: formatted date strings - what datestr outputs -, serial date - scalar double, what datenum outputs - and date vectors (see datevec). Conversion functions work with these three, and the most convenient way to convert individual variables (year, month, etc) to a date is to build a date vector [yyyy mm dd HH MM SS].
date_gps_str_to_serial = datenum([y m d 0 0 0]); % midnight on day y-m-d
date_Jan_6_1980 = datenum([1980 01 06 0 0 0]); % midnight on Jan 6th, 1980
Num_Days = date_gps_str_to_serial - date_Jan_6_1980;
Now, beware of leap seconds...
GPS time is computed from the time elapsed since Jan 6th 1980. Take the number of seconds elapsed since that day, as measured by the satellites' atomic clocks, and divide by (24*3600) to get a number of days; the remainder is the time of the day (in seconds since midnight).
But once in a while, the International Earth Rotation and Reference Systems Service decides that a day will last one second longer to accommodate the slowing of Earth's rotation. This may happen up to twice a year, on June 30th or December 31st. GPS time ignores these leap seconds: it does not take into account that some days last 86401 seconds (so dividing by 24*3600 does not give the UTC day), and it advances by 1 second with respect to UTC each time this happens. There have been 18 such days since Jan 6th 1980, so one should subtract 18 seconds from GPS time to find UTC time. The next time a leap second may be added is June 2019.
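For use without datenum or datestr, the day count can also be computed with plain integer arithmetic. The sketch below uses the standard Julian Day Number formula for Gregorian dates; the function name gpsWeekAndTow and its arguments are hypothetical, and leapSec (GPS minus UTC, in seconds) is passed in as an assumption, because a single constant is only valid for dates after the most recent leap second.

```matlab
function [week, tow] = gpsWeekAndTow(y, m, d, secOfDay, leapSec)
% Hypothetical helper: GPS week and time of week (in seconds) from a UTC date.
% Uses only floor/mod arithmetic, so no date strings are involved.
% leapSec is GPS minus UTC in seconds (an input assumption, e.g. 18).

% Julian Day Number of the Gregorian date y-m-d (standard integer formula)
a   = floor((14 - m)/12);
yy  = y + 4800 - a;
mm  = m + 12*a - 3;
jdn = d + floor((153*mm + 2)/5) + 365*yy ...
      + floor(yy/4) - floor(yy/100) + floor(yy/400) - 32045;

daysSinceEpoch = jdn - 2444245;          % 2444245 is the JDN of 6 Jan 1980
gpsSeconds = daysSinceEpoch*86400 + secOfDay + leapSec;

week = floor(gpsSeconds/604800);         % full weeks since the GPS epoch
tow  = mod(gpsSeconds, 604800);          % seconds into the current week
end
```

For example, [week, tow] = gpsWeekAndTow(2019, 4, 7, 0, 18) returns week 2048 and tow 18. Receivers that broadcast a 10-bit week number would report mod(week, 1024) instead.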

How to get monthly totals from linearly interpolated data

I am working with a data set of 10,000s of variables which have been repeatedly measured since the 1980s. The first measurements for each variable are not on the same date and the variables are irregularly measured: sometimes measurements are only a month apart, in a small number of cases they are decades apart.
I want to get the change in each variable per month.
So far I have a cell array of measurement dates and a cell array of interpolated rates of change between measurements (each cell represents a single variable in either array, and I've only posted the first 5 cells of each):
DateNumss= {[736614;736641;736669] [736636;736666] 736672 [736631;736659;736685] 736686}
LinearInterpss={[17.7777777777778;20.7142857142857;0] [0.200000000000000;0] 0 [2.57142857142857;2.80769230769231;0]}
How do I get monthly sums of the interpolated change in variable?
i.e.
If the first measurement for a variable is made on January 1st, and the linearly interpolated change between that and the next measurement is 1 per day; and the next measurement is on February 5th and the corresponding linearly interpolated change is 2; then January has a total change of 1*31 (31 days at 1) and February has a total change of 1*5+2*23 (5 days at 1, 23 days at 2).
You would need the points in the serial dates that correspond with the change of a month.
mat(:,1)=sort(repmat(1980:1989,[1,12]));
mat(:,2)=repmat(1:12,[1,size(mat,1)/12]);
mat(:,3)=1;
monthseps=datenum(mat);
This gives you a list of all 120 changes of months in the eighties.
Now you want, for each month, the change per day, and sum it. If you take the original data it is easier, since you can just interpolate each day's value using matlab. If you only have the "LinearInterpss" you need to map it on the days using interp1 with the method 'previous'.
monthly = zeros(length(monthseps)-1,1);
for ct = 2:length(monthseps)
dayNums = monthseps(ct-1):(monthseps(ct)-1); % days in the month
% Now each day needs a rate of change, taken from "LinearInterpss". interp1 with method 'previous' looks up the most recent value. k is the index of the variable (cell) being processed; days outside the measured range get 0, and the cell needs at least two measurements.
vals = interp1(DateNumss{k},LinearInterpss{k},dayNums,'previous',0);
monthly(ct-1) = sum(vals); % the sum over the change in each day is the total change in a month
end
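A small self-contained version of this idea, using toy data loosely modelled on the example in the question (the dates, rates and variable names are assumptions made for the illustration). Note that with 'previous' the measurement day itself already counts at the new rate, so the February total here is 4 days at 1 plus 24 days at 2.

```matlab
% Toy data (assumed): measurements on 1 Jan, 5 Feb and 10 Mar 2017
dates = datenum([2017 1 1; 2017 2 5; 2017 3 10]);
rates = [1; 2; 0];                        % change per day from each date onward

monthStarts = datenum(2017, (1:4)', 1);   % 1 Jan, 1 Feb, 1 Mar, 1 Apr 2017
monthly = zeros(numel(monthStarts)-1, 1);
for ct = 2:numel(monthStarts)
    dayNums = monthStarts(ct-1):(monthStarts(ct)-1);       % every day of the month
    vals = interp1(dates, rates, dayNums, 'previous', 0);   % rate in force on each day
    monthly(ct-1) = sum(vals);                              % total change in that month
end
```

With these inputs monthly is [31; 52; 18] for January, February and March. The trailing 0 passed to interp1 is the value used for days before the first or after the last measurement.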

daily to monthly sum in matlab of multiple years

My data is in Excel. One column of the sheet contains the dates of the last 50 years (no missing dates; dd/mm/yyyy format) and another column contains the daily rainfall (last 50 years; no blanks).
I want to calculate the monthly rainfall sum for every month of the last 50 years in Matlab. Remember, there are four possible month-ending dates: 30, 31, 28 and 29. So far I am able to read the dates and rainfall values from the Excel file as below:
filename = 'rainfalldate.xlsx';
% Extracts the data from each column of textData
[~,DateString ]= xlsread(filename,'A:A')
formatIn = 'dd/mm/yyyy';
DateVector = datevec(DateString,formatIn)
rainfall = xlsread(filename,'C:C');
What is the next step, so that I can see the rainfall sum for every month of the last fifty years?
I mean, for example, the rainfall sum for July 1986. Any new solutions? Code or loop based, in Matlab 2014a.
As @Daniel suggested, you could use accumarray for this.
Since you want the monthly rainfall for multiple years, you could combine year and month into one single variable, ym.
It is possible then to identify if a value belongs to both a determined year and month at once, and "tag" it using histc function. Values with similar "tags" are summed by the accumarray function.
As a bonus, you do not have to worry about the number of days in a month. Take a look at this example:
% Variables initialization
y=[2014 2014 2015 2015 2015 2014 2014 2015 2015];
m=[1 1 1 2 2 3 3 3 3]; % jan, feb and march
values = 1:9;
% Identifier creation and index generation
ym = y*100+m;
[~,subs]=histc(ym,unique(ym));
% Sum
total = accumarray(subs',values);
The total variable is the monthly rainfall already sorted by year and month (yyyymm).
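Applied to the DateVector and rainfall variables from the question, the same idea might look like the sketch below. The four dates and rainfall values are made-up sample data, and the third output of unique is used in place of histc to obtain the group indices, which is a swapped-in but equivalent way of tagging the months.

```matlab
% Made-up sample data in the question's dd/mm/yyyy format
DateString = {'30/06/1986'; '01/07/1986'; '31/07/1986'; '01/08/1986'};
rainfall   = [5; 2; 3; 7];

DateVector = datevec(DateString, 'dd/mm/yyyy');   % columns: year month day h m s
ym = DateVector(:,1)*100 + DateVector(:,2);       % yyyymm identifier per day

[months, ~, subs] = unique(ym);                   % subs maps each day to its month
total = accumarray(subs, rainfall);               % rainfall sum per month
```

Here months is [198606; 198607; 198608] and total is [5; 5; 7], so the July 1986 sum is the entry of total at the position where months equals 198607.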