How to get monthly totals from linearly interpolated data - matlab

I am working with a data set of 10,000s of variables which have been repeatedly measured since the 1980s. The first meassurements for each variable are not on the same date and the variables are irregularly measured - sometimes measurements are only a month apart, in a small number of cases they are decades apart.
I want to get the change in each variable per month.
So far I have a cell of dates of measurements,and interpolated rates of change between measurements (each cell represents a single variable in either, and I've only posted the first 5 cells in each array)
DateNumss= {[736614;736641;736669] [736636;736666] 736672 [736631;736659;736685] 736686}
LinearInterpss={[17.7777777777778;20.7142857142857;0] [0.200000000000000;0] 0 [2.57142857142857;2.80769230769231;0]}
How do I get monthly sums of the interpolated change in variable?
i.e.
If the first measurement for a variable is made on the January 1st, and the linearly interpolated change between that an the next measurement is 1 per day; and the next measurement is on Febuary the 5th and the corresponding linearly interpolated change is 2; then January has a total change of 1*31 (31 days at 1) and febuary has a total change of 1*5+2*23 (5 days at 1, 23 days at 2).

You would need the points in the serial dates that correspond with the change of a month.
mat(:,1)=sort(repmat(1980:1989,[1,12]));
mat(:,2)=repmat(1:12,[1,size(mat,1)/12]);
mat(:,3)=1;
monthseps=datenum(mat);
This gives you a list of all 120 changes of months in the eighties.
Now you want, for each month, the change per day, and sum it. If you take the original data it is easier, since you can just interpolate each day's value using matlab. If you only have the "LinearInterpss" you need to map it on the days using interp1 with the method 'previous'.
for ct = 2:length(monthseps)
days = monthseps(ct-1):(monthseps(ct)-1); %days in the month
%now we need each day assigned a certain change. This value depends on your "LinearInterpss". interp1 with method 'previous' searches LineairInterpss for the last value.
vals = interp1(DateNumss,LinearInterpss,days,'previous');
sum(vals); %the sum over the change in each day is the total change in a month
end

Related

MATLAB: Find all values on one date, then filter down to an hour and find average [duplicate]

This question already has answers here:
Counting values by day/hour with timeseries in MATLAB
(3 answers)
Closed 6 years ago.
I have a year's worth of data, the data is recorded one minute intervals each day of the year.
The date and time was imported from excel (in form 243.981944, then by adding 42004 (so will be for 2015) and formatting to date it becomes 31.8.15 23:34:00).
Importing to MATLAB it becomes
'31/08/2015 23:34:00'
I require the data for each day of the year to be at hourly intervals, so I need to sum the data recorded in each hour and divide that by the number of data recorded for that hour, giving me the hourly average.
For some reason the data in August actually increments in 2 minute intervals, data for every other month increments in one minute intervals.
ie
...
31/07/2015 23:57:00
31/07/2015 23:58:00
31/07/2015 23:59:00
31/08/2015 00:00:00
31/08/2015 00:02:00
31/08/2015 00:04:00
...
I'm not sure how I can find all the values for a specific date and hour in order to work out the averages. I was thinking of using a for loop to find the values on each day, but when I got down to writing code realised this wouldn't work the way I was thinking.
I presume there must be some kind of functions available that would allow for data to be filtered by the date and time?
edit:
So I tried the following but I get these errors.
dates is a 520000x1 cell array containing the dates form = formatIn.
formatIn = 'DD/MM/YYYY HH:MM:SS';
[~,M,D,H] = datevec(dates, formatIn);
Error using cnv2icudf (line 131) Unrecognized minute format.
Format string: DD/MM/YYYY HH:MM:SS.
Error in datevec (line 112) icu_dtformat = cnv2icudf(varargin{isdateformat});`
Assuming your data is in a matrix or cell-array of strings called A, and your other data is in a vector X. Let's say all the data is in the same year (so we can ignore years)
[~,M,D,H] = datevec(A, 'dd/mm/yyyy HH:MM:SS');
mean_A = accumarray([M, D, H+1], X, [], #mean);
Then data from February will be in
mean_A(2,:,:)
To look at the data, you may find the squeeze() function useful, e.g.
squeeze(mean_A(2,1:10,13:24))
shows the average for the hours after midday (by column) for the first ten days (by row) of February.
See also:
Counting values by day/hour with timeseries in MATLAB

How do I take an n-day average of data in Matlab to match another time series?

I have daily time series data and I want to calculate 5-day averages of that data while also retrieving the corresponding start date for each of the 5-day averages. For example:
x = [732099 732100 732101 732102 732103 732104 732105 732106 732107 732108];
y= [1 5 3 4 6 2 3 5 6 8];
Where x and y are actually size 92x1.
Firstly, how do I compute the 5-day mean when this time series data is not divisible by 5? Ultimately, I want to compute the 'jumping mean', where the average is not computed continuously (e.g., June 1-5, June 6-10, and so on).
I've tried doing the following:
Pentad_avg = mean(reshape(y(1:90),5,[]))'; %manually adjusted to be divisible by 5
Pentad_dt = x(1:5:90); %select every 5th day for time
However, Pentad_dt gives me dates 01-Jun-2004 and 06-Jun-2004 as output. And, that brings me to my second point.
I am looking to find 5-day averages for x and y that correspond to 5-day averages of another time series. This second time series has 5-day averaged data starting from 15-Jun-2004 until 29-Aug-2004 (instead of starting at 01-Jun-2004). Ultimately, how do I align the dates and 5-day averages between these two time series?
Synchronization between two time series can be accomplished using the timeseries object. Placing your data into an object allows Matlab to intelligently process it. The most useful thing is adds for your usage is the synchronize method.
You'll want to make sure to properly set the time vector on each of the timeseries objects.
An example of what this might look like is as follows:
ts1 = timeseries(y,datestr(x));
ts2 = timeseries(OtherData,OtherTimes);
[ts1 ts2] = synchronize(ts1,ts2,'Uniform','Interval',5);
This should return to you each timeseries aligned to be with the same times. You could also specify a specific time vector to align a timeseries to using the resample method.

how to simulate date for one year in kdb

i would like to simulate random timestamp data.
100 records in a day for one year.
How am I am able to do that?
when i set a:2013.01.01D00:00:00.000000000
100?a
the randomize data doesn't stay in a day.
thanks for your input
I am not sure, if this can be done easily. But you may generate 100 random timestamps for every day of 2013 in the next way
daysInYear: 365;
year: 2013.01.01D00:00:00.000000000;
//array of 365 elements, where every element represents corresponding date of year
dates: year + 01D * til daysInYear;
//array of 365 elements, where every element is an array of 100 random timestamps [0 .. 1D)
randomNanos: cut[100; (100 * daysInYear)?1D];
//array of 365 elements, where each element is an array of 100 random dateTimes for given day
result: dates + randomNanos;
//put all the dates in single array
raze result
The short version which does the same is below:
raze (2013.01.01D+01D * til 365) + cut[100; (100*365)?1D]
In order to simulate data for a single day, it's possible to generate random times (as floats less than one) and add them to the day you would like to generate data for. In this case:
D:2016.03.01;
D+100?1f
Will return 100 random times on 2016.03.01. If you want to generate data within a time range you can restrict the size of the float to something less than 1, or greater than a certain minimum value.
If you want to handle leap years... Not sure of a better way at the minute other than adding the max number of days onto the start of the year and asking whether it's the 31st. Adding on 366, it can either be 31st or 1st. If it's the 31st good, otherwise drop off the last date.
/e.g.
q)last 2015.01.01+til 365
2015.12.31
q)last 2016.01.01+til 365
2016.12.30 /we are a day short
q)
/return the dates and the number of days based on whether its a leap year
q)dd:$[31i~`dd$last d:2016.01.01+til 366;(366;d);(365;-1_d)]
q)/returns (366;2016.01.01 2016.01.02...)
q)/the actual logic below is pretty much the same as the other answer
q)raze{[n;dy;dt] dt+n cut(n*dy)?.z.N}[100;].dd
2016.01.01D16:06:53.957527121 2016.01.01D10:55:10.892935198 2016.01.01D15:36:..

Iterate for loop by hour in MATLAB

I am writing a for loop to average 10 years of hourly measurements made on the hour. The dates of the measurements are recorded as MATLAB datenums.
I am trying to iterate through using 0.0417 as it is the datenum for 1AM 00/00/00 but it is adding in a couple of seconds of error each time I iterate.
Can anyone recommend a better way for me to iterate by hour?
date = a(:,1);
load = a(:,7);
%loop for each hour of the year
for i=0:0.0417:366
%set condition
%condition removes year from current date
c = date(:)-datenum(year(date(:)),0,0)==i;
%evaluate condition on load vector and find mean
X(i,2)=mean(load(c==1));
end
An hour has a duration of 1/24 day, not 0.0417. Use 1/24 and the precision is sufficient high for a year.
For an even higher precision, use something like datenum(y,1,1,1:24*365,0,0) to generate all timestamps.
To avoid error drift entirely, specify the index using integers, and divide the result down inside the loop:
for hour_index=1:365*24
hour_datenum = (hour_index - 1) / 24;
end

Plotting time-series in Matlab with "label grouping"

In Matlab, I have data in the following form:
"z", a 17'256x1 double, containing the residuals of a regression, e.g. -0.0596
"dates", a 17'256x1 cell, containing date- and timestamps of each observation in the regression (and therefore, of the residuals), e.g. '10/3/2011 9:30:00 PM'
What I would like to do:
Plot the residuals versus the datestamp as label. The observations are not from a continuous sequence of days (i.e. there might be some gaps with no observations between days), and some days have more observations than others. I cannot have one label per observation because that would be too many labels. So I need to group them somehow, either by day or by month. That is, only show the month and day (e.g. 10/3) under all the observations from that day, or only show the month (e.g. 3) under all the observations from that month. How can I do that, using the data I have?
You should be able to plot this without 'grouping' things. If you convert your dates to timestamps:
timestamps = cellfun(#(date)datenum(date), dates);
then you can do a normal plot:
plot(timestamps, z);
and Matlab will deal with the xaxis labels itself (i.e. it will spread them evenly over the time range of dates), but they will be the timestamp numbers. To get formatted dates on the xaxis, use:
datetick('x');