How can I partition data based on a date column to get subsets of max 1 year (365 consecutive days)?

I'm quite new to R, so I still struggle a bit with what may be some of the basics. I have movement data for several individuals, and some of them have been monitored for over a year. In these cases, I'm trying to partition the data into bins of max. one year (i.e. 365 consecutive days) per individual. For example, if one individual was tracked for 800 days, I would need to partition the data into the first 365 days, the second 365 days, and the remaining 70 days. Or, if one individual started on 2019/04/17, I'd like to subset the data until 2020/04/16, with the next subset beginning on 2020/04/17.
Individuals were added at different times, so not all monitoring periods begin on the same day. Each individual can also have more than one observation per day, so many rows share the same date. Naturally I want to use the timestamp column for this, but I have been looking for a way to do it and can't seem to find one. Is there a way to tell R to pick the first date and extract the next 365 days?
I could calculate each bin manually and partition the data by hand, but I was wondering if there is a simpler way. I can, however, separate the data per individual.
Thanks!
My data looks something like this
Date.and.time        Ind        Lat  Long
2019-04-02 08:54:03  Animal_1   Y    X
2019-04-02 09:01:13  Animal_2   Y    X
2019-04-02 15:45:22  Animal_1   Y    X
2019-04-03 17:31:50  Animal_1   Y    X
...
2021-10-14 12:34:56  Animal_1   Y    X
2021-10-15 16:05:50  Animal_20  Y    X
2021-10-15 22:29:37  Animal_15  Y    X
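The binning described above can be sketched in a language-neutral way. The following is a minimal Python illustration, not an R solution: it finds each individual's first date and labels every row with a 0-based bin of 365 consecutive days counted from that date. The function name `assign_year_bins` and the `(timestamp, individual)` row layout are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime

STAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # matches the Date.and.time column shown above


def assign_year_bins(rows, bin_days=365):
    """Group (timestamp, individual) rows into bins of `bin_days` consecutive days.

    Bin 0 starts on each individual's own first observation date.
    Returns a dict mapping (individual, bin_index) to a list of timestamps.
    """
    # First pass: earliest observation date per individual.
    first_day = {}
    for stamp, ind in rows:
        day = datetime.strptime(stamp, STAMP_FORMAT).date()
        if ind not in first_day or day < first_day[ind]:
            first_day[ind] = day

    # Second pass: whole days since the first date, integer-divided by the bin length.
    bins = defaultdict(list)
    for stamp, ind in rows:
        day = datetime.strptime(stamp, STAMP_FORMAT).date()
        bin_index = (day - first_day[ind]).days // bin_days
        bins[(ind, bin_index)].append(stamp)
    return dict(bins)
```

Because the bins are counted in days rather than calendar years, a leap day inside a bin shifts the next bin's start date by one day relative to the anniversary.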

Related

Indexing precipitation data by month in MATLAB

I have an array of daily precipitation data over a geographic region that is structured long x lat x time. I want to be able to index this data by both month and year, and so used datevec to find the months/years of each timestep. All good! However, I'm having trouble creating the index because the number of dates in each month and in each year is not equal.
Originally, I'd thought that I would create two indexes using a months loop and a years loop to find the rows associated with each month/year in the datevec. That is, I wanted something like (row_loc x month) and (year x row_loc), with NaN filling in places where there would be no data (i.e. the last several values in February, or the last values of non-leap years). However, MATLAB doesn't seem to like the fact that the number of rows for each month/year isn't the same.
dates = datevec;
months = 1:12;
years = 1979:2016;
ind_mo = NaN(1178,length(months));
ind_yr = NaN(366,length(years));
for m = months
ind_mo(:,m) = find(dates(:,2) == m);
end
for y = 1:length(years)
ind_yr(:,y) = find(dates(:,1) == y);
end
Eventually, I'd like to have a way to index the precipitation data by month and by year. If there's a way to do that through the structure I'm imagining...great! If there's another workaround that's good too, although I'd prefer not to have a separate index for each of the 12 months and 38 years since I think this will be cumbersome.
Thanks in advance!
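One possible workaround for the unequal group sizes is a ragged container rather than a fixed-width matrix: one list of row indices per month and one per year. The sketch below illustrates the idea in Python; the function name `index_by_month_and_year` and the `(year, month, day)` tuple input are assumptions for the example. In MATLAB the analogous container would be a cell array.

```python
from collections import defaultdict


def index_by_month_and_year(dates):
    """Collect row indices per month and per year.

    `dates` is a sequence of (year, month, day) tuples, one per time step.
    Returns two dicts: month -> list of rows, and year -> list of rows.
    The lists may have different lengths, so no NaN padding is needed.
    """
    by_month = defaultdict(list)
    by_year = defaultdict(list)
    for row, (year, month, _day) in enumerate(dates):
        by_month[month].append(row)
        by_year[year].append(row)
    return dict(by_month), dict(by_year)
```

A single lookup such as `by_month[2]` then returns every February time step across all years, which avoids keeping a separate variable per month or year.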

How to get monthly totals from linearly interpolated data

I am working with a data set of 10,000s of variables which have been repeatedly measured since the 1980s. The first measurements for each variable are not on the same date and the variables are irregularly measured: sometimes measurements are only a month apart, and in a small number of cases they are decades apart.
I want to get the change in each variable per month.
So far I have a cell array of measurement dates and a cell array of interpolated rates of change between measurements (each cell represents a single variable in either array, and I've only posted the first 5 cells of each)
DateNumss= {[736614;736641;736669] [736636;736666] 736672 [736631;736659;736685] 736686}
LinearInterpss={[17.7777777777778;20.7142857142857;0] [0.200000000000000;0] 0 [2.57142857142857;2.80769230769231;0]}
How do I get monthly sums of the interpolated change in variable?
i.e.
If the first measurement for a variable is made on January 1st, and the linearly interpolated change between that and the next measurement is 1 per day; and the next measurement is on February 5th and the corresponding linearly interpolated change is 2; then January has a total change of 1*31 (31 days at 1) and February has a total change of 1*5+2*23 (5 days at 1, 23 days at 2).
You would need the points in the serial dates that correspond with the change of a month.
mat(:,1)=sort(repmat(1980:1989,[1,12]));
mat(:,2)=repmat(1:12,[1,size(mat,1)/12]);
mat(:,3)=1;
monthseps=datenum(mat);
This gives you a list of all 120 changes of months in the eighties.
Now you want, for each month, the change per day, and to sum it. If you start from the original data it is easier, since you can just interpolate each day's value using MATLAB. If you only have "LinearInterpss", you need to map it onto the days using interp1 with the method 'previous'.
k = 1; % index of the variable (cell) to process; loop over k to handle all variables
monthlyChange = zeros(length(monthseps)-1,1);
for ct = 2:length(monthseps)
    monthDays = monthseps(ct-1):(monthseps(ct)-1); % serial dates of the days in this month
    % Assign each day a rate of change: interp1 with method 'previous' returns the last value of LinearInterpss{k} at or before that day (NaN outside the measured range).
    vals = interp1(DateNumss{k},LinearInterpss{k},monthDays,'previous');
    monthlyChange(ct-1) = sum(vals); % the sum over the change in each day is the total change in this month
end
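The same "previous value" lookup can be illustrated outside MATLAB. The sketch below is a minimal Python version under stated assumptions: dates are proleptic Gregorian ordinals (from `date.toordinal()`), the function name `monthly_totals` is made up for the example, and days outside the measured range contribute nothing. Under this convention the rate switches on the measurement day itself, so a change measured on February 5th splits a 28-day February into 4 days at the old rate and 24 at the new one.

```python
from bisect import bisect_right
from datetime import date


def monthly_totals(measure_days, rates, year):
    """Sum a piecewise-constant daily rate over each calendar month of `year`.

    `measure_days` is a sorted list of day ordinals; `rates[k]` applies from
    `measure_days[k]` until the next measurement day.
    Returns a dict mapping month number (1-12) to the summed change.
    """
    totals = {}
    for month in range(1, 13):
        first = date(year, month, 1)
        if month == 12:
            next_first = date(year + 1, 1, 1)
        else:
            next_first = date(year, month + 1, 1)
        total = 0.0
        for day in range(first.toordinal(), next_first.toordinal()):
            # Index of the last measurement on or before this day.
            k = bisect_right(measure_days, day) - 1
            # Skip days before the first or after the last measurement.
            if k >= 0 and day <= measure_days[-1]:
                total += rates[k]
        totals[month] = total
    return totals
```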

How do I take an n-day average of data in Matlab to match another time series?

I have daily time series data and I want to calculate 5-day averages of that data while also retrieving the corresponding start date for each of the 5-day averages. For example:
x = [732099 732100 732101 732102 732103 732104 732105 732106 732107 732108];
y= [1 5 3 4 6 2 3 5 6 8];
Where x and y are actually size 92x1.
Firstly, how do I compute the 5-day mean when this time series data is not divisible by 5? Ultimately, I want to compute the 'jumping mean', where the average is not computed continuously (e.g., June 1-5, June 6-10, and so on).
I've tried doing the following:
Pentad_avg = mean(reshape(y(1:90),5,[]))'; %manually adjusted to be divisible by 5
Pentad_dt = x(1:5:90); %select every 5th day for time
However, Pentad_dt gives me dates 01-Jun-2004 and 06-Jun-2004 as output. And, that brings me to my second point.
I am looking to find 5-day averages for x and y that correspond to 5-day averages of another time series. This second time series has 5-day averaged data starting from 15-Jun-2004 until 29-Aug-2004 (instead of starting at 01-Jun-2004). Ultimately, how do I align the dates and 5-day averages between these two time series?
Synchronization between two time series can be accomplished using the timeseries object. Placing your data into an object allows MATLAB to process it intelligently. The most useful thing it adds for your use case is the synchronize method.
You'll want to make sure to properly set the time vector on each of the timeseries objects.
An example of what this might look like is as follows:
ts1 = timeseries(y,datestr(x));
ts2 = timeseries(OtherData,OtherTimes);
[ts1 ts2] = synchronize(ts1,ts2,'Uniform','Interval',5);
This should return both timeseries aligned to the same time vector. You could also align a timeseries to a specific time vector using the resample method.
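The "jumping mean" from the question, started at a chosen date so that it lines up with the other series, can also be computed directly. The following is a minimal Python sketch of that block averaging, not of the synchronize approach; the function name `jumping_mean` is hypothetical, days are integer serial dates as in the question, and an incomplete final block is simply dropped.

```python
def jumping_mean(days, values, start_day, width=5):
    """Non-overlapping `width`-day means beginning at `start_day`.

    Returns (block_start_days, block_means). Averaging stops at the first
    block that is missing any of its days.
    """
    lookup = dict(zip(days, values))
    starts, means = [], []
    day = start_day
    while all((day + k) in lookup for k in range(width)):
        block = [lookup[day + k] for k in range(width)]
        starts.append(day)
        means.append(sum(block) / width)
        day += width
    return starts, means


# Sample data from the question (serial dates and daily values).
x = [732099, 732100, 732101, 732102, 732103, 732104, 732105, 732106, 732107, 732108]
y = [1, 5, 3, 4, 6, 2, 3, 5, 6, 8]
starts, means = jumping_mean(x, y, start_day=x[0])
```

Passing the first date of the second series as `start_day` would make the block boundaries of both series coincide.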

how to simulate date for one year in kdb

I would like to simulate random timestamp data: 100 records per day for one year.
How can I do that?
When I set
a:2013.01.01D00:00:00.000000000
100?a
the randomized data doesn't stay within a single day.
Thanks for your input.
I am not sure if this can be done more easily, but you can generate 100 random timestamps for every day of 2013 as follows:
daysInYear: 365;
year: 2013.01.01D00:00:00.000000000;
//array of 365 elements, where every element represents corresponding date of year
dates: year + 01D * til daysInYear;
//array of 365 elements, where every element is an array of 100 random timestamps [0 .. 1D)
randomNanos: cut[100; (100 * daysInYear)?1D];
//array of 365 elements, where each element is an array of 100 random dateTimes for given day
result: dates + randomNanos;
//put all the dates in single array
raze result
The short version which does the same is below:
raze (2013.01.01D+01D * til 365) + cut[100; (100*365)?1D]
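For readers less familiar with q, the same idea (one start-of-day value per day, plus a random offset smaller than one day) can be sketched in Python. This is only an illustration under assumptions: the function name `random_timestamps` and the `seed` parameter are made up for the example, the day count defaults to 365 as in the answer, and Python datetimes have microsecond rather than nanosecond resolution.

```python
import random
from datetime import datetime, timedelta

MICROSECONDS_PER_DAY = 24 * 60 * 60 * 1_000_000


def random_timestamps(year, per_day=100, days_in_year=365, seed=None):
    """Return `per_day` random timestamps for each of the first `days_in_year` days of `year`."""
    rng = random.Random(seed)
    year_start = datetime(year, 1, 1)
    stamps = []
    for d in range(days_in_year):
        day_start = year_start + timedelta(days=d)
        for _ in range(per_day):
            # Random offset in [0, 1 day), so the timestamp stays inside this day.
            offset = timedelta(microseconds=rng.randrange(MICROSECONDS_PER_DAY))
            stamps.append(day_start + offset)
    return stamps
```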
In order to simulate data for a single day, it's possible to generate random times (as floats less than one) and add them to the day you would like to generate data for. In this case:
D:2016.03.01;
D+100?1f
Will return 100 random times on 2016.03.01. If you want to generate data within a time range you can restrict the size of the float to something less than 1, or greater than a certain minimum value.
If you want to handle leap years... Not sure of a better way at the minute other than adding the max number of days onto the start of the year and asking whether it's the 31st. Adding on 366, it can either be 31st or 1st. If it's the 31st good, otherwise drop off the last date.
/e.g.
q)last 2015.01.01+til 365
2015.12.31
q)last 2016.01.01+til 365
2016.12.30 /we are a day short
q)
/return the dates and the number of days based on whether its a leap year
q)dd:$[31i~`dd$last d:2016.01.01+til 366;(366;d);(365;-1_d)]
q)/returns (366;2016.01.01 2016.01.02...)
q)/the actual logic below is pretty much the same as the other answer
q)raze{[n;dy;dt] dt+n cut(n*dy)?.z.N}[100;].dd
2016.01.01D16:06:53.957527121 2016.01.01D10:55:10.892935198 2016.01.01D15:36:..
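The leap-year check described above (add 366 days and see whether the last date is the 31st) translates directly to other languages. A minimal Python sketch of the same trick follows; the function name `dates_in_year` is an assumption for the example.

```python
from datetime import date, timedelta


def dates_in_year(year):
    """Return every date in `year`, using the 'is the 366th day the 31st?' check."""
    candidates = [date(year, 1, 1) + timedelta(days=k) for k in range(366)]
    if candidates[-1].day == 31:
        # The 366th day is still 31 December, so this is a leap year.
        return candidates
    # Otherwise the 366th day is 1 January of the next year; drop it.
    return candidates[:-1]
```

In Python specifically, `calendar.isleap(year)` from the standard library answers the leap-year question directly, so the trick is shown here only to mirror the q logic.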

Group of consecutive minimum values within data - MATLAB

I have daily river flow data for 1975-2009 and I am asked to find the 7 consecutive days within each year that have the smallest flows.
Any advice how to start this? I've only been using MATLAB for a couple weeks.
Thanks!
You could convolve the data with ones(1,7) and look for the minimum, which will yield the starting day of your dry period:
[~,startingDay] = min(conv(flow,ones(1,7),'valid'))
(This is basically a moving average filter without the normalization).
Loop through the years to get each year's result.
Start by finding the cumulative sum with cumsum. The difference between cumulative sums 7 days apart gives the total for those 7 days. Then pick the minimum of those.
a = [0; cumsum(flow(:))]; % leading zero so that the window starting on day 1 is included
b = a(8:end) - a(1:end-7); % b(k) = sum(flow(k:k+6))
[m,i] = min(b);
Here m holds the smallest total over 7 consecutive days, and i is the index of the first day of that window (the earliest one, if several windows tie).
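The cumulative-sum approach can be checked on a small example. The sketch below is a minimal Python version for a single year of data, assuming `flow` is a plain list of daily values; the function name `driest_window` is hypothetical and the returned start index is 0-based.

```python
from itertools import accumulate


def driest_window(flow, width=7):
    """Return (start_index, total) of the `width`-day window with the smallest total flow."""
    if len(flow) < width:
        raise ValueError("need at least `width` days of data")
    # prefix[k] is the sum of the first k values, so prefix[0] == 0.
    prefix = [0, *accumulate(flow)]
    # Total of the window starting at index k is prefix[k + width] - prefix[k].
    totals = [prefix[k + width] - prefix[k] for k in range(len(flow) - width + 1)]
    # Index of the smallest total; the earliest window wins on ties.
    start = min(range(len(totals)), key=totals.__getitem__)
    return start, totals[start]
```

Calling this once per year's slice of the record gives each year's driest 7-day period.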