Calculate 4 previous quarters from a date - pyspark

I am using the below to get the previous quarter from 'new_date' and it works great. How do I calculate 4 quarters back?
(F.expr("date_add(date_trunc('quarter', cast(new_date as date)), -1)"))

Here's one way to do it using transform.
First get the previous quarter's ending date using your approach (or any other), and build an array that repeats that date 4 times (using array_repeat). Then use transform with add_months to subtract 0, 3, 6, and 9 months from it, and apply last_day to each result to get the quarter ending date.
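from pyspark.sql import functions as func

# q1_back: last day of the quarter before dt; qtr_arr: the 4 most recent quarter-end dates before dt
data_sdf. \
    withColumn('q1_back', func.date_add(func.date_trunc('quarter', 'dt'), -1)). \
    withColumn('qtr_arr',
               func.expr('transform(array_repeat(q1_back, 4), (x, i) -> last_day(add_months(x, i*-3)))')
               ). \
    show(truncate=False)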
# +----------+----------+------------------------------------------------+
# |dt        |q1_back   |qtr_arr                                         |
# +----------+----------+------------------------------------------------+
# |2018-12-31|2018-09-30|[2018-09-30, 2018-06-30, 2018-03-31, 2017-12-31]|
# |2019-12-31|2019-09-30|[2019-09-30, 2019-06-30, 2019-03-31, 2018-12-31]|
# |2018-12-31|2018-09-30|[2018-09-30, 2018-06-30, 2018-03-31, 2017-12-31]|
# |2018-11-10|2018-09-30|[2018-09-30, 2018-06-30, 2018-03-31, 2017-12-31]|
# +----------+----------+------------------------------------------------+
The two-argument function passed to transform receives each element together with its 0-based index. Multiplying that index by -3 subtracts 0, 3, 6, and 9 months from the previous quarter's ending date [0*-3, 1*-3, 2*-3, 3*-3].
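The same index arithmetic can be checked outside Spark. The following is a minimal pure-Python sketch, not the Spark implementation; the helper names quarter_start, add_months_last_day, and previous_quarter_ends are hypothetical and exist only for this illustration.

```python
import calendar
from datetime import date, timedelta


def quarter_start(d):
    """First day of the quarter containing d (mirrors date_trunc('quarter', ...))."""
    month = 3 * ((d.month - 1) // 3) + 1
    return date(d.year, month, 1)


def add_months_last_day(d, months):
    """Shift d by a number of months, then snap to the last day of that month
    (mirrors last_day(add_months(d, months)))."""
    idx = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(idx, 12)
    month = month0 + 1
    return date(year, month, calendar.monthrange(year, month)[1])


def previous_quarter_ends(d, n=4):
    """The n most recent quarter-end dates strictly before d's quarter."""
    q1_back = quarter_start(d) - timedelta(days=1)
    # The index i plays the role of the 0-based index in transform: 0, -3, -6, -9 months.
    return [add_months_last_day(q1_back, -3 * i) for i in range(n)]


quarters = previous_quarter_ends(date(2018, 11, 10))
```

For 2018-11-10 this yields the same four dates as the last row of the table above.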

Related

Stata: is there a way to count months between two dates, given in year-month format?

I am trying to count the months between two dates in Stata that are stored in year-month (%tm) format, with storage type int. Basically, I want to do what the datediff(d1, d2, "month") function does in Stata 17, except that I have Stata 15. I saw a few other forum threads about this, but they all discussed solutions for a year-month-day format, which doesn't seem to work for me. Is there a way to do this?

As both variables are numeric and the units are consistent, no special function is needed as a subtraction gives the answer.
. clear
. set obs 1
Number of observations (_N) was 0, now 1.
. gen now = ym(2022, 7)
. gen lastyear = ym(2021, 7)
. format now lastyear %tm
. di now - lastyear
12
. gen diff = now - lastyear
. l
     +--------------------------+
     |    now   lastyear   diff |
     |--------------------------|
  1. | 2022m7     2021m7     12 |
     +--------------------------+
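The subtraction works because a %tm value is simply a count of months since 1960m1. A small Python sketch of that encoding, using a hypothetical ym helper that mimics Stata's ym() function:

```python
def ym(year, month):
    """Months elapsed since January 1960, the encoding behind Stata's %tm values."""
    return (year - 1960) * 12 + (month - 1)


now = ym(2022, 7)
lastyear = ym(2021, 7)
diff = now - lastyear  # plain integer subtraction gives the month count
```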

CDO - Resample netcdf files from monthly to daily timesteps

I have a netcdf file that has monthly global data from 1991 to 2000 (10 years).
Using CDO, how can I modify the netcdf from monthly to daily timesteps by repeating the monthly values each day of each month?
For example,
convert from
Month 1, value = 0.25
to
Day 1, value = 0.25
Day 2, value = 0.25
Day 3, value = 0.25
....
Day 31, value = 0.25
convert from
Month 2, value = 0.87
to
Day 1, value = 0.87
Day 2, value = 0.87
Day 3, value = 0.87
....
Day 28, value = 0.87
Thanks
##############
Update
My monthly netCDF file does not store the monthly values on the first day of each month; the day varies (e.g. the 15th, 7th, 9th, etc.). There is, however, exactly one value for each month.

The question is perhaps ambiguously worded. Adrian Tompkins' answer is correct for interpolation. However, you are actually asking to set the value for each day of the month to that for the first day of the month. You could do this by adding a second CDO call as follows:
cdo -inttime,1991-01-01,00:00:00,1day in.nc temp.nc
cdo -monadd -gtc,100000000000000000 temp.nc in.nc out.nc
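Just set the value after gtc to something much higher than anything in your data. The comparison then yields 0 at every daily timestep, and monadd adds the matching month's value from in.nc to those zeros.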
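To make the intended result concrete, here is a minimal pure-Python sketch of the repetition itself. It does not use CDO or netCDF; the function name monthly_to_daily and the dictionary layout are assumptions made only for this illustration.

```python
import calendar
from datetime import date


def monthly_to_daily(monthly):
    """Repeat each monthly value for every calendar day of its month.

    monthly: dict mapping (year, month) -> value (hypothetical layout).
    Returns a dict mapping datetime.date -> value.
    """
    daily = {}
    for (year, month), value in sorted(monthly.items()):
        days_in_month = calendar.monthrange(year, month)[1]
        for day in range(1, days_in_month + 1):
            daily[date(year, month, day)] = value
    return daily


daily = monthly_to_daily({(1991, 1): 0.25, (1991, 2): 0.87})
```

Because the lookup is keyed on year and month only, it does not matter on which day of the month the original value was stored.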

You can use inttime which interpolates in time at the interval required, but this is not exactly what you asked for as it doesn't repeat the monthly values and your series will be smoothed by the interpolation.
If we assume your dataset starts on the 1st January at time 00:00 (you don't state in the question) then the command would be
cdo inttime,1991-01-01,00:00:00,1day in.nc out.nc
This performs a simple linear interpolation between steps.
Note: This is fine for fields like temperature and seems to be what you ask for, but readers should note that one has to be more careful with flux fields such as rainfall, where one might want to scale and/or change the units appropriately.

I could not find a solution with CDO but I solved the issue with R, as follows:
library(dplyr)
library(ncdf4)
library(reshape2)
## Read ncfile
ncpath="~/my/path/"
ncname="my_monthly_ncfile"
ncfname=paste(ncpath, ncname, ".nc", sep="")
ncin=nc_open(ncfname)
var=ncvar_get(ncin, "nc_var")
## melt ncfile
var=melt(var)
var=var[complete.cases(var), ] ## remove any NA
## split ncfile by gridpoint (lat and lon) into a list
var=split(var, list(var$lat, var$lon))
var=var[lapply(var,nrow)>0] ## remove any empty list element
## create new list and replicate, for each gridpoint, each monthly value n=30 times
var_rep=list()
for (i in 1:length(var)) {
  var_rep[[i]]=data.frame(value=rep(var[[i]]$value, each=30))
}

lubridate assigns unrealistic date

I have a vector of dates in either dmY or Ymd format.
These are all dates in the last century.
From each, I need to extract just the year (Y).
I use the following code
library(lubridate)
sampleDates <- c(20100517,17052010)
result <- year(parse_date_time(sampleDates, guess_formats(as.character(sampleDates), c("Ymd","dmY"))))
result
517 2010
However, I expect something like
result
2010 2010

Here is a base R solution to your problem that takes a particular difficulty into account with your date format. Let's say you have the date 20112020, i.e. November 20th in the year 2020. For your function, it is not easy to distinguish which part of the string is the year - is it 2011 or 2020? The following code takes this difficulty into account, though let me mention that there surely must be simpler solutions.
Code
NonID <- grepl("^2", sampleDates) & (substr(sampleDates, 5, 5) == "2")
ID <- !NonID
dates_normal <- sampleDates[!NonID]
dates_special <- sampleDates[NonID]
normal_years <- as.numeric(c(substr(dates_normal, nchar(dates_normal) - 3, nchar(dates_normal)), substr(dates_normal, 1, 4)))
normal_years <- normal_years[normal_years > 1999]
special_years <- as.numeric(substr(dates_special, nchar(dates_special) - 3, nchar(dates_special)))
all_years <- c(normal_years, special_years)
all_years
> all_years
[1] 2010 2010
Explanation
First, we divide the date vector into those dates which exhibit the indistinguishability (dates_special) and those which do not (dates_normal). Then, for the normal dates, we use the substr() function to extract the first four and last four digits of the string and keep only those values which exceed 1999. For the special dates, we only keep the last four digits because the year cannot be contained in the first four digits for this date format.
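A different way to handle the mixed formats is to try each format in turn and accept the first one that parses to a valid calendar date. This is a Python sketch of that alternative, not a translation of the R code above; the function name extract_year and the plausibility bounds min_year and max_year are assumptions for illustration.

```python
from datetime import datetime


def extract_year(value, formats=("%Y%m%d", "%d%m%Y"), min_year=1900, max_year=2100):
    """Return the year of an 8-digit date written as either Ymd or dmY.

    Each format is tried in order; a format is accepted only if the whole
    string parses as a real date and the year lies in the plausible range.
    """
    text = str(value).zfill(8)  # restore a leading zero lost by numeric storage
    for fmt in formats:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # not a valid date in this format, try the next one
        if min_year <= parsed.year <= max_year:
            return parsed.year
    raise ValueError(f"no format matched {value!r}")


years = [extract_year(v) for v in (20100517, 17052010)]
```

Unlike the substring approach, this keeps the result in the same order as the input vector.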

Week Number in Q/KDB

I'm looking to map a date/week to the Week number of the year.
I've thought about subtracting the start of the year, and dividing by 7 - however it might not line up correctly.
e.g.
2020.01.02 -> Week 1
2020.01.06 -> Week 2

I would suggest to use following function:
weekOfYear: {1+floor (x-`week$"d"$12 xbar"m"$x)%7}
This function
Finds the first Monday before or on 1st Jan. E.g. {(`week$"d"$12 xbar"m"$x)}2020.01.01 returns 2019.12.30
Then finds difference in days between x and the first Monday
Divides difference by 7 and adds 1, which returns result you are looking for
For example
weekOfYear 2019.12.31 2020.01.01 2020.01.02 2020.01.05 2020.01.06 2020.01.07
returns
53 1 1 1 2 2
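The three steps above can be sketched in Python for readers who want to verify the arithmetic without a q session. This is an illustrative translation under the same convention (weeks start on Monday, week 1 contains 1st Jan); the name week_of_year is hypothetical.

```python
from datetime import date, timedelta


def week_of_year(d):
    """Week number where week 1 starts on the Monday on or before 1st Jan."""
    jan1 = date(d.year, 1, 1)
    # date.weekday() is 0 for Monday, so this steps back to the Monday on or before 1st Jan.
    first_monday = jan1 - timedelta(days=jan1.weekday())
    # Whole weeks elapsed since that Monday, plus 1.
    return 1 + (d - first_monday).days // 7


sample = [date(2019, 12, 31), date(2020, 1, 1), date(2020, 1, 2),
          date(2020, 1, 5), date(2020, 1, 6), date(2020, 1, 7)]
weeks = [week_of_year(d) for d in sample]
```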

Just to build on Anton's great answer, you could also use the div function instead of flooring the result, which would look something like
{1 + (x - `week $ `date $ 12 xbar `month $ x) div 7}

Take out date values between two dates from matrix variable, Matlab

I'm trying to take out two separate years from a date table.
% Date table
Datez = [2001 2;2001 5;2001 9;2001 11;2002 3;2002 5;2002 7;2002 9;2002 11;...
2003 2;2003 4;2003 6;2003 8;2003 10;2003 12;2004 3;2004 5;2004 7;...
2004 9;2004 11; 2005 10;2005 12]
I want to take out all values as 1 or 0. I want the dates from 2001-11 to 2002-11 plus all values from 2004-11 to 2005-11.
In total I should get a new vector, called test:
test = [0;0;0;1;1;1;1;1;1;0;0;0;0;0;0;0;0;0;0;1;1;0] % final result
I tried these combinations, but I don't know how to combine these four statements into a vector that looks like "test" or if there are any better solutions?
xjcr = 1:length(Datez)
(Datez(xjcr,1) >= 2001 & Datez(xjcr,2) >= 11) % greater than 2001-11
(Datez(xjcr,1) <= 2002 & Datez(xjcr,2) <= 11) % smaller than 2002-11
(Datez(xjcr,1) >= 2004 & Datez(xjcr,2) >= 11) % greater than 2004-11
(Datez(xjcr,1) <= 2005 & Datez(xjcr,2) <= 11) % smaller than 2005-11
Any ideas are much appreciated, thanks in advance!

Your issue is that you are filtering on the two columns independently: years of at least 2001 and months of at least November. This would give you December 2001 but not January 2002. The solution, I believe, is to combine the year and month into a single number so that the comparison operators act on them as a pair. Here is an easy method:
Datez2 = Datez(:,1)*100 + Datez(:,2);
test = (Datez2>=200111 & Datez2<=200211) | (Datez2>=200411 & Datez2<=200511)
Multiplying the year by 12 and adding (month - 1) may be a better encoding, depending on whether you are building something that needs to be very robust or just hacking something together.
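As a cross-check of the composite-key idea, here is a small Python sketch that uses the year*12 + (month - 1) encoding on the data from the question. The helper names month_index and in_any_range are hypothetical.

```python
def month_index(year, month):
    """Encode (year, month) as a single, consecutive month count."""
    return year * 12 + (month - 1)


def in_any_range(rows, ranges):
    """Return 1/0 flags marking rows that fall inside any inclusive (start, end) range."""
    mask = []
    for year, month in rows:
        key = month_index(year, month)
        hit = any(month_index(*lo) <= key <= month_index(*hi) for lo, hi in ranges)
        mask.append(1 if hit else 0)
    return mask


datez = [(2001, 2), (2001, 5), (2001, 9), (2001, 11), (2002, 3), (2002, 5),
         (2002, 7), (2002, 9), (2002, 11), (2003, 2), (2003, 4), (2003, 6),
         (2003, 8), (2003, 10), (2003, 12), (2004, 3), (2004, 5), (2004, 7),
         (2004, 9), (2004, 11), (2005, 10), (2005, 12)]
ranges = [((2001, 11), (2002, 11)), ((2004, 11), (2005, 11))]
test = in_any_range(datez, ranges)
```

With this encoding consecutive months differ by exactly 1, so the keys can also be subtracted to count months, which the year*100 + month form does not allow.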
