PySpark: Return next week's Saturday - pyspark

I'm trying to return next week's Saturday date from a date-type column rel_d.
Normally, in Python, I'd work out the number of days until next week's Saturday and add it to rel_d:
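from dateutil.relativedelta import relativedelta
def next_saturday(dt):
    # weekday() is 0 for Monday and 5 for Saturday, so 12 = 5 + 7 targets next week's Saturday
    next_sat_dt = dt + relativedelta(days=(12 - dt.weekday()))
    return next_sat_dt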
Creating a UDF in PySpark for the same seems like a heavy operation. Is there a built-in Spark operation that could do it faster?

You could chain two next_day calls in PySpark to reach next week's Saturday.
Note that in PySpark, dayofweek numbers the days from Sunday (1) to Saturday (7), so the week is taken to start on Sunday.
So, if you jump to the next Sunday and then jump to the next Saturday after it, you land on the date you need.
You can also add multiples of 7 with F.date_add to reach the nth week of your choice.
from pyspark.sql import functions as F
df = df.withColumn('next_saturday_date', F.next_day(F.next_day(F.col('rel_d'), 'Sun'), 'Sat'))
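To see why two hops work, here is a minimal plain-Python sketch. It does not use Spark; the `next_day` helper below is a hypothetical stand-in that assumes Spark's `next_day` returns the first date strictly later than its input that falls on the named day.

```python
from datetime import date, timedelta

# Day abbreviations in the order date.weekday() numbers them (Monday = 0).
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def next_day(d: date, day_name: str) -> date:
    """Stand-in for Spark's next_day: first date strictly after d that falls on day_name."""
    target = DAYS.index(day_name)
    delta = (target - d.weekday() - 1) % 7 + 1  # always in 1..7, never 0
    return d + timedelta(days=delta)

def next_weeks_saturday(d: date) -> date:
    # First hop lands on the Sunday that opens next week, second hop on that week's Saturday.
    return next_day(next_day(d, "Sun"), "Sat")

print(next_weeks_saturday(date(2019, 7, 22)))  # a Monday
```

One thing to keep in mind: because the week is treated as starting on Sunday here, an input that is itself a Sunday moves 13 days forward, whereas the Monday-based Python function in the question would move it 6 days.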

Related

Fetching the 15th last working day date-yyyyMMdd (excluding only weekends) in Hive

I have a table with date column (date in string format yyyyMMdd). My requirement is to design a logic to fetch data from the table where "date column value equals to the date of the 15th previous working day" (excluding only Saturdays and Sundays) without using a UDF or a shell script. For example it is 21st Feb 2020 today; the logic should produce an output: 20200203.
Assuming you actually mean the 14th previous working day based on your example, and you are ignoring holidays, it is just a date_sub function with a case statement for day of the week.
-- 'u' returns the day number of the week: 1 = Monday ... 7 = Sunday
case from_unixtime(unix_timestamp(event_dt,'yyyyMMdd'),'u')
when 1 then regexp_replace(date_sub(from_unixtime(unix_timestamp(event_dt,'yyyyMMdd')),20),'-','')
when 2 then regexp_replace(date_sub(from_unixtime(unix_timestamp(event_dt,'yyyyMMdd')),20),'-','')
when 3 then regexp_replace(date_sub(from_unixtime(unix_timestamp(event_dt,'yyyyMMdd')),20),'-','')
when 4 then regexp_replace(date_sub(from_unixtime(unix_timestamp(event_dt,'yyyyMMdd')),20),'-','')
when 5 then regexp_replace(date_sub(from_unixtime(unix_timestamp(event_dt,'yyyyMMdd')),18),'-','')
when 6 then regexp_replace(date_sub(from_unixtime(unix_timestamp(event_dt,'yyyyMMdd')),18),'-','')
when 7 then regexp_replace(date_sub(from_unixtime(unix_timestamp(event_dt,'yyyyMMdd')),19),'-','')
end as new_date
This assumes Sat/Sun should be treated like the following Monday.
If Sat/Sun should instead be treated like the preceding Friday, use 19 and 20 for days 6 and 7.
If you need to account for holidays, then you need to create a calendar table with every day, noting which days are holidays; it then becomes a join to that table plus some more logic, which could be worked out if that is the case.
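The fixed offsets (20, 18, 19) can be sanity-checked outside Hive. The sketch below is plain Python, not Hive SQL; it compares the offset table against a brute-force count of 14 working days back, assuming weekends are first rolled forward to the following Monday and holidays are ignored. The function names are made up for illustration.

```python
from datetime import date, timedelta

# Calendar-day offsets from the CASE expression, keyed by ISO weekday (1 = Monday ... 7 = Sunday).
OFFSETS = {1: 20, 2: 20, 3: 20, 4: 20, 5: 18, 6: 18, 7: 19}

def by_offset(d: date) -> date:
    """Apply the fixed offset for d's day of the week."""
    return d - timedelta(days=OFFSETS[d.isoweekday()])

def by_counting(d: date, n: int = 14) -> date:
    """Step back one day at a time, counting only Monday to Friday."""
    # Weekend dates are first moved forward to the following Monday.
    while d.isoweekday() > 5:
        d += timedelta(days=1)
    while n > 0:
        d -= timedelta(days=1)
        if d.isoweekday() <= 5:
            n -= 1
    return d

# The example from the question: Friday 21 Feb 2020.
print(by_offset(date(2020, 2, 21)).strftime("%Y%m%d"))
```

If both functions agree for every date in a test range, the offsets are consistent with "14 working days back, weekends ignored".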

Function that takes a start and end date and counts how many Sundays between those dates fell on the 1st of the month on kdb+

How would I create a function in kdb+ that takes a start and end date and counts how many Sundays between those dates fell on the 1st of the month?
The function needs to show how many times this has happened since 1950.
Let's first define a function which returns the weekday of its argument (of type date).
The underlying value of a date is the count of days from 1/1/2000, and we know that 1/1/2000 was a Saturday. The next day was obviously a Sunday, then a Monday, etc., and every 7th, 14th, 21st, etc. day after and before Jan 1, 2000 was a Saturday too. So if we take a date modulo 7 we get a weekday number where 0 is Saturday, 1 is Sunday, etc., which leads us to the following definition.
weekday:{ `sat`sun`mon`tue`wed`thu`fri x mod 7 }
Now we can create a function that answers the original question:
sundaysThe1st:{[start;end]sum `sun=weekday dates where 1=`dd$dates:start+til 1+end-start }
start+til 1+end-start generates a list of dates between start and end, dates where 1=`dd$dates returns only the first days of the months and `sun=weekday dates returns 1b if the 1st day of the month is Sunday and 0b otherwise. sum is effectively the number of 1's which is exactly what we need.
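For readers who don't use q, the same idea can be sketched in Python. This is a loose translation rather than the kdb+ code itself; the names `weekday` and `sundays_the_1st` simply mirror the q functions above.

```python
from datetime import date, timedelta

EPOCH = date(2000, 1, 1)  # kdb+ date zero, which was a Saturday
NAMES = ["sat", "sun", "mon", "tue", "wed", "thu", "fri"]

def weekday(d: date) -> str:
    # Python's % returns a non-negative remainder, so dates before 2000 work too.
    return NAMES[(d - EPOCH).days % 7]

def sundays_the_1st(start: date, end: date) -> int:
    """Count the dates in [start, end] that are both the 1st of a month and a Sunday."""
    n_days = (end - start).days + 1
    dates = (start + timedelta(days=i) for i in range(n_days))
    return sum(1 for d in dates if d.day == 1 and weekday(d) == "sun")

# Usage: everything since 1950 up to a chosen end date.
print(sundays_the_1st(date(1950, 1, 1), date(2020, 12, 31)))
```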
Hope this helps.

How do I sum if the date is greater than 10 days from and including the upcoming Sunday?

I have 2 rows. One row has dates and the other row has numbers.
I want to build a google sheet formula that allows me to sum 10 days after the next Saturday. So if today is Monday 7/22/2019, I want the sum from Sunday (7/28/2019) through and including the next 9 days. If today was Tuesday (7/23/2019) the formula would return the same result, because it's always starting from the next Sunday.
Another way I'm looking at it is the sum of the 10 days after the current week number since google has the weeks beginning with Sunday.
Thanks for your help.
My current array formula is using weeknumber:
sumifs(rowwithnumbers,weeknum(rowwithdates),">"&weeknum(today()),weeknum(rowwithdates),"<="&weeknum(Today())+2))
but that gives me the next 14 days, which is close but not exactly what I want. I don't want to sum the next 2 weeks after the current week, I want to sum the next 10 days after the current week.
=ARRAYFORMULA(SUM(INDIRECT(
ADDRESS(2, MATCH(VLOOKUP(WEEKNUM(TODAY())+1, {WEEKNUM(ROW(INDIRECT(
"A"&DATEVALUE(TODAY())&":A"&DATEVALUE(TODAY()+8)))),ROW(INDIRECT(
"A"&DATEVALUE(TODAY())&":A"&DATEVALUE(TODAY()+8)))}, 2, 0), 1:1, 0), 4)&":"&
ADDRESS(2, MATCH(VLOOKUP(WEEKNUM(TODAY())+1, {WEEKNUM(ROW(INDIRECT(
"A"&DATEVALUE(TODAY())&":A"&DATEVALUE(TODAY()+8)))),ROW(INDIRECT(
"A"&DATEVALUE(TODAY())&":A"&DATEVALUE(TODAY()+8)))}, 2, 0), 1:1, 0)+9, 4))))
Although you could put it all in one formula, I think for clarity and debugging it would be good to use a separate sheet (call it "Dates") for the date calculations. Then use the results of the date calculations as arguments to your SUMIFS function.
In that case, the date calculation for the next Saturday is:
=IF(WEEKDAY(TODAY()) = 7, TODAY() + 7, TODAY() + 7-WEEKDAY(TODAY()))
That checks if today is a Saturday, and if it is, uses today plus 7 days. Otherwise, it just adds the number of days until the upcoming Saturday. If you store that cell in B2, for example, then you can calculate your end date with a simple:
=B2+10
If you store that in C2 of the "Dates" Sheet, and your value list is in a Sheet called "Values" with dates in column A and the numbers in column B, then your sum function, which takes dates strictly after that Saturday (so starting on the Sunday) up to and including the end date, is:
=SUMIFS(Values!$B:$B, Values!$A:$A, ">" & Dates!B2, Values!$A:$A, "<=" & Dates!C2)
Here is a link to a Google Sheet showing all of above: Google Sheet
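The date arithmetic can be checked outside the spreadsheet. The following is a rough Python sketch of the same steps, not a Sheets formula; `sheets_weekday` is a hypothetical helper that assumes WEEKDAY's default numbering (Sunday = 1 through Saturday = 7), and the window is taken to be the ten days starting on the Sunday after the upcoming Saturday, as the question asks.

```python
from datetime import date, timedelta

def sheets_weekday(d: date) -> int:
    """Mimic WEEKDAY(d) with its default numbering: Sunday = 1 ... Saturday = 7."""
    return d.isoweekday() % 7 + 1

def upcoming_saturday(today: date) -> date:
    wd = sheets_weekday(today)
    # On a Saturday jump a full week, otherwise move to this week's Saturday.
    return today + timedelta(days=7 if wd == 7 else 7 - wd)

def sum_window(today: date, dates, numbers) -> int:
    start = upcoming_saturday(today)   # plays the role of Dates!B2
    end = start + timedelta(days=10)   # plays the role of Dates!C2
    # Strictly after the Saturday, up to and including the end date: ten days.
    return sum(n for d, n in zip(dates, numbers) if start < d <= end)

# Usage with one value per day; Monday 7/22/2019 and Tuesday 7/23/2019 share a window.
days = [date(2019, 7, 20) + timedelta(days=i) for i in range(30)]
print(sum_window(date(2019, 7, 22), days, [1] * 30))
```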

SSRS - How to show rolling 6 months where the start date is always a Saturday?

I need help on calculating my start date for my report date parameters.
The end date will always be the last Sunday, here: =DateAdd("d", 1 - WeekDay(Today(), 1), Today())
What I need help with is how to write a formula that goes back 6 months from today and picks the 1st Saturday in that range.
Thanks in advance.
Assuming the first day of your week is a Sunday, you can use this:
=DATEADD(
    DateInterval.Day,
    7 - WEEKDAY( DATEADD(DateInterval.Month,-6,Today()), FirstDayOfWeek.Sunday),
    DATEADD(DateInterval.Month,-6,Today())
)
This works as follows:
WEEKDAY( DATEADD(DateInterval.Month,-6,Today()), FirstDayOfWeek.Sunday)
takes today's date, subtracts 6 months, and then finds the day number of that date. Running that today (2018-11-08) gives us 2018-05-08, which is a Tuesday, so day number 3.
Saturdays are day number 7 (if your first day of the week is a Sunday). As there can be no number higher than 7, we can simply subtract the day number we landed on from 7, which gives us a required adjustment of 4 days.
Finally, the outer DATEADD function simply adds our calculated 4 days to the date 6 months ago, giving Saturday 2018-05-12.
Hope that makes sense!?
If the first day of the week is not a Sunday for you, then you may have to apply a Mod calculation to the second argument to get the correct number of days to adjust by.
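The same steps can be traced in a short Python sketch. This is not SSRS expression code; `add_months` is a hypothetical helper that is assumed to behave like DATEADD with a month interval (clamping to the last day of a shorter month), and `weekday_sunday_first` assumes the Sunday = 1 through Saturday = 7 numbering.

```python
import calendar
from datetime import date, timedelta

def add_months(d: date, months: int) -> date:
    """Shift d by whole months, clamping the day to the end of the target month."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))

def weekday_sunday_first(d: date) -> int:
    """Day number with Sunday = 1 ... Saturday = 7."""
    return d.isoweekday() % 7 + 1

def report_start(today: date) -> date:
    six_months_ago = add_months(today, -6)
    # 7 minus the day number is the distance to Saturday (0 if already a Saturday).
    return six_months_ago + timedelta(days=7 - weekday_sunday_first(six_months_ago))

print(report_start(date(2018, 11, 8)))
```

Note that if the date six months back is already a Saturday, the adjustment is zero and that date itself is returned.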

DB2 separate number of weeks per quarter from timestamp

I'm trying to get the week number within each quarter from a timestamp, so it should be between 1 and 13 per quarter. I used the WEEK() function, but it returns 1-52 for the whole year, so I divided it by the QUARTER function, as below:
select Week (EVENTTIMESTAMP) / QUARTER (EVENTTIMESTAMP) from KAP
The problem is that the results aren't accurate; for example, it shows:
time stamp 2014-07-06 12:13:03.018
week number 9
which isn't correct, because July is the first month of Q3 and this is the 6th day, so it should be week 1 of Q3, not 9.
Any suggestions on where it goes wrong?
You want something like WEEK modulo 13 to get week number within a quarter. You will have to tinker with 'modulo 13 yields 0..12' by adding or subtracting one at appropriate points.
Some minimal Google searching using 'ibm db2 sql modulo' yields DB2 MOD function:
The MOD function divides the first argument by the second argument and returns the remainder.
Hence MOD(WEEK(...), 13), except you probably need MOD(WEEK(...)-1, 13) + 1, as intimated already.
You may need to watch for what the WEEK() function does at year ends:
The WEEK function returns an integer in the range of 1 to 54 that represents the week of the year. The week starts with Sunday, and January 1 is always in the first week.
I'm curious about how they can come up with week 54. I suppose it requires 1st January to be a Saturday (so 2nd January is the start of week 2) of a leap year, as in 2000 and 2028. Note that week 53 and (occasionally) week 54 will show up as weeks 1 and 2 of Q5 unless you do something. Also, Saturday 2000-03-25 would be the end of Q1 and Sunday 2000-03-26 would be the start of Q2 under the regime imposed by the WEEK() function and a simple MOD(WEEK(...), 13) calculation. You're likely to have to tune this to meet your real requirements.
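To make the arithmetic concrete, here is a small Python sketch that emulates WEEK() as the quoted documentation describes it (weeks start on Sunday, January 1 is always in week 1) and then applies the MOD adjustment. It is an emulation under that reading of the documentation, not DB2 itself, and the function names are made up for illustration.

```python
from datetime import date

def db2_week(d: date) -> int:
    """Week of year with Sunday-start weeks and January 1 always in week 1."""
    jan1 = date(d.year, 1, 1)
    offset = jan1.isoweekday() % 7       # days between the preceding Sunday and January 1
    day_of_year = (d - jan1).days + 1
    return (day_of_year + offset - 1) // 7 + 1

def week_in_quarter(d: date) -> int:
    """Equivalent of MOD(WEEK(...) - 1, 13) + 1."""
    return (db2_week(d) - 1) % 13 + 1

for d in (date(2014, 7, 6), date(2000, 3, 25), date(2000, 3, 26), date(2000, 12, 31)):
    print(d, db2_week(d), week_in_quarter(d))
```

Under this emulation the question's timestamp (2014-07-06) falls in week 28, and integer division by quarter 3 gives the 9 that was reported. The MOD version gives 2 for that date, because 1-5 July form the partial week 27, which maps to 1.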
There's also the WEEK_ISO() function:
The WEEK_ISO function returns an integer in the range of 1 to 53 that represents the week of the year. The week starts with Monday and includes seven days. Week 1 is the first week of the year that contains a Thursday, which is equivalent to the first week that contains January 4.
Note that under the ISO scheme, the first few days of January (up to the 3rd) can fall in week 52 or 53 of the previous year, and the last few days of December (from the 29th) can fall in week 1 of the next year. Curiously, there doesn't seem to be a YEAR_ISO() function to resolve such ambiguities.
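The year-boundary behaviour of ISO weeks is easy to see with Python's standard library, whose `date.isocalendar()` returns the ISO year alongside the ISO week. This only illustrates the ISO rule; it says nothing about how DB2 exposes it.

```python
from datetime import date

def iso_year_week(d: date) -> tuple:
    """Return (ISO year, ISO week) using the standard library's isocalendar()."""
    iso = d.isocalendar()
    return (iso[0], iso[1])

print(iso_year_week(date(2021, 1, 3)))    # Sunday 3 January 2021   -> (2020, 53)
print(iso_year_week(date(2017, 1, 1)))    # Sunday 1 January 2017   -> (2016, 52)
print(iso_year_week(date(2014, 12, 29)))  # Monday 29 December 2014 -> (2015, 1)
```

Pairing the week number with the ISO year, rather than the calendar year, is what removes the ambiguity at year ends.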
In a data warehouse, the proper solution to this is to create a time dimension that contains static mappings for days/weeks/months/quarters/years. This lets you define them based on your business's fiscal calendar (if it does not follow the calendar year).
See: http://www.kimballgroup.com/1997/07/10/its-time-for-time/ for more information.
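As a rough illustration of what such a time dimension might hold, the Python sketch below generates one row per day with a quarter and a week-in-quarter value. The column names and the week rule (7-day blocks counted from the first day of the calendar quarter) are assumptions for the example; a real dimension would encode whatever your business calendar defines.

```python
from datetime import date, timedelta

def time_dimension(start: date, end: date) -> list:
    """Build one row (a dict) per calendar day between start and end inclusive."""
    rows = []
    d = start
    while d <= end:
        quarter = (d.month - 1) // 3 + 1
        quarter_start = date(d.year, 3 * (quarter - 1) + 1, 1)
        rows.append({
            "date": d,
            "year": d.year,
            "quarter": quarter,
            # Assumed rule: 7-day blocks counted from the first day of the quarter.
            "week_in_quarter": (d - quarter_start).days // 7 + 1,
        })
        d += timedelta(days=1)
    return rows

rows = time_dimension(date(2014, 1, 1), date(2014, 12, 31))
print(next(r for r in rows if r["date"] == date(2014, 7, 6)))
```

With this particular rule, 2014-07-06 lands in week 1 of Q3, and a quarter of 92 days runs to week 14 rather than 13. Once loaded into a table, the mapping is obtained by a join on the date instead of arithmetic in every query.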