Convert Week of the Year to Date in PySpark - date

I have a PySpark dataframe with 'Week_of_the_year' column. '202005' means 5th week of year 2020.
How can I convert it to 'date' format, maybe convert to mid-date (Wednesday) of that week?
Example: I want '202005' to show as '2020-01-29'.

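You can use the to_date function after concatenating 3 (the day-of-week number for Wednesday) onto the value, giving 2020053: 2020 is the year, 05 is the week of the year, and 3 is the weekday number. Refer to the Java SimpleDateFormat documentation for the pattern characters. Note that Spark 3.0 and later reject week-based patterns such as w and u unless spark.sql.legacy.timeParserPolicy is set to LEGACY.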
from pyspark.sql import functions as F
df.withColumn("new_date", F.to_date(F.concat("old_date",F.lit("3")), "yyyywwu")).show()
#+--------+----------+
#|old_date| new_date|
#+--------+----------+
#| 202005|2020-01-29|
#+--------+----------+
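As a cross-check outside Spark, the same conversion can be sketched in plain Python. This is only an illustration: it assumes ISO 8601 week numbering (which agrees with the expected result for 202005), needs Python 3.8+ for date.fromisocalendar, and the helper name week_to_midweek_date is made up for this example.

```python
from datetime import date


def week_to_midweek_date(yyyyww: str, weekday: int = 3) -> date:
    """Return the given ISO weekday (1=Monday .. 7=Sunday) of week 'yyyyww'.

    Hypothetical helper: assumes ISO 8601 weeks, which may differ from the
    locale-based weeks used by Java's SimpleDateFormat in other years.
    """
    year, week = int(yyyyww[:4]), int(yyyyww[4:])
    return date.fromisocalendar(year, week, weekday)


print(week_to_midweek_date("202005"))  # 2020-01-29
```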

Related

Wrong day when using day()-formula with format - PowerBI

I'm trying to find the weekday (Mon, Tue, Wed, etc.) for each date in a date range formatted as yyyy mm dd.
I tried to use the formula format(day(Date Table),"ddd"), but the weekday is wrong. In my example, the output of 2020.01.01 gives Sunday, but it should be Wednesday.
I think your formula is wrong:
Instead of
format(day(Date Table),"ddd")
Use
format(<Target Table>[<date column>],"ddd")
That is, omit the DAX DAY call. It causes the day of the month (1..31), rather than the date itself, to be passed to the FORMAT function.
When you use the DAY function in DAX, it returns the day of the month (1 through 31).
Thus DAY ( DATE ( 2020, 1, 1) ) = 1 which means you're trying to format the number 1 as a date. Integers are interpreted as days since 1899/12/30 when treated as a date, so 1 corresponds to 1899/12/31, which happened to be a Sunday. Thus FORMAT(1, "ddd") = "Sun".
There's no reason to get DAY involved here. You can simply write
Day = FORMAT ( 'Calendar'[Date], "ddd" )
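The serial-date arithmetic described above can be reproduced outside Power BI. The following is a minimal Python sketch, not DAX: serial_to_date and WEEKDAYS are names invented for illustration, and the only assumption is the 1899-12-30 day-zero convention stated in the answer.

```python
from datetime import date, timedelta

EPOCH = date(1899, 12, 30)  # day 0 of the serial-date scheme described above
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def serial_to_date(n: int) -> date:
    """Interpret an integer as a number of days since the epoch."""
    return EPOCH + timedelta(days=n)


day_of_month = date(2020, 1, 1).day          # 1, which is what DAY() returns
wrong = serial_to_date(day_of_month)         # 1899-12-31
print(wrong, WEEKDAYS[wrong.weekday()])      # 1899-12-31 Sun
print(WEEKDAYS[date(2020, 1, 1).weekday()])  # Wed
```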

pyspark How to filter rows based on HH:mm:ss portion in timestamp column

I have a dataframe in pyspark that has a timestamp string column in the following format:
"11/21/2018 07:21:49 PM"
This is in 12-hour format with an AM/PM marker.
I want to filter the rows in the dataframe based on only the time portion of this string timestamp regardless of the date. For example I want to keep all rows that fall between the hours of 2:00pm and 4:00pm inclusive.
I tried the below to extract the HH:mm:ss and use the function between but it is not working.
# Grabbing only time portion from datetime column
import pyspark.sql.functions as F
time_format = "HH:mm:ss"
split_col = F.split(df['datetime'], ' ')
df = df.withColumn('Time', F.concat(split_col.getItem(1),F.lit(' '),split_col.getItem(2)))
df = df.withColumn('Timestamp', from_unixtime(unix_timestamp('Time', format=time_format)))
df.filter(F.col("Timestamp").between('14:00:00','16:00:00')).show()
Any ideas on how to filter rows only based on the HH:mm:ss portion in a timestamp column regardless of the actual date, would be very appreciated.
Format the timestamp as a 24-hour HH:mm:ss string, then filter with between.
Example:
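from pyspark.sql import functions as F
df = spark.createDataFrame(
    [("11/21/2018 07:21:49 PM",), ("11/22/2018 04:21:49 PM",), ("11/23/2018 12:21:49 PM",)],
    ["ts"])
# Parse the 12-hour string, re-render it as zero-padded 24-hour HH:mm:ss, then compare as strings.
# The bounds use the same HH:mm:ss width so that exactly 16:00:00 is included.
(df.withColumn("tt", F.from_unixtime(F.unix_timestamp(F.col("ts"), "MM/dd/yyyy hh:mm:ss a"), "HH:mm:ss"))
   .filter(F.col("tt").between("12:00:00", "16:00:00"))
   .show(truncate=False))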
#+----------------------+--------+
#|ts                    |tt      |
#+----------------------+--------+
#|11/23/2018 12:21:49 PM|12:21:49|
#+----------------------+--------+
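The same idea (parse the 12-hour string, then compare only the time of day) can be sketched without Spark. This is an illustrative plain-Python version, not the answer's method: in_window and FMT are hypothetical names, and it assumes an English/C locale so that %p recognizes AM/PM.

```python
from datetime import datetime, time

FMT = "%m/%d/%Y %I:%M:%S %p"  # 12-hour clock with an AM/PM marker


def in_window(ts: str, start: time, end: time) -> bool:
    """True if the time-of-day part of ts lies in [start, end], ignoring the date."""
    t = datetime.strptime(ts, FMT).time()
    return start <= t <= end


rows = ["11/21/2018 07:21:49 PM", "11/22/2018 04:21:49 PM", "11/23/2018 12:21:49 PM"]
kept = [r for r in rows if in_window(r, time(12, 0), time(16, 0))]
print(kept)  # ['11/23/2018 12:21:49 PM']
```

Comparing time objects rather than strings sidesteps any dependence on zero-padding or string length in the bounds.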

Comparing date values for two columns

I have a column called PairDt, a string that contains a date value in the last 5 characters. I want to compare that date value with the date value in the Day column, which contains dates in the YYYY-MM-DD format.
PairDt               Day
----------------------------------
DCS-CNY-Yunbi-42606  2016-08-24
DCS-CNY-Yunbi-42607  2016-08-25
DCS-CNY-Yunbi-42608  2016-08-26
DCS-CNY-Yunbi-42609  2016-08-27
DCS-CNY-Yunbi-42610  2016-08-28
How do I convert Day to a value?
I'm trying to isolate the date values in PairDt that do not match the date value in Day.
The 5-digit number at the end of PairDt looks like the number of days since December 30th, 1899. To convert this number to a date, use DATEADD to add that many days to 1899-12-30. To convert a date to a number, use DATEDIFF to count the days since 1899-12-30. Something like this code:
declare @PairDt varchar(50) = 'DCS-CNY-Yunbi-42606', @Day date = '2016-08-24'
select DATEADD(d, cast(right(@PairDt, 5) as int), '1899-12-30'), DATEDIFF(day, '1899-12-30', @Day)
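To sanity-check the day-count interpretation against the sample rows, here is a small Python sketch of the same two conversions. It is not T-SQL; pair_dt_to_date and date_to_serial are illustrative names, and the 1899-12-30 epoch is taken from the answer above.

```python
from datetime import date, timedelta

EPOCH = date(1899, 12, 30)  # day 0, as assumed in the answer above


def pair_dt_to_date(pair_dt: str) -> date:
    """Read the trailing 5 digits as days since EPOCH (the DATEADD direction)."""
    return EPOCH + timedelta(days=int(pair_dt[-5:]))


def date_to_serial(d: date) -> int:
    """Number of days from EPOCH to d (the DATEDIFF direction)."""
    return (d - EPOCH).days


rows = [
    ("DCS-CNY-Yunbi-42606", date(2016, 8, 24)),
    ("DCS-CNY-Yunbi-42610", date(2016, 8, 28)),
]
# Keep only the rows whose embedded serial disagrees with the Day column.
mismatches = [(p, d) for p, d in rows if pair_dt_to_date(p) != d]
print(mismatches)  # []
```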

Subtracting 1 ISO 8601 year from a date in BigQuery

I'm trying to manipulate a date value to go back in time exactly 1 ISO-8601 year.
The following does not work, but best describes what I want to accomplish:
date_add(date '2018-01-03', interval -1 isoyear)
I tried string conversion as an intermediate step, but that doesn't work either:
select parse_date('%G%V%u',safe_cast(safe_cast(format_date('%G%V%u',date '2018-01-03') as int64)-1000 as string))
The error provided for the last one is "Failed to parse input string "2017013"". I don't understand why, this should always resolve to a unique date value.
Is there another way in which I can subtract an ISO year from a date?
This gives the corresponding day of the previous ISO year by subtracting the appropriate number of weeks from the date. I based the calculation on the description of weeks per year from the Wikipedia page:
CREATE TEMP FUNCTION IsLongYear(d DATE) AS (
  -- Year starting on Thursday
  EXTRACT(DAYOFWEEK FROM DATE_TRUNC(d, YEAR)) = 5 OR
  -- Leap year starting on Wednesday
  (EXTRACT(DAY FROM DATE_ADD(DATE(EXTRACT(YEAR FROM d), 2, 28), INTERVAL 1 DAY)) = 29
   AND EXTRACT(DAYOFWEEK FROM DATE_TRUNC(d, YEAR)) = 4)
);
CREATE TEMP FUNCTION PreviousIsoYear(d DATE) AS (
  -- The offset is the length of the previous ISO year (52 or 53 weeks), not of d's own year
  DATE_SUB(d, INTERVAL IF(IsLongYear(DATE(EXTRACT(ISOYEAR FROM d) - 1, 1, 1)), 53, 52) WEEK)
);
SELECT PreviousIsoYear('2018-01-03');
This returns 2017-01-04, which is the third day of the 2017 ISO year. 2018-01-03 is the third day of the 2018 ISO year.
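For comparison, the "same week and weekday, one ISO year earlier" rule can be sketched in Python using the standard library's ISO calendar support. This is an illustration rather than BigQuery code: previous_iso_year is a made-up name, and date.fromisocalendar requires Python 3.8+.

```python
from datetime import date


def previous_iso_year(d: date) -> date:
    """Return the date with the same ISO week and weekday in the previous ISO year.

    Raises ValueError if d lies in week 53 and the previous ISO year
    has only 52 weeks, since no corresponding date exists then.
    """
    iso_year, iso_week, iso_weekday = d.isocalendar()
    return date.fromisocalendar(iso_year - 1, iso_week, iso_weekday)


print(previous_iso_year(date(2018, 1, 3)))  # 2017-01-04
print(previous_iso_year(date(2021, 1, 6)))  # 2020-01-01
```

The second call crosses ISO year 2020, which has 53 weeks, so the two dates are 53 weeks apart rather than 52.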

Given an ISO 8601 week number, get date of first day of that week in LibreOffice Calc spreadsheet

LibreOffice Calc spreadsheet offers a function ISOWEEKNUM to return the standard ISO 8601 week number of the specified date.
I want the opposite.
➠ Given a standard week number, give me the date of the first day of that week (the Monday date).
Passing integers is acceptable. Also nice if able to pass a string in standard format.
Like this:
DATE_OF_ISOWEEKNUM( 2017 , 42 ) ➝ date of Monday of week 42 in week-based year 2017
DATE_OF_ISOWEEKNUM( "2017-W42" ) ➝ date of Monday of week 42 in week-based year 2017
Ideally, I would be able to pass a number 1-7 for Monday-Sunday to specify the day-of-week for which I want a date. Something like this:
DATE_OF_ISOWEEKNUM( 2017 , 42 , 1 ) ➝ date of Monday of week 42 in week-based year 2017
DATE_OF_ISOWEEKNUM( "2017-W42-1" ) ➝ date of Monday of week 42 in week-based year 2017
DATE_OF_ISOWEEKNUM( 2017 , 42 , 7 ) ➝ as above, but Sunday
DATE_OF_ISOWEEKNUM( "2017-W42-7" ) ➝ as above, but Sunday
Formula, with the year in B$1 and the week number in $A4:
=DATE(B$1,1,$A4*7)+(2-WEEKDAY(DATE(B$1,1,$A4*7)))-7*(ISOWEEKNUM(DATE(B$1,1,1))=1)
Calculate the date of day (week number * 7) of the year.
Shift that date to the Monday of its week.
Subtract 7 days if the first day of the year falls in ISO week 1.
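To cross-check results from the spreadsheet, the requested DATE_OF_ISOWEEKNUM behaviour can be sketched in Python. This is not a LibreOffice function: date_of_isoweeknum and parse_iso_week are hypothetical names that mirror the calls in the question, and date.fromisocalendar requires Python 3.8+.

```python
from datetime import date


def date_of_isoweeknum(year: int, week: int, weekday: int = 1) -> date:
    """Date of the given ISO weekday (1=Monday .. 7=Sunday) in an ISO week."""
    return date.fromisocalendar(year, week, weekday)


def parse_iso_week(text: str) -> date:
    """Accept 'YYYY-Www' or 'YYYY-Www-D'; the weekday defaults to Monday."""
    parts = text.split("-")
    year = int(parts[0])
    week = int(parts[1].lstrip("Ww"))
    weekday = int(parts[2]) if len(parts) > 2 else 1
    return date_of_isoweeknum(year, week, weekday)


print(date_of_isoweeknum(2017, 42))    # 2017-10-16
print(parse_iso_week("2017-W42-7"))    # 2017-10-22
```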