How can I group each week of each month? - Python Pandas - date

I would like to make a column showing which week of the month each date falls in; e.g. 2022-10-04 would be the first week of October. How should I do it?
Date
2022-10-04
2022-10-04
2022-10-06
2022-10-12
2022-10-19
2022-10-25
2022-10-31
2022-11-02
2022-11-03
I would like it to look like this:
Date        Week
2022-10-04  Week1_Oct
2022-10-04  Week1_Oct
2022-10-06  Week1_Oct
2022-10-12  Week1_Oct
2022-10-19  Week2_Oct
2022-10-25  Week3_Oct
2022-10-31  Week4_Oct
2022-11-02  Week1_Nov
2022-11-03  Week1_Nov

I worked out some code, but it seems that, for your sample data, my result differs from your expected output.
import pandas as pd
from math import ceil
import calendar

# from this post: https://stackoverflow.com/questions/3806473/week-number-of-the-month
def week_of_month(dt):
    first_day = dt.replace(day=1)
    dom = dt.day
    adjusted_dom = dom + first_day.weekday()
    week_num = int(ceil(adjusted_dom / 7.0))
    month_name = calendar.month_abbr[dt.month]
    return f'Week{week_num}_{month_name}'

# Creating a sample DataFrame
df = pd.DataFrame({'date': ['2022-10-04', '2022-10-04', '2022-10-06', '2022-10-19', '2022-11-02']})
df['date'] = pd.to_datetime(df['date'])
df['week'] = df['date'].apply(week_of_month)
print(df)
Result:
date        week
2022-10-04  Week2_Oct
2022-10-04  Week2_Oct
2022-10-06  Week2_Oct
2022-10-19  Week4_Oct
2022-11-02  Week1_Nov
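If you'd rather avoid the Python-level apply, the same formula can be written with vectorized pandas operations. This is only a sketch of the identical arithmetic (ceil((day + weekday of the 1st) / 7)), not a different week definition:

import pandas as pd

df = pd.DataFrame({'date': ['2022-10-04', '2022-10-06', '2022-10-19', '2022-11-02']})
df['date'] = pd.to_datetime(df['date'])

# Weekday of the first day of each date's month (Monday=0 .. Sunday=6)
first_weekday = (df['date'] - pd.to_timedelta(df['date'].dt.day - 1, unit='D')).dt.weekday

# ceil((day + first_weekday) / 7) written with integer arithmetic
week_num = (df['date'].dt.day + first_weekday - 1) // 7 + 1

df['week'] = 'Week' + week_num.astype(str) + '_' + df['date'].dt.strftime('%b')
print(df)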

Related

Value of date changes in Scala when reading from YAML file

I have a YAML file which has this data:
time:
  - days: 50
  - date: 2020-02-30
I am reading both of these in my Scala program:
val t = yml.get("time").asInstanceOf[util.ArrayList[util.LinkedHashMap[String, Any]]]
val date = t.get(1)
The output of date is {date=Sat Feb 29 19:00:00 EST 2020},
or if I put the value in the YAML file as date: 2020-02-25, the output is {date=Mon Feb 24 19:00:00 EST 2020} and not {date=Tue Feb 25 19:00:00 EST 2020}.
Why is it always reducing the date value by 1?
Any help is highly appreciated.
PS - I want to validate the date; that is why the input is date: 2020-02-30
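The 19:00 EST values suggest the parser is reading the unquoted YAML date as midnight UTC, and the default toString is then rendering that instant in the local EST zone (UTC-5), which lands on the previous evening. That SnakeYAML behaviour is an assumption on my part, but the offset arithmetic itself is easy to check in isolation (shown here in Python just to illustrate the shift):

from datetime import datetime, timezone, timedelta

# 2020-02-25 read as midnight UTC, then rendered in EST (UTC-5)
utc_midnight = datetime(2020, 2, 25, tzinfo=timezone.utc)
est = timezone(timedelta(hours=-5), 'EST')
print(utc_midnight.astimezone(est))  # 2020-02-24 19:00:00-05:00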

Split date into day of the week, month, year using PySpark

I have very little experience with PySpark, and I am trying, with no success, to create 3 new columns from a column that contains the timestamp of each row.
The column containing the date has the following format: EEE MMM dd HH:mm:ss Z yyyy.
So it looks like this:
+--------------------+
| timestamp|
+--------------------+
|Fri Oct 18 17:07:...|
|Mon Oct 21 21:49:...|
|Thu Oct 31 18:03:...|
|Sun Oct 20 15:00:...|
|Mon Sep 30 23:35:...|
+--------------------+
The 3 columns have to contain: the day of the week as an integer (so 0 for Monday, 1 for Tuesday, ...), the number of the month, and the year.
What is the most effective way to create these 3 additional columns and append them to the PySpark DataFrame? Thanks in advance!
Spark 1.5 and higher has many date-processing functions. Here are some that may be useful for you:
from pyspark.sql.functions import col, dayofweek, month, year

df = df.withColumn('dayOfWeek', dayofweek(col('your_date_column')))
df = df.withColumn('month', month(col('your_date_column')))
df = df.withColumn('year', year(col('your_date_column')))
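One detail worth noting: dayofweek() in Spark returns 1 for Sunday through 7 for Saturday, so a small shift is needed to get 0 for Monday as asked. A minimal PySpark sketch, assuming the column has already been parsed from the EEE MMM dd HH:mm:ss Z yyyy string into a proper timestamp (the sample data and the name your_date_column are made up):

from datetime import datetime
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, dayofweek, month, year

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data, already parsed to timestamps
df = spark.createDataFrame(
    [(datetime(2019, 10, 18, 17, 7, 0),), (datetime(2019, 9, 30, 23, 35, 0),)],
    ['your_date_column'],
)

# dayofweek() is 1=Sunday .. 7=Saturday; shift so that Monday=0 .. Sunday=6
df = df.withColumn('dayOfWeek', (dayofweek(col('your_date_column')) + 5) % 7)
df = df.withColumn('month', month(col('your_date_column')))
df = df.withColumn('year', year(col('your_date_column')))
df.show(truncate=False)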

Add 1 day to date when using CASE WHEN in Redshift

I am trying to use a CASE WHEN statement like below to add 1 day to a timestamp based on the time part of the timestamp:
CASE WHEN to_char(pickup_date, 'HH24:MI') between 0 and 7 then y.pickup_date else dateadd(day,1,y.pickup_date) end as ead_target
pickup_Date is a timestamp with the default format YYYY-MM-DD HH:MM:SS.
My output:
pickup_Date ead_target
2020-07-01 10:00:00 2020-07-01 10:00:00
2020-07-02 3:00:00 2020-07-02 3:00:00
When the hour of the day is between 0 and 7, ead_target should equal pickup_Date; otherwise, add 1 day.
Expected output:
pickup_Date ead_target
2020-07-01 10:00:00 2020-07-02 10:00:00
2020-07-02 3:00:00 2020-07-02 3:00:00
You will want to use the date_part() function to extract the hour of the day - https://docs.aws.amazon.com/redshift/latest/dg/r_DATE_PART_function.html
Your case statement should work if you extract 'hour' from the timestamp and compare it to the range 0 - 7.

Converting CDT timestamp into UTC format in Spark Scala

My DataFrame, myDF, is like below:
DATE_TIME
Wed Sep 6 15:24:27 CDT 2017
Wed Sep 6 15:30:05 CDT 2017
Expected output in this format:
2017-09-06 15:24:27
2017-09-06 15:30:05
I need to convert the DATE_TIME timestamp to UTC.
I tried the below code in a Databricks notebook, but it's not working.
%scala
val df = Seq(("Wed Sep 6 15:24:27 CDT 2017")).toDF("times")
df.withColumn("times2",date_format(to_timestamp('times,"ddd MMM dd hh:mm:ss CDT yyyy"),"yyyy-MM-dd HH:mm:ss")).show(false)
+---------------------------+------+
|times                      |times2|
+---------------------------+------+
|Wed Sep 6 15:24:27 CDT 2017|null  |
+---------------------------+------+
I think we need to remove Wed from your string and then use the to_timestamp() function.
Example:
df.show(false)
/*
+---------------------------+
|times |
+---------------------------+
|Wed Sep 6 15:24:27 CDT 2017|
+---------------------------+
*/
df.withColumn("times2",expr("""to_timestamp(substring(times,5,length(times)),"MMM d HH:mm:ss z yyyy")""")).
show(false)
/*
+---------------------------+-------------------+
|times |times2 |
+---------------------------+-------------------+
|Wed Sep 6 15:24:27 CDT 2017|2017-09-06 15:24:27|
+---------------------------+-------------------+
*/
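Since expr() takes the same SQL string, the identical expression can be used from PySpark on the same Spark version; just a sketch mirroring the Scala answer above:

from pyspark.sql import SparkSession
from pyspark.sql.functions import expr

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([('Wed Sep 6 15:24:27 CDT 2017',)], ['times'])

# Drop the leading day name ("Wed ") and let to_timestamp() parse the rest
df = df.withColumn(
    'times2',
    expr("""to_timestamp(substring(times,5,length(times)),"MMM d HH:mm:ss z yyyy")"""),
)
df.show(truncate=False)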

Converting a specific string format to a date in Spark SQL

I have a column that contains the following date as a string: Sat Sep 14 09:54:30 UTC 2019. I'm not familiar with this format at all.
I need to convert it to a date or timestamp, just a value that I can compare against. I only need a point of comparison with a precision of one day.
This can help you get the timestamp from your string, and then you can get the days from it, using Spark SQL (2.x):
spark.sql("""SELECT from_utc_timestamp(from_unixtime(unix_timestamp("Sat Sep 14 09:54:30 UTC 2019","EEE MMM dd HH:mm:ss zzz yyyy") ),"IST")as timestamp""").show()
+-------------------+
| timestamp|
+-------------------+
|2019-09-14 20:54:30|
+-------------------+
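Since only day-level precision is needed, a DataFrame-API variant of the same parse may be enough. A sketch for Spark 2.x, reusing the EEE MMM dd HH:mm:ss zzz yyyy pattern from the answer above (the column name raw is made up):

from pyspark.sql import SparkSession
from pyspark.sql.functions import to_date, to_timestamp

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([('Sat Sep 14 09:54:30 UTC 2019',)], ['raw'])

# Parse the string, then truncate to a date for day-level comparisons
df = df.withColumn('day', to_date(to_timestamp('raw', 'EEE MMM dd HH:mm:ss zzz yyyy')))
df.show()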