Add 1 day to a date when using CASE WHEN in Redshift - amazon-redshift

I am trying to use a CASE WHEN statement like below to add 1 day to a timestamp based on the time part of the timestamp:
CASE WHEN to_char(pickup_date, 'HH24:MI') between 0 and 7 then y.pickup_date else dateadd(day,1,y.pickup_date) end as ead_target
pickup_Date is a timestamp with default format YYYY-MM-DD HH:MM:SS
My output
pickup_Date ead_target
2020-07-01 10:00:00 2020-07-01 10:00:00
2020-07-02 3:00:00 2020-07-02 3:00:00
When the hour of the day is between 0 and 7 then ead_target = pickup_Date ELSE add 1 day
Expected output
pickup_Date ead_target
2020-07-01 10:00:00 2020-07-02 10:00:00
2020-07-02 3:00:00 2020-07-02 3:00:00

You will want to use the date_part() function to extract the hour of the day - https://docs.aws.amazon.com/redshift/latest/dg/r_DATE_PART_function.html
Your case statement should work if you extract 'hour' from the timestamp and compare it to the range 0 - 7.
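For example, a minimal sketch of the corrected expression, keeping the alias y and the 0-7 range from the question:

CASE
    WHEN date_part(hour, y.pickup_date) BETWEEN 0 AND 7 THEN y.pickup_date
    ELSE dateadd(day, 1, y.pickup_date)
END AS ead_target

date_part() returns the hour as a number (0-23), so the numeric comparison works, whereas to_char(pickup_date, 'HH24:MI') returns a string like '10:00' that cannot meaningfully be compared to 0 and 7.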

Related

Date computation given a detailed calendar

Given a detailed calendar:
Sunday
hrs_per_day 0
Monday
07:00:00 11:59:00 5 hours
13:00:00 15:59:00 3 hours
hrs_per_day 8
Tuesday
07:00:00 11:59:00 5 hours
13:00:00 15:59:00 3 hours
hrs_per_day 8
Wednesday
07:00:00 11:59:00 5 hours
13:00:00 15:59:00 3 hours
hrs_per_day 8
Thursday
07:00:00 11:59:00 5 hours
13:00:00 15:59:00 3 hours
hrs_per_day 8
Friday
07:00:00 12:59:00 6 hours
hrs_per_day 6
Saturday
hrs_per_day 0
hrs_per_week 38
How can I compute the start and end dates of a task based on its duration?
Suppose I have a task that can start after Sunday 8 AM and will take 23 (8+8+7) hours of work.
Then the start date should be Monday 07:00:00 and the end date should be Wednesday 15:00:00.
I can work out the dates manually, but I'm not sure how to implement it in a program:
function get_start_end_dates(can_start_after, duration_hrs, calendar_data){
    // ??????????
    return {start_date, end_date}
}

How to INSERT repeated values like (a,b,c,d,a,b,c,d,...) into a DB table?

I am trying to make a work schedule table.
I have a table like:
shift_starts_dt       shift_type
2022-01-01 08:00:00   Day
2022-01-01 20:00:00   Night
2022-01-02 08:00:00   Day
2022-01-02 20:00:00   Night
2022-01-03 08:00:00   Day
2022-01-03 20:00:00   Night
2022-01-04 08:00:00   Day
2022-01-04 20:00:00   Night
etc., until the end of the year
I can't figure out how to add repeated values to the table.
I want to add a 'shift_name' column that contains 'A','B','C','D' (it's like a name for a team).
What query should I use to achieve the following result:
shift_starts_dt       shift_type   shift_name
2022-01-01 08:00:00   Day          'A'
2022-01-01 20:00:00   Night        'B'
2022-01-02 08:00:00   Day          'C'
2022-01-02 20:00:00   Night        'D'
2022-01-03 08:00:00   Day          'A'
2022-01-03 20:00:00   Night        'B'
2022-01-04 08:00:00   Day          'C'
2022-01-04 20:00:00   Night        'D'
...
Use the number of half-days since Jan 1, modulo 4, to index an array:
select
    shift_starts_dt,
    shift_type,
    (array['A','B','C','D'])[(extract(epoch from shift_starts_dt - '2022-01-01')::int / 43200) % 4 + 1]
from work_schedule
You could replace '2022-01-01' with (select min(shift_starts_dt) from work_schedule) for a more general solution.
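A sketch of that more general form, aliasing the lookup as the shift_name column you want, might look like:

select
    shift_starts_dt,
    shift_type,
    (array['A','B','C','D'])[
        (extract(epoch from shift_starts_dt
                 - (select min(shift_starts_dt) from work_schedule))::int / 43200) % 4 + 1
    ] as shift_name
from work_schedule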

Is there a way to do a selective sum using a time interval in Postgres?

I have two tables. The first table has the columns id, start_date, and end_date, and the second table has the columns id, timestamp, and value. Is there a way to sum values from table 2 based on the date ranges in table 1?
Table 1:
id   start_date            end_date
5    2000-01-01 01:00:00   2000-01-05 02:45:00
5    2000-01-10 01:00:00   2000-01-15 02:45:00
6    2000-01-01 01:00:00   2000-01-05 02:45:00
6    2000-01-11 01:00:00   2000-01-12 02:45:00
6    2000-01-15 01:00:00   2000-01-20 02:45:00
Table 2:
id   timestamp             value
5    2000-01-01 05:00:00   1
5    2000-01-01 06:00:00   2
6    2000-01-01 05:00:00   1
6    2000-01-11 05:00:00   2
6    2000-01-15 05:00:00   2
6    2000-01-15 05:30:00   2
Desired result:
id   start_date            end_date              Sum
5    2000-01-01 01:00:00   2000-01-05 02:45:00   3
5    2000-01-10 01:00:00   2000-01-15 02:45:00   null
6    2000-01-01 01:00:00   2000-01-05 02:45:00   1
6    2000-01-11 01:00:00   2000-01-12 02:45:00   2
6    2000-01-15 01:00:00   2000-01-20 02:45:00   4
Try this:
SELECT a.id, a.start_date, a.end_date, sum(b.value) AS sum
FROM table1 AS a
LEFT JOIN table2 AS b
ON b.id = a.id
AND b.timestamp >= a.start_date
AND b.timestamp < a.end_date
GROUP BY a.id, a.start_date, a.end_date
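Because this is a LEFT JOIN, ranges with no matching rows in table2 (such as the second id 5 range) still appear in the output with a null sum, which matches the desired result.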

Get COUNT of hours on 30 mins Interval

I have the data below.
Date Interval
2014-01-01 12:00 AM
2014-01-01 12:30 AM
2014-01-01 1:00 AM
2014-01-01 1:30 AM
2014-01-01 2:00 AM
2014-01-01 2:30 AM
2014-01-01 3:00 AM
2014-01-01 3:30 AM
2014-01-01 4:00 AM
2014-01-01 4:30 AM
I need to extract the hour of the interval column.
I could do it using EXTRACT('hour', Interval), which gives me the hour numbers as integers.
Result will be as follows:
Date Interval HourCount
2014-01-01 12:00 AM 0
2014-01-01 12:30 AM 0
2014-01-01 1:00 AM 1
2014-01-01 1:30 AM 1
2014-01-01 2:00 AM 2
2014-01-01 2:30 AM 2
2014-01-01 3:00 AM 3
2014-01-01 3:30 AM 3
2014-01-01 4:00 AM 4
2014-01-01 4:30 AM 4
But what I'm looking for is a count that increments by 1 for every 30 minutes.
Example of what I'm looking for:
Date Interval HourCount
2014-01-01 12:00 AM 1
2014-01-01 12:30 AM 2
2014-01-01 1:00 AM 3
2014-01-01 1:30 AM 4
2014-01-01 2:00 AM 5
2014-01-01 2:30 AM 6
2014-01-01 3:00 AM 7
2014-01-01 3:30 AM 8
2014-01-01 4:00 AM 9
2014-01-01 4:30 AM 10
This way, in a day I'll be getting 48 intervals.
I could use ROW_NUMBER() OVER (PARTITION BY Date ORDER BY Date), but this will give me the wrong count if any interval is missing.
Suppose the row below is missing:
2014-01-01 4:00 AM 9
Then I'll get 9 as the HourCount for this row:
2014-01-01 4:30 AM 9
Can someone help me get the count of hours on a 30-minute interval?
Try something like:
with series as (
    select interval_num, interval_num * interval '30 minutes' as interval_time
    from generate_series(0,47) series(interval_num)
)
select *
from data_table
right join series on series.interval_time = data_table.interval
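Since this is a right join against the generated series, all 48 half-hour slots keep their interval_num (0-47) even when the corresponding row is missing from data_table, so gaps no longer shift the numbering.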
As you wrote, you can extract the hour; what you're missing is that you can also extract the minutes and check whether they are greater than or equal to 30.
SELECT date_part('hour', interval) * 2 + CASE WHEN date_part('minute', interval) >= 30 THEN 1 ELSE 0 END
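If you want the 1-based numbering shown in the expected output (12:00 AM = 1, ..., 11:30 PM = 48), add 1 to that expression. A minimal sketch, assuming a table named interval_data (the name is hypothetical) with the Date and Interval columns from the sample:

SELECT
    date,
    interval,
    date_part('hour', interval) * 2
        + CASE WHEN date_part('minute', interval) >= 30 THEN 1 ELSE 0 END
        + 1 AS hourcount
FROM interval_data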

Pandas: Combine resampling with groupby and calculate time differences

I am doing data analysis with trading data. I would like to use Pandas in order to examine the times when the traders are active.
In particular, I am trying to extract the difference in minutes between the first trades of every trader on each day and accumulate it on a monthly basis.
The data looks like this:
Timestamp (Datetime) | Buyer | Volume
--------------------------------------
2012-01-01 09:00:00 | John | 10
2012-01-01 10:00:00 | Mark | 10
2012-01-01 16:00:00 | Mark | 10
2012-01-01 11:00:00 | Kevin | 10
2012-02-01 10:00:00 | Mark | 10
2012-02-01 09:00:00 | John | 10
2012-02-01 17:00:00 | Mark | 10
Right now I use resampling to retrieve the first trade on a daily basis. However, I also want to group by buyer to calculate the differences in their trading times, like this:
Timestamp (Datetime) | Buyer | Volume
--------------------------------------
2012-01-01 09:00:00 | John | 10
2012-01-01 10:00:00 | Mark | 10
2012-01-01 11:00:00 | Kevin | 10
2012-01-02 10:00:00 | Mark | 10
2012-01-02 09:00:00 | John | 10
Overall I am looking to calculate the differences in minutes between the first trades on a daily basis for each trader.
Update
For example, in the case of John on 2012-01-01: Dist = 60 (diff John-Mark) + 120 (diff John-Kevin) = 180
I would highly appreciate it if anyone has an idea how to do this.
Thank you
Your original frame (the resampled one)
In [71]: df_orig
Out[71]:
buyer date volume
0 John 2012-01-01 09:00:00 10
1 Mark 2012-01-01 10:00:00 10
2 Kevin 2012-01-01 11:00:00 10
3 Mark 2012-01-02 10:00:00 10
4 John 2012-01-02 09:00:00 10
Set the index to the date column, keeping the date column in place
In [75]: df = df_orig.set_index('date',drop=False)
Create this aggregation function
def f(frame):
    # sort so the earliest trade of the day comes first
    frame.sort('date', inplace=True)
    # record the day's first trade time on every row
    frame['start'] = frame.date.iloc[0]
    return frame
Groupby the single date
In [74]: x = df.groupby(pd.TimeGrouper('1d')).apply(f)
Create the differential in minutes
In [86]: x['diff'] = (x.date-x.start).apply(lambda x: float(x.item().total_seconds())/60)
In [87]: x
Out[87]:
buyer date volume start diff
date
2012-01-01 2012-01-01 09:00:00 John 2012-01-01 09:00:00 10 2012-01-01 09:00:00 0
2012-01-01 10:00:00 Mark 2012-01-01 10:00:00 10 2012-01-01 09:00:00 60
2012-01-01 11:00:00 Kevin 2012-01-01 11:00:00 10 2012-01-01 09:00:00 120
2012-01-02 2012-01-02 09:00:00 John 2012-01-02 09:00:00 10 2012-01-02 09:00:00 0
2012-01-02 10:00:00 Mark 2012-01-02 10:00:00 10 2012-01-02 09:00:00 60
Here's the explanation: we use the TimeGrouper to group by date, and each day's frame is passed to the function f. This function then records the first date of the day (the sort is necessary here). You subtract this from the date on each entry to get a timedelta64, which is then massaged into minutes (this is a bit hacky right now because of some numpy issues; it should be more natural in 0.12).
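(On a recent pandas version: frame.sort has been replaced by frame.sort_values, pd.TimeGrouper('1d') by pd.Grouper(freq='D'), and the minute conversion can be written as (x.date - x.start).dt.total_seconds() / 60; the groupby/apply approach is otherwise the same.)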
Thanks for your update; I originally thought you wanted the diff per buyer, not from the first buyer, but that's just a minor tweak.
Update:
To track the buyer name as well (which corresponds to the start date), just include it in the function f:
def f(frame):
    frame.sort('date', inplace=True)
    frame['start'] = frame.date.iloc[0]
    frame['start_buyer'] = frame.buyer.iloc[0]
    return frame
Then you can group by this at the end:
In [14]: x.groupby(['start_buyer']).sum()
Out[14]:
diff
start_buyer
John 240