How to split and aggregate days into different month - postgresql

Running select *, return_date - pickup_date as total from order_history order by id; returns the following result:
id pickup_date return_date date_ranges total
1 2020-03-01 2020-03-12 [2020-03-01,2020-04-01) 11
2 2020-03-01 2020-03-22 [2020-03-01,2020-04-01) 21
3 2020-03-11 2020-03-22 [2020-03-01,2020-04-01) 11
4 2020-02-11 2020-03-22 [2020-02-01,2020-03-01) 40
5 2020-01-01 2020-01-22 [2020-01-01,2020-02-01) 21
6 2020-01-01 2020-04-22 [2020-01-01,2020-02-01) 112
for example:
--id=6. total = 112. 112 = 30 + 29 + 31 + 22
--therefore total should split: jan2020: 30, feb2020:29, march2020: 31, 2020apr:22.
First split, then aggregate. The aggregate covers the range min(pickup_date) to max(return_date), with to_char formatting each month as 'YYYY-MM'. In this case the aggregate should group by 2020-01, 2020-02, 2020-03, 2020-04.
But if pickup_date is in the same month as return_date, then compute return_date - pickup_date and sum the result, grouped by to_char(pickup_date,'YYYY-MM').

Not quite perfect, but a sketch:
SELECT
id,
ARRAY_AGG( -- 4
LEAST(return_date, gs + interval '1 month - 1 day') -- 2
- GREATEST(pickup_date, gs) -- 3
+ interval '1 day'
)
FROM order_history,
generate_series( -- 1
date_trunc('month', pickup_date),
date_trunc('month', return_date),
interval '1 month'
) gs
GROUP BY id
1. Generate the set of months that are included in the given date range.
2. a) Calculate the last day of the month (first of a month + 1 month is the first of the next month; minus 1 day is the last of the current month). This is the latest possible return day within this month. b) If the return happened earlier, take the earlier day (LEAST()).
3. Same for the pickup day, using GREATEST(). Afterwards, calculate the difference, which is the number of days kept within that month.
4. Aggregate the per-month values for each id.
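To cross-check the numbers outside the database, here is a minimal Python sketch of the same clamping idea (steps 1 to 3). The function names are made up for illustration, and it counts both the pickup day and the return day, like the + interval '1 day' in the query:

```python
from datetime import date, timedelta


def first_of_next_month(d):
    """Return the first day of the month following d's month."""
    return date(d.year + (1 if d.month == 12 else 0), d.month % 12 + 1, 1)


def days_per_month(pickup, ret):
    """Split the inclusive range pickup..ret into day counts keyed by 'YYYY-MM'."""
    result = {}
    month_start = pickup.replace(day=1)              # step 1: walk month by month
    while month_start <= ret:
        next_start = first_of_next_month(month_start)
        month_end = next_start - timedelta(days=1)   # last day of this month
        last = min(ret, month_end)                   # step 2: LEAST(...)
        first = max(pickup, month_start)             # step 3: GREATEST(...)
        result[month_start.strftime("%Y-%m")] = (last - first).days + 1
        month_start = next_start
    return result


# id = 6 from the question
print(days_per_month(date(2020, 1, 1), date(2020, 4, 22)))
# {'2020-01': 31, '2020-02': 29, '2020-03': 31, '2020-04': 22}
```

The counts add up to 113, one more than return_date - pickup_date, because both end days are counted. Working on plain dates also avoids any daylight saving time effects.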
Open questions / Potential enhancements:
You said:
jan2020: 30, feb2020:29, march2020: 31, 2020apr:22.
Why is JAN given with 30 days? On the other hand you count APR 22 days (1st - 22nd). Following the logic, JAN should be 31, shouldn't it?
If you don't want to count the very first day, then you can change (3.) to
GREATEST(pickup_date + interval '1 day', gs)
There's a problem with daylight saving time in March (the result is 30 days, 23 hours instead of 31 days), because the generated series values are timestamps with time zone. This can be handled by some rounding, for example.

Related

Certain Range of Date in each Month

I'd like to have a range of days, the 20th to the 25th, in each month in BigQuery, but I don't know what syntax I should use. For example:
Jan 20 - 25
Feb 20 - 25
and so on
I can only think of creating a CTE for every month and then combining them with UNION ALL.
Consider below query.
SELECT DATE_ADD(month, INTERVAL day - 1 DAY) date_range,
FROM UNNEST(GENERATE_DATE_ARRAY('2022-01-01', '2022-03-01', INTERVAL 1 MONTH)) month,
UNNEST(GENERATE_ARRAY(20, 25)) day;
The query below seems simpler than my original answer, and you can adjust the date range by changing the condition in the WHERE clause.
SELECT *
FROM UNNEST(GENERATE_DATE_ARRAY('2022-01-01', '2022-12-31', INTERVAL 1 DAY)) date_range
WHERE EXTRACT(DAY FROM date_range) BETWEEN 21 AND 25
For the use case you mentioned in your comment:
WHERE EXTRACT(DAY FROM date_range) >= 21 OR EXTRACT(DAY FROM date_range) = 1
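For comparison, the cross join of months and day numbers from the first query can be sketched in Python. This is only an illustration of the idea, and the function name and parameters are hypothetical:

```python
from datetime import date


def day_range_per_month(year, first_month, last_month, first_day, last_day):
    """Return every date whose day number lies in first_day..last_day
    for each month in first_month..last_month of the given year."""
    return [
        date(year, month, day)
        for month in range(first_month, last_month + 1)   # like GENERATE_DATE_ARRAY by month
        for day in range(first_day, last_day + 1)          # like GENERATE_ARRAY(20, 25)
    ]


dates = day_range_per_month(2022, 1, 3, 20, 25)
print(dates[0], dates[-1], len(dates))
# 2022-01-20 2022-03-25 18
```

Note that this sketch assumes the day numbers exist in every month; a range reaching the 29th to the 31st would need extra handling.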

I need help in writing a subquery

I have a query like this to create date series:
Select month
From
(select to_char(created_date, 'Mon') as Month,
created_date::date as start_day,
(created_date::date + interval '1 month - 1 day ')::date as end_day
from generate_series(date '2021-01-26',
date '2022-04-26', interval '1 month') as g(created_date)) AS "thang"
And the table looks like this:
month
Jan
Feb
Mar
Apr
May
Jun
Jul
Aug
Sep
Oct
Now I want to count the status from the KYC table.
So I try this:
Select
(Select month
From
(select to_char(created_date, 'Mon') as Month,
created_date::date as start_day,
(created_date::date + interval '1 month - 1 day ')::date as end_day
from generate_series(date '2021-01-26',
date '2022-04-26', interval '1 month') as g(created_date)) AS "thang"),
count(*) filter (where status = 4) as "KYC_Success"
From kyc
group by 1
I hope the result will be like this:
Month | KYC_Success
Jan | 234
Feb | 435
Mar | 546
Apr | 157
But it said
error: more than one row returned by a subquery used as an expression
What should I change in this query?
Let us assume that the table KYC has a timestamp column called created_date and a status column, and that you want to count the successful rows per month, even if a month had zero successes.
SELECT thang.month
, count(CASE WHEN kyc.status = 4 THEN 1 END) AS successes -- status = 4 marks success in the question
FROM (
SELECT to_char(created_date, 'Mon') AS Month
, created_date::DATE AS start_date
, (created_date::DATE + interval '1 month - 1 day ')::DATE AS end_date
FROM generate_series(DATE '2021-01-26', DATE '2022-04-26', interval '1 month') AS g(created_date)
) AS "thang"
LEFT JOIN kyc ON kyc.created_date >= thang.start_date
AND kyc.created_date < thang.end_date + 1 -- end_date is the last day of the window, so compare against the day after it
GROUP BY thang.month;
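The idea behind the LEFT JOIN, namely that every generated month appears in the result even when nothing matched, can be sketched in Python as follows. The rows and the function name are invented for illustration. The sketch keys the counts by (year, month) rather than by month name, because a series running from 2021-01 to 2022-04 contains Jan to Apr twice and grouping by 'Mon' alone would merge them:

```python
from collections import Counter
from datetime import date


def successes_per_month(rows, months, success_status=4):
    """rows: iterable of (created_date, status); months: list of (year, month).
    Returns one (year, month, count) tuple per requested month, zero-filled."""
    counts = Counter(
        (created.year, created.month)
        for created, status in rows
        if status == success_status
    )
    # Counter returns 0 for missing keys, which mirrors the LEFT JOIN behaviour
    return [(y, m, counts[(y, m)]) for (y, m) in months]


# Hypothetical sample data
rows = [
    (date(2021, 1, 27), 4),
    (date(2021, 1, 30), 4),
    (date(2021, 1, 31), 2),   # not a success
    (date(2021, 3, 5), 4),
    (date(2022, 1, 10), 4),   # a different January
]
months = [(2021, 1), (2021, 2), (2021, 3), (2022, 1)]
print(successes_per_month(rows, months))
# [(2021, 1, 2), (2021, 2, 0), (2021, 3, 1), (2022, 1, 1)]
```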

Get week number, with weeks starting on Sunday, like Excel WEEKNUM

In PostgreSQL (I'm on version 9.6.6), what's the simplest way to get the week number, starting on Sunday?
DATE_PART('week',x) returns:
The number of the ISO 8601 week-numbering week of the year. By definition, ISO weeks start on Mondays and the first week of a year contains January 4 of that year. In other words, the first Thursday of a year is in week 1 of that year. (doc)
Say my query is like:
WITH dates as (SELECT generate_series(timestamp '2014-01-01',
timestamp '2014-01-31',
interval '1 day'
)::date AS date
)
SELECT
date,
TO_CHAR(date,'Day') AS dayname,
DATE_PART('week',date) AS weekofyear
FROM dates
Returns:
date dayname weekofyear
--------------------------------
2014-01-01 Wednesday 1
2014-01-02 Thursday 1
2014-01-03 Friday 1
2014-01-04 Saturday 1
2014-01-05 Sunday 1 <- I want this to be 2
2014-01-06 Monday 2
2014-01-07 Tuesday 2
2014-01-08 Wednesday 2
So far I have tried:
SELECT
date,
TO_CHAR(date,'Day') AS dayname,
DATE_PART('week',date) AS week_iso,
DATE_PART('week',date + interval '1 day') AS week_alt
FROM dates
which won't quite work if the year begins on a Sunday.
Also, I want week 1 to contain January 1 of that year. So if January 1 is a Saturday, I want week 1 to be one day long (instead of being week 53 in the ISO style). This behavior is consistent with the Excel WEEKNUM function.
To get the week number of the year, with weeks starting on Sunday, we need to know how many Sundays fall between the first day of the year and the target date.
I adapted a solution by @Erwin Brandstetter. That solution counts Sundays inclusive of the first day of the year and exclusive of the target date.
Then, because I want the first (partial) week to be week one (not zero), I need to add 1 unless the first day of the year is a Sunday (in which case it is already week one).
WITH dates as (SELECT generate_series(timestamp '2014-01-01',
timestamp '2014-01-31',
interval '1 day'
)::date AS date
)
SELECT
date,
TO_CHAR(date,'Day') AS dayname,
DATE_PART('week',date) AS week_iso,
((date - DATE_TRUNC('year',date)::date) + DATE_PART('isodow', DATE_TRUNC('year',date)) )::int / 7
+ CASE WHEN DATE_PART('isodow', DATE_TRUNC('year',date)) = 7 THEN 0 ELSE 1 END
AS week_sundays
FROM dates
Returns
date dayname week_iso week_sundays
--------------------------------
2014-01-01 Wednesday 1 1
2014-01-02 Thursday 1 1
2014-01-03 Friday 1 1
2014-01-04 Saturday 1 1
2014-01-05 Sunday 1 2
2014-01-06 Monday 2 2
2014-01-07 Tuesday 2 2
To show how this works for years starting on Sunday:
2017-01-01 Sunday 52 1
2017-01-02 Monday 1 1
2017-01-03 Tuesday 1 1
2017-01-04 Wednesday 1 1
2017-01-05 Thursday 1 1
2017-01-06 Friday 1 1
2017-01-07 Saturday 1 1
2017-01-08 Sunday 1 2
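The same arithmetic can be checked outside PostgreSQL with a small Python sketch. The function name is made up. Python's isoweekday() uses the same Monday=1 to Sunday=7 numbering as isodow:

```python
from datetime import date


def weeknum_sunday(d):
    """Week number with weeks starting on Sunday and week 1 containing January 1."""
    jan1 = date(d.year, 1, 1)
    jan1_isodow = jan1.isoweekday()           # Monday=1 ... Sunday=7, like isodow
    offset = (d - jan1).days + jan1_isodow    # days since Jan 1, shifted by its weekday
    # add 1 for the partial first week unless the year itself starts on a Sunday
    return offset // 7 + (0 if jan1_isodow == 7 else 1)


print(weeknum_sunday(date(2014, 1, 4)))   # 1 (Saturday)
print(weeknum_sunday(date(2014, 1, 5)))   # 2 (Sunday)
print(weeknum_sunday(date(2017, 1, 1)))   # 1 (year starts on a Sunday)
print(weeknum_sunday(date(2017, 1, 8)))   # 2
```

For a year that starts on a Saturday, such as 2011, this gives week 1 for January 1 and week 2 for January 2, so the first week is one day long as the question asks.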
The task is not as daunting as it first appears. It mainly requires finding the first Sun on or after 1-Jan. That date becomes the last day of the first week. From there, calculating the subsequent weeks is merely a matter of addition. The other significant point is that with this week definition there will always be 53 weeks per year, and the last day of the last week is 31-Dec. The following generates an annual calendar for this week definition.
create or replace function non_standard_cal(year_in integer)
returns table (week_number integer, first_day_of_week date, last_day_of_week date)
language sql immutable leakproof strict rows 53
as $$
with recursive cal as
(select 1 wk, d1 start_of_week, ds end_of_week, de stop_date
from (select d1+substring( '0654321'
, extract(dow from d1)::integer+1
, 1)::integer ds
, d1, de
from ( select make_date (year_in, 1,1) d1
, make_date (year_in+1, 1,1) -1 de
) a
) b
union all
select wk+1, end_of_week+1, case when end_of_week+7 > stop_date
then stop_date
else end_of_week+7
end
, stop_date
from cal
where wk < 53
)
select wk, start_of_week, end_of_week from cal;
$$ ;
As a general rule I avoid magic numbers, but sometimes they're useful, as in this case. In the magic number (actually a string) '0654321', each digit represents the number of days needed to reach the first Sun on or after 1-Jan when indexed by the standard day numbering system (0-6 as Sun-Sat). The result is that Sun being the last day of the first week. That generates the 1st row of the recursive CTE. The remaining rows just add the appropriate number of days for each week until the 53 weeks have been generated. The following shows the years needed to ensure each day of the week gets its turn at 1-Jan (yes, some days are duplicated). Run individual years to validate their calendars.
do $$
declare
cal record;
yr_cal cursor (yr integer) for
select * from non_standard_cal(2000+yr) limit 1;
begin
for yr in 18 .. 26
loop
open yr_cal(yr);
fetch yr_cal into cal;
raise notice 'For Year: %, week: %, first_day: %, Last_day: %, First day is: %'
, 2000+yr
,cal.week_number
,cal.first_day_of_week
,cal.last_day_of_week
,to_char(cal.first_day_of_week, 'Day');
close yr_cal;
end loop;
end; $$;
The following may work; it was tested with two cases in mind:
WITH dates as (SELECT generate_series(timestamp '2014-01-01',
timestamp '2014-01-10',
interval '1 day'
)::date AS date
union
SELECT generate_series(timestamp '2017-01-01',
timestamp '2017-01-10',
interval '1 day'
)::date AS date
)
, alt as (
SELECT
date,
TO_CHAR(date,'Day') AS dayname,
DATE_PART('week',date) AS week_iso,
DATE_PART('week',date + interval '1 day') AS week_alt
FROM dates
)
select date, dayname,
week_iso, week_alt, case when week_alt <> week_iso
then week_alt
else week_iso end as expected_week
from alt
order by date
Output:
date dayname week_iso week_alt expected_week
2014-01-01 Wednesday 1 1 1
2014-01-02 Thursday 1 1 1
2014-01-03 Friday 1 1 1
2014-01-04 Saturday 1 1 1
2014-01-05 Sunday 1 2 2
2014-01-06 Monday 2 2 2
2014-01-07 Tuesday 2 2 2
....
2017-01-01 Sunday 52 1 1
2017-01-02 Monday 1 1 1
2017-01-03 Tuesday 1 1 1
2017-01-04 Wednesday 1 1 1
2017-01-05 Thursday 1 1 1
2017-01-06 Friday 1 1 1
2017-01-07 Saturday 1 1 1
2017-01-08 Sunday 1 2 2
This query works by treating Sunday instead of Monday as the start of the week.
QUERY
SELECT CASE WHEN EXTRACT(day from '2014-01-05'::date)=4 AND
EXTRACT(month from '2014-01-05'::date)=1 THEN date_part('week',
'2014-01-05'::date) ELSE date_part('week', '2014-01-05'::date + 1)
END;
OUTPUT
date_part
-----------
2
(1 row)

Count days for each month between two dates - postgresql

I am trying to write a query which gives the number of days in each month between two specified dates.
Example:
date 1: 2018-01-01
date 2: 2018-05-23
Expected Output:
month days
2018-01-01, 31
2018-02-01, 28
2018-03-01, 31
2018-04-01, 30
2018-05-01, 23
Use generate_series to produce one row per day, then group by date_trunc:
SELECT date_trunc('month',dt) AS month,
COUNT(*) as days
FROM generate_series( DATE '2018-01-01',DATE '2018-05-23',interval '1 DAY' )
as dt group by date_trunc('month',dt)
order by month;
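The query expands the range into one row per day and counts the rows per month. A minimal Python sketch of the same approach (the function name is hypothetical) reproduces the expected output from the question:

```python
from collections import Counter
from datetime import date, timedelta


def count_days_by_month(start, end):
    """Count the days of the inclusive range start..end, keyed by first day of month."""
    counts = Counter()
    day = start
    while day <= end:                       # one step per day, like generate_series
        counts[day.replace(day=1)] += 1     # like date_trunc('month', dt)
        day += timedelta(days=1)
    return dict(sorted(counts.items()))


for month, days in count_days_by_month(date(2018, 1, 1), date(2018, 5, 23)).items():
    print(month, days)
# 2018-01-01 31
# 2018-02-01 28
# 2018-03-01 31
# 2018-04-01 30
# 2018-05-01 23
```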

How to get last 3 Months of "Monday to Sunday" dates In Redshift?

How can I get last 3 Months of "Monday to Sunday" dates in Redshift?
S.no Start_dt End_dt week
1 18-Jul-16 24-Jul-16 Week1
2 25-Jul-16 31-Jul-16 Week2
3 1-Aug-16 7-Aug-16 Week3
4 8-Aug-16 14-Aug-16 Week4
5 15-Aug-16 21-Aug-16 Week5
6 22-Aug-16 28-Aug-16 Week6
7 29-Aug-16 4-Sep-16 Week7
8 5-Sep-16 11-Sep-16 Week8
9 12-Sep-16 18-Sep-16 Week9
10 19-Sep-16 25-Sep-16 Week10
11 26-Sep-16 2-Oct-16 Week11
12 3-Oct-16 9-Oct-16 Week12
13 10-Oct-16 16-Oct-16 Week13
I've tried this:
select
trunc(date_trunc('week',sysdate)) st_dt,
trunc(date_trunc('week', sysdate)+6) ed_dt,
'week'||row_number() over (order by null) as week
but it only returns the current week's Monday and Sunday.
You can use generate_series() to generate a range of dates:
SELECT
trunc(day) as start_date,
trunc(day + 6) as end_date
FROM
(select date_trunc('week', sysdate) + (generate_series(1, 12) * interval '1 week') as day)
ORDER BY 1 ASC
This results in:
week start week end
2016-10-24 2016-10-30
2016-10-31 2016-11-06
2016-11-07 2016-11-13
2016-11-14 2016-11-20
2016-11-21 2016-11-27
2016-11-28 2016-12-04
2016-12-05 2016-12-11
2016-12-12 2016-12-18
2016-12-19 2016-12-25
2016-12-26 2017-01-01
2017-01-02 2017-01-08
2017-01-09 2017-01-15
Please note that generate_series() in Amazon Redshift cannot be joined with existing tables. It can only be used as a "Leader-only" query.
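If the week list has to be built outside Redshift, for example to load it into a table that can then be joined, a minimal Python sketch could look like the one below. The function name and the default of 13 weeks are assumptions taken from the table in the question. Unlike the query above, it steps backwards from the current week, since the question asks for the last three months:

```python
from datetime import date, timedelta


def previous_weeks(today, n=13):
    """Return the n complete Monday-to-Sunday weeks before the week containing today."""
    this_monday = today - timedelta(days=today.weekday())   # weekday(): Monday=0
    weeks = []
    for i in range(n, 0, -1):                                # oldest week first
        start = this_monday - timedelta(weeks=i)
        weeks.append((start, start + timedelta(days=6), f"Week{n - i + 1}"))
    return weeks


for start, end, label in previous_weeks(date(2016, 10, 19)):
    print(start, end, label)
# first line: 2016-07-18 2016-07-24 Week1
# last line:  2016-10-10 2016-10-16 Week13
```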