Certain Range of Date in each Month - date

I'd like to get the days from the 20th to the 25th of each month in BigQuery, but I don't know what syntax I should use. For example:
Jan 20 - 25
Feb 20 - 25
and so on
The only approach I can think of is creating a CTE for every month and then combining them with UNION ALL.

Consider the query below.
SELECT DATE_ADD(month, INTERVAL day - 1 DAY) date_range,
FROM UNNEST(GENERATE_DATE_ARRAY('2022-01-01', '2022-03-01', INTERVAL 1 MONTH)) month,
UNNEST(GENERATE_ARRAY(20, 25)) day;
The query below seems simpler than my original answer, and you can adjust the date range by changing the condition in the WHERE clause.
SELECT *
FROM UNNEST(GENERATE_DATE_ARRAY('2022-01-01', '2022-12-31', INTERVAL 1 DAY)) date_range
WHERE EXTRACT(DAY FROM date_range) BETWEEN 21 AND 25
For the use case you described in the comments, use:
WHERE EXTRACT(DAY FROM date_range) >= 21 OR EXTRACT(DAY FROM date_range) = 1
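
To check the expected output outside BigQuery, the same day-of-month filter can be sketched in Python. This is only an illustration: the function name days_in_window and its parameters are made up for this example and are not part of the answer above.

```python
from datetime import date, timedelta

def days_in_window(start, end, first_day=20, last_day=25):
    """Return every date in [start, end] whose day-of-month lies in [first_day, last_day]."""
    result = []
    current = start
    while current <= end:
        # Same idea as: WHERE EXTRACT(DAY FROM date_range) BETWEEN first_day AND last_day
        if first_day <= current.day <= last_day:
            result.append(current)
        current += timedelta(days=1)
    return result

dates = days_in_window(date(2022, 1, 1), date(2022, 3, 31))
print(dates[0], dates[-1], len(dates))  # 2022-01-20 2022-03-25 18
```

Three months with six matching days each give 18 rows, which is a quick sanity check for the SQL result.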

Related

How to split and aggregate days into different month

Running select *, return_date - pickup_date as total from order_history order by id; returns the following result:
id  pickup_date  return_date  date_ranges              total
1   2020-03-01   2020-03-12   [2020-03-01,2020-04-01)  11
2   2020-03-01   2020-03-22   [2020-03-01,2020-04-01)  21
3   2020-03-11   2020-03-22   [2020-03-01,2020-04-01)  11
4   2020-02-11   2020-03-22   [2020-02-01,2020-03-01)  40
5   2020-01-01   2020-01-22   [2020-01-01,2020-02-01)  21
6   2020-01-01   2020-04-22   [2020-01-01,2020-02-01)  112
For example:
--id=6. total = 112. 112 = 22 + 31 + 29 + 30
--therefore the total should be split as: jan2020: 30, feb2020: 29, march2020: 31, 2020apr: 22.
First split, then aggregate. The aggregation should cover the range from min(pickup_date) to max(return_date), with each month formatted via to_char as 'YYYY-MM'. In this case the result should be grouped by 2020-01, 2020-02, 2020-03, 2020-04.
If pickup_date is in the same month as return_date, compute return_date - pickup_date, then sum the results grouped by to_char(pickup_date, 'YYYY-MM').
Not quite perfect, but a sketch:
SELECT
id,
ARRAY_AGG( -- 4
LEAST(return_date, gs + interval '1 month - 1 day') -- 2
- GREATEST(pickup_date, gs) -- 3
+ interval '1 day'
)
FROM order_history,
generate_series( -- 1
date_trunc('month', pickup_date),
date_trunc('month', return_date),
interval '1 month'
) gs
GROUP BY id
1. Generate the set of months covered by the given date range.
2. a) Calculate the last day of the month (the first of a month + 1 month is the first of the next month; minus 1 day gives the last day of the current month). This is the latest possible return day within this month. b) If the return happened earlier, take the earlier day (LEAST()).
3. Do the same for the pickup day (GREATEST()). Then calculate the difference, i.e. the number of days kept within that month.
4. Aggregate the per-month values (ARRAY_AGG, grouped by id).
Open questions / Potential enhancements:
You said:
jan2020: 30, feb2020:29, march2020: 31, 2020apr:22.
Why is JAN given with 30 days? On the other hand you count APR 22 days (1st - 22nd). Following the logic, JAN should be 31, shouldn't it?
If you don't want to count the very first day, then you can change (3.) to
GREATEST(pickup_date + interval '1 day', gs)
There's a problem with daylight saving time in March (30 days, 23 hours instead of 31 days). This can be handled by some rounding, for example.
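
The clamping logic (LEAST/GREATEST per month) can be checked with a small Python sketch. It is an illustration under stated assumptions, not the answer's SQL: the names next_month_start, split_by_month and the count_first_day flag are hypothetical, and plain dates are used, so the daylight saving issue does not arise here.

```python
from datetime import date, timedelta

def next_month_start(d):
    """First day of the month after the one containing d."""
    return date(d.year + (d.month == 12), d.month % 12 + 1, 1)

def split_by_month(pickup, ret, count_first_day=True):
    """Map 'YYYY-MM' to the number of days of [pickup, ret] that fall in that month."""
    if not count_first_day:
        # Mirrors GREATEST(pickup_date + interval '1 day', gs)
        pickup = pickup + timedelta(days=1)
    parts = {}
    month_start = pickup.replace(day=1)
    while month_start <= ret:
        month_end = next_month_start(month_start) - timedelta(days=1)
        first = max(pickup, month_start)  # GREATEST(pickup_date, gs)
        last = min(ret, month_end)        # LEAST(return_date, last day of month)
        days = (last - first).days + 1    # inclusive day count
        if days > 0:
            parts[month_start.strftime("%Y-%m")] = days
        month_start = next_month_start(month_start)
    return parts

print(split_by_month(date(2020, 1, 1), date(2020, 4, 22), count_first_day=False))
# {'2020-01': 30, '2020-02': 29, '2020-03': 31, '2020-04': 22}
```

With count_first_day=True the January value becomes 31 and the parts sum to 113, which is the inclusive count discussed in the open question above.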

I need help in writing a subquery

I have a query like this to create date series:
Select month
From
(select to_char(created_date, 'Mon') as Month,
created_date::date as start_day,
(created_date::date + interval '1 month - 1 day ')::date as end_day
from generate_series(date '2021-01-26',
date '2022-04-26', interval '1 month') as g(created_date)) AS "thang"
And the table looks like this:
month
Jan
Feb
Mar
Apr
May
Jun
Jul
Aug
Sep
Oct
Now I want to count the status from the KYC table.
So I try this:
Select
(Select month
From
(select to_char(created_date, 'Mon') as Month,
created_date::date as start_day,
(created_date::date + interval '1 month - 1 day ')::date as end_day
from generate_series(date '2021-01-26',
date '2022-04-26', interval '1 month') as g(created_date)) AS "thang"),
count(*) filter (where status = 4) as "KYC_Success"
From kyc
group by 1
I hope the result will be like this:
Month | KYC_Success
Jan | 234
Feb | 435
Mar | 546
Apr | 157
But it fails with:
error: more than one row returned by a subquery used as an expression
What should I change in this query?
Let us assume that the table KYC has a timestamp column called created_date and a status column, and that you want to count the success statuses per month, even if there were zero successes in a month.
SELECT thang.month
, count(CASE WHEN kyc.STATUS = 'success' THEN 1 END) AS successes
FROM (
SELECT to_char(created_date, 'Mon') AS Month
, created_date::DATE AS start_date
, (created_date::DATE + interval '1 month - 1 day ')::DATE AS end_date
FROM generate_series(DATE '2021-01-26', DATE '2022-04-26', interval '1 month') AS g(created_date)
) AS "thang"
LEFT JOIN kyc ON kyc.created_date >= thang.start_date
AND kyc.created_date < thang.end_date + 1 -- end_date is the last day of the period, so the next day is the exclusive upper bound
GROUP BY thang.month;
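
The idea behind the LEFT JOIN (one row per period, zero when nothing matches) can be sketched in Python as follows. Everything here is an assumption for illustration: the helper names, the sample rows, and the use of status = 4 from the question as the "success" value. Periods are labeled by their start date rather than by month name, so January 2021 and January 2022 stay separate.

```python
from datetime import date

def add_month(d):
    """Same day in the next month. Assumes d.day <= 28 so that day always exists."""
    return date(d.year + (d.month == 12), d.month % 12 + 1, d.day)

def count_per_period(rows, first_start, last_start, wanted_status=4):
    """rows: list of (created_date, status). Returns [(period_start, count), ...]."""
    result = []
    start = first_start
    while start <= last_start:
        end = add_month(start)  # exclusive upper bound of the period
        n = sum(1 for created, status in rows
                if start <= created < end and status == wanted_status)
        result.append((start, n))  # periods without matches are kept with 0
        start = end
    return result

# Made-up sample data, only to exercise the function.
rows = [
    (date(2021, 1, 27), 4),
    (date(2021, 2, 25), 4),
    (date(2021, 2, 26), 1),
    (date(2021, 3, 1), 4),
]
counts = count_per_period(rows, date(2021, 1, 26), date(2021, 4, 26))
for period_start, n in counts:
    print(period_start, n)
```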

Age Less than or Equal to a month evaluates to False

When evaluating intervals, PostgreSQL appears to define a month as exactly 30 days, even when the month has 31 days:
select age('2021-03-31 23:59:59.999', '2021-03-01'::date)
Returns: 30 days 23:59:59.999
Which, in the case of March, is less than 1 month.
Yet:
select age('2021-03-31 23:59:59.999', '2021-03-01'::date) <= '1 month'
Evaluates to false.
A (not very clean) solution to this is:
select age('2021-03-31 23:59:59.999', '2021-03-01'::date) <= case (select DATE_PART('days', DATE_TRUNC('month', '2021-03-31'::Date) + '1 MONTH'::interval - '1 DAY'::INTERVAL))
when 31 then '31 days'::interval when 30 then '30 days'::interval
when 29 then '29 days'::interval else '28 days'::interval end
My question is in 2 parts:
1. Why does PostgreSQL define a month as 30 days, particularly in the case where I give two dates as input to a builtin function?
2. Is there a cleaner solution to my problem than the above snippet?
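On its own, interval '1 month' is ambiguous: it could be 28, 29, 30 or 31 days, all of which are correct depending on the month. With no date to anchor it, PostgreSQL treats a month as 30 days when comparing intervals, so 30 days 23:59:59.999 is not <= '1 month'. Try reformulating the comparison so that the month is applied to a concrete timestamp: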
select '2021-03-31 23:59:59.999'::timestamp - interval '1 month' < '2021-03-01'::date
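
The difference between a fixed 30-day month and a calendar-aware month can be illustrated with a short Python sketch. It only mirrors the arithmetic; the variable names are chosen for this example, and Python's timedelta has no month unit, so the 30-day comparison is written out explicitly.

```python
import calendar
from datetime import datetime, timedelta

start = datetime(2021, 3, 1)
end = datetime(2021, 3, 31, 23, 59, 59, 999000)
elapsed = end - start  # 30 days, 23:59:59.999

# Fixed 30-day month: the elapsed time is longer, so the comparison fails.
print(elapsed <= timedelta(days=30))  # False

# Calendar-aware: use the real length of the month that contains `start`.
days_in_month = calendar.monthrange(start.year, start.month)[1]
print(elapsed <= timedelta(days=days_in_month))  # True
```

Anchoring the month to a concrete date, as the reformulated SQL comparison does, removes the ambiguity in the same way.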

Count days for each month between two dates - postgresql

I am trying to write a query which gives the number of days in each month between two specified dates.
Example:
date 1: 2018-01-01
date 2: 2018-05-23
Expected Output:
month days
2018-01-01, 31
2018-02-01, 28
2018-03-01, 31
2018-04-01, 30
2018-05-01, 23
Use generate_series and group by date_trunc
SELECT date_trunc('month',dt) AS month,
COUNT(*) as days
FROM generate_series( DATE '2018-01-01',DATE '2018-05-23',interval '1 DAY' )
as dt group by date_trunc('month',dt)
order by month;
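
For cross-checking the numbers, the same approach (one row per day, grouped by the first of the month) can be sketched in Python. The function name days_per_month is made up for this example.

```python
from collections import Counter
from datetime import date, timedelta

def days_per_month(start, end):
    """Count the days of [start, end] per month, keyed by the first day of the month."""
    counts = Counter()
    current = start
    while current <= end:
        # current.replace(day=1) plays the role of date_trunc('month', dt)
        counts[current.replace(day=1)] += 1
        current += timedelta(days=1)
    return dict(sorted(counts.items()))

result = days_per_month(date(2018, 1, 1), date(2018, 5, 23))
for month, days in result.items():
    print(month, days)
```

For the dates in the question this gives 31, 28, 31, 30 and 23 days, matching the expected output.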

Retrieving the start and end hour queries correctly in PostgreSQL Query

I have a CTE-based query in which I retrieve hourly intervals between two given timestamps. My query works as follows:
1. Get the start and end datetimes (let's say 07-13-2011 00:21:09 and 07-31-2011 21:11:21).
2. Get the total query count per hourly interval for each day (here from 00 to 21, a total of 21 hours, but this is parametric and depends on the hours I give as inputs).
This query works well, but there is a problem. It displays hourly amounts, but for the start time it counts all the queries between 00:00:00 and 00:59:59 for each day instead of 00:21:09 - 00:59:59. The same applies to the end time: it counts all the queries between 21:00:00 and 22:00:00 for each day instead of 21:00:00 - 21:11:21. (The other hour intervals, e.g. 03:00 - 04:00, are retrieved normally as flat one-hour intervals with no minutes or seconds involved.) How can I fix that? The query is below, thanks.
WITH cal AS (
SELECT generate_series('2011-02-02 00:00:00'::timestamp , '2012-04-01 05:00:00'::timestamp , '1 hour'::interval) AS stamp
)
, qqq AS (
SELECT date_trunc('hour', calltime) AS stamp
, count(*) AS zcount
FROM mytable
WHERE calltime >= '07-13-2011 00:21:09' AND calltime <='07-31-2011 21:11:21' AND date_part('hour', calltime) >= 0 AND date_part('hour', calltime) <= 21
GROUP BY date_trunc('hour', calltime)
)
SELECT cal.stamp
, COALESCE (qqq.zcount, 0) AS zcount
FROM cal
LEFT JOIN qqq ON cal.stamp = qqq.stamp
WHERE cal.stamp >= '07-13-2011 00:00:00' AND cal.stamp<='07-31-2011 21:11:21' AND date_part('hour', cal.stamp) >= 0 AND date_part('hour', cal.stamp) <= 21
ORDER BY stamp ASC;
EDIT:
What I mean is this: despite giving 00:21:09 as the starting time on the first day, the days after it count the first hour interval as all queries between 00:00:00-01:00:00 instead of 00:21:09-01:00:00. This should apply to the first hour interval of every day; for example, I can give 04:30:21 as the starting time and each day should start counting hourly from there. The same applies to the ending interval 21:00:00-21:11:21: only the LAST day in the results uses this interval, while the days before it count all queries between 21:00:00-22:00:00 instead of 21:00:00-21:11:21.
For example, if there are 200 queries between 00:00:00 and 01:00:00 on July 14 2011 (the day after July 13, the start date) but 159 of them are between 00:21:09 and 01:00:00, I should get 159 instead of 200. Also, if there are 300 queries between 21:00:00-22:00:00 on any given day and 123 of them are between 21:00:00-21:11:21, I should get 123 instead of 300. This applies to every single day; the other hourly intervals, such as 01:00-02:00 or 20:00-21:00, should be counted as usual. Everything is parametric: the hourly intervals and the start and end times depend on user input.
Adding AND calltime::time >= '00:21:09' AND calltime::time <= '21:11:21' to the WHERE calltime >= '07-13-2011 00:21:09' AND calltime <='07-31-2011 21:11:21' block solved the issue.
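
The effect of the extra time-of-day condition can be sketched in Python. This is an illustration only: hourly_counts and the sample timestamps are made up, and the daily window is taken from the time parts of the start and end values, as in the fix above.

```python
from collections import Counter
from datetime import datetime

def hourly_counts(calls, start, end):
    """Count calls per hour, keeping only calls inside [start, end] whose
    time of day also lies inside [start.time(), end.time()]."""
    counts = Counter()
    for ts in calls:
        in_range = start <= ts <= end                               # calltime >= start AND calltime <= end
        in_daily_window = start.time() <= ts.time() <= end.time()   # calltime::time conditions
        if in_range and in_daily_window:
            # Truncate to the hour, like date_trunc('hour', calltime)
            counts[ts.replace(minute=0, second=0, microsecond=0)] += 1
    return counts

start = datetime(2011, 7, 13, 0, 21, 9)
end = datetime(2011, 7, 31, 21, 11, 21)
calls = [
    datetime(2011, 7, 14, 0, 10, 0),   # before 00:21:09, dropped
    datetime(2011, 7, 14, 0, 30, 0),   # kept
    datetime(2011, 7, 14, 21, 5, 0),   # kept
    datetime(2011, 7, 14, 21, 30, 0),  # after 21:11:21, dropped
]
counts = hourly_counts(calls, start, end)
for hour, n in sorted(counts.items()):
    print(hour, n)
```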