Identify period intervals between two dates - PostgreSQL
I have a table called billing_cycle that stores each customer's billing pay-period frequency: Monthly, Weekly, Bi-Weekly, Quarterly, or Yearly.
Table columns: Customer, Frequency, billing_start_date
Example:
Customer , Frequency, billing_start_date
001 , Monthly , 04-Feb-2021
002 , Weekly , 01-Mar-2021
003 , Bi-Weekly , 01-Mar-2021
My requirement is to identify (query) the billing periods for a customer, based on the frequency type, between a given date range (from and to).
For example, given the date range 01-Feb-2021 to 30-Oct-2021,
the output for customer 001 (Monthly frequency) is
Pay_period_start , Pay_period_end
01-Feb-2021 , 28-Feb-2021
01-Mar-2021 , 31-Mar-2021
01-Apr-2021 , 30-Apr-2021 and so on till
01-Oct-2021 to 31-Oct-2021
Output for customer 002 (weekly interval 7 days):
Pay_period_start , Pay_period_end
01-Feb-2021 , 07-Feb-2021
08-Feb-2021 , 14-Feb-2021
15-Feb-2021 , 21-Feb-2021
22-Feb-2021 , 28-Feb-2021
01-Mar-2021 , 07-Mar-2021 and so on till
31-Oct-2021
and similarly for customer 003 on a bi-weekly basis (14 days).
Get the average time between pay periods, then create a CASE statement:
case when timediff = 7 then 'weekly' when timediff = 14 then 'biweekly' else 'monthly' end
You can fill in the other values for quarterly, yearly, and so on.
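As a hedged sketch of that idea in PostgreSQL: assuming a hypothetical pay_periods(customer, pay_period_start) table (the name is illustrative, not from the question), the gap between consecutive periods can be computed with lag() and classified:

```sql
-- Sketch only: classify each customer's frequency from the average gap
-- between consecutive pay periods. pay_periods is a hypothetical table.
select customer,
       case avg(diff)
            when 7  then 'Weekly'
            when 14 then 'Bi-Weekly'
            else 'Monthly'
       end as frequency
from (
    select customer,
           pay_period_start
             - lag(pay_period_start) over (partition by customer
                                           order by pay_period_start) as diff
    from pay_periods
) p
where diff is not null   -- drop each customer's first row (no predecessor)
group by customer;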
Here's Oracle code; see if it helps.
Setting date format (just to see what is what; you don't have to do that):
SQL> alter session set nls_date_Format = 'dd.mm.yyyy';
Session altered.
Here we go (read comments within code):
with
-- This is your sample table
customer (customer, frequency, billing_start_date) as
  (select '001', 'Monthly'  , date '2021-02-04' from dual union all
   select '002', 'Weekly'   , date '2021-03-01' from dual union all
   select '003', 'Bi-Weekly', date '2021-03-01' from dual
  ),
-- Date range, as you stated
date_range (start_date, end_date) as
  (select date '2021-02-01', date '2021-10-30' from dual
  )
-- Billing periods
select
  c.customer,
  --
  case when c.frequency = 'Monthly'   then add_months(d.start_date, column_value - 1)
       when c.frequency = 'Weekly'    then d.start_date + ( 7 * (column_value - 1))
       when c.frequency = 'Bi-Weekly' then d.start_date + (14 * (column_value - 1))
  end as pay_period_start,
  --
  case when c.frequency = 'Monthly'   then add_months(d.start_date, column_value) - 1
       when c.frequency = 'Weekly'    then d.start_date + ( 7 * column_value) - 1
       when c.frequency = 'Bi-Weekly' then d.start_date + (14 * column_value) - 1
  end as pay_period_end
from customer c cross join date_range d
cross join table(cast(multiset(select level
                               from dual
                               connect by level <= case when c.frequency = 'Monthly'   then months_between(d.end_date, d.start_date)
                                                        when c.frequency = 'Weekly'    then (d.end_date - d.start_date) / 7
                                                        when c.frequency = 'Bi-Weekly' then (d.end_date - d.start_date) / 14
                                                   end + 1
                              ) as sys.odcinumberlist))
order by c.customer, pay_period_start;
Result:
CUSTOMER PAY_PERIOD_START PAY_PERIOD_END
---------- -------------------- --------------------
001 01.02.2021 28.02.2021
001 01.03.2021 31.03.2021
001 01.04.2021 30.04.2021
001 01.05.2021 31.05.2021
001 01.06.2021 30.06.2021
001 01.07.2021 31.07.2021
001 01.08.2021 31.08.2021
001 01.09.2021 30.09.2021
001 01.10.2021 31.10.2021
002 01.02.2021 07.02.2021
002 08.02.2021 14.02.2021
002 15.02.2021 21.02.2021
002 22.02.2021 28.02.2021
002 01.03.2021 07.03.2021
002 08.03.2021 14.03.2021
002 15.03.2021 21.03.2021
002 22.03.2021 28.03.2021
002 29.03.2021 04.04.2021
002 05.04.2021 11.04.2021
002 12.04.2021 18.04.2021
002 19.04.2021 25.04.2021
002 26.04.2021 02.05.2021
002 03.05.2021 09.05.2021
002 10.05.2021 16.05.2021
002 17.05.2021 23.05.2021
002 24.05.2021 30.05.2021
002 31.05.2021 06.06.2021
002 07.06.2021 13.06.2021
002 14.06.2021 20.06.2021
002 21.06.2021 27.06.2021
002 28.06.2021 04.07.2021
002 05.07.2021 11.07.2021
002 12.07.2021 18.07.2021
002 19.07.2021 25.07.2021
002 26.07.2021 01.08.2021
002 02.08.2021 08.08.2021
002 09.08.2021 15.08.2021
002 16.08.2021 22.08.2021
002 23.08.2021 29.08.2021
002 30.08.2021 05.09.2021
002 06.09.2021 12.09.2021
002 13.09.2021 19.09.2021
002 20.09.2021 26.09.2021
002 27.09.2021 03.10.2021
002 04.10.2021 10.10.2021
002 11.10.2021 17.10.2021
002 18.10.2021 24.10.2021
002 25.10.2021 31.10.2021
003 01.02.2021 14.02.2021
003 15.02.2021 28.02.2021
003 01.03.2021 14.03.2021
003 15.03.2021 28.03.2021
003 29.03.2021 11.04.2021
003 12.04.2021 25.04.2021
003 26.04.2021 09.05.2021
003 10.05.2021 23.05.2021
003 24.05.2021 06.06.2021
003 07.06.2021 20.06.2021
003 21.06.2021 04.07.2021
003 05.07.2021 18.07.2021
003 19.07.2021 01.08.2021
003 02.08.2021 15.08.2021
003 16.08.2021 29.08.2021
003 30.08.2021 12.09.2021
003 13.09.2021 26.09.2021
003 27.09.2021 10.10.2021
003 11.10.2021 24.10.2021
003 25.10.2021 07.11.2021
68 rows selected.
SQL>
If you actually want the periods anchored to customer.billing_start_date instead, replace all references to date_range.start_date with billing_start_date.
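For reference, since the question is tagged postgresql, the same range-anchored logic translates to a hedged PostgreSQL sketch with generate_series() (this is an illustrative translation, not the answer's own code; date arguments promote to timestamp, so the series column is cast back to date):

```sql
-- Sketch: periods anchored to the range start, as in the Oracle query above.
with customer (customer, frequency, billing_start_date) as (
    values ('001', 'Monthly',   date '2021-02-04'),
           ('002', 'Weekly',    date '2021-03-01'),
           ('003', 'Bi-Weekly', date '2021-03-01')
),
date_range (start_date, end_date) as (
    values (date '2021-02-01', date '2021-10-30')
)
select c.customer,
       g.day_start::date as pay_period_start,
       (g.day_start + case c.frequency
                           when 'Weekly'    then interval '7 days'
                           when 'Bi-Weekly' then interval '14 days'
                           else                  interval '1 month'
                      end - interval '1 day')::date as pay_period_end
from customer c
cross join date_range d
cross join lateral generate_series(
       d.start_date,
       d.end_date,
       case c.frequency
            when 'Weekly'    then interval '7 days'
            when 'Bi-Weekly' then interval '14 days'
            else                  interval '1 month'
       end) as g(day_start)
order by c.customer, pay_period_start;
```

For customer 001 this yields 01-Feb-2021 through 31-Oct-2021 in monthly steps, matching the expected output in the question.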
Here is a solution for Postgres.
There is a slight difference compared to your expected output: pay_period_start is calculated so that it never starts before billing_start_date:
select t.customer,
case
when g.nr = 1 then t.billing_start_date
else g.dt::date
end as pay_period_start,
case t.frequency
when 'Weekly' then (g.dt + interval '1 week' - interval '1 day')::date
when 'Bi-Weekly' then (g.dt + interval '2 week' - interval '1 day')::date
else (g.dt + interval '1 month' - interval '1 day')::date
end as pay_period_end
from the_table t
cross join generate_series(date_trunc('month', t.billing_start_date), date '2021-10-31',
case t.frequency
when 'Weekly' then interval '1 week'
when 'Bi-Weekly' then interval '2 week'
else interval '1 month'
end
) with ordinality as g(dt,nr)
order by t.customer, pay_period_start
If you indeed want pay_period_start to start on 2021-02-01 regardless of the actual billing_start_date, change the start value for generate_series(); the CASE expression for pay_period_start can then also be simplified.
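A hedged sketch of that simplification (fixed range start 2021-02-01, so the CASE around pay_period_start and the WITH ORDINALITY clause both disappear; the_table and the frequency values are taken from the answer above):

```sql
-- Sketch: every customer's periods start at the fixed range start.
select t.customer,
       g.dt::date as pay_period_start,
       case t.frequency
            when 'Weekly'    then (g.dt + interval '1 week'  - interval '1 day')::date
            when 'Bi-Weekly' then (g.dt + interval '2 week'  - interval '1 day')::date
            else                  (g.dt + interval '1 month' - interval '1 day')::date
       end as pay_period_end
from the_table t
cross join generate_series(date '2021-02-01', date '2021-10-31',
       case t.frequency
            when 'Weekly'    then interval '1 week'
            when 'Bi-Weekly' then interval '2 week'
            else                  interval '1 month'
       end) as g(dt)
order by t.customer, pay_period_start;
```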
In Oracle, you can use:
WITH date_range (range_start, range_end) AS (
SELECT DATE '2021-02-01', DATE '2021-10-01' FROM DUAL
),
periods (customer, frequency, period_start, range_end) AS (
SELECT t.customer,
t.frequency,
CASE t.frequency
WHEN 'Monthly'
THEN ADD_MONTHS(
billing_start_date,
GREATEST(TRUNC(MONTHS_BETWEEN(range_start, billing_start_date)), 0)
)
WHEN 'Bi-Weekly'
THEN billing_start_date + 14 * GREATEST(TRUNC((range_start - billing_start_date)/14), 0)
WHEN 'Weekly'
THEN billing_start_date + 7 * GREATEST(TRUNC((range_start - billing_start_date)/7), 0)
END,
d.range_end
FROM table_name t
CROSS JOIN date_range d
WHERE t.billing_start_date <= d.range_end
UNION ALL
SELECT customer,
frequency,
CASE frequency
WHEN 'Monthly' THEN ADD_MONTHS(period_start, 1)
WHEN 'Bi-Weekly' THEN period_start + 14
WHEN 'Weekly' THEN period_start + 7
END,
range_end
FROM periods
WHERE period_start < range_end
)
SEARCH DEPTH FIRST BY customer SET order_rn
SELECT customer,
frequency,
period_start,
CASE frequency
WHEN 'Monthly' THEN ADD_MONTHS(period_start, 1)
WHEN 'Bi-Weekly' THEN period_start + 14
WHEN 'Weekly' THEN period_start + 7
END - 1 AS period_end
FROM periods
WHERE period_start <= range_end;
Which, for the sample data:
CREATE TABLE table_name (Customer , Frequency, billing_start_date) AS
SELECT '001', 'Monthly', DATE '2021-02-04' FROM DUAL UNION ALL
SELECT '002', 'Weekly', DATE '2021-03-01' FROM DUAL UNION ALL
SELECT '003', 'Bi-Weekly', DATE '2020-03-05' FROM DUAL;
Outputs:
CUSTOMER  FREQUENCY  PERIOD_START         PERIOD_END
001       Monthly    2021-02-04 00:00:00  2021-03-03 00:00:00
001       Monthly    2021-03-04 00:00:00  2021-04-03 00:00:00
001       Monthly    2021-04-04 00:00:00  2021-05-03 00:00:00
001       Monthly    2021-05-04 00:00:00  2021-06-03 00:00:00
001       Monthly    2021-06-04 00:00:00  2021-07-03 00:00:00
001       Monthly    2021-07-04 00:00:00  2021-08-03 00:00:00
001       Monthly    2021-08-04 00:00:00  2021-09-03 00:00:00
001       Monthly    2021-09-04 00:00:00  2021-10-03 00:00:00
002       Weekly     2021-03-01 00:00:00  2021-03-07 00:00:00
002       Weekly     2021-03-08 00:00:00  2021-03-14 00:00:00
002       Weekly     2021-03-15 00:00:00  2021-03-21 00:00:00
...
002       Weekly     2021-09-13 00:00:00  2021-09-19 00:00:00
002       Weekly     2021-09-20 00:00:00  2021-09-26 00:00:00
002       Weekly     2021-09-27 00:00:00  2021-10-03 00:00:00
003       Bi-Weekly  2021-01-21 00:00:00  2021-02-03 00:00:00
003       Bi-Weekly  2021-02-04 00:00:00  2021-02-17 00:00:00
003       Bi-Weekly  2021-02-18 00:00:00  2021-03-03 00:00:00
...
003       Bi-Weekly  2021-09-02 00:00:00  2021-09-15 00:00:00
003       Bi-Weekly  2021-09-16 00:00:00  2021-09-29 00:00:00
003       Bi-Weekly  2021-09-30 00:00:00  2021-10-13 00:00:00
db<>fiddle here
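Since the question asks for PostgreSQL, here is a hedged sketch translating the recursive query above to Postgres syntax: ADD_MONTHS becomes a one-month interval, Oracle's TRUNC of the day arithmetic becomes integer division, and an explicit ORDER BY replaces the SEARCH clause. It assumes the same table_name sample table:

```sql
-- Sketch: PostgreSQL translation of the recursive-CTE approach above.
with recursive date_range (range_start, range_end) as (
    values (date '2021-02-01', date '2021-10-01')
),
periods (customer, frequency, period_start, range_end) as (
    -- Anchor: first period on or after range_start (never before billing_start_date)
    select t.customer,
           t.frequency,
           case t.frequency
                when 'Monthly'
                then (t.billing_start_date + make_interval(months => greatest(
                        (extract(year  from age(d.range_start, t.billing_start_date)) * 12
                       + extract(month from age(d.range_start, t.billing_start_date)))::int, 0)))::date
                when 'Bi-Weekly'
                then t.billing_start_date + 14 * greatest((d.range_start - t.billing_start_date) / 14, 0)
                when 'Weekly'
                then t.billing_start_date +  7 * greatest((d.range_start - t.billing_start_date) /  7, 0)
           end,
           d.range_end
    from table_name t
    cross join date_range d
    where t.billing_start_date <= d.range_end
    union all
    -- Step: advance one period at a time until range_end is reached
    select customer,
           frequency,
           case frequency
                when 'Monthly'   then (period_start + interval '1 month')::date
                when 'Bi-Weekly' then period_start + 14
                when 'Weekly'    then period_start + 7
           end,
           range_end
    from periods
    where period_start < range_end
)
select customer,
       frequency,
       period_start,
       case frequency
            when 'Monthly'   then (period_start + interval '1 month')::date
            when 'Bi-Weekly' then period_start + 14
            when 'Weekly'    then period_start + 7
       end - 1 as period_end
from periods
where period_start <= range_end
order by customer, period_start;
```

Note that date subtraction in Postgres yields an integer number of days, so the integer division mirrors Oracle's TRUNC for the positive case, and GREATEST(..., 0) covers starts before the range.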