PostgreSQL: GROUP BY and ORDER BY, whole dataset as a result - postgresql

In a Postgres database I have a table with the following columns:
ID (Primary Key)
Code
Date
I'm trying to extract the data ordered by Date and grouped by Code, so that the code with the most recent date comes first, followed by all of that code's rows, then the code with the next most recent date, and so on. An example:
007 2022-01-04
007 2022-01-01
007 2021-12-19
002 2022-01-03
002 2021-12-02
002 2021-11-15
035 2022-01-01
035 2021-11-30
035 2021-05-03
001 2021-12-31
022 2021-12-07
076 2021-11-19
I thought I could achieve this with the following query:
SELECT * FROM Table
GROUP BY Table.Code
ORDER BY Table.Date DESC
but this gives me
ERROR: column "Table.ID" must appear in the GROUP BY clause or be used in an aggregate function
And if I add the ID column to the GROUP BY, the result is just a list ordered by Date with all the Codes mixed.
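That is, with something like:
SELECT * FROM Table
GROUP BY Table.Code, Table.ID
ORDER BY Table.Date DESC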
Is there any way to achieve what I want?

Edit 3
A more elegant solution, using max() as a window function (over partition by):
SELECT
"Code",
"Date"
FROM
"Table"
ORDER BY
max("Date") over (partition by "Code") DESC,
"Table"."Date" DESC
;
Output:
Code | Date
-----+---------------------
007  | 2022-01-04T00:00:00Z
007  | 2022-01-01T00:00:00Z
007  | 2021-12-19T00:00:00Z
002  | 2022-01-03T00:00:00Z
002  | 2021-12-02T00:00:00Z
002  | 2021-11-15T00:00:00Z
035  | 2022-01-01T00:00:00Z
035  | 2021-11-30T00:00:00Z
035  | 2021-05-03T00:00:00Z
001  | 2021-12-31T00:00:00Z
022  | 2021-12-07T00:00:00Z
076  | 2021-11-19T00:00:00Z
Edit 2:
Here I join a subquery "b" with the entire dataset. The subquery "b" is used for sorting only and is essentially what you tried:
With "b" as
( select
"Code",
max("Date") as "Date"
from
"Table"
group by
"Code"
)
SELECT
"Table"."Code",
"Table"."Date"
FROM
"Table" left join "b" on "Table"."Code" = "b"."Code"
ORDER BY
"b"."Date" desc,
"Table"."Date" DESC;
Output:
Code | Date
-----+---------------------
007  | 2022-01-04T00:00:00Z
007  | 2022-01-01T00:00:00Z
007  | 2021-12-19T00:00:00Z
002  | 2022-01-03T00:00:00Z
002  | 2021-12-02T00:00:00Z
002  | 2021-11-15T00:00:00Z
035  | 2022-01-01T00:00:00Z
035  | 2021-11-30T00:00:00Z
035  | 2021-05-03T00:00:00Z
001  | 2021-12-31T00:00:00Z
022  | 2021-12-07T00:00:00Z
076  | 2021-11-19T00:00:00Z
Edit 1
Every column in the SELECT list has to appear in the GROUP BY clause or be used inside an aggregate function, which is exactly what the error message says.
The example below shows a way to fix the error on your data.
Table with ID:
CREATE TABLE "Table" (
"ID" serial not null primary key,
"Code" varchar,
"Date" timestamp
);
INSERT INTO "Table"
("Code", "Date")
VALUES
('007', '2022-01-04 00:00:00'),
('007', '2022-01-01 00:00:00'),
('007', '2021-12-19 00:00:00'),
('002', '2022-01-03 00:00:00'),
('002', '2021-12-02 00:00:00'),
('002', '2021-11-15 00:00:00'),
('035', '2022-01-01 00:00:00'),
('035', '2021-11-30 00:00:00'),
('035', '2021-05-03 00:00:00'),
('001', '2021-12-31 00:00:00'),
('022', '2021-12-07 00:00:00'),
('076', '2021-11-19 00:00:00')
;
Select:
SELECT * FROM "Table" ORDER BY "Code", "Date" DESC;
Output:
ID | Code | Date
---+------+---------------------
10 | 001  | 2021-12-31T00:00:00Z
 4 | 002  | 2022-01-03T00:00:00Z
 5 | 002  | 2021-12-02T00:00:00Z
 6 | 002  | 2021-11-15T00:00:00Z
 1 | 007  | 2022-01-04T00:00:00Z
 2 | 007  | 2022-01-01T00:00:00Z
 3 | 007  | 2021-12-19T00:00:00Z
11 | 022  | 2021-12-07T00:00:00Z
 7 | 035  | 2022-01-01T00:00:00Z
 8 | 035  | 2021-11-30T00:00:00Z
 9 | 035  | 2021-05-03T00:00:00Z
12 | 076  | 2021-11-19T00:00:00Z
Original Answer
First, in the SELECT list, pick the column you want to group by (e.g. Code) and the column you want to apply an aggregate function to (Date).
Second, list the grouping column in the GROUP BY clause.
In the ORDER BY clause, use the same aggregate expression as in the SELECT clause.
https://www.postgresqltutorial.com/postgresql-group-by/
Tables:
CREATE TABLE "Table"
("Code" int, "Date" timestamp)
;
INSERT INTO "Table"
("Code", "Date")
VALUES
(007, '2022-01-04 00:00:00'),
(007, '2022-01-01 00:00:00'),
(007, '2021-12-19 00:00:00'),
(002, '2022-01-03 00:00:00'),
(002, '2021-12-02 00:00:00'),
(002, '2021-11-15 00:00:00'),
(035, '2022-01-01 00:00:00'),
(035, '2021-11-30 00:00:00'),
(035, '2021-05-03 00:00:00'),
(001, '2021-12-31 00:00:00'),
(022, '2021-12-07 00:00:00'),
(076, '2021-11-19 00:00:00')
;
Select:
SELECT
"Table"."Code",
max("Table"."Date")
FROM
"Table"
GROUP BY
"Table"."Code"
ORDER BY
max("Table"."Date") DESC
Output:
Code | max
-----+---------------------
   7 | 2022-01-04T00:00:00Z
   2 | 2022-01-03T00:00:00Z
  35 | 2022-01-01T00:00:00Z
   1 | 2021-12-31T00:00:00Z
  22 | 2021-12-07T00:00:00Z
  76 | 2021-11-19T00:00:00Z

Related

Window Function For Consecutive Dates

I want to know how many users were active for 3 consecutive days on any given day.
E.g. on 2022-11-03, 1 user (user_id = 111) was active 3 days in a row. Could someone please advise what kind of window function(?) would be needed?
This is my dataset:
user_id | active_date
--------+------------
111     | 2022-11-01
111     | 2022-11-02
111     | 2022-11-03
222     | 2022-11-01
333     | 2022-11-01
333     | 2022-11-09
333     | 2022-11-10
333     | 2022-11-11
If you are confident there are no duplicate user_id + active_date rows in the source data, then you can use two LAG functions like this:
SELECT user_id,
active_date,
CASE WHEN DATEADD(day, -1, active_date) = LAG(active_date, 1) OVER (PARTITION BY user_id ORDER BY active_date)
AND DATEADD(day, -2, active_date) = LAG(active_date, 2) OVER (PARTITION BY user_id ORDER BY active_date)
THEN 'Yes'
ELSE 'No'
END AS rowof3
FROM your_table
ORDER BY user_id, active_date;
If there might be duplication, use this FROM clause instead:
FROM (SELECT DISTINCT user_id, active_date :: DATE FROM your_table)
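DATEADD() is a SQL Server / Redshift function; on plain PostgreSQL the same comparison can be written with date arithmetic instead. A sketch under the assumption that active_date is a DATE column:
SELECT user_id,
       active_date,
       -- date - integer arithmetic works on DATE columns in PostgreSQL
       CASE WHEN active_date - 1 = LAG(active_date, 1) OVER (PARTITION BY user_id ORDER BY active_date)
             AND active_date - 2 = LAG(active_date, 2) OVER (PARTITION BY user_id ORDER BY active_date)
            THEN 'Yes'
            ELSE 'No'
       END AS rowof3
FROM your_table
ORDER BY user_id, active_date;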

Count users with more than X amount of transactions within Y days by date

Scenario: trying to count active users for a time-series analysis.
Need: with PostgreSQL (Redshift), count customers that have more than X unique transactions within Y days of a given date, grouped by date.
How do I achieve this?
Table: orders
date       | user_id | product_id | transaction_id
-----------+---------+------------+---------------
2022-01-01 | 001     | 003        | 001
2022-01-02 | 002     | 001        | 002
2022-03-01 | 003     | 001        | 003
2022-03-01 | 003     | 002        | 003
...        | ...     | ...        | ...
Outcome:
date       | active_customers
-----------+-----------------
2022-01-01 | 10
2022-01-02 | 12
2022-01-03 | 9
2022-01-04 | 13
You may be able to use the window functions LEAD() and LAG() here but this solution may also work for you.
WITH data AS
(
SELECT o.date
, o.user_id
, COUNT(o.transaction_id) tcount
FROM orders o
WHERE o.date BETWEEN o.date - '30 DAYS'::INTERVAL AND o.date -- Y days from given date
GROUP BY o.date, o.user_id
), user_transaction_count AS
(
SELECT d.date
, COUNT(d.user_id) FILTER (WHERE d.tcount > 1) -- X number of transactions
OVER (PARTITION BY d.user_id) user_count
FROM data d
)
SELECT u.date
, SUM(u.user_count) active_customers
FROM user_transaction_count u
GROUP BY u.date
ORDER BY u.date
;
Here is a DBFiddle that demos a couple options.
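For comparison, a minimal sketch of another way to phrase it in plain PostgreSQL; X = 2 unique transactions and Y = 30 days are assumed values, and on Redshift the interval arithmetic may need to be rewritten with DATEADD:
WITH per_user AS (
    -- per reporting date, count each user's distinct transactions in the trailing 30 days
    SELECT d.date, o.user_id,
           COUNT(DISTINCT o.transaction_id) AS tcount
    FROM (SELECT DISTINCT date FROM orders) d
    JOIN orders o
      ON o.date BETWEEN d.date - INTERVAL '30 days' AND d.date
    GROUP BY d.date, o.user_id
)
SELECT date,
       COUNT(*) AS active_customers
FROM per_user
WHERE tcount > 2   -- "more than X"
GROUP BY date
ORDER BY date;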

Identify period intervals between two dates

I have a table called billing_cycle that holds customer-wise billing pay period information such as monthly, weekly, bi-weekly, quarterly, or yearly.
Table columns: Customer, Frequency, billing_start_date
Example:
Customer , Frequency, billing_start_date
001 , Monthly , 04-Feb-2021
002 , Weekly , 01-Mar-2021
003 , Bi-Weekly , 01-Mar-2021
My requirement is to identify (query) the billing periods, based on frequency type, for a customer within a given date range (From and To).
For example, given the date range 01-Feb-2021 to 30-Oct-2021,
the output for customer 001 (monthly frequency) is:
Pay_period_start , Pay_period_end
01-Feb-2021 , 28-Feb-2021
01-Mar-2021 , 31-Mar-2021
01-Apr-2021 , 30-Apr-2021 and so on till
01-Oct-2021 to 31-Oct-2021
Output for customer 002 (weekly interval 7 days):
Pay_period_start , Pay_period_end
01-Feb-2021 , 07-Feb-2021
08-Feb-2021 , 14-Feb-2021
15-Feb-2021 , 21-Feb-2021
22-Feb-2021 , 28-Feb-2021
01-Mar-2021 , 07-Mar-2021 and so on till
31-Oct-2021
and similarly for Customer 003 on a bi-weekly basis (15 days).
Get the average time between pay periods, then classify it with a CASE expression:
case when timediff = 7 then 'weekly' when timediff = 14 then 'biweekly' else 'monthly' end
You can fill in the other values for quarterly, yearly and so on.
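A rough sketch of that idea in PostgreSQL, assuming a hypothetical payments table (customer, pay_date) holding the individual billing dates; this table is not part of the question:
SELECT customer,
       pay_date,
       -- pay_date - LAG(pay_date) yields the gap in days for DATE columns;
       -- the first row per customer has no previous date and falls into the ELSE branch
       CASE pay_date - LAG(pay_date) OVER (PARTITION BY customer ORDER BY pay_date)
           WHEN 7  THEN 'weekly'
           WHEN 14 THEN 'biweekly'
           ELSE 'monthly'   -- fill in quarterly/yearly the same way
       END AS frequency_guess
FROM payments;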
Here's Oracle code; see if it helps.
Setting date format (just to see what is what; you don't have to do that):
SQL> alter session set nls_date_Format = 'dd.mm.yyyy';
Session altered.
Here we go (read comments within code):
SQL> with
  -- This is your sample table
  customer (customer, frequency, billing_start_date) as
    (select '001', 'Monthly'  , date '2021-04-02' from dual union all
     select '002', 'Weekly'   , date '2021-03-01' from dual union all
     select '003', 'Bi-Weekly', date '2021-03-01' from dual
    ),
  -- Date range, as you stated
  date_range (start_date, end_date) as
    (select date '2021-02-01', date '2021-10-30' from dual
    )
  -- Billing periods
  select
    c.customer,
    --
    case when c.frequency = 'Monthly'   then add_months(d.start_date, column_value - 1)
         when c.frequency = 'Weekly'    then d.start_date + ( 7 * (column_value - 1))
         when c.frequency = 'Bi-Weekly' then d.start_date + (14 * (column_value - 1))
    end as pay_period_start,
    --
    case when c.frequency = 'Monthly'   then add_months(d.start_date, column_value - 0) - 1
         when c.frequency = 'Weekly'    then d.start_date + ( 7 * (column_value - 0)) - 1
         when c.frequency = 'Bi-Weekly' then d.start_date + (14 * (column_value - 0)) - 1
    end as pay_period_end
  from customer c cross join date_range d
       cross join table(cast(multiset(select level
                                      from dual
                                      connect by level <= case when c.frequency = 'Monthly'   then months_between(d.end_date, d.start_date)
                                                               when c.frequency = 'Weekly'    then (d.end_date - d.start_date) / 7
                                                               when c.frequency = 'Bi-Weekly' then (d.end_date - d.start_date) / 14
                                                          end + 1
                                     ) as sys.odcinumberlist))
  order by c.customer, pay_period_start;
Result:
CUSTOMER PAY_PERIOD_START PAY_PERIOD_END
---------- -------------------- --------------------
001 01.02.2021 28.02.2021
001 01.03.2021 31.03.2021
001 01.04.2021 30.04.2021
001 01.05.2021 31.05.2021
001 01.06.2021 30.06.2021
001 01.07.2021 31.07.2021
001 01.08.2021 31.08.2021
001 01.09.2021 30.09.2021
001 01.10.2021 31.10.2021
002 01.02.2021 07.02.2021
002 08.02.2021 14.02.2021
002 15.02.2021 21.02.2021
002 22.02.2021 28.02.2021
002 01.03.2021 07.03.2021
002 08.03.2021 14.03.2021
002 15.03.2021 21.03.2021
002 22.03.2021 28.03.2021
002 29.03.2021 04.04.2021
002 05.04.2021 11.04.2021
002 12.04.2021 18.04.2021
002 19.04.2021 25.04.2021
002 26.04.2021 02.05.2021
002 03.05.2021 09.05.2021
002 10.05.2021 16.05.2021
002 17.05.2021 23.05.2021
002 24.05.2021 30.05.2021
002 31.05.2021 06.06.2021
002 07.06.2021 13.06.2021
002 14.06.2021 20.06.2021
002 21.06.2021 27.06.2021
002 28.06.2021 04.07.2021
002 05.07.2021 11.07.2021
002 12.07.2021 18.07.2021
002 19.07.2021 25.07.2021
002 26.07.2021 01.08.2021
002 02.08.2021 08.08.2021
002 09.08.2021 15.08.2021
002 16.08.2021 22.08.2021
002 23.08.2021 29.08.2021
002 30.08.2021 05.09.2021
002 06.09.2021 12.09.2021
002 13.09.2021 19.09.2021
002 20.09.2021 26.09.2021
002 27.09.2021 03.10.2021
002 04.10.2021 10.10.2021
002 11.10.2021 17.10.2021
002 18.10.2021 24.10.2021
002 25.10.2021 31.10.2021
003 01.02.2021 14.02.2021
003 15.02.2021 28.02.2021
003 01.03.2021 14.03.2021
003 15.03.2021 28.03.2021
003 29.03.2021 11.04.2021
003 12.04.2021 25.04.2021
003 26.04.2021 09.05.2021
003 10.05.2021 23.05.2021
003 24.05.2021 06.06.2021
003 07.06.2021 20.06.2021
003 21.06.2021 04.07.2021
003 05.07.2021 18.07.2021
003 19.07.2021 01.08.2021
003 02.08.2021 15.08.2021
003 16.08.2021 29.08.2021
003 30.08.2021 12.09.2021
003 13.09.2021 26.09.2021
003 27.09.2021 10.10.2021
003 11.10.2021 24.10.2021
003 25.10.2021 07.11.2021
68 rows selected.
SQL>
If you'd actually want the periods anchored to customer.billing_start_date, then all references to date_range.start_date should be changed to billing_start_date.
Here is a solution for Postgres.
There is a slight difference compared to your expected output: the pay_period_start is calculated to not start before billing_start_date:
select t.customer,
case
when g.nr = 1 then t.billing_start_date
else g.dt::date
end as pay_period_start,
case t.frequency
when 'Weekly' then (g.dt + interval '1 week' - interval '1 day')::date
when 'Bi-Weekly' then (g.dt + interval '2 week' - interval '1 day')::date
else (g.dt + interval '1 month' - interval '1 day')::date
end as pay_period_end
from the_table t
cross join generate_series(date_trunc('month', t.billing_start_date), date '2021-10-31',
case t.frequency
when 'Weekly' then interval '1 week'
when 'Bi-Weekly' then interval '2 week'
else interval '1 month'
end
) with ordinality as g(dt,nr)
order by t.customer, pay_period_start
If you indeed want pay_period_start to start on 2021-02-01 regardless of the actual billing_start_date, you need to change the start value for generate_series(); the CASE expression for pay_period_start can then also be simplified, as sketched below.
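A sketch of that variant (assuming the fixed range 2021-02-01 to 2021-10-31 and the same the_table as above):
select t.customer,
       g.dt::date as pay_period_start,
       case t.frequency
         when 'Weekly'    then (g.dt + interval '1 week'  - interval '1 day')::date
         when 'Bi-Weekly' then (g.dt + interval '2 week'  - interval '1 day')::date
         else                  (g.dt + interval '1 month' - interval '1 day')::date
       end as pay_period_end
from the_table t
cross join generate_series(date '2021-02-01', date '2021-10-31',
         case t.frequency
           when 'Weekly'    then interval '1 week'
           when 'Bi-Weekly' then interval '2 week'
           else interval '1 month'
         end
       ) as g(dt)
order by t.customer, pay_period_start;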
In Oracle, you can use:
WITH date_range (range_start, range_end) AS (
SELECT DATE '2021-02-01', DATE '2021-10-01' FROM DUAL
),
periods (customer, frequency, period_start, range_end) AS (
SELECT t.customer,
t.frequency,
CASE t.frequency
WHEN 'Monthly'
THEN ADD_MONTHS(
billing_start_date,
GREATEST(TRUNC(MONTHS_BETWEEN(range_start, billing_start_date)), 0)
)
WHEN 'Bi-Weekly'
THEN billing_start_date + 14 * GREATEST(TRUNC((range_start - billing_start_date)/14), 0)
WHEN 'Weekly'
THEN billing_start_date + 7 * GREATEST(TRUNC((range_start - billing_start_date)/7), 0)
END,
d.range_end
FROM table_name t
CROSS JOIN date_range d
WHERE t.billing_start_date <= d.range_end
UNION ALL
SELECT customer,
frequency,
CASE frequency
WHEN 'Monthly' THEN ADD_MONTHS(period_start, 1)
WHEN 'Bi-Weekly' THEN period_start + 14
WHEN 'Weekly' THEN period_start + 7
END,
range_end
FROM periods
WHERE period_start < range_end
)
SEARCH DEPTH FIRST BY customer SET order_rn
SELECT customer,
frequency,
period_start,
CASE frequency
WHEN 'Monthly' THEN ADD_MONTHS(period_start, 1)
WHEN 'Bi-Weekly' THEN period_start + 14
WHEN 'Weekly' THEN period_start + 7
END - 1 AS period_end
FROM periods
WHERE period_start <= range_end;
Which, for the sample data:
CREATE TABLE table_name (Customer , Frequency, billing_start_date) AS
SELECT '001', 'Monthly', DATE '2021-02-04' FROM DUAL UNION ALL
SELECT '002', 'Weekly', DATE '2021-03-01' FROM DUAL UNION ALL
SELECT '003', 'Bi-Weekly', DATE '2020-03-05' FROM DUAL;
Outputs:
CUSTOMER | FREQUENCY | PERIOD_START        | PERIOD_END
---------+-----------+---------------------+--------------------
001      | Monthly   | 2021-02-04 00:00:00 | 2021-03-03 00:00:00
001      | Monthly   | 2021-03-04 00:00:00 | 2021-04-03 00:00:00
001      | Monthly   | 2021-04-04 00:00:00 | 2021-05-03 00:00:00
001      | Monthly   | 2021-05-04 00:00:00 | 2021-06-03 00:00:00
001      | Monthly   | 2021-06-04 00:00:00 | 2021-07-03 00:00:00
001      | Monthly   | 2021-07-04 00:00:00 | 2021-08-03 00:00:00
001      | Monthly   | 2021-08-04 00:00:00 | 2021-09-03 00:00:00
001      | Monthly   | 2021-09-04 00:00:00 | 2021-10-03 00:00:00
002      | Weekly    | 2021-03-01 00:00:00 | 2021-03-07 00:00:00
002      | Weekly    | 2021-03-08 00:00:00 | 2021-03-14 00:00:00
002      | Weekly    | 2021-03-15 00:00:00 | 2021-03-21 00:00:00
...      | ...       | ...                 | ...
002      | Weekly    | 2021-09-13 00:00:00 | 2021-09-19 00:00:00
002      | Weekly    | 2021-09-20 00:00:00 | 2021-09-26 00:00:00
002      | Weekly    | 2021-09-27 00:00:00 | 2021-10-03 00:00:00
003      | Bi-Weekly | 2021-01-21 00:00:00 | 2021-02-03 00:00:00
003      | Bi-Weekly | 2021-02-04 00:00:00 | 2021-02-17 00:00:00
003      | Bi-Weekly | 2021-02-18 00:00:00 | 2021-03-03 00:00:00
...      | ...       | ...                 | ...
003      | Bi-Weekly | 2021-09-02 00:00:00 | 2021-09-15 00:00:00
003      | Bi-Weekly | 2021-09-16 00:00:00 | 2021-09-29 00:00:00
003      | Bi-Weekly | 2021-09-30 00:00:00 | 2021-10-13 00:00:00
db<>fiddle here

Postgresql split date range by year parts (financial year)

I have a table like follows:
id start_date end_date
1 2020-01-01 2020-05-01
2 2020-03-01 2021-04-02
I need to be able to split the rows by financial year (e.g. 2020-04-01 -> 2021-03-31).
So the result of the query would be as follows:
id start_date end_date
1 2020-01-01 2020-03-31
1 2020-04-01 2020-05-01
2 2020-03-01 2020-03-31
2 2020-04-01 2021-03-31
2 2021-04-01 2021-04-02
Actually another post helped me resolve this: Date split-up based on Fiscal Year
DROP TABLE your_table;
CREATE TABLE your_table (id int, start_date date, end_date date);
INSERT INTO your_table VALUES (1, '2020-01-01', '2020-05-01');
INSERT INTO your_table VALUES (2, '2020-03-01', '2021-04-02');
SELECT
id,
GREATEST(start_date, ('01-04-'||series.year)::date) AS year_start,
LEAST(end_date, ('31-03-'||series.year + 1)::date) AS year_end
FROM
(SELECT
id,
start_date,
end_date,
generate_series(
date_part('year', your_table.start_date - INTERVAL '3 months')::int,
date_part('year', your_table.end_date - INTERVAL '3 months')::int)
FROM your_table) AS series(id, start_date, end_date, year)
ORDER BY
start_date;
Result:
"id","year_start","year_end"
1,"2020-01-01","2020-03-31"
1,"2020-04-01","2020-05-01"
2,"2020-03-01","2020-03-31"
2,"2020-04-01","2021-03-31"
2,"2021-04-01","2021-04-02"

Select a row when data is missing

I've got a question.
I use this straightforward query to retrieve data on a daily basis. Part of the data is an ID.
For example, I have IDs 001, 002, 003 and 004. Every ID has some columns with data.
I generate a daily report based on that data.
A typical day looks like:
ID Date Value
001 2013-07-02 900
002 2013-07-02 800
003 2013-07-02 750
004 2013-07-02 950
Select *
FROM
myTable
WHERE datum > now() - INTERVAL '2 days' and ht not in (select ht from blocked_ht)
order by ht, id;
Some times the import for 1 id fails. So my data looks like
ID Date Value
001 2013-07-02 900
003 2013-07-02 750
004 2013-07-02 950
It's vital to know that one ID is missing, and to visualize that in my report (made in JasperReports).
So I insert an ID without a date and with value 0, and edited the query:
SELECT *
FROM
"lptv_import" lptv_import
WHERE datum > now() - INTERVAL '2 days' and ht not in (select ht from negeren_ht) OR datum IS NULL
order by ht, id;
Now the data looks like this:
001 2013-07-02 900
002 800
003 2013-07-02 750
004 2013-07-02 950
How can I select from the table the row without the date when ID 002 with a date is missing?
Hmm, this looks more complicated than I thought...
select
id, coalesce(datum::text, 'NULL') as "date", "value"
from
(
select distinct id
from lptv
) id
left join
lptv using (id)
where
datum > now() - INTERVAL '2 days'
and not exists (select ht from negeren_ht where ht = lptv.ht)
order by id
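Another way to make the missing ID visible, instead of inserting placeholder rows, is to build the expected id/date combinations first and left join the imported rows onto them. A sketch, using the table and column names from the question (lptv_import with id, datum, value):
SELECT i.id,
       d.datum,
       t.value
FROM (SELECT DISTINCT id FROM lptv_import) i
CROSS JOIN (SELECT DISTINCT datum FROM lptv_import
            WHERE datum > now() - INTERVAL '2 days') d
LEFT JOIN lptv_import t
       ON t.id = i.id AND t.datum = d.datum
ORDER BY d.datum, i.id;
-- an id whose import failed on a given day shows up with value = NULL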