PostgreSQL: order by created date and group by transaction id

My table looks like this:
id     | transaction_id                           | status    | created_at          | updated_at
-------+------------------------------------------+-----------+---------------------+--------------------
uuid-1 | a293b1fe0369e0198df3293a8aef9c97ea532b30 | completed | 2022-08-25 02:32:44 | 2022-08-25 02:32:44
uuid-2 | a293b1fe0369e0198df3293a8aef9c97ea532b24 | failed    | 2022-08-24 12:33:22 | 2022-08-24 12:33:22
uuid-3 | 3b97c805fc7ce00119433c5284102b47781f9f66 | pending   | 2022-08-24 12:30:22 | 2022-08-24 12:33:22
uuid-4 | a293b1fe0369e0198df3293a8aef9c97ea532b30 | failed    | 2022-08-23 9:32:14  | 2022-08-23 9:32:14
uuid-5 | a293b1fe0369e0198df3293a8aef9c97ea532b30 | failed    | 2022-08-05 9:22:34  | 2022-08-05 9:22:34
uuid-6 | a293b1fe0369e0198df3293a8aef9c97ea532b24 | pending   | 2022-08-04 03:33:12 | 2022-08-04 03:33:12
uuid-7 | a293b1fe0369e0198df3293a8aef9c97ea532b30 | failed    | 2022-08-01 4:04:25  | 2022-08-01 4:04:25
uuid-8 | a293b1fe0369e0198df3293a8aef9c97ea532b30 | pending   | 2022-07-20 7:43:22  | 2022-07-20 7:43:22
I am trying to get the results in this order:
- the latest submitted transaction on top;
- then, within each transaction, the actions in the order they were created;
- the action order being pending, failed, completed.
transaction_id                           | status    | created_at          | updated_at
-----------------------------------------+-----------+---------------------+--------------------
3b97c805fc7ce00119433c5284102b47781f9f66 | pending   | 2022-08-24 12:30:22 | 2022-08-24 12:33:22
a293b1fe0369e0198df3293a8aef9c97ea532b24 | pending   | 2022-08-04 03:33:12 | 2022-08-04 03:33:12
a293b1fe0369e0198df3293a8aef9c97ea532b24 | failed    | 2022-08-24 12:33:22 | 2022-08-24 12:33:22
a293b1fe0369e0198df3293a8aef9c97ea532b30 | pending   | 2022-07-20 7:43:22  | 2022-07-20 7:43:22
a293b1fe0369e0198df3293a8aef9c97ea532b30 | failed    | 2022-08-01 4:04:25  | 2022-08-01 4:04:25
a293b1fe0369e0198df3293a8aef9c97ea532b30 | failed    | 2022-08-05 9:22:34  | 2022-08-05 9:22:34
a293b1fe0369e0198df3293a8aef9c97ea532b30 | failed    | 2022-08-23 9:32:14  | 2022-08-23 9:32:14
a293b1fe0369e0198df3293a8aef9c97ea532b30 | completed | 2022-08-25 02:32:44 | 2022-08-25 02:32:44
I tried to group and order with the RANK function as below, but I have no idea how to sort the groups by their first created date:
select
transaction_id,
created_at,
status,
RANK() over (partition by transaction_id order by created_at) group_rank
from
transactions;

I would use ROW_NUMBER() and MIN() here:
WITH cte AS (
    SELECT *,
           MIN(created_at) OVER (PARTITION BY transaction_id) AS first_created_at,
           ROW_NUMBER() OVER (PARTITION BY transaction_id
                              ORDER BY CASE status WHEN 'pending'   THEN 1
                                                   WHEN 'failed'    THEN 2
                                                   WHEN 'completed' THEN 3 END,
                                       created_at) AS rn
    FROM transactions
)
SELECT transaction_id, status, created_at, updated_at
FROM cte
ORDER BY first_created_at DESC, transaction_id, rn;
The ordering logic is that we first order each block of records belonging to the same transaction by that transaction's first (earliest) created timestamp, newest first; transaction_id keeps two blocks with the same first timestamp from interleaving. Next, within each block we order by status. Finally, for two or more records with the same status, we break the tie using the created timestamp.
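To check the ordering logic without a PostgreSQL server, here is a minimal sketch using Python's built-in sqlite3 module (it assumes an SQLite build with window functions, i.e. 3.25 or newer). It orders the transaction blocks by each transaction's earliest created_at (MIN), newest first, which is the ordering that reproduces the expected output in the question. The hours are zero-padded so that SQLite's text comparison sorts the timestamps chronologically, updated_at is left out, and the names `ROWS`, `QUERY` and `ordered_actions` are illustrative.

```python
import sqlite3

# Sample rows from the question (updated_at omitted). Hours are zero-padded
# so that SQLite's text comparison sorts the timestamps chronologically.
ROWS = [
    ("uuid-1", "a293b1fe0369e0198df3293a8aef9c97ea532b30", "completed", "2022-08-25 02:32:44"),
    ("uuid-2", "a293b1fe0369e0198df3293a8aef9c97ea532b24", "failed", "2022-08-24 12:33:22"),
    ("uuid-3", "3b97c805fc7ce00119433c5284102b47781f9f66", "pending", "2022-08-24 12:30:22"),
    ("uuid-4", "a293b1fe0369e0198df3293a8aef9c97ea532b30", "failed", "2022-08-23 09:32:14"),
    ("uuid-5", "a293b1fe0369e0198df3293a8aef9c97ea532b30", "failed", "2022-08-05 09:22:34"),
    ("uuid-6", "a293b1fe0369e0198df3293a8aef9c97ea532b24", "pending", "2022-08-04 03:33:12"),
    ("uuid-7", "a293b1fe0369e0198df3293a8aef9c97ea532b30", "failed", "2022-08-01 04:04:25"),
    ("uuid-8", "a293b1fe0369e0198df3293a8aef9c97ea532b30", "pending", "2022-07-20 07:43:22"),
]

QUERY = """
WITH cte AS (
    SELECT *,
           -- earliest action of each transaction = when it was submitted
           MIN(created_at) OVER (PARTITION BY transaction_id) AS first_created_at,
           -- position of each action inside its transaction block
           ROW_NUMBER() OVER (
               PARTITION BY transaction_id
               ORDER BY CASE status WHEN 'pending'   THEN 1
                                    WHEN 'failed'    THEN 2
                                    WHEN 'completed' THEN 3 END,
                        created_at
           ) AS rn
    FROM transactions
)
SELECT transaction_id, status, created_at
FROM cte
ORDER BY first_created_at DESC, transaction_id, rn
"""


def ordered_actions(rows):
    """Load `rows` into an in-memory table and return them in report order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions (id TEXT, transaction_id TEXT, status TEXT, created_at TEXT)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in ordered_actions(ROWS):
        print(row)
```

Swapping MIN for MAX in this sketch puts the a293...b30 block first instead, because that transaction has the most recent individual action.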


How can I get weekly sales for every salesman?

I have a table like the one below (table name: sales):
sales_datetime      | sales | salesman
--------------------+-------+---------
2022-08-01 09:00:00 | 100   | John
2022-08-01 11:00:00 | 200   | John
2022-08-02 10:00:00 | 100   | Peter
2022-08-02 13:00:00 | 300   | John
2022-08-04 14:00:00 | 300   | Peter
2022-08-05 12:00:00 | 100   | John
2022-08-05 16:00:00 | 200   | John
From that table I want to build a summary of sales over a 5-day period for each salesman. The summary table I want looks like this:
periode    | total_sales | salesman
-----------+-------------+---------
2022-08-01 | 300         | John
2022-08-01 | 0           | Peter
2022-08-02 | 300         | John
2022-08-02 | 100         | Peter
2022-08-03 | 0           | John
2022-08-03 | 0           | Peter
2022-08-04 | 0           | John
2022-08-04 | 300         | Peter
2022-08-05 | 300         | John
2022-08-05 | 0           | Peter
I have written the following query (PostgreSQL), but the results are not what I want. Assume today is 2022-08-05:
with dateseries as
(select generate_series(current_date-'4 days'::interval,
current_date::date,
'1 day'::interval)::date as periode)
select d.periode,coalesce(sum(s.sales),0) as total_sales,s.salesman from dateseries d
left outer join sales s
on d.periode=s.sales_datetime::date
group by d.periode, s.salesman order by d.periode
Results:
periode    | total_sales | salesman
-----------+-------------+---------
2022-08-01 | 300         | John
2022-08-02 | 300         | John
2022-08-02 | 100         | Peter
2022-08-03 | 0           | (NULL)
2022-08-04 | 300         | Peter
2022-08-05 | 300         | John
Any advice would be great. Thank you!
Step by step: first aggregate the daily sales per salesman (aggregated_sales CTE), create a list of days to report on (days CTE) and a list of salesmen (salesmen CTE), and then query the sales for each day/salesman pair.
with aggregated_sales as
(
select sales_datetime::date sales_date, sum(sales) sales, salesman
from sales group by sales_datetime::date, salesman
),
days(sales_date) as
(
select d::date
from generate_series('2022-08-01', '2022-08-08', interval '1 day') d
),
salesmen (salesman) as
(
select distinct salesman from sales
)
select sales_date, coalesce(sales, 0) sales, salesman
from (select * from days cross join salesmen) fl
left outer join aggregated_sales ags using (sales_date, salesman)
order by sales_date, salesman;
The query could be shorter if the CTEs were inlined, but I think clarity and readability matter more than brevity.
To get the 5-day summary per salesman ending today, replace generate_series('2022-08-01', '2022-08-08', interval '1 day') with generate_series(current_date - 4, current_date, interval '1 day').
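The same days × salesmen grid can be tried out without a PostgreSQL server. This is a sketch assuming Python's built-in sqlite3 module; SQLite has no built-in generate_series, so a recursive CTE generates the five days ending on a report date, which is passed as the hypothetical parameter `today` instead of current_date so that the output is reproducible. The names `SALES`, `QUERY` and `daily_sales` are illustrative.

```python
import sqlite3

# Sample rows from the question: (sales_datetime, sales, salesman)
SALES = [
    ("2022-08-01 09:00:00", 100, "John"),
    ("2022-08-01 11:00:00", 200, "John"),
    ("2022-08-02 10:00:00", 100, "Peter"),
    ("2022-08-02 13:00:00", 300, "John"),
    ("2022-08-04 14:00:00", 300, "Peter"),
    ("2022-08-05 12:00:00", 100, "John"),
    ("2022-08-05 16:00:00", 200, "John"),
]

QUERY = """
WITH RECURSIVE days(sales_date) AS (
    -- stands in for generate_series: the 5 days ending on :today, inclusive
    SELECT date(:today, '-4 days')
    UNION ALL
    SELECT date(sales_date, '+1 day') FROM days WHERE sales_date < :today
),
salesmen AS (
    SELECT DISTINCT salesman FROM sales
),
aggregated_sales AS (
    SELECT date(sales_datetime) AS sales_date, SUM(sales) AS sales, salesman
    FROM sales
    GROUP BY date(sales_datetime), salesman
)
SELECT d.sales_date, COALESCE(a.sales, 0) AS total_sales, s.salesman
FROM days d
CROSS JOIN salesmen s                -- every day/salesman pair
LEFT JOIN aggregated_sales a         -- missing pairs become NULL, then 0
       ON a.sales_date = d.sales_date AND a.salesman = s.salesman
ORDER BY d.sales_date, s.salesman
"""


def daily_sales(rows, today):
    """Return (day, total_sales, salesman) for every day/salesman pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (sales_datetime TEXT, sales INTEGER, salesman TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY, {"today": today}).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in daily_sales(SALES, "2022-08-05"):
        print(row)
```

The cross join is what produces the zero rows: a salesman with no sales on a day still gets a row, and COALESCE turns the missing sum into 0.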
Regarding "Assume today is 2022-08-05":
Please note that '2022-08-05'::date - '5 days'::interval gives 2022-07-31, not 2022-08-01. A 5-day window that ends today therefore has to start at current_date - '4 days'::interval, as in your query.
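The off-by-one is easy to check with Python's datetime module, used here only as a calculator: subtracting 5 days steps back to July 31, while an inclusive window of n days ending on 2022-08-05 starts n - 1 days earlier. The variable names are illustrative.

```python
from datetime import date, timedelta

today = date(2022, 8, 5)

# Subtracting 5 days lands one day before the intended window.
five_back = today - timedelta(days=5)

# An inclusive window of n days ending today starts n - 1 days back.
n = 5
window_start = today - timedelta(days=n - 1)
window = [window_start + timedelta(days=i) for i in range(n)]

print(five_back)     # 2022-07-31
print(window_start)  # 2022-08-01
print(window[-1])    # 2022-08-05
```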
With that out of the way, here is one possible query:
with sales_by_date as (
select
salesman,
sales_datetime::date,
sum(sales) total_sales
from sales
where
-- assuming you need to have totals for salesmen that had sales in specified period only
sales_datetime::date between current_date-'4 days'::interval and current_date
group by
salesman,
sales_datetime::date),
dateseries as (
select
distinct salesman,
generate_series(current_date-'4 days'::interval, current_date, '1 day'::interval)::date as periode
from sales_by_date)
select
d.periode,
coalesce(s.total_sales, 0) total_sales,
d.salesman
from dateseries d
left join sales_by_date s
on d.periode = s.sales_datetime
and d.salesman = s.salesman
order by d.periode, d.salesman;
You still have to pin down some requirements for this problem, though. For example, what should happen if there are no sales at all in the sales table for the specified period?

T-SQL, counting consecutive days of holiday

I hope someone can help me with this one. :-)
I want to count consecutive periods of holiday, to see whether anyone has had more than three days of holiday in a row. In other words, it is not enough to count the total number of days; the days have to be consecutive. In the example data below I have three people, each with their own holiday days. Person 1234 has two periods of two consecutive days, separated by a day without holiday (the 3rd), so this person has no period above three days. Persons 1235 and 1236 each have one period above three days. The time of day in the timestamps is irrelevant, so the data can be treated as plain dates.
What I have:
ID   | Start
-----+--------------------
1234 | 2022-01-01 00:00:00
1234 | 2022-01-02 00:00:00
1234 | 2022-01-04 06:50:00
1234 | 2022-01-05 06:50:00
1235 | 2022-01-04 06:50:00
1235 | 2022-01-05 06:50:00
1235 | 2022-01-06 00:00:00
1236 | 2022-01-01 00:00:00
1236 | 2022-01-02 00:00:00
1236 | 2022-01-03 06:50:00
1236 | 2022-01-04 06:50:00
1236 | 2022-01-05 06:50:00
1236 | 2022-01-08 00:00:00
What I hope to get:
ID   | N holidays > 3 days
-----+--------------------
1234 | 0
1235 | 1
1236 | 1
Anyway, any help will be appreciated!
Kind regards,
Jacob
This is a "gaps and islands" problem. You first need to group the data into "islands", which in your case are runs of consecutive holiday dates, and then summarize them in your final result set.
Side note: your question asks for more than 3 days, but your expected output implies 3 or more, so I used that instead.
DROP TABLE IF EXISTS #Holiday;
DROP TABLE IF EXISTS #ConsecutiveHoliday
CREATE TABLE #Holiday (ID INT,StartDateTime DATETIME)
INSERT INTO #Holiday
VALUES (1234,'2022-01-01 00:00:00')
,(1234,'2022-01-02 00:00:00')
,(1234,'2022-01-04 06:50:00')
,(1234,'2022-01-05 06:50:00')
,(1235,'2022-01-04 06:50:00')
,(1235,'2022-01-05 06:50:00')
,(1235,'2022-01-06 00:00:00')
,(1236,'2022-01-01 00:00:00')
,(1236,'2022-01-02 00:00:00')
,(1236,'2022-01-03 06:50:00')
,(1236,'2022-01-04 06:50:00')
,(1236,'2022-01-05 06:50:00')
,(1236,'2022-01-08 00:00:00');
WITH cte_Previous AS (
SELECT A.ID,B.StartDate
,IsHolidayConsecutive = CASE WHEN DATEADD(day,-1,StartDate) /*Current day minus 1*/ = LAG(StartDate) OVER (PARTITION BY ID ORDER BY StartDate) /*Previous holiday date*/
THEN 0
ELSE 1
END
FROM #Holiday AS A
CROSS APPLY (SELECT StartDate = CAST(StartDateTime AS DATE)) AS B
),
cte_Groups AS (
SELECT *,GroupID = SUM(IsHolidayConsecutive) OVER (PARTITION BY ID ORDER BY StartDate)
FROM cte_Previous
)
/*Groups of holidays taken consecutively*/
SELECT ID
,StartDate = MIN(StartDate)
,EndDate = MAX(StartDate)
,NumOfDays = COUNT(*)
INTO #ConsecutiveHoliday
FROM cte_Groups
GROUP BY ID,GroupID
ORDER BY ID,StartDate
/*See list of consecutive holidays taken*/
SELECT *
FROM #ConsecutiveHoliday
/*Formatted result*/
SELECT ID
,[N holidays >= 3 days] = COUNT(CASE WHEN NumOfDays >= 3 THEN 1 END)
FROM #ConsecutiveHoliday
GROUP BY ID
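For experimenting with the island logic outside SQL Server, here is a rough port of the same steps to SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or newer). DATEADD/CAST become SQLite's date() function and the temp tables become CTEs; the names `HOLIDAYS`, `QUERY`, `islands_per_person` and the `min_days` parameter are illustrative. Like the answer, the default counts islands of 3 or more days; min_days=4 gives the strict "more than 3 days" reading.

```python
import sqlite3

# (ID, StartDateTime) rows from the question
HOLIDAYS = [
    (1234, "2022-01-01 00:00:00"), (1234, "2022-01-02 00:00:00"),
    (1234, "2022-01-04 06:50:00"), (1234, "2022-01-05 06:50:00"),
    (1235, "2022-01-04 06:50:00"), (1235, "2022-01-05 06:50:00"),
    (1235, "2022-01-06 00:00:00"),
    (1236, "2022-01-01 00:00:00"), (1236, "2022-01-02 00:00:00"),
    (1236, "2022-01-03 06:50:00"), (1236, "2022-01-04 06:50:00"),
    (1236, "2022-01-05 06:50:00"), (1236, "2022-01-08 00:00:00"),
]

QUERY = """
WITH flagged AS (
    SELECT id,
           date(start_datetime) AS start_date,
           -- 0 if this person's previous holiday was the day before, else 1
           CASE WHEN date(start_datetime, '-1 day')
                     = LAG(date(start_datetime)) OVER (PARTITION BY id ORDER BY start_datetime)
                THEN 0 ELSE 1 END AS starts_island
    FROM holiday
),
numbered AS (
    -- running total of island starts = island number within each person
    SELECT *, SUM(starts_island) OVER (PARTITION BY id ORDER BY start_date) AS group_id
    FROM flagged
),
islands AS (
    SELECT id, group_id, COUNT(*) AS num_days
    FROM numbered
    GROUP BY id, group_id
)
SELECT id, SUM(CASE WHEN num_days >= :min_days THEN 1 ELSE 0 END) AS n_long
FROM islands
GROUP BY id
ORDER BY id
"""


def islands_per_person(rows, min_days=3):
    """Return (id, number of consecutive-day islands with at least min_days days)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE holiday (id INTEGER, start_datetime TEXT)")
    conn.executemany("INSERT INTO holiday VALUES (?, ?)", rows)
    result = conn.execute(QUERY, {"min_days": min_days}).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(islands_per_person(HOLIDAYS))
```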

Find Accounts with X Number of Transactions within Y Days of Each Other in a Larger Date Range

I am trying to write a SQL statement that finds the accounts that, over the course of a week, have had 3 or more transactions within 3 days of each other whose absolute value is greater than $10.00, and then returns those transactions.
Consider this data...
TransactionID AccountNumber TransactionDate TransactionAmount
------------- ------------- --------------- -----------------
1 0123 2020-09-01 45.75
2 0123 2020-09-02 5.23
3 0123 2020-09-03 9.94
4 0123 2020-09-05 8.35
5 0123 2020-09-06 -16.23
6 0123 2020-09-07 14.71
7 0123 2020-09-08 15.03
8 0123 2020-09-08 23.10
9 0123 2020-09-09 94.20
10 0123 2020-09-09 5.01
11 0123 2020-09-10 3.02
12 0123 2020-09-11 4.37
13 0123 2020-09-12 4.54
14 9876 2020-09-01 -45.75
15 9876 2020-09-02 5.27
16 9876 2020-09-05 19.79
17 9876 2020-09-05 -11.64
18 9876 2020-09-06 12.42
If the week under review is 2020-09-01 to 2020-09-07 I would expect only AccountNumber 9876 to fit the criteria with TransactionIDs 16, 17, and 18 being the 3 transactions within 3 days with an absolute value greater than $10.00.
It seems like I should be able to use window functions (and perhaps framing), but I can't figure out how to start.
I have attempted this without window functions, based on the answers to this question:
multiple transactions within a certain time period, limited by date range
DECLARE
    @BeginDate DATE
  , @EndDate DATE
  , @ThresholdAmount DECIMAL(10, 2)
  , @ThresholdCount INT
  , @NumberOfDays INT;
SET @BeginDate = '09/01/2020';
SET @EndDate = '09/07/2020';
SET @ThresholdAmount = 10.00;
SET @ThresholdCount = 3;
SET @NumberOfDays = 3;
SELECT t.*
FROM (
    SELECT
        t1.*
      , (
            SELECT COUNT(*)
            FROM Transactions t2
            WHERE t2.AccountNumber = t1.AccountNumber
              AND t2.TransactionID <> t1.TransactionID
              AND t2.TransactionDate >= t1.TransactionDate
              AND t2.TransactionDate < DATEADD(DAY, @NumberOfDays, t1.TransactionDate)
              AND ABS(t2.TransactionAmount) > @ThresholdAmount
        ) AS NumberWithinXDays
    FROM Transactions t1
    WHERE t1.TransactionDate BETWEEN @BeginDate AND @EndDate
      AND ABS(t1.TransactionAmount) > @ThresholdAmount
) t
WHERE t.NumberWithinXDays >= @ThresholdCount;
SELECT *
FROM Transactions t
WHERE EXISTS (
    SELECT *
    FROM (
        SELECT t1.AccountNumber
        FROM Transactions t1
        INNER JOIN Transactions t2 ON t1.AccountNumber = t2.AccountNumber
            AND t1.TransactionID <> t2.TransactionID
            AND DATEDIFF(DAY, t1.TransactionDate, t2.TransactionDate) BETWEEN 0 AND (@NumberOfDays-1)
        WHERE t1.TransactionDate BETWEEN @BeginDate AND @EndDate
          AND t2.TransactionDate BETWEEN @BeginDate AND @EndDate
          AND ABS(t1.TransactionAmount) > @ThresholdAmount
          AND ABS(t2.TransactionAmount) > @ThresholdAmount
        GROUP BY t1.AccountNumber
        HAVING COUNT(t1.TransactionID) >= @ThresholdCount
    ) x
    WHERE x.AccountNumber = t.AccountNumber
)
AND t.TransactionDate BETWEEN @BeginDate AND @EndDate
AND ABS(t.TransactionAmount) > @ThresholdAmount
My first query comes back with...
TransactionID AccountNumber TransactionDate TransactionAmount NumberWithinXDays
------------- ------------- --------------- ----------------- -----------------
5 0123 2020-09-06 -16.23 3
6 0123 2020-09-07 14.71 3
Not even close. And the second query returns...
TransactionID AccountNumber TransactionDate TransactionAmount
------------- ------------- --------------- -----------------
14 9876 2020-09-01 -45.75
16 9876 2020-09-05 19.79
17 9876 2020-09-05 -11.64
18 9876 2020-09-06 12.42
Closer, but not restricted to transactions within 3 days of each other. This is the result I want:
TransactionID AccountNumber TransactionDate TransactionAmount
------------- ------------- --------------- -----------------
16 9876 2020-09-05 19.79
17 9876 2020-09-05 -11.64
18 9876 2020-09-06 12.42
Now it is certainly possible I have not implemented these suggested queries correctly. Or maybe there is some subtle difference I am missing and they just don't fit my situation.
Any suggestions on fixing either of my attempted queries or something completely different with or without window functions?
Here is a full dbfiddle of my code.
I was not able to come up with a solution using window functions. On further thought I suspected a CTE might work, but I could not figure that out either.
I solved it using a couple of subqueries. I was concerned about performance, given that my transaction table has 86 million rows; however, it runs in under 30 seconds, which is good enough for me.
-- DISTINCT is needed because a particular transaction may fit into more than
-- one transaction window, but we only want to see it once in the results
SELECT DISTINCT
    t.TransactionID
  , t.AccountNumber
  , t.TransactionDate
  , t.TransactionAmount
FROM (
    SELECT
        t1.AccountNumber
      , t1.TransactionDateWindowBegin
      , t1.TransactionDateWindowEnd
      , COUNT(DISTINCT t2.TransactionID) AS Count
    FROM (
        -- establish the transaction window for each transaction within the
        -- larger date range and an absolute value above the threshold
        SELECT
            TransactionID
          , AccountNumber
          , TransactionDate AS [TransactionDateWindowBegin]
          , DATEADD(DAY, @NumberOfDays - 1, TransactionDate) AS [TransactionDateWindowEnd]
          , TransactionAmount
        FROM Transactions
        WHERE TransactionDate BETWEEN @BeginDate AND @EndDate
          AND ABS(TransactionAmount) > @ThresholdAmount
    ) t1
    -- join back to the transaction table to find transactions within the transaction window for
    -- each transaction, count them, and only keep windows that reach the threshold count
    INNER JOIN Transactions t2 ON t1.AccountNumber = t2.AccountNumber
        AND t1.TransactionDateWindowBegin <= t2.TransactionDate
        AND t1.TransactionDateWindowEnd >= t2.TransactionDate
    WHERE t2.TransactionDate BETWEEN @BeginDate AND @EndDate
      AND ABS(t2.TransactionAmount) > @ThresholdAmount
    GROUP BY t1.AccountNumber
           , t1.TransactionDateWindowBegin
           , t1.TransactionDateWindowEnd
    HAVING COUNT(DISTINCT t2.TransactionID) >= @ThresholdCount
) x
-- join back to the transaction table again to get the details for the
-- transactions that meet the threshold amount and count criteria
INNER JOIN Transactions t ON x.AccountNumber = t.AccountNumber
    AND x.TransactionDateWindowBegin <= t.TransactionDate
    AND x.TransactionDateWindowEnd >= t.TransactionDate
    -- stay inside the reviewed date range, as the window counts above do
    AND t.TransactionDate BETWEEN @BeginDate AND @EndDate
    AND ABS(t.TransactionAmount) > @ThresholdAmount;
Here is the full demo.
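The windowing rule in this answer can also be cross-checked in plain Python. This is a sketch for small samples, not a replacement for the SQL (it holds everything in memory): each distinct date of a qualifying transaction opens a window of `number_of_days` calendar days, and every qualifying transaction inside a window holding at least `threshold_count` of them is flagged. Only transactions inside [begin, end] are considered, both for counting and for the output. The function `flag_bursts` and its parameter names are hypothetical, chosen to mirror the query's variables.

```python
import bisect
from collections import defaultdict
from datetime import date, timedelta

# (TransactionID, AccountNumber, TransactionDate, TransactionAmount) from the question
TRANSACTIONS = [
    (1, "0123", date(2020, 9, 1), 45.75), (2, "0123", date(2020, 9, 2), 5.23),
    (3, "0123", date(2020, 9, 3), 9.94), (4, "0123", date(2020, 9, 5), 8.35),
    (5, "0123", date(2020, 9, 6), -16.23), (6, "0123", date(2020, 9, 7), 14.71),
    (7, "0123", date(2020, 9, 8), 15.03), (8, "0123", date(2020, 9, 8), 23.10),
    (9, "0123", date(2020, 9, 9), 94.20), (10, "0123", date(2020, 9, 9), 5.01),
    (11, "0123", date(2020, 9, 10), 3.02), (12, "0123", date(2020, 9, 11), 4.37),
    (13, "0123", date(2020, 9, 12), 4.54), (14, "9876", date(2020, 9, 1), -45.75),
    (15, "9876", date(2020, 9, 2), 5.27), (16, "9876", date(2020, 9, 5), 19.79),
    (17, "9876", date(2020, 9, 5), -11.64), (18, "9876", date(2020, 9, 6), 12.42),
]


def flag_bursts(transactions, begin, end, threshold_amount=10.00,
                threshold_count=3, number_of_days=3):
    """Return sorted IDs of qualifying transactions that sit in a busy window."""
    # Keep only transactions inside the date range and above the amount threshold.
    by_account = defaultdict(list)
    for tid, account, day, amount in transactions:
        if begin <= day <= end and abs(amount) > threshold_amount:
            by_account[account].append((day, tid))

    flagged = set()
    for rows in by_account.values():
        rows.sort()
        days = [day for day, _ in rows]
        # Each distinct qualifying date opens the window [start, start + number_of_days - 1].
        for start in sorted(set(days)):
            lo = bisect.bisect_left(days, start)
            hi = bisect.bisect_right(days, start + timedelta(days=number_of_days - 1))
            if hi - lo >= threshold_count:
                flagged.update(tid for _, tid in rows[lo:hi])
    return sorted(flagged)


if __name__ == "__main__":
    print(flag_bursts(TRANSACTIONS, date(2020, 9, 1), date(2020, 9, 7)))
```

Because the windows are anchored on the transactions' own dates, one busy stretch can be covered by several overlapping windows; collecting the IDs in a set plays the same role as DISTINCT in the SQL.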

Postgresql order by and limit not returning expected row

SELECT id, created_at FROM "location_reviews" ORDER BY created_at DESC;
id | created_at
-----+---------------------
251 | 2015-12-20 00:00:00
426 | 2015-12-20 00:00:00
357 | 2015-12-20 00:00:00
SELECT id, created_at FROM "location_reviews" ORDER BY created_at DESC LIMIT 1;
id | created_at
-----+---------------------
251 | 2015-12-20 00:00:00
SELECT id, created_at FROM "location_reviews" ORDER BY created_at DESC LIMIT 1 OFFSET 1;
id | created_at
-----+---------------------
251 | 2015-12-20 00:00:00
SELECT id, created_at FROM "location_reviews" ORDER BY created_at DESC LIMIT 1 OFFSET 2;
id | created_at
-----+---------------------
357 | 2015-12-20 00:00:00
Why doesn't my OFFSET 1 query return the second entry (id = 426)? Instead it returns the same row as the query with no OFFSET. The created_at column is of type timestamp without time zone.
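created_at has the same value in all of these rows, so their relative order is undefined: PostgreSQL may return tied rows in any order, and because the planner takes LIMIT and OFFSET into account, queries with different LIMIT/OFFSET values can order the ties differently. Add another column to the ORDER BY as a tie-breaker, e.g. ORDER BY created_at DESC, id, so that the order is deterministic.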
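A small illustration of the tie-breaker fix, assuming Python's sqlite3 module as a stand-in for PostgreSQL (SQLite makes no ordering promise for tied rows either): with id added to the ORDER BY, paging one row at a time with LIMIT/OFFSET visits every tied row exactly once. The helper `page_ids` and the `REVIEWS` sample are illustrative.

```python
import sqlite3


def page_ids(rows, page_size=1):
    """Page through location_reviews with a deterministic sort and collect the ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE location_reviews (id INTEGER PRIMARY KEY, created_at TEXT)")
    conn.executemany("INSERT INTO location_reviews VALUES (?, ?)", rows)
    seen, offset = [], 0
    while True:
        page = conn.execute(
            # id breaks ties between rows that share the same created_at
            "SELECT id FROM location_reviews ORDER BY created_at DESC, id "
            "LIMIT ? OFFSET ?",
            (page_size, offset),
        ).fetchall()
        if not page:
            break
        seen.extend(row[0] for row in page)
        offset += page_size
    conn.close()
    return seen


REVIEWS = [(251, "2015-12-20 00:00:00"), (426, "2015-12-20 00:00:00"),
           (357, "2015-12-20 00:00:00")]

if __name__ == "__main__":
    print(page_ids(REVIEWS))
```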

Which is more efficient: select array_agg over partition, or select array(subquery)?

I have data like:
group_id | day | amount
----------+-------------+-------
1 | 15 Nov 2015 | 5.0
1 | 15 Nov 2015 | 6.0
1 | 14 Nov 2015 | 3.0
2 | 17 Nov 2015 | 5.0
2 | 15 Nov 2015 | 5.0
I want to select the top ten amounts for each (group_id, day). I tried writing things like this (Postgres 9.4):
select max(x.group_id), max(x.day), max(x.amounts)
from (select group_id, day, array_agg(amount) over w as amounts,
             row_number() over w as r
      from my_table window w as (partition by group_id, day
                                 order by amount desc)) as x
where x.r<=10 group by x.group_id,x.day
It also occurred to me that I could write a much more straightforward query:
select a.day, a.group_id, array(select amount
from my_table
where day=a.day and group_id=a.group_id
order by amount desc limit 10)
from my_table as a group by a.day, a.group_id
That does exactly what I want. This led me to the question: assuming I can tweak the first query to get what I want, which one would be faster? Is the subquery slower than the window partitions?
You probably should use an analytic function.
I don't know why you also have MAX() outside the subquery. Your queries don't seem to be equivalent.
Your top 10 per group request should be:
WITH ranked AS (
    SELECT group_id,
           day,
           amount,
           row_number() OVER
               (PARTITION BY group_id, day ORDER BY amount DESC) rn
    FROM my_table
)
SELECT group_id,
       day,
       array_agg(amount ORDER BY rn) AS amounts
FROM ranked
WHERE rn <= 10
GROUP BY group_id, day;
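To see the ranked-CTE approach end to end without PostgreSQL, here is a sketch using Python's sqlite3 module (window functions need SQLite 3.25 or newer). SQLite has no array_agg, so the query stops at the `rn <= :n` filter and Python's itertools.groupby collects each (group_id, day) block into a list. The function `top_amounts` and the `DATA` sample are illustrative, and the days are stored as ISO dates so that they sort correctly.

```python
import sqlite3
from itertools import groupby

# Sample rows from the question, with days written as ISO dates
DATA = [
    (1, "2015-11-15", 5.0),
    (1, "2015-11-15", 6.0),
    (1, "2015-11-14", 3.0),
    (2, "2015-11-17", 5.0),
    (2, "2015-11-15", 5.0),
]

QUERY = """
WITH ranked AS (
    SELECT group_id, day, amount,
           ROW_NUMBER() OVER (PARTITION BY group_id, day ORDER BY amount DESC) AS rn
    FROM my_table
)
SELECT group_id, day, amount
FROM ranked
WHERE rn <= :n
ORDER BY group_id, day, rn
"""


def top_amounts(rows, n=10):
    """Return {(group_id, day): [largest amounts, descending]}, at most n per key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (group_id INTEGER, day TEXT, amount REAL)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)
    ranked = conn.execute(QUERY, {"n": n}).fetchall()
    conn.close()
    # Rows arrive sorted by (group_id, day), so groupby sees each key exactly once.
    return {
        key: [amount for _, _, amount in block]
        for key, block in groupby(ranked, key=lambda r: (r[0], r[1]))
    }


if __name__ == "__main__":
    print(top_amounts(DATA))
```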