SQL select converting transaction rows to columns - tsql

I have a table that lists all transactions as follows:
ID Account Date Amount
---------------------------
1 2 02/01/2015 30
2 5 05/01/2015 25
3 2 05/01/2015 12
4 2 07/01/2015 42
5 5 10/01/2015 19
6 2 11/01/2015 58
7 3 15/01/2015 36
I would like to write a SELECT statement that lists only the last 3 transactions of each account, as follows:
Account Date1 Amount Date2 Amount Date3 Amount
---------------------------------------------------------------
2 11/01/2015 58 07/01/2015 42 05/01/2015 12
3 15/01/2015 36
5 10/01/2015 19 05/01/2015 25
Thank you for any advice

You can use the row_number() function in a derived table to partition the data by account, and give each date within the partition a number, and then do a conditional aggregation over the rows with the top 3 numbers, grouped by account:
select
account,
date1 = max(case when rn = 1 then date end),
amount = max(case when rn = 1 then amount end),
date2 = max(case when rn = 2 then date end),
amount = max(case when rn = 2 then amount end),
date3 = max(case when rn = 3 then date end),
amount = max(case when rn = 3 then amount end)
from (
select *, rn = row_number() over (partition by account order by date desc)
from your_table
) a
where rn <= 3
group by account
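To sanity-check the pivot outside SQL Server, here is a minimal sketch that runs the same ROW_NUMBER() plus conditional-aggregation query against an in-memory SQLite database through Python's standard sqlite3 module (assuming SQLite 3.25 or later, which added window functions). The table name tx and the amount1..amount3 aliases are hypothetical. It also assumes the dates are held as a real date type or as ISO yyyy-mm-dd text; if the column actually stores dd/mm/yyyy strings, ORDER BY date DESC sorts them as text, not chronologically.

```python
import sqlite3

# Sample rows from the question: (ID, Account, Date, Amount).
# Dates are rewritten from dd/mm/yyyy to ISO yyyy-mm-dd text (assumption)
# so that ORDER BY date DESC sorts chronologically.
SAMPLE = [
    (1, 2, "2015-01-02", 30),
    (2, 5, "2015-01-05", 25),
    (3, 2, "2015-01-05", 12),
    (4, 2, "2015-01-07", 42),
    (5, 5, "2015-01-10", 19),
    (6, 2, "2015-01-11", 58),
    (7, 3, "2015-01-15", 36),
]

PIVOT_SQL = """
SELECT account,
       MAX(CASE WHEN rn = 1 THEN date END)   AS date1,
       MAX(CASE WHEN rn = 1 THEN amount END) AS amount1,
       MAX(CASE WHEN rn = 2 THEN date END)   AS date2,
       MAX(CASE WHEN rn = 2 THEN amount END) AS amount2,
       MAX(CASE WHEN rn = 3 THEN date END)   AS date3,
       MAX(CASE WHEN rn = 3 THEN amount END) AS amount3
FROM (
    -- number each account's transactions, newest first
    SELECT *, ROW_NUMBER() OVER (PARTITION BY account ORDER BY date DESC) AS rn
    FROM tx
) AS a
WHERE rn <= 3   -- keep only the three most recent per account
GROUP BY account
ORDER BY account
"""


def last_three_per_account(rows):
    """Return one row per account with its three most recent transactions as columns."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE tx (id INTEGER, account INTEGER, date TEXT, amount INTEGER)")
        con.executemany("INSERT INTO tx VALUES (?, ?, ?, ?)", rows)
        return con.execute(PIVOT_SQL).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in last_three_per_account(SAMPLE):
        print(row)
```

Accounts with fewer than three transactions get NULL (None) in the unused columns, matching the expected output for accounts 3 and 5.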


How can I show all dates? Right now my query only returns the dates that have sales, but I want to show all dates.

select date(products.date_of_sale) as Date,
count(case when products.product_type_id = 6 then product_types.id end) as "POYOYO"
from products
join product_types on product_types.id = products.product_type_id
where date(products.date_of_sale) >= '2021-06-01'
and date(products.date_of_sale) <= current_date
and products.product_type_id = 6
and products.sales_category_id = 1
and products.sales_category_id <> 2
and products.status_id = 7
group by date(products.date_of_sale)
order by date(products.date_of_sale) desc
With this query, days without any sales do not appear in the output.
You can use the generate_series function to build the series of dates you want to report on, and then left join your sales to it.
Let's say you want a series from the first sale to the last sale; it would look like this:
with cte_sales as (
select date(products.date_of_sale) as Date,
count(case when products.product_type_id = 6 then product_types.id end) as "Car_Rentals"
from products
join product_types on product_types.id = products.product_type_id
where date(products.date_of_sale) >= '2021-06-01'
and date(products.date_of_sale) <= current_date
and products.product_type_id = 6
and products.sales_category_id = 1
and products.sales_category_id <> 2
and products.status_id = 7
group by date(products.date_of_sale)
order by date(products.date_of_sale) desc
)
, cte_series as (
select generate_series(min(Date), max(Date), '1 day'::interval)::date as generated_series
from cte_sales)
select
*
from
cte_series series
left join
cte_sales sales
on sales.Date = series.generated_series
You can play with different ways of generating the series; it is a powerful function.
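As a portable sketch of the same idea, the snippet below fills in the missing days and reports 0 for them. It runs on SQLite through Python's sqlite3 module, which has no built-in generate_series, so a recursive CTE is swapped in to produce the day series; in PostgreSQL you would keep generate_series as in the answer. The sales table with a single date_of_sale column is a simplified, hypothetical stand-in for products.

```python
import sqlite3

# Hypothetical sale timestamps: two sales on June 1, none on June 2 or June 4.
SAMPLE_SALES = [
    "2021-06-01 09:15:00",
    "2021-06-01 17:40:00",
    "2021-06-03 12:00:00",
    "2021-06-05 08:30:00",
]

FILL_SQL = """
WITH RECURSIVE
cte_sales AS (                -- one row per day that had at least one sale
    SELECT date(date_of_sale) AS day, COUNT(*) AS n
    FROM sales
    GROUP BY date(date_of_sale)
),
cte_series AS (               -- every day from the first sale to the last one
    SELECT MIN(day) AS day, MAX(day) AS last_day FROM cte_sales
    UNION ALL
    SELECT date(day, '+1 day'), last_day
    FROM cte_series
    WHERE day < last_day
)
SELECT series.day, COALESCE(s.n, 0) AS n   -- days without sales get 0
FROM cte_series AS series
LEFT JOIN cte_sales AS s ON s.day = series.day
ORDER BY series.day
"""


def daily_counts(sale_timestamps):
    """Return (day, number_of_sales) for every day from the first sale to the last."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE sales (date_of_sale TEXT)")
        con.executemany("INSERT INTO sales VALUES (?)", [(t,) for t in sale_timestamps])
        return con.execute(FILL_SQL).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for day, n in daily_counts(SAMPLE_SALES):
        print(day, n)
```

The LEFT JOIN from the series to the aggregated sales is what keeps the empty days; COALESCE turns their NULL count into 0.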

Look for records with consecutive dates and get the number of days

I have a table containing the fields:
cazi, cdip, date
1 2 13/03/2021
1 2 14/03/2021
1 2 15/03/2021
1 2 18/03/2021
1 2 19/03/2021
1 3 13/03/2021
1 3 14/03/2021
1 3 15/03/2021
1 3 20/03/2021
1 3 21/03/2021
I can't work out how to get this result, with these columns:
cazi, cdip, date1, date2, num_dd
1 2 13/03/2021 15/03/2021 3
1 2 18/03/2021 19/03/2021 2
1 3 13/03/2021 15/03/2021 3
1 3 20/03/2021 21/03/2021 2
Can you help me ?
With the following code I get the min and max of the records, but I need them for each run of consecutive dates:
WITH
dateGroup AS
(
SELECT DISTINCT
UniqueDate = [date]
,DateGroup = DATEADD(dd, - ROW_NUMBER() OVER (ORDER BY [date]), [date])
FROM malt
GROUP BY [date]
)
SELECT distinct
StartDate = MIN(UniqueDate)
,EndDate = MAX(UniqueDate)
,Days = DATEDIFF(dd,MIN(UniqueDate),MAX(UniqueDate))+1
,cazi
,cdip
FROM dateGroup JOIN
malt u ON u.date = UniqueDate
GROUP BY
DateGroup
,cazi
,cdip
This is a classic gaps-and-islands problem. You can try the query below to get the desired result:
SELECT cazi, cdip,
MIN(T.[date]) AS date1,
MAX(T.[date]) AS date2,
DATEDIFF(DAY, MIN(T.[date]), MAX(T.[date])) + 1 AS num_dd
FROM (SELECT M.*, ROW_NUMBER() OVER (PARTITION BY cazi, cdip ORDER BY [date]) AS RN
FROM malt M) T
GROUP BY cazi, cdip, DATEADD(DAY, -RN, [date]);
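Here is a minimal runnable sketch of the same gaps-and-islands grouping, using SQLite through Python's sqlite3 module as a stand-in for SQL Server: date(dt, '-N days') plays the role of DATEADD(DAY, -RN, [date]), and a julianday difference replaces DATEDIFF. It assumes at most one row per (cazi, cdip, date), since duplicate dates would break the date-minus-row-number trick; the column is renamed dt here for illustration.

```python
import sqlite3

# Sample rows from the question (cazi, cdip, dt), with dates as ISO text.
SAMPLE = [
    (1, 2, "2021-03-13"), (1, 2, "2021-03-14"), (1, 2, "2021-03-15"),
    (1, 2, "2021-03-18"), (1, 2, "2021-03-19"),
    (1, 3, "2021-03-13"), (1, 3, "2021-03-14"), (1, 3, "2021-03-15"),
    (1, 3, "2021-03-20"), (1, 3, "2021-03-21"),
]

ISLANDS_SQL = """
SELECT cazi, cdip,
       MIN(dt) AS date1,
       MAX(dt) AS date2,
       CAST(julianday(MAX(dt)) - julianday(MIN(dt)) AS INTEGER) + 1 AS num_dd
FROM (
    SELECT cazi, cdip, dt,
           ROW_NUMBER() OVER (PARTITION BY cazi, cdip ORDER BY dt) AS rn
    FROM malt
) AS t
-- dt minus rn days stays constant along a run of consecutive days,
-- so it identifies each island within a (cazi, cdip) pair
GROUP BY cazi, cdip, date(dt, '-' || rn || ' days')
ORDER BY cazi, cdip, date1
"""


def consecutive_runs(rows):
    """Collapse consecutive dates into (cazi, cdip, date1, date2, num_dd) rows."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE malt (cazi INTEGER, cdip INTEGER, dt TEXT)")
        con.executemany("INSERT INTO malt VALUES (?, ?, ?)", rows)
        return con.execute(ISLANDS_SQL).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in consecutive_runs(SAMPLE):
        print(row)
```

For cdip = 2, the dates 13, 14, 15 March get row numbers 1, 2, 3, and subtracting them gives 12 March every time; 18 and 19 March (row numbers 4, 5) both map to 14 March, so they form a second island.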

Cohort Analysis with Redshift by Month

I am trying to build a cohort analysis for monthly retention, but I'm having trouble getting the month_number column right. It is supposed to return the month(s) in which a user transacted, i.e. 0 for the registration month, 1 for the first month after the registration month, 2 for the second month, and so on up to the last month. Currently, however, it returns negative month numbers in some cells.
The result should look like this:
cohort_month  total_users  month_number  percentage
------------  -----------  ------------  ----------
January       100          0             40
January       341          1             90
January       115          2             90
February      103          0             73
February      100          1             40
March         90           0             90
Here is the SQL:
with cohort_items as (
select
extract(month from insert_date) as cohort_month,
msisdn as user_id
from mfscore.t_um_user_detail where extract(year from insert_date)=2020
order by 1, 2
),
user_activities as (
select
A.sender_msisdn,
extract(month from A.insert_date)-C.cohort_month as month_number
from mfscore.t_wm_transaction_logs A
left join cohort_items C ON A.sender_msisdn = C.user_id
where extract(year from A.insert_date)=2020
group by 1, 2
),
cohort_size as (
select cohort_month, count(1) as num_users
from cohort_items
group by 1
order by 1
),
B as (
select
C.cohort_month,
A.month_number,
count(1) as num_users
from user_activities A
left join cohort_items C ON A.sender_msisdn = C.user_id
group by 1, 2
)
select
B.cohort_month,
S.num_users as total_users,
B.month_number,
B.num_users * 100 / S.num_users as percentage
from B
left join cohort_size S ON B.cohort_month = S.cohort_month
where B.cohort_month IS NOT NULL
order by 1, 3
I think a ranking window function is the right solution. The idea is to assign a rank to each user's months of activity, ordered by year and month. DENSE_RANK fits better than RANK here: RANK gives tied rows (several events in the same month) the same number and then skips ahead, so the next month would not get the next number.
Something like:
WITH activity_per_user AS (
SELECT
user_id,
event_date,
DENSE_RANK() OVER (PARTITION BY user_id ORDER BY DATE_PART('year', event_date), DATE_PART('month', event_date) ASC) AS month_number
FROM user_activities_table
)
The numbering starts at 1, so you may want to subtract 1 so that the first month is 0.
Then, you can group by user_id and month_number to get the number of interactions for each user per month from the subscription (adapt to your use case accordingly).
SELECT
user_id,
month_number,
COUNT(1) AS n_interactions
FROM activity_per_user
GROUP BY 1, 2
Here is the Redshift documentation for the RANK window function:
https://docs.aws.amazon.com/redshift/latest/dg/r_WF_RANK.html
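To see why the choice of ranking function matters, this sketch numbers a hypothetical user's events with both RANK and DENSE_RANK. It uses SQLite through Python's sqlite3 module in place of Redshift, with strftime standing in for DATE_PART; the sample events are made up for illustration.

```python
import sqlite3

# Hypothetical activity rows (user_id, event_date): two events in January,
# one in February, none in March, one in April.
EVENTS = [
    (1, "2020-01-05"),
    (1, "2020-01-20"),
    (1, "2020-02-03"),
    (1, "2020-04-10"),
]


def month_numbers(events, rank_fn):
    """Return month_number per event, ordered by event_date (0 = first active month)."""
    if rank_fn not in ("RANK", "DENSE_RANK"):
        raise ValueError("rank_fn must be RANK or DENSE_RANK")
    sql = f"""
        SELECT {rank_fn}() OVER (
                   PARTITION BY user_id
                   ORDER BY strftime('%Y', event_date), strftime('%m', event_date)
               ) - 1 AS month_number   -- ranking starts at 1, so subtract 1
        FROM user_activities_table
        ORDER BY user_id, event_date
    """
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE user_activities_table (user_id INTEGER, event_date TEXT)")
        con.executemany("INSERT INTO user_activities_table VALUES (?, ?)", events)
        return [row[0] for row in con.execute(sql)]
    finally:
        con.close()


if __name__ == "__main__":
    print("RANK:      ", month_numbers(EVENTS, "RANK"))
    print("DENSE_RANK:", month_numbers(EVENTS, "DENSE_RANK"))
```

With two events in January, RANK jumps straight from 0 to 2 for February, while DENSE_RANK gives 0, 0, 1, 2. Both functions count active months, so April comes out as 2 under DENSE_RANK even though it is three calendar months after January; if month_number must measure calendar distance from the registration month, a month difference against the cohort month is needed instead.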

Getting the top 2 amounts with the most recent dates

My original code takes all transactions from the last 12 months and compares the two highest single transactions.
If the highest single gift within that period is more than twice the second-largest single transaction, the second-highest gift is taken. If the highest gift is not more than twice the second, the highest gift is used.
I've found that I need to apply the rules above to the most recent dates instead. If I use the last 12 months, I'm not getting all the amount values I need.
How do I change the WHERE clause to use the most recent dates instead of the last 12 months from the current date?
Input Values
account number, date, and transaction amount.
7428, 01262018, 2
7428, 12302018, 5
16988, 02142016, 100
16988, 01152016, 25
22450, 04191971, 8
22450, 08291971, 10
Results
AccountNumber Number Amount
------------------------------
7428 2 5.00
16988 2 25.00
22450 2 10.00
26997 2 10.00
27316 2 25.00
27365 2 25.00
28620 2 10.00
28951 2 10.00
29905 2 5.00
Code:
DECLARE @start_date date
DECLARE @end_date date
SET @start_date = DATEADD(YEAR, -1, GETDATE())
SET @end_date = GETDATE()
SELECT
AccountNumber,
COUNT(amount) as Number,
CASE
WHEN MAX(CASE WHEN row_num = 1 THEN amount END) > MAX(CASE WHEN row_num = 2 THEN amount END) * 2
THEN MAX(CASE WHEN row_num = 2 THEN amount END)
ELSE MAX(CASE WHEN row_num = 1 THEN amount END)
END AS Amount
FROM
(SELECT
*,
ROW_NUMBER() OVER(PARTITION BY AccountNumber ORDER BY amount DESC) AS row_num
FROM
dbo.[T01_TransactionMaster]
WHERE
date >= @start_date AND date < @end_date) AS tt
WHERE
row_num IN (1, 2)
AND amount > 0
-- AND AccountNumber = 301692
GROUP BY
AccountNumber
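The selection rule itself can be checked in isolation. This sketch runs the question's ROW_NUMBER()/CASE logic, without the date filter, against the sample input; SQLite via Python's sqlite3 module stands in for SQL Server, the mmddyyyy dates are rewritten as ISO text, and the table name tx and column tx_date are hypothetical.

```python
import sqlite3

# Sample input from the question: (AccountNumber, date, amount).
SAMPLE = [
    (7428, "2018-01-26", 2),
    (7428, "2018-12-30", 5),
    (16988, "2016-02-14", 100),
    (16988, "2016-01-15", 25),
    (22450, "1971-04-19", 8),
    (22450, "1971-08-29", 10),
]

RULE_SQL = """
SELECT AccountNumber,
       COUNT(amount) AS Number,
       CASE
           -- top gift more than twice the runner-up: report the runner-up
           WHEN MAX(CASE WHEN row_num = 1 THEN amount END)
                > MAX(CASE WHEN row_num = 2 THEN amount END) * 2
           THEN MAX(CASE WHEN row_num = 2 THEN amount END)
           -- otherwise (also when there is no second gift, so the comparison is NULL)
           ELSE MAX(CASE WHEN row_num = 1 THEN amount END)
       END AS Amount
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY AccountNumber ORDER BY amount DESC) AS row_num
    FROM tx
) AS tt
WHERE row_num IN (1, 2) AND amount > 0
GROUP BY AccountNumber
ORDER BY AccountNumber
"""


def pick_amounts(rows):
    """Apply the 'more than twice the runner-up' rule to every account."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE tx (AccountNumber INTEGER, tx_date TEXT, amount REAL)")
        con.executemany("INSERT INTO tx VALUES (?, ?, ?)", rows)
        return con.execute(RULE_SQL).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in pick_amounts(SAMPLE):
        print(row)
```

On this input the rule returns 2.0 for account 7428, because 5 is more than twice 2, whereas the results listed above show 5.00 for that account; that row may be worth re-checking against the rule.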

Derive EndDate of Current Row from StartDate of Next Row

Can someone please help me derive an end date from a start date?
Products are referred to a company for testing. While a product is with the company, they carry out multiple tests on different dates and record each test date to establish the product's condition (OutcomeID).
I need to establish the StartDate, which is the TestDate, and the EndDate, which is the StartDate of the next row. But if multiple consecutive tests resulted in the same OutcomeID, I need to return only one row, with the StartDate of the first test and the EndDate of the last test; in other words, one row for each run of consecutive tests over which the OutcomeID did not change.
Here is my data set:
CREATE TABLE #ProductTests
(
RequestID int not null,
ProductID int not null,
TestID int not null,
TestDate datetime null,
OutcomeID int
)
insert into #ProductTests
(RequestID ,ProductID ,TestID ,TestDate ,OutcomeID )
select 1,2,22,'2005-01-21',10
union all
select 1,2,42,'2007-03-17',10
union all
select 1,2,45,'2010-12-25',10
union all
select 1,2,325,'2011-01-14',13
union all
select 1,2,895,'2011-08-10',15
union all
select 1,2,111,'2011-12-23',15
union all
select 1,2,636,'2012-05-02',10
union all
select 1,2,554,'2012-11-08',17
--select *from #producttests
RequestID ProductID TestID TestDate OutcomeID
1 2 22 2005-01-21 10
1 2 42 2007-03-17 10
1 2 45 2010-12-25 10
1 2 325 2011-01-14 13
1 2 895 2011-08-10 15
1 2 111 2011-12-23 15
1 2 636 2012-05-02 10
1 2 554 2012-11-08 17
And this is what I need to achieve.
RequestID ProductID StartDate EndDate OutcomeID
1 2 2005-01-21 2011-01-14 10
1 2 2011-01-14 2011-08-10 13
1 2 2011-08-10 2012-05-02 15
1 2 2012-05-02 2012-11-08 10
1 2 2012-11-08 NULL 17
As you can see from the data set, the first three tests (22, 42, and 45) all resulted in OutcomeID 10, so in my result I only need the start date of test 22 and the end date of test 45, which is the start date of test 325. As you can also see, in test 636 the OutcomeID has gone back to 10 from 15, so it needs to be returned too.
This is what I have managed to achieve so far, using the following script:
select T1.RequestID,T1.ProductID,T1.TestDate AS StartDate
,MIN(T2.TestDate) AS EndDate ,T1.OutcomeID
from #producttests T1
left join #ProductTests T2 ON T1.RequestID=T2.RequestID
and T1.ProductID=T2.ProductID and T2.TestDate>T1.TestDate
group by T1.RequestID,T1.ProductID ,T1.OutcomeID,T1.TestDate
order by T1.TestDate
Result:
RequestID ProductID StartDate EndDate OutcomeID
1 2 2005-01-21 2007-03-17 10
1 2 2007-03-17 2010-12-25 10
1 2 2010-12-25 2011-01-14 10
1 2 2011-01-14 2011-08-10 13
1 2 2011-08-10 2011-12-23 15
1 2 2011-12-23 2012-05-02 15
1 2 2012-05-02 2012-11-08 10
1 2 2012-11-08 NULL 17
As of Nov 7 this was still unanswered, so here is my solution.
It's not so pretty, but it works.
My hint: read up on window, ranking, and aggregate functions such as ROW_NUMBER, RANK, AVG, SUM, etc.
Those are essential when you write reports, and they became quite powerful in SQL Server 2012.
I have also used a CTE (common table expression), but it could be written as a subquery or a temporary table instead.
;with cte ( ida, requestid, productid, testid, testdate, outcomeid) as
(
-- select rows where the outcome id is changing
select b.* from
(select ROW_NUMBER() over( partition by requestid, productid order by testDate) as id, * from #ProductTests)a
right outer join
(select ROW_NUMBER() over(partition by requestid, productid order by testDate) as id, * from #ProductTests) b
on a.requestID = b.requestID and a.productID = b.productID and a.id +1 = b.id
where 1=1
--or a.id = 1
and a.outcomeid <> b.outcomeid or b.outcomeid is null or a.id is null
)
select --*
a.RequestID,a.ProductID,a.TestDate AS StartDate ,MIN(b.TestDate) AS EndDate ,a.OutcomeID
from cte a left join cte b on a.requestid = b.requestid and a.productid = b.productid and a.testdate < b.testdate
group by a.RequestID,a.ProductID ,a.OutcomeID,a.TestDate
order by StartDate
Actually, there seem to be two problems in your question. One is how to group sequential (based on specific criteria) rows containing the same value. The other is the one actually spelled out in your title, i.e. how to use the next row's StartDate as the current row's EndDate.
Personally, I would solve these two problems in the order I mentioned them, so I would first address the grouping problem. One way to group the data properly in this case would be to use double ranking like this:
WITH partitioned AS (
SELECT
*,
grp = ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID ORDER BY TestDate)
- ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID, OutcomeID ORDER BY TestDate)
FROM #ProductTests
)
, grouped AS (
SELECT
RequestID,
ProductID,
StartDate = MIN(TestDate),
OutcomeID
FROM partitioned
GROUP BY
RequestID,
ProductID,
OutcomeID,
grp
)
SELECT *
FROM grouped
;
This should give you the following output for your data sample:
RequestID ProductID StartDate OutcomeID
--------- --------- ---------- ---------
1 2 2005-01-21 10
1 2 2011-01-14 13
1 2 2011-08-10 15
1 2 2012-05-02 10
1 2 2012-11-08 17
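The double-ranking step can be reproduced with SQLite through Python's sqlite3 module (assumed version 3.25 or later, for window functions); the ProductTests table here is a plain table standing in for #ProductTests. The difference between the two ROW_NUMBER() values stays constant while consecutive tests keep the same OutcomeID and changes once a run is interrupted, so grouping by it together with OutcomeID yields one row per run.

```python
import sqlite3

# Sample data from the question: (RequestID, ProductID, TestID, TestDate, OutcomeID).
TESTS = [
    (1, 2, 22, "2005-01-21", 10),
    (1, 2, 42, "2007-03-17", 10),
    (1, 2, 45, "2010-12-25", 10),
    (1, 2, 325, "2011-01-14", 13),
    (1, 2, 895, "2011-08-10", 15),
    (1, 2, 111, "2011-12-23", 15),
    (1, 2, 636, "2012-05-02", 10),
    (1, 2, 554, "2012-11-08", 17),
]

GROUPED_SQL = """
WITH partitioned AS (
    SELECT *,
           -- constant within each unbroken run of the same OutcomeID
           ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID ORDER BY TestDate)
         - ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID, OutcomeID ORDER BY TestDate) AS grp
    FROM ProductTests
)
SELECT RequestID, ProductID, MIN(TestDate) AS StartDate, OutcomeID
FROM partitioned
-- OutcomeID must stay in the GROUP BY: different outcomes can share a grp value
GROUP BY RequestID, ProductID, OutcomeID, grp
ORDER BY StartDate
"""


def grouped_runs(rows):
    """Return one (RequestID, ProductID, StartDate, OutcomeID) row per run."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE ProductTests (RequestID INTEGER, ProductID INTEGER, "
            "TestID INTEGER, TestDate TEXT, OutcomeID INTEGER)"
        )
        con.executemany("INSERT INTO ProductTests VALUES (?, ?, ?, ?, ?)", rows)
        return con.execute(GROUPED_SQL).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in grouped_runs(TESTS):
        print(row)
```

For example, test 325 (OutcomeID 13) and test 636 (OutcomeID 10) both get grp = 3, which is why OutcomeID has to be part of the grouping key.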
Obviously, one thing is still missing, and it's EndDate, and now is the right time to care about it. Use ROW_NUMBER() once again, to rank the result set of the grouped CTE, then use the rankings in the join condition when joining the result set with itself (using an outer join):
WITH partitioned AS (
SELECT
*,
grp = ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID ORDER BY TestDate)
- ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID, OutcomeID ORDER BY TestDate)
FROM #ProductTests
)
, grouped AS (
SELECT
RequestID,
ProductID,
StartDate = MIN(TestDate),
OutcomeID,
rnk = ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID ORDER BY MIN(TestDate))
FROM partitioned
GROUP BY
RequestID,
ProductID,
OutcomeID,
grp
)
SELECT
g1.RequestID,
g1.ProductID,
g1.StartDate,
g2.StartDate AS EndDate,
g1.OutcomeID
FROM grouped g1
LEFT JOIN grouped g2
ON g1.RequestID = g2.RequestID
AND g1.ProductID = g2.ProductID
AND g1.rnk = g2.rnk - 1
;
You can try this query at SQL Fiddle to verify that it returns the output you are after.
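On SQL Server 2012 and later, the LEAD() window function can supply EndDate directly, instead of ranking the grouped rows and joining them to themselves. This is a swapped-in alternative rather than the method above; the sketch checks it against the sample data using SQLite via Python's sqlite3 module, which also supports LEAD() (version 3.25 or later assumed).

```python
import sqlite3

# Sample data from the question: (RequestID, ProductID, TestID, TestDate, OutcomeID).
TESTS = [
    (1, 2, 22, "2005-01-21", 10),
    (1, 2, 42, "2007-03-17", 10),
    (1, 2, 45, "2010-12-25", 10),
    (1, 2, 325, "2011-01-14", 13),
    (1, 2, 895, "2011-08-10", 15),
    (1, 2, 111, "2011-12-23", 15),
    (1, 2, 636, "2012-05-02", 10),
    (1, 2, 554, "2012-11-08", 17),
]

LEAD_SQL = """
WITH partitioned AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID ORDER BY TestDate)
         - ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID, OutcomeID ORDER BY TestDate) AS grp
    FROM ProductTests
),
grouped AS (
    SELECT RequestID, ProductID, MIN(TestDate) AS StartDate, OutcomeID
    FROM partitioned
    GROUP BY RequestID, ProductID, OutcomeID, grp
)
SELECT RequestID, ProductID, StartDate,
       -- the next group's StartDate becomes this group's EndDate (NULL for the last one)
       LEAD(StartDate) OVER (PARTITION BY RequestID, ProductID ORDER BY StartDate) AS EndDate,
       OutcomeID
FROM grouped
ORDER BY RequestID, ProductID, StartDate
"""


def runs_with_end_dates(rows):
    """Return (RequestID, ProductID, StartDate, EndDate, OutcomeID) per run."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE ProductTests (RequestID INTEGER, ProductID INTEGER, "
            "TestID INTEGER, TestDate TEXT, OutcomeID INTEGER)"
        )
        con.executemany("INSERT INTO ProductTests VALUES (?, ?, ?, ?, ?)", rows)
        return con.execute(LEAD_SQL).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in runs_with_end_dates(TESTS):
        print(row)
```

LEAD() reads the following row within the same RequestID/ProductID partition, so the self-join and the extra ranking column are no longer needed; the last run per product naturally gets NULL as its EndDate.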