Query to Get all Row Combinations - tsql

I want a query to retrieve all row combinations from the data set below.
This is my original data set:
SId Sequence RId
2976 1 100
4576 1 100
19472 1 100
80591 1 100
58811 1 100
70859 1 100
170941 2 100
167578 2 100
131885 2 100
117608 2 100
78117 1 101
69481 1 101
70987 2 101
46857 2 101
28396 2 101
From this data set I want a result, per RId, that pairs every Sequence 1 row with every Sequence 2 row.
For the above case, RId 100 has 6 rows with Sequence 1 and 4 rows with Sequence 2, so there should be 24 combinations, like
the data below:
RSId Sid Sequence RId
1 2976 1 100
1 170941 2 100
2 2976 1 100
2 167578 2 100
3 2976 1 100
3 131885 2 100
The input table is defined as follows:
CREATE TABLE #temp ( SId INT,Sequence INT,Rid INT)
INSERT into #temp values (2976,1,100)
insert into #temp values (4576,1,100)
insert into #temp values (19472,1,100)
insert into #temp values (80591,1,100)
insert into #temp values (58811,1,100)
insert into #temp values (70859,1,100)
insert into #temp values (170941,2,100)
insert into #temp values (167578,2,100)
insert into #temp values (131885,2,100)
insert into #temp values (117608,2,100)
insert into #temp values (78117,1,101)
insert into #temp values (69481,1,101)
insert into #temp values (70987,2,101)
insert into #temp values (46857,2,101)
insert into #temp values (28396,2,101)
SELECT * FROM #Temp
The result should be in the table format below:
RSId Sid Sequence RId
1 2976 1 100
1 170941 2 100
2 2976 1 100
2 167578 2 100
3 2976 1 100
3 131885 2 100
4 2976 1 100
4 117608 2 100
5 4576 1 100
5 170941 2 100
6 4576 1 100
6 167578 2 100
7 4576 1 100
7 131885 2 100
8 4576 1 100
8 117608 2 100
9 19472 1 100
9 170941 2 100
10 19472 1 100
10 167578 2 100
11 19472 1 100
11 131885 2 100
12 19472 1 100
12 117608 2 100
13 80591 1 100
13 170941 2 100
14 80591 1 100
14 167578 2 100
15 80591 1 100
15 131885 2 100
16 80591 1 100
16 117608 2 100
17 58811 1 100
17 170941 2 100
18 58811 1 100
18 167578 2 100
19 58811 1 100
19 131885 2 100
20 58811 1 100
20 117608 2 100
21 70859 1 100
21 117608 2 100
22 70859 1 100
22 170941 2 100
23 70859 1 100
23 167578 2 100
24 70859 1 100
24 131885 2 100

One way to do it is to use common table expressions, a cross join and a union.
It might be a bit cumbersome, but it should perform reasonably well:
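DECLARE @Rid int = 100;
WITH cte1 AS
(
    -- all Sequence 1 rows for the requested Rid
    SELECT SId, Sequence, Rid
    FROM #Temp
    WHERE Sequence = 1
    AND Rid = @Rid
), cte2 AS
(
    -- all Sequence 2 rows for the requested Rid
    SELECT SId, Sequence, Rid
    FROM #Temp
    WHERE Sequence = 2
    AND Rid = @Rid
), cteCJ AS
(
    -- every Sequence 1 row paired with every Sequence 2 row; RSId numbers the pairs
    SELECT cte1.SId AS Sid1, cte1.Sequence AS Seq1, cte1.Rid AS Rid,
           cte2.SId AS Sid2, cte2.Sequence AS Seq2,
           ROW_NUMBER() OVER(ORDER BY cte1.SId) AS RSId
    FROM cte1
    CROSS JOIN cte2
)
-- unpivot each pair into two rows that share the same RSId
SELECT RSId, Sid1 AS Sid, Seq1 AS Sequence, Rid
FROM cteCJ
UNION
SELECT RSId, Sid2, Seq2, Rid
FROM cteCJ
-- with UNION, ORDER BY has to use the output column names of the first SELECT
ORDER BY RSId, Sequence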
Results:
RSId Sid Sequence Rid
1 2976 1 100
1 170941 2 100
2 2976 1 100
2 167578 2 100
3 2976 1 100
3 131885 2 100
4 2976 1 100
4 117608 2 100
5 4576 1 100
5 170941 2 100
6 4576 1 100
6 167578 2 100
7 4576 1 100
7 131885 2 100
8 4576 1 100
8 117608 2 100
9 19472 1 100
9 170941 2 100
10 19472 1 100
10 167578 2 100
11 19472 1 100
11 131885 2 100
12 19472 1 100
12 117608 2 100
13 58811 1 100
13 170941 2 100
14 58811 1 100
14 167578 2 100
15 58811 1 100
15 131885 2 100
16 58811 1 100
16 117608 2 100
17 70859 1 100
17 170941 2 100
18 70859 1 100
18 167578 2 100
19 70859 1 100
19 131885 2 100
20 70859 1 100
20 117608 2 100
21 80591 1 100
21 170941 2 100
22 80591 1 100
22 167578 2 100
23 80591 1 100
23 131885 2 100
24 80591 1 100
24 117608 2 100

Sum of amounts of latest id by date PySpark

I have data like this in a dataframe:
CommsId Id Amount Date
85 1 10 07/10/2020
72 1 15 09/09/2021
85 1 25 09/09/2021
70 1 30 09/09/2021
72 1 -15 05/11/2020
70 1 -30 05/11/2020
For each date, I want to find the sum of amounts of the latest CommsId as of the date.
Expected output is as below
Date Sum_Amount Id
07/10/2020 10 1
09/09/2021 70 1
05/11/2021 25 1

Recursive Cumulative Sum up to a certain value Postgres

I have my data that looks like this:
user_id touchpoint_number days_difference
1 1 5
1 2 20
1 3 25
1 4 10
2 1 2
2 2 30
2 3 4
I would like to add one more column holding a cumulative sum of days_difference, partitioned by user_id, that resets to 0 whenever the running value passes 30 and starts counting again from there. I have been trying to do it, but I couldn't figure out how to do it in PostgreSQL, because it has to be recursive.
The outcome I would like to have would be something like:
user_id touchpoint_number days_difference cum_sum_upto30
1 1 5 5
1 2 20 25
1 3 25 0 --- new count all over again
1 4 10 10
2 1 2 2
2 2 30 0 --- new count all over again
2 3 4 4
Do you have any cool ideas how this could be done?
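This should do what you want (t is your table; a, b and c stand for user_id, touchpoint_number and days_difference):
with recursive cte as (
-- anchor: the first touchpoint of each user starts the running sum
select t.a, t.b, t.c, t.c as sumc
from t
where b = 1
union all
-- step: add the next touchpoint, resetting to 0 when the sum goes above 30
select t.a, t.b, t.c,
(case when t.c + cte.sumc > 30 then 0 else t.c + cte.sumc end)
from t join
cte
on t.b = cte.b + 1 and t.a = cte.a
)
select *
from cte
order by a, b;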
Here is a rextester.
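For reference, here is the same idea written out with the column names from the question and its sample rows, as a self-contained PostgreSQL sketch. The table name touchpoints and the CTE name running are made up for illustration, and the sketch assumes touchpoint_number has no gaps within a user.

```sql
-- Hypothetical table name; the columns are the ones from the question.
CREATE TEMP TABLE touchpoints (
    user_id           int,
    touchpoint_number int,
    days_difference   int
);

INSERT INTO touchpoints VALUES
    (1, 1, 5), (1, 2, 20), (1, 3, 25), (1, 4, 10),
    (2, 1, 2), (2, 2, 30), (2, 3, 4);

WITH RECURSIVE running AS (
    -- anchor: the first touchpoint of every user starts the sum
    SELECT user_id, touchpoint_number, days_difference,
           days_difference AS cum_sum_upto30
    FROM touchpoints
    WHERE touchpoint_number = 1
    UNION ALL
    -- step: add the next touchpoint, restarting at 0 once the sum would exceed 30
    SELECT t.user_id, t.touchpoint_number, t.days_difference,
           CASE WHEN r.cum_sum_upto30 + t.days_difference > 30 THEN 0
                ELSE r.cum_sum_upto30 + t.days_difference
           END
    FROM touchpoints AS t
    JOIN running AS r
      ON t.user_id = r.user_id
     AND t.touchpoint_number = r.touchpoint_number + 1
)
SELECT *
FROM running
ORDER BY user_id, touchpoint_number;
```

On the sample rows this gives 5, 25, 0, 10 for user 1 and 2, 0, 4 for user 2, which matches the desired output. Note that the anchor row is never reset, so a first touchpoint above 30 would keep its own value; whether that is wanted depends on the data.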

how can I change where value in postgresql?

id o_num d_num
69af4bf986c4df522afb54da6512bdc5 5 5
69af6111de53b550b0d13f86398b59e5 19 19
69b264c4b93a1984450689b16807b293 10 10
69b26c0fb38ff1cd2d4b01696aa14883 20 20
69b5c46bdc8a8f49f913d9d2325f0a76 15 15
69b71276a69dece5630ed3405ceca411 1 6
69b790c7937602e8fd52bc4d28194625 5 17
69b7bfde4effdaf31d362165a23a8dd0 4 13
69b93626a799636aef2ab3567cf3a110 14 14
I have a table like this. There are 20 distinct o_num values in the table. I want to select all the rows where o_num is 1, group them by d_num and count the ids, then do the same for o_num = 2, and so on up to o_num = 20, with all the results in one table.
Here is my code for a single o_num value:
SELECT COUNT(id), o_num, d_num
FROM table1
WHERE o_num = 1
GROUP BY o_num, d_num
How can I change the code to get the table I want?
I want a result like this, a table with 3 columns:
sum o_num d_num
9 1 1
8 1 2
4 1 3
……
5 1 20
4 2 1
6 2 2
8 2 3
……
3 2 20
5 3 1
……
……
2 20 20
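One possible approach, sketched under the assumption that the goal is simply one count per (o_num, d_num) pair: drop the WHERE clause and let GROUP BY cover every o_num at once. The alias id_count is illustrative (the desired output calls this column sum).

```sql
-- One row per (o_num, d_num) pair that actually occurs in table1.
SELECT COUNT(id) AS id_count, o_num, d_num
FROM table1
GROUP BY o_num, d_num
ORDER BY o_num, d_num;

-- Variant that also lists pairs with no rows (count 0).
-- Assumption: o_num and d_num are integers in the range 1..20.
SELECT COUNT(t.id) AS id_count, o.n AS o_num, d.n AS d_num
FROM generate_series(1, 20) AS o(n)
CROSS JOIN generate_series(1, 20) AS d(n)
LEFT JOIN table1 AS t
       ON t.o_num = o.n
      AND t.d_num = d.n
GROUP BY o.n, d.n
ORDER BY o.n, d.n;
```

The first query only returns combinations that exist in the data, so it may have fewer than 400 rows. The second always returns 20 x 20 rows, because COUNT(t.id) counts only matched rows and yields 0 for the rest.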

TSQL Order BY on occasion doesn't order correctly

TSQL MSSQL 2008r2
I'm re-writing the question to try and make it clear what the issue is that I'm trying to explain.
I've got a stored proc that takes 3 parameters: VehicleKey, StartDate and EndDateTime. I'm querying a data warehouse database, so the data shouldn't change.
When the proc is called with the same parameters, the results are as expected most of the time, but on some random occasions, with those same parameters, the results differ.
The problem is with the dynamic derived column "Island".
It's completely random. The proc can be executed 20 times and give the expected results and then the next 2 will give incorrect results.
There can be 1 or more VehicleKey/DriverKey combinations in a given date range.
This is the problem query
SELECT
A.VehicleKey
,A.NodeId
,A.DriverKey
,MIN(A.StartTrip) 'StartTrip'
,MAX(A.EndTrip) 'EndTrip'
,SUM(A.PrivOdo) 'Private'
,SUM(A.BusOdo) 'Business'
,SUM(A.TravOdo) 'Travel'
,SUM(A.PrivOdo + A.BusOdo + A.TravOdo )'Total'
FROM
(
SELECT
Island = ( ROW_NUMBER() OVER (PARTITION BY T.VehicleKey ORDER BY MONTH(StartTrip)) ) - ( ROW_NUMBER() OVER (PARTITION BY T.VehicleKey, T.DriverKey ORDER BY T.StartTrip) )
,NodeId
,VehicleKey
,DriverKey
,StartTrip
,EndTrip
,BusOdo
,PrivOdo
,TravOdo
FROM
#xYTD_BPTotals T
) AS A
GROUP BY
A.Island
,A.VehicleKey
,A.NodeId
,A.DriverKey
ORDER BY
A.VehicleKey
,MIN(A.StartTrip);
I am of the understanding that the ORDER BY should be on the outside of the derived table for it to take effect.
I think I've narrowed it down to the issue presenting itself only when a Vehicle has 2 or more DriverKey combinations.
For example, with parameters VehicleKey = 4865, StartDateTime = '2016-01-01', EndDateTime = '2016-10-31'
This is the correct result - including Island column
VehicleKey NodeId DriverKey Island StartTrip EndTrip Private Business Travel Total_
4865 458 0 0 2016-09-06 14:06:08 2016-09-28 17:02:08 54.75 737.83 0 792.58
4865 458 1202 134 2016-09-29 11:10:04 2016-09-30 17:25:51 0 211.32 0 211.32
4865 458 0 27 2016-10-03 07:39:25 2016-10-14 17:00:15 0 579.81 0 579.81
And this is when it's wrong, with the same parameters: VehicleKey = 4865, StartDateTime = '2016-01-01', EndDateTime = '2016-10-31'
(again including the Island column).
The first two rows here should be combined.
VehicleKey NodeId DriverKey Island StartTrip EndTrip Private Business Travel Total_
4865 458 0 98 2016-09-06 14:06:08 2016-09-21 09:15:49 0 313.87 0 313.87
4865 458 0 -63 2016-09-21 09:21:10 2016-09-28 17:02:08 54.75 423.96 0 478.71
4865 458 1202 71 2016-09-29 11:10:04 2016-09-30 17:25:51 0 211.32 0 211.32
4865 458 0 27 2016-10-03 07:39:25 2016-10-14 17:00:15 0 579.81 0 579.81
Here are the first few rows from the derived table, with the "Island" column also broken down into its two parts:
SELECT
Island = ( ROW_NUMBER() OVER (PARTITION BY T.VehicleKey ORDER BY MONTH(StartTrip)) ) - ( ROW_NUMBER() OVER (PARTITION BY T.VehicleKey, T.DriverKey ORDER BY T.StartTrip) )
,Island_x =( ROW_NUMBER() OVER (PARTITION BY T.VehicleKey ORDER BY MONTH(StartTrip)) )
,Island_y = ( ROW_NUMBER() OVER (PARTITION BY T.VehicleKey, T.DriverKey ORDER BY T.StartTrip) )
,NodeId
,VehicleKey
,DriverKey
,StartTrip
,EndTrip
,BusOdo
,PrivOdo
,TravOdo
FROM
#xYTD_BPTotals T
The correct result should be
Island Island_x Island_y NodeId VehicleKey DriverKey StartTrip EndTrip BusOdo PrivOdo TravOdo
0 1 1 24901 4865 0 2016-09-06 14:06:08 2016-09-06 14:08:50 0 0 0
0 2 2 24901 4865 0 2016-09-06 15:39:14 2016-09-06 15:40:53 114 0 0
0 3 3 24901 4865 0 2016-09-08 11:06:43 2016-09-08 11:07:23 0 0 0
0 4 4 24901 4865 0 2016-09-08 11:12:03 2016-09-08 11:12:26 20 0 0
0 5 5 24901 4865 0 2016-09-08 11:19:20 2016-09-08 11:19:52 1 0 0
0 6 6 24901 4865 0 2016-09-08 11:26:58 2016-09-08 11:27:56 88 0 0
0 7 7 24901 4865 0 2016-09-08 11:33:40 2016-09-08 11:35:02 1 0 0
0 8 8 24901 4865 0 2016-09-12 09:08:53 2016-09-12 09:10:42 34 0 0
but I sometimes get this with the same input parameters.
Island Island_x Island_y NodeId VehicleKey DriverKey StartTrip EndTrip BusOdo PrivOdo TravOdo
98 1 1 24901 4865 0 2016-09-06 14:06:08 2016-09-06 14:08:50 0 0 0
98 2 2 24901 4865 0 2016-09-06 15:39:14 2016-09-06 15:40:53 114 0 0
98 3 3 24901 4865 0 2016-09-08 11:06:43 2016-09-08 11:07:23 0 0 0
98 4 4 24901 4865 0 2016-09-08 11:12:03 2016-09-08 11:12:26 20 0 0
98 5 5 24901 4865 0 2016-09-08 11:19:20 2016-09-08 11:19:52 1 0 0
98 6 6 24901 4865 0 2016-09-08 11:26:58 2016-09-08 11:27:56 88 0 0
98 7 7 24901 4865 0 2016-09-08 11:33:40 2016-09-08 11:35:02 1 0 0
98 8 8 24901 4865 0 2016-09-12 09:08:53 2016-09-12 09:10:42 34 0 0
Why is the "Island" calculated column wrong? 1-1 = 0 not 98.
Where am I going wrong?
EDIT - @YourTable now looks like your raw table
Declare @YourTable table (VehicleKey int,NodeId int,DriverKey int,StartTrip datetime,EndTrip datetime,PrivOdo decimal(10,2),BusOdo decimal(10,2), TravOdo decimal(10,2))
Insert Into @YourTable values
(4865,458,0 ,'2016-09-06 14:06:08','2016-09-21 09:15:49',0 ,313.87,0),
(4865,458,0 ,'2016-09-21 09:21:10','2016-09-28 17:02:08',54.75,423.96,0),
(4865,458,1202,'2016-09-29 11:10:04','2016-09-30 17:25:51',0 ,211.32,0),
(4865,458,0 ,'2016-10-03 07:39:25','2016-10-14 17:00:15',0 ,579.81,0)
Select VehicleKey
,NodeID
,DriverKey
,StartTrip = min(StartTrip)
,EndTrip = max(EndTrip)
,Private = sum(PrivOdo)
,Business = sum(BusOdo)
,Travel = sum(TravOdo)
,Total = sum(PrivOdo + BusOdo + TravOdo )
From (
Select Island = ( ROW_NUMBER() OVER (PARTITION BY VehicleKey ORDER BY MONTH(StartTrip)) ) - ( ROW_NUMBER() OVER (PARTITION BY VehicleKey, DriverKey ORDER BY StartTrip) )
,*
From @YourTable
) A
Group By Island,VehicleKey,NodeID,DriverKey
Order By min(StartTrip)
Returns
FYI - The sub-query produces
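A possible explanation for the intermittent results, offered as an assumption rather than a confirmed diagnosis: the first ROW_NUMBER() orders by MONTH(StartTrip), and every trip in the same month ties on that value. SQL Server is free to number tied rows in any order, and that order can change between executions, so the difference of the two row numbers is not stable. Ordering both row numbers by a column that is unique within the partition removes the ambiguity. The sketch below uses a made-up table variable @Trips with a few rows in the shape of the question's data and assumes StartTrip is unique per vehicle.

```sql
-- Hypothetical stand-in for #xYTD_BPTotals, reduced to the columns that matter.
DECLARE @Trips TABLE (VehicleKey int, DriverKey int, StartTrip datetime);

INSERT INTO @Trips VALUES
    (4865, 0,    '2016-09-06 14:06:08'),
    (4865, 0,    '2016-09-06 15:39:14'),
    (4865, 1202, '2016-09-29 11:10:04'),
    (4865, 0,    '2016-10-03 07:39:25');

-- Both row numbers use the same deterministic ordering (StartTrip),
-- so the difference is constant within each unbroken run of one driver.
SELECT VehicleKey,
       DriverKey,
       StartTrip,
       Island = ROW_NUMBER() OVER (PARTITION BY VehicleKey ORDER BY StartTrip)
              - ROW_NUMBER() OVER (PARTITION BY VehicleKey, DriverKey ORDER BY StartTrip)
FROM @Trips
ORDER BY VehicleKey, StartTrip;
```

For these four rows Island comes out as 0, 0, 2, 1, which gives three groups: driver 0, then driver 1202, then driver 0 again. If the month ordering in the original query was intentional, a tie-breaker such as ORDER BY MONTH(StartTrip), StartTrip would be another way to make the numbering repeatable.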

How to use TOP N WITH TIES alongside COUNT

I've been trying to adapt other answers here for ages without success, so here goes... I have a basic query:
SELECT
*
FROM
#tbl_counts
ORDER BY
sgId,
CategoryCount DESC,
qccId;
This shows the following results:
sgId qccId CategoryCount
------- -------- -------------
4668 18 8
4668 77 7
4668 2 6
4669 43 2
4669 46 2
4670 25 3
4670 27 3
4670 74 2
4671 56 4
4671 60 3
4671 74 3
4671 54 3
4671 55 3
4671 78 2
4671 88 1
4671 89 1
4671 90 3
I need to amend this query to show the following:
For each unique sgId value, show the top 3 CategoryCount values (with ties if they exist), and the appropriate qccId value. Therefore, the results should be:
sgId qccId CategoryCount
------- ----- -------------
4668 18 8
4668 77 7
4668 2 6 -- top 3 4668
4669 43 2
4669 46 2 -- top 2 4669 because only 2 existed
4670 25 3
4670 27 3
4670 74 2 -- top 3 4670
4671 56 4
4671 60 3
4671 74 3
4671 54 3
4671 55 3 -- top 5 4671 caused by TIES, but discards others
Usually I would use ROW_NUMBER here, but I am struggling because it doesn't handle ties (I don't think). Therefore, adapting other answers I've found, I've got this far, but it doesn't work properly...
SELECT
cnt.*
FROM
(
SELECT DISTINCT
sgId
FROM
#tbl_counts) sg INNER JOIN
(
SELECT TOP 3 WITH TIES
*
FROM
#tbl_counts
ORDER BY
CategoryCount DESC) cnt ON cnt.sgId = sg.sgId
Looks like the perfect job for the window function DENSE_RANK:
;WITH
cte AS
(
SELECT *,
DENSE_RANK() OVER (PARTITION BY sgId ORDER BY CategoryCount DESC) As RowRank
FROM #tbl_counts
)
SELECT *
FROM cte
WHERE RowRank <= 3
ORDER BY sgId, CategoryCount DESC, qccId
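One thing worth checking with the query above is whether DENSE_RANK or RANK matches the intended meaning of "top 3 with ties". DENSE_RANK keeps the three highest distinct counts, while RANK behaves more like TOP 3 WITH TIES within each group, because tied rows use up positions. The sketch below compares the two on the sgId 4671 rows from the question; the table variable @counts and the aliases Rnk and DenseRnk are illustrative names.

```sql
-- Sample rows for sgId 4671, copied from the question.
DECLARE @counts TABLE (sgId int, qccId int, CategoryCount int);

INSERT INTO @counts VALUES
    (4671, 56, 4), (4671, 60, 3), (4671, 74, 3),
    (4671, 54, 3), (4671, 55, 3), (4671, 90, 3),
    (4671, 78, 2), (4671, 88, 1), (4671, 89, 1);

SELECT sgId, qccId, CategoryCount,
       -- ties share a rank and the following ranks are skipped
       RANK()       OVER (PARTITION BY sgId ORDER BY CategoryCount DESC) AS Rnk,
       -- ties share a rank and no ranks are skipped
       DENSE_RANK() OVER (PARTITION BY sgId ORDER BY CategoryCount DESC) AS DenseRnk
FROM @counts
ORDER BY sgId, CategoryCount DESC, qccId;
```

Here the row with a count of 4 gets rank 1 either way and the rows with a count of 3 get rank 2, but qccId 78 (count 2) gets Rnk 7 and DenseRnk 3. A filter of DenseRnk <= 3 would therefore keep qccId 78, while Rnk <= 3 would drop it. The expected output in the question stops at the rows with a count of 3, so RANK may be the closer fit, but that depends on what the asker intended.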
You could use TOP 1 WITH TIES, as you proposed in the title:
SELECT TOP 1 WITH TIES *
FROM #tbl_counts
ORDER BY
IIF(DENSE_RANK() OVER(PARTITION BY sgId ORDER BY CategoryCount DESC)<=3,0,1);