How to use TOP N WITH TIES alongside COUNT - tsql

I've been trying to adapt other answers here for ages without success, so here goes... I have a basic query:
SELECT
*
FROM
#tbl_counts
ORDER BY
sgId,
CategoryCount DESC,
qccId;
This shows the following results:
sgId qccId CategoryCount
------- -------- -------------
4668 18 8
4668 77 7
4668 2 6
4669 43 2
4669 46 2
4670 25 3
4670 27 3
4670 74 2
4671 56 4
4671 60 3
4671 74 3
4671 54 3
4671 55 3
4671 78 2
4671 88 1
4671 89 1
4671 90 3
I need to amend this query to show the following:
For each unique sgId value, show the top 3 CategoryCount values (with ties if they exist), and the appropriate qccId value. Therefore, the results should be:
sgId qccId CategoryCount
------- ----- -------------
4668 18 8
4668 77 7
4668 2 6 -- top 3 4668
4669 43 2
4669 46 2 -- top 2 4669 because only 2 existed
4670 25 3
4670 27 3
4670 74 2 -- top 3 4670
4671 56 4
4671 60 3
4671 74 3
4671 54 3
4671 55 3 -- top 5 4671 caused by TIES, but discards others
Usually I would ROW_NUMBER here, but am struggling because it doesn't present TIES (I don't think). Therefore on adapting other answers I've found, I've got this far but it doesn't work properly...
SELECT
cnt.*
FROM
(
SELECT DISTINCT
sgId
FROM
#tbl_counts) sg INNER JOIN
(
SELECT TOP 3 WITH TIES
*
FROM
#tbl_counts
ORDER BY
CategoryCount DESC) cnt ON cnt.sgId = sg.sgId

Look like the perfect job for the window function DENSE_RANK:
;WITH
cte AS
(
SELECT *,
DENSE_RANK() OVER (PARTITION BY sgId ORDER BY CategoryCount DESC) As RowRank
FROM #tbl_counts
)
SELECT *
FROM cte
WHERE RowRank <= 3
ORDER BY sgId, CategoryCount DESC, qccId

You could use TOP 1 WITH TIES as you proposed in title:
SELECT TOP 1 WITH TIES *
FROM #tbl_counts
ORDER BY
IIF(DENSE_RANK() OVER(PARTITION BY sgId ORDER BY CategoryCount DESC)<=3,0,1);
LiveDemo

Related

PostgreSQL : comparing two sets of results does not work

I have a table that contains 3 columns of ids, clothes, shoes, customers and relates them.
I have a query that works fine :
select clothes, shoes from table where customers = 101 (all clothes and shoes of customer 101). This returns
clothes - shoes (SET A)
1 6
1 2
33 12
24 null
Another query that works fine :
select clothes ,shoes from table
where customers in
(select customers from table where clothes = 1 and customers <> 101 ) (all clothes and shoes of any other customer than 101, with specified clothes). This returns
shoes - clothes(SET B)
6 null
null 24
1 1
2 1
12 null
null 26
14 null
Now I want to get all clothes and shoes from SET A that are not in SET B.
So (example) select from SET A where NOT IN SET B. This should return just clothes 33, right?
I try to convert this to a working query :
select clothes, shoes from table where customers = 101
and
(clothes,shoes) not in
(
select clothes,shoes from
table where customers in
(select customers from table where clothes = 1 and customers <> 101 )
) ;
I tried different syntaxes, but the above looks more logic.
Problem is I never get clothes 33, just an empty set.
How do I fix this? What goes wrong?
Thanks
Edit , here is the contents of the table
id shoes customers clothes
1 1 1 1
2 1 4 1
3 1 5 1
4 2 2 2
5 2 3 1
6 1 3 1
44 2 101 1
46 6 101 1
49 12 101 33
51 13 102
52 101 24
59 107 51
60 107 24
62 23 108 51
63 23 108 2
93 124 25
95 6 125
98 127 25
100 3 128
103 24 131
104 25 132
105 102 28
106 10 102
107 23 133
108 4 26
109 6 4
110 4 24
111 12 4
112 14 4
116 102 48
117 102 24
118 102 25
119 102 26
120 102 29
122 134 31
The except clause in PostgreSQL works the way the minus operator does in Oracle. I think that will give you what you want.
I think notionally your query looks right, but I suspect those pesky nulls are impacting your results. Just like a null is not-NOT equal to 5 (it's nothing, therefore it's neither equal to nor not equal to anything), a null is also not-NOT "in" anything...
select clothes, shoes
from table1
where customers = 101
except
select clothes, shoes
from table1
where customers in (
select customers
from table1
where clothes = 1 and customers != 101
)
For PostgreSQL null is undefined value, so You must get rid of potential nulls in your result:
select id,clothes,shoes from t1 where customers = 101 -- or select id...
and (
clothes not in
(
select COALESCE(clothes,-1) from
t1 where customers in
(select customers from t1 where clothes = 1 and customers <> 101 )
)
OR
shoes not in
(
select COALESCE(shoes,-1) from
t1 where customers in
(select customers from t1 where clothes = 1 and customers <> 101 )
)
)
if You wanted unique pairs you would use:
select clothes, shoes from t1 where customers = 101
and
(clothes,shoes) not in
(
select coalesce(clothes,-1),coalesce(shoes,-1) from
t1 where customers in
(select customers from t1 where clothes = 1 and customers <> 101 )
) ;
You can't get "clothes 33" if You are selecting both clothes and shoes columns...
Also if u need to know exactly which column, clothes or shoes was unique to this customer, You might use this little "hack":
select id,clothes,-1 AS shoes from t1 where customers = 101
and
clothes not in
(
select COALESCE(clothes,-1) from
t1 where customers in
(select customers from t1 where clothes = 1 and customers <> 101)
)
UNION
select id,-1,shoes from t1 where customers = 101
and
shoes not in
(
select COALESCE(shoes,-1) from
t1 where customers in
(select customers from t1 where clothes = 1 and customers <> 101)
)
And Your result would be:
id=49, clothes=33, shoes=-1
(I assume that there aren't any clothes or shoes with id -1, You may put any exotic value here)
Cheers

Flatten one column keeping others in POSTGRESQL

I created a view which is presently giving the data like this:
practice_name message_type message_count
CHC ALOG_SYNC 1
CHC BULKNT 0
CHC PIE_SYNC 1
CHC PPRV_SYNC 1
CHC SYNC_PRACT 3
CHC SYNC_PROV 9
CHC SYNC_WTXT 3
CHC SYNC_XYZ 0
Midtown ALOG_SYNC 0
Midtown BULKNT 0
Midtown PIE_SYNC 0
Midtown PPRV_SYNC 0
Midtown SYNC_PRACT 3
Midtown SYNC_PROV 0
Midtown SYNC_WTXT 3
Midtown SYNC_XYZ 0
NextGen MedicalPractice ALOG_SYNC 0
NextGen MedicalPractice BULKNT 1
NextGen MedicalPractice PIE_SYNC 0
NextGen MedicalPractice PPRV_SYNC 0
NextGen MedicalPractice SYNC_PRACT 3
NextGen MedicalPractice SYNC_PROV 591
NextGen MedicalPractice SYNC_WTXT 3
NextGen MedicalPractice SYNC_XYZ 0
My View:
CREATE OR REPLACE VIEW sha.sha_export_queue_view AS
SELECT q3.practice_name,
q3.message_type,
q3.share_site_org_key,
COALESCE(q2.message_count, '0'::text) AS message_count
FROM ( SELECT q1.practice_name,
mt.message_type,
q1.share_site_org_key
FROM sha.message_types mt,
( SELECT DISTINCT jsonb_array_elements((ai.result_json -> 'Patient Portal Operational Information'::text) -> 'nxmd_export contents by message type'::text) ->> 'Practice Name'::text AS practice_name,
ai.share_site_org_key
FROM sha.sha_share_site_view ssv
LEFT JOIN ( SELECT mytable2.assessment_id,
mytable2.result_json,
mytable2.share_site_org_key,
mytable2.rnk
FROM ( SELECT assessment_info.assessment_id,
assessment_info.result_json,
assessment_info.share_site_org_key,
dense_rank() OVER (PARTITION BY assessment_info.share_site_org_key ORDER BY assessment_info.modified_datetime DESC) AS rnk
FROM sha.assessment_info
WHERE assessment_info.assessment_id = 8::numeric) mytable2
WHERE mytable2.rnk = 1) ai ON ssv.share_site_org_key = ai.share_site_org_key) q1) q3
LEFT JOIN ( SELECT jsonb_array_elements((ai.result_json -> 'Patient Portal Operational Information'::text) -> 'nxmd_export contents by message type'::text) ->> 'Practice Name'::text AS practice_name,
jsonb_array_elements((ai.result_json -> 'Patient Portal Operational Information'::text) -> 'nxmd_export contents by message type'::text) ->> 'Message Type'::text AS message_type,
jsonb_array_elements((ai.result_json -> 'Patient Portal Operational Information'::text) -> 'nxmd_export contents by message type'::text) ->> 'Message Count'::text AS message_count
FROM sha.sha_share_site_view ssv
LEFT JOIN ( SELECT mytable2.assessment_id,
mytable2.result_json,
mytable2.share_site_org_key,
mytable2.rnk
FROM ( SELECT assessment_info.assessment_id,
assessment_info.result_json,
assessment_info.share_site_org_key,
dense_rank() OVER (PARTITION BY assessment_info.share_site_org_key ORDER BY assessment_info.modified_datetime DESC) AS rnk
FROM sha.assessment_info
WHERE assessment_info.assessment_id = 8::numeric) mytable2
WHERE mytable2.rnk = 1) ai ON ssv.share_site_org_key = ai.share_site_org_key) q2 ON q3.message_type::text = q2.message_type AND q3.practice_name = q2.practice_name
ORDER BY q3.practice_name;
I want the second column to be flattened:
Practice Time Stamp <<message type 1>> <<message type 2>> <<message type 3>> <<message type 4 >> <<message type 5>> <<message type 6>> <<message type 7>> <<message type 8>>
Practice Name 1 21-12-2016 10:00 23 25 27 29 31 33 35 37
Practice Name 2 21-12-2016 10:00 24 26 28 30 32 34 36 38
Practice Name 3 21-12-2016 13:00 25 27 29 31 33 35 37 39
Practice Name 4 21-12-2016 13:00 26 28 30 32 34 36 38 40
Practice Name 5 24-12-2016 13:00 27 29 31 33 35 37 39 41
Practice Name 6 27-12-2016 13:00 28 30 32 34 36 38 40 42
Practice Name 7 30-12-2016 13:00 29 31 33 35 37 39 41 43
Practice Name 8 02-01-2017 13:00 30 32 34 36 38 40 42 44
Practice Name 1 05-01-2017 13:00 31 33 35 37 39 41 43 45
Practice Name 2 08-01-2017 13:00 32 34 36 38 40 42 44 46
Practice Name 3 11-01-2017 13:00 33 35 37 39 41 43 45 47
Is there any way I can achieve that?
Sorry for the little alignment issue.
The values are corresponding message type values
Sample for query (idea in comments to question):
SELECT
practice_name,
sum("ALOG_SYNC") AS "ALOG_SYNC",
sum("BULKNT") AS "BULKNT",
...
FROM (
SELECT
practice_name,
CASE WHEN q3.message_type = 'ALOG_SYNC' THEN sum(message_count) END AS "ALOG_SYNC",
CASE WHEN q3.message_type = 'BULKNT' THEN sum(message_count) END AS "BULKNT"
FROM
<your from + where clause>
) AS A
GROUP BY 1
Probably your query might be optimised.
Or you can use crosstab function (https://www.postgresql.org/docs/current/static/tablefunc.html)

How can I evaluate data over time in Postgresql?

I need to find users who have posted three times or more, three months in a row. I wrote this query:
select count(id), owneruserid, extract(month from creationdate) as postmonth from posts
group by owneruserid, postmonth
having count(id) >=3
order by owneruserid, postmonth
And I get this:
count owneruserid postmonth
36 -1 1
23 -1 2
45 -1 3
41 -1 4
18 -1 5
24 -1 6
31 -1 7
78 -1 8
83 -1 9
17 -1 10
88 -1 11
127 -1 12
3 6 11
3 7 12
4 8 1
8 8 12
4 12 4
3 12 5
3 22 2
4 22 4
(truncated)
Which is great. How can I query for users who posted three times or more, three months or more in a row? Thanks.
This is called the Islands and Gaps problem, specifically it's an Island problem with a date range. You should,
Fix this question up.
Flag it to be sent to dba.stackexchange.com
To solve this,
Create a pseudo column with a window that has 1 if the row preceding it does not correspond to the preceding mont
Create groups out of that with COUNT()
Check to make sure the count(*) for the group is greater than or equal to three.
Query,
SELECT l.id, creationdaterange, count(*)
FROM (
SELECT t.id,
t.creationdate,
count(range_reset) OVER (PARTITION BY t.id ORDER BY creationdate) AS creationdaterange
FROM (
SELECT id,
creationdate,
CASE
WHEN date_trunc('month',creationdate::date)::date - interval '1 month' = date_trunc('month',lag(creationdate))::date OVER (PARTITION BY id ORDER BY creationdate)
THEN 1
END AS range_reset
FROM post
ORDER BY id, creationdate
) AS t;
) AS l
GROUP BY t.id, creationdaterange
HAVING count(*) >= 3;

Subselect and Max

Alright, I've been trying to conceptualize this for a better part of the afternoon and still cannot figure out how to structure this subselect.
The data that I need to report are ages for a given student major grouped by the past 3 fiscal years. Each fiscal year has 3 semesters (summer, fall, spring). I need to have my query grouped on the fiscalyear and agerange fields and then count the distinct student id's.
I currently have this for my SQL statement:
Select COUNT(distinct StuID), AgeRange, FiscalYear
from tblStatic
where Campus like 'World%' and (enrl_act like 'REG%' or enrl_act like 'SCH%')
and StuMaj = 'LAWSC' and FiscalYear IN ('09/10', '10/11', '11/12')
group by FiscalYear, AgeRange
order by FiscalYear, AgeRange
So this is all fine and dandy except it doesn't match my headcount of students for the fiscalyear. The reason being, that people may cross over in the age ranges during the fiscal year and is adding them to my count twice.
How can I use a subselect to resolve this duplicate entry? The field I have been trying to get working is my semester field and using a max to find the max semester during a fiscalyear for a given student.
Data Sample:
Count AgeRange FiscalYear
3 1 to 19 09/10
20 20 to 23 09/10
60 24 to 29 09/10
96 30 to 39 09/10
34 40 to 49 09/10
14 50 to 59 09/10
3 60+ 09/10
2 1 to 19 10/11
24 20 to 23 10/11
73 24 to 29 10/11
109 30 to 39 10/11
43 40 to 49 10/11
11 50 to 59 10/11
2 60+ 10/11
1 1 to 19 11/12
17 20 to 23 11/12
75 24 to 29 11/12
123 30 to 39 11/12
44 40 to 49 11/12
14 50 to 59 11/12
2 60+ 11/12
Solution: (Just got this working and produced my headcounts that match what they are suppose to be)
Select COUNT(distinct S.StuID), AR.AgeRange, S.FiscalYear
from tblStatic S
INNER JOIN
( Select S.StuID, MIN(AgeRange) as AgeRange
From tblStatic S
Group By S.StuID) AR on S.StuID=AR.StuID
where Campus like 'World%' and (enrl_act like 'REG%' or
enrl_act like 'SCH%')
and StuMaj = 'LAWSC' and FiscalYear IN ('09/10', '10/11', '11/12')
group by S.FiscalYear, AR.AgeRange
order by S.FiscalYear, AR.AgeRange
Replace each student's age range with its maximum (or minimum, if you like) age range that fiscal year, then count them:
;
WITH sourceData AS (
SELECT
StudID,
MaxAgeRangeThisFiscalYear = MAX(AgeRange) OVER
(PARTITION BY StudID, FiscalYear),
FiscalYear
FROM tblStatic
WHERE Campus LIKE 'World%'
AND (enrl_act LIKE 'REG%' OR enrl_act LIKE 'SCH%')
AND StuMaj = 'LAWSC'
AND FiscalYear IN ('09/10', '10/11', '11/12')
)
SELECT
FiscalYear,
AgeRange = MaxAgeRangeThisFiscalYear,
Count = COUNT(DISTINCT StudID)
FROM sourceData
GROUP BY
FiscalYear,
MaxAgeRangeThisFiscalYear
ORDER BY
FiscalYear,
MaxAgeRangeThisFiscalYear

Extract Unique Time Slices in Oracle

I use Oracle 10g and I have a table that stores a snapshot of data on a person for a given day. Every night an outside process adds new rows to the table for any person whose had any changes to their core data (stored elsewhere). This allows a query to be written using a date to find out what a person 'looked' like on some past day. A new row is added to the table even if only a single aspect of the person has changed--the implication being that many columns have duplicate values from slice to slice since not every detail changed in each snapshot.
Below is a data sample:
SliceID PersonID StartDt Detail1 Detail2 Detail3 Detail4 ...
1 101 08/20/09 Red Vanilla N 23
2 101 08/31/09 Orange Chocolate N 23
3 101 09/15/09 Yellow Chocolate Y 24
4 101 09/16/09 Green Chocolate N 24
5 102 01/10/09 Blue Lemon N 36
6 102 01/11/09 Indigo Lemon N 36
7 102 02/02/09 Violet Lemon Y 36
8 103 07/07/09 Red Orange N 12
9 104 01/31/09 Orange Orange N 12
10 104 10/20/09 Yellow Orange N 13
I need to write a query that pulls out time slices records where some pertinent bits, not the whole record, have changed. So, referring to the above, if I only want to know the slices in which Detail3 has changed from its previous value, then I would expect to only get rows having SliceID 1, 3 and 4 for PersonID 101 and SliceID 5 and 7 for PersonID 102 and SliceID 8 for PersonID 103 and SliceID 9 for PersonID 104.
I'm thinking I should be able to use some sort of Oracle Hierarchical Query (using CONNECT BY [PRIOR]) to get what I want, but I have not figured out how to write it yet. Perhaps YOU can help.
Thanks you for your time and consideration.
Here is my take on the LAG() solution, which is basically the same as that of egorius, but I show my workings ;)
SQL> select * from
2 (
3 select sliceid
4 , personid
5 , startdt
6 , detail3 as new_detail3
7 , lag(detail3) over (partition by personid
8 order by startdt) prev_detail3
9 from some_table
10 )
11 where prev_detail3 is null
12 or ( prev_detail3 != new_detail3 )
13 /
SLICEID PERSONID STARTDT N P
---------- ---------- --------- - -
1 101 20-AUG-09 N
3 101 15-SEP-09 Y N
4 101 16-SEP-09 N Y
5 102 10-JAN-09 N
7 102 02-FEB-09 Y N
8 103 07-JUL-09 N
9 104 31-JAN-09 N
7 rows selected.
SQL>
The point about this solution is that it hauls in results for 103 and 104, who don't have slice records where detail3 has changed. If that is a problem we can apply an additional filtration, to return only rows with changes:
SQL> with subq as (
2 select t.*
3 , row_number () over (partition by personid
4 order by sliceid ) rn
5 from
6 (
7 select sliceid
8 , personid
9 , startdt
10 , detail3 as new_detail3
11 , lag(detail3) over (partition by personid
12 order by startdt) prev_detail3
13 from some_table
14 ) t
15 where t.prev_detail3 is null
16 or ( t.prev_detail3 != t.new_detail3 )
17 )
18 select sliceid
19 , personid
20 , startdt
21 , new_detail3
22 , prev_detail3
23 from subq sq
24 where exists ( select null from subq x
25 where x.personid = sq.personid
26 and x.rn > 1 )
27 order by sliceid
28 /
SLICEID PERSONID STARTDT N P
---------- ---------- --------- - -
1 101 20-AUG-09 N
3 101 15-SEP-09 Y N
4 101 16-SEP-09 N Y
5 102 10-JAN-09 N
7 102 02-FEB-09 Y N
SQL>
edit
As egorius points out in the comments, the OP does want hits for all users, even if they haven't changed, so the first version of the query is the correct solution.
In addition to OMG Ponies' answer: if you need to query slices for all persons, you'll need partition by:
SELECT s.sliceid
, s.personid
FROM (SELECT t.sliceid,
t.personid,
t.detail3,
LAG(t.detail3) OVER (
PARTITION BY t.personid ORDER BY t.startdt
) prev_val
FROM t) s
WHERE (s.prev_val IS NULL OR s.prev_val != s.detail3)
I think you'll have better luck with the LAG function:
SELECT s.sliceid
FROM (SELECT t.sliceid,
t.personid,
t.detail3,
LAG(t.detail3) OVER (PARTITION BY t.personid ORDER BY t.startdt) 'prev_val'
FROM TABLE t) s
WHERE s.personid = 101
AND (s.prev_val IS NULL OR s.prev_val != s.detail3)
Subquery Factoring alternative:
WITH slices AS (
SELECT t.sliceid,
t.personid,
t.detail3,
LAG(t.detail3) OVER (PARTITION BY t.personid ORDER BY t.startdt) 'prev_val'
FROM TABLE t)
SELECT s.sliceid
FROM slices s
WHERE s.personid = 101
AND (s.prev_val IS NULL OR s.prev_val != s.detail3)