Can this query be solved using something besides 2 CTEs? - tsql

I’m writing a query against a table of fictional insurance claims called CLAIMS, using RANDOMLY GENERATED FICTIONAL NAMES AND DATA.
There are 5 distinct categories in the column called PRIMARY_DX:
Alcoholism, Anxiety Disorder, Depression, Psychosis, Substance Use Disorder
The other main columns are PATIENT_ID and CLAIM_PAID_AMT
I want to sum up the CLAIM_PAID_AMT per PATIENT per PRIMARY_DX and list only the top 5 patients who have the highest sum per PRIMARY_DX
The only way I could think to do this was with two Common Table Expressions, where in CTE1 I partition by PRIMARY_DX and PATIENT_ID and SUM the CLAIM_PAID_AMT for each PATIENT.
Then in CTE2 I use a ROW_NUMBER function on CTE1, to partition by PRIMARY_DX and sort by the TotalClaims DESC and select the top 5 from each PRIMARY_DX.
I’ve been writing SQL for less than 2 years and was wondering if this could be accomplished in one CTE or perhaps with some form of Cross Apply?
I’m including my code and the output below.
;WITH CTE1 AS
(
select PRIMARY_DX, PATIENT_ID, TotalClaims = SUM(CLAIM_PAID_AMT)
OVER (PARTITION BY PRIMARY_DX, PATIENT_ID ORDER BY PATIENT_ID, CLAIM_PAID_AMT DESC)
from claims
)
,
CTE2 AS
(SELECT *, RowCounter = ROW_NUMBER() OVER (PARTITION BY PRIMARY_DX ORDER BY TotalClaims DESC) FROM CTE1)
select CTE2.PRIMARY_DX, CTE2.TotalClaims from CTE2
where RowCounter <= 5
order by CTE2.PRIMARY_DX, CTE2.TotalClaims DESC
Alcoholism 3737.51 Joe Smith
Alcoholism 3282.07 Suzie Homemaker
Alcoholism 3207.72 Joey Strummer
Alcoholism 3040.52 Rusty Nailfile
Alcoholism 2997.02 Big Ben
Anxiety Disorder 3291.14 Norman Pigsty
Anxiety Disorder 3113.05 Billy Bob
Anxiety Disorder 3101.13 Rachel Antarctica
Anxiety Disorder 3058.52 John John
Anxiety Disorder 3021.98 Kathy Europa
Depression 3466.14 Freda Beagallly
Depression 3279.25 Ron Jeremize
Depression 3140.43 Sharon Sharonaz
Depression 3119.26 Allie Kat
Depression 3118.54 Biff Biffstoferson
Psychosis 3098.13 James Monopoly
Psychosis 2991.23 Leon Erroneously
Psychosis 2857.69 Lucie Ratched-McMurphy
Psychosis 2678.88 Billy Bibbitz
Psychosis 2602.24 Sam Zypperzsky
Substance Use Disorder 3435.27 Donald Duckaronawitz
Substance Use Disorder 3300.33 Mickey Mousetrap
Substance Use Disorder 3285.41 Hector Heathercoatz
Substance Use Disorder 3179 Erin GoBragh
Substance Use Disorder 3147.09 Bono Edgerstein

You should only need one sub-query or CTE since you can use the aggregate within the ROW_NUMBER().
Here is an approach using the sub-query:
SELECT *
FROM (
    SELECT PRIMARY_DX, PATIENT_ID, SUM(CLAIM_PAID_AMT) AS CLAIM_PAID_AMT,
           ROW_NUMBER() OVER (PARTITION BY PRIMARY_DX ORDER BY SUM(CLAIM_PAID_AMT) DESC) AS RowId
    FROM Claims
    GROUP BY PRIMARY_DX, PATIENT_ID
) T
WHERE RowId <= 5
And if you prefer a CTE:
;WITH CTE AS (
    SELECT PRIMARY_DX, PATIENT_ID, SUM(CLAIM_PAID_AMT) AS CLAIM_PAID_AMT,
           ROW_NUMBER() OVER (PARTITION BY PRIMARY_DX ORDER BY SUM(CLAIM_PAID_AMT) DESC) AS RowId
    FROM Claims
    GROUP BY PRIMARY_DX, PATIENT_ID
)
SELECT * FROM CTE WHERE RowId <= 5
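To see the single-subquery pattern work end to end, here is a minimal sketch that runs it from Python against an in-memory SQLite database. SQLite only stands in for SQL Server here, the handful of rows and the patient ids P1..P5 are made up for illustration, and the cutoff is 2 instead of 5 so the toy data stays small. The point is that GROUP BY produces one row per diagnosis and patient, and ROW_NUMBER() can order by the aggregate directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Claims (PRIMARY_DX TEXT, PATIENT_ID TEXT, CLAIM_PAID_AMT REAL)"
)
# Hypothetical sample rows, not taken from the question's data.
conn.executemany(
    "INSERT INTO Claims VALUES (?, ?, ?)",
    [
        ("Anxiety Disorder", "P1", 100.0),
        ("Anxiety Disorder", "P1", 50.0),
        ("Anxiety Disorder", "P2", 120.0),
        ("Anxiety Disorder", "P3", 30.0),
        ("Depression", "P1", 10.0),
        ("Depression", "P4", 70.0),
        ("Depression", "P4", 5.0),
        ("Depression", "P5", 60.0),
    ],
)

top_n = 2  # the question uses 5

# One derived table: aggregate per (diagnosis, patient), then rank the
# aggregated totals within each diagnosis.
top_rows = conn.execute(
    """
    SELECT PRIMARY_DX, PATIENT_ID, TotalClaims
    FROM (
        SELECT PRIMARY_DX, PATIENT_ID,
               SUM(CLAIM_PAID_AMT) AS TotalClaims,
               ROW_NUMBER() OVER (PARTITION BY PRIMARY_DX
                                  ORDER BY SUM(CLAIM_PAID_AMT) DESC) AS RowId
        FROM Claims
        GROUP BY PRIMARY_DX, PATIENT_ID
    ) AS T
    WHERE RowId <= ?
    ORDER BY PRIMARY_DX, TotalClaims DESC
    """,
    (top_n,),
).fetchall()

for row in top_rows:
    print(row)
```

If two patients tie on the total, ROW_NUMBER() picks one arbitrarily; RANK() or DENSE_RANK() would keep both, at the cost of possibly returning more than N rows per diagnosis.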

Related

select first order for each customer from two tables

Hi guys, I have two tables: dbo.Sales (customer_id, order_date, product_id) and dbo.Menu (product_id, product_name, price). The question is:
What was the first item from the menu purchased by each customer?
My solution is
select A.customer_id,m.product_id, m.product_name
from dbo.menu m
cross apply
(select top 1 * from dbo.sales s
where s.product_id=m.product_id
group by s.customer_id,s.order_date, s.product_id
order by s.order_date) A
customer_id product_id product_name
A 1 sushi
A 2 curry
C 3 ramen
Customer B is missing. Instead of B, it gives me a second first order for A.
I need one row for each customer.
Murat
You could use a ROW_NUMBER() window function to get the earliest product_id per customer and then join to the Menu table to get your product details.
Edit: Updated ORDER to ASC.
;with cte
as (
    select customer_id, product_id,
           row_number() over (partition by customer_id order by order_date asc) RN
    from dbo.Sales)
select c.customer_id, c.product_id, m.product_name
from cte c
join dbo.menu m on c.product_id=m.product_id
where RN = 1
SELECT distinct s.customer_id,
FIRST_VALUE(m.product_name) OVER (partition by s.customer_id order by order_date )
as FirstItem_Customer
FROM [dbo].[sales] S
join [dbo].[menu] M on M.product_id=s.product_id
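As a rough check of the ROW_NUMBER() approach, the sketch below runs it from Python on an in-memory SQLite database standing in for SQL Server. The sample rows, dates and prices are invented for illustration. It also adds product_id as a tie-breaker in the window ORDER BY, which is an assumption on my part: without one, a customer with two orders on the same first date gets an arbitrary pick.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Sales (customer_id TEXT, order_date TEXT, product_id INTEGER);
    CREATE TABLE Menu  (product_id INTEGER, product_name TEXT, price INTEGER);

    -- Hypothetical sample data.
    INSERT INTO Menu VALUES (1, 'sushi', 10), (2, 'curry', 15), (3, 'ramen', 12);
    INSERT INTO Sales VALUES
        ('A', '2021-01-01', 1), ('A', '2021-01-07', 2),
        ('B', '2021-01-01', 2), ('B', '2021-01-04', 1),
        ('C', '2021-01-01', 3), ('C', '2021-01-07', 3);
    """
)

# Number each customer's orders from earliest to latest, keep the first,
# then join to Menu for the product name.
first_orders = conn.execute(
    """
    WITH ranked AS (
        SELECT customer_id, product_id,
               ROW_NUMBER() OVER (PARTITION BY customer_id
                                  ORDER BY order_date ASC, product_id ASC) AS rn
        FROM Sales
    )
    SELECT r.customer_id, r.product_id, m.product_name
    FROM ranked AS r
    JOIN Menu AS m ON m.product_id = r.product_id
    WHERE r.rn = 1
    ORDER BY r.customer_id
    """
).fetchall()

for row in first_orders:
    print(row)
```

The partition is by customer, not by product, which is why every customer shows up exactly once; the original CROSS APPLY was driven from Menu and therefore returned one row per product instead.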

Concat Names against row_number() or similar function

my data repeats rows for individual relationships between people. For example, the below states that John Smith is known by 3 employees:
Person EmployeeWhoKnowsPerson
John Smith Derek Jones
John Smith Adrian Daniels
John Smith Peter Low
I am looking to do the following:
1) Count the number of people who know John Smith. I have done this via the row_number() function and it appears to be behaving:
select Person, MAX(rowrank) as rowrank
from (
select Person, EmployeeWhoKnowsPerson, rowrank=ROW_NUMBER() over (partition by Person order by EmployeeWhoKnowsPerson desc)
from Data
) as t
group by Person
Which returns:
Person rowrank
John Smith 3
But now I am looking to concatenate the EmployeeWhoKnowsPerson column and was wondering how this might be done. The desired result is:
Person rowrank EmployeesWhoKnow
John Smith 3 Derek Jones, Adrian Daniels, Peter Low
For SQL Server 2017 +
select
person,
count(*) as KnowsCount,
string_agg(EmployeeWhoKnowsPerson, ',') WITHIN GROUP (ORDER BY EmployeeWhoKnowsPerson ASC) AS EmployeesWhoKnowPerson
from
data
group by person;
For prior versions:
select
person,
count(*) as KnowsCount,
stuff((select ',' + EmployeeWhoKnowsPerson
from data as dd
where dd.Person = d.Person
order by EmployeeWhoKnowsPerson
for xml path('')), 1, 1, '') AS EmployeesWhoKnowPerson
from
data as d
group by person;
And you're overthinking that whole count of who knows piece.
Here's a SQL Fiddle Demo with an extra name thrown in.
If 2017+, you can use string_agg() in a simple group by
Example
Declare @YourTable Table ([Person] varchar(50),[EmployeeWhoKnowsPerson] varchar(50))
Insert Into @YourTable Values
 ('John Smith','Derek Jones')
,('John Smith','Adrian Daniels')
,('John Smith','Peter Low')

Select Person
      ,rowrank = sum(1)
      ,[EmployeeWhoKnowsPerson] = string_agg([EmployeeWhoKnowsPerson],', ')
 From @YourTable
 Group By Person
Returns
Person rowrank EmployeeWhoKnowsPerson
John Smith 3 Derek Jones, Adrian Daniels, Peter Low
If <2017 ... use the stuff()/xml approach
Select Person
      ,rowrank = sum(1)
      ,[EmployeeWhoKnowsPerson] = stuff((Select ', ' + [EmployeeWhoKnowsPerson]
                                          From @YourTable
                                          Where Person=A.Person
                                          For XML Path ('')),1,2,'')
 From @YourTable A
 Group By Person
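The shape of the grouped query (one COUNT and one string aggregate per person, no ROW_NUMBER needed) can be tried out quickly from Python. This is only a sketch: it uses SQLite's group_concat() as a stand-in for SQL Server's STRING_AGG(), and the extra person "Jane Doe" is a made-up row. SQLite does not promise an order for the concatenated names here, so the code sorts them afterwards rather than relying on it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Data (Person TEXT, EmployeeWhoKnowsPerson TEXT)")
conn.executemany(
    "INSERT INTO Data VALUES (?, ?)",
    [
        ("John Smith", "Derek Jones"),
        ("John Smith", "Adrian Daniels"),
        ("John Smith", "Peter Low"),
        ("Jane Doe", "Derek Jones"),  # hypothetical extra person
    ],
)

# A plain GROUP BY gives both the count and the concatenated list.
# group_concat() plays the role of STRING_AGG() in this SQLite sketch.
known_by = conn.execute(
    """
    SELECT Person,
           COUNT(*) AS KnowsCount,
           group_concat(EmployeeWhoKnowsPerson, ', ') AS EmployeesWhoKnow
    FROM Data
    GROUP BY Person
    ORDER BY Person
    """
).fetchall()

for person, knows_count, employees in known_by:
    # Sort in Python because the aggregate's internal order is unspecified.
    print(person, knows_count, sorted(employees.split(", ")))
```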

Make a column values header for rest of columns using TSQL

I have the following table:
ID | Group    | Type   | Product
1  | Dairy    | Milk   | Fresh Milk
2  | Dairy    | Butter | Butter Cream
3  | Beverage | Coke   | Coca cola
4  | Beverage | Diet   | Dew
5  | Beverage | Juice  | Fresh Juice
I need the following output/query result:
ID | Group    | Type   | Product
1  | Dairy    |        |
1  |          | Milk   | Fresh Milk
2  |          | Butter | Butter Cream
2  | Beverage |        |
1  |          | Coke   | Coca cola
2  |          | Diet   | Dew
3  |          | Juice  | Fresh Juice
For the sample above a hard-coded script would do the job, but I am looking for a dynamic script that works for any number of groups. I have no idea how to approach this, so I do not have a sample query yet; I need ideas or examples to get me started. PIVOT looks like a close option, but it does not seem to fit this case fully.
Here's a possible way. It basically unions the "Group-Headers" and the "Group-Items". The difficulty was to order them correctly.
WITH CTE AS
(
SELECT ID,[Group],Type,Product,
ROW_NUMBER() OVER (PARTITION BY [Group] Order By ID)AS RN
FROM Drink
)
SELECT ID,[Group],Type,Product
FROM(
SELECT RN AS ID,[Group],[Id]AS OriginalId,'' As Type,'' As Product, 0 AS RN, 'Group' As RowType
FROM CTE WHERE RN = 1
UNION ALL
SELECT RN AS ID,'' AS [Group],[Id]AS OriginalId,Type,Product, RN, 'Item' As RowType
FROM CTE
)X
ORDER BY OriginalId ASC
, CASE WHEN RowType='Group' THEN 0 ELSE 1 END ASC
, RN ASC
Here's a demo-fiddle: http://sqlfiddle.com/#!6/ed6ca/2/0
A slightly simplified approach:
With Groups As
(
Select Distinct Min(Id) As Id, [Group], '' As [Type], '' As Product
From dbo.Source
Group By [Group]
)
Select Coalesce(Cast(Z.Id As varchar(10)),'') As Id
, Coalesce(Z.[Group],'') As [Group]
, Z.[Type], Z.Product
From (
Select Id As Sort, Id, [Group], [Type], Product
From Groups
Union All
Select G.Id, Null, Null, S.[Type], S.Product
From dbo.Source As S
Join Groups As G
On G.[Group] = S.[Group]
) As Z
Order By Sort
It should be noted that the use of Coalesce is purely for aesthetic reasons. You could simply return null in these cases.
SQL Fiddle
And an approach with ROW_NUMBER:
IF OBJECT_ID('dbo.grouprows') IS NOT NULL DROP TABLE dbo.grouprows;
CREATE TABLE dbo.grouprows(
ID INT,
Grp NVARCHAR(MAX),
Type NVARCHAR(MAX),
Product NVARCHAR(MAX)
);
INSERT INTO dbo.grouprows VALUES
(1,'Dairy','Milk','Fresh Milk'),
(2,'Dairy','Butter','Butter Cream'),
(3,'Beverage','Coke','Coca cola'),
(4,'Beverage','Diet','Dew'),
(5,'Beverage','Juice','Fresh Juice');
SELECT
CASE WHEN gg = 0 THEN dr1 END GrpId,
CASE WHEN gg = 1 THEN rn1 END TypeId,
ISNULL(Grp,'')Grp,
CASE WHEN gg = 1 THEN Type ELSE '' END Type,
CASE WHEN gg = 1 THEN Product ELSE '' END Product
FROM(
SELECT *,
DENSE_RANK()OVER(ORDER BY Grp DESC) dr1
FROM(
SELECT *,
ROW_NUMBER()OVER(PARTITION BY Grp ORDER BY type,gg) rn1,
ROW_NUMBER()OVER(ORDER BY type,gg) rn0
FROM(
SELECT Grp,Type,Product, GROUPING(Grp) gg, GROUPING(type) tg FROM dbo.grouprows
GROUP BY Product, Type, Grp
WITH ROLLUP
)X1
WHERE tg = 0
)X2
WHERE gg=1 OR rn1 = 1
)X3
ORDER BY rn0

TSQL invalid HAVING count

I am using SSMS 2008 and trying to use a HAVING clause. This should be a really simple query. However, I am only getting one record returned even though there are numerous duplicates.
Am I doing something wrong with the HAVING statement here? Or is there some other function that I could use instead?
select
address_desc,
people_id
from
dbo.address_view
where people_id is not NULL
group by people_id , address_desc
having count(*) > 1
sample data from address_view:
address_desc                          people_id
------------------------------------  ------------------------------------
Murfreesboro, TN 37130                F15D1135-9947-4F66-B778-00E43EC44B9E
11 Mohawk Rd., Burlington, MA 01803   C561918F-C2E9-4507-BD7C-00FB688D2D6E
Unknown, UN 00000                     C561918F-C2E9-4507-BD7C-00FB688D2D6E
Jacksonville, NC 28546                FC7C78CD-8AEA-4C8E-B93D-010BF8E4176D
Memphis, TN 38133                     8ED8C601-5D35-4EB7-9217-012905D6E9F1
44 Maverick St., Fitchburg, MA        8ED8C601-5D35-4EB7-9217-012905D6E9F1
The GROUP BY is going to lump your duplicates together into a single row.
I think instead, you want to find all people_id values with duplicate address_desc:
SELECT a.address_desc, a.people_id
FROM dbo.address_view a
INNER JOIN (SELECT address_desc
FROM dbo.address_view
GROUP BY address_desc
HAVING COUNT(*) > 1) t
ON a.address_desc = t.address_desc
Using ROW_NUMBER() with PARTITION BY, you can find the duplicate occurrences: they are the rows where row_num > 1.
select address_desc,
people_id,
row_num
from
(
select
address_desc,
people_id,
row_number() over (partition by address_desc order by address_desc) row_num
from
dbo.address_view
where people_id is not NULL
) x
where row_num>1
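A third variant, not taken from the answers above, is a windowed COUNT(*): it returns every row that shares an address_desc, including the first occurrence, without a self-join. The sketch below tries it from Python on an in-memory SQLite table that stands in for dbo.address_view; the addresses and the short ids p1..p4 are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE address_view (address_desc TEXT, people_id TEXT)")
# Hypothetical rows: one address shared by two people, one NULL people_id.
conn.executemany(
    "INSERT INTO address_view VALUES (?, ?)",
    [
        ("1 Main St", "p1"),
        ("1 Main St", "p2"),
        ("2 Oak Ave", "p3"),
        ("2 Oak Ave", None),
        ("3 Elm Rd", "p4"),
    ],
)

# COUNT(*) OVER attaches the per-address row count to every row, so the
# outer filter keeps all members of a duplicated address.
duplicates = conn.execute(
    """
    SELECT address_desc, people_id
    FROM (
        SELECT address_desc, people_id,
               COUNT(*) OVER (PARTITION BY address_desc) AS cnt
        FROM address_view
        WHERE people_id IS NOT NULL
    ) AS x
    WHERE cnt > 1
    ORDER BY address_desc, people_id
    """
).fetchall()

for row in duplicates:
    print(row)
```

Note that "2 Oak Ave" drops out because its second row has a NULL people_id and is filtered before counting, which mirrors the WHERE clause in the question.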

Grouping SQL results by continuous time intervals (oracle sql)

I have data in the table below and I am looking for a way to group the continuous time intervals for each ID:
CREATE TABLE DUMMY
(
ID VARCHAR2(10 BYTE),
TIME_STAMP VARCHAR2(8 BYTE),
NAME VARCHAR2(255 BYTE)
);
SELECT ID, min(TIME_STAMP) "startDate", max(TIME_STAMP) "endDate", NAME
FROM DUMMY
GROUP BY ID, NAME
The result should look something like:
100 20011128 20011203 David
100 20011204 20011207 Unknown
100 20011208 20011215 David
100 20011216 20011220 Sara
and so on ...
P.S. I have a sample script, but I don't know how to attach my file.
Hi everyone, here is more input:
There is only one record per time_stamp for a specific ID.
Users can differ: for example, day 1 David, day 2 Unknown, day 3 David and so on.
So there is one row for every day of the year for each ID, but with different users.
Now I want to see the break points, i.e. where the name changes across the time_stamp intervals,
from the first day until the last day for a specific ID, in day order.
The query result should be:
ID NAME MIN_DATE MAX_DATE
100 David 20011128 20050407
100 Sara 20050408 20050417
100 David 20050418 20080416
100 Unknown 20080417 20080507
100 David 20080508 20080508
100 Unknown 20080509 20080607
100 David 20080608 20080608
100 Unknown 20080609 20080921
100 David 20080922 20080922
100 Unknown 20080923 20081231
100 David 20090101 20090405
thanks
Hi again, many thanks to everyone. I have solved the problem; here is the solution:
select id, min(time_stamp), max(time_stamp), name
from ( select id, time_stamp, name,
              max(rn) over (order by time_stamp) grp
       from ( select id, time_stamp, name,
                     case
                       when lag(name) over (order by time_stamp) <> name or
                            row_number() over (order by time_stamp) = 1
                       then row_number() over (order by time_stamp)
                     end rn
              from dummy
            )
     )
group by id, grp, name
order by 1, 2
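Another common way to form these "islands", different from the LAG-based solution above, is the difference of two ROW_NUMBER() values: one numbered over all rows of an ID and one numbered per ID and name. Within an unbroken run of the same name the difference stays constant, so it can serve as a grouping key. The sketch below tries this from Python with SQLite standing in for Oracle; the six sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dummy (id TEXT, time_stamp TEXT, name TEXT)")
# Hypothetical sample: one row per day for a single id.
conn.executemany(
    "INSERT INTO dummy VALUES (?, ?, ?)",
    [
        ("100", "20011128", "David"),
        ("100", "20011129", "David"),
        ("100", "20011130", "Unknown"),
        ("100", "20011201", "David"),
        ("100", "20011202", "David"),
        ("100", "20011203", "Sara"),
    ],
)

# grp is constant inside each consecutive run of the same name, so
# (id, name, grp) identifies one island.
islands = conn.execute(
    """
    SELECT id, name, MIN(time_stamp) AS min_date, MAX(time_stamp) AS max_date
    FROM (
        SELECT id, time_stamp, name,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY time_stamp)
             - ROW_NUMBER() OVER (PARTITION BY id, name ORDER BY time_stamp) AS grp
        FROM dummy
    ) AS t
    GROUP BY id, name, grp
    ORDER BY id, MIN(time_stamp)
    """
).fetchall()

for row in islands:
    print(row)
```

This relies on there being exactly one row per day per ID, as stated in the question; if days can be missing, a run would still be treated as continuous across the gap.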
Select
ID,
Name,
min(time_stamp) min_date,
max(time_stamp) max_date
from
Dummy
group by
Id,
Name
That should work.
If you want the overall date range for each Id, repeated for every name, you can do:
Select
d.Id,
d.Name,
dr.min_date,
dr.max_date
from
Dummy d
JOIN
(Select
Id,
min(time_stamp) min_date,
max(time_stamp) max_date
from
Dummy
group by
Id
) dr
on ( dr.Id = d.Id)