Concat Names against row_number() or similar function - tsql

My data repeats rows for individual relationships between people. For example, the rows below state that John Smith is known by 3 employees:
Person      EmployeeWhoKnowsPerson
John Smith  Derek Jones
John Smith  Adrian Daniels
John Smith  Peter Low
I am looking to do the following:
1) Count the number of people who know John Smith. I have done this via the row_number() function and it appears to be behaving:
select Person, MAX(rowrank) as rowrank
from (
select Person, EmployeeWhoKnowsPerson, rowrank=ROW_NUMBER() over (partition by Person order by EmployeeWhoKnowsPerson desc)
from Data
) as t
group by Person
Which returns:
Person rowrank
John Smith 3
But now I would like to concatenate the EmployeeWhoKnowsPerson column into the result, and was wondering how this might be possible:
Person      rowrank  EmployeesWhoKnow
John Smith  3        Derek Jones, Adrian Daniels, Peter Low

For SQL Server 2017 +
select
person,
count(*) as KnowsCount,
string_agg(EmployeeWhoKnowsPerson, ',') WITHIN GROUP (ORDER BY EmployeeWhoKnowsPerson ASC) AS EmployeesWhoKnowPerson
from
data
group by person;
For prior versions:
select
person,
count(*) as KnowsCount,
stuff((select ',' + EmployeeWhoKnowsPerson
from data as dd
where dd.Person = d.Person
order by EmployeeWhoKnowsPerson
for xml path('')), 1, 1, '') AS EmployeesWhoKnowPerson
from
data as d
group by person;
And you're overthinking that whole count of who knows piece.
Here's a SQL Fiddle Demo with an extra name thrown in.
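For readers without a SQL Server instance at hand, here is a minimal runnable sketch of the same "plain GROUP BY with COUNT(*) plus string aggregation" idea. It is an assumption-laden stand-in, not the answer's code: it uses Python's built-in sqlite3 module, and SQLite's group_concat() takes the place of string_agg(). The function name count_and_concat is made up for illustration.

```python
import sqlite3


def count_and_concat(rows):
    """Return (Person, KnowsCount, EmployeesWhoKnowPerson) tuples.

    Hypothetical helper: SQLite's group_concat() stands in for T-SQL's
    string_agg(); the table and column names follow the question.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Data (Person TEXT, EmployeeWhoKnowsPerson TEXT)")
    conn.executemany("INSERT INTO Data VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT Person,
               COUNT(*) AS KnowsCount,
               group_concat(EmployeeWhoKnowsPerson, ', ') AS EmployeesWhoKnowPerson
        FROM Data
        GROUP BY Person
        ORDER BY Person
        """
    ).fetchall()
    conn.close()
    return result


sample = [
    ("John Smith", "Derek Jones"),
    ("John Smith", "Adrian Daniels"),
    ("John Smith", "Peter Low"),
]

for person, knows_count, names in count_and_concat(sample):
    # The order of names inside the concatenated string is not guaranteed here.
    print(person, knows_count, names)
```

No ROW_NUMBER() is needed for the count: COUNT(*) per group already gives it.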

If 2017+, you can use string_agg() in a simple group by
Example
Create Table #YourTable ([Person] varchar(50),[EmployeeWhoKnowsPerson] varchar(50))
Insert Into #YourTable Values
('John Smith','Derek Jones')
,('John Smith','Adrian Daniels')
,('John Smith','Peter Low')
Select Person
,rowrank = sum(1)
,[EmployeeWhoKnowsPerson] = string_agg([EmployeeWhoKnowsPerson],', ')
From #YourTable
Group By Person
Returns
Person      rowrank  EmployeeWhoKnowsPerson
John Smith  3        Derek Jones, Adrian Daniels, Peter Low
If <2017 ... use the stuff()/xml approach
Select Person
,rowrank = sum(1)
,[EmployeeWhoKnowsPerson] = stuff((Select ', ' + [EmployeeWhoKnowsPerson]
From #YourTable
Where Person=A.Person
For XML Path ('')),1,2,'')
From #YourTable A
Group By Person

Related

Query to return multiple MAX values with HAVING clause

I want to write a query that will return the name of students who did the most projects with the count of the project. I want the query to return a table like this:
student_name  max_project_count
John Doe      2
Anna Do       2
This is the code I have so far but it's only giving me the 2 column names student_name and count, but not the result.
SELECT s.student_name, COUNT(student_name)
FROM student s
GROUP BY student_name
HAVING COUNT(student_name) = (
SELECT MAX(count)
FROM (SELECT s.student_name, COUNT(*) AS count
FROM student_project k, student s
WHERE s.student_id = k.student_id
GROUP BY student_name) AS foo)
Result I have right now:
student_name  max_project_count
These are the tables I have in my database:
student
student_id  student_name
jd123       John Doe
ad456       Anna Do
js678       Jess Smith
dk789       Daniel Kim

school_project
project_id  project_name
math_1023   Math Comp.
sci_9872    Science Comp.

student_project
student_id  project_id
jd123       math_1023
ad456       math_1023
jd123       sci_9872
ad456       sci_9872
js678       sci_9872
dk789       sci_9872

with projects as (
Select student_id, count(*) as pcount from student_project group by 1),
max_proj as (
Select max(pcount) as max_project_count from projects)
Select
student_name, max_project_count
from student s,projects p,max_proj m
where
s.student_id=p.student_id and pcount=max_project_count
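
The two-CTE answer above can be tried against the question's sample data with a small sketch like the following. The assumptions are mine: it runs on SQLite through Python's sqlite3 module rather than the asker's (unstated) database, it spells out explicit JOINs instead of comma joins, and it groups by the column name rather than by position.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (student_id TEXT, student_name TEXT);
    CREATE TABLE student_project (student_id TEXT, project_id TEXT);

    INSERT INTO student VALUES
        ('jd123', 'John Doe'), ('ad456', 'Anna Do'),
        ('js678', 'Jess Smith'), ('dk789', 'Daniel Kim');

    INSERT INTO student_project VALUES
        ('jd123', 'math_1023'), ('ad456', 'math_1023'),
        ('jd123', 'sci_9872'),  ('ad456', 'sci_9872'),
        ('js678', 'sci_9872'),  ('dk789', 'sci_9872');
    """
)

QUERY = """
WITH projects AS (          -- project count per student
    SELECT student_id, COUNT(*) AS pcount
    FROM student_project
    GROUP BY student_id
),
max_proj AS (               -- the single highest count
    SELECT MAX(pcount) AS max_project_count FROM projects
)
SELECT s.student_name, m.max_project_count
FROM student AS s
JOIN projects AS p ON p.student_id = s.student_id
JOIN max_proj AS m ON p.pcount = m.max_project_count
ORDER BY s.student_name
"""

top_students = conn.execute(QUERY).fetchall()
print(top_students)
```

Every student tied for the maximum is returned, which is what the question asks for.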

Unpivot Columns with Most Recent Record

Student records are updated per subject, each with a status and an update date. A student can be enrolled in one or multiple subjects. I would like to get each student record with the most recent subject update date and status.
CREATE TABLE Student
(
StudentID int,
FirstName varchar(100),
LastName varchar(100),
FullAddress varchar(100),
CityState varchar(100),
MathStatus varchar(100),
MUpdateDate datetime2,
ScienceStatus varchar(100),
SUpdateDate datetime2,
EnglishStatus varchar(100),
EUpdateDate datetime2
);
This is the desired query output. I am currently using a CTE approach, but I am trying to find a better alternative.
SELECT StudentID, FirstName, LastName, FullAddress, CityState, [SubjectStatus], UpdateDate
FROM Student
;WITH original AS
(SELECT * FROM Student)
,Math as
(
SELECT DISTINCT o.StudentID, o.FirstName, o.LastName, o.FullAddress, o.CityState,
ROW_NUMBER() OVER (PARTITION BY o.StudentID, _o.MathStatus ORDER BY _o.MUpdateDate DESC) as rn
, _o.MathStatus as SubjectStatus, _o.MUpdateDate as UpdateDate
FROM original as o
left join original as _o on o.StudentID = _o.StudentID
where _o.MathStatus is not null and _o.MUpdateDate is not null
)
,Science AS
(
...--Same as Math
)
,English AS
(
...--Same As Math
)
SELECT * FROM Math WHERE rn = 1
UNION
SELECT * FROM Science WHERE rn = 1
UNION
SELECT * FROM English WHERE rn = 1
First: storing data in a denormalized form is not recommended. Some data model redesign might be in order. There are multiple resources about data normalization available on the web, like this one.
Now then, I made some guesses about how your source table is populated based on the query you wrote. I generated some sample data that could show how the source data is created. Besides that I also reduced the number of columns to reduce my typing efforts. The general approach should still be valid.
Sample data
create table Student
(
StudentId int,
StudentName varchar(15),
MathStat varchar(5),
MathDate date,
ScienceStat varchar(5),
ScienceDate date
);
insert into Student (StudentID, StudentName, MathStat, MathDate, ScienceStat, ScienceDate) values
(1, 'John Smith', 'A', '2020-01-01', 'B', '2020-05-01'),
(1, 'John Smith', 'A', '2020-01-01', 'B+', '2020-06-01'), -- B for Science was updated to B+ month later
(2, 'Peter Parker', 'F', '2020-01-01', 'A', '2020-05-01'),
(2, 'Peter Parker', 'A+', '2020-03-01', 'A', '2020-05-01'), -- Spider-Man would never fail Math, fixed...
(3, 'Tom Holland', null, null, 'A', '2020-05-01'),
(3, 'Tom Holland', 'A-', '2020-07-01', 'A', '2020-05-01'); -- Tom was sick for Math, but got a second chance
Solution
Your question title already contains the word unpivot. That word actually exists in T-SQL as a keyword. You can learn about the unpivot keyword in the documentation. Your own solution already contains common table expressions, so these constructions should look familiar.
Steps:
1) cte_unpivot = unpivot all rows, create a Subject column and place the corresponding values (SubjectStat, Date) next to it with a case expression.
2) cte_recent = number the rows to find the most recent row per student and subject.
3) Select only those most recent rows.
This gives:
with cte_unpivot as
(
select up.StudentId,
up.StudentName,
case up.[Subject]
when 'MathStat' then 'Math'
when 'ScienceStat' then 'Science'
end as [Subject],
up.SubjectStat,
case up.[Subject]
when 'MathStat' then up.MathDate
when 'ScienceStat' then up.ScienceDate
end as [Date]
from Student s
unpivot ([SubjectStat] for [Subject] in ([MathStat], [ScienceStat])) up
),
cte_recent as
(
select cu.StudentId, cu.StudentName, cu.[Subject], cu.SubjectStat, cu.[Date],
row_number() over (partition by cu.StudentId, cu.[Subject] order by cu.[Date] desc) as [RowNum]
from cte_unpivot cu
)
select cr.StudentId, cr.StudentName, cr.[Subject], cr.SubjectStat, cr.[Date]
from cte_recent cr
where cr.RowNum = 1;
Result
StudentId StudentName Subject SubjectStat Date
----------- --------------- ------- ----------- ----------
1 John Smith Math A 2020-01-01
1 John Smith Science B+ 2020-06-01
2 Peter Parker Math A+ 2020-03-01
2 Peter Parker Science A 2020-05-01
3 Tom Holland Math A- 2020-07-01
3 Tom Holland Science A 2020-05-01
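
If you want to experiment with the same two steps outside SQL Server, here is a rough sketch on SQLite via Python's sqlite3 module. SQLite has no UNPIVOT keyword, so this swaps in a UNION ALL of one SELECT per subject (with a NOT NULL filter, since UNPIVOT skips nulls); that substitution and the column name UpdateDate are my assumptions, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Student (
        StudentId INTEGER, StudentName TEXT,
        MathStat TEXT, MathDate TEXT,
        ScienceStat TEXT, ScienceDate TEXT
    );
    INSERT INTO Student VALUES
        (1, 'John Smith',   'A',  '2020-01-01', 'B',  '2020-05-01'),
        (1, 'John Smith',   'A',  '2020-01-01', 'B+', '2020-06-01'),
        (2, 'Peter Parker', 'F',  '2020-01-01', 'A',  '2020-05-01'),
        (2, 'Peter Parker', 'A+', '2020-03-01', 'A',  '2020-05-01'),
        (3, 'Tom Holland',  NULL, NULL,         'A',  '2020-05-01'),
        (3, 'Tom Holland',  'A-', '2020-07-01', 'A',  '2020-05-01');
    """
)

QUERY = """
WITH cte_unpivot AS (       -- one row per (student row, subject); UNION ALL replaces UNPIVOT
    SELECT StudentId, StudentName, 'Math' AS Subject,
           MathStat AS SubjectStat, MathDate AS UpdateDate
    FROM Student WHERE MathStat IS NOT NULL
    UNION ALL
    SELECT StudentId, StudentName, 'Science',
           ScienceStat, ScienceDate
    FROM Student WHERE ScienceStat IS NOT NULL
),
cte_recent AS (             -- number rows, newest first, per student and subject
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY StudentId, Subject
                              ORDER BY UpdateDate DESC) AS RowNum
    FROM cte_unpivot
)
SELECT StudentId, StudentName, Subject, SubjectStat, UpdateDate
FROM cte_recent
WHERE RowNum = 1
ORDER BY StudentId, Subject
"""

latest = conn.execute(QUERY).fetchall()
for row in latest:
    print(row)
```

The dates are stored as ISO-8601 text, so ordering them as strings matches chronological order.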

Can this query be solved using something besides 2 CTEs?

I’m writing a query against a table of fictional insurance claims called CLAIMS, using RANDOMLY GENERATED FICTIONAL NAMES AND DATA.
There are 5 distinct categories in the column called PRIMARY_DX:
Alcoholism, Anxiety Disorder, Depression, Psychosis, Substance Use Disorder
The other main columns are PATIENT_ID and CLAIM_PAID_AMT
I want to sum up the CLAIM_PAID_AMT per PATIENT per PRIMARY_DX and list only the top 5 patients who have the highest sum per PRIMARY_DX
The only way I could think to do this was with two Common Table Expressions, where in CTE1 I partition by PRIMARY_DX and PATIENT_ID and SUM the CLAIM_PAID_AMT for each PATIENT.
Then in CTE2 I use a ROW_NUMBER function on CTE1, to partition by PRIMARY_DX and sort by the TotalClaims DESC and select the top 5 from each PRIMARY_DX.
I’ve been writing SQL for less than 2 years and was wondering if this could be accomplished in one CTE or perhaps with some form of Cross Apply?
I’m including my code and the output below.
;WITH CTE1 AS
(
select PRIMARY_DX, PATIENT_ID, TotalClaims = SUM(CLAIM_PAID_AMT)
OVER (PARTITION BY PRIMARY_DX, PATIENT_ID ORDER BY PATIENT_ID, CLAIM_PAID_AMT DESC)
from claims
)
,
CTE2 AS
(SELECT *, RowCounter = ROW_NUMBER() OVER (PARTITION BY PRIMARY_DX ORDER BY TotalClaims DESC) FROM CTE1)
select CTE2.PRIMARY_DX, CTE2.TotalClaims from CTE2
where RowCounter <= 5
order by CTE2.PRIMARY_DX, CTE2.TotalClaims DESC
Alcoholism 3737.51 Joe Smith
Alcoholism 3282.07 Suzie Homemaker
Alcoholism 3207.72 Joey Strummer
Alcoholism 3040.52 Rusty Nailfile
Alcoholism 2997.02 Big Ben
Anxiety Disorder 3291.14 Norman Pigsty
Anxiety Disorder 3113.05 Billy Bob
Anxiety Disorder 3101.13 Rachel Antarctica
Anxiety Disorder 3058.52 John John
Anxiety Disorder 3021.98 Kathy Europa
Depression 3466.14 Freda Beagallly
Depression 3279.25 Ron Jeremize
Depression 3140.43 Sharon Sharonaz
Depression 3119.26 Allie Kat
Depression 3118.54 Biff Biffstoferson
Psychosis 3098.13 James Monopoly
Psychosis 2991.23 Leon Erroneously
Psychosis 2857.69 Lucie Ratched-McMurphy
Psychosis 2678.88 Billy Bibbitz
Psychosis 2602.24 Sam Zypperzsky
Substance Use Disorder 3435.27 Donald Duckaronawitz
Substance Use Disorder 3300.33 Mickey Mousetrap
Substance Use Disorder 3285.41 Hector Heathercoatz
Substance Use Disorder 3179 Erin GoBragh
Substance Use Disorder 3147.09 Bono Edgerstein
You should only need one sub-query or CTE since you can use the aggregate within the ROW_NUMBER().
Here is an approach using the sub-query:
SELECT *
FROM (
SELECT PRIMARY_DX, PATIENT, SUM(CLAIM_PAID_AMT) AS CLAIM_PAID_AMT,
ROW_NUMBER() OVER (PARTITION BY PRIMARY_DX ORDER BY SUM(CLAIM_PAID_AMT) DESC) AS RowId
FROM Claims GROUP BY PRIMARY_DX, PATIENT
) T
WHERE RowId <= 5
And if you prefer CTE:
;WITH CTE AS (
SELECT PRIMARY_DX, PATIENT, SUM(CLAIM_PAID_AMT) AS CLAIM_PAID_AMT,
ROW_NUMBER() OVER (PARTITION BY PRIMARY_DX ORDER BY SUM(CLAIM_PAID_AMT) DESC) AS RowId
FROM Claims GROUP BY PRIMARY_DX, PATIENT
) SELECT * FROM CTE WHERE RowId <= 5
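
To see the "aggregate inside ROW_NUMBER()" pattern run end to end, here is a small sketch on SQLite through Python's sqlite3 module. The claim rows, the patient IDs and the helper name top_patients are invented for illustration, and it uses the question's PATIENT_ID column; the query shape follows the sub-query version above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claims (PRIMARY_DX TEXT, PATIENT_ID TEXT, CLAIM_PAID_AMT INTEGER)"
)
# Made-up sample data: several claims per patient and diagnosis.
conn.executemany(
    "INSERT INTO claims VALUES (?, ?, ?)",
    [
        ("Anxiety Disorder", "p1", 100),
        ("Anxiety Disorder", "p1", 50),
        ("Anxiety Disorder", "p2", 120),
        ("Anxiety Disorder", "p3", 200),
        ("Depression", "p1", 10),
        ("Depression", "p4", 30),
        ("Depression", "p4", 5),
    ],
)

QUERY = """
SELECT PRIMARY_DX, PATIENT_ID, TotalClaims
FROM (
    SELECT PRIMARY_DX, PATIENT_ID,
           SUM(CLAIM_PAID_AMT) AS TotalClaims,
           -- GROUP BY runs first, so the window can rank by the aggregate
           ROW_NUMBER() OVER (PARTITION BY PRIMARY_DX
                              ORDER BY SUM(CLAIM_PAID_AMT) DESC) AS RowId
    FROM claims
    GROUP BY PRIMARY_DX, PATIENT_ID
) AS t
WHERE RowId <= ?
ORDER BY PRIMARY_DX, TotalClaims DESC
"""


def top_patients(n):
    """Return the top-n patients by summed claims within each PRIMARY_DX."""
    return conn.execute(QUERY, (n,)).fetchall()


print(top_patients(2))
```

Note that ROW_NUMBER() breaks ties arbitrarily; if tied patients should all be listed, RANK() or DENSE_RANK() would be the function to reach for.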

PostgreSQL - How to display a corresponding string on every entry in string_agg()?

I have 2 tables:
Employee
ID Name
1 John
2 Ben
3 Adam
Employer
ID Name
1 James
2 Rob
3 Paul
I want to string_agg() and concatenate the two tables in one record as a single column. I also want another column that indicates the source of each string: it should display "Employee" if the string comes from the "Employee" table and "Employer" if it comes from the "Employer" table.
Here's my code for displaying the table:
SELECT string_agg(e.Name, CHR(10)) || CHR(10) || string_agg(er.Name, CHR(10)), PERSON_STATUS
FROM Employee e, Employer er
Here's my expected output:
ID  Name   PERSON_STATUS
1   John   Employee
    Ben    Employee
    Adam   Employee
    James  Employer
    Rob    Employer
    Paul   Employer
NOTE: I know this can be done by adding another column in the table but that's not the case of this scenario. This is just an example to illustrate my problem.
Based on your sample, I'd say that you need UNION ALL rather than an aggregate:
SELECT id, name, 'Employee'::text AS person_status
FROM employee
UNION ALL
SELECT id, name, 'Employer'::text
from employer;
SELECT 1 AS id, STRING_AGG(name, E'\r\n') AS name, STRING_AGG(person_status, E'\r\n') AS person_status
FROM (
SELECT name, 'Employee' AS person_status
FROM employee
UNION ALL
SELECT name, 'Employer'
FROM employer
) data
Ok, so first we merge our 2 tables into 3 columns. We can select arbitrary values this way.
select
"ID", -- Double quotes are necessary for capitalised aliases
"Name",
'Employee' as "PERSON_STATUS"
from
employee
union
select
"ID",
"Name",
'Employer'
from
employer
We then subquery this and perform our string operations as required.
select
string_agg(concat(people."Name", ' ', people."PERSON_STATUS"), chr(10))
from
(
select
"ID",
"Name",
'Employee' as "PERSON_STATUS"
from
employee
union
select
"ID",
"Name",
'Employer'
from
employer
) as people
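
The idea shared by the answers above (tag each row with a literal while combining the tables with UNION ALL, then aggregate if a single string is wanted) can be sketched in a runnable form as follows. This is only an illustration under my own assumptions: it uses SQLite via Python's sqlite3 module, with group_concat() and char(10) standing in for PostgreSQL's string_agg() and CHR(10).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (id INTEGER, name TEXT);
    CREATE TABLE employer (id INTEGER, name TEXT);
    INSERT INTO employee VALUES (1, 'John'), (2, 'Ben'), (3, 'Adam');
    INSERT INTO employer VALUES (1, 'James'), (2, 'Rob'), (3, 'Paul');
    """
)

# Step 1: one row per person, tagged with the table it came from.
TAGGED = """
SELECT name, 'Employee' AS person_status FROM employee
UNION ALL
SELECT name, 'Employer' FROM employer
"""
tagged_rows = conn.execute(TAGGED).fetchall()

# Step 2: collapse the tagged rows into one newline-separated string.
combined = conn.execute(
    "SELECT group_concat(name || ' ' || person_status, char(10)) "
    "FROM (" + TAGGED + ") AS people"
).fetchone()[0]

print(combined)
```

UNION ALL is used rather than UNION so that no rows are dropped by duplicate elimination.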

Combine similar rows using case statement

I have a query currently populating a report which has a few rows of "duplicate" information. Similar IDs are being passed through which should be combined, but they are unique enough that we do not want to Concat/Insert them within our model. In order for the report to be processed correctly, I need to sum their $ values. The only information I actually need to preserve is the name, the final summed amount, and the ID.
Is there a simple way to achieve this with a case statement that only sums the Amount field? I tried using a SUM(CASE WHEN ...) statement, but I do not want a new column, since my report only uses that field to populate $$ information. Here is a sample of my issue below:
ID Name Amount Person
+-------+--------------+------------+-----------------------+
21011 Place A -210.30 John Doe
210115 Place A-a 6500.70 John Doe
21060 Place B 255.00 Wayne C
2106015 Place Bb 212.30 Wayne C
2106015 Place Bb 1212.30 Wayne C
2106015 Place Bb 212.30 Wayne C
21080 Place J 57212.30 Billy J
My desired result for this would be:
ID Name Amount Person
+-------+--------------+------------+-----------------------+
21011 Place A 6290.40 John Doe
21060 Place B 1889.90 Wayne C
21080 Place J 57212.30 Billy J
Is there a simplified way to combine these rows in TSQL without modifying the db?
You can try this (provided your ID column is a number and not a character field):
;WITH cte_getsum AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY Person ORDER BY ID) AS RowNum,
ID,
NAME,
(SELECT SUM(Amount) FROM TableName WHERE TableName.Person = t1.Person) AS SumAmount,
Person
FROM
TableName t1
)
SELECT * FROM cte_getsum
WHERE rownum = 1
You can try the script below. I created a temp table just for sample data, but in your case you can refer directly to the table you have.
SELECT * INTO #tmpInput
FROM (VALUES('21011','Place A', -210.30,'John Doe'),
('210115','Place A-a',6500.70,'John Doe'),
('21060', 'Place B' ,255.00,'Wayne C'),
('2106015', 'Place Bb' ,212.30,'Wayne C'),
('2106015' , 'Place Bb' ,1212.30,'Wayne C'),
('2106015' , 'Place Bb' ,212.30 ,'Wayne C')
,('21080' , 'Place J' ,57212.30,'Billy J')
)Input (ID,Name,Amount,Person)
SELECT SUBSTRING(t1.ID,0,6) ID
,t2.Name
,SUM(t1.Amount) AMOUNT
,t2.Person
FROM #tmpInput t1
INNER JOIN #tmpInput t2 ON t2.ID=SUBSTRING(t1.ID,0,6)
GROUP BY SUBSTRING(t1.ID,0,6),t2.Name,t2.Person
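
For a quick way to try the prefix-grouping idea without SQL Server, here is a hedged sketch on SQLite via Python's sqlite3 module. It assumes, as the script above does, that the first five characters of the ID identify the "base" row and that exactly one row carries that five-character ID; substr(ID, 1, 5) plays the role of SUBSTRING(ID, 0, 6), and the table name TableName is a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableName (ID TEXT, Name TEXT, Amount REAL, Person TEXT)")
conn.executemany(
    "INSERT INTO TableName VALUES (?, ?, ?, ?)",
    [
        ("21011", "Place A", -210.30, "John Doe"),
        ("210115", "Place A-a", 6500.70, "John Doe"),
        ("21060", "Place B", 255.00, "Wayne C"),
        ("2106015", "Place Bb", 212.30, "Wayne C"),
        ("2106015", "Place Bb", 1212.30, "Wayne C"),
        ("2106015", "Place Bb", 212.30, "Wayne C"),
        ("21080", "Place J", 57212.30, "Billy J"),
    ],
)

QUERY = """
SELECT b.ID, b.Name, ROUND(SUM(t.Amount), 2) AS Amount, b.Person
FROM TableName AS t
JOIN TableName AS b              -- b is the base row that owns the 5-char ID
  ON b.ID = substr(t.ID, 1, 5)
GROUP BY b.ID, b.Name, b.Person
ORDER BY b.ID
"""

combined_rows = conn.execute(QUERY).fetchall()
for row in combined_rows:
    print(row)
```

With the sample rows exactly as listed, the four Place B amounts add up to 1891.90. No CASE expression is needed; the grouping key does the combining.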