Query to return multiple MAX values with HAVING clause - postgresql

I want to write a query that returns the names of the students who did the most projects, together with their project count. I want the query to return a table like this:
student_name | max_project_count
--------------------------------
John Doe     | 2
Anna Do      | 2
This is the code I have so far, but it only gives me the two column names, student_name and count, and no rows.
SELECT s.student_name, COUNT(student_name)
FROM student s
GROUP BY student_name
HAVING COUNT(student_name) = (
SELECT MAX(count)
FROM (SELECT s.student_name, COUNT(*) AS count
FROM student_project k, student s
WHERE s.student_id = k.student_id
GROUP BY student_name) AS foo)
Result I have right now:
student_name | max_project_count
--------------------------------
These are the tables I have in my database:
student
student_id | student_name
-------------------------
jd123      | John Doe
ad456      | Anna Do
js678      | Jess Smith
dk789      | Daniel Kim
school_project
project_id | project_name
-------------------------
math_1023  | Math Comp.
sci_9872   | Science Comp.
student_project
student_id | project_id
-----------------------
jd123      | math_1023
ad456      | math_1023
jd123      | sci_9872
ad456      | sci_9872
js678      | sci_9872
dk789      | sci_9872

with projects as (
Select student_id, count(*) as pcount from student_project group by 1),
max_proj as (
Select max(pcount) as max_project_count from projects)
Select
student_name, max_project_count
from student s,projects p,max_proj m
where
s.student_id=p.student_id and pcount=max_project_count
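The asker's query comes back empty because its outer query counts rows of the student table alone. Every name therefore gets a count of 1, which never equals the maximum of 2 computed from student_project. The sketch below checks both queries against the sample tables from the question. It assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL; both queries use only syntax that SQLite also accepts, and an ORDER BY was added to the CTE answer so the output order is fixed.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id TEXT PRIMARY KEY, student_name TEXT);
CREATE TABLE student_project (student_id TEXT, project_id TEXT);
INSERT INTO student VALUES
  ('jd123', 'John Doe'), ('ad456', 'Anna Do'),
  ('js678', 'Jess Smith'), ('dk789', 'Daniel Kim');
INSERT INTO student_project VALUES
  ('jd123', 'math_1023'), ('ad456', 'math_1023'),
  ('jd123', 'sci_9872'), ('ad456', 'sci_9872'),
  ('js678', 'sci_9872'), ('dk789', 'sci_9872');
""")

# The query from the question: the outer COUNT only sees the student table,
# so each name counts 1 and never matches the maximum of 2.
original = conn.execute("""
SELECT s.student_name, COUNT(student_name)
FROM student s
GROUP BY student_name
HAVING COUNT(student_name) = (
  SELECT MAX(count)
  FROM (SELECT s.student_name, COUNT(*) AS count
        FROM student_project k, student s
        WHERE s.student_id = k.student_id
        GROUP BY student_name) AS foo)
""").fetchall()

# The CTE answer: count per student, find the max, keep students at the max.
answer = conn.execute("""
WITH projects AS (
  SELECT student_id, COUNT(*) AS pcount FROM student_project GROUP BY 1),
max_proj AS (
  SELECT MAX(pcount) AS max_project_count FROM projects)
SELECT student_name, max_project_count
FROM student s, projects p, max_proj m
WHERE s.student_id = p.student_id AND pcount = max_project_count
ORDER BY student_name
""").fetchall()

print(original)  # []
print(answer)    # [('Anna Do', 2), ('John Doe', 2)]
```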


Suggestion for this simple SQL

I have a table with columns like:
xxx | category_id | product | yyy
--------------------------------
    | Id-1        | Prod-1  |
    | Id-1        | Prod2   |
    | Id-1        | ...     |
    | Id-2        | Prod-11 |
    | Id-2        | Prod-1  |
    | ...         |         |
How do I find whether any product (say Prod-1 in this example) belongs to two or more category_ids?
Create a group for each product and choose only those groups that have more than one category_id:
SELECT product
FROM mytable
GROUP BY product
HAVING count(DISTINCT category_id) > 1;
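Below is a small check of the GROUP BY / HAVING answer. It assumes Python's sqlite3 module as a stand-in database and uses hypothetical rows modelled on the question's table, with the xxx/yyy columns omitted. One duplicate (Id-1, Prod2) row was added on purpose. It shows why DISTINCT matters: a plain count(*) > 1 would also flag a product that merely repeats within one category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (category_id TEXT, product TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    # Hypothetical rows; Prod2 appears twice, but only in category Id-1.
    [("Id-1", "Prod-1"), ("Id-1", "Prod2"), ("Id-1", "Prod2"),
     ("Id-2", "Prod-11"), ("Id-2", "Prod-1")],
)

# One group per product; keep groups that span more than one category_id.
shared = conn.execute("""
    SELECT product
    FROM mytable
    GROUP BY product
    HAVING count(DISTINCT category_id) > 1
    ORDER BY product
""").fetchall()

# Counting rows instead of distinct categories also catches the Prod2 repeat.
naive = conn.execute("""
    SELECT product
    FROM mytable
    GROUP BY product
    HAVING count(*) > 1
    ORDER BY product
""").fetchall()

print(shared)  # [('Prod-1',)]
print(naive)   # [('Prod-1',), ('Prod2',)]
```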
Alternatively, you can use a correlated subquery:
SELECT DISTINCT product
FROM mytable as a
WHERE (SELECT count(DISTINCT category_id) FROM mytable as b WHERE b.product=a.product) > 1
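The correlated version evaluates the subquery once per row of the outer table. It therefore returns a product once for every row it appears in, unless the outer SELECT uses DISTINCT. The sketch below demonstrates this. It assumes Python's sqlite3 module and hypothetical rows taken from the question's example, stored in a table named mytable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (category_id TEXT, product TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("Id-1", "Prod-1"), ("Id-1", "Prod2"), ("Id-2", "Prod-11"), ("Id-2", "Prod-1")],
)

# For each outer row a, count the distinct categories of rows b with the same product.
per_row = conn.execute("""
    SELECT product
    FROM mytable AS a
    WHERE (SELECT count(DISTINCT category_id)
           FROM mytable AS b
           WHERE b.product = a.product) > 1
""").fetchall()

# Same filter, but DISTINCT collapses the repeated product to one row.
distinct_rows = conn.execute("""
    SELECT DISTINCT product
    FROM mytable AS a
    WHERE (SELECT count(DISTINCT category_id)
           FROM mytable AS b
           WHERE b.product = a.product) > 1
""").fetchall()

print(per_row)        # [('Prod-1',), ('Prod-1',)]
print(distinct_rows)  # [('Prod-1',)]
```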

Unpivot Columns with Most Recent Record

Student records are updated per subject, each with an update date. A student can be enrolled in one or more subjects. I would like to get, for each student and subject, the most recent update date and status.
CREATE TABLE Student
(
StudentID int,
FirstName varchar(100),
LastName varchar(100),
FullAddress varchar(100),
CityState varchar(100),
MathStatus varchar(100),
MUpdateDate datetime2,
ScienceStatus varchar(100),
SUpdateDate datetime2,
EnglishStatus varchar(100),
EUpdateDate datetime2
);
Desired query output (the columns I want are listed below). I am currently using a CTE method, but I am looking for an alternative, better way.
SELECT StudentID, FirstName, LastName, FullAddress, CityState, [SubjectStatus], UpdateDate
FROM Student
;WITH original AS
(SELECT * FROM Student)
,Math as
(
SELECT DISTINCT o.StudentID, o.FirstName, o.LastName, o.FullAddress, o.CityState,
ROW_NUMBER() OVER (PARTITION BY o.StudentID, _o.MathStatus ORDER BY _o.MUpdateDate DESC) as rn
, _o.MathStatus as SubjectStatus, _o.MUpdateDate as UpdateDate
FROM original as o
left join original as _o on o.StudentID = _o.StudentID
where _o.MathStatus is not null and _o.MUpdateDate is not null
)
,Science AS
(
...--Same as Math
)
,English AS
(
...--Same As Math
)
SELECT * FROM Math WHERE rn = 1
UNION
SELECT * FROM Science WHERE rn = 1
UNION
SELECT * FROM English WHERE rn = 1
First: storing data in a denormalized form is not recommended, and some data model redesign might be in order. There are many resources about data normalization available on the web.
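To illustrate the normalization point, here is a sketch of one possible normalized layout: one row per student, subject and status update. It uses a hypothetical StudentSubject table whose name and columns are assumptions, not part of the question. With this layout, the "most recent status per subject" query needs no unpivoting at all. SQLite, via Python's sqlite3 module, stands in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical normalized table: one row per status update.
CREATE TABLE StudentSubject (
    StudentId  INTEGER,
    Subject    TEXT,
    Status     TEXT,
    UpdateDate TEXT  -- ISO-8601 strings sort chronologically
);
INSERT INTO StudentSubject VALUES
    (1, 'Math',    'A',  '2020-01-01'),
    (1, 'Science', 'B',  '2020-05-01'),
    (1, 'Science', 'B+', '2020-06-01');
""")

# Number the updates per student and subject, newest first, and keep row 1.
latest = conn.execute("""
    SELECT StudentId, Subject, Status, UpdateDate
    FROM (SELECT *,
                 ROW_NUMBER() OVER (PARTITION BY StudentId, Subject
                                    ORDER BY UpdateDate DESC) AS rn
          FROM StudentSubject)
    WHERE rn = 1
    ORDER BY StudentId, Subject
""").fetchall()

print(latest)  # [(1, 'Math', 'A', '2020-01-01'), (1, 'Science', 'B+', '2020-06-01')]
```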
Now then, I made some guesses about how your source table is populated based on the query you wrote. I generated some sample data that could show how the source data is created. Besides that I also reduced the number of columns to reduce my typing efforts. The general approach should still be valid.
Sample data
create table Student
(
StudentId int,
StudentName varchar(15),
MathStat varchar(5),
MathDate date,
ScienceStat varchar(5),
ScienceDate date
);
insert into Student (StudentID, StudentName, MathStat, MathDate, ScienceStat, ScienceDate) values
(1, 'John Smith', 'A', '2020-01-01', 'B', '2020-05-01'),
(1, 'John Smith', 'A', '2020-01-01', 'B+', '2020-06-01'), -- B for Science was updated to B+ month later
(2, 'Peter Parker', 'F', '2020-01-01', 'A', '2020-05-01'),
(2, 'Peter Parker', 'A+', '2020-03-01', 'A', '2020-05-01'), -- Spider-Man would never fail Math, fixed...
(3, 'Tom Holland', null, null, 'A', '2020-05-01'),
(3, 'Tom Holland', 'A-', '2020-07-01', 'A', '2020-05-01'); -- Tom was sick for Math, but got a second chance
Solution
Your question title already contains the word unpivot. That word actually exists in T-SQL as a keyword, and you can learn about it in the documentation. Your own solution already uses common table expressions, so these constructions should look familiar.
Steps:
1) cte_unpivot: unpivot all rows into a Subject column, and use case expressions to place the matching values (SubjectStat, Date) next to it.
2) cte_recent: number the rows to find the most recent row per student and subject.
3) Select only those most recent rows.
This gives:
with cte_unpivot as
(
select up.StudentId,
up.StudentName,
case up.[Subject]
when 'MathStat' then 'Math'
when 'ScienceStat' then 'Science'
end as [Subject],
up.SubjectStat,
case up.[Subject]
when 'MathStat' then up.MathDate
when 'ScienceStat' then up.ScienceDate
end as [Date]
from Student s
unpivot ([SubjectStat] for [Subject] in ([MathStat], [ScienceStat])) up
),
cte_recent as
(
select cu.StudentId, cu.StudentName, cu.[Subject], cu.SubjectStat, cu.[Date],
row_number() over (partition by cu.StudentId, cu.[Subject] order by cu.[Date] desc) as [RowNum]
from cte_unpivot cu
)
select cr.StudentId, cr.StudentName, cr.[Subject], cr.SubjectStat, cr.[Date]
from cte_recent cr
where cr.RowNum = 1;
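SQLite has no UNPIVOT keyword, so the sketch below swaps in a UNION ALL with one SELECT per subject column. That is a different technique from the T-SQL answer, though it produces the same shape. UNPIVOT drops rows whose value is NULL, so each branch filters out NULL statuses to match. The sketch assumes Python's sqlite3 module and the sample data above, with dates stored as ISO-8601 text so they sort chronologically. On that data it reproduces the result table that follows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (
    StudentId INTEGER, StudentName TEXT,
    MathStat TEXT, MathDate TEXT, ScienceStat TEXT, ScienceDate TEXT
);
INSERT INTO Student VALUES
    (1, 'John Smith',   'A',  '2020-01-01', 'B',  '2020-05-01'),
    (1, 'John Smith',   'A',  '2020-01-01', 'B+', '2020-06-01'),
    (2, 'Peter Parker', 'F',  '2020-01-01', 'A',  '2020-05-01'),
    (2, 'Peter Parker', 'A+', '2020-03-01', 'A',  '2020-05-01'),
    (3, 'Tom Holland',  NULL, NULL,         'A',  '2020-05-01'),
    (3, 'Tom Holland',  'A-', '2020-07-01', 'A',  '2020-05-01');
""")

rows = conn.execute("""
WITH cte_unpivot AS (
    -- One SELECT per subject column; the NULL filter mimics UNPIVOT,
    -- which drops rows whose value is NULL.
    SELECT StudentId, StudentName, 'Math' AS Subject,
           MathStat AS SubjectStat, MathDate AS [Date]
    FROM Student WHERE MathStat IS NOT NULL
    UNION ALL
    SELECT StudentId, StudentName, 'Science', ScienceStat, ScienceDate
    FROM Student WHERE ScienceStat IS NOT NULL
),
cte_recent AS (
    -- Number rows per student and subject, most recent date first.
    SELECT cu.*,
           ROW_NUMBER() OVER (PARTITION BY StudentId, Subject
                              ORDER BY [Date] DESC) AS RowNum
    FROM cte_unpivot cu
)
SELECT StudentId, StudentName, Subject, SubjectStat, [Date]
FROM cte_recent
WHERE RowNum = 1
ORDER BY StudentId, Subject
""").fetchall()

for r in rows:
    print(r)
```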
Result
StudentId StudentName Subject SubjectStat Date
----------- --------------- ------- ----------- ----------
1 John Smith Math A 2020-01-01
1 John Smith Science B+ 2020-06-01
2 Peter Parker Math A+ 2020-03-01
2 Peter Parker Science A 2020-05-01
3 Tom Holland Math A- 2020-07-01
3 Tom Holland Science A 2020-05-01

Concat Names against row_number() or similar function

My data repeats rows for individual relationships between people. For example, the data below states that John Smith is known by 3 employees:
Person      EmployeeWhoKnowsPerson
----------  ----------------------
John Smith  Derek Jones
John Smith  Adrian Daniels
John Smith  Peter Low
I am looking to do the following:
1) Count the number of people who know John Smith. I have done this via the row_number() function and it appears to be behaving:
select Person, MAX(rowrank) as rowrank
from (
select Person, EmployeeWhoKnowsPerson, rowrank=ROW_NUMBER() over (partition by Person order by EmployeeWhoKnowsPerson desc)
from Data
) as t
group by Person
Which returns:
Person      rowrank
----------  -------
John Smith  3
Now I want to concatenate the EmployeeWhoKnowsPerson values into one column, and was wondering how this might be possible:
Person      rowrank  EmployeesWhoKnow
----------  -------  --------------------------------------
John Smith  3        Derek Jones, Adrian Daniels, Peter Low
For SQL Server 2017+:
select
person,
count(*) as KnowsCount,
string_agg(EmployeeWhoKnowsPerson, ', ') WITHIN GROUP (ORDER BY EmployeeWhoKnowsPerson ASC) AS EmployeesWhoKnowPerson
from
data
group by person;
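The sketch below shows the same GROUP BY idea end to end. It assumes Python's sqlite3 module as the database. SQLite only added string_agg in version 3.44, so the sketch swaps in group_concat, SQLite's long-standing equivalent. Unlike WITHIN GROUP (ORDER BY ...), group_concat does not guarantee element order here, so the check compares the names as a set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (Person TEXT, EmployeeWhoKnowsPerson TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [("John Smith", "Derek Jones"),
     ("John Smith", "Adrian Daniels"),
     ("John Smith", "Peter Low")],
)

# count(*) gives the number of acquaintances; group_concat joins their names.
# Element order inside group_concat is not guaranteed in SQLite.
result = conn.execute("""
    SELECT Person,
           count(*) AS KnowsCount,
           group_concat(EmployeeWhoKnowsPerson, ', ') AS EmployeesWhoKnowPerson
    FROM data
    GROUP BY Person
""").fetchall()

print(result)
```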
For prior versions:
select
person,
count(*) as KnowsCount,
stuff((select ', ' + EmployeeWhoKnowsPerson
from data as dd
where dd.Person = d.Person
order by EmployeeWhoKnowsPerson
for xml path('')), 1, 2, '') AS EmployeesWhoKnowPerson
from
data as d
group by person;
And you're overthinking the whole count-of-who-knows piece: a plain count(*) per person already gives that number.
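To make the count point concrete, the sketch below runs the question's MAX(ROW_NUMBER()) query next to a plain count(*). It assumes Python's sqlite3 module. The T-SQL alias form rowrank = expr is rewritten as expr AS rowrank, which SQLite requires. The "Jane Roe" row is a hypothetical extra person added only to show the counts per person.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Data (Person TEXT, EmployeeWhoKnowsPerson TEXT)")
conn.executemany(
    "INSERT INTO Data VALUES (?, ?)",
    [("John Smith", "Derek Jones"),
     ("John Smith", "Adrian Daniels"),
     ("John Smith", "Peter Low"),
     ("Jane Roe", "Peter Low")],  # hypothetical extra person
)

# The question's approach: number the rows per person, then take the max.
via_row_number = conn.execute("""
    SELECT Person, MAX(rowrank) AS rowrank
    FROM (SELECT Person,
                 ROW_NUMBER() OVER (PARTITION BY Person
                                    ORDER BY EmployeeWhoKnowsPerson DESC) AS rowrank
          FROM Data) AS t
    GROUP BY Person
    ORDER BY Person
""").fetchall()

# A plain aggregate gives the same numbers.
via_count = conn.execute("""
    SELECT Person, count(*) AS rowrank
    FROM Data
    GROUP BY Person
    ORDER BY Person
""").fetchall()

print(via_row_number)  # [('Jane Roe', 1), ('John Smith', 3)]
print(via_count)       # [('Jane Roe', 1), ('John Smith', 3)]
```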
If you are on SQL Server 2017+, you can use string_agg() in a simple GROUP BY.
Example
Declare @YourTable Table ([Person] varchar(50),[EmployeeWhoKnowsPerson] varchar(50))
Insert Into @YourTable Values
('John Smith','Derek Jones')
,('John Smith','Adrian Daniels')
,('John Smith','Peter Low')

Select Person
,rowrank = sum(1)
,[EmployeeWhoKnowsPerson] = string_agg([EmployeeWhoKnowsPerson],', ')
From @YourTable
Group By Person
Returns
Person      rowrank  EmployeeWhoKnowsPerson
----------  -------  --------------------------------------
John Smith  3        Derek Jones, Adrian Daniels, Peter Low
If you are on a version before 2017, use the stuff()/FOR XML approach:
Select Person
,rowrank = sum(1)
,[EmployeeWhoKnowsPerson] = stuff((Select ', ' + [EmployeeWhoKnowsPerson]
From @YourTable
Where Person=A.Person
For XML Path ('')),1,2,'')
From @YourTable A
Group By Person

Remove Duplicates from Employees Self Join

I have an employees table where all employees are located. I need to extract a subset of the employees with their corresponding supervisor. The table looks similar to this:
Emp_id | F_name | L_name | Superv_id | Superv_flg
---------------------------------------------------
123 john doe 456 N
456 jane doe 278 Y
234 Jack smith 268 N
My query looks like this so far:
with cte as
(
select f_name + ' ' + l_name as supervisor, superv_id, emp_id
from [dbo].[SAP_worker_all]
where supvr_flag = 'Y'
)
SELECT distinct w.[first_name]
,w.[last_name]
,cte.supervisor
FROM [dbo].[SAP_worker_all] w
join cte
on w.[superv_id] = cte.[superv_id];
I am getting duplicate values and the supervisors returned are not the correct values. What did I do wrong?
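If Emp_id is unique, you should not get duplicates. Join each worker's Superv_id to the supervisor's Emp_id; your query compares Superv_id with Superv_id, which pairs people who share the same supervisor: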
SELECT w.*, s.*
FROM [SAP_worker_all] w
JOIN [SAP_worker_all] s
ON s.[Emp_id] = w.[Superv_id]
AND s.[Superv_flg] = 'Y'
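Here is a quick comparison of the two join conditions on the three sample rows from the question. It assumes Python's sqlite3 module, so the T-SQL string + operator is replaced by SQLite's || and the square-bracket schema prefix is dropped. On this data, the question's condition pairs jane with herself, while the self join returns john with jane as his supervisor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE SAP_worker_all (
    Emp_id INTEGER, F_name TEXT, L_name TEXT, Superv_id INTEGER, Superv_flg TEXT)""")
conn.executemany(
    "INSERT INTO SAP_worker_all VALUES (?, ?, ?, ?, ?)",
    [(123, "john", "doe", 456, "N"),
     (456, "jane", "doe", 278, "Y"),
     (234, "Jack", "smith", 268, "N")],
)

# Question's condition: worker and supervisor share the same Superv_id.
wrong = conn.execute("""
    SELECT w.F_name, w.L_name, s.F_name || ' ' || s.L_name AS supervisor
    FROM SAP_worker_all w
    JOIN SAP_worker_all s ON w.Superv_id = s.Superv_id AND s.Superv_flg = 'Y'
""").fetchall()

# Answer's condition: the worker's Superv_id points at the supervisor's Emp_id.
right = conn.execute("""
    SELECT w.F_name, w.L_name, s.F_name || ' ' || s.L_name AS supervisor
    FROM SAP_worker_all w
    JOIN SAP_worker_all s ON s.Emp_id = w.Superv_id AND s.Superv_flg = 'Y'
""").fetchall()

print(wrong)  # [('jane', 'doe', 'jane doe')]
print(right)  # [('john', 'doe', 'jane doe')]
```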

How to get firstname and second maximum salary of the record using subqueries?

I'm new to Oracle. I need to get the first name and salary of the record with the second-highest salary in the table, using subqueries.
I've tried the query below:
select max(salary)
from employees
where salary < (select max(salary)
from employees);
This query gets the second-highest salary from the table. Now I need the first name of the record that has that salary.
firstname salary
-------------------
mani 45666
vijay 50000
sanjay 65000
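SELECT firstname, salary FROM
(SELECT firstname, salary, ROWNUM AS rn
FROM (SELECT * FROM employees ORDER BY salary DESC))
WHERE rn = 2;
The innermost SELECT sorts the table by salary, from greatest to least (hence DESC).
The middle SELECT numbers the sorted rows with ROWNUM, and the outer SELECT takes the two fields you want from row 2 (which holds the second-highest salary). Filtering on rownum = 2 directly would return no rows, because Oracle assigns ROWNUM only to rows that pass the filter, so the first candidate always gets 1 and is rejected.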
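Since the question asks for subqueries, another option is to extend the asker's own query. Nested MAX subqueries find the second-highest salary, and the outer query selects the row or rows that have it. The SQL below is plain enough to run on Oracle as well. The sketch assumes Python's sqlite3 module as a stand-in and uses the three sample rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (firstname TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("mani", 45666), ("vijay", 50000), ("sanjay", 65000)],
)

# Innermost subquery: highest salary. Middle: highest salary below it.
# Outer query: every employee earning exactly that second-highest salary.
second = conn.execute("""
    SELECT firstname, salary
    FROM employees
    WHERE salary = (SELECT max(salary)
                    FROM employees
                    WHERE salary < (SELECT max(salary) FROM employees))
""").fetchall()

print(second)  # [('vijay', 50000)]
```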
You can use dense_rank for this:
select firstname, salary
from (select /*+ first_rows(2) */ firstname, salary,
dense_rank() Over (order by salary desc) r
from employees)
where r = 2;
The first_rows hint is there to help the optimizer use an index (on (salary) or (salary, firstname)).
This may return more than one row if two people share the same salary (you can add and rownum = 1 to pick an arbitrary one).
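The tie behaviour is easy to see with a hypothetical extra employee, "ravi", who earns the same as "vijay". The sketch assumes Python's sqlite3 module, which supports dense_rank() as a window function. The Oracle first_rows hint is left out because SQLite would treat it as a plain comment. An ORDER BY was added only to make the output order fixed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (firstname TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    # The question's rows plus a hypothetical 'ravi' who ties with 'vijay'.
    [("mani", 45666), ("vijay", 50000), ("sanjay", 65000), ("ravi", 50000)],
)

# dense_rank gives tied salaries the same rank without gaps,
# so rank 2 is always the second-highest distinct salary.
rows = conn.execute("""
    SELECT firstname, salary
    FROM (SELECT firstname, salary,
                 dense_rank() OVER (ORDER BY salary DESC) AS r
          FROM employees)
    WHERE r = 2
    ORDER BY firstname
""").fetchall()

print(rows)  # [('ravi', 50000), ('vijay', 50000)]
```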
Try this out:
SELECT * FROM EMP
WHERE SAL >= (SELECT MAX(SAL) FROM EMP
              WHERE SAL < (SELECT MAX(SAL) FROM EMP
                           WHERE SAL < (SELECT MAX(SAL) FROM EMP)))
AND ROWNUM < 4
ORDER BY SAL