Find top salary per department - is there a more efficient query? - postgresql

I have a query that works but I suspect I'm doing this inefficiently. Is there a more elegant approach to find the top salary in each department and the employee that earns it?
I'm using a CTE to find the max salary per department ID, then joining that back to the employee data on salary and department ID. The code below builds and populates the tables, and the query is at the end.
CREATE TABLE employee (
emplid SERIAL PRIMARY KEY,
name VARCHAR NOT NULL,
salary FLOAT NOT NULL,
depid INTEGER
);
INSERT INTO employee (name, salary, depid)
VALUES
('Chris',23456.99,1),
('Bob',98756.34,1),
('Malin',34567.22,2),
('Lisa',34967.73,2),
('Deepak',88582.22,3),
('Chester',99487.41,3);
CREATE TABLE department (
depid SERIAL PRIMARY KEY,
deptname VARCHAR NOT NULL
);
INSERT INTO department (deptname)
VALUES
('Engineering'),
('Sales'),
('Marketing');
--top salary by department
WITH cte AS (
SELECT d.depid, deptname, MAX(salary) AS maxsal
FROM employee e
JOIN department d ON d.depid = e.depid
GROUP BY d.depid, deptname
)
SELECT cte.deptname, e.name, cte.maxsal
FROM cte
JOIN employee e ON cte.depid = e.depid
AND e.salary = cte.maxsal
ORDER BY maxsal DESC;
Here is the target result:
"Marketing" "Chester" "99487.41"
"Engineering" "Bob" "98756.34"
"Sales" "Lisa" "34967.73"

In Postgres this can be solved using the DISTINCT ON () clause:
SELECT distinct on (d.depid) d.depid, deptname, e.name, e.salary AS maxsal
FROM employee e
JOIN department d ON d.depid = e.depid
order by d.depid, e.salary desc;
Or you can use a window function:
select depid, deptname, emp_name, salary
from (
SELECT d.depid,
deptname,
e.name as emp_name,
e.salary,
max(e.salary) over (partition by d.depid) AS maxsal
FROM employee e
JOIN department d ON d.depid = e.depid
) t
where salary = maxsal;
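You can check the window-function version without a Postgres server. The sketch below runs it through Python's built-in sqlite3 module. It makes a few assumptions: SQLite 3.25 or newer (needed for window functions), REAL standing in for FLOAT, and INTEGER PRIMARY KEY standing in for SERIAL. DISTINCT ON is Postgres-only, so it is not reproduced here.

```python
import sqlite3

# Sample rows copied from the question (REAL stands in for Postgres FLOAT).
EMPLOYEES = [
    ("Chris", 23456.99, 1), ("Bob", 98756.34, 1),
    ("Malin", 34567.22, 2), ("Lisa", 34967.73, 2),
    ("Deepak", 88582.22, 3), ("Chester", 99487.41, 3),
]
DEPARTMENTS = [("Engineering",), ("Sales",), ("Marketing",)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emplid INTEGER PRIMARY KEY, name TEXT NOT NULL,"
             " salary REAL NOT NULL, depid INTEGER)")
conn.execute("CREATE TABLE department (depid INTEGER PRIMARY KEY, deptname TEXT NOT NULL)")
conn.executemany("INSERT INTO employee (name, salary, depid) VALUES (?, ?, ?)", EMPLOYEES)
conn.executemany("INSERT INTO department (deptname) VALUES (?)", DEPARTMENTS)


def top_salary_per_department(conn):
    """Return (deptname, name, salary) for every employee earning their department's maximum."""
    sql = """
        SELECT deptname, emp_name, salary
        FROM (
            SELECT d.deptname,
                   e.name AS emp_name,
                   e.salary,
                   -- the department maximum, repeated on every row of that department
                   MAX(e.salary) OVER (PARTITION BY d.depid) AS maxsal
            FROM employee e
            JOIN department d ON d.depid = e.depid
        ) t
        WHERE salary = maxsal
        ORDER BY salary DESC
    """
    return conn.execute(sql).fetchall()


for row in top_salary_per_department(conn):
    print(row)
```

The filter is `salary = maxsal`, so two employees tied for a department's top salary would both be returned. DISTINCT ON, by contrast, keeps exactly one row per department.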
Online example: https://rextester.com/MBAF73582

You should have an index:
create index employee_depid_salary_desc_idx on employee(depid, salary desc);
And then use the following query that can use the index:
select
depid,
deptname,
(
select
emplid
from employee
where depid=department.depid
order by salary desc
limit 1
) as max_salaried_emplid
from department;
(Joining back to employee on emplid to retrieve the rest of the employee data is left as an exercise for the reader.)
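For completeness, here is a sketch of that join-back step. It again uses Python's sqlite3 module as a stand-in for Postgres, which is an assumption. Whether the planner actually uses the (depid, salary DESC) index is engine-specific and is not checked here. The correlated subquery picks one emplid per department, and the outer join fetches that employee's name and salary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emplid INTEGER PRIMARY KEY, name TEXT NOT NULL,
                           salary REAL NOT NULL, depid INTEGER);
    CREATE TABLE department (depid INTEGER PRIMARY KEY, deptname TEXT NOT NULL);
    -- same composite index as in the answer; SQLite accepts DESC key columns too
    CREATE INDEX employee_depid_salary_desc_idx ON employee (depid, salary DESC);
    INSERT INTO employee (name, salary, depid) VALUES
        ('Chris', 23456.99, 1), ('Bob', 98756.34, 1),
        ('Malin', 34567.22, 2), ('Lisa', 34967.73, 2),
        ('Deepak', 88582.22, 3), ('Chester', 99487.41, 3);
    INSERT INTO department (deptname) VALUES ('Engineering'), ('Sales'), ('Marketing');
""")


def top_earner_per_department(conn):
    """Pick one emplid per department with ORDER BY ... LIMIT 1, then join back for the details."""
    sql = """
        SELECT t.deptname, e.name, e.salary
        FROM (
            SELECT d.depid,
                   d.deptname,
                   (SELECT emplid
                    FROM employee
                    WHERE employee.depid = d.depid
                    ORDER BY salary DESC
                    LIMIT 1) AS max_salaried_emplid
            FROM department d
        ) t
        JOIN employee e ON e.emplid = t.max_salaried_emplid
        ORDER BY t.depid
    """
    return conn.execute(sql).fetchall()


print(top_earner_per_department(conn))
```

With ties, `ORDER BY salary DESC LIMIT 1` returns just one of the tied employees, with no guarantee which one. Adding emplid to the ORDER BY would make the choice deterministic. A department with no employees gets a NULL emplid and is dropped by the inner join; a LEFT JOIN would keep it.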

Related

SQL Server : group by with corresponding row values

I need to write a T-SQL group by query for a table with multiple dates and seq columns:
IF OBJECT_ID('tempdb..#temp') IS NOT NULL DROP TABLE #temp
CREATE TABLE #temp(
id char(1),
dt DateTime,
seq int)
Insert into #temp values('A','2015-03-31 10:00:00',1)
Insert into #temp values('A','2015-08-31 10:00:00',2)
Insert into #temp values('A','2015-03-31 10:00:00',5)
Insert into #temp values('B','2015-09-01 10:00:00',1)
Insert into #temp values('B','2015-09-01 10:00:00',2)
I want the result to contain one row per id (A and B) with its latest date and the corresponding seq number, like:
id MaxDate CorrespondentSeq
A 2015-08-31 10:00:00.000 2
B 2015-09-01 10:00:00.000 2
I am trying this (obviously wrong!) query:
select id, max(dt) as MaxDate, max(seq) as CorrespondentSeq
from #temp
group by id
which returns:
id MaxDate CorrespondentSeq
A 2015-08-31 10:00:00.000 5 <-- 5 is wrong
B 2015-09-01 10:00:00.000 2
How can I achieve that?
EDIT
The dt datetime column has duplicated values (exactly the same date!)
I am using SQL Server 2005
You can use a ranking subselect to get only the highest ranked entries for an id:
select id, dt, seq
from (
select id, dt, seq, rank() over (partition by id order by dt desc, seq desc) as r
from #temp
) ranked
where r=1;
SELECT ID, DT, SEQ
FROM (
SELECT ID, DT, SEQ, Row_Number()
OVER (PARTITION BY id ORDER BY dt DESC, seq DESC) AS rn
FROM #temp
) cte
WHERE rn = 1;
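You can try the ROW_NUMBER() version on the question's sample rows with Python's sqlite3 module. This assumes SQLite 3.25+ for window functions. SQLite has no #temp syntax, so the table is named temp_rows here, which is a hypothetical name. dt is stored as ISO-8601 text, which sorts in date order.

```python
import sqlite3

# Rows from the question's #temp table.
ROWS = [
    ("A", "2015-03-31 10:00:00", 1),
    ("A", "2015-08-31 10:00:00", 2),
    ("A", "2015-03-31 10:00:00", 5),
    ("B", "2015-09-01 10:00:00", 1),
    ("B", "2015-09-01 10:00:00", 2),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp_rows (id TEXT, dt TEXT, seq INTEGER)")
conn.executemany("INSERT INTO temp_rows VALUES (?, ?, ?)", ROWS)


def latest_per_id(conn):
    """Number rows per id, newest dt first (highest seq breaks ties), and keep row 1."""
    sql = """
        SELECT id, dt, seq
        FROM (
            SELECT id, dt, seq,
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY dt DESC, seq DESC) AS rn
            FROM temp_rows
        ) ranked
        WHERE rn = 1
        ORDER BY id
    """
    return conn.execute(sql).fetchall()


print(latest_per_id(conn))
```

ROW_NUMBER() always yields exactly one row per id. RANK(), used in the previous answer, would return several rows if they tie on both dt and seq.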
Demo : http://www.sqlfiddle.com/#!3/3e3d5/5
Through trial and error I may have found a solution, but I'm not completely sure it is correct:
select A.id, B.dt, max(B.seq)
from (select id, max(dt) as maxDt
from #temp
group by id) as A
inner join #temp as B on A.id = B.id AND A.maxDt = B.dt
group by A.id, B.dt
Select id, dt, seq
From #temp t
where dt = (Select Max(dt) from #temp
Where id = t.Id)
If an id has several rows with the same latest date, you also need to specify which of those duplicates the query processor should return. Say you want the lowest value of seq; then you could write:
Select id, dt, seq
From #temp t
where dt = (Select Max(dt) from #temp
Where id = t.Id)
and seq = (Select Min(Seq) from #temp
where id = t.Id
and dt = t.dt)
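The nested-subquery approach can be checked against the sample rows. In this sketch the tie-break on seq is a parameter, so you can compare MIN and MAX. It uses Python's sqlite3 module as a stand-in for SQL Server, and the table name temp_rows is hypothetical. With MAX it reproduces the expected output in the question. With MIN, id B gets seq 1 instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp_rows (id TEXT, dt TEXT, seq INTEGER)")
conn.executemany(
    "INSERT INTO temp_rows VALUES (?, ?, ?)",
    [("A", "2015-03-31 10:00:00", 1), ("A", "2015-08-31 10:00:00", 2),
     ("A", "2015-03-31 10:00:00", 5), ("B", "2015-09-01 10:00:00", 1),
     ("B", "2015-09-01 10:00:00", 2)],
)


def latest_by_subquery(conn, pick="MAX"):
    """Latest dt per id, then MIN or MAX seq among that id's rows at that dt."""
    if pick not in ("MAX", "MIN"):  # only these two aggregates are interpolated
        raise ValueError("pick must be 'MAX' or 'MIN'")
    sql = f"""
        SELECT id, dt, seq
        FROM temp_rows t
        WHERE dt = (SELECT MAX(dt) FROM temp_rows WHERE id = t.id)
          AND seq = (SELECT {pick}(seq) FROM temp_rows
                     WHERE id = t.id AND dt = t.dt)
        ORDER BY id
    """
    return conn.execute(sql).fetchall()


print(latest_by_subquery(conn, "MAX"))
print(latest_by_subquery(conn, "MIN"))
```

Rows that are identical in id, dt and seq would still all be returned. If that can happen, a DISTINCT or a ROW_NUMBER() filter is needed.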

postgres hierarchy - count of child levels and sort by date of children or grandchildren

I would like to know how to write a Postgres subquery that produces the output I need from the following example table.
id parent_id postdate
1   -1 2015-03-10
2     1 2015-03-11 (child level 1)
3     1 2015-03-12 (child level 1)
4     3 2015-03-13 (child level 2)
5    -1 2015-03-14
6    -1 2015-03-15
7     6 2015-03-16 (child level 1)
If I want to sort all the root ids by the postdate of their level-1 children, together with a count of the posts under each root, the output would be something like this:
id count  date
6   2    2015-03-15
1   4    2015-03-10
5   1    2015-03-14
The output is sorted by the postdate of each root's children, and the 'date' column is the root's own postdate. Root id #6 comes first because its child (id #7) has the most recent postdate. Even though id #5 has a more recent postdate than id #1, id #5 doesn't have any children, so it is placed at the end, sorted by date. The 'count' is the number of children (child level 1) and grandchildren (child level 2) plus the root itself. For instance, ids #2, #3 and #4 all belong to id #1, so the count for id #1 is 4.
My subquery so far:
SELECT p1.id,count(p1.id),p1.postdate
FROM mytable p1
LEFT JOIN mytable c1 ON c1.parent_id = p1.id AND p1.parent_id = -1
LEFT JOIN mytable c2 ON c2.parent_id = c1.id AND p1.parent_id = -1
GROUP BY p1.id,c1.postdate,p1.postdate
ORDER by c1.postdate DESC,p1.postdate DESC
create table mytable ( id serial primary key, parent_id int references mytable, postdate date );
create index mytable_parent_id_idx on mytable (parent_id);
insert into mytable (id, parent_id, postdate) values (1, null, '2015-03-10');
insert into mytable (id, parent_id, postdate) values (2, 1, '2015-03-11');
insert into mytable (id, parent_id, postdate) values (3, 1, '2015-03-12');
insert into mytable (id, parent_id, postdate) values (4, 3, '2015-03-13');
insert into mytable (id, parent_id, postdate) values (5, null, '2015-03-14');
insert into mytable (id, parent_id, postdate) values (6, null, '2015-03-15');
insert into mytable (id, parent_id, postdate) values (7, 6, '2015-03-16');
with recursive recu as (
select id as parent, id as root, null::date as child_postdate
from mytable
where parent_id is null
union all
select r.parent, mytable.id, mytable.postdate
from recu r
join mytable
on parent_id = r.root
)
select m.id, c.cnt, m.postdate, c.max_child_date
from mytable m
join ( select parent, count(*) as cnt, max(child_postdate) as max_child_date
from recu
group by parent
) c on c.parent = m.id
order by c.max_child_date desc nulls last, m.postdate desc;
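Here is a minimal sketch of the same recursive CTE run through Python's sqlite3 module as a stand-in for Postgres. It makes two assumptions: `null::date` becomes a plain NULL, and `NULLS LAST` is emulated with an `IS NULL` sort key rather than relying on newer SQLite syntax. Dates are ISO-8601 text, so MAX and ORDER BY compare them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id INTEGER PRIMARY KEY,
                          parent_id INTEGER REFERENCES mytable,
                          postdate TEXT);
    INSERT INTO mytable VALUES
        (1, NULL, '2015-03-10'), (2, 1, '2015-03-11'), (3, 1, '2015-03-12'),
        (4, 3, '2015-03-13'), (5, NULL, '2015-03-14'), (6, NULL, '2015-03-15'),
        (7, 6, '2015-03-16');
""")


def roots_by_latest_descendant(conn):
    """Return (root id, posts in its tree, root postdate), newest descendant first."""
    sql = """
        WITH RECURSIVE recu(parent, root, child_postdate) AS (
            -- anchor: every root node, with no child date yet
            SELECT id, id, NULL FROM mytable WHERE parent_id IS NULL
            UNION ALL
            -- step: add each child of the current node, keeping the original root in "parent"
            SELECT r.parent, m.id, m.postdate
            FROM recu r
            JOIN mytable m ON m.parent_id = r.root
        )
        SELECT m.id, c.cnt, m.postdate
        FROM mytable m
        JOIN (SELECT parent, COUNT(*) AS cnt, MAX(child_postdate) AS max_child_date
              FROM recu
              GROUP BY parent) c ON c.parent = m.id
        -- emulate "max_child_date DESC NULLS LAST"
        ORDER BY c.max_child_date IS NULL, c.max_child_date DESC, m.postdate DESC
    """
    return conn.execute(sql).fetchall()


print(roots_by_latest_descendant(conn))
```

MAX here covers grandchildren as well as direct children, so a root is ranked by its newest descendant at any depth.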
You'll need a recursive query to count the elements in the subtrees:
WITH RECURSIVE opa AS (
SELECT id AS par
, id AS moi
FROM the_tree
WHERE parent_id IS NULL
UNION ALL
SELECT o.par AS par
, t.id AS moi
FROM opa o
JOIN the_tree t ON t.parent_id = o.moi
)
SELECT t.id
, c.cnt
, t.postdate
FROM the_tree t
JOIN ( SELECT par, COUNT(*) AS cnt
FROM opa o
GROUP BY par
) c ON c.par = t.id
ORDER BY t.id
;
UPDATE (it appears the OP also wants the maxdate per tree)
-- The same, but also select the postdate
-- --------------------------------------
WITH RECURSIVE opa AS (
SELECT id AS par
, id AS moi
, postdate AS postdate
FROM the_tree
WHERE parent_id IS NULL
UNION ALL
SELECT o.par AS par
, t.id AS moi
-- , GREATEST(o.postdate,t.postdate) AS postdate
, t.postdate AS postdate
FROM opa o
JOIN the_tree t ON t.parent_id = o.moi
)
SELECT t.id
, c.cnt
, t.postdate
, c.maxdate
FROM the_tree t
JOIN ( SELECT par, COUNT(*) AS cnt
, MAX(o.postdate) AS maxdate -- and obtain the max()
FROM opa o
GROUP BY par
) c ON c.par = t.id
ORDER BY c.maxdate, t.id
;
After looking at everyone's code, I created the subquery I needed. I can use PHP to vary the 'case when' code depending on the user's sort selection. For instance, the code below will sort the root nodes based on child level 1's postdate.
with recursive cte as (
select id as parent, id as root, null::timestamp as child_postdate,0 as depth
from mytable
where parent_id = -1
union all
select r.parent, mytable.id, mytable.postdate,depth+1
from cte r
join mytable
on parent_id = r.root
)
select m.id, c.cnt, m.postdate
from mytable m
join ( select parent, count(*) as cnt, max(child_postdate) as max_child_date,depth
from cte
group by parent,depth
) c on c.parent = m.id
order by
case
when depth=2 then 1
when depth=1 then 2
else 0
end DESC,
c.max_child_date desc nulls last, m.postdate desc;
select
p.id,
(1+c.n) as parent_post_plus_number_of_subposts,
p.postdate
from
mytable as p
inner join
(
select
parent_id, count(*) as n, max(postdate) as _postdate
from mytable
group by parent_id
) as c
on p.id = c.parent_id
where p.parent_id = -1
order by c._postdate desc

Join two tables with count from first table

I'm sure there is an obvious answer to this question, but I'm a noob trying to remember how to write queries. I have the following table structure in PostgreSQL:
CREATE TABLE public.table1 (
accountid BIGINT NOT NULL,
rpt_start DATE NOT NULL,
rpt_end DATE NOT NULL,
CONSTRAINT table1_pkey PRIMARY KEY(accountid, rpt_start, rpt_end)
)
WITH (oids = false);
CREATE TABLE public.table2 (
customer_id BIGINT NOT NULL,
read VARCHAR(255),
CONSTRAINT table2_pkey PRIMARY KEY(customer_id)
)
WITH (oids = false);
The query should return each accountid, the number of rows for that accountid in table1, and the read value from table2. The join is on table1.accountid = table2.customer_id.
The result set should appear as follows:
accountid count read
1234 2 100
1235 9 110
1236 1 91
The count column reflects the number of rows in table1 for each accountid. The read column is the value from table2 associated with the same accountid.
select accountid, "count", read
from
(
select accountid, count(*) "count"
from table1
group by accountid
) t1
inner join
table2 t2 on t1.accountid = t2.customer_id
order by accountid
SELECT table2.customer_id, COUNT(table1.accountid), table2.read
FROM table2
LEFT JOIN table1 ON (table2.customer_id = table1.accountid)
GROUP BY table2.customer_id, table2.read
SELECT t2.customer_id, t2.read, COUNT(*) AS the_count
FROM table2 t2
JOIN table1 t1 ON t1.accountid = t2.customer_id
GROUP BY t2.customer_id, t2.read
;
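To see how the join variants differ, here is a sketch using Python's sqlite3 module as a stand-in for Postgres. The counts and read values come from the expected output above. Customer 1237 and the report dates are made up for illustration. The sketch also shows why, with a LEFT JOIN, you should count a table1 column rather than `*`: COUNT(*) counts the NULL-extended row and would report 1 for a customer with no table1 rows.

```python
import sqlite3

# Row counts and read values from the expected output; 1237 is a hypothetical
# customer with no table1 rows.
COUNTS = {1234: 2, 1235: 9, 1236: 1}
READS = [(1234, "100"), (1235, "110"), (1236, "91"), (1237, "50")]

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE table1 (
                    accountid INTEGER NOT NULL,
                    rpt_start TEXT NOT NULL,
                    rpt_end   TEXT NOT NULL,
                    PRIMARY KEY (accountid, rpt_start, rpt_end))""")
conn.execute('CREATE TABLE table2 (customer_id INTEGER PRIMARY KEY, "read" TEXT)')
for accountid, n in COUNTS.items():
    # Distinct made-up report dates so the composite primary key is satisfied.
    rows = [(accountid, f"2015-01-{i + 1:02d}", f"2015-01-{i + 1:02d}") for i in range(n)]
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
conn.executemany("INSERT INTO table2 VALUES (?, ?)", READS)


def counts_inner(conn):
    """Aggregate table1 first, then join: only accounts that have table1 rows."""
    return conn.execute("""
        SELECT t1.accountid, t1.cnt, t2."read"
        FROM (SELECT accountid, COUNT(*) AS cnt FROM table1 GROUP BY accountid) t1
        JOIN table2 t2 ON t1.accountid = t2.customer_id
        ORDER BY t1.accountid""").fetchall()


def counts_left(conn):
    """LEFT JOIN from table2: COUNT(t1.accountid) yields 0 where nothing matches."""
    return conn.execute("""
        SELECT t2.customer_id, COUNT(t1.accountid) AS cnt, t2."read"
        FROM table2 t2
        LEFT JOIN table1 t1 ON t1.accountid = t2.customer_id
        GROUP BY t2.customer_id, t2."read"
        ORDER BY t2.customer_id""").fetchall()


print(counts_inner(conn))
print(counts_left(conn))
```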

display unique row from two tables

I have two tables (one for quarter one, one for quarter two), each of which contains the employees who received a bonus in that quarter. Every employee has a unique id in the company.
I want to get all employees who have a bonus in either q1 or q2, with no duplicate employees. Both the ID and the amount are required.
Below is my solution, I want to find out if there is a better solution.
create table #q1 (
EmployeeID int identity(1,1) primary key not null,
amount int
)
create table #q2 (
EmployeeID int identity(1,1) primary key not null,
amount int
)
insert into #q1
(amount)
select 1
insert into #q1
(amount)
select 2
select * from #q1
insert into #q2
(amount)
select 1
insert into #q2
(amount)
select 11
insert into #q2
(amount)
select 22
select * from #q2
My Solution:
;with both as
(
select EmployeeID
from #q1
union
select EmployeeID
from #q2
)
select a.EmployeeID, a.amount
from #q1 as a
where a.EmployeeID in (select EmployeeID from both)
union all
select b.EmployeeID, b.amount
from #q2 as b
where b.EmployeeID in (select EmployeeID from both) and b.EmployeeID NOT in (select EmployeeID from #q1)
Result:
EmployeeID, Amount
1 1
2 2
3 22
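In the solution above, the `both` CTE never filters anything. Every EmployeeID in #q1 or #q2 is in their union by definition, so both `IN (select EmployeeID from both)` checks are always true. Only the `NOT IN` against #q1 does any work. Here is a minimal sketch of the same logic without the CTE. It runs in SQLite through Python's sqlite3 module as a stand-in for SQL Server, with tables named q1 and q2 because the # prefix is T-SQL syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# INTEGER PRIMARY KEY auto-numbers from 1, mimicking identity(1,1); rows match the question.
conn.executescript("""
    CREATE TABLE q1 (EmployeeID INTEGER PRIMARY KEY, amount INTEGER);
    CREATE TABLE q2 (EmployeeID INTEGER PRIMARY KEY, amount INTEGER);
    INSERT INTO q1 (amount) VALUES (1), (2);
    INSERT INTO q2 (amount) VALUES (1), (11), (22);
""")


def q1_first(conn):
    """All q1 rows, plus q2 rows for employees that do not appear in q1."""
    return conn.execute("""
        SELECT EmployeeID, amount FROM q1
        UNION ALL
        SELECT EmployeeID, amount FROM q2
        WHERE NOT EXISTS (SELECT 1 FROM q1 WHERE q1.EmployeeID = q2.EmployeeID)
        ORDER BY EmployeeID""").fetchall()


print(q1_first(conn))
```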
SELECT EmployeeID, SUM(amount) AS TotalBonus
FROM
(SELECT EmployeeID, amount
from #q1
UNION ALL
SELECT EmployeeID, amount
from #q2) AS AllQuarters
GROUP BY EmployeeID
The subquery combines both tables with UNION ALL. The GROUP BY gives you one row per employee, and the SUM means that if someone got lucky in both quarters, you get the total. I'm guessing that's the right thing for you.
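Applied to the question's sample rows, the UNION ALL + GROUP BY approach returns per-employee totals rather than the q1 amount. Employee 2, for instance, gets 2 + 11 = 13. The sketch below uses Python's sqlite3 module as a stand-in for SQL Server, with tables q1/q2 in place of #q1/#q2. The Name column is omitted because the original tables do not have one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE q1 (EmployeeID INTEGER PRIMARY KEY, amount INTEGER);
    CREATE TABLE q2 (EmployeeID INTEGER PRIMARY KEY, amount INTEGER);
    INSERT INTO q1 (amount) VALUES (1), (2);
    INSERT INTO q2 (amount) VALUES (1), (11), (22);
""")


def total_bonus(conn):
    """Stack both quarters with UNION ALL, then sum per employee."""
    return conn.execute("""
        SELECT EmployeeID, SUM(amount) AS TotalBonus
        FROM (SELECT EmployeeID, amount FROM q1
              UNION ALL
              SELECT EmployeeID, amount FROM q2) AS all_quarters
        GROUP BY EmployeeID
        ORDER BY EmployeeID""").fetchall()


print(total_bonus(conn))
```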
Try this one:
SELECT EmployeeID
FROM EmployeeList
WHERE EmployeeID IN
(SELECT EmployeeID From QuarterOne
UNION
SELECT EmployeeID From QuarterTwo)
Or, using JOINs:
SELECT a.EmployeeID
FROM EmployeeList a
LEFT JOIN QuarterOne b
ON a.EmployeeID = b.EmployeeID
LEFT JOIN QuarterTwo c
ON a.EmployeeID = c.EmployeeID
WHERE b.EmployeeID IS NOT NULL
OR c.EmployeeID IS NOT NULL
This will return every EmployeeID that has a record in either quarter.
Try:
SELECT COALESCE(q1.EmployeeID, q2.EmployeeID) AS EmployeeID -- whichever side has the row
, COALESCE(q1.amount, q2.amount) AS amount -- the q1 amount wins when both exist
FROM #q1 AS q1
FULL OUTER JOIN #q2 AS q2
ON q1.EmployeeID = q2.EmployeeID
WHERE q1.amount IS NOT NULL
OR q2.amount IS NOT NULL

Alternative solution to display a table in T-SQL

I have a simple employee table that I want to display in a particular order. I want to find out whether there are alternative (or better) solutions that achieve the same result. The T-SQL script is shown below:
CREATE TABLE Employee(
EmployeeID INT IDENTITY(1,1) NOT NULL,
EmployeeName VARCHAR(255) NULL,
ManagerID INT NULL,
EmployeeType VARCHAR(20) NULL
)
GO
INSERT INTO Employee (EmployeeName, ManagerID, EmployeeType)
VALUES ('Brad',5,'Memeber');
INSERT INTO Employee (EmployeeName, ManagerID, EmployeeType)
VALUES ('James',3,'Memeber');
INSERT INTO Employee (EmployeeName, ManagerID, EmployeeType)
VALUES ('Ray',null,'Manager');
INSERT INTO Employee (EmployeeName, ManagerID, EmployeeType)
VALUES ('Tom',8,'Memeber');
INSERT INTO Employee (EmployeeName, ManagerID, EmployeeType)
VALUES ('Neil',8,'Memeber');
INSERT INTO Employee (EmployeeName, ManagerID, EmployeeType)
VALUES ('Rob',5,'Memeber');
INSERT INTO Employee (EmployeeName, ManagerID, EmployeeType)
VALUES ('Paul',5,'Memeber');
INSERT INTO Employee (EmployeeName, ManagerID, EmployeeType)
VALUES ('Tim',null,'Manager');
GO
SELECT e.EmployeeType, e.EmployeeName AS [Team Member],
(SELECT e2.EmployeeName FROM Employee AS e2 WHERE e2.EmployeeID = e.ManagerID) AS Manager
FROM Employee AS e
ORDER BY e.EmployeeType, e.EmployeeID
The rows are ordered with managers first, then by EmployeeID. My concern is that my solution sorts by the EmployeeType column. Would it be better to sort by the ManagerID column instead? The EmployeeType values could change in the future, say from 'Manager' to 'Team Manager', which might produce a different order!
If the criterion for being a manager is that the ManagerID column is NULL, you can use a CASE expression in the ORDER BY to put the managers first.
SELECT e.EmployeeType, e.EmployeeName AS [Team Member],
(SELECT e2.EmployeeName FROM Employee AS e2 WHERE e2.EmployeeID = e.ManagerID) AS Manager
FROM Employee AS e
ORDER BY CASE WHEN e.ManagerID IS NULL THEN 0 ELSE 1 END, e.EmployeeID
If you want the sort to depend on EmployeeType, you can do it like this:
SELECT e.EmployeeType, e.EmployeeName AS [Team Member],
(SELECT e2.EmployeeName FROM Employee AS e2 WHERE e2.EmployeeID = e.ManagerID) AS Manager
FROM Employee AS e
ORDER BY
CASE EmployeeType
WHEN 'Manager' THEN 0
WHEN 'Memeber' THEN 1
ELSE 2
END, e.EmployeeID
Or you can use an EmpType lookup table that defines the sort order:
CREATE TABLE EmpType(EmployeeType VARCHAR(20) PRIMARY KEY, SortOrder INT)
GO
INSERT INTO EmpType VALUES('Manager', 1)
INSERT INTO EmpType VALUES('Memeber', 2)
SELECT e.EmployeeType, e.EmployeeName AS [Team Member],
(SELECT e2.EmployeeName FROM Employee AS e2 WHERE e2.EmployeeID = e.ManagerID) AS Manager
FROM Employee AS e
LEFT OUTER JOIN EmpType as et
ON e.EmployeeType = et.EmployeeType
ORDER BY et.SortOrder, e.EmployeeID
There is no universal solution. In your example, not only EmployeeType but also ManagerID can change.
If you need similar results, you should order by hierarchy level: managers come first, then employees. If an employee is given a different ManagerID, they stay at the same level.
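Ordering by hierarchy level can be sketched with a recursive CTE. This version uses Python's sqlite3 module as a stand-in for SQL Server, where the same query would start with plain WITH (T-SQL has no RECURSIVE keyword). EmployeeType is omitted because the ordering no longer depends on it. With the question's data as inserted, ManagerID 5 is Neil, so Brad, Rob and Paul end up one level below Neil rather than directly under a manager.

```python
import sqlite3

# (EmployeeName, ManagerID) in insertion order, so EmployeeID runs 1..8 as in the question.
EMPLOYEES = [("Brad", 5), ("James", 3), ("Ray", None), ("Tom", 8),
             ("Neil", 8), ("Rob", 5), ("Paul", 5), ("Tim", None)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmployeeID INTEGER PRIMARY KEY,"
             " EmployeeName TEXT, ManagerID INTEGER)")
conn.executemany("INSERT INTO Employee (EmployeeName, ManagerID) VALUES (?, ?)", EMPLOYEES)


def by_hierarchy_level(conn):
    """Return (depth, name, manager name) ordered by depth in the management tree, then EmployeeID."""
    return conn.execute("""
        WITH RECURSIVE tree(EmployeeID, EmployeeName, ManagerID, depth) AS (
            -- top level: employees with no manager
            SELECT EmployeeID, EmployeeName, ManagerID, 0
            FROM Employee WHERE ManagerID IS NULL
            UNION ALL
            -- each step adds the direct reports of the previous level
            SELECT e.EmployeeID, e.EmployeeName, e.ManagerID, t.depth + 1
            FROM Employee e JOIN tree t ON e.ManagerID = t.EmployeeID
        )
        SELECT t.depth, t.EmployeeName, m.EmployeeName AS Manager
        FROM tree t
        LEFT JOIN Employee m ON m.EmployeeID = t.ManagerID
        ORDER BY t.depth, t.EmployeeID""").fetchall()


for row in by_hierarchy_level(conn):
    print(row)
```

A cycle in ManagerID (an employee who is, directly or indirectly, their own manager) would keep this recursion going. SQL Server stops such a query at its MAXRECURSION limit, which defaults to 100.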