Remove Duplicates from Employees Self Join - tsql

I have an employees table where all employees are located. I need to extract a subset of the employees with their corresponding supervisor. The table looks similar to this:
Emp_id | F_name | L_name | Superv_id | Superv_flg
---------------------------------------------------
123    | john   | doe    | 456       | N
456    | jane   | doe    | 278       | Y
234    | Jack   | smith  | 268       | N
My query looks like this so far:
with cte as
(
select f_name + ' ' l_name as supervisor, superv_id, emp_id
from [dbo].[SAP_worker_all]
where supvr_flag = 'Y'
)
SELECT distinct w.[first_name]
,w.[last_name]
,cte.supervisor
FROM [dbo].[SAP_worker_all] w
join cte
on w.[superv_id] = cte.[superv_id];
I am getting duplicate values and the supervisors returned are not the correct values. What did I do wrong?

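If Emp_id is unique, you should not get duplicates. Join each worker's Superv_id to the supervisor's Emp_id. Your query joins Superv_id to Superv_id, which pairs a worker with other people who share the same supervisor rather than with the supervisor: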
SELECT w.*, s.*
FROM [SAP_worker_all] w
JOIN [SAP_worker_all] s
ON s.[Emp_id] = w.[Superv_id]
AND s.[Superv_flg] = 'Y'
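To see the self join at work, here is a minimal sketch that loads the three sample rows from the question into an in-memory SQLite database through Python's sqlite3 module. SQLite is an assumption made only so the example runs anywhere; it concatenates strings with || where T-SQL uses +, and the worker/supervisor column aliases are illustrative.

```python
import sqlite3

# In-memory SQLite database; table and column names follow the sample in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SAP_worker_all ("
    "Emp_id INTEGER PRIMARY KEY, F_name TEXT, L_name TEXT, "
    "Superv_id INTEGER, Superv_flg TEXT)"
)
conn.executemany(
    "INSERT INTO SAP_worker_all VALUES (?, ?, ?, ?, ?)",
    [
        (123, "john", "doe", 456, "N"),
        (456, "jane", "doe", 278, "Y"),
        (234, "Jack", "smith", 268, "N"),
    ],
)

# Self join: w is the worker, s is the row that w.Superv_id points to.
rows = conn.execute(
    """
    SELECT w.F_name || ' ' || w.L_name AS worker,
           s.F_name || ' ' || s.L_name AS supervisor
    FROM SAP_worker_all w
    JOIN SAP_worker_all s
      ON s.Emp_id = w.Superv_id
     AND s.Superv_flg = 'Y'
    ORDER BY w.Emp_id
    """
).fetchall()

print(rows)
```

Only john doe is returned, because the other two rows point at supervisor ids (278 and 268) that do not exist in the sample. A LEFT JOIN would keep those workers with an empty supervisor.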

Related

Query to return multiple MAX values with HAVING clause

I want to write a query that will return the name of students who did the most projects with the count of the project. I want the query to return a table like this:
student_name | max_project_count
---------------------------------
John Doe     | 2
Anna Do      | 2
This is the code I have so far but it's only giving me the 2 column names student_name and count, but not the result.
SELECT s.student_name, COUNT(student_name)
FROM student s
GROUP BY student_name
HAVING COUNT(student_name) = (
SELECT MAX(count)
FROM (SELECT s.student_name, COUNT(*) AS count
FROM student_project k, student s
WHERE s.student_id = k.student_id
GROUP BY student_name) AS foo)
Result I have right now:
student_name | max_project_count
---------------------------------
These are the tables I have in my database:
student
student_id | student_name
--------------------------
jd123      | John Doe
ad456      | Anna Do
js678      | Jess Smith
dk789      | Daniel Kim
school_project
project_id | project_name
--------------------------
math_1023  | Math Comp.
sci_9872   | Science Comp.
student_project
student_id | project_id
------------------------
jd123      | math_1023
ad456      | math_1023
jd123      | sci_9872
ad456      | sci_9872
js678      | sci_9872
dk789      | sci_9872
-- Count projects per student, then keep the students whose count equals the maximum.
WITH projects AS (
    SELECT student_id, COUNT(*) AS pcount
    FROM student_project
    GROUP BY student_id
),
max_proj AS (
    SELECT MAX(pcount) AS max_project_count
    FROM projects
)
SELECT s.student_name, m.max_project_count
FROM student s
JOIN projects p ON p.student_id = s.student_id
JOIN max_proj m ON p.pcount = m.max_project_count;
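The two-step idea (count per student, then compare against the maximum) can be checked against the sample tables from the question. This is a sketch under the assumption that SQLite, driven from Python's sqlite3 module, behaves like the target database for this query; the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id TEXT PRIMARY KEY, student_name TEXT)")
conn.execute("CREATE TABLE student_project (student_id TEXT, project_id TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [("jd123", "John Doe"), ("ad456", "Anna Do"),
     ("js678", "Jess Smith"), ("dk789", "Daniel Kim")],
)
conn.executemany(
    "INSERT INTO student_project VALUES (?, ?)",
    [("jd123", "math_1023"), ("ad456", "math_1023"),
     ("jd123", "sci_9872"), ("ad456", "sci_9872"),
     ("js678", "sci_9872"), ("dk789", "sci_9872")],
)

# projects: one row per student with the number of projects.
# max_proj: the single highest count; every student matching it is returned.
rows = conn.execute(
    """
    WITH projects AS (
        SELECT student_id, COUNT(*) AS pcount
        FROM student_project
        GROUP BY student_id
    ),
    max_proj AS (
        SELECT MAX(pcount) AS max_project_count FROM projects
    )
    SELECT s.student_name, m.max_project_count
    FROM student s
    JOIN projects p ON p.student_id = s.student_id
    JOIN max_proj m ON p.pcount = m.max_project_count
    ORDER BY s.student_name
    """
).fetchall()

print(rows)
```

Both tied students come back, which is the behaviour the question asks for.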

PostgreSQL: Selecting one address from almost but not exactly duplicate rows

I have a big table that I'm trying to join another table to, however the table has entries such as:
--- Name | Address | Priority
----------------------------------------
1 | Jane Doe | 123 Baker St | 1
2 | Jane Doe | 345 Clay Dr | 2
3 | Jeff Boe | 231 Street St| 1
4 | Karen Al | 4232 Elm St | 1
5 | Karen Al | 5632 Pine Ct | 2
What I really want to select is one single address per person. The correct address I want is priority 2. However some of the addresses don't have a priority 2, so I can't join only on priority 2.
I've tried the following test query:
SELECT DISTINCT n.ID, LastName, FirstName, MAX(Address), MAX(Address2), City, State, PostalCode, n.Phone
FROM NormalTable n
JOIN Contracts cn ON n.ID = cn.ID
Which returns the table that I sketched out above, with the same person/sameID but different addresses.
Is there a way to do this in one query? I can think of maybe doing one INSERT statement into my final table where I do all the priority 2 addresses and then ANOTHER INSERT statement for IDs that aren't in the table yet, and use the priority 1 address for those. But I'd much prefer if there's a way to do this all in one go where I end up with only the address I want.
You could pick the address you need by joining to a subquery that finds the max priority per person:
select m.LastName, m.FirstName, m.Address, m.Address2, m.City, m.State, m.PostalCode, m.Phone
from my_table m
inner join (
select LastName, FirstName, max(priority) max_priority
from my_table
group by LastName, FirstName
) t on t.LastName = m.LastName
AND t.FirstName = m.FirstName
AND t.max_priority = m.priority
I think you want something like this (my_table stands in for your table):
SELECT DISTINCT ON (Name) Name, Address, Priority
FROM my_table
ORDER BY Name, Priority DESC;
How this works: in PostgreSQL, DISTINCT ON (Name) returns only one row per name, namely the first row of each group according to the ORDER BY. Because the rows are sorted by Priority DESC within each name, that first row is the one with the highest priority.
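The max-priority join from the first answer can be tried on the five sample rows. The sketch below is only an illustration: it assumes a single name column (as in the sample table, rather than the LastName/FirstName pair used in the answer) and runs on SQLite through Python's sqlite3 module, which does not support PostgreSQL's DISTINCT ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout mirroring the sample table in the question.
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT, address TEXT, priority INTEGER)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        (1, "Jane Doe", "123 Baker St", 1),
        (2, "Jane Doe", "345 Clay Dr", 2),
        (3, "Jeff Boe", "231 Street St", 1),
        (4, "Karen Al", "4232 Elm St", 1),
        (5, "Karen Al", "5632 Pine Ct", 2),
    ],
)

# The subquery finds the highest priority per person; the join keeps only that row.
rows = conn.execute(
    """
    SELECT m.name, m.address
    FROM my_table m
    JOIN (
        SELECT name, MAX(priority) AS max_priority
        FROM my_table
        GROUP BY name
    ) t ON t.name = m.name AND t.max_priority = m.priority
    ORDER BY m.name
    """
).fetchall()

print(rows)
```

Jeff Boe has no priority 2 row, so his priority 1 address is returned, which covers the fallback case from the question in a single query.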

how to get employee details along with deptname based on maximum salary group by deptname

I have two tables Employee and Dept as below
EIN | ENAME | Salary | DeptID
1 | Ravi | 500 | 10
2 | Krishna | 1000 | 20
3 | Kiran | 1500 | 20
DeptID | DeptName
10 | IT
20 | Finance
I want the output to list, for each department, the employee(s) earning the maximum salary, along with the DeptName.
Here is an option which does not use analytic functions:
SELECT e.ENAME,
COALESCE(d.DeptName, 'NA') AS DeptName,
t.max_salary
FROM Employee e
LEFT JOIN Dept d
ON e.DeptID = d.DeptID
INNER JOIN
(
SELECT DeptID, MAX(SALARY) AS max_salary
FROM Employee
GROUP BY DeptId
) t
ON e.DeptID = t.DeptID AND
e.Salary = t.max_salary
Note that in the event that more than one employee should have the maximum salary in a given department, this query would return all ties. In the absence of further information, this seems reasonable without being able to distinguish one employee from another.
Update:
You could also use a WHERE clause containing a correlated subquery instead of joining to a (non-correlated) subquery as I gave above. Here is what that would look like:
SELECT e.ENAME,
COALESCE(d.DeptName, 'NA') AS DeptName,
e.Salary
FROM Employee e
LEFT JOIN Dept d
ON e.DeptID = d.DeptID
WHERE e.Salary = (SELECT MAX(t.Salary) FROM Employee t WHERE t.DeptID = e.DeptID)
However, I would recommend using the first option as it would probably perform better than the correlated subquery option.
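Both variants can be run side by side on the sample data to confirm they return the same rows. This is a sketch on an in-memory SQLite database via Python's sqlite3 module (an assumption for portability); it says nothing about relative performance on a real server, and the ORDER BY is added only for a stable output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dept (DeptID INTEGER PRIMARY KEY, DeptName TEXT)")
conn.execute(
    "CREATE TABLE Employee (EIN INTEGER PRIMARY KEY, ENAME TEXT, Salary INTEGER, DeptID INTEGER)"
)
conn.executemany("INSERT INTO Dept VALUES (?, ?)", [(10, "IT"), (20, "Finance")])
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [(1, "Ravi", 500, 10), (2, "Krishna", 1000, 20), (3, "Kiran", 1500, 20)],
)

# Option 1: join to a grouped subquery holding the maximum salary per department.
rows_join = conn.execute(
    """
    SELECT e.ENAME, COALESCE(d.DeptName, 'NA') AS DeptName, t.max_salary
    FROM Employee e
    LEFT JOIN Dept d ON e.DeptID = d.DeptID
    INNER JOIN (
        SELECT DeptID, MAX(Salary) AS max_salary
        FROM Employee
        GROUP BY DeptID
    ) t ON e.DeptID = t.DeptID AND e.Salary = t.max_salary
    ORDER BY e.EIN
    """
).fetchall()

# Option 2: correlated subquery evaluated against each employee's department.
rows_corr = conn.execute(
    """
    SELECT e.ENAME, COALESCE(d.DeptName, 'NA') AS DeptName, e.Salary
    FROM Employee e
    LEFT JOIN Dept d ON e.DeptID = d.DeptID
    WHERE e.Salary = (SELECT MAX(t.Salary) FROM Employee t WHERE t.DeptID = e.DeptID)
    ORDER BY e.EIN
    """
).fetchall()

print(rows_join)
print(rows_corr)
```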

PostgreSQL UNION doesn't merge lines properly

I have 3 tables in a PostgreSQL database:
localities (loc, 12561 rows)
plants (pl, 17052 rows)
specimens or samples (esp, 9211 rows)
pl and esp each have a field loc, to specify where that tagged plant lives, or where that sample (usually a branch with leaves and flowers) came from.
I need a report of the places that have plants or samples, with the number of plants and samples in each place. The best I have managed so far is the union of two subqueries, which runs very fast (33 ms to fetch 69 rows):
(select l.id,l.nome,count(pl.id) pls,null esps
from loc l
left join pl on pl.loc = l.id
where l.id in
(select distinct pl.loc
from pl
where pl.loc > 0)
group by l.id,l.nome
union
select l.id,l.nome,null pls,count(e.id) esps
from loc l
left join esp e on e.loc = l.id
where l.id in
(select distinct e.loc
from esp e
where e.loc > 0)
group by l.id,l.nome)
order by id
The point is, when the same place has both plants and samples, it becomes two distinct lines, like:
11950 | San Martin | | 5 |
11950 | San Martin | 61 | |
Of course what I want is:
11950 | San Martin | 61 | 5 |
Before that, I have tried doing all in one query:
select l.id,l.nome,count(pl.id),count(e.id) esps
from loc l
left join pl on pl.loc = l.id
left join esp e on e.loc = l.id
where l.id in
(select distinct pl.loc
from pl
where pl.loc > 0)
or l.id in
(select distinct e.loc
from esp e
where e.loc > 0)
group by l.id,l.nome
but it returns a strange repetition (it's multiplying both results and showing the result twice):
11950 | San Martin | 305 | 305 |
I have tried without subqueries, but it was taking about 13 seconds, which is too long.
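The 305 figure is what a double join produces when one place has 61 plants and 5 samples: every plant row is paired with every sample row, giving 61 × 5 = 305 joined rows, and both counts see all of them. The sketch below reproduces this on an in-memory SQLite database through Python's sqlite3 module. The data is made up to match those two numbers, and COUNT(DISTINCT ...) is shown as one possible workaround, not something taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loc (id INTEGER PRIMARY KEY, nome TEXT)")
conn.execute("CREATE TABLE pl (id INTEGER PRIMARY KEY, loc INTEGER)")
conn.execute("CREATE TABLE esp (id INTEGER PRIMARY KEY, loc INTEGER)")
conn.execute("INSERT INTO loc VALUES (11950, 'San Martin')")
# Illustrative data: 61 plants and 5 samples at the same place.
conn.executemany("INSERT INTO pl VALUES (?, 11950)", [(i,) for i in range(1, 62)])
conn.executemany("INSERT INTO esp VALUES (?, 11950)", [(i,) for i in range(1, 6)])

# Two joins from the same parent multiply the rows before counting.
fanned_out = conn.execute(
    """
    SELECT l.id, l.nome, COUNT(pl.id), COUNT(e.id)
    FROM loc l
    LEFT JOIN pl ON pl.loc = l.id
    LEFT JOIN esp e ON e.loc = l.id
    GROUP BY l.id, l.nome
    """
).fetchone()

# Counting distinct ids ignores the repeated pairings.
distinct_counts = conn.execute(
    """
    SELECT l.id, l.nome, COUNT(DISTINCT pl.id), COUNT(DISTINCT e.id)
    FROM loc l
    LEFT JOIN pl ON pl.loc = l.id
    LEFT JOIN esp e ON e.loc = l.id
    GROUP BY l.id, l.nome
    """
).fetchone()

print(fanned_out)
print(distinct_counts)
```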
I created a test layout with:
create table localities (id integer, loc_name text);
create table plants (plant_id integer, loc_id integer);
create table samples (sample_id integer, loc_id integer);
insert into localities select x, ('Loc ' || x::text) from generate_series(1, 12561) x ;
insert into plants select x, (random()*12561)::integer from generate_series(1, 17052) x;
insert into samples select x, (random()*12561)::integer from generate_series(1, 9211) x;
The trick is to create an intermediate table from plants and samples but with same structure. Where data doesn't make sense (plant has no sample_id), you add null:
select loc_id, plant_id, null as sample_id from plants
union all
select loc_id, null as plant_id, sample_id from samples
This table has unified structure and you can then aggregate on it (I'm using WITH to make it a bit more readable.):
with localities_used as (
select loc_id, plant_id, null as sample_id from plants
union all
select loc_id, null as plant_id, sample_id from samples)
select
localities_used.loc_id,
count(localities_used.plant_id) plant_count,
count(localities_used.sample_id) sample_count
from
localities_used
group by
localities_used.loc_id;
If you need additional data from localities, you can join them on the aggregated table:
with localities_used as (
select loc_id, plant_id, null as sample_id from plants
union all
select loc_id, null as plant_id, sample_id from samples),
aggregated as (
select
localities_used.loc_id,
count(localities_used.plant_id) plant_count,
count(localities_used.sample_id) sample_count
from
localities_used
group by
localities_used.loc_id)
select * from aggregated left outer join localities on aggregated.loc_id = localities.id;
This takes 75ms on my laptop all together.
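For a result that is easy to verify by hand, the same UNION ALL and aggregate pattern can be run on a handful of rows. This is a sketch on SQLite via Python's sqlite3 module (an assumption; the answer targets PostgreSQL), and the tiny data set is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE localities (id INTEGER, loc_name TEXT)")
conn.execute("CREATE TABLE plants (plant_id INTEGER, loc_id INTEGER)")
conn.execute("CREATE TABLE samples (sample_id INTEGER, loc_id INTEGER)")
conn.executemany(
    "INSERT INTO localities VALUES (?, ?)", [(1, "Loc 1"), (2, "Loc 2"), (3, "Loc 3")]
)
# Loc 1 has 3 plants and 2 samples, Loc 2 has 1 plant, Loc 3 has nothing.
conn.executemany("INSERT INTO plants VALUES (?, ?)", [(1, 1), (2, 1), (3, 1), (4, 2)])
conn.executemany("INSERT INTO samples VALUES (?, ?)", [(1, 1), (2, 1)])

# Stack plants and samples into one shape, padding the missing id with NULL.
# COUNT(column) skips NULLs, so each count only sees its own kind of row.
rows = conn.execute(
    """
    WITH localities_used AS (
        SELECT loc_id, plant_id, NULL AS sample_id FROM plants
        UNION ALL
        SELECT loc_id, NULL AS plant_id, sample_id FROM samples
    ),
    aggregated AS (
        SELECT loc_id,
               COUNT(plant_id)  AS plant_count,
               COUNT(sample_id) AS sample_count
        FROM localities_used
        GROUP BY loc_id
    )
    SELECT a.loc_id, l.loc_name, a.plant_count, a.sample_count
    FROM aggregated a
    LEFT JOIN localities l ON l.id = a.loc_id
    ORDER BY a.loc_id
    """
).fetchall()

print(rows)
```

Loc 3 is absent from the output because it never appears in either source table, which matches the requirement to report only places that have plants or samples.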
This should be as easy as
select * from (
select
location.*,
(select count(id) from plant where plant.location = location.id) as plants,
(select count(id) from sample where sample.location = location.id) as samples
from location
) subquery
where subquery.plants > 0 or subquery.samples > 0;
id | name | plants | samples
----+------------+--------+---------
1 | San Martin | 2 | 1
2 | Rome | 1 | 2
3 | Dallas | 3 | 1
(3 rows)
This is the database I quickly set up to experiment with:
create table location(id serial primary key, name text);
create table plant(id serial primary key, name text, location integer references location(id));
create table sample(id serial primary key, name text, location integer references location(id));
insert into location (name) values ('San Martin'), ('Rome'), ('Dallas'), ('Ghost Town');
insert into plant (name, location) values ('San Martin Dandelion', 1),('San Martin Camomile', 1), ('Rome Raspberry', 2), ('Dallas Locoweed', 3), ('Dallas Lemongrass', 3), ('Dallas Setaria', 3);
insert into sample (name, location) values ('San Martin Bramble', 1), ('Rome Iris', 2), ('Rome Eucalypt', 2), ('Dallas Dogbane', 3);
tests=# select * from location;
id | name
----+------------
1 | San Martin
2 | Rome
3 | Dallas
4 | Ghost Town
(4 rows)
tests=# select * from plant;
id | name | location
----+----------------------+----------
1 | San Martin Dandelion | 1
2 | San Martin Camomile | 1
3 | Rome Raspberry | 2
4 | Dallas Locoweed | 3
5 | Dallas Lemongrass | 3
6 | Dallas Setaria | 3
(6 rows)
tests=# select * from sample;
id | name | location
----+--------------------+----------
1 | San Martin Bramble | 1
2 | Rome Iris | 2
3 | Rome Eucalypt | 2
4 | Dallas Dogbane | 3
(4 rows)
I didn't test this, but I think it could be something like:
SELECT
    l.id,
    l.nome,
    COUNT(DISTINCT pl.id) AS plants_count,
    COUNT(DISTINCT e.id) AS esp_count
FROM loc l
LEFT JOIN pl ON pl.loc = l.id
LEFT JOIN esp e ON e.loc = l.id
GROUP BY l.id, l.nome
The point is to count the distinct non-null ids of each type. Counting every joined row instead would repeat each plant once per sample at the same place (and vice versa), which is the multiplication seen in the question.

Update Count column in Postgresql

I have a single table laid out as such:
id | name | count
1 | John |
2 | Jim |
3 | John |
4 | Tim |
I need to fill out the count column such that the result is the number of times the specific name shows up in the column name.
The result should be:
id | name | count
1 | John | 2
2 | Jim | 1
3 | John | 2
4 | Tim | 1
I can get the count of occurrences of unique names easily using:
SELECT COUNT(name)
FROM table
GROUP BY name
But that doesn't fit into an UPDATE statement due to it returning multiple rows.
I can also get it narrowed down to a single row by doing this:
SELECT COUNT(name)
FROM table
WHERE name = 'John'
GROUP BY name
But that doesn't allow me to fill out the entire column, just the 'John' rows.
You can do that with a common table expression:
with counted as (
select name, count(*) as name_count
from the_table
group by name
)
update the_table
set "count" = c.name_count
from counted c
where c.name = the_table.name;
Another (slower) option would be to use a correlated subquery:
update the_table
set "count" = (select count(*)
from the_table t2
where t2.name = the_table.name);
But in general it is a bad idea to store values that can easily be calculated on the fly:
select id,
name,
count(*) over (partition by name) as name_count
from the_table;
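The correlated-subquery update can be tried on the four sample rows. The sketch below uses an in-memory SQLite database through Python's sqlite3 module, which is an assumption made for portability; the correlated form is used here because it needs no PostgreSQL-specific UPDATE ... FROM syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "count" is quoted because it doubles as a function name.
conn.execute('CREATE TABLE the_table (id INTEGER PRIMARY KEY, name TEXT, "count" INTEGER)')
conn.executemany(
    "INSERT INTO the_table (id, name) VALUES (?, ?)",
    [(1, "John"), (2, "Jim"), (3, "John"), (4, "Tim")],
)

# For every row, count how many rows share its name and store that number.
conn.execute(
    """
    UPDATE the_table
    SET "count" = (SELECT COUNT(*)
                   FROM the_table t2
                   WHERE t2.name = the_table.name)
    """
)

rows = conn.execute('SELECT id, name, "count" FROM the_table ORDER BY id').fetchall()
print(rows)
```

The stored values go stale as soon as a row is inserted, renamed, or deleted, which is the reason given above for preferring a count computed at query time.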
Another method, using a derived table:
UPDATE tb
SET "count" = t.name_count
FROM (
    SELECT name, COUNT(name) AS name_count
    FROM tb
    GROUP BY name
) t
WHERE t.name = tb.name;