Number of lessons for a teacher - SQL (PostgreSQL)

I have four tables: one is teacher, and the other three are different kinds of lessons.
My task is to find the teacher who has given the most lessons during a year.
input:
lesson1.id lesson1.date teacher.id
1 2020-12-01 1
2 2020-04-01 1
lesson2.id lesson2.date teacher.id
1 2020-10-01 2
2 2020-05-01 3
lesson3.id lesson3.date teacher.id
1 2020-02-01 1
2 2020-06-01 3
teacher.id teacher.name
1 john
2 scott
3 david
output:
teacher.id teacher.name lessons_given
1 john 3
I tried to join them together with a left join on teacher but it's not working...
Hope you guys can help me out:)
Thanks

What you are attempting to build is a many-to-many (m:m) relationship between Teacher and Lesson. What you have instead is several one-to-many relationships. While that works (with some difficulty) for a small number of lesson types, think about the same requirement with 50 or 500 or more lessons. What you actually need is 3 tables:
create table lessons( lesson_id integer generated always as identity
                    , name text
                    , subject text -- for example
                    -- other lesson related attributes
                    );
create table teachers( teacher_id integer generated always as identity
                     , name text
                     -- other related teacher attributes
                     );
create table teacher_lessons( teacher_id integer -- junction (association) table
                            , lesson_id integer
                            , lesson_date date
                            );
Now you have a structure that can handle any number of teachers and/or lessons. It also lends itself to other uses as-is, say linking students to lessons. See the fiddle for the current issue.
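With that structure in place, the original question becomes a simple aggregate. A minimal sketch against the tables defined above (the 2020 date range is assumed from the sample data):
SELECT t.teacher_id, t.name, COUNT(*) AS lessons_given
FROM teachers t
JOIN teacher_lessons tl ON tl.teacher_id = t.teacher_id
WHERE tl.lesson_date >= DATE '2020-01-01'
  AND tl.lesson_date <  DATE '2021-01-01'
GROUP BY t.teacher_id, t.name
ORDER BY lessons_given DESC
LIMIT 1;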

You could UNION ALL the three lesson tables to get a "flat" list of lessons, and then join that to the teacher table:
SELECT t.id, t.name, COUNT(*) AS lessons_given
FROM teacher t
JOIN (SELECT teacher_id FROM lesson1 UNION ALL
      SELECT teacher_id FROM lesson2 UNION ALL
      SELECT teacher_id FROM lesson3) l ON t.id = l.teacher_id
GROUP BY t.id, t.name
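Since the task is the most lessons during a given year, a sketch of the same idea that also carries the date through the UNION ALL, filters on the year, and keeps only the top count (the column name date and the 2020 range are assumed from the sample data):
SELECT t.id, t.name, COUNT(*) AS lessons_given
FROM teacher t
JOIN (SELECT teacher_id, date FROM lesson1 UNION ALL
      SELECT teacher_id, date FROM lesson2 UNION ALL
      SELECT teacher_id, date FROM lesson3) l ON t.id = l.teacher_id
WHERE l.date >= DATE '2020-01-01'
  AND l.date <  DATE '2021-01-01'
GROUP BY t.id, t.name
ORDER BY lessons_given DESC
LIMIT 1;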

The obvious way to do this is to use Belayer's solution.
However, if and only if you cannot put all the data in a single table for some reason (for example, if lesson1, lesson2 and lesson3 all have specific attributes), then another solution would be to use table inheritance.
For instance:
CREATE TABLE lesson (
id INT,
date TIMESTAMP,
teacher INT
);
ALTER TABLE lesson1 INHERIT lesson;
ALTER TABLE lesson2 INHERIT lesson;
ALTER TABLE lesson3 INHERIT lesson;
Now, in order to count the number of lessons each teacher is involved in, you can just use the lesson table:
SELECT teacher.id, teacher.name, COUNT(lesson.id)
FROM teacher
LEFT JOIN lesson ON lesson.teacher = teacher.id
GROUP BY teacher.id, teacher.name
ORDER BY COUNT(lesson.id) DESC
FETCH FIRST ROW WITH TIES;
You can replace the last line with LIMIT 1 if you are only interested in getting one of the most active teachers, but then your result is no longer deterministic. (Note that FETCH FIRST ... WITH TIES requires PostgreSQL 13 or later.)
Again, please do not use inheritance if there is no need to.
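One behavior worth knowing if you go this route: a query on the parent table automatically includes rows from the child tables, while the ONLY keyword restricts it to the parent alone. A quick illustration against the tables above:
SELECT COUNT(*) FROM lesson;      -- includes rows from lesson1, lesson2 and lesson3
SELECT COUNT(*) FROM ONLY lesson; -- only rows stored directly in lesson (0 here)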

Related

How to get date range between dates from the records in the same table?

I have a table with employment records. It has an Employee code, a Status, and the Date when the record was updated.
Like this:
Employee Status  Date
001      termed  01/01/2020
001      rehired 02/02/2020
001      termed  03/03/2020
001      rehired 04/04/2021
Problem: I need to get the length of the period when an Employee was working for the company, and check if it was less than a year; if so, don't display that record.
There could be multiple hire/rehire cycles for each Employee; 10-20 is normal.
So I'm thinking about two separate selects into two tables, and then looking for the closest date from a hire in table 1 to a termination in table 2. But that seems like an overcomplicated idea.
Is there a better way?
Many approaches, but something like this could work:
SELECT
    Employee,
    SUM(DaysWorked)
FROM
(
    SELECT
        a1.employee,
        IsNull(DateDiff(DD, a1.[Date],
                 (SELECT TOP 1 [Date]
                  FROM aaa a2
                  WHERE a2.employee = a1.employee
                    AND a2.[Date] > a1.[Date]
                    AND a2.[status] = 'termed'
                  ORDER BY [Date])),
               DateDiff(DD, a1.[Date], getDate())) AS DaysWorked
    FROM aaa a1
    WHERE a1.[Status] <> 'termed'  -- anchor each period on the hire/rehire row
) Totals
GROUP BY Totals.employee
HAVING SUM(DaysWorked) >= 365
Using a CROSS JOIN is also an option and perhaps more efficient. In this example, replace 'aaa' with the actual table name. The query anchors on the hire/rehire rows and measures forward to the next termination; the IsNull deals with an employee who is still working.
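If the database is PostgreSQL (as in the rest of this page), a window-function sketch of the same idea, assuming a table employment_history(employee, status, updated_on) and that statuses strictly alternate between hires and terminations:
SELECT employee,
       SUM(COALESCE(next_change, CURRENT_DATE) - updated_on) AS days_worked
FROM (
    SELECT employee,
           status,
           updated_on,
           -- date of the next status change for this employee, if any
           LEAD(updated_on) OVER (PARTITION BY employee ORDER BY updated_on) AS next_change
    FROM employment_history
) p
WHERE status <> 'termed'  -- keep only the hire/rehire rows
GROUP BY employee
HAVING SUM(COALESCE(next_change, CURRENT_DATE) - updated_on) >= 365;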

postgresql selecting the most representative value

I have a table in which objects have ids and they have names. The ids are correct by definition, the names are almost always correct, but sometimes dirty incoming data causes names to be null or even wrong.
So I do a query like
SELECT id, name, AGGR1(a) as a, AGGR2(b) as b, AGGR3(c) as c
FROM my_table
WHERE d = 3
GROUP BY id
I'd like to have name in the results, but of course the above is wrong. I'd have to group on id, name, in which case what should be one row sometimes becomes more than one -- say, id 2 has names 'John' (correct), 'Jon' (no, but only 1%), or NULL (also a small fraction).
Is there a construct or idiom in postgresql that lets me select what a human looking at the list would say is obviously the consensus name?
(I hear our postgres installation is finally being upgraded soon, if that matters here.)
Sample output, in case the prose wasn't clear:
SELECT id, name, COUNT(id) as c
FROM my_table
WHERE d = 3
GROUP BY id, name
id name c
2 John 2000
2 Jon 3
2 (NULL) 5
vs
id name c
2 John 2008
You can get the names with
WITH names AS (
    SELECT
        id,
        name,
        ROW_NUMBER() OVER (PARTITION BY id ORDER BY COUNT(1) DESC) AS rn
    FROM my_table
    GROUP BY id, name
)
SELECT id, name
FROM names
WHERE rn = 1;
and then do your calculations by id only, joining names from this query.
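Since you mention an upcoming upgrade: on PostgreSQL 9.4 or later, the built-in mode() ordered-set aggregate expresses the same idea more compactly, so a sketch like this (keeping your placeholder aggregates) may be all you need. I believe NULL names are ignored by the aggregate, but it is worth verifying on your version:
SELECT id,
       mode() WITHIN GROUP (ORDER BY name) AS name, -- most frequent name per id
       AGGR1(a) AS a, AGGR2(b) AS b, AGGR3(c) AS c
FROM my_table
WHERE d = 3
GROUP BY id;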

SQL: How to prevent double summing

I'm not exactly sure what the term for this is, but when you have a many-to-many relationship between the 2 tables you are joining and you want to sum one of the columns, you can end up summing the same values over and over again.
What I want to accomplish is to prevent this from happening. How do I make sure that my sum function is returning the correct number?
I'm using PostgreSQL
Example:
Table 1               Table 2
SampleID DummyName    SampleID DummyItem
1        John         1        5
1        John         1        4
2        Doe          1        5
3        Jake         2        3
3        Jake         2        3
                      3        2
If I join these two tables ON SampleID, and I want to sum the DummyItem for each DummyName, how can I do this without double summing?
The solution is to first aggregate and then do the join:
select t1.sampleid, t1.dummyname, t.total_items
from table_1 t1
join (
    select t2.sampleid, sum(t2.dummyitem) as total_items
    from table_2 t2
    group by t2.sampleid
) t on t.sampleid = t1.sampleid;
However, the real question is: why are there duplicates in table_1?
I would take a step back and try to assess the database design. Specifically, what rules allow such duplicate data?
To address your specific issue given your data, here's one option: create a temp table that contains unique rows from Table 1, then join the temp table with Table 2 to get the sums I think you are expecting.
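A minimal sketch combining both suggestions, de-duplicating Table 1 inline rather than through a temp table (column and table names are assumed from the sample data):
SELECT d.sampleid, d.dummyname, t.total_items
FROM (SELECT DISTINCT sampleid, dummyname FROM table_1) d
JOIN (SELECT sampleid, SUM(dummyitem) AS total_items
      FROM table_2
      GROUP BY sampleid) t ON t.sampleid = d.sampleid;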

Creating a many to many in postgresql

I have two tables that I need to make a many-to-many relationship with. The one table, which we will call inventory, is populated via a form. The other table, sales, is populated by importing CSVs into the database weekly.
Example tables image
I want to step through the sales table and associate each sale row with a row with the same sku in the inventory table. Here's the kicker: I need to associate only the number of sales rows indicated in the Quantity field of each Inventory row.
Example: Example image of linked tables
Now I know I can do this by creating a Perl script that steps through the sales table and creates links using the ItemIDUniqueKey field in a loop based on the Quantity field. What I want to know is: is there a way to do this using SQL commands alone? I've read a lot about many-to-many and I've not found anyone doing this.
Assuming tables:
create table a(
item_id integer,
quantity integer,
supplier_id text,
sku text
);
and
create table b(
sku text,
sale_number integer,
item_id integer
);
The following query seems to do what you want:
update b b_updated set item_id = (
    select item_id
    from (
        -- running total of quantity per sku, ordered by item_id
        select *, sum(quantity) over (partition by sku order by item_id) as sum
        from a
    ) a
    where a.sku = b_updated.sku
      -- pick the first inventory row whose running total exceeds the number of
      -- earlier sales rows already assigned for this sku
      and a.sum > (select count(1)
                   from b b_counted
                   where b_counted.sale_number < b_updated.sale_number
                     and b_counted.sku = b_updated.sku)
    order by a.sum asc
    limit 1
);
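After running the update, a quick sanity check (a sketch, using the column names assumed above) is to compare how many sales rows were linked to each inventory row against its quantity:
select a.item_id, a.quantity, count(b.item_id) as linked_sales
from a
left join b on b.item_id = a.item_id
group by a.item_id, a.quantity
order by a.item_id;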

Complex Joins in Postgresql

It's possible I'm stupid, but I've been querying and checking for hours and I can't seem to find the answer to this, so I apologize in advance if the post is redundant... but I can't seem to find its doppelganger.
OK: I have a PostgreSQL db with the following tables:
Key (containing two fields in which I'm interested, ID and Name)
and a second table, Data.
Data contains, well... data, sorted by ID. ID is unique, but each Name has multiple IDs. E.g. if Bill enters the building, this is ID 1 for Bill; Mary enters the building, ID 2 for Mary; Bill re-enters the building, ID 3 for Bill.
The ID field is in both the Key table and the Data table.
What I want to do is find
the MAX (i.e. the last) ID for EACH NAME, and the Data associated with it.
E.g. Bill - Last login: ID 10, Time: 123UTC, Door: West, and so on.
So... I'm trying the following query:
SELECT
*
FROM
Data, Key
WHERE
Key.ID = (
SELECT
MAX (ID)
FROM
Key
GROUP BY ID
)
Here's the kicker: there are something like 800M items in these tables, so errors are... time-consuming. Can anyone help me see if this query is going to do what I expect?
Thanks so much.
To get the maximum ID for each name:
select Name, max(ID) as max_id
from key
group by Name;
Join that to your other table.
select *
from data t1
inner join (select Name, max(ID) as max_id
            from key
            group by Name) t2
    on t1.ID = t2.max_id;
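A Postgres-specific alternative worth knowing is DISTINCT ON, which picks the single latest row per name in one pass; a sketch against the same two tables:
select distinct on (k.Name) k.Name, k.ID, d.*
from key k
join data d on d.ID = k.ID
order by k.Name, k.ID desc;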