Hierarchical count query in postgresql

Hierarchical count query in postgresql - postgresql

I have a simple hierarchical table (analogous to employee/manager) that I want to show counts of subordinates by parent nodes.
Consider this example from this article
WITH RECURSIVE subordinates AS (
SELECT
employee_id,
manager_id,
full_name
FROM
employees
WHERE
employee_id = 2
UNION
SELECT
e.employee_id,
e.manager_id,
e.full_name
FROM
employees e
INNER JOIN subordinates s ON s.employee_id = e.manager_id
) SELECT
*
FROM
subordinates;
What i need to do is generate output like this:
id full_name subordinate_count
---- --------- -----------------
1 Alice 42
2 Bob 18
3 Charlie 4
Let's say Alice is the CEO and Charlie is a low level manager.
It seems like you have to hard-code a clause in the first half of the union query to get a hierarchical query to work. I've tried several approaches but nothing is working. Thanks in advance to anyone that can help.

You can try to wrap this query inside an outer query and group over full_name with counts.
example:
select full_name,count(*)
from ("your recursive query") outer_query
group by outer_query.full_name;

Related

postgresql selecting the most representative value

I have a table in which objects have ids and they have names. The ids are correct by definition, the names are almost always correct, but sometimes dirty incoming data causes names to be null or even wrong.
So I do a query like
SELECT id, name, AGGR1(a) as a, AGGR2(b) as b, AGGR3(c) as c
FROM my_table
WHERE d = 3
GROUP BY id
I'd like to have name in the results, but of course the above is wrong. I'd have to group on id, name, in which case what should be one row sometimes becomes more than one -- say, id 2 has names 'John' (correct), 'Jon' (no, but only 1%), or NULL (also a small fraction).
Is there a construct or idiom in postgresql that lets me select what a human looking at the list would say is obviously the consensus name?
(I hear our postgres installation is finally being upgraded soon, if that matters here.)
sample output, in case prose wasn't clear
SELECT id, name, COUNT(id) as c
FROM my_table
WHERE d = 3
GROUP BY id
id name c
2 John 2000
2 Jon 3
2 (NULL) 5
vs
id name c
2 John 2008

You can get the names with
WITH names as (
SELECT
id,
name,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY COUNT(1) DESC) as rn
FROM my_table
GROUP BY id, name
)
SELECT id, name
FROM names
WHERE rn=1;
and then do your calculations by id only, joining names from this query.

count(1) and distinct behaviour in postgres

Imagine a table:
name age
John 20
Sam 60
Dave 30
John 15
I want to check count of distinct names, I query the table like so:
SELECT COUNT(1), DISTINCT(name)
FROM table
GROUP BY 2
But I get:
ERROR: syntax error at or near "DISTINCT"
Position: 18
But when I use:
SELECT DISTINCT(name), COUNT(1)
FROM table
GROUP BY 1
I do get what's expected:
John 2
Sam 1
Dave 1
Is there a reason why the first query is not working or am I making a mistake somewhere?

The distinct here is not required. GROUP BY means 'group by a distinct set of values'
so
SELECT COUNT(*), name
FROM table
GROUP BY name;
Will give you the result I think you want.

Inner join with count and group by

I have 2 tables
Timetable :
pupil_id, staff_id, subject, lesson_id
Staff_info :
staff_id, surname
The timetable table contains 1000s of rows because each student's ID is listed under each period they do.
I want to list all the teacher's names, and the number of lessons they do (count). So I have to do SELECT with DISTINCT.
SELECT DISTINCT TIMETABLE.STAFF_ID,
COUNT(TIMETABLE.LESSON_ID),
STAFF.SURNAME
FROM STAFF
INNER JOIN TIMETABLE ON TIMETABLE.STAFF_ID = STAFF.STAFF_ID
GROUP BY TIMETABLE.STAFF_ID
However I get the error:
Column 'STAFF.SURNAME' is invalid in the select list because it is not
contained in either an aggregate function or the GROUP BY clause.

This should do what you want:
SELECT s.STAFF_ID, COUNT(tt.LESSON_ID),
s.SURNAME
FROM STAFF s INNER JOIN
TIMETABLE tt
ON tt.STAFF_ID = s.STAFF_ID
GROUP BY s.STAFF_ID, s.SURNAME;
Notes:
You don't need DISTINCT unless there are duplicates in either table. That seems unlikely with this data structure, but if a staff member could have two of the same lesson, you would use COUNT(DISTINCT tt.LESSON_ID).
Table aliases make the query easier to write and to read.
You should include STAFF.SURNAME in the GROUP BY as well as the id.
I have a preference for taking the STAFF_ID column from the table where it is the primary key.
If you wanted staff with no lessons, you would change the INNER JOIN to LEFT JOIN.

SELECT T.STAFF_ID,
T.CNT,
S.SURNAME
FROM STAFF S
JOIN (
SELECT STAFF_ID, CNT = COUNT(/*DISTINCT*/ LESSON_ID)
FROM TIMETABLE
GROUP BY STAFF_ID
) T ON T.STAFF_ID = S.STAFF_ID

Another option:
SELECT DISTINCT si.staff_id, surname, COUNT(lesson_id) OVER(PARTITION BY staff_Id)
FROM Staff_info si
INNER JOIN Timetable tt ON si.staff_id = tt.staff_id

When using Aggregate function(Count, Sum, Min, Max, Avg) in the Select column's list, any other columns that are in the Select column's list but not in a aggregate function, should be mentioned in GROUP BY section too. So you need to change your query as follow and add STAFF.SURNAME to GROUP BY section too:
SELECT TIMETABLE.STAFF_ID,
COUNT(TIMETABLE.LESSON_ID),
STAFF.SURNAME
FROM STAFF
INNER JOIN TIMETABLE ON TIMETABLE.STAFF_ID = STAFF.STAFF_ID
GROUP BY TIMETABLE.STAFF_ID,STAFF.SURNAME
Distinct is useless also in your scenario. and also as you are going to show the teachers name and Count lessons, you do not need to add TIMETABLE.STAFF_ID to Select's column's list,, but it should remain in Group By section to prevent duplicate names.
SELECT COUNT(TIMETABLE.LESSON_ID),
STAFF.SURNAME
FROM STAFF
INNER JOIN TIMETABLE ON TIMETABLE.STAFF_ID = STAFF.STAFF_ID
GROUP BY TIMETABLE.STAFF_ID,STAFF.SURNAME
You may need to take a look at this W3C post for more info

query gives two of the same results

I have the following SQL query but I got a problem:
When I execute it I got two of the same serial numbers from the "sn" column in the "products" table.
SELECT specifications.productname,
products.sn, specifications.year,
lendings.lending_date
FROM products
INNER JOIN lendings ON products.id = lendings.product_id
INNER JOIN specifications ON products.sn LIKE CONCAT(\'%\', specifications.sn, \'%\') OR products.type LIKE CONCAT(\'%\', specifications.type, \'%\')
WHERE lendings.user_id = ?
EDIT:
lendings table:
user_id product_id
1 1
1 2
2 3
Specifications table:
productname year type sn
name1 2012 1 1234
name2 2011 2 4321
name3 2010 3 3241
products table:
id sn
1 AAAAAAAA1234
2 BBBBBBBB4321
3 CCCCCCCC3241
EDIT2:
SELECT products.id,
specifications.productname,
products.sn,
specifications.year,
lendings.lending_date
FROM products
INNER JOIN lendings ON products.id = lendings.product_id
INNER JOIN specifications ON products2.sn LIKE CONCAT(specifications.sn, \'%\') OR products.type = specifications.type
WHERE lendings.user_id = ?

One of your Join on conditions is too slack then
for instance two lendings records pointing to the same product.

Usually, that means you don't have all the necesary join columns present in one of your joins and you are getting a cartesian product. In database terms, this means you are joining to a table and expected to join to a single row, but multiple rows match the criteria, so you are actually joining to more than one row. When this happens, you will get the same row multiple times (product row in your example) in your result.
It would have been better if you posted some test data so this scenario could be confirmed, but since you didn't, I would recommend checking each of your joins to make sure you are not getting multiple rows back for the given products row.
One part of your query I find particularly suspect is this join:
INNER JOIN specifications ON products.sn LIKE CONCAT(\'%\', specifications.sn, \'%\') OR products.type LIKE CONCAT(\'%\', specifications.type, \'%\')
You're joining using a LIKE operator, which seems to have a high chance of getting multiple rows.

SQL top + count() confusion

I've got the following table:
patients
id
name
diagnosis_id
What I need to do is get all the patients with N most popular diagnosis.
And I'm getting nothing using this query:
SELECT name FROM patients
WHERE diagnosis_id IN
(SELECT TOP(5) COUNT(diagnosis_id) FROM patients
GROUP BY diagnosis_id
ORDER BY diagnosis_id)
How to fix it?

SELECT name FROM patients
WHERE diagnosis_id IN
(
SELECT TOP(5) diagnosis_id FROM patients
GROUP BY diagnosis_id
ORDER BY COUNT(diagnosis_id) desc
)

A couple things wrong with this:
First, I'd recommend using a common table expression for the "top 5" lookup rather than a subquery - to me, it makes it a bit clearer, and though it doesn't matter here, it would likely perform better in a real work situation.
The main issue though is that you're ordering the top 5 lookup by the diagnosis id rather than the count. You'll need to do ORDER BY COUNT(diagnosis_id) instead.

select p.name from patients p
inner join (
select top 5 diagnosis_id, count(*) as diagnosis_count
from patients
group by diagnosis_id
order by diagnosis_count) t on t.diagnosis_id = p.diagnosis_id

try this:
SELECT name FROM patients
WHERE diagnosis_id IN
(SELECT TOP(5) diagnosis_id FROM patients
GROUP BY diagnosis_id
ORDER BY COUNT(diagnosis_id))

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Hierarchical count query in postgresql - postgresql

You can try to wrap this query inside an outer query and group over full_name with counts. example: select full_name,count(*) from ("your recursive query") outer_query group by outer_query.full_name;

Related

postgresql selecting the most representative value

count(1) and distinct behaviour in postgres

Inner join with count and group by

query gives two of the same results

SQL top + count() confusion

Categories

Resources