SQL top + count() confusion - tsql

I've got the following table:
patients
id
name
diagnosis_id
What I need to do is get all the patients with the N most popular diagnoses.
And I'm getting nothing using this query:
SELECT name FROM patients
WHERE diagnosis_id IN
(SELECT TOP(5) COUNT(diagnosis_id) FROM patients
GROUP BY diagnosis_id
ORDER BY diagnosis_id)
How to fix it?

SELECT name FROM patients
WHERE diagnosis_id IN
(
SELECT TOP(5) diagnosis_id FROM patients
GROUP BY diagnosis_id
ORDER BY COUNT(diagnosis_id) desc
)

A couple of things are wrong with this:
First, I'd recommend using a common table expression for the "top 5" lookup rather than a subquery. To me it reads a bit more clearly; whether it also performs better in a real-world situation is something to check against the actual execution plan.
The main issue, though, is that the subquery returns COUNT(diagnosis_id) instead of diagnosis_id, and it orders the top 5 lookup by the diagnosis id rather than by the count. Select diagnosis_id and use ORDER BY COUNT(diagnosis_id) DESC instead.
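The common table expression form mentioned above might look like the following sketch. It runs against an in-memory SQLite database from Python so that it is self-contained. SQLite has no TOP, so LIMIT stands in for it here; in T-SQL the CTE body would use SELECT TOP (2) ... ORDER BY COUNT(*) DESC. The sample rows and the name top_diagnoses are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT, diagnosis_id INTEGER)"
)
# Hypothetical sample data: diagnosis 1 has 3 patients, 2 has 2, 3 has 1.
conn.executemany(
    "INSERT INTO patients (name, diagnosis_id) VALUES (?, ?)",
    [("Ann", 1), ("Bob", 1), ("Cid", 1), ("Dee", 2), ("Eve", 2), ("Fay", 3)],
)

query = """
WITH top_diagnoses AS (
    SELECT diagnosis_id            -- return the id, not the count
    FROM patients
    GROUP BY diagnosis_id
    ORDER BY COUNT(*) DESC         -- most popular first
    LIMIT 2                        -- SQLite stand-in for TOP (2)
)
SELECT name
FROM patients
WHERE diagnosis_id IN (SELECT diagnosis_id FROM top_diagnoses)
ORDER BY name
"""
top_patient_names = [row[0] for row in conn.execute(query)]
print(top_patient_names)
```

With this sample data only the patients with diagnoses 1 and 2 come back; the single patient with diagnosis 3 is left out.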

select p.name from patients p
inner join (
select top 5 diagnosis_id, count(*) as diagnosis_count
from patients
group by diagnosis_id
order by diagnosis_count desc) t on t.diagnosis_id = p.diagnosis_id

Try this:
SELECT name FROM patients
WHERE diagnosis_id IN
(SELECT TOP(5) diagnosis_id FROM patients
GROUP BY diagnosis_id
ORDER BY COUNT(diagnosis_id) DESC)

Related

Converting counts inside query result tables to percentages of total

I have a table and want to calculate the percentage of total by store_id which each (category_id, store_id) subtotal represents. My code is below:
WITH
example_table (name, store_id)
AS
(
select name, store_id
from category
join film_category using (category_id)
join film using (film_id)
join inventory using (film_id)
join rental using (inventory_id)
)
SELECT name, store_id, cast(count(*) as numeric)/(SELECT count(*) FROM example_table)
FROM example_table
GROUP BY name, store_id
ORDER BY name, store_id
This code works in the sense that it doesn't throw an error, but the results aren't what I'm looking for. Here each subtotal is divided by the grand total across both stores and all 16 names. Instead, I want each subtotal divided by its respective store total, or by its respective name total.
I'm wondering how to perform calculations on those subtotals in general.
Thanks in advance,
I believe you should look at aggregate functions combined with OVER (PARTITION BY ...), e.g.
SELECT DISTINCT
name, store_id, store_id_count, name_count
FROM (
select name, store_id
, count(*) over(partition by store_id) as store_id_count
, count(*) over(partition by name) as name_count
from category
join film_category using (category_id)
join film using (film_id)
join inventory using (film_id)
join rental using (inventory_id)
) AS example_table
When you use an aggregate function with the OVER clause, you get the wanted counts on every row of the result, which seems to be what you need here. Note that SELECT DISTINCT is used simply to reduce the number of rows returned; you might still need a GROUP BY, but I am not sure.
Once you have the needed values in the derived table (aliased as example_table), it should be a simple matter of some arithmetic in the outer SELECT clause.
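As a rough sketch of that arithmetic, the example below first computes the (name, store_id) subtotals and then divides each one by a windowed total. It uses an in-memory SQLite database from Python (window functions need SQLite 3.25 or later) rather than PostgreSQL, and example_table is filled with a few made-up rows instead of the real joins; the names subtotals, share_of_store and share_of_name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example_table (name TEXT, store_id INTEGER)")
# Hypothetical stand-in for the joined rental rows.
conn.executemany(
    "INSERT INTO example_table VALUES (?, ?)",
    [("Action", 1), ("Action", 1), ("Action", 1), ("Comedy", 1),
     ("Action", 2), ("Comedy", 2)],
)

query = """
WITH subtotals AS (
    SELECT name, store_id, COUNT(*) AS subtotal
    FROM example_table
    GROUP BY name, store_id
)
SELECT name,
       store_id,
       -- multiply by 1.0 to force non-integer division
       subtotal * 1.0 / SUM(subtotal) OVER (PARTITION BY store_id) AS share_of_store,
       subtotal * 1.0 / SUM(subtotal) OVER (PARTITION BY name)     AS share_of_name
FROM subtotals
ORDER BY name, store_id
"""
shares = conn.execute(query).fetchall()
for row in shares:
    print(row)
```

Store 1 has four rows in the sample data, three of them Action, so the (Action, 1) row gets a share_of_store of 0.75. The same window expressions should carry over to PostgreSQL with cast(... as numeric) in place of the 1.0 multiplication.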

Getting group by attribute in nested query

I am trying to find the most frequent value in a postgresql table. The problem is that I also want to "group by" in that table and only get the most frequent from the values that have the same name.
So I have the following query:
select name,
(SELECT value FROM table where name=name GROUP BY value ORDER BY COUNT(*) DESC limit 1)
as mfq from table group by name;
So, I am using where name=name, trying to get the outside group by attribute "name", but it doesn't seem to work. Any ideas on how to do it?
Edit: for example in the following table:
name value
a 3
a 3
a 3
b 2
b 2
I want to get:
name value
a 3
b 2
but the above statement gives:
name value
a 3
b 3
instead, since where doesn't work correctly.
There is a dedicated function in PostgreSQL for this case: the mode() ordered-set aggregate:
select name, mode() within group (order by value) mode_value
from table
group by name;
which returns the most frequent input value (arbitrarily choosing the first one if there are multiple equally-frequent results) -- which is the same behavior as with your order by count(*) desc limit 1.
It is available from PostgreSQL 9.4+.
http://rextester.com/GHGJH15037
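If you want your query to work, you need table aliases. Inside the subquery, both sides of name=name resolve to the inner table's column, so the condition is true for every row with a non-null name and the outer group is never consulted. Table aliases and qualified column names are always a good idea: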
select t.name,
(select t2.value
from table t2
where t2.name = t.name
group by t2.value
order by COUNT(*) desc
limit 1
) as mfq
from table t
group by t.name;
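To see the difference the aliases make, here is a small sketch that runs both forms against the sample rows from the question. It uses an in-memory SQLite database from Python instead of PostgreSQL, and the table is called readings (a made-up name) because table is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (name TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 3), ("a", 3), ("a", 3), ("b", 2), ("b", 2)],
)

# Unqualified: both "name" references bind to the inner table,
# so the subquery is not correlated with the outer row.
unqualified = """
SELECT name,
       (SELECT value FROM readings WHERE name = name
        GROUP BY value ORDER BY COUNT(*) DESC LIMIT 1) AS mfq
FROM readings
GROUP BY name
ORDER BY name
"""

# Qualified: t2.name is compared with the outer row's t.name.
qualified = """
SELECT t.name,
       (SELECT t2.value FROM readings t2 WHERE t2.name = t.name
        GROUP BY t2.value ORDER BY COUNT(*) DESC LIMIT 1) AS mfq
FROM readings t
GROUP BY t.name
ORDER BY t.name
"""

unqualified_rows = conn.execute(unqualified).fetchall()
qualified_rows = conn.execute(qualified).fetchall()
print(unqualified_rows)
print(qualified_rows)
```

The unqualified query reproduces the wrong result from the question (3 for both names), while the qualified one returns 3 for a and 2 for b.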

Inner join with count and group by

I have 2 tables
Timetable :
pupil_id, staff_id, subject, lesson_id
Staff_info :
staff_id, surname
The timetable table contains 1000s of rows because each student's ID is listed under each period they do.
I want to list all the teachers' names and the number of lessons each one teaches (a count). So I think I have to do a SELECT with DISTINCT.
SELECT DISTINCT TIMETABLE.STAFF_ID,
COUNT(TIMETABLE.LESSON_ID),
STAFF.SURNAME
FROM STAFF
INNER JOIN TIMETABLE ON TIMETABLE.STAFF_ID = STAFF.STAFF_ID
GROUP BY TIMETABLE.STAFF_ID
However I get the error:
Column 'STAFF.SURNAME' is invalid in the select list because it is not
contained in either an aggregate function or the GROUP BY clause.
This should do what you want:
SELECT s.STAFF_ID, COUNT(tt.LESSON_ID),
s.SURNAME
FROM STAFF s INNER JOIN
TIMETABLE tt
ON tt.STAFF_ID = s.STAFF_ID
GROUP BY s.STAFF_ID, s.SURNAME;
Notes:
You don't need SELECT DISTINCT, because GROUP BY already returns one row per staff member. If the same lesson can appear on several Timetable rows for a staff member (for example, one row per pupil), use COUNT(DISTINCT tt.LESSON_ID) to count each lesson once.
Table aliases make the query easier to write and to read.
You should include STAFF.SURNAME in the GROUP BY as well as the id.
I have a preference for taking the STAFF_ID column from the table where it is the primary key.
If you wanted staff with no lessons, you would change the INNER JOIN to LEFT JOIN.
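The notes above can be tried out with a small sketch like this one. It uses an in-memory SQLite database from Python rather than SQL Server, and the sample rows and the column aliases timetable_rows and distinct_lessons are invented for illustration. It combines the LEFT JOIN variant with both COUNT forms so the difference is visible side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staff (staff_id INTEGER PRIMARY KEY, surname TEXT);
CREATE TABLE timetable (pupil_id INTEGER, staff_id INTEGER,
                        subject TEXT, lesson_id INTEGER);
-- Hypothetical data: Jones teaches lesson 100 to two pupils and lesson 101
-- to one pupil, Smith teaches one lesson, Patel has no lessons.
INSERT INTO staff VALUES (1, 'Jones'), (2, 'Smith'), (3, 'Patel');
INSERT INTO timetable VALUES
    (10, 1, 'Maths', 100), (11, 1, 'Maths', 100), (10, 1, 'Maths', 101),
    (10, 2, 'Art', 200);
""")

query = """
SELECT s.staff_id,
       s.surname,
       COUNT(tt.lesson_id)          AS timetable_rows,    -- one per pupil row
       COUNT(DISTINCT tt.lesson_id) AS distinct_lessons   -- each lesson once
FROM staff s
LEFT JOIN timetable tt ON tt.staff_id = s.staff_id        -- keeps Patel
GROUP BY s.staff_id, s.surname
ORDER BY s.staff_id
"""
lesson_counts = conn.execute(query).fetchall()
for row in lesson_counts:
    print(row)
```

Jones comes back with 3 timetable rows but 2 distinct lessons, and Patel appears with zeros only because of the LEFT JOIN.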
SELECT T.STAFF_ID,
T.CNT,
S.SURNAME
FROM STAFF S
JOIN (
SELECT STAFF_ID, CNT = COUNT(/*DISTINCT*/ LESSON_ID)
FROM TIMETABLE
GROUP BY STAFF_ID
) T ON T.STAFF_ID = S.STAFF_ID
Another option:
SELECT DISTINCT si.staff_id, surname, COUNT(lesson_id) OVER(PARTITION BY staff_Id)
FROM Staff_info si
INNER JOIN Timetable tt ON si.staff_id = tt.staff_id
When you use an aggregate function (COUNT, SUM, MIN, MAX, AVG) in the SELECT list, every other column in the SELECT list that is not inside an aggregate function must also appear in the GROUP BY clause. So you need to change your query as follows and add STAFF.SURNAME to the GROUP BY clause:
SELECT TIMETABLE.STAFF_ID,
COUNT(TIMETABLE.LESSON_ID),
STAFF.SURNAME
FROM STAFF
INNER JOIN TIMETABLE ON TIMETABLE.STAFF_ID = STAFF.STAFF_ID
GROUP BY TIMETABLE.STAFF_ID,STAFF.SURNAME
DISTINCT is also unnecessary in your scenario. And since you only want to show the teachers' names and their lesson counts, you do not need TIMETABLE.STAFF_ID in the SELECT list, but it should remain in the GROUP BY clause so that two teachers who share a surname are not merged into one row.
SELECT COUNT(TIMETABLE.LESSON_ID),
STAFF.SURNAME
FROM STAFF
INNER JOIN TIMETABLE ON TIMETABLE.STAFF_ID = STAFF.STAFF_ID
GROUP BY TIMETABLE.STAFF_ID,STAFF.SURNAME
You may need to take a look at this W3C post for more info

Grouping by attributes and counting, postgreSQL

I have written the following code that counts how many instances of each book_id there are in the table soldBooks.
SELECT book_id, sum(counter) AS no_of_books_sold, sum(retail_price) AS generated_revenue
FROM(
SELECT book_id,1 AS counter, retail_price
FROM shipments
LEFT JOIN editions ON (shipments.isbn = editions.isbn)
LEFT JOIN stock ON (shipments.isbn = stock.isbn)
) AS soldBooks
GROUP BY book_id
As you can see, I used a "counter" column to solve my problem. But I am sure there must be a better, more built-in way of achieving the same result: some way to group a table by a given attribute and create a new column displaying the count for EACH value of that attribute. Can somebody share this with me?
Thanks!
SELECT book_id,
COUNT(book_id) AS no_books_sold,
SUM(retail_price) AS gen_rev
FROM shipments
JOIN editions ON (shipments.isbn=editions.isbn)
JOIN stock ON (shipments.isbn=stock.isbn)
GROUP BY book_id
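For comparison, the sketch below puts the hand-rolled counter next to the built-in aggregates. It uses an in-memory SQLite database from Python instead of PostgreSQL, and a flat, made-up sold_books table stands in for the joined shipments, editions and stock rows. One row has a NULL book_id, as an unmatched LEFT JOIN row would, to show where COUNT(*) and COUNT(book_id) differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sold_books (book_id INTEGER, retail_price REAL)")
# Hypothetical rows; the last one mimics a shipment with no matching edition.
conn.executemany(
    "INSERT INTO sold_books VALUES (?, ?)",
    [(1, 10.0), (1, 10.0), (2, 7.5), (None, 4.0)],
)

query = """
SELECT book_id,
       SUM(1)            AS counter_total,     -- the manual counter
       COUNT(*)          AS row_count,         -- counts every row in the group
       COUNT(book_id)    AS non_null_count,    -- skips NULL book_id values
       SUM(retail_price) AS generated_revenue
FROM sold_books
GROUP BY book_id
ORDER BY book_id
"""
book_totals = conn.execute(query).fetchall()
for row in book_totals:
    print(row)
```

SUM(1) and COUNT(*) agree on every group, so COUNT(*) is the direct replacement for the counter column. COUNT(book_id) gives the same numbers except for the NULL group, where it returns 0.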

PostgreSQL: Select first row as column inside select

I have two tables, Customers and Orders. Customers has the columns id and name; Orders has the columns id, customer_id and order_date.
Now I need a single SELECT that returns each customer's id, name and last order_date.
I tried this:
select
Customers.id,
Customers.name,
(select Orders.order_date from Orders where Orders.customer_id = Customer.id order by order_date desc) as last_order_date
from
Customers
But it uses the wrong index and takes forever to execute.
What's the best way to write this SELECT in PostgreSQL?
Thanks in advance.
If you are not restricting by customer_id, the query will end up having to scan the entire Orders table.
SELECT c.id
,c.name
,MAX(o.order_date) AS last_order_date
FROM Customers c
LEFT OUTER JOIN Orders o ON (o.customer_id = c.id)
GROUP BY c.id, c.name
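A quick way to check how this behaves for customers without any orders is a sketch like the following. It runs the same join against an in-memory SQLite database from Python rather than PostgreSQL, with made-up rows and with dates stored as ISO-8601 text so that MAX picks the latest one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                     order_date TEXT);
-- Hypothetical data: Ada has two orders, Bo has none.
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Bo');
INSERT INTO orders VALUES (1, 1, '2020-01-05'), (2, 1, '2020-03-01');
""")

query = """
SELECT c.id, c.name, MAX(o.order_date) AS last_order_date
FROM customers c
LEFT OUTER JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name
ORDER BY c.id
"""
last_orders = conn.execute(query).fetchall()
for row in last_orders:
    print(row)
```

Because of the LEFT OUTER JOIN, Bo still appears in the result, with a NULL (None in Python) last_order_date.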