Count max values in PostgreSQL

I have a problem formulating an SQL query in PostgreSQL and hope to get some help here.
I have three tables: employee, visitor, and visit. I want to find out which employee (fk_employee_id) has been responsible for the most visits that haven't been checked out.
I want to write an SQL query that returns just the number one result (by using the max function, maybe?) instead of my current one, which returns a ranked list (and that ranked list doesn't work either if two people share first place).
This is my current SQL query:
select visitor.fk_employee_id, count(visitor.fk_employee_id)
From Visit
Inner Join visitor on visit.fk_visitor_id = visitor.visitor_id
WHERE check_out_time IS NULL
group by visitor.fk_employee_id, visitor.fk_employee_id
Limit 1
Does anyone know how to do this?

To avoid confusion, I will change the column names to:
visitor table, the FK to employee id : employee_in_charge_id
visit table, the FK to employee id : employee_to_meet_id
From your explanation in the comments, you are looking for the employee who has the most visits that have not been checked out.
If more than one employee shares the maximum number of visits that have not been checked out, this query lists all of them:
SELECT * FROM
(
SELECT
r.employee_in_charge_id,
count(*) cnt,
rank() over (ORDER BY count(*) DESC)
FROM visit v
JOIN visitor r ON v.visitor_id = r.id
WHERE v.check_out_time IS NULL
GROUP BY r.employee_in_charge_id
) a
WHERE rank = 1;
See the SQL Fiddle: http://sqlfiddle.com/#!17/423d9/2
Side Note:
To me, it sounds more correct if employee_in_charge_id is part of the visit table rather than the visitor table. My assumption is that for each visit there is one employee (A) who is responsible for handling the visit, and the visitor is meeting one employee (B). So one visitor can make multiple visits, which are handled by different employees.
Anyway, my answer above is based on your original schema design.
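Since the question asks about the max function, here is another tie-preserving option. Compute the open-visit count per employee once in a CTE, then keep every employee whose count equals max() of those counts. The sketch below runs that query through Python's built-in sqlite3 module so it is self-contained. It assumes the question's original column names (visitor.visitor_id, visitor.fk_employee_id, visit.fk_visitor_id, visit.check_out_time) rather than the renamed ones, and the sample rows are made up. The SQL itself should also run on PostgreSQL.

```python
import sqlite3

# Schema uses the question's original column names; sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visitor (visitor_id INTEGER PRIMARY KEY, fk_employee_id INTEGER);
CREATE TABLE visit (visit_id INTEGER PRIMARY KEY, fk_visitor_id INTEGER, check_out_time TEXT);
INSERT INTO visitor VALUES (1, 10), (2, 20), (3, 30);
-- Employees 10 and 20 each have two open visits; employee 30 has one.
INSERT INTO visit (fk_visitor_id, check_out_time) VALUES
  (1, NULL), (1, NULL), (2, NULL), (2, NULL), (3, NULL), (3, '2020-01-01 10:00');
""")

# Keep every employee whose open-visit count equals the maximum open-visit count.
TOP_EMPLOYEES = """
WITH open_counts AS (
    SELECT r.fk_employee_id, count(*) AS cnt
    FROM visit v
    JOIN visitor r ON v.fk_visitor_id = r.visitor_id
    WHERE v.check_out_time IS NULL
    GROUP BY r.fk_employee_id
)
SELECT fk_employee_id, cnt
FROM open_counts
WHERE cnt = (SELECT max(cnt) FROM open_counts)
ORDER BY fk_employee_id
"""

rows = conn.execute(TOP_EMPLOYEES).fetchall()
print(rows)  # both tied employees are returned
```

Unlike ORDER BY ... LIMIT 1, this returns both employees when they tie. Unlike rank(), it needs no window function.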

Assuming a standard n:m implementation, this would be one way to do it:
SELECT fk_employee_id
FROM visit
WHERE check_out_time IS NULL
GROUP BY fk_employee_id
ORDER BY count(*) DESC
LIMIT 1;
Assuming referential integrity, you do not need to include the table visitor in the query at all.
count(*) is a bit faster than count(fk_employee_id) and does the same thing in this case (assuming fk_employee_id is NOT NULL). See:
PostgreSQL: running count of rows for a query 'by minute'
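The NOT NULL assumption matters because count(*) counts rows, while count(column) skips rows where that column is NULL. This small sketch uses Python's sqlite3 module and a hypothetical visit table in which one open visit has no employee assigned:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visit (fk_employee_id INTEGER, check_out_time TEXT)")
# Three open visits; one of them has no employee assigned (NULL).
conn.executemany(
    "INSERT INTO visit VALUES (?, ?)",
    [(1, None), (1, None), (None, None)],
)

star, col = conn.execute(
    "SELECT count(*), count(fk_employee_id) FROM visit WHERE check_out_time IS NULL"
).fetchone()
print(star, col)  # count(*) includes the NULL row; count(fk_employee_id) skips it
```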

Related

Combine count and max in postgresql sql

I have a problem formulating an SQL query in PostgreSQL and hope to get some help here.
I have a table called visitor that contains a column called fk_employee_id. fk_employee_id contains different numbers between 1 and 10, for example:
1,3,4,6,4,6,7,3,2,1,6,7,6
Now I want to find out which value is the most frequent in this column (in this case 6). I have written a query that seems to solve my problem:
SELECT fk_employee_id
FROM visitor
GROUP BY fk_employee_id
ORDER BY COUNT(fk_employee_id) DESC
LIMIT 1
but this query doesn't give the right answer if two values are tied for the most frequent. So instead I tried to write a query that uses the max function, but I can't figure out how. Does anyone know how to do this?
We can use RANK here to slightly modify your current query:
WITH cte AS (
SELECT
fk_employee_id,
RANK() OVER (ORDER BY COUNT(*) DESC) rank
FROM visitor
GROUP BY fk_employee_id
)
SELECT fk_employee_id
FROM cte
WHERE rank = 1;
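You can try the RANK() query, including the tie case the question worries about, by running it through Python's sqlite3 module (window functions need SQLite 3.25 or later). In this sketch the alias is renamed to rnk and the helper most_frequent is hypothetical. The data is the example list from the question:

```python
import sqlite3

def most_frequent(values):
    """Return every fk_employee_id tied for the highest count, using RANK()."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visitor (fk_employee_id INTEGER)")
    conn.executemany("INSERT INTO visitor VALUES (?)", [(v,) for v in values])
    rows = conn.execute("""
        WITH cte AS (
            SELECT fk_employee_id,
                   RANK() OVER (ORDER BY COUNT(*) DESC) AS rnk
            FROM visitor
            GROUP BY fk_employee_id
        )
        SELECT fk_employee_id FROM cte WHERE rnk = 1 ORDER BY fk_employee_id
    """).fetchall()
    conn.close()
    return [r[0] for r in rows]

sample = [1, 3, 4, 6, 4, 6, 7, 3, 2, 1, 6, 7, 6]
print(most_frequent(sample))           # 6 appears four times
print(most_frequent(sample + [7, 7]))  # now 6 and 7 both appear four times
```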

Most efficient way to retrieve rows of related data: subquery, or separate query with GROUP BY?

I have a very simple PostgreSQL query to retrieve the latest 50 news articles:
SELECT id, headline, author_name, body
FROM news
ORDER BY publish_date DESC
LIMIT 50
Now I also want to retrieve the latest 10 comments for each article as well. I can think of two ways to accomplish retrieving them and I'm not sure which one is best in the context of PostgreSQL:
Option 1:
Do a subquery directly for the comments in the original query and cast the result to an array:
SELECT headline, author_name, body,
ARRAY(
SELECT ROW(id, message, author_name)
FROM news_comments
WHERE news_id = n.id
ORDER BY DATE DESC
LIMIT 10
) AS comments
FROM news n
ORDER BY publish_date DESC
LIMIT 50
Obviously, in this case, application logic would need to be aware of which index in the array is which column, that's no problem.
The one problem I see with the method is not knowing how the query planner would execute it. Would this effectively turn into 51 queries?
Option 2:
Use the original very simple query:
SELECT id, headline, author_name, body
FROM news
ORDER BY publish_date DESC
LIMIT 50
Then, in application logic, gather all of the news IDs and use them in a separate query. row_number() would have to be used here to limit the number of results per news article:
SELECT *
FROM (
SELECT *,
row_number() OVER(
PARTITION BY news_id
ORDER BY date DESC
) AS rn
FROM (
SELECT *
FROM news_comments
WHERE news_id IN(123, 456, 789)
) s
) s
where rn <= 10
This approach is obviously more complicated, and I'm not sure whether it would have to retrieve all comments for the scoped news articles first and then chop off the ones where the row number is greater than 10.
Which option is best? Or is there an even better solution I have overlooked?
For context, this is a news aggregator site I've developed myself, I currently have about 40,000 news articles across several categories, with about 500,000 comments, so I'm looking for the best solution to help me keep growing.
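For option 2 to limit comments per article, the window has to be partitioned by news_id and ordered by comment date. Partitioning and ordering by author_id would limit comments per author instead. Here is a sketch of that query run through Python's sqlite3 module (SQLite 3.25+ for window functions). It assumes a news_comments table with news_id and date columns, uses hypothetical rows, and keeps N = 2 instead of 10 so the sample stays small:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE news_comments (id INTEGER PRIMARY KEY, news_id INTEGER, message TEXT, date TEXT)"
)
# Hypothetical data: article 123 has three comments, article 456 has one.
conn.executemany(
    "INSERT INTO news_comments (news_id, message, date) VALUES (?, ?, ?)",
    [
        (123, "first", "2020-01-01"),
        (123, "second", "2020-01-02"),
        (123, "third", "2020-01-03"),
        (456, "only", "2020-02-01"),
    ],
)

LATEST_PER_ARTICLE = """
SELECT news_id, message
FROM (
    SELECT news_id, message, date,
           row_number() OVER (PARTITION BY news_id ORDER BY date DESC) AS rn
    FROM news_comments
    WHERE news_id IN (123, 456)
) s
WHERE rn <= ?            -- keep the N newest comments per article
ORDER BY news_id, date DESC
"""

rows = conn.execute(LATEST_PER_ARTICLE, (2,)).fetchall()
print(rows)
```

EXPLAIN ANALYZE on the real data will show whether PostgreSQL can avoid reading every comment of the listed articles.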
You should investigate the execution plans of your statements using at least EXPLAIN ANALYZE. This shows the plan chosen by the optimizer, executes the statement, and reports actual run times and other statistics.
Another solution would be to use a LATERAL subquery to retrieve 10 comments for each news article as separate rows. Again, you need to investigate and compare plans to choose the approach that works best for you:
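SELECT
n.id, n.headline, n.author_name, n.body,
c.id AS comment_id, c.message, c.author_name AS comment_author
FROM (
-- limit to 50 articles first, not 50 joined article/comment rows
SELECT id, headline, author_name, body, publish_date
FROM news
ORDER BY publish_date DESC
LIMIT 50
) n
LEFT JOIN LATERAL (
SELECT id, message, author_name, date
FROM news_comments nc
WHERE n.id = nc.news_id
ORDER BY nc.date DESC
LIMIT 10
) c ON TRUE
ORDER BY n.publish_date DESC, c.date DESC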
Because the subquery is LATERAL, it can reference columns of news. It is therefore evaluated once for each row retrieved from news, using the correlation in its WHERE clause, and its rows are joined to that news row.
This approach saves your application from having to unpack the arrays produced by option 1. It also needs only one query instead of the two in option 2, which saves the extra round trip of sending a second statement and retrieving its rows.
It would also be worth looking for performance improvements by creating indexes, and by looking into the planner cost constants and planner method configuration parameters. You can experiment with these to understand the choices the planner has made. More on the subject is in the Query Planning section of the PostgreSQL documentation.

Number of courses completed by student..MOODLE

What would be the query to find the number of students who have completed their courses in MOODLE?
I am using the following query:
select mu.id as student_id,count(mcc.course) as completed_course from mdl_user mu join mdl_course_completions mcc on mcc.userid=mu.id JOIN mdl_course mc on mc.id=mcc.course WHERE mcc.userid = $user_id group by mu.id
I notice that you seem to have limited the results to only a single userid (WHERE mcc.userid = $user_id), so you should probably remove that restriction if you want to get details of more than one student.
You don't really need to join with the mdl_course or mdl_user tables, as there is only one mdl_course_completions record for each student + course combination.
You should, however, add a restriction on the 'timecompleted' field, to make sure it is not null (mdl_course_completions records are created when a student starts on a course, to record the timeenrolled and timestarted; when the course is complete the timecompleted field is set as well).
This should give you:
SELECT userid AS student_id, COUNT(*) AS completed_courses
FROM mdl_course_completions
WHERE timecompleted IS NOT NULL
GROUP BY userid
This lists the number of courses each student has completed.
If, instead (and as stated at the start of the question), you want the number of students who have completed at least one course, then the query would be:
SELECT COUNT(DISTINCT userid) AS students_completed
FROM mdl_course_completions
WHERE timecompleted IS NOT NULL
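Both queries can be sanity-checked with Python's sqlite3 module. This sketch uses a cut-down, hypothetical mdl_course_completions table (only userid, course and timecompleted; the real Moodle table has more columns) with made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Minimal stand-in for Moodle's mdl_course_completions.
conn.execute(
    "CREATE TABLE mdl_course_completions (userid INTEGER, course INTEGER, timecompleted INTEGER)"
)
conn.executemany(
    "INSERT INTO mdl_course_completions VALUES (?, ?, ?)",
    [
        (1, 101, 1600000000),  # user 1 completed course 101
        (1, 102, 1600001000),  # user 1 completed course 102
        (2, 101, None),        # user 2 enrolled but has not finished
        (3, 103, 1600002000),  # user 3 completed course 103
    ],
)

# Courses completed per student.
per_student = conn.execute("""
    SELECT userid AS student_id, COUNT(*) AS completed_courses
    FROM mdl_course_completions
    WHERE timecompleted IS NOT NULL
    GROUP BY userid
    ORDER BY userid
""").fetchall()

# Number of students with at least one completed course.
students_done = conn.execute("""
    SELECT COUNT(DISTINCT userid)
    FROM mdl_course_completions
    WHERE timecompleted IS NOT NULL
""").fetchone()[0]

print(per_student, students_done)
```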

Stuck on a query and need support to improve the performance (if any) for the execution !(PostgreSQL)? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I am new to PostgreSQL and I am learning by working through a few examples.
I have solved a few queries in PostgreSQL but got stuck at one point.
Given the sample data in the SQLFiddle below, I tried:
--6. find the most sold product with sales_id, product_name, quantity and sum(price)
select array_agg(s.sale_id),p.product_name,s.quantity,sum(s.price)
from products p
join sales s
on p.product_id=s.product_id;
but it fails with:
ERROR: column "p.product_name" must appear in the GROUP BY clause or be used in an aggregate function:
This is the SQL Fiddle with sample data.
I'm using PostgreSQL 9.2.
For all that it looks simple, this is quite an interesting problem.
The unsolved #6
There are two stages to this:
find the most sold product; and
display the required detail on that product
The question is badly written. It fails to specify whether you want the product with the greatest number of sales or the greatest dollar sales value. I will assume the former, but it's easy to adapt the following queries to sort by total price instead.
UPDATE: @user2561626 found the simple solution I was sure I was overlooking but couldn't think of: http://sqlfiddle.com/#!12/dbe7c/118 . Use the output of SUM in ORDER BY, then LIMIT the result set.
The following are the complicated and roundabout ways I tried because I couldn't think of the simple way:
One way is to use a subquery with an ORDER BY and LIMIT to sort products by total number of sales, then pick the top one. You then join on that inner query to generate the desired product summary. In this case I join on sales twice, once in the inner query and once in the outer where I calculate more detail for just one product. It's possibly more efficient to join on it just once in the inner query and do more work, but that'll involve creating and discarding a bigger result set, so it's the sort of thing you'd tune based on your data distribution.
SELECT
array_agg(s.sale_id) AS sales_ids,
(SELECT p.product_name FROM products p WHERE p.product_id = pp.product_id) AS product_name,
sum(s.quantity) AS total_quantity,
sum(s.price) AS total_price
FROM
(
-- Find the product with the largest number of sales
-- If multiple products have the same sales an arbitrary candidate
-- is selected; extend the ORDER BY if you want to control which
-- one gets picked.
SELECT
s2.product_id, sum(s2.quantity) AS total_quantity
FROM sales s2
GROUP BY s2.product_id
ORDER BY 2 DESC
LIMIT 1
) AS pp
INNER JOIN sales s ON (pp.product_id = s.product_id)
GROUP BY s.product_id, pp.product_id;
I'm honestly not too sure how to phrase this in purely standard SQL (i.e. no LIMIT clause). You can use a CTE or multiple scans in subqueries to find the greatest number of sales and the product ID with the greatest number of sales, but that will give you multiple results if more than one product has equal sales.
I can't help but feel I've totally forgotten the simple and obvious way to do this.
Comments on others:
--1. write the query to find the products which are not sold
select *
from products
where product_id not in (select distinct PRODUCT_ID from sales );
Your solution is subtly incorrect, because there's no NOT NULL constraint on product_id in sales. It builds a list then filters on the list, but the list could contain NULL, and 2 NOT IN (1, NULL) is NULL, which in WHERE is treated as false.
It is much better to re-phrase this as WHERE NOT EXISTS (SELECT 1 FROM sales s WHERE s.product_id = products.product_id).
With #2 it's again better to use EXISTS, but PostgreSQL can optimize it into the better form automatically since it's semantically the same; the NULL issue doesn't apply for IN, only NOT IN. So your query is fine.
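The NOT IN pitfall can be reproduced in a few lines. This sketch uses Python's sqlite3 module, which follows the same NULL rules for NOT IN. The products and sales rows are hypothetical, and one sale has a NULL product_id:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER);  -- nullable, as in the question
INSERT INTO products VALUES (1, 'Soap'), (2, 'Shampoo');
INSERT INTO sales (product_id) VALUES (1), (NULL);  -- product 2 was never sold
""")

# 2 NOT IN (1, NULL) evaluates to NULL (None in Python), not true.
bare = conn.execute("SELECT 2 NOT IN (1, NULL)").fetchone()[0]

not_in = conn.execute("""
    SELECT product_id FROM products
    WHERE product_id NOT IN (SELECT product_id FROM sales)
""").fetchall()

not_exists = conn.execute("""
    SELECT product_id FROM products p
    WHERE NOT EXISTS (SELECT 1 FROM sales s WHERE s.product_id = p.product_id)
""").fetchall()

print(bare, not_in, not_exists)  # the NULL in sales makes NOT IN return nothing
```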
Question #7 highlights that this is an awful schema. You should never store split-up year/month/day like this; a sale would just have a single timestamptz field, and to get the year you'd use date_trunc or extract. That's not your fault, it's bad table design in the question. The question could also be clearer; I think you've answered it correctly as written, but they don't say whether or not years with no sales should be shown - presumably they assume there aren't any. If there are, you'd have to do a left outer join over a generate_series of dates to zero-fill empty years.
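In PostgreSQL, generate_series is the natural tool for zero-filling years. generate_series is not guaranteed to be available in SQLite, so this sketch swaps in a recursive CTE to produce the list of years and runs it through Python's sqlite3 module. The sales table with a separate year column and its rows are hypothetical. In PostgreSQL the CTE would be replaced by generate_series(2010, 2012):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, month INTEGER, day INTEGER, price INTEGER)")
# Hypothetical data with no sales at all in 2011.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [(2010, 1, 5, 100), (2010, 6, 1, 50), (2012, 3, 9, 70)],
)

YEARLY_TOTALS = """
WITH RECURSIVE years(y) AS (            -- stands in for generate_series(2010, 2012)
    SELECT 2010
    UNION ALL
    SELECT y + 1 FROM years WHERE y < 2012
)
SELECT years.y, COALESCE(SUM(s.price), 0) AS total
FROM years
LEFT JOIN sales s ON s.year = years.y
GROUP BY years.y
ORDER BY years.y
"""

totals = conn.execute(YEARLY_TOTALS).fetchall()
print(totals)  # the empty year shows up with a total of 0
```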
Question #8 is another bad question, frankly. "max price". Um. What? "Maximum price paid per item" would be "price/quantity". "Greatest total individual sale value for each product" would be what you wrote. The question seems to allow for either.
The query solution for Question #6 is:
select array_agg(s.sale_id),p.product_name,sum(s.quantity) as Quantity ,sum(s.price) as Total_Price
from sales s,products p
where s.product_id =p.product_id
group by p.product_id
order by sum(s.quantity) desc limit 1 ;
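Here is the same ORDER BY SUM(...) DESC LIMIT 1 pattern as a runnable sketch through Python's sqlite3 module. group_concat stands in for PostgreSQL's array_agg, and the products and sales rows are made up. Like any LIMIT 1 query, it returns only one product if two are tied:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER, quantity INTEGER, price INTEGER);
INSERT INTO products VALUES (1, 'Soap'), (2, 'Shampoo');
INSERT INTO sales VALUES (1, 1, 2, 20), (2, 2, 5, 75), (3, 1, 1, 10), (4, 2, 1, 15);
""")

MOST_SOLD = """
SELECT group_concat(s.sale_id) AS sale_ids,   -- stands in for array_agg
       p.product_name,
       SUM(s.quantity) AS quantity,
       SUM(s.price) AS total_price
FROM sales s
JOIN products p ON s.product_id = p.product_id
GROUP BY p.product_id, p.product_name
ORDER BY SUM(s.quantity) DESC
LIMIT 1
"""

row = conn.execute(MOST_SOLD).fetchone()
print(row)  # the order of IDs inside group_concat is not guaranteed
```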
Comments on others
Question #9 (@Robin Hood's):
select s.sale_id,p.product_name,s.quantity,s.price
from products p,sales s
where p.product_id=s.product_id and p.product_name LIKE 'S%';
LIKE is case-sensitive in PostgreSQL, so 'S%' only matches names that start with an uppercase S; use ILIKE 's%' to match both cases.
Question #10 (@Robin Hood's):
The stored procedure is:
CREATE OR REPLACE FUNCTION get_details()
RETURNS TABLE(sale_id integer,product_name varchar,quantity integer,price int) AS
$BODY$
BEGIN
RETURN QUERY
select s.sale_id,p.product_name,s.quantity,s.price
from products p
join sales s
on p.product_id =s.product_id ;
-- RETURN QUERY sets FOUND; an empty result does not raise no_data_found
IF NOT FOUND THEN
RAISE NOTICE 'No data available';
END IF;
EXCEPTION WHEN others THEN
RAISE NOTICE 'Query failed: %', SQLERRM;
END
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
Then run select * from get_details(); to get the result.
I need help with these questions too! I just want to add these queries as well.
--Question#9
--9. select product details with sales_id, product_name, quantity and price for products whose names start with the letter 's'
--This selects my product details
select s.sale_id,p.product_name,s.quantity,s.price
from products p,sales s
where p.product_id=s.product_id ;
--This isn't working to find the names that start with s. Is there any other way to solve this?
select s.sale_id,p.product_name,s.quantity,s.price
from products p,sales s
where p.product_id=s.product_id and product_name = 's%';
--10. write the stored procedure to extract all the sales and product details with sales_id, product_name, quantity and price, with exception handling and raising the notices

T-SQL - How to write query to get records that match ALL records in a many to many join

(I don't think I have titled this question correctly - but I don't know how to describe it)
Here is what I am trying to do:
Let's say I have a Person table that has a PersonID field. And let's say that a Person can belong to many Groups. So there is a Group table with a GroupID field and a GroupMembership table that is a many-to-many join between the two tables and the GroupMembership table has a PersonID field and a GroupID field. So far, it is a simple many to many join.
Given a list of GroupIDs I would like to be able to write a query that returns all of the people that are in ALL of those groups (not any one of those groups). And the query should be able to handle any number of GroupIDs. I would like to avoid dynamic SQL.
Is there some simple way of doing this that I am missing?
Thanks,
Corey
select person_id, count(*) from groupmembership
where group_id in ([your list of group ids])
group by person_id
having count(*) = [size of your list of group ids]
Edited: thank you dotjoe!
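Here is a runnable sketch of this HAVING count approach using Python's sqlite3 module, with a hypothetical helper people_in_all_groups and made-up memberships. It counts DISTINCT group_id so that a duplicated membership row cannot inflate the count, and it passes the list size as a parameter. Only the placeholder list is generated; the group IDs themselves are still bound as parameters:

```python
import sqlite3

def people_in_all_groups(conn, group_ids):
    """Return person_ids that belong to every group in group_ids."""
    placeholders = ",".join("?" for _ in group_ids)
    sql = f"""
        SELECT person_id
        FROM groupmembership
        WHERE group_id IN ({placeholders})
        GROUP BY person_id
        HAVING COUNT(DISTINCT group_id) = ?
        ORDER BY person_id
    """
    params = [*group_ids, len(set(group_ids))]
    return [row[0] for row in conn.execute(sql, params)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE groupmembership (person_id INTEGER, group_id INTEGER)")
conn.executemany(
    "INSERT INTO groupmembership VALUES (?, ?)",
    [(1, 10), (1, 20), (2, 10), (3, 10), (3, 20), (3, 30)],
)

print(people_in_all_groups(conn, [10, 20]))      # persons 1 and 3
print(people_in_all_groups(conn, [10, 20, 30]))  # only person 3
```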
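Basically, you are looking for persons for whom there is no group they are not a member of. This version checks every row in Group; to check only a given list, put those GroupIDs in a table and use it in place of Group:
select *
from Person p
where not exists (
select 1
from [Group] g -- GROUP is a reserved word in T-SQL, so it must be bracketed
where not exists (
select 1
from GroupMembership gm
where gm.PersonID = p.PersonID
and gm.GroupID = g.GroupID
)
)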
You're basically not going to avoid "dynamic" SQL in the sense of dynamically generating the query at query time. There's no way to pass a list around in SQL (well, there is: table variables, but getting them into the system from C# is either impossible (SQL Server 2005 and below) or else annoying (2008)).
One way that you could do it with multiple queries is to insert your list into a work table (probably a process-keyed table) and join against that table. The only other option would be to use a dynamic query such as the ones specified by Jonathan and hongliang.
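The work-table idea combines naturally with the double NOT EXISTS query: put the requested GroupIDs in a table and test membership against that table instead of the whole Group table. This sketch uses Python's sqlite3 module, with a per-connection TEMP table (a simplification of a process-keyed table) holding the list. The table and helper names are hypothetical, and INSERT OR IGNORE is SQLite syntax, not T-SQL:

```python
import sqlite3

def people_in_all_groups_via_work_table(conn, group_ids):
    """Relational division against a work table holding the requested group IDs."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS wanted_groups (group_id INTEGER PRIMARY KEY)")
    conn.execute("DELETE FROM wanted_groups")
    # INSERT OR IGNORE drops duplicate IDs in the requested list.
    conn.executemany("INSERT OR IGNORE INTO wanted_groups VALUES (?)", [(g,) for g in group_ids])
    # A person qualifies if no wanted group exists that they are NOT a member of.
    rows = conn.execute("""
        SELECT p.person_id
        FROM person p
        WHERE NOT EXISTS (
            SELECT 1 FROM wanted_groups w
            WHERE NOT EXISTS (
                SELECT 1 FROM groupmembership gm
                WHERE gm.person_id = p.person_id
                  AND gm.group_id = w.group_id
            )
        )
        ORDER BY p.person_id
    """).fetchall()
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY);
CREATE TABLE groupmembership (person_id INTEGER, group_id INTEGER);
INSERT INTO person VALUES (1), (2), (3);
INSERT INTO groupmembership VALUES (1, 10), (1, 20), (2, 10), (3, 10), (3, 20), (3, 30);
""")

print(people_in_all_groups_via_work_table(conn, [10, 20]))      # persons 1 and 3
print(people_in_all_groups_via_work_table(conn, [10, 20, 30]))  # only person 3
```

With an empty list every person qualifies, since there is then no requested group they are missing.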